Statistical Inference I and II
Probabilistic and Statistical Models

Stanislav Katina
Institute of Mathematics and Statistics, Masaryk University
Honorary Research Fellow, The University of Glasgow

November 13, 2018

Random variable, random vector, data, individuals

random variable and random vector
– a random variable $X$ is a function from a sample space (the set of all possible outcomes) to the set of real numbers, $X: \mathcal{Y} \to \mathbb{R}$
– a 2-dimensional random vector $(X_1, X_2)^T: \mathcal{Y} \to \mathbb{R}^2$
– a $k$-dimensional random vector $(X_1, X_2, \ldots, X_k)^T: \mathcal{Y} \to \mathbb{R}^k$

data – data vector and data matrix – the elements of the vector and the rows of the matrix are measured on individuals (statistical units)
– data as realisations of $X$ – an $n$-dimensional vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, where $n$ is the sample size
– data as realisations of $(X_1, X_2)^T$ – an $(n \times 2)$-dimensional matrix with rows $(x_{i1}, x_{i2})^T$, $i = 1, 2, \ldots, n$, and columns $\mathbf{x}_1$ and $\mathbf{x}_2$
– data as realisations of $(X_1, X_2, \ldots, X_k)^T$ – an $(n \times k)$-dimensional matrix with rows $(x_{i1}, x_{i2}, \ldots, x_{ik})^T$, $i = 1, 2, \ldots, n$, and columns $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$

Model

– based on probabilistic sampling principles; the individuals are sampled from a population
– attribute – a specific value of a variable measured with a certain precision; data are measured on individuals
– descriptive statistics – describing and summarising data
– inferential statistics (statistical inference) – inferring (drawing conclusions) about a random variable based on a model fitted to data
– $\mathcal{F}$ is a set of models (probabilistic or statistical)
– $X$ is characterised by a model $F(\cdot)$, $F \in \mathcal{F}$
– $(X_1, X_2)^T$ is characterised by a model $F^{(2)}(\cdot)$, $F \in \mathcal{F}$
– $(X_1, X_2, \ldots, X_k)^T$ is characterised by a model $F^{(k)}(\cdot)$, $F \in \mathcal{F}$
– parameter – a numerical quantity that characterises a model – a one-dimensional parameter $\theta$, or a $k$-dimensional vector of parameters $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)^T$
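The data conventions above can be made concrete in R; a minimal sketch (all variable names and values are illustrative only):

set.seed(1)
n  <- 10
x1 <- rnorm(n, mean = 170, sd = 10)   # e.g. height of n individuals
x2 <- rnorm(n, mean = 70, sd = 8)     # e.g. weight of the same individuals
X  <- cbind(x1, x2)                   # (n x 2) data matrix, rows = individuals
dim(X)    # 10 2
X[3, ]    # realisation (x_31, x_32)^T measured on individual 3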
Distribution function, probability and density function

useful assumption – $X_i$, $i = 1, 2, \ldots, n$, are independent, identically distributed random variables

distribution function
– discrete random variable: $F_X(x) = \Pr(X \le x) = \sum_{i:\, x_i \le x} \Pr(X = x_i)$, where $\sum_{i=1}^{k(\infty)} p_i = 1$ and $\Pr(X = x_i) = p_i = f_X(x_i) = f(x_i)$ for all $x_i$; $p_i$ is the probability mass function, $\{x_i, p_i\}_{i=1}^{k(\infty)}$, $k \in \mathbb{N}^+$
– continuous random variable: $F_X(x) = \int_{-\infty}^{x} f(t)\,dt$, $f(x) \ge 0$, where $\int_{-\infty}^{\infty} f(x)\,dx = 1$ and $f_X(x) = f(x) = \frac{\partial}{\partial x} F_X(x)$ is the density function

Parametric and non-parametric model

– $\Theta$ is a parametric space; the support of $F(\cdot; \theta)$ is $\mathcal{Y}_\theta \subseteq \mathbb{R}^n$ (the smallest set on which the distribution function is defined); the sample space is $\mathcal{Y} = \cup_{\theta \in \Theta}\, \mathcal{Y}_\theta$
– $\mathcal{F}$ as a parametric set of distribution functions: $\mathcal{F} = \{F(\cdot; \boldsymbol{\theta}): \boldsymbol{\theta} \in \Theta \subseteq \mathbb{R}^k\}$
– $\mathcal{F}$ as a parametric set of probability or density functions: $\mathcal{F} = \{f(\cdot; \boldsymbol{\theta}): \boldsymbol{\theta} \in \Theta \subseteq \mathbb{R}^k\}$
– $\mathcal{F}$ as a non-parametric set: $\mathcal{F} = \{\text{the set of all density functions}\}$; alternatively, probability or distribution functions can be used

Reading of mathematical notation

– the term "probability model" is often reduced to "distribution"
– "Random variable $X$ is distributed as $F(x)$" or "random variable $X$ is characterised by the distribution $F(x)$"; notation $X \sim F_X(x)$; the symbol "$\sim$" can also mean "asymptotically", "for sufficiently large $n$" (the notation $X \sim f_X(x)$ is used very rarely)
– "Random variable $X$ is distributed as random variable $Y$" or "random variables $X$ and $Y$ are identically distributed" (notation $X \sim Y$ or $F_X(x) \sim F_Y(y)$)
– the term "statistical model" is often reduced to "model" (usually referred to as a causal statistical model or a model of causal dependence)
– "$Y$ depends on $X$", where $X$ is the independent variable and $Y$ is the dependent variable (notation $Y|X$)

Reading of mathematical notation

– "$X$ is normally distributed with parameters $\mu$ and $\sigma^2$", notation $X \sim N(\mu, \sigma^2)$, where $\boldsymbol{\theta} = (\mu, \sigma^2)^T$
– "$\mathbf{X} = (X_1, X_2)^T$ is characterised by the bivariate normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$ and $\rho$", notation $\mathbf{X} \sim N_2(\boldsymbol{\mu}, \Sigma)$, where $\boldsymbol{\theta} = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)^T$
– "$\mathbf{X} = (X_1, X_2, \ldots, X_k)^T$ is characterised by the multivariate normal distribution with parameters $\mu_1, \ldots, \mu_k$, $\sigma_1^2, \ldots, \sigma_k^2$ and $\rho_{1,2}, \ldots, \rho_{k-1,k}$", notation $\mathbf{X} \sim N_k(\boldsymbol{\mu}, \Sigma)$, where $\boldsymbol{\theta} = (\mu_1, \ldots, \mu_k, \sigma_1^2, \ldots, \sigma_k^2, \rho_{1,2}, \ldots, \rho_{k-1,k})^T$
– "$X$ is binomially distributed with parameter $p$", notation $X \sim \mathrm{Bin}(N, p)$, where $\theta = p$
– "$X$ is characterised by the Poisson distribution with parameter $\lambda$", notation $X \sim \mathrm{Poiss}(\lambda)$, where $\theta = \lambda$
– "$\mathbf{X} = (X_1, X_2, \ldots, X_k)^T$ is multinomially distributed with parameter $\mathbf{p}$", notation $\mathbf{X} \sim \mathrm{Mult}_k(N, \mathbf{p})$, where $\boldsymbol{\theta} = \mathbf{p}$
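Both definitions of $F_X$ can be checked numerically in R; a minimal sketch (the binomial and normal examples are illustrative only):

x <- 0:10
p <- dbinom(x, size = 10, prob = 0.3)       # probability mass function p_i
sum(p)                                      # equals 1
all.equal(cumsum(p), pbinom(x, 10, 0.3))    # F_X(x) as the cumulative sum of p_i

# continuous case: F_X(x) as the integral of the density
F0 <- integrate(dnorm, lower = -Inf, upper = 1.5)$value
all.equal(F0, pnorm(1.5))                   # TRUE up to numerical error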
Measures of normal distribution

"$X$ is normally distributed with parameters $\mu$ and $\sigma^2$", notation $X \sim N(\mu, \sigma^2)$, where $\boldsymbol{\theta} = (\mu, \sigma^2)^T$

– random variable $Z$ ($Z$-transformation): $\Pr(Z < x_{1-\alpha}) = 1 - \alpha$, where $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$ and $x_{1-\alpha}$ is the $(1-\alpha)$-quantile
– rule "90–95–99": $\Pr(a \le X \le b) = 1 - \alpha$, where $1 - \alpha = 0.90$, $0.95$ and $0.99$, $a = \mu - x_{1-\alpha/2}\,\sigma$ and $b = \mu + x_{1-\alpha/2}\,\sigma$
– rule "68.27–95.45–99.73": $\Pr(a \le X < b) = \Pr(X < b) - \Pr(X < a) = F_X(b) - F_X(a)$, where $a = \mu - k\sigma$, $b = \mu + k\sigma$, $k = 1, 2$ and $3$

Approximation of binomial distribution by normal distribution

Definition (approximation of binomial distribution by normal distribution)
If random variable $X$ is binomially distributed with parameter $p$, $X \sim \mathrm{Bin}(N, p)$, where $\theta = p$, and if $Np > 5$ and $Nq > 5$, where $q = 1 - p$, then the distribution of $X$ can be approximated by the normal distribution, $X \sim N(Np, Npq)$, where $\boldsymbol{\theta} = (Np, Npq)^T$.

Table: Examples of minimal $N$ for fixed $p$
p   0.1   0.2   0.3   0.4   0.5
q   0.9   0.8   0.7   0.6   0.5
N   51    26    17    13    11

Definition (approximation of binomial distribution by normal distribution, Hald condition)
If random variable $X$ is binomially distributed with parameter $p$, $X \sim \mathrm{Bin}(N, p)$, where $\theta = p$, and if $Npq > 9$ (Hald condition), where $q = 1 - p$, then the distribution of $X$ can be approximated by the normal distribution, $X \sim N(Np, Npq)$, where $\boldsymbol{\theta} = (Np, Npq)^T$.

Table: Examples of minimal $N$ for fixed $p$
p       0.01   0.02   0.05   0.10   0.15   0.20   0.30   0.40   0.50
1 - p   0.99   0.98   0.95   0.90   0.85   0.80   0.70   0.60   0.50
N       910    460    190    100    71     57     43     38     36

Example
Let $\Pr(\text{male}) = 0.515$ and $\Pr(\text{female}) = 0.485$. Let $X$ be the frequency of males and $Y$ the frequency of females. Assuming $X \sim \mathrm{Bin}(N, p)$, calculate (a) $\Pr(X \le 3)$ if $N = 5$, (b) $\Pr(X \le 5)$ if $N = 10$, and (c) $\Pr(X \le 25)$ if $N = 50$. Compare the results with the normal approximation $X \sim N(Np, Npq)$.

Solution
(a) $E[X] = Np = 5 \times 0.515 = 2.575$, $E[Y] = 5 \times 0.485 = 2.425$; exactly, $\Pr(X \le 3) = \sum_{k \le 3} \binom{5}{k} 0.515^k\, 0.485^{5-k} = 0.793$; under the approximation $N(5 \times 0.515,\, 5 \times 0.515 \times 0.485)$, $\Pr(X \le 3) = 0.648$.
(b) $E[X] = 10 \times 0.515 = 5.15$, $E[Y] = 10 \times 0.485 = 4.85$; exactly, $\Pr(X \le 5) = \sum_{k \le 5} \binom{10}{k} 0.515^k\, 0.485^{10-k} = 0.586$; under $N(10 \times 0.515,\, 10 \times 0.515 \times 0.485)$, $\Pr(X \le 5) = 0.462$.
(c) $E[X] = 50 \times 0.515 = 25.75$, $E[Y] = 50 \times 0.485 = 24.25$; exactly, $\Pr(X \le 25) = \sum_{k \le 25} \binom{50}{k} 0.515^k\, 0.485^{50-k} = 0.471$; under $N(50 \times 0.515,\, 50 \times 0.515 \times 0.485)$, $\Pr(X \le 25) = 0.416$.
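The exact probabilities and their normal approximations in the solution can be reproduced in R; a short sketch:

p <- 0.515; q <- 1 - p
N <- c(5, 10, 50); x <- c(3, 5, 25)
exact  <- pbinom(x, size = N, prob = p)                 # 0.793 0.586 0.471
approx <- pnorm(x, mean = N * p, sd = sqrt(N * p * q))  # 0.648 0.462 0.416
round(cbind(N, x, exact, approx), 3)

A continuity correction (using x + 0.5 in pnorm()) would bring the approximation closer to the exact values.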
Figure: Probability function (first row) and distribution function (second row) of the binomial distribution superimposed by the normal distribution ($p = 0.515$; $N = 5, 10$ and $50$)

Figure: Probability function (first row) and distribution function (second row) of the binomial distribution superimposed by the normal distribution ($p = 0.1$; $N = 5, 10$ and $50$)

Binomial distribution

Example (number of boys)
The number of boys $X$ in families with $N$ children is binomially distributed, i.e. $X \sim \mathrm{Bin}(N, p)$, where $N = 12$ and the number of families is $M = 6115$ (Geissler 1889). Question: Calculate the theoretical frequencies $m_{n,E}$. You know that $\hat{p} = \frac{\sum_{n=0}^{N} n\, m_{n,O}}{NM} = 0.5192$ (a weighted average: the total number of boys, $\sum_n n\, m_{n,O}$, divided by the total number of children, $NM$).

Table: Observed and theoretical frequencies ($m_{n,O}$ and $m_{n,E}$) of families with $n$ boys (O = observed, E = expected, theoretical)
n        0   1    2    3    4    5     6     7     8    9    10   11   12
m_{n,O}  3   24   104  286  670  1033  1343  1112  829  478  181  45   7
m_{n,E}  1   12   72   259  628  1085  1367  1266  854  410  133  26   2

Figure: Histograms of observed and expected frequencies of the number of boys in families with 12 children

Figure: Comparison of observed and expected frequencies (differences $m_{n,O} - m_{n,E}$)
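The theoretical frequencies follow from $m_{n,E} = M \binom{12}{n} \hat{p}^n (1 - \hat{p})^{12-n}$; a short R sketch:

M <- 6115; p.hat <- 0.5192
n <- 0:12
m.E <- M * dbinom(n, size = 12, prob = p.hat)   # expected frequencies
round(m.E)   # 1 12 72 259 628 1085 1367 1266 854 410 133 26 2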
Multinomial distribution

Example (number of individuals with certain blood type)
The numbers of individuals $\mathbf{X} = (X_1, X_2, X_3, X_4)^T$ with the four blood groups are multinomially distributed following the Hardy-Weinberg equilibrium, i.e. $\mathbf{X} \sim \mathrm{Mult}_4(N, \mathbf{p})$, where $N = 500$ (Katina et al. 2015). Question: Calculate the theoretical frequencies $n_{j,E}$.

Table: Observed and theoretical frequencies of the blood groups
attributes (groups)   0     A     B    AB
n_{j,O}               209   184   81   26
n_{j,E}               210   183   80   27

Figure: Comparison of observed and expected frequencies of the four blood types

Product-multinomial distribution

Example (number of individuals with certain blood type)
Let $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2)^T$, where $\mathbf{X}_1 = (X_{11}, X_{12}, X_{13}, X_{14})^T$ is the number of individuals in Košice (Slovakia) with each blood group and $\mathbf{X}_2 = (X_{21}, X_{22}, X_{23}, X_{24})^T$ is the number of individuals in Prague (Czech Republic) with each blood group. $\mathbf{X}$ is product-multinomially distributed, i.e. $\mathbf{X} \sim \mathrm{ProdMult}_2(\mathbf{N}, \mathbf{p})$, where $\mathbf{N} = (N_1, N_2)^T$, $N_1 = 400$ and $N_2 = 500$ (Katina et al. 2015). Calculate the theoretical frequencies $n_{kj,E}$. Question: What are the probabilities of having a particular blood group in Prague and in Košice?

Table: Observed frequencies of the particular blood groups
attributes (groups)          0     A     B    AB
n_{1j,O} = n_{Košice,j,O}    138   147   84   31
n_{2j,O} = n_{Prague,j,O}    209   184   81   26

Figure: Barplots of the four blood types in Košice and Prague, with relative frequencies by city (default palette)
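The city-specific probabilities shown in the barplots are the row-wise relative frequencies $\hat{p}_{j|k} = n_{kj}/N_k$; a short R sketch:

x <- rbind(Kosice = c(138, 147, 84, 31),
           Prague = c(209, 184, 81, 26))
colnames(x) <- c("0", "A", "B", "AB")
rowSums(x)                            # N_1 = 400, N_2 = 500
round(prop.table(x, margin = 1), 4)   # estimated p_{j|k}; each row sums to 1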
Multinomial distribution

Example (number of individuals with certain eye and hair colour)
Let $\mathbf{X} = (X_1, X_2, \ldots, X_{12})^T$ be the random vector of the numbers of individuals classified by eye colour (with levels blue Bl, green Gr, brown Br) and hair colour (with levels blond Blo, light-brown LB, black Bla, red R), where $X_1$ means Bl-Blo, $X_2$ Bl-LB, $X_3$ Bl-Bla, $X_4$ Bl-R, $X_5$ Gr-Blo, $X_6$ Gr-LB, $X_7$ Gr-Bla, $X_8$ Gr-R, $X_9$ Br-Blo, $X_{10}$ Br-LB, $X_{11}$ Br-Bla and $X_{12}$ Br-R. Let $\mathbf{X} \sim \mathrm{Mult}_{12}(N, \mathbf{p})$, where $N = 6800$ (Yule and Kendall 1950). Question: Calculate the probabilities of having (1) a particular eye and hair colour, (2) a particular hair colour conditional on eye colour, (3) a particular eye colour conditional on hair colour. (A sketch follows below.)

Table: 3 × 4 contingency table of frequencies $n_j$
            Blo    LB     Bla    R     row sums
Bl          1768   807    189    47    2811
Gr          946    1387   746    53    3132
Br          115    438    288    16    857
column sums 2829   2632   1223   116   6800

Figure: Barplots of eye and hair colour – hair colour conditional on eye colour (left), eye colour conditional on hair colour (right) (default palette)

Figure: Barplots of eye and hair colour (blue palette)

Figure: Barplots of eye and hair colour (spectral palette)
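The three sets of probabilities asked for in the example are relative frequencies of the table with respect to different margins; a minimal R sketch:

x <- matrix(c(1768, 807, 189, 47,
              946, 1387, 746, 53,
              115, 438, 288, 16),
            nrow = 3, byrow = TRUE,
            dimnames = list(eyes = c("blue", "green", "brown"),
                            hair = c("blond", "light-brown", "black", "red")))
round(prop.table(x), 4)               # (1) joint probabilities p_j
round(prop.table(x, margin = 1), 4)   # (2) hair | eye  (rows sum to 1)
round(prop.table(x, margin = 2), 4)   # (3) eye | hair  (columns sum to 1)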
Multinomial distribution

Example (number of individuals with certain socioeconomic status, political philosophy and political affiliation)
The numbers of individuals $X_1, \ldots, X_8$ classified by socioeconomic status, political philosophy and political affiliation are multinomially distributed, i.e. $\mathbf{X} = (X_1, \ldots, X_8)^T \sim \mathrm{Mult}_8(N, \mathbf{p})$, with realisations $\mathbf{x} = (x_1, x_2, \ldots, x_8)^T$ and $N = 500$ (Christensen 1990, modified). Question: Calculate the probabilities of having a particular socioeconomic status, political philosophy and political affiliation.

Notation: (1) socioeconomic status (high – H, low – Lo), (2) political philosophy (democrat – D, republican – R) and (3) political affiliation (conservative – C, liberal – Li). Then $X_1$ (H-D-C), $X_2$ (H-D-Li), $X_3$ (H-R-C), $X_4$ (H-R-Li), $X_5$ (Lo-D-C), $X_6$ (Lo-D-Li), $X_7$ (Lo-R-C) and $X_8$ (Lo-R-Li).

Table: 2 × 4 contingency table of frequencies $x_j$
     D-C   D-Li   R-C   R-Li
H    60    60     60    20
Lo   90    90     90    30

Figure: Barplots of socioeconomic status, political philosophy and affiliation (blue palette)

Poisson distribution

Example (Poisson distribution; killing by horse kicks)
The data were published by the Russian economist Ladislaus Bortkiewicz in his book Das Gesetz der kleinen Zahlen (The Law of Small Numbers) in 1898. Let $X$ be the number of annual deaths (killed by horse kicks) in a corps of the Prussian army within one year (von Bortkiewicz 1898; 10 different army corps, over the 20 years 1875–1894). Let $n$ be the number of annual deaths and $m_{n,O}$ the number of corps-years with $n$ deaths, $M = \sum_n m_{n,O} = 10 \times 20 = 200$. Then $X \sim \mathrm{Poiss}(\lambda)$, where $\hat{\lambda} = \frac{\sum_n n\, m_{n,O}}{\sum_n m_{n,O}} = 0.61$ (a weighted average: the total number of deaths divided by the number of corps-years, $M$). Question: Calculate the theoretical frequencies $m_{n,E}$.

Table: Observed and theoretical frequencies ($m_{n,O}$ and $m_{n,E}$) of corps of soldiers with $n$ annual deaths (killed by horse kicks) over 20 years
n        0     1    2    3   4   ≥ 5
m_{n,O}  109   65   22   3   1   0
m_{n,E}  109   66   20   4   1   0

Figure: Comparison of observed and expected frequencies of the numbers of annual deaths

Poisson distribution

Example (Poisson distribution; accidents in the factories)
Let $X$ be the number of accidents of a worker in munition factories in England during the First World War (Greenwood and Yule 1920), $n$ the number of accidents, $m_{n,O}$ the number of workers with $n$ accidents, $M = \sum_n m_{n,O} = 647$. Then $X \sim \mathrm{Poiss}(\lambda)$, where $\hat{\lambda} = \frac{\sum_n n\, m_{n,O}}{\sum_n m_{n,O}} = 0.47$ (a weighted average: the total number of accidents divided by the number of workers, $M$). Question: Calculate the theoretical frequencies $m_{n,E}$.

Table: Observed and theoretical frequencies ($m_{n,O}$ and $m_{n,E}$) of workers with $n$ accidents
n        0     1     2    3    4   ≥ 5
m_{n,O}  447   132   42   21   3   2
m_{n,E}  406   189   44   7    1   0

Figure: Comparison of observed and expected frequencies of the numbers of accidents
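For both Poisson examples the expected counts are $m_{n,E} = M\, e^{-\hat\lambda}\hat\lambda^n/n!$, with the last cell taken as the tail $\Pr(X \ge 5)$; a short R sketch for the horse-kick data (the analogous code with M <- 647 and lambda <- 0.47 gives the accident fit, up to rounding of $\hat\lambda$):

M <- 200; lambda <- 0.61
p <- c(dpois(0:4, lambda), ppois(4, lambda, lower.tail = FALSE))  # Pr(X >= 5) as tail
round(M * p)   # 109 66 20 4 1 0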
Negative binomial distribution

Example (negative binomial distribution; accidents in the factories)
Let $X$ be the number of accidents of a worker in munition factories in England during the First World War (Greenwood and Yule 1920), $n$ the number of accidents, $m_{n,O}$ the number of workers with $n$ accidents, $M = \sum_n m_{n,O} = 647$. Question: Calculate the theoretical frequencies $m_{n,E}$.

Table: Observed and theoretical frequencies ($m_{n,O}$ and $m_{n,E}$) of workers with $n$ accidents
n        0     1     2    3    4   ≥ 5
m_{n,O}  447   132   42   21   3   2
m_{n,E}  446   134   44   15   5   3

Figure: Comparison of observed and expected frequencies of the numbers of accidents

Zero-inflated Poisson (ZIP) distribution

Example (ZIP distribution; number of movements of a foetal lamb)
Let $X$ be the number of movements of a foetal lamb in 240 five-second periods (Leroux and Puterman 1992), $n$ the number of movements, $m_{n,O}$ the number of periods with $n$ movements. Question: Calculate the theoretical frequencies $m_{n,E}$ using the Poisson and the ZIP distribution.

Table: Observed and theoretical frequencies ($m_{n,O}$ and $m_{n,E}$) of five-second periods with $n$ movements
n                  0     1    2    3   4   5   6   7
m_{n,O}            182   41   12   2   2   0   0   1
m_{n,E} (Poisson)  168   60   11   1   0   0   0   0
m_{n,E} (ZIP)      182   37   16   4   1   0   0   0

Figure: Comparison of observed and expected frequencies – Poisson (left), ZIP (right)

Formulations of hypotheses about probability distributions

1. binomial distribution – example – number of boys: Is the distribution of the number of boys in families with 12 children binomial? Is the probability of having a boy in a family equal to 0.5?
2. multinomial distribution – example – number of individuals with certain eye and hair colour: Are the rows and columns of a contingency table independent? Are the frequencies of individuals with a certain eye colour (blue, green, brown) independent of hair colour (blond, light-brown, black, red)?
3. product-multinomial distribution: Are the vectors of frequencies the same in each row? Are the vectors of frequencies independent of the row index?
   – example – number of individuals with certain socioeconomic status, political philosophy and affiliation – Are the vectors of frequencies of individuals (D-Li, D-C, R-Li, R-C) the same for each level of socioeconomic status (high and low)?
   – example – blood groups – Is the distribution of the blood groups (0, A, B, AB) the same in Prague and Košice?
4. Poisson distribution:
   – example – killing by horse kicks – Is the distribution of the number of corps of soldiers with $n$ annual deaths (killed by horse kicks) Poisson?
   – example – accidents in the factories – Is the distribution of the number of workers having an accident Poisson?

Assignments in R

Assignment – number of boys:
1. Draw the probability mass function of the number of boys in families with 12 children.
2. What are the probabilities of having $n$ boys in a family ($n = 1, 2, \ldots, 12$)? What is the probability of having eight or more boys in the family? What is the probability of having five to seven boys in the family? (A sketch follows below.)
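A minimal sketch of this assignment, using the estimate $\hat p = 0.5192$ from the Geissler data:

p.hat <- 0.5192
n <- 0:12
pr <- dbinom(n, size = 12, prob = p.hat)       # Pr(X = n)
barplot(pr, names.arg = n,
        xlab = "number of boys", ylab = "probability")  # pmf
sum(dbinom(8:12, 12, p.hat))                   # Pr(X >= 8)
pbinom(7, 12, p.hat) - pbinom(4, 12, p.hat)    # Pr(5 <= X <= 7)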
Assignment – killing by horse kicks:
1. Draw the probability mass function of the number of corps with $n$ annual deaths (killed by horse kicks).
2. What are the probabilities of having $n$ annual deaths ($n = 0, 1, 2, 3, 4, 5+$)? What is the probability of having one or fewer annual deaths?

Assignment – accidents in the factories:
1. Draw the probability mass function of the number of workers having an accident.
2. What are the probabilities of having $n$ accidents ($n = 0, 1, 2, 3, 4, 5+$)? What is the probability of having two or more accidents?

Assignments in R

Assignment – number of boys: Calculate $\hat{p}$ (the estimated probability of having a boy in a family) and $\widehat{\mathrm{Var}}[\hat{p}]$ (the estimated variance of that probability).
Assignment – killing by horse kicks: Calculate $\hat{\lambda}$ (the estimated mean number of annual deaths) and $\widehat{\mathrm{Var}}[\hat{\lambda}]$ (the estimated variance of that mean).
Assignment – accidents in the factories: Calculate $\hat{\lambda}$ (the estimated mean number of accidents in the factories) and $\widehat{\mathrm{Var}}[\hat{\lambda}]$.

Assignments in R

Assignment – blood groups: For Prague and Košice, calculate $\hat{\mathbf{p}}$ (the probabilities of having a certain blood group in each city) and $\widehat{\mathrm{Var}}[\hat{\mathbf{p}}]$ (the covariance matrix of those probabilities).
Assignment – eye and hair colour: Calculate $\hat{\mathbf{p}}$ (the probabilities of having a certain eye and hair colour) and $\widehat{\mathrm{Var}}[\hat{\mathbf{p}}]$ (the covariance matrix of those probabilities). (A sketch for the multinomial case follows below.)
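For a multinomial sample, $\hat{\mathbf p} = \mathbf x / N$ and $\widehat{\mathrm{Var}}[\hat{\mathbf p}] = (\mathrm{diag}(\hat{\mathbf p}) - \hat{\mathbf p}\hat{\mathbf p}^T)/N$; a minimal sketch for the Košice blood groups (the same code applies to Prague or to the 12 eye-hair cells):

x <- c(138, 147, 84, 31)    # Kosice blood groups 0, A, B, AB
N <- sum(x)                 # 400
p.hat <- x / N
V.hat <- (diag(p.hat) - p.hat %*% t(p.hat)) / N   # estimated covariance matrix
round(p.hat, 4)
round(V.hat, 6)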
Types of contingency tables – multinomial distribution

1 × J contingency table of frequencies:
  outcome 1   outcome 2   ...   outcome J | sum
  x_1         x_2         ...   x_J       | N

1 × J contingency table of probabilities:
  p_1         p_2         ...   p_J       | 1

2 × J contingency table of frequencies:
  row 1: x_11   x_12   ...   x_1J | N_1
  row 2: x_21   x_22   ...   x_2J | N_2

2 × J contingency table of probabilities:
  row 1: p_11   p_12   ...   p_1J | p_1• = 1
  row 2: p_21   p_22   ...   p_2J | p_2• = 1

K × J contingency table of frequencies:
  row k: x_k1   x_k2   ...   x_kJ | N_k,   k = 1, 2, ..., K

K × J contingency table of probabilities:
  row k: p_k1   p_k2   ...   p_kJ | p_k• = 1,   k = 1, 2, ..., K

Types of contingency tables – product-multinomial distribution

1 × J contingency tables of frequencies and probabilities (≈ multinomial distribution): as above.

2 × J contingency table of frequencies (≈ multinomial distribution):
  group 1: x_11   x_12   ...   x_1J | N_1
  group 2: x_21   x_22   ...   x_2J | N_2

2 × J contingency table of probabilities:
  group 1: p_{1|1}   p_{2|1}   ...   p_{J|1} | 1
  group 2: p_{1|2}   p_{2|2}   ...   p_{J|2} | 1

K × J contingency table of frequencies (≈ multinomial distribution):
  group k: x_k1   x_k2   ...   x_kJ | N_k,   k = 1, 2, ..., K

K × J contingency table of probabilities:
  group k: p_{1|k}   p_{2|k}   ...   p_{J|k} | 1,   k = 1, 2, ..., K

Data structure for 1 × J contingency table – multinomial distribution

Each individual contributes one row of indicators (exactly one 1 among the J columns):
           outcome 1   outcome 2   ...   outcome J | sum
  x_1      1           0           ...   0         | 1
  x_2      0           1           ...   0         | 1
  ...
  x_N      1           0           ...   0         | 1
  sum = x  x_1         x_2         ...   x_J       | N

– the sum of each row is one; the sum of all row sums is N
– the sum of each column is x_j, where j = 1, 2, ..., J; the sum of all x_j, j = 1, 2, ..., J, is N
– x = n

Data structure for K × J contingency table – (product-)multinomial distribution

Within group k (k = 1, 2, ..., K), each of the N_k individuals contributes one indicator row:
             outcome 1   outcome 2   ...   outcome J | sum
  x_k1       1           0           ...   0         | 1
  x_k2       0           1           ...   0         | 1
  ...
  x_k,N_k    1           0           ...   0         | 1
  sum = x_k  x_k1        x_k2        ...   x_kJ      | N_k

– the sum of each row is one; the sum of all row sums is N_k
– the sum of each column is x_kj, where j = 1, 2, ..., J; the sum of all x_kj, j = 1, 2, ..., J, is N_k
– x_k = n_k, where k = 1, 2, ..., K
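Collapsing the individual-level indicator rows to the contingency table is a column sum, or a single call to table() when the outcomes are factor-coded; a minimal sketch with simulated data (all names illustrative):

set.seed(2)
J <- 4; N <- 20
outcome <- sample(paste0("outcome", 1:J), N, replace = TRUE,
                  prob = c(0.4, 0.3, 0.2, 0.1))
Z <- model.matrix(~ outcome - 1)   # N x J indicator matrix, one 1 per row
colSums(Z)                         # the 1 x J table of frequencies x_j
table(outcome)                     # the same table obtained directly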
(Univariate) normal distribution

Definition (normal distribution)
Random variable $X$ is normally distributed with parameters $\mu$ and $\sigma^2$, i.e. $X \sim N(\mu, \sigma^2)$, where $\boldsymbol{\theta} = (\mu, \sigma^2)^T$, if its density is defined as
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $x \in \mathbb{R}$, $\sigma > 0$.

Definition (standardised normal distribution)
Random variable $X$ is normally distributed with parameters $\mu = 0$ and $\sigma^2 = 1$, i.e. $X \sim N(0, 1)$, where $\boldsymbol{\theta} = (0, 1)^T$, if its density is defined as
$\varphi(x) = f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$, $x \in \mathbb{R}$.
The parameter $\mu$ is called the mean of $X$ and $\sigma^2$ the variance of $X$.

Bivariate normal distribution

Definition (bivariate normal distribution)
Random vector $(X, Y)^T$ is normally distributed with parameters $\boldsymbol{\mu}$ and $\Sigma$, i.e. $(X, Y)^T \sim N_2(\boldsymbol{\mu}, \Sigma)$, where $\boldsymbol{\mu} = (\mu_1, \mu_2)^T$, $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$, $\boldsymbol{\theta} = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)^T$, $(x, y)^T \in \mathbb{R}^2$, $\mu_j \in \mathbb{R}$, $\sigma_j^2 > 0$, $j = 1, 2$, $\rho \in (-1, 1)$; the density is defined as
$f(x, y) = \frac{1}{A} \exp\left\{ -\frac{1}{B} \left[ \frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2} \right] \right\}$,
where $A = 2\pi\sqrt{\sigma_1^2\sigma_2^2(1 - \rho^2)}$ and $B = 2(1 - \rho^2)$.

Standardised bivariate normal distribution

Definition (bivariate standardised normal distribution)
Random vector $(X, Y)^T$ is normally distributed with parameters $\boldsymbol{\mu}$ and $\Sigma$, i.e. $(X, Y)^T \sim N_2(\boldsymbol{\mu}, \Sigma)$, where $\boldsymbol{\mu} = (0, 0)^T$, $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, $\boldsymbol{\theta} = (0, 0, 1, 1, \rho)^T$, $(x, y)^T \in \mathbb{R}^2$, $\rho \in (-1, 1)$; the density is defined as
$f(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{ -\frac{x^2 - 2\rho xy + y^2}{2(1 - \rho^2)} \right\}$.

Standardised bivariate and multivariate normal distribution

Let $x = x_1$, $y = x_2$ and $\mathbf{x} = (x_1, x_2)^T$. Then the density of the standardised bivariate normal distribution can be rewritten in matrix form:
$f(\mathbf{x}) = \frac{1}{2\pi(\det\Sigma)^{1/2}} \exp\left\{ -\frac{1}{2}\, \mathbf{x}^T \Sigma^{-1} \mathbf{x} \right\}$.
Let $(X_1, X_2, \ldots, X_k)^T \sim N_k(\boldsymbol{\mu}, \Sigma)$ and let $\mathbf{x}$ be a $k$-dimensional vector; then the density is equal to
$f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2}(\det\Sigma)^{1/2}} \exp\left\{ -\frac{1}{2}\, (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$.
Marginal distributions: for the multivariate normal distribution, $X_j \sim N(\mu_j, \sigma_j^2)$, $j = 1, 2, \ldots, k$; for the standardised case, $X_j \sim N(0, 1)$, $j = 1, 2, \ldots, k$.

Bivariate normal distribution – simulation

Simulation of pseudo-random numbers from the bivariate normal distribution:
1. let $X_1 \sim N(0, 1)$ and $X_2 \sim N(0, 1)$ be independent
2. then $(Y_1, Y_2)^T \sim N_2(\boldsymbol{\mu}, \Sigma)$, where $Y_1 = \sigma_1 X_1 + \mu_1$ and $Y_2 = \sigma_2(\rho X_1 + \sqrt{1 - \rho^2}\, X_2) + \mu_2$

Example
Simulate pseudo-random numbers from the bivariate normal distribution, where $\boldsymbol{\theta} = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)^T$:
(a) $\mu_1 = 0$, $\mu_2 = 0$, $\sigma_1 = 1$, $\sigma_2 = 1$, $\rho = 0$; (1) $n = 50$ and (2) $n = 1000$;
(b) $\mu_1 = 0$, $\mu_2 = 0$, $\sigma_1 = 1$, $\sigma_2 = 1$, $\rho = 0.5$; (1) $n = 50$ and (2) $n = 1000$;
(c) $\mu_1 = 0$, $\mu_2 = 0$, $\sigma_1 = 1$, $\sigma_2 = 1.2$, $\rho = 0.5$; (1) $n = 50$ and (2) $n = 1000$.
(A sketch follows below.)

Figure: Joint density of three different bivariate normal distributions (column by column); contour plots superimposed by image plots (first row), 3D surface plots (second row); simulation study

Figure: Joint density of three different bivariate normal distributions (column by column); $n = 50$ (first row), $n = 1000$ (second row); contour plots superimposed by image plots; simulation study

Mixture of two univariate and bivariate normal distributions

The mixture of two univariate normal distributions is defined as follows:
$pN(\mu_1, \sigma_1^2) + (1 - p)N(\mu_2, \sigma_2^2)$, where $\boldsymbol{\theta} = (p, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)^T$.
The mixture of two bivariate normal distributions is defined as follows:
$pN_2(\boldsymbol{\mu}_1, \Sigma_1) + (1 - p)N_2(\boldsymbol{\mu}_2, \Sigma_2)$, where $\boldsymbol{\theta} = (p, \mu_{11}, \mu_{12}, \sigma_{11}^2, \sigma_{12}^2, \rho_1, \mu_{21}, \mu_{22}, \sigma_{21}^2, \sigma_{22}^2, \rho_2)^T$.
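A minimal sketch of the simulation algorithm above, for case (b) of the example (mvtnorm::rmvnorm would give the same result directly):

set.seed(3)
n <- 1000
mu1 <- 0; mu2 <- 0; s1 <- 1; s2 <- 1; rho <- 0.5   # case (b)
x1 <- rnorm(n); x2 <- rnorm(n)                     # independent N(0, 1)
y1 <- s1 * x1 + mu1
y2 <- s2 * (rho * x1 + sqrt(1 - rho^2) * x2) + mu2
cor(y1, y2)                       # close to rho = 0.5
plot(y1, y2, pch = 19, cex = 0.4)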
Different normal models – skewed, mesokurtic, platykurtic and leptokurtic

Figure: Densities of different normal and skewed normal distributions (first row; skewed normal indicated as "sN", e.g. N(0,1), t(0,1,df=10), sN(0,1), st(0,1,df=10)), densities of different bivariate skewed normal distributions (second row)

The mixture of two bivariate normal distributions

Figure: Joint density of a bivariate normal distribution (left), density of the mixture of two bivariate normal distributions (middle), bivariate kernel density estimate superimposed by the density of the mixture of two bivariate normal distributions (right); simulation study (contour plots superimposed by image plots)

Mixture of two univariate normal distributions

To express the binormal distribution formally, let $B_i$ be (unobserved) iid Bernoulli($p$) random variables, $p \in (0, 1)$. If $B_i = 1$, then $X_i$ is observed from the $N(\mu_1, \sigma_1^2)$ distribution; otherwise it is observed from $N(\mu_2, \sigma_2^2)$. Thus, the distribution of $X_i$ given $B_i$ is
$X_i \mid (B_i = b_i) \sim \begin{cases} N(\mu_1, \sigma_1^2), & \text{if } b_i = 1, \\ N(\mu_2, \sigma_2^2), & \text{if } b_i = 0. \end{cases}$
The joint density of $(X_i, B_i)$ is therefore given by
$f(x_i, b_i, \boldsymbol{\theta}) = f(x_i \mid b_i, \boldsymbol{\theta}) \Pr(B_i = b_i, p) = \begin{cases} \frac{p}{\sqrt{2\pi}\sigma_1} \exp\left(-\frac{(x_i - \mu_1)^2}{2\sigma_1^2}\right), & \text{if } b_i = 1, \\ \frac{1-p}{\sqrt{2\pi}\sigma_2} \exp\left(-\frac{(x_i - \mu_2)^2}{2\sigma_2^2}\right), & \text{if } b_i = 0, \end{cases}$
where $\boldsymbol{\theta} = (p, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)^T$, from which the marginal density of $X_i$ is obtained as
$f(x_i, \boldsymbol{\theta}) = \sum_{b_i \in \{0, 1\}} f(x_i, b_i, \boldsymbol{\theta}) = f(x_i, 0, \boldsymbol{\theta}) + f(x_i, 1, \boldsymbol{\theta})$.
The binormal density function is a linear combination of the density functions of the $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ distributions.

Figure: Mixture of two normal densities – data faithful (histogram of waiting time in minutes, in absolute scale and with the fitted mixture density)
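A minimal sketch of the latent-Bernoulli construction and of the resulting mixture density; the parameter values are illustrative, not the fitted values for the faithful data:

dmix <- function(x, p, mu1, s1, mu2, s2)
  p * dnorm(x, mu1, s1) + (1 - p) * dnorm(x, mu2, s2)

set.seed(4)
n <- 500
p <- 0.35; mu1 <- 55; s1 <- 6; mu2 <- 80; s2 <- 6   # illustrative values
b <- rbinom(n, 1, p)                                # latent Bernoulli B_i
x <- ifelse(b == 1, rnorm(n, mu1, s1), rnorm(n, mu2, s2))
hist(x, freq = FALSE, breaks = 30)
curve(dmix(x, p, mu1, s1, mu2, s2), add = TRUE)     # marginal (mixture) density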
Binomial distribution

Jacob Bernoulli (1655–1705) – one of the founding fathers of probability theory.

Definition (binomial distribution)
Let $N$ be the number of independent identical (random) Bernoulli trials $X_i$, where $X_i = 1$ is a success (the event occurred) and $X_i = 0$ is a failure (the event did not occur), $i = 1, 2, \ldots, N$. Then the probability of success is $\Pr(X_i = 1) = p$ and the probability of failure is $\Pr(X_i = 0) = 1 - p$. The number of successes is $X = \sum_{i=1}^{N} X_i$. The probability that the random variable $X$ equals $x = n$ (the realisation) is defined as
$\Pr(X = x) = \binom{N}{x} p^x (1 - p)^{N-x}$, for $x = 0, 1, 2, \ldots, N$.
The expected value of $X$ is defined as $E[X] = \sum_{x=0}^{N} x \Pr(X = x) = \sum_{x=0}^{N} x \binom{N}{x} p^x (1 - p)^{N-x} = Np$.
The variance of $X$ is defined as $\mathrm{Var}[X] = \sum_{x=0}^{N} (x - E[X])^2 \Pr(X = x) = \sum_{x=0}^{N} (x - Np)^2 \binom{N}{x} p^x (1 - p)^{N-x} = Np(1 - p)$.

Binomial distribution

Reading: Random variable $X$ is binomially distributed with parameters $N$ and $p$, where $\theta = p$. Notation: $X \sim \mathrm{Bin}(N, p)$, $\theta = p$.
Do we need to change it? YES. Why? Due to generalisation. Equivalently, $\mathbf{X} \sim \mathrm{Bin}(N, p, 1 - p)$, where $\mathbf{X} = (X_1, X_2)^T$, $\boldsymbol{\theta} = (p, 1 - p)^T$, $X_1$ is the number of successes, $X_2 = N - X_1$ is the number of failures, $X_1 \sim \mathrm{Bin}(N, p)$ and $X_2 \sim \mathrm{Bin}(N, 1 - p)$. Then $E[X_1] = Np$, $E[X_2] = N(1 - p)$, $\mathrm{Var}[X_2] = Np(1 - p) = \mathrm{Var}[X_1]$, $\mathrm{Cov}[X_1, X_2] = -Np(1 - p)$, and $\mathrm{Cor}[X_1, X_2] = -1$ independently of $p$. Finally, $\mathbf{n} = (n_1, n_2)^T$ and $\mathbf{p} = (p_1, p_2)^T$, $p_1 = p$ and $p_2 = 1 - p$. Then $\boldsymbol{\theta} = \mathbf{p}$.

Random sampling from a population of size Npop

If each selection from a population of size $N_{pop}$ is returned to the population, i.e. the sampling is with replacement, then, for each selection, the probability of selecting an individual with a given characteristic is $p = M/N_{pop}$, where $M$ is the number of individuals with the given characteristic ($M$ means "marked"), and the proportion can now be treated as a probability. Since the selections or "trials" are mutually independent and the number of trials $N$ is fixed, the number of outcomes $X$ having the given characteristic in the sample has a binomial distribution, denoted by $\mathrm{Bin}(N, p)$.

Binomial distribution

Definition (binomial distribution)
If a random sample of size $N$ is taken from a population of size $N_{pop}$ with replacement and $X$ is the number of individuals with a given characteristic in the sample, then $X$ has a binomial distribution with probability mass function defined as
$\Pr(X = x) = \binom{N}{x} p^x (1 - p)^{N-x}$, where $x = 0, 1, 2, \ldots, N$.
The expected value of $X$ is $E[X] = Np$ and the variance of $X$ is $\mathrm{Var}[X] = Np(1 - p)$.

Random sampling from a population of size Npop

If we remove an individual chosen at random from the population of size $N_{pop}$ and choose a second individual at random from the remainder, then the probability of getting an individual with the given characteristic ($M$ means "marked") is $(M - 1)/(N_{pop} - 1)$ if the first individual had this characteristic and $M/(N_{pop} - 1)$ if it did not. This is called sampling without replacement, and the probability of choosing an individual with the given characteristic changes with each selection. The number of outcomes $X$ having the given characteristic then has a hypergeometric distribution, denoted by $\mathrm{HypGeom}(N, p)$.

Hypergeometric distribution

Definition (hypergeometric distribution)
If a random sample of size $N$ is taken from a population of size $N_{pop}$ without replacement and $X$ is the number of individuals with a given characteristic in the sample, then $X$ has a hypergeometric distribution with probability mass function defined as
$\Pr(X = x) = \binom{M}{x}\binom{N_{pop} - M}{N - x} \Big/ \binom{N_{pop}}{N}$,
where $\max\{N + M - N_{pop}, 0\} \le x \le \min\{M, N\}$, but we usually have $x = 0, 1, 2, \ldots, N$. The expected value of $X$ is $E[X] = Np$. The variance of $X$ is $\mathrm{Var}[X] = Np(1 - p)\,r$, where $r = \frac{N_{pop} - N}{N_{pop} - 1} = 1 - \frac{N - 1}{N_{pop} - 1} > 1 - f_s$, and $f_s = N/N_{pop}$ is the sampling fraction. ($f_s$ can generally be neglected if $f_s < 0.1$, or preferably $f_s < 0.05$, and we can then set $r = 1$.)
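A short R check that the binomial approximates the hypergeometric when the sampling fraction is small (population values illustrative):

Npop <- 10000; M <- 3000; N <- 50    # f_s = 0.005, p = 0.3
p <- M / Npop
x <- 0:N
hyper <- dhyper(x, m = M, n = Npop - M, k = N)   # sampling without replacement
binom <- dbinom(x, size = N, prob = p)           # sampling with replacement
max(abs(hyper - binom))                          # small: the approximation is good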
Hypergeometric distribution

We see then that if $f_s$ can be ignored, we can approximate sampling without replacement by sampling with replacement, and approximate the hypergeometric distribution by the binomial distribution.

Multinomial distribution

Definition (multinomial distribution)
Let $N$ be the number of independent identical (random) trials, in each of which one of $J \ge 2$ distinct possible outcomes occurs, where $X_{ji} = 1$ is a success (outcome $j$ occurred) and $X_{ji} = 0$ is a failure (outcome $j$ did not occur), $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, J$. The numbers of successes are $X_j = \sum_{i=1}^{N} X_{ji}$, with $N = \sum_{j=1}^{J} X_j$. Then the probability of success of the $j$-th outcome in the $i$-th trial is $\Pr(X_{ji} = 1) = p_j$ (the cell probabilities) and the probability of failure of the $j$-th outcome is $\Pr(X_{ji} = 0) = 1 - p_j$. Let $\mathbf{X} = (X_1, X_2, \ldots, X_J)^T$. The probability that the random variables $X_j$ equal $x_j = n_j$ is defined as
$\Pr(X_1 = x_1, \ldots, X_J = x_J) = \frac{N!}{\prod_j x_j!} \prod_{j=1}^{J} p_j^{x_j}$.

Multinomial distribution

The expected value of $\mathbf{X}$ is the vector defined as $E[\mathbf{X}] = N\mathbf{p}$. The covariance matrix of $\mathbf{X}$ is defined as
$\mathrm{Var}[\mathbf{X}] = N\left(\mathrm{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^T\right)$, where $(\mathrm{Var}[\mathbf{X}])_{ij} = \begin{cases} Np_j(1 - p_j) & \text{if } i = j, \\ -Np_ip_j & \text{if } i \ne j. \end{cases}$
The marginal distributions are binomial, i.e. $X_j \sim \mathrm{Bin}(N, p_j)$. Then
$E[X_j] = Np_j$, $\mathrm{Var}[X_j] = Np_j(1 - p_j)$,
$\mathrm{Cov}[X_i, X_j] = -Np_ip_j$, $\mathrm{Cor}[X_i, X_j] = -p_ip_j \big/ \sqrt{p_i(1 - p_i)\,p_j(1 - p_j)}$.

Multinomial distribution

Reading: Random vector $\mathbf{X}$ is multinomially distributed with parameters $N$ and $\mathbf{p}$, where $\boldsymbol{\theta} = \mathbf{p}$. Notation: $\mathbf{X} \sim \mathrm{Mult}_J(N, \mathbf{p})$. If $J = 2$, then $\mathrm{Bin}(N, p) \approx \mathrm{Mult}_2(N, \mathbf{p})$. The realisation of one trial, $\mathbf{x}_{ji}$, could be $(1, 0, \ldots, 0)^T$ or $(0, 1, \ldots, 0)^T$, etc.

Example (number of individuals with certain blood type)
The numbers of individuals $\mathbf{X} = (X_1, X_2, X_3, X_4)^T$ with the four blood groups are multinomially distributed following the Hardy-Weinberg equilibrium, i.e. $\mathbf{X} \sim \mathrm{Mult}_4(N, \mathbf{p})$, where $N = 500$ (Katina et al. 2015). Calculate the theoretical frequencies $n_{j,E}$.

Table: Observed and theoretical frequencies of the blood groups
attributes (groups)   0     A     B    AB
n_{j,O}               209   184   81   26
n_{j,E}               210   183   80   27
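The moment formulas can be verified by simulation; a small sketch (the probabilities anticipate the example below):

set.seed(5)
N <- 500
p <- c(0.12, 0.12, 0.12, 0.04, 0.18, 0.18, 0.18, 0.06)
X <- rmultinom(10000, size = N, prob = p)   # J x 10000 matrix of samples
rowMeans(X)                  # close to N * p
cov(t(X))[1, 4]              # close to -N * p[1] * p[4] = -2.4
N * (diag(p) - p %*% t(p))   # theoretical covariance matrix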
Multinomial distribution

Example (number of individuals with certain socioeconomic status, political philosophy and political affiliation)
The numbers of individuals $X_1, \ldots, X_8$ classified by socioeconomic status, political philosophy and political affiliation are multinomially distributed, i.e. $\mathbf{X} = (X_1, \ldots, X_8)^T \sim \mathrm{Mult}_8(N, \mathbf{p})$, where $\mathbf{p} = (p_1, p_2, \ldots, p_8)^T$ and $N = 500$ (Christensen 1990, modified). Calculate (a) $\mathrm{Var}[X_1]$, (b) $\mathrm{Var}[X_4]$, (c) $\mathrm{Cov}[X_1, X_4]$ and (d) $\mathrm{Cor}[X_1, X_4]$.

Table: 2 × 4 contingency table of probabilities $p_j$
       D-C    D-Li   R-C    R-Li   total
H      0.12   0.12   0.12   0.04   0.4
Lo     0.18   0.18   0.18   0.06   0.6
total  0.30   0.30   0.30   0.10   1.0

Notation: (1) socioeconomic status (high – H, low – Lo), (2) political philosophy (democrat – D, republican – R) and (3) political affiliation (conservative – C, liberal – Li). Then $X_1$ (H-D-C), $X_2$ (H-D-Li), $X_3$ (H-R-C), $X_4$ (H-R-Li), $X_5$ (Lo-D-C), $X_6$ (Lo-D-Li), $X_7$ (Lo-R-C) and $X_8$ (Lo-R-Li).

Solution:
$\mathrm{Var}[X_1] = 500 \times 0.12 \times (1 - 0.12) = 52.8$
$\mathrm{Var}[X_4] = 500 \times 0.04 \times (1 - 0.04) = 19.2$
$\mathrm{Cov}[X_1, X_4] = -500 \times 0.12 \times 0.04 = -2.4$
$\mathrm{Cor}[X_1, X_4] = -2.4/\sqrt{52.8 \times 19.2} = -0.075$

What are the expected frequencies?

Table: 2 × 4 contingency table of expected frequencies $Np_j$
     D-C   D-Li   R-C   R-Li
H    60    60     60    20
Lo   90    90     90    30

Multi-hypergeometric distribution

Definition (multi-hypergeometric distribution)
Suppose we have $k$ subpopulations of sizes $M_j$, where $j = 1, 2, \ldots, k$, and $\sum_{j=1}^{k} M_j = N_{pop}$, the total population size. Let $p_j = M_j/N_{pop}$. A simple random sample of size $N$ is taken from the population, yielding $X_j$ from the $j$-th subpopulation. $\mathbf{X} = (X_1, X_2, \ldots, X_k)^T$ has the multi-hypergeometric distribution. The joint probability mass function of $\mathbf{X}$ is defined as
$f(\mathbf{x}) = \Pr(\mathbf{X} = \mathbf{x}) = \prod_{j=1}^{k} \binom{M_j}{x_j} \Big/ \binom{N_{pop}}{N}$,
where $0 \le x_j \le \min\{M_j, N\}$ and $N = \sum_{j=1}^{k} x_j$.

Multi-hypergeometric distribution

Since we can add the subpopulations together, we see that the marginal distribution of $X_j$ is also hypergeometric, with the two subpopulations $M_j$ and $N_{pop} - M_j$, namely
$f_j(x_j) = \binom{M_j}{x_j}\binom{N_{pop} - M_j}{N - x_j} \Big/ \binom{N_{pop}}{N}$.
In a similar fashion we see that the probability function of $X_1 + X_2$ is again hypergeometric, namely
$f_{12}(x_1 + x_2) = \binom{M_1 + M_2}{x_1 + x_2}\binom{N_{pop} - M_1 - M_2}{N - x_1 - x_2} \Big/ \binom{N_{pop}}{N}$.
Additionally, $\mathrm{Var}[X_j] = Np_j(1 - p_j)\,r$, where $r = (N_{pop} - N)/(N_{pop} - 1)$, and $\mathrm{Var}[X_1 + X_2] = Nr(p_1 + p_2)(1 - p_1 - p_2)$. Finally, the covariance of $X_1$ and $X_2$ is equal to
$\mathrm{Cov}[X_1, X_2] = \frac{1}{2}\left(\mathrm{Var}[X_1 + X_2] - \mathrm{Var}[X_1] - \mathrm{Var}[X_2]\right) = -rNp_1p_2$.
We then find that, with $q_j = 1 - p_j$,
$\mathrm{Var}[X_1 - X_2] = \mathrm{Var}[X_1] + \mathrm{Var}[X_2] - 2\mathrm{Cov}[X_1, X_2] = rN\left[p_1q_1 + p_2q_2 + 2p_1p_2\right] = rN\left[p_1 + p_2 - (p_1 - p_2)^2\right]$.
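The joint pmf from the definition above can be coded directly; a minimal sketch (subpopulation sizes illustrative), with the marginal checked against dhyper():

dmhyper <- function(x, M) {
  # joint pmf of the multi-hypergeometric distribution, N = sum(x)
  prod(choose(M, x)) / choose(sum(M), sum(x))
}
M <- c(50, 30, 20)        # subpopulation sizes, Npop = 100
dmhyper(c(5, 3, 2), M)    # a sample of size N = 10

# the marginal of X_1 agrees with the (univariate) hypergeometric pmf
marg_x1 <- function(x1) sum(sapply(0:(10 - x1), function(x2)
  dmhyper(c(x1, x2, 10 - x1 - x2), M)))
all.equal(marg_x1(5), dhyper(5, m = 50, n = 50, k = 10))   # TRUE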
Multi-hypergeometric distribution and two dependent proportions

Suppose we have a population of $N_{pop}$ people and a sample of size $N$ is chosen at random without replacement. Each selected person is asked two questions, to each of which they answer yes (1) or no (2), so that $p_{12}$ is the proportion answering yes to the first question and no to the second, $p_{11}$ is the proportion answering yes to both questions, and so forth. Then the proportion answering yes to the first question is $p_1 = p_{11} + p_{12}$ and the proportion answering yes to the second question is $p_2 = p_{11} + p_{21}$. Let $X_{ij}$ ($i, j = 1, 2$) be the number observed in the sample in the category with probability $p_{ij}$, let $X_1 = X_{11} + X_{12}$ be the number answering yes to the first question, and let $X_2 = X_{11} + X_{21}$ be the number answering yes to the second question. The interest is to compare $p_1$ and $p_2$, but $p_{12}$ is often ignored (and $p_{21}$ as well).

The four variables $X_{ij}$ have a multi-hypergeometric distribution, and
$\frac{X_1}{N} - \frac{X_2}{N} = \frac{X_1 - X_2}{N} = \frac{X_{12} - X_{21}}{N} = \frac{X_{12}}{N} - \frac{X_{21}}{N}$,
$E\left[\frac{X_1}{N} - \frac{X_2}{N}\right] = p_1 - p_2 = p_{12} - p_{21}$.
Finally, with $q_{ij} = 1 - p_{ij}$ and using $p_{12} - p_{21} = p_1 - p_2$,
$\mathrm{Var}\left[\frac{X_1}{N} - \frac{X_2}{N}\right] = \frac{1}{N^2}\mathrm{Var}[X_{12} - X_{21}] = \frac{1}{N^2}\left(\mathrm{Var}[X_{12}] + \mathrm{Var}[X_{21}] - 2\mathrm{Cov}[X_{12}, X_{21}]\right) = r\frac{1}{N}\left[p_{12}q_{12} + p_{21}q_{21} + 2p_{12}p_{21}\right] = r\frac{1}{N}\left[p_{12} + p_{21} - (p_1 - p_2)^2\right]$.

Multi-hypergeometric distribution vs multinomial distribution

If we can approximate sampling without replacement by sampling with replacement, we can set $r = 1$ above, and the multi-hypergeometric distribution can be replaced by the multinomial distribution. The multinomial distribution also arises when we have $N$ fixed Bernoulli-type trials but with $k$ possible outcomes rather than just two, as with the binomial distribution.

Product-multinomial distribution

Definition (product-multinomial distribution)
Let $N_k$ be the number of independent identical (random) trials in group $k$, in each of which one of $J \ge 2$ distinct possible outcomes occurs, where $X_{kji} = 1$ is a success (the event occurred) and $X_{kji} = 0$ is a failure (the event did not occur), $i = 1, 2, \ldots, N_k$, $k = 1, 2, \ldots, K$, $j = 1, 2, \ldots, J$. The numbers of successes are $X_{kj} = \sum_{i=1}^{N_k} X_{kji}$, with $\sum_{k=1}^{K} N_k = N$. Then the probability of success of the $kj$-th outcome in the $i$-th trial is $\Pr(X_{kji} = 1) = p_{kj}$ (the cell probabilities) and the probability of failure of the $kj$-th outcome in the $i$-th trial is $\Pr(X_{kji} = 0) = 1 - p_{kj}$. Let $\mathbf{X}_k = (X_{k1}, X_{k2}, \ldots, X_{kJ})^T$ be multinomially distributed with parameters $N_k$ and $\mathbf{p}_k$, i.e. $\mathbf{X}_k \sim \mathrm{Mult}_J(N_k, \mathbf{p}_k)$, where $\boldsymbol{\theta}_k = \mathbf{p}_k$ and $\mathbf{p}_k = (p_{k1}, p_{k2}, \ldots, p_{kJ})^T$. Let the realisations of $\mathbf{X}_k$ be $\mathbf{x}_k$; then $x_{kj} = n_{kj}$ and $\mathbf{n}_k = (n_{k1}, n_{k2}, \ldots, n_{kJ})^T$. Additionally, the $\mathbf{X}_k$ are independent.

Product-multinomial distribution

The probability that the random variables $X_{kj}$ equal $x_{kj} = n_{kj}$ (for all $j$ and $k$) is defined as
$\Pr(X_{kj} = x_{kj}, \forall k, j) = \prod_{k=1}^{K} \Pr(X_{kj} = x_{kj}, \forall j)$.
The probability that the random variables $X_{kj}$ equal $x_{kj} = n_{kj}$ (for all $j$) is defined as
$\Pr(X_{kj} = x_{kj}, \forall j) = \left(N_k! \Big/ \prod_{j=1}^{J} x_{kj}!\right) \prod_{j=1}^{J} p_{kj}^{x_{kj}}$.
Then
$\Pr(X_{kj} = x_{kj}, \forall k, j) = \prod_{k=1}^{K} \left[\left(N_k! \Big/ \prod_{j=1}^{J} x_{kj}!\right) \prod_{j=1}^{J} p_{kj}^{x_{kj}}\right]$.

Product-multinomial distribution

Reading: The random matrix $\mathbf{X}$ is product-multinomially distributed with parameters $\mathbf{N} = (N_1, N_2, \ldots, N_K)^T$ and $\mathbf{p}$ with rows $\mathbf{p}_k$, where $\boldsymbol{\theta}_k = \mathbf{p}_k$, $k = 1, 2, \ldots, K$. Notation: $\mathbf{X} \sim \mathrm{ProdMult}_K(\mathbf{N}, \mathbf{p})$. If $K = 1$, then $\mathrm{Mult}_J(N, \mathbf{p}) \approx \mathrm{ProdMult}_1(N, \mathbf{p})$. The realisation of one trial, $\mathbf{x}_{kij}$, could be $(1, 0, \ldots, 0)^T$ or $(0, 1, \ldots, 0)^T$, etc. The expected frequencies are equal to $N_kp_{kj}$; within each $\mathbf{X}_k$, the variances $\mathrm{Var}[X_{kj}]$, covariances $\mathrm{Cov}[X_{kj}, X_{ki}]$ and correlations $\mathrm{Cor}[X_{kj}, X_{ki}]$ are calculated as for the multinomial distribution; between the $\mathbf{X}_k$ (e.g. $\mathrm{Cov}[\mathbf{X}_1, \mathbf{X}_2]$), they are zero due to the independence of the $\mathbf{X}_k$.
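Because the groups are independent multinomials, the joint pmf is a product of dmultinom() values; a short sketch using the frequencies and probabilities of the example that follows:

x1 <- c(60, 60, 60, 20); p1 <- c(0.3, 0.3, 0.3, 0.1)   # group H,  N_1 = 200
x2 <- c(90, 90, 90, 30); p2 <- c(0.3, 0.3, 0.3, 0.1)   # group Lo, N_2 = 300
dmultinom(x1, prob = p1) * dmultinom(x2, prob = p2)    # product-multinomial pmf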
Product-multinomial distribution

Example (number of individuals with certain socioeconomic status, political philosophy and political affiliation)
The numbers of individuals $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2)^T$ classified by socioeconomic status, political philosophy and political affiliation are product-multinomially distributed, i.e. $\mathbf{X} \sim \mathrm{ProdMult}_2(\mathbf{N}, \mathbf{p})$, where $\mathbf{X}_1 = (X_{11}, X_{12}, X_{13}, X_{14})^T$ are the numbers of individuals with high socioeconomic status, $\mathbf{X}_2 = (X_{21}, X_{22}, X_{23}, X_{24})^T$ the numbers of individuals with low socioeconomic status, $\mathbf{p}_k = (p_{1|k}, p_{2|k}, \ldots, p_{J|k})^T$, $p_{kj} = p_{j|k} = n_{kj}/N_k$, $\mathbf{N} = (N_1, N_2)^T$, $N_1 = 200$, $N_2 = 300$ (Christensen 1990, modified). Calculate (a) the probabilities $p_{j|k}$, (b) the expected frequencies, (c) $\mathrm{Var}[X_{4|1}]$, (d) $\mathrm{Cov}[X_{1|2}, X_{4|2}]$ and (e) $\mathrm{Cov}[X_{1|1}, X_{4|2}]$.

Notation: (1) socioeconomic status (high – H, low – Lo), (2) political philosophy (democrat – D, republican – R) and (3) political affiliation (conservative – C, liberal – Li). Then $X_1$ (H-D-C), $X_2$ (H-D-Li), $X_3$ (H-R-C), $X_4$ (H-R-Li), $X_5$ (Lo-D-C), $X_6$ (Lo-D-Li), $X_7$ (Lo-R-C) and $X_8$ (Lo-R-Li).

Solution:

Table: 2 × 4 contingency table of probabilities $p_{j|k}$
     D-C   D-Li   R-C   R-Li   total
H    0.3   0.3    0.3   0.1    1.0
Lo   0.3   0.3    0.3   0.1    1.0

Table: 2 × 4 contingency table of frequencies $n_{kj}$
     D-C   D-Li   R-C   R-Li   total
H    60    60     60    20     200
Lo   90    90     90    30     300

$\mathrm{Var}[X_{4|1}] = 200 \times 0.1 \times (1 - 0.1) = 18$
$\mathrm{Cov}[X_{1|2}, X_{4|2}] = -300 \times 0.3 \times 0.1 = -9$
$\mathrm{Cov}[X_{1|1}, X_{4|2}] = 0$, due to the independence of $\mathbf{X}_1$ and $\mathbf{X}_2$.

Poisson distribution

Definition (Poisson distribution)
Let $X$ be a random variable characterised by the Poisson distribution, i.e. $X \sim \mathrm{Poiss}(\lambda)$, where $\theta = \lambda$. Then
$\Pr(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, $x = 0, 1, \ldots$,
where $x = n$ is the realisation of $X$. Then $E[X] = \lambda$ and $\mathrm{Var}[X] = \lambda$.
The binomial distribution can be approximated by the Poisson distribution if $N \to \infty$, $p \to 0$ and $\lambda_N = Np \to \lambda$, where $X \sim \mathrm{Poiss}(\lambda)$. The Poisson distribution function is related to the $\chi^2$ distribution by $\Pr(X \le y) = \Pr(\chi^2_{2(1+y)} > 2\lambda)$, where $X \sim \mathrm{Poiss}(\lambda)$.

Poisson distribution

Example (Poisson distribution; number of car accidents per week)
With 50 million people driving a car independently in Italy next week and a probability of a road traffic death of 0.000002 (the death rate), the number of deaths $X$ is distributed binomially, i.e. $\mathrm{Bin}(50\,\text{mil}, 0.000002)$, or approximately $\mathrm{Poiss}(50\,\text{mil} \times 0.000002) = \mathrm{Poiss}(100)$.

Example (Poisson distribution; three types of accidents)
Let $n_1$ be the number of car crash deaths, $n_2$ the number of airplane crash deaths and $n_3$ the number of train crash deaths in Italy next week. Then the Poisson model with parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$ for independent Poisson random variables $X_1$, $X_2$ and $X_3$ gives $X_1 + X_2 + X_3 \sim \mathrm{Poiss}(\lambda_1 + \lambda_2 + \lambda_3)$. Generalising this example, we get $X_1 + X_2 + \ldots + X_J \sim \mathrm{Poiss}(\lambda_1 + \lambda_2 + \ldots + \lambda_J)$.

Poisson distribution

The multinomial distribution is connected to the Poisson distribution by conditioning:
$(X_1, X_2, \ldots, X_J) \mid N \sim \mathrm{Mult}_J(N, p_1, p_2, \ldots, p_J)$,
where $N = \sum_j X_j$ and $p_j = \lambda_j / \sum_j \lambda_j$, $j = 1, 2, \ldots, J$. If $X_j$, $j = 1, 2, \ldots, J$, are independent, $X_j \sim \mathrm{Poiss}(\lambda_j)$ with $E[X_j] = \lambda_j$, then the conditional probability that all $X_j = x_j$, fixing (conditioning on) $N = \sum_j X_j$, is equal to
$\Pr\left(\mathbf{X} = \mathbf{x} \,\Big|\, \sum_j X_j = N\right) = \frac{\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_J = x_J)}{\Pr(\sum_j X_j = N)} = \frac{\prod_j \frac{\lambda_j^{x_j} e^{-\lambda_j}}{x_j!}}{\frac{\lambda^N e^{-\lambda}}{N!}} = \frac{N!}{\prod_j x_j!} \prod_j \left(\frac{\lambda_j}{\lambda}\right)^{x_j}$,
where $\lambda = \sum_j \lambda_j$ and $p_j = \lambda_j/\lambda$.
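A numerical check of this conditioning identity; a small sketch with illustrative rates:

lambda <- c(2, 3, 5)
x <- c(1, 4, 5); N <- sum(x)
lhs <- prod(dpois(x, lambda)) / dpois(N, sum(lambda))  # Pr(X = x | sum = N)
rhs <- dmultinom(x, prob = lambda / sum(lambda))       # multinomial pmf
all.equal(lhs, rhs)   # TRUE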
Cumulative distribution function and density

Definition (cumulative distribution function)
Let $X$ be a random variable. The cumulative distribution function of $X$ is defined as $F_X(x) = \Pr(X \le x)$ for all $x \in \mathbb{R}$, where $\mathbb{R}$ is called the domain and $[0, 1]$ the counterdomain.

Properties of the cumulative distribution function:
1. $F_X(-\infty) = \lim_{x \to -\infty} F_X(x) = 0$ and $F_X(\infty) = \lim_{x \to \infty} F_X(x) = 1$.
2. $F_X(x)$ is a monotone, nondecreasing function, i.e. $F_X(a) \le F_X(b)$ for $a < b$.
3. $F_X(x)$ is right continuous in each argument, i.e. $\lim_{0 < h \to 0} F_X(x + h) = F_X(x)$.

Conditional discrete cumulative distribution function

Definition (conditional discrete cumulative distribution function)
Let $X$ and $Y$ be jointly discrete random variables. The discrete cumulative distribution function of $Y$ given $X = x$ is defined to be $F_{Y|X}(y|x) = \Pr[Y \le y \mid X = x]$ for all $x$ with $f_X(x) > 0$.
Remark: $F_{Y|X}(y|x) = \sum_{j:\, y_j \le y} f_{Y|X}(y_j|x)$.

Conditional continuous density function and cumulative distribution function

Definition (conditional continuous density function)
Let $X$ and $Y$ be jointly continuous random variables with joint continuous density function $f_{XY}(x, y)$. The conditional continuous density function of $Y$ given $X = x$ is defined as
$f_{Y|X}(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}$, if $f_X(x) > 0$.

Definition (conditional continuous cumulative distribution function)
Let $X$ and $Y$ be jointly continuous random variables. The conditional continuous cumulative distribution function of $Y$ given $X = x$ is defined as $F_{Y|X}(y|x) = \Pr[Y \le y \mid X = x]$ for all $x$ with $f_X(x) > 0$.
Remark: $F_{Y|X}(y|x) = \int_{-\infty}^{y} f_{Y|X}(u|x)\,du$.

Conditional, joint and marginal distributions

We can also write the following:
$\int_{-\infty}^{\infty} f_{Y|X}(y|x)\,dy = \int_{-\infty}^{\infty} \frac{f_{XY}(x, y)}{f_X(x)}\,dy = \frac{1}{f_X(x)} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy = \frac{f_X(x)}{f_X(x)} = 1$.

Example (joint normal density)
Prove that the function
$f_{XY}(x, y) = \frac{1}{A} \exp\left\{-\frac{1}{B}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right\}$,
where $A = 2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}$ and $B = 2(1 - \rho^2)$, has the following property: $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$. To simplify the integral, substitute $u = (x - \mu_X)/\sigma_X$ and $v = (y - \mu_Y)/\sigma_Y$, and then $w = \frac{u - \rho v}{\sqrt{1 - \rho^2}}$ with $dw = \frac{du}{\sqrt{1 - \rho^2}}$.

Marginal normal density

Theorem (marginal normal density)
If $(X, Y)^T$ has a bivariate normal distribution, then the marginal distributions of $X$ and $Y$ are univariate normal distributions, i.e. $X$ is normally distributed with mean $\mu_X$ and variance $\sigma_X^2$, and $Y$ is normally distributed with mean $\mu_Y$ and variance $\sigma_Y^2$.

Example (marginal normal density)
Prove the above theorem, e.g. for $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$, substituting $v = (y - \mu_Y)/\sigma_Y$.

Conditional normal density

Theorem (conditional normal density)
If the random vector $(X, Y)^T$ has a bivariate normal distribution, then the conditional distribution of $Y$ given $X = x$ is normal with mean $\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$ and variance $\sigma_Y^2(1 - \rho^2)$, and density
$f_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2\sigma_Y^2(1 - \rho^2)}\left(y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right)^2\right\}$.

Example (conditional normal density)
Prove the above theorem using the joint and marginal normal densities, i.e. prove that $f_{Y|X}(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}$.
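A quick numerical check of the identity $f_{Y|X}(y|x) = f_{XY}(x, y)/f_X(x)$ for the bivariate normal; a minimal sketch with illustrative parameters:

muX <- 1; muY <- 2; sX <- 1.5; sY <- 0.8; rho <- 0.6
fXY <- function(x, y) {                        # bivariate normal density
  u <- (x - muX)/sX; v <- (y - muY)/sY
  exp(-(u^2 - 2*rho*u*v + v^2)/(2*(1 - rho^2))) /
    (2*pi*sX*sY*sqrt(1 - rho^2))
}
x <- 0.5; y <- 2.4
lhs <- fXY(x, y) / dnorm(x, muX, sX)           # f_XY / f_X
rhs <- dnorm(y, mean = muY + rho*sY/sX*(x - muX),
             sd = sY*sqrt(1 - rho^2))          # conditional normal density
all.equal(lhs, rhs)   # TRUE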
Stochastic independence

Definition (stochastic independence)
Let $(X_1, X_2, \ldots, X_k)^T$ be a $k$-dimensional random vector. $X_1, X_2, \ldots, X_k$ are defined to be stochastically independent if and only if
$F_{X_1, \ldots, X_k}(x_1, \ldots, x_k) = \prod_{j=1}^{k} F_{X_j}(x_j)$ for all $x_1, \ldots, x_k$.

Definition (stochastic independence)
Let $(X_1, X_2, \ldots, X_k)^T$ be a $k$-dimensional random vector. $X_1, X_2, \ldots, X_k$ are defined to be stochastically independent if and only if
$f_{X_1, \ldots, X_k}(x_1, \ldots, x_k) = \prod_{j=1}^{k} f_{X_j}(x_j)$ for all $x_1, \ldots, x_k$.
Remark: Often the word "stochastically" is omitted.

Assignments in R

Assignment – number of individuals with certain socioeconomic status, political philosophy and affiliation:
1. What is the number of all 2 × 4 contingency tables with $N = 50$? By the stars-and-bars argument, the number of non-negative integer tables with $k = 8$ cells summing to $n = 50$ is
$\binom{n + k - 1}{k - 1} = \binom{57}{7} = \binom{57}{50} = 264385836$.

choose(57, 7)
choose(57, 50)

2. What is the probability of getting the following 2 × 4 contingency table?
     D-C   D-Li   R-C   R-Li
H    5     7      6     4
Lo   8     7      10    3

$\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_8 = x_8) = \frac{50!}{5!\,7!\,6!\,4!\,8!\,7!\,10!\,3!}\, 0.12^5\, 0.12^7\, 0.12^6\, 0.04^4\, 0.18^8\, 0.18^7\, 0.18^{10}\, 0.06^3 = 2.332506 \times 10^{-6}$

n <- c(5,7,6,4,8,7,10,3)
p <- c(.12,.12,.12,.04,.18,.18,.18,.06)
dmultinom(x=n, prob=p) # 2.332506e-06

3. What is the most probable 2 × 4 contingency table and what is the probability of getting it?
     D-C   D-Li   R-C   R-Li
H    6     6      6     2
Lo   9     9      9     3

$\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_8 = x_8) = \frac{50!}{6!\,6!\,6!\,2!\,9!\,9!\,9!\,3!}\, 0.12^6\, 0.12^6\, 0.12^6\, 0.04^2\, 0.18^9\, 0.18^9\, 0.18^9\, 0.06^3 = 1.020471 \times 10^{-5}$,
which is 4.375× more probable than the table in (2).

n <- c(6,6,6,2,9,9,9,3)
p <- c(.12,.12,.12,.04,.18,.18,.18,.06)
dmultinom(x=n, prob=p) # 1.020471e-05

4. Draw the probability mass function over the possible 2 × 4 contingency tables with $N = 50$.

Distributions for circular data – uniform and wrapped normal distribution

Example (histogram on a circle, rose diagram)
A wind rose is a graphic tool used by meteorologists to give a succinct view of how wind speed and direction are typically distributed at a particular location. In statistics, it is a bivariate histogram. Visualise in a wind rose the wind speed $X_s$ in m/s (for reference, 1 m/s = 3.6 km/h) and the wind direction $X_d$ in degrees of simulated data:
(A) $X_d \sim \mathrm{Unif}(a, b)$, where $a = 0$ and $b = 360$; $X_s \sim \mathrm{Gamma}(\lambda, k)$, where $\lambda = 50$ and $k = 1$ (note $\mathrm{Gamma}(\lambda, 1) \approx \mathrm{Exp}(\lambda)$); $n = 1000$.
(B) $X_d \sim \mathrm{WN}(\mu, \rho)$, where $\mu = 0$ and $\rho = \exp(-\sigma^2/2)$, $\sigma = 0.5$; $X_s \sim \mathrm{Gamma}(\lambda, k)$, where $\lambda = 50$ and $k = 1$; $n = 1000$.
Use library(circular) and the function windrose(). To visualise wind speed, use also the function topo.colors(k). Be careful with the colour scaling of the $k$ ordered intervals of wind speed. Visualise also rose diagrams, the data and the averages of wind direction (the latter when appropriate) and compare them with the wind rose (orientation, scaling, etc.).
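A minimal sketch of part (A), assuming the circular package interface named in the assignment (circular() and windrose()); the wrapped-normal directions of part (B) could be drawn analogously with rwrappednormal() from the same package:

library(circular)
set.seed(6)
n  <- 1000
xd <- circular(runif(n, 0, 360), units = "degrees")  # directions, Unif(0, 360)
xs <- rgamma(n, shape = 1, rate = 50)                # speeds, Gamma(50, 1) = Exp(50)
windrose(x = xd, y = xs)                             # wind rose of direction and speed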
Distributions for circular data – uniform and wrapped normal distribution

[Figure: simulated wind directions plotted on a circle with compass axes N, E, S, W; left panel: uniform distribution, right panel: wrapped normal distribution.]
Distributions for circular data – uniform and wrapped normal distribution

[Figure: rose diagrams of wind direction; left panel: uniform distribution, right panel: wrapped normal distribution (frequency rings at 0.2, 0.4, 0.6, 0.8).]

Poisson process and marked Poisson process

A common phenomenon is the arrival or occurrence of an event at a time t independently of the times of previous occurrences; events on nonoverlapping time intervals are mutually independent. In addition, the average rate of arrivals is constant. The Poisson probability mass function (pmf) is a good model for the number of arrivals in an interval of length t, and in general we call such a process a Poisson process. Typical applications include the occurrence of earthquakes. As we increase the rate, the pmf looks more and more like a normal distribution. We are interested in determining:
- the pmf of the number of arrivals in a time interval of length t,
- the probability density function (pdf) of the arrival time of the kth occurrence (e.g. k = 0, k = 1, k > 1), and
- the pdf of the time interval between arrivals of successive occurrences (the interarrival time).

Poisson process, marked and compound Poisson process

Note: This process refers to arrivals on a continuous line. For many applications this line is time, but for others it may be a spatial domain of dimension one, e.g. a transect along an ecosystem, the midline of a river, or a road.
It is also of interest to attach some quantity (a mark) to the occurrence of the event at time t. For earthquakes, this quantity may be intensity, magnitude or energy. For rain events, the quantity may be rainfall intensity. Associating a quantity $y_i$ with the time $t_i$, we obtain a marked Poisson process. We assume that the random variable describing the quantity is independent of the random variable describing the arrival times. The sum of all marks for arrivals occurring in the interval t is called a compound Poisson process.

Marked Poisson process, generalised gamma family of distributions

As an example, think about modelling rainfall for every day of a month. Whether a day is rainy (wet) is decided by a Poisson process, and the mark is the amount of rain for that day if it is a wet day. The frequency distribution of rainfall on rainy days at a site determines the amount of rain once a day is selected as wet (Richardson and Nicks, 1990). The daily rainfall distribution is skewed toward low values, and it varies from month to month according to climatic records. The most typical distributions for the rainfall amount are: exponential and Weibull, gamma and generalised gamma, and skewed normal, log-normal and log-logistic. A simulation sketch is given below.
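A minimal sketch of such a rainfall simulation in base R; the wet-day rate and the gamma mark parameters are illustrative assumptions, not values from the slides.

set.seed(107)
days <- 30
lambda <- 10/30 # assumed arrival rate: about 10 wet days per 30-day month
wet <- rpois(days, lambda) > 0 # occurrence: a day is wet if at least one arrival falls on it
# marks: daily rainfall amount (cm/day) on wet days, here gamma-distributed
# (any of the distributions listed above could be used instead)
rain <- ifelse(wet, rgamma(days, shape = 0.8, scale = 1.2), 0)
sum(rain) # the compound Poisson process: total monthly rainfall
plot(1:days, rain, type = "h", xlab = "day", ylab = "rain (cm/day)",
     main = paste("rain days =", sum(wet)))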
Note: In general, most of these distributions belong to the generalised gamma family or related distributions.

Simulation of marked Poisson process – rainfall

[Figure: amount of rain per day (cm/day) during a 30-day period, nine simulation runs with 7 to 15 rain days each – marked Poisson process simulation.]

Simulation of marked Poisson process – rainfall

[Figure: relative-frequency histograms of the daily rainfall amounts (cm/day) for the same nine simulation runs – marked Poisson process simulation.]

Simulation of marked Poisson process – rainfall

[Figure: daily rainfall amount – simulated relative-frequency histograms for the exponential, Weibull, gamma and skewed normal distributions, number of days n = 1000.]
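The last comparison can be sketched along the following lines; all parameter values are illustrative assumptions, and the skewed normal draw uses rsn() from the sn package (assumed installed).

set.seed(109)
n <- 1000
x.exp <- rexp(n, rate = 0.25)                  # exponential
x.wei <- rweibull(n, shape = 1.5, scale = 10)  # Weibull
x.gam <- rgamma(n, shape = 0.8, scale = 4)     # gamma
x.sn <- as.numeric(sn::rsn(n, xi = 15, omega = 8, alpha = 3)) # skewed normal
par(mfrow = c(2, 2))
hist(x.exp, freq = FALSE, main = "exponential distribution", xlab = "rain (cm/day)")
hist(x.wei, freq = FALSE, main = "Weibull distribution", xlab = "rain (cm/day)")
hist(x.gam, freq = FALSE, main = "gamma distribution", xlab = "rain (cm/day)")
hist(x.sn, freq = FALSE, main = "skewed normal distribution", xlab = "rain (cm/day)")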