Linear Models in Statistics II
Lecture notes
Andrea Kraus
Spring 2017

1 Generalized linear models
• From linear to generalized linear models
• Generalized linear models
• Inference for generalized linear models
2 Logistic regression
• The model
• Logistic curve and its parameters
• Fitted model

Linear model: a reminder

Yi = β0 + β1 xi,1 + . . . + βk xi,k + εi,  i ∈ {1, . . . , n}

Yi: outcome, response, output, dependent variable
• random variable, we observe a realization yi
• (Czech: odezva, závisle proměnná, regresand)
xi,1, . . . , xi,k: covariates, predictors, explanatory variables, input, independent variables
• given, known
• (Czech: nezávisle proměnné, regresory)
β0, . . . , βk: coefficients
• unknown
• (Czech: regresní koeficienty)
εi: random error
• random variable, unobserved

εi iid∼ (0, σ²), i ∈ {1, . . . , n}
• E εi = 0: no systematic errors
• Var εi = σ²: same precision
• we often assume that εi iid∼ N(0, σ²), i ∈ {1, . . . , n}

Example: bloodpress data

from sites.stat.psu.edu/~lsimon/stat501wc/sp05/data/
association between the mean arterial blood pressure [mmHg] and age [years], weight [kg], body surface area [m²], duration of hypertension [years], basal pulse [beats/min], stress

data:
BP    Age   Weight   BSA    DoH   Pulse   Stress
105   47    85.4     1.75   5.1   63      33
115   49    94.2     2.10   3.8   70      14
...   ...   ...      ...    ...   ...     ...
110   48    90.5     1.88   9.0   71      99
122   56    95.7     2.09   7.0   75      99

model: Y = Xβ + ε, written out for the 20 observations and 6 covariates:

[105]   [1  47  85.4  1.75  5.1  63  33]          [ε1 ]
[115]   [1  49  94.2  2.10  3.8  70  14]   [β0]   [ε2 ]
[ ⋮ ] = [⋮   ⋮    ⋮     ⋮    ⋮   ⋮   ⋮ ] × [ ⋮] + [ ⋮ ]
[110]   [1  48  90.5  1.88  9.0  71  99]   [β6]   [ε19]
[122]   [1  56  95.7  2.09  7.0  75  99]          [ε20]
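In R, this model is fitted with lm(); a minimal sketch, assuming the data have been read into a data frame bloodpress with the column names shown above (the file name is hypothetical):

# bloodpress <- read.table("bloodpress.txt", header = TRUE)   # hypothetical file
lm.bp <- lm(BP ~ Age + Weight + BSA + DoH + Pulse + Stress, data = bloodpress)
summary(lm.bp)   # coefficient estimates, standard errors, t tests, R^2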
Non-normal outcome

linear model: Y = Xβ + ε
outcome Y
• random vector, we observe a realization y
predictors x·,1, . . . , x·,k
• vectors of given (known) constants
coefficients β
• vector of unknown constants
error ε
• unknown random vector, we do not observe its realization

assumptions: ε ∼ (0, σ² I)
• E Y = Xβ: the expected value of Y is a linear function of β
• E ε = 0: no systematic errors
• Var ε = σ² I: independence and same precision

normality is not crucial with a large data set without influential observations
BUT what if Y is nowhere close to normal? e.g. what if Yi ∈ {0, 1} for all i?

Example: heart attack data

From: Simon Wood (2006), Generalized Additive Models
Question: Is the level of creatine kinase (CK) in blood a marker of an on-going heart attack (HA)?

Data (one row per patient):
CK level   HA (yes: 1, no: 0)
20         1
20         1
20         0
20         0
...        ...

Structure:
outcome Y
• random vector, we observe a realization y, yi ∈ {0, 1} ∀i
predictors x·,1, . . . , x·,k
• vectors of given (known) constants

Binary outcome

outcome Y: random vector, we observe a realization y, yi ∈ {0, 1} ∀i
predictors x·,1, . . . , x·,k: vectors of given (known) constants
coefficients β: vector of unknown constants

how to connect Y, X, and β so that we can describe the relationship between Y and X using β?
• clearly not Y = Xβ + ε, ε ∼ N(0, σ² I)
• how about picking up on E Y = Xβ?
• Yi ∈ {0, 1} ⇒ Yi ∼ Bernoulli(pi) = Bi(1, pi), so E Yi = pi = f(xi,·, β)
• how about E Yi = xi,· β? (i.e. E Y = Xβ again)
  but E Yi is a probability ⇒ it must lie in [0, 1] . . . unlike xi,· β
• how about E Yi = g⁻¹(xi,· β), where g : (0, 1) → R?
• logistic regression: E Yi = exp(xi,· β) / (1 + exp(xi,· β))
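A quick numerical illustration (not from the original notes): the linear predictor xi,· β can be any real number, but the "expit" transform g⁻¹ maps it back into (0, 1), so the modelled E Yi is always a valid probability.

expit <- function(x) exp(x) / (1 + exp(x))   # inverse of the logit link
eta <- c(-6, -2, 0, 2, 6)                    # hypothetical values of the linear predictor
round(expit(eta), 3)                         # 0.002 0.119 0.500 0.881 0.998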
Generalized linear model

outcome Y: random vector, we observe a realization y
predictors x·,1, . . . , x·,k: vectors of given (known) constants
coefficients β: vector of unknown constants

model: Yi ∼ L independently, where L is a probability distribution
• from an exponential family of distributions
• its density/probability mass function satisfies
  f(y; θ, ϕ) = exp{ (θ y − b(θ)) / ϕ + c(y, ϕ) }
  and there are some assumptions on b
• E Y = b′(θ), Var Y = ϕ b″(θ)

g(E Yi) = xi,· β
• g is called the link function

GLM example: linear regression

exponential family of distributions: f(y; θ, ϕ) = exp{ (θ y − b(θ)) / ϕ + c(y, ϕ) }

Gaussian distribution: Yi iid∼ N(µ, σ²)
f(y; µ, σ²) = (1/√(2πσ²)) exp{ −(y − µ)² / (2σ²) }
            = exp{ (µ y − µ²/2) / σ² − y²/(2σ²) − (1/2) log(2πσ²) }

link function g(E Yi) = xi,· β
• E Y = µ
• canonical link: identity g(x) = x, i.e. linear regression
• a log link may help address heteroskedasticity in linear regression

GLM example: Gamma regression

exponential family of distributions: f(y; θ, ϕ) = exp{ (θ y − b(θ)) / ϕ + c(y, ϕ) }

Gamma distribution: Yi iid∼ Gamma(α, β)
f(y; α, β) = β^α y^(α−1) e^(−β y) / Γ(α)
           = exp{ α log(β) + (α − 1) log(y) − β y − log(Γ(α)) }
           = exp{ ((−β/α) y + log(β/α)) / (1/α) + α log(α y) − log(Γ(α)) − log(y) }

link function g(E Yi) = xi,· β
• E Y = α/β
• common links: g(x) = 1/x (canonical), g(x) = log(x)
• used for skewed non-negative data; addresses heteroskedasticity and a heavy tail
  (e.g. the size of an insurance claim)
• but other choices are possible as well (log-normal, inverse Gaussian, . . . )

Log link

logarithm: g(λ) = log(λ) : (0, ∞) → R
exponential: g⁻¹(x) = eˣ : R → (0, ∞)
[Figure: plots of the logarithm on (0, ∞) and of the exponential on R.]

GLM example: log-linear model

exponential family of distributions: f(y; θ, ϕ) = exp{ (θ y − b(θ)) / ϕ + c(y, ϕ) }

Poisson distribution: Yi iid∼ Po(λ)
f(y; λ) = e^(−λ) λ^y / y!
        = exp{ log(λ) y − λ − log(y!) }

link function g(E Yi) = xi,· β
• E Y = λ
• canonical link: log link g(x) = log(x)
• a convenient way of handling contingency tables
• used to model e.g. the number of insurance claims
• has connections to the Cox PH model
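In R, all of the GLM examples above (and logistic regression, next) are fitted with glm() by choosing the family and the link; a hedged sketch, where the data frame dat and the variables y, x1, x2 are placeholders rather than data from the notes:

glm(y ~ x1 + x2, family = gaussian(link = "identity"), data = dat)  # linear regression
glm(y ~ x1 + x2, family = Gamma(link = "log"),         data = dat)  # Gamma regression, log link
glm(y ~ x1 + x2, family = poisson(link = "log"),       data = dat)  # log-linear (Poisson) model
glm(y ~ x1 + x2, family = binomial(link = "logit"),    data = dat)  # logistic regression (below)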
link function g(E Yi ) = xi,· β E Y = λ canonical link: log link: g(x) = log (x) convenient way of handling contingency tables used to model e.g. the number of insurance claims has connections to Cox PH model Andrea Kraus Linear Models in Statistics MUNI, Fall 2016 10 / 32 Generalized linear models Logistic regression Generalized linear models GLM example: logistic regression exponential family of distributions f (y; θ, ϕ) = exp θy−b(θ) ϕ + c(y, ϕ) Bernoulli distribution: Yi iid ∼ Bernoulli(p) f (y; p) = py (1 − p)1−y = exp y log(p) + (1 − y) log(1 − p) = exp log p 1 − p y + log(1 − p) link function g(E Yi ) = xi,· β E Y = p canonical link: “logit” g(x) = log x 1 − x other common choices: “probit”, “complementary log-log” used e.g. in credit risk analysis (probability of default, classification) Andrea Kraus Linear Models in Statistics MUNI, Fall 2016 11 / 32 Generalized linear models Logistic regression Generalized linear models “Logit” link “logit” g(p) = log p 1 − p : (0, 1) → R “expit” g−1 (x) = ex 1 + ex : R → (0, 1) 0.0 0.2 0.4 0.6 0.8 1.0 −6−4−20246 Logit p log(p/(1−p)) −6 −2 0 2 4 6 0.00.20.40.60.81.0 Expit x exp(x)/(1+exp(x)) Andrea Kraus Linear Models in Statistics MUNI, Fall 2016 12 / 32 Generalized linear models Logistic regression Inference for generalized linear models MLE for θ in exponential families exponential family of distributions f (y; θ, ϕ) = exp θy − b(θ) ϕ + c(y, ϕ) likelihood L(y; θ, ϕ) = n i=1 f (yi ; θ, ϕ) = exp θ ϕ n i=1 yi − n b(θ) ϕ + n i=1 c(yi , ϕ) log-likelihood (y; θ, ϕ) = θ ϕ n i=1 yi − n b(θ) ϕ + n i=1 c(yi , ϕ) score function (the θ-related part) U1(y; θ, ϕ) = ∂ ∂θ (y; θ, ϕ) = 1 ϕ n i=1 yi − n b (θ) ϕ solution to the score equation (the θ-related part) U1(y; θ, ϕ) = 0 ⇔ 1 n n i=1 yi = b (θ) Andrea Kraus Linear Models in Statistics MUNI, Fall 2016 13 / 32 Generalized linear models Logistic regression Inference for generalized linear models From MLE for θ to MLE for β under some assumptions on the exponential family ∃ unique MLE for θ: ˆθ = b −1 1 n n i=1 yi it can be shown that √ n ˆθ − θ d −→ N 0, ϕ (b (θ)) −1 note that • ˆθ does not depend on ϕ ⇒ we do not need ϕ for point estimation of θ • the (asymptotic) variance of ˆθ depends on ϕ ⇒ ⇒ we do need ˆϕ for interval estimation of θ Andrea Kraus Linear Models in Statistics MUNI, Fall 2016 14 / 32 Generalized linear models Logistic regression Inference for generalized linear models From MLE for θ to MLE for β under some assumptions on the exponential family ∃ unique MLE for θ: ˆθ = b −1 1 n n i=1 yi it can be shown that √ n ˆθ − θ d −→ N 0, ϕ (b (θ)) −1 note that • ˆθ does not depend on ϕ ⇒ we do not need ϕ for point estimation of θ • the (asymptotic) variance of ˆθ depends on ϕ ⇒ ⇒ we do need ˆϕ for interval estimation of θ this is all very nice BUT in a GLM g(E Yi ) = xi,· β = b(θi ) • so there is θi , not θ • and we want to estimate β Andrea Kraus Linear Models in Statistics MUNI, Fall 2016 14 / 32 Generalized linear models Logistic regression Inference for generalized linear models Estimating β in GLMs log-likelihood for an exponential family: (y; θ, ϕ) = θ ϕ n i=1 yi − n b(θ) ϕ + n i=1 c(yi , ϕ) log-likelihood for a GLM: (y; β, ϕ) = 1 ϕ n i=1 θi yi − b(θi ) + n i=1 c(yi , ϕ) score function (the β-related part) U1:p(y; β, ϕ) = ∂ ∂β (y; β, ϕ) = 1 ϕ n i=1 yi − b (θi ) ∂ ∂β θi no closed-form solution ⇒ numerical solution through the Iteratively Re-weighted Least Squares algorithm usually converges fast; if not, we might have a deeper problem • no guarantee that a solution exists • no guarantee that a 
Inference for β

β̂ is an MLE ⇒ we can use the general theory on the asymptotic properties of MLEs
• the results are asymptotic (i.e. for large n) and hold under some assumptions
• but "if all is well" . . .
1 β̂ is a consistent estimator of β
2 √n (β̂ − β) →d N( 0, I⁻¹(β, ϕ) )
3 2 [ℓ(Y; β̂, ϕ) − ℓ(Y; β, ϕ)] →d χ²_p

inference for β
• 1 tells us we are eventually getting what we want
• 2 is the basis for Wald tests and CIs about β
• 3 and similar results are the basis for likelihood ratio tests and CIs for β
  • these CIs are based on the profile likelihood
  • recommended over Wald (or Rao) tests and CIs
• we have treated ϕ as fixed so far, but we need to estimate it in order to get the test statistics and CIs

Deviance

a model that fits Ê Yi = Yi is called the saturated model; in GLMs it has a parameter for each unique covariate combination

unscaled deviance: 2ϕ times the difference between the maximized log-likelihoods of the saturated and the current model,
D(y, β̂) = 2ϕ [ℓ(saturated model) − ℓ(y; β̂, ϕ)]
• a goodness-of-fit measure
• a generalization of the residual sum of squares from the LM

scaled deviance: twice the same difference,
D*(y, β̂) = 2 [ℓ(saturated model) − ℓ(y; β̂, ϕ)]

the difference between the (scaled) deviances of two nested models is the test statistic of the likelihood ratio test (with ϕ replaced by ϕ̂)

other goodness-of-fit and model selection tools: AIC, BIC, . . .

Residuals and model diagnostics

Pearson residuals:
r^P_i = (Yi − Ê Yi) / √(V̂ar Yi),  with Var r^P_i ≈ ϕ (1 − h_ii)
(the hat matrix H comes from the weighted LS step of IRLS)

standardized Pearson residuals:
r^SP_i = (Yi − Ê Yi) / √( ϕ̂ V̂ar Yi (1 − h_ii) )

deviance residuals:
r^D_i = sgn(Yi − Ê Yi) d_i, where d_i² is the contribution of the i-th observation to the deviance

standardized deviance residuals:
r^SD_i = sgn(Yi − Ê Yi) d_i / √( ϕ̂ (1 − h_ii) )

• residuals can be used for residual plots as in the LM
• generalizations of leverage and Cook's distance from the LM are available for GLMs
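In R these quantities are available directly from a fitted glm object; a sketch following the formulas above, where fit is a placeholder name for the fitted model:

rP  <- residuals(fit, type = "pearson")     # Pearson residuals
rD  <- residuals(fit, type = "deviance")    # deviance residuals
h   <- hatvalues(fit)                       # leverages h_ii from the IRLS weighted LS step
rSP <- rP / sqrt(summary(fit)$dispersion * (1 - h))   # standardized Pearson residuals
rSD <- rstandard(fit)                       # standardized deviance residuals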
2 Logistic regression
• The model
• Logistic curve and its parameters
• Fitted model

Logistic regression: the model

outcome Y: random vector, we observe a realization y, yi ∈ {0, 1} ∀i
predictors x·,1, . . . , x·,k: vectors of given (known) constants
coefficients β: vector of unknown constants

model: Yi ∼ Bernoulli(pi) independently, with
pi = exp(xi,· β) / (1 + exp(xi,· β)),
i.e. with the "logit" link g(p) = log( p / (1 − p) )

less common choices for the link function:
• pi = Φ(xi,· β), with Φ the distribution function of N(0, 1), i.e. the "probit" link g(p) = Φ⁻¹(p)
• pi = 1 − exp(−exp{xi,· β}), i.e. the "complementary log-log" link g(p) = log(−log(1 − p))

(The "logit" and "expit" curves are as shown earlier.)

Example: heart attack data

Is the level of creatine kinase (CK) in blood a marker of an on-going heart attack (HA)?

Data (one row per patient):
CK level   HA (yes: 1, no: 0)
20         1
20         1
20         0
...        ...

Data (equivalent binomial form, one row per CK level):
CK level   Nr. of HAs   Nr. of no HAs
20         2            88
60         13           26
100        30           8
140        30           5
180        21           0
220        19           1
260        18           1
300        13           1
340        19           1
380        15           0
420        7            0
460        8            0

Binomial form for the heart attack data

[Figure: observed proportion of heart attacks at each CK level; the proportion rises from near 0 at CK = 20 towards 1 at high CK levels.]

Model fit: a logistic curve

[Figure: the logit and expit curves, and the fitted logistic curve for the heart attack data overlaid on the observed proportions.]

Fitted model for the heart attack data

> summary(glm.ha)

Call:
glm(formula = cbind(ha.ha, ha.ok) ~ ck, family = "binomial", data = heart.attack)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.08184  -1.93008   0.01652   0.41772   2.60362

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.758358   0.336696  -8.192 2.56e-16 ***
ck           0.031244   0.003619   8.633  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 271.712  on 11  degrees of freedom
Residual deviance:  36.929  on 10  degrees of freedom
AIC: 62.334

Number of Fisher Scoring iterations: 6
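The output above can be reproduced from the binomial-form table; a sketch, with the object and column names chosen to match the call shown in the output:

heart.attack <- data.frame(
  ck    = c(20, 60, 100, 140, 180, 220, 260, 300, 340, 380, 420, 460),
  ha.ha = c(2, 13, 30, 30, 21, 19, 18, 13, 19, 15, 7, 8),    # nr. of heart attacks
  ha.ok = c(88, 26, 8, 5, 0, 1, 1, 1, 1, 0, 0, 0))           # nr. of no heart attacks
glm.ha <- glm(cbind(ha.ha, ha.ok) ~ ck, family = "binomial", data = heart.attack)
summary(glm.ha)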
Fitted logistic curve for the heart attack data

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.758358   0.336696  -8.192 2.56e-16 ***
ck           0.031244   0.003619   8.633  < 2e-16 ***

fitted probability:
p̂(ck) = exp{−2.76 + 0.03 ck} / (1 + exp{−2.76 + 0.03 ck})

[Figure: the fitted logistic curve overlaid on the observed proportions of HA against CK level.]

Interpretation of the parameters

fitted probability: p̂(ck) = exp{−2.76 + 0.03 ck} / (1 + exp{−2.76 + 0.03 ck})

is there a nice way to see β̂1 = 0.03?
• odds: p / (1 − p)
• odds ratio: [ p / (1 − p) ] / [ p̃ / (1 − p̃) ]
• e^β̂1 is the estimated odds ratio for two patients whose CK levels differ by one unit
• the estimated odds of a heart attack become e^β̂1 ≈ 1.03 times higher when the CK level increases by one unit
• with more covariates the interpretation remains the same when the values of all other covariates are kept fixed

Inference for the heart attack data

Wald test statistics (and confidence intervals):

> summary(glm.ha)
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.758358   0.336696  -8.192 2.56e-16 ***
ck           0.031244   0.003619   8.633  < 2e-16 ***

likelihood ratio confidence intervals (preferred):

> confint(glm.ha)
Waiting for profiling to be done...
                  2.5 %      97.5 %
(Intercept) -3.46305890 -2.13705606
ck           0.02467179  0.03889618

likelihood ratio test (preferred):

> anova(glm.ha.null, glm.ha, test="Chisq")
Analysis of Deviance Table

Model 1: cbind(ha.ha, ha.ok) ~ 1
Model 2: cbind(ha.ha, ha.ok) ~ ck
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
1        11    271.712
2        10     36.929  1   234.78 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
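The odds-ratio interpretation can be read off the fit directly; a short sketch using the fitted object and the profile-likelihood intervals shown above:

exp(coef(glm.ha)["ck"])        # estimated odds ratio per unit of CK, about 1.032
exp(confint(glm.ha)["ck", ])   # profile-likelihood CI for this odds ratio, about (1.025, 1.040)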
Goodness of fit for the heart attack data

> summary(glm.ha)
...
    Null deviance: 271.712  on 11  degrees of freedom
Residual deviance:  36.929  on 10  degrees of freedom
AIC: 62.334

• null deviance: deviance of the null model (intercept only)
• residual deviance: deviance of the current model
• a generalization of the proportion of variability explained:
> (271.712 - 36.929)/271.712
[1] 0.8640877
• the residual deviance should be ≈ χ²_10 if the model is OK: the deviance is sometimes used for goodness of fit (with caution . . . ), but its primary use is for model comparison
> 1-pchisq(36.929, df=10)
[1] 5.821642e-05
• other measures of goodness of fit / model comparison / selection:
> AIC(glm.ha)
[1] 62.3339
> BIC(glm.ha)
[1] 63.30371

Example: diagnostic plots for the heart attack data

[Figure: the six standard glm diagnostic plots for glm.ha: Residuals vs Fitted, Normal Q-Q of the standardized deviance residuals, Scale-Location, Cook's distance, Residuals vs Leverage, and Cook's distance vs Leverage; observations 1, 3, and 9 are flagged.]
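The panels above are the standard R diagnostics for a fitted glm; a sketch of the calls that produce them:

par(mfrow = c(2, 3))            # arrange the six panels on one device
plot(glm.ha, which = 1:6)       # residuals vs fitted, Q-Q, scale-location, Cook's distance, leverage plots
par(mfrow = c(1, 1))
cooks.distance(glm.ha)          # Cook's distances, to inspect influential observations (here 1, 3, 9)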