Stano Pekár ■ Gamma and lognormal data arise: • precise measurements of small quantities (concentration), weight, time, etc. • measurements are continuous - negative values and zeros are not allowed - distribution is skewed to the right • logarithmic transformation of measurements will homogenise variance and adjust asymmetry of distribution • moments - 2 parameters (jutr, atr) - while on log scale variance is independent of mean, on original scale variance is a function of expected mean • predicted values: ^^^^^^Q • used to model inverse polynomials moments - 2 parameters (ju, cp) Var(y) = dispersion parameter (cp) = Var(y) / ju: X X Welch test (t. test) to compare two means with heterogenous variances glm (formula, Gamma (link= . . .) ) • links: - inverse (default) - logarithmic (log) - identity (identity) • lm (log (y) ~ . .) Background In euryphagous predators the size of prey is positively related to their body size. There is an upper limit due to e.g. morphological constraints. Design In the laboratory, acceptance of food was studied in 36 species of granivorous beetles. Each carabid beetle was offered seeds of various sizes [g]. Preferred seed size was recorded. For each beetle body size [mm] was recorded too. 1 a 1 = a + p kde seedi ~ Gamaifi^ ml <- glm anova(ml, test=rrF11) Analysis of Deviance Table Model: Gamma, link:: inverse Response: seed Terms added sequentially (first to last) Df Deviance Resid, Df Resid* Dev F Pr(>F) NULL 35 15.3681 I(l/body) 1 8.3662 34 7.0019 42.624 1.787e-07 *** 1 o 1 1 = a + p-+ y- fit body t hody\ seed i - Gama(fip iti2 <- glm(seed - I + i anova (ml, m2, test=TTFM) Analysis of Deviance Table Model 1: Model 2: Resid. _ seed - I(1/body) seed ~ I (1/body) + I (l/bodyA2) Df Resid, Dev Df Deviance F Pr(>F) 34 7,0019 33 7.0016 1 0.0003 0.0013 0.9713 o d O S ° 75 ° Vj o CD CN O I o i o O Residuals vs Fitted B oil o o $> o .... o .... lA o 0 1 5 1 1 1 10 15 20 Predicted values glm[seed~l[l ]/body) i 25 -o o O CN O O d CN o o I 0 Q i-1-1-1-1-1-1-r 0 5 10 15 20 25 30 35 body > summary (ml) Call: glm(formula = seed - l(l/body), family = Gamma) Deviance Residuals: Min IQ Median 3Q Max -0.7530027 -0.4237538 0.0008676 0.2527096 0.7024871 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.7418 0.3162 5.508 3.76e-Q6 *** I(l/body) 11.8626 2.4463 4.849 2.69e-05 *** Signif. codes: 0 0.001 0,01 **' 0.05 1.f 0.1 1 ' 1 (Dispersion parameter for Gamma family taken to be 0.1962785) Null deviance: 15.3681 on 35 degrees of freedom Residual deviance: 7.0019 on 34 degrees of freedom AIC: -49.67 6 REBBBI |(1536S1 - 7.0019) / 15.3681 = 0.54. 1/1.7418 = 0.574, Coefficient of determination: Asymptote: immia seed; = a + f}hodyi + yhodyf + z{, kde z{ - JV(0, a2), nezávisle pro jednotlivé jedince. > m3 <- Im(seed ~ poly(body,2)) > summary(m3) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.30842 0.02318 13.305 B.17e-15 *** poly(body, 2)1 0.55682 0.13908 4.004 0.000333 *** poly(body, 2)2 -0.41591 0.13908 -2.990 0.005235 ** Signif. codes: 0 0.001 0.01 »*' 0.05 0.1 1 ' 1 Residual standard error: 0.1391 on 33 degrees of freedom Multiple R-Squared: 0.4307, Adjusted R-squared: 0.3962 F-statistic: 12.49 on 2 and 33 BE, p-value: 9.173e-05 -o Q l.-i Ö — Ö o IS* Q CM o body CM Ö o Ö a) Od CM Ö o o 0.1 B Residuals vs Fitted o 34fl 0.2 —I-1- 0.3 0.4 27 o 0.5 Fitted values lm(seed - poly (body, 2JJ 2-way ANOYA Background In the gift-giving spider a male brings a prey to a female in order to avoid being cannibalised. Several variables can potentially influence how quickly female will accept the gift. In the laboratory, effect of two variables was studied: satiation of female (satiated, starved) and their mating experience (mated, virgin). Time [s] of the gift presentation was recorded. Experiment was fully factorial, for each combination 10 males and females were used. Hypotheses Is presentation time affected by any of the two variables? If it is what is the difference between factor levels? Variables MATING: mated, virgin FEED: satiated, starved time timeijk = a + MATING■ + FEEDk + MATING\FEEDjk + e()jt, s - JV(0, a2), nezävisle pro jednotliva pozoroväni. > ml <- lm(time ~ mating*feed) > anova(ml) Analysis of Variance Table Response: time Df Sum Sq Mean Sq F value Pr(>F) mating _ 165122 165122 0 .2558 0. 616098 feed _ 5625000 5625000 8 .7142 0. 005528 ** mating r feed _ 40322 40322 C: .0625 0. 904058 Residuals 36 23237845 645496 > anova(m3) Analysis of Variance Table Response: time Df Sum Sq Mean Sq F value Pr(>F) feed 1 5625000 5625000 9.1177 0.004507 ** Residuals 38 23443290 616929 Fitted values Im (time feed) Predicted valjes glm(Hme ~ feed) log(f^) = a + MATING} + FEEDk + MATING:FEEDjk, s ftme^ ~ Gama(fyk, nezavisle pro jednotliva pozorovani, > m4 <- glm(time - mating*feed, Gamma anova(m4, test=rrFTT) 4 1- ■ Df Deviance Resid. Df Resid, Dev F Px(>F) NULL 39 122.018 mating 1 0.564 38 121.454 0. 3888 0. 5368618 feed 1 26.258 37 95.196 18, 1021 0. 0001425 *** mating: feed 1 0.083 36 95.113 C. 0570 0. 8126218 ---_ > anova(m6, test="F"> ■t w m Df Deviance Resid. Df Resid. Dev F Pr(>F) NULL 39 122. 018 feed 1 25.922 38 96. 096 20 .3 6.138e-05 *** Signif, codes: 0 0,001 ,**' 0,01 »*' 0.05 0,1 1 ' 1 > summary (m6) Coefficients Estimate Std. Error t value Pr(>|t|) (Intercept) £.8222 feedstarved -1.6982 0.2527 26.999 < 2e-16 *** 0.3573 -4.752 2.87e-05 *** > m7 <- lm(log(time) - mating*feed} > anova(m7) Analysis of Variance Table Response: log(time) Df Sim Sq Mean Sq F value Pr(>F) mating 1 11.432 11.432 2.7578 0.10547 feed 1 19.262 19.262 4.6468 0.03787 matingrfeed 1 0.019 0.019 0.0045 0.94681 Residuals 36 149.226 4.145 > m8 <- lm(log(time) - feed) > summary (m8) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.4658 0.4598 11.887 2.27e-14 *** feedstarved -1.3379 0.6503 -2.134 0.0393 * Background The nutritional quality of the diet affects growth of organisms in a various ways. To find optimal diet for cockroaches the following experiments was performed. Design Effect of five diet types (control, lipidl, lipid2, proteinl, protein2) was tested on body weight [g] of male and female cockroaches. For each diet 10 females and 7 males were used. Their body weight [g] was recorded before and after the experiment. Hypotheses Is weight influenced by the diet type? If so which diet resulted in largest weight? Is weight on diets similar for males and females? 0.2 0.3 0.4 0.5 0.2 0.3 0.4 0.5 L_I_I_I_I lipidl Iipid2 prof&in1 protein2 O ° O O O ^° A 0 4 A o °o°^ Aoo0 o o OA j, A A 0 A A Q O Q O 0 o A A O A A*0 & A ^ O A A 4 H 5 2 H i H n i i í 0.2 0.3 0.4 0.5 0 2 0 3 0 4 0.5 0.2 0.3 0.4 0.5 sta rt > ml <- lm(log(weight) - diet*sex*start) > anova(ml) Analysis of Variance Table Response: log(weight) Df Sum Sq Mean Sq F value Pr (>F) diet 4 16.1349 4.0337 150.3981 <2e-16 *** sex 1 0 .0261 0.0261 0.9732 0 .3275 start 1 0 .0455 0.0455 1.6956 0 .1975 diet:sex 4 0 .0866 0.0217 0 . 8073 0 . 5250 diet:start 4 0 .0244 0.0061 0.2272 0.9222 sex:start 1 0 .0315 0.0315 1 .1743 0.2825 diet:sex:start 4 0 .1829 0.0457 1 . 7048 0 .1596 Residuals 65 1 .7433 0.0268 > anova(lm(log(weight) ~ sex*diet*start)) Analysis of Variance Table Response: log(weight) sex die ~ start sex:diet sex:start diet:start Df Sum Sq Mean Sq F value Pr(>F) 1 0.0261 4 16.1349 1 0.0455 4 0.0366 1 0.0196 4 0.0363 sex:diet:start 4 0.1329 Residuals 65 1.7433 0.0261 0.9732 0.3275 4.0337 150.3981 <2e-16 *** 0.0455 1.6956 0.1975 0.0217 0.8073 0.5250 0.0196 0.7302 0.3959 0.0091 0.3382 0.8512 0.0457 1.7043 0.1596 0.0268 log(weightiJk) = a + DIET, + SEXk + fistarti + ystartf + DIET:SEXjk + 5jistarti + 8lkstartI + Sjkstarti + w^startf + a; ik$tartl + a> ^start? + £jyjt, (9-13) kde £ľí>it - MO, tf2)> nezávisle pro jednotlivá měření. > m2 <- lm(log(weight) - diet*sex*poly(start,2>) > anova(ml, m2) Analysis of Variance Table Model 1: log (weight) - diet * sex * start Model 2: log (weight) - diet * sex * poly(start, 2) Res.Df RSS Df Sum of Sq F Pr (>F) 1 65 1.7433 2 55 1.4122 10 0.33113 1.2896 0.2592 > anova(m3) Analysis of Variance Table Response: log(weight) Df Sum Sq Mean Sq J ■ value Pr(>F) die 7 4 16.1349 4 .0337 144 .4941 <2e-16 sex _ 0.0261 0 .0261 0 .9350 0.3369 start _ 0.0455 0 .0455 _ .6290 0.2061 diet:sex 4 0.0366 0 .0217 0 .7756 0.5448 diet:start 4 0.0244 0 .0061 0 .2133 0.9274 sex:start _ 0.0315 0 .0315 _ .1282 0.2919 Residuals 69 1.9262 0 .0279 > summary (m8 ) Call: lm(formula = log(weight) ~ diet) Residuals: Min 1Q Median 3Q Max -0.33311 -0.09764 -0.02934 0,11146 0.41505 Coefficients: Estimate Std. Error t value Pr(>1t1) (Intercept) -0.05319 0 .03967 -1.341 0.184 dietlipidl 0,55181 0 .05610 9,836 2.02e-15 * * * dietlipid2 0,52190 0 .05610 9,303 2 .23e-14 * * X dietproteinl 1.17298 0 .05610 20. 908 < 2e-16 * * X dietprotein2 1,12984 0 .05610 20.139 < 2e-16 --- > summary (m9) ■i * ■ Coefficients: Estimate std. Err M" t value Pr (>|t 1 ) (Intercept) - 0 .05319 C , ,03940 -1 ,35 0 .181 diet21ipid 0 .53686 C: , ,04825 11 ,13 <2e- 16 *** diet2prot _ .15141 C: , .04825 23 .86 <2e- 16 *** >í/jalyse% Stano Pekár ■ Poisson data arise when data are: - counts/frequencies of individuals, species, cells - events of behaviour, etc. - always positive integers - counts are often low (including 0) • we count how many times an event occurred but we do not know how often it did not occur (we do not know n) moment: ^^^^^^^^^^U IF die • x2 test (chisq. test) to analyse 2-dimension tables • Fisher exact test (fisher .test) to analyse 2x2 tables • Mantel-Haenszel test (mantelhaen .test) to analyse 3-dimension tables for independence • Log-linear analysis (loglin) to study complex frequency tables • Contingency tables (xtabs) to study effect of factors • Standard regression (lm) can be used after transformation - squareroot transformation - can predict values out of bounds (negative) • Poisson GLM (glm) to study effect of both factorial and continuous predictors •glm(..., family = poisson(link=...)) link functions: - logarithmic (log) - squareroot (sqrt) - identity (identity) • estimated parameters are on logaritmic scale (-oo, +oc) • inverse function to log is exp 1-wiy IIO¥I Background ^ Diversity of organisms changes with the age of the habitat. According to the intermediate disturbance hypothesis, the diversity increases and then decreases with age, thus being highest at medium age. ■ Design KIHBHBai^^™ In 15 apple orchards diversity of arachnids was studied on trees, The orchards were of variable age, classified into 3 classes: 0-9, 10-19 and 20-30 years old. Each class was represented by 5 orchards. > ml <- glm (divers orchard, f amily=poisson) > anova (ml f test="ChiM) Analysis of Deviance Table Model: poisson, link: log Response: divers Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL 14 38.964 orchard 2 26.246 12 12.718 1.999e-06 Coefficients : Estimate Std. Error z value Pr(>|z|) (Intercept) 2.2192 0.1474 15.051 < 2e-16 orchardolder 0.3442 0.1763 4.788 1.63e-06 orchardoldest 0.3457 0.1927 1.794 0.0727 > contrasts(orchard) <- "contr.helmertIT > m2 <- glni (divers orchard, family=poisson) > summary (rn2) + + + Coefficients : Estimate Std. Error z value Pr(>|z|) (Intercept) 2.61585 0.07186 36.404 < 2e-16 *** orchardl 0.42209 0.08815 4.783 1.68e-06 +** orchard2 -0.02545 0.05072 -0.502 0.616 > orchardl <- ordered(orchard) > m3 <- glm(divers ^ orchardl, family=poisson) > summary (m3) Coefficients : Estimate Std. Error z value Pr(>1z 1 ) (Intercept) 2.61585 0 .07186 36.404 < 2e-16 *** orchardl.L 0.24448 0 .13624 1.794 0.0727 . orchardl.Q -0.54 813 0 .11144 -4.919 8.71e-07 *** > m.3 <- glm (divers ~ orchard - 1, poisson) > summary (m3) Coefficients : orchardyoung orchardolder orchardoldest Estimate Std. Error z value Pr(>|z|) 2.21920 3.06339 2.56495 0 .14744 0 .09667 0 .12403 15.05 31. 69 20. 68 <2e-16 *** <2e-16 *** <2e-16 *** > exp {confint (m3) ) Waiting for profiling to be done... 2.5% 97.5% orchardyoung 6.790864 12.12010 orchardolder 17.597063 25.71441 orchardoldest 10.090235 16.42096 • arises when dispersion parameter cp K^^^BHwarHBI i.e. the residual deviance is not similar to the residual degrees of freedom _ - overdispersion: variance is larger —» cp > 1 - underdispersion: variance is smaller cp<\ • causes: - if the distribution is aggregated - if counts are not independent - lack of important variables, etc. - suspicious data • solution: use quasipoisson family • this will influence SE of parameter estimates - if cp > 1 then SE will be larger -if cp < 1 then SE will be smaller • without correction for overdispersion there would be too many false positive results (in favour of HA) • when using quasipoisson y}- and z- tests have to change to F- and t- tests Background Abundance of carabid beetles in cereals depends on abiotic and biotic factors. If we understand how abiotic factors influence abundance of carabids then we can adapt certain management practices to increase the abundance when needed. 1 Design In the field, on 21 wheat plots the abundance of carabid beetles was studied by means of pitfall traps. At every site average day temperature [°C] and average sun activity [W/m2] was recorded. Hypotheses Was abundance of beetles affected by any of the two variables? If so what is the model of the relationship? Variables temp sun abun log(/iŕ) = a + ßltempi + j82swni + ötempisuni, kde öfrwri,. ~ Poi(fit), nezávisle pro jednotlivé porosty. > ml <- glm(abun ~ temp*sunf family=poisson) > summary(ml) Coefficients (Intercept) temp sun temp:sun Estimate Std. Error z value Pr(>|z|) 4.195e+G0 4.745e-01 8.840 < 2e-16 *** -5.386e-G2 2.258e-Q2 -2.385 0.0171 * ■1.151e-03 2.364e-04 -4.869 1.12e-06 *** 6.257e-05 1.006e-05 6.221 4.95e-10 *** Signif. codes: 0 ^ -k -k -k f 0.001 \ -kk r 0.01 0.05 0.1 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 317.229 on 20 degrees of freedom Residual deviance: 98.657 on 17 degrees of freedom > m2 <- update(mlf family=quasipoisson) > anova(m2 f test="F") Df Deviance Resid. NULL temp sun temp:sun 1 1 1 153.10 27 .90 37 .57 Df Resid. Dev F Pr(>F) 20 317.23 19 164.12 24.5836 0.0001196 *** 18 136.23 4.4796 0.0493541 * 17 98.66 6.0324 0.0251002 * > m3 <- glm anova(m3, test=rrChi") 4 i m Df Deviance Resid. Df Resid. Dev P(>I Chi|) NULL 19 75 .292 temp 1 40.291 18 35 .001 2.188e-10 sun 1 12 .165 17 22 .836 4.870e-04 temp:sun 1 0.117 16 22 .719 0.732 > m.4 <- update (m3, ~.-temp: sun) > anova(m4, test=rľChi") ■■ ■■ ■ Df Deviance Resid. Df Resid. Dev P(>l Chi | ) NULL 19 75 .292 temp 1 4 0 .291 35 .001 2.188e-10 sun 1 _1 .165 17 22 .836 4.870e-04 > library(car) > Anova(m4) Analysis of Deviance Table (Type II tests) Response: abun LR Chisq Df Pr(>Chisq) temp 12.567 1 0.0003926 * * * sun 12.165 1 0.0004870 * * * > summary (m4) 4 » ■ Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.283e+Q0 2.088e-01 10.933 < 2e-16 *** temp 3.781e-Q2 1.070e-02 3.534 0.000409 *** sun 1.954e-04 5.655e-05 3.455 0.000550 *** Signif. codes: 0 0.001 0.01 0.05 0.1 1 ' 1 {Dispersion parameter for poisson family taken to be 1) Null deviance: 75.2 92 on 19 degrees of freedom Residual deviance: 22.836 on 17 degrees of freedom AIC: 135.76 Some spiders are specialised in their diet. Specialisation can involve evolution of physiological and behavioural traits, such as prey-specific venom and number of attacks. Design In the lab, the number of attacks of an ant-eating spider on ants of two subfamilies was observed. For each subfamily 20 species of ants were used. Each ant species was tested once. For each ant body size was recorded as it may influence its susceptibility to venom. Hypotheses Was the number of attack related to ant size? Was the number of attacks similar for ants of both subfamilies? What is the shape of the relationship? Variables ANT: famA, famB size number > ml <- glm(number * size*ant, family=poisson) > anova(mlf test=MChiM) NULL s ize ant s ize:ant Df Deviance Resid. 1 93.395 1 75,555 1 25.804 Df Resid. Dev P(>|Chi|) 39 215.561 38 122.167 4.284e-22 37 46.612 3,554e-18 36 20.808 3.779e-07 > summary (ml) m m m Coefficients : Estimate £ (Intercept) 0.89794 size -0.02154 antfamB -2.66924 size:antfamB 0.70407 Signif. codes: 0 d. Error z value Pr (>|z| ) 0.64904 1.383 0.166512 0.12456 -0.173 0.862735 0.80637 -3.310 0.000932 *** 0.14579 4.829 1.37e-06 *** 0.001 0.01 0.05 0. {Dispersion parameter for poisson family taken to be 1) Null deviance: 215.561 on 39 degrees of freedom Residual deviance: 20.803 on 36 degrees of freedom AIC: 153.15 A Residuals vs Fitted B o CM - ol3 34 o 0 □ o °\ 0 V ° O 0 a o "O" o 0 0 0 o 32° 0.5 .0 i 1.5 2.0 2.5 3.0 Predicted values glmfnumber ~ size * ant] o o d o d d o si; > m2 <- glm(number ^ poly(sizef2)*antr poisson) > anova(mlf m2 , test=MChi") Analysis of Deviance Table Model 1: number - size * ant Model 2: number - poly(size, 2) * ant Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 36 20.3084 2 34 20.7673 2 0.0411 0.9797 Jnumber = a + fisizei + ysxze] + e kde e. ~ N(0, cr2), nezávisle pro jednotlivá pozorování. > m3 <- lm (sqrt (number) -* size + I(sizeA2), subset= (ant==" f amB,T) ) > anova(m3) Analysis of Variance Table Response: sqrt (number) Df Sum Sq Mean Sq F value Pr(>F) size 1 28.3476 28.3476 161.0631 4.253e-10 *** I(size^2) 1 1.1930 1.1930 6.7783 0.01855 * Residuals 17 2.9921 0.1760 o =) ač A Residuals vs Fitted B o CM 1 E o Fitted values lm(sqrt(number)~size + l(sizeA2)) size Background Some predators use conditional strategies to catch prey. The use of strategy often depends on the characteristics of prey. Design In the field, it was observed which of three strategies spiders used to capture prey. For each trial, size (two size classes) and movement (slow or fast) of prey was recorded. Altogether 88 trials were observed. Hypotheses Is use of strategy influenced by prey size and its movement? If so which prey is captured by strategy A, B and C? Variables PREY: fast, slow SIZE: large, small STRATEGY: stratA, stratB, stratC freq A B stratA stratB stratC stratA stratB stratC strategy strategy > ml <- glm(freq ~ strategy*size*preyF family=poisson) > summary(ml) Call: glm(formula = freq ~ strategy * size * prey, family = poisson) Deviance Residuals: [1] 000000000000 Coefficients : Estimate Sto L. Error z value Pr(>|z|) {Intercept) : .485e+00 2. 887e-01 8 .608 <2e-16 strategystratB -4 .Q55e-01 4. 564e-01 -0.888 0. 3744 strategystratC -1 .792e+00 7. 638e-01 -2.346 0. 0190 sizesmall 5 .596e-01 3. 619e-01 1 .546 0. 1220 preyslow -1 .823e-01 4. 282e-01 -0.426 0. 6702 strategystratB: sizesmall -2 .594e+01 6. 965e+G4 -0 .000372 0. 9997 strategystratC: sizesmall -1 .253e+G0 1. 277e+00 -0.981 0. 3266 strategystratB: preyslow 4 .055e-01 6. 390e-01 0 .635 0. 5257 strategystratC: preyslow -5 .108e-01 1. 297e+00 -0.394 0. 6938 sizesmall:preyslow B .224e-02 5. 325e-01 0 .154 0. 8773 strategystratB: sizesmall :preyslow : .433e+01 6. 965e+G4 0 .000350 0. 9997 strategystratC: sizesmall :preyslow -2 .269e+01 6. 965e+04 -0 .000326 0. 9997 > anova(ml, test="Chi") + + + Df Deviance NULL strategy 2 64. .205 s ize 1 0. .045 prey 1 0. .000 strategy:size 2 15, . 939 strategy:prey 2 2. . 962 size:prey 1 0. .507 strategy:size:prey 2 4, .307 Resid. Df Resid. Dev P(>|Chi|) 11 87 .966 23.761 1 .143e-14 8 23.715 0 .331 7 23.715 1.000 5 7 .776 3 .458e-04 3 4 .814 0 .227 2 4 .307 0 .476 0 3.033e-10 0.116 > m2 <- update(mlr ~.-strategy: size rprey) > anova(m2, test=M Chi") ■ ■ ■ Df Deviance Resid . Df Resid. Dev P (> IChi|) NULL 11 87. 966 strategy : 64 .205 9 23. 761 1.143e-14 size 1 0 .045 6 23. 715 0.831 prey 1 0 .000 7 23. 715 1.000 strategy:size : 15 .939 5 7. 776 3.458e-04 strategy:prey : 2 .962 3 4. 814 0.227 size:prey 1 0 .507 2 4. 307 0.476 > m.3 <- update (m2 , ~*. - strategy: prey) > anova (m.3 r test=" ChiM) + + + Df Deviance Resid . Df Resid. Dev P(>|Chi|) NULL 11 87. 966 strategy 2 64 .205 9 23. 761 1.143e-14 size 1 0 .045 8 23. 715 0.831 prey 1 0 .000 7 23. 715 1.000 strategy:size 2 15.939 5 7. 776 3.458e-04 size:prey 1 0 .045 4 7. 731 0.831 > summary (m3) Call: glm(formula = freq - strategy + size + prey + strategy:size + size:prey, family = poisson) Deviance Residuals: : : 3 4 5 t 7 -0.3233 1.2076 - : .0111 -0.2297 0 .3990 -0 . 4079 0.3227 S 9 10 11 12 -1.9777 0.6395 G .2194 -0.4077 0 .3585 Coefficients : Estimate Std. Error z value Pr(>|z|) (Intercept) 2 .42083 0 .26010 9 .307 < 2e-16 *** strategystratB -0 .20067 0 .31782 -0 . 631 G .527782 strategystrate -1 .99243 0 .61546 -3 .237 G .001207 ** s izesmall 0 .55237 0 .34042 1 . 623 G .104669 preyslow -0 .04652 0 .30508 -0 .152 G .878805 strategystratB: sizesmall -2 .10191 0 .61318 -3 .428 G .000608 *** strategystratC: sizesmall -1 .69645 1 .18481 -1 .432 G .152193 sizesmallipreyslow 0 .09097 0 .42662 0 .213 G .831142 > attacks <- tapply (predict (m3 , type=MresponseTT) , list (size, strategy) , mean) > attacks stratA stratB stratC large 11 9 1.5 small 20 2 0.5 30- 25- in ij 20H o Ž 15H o 10- 5- 0- I a roe stratA stratB srnal stratA stratB + stratC Strategy stratC _I_ ■ NB is a parametric alternative to Poisson model with overdispersion • distribution of y is strongly asymmetric with many zeros • NB has two parameters, ju and 6 • moments: - 6 is aggregation parameter (0?oo) - if 6 > 1 .. random distribution, 6 < 1 .. aggregated distribution - 6 can be estimated from glm. nb (f ormula) from MASS library • links: log (default) sqrt identity • begin with Poisson model, if overdispersion is large switch to glm. nb Grain beetles are serious pests in grain stores. They may occur not only in the grain but also in crevices of corridors. It is essential to know where they occur before control methods are applied. Design Density of grain beetles was surveyed in a grain store by means of sticky traps. Traps were installed in two places: 25 traps in the corridors and 25 traps in the grain. After few days number of beetles was recorded. log(fy) = a + PLACEj, kde density. ~ Poi(ß), nezávisle pro jednotlivé pasti. > ml <- glm(density ~ placef family=quasipoisson) > anova(mlf test="F") m- m- -m Df Deviance Resid. Df Resid. Dev F Pr(>F) NULL 4 9 80 2 6.5 place 1 1350.1 48 6676.4 6.0434 0.01762 * > summary (nil) ■■■■■■ Coefficients : Estimate Std. Error t value Pr(>|t|) (Intercept) 4.5161 0.3125 14.45 <2e-16 *** placegrain -1.6280 0.7715 -2.11 0.0401 * Signif. codes: 0 0.001 0.01 **' 0.05 0.1 1 ' 1 (Dispersion parameter for quasipoisson family taken to be 223.3983) log(ty) = a + PLACE., kde density j ~ NB(pip 0), nezávisle pro jednotlivé pasti. > tapply(density, place, var)/tapply(densityf placef mean) floor grain 386.53096 60.20546 > tapply(density, place, function(x) mean(x)A2/(var(x)-mean(x))) floor grain 0.2372524 0.3033504 > library(MASS) > m2 <- glm.nb(density ~ place) > anova(m2) Analysis of Deviance Table Model: Negative Binomial(0.3318), link: log Response: density Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL 49 70.174 place 1 9.877 48 60.297 0.002 Warning message: In anova.negbin(m2) : tests made without re-estimating 'theta' > summary (m.2) Call: glm.nb(formula = density - place, init.theta = 0.33184400 6124825, link = log) Coefficients : Estimate Std. Error z value Pr(>|z|) (Intercept) 4.5161 0.3 47 8 12.98 4 < 2e-16 *** placegrain -1.6280 0.4937 -3.297 0.000976 *** Signif. codes: 0 0.001 0.01 0.05 *0.1 1 ' 1 (Dispersion parameter for Negative Binomial(0.3318) family taken to be 1) Null deviance: 70.174 on 4 9 degrees of freedom Residual deviance: 60.297 on 48 degrees of freedom AIC: 430.95 Number of Fisher Scoring iterations: 1 Theta: 0.3318 Std. Err.: 0.0610 2 x log-likelihood: -424.9480 > a <- split(x=density, f=place) > m.3 <- glm, nb (floor ~ 1) > summary (m3) m- m- -m Null deviance: 31.307 Residual deviance: 31.307 AIC: 245.47 on 24 degrees of freedom on 24 degrees of freedom Number of Fisher Scoring iterations: 1 > m4 <- glm«nb(grain ~ 1) > summary {m.4) ■ m m Null deviance: 2 9.197 Residual deviance: 2 9.197 AIC: 186.78 on 24 degrees of freedom on 24 degrees of freedom Number of Fisher Scoring iterations: 1 Theta: 0.39 9 Std. Err.: 0.111 2 x log-likelihood: -182.780 > m.5 <- glm. nb (density ~ place-1) > exp {confint 1) • Binomial GLM (glm) to study effect of both factorial and continuous predictors • glm(..., family = binomial(link=...)) link functions: - logit (logit) - probit (probit) - complementary logit (cloglog) Data format: • Binomial distribution ... individuals within a group are homogenous - two vectors (y? n-y) or (y? n) of integers • Bernoulli (binary) distribution ... individuals within a group are heterogenous, each characterised by a continuous character -n=\ - single vector of O's or 1 's Background Some weed seeds may germinate following water priming (by rain) more than others thus attaining likely competitive advantage. m Design ^^^^^m^^^^m^H The effect of water priming on the germination of weed seeds of 4 genera was studied in the laboratory. Each of 5 days 400 seeds of each genus were sown (200 seeds on control and 200 seeds on wet soil). Altogether 2000 seeds per genus were sown. Germination was recorded thereafter. Based on assumption of similar conditions during 5 days, data from 5 days were pooled. Hypotheses • Does water priming promote germination? • If it does was the effect similar for all four • Which species germinated most and least? ienera? Variables: TREATMENT: control, water GENUS: genA, genB, genC, genD germ > y <- abind(germ, n-germ) > ml <- glm(y ~ genus*treatmentf family=binomial) > anova(mlf test="ChiM) Analysis of Deviance Table Model: binomial, link: logit Response: y Terms added sequentially (first to last) Df NULL genus 3 treatment 1 genus:treatment 3 Deviance Resid. Df 7 638.74 4 30.23 3 0 .37 0 esid. Dev P(>|Chi|) 669.34 30.60 4.026e-138 0.37 3.840e-08 1.212e-13 0.95 > m2 <- update(mlf ~.-genus:pesticide) > summary(m2) Coefficients (Intercept) genusgenB genusgenC genusgenD Estimate Std. Error z value Pr(>|z|) 0 .56138 -1.59933 -1.15462 -1.06030 0.05256 10.681 treatmentwater 0.25859 0 0 0 0 06860 06614 06583 04710 -23.313 -17.457 -16.106 5.491 <2e-16 *** <2e-16 -k -k -k <2e-16 *** <2e-16 *** 4e-08 *** > 1/ (1+ exp(-0.56138)) [1] 0.6367718 > 1/(1 + exp(-0.56138+1.59933)) [1] 0.2615457 > genus1 <- genus > levels(genusl) [1] "genA" "genB" "genC" "genD" > levels(genusl)[3:4] <- "genCD" > m3 <- glm(y ~ genusl + treatment, binomial) > anova(m2, m3, test="Chi,T) Analysis of Deviance Table Model 1: y - genus + treatment Model 2: y - genusl + treatment Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 3 0.37316 2 4 2.49523 -1 -2.12207 0.14519 > summary(m3) m- m- -m Coefficients : Estimate Std. Error z value Pr(>|z|) (Intercept) genuslgenB genuslgenCD -1.59933 -1.10723 0.56141 0.05749 0.06860 0.05256 23.31 19.26 10.68 < 2e-16 *** < 2e-16 *** < 2e-16 *** treatmentwater 0 .25852 0.04709 5.49 4.02e-08 *** > genus2 <- genus1 > levels(genus2) [1] "genA" "genB" "genCD" > levels(genus2) [2 : 3] <- "genBCD" > m4 <- glm(y ~ genus2 + treatmentf binomial) > anova (m.3 f m.4 f test="Chi,T) Analysis of Deviance Table Model 1: Model 2: Resid. 1 2 y - genus1 + treatment y - genus2 + treatment Df Resid. Dev Df Deviance P(>|Chi|) 4 2.495 5 73.684 -1 -71.189 3.246e-17 • statistical and biological effects are not identical • statistical effects are affected by precision of measurements, number of measurements, type of test • Cohen's coefficient: • h < 0.2 ... weak effect • h > 0.8 ... strong effect > both <- paste(pesticide,genus! > m4 <- glm(y ~ factor(both) - 1, binomial) > 1/(1+exp (-confint (m4 ) ) ) (Waiting for profiling to be done... 2.5% 97.5% factor (both)control genA 0.6048442 0.6644666 factor (both)control genB 0.2315221 0 .2857134 factor(both)control genCD 0.3485230 0.3908104 factor(both)water genA 0.6670153 0.7239840 factor(both)water genB 0.2896266 0.3473026 factor(both)water genCD 0.4044330 0.4477560 • arises when dispersion parameter cp P^EffWffpfWBI - overdispersion: variance is larger —» cp > 1 - underdispersion: variance is smaller —» cp < 1 • causes: - if the model is mispecified - lacks important explanatory variables - relative frequency is not constant within a group • solution: use quasibinomial family in which variance is estimated as ^^^^^^^^^^ instead of m this will influence SE of parameter estimates - if (p > 1 then SE will be larger - if cp < 1 then SE will be smaller changes P values • when using quasibinomial %2- and z- tests have to change to F- and t- tests Rtartssioii Background ^ Production of eggsac is influenced by a number of variables, such as body size, i.e. amount of consumed food. For an experimental study we need to be able to predict probability of production at a range of body sizes. 9^^^BI^^^H^^^H Design ^^^^^^■■■^^^^B In the laboratory, production of eggsacs was studied in a spider with a variable body size [mm]. As the body size was measured with the precision of 0.5 mm, all 160 individuals were classified into size classes each containing 15 to 30 specimens. Females that produced eggsac were recorded. Hypotheses • Is eggsac production related to the body size? • If it is what is the shape of the relationship? • What is the model that can be used to predict e; for spider sizes of 3-12 mm? Variables: ;sac production > tr <- asin(sqrt(p)) > ml <- lm(tr ~ body + I(bodyA2), weights=n) > summary(ml) Coefficients : Estimate Std. Error t (Intercept) -2.34592 0.59329 body 1.30161 0.24776 I(bodyA2) -0.11121 0.02433 > m2 <- update(ml, -.-I(bodyA2)) > summary(m2) ■ m m Coefficients : Estimate Std. Error t value Pr(>|t|) (Intercept) 0.28836 0.31429 0.918 0.4010 body 0.17649 0.06279 2.811 0.0375 * value Pr(>|t|) -3.954 0.01676 * 5.254 0.00628 ** -4.571 0.01025 * body body > y <- cbind(eggs, n-eggs) > m3 <- glm(y ~ body + I(bodyA2), family=binomial) > summary(m3) m -m m Coefficients : Estimate Std. Error z value Pr(>|z|) (Intercept) -13.7857 3.8482 -3.582 0.000340 *** body 5.7218 1.6771 3.412 0.000645 *** I(bodyA2) -0.4825 0.1695 -2.846 0.004427 ** Signif. codes: 0 0.001 ***' 0.01 %*' 0.05 *.r 0.1 1 ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 44.2136 on 6 degrees of freedom Residual deviance: 3.3357 on 4 degrees of freedom > summary (m.4) ■ ■ ■ Coefficients : Estimate Std. Error z value Pr(> z|) (Intercept) -3.9270 1.1038 -3.558 0.000374 *** body 1.2079 0.2756 4.383 1.17e-05 *** Signif. codes: 0 *# o .001 ***' 0.01 **' 0.05 y.r 0.1 1 ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 44 .214 on 6 degrees of freedom Residual deviance: 11 .072 on 5 degrees of freedom > m.5 <- update (m.4 f f amily=quasibinomial) > summary (m5) (Dispersion parameter for quasibinomial family taken to be 3.332466) > anova(m5, test="F") + * * Df Deviance Resid. Df Resid. Dev F Pr(>F) NULL 6 44 .214 body 1 33.141 5 11 .072 9. 945 0.02528 * body Background i>iti mm. Synthetic insecticides often have a species-specific efficiency. The recommended doses or concentrations then have to adjusted. Design In the laboratory an effect of an insecticide on the mortality of two aphid species was studied. The insecticide was applied at 6 concentrations [ppm]. Each concentration was tested on 30 individuals of both aphid species. Hypotheses •Is mortality affected by the concentration? • Was the efficiency similar for both species? • What is the LC50 (i.e. 50% lethal concentration) for both species? Variables: SPECIES: A, B cone dead log / 7t ^ \ = a + SPECIES . + fJlogiconCi) + ôjlogiconc^)^ kde decide ~ Binin^ nezávisle pro jednotlivá pozorování. > y <- cbind(dead, n-dead) > ml <- glm (y ** log (cone) *species F binomial) > anava(ml) ■■■■■■ Df Deviance Resid . Df Resid. Dev P(>1 Chi 1) NULL 11 185 .807 log(conc) 1 110.170 10 75 . 638 8.996e-26 species 1 62.087 9 13 .551 3.286e-15 log(cone):species 1 1.343 8 12 .207 0.246 > m2 <- update(mlf ^. -log(cone) :species) > anova(m2) m- m- -m Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL 11 185.807 log(conc) 1 110.170 10 75.638 8 .996e-26 species 1 62.087 9 13.551 3.286e-15 > summary (m2) Coefficients : Estimate Std. Error z value Pr(>|z|) (Intercept) 1.3825 0.2201 6.280 3.39e-10 *** log(conc) 1.2328 0.1348 9. 146 < 2e-16 *** speciesB -2.2117 0.3180 -6.955 3.52e-12 *** Signif. codes: 0 0.001 0.01 %*' 0.05 0 (Dispersion parameter for binomial family taken to be Null deviance: 185.807 on 11 degrees of freedom Residual deviance: 13.551 on 9 degrees of freedom > m.3 <- glm(y ~ species + log (cone) - 1, binomial) > summary (m3) Coefficients : Estimate Std. Error z value Pr(>|z|) speciesA speciesB log(cone) 1.3325 -0.8293 1.2328 0 .2201 0 .2020 0.1343 > library(MASS) > dose.p(m3, cf=c(l,3), p=0.5) Dose SE p = 0.5: -1.121418 0.1627097 > dose.p(m3, cf=c(2,3), p=0.5) Dose SE p = 0.5: 0.6726313 0.159251 6.230 3.39e-10 *** 4.106 4.02e-05 *** 9.146 < 2e-16 *** exp(-1.121) - 0326 exp(0,673) = 1.96. Granivorous ants collect various seeds and bring them into nest. Sympatrically occurring species may show trophic niche partitionin related to the size of collected seeds. Design Seed preference of two ant species was studied in the laboratory. Each of 25 ants of both species was offered seeds of variable size expressed as its weight [mg]. Response of ants was classified as "yes" or "no" if it took or refused to take a seed, respectively. Hypotheses • Is acceptance related to the seed size? • Did both species have similar preference for seed sizes? • If not what is the threshold size of seeds for both species? (The threshold size is defined as a size that is accepted with higher than 90% probability) Variables: SPECIES: specA, specB seed take 0.8- O.ó- 0.4- 0.2- 0.0- O ODWQ DO O O QEO CD i-1-1-1-1-1- i-1-1-1-1-r 0.0 0.5 1.0 1.5 2.0 2.5 seed > ml <- glm(take - seed*speciesf family=binomial) > summary(ml) ■ m m Coefficients : Estimate Std. Error z value Pr(>1z 1 ) (Intercept) 4.012 1. 646 2 .437 0.01480 -J: seed -8.346 3.315 -2.517 0.01182 ■J; speciesspecB -10.957 3. 697 -2.964 0.00304 * * seed:speciesspecB 19.147 6.141 3.118 0.00182 * * Signif. codes: 0 \-k**r 0.001 A **' 0.01 1 * ' 0 . 05 y.r 0 .1 {Dispersion parameter for binomial family taken to be 1) Null deviance: 68.593 on 4 9 degrees of freedom Residual deviance: 24.726 on 4 6 degrees of freedom > anova(mlf test="Chi") * + * Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL 49 68 .593 seed 1 0.054 48 68 .539 0.817 species 1 0.325 47 68 .214 0 .568 seed:species 1 43.488 46 24 .726 4.267e-ll > m2 <- glm(take ** log (seed) *species, binomial) > AIC (ml, m2) df AIC ml 4 32.72631 m2 4 32.23823 > m3 <- glm(talce ** seed*species, binomial (link=cloglog) ) > AIC(m3) [1] 31.63241 • several for GLM models • McFaden's coefficient - based on likelihood of models • ranges from 0 to 1 binomial) > 1-logLik(ml)/logLik(m4) 'logLik' 0.6395213 (df=4) Residuals vs Fitted Normal Q-Q Predicted values Obs. number > (log<0.9/0.1)-4.012)/-8.346 [1] 0.2174425 > (log<0.9/0.1)-4.012+10.957)/{-8.346+19.147) [1] 0.3464239