GOODNESS-OF-FIT TESTS WITH NUISANCE REGRESSION AND SCALE Jana Jurečková 1 Introduction Robust estimators are constructed in such a way that they are insensitive to small deviations from the assumed distribution of the model errors; for instance, the Huber estimator of the location or regression parameters is minimaximally robust over a family of contaminated normal distributions. Before using a robust estimator, likely operating well in a neighborhood of some distribution, we can try to verify a hypothesis on the shape of this distribution; otherwise, we can start with a suitable goodness-of-fit test. The %2 and the Kolmogorov-Smirnov tests are probably most well-known. Many various other tests can be found in the literature (e.g., Huber-Carol et al. (2002)). These tests work well in the simplest situation, when our observations Y\,..., Yn are independent and identically distributed with a distribution function F, and we want to verify the hypothesis H0 : F = F0, where F0 is a fully specified distribution function. However, the hypothetical distribution function F0 is often specified only up to several unknown parameters, e.g., up to the location, scale or regression parameters. This is a typical situation: our observations can follow a linear regression model, whose parameters we want to estimate by a suitable robust estimator, and an approximate knowledge of the shape of the distribution of errors would lead to a good choice of the score function. This situation is more realistic, but the standard goodness-of-fit tests then lose their simplicity. Taking these facts into account, we want to offer some goodness-of-fit tests on the shape of the distribution in the presence of nuisance regression and scale parameters. 2 Tests of normality of the Shapiro-Wilk type with nuisance regression and scale parameters If the distribution seems to have a symmetric unimodal density, then the first natural idea is to test for its normality. A highly intuitive goodness-of-fit test of normality with nuisance location and scale parameters was proposed by Shapiro and Wilk (1965). Their test has received considerable attention in the literature; its asymptotic null distribution was later studied by de Wet and Venter (1973), and recently by Sen (2002). Because we often also have a nuisance regression, we shall describe an extension of the Shapiro-Wilk test of normality to the situation with nuisance regression and scale parameters, 1 2 constructed by Sen, Jurečková and Picek (2003). Their test is based on the pair of two estimators of the standard deviation of errors in the linear regression model, namely on the maximum likelihood estimator and on an L-estimator. Similar to the Shapiro-Wilk test, the asymptotic equivalence of these estimators is a characteristic property of the normal distribution of the errors, i.e., it is true only under the normality, and thus provides a test. Let Yi,..., Yn be independent observations following the linear model Yi = 9 + x'iß + aei, i = l,...,n (2.1) where x» E Mp, i = 1,..., n are given regressors, not all equal, 9 E M1, ß E Rp and a > 0 are unknown intercept, regression and scale parameters, and the errors e j are independent and identically distributed according to a continuous distribution function F with location 0 and scale parameter 1. We want to test the hypothesis H0 : F = $, against Hi : F = Fi ^ $ (2.2) where $ is the standard normal distribution function, F\ is a general nonnormal distribution function, and 9, ß, and a are treated as nuisance parameters. For the special location-scale model (i.e., when ß = 0), Shapiro and Wilk (1965) proposed a goodness-of-fit test based on two estimators of er: Ln, the BLUE (best linear estimator) under H0, and an, the maximum likelihood estimator (MLE) under H0. Suppose that Y\,... ,Yn are i.i.d. observations with the distribution Af(ß,a2). Then the MLE of a is an, where K = n-^(Yt-Y^ (2.3) i=\ The best linear unbiased estimate (BLUE) Ln of a has the form n J-'n / J Q"ni* ní V J i=\ where a,= (ai,...,ara) = (M;V-1Mra)-1(M;V;1), a^ = 0 (2-5) and where Mra = M denotes the vector of expected values of order statistics and Vn = V is their variance matrix. Shapiro and Wilk (1965) modified the BLUE of a to Ln0 = Yh=i am,oYni where (a„i)0, • • •, ann>0)' = a„o is such that , _ M'V"1 ara° ~ (M'V-iV-iM)172 then a^lra = 0 and a^a^o = 1- Ln0 is asymptotically equivalent to T» = -E$"1í^rrVnť (2.7) 3 (see, e.g., Serfling (1980)). Let us write the Shapiro-Wilk criterion in the form L Wn = n\l--^\ (2.8) Two scale estimators Ln0 and an are asymptotically equivalent if and only if F = $, i.e., if the hypothesis of normality is true, while under nonnormal alternative F\ with the finite second moment, the sequence \Jn í 1 —-fr j has a nondegenerate asymptotic (normal) distribution. It means that the test criterion is consistent with respect to the non-normal alternatives. We propose the goodness-of-fit test of the hypothesis (2.2) of the normality, based on the observations Yi,..., Yn, following the linear regression model (2.1) with unknown 9, ß and a. The test criterion is Wn = nll-^\ (2.9) where 1 '" - "£(y* - F™ - $*# n n í=i n Ĺn = Y,anŕ™ (2.10) í=l are the residual variance and the linear estimator of a with anito, i = 1,..., n defined in (2.6) and the rn:i are the order statistics corresponding to the residuals ~7 rni = Yi-Yn-ßxi, i=l,...,n (2.11) We assume that the n x p matrix Xra = [xi,..., xra]' satisfies X'lra = 0, Rank(Xra) = p < n - 1 and (2.12) max hn>ii = ö(n~l) as n —► oo (the balanced design) Kiij = Xj(X'X) 1Xj, i,j = l,...,n. The MLE of parameters 9, ß, a under normal $ have the form Un 1 n n ±n 1 n V \ 6«,, G« IT' *-n n ßn = (XX)-1^ = ß + ra (2.14) 4 where ra is the asymptotic critical value of the Shapiro-Wilk test of normality with nuisance location and scale. The coefficients anito, i = 1,..., n and the critical values of the original Shapiro-Wilk test for n < 50 are tabulated in Shapiro and Wilk (1965). The critical values for n > 50 we have calculated by a Monte Carlo procedure. 3 Goodness-of-fit tests for general distribution with nuisance regression and scale Consider again the linear model (2.1), but this time we have another distribution of errors e\,...,en, in mind, and we want to test the hypothesis H0 : F(e) = F0(e/a), for a specified distribution function F0. This is not necessarily normal, but for simplicity we assume that F0 G J7, the class of distribution functions is symmetric around 0 and possessing a positive density, finite variance and finite Fisher's information. Similarly as in the test of normality, our proposed test of H0 is based on the ratio of two scale statistics: the first one is based on regression rank scores äni(a), i = 1,... ,n, 0 < a < 1, introduced in Section 4.6, and the second one is an extension of the interquartile range to the regression quantiles. If F0 E J7, we may choose the score generating function ipo(u) = F0~ľ(u) or ~fó(Fo1(u))/fo(F~~1(u)), 0 < u < 1 (if F0 is strongly unimodal with finite Fisher information), because F(e) = F0(e/a), F_1(m) = uF~~l{u) under H0. Our proposed test is based on the statistic Tn* = n*{log |g-log£(Fo)} (3-15) where n Sn0 = Sn0(Y) = n-'Ys^bm = n-'Y'K (3.16) and bn = (bn\,..., bnn) are the regression scores generated by ipo in the following way: l>ni = - / , ! e2dFo(e) - \ß2 7ľl = WŘW) (3-22) ß2= e2dF0(e), ßA= l e4dF0(e) Jr Jr qn is the first diagonal element of the matrix D_1 where D = lim^^oo Dn = lim™-^ n_1X'X. Then r^ does not depend on a and is positive unless |^y = £(F0) (what happens with probability 0). We are almost ready to formulate the critical region of the test; however, we should think over alternative distributions F against which we wish to have the test consistent. We shall introduce the following one- and two-sided alternatives of H0. For a pair (F0, F) of distributions, let A(F0, F) = So(F)^l^ - So(F0) (3.23) and set the partial ordering F y F0 or F -< F0 accordingly A(F0,F) is > or < 0 This partial ordering is linked to Hájek's (1969) interpretation of F having heavier or lighter tails than F0. Consider the following alternatives to H0: H^ : F y Fo, Hr : F -< F0, Uf : H^ U H^ Then • we reject H0 in favor of H^ on the asymptotic significance level a if rp* Tbl 6 • we reject H0 in favor of H^ on the asymptotic significance level a if rp* TOI we reject H0 in favor of R"f on the asymptotic significance level a if > Ui rp* n 2 where ua = $ ľ(l — a) and $ is the standard normal distribution function. 4 Numerical illustration 4.1 Comparison of tests for testing normality Let us illustrate the performance of the proposed test on the simulated regression model. Concerning the design matrix, we generate three columns as independent identically distributed random variables with uniform distribution on (—10,10) with the first column ln added; ß = (2, —2, 1,-1)' and consider 25 rows for it. The errors were generated from the following densities: normal Af(0,1) : /(*) = i -Í-= ~/fee 2 normal A/"(0, 4) : /(*) = 1 -^ = —7=e 32 logistic (0,1) : /(*) = e-x - (1+e-z)2 logistic (0, 4) : /(*) = e-x/4 (1+e-^/4)2 Laplace (0,1) : /(*) = - le-\x\ Laplace (0,4) : /(*) = - 4p-4|x| Cauchy: /(*) = 1 In order to gain insight into larger sample size behavior for our proposed tests, we also generate the design matrix of 100, 250 and 500 rows, respectively; in each case the errors &i are generated to insure independence. 1000 replications were simulated for each case. Based on these data, we calculated the test statistics hüL | — r, I 1 - (^ľ=l ani,0rn:i) š2 \ Tn r2 Wn = n[l-% UJl-^^] (4.24) where r • — D"1/2 (y- — 6 - x'/3 7 and xnO — (Onl,0) • • • j Q>nnfl) — M'V (4.25) (M'V-iV-iM)172 Because the asymptotic null distributions of the test statistics of the Shapiro-Wilk type test Wn are not known for n > 50, they were approximated by the following simple Monte Carlo procedure: For a fixed n, a random sample of size n from the normal distribution was generated and Wn was computed, and this random experiment was repeated 100, 000 times. For the sake of comparison, the nonparametric test of Section 7.3 was performed on the same data for testing the normality. Tables 1-4 give the numbers of rejections of H0 (among 1000 tests) for both statistics described above. Distribution of errors a=0.01 a=0.05 a=0.1 wn rp* n wn rp* n wn rp* n Normal N{0,1) Normal N(0, 4) Logistic (0,1) Logistic (0,4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1) 21 9 42 42 141 148 838 13 12 22 19 56 54 624 42 54 136 131 275 255 900 61 57 76 76 127 120 720 105 90 175 182 365 320 920 142 118 139 130 212 216 765 Table 1: Numbers of rejections of H0 among 1000 cases on level a for matrix (25x4) Distribution of errors a=0.01 a=0.05 a=0.1 Wn rp* n wn rp* n Wn rp* n Normal N(0,1) 8 12 46 60 104 143 Normal N(0,4) 8 12 47 57 96 119 Logistic (0,1) 51 22 119 76 174 137 Logistic (0, 4) 44 18 112 76 158 131 Laplace (0,1) 333 63 499 127 581 210 Laplace (0, 4) 354 60 539 125 616 224 Cauchy (0,1) 1000 620 1000 730 1000 773 Table 2: Numbers of rejections of H0 among 1000 cases on level a for matrix (100x4) Distribution of errors a=0.01 o;=0.05 a=0.1 Wn rp* n wn rp* n wn rp* n Normal N(0,1) Normal N(0,4) Logistic (0,1) Logistic (0, 4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1) 10 10 52 44 546 563 1000 9 13 88 86 669 687 1000 49 50 96 98 721 719 1000 56 57 206 197 836 832 1000 96 100 147 146 782 796 1000 107 113 302 289 890 891 1000 Table 3: Numbers of rejections of H0 among 1000 cases on level a for matrix (250x4) Distribution of errors a=0.01 o;=0.05 a=0.1 Wn rp* n Wn rp* n Wn rp* n Normal N(0,1) Normal N(0,4) Logistic (0,1) Logistic (0, 4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1) 10 9 56 49 869 856 1000 15 10 199 185 967 963 1000 56 56 96 95 937 926 1000 53 51 389 369 991 990 1000 102 100 142 136 960 939 1000 107 101 495 489 995 995 1000 Table 4: Numbers of rejections of Hq among 1000 cases on level a for matrix (500x4) 9 4.2 Testing for nonnormal distributions We used the test of Section 3 for verifying the following three null hypotheses: (i) H0 : F = logistic Logistic scores for Sno : ipo(u) = logu — log(l — u), 0 < u < 1 (ii) H0 : F = normal Normal scores for Sno : ipo(u) = $_1(«), 0 < u < 1 where $ is the standard normal distribution function. (iii) H0 : F = Laplace Laplace scores for Sno :) , s _ í log 2« 0 < u < 0.5 '^u>- \ -log(2(l-u)) «>0.5 The errors were generated by sampling from the hypothetical F0 (normal, logistic, Laplace), and from the following alternative densities: -si-normal Al (0,1) : f(x) = -?k=g 2 -si-normal Al (0,4) : f (x) = 775= e 32 logistic (0,1) : f (x) = (1^e-s)2 logistic (0,4): f (x) = {1*~-í4/4)2 Lapiace (0,1) : f (x) = |e"|:c| Lapiace (0,4) : f (x) = fe"4^ Cauchy: f (x) = ľ(1^2) 1000 replications were simulated for each case. Based on these data, we calculated the test statistics ?: = ^ {log l^-logÉ(Fo)} for the hypothesis H0 : F = F0, (ß,a unspecified). We took the regression interquartile range ß\{\) — ßi{\) in the role of Sn\. Tables 5-7 give the numbers of rejections of H0 (among 1000 tests) for all above cases. 10 I. Hq : F = LOGISTIC (0,1) (i.e., we used logistic scores). Distribution n=27 n=108 a=0.01 o;=0.05 a=0.1 a=0.01 o;=0.05 a=0.1 Normal N(0,1) Normal N(0,4) Logistic (0,1) Logistic (0, 4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1) 19 18 24 18 21 22 513 124 92 93 86 87 100 632 213 197 169 181 149 169 687 45 45 12 10 84 88 999 163 187 61 62 199 210 1000 279 286 127 114 286 298 999 Table 5: Numbers of rejections of H0 among 1000 cases on level a for matrix (25x4) and for matrix (100 x 3) II. H0 : F = NORMAL (0,1) (i.e., we used van der Waerden scores). Distribution n=27 n=108 a=0.01 o;=0.05 a=0.1 a=0.01 o;=0.05 a=0.1 Normal N(0,1) Normal N(0,4) Logistic (0,1) Logistic (0, 4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1) 11 11 22 18 55 53 626 60 59 77 77 129 124 722 142 129 136 132 212 216 765 12 11 40 38 337 331 1000 52 53 116 122 516 509 1000 116 112 183 192 611 607 1000 Table 6: Numbers of rejections of H0 among 1000 cases on level a for matrix (25x4) and for matrix (100 x 3) 11 III. H0 : F = LAPLACE (0,1) (i.e., we used Laplace scores). Distribution n=27 n=108 a=0.01 a=0.05 a=0.1 a=0.01 a=0.05 a=0.1 Normal N(0,1) Normal N(0,4) Logistic (0,1) Logistic (0, 4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1) 43 42 37 44 19 24 342 259 251 189 203 121 131 452 424 378 319 337 189 231 515 366 365 153 144 11 9 971 689 687 378 375 69 73 991 806 802 516 511 135 145 1000 Table 7: Numbers of rejections of Ho among 1000 cases on level a for for matrix (25x4) and for matrix (100 x 3) References: C. Huber-Carol, N. Balakrishnan, M. S. Nikulin, M. Mesbah (eds.) (2002). Goodness-of-Fit Tests and Model Validity. Parkhäuser, Boston. J. Jurečková and J. Picek (2005): Robust Statistical Methods with R. Chapman & Hall/CRC. J. Jurečková, J. Picek and P. K. Sen (2003). Goodness-of-fit test with nuisance regression and scale. Metrika 58, 235-258. J. Jurečková and P. K. Sen (1996). Robust Statistical Procedures: Asymptotics and Interrelations. John Wiley & Sons, New York. P. Royston (1982a). An extension of Shapiro and Wilk's W test for normality to large samples. Appl. Statistics 31, 115-124. P. Royston (1982b). Algorithm AS 181: The W test for normality. Appl. Statistics 31, 176-180. P. Royston (1995). A remark on algorithm AS 181: The W test for normality. Appl. Statistics 44, 547-551. P. K. Sen (2002). Shapiro-Wilk type goodness-of-fit tests for normality: Asymptotics revisited. Goodness-of-Fit Tests and Model Validity (C. Huber-Carol et al., eds.), pp. 73-88. Birkhäuser, Boston. P. K. Sen, J. Jurečková and J. Picek (2003). Goodness-of-fit test of Shapiro-Wilk type with nuisance regression and scale. Austrian J. of Statist. 32, No. 1 & 2, 163-177. S. S. Shapiro and M. B. Wilk (1965). An analysis of variance for normality (complete samples). Biometrika 52, 591-611.