GOODNESS-OF-FIT TESTS WITH NUISANCE REGRESSION AND SCALE
Jana Jurečková
1     Introduction
Robust estimators are constructed in such a way that they are insensitive to small deviations from the assumed distribution of the model errors; for instance, the Huber estimator of the location or regression parameters is minimaximally robust over a family of contaminated normal distributions.
Before using a robust estimator, likely operating well in a neighborhood of some distribution, we can try to verify a hypothesis on the shape of this distribution; otherwise, we can start with a suitable goodness-of-fit test. The %2 and the Kolmogorov-Smirnov tests are probably most well-known. Many various other tests can be found in the literature (e.g., Huber-Carol et al. (2002)).
These tests work well in the simplest situation, when our observations Y\,..., Yn are independent and identically distributed with a distribution function F, and we want to verify the hypothesis H0 :   F = F0, where F0 is a fully specified distribution function.
However, the hypothetical distribution function F0 is often specified only up to several unknown parameters, e.g., up to the location, scale or regression parameters. This is a typical situation: our observations can follow a linear regression model, whose parameters we want to estimate by a suitable robust estimator, and an approximate knowledge of the shape of the distribution of errors would lead to a good choice of the score function. This situation is more realistic, but the standard goodness-of-fit tests then lose their simplicity.
Taking these facts into account, we want to offer some goodness-of-fit tests on the shape of the distribution in the presence of nuisance regression and scale parameters.
2     Tests of normality of the Shapiro-Wilk type with nuisance regression and scale parameters
If the distribution seems to have a symmetric unimodal density, then the first natural idea is to test for its normality. A highly intuitive goodness-of-fit test of normality with nuisance location and scale parameters was proposed by Shapiro and Wilk (1965). Their test has received considerable attention in the literature; its asymptotic null distribution was later studied by de Wet and Venter (1973), and recently by Sen (2002). Because we often also have a nuisance regression, we shall describe an extension of the Shapiro-Wilk test of normality to the situation with nuisance regression and scale parameters,
1
2
constructed by Sen, Jurečková and Picek (2003). Their test is based on the pair of two estimators of the standard deviation of errors in the linear regression model, namely on the maximum likelihood estimator and on an L-estimator. Similar to the Shapiro-Wilk test, the asymptotic equivalence of these estimators is a characteristic property of the normal distribution of the errors, i.e., it is true only under the normality, and thus provides a test.
Let Yi,..., Yn be independent observations following the linear model
Yi = 9 + x'iß + aei, i = l,...,n                                     (2.1)
where x» E Mp, i = 1,..., n are given regressors, not all equal, 9 E M1, ß E Rp and a > 0 are unknown intercept, regression and scale parameters, and the errors e j are independent and identically distributed according to a continuous distribution function F with location 0 and scale parameter 1.
We want to test the hypothesis
H0 :   F = $,      against     Hi :   F = Fi ^ $                           (2.2)
where $ is the standard normal distribution function, F\ is a general nonnormal distribution function, and 9, ß, and a are treated as nuisance parameters.
For the special location-scale model (i.e., when ß = 0), Shapiro and Wilk (1965) proposed a goodness-of-fit test based on two estimators of er: Ln, the BLUE (best linear estimator) under H0, and an, the maximum likelihood estimator (MLE) under H0.
Suppose that Y\,... ,Yn are i.i.d. observations with the distribution Af(ß,a2). Then the MLE of a is an, where
K = n-^(Yt-Y^                             (2.3)
i=\
The best linear unbiased estimate (BLUE) Ln of a has the form
n
J-'n        /   J Q"ni* ní                                                                     V        J
i=\
where
a,= (ai,...,ara) = (M;V-1Mra)-1(M;V;1),    a^ = 0                  (2-5)
and where Mra = M denotes the vector of expected values of order statistics and Vn = V is their variance matrix. Shapiro and Wilk (1965) modified the BLUE of a to Ln0 = Yh=i am,oYni where (a„i)0, • • •, ann>0)' = a„o is such that
,   _          M'V"1
ara° ~ (M'V-iV-iM)172
then a^lra = 0 and a^a^o = 1- Ln0 is asymptotically equivalent to
T» = -E$"1í^rrVnť                              (2.7)
3
(see, e.g., Serfling (1980)). Let us write the Shapiro-Wilk criterion in the form
L
Wn = n\l--^\                                      (2.8)
Two scale estimators Ln0 and an are asymptotically equivalent if and only if F = $, i.e., if the hypothesis of normality is true, while under nonnormal alternative F\ with the finite second moment, the sequence \Jn í 1 —-fr j has a nondegenerate asymptotic (normal)
distribution. It means that the test criterion is consistent with respect to the non-normal alternatives.
We propose the goodness-of-fit test of the hypothesis (2.2) of the normality, based on the observations Yi,..., Yn, following the linear regression model (2.1) with unknown 9, ß and a. The test criterion is
Wn = nll-^\                                         (2.9)
where
1    '"
- "£(y* - F™ - $*#
n
n í=i
n
Ĺn = Y,anŕ™                                                  (2.10)
í=l
are the residual variance and the linear estimator of a with anito, i = 1,..., n defined in (2.6) and the rn:i are the order statistics corresponding to the residuals
~7
rni = Yi-Yn-ßxi, i=l,...,n                                  (2.11)
We assume that the n x p matrix Xra = [xi,..., xra]' satisfies
X'lra = 0,     Rank(Xra) = p < n - 1 and                                                                                                       (2.12)
max hn>ii = ö(n~l)   as  n —► oo   (the balanced design)
Ki<n
where hn>ij = Xj(X'X)  1Xj, i,j = l,...,n. The MLE of parameters 9, ß, a under normal $ have the form
Un        1 n        n      ±n 1 n        V   \   6«,,   G«        IT'      *-n   n
ßn = (XX)-1^ = ß + <j(X;Xn)-1en                        (2.13)
n lY,^-0n-<ßn?
ol = n
í=\
Sen, Jurečková and Picek (2003) proved that the asymptotic null distribution of Wn coincides with that of Wn; hence, the test rejects the hypothesis of the normality on the asymptotic significance level a provided
Wn > ra                                                     (2.14)
4
where ra is the asymptotic critical value of the Shapiro-Wilk test of normality with nuisance location and scale. The coefficients anito, i = 1,..., n and the critical values of the original Shapiro-Wilk test for n < 50 are tabulated in Shapiro and Wilk (1965). The critical values for n > 50 we have calculated by a Monte Carlo procedure.
3     Goodness-of-fit tests for general distribution with nuisance regression and scale
Consider again the linear model (2.1), but this time we have another distribution of errors e\,...,en, in mind, and we want to test the hypothesis H0 : F(e) = F0(e/a), for a specified distribution function F0. This is not necessarily normal, but for simplicity we assume that F0 G J7, the class of distribution functions is symmetric around 0 and possessing a positive density, finite variance and finite Fisher's information. Similarly as in the test of normality, our proposed test of H0 is based on the ratio of two scale statistics: the first one is based on regression rank scores äni(a), i = 1,... ,n, 0 < a < 1, introduced in Section 4.6, and the second one is an extension of the interquartile range to the regression quantiles.
If F0 E J7, we may choose the score generating function ipo(u) = F0~ľ(u) or ~fó(Fo1(u))/fo(F~~1(u)),  0 < u < 1 (if F0 is strongly unimodal with finite Fisher information), because F(e) = F0(e/a), F_1(m) = uF~~l{u) under H0. Our proposed test is based on the statistic
Tn* = n*{log |g-log£(Fo)}                                      (3-15)
where
n
Sn0 = Sn0(Y) = n-'Ys^bm = n-'Y'K                       (3.16)
and bn = (bn\,..., bnn) are the regression scores generated by ipo in the following way:
l>ni = -        <Po(u)däni(u),       2=1,..., U                                   (3.17)
./o
The second scale statistic Sn\ will be based on the regression quantiles; note that the regression quantiles and regression rank scores are asymptotically independent. For simplicity, we recommend the regression interquartile range,
Sm = ßi{\)-ßi{\)                                                 (3.18)
where ß\(a) is the first component of the Cü-regression quantile. Moreover, denote
So(F) =  í vo{u)F-\u)du Jo
S1(F) = F-l(l)-F-l(l) = 2F-l(l)                       (3.19)
e(F) = So(F)
5
Note that £(F0) = 6°L°j is a completely known function, because it depends on the chosen ifo and on the hypothetical F0, and does not depend on a.
Jurečková, Picek and Sen (2003) proved that the asymptotic (null) distribution of the criterion T* under the hypothesis Hq is normal
where
where
T:^+M(0,t£)                                    (3.20)
1
Til = W^)^°° " 2e(Foho1 + e (F°hľl *                   (3'21)
Too = i(ßi- ßl)
-FöHb
7o°i = 7ľo = 9f ,F-i,3v>      /   , !    e2dFo(e) - \ß2
7ľl = WŘW)                                           (3-22)
ß2=      e2dF0(e),    ßA=  l e4dF0(e) Jr                           Jr
qn is the first diagonal element of the matrix D_1 where D = lim^^oo Dn = lim™-^ n_1X'X.
Then r^ does not depend on a and is positive unless |^y = £(F0) (what happens with probability 0).
We are almost ready to formulate the critical region of the test; however, we should think over alternative distributions F against which we wish to have the test consistent.
We shall introduce the following one- and two-sided alternatives of H0. For a pair (F0, F) of distributions, let
A(F0, F) = So(F)^l^ - So(F0)                                  (3.23)
and set the partial ordering
F y F0   or   F -< F0   accordingly   A(F0,F)   is   >    or   < 0
This partial ordering is linked to Hájek's (1969) interpretation of F having heavier or lighter tails than F0. Consider the following alternatives to H0:
H^ :   F y Fo,        Hr :   F -<   F0,        Uf :   H^ U H^
Then
• we reject H0 in favor of H^ on the asymptotic significance level a if
rp*
Tbl
6
• we reject H0 in favor of H^ on the asymptotic significance level a if
rp*
TOI
we reject H0 in favor of R"f on the asymptotic significance level a if
> Ui
rp*
n
2
where ua = $  ľ(l — a) and $ is the standard normal distribution function.
4    Numerical illustration
4.1     Comparison of tests for testing normality
Let us illustrate the performance of the proposed test on the simulated regression model. Concerning the design matrix, we generate three columns as independent identically distributed random variables with uniform distribution on (—10,10) with the first column ln added; ß = (2, —2, 1,-1)' and consider 25 rows for it. The errors were generated from the following densities:
normal Af(0,1) :	/(*) =	i    -Í-= ~/fee 2
normal A/"(0, 4) :	/(*) =	1   -^ = —7=e   32
logistic (0,1) :	/(*) =	e-x
		-   (1+e-z)2
logistic (0, 4) :	/(*) =	e-x/4
		(1+e-^/4)2
Laplace (0,1) :	/(*) =	-  le-\x\
Laplace (0,4) :	/(*) =	-  4p-4|x|
Cauchy:	/(*) =	1
In order to gain insight into larger sample size behavior for our proposed tests, we also generate the design matrix of 100, 250 and 500 rows, respectively; in each case the errors &i are generated to insure independence.
1000 replications were simulated for each case. Based on these data, we calculated the test statistics
hüL   |   — r,   I   1   -   (^ľ=l ani,0rn:i)
š2            \             Tn   r2
Wn = n[l-% UJl-^^]            (4.24)
where
r • — D"1/2 (y- — 6   - x'/3
7
and
xnO
—  (Onl,0) • • • j Q>nnfl) —
M'V
(4.25)
(M'V-iV-iM)172
Because the asymptotic null distributions of the test statistics of the Shapiro-Wilk type test Wn are not known for n > 50, they were approximated by the following simple Monte Carlo procedure:
For a fixed n, a random sample of size n from the normal distribution was generated and Wn was computed, and this random experiment was repeated 100, 000 times.
For the sake of comparison, the nonparametric test of Section 7.3 was performed on the same data for testing the normality.
Tables 1-4 give the numbers of rejections of H0 (among 1000 tests) for both statistics described above.
Distribution of errors	a=0.01		a=0.05		a=0.1	
	wn	rp* n	wn	rp* n	wn	rp* n
Normal N{0,1) Normal N(0, 4) Logistic (0,1) Logistic (0,4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1)	21 9 42 42 141 148 838	13 12 22 19 56 54 624	42 54 136 131 275 255 900	61 57 76 76 127 120 720	105 90 175 182 365 320 920	142 118 139 130 212 216 765
Table 1: Numbers of rejections of H0 among 1000 cases on level a for matrix (25x4)
Distribution of errors	a=0.01		a=0.05		a=0.1	
	Wn	rp* n	wn	rp* n	Wn	rp* n
Normal N(0,1)	8	12	46	60	104	143
Normal N(0,4)	8	12	47	57	96	119
Logistic (0,1)	51	22	119	76	174	137
Logistic (0, 4)	44	18	112	76	158	131
Laplace (0,1)	333	63	499	127	581	210
Laplace (0, 4)	354	60	539	125	616	224
Cauchy (0,1)	1000	620	1000	730	1000	773
Table 2: Numbers of rejections of H0 among 1000 cases on level a for matrix (100x4)
Distribution of errors	a=0.01		o;=0.05		a=0.1	
	Wn	rp* n	wn	rp* n	wn	rp* n
Normal N(0,1) Normal N(0,4) Logistic (0,1) Logistic (0, 4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1)	10 10 52 44 546 563 1000	9 13 88 86 669 687 1000	49 50 96 98 721 719 1000	56 57 206 197 836 832 1000	96 100 147 146 782 796 1000	107 113 302 289 890 891 1000
Table 3:    Numbers of rejections of H0 among 1000 cases on level a for matrix (250x4)
Distribution of errors	a=0.01		o;=0.05		a=0.1	
	Wn	rp* n	Wn	rp* n	Wn	rp* n
Normal N(0,1) Normal N(0,4) Logistic (0,1) Logistic (0, 4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1)	10 9 56 49 869 856 1000	15 10 199 185 967 963 1000	56 56 96 95 937 926 1000	53 51 389 369 991 990 1000	102 100 142 136 960 939 1000	107 101 495 489 995 995 1000
Table 4: Numbers of rejections of Hq among 1000 cases on level a for matrix (500x4)
9
4.2    Testing for nonnormal distributions
We used the test of Section 3 for verifying the following three null hypotheses:
(i) H0 :   F = logistic
Logistic scores for Sno :
ipo(u) = logu — log(l — u), 0 < u < 1
(ii) H0 :   F = normal
Normal scores for Sno :
ipo(u) = $_1(«), 0 < u < 1
where $ is the standard normal distribution function.
(iii) H0 :  F = Laplace
Laplace scores for Sno :)
, s _ í log 2«                   0 < u < 0.5
'^u>- \ -log(2(l-u))   «>0.5
The errors were generated by sampling from the hypothetical F0 (normal, logistic, Laplace), and from the following alternative densities:
-si-normal Al (0,1) :    f(x) = -?k=g   2
-si-normal Al (0,4) :    f (x) = 775= e   32
logistic (0,1) :        f (x) = (1^e-s)2
logistic (0,4):         f (x) = {1*~-í4/4)2
Lapiace (0,1) :       f (x) = |e"|:c|
Lapiace (0,4) :       f (x) = fe"4^
Cauchy:                  f (x) = ľ(1^2)
1000 replications were simulated for each case.  Based on these data, we calculated the test statistics
?: = ^ {log l^-logÉ(Fo)}
for the hypothesis H0 :   F = F0,  (ß,a unspecified).
We took the regression interquartile range ß\{\) — ßi{\) in the role of Sn\.
Tables 5-7 give the numbers of rejections of H0 (among 1000 tests) for all above cases.
10
I. Hq : F = LOGISTIC (0,1) (i.e., we used logistic scores).
Distribution	n=27			n=108		
	a=0.01	o;=0.05	a=0.1	a=0.01	o;=0.05	a=0.1
Normal N(0,1) Normal N(0,4) Logistic (0,1) Logistic (0, 4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1)	19 18 24 18 21 22 513	124 92 93 86 87 100 632	213 197 169 181 149 169 687	45 45 12 10 84 88 999	163 187 61 62 199 210 1000	279 286 127 114 286 298 999
Table 5: Numbers of rejections of H0 among 1000 cases on level a for matrix (25x4)
and for matrix (100 x 3)
II. H0 : F = NORMAL (0,1) (i.e., we used van der Waerden scores).
Distribution	n=27			n=108		
	a=0.01	o;=0.05	a=0.1	a=0.01	o;=0.05	a=0.1
Normal N(0,1) Normal N(0,4) Logistic (0,1) Logistic (0, 4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1)	11 11 22 18 55 53 626	60 59 77 77 129 124 722	142 129 136 132 212 216 765	12 11 40 38 337 331 1000	52 53 116 122 516 509 1000	116 112 183 192 611 607 1000
Table 6: Numbers of rejections of H0 among 1000 cases on level a for matrix (25x4)
and for matrix (100 x 3)
11
III. H0 : F = LAPLACE (0,1) (i.e., we used Laplace scores).
Distribution	n=27			n=108		
	a=0.01	a=0.05	a=0.1	a=0.01	a=0.05	a=0.1
Normal N(0,1) Normal N(0,4) Logistic (0,1) Logistic (0, 4) Laplace (0,1) Laplace (0, 4) Cauchy (0,1)	43 42 37 44 19 24 342	259 251 189 203 121 131 452	424 378 319 337 189 231 515	366 365 153 144 11 9 971	689 687 378 375 69 73 991	806 802 516 511 135 145 1000
Table 7:    Numbers of rejections of Ho among 1000 cases on level a for for matrix
(25x4) and for matrix (100 x 3)
References:
C. Huber-Carol, N. Balakrishnan, M. S. Nikulin, M. Mesbah (eds.) (2002). Goodness-of-Fit Tests and Model Validity. Parkhäuser, Boston.
J. Jurečková and J. Picek (2005): Robust Statistical Methods with R. Chapman & Hall/CRC.
J. Jurečková, J. Picek and P. K. Sen (2003). Goodness-of-fit test with nuisance regression and scale. Metrika 58, 235-258.
J. Jurečková and P. K. Sen (1996). Robust Statistical Procedures: Asymptotics and Interrelations. John Wiley & Sons, New York.
P. Royston (1982a). An extension of Shapiro and Wilk's W test for normality to large samples. Appl. Statistics 31, 115-124.
P. Royston (1982b). Algorithm AS 181: The W test for normality. Appl. Statistics 31, 176-180.
P. Royston (1995). A remark on algorithm AS 181: The W test for normality. Appl. Statistics 44, 547-551.
P. K. Sen (2002). Shapiro-Wilk type goodness-of-fit tests for normality: Asymptotics revisited. Goodness-of-Fit Tests and Model Validity (C. Huber-Carol et al., eds.), pp. 73-88. Birkhäuser, Boston.
P. K. Sen, J. Jurečková and J. Picek (2003). Goodness-of-fit test of Shapiro-Wilk type with nuisance regression and scale. Austrian J. of Statist. 32, No. 1 & 2, 163-177.
S. S. Shapiro and M. B. Wilk (1965). An analysis of variance for normality (complete samples). Biometrika 52, 591-611.