ROBUST AND NONPARAMETRIC METHODS
Jana Jurečková

Contents

1 Rank tests in linear regression model
1.1 Properties of ranks and order statistics
1.1.1 The distribution of $X_{(\cdot)}$ and of $R$
1.1.2 Marginal distributions of the random vectors $R$ and $X_{(\cdot)}$ under $H_0$
1.2 Locally most powerful rank tests
1.3 Structure of the locally most powerful rank tests of $H_0$
1.3.1 Special cases
1.4 Rank tests for simple regression model with nonrandom regressors
1.4.1 Rank tests for $H_0^{(1)}$
1.4.2 Rank tests for $H_0^{(2)}$
1.4.3 Example
1.5 Rank tests for some multiple linear regression models
1.5.1 Rank tests for $H_0^{(1)}$
1.5.2 Rank tests for $H_0^{(2)}$
1.6 Rank estimation in simple linear regression models
1.6.1 Estimation of the slope of the regression line
1.6.2 Estimation in multiple regression model
1.7 Aligned rank tests about the intercept
1.7.1 Regression line
1.7.2 Multiple regression model

Chapter 1
Rank tests in linear regression model

1.1 Properties of ranks and order statistics

Let $X = (X_1, \dots, X_n)$ be the vector of observations; denote by $X_{n:1} \le X_{n:2} \le \dots \le X_{n:n}$ the components of $X$ ordered according to increasing magnitude. The vector $X_{(\cdot)} = (X_{n:1}, \dots, X_{n:n})$ is called the vector of order statistics, and $X_{n:i}$ is called the $i$th order statistic. Assume that the components of $X$ are distinct and define the rank of $X_i$ as
$$R_i = \sum_{j=1}^n I[X_j \le X_i].$$
Then the vector $R$ of ranks of $X$ takes on values in the set $\mathcal{R}$ of the $n!$ permutations $(r_1, \dots, r_n)$ of $(1, \dots, n)$.

1.1.1 The distribution of $X_{(\cdot)}$ and of $R$

Lemma 1.1.1 (i) If $X$ has density $p(x_1, \dots, x_n)$, then the vector $X_{(\cdot)}$ of order statistics has the distribution with density
$$\bar p(x_{n:1}, \dots, x_{n:n}) = \begin{cases} \sum_{r \in \mathcal{R}} p(x_{n:r_1}, \dots, x_{n:r_n}) & \text{if } x_{n:1} < \dots < x_{n:n},\\ 0 & \text{otherwise.} \end{cases}$$
(ii) The conditional distribution of $R$ given $X_{(\cdot)} = x_{(\cdot)}$ has the form
$$\mathbb{P}(R = r \mid X_{(\cdot)} = x_{(\cdot)}) = \frac{p(x_{n:r_1}, \dots, x_{n:r_n})}{\bar p(x_{n:1}, \dots, x_{n:n})}$$
for any $r \in \mathcal{R}$ and any $x_{n:1} < \dots < x_{n:n}$.

Proof. For any Borel set $B$ in the range of $X_{(\cdot)}$ it holds
$$\mathbb{P}(X_{(\cdot)} \in B) = \sum_{r \in \mathcal{R}} \mathbb{P}(X_{(\cdot)} \in B, R = r) = \sum_{r \in \mathcal{R}} \int\!\!\cdots\!\!\int_{\{x_{(\cdot)} \in B,\, R = r\}} p(x_1, \dots, x_n)\,dx_1 \cdots dx_n$$
$$= \sum_{r \in \mathcal{R}} \int\!\!\cdots\!\!\int_B p(x_{n:r_1}, \dots, x_{n:r_n})\,dx_{n:1} \cdots dx_{n:n} = \int\!\!\cdots\!\!\int_B \bar p(x_{n:1}, \dots, x_{n:n})\,dx_{n:1} \cdots dx_{n:n},$$
which proves (i). Similarly,
$$\mathbb{P}(X_{(\cdot)} \in B, R = r) = \int\!\!\cdots\!\!\int_B p(x_{n:r_1}, \dots, x_{n:r_n})\,dx_{n:1} \cdots dx_{n:n} = \int\!\!\cdots\!\!\int_B \frac{p(x_{n:r_1}, \dots, x_{n:r_n})}{\bar p(x_{n:1}, \dots, x_{n:n})}\,\bar p(x_{n:1}, \dots, x_{n:n})\,dx_{n:1} \cdots dx_{n:n}$$
$$= \int\!\!\cdots\!\!\int_B \mathbb{P}(R = r \mid X_{(\cdot)} = x_{(\cdot)})\,\bar p(x_{n:1}, \dots, x_{n:n})\,dx_{n:1} \cdots dx_{n:n},$$
which proves (ii). □

We say that the random vector $X$ satisfies the hypothesis of randomness $H_0$ if it has a probability distribution with density of the form
$$p(x) = \prod_{i=1}^n f(x_i), \quad x \in \mathbb{R}^n,$$
where $f$ is an arbitrary one-dimensional density. In other words, $X$ satisfies the hypothesis of randomness provided its components are a random sample from an absolutely continuous distribution. We say that the random vector $X$ satisfies the hypothesis of exchangeability $H$ if $p(x_1, \dots, x_n) = p(x_{r_1}, \dots, x_{r_n})$ for every permutation $(r_1, \dots, r_n)$ of $1, \dots, n$. If $X$ satisfies $H_0$, then it obviously satisfies $H$.
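As a small illustration of these definitions, the Python sketch below (standard library only; the function names are our own, hypothetical choices) computes ranks by the formula $R_i = \sum_j I[X_j \le X_i]$, forms the order statistics, and checks that $X$ is recovered from the pair $(X_{(\cdot)}, R)$ through $X_i = X_{n:R_i}$. Like the text, it assumes that there are no ties.

```python
def ranks(x):
    """Ranks R_i = sum_j I[x_j <= x_i]; assumes the x_i are distinct (no ties)."""
    return [sum(1 for xj in x if xj <= xi) for xi in x]


def order_statistics(x):
    """Order statistics (X_{n:1}, ..., X_{n:n}) in increasing order."""
    return sorted(x)


x = [3.1, 0.5, 2.0, 4.7]
r = ranks(x)                  # [3, 1, 2, 4]
x_ord = order_statistics(x)   # [0.5, 2.0, 3.1, 4.7]
# X_i is the R_i-th order statistic, so (X_(.), R) determines X completely
recovered = [x_ord[ri - 1] for ri in r]
print(r, recovered == x)
```

The quadratic-time rank formula mirrors the definition literally; for large $n$ one would rank by sorting instead.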
The following lemma follows from Lemma 1.1.1.

Lemma 1.1.2 If $X$ satisfies $H_0$ or $H$, then $X_{(\cdot)}$ and $R$ are independent, the vector of ranks $R$ has the discrete uniform distribution
$$\mathbb{P}(R = r) = \frac{1}{n!}, \quad r \in \mathcal{R},$$
and the distribution of $X_{(\cdot)}$ has the density
$$\bar p(x_{n:1}, \dots, x_{n:n}) = \begin{cases} n!\,p(x_{n:1}, \dots, x_{n:n}) & \text{if } x_{n:1} < \dots < x_{n:n},\\ 0 & \text{otherwise.} \end{cases}$$

1.1.2 Marginal distributions of the random vectors $R$ and $X_{(\cdot)}$ under $H_0$

Lemma 1.1.3 Let $X$ satisfy the hypothesis $H_0$. Then
(i) $\mathbb{P}(R_i = j) = \frac{1}{n}$, $i, j = 1, \dots, n$.
(ii) $\mathbb{P}(R_i = k, R_j = m) = \frac{1}{n(n-1)}$ for $1 \le i, j, k, m \le n$, $i \ne j$, $k \ne m$.
(iii) $\mathbb{E} R_i = \frac{n+1}{2}$, $i = 1, \dots, n$.
(iv) $\operatorname{var} R_i = \frac{n^2 - 1}{12}$, $i = 1, \dots, n$.
(v) $\operatorname{cov}(R_i, R_j) = -\frac{n+1}{12}$, $1 \le i, j \le n$, $i \ne j$.
(vi) If $X$ has density $p(x_1, \dots, x_n) = \prod_{i=1}^n f(x_i)$, then $X_{n:k}$ has the distribution with density
$$f_{n:k}(x) = n \binom{n-1}{k-1} (F(x))^{k-1} (1 - F(x))^{n-k} f(x), \quad x \in \mathbb{R}^1,$$
where $F$ is the distribution function of $X_1, \dots, X_n$.
(vii) If $X_1, \dots, X_n$ have the uniform $R(0,1)$ distribution, then $X_{n:i}$ has the beta $B(i, n-i+1)$ distribution with expectation and variance
$$\mathbb{E} X_{n:i} = \frac{i}{n+1}, \qquad \operatorname{Var} X_{n:i} = \frac{i(n-i+1)}{(n+1)^2(n+2)}.$$
Proof. The lemma follows immediately from Lemma 1.1.2. □

1.2 Locally most powerful rank tests

We want to test the hypothesis of randomness $H_0$ on the distribution of $X$. A rank test is characterized by its test function $\Phi(R)$. The most powerful rank $\alpha$-test of $H_0$ against a simple alternative $K: \{Q\}$ [that $X$ has the fixed distribution $Q$] follows directly from the Neyman-Pearson lemma:
$$\Phi(r) = \begin{cases} 1 & \text{if } n!\,Q(R = r) > k,\\ \gamma & \text{if } n!\,Q(R = r) = k,\\ 0 & \text{if } n!\,Q(R = r) < k, \end{cases} \qquad r \in \mathcal{R},$$
where $k$ and $\gamma$ are determined so that
$$\#\{r : n!\,Q(R = r) > k\} + \gamma\,\#\{r : n!\,Q(R = r) = k\} = \alpha\, n!, \quad 0 \le \gamma < 1.$$
If we want to test against a composite alternative and a uniformly most powerful rank test does not exist, then we look for a rank test that is most powerful locally, in a neighborhood of the hypothesis.

Definition 1.2.1 Let $d(Q)$ be a measure of the distance of an alternative $Q \in K$ from the hypothesis $H$.
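Parts (i) to (v) of Lemma 1.1.3 can be checked by brute force for a small $n$: under $H_0$ every rank vector has probability $1/n!$ (Lemma 1.1.2), so enumerating all permutations gives the exact law of the ranks. The sketch below is a hypothetical helper using exact rational arithmetic; by exchangeability it suffices to look at $R_1$ and at the pair $(R_1, R_2)$.

```python
from fractions import Fraction
from itertools import permutations


def rank_moments(n):
    """Exact law of (R_1, R_2) under H0: each of the n! rank vectors has probability 1/n!."""
    perms = list(permutations(range(1, n + 1)))
    m = len(perms)  # n!
    # part (i): P(R_1 = j) for j = 1, ..., n
    marginal = [Fraction(sum(1 for p in perms if p[0] == j), m) for j in range(1, n + 1)]
    mean = Fraction(sum(p[0] for p in perms), m)                      # part (iii)
    var = sum((p[0] - mean) ** 2 for p in perms) / m                   # part (iv)
    cov = sum((p[0] - mean) * (p[1] - mean) for p in perms) / m        # part (v)
    return marginal, mean, var, cov


n = 5
marginal, mean, var, cov = rank_moments(n)
print(mean, var, cov)  # 3 2 -1/2
```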
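The $\alpha$-test $\Phi_0$ is called locally most powerful in the class $\mathcal{M}$ of $\alpha$-tests of $H$ against $K$ if, given any other test $\Phi \in \mathcal{M}$, there exists $\varepsilon > 0$ such that the power functions of $\Phi_0$ and $\Phi$ satisfy the inequality
$$\beta_{\Phi_0}(Q) \ge \beta_{\Phi}(Q) \quad \forall Q \text{ satisfying } 0 < d(Q) < \varepsilon.$$

1.3 Structure of the locally most powerful rank tests of $H_0$

Theorem 1.3.1 Let $\mathcal{A} = \{g(x, \theta) : \theta \in J\}$ be a class of densities such that $J \subset \mathbb{R}^1$ is an open interval containing 0, and let $g(x, \theta)$ be absolutely continuous in $\theta$ for almost all $x$. Moreover, let for almost all $x$ there exist the limit
$$\dot g(x, 0) = \lim_{\theta \to 0} \frac{1}{\theta}\,[g(x, \theta) - g(x, 0)]$$
and let
$$\lim_{\theta \to 0} \int |\dot g(x, \theta)|\,dx = \int |\dot g(x, 0)|\,dx.$$
Consider the alternative $K = \{q_\Delta : \Delta > 0\}$, where
$$q_\Delta(x_1, \dots, x_n) = \prod_{i=1}^n g(x_i, \Delta c_i),$$
and $c_1, \dots, c_n$ are given numbers. Then the test with the critical region
$$\sum_{i=1}^n c_i\, a_n(R_i, g) \ge k$$
is the locally most powerful rank test of $H_0$ against $K$ at the significance level $\alpha = P\big(\sum_{i=1}^n c_i\, a_n(R_i, g) \ge k\big)$, where $P$ is any distribution satisfying $H_0$ and
$$a_n(i, g) = \mathbb{E}\left[\frac{\dot g(X_{n:i}, 0)}{g(X_{n:i}, 0)}\right], \quad i = 1, \dots, n,$$
are the scores; here $X_{n:1}, \dots, X_{n:n}$ are the order statistics of a random sample of size $n$ from the population with density $g(x, 0)$.

Proof. Let $Q_\Delta$ be the probability distribution with density $q_\Delta$. We shall show that, for any permutation $r \in \mathcal{R}$,
$$\lim_{\Delta \to 0} \frac{1}{\Delta}\,[n!\,Q_\Delta(R = r) - 1] = \sum_{i=1}^n c_i\, a_n(r_i, g). \qquad (1.3.1)$$
If (1.3.1) is true, then, since $\mathcal{R}$ is finite, there exists an $\varepsilon > 0$ such that
$$\sum_{i=1}^n c_i\, a_n(r_i, g) > \sum_{i=1}^n c_i\, a_n(r'_i, g) \implies Q_\Delta(R = r) > Q_\Delta(R = r')$$
for all $\Delta \in (0, \varepsilon)$ and all $r, r' \in \mathcal{R}$; hence the most powerful rank test rejects $H_0$ for those $r \in \mathcal{R}$ with $\sum_{i=1}^n c_i\, a_n(r_i, g) > k$ for a suitable $k$, simultaneously for all such $\Delta$. So it remains to prove (1.3.1), which we do as follows. We can write
$$\frac{1}{\Delta}\,[Q_\Delta(R = r) - Q_0(R = r)] = \int\!\!\cdots\!\!\int_{\{R = r\}} \frac{1}{\Delta}\Big[\prod_{i=1}^n g(x_i, \Delta c_i) - \prod_{i=1}^n g(x_i, 0)\Big]\,dx_1 \cdots dx_n$$
$$= \sum_{i=1}^n \int\!\!\cdots\!\!\int_{\{R = r\}} \frac{1}{\Delta}\big(g(x_i, \Delta c_i) - g(x_i, 0)\big) \prod_{j=1}^{i-1} g(x_j, \Delta c_j) \prod_{k=i+1}^n g(x_k, 0)\,dx_1 \cdots dx_n,$$
where we used the identity
$$\prod_{i=1}^n A_i - \prod_{j=1}^n B_j = \sum_{i=1}^n (A_i - B_i) \prod_{j=1}^{i-1} A_j \prod_{k=i+1}^n B_k.$$
If $c_i > 0$, then
$$\limsup_{\Delta \to 0} \int\!\!\cdots\!\!\int_{\{R = r\}} \frac{1}{\Delta}\big|g(x_i, \Delta c_i) - g(x_i, 0)\big| \prod_{j=1}^{i-1} g(x_j, \Delta c_j) \prod_{k=i+1}^n g(x_k, 0)\,dx_1 \cdots dx_n \le c_i \int\!\!\cdots\!\!\int_{\{R = r\}} |\dot g(x_i, 0)| \prod_{j \ne i} g(x_j, 0)\,dx_1 \cdots dx_n,$$
and analogously for $c_i < 0$. Combined with the Fatou lemma, this leads to
$$\lim_{\Delta \to 0} \sum_{i=1}^n \int\!\!\cdots\!\!\int_{\{R = r\}} \frac{1}{\Delta}\big(g(x_i, \Delta c_i) - g(x_i, 0)\big) \prod_{j=1}^{i-1} g(x_j, \Delta c_j) \prod_{k=i+1}^n g(x_k, 0)\,dx_1 \cdots dx_n = \sum_{i=1}^n \int\!\!\cdots\!\!\int_{\{R = r\}} c_i\, \dot g(x_i, 0) \prod_{j \ne i} g(x_j, 0)\,dx_1 \cdots dx_n$$
$$= \sum_{i=1}^n c_i \int\!\!\cdots\!\!\int_{\{R = r\}} \frac{\dot g(x_i, 0)}{g(x_i, 0)} \prod_{j=1}^n g(x_j, 0)\,dx_1 \cdots dx_n = \frac{1}{n!} \sum_{i=1}^n c_i\, \mathbb{E}\left[\frac{\dot g(X_i, 0)}{g(X_i, 0)} \,\Big|\, R = r\right] = \frac{1}{n!} \sum_{i=1}^n c_i\, a_n(r_i, g),$$
taking into account that $g(x, 0) = 0$ and $\dot g(x, 0) \ne 0$ can happen simultaneously only on a set of measure 0; the last equality holds because, given $R = r$, $X_i = X_{n:r_i}$ and $X_{(\cdot)}$ is independent of $R$ (Lemma 1.1.2). Since $n!\,Q_0(R = r) = 1$, multiplying by $n!$ gives (1.3.1). □

1.3.1 Special cases

I. Two-sample alternative of a shift in location: $K_1 = \{q_\Delta : \Delta > 0\}$, where
$$q_\Delta(x_1, \dots, x_N) = \prod_{i=1}^m f(x_i) \prod_{i=m+1}^N f(x_i - \Delta),$$
with $f$ a fixed absolutely continuous density such that $\int |f'(x)|\,dx < \infty$. Then the locally most powerful rank $\alpha$-test of $H_0$ against $K_1$ has the critical region
$$\sum_{i=m+1}^N a_N(R_i, f) \ge k,$$
where $k$ satisfies the condition $P\big(\sum_{i=m+1}^N a_N(R_i, f) \ge k\big) = \alpha$, $P \in H_0$, and
$$a_N(i, f) = \mathbb{E}\left[-\frac{f'(X_{N:i})}{f(X_{N:i})}\right], \quad i = 1, \dots, N,$$
where $X_{N:1} < \dots < X_{N:N}$ are the order statistics of a sample of size $N$ from the distribution with density $f$. The scores may also be written as
$$a_N(i, f) = \mathbb{E}\,\varphi(U_{N:i}, f), \quad i = 1, \dots, N, \qquad \varphi(u, f) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}, \quad 0 < u < 1,$$
where $U_{N:1}, \dots, U_{N:N}$ are the order statistics of a sample of size $N$ from the uniform $R(0,1)$ distribution. Another form of the scores is
$$a_N(i, f) = -N \binom{N-1}{i-1} \int f'(x)\, F^{i-1}(x)\, (1 - F(x))^{N-i}\,dx.$$

Remark 1.3.1 The computation of the scores is difficult for some densities; if no tables of the scores are available, they are often replaced by the approximate scores
$$a_N(i, f) = \varphi\Big(\frac{i}{N+1}, f\Big) = \varphi(\mathbb{E}\, U_{N:i}, f), \quad i = 1, \dots, N.$$
The asymptotic critical values coincide for both types of scores.

II. Alternative of simple linear regression: $K_2 = \{q_\Delta : \Delta > 0\}$, where
$$q_\Delta(x_1, \dots, x_n) = \prod_{i=1}^n f(x_i - \Delta c_i),$$
with a fixed absolutely continuous density $f$ and given constants $c_1, \dots, c_n$, $\sum_{i=1}^n c_i^2 > 0$.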
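The score functions of special case I and the approximate scores of Remark 1.3.1 can be sketched as follows. The sketch assumes the logistic density (for which $-f'/f$ evaluated at $F^{-1}(u)$ equals $2u - 1$, giving Wilcoxon-type scores) and the standard normal density (for which it equals $\Phi^{-1}(u)$, the van der Waerden scores). Exact scores are approximated by numerically integrating $\varphi$ against the $B(i, N-i+1)$ density of $U_{N:i}$ from Lemma 1.1.3 (vii). All function names are illustrative, not from the text.

```python
from math import comb
from statistics import NormalDist


def phi_logistic(u):
    """phi(u, f) for the logistic density: -f'/f at F^{-1}(u) equals 2u - 1."""
    return 2.0 * u - 1.0


def phi_normal(u):
    """phi(u, f) for the standard normal density: -f'/f at F^{-1}(u) equals Phi^{-1}(u)."""
    return NormalDist().inv_cdf(u)


def approximate_scores(phi, N):
    """Approximate scores phi(i/(N+1)) of Remark 1.3.1."""
    return [phi(i / (N + 1)) for i in range(1, N + 1)]


def exact_scores(phi, N, grid=20000):
    """Exact scores E phi(U_{N:i}) via the midpoint rule against the Beta(i, N-i+1) density."""
    scores = []
    for i in range(1, N + 1):
        const = N * comb(N - 1, i - 1)   # normalizing constant of the Beta(i, N-i+1) density
        total = 0.0
        for k in range(grid):
            u = (k + 0.5) / grid         # midpoints stay strictly inside (0, 1)
            total += phi(u) * const * u ** (i - 1) * (1 - u) ** (N - i)
        scores.append(total / grid)
    return scores


logistic_scores = approximate_scores(phi_logistic, 6)
normal_scores = approximate_scores(phi_normal, 7)
```

For a linear $\varphi$ such as the logistic one, exact and approximate scores coincide because $\mathbb{E}\, U_{N:i} = i/(N+1)$; for the normal density they differ for finite $N$, although the asymptotic critical values agree, as stated in Remark 1.3.1.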
Then the locally most powerful rank $\alpha$-test rejects $H_0$ if
$$\sum_{i=1}^n c_i\, a_n(R_i, f) > k \qquad (1.3.2)$$
and rejects it with probability $\gamma$ if equality holds, with the same scores as in case I and with $k$ and $\gamma \in [0, 1)$ determined by the condition
$$\mathbb{P}\Big(\sum_{i=1}^n c_i\, a_n(R_i, f) > k\Big) + \gamma\, \mathbb{P}\Big(\sum_{i=1}^n c_i\, a_n(R_i, f) = k\Big) = \alpha.$$
In practice we most often use the test with the Wilcoxon scores: put $\varphi(u) = u - \frac{1}{2}$ and reject $H_0$ provided
$$W_n = \sum_{i=1}^n c_i R_i > k,$$
where $k$ and $\gamma$ are such that
$$P\Big(\sum_{i=1}^n c_i R_i > k \,\Big|\, H_0\Big) + \gamma\, P\Big(\sum_{i=1}^n c_i R_i = k \,\Big|\, H_0\Big) = \alpha, \quad 0 \le \gamma < 1.$$
This test is locally most powerful against $K_2$ when $F$ is logistic, with density
$$f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \quad x \in \mathbb{R},$$
but it is fairly efficient also for other alternatives. For large $n$ we use the normal approximation of $W_n$: as $n \to \infty$, $W_n$ has asymptotically normal distribution under $H_0$ in the following sense:
$$\lim_{n \to \infty} P_{H_0}\left(\frac{W_n - \mathbb{E} W_n}{\sqrt{\operatorname{var} W_n}} < x\right) = \Phi(x), \quad x \in \mathbb{R}^1,$$
where $\Phi$ is the standard normal distribution function. To use the normal approximation, we must know the expectation and variance of $W_n$ under $H_0$. The following lemma gives the expectation and variance of a more general linear rank statistic, covering the Wilcoxon as well as other rank tests.

Lemma 1.3.1 Let the random vector $(R_1, \dots, R_n)$ have the discrete uniform distribution on the set $\mathcal{R}$ of all permutations of the numbers $1, \dots, n$, i.e. $\mathbb{P}(R = r) = \frac{1}{n!}$, $r \in \mathcal{R}$; let $c_1, \dots, c_n$ and $a_1 = a(1), \dots, a_n = a(n)$ be arbitrary constants. Then the expectation and variance of the linear statistic $S_n = \sum_{i=1}^n c_i\, a(R_i)$ are
$$\mathbb{E} S_n = \frac{1}{n} \sum_{i=1}^n c_i \sum_{j=1}^n a_j, \qquad \operatorname{var} S_n = \frac{1}{n-1} \sum_{i=1}^n (c_i - \bar c)^2 \sum_{j=1}^n (a_j - \bar a)^2,$$
where $\bar c = \frac{1}{n} \sum_{i=1}^n c_i$ and $\bar a = \frac{1}{n} \sum_{i=1}^n a_i$.
Proof. The proposition follows from the distribution of $R$ under $H_0$, namely from parts (i) and (ii) of Lemma 1.1.3.

1.4 Rank tests for simple regression model with nonrandom regressors

Let $X_1, \dots, X_N$ be independent random variables with continuous distribution functions $F_1, \dots, F_N$, where
$$F_i(x) = F(x - \beta_0 - \beta c_i), \quad i = 1, \dots, N, \quad x \in \mathbb{R},$$
$F$ is continuous, $c_N = (c_1, \dots, c_N)'$ is a vector of known regression constants (not all equal), and $(\beta_0, \beta)$ are unknown parameters; $\beta_0$ is called the intercept of the regression line and $\beta$ the regression coefficient. Our first hypothesis is that there is no regression,
$$H_0^{(1)}: \beta = 0 \quad \text{against} \quad K^{(1)}: \beta \ne 0 \quad \text{or} \quad K_+^{(1)}: \beta > 0, \qquad (1.4.1)$$
where $\beta_0$ is considered a nuisance parameter. We may also be interested in the joint hypothesis
$$H_0^{(2)}: (\beta_0, \beta) = 0 \quad \text{against} \quad K^{(2)}: (\beta_0, \beta) \ne 0. \qquad (1.4.2)$$
The third hypothesis is
$$H_0^{(3)}: \beta_0 = 0 \quad \text{against} \quad K^{(3)}: \beta_0 \ne 0 \quad \text{or} \quad K_+^{(3)}: \beta_0 > 0, \qquad (1.4.3)$$
where $\beta$ is treated as a nuisance parameter. In each case there exists a distribution-free rank test, whose critical values do not depend on $F$. We can also consider the hypotheses $\beta = \beta^*$ or $(\beta_0, \beta) = (\beta_0^*, \beta^*)$ for given values; then we work with $X_i^* = X_i - \beta_0^* - \beta^* c_i$, $i = 1, \dots, N$.

1.4.1 Rank tests for $H_0^{(1)}$

Let $R_N = (R_{N1}, \dots, R_{NN})$ be the ranks of $X_1, \dots, X_N$. Choose a nondecreasing score function $\varphi: (0,1) \to \mathbb{R}$ and put
$$S_N = \sum_{i=1}^N (c_i - \bar c_N)\, a_N(R_{Ni}), \qquad \bar c_N = \frac{1}{N} \sum_{i=1}^N c_i, \qquad (1.4.4)$$
where the scores have the form
$$a_N(i) = \mathbb{E}\,\varphi(U_{N:i}) \quad \text{or} \quad a_N(i) = \varphi\Big(\frac{i}{N+1}\Big), \quad 1 \le i \le N, \qquad (1.4.5)$$
and $U_{N:1} \le \dots \le U_{N:N}$ are the order statistics of a sample $U_1, \dots, U_N$ from the uniform $R(0,1)$ distribution. Under $H_0^{(1)}$ it holds $F_1(x) = \dots = F_N(x) = F(x - \beta_0) = F_0(x)$ (say), where $F_0$ is continuous. Because ties among $X_1, \dots, X_N$ occur with probability 0, we have
$$\mathbb{P}\big(R_N = r_N \mid H_0^{(1)}\big) = \frac{1}{N!} \quad \forall\, r_N \in \mathcal{R}_N \text{ (permutations)},$$
hence
$$\mathbb{P}\{R_{Ni} = k \mid H_0^{(1)}\} = \frac{1}{N}, \quad 1 \le i, k \le N,$$
$$\mathbb{P}\{R_{Ni} = k, R_{Nj} = l \mid H_0^{(1)}\} = \frac{1}{N(N-1)}, \quad 1 \le i \ne j \le N, \ 1 \le k \ne l \le N.$$
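The moments of $S_N$ in (1.4.4) follow from Lemma 1.3.1 with $c_i$ replaced by $c_i - \bar c_N$. A brute-force check for a small $N$ is sketched below; the helper names are hypothetical, the arithmetic is exact, and enumerating all $N!$ rank vectors is only feasible for small $N$.

```python
from fractions import Fraction
from itertools import permutations


def linear_rank_statistic(c, a, r):
    """S_N = sum_i (c_i - cbar_N) a_N(R_Ni) as in (1.4.4); r holds 1-based ranks."""
    cbar = Fraction(sum(c), len(c))
    return sum((ci - cbar) * a[ri - 1] for ci, ri in zip(c, r))


def enumerated_moments(c, a):
    """Mean and variance of S_N over all N! equally likely rank vectors (law under H0^(1))."""
    values = [linear_rank_statistic(c, a, r) for r in permutations(range(1, len(c) + 1))]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def lemma_moments(c, a):
    """Mean 0 and variance (N-1)^{-1} sum (c_i - cbar)^2 sum (a_j - abar)^2 (Lemma 1.3.1)."""
    N = len(c)
    cbar, abar = Fraction(sum(c), N), Fraction(sum(a), N)
    var = sum((ci - cbar) ** 2 for ci in c) * sum((aj - abar) ** 2 for aj in a) / (N - 1)
    return Fraction(0), var


N = 5
c = [1, 2, 4, 7, 11]
a = [Fraction(i, N + 1) for i in range(1, N + 1)]  # Wilcoxon scores i/(N+1)
print(enumerated_moments(c, a) == lemma_moments(c, a))  # True
```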
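Hence,
$$\mathbb{E}\{S_N \mid H_0^{(1)}\} = \sum_{i=1}^N (c_i - \bar c_N)\, \mathbb{E}\{a_N(R_{Ni}) \mid H_0^{(1)}\} = \frac{1}{N} \sum_{i=1}^N (c_i - \bar c_N) \sum_{j=1}^N a_N(j) = 0,$$
$$\operatorname{Var}\{S_N \mid H_0^{(1)}\} = \frac{1}{N-1} \sum_{i=1}^N (c_i - \bar c_N)^2 \sum_{j=1}^N (a_N(j) - \bar a_N)^2.$$
The distribution of $S_N$ under $H_0^{(1)}$ depends neither on $F$ nor on $\beta_0$; hence we reject $H_0^{(1)}$ in favor of $K_+^{(1)}: \beta > 0$ when $S_N > k_+$ and reject with probability $\gamma_+$ when $S_N = k_+$, where $k_+$ and $\gamma_+$ are determined so that
$$\mathbb{P}\{S_N > k_+ \mid H_0^{(1)}\} + \gamma_+\, \mathbb{P}\{S_N = k_+ \mid H_0^{(1)}\} = \alpha,$$
with, for instance, $\alpha = 0.05$ or $0.01$. Similarly, we reject $H_0^{(1)}$ in favor of $K^{(1)}: \beta \ne 0$ when $|S_N| > k$ and reject with probability $\gamma \in [0, 1)$ when $|S_N| = k$, where $k$ and $\gamma$ are determined so that
$$\mathbb{P}\{|S_N| > k \mid H_0^{(1)}\} + \gamma\, \mathbb{P}\{|S_N| = k \mid H_0^{(1)}\} = \alpha.$$
For small $N$ we can calculate the critical values $k_+$ and $k$ exactly, but for large $N$ we must use an asymptotic approximation. The asymptotic distribution of $S_N$ under $H_0^{(1)}$ is based on the following theorems, proved by Hájek (1961).

Theorem 1.4.1 Let $R_N = (R_{N1}, \dots, R_{NN})$ be a random vector such that $\mathbb{P}\{R_N = r\} = \frac{1}{N!}$ for all $r \in \mathcal{R}$, and let $\{a_N(i), 1 \le i \le N\}$ and $\{c_N(i), 1 \le i \le N\}$ be two sequences of real numbers such that, as $N \to \infty$,
$$\max_{1 \le i \le N} \frac{(a_N(i) - \bar a_N)^2}{\sum_{j=1}^N (a_N(j) - \bar a_N)^2} \to 0, \qquad \max_{1 \le i \le N} \frac{(c_N(i) - \bar c_N)^2}{\sum_{j=1}^N (c_N(j) - \bar c_N)^2} \to 0 \qquad \text{(Noether condition)}. \qquad (1.4.6)$$
Then
$$\mathbb{P}\left(\frac{S_N - \mathbb{E} S_N}{\sqrt{\operatorname{Var} S_N}} \le x\right) \to \Phi(x) \quad \text{as } N \to \infty, \ \forall x \in \mathbb{R},$$
where $\Phi$ is the standard normal distribution function, if and only if, for every $\varepsilon > 0$,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N \sum_{j=1}^N \delta_{N,ij}^2\, I[|\delta_{N,ij}| > \varepsilon] = 0 \qquad \text{(Lindeberg condition)}, \qquad (1.4.7)$$
where
$$\delta_{N,ij} = \frac{(a_N(i) - \bar a_N)(c_N(j) - \bar c_N)}{\Big[N^{-1} \sum_{k=1}^N (a_N(k) - \bar a_N)^2 \sum_{l=1}^N (c_N(l) - \bar c_N)^2\Big]^{1/2}}, \quad i, j = 1, \dots, N.$$

Theorem 1.4.2 (Projection theorem) If $a_N(1) \le \dots \le a_N(N)$ and
$$\max_{1 \le i \le N} \frac{(a_N(i) - \bar a_N)^2}{\sum_{j=1}^N (a_N(j) - \bar a_N)^2} \to 0 \quad \text{as } N \to \infty,$$
then $S_N$ is asymptotically equivalent in quadratic mean to the statistic
$$T_N = \sum_{i=1}^N (c_N(i) - \bar c_N)\, a_N^0(U_i) + N \bar c_N \bar a_N$$
in the sense that
$$\lim_{N \to \infty} \frac{\mathbb{E}(S_N - T_N)^2}{\operatorname{Var} S_N} = 0.$$
Here $a_N^0(u) = a_N(i)$ for $\frac{i-1}{N} < u \le \frac{i}{N}$, $i = 1, \dots, N$, and $U_1, \dots, U_N$ is a random sample from the uniform $R(0,1)$ distribution.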
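For small $N$ the exact randomized critical value of the one-sided test can be found by enumerating the $N!$ equally likely rank vectors. The sketch below is a hypothetical helper, not taken from the text; it returns $k_+$ and $\gamma_+$ such that $\mathbb{P}\{S_N > k_+\} + \gamma_+ \mathbb{P}\{S_N = k_+\} = \alpha$ under $H_0^{(1)}$, using exact rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def exact_upper_critical_value(c, a, alpha):
    """Exact (k_plus, gamma_plus): reject H0^(1) if S_N > k_plus, with probability gamma_plus if S_N = k_plus.

    c: regression constants c_i; a: scores a_N(1), ..., a_N(N); alpha: level as a Fraction.
    Enumerates all N! equally likely rank vectors, so it is only practical for small N.
    """
    N = len(c)
    cbar = Fraction(sum(c), N)
    counts = Counter(
        sum((ci - cbar) * a[ri - 1] for ci, ri in zip(c, r))
        for r in permutations(range(1, N + 1))
    )
    total = sum(counts.values())  # N!
    tail = Fraction(0)            # P(S_N > value) for the current candidate value
    for value in sorted(counts, reverse=True):
        p_eq = Fraction(counts[value], total)
        if tail + p_eq > alpha:
            # P(S_N > value) <= alpha < P(S_N >= value): randomize on {S_N = value}
            return value, (alpha - tail) / p_eq
        tail += p_eq
    raise ValueError("alpha must be smaller than 1")


k, gamma = exact_upper_critical_value([1, 2, 3, 4], [1, 2, 3, 4], Fraction(1, 12))
print(k, gamma)  # 4 1/3
```

Here $c = (1, 2, 3, 4)$ and $a_N(i) = i$ give $S_N = 5$ for one permutation and $S_N = 4$ for three, so with $\alpha = 1/12$ the test rejects when $S_N > 4$ and with probability $1/3$ when $S_N = 4$. Rescaling the scores rescales $k_+$ but leaves $\gamma_+$ unchanged.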
Corollary 1.4.1 Let
$$\delta_{N,ij} = \frac{(a_N(i) - \bar a_N)(c_j - \bar c_N)}{A_N C_N}, \quad i, j = 1, \dots, N,$$
$$A_N^2 = (N-1)^{-1} \sum_{k=1}^N (a_N(k) - \bar a_N)^2, \qquad C_N^2 = \sum_{l=1}^N (c_l - \bar c_N)^2,$$
and let the sequences $\{a_N(1), \dots, a_N(N)\}$ and $\{c_1, \dots, c_N\}$ satisfy the Noether condition (1.4.6). Then
$$\lim_{N \to \infty} \mathbb{P}\left(\frac{S_N}{A_N C_N} \le x \,\Big|\, H_0^{(1)}\right) = \Phi(x) \quad \forall x \in \mathbb{R}.$$
The asymptotic rank test rejects $H_0^{(1)}$ in favor of $K_+^{(1)}$ at the significance level $\alpha$ if
$$\frac{S_N}{A_N C_N} \ge \Phi^{-1}(1 - \alpha),$$
and in favor of $K^{(1)}$ if
$$\frac{|S_N|}{A_N C_N} \ge \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big).$$

1.4.2 Rank tests for $H_0^{(2)}$

We shall test the hypothesis $H_0^{(2)}: (\beta_0, \beta) = 0$ under the condition of symmetry of $F$, i.e. $F(x) + F(-x) = 1$ for $x \in \mathbb{R}$. Because the ranks are invariant to a shift in location, the test should also involve the signs of the observations. Let $R_{Ni}^+$ be the rank of $|X_i|$ among $|X_1|, \dots, |X_N|$, $i = 1, \dots, N$. Choose a score-generating function $\varphi^+: (0,1) \to [0, \infty)$ and the scores $a_N^+(1), \dots, a_N^+(N)$ generated by $\varphi^+$ in the same manner as in (1.4.5). Under the hypothesis $H_0^{(2)}$, the observations are independent and identically distributed with a continuous distribution function $F$, symmetric about 0. Consider the two statistics
$$S_{N,1}^+ = \sum_{i=1}^N a_N^+(R_{Ni}^+)\, \operatorname{sign} X_i, \qquad S_{N,2}^+ = \sum_{i=1}^N c_i\, a_N^+(R_{Ni}^+)\, \operatorname{sign} X_i, \qquad S_N^+ = (S_{N,1}^+, S_{N,2}^+)',$$
and denote
$$\Sigma_{11}^{(N)} = N, \quad \Sigma_{12}^{(N)} = \sum_{i=1}^N c_i, \quad \Sigma_{22}^{(N)} = \sum_{i=1}^N c_i^2, \quad \Sigma^{(N)} = \big(\Sigma_{ij}^{(N)}\big)_{i,j=1,2}.$$
Under $H_0^{(2)}$ and under symmetry of $F$, the vector $(\operatorname{sign} X_1\, R_{N1}^+, \dots, \operatorname{sign} X_N\, R_{NN}^+)$ can take on $N!\,2^N$ values, each with probability $1/(N!\,2^N)$, and $\operatorname{sign} X_i$ is independent of $R_{Ni}^+$, $i = 1, \dots, N$. Hence
$$\mathbb{E}(S_N^+ \mid H_0^{(2)}) = 0, \qquad \mathbb{E}(S_N^+ S_N^{+\prime} \mid H_0^{(2)}) = (A_N^+)^2\, \Sigma^{(N)}, \qquad (A_N^+)^2 = \frac{1}{N} \sum_{i=1}^N (a_N^+(i))^2.$$
Consider the test criterion
$$W_N^+ = S_N^{+\prime} \Big[\mathbb{E}_{H_0^{(2)}}\big(S_N^+ S_N^{+\prime}\big)\Big]^{-1} S_N^+ = S_N^{+\prime} \big(\Sigma^{(N)}\big)^{-1} S_N^+ \big/ (A_N^+)^2. \qquad (1.4.8)$$
Under $H_0^{(2)}$ and under symmetry of $F$, the distribution of $W_N^+$ does not depend on the unknown $F$. However, the exact distribution of $W_N^+$ is very laborious to calculate, hence we again use the asymptotic approximation.
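A minimal sketch of the asymptotic test of Corollary 1.4.1 with Wilcoxon scores $a_N(i) = i/(N+1)$, in Python with the standard library. The function name and the choice of scores are assumptions for illustration, and ties among the observations are assumed away.

```python
from math import sqrt
from statistics import NormalDist


def corollary_test(x, c, alpha=0.05):
    """Asymptotic Wilcoxon-score test of H0^(1): beta = 0 (Corollary 1.4.1); assumes no ties in x."""
    N = len(x)
    r = [sum(1 for xj in x if xj <= xi) for xi in x]          # ranks R_Ni
    a = [i / (N + 1) for i in range(1, N + 1)]                # scores a_N(i) = i/(N+1)
    abar, cbar = sum(a) / N, sum(c) / N
    S = sum((ci - cbar) * a[ri - 1] for ci, ri in zip(c, r))  # S_N of (1.4.4)
    A = sqrt(sum((ai - abar) ** 2 for ai in a) / (N - 1))     # A_N
    C = sqrt(sum((ci - cbar) ** 2 for ci in c))               # C_N
    z = S / (A * C)
    reject_one_sided = z >= NormalDist().inv_cdf(1 - alpha)            # against K_+^(1)
    reject_two_sided = abs(z) >= NormalDist().inv_cdf(1 - alpha / 2)   # against K^(1)
    return z, reject_one_sided, reject_two_sided
```

Since $\sum_i (c_i - \bar c_N)\bar a_N = 0$, the standardized statistic equals $\sqrt{N-1}$ times the sample correlation of $c_i$ and $a_N(R_{Ni})$; for instance, with $c_i = i$ and perfectly increasing data of size $N = 10$ it equals exactly 3.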
The asymptotic behavior is described in the following theorem.

Theorem 1.4.3 Assume that the sequences $\{a_N^+(i), 1 \le i \le N\}$ and $\{c_{Ni}, 1 \le i \le N\}$ satisfy, as $N \to \infty$,
$$\frac{\max_{1 \le i \le N} (a_N^+(i))^2}{\sum_{j=1}^N (a_N^+(j))^2} \to 0, \qquad \frac{\max_{1 \le i \le N} c_{Ni}^2}{\sum_{j=1}^N c_{Nj}^2} \to 0.$$
Denote
$$\delta_{N,ij} = \frac{a_N^+(i)\, c_{Nj}}{\Big[N^{-1} \sum_{k=1}^N (a_N^+(k))^2 \sum_{l=1}^N c_{Nl}^2\Big]^{1/2}}, \quad i, j = 1, \dots, N.$$
Then, under $H_0^{(2)}$ and under symmetry of $F$, the sequence $\big(S_{N,2}^+ - \mathbb{E} S_{N,2}^+\big)\big/\sqrt{\operatorname{Var} S_{N,2}^+}$ is asymptotically normally distributed $\mathcal{N}(0,1)$ if and only if, for every $\varepsilon > 0$,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N \sum_{j=1}^N \delta_{N,ij}^2\, I[|\delta_{N,ij}| > \varepsilon] = 0 \qquad \text{(Lindeberg condition)}.$$
Applying Theorem 1.4.3 also with $c_{Ni} = 1$, $i = 1, \dots, N$, and more generally with $c_{Ni} = \lambda_1 + \lambda_2 c_i$ for arbitrary $(\lambda_1, \lambda_2) \ne 0$ (Cramér-Wold device), we conclude that the random vector $S_N^+$ has asymptotically the bivariate normal distribution $\mathcal{N}_2\big(0, (A_N^+)^2\, \Sigma^{(N)}\big)$. This implies that under $H_0^{(2)}$ and under symmetry of $F$, $W_N^+$ has asymptotically the $\chi^2$ distribution with 2 degrees of freedom. Hence, the asymptotic test rejects $H_0^{(2)}$ in favor of $K^{(2)}$ if $W_N^+ \ge \chi^2_2(1 - \alpha)$.

1.4.3 Example

A group of students, boys and girls, attended a summer language course. They took two tests, before and after the course. The responses in the table are the differences in the test scores for each individual; $c_i = 1$ for a boy and $c_i = -1$ for a girl.

 #   response   c_i   R_Ni   R+_Ni   c_i R_Ni   sign(X_i) R+_Ni
 1      5.2      1     19     19        19          19
 2     -0.7      1      6      6         6          -6
 3     -2.3      1      2     13         2         -13
 4      3.2      1     16     15        16          15
 5     -1.5      1      4      9         4          -9
 6      4.7      1     18     18        18          18
 7      1.8      1     14     12        14          12
 8     -0.4      1      8      3         8          -3
 9      0.6      1     11      5        11           5
10      6.6      1     20     20        20          20
11     -0.9     -1      5      8        -5          -8
12      1.7     -1     13     11       -13          11
13     -0.3     -1      9      2        -9          -2
14      2.4     -1     15     14       -15          14
15      4.2     -1     17     16       -17          16
16     -1.6     -1      3     10        -3         -10
17     -4.3     -1      1     17        -1         -17
18      0.8     -1     12      7       -12           7
19     -0.5     -1      7      4        -7          -4
20     -0.2     -1     10      1       -10          -1

We want to test whether the course had an effect and whether there is a difference between the performance of boys and girls. We take the Wilcoxon scores $a_N(i) = a_N^+(i) = \frac{i}{21}$, $i = 1, \dots, 20$, and get
$$\frac{S_N}{A_N C_N} = 0.9826 < 1.96 = \Phi^{-1}(0.975), \qquad W_N^+ = 2.368 < 5.99 = \chi^2_2(0.95).$$
Hence we cannot reject either of the hypotheses at the level $\alpha = 0.05$.
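The numbers in this example can be reproduced with the short Python sketch below (standard library only). It re-enters the responses from the table, recomputes the rank columns, and evaluates $S_N/(A_N C_N)$ and $W_N^+$ with $a_N(i) = a_N^+(i) = i/21$. For the $\chi^2_2$ quantile it uses the fact that $\chi^2_2$ is the exponential distribution with mean 2, so $\chi^2_2(0.95) = -2\ln 0.05$.

```python
from math import log, sqrt
from statistics import NormalDist

response = [5.2, -0.7, -2.3, 3.2, -1.5, 4.7, 1.8, -0.4, 0.6, 6.6,
            -0.9, 1.7, -0.3, 2.4, 4.2, -1.6, -4.3, 0.8, -0.5, -0.2]
c = [1] * 10 + [-1] * 10          # 1 = boy, -1 = girl
N = len(response)


def ranks(v):
    """Ranks assuming no ties."""
    return [sum(1 for w in v if w <= x) for x in v]


R = ranks(response)                            # column R_Ni
R_plus = ranks([abs(x) for x in response])     # column R+_Ni
a = [i / (N + 1) for i in range(1, N + 1)]     # Wilcoxon scores i/21

# H0^(1): standardized linear rank statistic of Corollary 1.4.1
cbar, abar = sum(c) / N, sum(a) / N
S = sum((ci - cbar) * a[r - 1] for ci, r in zip(c, R))
A = sqrt(sum((ai - abar) ** 2 for ai in a) / (N - 1))
C = sqrt(sum((ci - cbar) ** 2 for ci in c))
z = S / (A * C)

# H0^(2): signed-rank statistics and the criterion W_N^+ of (1.4.8)
sgn = [1 if x > 0 else -1 for x in response]
S1 = sum(a[r - 1] * s for r, s in zip(R_plus, sgn))
S2 = sum(ci * a[r - 1] * s for ci, r, s in zip(c, R_plus, sgn))
A2_plus = sum(ai ** 2 for ai in a) / N
s11, s12, s22 = N, sum(c), sum(ci ** 2 for ci in c)   # entries of Sigma^(N)
det = s11 * s22 - s12 ** 2
# S' Sigma^{-1} S written out with the explicit inverse of a 2 x 2 matrix
W = (s22 * S1 ** 2 - 2 * s12 * S1 * S2 + s11 * S2 ** 2) / det / A2_plus

z_crit = NormalDist().inv_cdf(0.975)   # about 1.96
chi2_crit = -2 * log(0.05)             # chi^2_2(0.95), about 5.99
print(round(z, 4), round(W, 4))        # 0.9827 2.3693
print(abs(z) < z_crit, W < chi2_crit)  # True True
```

The sketch gives $26/\sqrt{700} \approx 0.9827$ and $6800/2870 \approx 2.3693$, which agree with the values reported above to about three decimal places.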
1.5 Rank tests for some multiple linear regression models

Consider the linear regression model
$$Y_i = \beta_0 + x_i'\beta + e_i, \quad i = 1, \dots, N, \qquad (1.5.1)$$
where $\beta_0 \in \mathbb{R}^1$ and $\beta \in \mathbb{R}^p$ are unknown parameters, $e_1, \dots, e_N$ are independent errors, identically distributed according to a continuous distribution function $F$, and $x_i \in \mathbb{R}^p$, $i = 1, \dots, N$, are given regressors. Denote by
$$X_N = \begin{pmatrix} x_1' \\ \vdots \\ x_N' \end{pmatrix}$$
the regression matrix. We shall first consider the hypotheses
$$H_0^{(1)}: \beta = 0 \ \text{ versus } \ K^{(1)}: \beta \ne 0 \qquad \text{and} \qquad H_0^{(2)}: \theta = (\beta_0, \beta')' = 0 \ \text{ versus } \ K^{(2)}: \theta \ne 0.$$
The hypotheses and tests are extensions of those for the regression line.

1.5.1 Rank tests for $H_0^{(1)}$

Let $R_{N1}, \dots, R_{NN}$ be the ranks of $Y_1, \dots, Y_N$ and let $a_N(1), \dots, a_N(N)$ be the scores generated by a nondecreasing, square-integrable score function $\varphi: (0,1) \to \mathbb{R}^1$, so that $a_N(i) = \varphi\big(\frac{i}{N+1}\big)$, $i = 1, \dots, N$. Consider the linear rank statistics
$$S_{Nj} = \sum_{i=1}^N (x_{ij} - \bar x_{Nj})\, a_N(R_{Ni}), \qquad \bar x_{Nj} = \frac{1}{N} \sum_{i=1}^N x_{ij}, \quad j = 1, \dots, p,$$
and the vector
$$S_N = \sum_{i=1}^N (x_i - \bar x_N)\, a_N(R_{Ni}) = (S_{N1}, \dots, S_{Np})'.$$
The distribution function of the observation $Y_i$ under $H_0^{(1)}$ is $F(y - \beta_0)$, $i = 1, \dots, N$. Hence $(R_{N1}, \dots, R_{NN})$ takes on all permutations of $(1, 2, \dots, N)$ with equal probability $\frac{1}{N!}$, and the expectation and covariance matrix of $S_N$ under $H_0^{(1)}$ are
$$\mathbb{E}(S_N \mid H_0^{(1)}) = 0 \qquad \text{and} \qquad \mathbb{E}(S_N S_N' \mid H_0^{(1)}) = A_N^2\, Q_N,$$
where
$$A_N^2 = \frac{1}{N-1} \sum_{i=1}^N (a_N(i) - \bar a_N)^2, \qquad Q_N = \sum_{i=1}^N (x_i - \bar x_N)(x_i - \bar x_N)'.$$
Our test of $H_0^{(1)}$ is based on the quadratic form
$$\mathcal{S}_N = A_N^{-2}\, S_N'\, Q_N^{-1}\, S_N, \qquad (1.5.2)$$
where $Q_N^{-1}$ is replaced by a generalized inverse $Q_N^-$ if $Q_N$ is singular. We reject $H_0^{(1)}$ if $\mathcal{S}_N > k$, where $k$ is a suitable critical value. Notice that $S_N$ depends only on $x_1, \dots, x_N$, on the scores $a_N(1), \dots, a_N(N)$ and on the ranks $R_{N1}, \dots, R_{NN}$. Hence the distribution of $S_N$, and thus also that of $\mathcal{S}_N$, under the hypothesis $H_0^{(1)}$ does not depend on the distribution function $F$ of the errors. For small $N$ the critical value can be calculated numerically, but this becomes laborious with increasing $N$.
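The criterion (1.5.2) can be sketched in plain Python as below, with Wilcoxon scores $a_N(i) = i/(N+1)$ and a small Gaussian-elimination solver in place of $Q_N^{-1}$. The helper names are hypothetical; the sketch assumes $Q_N$ is regular (it does not implement the generalized inverse mentioned above) and that there are no ties among the $Y_i$.

```python
def solve(Q, b):
    """Solve Q y = b by Gaussian elimination with partial pivoting (Q assumed regular)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(Q, b)]      # augmented matrix [Q | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        y[r] = (M[r][n] - sum(M[r][k] * y[k] for k in range(r + 1, n))) / M[r][r]
    return y


def rank_quadratic_form(y, X):
    """Criterion (1.5.2) with Wilcoxon scores; X is a list of the N regressor rows x_i."""
    N, p = len(y), len(X[0])
    r = [sum(1 for w in y if w <= v) for v in y]      # ranks R_Ni (no ties assumed)
    a = [i / (N + 1) for i in range(1, N + 1)]
    abar = sum(a) / N
    A2 = sum((ai - abar) ** 2 for ai in a) / (N - 1)  # A_N^2
    xbar = [sum(row[j] for row in X) / N for j in range(p)]
    d = [[row[j] - xbar[j] for j in range(p)] for row in X]
    S = [sum(d[i][j] * a[r[i] - 1] for i in range(N)) for j in range(p)]                   # S_N
    Q = [[sum(d[i][j] * d[i][k] for i in range(N)) for k in range(p)] for j in range(p)]   # Q_N
    return sum(s * q for s, q in zip(S, solve(Q, S))) / A2
```

With a single regressor ($p = 1$) the criterion reduces to the square of $S_N/(A_N C_N)$ from Corollary 1.4.1, and it is invariant under nonsingular linear transformations of the regressors, because $S_N$ and $Q_N$ transform together.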
Hence, again, we use the large-sample approximation. It can be derived under some conditions on the matrix $X_N$ and on the scores:

Theorem 1.5.1 Assume that
(i) the matrix $Q_N$ is regular for $N > N_0$ and
$$\max_{1 \le i \le N} (x_i - \bar x_N)'\, Q_N^{-1}\, (x_i - \bar x_N) \to 0 \quad \text{as } N \to \infty,$$
(ii) the scores satisfy the Noether condition, i.e.
$$\max_{1 \le i \le N} \frac{(a_N(i) - \bar a_N)^2}{\sum_{j=1}^N (a_N(j) - \bar a_N)^2} \to 0 \quad \text{as } N \to \infty,$$
(iii) for every $\varepsilon > 0$ and $k = 1, \dots, p$,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N \sum_{j=1}^N \delta_{N,ijk}^2\, I[|\delta_{N,ijk}| > \varepsilon] = 0,$$
where
$$\delta_{N,ijk} = \frac{(a_N(i) - \bar a_N)(x_{jk} - \bar x_{Nk})}{\Big[N^{-1} \sum_{l=1}^N (a_N(l) - \bar a_N)^2 \sum_{m=1}^N (x_{mk} - \bar x_{Nk})^2\Big]^{1/2}}, \quad k = 1, \dots, p, \ i, j = 1, \dots, N.$$
Then, under $H_0^{(1)}$, the criterion $\mathcal{S}_N$ in (1.5.2) has asymptotically the $\chi^2$ distribution with $p$ degrees of freedom.

Remark 1.5.1 We reject the hypothesis $H_0^{(1)}$ at the significance level $\alpha$ if $\mathcal{S}_N > \chi^2_p(1 - \alpha)$, where $\chi^2_p(1 - \alpha)$ is the $(1 - \alpha)$ quantile of the $\chi^2$ distribution with $p$ degrees of freedom.

Sketch of the proof. It suffices to show that under $H_0^{(1)}$ the asymptotic distribution of $S_N$ is $p$-dimensional normal with expectation 0 and dispersion matrix $A_N^2 Q_N$; then the quadratic form $\mathcal{S}_N$ has asymptotically the $\chi^2(p)$ distribution. To prove the asymptotic normality of $S_N$, we must prove that, for any vector $\lambda \in \mathbb{R}^p$, $\lambda \ne 0$, the scalar product $\lambda' S_N$ has asymptotically the normal distribution $\mathcal{N}(0, A_N^2\, \lambda' Q_N \lambda)$ (Cramér-Wold device). But
$$\lambda' S_N = \sum_{i=1}^N [\lambda'(x_i - \bar x_N)]\, a_N(R_{Ni}),$$
and its coefficients $\lambda'(x_i - \bar x_N)$ satisfy the Noether condition (1.4.6), because, by the Cauchy-Schwarz inequality $(\lambda' v)^2 \le (\lambda' Q_N \lambda)(v' Q_N^{-1} v)$ and by (i),
$$\max_{1 \le i \le N} \frac{[\lambda'(x_i - \bar x_N)]^2}{\sum_{j=1}^N [\lambda'(x_j - \bar x_N)]^2} = \max_{1 \le i \le N} \frac{\lambda'(x_i - \bar x_N)(x_i - \bar x_N)'\lambda}{\lambda' Q_N \lambda} \le \max_{1 \le i \le N} (x_i - \bar x_N)'\, Q_N^{-1}\, (x_i - \bar x_N) \to 0.$$
Moreover, one can show by some computation that the quantities
$$\delta_{N,ij}(\lambda) = \frac{\lambda'(x_i - \bar x_N)\,(a_N(j) - \bar a_N)}{\Big[N^{-1} \sum_{k=1}^N [\lambda'(x_k - \bar x_N)]^2 \sum_{l=1}^N (a_N(l) - \bar a_N)^2\Big]^{1/2}}$$
satisfy the Lindeberg condition (1.4.7). Then the asymptotic normality of the scalar product follows from Theorem 1.4.1 for every $\lambda \ne 0$. □

1.5.2 Rank tests for $H_0^{(2)}$

Consider again the model $Y_i = \beta_0 + x_i'\beta + e_i$, $i = 1, \dots, N$, and assume that the errors $e_i$ have a symmetric distribution function, $F(x) + F(-x) = 1$ for all $x$. Let $R_{N1}^+, \dots, R_{NN}^+$ be the ranks of $|Y_1|, \dots, |Y_N|$. Choose a score-generating function $\varphi^+: (0,1) \to [0, \infty)$ and the scores $a_N^+(1), \dots, a_N^+(N)$ generated by $\varphi^+$. Put $x_{i0} = 1$, $i = 1, \dots, N$, and for $j = 0, 1, \dots, p$ consider the signed-rank statistics
$$S_{N,j}^+ = \sum_{i=1}^N x_{ij}\, \operatorname{sign} Y_i\, a_N^+(R_{Ni}^+)$$
and the vector $S_N^+ = (S_{N,0}^+, S_{N,1}^+, \dots, S_{N,p}^+)'$. Then, under $H_0^{(2)}$,
$$\mathbb{E}\big(S_N^+ \mid H_0^{(2)}\big) = 0 \qquad \text{and} \qquad \mathbb{E}\big(S_N^+ S_N^{+\prime} \mid H_0^{(2)}\big) = (A_N^+)^2\, Q_N^*,$$
where $(A_N^+)^2 = \frac{1}{N} \sum_{i=1}^N [a_N^+(i)]^2$ and
$$Q_N^* = \sum_{i=1}^N x_i^* x_i^{*\prime} = \Big(\sum_{i=1}^N x_{ij} x_{ij'}\Big)_{j,j'=0,1,\dots,p}, \qquad x_i^* = (x_{i0}, x_{i1}, \dots, x_{ip})'.$$
The test criterion is the quadratic form
$$\mathcal{S}_N^+ = (A_N^+)^{-2}\, S_N^{+\prime}\, (Q_N^*)^{-1}\, S_N^+.$$
The distribution of $S_N^+$ (and hence of $\mathcal{S}_N^+$) is generated by the $N!\,2^N$ equally probable realizations of $(\operatorname{sign} Y_1, \dots, \operatorname{sign} Y_N)$ and $(R_{N1}^+, \dots, R_{NN}^+)$. The asymptotic distribution of $\mathcal{S}_N^+$ under $H_0^{(2)}$ is $\chi^2(p+1)$, provided
$$\max_{1 \le i \le N} x_i^{*\prime}\, (Q_N^*)^{-1}\, x_i^* \to 0 \quad \text{as } N \to \infty,$$
the scores $(a_N^+(1), \dots, a_N^+(N))$ satisfy the Noether condition (1.4.6), and the Lindeberg condition (1.4.7) holds for the mixed terms corresponding to $x_i^*$ and $a_N^+(i)$, analogously as for the regression line.

1.6 Rank estimation in simple linear regression models

1.6.1 Estimation of the slope of the regression line

Let $Y_1, \dots, Y_N$ be independent random variables, $Y_i$ having the distribution function
$$F_i(y) = F(y - \beta_0 - \beta(x_i - \bar x_N)), \quad i = 1, \dots, N,$$
where $F$ is continuous. We want to estimate the parameter $\beta$ with the aid of ranks. Denote
$$Y_i(b) = Y_i - (x_i - \bar x_N)\, b, \quad 1 \le i \le N, \ b \in \mathbb{R}^1.$$
Let $T_N(Y_1, \dots, Y_N)$ be a test statistic for testing $H_0: \beta = 0$ and assume that under $H_0$ the distribution of $T_N$ is symmetric about $\mu_N$, or that $\mathbb{E}_{H_0} T_N = \mu_N$. If $T_N(b) = T_N(Y_1(b), \dots, Y_N(b))$ is nonincreasing in $b \in \mathbb{R}^1$, then we can define the estimator of $\beta$ as
$$\hat\beta_N = \frac{1}{2}\big(\beta_N^* + \beta_N^{**}\big), \qquad (1.6.1)$$
$$\beta_N^* = \sup\{b : T_N(b) > \mu_N\}, \qquad \beta_N^{**} = \inf\{b : T_N(b) < \mu_N\}.$$
If $T_N = \sum_{i=1}^N (x_i - \bar x_N)(Y_i - \bar Y_N)$, then $\mu_N = 0$ and $T_N(b)$ is linear in $b$; the resulting estimator is the least-squares estimator of $\beta$.

Lemma 1.6.1 Let $T_N = S_N = \sum_{i=1}^N (x_i - \bar x_N)\, a_N(R_{Ni})$, where $a_N(1) \le \dots \le a_N(N)$ (not all equal) and $R_{Ni}$ is the rank of $Y_i$, $i = 1, \dots, N$. Then $S_N(b)$ is nonincreasing in $b$.
Proof. See Puri and Sen (1985).

The following lemma shows that $S_N$ is symmetrically distributed under some conditions.

Lemma 1.6.2 Let either
$$x_i - \bar x_N = \bar x_N - x_{N-i+1}, \quad i = 1, \dots, N, \qquad (1.6.2)$$
or
$$a_i - \bar a_N = \bar a_N - a_{N-i+1}, \quad i = 1, \dots, N. \qquad (1.6.3)$$
Then, if $\beta = 0$, the distribution of $S_N$ is symmetric about 0.
Proof. Let (1.6.2) hold. Because $(R_{N1}, \dots, R_{NN})$ has the same distribution as $(R_{NN}, \dots, R_{N1})$, $S_N$ has the same distribution as
$$S_N^* = \sum_{i=1}^N (x_i - \bar x_N)\, a_N(R_{N,N-i+1}) = -S_N.$$
Under (1.6.3) we proceed similarly, using that $(R_{N1}, \dots, R_{NN})$ has the same distribution as $(N + 1 - R_{N1}, \dots, N + 1 - R_{NN})$. □

Properties of $\hat\beta_N$:
1. $\hat\beta_N(Y_1 + x_1 b, \dots, Y_N + x_N b) = \hat\beta_N(Y_1, \dots, Y_N) + b$ for all $b \in \mathbb{R}^1$.
2. $\hat\beta_N(cY_1, \dots, cY_N) = c\, \hat\beta_N(Y_1, \dots, Y_N)$ for all $c > 0$.
3. $\mathbb{P}(\beta_N^{**} < a) \le \mathbb{P}(S_N(a) < \mu_N) \le \mathbb{P}(S_N(a) \le \mu_N) \le \mathbb{P}(\beta_N^* \le a)$ for all $a \in \mathbb{R}^1$.

Asymptotic normality of $\hat\beta_N$:

Theorem 1.6.1 Assume that $\{x_{N1}, \dots, x_{NN}\}$ satisfy the conditions
$$0 < \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N (x_{Ni} - \bar x_N)^2 = C_0^2 < \infty, \qquad \max_{1 \le i \le N} \frac{1}{N} (x_{Ni} - \bar x_N)^2 \to 0 \quad \text{as } N \to \infty. \qquad (1.6.4)$$
Let $a_N(i) = \mathbb{E}\,\varphi(U_{N:i})$ or $a_N(i) = \varphi\big(\frac{i}{N+1}\big)$, $i = 1, \dots, N$, where $\varphi$ is nondecreasing on $(0,1)$ and
$$A^2 = \int_0^1 \varphi^2(u)\,du < \infty, \qquad \int_0^1 \varphi(u)\,du = 0.$$
Let $F$ have finite Fisher information, i.e. $I(f) = \int_0^1 \psi^2(u)\,du < \infty$, where
$$\psi(u) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}, \quad 0 < u < 1.$$
Then $\big\{N^{1/2}(\hat\beta_N - \beta)\big\}_{N=1}^\infty$ is asymptotically normally distributed
$$\mathcal{N}\left(0, \frac{A^2}{C_0^2\, \gamma^2(\varphi, F)}\right), \qquad \gamma(\varphi, F) = \int_0^1 \varphi(u)\,\psi(u)\,du.$$

1.6.2 Estimation in multiple regression model

Let $Y_1, \dots, Y_N$ be independent observations, $Y_i$ having the distribution function $F_i(y) = F(y - \beta_0 - (x_i - \bar x_N)'\beta)$, $x_i \in \mathbb{R}^p$, $1 \le i \le N$. Consider the (vector) linear rank statistic
$$S_N(b) = \sum_{i=1}^N (x_i - \bar x_N)\, a_N(R_{Ni}(b)) = (S_{N1}(b), \dots, S_{Np}(b))',$$
where $R_{Ni}(b)$ is the rank of $Y_i - x_i' b$, $i = 1, \dots, N$, and the scores are nondecreasing. Obviously $\mathbb{E}_\beta\, S_N(\beta) = 0$. Define
$$D_N = \Big\{b \in \mathbb{R}^p : \|S_N(b)\| = \min_{t \in \mathbb{R}^p} \|S_N(t)\|\Big\},$$
where $\|\cdot\|$ is either the $L_1$ or the $L_2$ norm. If $D_N$ is a convex set, then we can define the center of gravity of $D_N$ as an estimator $\hat\beta_N$ of $\beta$.
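The estimator (1.6.1) can be computed exactly because $S_N(b)$ is a nonincreasing step function of $b$ (Lemma 1.6.1) that can jump only where two residuals $Y_i(b)$ tie, i.e. at the pairwise slopes $(Y_j - Y_i)/(x_j - x_i)$. The sketch below is a hypothetical helper using the Wilcoxon scores $a_N(i) = i/(N+1) - \frac12$, which satisfy (1.6.3), so that $\mu_N = 0$. It evaluates $S_N$ once inside each interval between consecutive pairwise slopes and reads off $\beta_N^*$ and $\beta_N^{**}$. It assumes no ties among the observations and not all $x_i$ equal; the cost is $O(N^4)$, which is fine only for small $N$.

```python
def slope_r_estimate(x, y):
    """R-estimator (1.6.1) of the slope with Wilcoxon scores a_N(i) = i/(N+1) - 1/2 (mu_N = 0)."""
    N = len(x)
    xbar = sum(x) / N
    a = [i / (N + 1) - 0.5 for i in range(1, N + 1)]

    def S(b):
        # S_N(b) = sum (x_i - xbar) a_N(rank of Y_i - (x_i - xbar) b)
        res = [yi - (xi - xbar) * b for xi, yi in zip(x, y)]
        ranks = [sum(1 for w in res if w <= v) for v in res]
        return sum((xi - xbar) * a[ri - 1] for xi, ri in zip(x, ranks))

    # S_N(b) can only jump where two residuals tie, i.e. at the pairwise slopes
    t = sorted({(y[j] - y[i]) / (x[j] - x[i])
                for i in range(N) for j in range(i + 1, N) if x[j] != x[i]})
    # one interior point of each interval (-inf, t_1), (t_1, t_2), ..., (t_m, inf)
    probes = [t[0] - 1.0] + [(u + v) / 2 for u, v in zip(t, t[1:])] + [t[-1] + 1.0]
    values = [S(b) for b in probes]
    # beta* = sup{b : S(b) > 0}: right end of the last interval on which S > 0
    j = max(k for k, v in enumerate(values) if v > 0)
    beta_star = t[j]
    # beta** = inf{b : S(b) < 0}: left end of the first interval on which S < 0
    k = min(k for k, v in enumerate(values) if v < 0)
    beta_2star = t[k - 1]
    return (beta_star + beta_2star) / 2
```

Properties 1 and 2 of $\hat\beta_N$ (equivariance under $Y_i \mapsto Y_i + x_i b$ and under $Y_i \mapsto cY_i$, $c > 0$) can be checked numerically with this function.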
Assume that the $x_{Ni}$ satisfy the (Noether) condition
$$\max_{1 \le i \le N} (x_{Ni} - \bar x_N)'\, Q_N^{-1}\, (x_{Ni} - \bar x_N) \to 0 \quad \text{as } N \to \infty,$$
where $Q_N = \sum_{i=1}^N (x_{Ni} - \bar x_N)(x_{Ni} - \bar x_N)'$. If $F$ has finite Fisher information, then $N^{1/2}(\hat\beta_N - \beta)$ is asymptotically normally distributed
$$\mathcal{N}_p\left(0, \frac{A^2}{\gamma^2(\varphi, F)} \Big(\frac{1}{N} Q_N\Big)^{-1}\right).$$

1.7 Aligned rank tests about the intercept

1.7.1 Regression line

Let $Y_1, \dots, Y_N$ be independent, $Y_i$ having the distribution function
$$F_i(y) = \mathbb{P}(Y_i \le y) = F(y - \beta_0 - \beta(x_i - \bar x_N)), \quad 1 \le i \le N, \ y \in \mathbb{R}.$$
Consider the hypothesis
$$H_0: \beta_0 = 0 \quad \text{versus} \quad K_+: \beta_0 > 0 \quad \text{or} \quad K: \beta_0 \ne 0,$$
where $\beta$ is treated as a nuisance parameter. If $\beta \ne 0$, then $Y_1, \dots, Y_N$ are not identically distributed, and we cannot use their ranks. If we have an estimate $\hat\beta_N$ of $\beta$, we can consider the ranks of the absolute residuals $|Y_i - (x_i - \bar x_N)\hat\beta_N|$, $i = 1, \dots, N$ (aligned ranks), and an (aligned) signed rank statistic based on them. Under some conditions such a statistic is asymptotically distribution-free, i.e. under the hypothesis $H_0: \beta_0 = 0$ its asymptotic distribution does not depend on $F$.

Let $\hat\beta_N$ be the rank estimate (1.6.1) based on the linear rank statistic $\sum_{i=1}^N (x_i - \bar x_N)\, a_N(R_{Ni}(b))$, $b \in \mathbb{R}^1$. Put
$$\hat Y_i = Y_i - (x_i - \bar x_N)\,\hat\beta_N, \quad i = 1, \dots, N,$$
and consider the aligned signed rank statistic
$$\hat S_N = \sum_{i=1}^N \operatorname{sign} \hat Y_i\; a_N^+(\hat R_{Ni}^+),$$
where $\hat R_{Ni}^+$ is the rank of $|\hat Y_i| = |Y_i - (x_i - \bar x_N)\hat\beta_N|$ among $|\hat Y_1|, \dots, |\hat Y_N|$. The test criterion for $H_0$ is
$$T_N = N^{-1/2}\, \frac{\hat S_N}{A_N^+}, \qquad (A_N^+)^2 = \frac{1}{N} \sum_{i=1}^N (a_N^+(i))^2.$$
We reject $H_0$ in favor of $K_+$ if $T_N > k_+$, and in favor of $K$ if $|T_N| > k$. The critical values $k_+$ and $k$ are determined from the asymptotic normal distribution of $T_N$.

Theorem 1.7.1 Assume that
(i) $F$ is symmetric about 0 and has an absolutely continuous density $f$ with finite and positive Fisher information,
$$0 < I(f) = \int \left(\frac{f'(z)}{f(z)}\right)^2 dF(z) < \infty;$$
(ii) $\frac{1}{N} \sum_{i=1}^N (x_i - \bar x_N)^2 \to C^2$, $0 < C < \infty$, and $\frac{1}{N} \max_{1 \le i \le N} (x_i - \bar x_N)^2 \to 0$ as $N \to \infty$;
(iii) $\varphi(t)$ is nondecreasing, $\varphi(1 - t) = -\varphi(t)$, $t \in (0,1)$, and $0 < A^2(\varphi) = \int_0^1 \varphi^2(t)\,dt < \infty$.
Put $\varphi^+(u) = \varphi\big(\frac{u+1}{2}\big)$, $0 < u < 1$, and $a_N^+(i) = \mathbb{E}\,\varphi^+(U_{N:i})$ or $a_N^+(i) = \varphi^+\big(\frac{i}{N+1}\big)$, $i = 1, \dots, N$.
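The aligned signed rank criterion $T_N$ can be sketched as follows, with $\varphi(u) = u - \frac12$ (which satisfies $\varphi(1 - t) = -\varphi(t)$), so that $\varphi^+(u) = \varphi((u+1)/2) = u/2$ and $a_N^+(i) = i/(2(N+1))$. The slope estimate $\hat\beta_N$ is passed in as an argument (for instance the estimate (1.6.1)); the function name is hypothetical and ties among the $|\hat Y_i|$ are assumed away.

```python
from math import sqrt


def aligned_signed_rank_T(x, y, beta_hat):
    """T_N = N^{-1/2} S_N / A_N^+ for H0: beta_0 = 0, given a slope estimate beta_hat."""
    N = len(y)
    xbar = sum(x) / N
    res = [yi - (xi - xbar) * beta_hat for xi, yi in zip(x, y)]   # aligned residuals
    abs_res = [abs(v) for v in res]
    r_plus = [sum(1 for w in abs_res if w <= v) for v in abs_res]  # ranks of |residuals|
    a_plus = [i / (2 * (N + 1)) for i in range(1, N + 1)]          # phi^+(i/(N+1)) with phi(u) = u - 1/2
    S = sum((1 if v > 0 else -1 if v < 0 else 0) * a_plus[r - 1] for v, r in zip(res, r_plus))
    A_plus = sqrt(sum(s * s for s in a_plus) / N)                  # A_N^+
    return S / (sqrt(N) * A_plus)
```

The asymptotic test rejects $H_0$ in favor of $K_+$ at level $\alpha$ when the returned value is at least $\Phi^{-1}(1 - \alpha)$, which can be obtained as `NormalDist().inv_cdf(1 - alpha)` from Python's `statistics` module.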
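Then, under $H_0: \beta_0 = 0$, the criterion $T_N$ has asymptotically the normal distribution with mean 0 and variance 1.

Sketch of the proof. Because $\lim_{N \to \infty} (A_N^+)^2 = A^2(\varphi)$ and $N^{1/2}(\hat\beta_N - \beta) = O_p(1)$, it can be proved (not elementarily) that under $H_0$
$$N^{-1/2}\big[\hat S_N - S_N(\beta)\big] \xrightarrow{p} 0 \quad \text{as } N \to \infty, \qquad (1.7.5)$$
where
$$S_N(\beta) = \sum_{i=1}^N \operatorname{sign}(Y_i(\beta))\; a_N^+(R_{Ni}^+(\beta)), \qquad Y_i(\beta) = Y_i - (x_i - \bar x_N)\beta,$$
and $R_{Ni}^+(\beta)$ is the rank of $|Y_i(\beta)|$ among $|Y_1(\beta)|, \dots, |Y_N(\beta)|$, $1 \le i \le N$. Under $H_0$, the variables $Y_i(\beta) = Y_i - (x_i - \bar x_N)\beta$ are independent and identically distributed with distribution function $F$ symmetric about 0. It was shown earlier that $N^{-1/2} S_N(\beta) \xrightarrow{d} \mathcal{N}(0, A^2(\varphi))$; hence, by (1.7.5), also $N^{-1/2} \hat S_N \xrightarrow{d} \mathcal{N}(0, A^2(\varphi))$. □

Remark 1.7.1 We reject $H_0$ in favor of $K_+$ at the asymptotic significance level $\alpha$ provided $T_N \ge \Phi^{-1}(1 - \alpha)$, and we reject $H_0$ in favor of $K$ provided $|T_N| \ge \Phi^{-1}\big(1 - \frac{\alpha}{2}\big)$.

Powers of the tests against local alternatives: the tests are consistent in the sense that their powers tend to 1 as $\beta_0 \to \infty$ (or $|\beta_0| \to \infty$). More important, however, is the power against alternatives close to the hypothesis, namely
$$K_{1N}: \beta_0 = N^{-1/2}\lambda, \quad \lambda \ne 0 \text{ fixed}.$$
Such alternatives are contiguous in the sense of Le Cam and Hájek, and it can be shown that the approximation (1.7.5) holds not only under the hypothesis but also under $K_{1N}$. Hence $N^{-1/2}\hat S_N$ has the same asymptotic distribution as $N^{-1/2} S_N(\beta)$ also under $K_{1N}$. Denote $\tau_\alpha = \Phi^{-1}(1 - \alpha)$, $0 < \alpha < 1$. The asymptotic power of the one-sided aligned rank test is
$$\mathbb{P}\{T_N \ge \tau_\alpha \mid K_{1N}\} \to 1 - \Phi\left(\tau_\alpha - \frac{\lambda}{A(\varphi)} \int_0^1 \varphi(u)\,\psi(u)\,du\right),$$
with $\psi$ as in Theorem 1.6.1.

Comparison with the classical test of $H_0$: the least-squares estimator of $\beta_0$ is $\tilde\beta_{0N} = \bar Y_N = \frac{1}{N}\sum_{i=1}^N Y_i$, and the likelihood ratio statistic is
$$L_N = \frac{\sqrt{N}\,\bar Y_N}{s_N}, \qquad s_N^2 = \frac{1}{N-2} \sum_{i=1}^N \big[Y_i - \bar Y_N - (x_i - \bar x_N)\tilde\beta_N\big]^2, \qquad \tilde\beta_N = \frac{\sum_{i=1}^N (x_i - \bar x_N)(Y_i - \bar Y_N)}{\sum_{i=1}^N (x_i - \bar x_N)^2}.$$
If $\sigma^2 = \int z^2\,dF(z) < \infty$, then $s_N^2 \xrightarrow{p} \sigma^2$, $\bar Y_N \xrightarrow{p} \beta_0$ and $\tilde\beta_N \xrightarrow{p} \beta$ as $N \to \infty$. Under $H_0: \beta_0 = 0$, the statistic $L_N$ is asymptotically $\mathcal{N}(0,1)$. The asymptotic relative efficiency of the aligned signed rank test with respect to the likelihood ratio test is
$$\frac{\sigma^2 \left(\int_0^1 \varphi(u)\,\psi(u)\,du\right)^2}{\int_0^1 \varphi^2(u)\,du} \le \sigma^2 I(f),$$
where the inequality follows from the Cauchy-Schwarz inequality and $\int_0^1 \psi^2(u)\,du = I(f)$.

1.7.2 Multiple regression model

Let $Y_1, \dots, Y_N$ be independent with distribution functions $F_1, \dots, F_N$ such that
$$F_i(y) = \mathbb{P}(Y_i \le y) = F(y - \beta_0 - (x_i - \bar x_N)'\beta), \quad 1 \le i \le N, \ y \in \mathbb{R}^1, \ \beta \in \mathbb{R}^p.$$
We want to test the hypothesis
$$H_1: \beta_0 = 0 \quad \text{versus} \quad K_1^+: \beta_0 > 0 \quad \text{or} \quad K_1: \beta_0 \ne 0,$$
where $\beta$ is unspecified. We may also partition $\beta$ as $\beta = (\beta_1', \beta_2')'$, where $\beta_1 \in \mathbb{R}^{p_1}$, $\beta_2 \in \mathbb{R}^{p_2}$, $p_1 + p_2 = p$, and test the hypothesis
$$H_2: \beta_2 = 0 \quad \text{versus} \quad \beta_2 \ne 0,$$
where $\beta_0$ and $\beta_1$ are unspecified.

Test of $H_1$: Let $\hat\beta_N$ be an estimator of $\beta$. Consider the residuals $\hat Y_i = Y_i - (x_i - \bar x_N)'\hat\beta_N$, $i = 1, \dots, N$, and the (aligned) ranks $\hat R_{N1}^+, \dots, \hat R_{NN}^+$ of $|\hat Y_1|, \dots, |\hat Y_N|$. As in the case of the regression line, the test is based on the aligned signed rank statistic
$$\hat S_N = \sum_{i=1}^N \operatorname{sign}(\hat Y_i)\; a_N^+(\hat R_{Ni}^+),$$
and the test criterion is
$$T_N^2 = \frac{\hat S_N^2}{N\,(A_N^+)^2}, \qquad (A_N^+)^2 = \frac{1}{N} \sum_{i=1}^N (a_N^+(i))^2.$$
$T_N^2$ has asymptotically the $\chi^2$ distribution with 1 degree of freedom.

References
J. Hájek and Z. Šidák (1967): Theory of Rank Tests. Academia, Prague & Academic Press, New York.
J. Hájek, Z. Šidák and P. K. Sen (2000): Theory of Rank Tests (2nd edition). Academic Press, New York.
M. L. Puri and P. K. Sen (1985): Nonparametric Methods in General Linear Models. J. Wiley, New York.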