ROBUST AND NONPARAMETRIC METHODS
Jana Jurečkova
Contents
1    Rank tests in linear regression model
1.1    Properties of ranks and order statistics
1.1.1     The distribution of $X_{(\cdot)}$ and of $R$
1.1.2     Marginal distributions of the random vectors $R$ and $X_{(\cdot)}$ under $H_0$
1.2    Locally most powerful rank tests
1.3    Structure of the locally most powerful rank tests of $H_0$
1.3.1     Special cases
1.4    Rank tests for simple regression model with nonrandom regressors
1.4.1     Rank tests for $H_0^{(1)}$
1.4.2     Rank tests for $H_0^{(2)}$
1.4.3     Example
1.5    Rank tests for some multiple linear regression models
1.5.1     Rank tests for $H_0^{(1)}$
1.5.2     Rank tests for $H_0^{(2)}$
1.6    Rank estimation in simple linear regression models
1.6.1     Estimation of the slope $\beta$ of the regression line
1.6.2     Estimation in multiple regression model
1.7    Aligned rank tests about the intercept
1.7.1     Regression line
1.7.2     Multiple regression model
Chapter 1
Rank tests in linear regression model
1.1    Properties of ranks and order statistics
Let $X = (X_1, \ldots, X_n)$ be the vector of observations; denote by $X_{n:1} \le X_{n:2} \le \ldots \le X_{n:n}$ the components of $X$ ordered according to increasing magnitude. The vector $X_{(\cdot)} = (X_{n:1}, \ldots, X_{n:n})$ is called the vector of order statistics and $X_{n:i}$ is called the $i$th order statistic.
Assume that the components of $X$ are different and define the rank of $X_i$ as $R_i = \sum_{j=1}^n I[X_j \le X_i]$. Then the vector $R$ of ranks of $X$ takes on values in the set $\mathcal{R}$ of $n!$ permutations $(r_1, \ldots, r_n)$ of $(1, \ldots, n)$.
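The two definitions above can be sketched in a few lines of Python. This is only an illustration under the stated assumption of no ties; the function names `order_statistics` and `ranks` are hypothetical and not part of the text.

```python
def order_statistics(x):
    """Return the sample in increasing order: (X_{n:1}, ..., X_{n:n})."""
    return sorted(x)


def ranks(x):
    """Rank of x[i] = number of observations not exceeding x[i].

    Assumes all components are different, as in the text.
    """
    return [sum(1 for xj in x if xj <= xi) for xi in x]


x = [2.4, -0.3, 5.1, 0.8]
print(order_statistics(x))  # [-0.3, 0.8, 2.4, 5.1]
print(ranks(x))             # [3, 1, 4, 2]
```

The quadratic-time rank computation mirrors the defining sum directly; for large samples a sort-based ranking would be the usual choice.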
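1.1.1     The distribution of $X_{(\cdot)}$ and of $R$
Lemma 1.1.1 If $X$ has density $p(x_1, \ldots, x_n)$, then
(i) the vector $X_{(\cdot)}$ of order statistics has the distribution with the density
$$\bar p(x_{n:1}, \ldots, x_{n:n}) = \begin{cases} \sum_{r \in \mathcal{R}} p(x_{n:r_1}, \ldots, x_{n:r_n}) & \text{if } x_{n:1} \le \ldots \le x_{n:n}, \\ 0 & \text{otherwise;} \end{cases}$$
(ii) the conditional distribution of $R$ given $X_{(\cdot)} = x_{(\cdot)}$ has the form
$$P(R = r \mid X_{(\cdot)} = x_{(\cdot)}) = \frac{p(x_{n:r_1}, \ldots, x_{n:r_n})}{\bar p(x_{n:1}, \ldots, x_{n:n})}$$
for any $r \in \mathcal{R}$ and any $x_{n:1} < \ldots < x_{n:n}$.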
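Proof. For any Borel set $B \subset \{x : x_1 \le \ldots \le x_n\}$ we have
$$P(X_{(\cdot)} \in B) = \sum_{r \in \mathcal{R}} P(X_{(\cdot)} \in B, R = r) = \sum_{r \in \mathcal{R}} \int \cdots \int_{x_{(\cdot)} \in B,\, R = r} p(x_1, \ldots, x_n)\, dx_1 \ldots dx_n$$
$$= \sum_{r \in \mathcal{R}} \int \cdots \int_B p(x_{n:r_1}, \ldots, x_{n:r_n})\, dx_{n:1} \ldots dx_{n:n} = \int \cdots \int_B \bar p(x_{n:1}, \ldots, x_{n:n})\, dx_{n:1} \ldots dx_{n:n},$$
which proves (i). Similarly,
$$P(X_{(\cdot)} \in B, R = r) = \int \cdots \int_B p(x_{n:r_1}, \ldots, x_{n:r_n})\, dx_{n:1} \ldots dx_{n:n}$$
$$= \int \cdots \int_B \frac{p(x_{n:r_1}, \ldots, x_{n:r_n})}{\bar p(x_{n:1}, \ldots, x_{n:n})}\, \bar p(x_{n:1}, \ldots, x_{n:n})\, dx_{n:1} \ldots dx_{n:n}$$
$$= \int \cdots \int_B P(R = r \mid X_{(\cdot)} = x_{(\cdot)})\, \bar p(x_{n:1}, \ldots, x_{n:n})\, dx_{n:1} \ldots dx_{n:n},$$
which proves (ii). □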
We say that the random vector $X$ satisfies the hypothesis of randomness $H_0$ if it has a probability distribution with density of the form
$$p(x) = \prod_{i=1}^n f(x_i), \quad x \in \mathbb{R}^n,$$
where $f$ is an arbitrary one-dimensional density. In other words, $X$ satisfies the hypothesis of randomness provided its components are a random sample from an absolutely continuous distribution. We say that the random vector $X$ satisfies the hypothesis of exchangeability $H_*$ if
$$p(x_1, \ldots, x_n) = p(x_{r_1}, \ldots, x_{r_n})$$
for every permutation $(r_1, \ldots, r_n)$ of $1, \ldots, n$. If $X$ satisfies $H_0$, then it obviously satisfies $H_*$. The following Lemma follows from Lemma 1.1.1.
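Lemma 1.1.2 If $X$ satisfies $H_0$ or $H_*$, then $X_{(\cdot)}$ and $R$ are independent, the vector of ranks $R$ has the uniform discrete distribution
$$P(R = r) = \frac{1}{n!}, \quad r \in \mathcal{R},$$
and the distribution of $X_{(\cdot)}$ has the density
$$\bar p(x_{n:1}, \ldots, x_{n:n}) = \begin{cases} n!\, p(x_{n:1}, \ldots, x_{n:n}) & \text{if } x_{n:1} \le \ldots \le x_{n:n}, \\ 0 & \text{otherwise.} \end{cases}$$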
1.1.2     Marginal distributions of the random vectors $R$ and $X_{(\cdot)}$ under $H_0$
Lemma 1.1.3 Let $X$ satisfy the hypothesis $H_0$. Then
(i) $P(R_i = j) = \frac{1}{n}$ for all $i, j = 1, \ldots, n$.
(ii) $P(R_i = k, R_j = m) = \frac{1}{n(n-1)}$ for $1 \le i, j, k, m \le n$, $i \ne j$, $k \ne m$.
(iii) $E R_i = \frac{n+1}{2}$, $i = 1, \ldots, n$.
(iv) $\operatorname{var} R_i = \frac{n^2 - 1}{12}$, $i = 1, \ldots, n$.
(v) $\operatorname{cov}(R_i, R_j) = -\frac{n+1}{12}$, $1 \le i, j \le n$, $i \ne j$.
(vi) If $X$ has density $p(x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i)$, then $X_{n:k}$ has the distribution with density
$$f_{(n)}^{k}(x) = n \binom{n-1}{k-1} (F(x))^{k-1} (1 - F(x))^{n-k} f(x), \quad x \in \mathbb{R}^1,$$
where $F(x)$ is the distribution function of $X_1, \ldots, X_n$.
(vii) If $X$ has the uniform $R[0,1]$ distribution, then $X_{n:i}$ has the beta $B(i, n - i + 1)$ distribution with the expectation and variance
$$E X_{n:i} = \frac{i}{n+1}, \quad \operatorname{Var} X_{n:i} = \frac{i(n - i + 1)}{(n+1)^2 (n+2)}.$$
Proof. The lemma follows immediately from Lemma 1.1.2. □
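As a sanity check of the rank moments, the uniform distribution on permutations can be enumerated exactly for a small $n$. The sketch below assumes $n = 5$ and uses exact rational arithmetic; the helper name `rank_moments` is hypothetical. It compares the enumerated values with mean $(n+1)/2$, variance $(n^2-1)/12$ and covariance $-(n+1)/12$.

```python
from fractions import Fraction
from itertools import permutations


def rank_moments(n):
    """Exact E R_1, var R_1 and cov(R_1, R_2) when R is uniform on all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    m = Fraction(len(perms))
    mean = sum(Fraction(r[0]) for r in perms) / m
    var = sum((r[0] - mean) ** 2 for r in perms) / m
    cov = sum((r[0] - mean) * (r[1] - mean) for r in perms) / m
    return mean, var, cov


n = 5
mean, var, cov = rank_moments(n)
print(mean, var, cov)  # 3 2 -1/2
print(mean == Fraction(n + 1, 2),
      var == Fraction(n * n - 1, 12),
      cov == -Fraction(n + 1, 12))  # True True True
```

Enumeration is feasible only for small $n$ (there are $n!$ permutations), which is why the closed-form moments matter in practice.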
1.2    Locally most powerful rank tests
We want to test the hypothesis of randomness $H_0$ on the distribution of $X$. A rank test is characterized by a test function $\Phi(R)$. The most powerful rank $\alpha$-test of $H_0$ against a simple alternative $K : \{Q\}$ [that $X$ has the fixed distribution $Q$] follows directly from the Neyman-Pearson Lemma:
$$\Phi(r) = \begin{cases} 1 & \ldots\ n!\, Q(R = r) > k_\alpha \\ 0 & \ldots\ n!\, Q(R = r) < k_\alpha \\ \gamma & \ldots\ n!\, Q(R = r) = k_\alpha, \quad r \in \mathcal{R} \end{cases}$$
where $k_\alpha$ and $\gamma$ are determined so that
$$\#\{r : n!\, Q(R = r) > k_\alpha\} + \gamma\, \#\{r : n!\, Q(R = r) = k_\alpha\} = n!\, \alpha, \quad 0 < \alpha < 1.$$
If we want to test against a composite alternative and the uniformly most powerful rank tests do not exist, then we look for a rank test, most powerful locally in a neighborhood of the hypothesis.
Definition 1.2.1 Let $d(Q)$ be a measure of distance of the alternative $Q \in K$ from the hypothesis $H$. The $\alpha$-test $\Phi_0$ is called locally most powerful in the class $\mathcal{M}$ of $\alpha$-tests of $H$ against $K$ if, given any other test $\Phi \in \mathcal{M}$, there exists $\varepsilon > 0$ such that the power functions of $\Phi_0$ and $\Phi$ satisfy the inequality
$$\beta_{\Phi_0}(Q) \ge \beta_{\Phi}(Q) \quad \forall Q \ \text{satisfying} \ 0 < d(Q) < \varepsilon.$$
1.3     Structure of the locally most powerful rank tests of $H_0$
Theorem 1.3.1 Let $A$ be a class of densities, $A = \{g(x, \theta) : \theta \in J\}$, such that
$J \subset \mathbb{R}^1$ is an open interval, $J \ni 0$;
$g(x, \theta)$ is absolutely continuous in $\theta$ for almost all $x$.
Moreover, for almost all $x$ let there exist the limit
$$\dot g(x, 0) = \lim_{\theta \to 0} \frac{1}{\theta} [g(x, \theta) - g(x, 0)]$$
and let
$$\lim_{\theta \to 0} \int_{-\infty}^{\infty} |\dot g(x, \theta)|\, dx = \int_{-\infty}^{\infty} |\dot g(x, 0)|\, dx.$$
Consider the alternative $K = \{q_\Delta : \Delta > 0\}$, where
$$q_\Delta(x_1, \ldots, x_n) = \prod_{i=1}^n g(x_i, \Delta c_i),$$
$c_1, \ldots, c_n$ given numbers. Then the test with the critical region
$$\sum_{i=1}^n c_i a_n(R_i, g) \ge k$$
is the locally most powerful rank test of $H_0$ against $K$ on the significance level $\alpha = P(\sum_{i=1}^n c_i a_n(R_i, g) \ge k)$, where $P$ is any distribution satisfying $H_0$;
$$a_n(i, g) = E\left[\frac{\dot g(X_{n:i}, 0)}{g(X_{n:i}, 0)}\right], \quad i = 1, \ldots, n,$$
are the scores, where $X_{n:1}, \ldots, X_{n:n}$ are the order statistics corresponding to the random sample of size $n$ from the population with the density $g(x, 0)$.
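Proof. If $Q_\Delta$ is the probability distribution with the density $q_\Delta$, then, for any permutation $r \in \mathcal{R}$,
$$\lim_{\Delta \to 0} \frac{1}{\Delta} [n!\, Q_\Delta(R = r) - 1] = \sum_{i=1}^n c_i a_n(r_i, g). \qquad (1.3.1)$$
If (1.3.1) is true, then there exists an $\varepsilon > 0$ such that
$$\sum_{i=1}^n c_i a_n(r_i, g) > \sum_{i=1}^n c_i a_n(r'_i, g) \implies Q_\Delta(R = r) > Q_\Delta(R = r')$$
for all $\Delta \in (0, \varepsilon)$ and for different $r, r' \in \mathcal{R}$; then we reject $H_0$ for those $r \in \mathcal{R}$ such that $\sum_{i=1}^n c_i a_n(r_i, g) \ge k$ for a suitable $k$. So we must prove (1.3.1), which we shall do as follows. We can write
$$\frac{1}{\Delta} [Q_\Delta(R = r) - Q_0(R = r)] = \int \cdots \int_{R = r} \frac{1}{\Delta} \left[\prod_{i=1}^n g(x_i, \Delta c_i) - \prod_{i=1}^n g(x_i, 0)\right] dx_1 \ldots dx_n$$
$$= \sum_{i=1}^n \int \cdots \int_{R = r} \frac{1}{\Delta} (g(x_i, \Delta c_i) - g(x_i, 0)) \prod_{j=1}^{i-1} g(x_j, \Delta c_j) \prod_{k=i+1}^{n} g(x_k, 0)\, dx_1 \ldots dx_n,$$
where we used the identity
$$\prod_{i=1}^n A_i - \prod_{i=1}^n B_i = \sum_{i=1}^n (A_i - B_i) \prod_{j=1}^{i-1} A_j \prod_{k=i+1}^{n} B_k.$$
If $c_i > 0$, then
$$\limsup_{\Delta \to 0} \int \cdots \int_{R = r} \left|\frac{1}{\Delta} (g(x_i, \Delta c_i) - g(x_i, 0))\right| \prod_{j=1}^{i-1} g(x_j, \Delta c_j) \prod_{k=i+1}^{n} g(x_k, 0)\, dx_1 \ldots dx_n \le c_i \int \cdots \int_{R = r} |\dot g(x_i, 0)| \prod_{j \ne i} g(x_j, 0)\, dx_1 \ldots dx_n,$$
and analogously for $c_i < 0$. This, combined with the Fatou lemma, leads to
$$\lim_{\Delta \to 0} \sum_{i=1}^n \int \cdots \int_{R = r} \frac{1}{\Delta} (g(x_i, \Delta c_i) - g(x_i, 0)) \prod_{j=1}^{i-1} g(x_j, \Delta c_j) \prod_{k=i+1}^{n} g(x_k, 0)\, dx_1 \ldots dx_n = \sum_{i=1}^n \int \cdots \int_{R = r} c_i\, \dot g(x_i, 0) \prod_{j \ne i} g(x_j, 0)\, dx_1 \ldots dx_n$$
$$= \sum_{i=1}^n c_i \int \cdots \int_{R = r} \frac{\dot g(x_i, 0)}{g(x_i, 0)} \prod_{j=1}^{n} g(x_j, 0)\, dx_1 \ldots dx_n = \frac{1}{n!} \sum_{i=1}^n c_i\, E\left[\frac{\dot g(X_i, 0)}{g(X_i, 0)} \,\Big|\, R = r\right] = \frac{1}{n!} \sum_{i=1}^n c_i a_n(r_i, g),$$
regarding that $g(x, 0) = 0$ and $\dot g(x, 0) \ne 0$ can happen simultaneously only on a set of measure 0. Since $Q_0(R = r) = 1/n!$, this implies (1.3.1). □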
1.3.1     Special cases
I. Two-sample alternative of the shift in location: $K_1 : \{q_\Delta : \Delta > 0\}$ where
$$q_\Delta(x_1, \ldots, x_N) = \prod_{i=1}^{m} f(x_i) \prod_{i=m+1}^{N} f(x_i - \Delta)$$
with $f$ being a fixed absolutely continuous density such that $\int_{-\infty}^{\infty} |f'(x)|\, dx < \infty$. Then the locally most powerful rank $\alpha$-test of $H_0$ against $K_1$ has the critical region
$$\sum_{i=m+1}^{N} a_N(R_i, f) \ge k$$
where $k$ satisfies the condition $P(\sum_{i=m+1}^{N} a_N(R_i, f) \ge k) = \alpha$, $P \in H_0$, and
$$a_N(i, f) = E\left[-\frac{f'(X_{N:i})}{f(X_{N:i})}\right], \quad i = 1, \ldots, N,$$
where $X_{N:1} < \ldots < X_{N:N}$ are the order statistics corresponding to the sample of size $N$ from the distribution with the density $f$. The scores may also be written as
$$a_N(i, f) = E \varphi(U_{N:i}, f), \quad i = 1, \ldots, N,$$
where $\varphi(u, f) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}$, $0 < u < 1$, and $U_{N:1}, \ldots, U_{N:N}$ are the order statistics corresponding to the sample of size $N$ from the uniform $R(0,1)$ distribution. Another form of the scores is
$$a_N(i, f) = -N \binom{N-1}{i-1} \int_{-\infty}^{\infty} f'(x) F^{i-1}(x) (1 - F(x))^{N-i}\, dx.$$
Remark 1.3.1 The computation of the scores is difficult for some densities; if no tables of the scores are available, they are often replaced by the approximate scores
$$a_N(i, f) = \varphi\left(\frac{i}{N+1}, f\right) = \varphi(E U_{N:i}, f), \quad i = 1, \ldots, N.$$
The asymptotic critical values coincide for both types of scores.
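The difference between exact scores $E\varphi(U_{N:i})$ and approximate scores $\varphi(i/(N+1))$ can be illustrated numerically. The sketch below is an assumption-laden illustration, not the author's procedure: it integrates $\varphi$ against the Beta$(i, N-i+1)$ density of $U_{N:i}$ with a plain midpoint rule, and the names `exact_scores`, `approximate_scores`, `wilcoxon` and `normal` are hypothetical. It uses $\varphi(u) = 2u - 1$ (logistic density) and $\varphi(u) = \Phi^{-1}(u)$ (normal density).

```python
from math import comb
from statistics import NormalDist


def approximate_scores(phi, n):
    """Approximate scores phi(i / (N + 1)), i = 1, ..., N."""
    return [phi(i / (n + 1)) for i in range(1, n + 1)]


def exact_scores(phi, n, grid=20000):
    """Exact scores E phi(U_{N:i}) via midpoint-rule integration
    against the Beta(i, N - i + 1) density of the i-th uniform order statistic."""
    scores = []
    for i in range(1, n + 1):
        const = n * comb(n - 1, i - 1)
        total = 0.0
        for k in range(grid):
            u = (k + 0.5) / grid
            total += phi(u) * const * u ** (i - 1) * (1 - u) ** (n - i)
        scores.append(total / grid)
    return scores


def wilcoxon(u):
    # phi(u, f) for the logistic density, a multiple of u - 1/2
    return 2 * u - 1


normal = NormalDist().inv_cdf  # phi(u, f) for the standard normal density

for name, phi in [("Wilcoxon", wilcoxon), ("normal", normal)]:
    print(name)
    print("  exact:      ", [round(a, 4) for a in exact_scores(phi, 5)])
    print("  approximate:", [round(a, 4) for a in approximate_scores(phi, 5)])
```

For a linear $\varphi$ the two kinds of scores coincide, because $E U_{N:i} = i/(N+1)$; for the normal scores they differ noticeably at small $N$, mostly in the extreme ranks.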
II. Alternative of simple linear regression: $K_2 = \{q_\Delta : \Delta > 0\}$ where $q_\Delta(x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i - \Delta c_i)$ with a fixed absolutely continuous density $f$ and with given constants $c_1, \ldots, c_n$, $\sum_{i=1}^n c_i^2 > 0$. Then the locally most powerful rank $\alpha$-test has the critical region
$$\sum_{i=1}^n c_i a_n(R_i, f) \ge k \qquad (1.3.2)$$
with the same scores as in case I, and with $k$ determined by the condition
$$P\left(\sum_{i=1}^n c_i a_n(R_i, f) > k\right) + \gamma P\left(\sum_{i=1}^n c_i a_n(R_i, f) = k\right) = \alpha.$$
In practice we most often use the test with the Wilcoxon scores: put $\varphi(u) = u - \frac{1}{2}$ and reject $H_0$ provided
$$W_n = \sum_{i=1}^n c_i R_i > k,$$
where $k$ is such that
$$P\left(\sum_{i=1}^n c_i R_i > k \,\Big|\, H_0\right) + \gamma P\left(\sum_{i=1}^n c_i R_i = k \,\Big|\, H_0\right) = \alpha, \quad 0 \le \gamma < 1.$$
This test is locally most powerful against $K_2$ with $F$ logistic with the density
$$f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \quad x \in \mathbb{R},$$
but it is rather efficient also for other alternatives. For large $n$ we use the normal approximation of $W_n$: if $n \to \infty$, then $W_n$ has an asymptotically normal distribution under $H_0$ in the following sense:
$$\lim_{n \to \infty} P_{H_0}\left(\frac{W_n - E W_n}{\sqrt{\operatorname{var} W_n}} < x\right) = \Phi(x), \quad x \in \mathbb{R},$$
where $\Phi$ is the standard normal distribution function.
To be able to use the normal approximation, we must know the expectation and variance of $W_n$ under $H_0$. The following Lemma gives the expectation and the variance of a more general linear rank statistic, covering the Wilcoxon as well as other rank tests.
Lemma 1.3.1 Let the random vector $(R_1, \ldots, R_n)$ have the discrete uniform distribution on the set $\mathcal{R}$ of all permutations of the numbers $1, \ldots, n$, i.e. $P(R = r) = \frac{1}{n!}$, $r \in \mathcal{R}$; let $c_1, \ldots, c_n$ and $a_1 = a(1), \ldots, a_n = a(n)$ be arbitrary constants. Then the expectation and variance of the linear statistic $S_n = \sum_{i=1}^n c_i a(R_i)$ are
$$E S_n = \frac{1}{n} \sum_{i=1}^n c_i \sum_{j=1}^n a_j,$$
$$\operatorname{var} S_n = \frac{1}{n-1} \sum_{i=1}^n (c_i - \bar c)^2 \sum_{j=1}^n (a_j - \bar a)^2,$$
where $\bar c = \frac{1}{n} \sum_{i=1}^n c_i$, $\bar a = \frac{1}{n} \sum_{i=1}^n a_i$.
Proof. The proposition follows from the distribution of $R$ under $H_0$.
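The moment formulas for a linear rank statistic can be checked by brute force for a small $n$. The following sketch enumerates all permutations with exact rational arithmetic and compares the result with the closed forms $E S_n = \frac{1}{n}\sum c_i \sum a_j$ and $\operatorname{var} S_n = \frac{1}{n-1}\sum (c_i - \bar c)^2 \sum (a_j - \bar a)^2$; the function names and the integer constants `c` and `a` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def exact_moments(c, a):
    """Mean and variance of S_n = sum c_i a(R_i) by enumerating all permutations."""
    n = len(c)
    values = [sum(Fraction(ci) * a[r - 1] for ci, r in zip(c, perm))
              for perm in permutations(range(1, n + 1))]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def lemma_moments(c, a):
    """Mean and variance of S_n from the closed-form expressions (integer inputs)."""
    n = len(c)
    c_bar = Fraction(sum(c), n)
    a_bar = Fraction(sum(a), n)
    mean = Fraction(sum(c) * sum(a), n)
    var = (Fraction(1, n - 1)
           * sum((ci - c_bar) ** 2 for ci in c)
           * sum((aj - a_bar) ** 2 for aj in a))
    return mean, var


c = [1, 1, 0, -1, 2]   # illustrative regression constants
a = [1, 2, 3, 4, 5]    # Wilcoxon-type scores a(i) = i
print(exact_moments(c, a) == lemma_moments(c, a))  # True
```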
1.4    Rank tests for simple regression model with nonrandom regressors
Let $X_1, \ldots, X_N$ be independent random variables with continuous distribution functions $F_1, \ldots, F_N$, where
$$F_i(x) = F(x - \beta_0 - \beta c_i), \quad i = 1, \ldots, N, \ x \in \mathbb{R},$$
$F$ is continuous, $c_N = (c_1, \ldots, c_N)'$ is a vector of (known) regression constants (not all equal), and $(\beta_0, \beta)$ are unknown parameters; we call $\beta_0$ the intercept of the regression line and $\beta$ the regression coefficient. Our first hypothesis is that there is no regression,
$$H_0^{(1)} : \beta = 0 \quad \text{against} \quad K^{(1)} : \beta \ne 0 \ \text{ or } \ K_+^{(1)} : \beta > 0, \qquad (1.4.1)$$
where $\beta_0$ is considered as a nuisance parameter. We may also be interested in the joint hypothesis
$$H_0^{(2)} : (\beta_0, \beta) = 0 \quad \text{against} \quad K^{(2)} : (\beta_0, \beta) \ne 0. \qquad (1.4.2)$$
The third hypothesis is
$$H_0^{(3)} : \beta_0 = 0 \quad \text{against} \quad K^{(3)} : \beta_0 \ne 0 \ \text{ or } \ K_+^{(3)} : \beta_0 > 0, \qquad (1.4.3)$$
where $\beta$ is treated as a nuisance parameter.
In either case there exists a distribution-free rank test, whose critical values do not depend on $F$. We can also consider $\beta = \beta^*$ or $(\beta_0, \beta) = (\beta_0^*, \beta^*)$; then we work with $X_i^* = X_i - \beta_0^* - \beta^* c_i$, $i = 1, \ldots, N$.
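1.4.1    Rank tests for $H_0^{(1)}$
Let $R_N = (R_{N1}, \ldots, R_{NN})$ be the ranks of $X_1, \ldots, X_N$. Choose some nondecreasing score function $\varphi : (0,1) \to \mathbb{R}$ and put
$$S_N = \sum_{i=1}^N (c_i - \bar c_N)\, a_N(R_{Ni}), \quad \bar c_N = \frac{1}{N} \sum_{i=1}^N c_i, \qquad (1.4.4)$$
where the scores have the form
$$a_N(i) = E \varphi(U_{N:i}) \quad \text{or} \quad \varphi\left(\frac{i}{N+1}\right), \quad 1 \le i \le N, \qquad (1.4.5)$$
where $U_{N:1} \le \ldots \le U_{N:N}$ are the order statistics corresponding to the sample $U_1, \ldots, U_N$ from the uniform $R(0,1)$ distribution. Under $H_0^{(1)}$, it holds $F_1(x) = \ldots = F_N(x) = F(x - \beta_0) = F_0(x)$ (say), where $F_0$ is continuous. Because ties between $X_1, \ldots, X_N$ can happen only with probability 0, we have
$$P\{R_N = r_N \mid H_0^{(1)}\} = \frac{1}{N!} \quad \forall r_N \in \mathcal{R} \ \text{(permutations)},$$
hence
$$P\{R_{Ni} = k \mid H_0^{(1)}\} = \frac{1}{N} \quad \forall i, k, \ 1 \le i, k \le N,$$
$$P\{R_{Ni} = k, R_{Nj} = \ell \mid H_0^{(1)}\} = \frac{1}{N(N-1)} \quad \forall i, j, k, \ell, \ 1 \le i \ne j, \ k \ne \ell \le N.$$
Hence,
$$E\{S_N \mid H_0^{(1)}\} = \sum_{i=1}^N (c_i - \bar c_N)\, E\{a_N(R_{Ni}) \mid H_0^{(1)}\} = \frac{1}{N} \sum_{i=1}^N (c_i - \bar c_N) \sum_{j=1}^N a_N(j) = 0,$$
$$\operatorname{Var}\{S_N \mid H_0^{(1)}\} = \frac{1}{N-1} \sum_{i=1}^N (c_i - \bar c_N)^2 \sum_{j=1}^N (a_N(j) - \bar a_N)^2.$$
The distribution of $S_N$ under $H_0^{(1)}$ depends neither on $F$ nor on $\beta_0$, hence we reject $H_0^{(1)}$ in favor of $\{K_+^{(1)} : \beta > 0\}$ when $S_N > k_\alpha^+$ and reject with probability $\gamma$ when $S_N = k_\alpha^+$, where $k_\alpha^+$ is determined so that
$$P\{S_N > k_\alpha^+ \mid H_0^{(1)}\} + \gamma P\{S_N = k_\alpha^+ \mid H_0^{(1)}\} = \alpha$$
and $\alpha = 0.05$ or $0.01$, for instance. Similarly, we reject $H_0^{(1)}$ in favor of $\{K^{(1)} : \beta \ne 0\}$ when $|S_N| > k_\alpha$ and reject with probability $\gamma \in [0,1)$ when $|S_N| = k_\alpha$, where $k_\alpha$ is determined so that
$$P\{|S_N| > k_\alpha \mid H_0^{(1)}\} + \gamma P\{|S_N| = k_\alpha \mid H_0^{(1)}\} = \alpha.$$
For small $N$ we can calculate the critical values $k_\alpha^+$ and $k_\alpha$ exactly, but for large $N$ we must use an asymptotic approximation. The asymptotic distribution of $S_N$ under $H_0^{(1)}$ is based on the following theorems, proved by Hájek (1961):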
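Theorem 1.4.1 Let $R_N = (R_{N1}, \ldots, R_{NN})$ be a random vector such that
$$P\{R_N = r\} = \frac{1}{N!} \quad \forall r \in \mathcal{R}$$
and let $\{a_N(i), 1 \le i \le N\}$ and $\{c_N(i), 1 \le i \le N\}$ be two sequences of real numbers such that, as $N \to \infty$,
$$\max_{1 \le i \le N} \frac{(a_N(i) - \bar a_N)^2}{\sum_{j=1}^N (a_N(j) - \bar a_N)^2} \to 0, \quad \max_{1 \le i \le N} \frac{(c_N(i) - \bar c_N)^2}{\sum_{j=1}^N (c_N(j) - \bar c_N)^2} \to 0 \quad \text{(Noether condition)}. \qquad (1.4.6)$$
Then
$$P\left\{\frac{S_N - E S_N}{\sqrt{\operatorname{Var} S_N}} \le x\right\} \to \Phi(x) \quad \text{as } N \to \infty \quad \forall x \in \mathbb{R},$$
where $\Phi$ is the standard normal distribution function, if and only if, for every $\varepsilon > 0$,
$$\lim_{N \to \infty} \left\{\frac{1}{N} \sum_{i=1}^N \sum_{j=1}^N \kappa_{N,ij}^2\, I[|\kappa_{N,ij}| > \varepsilon]\right\} = 0 \quad \text{(Lindeberg condition)}, \qquad (1.4.7)$$
where
$$\kappa_{N,ij} = \frac{(a_N(i) - \bar a_N)(c_N(j) - \bar c_N)}{\left\{N^{-1} \sum_{k=1}^N (a_N(k) - \bar a_N)^2 \sum_{k=1}^N (c_N(k) - \bar c_N)^2\right\}^{1/2}}, \quad i, j = 1, \ldots, N.$$
Theorem 1.4.2 (Projection theorem). If $a_N(1) \le \ldots \le a_N(N)$ and
$$\max_{1 \le i \le N} \frac{(a_N(i) - \bar a_N)^2}{\sum_{j=1}^N (a_N(j) - \bar a_N)^2} \to 0 \quad \text{as } N \to \infty,$$
then $S_N$ is asymptotically equivalent in the quadratic mean to the statistic
$$T_N = \sum_{i=1}^N (c_N(i) - \bar c_N)\, a_N^0(U_i) + N \bar c_N \bar a_N$$
in the sense that
$$\lim_{N \to \infty} E\, \frac{(S_N - T_N)^2}{\operatorname{Var} S_N} = 0.$$
Here
$$a_N^0(u) = a_N(i) \quad \text{for } \frac{i-1}{N} < u \le \frac{i}{N}, \quad i = 1, \ldots, N,$$
and $U_1, \ldots, U_N$ is a random sample from the uniform $R(0,1)$ distribution.
Corollary 1.4.1 Let
$$\kappa_{N,ij} = \frac{(a_N(i) - \bar a_N)(c_j - \bar c_N)}{A_N C_N}, \quad i, j = 1, \ldots, N,$$
$$A_N^2 = (N-1)^{-1} \sum_{k=1}^N (a_N(k) - \bar a_N)^2, \quad C_N^2 = \sum_{i=1}^N (c_i - \bar c_N)^2,$$
and let the sequences $\{a_N(1), \ldots, a_N(N)\}$ and $\{c_1, \ldots, c_N\}$ satisfy the Noether condition (1.4.6). Then
$$\lim_{N \to \infty} P\left\{\frac{S_N}{A_N C_N} \le x \,\Big|\, H_0^{(1)}\right\} = \Phi(x) \quad \forall x \in \mathbb{R}.$$
The asymptotic rank test rejects $H_0^{(1)}$ in favor of $K_+^{(1)}$ on the significance level $\alpha$ if
$$\frac{S_N}{A_N C_N} \ge \Phi^{-1}(1 - \alpha)$$
and in favor of $K^{(1)}$ if
$$\frac{|S_N|}{A_N C_N} \ge \Phi^{-1}\left(1 - \frac{\alpha}{2}\right),$$
respectively.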
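A minimal sketch of the asymptotic test may help to fix the notation. It assumes approximate scores $a_N(i) = \varphi(i/(N+1))$, no ties, and the standardization by $A_N C_N$ with $A_N^2 = (N-1)^{-1}\sum_k (a_N(k) - \bar a_N)^2$ and $C_N^2 = \sum_i (c_i - \bar c_N)^2$; the function name `linear_rank_test` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ranks(x):
    """Ranks 1..N of the observations (no ties assumed)."""
    return [sum(1 for xj in x if xj <= xi) for xi in x]


def linear_rank_test(x, c, phi=lambda u: u):
    """Return (S_N / (A_N C_N), two-sided asymptotic p-value).

    phi is the score function; the default gives Wilcoxon-type scores i / (N + 1).
    """
    n = len(x)
    a = [phi(i / (n + 1)) for i in range(1, n + 1)]
    a_bar = sum(a) / n
    c_bar = sum(c) / n
    s = sum((ci - c_bar) * a[ri - 1] for ci, ri in zip(c, ranks(x)))
    a2 = sum((ai - a_bar) ** 2 for ai in a) / (n - 1)   # A_N^2
    c2 = sum((ci - c_bar) ** 2 for ci in c)             # C_N^2
    z = s / sqrt(a2 * c2)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Toy data (assumed for illustration): responses that increase with c
x = [0.3, 1.1, 0.9, 2.5, 2.2, 3.7]
c = [1, 2, 3, 4, 5, 6]
z, p = linear_rank_test(x, c)
print(round(z, 3), round(p, 3))
```

With perfectly concordant data the standardized statistic equals $\sqrt{N-1}$, its largest possible value for Wilcoxon-type scores and these constants, which gives a quick plausibility check of an implementation.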
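1.4.2    Rank tests for $H_0^{(2)}$
The hypothesis
$$H_0^{(2)} : (\beta_0, \beta) = 0$$
we shall test under the condition of symmetry of $F$, i.e.
$$F(x) + F(-x) = 1 \quad \text{for } x \in \mathbb{R}.$$
Because the ranks are invariant to the shift in location, the test should also involve the signs of the observations. Let $R_{Ni}^+$ be the rank of $|X_i|$ among $|X_1|, \ldots, |X_N|$, $i = 1, \ldots, N$. Choose a score-generating function $\varphi^* : (0,1) \to [0, \infty)$ and the scores $a_N^*(1), \ldots, a_N^*(N)$ generated by $\varphi^*$ in the same manner as in (1.4.5). Under the hypothesis $H_0^{(2)}$, the observations are independent and identically distributed with a continuous distribution function $F$, symmetric about 0. Consider the two statistics
$$S_{N,1}^+ = \sum_{i=1}^N a_N^*(R_{Ni}^+)\, \operatorname{sign} X_i, \quad S_{N,2}^+ = \sum_{i=1}^N c_i\, a_N^*(R_{Ni}^+)\, \operatorname{sign} X_i, \quad S_N^+ = (S_{N,1}^+, S_{N,2}^+)'$$
and denote
$$\lambda_N^{(11)} = N, \quad \lambda_N^{(12)} = \lambda_N^{(21)} = \sum_{i=1}^N c_i, \quad \lambda_N^{(22)} = \sum_{i=1}^N c_i^2, \quad \Lambda_N = \left[\lambda_N^{(ij)}\right]_{i,j=1,2}.$$
Under $H_0^{(2)}$ and under symmetry of $F$, the vector $(\operatorname{sign} X_1 \cdot R_{N1}^+, \ldots, \operatorname{sign} X_N \cdot R_{NN}^+)$ can take on $N!\, 2^N$ values, each with probability $1/(N!\, 2^N)$, and $\operatorname{sign} X_i$ is independent of $R_{Ni}^+$, $i = 1, \ldots, N$. Hence,
$$E(S_N^+ \mid H_0^{(2)}) = 0, \quad E(S_N^+ S_N^{+\prime} \mid H_0^{(2)}) = A_N^{*2} \Lambda_N, \quad A_N^{*2} = \frac{1}{N} \sum_{i=1}^N (a_N^*(i))^2.$$
Consider the following test criterion
$$W_N^+ = S_N^{+\prime} \left(E_{H_0^{(2)}} S_N^+ S_N^{+\prime}\right)^{-1} S_N^+ = \left(S_N^{+\prime} \Lambda_N^{-1} S_N^+\right) / A_N^{*2}. \qquad (1.4.8)$$
Under $H_0^{(2)}$ and under symmetry of $F$, the distribution of $W_N^+$ does not depend on the unknown $F$. However, the exact distribution of $W_N^+$ is very laborious to calculate, hence we should again use the asymptotic approximation. The asymptotic behavior is described in the following theorem: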
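Theorem 1.4.3 Assume that the sequences $\{a_N^*(i), 1 \le i \le N\}$ and $\{c_{Ni}, 1 \le i \le N\}$ satisfy, as $N \to \infty$,
$$\frac{\max_{1 \le i \le N} a_N^{*2}(i)}{\sum_{j=1}^N a_N^{*2}(j)} \to 0, \quad \frac{\max_{1 \le i \le N} c_{Ni}^2}{\sum_{j=1}^N c_{Nj}^2} \to 0.$$
Denote
$$\kappa_{N,ij} = \frac{a_N^*(i)\, c_{Nj}}{\left\{N^{-1} \sum_{k=1}^N a_N^{*2}(k) \sum_{k=1}^N c_{Nk}^2\right\}^{1/2}}, \quad i, j = 1, \ldots, N.$$
Then, under $H_0^{(2)}$ and under symmetry of $F$, the sequence $(S_{N,2}^+ - E S_{N,2}^+)/\sqrt{\operatorname{Var} S_{N,2}^+}$ is asymptotically normally distributed $\mathcal{N}(0,1)$ if and only if, for every $\varepsilon > 0$,
$$\lim_{N \to \infty} \left\{\frac{1}{N} \sum_{i=1}^N \sum_{j=1}^N \kappa_{N,ij}^2\, I[|\kappa_{N,ij}| > \varepsilon]\right\} = 0 \quad \text{(Lindeberg condition)}.$$
If we further apply Theorem 1.4.3 to $c_{Ni} = 1$, $i = 1, \ldots, N$, we conclude that the random vector $S_N^+$ has asymptotically a bivariate normal distribution $\mathcal{N}_2(0, A_N^{*2} \Lambda_N)$. This implies that under $H_0^{(2)}$ and under symmetry of $F$, $W_N^+$ has asymptotically the $\chi^2$ distribution with 2 degrees of freedom. Hence, the asymptotic test rejects $H_0^{(2)}$ in favor of $K^{(2)}$ if
$$W_N^+ \ge \chi_2^2(1 - \alpha).$$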
1.4.3    Example
A group of students, boys and girls, completed a summer language course. They took two tests, one before and one after the course. The responses in the table are the differences in the test scores for each individual; $c_i = 1$ for a boy and $c_i = -1$ for a girl.
#	response	$c_i$	$R_{Ni}$	$R_{Ni}^+$	$c_i R_{Ni}$	$\operatorname{sign}(X_i)\, R_{Ni}^+$
1	5.2	1	19	19	19	19
2	-0.7	1	6	6	6	-6
3	-2.3	1	2	13	2	-13
4	3.2	1	16	15	16	15
5	-1.5	1	4	9	4	-9
6	4.7	1	18	18	18	18
7	1.8	1	14	12	14	12
8	-0.4	1	8	3	8	-3
9	0.6	1	11	5	11	5
10	6.6	1	20	20	20	20
11	-0.9	-1	5	8	-5	-8
12	1.7	-1	13	11	-13	11
13	-0.3	-1	9	2	-9	-2
14	2.4	-1	15	14	-15	14
15	4.2	-1	17	16	-17	16
16	-1.6	-1	3	10	-3	-10
17	-4.3	-1	1	17	-1	-17
18	0.8	-1	12	7	-12	7
19	-0.5	-1	7	4	-7	-4
20	-0.2	-1	10	1	-10	-1
We want to test whether the course had an effect and whether there is a difference between the performance of boys and girls. We take the Wilcoxon scores, $a_N(i) = a_N^*(i) = \frac{i}{N+1}$, $i = 1, \ldots, 20$, and get
$$\frac{S_N}{A_N C_N} = 0.9826 < 1.96 = \Phi^{-1}(0.975),$$
$$W_N^+ = 2.368 < 5.99 = \chi_2^2(0.95).$$
Hence, we cannot reject either of the hypotheses.
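The two test statistics of this example can be recomputed from the responses in the table. The sketch below is an illustration, assuming Wilcoxon scores $i/(N+1)$, $A_N^2 = (N-1)^{-1}\sum_i (a_N(i) - \bar a_N)^2$ for the linear rank statistic, and $A_N^{*2} = N^{-1}\sum_i (a_N^*(i))^2$ with the $2 \times 2$ matrix $\Lambda_N$ for the signed-rank criterion; the variable names are hypothetical.

```python
from math import sqrt

response = [5.2, -0.7, -2.3, 3.2, -1.5, 4.7, 1.8, -0.4, 0.6, 6.6,
            -0.9, 1.7, -0.3, 2.4, 4.2, -1.6, -4.3, 0.8, -0.5, -0.2]
c = [1] * 10 + [-1] * 10          # 1 for a boy, -1 for a girl
N = len(response)


def ranks(v):
    """Ranks 1..N (no ties in these data)."""
    return [sum(1 for vj in v if vj <= vi) for vi in v]


def score(i):
    """Wilcoxon score a_N(i) = a*_N(i) = i / (N + 1)."""
    return i / (N + 1)


R = ranks(response)                            # ranks of the responses
R_plus = ranks([abs(v) for v in response])     # ranks of the absolute values
sign = [1 if v > 0 else -1 for v in response]

# Linear rank statistic for "no regression" (no boy/girl difference)
c_bar = sum(c) / N
a_bar = sum(score(i) for i in range(1, N + 1)) / N
S = sum((ci - c_bar) * score(ri) for ci, ri in zip(c, R))
A2 = sum((score(i) - a_bar) ** 2 for i in range(1, N + 1)) / (N - 1)
C2 = sum((ci - c_bar) ** 2 for ci in c)
z = S / sqrt(A2 * C2)

# Signed-rank criterion W_N^+ for the joint hypothesis
S1 = sum(s * score(rp) for s, rp in zip(sign, R_plus))
S2 = sum(ci * s * score(rp) for ci, s, rp in zip(c, sign, R_plus))
A2_star = sum(score(i) ** 2 for i in range(1, N + 1)) / N
l11, l12, l22 = N, sum(c), sum(ci ** 2 for ci in c)
det = l11 * l22 - l12 ** 2
W = (l22 * S1 ** 2 - 2 * l12 * S1 * S2 + l11 * S2 ** 2) / det / A2_star

print(round(z, 3), round(W, 3))  # 0.983 2.369
```

Under these assumptions the recomputed values agree with the reported 0.9826 and 2.368 up to rounding in the last digit.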
1.5    Rank tests for some multiple linear regression models
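Consider the linear regression model
$$Y_i = \beta_0 + x_i'\beta + e_i, \quad i = 1, \ldots, N, \qquad (1.5.1)$$
where $\beta_0 \in \mathbb{R}^1$, $\beta \in \mathbb{R}^p$ are unknown parameters, $e_1, \ldots, e_N$ are independent errors, identically distributed according to a continuous d.f. $F$, and $x_i \in \mathbb{R}^p$ are given regressors, $i = 1, \ldots, N$. Denote by
$$X_N = \begin{bmatrix} x_1' \\ \vdots \\ x_N' \end{bmatrix}$$
the regression matrix. We shall first consider the hypotheses
$$H_0^{(1)} : \beta = 0 \quad \text{versus} \quad K^{(1)} : \beta \ne 0$$
and
$$H_0^{(2)} : \beta^* = (\beta_0, \beta')' = 0 \quad \text{versus} \quad K^{(2)} : \beta^* \ne 0.$$
The hypotheses and tests are extensions of those for the regression line.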
1.5.1    Rank tests for $H_0^{(1)}$
Let $R_{N1}, \ldots, R_{NN}$ be the ranks of $Y_1, \ldots, Y_N$ and let $a_N(1), \ldots, a_N(N)$ be the scores generated by a nondecreasing, square-integrable score function $\varphi : (0,1) \to \mathbb{R}^1$ so that $a_N(i) = \varphi\left(\frac{i}{N+1}\right)$, $i = 1, \ldots, N$.
Consider the linear rank statistics
$$S_{Nj} = \sum_{i=1}^N (x_{ij} - \bar x_{Nj})\, a_N(R_{Ni}), \quad \bar x_{Nj} = \frac{1}{N} \sum_{i=1}^N x_{ij}, \quad j = 1, \ldots, p,$$
and the vector
$$S_N = \sum_{i=1}^N (x_i - \bar x_N)\, a_N(R_{Ni}) = (S_{N1}, \ldots, S_{Np})'.$$
The distribution function of the observation $Y_i$ under $H_0^{(1)}$ is $F(y - \beta_0)$, $i = 1, \ldots, N$. Hence, $(R_{N1}, \ldots, R_{NN})$ assumes all possible permutations of $(1, 2, \ldots, N)$ with equal probability $\frac{1}{N!}$. Hence, the expectation and covariance matrix of $S_N$ under $H_0^{(1)}$ are
$$E(S_N \mid H_0^{(1)}) = 0 \quad \text{and} \quad \operatorname{Var}(S_N \mid H_0^{(1)}) = A_N^2 Q_N,$$
where
$$A_N^2 = \frac{1}{N-1} \sum_{i=1}^N (a_N(i) - \bar a_N)^2, \quad Q_N = \sum_{i=1}^N (x_i - \bar x_N)(x_i - \bar x_N)'.$$
Our test for $H_0^{(1)}$ is based on the quadratic form
$$\mathcal{S}_N = A_N^{-2}\, S_N' Q_N^{-1} S_N, \qquad (1.5.2)$$
where $Q_N^{-1}$ is replaced by the generalized inverse $Q_N^-$ if $Q_N$ is singular. We reject $H_0^{(1)}$ if $\mathcal{S}_N > k_\alpha$, where $k_\alpha$ is a suitable critical value.
Notice that $S_N$ depends only on $x_1, \ldots, x_N$, on the scores $a_N(1), \ldots, a_N(N)$ and on the ranks $R_{N1}, \ldots, R_{NN}$. Hence, the distribution of $S_N$, and thus also that of $\mathcal{S}_N$, under the hypothesis $H_0^{(1)}$ does not depend on the distribution function $F$ of the errors. For small $N$, the critical value can be calculated numerically, but this becomes laborious with increasing $N$. Hence, again, we should use the large-sample approximation. This can be derived under some conditions on the matrix $X_N$ and on the scores:
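Theorem 1.5.1 Assume that
(i) the matrix $Q_N$ is regular for $N > N_0$ and
$$\max_{1 \le i \le N} (x_i - \bar x_N)' Q_N^{-1} (x_i - \bar x_N) \to 0 \quad \text{as } N \to \infty,$$
(ii) the scores satisfy the Noether condition, i.e.
$$\max_{1 \le i \le N} \frac{(a_N(i) - \bar a_N)^2}{\sum_{j=1}^N (a_N(j) - \bar a_N)^2} \to 0 \quad \text{as } N \to \infty,$$
(iii)
$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N \sum_{j=1}^N \delta_{N,ijk}^2\, I[|\delta_{N,ijk}| > \varepsilon] = 0 \quad \text{for every } \varepsilon > 0, \ \forall k = 1, \ldots, p,$$
where
$$\delta_{N,ijk} = \frac{(a_N(i) - \bar a_N)(x_{jk} - \bar x_k)}{\left\{N^{-1} \sum_{i=1}^N (a_N(i) - \bar a_N)^2 \sum_{j=1}^N (x_{jk} - \bar x_k)^2\right\}^{1/2}}, \quad k = 1, \ldots, p, \ i, j = 1, \ldots, N.$$
Then, under $H_0^{(1)}$, the criterion $\mathcal{S}_N$ in (1.5.2) has asymptotically the $\chi^2$ distribution with $p$ degrees of freedom.
Remark 1.5.1 We reject the hypothesis $H_0^{(1)}$ on the significance level $\alpha$ if
$$\mathcal{S}_N \ge \chi_p^2(1 - \alpha),$$
where $\chi_p^2(1 - \alpha)$ is the $(1 - \alpha)$ quantile of the $\chi^2$ distribution with $p$ degrees of freedom.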
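The quadratic-form criterion can be sketched with NumPy as follows. This is an illustrative sketch under several assumptions: approximate scores $\varphi(i/(N+1))$ with a vectorized `phi`, no ties among the responses, the criterion taken as $A_N^{-2} S_N' Q_N^{-} S_N$ with $A_N^2 = (N-1)^{-1}\sum_i (a_N(i) - \bar a_N)^2$, and the Moore-Penrose pseudoinverse standing in for the generalized inverse. The function name `rank_quadratic_form` and the data are hypothetical.

```python
import numpy as np


def rank_quadratic_form(y, X, phi=lambda u: u):
    """Rank criterion A_N^{-2} * S_N' Q_N^- S_N for testing beta = 0.

    y: responses (length N), X: N x p matrix of regressors,
    phi: vectorized score function (default: Wilcoxon-type scores).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = len(y)
    r = np.argsort(np.argsort(y)) + 1                  # ranks 1..N, no ties assumed
    a = phi(r / (n + 1))                               # a_N(R_Ni)
    all_scores = phi(np.arange(1, n + 1) / (n + 1))    # a_N(1), ..., a_N(N)
    A2 = np.sum((all_scores - all_scores.mean()) ** 2) / (n - 1)
    Xc = X - X.mean(axis=0)                            # rows x_i - x_bar
    S = Xc.T @ a                                       # vector S_N
    Q = Xc.T @ Xc                                      # matrix Q_N
    return float(S @ np.linalg.pinv(Q) @ S / A2)       # pinv covers singular Q_N


# Illustrative data with p = 2 regressors (assumed, not from the text)
X = [[0.1, 1.0], [0.4, 0.2], [0.9, 0.7], [1.3, 0.1],
     [1.8, 0.9], [2.2, 0.3], [2.9, 0.8], [3.1, 0.5]]
y = [0.2, 0.1, 1.4, 0.9, 2.6, 1.9, 3.3, 2.8]
stat = rank_quadratic_form(y, X)
print(stat, stat >= 5.99)   # compare with the 0.95 quantile of chi-square(2), about 5.99
```

For $p = 1$ the criterion reduces to the square of the standardized linear rank statistic $S_N/(A_N C_N)$ of the regression line, which is a convenient consistency check.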
Sketch of the proof. It suffices to show that under $H_0^{(1)}$ the asymptotic distribution of $S_N$ is $p$-dimensional normal with expectation equal to 0 and dispersion matrix $A_N^2 Q_N$. Then the quadratic form $\mathcal{S}_N$ will have asymptotically the $\chi^2(p)$ distribution. To prove the asymptotic normality of $S_N$, we must prove that, for any vector $\lambda \in \mathbb{R}^p$, $\lambda \ne 0$, the scalar product $\lambda' S_N$ has asymptotically the normal distribution $\mathcal{N}(0, \lambda' A_N^2 Q_N \lambda)$. But
$$\lambda' S_N = \sum_{i=1}^N [\lambda'(x_i - \bar x_N)]\, a_N(R_{Ni})$$
and its coefficients $\lambda'(x_i - \bar x_N)$ satisfy the Noether condition (1.4.6), because
$$\max_{1 \le i \le N} \frac{[\lambda'(x_i - \bar x_N)]^2}{\sum_{j=1}^N [\lambda'(x_j - \bar x_N)]^2} = \max_{1 \le i \le N} \frac{\lambda'(x_i - \bar x_N)(x_i - \bar x_N)'\lambda}{\lambda' Q_N \lambda}$$
$$\le \max_{1 \le i \le N} \lambda_{\max}\left\{(x_i - \bar x_N)(x_i - \bar x_N)' Q_N^{-1}\right\} = \max_{1 \le i \le N} (x_i - \bar x_N)' Q_N^{-1} (x_i - \bar x_N) \to 0.$$
Moreover, we can show by some arithmetic that the quantities
$$\lambda'(x_j - \bar x_N)(a_N(i) - \bar a_N)$$
satisfy the Lindeberg condition (1.4.7). Then the asymptotic normality of the scalar product will follow from Theorem 1.4.3 for every $\lambda \ne 0$. □
1.5.2    Rank tests for $H_0^{(2)}$
Consider again the model $Y_i = \beta_0 + x_i'\beta + e_i$, $i = 1, \ldots, N$, and assume that the errors $e_i$ have a symmetric distribution function, $F(x) + F(-x) = 1$ $\forall x$. Let $R_{N1}^+, \ldots, R_{NN}^+$ be the ranks of $|Y_1|, \ldots, |Y_N|$. Choose a score-generating function $\varphi^* : (0,1) \to [0, \infty)$ and the scores $a_N^*(1), \ldots, a_N^*(N)$ generated by $\varphi^*$. Put $x_{i0} = 1$, $i = 1, \ldots, N$, and for $j = 0, 1, \ldots, p$ consider the signed-rank statistics
$$S_{N,j}^+ = \sum_{i=1}^N x_{ij}\, \operatorname{sign} Y_i\, a_N^*(R_{Ni}^+)$$
and the vector
$$S_N^+ = (S_{N,0}^+, S_{N,1}^+, \ldots, S_{N,p}^+)'.$$
Then, under $H_0^{(2)}$,
$$E(S_N^+ \mid H_0^{(2)}) = 0 \quad \text{and} \quad E(S_N^+ S_N^{+\prime} \mid H_0^{(2)}) = A_N^{*2} Q_N^*,$$
where $A_N^{*2} = \frac{1}{N} \sum_{i=1}^N [a_N^*(i)]^2$ and
$$Q_N^* = \sum_{i=1}^N x_i^* x_i^{*\prime} = \left[\sum_{i=1}^N x_{ij} x_{ij'}\right]_{j,j'=0,1,\ldots,p}$$
and $x_i^* = (x_{i0}, x_{i1}, \ldots, x_{ip})'$.
The test criterion will be the quadratic form
$$\mathcal{S}_N^+ = A_N^{*-2} \left(S_N^{+\prime} (Q_N^*)^{-1} S_N^+\right).$$
The distribution of $S_N^+$ (and hence of $\mathcal{S}_N^+$) is generated by the $N!\, 2^N$ equally probable realizations of $(\operatorname{sign} Y_1, \ldots, \operatorname{sign} Y_N)$ and $(R_{N1}^+, \ldots, R_{NN}^+)$.
The asymptotic distribution of $\mathcal{S}_N^+$ under $H_0^{(2)}$ will be $\chi^2(p+1)$, provided
$$\max_{1 \le i \le N} x_i^{*\prime} (Q_N^*)^{-1} x_i^* \to 0 \quad \text{as } N \to \infty,$$
$(a_N^*(1), \ldots, a_N^*(N))$ satisfy the Noether condition (1.4.6), and under the Lindeberg condition (1.4.7) on some mixed terms corresponding to $x_i^*$ and $a_N^*(i)$, analogously as for the regression line.
1.6    Rank estimation in simple linear regression models
1.6.1    Estimation of the slope $\beta$ of the regression line
Let $Y_1, \ldots, Y_N$ be independent random variables, $Y_i$ having the distribution function
$$F_i(y) = F(y - \beta_0 - \beta(x_i - \bar x_N)), \quad i = 1, \ldots, N,$$
where $F$ is continuous. We want to estimate the parameter $\beta$ with the aid of ranks. Denote
$$Y_i(b) = Y_i - (x_i - \bar x_N) b, \quad 1 \le i \le N, \quad b \in \mathbb{R}^1.$$
Let $T_N(Y_1, \ldots, Y_N)$ be a test statistic for testing $H_0 : \beta = 0$ and assume that under $H_0$ the distribution of $T_N$ is symmetric about $\mu_N$ or that $E_{H_0} T_N = \mu_N$.
If $T_N(b) = T_N(Y_1(b), \ldots, Y_N(b))$ is nonincreasing in $b \in \mathbb{R}^1$, then we can define the estimate of $\beta$ as
$$\hat\beta_N = \frac{1}{2} (\hat\beta_N^- + \hat\beta_N^+), \qquad (1.6.1)$$
$$\hat\beta_N^- = \sup\{b : T_N(b) > \mu_N\}, \quad \hat\beta_N^+ = \inf\{b : T_N(b) < \mu_N\}.$$
If $T_N = \sum_{i=1}^N (x_i - \bar x_N)(Y_i - \bar Y_N)$, then $\mu_N = 0$ and $T_N(b)$ is linear in $b$; the estimator is then the least-squares estimator of $\beta$.
Lemma 1.6.1 Let $T_N = S_N = \sum_{i=1}^N (x_i - \bar x_N)\, a_N(R_{Ni})$ where $a_N(1) \le \ldots \le a_N(N)$ (not all equal) and $R_{Ni}$ is the rank of $Y_i$, $i = 1, \ldots, N$. Then $S_N(b)$ is nonincreasing in $b$.
Proof. See Puri and Sen (1985).
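Because $S_N(b)$ is a nonincreasing step function that can jump only at the pairwise slopes $(Y_j - Y_i)/(x_j - x_i)$, the rank estimate of the slope can be found by probing one point in each interval between consecutive slopes. The sketch below is one possible way to do this, assuming Wilcoxon scores, centering value 0, and distinct residuals at the probe points; the names `s_stat` and `rank_slope` are hypothetical.

```python
def ranks(v):
    """Ranks 1..N (no ties assumed)."""
    return [sum(1 for vj in v if vj <= vi) for vi in v]


def s_stat(b, x, y):
    """Wilcoxon-score statistic S_N(b) = sum (x_i - x_bar) R_i(b) / (N + 1)."""
    n = len(x)
    x_bar = sum(x) / n
    r = ranks([yi - (xi - x_bar) * b for xi, yi in zip(x, y)])
    return sum((xi - x_bar) * ri for xi, ri in zip(x, r)) / (n + 1)


def rank_slope(x, y, tol=1e-9):
    """Rank estimate 0.5 * (sup{b: S_N(b) > 0} + inf{b: S_N(b) < 0})."""
    n = len(x)
    slopes = sorted({(y[j] - y[i]) / (x[j] - x[i])
                     for i in range(n) for j in range(i + 1, n) if x[i] != x[j]})
    # S_N(b) is constant between consecutive pairwise slopes: probe each interval once.
    probes = ([slopes[0] - 1.0]
              + [(u + v) / 2 for u, v in zip(slopes, slopes[1:])]
              + [slopes[-1] + 1.0])
    values = [s_stat(b, x, y) for b in probes]
    # values[k] belongs to the interval with right end slopes[k] and left end slopes[k - 1].
    beta_minus = max(slopes[k] for k in range(len(slopes)) if values[k] > tol)
    beta_plus = min(slopes[k - 1] for k in range(1, len(values)) if values[k] < -tol)
    return 0.5 * (beta_minus + beta_plus)


print(rank_slope([0, 1, 2, 3], [0, 1, 4, 3]))   # 1.0
```

The quadratic number of pairwise slopes makes this direct search practical only for moderate $N$; it is meant to mirror the definition, not to be efficient.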
The following Lemma shows that $S_N$ is symmetrically distributed under some conditions.
Lemma 1.6.2 Let either
$$x_i - \bar x_N = \bar x_N - x_{N-i+1}, \quad i = 1, \ldots, N, \qquad (1.6.2)$$
or
$$a_i - \bar a_N = \bar a_N - a_{N-i+1}, \quad i = 1, \ldots, N. \qquad (1.6.3)$$
Then, if $\beta = 0$, the distribution of $S_N$ is symmetric about 0.
Proof. Let (1.6.2) hold. Because $(R_{N1}, \ldots, R_{NN})$ has the same distribution as $(R_{NN}, \ldots, R_{N1})$, $S_N$ has the same distribution as $\tilde S_N = \sum_{i=1}^N (x_i - \bar x_N)\, a_N(R_{N,N-i+1}) = -S_N$.
Similarly we proceed under (1.6.3). □
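Properties of $\hat\beta_N$:
1.   $\hat\beta_N(Y_1 + x_1 b, \ldots, Y_N + x_N b) = \hat\beta_N(Y_1, \ldots, Y_N) + b \quad \forall b \in \mathbb{R}^1$.
2.   $\hat\beta_N(cY_1, \ldots, cY_N) = c\, \hat\beta_N(Y_1, \ldots, Y_N) \quad \forall c > 0$.
3.   $P(\hat\beta_N < a) \le P(S_N(a) < \mu_N) \le P(S_N(a) \le \mu_N) \le P(\hat\beta_N \le a)$.
Asymptotic normality of $\hat\beta_N$:
Theorem 1.6.1 Assume that $\{x_{N1}, \ldots, x_{NN}\}$ satisfy the conditions
$$0 < \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N (x_{Ni} - \bar x_N)^2 = C^2 < \infty, \qquad (1.6.4)$$
$$\max_{1 \le i \le N} \frac{1}{N} (x_{Ni} - \bar x_N)^2 \to 0 \quad \text{as } N \to \infty.$$
Let $a_N(i) = E\varphi(U_{N:i})$ or $a_N(i) = \varphi\left(\frac{i}{N+1}\right)$, $i = 1, \ldots, N$, where $\varphi$ is nondecreasing on $(0,1)$ and
$$A^2(\varphi) = \int_0^1 \varphi^2(u)\, du < \infty, \quad \int_0^1 \varphi(u)\, du = 0.$$
Let $F$ have finite Fisher information, i.e.
$$A^2(\psi) = \int_0^1 \psi^2(u)\, du < \infty, \quad \text{where} \quad \psi(u) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}, \ 0 < u < 1.$$
Then $\left\{N^{1/2}(\hat\beta_N - \beta)\right\}_{N=1}^{\infty}$ is asymptotically normally distributed
$$\mathcal{N}\left(0, \frac{A^2(\varphi)}{C^2 \gamma^2(\varphi, F)}\right), \quad \gamma(\varphi, F) = \int_0^1 \varphi(u) \psi(u)\, du.$$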
1.6.2     Estimation in multiple regression model
Let $Y_1, \ldots, Y_N$ be independent observations, $Y_i$ having the distribution function
$$F_i(y) = F(y - \beta_0 - (x_i - \bar x_N)'\beta), \quad x_i \in \mathbb{R}^p, \quad 1 \le i \le N.$$
Consider the (vector) linear rank statistic
$$S_N(b) = \sum_{i=1}^N (x_i - \bar x_N)\, a_N(R_{Ni}(b)) = (S_{N1}(b), \ldots, S_{Np}(b))',$$
where $R_{Ni}(b)$ is the rank of $Y_i - x_i' b$, $i = 1, \ldots, N$, and the scores are nondecreasing. Obviously $E S_N(0) = 0$ when $\beta = 0$. Define
$$\mathcal{D}_N = \{b : \|S_N(b)\| = \min, \ b \in \mathbb{R}^p\},$$
where $\|\cdot\|$ is either the $L_1$ or the $L_2$ norm. If $\mathcal{D}_N$ is a convex set, then we can define the center of gravity of $\mathcal{D}_N$ as an estimator $\hat\beta_N$ of $\beta$. Assume that $x_{Ni}$ satisfy the (Noether) condition
$$\max_{1 \le i \le N} (x_{Ni} - \bar x_N)' Q_N^{-1} (x_{Ni} - \bar x_N) \to 0 \quad \text{as } N \to \infty,$$
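where $Q_N = \sum_{i=1}^N (x_{Ni} - \bar x_N)(x_{Ni} - \bar x_N)'$. If $F$ has finite Fisher information, then $\left\{N^{1/2}(\hat\beta_N - \beta)\right\}$ is asymptotically normally distributed
$$\mathcal{N}_p\left(0, \frac{A^2(\varphi)}{\gamma^2(\varphi, F)} \left(\frac{1}{N} Q_N\right)^{-1}\right).$$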
1.7    Aligned rank tests about the intercept
1.7.1    Regression line
Let $Y_1, \ldots, Y_N$ be independent, $Y_i$ having the distribution function
$$F_i(y) = P(Y_i \le y) = F(y - \beta_0 - (x_i - \bar x_N)\beta), \quad 1 \le i \le N, \ y \in \mathbb{R}.$$
Consider the hypothesis
$$H_0 : \beta_0 = 0 \quad \text{versus} \quad K_+ : \beta_0 > 0 \quad \text{or} \quad K : \beta_0 \ne 0,$$
where $\beta$ is treated as a nuisance parameter. If $\beta \ne 0$, then $Y_1, \ldots, Y_N$ are not identically distributed, and we cannot use their ranks. If we have an estimate $\hat\beta_N$ of $\beta$, we can consider the ranks of the residuals $|Y_i - (x_i - \bar x_N)\hat\beta_N|$, $i = 1, \ldots, N$ (aligned ranks), and an (aligned) signed rank statistic based on them. Under some conditions, such a statistic is asymptotically distribution-free, i.e. under the hypothesis $H_0 : \beta_0 = 0$, its asymptotic distribution does not depend on $F$.
Let $\hat\beta_N$ be the rank estimate (1.6.1) based on the linear rank statistic
$$\sum_{i=1}^N (x_i - \bar x_N)\, a_N(R_{Ni}(b)), \quad b \in \mathbb{R}^1.$$
Consider the residuals $\hat Y_i = Y_i - (x_i - \bar x_N)\hat\beta_N$, $i = 1, \ldots, N$, and the aligned signed rank statistic
$$\hat S_N^+ = \sum_{i=1}^N \operatorname{sign} \hat Y_i\, a_N^*(\hat R_{Ni}^+),$$
where $\hat R_{Ni}^+$ is the rank of $|Y_i - (x_i - \bar x_N)\hat\beta_N|$, $i = 1, \ldots, N$. The test criterion for $H_0$ will be
$$T_N = \frac{N^{-1/2} \hat S_N^+}{A_N^*}, \quad (A_N^*)^2 = \frac{1}{N} \sum_{i=1}^N (a_N^*(i))^2.$$
We reject $H_0$ in favor of $K_+$ if $T_N > k_\alpha^+$, and reject $H_0$ in favor of $K$ if $|T_N| > k_\alpha$. The critical values $k_\alpha^+$ and $k_\alpha$ are determined from the asymptotic normal distribution of $T_N$.
Theorem 1.7.1 Assume that
(i) $F$ is symmetric about 0 and has an absolutely continuous density $f$ and finite and positive Fisher information, $0 < I(f) = \int \left(\frac{f'(z)}{f(z)}\right)^2 dF(z) < \infty$.
(ii) $\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar x_N)^2 \to C^2$, $0 < C < \infty$, and $\frac{1}{N}\max_{1\le i\le N}(x_i - \bar x_N)^2 \to 0$ as $N \to \infty$.
(iii) $\varphi(t)$ is nondecreasing, $\varphi(1 - t) = -\varphi(t)$, $t \in (0,1)$, and $0 < A^2(\varphi) = \int_0^1 \varphi^2(t)\,dt < \infty$. Put $\varphi^{*}(u) = \varphi\left(\frac{u+1}{2}\right)$, $0 < u < 1$, and $a_N^{*}(i) = E\varphi^{*}(U_{N:i})$ or $a_N^{*}(i) = \varphi^{*}\left(\frac{i}{N+1}\right)$, $i = 1,\dots,N$.
Then, under $H_0: \beta_0 = 0$, the criterion $T_N$ has asymptotically normal distribution with mean 0 and variance 1.
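Sketch of the proof. Because $\lim_{N\to\infty} (A_N^{*})^2 = A^2(\varphi)$ and $N^{1/2}(\hat\beta_N - \beta) = O_p(1)$, it can be proved (not elementary) that under $H_0$
\[
N^{-1/2}\bigl[\hat S_N^{+} - S_N^{+}(\beta)\bigr] \xrightarrow{p} 0 \quad \text{as } N \to \infty, \tag{1.7.5}
\]
where
\[
S_N^{+}(\beta) = \sum_{i=1}^{N} \operatorname{sign}(Y_i(\beta))\, a_N^{*}(R_{Ni}^{+}(\beta)),
\]
$Y_i(\beta) = Y_i - (x_i - \bar x_N)\beta$ and $R_{Ni}^{+}(\beta)$ is the rank of $|Y_i(\beta)|$, $1 \le i \le N$. Under $H_0$, the variables $Y_i(\beta) = Y_i - (x_i - \bar x_N)\beta$ are independent and identically distributed with d.f. $F$ symmetric about 0. It was shown earlier that
\[
N^{-1/2} S_N^{+}(\beta) \xrightarrow{\mathcal{D}} \mathcal{N}(0, A^2(\varphi)),
\]
hence, regarding (1.7.5), also $N^{-1/2}\hat S_N^{+} \xrightarrow{\mathcal{D}} \mathcal{N}(0, A^2(\varphi))$. $\Box$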
Remark 1.7.1 We reject $H_0$ in favor of $K_+$ at the asymptotic significance level $\alpha$ provided $T_N > \Phi^{-1}(1 - \alpha)$, and we reject $H_0$ in favor of $K$ provided $|T_N| > \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)$.
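The whole procedure (rank estimate of the slope, alignment, signed ranks, criterion $T_N$, decision) can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: it takes the Wilcoxon score function $\varphi(t) = 2t - 1$, so that $\varphi^{*}(u) = u$ and $a_N^{*}(i) = i/(N+1)$; the slope estimate is taken as the midpoint of $\sup\{b: S_N(b) > 0\}$ and $\inf\{b: S_N(b) < 0\}$, located by bisection using the monotonicity of $S_N(b)$ in $b$. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def ranks(values):
    """Ranks 1..N of the values (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def slope_rank_statistic(b, x, y):
    """S_N(b) = sum_i (x_i - xbar) a_N(R_Ni(b)) with Wilcoxon scores 2i/(N+1) - 1."""
    n = len(x)
    xbar = sum(x) / n
    r = ranks([yi - xi * b for xi, yi in zip(x, y)])
    return sum((xi - xbar) * (2.0 * ri / (n + 1) - 1.0) for xi, ri in zip(x, r))

def rank_slope_estimate(x, y, iters=80):
    """Midpoint of sup{b: S_N(b) > 0} and inf{b: S_N(b) < 0}."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[i] != x[j]]

    def boundary(pred):
        # pred holds for small b and fails for large b; bisect to the switch point
        lo, hi = min(slopes) - 1.0, max(slopes) + 1.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if pred(mid):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    b_minus = boundary(lambda b: slope_rank_statistic(b, x, y) > 0)   # sup{S_N > 0}
    b_plus = boundary(lambda b: slope_rank_statistic(b, x, y) >= 0)   # inf{S_N < 0}
    return 0.5 * (b_minus + b_plus)

def aligned_signed_rank_test(x, y, alpha=0.05):
    """Criterion T_N and the decisions of Remark 1.7.1 for H0: beta_0 = 0."""
    n = len(x)
    xbar = sum(x) / n
    beta_hat = rank_slope_estimate(x, y)
    resid = [yi - (xi - xbar) * beta_hat for xi, yi in zip(x, y)]
    r_plus = ranks([abs(e) for e in resid])
    s_hat = sum(math.copysign(1.0, e) * r / (n + 1) for e, r in zip(resid, r_plus))
    a_star = math.sqrt(sum((i / (n + 1)) ** 2 for i in range(1, n + 1)) / n)
    t = s_hat / (math.sqrt(n) * a_star)
    z = NormalDist()
    return {"beta_hat": beta_hat, "T": t,
            "reject_one_sided": t > z.inv_cdf(1.0 - alpha),
            "reject_two_sided": abs(t) > z.inv_cdf(1.0 - alpha / 2.0)}

def simulate_line(n, beta0, beta, seed):
    """Artificial data Y_i = beta0 + (x_i - xbar) beta + e_i, standard normal errors."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    xbar = sum(x) / n
    y = [beta0 + (xi - xbar) * beta + rng.gauss(0.0, 1.0) for xi in x]
    return x, y

x, y = simulate_line(100, 3.0, 2.0, seed=1)
result = aligned_signed_rank_test(x, y)
```

With an intercept as large as 3 error standard deviations, almost all aligned residuals are positive, so `result["T"]` should be far above either critical value. The pairwise-slope bracket costs $O(N^2)$ and is meant for small samples only.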
Powers of the tests against local alternatives:
The tests are consistent in the sense that their powers tend to 1 as $\beta_0 \to \infty$ (or $|\beta_0| \to \infty$). More important, however, is the power against alternatives close to the hypothesis, namely
\[
K_{1N}:\ \beta_0 = N^{-1/2}\lambda, \qquad \lambda \ne 0 \ \text{fixed}.
\]
Such an alternative is contiguous in the sense of LeCam/Hájek, and it can be shown that the approximation (1.7.5) holds not only under the hypothesis, but also under $K_{1N}$. Hence, $N^{-1/2}\hat S_N^{+}$ has the same asymptotic distribution as $N^{-1/2}S_N^{+}(\beta)$ also under $K_{1N}$.
Denote $\tau_\alpha = \Phi^{-1}(1 - \alpha)$, $0 < \alpha < 1$. The asymptotic power of the one-sided aligned rank test is
\[
P\{T_N > \tau_\alpha \mid K_{1N}\} \to 1 - \Phi\left(\tau_\alpha - \frac{\lambda}{A(\varphi)} \int_0^1 \varphi(u)\varphi_f(u)\,du\right).
\]
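The limiting power of the one-sided test is easy to evaluate numerically. The sketch below takes $\gamma = \int_0^1 \varphi(u)\varphi_f(u)\,du$ and $A(\varphi)$ as inputs; the function name is hypothetical, and the constants used in the example are the closed-form values for Wilcoxon scores $\varphi(u) = 2u - 1$ with standard normal $F$ (an assumed illustrative case, where $\gamma = 1/\sqrt{\pi}$ and $A(\varphi) = 1/\sqrt{3}$).

```python
import math
from statistics import NormalDist

def asymptotic_power_one_sided(lam, alpha, gamma, a_phi):
    """Limit of P{T_N > tau_alpha | K_1N}: 1 - Phi(tau_alpha - lam * gamma / A(phi))."""
    z = NormalDist()
    tau = z.inv_cdf(1.0 - alpha)
    return 1.0 - z.cdf(tau - lam * gamma / a_phi)

# Wilcoxon scores phi(u) = 2u - 1 and standard normal F (illustrative assumption)
GAMMA_WILCOXON_NORMAL = 1.0 / math.sqrt(math.pi)
A_WILCOXON = 1.0 / math.sqrt(3.0)

powers = [asymptotic_power_one_sided(lam, 0.05, GAMMA_WILCOXON_NORMAL, A_WILCOXON)
          for lam in (0.0, 1.0, 2.0, 3.0)]
```

At $\lambda = 0$ the expression reduces to the level $\alpha$, and it increases with $\lambda > 0$.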
Comparison: Classical test of H0
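The least-squares estimator of $\beta_0$ is
\[
\tilde\beta_0 = \bar Y_N = \frac{1}{N}\sum_{i=1}^{N} Y_i,
\]
and the likelihood ratio statistic is
\[
L_N = \frac{\sqrt{N}\,\bar Y_N}{s_N}, \quad \text{where} \quad s_N^2 = \frac{1}{N-2}\sum_{i=1}^{N} \bigl[Y_i - \bar Y_N - (x_i - \bar x_N)\tilde\beta_N\bigr]^2, \qquad \tilde\beta_N = \frac{\sum_{i=1}^{N}(x_i - \bar x_N)(Y_i - \bar Y_N)}{\sum_{i=1}^{N}(x_i - \bar x_N)^2}.
\]
If $\sigma^2 = \int z^2\,dF(z) < \infty$, then
\[
s_N^2 \xrightarrow{p} \sigma^2, \qquad \bar Y_N \xrightarrow{p} \beta_0, \qquad \tilde\beta_N \xrightarrow{p} \beta \quad \text{as } N \to \infty.
\]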
Under $H_0: \beta_0 = 0$, the likelihood ratio statistic is asymptotically $\mathcal{N}(0,1)$. The asymptotic relative efficiency of the aligned signed rank test with respect to the likelihood ratio test is
\[
e = \sigma^2\, \frac{\left(\int_0^1 \varphi(u)\varphi_f(u)\,du\right)^2}{\int_0^1 \varphi^2(u)\,du} \le \sigma^2\, I(f).
\]
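As a numerical check of the efficiency formula, the two integrals can be approximated by a midpoint rule. The sketch assumes $\varphi_f(u) = -f'(F^{-1}(u))/f(F^{-1}(u))$, which for the standard normal $F$ equals $\Phi^{-1}(u)$, and Wilcoxon scores $\varphi(u) = 2u - 1$; the function names are hypothetical. For this case the known closed form is $3/\pi \approx 0.955$.

```python
import math
from statistics import NormalDist

def midpoint_integral(g, n=20000):
    """Midpoint-rule approximation of the integral of g over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))

def are_signed_rank_vs_lr(phi, phi_f, sigma2, n=20000):
    """sigma^2 * (int phi * phi_f)^2 / int phi^2, all integrals over (0, 1)."""
    gamma = midpoint_integral(lambda u: phi(u) * phi_f(u), n)
    a2 = midpoint_integral(lambda u: phi(u) ** 2, n)
    return sigma2 * gamma ** 2 / a2

def wilcoxon(u):
    """Wilcoxon score function phi(u) = 2u - 1."""
    return 2.0 * u - 1.0

# Standard normal F: phi_f(u) = Phi^{-1}(u), sigma^2 = 1, I(f) = 1
normal_phi_f = NormalDist().inv_cdf
e_wilcoxon_normal = are_signed_rank_vs_lr(wilcoxon, normal_phi_f, 1.0)
```

The computed value should agree with $3/\pi$ to several decimals and stays below the bound $\sigma^2 I(f) = 1$, which is attained when $\varphi$ is proportional to $\varphi_f$.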
1.7.2    Multiple regression model
Let $Y_1,\dots,Y_N$ be independent with distribution functions $F_1,\dots,F_N$ such that
\[
F_i(y) = P(Y_i \le y) = F\bigl(y - \beta_0 - (x_i - \bar x_N)'\beta\bigr), \qquad 1 \le i \le N,\ y \in \mathbb{R},\ \beta \in \mathbb{R}^p.
\]
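We want to test the hypothesis
\[
H_1:\ \beta_0 = 0 \quad \text{versus} \quad K_+:\ \beta_0 > 0 \quad \text{or} \quad K_1:\ \beta_0 \ne 0,
\]
where $\beta$ is unspecified. We may also partition $\beta$ as
\[
\beta = (\beta_1', \beta_2')',
\]
where $\beta_1 \in \mathbb{R}^{p_1}$, $\beta_2 \in \mathbb{R}^{p_2}$, $p_1 + p_2 = p$. We want to test the hypothesis
\[
H_2:\ \beta_2 = 0 \quad \text{versus} \quad \beta_2 \ne 0,
\]
where $\beta_0$, $\beta_1$ are unspecified.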
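Test of $H_1$
Let $\hat\beta_N$ be an estimator of $\beta$. Consider the residuals $\hat Y_i = Y_i - (x_i - \bar x_N)'\hat\beta_N$, $i = 1,\dots,N$, and the (aligned) ranks $\hat R_{N1}^{+},\dots,\hat R_{NN}^{+}$ of $|\hat Y_i|$, $i = 1,\dots,N$. As in the case of the regression line, the test is based on the aligned signed rank statistic
\[
\hat S_N^{+} = \sum_{i=1}^{N} \operatorname{sign}(\hat Y_i)\, a_N^{*}(\hat R_{Ni}^{+})
\]
and the test criterion is
\[
T_N^2 = \frac{(\hat S_N^{+})^2}{N\,(A_N^{*})^2}, \qquad (A_N^{*})^2 = \frac{1}{N}\sum_{i=1}^{N} \bigl(a_N^{*}(i)\bigr)^2.
\]
Under $H_1$, $T_N^2$ has asymptotically the $\chi^2$ distribution with 1 d.f.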