ROBUST AND NONPARAMETRIC METHODS
Jana Jurečková
2
Contents
1 Rank tests in linear regression model 5
1.1 Properties of ranks and order statistics . . . . . . . . . . . . . . . . . . . . 5
1.1.1 The distribution of X(.) and of R : . . . . . . . . . . . . . . . . . . 5
1.1.2 Marginal distributions of the random vectors R and X(.) under H0 : 6
1.2 Locally most powerful rank tests . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Structure of the locally most powerful rank tests of H0 : . . . . . . . . . . 8
1.3.1 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Rank tests for simple regression model
with nonrandom regressors . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4.1 Rank tests for H
(1)
0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.2 Rank tests for H
(2)
0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Rank tests for some multiple
linear regression models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5.1 Rank tests for H
(1)
0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5.2 Rank tests for H
(2)
0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.6 Rank estimation
in simple linear regression models . . . . . . . . . . . . . . . . . . . . . . . 20
1.6.1 Estimation of the slope  of the regression line . . . . . . . . . . . . 20
1.6.2 Estimation in multiple regression model . . . . . . . . . . . . . . . 22
1.7 Aligned rank tests about the intercept . . . . . . . . . . . . . . . . . . . . 22
1.7.1 Regression line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.7.2 Multiple regression model . . . . . . . . . . . . . . . . . . . . . . . 24
3
4
Chapter 1
Rank tests in linear regression model
1.1 Properties of ranks and order statistics
Let $X = (X_1, \dots, X_n)$ be the vector of observations and denote by $X_{n:1} \le X_{n:2} \le \dots \le X_{n:n}$ the components of $X$ ordered according to increasing magnitude. The vector $X_{(\cdot)} = (X_{n:1}, \dots, X_{n:n})$ is called the vector of order statistics and $X_{n:i}$ is called the $i$th order statistic.
Assume that the components of $X$ are distinct and define the rank of $X_i$ as $R_i = \sum_{j=1}^{n} I[X_j \le X_i]$. Then the vector $R = (R_1, \dots, R_n)$ of ranks of $X$ takes on values in the set $\mathcal{R}$ of the $n!$ permutations $(r_1, \dots, r_n)$ of $(1, \dots, n)$.
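1.1.1 The distribution of $X_{(\cdot)}$ and of $R$
Lemma 1.1.1 (i) If $X$ has density $p(x_1, \dots, x_n)$, then the vector $X_{(\cdot)}$ of order statistics has the distribution with the density
$$\bar{p}(x_{n:1}, \dots, x_{n:n}) = \begin{cases} \sum_{r \in \mathcal{R}} p(x_{n:r_1}, \dots, x_{n:r_n}) & \text{if } x_{n:1} \le \dots \le x_{n:n}, \\ 0 & \text{otherwise.} \end{cases}$$
(ii) The conditional distribution of $R$ given $X_{(\cdot)} = x_{(\cdot)}$ has the form
$$\mathbb{P}(R = r \mid X_{(\cdot)} = x_{(\cdot)}) = \frac{p(x_{n:r_1}, \dots, x_{n:r_n})}{\bar{p}(x_{n:1}, \dots, x_{n:n})}$$
for any $r \in \mathcal{R}$ and any $x_{n:1} \le \dots \le x_{n:n}$.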
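Proof. For any Borel set $B$ in the range of $X_{(\cdot)}$ we have
$$\mathbb{P}(X_{(\cdot)} \in B) = \sum_{r \in \mathcal{R}} \mathbb{P}(X_{(\cdot)} \in B, R = r) = \sum_{r \in \mathcal{R}} \int \cdots \int_{\{x_{(\cdot)} \in B,\, R = r\}} p(x_1, \dots, x_n)\, dx_1 \cdots dx_n$$
$$= \sum_{r \in \mathcal{R}} \int \cdots \int_{B} p(x_{n:r_1}, \dots, x_{n:r_n})\, dx_{n:1} \cdots dx_{n:n} = \int \cdots \int_{B} \bar{p}(x_{n:1}, \dots, x_{n:n})\, dx_{n:1} \cdots dx_{n:n},$$
which proves (i). Similarly,
$$\mathbb{P}(X_{(\cdot)} \in B, R = r) = \int \cdots \int_{B} p(x_{n:r_1}, \dots, x_{n:r_n})\, dx_{n:1} \cdots dx_{n:n}$$
$$= \int \cdots \int_{B} \frac{p(x_{n:r_1}, \dots, x_{n:r_n})}{\bar{p}(x_{n:1}, \dots, x_{n:n})}\, \bar{p}(x_{n:1}, \dots, x_{n:n})\, dx_{n:1} \cdots dx_{n:n}$$
$$= \int \cdots \int_{B} \mathbb{P}(R = r \mid X_{(\cdot)} = x_{(\cdot)})\, \bar{p}(x_{n:1}, \dots, x_{n:n})\, dx_{n:1} \cdots dx_{n:n},$$
which proves (ii). $\Box$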
We say that the random vector $X$ satisfies the hypothesis of randomness $H_0$ if it has a probability distribution with density of the form
$$p(x) = \prod_{i=1}^{n} f(x_i), \quad x \in \mathbb{R}^n,$$
where $f$ is an arbitrary one-dimensional density. In other words, $X$ satisfies the hypothesis of randomness provided its components are a random sample from an absolutely continuous distribution. We say that the random vector $X$ satisfies the hypothesis of exchangeability $H$ if
$$p(x_1, \dots, x_n) = p(x_{r_1}, \dots, x_{r_n})$$
for every permutation $(r_1, \dots, r_n)$ of $1, \dots, n$. If $X$ satisfies $H_0$, then it obviously satisfies $H$. The following lemma follows from Lemma 1.1.1.
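Lemma 1.1.2 If $X$ satisfies $H_0$ or $H$, then $X_{(\cdot)}$ and $R$ are independent, the vector of ranks $R$ has the uniform discrete distribution
$$\mathbb{P}(R = r) = \frac{1}{n!}, \quad r \in \mathcal{R},$$
and the distribution of $X_{(\cdot)}$ has the density
$$\bar{p}(x_{n:1}, \dots, x_{n:n}) = \begin{cases} n!\, p(x_{n:1}, \dots, x_{n:n}) & \text{if } x_{n:1} \le \dots \le x_{n:n}, \\ 0 & \text{otherwise.} \end{cases}$$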
1.1.2 Marginal distributions of the random vectors $R$ and $X_{(\cdot)}$ under $H_0$
Lemma 1.1.3 Let $X$ satisfy the hypothesis $H_0$. Then
(i) $\mathbb{P}(R_i = j) = \frac{1}{n}$ for all $i, j = 1, \dots, n$.
(ii) $\mathbb{P}(R_i = k, R_j = m) = \frac{1}{n(n-1)}$ for $1 \le i, j, k, m \le n$, $i \ne j$, $k \ne m$.
(iii) $\mathbb{E} R_i = \frac{n+1}{2}$, $i = 1, \dots, n$.
(iv) $\operatorname{var} R_i = \frac{n^2 - 1}{12}$, $i = 1, \dots, n$.
(v) $\operatorname{cov}(R_i, R_j) = -\frac{n+1}{12}$, $1 \le i, j \le n$, $i \ne j$.
(vi) If $X$ has density $p(x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i)$, then $X_{n:k}$ has the distribution with density
$$f^{(n)}(x) = n \binom{n-1}{k-1} (F(x))^{k-1} (1 - F(x))^{n-k} f(x), \quad x \in \mathbb{R}^1,$$
where $F(x)$ is the distribution function of $X_1, \dots, X_n$.
(vii) If $X_1, \dots, X_n$ is a sample from the uniform $R[0, 1]$ distribution, then $X_{n:i}$ has the beta $B(i, n - i + 1)$ distribution with expectation and variance
$$\mathbb{E} X_{n:i} = \frac{i}{n+1}, \qquad \operatorname{Var} X_{n:i} = \frac{i(n - i + 1)}{(n+1)^2 (n+2)}.$$
Proof. The lemma follows immediately from Lemma 1.1.2. $\Box$
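Parts (iii) to (v) of Lemma 1.1.3 can be checked by brute force for a small $n$. The following is a minimal sketch, assuming only that $R$ is uniform on the $n!$ permutations (Lemma 1.1.2); the function name `rank_moments` is an illustrative choice, not part of the text.

```python
from fractions import Fraction
from itertools import permutations


def rank_moments(n):
    """Exact mean, variance and covariance of (R_1, R_2) for R uniform on permutations."""
    perms = list(permutations(range(1, n + 1)))
    w = Fraction(1, len(perms))  # each permutation has probability 1/n!
    mean = sum(w * r[0] for r in perms)
    var = sum(w * (r[0] - mean) ** 2 for r in perms)
    cov = sum(w * (r[0] - mean) * (r[1] - mean) for r in perms)
    return mean, var, cov


n = 5
mean, var, cov = rank_moments(n)
# Compare with (n+1)/2, (n^2-1)/12 and -(n+1)/12 from Lemma 1.1.3.
print(mean, var, cov)
```

Exact rational arithmetic is used so that the comparison with the closed forms is an equality rather than a floating-point approximation.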
1.2 Locally most powerful rank tests
We want to test the hypothesis of randomness $H_0$ on the distribution of $X$. A rank test is characterized by a test function $\Phi(R)$. The most powerful rank $\alpha$-test of $H_0$ against a simple alternative $K : \{Q\}$ [that $X$ has the fixed distribution $Q$] follows directly from the Neyman-Pearson Lemma:
$$\Phi(r) = \begin{cases} 1 & \text{if } n!\, Q(R = r) > k, \\ 0 & \text{if } n!\, Q(R = r) < k, \\ \gamma & \text{if } n!\, Q(R = r) = k, \end{cases} \qquad r \in \mathcal{R},$$
where $k$ and $\gamma$ are determined so that
$$\#\{r : n!\, Q(R = r) > k\} + \gamma\, \#\{r : n!\, Q(R = r) = k\} = \alpha\, n!, \quad 0 < \alpha < 1.$$
If we want to test against a composite alternative and a uniformly most powerful rank test does not exist, then we look for a rank test that is most powerful locally, in a neighborhood of the hypothesis.
Definition 1.2.1 Let $d(Q)$ be a measure of distance of an alternative $Q \in K$ from the hypothesis $H$. The $\alpha$-test $\Phi_0$ is called locally most powerful in the class $\mathcal{M}$ of $\alpha$-tests of $H$ against $K$ if, given any other test $\Phi \in \mathcal{M}$, there exists $\varepsilon > 0$ such that the power functions of $\Phi_0$ and $\Phi$ satisfy the inequality
$$\beta_{\Phi_0}(Q) \ge \beta_{\Phi}(Q) \quad \forall\, Q \text{ satisfying } 0 < d(Q) < \varepsilon.$$
1.3 Structure of the locally most powerful rank tests of $H_0$
Theorem 1.3.1 Let $\mathcal{A}$ be a class of densities, $\mathcal{A} = \{g(x, \theta) : \theta \in \mathcal{J}\}$, such that
$\mathcal{J} \subset \mathbb{R}^1$ is an open interval, $\mathcal{J} \ni 0$;
$g(x, \theta)$ is absolutely continuous in $\theta$ for almost all $x$.
Moreover, for almost all $x$ let there exist the limit
$$\dot{g}(x, 0) = \lim_{\theta \to 0} \frac{1}{\theta} [g(x, \theta) - g(x, 0)]$$
and let
$$\lim_{\theta \to 0} \int |\dot{g}(x, \theta)|\, dx = \int |\dot{g}(x, 0)|\, dx.$$
Consider the alternative $K = \{q_\Delta : \Delta > 0\}$, where
$$q_\Delta(x_1, \dots, x_n) = \prod_{i=1}^{n} g(x_i, \Delta c_i)$$
and $c_1, \dots, c_n$ are given numbers. Then the test with the critical region
$$\sum_{i=1}^{n} c_i\, a_n(R_i, g) \ge k$$
is the locally most powerful rank test of $H_0$ against $K$ on the significance level $\alpha = P\left(\sum_{i=1}^{n} c_i\, a_n(R_i, g) \ge k\right)$, where $P$ is any distribution satisfying $H_0$ and
$$a_n(i, g) = \mathbb{E}\left[\frac{\dot{g}(X_{n:i}, 0)}{g(X_{n:i}, 0)}\right], \quad i = 1, \dots, n,$$
are the scores; here $X_{n:1}, \dots, X_{n:n}$ are the order statistics corresponding to a random sample of size $n$ from the population with the density $g(x, 0)$.
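Proof. If $Q_\Delta$ is the probability distribution with the density $q_\Delta$, then we shall show that, for any permutation $r \in \mathcal{R}$,
$$\lim_{\Delta \to 0} \frac{1}{\Delta} [n!\, Q_\Delta(R = r) - 1] = \sum_{i=1}^{n} c_i\, a_n(r_i, g). \qquad (1.3.1)$$
If (1.3.1) is true, then there exists an $\varepsilon > 0$ such that
$$\sum_{i=1}^{n} c_i\, a_n(r_i, g) > \sum_{i=1}^{n} c_i\, a_n(r_i', g) \implies Q_\Delta(R = r) > Q_\Delta(R = r')$$
for all $\Delta \in (0, \varepsilon)$ and for different $r, r' \in \mathcal{R}$; then we reject $H_0$ for $r \in \mathcal{R}$ such that $\sum_{i=1}^{n} c_i\, a_n(r_i, g) > k$ for a suitable $k$. So we must prove (1.3.1), which we shall do as follows. We can write
$$\frac{1}{\Delta} [Q_\Delta(R = r) - Q_0(R = r)] = \int \cdots \int_{R = r} \frac{1}{\Delta} \left[ \prod_{i=1}^{n} g(x_i, \Delta c_i) - \prod_{i=1}^{n} g(x_i, 0) \right] dx_1 \cdots dx_n$$
$$= \sum_{i=1}^{n} \int \cdots \int_{R = r} \frac{1}{\Delta} \left( g(x_i, \Delta c_i) - g(x_i, 0) \right) \prod_{j=1}^{i-1} g(x_j, \Delta c_j) \prod_{k=i+1}^{n} g(x_k, 0)\, dx_1 \cdots dx_n,$$
where we used the identity
$$\prod_{i=1}^{n} A_i - \prod_{j=1}^{n} B_j = \sum_{i=1}^{n} (A_i - B_i) \prod_{j=1}^{i-1} A_j \prod_{k=i+1}^{n} B_k.$$
If $c_i > 0$, then
$$\limsup_{\Delta \to 0} \int \cdots \int_{R = r} \left| \frac{1}{\Delta} \left( g(x_i, \Delta c_i) - g(x_i, 0) \right) \right| \prod_{j=1}^{i-1} g(x_j, \Delta c_j) \prod_{k=i+1}^{n} g(x_k, 0)\, dx_1 \cdots dx_n$$
$$\le c_i \int \cdots \int_{R = r} |\dot{g}(x_i, 0)| \prod_{j \ne i} g(x_j, 0)\, dx_1 \cdots dx_n,$$
and analogously for $c_i < 0$. Combined with the Fatou lemma, this leads to
$$\lim_{\Delta \to 0} \sum_{i=1}^{n} \int \cdots \int_{R = r} \frac{1}{\Delta} \left( g(x_i, \Delta c_i) - g(x_i, 0) \right) \prod_{j=1}^{i-1} g(x_j, \Delta c_j) \prod_{k=i+1}^{n} g(x_k, 0)\, dx_1 \cdots dx_n$$
$$= \sum_{i=1}^{n} \int \cdots \int_{R = r} c_i\, \dot{g}(x_i, 0) \prod_{j \ne i} g(x_j, 0)\, dx_1 \cdots dx_n$$
$$= \sum_{i=1}^{n} c_i \int \cdots \int_{R = r} \frac{\dot{g}(x_i, 0)}{g(x_i, 0)} \prod_{j=1}^{n} g(x_j, 0)\, dx_1 \cdots dx_n = \frac{1}{n!} \sum_{i=1}^{n} c_i\, \mathbb{E}\left[ \frac{\dot{g}(X_i, 0)}{g(X_i, 0)} \,\Big|\, R = r \right]$$
$$= \frac{1}{n!} \sum_{i=1}^{n} c_i\, a_n(r_i, g),$$
taking into account that $g(x, 0) = 0$ and $\dot{g}(x, 0) \ne 0$ can happen simultaneously only on a set of measure 0. Since $Q_0(R = r) = 1/n!$, this implies (1.3.1). $\Box$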
1.3.1 Special cases
I. Two-sample alternative of the shift in location: $K_1 : \{q_\Delta : \Delta > 0\}$ where
$$q_\Delta(x_1, \dots, x_N) = \prod_{i=1}^{m} f(x_i) \prod_{i=m+1}^{N} f(x_i - \Delta)$$
with $f$ being a fixed absolutely continuous density such that $\int |f'(x)|\, dx < \infty$. Then the locally most powerful rank $\alpha$-test of $H_0$ against $K_1$ has the critical region
$$\sum_{i=m+1}^{N} a_N(R_i, f) \ge k,$$
where $k$ satisfies the condition $P\left(\sum_{i=m+1}^{N} a_N(R_i, f) \ge k\right) = \alpha$, $P \in H_0$, and
$$a_N(i, f) = \mathbb{E}\left[ -\frac{f'(X_{N:i})}{f(X_{N:i})} \right], \quad i = 1, \dots, N,$$
where $X_{N:1} < \dots < X_{N:N}$ are the order statistics corresponding to a sample of size $N$ from the distribution with the density $f$. The scores may also be written as
$$a_N(i, f) = \mathbb{E}\, \varphi(U_{N:i}, f), \quad i = 1, \dots, N,$$
where $\varphi(u, f) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}$, $0 < u < 1$, and $U_{N:1}, \dots, U_{N:N}$ are the order statistics corresponding to a sample of size $N$ from the uniform $R(0, 1)$ distribution. Another form of the scores is
$$a_N(i, f) = -N \binom{N-1}{i-1} \int f'(x)\, F^{i-1}(x) (1 - F(x))^{N-i}\, dx.$$
Remark 1.3.1 The computation of the scores is difficult for some densities; if no tables of the scores are available, they are often replaced by the approximate scores
$$a_N(i, f) = \varphi\left( \frac{i}{N+1}, f \right) = \varphi(\mathbb{E} U_{N:i}, f), \quad i = 1, \dots, N.$$
The asymptotic critical values coincide for both types of scores.
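As an illustration of Remark 1.3.1, the approximate scores $\varphi(i/(N+1), f)$ are easy to tabulate once $\varphi(\cdot, f)$ is known in closed form. The sketch below assumes two standard choices that follow from $\varphi(u, f) = -f'(F^{-1}(u))/f(F^{-1}(u))$: the logistic density gives $\varphi(u, f) = 2u - 1$ (Wilcoxon scores, up to scale), and the standard normal density gives $\varphi(u, f) = \Phi^{-1}(u)$ (van der Waerden scores). The function names are illustrative only.

```python
from statistics import NormalDist


def approximate_scores(phi, N):
    """Approximate scores a_N(i) = phi(i / (N + 1)), i = 1, ..., N."""
    return [phi(i / (N + 1)) for i in range(1, N + 1)]


def wilcoxon_phi(u):
    # Logistic density: -f'(x)/f(x) = 2F(x) - 1, hence phi(u, f) = 2u - 1.
    return 2 * u - 1


def van_der_waerden_phi(u):
    # Standard normal density: -f'(x)/f(x) = x, hence phi(u, f) = Phi^{-1}(u).
    return NormalDist().inv_cdf(u)


wilcoxon = approximate_scores(wilcoxon_phi, 5)
vdw = approximate_scores(van_der_waerden_phi, 5)
print(wilcoxon)
print(vdw)
```

For the Wilcoxon case $\varphi$ is linear, so the approximate scores coincide with the exact scores $\mathbb{E}\,\varphi(U_{N:i}, f)$, because $\mathbb{E} U_{N:i} = i/(N+1)$.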
II. Alternative of simple linear regression: $K_2 = \{q_\Delta : \Delta > 0\}$ where $q_\Delta(x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i - \Delta c_i)$ with a fixed absolutely continuous density $f$ and with given constants $c_1, \dots, c_n$, $\sum_{i=1}^{n} c_i^2 > 0$. Then the locally most powerful rank $\alpha$-test has the critical region
$$\sum_{i=1}^{n} c_i\, a_n(R_i, f) \ge k \qquad (1.3.2)$$
with the same scores as in case I, and with $k$ determined by the condition
$$\mathbb{P}\left( \sum_{i=1}^{n} c_i\, a_n(R_i, f) > k \right) + \gamma\, \mathbb{P}\left( \sum_{i=1}^{n} c_i\, a_n(R_i, f) = k \right) = \alpha.$$
In practice we most often use the test with the Wilcoxon scores: put $\varphi(u) = u - \frac{1}{2}$ and reject $H_0$ provided
$$W_n = \sum_{i=1}^{n} c_i R_i > k$$
(and with probability $\gamma$ when $W_n = k$), where $k$ is such that
$$P\left( \sum_{i=1}^{n} c_i R_i > k \,\Big|\, H_0 \right) + \gamma\, P\left( \sum_{i=1}^{n} c_i R_i = k \,\Big|\, H_0 \right) = \alpha, \quad 0 \le \gamma < 1.$$
This test is locally most powerful against $K_2$ with $F$ logistic, with the density
$$f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \quad x \in \mathbb{R},$$
but it is rather efficient also for other alternatives. For large $n$ we use the normal approximation of $W_n$: if $n \to \infty$, then $W_n$ has asymptotically normal distribution under $H_0$ in the following sense:
$$\lim_{n \to \infty} P_{H_0}\left( \frac{W_n - \mathbb{E} W_n}{\sqrt{\operatorname{var} W_n}} < x \right) = \Phi(x), \quad x \in \mathbb{R}^1,$$
where $\Phi$ is the standard normal distribution function.
To be able to use the normal approximation, we must know the expectation and variance of $W_n$ under $H_0$. The following lemma gives the expectation and the variance of a more general linear rank statistic, covering the Wilcoxon as well as other rank tests.
Lemma 1.3.1 Let the random vector $(R_1, \dots, R_n)$ have the discrete uniform distribution on the set $\mathcal{R}$ of all permutations of the numbers $1, \dots, n$, i.e. $\mathbb{P}(R = r) = \frac{1}{n!}$, $r \in \mathcal{R}$; let $c_1, \dots, c_n$ and $a_1 = a(1), \dots, a_n = a(n)$ be arbitrary constants. Then the expectation and variance of the linear statistic $S_n = \sum_{i=1}^{n} c_i\, a(R_i)$ are
$$\mathbb{E} S_n = \frac{1}{n} \sum_{i=1}^{n} c_i \sum_{j=1}^{n} a_j,$$
$$\operatorname{var} S_n = \frac{1}{n-1} \sum_{i=1}^{n} (c_i - \bar{c})^2 \sum_{j=1}^{n} (a_j - \bar{a})^2,$$
where $\bar{c} = \frac{1}{n} \sum_{i=1}^{n} c_i$ and $\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i$.
Proof. The proposition follows from the distribution of $R$ under $H_0$ (Lemma 1.1.3). $\Box$
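The two formulas of Lemma 1.3.1 can be compared with a direct enumeration over all permutations. This is a small sketch under the lemma's assumption of uniformly distributed ranks; the constants `c` and `a` are made up for the illustration, and the function names are not from the text.

```python
from fractions import Fraction
from itertools import permutations


def exact_moments(c, a):
    """Mean and variance of S_n = sum_i c_i a(R_i) by enumerating all n! rank vectors."""
    n = len(c)
    values = [
        sum(c[i] * a[r[i] - 1] for i in range(n))
        for r in permutations(range(1, n + 1))
    ]
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum((v - mean) ** 2 for v in values), len(values))
    return mean, var


def lemma_moments(c, a):
    """Closed forms from Lemma 1.3.1."""
    n = len(c)
    c_bar = Fraction(sum(c), n)
    a_bar = Fraction(sum(a), n)
    mean = Fraction(sum(c) * sum(a), n)
    var = (
        Fraction(1, n - 1)
        * sum((ci - c_bar) ** 2 for ci in c)
        * sum((aj - a_bar) ** 2 for aj in a)
    )
    return mean, var


c = [1, 2, 4, 7]  # hypothetical regression constants
a = [1, 2, 3, 4]  # Wilcoxon-type scores a(i) = i
print(exact_moments(c, a), lemma_moments(c, a))
```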
1.4 Rank tests for simple regression model with nonrandom regressors
Let $X_1, \dots, X_N$ be independent random variables with continuous distribution functions $F_1, \dots, F_N$, where
$$F_i(x) = F(x - \beta_0 - \beta c_i), \quad i = 1, \dots, N, \quad x \in \mathbb{R},$$
$F$ is continuous, $c_N = (c_1, \dots, c_N)$ is a vector of (known) regression constants (not all equal), and $(\beta_0, \beta)$ are unknown parameters; we call $\beta_0$ the intercept of the regression line and $\beta$ the regression coefficient. Our first hypothesis is that there is no regression,
$$H_0^{(1)} : \beta = 0 \quad \text{against} \quad K^{(1)} : \beta \ne 0 \quad \text{or} \quad K_+^{(1)} : \beta > 0, \qquad (1.4.1)$$
where $\beta_0$ is considered as a nuisance parameter. We may also be interested in the joint hypothesis
$$H_0^{(2)} : (\beta_0, \beta) = 0 \quad \text{against} \quad K^{(2)} : (\beta_0, \beta) \ne 0. \qquad (1.4.2)$$
The third hypothesis is
$$H_0^{(3)} : \beta_0 = 0 \quad \text{against} \quad K^{(3)} : \beta_0 \ne 0 \quad \text{or} \quad K_+^{(3)} : \beta_0 > 0, \qquad (1.4.3)$$
where $\beta$ is treated as a nuisance parameter.
In each case there exists a distribution-free rank test, whose critical values do not depend on $F$. We can also consider the hypotheses $\beta = \beta^*$ or $(\beta_0, \beta) = (\beta_0^*, \beta^*)$; then we work with
$$X_i^* = X_i - \beta_0^* - \beta^* c_i, \quad i = 1, \dots, N.$$
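1.4.1 Rank tests for $H_0^{(1)}$
Let $R_N = (R_{N1}, \dots, R_{NN})$ be the ranks of $X_1, \dots, X_N$. Choose some nondecreasing score function $\varphi : (0, 1) \mapsto \mathbb{R}$ and put
$$S_N = \sum_{i=1}^{N} (c_i - \bar{c}_N)\, a_N(R_{Ni}), \qquad \bar{c}_N = \frac{1}{N} \sum_{i=1}^{N} c_i, \qquad (1.4.4)$$
where the scores have the form
$$a_N(i) = \mathbb{E}\, \varphi(U_{N:i}) \quad \text{or} \quad \varphi\left( \frac{i}{N+1} \right), \quad 1 \le i \le N, \qquad (1.4.5)$$
and $U_{N:1} \le \dots \le U_{N:N}$ are the order statistics corresponding to the sample $U_1, \dots, U_N$ from the uniform $R(0, 1)$ distribution. Under $H_0^{(1)}$ it holds that $F_1(x) = \dots = F_N(x) = F(x - \beta_0) = F_0(x)$ (say), where $F_0$ is continuous. Because ties among $X_1, \dots, X_N$ occur with probability 0, we have
$$\mathbb{P}\left( R_N = r_N \mid H_0^{(1)} \right) = \frac{1}{N!} \quad \forall\, r_N \in \mathcal{R}_N \text{ (permutations)},$$
hence
$$\mathbb{P}\{R_{Ni} = k \mid H_0^{(1)}\} = \frac{1}{N} \quad \forall\, i, k, \ 1 \le i, k \le N,$$
$$\mathbb{P}\{R_{Ni} = k, R_{Nj} = \ell \mid H_0^{(1)}\} = \frac{1}{N(N-1)} \quad \forall\, i, j, k, \ell, \ 1 \le i \ne j, \ k \ne \ell \le N.$$
Hence,
$$\mathbb{E}\{S_N \mid H_0^{(1)}\} = \sum_{i=1}^{N} (c_i - \bar{c}_N)\, \mathbb{E}\{a_N(R_{Ni}) \mid H_0^{(1)}\} = \frac{1}{N} \sum_{i=1}^{N} (c_i - \bar{c}_N) \sum_{j=1}^{N} a_N(j) = 0,$$
$$\operatorname{Var}\{S_N \mid H_0^{(1)}\} = \frac{1}{N-1} \sum_{i=1}^{N} (c_i - \bar{c}_N)^2 \sum_{j=1}^{N} (a_N(j) - \bar{a}_N)^2.$$
The distribution of $S_N$ under $H_0^{(1)}$ depends neither on $F$ nor on $\beta_0$. Hence we reject $H_0^{(1)}$ in favor of $\{K_+^{(1)} : \beta > 0\}$ when $S_N > k_\alpha^+$ and reject with probability $\gamma$ when $S_N = k_\alpha^+$, where $k_\alpha^+$ is determined so that
$$\mathbb{P}\{S_N > k_\alpha^+ \mid H_0^{(1)}\} + \gamma\, \mathbb{P}\{S_N = k_\alpha^+ \mid H_0^{(1)}\} = \alpha$$
and $\alpha = 0.05$ or $0.01$, for instance. Similarly, we reject $H_0^{(1)}$ in favor of $\{K^{(1)} : \beta \ne 0\}$ when $|S_N| > k_\alpha$ and reject with probability $\gamma \in [0, 1)$ when $|S_N| = k_\alpha$, where $k_\alpha$ is determined so that
$$\mathbb{P}\{|S_N| > k_\alpha \mid H_0^{(1)}\} + \gamma\, \mathbb{P}\{|S_N| = k_\alpha \mid H_0^{(1)}\} = \alpha.$$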
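For small $N$ the critical value $k_\alpha^+$ and the randomization probability $\gamma$ can be found by enumerating all $N!$ rank vectors, since each has probability $1/N!$ under $H_0^{(1)}$. The following is a minimal sketch of that enumeration; the regression constants are hypothetical, the scores are the Wilcoxon scores $i/(N+1)$, and `exact_one_sided_test` is an illustrative name.

```python
from fractions import Fraction
from itertools import permutations


def exact_one_sided_test(c, a, alpha):
    """Return (k, gamma, size) with P(S_N > k) + gamma * P(S_N = k) = alpha under H0."""
    N = len(c)
    c_bar = Fraction(sum(c), N)
    # Null distribution of S_N: all N! rank vectors are equally likely.
    values = [
        sum((c[i] - c_bar) * a[r[i] - 1] for i in range(N))
        for r in permutations(range(1, N + 1))
    ]
    total = len(values)
    # Walk down from the largest attainable value until the upper tail exceeds alpha.
    for k in sorted(set(values), reverse=True):
        p_greater = Fraction(sum(v > k for v in values), total)
        p_equal = Fraction(sum(v == k for v in values), total)
        if p_greater + p_equal > alpha:
            gamma = (alpha - p_greater) / p_equal
            return k, gamma, p_greater + gamma * p_equal
    raise ValueError("alpha must lie in (0, 1)")


c = [1, 2, 3, 5, 8, 13]  # hypothetical regression constants
N = len(c)
a = [Fraction(i, N + 1) for i in range(1, N + 1)]  # Wilcoxon scores i/(N+1)
alpha = Fraction(1, 20)
k, gamma, size = exact_one_sided_test(c, a, alpha)
print(k, gamma, size)
```

The enumeration costs $N!$ evaluations, which is why the asymptotic approximation is needed already for moderate $N$.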
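For small $N$ we can calculate the critical values $k_\alpha^+$ and $k_\alpha$ exactly, but for large $N$ we must use an asymptotic approximation. The asymptotic distribution of $S_N$ under $H_0^{(1)}$ is based on the following theorems, proved by Hájek (1961):
Theorem 1.4.1 Let $R_N = (R_{N1}, \dots, R_{NN})$ be a random vector such that
$$\mathbb{P}\{R_N = r\} = \frac{1}{N!} \quad \forall\, r \in \mathcal{R}_N,$$
let $\{a_N(i), 1 \le i \le N\}$ and $\{c_N(i), 1 \le i \le N\}$ be two sequences of real numbers, and let $S_N = \sum_{i=1}^{N} c_N(i)\, a_N(R_{Ni})$. Assume that, as $N \to \infty$,
$$\max_{1 \le i \le N} \frac{(a_N(i) - \bar{a}_N)^2}{\sum_{j=1}^{N} (a_N(j) - \bar{a}_N)^2} \to 0, \qquad \max_{1 \le i \le N} \frac{(c_N(i) - \bar{c}_N)^2}{\sum_{j=1}^{N} (c_N(j) - \bar{c}_N)^2} \to 0 \quad \text{(Noether condition)}. \qquad (1.4.6)$$
Then
$$\mathbb{P}\left\{ \frac{S_N - \mathbb{E} S_N}{\sqrt{\operatorname{Var} S_N}} \le x \right\} \to \Phi(x) \quad \text{as } N \to \infty \quad \forall\, x \in \mathbb{R},$$
where $\Phi$ is the standard normal distribution function, if and only if, for every $\varepsilon > 0$,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \delta_{N,ij}^2\, I[|\delta_{N,ij}| > \varepsilon] = 0 \quad \text{(Lindeberg condition)}, \qquad (1.4.7)$$
where
$$\delta_{N,ij} = \frac{(a_N(i) - \bar{a}_N)(c_N(j) - \bar{c}_N)}{\left[ N^{-1} \sum_{k=1}^{N} (a_N(k) - \bar{a}_N)^2 \sum_{\ell=1}^{N} (c_N(\ell) - \bar{c}_N)^2 \right]^{1/2}}, \quad i, j = 1, \dots, N.$$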
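Theorem 1.4.2 (Projection theorem). If $a_N(1) \le \dots \le a_N(N)$ and
$$\max_{1 \le i \le N} \frac{(a_N(i) - \bar{a}_N)^2}{\sum_{j=1}^{N} (a_N(j) - \bar{a}_N)^2} \to 0 \quad \text{as } N \to \infty,$$
then $S_N$ is asymptotically equivalent in the quadratic mean to the statistic
$$T_N = \sum_{i=1}^{N} (c_N(i) - \bar{c}_N)\, a_N^0(U_i) + N \bar{c}_N \bar{a}_N$$
in the sense that
$$\lim_{N \to \infty} \mathbb{E}\left[ \frac{(S_N - T_N)^2}{\operatorname{Var} S_N} \right] = 0.$$
Here
$$a_N^0(u) = a_N(i) \quad \text{for} \quad \frac{i-1}{N} < u \le \frac{i}{N}, \quad i = 1, \dots, N,$$
and $U_1, \dots, U_N$ is a random sample from the uniform $R(0, 1)$ distribution.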
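Corollary 1.4.1 Let
$$\delta_{N,ij} = \frac{(a_N(i) - \bar{a}_N)(c_j - \bar{c}_N)}{A_N C_N}, \quad i, j = 1, \dots, N,$$
$$A_N^2 = (N - 1)^{-1} \sum_{k=1}^{N} (a_N(k) - \bar{a}_N)^2, \qquad C_N^2 = \sum_{\ell=1}^{N} (c_\ell - \bar{c}_N)^2,$$
and let the sequences $\{a_N(1), \dots, a_N(N)\}$ and $\{c_1, \dots, c_N\}$ satisfy the Noether condition (1.4.6). Then
$$\lim_{N \to \infty} \mathbb{P}\left\{ \frac{S_N}{A_N C_N} \le x \,\Big|\, H_0^{(1)} \right\} = \Phi(x) \quad \forall\, x \in \mathbb{R}.$$
The asymptotic rank test rejects $H_0^{(1)}$ in favor of $K_+^{(1)}$ on the significance level $\alpha$ if
$$\frac{S_N}{A_N C_N} \ge \Phi^{-1}(1 - \alpha)$$
and in favor of $K^{(1)}$ if
$$\frac{|S_N|}{A_N C_N} \ge \Phi^{-1}\left( 1 - \frac{\alpha}{2} \right),$$
respectively.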
1.4.2 Rank tests for $H_0^{(2)}$
The hypothesis
$$H_0^{(2)} : (\beta_0, \beta) = 0$$
will be tested under the condition of symmetry of $F$, i.e.
$$F(x) + F(-x) = 1 \quad \text{for } x \in \mathbb{R}.$$
Because the ranks are invariant to a shift in location, the test should also involve the signs of the observations. Let $R_{Ni}^+$ be the rank of $|X_i|$ among $|X_1|, \dots, |X_N|$, $i = 1, \dots, N$. Choose a score-generating function $\varphi^* : (0, 1) \mapsto [0, \infty)$ and the scores $a_N^*(1), \dots, a_N^*(N)$ generated by $\varphi^*$ in the same manner as in (1.4.5). Under the hypothesis $H_0^{(2)}$, the observations are independent and identically distributed with a continuous distribution function $F$, symmetric about 0. Consider two statistics
$$S_{N,1}^+ = \sum_{i=1}^{N} a_N^*(R_{Ni}^+)\, \operatorname{sign} X_i, \qquad S_{N,2}^+ = \sum_{i=1}^{N} c_i\, a_N^*(R_{Ni}^+)\, \operatorname{sign} X_i, \qquad S_N^+ = (S_{N,1}^+, S_{N,2}^+)^\top,$$
and denote
$$\lambda_{11}^{(N)} = N, \qquad \lambda_{12}^{(N)} = \sum_{i=1}^{N} c_i, \qquad \lambda_{22}^{(N)} = \sum_{i=1}^{N} c_i^2, \qquad \Lambda^{(N)} = \left[ \lambda_{ij}^{(N)} \right]_{i,j=1,2}.$$
Under $H_0^{(2)}$ and under symmetry of $F$, the vector $(\operatorname{sign} X_1 \cdot R_{N1}^+, \dots, \operatorname{sign} X_N \cdot R_{NN}^+)$ can take on $N!\, 2^N$ values, each with probability $1/(N!\, 2^N)$, and $\operatorname{sign} X_i$ is independent of $R_{Ni}^+$, $i = 1, \dots, N$. Hence,
$$\mathbb{E}(S_N^+ \mid H_0^{(2)}) = 0,$$
$$\mathbb{E}(S_N^+ S_N^{+\top} \mid H_0^{(2)}) = A_N^2\, \Lambda^{(N)},$$
$$A_N^2 = \frac{1}{N} \sum_{i=1}^{N} (a_N^*(i))^2.$$
Consider the following test criterion:
$$W_N^+ = S_N^{+\top} \left[ \mathbb{E}_{H_0^{(2)}} \left( S_N^+ S_N^{+\top} \right) \right]^{-1} S_N^+ = S_N^{+\top} \left( \Lambda^{(N)} \right)^{-1} S_N^+ / A_N^2. \qquad (1.4.8)$$
Under $H_0^{(2)}$ and under symmetry of $F$, the distribution of $W_N^+$ does not depend on the unknown $F$. However, the exact distribution of $W_N^+$ is very laborious to calculate, hence we should again use an asymptotic approximation. The asymptotic behavior is described in the following theorem:
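Theorem 1.4.3 Assume that the sequences $\{a_N(i), 1 \le i \le N\}$ and $\{c_{Ni}, 1 \le i \le N\}$ satisfy, as $N \to \infty$,
$$\frac{\max_{1 \le i \le N} a_N^2(i)}{\sum_{j=1}^{N} a_N^2(j)} \to 0, \qquad \frac{\max_{1 \le i \le N} c_{Ni}^2}{\sum_{j=1}^{N} c_{Nj}^2} \to 0.$$
Denote
$$\delta_{N,ij} = \frac{a_N(i)\, c_{Nj}}{\left[ N^{-1} \sum_{k=1}^{N} a_N^2(k) \sum_{\ell=1}^{N} c_{N\ell}^2 \right]^{1/2}}, \quad i, j = 1, \dots, N.$$
Then, under $H_0^{(2)}$ and under symmetry of $F$, the sequence $(S_{N2}^+ - \mathbb{E} S_{N2}^+)/\sqrt{\operatorname{Var} S_{N2}^+}$ is asymptotically normally distributed $N(0, 1)$ if and only if, for every $\varepsilon > 0$,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \delta_{N,ij}^2\, I[|\delta_{N,ij}| > \varepsilon] = 0 \quad \text{(Lindeberg condition)}.$$
If we further apply Theorem 1.4.3 to $c_{Ni} = 1$, $i = 1, \dots, N$, we conclude that the random vector $S_N^+$ has asymptotically a bivariate normal distribution $N_2\left( 0, A_N^2 \Lambda^{(N)} \right)$. This implies that under $H_0^{(2)}$ and under symmetry of $F$, $W_N^+$ has asymptotically the $\chi^2$ distribution with 2 degrees of freedom. Hence, the asymptotic test rejects $H_0^{(2)}$ in favor of $K^{(2)}$ if
$$W_N^+ \ge \chi_{2,\alpha}^2,$$
where $\chi_{2,\alpha}^2$ is the upper $\alpha$ critical value of the $\chi^2$ distribution with 2 degrees of freedom.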
1.4.3 Example
A group of students, boys and girls, attended a summer language course. They took two tests, one before and one after the course. The responses in the table are the differences in the test scores for each individual; $c_i = 1$ for a boy and $c_i = -1$ for a girl.
 #   response   c_i   R_Ni   R+_Ni   c_i R_Ni   sign(X_i) R+_Ni
 1      5.2      1     19     19       19           19
 2     -0.7      1      6      6        6           -6
 3     -2.3      1      2     13        2          -13
 4      3.2      1     16     15       16           15
 5     -1.5      1      4      9        4           -9
 6      4.7      1     18     18       18           18
 7      1.8      1     14     12       14           12
 8     -0.4      1      8      3        8           -3
 9      0.6      1     11      5       11            5
10      6.6      1     20     20       20           20
11     -0.9     -1      5      8       -5           -8
12      1.7     -1     13     11      -13           11
13     -0.3     -1      9      2       -9           -2
14      2.4     -1     15     14      -15           14
15      4.2     -1     17     16      -17           16
16     -1.6     -1      3     10       -3          -10
17     -4.3     -1      1     17       -1          -17
18      0.8     -1     12      7      -12            7
19     -0.5     -1      7      4       -7           -4
20     -0.2     -1     10      1      -10           -1
We want to test whether the course had an effect and whether there is a difference between the performance of boys and girls. We take the Wilcoxon scores, $a_N(i) = a_N^*(i) = \frac{i}{21}$, $i = 1, \dots, 20$, and get
$$\frac{S_N}{A_N C_N} = 0.9826 < 1.96 = \Phi^{-1}(0.975),$$
$$W_N^+ = 2.368 < 5.99 = \chi_2^2(0.95).$$
Hence, we cannot reject either of the hypotheses at the level $\alpha = 0.05$.
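The two statistics of the example can be recomputed directly from the responses. The sketch below follows the definitions of $S_N$, $A_N$, $C_N$ from Section 1.4.1 and of $W_N^+$ from (1.4.8), with the $2 \times 2$ matrix of (1.4.8) inverted by hand; the helper `ranks` and the variable names are illustrative and assume no ties, which holds for these data.

```python
from math import sqrt

responses = [5.2, -0.7, -2.3, 3.2, -1.5, 4.7, 1.8, -0.4, 0.6, 6.6,
             -0.9, 1.7, -0.3, 2.4, 4.2, -1.6, -4.3, 0.8, -0.5, -0.2]
c = [1] * 10 + [-1] * 10  # 1 for a boy, -1 for a girl
N = len(responses)


def ranks(values):
    """Ranks 1..n of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


a = [i / (N + 1) for i in range(1, N + 1)]  # Wilcoxon scores i/21
R = ranks(responses)
R_plus = ranks([abs(x) for x in responses])
sign = [1 if x > 0 else -1 for x in responses]

# Linear rank statistic for H0^(1), standardized by A_N * C_N.
c_bar = sum(c) / N
a_bar = sum(a) / N
S = sum((c[i] - c_bar) * a[R[i] - 1] for i in range(N))
A = sqrt(sum((ai - a_bar) ** 2 for ai in a) / (N - 1))
C = sqrt(sum((ci - c_bar) ** 2 for ci in c))
z = S / (A * C)

# Signed-rank criterion W_N^+ for H0^(2), using the inverse of the 2x2 matrix.
S1 = sum(sign[i] * a[R_plus[i] - 1] for i in range(N))
S2 = sum(c[i] * sign[i] * a[R_plus[i] - 1] for i in range(N))
A2 = sum(ai ** 2 for ai in a) / N
m11, m12, m22 = N, sum(c), sum(ci * ci for ci in c)
det = m11 * m22 - m12 ** 2
W = (m22 * S1 ** 2 - 2 * m12 * S1 * S2 + m11 * S2 ** 2) / (det * A2)

print(round(z, 4), round(W, 4))
```

Up to rounding in the last digit, the computed values agree with the 0.9826 and 2.368 reported in the example.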
1.5 Rank tests for some multiple linear regression models
Consider the linear regression model
$$Y_i = \beta_0 + x_i^\top \beta + e_i, \quad i = 1, \dots, N, \qquad (1.5.1)$$
where $\beta_0 \in \mathbb{R}^1$, $\beta \in \mathbb{R}^p$ are unknown parameters, $e_1, \dots, e_N$ are independent errors, identically distributed according to a continuous d.f. $F$, and $x_i \in \mathbb{R}^p$ are given regressors, $i = 1, \dots, N$. Denote by
$$X_N = \begin{bmatrix} x_1^\top \\ \vdots \\ x_N^\top \end{bmatrix}$$
the regression matrix. We shall first consider the hypotheses
$$H_0^{(1)} : \beta = 0 \quad \text{versus} \quad K^{(1)} : \beta \ne 0$$
and
$$H_0^{(2)} : \beta^* = (\beta_0, \beta^\top)^\top = 0 \quad \text{versus} \quad K^{(2)} : \beta^* \ne 0.$$
The hypotheses and tests are extensions of those for the regression line.
1.5.1 Rank tests for $H_0^{(1)}$
Let $R_{N1}, \dots, R_{NN}$ be the ranks of $Y_1, \dots, Y_N$ and let $a_N(1), \dots, a_N(N)$ be the scores generated by a nondecreasing, square-integrable score function $\varphi : (0, 1) \mapsto \mathbb{R}^1$, so that $a_N(i) = \varphi\left( \frac{i}{N+1} \right)$, $i = 1, \dots, N$.
Consider the linear rank statistics
$$S_{Nj} = \sum_{i=1}^{N} (x_{ij} - \bar{x}_{Nj})\, a_N(R_{Ni}), \qquad \bar{x}_{Nj} = \frac{1}{N} \sum_{i=1}^{N} x_{ij}, \quad j = 1, \dots, p,$$
and the vector
$$S_N = \sum_{i=1}^{N} (x_i - \bar{x}_N)\, a_N(R_{Ni}) = (S_{N1}, \dots, S_{Np})^\top.$$
The distribution function of the observation $Y_i$ under $H_0^{(1)}$ is $F(y - \beta_0)$, $i = 1, \dots, N$. Hence, $(R_{N1}, \dots, R_{NN})$ assumes all possible permutations of $(1, 2, \dots, N)$ with equal probability $\frac{1}{N!}$, and the expectation and covariance matrix of $S_N$ under $H_0^{(1)}$ are
$$\mathbb{E}(S_N \mid H_0^{(1)}) = 0 \quad \text{and} \quad \mathbb{E}(S_N S_N^\top \mid H_0^{(1)}) = A_N^2\, Q_N,$$
where
$$A_N^2 = \frac{1}{N-1} \sum_{i=1}^{N} (a_N(i) - \bar{a}_N)^2, \qquad Q_N = \sum_{i=1}^{N} (x_i - \bar{x}_N)(x_i - \bar{x}_N)^\top.$$
Our test for $H_0^{(1)}$ is based on the quadratic form
$$\mathcal{S}_N = A_N^{-2}\, S_N^\top Q_N^{-1} S_N, \qquad (1.5.2)$$
where $Q_N^{-1}$ is replaced by the generalized inverse $Q_N^-$ if $Q_N$ is singular. We reject $H_0^{(1)}$ if $\mathcal{S}_N > k_\alpha$, where $k_\alpha$ is a suitable critical value.
Notice that $S_N$ depends only on $x_1, \dots, x_N$, on the scores $a_N(1), \dots, a_N(N)$ and on the ranks $R_{N1}, \dots, R_{NN}$. Hence, the distribution of $S_N$, and thus also that of $\mathcal{S}_N$, under the hypothesis $H_0^{(1)}$ does not depend on the distribution function $F$ of the errors. For small $N$ the critical value can be calculated numerically, but this becomes laborious with increasing $N$. Hence, again, we should use the large-sample approximation. This can be derived under some conditions on the matrix $X_N$ and on the scores:
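Theorem 1.5.1 Assume that
(i) the matrix $Q_N$ is regular for $N > N_0$ and
$$\max_{1 \le i \le N} (x_i - \bar{x}_N)^\top Q_N^{-1} (x_i - \bar{x}_N) \to 0 \quad \text{as } N \to \infty,$$
(ii) the scores satisfy the Noether condition, i.e.
$$\max_{1 \le i \le N} \frac{(a_N(i) - \bar{a}_N)^2}{\sum_{j=1}^{N} (a_N(j) - \bar{a}_N)^2} \to 0 \quad \text{as } N \to \infty,$$
(iii)
$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \delta_{N,ijk}^2\, I[|\delta_{N,ijk}| > \varepsilon] = 0 \quad \text{for every } \varepsilon > 0, \ k = 1, \dots, p,$$
where
$$\delta_{N,ijk} = \frac{(a_N(i) - \bar{a}_N)(x_{jk} - \bar{x}_k)}{\left[ N^{-1} \sum_{i=1}^{N} (a_N(i) - \bar{a}_N)^2 \sum_{j=1}^{N} (x_{jk} - \bar{x}_k)^2 \right]^{1/2}}, \quad k = 1, \dots, p, \ i, j = 1, \dots, N.$$
Then, under $H_0^{(1)}$, the criterion $\mathcal{S}_N$ in (1.5.2) has asymptotically the $\chi^2$ distribution with $p$ degrees of freedom.
Remark 1.5.1 We reject the hypothesis $H_0^{(1)}$ on the significance level $\alpha$ if
$$\mathcal{S}_N > \chi_p^2(1 - \alpha),$$
where $\chi_p^2(1 - \alpha)$ is the $(1 - \alpha)$ quantile of the $\chi^2$ distribution with $p$ degrees of freedom.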
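The quadratic form (1.5.2) can be evaluated with a few lines of plain Python. The sketch below is only an illustration: the data are made up, $Q_N$ is assumed regular, the scores are $\varphi(i/(N+1))$ with $\varphi(u) = u - \frac{1}{2}$, and the names `solve` and `rank_quadratic_form` are hypothetical helpers. For $p = 2$ the $\chi_2^2$ quantile has the closed form $-2 \ln \alpha$, which is used for the critical value.

```python
from math import log


def ranks(values):
    """Ranks 1..n of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def solve(M, v):
    """Solve M z = v by Gauss-Jordan elimination with partial pivoting (M regular)."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rank_quadratic_form(X, y, phi):
    """Criterion (1.5.2): A_N^{-2} S_N' Q_N^{-1} S_N with scores phi(i/(N+1))."""
    N, p = len(X), len(X[0])
    a = [phi(i / (N + 1)) for i in range(1, N + 1)]
    a_bar = sum(a) / N
    A2 = sum((ai - a_bar) ** 2 for ai in a) / (N - 1)
    x_bar = [sum(row[j] for row in X) / N for j in range(p)]
    Xc = [[row[j] - x_bar[j] for j in range(p)] for row in X]
    R = ranks(y)
    S = [sum(Xc[i][j] * a[R[i] - 1] for i in range(N)) for j in range(p)]
    Q = [[sum(Xc[i][j] * Xc[i][k] for i in range(N)) for k in range(p)]
         for j in range(p)]
    z = solve(Q, S)  # z = Q_N^{-1} S_N
    return sum(S[j] * z[j] for j in range(p)) / A2


# Hypothetical data with p = 2 regressors.
X = [[0.5, 1.0], [1.5, 0.2], [2.0, 2.2], [3.1, 0.7],
     [3.9, 1.9], [5.2, 0.1], [6.0, 2.5], [7.3, 1.1]]
y = [1.0, 0.4, 2.9, 1.8, 3.6, 2.2, 5.1, 4.0]
stat = rank_quadratic_form(X, y, lambda u: u - 0.5)
alpha = 0.05
critical = -2 * log(alpha)  # (1 - alpha) quantile of chi-square with 2 d.f.
print(stat, critical, stat > critical)
```

The statistic is invariant to rescaling a column of the regression matrix, which gives a quick sanity check of an implementation.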
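Sketch of the proof. It suffices to show that under $H_0^{(1)}$ the asymptotic distribution of $S_N$ is $p$-dimensional normal with expectation equal to 0 and dispersion matrix $A_N^2 Q_N$. Then the quadratic form $\mathcal{S}_N$ will have asymptotically the $\chi^2(p)$ distribution. To prove the asymptotic normality of $S_N$, we must prove that, for any vector $\lambda \in \mathbb{R}^p$, $\lambda \ne 0$, the scalar product $\lambda^\top S_N$ has asymptotically the normal distribution $N(0, A_N^2\, \lambda^\top Q_N \lambda)$. But
$$\lambda^\top S_N = \sum_{i=1}^{N} [\lambda^\top (x_i - \bar{x}_N)]\, a_N(R_{Ni})$$
and its coefficients $\lambda^\top (x_i - \bar{x}_N)$ satisfy the Noether condition (1.4.6), because
$$\max_{1 \le i \le N} \frac{[\lambda^\top (x_i - \bar{x}_N)]^2}{\sum_{j=1}^{N} [\lambda^\top (x_j - \bar{x}_N)]^2} = \max_{1 \le i \le N} \frac{\lambda^\top (x_i - \bar{x}_N)(x_i - \bar{x}_N)^\top \lambda}{\lambda^\top Q_N \lambda} \le \max_{1 \le i \le N} (x_i - \bar{x}_N)^\top Q_N^{-1} (x_i - \bar{x}_N) \to 0,$$
where the inequality follows from the Cauchy-Schwarz inequality applied to the vectors $Q_N^{1/2} \lambda$ and $Q_N^{-1/2} (x_i - \bar{x}_N)$. Moreover, we can show by some arithmetic that the quantities
$$\delta_{N,ij}(\lambda) = \frac{\lambda^\top (x_i - \bar{x}_N)(a_N(j) - \bar{a}_N)}{\left[ N^{-1} \sum_{i=1}^{N} [\lambda^\top (x_i - \bar{x}_N)]^2 \sum_{j=1}^{N} (a_N(j) - \bar{a}_N)^2 \right]^{1/2}}$$
satisfy the Lindeberg condition (1.4.7). Then the asymptotic normality of the scalar product follows from Theorem 1.4.1 for every $\lambda \ne 0$. $\Box$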
1.5.2 Rank tests for $H_0^{(2)}$
Consider again the model $Y_i = \beta_0 + x_i^\top \beta + e_i$, $i = 1, \dots, N$, and assume that the errors $e_i$ have a symmetric distribution function, $F(x) + F(-x) = 1$ $\forall\, x$. Let $R_{N1}^+, \dots, R_{NN}^+$ be the ranks of $|Y_1|, \dots, |Y_N|$. Choose a score-generating function $\varphi^* : (0, 1) \mapsto [0, \infty)$ and the scores $a_N^*(1), \dots, a_N^*(N)$ generated by $\varphi^*$. Put $x_{i0} = 1$, $i = 1, \dots, N$, and for $j = 0, 1, \dots, p$ consider the signed-rank statistics
$$S_{N,j}^+ = \sum_{i=1}^{N} x_{ij}\, \operatorname{sign} Y_i\; a_N^*(R_{Ni}^+)$$
and the vector
$$S_N^+ = (S_{N,0}^+, S_{N,1}^+, \dots, S_{N,p}^+)^\top.$$
Then, under $H_0^{(2)}$,
$$\mathbb{E}\left( S_N^+ \mid H_0^{(2)} \right) = 0 \quad \text{and} \quad \mathbb{E}\left( S_N^+ S_N^{+\top} \mid H_0^{(2)} \right) = A_N^2\, Q_N^*,$$
where $A_N^2 = \frac{1}{N} \sum_{i=1}^{N} [a_N^*(i)]^2$ and
$$Q_N^* = \sum_{i=1}^{N} x_i^* x_i^{*\top} = \left[ \sum_{i=1}^{N} x_{ij} x_{ij'} \right]_{j, j' = 0, 1, \dots, p}$$
with $x_i^* = (x_{i0}, x_{i1}, \dots, x_{ip})^\top$.
The test criterion will be the quadratic form
$$\mathcal{S}_N^+ = A_N^{-2}\, S_N^{+\top} (Q_N^*)^{-1} S_N^+.$$
The distribution of $S_N^+$ (and hence of $\mathcal{S}_N^+$) is generated by the $N!\, 2^N$ equally probable realizations of $(\operatorname{sign} Y_1, \dots, \operatorname{sign} Y_N)$ and $(R_{N1}^+, \dots, R_{NN}^+)$.
The asymptotic distribution of $\mathcal{S}_N^+$ under $H_0^{(2)}$ will be $\chi^2(p + 1)$, provided
$$\max_{1 \le i \le N} x_i^{*\top} (Q_N^*)^{-1} x_i^* \to 0 \quad \text{as } N \to \infty,$$
$(a_N^*(1), \dots, a_N^*(N))$ satisfy the Noether condition (1.4.6), and the Lindeberg condition (1.4.7) holds for the mixed terms corresponding to $x_i^*$ and $a_N^*(i)$, analogously to the case of the regression line.
1.6 Rank estimation in simple linear regression models
1.6.1 Estimation of the slope $\beta$ of the regression line
Let $Y_1, \dots, Y_N$ be independent random variables, where $Y_i$ has the distribution function
$$F_i(y) = F(y - \beta_0 - \beta(x_i - \bar{x}_N)), \quad i = 1, \dots, N,$$
and $F$ is continuous. We want to estimate the parameter $\beta$ with the aid of ranks. Denote
$$Y_i(b) = Y_i - (x_i - \bar{x}_N)\, b, \quad 1 \le i \le N, \ b \in \mathbb{R}^1.$$
Let $T_N(Y_1, \dots, Y_N)$ be a test statistic for testing $H_0 : \beta = 0$ and assume that under $H_0$ the distribution of $T_N$ is symmetric about $\mu_N$ or that $\mathbb{E}_{H_0} T_N = \mu_N$.
If $T_N(b) = T_N(Y_1(b), \dots, Y_N(b))$ is nonincreasing in $b \in \mathbb{R}^1$, then we can define the estimate of $\beta$ as
$$\hat{\beta}_N = \frac{1}{2} (\hat{\beta}_N^- + \hat{\beta}_N^+), \qquad (1.6.1)$$
$$\hat{\beta}_N^- = \sup\{b : T_N(b) > \mu_N\}, \qquad \hat{\beta}_N^+ = \inf\{b : T_N(b) < \mu_N\}.$$
If $T_N = \sum_{i=1}^{N} (x_i - \bar{x}_N)(Y_i - \bar{Y}_N)$, then $\mu_N = 0$ and $T_N(b)$ is linear in $b$; the estimator is then the least-squares estimator of $\beta$.
Lemma 1.6.1 Let $T_N = S_N = \sum_{i=1}^{N} (x_i - \bar{x}_N)\, a_N(R_{Ni})$, where $a_N(1) \le \dots \le a_N(N)$ (not all equal) and $R_{Ni}$ is the rank of $Y_i$, $i = 1, \dots, N$. Then $S_N(b)$ is nonincreasing in $b$.
Proof. See Puri and Sen (1985).
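Since $S_N(b)$ is a nonincreasing step function of $b$ that can only jump at the pairwise slopes $(Y_j - Y_i)/(x_j - x_i)$, the estimate (1.6.1) can be located by evaluating $S_N$ between consecutive pairwise slopes. The sketch below assumes Wilcoxon scores $i/(N+1) - \frac{1}{2}$ and the centering value 0 (because $\sum_i (x_i - \bar{x}_N) = 0$), and it ignores the values of $S_N$ exactly at the jump points; the data and function names are illustrative.

```python
def ranks(values):
    """Ranks 1..n of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def S_N(b, x, y, a):
    """Linear rank statistic computed from the residuals Y_i - (x_i - x_bar) b."""
    N = len(x)
    x_bar = sum(x) / N
    R = ranks([y[i] - (x[i] - x_bar) * b for i in range(N)])
    return sum((x[i] - x_bar) * a[R[i] - 1] for i in range(N))


def rank_slope_estimate(x, y, eps=1e-12):
    """Estimate (1.6.1) based on S_N with Wilcoxon scores."""
    N = len(x)
    a = [i / (N + 1) - 0.5 for i in range(1, N + 1)]
    # S_N(b) can change only where two residuals cross, i.e. at pairwise slopes.
    slopes = sorted({(y[j] - y[i]) / (x[j] - x[i])
                     for i in range(N) for j in range(i + 1, N) if x[i] != x[j]})
    # One evaluation point inside every interval between consecutive jump points.
    grid = ([slopes[0] - 1.0]
            + [(s + t) / 2 for s, t in zip(slopes, slopes[1:])]
            + [slopes[-1] + 1.0])
    values = [S_N(b, x, y, a) for b in grid]
    k_minus = max(k for k, v in enumerate(values) if v > eps)
    k_plus = min(k for k, v in enumerate(values) if v < -eps)
    beta_minus = slopes[k_minus]     # right end of the last interval with S_N > 0
    beta_plus = slopes[k_plus - 1]   # left end of the first interval with S_N < 0
    return 0.5 * (beta_minus + beta_plus)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.7, 3.4, 3.9, 5.1, 6.4]  # hypothetical observations
beta_hat = rank_slope_estimate(x, y)
print(beta_hat)
```

The number of pairwise slopes grows like $N^2$, so this direct search is meant for small samples only.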
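The following lemma shows that $S_N$ is symmetrically distributed under some conditions.
Lemma 1.6.2 Let either
$$x_i - \bar{x}_N = \bar{x}_N - x_{N-i+1}, \quad i = 1, \dots, N, \qquad (1.6.2)$$
or
$$a_i - \bar{a}_N = \bar{a}_N - a_{N-i+1}, \quad i = 1, \dots, N. \qquad (1.6.3)$$
Then, if $\beta = 0$, the distribution of $S_N$ is symmetric about 0.
Proof. Let (1.6.2) hold. Because $(R_{N1}, \dots, R_{NN})$ has the same distribution as $(R_{NN}, \dots, R_{N1})$, $S_N$ has the same distribution as $S_N' = \sum_{i=1}^{N} (x_i - \bar{x}_N)\, a_N(R_{N,N-i+1}) = -S_N$.
We proceed similarly under (1.6.3). $\Box$
Properties of $\hat{\beta}_N$:
1. $\hat{\beta}_N(Y_1 + x_1 b, \dots, Y_N + x_N b) = \hat{\beta}_N(Y_1, \dots, Y_N) + b \quad \forall\, b \in \mathbb{R}^1$.
2. $\hat{\beta}_N(cY_1, \dots, cY_N) = c\, \hat{\beta}_N(Y_1, \dots, Y_N) \quad \forall\, c > 0$.
3. $\mathbb{P}_\beta(\hat{\beta}_N < a) \le \mathbb{P}_\beta(S_N(a) < \mu_N) \le \mathbb{P}_\beta(S_N(a) \le \mu_N) \le \mathbb{P}_\beta(\hat{\beta}_N \le a)$.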
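Asymptotic normality of $\hat{\beta}_N$:
Theorem 1.6.1 Assume that $\{x_{N1}, \dots, x_{NN}\}$ satisfy the conditions
$$0 < \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} (x_{Ni} - \bar{x}_N)^2 = C_0^2 < \infty, \qquad (1.6.4)$$
$$\max_{1 \le i \le N} \frac{1}{N} (x_{Ni} - \bar{x}_N)^2 \to 0 \quad \text{as } N \to \infty.$$
Let $a_N(i) = \mathbb{E}\, \varphi(U_{N:i})$ or $a_N(i) = \varphi\left( \frac{i}{N+1} \right)$, $i = 1, \dots, N$, where $\varphi$ is nondecreasing on $(0, 1)$ and
$$A_\varphi^2 = \int_0^1 \varphi^2(u)\, du < \infty, \qquad \int_0^1 \varphi(u)\, du = 0.$$
Let $F$ have finite Fisher information, i.e.
$$A_\psi^2 = \int_0^1 \psi^2(u)\, du < \infty, \quad \text{where } \psi(u) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}, \quad 0 < u < 1.$$
Then the sequence $\left\{ N^{1/2} (\hat{\beta}_N - \beta) \right\}_{N=1}^{\infty}$ is asymptotically normally distributed
$$N\left( 0, \frac{A_\varphi^2}{C_0^2\, \gamma^2(\varphi, F)} \right), \qquad \gamma(\varphi, F) = \int_0^1 \varphi(u)\, \psi(u)\, du.$$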
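1.6.2 Estimation in multiple regression model
Let $Y_1, \dots, Y_N$ be independent observations, where $Y_i$ has the distribution function
$$F_i(y) = F(y - \beta_0 - (x_i - \bar{x}_N)^\top \beta), \quad x_i \in \mathbb{R}^p, \ 1 \le i \le N.$$
Consider the (vector) linear rank statistic
$$S_N(b) = \sum_{i=1}^{N} (x_i - \bar{x}_N)\, a_N(R_{Ni}(b)) = (S_{N1}(b), \dots, S_{Np}(b))^\top,$$
where $R_{Ni}(b)$ is the rank of $Y_i - x_i^\top b$, $i = 1, \dots, N$, and the scores are nondecreasing. Obviously $\mathbb{E}_\beta S_N(\beta) = 0$. Define
$$D_N = \left\{ b : \|S_N(b)\| = \min, \ b \in \mathbb{R}^p \right\},$$
where $\|\cdot\|$ is either the $L_1$ or the $L_2$ norm. If $D_N$ is a convex set, then we can define the center of gravity of $D_N$ as an estimator $\hat{\beta}_N$ of $\beta$.
Assume that $x_{Ni}$ satisfy the (Noether) condition
$$\max_{1 \le i \le N} (x_{Ni} - \bar{x}_N)^\top Q_N^{-1} (x_{Ni} - \bar{x}_N) \to 0 \quad \text{as } N \to \infty,$$
where $Q_N = \sum_{i=1}^{N} (x_{Ni} - \bar{x}_N)(x_{Ni} - \bar{x}_N)^\top$. If $F$ has finite Fisher information, then $N^{1/2} (\hat{\beta}_N - \beta)$ is asymptotically normally distributed
$$N_p\left( 0, \frac{A_\varphi^2}{\gamma^2(\varphi, F)} \left( \frac{1}{N} Q_N \right)^{-1} \right).$$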
1.7 Aligned rank tests about the intercept
1.7.1 Regression line
Let $Y_1, \dots, Y_N$ be independent, where $Y_i$ has the distribution function
$$F_i(y) = \mathbb{P}(Y_i \le y) = F(y - \beta_0 - \beta(x_i - \bar{x}_N)), \quad 1 \le i \le N, \ y \in \mathbb{R}.$$
Consider the hypothesis
$$H_0 : \beta_0 = 0 \quad \text{versus} \quad K^+ : \beta_0 > 0 \quad \text{or} \quad K : \beta_0 \ne 0,$$
where $\beta$ is treated as a nuisance parameter. If $\beta \ne 0$, then $Y_1, \dots, Y_N$ are not identically distributed, and we cannot use their ranks. If we have an estimate $\hat{\beta}_N$ of $\beta$, we can consider the ranks of the absolute residuals $|Y_i - (x_i - \bar{x}_N)\hat{\beta}_N|$, $i = 1, \dots, N$ (aligned ranks), and an (aligned) signed rank statistic based on them. Under some conditions, such a statistic is asymptotically distribution-free, i.e. under the hypothesis $H_0 : \beta_0 = 0$ its asymptotic distribution does not depend on $F$.
Let $\hat{\beta}_N$ be the rank estimate (1.6.1) based on the linear rank statistic
$$\sum_{i=1}^{N} (x_i - \bar{x}_N)\, a_N(R_{Ni}(b)), \quad b \in \mathbb{R}^1.$$
Consider the residuals $\hat{Y}_i = Y_i - (x_i - \bar{x}_N)\hat{\beta}_N$, $i = 1, \dots, N$, and the aligned signed rank statistic
$$\hat{S}_N = \sum_{i=1}^{N} \operatorname{sign} \hat{Y}_i\; a_N^*(\hat{R}_{Ni}^+),$$
where $\hat{R}_{Ni}^+$ is the rank of $|Y_i - (x_i - \bar{x}_N)\hat{\beta}_N|$, $i = 1, \dots, N$. The test criterion for $H_0$ will be
$$T_N = \frac{N^{-1/2}\, \hat{S}_N}{A_N^*}, \qquad (A_N^*)^2 = \frac{1}{N} \sum_{i=1}^{N} (a_N^*(i))^2.$$
We reject $H_0$ in favor of $K^+$ if $T_N > k_\alpha^+$, and reject $H_0$ in favor of $K$ if $|T_N| > k_\alpha$. The critical values $k_\alpha^+$ and $k_\alpha$ are determined from the asymptotic normal distribution of $T_N$.
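Theorem 1.7.1 Assume that
(i) $F$ is symmetric about 0 and has an absolutely continuous density $f$ and finite and positive Fisher information, $0 < I(f) = \int \left( \frac{f'(z)}{f(z)} \right)^2 dF(z) < \infty$.
(ii) $\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x}_N)^2 \to C^2$, $0 < C < \infty$, and $\frac{1}{N} \left[ \max_{1 \le i \le N} (x_i - \bar{x}_N)^2 \right] \to 0$ as $N \to \infty$.
(iii) $\varphi(t)$ is nondecreasing, $\varphi(1 - t) = -\varphi(t)$, $t \in (0, 1)$, and $0 < A^2(\varphi) = \int_0^1 \varphi^2(t)\, dt < \infty$. Put $\varphi^*(u) = \varphi\left( \frac{u+1}{2} \right)$, $0 < u < 1$, and $a_N^*(i) = \mathbb{E}\, \varphi^*(U_{N:i})$ or $a_N^*(i) = \varphi^*\left( \frac{i}{N+1} \right)$, $i = 1, \dots, N$.
Then, under $H_0 : \beta_0 = 0$, the criterion $T_N$ has asymptotically normal distribution with mean 0 and variance 1.
Sketch of the proof. Because $\lim_{N \to \infty} (A_N^*)^2 = A^2(\varphi)$ and $N^{1/2} (\hat{\beta}_N - \beta) = O_p(1)$, it can be proved (though not by elementary means) that under $H_0$
$$N^{-1/2} [\hat{S}_N - S_N(\beta)] \stackrel{p}{\to} 0 \quad \text{as } N \to \infty, \qquad (1.7.5)$$
where
$$S_N(\beta) = \sum_{i=1}^{N} \operatorname{sign}(Y_i(\beta))\; a_N^*(R_{Ni}^+(\beta)),$$
$Y_i(\beta) = Y_i - (x_i - \bar{x}_N)\beta$ and $R_{Ni}^+(\beta)$ is the rank of $|Y_i(\beta)| = |Y_i - (x_i - \bar{x}_N)\beta|$, $1 \le i \le N$. Under $H_0$, the variables $Y_i(\beta) = Y_i - (x_i - \bar{x}_N)\beta$ are independent and identically distributed with d.f. $F$ symmetric about 0. It was shown earlier that
$$N^{-1/2}\, S_N(\beta) \stackrel{d}{\to} N(0, A^2(\varphi)),$$
hence, by (1.7.5), also $N^{-1/2}\, \hat{S}_N \stackrel{d}{\to} N(0, A^2(\varphi))$. $\Box$
Remark 1.7.1 We reject $H_0$ in favor of $K^+$ on the asymptotic significance level $\alpha$ provided $T_N \ge \Phi^{-1}(1 - \alpha)$, and we reject $H_0$ in favor of $K$ provided $|T_N| \ge \Phi^{-1}\left( 1 - \frac{\alpha}{2} \right)$.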
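The criterion $T_N$ and the decision rules of Remark 1.7.1 can be sketched as follows. The example assumes that an estimate of the slope is already available and is passed in as `beta_hat` (a hypothetical argument), and it uses $\varphi(t) = 2t - 1$, so that $\varphi^*(u) = u$ and $a_N^*(i) = i/(N+1)$; the function names and the data are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def aligned_signed_rank_T(x, y, beta_hat):
    """Aligned signed-rank criterion T_N with scores a*_N(i) = i/(N+1)."""
    N = len(x)
    x_bar = sum(x) / N
    # Aligned residuals: remove the estimated slope effect.
    resid = [y[i] - (x[i] - x_bar) * beta_hat for i in range(N)]
    order = sorted(range(N), key=lambda i: abs(resid[i]))
    R_plus = [0] * N
    for position, i in enumerate(order, start=1):
        R_plus[i] = position  # rank of |residual| (no ties assumed)
    a_star = [i / (N + 1) for i in range(1, N + 1)]
    S_hat = sum((1 if resid[i] > 0 else -1) * a_star[R_plus[i] - 1]
                for i in range(N))
    A_star = sqrt(sum(s * s for s in a_star) / N)
    return S_hat / (sqrt(N) * A_star)


def reject_one_sided(T, alpha):
    """Reject H0 in favor of K+ when T_N >= Phi^{-1}(1 - alpha)."""
    return T >= NormalDist().inv_cdf(1 - alpha)


def reject_two_sided(T, alpha):
    """Reject H0 in favor of K when |T_N| >= Phi^{-1}(1 - alpha/2)."""
    return abs(T) >= NormalDist().inv_cdf(1 - alpha / 2)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0]  # hypothetical observations
T = aligned_signed_rank_T(x, y, beta_hat=0.0)
print(T, reject_one_sided(T, 0.05), reject_two_sided(T, 0.05))
```

With such a small $N$ the normal approximation is of course only illustrative; the theorem concerns the limit $N \to \infty$.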
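Powers of the tests against local alternatives:
The tests are consistent in the sense that their powers tend to 1 as $\beta_0 \to \infty$ (or $|\beta_0| \to \infty$). More important, however, is the power against alternatives close to the hypothesis, namely
$$K_{1N} : \beta_0 = N^{-1/2} \Delta, \quad \Delta \ne 0 \text{ fixed}.$$
Such an alternative is contiguous in the sense of LeCam/Hájek, and it can be shown that the approximation (1.7.5) holds not only under the hypothesis, but also under $K_{1N}$. Hence, $N^{-1/2}\, \hat{S}_N$ has the same asymptotic distribution as $N^{-1/2}\, S_N(\beta)$ also under $K_{1N}$.
Denote $u_\alpha = \Phi^{-1}(1 - \alpha)$, $0 < \alpha < 1$. The asymptotic power of the one-sided aligned rank test is
$$\mathbb{P}\{T_N \ge u_\alpha \mid K_{1N}\} \to 1 - \Phi\left( u_\alpha - \frac{\Delta}{A(\varphi)} \int_0^1 \varphi(u)\, \varphi(u, f)\, du \right),$$
where $\varphi(u, f) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}$, $0 < u < 1$.
Comparison: Classical test of $H_0$
The least-squares estimator of $\beta_0$ is
$$\tilde{\beta}_{0N} = \bar{Y}_N = \frac{1}{N} \sum_{i=1}^{N} Y_i$$
and the likelihood ratio statistic is
$$L_N = \frac{\sqrt{N}\, \bar{Y}_N}{s_N}, \quad \text{where}$$
$$s_N^2 = \frac{1}{N-2} \sum_{i=1}^{N} [Y_i - \bar{Y}_N - (x_i - \bar{x}_N)\tilde{\beta}_N]^2,$$
$$\tilde{\beta}_N = \frac{\sum_{i=1}^{N} (x_i - \bar{x}_N)(Y_i - \bar{Y}_N)}{\sum_{i=1}^{N} (x_i - \bar{x}_N)^2}.$$
If $\sigma^2 = \int z^2\, dF(z) < \infty$, then
$$s_N^2 \stackrel{p}{\to} \sigma^2, \qquad \bar{Y}_N \stackrel{p}{\to} \beta_0, \qquad \tilde{\beta}_N \stackrel{p}{\to} \beta \quad \text{as } N \to \infty.$$
Under $H_0 : \beta_0 = 0$, the statistic $L_N$ is asymptotically $N(0, 1)$. The asymptotic relative efficiency of the aligned signed rank test with respect to the likelihood ratio test is
$$\sigma^2\, \frac{\left( \int_0^1 \varphi(u)\, \varphi(u, f)\, du \right)^2}{\int_0^1 \varphi^2(u)\, du} \le \sigma^2 I(f).$$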
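1.7.2 Multiple regression model
Let $Y_1, \dots, Y_N$ be independent with distribution functions $F_1, \dots, F_N$ such that
$$F_i(y) = \mathbb{P}(Y_i \le y) = F(y - \beta_0 - (x_i - \bar{x}_N)^\top \beta), \quad 1 \le i \le N, \ y \in \mathbb{R}^1, \ \beta \in \mathbb{R}^p.$$
We want to test the hypothesis
$$H_1 : \beta_0 = 0 \quad \text{versus} \quad K_1^+ : \beta_0 > 0 \quad \text{or} \quad K_1 : \beta_0 \ne 0,$$
where $\beta$ is unspecified. We may also partition $\beta$ as
$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},$$
where $\beta_1 \in \mathbb{R}^{p_1}$, $\beta_2 \in \mathbb{R}^{p_2}$, $p_1 + p_2 = p$, and test the hypothesis
$$H_2 : \beta_2 = 0 \quad \text{versus} \quad \beta_2 \ne 0,$$
where $\beta_0, \beta_1$ are unspecified.
Test of $H_1$
Let $\hat{\beta}_N$ be an estimator of $\beta$. Consider the residuals $\hat{Y}_i = Y_i - x_i^\top \hat{\beta}_N$, $i = 1, \dots, N$, and the (aligned) ranks $\hat{R}_{N1}^+, \dots, \hat{R}_{NN}^+$ of $|\hat{Y}_i|$, $i = 1, \dots, N$. As in the case of the regression line, the test is based on the aligned signed rank statistic
$$\hat{S}_N = \sum_{i=1}^{N} \operatorname{sign}(\hat{Y}_i)\; a_N^*(\hat{R}_{Ni}^+)$$
and the test criterion is
$$T_N^2 = \frac{\hat{S}_N^2}{N (A_N^*)^2}, \qquad (A_N^*)^2 = \frac{1}{N} \sum_{i=1}^{N} (a_N^*(i))^2.$$
Under $H_1$, $T_N^2$ has asymptotically the $\chi^2$ distribution with 1 d.f.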
References
J. Hájek and Z. Šidák (1967): Theory of Rank Tests. Academia, Prague & Academic
Press, New York.
J. Hájek, Z. Šidák and P. K. Sen (2000): Theory of Rank Tests (2nd edition). Academic
Press, New York.
M. L. Puri and P. K. Sen (1985): Nonparametric Methods in General Linear Models. J.
Wiley, New York.