STATISTICAL TESTS BASED ON RANKS

Jana Jurečková, Charles University, Prague

1. Parametric and nonparametric models

Example 1. Model of measurement. Let $X = (X_1, \dots, X_n)$ be measurements of some physical entity $\theta$. If we admit random fluctuations, then we consider the model
$$X_i = \theta + e_i, \quad i = 1, \dots, n.$$
What can we assume about the vector of errors $e_1, \dots, e_n$?

(1) The distribution of the vector $(e_1, \dots, e_n)$ is independent of $\theta$.
(2) Moreover, $e_1, \dots, e_n$ are independent.
(3) Moreover, $e_1, \dots, e_n$ are identically distributed.
(4) Moreover, the distribution of $e_1$ has a density, symmetric about 0.
(5) Moreover, the distribution of $e_1$ is normal $N(0, \sigma^2)$ with unknown $\sigma$.
(6) Moreover, $\sigma$ is even known.

If we assume (1)-(5), then $\bar X = \frac{1}{n} \sum_{i=1}^n X_i$ is an efficient estimator of $\theta$. But often we are not sure of the normal distribution, and even assumption (3) may be unrealistic.

Example 2. Comparison of two treatments. Let $X_1, \dots, X_m$ be the blood pressures of $m$ patients after an application of some medicament and $Y_1, \dots, Y_n$ the blood pressures of a control group, which received a placebo. Let $F$ and $G$ be the respective distribution functions of $X$ and $Y$. We wish to test the hypothesis $H: F \equiv G$ (no effect). But the test depends on the alternative under consideration:

(1) $F$ and $G$ are absolutely continuous, otherwise unknown, and the medicament reduces the blood pressure, i.e.,
$$K_1: G(z) \le F(z) \;\;\forall z, \qquad G(z_0) < F(z_0) \text{ for some } z_0$$
($Y$ is stochastically larger than $X$).
(2) Moreover, $K_2: G(z) = F(z - \Delta)$ $\forall z$ with some $\Delta > 0$ (the alternative of shift in location).
(3) $F \sim N(\mu_1, \sigma_1^2)$, $G \sim N(\mu_2, \sigma_2^2)$, where $\mu_1, \mu_2, \sigma_1, \sigma_2$ are unknown, and $K_3: \mu_1 < \mu_2$, where generally $\sigma_1 \ne \sigma_2$.
(4) $F \sim N(\mu_1, \sigma^2)$, $G \sim N(\mu_2, \sigma^2)$, where $\mu_1, \mu_2, \sigma$ are unknown, and $K_4: \mu_1 < \mu_2$.

We would use the t-test against $K_4$; testing $H$ against $K_3$ is known as the Behrens-Fisher problem. We would use rank tests for $H$ against $K_1$ and $K_2$.

2. Practical problems which we can solve with the aid of rank tests or tests based on generalized ranks

(1) Two-sample tests of equality of two treatment effects against alternatives of shift in location or scale (Wilcoxon, van der Waerden and median rank tests; Siegel-Tukey and quartile rank tests). Some of these tests we shall describe later in detail.
(2) Two-sample tests of equality of two treatment effects against general one-sided or two-sided alternatives (Kolmogorov-Smirnov tests).
(3) Tests of equality of effects of several treatments (Kruskal-Wallis rank test).
(4) Tests of equality of effects of several treatments on observations organized in blocks (Friedman rank test).
(5) Tests of equality of effects of several treatments on observations categorized in contingency tables (Kruskal-Wallis test with midranks).
(6) Tests of equality of effects of two treatments based on paired observations (signed-rank tests: one-sample Wilcoxon, van der Waerden, sign test).
(7) Tests of independence in a bivariate population (Spearman rank correlation coefficient, Kendall's tau, quadrant test). For most of these cases there exists also a permutation test, based on the order statistics.
(8) Tests of the hypothesis $H: \beta = 0$, or more generally $H: A\beta = b$, in the linear regression model $Y = X\beta + e$.
(9) Tests of hypotheses on some components of $\beta$ in the linear regression model, with the other components nuisance, without the necessity to estimate the nuisance parameters (tests based on so-called regression rank scores).
(10) Tests on the parameters of the linear autoregressive time series model.
The nuisance parameters are either estimated (aligned rank tests) or the tests are based on the autoregression rank scores; in particular, tests on the order of the autoregression.
(11) Tests of independence of two autoregressive time series (based on autoregression rank scores), often desired in practice, though until recently there were no reasonable tests.

3. Nonparametric hypotheses and tests

Let $X = (X_1, \dots, X_n)$ be the vector of observations. The hypothesis $H$ and the alternative $K$ are two disjoint sets of probability distributions of $X$. The hypothesis usually states homogeneity, symmetry or independence, while the alternative means inhomogeneity, asymmetry, dependence, etc.

Every rule which assigns just one of the decisions "accept $H$" or "reject $H$" to any point $x = (x_1, \dots, x_n)$ is called a (nonrandomized) test of the hypothesis $H$ against the alternative $K$. Such a test partitions the sample space $\mathcal X$ into two complementary parts: the critical region (rejection region) $A_K$ and the acceptance region $A_H$. The test rejects $H$ if $x \in A_K$ and accepts $H$ if $x \in A_H$.

To simplify the structure of the tests, we supplement the family of tests by randomized tests. A randomized test rejects $H$ with probability $\Phi(x)$ and accepts it with probability $1 - \Phi(x)$ while observing $x$, where $0 \le \Phi(x) \le 1$ $\forall x$ is the test function. The set of randomized tests coincides with the set $\{\Phi(x): 0 \le \Phi \le 1\}$ and hence it is convex.

If we make the test on the basis of observations $x$, then either our decision is correct or we make one of the following two kinds of errors: (1) we reject $H$ even if it is correct (error of the first kind); (2) we accept $H$ even if it is incorrect (error of the second kind).

If $X$ has distribution $P$, then the test rejects $H$ with the probability
$$\beta(P) = \mathbb{E}_P(\Phi(X)) = \int_{\mathcal X} \Phi(x)\,dP(x).$$
The probability $\beta(Q) = \mathbb{E}_Q(\Phi(X))$, $Q \in K$, is called the power of the test against the alternative $Q$, and the function $\beta(Q): K \to [0,1]$ is called the power function of the test. The desirable test maximizes the power function uniformly over the whole alternative $K$ and has a small probability (smaller than a prescribed $\alpha$) of the error of the first kind for all distributions from the hypothesis $H$.

The criterion of optimality for tests: select a small number $\alpha$, $0 < \alpha < 1$, called the significance level, and among all tests satisfying $\beta(P) \le \alpha$ $\forall P \in H$ look for the test satisfying $\beta(Q) = \max$ $\forall Q \in K$. Such a test, if it exists, is called the uniformly most powerful test of size $\alpha$, briefly the uniformly most powerful $\alpha$-test of $H$ against $K$.

A simple hypothesis [alternative] means that $H$ [$K$] is a one-point set (otherwise it is called composite). The test of a simple hypothesis against a simple alternative is given by the fundamental Neyman-Pearson lemma.

Neyman-Pearson Lemma. Let $P$ and $Q$ be two probability distributions with densities $p$ and $q$ with respect to some measure $\mu$ (e.g., $\mu = P + Q$). Then, for testing the simple hypothesis $H: \{P\}$ against the simple alternative $K: \{Q\}$, there exist a test $\Phi$ and a constant $k$ such that
$$\mathbb{E}_P(\Phi(X)) = \alpha \tag{1}$$
and
$$\Phi(x) = \begin{cases} 1 & \text{if } q(x) > k\,p(x) \\ 0 & \text{if } q(x) < k\,p(x). \end{cases} \tag{2}$$
This test is the most powerful $\alpha$-test of $H$ against $K$.
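To make the lemma concrete, here is a minimal numerical sketch (ours, not part of the notes) for the simple hypothesis $P = N(0,1)$ against $Q = N(1,1)$ based on $n$ i.i.d. observations; the likelihood ratio is increasing in $\sum x_i$, so the most powerful $\alpha$-test rejects for large $\sum x_i$:

```python
import numpy as np
from scipy.stats import norm

# Neyman-Pearson test of H: N(0,1) against K: N(1,1) from n i.i.d. observations.
# The likelihood ratio q(x)/p(x) = exp(sum(x) - n/2) is increasing in sum(x),
# so (2) reduces to: reject when sum(x) > c, with c fixed by (1):
# P_H(sum(X) > c) = alpha, i.e. c = sqrt(n) * z_{1-alpha}.
n, alpha = 25, 0.05
c = np.sqrt(n) * norm.ppf(1 - alpha)
power = 1 - norm.cdf((c - n) / np.sqrt(n))  # P_K(sum(X) > c), sum(X) ~ N(n, n)
print(f"c = {c:.3f}, power = {power:.3f}")
```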
4. Invariant tests

Let $g$ be a 1:1 transformation of $\mathcal X$ onto $\mathcal X$. We say that the problem of testing $H$ against $K$ is invariant with respect to $g$ if $g$ retains both $H$ and $K$, i.e.,
$$X \text{ satisfies } H \iff gX \text{ satisfies } H, \qquad X \text{ satisfies } K \iff gX \text{ satisfies } K.$$
If the problem of testing $H$ against $K$ is invariant with respect to a group $\mathcal G$ of transformations of $\mathcal X$ onto $\mathcal X$, then we naturally consider only the invariant tests, which satisfy
$$\Phi(gx) = \Phi(x) \quad \forall x \in \mathcal X, \; g \in \mathcal G.$$
We shall then look for the most powerful invariant $\alpha$-test. In some cases there exists a statistic $T(X)$, called the maximal invariant, such that every invariant test is a function of $T(X)$.

Definition. The statistic $T = T(X)$ is called maximal invariant with respect to the group $\mathcal G$ of transformations provided $T$ is invariant, i.e., $T(gx) = T(x)$ $\forall x \in \mathcal X$, $g \in \mathcal G$, and if $T(x_1) = T(x_2)$, then there exists $g \in \mathcal G$ such that $x_2 = g x_1$.

The test $\Phi$ is invariant with respect to $\mathcal G$ if and only if it is a function of the maximal invariant.

Examples of maximal invariants.

(1) Let $\mathcal G$ be the set of $n!$ permutations of $x_1, \dots, x_n$. Then the vector of ordered components of $x$ (the vector of order statistics) $T(x) = (x_{n:1} \le x_{n:2} \le \dots \le x_{n:n})$ is the maximal invariant with respect to $\mathcal G$.

(2) Let $\mathcal G$ be the set of transformations $x_i' = f(x_i)$, $i = 1, \dots, n$, such that $f: \mathbb{R}^1 \to \mathbb{R}^1$ is a continuous and strictly increasing function. Consider only the points of the sample space $\mathcal X$ with different components. Let $R_i$ be the rank of $x_i$ among $x_1, \dots, x_n$, i.e., $R_i = \sum_{j=1}^n I[x_j \le x_i]$, $i = 1, \dots, n$. Then $T(x) = (R_1, \dots, R_n)$ is the maximal invariant for $\mathcal G$. Actually, a continuous and increasing function does not change the ranks of the components of $x$, i.e., $T$ is invariant to $\mathcal G$. On the other hand, let two different vectors $x$ and $x'$ have the same vector of ranks $R_1, \dots, R_n$. Put $f(x_i) = x_i'$, $i = 1, \dots, n$, and let $f$ be linear on the intervals $[x_{n:1}, x_{n:2}], \dots, [x_{n:n-1}, x_{n:n}]$; define $f$ on the rest of the real line so that it is strictly increasing. Such an $f$ always exists, hence $T$ is the maximal invariant.

5. Properties of ranks and of order statistics

Let $X = (X_1, \dots, X_n)$ be the vector of observations; denote $X_{n:1} \le X_{n:2} \le \dots \le X_{n:n}$ the components of $X$ ordered according to increasing magnitude. The vector $X_{(\cdot)} = (X_{n:1}, \dots, X_{n:n})$ is called the vector of order statistics and $X_{n:i}$ is called the $i$-th order statistic. Assume that the components of $X$ are different and define the rank of $X_i$ as $R_i = \sum_{j=1}^n I[X_j \le X_i]$. Then the vector $R$ of ranks of $X$ takes on values in the set $\mathcal R$ of $n!$ permutations $(r_1, \dots, r_n)$ of $(1, \dots, n)$.

The distribution of $X_{(\cdot)}$ and of $R$: if $X$ has density $p(x_1, \dots, x_n)$, then the vector $X_{(\cdot)}$ of order statistics has the distribution with the density
$$\bar p(x_{n:1}, \dots, x_{n:n}) = \begin{cases} \sum_{r \in \mathcal R} p(x_{n:r_1}, \dots, x_{n:r_n}) & x_{n:1} \le \dots \le x_{n:n} \\ 0 & \text{otherwise.} \end{cases}$$

We say that the random vector $X$ satisfies the hypothesis of randomness $H_0$ if it has a probability distribution with density of the form $p(x) = \prod_{i=1}^n f(x_i)$, $x \in \mathbb{R}^n$, where $f$ is an arbitrary one-dimensional density. Otherwise speaking, $X$ satisfies the hypothesis of randomness provided its components are a random sample from an absolutely continuous distribution.

If $X$ satisfies the hypothesis of randomness $H_0$, then $X_{(\cdot)}$ and $R$ are independent, the vector of ranks $R$ has the uniform discrete distribution
$$\Pr(R = r) = \frac{1}{n!}, \quad r \in \mathcal R, \tag{3}$$
and the distribution of $X_{(\cdot)}$ has the density
$$\bar p(x_{n:1}, \dots, x_{n:n}) = \begin{cases} n!\,p(x_{n:1}, \dots, x_{n:n}) & x_{n:1} \le \dots \le x_{n:n} \\ 0 & \text{otherwise.} \end{cases}$$

Marginal distributions of the random vectors $R$ and $X_{(\cdot)}$ under $H_0$:
(i) $\Pr(R_i = j) = \frac{1}{n}$, $i, j = 1, \dots, n$.
(ii) $\Pr(R_i = k, R_j = m) = \frac{1}{n(n-1)}$ for $1 \le i, j, k, m \le n$, $i \ne j$, $k \ne m$.
(iii) $\mathbb{E} R_i = \frac{n+1}{2}$, $i = 1, \dots, n$.
(iv) $\operatorname{var} R_i = \frac{n^2-1}{12}$, $i = 1, \dots, n$.
(v) $\operatorname{cov}(R_i, R_j) = -\frac{n+1}{12}$, $1 \le i, j \le n$, $i \ne j$.
(vi) If $X$ has density $p(x_1, \dots, x_n) = \prod_{i=1}^n f(x_i)$, then $X_{n:k}$ has the distribution with density
$$f_{(n)}(x) = n \binom{n-1}{k-1} (F(x))^{k-1} (1 - F(x))^{n-k} f(x), \quad x \in \mathbb{R}^1,$$
where $F(x)$ is the distribution function of $X_1, \dots, X_n$.
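The moment formulas (iii)-(v) are easy to confirm numerically; the following sketch (ours, not part of the notes) estimates them by simulation under the hypothesis of randomness:

```python
import numpy as np

# Empirical check of E R_i = (n+1)/2, var R_i = (n^2-1)/12 and
# cov(R_i, R_j) = -(n+1)/12 under the hypothesis of randomness.
rng = np.random.default_rng(0)
n, reps = 10, 100_000
ranks = np.empty((reps, n))
for k in range(reps):
    x = rng.standard_normal(n)
    ranks[k] = x.argsort().argsort() + 1      # R_i = rank of X_i
print(ranks[:, 0].mean(), (n + 1) / 2)                        # E R_1
print(ranks[:, 0].var(), (n**2 - 1) / 12)                     # var R_1
print(np.cov(ranks[:, 0], ranks[:, 1])[0, 1], -(n + 1) / 12)  # cov(R_1, R_2)
```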
6. Locally most powerful rank tests

We want to test a hypothesis of randomness $H_0$ on the distribution of $X$. A rank test is characterized by a test function $\Phi(R)$. The most powerful rank $\alpha$-test of $H_0$ against a simple alternative $K: \{Q\}$ [that $X$ has the fixed distribution $Q$] follows directly from the Neyman-Pearson Lemma:
$$\Phi(r) = \begin{cases} 1 & n!\,Q(R = r) > k \\ \gamma & n!\,Q(R = r) = k \\ 0 & n!\,Q(R = r) < k, \end{cases} \quad r \in \mathcal R,$$
where $k$ and $\gamma$ are determined so that
$$\#\{r: n!\,Q(R = r) > k\} + \gamma\,\#\{r: n!\,Q(R = r) = k\} = \alpha\,n!, \quad 0 < \gamma < 1.$$

If we want to test against a composite alternative and a uniformly most powerful rank test does not exist, then we look for a rank test most powerful locally in a neighborhood of the hypothesis.

Definition. Let $d(Q)$ be a measure of the distance of the alternative $Q \in K$ from the hypothesis $H$. The $\alpha$-test $\Phi_0$ is called locally most powerful in the class $\mathcal M$ of $\alpha$-tests of $H$ against $K$ if, given any other test $\Phi \in \mathcal M$, there exists $\varepsilon > 0$ such that $\beta_{\Phi_0}(Q) \ge \beta_{\Phi}(Q)$ for every $Q$ satisfying $0 < d(Q) < \varepsilon$.

7. The structure of the locally most powerful rank tests of $H_0$

Let $\mathcal A$ be a class of densities, $\mathcal A = \{g(x, \theta): \theta \in J\}$, such that $J \subset \mathbb{R}^1$ is an open interval containing 0; $g(x, \theta)$ is absolutely continuous in $\theta$ for almost all $x$; for almost all $x$ there exists the limit
$$\dot g(x, 0) = \lim_{\theta \to 0} \frac{1}{\theta}\,[g(x, \theta) - g(x, 0)],$$
and
$$\lim_{\theta \to 0} \int |\dot g(x, \theta)|\,dx = \int |\dot g(x, 0)|\,dx.$$
Consider the alternative $K = \{q_\theta: \theta > 0\}$, where $q_\theta(x_1, \dots, x_n) = \prod_{i=1}^n g(x_i, \theta c_i)$, with $c_1, \dots, c_n$ given numbers. Then the test with the critical region
$$\sum_{i=1}^n c_i a_n(R_i, g) \ge k$$
is the locally most powerful rank test of $H_0$ against $K$ with the significance level $\alpha = P(\sum_{i=1}^n c_i a_n(R_i, g) \ge k)$, where $P$ is any distribution satisfying $H_0$,
$$a_n(i, g) = \mathbb{E}\left[\frac{\dot g(X_{n:i}, 0)}{g(X_{n:i}, 0)}\right], \quad i = 1, \dots, n,$$
and $X_{n:1}, \dots, X_{n:n}$ are the order statistics corresponding to the random sample of size $n$ from the population with the density $g(x, 0)$.

8. Special cases

I. Alternative of shift in location: $K_1: \{q_\Delta: \Delta > 0\}$, where
$$q_\Delta(x_1, \dots, x_N) = \prod_{i=1}^m f(x_i) \prod_{i=m+1}^N f(x_i - \Delta),$$
with $f$ a fixed absolutely continuous density such that $\int_{-\infty}^{\infty} |f'(x)|\,dx < \infty$. Then the locally most powerful rank $\alpha$-test of $H_0$ against $K_1$ has the critical region
$$\sum_{i=m+1}^N a_N(R_i, f) \ge k_\alpha,$$
where $k_\alpha$ satisfies the condition $P(\sum_{i=m+1}^N a_N(R_i, f) \ge k_\alpha) = \alpha$, $P \in H_0$, and
$$a_N(i, f) = \mathbb{E}\left[-\frac{f'(X_{N:i})}{f(X_{N:i})}\right], \quad i = 1, \dots, N,$$
where $X_{N:1} < \dots < X_{N:N}$ are the order statistics corresponding to the sample of size $N$ from the distribution with the density $f$. The scores may also be written as
$$a_N(i, f) = \mathbb{E}\,\varphi(U_{N:i}, f), \quad i = 1, \dots, N, \qquad \varphi(u, f) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}, \quad 0 < u < 1,$$
where $U_{N:1}, \dots, U_{N:N}$ are the order statistics corresponding to the sample of size $N$ from the uniform $R(0,1)$ distribution. The scores can also be expressed in the form
$$a_N(i, f) = -N \binom{N-1}{i-1} \int f'(x)\,F^{i-1}(x)\,(1 - F(x))^{N-i}\,dx.$$

Remark. The computation of the scores is difficult for some densities; if there are no tables of the scores at disposal, they are often replaced by the approximate scores
$$a_N^*(i, f) = \varphi\!\left(\frac{i}{N+1}, f\right) = \varphi(\mathbb{E} U_{N:i}, f), \quad i = 1, \dots, N.$$
The asymptotic critical values coincide for both types of scores.
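The two most common score functions can be evaluated directly. The following sketch (ours) computes the approximate scores $a_N^*(i, f) = \varphi(i/(N+1), f)$ of the Remark for the logistic density, where $\varphi(u, f) = 2u - 1$, and for the normal density, where $\varphi(u, f) = \Phi^{-1}(u)$:

```python
import numpy as np
from scipy.stats import norm

# Approximate scores a*_N(i, f) = phi(i/(N+1), f) from the Remark above.
N = 10
u = np.arange(1, N + 1) / (N + 1)
wilcoxon = 2 * u - 1            # logistic f: phi(u, f) = 2u - 1 (Wilcoxon test)
van_der_waerden = norm.ppf(u)   # normal f: phi(u, f) = Phi^{-1}(u) (van der Waerden)
print(np.round(wilcoxon, 3))
print(np.round(van_der_waerden, 3))
```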
II. Alternative of two samples differing in scale: $K_2: \{q_\theta: \theta > 0\}$, where
$$q_\theta(x_1, \dots, x_N) = \prod_{i=1}^m f(x_i - \mu) \prod_{i=m+1}^N e^{-\theta} f\big((x_i - \mu)e^{-\theta}\big), \quad \theta > 0,$$
where the density $f$ satisfies $\int_{-\infty}^{\infty} |x f'(x)|\,dx < \infty$ and $\mu$ is a nuisance parameter. Then the locally most powerful test has the critical region
$$\sum_{i=m+1}^N a_N^1(R_i, f) \ge k,$$
where $k$ is determined by the condition $P(\sum_{i=m+1}^N a_N^1(R_i, f) \ge k) = \alpha$, $P \in H_0$, and the scores have the form
$$a_N^1(i, f) = \mathbb{E}\left[-1 - X_{N:i}\,\frac{f'(X_{N:i})}{f(X_{N:i})}\right] = \mathbb{E}\,\varphi_1(U_{N:i}, f), \quad i = 1, \dots, N,$$
where
$$\varphi_1(u, f) = -1 - F^{-1}(u)\,\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}, \quad 0 < u < 1.$$
In this case, too, we can replace the scores by the approximate scores $a_N^{1*}(i, f) = \varphi_1\!\left(\frac{i}{N+1}, f\right)$, $i = 1, \dots, N$.

III. Alternative of simple regression: $K_3 = \{q_\theta: \theta > 0\}$, where $q_\theta(x_1, \dots, x_N) = \prod_{i=1}^N f(x_i - \theta c_i)$ with a fixed absolutely continuous density $f$ and given constants $c_1, \dots, c_N$, $\sum_{i=1}^N c_i^2 > 0$. Then the locally most powerful test has the critical region $\sum_{i=1}^N c_i a_N(R_i, f) \ge k$, with the same scores as in I and with $k$ determined by the condition $P(\sum_{i=1}^N c_i a_N(R_i, f) \ge k) = \alpha$.

9. Selected two-sample rank tests

Denote $(X_1, \dots, X_m, Y_1, \dots, Y_n) = (Z_1, \dots, Z_N)$, $N = m + n$, where $(X_1, \dots, X_m)$ has distribution function $F$ and $(Y_1, \dots, Y_n)$ has distribution function $G$. Consider testing $H_0: F \equiv G$ against the alternative $K_1: G(x) \le F(x)$ $\forall x \in \mathbb{R}^1$, $G(x) < F(x)$ at least for one $x$. $K_1$ is a one-sided alternative stating that the random variable $Y$ is stochastically larger than $X$.

The problem of testing $H_0$ against $K_1$ is invariant with respect to the group $\mathcal G$ of transformations $z_i' = g(z_i)$, $i = 1, \dots, N$, where $g$ is any continuous strictly increasing function, with the vector of ranks $R_1, \dots, R_N$ of $Z_1, \dots, Z_N$ as the maximal invariant. The class of invariant tests thus coincides with that of rank tests. Because both $(X_1, \dots, X_m)$ and $(Y_1, \dots, Y_n)$ are random samples, the distribution of the vector of ranks $(R_1, \dots, R_m, R_{m+1}, \dots, R_{m+n})$ is symmetric in the first $m$ and in the last $n$ arguments. Hence, the vectors of ordered ranks $R_1^* < \dots < R_m^*$ and $R_{m+1}^* < \dots < R_{m+n}^*$ are sufficient. Because either of these vectors determines the other, the family of invariant tests of $H_0$ against $K_1$ reduces to the tests dependent only on the ordered ranks of one of the samples, e.g., on the ordered ranks of $Y_1, \dots, Y_n$.

The vector $(R_{m+1}^*, \dots, R_N^*)$ runs over $\binom{N}{n}$ combinations. All these combinations are equally probable under $H_0$, and hence the critical region of each rank test of the size $\alpha = k/\binom{N}{n}$ consists of just $k$ points $(s_1, \dots, s_n)$, $1 \le s_1 < \dots < s_n \le N$. The rank tests mutually differ in the points included in their critical regions. The above alternative $K_1$ is still too rich, and hence there does not exist a uniformly most powerful rank test of $H_0$ against $K_1$. However, we are able to find rank tests locally most powerful for $H_0$ against some important subsets of $K_1$.

11. Two-sample tests of location

Consider the special alternative of $K_1$, namely that $G$ differs from $F$ by a shift in location, i.e., $K_2: G(x) = F(x - \Delta)$, $\Delta > 0$. If we know that $F$ is normal, we use the two-sample t-test. Generally, the test statistic of any rank test is a function of the ordered ranks of the second sample. The locally most powerful test generally has a critical region of the form $\sum_{i=m+1}^N a_N(R_i) \ge k$; hence the test criterion really depends only on the ordered ranks of the $Y_i$'s. The scores $a_N(i) = \mathbb{E}\,\varphi(U_{N:i})$ (or the approximate scores $a_N^*(i) = \varphi(\frac{i}{N+1})$), $i = 1, \dots, N$, are generated by an appropriate score function $\varphi: (0,1) \to \mathbb{R}^1$. Three basic tests of this type are most often used in practice:

(i) Wilcoxon (Mann-Whitney) test. The Wilcoxon test has the critical region
$$W = \sum_{i=m+1}^N R_i \ge k, \tag{4}$$
i.e., the test function
$$\Phi(x) = \begin{cases} 1 & W > k \\ \gamma & W = k \\ 0 & W < k, \end{cases}$$
where $k$ and $\gamma$ are determined so that $P_{H_0}(W > k) + \gamma P_{H_0}(W = k) = \alpha$ ($\alpha = 0.05$, $\alpha = 0.01$). This test is locally most powerful against $K_2$ with $F$ logistic, i.e., with the density
$$f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \quad x \in \mathbb{R};$$
indeed, for logistic $f$ we have $\varphi(u, f) = 2u - 1$, so the scores are linear in the ranks.
For small $m$ and $n$, the critical value $k$ can be determined directly: for each combination $s_1 < \dots < s_n$ of the numbers $1, \dots, N$ we calculate $\sum_{i=1}^n s_i$ and order these values by increasing magnitude. The critical region is formed by the $M_N$ largest sums, where $M_N = \alpha \binom{N}{n}$; if there is no integer $M_N$ satisfying this condition, we take the largest integer $M_N$ less than $\alpha \binom{N}{n}$ and randomize on the combination which leads to the $(M_N + 1)$-st largest value. However, this systematic way, though precise, becomes laborious for large $N$, where we should use tables of critical values.

There exist various tables of the Wilcoxon test, organized in various ways. Many tables provide the critical values of the Mann-Whitney statistic
$$U_N = \sum_{i=m+1}^N \sum_{j=1}^m I[Z_j < Z_i];$$
we can easily see that $U_N$ and $W_N$ are in the one-to-one relation $W_N = U_N + \frac{n(n+1)}{2}$.

For an application of the Wilcoxon test we can alternatively use the dual form of the Wilcoxon statistic: let $Z_{N:1} < \dots < Z_{N:N}$ be the order statistics of the pooled sample and define $V_1, \dots, V_N$ in the following way: $V_i = 0$ if $Z_{N:i}$ belongs to the first sample and $V_i = 1$ if $Z_{N:i}$ belongs to the second sample. Then $W_N = \sum_{i=1}^N i\,V_i$.

For large $m$ and $n$, where there are no tables, we use the normal approximation of $W_N$: if $m, n \to \infty$, then, under $H_0$, $W_N$ has asymptotically normal distribution in the following sense:
$$\lim_{m,n\to\infty} P_{H_0}\!\left(\frac{W_N - \mathbb{E} W_N}{\sqrt{\operatorname{var} W_N}} < x\right) = \Phi(x), \quad x \in \mathbb{R}^1,$$
where $\Phi$ is the standard normal distribution function. To be able to use the normal approximation, we must know the expectation and variance of $W_N$ under $H_0$. The following theorem gives the expectation and variance of a more general linear rank statistic, covering the Wilcoxon as well as other rank tests.

Theorem. Let the random vector $(R_1, \dots, R_N)$ have the discrete uniform distribution on the set $\mathcal R$ of all permutations of the numbers $1, \dots, N$, i.e., $\Pr(R = r) = \frac{1}{N!}$, $r \in \mathcal R$, and let $c_1, \dots, c_N$ and $a_1 = a(1), \dots, a_N = a(N)$ be arbitrary constants. Then the expectation and variance of the linear rank statistic $S_N = \sum_{i=1}^N c_i a(R_i)$ are
$$\mathbb{E} S_N = \frac{1}{N} \sum_{i=1}^N c_i \sum_{j=1}^N a_j, \qquad \operatorname{var} S_N = \frac{1}{N-1} \sum_{i=1}^N (c_i - \bar c)^2 \sum_{j=1}^N (a_j - \bar a)^2,$$
where $\bar c = \frac{1}{N}\sum_{i=1}^N c_i$ and $\bar a = \frac{1}{N}\sum_{i=1}^N a_i$.

Parameters of the Wilcoxon statistic under $H_0$:
$$\mathbb{E} W_N = \frac{n(N+1)}{2}, \qquad \operatorname{var} W_N = \frac{mn(N+1)}{12}.$$
The distribution of $W_N$ under $H_0$ is symmetric about $\mathbb{E} W_N$. If we test $H_0$ against the left-sided alternative ($\Delta < 0$, the second sample shifted to the left with respect to the first one), we reject $H_0$ if $W_N < 2\,\mathbb{E} W_N - k$.
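A compact sketch of the Wilcoxon test with the normal approximation just derived (the helper name is ours; for small $m, n$ the exact tables should be preferred):

```python
import numpy as np
from scipy.stats import norm

# Two-sample Wilcoxon test via the normal approximation,
# using E W_N = n(N+1)/2 and var W_N = mn(N+1)/12 from above.
def wilcoxon_two_sample(x, y, alpha=0.05):
    m, n = len(x), len(y)
    N = m + n
    pooled = np.concatenate([x, y])
    ranks = pooled.argsort().argsort() + 1   # ranks of the pooled sample
    W = ranks[m:].sum()                      # W_N = sum of ranks of the y's
    EW, varW = n * (N + 1) / 2, m * n * (N + 1) / 12
    stat = (W - EW) / np.sqrt(varW)
    return W, stat, stat > norm.ppf(1 - alpha)   # reject H0 for large W_N

rng = np.random.default_rng(1)
print(wilcoxon_two_sample(rng.normal(0, 1, 30), rng.normal(0.8, 1, 25)))
```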
(ii) van der Waerden test. Consider the approximate scores corresponding to the score function $\varphi(u) = \Phi^{-1}(u)$, $0 < u < 1$, where $\Phi$ is the standard normal distribution function, i.e., the test statistic $S_N = \sum_{i=m+1}^N \Phi^{-1}(\frac{R_i}{N+1})$. The van der Waerden test is convenient for testing $H_0$ against $K_1$ if the distribution function $F$ has approximately normal tails. In fact, the test is asymptotically optimal against the normal alternatives, and its asymptotic relative efficiency (Pitman efficiency) with respect to the t-test is equal to 1 under normal $F$ and at least 1 under all non-normal $F$. For these good properties the test can be recommended. For large $m, n$, if we do not have tables at disposal, we can use the critical values based on the normal approximation $N(\mathbb{E} S_N, \operatorname{var} S_N)$, where in the van der Waerden case, by the above theorem,
$$\mathbb{E} S_N = 0, \qquad \operatorname{var} S_N = \frac{mn}{N(N-1)} \sum_{i=1}^N \left[\Phi^{-1}\!\left(\frac{i}{N+1}\right)\right]^2.$$
Moreover, the distribution of $S_N$ under $H_0$ is symmetric about 0.

(iii) Median test. The median test uses the scores generated by the score function
$$\varphi(u) = \begin{cases} 0 & 0 < u < \frac12 \\ \frac12 & u = \frac12 \\ 1 & \frac12 < u < 1. \end{cases}$$
The test statistic is the number of $Y$-observations situated above the median of the pooled sample, increased by $\frac12$ for odd $N$. If $N$ is even, $M = N/2$, then, under $H_0$, $S_N$ has the hypergeometric probability distribution:
$$\Pr(S_N = k \mid H_0) = \begin{cases} \binom{M}{k}\binom{M}{n-k} \Big/ \binom{N}{n} & \max(0, n - M) \le k \le \min(M, n) \\ 0 & \text{otherwise.} \end{cases}$$
Hence, we can use the critical values from the tables of the hypergeometric distribution. For a large number of observations we use the normal approximation with the parameters
$$\mathbb{E} S_N = \frac{n}{2}, \qquad \operatorname{var} S_N = \frac{mn}{4(N-1)}.$$
The median test is most convenient for heavy-tailed $F$ with density $f$ such that, although $\lim_{|x|\to\infty} f(x) = 0$, this convergence is much slower than in the case of the normal or logistic distributions (e.g., for the Cauchy distribution).

12. Two-sample rank tests of scale

Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be two samples with the respective distribution functions $F(x - \mu)$ and $G(y - \mu)$, where $\mu$ is an unknown nuisance shift parameter. We wish to test the hypothesis of randomness $H_0: F \equiv G$ against the two-sample alternative of scale
$$K_4: G(x - \mu) = F\!\left(\frac{x - \mu}{\theta}\right) \quad \forall x \in \mathbb{R}^1, \; \theta > 1.$$
Instead of the tests optimal against some special shapes of $F$, whose scores have a complicated form, we shall rather describe tests with simple scores which are really used in practice. The score function $\varphi_1$ for the scale alternatives is U-shaped, and the test statistics are of the form
$$S_N = \sum_{i=m+1}^N \varphi_1\!\left(\frac{R_i}{N+1}\right).$$

(i) The Siegel-Tukey test. This test is based on a reordering of the observations, leading to new ranks and to a test statistic whose distribution under $H_0$ is the same as that of the Wilcoxon statistic. Let $Z_{N:1} < Z_{N:2} < \dots < Z_{N:N}$ be the order statistics corresponding to the pooled sample of $N = m + n$ variables. Re-order this vector in the following way:
$$Z_{N:1},\, Z_{N:N},\, Z_{N:N-1},\, Z_{N:2},\, Z_{N:3},\, Z_{N:N-2},\, Z_{N:N-3},\, Z_{N:4},\, Z_{N:5},\, \dots$$
and denote $\tilde R_i$ the new rank of $Z_i$ with respect to this new order, $i = 1, \dots, N$. The critical region of the Siegel-Tukey test has the form
$$S_N^* = \sum_{i=m+1}^N \tilde R_i \le k^*,$$
where $k^*$ is determined so that $P_{H_0}(S_N^* < k^*) + \gamma\,P_{H_0}(S_N^* = k^*) = \alpha$. The distribution of $S_N^*$ under $H_0$ coincides with the distribution of the Wilcoxon statistic, hence we can use the tables of the Wilcoxon test.

(ii) Quartile test. The quartile test is based on the score function
$$\varphi_1(u) = \begin{cases} 1 & 0 < u < \frac14 \text{ or } \frac34 < u < 1 \\ \frac12 & u = \frac14 \text{ or } u = \frac34 \\ 0 & \frac14 < u < \frac34, \end{cases}$$
which leads to the test statistic
$$S_N = \frac12 \sum_{i=m+1}^N \left[\operatorname{sign}\!\left(\left|\frac{R_i}{N+1} - \frac12\right| - \frac14\right) + 1\right],$$
and we reject $H_0$ for large values of $S_N$. Unless $N + 1$ is divisible by 4, the value of $S_N$ is the number of observations of the $Y$-sample which belong either to the first or to the fourth quarter of the pooled sample. If $N$ is divisible by 4, then $S_N$ has the hypergeometric distribution under $H_0$, analogously to the median test.
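The Siegel-Tukey relabelling described above is easy to mechanize; a sketch (the function name is ours), after which $S_N^* = \sum_{i=m+1}^N \tilde R_i$ is just the Wilcoxon statistic computed from the new ranks:

```python
import numpy as np

# Siegel-Tukey ranks: assign 1, 2, 3, ... along the zig-zag order
# Z(1), Z(N), Z(N-1), Z(2), Z(3), Z(N-2), Z(N-3), Z(4), Z(5), ...
def siegel_tukey_ranks(z):
    order = list(np.argsort(z))        # indices of Z(1) <= ... <= Z(N)
    zigzag, low = [], True
    while order:
        if low:                        # one from the bottom, one from the top
            zigzag.append(order.pop(0))
            if order:
                zigzag.append(order.pop())
        else:                          # then one from the top, one from the bottom
            zigzag.append(order.pop())
            if order:
                zigzag.append(order.pop(0))
        low = not low
    new_rank = np.empty(len(z), dtype=int)
    new_rank[zigzag] = np.arange(1, len(z) + 1)
    return new_rank                    # new_rank[i] is the rank of Z_i
```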
13. Rank tests of $H_0$ against general two-sample alternatives, based on the empirical distribution functions

Again, $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ are two samples with the respective distribution functions $F$ and $G$. We wish to test the hypothesis of randomness $H_0: F \equiv G$ either against the one-sided alternative $K_5^+: G(x) \le F(x)$ $\forall x$, $F \ne G$, or against the general alternative $K_5: F \ne G$. Testing against $K_5$ is invariant with respect to all continuous one-to-one transformations, and there is no reasonable maximal invariant under this setup. In this case we usually use tests based on the empirical distribution functions, which are the maximum likelihood estimators of the theoretical distribution functions in this nonparametric setup. We shall describe the Kolmogorov-Smirnov tests; another known test of this type is the Cramér-von Mises test.

The empirical distribution function $\hat F_m$ corresponding to the sample $X_1, \dots, X_m$ is defined as
$$\hat F_m(x) = \frac{1}{m} \sum_{i=1}^m I[X_i \le x], \quad x \in \mathbb{R}^1;$$
the empirical distribution function $\hat G_n$ for the sample $Y_1, \dots, Y_n$ is defined analogously. Denote
$$D_{mn}^+ = \max_{x \in \mathbb{R}^1} [\hat F_m(x) - \hat G_n(x)], \qquad D_{mn} = \max_{x \in \mathbb{R}^1} |\hat F_m(x) - \hat G_n(x)|.$$

The Kolmogorov-Smirnov test against $K_5$ has the test function
$$\Phi(X, Y) = \begin{cases} 1 & D_{mn} > C \\ \gamma & D_{mn} = C \\ 0 & D_{mn} < C. \end{cases}$$

The statistic $D_{mn}$ is a rank statistic, though not a linear one. To see this, consider the order statistics $Z_{N:1} < \dots < Z_{N:N}$ of the pooled sample and establish the indicators $V_1, \dots, V_N$, where $V_j = 0$ if $Z_{N:j}$ comes from the $X$-sample and $V_j = 1$ otherwise. Because $\hat F_m$ and $\hat G_n$ are nondecreasing step functions, the maximum can be attained only at one of the points $Z_{N:1}, \dots, Z_{N:N}$; moreover,
$$\hat F_m(Z_{N:j}) - \hat G_n(Z_{N:j}) = \frac{m+n}{mn}\left(\frac{jn}{m+n} - V_1 - \dots - V_j\right), \quad j = 1, \dots, N,$$
which gives the value of the test criterion
$$D_{mn} = \frac{m+n}{mn}\,\max_{1 \le j \le N} \left|\frac{jn}{m+n} - V_1 - \dots - V_j\right|.$$
Notice that this expression depends only on $V_1, \dots, V_N$; on the other hand, $V_i = 1$ iff one of the ranks $R_{m+1}, \dots, R_N$ is equal to $i$, while $V_i = 0$ iff one of the ranks $R_1, \dots, R_m$ is equal to $i$. Thus $V_1, \dots, V_N$ depend only on the ranks, and so does $D_{mn}$. This implies that the distribution of $D_{mn}$ under $H_0$ is the same for all $F$. The expression is also used for the numerical calculation of $D_{mn}$. An analogous consideration holds for the one-sided Kolmogorov-Smirnov criterion, which can be expressed in the form
$$D_{mn}^+ = \frac{m+n}{mn}\,\max_{1 \le j \le N} \left(\frac{jn}{m+n} - V_1 - \dots - V_j\right).$$
For large $m, n$ we can use the limiting critical values of the tests, but the asymptotic distributions of the criteria are not normal. More precisely, it holds that
$$\lim_{m,n\to\infty} P_{H_0}\!\left(\left(\frac{mn}{m+n}\right)^{1/2} D_{mn}^+ \le x\right) = 1 - \exp\{-2x^2\}, \quad x > 0.$$

14. Modification of tests in the presence of ties

If both distribution functions $F$ and $G$ are continuous, then all observations are different with probability 1 and the ranks are well defined. However, we round the observations to a finite number of decimal places, so in fact we express all measurements on a countable grid. In such a case the possibility of ties cannot be ignored, and we should consider possible modifications of the rank tests for this situation. Let us first make several general remarks:

- If the tied observations belong to the same sample, then their mutual ordering does not affect the value of the test criterion. Hence, we should mainly consider ties between observations from different samples.
- A small number of tied observations can eventually be omitted, but this is paid for by a loss of information.
- Some test statistics are well defined even in the presence of ties; the ties may only change the probabilities of the errors of the first and second kind. Let us mention the Kolmogorov-Smirnov test as an example: the definitions of the empirical distribution function and of the test criterion make sense even in the presence of ties. However, if we use the tabulated critical values of the Kolmogorov-Smirnov test in this situation, the size of the critical region will be less than the prescribed significance level. Actually, we may then consider our observations $X_1, \dots, X_m, Y_1, \dots, Y_n$ as data rounded from continuous data $X_1', \dots, X_m', Y_1', \dots, Y_n'$. Then the possible values of $\hat F_m(x) - \hat G_n(x)$, $x \in \mathbb{R}^1$, form a subset of the possible values of $\hat F_m'(x) - \hat G_n'(x)$, $x \in \mathbb{R}^1$, where $\hat F_m'$ and $\hat G_n'$ are the empirical distribution functions of the $X_i'$'s and $Y_j'$'s, respectively; hence
$$\max_x [\hat F_m(x) - \hat G_n(x)] \le \max_x [\hat F_m'(x) - \hat G_n'(x)],$$
and similarly for the maxima of the absolute values.

We shall describe two possible modifications of the rank tests in the presence of ties: randomization and the method of midranks.
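Returning to Section 13, the rank-based formula for $D_{mn}$ translates directly into code (a sketch, ours; ties are ignored here, consistent with continuous $F$ and $G$):

```python
import numpy as np

# D+_mn and D_mn from the indicators V_1, ..., V_N of Section 13.
def ks_two_sample(x, y):
    m, n = len(x), len(y)
    z = np.concatenate([x, y])
    V = (np.argsort(z) >= m).astype(float)   # V_j = 1 iff Z_{N:j} is a y
    j = np.arange(1, m + n + 1)
    diff = (m + n) / (m * n) * (j * n / (m + n) - np.cumsum(V))
    return diff.max(), np.abs(diff).max()    # (D+_mn, D_mn)
```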
15. Randomization

Let $Z_1, \dots, Z_N$ be the pooled sample. Take independent random variables $U_1, \dots, U_N$, uniformly $R(0,1)$ distributed and independent of $Z_1, \dots, Z_N$. Order the pairs $(Z_1, U_1), \dots, (Z_N, U_N)$ in the following way: $(Z_i, U_i) < (Z_j, U_j)$ iff either $Z_i < Z_j$, or $Z_i = Z_j$ and $U_i < U_j$. Denote $R_1^*, \dots, R_N^*$ the ranks of the pairs $(Z_1, U_1), \dots, (Z_N, U_N)$. We shall say that $Z_1, \dots, Z_N$ satisfy the hypothesis $H^*$ if they are independent and identically distributed (not necessarily with an absolutely continuous distribution). Then, under $H^*$, the vector $(R_1^*, \dots, R_N^*)$ is uniformly distributed over the set $\mathcal R$ of permutations of $1, \dots, N$.

16. Method of midranks

The idea behind this method is that equal observations should have equal ranks; their common rank is then taken as the average of all ranks of the group. We shall describe this method mainly for the Wilcoxon test, but it is applicable to other tests as well. Assume that there are $e$ different values among the $N$ observations: $d_1$ observations equal to the smallest value, $d_2$ observations equal to the second smallest value, etc., $d_e$ observations equal to the largest value, $\sum_{i=1}^e d_i = N$. The average ranks of the individual groups are
$$v_1 = \dots = v_{d_1} = \tfrac12(d_1 + 1),$$
$$v_{d_1+1} = \dots = v_{d_1+d_2} = d_1 + \tfrac12(d_2 + 1),$$
$$v_{d_1+d_2+1} = \dots = v_{d_1+d_2+d_3} = d_1 + d_2 + \tfrac12(d_3 + 1),$$
$$\dots$$
$$v_{d_1+\dots+d_{e-1}+1} = \dots = v_N = d_1 + \dots + d_{e-1} + \tfrac12(d_e + 1).$$

Let $R_1^*, \dots, R_N^*$ denote the midranks of the observations $Z_1, \dots, Z_N$. We have the modified Wilcoxon statistic $W_N^* = \sum_{i=m+1}^N R_i^*$. Because the distribution of $(R_1^*, \dots, R_N^*)$ under $H^*$ is no longer uniform on $\mathcal R$ (and the values need not be integers), we cannot use the standard tables of Wilcoxon critical values. If the numbers of equal observations are small compared with $N$, then we can use the normal approximation for sufficiently large $m, n$. To use this approximation, we must know the expectation and the variance of $W_N^*$ under $H^*$. These characteristics are conditional, given the values $d_1, \dots, d_e$, and hence the whole test is conditional. We have
$$\mathbb{E}(W_N^* \mid d_1, \dots, d_e) = n\,\frac{N+1}{2} = \mathbb{E} W_N,$$
$$\operatorname{var}(W_N^* \mid d_1, \dots, d_e) = \frac{mn(N+1)}{12} - \frac{mn \sum_{i=1}^e (d_i^3 - d_i)}{12\,N(N-1)}.$$
The first term is the variance of the standard Wilcoxon statistic, while the second term is a correction for the ties, which vanishes if there are no ties among the observations.
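A sketch of the midrank Wilcoxon statistic with the tie-corrected conditional variance (helper name is ours):

```python
import numpy as np

# Modified Wilcoxon statistic W*_N with midranks and the conditional
# variance with the tie correction of Section 16.
def wilcoxon_midranks(x, y):
    m, n = len(x), len(y)
    N = m + n
    z = np.concatenate([x, y])
    order = np.argsort(z)
    ranks = np.empty(N)
    ranks[order] = np.arange(1, N + 1)
    for v in np.unique(z):                     # equal observations share the
        ranks[z == v] = ranks[z == v].mean()   # average of their ranks
    W = ranks[m:].sum()                        # W*_N
    d = np.array([(z == v).sum() for v in np.unique(z)])  # group sizes d_k
    EW = n * (N + 1) / 2
    varW = (m * n * (N + 1) / 12
            - m * n * (d**3 - d).sum() / (12 * N * (N - 1)))
    return W, EW, varW
```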
17. Comparison of two treatments based on paired observations

To exclude the effects due to inhomogeneity of the data, we can divide the experimental units into $N$ homogeneous pairs and apply the new treatment to one unit of each pair, while the other unit serves as control. We can also apply both treatments successively to the same unit. Let $Y_1, \dots, Y_N$ be the measurements of the effects of the new treatment and $X_1, \dots, X_N$ the control measurements. Then $(X_1, Y_1), \dots, (X_N, Y_N)$ is a random sample from a bivariate distribution with a distribution function $F(x, y)$; it is generally unknown and assumed to be continuous. The hypothesis $H_1$ of no effect of the new treatment is equivalent to the statement that the distribution function $F(x, y)$ is symmetric about the straight line $y = x$, i.e.,
$$H_1: F(x, y) = F(y, x) \quad \forall x, y \in \mathbb{R}^1.$$
Under the alternative of a positive effect of the new treatment, the distribution of the random vector $(X, Y)$ is shifted toward the half-plane $y > x$.

Rank tests of $H_1$. Transform $(X_i, Y_i)$, $i = 1, \dots, N$, in the following way:
$$Z_i = Y_i - X_i, \qquad W_i = X_i + Y_i, \quad i = 1, \dots, N.$$
Under $H_1$, the distribution of the vector $(Z_1, W_1), \dots, (Z_N, W_N)$ is symmetric about the $w$-axis, while under the alternative it is shifted in the direction of the positive half-axis $z$. The problem is invariant with respect to the transformations $z_i' = z_i$, $w_i' = g(w_i)$, $i = 1, \dots, N$, where $g$ is a 1:1 function with a finite number of discontinuities. The invariant tests depend only on $(Z_1, \dots, Z_N)$, because it is the maximal invariant. It is a sample from some one-dimensional distribution with a continuous distribution function $D$. The problem of testing $H_1$ is then equivalent to testing that the distribution $D$ is symmetric about 0,
$$H_1': D(z) + D(-z) = 1 \quad \forall z \in \mathbb{R}^1,$$
against the alternative that the distribution is shifted in the direction of positive $z$,
$$K_1': D(z + \Delta) + D(-z + \Delta) = 1 \quad \forall z \in \mathbb{R}^1, \; \Delta > 0.$$
The distribution $D$ is uniquely determined by the triple $(p, F_1, F_2)$ with $p = \Pr(Z < 0)$, $F_1(z) = \Pr(|Z| < z \mid Z < 0)$ and $F_2(z) = \Pr(Z < z \mid Z > 0)$. Equivalent expressions for $H_1'$ and $K_1'$ are
$$H_1': p = \tfrac12, \; F_2 = F_1; \qquad K_1': p < \tfrac12, \; F_2 \le F_1.$$
This problem is invariant with respect to the transformations $\mathcal G: z_i' = g(z_i)$, $i = 1, \dots, N$, where $g$ is a continuous, odd and increasing function. The maximal invariant is $(S_1, \dots, S_m, R_1, \dots, R_n)$, where $S_1, \dots, S_m$ are the ranks of the absolute values of the negative $Z$'s among $|Z_1|, \dots, |Z_N|$ and $R_1, \dots, R_n$ are the ranks of the positive $Z$'s among $|Z_1|, \dots, |Z_N|$. Moreover, the vectors $S_1^* < \dots < S_m^*$ and $R_1^* < \dots < R_n^*$ of ordered ranks are sufficient for $(S_1, \dots, S_m, R_1, \dots, R_n)$ and, further, one of them uniquely determines the other; hence we finally consider only, e.g., $R_1^* < \dots < R_n^*$, and the invariant tests of $H_1$ [or of $H_1'$] depend only on $R_1^* < \dots < R_n^*$.

Let $\kappa$ be the number of positive components of $(Z_1, \dots, Z_N)$. Then $\kappa$ is a binomial random variable $B(N, \Pr(Z > 0))$, with $\Pr(Z > 0) = \frac12$ under $H_1$, and, for any fixed $n$,
$$P_{H_1}(R_1^* = r_1, \dots, R_\kappa^* = r_\kappa, \kappa = n) = P_{H_1}(R_1^* = r_1, \dots, R_\kappa^* = r_\kappa \mid \kappa = n)\,P_{H_1}(\kappa = n) = \frac{1}{\binom{N}{n}} \binom{N}{n} \left(\frac12\right)^N = \left(\frac12\right)^N$$
for any $n$-tuple $(r_1, \dots, r_n)$, $1 \le r_1 < \dots < r_n \le N$. The number of such tuples is $\sum_{n=0}^N \binom{N}{n} = 2^N$. The critical region of any rank test of the size $\alpha = k/2^N$ contains just $k$ such points $(r_1, \dots, r_n)$. However, there generally is no uniformly most powerful test for $H_1'$ against $K_1'$. We usually consider the alternative of shift in location, that $(Z_1, \dots, Z_N)$ has the density $q_\Delta$, $\Delta > 0$:
$$q_\Delta(z_1, \dots, z_N) = \prod_{i=1}^N f(z_i - \Delta), \quad \Delta > 0, \tag{5}$$
where $f$ is a one-dimensional symmetric density, $f(-x) = f(x)$, $x \in \mathbb{R}^1$, and $\Delta = 0$ under $H_1$ [or $H_1'$].

The locally most powerful rank test of $H_1$ has the critical region
$$\sum_{i=1}^N a_N^+(R_i^+, f)\,\operatorname{sign} Z_i \ge k, \tag{6}$$
where $R_i^+$ is the rank of $|Z_i|$ among $|Z_1|, \dots, |Z_N|$ and the scores $a_N^+(i, f)$ have the form
$$a_N^+(i, f) = \mathbb{E}\,\varphi^+(U_{N:i}, f), \quad i = 1, \dots, N, \qquad \varphi^+(u, f) = \varphi\!\left(\frac{u+1}{2}, f\right), \quad 0 < u < 1,$$
where $\varphi(u, f) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}$, $0 < u < 1$. We shall describe two main tests of this type: the one-sample Wilcoxon test and the sign test.

One-sample Wilcoxon test. The one-sample Wilcoxon test is based on the criterion
$$W_N^+ = \sum_{i=1}^N \operatorname{sign} Z_i \cdot R_i^+, \tag{7}$$
where $R_i^+$ is the rank of $|Z_i|$ among $|Z_1|, \dots, |Z_N|$, or in the equivalent form
$$W_N^{++} = \sum_{i=1}^{\kappa} R_i, \tag{8}$$
where $R_i$ is the rank of the $i$-th positive $Z$ among $|Z_1|, \dots, |Z_N|$ and $\kappa$ is the number of positive components. Obviously $W_N^+ = 2 W_N^{++} - \frac12 N(N+1)$. We reject $H_1$ if $W_N^+ > C$, i.e., if the test criterion exceeds the critical value.
For large $N$, when the tables of critical values are not available, we may use the normal approximation:
$$P_{H_1}\!\left(\frac{W_N^+ - \mathbb{E} W_N^+}{\sqrt{\operatorname{var} W_N^+}} \le x\right) \to \Phi(x) \quad \text{as } N \to \infty, \tag{9}$$
where
$$\mathbb{E} W_N^+ = 0, \qquad \operatorname{var} W_N^+ = \frac{1}{6}\,N(N+1)(2N+1). \tag{10}$$
The parameters follow from the following proposition:

Theorem. Let $Z$ be a random variable with a continuous distribution function symmetric about 0, i.e., $F(z) + F(-z) = 1$, $z \in \mathbb{R}^1$. Then $|Z|$ and $\operatorname{sign} Z$ are independent.

The one-sample Wilcoxon test is convenient for densities of the logistic type.

Sign test. In a more general situation, $Z_1, \dots, Z_N$ are independent random variables, $Z_i$ distributed according to the distribution function $D_i$, but not all $D_1, \dots, D_N$ are equal. This situation occurs when we compare two treatments under different experimental conditions or using different methods. We want to test the hypothesis of symmetry of all distributions about 0 against the alternative that all distributions are shifted toward positive values:
$$H_1^*: D_i(z) + D_i(-z) = 1, \quad z \in \mathbb{R}^1, \; i = 1, \dots, N.$$
The problem is invariant with respect to all transformations $z_i' = f_i(z_i)$, $i = 1, \dots, N$, where the $f_i$'s are continuous, increasing and odd functions. The maximal invariant is the number $\kappa$ of positive components. The invariant tests depend only on $\kappa$, and the uniformly most powerful among them has the form
$$\Phi(\kappa) = \begin{cases} 1 & \kappa > C \\ \gamma & \kappa = C \\ 0 & \kappa < C, \end{cases} \tag{11}$$
where $C$ and $\gamma$ are determined by the equation
$$\sum_{\kappa > C} \binom{N}{\kappa} \left(\frac12\right)^N + \gamma \binom{N}{C} \left(\frac12\right)^N = \alpha. \tag{12}$$
The criterion of the sign test is simply the number of positive components among $Z_1, \dots, Z_N$, and its distribution under $H_1$ is binomial $b(N, \frac12)$. For large $N$ we can again use the normal approximation. If all distribution functions $D_1, \dots, D_N$ coincide, the sign test is the locally most powerful rank test of $H_1$ for the double-exponential $D$ with density $d(z) = \frac12 e^{-|z - \Delta|}$, $z \in \mathbb{R}^1$.

For using the sign test we need not know the exact values $X_i, Y_i$, $i = 1, \dots, N$; it is sufficient to know the signs of the differences $Y_i - X_i$. This is a very convenient property: we can use this test even for qualitative observations of the type "drug A gives a better pain relief than drug B". As a matter of fact, we do not have any better test under such conditions.
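A sketch of the one-sample Wilcoxon test of this section with the normal approximation (9)-(10) (the function name is ours; z holds the paired differences $Z_i = Y_i - X_i$):

```python
import numpy as np
from scipy.stats import norm

# One-sample Wilcoxon signed-rank test, normal approximation:
# W+_N = sum sign(Z_i) R+_i, E W+_N = 0, var W+_N = N(N+1)(2N+1)/6.
def signed_rank_test(z, alpha=0.05):
    N = len(z)
    r_plus = np.abs(z).argsort().argsort() + 1   # R+_i = rank of |Z_i|
    W = (np.sign(z) * r_plus).sum()
    stat = W / np.sqrt(N * (N + 1) * (2 * N + 1) / 6)
    return W, stat, stat > norm.ppf(1 - alpha)   # reject H1 for large W+_N
```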
18. Tests of independence in a bivariate population

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a random sample from a bivariate distribution with a continuous distribution function $F(x, y)$. We want to test the hypothesis of independence
$$H_2: F(x, y) = F_1(x) F_2(y), \tag{13}$$
where $F_1$ and $F_2$ are arbitrary distribution functions. The most natural alternative to $H_2$ is positive [or negative] dependence, but it is too wide and there is no uniformly most powerful test. We rather consider the alternative
$$K_2: \quad X_i = X_i^0 + \Delta Z_i, \quad Y_i = Y_i^0 + \Delta Z_i, \quad \Delta > 0, \; i = 1, \dots, n, \tag{14}$$
where $X_i^0, Y_i^0, Z_i$, $i = 1, \dots, n$, are independent and their distributions do not depend on $i$. Independence then means that $\Delta = 0$.

Let $R_1, \dots, R_n$ be the ranks of $X_1, \dots, X_n$ and $S_1, \dots, S_n$ the ranks of $Y_1, \dots, Y_n$, respectively. Under the hypothesis of independence, the vectors $(R_1, \dots, R_n)$ and $(S_1, \dots, S_n)$ are independent and both have the uniform distribution on the set $\mathcal R$ of permutations of $1, \dots, n$. The locally most powerful rank test of $H_2$ against the alternative $K_2$, in which $X_i^0$ has the density $f_1$ and $Y_i^0$ the density $f_2$, has the critical region
$$\sum_{i=1}^n a_n(R_i, f_1)\,a_n(S_i, f_2) > C, \tag{15}$$
where the scores $a_n(i, f)$ are usually replaced by the approximate scores. The two best-known rank tests of independence are the Spearman test and the quadrant test.

Spearman test. The Spearman test is based on the correlation coefficient of $(R_1, \dots, R_n)$ and $(S_1, \dots, S_n)$:
$$r_S = \frac{\frac{1}{n}\sum_{i=1}^n R_i S_i - \bar R\,\bar S}{\left[\frac{1}{n}\sum_{i=1}^n (R_i - \bar R)^2\right]^{1/2}\left[\frac{1}{n}\sum_{i=1}^n (S_i - \bar S)^2\right]^{1/2}},$$
where
$$\bar R = \bar S = \frac{n+1}{2}, \qquad \frac{1}{n}\sum_{i=1}^n (R_i - \bar R)^2 = \frac{1}{n}\sum_{i=1}^n (S_i - \bar S)^2 = \frac{1}{n}\sum_{i=1}^n i^2 - \left(\frac{n+1}{2}\right)^2 = \frac{n^2-1}{12}.$$
Then we can express the criterion in a simpler form:
$$r_S = \frac{12}{n(n^2-1)} \sum_{i=1}^n R_i S_i - \frac{3(n+1)}{n-1}.$$
The test rejects $H_2$ if $r_S > C$, or, equivalently, if $S_1 = \sum_{i=1}^n R_i S_i > C'$. In some tables we find the critical values for the statistic
$$S_2 = \sum_{i=1}^n (R_i - S_i)^2, \tag{16}$$
for which $r_S = 1 - \frac{6}{n^3 - n} S_2$. The test based on $S_2$ rejects $H_2$ if $S_2 < C''$. For large $n$ we use the normal approximation of $S_1$ with
$$\mathbb{E} S_1 = \frac{n(n+1)^2}{4}, \qquad \operatorname{var} S_1 = \frac{n^2(n+1)^2(n-1)}{144}.$$
The Spearman test is locally most powerful against alternatives of the logistic type.

Quadrant test. This test is based on the criterion
$$Q = \frac14 \sum_{i=1}^n \left[\operatorname{sign}\!\left(R_i - \frac{n+1}{2}\right) + 1\right]\left[\operatorname{sign}\!\left(S_i - \frac{n+1}{2}\right) + 1\right]$$
and rejects $H_2$ for large values of $Q$. For even $n$, $Q$ equals the number of pairs $(X_i, Y_i)$ for which $X_i$ lies above the $X$-median and $Y_i$ lies above the $Y$-median. The statistic $Q$ then has, under the hypothesis $H_2$, the hypergeometric distribution
$$\Pr(Q = q) = \binom{m}{q}\binom{m}{m-q} \Big/ \binom{n}{m}, \quad q = 0, 1, \dots, m, \; m = n/2. \tag{17}$$
For large $n$ we use the normal approximation with the parameters
$$\mathbb{E} Q = \frac{n}{4}, \qquad \operatorname{var} Q = \frac{n^2}{16(n-1)}. \tag{18}$$
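In code, the simplified form of $r_S$ reads (a sketch, ours; x and y are numpy arrays without ties):

```python
import numpy as np

# Spearman rank correlation in the simplified form above.
def spearman_r(x, y):
    n = len(x)
    R = x.argsort().argsort() + 1   # ranks of the x's
    S = y.argsort().argsort() + 1   # ranks of the y's
    return 12 / (n * (n**2 - 1)) * (R * S).sum() - 3 * (n + 1) / (n - 1)
```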
19. Rank tests for comparison of several treatments

One-way classification. We want to compare the effects of $p$ treatments; the experiment is organized in such a way that the $i$-th treatment is applied to $n_i$ subjects with the results $x_{i1}, \dots, x_{in_i}$, $i = 1, \dots, p$, $\sum_{i=1}^p n_i = N$. Then $x_{i1}, \dots, x_{in_i}$ is a random sample from a distribution with a distribution function $F_i$, $i = 1, \dots, p$. The hypothesis of no difference between the treatments can then be expressed as the hypothesis of equality of the $p$ distribution functions, namely
$$H_2: F_1 \equiv F_2 \equiv \dots \equiv F_p, \tag{19}$$
and we can consider this hypothesis either against the general alternative
$$K_2: F_i(x_0) \ne F_j(x_0) \tag{20}$$
at least for one pair $i, j$ and at least for some point $x_0$, or against the more special alternative
$$K_2^*: F_i(x) = F(x - \Delta_i), \quad i = 1, \dots, p, \tag{21}$$
with $\Delta_i \ne \Delta_j$ at least for one pair $i, j$. This alternative claims that the treatments act on the observations as additive shifts and that at least two treatments differ in their effects.

The classical test for this situation is the F-test of the analysis of variance; this test works well under normality, $F_i \sim N(\mu + \Delta_i, \sigma^2)$, $i = 1, \dots, p$. We obtain the usual model of the analysis of variance
$$X_{ij} = \mu + \Delta_i + e_{ij}, \quad j = 1, \dots, n_i; \; i = 1, \dots, p, \tag{22}$$
where the $e_{ij}$ are independent random variables with the normal distribution $N(0, \sigma^2)$. The hypothesis $H_2$ can then be reformulated as $H_2^*: \Delta_1 = \Delta_2 = \dots = \Delta_p = 0$. The F-test rejects the hypothesis $H_2^*$ provided
$$F = \frac{N-p}{p-1} \cdot \frac{\sum_{i=1}^p n_i (\bar X_i - \bar X)^2}{\sum_{i=1}^p \sum_{j=1}^{n_i} (X_{ij} - \bar X_i)^2} \ge C, \tag{23}$$
where $\bar X_i = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}$, $\bar X = \frac{1}{N} \sum_{i=1}^p \sum_{j=1}^{n_i} X_{ij}$, and the critical value $C$ is found in the tables of the F-distribution with $(p-1, N-p)$ degrees of freedom.

Kruskal-Wallis rank test. Consider the ranks $R_{11}, \dots, R_{1n_1}; R_{21}, \dots, R_{2n_2}; \dots; R_{p1}, \dots, R_{pn_p}$ of all $N$ observations in the pooled sample. Let $R_{i1}^* < \dots < R_{in_i}^*$ be the ordered ranks of the $i$-th sample, $i = 1, \dots, p$. Then, under $H_2$, it holds for any permutation $\{r_{11}, \dots, r_{pn_p}\}$ of $1, \dots, N$ such that $r_{i1} < \dots < r_{in_i}$, $i = 1, \dots, p$, that
$$P(R_{11}^* = r_{11}, \dots, R_{pn_p}^* = r_{pn_p}) = \frac{n_1! \cdots n_p!}{N!}.$$
The Kruskal-Wallis rank test rejects $H_2$ provided
$$K_N = \frac{12}{N(N+1)} \sum_{i=1}^p n_i \left(\bar R_i - \frac{N+1}{2}\right)^2 = \frac{12}{N(N+1)} \sum_{i=1}^p n_i \bar R_i^2 - 3(N+1) \ge K_\alpha,$$
where $\bar R_i = \frac{1}{n_i} \sum_{j=1}^{n_i} R_{ij}$, $i = 1, \dots, p$. It can be formally obtained from the F-test by inserting $\bar R_i$ for $\bar X_i$ and $\bar R = \frac{N+1}{2}$ for $\bar X$. If $n_i \to \infty$, $i = 1, \dots, p$, then $K_N$ has asymptotically the $\chi^2(p-1)$ distribution under $H_2$. In practice we can use the $\chi^2$ approximation for $p > 3$ and $n_i > 5$, $i = 1, \dots, p$. In the special case $p = 2$, the Kruskal-Wallis test reduces to the two-sided (two-sample) Wilcoxon test.

Modification in the presence of tied observations: assume that there are $e$ different values among the components $X_{11}, \dots, X_{pn_p}$, with $d_1$ equal to the smallest, ..., $d_e$ equal to the largest one. Let $(R_{11}^*, \dots, R_{pn_p}^*)$ be the midranks of $X_{11}, \dots, X_{pn_p}$. Then the modified Kruskal-Wallis statistic has the form
$$K_N^* = \frac{\frac{12}{N(N+1)} \sum_{i=1}^p n_i \left(\bar R_i^* - \frac{N+1}{2}\right)^2}{1 - \frac{1}{N^3 - N} \sum_{k=1}^e (d_k^3 - d_k)}.$$
The distribution of $K_N^*$, conditioned on given $d_1, \dots, d_e$, is approximately $\chi^2(p-1)$ under $H_2$ for large $n_1, \dots, n_p$.
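A sketch of the Kruskal-Wallis statistic (ours; no ties, samples is a list of one-dimensional arrays; compare $K_N$ with a $\chi^2(p-1)$ critical value):

```python
import numpy as np

# K_N = 12/(N(N+1)) * sum_i n_i * (Rbar_i - (N+1)/2)^2
def kruskal_wallis(samples):
    z = np.concatenate(samples)
    N = len(z)
    ranks = z.argsort().argsort() + 1   # ranks in the pooled sample
    K, start = 0.0, 0
    for s in samples:
        ni = len(s)
        K += ni * (ranks[start:start + ni].mean() - (N + 1) / 2) ** 2
        start += ni
    return 12 / (N * (N + 1)) * K
```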
Two-way classification (random blocks). We want to compare $p$ treatments and simultaneously reduce the effect of non-homogeneity of the sample units. We divide the subjects into $n$ homogeneous groups, so-called blocks, and compare the effects of the treatments within each block separately. The subjects in a block are usually assigned to the treatments at random. The simplest model has $n$ independent blocks, each containing $p$ units, and each treatment is applied just once in each block. The observation $x_{ij}$ is the measured effect of the $j$-th treatment applied in the $i$-th block, so the data form the table

Block 1: $x_{11}, x_{12}, x_{13}, \dots, x_{1p}$
Block 2: $x_{21}, x_{22}, x_{23}, \dots, x_{2p}$
...
Block n: $x_{n1}, x_{n2}, x_{n3}, \dots, x_{np}$

The $X_{ij}$ are independent, and $X_{ij}$ has a continuous distribution function $F_{ij}$, $i = 1, \dots, n$; $j = 1, \dots, p$. We test the hypothesis that there is no significant difference among the treatments, hence
$$H_3: F_{i1} \equiv F_{i2} \equiv \dots \equiv F_{ip}, \quad i = 1, \dots, n, \tag{24}$$
against the alternative
$$K_3: F_{ij} \ne F_{ik} \tag{25}$$
at least for one $i$ and at least for one pair $j, k$, or against the more special alternative
$$K_3^*: F_{ij}(x) = F_i(x - \Delta_j), \quad i = 1, \dots, n; \; j = 1, \dots, p,$$
with $\Delta_j \ne \Delta_k$ at least for one pair $j, k$.

The classical test of $H_3$ is the F-test in the model
$$X_{ij} = \mu + \beta_i + \Delta_j + E_{ij}, \quad i = 1, \dots, n; \; j = 1, \dots, p, \tag{26}$$
where the $E_{ij}$ are independent with the normal distribution $N(0, \sigma^2)$, $\mu$ is the main additive effect, $\beta_i$ is the effect of the $i$-th block and $\Delta_j$ is the effect of the $j$-th treatment. The hypothesis $H_3$ then reduces to the form $\Delta_1 = \Delta_2 = \dots = \Delta_p$. The F-test of $H_3$ has the critical region
$$F = \frac{n(n-1)\sum_{j=1}^p (\bar X_{\cdot j} - \bar X)^2}{\sum_{j=1}^p \sum_{i=1}^n (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)^2} > C, \tag{27}$$
where $C$ is the critical value of the F-distribution with $p-1$ and $(p-1)(n-1)$ degrees of freedom.

Friedman rank test. Order the observations within each block and denote the corresponding ranks $R_{i1}, \dots, R_{ip}$, $i = 1, \dots, n$ (so that $R_{ij}$ is the rank of $x_{ij}$ among $x_{i1}, \dots, x_{ip}$). Every row (block) of the resulting rank table has the average $\frac{p+1}{2}$; the column averages are $\bar R_j = \frac{1}{n}\sum_{i=1}^n R_{ij}$, $j = 1, \dots, p$, and the overall average is $\bar R = \frac{1}{np}\sum_{i=1}^n \sum_{j=1}^p R_{ij} = \frac{p+1}{2}$.

The Friedman test is based on the following criterion:
$$Q_n = \frac{12n}{p(p+1)} \sum_{j=1}^p \left(\bar R_j - \frac{p+1}{2}\right)^2 = \frac{12n}{p(p+1)} \sum_{j=1}^p \bar R_j^2 - 3n(p+1),$$
and large values of the criterion are significant. As $n \to \infty$, the distribution of $Q_n$ is approximately $\chi^2$ with $p-1$ degrees of freedom. In the case $p = 2$, the Friedman test reduces to the two-sided sign test. The Friedman test is applicable even in the situation where we observe only the ranks rather than the exact values of the treatment effects.

PERMUTATION TESTS

Permutation tests are conditional tests given the vector of order statistics. We shall illustrate them on the test of the hypothesis of randomness against a two-sample alternative. Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be two independent samples with the distribution functions $F$ and $G$. We want to test $H_0: F \equiv G$ against $K: G(x) = F(x - \Delta)$, $\Delta > 0$. The distribution function $F$ is unknown, but we expect that it is normal. We thus wish to have a test which is good under normality, but at least unbiased for all $F$ with a continuous density. Such are the permutation tests.

For simplicity, denote $(X_1, \dots, X_m, Y_1, \dots, Y_n) = (Z_1, \dots, Z_N)$, $N = m + n$, and let $Z^{(1)} \le Z^{(2)} \le \dots \le Z^{(N)}$ be the corresponding order statistics. The permutation test is based only on $Z^{(1)}, Z^{(2)}, \dots, Z^{(N)}$ and it should satisfy
$$\frac{1}{N!} \sum_{r \in \mathcal R} \Phi(Z^{(r_1)}, \dots, Z^{(r_N)}) = \alpha, \tag{28}$$
where $\mathcal R$ is the set of $N!$ permutations of $1, 2, \dots, N$. The test is conditional: given the vector of order statistics $Z^{(1)}, Z^{(2)}, \dots, Z^{(N)}$, only the permutations $(r_1, \dots, r_N)$ are variable.

Generally, the test rejecting in favor of the alternative that $(Z_1, \dots, Z_N)$ has density $q(z_1, \dots, z_N)$ has the form
$$\Phi(z_{r_1}, \dots, z_{r_N} \mid Z^{(\cdot)}) = \begin{cases} 1 & q(z_{r_1}, \dots, z_{r_N}) > C(z^{(\cdot)}) \\ \gamma & q(z_{r_1}, \dots, z_{r_N}) = C(z^{(\cdot)}) \\ 0 & q(z_{r_1}, \dots, z_{r_N}) < C(z^{(\cdot)}), \end{cases}$$
where $C(z^{(\cdot)})$ is determined so that (28) is satisfied. It means that we reject $H_0$ for the $k$ permutations $(r_1, \dots, r_N)$ of $z^{(1)}, \dots, z^{(N)}$ leading to the largest values of $q(z^{(r_1)}, \dots, z^{(r_N)})$, where $k + \gamma = \alpha N!$.

Special case: two normal samples differing by a shift in location. Here
$$q(z_1, \dots, z_N) = f(x_1) \cdots f(x_m)\,f(y_1 - \Delta) \cdots f(y_n - \Delta),$$
where $f$ is the density of $N(\mu, \sigma^2)$, i.e.,
$$q(z_1, \dots, z_N) = \frac{1}{(\sigma\sqrt{2\pi})^N} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^m (x_i - \mu)^2 + \sum_{j=1}^n (y_j - \mu - \Delta)^2\right]\right\},$$
and, given the order statistics, this is large iff $\sum_{j=1}^n y_j$ is large. Hence, the test rejects if $\sum_{j=1}^n y_j = \sum_{i=m+1}^N z_i > C_1(z^{(\cdot)})$. The vectors $(z_{m+1}, \dots, z_N)$ run over $\binom{N}{n}$ combinations of $z_1, \dots, z_N$. We reject the hypothesis for the $k$ largest values of $\sum_{i=m+1}^N z_i$, where
$$k + \gamma = \alpha \binom{N}{n}. \tag{29}$$

Practical procedure:
(i) Observe $(x_1, \dots, x_m, y_1, \dots, y_n) = (z_1, \dots, z_N)$.
(ii) Determine the integer $k$ and the fraction $0 \le \gamma < 1$ satisfying (29).
(iii) Calculate the values $\sum_{j=1}^n z_{i_j}$ for all combinations $\{i_1, \dots, i_n\} \subset \{1, \dots, N\}$. Find the $\left(\binom{N}{n} - k + 1\right)$-st largest sum, say $a$.
(iv) Reject $H_0$ if $\sum_{j=1}^n y_j > a$, and reject with probability $\gamma$ if $\sum_{j=1}^n y_j = a$.
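The practical procedure above can be carried out by brute force for small $m, n$; a sketch (ours; exact enumeration, reporting the permutation p-value instead of handling the randomization fraction $\gamma$):

```python
import numpy as np
from itertools import combinations

# Exact two-sample permutation test based on the sum of the y's; given the
# order statistics, all C(N, n) assignments of the pooled values to the
# second sample are equally likely under H0.
def permutation_test(x, y):
    z = np.concatenate([x, y])
    n = len(y)
    observed = np.sum(y)
    sums = [np.sum(z[list(c)]) for c in combinations(range(len(z)), n)]
    return np.mean(np.array(sums) >= observed)   # permutation p-value

x = np.array([1.1, 0.4, -0.2, 0.9, 0.3])
y = np.array([1.8, 2.0, 0.7, 1.5])
print(permutation_test(x, y))   # reject H0 when the p-value <= alpha
```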