STATISTICAL TESTS BASED ON RANKS

Jana Jurečková, Charles University, Prague

1. Parametric and nonparametric models

Example 1. Model of measurement. Let $X = (X_1, \dots, X_n)$ be measurements of some physical entity $\theta$. If we admit random fluctuations, then we consider the model
$$X_i = \theta + e_i, \quad i = 1, \dots, n.$$
What can we assume about the vector of errors $e_1, \dots, e_n$?

(1) The distribution of the vector $(e_1, \dots, e_n)$ is independent of $\theta$.
(2) Moreover, $e_1, \dots, e_n$ are independent.
(3) Moreover, $e_1, \dots, e_n$ are identically distributed.
(4) Moreover, the distribution of $e_1$ has a density, symmetric about 0.
(5) Moreover, the distribution of $e_1$ is normal $N(0, \sigma^2)$ with unknown $\sigma$.
(6) Moreover, $\sigma$ is even known.

If we assume (1)-(5), then $\bar X = \frac{1}{n} \sum_{i=1}^n X_i$ is an efficient estimator of $\theta$. But often we are not sure of the normal distribution, and even assumption (3) may be unrealistic.

Example 2. Comparison of two treatments. Let $X_1, \dots, X_m$ be the blood pressures of $m$ patients after an application of some medicament and $Y_1, \dots, Y_n$ the blood pressures of a control group, which received a placebo. Let $F$ and $G$ be the respective distribution functions of $X$ and $Y$. We wish to test the hypothesis $H: F \equiv G$ (no effect). But the test depends on the alternative under consideration:

(1) $F$ and $G$ are absolutely continuous, otherwise unknown, and the medicament reduces the blood pressure, i.e.,
$$K_1: G(z) \le F(z) \;\;\forall z, \qquad G(z_0) < F(z_0) \text{ for some } z_0$$
($Y$ is stochastically larger than $X$).
(2) Moreover, $K_2: G(z) = F(z - \Delta)$ $\forall z$ with some $\Delta > 0$ (the alternative of shift in location).
(3) $F \sim N(\mu_1, \sigma_1^2)$, $G \sim N(\mu_2, \sigma_2^2)$, where $\mu_1, \mu_2, \sigma_1, \sigma_2$ are unknown, and $K_3: \mu_1 < \mu_2$, where generally $\sigma_1 \ne \sigma_2$.
(4) $F \sim N(\mu_1, \sigma^2)$, $G \sim N(\mu_2, \sigma^2)$, where $\mu_1, \mu_2, \sigma$ are unknown, and $K_4: \mu_1 < \mu_2$.

We would use the t-test against $K_4$; testing $H$ against $K_3$ is known as the Behrens-Fisher problem. We would use rank tests for $H$ against $K_1$ and $K_2$.

2. Practical problems which we can solve with the aid of rank tests or tests based on generalized ranks

(1) Two-sample tests of equality of two treatment effects against alternatives of shift in location or scale (Wilcoxon, van der Waerden and median rank tests; Siegel-Tukey and quartile rank tests). Some of these tests we shall describe later in detail.
(2) Two-sample tests of equality of two treatment effects against general one-sided or two-sided alternatives (Kolmogorov-Smirnov tests).
(3) Tests of equality of effects of several treatments (Kruskal-Wallis rank test).
(4) Tests of equality of effects of several treatments on observations organized in blocks (Friedman rank test).
(5) Tests of equality of effects of several treatments on observations categorized in contingency tables (Kruskal-Wallis test with midranks).
(6) Tests of equality of effects of two treatments based on paired observations (signed-rank tests: one-sample Wilcoxon, van der Waerden, sign test).
(7) Tests of independence in a bivariate population (Spearman rank correlation coefficient, Kendall's tau, quadrant test). For most of these cases there exists also a permutation test, based on the order statistics.
(8) Tests of the hypothesis $H: \beta = 0$, or more generally $H: A\beta = b$, in the linear regression model $Y = X\beta + e$.
(9) Tests of hypotheses on some components of $\beta$ in the linear regression model, with the other components nuisance, without the necessity to estimate the nuisance parameters (tests based on so-called regression rank scores).
(10) Tests on the parameters of the linear autoregressive time series model.
The nuisance parameters are either estimated (aligned rank tests) or the tests are based on the autoregression rank scores; in particular, tests on the order of the autoregression.
(11) Tests of independence of two autoregressive time series (based on autoregression rank scores), often desired in practice, though until recently there were no reasonable tests.

3. Nonparametric hypotheses and tests

Let $X = (X_1, \dots, X_n)$ be the vector of observations. The hypothesis $H$ and the alternative $K$ are two disjoint sets of probability distributions of $X$. The hypothesis usually states homogeneity, symmetry or independence, while the alternative means inhomogeneity, asymmetry, dependence, etc.

Every rule which assigns just one of the decisions "accept $H$" or "reject $H$" to any point $x = (x_1, \dots, x_n)$ is called a (nonrandomized) test of the hypothesis $H$ against the alternative $K$. Such a test partitions the sample space $\mathcal X$ into two complementary parts: the critical region (rejection region) $A_K$ and the acceptance region $A_H$. The test rejects $H$ if $x \in A_K$ and accepts $H$ if $x \in A_H$.

To simplify the structure of the tests, we supplement the family of tests by randomized tests. A randomized test rejects $H$ with probability $\Phi(x)$ and accepts it with probability $1 - \Phi(x)$ while observing $x$, where $0 \le \Phi(x) \le 1$ $\forall x$ is the test function. The set of randomized tests coincides with the set $\{\Phi(x): 0 \le \Phi \le 1\}$ and hence it is convex.

If we make the test on the basis of observations $x$, then either our decision is correct or we make one of the following two kinds of errors: (1) we reject $H$ even if it is correct (error of the first kind); (2) we accept $H$ even if it is incorrect (error of the second kind).

If $X$ has distribution $P$, then the test rejects $H$ with the probability
$$\beta(P) = \mathbb{E}_P(\Phi(X)) = \int_{\mathcal X} \Phi(x)\,dP(x).$$
The probability $\beta(Q) = \mathbb{E}_Q(\Phi(X))$, $Q \in K$, is called the power of the test against the alternative $Q$, and the function $\beta(Q): K \to [0,1]$ is called the power function of the test. The desirable test maximizes the power function uniformly over the whole alternative $K$ and has a small probability (smaller than a prescribed $\alpha$) of the error of the first kind for all distributions from the hypothesis $H$.

The criterion of optimality for tests: select a small number $\alpha$, $0 < \alpha < 1$, called the significance level, and among all tests satisfying $\beta(P) \le \alpha$ $\forall P \in H$ look for the test satisfying $\beta(Q) = \max$ $\forall Q \in K$. Such a test, if it exists, is called the uniformly most powerful test of size $\alpha$, briefly the uniformly most powerful $\alpha$-test of $H$ against $K$.

A simple hypothesis [alternative] means that $H$ [$K$] is a one-point set (otherwise it is called composite). The test of a simple hypothesis against a simple alternative is given by the fundamental Neyman-Pearson lemma.

Neyman-Pearson Lemma. Let $P$ and $Q$ be two probability distributions with densities $p$ and $q$ with respect to some measure $\mu$ (e.g., $\mu = P + Q$). Then, for testing the simple hypothesis $H: \{P\}$ against the simple alternative $K: \{Q\}$, there exist a test $\Phi$ and a constant $k$ such that
$$\mathbb{E}_P(\Phi(X)) = \alpha \tag{1}$$
and
$$\Phi(x) = \begin{cases} 1 & \text{if } q(x) > k\,p(x) \\ 0 & \text{if } q(x) < k\,p(x). \end{cases} \tag{2}$$
This test is the most powerful $\alpha$-test of $H$ against $K$.
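To make the lemma concrete, here is a minimal numerical sketch (ours, not part of the notes) for the simple hypothesis $P = N(0,1)$ against $Q = N(1,1)$ based on $n$ i.i.d. observations; the likelihood ratio is increasing in $\sum x_i$, so the most powerful $\alpha$-test rejects for large $\sum x_i$:

```python
import numpy as np
from scipy.stats import norm

# Neyman-Pearson test of H: N(0,1) against K: N(1,1) from n i.i.d. observations.
# The likelihood ratio q(x)/p(x) = exp(sum(x) - n/2) is increasing in sum(x),
# so (2) reduces to: reject when sum(x) > c, with c fixed by (1):
# P_H(sum(X) > c) = alpha, i.e. c = sqrt(n) * z_{1-alpha}.
n, alpha = 25, 0.05
c = np.sqrt(n) * norm.ppf(1 - alpha)
power = 1 - norm.cdf((c - n) / np.sqrt(n))  # P_K(sum(X) > c), sum(X) ~ N(n, n)
print(f"c = {c:.3f}, power = {power:.3f}")
```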
4. Invariant tests

Let $g$ be a 1:1 transformation of $\mathcal X$ onto $\mathcal X$. We say that the problem of testing $H$ against $K$ is invariant with respect to $g$ if $g$ retains both $H$ and $K$, i.e.,
$$X \text{ satisfies } H \iff gX \text{ satisfies } H, \qquad X \text{ satisfies } K \iff gX \text{ satisfies } K.$$
If the problem of testing $H$ against $K$ is invariant with respect to a group $\mathcal G$ of transformations of $\mathcal X$ onto $\mathcal X$, then we naturally consider only the invariant tests, which satisfy
$$\Phi(gx) = \Phi(x) \quad \forall x \in \mathcal X, \; g \in \mathcal G.$$
We shall then look for the most powerful invariant $\alpha$-test. In some cases there exists a statistic $T(X)$, called the maximal invariant, such that every invariant test is a function of $T(X)$.

Definition. The statistic $T = T(X)$ is called maximal invariant with respect to the group $\mathcal G$ of transformations provided $T$ is invariant, i.e., $T(gx) = T(x)$ $\forall x \in \mathcal X$, $g \in \mathcal G$, and if $T(x_1) = T(x_2)$, then there exists $g \in \mathcal G$ such that $x_2 = g x_1$.

The test $\Phi$ is invariant with respect to $\mathcal G$ if and only if it is a function of the maximal invariant.

Examples of maximal invariants.

(1) Let $\mathcal G$ be the set of $n!$ permutations of $x_1, \dots, x_n$. Then the vector of ordered components of $x$ (the vector of order statistics) $T(x) = (x_{n:1} \le x_{n:2} \le \dots \le x_{n:n})$ is the maximal invariant with respect to $\mathcal G$.

(2) Let $\mathcal G$ be the set of transformations $x_i' = f(x_i)$, $i = 1, \dots, n$, such that $f: \mathbb{R}^1 \to \mathbb{R}^1$ is a continuous and strictly increasing function. Consider only the points of the sample space $\mathcal X$ with different components. Let $R_i$ be the rank of $x_i$ among $x_1, \dots, x_n$, i.e., $R_i = \sum_{j=1}^n I[x_j \le x_i]$, $i = 1, \dots, n$. Then $T(x) = (R_1, \dots, R_n)$ is the maximal invariant for $\mathcal G$. Actually, a continuous and increasing function does not change the ranks of the components of $x$, i.e., $T$ is invariant to $\mathcal G$. On the other hand, let two different vectors $x$ and $x'$ have the same vector of ranks $R_1, \dots, R_n$. Put $f(x_i) = x_i'$, $i = 1, \dots, n$, and let $f$ be linear on the intervals $[x_{n:1}, x_{n:2}], \dots, [x_{n:n-1}, x_{n:n}]$; define $f$ on the rest of the real line so that it is strictly increasing. Such an $f$ always exists, hence $T$ is the maximal invariant.

5. Properties of ranks and of order statistics

Let $X = (X_1, \dots, X_n)$ be the vector of observations; denote $X_{n:1} \le X_{n:2} \le \dots \le X_{n:n}$ the components of $X$ ordered according to increasing magnitude. The vector $X_{(\cdot)} = (X_{n:1}, \dots, X_{n:n})$ is called the vector of order statistics and $X_{n:i}$ is called the $i$-th order statistic. Assume that the components of $X$ are different and define the rank of $X_i$ as $R_i = \sum_{j=1}^n I[X_j \le X_i]$. Then the vector $R$ of ranks of $X$ takes on values in the set $\mathcal R$ of $n!$ permutations $(r_1, \dots, r_n)$ of $(1, \dots, n)$.

The distribution of $X_{(\cdot)}$ and of $R$: if $X$ has density $p(x_1, \dots, x_n)$, then the vector $X_{(\cdot)}$ of order statistics has the distribution with the density
$$\bar p(x_{n:1}, \dots, x_{n:n}) = \begin{cases} \sum_{r \in \mathcal R} p(x_{n:r_1}, \dots, x_{n:r_n}) & x_{n:1} \le \dots \le x_{n:n} \\ 0 & \text{otherwise.} \end{cases}$$

We say that the random vector $X$ satisfies the hypothesis of randomness $H_0$ if it has a probability distribution with density of the form $p(x) = \prod_{i=1}^n f(x_i)$, $x \in \mathbb{R}^n$, where $f$ is an arbitrary one-dimensional density. Otherwise speaking, $X$ satisfies the hypothesis of randomness provided its components are a random sample from an absolutely continuous distribution.

If $X$ satisfies the hypothesis of randomness $H_0$, then $X_{(\cdot)}$ and $R$ are independent, the vector of ranks $R$ has the uniform discrete distribution
$$\Pr(R = r) = \frac{1}{n!}, \quad r \in \mathcal R, \tag{3}$$
and the distribution of $X_{(\cdot)}$ has the density
$$\bar p(x_{n:1}, \dots, x_{n:n}) = \begin{cases} n!\,p(x_{n:1}, \dots, x_{n:n}) & x_{n:1} \le \dots \le x_{n:n} \\ 0 & \text{otherwise.} \end{cases}$$

Marginal distributions of the random vectors $R$ and $X_{(\cdot)}$ under $H_0$:
(i) $\Pr(R_i = j) = \frac{1}{n}$, $i, j = 1, \dots, n$.
(ii) $\Pr(R_i = k, R_j = m) = \frac{1}{n(n-1)}$ for $1 \le i, j, k, m \le n$, $i \ne j$, $k \ne m$.
(iii) $\mathbb{E} R_i = \frac{n+1}{2}$, $i = 1, \dots, n$.
(iv) $\operatorname{var} R_i = \frac{n^2-1}{12}$, $i = 1, \dots, n$.
(v) $\operatorname{cov}(R_i, R_j) = -\frac{n+1}{12}$, $1 \le i, j \le n$, $i \ne j$.
(vi) If $X$ has density $p(x_1, \dots, x_n) = \prod_{i=1}^n f(x_i)$, then $X_{n:k}$ has the distribution with density
$$f_{(n)}(x) = n \binom{n-1}{k-1} (F(x))^{k-1} (1 - F(x))^{n-k} f(x), \quad x \in \mathbb{R}^1,$$
where $F(x)$ is the distribution function of $X_1, \dots, X_n$.
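The moment formulas (iii)-(v) are easy to confirm numerically; the following sketch (ours, not part of the notes) estimates them by simulation under the hypothesis of randomness:

```python
import numpy as np

# Empirical check of E R_i = (n+1)/2, var R_i = (n^2-1)/12 and
# cov(R_i, R_j) = -(n+1)/12 under the hypothesis of randomness.
rng = np.random.default_rng(0)
n, reps = 10, 100_000
ranks = np.empty((reps, n))
for k in range(reps):
    x = rng.standard_normal(n)
    ranks[k] = x.argsort().argsort() + 1      # R_i = rank of X_i
print(ranks[:, 0].mean(), (n + 1) / 2)                        # E R_1
print(ranks[:, 0].var(), (n**2 - 1) / 12)                     # var R_1
print(np.cov(ranks[:, 0], ranks[:, 1])[0, 1], -(n + 1) / 12)  # cov(R_1, R_2)
```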
6. Locally most powerful rank tests

We want to test a hypothesis of randomness $H_0$ on the distribution of $X$. A rank test is characterized by a test function $\Phi(R)$. The most powerful rank $\alpha$-test of $H_0$ against a simple alternative $K: \{Q\}$ [that $X$ has the fixed distribution $Q$] follows directly from the Neyman-Pearson Lemma:
$$\Phi(r) = \begin{cases} 1 & n!\,Q(R = r) > k \\ \gamma & n!\,Q(R = r) = k \\ 0 & n!\,Q(R = r) < k, \end{cases} \quad r \in \mathcal R,$$
where $k$ and $\gamma$ are determined so that
$$\#\{r: n!\,Q(R = r) > k\} + \gamma\,\#\{r: n!\,Q(R = r) = k\} = \alpha\,n!, \quad 0 < \gamma < 1.$$

If we want to test against a composite alternative and a uniformly most powerful rank test does not exist, then we look for a rank test most powerful locally in a neighborhood of the hypothesis.

Definition. Let $d(Q)$ be a measure of the distance of the alternative $Q \in K$ from the hypothesis $H$. The $\alpha$-test $\Phi_0$ is called locally most powerful in the class $\mathcal M$ of $\alpha$-tests of $H$ against $K$ if, given any other test $\Phi \in \mathcal M$, there exists $\varepsilon > 0$ such that $\beta_{\Phi_0}(Q) \ge \beta_{\Phi}(Q)$ for every $Q$ satisfying $0 < d(Q) < \varepsilon$.

7. The structure of the locally most powerful rank tests of $H_0$

Let $\mathcal A$ be a class of densities, $\mathcal A = \{g(x, \theta): \theta \in J\}$, such that $J \subset \mathbb{R}^1$ is an open interval containing 0; $g(x, \theta)$ is absolutely continuous in $\theta$ for almost all $x$; for almost all $x$ there exists the limit
$$\dot g(x, 0) = \lim_{\theta \to 0} \frac{1}{\theta}\,[g(x, \theta) - g(x, 0)],$$
and
$$\lim_{\theta \to 0} \int |\dot g(x, \theta)|\,dx = \int |\dot g(x, 0)|\,dx.$$
Consider the alternative $K = \{q_\theta: \theta > 0\}$, where $q_\theta(x_1, \dots, x_n) = \prod_{i=1}^n g(x_i, \theta c_i)$, with $c_1, \dots, c_n$ given numbers. Then the test with the critical region
$$\sum_{i=1}^n c_i a_n(R_i, g) \ge k$$
is the locally most powerful rank test of $H_0$ against $K$ with the significance level $\alpha = P(\sum_{i=1}^n c_i a_n(R_i, g) \ge k)$, where $P$ is any distribution satisfying $H_0$,
$$a_n(i, g) = \mathbb{E}\left[\frac{\dot g(X_{n:i}, 0)}{g(X_{n:i}, 0)}\right], \quad i = 1, \dots, n,$$
and $X_{n:1}, \dots, X_{n:n}$ are the order statistics corresponding to the random sample of size $n$ from the population with the density $g(x, 0)$.

8. Special cases

I. Alternative of shift in location: $K_1: \{q_\Delta: \Delta > 0\}$, where
$$q_\Delta(x_1, \dots, x_N) = \prod_{i=1}^m f(x_i) \prod_{i=m+1}^N f(x_i - \Delta),$$
with $f$ a fixed absolutely continuous density such that $\int_{-\infty}^{\infty} |f'(x)|\,dx < \infty$. Then the locally most powerful rank $\alpha$-test of $H_0$ against $K_1$ has the critical region
$$\sum_{i=m+1}^N a_N(R_i, f) \ge k_\alpha,$$
where $k_\alpha$ satisfies the condition $P(\sum_{i=m+1}^N a_N(R_i, f) \ge k_\alpha) = \alpha$, $P \in H_0$, and
$$a_N(i, f) = \mathbb{E}\left[-\frac{f'(X_{N:i})}{f(X_{N:i})}\right], \quad i = 1, \dots, N,$$
where $X_{N:1} < \dots < X_{N:N}$ are the order statistics corresponding to the sample of size $N$ from the distribution with the density $f$. The scores may also be written as
$$a_N(i, f) = \mathbb{E}\,\varphi(U_{N:i}, f), \quad i = 1, \dots, N, \qquad \varphi(u, f) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}, \quad 0 < u < 1,$$
where $U_{N:1}, \dots, U_{N:N}$ are the order statistics corresponding to the sample of size $N$ from the uniform $R(0,1)$ distribution. The scores can also be expressed in the form
$$a_N(i, f) = -N \binom{N-1}{i-1} \int f'(x)\,F^{i-1}(x)\,(1 - F(x))^{N-i}\,dx.$$

Remark. The computation of the scores is difficult for some densities; if there are no tables of the scores at disposal, they are often replaced by the approximate scores
$$a_N^*(i, f) = \varphi\!\left(\frac{i}{N+1}, f\right) = \varphi(\mathbb{E} U_{N:i}, f), \quad i = 1, \dots, N.$$
The asymptotic critical values coincide for both types of scores.
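The two most common score functions can be evaluated directly. The following sketch (ours) computes the approximate scores $a_N^*(i, f) = \varphi(i/(N+1), f)$ of the Remark for the logistic density, where $\varphi(u, f) = 2u - 1$, and for the normal density, where $\varphi(u, f) = \Phi^{-1}(u)$:

```python
import numpy as np
from scipy.stats import norm

# Approximate scores a*_N(i, f) = phi(i/(N+1), f) from the Remark above.
N = 10
u = np.arange(1, N + 1) / (N + 1)
wilcoxon = 2 * u - 1            # logistic f: phi(u, f) = 2u - 1 (Wilcoxon test)
van_der_waerden = norm.ppf(u)   # normal f: phi(u, f) = Phi^{-1}(u) (van der Waerden)
print(np.round(wilcoxon, 3))
print(np.round(van_der_waerden, 3))
```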
II. Alternative of two samples differing in scale: $K_2: \{q_\theta: \theta > 0\}$, where
$$q_\theta(x_1, \dots, x_N) = \prod_{i=1}^m f(x_i - \mu) \prod_{i=m+1}^N e^{-\theta} f\big((x_i - \mu)e^{-\theta}\big), \quad \theta > 0,$$
where the density $f$ satisfies $\int_{-\infty}^{\infty} |x f'(x)|\,dx < \infty$ and $\mu$ is a nuisance parameter. Then the locally most powerful test has the critical region
$$\sum_{i=m+1}^N a_N^1(R_i, f) \ge k,$$
where $k$ is determined by the condition $P(\sum_{i=m+1}^N a_N^1(R_i, f) \ge k) = \alpha$, $P \in H_0$, and the scores have the form
$$a_N^1(i, f) = \mathbb{E}\left[-1 - X_{N:i}\,\frac{f'(X_{N:i})}{f(X_{N:i})}\right] = \mathbb{E}\,\varphi_1(U_{N:i}, f), \quad i = 1, \dots, N,$$
where
$$\varphi_1(u, f) = -1 - F^{-1}(u)\,\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}, \quad 0 < u < 1.$$
In this case, too, we can replace the scores by the approximate scores $a_N^{1*}(i, f) = \varphi_1\!\left(\frac{i}{N+1}, f\right)$, $i = 1, \dots, N$.

III. Alternative of simple regression: $K_3 = \{q_\theta: \theta > 0\}$, where $q_\theta(x_1, \dots, x_N) = \prod_{i=1}^N f(x_i - \theta c_i)$ with a fixed absolutely continuous density $f$ and given constants $c_1, \dots, c_N$, $\sum_{i=1}^N c_i^2 > 0$. Then the locally most powerful test has the critical region $\sum_{i=1}^N c_i a_N(R_i, f) \ge k$, with the same scores as in I and with $k$ determined by the condition $P(\sum_{i=1}^N c_i a_N(R_i, f) \ge k) = \alpha$.

9. Selected two-sample rank tests

Denote $(X_1, \dots, X_m, Y_1, \dots, Y_n) = (Z_1, \dots, Z_N)$, $N = m + n$, where $(X_1, \dots, X_m)$ has distribution function $F$ and $(Y_1, \dots, Y_n)$ has distribution function $G$. Consider testing $H_0: F \equiv G$ against the alternative $K_1: G(x) \le F(x)$ $\forall x \in \mathbb{R}^1$, $G(x) < F(x)$ at least for one $x$. $K_1$ is a one-sided alternative stating that the random variable $Y$ is stochastically larger than $X$.

The problem of testing $H_0$ against $K_1$ is invariant with respect to the group $\mathcal G$ of transformations $z_i' = g(z_i)$, $i = 1, \dots, N$, where $g$ is any continuous strictly increasing function, with the vector of ranks $R_1, \dots, R_N$ of $Z_1, \dots, Z_N$ as the maximal invariant. The class of invariant tests thus coincides with that of rank tests. Because both $(X_1, \dots, X_m)$ and $(Y_1, \dots, Y_n)$ are random samples, the distribution of the vector of ranks $(R_1, \dots, R_m, R_{m+1}, \dots, R_{m+n})$ is symmetric in the first $m$ and in the last $n$ arguments. Hence, the vectors of ordered ranks $R_1^* < \dots < R_m^*$ and $R_{m+1}^* < \dots < R_{m+n}^*$ are sufficient. Because either of these vectors determines the other, the family of invariant tests of $H_0$ against $K_1$ reduces to the tests dependent only on the ordered ranks of one of the samples, e.g., on the ordered ranks of $Y_1, \dots, Y_n$.

The vector $(R_{m+1}^*, \dots, R_N^*)$ runs over $\binom{N}{n}$ combinations. All these combinations are equally probable under $H_0$, and hence the critical region of each rank test of the size $\alpha = k/\binom{N}{n}$ consists of just $k$ points $(s_1, \dots, s_n)$, $1 \le s_1 < \dots < s_n \le N$. The rank tests mutually differ in the points included in their critical regions. The above alternative $K_1$ is still too rich, and hence there does not exist a uniformly most powerful rank test of $H_0$ against $K_1$. However, we are able to find rank tests locally most powerful for $H_0$ against some important subsets of $K_1$.

11. Two-sample tests of location

Consider the special alternative of $K_1$, namely that $G$ differs from $F$ by a shift in location, i.e., $K_2: G(x) = F(x - \Delta)$, $\Delta > 0$. If we know that $F$ is normal, we use the two-sample t-test. Generally, the test statistic of any rank test is a function of the ordered ranks of the second sample. The locally most powerful test generally has a critical region of the form $\sum_{i=m+1}^N a_N(R_i) \ge k$; hence the test criterion really depends only on the ordered ranks of the $Y_i$'s. The scores $a_N(i) = \mathbb{E}\,\varphi(U_{N:i})$ (or the approximate scores $a_N^*(i) = \varphi(\frac{i}{N+1})$), $i = 1, \dots, N$, are generated by an appropriate score function $\varphi: (0,1) \to \mathbb{R}^1$. Three basic tests of this type are most often used in practice:

(i) Wilcoxon (Mann-Whitney) test. The Wilcoxon test has the critical region
$$W = \sum_{i=m+1}^N R_i \ge k, \tag{4}$$
i.e., the test function
$$\Phi(x) = \begin{cases} 1 & W > k \\ \gamma & W = k \\ 0 & W < k, \end{cases}$$
where $k$ and $\gamma$ are determined so that $P_{H_0}(W > k) + \gamma P_{H_0}(W = k) = \alpha$ ($\alpha = 0.05$, $\alpha = 0.01$). This test is locally most powerful against $K_2$ with $F$ logistic, i.e., with the density
$$f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \quad x \in \mathbb{R};$$
indeed, for logistic $f$ we have $\varphi(u, f) = 2u - 1$, so the scores are linear in the ranks.
For small $m$ and $n$, the critical value $k$ can be determined directly: for each combination $s_1 < \dots < s_n$ of the numbers $1, \dots, N$ we calculate $\sum_{i=1}^n s_i$ and order these values by increasing magnitude. The critical region is formed by the $M_N$ largest sums, where $M_N = \alpha \binom{N}{n}$; if there is no integer $M_N$ satisfying this condition, we take the largest integer $M_N$ less than $\alpha \binom{N}{n}$ and randomize on the combination which leads to the $(M_N + 1)$-st largest value. However, this systematic way, though precise, becomes laborious for large $N$, where we should use tables of critical values.

There exist various tables of the Wilcoxon test, organized in various ways. Many tables provide the critical values of the Mann-Whitney statistic
$$U_N = \sum_{i=m+1}^N \sum_{j=1}^m I[Z_j < Z_i];$$
we can easily see that $U_N$ and $W_N$ are in the one-to-one relation $W_N = U_N + \frac{n(n+1)}{2}$.

For an application of the Wilcoxon test we can alternatively use the dual form of the Wilcoxon statistic: let $Z_{N:1} < \dots < Z_{N:N}$ be the order statistics of the pooled sample and define $V_1, \dots, V_N$ in the following way: $V_i = 0$ if $Z_{N:i}$ belongs to the first sample and $V_i = 1$ if $Z_{N:i}$ belongs to the second sample. Then $W_N = \sum_{i=1}^N i\,V_i$.

For large $m$ and $n$, where there are no tables, we use the normal approximation of $W_N$: if $m, n \to \infty$, then, under $H_0$, $W_N$ has asymptotically normal distribution in the following sense:
$$\lim_{m,n\to\infty} P_{H_0}\!\left(\frac{W_N - \mathbb{E} W_N}{\sqrt{\operatorname{var} W_N}} < x\right) = \Phi(x), \quad x \in \mathbb{R}^1,$$
where $\Phi$ is the standard normal distribution function. To be able to use the normal approximation, we must know the expectation and variance of $W_N$ under $H_0$. The following theorem gives the expectation and variance of a more general linear rank statistic, covering the Wilcoxon as well as other rank tests.

Theorem. Let the random vector $(R_1, \dots, R_N)$ have the discrete uniform distribution on the set $\mathcal R$ of all permutations of the numbers $1, \dots, N$, i.e., $\Pr(R = r) = \frac{1}{N!}$, $r \in \mathcal R$, and let $c_1, \dots, c_N$ and $a_1 = a(1), \dots, a_N = a(N)$ be arbitrary constants. Then the expectation and variance of the linear rank statistic $S_N = \sum_{i=1}^N c_i a(R_i)$ are
$$\mathbb{E} S_N = \frac{1}{N} \sum_{i=1}^N c_i \sum_{j=1}^N a_j, \qquad \operatorname{var} S_N = \frac{1}{N-1} \sum_{i=1}^N (c_i - \bar c)^2 \sum_{j=1}^N (a_j - \bar a)^2,$$
where $\bar c = \frac{1}{N}\sum_{i=1}^N c_i$ and $\bar a = \frac{1}{N}\sum_{i=1}^N a_i$.

Parameters of the Wilcoxon statistic under $H_0$:
$$\mathbb{E} W_N = \frac{n(N+1)}{2}, \qquad \operatorname{var} W_N = \frac{mn(N+1)}{12}.$$
The distribution of $W_N$ under $H_0$ is symmetric about $\mathbb{E} W_N$. If we test $H_0$ against the left-sided alternative ($\Delta < 0$, the second sample shifted to the left with respect to the first one), we reject $H_0$ if $W_N < 2\,\mathbb{E} W_N - k$.
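A compact sketch of the Wilcoxon test with the normal approximation just derived (the helper name is ours; for small $m, n$ the exact tables should be preferred):

```python
import numpy as np
from scipy.stats import norm

# Two-sample Wilcoxon test via the normal approximation,
# using E W_N = n(N+1)/2 and var W_N = mn(N+1)/12 from above.
def wilcoxon_two_sample(x, y, alpha=0.05):
    m, n = len(x), len(y)
    N = m + n
    pooled = np.concatenate([x, y])
    ranks = pooled.argsort().argsort() + 1   # ranks of the pooled sample
    W = ranks[m:].sum()                      # W_N = sum of ranks of the y's
    EW, varW = n * (N + 1) / 2, m * n * (N + 1) / 12
    stat = (W - EW) / np.sqrt(varW)
    return W, stat, stat > norm.ppf(1 - alpha)   # reject H0 for large W_N

rng = np.random.default_rng(1)
print(wilcoxon_two_sample(rng.normal(0, 1, 30), rng.normal(0.8, 1, 25)))
```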
(ii) van der Waerden test. Consider the approximate scores corresponding to the score function $\varphi(u) = \Phi^{-1}(u)$, $0 < u < 1$, where $\Phi$ is the standard normal distribution function, i.e., the test statistic $S_N = \sum_{i=m+1}^N \Phi^{-1}(\frac{R_i}{N+1})$. The van der Waerden test is convenient for testing $H_0$ against $K_1$ if the distribution function $F$ has approximately normal tails. In fact, the test is asymptotically optimal against the normal alternatives, and its asymptotic relative efficiency (Pitman efficiency) with respect to the t-test is equal to 1 under normal $F$ and at least 1 under all non-normal $F$. For these good properties the test can be recommended. For large $m, n$, if we do not have tables at disposal, we can use the critical values based on the normal approximation $N(\mathbb{E} S_N, \operatorname{var} S_N)$, where in the van der Waerden case, by the above theorem,
$$\mathbb{E} S_N = 0, \qquad \operatorname{var} S_N = \frac{mn}{N(N-1)} \sum_{i=1}^N \left[\Phi^{-1}\!\left(\frac{i}{N+1}\right)\right]^2.$$
Moreover, the distribution of $S_N$ under $H_0$ is symmetric about 0.

(iii) Median test. The median test uses the scores generated by the score function
$$\varphi(u) = \begin{cases} 0 & 0 < u < \frac12 \\ \frac12 & u = \frac12 \\ 1 & \frac12 < u < 1. \end{cases}$$
The test statistic is the number of $Y$-observations situated above the median of the pooled sample, increased by $\frac12$ for odd $N$. If $N$ is even, $M = N/2$, then, under $H_0$, $S_N$ has the hypergeometric probability distribution:
$$\Pr(S_N = k \mid H_0) = \begin{cases} \binom{M}{k}\binom{M}{n-k} \Big/ \binom{N}{n} & \max(0, n - M) \le k \le \min(M, n) \\ 0 & \text{otherwise.} \end{cases}$$
Hence, we can use the critical values from the tables of the hypergeometric distribution. For a large number of observations we use the normal approximation with the parameters
$$\mathbb{E} S_N = \frac{n}{2}, \qquad \operatorname{var} S_N = \frac{mn}{4(N-1)}.$$
The median test is most convenient for heavy-tailed $F$ with density $f$ such that, although $\lim_{|x|\to\infty} f(x) = 0$, this convergence is much slower than in the case of the normal or logistic distributions (e.g., for the Cauchy distribution).

12. Two-sample rank tests of scale

Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be two samples with the respective distribution functions $F(x - \mu)$ and $G(y - \mu)$, where $\mu$ is an unknown nuisance shift parameter. We wish to test the hypothesis of randomness $H_0: F \equiv G$ against the two-sample alternative of scale
$$K_4: G(x - \mu) = F\!\left(\frac{x - \mu}{\theta}\right) \quad \forall x \in \mathbb{R}^1, \; \theta > 1.$$
Instead of the tests optimal against some special shapes of $F$, whose scores have a complicated form, we shall rather describe tests with simple scores which are really used in practice. The score function $\varphi_1$ for the scale alternatives is U-shaped, and the test statistics are of the form
$$S_N = \sum_{i=m+1}^N \varphi_1\!\left(\frac{R_i}{N+1}\right).$$

(i) The Siegel-Tukey test. This test is based on a reordering of the observations, leading to new ranks and to a test statistic whose distribution under $H_0$ is the same as that of the Wilcoxon statistic. Let $Z_{N:1} < Z_{N:2} < \dots < Z_{N:N}$ be the order statistics corresponding to the pooled sample of $N = m + n$ variables. Re-order this vector in the following way:
$$Z_{N:1},\, Z_{N:N},\, Z_{N:N-1},\, Z_{N:2},\, Z_{N:3},\, Z_{N:N-2},\, Z_{N:N-3},\, Z_{N:4},\, Z_{N:5},\, \dots$$
and denote $\tilde R_i$ the new rank of $Z_i$ with respect to this new order, $i = 1, \dots, N$. The critical region of the Siegel-Tukey test has the form
$$S_N^* = \sum_{i=m+1}^N \tilde R_i \le k^*,$$
where $k^*$ is determined so that $P_{H_0}(S_N^* < k^*) + \gamma\,P_{H_0}(S_N^* = k^*) = \alpha$. The distribution of $S_N^*$ under $H_0$ coincides with the distribution of the Wilcoxon statistic, hence we can use the tables of the Wilcoxon test.

(ii) Quartile test. The quartile test is based on the score function
$$\varphi_1(u) = \begin{cases} 1 & 0 < u < \frac14 \text{ or } \frac34 < u < 1 \\ \frac12 & u = \frac14 \text{ or } u = \frac34 \\ 0 & \frac14 < u < \frac34, \end{cases}$$
which leads to the test statistic
$$S_N = \frac12 \sum_{i=m+1}^N \left[\operatorname{sign}\!\left(\left|\frac{R_i}{N+1} - \frac12\right| - \frac14\right) + 1\right],$$
and we reject $H_0$ for large values of $S_N$. Unless $N + 1$ is divisible by 4, the value of $S_N$ is the number of observations of the $Y$-sample which belong either to the first or to the fourth quarter of the pooled sample. If $N$ is divisible by 4, then $S_N$ has the hypergeometric distribution under $H_0$, analogously to the median test.
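The Siegel-Tukey relabelling described above is easy to mechanize; a sketch (the function name is ours), after which $S_N^* = \sum_{i=m+1}^N \tilde R_i$ is just the Wilcoxon statistic computed from the new ranks:

```python
import numpy as np

# Siegel-Tukey ranks: assign 1, 2, 3, ... along the zig-zag order
# Z(1), Z(N), Z(N-1), Z(2), Z(3), Z(N-2), Z(N-3), Z(4), Z(5), ...
def siegel_tukey_ranks(z):
    order = list(np.argsort(z))        # indices of Z(1) <= ... <= Z(N)
    zigzag, low = [], True
    while order:
        if low:                        # one from the bottom, one from the top
            zigzag.append(order.pop(0))
            if order:
                zigzag.append(order.pop())
        else:                          # then one from the top, one from the bottom
            zigzag.append(order.pop())
            if order:
                zigzag.append(order.pop(0))
        low = not low
    new_rank = np.empty(len(z), dtype=int)
    new_rank[zigzag] = np.arange(1, len(z) + 1)
    return new_rank                    # new_rank[i] is the rank of Z_i
```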
13. Rank tests of $H_0$ against general two-sample alternatives, based on the empirical distribution functions

Again, $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ are two samples with the respective distribution functions $F$ and $G$. We wish to test the hypothesis of randomness $H_0: F \equiv G$ either against the one-sided alternative $K_5^+: G(x) \le F(x)$ $\forall x$, $F \ne G$, or against the general alternative $K_5: F \ne G$. Testing against $K_5$ is invariant with respect to all continuous one-to-one transformations, and there is no reasonable maximal invariant under this setup. In this case we usually use tests based on the empirical distribution functions, which are the maximum likelihood estimators of the theoretical distribution functions in this nonparametric setup. We shall describe the Kolmogorov-Smirnov tests; another known test of this type is the Cramér-von Mises test.

The empirical distribution function $\hat F_m$ corresponding to the sample $X_1, \dots, X_m$ is defined as
$$\hat F_m(x) = \frac{1}{m} \sum_{i=1}^m I[X_i \le x], \quad x \in \mathbb{R}^1;$$
the empirical distribution function $\hat G_n$ for the sample $Y_1, \dots, Y_n$ is defined analogously. Denote
$$D_{mn}^+ = \max_{x \in \mathbb{R}^1} [\hat F_m(x) - \hat G_n(x)], \qquad D_{mn} = \max_{x \in \mathbb{R}^1} |\hat F_m(x) - \hat G_n(x)|.$$

The Kolmogorov-Smirnov test against $K_5$ has the test function
$$\Phi(X, Y) = \begin{cases} 1 & D_{mn} > C \\ \gamma & D_{mn} = C \\ 0 & D_{mn} < C. \end{cases}$$

The statistic $D_{mn}$ is a rank statistic, though not a linear one. To see this, consider the order statistics $Z_{N:1} < \dots < Z_{N:N}$ of the pooled sample and establish the indicators $V_1, \dots, V_N$, where $V_j = 0$ if $Z_{N:j}$ comes from the $X$-sample and $V_j = 1$ otherwise. Because $\hat F_m$ and $\hat G_n$ are nondecreasing step functions, the maximum can be attained only at one of the points $Z_{N:1}, \dots, Z_{N:N}$; moreover,
$$\hat F_m(Z_{N:j}) - \hat G_n(Z_{N:j}) = \frac{m+n}{mn}\left(\frac{jn}{m+n} - V_1 - \dots - V_j\right), \quad j = 1, \dots, N,$$
which gives the value of the test criterion
$$D_{mn} = \frac{m+n}{mn}\,\max_{1 \le j \le N} \left|\frac{jn}{m+n} - V_1 - \dots - V_j\right|.$$
Notice that this expression depends only on $V_1, \dots, V_N$; on the other hand, $V_i = 1$ iff one of the ranks $R_{m+1}, \dots, R_N$ is equal to $i$, while $V_i = 0$ iff one of the ranks $R_1, \dots, R_m$ is equal to $i$. Thus $V_1, \dots, V_N$ depend only on the ranks, and so does $D_{mn}$. This implies that the distribution of $D_{mn}$ under $H_0$ is the same for all $F$. The expression is also used for the numerical calculation of $D_{mn}$. An analogous consideration holds for the one-sided Kolmogorov-Smirnov criterion, which can be expressed in the form
$$D_{mn}^+ = \frac{m+n}{mn}\,\max_{1 \le j \le N} \left(\frac{jn}{m+n} - V_1 - \dots - V_j\right).$$
For large $m, n$ we can use the limiting critical values of the tests, but the asymptotic distributions of the criteria are not normal. More precisely, it holds that
$$\lim_{m,n\to\infty} P_{H_0}\!\left(\left(\frac{mn}{m+n}\right)^{1/2} D_{mn}^+ \le x\right) = 1 - \exp\{-2x^2\}, \quad x > 0.$$

14. Modification of tests in the presence of ties

If both distribution functions $F$ and $G$ are continuous, then all observations are different with probability 1 and the ranks are well defined. However, we round the observations to a finite number of decimal places, so in fact we express all measurements on a countable grid. In such a case the possibility of ties cannot be ignored, and we should consider possible modifications of the rank tests for this situation. Let us first make several general remarks:

- If the tied observations belong to the same sample, then their mutual ordering does not affect the value of the test criterion. Hence, we should mainly consider ties between observations from different samples.
- A small number of tied observations can eventually be omitted, but this is paid for by a loss of information.
- Some test statistics are well defined even in the presence of ties; the ties may only change the probabilities of the errors of the first and second kind. Let us mention the Kolmogorov-Smirnov test as an example: the definitions of the empirical distribution function and of the test criterion make sense even in the presence of ties. However, if we use the tabulated critical values of the Kolmogorov-Smirnov test in this situation, the size of the critical region will be less than the prescribed significance level. Actually, we may then consider our observations $X_1, \dots, X_m, Y_1, \dots, Y_n$ as data rounded from continuous data $X_1', \dots, X_m', Y_1', \dots, Y_n'$. Then the possible values of $\hat F_m(x) - \hat G_n(x)$, $x \in \mathbb{R}^1$, form a subset of the possible values of $\hat F_m'(x) - \hat G_n'(x)$, $x \in \mathbb{R}^1$, where $\hat F_m'$ and $\hat G_n'$ are the empirical distribution functions of the $X_i'$'s and $Y_j'$'s, respectively; hence
$$\max_x [\hat F_m(x) - \hat G_n(x)] \le \max_x [\hat F_m'(x) - \hat G_n'(x)],$$
and similarly for the maxima of the absolute values.

We shall describe two possible modifications of the rank tests in the presence of ties: randomization and the method of midranks.
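Returning to Section 13, the rank-based formula for $D_{mn}$ translates directly into code (a sketch, ours; ties are ignored here, consistent with continuous $F$ and $G$):

```python
import numpy as np

# D+_mn and D_mn from the indicators V_1, ..., V_N of Section 13.
def ks_two_sample(x, y):
    m, n = len(x), len(y)
    z = np.concatenate([x, y])
    V = (np.argsort(z) >= m).astype(float)   # V_j = 1 iff Z_{N:j} is a y
    j = np.arange(1, m + n + 1)
    diff = (m + n) / (m * n) * (j * n / (m + n) - np.cumsum(V))
    return diff.max(), np.abs(diff).max()    # (D+_mn, D_mn)
```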
15. Randomization

Let $Z_1, \dots, Z_N$ be the pooled sample. Take independent random variables $U_1, \dots, U_N$, uniformly $R(0,1)$ distributed and independent of $Z_1, \dots, Z_N$. Order the pairs $(Z_1, U_1), \dots, (Z_N, U_N)$ in the following way: $(Z_i, U_i) < (Z_j, U_j)$ iff either $Z_i < Z_j$, or $Z_i = Z_j$ and $U_i < U_j$. Denote $R_1^*, \dots, R_N^*$ the ranks of the pairs $(Z_1, U_1), \dots, (Z_N, U_N)$. We shall say that $Z_1, \dots, Z_N$ satisfy the hypothesis $H^*$ if they are independent and identically distributed (not necessarily with an absolutely continuous distribution). Then, under $H^*$, the vector $(R_1^*, \dots, R_N^*)$ is uniformly distributed over the set $\mathcal R$ of permutations of $1, \dots, N$.

16. Method of midranks

The idea behind this method is that equal observations should have equal ranks; their common rank is then taken as the average of all ranks of the group. We shall describe this method mainly for the Wilcoxon test, but it is applicable to other tests as well. Assume that there are $e$ different values among the $N$ observations: $d_1$ observations equal to the smallest value, $d_2$ observations equal to the second smallest value, etc., $d_e$ observations equal to the largest value, $\sum_{i=1}^e d_i = N$. The average ranks of the individual groups are
$$v_1 = \dots = v_{d_1} = \tfrac12(d_1 + 1),$$
$$v_{d_1+1} = \dots = v_{d_1+d_2} = d_1 + \tfrac12(d_2 + 1),$$
$$v_{d_1+d_2+1} = \dots = v_{d_1+d_2+d_3} = d_1 + d_2 + \tfrac12(d_3 + 1),$$
$$\dots$$
$$v_{d_1+\dots+d_{e-1}+1} = \dots = v_N = d_1 + \dots + d_{e-1} + \tfrac12(d_e + 1).$$

Let $R_1^*, \dots, R_N^*$ denote the midranks of the observations $Z_1, \dots, Z_N$. We have the modified Wilcoxon statistic $W_N^* = \sum_{i=m+1}^N R_i^*$. Because the distribution of $(R_1^*, \dots, R_N^*)$ under $H^*$ is no longer uniform on $\mathcal R$ (and the values need not be integers), we cannot use the standard tables of Wilcoxon critical values. If the numbers of equal observations are small compared with $N$, then we can use the normal approximation for sufficiently large $m, n$. To use this approximation, we must know the expectation and the variance of $W_N^*$ under $H^*$. These characteristics are conditional, given the values $d_1, \dots, d_e$, and hence the whole test is conditional. We have
$$\mathbb{E}(W_N^* \mid d_1, \dots, d_e) = n\,\frac{N+1}{2} = \mathbb{E} W_N,$$
$$\operatorname{var}(W_N^* \mid d_1, \dots, d_e) = \frac{mn(N+1)}{12} - \frac{mn \sum_{i=1}^e (d_i^3 - d_i)}{12\,N(N-1)}.$$
The first term is the variance of the standard Wilcoxon statistic, while the second term is a correction for the ties, which vanishes if there are no ties among the observations.
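A sketch of the midrank Wilcoxon statistic with the tie-corrected conditional variance (helper name is ours):

```python
import numpy as np

# Modified Wilcoxon statistic W*_N with midranks and the conditional
# variance with the tie correction of Section 16.
def wilcoxon_midranks(x, y):
    m, n = len(x), len(y)
    N = m + n
    z = np.concatenate([x, y])
    order = np.argsort(z)
    ranks = np.empty(N)
    ranks[order] = np.arange(1, N + 1)
    for v in np.unique(z):                     # equal observations share the
        ranks[z == v] = ranks[z == v].mean()   # average of their ranks
    W = ranks[m:].sum()                        # W*_N
    d = np.array([(z == v).sum() for v in np.unique(z)])  # group sizes d_k
    EW = n * (N + 1) / 2
    varW = (m * n * (N + 1) / 12
            - m * n * (d**3 - d).sum() / (12 * N * (N - 1)))
    return W, EW, varW
```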
17. Comparison of two treatments based on paired observations

To exclude the effects due to inhomogeneity of the data, we can divide the experimental units into $N$ homogeneous pairs and apply the new treatment to one unit of each pair, while the other unit serves as control. We can also apply both treatments successively to the same unit. Let $Y_1, \dots, Y_N$ be the measurements of the effects of the new treatment and $X_1, \dots, X_N$ the control measurements. Then $(X_1, Y_1), \dots, (X_N, Y_N)$ is a random sample from a bivariate distribution with a distribution function $F(x, y)$; it is generally unknown and assumed to be continuous. The hypothesis $H_1$ of no effect of the new treatment is equivalent to the statement that the distribution function $F(x, y)$ is symmetric about the straight line $y = x$, i.e.,
$$H_1: F(x, y) = F(y, x) \quad \forall x, y \in \mathbb{R}^1.$$
Under the alternative of a positive effect of the new treatment, the distribution of the random vector $(X, Y)$ is shifted toward the half-plane $y > x$.

Rank tests of $H_1$. Transform $(X_i, Y_i)$, $i = 1, \dots, N$, in the following way:
$$Z_i = Y_i - X_i, \qquad W_i = X_i + Y_i, \quad i = 1, \dots, N.$$
Under $H_1$, the distribution of the vector $(Z_1, W_1), \dots, (Z_N, W_N)$ is symmetric about the $w$-axis, while under the alternative it is shifted in the direction of the positive half-axis $z$. The problem is invariant with respect to the transformations $z_i' = z_i$, $w_i' = g(w_i)$, $i = 1, \dots, N$, where $g$ is a 1:1 function with a finite number of discontinuities. The invariant tests depend only on $(Z_1, \dots, Z_N)$, because it is the maximal invariant. It is a sample from some one-dimensional distribution with a continuous distribution function $D$. The problem of testing $H_1$ is then equivalent to testing that the distribution $D$ is symmetric about 0,
$$H_1': D(z) + D(-z) = 1 \quad \forall z \in \mathbb{R}^1,$$
against the alternative that the distribution is shifted in the direction of positive $z$,
$$K_1': D(z + \Delta) + D(-z + \Delta) = 1 \quad \forall z \in \mathbb{R}^1, \; \Delta > 0.$$
The distribution $D$ is uniquely determined by the triple $(p, F_1, F_2)$ with $p = \Pr(Z < 0)$, $F_1(z) = \Pr(|Z| < z \mid Z < 0)$ and $F_2(z) = \Pr(Z < z \mid Z > 0)$. Equivalent expressions for $H_1'$ and $K_1'$ are
$$H_1': p = \tfrac12, \; F_2 = F_1; \qquad K_1': p < \tfrac12, \; F_2 \le F_1.$$
This problem is invariant with respect to the transformations $\mathcal G: z_i' = g(z_i)$, $i = 1, \dots, N$, where $g$ is a continuous, odd and increasing function. The maximal invariant is $(S_1, \dots, S_m, R_1, \dots, R_n)$, where $S_1, \dots, S_m$ are the ranks of the absolute values of the negative $Z$'s among $|Z_1|, \dots, |Z_N|$ and $R_1, \dots, R_n$ are the ranks of the positive $Z$'s among $|Z_1|, \dots, |Z_N|$. Moreover, the vectors $S_1^* < \dots < S_m^*$ and $R_1^* < \dots < R_n^*$ of ordered ranks are sufficient for $(S_1, \dots, S_m, R_1, \dots, R_n)$ and, further, one of them uniquely determines the other; hence we finally consider only, e.g., $R_1^* < \dots < R_n^*$, and the invariant tests of $H_1$ [or of $H_1'$] depend only on $R_1^* < \dots < R_n^*$.

Let $\kappa$ be the number of positive components of $(Z_1, \dots, Z_N)$. Then $\kappa$ is a binomial random variable $B(N, \Pr(Z > 0))$, with $\Pr(Z > 0) = \frac12$ under $H_1$, and, for any fixed $n$,
$$P_{H_1}(R_1^* = r_1, \dots, R_\kappa^* = r_\kappa, \kappa = n) = P_{H_1}(R_1^* = r_1, \dots, R_\kappa^* = r_\kappa \mid \kappa = n)\,P_{H_1}(\kappa = n) = \frac{1}{\binom{N}{n}} \binom{N}{n} \left(\frac12\right)^N = \left(\frac12\right)^N$$
for any $n$-tuple $(r_1, \dots, r_n)$, $1 \le r_1 < \dots < r_n \le N$. The number of such tuples is $\sum_{n=0}^N \binom{N}{n} = 2^N$. The critical region of any rank test of the size $\alpha = k/2^N$ contains just $k$ such points $(r_1, \dots, r_n)$. However, there generally is no uniformly most powerful test for $H_1'$ against $K_1'$. We usually consider the alternative of shift in location, that $(Z_1, \dots, Z_N)$ has the density $q_\Delta$, $\Delta > 0$:
$$q_\Delta(z_1, \dots, z_N) = \prod_{i=1}^N f(z_i - \Delta), \quad \Delta > 0, \tag{5}$$
where $f$ is a one-dimensional symmetric density, $f(-x) = f(x)$, $x \in \mathbb{R}^1$, and $\Delta = 0$ under $H_1$ [or $H_1'$].

The locally most powerful rank test of $H_1$ has the critical region
$$\sum_{i=1}^N a_N^+(R_i^+, f)\,\operatorname{sign} Z_i \ge k, \tag{6}$$
where $R_i^+$ is the rank of $|Z_i|$ among $|Z_1|, \dots, |Z_N|$ and the scores $a_N^+(i, f)$ have the form
$$a_N^+(i, f) = \mathbb{E}\,\varphi^+(U_{N:i}, f), \quad i = 1, \dots, N, \qquad \varphi^+(u, f) = \varphi\!\left(\frac{u+1}{2}, f\right), \quad 0 < u < 1,$$
where $\varphi(u, f) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}$, $0 < u < 1$. We shall describe two main tests of this type: the one-sample Wilcoxon test and the sign test.

One-sample Wilcoxon test. The one-sample Wilcoxon test is based on the criterion
$$W_N^+ = \sum_{i=1}^N \operatorname{sign} Z_i \cdot R_i^+, \tag{7}$$
where $R_i^+$ is the rank of $|Z_i|$ among $|Z_1|, \dots, |Z_N|$, or in the equivalent form
$$W_N^{++} = \sum_{i=1}^{\kappa} R_i, \tag{8}$$
where $R_i$ is the rank of the $i$-th positive $Z$ among $|Z_1|, \dots, |Z_N|$ and $\kappa$ is the number of positive components. Obviously $W_N^+ = 2 W_N^{++} - \frac12 N(N+1)$. We reject $H_1$ if $W_N^+ > C$, i.e., if the test criterion exceeds the critical value.
For large $N$, when the tables of critical values are not available, we may use the normal approximation:
$$P_{H_1}\!\left(\frac{W_N^+ - \mathbb{E} W_N^+}{\sqrt{\operatorname{var} W_N^+}} \le x\right) \to \Phi(x) \quad \text{as } N \to \infty, \tag{9}$$
where
$$\mathbb{E} W_N^+ = 0, \qquad \operatorname{var} W_N^+ = \frac{1}{6}\,N(N+1)(2N+1). \tag{10}$$
The parameters follow from the following proposition:

Theorem. Let $Z$ be a random variable with a continuous distribution function symmetric about 0, i.e., $F(z) + F(-z) = 1$, $z \in \mathbb{R}^1$. Then $|Z|$ and $\operatorname{sign} Z$ are independent.

The one-sample Wilcoxon test is convenient for densities of the logistic type.

Sign test. In a more general situation, $Z_1, \dots, Z_N$ are independent random variables, $Z_i$ distributed according to the distribution function $D_i$, but not all $D_1, \dots, D_N$ are equal. This situation occurs when we compare two treatments under different experimental conditions or using different methods. We want to test the hypothesis of symmetry of all distributions about 0 against the alternative that all distributions are shifted toward positive values:
$$H_1^*: D_i(z) + D_i(-z) = 1, \quad z \in \mathbb{R}^1, \; i = 1, \dots, N.$$
The problem is invariant with respect to all transformations $z_i' = f_i(z_i)$, $i = 1, \dots, N$, where the $f_i$'s are continuous, increasing and odd functions. The maximal invariant is the number $\kappa$ of positive components. The invariant tests depend only on $\kappa$, and the uniformly most powerful among them has the form
$$\Phi(\kappa) = \begin{cases} 1 & \kappa > C \\ \gamma & \kappa = C \\ 0 & \kappa < C, \end{cases} \tag{11}$$
where $C$ and $\gamma$ are determined by the equation
$$\sum_{\kappa > C} \binom{N}{\kappa} \left(\frac12\right)^N + \gamma \binom{N}{C} \left(\frac12\right)^N = \alpha. \tag{12}$$
The criterion of the sign test is simply the number of positive components among $Z_1, \dots, Z_N$, and its distribution under $H_1$ is binomial $b(N, \frac12)$. For large $N$ we can again use the normal approximation. If all distribution functions $D_1, \dots, D_N$ coincide, the sign test is the locally most powerful rank test of $H_1$ for the double-exponential $D$ with density $d(z) = \frac12 e^{-|z - \Delta|}$, $z \in \mathbb{R}^1$.

For using the sign test we need not know the exact values $X_i, Y_i$, $i = 1, \dots, N$; it is sufficient to know the signs of the differences $Y_i - X_i$. This is a very convenient property: we can use this test even for qualitative observations of the type "drug A gives a better pain relief than drug B". As a matter of fact, we do not have any better test under such conditions.
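A sketch of the one-sample Wilcoxon test of this section with the normal approximation (9)-(10) (the function name is ours; z holds the paired differences $Z_i = Y_i - X_i$):

```python
import numpy as np
from scipy.stats import norm

# One-sample Wilcoxon signed-rank test, normal approximation:
# W+_N = sum sign(Z_i) R+_i, E W+_N = 0, var W+_N = N(N+1)(2N+1)/6.
def signed_rank_test(z, alpha=0.05):
    N = len(z)
    r_plus = np.abs(z).argsort().argsort() + 1   # R+_i = rank of |Z_i|
    W = (np.sign(z) * r_plus).sum()
    stat = W / np.sqrt(N * (N + 1) * (2 * N + 1) / 6)
    return W, stat, stat > norm.ppf(1 - alpha)   # reject H1 for large W+_N
```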
18. Tests of independence in a bivariate population

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a random sample from a bivariate distribution with a continuous distribution function $F(x, y)$. We want to test the hypothesis of independence
$$H_2: F(x, y) = F_1(x) F_2(y), \tag{13}$$
where $F_1$ and $F_2$ are arbitrary distribution functions. The most natural alternative to $H_2$ is positive [or negative] dependence, but it is too wide and there is no uniformly most powerful test. We rather consider the alternative
$$K_2: \quad X_i = X_i^0 + \Delta Z_i, \quad Y_i = Y_i^0 + \Delta Z_i, \quad \Delta > 0, \; i = 1, \dots, n, \tag{14}$$
where $X_i^0, Y_i^0, Z_i$, $i = 1, \dots, n$, are independent and their distributions do not depend on $i$. Independence then means that $\Delta = 0$.

Let $R_1, \dots, R_n$ be the ranks of $X_1, \dots, X_n$ and $S_1, \dots, S_n$ the ranks of $Y_1, \dots, Y_n$, respectively. Under the hypothesis of independence, the vectors $(R_1, \dots, R_n)$ and $(S_1, \dots, S_n)$ are independent and both have the uniform distribution on the set $\mathcal R$ of permutations of $1, \dots, n$. The locally most powerful rank test of $H_2$ against the alternative $K_2$, in which $X_i^0$ has the density $f_1$ and $Y_i^0$ the density $f_2$, has the critical region
$$\sum_{i=1}^n a_n(R_i, f_1)\,a_n(S_i, f_2) > C, \tag{15}$$
where the scores $a_n(i, f)$ are usually replaced by the approximate scores. The two best-known rank tests of independence are the Spearman test and the quadrant test.

Spearman test. The Spearman test is based on the correlation coefficient of $(R_1, \dots, R_n)$ and $(S_1, \dots, S_n)$:
$$r_S = \frac{\frac{1}{n}\sum_{i=1}^n R_i S_i - \bar R\,\bar S}{\left[\frac{1}{n}\sum_{i=1}^n (R_i - \bar R)^2\right]^{1/2}\left[\frac{1}{n}\sum_{i=1}^n (S_i - \bar S)^2\right]^{1/2}},$$
where
$$\bar R = \bar S = \frac{n+1}{2}, \qquad \frac{1}{n}\sum_{i=1}^n (R_i - \bar R)^2 = \frac{1}{n}\sum_{i=1}^n (S_i - \bar S)^2 = \frac{1}{n}\sum_{i=1}^n i^2 - \left(\frac{n+1}{2}\right)^2 = \frac{n^2-1}{12}.$$
Then we can express the criterion in a simpler form:
$$r_S = \frac{12}{n(n^2-1)} \sum_{i=1}^n R_i S_i - \frac{3(n+1)}{n-1}.$$
The test rejects $H_2$ if $r_S > C$, or, equivalently, if $S_1 = \sum_{i=1}^n R_i S_i > C'$. In some tables we find the critical values for the statistic
$$S_2 = \sum_{i=1}^n (R_i - S_i)^2, \tag{16}$$
for which $r_S = 1 - \frac{6}{n^3 - n} S_2$. The test based on $S_2$ rejects $H_2$ if $S_2 < C''$. For large $n$ we use the normal approximation of $S_1$ with
$$\mathbb{E} S_1 = \frac{n(n+1)^2}{4}, \qquad \operatorname{var} S_1 = \frac{n^2(n+1)^2(n-1)}{144}.$$
The Spearman test is locally most powerful against alternatives of the logistic type.

Quadrant test. This test is based on the criterion
$$Q = \frac14 \sum_{i=1}^n \left[\operatorname{sign}\!\left(R_i - \frac{n+1}{2}\right) + 1\right]\left[\operatorname{sign}\!\left(S_i - \frac{n+1}{2}\right) + 1\right]$$
and rejects $H_2$ for large values of $Q$. For even $n$, $Q$ equals the number of pairs $(X_i, Y_i)$ for which $X_i$ lies above the $X$-median and $Y_i$ lies above the $Y$-median. The statistic $Q$ then has, under the hypothesis $H_2$, the hypergeometric distribution
$$\Pr(Q = q) = \binom{m}{q}\binom{m}{m-q} \Big/ \binom{n}{m}, \quad q = 0, 1, \dots, m, \; m = n/2. \tag{17}$$
For large $n$ we use the normal approximation with the parameters
$$\mathbb{E} Q = \frac{n}{4}, \qquad \operatorname{var} Q = \frac{n^2}{16(n-1)}. \tag{18}$$
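In code, the simplified form of $r_S$ reads (a sketch, ours; x and y are numpy arrays without ties):

```python
import numpy as np

# Spearman rank correlation in the simplified form above.
def spearman_r(x, y):
    n = len(x)
    R = x.argsort().argsort() + 1   # ranks of the x's
    S = y.argsort().argsort() + 1   # ranks of the y's
    return 12 / (n * (n**2 - 1)) * (R * S).sum() - 3 * (n + 1) / (n - 1)
```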
19. Rank tests for comparison of several treatments

One-way classification. We want to compare the effects of $p$ treatments; the experiment is organized in such a way that the $i$-th treatment is applied to $n_i$ subjects with the results $x_{i1}, \dots, x_{in_i}$, $i = 1, \dots, p$, $\sum_{i=1}^p n_i = N$. Then $x_{i1}, \dots, x_{in_i}$ is a random sample from a distribution with a distribution function $F_i$, $i = 1, \dots, p$. The hypothesis of no difference between the treatments can then be expressed as the hypothesis of equality of the $p$ distribution functions, namely
$$H_2: F_1 \equiv F_2 \equiv \dots \equiv F_p, \tag{19}$$
and we can consider this hypothesis either against the general alternative
$$K_2: F_i(x_0) \ne F_j(x_0) \tag{20}$$
at least for one pair $i, j$ and at least for some point $x_0$, or against the more special alternative
$$K_2^*: F_i(x) = F(x - \Delta_i), \quad i = 1, \dots, p, \tag{21}$$
with $\Delta_i \ne \Delta_j$ at least for one pair $i, j$. This alternative claims that the treatments act on the observations as additive shifts and that at least two treatments differ in their effects.

The classical test for this situation is the F-test of the analysis of variance; this test works well under normality, $F_i \sim N(\mu + \Delta_i, \sigma^2)$, $i = 1, \dots, p$. We obtain the usual model of the analysis of variance
$$X_{ij} = \mu + \Delta_i + e_{ij}, \quad j = 1, \dots, n_i; \; i = 1, \dots, p, \tag{22}$$
where the $e_{ij}$ are independent random variables with the normal distribution $N(0, \sigma^2)$. The hypothesis $H_2$ can then be reformulated as $H_2^*: \Delta_1 = \Delta_2 = \dots = \Delta_p = 0$. The F-test rejects the hypothesis $H_2^*$ provided
$$F = \frac{N-p}{p-1} \cdot \frac{\sum_{i=1}^p n_i (\bar X_i - \bar X)^2}{\sum_{i=1}^p \sum_{j=1}^{n_i} (X_{ij} - \bar X_i)^2} \ge C, \tag{23}$$
where $\bar X_i = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}$, $\bar X = \frac{1}{N} \sum_{i=1}^p \sum_{j=1}^{n_i} X_{ij}$, and the critical value $C$ is found in the tables of the F-distribution with $(p-1, N-p)$ degrees of freedom.

Kruskal-Wallis rank test. Consider the ranks $R_{11}, \dots, R_{1n_1}; R_{21}, \dots, R_{2n_2}; \dots; R_{p1}, \dots, R_{pn_p}$ of all $N$ observations in the pooled sample. Let $R_{i1}^* < \dots < R_{in_i}^*$ be the ordered ranks of the $i$-th sample, $i = 1, \dots, p$. Then, under $H_2$, it holds for any permutation $\{r_{11}, \dots, r_{pn_p}\}$ of $1, \dots, N$ such that $r_{i1} < \dots < r_{in_i}$, $i = 1, \dots, p$, that
$$P(R_{11}^* = r_{11}, \dots, R_{pn_p}^* = r_{pn_p}) = \frac{n_1! \cdots n_p!}{N!}.$$
The Kruskal-Wallis rank test rejects $H_2$ provided
$$K_N = \frac{12}{N(N+1)} \sum_{i=1}^p n_i \left(\bar R_i - \frac{N+1}{2}\right)^2 = \frac{12}{N(N+1)} \sum_{i=1}^p n_i \bar R_i^2 - 3(N+1) \ge K_\alpha,$$
where $\bar R_i = \frac{1}{n_i} \sum_{j=1}^{n_i} R_{ij}$, $i = 1, \dots, p$. It can be formally obtained from the F-test by inserting $\bar R_i$ for $\bar X_i$ and $\bar R = \frac{N+1}{2}$ for $\bar X$. If $n_i \to \infty$, $i = 1, \dots, p$, then $K_N$ has asymptotically the $\chi^2(p-1)$ distribution under $H_2$. In practice we can use the $\chi^2$ approximation for $p > 3$ and $n_i > 5$, $i = 1, \dots, p$. In the special case $p = 2$, the Kruskal-Wallis test reduces to the two-sided (two-sample) Wilcoxon test.

Modification in the presence of tied observations: assume that there are $e$ different values among the components $X_{11}, \dots, X_{pn_p}$, with $d_1$ equal to the smallest, ..., $d_e$ equal to the largest one. Let $(R_{11}^*, \dots, R_{pn_p}^*)$ be the midranks of $X_{11}, \dots, X_{pn_p}$. Then the modified Kruskal-Wallis statistic has the form
$$K_N^* = \frac{\frac{12}{N(N+1)} \sum_{i=1}^p n_i \left(\bar R_i^* - \frac{N+1}{2}\right)^2}{1 - \frac{1}{N^3 - N} \sum_{k=1}^e (d_k^3 - d_k)}.$$
The distribution of $K_N^*$, conditioned on given $d_1, \dots, d_e$, is approximately $\chi^2(p-1)$ under $H_2$ for large $n_1, \dots, n_p$.
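A sketch of the Kruskal-Wallis statistic (ours; no ties, samples is a list of one-dimensional arrays; compare $K_N$ with a $\chi^2(p-1)$ critical value):

```python
import numpy as np

# K_N = 12/(N(N+1)) * sum_i n_i * (Rbar_i - (N+1)/2)^2
def kruskal_wallis(samples):
    z = np.concatenate(samples)
    N = len(z)
    ranks = z.argsort().argsort() + 1   # ranks in the pooled sample
    K, start = 0.0, 0
    for s in samples:
        ni = len(s)
        K += ni * (ranks[start:start + ni].mean() - (N + 1) / 2) ** 2
        start += ni
    return 12 / (N * (N + 1)) * K
```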
Two-way classification (random blocks). We want to compare $p$ treatments and simultaneously reduce the effect of non-homogeneity of the sample units. We divide the subjects into $n$ homogeneous groups, so-called blocks, and compare the effects of the treatments within each block separately. The subjects in a block are usually assigned to the treatments at random. The simplest model has $n$ independent blocks, each containing $p$ units, and each treatment is applied just once in each block. The observation $x_{ij}$ is the measured effect of the $j$-th treatment applied in the $i$-th block, so the data form the table

Block 1: $x_{11}, x_{12}, x_{13}, \dots, x_{1p}$
Block 2: $x_{21}, x_{22}, x_{23}, \dots, x_{2p}$
...
Block n: $x_{n1}, x_{n2}, x_{n3}, \dots, x_{np}$

The $X_{ij}$ are independent, and $X_{ij}$ has a continuous distribution function $F_{ij}$, $i = 1, \dots, n$; $j = 1, \dots, p$. We test the hypothesis that there is no significant difference among the treatments, hence
$$H_3: F_{i1} \equiv F_{i2} \equiv \dots \equiv F_{ip}, \quad i = 1, \dots, n, \tag{24}$$
against the alternative
$$K_3: F_{ij} \ne F_{ik} \tag{25}$$
at least for one $i$ and at least for one pair $j, k$, or against the more special alternative
$$K_3^*: F_{ij}(x) = F_i(x - \Delta_j), \quad i = 1, \dots, n; \; j = 1, \dots, p,$$
with $\Delta_j \ne \Delta_k$ at least for one pair $j, k$.

The classical test of $H_3$ is the F-test in the model
$$X_{ij} = \mu + \beta_i + \Delta_j + E_{ij}, \quad i = 1, \dots, n; \; j = 1, \dots, p, \tag{26}$$
where the $E_{ij}$ are independent with the normal distribution $N(0, \sigma^2)$, $\mu$ is the main additive effect, $\beta_i$ is the effect of the $i$-th block and $\Delta_j$ is the effect of the $j$-th treatment. The hypothesis $H_3$ then reduces to the form $\Delta_1 = \Delta_2 = \dots = \Delta_p$. The F-test of $H_3$ has the critical region
$$F = \frac{n(n-1)\sum_{j=1}^p (\bar X_{\cdot j} - \bar X)^2}{\sum_{j=1}^p \sum_{i=1}^n (X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X)^2} > C, \tag{27}$$
where $C$ is the critical value of the F-distribution with $p-1$ and $(p-1)(n-1)$ degrees of freedom.

Friedman rank test. Order the observations within each block and denote the corresponding ranks $R_{i1}, \dots, R_{ip}$, $i = 1, \dots, n$ (so that $R_{ij}$ is the rank of $x_{ij}$ among $x_{i1}, \dots, x_{ip}$). Every row (block) of the resulting rank table has the average $\frac{p+1}{2}$; the column averages are $\bar R_j = \frac{1}{n}\sum_{i=1}^n R_{ij}$, $j = 1, \dots, p$, and the overall average is $\bar R = \frac{1}{np}\sum_{i=1}^n \sum_{j=1}^p R_{ij} = \frac{p+1}{2}$.

The Friedman test is based on the following criterion:
$$Q_n = \frac{12n}{p(p+1)} \sum_{j=1}^p \left(\bar R_j - \frac{p+1}{2}\right)^2 = \frac{12n}{p(p+1)} \sum_{j=1}^p \bar R_j^2 - 3n(p+1),$$
and large values of the criterion are significant. As $n \to \infty$, the distribution of $Q_n$ is approximately $\chi^2$ with $p-1$ degrees of freedom. In the case $p = 2$, the Friedman test reduces to the two-sided sign test. The Friedman test is applicable even in the situation where we observe only the ranks rather than the exact values of the treatment effects.

PERMUTATION TESTS

Permutation tests are conditional tests given the vector of order statistics. We shall illustrate them on the test of the hypothesis of randomness against a two-sample alternative. Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be two independent samples with the distribution functions $F$ and $G$. We want to test $H_0: F \equiv G$ against $K: G(x) = F(x - \Delta)$, $\Delta > 0$. The distribution function $F$ is unknown, but we expect that it is normal. We thus wish to have a test which is good under normality, but at least unbiased for all $F$ with a continuous density. Such are the permutation tests.

For simplicity, denote $(X_1, \dots, X_m, Y_1, \dots, Y_n) = (Z_1, \dots, Z_N)$, $N = m + n$, and let $Z^{(1)} \le Z^{(2)} \le \dots \le Z^{(N)}$ be the corresponding order statistics. The permutation test is based only on $Z^{(1)}, Z^{(2)}, \dots, Z^{(N)}$ and it should satisfy
$$\frac{1}{N!} \sum_{r \in \mathcal R} \Phi(Z^{(r_1)}, \dots, Z^{(r_N)}) = \alpha, \tag{28}$$
where $\mathcal R$ is the set of $N!$ permutations of $1, 2, \dots, N$. The test is conditional: given the vector of order statistics $Z^{(1)}, Z^{(2)}, \dots, Z^{(N)}$, only the permutations $(r_1, \dots, r_N)$ are variable.

Generally, the test rejecting in favor of the alternative that $(Z_1, \dots, Z_N)$ has density $q(z_1, \dots, z_N)$ has the form
$$\Phi(z_{r_1}, \dots, z_{r_N} \mid Z^{(\cdot)}) = \begin{cases} 1 & q(z_{r_1}, \dots, z_{r_N}) > C(z^{(\cdot)}) \\ \gamma & q(z_{r_1}, \dots, z_{r_N}) = C(z^{(\cdot)}) \\ 0 & q(z_{r_1}, \dots, z_{r_N}) < C(z^{(\cdot)}), \end{cases}$$
where $C(z^{(\cdot)})$ is determined so that (28) is satisfied. It means that we reject $H_0$ for the $k$ permutations $(r_1, \dots, r_N)$ of $z^{(1)}, \dots, z^{(N)}$ leading to the largest values of $q(z^{(r_1)}, \dots, z^{(r_N)})$, where $k + \gamma = \alpha N!$.

Special case: two normal samples differing by a shift in location. Here
$$q(z_1, \dots, z_N) = f(x_1) \cdots f(x_m)\,f(y_1 - \Delta) \cdots f(y_n - \Delta),$$
where $f$ is the density of $N(\mu, \sigma^2)$, i.e.,
$$q(z_1, \dots, z_N) = \frac{1}{(\sigma\sqrt{2\pi})^N} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^m (x_i - \mu)^2 + \sum_{j=1}^n (y_j - \mu - \Delta)^2\right]\right\},$$
and, given the order statistics, this is large iff $\sum_{j=1}^n y_j$ is large. Hence, the test rejects if $\sum_{j=1}^n y_j = \sum_{i=m+1}^N z_i > C_1(z^{(\cdot)})$. The vectors $(z_{m+1}, \dots, z_N)$ run over $\binom{N}{n}$ combinations of $z_1, \dots, z_N$. We reject the hypothesis for the $k$ largest values of $\sum_{i=m+1}^N z_i$, where
$$k + \gamma = \alpha \binom{N}{n}. \tag{29}$$

Practical procedure:
(i) Observe $(x_1, \dots, x_m, y_1, \dots, y_n) = (z_1, \dots, z_N)$.
(ii) Determine the integer $k$ and the fraction $0 \le \gamma < 1$ satisfying (29).
(iii) Calculate the values $\sum_{j=1}^n z_{i_j}$ for all combinations $\{i_1, \dots, i_n\} \subset \{1, \dots, N\}$. Find the $\left(\binom{N}{n} - k + 1\right)$-st largest sum, say $a$.
(iv) Reject $H_0$ if $\sum_{j=1}^n y_j > a$, and reject with probability $\gamma$ if $\sum_{j=1}^n y_j = a$.
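The practical procedure above can be carried out by brute force for small $m, n$; a sketch (ours; exact enumeration, reporting the permutation p-value instead of handling the randomization fraction $\gamma$):

```python
import numpy as np
from itertools import combinations

# Exact two-sample permutation test based on the sum of the y's; given the
# order statistics, all C(N, n) assignments of the pooled values to the
# second sample are equally likely under H0.
def permutation_test(x, y):
    z = np.concatenate([x, y])
    n = len(y)
    observed = np.sum(y)
    sums = [np.sum(z[list(c)]) for c in combinations(range(len(z)), n)]
    return np.mean(np.array(sums) >= observed)   # permutation p-value

x = np.array([1.1, 0.4, -0.2, 0.9, 0.3])
y = np.array([1.8, 2.0, 0.7, 1.5])
print(permutation_test(x, y))   # reject H0 when the p-value <= alpha
```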