ESTIMATION WITH QUADRATIC LOSS

W. JAMES, FRESNO STATE COLLEGE, and CHARLES STEIN, STANFORD UNIVERSITY

(This work was supported in part by an ONR contract at Stanford University.)

1. Introduction

It has long been customary to measure the adequacy of an estimator by the smallness of its mean squared error. The least squares estimators were studied by Gauss and by other authors later in the nineteenth century. A proof that the best unbiased estimator of a linear function of the means of a set of observed random variables is the least squares estimator was given by Markov [12], a modified version of whose proof is given by David and Neyman [4]. A slightly more general theorem is given by Aitken [1]. Fisher [5] indicated that for large samples the maximum likelihood estimator approximately minimizes the mean squared error when compared with other reasonable estimators. This paper will be concerned with optimum properties, or failure of optimum properties, of the natural estimator in certain special problems, with the risk usually measured by the mean squared error or, in the case of several parameters, by a quadratic function of the estimators. We shall first mention some recent papers on this subject and then give some results, mostly unpublished, in greater detail.

Pitman [13] in 1939 discussed the estimation of location and scale parameters and obtained the best estimator among those invariant under the affine transformations leaving the problem invariant. He considered various loss functions, in particular, mean squared error. Wald [18], also in 1939, in what may be considered the first paper on statistical decision theory, did the same for location parameters alone, and tried to show in his theorem 5 that the estimator obtained in this way is admissible, that is, that there is no estimator whose risk is no greater at any parameter point, and smaller at some point. However, his proof of theorem 5 is not convincing, since he interchanges the order of integration in (30) without comment, and it is not clear that this integral is absolutely convergent. To our knowledge, no counterexample to this theorem is known, but in higher-dimensional cases, where the analogous argument seems at first glance only slightly less plausible, counterexamples are given in Blackwell [2] (which is discussed briefly at the end of section 3 of this paper) and in [14], which is repeated in section 2 of this paper. In the paper of Blackwell an analogue of Wald's theorem 5 is proved for the special case of a distribution concentrated on a finite arithmetic progression. Hodges and Lehmann [7] proved some results concerning special exponential families, including Wald's theorem 5 for the special problem of estimating the mean of a normal distribution. Some more general results for the estimation of the mean of a normal distribution, including sequential estimation with somewhat arbitrary loss function, were obtained by Blyth [3], whose method is a principal tool of this paper. Girshick and Savage [6] proved the minimax property (which is weaker than admissibility here) of the natural estimator in Wald's problem and also generalized the results of Hodges and Lehmann to an arbitrary exponential family. Karlin [8] proved Wald's theorem 5 for mean squared error and for certain other loss functions under fairly weak conditions and also generalized the results of Girshick and Savage for exponential families.
One of the authors, in [16], proved the result for mean squared error under weaker and simpler conditions than Karlin's. This is given without complete proof as theorem 1 in section 4 of the present paper.

In section 2 of this paper a new proof is given by the authors of the result of [14] that the usual estimator of the mean of a multivariate normal distribution with the identity as covariance matrix is inadmissible, when the loss is the sum of squares of the errors in the different coordinates, if the dimension is at least three. An explicit formula is given for an estimator, still inadmissible, whose risk is never more than that of the usual estimator and is considerably less near the origin. Other distributions and other loss functions are considered later in section 2. In section 3 the general problem of admissibility of estimators for problems with quadratic loss is formulated, a sufficient condition for admissibility is given, and its relation to the necessary and sufficient condition of [15] is briefly discussed. In section 4 theorems are given which show that under weak conditions Pitman's estimator for one or two location parameters is admissible when the loss is taken to be equal to the sum of squares of the errors. Conjectures are discussed for the more difficult problem where unknown location parameters are also present as nuisance parameters, and Blackwell's example is given. In section 5 a problem in multivariate analysis is given where the natural estimator is not even minimax although it has constant risk. This is related to examples of one of the authors quoted by Kiefer [9] and Lehmann [11]. In section 6 some unsolved problems are mentioned. The results of section 2 were obtained by the two authors working together; the remainder of the paper is the work of C. Stein.

2. Inadmissibility of the usual estimator for three or more location parameters

Let us first look at the spherically symmetric normal case, where the inadmissibility of the usual estimator was first proved in [14]. Let $X$ be a normally distributed $p$-dimensional coordinate vector with unknown mean $\xi = EX$ and covariance matrix equal to the identity matrix, that is, $E(X - \xi)(X - \xi)' = I$. We are interested in estimating $\xi$, say by $\hat\xi$, and define the loss to be

(1) $L(\xi, \hat\xi) = \|\hat\xi - \xi\|^2$,

using the notation

(2) $\|x\|^2 = x'x$.

The usual estimator is $\varphi_0$, defined by

(3) $\varphi_0(x) = x$,

and its risk is

(4) $\rho(\xi, \varphi_0) = E\,L[\xi, \varphi_0(X)] = E(X - \xi)'(X - \xi) = p$.

It is well known that among all unbiased estimators, or among all translation-invariant estimators (those $\varphi$ for which $\varphi(x + c) = \varphi(x) + c$ for all vectors $x$ and $c$), this estimator $\varphi_0$ has minimum risk for all $\xi$. However, we shall see that for $p \ge 3$,

(5) $E\left\|\left(1 - \frac{p-2}{\|X\|^2}\right)X - \xi\right\|^2 = p - (p-2)^2\, E\,\frac{1}{p - 2 + 2K} < p$,

where $K$ has a Poisson distribution with mean $\|\xi\|^2/2$. Thus the estimator $\varphi_1$ defined by

(6) $\varphi_1(x) = \left(1 - \frac{p-2}{\|x\|^2}\right)x$

has smaller risk than $\varphi_0$ for all $\xi$. In fact, the risk of $\varphi_1$ is $2$ at $\xi = 0$ and increases gradually with $\|\xi\|^2$ to the value $p$ as $\|\xi\|^2 \to \infty$. Although $\varphi_1$ is not admissible, it seems unlikely that there are spherically symmetric estimators which are appreciably better than $\varphi_1$. An analogous result is given in formulas (19) and (21) for the case where $E(X - \xi)(X - \xi)' = \sigma^2 I$, where $\sigma^2$ is unknown but we observe $S$ distributed independently of $X$ as $\sigma^2$ times a $\chi^2$ with $n$ degrees of freedom. For $p \le 2$ it is shown in [14], and also follows from the results of section 3, that the usual estimator is admissible.
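Formulas (5) and (6) are easy to check by simulation. The following Python sketch is purely illustrative (the dimension $p = 5$, the grid of $\|\xi\|$ values, and all names are arbitrary choices, not part of the original argument); it compares the Monte Carlo risk of the usual estimator $\varphi_0$ and of $\varphi_1$ with the exact Poisson-mixture expression on the right side of (5).

```python
import numpy as np

rng = np.random.default_rng(0)
p, reps = 5, 200_000                      # dimension and replications (arbitrary)

def poisson_expect(lam, p, terms=400):
    """E[1/(p - 2 + 2K)] for K ~ Poisson(lam), by direct summation."""
    pmf, total = np.exp(-lam), 0.0
    for k in range(terms):
        total += pmf / (p - 2 + 2 * k)
        pmf *= lam / (k + 1)              # P(K = k+1) = P(K = k) * lam / (k+1)
    return total

for norm_xi in (0.0, 1.0, 2.0, 4.0, 8.0):
    xi = np.zeros(p); xi[0] = norm_xi
    X = rng.standard_normal((reps, p)) + xi             # X ~ N(xi, I_p)
    shrink = 1.0 - (p - 2) / np.sum(X * X, axis=1)      # factor in (6)
    risk_usual = np.mean(np.sum((X - xi) ** 2, axis=1))
    risk_js = np.mean(np.sum((shrink[:, None] * X - xi) ** 2, axis=1))
    risk_exact = p - (p - 2) ** 2 * poisson_expect(norm_xi ** 2 / 2, p)
    print(f"||xi||={norm_xi:3.0f}  usual={risk_usual:.3f}  "
          f"phi_1={risk_js:.3f}  exact (5)={risk_exact:.3f}")
```

The two simulated columns should agree with the exact column to Monte Carlo accuracy, the improvement being largest (risk near 2) at the origin and vanishing as $\|\xi\|$ grows.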
We compute the risk of the estimator $\varphi_2$ defined by

(7) $\varphi_2(x) = \left(1 - \frac{b}{\|x\|^2}\right)x$,

where $b$ is a positive constant. We have

(8) $\rho(\xi, \varphi_2) = E\left\|X - \xi - \frac{b}{\|X\|^2}X\right\|^2 = E\|X - \xi\|^2 - 2b\, E\,\frac{(X - \xi)'X}{\|X\|^2} + b^2\, E\,\frac{1}{\|X\|^2} = p - 2b\, E\,\frac{(X - \xi)'X}{\|X\|^2} + b^2\, E\,\frac{1}{\|X\|^2}$.

It is well known that $\|X\|^2$, a noncentral $\chi^2$ with $p$ degrees of freedom and noncentrality parameter $\|\xi\|^2$, is distributed the same as a random variable $W$ obtained by taking a random variable $K$ having a Poisson distribution with mean $\frac{1}{2}\|\xi\|^2$ and then taking the conditional distribution of $W$ given $K$ to be that of a central $\chi^2$ with $p + 2K$ degrees of freedom. Thus

(9) $E\,\frac{1}{\|X\|^2} = E\left[E\left(\frac{1}{\chi^2_{p+2K}}\,\Big|\,K\right)\right] = E\,\frac{1}{p - 2 + 2K}$.

To compute the expected value of the middle term on the right side of (8), let

(10) $U = \frac{\xi'X}{\|\xi\|}, \qquad V = \|X\|^2 - U^2$.

Then

(11) $W = \|X\|^2 = U^2 + V$,

and $U$ is normally distributed with mean $\|\xi\|$ and variance 1, and $V$ is independent of $U$ and has a $\chi^2$ distribution with $p - 1$ degrees of freedom. The joint density of $U$ and $V$ is

(12) $\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(u - \|\xi\|)^2}{2}\right\}\,\frac{v^{(p-3)/2}\, e^{-v/2}}{2^{(p-1)/2}\,\Gamma[(p-1)/2]}$

if $v \ge 0$, and 0 if $v < 0$. Thus the joint density of $U$ and $W$ is

(13) $\frac{1}{\sqrt{2\pi}\, 2^{(p-1)/2}\,\Gamma[(p-1)/2]}\,(w - u^2)^{(p-3)/2}\exp\left\{-\frac{1}{2}\|\xi\|^2 + \|\xi\| u - \frac{w}{2}\right\}$

if $u^2 \le w$, and 0 elsewhere. It follows that

(14) $E\,\frac{U}{W} = \frac{\exp\{-\frac{1}{2}\|\xi\|^2\}}{\sqrt{2\pi}\, 2^{(p-1)/2}\,\Gamma[(p-1)/2]}\int_0^\infty \frac{dw}{w}\int_{-\sqrt{w}}^{\sqrt{w}} u\,(w - u^2)^{(p-3)/2}\exp\left\{\|\xi\| u - \frac{w}{2}\right\}du$.

Making the change of variable $t = u/\sqrt{w}$ we find

(15) $\int_0^\infty \frac{dw}{w}\int_{-\sqrt{w}}^{\sqrt{w}} u\,(w - u^2)^{(p-3)/2}\exp\left\{\|\xi\| u - \frac{w}{2}\right\}du = \int_0^\infty w^{(p-1)/2}\exp\left\{-\frac{w}{2}\right\}dw\int_{-1}^1 t\,(1 - t^2)^{(p-3)/2}\exp\{\|\xi\| t\sqrt{w}\}\,dt$
$= \sum_{j=0}^\infty \frac{\|\xi\|^{2j+1}}{(2j+1)!}\int_0^\infty w^{p/2+j-1}\exp\left\{-\frac{w}{2}\right\}dw\int_{-1}^1 t^{2j+2}\,(1 - t^2)^{(p-3)/2}\,dt$
$= \sum_{j=0}^\infty \frac{\|\xi\|^{2j+1}}{(2j+1)!}\, 2^{p/2+j}\,\Gamma\left(\frac{p}{2}+j\right)B\left(j + \frac{3}{2}, \frac{p-1}{2}\right)$
$= \sqrt{2\pi}\, 2^{(p-1)/2}\,\Gamma\left[\frac{p-1}{2}\right]\sum_{j=0}^\infty \frac{\|\xi\|^{2j+1}}{2^{j+1}\,\Gamma(j+1)\left(\frac{p}{2}+j\right)}$.

It follows from (10), (11), (14), and (15) that

(16) $E\,\frac{(X - \xi)'X}{\|X\|^2} = 1 - \|\xi\|\, E\,\frac{U}{W} = 1 - \exp\left\{-\frac{1}{2}\|\xi\|^2\right\}\sum_{j=0}^\infty \frac{\|\xi\|^{2j+2}}{2^{j+1}\,\Gamma(j+1)\left(\frac{p}{2}+j\right)} = \exp\left\{-\frac{1}{2}\|\xi\|^2\right\}\sum_{k=0}^\infty \frac{(\|\xi\|^2/2)^k}{k!}\,\frac{p-2}{p-2+2k} = (p-2)\, E\,\frac{1}{p-2+2K}$,

where $K$ again has a Poisson distribution with mean $\|\xi\|^2/2$. Combining (8), (9), and (16) we find

(17) $\rho(\xi, \varphi_2) = E\left\|\left(1 - \frac{b}{\|X\|^2}\right)X - \xi\right\|^2 = p - 2b(p-2)\, E\,\frac{1}{p-2+2K} + b^2\, E\,\frac{1}{p-2+2K}$.

This is minimized, for all $\xi$, by taking $b = p - 2$, which leads to the use of the estimator $\varphi_1$ defined by (6) and to the formula (5) for its risk.

Now let us look at the case where $X$ has mean $\xi$ and covariance matrix given by

(18) $E(X - \xi)(X - \xi)' = \sigma^2 I$,

and we observe $S$, independent of $X$, distributed as $\sigma^2$ times a $\chi^2$ with $n$ degrees of freedom. Both $\xi$ and $\sigma^2$ are unknown. We consider the estimator $\varphi_3$ defined by

(19) $\varphi_3(x, s) = \left(1 - \frac{a s}{\|x\|^2}\right)x$,

where $a$ is a nonnegative constant. We have

(20) $\rho(\xi, \varphi_3) = E\|\varphi_3(X, S) - \xi\|^2 = E\|X - \xi\|^2 - 2a\, E\left[S\,\frac{(X - \xi)'X}{\|X\|^2}\right] + a^2\, E\,\frac{S^2}{\|X\|^2} = \sigma^2\left\{p - 2an(p-2)\, E\,\frac{1}{p-2+2K} + a^2 n(n+2)\, E\,\frac{1}{p-2+2K}\right\}$

by (9) and (16) with $X$ and $\xi$ replaced by $X/\sigma$ and $\xi/\sigma$ respectively, and by the independence of $S$, for which $ES = n\sigma^2$ and $ES^2 = n(n+2)\sigma^4$. Here $K$ has a Poisson distribution with mean $\|\xi\|^2/2\sigma^2$. The choice $a = (p-2)/(n+2)$ minimizes (20) for all $\xi$, giving it the value

(21) $\rho(\xi, \varphi_3) = \sigma^2\left\{p - \frac{n}{n+2}\,(p-2)^2\, E\,\frac{1}{p-2+2K}\right\}$.
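The unknown-variance estimator (19) with $a = (p-2)/(n+2)$ is just as easy to check numerically. The following sketch is again illustrative only; the values of $p$, $n$, $\sigma$, and the mean vector are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, sigma, reps = 5, 10, 2.0, 200_000   # illustrative values

def poisson_expect(lam, p, terms=400):
    """E[1/(p - 2 + 2K)] for K ~ Poisson(lam)."""
    pmf, total = np.exp(-lam), 0.0
    for k in range(terms):
        total += pmf / (p - 2 + 2 * k)
        pmf *= lam / (k + 1)
    return total

xi = np.full(p, 1.5)                      # arbitrary mean vector
a = (p - 2) / (n + 2)                     # minimizing constant from (20)

X = sigma * rng.standard_normal((reps, p)) + xi
S = sigma ** 2 * rng.chisquare(n, size=reps)               # S ~ sigma^2 chi^2_n
est = (1.0 - a * S / np.sum(X * X, axis=1))[:, None] * X   # phi_3 of (19)

risk_sim = np.mean(np.sum((est - xi) ** 2, axis=1))
lam = xi @ xi / (2 * sigma ** 2)          # Poisson mean ||xi||^2 / (2 sigma^2)
risk_formula = sigma ** 2 * (p - n / (n + 2) * (p - 2) ** 2 * poisson_expect(lam, p))
print(f"simulated: {risk_sim:.3f}   formula (21): {risk_formula:.3f}   "
      f"usual estimator: {p * sigma ** 2:.3f}")
```

Note that the factor $n/(n+2) < 1$ in (21) is the price paid for not knowing $\sigma^2$: as $n \to \infty$ the risk approaches that of $\varphi_1$ in (5).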
We can also treat the case where the covariance matrix is unknown but an estimate based on a Wishart matrix is available. Let $X$ and $S$ be independently distributed, $X$ having a $p$-dimensional normal distribution with mean $\xi$ and covariance matrix $\Sigma$, and $S$ being distributed as a $p \times p$ Wishart matrix with $n$ degrees of freedom and expectation $n\Sigma$, where both $\xi$ and $\Sigma$ are unknown and $\Sigma$ is nonsingular. Suppose we want to estimate $\xi$ by $\hat\xi$ with loss function

(22) $L[(\xi, \Sigma), \hat\xi] = (\hat\xi - \xi)'\Sigma^{-1}(\hat\xi - \xi)$.

We consider estimators of the form

(23) $\varphi(x, s) = \left(1 - \frac{c}{x's^{-1}x}\right)x$.

The risk function of $\varphi$ is given by

(24) $\rho[(\xi, \Sigma), \varphi] = E\,[\varphi(X, S) - \xi]'\,\Sigma^{-1}\,[\varphi(X, S) - \xi] = E\left[\left(1 - \frac{c}{X'S^{-1}X}\right)X - \xi\right]'\Sigma^{-1}\left[\left(1 - \frac{c}{X'S^{-1}X}\right)X - \xi\right] = E\left\|\left(1 - \frac{c}{X'S^{-1}X}\right)X - \xi^*\right\|^2$,

where the last expectation is computed with $\Sigma = I$ and $\xi^{*\prime} = [(\xi'\Sigma^{-1}\xi)^{1/2}, 0, \cdots, 0]$, the reduction being justified by the invariance of the problem. But it is well known (see, for example, Wijsman [19]) that the conditional distribution of $X'S^{-1}X$ given $X$ is that of $X'X/S^*$, where $S^*$ is distributed as $\chi^2_{n-p+1}$ independently of $X$. Comparing (24) and (20) we see that the optimum choice of $c$ is $(p-2)/(n-p+3)$ and, for this choice, the risk function is given by

(25) $\rho[(\xi, \Sigma), \varphi] = p - \frac{n-p+1}{n-p+3}\,(p-2)^2\, E\,\frac{1}{p-2+2K}$,

where $K$ has a Poisson distribution with mean $\frac{1}{2}\xi'\Sigma^{-1}\xi$.

The improvement achieved by these estimators over the usual estimator may be understood better if we break up the error into its component along $X$ and its component orthogonal to $X$. For simplicity we consider the case where the covariance matrix is known to be the identity. For any estimator which lies along $X$, the error orthogonal to $X$ is $\xi - (\xi'X/\|X\|^2)X$ and its mean square is

(26) $E_\xi\left\|\xi - \frac{\xi'X}{\|X\|^2}X\right\|^2 = \|\xi\|^2\left(1 - E\,\frac{U^2}{W}\right) = (p-1)\,\|\xi\|^2\, E\,\frac{1}{p+2K} = (p-1)\left[1 - (p-2)\, E\,\frac{1}{p-2+2K}\right]$.

Thus the mean square of the component along $X$ of the error of $[1 - (p-2)/\|X\|^2]X$ is

(27) $p - (p-2)^2\, E\,\frac{1}{p-2+2K} - (p-1)\left[1 - (p-2)\, E\,\frac{1}{p-2+2K}\right] = 1 + (p-2)\, E\,\frac{1}{p-2+2K} \le 2$.

On seeing the results given above, several people have expressed the fear that they are closely tied up with the use of an unbounded loss function, which many people consider unreasonable. We give an example to show that, at least qualitatively, this is not so. Again, for simplicity, we suppose $X$ a $p$-variate random vector normally distributed with mean $\xi$ and covariance matrix equal to the identity matrix. Suppose we are interested in estimating $\xi$ by $\hat\xi$ with loss function

(28) $L(\xi, \hat\xi) = F(\|\hat\xi - \xi\|^2)$,

where $F$ is nondecreasing and continuously differentiable with bounded derivative, and concave (that is, $F''(t) \le 0$ for all $t > 0$). We shall show that for sufficiently small $b$ and large $a$ (independent of $\xi$) the estimator $\varphi$ defined by

(29) $\varphi(x) = \left(1 - \frac{b}{a + \|x\|^2}\right)x$

has smaller risk than the usual estimator $X$. We have, with $Y = X - \xi$ and using the concavity of $F$,

(30) $\rho(\xi, \varphi) = E\,F\left[\left\|\left(1 - \frac{b}{a + \|X\|^2}\right)X - \xi\right\|^2\right] = E\,F\left[\|X - \xi\|^2 - 2b\,\frac{X'(X - \xi)}{a + \|X\|^2} + b^2\,\frac{\|X\|^2}{(a + \|X\|^2)^2}\right] \le E\,F(\|Y\|^2) - 2b\, E\left\{\left[\frac{(Y + \xi)'Y}{a + \|Y + \xi\|^2} - \frac{b}{2}\,\frac{\|Y + \xi\|^2}{(a + \|Y + \xi\|^2)^2}\right]F'(\|Y\|^2)\right\}$.

To see that the expectation on the right is positive for every $\xi$ once $b$ is sufficiently small and $a$ sufficiently large, one bounds its two terms separately after normalizing by $a + \|\xi\|^2$: this leads to continuous, strictly positive valued functions $f$ and $g$ on $[1, \infty)$ with finite nonzero limits at $\infty$, $f$ bounding the normalized first term from below and $g$ the normalized second term from above. It follows that

(34) $c = \inf_{A \ge 1} f(A) > 0$,

(35) $d = \sup_{A \ge 1} g(A) < \infty$.

Then, if $b$ is chosen so that

(36) $c - \frac{b}{2}\, d > 0$,

it follows from (30) that, for sufficiently large $a$,

(37) $\rho(\xi, \varphi) < E\,F(\|X - \xi\|^2)$

for all $\xi$.
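The qualitative claim of (37) can be illustrated by simulation. In the following sketch the concrete choices $F(t) = t/(1+t)$ (a bounded, concave function with bounded derivative), $b = 1$, and $a = 20$ are ours; the paper only asserts that some sufficiently small $b$ and large $a$ work, so these constants are assumptions, not the optimal ones. Paired sampling (the same draws of $X$ for both estimators) is used so that the small risk difference is visible above the Monte Carlo noise.

```python
import numpy as np

rng = np.random.default_rng(2)
p, reps = 5, 400_000
b, a = 1.0, 20.0                     # "small b, large a" -- illustrative guesses

F = lambda t: t / (1.0 + t)          # concave, bounded, bounded derivative

for norm_xi in (0.0, 2.0, 5.0, 10.0):
    xi = np.zeros(p); xi[0] = norm_xi
    X = rng.standard_normal((reps, p)) + xi
    shrunk = (1.0 - b / (a + np.sum(X * X, axis=1)))[:, None] * X   # estimator (29)
    d = F(np.sum((X - xi) ** 2, axis=1)) - F(np.sum((shrunk - xi) ** 2, axis=1))
    print(f"||xi||={norm_xi:5.1f}  risk reduction = {d.mean():.5f} "
          f"+/- {2 * d.std(ddof=1) / np.sqrt(reps):.5f}")
```

With these constants the estimated risk reduction should come out positive at every $\xi$ tested, in line with (37), though it is of course much smaller in absolute terms than for the unbounded quadratic loss.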
The inadmissibility of the usual estimator for three or more location parameters does not require the assumption of normality. We shall give the following result without proof. Let $X$ be a $p$-dimensional random coordinate vector with mean $\xi$ and finite fourth absolute moments:

(38) $E(X_i - \xi_i)^4 \le C < \infty$.

For simplicity of notation we assume the $X_i$ are uncorrelated and write

(39) $\sigma_i^2 = E(X_i - \xi_i)^2$.

Then for $p \ge 3$ and

(40) $0 < b < 2(p-2)\min_i \sigma_i^2$,

and sufficiently large $a$ depending only on $C$,

(41) $E\sum_i\left[\left(1 - \frac{b}{\sigma_i^2\left[a + \sum_j X_j^2/\sigma_j^2\right]}\right)X_i - \xi_i\right]^2 < E\sum_i (X_i - \xi_i)^2 = \sum_i \sigma_i^2$.

It would be desirable to obtain explicit formulas for estimators one can seriously recommend in the last two cases considered above.

3. Formulation of the general problem of admissible estimation with quadratic loss

Let $\mathcal{Z}$ be a set (the sample space), $\mathcal{B}$ a $\sigma$-algebra of subsets of $\mathcal{Z}$, and $\lambda$ a $\sigma$-finite measure on $\mathcal{B}$. Let $\Theta$ be another set (the parameter space), $\mathcal{C}$ a $\sigma$-algebra of subsets of $\Theta$, and $p(\cdot|\cdot)$ a nonnegative valued $(\mathcal{B} \times \mathcal{C})$-measurable function on $\mathcal{Z} \times \Theta$ such that for each $\theta \in \Theta$, $p(\cdot|\theta)$ is a probability density with respect to $\lambda$, that is,

(42) $\int p(z|\theta)\, d\lambda(z) = 1$.

Let $A$, the action space, be the $k$-dimensional real coordinate space, $\alpha$ a $\mathcal{C}$-measurable function on $\Theta$ to the set of positive semidefinite symmetric $k \times k$ matrices, and $\eta$ a $\mathcal{C}$-measurable function on $\Theta$ to $A$. We observe $Z$ distributed over $\mathcal{Z}$ according to the probability density $p(\cdot|\theta)$ with respect to $\lambda$, where $\theta$ is unknown, then choose an action $a \in A$ and suffer the loss

(43) $L(\theta, a) = [a - \eta(\theta)]'\,\alpha(\theta)\,[a - \eta(\theta)]$.

An estimator $\varphi$ of $\eta(\theta)$ is a $\mathcal{B}$-measurable function on $\mathcal{Z}$ to $A$, the interpretation being that after observing $Z$ one takes action $\varphi(Z)$, or in other words, estimates $\eta(\theta)$ to be $\varphi(Z)$. The risk function $\rho(\cdot, \varphi)$ is the function on $\Theta$ defined by

(44) $\rho(\theta, \varphi) = E_\theta[\varphi(Z) - \eta(\theta)]'\,\alpha(\theta)\,[\varphi(Z) - \eta(\theta)]$.

Given a $\sigma$-finite measure $\Pi$ on $\mathcal{C}$, we call $\varphi$ almost admissible with respect to $\Pi$ if any estimator whose risk is nowhere greater than that of $\varphi$ has strictly smaller risk only on a set of $\Pi$-measure 0. For $C \in \mathcal{C}$ let $\mathcal{Q}(C)$ denote the class of $\mathcal{C}$-measurable functions $q$ with $\int q\, d\Pi < \infty$ and $q(\theta) \ge 1$ for $\theta \in C$, and for such $q$ let $\varphi_{(q,\Pi)}$ be the Bayes estimator with respect to the prior measure $q\, d\Pi$, that is, the estimator minimizing $\int q(\theta)\, d\Pi(\theta)\, E_\theta[\psi(Z) - \eta(\theta)]'\alpha(\theta)[\psi(Z) - \eta(\theta)]$ over all estimators $\psi$. A sufficient condition for $\varphi$ to be almost admissible with respect to $\Pi$ is that, for every $C \in \mathcal{C}$ with $\Pi(C) > 0$,

(49) $\inf_{q \in \mathcal{Q}(C)} \int q(\theta)\, d\Pi(\theta)\, E_\theta[\varphi(Z) - \varphi_{(q,\Pi)}(Z)]'\,\alpha(\theta)\,[\varphi(Z) - \varphi_{(q,\Pi)}(Z)] = 0$.

For suppose (49) held but $\varphi$ were not almost admissible. Then there would exist an estimator $\varphi_1$, an $\varepsilon > 0$, and a set $C$ in $\mathcal{C}$ such that $\Pi(C \cap S_\varepsilon) > 0$, where $S_\varepsilon$ is the set of $\theta$ for which

(51) $E_\theta[\varphi_1(Z) - \eta(\theta)]'\,\alpha(\theta)\,[\varphi_1(Z) - \eta(\theta)] \le E_\theta[\varphi(Z) - \eta(\theta)]'\,\alpha(\theta)\,[\varphi(Z) - \eta(\theta)] - \varepsilon$.

Then, for any $q \in \mathcal{Q}(C)$,

(52) $\int q(\theta)\, d\Pi(\theta)\, E_\theta[\varphi_1(Z) - \eta(\theta)]'\,\alpha(\theta)\,[\varphi_1(Z) - \eta(\theta)] \le \int q(\theta)\, d\Pi(\theta)\, E_\theta[\varphi(Z) - \eta(\theta)]'\,\alpha(\theta)\,[\varphi(Z) - \eta(\theta)] - \varepsilon \int_{C \cap S_\varepsilon} q(\theta)\, d\Pi(\theta)$.

It follows that

(53) $\varepsilon\,\Pi(C \cap S_\varepsilon) \le \varepsilon \int_{C \cap S_\varepsilon} q(\theta)\, d\Pi(\theta) \le \int q(\theta)\, d\Pi(\theta)\, E_\theta[\varphi(Z) - \eta(\theta)]'\,\alpha(\theta)\,[\varphi(Z) - \eta(\theta)] - \int q(\theta)\, d\Pi(\theta)\, E_\theta[\varphi_{(q,\Pi)}(Z) - \eta(\theta)]'\,\alpha(\theta)\,[\varphi_{(q,\Pi)}(Z) - \eta(\theta)] = \int q(\theta)\, d\Pi(\theta)\, E_\theta[\varphi(Z) - \varphi_{(q,\Pi)}(Z)]'\,\alpha(\theta)\,[\varphi(Z) - \varphi_{(q,\Pi)}(Z)]$,

the last equality being the usual decomposition of Bayes risk about the Bayes estimator for quadratic loss, and this contradicts (49).

4. Admissibility of Pitman's estimator for location parameters in certain low dimensional cases

The sample space $\mathcal{Z}$ of section 3 is now of the form $\mathcal{X} \times \mathcal{Y}$, where $\mathcal{X}$ is a finite-dimensional real coordinate space and $\mathcal{Y}$ is arbitrary, and the $\sigma$-algebra $\mathcal{B}$ is a product $\sigma$-algebra $\mathcal{B} = \mathcal{B}_1 \times \mathcal{B}_2$, where $\mathcal{B}_1$ consists of the Borel sets in $\mathcal{X}$ and $\mathcal{B}_2$ is an arbitrary $\sigma$-algebra of subsets of $\mathcal{Y}$. Here $\lambda$ is the product measure $\lambda = \mu \times \nu$, where $\mu$ is Lebesgue measure on $\mathcal{B}_1$ and $\nu$ is an arbitrary probability measure on $\mathcal{B}_2$. The parameter space $\Theta$ and the action space $A$ coincide with $\mathcal{X}$. The loss function is

(54) $L(\theta, a) = (a - \theta)'(a - \theta)$.

We observe $(X, Y)$ whose distribution, for given $\theta$, is such that $Y$ is distributed according to $\nu$ and the conditional density of $X - \theta$ given $Y$ is $p(\cdot|Y)$, a known density. We assume

(55) $\int p(x|y)\, dx = 1$

and

(56) $\int x\, p(x|y)\, dx = 0$

for all $y$. Condition (56) is introduced only for the purpose of making the natural (Pitman) estimator equal to $X$ (see [6] or [16]). The condition (49) for the natural estimator $X$ to be almost admissible becomes

(57) $\inf_\pi \int d\nu(y)\int dx\,\frac{\left\|\int \eta\,\pi(x - \eta)\, p(\eta|y)\, d\eta\right\|^2}{\int \pi(x - \eta)\, p(\eta|y)\, d\eta} = 0$,

the infimum being taken over densities $\pi$ with respect to Lebesgue measure on $\mathcal{X}$ with $\pi(x) \ge 1$ for $\|x\| \le 1$. This is derived in formula (63) below.
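It may help to see how the quantity in (57) behaves in different dimensions. The following sketch is purely illustrative and not from the paper: it takes $p(\cdot|y)$ to be the standard normal density on $R^p$ and uses the Gaussian near-flat priors $\pi_\sigma(x) = \exp(-\|x\|^2/2\sigma^2)$ (our choice, which satisfies the constraint on the unit ball only up to an inessential normalization; the priors used in the proofs below are different). For this choice the integral in (57) has the closed form $p\,(2\pi\sigma^2)^{p/2}/(1+\sigma^2)$, which tends to 0 as $\sigma \to \infty$ when $p = 1$, stalls at the positive constant $4\pi$ when $p = 2$ (which is why the two-dimensional proof below needs more slowly decreasing priors), and diverges when $p \ge 3$, consistent with the inadmissibility results of section 2.

```python
import numpy as np

def blyth_integral(p, sigma):
    """Closed form of the integral in (57) for standard normal p(.) on R^p and
    pi_sigma(x) = exp(-||x||^2 / (2 sigma^2)) -- an illustrative Gaussian choice."""
    return p * (2 * np.pi * sigma ** 2) ** (p / 2) / (1 + sigma ** 2)

for p in (1, 2, 3):
    print(f"p={p}:", [round(blyth_integral(p, s), 3) for s in (10.0, 100.0, 1000.0)])

# Direct numerical check of the p = 1 closed form by quadrature on a grid.
sigma = 10.0
dx, deta = 0.05, 0.02
x = np.arange(-100, 100 + dx, dx)
eta = np.arange(-12, 12 + deta, deta)
phi = np.exp(-eta ** 2 / 2) / np.sqrt(2 * np.pi)            # density p(eta)
pi = np.exp(-(x[:, None] - eta) ** 2 / (2 * sigma ** 2))    # pi_sigma(x - eta)
num = (pi * (eta * phi)).sum(axis=1) ** 2 * deta ** 2       # ||int eta pi p d eta||^2
den = (pi * phi).sum(axis=1) * deta                         # int pi p d eta
print("numeric:", (num / den).sum() * dx, " closed form:", blyth_integral(1, sigma))
```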
In [16] one of the authors proved the following theorem.

THEOREM 1. When $\dim \mathcal{X} = 1$, if in addition to the above conditions

(58) $\int d\nu(y)\left[\int x^2\, p(x|y)\, dx\right]^{3/2} < \infty$,

then $X$ is an admissible estimator of $\theta$; that is, there does not exist a function $\varphi$ such that

(59) $\int d\nu(y)\int [\varphi(x, y) - \theta]^2\, p(x - \theta|y)\, dx \le \int d\nu(y)\int x^2\, p(x|y)\, dx$

for all $\theta$, with strict inequality for some $\theta$.

This is proved by first showing that (57) holds for a suitable sequence of densities $\pi_\varepsilon$, constant near the origin and decreasing sufficiently slowly away from it (the explicit choice is given in [16]), so that $X$ is almost admissible, and then proving that this implies that $X$ is admissible. It is not clear that the condition (58) is necessary.

We shall sketch the proof of a similar but more difficult theorem in the two-dimensional case.

THEOREM 2. When $\dim \mathcal{X} = 2$, if in addition to (55) and (56), for some $\delta > 0$,

(61) $\int d\nu(y)\int \|x\|^2\,\log^{1+\delta}(1 + \|x\|^2)\, p(x|y)\, dx < \infty$,

then $X$ is an admissible estimator of $\theta$; that is, there does not exist a function $\varphi$ on $\mathcal{X} \times \mathcal{Y}$ to $\mathcal{X}$ such that

(62) $\int d\nu(y)\int \|\varphi(x, y) - \theta\|^2\, p(x - \theta|y)\, dx \le \int d\nu(y)\int \|x\|^2\, p(x|y)\, dx$

for all $\theta$, with strict inequality for some $\theta$.

SKETCH OF PROOF. First we show that (57) implies (49) in the present case. We take $\Pi$ to be Lebesgue measure on $\mathcal{X}$ and, in accordance with the notation of the present section, write $\pi$ instead of $q$. Because of the invariance of our problem under translation we need prove (49) only for $C$ equal to the solid sphere of radius 1 about the origin. Then the integral appearing in (49) is

(63) $\int \pi_\varepsilon(\xi)\, d\xi\, E_\xi\left\|X - \frac{\int \tilde\xi\, p(X - \tilde\xi|Y)\,\pi_\varepsilon(\tilde\xi)\, d\tilde\xi}{\int p(X - \tilde\xi|Y)\,\pi_\varepsilon(\tilde\xi)\, d\tilde\xi}\right\|^2 = \int \pi_\varepsilon(\xi)\, d\xi \int d\nu(y)\int dx\, p(x - \xi|y)\left\|\frac{\int (x - \tilde\xi)\, p(x - \tilde\xi|y)\,\pi_\varepsilon(\tilde\xi)\, d\tilde\xi}{\int p(x - \tilde\xi|y)\,\pi_\varepsilon(\tilde\xi)\, d\tilde\xi}\right\|^2 = \int d\nu(y)\int dx\,\frac{\left\|\int (x - \tilde\xi)\, p(x - \tilde\xi|y)\,\pi_\varepsilon(\tilde\xi)\, d\tilde\xi\right\|^2}{\int p(x - \tilde\xi|y)\,\pi_\varepsilon(\tilde\xi)\, d\tilde\xi} = \int d\nu(y)\int dx\,\frac{\left\|\int \eta\,\pi_\varepsilon(x - \eta)\, p(\eta|y)\, d\eta\right\|^2}{\int \pi_\varepsilon(x - \eta)\, p(\eta|y)\, d\eta}$,

and it is clear that (57) implies (49).

To prove theorem 2 we take for $\pi_\varepsilon$ a spherically symmetric density which is constant on the unit sphere and decreases logarithmically outside it, with a tail of order $\|\xi\|^{-2}(\log\|\xi\|^2)^{-\gamma}$, where $1 < \gamma < 1 + \delta$, the constants being chosen so that $\pi_\varepsilon$ is continuous everywhere and continuously differentiable except at $\|\xi\| = 1$. A computation analogous to that in [16], but much more tedious, leads to a bound for the inner integral on the right side of (63),

(65) $I(y) = \int dx\,\frac{\left\|\int \eta\,\pi_\varepsilon(x - \eta)\, p(\eta|y)\, d\eta\right\|^2}{\int \pi_\varepsilon(x - \eta)\, p(\eta|y)\, d\eta}$,

which, when integrated with respect to $\nu$, tends to 0 under condition (61) as the priors spread; this establishes (57), so that $X$ is almost admissible, and the passage from almost admissibility to admissibility is carried out as in the one-dimensional case.

5. Estimation of the covariance matrix of a multivariate normal distribution

Let $X_1, \cdots, X_n$ be independently normally distributed $p$-dimensional random vectors with mean 0 and common unknown nonsingular covariance matrix $\Sigma$, where $n \ge p$, and consider the problem of estimating $\Sigma$ by $\hat\Sigma$ with the loss function

$L(\Sigma, \hat\Sigma) = \operatorname{tr} \Sigma^{-1}\hat\Sigma - \log\det \Sigma^{-1}\hat\Sigma - p$.

The problem is invariant under the transformations $X_i \to aX_i$, $\Sigma \to a\Sigma a'$, $\hat\Sigma \to a\hat\Sigma a'$, where $a$ is an arbitrary nonsingular $p \times p$ matrix. Also

(73) $S = \sum_{i=1}^n X_i X_i'$

is a sufficient statistic, and if we make the transformation $X_i \to aX_i$ then $S \to aSa'$. We may confine our attention to estimators which are functions of $S$ alone. The condition of invariance of an estimator $\varphi$ (a function on the set of positive definite $p \times p$ symmetric matrices to itself) under transformation by the matrix $a$ is

(74) $\varphi(asa') = a\,\varphi(s)\,a'$ for all $s$.

Let us look for the best estimator $\varphi$ satisfying (74) for all lower triangular matrices $a$, that is, those satisfying $a_{ij} = 0$ for $j > i$. We shall find that this $\varphi(S)$ is not a scalar multiple of $S$. At the end of the section we shall sketch the proof that such an estimator is minimax. Similar results hold for the quadratic loss function

(75) $L^*(\Sigma, \hat\Sigma) = \operatorname{tr}(\Sigma^{-1}\hat\Sigma - I)^2$,

but I have not been able to get an explicit formula for a minimax estimator in this case.
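The following short sketch (an illustration added for concreteness, not part of the paper) evaluates the loss function just introduced and checks numerically its invariance, $L(a\Sigma a', a\hat\Sigma a') = L(\Sigma, \hat\Sigma)$; the particular matrices are random assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4

def stein_loss(Sigma, Sigma_hat):
    """L(Sigma, Sigma_hat) = tr(Sigma^{-1} Sigma_hat) - log det(Sigma^{-1} Sigma_hat) - p."""
    M = np.linalg.solve(Sigma, Sigma_hat)
    return np.trace(M) - np.linalg.slogdet(M)[1] - Sigma.shape[0]

def random_spd(p):
    A = rng.standard_normal((p, p))
    return A @ A.T + p * np.eye(p)          # comfortably positive definite

Sigma, Sigma_hat = random_spd(p), random_spd(p)
a = rng.standard_normal((p, p))             # nonsingular with probability 1
print(stein_loss(Sigma, Sigma_hat))
print(stein_loss(a @ Sigma @ a.T, a @ Sigma_hat @ a.T))   # same value: invariance
```

This invariance is what reduces the risk computation below to the single case $\Sigma = I$.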
Putting $s = I$ in (74) we find

(76) $\varphi(aa') = a\,\varphi(I)\,a'$.

When we let $a$ range over the set of diagonal matrices with $\pm 1$ on the diagonal, this yields

(77) $\varphi(I) = a\,\varphi(I)\,a'$,

which implies that $\varphi(I)$ is a diagonal matrix, say $\Delta$, with $i$th diagonal element $\Delta_i$. This together with (74) determines $\varphi$, since any positive definite symmetric matrix $S$ can be factored as

(78) $S = KK'$

with $K$ lower triangular (with positive diagonal elements), and we then have

(79) $\varphi(S) = K\Delta K'$.

Since the group of lower triangular matrices operates transitively on the parameter space, the risk of an invariant procedure $\varphi$ is constant. Thus we compute the risk only for $\Sigma = I$. We then have

(80) $\rho(I, \varphi) = E[\operatorname{tr} \varphi(S) - \log\det \varphi(S) - p] = E[\operatorname{tr} K\Delta K' - \log\det K\Delta K' - p] = E\operatorname{tr} K\Delta K' - \log\det \Delta - E\log\det S - p$.

But

(81) $E\operatorname{tr} K\Delta K' = \sum_i \Delta_i \sum_k E K_{ki}^2 = \sum_i \Delta_i\,(n + p - 2i + 1)$,

since the elements of $K$ are independent of each other, the $i$th diagonal element being distributed as $\chi_{n-i+1}$ and the elements below the diagonal normal with mean 0 and variance 1. Also, for the same reason,

(82) $E\log\det S = \sum_i E\log \chi^2_{n-i+1}$.

It follows that

(83) $\rho(\Sigma, \varphi) = \rho(I, \varphi) = \sum_i \left[(n + p - 2i + 1)\Delta_i - \log \Delta_i\right] - \sum_i E\log \chi^2_{n-i+1} - p$.

This attains its minimum value of

(84) $\rho(\Sigma, \varphi^*) = \sum_i \left[\log(n + p - 2i + 1) - E\log \chi^2_{n-i+1}\right]$

when

(85) $\Delta_i = \frac{1}{n + p - 2i + 1}$.

We have thus found the minimax estimator in a class of estimators which includes the natural estimators (multiples of $S$), and it is different from the natural estimators. Since the group of lower triangular matrices is solvable, it follows from the results of Kiefer [9] that the estimator given by (79) and (85) is minimax. However, it is not admissible. One can get a better estimator by averaging this estimator and one obtained by permuting the coordinates, applying the method given above, and then undoing the permutation. It must be admitted that the problem is somewhat artificial.
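The estimator of (79) and (85) is straightforward to implement via the Cholesky factorization. The following Monte Carlo sketch (illustrative only; $p = 3$ and $n = 10$ are arbitrary choices) compares its constant risk with that of the natural unbiased estimator $S/n$, computing losses at $\Sigma = I$, which suffices by invariance.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 3, 10, 20_000

def stein_loss_at_identity(M_hat):
    # Loss at Sigma = I: tr(M_hat) - log det(M_hat) - p.
    return np.trace(M_hat) - np.linalg.slogdet(M_hat)[1] - p

delta = 1.0 / (n + p - 2 * np.arange(1, p + 1) + 1)    # Delta_i of (85)
loss_nat, loss_js = [], []
for _ in range(reps):
    X = rng.standard_normal((n, p))
    S = X.T @ X                                        # Wishart(n, I), as in (73)
    K = np.linalg.cholesky(S)                          # S = KK', K lower triangular (78)
    loss_nat.append(stein_loss_at_identity(S / n))     # natural estimator
    loss_js.append(stein_loss_at_identity(K @ np.diag(delta) @ K.T))   # (79), (85)
print(f"risk of S/n        : {np.mean(loss_nat):.4f}")
print(f"risk of K Delta K' : {np.mean(loss_js):.4f}")
```

The second risk should come out strictly smaller, confirming that the natural estimator, despite its constant risk, is not minimax for this loss.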
6. Some more unsolved problems

In section 4 several conjectures and unsolved problems concerning estimation of location parameters have been mentioned. Some other problems are listed below. Of course, one can combine these in many ways to produce more difficult problems.

(i) What are the admissible estimators of location parameters? In particular, what are the admissible minimax estimators of location parameters?

(ii) What results can be obtained for more general loss functions invariant under translation?

(iii) For a problem invariant under a group other than a translation group, when is the best invariant estimator admissible? In particular, is Pitman's estimator admissible when both location and scale parameters are unknown?

(iv) What can we say in the case of more complicated problems where there may be no natural estimator? For example, consider the problem in which we observe $S_1, \cdots, S_n$ independently distributed as $\sigma_i^2 \chi_k^2$ and want to estimate $\sigma_1^2, \cdots, \sigma_n^2$ by $\hat\sigma_1^2, \cdots, \hat\sigma_n^2$ with loss function

(86) $L(\sigma_1^2, \cdots, \sigma_n^2;\, \hat\sigma_1^2, \cdots, \hat\sigma_n^2) = \sum_{i=1}^n \frac{(\hat\sigma_i^2 - \sigma_i^2)^2}{\sigma_i^4}$.

It is clear that

(87) $\hat\sigma_i^2 = \frac{1}{k+2}\, S_i$

is a minimax estimator, since the risk for this estimator is constant and it is minimax when all except one of the $\sigma_i$ are known (see Hodges and Lehmann [7]). But this estimator is clearly very poor if $k$ is small and $n$ is large. This problem arises in the estimation of the covariances in a finite stationary circular Gaussian process.

REFERENCES

[1] A. C. AITKEN, "On least squares and linear combination of observations," Proc. Roy. Soc. Edinburgh, Sect. A, Vol. 55 (1935), pp. 42-48.
[2] D. BLACKWELL, "On the translation parameter problem for discrete variables," Ann. Math. Statist., Vol. 22 (1951), pp. 393-399.
[3] C. BLYTH, "On minimax statistical decision procedures and their admissibility," Ann. Math. Statist., Vol. 22 (1951), pp. 22-42.
[4] F. N. DAVID and J. NEYMAN, "Extension of the Markoff theorem on least squares," Statist. Res. Mem., Vol. 1 (1938), pp. 105-116.
[5] R. A. FISHER, "On the mathematical foundations of theoretical statistics," Philos. Trans. Roy. Soc. London, Ser. A, Vol. 222 (1922), pp. 309-368.
[6] M. A. GIRSHICK and L. J. SAVAGE, "Bayes and minimax estimates for quadratic loss functions," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, Berkeley and Los Angeles, University of California Press, 1951, pp. 53-73.
[7] J. L. HODGES, JR., and E. L. LEHMANN, "Some applications of the Cramér-Rao inequality," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, Berkeley and Los Angeles, University of California Press, 1951, pp. 13-22.
[8] S. KARLIN, "Admissibility for estimation with quadratic loss," Ann. Math. Statist., Vol. 29 (1958), pp. 406-436.
[9] J. KIEFER, "Invariance, minimax sequential estimation, and continuous time processes," Ann. Math. Statist., Vol. 28 (1957), pp. 573-601.
[10] H. KUDO, "On minimax invariant estimators of the transformation parameter," Nat. Sci. Rep. Ochanomizu Univ., Vol. 6 (1955), pp. 31-73.
[11] E. L. LEHMANN, Testing Statistical Hypotheses, New York, Wiley, 1959, pp. 231 and 338.
[12] A. MARKOV, Calculus of Probability, St. Petersburg, 1908 (2nd ed.). (In Russian.)
[13] E. J. G. PITMAN, "Location and scale parameters," Biometrika, Vol. 30 (1939), pp. 391-421.
[14] C. STEIN, "Inadmissibility of the usual estimator for the mean of a multivariate normal distribution," Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Berkeley and Los Angeles, University of California Press, 1956, Vol. 1, pp. 197-206.
[15] ——, "A necessary and sufficient condition for admissibility," Ann. Math. Statist., Vol. 26 (1955), pp. 518-522.
[16] ——, "The admissibility of Pitman's estimator for a single location parameter," Ann. Math. Statist., Vol. 30 (1959), pp. 970-979.
[17] ——, "Multiple regression," Contributions to Probability and Statistics, Essays in Honor of Harold Hotelling, Stanford, Stanford University Press, 1960, pp. 424-443.
[18] A. WALD, "Contributions to the theory of statistical estimation and testing hypotheses," Ann. Math. Statist., Vol. 10 (1939), pp. 299-326.
[19] R. A. WIJSMAN, "Random orthogonal transformations and their use in some classical distribution problems in multivariate analysis," Ann. Math. Statist., Vol. 28 (1957), pp. 415-423.