Bootstrap methods for nonparametric curve
estimation
Ricardo Cao (Universidade da Coruna, Spain) Masaryk University (Brno, Czech Republic)
April 13-14, 2011
Contents
1 Introduction to the bootstrap 5
1.1 Motivation............................ 6
1.2 The bootstrap method ..................... 8
1.3 Use of the bootstrap ...................... 11
1.4 An example: the median .................... 12
2 Introduction to nonparametric curve estimation 14
2.1   Nonparametric density estimation................ 14
2.1.1    Background....................... 14
2.1.2 Bias, variance and mean squared error......... 16
2.1.3 Mean integrated squared error (MISE)......... 18
2.2   Nonparametric regression estimation.............. 19
3   Bootstrap methods for density estimation 21
3.1   Bootstrap approximation for the sampling distribution of the
Parzen-Rosenblatt kernel estimator............... 21
3.1.1 Asymptotic distribution of the Parzen-Rosenblatt estimator .......................... 22
3.1.2 Plug-inapproximation.................. 23
3.1.3 Bootstrap approximation................ 24
3.2   Bootstrap methods for bandwith selection........... 26
3.2.1 Asymptotic expression for the optimal bandwidth   ... 26
3.2.2 Bootstrap version of MISE.............. 27
3.2.3 Closed expression for MISE* ............. 29
3.2.4 Pilot bandwidth choice................. 30
3.2.5 Asymptotic results.................... 31
3.2.6 Gaussiankernelcase................... 31
3.2.7 Comparisonwithotherbandwidthselectors ...... 33
4   Bootstrap methods for nonparametric regression estimation 35
4.1 Asymptotic distribution of the Nadaraya-Watson estimator  . . 35
4.2 Plug-in approximation...................... 37
4.3 Wild bootstrap.......................... 38
4.4 Smoothed bootstrap in the explanatory variable........ 42
4.5 Comparison of convergence rates................ 47
1   Introduction to the bootstrap
The basic ideas about the bootstrap methods are introduced in this section.
1.1 Motivation
Let us consider a simple random sample (SRS), (X1,X2,...,Xn), from a distribution function F. Consider the problem of constructing a confidence interval for the mean, M,of F, with known standard deviation, a. The classical statistic used for this aim is the well-known standardized diference of the sample mean and the population mean
T=n 1 (n pn=1 xi - m) _
d
If F is a normal distribution then T = N (0,1). Since the distribution of T is known (and it is tabulated) it can be used to construct exact confidence
d
intervals for \±- Even when the distribution of F is not normal then T — N (0, 1), but for samples of moderate or small size the approximation of the distribution of T by a standard normal distribution may be poor.
If F is the exponential distribution with parameter A (F (x) = 1 — exp (—Ax)) then
n d S = £ X = r (A,n),
i=1
	frequencies -i on		
			
	IJU \J ./I4 0		
	1 ' 1 \J 1 120		
	/      1 £- \J I inn		
	J 80		—•—frequencies
	/ 60		
	1           4 0		
			
			
-6                  -4                  -2 (		24	
Figure 1: Distribution of T with n = 5 for an exponential distribution with A =1.
so
2 (ns - a) _ i / a2
n 2
T =-^_-<u_ = n 2[— S - A
1 1 n
A2
which is essentially a recentered and rescaled Gamma distribution with mean zero and variance one. However the distribution of T it is quite different from anormalifn is small.
The relevant question is: Can we use a different approximation for the distribution of T?
Key idea: mimic the data generating process that gave rise to T using the empirical distribution function, Fn,instead of F.
1.2   The bootstrap method
Let us consider a simple random sample X = (Xi,X2,...,Xn) from a distribution function F, and a statistic of interest, R = R(X,F). An example of such a statistic is that one in the previous section:
R (X F) = n 1 (n Pn=iX -   = n2 (n pn=i X- J xdF (x)^
7
J x2dF (x) - (J xdF (x))2
1/2
The main idea of the the bootstrap method (see Efron (1979)) is to approximate the sampling distribution of R by the resampling distribution of
R* = R [X*, Fj ,where F is some estimator of the underlying cdf, F, and X* is a random resample obtained from F, i.e., X* = (X*,X2,...,X*^ where, conditionally on X,the X* are iid observations coming from F.
A general algorithm for the bootstrap method proceeds as follows:
1. Given the sample X = (Xi,X2,...,Xn), construct some estimator, F\of thetruecdf, F.
2. Draw bootstrap resamples X* = (X*,X|,...,X*) from F.
3. Construct R* = R (X*, F).
4. Approximate the sampling distribution of R = R (X,F) by the resampling distribution of R*.
It is very uncommon that the resampling distribution of R* could be computed in a closed form. Nevertheless, one can approximate this resampling distribution by Montecarlo just repeating Steps 2-3 above a large number of times (say B) and considering B replications of the bootstrap version of our statistic: R*1, R*2,..., R*B. The empirical cdf of these bootstrap values
is then an approximation of the resampling distribution of R*, which is an approximation of the sampling distribution of R.
A classical way to estimate F (in a nonparametric way) is to use Fn, the empirical cdf. This leads to the wellknown na'i've bootstrap given by P* (X* = Xi) = -I
n,for i = 1, 2,...,n. In parametric setups (i.e. F G {Fq/6 G ©})itisnatural to resample from Fg,where 0 is an estimator of °9. Thisisthe wellknown parametric bootstrap.
1
B
j=1
Some relevant monographs on the bootstrap are Hall (1992), Efron and Tib-shirani (1993), Shao and Tu (1995) and Davison and Hinkley (1997).
1.3   Use of the bootstrap
The bootstrap method is often used for
• estimating the sampling distribution of a statistic.
• estimating the bias, variance or MSE of some estimator.
• compute confidence intervals for some parameter 0.
• perform hypothesis tests for some parameter 0.
1.4   An example: the median
Let us consider a sample X = (Xi,X2,...,Xn) from a distribution F with median 6 (F) = F—1 ^2)- Suppose that the sample size is odd: n = 2m — 1. By considering the ordered statistics X(i) <X(2) < <X(n),the sample median is just the central ordered statistic: 0 = X(m). In order to construct confidence intervals for 6 we consider the statistic R = R (X,F) = X(m) — 6. The bootstrap version of R is R* = R (X*,Fn) = X(*m) — 6 (Fn) = X(*m) — X(m),where X* is a random resample coming from Fn.
In this setup it is possible to obtain a closed expression for the bootstrap distribution (this is very exceptional):
P* (X*m) < x)   =  P* (# {X*< x/i = 1, 2,...,n} > m)
=   t (n)Fn (x)j (1 — Fn (x))n—j.
Consequently,
P* (X(*m) = X(i))
where
P* (X(*m) < X(i)) - P* (X**m) <X(i))
n — i
—  j \n
n
(Hr)j (■■
n — i + 1N
n
n
n
m^rJ—1 )J(
n — i + 1
nn
So, the bootstrap distribution of R* is
n
n
P* (R* = X(i) — X(m)) = ( .)ajn-
2   Introduction to nonparametric curve estimation
In the following subsections some nonparametric methods for density and regression estimation are introduced.
2.1   Nonparametric density estimation
2.1.1 Background
Let (Xi,X2,...,Xn) be a SRS from a population with distribution function F, absolutelly continuous, density function /. The kernel density estimator proposed by Parzen (1962) and Rosenblatt (1956) is given by
/h (x) = \ X K (X—Xl) = 1 X Kh (x - Xi),
where (u) — hK (j^j, K is a kernel function (typically a symmetric density) and h> 0 is a smoothing parameter, often called bandwidth, that regulated thesizeofthe neighbourusedfor theestimation. This estimator generalizes the wellknown histogram, more precisely the moving histogram. Choosing K as the density of a U (—1,1), the Parzen-Rosenblatt estimator results in:
n 1   (x-X- \ 1 n
i-1 i-1
# {Xi G (x — h,x + h)}
2nh
which is the relative frequency of data Xi in the interval (x — h, x + h) divided by the length of this intervalo (2h).
It is typically assumed that the kernel function, K, is nonnegative and integrates out to one:
/oo K (u) du -1. o
It is also common to assume that K is a symmetric function: K (—u) — K (u).
The choice of the function K does not have a big impact in the properties of the estimator (just in its regularity: continuity, differentiability, etc.) but the
choice of the smoothing parameter is crucial for a correct estimation. In other words, the size of the neighbourhood for the nonparametric estimation should be adecuate (not to large, not too small).
2.1.2   Bias, variance and mean squared error
Straight forward calculations lead the bias of the Parzen-Rosenblatt estimator:
=   (Kh * f )(x) - f (x), where * denotes the convolution operator:
(f * g)(x) = J f (x - y) 9 (y) dy. Using the bias expression an asymptotic expression for the bias can be obtained:
E (fh (x)) - f (x) = ^h2f" (x) + O (h^ , with dK = J t2K (t) dt.
The variance can be handled similarly:
Far (fh (x))
1
nh2 1
nh2
1
Var K
[/k(
x - y h
f (y) dy -(/ KJ^-^f (y) dy) 2
1
=  n[((k-)2 * f) (x) - ((Kh * f )(x))2
nh
1
K2) h * f (x) -- [(K- * f )(x)]
Its asymptotic expression results in:
Var (f- (x)) =        (x) - 1 f (x)2 + O f-) with cK = J K (t)2 dt.
Consequently the mean squared error of the estimator is:
MSE (/h (x))   =  J (/h (x) - / (x)) 2 dx = Sesgo      (x)) 2 + Var (/h (x))
[(Kh * / )(x) - / (x)]2 + ^[(K2) h * / (x)
- - [(Kh * / )(x)]2
n
Its asymptotic expression is:
MSE (/h (x)) = ^h4/" (x)2 + f^/ (x) - I/ (x)2 + O (h6) + O (£)
2.1.3   Mean integrated squared error (MISE)
A global error measure (not for a particular x) of the estimator is them mean integrated squared error:
MISE (/h (x)) =
J E (/h (x) - / (x))2 dx = j MSE (ih (x)) dx =
/ [(Kh * /)(x) - / (x)]2 dx + fK - 1 / [(Kh * /)(x)]2 dx.
An asymptotic expression for it is the following:
(x)) = ^h4 f f" (x)2 dx+^-1 f f (x)2 dx+O fh6) +O f-1 .
J      4     J nh   n J v   ' \n/
The negative effect of choosing a too large or too small bandwidth (h)is evident from this expression.
2.2   Nonparametric regression estimation
Let {(X1,Y1), (X2,Y2),(Xn,Yi)} de aSRS from atwo-dimensional population (X, Y), with E (|Y|) < oo. Wewould liketoestimatethe regression function of Y given X: m (x) = E (Y|x=x). The regression function can be written as:
f (x,y) ,      Jyf (x,y) dy
m (x)   =   / yf2|1(y|x) dy = / y .^j) dy =
f1 (x) f1 (x)
Jyf1|2 (x|y) f2(y) dy     V (x)
f1 (x) f1 (x)
where fi (x) is the marginal density function of X and
* (x) = /yfi|2      f2 (y) dy = e (y/i|2 (x\Y)).
The functions V (x)and fi (x) can be estimated using the kernel method:
A, (x) =    g K (^).
*h (x) =       K (^) Y„
nh i=
which give the Nadaraya-Watson kernel estimator (see Nadaraya (1964) and Watson (1964)):
*h (x) _ n pn=i Kh (x - Xi) Y
h
mh (x) = IF-TT =
fi,h (x)       n pn=i Kh (x - Xi)
Similar properties to those mentioned for the Parzen-Rosenblatt kernel density estimator can be proved for the regression estimator.
3   Bootstrap methods for density estimation
In the next two subsections two different problems in nonparametric density estimation are considered: approximating the sampling distribution of the Parzen-Rosenblatt kernel estimator (to obtain confidence intervals for the density, for instance) and bandwidth selection. The bootstrap is used for these two aims.
3.1   Bootstrap approximation for the sampling distribution of the Parzen-Rosenblatt kernel estimator
Before introducing the bootstrap in this setup the limit distribution of the Parzen-Rosenblatt estimator will be presented. Other approximations will be also considered (see Cao (1990) for futher details on these results).
3.1.1   Asymptotic distribution of the Parzen-Rosenblatt estimator
The minimal requirements for the bias and variance to tend to zero when the sample size tends to infinity are h — 0, nh — oo. Under these assumptions
(nh)1/2 (Jh (x) - / (a)) — N (B,V).
On the other hand, it can be proved that the asymptotically optimal value of h, in the sense of MSE,is h = con-1/5,with
This choice for h leads to the following values for the mean and the variance of the normal limit distribution:
In order to use the limit distribution to construct confidence intervals for f (x) one can ...
B=
V=
2c^^2dKf" (x),
1. estimate B and V and use them in the normal distribution (plug-in method).
2. design a resampling plan and use the bootstrap method.
3.1.2   Plug-in approximation
It consists in estimating B and V by means of
B   =   100/2dKf/ (x) ,
where g is a suitable bandwidth to estimate the second derivative of the density function. Using the Berry-Esseen inequality the following rate of convergence can be obtained:
sup
ze R
P
(nh)1/2 f (x) - / (x)) < z] - «>(
z - B V
Op (n-1/5)
This rate is worse than that of the theoretical normal approximation, based on the exact mean and variance (Bn and Vn):
sup
ze R
P
(nh)i/2 (fh (x) - f (x)) < z] - «>(
n
O (n-2/5)
but is not worse than that of the asymptotic normal, N (B, V), which has a rate of order OP (n-V5V
3.1.3   Bootstrap approximation
The bootstrap resampling plan is as follows:
1. Use the sample (Xi,X2,...,Xn)and a pilot bandwidth, g,to compute the Parzen-Rosenblatt estimator, fg.
2. Draw bootstrap resamples ^Xj,X2>,...,Xn) from the density fg.
3. Construct the bootstrap version of the Parzen-Rosenblatt estimator
/h (x) = K(x-^)
4. Approximate the sampling distribution of (nh)1/2      (x) — / (x)^ by means of the resampling distribution of (nh)1/2 (^fj** (x) — /g (x)).
If we were interested in the bias or the variance of /h (x) (rather than in its asymptotic distribution) then Step 4 in the previous algorithm will be replaced by computing the bootstrap version of the bias, E* (/^ (x) — /g (x)) ,or the
variance, Var* (/h (x)Y
In the previous algorithm, the bandwidth g has to be asymptotically larger than h. In fact, a reasonable choice for g is the minimizer of E Asymptotically, this minimizer has the form
1/9
(/£ (x) — /" (x)
g
5/ (x) J K00 (t)2 dt
The convergence rate for the bootstrap aproximation is given by
sup P
zeR
Op (n-2/9)
(nh)V2 (A (x) - / (*)) < Z - P* [(nh)1/2 (ih (x) - fg (x^ < z
which is better than those of the theoretical normal aproximation and the plug-in method.
3.2   Bootstrap methods for bandwith selection
3.2.1   Asymptotic expression for the optimal bandwidth
The MISE has an asymptotic expression that may be used as a criterion to obtain an optimal value for the smoothing parameter:
h
MISE (h) = AMISE (h) + O (h6) + O (h)
with
n
1
f (x) dx.
The smoothing parameter that minimizes AMISE is
1/5
hAMISE
n
dK I f" (x)2 dx
There exist plenty of methods devoted to the problem of bandwidth selection. Among them we mention plug-in methods, cross validation methods (smoothed or not) and, of course, bootstrap methods (see, for instance, Marron (1992)).
3.2.2   Bootstrap version of MISE
The basic idea consists in providing a smoothed bootstrap resampling plan to estimate MISE. We will follow the proposal by Cao (1993). It consists of the following steps:
1. Given the sample (X1,X2,...,Xn) a pilot bandwidth, g,isusedtocom-pute the Parzen-Rosenblatt kernel estimator, /g.
2. Bootstrap resamples (X*,X2, ...,Xn) are drawn from the density /g.
3. For every S> 0, the bootstrap version of the Parzen-Rosenblatt estimator is computed
JS (x) = nhg X)
4. The bootstrap version of MISE is constructed:
MISE * (S)= / E * (/S (x) — /g (x))2 dx.
5. MISE* (S) is minimized in S> 0 and the bootstrap selector is obtained:
SMlSE = argmin MISE* (S)
S>0
3.2.3   Closed expression for MISE*
It is possible to obtain a closed expression for the bootstrap version of MISE:
2
MISE* (h)  =  / [(Kh * fg) (x) - fg (x)
/ [(Kh * fg) (x)
+^ -- / l(Kh * nh n
dx
2
dx
=  nh - ^ X [(Kh * Kg) * (Kh * Kg)] (x, - Xj)
1 n
+ n2   J2    [(Kh * Kg - Kg) * (Kh * Kg - Kg)] (X* - Xj)
n2
Such a property is not very common for the bootstrap in other setups.
3.2.4   Pilot bandwidth choice
The bandwidth selection problem for the piloto smoothing parameter, g,is close related to minimizing the mean squared error of the density curvature:
2"
The asymptotical value of the pilot bandwidth, g, minimizing the previous expression is
g0
(
J K00 (t)2 dt
3.2.5   Asymptotic results
Using any deterministic piloto bandwidth with the following property -—^ =
9-90 90
O (n-1/14Yitholds
hM/SE - hM/SE hM/SE
M/SE (feM/sE - MISE (hMJSE) MISE (hM/SE)
Op (n-5/14) OP (n-5/7) .
Using somewhat more sofisticated techniques (that let the pilot bandwidth, g, depend on h), a slightly better rates can be obtained:
hM/SE
Op (n-1/2)
3.2.6   Gaussian kernel case
If the kernel function, K, is Gaussian (the density la function of a N (0, i)), then:
Kfr is the density of a N ^0,h2)
Kg is the density of a N (o,g2) • Kh * Kg is the density of a N (0,h2 + g2)
• (Kh * Kg) * (Kh * Kg) is the density of a N (o, 2h2 + 2g2)
• (Kh * Kg) * Kg is the density of a N (0,h2 + 2g2)
• Kg * Kg is the density of a N (0, 2g2)
Consequently,
c 1 n
MISE* (h)  = -K -      Y K,   2    2,1/2 [Xi - Xi) K '        nh    n3 .4=     (2h2+2g2)1/2 V   i 3>
1     n r
+n2 ij=1 [K(2h2+2g2)1/2 (Xi - X)
-2K(h2+2g2)1/2 (Xi - X) + K(2g2)1/2 (Xi - X)
3.2.7   Comparison with other bandwidth selectors
The bootstrap method presented here is very similar to the smoothed cross validation method proposed by Hall, Marron and Park (1992). In comparative simulation studies (see Cao, Cuevas and Gonzalez-Manteiga (1993)) it can be observed that this bootstrap method is very competitive with other methods for bandwidth selection. In general this bootstrap method is the one which presents a better behaviour, together with the solve-the-equation plug-in method by Sheather and Jones (1991) and the smooth cross validation.
There exist other bootstrap bandwidth selectors that exhibit a much worse behaviour. Among them we include:
• Hall (1990), that resamples from the empirical cdf, so it does not mimic the bias.
• Faraway and Jhun (1990), that choose g as the cross validation bandwidth, that results to be too small.
• Taylor (1989), that chooses g = h ,so MISE* (h) — 0, when h — oo, which produces a global minimum of MISE* that is inconsistent with
hMISE.
4   Bootstrap methods for nonparametric regression estimation
In this section, two bootstrap methods are presented for nonparametric regression. The aim is to approximate the sampling distribution of the Nadaraya-Watson estimator. The asymptotic results show the behaviour of these bootstrap methods, conditionally on the sample of the explanatory variable as well as in an unconditional sense.
4.1   Asymptotic distribution of the Nadaraya-Watson estimator
Before embarking on presenting the bootstrap in this context, it is useful to present the limit distribution of the Nadaraya-Watson estimator, given by
mh (x) = -^j-
M;    n En=1 k- (x - Xi)
As in the density case, it can be proved that the minimal conditions required for the consistency of the estimator are h — 0, nh — oo,when n — oo. Under these assumptions,
(nh)1/2 (mh (x) — m (x)) — N (B, V).
On the other hand, it can be proved that the asimptotically optimal value for h,in the sense of MSE,is of the form h = q)n—1/5.For such a bandwidth, the mean and variance for the normal limit distribution are
1 5/2     m" (x) / (x) + 2m0 (x) f (x)
2 / (x)
a2 (x)
V = CK7(xr ■
where / (x) is the marginal density function of X and a2 (x) = Var (Y|x=x) is the conditional la variance of Y given X = x.
As for the density case, to construct confidence intervals for m (x)wemay
1. Estimate B and V and use them in the corresponding normal distribution (plug-in method).
2. Design a resampling plan using the bootstrap method.
The rates of convergence for the approximation of the (conditional or unconditional) distribution of the statistic to the normal limit distribution are:
sup
zR
PYIx [(nh)1/2 (mh (x) - m (x)) < z
sup
zR
P
1 /2
(nh) /  (mh (x) - m (x)) < z
0
0
z - B z - B
~1V~
Op (n-1/5) O (n-2/5) ,
where PY|x (B) denotes P (b|x
)
4.2   Plug-in approximation
The plug-in approximation consists in estimating B and V using suitable estimators of / (x), /0 (x), m (x), m0 (x), m" (x)and a2(x). For any of these functions one could use bandwidth selectors intended to approximate the optimal bandwidths for every one (this is a rather tedious process). Using such
an aproach, estimators for the bias, B, and the variance, "\/,can be obtained.   It may be proved for these estimators that II — B = Op (ji—2/9^j
and V — V = OP (n—2/5). Conseq uently the following (conditional and unconditional) convergence rates can be proved for the plug-in approximation:
sup
ze R
PY\x [(nh)1/2 (mh (x) — m (x)) < z
sup
P
(nh)1/2 (rhh (x) — m (x)) < z
0
0
z — B
Vh
z — II
Vh
OP (n—1/5 OP (n—2/9'
The firstrateisworse than andthe second oneisthe same as theratefor the theoretical normal limit approximation (see Cao (1991)).
4.3   Wild bootstrap
This bootstrap resampling method, proposed by Wu (1986) and estudied by Hardle and Marron (1991), proceeds as follows:
1. Using the Nadaraya-Watson estimator of m (x) and using the initial smoothing parameter, h, the residuals are constructed 0 = Y - m- (Xi), i =
1, 2,...,n.
2. For every index i = 1, 2,...,n, and conditionally on the observed sample, {(X1,Y[), (X2,Y2)(Xn,Yn)}, a bootstrap residual, g*,is drawn from a probability distribution fulfilling E* (g*) =0, E* (g**2) = g2 and
E* (g*3) =g3. Although the third moment condition is not strictly necessary, it is useful to prove the asymptotic validity of the method.
3. Using a pilot bandwidth, g, asymptotically larger than h (i.e. g/h —> o ), bootstrap versions of the observations from the response variable are drawn: Yi* = mOg (Xi) + g*, i = 1, 2,...,n.
4. The bootstrap resample    X1,Y1* , X2,Y2* ,...,(Xn,Yn*)   is used to construct a bootstrap version of the Nadaraya-Watson estimator:
mh (x) = -
hK)    npn=1 k-(x - Xi)
5. The sampling distribution of (nh)1/2 (mh (x) — m (x)) is approximated by the resampling distribution of (nh)1/2 (^fhh (x) — rhg (x)^.
Step 2 is usually carried out by considering a random variable, V*,fulfilling E* (V*) = 0, E* (V*2) = 1 and E* (V*3) = 1, drawing a sample of size n
from it, (vf* ,Vn* ,...,Vj*^j ,and then definining e* = hiV* for i = 1, 2,...,n.
A common choice for the distribution of V* is the discrete distribution that gives positive probability to only two points: P* (V* = a) = p and P* (V* = b) = 1 — p. The distribution can be obtained as the solution of the system of three equations given by the first three moments:
ap + b (1 — p) = 0, a2p + b2(1 — p) = 1, a3p + b3(1 — p)   = 1.
This gives rise to the so called wild bootstrap (or golden section bootstrap), with a = 1-21/2, b = 1+21/2, p = 5+51/2,i.e.
1 - 51/2 \        5 + 51/2 P * (V * =--1 =
P * V
2    ; 10
1 + 51/2 \ ^51/2
2 10
The selection of the pilot bandwidth g, appearing in Step 3, is strongly linked to the estimation of m00 (x), since this is the critical term to estimate B and V. Taking a pilot bandwidth of optimal order in this sense, go ' don-1/9, the following (conditional and unconditional) rates of convergence for the wild
bootstrap approximation can be obtained:
sup
ze R
PYlx [(nh)1/2 (mh (x) - m (x)) < z
-P* [(nh)1/2 (mh (x) - mg (x)) < z =  Op (n-2/9) ,
sup
zR
r      i /o
P (nh) /  (mh (x) - m (x)) < z OP (n-1/5) .
P* [(nh)1/2 (mh (x) - mg (x)) < z
4.4   Smoothed bootstrap in the explanatory variable
The idea of this resampling method is to consider the variability coming from the explanatory variable (in the wild bootstrap this part of the resampling is kept fixed) and also to allow the resampling distribution of Y*|x*=x not to be degenerate (as it is in the two-dimensional naive bootstrap).
The resampling plan, proposed by Cao and Gonzalez-Manteiga (1993), consists of the following steps:
1. Given the sample {(X^Y.), (X2,Y2),      (Xn,Yi)},some estimator of the joint distribution function of (X, Y) is constructed:
1 n r x
Fg (X, y) = 1{Yi<y} K (t - Xi) dt.
n
i=1
It resembles the empirical cumulative distribution function in the response variable and the smoothed distribution function in the expanatory variable.
2. Bootstrap resamples, j ^X*,Y*^ , (x*,Y*) ,      (Xn, Y*)}, are drawn from the distribution Fg (x,y).
3. A bootstrap version of the Nadaraya-Watson estimator is constructed:
rh-h (x) =
4. The resampling distribution of (nh)1/2       (x) - rhg (x)^ is used to approximate sampling distribution of the statistic: (nh) /  (rhh (x) - m (x)).
The optimal pilot bandwidth, g,isalsoofordern—1/9, i.e. asymptotically larger than h.
The two-dimensional distribution used for the resampling mechanismo in Step
2, Fg (x,y), can be replaced by a smoothed distribution in the two variables:
This is equivalent to resample from the two-dimensional density
1n
fg (x, y) =-£ Kg (x — Xi) Kg (y — Yi), ni—1
which is the Parzen-Rosenblatt kernel estimator of the two-dimensional variable (X'Y).
Straight forward calculations can be used to prove that if (X*,Y*)has distri-
bution Fg (x,y), then,
• X* has bootstrap marginal density fg (x).
• The bootstrap marginal distribution of Y* is the empirical cdf of the Yi:
FnY (y) = n pn=i l{Yi<y}-
• The regression function of the bootstrap resampling plan coincides with the Nadaraya-Watson estimation with bandwidth g:
E (Y*\X*=x)= ™g (x) .
• In fact, the conditional distribution of Y*\x*=x is
i Pn=i Kg (x - Xi) 1{Yi<y}
Fg (y\x) =
n
n En=i Kg (x - Xi) '
which is the Nadaraya-Watson kernerl estimador of the conditional distribution function.
As a consequence of the last remark it is easy to design a method to simulate bootstrap values of (X*,Y*), asrequiredinStep2. Todothis, itisenough to simulate X* from the Parzen-Rosenblatt estimator constructed with the
sample of the explnatory variable (this is just the classical smooth bootstrap) andthensimulate Y* from the discrete distribution that gives, to every datum Yi, the following probability
W (X *) =      Kg (X * — Xi)   , , i = 1, 2,...,n.
n En=1 Kg (x — Xj) '
The convergence rates of the bootstrap aproximation given by this method are:
sup
zeR
—PY *Ix*
PY|x [(nS)1/2 (mS (x) — m (x)) < z
(nS)1/2 (mfr (x) — mg (x)) < z
Op (n—2/9) , en probabilidad P,
sup
zR
P
OP (n
(nS)1/2       (x) — m (x)) < z — P * (nS)1/2 (mfr (x) — mg (x)) < z
2/9l .
4.5   Comparison of convergence rates
The following table collects a summary of the convergence rates obtained with all the approximations considered:
Approximation	conditional	unconditional
Normal limit (theoretical)	OP (n-1/5)	O (n-2/5)
Plug-in	OP (n-1/5)	OP (n-2/9;)
Wild bootstrap	OP (n-2/9)	OP (n-1/5)
Smooth bootstrap in the explanatory variable	OP* (n-2/9) in probability P	OP (n-2/9)
Apart from the theoretical normal limit (useless in practice), the convergence rates of the smooth bootstrap in the explanatory variable are equal to or better than (both conditionally and unconditionally) the rest of the methods. In the conditional setting the two bootstrap resmpling plans are the ones which providethebestconvergencerate(n-2/9,comparedton-1/5 for the plugin approximation).   In the unconditional setup the smooth bootstrap in the
explanatory variable and the plug-in approximation exhibit the best rate (n 2/9, compared to n-i/5 for the wild bootstrap).
References
Cao, R. (1990). Ordenes de convergencia para las aproximaciones normal y bootstrap en la estimacion no parametrica de la funcion de densidad. Trabajos de Estadfstica, 5, 23-32.
Cao, R. (1991). Rate of convergence for the wild bootstrap in nonparametric regression. Ann. Statist 19, 2226-2231.
Cao, R. (1993). Bootstrapping the mean integrated squared error. Jr. Mult. Anal. 45, 137—160.
Cao, R., Cuevas, A. and Gonzälez-Manteiga, W. (1993). A comparative study of several smoothing methods in density estimation. Comp. Statist. Data Anal. 17, 153—176.
Cao, R. and Gonzoalez-Manteiga, W. (1993). Bootstrap methods in regression smoothing. Journal of Nonparametric Statistics 2, 379-388.
Davison, A. C. and Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge University Press, Cambridge.
Efron, B. (1979). Bootstrap methods - Another look at the jackknife. Ann. Statist 7,1-26.
Efron, B. and Tibshirani, R.J. (1993). An introduction to the bootstrap. Chapman and Hall, New York.
Faraway, J.J. and Jhun, M. (1990). Bootstrap choice of bandwidth for density estimation. Jr. Amer. Statist. Assoc. 85, 1119—1122.
Hall, P. (1990). Using the bootstrap to estimate mean squared error and select smoothing parameter in nonparametric problems. J. Multivariate Anal. 32, 177—203.
Hall, P. (1992). The bootstrap and Edgeworth expansion. Springer-Verlag, New York.
Hall, P., Marron, J.S. and Park, B. (1992). Smoothed cross-validation. Probab. Theor. Rel. Fields 92, 1—20.
Hardle, W. and Marron, J. S. (1991). Bootstrap simultaneous error bars for nonparametric regression. Ann. Statist. 19, 778—796.
Marron, J.S. (1992). Bootstrap bandwidth selection. In Exploring the limits of the bootstrap, LePage, R. and Billard, L. eds., pp. 249—262. New York: Wiley.
Nadaraya, E.A. (1964). On estimating regression.  Theor. Probab. Appl. 9,
141-142.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statist. 33, 1065—1076.
Rosenblatt, M. (1956). Remarks on some nonparametric estimate of a density function. Ann. Math. Statist. 27, 832—837.
Shao, J. and Tu, D.S. (1995). The jackknife and bootstrap. Springer-Verlag, New York.
Sheather, S.J. and Jones, M.C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Jr.Royal Statist.Soc.Ser.B 53, 683—690.
Taylor, C. C. (1989). Bootstrap choice of the smoothing parameter in kernel density estimation. Biometrika 76, 705—712.
Watson, G.S. (1964). Smooth regression analysis. Sankhyä Ser. A 26,359372.
Wu, C.-F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Statist. 14, 1261—1350.