An Improved Estimator for Removing
Boundary Bias in Kernel CDF Estimation
Jan Koláček
Department of Mathematics and Statistics
Faculty of Science
Masaryk University
Brno, Czech Republic
www.muni.cz
COMPSTAT’08, 28. August, Porto
Contents
• Introduction
• Kernel distribution estimators
• Boundary effects
• Proposed estimator
• Examples
• References
KERNEL ESTIMATORS
Kernel function
Let ν, k be nonnegative integers with 0 ≤ ν ≤ k − 2, k ≤ k0 and ν + k even. Let K be a real-valued function, continuous on ℝ, satisfying the conditions
$$K \in \mathrm{Lip}[-1, 1], \qquad \operatorname{supp}(K) = [-1, 1],$$
$$\int_{-1}^{1} x^j K(x)\,dx = \begin{cases} 0, & 0 \le j < k,\ j \ne \nu,\\ (-1)^{\nu}\,\nu!, & j = \nu,\\ \beta_k \ne 0, & j = k. \end{cases}$$
Such a function K is called a kernel of order k, and the class of such functions is denoted by $S_{\nu,k}$.
Table of kernels
ν   k   Kernel (on [−1, 1])
0   2   $K_{0,2}(x) = \frac{3}{4}(1 - x^2)$
0   2   $K_{0,2}(x) = \frac{15}{16}(1 - x^2)^2$
0   2   $K_{0,2}(x) = \frac{35}{32}(1 - x^2)^3$
0   4   $K_{0,4}(x) = \frac{15}{32}(x^2 - 1)(7x^2 - 3)$
2   4   $K_{2,4}(x) = \frac{105}{16}(1 - x^2)(5x^2 - 1)$
1   3   $K_{1,3}(x) = \frac{15}{4}x(x^2 - 1)$
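The moment conditions defining $S_{\nu,k}$ can be checked numerically for the kernels in the table. The following is a minimal Python sketch (NumPy assumed; the helper names `moment` and `in_class` are our own, not from the slides). It uses Gauss-Legendre quadrature, which is exact here because every integrand is a polynomial of moderate degree. $K_{1,3}$ is written with the sign that yields $\int x K(x)\,dx = (-1)^1 1! = -1$; the last check in the test confirms that $\frac{15}{4}x(1 - x^2)$ gives $+1$ instead and therefore fails the condition for ν = 1.

```python
import numpy as np
from math import factorial
from numpy.polynomial.legendre import leggauss

# 20-point Gauss-Legendre rule on [-1, 1]; exact for polynomials of degree <= 39,
# which covers every integrand x^j K(x) below.
NODES, WEIGHTS = leggauss(20)


def moment(kernel, j):
    """j-th moment int_{-1}^{1} x^j K(x) dx of a kernel supported on [-1, 1]."""
    return float(np.sum(WEIGHTS * NODES**j * kernel(NODES)))


def in_class(kernel, nu, k, tol=1e-10):
    """Check the moment conditions of S_{nu,k}; return (ok, beta_k)."""
    for j in range(k):
        target = (-1) ** nu * factorial(nu) if j == nu else 0.0
        if abs(moment(kernel, j) - target) > tol:
            return False, None
    beta_k = moment(kernel, k)
    return abs(beta_k) > tol, beta_k


# (nu, k, label, kernel); K_{1,3} is written so that int x K(x) dx = -1.
KERNELS = [
    (0, 2, "Epanechnikov", lambda x: 3 / 4 * (1 - x**2)),
    (0, 2, "quartic", lambda x: 15 / 16 * (1 - x**2) ** 2),
    (0, 2, "triweight", lambda x: 35 / 32 * (1 - x**2) ** 3),
    (0, 4, "K_{0,4}", lambda x: 15 / 32 * (x**2 - 1) * (7 * x**2 - 3)),
    (2, 4, "K_{2,4}", lambda x: 105 / 16 * (1 - x**2) * (5 * x**2 - 1)),
    (1, 3, "K_{1,3}", lambda x: 15 / 4 * x * (x**2 - 1)),
]

results = {label: in_class(K, nu, k) for nu, k, label, K in KERNELS}
for label, (ok, beta) in results.items():
    print(f"{label:13s} in S_(nu,k): {ok}   beta_k = {beta}")
```

The printed $\beta_k$ values (for instance 1/7 for the quartic kernel) are the constants that enter the bias and bandwidth formulas later on.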
Kernel distribution estimators
Let $X_1, \dots, X_n$ be independent real random variables, each having the same cumulative distribution function F. Our model is defined by the assumption $F \in C^{k_0}$, where $k_0$ is a positive integer.
For a given data set, the corresponding kernel estimate of the distribution function F is
$$\hat F_{h,K}(x) = \frac{1}{n}\sum_{i=1}^{n} W\!\left(\frac{x - X_i}{h}\right), \qquad W(x) = \int_{-1}^{x} K(t)\,dt, \tag{1}$$
where h is a smoothing parameter called the bandwidth (h = h(n) is a non-random sequence of positive numbers), $K \in S_{0,2}$ and $K(x) \ge 0$ on [−1, 1].
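Estimator (1) is a one-liner once W is available. The sketch below assumes the quartic kernel (the one used later in the simulation study), for which $W(u) = \frac{1}{2} + \frac{15}{16}\bigl(u - \frac{2u^3}{3} + \frac{u^5}{5}\bigr)$ on [−1, 1], with W = 0 to the left and W = 1 to the right of the support. The function names and the simulated Exp(1) sample are illustrative only. Evaluating at x = 0 already exposes the boundary problem: the estimate is positive although F(0) = 0 for nonnegative data.

```python
import numpy as np


def W_quartic(u):
    """W(u) = int_{-1}^{u} K(t) dt for the quartic kernel K(t) = 15/16 (1 - t^2)^2."""
    u = np.asarray(u, dtype=float)
    t = np.clip(u, -1.0, 1.0)
    inner = 0.5 + 15.0 / 16.0 * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)
    # Return exact 0 and 1 outside the support instead of the rounded polynomial.
    return np.where(u <= -1.0, 0.0, np.where(u >= 1.0, 1.0, inner))


def cdf_classical(x, data, h):
    """Kernel CDF estimate (1): average of W((x - X_i) / h) at each point of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    u = (x[:, None] - data[None, :]) / h  # shape (len(x), n)
    return W_quartic(u).mean(axis=1)


rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=100)  # X ~ Exp(1), n = 100
grid = np.linspace(-1.0, 5.0, 7)               # -1, 0, 1, ..., 5
est = cdf_classical(grid, sample, h=0.8479)
print(np.round(est, 4))  # est[1] is the estimate at x = 0
```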
Optimal bandwidth
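Under the additional assumptions $\lim_{n\to\infty} h = 0$ and $\lim_{n\to\infty} nh = \infty$, it can be shown (e.g. Bowman, A., Hall, P., Prvan, T. [2]) that the leading term of the MISE (Mean Integrated Square Error) takes the form
$$\mathrm{MISE}(\hat F_{h,K}) = \underbrace{\frac{1}{n}\int F(x)\bigl(1 - F(x)\bigr)\,dx - q_1\frac{h}{n}}_{\mathrm{var}(\hat F_{h,K})} + \underbrace{q_2 h^4}_{\mathrm{bias}^2(\hat F_{h,K})},$$
$$q_1 = \int_{-1}^{1} W(x)\bigl(1 - W(x)\bigr)\,dx > 0, \qquad q_2 = \frac{\beta_2^2}{4}\int \bigl(F^{(2)}(x)\bigr)^2\,dx.$$
Setting the derivative $-q_1/n + 4q_2h^3$ of the leading term to zero, we obtain the optimal bandwidth $h^F_{opt,0,2}$ minimizing the MISE with respect to h:
$$h^F_{opt,0,2} = n^{-1/3}\left(\frac{q_1}{4q_2}\right)^{1/3}. \tag{2}$$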
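For a known F the quantities in (2) can be evaluated directly. The sketch below assumes the quartic kernel and, purely for illustration, F = Exp(rate), for which $\int_0^\infty (f^{(1)})^2 = \mathrm{rate}^3/2$. It treats $F^{(2)} = f^{(1)}$ on (0, ∞) and so ignores the jump of f at 0, which is exactly the boundary issue discussed next. For this kernel $q_1 = 50/231$ and $\beta_2 = 1/7$. The function name is hypothetical, and the resulting oracle value is only illustrative: the slides do not say how $q_2$ was evaluated in the examples.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

NODES, WEIGHTS = leggauss(40)  # exact for polynomial integrands up to degree 79


def integrate(fun, a, b):
    """Gauss-Legendre approximation of int_a^b fun(t) dt."""
    t = 0.5 * (b - a) * NODES + 0.5 * (b + a)
    return 0.5 * (b - a) * float(np.sum(WEIGHTS * fun(t)))


def K(t):
    """Quartic kernel on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - t**2) ** 2


def W(t):
    """W(t) = int_{-1}^{t} K, valid for t in [-1, 1]."""
    return 0.5 + 15.0 / 16.0 * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)


q1 = integrate(lambda t: W(t) * (1.0 - W(t)), -1.0, 1.0)  # 50/231 for this kernel
beta2 = integrate(lambda t: t**2 * K(t), -1.0, 1.0)        # 1/7 for this kernel


def h_opt_exponential(n, rate):
    """Oracle bandwidth (2) when F is Exp(rate).

    Assumes int (F'')^2 = int_0^inf (f')^2 = rate^3 / 2, i.e. it ignores the
    jump of f at 0 (the boundary problem).
    """
    q2 = beta2**2 / 4.0 * rate**3 / 2.0
    return n ** (-1.0 / 3.0) * (q1 / (4.0 * q2)) ** (1.0 / 3.0)


print(q1, beta2, h_opt_exponential(100, 1.0), h_opt_exponential(100, 0.005))
```

Because $q_2 \propto \mathrm{rate}^3$, the oracle bandwidth scales like 1/rate, which is a quick sanity check on the implementation.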
Boundary Effects
Assumptions:
• $X_i$, i = 1, …, n, are nonnegative;
• the distribution function F has support [0, ∞);
• f(0) ≠ 0, where f = F′ is the density of F.
Boundary effects arise when F is estimated at points “near” the left boundary, i.e. for x ∈ [0, h].
In what follows, we write
x = ch, 0 ≤ c ≤ 1.
[Figure: X ∼ Exp(1), the kernel estimate of F (n = 100, $h^F_{opt,0,2}$ = 0.8479); the point h is marked on the horizontal axis.]
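The bias of $\hat F_{h,K}(x)$ at x = ch:
• “near” the left boundary (0 ≤ c < 1):
$$E\bigl(\hat F_{h,K}(x)\bigr) - F(x) = h f(0)\int_{-1}^{-c} W(t)\,dt + h^2 f^{(1)}(0)\left\{\frac{c^2}{2} + c\int_{-1}^{-c} W(t)\,dt - \int_{-1}^{c} t\,W(t)\,dt\right\} + o(h^2);$$
• interior points (c ≥ 1):
$$E\bigl(\hat F_{h,K}(x)\bigr) - F(x) = \frac{h^2}{2} f^{(1)}(0)\,\beta_2 + o(h^2), \qquad \beta_2 = \int_{-1}^{1} t^2 K(t)\,dt.$$
Since $\int_{-1}^{-c} W(t)\,dt > 0$ for c < 1 and f(0) ≠ 0, the bias near the boundary is of order h, whereas at interior points it is only of order h².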
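To see the size of the two boundary terms, their coefficients can be computed numerically. This sketch (quartic kernel assumed; the function name is ours) returns $a(c) = \int_{-1}^{-c} W$ and the $h^2$ bracket $b(c)$, so that the bias is approximately $h f(0)\,a(c) + h^2 f^{(1)}(0)\,b(c)$. At c = 0 one gets a(0) = 5/32, and at c = 1 the O(h) term vanishes while the bracket equals $\beta_2/2 = 1/14$, matching the interior formula.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

NODES, WEIGHTS = leggauss(40)  # exact for polynomial integrands up to degree 79


def integrate(fun, a, b):
    """Gauss-Legendre approximation of int_a^b fun(t) dt."""
    t = 0.5 * (b - a) * NODES + 0.5 * (b + a)
    return 0.5 * (b - a) * float(np.sum(WEIGHTS * fun(t)))


def W(t):
    """Integrated quartic kernel, valid for t in [-1, 1]."""
    return 0.5 + 15.0 / 16.0 * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)


def classical_bias_coefficients(c):
    """(a, b) with E F_hat(ch) - F(ch) ~ h f(0) a + h^2 f'(0) b, for 0 <= c <= 1."""
    a = integrate(W, -1.0, -c)
    b = c**2 / 2.0 + c * a - integrate(lambda t: t * W(t), -1.0, c)
    return a, b


for c in (0.0, 0.25, 0.5, 0.75, 1.0):
    a, b = classical_bias_coefficients(c)
    print(f"c = {c:4.2f}:  a(c) = {a:.5f}   b(c) = {b:+.5f}")
```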
Possible solutions
• boundary kernels: the resulting estimates may be negative, although some remedies have been proposed;
• pseudo-data: generating extra data near the boundary and then combining them with the original data;
• data transformation:
(a) a transformation is selected from a parametric family,
(b) a kernel estimator is applied to the transformed data,
(c) the estimated values are converted back by the inverse transformation;
• reflection method: reflecting the data about the boundary and applying the classical kernel estimator,
$$\hat F_{h,K}(x) = \frac{1}{n}\sum_{i=1}^{n}\left[W\!\left(\frac{x - X_i}{h}\right) - W\!\left(-\frac{x + X_i}{h}\right)\right]. \tag{3}$$
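A sketch of the reflection estimator (3), again assuming the quartic kernel; the function names are illustrative. Because the two W terms coincide at x = 0, the estimate there is exactly 0, and since $(x - X_i)/h \ge -(x + X_i)/h$ for x ≥ 0, every summand lies in [0, 1].

```python
import numpy as np


def W_quartic(u):
    """Integrated quartic kernel W(u) = int_{-1}^{u} 15/16 (1 - t^2)^2 dt."""
    u = np.asarray(u, dtype=float)
    t = np.clip(u, -1.0, 1.0)
    inner = 0.5 + 15.0 / 16.0 * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)
    return np.where(u <= -1.0, 0.0, np.where(u >= 1.0, 1.0, inner))


def cdf_reflection(x, data, h):
    """Reflection estimate (3): average of W((x - X_i)/h) - W(-(x + X_i)/h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    data = np.asarray(data, dtype=float)[None, :]
    return (W_quartic((x - data) / h) - W_quartic(-(x + data) / h)).mean(axis=1)


rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=100)  # X ~ Exp(1), n = 100
grid = np.linspace(0.0, 5.0, 6)
est = cdf_reflection(grid, sample, h=0.8479)
print(np.round(est, 4))  # est[0] is exactly 0
```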
Proposed estimator
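“Generalized” reflection method
(Zhang et al. [10], Karunamuni and Alberts [5]: the density case)
$$\hat F_{h,K}(x) = \frac{1}{n}\sum_{i=1}^{n}\left[W\!\left(\frac{x - g_1(X_i)}{h}\right) - W\!\left(-\frac{x + g_2(X_i)}{h}\right)\right]$$
If $g_1 = g_2$, then $\hat F_{h,K}(0) = 0$, because at x = 0 both terms reduce to $W(-g_1(X_i)/h)$.
Set g := g_1 = g_2, where
• g is a nonnegative, continuous, monotonically increasing function defined on [0, ∞);
• $g^{-1}$ exists;
• g(0) = 0;
• $g^{(1)}(0) = 1$;
• $g^{(2)}$ exists and is continuous on [0, ∞).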
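The generalized estimator only changes the data passed to W. In the sketch below, g is any callable satisfying the conditions listed above; `g_example` (with $g^{(2)}(0) = a$) is a hypothetical choice made only for illustration, not the transformation the slides go on to construct. With the identity map, the function reproduces the reflection estimator (3), and for any g the value at x = 0 is exactly 0.

```python
import numpy as np


def W_quartic(u):
    """Integrated quartic kernel W(u) = int_{-1}^{u} 15/16 (1 - t^2)^2 dt."""
    u = np.asarray(u, dtype=float)
    t = np.clip(u, -1.0, 1.0)
    inner = 0.5 + 15.0 / 16.0 * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)
    return np.where(u <= -1.0, 0.0, np.where(u >= 1.0, 1.0, inner))


def cdf_generalized_reflection(x, data, h, g):
    """Generalized reflection estimate with g1 = g2 = g, evaluated at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    gx = g(np.asarray(data, dtype=float))[None, :]
    return (W_quartic((x - gx) / h) - W_quartic(-(x + gx) / h)).mean(axis=1)


def g_example(y, a=0.5):
    """Hypothetical g: g(0) = 0, g'(0) = 1, g''(0) = a; increasing on [0, inf) for a >= 0."""
    return y + 0.5 * a * y**2


rng = np.random.default_rng(2)
sample = rng.exponential(scale=1.0, size=100)
grid = np.linspace(0.0, 3.0, 7)
est_identity = cdf_generalized_reflection(grid, sample, 0.8, lambda y: y)  # equals (3)
est_g = cdf_generalized_reflection(grid, sample, 0.8, g_example)
print(np.round(est_identity, 4))
print(np.round(est_g, 4))
```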
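The bias of $\hat F_{h,K}(x)$ at x = ch, 0 ≤ c < 1:
$$E\bigl(\hat F_{h,K}(x)\bigr) - F(x) = h^2\Bigl\{f^{(1)}(0)\bigl[c^2/2 + 2cI_1 - I_2\bigr] - f(0)\,g^{(2)}(0)\bigl[c^2 + 2cI_1 - I_2\bigr]\Bigr\} + O(h^3),$$
where $I_1 = \int_{-1}^{-c} W(t)\,dt$ and $I_2 = \int_{-c}^{c} t\,W(t)\,dt$.
The bias of $\hat F_{h,K}(x)$ at x = ch, c ≥ 1:
$$E\bigl(\hat F_{h,K}(x)\bigr) - F(x) = \frac{1}{2}h^2\Bigl\{f^{(1)}(0)\,\beta_2 - f(0)\,g^{(2)}(0)\bigl[c^2 + \beta_2\bigr]\Bigr\} + O(h^3).$$
In contrast to the classical estimator, the O(h) term vanishes near the boundary.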
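The two expansions can be coded directly in terms of $I_1$ and $I_2$. The sketch below (quartic kernel assumed, function names ours) evaluates the $h^2$ coefficient. A useful sanity check is that the boundary and interior expressions agree as c → 1, because $I_1(1) = 0$ and $I_2(1) = (1 - \beta_2)/2$; at c = 0 the coefficient is 0 for every g, in line with $\hat F_{h,K}(0) = 0$.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

NODES, WEIGHTS = leggauss(40)  # exact for polynomial integrands up to degree 79


def integrate(fun, a, b):
    """Gauss-Legendre approximation of int_a^b fun(t) dt."""
    t = 0.5 * (b - a) * NODES + 0.5 * (b + a)
    return 0.5 * (b - a) * float(np.sum(WEIGHTS * fun(t)))


def K(t):
    return 15.0 / 16.0 * (1.0 - t**2) ** 2  # quartic kernel


def W(t):
    return 0.5 + 15.0 / 16.0 * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)  # int_{-1}^t K on [-1, 1]


BETA2 = integrate(lambda t: t**2 * K(t), -1.0, 1.0)  # 1/7 for the quartic kernel


def I1(c):
    return integrate(W, -1.0, -c)


def I2(c):
    return integrate(lambda t: t * W(t), -c, c)


def bias_h2_coefficient(c, f0, f1_0, g2_0):
    """Coefficient of h^2 in E F_hat(ch) - F(ch) for the generalized reflection estimator.

    f0 = f(0), f1_0 = f'(0), g2_0 = g''(0).
    """
    if c < 1.0:
        s = 2.0 * c * I1(c) - I2(c)
        return f1_0 * (c**2 / 2.0 + s) - f0 * g2_0 * (c**2 + s)
    return 0.5 * (f1_0 * BETA2 - f0 * g2_0 * (c**2 + BETA2))


f0, f1_0 = 1.0, -1.0  # Exp(1): f(0) = 1, f'(0) = -1 (illustrative values)
for c in (0.0, 0.5, 0.999999, 1.0, 1.5):
    print(c, bias_h2_coefficient(c, f0, f1_0, g2_0=0.0))
```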
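Choosing $g^{(2)}(0)$ so that the $h^2$ term of the bias vanishes, set
$$g^{(2)}(0) = \begin{cases} d_1\,\dfrac{c^2/2 + 2cI_1 - I_2}{c^2 + 2cI_1 - I_2}, & 0 \le c < 1,\\[2ex] d_1\,\dfrac{\beta_2}{c^2 + \beta_2}, & c \ge 1, \end{cases} \qquad (=: A_c)$$
where
$$d_1 = \frac{f^{(1)}(0)}{f(0)}.$$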
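A sketch of $A_c$ for the quartic kernel (function names ours). At c = 0 the fraction is 0/0. Our own limit computation (not from the slides) shows that the ratio tends to 1 as c → 0 because $I_1(0) > 0$, so the sketch returns $d_1$ there. This choice is harmless, since the estimate at x = 0 is exactly 0 whatever g is. The test also confirms that plugging $g^{(2)}(0) = A_c$ into the bias bracket cancels the $h^2$ term, and that $A_c$ is continuous at c = 1.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

NODES, WEIGHTS = leggauss(40)  # exact for polynomial integrands up to degree 79


def integrate(fun, a, b):
    """Gauss-Legendre approximation of int_a^b fun(t) dt."""
    t = 0.5 * (b - a) * NODES + 0.5 * (b + a)
    return 0.5 * (b - a) * float(np.sum(WEIGHTS * fun(t)))


def W(t):
    """Integrated quartic kernel, valid for t in [-1, 1]."""
    return 0.5 + 15.0 / 16.0 * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)


BETA2 = 1.0 / 7.0  # int t^2 K(t) dt for the quartic kernel


def I1(c):
    return integrate(W, -1.0, -c)


def I2(c):
    return integrate(lambda t: t * W(t), -c, c)


def A_c(c, d1):
    """The value of g''(0) that cancels the h^2 bias term at x = c h."""
    if c <= 0.0:
        # 0/0 at c = 0: use the limit of the ratio (equal to 1), so A_0 = d1.
        return d1
    if c < 1.0:
        s = 2.0 * c * I1(c) - I2(c)
        return d1 * (c**2 / 2.0 + s) / (c**2 + s)
    return d1 * BETA2 / (c**2 + BETA2)


for c in (0.0, 0.25, 0.5, 0.75, 1.0, 2.0):
    print(c, A_c(c, d1=-1.0))
```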
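A construction of g(y)
An estimate of $d_1$:
$$d_1 = \frac{f^{(1)}(0)}{f(0)} = \bigl(\ln f(x)\bigr)^{(1)}\Big|_{x=0} \approx \hat d_1 = \frac{\ln f^*(h_1) - \ln f^*(0)}{h_1}, \qquad h_1 \approx n^{-1/6},$$
where $f^*$ is a pilot estimate of f (see Zhang et al. [10], Karunamuni, R.J., Alberts, T. [5]).
Substituting $\hat d_1$ for $d_1$ yields an estimate of $A_c$, and we set
$$g_c(y) = \lambda A_c^2 y^3 + \frac{1}{2}A_c y^2 + y,$$
where λ is a positive constant such that λ > 1/12 (in our experience: λ = 0.1).
By construction $g_c(0) = 0$, $g_c^{(1)}(0) = 1$ and $g_c^{(2)}(0) = A_c$. For $A_c \ne 0$, the condition λ > 1/12 makes the discriminant $A_c^2(1 - 12\lambda)$ of $g_c^{(1)}(y) = 3\lambda A_c^2 y^2 + A_c y + 1$ negative, so $g_c$ is increasing.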
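The ingredients of this construction are small functions. In the sketch below the pilot estimate $f^*$ is left as an argument because the slides do not fix it. As a stand-in, the usage passes the true Exp(1) density, for which $\hat d_1 = -1$ exactly. All function names are hypothetical. The comments restate why λ > 1/12 guarantees monotonicity.

```python
import numpy as np


def d1_hat(pilot_density, h1):
    """Finite-difference estimate of d1 = (ln f)'(0) from a pilot density f*."""
    return (np.log(pilot_density(h1)) - np.log(pilot_density(0.0))) / h1


def g_c(y, A, lam=0.1):
    """Transformation g_c(y) = lam A^2 y^3 + A y^2 / 2 + y."""
    return lam * A**2 * y**3 + 0.5 * A * y**2 + y


def g_c_prime(y, A, lam=0.1):
    """First derivative g_c'(y) = 3 lam A^2 y^2 + A y + 1."""
    return 3.0 * lam * A**2 * y**2 + A * y + 1.0


# g_c(0) = 0, g_c'(0) = 1 and g_c''(0) = A by construction. For A != 0 the
# quadratic g_c' has discriminant A^2 (1 - 12 lam), so g_c'(y) > 0 for every
# real y if and only if lam > 1/12 (lam = 0.1 qualifies); g_c is then increasing.
n = 100
h1 = n ** (-1.0 / 6.0)  # h1 ~ n^(-1/6) as on the slide
# Stand-in pilot (assumption): the true Exp(1) density, for which d1 = f'(0)/f(0) = -1.
d1 = d1_hat(lambda x: np.exp(-x), h1)
print(d1, g_c(np.array([0.0, 0.5, 1.0]), A=d1))
```

With λ = 0.05 < 1/12 and A = −3, for instance, $g_c^{(1)}$ becomes negative near y = 1/0.9, so $g_c$ would no longer be monotone.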
EXAMPLES
A simulation study
• X ∼ Exp(0.005), n = 100 (Dette, H., Weissbach, R. [3])
• 1 000 replications
• We used the quartic kernel
$$K_{0,2}(x) = \frac{15}{16}(1 - x^2)^2\, I_{[-1,1]}(x),$$
where $I_A$ is the indicator function of the set A.
• The optimal bandwidth was computed from (2).
• The results were compared with the classical estimator (1) and the reflection method (3).
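A scaled-down version of this simulation can be sketched as follows. It is not the authors' code and makes several simplifying assumptions, all labelled in the comments. The oracle value $d_1 = -\mathrm{rate}$ replaces $\hat d_1$, and the bandwidth 231.35 quoted for a single sample is reused in every replication. Only 200 replications are run, and the MISE on $[0, h]$ is approximated by the average squared error over a grid. Its numbers are therefore not expected to reproduce Table 1 exactly.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

NODES, WEIGHTS = leggauss(40)


def integrate(fun, a, b):
    """Gauss-Legendre approximation of int_a^b fun(t) dt."""
    t = 0.5 * (b - a) * NODES + 0.5 * (b + a)
    return 0.5 * (b - a) * float(np.sum(WEIGHTS * fun(t)))


def W(u):
    """Integrated quartic kernel on the whole real line."""
    u = np.asarray(u, dtype=float)
    t = np.clip(u, -1.0, 1.0)
    inner = 0.5 + 15.0 / 16.0 * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)
    return np.where(u <= -1.0, 0.0, np.where(u >= 1.0, 1.0, inner))


BETA2 = 1.0 / 7.0  # int t^2 K(t) dt for the quartic kernel


def A_c(c, d1):
    """g''(0) cancelling the h^2 bias term (limit value d1 used at c = 0)."""
    if c <= 0.0:
        return d1
    if c < 1.0:
        s = 2.0 * c * integrate(W, -1.0, -c) - integrate(lambda t: t * W(t), -c, c)
        return d1 * (c**2 / 2.0 + s) / (c**2 + s)
    return d1 * BETA2 / (c**2 + BETA2)


def estimate(x, data, h, g=None, reflect=False):
    """Classical (1), reflection (3) or generalized reflection estimate at one point x."""
    gx = data if g is None else g(data)
    est = W((x - gx) / h)
    if reflect:
        est = est - W(-(x + gx) / h)
    return float(est.mean())


rate, n, reps, lam = 0.005, 100, 200, 0.1
h = 231.35      # assumption: bandwidth quoted for a single sample, reused in every replication
d1 = -rate      # assumption: oracle d1 = f'(0)/f(0) of Exp(rate) instead of d1_hat
grid = np.linspace(0.0, h, 41)       # points x = c h with 0 <= c <= 1
truth = 1.0 - np.exp(-rate * grid)
A = [A_c(x / h, d1) for x in grid]   # depends only on c, so computed once

rng = np.random.default_rng(2008)
errors = {"classical": [], "reflection": [], "proposed": []}
for _ in range(reps):
    X = rng.exponential(scale=1.0 / rate, size=n)
    fits = {
        "classical": [estimate(x, X, h) for x in grid],
        "reflection": [estimate(x, X, h, reflect=True) for x in grid],
        "proposed": [
            estimate(x, X, h, reflect=True,
                     g=lambda y, a=a: lam * a**2 * y**3 + 0.5 * a * y**2 + y)
            for x, a in zip(grid, A)
        ],
    }
    for name, fit in fits.items():
        # assumption: MISE on [0, h] approximated by the grid average of squared errors
        errors[name].append(float(np.mean((np.array(fit) - truth) ** 2)))

mise = {name: (float(np.mean(v)), float(np.std(v))) for name, v in errors.items()}
for name, (m, s) in mise.items():
    print(f"{name:10s}  mean {m:.4f}  std {s:.4f}")
```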
[Figure: X ∼ Exp(0.005), the kernel estimate of F (n = 100, $h^F_{opt,0,2}$ = 231.35).]
A comparison
MISE (Mean Integrated Square Error) on the interval $[0, h^F_{opt,0,2}]$
Method       Mean     STD
Classical    0.0068   0.0014
Reflection   0.0020   0.0020
Proposed     0.0010   0.0014
Table 1. Means and STDs of the MISE.
[Figure: MISE values (scale ×10⁻³) for the methods (1), (2) and (3).]
MISE for estimates of the CDF for the classical estimator with boundary effects (1), the reflection method (2) and our proposed method (3).
        Classical         Reflection        Proposed
c       Mean     STD      Mean     STD      Mean     STD
0.00    0.0215   0.0048   0.0000   0.0000   0.0000   0.0000
0.25    0.0009   0.0013   0.0023   0.0017   0.0008   0.0010
0.50    0.0021   0.0025   0.0032   0.0032   0.0016   0.0021
0.75    0.0026   0.0033   0.0027   0.0034   0.0017   0.0024
Table 2. Means and STDs of the MSE at $x = c\,h^F_{opt,0,2}$.
[Figure: three panels (classical, reflection, proposed) showing the MSE at c = 0, 0.25, 0.5, 0.75.]
MSE at the points $x = c\,h^F_{opt,0,2}$, c = 0, 0.25, 0.5, 0.75, for the classical estimator, the reflection method and our proposed method.
Practical usage
ROC
• The Receiver Operating Characteristic (ROC) describes the performance of a diagnostic test that classifies subjects either into the group without the condition, G0, or into the group with the condition, G1, by means of a continuous discriminant score X: a subject is classified as G1 if X ≥ d and as G0 otherwise, for a given cutoff point d ∈ ℝ.
• Let F0 and F1 be the distribution functions of X in G0 and G1, respectively.
• The ROC curve is a plot of the probability of true classification of subjects from G1 (true positive rate) against the probability of false classification of subjects from G0 (false positive rate) across all possible cutoff values d of X.
• The ROC curve can be written as
$$R(p) = 1 - F_1\bigl(F_0^{-1}(1 - p)\bigr), \qquad 0 < p < 1,$$
where $p = 1 - F_0(d)$ is the false positive rate, which sweeps through (0, 1) as the cutoff point d ranges from −∞ to +∞.
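Given CDF estimates for both groups, the ROC formula can be evaluated by numerically inverting $\hat F_0$. The sketch below uses the classical estimator (1) with the quartic kernel for brevity (a boundary-corrected estimator could be substituted), inverts it by vectorized bisection, and runs on hypothetical normal scores. The bandwidths, sample sizes and function names are arbitrary illustrative choices, not those of the real-data example.

```python
import numpy as np


def W_quartic(u):
    """Integrated quartic kernel W(u) = int_{-1}^{u} 15/16 (1 - t^2)^2 dt."""
    u = np.asarray(u, dtype=float)
    t = np.clip(u, -1.0, 1.0)
    inner = 0.5 + 15.0 / 16.0 * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)
    return np.where(u <= -1.0, 0.0, np.where(u >= 1.0, 1.0, inner))


def cdf_classical(x, data, h):
    """Kernel CDF estimate (1) at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    return W_quartic((x[:, None] - data[None, :]) / h).mean(axis=1)


def cdf_inverse(q, data, h, iters=100):
    """Solve F_hat(x) = q for each q in (0, 1) by vectorized bisection."""
    q = np.atleast_1d(np.asarray(q, dtype=float))
    lo = np.full(q.shape, np.min(data) - h)  # F_hat(lo) = 0
    hi = np.full(q.shape, np.max(data) + h)  # F_hat(hi) = 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = cdf_classical(mid, data, h) < q
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)


def roc_kernel(p, x0, h0, x1, h1):
    """R(p) = 1 - F1_hat(F0_hat^{-1}(1 - p)) for false positive rates p in (0, 1)."""
    d = cdf_inverse(1.0 - np.asarray(p, dtype=float), x0, h0)  # cutoffs with FPR p
    return 1.0 - cdf_classical(d, x1, h1)


rng = np.random.default_rng(7)
x0 = rng.normal(0.0, 1.0, size=300)  # hypothetical scores in G0
x1 = rng.normal(1.5, 1.0, size=40)   # hypothetical scores in G1
p = np.linspace(0.01, 0.99, 99)
R = roc_kernel(p, x0, 0.5, x1, 0.8)
print(np.round(R[::10], 3))
```

A quick check of the implementation: feeding the same sample and bandwidth to both groups gives R(p) = p, the diagonal of a test with no discriminating power.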
ROC
[Figure: densities of the score X in the groups G0 and G1 with a cutoff point d; the regions labelled FPR and TPR are marked.]
Real data
Consumer loans data
A scoring function (not specified) is used to predict the solidity (creditworthiness) of a client. We are interested in determining which clients are able to repay their loans.
Test set: 332 clients; 309 have paid back their loans (group G0) and 22 had problems with payments or did not pay (group G1).
We use the ROC curve to assess how well the score discriminates between clients with and without good solidity, i.e. whether our scoring function is a good predictor of solidity.
[Figure: the estimates of f0(x) ($\hat h^{f_0}_{opt,0,2}$ = 0.0032) and f1(x) ($\hat h^{f_1}_{opt,0,2}$ = 0.0153) with boundary effects.]
[Figure: the estimates of f0(x) ($\hat h^{f_0}_{opt,0,2}$ = 0.0032) and f1(x) ($\hat h^{f_1}_{opt,0,2}$ = 0.0153) with NO boundary effects.]
[Figure: the estimates of F0(x) ($\hat h^{F_0}_{opt,0,2}$ = 0.0068) and F1(x) ($\hat h^{F_1}_{opt,0,2}$ = 0.0286) with boundary effects.]
[Figure: the estimates of F0(x) ($\hat h^{F_0}_{opt,0,2}$ = 0.0068) and F1(x) ($\hat h^{F_1}_{opt,0,2}$ = 0.0286) with NO boundary effects.]
[Figure: the estimate of the ROC curve.]
References
[1] Azzalini, A.: A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika, Vol. 68, No. 1, pp. 326–328, 1981.
[2] Bowman, A., Hall, P., Prvan, T.: Bandwidth selection for the smoothing of distribution functions. Biometrika, Vol. 85, No. 4, pp. 799–808, 1998.
[3] Dette, H., Weissbach, R.: Kolmogorov-Smirnov-type testing for the partial homogeneity of Markov processes – with application to credit risk. Applied Stochastic Models in Business and Industry, Vol. 23, No. 3, pp. 223–234, 2007.
[4] Horová, I., Zelinka, J.: Different approaches to ROC curve fitting for a continuous diagnostic test. Computational Statistics & Data Analysis, submitted, 2007.
[5] Karunamuni, R.J., Alberts, T.: On boundary correction in kernel density estimation. Statistical Methodology, Vol. 2, pp. 191–212, 2005.
[6] Lloyd, C.J., Zhou, Y.: Kernel estimators of the ROC curve are better than empirical. Statistics & Probability Letters, Vol. 44, pp. 221–228, 1999.
[7] Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York, 1986.
[8] Terrell, G.R.: The maximal smoothing principle in density estimation. Journal of the American Statistical Association, Vol. 85, No. 410, pp. 440–447, 1990.
[9] Wand, M.P., Jones, M.C.: Kernel Smoothing. Chapman & Hall, London, 1995.
[10] Zhang, S., Karunamuni, R.J., Jones, M.C.: An improved estimator of the density function at the boundary. Journal of the American Statistical Association, No. 448, pp. 1231–1241, 1999.