IEEE SIGNAL PROCESSING MAGAZINE [128] SEPTEMBER 2012 1053-5888/12/$31.00©2012IEEE
Ramsey Faragher
[lecture NOTES]
Understanding the Basis of the Kalman Filter
Via a Simple and Intuitive Derivation
T
his article provides a simple
and intuitive derivation of
the Kalman filter, with the
aim of teaching this useful
tool to students from disciplines
that do not require a strong
mathematical background. The most
complicated level of mathematics
required to understand this derivation is
the ability to multiply two Gaussian
functions together and reduce the result
to a compact form.
The Kalman filter is over 50 years old
but is still one of the most important
and common data fusion algorithms in
use today. Named after Rudolf E.
Kálmán, the great success of the
Kalman filter is due to its small computational
requirement, elegant recursive
properties, and its status as the optimal
estimator for one-dimensional linear
systems with Gaussian error statistics
[1]. Typical uses of the Kalman filter
include smoothing noisy data and providing
estimates of parameters of interest.
Applications include global
positioning system receivers, phaselocked
loops in radio equipment,
smoothing the output from laptop
trackpads, and many more.
From a theoretical standpoint, the
Kalman filter is an algorithm permitting
exact inference in a linear dynamical
system, which is a Bayesian model similar
to a hidden Markov model but where
the state space of the latent variables is
continuous and where all latent and
observed variables have a Gaussian distribution
(often a multivariate Gaussian
distribution). The aim of this lecture
note is to permit people who find this
description confusing or terrifying to
understand the basis of the Kalman filter
via a simple and intuitive derivation.
RELEVANCE
The Kalman filter [2] (and its variants
such as the extended Kalman filter [3]
and unscented Kalman filter [4]) is
one of the most celebrated and popular
data fusion algorithms in the field
of information processing. The most
famous early use of the Kalman filter
was in the Apollo navigation computer
that took Neil Armstrong to the moon,
and (most importantly) brought him
back. Today, Kalman filters are at work
in every satellite navigation device,
every smart phone, and many computer
games.
The Kalman filter is typically derived
using vector algebra as a minimum
mean squared estimator [5], an
approach suitable for students confident
in mathematics but not one that is easy
to grasp for students in disciplines that
do not require strong mathematics. The
Kalman filter is derived here from first
principles considering a simple physical
example exploiting a key property of the
Gaussian distribution—specifically the
property that the product of two
Gaussian distributions is another
Gaussian distribution.
PREREQUISITES
This article is not designed to be a thorough
tutorial for a brand-new student to
the Kalman filter, in the interests of
being concise, but instead aims to provide
tutors with a simple method of
teaching the concepts of the Kalman filter
to students who are not strong
mathematicians. The reader is expected
to be familiar with vector notation and
terminology associated with Kalman filtering
such as the state vector and covariance
matrix. This article is aimed at
those who need to teach the Kalman filter
to others in a simple and intuitive
manner, or for those who already have
some experience with the Kalman filter
but may not fully understand its foundations.
This article is not intended to be a
thorough and standalone education tool
for the complete novice, as that would
require a chapter, rather than a few
pages, to convey.
PROBLEM STATEMENT
The Kalman filter model assumes that
the state of a system at a time t evolved
from the prior state at time t-1 according
to the equation
x F x B u wt t t t t t1= + +- , (1)
where
■ xt is the state vector containing
the terms of interest for the system
(e.g., position, velocity, heading) at
time t
■ ut is the vector containing any
control inputs (steering angle, throttle
setting, braking force)
■ Ft is the state transition matrix
which applies the effect of each system
state parameter at time t-1 on
the system state at time t (e.g., the
position and velocity at time t-1
both affect the position at time t)
■ Bt is the control input matrix
which applies the effect of each
Digital Object Identifier 10.1109/MSP.2012.2203621
Date of publication: 20 August 2012
THE KALMAN FILTER IS
OVER 50 YEARS OLD BUT
IS STILL ONE OF THE MOST
IMPORTANT AND COMMON
DATA FUSION ALGORITHMS
IN USE TODAY.
IEEE SIGNAL PROCESSING MAGAZINE [129] SEPTEMBER 2012
control input parameter in the
vector ut on the state vector (e.g.,
applies the effect of the throttle setting
on the system velocity and
position)
■ wt is the vector containing the
process noise terms for each parameter
in the state vector. The process
noise is assumed to be drawn from a
zero mean multivariate normal
distribution with covariance given by
the covariance matrix Qt.
Measurements of the system can also
be performed, according to the model
H x vzt t t t= + , (2)
where
■ zt is the vector of measurements
■ Ht is the transformation matrix
that maps the state vector parameters
into the measurement domain
■ vt is the vector containing the
measurement noise terms for each
observation in the measurement vector.
Like the process noise, the measurement
noise is assumed to be zero
mean Gaussian white noise with
covariance Rt.
In the derivation that follows, we will
consider a simple one-dimensional tracking
problem, particularly that of a train
moving along a railway line (see
Figure  1). We can therefore consider
some example vectors and matrices in
this problem. The state vector xt contains
the position and velocity of the train
x
x
x
t
t
t
=
o
; E.
The train driver may apply a braking or
accelerating input to the system, which
we will consider here as a function of an
applied force ft and the mass of the train
m. Such control information is stored
within the control vector ut
u
m
f
t
t
= .
The relationship between the force
applied via the brake or throttle during
the time period ∆t (the time elapsed
between time epochs t-1 and t) and the
position and velocity of the train is given
by the following equations:
( )
( )
x x x t
m
f t
2
t t t
t
1 1
2
# T
T
= + +- -o
x x
m
f t
t t
t
1
T
= +-o o .
These linear equations can be written in
matrix form as
( )x
x
t x
x
t
t
m
f1
0 1 2
t
t
t
t
t1
1
2T
T
T
= +
-o
o
; ; ;
>
E E E
H
.
And so by comparison with (1), we can
see for this example that
tT
F B
( )t t1
0 1
andt t
2T T
= =
2
;
>
E
H
.
The true state of the system xt cannot be
directly observed, and the Kalman filter
provides an algorithm to determine an
estimate xtt by combining models of the
system and noisy measurements of certain
parameters or linear functions of
parameters. The estimates of the parameters
of interest in the state vector are
therefore now provided by probability
density functions (pdfs), rather than discrete
values. The Kalman filter is based
on Gaussian pdfs, as will become clear
following the derivation outlined below
in the “Solutions” section. To fully
describe the Gaussian functions, we
need to know their variances and covariances,
and these are stored in the covariance
matrix Pt. The terms along the
main diagonal of Pt are the variances
associated with the corresponding terms
in the state vector. The off-diagonal
terms of Pt provide the covariances
between terms in the state vector. In the
case of a well-modeled, one-dimensional
linear system with measurement errors
drawn from a zero-mean Gaussian distribution,
the Kalman filter has been
shown to be the optimal estimator [1].
In the remainder of this article, we will
derive the Kalman filter equations that
allow us to recursively calculate xtt by
combining prior knowledge, predictions
from systems models, and noisy mea-
surements.
The Kalman filter algorithm involves
two stages: prediction and measurement
update. The standard Kalman filter
equations for the prediction stage are
x F x B ut t t t t t t1 1 1= +; ;- - -t t (3)
P F P F Qt t t t t t t1 1 1
T
= +; ;- - - , (4)
where Qt is the process noise covariance
matrix associated with noisy control
inputs. Equation (3) was derived explicitly
in the discussion above. We can
derive (4) as follows. The variance associated
with the prediction xt t 1; -t of an
unknown true value xt is given by
[( )( ) ]P x x x xE ,t t t t t t t t
T
1 1 1= - -; ; ;- - -t t
and taking the difference between (3)
and (1) gives
Prediction (Estimate)
Measurement (Noisy)
0
r
[FIG1] This figure shows the one-dimensional system under consideration.
IEEE SIGNAL PROCESSING MAGAZINE [130] SEPTEMBER 2012
[lecture NOTES]continued
( )x x F x x wt t t t t t t1 1 1- = - +; ;- - -t t
[( ( )
) ( ( )
) ]
P F x x
F x x
E
w
w
t t t t t
t t t t
t
T
1 1 1 1
1 1 1
&
#
= +
-
+
; ;
;
- - - -
- -
t
t
[( )
( ) ]
[(
) ]
[
]
[ ].
F x x
x x
F F x
x w
w x
x F
w w
E
E
E
E
t t t
t t t
t
t t t
t t
t t
t t
1 1 1
1 1 1
1
1 1
1
1 1
T
T
T
T T
T
#
#
= -
-
+
-
+
-
+
;
;
;
;
- - -
- -
-
-
-
-
t
t
t
t
Noting that the state estimation errors
and process noise are uncorrelated
[( ) ]
( )
E x x w
E w x x 0
t t t t
T
t t t t
T
1 1 1
1 1 1
=
- =
;
;
- - -
- -
t
t6 @
[( )(
) ] [ ]
.
P F x x x
x F w w
P FP F Q
E
E
t t t t t t
t t t t
t t t t t
1 1 1 1 1
1 1
1 1 1
T T T
T
&
&
= -
+
= +
; ;
;
; ;
- - - - -
-
- - -
t
t
The measurement update equations are
given by
x H x )(x K zt t tt t t t t t1 1= + -; ; ;- -t t t (5)
P P K H Pt t t t t t1 t t 1= -; ; - ; - , (6)
where
K P H H P H R( ) .T T
t t t t t t t t t1 1
1
= +; ;- -
-
(7)
In the remainder of this article, we will
derive the measurement update equations
[(5)–(7)] from first principles.
SOLUTIONS
The Kalman filter will be derived here
by considering a simple one-dimensional
tracking problem, specifically that of
a train is moving along a railway line. At
every measurement epoch we wish to
know the best possible estimate of the
location of the train (or more precisely,
the location of the radio antenna mounted
on the train roof). Information is available
from two sources: 1) predictions
based on the last known position and
velocity of the train and 2) measurements
from a radio ranging system deployed at
the track side. The information from the
predictions and measurements are combined
to provide the best possible estimate
of the location of the train. The system is
shown graphically in Figure 1.
The initial state of the system (at
time t = 0 s) is known to a reasonable
accuracy, as shown in Figure 2. The
location of the train is given by a
Gaussian pdf. At the next time epoch
( )t 1 s= , we can estimate the new position
of the train, based on known limitations
such as its position and velocity at
t = 0, its maximum possible acceleration
and deceleration, etc. In practice, we may
have some knowledge of the control
inputs on the brake or accelerator by the
driver. In any case, we have a prediction of
the new position of the train, represented
in Figure 3 by a new Gaussian pdf with a
new mean and variance. Mathematically
this step is represented by (1). The variance
has increased [see (2)], representing
our reduced certainty in the accuracy of
our position estimate compared to t = 0,
due to the uncertainty associated with any
process noise from accelerations or decelerations
undertaken from time t = 0 to
time t = 1.
[FIG2] The initial knowledge of the system at time t = 0. The red Gaussian distribution represents the pdf providing the initial
confidence in the estimate of the position of the train. The arrow pointing to the right represents the known initial velocity of
the train.
??????
Prediction (Estimate)
[FIG3] Here, the prediction of the location of the train at time t = 1 and the level of uncertainty in that prediction is shown. The
confidence in the knowledge of the position of the train has decreased, as we are not certain if the train has undergone any
accelerations or decelerations in the intervening period from t = 0 to t = 1.
THE BEST ESTIMATE
WE CAN MAKE OF THE
LOCATION OF THE TRAIN IS
PROVIDED BY COMBINING
OUR KNOWLEDGE FROM
THE PREDICTION AND THE
MEASUREMENT.
IEEE SIGNAL PROCESSING MAGAZINE [131] SEPTEMBER 2012
At t = 1, we also make a measurement
of the location of the train using
the radio positioning system, and this is
represented by the blue Gaussian pdf in
Figure 4. The best estimate we can make
of the location of the train is provided by
combining our knowledge from the prediction
and the measurement. This is
achieved by multiplying the two corresponding
pdfs together. This is represented
by the green pdf in Figure 5.
A key property of the Gaussian function
is exploited at this point: the product of two
Gaussian functions is another Gaussian
function. This is critical as it permits an
endless number of Gaussian pdfs to be
multiplied over time, but the resulting
function does not increase in complexity or
number of terms; after each time epoch the
new pdf is fully represented by a Gaussian
function. This is the key to the elegant
recursive properties of the Kalman filter.
The stages described above in the figures
are now considered again mathematically
to derive the Kalman filter
measurement update equations.
The prediction pdf represented by
the red Gaussian function in Figure 3 is
given by the equation
( )y r e
2
1; ,
( )r
1 1 1
1
2
2 1
2
1
2
_
r
n v
v
v
n
-
.
(8)
The measurement pdf represented by
the blue Gaussian function in Figure 4
is given by
( )y r e
2
1; ,
( )r
2 2 2
2
2 2 2
2
2
2
_v
r
n
v
v
n
-
.
(9)
The information provided by these two pdfs
is fused by multiplying the two together,
i.e., considering the prediction and the
measurement together (see Figure 5). The
new pdf representing the fusion of the
information from the prediction and measurement,
and our best current estimate of
the system, is therefore given by the product
of these two Gaussian functions
( , )
.
y r
e
,
2
1
2
1
2
1
; ,
( ) ( )
( (
r r
r r
1 1 2 2
1
2 2
2
2 2
1
2
2
2 2 2
fused
) )
1
2
1
2
2
2
2
2
1
2
1
2
2
22 2
r r
r
n v n v
v v
v v
=
v
n
v
n
v
n
v
n
-
-
-
-
-
-
+
e
e#=
e o
(10)
The quadratic terms in this new
function can expanded and then the
whole expression rewritten in Gaussian
form
( )r n(
)
2
y r
1
; ,
2
2
fused fused fused
fused
2
2
fused
fused
r
n v
v
v
e=
, (11)
where
( )
1
2
2
2
1 2
2
2 1
2
1
1
2
2
2
1
2
2 1
fusedn
v v
n v n v
n
v v
v n n
=
+
+
= +
+
- (12)
and
2
1
2
2
2
1
2
2
2
1
2
1
2
2
2
1
4
fusedv
v v
v v
v
v v
v
=
+
= -
+
.
(13)
These last two equations represent the
measurement update steps of the
Kalman filter algorithm, as will be
shown explicitly below. However, to
present a more general case, we need to
consider an extension to this example.
In the example above, it was assumed
that the predictions and measurements
were made in the same coordinate frame
and in the same units. This has resulted
in a particularly concise pair of
??????
Measurement (Noisy)
Prediction (Estimate)
[FIG4] Shows the measurement of the location of the train at time t = 1 and the level of uncertainty in that noisy measurement,
represented by the blue Gaussian pdf. The combined knowledge of this system is provided by multiplying these two pdfs
together.
???
Measurement (Noisy)
Prediction (Estimate)
[FIG5] Shows the new pdf (green) generated by multiplying the pdfs associated with the prediction and measurement of the
train’s location at time t = 1. This new pdf provides the best estimate of the location of the train, by fusing the data from the
prediction and the measurement.
A KEY PROPERTY OF THE
GAUSSIAN FUNCTION IS
EXPLOITED AT THIS POINT:
THE PRODUCT OF TWO
GAUSSIAN FUNCTIONS
IS ANOTHER GAUSSIAN
FUNCTION.
IEEE SIGNAL PROCESSING MAGAZINE [132] SEPTEMBER 2012
[lecture NOTES]continued
equations representing the prediction
and measurement update stages. It is
important to note however that in reality
a function is usually required to map
predictions and measurements into the
same domain. In a more realistic extension
to our example, the position of the
train will be predicted directly as a new
distance along the railway line in units of
meters, but the time of flight measurements
are recorded in units of seconds.
To allow the prediction and measurement
pdfs to be multiplied together, one
must be converted into the domain of
the other, and it is standard practice to
map the predictions into the measurement
domain via the transformation
matrix Ht.
We now revisit (8) and (9) and, instead
of allowing y1 and y2 to both represent values
in meters along the railway track, we
consider the distribution y2 to represent
the time of flight in seconds for a radio
signal propagating from a transmitter
positioned at x = 0 to the antenna on the
train. The spatial prediction pdf y1 is converted
into the measurement domain by
scaling the function by c, the speed of
light. Equations (8) and (9) therefore must
be rewritten as
( )y s c e
2
1; , ,
C
c
s
c
1 1 1
2 21
1 2
1 2
_
r
n v v
v
n
-
-
^
`
c
h
j
m
(14)
and
( )y s e
2
1; ,
s
2 2 2
2
2
2 2
2
2
2
_
r
n v
v
v
n
-^
h
,
(15)
where both distributions are now defined
in the measurement domain, radio signals
propagate along the time “s” axis,
and the measurement unit is the second.
Following the derivation as before
we now find
c c
c
c c1
1 2
2
2
1 2
2
1
fused
v
n n
v
v
n
n
= +
+
-
`
` c
j
j m
.
c
c
c
1
1 2
2
2
1
2
2
1
fused& $n n
v
v
v
n
n
= +
+
-
J
L
K
K
K
K`
c
N
P
O
O
O
Oj
m
(16)
Substituting H c1= and ( )K H 1
2
v=
H2 2
2
2
1v v+^ h results in
K H1 2 1fused $n n n n= + -^ h. (17)
Similarly the fused variance estimate
becomes
c c
c
c
2
2
1 2
1 2
2
2
1 4
fusedv v
v
v
v
= -
+
`
`
`
j
j
j
c
c
c1
2
1
2
2
2
1
2
1
2
2
fused& v v
v
v
v
v
= -
+
J
L
K
K
K
K`
N
P
O
O
O
Oj
.
KH1
2
1
2
v v= - (18)
We can now compare certain terms
resulting from this scalar derivation
with the standard vectors and matrices
used in the Kalman filter algorithm:
■ xt tfused "n ;t : the state vector following
data fusion
■ xt t1 1"n -;t : the state vector before
data fusion, i.e., the prediction
■ Pt t
2
fused "v ; : the covariance matrix
(confidence) following data fusion
■ Pt t 11
2
"v -; : the covariance matrix
(confidence) before data fusion
■ zt2 "n : the measurement vector
■ Rt2
2
"v : the uncertainty matrix
associated with a noisy set of mea-
surements
■ HH t" : the transformation matrix
used to map state vector parameters
into the measurement domain
■ KK
H
H
t2
1
2
2
2
1
2
"
v v
v
=
+
P H H P H R( )T T
t t t t t t t t1 1
1
= +; ;- -
-
:
the Kalman gain.
It is now easy to see how the standard
Kalman filter equations relate to
(17) and (18) derived above:
H
H
H1 2
1
2
2
2
1
2
2 1fused $n n
v v
v
n n= +
+
-e ^o h
x x K z H xt t t t t t t t t1 1" = + =; ; ;- -t t t^ h
H
H
H2
1
2
2
1
2
2
2
1
2
1
2
fusedv v
v v
v
v= -
+
e o
P P K H Pt t t t t t t t1 1" = -; ; ;- - .
CONCLUSIONS
The Kalman filter can be taught using a
simple derivation involving scalar mathematics,
basic algebraic manipulations,
and an easy-to-follow thought experiment.
This approach should permit students
lacking a strong mathematical
interest to understand the core mathematics
underlying the Kalman filter in
an intuitive manner and to understand
that the recursive property of the filter
are provided by the unique multiplicative
property of the Gaussian function.
AUTHOR
Ramsey Faragher (ramsey@cantab.net)
is a principal scientist at the BAE
Systems Advanced Technology Centre,
United Kingdom.
REFERENCES
[1] B. D. O. Anderson and J. B. Moore, Optimal
Filtering. New York: Dover, 2005.
[2] R. E. Kalman, “A new approach to linear filtering
and prediction problems,” J. Basic Eng., vol. 82, no.
1, pp. 35–45, Mar. 1960.
[3] P. D. Groves, Principles of GNSS, Inertial, and
Multisensor Integrated Navigation Systems.
Norwood, MA: Artech House, 2008.
[4] S. J. Julier and J. K. Uhlmann, “Unscented filtering
and nonlinear estimation,” Proc. IEEE, vol. 92,
no. 3, pp. 401–422, 2004.
[5] J. Bibby and H. Toutenburg, Prediction and
Improved Estimation in Linear Models. New York:
Wiley, 1977. [SP]
THE KALMAN FILTER CAN
BE TAUGHT USING A SIMPLE
DERIVATION INVOLVING
SCALAR MATHEMATICS,
BASIC ALGEBRAIC
MANIPULATIONS, AND
AN EASY-TO-FOLLOW
THOUGHT EXPERIMENT.