23rd IEEE/IAPR International Conference on Pattern Recognition 2016, preprint

Learning Robust Features for Gait Recognition by Maximum Margin Criterion

Michal Balazia (ORCID 0000-0001-7153-9984) and Petr Sojka (ORCID 0000-0002-5768-4007)
Faculty of Informatics, Masaryk University, Botanická 68a, 602 00 Brno, Czech Republic
xbalazia@mail.muni.cz and sojka@fi.muni.cz

Abstract—In the field of gait recognition from motion capture data, designing human-interpretable gait features is a common practice of many fellow researchers. To refrain from ad-hoc schemes and to find maximally discriminative features, we may need to explore beyond the limits of human interpretability. This paper contributes to the state of the art with a machine learning approach for extracting robust gait features directly from raw joint coordinates. The features are learned by a modification of Linear Discriminant Analysis with Maximum Margin Criterion so that the identities are maximally separated and, in combination with an appropriate classifier, used for gait recognition. Experiments on the CMU MoCap database show that this method outperforms eight other relevant methods in terms of the distribution of biometric templates in the respective feature spaces, expressed in four class separability coefficients. Additional experiments indicate that this method is a leading concept for rank-based classifier systems.

I. Introduction

From the surveillance perspective, gait pattern biometrics is appealing because it can be performed at a distance, without body-invasive equipment and without the subject's cooperation. This allows data acquisition without a subject's consent. As the data are collected with a high participation rate and surveilled subjects are not expected to claim their identities, the trait is preferably employed for identification rather than for authentication.

Motion capture technology provides video clips of walking individuals containing structural motion data. The format keeps an overall structure of the human body and holds estimated 3D positions of major anatomical landmarks as the person moves. These so-called motion capture data (MoCap) can be collected online by a system of multiple cameras (Vicon) or by a depth camera (Microsoft Kinect). To visualize motion capture data (see Figure 1), a simplified stick figure representing the human skeleton (a graph of joints connected by bones) can be recovered from the values of body point spatial coordinates. With the recent rapid improvement in MoCap sensor accuracy, we believe in an affordable MoCap technology that can be installed in the streets and identify people from MoCap data.

The primary goal of this work is to propose a method for learning robust gait features from raw MoCap data. A collection of extracted features builds a gait template that serves as the walker's signature. Templates are stored in a central database. Recognition of a person involves capturing their walk sample, extracting gait features to compose a template, and finally querying this database for a set of similar templates to report the most likely identity. Similarity of two templates is expressed in a single number computed by a similarity/distance function.

Figure 1. Motion capture data. The skeleton is represented by a stick figure of 31 joints (only 17 are drawn here). Seven selected video frames of a walk sequence contain 3D coordinates of each joint in time. The red and blue lines track the trajectories of hands and feet.
Many geometric gait features have been introduced over the past few years [1]: Ahmed et al. [2] extract horizontal and vertical distances of selected joint pairs. Andersson et al. [3] calculate the mean and standard deviation of the signals of lower joint (hips, knees and ankles) angles. Ball et al. [4] select the mean, standard deviation and maximum of the signals of lower joint (hips, knees and ankles) angles. Dikovski et al. [5] construct 7 different feature sets from a broad spectrum of geometric features, such as static body parameters, joint angles and inter-joint distances aggregated within a gait cycle, along with various statistics. Kwolek et al. [6] extract bone rotations, inter-joint distances, and the person's height. Preis et al. [7] define thirteen pose attributes, eleven of them static body parameters and the other two dynamic parameters: step length and walk speed. Sinha et al. [8] combine a number of gait features: areas of the upper and lower body, inter-joint distances, and all features introduced by Ball et al. [4] and Preis et al. [7].

Clearly, joint angles and step length are schematic and human-interpretable, which is convenient for visualizations and for an intuitive understanding of the models, but unnecessary for automatic gait recognition. This application prefers learning features that maximally separate the identities and are not limited by such dispensable factors. Section II gives a scheme for learning the features by Maximum Margin Criterion. We have implemented and evaluated all methods on a testing database described in Section III-A. Their performance is expressed in several evaluation metrics defined in Section III-B. Results are presented and discussed in Section III-C.

II. Learning Gait Features

In statistical pattern recognition, reducing space dimensionality is a common technique to overcome class estimation problems. Classes are discriminated by projecting high-dimensional input data onto low-dimensional subspaces by linear transformations with the goal of maximizing class separability. We are interested in finding an optimal feature space where a gait template is close to those of the same walker and far from those of different walkers.

Let the model of a human body have $J$ joints and let all samples be linearly normalized to their average length $T$. Labeled learning data in a sample (measurement) space have the form $\mathcal{G}_L = \{(\mathbf{g}_n, \ell_n)\}_{n=1}^{N_L}$ where
$$\mathbf{g}_n = \left[ [\gamma_1(1) \cdots \gamma_J(1)]^\top \cdots [\gamma_1(T) \cdots \gamma_J(T)]^\top \right]^\top \quad (1)$$
is a gait sample (one gait cycle) in which $\gamma_j(t) \in \mathbb{R}^3$ are the 3D spatial coordinates of a joint $j \in \{1, \ldots, J\}$ at time $t \in \{1, \ldots, T\}$, normalized with respect to the person's position and walk direction. Note that the samples have dimensionality $D = 3JT$. Each learning sample falls strictly into one of the learning identity classes $\{\mathcal{I}_c\}_{c=1}^{C}$. A class $\mathcal{I}_c$ has $N_c$ samples. Here $\mathcal{I}_c \cap \mathcal{I}_{c'} = \emptyset$ for $c \neq c'$ and $\mathcal{G}_L = \bigcup_{c=1}^{C} \mathcal{I}_c$. The label $\ell_n$ is the ground-truth identity class of the walker. We say that samples $\mathbf{g}_n$ and $\mathbf{g}_{n'}$ share a common walker if they are in the same class, i.e., $\mathbf{g}_n, \mathbf{g}_{n'} \in \mathcal{I}_c \Leftrightarrow \ell_n = \ell_{n'}$.

We measure the class separability of a given feature space by a representation of the Maximum Margin Criterion (MMC) used by Vapnik's Support Vector Machines (SVM) [9]
$$\mathcal{J} = \frac{1}{2} \sum_{c,c'=1}^{C} \left( (\mu_c - \mu_{c'})^\top (\mu_c - \mu_{c'}) - \mathrm{tr}\left(\Sigma_c + \Sigma_{c'}\right) \right) \quad (2)$$
which is actually a summation of $\frac{1}{2}C(C-1)$ between-class margins.
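To make the notation concrete, the following minimal numpy sketch (an illustration, not the paper's code) flattens a (T, J, 3) array of position- and direction-normalized joint trajectories into the $D = 3JT$ sample vector of Equation (1) and evaluates the criterion $\mathcal{J}$ of Equation (2) from the class means and the traces of the class scatter matrices; the array layout and helper names are assumptions made for this example.

```python
import numpy as np
from itertools import combinations

def gait_sample(joints):
    """Flatten a (T, J, 3) array of normalized joint coordinates into the
    D = 3*J*T gait sample vector of Eq. (1): frame blocks [gamma_1(t) ... gamma_J(t)]."""
    T, J, _ = joints.shape
    return joints.reshape(T * J * 3)

def mmc_criterion(samples, labels):
    """Sum the 1/2*C*(C-1) between-class margins of Eq. (2): squared distance of
    class means minus the two class variances (traces of class scatter matrices).
    samples: (N, D) array, labels: length-N integer array."""
    classes = np.unique(labels)
    means = {c: samples[labels == c].mean(axis=0) for c in classes}
    # tr(Sigma_c) = mean squared distance of class members to the class mean
    variances = {c: ((samples[labels == c] - means[c]) ** 2).sum(axis=1).mean()
                 for c in classes}
    margin = lambda c1, c2: ((means[c1] - means[c2]) @ (means[c1] - means[c2])
                             - (variances[c1] + variances[c2]))
    return sum(margin(c1, c2) for c1, c2 in combinations(classes, 2))
```

A higher value of `mmc_criterion` computed on transformed templates indicates a feature space in which the identities are better separated.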
The margin is defined as the Euclidean distance of the class means minus both individual variances (traces of the scatter matrices $\Sigma_c = \frac{1}{N_c}\sum_{n=1}^{N_c} \left(\mathbf{g}_n^{(c)} - \mu_c\right)\left(\mathbf{g}_n^{(c)} - \mu_c\right)^\top$ and similarly for $\Sigma_{c'}$). For the whole labeled data, we denote the between-class, within-class and total scatter matrices
$$\Sigma_B = \sum_{c=1}^{C_L} (\mu_c - \mu)(\mu_c - \mu)^\top$$
$$\Sigma_W = \sum_{c=1}^{C_L} \frac{1}{N_c}\sum_{n=1}^{N_c} \left(\mathbf{g}_n^{(c)} - \mu_c\right)\left(\mathbf{g}_n^{(c)} - \mu_c\right)^\top$$
$$\Sigma_T = \sum_{c=1}^{C_L} \frac{1}{N_c}\sum_{n=1}^{N_c} \left(\mathbf{g}_n^{(c)} - \mu\right)\left(\mathbf{g}_n^{(c)} - \mu\right)^\top = \Sigma_B + \Sigma_W \quad (3)$$
where $\mathbf{g}_n^{(c)}$ denotes the $n$-th sample in class $\mathcal{I}_c$, and $\mu_c$ and $\mu$ are the sample means of class $\mathcal{I}_c$ and of the whole data set, respectively, that is, $\mu_c = \frac{1}{N_c}\sum_{n=1}^{N_c} \mathbf{g}_n^{(c)}$ and $\mu = \frac{1}{N_L}\sum_{n=1}^{N_L} \mathbf{g}_n$. Now we obtain
$$\begin{aligned}
\mathcal{J} &= \frac{1}{2}\sum_{c,c'=1}^{C} (\mu_c - \mu_{c'})^\top (\mu_c - \mu_{c'}) - \frac{1}{2}\sum_{c,c'=1}^{C} \mathrm{tr}\left(\Sigma_c + \Sigma_{c'}\right) \\
&= \frac{1}{2}\sum_{c,c'=1}^{C} (\mu_c - \mu + \mu - \mu_{c'})^\top (\mu_c - \mu + \mu - \mu_{c'}) - \sum_{c=1}^{C}\mathrm{tr}\left(\Sigma_c\right) \\
&= \mathrm{tr}\left(\sum_{c=1}^{C} (\mu_c - \mu)(\mu_c - \mu)^\top\right) - \mathrm{tr}\left(\sum_{c=1}^{C}\Sigma_c\right) \\
&= \mathrm{tr}\left(\Sigma_B\right) - \mathrm{tr}\left(\Sigma_W\right) = \mathrm{tr}\left(\Sigma_B - \Sigma_W\right). \quad (4)
\end{aligned}$$

Since $\mathrm{tr}(\Sigma_B)$ measures the overall variance of the class mean vectors, a large value implies that the class mean vectors scatter in a large space. On the other hand, a small $\mathrm{tr}(\Sigma_W)$ implies that the classes have a small spread. Thus, a large $\mathcal{J}$ indicates that samples are close to each other if they share a common walker but far from each other if they are performed by different walkers. Extracting features, that is, transforming the input data in the measurement space into a feature space of higher $\mathcal{J}$, can be used to link new observations of walkers more successfully.

Feature extraction is given by a linear transformation (feature) matrix $\Phi \in \mathbb{R}^{D \times \widehat{D}}$ from a $D$-dimensional measurement space $\mathcal{G} = \{\mathbf{g}_n\}_{n=1}^{N}$ of not necessarily labeled gait samples into a $\widehat{D}$-dimensional feature space $\widehat{\mathcal{G}} = \{\widehat{\mathbf{g}}_n\}_{n=1}^{N}$ of gait templates, where $\widehat{D} < D$. Gait samples $\mathbf{g}_n$ are transformed into gait templates $\widehat{\mathbf{g}}_n = \Phi^\top \mathbf{g}_n$. The objective is to learn a transform $\Phi$ that maximizes the accumulated margin of the feature space
$$\mathcal{J}(\Phi) = \mathrm{tr}\left(\Phi^\top (\Sigma_B - \Sigma_W)\Phi\right). \quad (5)$$
Once the transformation is found, all measured samples are transformed into templates (in the feature space) along with the class means and covariances. The templates are compared by the Mahalanobis distance function
$$\widehat{\delta}\left(\widehat{\mathbf{g}}_n, \widehat{\mathbf{g}}_{n'}\right) = \sqrt{\left(\widehat{\mathbf{g}}_n - \widehat{\mathbf{g}}_{n'}\right)^\top \widehat{\Sigma}_T^{-1} \left(\widehat{\mathbf{g}}_n - \widehat{\mathbf{g}}_{n'}\right)}. \quad (6)$$

Now we show that the solution to the optimization problem in Equation (5) can be obtained by eigendecomposition of the matrix $\Sigma_B - \Sigma_W$. An important property of the objective $\mathcal{J}(\Phi)$ is that it is invariant w.r.t. rescalings $\Phi \rightarrow \alpha\Phi$. Since it is a scalar itself, we can always choose $\Phi = \left[\mathbf{f}_1 \| \cdots \| \mathbf{f}_{\widehat{D}}\right]$ such that $\mathbf{f}_{\widehat{d}}^\top \mathbf{f}_{\widehat{d}} = 1$ and reduce the problem of maximizing $\mathcal{J}(\Phi)$ to the constrained optimization problem
$$\max \sum_{\widehat{d}=1}^{\widehat{D}} \mathbf{f}_{\widehat{d}}^\top (\Sigma_B - \Sigma_W)\mathbf{f}_{\widehat{d}} \quad \text{subject to} \quad \mathbf{f}_{\widehat{d}}^\top \mathbf{f}_{\widehat{d}} - 1 = 0 \quad \forall\, \widehat{d} = 1, \ldots, \widehat{D}. \quad (7)$$
To solve the above optimization problem, consider the Lagrangian
$$\mathcal{L}\left(\mathbf{f}_{\widehat{d}}, \lambda_{\widehat{d}}\right) = \sum_{\widehat{d}=1}^{\widehat{D}} \mathbf{f}_{\widehat{d}}^\top (\Sigma_B - \Sigma_W)\mathbf{f}_{\widehat{d}} - \lambda_{\widehat{d}}\left(\mathbf{f}_{\widehat{d}}^\top \mathbf{f}_{\widehat{d}} - 1\right) \quad (8)$$
with multipliers $\lambda_{\widehat{d}}$. To find the maximum, we differentiate with respect to $\mathbf{f}_{\widehat{d}}$ and set the derivative to zero
$$\frac{\partial \mathcal{L}\left(\mathbf{f}_{\widehat{d}}, \lambda_{\widehat{d}}\right)}{\partial \mathbf{f}_{\widehat{d}}} = \left((\Sigma_B - \Sigma_W) - \lambda_{\widehat{d}} I\right)\mathbf{f}_{\widehat{d}} = 0 \quad (9)$$
which leads to
$$(\Sigma_B - \Sigma_W)\,\mathbf{f}_{\widehat{d}} = \lambda_{\widehat{d}}\,\mathbf{f}_{\widehat{d}} \quad (10)$$
where $\lambda_{\widehat{d}}$ are the eigenvalues of $\Sigma_B - \Sigma_W$ and $\mathbf{f}_{\widehat{d}}$ are the corresponding eigenvectors. Therefore,
$$\mathcal{J}(\Phi) = \mathrm{tr}\left(\Phi^\top (\Sigma_B - \Sigma_W)\Phi\right) = \mathrm{tr}\left(\Phi^\top \Lambda \Phi\right) = \mathrm{tr}(\Lambda) \quad (11)$$
is maximized when $\Lambda = \mathrm{diag}\left(\lambda_1, \ldots, \lambda_{\widehat{D}}\right)$ contains the $\widehat{D}$ largest eigenvalues and $\Phi$ contains the corresponding leading eigenvectors.
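This derivation translates directly into a few lines of numpy. The sketch below (an illustration in the paper's notation, not the reference implementation) forms $\Sigma_B$ and $\Sigma_W$ from labeled samples, keeps the leading eigenvectors of $\Sigma_B - \Sigma_W$ as the columns of $\Phi$, and compares templates with the Mahalanobis distance of Equation (6). For clarity it operates on the full $D \times D$ scatter matrices; the two-step SVD procedure described next avoids that cost.

```python
import numpy as np

def learn_mmc_transform(samples, labels, n_features):
    """Columns of the returned Phi are the n_features leading eigenvectors of
    Sigma_B - Sigma_W, maximizing J(Phi) of Eq. (5).
    samples: (N, D) array, labels: length-N integer array."""
    mu = samples.mean(axis=0)
    D = samples.shape[1]
    sigma_b = np.zeros((D, D))
    sigma_w = np.zeros((D, D))
    for c in np.unique(labels):
        X_c = samples[labels == c]
        mu_c = X_c.mean(axis=0)
        sigma_b += np.outer(mu_c - mu, mu_c - mu)
        sigma_w += (X_c - mu_c).T @ (X_c - mu_c) / len(X_c)
    # Sigma_B - Sigma_W is symmetric, so eigh applies; eigenvalues come out ascending
    eigvals, eigvecs = np.linalg.eigh(sigma_b - sigma_w)
    return eigvecs[:, np.argsort(eigvals)[::-1][:n_features]]   # Phi: (D, n_features)

def template_distance(t1, t2, sigma_t_inv):
    """Mahalanobis distance of Eq. (6) between two templates in the feature space;
    sigma_t_inv is the (pseudo-)inverse of the total scatter of the templates."""
    d = t1 - t2
    return float(np.sqrt(d @ sigma_t_inv @ d))

# usage sketch: Phi = learn_mmc_transform(G_L, labels, n_features)
#               templates = G_L @ Phi
```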
In the following, we discuss how to calculate the eigenvectors of $\Sigma_B - \Sigma_W$ and how to determine an optimal dimensionality $\widehat{D}$ of the feature space. First, we rewrite $\Sigma_B - \Sigma_W = 2\Sigma_B - \Sigma_T$. Note that the null space of $\Sigma_T$ is a subspace of that of $\Sigma_B$, since the null space of $\Sigma_T$ is the common null space of $\Sigma_B$ and $\Sigma_W$. Thus, we can simultaneously diagonalize $\Sigma_B$ and $\Sigma_T$ to some $\Delta$ and $I$
$$\Psi^\top \Sigma_B \Psi = \Delta \qquad \Psi^\top \Sigma_T \Psi = I \quad (12)$$
with the $D \times \mathrm{rank}(\Sigma_T)$ eigenvector matrix
$$\Psi = \Omega\,\Theta^{-\frac{1}{2}}\,\Xi \quad (13)$$
where $\Omega$ and $\Theta$ are the eigenvector and eigenvalue matrices of $\Sigma_T$, respectively, and $\Xi$ is the eigenvector matrix of $\Theta^{-1/2}\Omega^\top \Sigma_B \Omega\,\Theta^{-1/2}$.

To calculate $\Psi$, we use a fast two-step algorithm [10] based on Singular Value Decomposition (SVD). SVD expresses a real $r \times s$ matrix $A$ as a product $A = U D V^\top$, where $D$ is a diagonal matrix with decreasing non-negative entries, and $U$ and $V$ are $r \times \min\{r,s\}$ and $s \times \min\{r,s\}$ eigenvector matrices of $A A^\top$ and $A^\top A$, respectively; the non-vanishing entries of $D$ are the square roots of the corresponding non-zero eigenvalues of both $A A^\top$ and $A^\top A$. Note that $\Sigma_T$ and $\Sigma_B$ can be expressed in the forms
$$\Sigma_T = X X^\top \quad \text{where } X = \tfrac{1}{\sqrt{N_L}}\left[(\mathbf{g}_1 - \mu) \cdots (\mathbf{g}_{N_L} - \mu)\right]$$
$$\Sigma_B = \Upsilon\Upsilon^\top \quad \text{where } \Upsilon = \left[(\mu_1 - \mu) \cdots (\mu_{C_L} - \mu)\right], \quad (14)$$
respectively. Hence, we can obtain the eigenvectors $\Omega$ and the corresponding eigenvalues $\Theta$ of $\Sigma_T$ through the SVD of $X$, and analogically $\Xi$ of $\Theta^{-1/2}\Omega^\top \Sigma_B \Omega\,\Theta^{-1/2}$ through the SVD of $\Theta^{-1/2}\Omega^\top \Upsilon$. The columns of $\Psi$ are clearly the eigenvectors of $2\Sigma_B - \Sigma_T$ with the corresponding eigenvalues $2\Delta - I$. Therefore, to constitute the transform $\Phi$ by maximizing the MMC, we should choose the eigenvectors in $\Psi$ that correspond to eigenvalues of at least $\frac{1}{2}$ in $\Delta$. Note that $\Delta$ contains at most $\mathrm{rank}(\Sigma_B) = C - 1$ positive eigenvalues.

We found inspiration in Fisher's Linear Discriminant Analysis (LDA) [11], which uses the Fisher criterion
$$\mathcal{J}\left(\Phi_{\mathrm{LDA}}\right) = \mathrm{tr}\left(\frac{\Phi_{\mathrm{LDA}}^\top \Sigma_B \Phi_{\mathrm{LDA}}}{\Phi_{\mathrm{LDA}}^\top \Sigma_W \Phi_{\mathrm{LDA}}}\right). \quad (15)$$
However, since the rank of $\Sigma_W$ is at most $N_L - C$, it is a singular (non-invertible) matrix if $N_L$ is less than $D + C$ or, analogously, it might be unstable if $N_L \ll D$. Small sample size is a substantial difficulty, as it is necessary to calculate $\Sigma_W^{-1}$. To alleviate this, the measured data can first be projected to a lower-dimensional space using Principal Component Analysis (PCA), resulting in a two-stage PCA+LDA feature extraction technique [12]
$$\mathcal{J}\left(\Phi_{\mathrm{PCA}}\right) = \mathrm{tr}\left(\Phi_{\mathrm{PCA}}^\top \Sigma_T \Phi_{\mathrm{PCA}}\right) \qquad \mathcal{J}\left(\Phi_{\mathrm{LDA}}\right) = \mathrm{tr}\left(\frac{\Phi_{\mathrm{LDA}}^\top \Phi_{\mathrm{PCA}}^\top \Sigma_B \Phi_{\mathrm{PCA}} \Phi_{\mathrm{LDA}}}{\Phi_{\mathrm{LDA}}^\top \Phi_{\mathrm{PCA}}^\top \Sigma_W \Phi_{\mathrm{PCA}} \Phi_{\mathrm{LDA}}}\right) \quad (16)$$
and the final transform is $\Phi = \Phi_{\mathrm{PCA}}\Phi_{\mathrm{LDA}}$. Given that there are $D'$ principal components, then regardless of the original dimensionality $D$ there are at least $D' + 1$ independent data points. Thus, if the $D' \times D'$ matrix $\Phi_{\mathrm{PCA}}^\top \Sigma_W \Phi_{\mathrm{PCA}}$ is estimated from $N_L - C$ independent observations and provided that $C \le D' \le N_L - C$, we can always invert $\Phi_{\mathrm{PCA}}^\top \Sigma_W \Phi_{\mathrm{PCA}}$ and in this way obtain the LDA estimate. Note that this method is sub-optimal for multi-class problems [13], as PCA keeps at most $N_L - C$ principal components whereas at least $N_L - 1$ of them are necessary in order not to lose information. PCA+LDA in this form has been used for silhouette-based (2D) gait recognition by Su et al. [14] and is included in our experiments with MoCap (3D).
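For comparison, Equation (16) can also be sketched compactly in numpy. The snippet below is only an illustration of the two-stage PCA+LDA pipeline under the assumption that the reduced within-class scatter is invertible; Algorithm 2 below gives the authors' procedure.

```python
import numpy as np

def learn_pca_lda_transform(samples, labels, pca_dim):
    """Two-stage PCA+LDA of Eq. (16): project onto pca_dim principal components
    of Sigma_T, then solve the Fisher criterion in the reduced space."""
    mu = samples.mean(axis=0)
    classes = np.unique(labels)
    D = samples.shape[1]
    sigma_b = np.zeros((D, D))
    sigma_w = np.zeros((D, D))
    for c in classes:
        X_c = samples[labels == c]
        mu_c = X_c.mean(axis=0)
        sigma_b += np.outer(mu_c - mu, mu_c - mu)
        sigma_w += (X_c - mu_c).T @ (X_c - mu_c) / len(X_c)
    # PCA stage: leading eigenvectors of Sigma_T = Sigma_B + Sigma_W
    t_vals, t_vecs = np.linalg.eigh(sigma_b + sigma_w)
    phi_pca = t_vecs[:, np.argsort(t_vals)[::-1][:pca_dim]]
    # LDA stage: eigenvectors of (Phi_PCA^T Sigma_W Phi_PCA)^-1 (Phi_PCA^T Sigma_B Phi_PCA)
    sw = phi_pca.T @ sigma_w @ phi_pca
    sb = phi_pca.T @ sigma_b @ phi_pca
    l_vals, l_vecs = np.linalg.eig(np.linalg.solve(sw, sb))
    order = np.argsort(l_vals.real)[::-1][:len(classes) - 1]
    return phi_pca @ l_vecs[:, order].real          # Phi = Phi_PCA Phi_LDA
```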
On given labeled learning data $\mathcal{G}_L$, Algorithm 1 and Algorithm 2 below are efficient ways of learning the transforms $\Phi$ for MMC and PCA+LDA, respectively.

Algorithm 1 LearnTransformationMatrixMMC($\mathcal{G}_L$)
1: split $\mathcal{G}_L = \{(\mathbf{g}_n, \ell_n)\}_{n=1}^{N_L}$ into classes $\{\mathcal{I}_c\}_{c=1}^{C_L}$ of $N_c = |\mathcal{I}_c|$ samples
2: compute the overall mean $\mu = \frac{1}{N_L}\sum_{n=1}^{N_L}\mathbf{g}_n$ and the individual class means $\mu_c = \frac{1}{N_c}\sum_{n=1}^{N_c}\mathbf{g}_n^{(c)}$
3: compute $\Sigma_B = \sum_{c=1}^{C_L}(\mu_c - \mu)(\mu_c - \mu)^\top$
4: compute $X = \frac{1}{\sqrt{N_L}}\left[(\mathbf{g}_1 - \mu) \cdots (\mathbf{g}_{N_L} - \mu)\right]$
5: compute $\Upsilon = \left[(\mu_1 - \mu) \cdots (\mu_{C_L} - \mu)\right]$
6: compute the eigenvectors $\Omega$ and corresponding eigenvalues $\Theta$ of $\Sigma_T$ through the SVD of $X$
7: compute the eigenvectors $\Xi$ of $\Theta^{-1/2}\Omega^\top\Sigma_B\Omega\,\Theta^{-1/2}$ through the SVD of $\Theta^{-1/2}\Omega^\top\Upsilon$
8: compute the eigenvectors $\Psi = \Omega\,\Theta^{-1/2}\,\Xi$
9: compute the eigenvalues $\Delta = \Psi^\top\Sigma_B\Psi$
10: return the transform $\Phi$ as the eigenvectors in $\Psi$ that correspond to eigenvalues of at least $1/2$ in $\Delta$

Algorithm 2 LearnTransformationMatrixPCALDA($\mathcal{G}_L$)
1: split $\mathcal{G}_L = \{(\mathbf{g}_n, \ell_n)\}_{n=1}^{N_L}$ into classes $\{\mathcal{I}_c\}_{c=1}^{C_L}$ of $N_c = |\mathcal{I}_c|$ samples
2: compute the overall mean $\mu = \frac{1}{N_L}\sum_{n=1}^{N_L}\mathbf{g}_n$ and the individual class means $\mu_c = \frac{1}{N_c}\sum_{n=1}^{N_c}\mathbf{g}_n^{(c)}$
3: compute $\Sigma_B = \sum_{c=1}^{C_L}(\mu_c - \mu)(\mu_c - \mu)^\top$
4: compute $\Sigma_W = \sum_{c=1}^{C_L}\frac{1}{N_c}\sum_{n=1}^{N_c}\left(\mathbf{g}_n^{(c)} - \mu_c\right)\left(\mathbf{g}_n^{(c)} - \mu_c\right)^\top$
5: compute the eigenvectors $\Phi_{\mathrm{PCA}}$ of $\Sigma_T = \Sigma_B + \Sigma_W$ that correspond to the $D'$ largest eigenvalues (we set $D' = C_L$)
6: compute the eigenvectors $\Phi_{\mathrm{LDA}}$ of $\left(\Phi_{\mathrm{PCA}}^\top\Sigma_W\Phi_{\mathrm{PCA}}\right)^{-1}\left(\Phi_{\mathrm{PCA}}^\top\Sigma_B\Phi_{\mathrm{PCA}}\right)$
7: return the transform $\Phi = \Phi_{\mathrm{PCA}}\Phi_{\mathrm{LDA}}$

III. Experimental Evaluation

A. Database

For evaluation purposes, we have extracted a large number of samples from the general MoCap database from CMU [15], a well-known and recognized database of structural human motion data. It contains numerous motion sequences, including a considerable number of gait sequences. Motions are recorded with an optical marker-based Vicon system. People wear a black jumpsuit and have 41 markers taped on. The tracking space of 30 m², surrounded by 12 cameras with a sampling rate of 120 Hz placed at a height of 2 to 4 meters above ground, creates a video surveillance environment. Motion videos are triangulated to obtain highly accurate 3D data in the form of relative body point coordinates (with respect to the root joint) in each video frame, stored in the standard ASF/AMC data format. Each registered participant is assigned their respective skeleton, described in an ASF file. Motions in the AMC files store bone rotational data, which is interpreted as instructions about how the associated skeleton deforms over time.

To use the collected data in a fair manner, a prototypical skeleton is constructed and used to represent the bodies of all subjects, shrouding the unique skeleton parameters of individual walkers. Assuming that all walking subjects are physically identical disables a trivial skeleton check as a potentially unfair 100% classifier. Moreover, this is a skeleton-robust solution, as all bone rotational data are linked with a fixed skeleton. To obtain realistic parameters, it is calculated as the mean of all skeletons in the provided ASF files. 3D joint coordinates are calculated using the bone rotational data and the prototypical skeleton.

One cannot directly use the raw values of joint coordinates, as they refer to absolute positions in the tracking space, and not all potential methods are invariant to the person's position or walk direction. To ensure such invariance, the center of the coordinate system is moved to the position of the root joint, $\gamma_{\mathrm{root}}(t) = [0, 0, 0]^\top$ for each time $t$, and the axes are adjusted to the walker's perspective: the X axis is from right (negative) to left (positive), the Y axis is from down (negative) to up (positive), and the Z axis is from back (negative) to front (positive). In the AMC file structure notation, this is achieved by zeroing the root translation and rotation (root 0 0 0 0 0 0) in all frames of all motion sequences.
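For MoCap streams that are not stored as ASF/AMC bone rotations, an equivalent normalization can be applied directly to 3D joint positions. The helper below is hypothetical (the joint indices and the hip-based heading estimate are assumptions, not part of the CMU toolchain): it centers every frame at the root joint and rotates about the vertical axis so that the walker faces the positive Z direction, matching the axis convention described above.

```python
import numpy as np

def normalize_walk(joints, root=0, left_hip=1, right_hip=6):
    """Normalize a (T, J, 3) array of absolute joint positions: translate each
    frame so the root joint sits at the origin (gamma_root(t) = [0, 0, 0]) and
    rotate about the vertical Y axis so the walker faces +Z.
    Joint indices are hypothetical and depend on the skeleton layout."""
    centered = joints - joints[:, root:root + 1, :]
    out = np.empty_like(centered)
    for t, frame in enumerate(centered):
        # facing direction: horizontal vector perpendicular to the hip line
        hips = frame[left_hip] - frame[right_hip]
        heading = np.array([-hips[2], 0.0, hips[0]])
        heading /= np.linalg.norm(heading) + 1e-12
        angle = np.arctan2(heading[0], heading[2])   # angle of heading from +Z
        c, s = np.cos(-angle), np.sin(-angle)        # rotate heading onto +Z
        rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        out[t] = frame @ rot_y.T
    return out
```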
Since the general motion database contains all motion types, we extracted a number of sub-motions that represent gait cycles. First, an exemplary gait cycle was identified, and clean gait cycles were then filtered using the DTW distance over bone rotations. The similarity threshold was set high enough so that even the least similar sub-motion still semantically represents a gait cycle. Finally, subjects that contributed fewer than 10 samples were excluded. The final database has 54 walking subjects who performed 3,843 samples in total, which makes an average of about 71 samples per subject.

B. Performance Metrics

All results are estimated with nested cross-validation (see Figure 2), which involves two partial cross-validation loops. In the outer 3-fold cross-validation loop, the $N_L$ labeled templates in one fold are used for learning the features and thus training the model. This model is frozen and ready to be evaluated for class separability coefficients on the remaining two folds of $N_E$ labeled templates. Both the learning and the evaluation sets contain templates of all $C$ identities. Evaluation of the classification metrics advances to the inner 10-fold cross-validation loop, taking one fold, with labels removed, as the testing set and the other nine labeled folds as the gallery. Test templates are classified by the winner-takes-all strategy, in which a test template $\widehat{\mathbf{g}}_{\mathrm{test}}$ gets assigned the label $\arg\min_i \widehat{\delta}\left(\widehat{\mathbf{g}}_{\mathrm{test}}, \widehat{\mathbf{g}}_i^{\mathrm{gallery}}\right)$ of the gallery's closest identity class.

Figure 2. Nested cross-validation.

Correct Classification Rate (CCR) is often perceived as the ultimate qualitative measure; however, if a method has a low CCR, we cannot directly say whether the system is failing because of bad features or a bad classifier. It is more explanatory to provide an evaluation in terms of the class separability of the feature space. The class separability measures give an estimate of the recognition potential of the extracted features and do not reflect an eventual combination with an unsuitable classifier:

∙ Davies-Bouldin Index (DBI)
$$\mathrm{DBI} = \frac{1}{C}\sum_{c=1}^{C}\ \max_{1 \le c' \le C,\ c' \ne c}\ \frac{\sigma_c + \sigma_{c'}}{\widehat{\delta}\left(\widehat{\mu}_c, \widehat{\mu}_{c'}\right)} \quad (17)$$
where $\sigma_c = \frac{1}{N_c}\sum_{n=1}^{N_c}\widehat{\delta}\left(\widehat{\mathbf{g}}_n, \widehat{\mu}_c\right)$ is the average distance of all elements in identity class $\mathcal{I}_c$ to its centroid, and analogically for $\sigma_{c'}$. Templates of low intra-class distances and of high inter-class distances have a low DBI.

∙ Dunn Index: DI
DI = min 1≤c