3rd IEEE/IAPR International Joint Conference on Biometrics 2017, preprint

You Are How You Walk: Uncooperative MoCap Gait Identification for Video Surveillance with Incomplete and Noisy Data

Michal Balazia <xbalazia@mail.muni.cz> and Petr Sojka <sojka@fi.muni.cz>
Faculty of Informatics, Masaryk University, Botanická 68a, 602 00 Brno, Czech Republic

Abstract

This work offers a design of a video surveillance system based on a soft biometric – gait identification from MoCap data. The main focus is on two substantial issues of the video surveillance scenario: (1) the walkers do not cooperate in providing learning data to establish their identities, and (2) the data are often noisy or incomplete. We show that only a few examples of human gait cycles are required to learn a projection of raw MoCap data onto a low-dimensional subspace where the identities are well separable. Latent features learned by the Maximum Margin Criterion (MMC) method discriminate better than any collection of geometric features. The MMC method is also highly robust to noisy data and works properly even with only a fraction of joints tracked. The overall workflow of the design is directly applicable for day-to-day operation based on the available MoCap technology and algorithms for gait analysis. In the concept we introduce, a walker's identity is represented by a cluster of gait data collected at their incidents within the surveillance system: They are how they walk.

1. Introduction

Public safety issues are constantly evolving, and security monitoring agencies are facing more challenges than ever before. Offering a range of security systems indispensable to investigators, the surveillance industry appears to be at the beginning of a massive expansion. Video surveillance technology records video footage for the potential future identification of suspicious individuals and activities. Many public places, such as banks and airports, already have surveillance cameras installed, but these require intelligent approaches to human identification. A useful early-warning system would analyze the collected video footage and release an alert before an adverse event takes place. Triggered by the detection of an abnormal behavior, the system would instantly identify all participants in the scene, rapidly investigate their previous activities, and launch the tracking of the suspects.

A typical video surveillance environment has some crucial characteristics that need to be taken into account when designing an identification system. Data are captured by a system of video cameras covering a large tracking space. People walk in various directions, at various speeds, and are often in crowds. They wear a variety of clothes and shoes and often carry large objects. Since the observed people do not cooperate, a learning model is only available in an externally annotated database. A central database stores hundreds of subject identities that are encountered repeatedly, each contributing multiple biometric samples. Identification has to be performed in real time, that is, in a few seconds. Tracking and identification have to be automatic, as operator interventions are slow and costly.

Gait (walk) pattern has several attractive properties as a soft biometric trait. From the surveillance perspective, gait pattern biometrics is appealing for the possibility of being performed at a distance and without body-invasive equipment or subject cooperation. This enables sample acquisition even without a subject's consent.
The goal of this work is to design a method for identifying individuals in video footage from their gait patterns. Uncooperative gait identification has been addressed by Martín-Félez and Xiang [29] by casting gait identification as a bipartite ranking problem. Their model learns a ranking function in a higher-dimensional space where true matches and wrong matches become more separable than in the original space. The learned function scores a pair of gait templates higher if they belong to the same person than if they belong to different people. Chen and Xu [11] extend the bipartite ranking approach by integrating sparse-coding and multi-view hypergraph-based re-ranking, a framework they call the sparse coding multi-view hypergraph learning re-ranking method.

2. Identification Workflow

In accordance with the outlined video surveillance environment, our human identification system has the following 4-phase workflow (see Figure 1).

[Figure 1: Video surveillance workflow. A person is (I) captured on an RGB-D camera in the form of MoCap data, (II) gait is then detected to form a gait sample, from which (III) a gait template is extracted and (IV) the person is identified.]

Phase I – Acquiring Motion Capture Data

Motion capture (MoCap) technology acquires video clips of people and derives structural motion data. The format maintains an overall structure of the human body and holds the estimated 3D positions of the main anatomical landmarks as the person moves. MoCap data can be collected by RGB-D sensors such as Microsoft Kinect and Asus Xtion, or by marker-based optical systems such as Vicon. For a schematic visualization, a simplified stick figure representing the human skeleton (a graph of joints connected by bones) can be automatically recovered from the values of body point spatial coordinates. With recent rapid improvements in sensor technology and pose estimation techniques [15, 18], an accurate and affordable MoCap system [17] is at our disposal to aid gait identification for applications in video surveillance.

Phase II – Detecting Gait Cycles

People spotted in our tracking space do not walk all the time; on the contrary, they perform various activities. Identifying people from gait requires processing the video segments where they are actually walking, so gait cycles first need to be filtered out of the sequences of general motion. There are methods [5, 33] for detecting gait cycles directly, as well as action recognition methods [12, 19, 21, 25, 34] that need a demonstrative example of a gait cycle to query general motion sequences.
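As a concrete illustration of this step, the sketch below detects cycles from the periodicity of the inter-ankle distance. It assumes MoCap samples stored as a frames × joints × 3 array with known ankle-joint indices and a hypothetical frame rate; it is a simple stand-in for illustration, not one of the detectors of [5, 33].

```python
import numpy as np
from scipy.signal import find_peaks

def detect_gait_cycles(poses, l_ankle, r_ankle, fps=30):
    """Split a walking sequence into gait cycles.

    poses            -- (frames, joints, 3) array of 3D joint positions
    l_ankle, r_ankle -- indices of the left and right ankle joints
    fps              -- frame rate of the MoCap stream
    Returns a list of (start, end) frame ranges, one per full cycle.
    """
    # The inter-ankle distance peaks at every step, i.e., twice per
    # gait cycle (once for each stride).
    dist = np.linalg.norm(poses[:, l_ankle] - poses[:, r_ankle], axis=1)
    # Assume a step takes at least ~0.3 s; enforce that as the minimal
    # spacing between detected peaks.
    steps, _ = find_peaks(dist, distance=int(0.3 * fps))
    # Two consecutive steps form one full gait cycle.
    return [(steps[i], steps[i + 2]) for i in range(len(steps) - 2)]
```

Each returned frame range would then be cut out of the sequence and passed on to Phase III as one gait sample.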
Phase III – Extracting Gait Features

Once a query motion clip has been cut into clean gait cycles, the identification mechanism proceeds with transforming the sample of raw MoCap data into a representation that contains discriminative gait information. A collection of extracted gait features builds a gait template, which serves as the walker's signature. But since walkers in a video surveillance environment cannot be relied upon to cooperate, we are left with the problem of obtaining highly discriminative gait features for the walkers without a labeled learning dataset containing their very own samples.

Many research groups investigate the discriminatory power of geometric gait features designed by hand and without any statistical learning. They typically combine static body parameters (bone lengths, person's height) with dynamic gait features, such as step length, walk speed, joint angles and inter-joint distances, along with various statistics (mean, standard deviation or maximum) of their signals. These are, in particular, the horizontal and vertical distances of selected joint pairs by Ahmed et al. [2], lower limb triangles by Ali et al. [3], lower body angles, step length, cycle time and velocity by Andersson and Araujo [4], lower limb (hips, knees and ankles) angles by Ball et al. [10], axis rotations of the major bones by Kwolek et al. [24], and eleven static body parameters together with step length and walk speed by Preis et al. [30]. Dikovski et al. [14] construct seven different feature sets from a broad spectrum of geometric features, such as static body parameters, joint angles and inter-joint distances aggregated within a gait cycle, along with various statistics. Sinha et al. [32] combine areas of the upper and lower body and inter-joint distances with all of the features introduced by Ball et al. [10] and Preis et al. [30]. These features are convenient for visualizations and for intuitive understanding, but their schematic nature and human interpretability are unnecessary for automatic identification.

Instead, we [6, 7] prefer to learn the features statistically on an auxiliary labeled database, with the goal of maximally separating the identity classes in the feature space, and to use these features to identify all potential walkers. Our linear models are learned in a supervised manner through (1) a modification of Fisher's Linear Discriminant Analysis [16] with the Maximum Margin Criterion (MMC) and (2) a combination of Principal Component Analysis and Linear Discriminant Analysis (PCA+LDA) to project the high-dimensional input data onto low-dimensional sub-spaces. The similarity of templates is expressed by the Mahalanobis distance function. Both these and the non-learning approaches create an unsupervised environment suitable for searching for similar templates and for clustering templates into potential identities, which is the main focus of the application outlined in the following phase.
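To make the learning step concrete, below is a minimal sketch of feature learning by Maximum Margin Criterion on vectorized gait samples. It follows the standard MMC formulation (maximize the trace of S_b − S_w); the published method [6] operates on tensorial representations and differs in details, so treat this as a simplification under stated assumptions rather than the published algorithm.

```python
import numpy as np

def learn_mmc(X, y, dim):
    """Learn an MMC projection from an auxiliary labeled database.

    X   -- (N, D) matrix, each row a vectorized gait sample
    y   -- (N,) identity labels of the learning subjects
    dim -- dimension of the target feature space
    Returns a (D, dim) projection matrix W; templates are X @ W.
    """
    D = X.shape[1]
    mu = X.mean(axis=0)
    S_b = np.zeros((D, D))  # between-class scatter
    S_w = np.zeros((D, D))  # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_b += len(Xc) * np.outer(mu_c - mu, mu_c - mu)
        S_w += (Xc - mu_c).T @ (Xc - mu_c)
    # Maximum Margin Criterion: maximize tr(W^T (S_b - S_w) W), solved
    # by the top eigenvectors of S_b - S_w. Unlike Fisher's LDA, no
    # inversion of S_w is needed, so a singular S_w is not a problem.
    vals, vecs = np.linalg.eigh(S_b - S_w)
    return vecs[:, np.argsort(vals)[::-1][:dim]]
```

Once W is learned from external identities, a template of any walker is obtained as X @ W without any labeled samples of that walker, which is exactly what the uncooperative setting requires.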
Phase IV – Identifying Walkers

Identification is most commonly formulated as a classification problem: a walker's identity is established (classified) by picking one from the pool of registered identities. This model is suitable for applications where participants reveal their identities at registration (closed-set identification). During video surveillance, on the other hand, new identities can appear on the fly (open-set identification), and labeled data for all the people encountered may not always be available. Nobody is claiming their identity, since people are recorded without their consent. Needless to say, this task requires a different understanding of what a person's identity is. You are how you walk. Your identity is your gait pattern itself. Instead of classifying walker identities as names or numbers, which are not available in any case, a forensic investigator rather asks for information about their appearances captured by the surveillance system – their location trace (see Figure 2), which includes a timestamp and geolocation for each appearance. In the suggested application, walkers are therefore clustered rather than classified.

Identification is carried out as a query-by-example: a similarity search query ranks recorded gait templates on the basis of their similarity to the query template, which represents their likelihood of belonging to the same person. A decision mechanism operates on the basis of clustering or thresholding to determine which of the templates are finally accepted to establish the location trace. Although one can allow more templates to be accepted, doing so will enlarge the location trace at the risk of falsely accepting some templates of another person.

[Figure 2: Illustration of a person's location trace. Each point represents a gait template and contains additional information about the place and time of the corresponding incident within the surveillance system. Given a query template (green dot), the retrieved cluster of templates (red dots) is expected to contain other templates of the same person. Map obtained from https://www.openstreetmap.org.]
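A minimal sketch of the thresholding variant of this decision mechanism follows. The function and its parameters (gallery, records, threshold) are illustrative assumptions, not part of the published system, and the plain Euclidean metric stands in for the Mahalanobis metric of Phase III (which amounts to whitening the templates first).

```python
import numpy as np

def location_trace(query, gallery, records, threshold):
    """Retrieve a walker's location trace by query-by-example.

    query     -- (d,) gait template of the spotted walker
    gallery   -- (M, d) gait templates recorded by the system
    records   -- list of M (timestamp, geolocation) tuples
    threshold -- acceptance distance; raising it enlarges the trace
                 at the risk of falsely accepting other people
    Returns the accepted records, ranked by similarity to the query.
    """
    dists = np.linalg.norm(gallery - query, axis=1)
    ranking = np.argsort(dists)  # most similar templates first
    return [records[i] for i in ranking if dists[i] <= threshold]
```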
3. Evaluation

We have based our evaluation on the gait recognition framework [8]. The evaluation focuses on Phase III of the workflow in Section 2. We investigate the impact of externalizing the learning identities as a resolution of walker uncooperativeness by evaluating the discriminativeness of feature spaces learned from varying amounts of separate learning data. One also desires a system whose model does not collapse even if data are mapped wrongly due to inaccuracies or failures of the data acquisition technology, for which we evaluate robustness to incomplete and noisy data. The final stage is the evaluation of the clusterability of the individual feature spaces for potential data pre-processing.

We implemented and evaluated all competitive MoCap gait identification methods [2, 3, 4, 6, 10, 14, 24, 30, 32]. The state of the art includes additional methods [1, 20, 22, 23, 31], which we have implemented but not evaluated due to the high demands they place on computational resources.

For the purpose of evaluation, we selected the MoCap database of the CMU Graphics Lab [13], which is available under the Creative Commons license. Normalization and extraction of gait cycles from this database is described in [8] and is available for download at [9]. The gait data hold $C = 64$ walking subjects that performed $N = 5\,923$ samples in total, an average of about 93 samples per subject. Data in the measurement (sample) space have the form $\mathcal{G} = \{(\mathbf{g}_n, \ell_n)\}_{n=1}^{N}$, where $\mathbf{g}_n$ is a tensorial representation of a gait sample (a single gait cycle) that contains the 3D spatial coordinates (positions) of joints in all video frames, normalized with respect to the person's position and direction of walking. Each of the $N$ learning samples falls strictly into one of the $C$ identity classes representing a single walker labeled $\ell_n$. A class $\mathcal{I}_c \subseteq \mathcal{G}$ has $N_c$ samples. Classes $\mathcal{I} = \{\mathcal{I}_c\}_{c=1}^{C}$ are complete and mutually exclusive. We say that samples $(\mathbf{g}_n, \ell_n)$ and $(\mathbf{g}_{n'}, \ell_{n'})$ share a common walker if and only if they belong to the same class: $(\mathbf{g}_n, \ell_n), (\mathbf{g}_{n'}, \ell_{n'}) \in \mathcal{I}_c \Leftrightarrow \ell_n = \ell_{n'}$.

We evaluate the implemented methods with the cross-identity evaluation setup, in which the collection of learning data $\mathcal{G}_L = \{(\mathbf{g}_n, \ell_n)\}_{n=1}^{N_L}$ of $C_L$ identities and the collection of evaluation data $\mathcal{G}_E = \{(\mathbf{g}_n, \ell_n)\}_{n=1}^{N_E}$ of $C_E$ identities are disjoint. An evaluation configuration is parametrized by $(C_L, C_E)$, specifying how many learning and how many evaluation identity classes are selected from the database. Models are learned on the learning part and evaluated on the evaluation part transformed into the individual feature spaces $\mathcal{G}_E = \{(\mathbf{g}_n, \ell_n)\}_{n=1}^{N_E} \to \widehat{\mathcal{G}}_E = \{(\widehat{\mathbf{g}}_n, \ell_n)\}_{n=1}^{N_E}$ of $C_E$ identities, as determined by the corresponding methods. Evaluation results are presented in terms of the following metrics:

∙ Davies-Bouldin Index: $\mathrm{DBI} = \frac{1}{C_E} \sum_{c=1}^{C_E} \max_{1 \leq c' \leq C_E,\, c' \neq c} \frac{\sigma_c + \sigma_{c'}}{\widehat{\delta}(\widehat{\boldsymbol{\mu}}_c, \widehat{\boldsymbol{\mu}}_{c'})}$, where $\sigma_c = \frac{1}{N_c} \sum_{n=1}^{N_c} \widehat{\delta}(\widehat{\mathbf{g}}_n, \widehat{\boldsymbol{\mu}}_c)$ is the average distance of all elements in identity class $\mathcal{I}_c$ to its centroid $\widehat{\boldsymbol{\mu}}_c$, and analogously for $\sigma_{c'}$. Templates of low intra-class distances and of high inter-class distances have a low DBI.

∙ Silhouette Coefficient: $\mathrm{SC} = \frac{1}{N_E} \sum_{n=1}^{N_E} \frac{b(\widehat{\mathbf{g}}_n) - a(\widehat{\mathbf{g}}_n)}{\max\{a(\widehat{\mathbf{g}}_n), b(\widehat{\mathbf{g}}_n)\}}$, where $a(\widehat{\mathbf{g}}_n) = \frac{1}{N_c} \sum_{n'=1}^{N_c} \widehat{\delta}(\widehat{\mathbf{g}}_n, \widehat{\mathbf{g}}_{n'})$ is the average distance from $\widehat{\mathbf{g}}_n$ to the other samples within the same identity class and $b(\widehat{\mathbf{g}}_n) = \min_{1 \leq c' \leq C_E,\, c' \neq c} \frac{1}{N_{c'}} \sum_{n'=1}^{N_{c'}} \widehat{\delta}(\widehat{\mathbf{g}}_n, \widehat{\mathbf{g}}_{n'})$ is the average distance of $\widehat{\mathbf{g}}_n$ to the samples in the closest other class. It is clear that $-1 \leq \mathrm{SC} \leq 1$, and an SC close to one means that classes are appropriately separated.

∙ area under the Receiver Operating Characteristic curve (ROC)

∙ area under the Precision-Recall curve (PR)

The system can potentially employ various clustering mechanisms that need a high degree of accuracy. Consider a clustering algorithm that returns the clusters $\mathcal{C} = \{\mathcal{C}_{c'}\}_{c'=1}^{C}$ approximating the real identity classes $\mathcal{I} = \{\mathcal{I}_c\}_{c=1}^{C}$. The following metrics evaluate a clustering algorithm together with a particular feature extraction method, which gives an insight into the clusterability of the corresponding feature space:

∙ Purity: $\mathrm{P} = \frac{1}{N_E} \sum_{\mathcal{C}_{c'} \in \mathcal{C}} \max_{\mathcal{I}_c \in \mathcal{I}} |\mathcal{C}_{c'} \cap \mathcal{I}_c|$

∙ Rand Index: $\mathrm{RI} = \frac{TP + TN}{TP + FP + FN + TN}$

∙ F-measure: $\mathrm{F} = \frac{2pr}{p + r}$, where $p = \frac{TP}{TP + FP}$ and $r = \frac{TP}{TP + FN}$

∙ Jaccard Index: $\mathrm{JI} = \frac{TP}{TP + FP + FN}$

∙ Fowlkes-Mallows Index: $\mathrm{FMI} = \sqrt{\frac{TP}{TP + FP} \cdot \frac{TP}{TP + FN}}$

where TP is the number of true positives, TN of true negatives, FP of false positives, and FN of false negatives. A pair of templates of the same identity falling in the same cluster is a true positive, of different identities in different clusters a true negative, of different identities in the same cluster a false positive, and of the same identity in different clusters a false negative.
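The clustering metrics follow directly from these definitions; a minimal sketch iterating over all template pairs is shown below (quadratic in the number of templates, which is adequate at the dataset sizes considered here). DBI and SC can be computed analogously from their formulas, or taken from a library such as scikit-learn.

```python
import numpy as np
from itertools import combinations

def clustering_metrics(ids, clusters):
    """Purity and the four pair-counting indexes of a clustering.

    ids      -- (N,) ground-truth identity label of each template
    clusters -- (N,) cluster assigned to each template
    """
    ids, clusters = np.asarray(ids), np.asarray(clusters)
    TP = TN = FP = FN = 0
    for i, j in combinations(range(len(ids)), 2):
        same_id, same_cl = ids[i] == ids[j], clusters[i] == clusters[j]
        TP += same_id and same_cl          # same identity, same cluster
        TN += not same_id and not same_cl  # different in both
        FP += not same_id and same_cl
        FN += same_id and not same_cl
    p, r = TP / (TP + FP), TP / (TP + FN)
    # Purity: each cluster votes for its best-matching identity class.
    purity = sum(max(np.sum((clusters == c) & (ids == t))
                     for t in np.unique(ids))
                 for c in np.unique(clusters)) / len(ids)
    return {'P': purity,
            'RI': (TP + TN) / (TP + TN + FP + FN),
            'F': 2 * p * r / (p + r),
            'JI': TP / (TP + FP + FN),
            'FMI': np.sqrt(p * r)}
```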
Discriminativeness

Designing a surveillance system with uncooperative subjects, one should also consider the availability of an auxiliary learning database that is large enough to be used for learning the model. In the following series of experiments we measure the four evaluation metrics on a sequence of configurations with an increasing number of learning identities, the rest of the database serving as the evaluation part. Given that the benchmark dataset contains 64 identities in total, these configurations range from (2, 62) to (32, 32). Technically, it is possible to continue up until (62, 2), but configurations with more learning than evaluation identities are of little interest. Each configuration $(C_L, 64 - C_L)$ is constructed from the previous configuration $(C_L - 1, 64 - C_L + 1)$ by picking one identity in the evaluation part at random and moving it into the learning part.

Observing Figure 3, the discriminativeness of the approaches based on statistical learning (MMC and PCA+LDA) grows quickly on the first configurations with very few learning identities, which one can interpret as an analogy to the Pareto (80–20) principle. The results indicate that even 10 identities can be enough for learning the MMC transform to identify 54 other people more accurately than the other methods tested. Roughly speaking, the MMC method achieves the top results in all of the metrics at a configuration of about (10, 54) and keeps or increases its discriminativeness further on. This experiment provides a lower-bound estimate for the volume of learning data given the volume of the population under surveillance: with a learning database smaller than this lower bound, taking the non-learned geometric features of Dikovski et al. [14] or Kwolek et al. [24] is recommended; otherwise, the MMC method best identifies walkers within a population of more than triple the learning identities.

[Figure 3: Simulations with 31 different $(C_L, C_E)$ configurations (horizontal axes) on the four evaluation metrics (vertical axes): (a) DBI, (b) SC, (c) ROC, (d) PR.]

[Figure 4: Four evaluated metrics of the MMC method with incomplete data. In each column a subset of joints is systematically excluded from the input: the first 31 columns (root – head) exclude a single joint and the last 14 columns (pelvis – all but legs) exclude multiple joints. The structure of the human body is the following: head, pelvis = {root, lhipjoint, rhipjoint}, left leg = {lfemur, ltibia, lfoot, ltoes}, left arm = {lhumerus, lradius, lwrist, lhand, lfingers, lthumb}, right leg = {rfemur, rtibia, rfoot, rtoes}, right arm = {rhumerus, rradius, rwrist, rhand, rfingers, rthumb}, torso = {lowerback, upperback, thorax, lowerneck, upperneck, lclavicle, rclavicle}. Configuration (9, 55).]

[Figure 5: ROC and PR of all implemented methods with noisy data: (a) an x% noise simulated by multiplying each measured value by a random number in the interval $(1 - x/100, 1 + x/100)$; (b) an x% noise simulated by substituting each measured value with a random value with probability x%. Configuration (9, 55).]

Robustness

Video surveillance environments are sensitive to the accuracy of data acquisition. We conducted two series of experiments in which we assumed a (9, 55) configuration with fixed learning and evaluation identities on which we simulated measurement errors by (1) excluding some data and (2) adding random noise. Incomplete data are simulated by excluding various subsets of joints with all their tracked positions, and the score percentages of incomplete data (with subscript new) relative to complete data (with subscript old) were calculated for DBI as $100 \cdot \mathrm{DBI}_{\mathrm{old}} / \mathrm{DBI}_{\mathrm{new}}$, for SC as $100 \cdot (\mathrm{SC}_{\mathrm{new}} + 1) / (\mathrm{SC}_{\mathrm{old}} + 1)$, for ROC as $100 \cdot \mathrm{ROC}_{\mathrm{new}} / \mathrm{ROC}_{\mathrm{old}}$, and for PR as $100 \cdot \mathrm{PR}_{\mathrm{new}} / \mathrm{PR}_{\mathrm{old}}$. A noise of x% was simulated in two ways: (2a) by multiplying each measured value by a random number from the interval $(1 - x/100, 1 + x/100)$ and (2b) by substituting each measured value, with probability x%, with a random value between 0 and 1, in which case 100% noise represents completely random data. All methods were evaluated on noisy data, but only the MMC method was evaluated on incomplete data, as the other methods do not give instructions for dealing with incomplete data.

Figures 4 and 5 illustrate how the methods cope with incomplete and noisy data, respectively. As for data incompleteness, one can observe that particular joints are more discriminatory than others, as their exclusion causes a more significant drop in scores. This information can potentially be used for fine-tuning the hand-picked geometric features. Furthermore, input data pruning has a positive impact on the duration of model learning. Regarding the noisy data, note that the first type of noise is less shrouding than the second type, as the measured values change at most by a factor of two and preserve their means and covariances. The results make it clear that the scores of the geometric-feature methods drop more quickly than those of the MMC and PCA+LDA methods.
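For reproducibility, here is a sketch of the two noise models as we read them from the description above; the array G is assumed to hold the normalized joint coordinates of one sample.

```python
import numpy as np

def multiplicative_noise(G, x, rng=None):
    """Noise model (2a): scale every measured value by a random
    factor drawn uniformly from (1 - x/100, 1 + x/100)."""
    rng = rng or np.random.default_rng()
    return G * rng.uniform(1 - x / 100, 1 + x / 100, size=G.shape)

def substitution_noise(G, x, rng=None):
    """Noise model (2b): replace every measured value, independently
    with probability x/100, by a random value in (0, 1); at x = 100
    the output is completely random data."""
    rng = rng or np.random.default_rng()
    mask = rng.random(G.shape) < x / 100
    return np.where(mask, rng.random(G.shape), G)
```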
Clusterability

Various pre-clustering techniques can be applied to the feature space in order to improve the accuracy of location trace retrieval. A perfectly accurate location trace would be the cluster of gait templates of, and only of, the query walker, although this is nearly impossible to achieve on large datasets. The last experiment again takes the (9, 55) configuration and measures the purity and the other four indexes of the clustering obtained by K-Means into $K = C_E = 55$ clusters. The results in Table 1 show that the geometric features of Andersson and Araujo [4], Dikovski et al. [14] and Kwolek et al. [24], and the latent features learned by MMC and PCA+LDA, define the most suitable sub-spaces for K-Means clustering.

Table 1: Methods evaluated on the five clusterability metrics. Configuration (9, 55).

method     P       RI      F       JI      FMI
Ahmed      0.3402  0.9486  0.1388  0.0746  0.1418
Ali        0.2677  0.9472  0.0986  0.0519  0.1013
Andersson  0.3574  0.9526  0.2600  0.1494  0.2620
Ball       0.3409  0.9491  0.1581  0.0859  0.1611
Dikovski   0.4446  0.9542  0.2583  0.1483  0.2619
Kwolek     0.4571  0.9496  0.2319  0.1311  0.2329
Preis      0.1778  0.9464  0.0462  0.0237  0.0482
Sinha      0.3069  0.9473  0.1143  0.0606  0.1169
MMC        0.4491  0.9538  0.2147  0.1203  0.2202
PCA+LDA    0.4370  0.9538  0.2240  0.1261  0.2291
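Scores of this kind can be obtained along the following lines; the sketch assumes the clustering_metrics helper from the earlier sketch and uses scikit-learn's K-Means, whereas the exact clustering setup (initialization, number of restarts) in our experiments may differ.

```python
from sklearn.cluster import KMeans

def clusterability(templates, ids, n_clusters=55, seed=0):
    """Cluster a method's feature space with K-Means (K = C_E) and
    score the result with the five clusterability metrics.

    templates -- (N_E, d) evaluation templates in the feature space
    ids       -- (N_E,) their ground-truth identities
    """
    pred = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(templates)
    return clustering_metrics(ids, pred)  # from the earlier sketch
```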
4. Conclusion

We present a plausible high-level workflow of a MoCap gait identification system in a video surveillance environment with uncooperative walkers. The main focus is on the evaluation of whether Phase III of the workflow (extracting gait features) can meet the requirements of Phase IV of the workflow (a target application for identifying walkers). Eight methods for extracting geometric gait features and two methods for statistically learning the features have been implemented and evaluated on the CMU MoCap database of 64 subjects and 5,923 gait samples. The feature space of each method is evaluated for (1) discriminativeness, (2) robustness to incomplete and noisy data, and (3) clusterability. With the evaluation set of 55 identities, the MMC method learned on 9 identities achieves the top results in discriminativeness and improves with an increasing number of learning identities; together with the PCA+LDA method, it is highly robust to noisy and incomplete data; and it results in a rather pure clustering. The MMC method appears to learn the highest-quality features, reaching the state of the art in MoCap gait identification.

Our suggested concept of person identification is based on a completely different perspective from previous approaches: according to the phrase You are how you walk, a walker's identity is represented as their location trace, that is, a cluster of gait templates from their incidents within the surveillance system. This concept allows for data-driven models that (1) can be learned from different people as well as from different scenes and datasets, making them more generally applicable with limited data per person in a learning set, even if only a single gait sample is available for each person, and (2) do not assume that the learning and evaluation sets have the same covariate conditions, as long as the auxiliary learning set is rich enough in them to identify as many people as possible. This makes it particularly suitable for applications of uncooperative recognition, such as walker re-identification [26, 35] or next location prediction [27, 28].

Acknowledgments

Data used in this project was created with funding from NSF EIA-0196217 and was obtained from http://mocap.cs.cmu.edu [13]. Our extracted database and evaluation framework are available online at https://gait.fi.muni.cz to support reproducibility of results.

References

[1] F. Ahmed, P. P. Paul, and M. L. Gavrilova. DTW-Based Kernel and Rank-Level Fusion for 3D Gait Recognition Using Kinect. The Visual Computer, 31(6):915–924, 2015.
[2] M. Ahmed, N. Al-Jawad, and A. Sabir. Gait Recognition Based on Kinect Sensor. Proc. SPIE, 9139:91390B–91390B-10, 2014.
[3] S. Ali, Z. Wu, X. Li, N. Saeed, D. Wang, and M. Zhou. Transactions on Computational Science XXVI: Special Issue on Cyberworlds and Cybersecurity, chapter Applying Geometric Function on Sensors 3D Gait Data for Human Identification, pages 125–141. Springer, Berlin, Heidelberg, 2016.
[4] V. O. Andersson and R. M. Araujo.
Person Identification Using Anthropometric and Gait Data from Kinect Sensor. In Proc. of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), pages 425–431. AAAI Press, 2015.
[5] E. Auvinet, F. Multon, C.-E. Aubin, J. Meunier, and M. Raison. Detection of gait cycles in treadmill walking using a Kinect. Gait & Posture, 41(2):722–725, 2015.
[6] M. Balazia and P. Sojka. Learning Robust Features for Gait Recognition by Maximum Margin Criterion. In Proc. of 23rd International Conference on Pattern Recognition, ICPR 2016, pages 901–906. IEEE, 2016.
[7] M. Balazia and P. Sojka. Walker-independent features for gait recognition from motion capture data. In A. Robles-Kelly, M. Loog, B. Biggio, F. Escolano, and R. Wilson, editors, Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshop, S+SSPR 2016, Mérida, Mexico, November 29–December 2, 2016, Proceedings, pages 310–321, Cham, 2016. Springer International Publishing.
[8] M. Balazia and P. Sojka. An evaluation framework and database for MoCap-based gait recognition methods. In B. Kerautret, M. Colom, and P. Monasse, editors, Reproducible Research in Pattern Recognition: First International Workshop, RRPR 2016, Cancún, Mexico, December 4, 2016, Revised Selected Papers, pages 33–47, Cham, 2017. Springer International Publishing.
[9] M. Balazia and P. Sojka. Gait recognition from motion capture data, 2017. https://gait.fi.muni.cz.
[10] A. Ball, D. Rye, F. Ramos, and M. Velonaki. Unsupervised clustering of people from 'skeleton' data. In Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI '12, pages 225–226, New York, NY, USA, 2012. ACM.
[11] X. Chen and J. Xu. Uncooperative gait recognition: Re-ranking based on sparse coding and multi-view hypergraph learning. Pattern Recognition, 53:116–129, 2016.
[12] W. Choensawat, W. Choi, and K. Hachimura. Similarity retrieval of motion capture data based on derivative features. Advanced Computational Intelligence and Intelligent Informatics, 16(1):13–23, 2012.
[13] CMU Graphics Lab. Carnegie-Mellon Motion Capture (MoCap) Database, 2003. http://mocap.cs.cmu.edu.
[14] B. Dikovski, G. Madjarov, and D. Gjorgjevikj. Evaluation of Different Feature Sets for Gait Recognition Using Skeletal Data from Kinect. In 37th Intl. Convention on Information and Communication Technology, Electronics and Microelectronics, pages 1304–1308, May 2014.
[15] M. Ding and G. Fan. Articulated and generalized Gaussian kernel correlation for human pose estimation. IEEE Transactions on Image Processing, 25(2):776–789, Feb. 2016.
[16] R. A. Fisher. The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2):179–188, 1936.
[17] S. Han, M. Achar, S. Lee, and F. Peña-Mora. Empirical assessment of a RGB-D sensor on motion capture and action recognition for construction worker monitoring. Visualization in Engineering, 1(1):6, 2013.
[18] A. Haque, B. Peng, Z. Luo, A. Alahi, S. Yeung, and F. Li. Viewpoint invariant 3D human pose estimation with recurrent error feedback. CoRR, abs/1603.07076, 2016.
[19] M. C. Hu, C. W. Chen, W. H. Cheng, C. H. Chang, J. H. Lai, and J. L. Wu. Real-Time Human Movement Retrieval and Assessment With Kinect Sensor. IEEE Transactions on Cybernetics, 45(4):742–753, Apr. 2015.
[20] S. Jiang, Y. Wang, Y. Zhang, and J. Sun. Real Time Gait Recognition System Based on Kinect Skeleton Feature. In C. Jawahar and S. Shan, editors, Computer Vision – ACCV 2014 Workshops, volume 9008 of LNCS, pages 46–57.
Springer, 2015.
[21] I. Kapsouras and N. Nikolaidis. Action recognition in motion capture data using a bag of postures approach. In Pattern Recognition (ICPR), 2014 22nd International Conference on, pages 2649–2654, Aug. 2014.
[22] T. Krzeszowski, A. Switonski, B. Kwolek, H. Josinski, and K. Wojciechowski. DTW-Based Gait Recognition from Recovered 3-D Joint Angles and Inter-ankle Distance. In L. J. Chmielewski, R. Kozera, B.-S. Shin, and K. Wojciechowski, editors, Proc. of Computer Vision and Graphics: International Conference, ICCVG 2014, Warsaw, Poland, volume 8671 of LNCS, pages 356–363. Springer, Sept. 2014.
[23] M. S. N. Kumar and R. V. Babu. Human gait recognition using depth camera: A covariance based approach. In Proc. of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP '12, pages 20:1–20:6, New York, NY, USA, 2012. ACM.
[24] B. Kwolek, T. Krzeszowski, A. Michalczuk, and H. Josinski. 3D Gait Recognition Using Spatio-Temporal Motion Descriptors. In Proc. of Intelligent Information and Database Systems: 6th Asian Conference, ACIIDS 2014, Bangkok, Thailand, Part II, volume 8398 of LNCS, pages 595–604. Springer, Apr. 2014.
[25] D. Leightley, B. Li, J. S. McPhee, M. H. Yap, and J. Darby. Exemplar-Based Human Action Recognition with Template Matching from a Stream of Motion Capture, pages 12–20. Springer International Publishing, Cham, 2014.
[26] X. Ma, X. Zhu, S. Gong, X. Xie, J. Hu, K.-M. Lam, and Y. Zhong. Person re-identification by unsupervised video matching. Pattern Recognition, 65(C):197–210, May 2017.
[27] U. Mahbub and R. Chellappa. PATH: person authentication using trace histories. CoRR, abs/1610.07935, 2016.
[28] U. Mahbub, S. Sarkar, V. M. Patel, and R. Chellappa. Active user authentication for smartphones: A challenge data set and benchmark results. In 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), pages 1–8, Sept. 2016.
[29] R. Martín-Félez and T. Xiang. Uncooperative gait recognition by learning to rank. Pattern Recognition, 47(12):3793–3806, 2014.
[30] J. Preis, M. Kessel, M. Werner, and C. Linnhoff-Popien. Gait Recognition with Kinect. In 1st International Workshop on Kinect in Pervasive Computing, New Castle, UK, June 18–22, pages 1–4, 2012.
[31] J. Sedmidubsky, J. Valcik, M. Balazia, and P. Zezula. Gait Recognition Based on Normalized Walk Cycles. In Advances in Visual Computing, volume 7432 of LNCS, pages 11–20. Springer, 2012.
[32] A. Sinha, K. Chakravarty, and B. Bhowmick. Person Identification Using Skeleton Information from Kinect. In ACHI 2013: Proc. of the Sixth Intl. Conf. on Advances in CHI, pages 101–108, 2013.
[33] J. Valcik, J. Sedmidubsky, M. Balazia, and P. Zezula. Identifying walk cycles for human recognition. In M. Chau, A. G. Wang, W. T. Yue, and H. Chen, editors, Intelligence and Security Informatics: Pacific Asia Workshop, PAISI 2012, Kuala Lumpur, Malaysia, May 29, 2012, Proceedings, pages 127–135, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
[34] S. Vantigodi and V. B. Radhakrishnan. Action recognition from motion capture data using meta-cognitive RBF network classifier. In Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2014 IEEE Ninth International Conference on, pages 1–6, Apr. 2014.
[35] L. Zheng, Y. Yang, and A. G. Hauptmann. Person re-identification: Past, present and future. arXiv preprint arXiv:1610.02984, 2016.