Abstract

This paper presents a human gait recognition algorithm based on leg gesture separation. The main innovations of this paper are gait recognition using leg gesture classification, which is invariant to covariate conditions during the walking sequence and focuses only on lower-body motion, and a neuro-fuzzy combiner classifier (NFCC), which yields a high-precision recognition system. Finally, the performance of the proposed algorithm is validated on the HumanID Gait Challenge data set (HGCD), the largest gait benchmarking data set, with 122 subjects and different realistic parameters including viewpoint, shoe, surface, carrying condition, and time. The algorithm is also compared with a recent gait recognition algorithm.

1. Introduction

In the last decade, there has been great interest in applying human biometrics for identification and verification purposes, for instance, in video surveillance and human recognition. Among these, there has been much research on ear and face recognition, body tracking, hand gesture recognition, and, more recently, gait recognition for human identification. Compared with other biometrics, such as hand geometry, iris, face, voice, signature, and fingerprint [1], human gait has some notable advantages that make gait recognition well suited to identification. For instance, gait recognition does not require subject cooperation, and it can operate without interrupting or interfering with the subject's activities [2]. Moreover, people can be recognized by their gait regardless of their clothes or the background [3]. Gait is also difficult to conceal or disguise in scenarios such as bank robberies, where other biometrics such as face or fingerprint recognition may be impossible to capture. Finally, gait recognition is unobtrusive and effective for identification at long distances, as in surveillance of public places [4].

Previous works have performed classification under similar covariate conditions (e.g., clothing, surface, carrying, etc.). In this paper, we propose a novel, improved classification method that is based only on the different gestures of the legs during walking, requires no body-part tracking, and is invariant to different covariate conditions.

As Figure 1 shows, a careful look at the sequences of sample energy halation images suggests that the upper body changes negligibly during the walking cadence and that added objects (e.g., carrying a bag, wearing a coat) have little effect on the gait. Therefore, to obtain a higher recognition rate, we can focus only on leg gestures and derive the recognition accurately from their classification.

To review the fundamentals of gait: during walking, the functional versatility of the body joints allows the lower and upper limbs to readily accommodate stairs, doorways, changing surfaces, and obstacles in the path of progression. Efficiency in these endeavors depends on free joint mobility and on muscle activity that is selective in timing and intensity. Energy conservation is optimal in the normal pattern of limb action. A person performs his or her walking pattern in a fairly repeatable and unique way, and medical research has tried to use these gait patterns in the treatment of patients with pathologically abnormal gait [4].

The approach of this paper can be briefly summarized as follows.

Five states of human gait are extracted after background estimation and human detection in the scene. Leg gestures are classified using the directional chain code of the bottom part of the silhouette contour. A spatio-temporal database, namely, the Energy Halation Image (EHI), is constructed from the bottom part of the human silhouette in the training video sequences, separately for each of the five leg gestures. The energy halation images, projected into an eigenspace, are applied to multilayer perceptron neural networks. The five resulting neural networks recognize people, but with only a medium recognition rate. A neuro-fuzzy fusion technique is then used to obtain a high recognition rate. Experiments are performed on a suitable database that includes 20 samples for each of eight people, each sample having approximately 100 frames. The proposed system obtains a 99% recognition rate over 10 test samples.

1.1. Recent Works

Leg gesture studies have various applications, and several works indicate the importance of leg gesture classification [5–7]. In [8], matching between stored prototypes and silhouette images is used for state classification. The approach of [8] is based on pattern matching and on state recognition using a hidden Markov model, which helps to incorporate prior knowledge of gait into state recognition.

In [9], infrared thermal imaging was applied to collect gait video, and an infrared thermal gait database was established. Infrared imaging is useful for detecting the human body and removing noise caused by complex backgrounds and illumination variations. Reference [10] shows that applying Principal Component Analysis (PCA) to accelerometer-based gait data greatly improves the performance of a gait recognition system.

Reference [2] argues that selecting the most relevant gait features, those invariant to changes in gait covariate conditions, is the key to developing a gait recognition system that works without subject cooperation. To this end, [2] proposes the Gait Entropy Image to perform automatic feature selection on each pair of gallery and probe gait sequences. Gait recognition performance also decreases for low-resolution (LR) sequences. Reference [11] addresses this problem with a new algorithm called superresolution with manifold sampling and backprojection, which learns the high-resolution (HR) counterparts of LR test images from a collection of HR/LR training gait image patch pairs.

Reference [12] presents a novel framework for gait recognition augmented with soft biometric information. Geometric gait analysis is based on Radon transforms and on gait energy images. User height and stride length information is extracted and utilized in a probabilistic framework for the detection of soft biometric features of substantial discrimination power.

In [13], the fusion of multiple gait features was explored within the framework of the factorial hidden Markov model (FHMM). The FHMM has a multiple-layer structure and provides a way to combine several gait features without concatenating them into a single augmented feature. In addition, direct feature concatenation was evaluated, and the parallel HMM (PHMM) was introduced as a decision-level fusion scheme that employs traditional fusion rules to combine the recognition results.

Tactile ground surface indicators installed on sidewalks help visually impaired people walk safely. Visually impaired people distinguish the indicators by stepping on their convexities and following them. However, these indicators sometimes cause people without visual impairment to stumble. The effects of these indicators were studied in [5] by comparing kinematic and kinetic variables of walking on paths with and without indicators.

Another application of gait analysis is detecting gait degeneration due to ageing, which may be closely linked to the causes of falls; this would help in taking appropriate measures to prevent falls. As in many other developed countries, falls in the older population have been identified as a major health issue in Australia [6]. Ageing influences gait patterns, posing a constant threat to locomotor balance control, and in [7] young and old gait types were automatically recognized from their respective gait patterns using a support vector machine.

Biomechanical analysis of gait has been successfully applied in human clinical gait analysis [14]. With regards to gait recognition, a major early result from psychology is by Johansson [15], who used point light displays to demonstrate the ability of humans to rapidly distinguish human locomotion from other motion patterns. Cutting and Kozlowski [16] showed that this ability also extends to recognition of friends.

Identifying people by analyzing gait patterns extracted from video has recently become a popular research problem. However, the conditions under which the problem is “solvable” are not well understood or characterized [17]. The biggest limitation in human motion analysis is the underlying difficulty of tracking the human body for subsequent interpretation [18, 19].

To make it possible to identify human gait from a sequence of segmented, noisy silhouettes in low-resolution video, [1] presents a model-based gait cycle extraction method based on a prediction-based hierarchical active shape model (ASM). Moreover, [20] presents a new gait recognition method that does not require strict laboratory conditions for its operation.

As mentioned, gait recognition is an effective way to identify people from a distance, but there are two obstacles in this situation. First, in the low-resolution case, the performance of gait recognition is degraded by noisy images. Second, in the usual gait recognition procedure, the gait sequences are projected onto a nonoptimal low-dimensional subspace to reduce data complexity, which again degrades recognition performance. To address these problems, [11] proposes an algorithm called superresolution with manifold sampling and backprojection (SRMS), which learns the high-resolution (HR) counterparts of LR test images from a collection of HR/LR training gait image patch pairs.

1.2. Contributions and Motivation

Recognizing gait by decomposing the body into parts and fusing the results has not been observed in the literature. The main contribution of this paper is gesture classification for human gait recognition, but the paper also makes the following contributions:
(a) a new spatio-temporal database, namely, the energy halation image;
(b) five-feature-space generation using the leg gesture concept;
(c) human gait recognition based on leg gesture classification;
(d) neuro-fuzzy-based combiner classifiers (NFCCs);
(e) a complete system for gait recognition.

The low performance of human gait recognition systems is one motivation for the proposed method. Human detection in the scene, object tracking, and the capability of classifiers on time-dependent features are some of the problems that lead to low recognition rates. We therefore present a complete human gait recognition system that combines many features.

2. The Proposed Method

The block diagram of the proposed method is shown in Figure 2. The system has five parts, explained in the following subsections:
(i) background estimation,
(ii) leg gesture recognizer,
(iii) energy halation image construction (spatio-temporal database),
(iv) gait recognition in eigenspace,
(v) neuro-fuzzy-based combiner classifier.

2.1. Background Estimation

Several approaches are known for separating the foreground from the background. If the background is known, simple thresholding yields the foreground, so background estimation is a suitable route to object detection. This paper estimates the probability density function (PDF) of each pixel [21]. A Gaussian PDF approximately models scene variations caused by flicker, CCD noise, and shadows. The mean and variance of the Gaussian PDF are updated with (1) and (2), which lets the model adapt to scene variations. Results of human detection in the scene are shown in Figure 3.

$$\mu_t(x,y) = (1-\alpha)\,\mu_{t-1}(x,y) + \alpha\, I_t(x,y), \quad (1)$$

$$\sigma_t^2(x,y) = (1-\alpha)\,\sigma_{t-1}^2(x,y) + \alpha\,\big(I_t(x,y)-\mu_t(x,y)\big)^{T}\big(I_t(x,y)-\mu_t(x,y)\big), \quad (2)$$

where $I_t(x,y)$ is the pixel's current value at location $(x,y)$, $\mu_{t-1}$ is the previous mean, $\sigma_{t-1}^2$ is the previous variance, $T$ denotes transpose, and $\alpha$ is an empirical weight, often chosen as a tradeoff between stability and quick update. At each frame time $t$, the pixel value $I_t$ is classified as foreground if the inequality

$$\lvert I_t - \mu_t \rvert > k\,\sigma_t \quad (3)$$

holds, where $k$ is a threshold value.
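The per-pixel update in (1) to (3) can be sketched in Python with NumPy as follows. This is a minimal sketch, assuming grayscale frames (so the transpose product in (2) reduces to a squared difference) and illustrative default values for $\alpha$ and $k$; the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def update_background(mu, var, frame, alpha=0.05):
    """Update the per-pixel Gaussian model with one frame, following (1) and (2).

    Grayscale frames are assumed, so the transpose product in (2) reduces
    to a squared difference.
    """
    frame = frame.astype(np.float64)
    mu_new = (1.0 - alpha) * mu + alpha * frame                    # equation (1)
    var_new = (1.0 - alpha) * var + alpha * (frame - mu_new) ** 2  # equation (2)
    return mu_new, var_new

def foreground_mask(mu, var, frame, k=2.5):
    """Inequality (3): a pixel is foreground when |I_t - mu_t| > k * sigma_t."""
    return np.abs(frame.astype(np.float64) - mu) > k * np.sqrt(var)
```

In use, $\mu$ could be initialized from the first frame and $\sigma^2$ from a small constant; these initializations, like the values of $\alpha$ and $k$, are assumptions, since the paper does not report them.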

2.2. Leg Gesture Recognizer

After background estimation and human detection in the scene, a binary human image (blob) is obtained. After cropping the bottom part of the blob (waist to sole), the distribution function of the directional chain code is extracted from the blob contour. After normalizing this chain code distribution by its maximum, a multilayer perceptron neural network (MLP-NN) uses it as the feature for leg gesture recognition. The block diagram of the leg gesture classifier is shown in Figure 3. To train the proposed artificial neural network, we use five states of five people (five images per person and state). The images in this database are named with three digits as follows:
(i) the first digit denotes the state of the person (1–5);
(ii) the second digit denotes the person (1–5);
(iii) the third digit denotes the image number for that person (1–5).
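The chain-code feature can be sketched as below. This sketch assumes the lower-body contour has already been extracted as an ordered, closed, 8-connected list of (x, y) points, and it assumes the common Freeman numbering (code 0 points along +x and codes increase counterclockwise with y pointing up); the paper does not specify its direction numbering, and the function names are hypothetical. The 8-bin output matches the eight input neurons of the leg gesture MLP described in Section 3.

```python
import numpy as np

# Freeman 8-direction codes, assuming x grows to the right and y grows upward.
FREEMAN = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
           (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(contour):
    """Chain code of a closed, 8-connected contour given as ordered (x, y) points."""
    codes = []
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the contour
        codes.append(FREEMAN[(x1 - x0, y1 - y0)])
    return codes

def chain_code_feature(contour):
    """8-bin direction histogram normalized to its maximum (the MLP input)."""
    hist = np.bincount(chain_code(contour), minlength=8).astype(np.float64)
    return hist / hist.max()
```

For a square traced counterclockwise, only the four axis-aligned codes (0, 2, 4, 6) occur, each with the same count, so the feature is 1 in those bins and 0 elsewhere.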

This gives 125 named images in the training database. Moreover, we considered five different angles in the video sequence of samples for each state, as in Figure 4. By considering these five angle states (Figure 5), our program obtains the angles for each of the five gait states, as shown in Figure 6.
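The three-digit naming scheme can be enumerated directly, which confirms the count of 5 × 5 × 5 = 125 training images. The variable and helper names below are hypothetical illustrations of the convention, not part of the paper's program.

```python
from itertools import product

# Hypothetical names: <state><person><image>, each digit in 1..5.
names = [f"{s}{p}{k}" for s, p, k in product(range(1, 6), repeat=3)]

def parse_name(name):
    """Split a three-digit image name into (leg state, person, image number)."""
    state, person, image = (int(ch) for ch in name)
    return state, person, image
```

For example, `parse_name("342")` returns `(3, 4, 2)`: the second image of person 4 in leg state 3.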

One part of the leg gesture classifier is the gesture database, which is needed to train the MLP-NN with the backpropagation algorithm. Five leg gesture states are defined; the appropriate number of states depends on the frame rate and the type of application. Figure 4 shows these five states for several people. The gesture database is collected from a set of videos that includes 160 sequences of eight people. The manually labeled gesture database includes the five leg states, with 100 images collected for each state. The extracted directional chain code distribution is shown in Figure 7(a), and Figure 7(b) shows the directional chain code histograms for the different states.

The trained neural network cannot classify leg gestures perfectly, but this problem is compensated for during construction of the spatio-temporal database and by the combiner classifier.

2.3. Energy Halation Image Construction (Spatio-temporal Data Base)

Spatio-temporal databases are used for compact representation of video sequences and in many applications, such as image retrieval, gesture analysis, action recognition, and behavior recognition in the scene.

In this subsection, we propose a spatio-temporal representation similar to the motion history image (MHI) of [22]; its pseudocode is given below, and its results are named energy halation images (EHI).

Each input frame belongs to one of the five leg gestures and is used to generate the five energy halation images:
(1) Initialize: set EH_i, i = 1, 2, …, 5, to zero matrices of size 220 × 90, and set the frame index j = 0.
(2) j = j + 1.
(3) I_i ← blob matrix of the jth frame, of size x × y, where i is the leg state (1 to 5). Note: (x, y) is smaller than (220, 90) for every blob.
(4) Pad I(x, y) with zero rows and columns on both sides so that it becomes a 220 × 90 matrix.
(5) EH_i = EH_i + I_i, where i is the leg gesture state.
(6) If the end of the sequence has not been reached, go to step 2.
(7) End.
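Steps (1) to (7) can be sketched in Python as follows. This is an illustrative sketch, assuming each frame arrives as a binary lower-body blob paired with its leg state (1 to 5) from the gesture recognizer, and assuming that padding "on both sides" means centering the blob (when a margin is odd, the extra zero row or column goes to the bottom or right). The function names are hypothetical.

```python
import numpy as np

EH_SHAPE = (220, 90)  # energy halation image size from step (1)
N_STATES = 5          # number of leg gesture states

def pad_to(blob, shape=EH_SHAPE):
    """Step (4): zero-pad a blob on both sides so it is centered in `shape`."""
    rows, cols = blob.shape
    if rows > shape[0] or cols > shape[1]:
        raise ValueError("blob larger than the energy halation frame")
    top = (shape[0] - rows) // 2
    left = (shape[1] - cols) // 2
    out = np.zeros(shape, dtype=np.float64)
    out[top:top + rows, left:left + cols] = blob
    return out

def energy_halation(frames):
    """Accumulate one energy halation image per leg state.

    frames: iterable of (blob, state) pairs, with state in 1..5.
    Returns an array of shape (5, 220, 90).
    """
    eh = np.zeros((N_STATES,) + EH_SHAPE)  # step (1)
    for blob, state in frames:             # steps (2), (3), (6)
        eh[state - 1] += pad_to(blob)      # step (5): EH_i = EH_i + I_i
    return eh
```

Because every frame adds its padded blob to the image of its own state, pixels covered by the legs in many frames of that state accumulate large values, which is what the halation images visualize.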

The result is five energy halation images for each input sequence. As an example, Figure 8 shows the five energy halation images for three people.

2.4. Gait Recognition in Eigen Space

As in face recognition and similar applications, we use an eigenspace transform to reduce the dimensionality of the energy halation images before applying them to the MLP neural networks. An MLP-NN is trained for each leg gesture, so five trained MLP-NNs are created and used for human identification; each network recognizes people separately, based on a different feature (the energy halation image of one leg gesture).
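The eigenspace step can be sketched as standard principal component analysis on flattened energy halation images, as in eigenfaces. The number of components is an assumption here: the 50 input neurons reported in Section 3 would suggest 50, but the paper does not state this link explicitly. The function names are hypothetical.

```python
import numpy as np

def fit_eigenspace(train_images, n_components):
    """PCA basis from training images of shape (n_samples, H, W).

    Returns the mean image (flattened) and a basis of shape (n_components, H*W).
    """
    X = train_images.reshape(len(train_images), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(images, mean, basis):
    """Eigenspace coordinates of images: the reduced input to one MLP-NN."""
    X = images.reshape(len(images), -1).astype(np.float64)
    return (X - mean) @ basis.T
```

One such eigenspace would be fitted per leg state, so each of the five networks sees coordinates in its own basis; using the thin SVD avoids forming the 19,800 × 19,800 covariance matrix of the 220 × 90 images.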

The recognition rate of each individual network is not sufficient for a good human gait recognizer, so we combine the neural network outputs using a neuro-fuzzy-based combiner classifier, described in the next subsection.

2.5. Neuro-Fuzzy-Based Combiner Classifier

Neuro-fuzzy systems have been shown to give significant results in modeling nonlinear functions. They have been used frequently in the literature, for example, in fishing prediction [23], vehicular navigation [24], identification of turbine speed dynamics [25], radio frequency power amplifier linearization [26], microwave applications [27], image denoising [28, 29], prediction in cleaning with high-pressure water [30], sensor calibration [31], extraction of the fetal electrocardiogram from ECG signals recorded from the mother [32], and identification of normal and glaucomatous eyes [33].

In a neuro-fuzzy system, the membership functions (MFs) are extracted from a data set that describes the system behavior. The neuro-fuzzy system learns features in the data set and adjusts the system parameters according to a given error criterion. In a fused architecture, NN learning algorithms are used to determine the parameters of the fuzzy inference system; the advantages of this technique are summarized in the Appendix. Fusion of classifier outputs with a linear combiner has been reported in [34]. In this paper, we use a nonlinear combiner classifier based on a neuro-fuzzy system, for the first time in human gait recognition.
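For contrast with the NFCC, a linear combiner of the kind mentioned in [34] can be sketched as a weighted average of the classifiers' output scores followed by an argmax. This is an illustrative baseline with equal weights by default, not the paper's nonlinear combiner; the function name and the example scores are hypothetical.

```python
import numpy as np

def linear_combiner(outputs, weights=None):
    """Fuse classifier scores with a weighted average, then pick the top class.

    outputs: array-like of shape (n_classifiers, n_samples, n_classes).
    weights: one weight per classifier (equal weights if None).
    Returns the predicted class index for each sample.
    """
    outputs = np.asarray(outputs, dtype=np.float64)
    if weights is None:
        weights = np.full(outputs.shape[0], 1.0 / outputs.shape[0])
    fused = np.tensordot(np.asarray(weights, dtype=np.float64), outputs, axes=1)
    return fused.argmax(axis=1)

# Two classifiers, two samples (true classes 0 and 1), two classes.
clf1 = [[0.45, 0.55], [0.10, 0.90]]  # wrong on sample 0, right on sample 1
clf2 = [[0.90, 0.10], [0.60, 0.40]]  # right on sample 0, wrong on sample 1
fused = linear_combiner([clf1, clf2])  # the averaged scores pick class 0, then class 1
```

Each classifier alone misclassifies one sample, but their average classifies both correctly, which is the complementarity that the combiner exploits; the NFCC replaces this fixed weighted average with a trained nonlinear mapping (see the Appendix).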

3. Experimental Results

A set of videos including 160 sequences of eight people is used as the database. The frame rate is 25 frames per second, and the image size is 352 × 288 pixels. Some images from the database are shown in Figure 9.

The leg gesture recognizer is a three-layer MLP neural network with eight input neurons, fifteen hidden neurons, and five output neurons, which categorizes input frames into the five states. An example of this stage is shown in Figure 10.
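A minimal NumPy sketch of such a three-layer MLP (8 inputs, 15 hidden units, 5 outputs) trained by backpropagation is shown below. The sigmoid hidden layer, softmax output with cross-entropy loss, weight initialization, learning rate, and epoch count are all assumptions, since the paper does not report its activation functions or training settings; labels are 0 to 4 here rather than 1 to 5.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mlp(X, y, n_hidden=15, n_classes=5, lr=1.0, epochs=2000, seed=0):
    """Train an input-hidden-output MLP with full-batch backpropagation.

    X: (n, 8) chain-code histograms normalized to their maximum.
    y: integer leg states 0..n_classes-1.
    Returns the parameters and the cross-entropy loss of each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    losses = []
    for _ in range(epochs):
        # Forward pass: sigmoid hidden layer, softmax output layer.
        H = sigmoid(X @ W1 + b1)
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.log(P[np.arange(n), y] + 1e-12)))
        # Backward pass: gradients of the mean cross-entropy loss.
        dZ = (P - Y) / n
        dW2, db2 = H.T @ dZ, dZ.sum(axis=0)
        dA1 = (dZ @ W2.T) * H * (1.0 - H)  # gradient at the hidden pre-activations
        dW1, db1 = X.T @ dA1, dA1.sum(axis=0)
        # Gradient-descent updates.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Most likely leg state (0..4) for each row of X."""
    W1, b1, W2, b2 = params
    return (sigmoid(X @ W1 + b1) @ W2 + b2).argmax(axis=1)
```

Full-batch gradient descent keeps the sketch short; with 125 training images per the naming scheme above, mini-batches would not be needed for speed.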

As mentioned before, the leg gestures are used to split each frame sequence into five energy halation images, and five MLP neural networks are trained over 10 film sequences for the 8 people. Each network has 50 neurons in the input layer, three hidden layers with 100, 90, and 40 neurons, and 8 neurons in the output layer. The confusion matrices obtained in the testing phase for two of the networks are shown in Tables 1 and 2. These tables show that fusing the networks can increase performance: for example, network 2 recognizes person 1, whereas network 1 does not recognize this person as well. The confusion matrix after applying the neuro-fuzzy combiner is shown in Table 3. The recognition rate on the test patterns increases to 99.8%, while the neuro-fuzzy system was trained only on the training patterns.
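Recognition rates can be read from confusion matrices such as those in Tables 1 to 3 as the diagonal mass divided by the total, as sketched below. The sketch assumes rows are true persons and columns are predicted persons; the example matrix is hypothetical and is not taken from the tables.

```python
import numpy as np

def recognition_rate(confusion):
    """Overall rate: correctly recognized samples (diagonal) over all samples."""
    c = np.asarray(confusion, dtype=np.float64)
    return float(np.trace(c) / c.sum())

def per_person_rate(confusion):
    """Rate for each person: diagonal entry over that person's row total."""
    c = np.asarray(confusion, dtype=np.float64)
    return np.diag(c) / c.sum(axis=1)

# Hypothetical 3-person confusion matrix (not from Tables 1-3).
example = [[8, 2, 0],
           [1, 9, 0],
           [0, 0, 10]]
```

Here `recognition_rate(example)` is 27/30 = 0.9, and `per_person_rate(example)` gives 0.8, 0.9, and 1.0; comparing per-person rates across networks is how complementary networks (one strong where another is weak) can be spotted.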

To evaluate the proposed method (gait recognition based on the NFCC), we also compared it with different algorithms evaluated on the HumanID Gait Challenge Dataset (HGCD), including a recent gait recognition algorithm (Table 4).

4. Conclusion

This paper proposed human gait recognition based on leg gesture classification. It also introduced a new spatio-temporal gait database (the Energy Halation Image) and a neuro-fuzzy-based combiner classifier (NFCC). To overcome the limited recognition rate, we proposed a system for gait feature fusion. We built five spatio-temporal databases and applied their eigenspace features to five neural networks separately. The performance of each NN on the test samples was low (about 70% to 80%). We then used a neuro-fuzzy combiner classifier to combine the neural networks, for the first time in gait recognition. Combining the neural network outputs in this way gave a satisfactory result.

Appendix

Neuro-Fuzzy Inference System Architecture

Neural networks (NNs) have been shown to have a powerful capability for expressing relationships between input and output variables. In fact, it is always possible to develop a structure that approximates a function with a given precision. However, there is still distrust of the identification capability of NNs in some applications. Fuzzy set theory plays an important role in dealing with uncertainty in plant modeling applications. Neuro-fuzzy systems are fuzzy systems that use NNs to determine their properties (fuzzy sets and fuzzy rules) by processing data samples. Neuro-fuzzy modeling synthesizes the merits of NNs and fuzzy systems in a complementary way to overcome their individual disadvantages: it combines the low-level learning and computational power of NNs with the high-level, human-like reasoning of fuzzy systems. For identification, the hybrid neuro-fuzzy system called ANFIS (adaptive neuro-fuzzy inference system) combines an NN and a fuzzy system. ANFIS has been shown to give significant results in modeling nonlinear functions. In ANFIS, the membership functions (MFs) are extracted from a data set that describes the system behavior. ANFIS learns features in the data set and adjusts the system parameters according to a given error criterion. In this fused architecture, NN learning algorithms are used to determine the parameters of the fuzzy inference system. The advantages of the ANFIS technique are summarized below:
(i) real-time processing of instantaneous system input and output data, which makes the technique usable for many operational research problems;
(ii) offline adaptation instead of online system-error minimization, which is easier to manage and involves no iterative algorithms;
(iii) system performance is not limited by the order of the function, since the function is not represented in polynomial form;
(iv) fast learning;
(v) flexible tuning of system performance, since the number of membership functions and training epochs can easily be altered;
(vi) the simple if-then rule declarations and the ANFIS structure are easy to understand and implement.

A typical ANFIS architecture is shown in Figure 11, in which a circle indicates a fixed node and a square indicates an adaptive node. For simplicity, we consider two inputs $x$, $y$ and one output $z$ in the fuzzy inference system (FIS). The ANFIS used in this paper implements a first-order Sugeno fuzzy model. Among the many FIS models, the Sugeno fuzzy model is the most widely used because of its high interpretability, computational efficiency, and built-in optimal and adaptive techniques. For a first-order Sugeno fuzzy model, a common rule set with two fuzzy if-then rules can be expressed as

$$\text{Rule 1: If } x \text{ is } A_1 \text{ and } y \text{ is } B_1, \text{ then } z_1 = p_1 x + q_1 y + r_1, \quad (\mathrm{A.1})$$

$$\text{Rule 2: If } x \text{ is } A_2 \text{ and } y \text{ is } B_2, \text{ then } z_2 = p_2 x + q_2 y + r_2, \quad (\mathrm{A.2})$$

where $A_i$ and $B_i$ $(i = 1, 2)$ are fuzzy sets in the antecedent, and $p_i$, $q_i$, $r_i$ $(i = 1, 2)$ are design parameters determined during the training process. As Figure 11 shows, the ANFIS consists of five layers.

Layer 1: every node $i$ in this layer is an adaptive node with node function

$$O_{1,i} = \mu_{A_i}(x),\ i = 1, 2; \qquad O_{1,i} = \mu_{B_{i-2}}(y),\ i = 3, 4, \quad (\mathrm{A.3})$$

where $x$ and $y$ are the inputs of node $i$, and $\mu_{A_i}(x)$ and $\mu_{B_i}(y)$ can adopt any fuzzy membership function (MF). In this paper, Gaussian MFs are used:

$$\mathrm{Gaussian}(x; c, \sigma) = e^{-\frac{1}{2}\left(\frac{x-c}{\sigma}\right)^2}, \quad (\mathrm{A.4})$$

where $c$ is the center of the Gaussian membership function and $\sigma$ is its standard deviation.

Layer 2: every node in this layer represents the firing strength of a rule, obtained by multiplying the incoming signals and forwarding the product:

$$O_{2,i} = \omega_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2. \quad (\mathrm{A.5})$$

Layer 3: the $i$th node in this layer calculates the ratio of the $i$th rule's firing strength to the sum of all rules' firing strengths:

$$O_{3,i} = \bar{\omega}_i = \frac{\omega_i}{\omega_1 + \omega_2}, \quad i = 1, 2, \quad (\mathrm{A.6})$$

where $\bar{\omega}_i$ is referred to as the normalized firing strength.

Layer 4: the node function in this layer is

$$O_{4,i} = \bar{\omega}_i z_i = \bar{\omega}_i\,(p_i x + q_i y + r_i), \quad i = 1, 2, \quad (\mathrm{A.7})$$

where $\bar{\omega}_i$ is the output of layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set. The parameters in this layer are referred to as the consequent parameters.

Layer 5: the single node in this layer computes the overall output as the sum of all incoming signals:

$$O_{5,1} = \sum_{i=1}^{2} \bar{\omega}_i z_i = \frac{\omega_1 z_1 + \omega_2 z_2}{\omega_1 + \omega_2}. \quad (\mathrm{A.8})$$
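Layers 1 to 5 can be traced numerically with the following sketch of a two-input, two-rule first-order Sugeno ANFIS forward pass. The tuple layout of the parameters and the function names are assumptions for illustration, and the example numbers are arbitrary. As (A.8) states, the output is a firing-strength-weighted average of the rule outputs.

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Gaussian membership function (A.4)."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a first-order Sugeno ANFIS for one input pair (x, y).

    premise: one (c_A, sigma_A, c_B, sigma_B) tuple per rule (hypothetical layout).
    consequent: one (p, q, r) tuple per rule.
    """
    # Layers 1 and 2: membership grades multiplied into firing strengths (A.3), (A.5).
    w = np.array([gaussian_mf(x, cA, sA) * gaussian_mf(y, cB, sB)
                  for cA, sA, cB, sB in premise])
    # Layer 3: normalized firing strengths (A.6).
    w_bar = w / w.sum()
    # Layer 4: first-order rule outputs z_i = p_i x + q_i y + r_i, weighted in (A.7).
    z = np.array([p * x + q * y + r for p, q, r in consequent])
    # Layer 5: sum of the weighted rule outputs (A.8).
    return float(np.sum(w_bar * z))

premise = [(0.0, 1.0, 0.0, 1.0), (1.0, 1.0, 1.0, 1.0)]
consequent = [(1.0, 0.0, 0.0), (0.0, 1.0, 2.0)]
out = anfis_forward(0.5, 0.5, premise, consequent)  # equal strengths: (0.5 + 2.5) / 2 = 1.5
```

At $x = y = 0.5$ the input is equidistant from both rule centers, so $\bar{\omega}_1 = \bar{\omega}_2 = 0.5$ and the output is the plain average of $z_1 = 0.5$ and $z_2 = 2.5$.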

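The ANFIS architecture shows that, when the values of the premise parameters are fixed, the overall output can be expressed as a linear combination of the consequent parameters:

$$z = (\bar{\omega}_1 x)\,p_1 + (\bar{\omega}_1 y)\,q_1 + \bar{\omega}_1 r_1 + (\bar{\omega}_2 x)\,p_2 + (\bar{\omega}_2 y)\,q_2 + \bar{\omega}_2 r_2. \quad (\mathrm{A.9})$$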

A hybrid learning algorithm combining the least squares method and the backpropagation (BP) algorithm can be used to identify these parameters. This algorithm converges much faster because it reduces the dimension of the search space of the BP algorithm. During learning, the premise parameters in layer 1 and the consequent parameters in layer 4 are tuned until the desired response of the FIS is achieved. The hybrid learning algorithm has two steps. First, with the premise parameters held fixed, the functional signals are propagated forward to layer 4, where the consequent parameters are identified by the least squares method. Second, with the consequent parameters held fixed, the error signals (the derivatives of the error measure with respect to each node output) are propagated from the output end back to the input end, and the premise parameters are updated by the standard BP algorithm.
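The first (least squares) step of the hybrid learning algorithm follows from (A.9): with the premise parameters fixed, each training sample contributes one row $[\bar{\omega}_1 x, \bar{\omega}_1 y, \bar{\omega}_1, \bar{\omega}_2 x, \bar{\omega}_2 y, \bar{\omega}_2]$ to a linear system in the consequent parameters. The sketch below solves that system with NumPy's `lstsq`, assuming Gaussian premise MFs as in (A.4) and a hypothetical parameter layout; the backpropagation step for the premise parameters is not shown.

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Gaussian membership function (A.4)."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def normalized_strengths(X, premise):
    """Layer-3 outputs for all samples. X: (n, 2); premise: (c_A, s_A, c_B, s_B) per rule."""
    w = np.stack([gaussian_mf(X[:, 0], cA, sA) * gaussian_mf(X[:, 1], cB, sB)
                  for cA, sA, cB, sB in premise], axis=1)  # (n, n_rules)
    return w / w.sum(axis=1, keepdims=True)

def anfis_output(X, premise, consequent):
    """Overall output (A.8) for all samples; consequent: array of (p, q, r) rows."""
    w_bar = normalized_strengths(X, premise)
    z = X[:, [0]] * consequent[:, 0] + X[:, [1]] * consequent[:, 1] + consequent[:, 2]
    return np.sum(w_bar * z, axis=1)

def lse_consequents(X, target, premise):
    """Least squares step: with premises fixed, (A.9) is linear in (p_i, q_i, r_i)."""
    w_bar = normalized_strengths(X, premise)
    # Columns for rule i: w_bar_i * x, w_bar_i * y, w_bar_i.
    A = np.hstack([np.column_stack([wb * X[:, 0], wb * X[:, 1], wb])
                   for wb in w_bar.T])
    theta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return theta.reshape(-1, 3)  # one (p_i, q_i, r_i) row per rule
```

Solving for all consequent parameters in one linear step is what shrinks the BP search space to the premise parameters only, which is the source of the faster convergence described above.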