Abstract

A gait energy image (GEI) carries rich gait information and is one of the most effective representations for recognizing gait characteristics. The accuracy of gait recognition is strongly affected by covariates such as viewing angle, clothing occlusion, and walking speed. Because gait features differ between viewing angles, improving the accuracy of cross-view gait recognition remains a challenging task. This study proposes a new gait recognition algorithm structure. A Gabor filter is used to extract gait features from gait energy images, since it can extract features at different directions and scales. Linear discriminant analysis (LDA) is then used to reduce the feature dimension, which would otherwise restrict the classification process. Finally, an improved local coupled extreme learning machine based on particle swarm optimization classifies the extracted gait features. The proposed method is compared with current mainstream algorithms in terms of recognition accuracy on the CASIA-A and CASIA-B datasets, and the simulation results show that it performs well, including in cross-view gait recognition.

1. Introduction

Identity recognition is a foundation of human social life, and growing interest in biometric recognition algorithms has led to rapid improvements in the performance of biometric technologies. Biometric recognition is identification based on personal physiological or behavioral characteristics [1]. Common biometric recognition technologies include facial, fingerprint, palm print, and iris recognition. Biological patterns and features are usually unique and are difficult to forge or copy [2], so biometric authentication has great advantages over traditional authentication.

A person’s gait, or way of walking, is a complex spatiotemporal biological feature that can be used to distinguish an individual [3] and hence to realize personal identification [4]. Research [5] shows that the human gait is unique and difficult to fake. We all have the experience of identifying friends and family by their gait. Unlike other biometrics, such as the face, fingerprint, and iris, gait sequences can be collected unobtrusively at long distances with minimal cooperation [6]. Gait recognition therefore has the advantages of being contactless and noninvasive. Furthermore, gait is difficult to hide or camouflage [7].

Gait recognition methods can be categorized as model-based [8–10] or appearance-based [11, 12]. Model-based methods construct a model of the body from the gait images, analyze how its parameters change during movement, and obtain static or dynamic parameter information of the body, such as height, leg length, swing angle, swing frequency of arms or legs, and stride length. Lee and Grimson [13] proposed to divide the silhouette of a foreground walking person into seven regions, fit each with an ellipse, and use the resulting parameters as the gait characteristics for classification and recognition. Cunado et al. [14] modeled the leg as two linked pendulums, one being the thigh between the knee and hip and the other the calf between the knee and ankle. They extracted the step length and distance from the hip to the ankle as gait characteristics for identification and classification. Training and test sequences are often selected from gait sequences captured from different angles [15]. Model-based methods have two drawbacks: the model is difficult to construct, because it requires simulating the gait information, and the parameter calculation is complex.

Appearance-based methods focus on the shape of the silhouette or the overall movement of an individual. For example, an appearance-based method analyzes the changes of the contour of a walker over time [16] to obtain spatiotemporal characteristics for gait recognition. Wu et al. [17] proposed a nonnegative matrix factorization (NMF) method to obtain the local structure features of a human body to compensate for the loss of accuracy. A two-dimensional linear discriminant analysis (2DLDA) was proposed to project features into the discriminant space to improve classification. Aggarwal and Vishwakarma [18] proposed a 2D spatiotemporal template to describe gait motion, calculate Zernike moment invariants, and extract features from the spatial distribution of the directional gradient and the new mean method of directional pixels. More and Deore [19] proposed to fuse dynamic and static features for gait recognition, using a cross-wavelet transform to extract dynamic features, and a bipartite graph model to extract static features, followed by normalized feature fusion. In the Bayesian framework, k-means clustering is used for classification and recognition. Appearance-based methods are increasingly popular because they do not require modeling of all or part of the human body, they have low computational complexity, and they are insensitive to the quality of the profile image.

Gait recognition has three steps: target detection, feature extraction, and recognition. Feature extraction plays an important role in the recognition process and directly affects its accuracy, so a suitable feature extraction algorithm is necessary. The Gabor filter is increasingly used in image processing [20]. Biological studies have shown that its representation of frequency and direction is similar to that of the human visual system [21]; it simulates well the receptive field function of simple cells in the cerebral cortex and is thus a biomimetic mathematical model. In addition, it can carry out multidirection and multiscale feature extraction. Therefore, we use a Gabor filter with eight directions and five scales to extract features from the gait energy image, and the output of the Gabor filter is taken as the gait feature. However, the high dimension of these features increases the computing cost, and high-dimensional features contain redundant information, so it is necessary to reduce the dimension of the data extracted by the Gabor filter. For this, we use linear discriminant analysis (LDA), a supervised algorithm that chooses the projection direction giving the best classification performance and enhances the linear separability of the data. An improved local coupled extreme learning machine based on particle swarm optimization (LC-PSO-ELM) is then used to classify the gait features and achieve better recognition accuracy. To demonstrate the algorithm’s effectiveness, we compare LDA with principal component analysis (PCA) for dimension reduction, and we compare the proposed gait recognition method with current mainstream algorithms on the CASIA-A and CASIA-B benchmark gait databases of the Institute of Automation, Chinese Academy of Sciences. The simulation results show that the proposed method can improve the recognition accuracy of gait and performs well at cross-view gait recognition.

The contributions of this study can be summarized as follows:
(1) An automatic gait recognition structure is proposed.
(2) A Gabor filter is used to extract features of the gait energy image, and LDA is used to reduce the feature dimension while retaining as much feature information as possible.
(3) An improved local coupled extreme learning machine based on particle swarm optimization (LC-PSO-ELM) is used to classify the extracted gait features, which can improve the recognition accuracy in the cross-view case.

The rest of this study is arranged as follows: the gait feature extraction method is described in Section 2. Dimension reduction and the recognition method are described in Section 3. Experimental results and discussion are presented in Section 4. Section 5 presents our conclusions.

2. Gait Feature Extraction

Feature extraction is an important step in gait recognition. In this study, a Gabor filter is used to extract gait features from a gait energy image (GEI).

2.1. Gait Energy Image

The GEI is one of the most effective ways to represent gait information [22]. It is the average silhouette over a gait cycle, that is, a normalized cumulative energy image of a complete cycle [23]. Averaging suppresses the random noise of the image sequence within the cycle. The resulting average image is robust and contains rich static and dynamic information, and a growing number of researchers extract gait features from the GEI [24]. It is defined as

$$G(x, y) = \frac{1}{N} \sum_{t=1}^{N} B_t(x, y), \tag{1}$$

where $N$ is the number of frames of the silhouette image in a period and $B_t(x, y)$ is the gait silhouette at time $t$. The GEI generation process is shown in Figure 1.
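The averaging that defines the GEI can be illustrated with a minimal sketch. It assumes the silhouettes of one gait cycle have already been segmented, size-normalized, and aligned, which the text does not detail; the function name `gait_energy_image` and the toy data are hypothetical.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average N aligned binary silhouettes over one gait cycle.

    silhouettes: array-like of shape (N, height, width) with values in {0, 1}.
    Returns a (height, width) array with values in [0, 1].
    """
    frames = np.asarray(silhouettes, dtype=np.float64)
    if frames.ndim != 3:
        raise ValueError("expected an array of shape (N, height, width)")
    return frames.mean(axis=0)

# Toy cycle of four 3x3 frames (hypothetical data, not from CASIA).
cycle = np.zeros((4, 3, 3))
cycle[:, 1, 1] = 1.0   # pixel that is foreground in every frame
cycle[:2, 0, 0] = 1.0  # pixel that is foreground in half of the frames
gei = gait_energy_image(cycle)
```

Pixels that belong to the body in every frame keep the value 1, while pixels covered only part of the time take intermediate values, which is how the GEI encodes both static and dynamic information.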

2.2. Gabor Filter

A Gabor function can be used for edge detection in image processing. The two-dimensional Gabor filter achieves optimal joint localization in the spatial and frequency domains, so it describes well the local structure of an image in terms of spatial scale, spatial location, and direction selectivity. The frequency and direction representations of a Gabor filter are close to those of the human visual system, and Gabor filters are often used to represent and describe texture features [25, 26]. We choose the following two-dimensional Gabor kernel form [27]:

$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^{2}}{\sigma^{2}} \exp\left(-\frac{\|k_{\mu,\nu}\|^{2} \|z\|^{2}}{2\sigma^{2}}\right) \left[\exp\left(i\, k_{\mu,\nu} \cdot z\right) - \exp\left(-\frac{\sigma^{2}}{2}\right)\right], \tag{2}$$

where $\mu$ and $\nu$ define the direction and scale, respectively, of the Gabor kernel, $z = (x, y)$, $\|\cdot\|$ represents modulus calculation, $k_{\mu,\nu} = k_{\nu} e^{i\phi_{\mu}}$, $k_{\nu} = k_{\max}/f^{\nu}$, and $\phi_{\mu} = \pi\mu/8$, $k_{\max}$ is the maximum frequency, and $f$ is the distribution coefficient of the kernel function in the frequency domain. In this paper, the Gabor filter is used to extract features at five scales and eight directions, so the parameters $\sigma$, $k_{\max}$, $f$, $\mu$, and $\nu$ of formula (2) are set with $\mu \in \{0, \ldots, 7\}$ and $\nu \in \{0, \ldots, 4\}$. We extract features of gait energy images with the resulting bank of 40 Gabor kernels (eight directions by five scales). The extracted features are represented by a high-dimensional column vector, so high-dimensional gait feature data are obtained from the output of the Gabor filter.
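A filter bank of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: the values $\sigma = 2\pi$, $k_{\max} = \pi/2$, and $f = \sqrt{2}$ are commonly used defaults and are assumptions here, as are the 31 × 31 kernel size, the use of the response magnitude, and the downsampling step. The function names are hypothetical.

```python
import numpy as np

def gabor_kernel(mu, nu, size=31, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Complex Gabor kernel for direction index mu and scale index nu.

    sigma, k_max, f and size are assumed defaults, not values from the text.
    """
    k = (k_max / f**nu) * np.exp(1j * np.pi * mu / 8)  # wave vector as a complex number
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k_sq = np.abs(k) ** 2
    z_sq = x**2 + y**2
    envelope = (k_sq / sigma**2) * np.exp(-k_sq * z_sq / (2 * sigma**2))
    # Oscillatory part minus the DC-compensation term.
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma**2 / 2)
    return envelope * carrier

def gabor_features(image, n_dirs=8, n_scales=5, step=4):
    """Concatenate downsampled response magnitudes of all n_dirs * n_scales kernels."""
    image = np.asarray(image, dtype=np.float64)
    feats = []
    for nu in range(n_scales):
        for mu in range(n_dirs):
            kern = gabor_kernel(mu, nu)
            # Linear convolution via zero-padded FFT, cropped back to the image size.
            shape = (image.shape[0] + kern.shape[0] - 1,
                     image.shape[1] + kern.shape[1] - 1)
            resp = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kern, shape))
            r0, c0 = kern.shape[0] // 2, kern.shape[1] // 2
            resp = resp[r0:r0 + image.shape[0], c0:c0 + image.shape[1]]
            feats.append(np.abs(resp)[::step, ::step].ravel())
    return np.concatenate(feats)

# Toy "GEI": a bright rectangle on a dark background (hypothetical data).
image = np.zeros((32, 24))
image[8:24, 6:18] = 1.0
features = gabor_features(image)
```

Each of the 40 kernels contributes one block of the final vector, so the feature length grows linearly with the number of directions and scales, which is why a dimension reduction step follows.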

3. Dimension Reduction and Recognition Methods

The gait features extracted by the Gabor filter are high-dimensional: the feature dimension presented to the classification algorithm is 4400. A dimensionality reduction method is therefore needed to decrease the input complexity of the classification algorithm. In this study, we use LDA for dimension reduction of gait features.

3.1. Linear Discriminant Analysis

LDA is widely used for dimension reduction [28]. The algorithm produces reduced samples that have the maximum distance between classes, the minimum variance within classes, and the best linear separability in the new space. For $n$-dimensional feature data, LDA uses a linear transformation to obtain new $d$-dimensional feature data that are uncorrelated and beneficial to classification, while minimizing information loss, with $d < n$. Because it takes the class information of the data into account, the algorithm makes the data easier to identify after dimension reduction. It reduces the correlation between different types of data and causes the transformation result to highlight the differences between feature data [29].

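Given the input and output dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^{n}$, $y_i \in \{1, \ldots, C\}$, and $N$ is the number of samples. Assume that the data matrix is grouped as $X = [X_1, X_2, \ldots, X_C]$, where $C$ is the number of classes. Suppose $\mu_j$ and $\Sigma_j$ are the mean vector and covariance matrix of class $j$, respectively. Let $X_0$ and $X_1$ be two arbitrary classes in the group $X$. The projections of the centers of these two classes onto a straight line $w$ are $w^{T}\mu_0$ and $w^{T}\mu_1$, respectively, and the covariances of the two projected classes are $w^{T}\Sigma_0 w$ and $w^{T}\Sigma_1 w$, respectively. The LDA method projects high-dimensional data onto a lower dimensional space by simultaneously maximizing the interclass variance between different classes and minimizing the intraclass variance within the same class, thus achieving maximum class discrimination in the dimensionality-reduced space. That is, the projection $w$ (one column of the projection matrix $W$) is calculated based on the objective function as follows [30]:

$$J = \frac{\| w^{T}\mu_0 - w^{T}\mu_1 \|_2^{2}}{w^{T}\Sigma_0 w + w^{T}\Sigma_1 w} = \frac{w^{T}(\mu_0 - \mu_1)(\mu_0 - \mu_1)^{T} w}{w^{T}(\Sigma_0 + \Sigma_1) w}. \tag{3}$$

The intraclass divergence matrix is defined as follows:

$$S_w = \Sigma_0 + \Sigma_1. \tag{4}$$

The interclass dispersion matrix is defined as follows:

$$S_b = (\mu_0 - \mu_1)(\mu_0 - \mu_1)^{T}. \tag{5}$$

Therefore, the general matrix form of the above objective function (3) can be obtained as follows:

$$J = \frac{w^{T} S_b w}{w^{T} S_w w}. \tag{6}$$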
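The scatter-matrix formulation above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' code: it uses the standard multi-class generalization (class scatter summed over all classes, between-class scatter weighted by class size), and it adds a small ridge term to keep the within-class matrix invertible. The function name `lda_fit` and the toy data are hypothetical.

```python
import numpy as np

def lda_fit(X, y, n_components):
    """Return a projection matrix of shape (n_features, n_components)."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    n_features = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)
        centered = X_c - mu_c
        S_w += centered.T @ centered
        diff = (mu_c - mean_all)[:, None]
        S_b += X_c.shape[0] * (diff @ diff.T)
    # Assumption: a small ridge keeps S_w invertible for high-dimensional data.
    S_w += 1e-6 * np.eye(n_features)
    # Directions maximizing the ratio of between-class to within-class scatter.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real

# Two toy classes separated along the first axis, noisy along the second.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], [0.1, 1.0], (50, 2)),
               rng.normal([2.0, 0.0], [0.1, 1.0], (50, 2))])
y = np.repeat([0, 1], 50)
W = lda_fit(X, y, n_components=1)
Z = X @ W  # projected one-dimensional features
```

With $C$ classes the between-class matrix has rank at most $C - 1$, so at most $C - 1$ useful projection directions exist.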

3.2. Recognition Method
3.2.1. Extreme Learning Machine

Extreme learning machine (ELM) is a fast learning algorithm that was proposed by Huang in 2006 [31]. Unlike a traditional single-hidden-layer neural network, ELM initializes the input weights and biases randomly and then computes the corresponding output weights directly. The calculation is relatively simple and the computational complexity is low, so ELM has been studied and used by many researchers. In this paper, an improved ELM algorithm based on the local coupled extreme learning machine (LC-ELM) [32] and particle swarm optimization (PSO) is used for gait recognition. We first introduce the basic ELM learning algorithm.

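For a single-hidden-layer neural network, suppose there are $N$ arbitrary samples $(x_j, t_j)$, where $x_j \in \mathbb{R}^{n}$ and $t_j \in \mathbb{R}^{m}$. A neural network with $L$ neurons in the hidden layer can be expressed as follows:

$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, \ldots, N, \tag{7}$$

where $g(\cdot)$ is the activation function, $N$ is the number of input samples, $w_i$ is the input weight, $\beta_i$ is the output weight, $b_i$ is the bias of the $i$-th hidden-layer neuron, and $w_i \cdot x_j$ is the inner product of $w_i$ and $x_j$.

The goal of ELM neural network learning is to minimize the output error, which can be expressed as follows:

$$\sum_{j=1}^{N} \| o_j - t_j \| = 0. \tag{8}$$

The above equations can be written compactly as

$$H\beta = T, \tag{9}$$

where

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1^{T} \\ \vdots \\ \beta_L^{T} \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^{T} \\ \vdots \\ t_N^{T} \end{bmatrix}_{N \times m}.$$

$H$ is called the hidden-layer output matrix of the ELM learning algorithm [33].

The training process of the ELM learning algorithm is equivalent to finding a least squares solution $\hat{\beta}$ of the linear system $H\beta = T$, and the above equation can be translated to

$$\hat{\beta} = H^{\dagger} T, \tag{10}$$

where $H^{\dagger}$ is the Moore–Penrose generalized inverse of $H$.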

ELM algorithms are implemented by a fully coupled framework between the input layer and hidden layer. This fully coupled structure may incur higher computing costs when ELM has many hidden nodes and higher dimensions of input data.
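The basic ELM training procedure described in this subsection can be sketched as follows: random input weights and biases, a sigmoid hidden layer, and output weights from the Moore–Penrose pseudoinverse. The uniform initialization range, the function names, and the toy XOR-style data are assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def elm_train(X, T, n_hidden, rng):
    """Draw random input weights and biases, then solve for the output weights."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights (assumed range)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases (assumed range)
    H = sigmoid(X @ W + b)                                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                             # least squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Toy XOR-style problem with one-hot targets (hypothetical data).
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1, 1, 0])
T = np.eye(2)[labels]
W, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
pred = np.argmax(elm_predict(X, W, b, beta), axis=1)
```

No iterative weight update is involved: the only fitted quantity is the output weight matrix, which is why training is fast but depends on the quality of the random hidden layer.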

3.2.2. Improved Extreme Learning Machine

To reduce the amount of computation, Qu [32] proposed LC-ELM, which decouples the input layer and the hidden layer of ELM. Each hidden node is assigned an address parameter in the input space [34]. Given a learning sample, a fuzzy membership function of the distance between the hidden node and the input sample gives the activation degree of the hidden node. LC-ELM is realized with this fuzzy membership function and a similarity relation, which can reduce the complexity of the weight search space and improve the generalization performance. LC-ELM can be expressed as follows:

$$f(x_j) = \sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i)\, F\big(S(x_j, d_i)\big), \quad j = 1, \ldots, N. \tag{11}$$

In the LC-ELM learning algorithm, the similarity relation $S(x_j, d_i)$ denotes the distance between the input $x_j$ and the $i$-th hidden-layer neuron, whose position is given by the address $d_i$. $F(\cdot)$ is a fuzzy membership function, for which there are many choices, such as a Gaussian, sigmoid, or reverse sigmoid function. A radius parameter $r$ is kept in $F(\cdot)$ to adjust the width of the activation area; it is also an optimized parameter and is matched to the address parameter $d_i$. Combining the structure of the local coupled feedforward neural network (LCFNN) with the learning mechanism of ELM, LC-ELM is also a three-step learning algorithm whose network parameters (input weights $w_i$, biases $b_i$ between the input layer and hidden layer, and addresses $d_i$ of hidden neurons) are assigned randomly, as in ELM [32].
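The local coupling idea can be sketched by multiplying each hidden activation by a membership value that decays with the distance between the input and the hidden node's address. The sketch below assumes Euclidean distance as the similarity relation and a Gaussian membership function with one radius per hidden node; these are illustrative choices, not necessarily those of the original LC-ELM, and all function names are hypothetical.

```python
import numpy as np

def lc_hidden_output(X, W, b, D, r):
    """Hidden outputs of a locally coupled layer.

    X: (N, n) inputs, W: (n, L) input weights, b: (L,) biases,
    D: (L, n) hidden-node addresses, r: (L,) radii.
    """
    g = 1.0 / (1.0 + np.exp(-(X @ W + b)))                        # sigmoid activation, (N, L)
    dist = np.linalg.norm(X[:, None, :] - D[None, :, :], axis=2)  # assumed similarity: distance
    membership = np.exp(-(dist / r) ** 2)                         # assumed Gaussian membership
    return g * membership

def lc_elm_train(X, T, n_hidden, rng):
    """Random weights, biases, addresses and radii; analytic output weights."""
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, (n, n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)
    D = rng.uniform(X.min(axis=0), X.max(axis=0), (n_hidden, n))  # addresses inside the data range
    r = rng.uniform(0.5, 2.0, n_hidden)                           # assumed radius range
    beta = np.linalg.pinv(lc_hidden_output(X, W, b, D, r)) @ T
    return W, b, D, r, beta

# A node addressed at the origin responds to a nearby input but not to a distant one.
X_demo = np.array([[0.0, 0.0], [10.0, 10.0]])
H_demo = lc_hidden_output(X_demo, W=np.zeros((2, 1)), b=np.zeros(1),
                          D=np.array([[0.0, 0.0]]), r=np.array([1.0]))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
T = np.eye(2)[(X[:, 0] > 0).astype(int)]
W, b, D, r, beta = lc_elm_train(X, T, n_hidden=5, rng=rng)
```

Because each hidden node is effectively active only near its address, a given input engages only part of the hidden layer, which is the sense in which the input and hidden layers are decoupled.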

Compared with ELM, LC-ELM reduces the complexity of the algorithm structure, but its parameters are assigned randomly and may not be optimal, which affects its performance. We propose a local coupled extreme learning machine based on particle swarm optimization [34] to improve the performance. This algorithm uses a particle swarm optimization strategy to optimize the four groups of parameters in LC-ELM (input weights, hidden biases, addresses, and radii), so as to improve its accuracy and generalization performance.

Therefore, each particle in the search space of LC-PSO-ELM is composed of the values of the input weights, hidden biases, addresses, and radii, and can be defined as

$$P_k = [\, w_1, \ldots, w_L,\; b_1, \ldots, b_L,\; d_1, \ldots, d_L,\; r_1, \ldots, r_L \,], \tag{12}$$

where $k$ indexes the particles. When the optimal parameters of the local coupled extreme learning machine based on the particle swarm optimization algorithm are established based on formula (12), the weights between the hidden layer and the output layer of the algorithm are determined analytically based on formula (10). The flowchart of the LC-PSO-ELM algorithm is shown in Figure 2.
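The optimization loop can be sketched with a plain global-best PSO in which every particle is a flat vector of input weights, biases, addresses, and radii, and the output weights are recomputed analytically for each candidate. This is a sketch under assumptions: the fitness is the training RMSE, the inertia and acceleration coefficients are generic textbook values, and the membership function is Gaussian on Euclidean distance. None of these choices is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np

def unpack(p, n_in, L):
    """Split a flat particle into input weights, biases, addresses and radii."""
    i = 0
    W = p[i:i + n_in * L].reshape(n_in, L); i += n_in * L
    b = p[i:i + L]; i += L
    D = p[i:i + L * n_in].reshape(L, n_in); i += L * n_in
    r = np.abs(p[i:i + L]) + 1e-3  # keep radii strictly positive
    return W, b, D, r

def hidden(X, W, b, D, r):
    g = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    dist = np.linalg.norm(X[:, None, :] - D[None, :, :], axis=2)
    return g * np.exp(-(dist / r) ** 2)

def fitness(p, X, T, L):
    """Training RMSE after solving the output weights with the pseudoinverse."""
    H = hidden(X, *unpack(p, X.shape[1], L))
    beta = np.linalg.pinv(H) @ T
    return np.sqrt(np.mean((H @ beta - T) ** 2))

def pso(X, T, L, n_particles=20, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    dim = 2 * X.shape[1] * L + 2 * L
    pos = rng.uniform(-1.0, 1.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, X, T, L) for p in pos])
    gbest = pbest[np.argmin(pbest_fit)].copy()
    gbest_fit = pbest_fit.min()
    history = [gbest_fit]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(p, X, T, L) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        if pbest_fit.min() < gbest_fit:
            gbest_fit = pbest_fit.min()
            gbest = pbest[np.argmin(pbest_fit)].copy()
        history.append(gbest_fit)
    return gbest, history

# Toy two-class problem (hypothetical data).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
T = np.eye(2)[(X[:, 0] * X[:, 1] > 0).astype(int)]
best, history = pso(X, T, L=3)
```

Only the hidden-layer parameters are searched by the swarm; the output weights never enter the particle because they follow from the hidden-layer output in closed form.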

LC-PSO-ELM improves accuracy and generalization performance, and hence, we use it to recognize and classify gait features after dimension reduction.

4. Experiment and Analysis

To evaluate algorithm performance, we used the CASIA gait database [35] of the Center of Biometrics and Safety Technology, Institute of Automation, Chinese Academy of Sciences. We conducted three experiments, identified as A, B, and C, based on the CASIA-A and CASIA-B datasets. Experiment A examined the gait data of 20 subjects from the CASIA-A database. Each subject had 12 gait sequences captured at angles of 0°, 45°, and 90°. The viewing angles were not distinguished: the gait data from all angles were pooled and divided into training and testing sets at a 7 : 3 ratio. This small dataset was used to verify the performance of the proposed algorithm. The experimental results are shown in Table 1.

The normal sequences in CASIA-B were used in experiment B. The database contains gait sequences of 124 subjects. The data volume is large, and there are 11 viewing angles (0°, 18°, ..., 162°, 180°). We did not consider cross view in this experiment. The results are shown in Table 2.

The normal sequences in CASIA-B were also used in experiment C, which mainly considered the influence of cross view. The viewing angles of the training and test sets differed by 18°. The results, averaged over six trials with different training and test sets, are shown in Table 3.

4.1. Experimental Details

First, we compare the performance of the ELM and LC-PSO-ELM learning algorithms on the gait data, as shown in Table 4. We selected the sigmoid function as the activation function for both algorithms. The wave kernel was selected as the similarity function. The population size NP was set to 200, the maximum number of iterations was 50, and the other control parameters are listed in Table 5.

Because the performance of ELM and LC-PSO-ELM is greatly influenced by the number of hidden neurons, it is necessary to determine this number to ensure good classification performance. The experiment was performed by gradually increasing the number of hidden neurons and recording the test results for each setting. The number of hidden neurons corresponding to the best classification result was taken as the best number for the algorithm. To show the superiority of the proposed method, we ran simulations with each combination of the dimension reduction and classification methods. The accuracy as a function of the number of hidden-layer neurons is shown in Figure 3. In each subgraph, the red curve represents the training accuracy and the blue curve the test accuracy.
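The selection procedure for the number of hidden neurons can be sketched as a simple sweep. The example below uses a basic ELM on synthetic data and picks the candidate with the highest test accuracy, mirroring the description above; the candidate list, the synthetic blobs, and the function names are assumptions. In practice a separate validation set would usually be preferred for this selection.

```python
import numpy as np

def elm_accuracy(X_tr, y_tr, X_te, y_te, n_hidden, rng):
    """Train a basic ELM and return (training accuracy, test accuracy)."""
    n_classes = int(max(y_tr.max(), y_te.max())) + 1
    T = np.eye(n_classes)[y_tr]
    W = rng.uniform(-1.0, 1.0, (X_tr.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)

    def act(X):
        return 1.0 / (1.0 + np.exp(-(X @ W + b)))

    beta = np.linalg.pinv(act(X_tr)) @ T
    train_acc = np.mean(np.argmax(act(X_tr) @ beta, axis=1) == y_tr)
    test_acc = np.mean(np.argmax(act(X_te) @ beta, axis=1) == y_te)
    return float(train_acc), float(test_acc)

def sweep_hidden_neurons(X_tr, y_tr, X_te, y_te, candidates, seed=0):
    """Evaluate each candidate size and return the one with the best test accuracy."""
    rng = np.random.default_rng(seed)
    results = {L: elm_accuracy(X_tr, y_tr, X_te, y_te, L, rng) for L in candidates}
    best = max(results, key=lambda L: results[L][1])
    return best, results

# Three synthetic classes, split 7:3 into training and test sets.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
y_all = np.repeat(np.arange(3), 40)
X_all = centers[y_all] + rng.normal(scale=0.5, size=(120, 2))
idx = rng.permutation(120)
train, test = idx[:84], idx[84:]
candidates = [2, 5, 10, 20]
best_L, results = sweep_hidden_neurons(X_all[train], y_all[train],
                                       X_all[test], y_all[test], candidates)
```

Plotting the two accuracies stored in `results` against the candidate sizes gives curves of the kind described for Figure 3.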

Figure 3 comprises four subgraphs, Figures 3(a)–3(d). Figures 3(a) and 3(b) show the accuracy of the ELM algorithm after dimension reduction with PCA and LDA, respectively. PCA is also a common dimensionality reduction method. Unlike LDA, PCA is unsupervised: it ignores the class information and only preserves as much of the variance of the data as possible after dimension reduction. In the experiments, the input dimension of the classifier is 123 for data reduced with LDA and 240 for data reduced with PCA.
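For comparison with LDA, the unsupervised nature of PCA can be seen in a minimal sketch: the projection is computed from the centered data alone, with no labels involved. The SVD-based formulation, the function names, and the toy data are illustrative assumptions.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean, the principal directions and their variance ratios."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, Vt[:n_components].T, explained[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components

# Toy data whose variance is dominated by the first axis (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([10.0, 1.0, 0.1])
mean, components, ratio = pca_fit(X, n_components=2)
Z = pca_transform(X, mean, components)
```

Because no class labels enter the computation, a direction with large variance is kept even if it does not help to separate the classes, which is the limitation LDA addresses.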

Figures 3(c) and 3(d) show the classification accuracy of LC-PSO-ELM after dimensionality reduction by PCA and LDA, respectively. In all cases, the accuracy increases with the number of neurons in the hidden layer and then tends to stabilize, but the best number of hidden-layer neurons differs by algorithm. With the same LDA dimension reduction, the best classification results in Figures 3(b) and 3(d) are reached with 9 and 6 hidden-layer neurons, respectively, so LC-PSO-ELM needs fewer hidden neurons than ELM to obtain its best classification results. With the same LC-PSO-ELM classifier, the best results in Figures 3(c) and 3(d) are reached with about 145 and 6 hidden neurons, respectively, so far fewer hidden neurons are needed after dimension reduction by LDA, which means that LDA is more conducive to classification than PCA.

Compared with the traditional ELM algorithm, LC-PSO-ELM can simplify the algorithm structure, optimize the parameters of the neural network, reduce the influence of random input parameters, and improve classification performance. Figure 3(d) shows the best classification performance being reached with the fewest hidden neurons, so the combination of LDA dimension reduction and LC-PSO-ELM classification obtains higher recognition accuracy with a more compact network.

Therefore, as shown in Table 6, to obtain better generalization performance, the numbers of hidden neurons of the ELM learning algorithm with PCA and LDA dimension reduction are 275 and 10, respectively, on the CASIA-A dataset, while those of the LC-PSO-ELM learning algorithm are 140 and 8, respectively. On the CASIA-B dataset, with and without considering the influence of cross view, the number of hidden neurons of the LC-PSO-ELM learning algorithm based on LDA is between 8 and 25, depending on the viewing angle (0°, 18°, ..., 162°, 180°).

4.2. Experimental Results

Table 1 compares the performance of the ELM and LC-PSO-ELM learning algorithms with PCA and LDA dimension reduction on the CASIA-A dataset. The recognition accuracies of ELM and LC-PSO-ELM based on LDA are 97.82% and 98.24%, respectively, while those based on PCA are 93.10% and 93.52%, respectively, which shows that LDA dimension reduction is more conducive to classification.

Compared with ELM, the LC-PSO-ELM algorithm obtains higher recognition accuracy, demonstrating that LC-PSO-ELM has better classification performance with a compact network configuration of fewer hidden neurons.

In Table 2, the proposed method of LC-PSO-ELM with LDA dimension reduction is compared with the two algorithms of references [4, 36] in terms of test recognition accuracy on the CASIA-A dataset. The proposed algorithm structure achieves the highest recognition accuracy of 98.24%, which is 5.99 and 0.86 percentage points higher than the 92.25% and 97.38% of the existing algorithms of references [4, 36], respectively. This indicates the effectiveness of the proposed algorithm structure for gait recognition.

To demonstrate the performance of the proposed algorithm, Table 3 compares it with state-of-the-art methods in terms of gait recognition accuracy on the CASIA-B dataset. The proposed algorithm achieves the best classification accuracy in almost all cases. Among the compared methods, Binsaadoon and El-Alfy [37] used the FLGBP method, with 88.59% average gait recognition accuracy over 11 views. Zhang et al. [38] used the KPCA-LPP method and achieved 91.12% average gait recognition accuracy. Chao et al. [39] achieved 95% average gait recognition accuracy by regarding the gait as a set. Wolf et al. [40] used the 3DCNN method, with an average gait recognition accuracy of 97.35%. The proposed algorithm improved the average gait recognition accuracy by 10.07, 7.54, 3.66, and 1.31 percentage points, respectively, over the studies of Binsaadoon, Zhang, Chao, and Wolf.

In recent years, Yu et al. [41] proposed a gait recognition framework that is more sensitive to the viewing angle of the walking gait, which caused the recognition accuracy to differ greatly between views, with an average recognition rate of 38.53%. Wang et al. [42] proposed a method based on multiview gait sequence fusion, with an average gait recognition accuracy of 88.75%. Wang et al. [43] used the TS-GAN method and achieved 88.1% average gait recognition accuracy. Table 7 shows the results of the proposed method and the above gait recognition approaches on the CASIA-B dataset when the influence of cross view is considered.

Compared with the above methods, the proposed algorithm improved the average gait recognition accuracy by 57.27, 7.05, and 7.7 percentage points, respectively. Its highest recognition accuracy was 99.48% and its lowest was 97.16%, so the accuracy varies little across views. The proposed algorithm attained the best classification accuracy for almost all views.

In summary, a gait recognition method based on LC-PSO-ELM and LDA dimension reduction is proposed in this paper, and its performance is compared with that of other gait recognition methods on the CASIA dataset, with and without considering the influence of cross view. Table 2 shows results under the same view, while Tables 3 and 7 show results under different views. The proposed algorithm has good gait recognition accuracy and can be effectively applied to gait recognition. In particular, Table 7 shows that cross-view gait recognition based on the proposed method also has high accuracy, so the proposed algorithm can effectively reduce the impact of different views on gait recognition.

5. Conclusions

As a promising biometric technology, gait recognition has attracted wide attention. We used a Gabor filter to extract multidirectional and multiscale features from gait energy images. Linear discriminant analysis was used to reduce the dimensionality of the feature data. The improved extreme learning machine algorithm was used for recognition and classification. We conducted experiments on the CASIA datasets, and the results demonstrate the effectiveness of the proposed method. The algorithm has low complexity and good generalization performance. However, this study does not consider the influence of covariate factors, such as clothes and bags, or of cross-view settings with larger angle differences. Hence, subsequent research will focus on recognition and classification under larger cross-view differences and covariate factors.

Data Availability

The data used to support the findings of this research are included in the article. Further data and code generated or used during the study can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant no. 61973185 and 61773226), the Development Plan of the Young Innovation Team in Colleges and Universities of Shandong Province (grant no. 2019KJN011), the Shandong Province Key Research and Development Program (grant no. 2018GGX103054), and the Young Doctor Cooperation Foundation of Qilu University of Technology (Shandong Academy of Sciences) (grant no. 2018BSHZ2008).