EURASIP Journal on Advances in Signal Processing
Volume 2008 (2008), Article ID 629102, 8 pages
doi:10.1155/2008/629102
Research Article

Human Gait Recognition Based on Multiview Gait Sequences

Department of Electronic Engineering, Division of Engineering, King's College London WC2R2LS, UK

Received 6 June 2007; Revised 10 October 2007; Accepted 23 January 2008

Academic Editor: Juwei Lu

Copyright © 2008 Xiaxi Huang and Nikolaos V. Boulgouris. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Most of the existing gait recognition methods rely on a single view, usually the side view, of the walking person. This paper investigates the case in which several views are available for gait recognition. It is shown that each view has unequal discrimination power and, therefore, should have unequal contribution in the recognition process. In order to exploit the availability of multiple views, several methods for the combination of the results that are obtained from the individual views are tested and evaluated. A novel approach for the combination of the results from several views is also proposed based on the relative importance of each view. The proposed approach generates superior results, compared to those obtained by using individual views or by using multiple views that are combined using other combination methods.

1. Introduction

Gait recognition [1] aims at the identification of individuals based on their walking style. Recognition based on human gait has several advantages related to the unobtrusiveness and the ease with which gait information can be captured. Unlike other biometrics, gait can be captured from a distant camera, without drawing the attention of the observed subject. One of the earliest works studying human gait is that of Johansson [2], who showed that people are able to recognize human locomotion and to identify familiar persons, by presenting a series of video sequences of different patterns of motion to a group of participants. Later, Cutting and Kozlowski in [3] used moving light displays (MLDs) to further show the human ability for person identification and gender classification.

Although several approaches have been presented for the recognition of human gait, most of them limit their attention to the case in which only the side view is available, since this viewing angle is considered to provide the richest information about the gait of the walking person [4–7]. In [8], an experiment was carried out using two views, namely, the frontal-parallel view and the side view, from which the silhouettes of the subjects in two walking stances were extracted. This approach exhibited higher recognition accuracy for the frontal-parallel view than for the side view. The side view was also examined in [9] together with another view from a different angle, and static parameters, such as the height of the walking person and the distances between body parts, were used in the template matching. Apart from the recognition rate, results were also reported based on a small sample set using a confusion metric which reflects the effectiveness of the approach in the situation of a large population of subjects. The authors in [10] synthesize the side view silhouettes from those captured by multiple cameras employing visual hull techniques. In [11], the perspective projection and optical flow-based structure of motion approach was taken instead. In [12], information from multiple cameras is gathered to construct a 3D gait model.

Among recent works, the authors in [13] use improved discriminant analysis for gait recognition, the authors in [14] use information of gait shape and gait dynamics, while the authors in [15] use a gait energy image (GEI). However, all of the above approaches are based only on side view sequences.

In this paper, we use the motion of body (MoBo) database from Carnegie Mellon University (CMU) in order to investigate the contribution of each viewing direction to the recognition performance of a gait recognition system. In general, we try to answer the fundamental question: if several views are available to a gait recognition system, what is the most appropriate way to combine them in order to enhance the performance and the reliability of the system? We provide a detailed analysis of the role and the contribution of each viewing direction by reporting recognition results of systems based on each one of the available views. We also propose a novel way to combine the results obtained from different single views. In the proposed approach, we set a weight for each view based on its importance, which is calculated using statistical processing of the differences between views. The experimental results demonstrate the superior performance of the proposed weighted combination approach in comparison to the single-view approach and other combination methods for multiple views.

The paper is organized as follows. Section 2 presents the recognition performance of individual views in a multiview system. The proposed method for the combination of different views is presented in Section 3. Section 4 reports the detailed results using the proposed approach for the combination of several views. Finally, conclusions are drawn in Section 5.

2. Gait Recognition Using Multiple Views

The CMU MoBo database does not explicitly define a reference set and test sets as in [5]. In this work, we use the “fast walk” sequences as the reference set, and the “slow walk” sequences as the test set. As mentioned in the introduction, our goal is to find out which viewing directions have the greatest contribution in a multiview gait recognition system. To this end, we adopt a simple and straightforward way to determine the similarity between gait sequences in the reference and test databases. Specifically, from each gait sequence, taken from a specific viewpoint, we construct a simple template $T$ by averaging all frames in the sequence:

$$T = \frac{1}{N_T} \sum_{a=1}^{N_T} t_a, \tag{1}$$

where $t_a$, $a = 1, \ldots, N_T$, are the silhouettes in a gait sequence and $N_T$ is the number of silhouettes. This approach for template construction was also taken in [15–17].

Let $T_i$, $R_j$ denote the templates corresponding to the $i$th and the $j$th subjects in the test database and the reference database, respectively. Their distance is calculated using the following distance metric:

$$d(T_i, R_j) = \left\| T_i - R_j \right\| = \left\| \frac{1}{N_T} \sum_{a=1}^{N_T} t_{ia} - \frac{1}{N_R} \sum_{b=1}^{N_R} r_{jb} \right\|, \tag{2}$$

where $\|\cdot\|$ is the $\ell_2$-norm and $t_{ia}$, $r_{jb}$ are the silhouettes belonging to the $i$th test subject and $j$th reference subject, respectively. The associated frame indices $a$ and $b$ run from 1 to the total number of silhouettes in a sequence ($N_T$ and $N_R$, resp.). Essentially, a template is produced for each subject by averaging all silhouettes in the gait sequence, and the Euclidean distance between two templates is taken as a measure of their dissimilarity. In practice, this means that a smaller template distance corresponds to a closer match between two compared subjects.
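The template construction in (1) and the distance in (2) can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the experiments; it assumes that silhouettes are already aligned and of equal size and that they are stored as nested lists of pixel values. The names `average_template` and `template_distance` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: a silhouette is a list of rows, each row a list of 0/1 pixel values.
Silhouette = List[List[float]]


def average_template(silhouettes: Sequence[Silhouette]) -> Silhouette:
    """Pixel-wise mean of all silhouettes of one gait sequence, as in (1)."""
    if not silhouettes:
        raise ValueError("a gait sequence needs at least one silhouette")
    n = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    template = [[0.0] * cols for _ in range(rows)]
    for frame in silhouettes:
        for r in range(rows):
            for c in range(cols):
                template[r][c] += frame[r][c] / n
    return template


def template_distance(test: Silhouette, reference: Silhouette) -> float:
    """Euclidean (l2) distance between two templates, as in (2)."""
    return math.sqrt(sum((a - b) ** 2
                         for row_t, row_r in zip(test, reference)
                         for a, b in zip(row_t, row_r)))
```

A smaller value returned by `template_distance` indicates a closer match between the test and the reference subject.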

In order to evaluate the contribution of the various viewing directions to human gait recognition, we chose the MoBo database [18] from the CMU, which contains walking subjects captured by six cameras located in the positions shown in Figure 1. The database consists of walking sequences of 23 male and 2 female subjects, who were recorded performing four kinds of activities, including fast walk and slow walk. Before the application of our methodologies, we extract the bounding boxes of the silhouettes, and then align and normalize all silhouettes so that they have uniform dimensions, that is, 128 pixels tall and 80 pixels wide, in order to eliminate height differences between the walking subjects. We use five (see Figure 2) out of the six available viewing directions, omitting the north view, since it is practically identical to the south view (i.e., the frontal view). The cumulative match scores for each of these five viewing directions are shown in Figure 4, and the recognition rates at rank 1 and rank 5 are reported in Table 1.

Table 1: The recognition rates of the five viewing directions reported at rank 1 and rank 5.
Figure 1: Camera arrangement in the CMU MoBo database. Six cameras are oriented clockwise in the east, southeast, south, southwest, northwest, north, with the walking subject facing toward the south.
Figure 2: Available views for multiview gait recognition.
Figure 3: Templates constructed using the five available views.
Figure 4: Cumulative match scores for five viewing directions, namely, the east, southeast, south, southwest, and the northwest.

One can see clearly from Table 1 that the results obtained using the south and the east viewing directions are the best, especially at rank 1. Results achieved using the rest of the viewing directions are worse. This is a clear indication that the south and the east viewing directions capture most of the gait information of the walking subjects and, therefore, are the most discriminant viewing directions. In the next section, we will show how to combine results from several viewing directions in order to achieve improved recognition performance.

3. Combination of Different Views Using a Single Distance Metric

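In this section, we propose a novel method for the combination of results from different views in order to improve the performance of a gait recognition system. In our approach, we use weights to reflect the importance of each view during the combination. This means that instead of using a single distance for the evaluation of similarity between walking persons $i$ and $j$, we use multiple distances between the respective views and combine them in a total distance, which is given by

$$D(T_i, R_j) = \sum_{v=1}^{V} w_v \, d_v(T_i, R_j), \tag{3}$$

where $V$ is the total number of available views and $d_v$ is the distance in (2) computed for view $v$. Therefore, our task is to determine the weights $w_v$, which yield a smaller total distance when $i = j$, and a larger one when $i \neq j$.

Suppose that $d_{fv}$, $v = 1, \ldots, V$, are random variables representing the distances between a test subject and its corresponding reference subject (i.e., “within class” distance), and $d_{bv}$, $v = 1, \ldots, V$, are random variables representing the distances between a test subject and a reference subject other than its corresponding subject (i.e., “between class” distance).

In order to maximize the efficiency of our system, we first define the weighted distance between corresponding subjects in the reference and test databases:

$$D_f = \sum_{v=1}^{V} w_v \, d_{fv} = \mathbf{w}^T \mathbf{d}_f, \tag{4}$$

and the weighted distance between noncorresponding subjects:

$$D_b = \sum_{v=1}^{V} w_v \, d_{bv} = \mathbf{w}^T \mathbf{d}_b, \tag{5}$$

where $\mathbf{w} = [w_1, \ldots, w_V]^T$, $\mathbf{d}_f = [d_{f1}, \ldots, d_{fV}]^T$, and $\mathbf{d}_b = [d_{b1}, \ldots, d_{bV}]^T$.

In an ideal gait recognition system, $D_f$ should always be smaller than $D_b$. In practice, a recognition error takes place whenever $D_b < D_f$. Therefore, the probability of error is

$$P_e = P(D_b < D_f) = P\left(\mathbf{w}^T (\mathbf{d}_b - \mathbf{d}_f) < 0\right). \tag{6}$$

We define the random variable $z$ as

$$z = \mathbf{w}^T (\mathbf{d}_b - \mathbf{d}_f). \tag{7}$$

If we assume that $\mathbf{d}_b$ and $\mathbf{d}_f$ are normal random vectors, then $z$ is a normal random variable with probability density function

$$p(z) = \frac{1}{\sqrt{2\pi}\,\sigma_z} \exp\left(-\frac{(z - m_z)^2}{2\sigma_z^2}\right), \tag{8}$$

where $m_z$ is the mean value of $z$ and $\sigma_z^2$ is the variance of $z$.

Therefore, using (7) and (8), the probability of error in (6) is expressed as

$$P_e = P(z < 0) = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi}\,\sigma_z} \exp\left(-\frac{(z - m_z)^2}{2\sigma_z^2}\right) dz. \tag{9}$$

Furthermore, if $q = (z - m_z)/\sigma_z$, then the above expression is equivalent to

$$P_e = \int_{-\infty}^{-m_z/\sigma_z} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{q^2}{2}\right) dq. \tag{10}$$

The probability of error can therefore be minimized by minimizing $-m_z/\sigma_z$, or equivalently by maximizing $m_z/\sigma_z$. To this end, we have to calculate $m_z$ and $\sigma_z$. If $E\{\cdot\}$ denotes statistical expectation, then the mean value of $z$ is

$$m_z = E\{z\} = \mathbf{w}^T \left(E\{\mathbf{d}_b\} - E\{\mathbf{d}_f\}\right) = \mathbf{w}^T (\mathbf{m}_{d_b} - \mathbf{m}_{d_f}), \tag{11}$$

where $\mathbf{m}_{d_b}$ and $\mathbf{m}_{d_f}$ are the mean vectors of $\mathbf{d}_b$ and $\mathbf{d}_f$. The variance of $z$ is

$$\sigma_z^2 = E\{(z - m_z)^2\} = \mathbf{w}^T E\{(\mathbf{d}_b - \mathbf{m}_{d_b})(\mathbf{d}_b - \mathbf{m}_{d_b})^T\} \mathbf{w} + \mathbf{w}^T E\{(\mathbf{d}_f - \mathbf{m}_{d_f})(\mathbf{d}_f - \mathbf{m}_{d_f})^T\} \mathbf{w} - 2\,\mathbf{w}^T E\{(\mathbf{d}_b - \mathbf{m}_{d_b})(\mathbf{d}_f - \mathbf{m}_{d_f})^T\} \mathbf{w}. \tag{12}$$

If we assume that $\mathbf{d}_b$ and $\mathbf{d}_f$ are independent, then the cross term vanishes and

$$\sigma_z^2 = \mathbf{w}^T \Sigma_{d_b} \mathbf{w} + \mathbf{w}^T \Sigma_{d_f} \mathbf{w}, \tag{13}$$

where $\Sigma_{d_b}$ and $\Sigma_{d_f}$ are the covariance matrices of $\mathbf{d}_b$ and $\mathbf{d}_f$.

Therefore, the optimization problem becomes equivalent to maximizing

$$\frac{m_z^2}{\sigma_z^2} = \frac{\mathbf{w}^T (\mathbf{m}_{d_b} - \mathbf{m}_{d_f})(\mathbf{m}_{d_b} - \mathbf{m}_{d_f})^T \mathbf{w}}{\mathbf{w}^T (\Sigma_{d_b} + \Sigma_{d_f}) \mathbf{w}} = \frac{\mathbf{w}^T \Sigma_{d_c} \mathbf{w}}{\mathbf{w}^T \Sigma_d \mathbf{w}}, \tag{14}$$

where

$$\Sigma_{d_c} = (\mathbf{m}_{d_b} - \mathbf{m}_{d_f})(\mathbf{m}_{d_b} - \mathbf{m}_{d_f})^T, \qquad \Sigma_d = \Sigma_{d_b} + \Sigma_{d_f}. \tag{15}$$

The maximization of the above quantity is reminiscent of the optimization problem that appears in two-class linear discriminant analysis. The ratio can be maximized by determining a vector $\mathbf{w}$ that satisfies [19]

$$\Sigma_{d_c} \mathbf{w} = \lambda \, \Sigma_d \mathbf{w} \tag{16}$$

for some $\lambda$. In the case that we are considering, the optimal $\mathbf{w}$ is given by

$$\mathbf{w} = \Sigma_d^{-1} (\mathbf{m}_{d_b} - \mathbf{m}_{d_f}). \tag{17}$$

If we assume that the distances corresponding to different views are independent, then

$$\Sigma_d^{-1} = \mathrm{diag}\left(\frac{1}{\sigma_{b1}^2 + \sigma_{f1}^2}, \ldots, \frac{1}{\sigma_{bV}^2 + \sigma_{fV}^2}\right), \tag{18}$$

where $V$ is the total number of available views and $\sigma_{bv}^2$, $\sigma_{fv}^2$ are the variances of $d_{bv}$ and $d_{fv}$. Therefore, the optimal weight vector is

$$\mathbf{w} = \left[\frac{m_{b1} - m_{f1}}{\sigma_{b1}^2 + \sigma_{f1}^2}, \; \ldots, \; \frac{m_{bV} - m_{fV}}{\sigma_{bV}^2 + \sigma_{fV}^2}\right]^T, \tag{19}$$

where $m_{bv}$ and $m_{fv}$ are the mean values of $d_{bv}$ and $d_{fv}$.

Of course, the practical application of the above theory requires the availability of a database (other than the test database) which will be used in conjunction with the reference database for the determination of $\mathbf{m}_{d_b}$, $\mathbf{m}_{d_f}$, $\Sigma_{d_b}$, $\Sigma_{d_f}$. In our experiments, we used the CMU database of individuals walking with a ball for this purpose.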
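As an illustration of how the weights in (19) and the total distance in (3) might be computed from training distances, consider the following sketch. It is only a sketch under stated assumptions: the per-view within-class and between-class distances are taken to be already available as lists, the population variance is used as the variance estimate (the estimator is not specified above), and the function names `view_weights` and `combined_distance` are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Sequence


def view_weights(within: Sequence[Sequence[float]],
                 between: Sequence[Sequence[float]]) -> List[float]:
    """One weight per view, following the form of (19).

    within[v]  : training distances between corresponding subjects for view v.
    between[v] : training distances between noncorresponding subjects for view v.
    Assumption: population variance is used as the variance estimate.
    """
    weights = []
    for d_f, d_b in zip(within, between):
        spread = pvariance(d_b) + pvariance(d_f)
        if spread == 0:
            raise ValueError("zero total variance for a view; weight undefined")
        weights.append((mean(d_b) - mean(d_f)) / spread)
    return weights


def combined_distance(weights: Sequence[float],
                      view_distances: Sequence[float]) -> float:
    """Weighted sum of the per-view distances, following the form of (3)."""
    return sum(w * d for w, d in zip(weights, view_distances))
```

A view whose between-class distances are well separated from its within-class distances, relative to their spread, receives a larger weight and therefore contributes more to the total distance.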

In the ensuing section, we will use the weight vector in (19) for the combination of views and the evaluation of the resulting multiview gait recognition system.

4. Experimental Results

For the experimental evaluation of our methods, we used the MoBo database from the CMU. The CMU database has 25 subjects walking on a treadmill. Although this is an artificial setting that might affect the results, using this database was essentially our only option since this is the only database that provides five views. We used the “fast walk” sequences as reference and the “slow walk” sequences as test sequences. We also used the “with a ball” sequences in conjunction with the reference sequences for the determination of the weights in (19). The comparisons of recognition performance are based on cumulative match scores at rank 1 and rank 5. Rank 1 results report the percentage of subjects in a test set whose actual match in the reference database was the top match. Rank 5 results report the percentage of test subjects whose actual match in the reference database was among the top 5 matches. In this section, we present the results generated by the proposed view combination method. These results are compared to the results obtained using different single views and other combination methods.
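The rank-based evaluation described above can be sketched as follows. This is an illustrative computation of cumulative match scores, not the evaluation code used for the reported results; the distance-matrix layout, the optimistic handling of ties, and the name `cumulative_match_scores` are assumptions.

```python
from typing import List, Sequence


def cumulative_match_scores(dist: Sequence[Sequence[float]],
                            true_match: Sequence[int],
                            max_rank: int) -> List[float]:
    """Cumulative match scores (in percent) at ranks 1..max_rank.

    dist[i][j]    : distance between test subject i and reference subject j.
    true_match[i] : index of the reference subject that corresponds to test subject i.
    """
    if not dist:
        raise ValueError("at least one test subject is required")
    hits = [0] * max_rank
    for i, row in enumerate(dist):
        target = row[true_match[i]]
        # Rank of the true match: 1 + number of reference subjects strictly closer.
        # Assumption: ties are resolved in favour of the true match.
        rank = 1 + sum(1 for d in row if d < target)
        # A match at rank r counts toward every rank k >= r.
        for k in range(rank - 1, max_rank):
            hits[k] += 1
    return [100.0 * h / len(dist) for h in hits]
```

The first element of the returned list is the rank 1 recognition rate, and the fifth element (when `max_rank` is at least 5) is the rank 5 rate.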

Initially, we tried several simple methods for the combination of the results obtained using the available views. Specifically, the total distance between two subjects was taken to be equal to the mean, max, min, median, and product of the distances corresponding to each of the five viewing directions. Such combination approaches were originally explored in [20]. As shown in Figure 5 and Table 2, among all the above combination methods, the most satisfactory results were obtained by using the Product and Min rules.
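For concreteness, the five simple combination rules mentioned above can be written as a small lookup table. This is a minimal sketch that assumes the per-view distances between two subjects are given as a list; the names `COMBINATION_RULES` and `combine` are hypothetical.

```python
import math
from statistics import mean, median
from typing import Callable, Dict, Sequence

# Each rule maps the list of per-view distances to a single total distance.
COMBINATION_RULES: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "max": max,
    "min": min,
    "median": median,
    "product": math.prod,  # available in Python 3.8 and later
}


def combine(view_distances: Sequence[float], rule: str) -> float:
    """Total distance between two subjects under the named combination rule."""
    return COMBINATION_RULES[rule](view_distances)
```

Unlike the weighted combination, none of these rules uses any information about how discriminative each view is.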

Table 2: The recognition rates of the proposed and the other five combination methods.
Figure 5: Cumulative match scores for the proposed and the other five combination methods.

Subsequently, we applied the proposed methodology for the determination of the weights in (3). Based on (19), the weights for the combination of the distances of the available views were calculated and are tabulated in Table 3. As seen, the most suitable views appear to be the frontal (south) and the side (east) views, since these views are given the greatest weights.

Table 3: The weights calculated by the proposed method.

The above conclusion is experimentally verified by studying the recognition performance that corresponds to each of the views independently. The cumulative match scores and the recognition rates that are achieved using each view as well as those achieved by the proposed method are shown in Figure 6 and Table 4, respectively. As we can see, the south and the east views have the highest recognition rates, as well as the highest weights, which means that the weights calculated by the proposed method correctly reflect the importance of the views. The results obtained by the proposed combination method are superior to those obtained from single views.

Table 4: The recognition rates of the five viewing directions and the proposed combination method.
Figure 6: Cumulative match scores for five viewing directions and the proposed combination method.

Since superior results are generally achieved using the frontal (south) and side (east) views (see Figure 7), the proposed method was also used to combine those two views. Figure 8 shows that the combination of the east and the south views using the proposed method has much better performance than using the views individually. It is interesting to note that, in theory, using two views should be sufficient for capturing the 3D information in a sequence. Although here we use silhouettes (so there is no texture that could be used for the estimation of 3D correspondence), the combination of these two views appears to be very effective. By trying other combinations of two views, we discovered that the optimal combination of the east and the south views is the only one which outperforms all single views.

Figure 7: Frontal view and side view.
Figure 8: Cumulative match scores for the east and the south viewing directions and the proposed combination method.

The proposed system was also evaluated in terms of verification performance. The most widely used method for this task is to present receiver operating characteristic (ROC) curves. In an access control scenario, this means calculating the probability of positive recognition of an authorized subject versus the probability of granting access to an unauthorized subject. In order to calculate the above probabilities, different thresholds were set for examining the distances between the test and reference sequences. We calculated the distances for each of the five views, and combined them using the proposed weights and the five existing methods mentioned above. Figure 9 shows the ROC curves of the methods using single views and combined views. In Table 5, verification results are presented at 5%, 10%, and 20% false alarm rates for the proposed method and the existing methods. As seen, among the five viewing directions, the frontal (south) and side (east) views have the best performance, and among the five existing combination methods, the Min method obtains the best results. As expected, the proposed method has superior verification performance in comparison to any of the single-view methods as well as in comparison to the other methods for multiview recognition.
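The threshold sweep behind an ROC curve, and the reading of a verification rate at a fixed false alarm rate, can be sketched as follows. This is an illustrative sketch rather than the procedure used to produce the reported figures; it assumes that a claim is accepted when the distance does not exceed the threshold, and the names `roc_points` and `verification_rate_at` are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(genuine: Sequence[float],
               impostor: Sequence[float]) -> List[Tuple[float, float]]:
    """(false alarm rate, verification rate) for every candidate threshold.

    genuine  : distances between a test subject and its own reference subject.
    impostor : distances between a test subject and any other reference subject.
    Assumption: a claim is accepted when its distance is <= the threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both genuine and impostor distances are required")
    points = []
    for t in sorted(set(genuine) | set(impostor)):
        false_alarm = sum(d <= t for d in impostor) / len(impostor)
        verification = sum(d <= t for d in genuine) / len(genuine)
        points.append((false_alarm, verification))
    return points


def verification_rate_at(genuine: Sequence[float],
                         impostor: Sequence[float],
                         max_false_alarm: float) -> float:
    """Best verification rate whose false alarm rate stays within the limit."""
    allowed = [vr for far, vr in roc_points(genuine, impostor)
               if far <= max_false_alarm]
    return max(allowed, default=0.0)
```

Calling `verification_rate_at` with limits of 0.05, 0.10, and 0.20 gives operating points of the kind tabulated at 5%, 10%, and 20% false alarm rates.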

Table 5: The verification rates for the single-view and combined-views methods.
Figure 9: The ROC curves: (a) single-view methods and the proposed method, (b) the proposed and five existing combination methods.

5. Conclusion

In this paper, we investigated the exploitation of the availability of various views in a gait recognition system using the MoBo database. We showed that each view has unequal discrimination power and therefore has unequal contribution to the task of gait recognition. A novel approach was proposed for the combination of the results of different views into a common distance metric for the evaluation of similarity between gait sequences. By using the proposed method, which uses different weights in order to exploit the different importance of the views, improved recognition performance was achieved in comparison to the results obtained from individual views or by using other combination methods.

Acknowledgment

This work was supported by the European Commission funded FP7 ICT STREP Project ACTIBIO, under contract no. 215372.

References

  1. N. V. Boulgouris, D. Hatzinakos, and K. N. Plataniotis, “Gait recognition: a challenging signal processing technology for biometric identification,” IEEE Signal Processing Magazine, vol. 22, no. 6, p. 78, 2005.
  2. G. Johansson, “Visual motion perception,” Scientific American, vol. 232, no. 6, p. 76, 1975.
  3. J. E. Cutting and L. T. Kozlowski, “Recognizing friends by their walk: gait perception without familiarity cues,” Bulletin of the Psychonomic Society, vol. 9, no. 5, p. 353, 1977.
  4. L. Lee and W. E. L. Grimson, “Gait analysis for recognition and classification,” in Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '02), p. 148, Washington, DC, USA, May 2002.
  5. S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, and K. W. Bowyer, “The humanID gait challenge problem: data sets, performance, and analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, p. 162, 2005.
  6. N. V. Boulgouris, K. N. Plataniotis, and D. Hatzinakos, “Gait recognition using linear time normalization,” Pattern Recognition, vol. 39, no. 5, p. 969, 2006.
  7. M. Ekinci, “Gait recognition using multiple projections,” in Proceedings of the 7th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '06), p. 517, Southampton, UK, April 2006.
  8. R. T. Collins, R. Gross, and J. Shi, “Silhouette-based human identification from body shape and gait,” in Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '02), p. 351, Washington, DC, USA, May 2002.
  9. A. Y. Johnson and A. F. Bobick, “A multi-view method for gait recognition using static body parameters,” in Proceedings of the 3rd International Conference on Audio and Video-Based Biometric Person Authentication (AVBPA '01), p. 301, Halmstad, Sweden, June 2001.
  10. G. Shakhnarovich, L. Lee, and T. Darrell, “Integrated face and gait recognition from multiple views,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), vol. 1, p. 439, Kauai, Hawaii, USA, December 2001.
  11. A. Kale, A. K. R. Chowdhury, and R. Chellappa, “Towards a view invariant gait recognition algorithm,” in Proceedings of IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS '03), p. 143, Miami, Fla, USA, July 2003.
  12. G. Zhao, G. Liu, H. Li, and M. Pietikainen, “3D gait recognition using multiple cameras,” in Proceedings of the 7th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '06), p. 529, Southampton, UK, April 2006.
  13. D. Tao, X. Li, X. Wu, and S. J. Maybank, “General tensor discriminant analysis and Gabor features for gait recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, p. 1700, 2007.
  14. Z. Liu and S. Sarkar, “Improved gait recognition by gait dynamics normalization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 6, p. 863, 2006.
  15. J. Han and B. Bhanu, “Individual recognition using gait energy image,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, p. 316, 2006.
  16. Z. Liu and S. Sarkar, “Simplest representation yet for gait recognition: averaged silhouette,” in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 4, p. 211, Cambridge, UK, August 2004.
  17. G. V. Veres, L. Gordon, J. N. Carter, and M. S. Nixon, “What image information is important in silhouette-based gait recognition?,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, p. 776, Washington, DC, USA, June-July 2004.
  18. R. Gross and J. Shi, “The CMU motion of body (MoBo) database,” Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2001.
  19. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, NY, USA, 2001.
  20. J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, p. 226, 1998.