Department of Electronic Engineering, Division of Engineering, King's College London WC2R2LS, UK
Abstract
Most of the existing gait recognition methods rely on a single view, usually the side view, of
the walking person. This paper investigates the case in which several views are available for gait
recognition. It is shown that each view has unequal discrimination power and, therefore, should have
unequal contribution in the recognition process. In order to exploit the availability of multiple views,
several methods for the combination of the results that are obtained from the individual views are
tested and evaluated. A novel approach for the combination of the results from several views is also
proposed based on the relative importance of each view. The proposed approach generates superior
results, compared to those obtained by using individual views or by using multiple views that are
combined using other combination methods.
1. Introduction
Gait recognition [1] aims at the identification of individuals based on their walking style. Recognition
based on human gait has several advantages related to the unobtrusiveness and
the ease with which gait information can be captured. Unlike other biometrics,
gait can be captured from a distant camera, without drawing the attention of
the observed subject. One of the earliest works studying human gait is that of
Johansson [2], who showed that people are able to recognize human locomotion and to identify
familiar persons, by presenting a series of video sequences of different
patterns of motion to a group of participants. Later, Cutting and Kozlowski [3] used moving light
displays (MLDs) to further show the human ability for person identification and
gender classification.
Although several approaches have been presented for
the recognition of human gait, most of them limit their attention to the case
in which only the side view is available since this viewing angle is considered
to provide the richest information about the gait of the walking person [4–7]. In [8], an experiment was carried
out using two views, namely, the frontal-parallel view and the side view, from
which the silhouettes of the subjects in two walking stances were extracted.
This approach exhibited higher recognition accuracy for the frontal-parallel
view than that of the side view. The side view was also examined in [9] together with another view
from a different angle, and the static parameters, such as the height of the
walking person, as well as distances between body parts, were used in the
template matching. Apart from the recognition rate, results were also reported
based on a small sample set using a confusion metric which reflects the
effectiveness of the approach in the situation of a large population of
subjects. The authors in [10]
synthesize the side view silhouettes from those captured by multiple cameras
employing visual hull techniques. In [11], the perspective projection and optical flow-based
structure of motion approach was taken instead. In [12], information from multiple cameras is gathered to
construct a 3D gait model.
Among recent works, the authors in [13] use improved discriminant
analysis for gait recognition, the authors in [14] use information on gait shape and gait dynamics,
while the authors in [15] use a gait energy image (GEI). However, all of the above
approaches are based only on side view sequences.
In this paper, we use the motion of body (MoBo)
database from the Carnegie Mellon University (CMU) in order to investigate the
contribution of each viewing direction to the recognition performance of a gait
recognition system. In general, we try to answer the fundamental question: if
several views are available to a gait recognition system, what is the most
appropriate way to combine them in order to enhance the performance and the
reliability of the system? We provide a detailed analysis of the role and
the contribution of each viewing direction by reporting recognition results of
systems based on each one of the available views. We also propose a novel way
to combine the results obtained from different single views. In the proposed
approach, we set a weight for each view, based on its importance as it is
calculated using statistical processing of the differences between views. The
experimental results demonstrate the superior performance of the proposed
weighted combination approach in comparison to the single-view approach and
other combination methods for multiple views.
The paper is organized as follows. Section 2 presents
the recognition performance of individual views in a multiview system. The
proposed method for the combination of different views is presented in Section
3. Section 4 reports the detailed results using the proposed approach for
the combination of several views. Finally, conclusions are drawn in Section 5.
2. Gait Recognition Using Multiple Views
The CMU MoBo database does not explicitly define a reference set and test sets,
as is done in [5].
In this work, we use the “fast walk” sequences as the reference set, and
the “slow walk” sequences as the test set. As mentioned in the introduction,
our goal is to find out which viewing directions have the greatest contribution
in a multiview gait recognition system. To this end, we adopt a simple and
straightforward way in order to determine the similarity between gait sequences
in the reference and test databases. Specifically, from each gait sequence,
taken from a specific viewpoint, we construct a simple template $T$ by averaging
all frames in the sequence:
$$T = \frac{1}{N_T} \sum_{\alpha=1}^{N_T} t_\alpha, \tag{1}$$
where $t_\alpha$, $\alpha = 1, \ldots, N_T$, are the silhouettes in a gait sequence and $N_T$ is the number
of silhouettes. This approach for template construction was also taken in
[15–17].
Let $T_i$ and $R_j$ denote the templates corresponding to the $i$th and the $j$th subjects in
the test database and the reference database, respectively. Their distance is
calculated using the following distance metric:
$$d_{ij}(T_i, R_j) = \left\| T_i - R_j \right\| = \left\| \frac{1}{N_{T_i}} \sum_{\alpha=1}^{N_{T_i}} t_{i\alpha} - \frac{1}{N_{R_j}} \sum_{\beta=1}^{N_{R_j}} r_{j\beta} \right\|, \tag{2}$$
where $\|\cdot\|$ is the $\ell_2$-norm and $t_{i\alpha}$, $r_{j\beta}$ are the
silhouettes belonging to the $i$th test subject and the $j$th reference
subject, respectively. The associated frame indices $\alpha$ and $\beta$
run from 1 to the total number of silhouettes in a
sequence ($N_{T_i}$ and $N_{R_j}$, resp.). Essentially, a template is produced
for each subject by averaging all silhouettes in the gait sequence, and
the Euclidean distance between two templates is taken as a
measure of their dissimilarity. In practice, this means that a smaller template
distance corresponds to a closer match between two compared subjects.
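The template construction and the template distance described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `average_template` and `template_distance` are hypothetical, silhouettes are assumed to be equally sized binary images stored as nested lists, and the 2x2 toy frames stand in for real silhouettes.

```python
import math

def average_template(silhouettes):
    """Pixel-wise mean of equally sized silhouettes (template averaging)."""
    n = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[r][c] for s in silhouettes) / n for c in range(cols)]
            for r in range(rows)]

def template_distance(t, r):
    """Euclidean (l2) distance between two templates of equal size."""
    return math.sqrt(sum((a - b) ** 2
                         for row_t, row_r in zip(t, r)
                         for a, b in zip(row_t, row_r)))

# Toy 2x2 "silhouettes" (assumed data, two frames per sequence).
test_seq = [[[1, 0], [1, 1]], [[1, 0], [0, 1]]]
ref_seq = [[[1, 0], [1, 1]], [[1, 1], [1, 1]]]

T = average_template(test_seq)
R = average_template(ref_seq)
d = template_distance(T, R)  # smaller values indicate a closer match
```

Note that the two sequences may contain different numbers of frames, since each template is normalized by its own sequence length.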
In order to evaluate the contribution of various
viewing directions to human gait recognition, we use the MoBo database [18]
from the CMU, which contains walking subjects
captured by six cameras located in the positions shown in Figure 1. The
database consists of walking sequences of 23 male and 2 female subjects, who
were recorded performing four kinds of activities, including fast walk and slow
walk. Before the application of our methodologies, we extract the bounding
boxes of the silhouettes, then align and normalize all silhouettes so that they
have uniform dimensions, that is, 128 pixels tall and 80 pixels wide, in order
to eliminate height differences between the walking subjects. We use five (see
Figure 2) out of the six available viewing directions, omitting the north view,
since it is practically identical to the south view (i.e., the frontal view).
The cumulative match scores for each of these five viewing directions are shown
in Figure 4, and the recognition rates at rank 1 and rank 5 are reported in
Table 1.
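As an illustration of the preprocessing step, the following sketch crops a binary silhouette to its bounding box and rescales it to 128 x 80 pixels. The exact alignment and interpolation procedure is not specified in the text, so the nearest-neighbour resampling used here, as well as the function names `bounding_box` and `normalize_silhouette`, are assumptions made for the example.

```python
def bounding_box(sil):
    """Return (top, bottom, left, right) of the foreground pixels, inclusive."""
    rows = [r for r, row in enumerate(sil) if any(row)]
    cols = [c for c in range(len(sil[0])) if any(row[c] for row in sil)]
    return rows[0], rows[-1], cols[0], cols[-1]

def normalize_silhouette(sil, out_h=128, out_w=80):
    """Crop to the bounding box and rescale with nearest-neighbour sampling.

    Assumes the silhouette contains at least one foreground pixel.
    """
    top, bottom, left, right = bounding_box(sil)
    h, w = bottom - top + 1, right - left + 1
    return [[sil[top + (r * h) // out_h][left + (c * w) // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

# Toy 6x6 frame with a small foreground region (assumed data).
frame = [[0] * 6 for _ in range(6)]
frame[2][4] = frame[3][4] = 1
norm = normalize_silhouette(frame)
```

Stretching the bounding box to a fixed size removes height differences between subjects, at the cost of discarding absolute scale.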
Table 1: The recognition
rates of the five viewing directions reported at rank 1 and rank 5.
Figure 1: Camera arrangement in the CMU MoBo
database. Six cameras are oriented clockwise in the east, southeast, south,
southwest, northwest, north, with the walking subject facing toward the south.
Figure 2: Available views for multiview gait recognition.
Figure 3: Templates constructed using the five available views.
Figure 4: Cumulative match scores for five viewing directions,
namely, the east, southeast, south, southwest, and the northwest.
One can see
clearly from Table 1 that the results obtained using the south and the east
viewing directions are the best, especially at rank 1. Results achieved using
the rest of the viewing directions are worse. This is a clear indication that
the south and the east viewing directions capture most of the gait information
of the walking subjects and, therefore, are the most discriminant viewing
directions. In the next section, we will show how to combine results from
several viewing directions in order to achieve improved recognition performance.
3. Combination of Different Views Using a Single Distance Metric
In this
section, we propose a novel method for the combination of results from
different views in order to improve the performance of a gait recognition
system. In our approach, we use weights in order to reflect the importance of
each view during the combination. This means that instead of using a single
distance for the evaluation of similarity between walking persons $i$
and $j$, we use multiple distances $d_{ij}^{v}$ between the respective views
and combine them in a total distance which is given by
$$D_{ij} = \sum_{v=1}^{V} w_v d_{ij}^{v}, \tag{3}$$
where $V$ is the total
number of available views. Therefore, our task is to determine the weights
$w_v$, which yield a smaller total distance when $i = j$, and a larger one when
$i \neq j$.
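A minimal sketch of how such a weighted total distance could be used for identification is given below. The function names `total_distance` and `identify`, and all numerical values, are hypothetical and serve only to show how the choice of weights changes the decision.

```python
def total_distance(view_distances, weights):
    """Weighted sum of the per-view distances between two subjects."""
    if len(view_distances) != len(weights):
        raise ValueError("one weight per view is required")
    return sum(w * d for w, d in zip(weights, view_distances))

def identify(per_reference_distances, weights):
    """Index of the reference subject with the smallest total distance."""
    totals = [total_distance(d, weights) for d in per_reference_distances]
    return min(range(len(totals)), key=totals.__getitem__)

# Assumed distances from one test subject to three reference subjects,
# each measured in two views.
distances = [[4.0, 1.0], [1.0, 3.0], [3.0, 2.5]]
weights = [0.8, 0.2]
best = identify(distances, weights)
```

With these assumed numbers, emphasizing the first view selects reference subject 1, whereas swapping the weights selects reference subject 0.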
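Suppose that $d_f^v$, $v = 1, \ldots, V$, are random variables representing the distances
between a test subject and its corresponding reference subject (i.e., “within class” distance), and
$d_b^v$, $v = 1, \ldots, V$, are random variables representing the distances
between a test subject and a reference subject other than its corresponding
subject (i.e., “between class” distance).
In order to maximize the efficiency of our system, we
first define the weighted distance $D_f$ between
corresponding subjects in the reference and test databases:
$$D_f = \sum_{v=1}^{V} w_v d_f^v = \mathbf{w}^T \mathbf{d}_f, \tag{4}$$
and the weighted distance $D_b$ between
noncorresponding subjects:
$$D_b = \sum_{v=1}^{V} w_v d_b^v = \mathbf{w}^T \mathbf{d}_b, \tag{5}$$
where $\mathbf{w} = [w_1, \ldots, w_V]^T$, $\mathbf{d}_f = [d_f^1, \ldots, d_f^V]^T$, and $\mathbf{d}_b = [d_b^1, \ldots, d_b^V]^T$.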
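In an ideal gait recognition system, $D_f$ should always
be smaller than $D_b$. In practice, a recognition error takes place
whenever $D_b < D_f$. Therefore, the probability of error is
$$P_e = P(D_b < D_f) = P\left(\mathbf{w}^T (\mathbf{d}_b - \mathbf{d}_f) < 0\right). \tag{6}$$
We define the random variable $z$ as
$$z = \mathbf{w}^T (\mathbf{d}_b - \mathbf{d}_f). \tag{7}$$
If we assume that $\mathbf{d}_b$ and $\mathbf{d}_f$ are normal
random vectors, then $z$ is a normal
random variable with probability density function
$$p(z) = \frac{1}{\sqrt{2\pi}\,\sigma_z} \exp\left(-\frac{(z - m_z)^2}{2\sigma_z^2}\right), \tag{8}$$
where $m_z$ is the mean value of $z$ and $\sigma_z^2$ is the variance of $z$.
Therefore, using (7) and (8), the probability of error
in (6) is expressed as
$$P_e = P(z < 0) = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi}\,\sigma_z} \exp\left(-\frac{(z - m_z)^2}{2\sigma_z^2}\right) dz. \tag{9}$$
Furthermore, if $q = (z - m_z)/\sigma_z$, then the above expression is equivalent
to
$$P_e = \int_{-\infty}^{-m_z/\sigma_z} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{q^2}{2}\right) dq. \tag{10}$$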
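Under the normality assumption, the error probability is the standard normal cumulative distribution function evaluated at the negated ratio of the mean to the standard deviation of the difference between the between-class and within-class total distances. The short sketch below evaluates it with the complementary error function; the function name `error_probability` and the argument values are assumptions chosen for illustration.

```python
import math

def error_probability(m_z, sigma_z):
    """P(z < 0) for a normal z with mean m_z and standard deviation sigma_z."""
    # Phi(-m_z / sigma_z) = 0.5 * erfc(m_z / (sigma_z * sqrt(2)))
    return 0.5 * math.erfc(m_z / (sigma_z * math.sqrt(2.0)))

# A larger mean-to-deviation ratio gives a smaller error probability.
p_low_separation = error_probability(1.0, 1.0)
p_high_separation = error_probability(2.0, 1.0)
```

This makes the monotonic relationship explicit: any choice of weights that increases the ratio of the mean to the standard deviation lowers the probability of a recognition error.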
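The probability of error can therefore be minimized by
minimizing $-m_z/\sigma_z$, or equivalently by maximizing
$m_z/\sigma_z$. To this end, we have to calculate $m_z$
and $\sigma_z$. If $E\{\cdot\}$ denotes
statistical expectation, then the mean value of $z$ is
$$m_z = E\{z\} = E\left\{\mathbf{w}^T (\mathbf{d}_b - \mathbf{d}_f)\right\} = \mathbf{w}^T (\mathbf{m}_{d_b} - \mathbf{m}_{d_f}), \tag{11}$$
where $\mathbf{m}_{d_b}$ and $\mathbf{m}_{d_f}$ are the mean
vectors of $\mathbf{d}_b$ and $\mathbf{d}_f$. The variance of $z$ is
$$\sigma_z^2 = E\left\{(z - m_z)^2\right\} = \mathbf{w}^T E\left\{(\mathbf{d}_b - \mathbf{m}_{d_b})(\mathbf{d}_b - \mathbf{m}_{d_b})^T\right\} \mathbf{w} + \mathbf{w}^T E\left\{(\mathbf{d}_f - \mathbf{m}_{d_f})(\mathbf{d}_f - \mathbf{m}_{d_f})^T\right\} \mathbf{w} - 2\,\mathbf{w}^T E\left\{(\mathbf{d}_b - \mathbf{m}_{d_b})(\mathbf{d}_f - \mathbf{m}_{d_f})^T\right\} \mathbf{w}. \tag{12}$$
If we assume that $\mathbf{d}_b$ and $\mathbf{d}_f$ are
independent, then
$$\sigma_z^2 = \mathbf{w}^T \Sigma_{d_b} \mathbf{w} + \mathbf{w}^T \Sigma_{d_f} \mathbf{w}, \tag{13}$$
where $\Sigma_{d_b}$ and $\Sigma_{d_f}$ are the covariance matrices of $\mathbf{d}_b$ and $\mathbf{d}_f$.
Therefore, the optimization problem becomes equivalent
to maximizing
$$\frac{m_z^2}{\sigma_z^2} = \frac{\mathbf{w}^T \Sigma_{d_c} \mathbf{w}}{\mathbf{w}^T \Sigma_d \mathbf{w}}, \tag{14}$$
where
$$\Sigma_{d_c} = (\mathbf{m}_{d_b} - \mathbf{m}_{d_f})(\mathbf{m}_{d_b} - \mathbf{m}_{d_f})^T, \qquad \Sigma_d = \Sigma_{d_b} + \Sigma_{d_f}. \tag{15}$$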
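The maximization of the above quantity is reminiscent
of the optimization problem that appears in two-class linear discriminant
analysis. Trivially, the ratio can be maximized by determining a vector
$\mathbf{w}$ that satisfies [19]
$$\Sigma_{d_c} \mathbf{w} = \lambda \Sigma_d \mathbf{w} \tag{16}$$
for some $\lambda$. In the case that we are considering, the optimal
$\mathbf{w}$ is given by
$$\mathbf{w} = \Sigma_d^{-1} (\mathbf{m}_{d_b} - \mathbf{m}_{d_f}). \tag{17}$$
If we assume that the distances
corresponding to different views are independent, then
$$\Sigma_d^{-1} = \operatorname{diag}\left(\frac{1}{\sigma^2_{d_b^1} + \sigma^2_{d_f^1}}, \ldots, \frac{1}{\sigma^2_{d_b^V} + \sigma^2_{d_f^V}}\right), \tag{18}$$
where $V$ is the total
number of available views. Therefore, the optimal weight vector is
$$\mathbf{w} = \left[\frac{m_{d_b^1} - m_{d_f^1}}{\sigma^2_{d_b^1} + \sigma^2_{d_f^1}}, \ldots, \frac{m_{d_b^V} - m_{d_f^V}}{\sigma^2_{d_b^V} + \sigma^2_{d_f^V}}\right]^T. \tag{19}$$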
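The per-view weights can be estimated from samples of within-class and between-class distances. The sketch below is a hedged illustration rather than the exact procedure used in the experiments: it assumes independent views, uses the sample mean and sample variance from the Python standard library as estimators, and the names `view_weights` and `separation_ratio` as well as the distance samples are hypothetical.

```python
from statistics import mean, variance

def view_weights(within, between):
    """One weight per view: mean gap divided by the summed variances.

    within[v] and between[v] hold distance samples for view v.
    """
    return [(mean(d_b) - mean(d_f)) / (variance(d_b) + variance(d_f))
            for d_f, d_b in zip(within, between)]

def separation_ratio(weights, within, between):
    """Squared mean over variance of the weighted distance difference."""
    m_z = sum(w * (mean(d_b) - mean(d_f))
              for w, d_f, d_b in zip(weights, within, between))
    var_z = sum(w * w * (variance(d_b) + variance(d_f))
                for w, d_f, d_b in zip(weights, within, between))
    return m_z ** 2 / var_z

# Assumed distance samples for two views; view 0 separates the classes well.
within = [[1.0, 1.2, 0.8], [1.0, 1.5, 0.5]]
between = [[3.0, 3.4, 2.6], [1.5, 2.5, 0.5]]

w = view_weights(within, between)
ratio_opt = separation_ratio(w, within, between)
ratio_uniform = separation_ratio([1.0, 1.0], within, between)
```

In this toy example the first view receives a much larger weight than the second, and the resulting ratio exceeds the one obtained with equal weights, which is the behaviour the derivation predicts.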
Of course, the practical application of the above
theory requires the availability of a database (other than the test database)
which will be used in conjunction with the reference database for the
determination of $\mathbf{m}_{d_b}$, $\mathbf{m}_{d_f}$, $\sigma_{d_b}$, and
$\sigma_{d_f}$. In our experiments, we used the CMU database of
individuals walking with a ball for this purpose.
In the ensuing section, we will use the weight vector
in (19) for the combination of views and the evaluation of the resulting
multiview gait recognition system.
4. Experimental Results
For the
experimental evaluation of our methods, we used the MoBo
database from the CMU. The CMU database has 25
subjects walking on a treadmill. Although this is an artificial setting that
might affect the results, using this database was essentially our only option
since this is the only database that provides five views. We used the “fast
walk” sequences as reference and the “slow walk” as test sequences.
We also used the “with a ball” sequences in conjunction with the
reference sequences for the determination of the weights in (19). The
comparisons of recognition performance are based on cumulative match scores at
rank 1 and rank 5. Rank 1 results report the percentage of subjects in a test
set that were identified exactly. Rank 5 results report the percentage of test
subjects whose actual match in the reference database was in the top 5 matches.
In this section, we present the results generated by the proposed view
combination method. These results are compared to the results obtained using
different single views and other combination methods.
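For concreteness, cumulative match scores can be computed from a matrix of test-to-reference distances as sketched below. The sketch assumes that test subject i corresponds to reference subject i and that ties are resolved in favour of the true match; the function name `cumulative_match_scores` and the distance values are hypothetical.

```python
def cumulative_match_scores(distance_matrix, max_rank):
    """Fraction of test subjects whose true match is within the top k, k = 1..max_rank.

    distance_matrix[i][j] is the distance between test subject i and
    reference subject j; the true match of test subject i is assumed to be j == i.
    """
    n = len(distance_matrix)
    hits = [0] * max_rank
    for i, row in enumerate(distance_matrix):
        # Rank of the true match: 1 plus the number of strictly closer references.
        rank = 1 + sum(1 for j, d in enumerate(row) if j != i and d < row[i])
        for k in range(rank - 1, max_rank):
            hits[k] += 1
    return [h / n for h in hits]

# Assumed distances for three test subjects against three reference subjects.
D = [[0.2, 0.5, 0.9],
     [0.6, 0.7, 0.3],
     [0.8, 0.1, 0.4]]
cms = cumulative_match_scores(D, max_rank=3)
```

The first entry of the returned list is the rank 1 recognition rate, and with 25 subjects the rank 5 rate would be the fifth entry.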
Initially, we tried several simple methods for the
combination of the results obtained using the available views. Specifically,
the total distance between two subjects was taken to be equal to the mean, max, min, median, and product of the distances
corresponding to each of the five viewing directions. Such combination
approaches were originally explored in [20]. As shown in Figure 5 and Table 2, among all the
above combination methods, the most satisfactory results were obtained by using the Product and Min rules.
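The five simple combination rules can be expressed compactly as follows. This is an illustrative sketch only: the dictionary `COMBINERS`, the function `combine`, and the per-view distances are assumptions introduced for the example.

```python
import math
from statistics import mean, median

# Each rule maps the per-view distances between two subjects to one total distance.
COMBINERS = {
    "mean": mean,
    "max": max,
    "min": min,
    "median": median,
    "product": math.prod,
}

def combine(view_distances, rule):
    """Combine per-view distances with one of the simple rules."""
    return COMBINERS[rule](view_distances)

# Assumed distances between one test and one reference subject in five views.
d_views = [2.0, 4.0, 1.0, 3.0, 5.0]
combined = {rule: combine(d_views, rule) for rule in COMBINERS}
```

Unlike the proposed weighted sum, none of these rules takes into account how discriminative each individual view is.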
Table 2: The recognition
rates of the proposed and the other five combination methods.
Figure 5: Cumulative match scores for the proposed and the other
five combination methods.
Next, we applied the proposed methodology for
the determination of the weights in (3). Based on (19), the weights for the
combination of the distances of the available views were calculated and are tabulated
in Table 3. As seen, the most suitable views appear to be the frontal (south)
and the side (east) views, since these views are given the largest weights.
Table 3: The weights
calculated by the proposed method.
The above conclusion is experimentally verified by
studying the recognition performance that corresponds to each of the views
independently. The cumulative match scores and the recognition rates that are
achieved using each view as well as those achieved by the proposed method are
shown in Figure 6 and Table 4, respectively. As we can see, the south and the
east views have the highest recognition rates, as well as the highest weights,
which means that the weights calculated by the proposed method correctly
reflect the importance of the views. The results obtained by the proposed
combination method are superior to those obtained from single views.
Table 4: The recognition
rates of the five viewing directions and the proposed combination method.
Figure 6: Cumulative match scores for five viewing directions
and the proposed combination method.
Since superior results are generally achieved using the frontal (south) and side
(east) views (see Figure 7), the proposed method was also used to combine
those two views. Figure 8 shows that the combination of the east and the south
views using the proposed method performs much better than either of the
views individually. It is interesting to note that, in theory, using two views
should be sufficient for capturing the 3D information in a sequence. Although
here we use silhouettes (so there is no texture that could be used for the
estimation of 3D correspondence), the combination of these two
views appears to be very effective. By trying other combinations of two views, we
discovered that the optimal combination of the east and the south views is the
only one which outperforms all single views.
Figure 7: Frontal view and side view.
Figure 8: Cumulative match scores for the east and the south
viewing directions and the proposed combination method.
The proposed system was also evaluated in terms of
verification performance. The most widely used method for this task is to
present receiver operating characteristic (ROC) curves. In an access control
scenario, this means calculating the probability of positive recognition of an
authorized subject versus the probability of granting access to an unauthorized
subject. In order to calculate the above probabilities, different thresholds
were set for examining the distances between the test and reference sequences.
We calculated the distances for each of the five views and combined them using
the proposed weights and the five other existing methods described above.
Figure 9 shows the ROC curves of the methods using single views and combined
views. In Table 5, verification results are presented at 5%, 10%, and 20% false
alarm rate for the proposed method and the existing methods. As seen, among
the five viewing directions, the frontal (south) and side (east) views have the
best performance, and among the five existing combination methods, the Min method obtains the best results. As expected, the proposed method has superior
verification performance, in comparison to any of the single-view methods as
well as in comparison to the other methods for multiview recognition.
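A verification rate at a given false alarm rate can be obtained by thresholding the distances, as in the sketch below. It assumes that a claim is accepted when the distance is strictly below the threshold, and that the threshold is placed so that at most the requested fraction of impostor distances is accepted; the function name `verification_rate` and all distance values are hypothetical.

```python
import math

def verification_rate(genuine, impostor, false_alarm_rate):
    """Fraction of genuine distances accepted at the given false alarm rate."""
    ranked = sorted(impostor)
    # Number of impostor distances that may fall below the threshold.
    allowed = math.floor(false_alarm_rate * len(ranked) + 1e-9)
    threshold = ranked[allowed] if allowed < len(ranked) else float("inf")
    return sum(1 for g in genuine if g < threshold) / len(genuine)

# Assumed distances for matching (genuine) and non-matching (impostor) pairs.
genuine = [0.1, 0.2, 0.3, 0.9]
impostor = [0.25, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 1.0, 1.1, 1.2]

rates = {far: verification_rate(genuine, impostor, far)
         for far in (0.0, 0.1, 0.2)}
```

Sweeping the false alarm rate from 0 to 1 and recording the resulting verification rates traces out one ROC curve.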
Table 5: The
verification rates for the single-view and combined-views methods.
Figure 9: The ROC curves: (a) single-view methods and the
proposed method, (b) the proposed and five existing combination methods.
5. Conclusion
In this paper,
we investigated the exploitation of the availability of various views in a gait
recognition system using the MoBo database. We showed that
each view has unequal discrimination power and therefore has unequal
contribution to the task of gait recognition. A novel approach was proposed for
the combination of the results of different views into a common distance metric
for the evaluation of similarity between gait sequences. By using the proposed
method, which uses different weights in order to exploit the different
importance of the views, improved recognition performance was achieved in
comparison to the results obtained from individual views or by using other
combination methods.
Acknowledgment
This work was supported by the European Commission funded FP7 ICT STREP Project ACTIBIO,
under contract no. 215372.
References
[1] N. V. Boulgouris, D. Hatzinakos, and K. N. Plataniotis, “Gait recognition: a challenging signal processing technology for biometric identification,” IEEE Signal Processing Magazine, vol. 22, no. 6, p. 78, 2005.
[2] G. Johansson, “Visual motion perception,” Scientific American, vol. 232, no. 6, p. 76, 1975.
[3] J. E. Cutting and L. T. Kozlowski, “Recognizing friends by their walk: gait perception without familiarity cues,” Bulletin of the Psychonomic Society, vol. 9, no. 5, p. 353, 1977.
[4] L. Lee and W. E. L. Grimson, “Gait analysis for recognition and classification,” in Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '02), p. 148, Washington, DC, USA, May 2002.
[5] S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, and K. W. Bowyer, “The humanID gait challenge problem: data sets, performance, and analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, p. 162, 2005.
[6] N. V. Boulgouris, K. N. Plataniotis, and D. Hatzinakos, “Gait recognition using linear time normalization,” Pattern Recognition, vol. 39, no. 5, p. 969, 2006.
[7] M. Ekinci, “Gait recognition using multiple projections,” in Proceedings of the 7th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '06), p. 517, Southampton, UK, April 2006.
[8] R. T. Collins, R. Gross, and J. Shi, “Silhouette-based human identification from body shape and gait,” in Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '02), p. 351, Washington, DC, USA, May 2002.
[9] A. Y. Johnson and A. F. Bobick, “A multi-view method for gait recognition using static body parameters,” in Proceedings of the 3rd International Conference on Audio and Video-Based Biometric Person Authentication (AVBPA '01), p. 301, Halmstad, Sweden, June 2001.
[10] G. Shakhnarovich, L. Lee, and T. Darrell, “Integrated face and gait recognition from multiple views,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), vol. 1, p. 439, Kauai, Hawaii, USA, December 2001.
[11] A. Kale, A. K. R. Chowdhury, and R. Chellappa, “Towards a view invariant gait recognition algorithm,” in Proceedings of IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS '03), p. 143, Miami, Fla, USA, July 2003.
[12] G. Zhao, G. Liu, H. Li, and M. Pietikainen, “3D gait recognition using multiple cameras,” in Proceedings of the 7th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '06), p. 529, Southampton, UK, April 2006.
[13] D. Tao, X. Li, X. Wu, and S. J. Maybank, “General tensor discriminant analysis and Gabor features for gait recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, p. 1700, 2007.
[14] Z. Liu and S. Sarkar, “Improved gait recognition by gait dynamics normalization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 6, p. 863, 2006.
[15] J. Han and B. Bhanu, “Individual recognition using gait energy image,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, p. 316, 2006.
[16] Z. Liu and S. Sarkar, “Simplest representation yet for gait recognition: averaged silhouette,” in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 4, p. 211, Cambridge, UK, August 2004.
[17] G. V. Veres, L. Gordon, J. N. Carter, and M. S. Nixon, “What image information is important in silhouette-based gait recognition?,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, p. 776, Washington, DC, USA, June-July 2004.
[18] R. Gross and J. Shi, “The CMU motion of body (MoBo) database,” Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa, USA, 2001.
[19] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, NY, USA, 2001.
[20] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, p. 226, 1998.