Advances in Mechanical Engineering
Volume 2013 (2013), Article ID 546206, 8 pages
http://dx.doi.org/10.1155/2013/546206
Research Article

An Improved Pedestrian Detection Algorithm Integrating Haar-Like Features and HOG Descriptors

1Beijing Urban Engineering Design and Research Institute, Beijing 100037, China
2College of Information Engineering, North China University of Technology, Beijing 100041, China

Received 24 July 2013; Revised 9 October 2013; Accepted 23 October 2013

Academic Editor: Wuhong Wang

Copyright © 2013 Yun Wei et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Considering the importance of pedestrian detection in applications such as advanced robots and intelligent surveillance systems, this paper presents an improved pedestrian detection method that integrates Haar-like features, the AdaBoost algorithm, the histogram of oriented gradients (HOG) descriptor, and support vector machine (SVM) classifiers, making particular use of head and shoulder information. Owing to the fast training speed of Haar-like features and the high detection efficiency of HOG features, the proposed method can classify pedestrians precisely and at higher speed. Experimental results validate the efficiency and effectiveness of the proposed algorithm.

1. Introduction

As an important part of visual analysis, pedestrian detection in images and videos has been a long-standing research topic in computer vision. It is a necessary precondition for intelligent surveillance, automated vehicles, advanced human-machine interfaces [1–3], the analysis of pedestrian safety crossing behavior [4], and so forth. Many algorithms have been proposed and investigated in the large number of studies conducted both domestically and internationally [5–7]. Although many experimental results have demonstrated the feasibility of these algorithms, pedestrian detection remains a challenging task. The appearance of pedestrians varies widely with pose, clothing and accessories, lighting conditions, and background complexity, making it difficult to detect pedestrians efficiently and accurately in different circumstances.

Recently there have been numerous efforts in the field of pedestrian detection. Most work is based on statistical classification and focuses on feature extraction and the sample training algorithm. The representative feature extraction methods are based on Haar-like features [8] and HOG features [5]. Viola et al. [8] proposed Haar-like features and built an efficient moving person detector which, given large training sets, runs at a comparatively high speed but with a low detection rate. Dalal and Triggs [5] proposed HOG features for pedestrian representation, using video frames and achieving relatively high accuracy but low speed. There are two main sample training algorithms, namely, the AdaBoost learning algorithm and the SVM algorithm. The AdaBoost detection method is a classification algorithm based on extended Haar-like features, a cascaded classifier, and the AdaBoost algorithm. It exploits the capacity of weak classifiers (subclassifiers) and adopts boosting to build a strong classifier (stage classifier), finally generating a cascade structure for effective human detection. The SVM algorithm provides a classification method based on the idea of structural risk minimization (SRM) [9–11]. It is one of the most sophisticated nonparametric supervised classifiers available today, with many different configurations depending on the functions used for generating the transform space for decision surface construction [10]. The aim of SVM is to achieve a low probability of generalization error.

This paper presents a pedestrian detection approach based on both Haar-like features and HOG descriptors, taking advantage of head and shoulder feature information. The detection process includes two steps. The first step uses the Haar-like features implemented in OpenCV with the default Real AdaBoost method for rough filtering. The second step extracts the HOG descriptors and trains an SVM classifier for more accurate detection.

2. Key Points of Fast and Effective Pedestrian Detection

In this section, the whole detection scheme is presented, including the extraction of Haar-like features, the creation of HOG descriptors, classifiers organization, and pedestrian detection implementation.

2.1. Detection Scheme

Our pedestrian detection scheme consists of two main modules: the training process of positive and negative samples and the detection process, as shown in Figure 1.

Figure 1: Framework of Pedestrian Detection.

Training process: positive and negative samples are collected from images of different sizes and normalized to one fixed size for Haar-like classifier training and to another fixed size for HOG training. First, the original Haar-like features implemented in OpenCV are used to describe human characteristics. It is worth mentioning that these Haar-like features are multiscale and oriented, which simplifies the calculation of the pedestrian region. Then, each feature is trained to form a weak classifier, and the weak classifiers are combined into a cascade classifier with the Real AdaBoost method. Furthermore, the gradient calculation module computes the gradient of each positive and negative sample to construct HOG features. At the end of the training process, an SVM classifier is trained on the extracted HOG features for better detection.

Detection process: the original image is input into the Haar-like feature classifier, truncated at an appropriate stage. After this rough classification into pedestrian and nonpedestrian regions, the screened regions are passed to the HOG filter, which uses the SVM for further detection, and the final results of the algorithm are reported.
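
The two-step detection process can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the names `detect_two_stage`, `haar_accepts`, and `hog_svm_score` are hypothetical, and the two classifiers are replaced by toy stand-in functions so that the control flow is visible.

```python
from typing import Callable, Iterable, List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def detect_two_stage(
    windows: Iterable[Window],
    haar_accepts: Callable[[Window], bool],
    hog_svm_score: Callable[[Window], float],
    svm_threshold: float,
) -> List[Window]:
    """Return windows that pass the coarse Haar filter and the HOG+SVM check."""
    detections = []
    for window in windows:
        # Step 1: the cheap Haar-like classifier screens out most windows.
        if not haar_accepts(window):
            continue
        # Step 2: the costlier HOG+SVM score is computed only for survivors.
        if hog_svm_score(window) > svm_threshold:
            detections.append(window)
    return detections


# Toy stand-ins (hypothetical): accept wide-enough windows, score by height.
candidates = [(0, 0, 16, 32), (5, 5, 32, 64), (9, 9, 32, 20)]
found = detect_two_stage(
    candidates,
    haar_accepts=lambda w: w[2] >= 32,
    hog_svm_score=lambda w: w[3] / 32.0 - 1.0,
    svm_threshold=0.5,
)
```

In this toy run the first window is rejected by the coarse filter, the third is rejected by the second step, and only the second window is reported. Because the second step runs only on screened regions, its cost scales with the number of survivors rather than with the number of scanned windows.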

2.2. Haar-Like Features and HOG Creation
2.2.1. Haar-Like Features

Haar-like features are digital image features used in object detection. They owe their name to their intuitive similarity with Haar wavelets and were used in the first real-time object detector. Instead of working only with image intensities (i.e., the RGB values at each pixel of the image), Papageorgiou et al. [12] proposed working with an alternative feature set based on Haar wavelets. Viola and Jones [13] adopted this idea and developed the so-called Haar-like features.

Haar-like features consider adjacent rectangular regions at a specific location in a detection window, sum up the pixel intensities in each region, and calculate the difference between these sums. This difference is then used to categorize subsections of an image. A two-rectangle Haar-like feature can be defined as the difference between the sum of pixels within two rectangular regions, which can be at any position or scale of the original image. The basic regions have the same size and shape, horizontally or vertically adjacent to each other [14, 15] (see Figure 2).

Figure 2: Several types of basic Haar features.

Given that the base resolution of the detector is fixed, the exhaustive set of rectangle features is quite large. However, with integral image, Haar-like features can be calculated efficiently and conveniently. Integral images are defined as two-dimensional lookup tables in the form of a matrix with the same size of the original image. Each element of the integral image contains the sum of all pixels located on the up-left region of the original image (in relation to the position of the element) [16]. It allows counting any rectangular sum in the image, at any position or scale, using only four array references (see Figure 3).

Figure 3: Integral image principle (the sum over a rectangle is obtained from the integral image values at its four corner points).
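
The integral image and the four-reference rectangular sum described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the table is padded with one extra row and column of zeros (so it is one element larger than the image in each dimension) to avoid special cases at the border.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left pixel (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(image)
total = rect_sum(ii, 0, 0, 4, 3)            # whole image
inner = rect_sum(ii, 1, 1, 2, 2)            # 6 + 7 + 10 + 11
feature = two_rect_feature(ii, 0, 0, 4, 3)  # columns 0-1 minus columns 2-3
```

Once the table is built in a single pass, every rectangle sum costs the same four lookups regardless of the rectangle size, which is what makes evaluating a large set of Haar-like features affordable.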

In the detection phase, a multiscale window is moved over the input image, and for each subsection of the image the Haar-like feature is calculated. The resulting difference is then compared with a trained threshold that separates nonpedestrians from pedestrians. Because such a Haar-like feature is only a weak learner or classifier, a large number of Haar-like features are needed to describe an object with sufficient accuracy. Motivated by the work of Tieu and Viola, feature selection is achieved through a simple modification of the AdaBoost procedure; that is, the weak learner is constrained so that each weak classifier returned depends on only a single feature [17]. In the Viola-Jones object detection framework, the Haar-like features are organized into so-called cascade classifiers to form a strong learner or classifier.
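
How a cascade of such single-feature weak classifiers rejects a window early can be sketched as below. The data layout (a list of stages, each holding weak classifiers and a stage threshold), the function name `cascade_accepts`, and the `max_stages` argument are assumptions made for illustration; they do not reproduce the OpenCV cascade format.

```python
def cascade_accepts(feature_values, stages, max_stages=None):
    """Evaluate boosted stages in order; reject at the first failing stage.

    stages: list of (weak_classifiers, stage_threshold); each weak classifier
    is (feature_index, threshold, vote_if_above, vote_if_below).
    max_stages: use only the first max_stages stages (None means all).
    """
    for weak_classifiers, stage_threshold in stages[:max_stages]:
        score = 0.0
        for index, threshold, above, below in weak_classifiers:
            # Each weak classifier looks at exactly one feature value.
            score += above if feature_values[index] > threshold else below
        if score < stage_threshold:
            return False  # early rejection: later stages are never evaluated
    return True


# Hypothetical two-stage cascade over three precomputed feature values.
stages = [
    ([(0, 10.0, 1.0, -1.0)], 0.0),
    ([(1, 5.0, 0.8, -0.8), (2, 0.0, 0.6, -0.6)], 1.0),
]
window_features = [12.0, 7.0, -3.0]
passes_one_stage = cascade_accepts(window_features, stages, max_stages=1)
passes_all_stages = cascade_accepts(window_features, stages)
```

The example window passes when only the first stage is used but is rejected by the full cascade, which illustrates why using fewer stages accepts more windows, true pedestrians and false positives alike.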

2.2.2. HOG Creation

The histogram of oriented gradients (HOG) [5] is a feature descriptor built from gradient orientation information. Its basic idea is that the appearance and shape of a local object can be described quite well by the distribution of local intensity gradients or edge directions. The method uses normalized local histograms of image gradient orientations as image features. In practice, the detection window is divided into small regions called “cells” (labeled Cell in Figure 4), each of a fixed size in pixels, and for every cell a local 1D histogram of gradient directions is accumulated over the pixels of the cell, with each pixel's vote weighted by its gradient magnitude. The combined histograms then form the object representation as a feature vector. For better robustness to illumination, shadowing, and so forth, the local responses are contrast normalized by accumulating a measure of local histogram energy over larger regions called “blocks” (labeled Block in Figure 4). This measure is used to normalize all the cells within the block and obtain the final feature vector. Each block also has a fixed size in pixels, and adjacent blocks overlap each other (see Figure 4). The normalized descriptor blocks are referred to as HOG descriptors.

Figure 4: Overview of HOG feature vector constructed by cells and blocks.
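
The cell histogram and block normalization can be illustrated with the short sketch below. Several choices here are assumptions rather than the settings of this paper: 9 orientation bins over 0 to 180 degrees, centered-difference gradients, votes weighted by gradient magnitude, and L2 block normalization. For brevity the sketch also skips the border pixels of the cell, whereas a full implementation would compute gradients over the whole detection window.

```python
import math


def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # centered differences
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def normalize_block(cell_histograms, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize them."""
    vector = [v for hist in cell_histograms for v in hist]
    norm = math.sqrt(sum(v * v for v in vector) + eps * eps)
    return [v / norm for v in vector]


# A 4x4 patch with a vertical edge: intensity changes only along x.
patch = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_histogram(patch)
block = normalize_block([hist, hist])
```

For this patch all gradient energy falls into the first orientation bin, and the normalized block vector has (approximately) unit length, so a uniform change of contrast in the patch would leave the descriptor essentially unchanged.
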
2.3. Real AdaBoost for Haar Features

Boosting algorithm is a popular machine learning method with short period of learning time and high detection rate. Despite the significant use of boosting in reducing the error of a “weak” learning algorithm, the true practical value of boosting can only be assessed by testing the method on “real” learning problems [18]. The first effective boosting algorithm was presented by Freund and Schapire, and the improved AdaBoost algorithm was proposed to make it more practical and easier to implement. The general principle of AdaBoost is to combine linearly a series of weak classifiers to produce a superior classifier. AdaBoost chooses the best weak classifier to minimize the upper bound of training error, decrease the weights of correctly classified samples, and increase the weights of wrongly classified training samples [19]. Due to this scheme, many AdaBoost based algorithms for face and pedestrian detection have been developed with impressive performances in terms of both speed and accuracy, following the famous face detector proposed by Viola and Jones [13].

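In [20], Schapire and Singer improved the AdaBoost algorithm by extending the discrete binary decision rule of the weak classifier so that the domain of the classifier output is the set of real numbers ($\mathbb{R}$). The new algorithm is therefore called Real AdaBoost. For a detection sample, the output of the Real AdaBoost method indicates the likelihood that a candidate region contains a pedestrian, whereas the output of Discrete AdaBoost only identifies the region as a pedestrian or not. The process of the Real AdaBoost algorithm can be described as follows.
(i) Suppose $S = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ are the training samples, where $y_i \in \{+1, -1\}$ for positive and negative samples, respectively, and $N$ is the total number of samples. Initialize the probability distribution as $D_1(i) = 1/N$, $i = 1, \ldots, N$.
(ii) For $t = 1, \ldots, T$ ($T$ is the number of training iterations):
(a) for each weak classifier $h$ in the space of weak classifiers $\mathcal{H}$,
(1) separate the sample space $X$ into $X_1, X_2, \ldots, X_n$,
(2) calculate $W_l^j = \sum_{i:\, x_i \in X_j,\, y_i = l} D_t(i)$, where $l = \pm 1$,
(3) set the output of the weak classifier to $h(x) = \frac{1}{2} \ln \frac{W_{+1}^j + \varepsilon}{W_{-1}^j + \varepsilon}$ for all $x \in X_j$, where $\varepsilon$ is an arbitrarily small positive number,
(4) calculate the normalization factor of the weights: $Z = 2 \sum_j \sqrt{W_{+1}^j W_{-1}^j}$;
(b) choose the weak classifier $h_t$ that minimizes $Z$, that is, $Z_t = \min_{h \in \mathcal{H}} Z$ and $h_t = \arg\min_{h \in \mathcal{H}} Z$;
(c) update the probability distribution: $D_{t+1}(i) = D_t(i) \exp(-y_i h_t(x_i)) / Z_t$.
(iii) Get the strong classifier $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} h_t(x)\right)$, where $H(x)$ is 1 when $\sum_{t=1}^{T} h_t(x) > 0$, or otherwise −1.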
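
The steps above can be sketched in plain Python as follows. This is a minimal sketch of domain-partitioning Real AdaBoost in the style of Schapire and Singer [20], not the OpenCV training code used in this paper. The function names, the representation of a weak learner as a pair (binning function, number of bins), and the toy data are assumptions for illustration; the distribution is renormalized by its actual sum, because the smoothing constant makes that sum differ slightly from the analytic normalization factor.

```python
import math


def train_real_adaboost(samples, labels, partitions, rounds, eps=1e-6):
    """Domain-partitioning Real AdaBoost (illustrative sketch).

    samples: list of inputs; labels: +1 or -1 for each sample.
    partitions: candidate weak learners, each given as (fn, n_bins), where
                fn maps a sample to the index of the bin it falls into.
    Returns a list of (fn, outputs); outputs[j] is the real-valued vote of
    the weak classifier for samples in bin j.
    """
    n = len(samples)
    dist = [1.0 / n] * n  # uniform initial distribution over the samples
    strong = []
    for _ in range(rounds):
        best = None
        for fn, n_bins in partitions:
            w_pos = [0.0] * n_bins  # weight of positive samples per bin
            w_neg = [0.0] * n_bins  # weight of negative samples per bin
            for x, y, d in zip(samples, labels, dist):
                (w_pos if y > 0 else w_neg)[fn(x)] += d
            # Normalization factor: small when the bins are nearly pure.
            z = 2.0 * sum(math.sqrt(p * q) for p, q in zip(w_pos, w_neg))
            if best is None or z < best[0]:
                outputs = [0.5 * math.log((p + eps) / (q + eps))
                           for p, q in zip(w_pos, w_neg)]
                best = (z, fn, outputs)
        _, fn, outputs = best
        strong.append((fn, outputs))
        # Re-weight: samples on the wrong side of the vote gain weight.
        dist = [d * math.exp(-y * outputs[fn(x)])
                for x, y, d in zip(samples, labels, dist)]
        total = sum(dist)
        dist = [d / total for d in dist]
    return strong


def strong_score(strong, x):
    """Real-valued confidence; its sign is the class decision."""
    return sum(outputs[fn(x)] for fn, outputs in strong)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
samples = [(1.0, 5.0), (2.0, 1.0), (8.0, 4.0), (9.0, 2.0)]
labels = [-1, -1, 1, 1]
partitions = [
    (lambda x: int(x[0] > 5.0), 2),
    (lambda x: int(x[1] > 3.0), 2),
]
model = train_real_adaboost(samples, labels, partitions, rounds=2)
predictions = [1 if strong_score(model, x) > 0 else -1 for x in samples]
```

On the toy data the split on the first feature has the smaller normalization factor in every round, so it is selected each time and the training samples are all classified correctly. The magnitude of `strong_score` plays the role of the confidence that distinguishes the real-valued output from a purely binary decision.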

In this paper, the Real AdaBoost algorithm is used to select Haar-like features and train classifiers; it is an effective procedure for identifying a small number of good candidate features that carry a significant amount of information [8].

2.4. Support Vector Machine (SVM) Classifier on HOG Features

Based on previous work in [5], this paper uses a linear support vector machine (SVM) classifier to find an optimal hyperplane. In general, an SVM first transforms the input data into a higher dimensional space by means of a kernel function (a linear one in this paper) and then constructs an optimal separating hyperplane between the two classes in the transformed space. The data vectors nearest to the separating hyperplane in the transformed space are called support vectors (SV). SVM is considered to be an approximate implementation of structural risk minimization that attains a low probability of generalization error.

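Suppose that $\{(x_i, y_i)\}$, $i = 1, \ldots, n$, is the linearly separable sample set, where $x_i \in \mathbb{R}^d$ and $y_i \in \{+1, -1\}$ is the class label of $x_i$. The linear discriminant function in the $d$-dimensional space and the corresponding hyperplane equation can be defined as
$$g(x) = w \cdot x + b, \qquad w \cdot x + b = 0.$$
Then $g(x)$ is normalized so that the samples of both classes satisfy $|g(x)| \geq 1$. The distance between the two margin hyperplanes is then $2/\|w\|$, so attaining the maximum distance is equivalent to minimizing $\|w\|$. Finally, the optimal hyperplane is found and the samples are classified correctly provided that they satisfy the following condition:
$$y_i (w \cdot x_i + b) - 1 \geq 0, \quad i = 1, \ldots, n.$$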

Furthermore, the new classifiers give essentially perfect results on the MIT pedestrian test set [21, 22], and our study has created a more challenging set containing positive samples and negative samples with a large range of poses and backgrounds. Combined with the extracted HOG features, the coefficients of the trained linear SVM give a measure of how much weight each cell of each block can have in the final discrimination decision. In conclusion, SVM is a more general and flexible method on regression estimation problems.
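
The role of the linear discriminant function and of a decision threshold on its value can be illustrated as follows. This is a sketch under assumptions: the weights, bias, and two-dimensional descriptor are made-up numbers rather than trained values, the function names are hypothetical, and the threshold is assumed to be applied directly to the SVM score.

```python
import math


def decision_value(w, b, x):
    """Signed score of the linear discriminant: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the two margin hyperplanes (scores +1 and -1)."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def is_pedestrian(w, b, x, threshold=0.0):
    """Accept when the score exceeds the threshold; lowering it accepts more."""
    return decision_value(w, b, x) > threshold


w, b = [3.0, 4.0], -6.0           # hypothetical trained weights and bias
x = [1.0, 0.5]                    # hypothetical 2-D descriptor
score = decision_value(w, b, x)   # 3.0 + 2.0 - 6.0
strict = is_pedestrian(w, b, x, threshold=0.0)
lenient = is_pedestrian(w, b, x, threshold=-1.5)
width = margin_width(w)           # 2 / 5
```

The same descriptor is rejected with a threshold of 0 but accepted with a threshold of −1.5, which shows how a lower threshold trades a higher detection rate for more false positives. Each weight multiplies one descriptor component, so large weights mark the components that contribute most to the decision.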

3. Experimental Results

In this paper, the pedestrian detection algorithm is tested on 4,009 positive samples and 128,837 negative samples. The approach builds on the default detector implemented in OpenCV, the extraction of Haar-like features, and the AdaBoost object detection algorithm. With few cascade stages, the Haar-like classifier attains a high detection rate but also a high false positive rate. In the next step, the HOG-based classifier, realized with OpenCV and SVM, yields a high detection rate but a low detection speed.

Combining the advantages of Haar-like features and HOG descriptors, a new pedestrian detection algorithm is proposed and implemented. Its structure is shown in Figure 5. Detection results of the combined classifiers are shown in Tables 1 and 2.

Table 1: Detection rate of the method combining the Haar-feature classifier cascade truncated at different stages with HOG descriptors at different thresholds.
Table 2: False positive rate of the method combining the Haar-feature classifier cascade truncated at different stages with HOG descriptors at different thresholds.
Figure 5: Program structure of the integrated algorithm.

Balancing accuracy against efficiency and taking the detection rate, false positive rate, and processing time into consideration, the test is performed with the Haar-like feature classifier cascade truncated at stage 8 and the overlapped HOG descriptor with threshold −1.5 (see the comparisons in Figure 6). The curves in Figure 6(a) show how the detection rate and the false positive rate vary with the stage of the Haar-like classifier. A high cascade stage brings a low false positive rate and fairly fast detection, but at the same time it lowers the detection rate. The curves in Figure 6(b) reflect the detection performance of the HOG descriptor with respect to various thresholds. An appropriate threshold makes good use of the HOG features to detect pedestrians more accurately.

Figure 6: Graph (a) shows performance of the integrated algorithm with respect to stages of Haar-like classifier. Graph (b) shows performance of the integrated algorithm with respect to thresholds of HOG classifier.

Meanwhile, we compare the proposed combined method with the approaches based separately on Haar-like features or the HOG descriptor (Figure 7). It can be seen that both the accuracy and the speed of the proposed algorithm are significantly improved, resulting in a low false positive rate and a high detection rate. When stage 8 and threshold −1.5 are chosen, the detection rate and the false positive rate are 95.14% and 2.36%, respectively. Though the false positive rate is a little higher than that of the other methods under comparison, it is still within the acceptable limit. Table 3 shows a comprehensive comparison between the traditional methods and the proposed integrated algorithm using a significant amount of experimental image data. In the experimental environment, the algorithm is up to 38% faster under equal conditions. Representative pedestrian detection results are shown in Figure 8.

Table 3: Comparison of detection performance between the proposed algorithm and the traditional methods.
Figure 7: Performance of the proposed method compared with the independent Haar-based or HOG-based approach.
Figure 8: Representative pedestrian detection results.

4. Summaries and Conclusions

In this paper, the proposed pedestrian detection framework utilized the Haar-like features and histogram of oriented gradients for feature extraction and a combination of AdaBoost and linear support vector machine for classification and verification. Two main parameters are used in the proposed approach, that is, the stage of Haar-like feature cascade classifiers and the threshold of HOG based classifier using SVM for verification. Once appropriate parameters are determined and calibrated, the integrated method can produce good results balancing performances and applicability.

The proposed framework was implemented using OpenCV, and through investigations using real world images and videos, it was shown that the proposed approach can achieve better performances in terms of both speed and efficiency than the algorithms based only on separate use of the Haar-like feature or the HOG descriptor. In particular, the proposed integrated algorithm is more accurate than the Haar-like features based algorithm. Compared with algorithms using merely HOG descriptor, the proposed algorithm can achieve better detection rate and lower false positive rate.

The research findings from this work are significant for many real-world applications. For example, the proposed pedestrian detection algorithm can be integrated into the development of advanced automated vehicles, which are under intensive investigation by many research and industry institutes. In this regard, future work will focus mainly on implementing the proposed algorithm in a real-time environment, in other words, in a running car. In addition, an object tracking architecture and strategy will be incorporated, and additional work will be done to avoid false positives and further reduce the processing time.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the Natural Science Foundation of China under Grant no. 71101025, the National 863 Program under Grant no. 2011AA110301, the National Key Technology R&D Program under Grant no. 2011BAK21B01, and the Project of Beijing Municipal Commission of Education (no. KM201210009008).

References

  1. Y. Li and M. K. H. Leung, “Unsupervised learning of human perspective context using ME-DT for efficient human detection in surveillance,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, June 2008.
  2. C. Dai, Y. Zheng, and X. Li, “Pedestrian detection and tracking in infrared imagery using shape and appearance,” Computer Vision and Image Understanding, vol. 106, no. 2-3, pp. 288–299, 2007.
  3. N. A. Ogale, A Survey of Techniques for Human Detection from Video, University of Maryland, 2006.
  4. H. Guo, W. Wang, W. Guo, X. Jiang, and H. Bubb, “Reliability analysis of pedestrian safety crossing in urban traffic environment,” Safety Science, vol. 50, no. 4, pp. 968–973, 2012.
  5. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
  6. M. Oren, C. Papageorgiou, P. Sinha, E. Osuna, and T. Poggio, “Pedestrian detection using wavelet templates,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '97), pp. 193–199, June 1997.
  7. S. Paisitkriangkrai, C. Shen, and J. Zhang, “Fast pedestrian detection using a cascade of boosted covariance features,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 8, pp. 1140–1151, 2008.
  8. P. Viola, M. J. Jones, and D. Snow, “Detecting pedestrians using patterns of motion and appearance,” International Journal of Computer Vision, vol. 63, no. 2, pp. 153–161, 2005.
  9. M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, “Support vector machines,” IEEE Intelligent Systems and Their Applications, vol. 13, no. 4, pp. 18–28, 1998.
  10. B. Karaçal, R. Ramanath, and W. E. Snyder, “A comparative analysis of structural risk minimization by support vector machines and nearest neighbor rule,” Pattern Recognition Letters, vol. 25, no. 1, pp. 63–71, 2004.
  11. V. Vapnik, The Nature of Statistical Learning Theory, Springer, 2000.
  12. C. P. Papageorgiou, M. Oren, and T. Poggio, “General framework for object detection,” in Proceedings of the 6th IEEE International Conference on Computer Vision, pp. 555–562, January 1998.
  13. P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), vol. 1, pp. I511–I518, December 2001.
  14. P. Viola and M. Jones, “Robust real-time object detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
  15. R. Lienhart and J. Maydt, “An extended set of Haar-like features for rapid object detection,” in Proceedings of the International Conference on Image Processing (ICIP '02), vol. 1, pp. 900–903, September 2002.
  16. F. C. Crow, “Summed-area tables for texture mapping,” ACM SIGGRAPH Computer Graphics, vol. 18, no. 3, pp. 207–212, 1984.
  17. K. Tieu and P. Viola, “Boosting image retrieval,” International Journal of Computer Vision, vol. 56, no. 1-2, pp. 17–36, 2004.
  18. Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.
  19. R. E. Schapire, “A brief introduction to boosting,” in Proceedings of the 16th International Joint Conference on Artificial Intelligence, pp. 1401–1406, Morgan Kaufmann, Stockholm, Sweden, 1999.
  20. R. E. Schapire and Y. Singer, “Improved boosting algorithms using confidence-rated predictions,” Machine Learning, vol. 37, no. 3, pp. 297–336, 1999.
  21. A. Mohan, C. Papageorgiou, and T. Poggio, “Example-based object detection in images by components,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 4, pp. 349–361, 2001.
  22. C. Papageorgiou and T. Poggio, “Trainable system for object detection,” International Journal of Computer Vision, vol. 38, no. 1, pp. 15–33, 2000.
  23. G. Lim, E. Sybingco, I. V. Marfori, and A. Y. Chua, “Vision based pedestrian detection using Histogram of Oriented Gradients, Adaboost & Linear Support Vector Machines,” in Proceedings of the IEEE Region 10 Conference (TENCON '12), pp. 1–5, 2012.
  24. H.-J. Song, M.-L. Shen, and W.-F. Liu, “Target track system design based on SVM and AdaBoost,” in Proceedings of the 1st International Congress on Image and Signal Processing (CISP '08), vol. 4, pp. 652–656, May 2008.
  25. H. Schneiderman and T. Kanade, “Statistical method for 3D object detection applied to faces and cars,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '00), vol. 1, pp. 746–751, June 2000.
  26. H. He and P. Shi, “Pedestrian detection method based on improved AdaBoost algorithm,” Journal of Anqing Teachers College, vol. 3, no. 15, pp. 40–43, 2009.