Special Issue: Sensing and Intelligent Perception in Robotic Applications
Research Article | Open Access
Pedestrian Detection in Crowded Environments through Bayesian Prediction of Sequential Probability Matrices
In order to safely navigate populated environments, an autonomous vehicle must be able to detect human shapes using its sensory systems, so that it can properly avoid a collision. In this paper, we introduce a Bayesian approach to the Viola-Jones algorithm, as a method to automatically detect pedestrians in image sequences. We present a probabilistic interpretation of the basic execution of the original tool and develop a technique to produce approximate convolutions of probability matrices with multiple local maxima.
1. Introduction
Being able to detect and avoid pedestrians is an essential feature of autonomous vehicles if they are to behave safely in populated environments. However, automatically detecting human shapes in images is a complex task for a computer vision system, and it has been widely studied.
One of the most widely used frameworks in the literature is Viola-Jones [1], based on feature training and classifier cascades, which is explained in detail in Section 2.1. This technique has been improved by its authors by considering object motion [2, 3], and it has also been extended by applying several classifiers simultaneously [4] or by using RealBoost to improve weak classifiers [5].
The main contribution of this paper is a Bayesian approach to pedestrian detection methods, exemplified by, but not limited to, the Viola-Jones framework. It consists of a statistical interpretation of the basic execution of the original algorithm and a technique to produce approximate convolutions of probabilistic matrices with multiple local maxima. This aims to increase the precision of the framework for its use on autonomous vehicles, so that obstacles and pedestrians in image sequences can be detected and avoided more efficiently.
Furthermore, the method we present can be used with both preprocessed binary results and unaltered probabilistic elements. As the latter are commonly returned by the sensors of a robot, this allows for greater flexibility and a more accurate management of the uncertainty of the available data.
1.1. Related Work
Another important algorithm for detecting pedestrians uses Histograms of Oriented Gradients (HOG) to define the features of an image [6]. This algorithm has been implemented for FPGA-based accelerators [7] and GPUs [8] and combined with Support Vector Machine (SVM) classifiers [9, 10]. Variations of histogram-based detection methods, such as Co-occurrence HOG [11] and combinations with wavelet methods [12], also exist. Bayesian methods have also been applied to the problem of pedestrian detection [13].
Both the HOG and Viola-Jones algorithms are included in the official release of OpenCV [14]. Although the former usually provides very precise detection results, as studied in [15], it has been shown to run slightly slower than the latter and is therefore less suitable for a real-time operation such as pedestrian detection from a moving vehicle.
2. Materials and Methods
2.1. Viola-Jones Framework
The Viola-Jones object detection framework uses object features which, similarly to Haar-like features [16], are defined by additions and subtractions of the sums of pixel values within rectangular, nonrotated areas of an image. The different types of features used by Viola-Jones are shown in Figure 1.
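Thanks to the usage of integral images, such that
$$I(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y'),$$
where $I$ is the integral of image $i$, these operations can be done in constant time. For example, the sum of all the pixels of the rectangle in Figure 2, taking $(x_1, y_1)$ and $(x_2, y_2)$ as its top-left and bottom-right pixels, would be calculated as
$$I(x_2, y_2) - I(x_1 - 1, y_2) - I(x_2, y_1 - 1) + I(x_1 - 1, y_1 - 1),$$
since each value $I(x, y)$ is the sum of all the pixels in the rectangle defined by the opposite corners $(0, 0)$ and $(x, y)$.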
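As an illustration of the constant-time rectangle sum, the following minimal Python sketch builds an integral image and queries it with four lookups. The function names are hypothetical, and the extra row and column of zeros is an implementation assumption that avoids special cases at the image border; it is not taken from the original framework.

```python
def integral_image(img):
    """Return a table with one extra zero row and column, where
    table[y][x] is the sum of img over rows 0..y-1 and columns 0..x-1."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, x1, y1, x2, y2):
    """Sum of the pixels with x1 <= x <= x2 and y1 <= y <= y2 (four lookups)."""
    return (table[y2 + 1][x2 + 1] - table[y1][x2 + 1]
            - table[y2 + 1][x1] + table[y1][x1])


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
table = integral_image(img)
total = rect_sum(table, 1, 1, 2, 2)  # pixels 5, 6, 8 and 9
```

The cost of `rect_sum` does not depend on the size of the rectangle, which is what makes evaluating many rectangular features per window affordable.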
A set of classifiers is then trained using AdaBoost [17], and a cascade architecture allows the result to be used in real time by immediately discarding a sample as soon as one classifier rejects it, as shown in Figure 3.
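The early-rejection behavior of a cascade can be sketched as follows. The two stages below are hypothetical stand-ins for trained boosted classifiers, and the sample fields `contrast` and `score` are invented for the example; only the control flow is meant to reflect the description above.

```python
def cascade_accepts(sample, stages):
    """Return True only if every stage accepts the sample.
    Evaluation stops at the first stage that rejects it."""
    for stage in stages:
        if not stage(sample):
            return False
    return True


calls = []  # records which stages were actually evaluated


def cheap_stage(sample):
    # Hypothetical fast test placed first in the cascade
    calls.append("cheap")
    return sample["contrast"] > 0.2


def costly_stage(sample):
    # Hypothetical slower test, only reached by surviving samples
    calls.append("costly")
    return sample["score"] > 0.7


stages = [cheap_stage, costly_stage]
rejected = cascade_accepts({"contrast": 0.1, "score": 0.9}, stages)
accepted = cascade_accepts({"contrast": 0.5, "score": 0.9}, stages)
```

The first sample never reaches the costly stage, which is where the real-time saving comes from when most windows contain no object.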
2.2. Bayesian Model
Let $O$ and $D$ be two random variables.
(i) $O$ expresses the existence or absence of objects of interest (in our case, pedestrians) within an image, for each pixel location.
(ii) $D$ shows an equivalent value, as returned by the Viola-Jones detection when applied to an image.
It is possible to use $D$ as evidence to evaluate the degree of belief of proposition $O$ (i.e., $P(O \mid D)$), by applying Bayes’ theorem:
$$P(O \mid D) = \frac{P(D \mid O)\, P(O)}{P(D)}.$$
The common use of a Bayesian model is to weed out false positive detections by comparing them to previous observations. However, when detecting pedestrians this decision could be damaging to the procedure, since false positives are preferable to false negatives: a missed detection involves immediate danger, whereas a false detection would only cause a less efficient route.
Therefore, we propose a reverse application of Bayes’ theorem, which filters absences of objects rather than detections, by considering the complements of the presented variables:
$$P(\bar{O} \mid \bar{D}) = \frac{P(\bar{D} \mid \bar{O})\, P(\bar{O})}{P(\bar{D})},$$
where the likelihood $P(\bar{D} \mid \bar{O})$ and the prior $P(\bar{O})$ are calculated as explained in the following subsections.
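For a single pixel, the reverse application of Bayes’ theorem can be sketched as below. The evidence term is expanded with the law of total probability, which requires a miss rate (the probability of no detection where an object is present); that parameter and all names are assumptions made for this example and are not specified in the text.

```python
def posterior_absence(prior_absence, lik_absent, lik_present):
    """P(no object | no detection) for one cell.

    prior_absence: P(no object)
    lik_absent:    P(no detection | no object)
    lik_present:   P(no detection | object), an assumed miss rate
    """
    evidence = (lik_absent * prior_absence
                + lik_present * (1.0 - prior_absence))
    if evidence == 0.0:
        # No information in the observation: keep the prior
        return prior_absence
    return lik_absent * prior_absence / evidence


# With an uninformative prior, a missing detection raises the belief
# that the cell is empty from 0.5 to 0.75 under these assumed values.
belief = posterior_absence(prior_absence=0.5, lik_absent=0.9, lik_present=0.3)
```

Filtering on absence means a cell is only declared empty when both the prior and the current observation support it, which matches the stated preference for false positives over false negatives.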
The default behavior of the Viola-Jones detection method, for a given image, is to return a set of rectangles within which objects of interest have been found.
A binary matrix can be produced from these areas, such that each cell is set to 1 if it belongs to one of them, and 0 otherwise. In our work, the binary matrix corresponding to the $i$th rectangle is named $B_i$.
Some of these marked areas may be superfluous (false positives), and others may overlap. The more rectangles overlap over a group of pixels, the more likely that group is to contain an actual object of interest.
The original Viola-Jones algorithm allows for a minimum overlap restriction: a rectangle is only valid if it can be computed as the intersection of a given number of overlapping detections.
Instead, we propose producing a detection matrix, such that the value of each of its cells is equal to the number of rectangles that overlap over its corresponding pixel (Figure 4). This matrix is equal to the sum of the binary matrices of all the observed detections.
The likelihood matrix $L$ for the probability of absence of objects of interest within an image is proportional to the opposite value of the detection matrix; for $n$ detections, this would be
$$L \propto n - \sum_{i=1}^{n} B_i.$$
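The construction of the detection matrix and of an absence likelihood can be sketched in a few lines. Rectangles are given here as hypothetical (x, y, width, height) tuples, and dividing by the number of detections so that values fall in [0, 1] is a normalization chosen for this example, since the text only states proportionality.

```python
def binary_matrix(rect, width, height):
    """1 for cells inside the rectangle (x, y, w, h), 0 elsewhere."""
    x, y, w, h = rect
    return [[1 if x <= col < x + w and y <= row < y + h else 0
             for col in range(width)]
            for row in range(height)]


def detection_matrix(rects, width, height):
    """Per-cell count of overlapping rectangles (sum of binary matrices)."""
    total = [[0] * width for _ in range(height)]
    for rect in rects:
        mask = binary_matrix(rect, width, height)
        for row in range(height):
            for col in range(width):
                total[row][col] += mask[row][col]
    return total


def absence_likelihood(rects, width, height):
    """Complement of the detection matrix, scaled to [0, 1] (assumed scaling)."""
    n = len(rects)
    counts = detection_matrix(rects, width, height)
    if n == 0:
        return [[1.0] * width for _ in range(height)]
    return [[(n - value) / n for value in row] for row in counts]


rects = [(0, 0, 3, 2), (1, 0, 3, 2)]  # two overlapping detections
counts = detection_matrix(rects, width=5, height=2)
likelihood = absence_likelihood(rects, width=5, height=2)
```

Cells covered by every rectangle get an absence likelihood of 0, while cells covered by none get 1.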
The concept of associating a weight value to each detection was also presented in the Soft Cascade method [18]. Its results are returned as rectangular areas, but unlike Viola-Jones, these are isolated and as such cannot be processed into probabilistic matrices. Preliminary tests showed that, because of this restriction, the accuracy of this technique is noticeably inferior to that of the probabilistic interpretation of Viola-Jones that we present in this work. Therefore, we chose not to use Soft Cascade in our experiments.
The usage of Bayes’ theorem involves an evolution of the resulting posterior probability function, in order to produce the prior probability function for the following iteration of the algorithm (typically a convolution is applied).
Ideally, at each time step $t$, the location of an object is determined by a certain probability distribution. The distribution of the appearance of objects of interest in our experiments is extracted from the normalized addition of overlapping binary rectangular distributions, which is asymmetrical and has a flat top. A new probability distribution was developed to approximate this behavior.
Let $S$ be a set of detections as returned by the Viola-Jones method for a particular object of interest. An object can be represented as a tuple $(n, R_\cap, R_\cup)$, such that
(i) $n$ is the number of elements in set $S$,
(ii) $R_\cap$ is the minimal rectangle area that holds the intersection of all the elements in $S$, and
(iii) $R_\cup$ is the minimal rectangle area that holds the union of all the elements in $S$.
Using these data, a two-dimensional function which simulates the summation of all the elements in $S$ was modeled as follows.
If considering a single dimension, rectangles $R_\cap$ and $R_\cup$ can be seen as two segments $[b, c]$ and $[a, d]$, respectively, where $a \le b \le c \le d$ (Figure 5).
Consider the following function:
The shape of suits our needs, but its height is scaled down so that, for two dimensions, the summation of the detections of a single object can be calculated asfor , and where , , , and are, respectively, the leftmost, rightmost, upper, and lower limits of area , and , , , and are the corresponding limits of area .
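To make the idea of an asymmetrical, flat-topped profile concrete, the sketch below uses a piecewise-linear trapezoid in each dimension and multiplies the two profiles, scaled by the number of detections. This is only one plausible shape that matches the verbal description; it is an assumption for illustration and should not be read as the function used by the authors. The names `trapezoid` and `object_profile`, and the (left, top, right, bottom) rectangle layout, are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Piecewise-linear profile, assuming a <= b <= c <= d:
    0 outside [a, d], 1 on [b, c], linear ramps in between."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x > c:
        return (d - x) / (d - c)
    return 1.0


def object_profile(x, y, n, inner, outer):
    """Assumed 2D profile for one object: n on the inner (intersection)
    rectangle, falling to 0 at the border of the outer (union) rectangle.
    Rectangles are (left, top, right, bottom) tuples."""
    fx = trapezoid(x, outer[0], inner[0], inner[2], outer[2])
    fy = trapezoid(y, outer[1], inner[1], inner[3], outer[3])
    return n * fx * fy


inner = (2, 2, 4, 4)   # hypothetical intersection area
outer = (0, 0, 6, 6)   # hypothetical union area
peak = object_profile(3, 3, 3, inner, outer)
slope = object_profile(1, 3, 3, inner, outer)
outside = object_profile(7, 3, 3, inner, outer)
```

Because the ramps on each side are computed from different limits, the profile becomes asymmetrical whenever the intersection is not centered within the union.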
A probability matrix can therefore be generated, using the tuples which define the detected objects of interest. For objects
In order to isolate each object of interest among the added distributions of all the detections in an image, we locate the maximum value in the probability matrix and analyze its adjacent cells to define a tuple, such that
(i) area $R_\cap$ contains all the cells that share the maximum probability value, caused by the overlapping of all the involved detection rectangles, and
(ii) area $R_\cup$ contains all the cells that are delimited by local minima and zero values, so that we can assume that all nonzero cells that are not contained in $R_\cup$ belong to unrelated detections.
After an object is located, its data are stored and it is removed from the probability matrix. This procedure is repeated until the matrix is empty.
Once all objects are extracted, they are matched to those of previous time steps to study their relative movement. When the objects involved are clearly individual, their movements can be analyzed and predicted separately. In our case, their number and their correspondences between frames are unknown.
Using a minimum mean square error estimation, each object is then added to a previously stored trajectory, which is used to predict new values for the following time step, using a linear regression over the tuple values.
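The matching and prediction step can be sketched as follows, under the assumption that each object is summarized by a tuple of numbers (for instance, rectangle limits) and that time steps are consecutive integers. An ordinary least-squares line is fitted to each component separately, and a new object is assigned to the trajectory whose prediction has the smallest squared error. All function names are hypothetical.

```python
def fit_line(times, values):
    """Least-squares slope and intercept of values over times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        return 0.0, mean_v  # a single sample: predict a constant
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def predict_next(trajectory):
    """trajectory: list of (t, values); returns the values predicted at t + 1."""
    times = [t for t, _ in trajectory]
    t_next = times[-1] + 1
    predicted = []
    for k in range(len(trajectory[0][1])):
        slope, intercept = fit_line(times, [v[k] for _, v in trajectory])
        predicted.append(slope * t_next + intercept)
    return tuple(predicted)


def best_match(obj, predictions):
    """Index of the prediction with the smallest squared error to obj."""
    def squared_error(p):
        return sum((a - b) ** 2 for a, b in zip(obj, p))
    return min(range(len(predictions)), key=lambda i: squared_error(predictions[i]))


moving = [(0, (10, 20)), (1, (12, 20)), (2, (14, 20))]
static = [(0, (50, 60)), (1, (50, 60)), (2, (50, 60))]
predictions = [predict_next(moving), predict_next(static)]
match = best_match((17, 21), predictions)  # closest to the moving object
```

Fitting each component independently keeps the sketch short; a joint model would be needed to capture correlated changes such as scaling.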
3. Results and Discussion
Our method was tested over twelve image sequences, described in Table 1 and exemplified by Figure 7. Dataset ETSII was recorded in the parking lot of the Computer Engineering School of Universidad de La Laguna. Datasets ITER1 and ITER2 were filmed in the outer limits and in the parking lot of the Institute of Technology and Renewable Energy (ITER) facilities in Tenerife (Spain), respectively.
These three image sequences were captured by the visual sensors of the VERDINO prototype (Figure 8), a modified EZ-GO TXT-2 golf cart equipped with computerized steering, braking, and traction control systems. Its sensor system consists of a differential GPS, an Inertial Measurement Unit (IMU), an odometer, three Sick LMS221-30206 laser range finders, two thermal stereo cameras, and two Santachi DSP220x optical cameras.
Datasets BAHNHOF, JELMOLI, and SUNNY DAY were downloaded from Andreas Ess’ Robust Multi-Person Tracking from Mobile Platforms website at the Swiss Federal Institute of Technology. These image sequences were recorded using a pair of AVT Marlin F033C cameras and have been used in publications [19–22].
Datasets CAVIAR1 to CAVIAR4 belong to the Context Aware Vision using Image-based Active Recognition (CAVIAR) project [23] and were recorded in a shopping center in Portugal using a static camera. The selected image sequences correspond to the corridor views of clips WalkByShop1 (CAVIAR1), OneShopOneWait1 (CAVIAR2), OneShopOneWait2 (CAVIAR3), and ThreePastShop1 (CAVIAR4).
Dataset DAIMLER corresponds to the Daimler pedestrian detection benchmark dataset, introduced in [24], and dataset CALTECH corresponds to sequence V002 from testing set seq06 of the Caltech pedestrian detection benchmark [15, 25]. Both datasets were recorded from a vehicle driving through regular traffic in an urban environment.
Ten tests were conducted over each image dataset; the average results are shown in Figures 9 and 10. As explained in Section 2.2, the main goal of our detection enhancement method is to reduce the number of false negatives returned by the Viola-Jones framework. As such, classic analysis techniques such as receiver operating characteristic (ROC) and detection error tradeoff (DET) curves, which depend on the number of false positives of the results, do not properly display the improvement introduced by our approach. We instead present the average ratio between the number of false positives returned by both the original and the enhanced detection methods, and the number of true positives found in the input frames.
We observed that our Bayesian approach always provides less conservative detection rates than Viola-Jones, successfully lowering the rate of false positives for all datasets. Results were especially good for the ETSII, ITER, CAVIAR, and DAIMLER datasets. The sequences for these sets have good visibility, which results in more accurate detections by the original method and, consequently, a higher improvement introduced by our approach.
The rest of the datasets have higher occlusion rates and feature pedestrians in poses and locations that complicate their detection, which reduces the enhancement provided by Bayesian processing. This effect was especially noticeable for the CALTECH dataset, which features very few clearly visible pedestrians.
4. Conclusions
We have developed a Bayesian approach to the Viola-Jones detection method and applied it to a real case where pedestrians must be located and avoided by a self-guided device. Our method describes a statistical modification of the original tool, which is combined with a form of approximate convolution of two-dimensional probability matrices with multiple local maxima.
Our algorithm has been shown to improve the precision of the results by restricting the probabilistic matrix returned by the original method to the area where objects are expected to appear, according to their previously observed movements.
It was found that our method behaves best when pedestrians are clearly visible, so that the detections by the original method can be properly enhanced by Bayesian processing. More accurate detection algorithms are expected to improve the results of our approach in situations of high visual occlusion. This proposal serves as grounds for further research.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors gratefully acknowledge the contribution of the Spanish Ministry of Economy and Competitiveness (http://www.mineco.gob.es/) under Project STIRPE DPI2013-46897-C2-1-R. Javier Hernández-Aceituno’s research is supported by an FPU Grant (Formación de Profesorado Universitario) FPU2012-3568, from the Spanish Ministry of Science and Innovation (http://www.micinn.es/). The authors gratefully acknowledge the funding granted to the Universidad de La Laguna by the Agencia Canaria de Investigación, Innovación y Sociedad de la Información; 85% was cofunded by the European Social Fund.
References
[1] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
[2] P. Viola, M. J. Jones, and D. Snow, “Detecting pedestrians using patterns of motion and appearance,” International Journal of Computer Vision, vol. 63, no. 2, pp. 153–161, 2005.
[3] M. J. Jones and D. Snow, “Pedestrian detection using boosted features over many frames,” in Proceedings of the 19th International Conference on Pattern Recognition (ICPR '08), pp. 1–4, December 2008.
[4] T. Gao and D. Koller, “Active classification based on value of classifier,” in Advances in Neural Information Processing Systems, vol. 24, pp. 1062–1070, 2011.
[5] B. Rasolzadeh, L. Petersson, and N. Pettersson, “Response binning: improved weak classifiers for boosting,” in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 344–349, Tokyo, Japan, 2006.
[6] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[7] Y. Zhu, Y. Liu, D. Zhang, S. Li, P. Zhang, and T. Hadley, “Acceleration of pedestrian detection algorithm on novel C2RTL HW/SW co-design platform,” in Proceedings of the 1st International Conference on Green Circuits and Systems (ICGCS '10), pp. 615–620, June 2010.
[8] V. Prisacariu and I. Reid, “fastHOG—a real-time GPU implementation of HOG,” Tech. Rep. 2310/09, Department of Engineering Science, Oxford University, 2009.
[9] F. Suard, A. Rakotomamonjy, A. Bensrhair, and A. Broggi, “Pedestrian detection using infrared images and histograms of oriented gradients,” in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 206–212, Tokyo, Japan, 2006.
[10] M. Bertozzi, A. Broggi, M. D. Rose, M. Felisa, A. Rakotomamonjy, and F. Suard, “A pedestrian detector using histograms of oriented gradients and a support vector machine classifier,” in Proceedings of the 10th International IEEE Conference on Intelligent Transportation Systems (ITSC 2007), pp. 143–148, October 2007.
[11] T. Watanabe, S. Ito, and K. Yokoi, “Co-occurrence histograms of oriented gradients for pedestrian detection,” in Advances in Image and Video Technology, T. Wada, F. Huang, and S. Lin, Eds., vol. 5414 of Lecture Notes in Computer Science, pp. 37–47, Springer, Berlin, Germany, 2009.
[12] H. Schneiderman and T. Kanade, “A statistical method for 3D object detection applied to faces and cars,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '2000), pp. 746–751, June 2000.
[13] B. Wu and R. Nevatia, “Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 1, pp. 90–97, October 2005.
[14] G. Bradski, “The OpenCV library,” Dr. Dobb's Journal of Software Tools, 2000.
[15] P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: a benchmark,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 304–311, June 2009.
[16] M. Oren, C. Papageorgiou, P. Sinha, E. Osuna, and T. Poggio, “Pedestrian detection using wavelet templates,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 193–199, June 1997.
[17] Y. Freund and R. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in Computational Learning Theory, P. Vitányi, Ed., vol. 904 of Lecture Notes in Computer Science, pp. 23–37, Springer, Berlin, Germany, 1995.
[18] L. Bourdev and J. Brandt, “Robust object detection via soft cascade,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 2, pp. 236–243, June 2005.
[19] A. Ess, B. Leibe, and L. van Gool, “Depth and appearance for mobile scene analysis,” in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–8, October 2007.
[20] A. Ess, B. Leibe, K. Schindler, and L. van Gool, “A mobile vision system for robust multi-person tracking,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, IEEE, Anchorage, Alaska, USA, June 2008.
[21] A. Ess, B. Leibe, K. Schindler, and L. van Gool, “Moving obstacle detection in highly dynamic scenes,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '09), pp. 56–63, May 2009.
[22] A. Ess, B. Leibe, K. Schindler, and L. van Gool, “Robust multiperson tracking from a mobile platform,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 10, pp. 1831–1846, 2009.
[23] R. Fisher, J. Santos-Victor, and J. Crowley, “Context aware vision using image-based active recognition,” EC's Information Society Technology's Programme Project IST2001-3754, 2001.
[24] M. Enzweiler and D. M. Gavrila, “Monocular pedestrian detection: survey and experiments,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2179–2195, 2009.
[25] P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: an evaluation of the state of the art,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 743–761, 2012.
Copyright © 2016 Javier Hernández-Aceituno et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.