Abstract

A fast pedestrian recognition algorithm based on multisensor fusion is presented in this paper. First, potential pedestrian locations are estimated in world coordinates by laser radar scanning, and their corresponding candidate regions in the image are then located by camera calibration and a perspective mapping model. To avoid the time-consuming training and recognition caused by high-dimensional feature vectors, a region of interest-based integral histograms of oriented gradients (ROI-IHOG) feature extraction method is then proposed. A support vector machine (SVM) classifier is trained on a novel pedestrian sample dataset adapted to the urban road environment and is used for online recognition. Finally, we test the validity of the proposed approach on several video sequences from realistic urban road scenarios. The multisensor fusion method shows reliable and time-efficient performance.

1. Introduction

Pedestrians are vulnerable participants in the transportation system when crashes happen, especially those in motion in urban road scenarios [1]. In 2009, the first global road safety assessment report of the World Health Organization found that traffic accidents are one of the major causes of death and injury around the world. Pedestrians are involved in 41% to 75% of fatal road traffic accidents, and a pedestrian is 4 times as likely to be killed as a vehicle occupant. Therefore, pedestrian safety protection should be taken seriously [2].

Active pedestrian safety protection can avoid collisions between vehicles and pedestrians in the first place; thus it has become the most promising technique for enhancing the safety and mobility of pedestrians. For active safety systems, various types of sensors are utilized to detect pedestrians and to generate appropriate warnings for drivers or perform autonomous braking in the case of an imminent collision [3]. Detecting pedestrians in real time is one of the most crucial parts. Current research is mainly focused on the application of visual sensors [4–6], infrared (IR) imaging sensors [7, 8], and radar sensors [9, 10] to perceive pedestrians and obtain their safety state information for active pedestrian protection. Each type of sensor has its advantages and limitations. In order to enhance the advantages and overcome the limitations, multimodality information fusion has become the development trend of pedestrian detection and security warning [3]. By exploiting the complementary information from different sensors, more reliable and robust pedestrian detection results can be obtained from multisource and heterogeneous data.

In the past two decades, several research projects have focused on reducing accidents involving vulnerable road users by fusing different kinds of sensors, such as APVRU [11], PAROTO [12], PROTECTOR [13, 14], and SAVE-U [15, 16] in European countries. Considerable research has also been conducted by various groups. Scheunert et al. [17] detected range discontinuities with a laser scanner and high-brightness regions in the image with a far infrared (FIR) sensor; data fusion based on a Kalman filter combined the outputs of the laser scanner and the FIR sensor. Szarvas et al. [18] created a range mapping method to identify pedestrian location and scale by laser radar and camera fusion; a neural network was utilized for image-based feature extraction and object classification. Töns et al. [19] combined a radar sensor, IR, and a vision sensor for robust pedestrian detection, classification, and tracking. Bertozzi et al. [20] fused stereo vision and IR to obtain disparity data for pedestrian detection. Vision was used to preliminarily detect the presence of pedestrians in a specific region of interest, and the results were merged with a set of regions of interest provided by a motion stereo technique. Combining a laser scanner and a camera, Broggi et al. [21] presented an application for detecting pedestrians appearing just behind occluding obstacles.

Although some achievements have already been made, the complementary advantages of multisensor data fusion are not fully realized. Pedestrian detection algorithms based on multimodality data fusion should be further improved for higher detection accuracy and better time performance, especially in complicated urban road environments. This paper proposes a real-time pedestrian recognition algorithm based on laser scanner and vision information fusion. In the first stage, the pedestrian candidate regions in the image are located by combining the radar scanning information with a space-image perspective mapping model, which effectively reduces the computational time cost of pedestrian recognition. In the second stage, the ROI-IHOG feature extraction method is proposed to further improve computational efficiency, which helps ensure that online pedestrian recognition is both real-time and reliable.

The remainder of this paper is organized as follows: a brief overview of the proposed pedestrian recognition system is presented first, followed by the pedestrian candidate region estimation based on laser scanner and vision information fusion, and then we focus on vision-based pedestrian recognition. Finally, we test the validity of the proposed approach in several urban road scenarios and conclude the work.

2. System Overview

2.1. System Architecture

The research on pedestrian recognition is carried out on a multisensor vehicle platform, as shown in Figure 1. This experimental platform is a modified Jetta. It is equipped with a vision sensor, a laser scanner, and two near-infrared illuminators to detect pedestrians within a 90° range in front of the vehicle.

The architecture of the proposed multisensor pedestrian detection system is shown in Figure 2. The system runs on a PC with an Intel Core i5 CPU (2.27 GHz) and 2.0 GB of RAM. The system includes offline training and online recognition. For offline training, a novel pedestrian dataset adapted to the urban road environment is established first, and then the pedestrian classifier is trained by SVM. For online recognition, a Sony SSC-ET185P camera installed on the top front of the experimental vehicle is used to capture continuous images. Potential pedestrian candidate regions are identified in the image through the radar data from a SICK LMS211-S14 laser scanner and the perspective mapping model between world coordinates and image coordinates. For each image, all candidate regions are scaled to the training sample size and judged by the classifier trained offline.

2.2. Sensor Selection

The Sony SSC-ET185P camera has been chosen for several reasons. The camera offers high color reproduction and sharp images. It includes an 18x optical zoom and 12x digital high-quality zoom lens with autofocus, so the camera can capture high-quality color images during the day. Although the system is currently tested under daylight conditions, two near-infrared illuminators are mounted on both sides of the laser radar at the front of the vehicle. They provide dedicated illumination for object detection, so that the application can be extended to night-time use.

The laser scanner is a SICK LMS211-S14. Its detection capabilities (scanning angle of 90°, minimum angular resolution of 0.5°, and a range of up to 81.91 m) are suitable for our goal. The laser scanner scans only a single plane. Its ranging principle is the time-of-flight method: it emits light pulses to the target and measures the round-trip time of flight to determine the distance. One scan takes 13 ms, which meets the real-time requirement.
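The time-of-flight principle described above can be illustrated with a short sketch. The function names are hypothetical and the code only shows the generic relation (distance equals the speed of light times the round-trip time, divided by two); it is not the scanner's internal implementation.

```python
# Generic time-of-flight ranging: the pulse travels to the target and back,
# so the one-way distance is half of the distance covered in the round trip.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the target for a measured round-trip time of flight."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def tof_round_trip_s(distance_m: float) -> float:
    """Round-trip time of flight for a target at the given distance."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    # Round-trip time at the maximum range quoted in the text (81.91 m).
    t = tof_round_trip_s(81.91)
    print(f"round trip at 81.91 m: {t * 1e9:.1f} ns")
```

At the maximum range the round trip lasts well under a microsecond, so the 13 ms scan time is governed by the scanning mechanism rather than by the flight time of individual pulses.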

2.3. Vehicle Setup

The laser scanner and the two near-infrared illuminators are mounted horizontally in the front bumper, as shown in Figure 3(a). The camera is placed at the top front of the vehicle, on the same centerline as the laser scanner, as shown in Figure 3(b). The horizontal distance between the camera and the laser scanner is 2.3 m, and the camera height is 1.6 m; these are two key parameters of the camera calibration.

The MINE V-cap 2860 USB is used to connect the camera to the PC. An RS-422 industrial serial interface and a MOXA NPort high-speed card provide an easy connection between the laser scanner and the PC. Figure 4 shows the hardware integration of the proposed system.

3. Potential Pedestrian Location Estimation

Most current pedestrian detection methods depend solely on visual sensors and cannot meet real-time requirements. In our work, we use a laser radar sensor to detect obstacle locations and thereby estimate potential pedestrian positions in world coordinates, and then we use the camera calibration and the space-image perspective mapping model to mark the pedestrian candidate regions in the image. The pedestrian recognition algorithm proposed later is performed only on the candidate regions instead of the entire image, which effectively reduces the computational time cost and supports real-time application.

In our experimental platform, a SICK LMS211-S14 laser scanner is utilized. The scanning angle is 90° in front of the host vehicle, and the minimum angular resolution is 0.5° (Figure 5). Thus, one radar scan yields 181 data arrays. Each data array includes two parameters: the angle and the distance between the obstacle and the host vehicle. The data arrays can be denoted as $\{(\rho_i, \theta_i) \mid i = 1, 2, \ldots, N\}$, where $N$ is the total number of arrays and $(\rho_i, \theta_i)$ is the data of the $i$th array.
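The count of 181 data arrays follows directly from the 90° scanning angle and the 0.5° resolution. The sketch below reproduces that arithmetic and pairs each distance with its angle. The function names are hypothetical, and the angle origin (0° at one edge of the field of view) is an assumption, since the text does not state where the scanner's angles start.

```python
# Build the angle grid for one scan and pair it with measured distances.
# Assumption: angles run from 0 deg to 90 deg across the field of view.

def scan_angles_deg(field_of_view_deg: float = 90.0,
                    resolution_deg: float = 0.5) -> list:
    """Angles of all beams in one scan, including both end points."""
    count = int(round(field_of_view_deg / resolution_deg)) + 1
    return [i * resolution_deg for i in range(count)]


def pair_scan(distances_m: list) -> list:
    """Return (distance, angle) tuples for one scan of distances."""
    angles = scan_angles_deg()
    if len(distances_m) != len(angles):
        raise ValueError(f"expected {len(angles)} distances")
    return list(zip(distances_m, angles))


if __name__ == "__main__":
    angles = scan_angles_deg()
    print(len(angles), angles[0], angles[-1])
```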

A set of laser beams reflected from the same target should have similar distances and similar angles. Based on this, a clustering method is applied to the 181 data arrays to determine which of them belong to the same target. The clustering criterion is expressed in terms of the minimum angular resolution of the radar, $\Delta\theta$, the distance of the $i$th array, $\rho_i$, and two distance thresholds. Given the installation location of the radar, the scan plane crosses pedestrians at about knee height. Taking into account the physical characteristics of a pedestrian in space (legs separated or closed), the two thresholds are set to 10 cm and 70 cm, respectively. Then, the potential pedestrian location parameters of each target (the start data, the end data, and the data amount) are recorded. The target distance is expressed as the average distance of all beams from the target, $\bar{\rho} = \frac{1}{n}\sum_{i=s}^{e} \rho_i$, where $s$ and $e$ index the first and last arrays of the target and $n = e - s + 1$. Its direction is represented as $\bar{\theta} = (\theta_s + \theta_e)/2$, where $\theta_s$ is the first angle value of the target and $\theta_e$ is the last one.

Finally, we convert the radar data from polar coordinates to Cartesian coordinates as $x = \rho\cos\theta$, $y = \rho\sin\theta$, where $(\rho, \theta)$ is the data in polar coordinates and $(x, y)$ is the data in Cartesian coordinates, which represents the target location in space.

The possible pedestrian locations are 2D data in world coordinates. Their corresponding regions in the image are then located by a piecewise camera calibration and the perspective space-image mapping model. This map is projected into the image in order to identify the regions and the scale at which to search for pedestrians in the image. The camera height is 1.6 m, which is a parameter of the camera calibration. The space-image mapping model relates the location parameters in world coordinates to the corresponding parameters in image coordinates. We divide the detection area into four sections and determine the mapping model parameters of each section by the least squares method, which makes the parameters more accurate.
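The clustering and coordinate conversion steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' criterion: consecutive beams are grouped when their distances differ by at most a hypothetical `max_jump_m`, and the 10 cm and 70 cm thresholds are used here as bounds on the approximate target width. The names `Target`, `cluster_scan`, and `plausible_width`, and the polar axis convention, are likewise assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Target:
    start: int            # index of the first beam on the target
    end: int              # index of the last beam on the target
    count: int            # number of beams on the target
    distance_m: float     # average distance of all beams
    direction_deg: float  # midpoint of the first and last beam angles
    x_m: float            # Cartesian position (assumed axis convention)
    y_m: float


def _summarise(ranges_m, angles_deg, start, end):
    count = end - start + 1
    distance = sum(ranges_m[start:end + 1]) / count
    direction = (angles_deg[start] + angles_deg[end]) / 2.0
    theta = math.radians(direction)
    return Target(start, end, count, distance, direction,
                  distance * math.cos(theta), distance * math.sin(theta))


def cluster_scan(ranges_m, angles_deg, max_jump_m=0.3, max_range_m=81.91):
    """Group consecutive beams with similar distances into targets.

    Beams at or beyond max_range_m are treated as 'no return'.
    max_jump_m is a hypothetical grouping threshold.
    """
    targets, start, n = [], None, len(ranges_m)
    for i in range(n + 1):
        valid = i < n and ranges_m[i] < max_range_m
        joins = (valid and start is not None
                 and abs(ranges_m[i] - ranges_m[i - 1]) <= max_jump_m)
        if start is not None and not joins:
            targets.append(_summarise(ranges_m, angles_deg, start, i - 1))
            start = None
        if valid and start is None:
            start = i
    return targets


def plausible_width(target, resolution_deg=0.5,
                    min_width_m=0.10, max_width_m=0.70):
    """Assumed use of the two thresholds: bound the approximate arc width."""
    width = target.distance_m * math.radians(resolution_deg) * target.count
    return min_width_m <= width <= max_width_m


if __name__ == "__main__":
    angles = [i * 0.5 for i in range(181)]
    ranges = [81.91] * 181
    for i in range(88, 93):  # five beams hit an object 10 m away
        ranges[i] = 10.0
    for t in cluster_scan(ranges, angles):
        print(t, plausible_width(t))
```

Each recorded target keeps the start index, end index, and beam count mentioned in the text, so the later image-mapping step only needs the averaged distance and direction.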

In order to detect pedestrians faster and more accurately, we should determine the detection size of the candidate pedestrian imaging region at different distances in front of the vehicle. We assume that the pedestrian template is 2 m high and 1 m wide (a little larger than a real pedestrian). The relationship between the width and height of the pedestrian's imaging region and the pedestrian location in space is found by a calibration experiment. The width and height of the potential pedestrian region in the image are then expressed as functions of $d$, where $d$ is the vertical distance from the target to the host vehicle.
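To make the dependence of the candidate region size on distance concrete, the sketch below uses a simple pinhole relation (image size = focal length × object size / distance). This is a stand-in for the calibration-derived relationship, which the text does not give in recoverable form. The focal length of 800 pixels and the function name `candidate_box` are purely hypothetical.

```python
# Pinhole-model stand-in for the calibrated size-versus-distance relation.
# Assumption: focal_px is a hypothetical focal length expressed in pixels.

def candidate_box(distance_m: float, column_px: float, ground_row_px: float,
                  focal_px: float = 800.0,
                  template_w_m: float = 1.0, template_h_m: float = 2.0):
    """Return (left, top, width, height) of the candidate region in pixels.

    column_px is the image column of the target centre and ground_row_px is
    the image row where the target meets the ground.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    width_px = focal_px * template_w_m / distance_m
    height_px = focal_px * template_h_m / distance_m
    left = column_px - width_px / 2.0
    top = ground_row_px - height_px
    return (round(left), round(top), round(width_px), round(height_px))


if __name__ == "__main__":
    for d in (5.0, 10.0, 20.0):
        print(d, candidate_box(d, column_px=320, ground_row_px=400))
```

The 2 m by 1 m template from the text fixes the 2:1 aspect ratio of every candidate region; only the scale changes with distance.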

4. Pedestrian Recognition

4.1. Feature Representation

In 2005, Dalal and Triggs [22] proposed grids of histogram of oriented gradient (HOG) descriptors for pedestrian detection. Experimental results showed that HOG feature sets significantly outperformed existing feature sets for human detection. However, the HOG-based algorithm is too time consuming, especially for multiscale object detection. The approach should be further optimized because it is not suitable for real-time pedestrian safety protection.

In this paper, for fast pedestrian detection, the region of interest (ROI) of a pedestrian sample is found by calculating the average gradient of all positive samples in the RSPerson dataset mentioned below. We find that the gradient features at the head and limbs of pedestrian samples are the most pronounced. In contrast, the gradients of the background area in the sample image contribute little to pedestrian detection and may even disturb the processing performance. Therefore, in order to reduce the HOG feature vector dimension of a whole image (3780 dimensions), only several important areas of a sample image are taken as ROI when calculating the HOG feature. Accordingly, the computation amount of the HOG feature is greatly reduced, and pedestrian recognition speed is improved. Through the analysis of the average gradient values of pedestrian samples, shown in Figure 6, four regions are identified as ROI: the head region, the leg region, the left arm region, and the right arm region. These regions may partly overlap each other and essentially cover the body's contours.
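The dimension reduction can be checked with a little arithmetic. The sketch below assumes the standard Dalal and Triggs layout (64 × 128 window, 8-pixel cells, blocks of 2 × 2 cells, 8-pixel stride, 9 orientation bins), which reproduces the 3780 dimensions quoted above, and compares it with the 49 ROI blocks stated later in this section. The function name is hypothetical.

```python
# HOG descriptor length for a dense block grid versus the ROI blocks only.
# Assumption: standard Dalal-Triggs layout for the whole-window descriptor.

def hog_dimensions(width_px: int, height_px: int, cell_px: int = 8,
                   block_cells: int = 2, stride_px: int = 8,
                   bins: int = 9) -> int:
    """Length of the HOG vector for a window scanned by overlapping blocks."""
    block_px = cell_px * block_cells
    blocks_x = (width_px - block_px) // stride_px + 1
    blocks_y = (height_px - block_px) // stride_px + 1
    return blocks_x * blocks_y * block_cells * block_cells * bins


if __name__ == "__main__":
    full = hog_dimensions(64, 128)   # 7 x 15 blocks x 36 values
    roi = 49 * 2 * 2 * 9             # 49 ROI blocks x 36 values
    print(full, roi, f"{roi / full:.2f}")
```

Under these assumptions the ROI descriptor is a little under half the length of the whole-window descriptor.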

For a color image, the gradients of each color channel are calculated. For each pixel, the gradient with the largest amplitude among the three color channels is selected as the gradient vector of that pixel. The optimal ROI location, width, and height for a sample image are shown in Table 1.
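The per-pixel channel selection can be sketched as follows. The code computes a centered-difference gradient in each channel and keeps the one with the largest magnitude. Clamping at the image border and the function names are assumptions made for this illustration.

```python
import math

# Channels are given as lists of rows (channel[y][x]).
# Assumption: border pixels reuse the nearest valid neighbour (clamping).


def channel_gradient(channel, x: int, y: int):
    """Centered-difference gradient (gx, gy) of one channel at (x, y)."""
    height, width = len(channel), len(channel[0])
    left = channel[y][max(x - 1, 0)]
    right = channel[y][min(x + 1, width - 1)]
    up = channel[max(y - 1, 0)][x]
    down = channel[min(y + 1, height - 1)][x]
    return right - left, down - up


def dominant_gradient(channels, x: int, y: int):
    """Gradient of the channel whose magnitude is largest at (x, y)."""
    best, best_mag = (0, 0), -1.0
    for channel in channels:
        gx, gy = channel_gradient(channel, x, y)
        mag = math.hypot(gx, gy)
        if mag > best_mag:
            best, best_mag = (gx, gy), mag
    return best


if __name__ == "__main__":
    red = [[5, 5, 5]] * 3       # flat channel, zero gradient
    green = [[0, 10, 20]] * 3   # strong horizontal ramp
    blue = [[0, 1, 2]] * 3      # weak horizontal ramp
    print(dominant_gradient([red, green, blue], 1, 1))
```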

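Similar to Dalal's method, to calculate the feature vector of the ROI in a detection window, the cell size is defined as $8 \times 8$ pixels, and the block size is defined as $2 \times 2$ cells. The window's scan step is 8 pixels, the width of a cell. A total of 49 blocks can be extracted in a detection window. For each pixel $(x, y)$ of the image $I$, the gradient vector is denoted as $(G_x(x, y), G_y(x, y))$. In general, the one-dimensional centrosymmetric template operator $[-1, 0, 1]$ is used for calculating the gradient vector:

$$G_x(x, y) = I(x+1, y) - I(x-1, y), \qquad G_y(x, y) = I(x, y+1) - I(x, y-1).$$

Accordingly, the gradient magnitude is calculated as

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}.$$

The gradient orientation is unsigned; it is defined as

$$\theta(x, y) = \arctan\frac{G_y(x, y)}{G_x(x, y)},$$

mapped to the range $[0, \pi)$. To compute the gradient histogram of a cell, each pixel casts a vote, weighted by its gradient magnitude, for the bin corresponding to its gradient orientation. All gradient orientations are grouped into 9 bins. Thus, every block has a gradient histogram with 36 dimensions, and the ROI-HOG feature vector has $49 \times 36 = 1764$ dimensions.

Furthermore, integral histograms of oriented gradients (IHOG) [23] are utilized to further speed up the process of feature extraction. The histogram of oriented gradients of the pixel $(x, y)$ can be expressed as $h(x, y) = (h_1(x, y), \ldots, h_9(x, y))$, where

$$h_k(x, y) = \begin{cases} G(x, y), & \theta(x, y) \in \text{bin } k, \\ 0, & \text{otherwise.} \end{cases}$$

The integral feature vector in the $x$-orientation is

$$H_x(x, y) = \sum_{i=0}^{x} h(i, y).$$

The integral feature vector in the $y$-orientation is

$$H(x, y) = \sum_{j=0}^{y} H_x(x, j).$$

As shown in Figure 7, the IHOG of a cell whose top-left pixel is $(x_1, y_1)$ and whose bottom-right pixel is $(x_2, y_2)$ can be calculated as

$$H_{\text{cell}} = H(x_2, y_2) - H(x_1 - 1, y_2) - H(x_2, y_1 - 1) + H(x_1 - 1, y_1 - 1),$$

with $H$ taken as zero when either index is negative. Accordingly, the IHOG of a block is obtained by concatenating the IHOG vectors of its four cells. The IHOG method needs to scan the entire image only once and store the integral gradient data. The HOG feature of any area can then be obtained with simple addition and subtraction operations, without recalculating the gradient orientation and magnitude of each pixel.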
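The integral histogram idea can be sketched in a few lines of pure Python. This is a minimal illustration assuming hard binning into 9 unsigned orientation bins and no block normalization (neither detail is specified in the text). The function names and the zero-padded table layout are choices made for this example, not the authors' implementation.

```python
import math

BINS = 9  # unsigned orientation bins over [0, pi)


def pixel_histogram(gx: float, gy: float) -> list:
    """9-bin vote of one pixel: its magnitude goes into its orientation bin."""
    magnitude = math.hypot(gx, gy)
    angle = math.atan2(gy, gx) % math.pi          # unsigned orientation
    k = min(int(angle / (math.pi / BINS)), BINS - 1)
    hist = [0.0] * BINS
    hist[k] = magnitude
    return hist


def integral_histogram(gx, gy):
    """Table T with T[y][x] = sum of pixel histograms over rows < y, cols < x.

    gx and gy are lists of rows. The table has one extra zero row and column.
    """
    height, width = len(gx), len(gx[0])
    table = [[[0.0] * BINS for _ in range(width + 1)]
             for _ in range(height + 1)]
    for y in range(height):
        row_sum = [0.0] * BINS                    # running sum along x
        for x in range(width):
            hist = pixel_histogram(gx[y][x], gy[y][x])
            for k in range(BINS):
                row_sum[k] += hist[k]
                table[y + 1][x + 1][k] = table[y][x + 1][k] + row_sum[k]
    return table


def region_histogram(table, x0: int, y0: int, x1: int, y1: int) -> list:
    """Histogram over pixels x0 <= x < x1, y0 <= y < y1 from four lookups."""
    return [table[y1][x1][k] - table[y0][x1][k]
            - table[y1][x0][k] + table[y0][x0][k] for k in range(BINS)]


def block_descriptor(table, x0: int, y0: int,
                     cell_px: int = 8, block_cells: int = 2) -> list:
    """Concatenate the cell histograms of one block (36 values by default)."""
    descriptor = []
    for cy in range(block_cells):
        for cx in range(block_cells):
            xa, ya = x0 + cx * cell_px, y0 + cy * cell_px
            descriptor.extend(
                region_histogram(table, xa, ya, xa + cell_px, ya + cell_px))
    return descriptor


if __name__ == "__main__":
    gx = [[1.0] * 16 for _ in range(16)]   # uniform horizontal gradient
    gy = [[0.0] * 16 for _ in range(16)]
    table = integral_histogram(gx, gy)
    print(len(block_descriptor(table, 0, 0)))
```

Once the table is built, each cell histogram costs four lookups per bin regardless of the cell size, which is where the speed-up over recomputing gradients per window comes from.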

4.2. Sample Selection for Training

For pedestrian recognition in the urban road environment, we build a pedestrian sample dataset called RSPerson (Person Dataset of Road System). In this dataset, the positive samples include walking pedestrians, pedestrians standing still, and groups of pedestrians with different sizes, poses, gaits, and clothing. Preliminary experiments have shown that the selection of negative samples is particularly important for reducing false alarms. Thus, tree trunks, trash cans, telegraph poles, and bushes, which are likely to be mistaken for pedestrians, as well as ordinary objects such as roads, vehicles, and other infrastructure, are selected to form the negative samples. This choice is beneficial for our pedestrian detection system. In the RSPerson dataset, each sample image is normalized to $64 \times 128$ pixels for training. Figure 8 shows some samples of the RSPerson dataset.

4.3. Pedestrian Recognition with SVM

Before recognizing pedestrians online, we construct a classifier trained offline by the SVM algorithm. First, a training dataset and a test dataset are built from the RSPerson dataset. The training dataset includes 2000 pedestrian and 2000 nonpedestrian samples, and the test dataset includes 500 pedestrian and 500 nonpedestrian samples. The training samples are processed, and their features are extracted to form training vectors. The proper parameters of the SVM are selected by cross-validation based on a grid search. An RBF kernel is chosen as the kernel function, and the penalty factor and the kernel parameter are set to the values found by this search. After that, the pedestrian classifier is constructed. Finally, the test samples are used to test the performance of the classifier. We use the DET curve, which plots two indicators, miss rate and FPPW (false positives per window), to evaluate the performance of SVM classifiers. The performance of pedestrian recognition based on ROI-IHOG is shown in Figure 9.
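The two DET indicators can be computed from classifier scores as in the sketch below. It assumes that each test window has a real-valued decision score and that a window is labeled as a pedestrian when its score reaches the threshold. The function names and the example scores are hypothetical.

```python
# Miss rate and false positives per window (FPPW) at a decision threshold.
# Assumption: score >= threshold means the window is classified as pedestrian.

def det_point(scores_pos, scores_neg, threshold: float):
    """Return (miss_rate, fppw) for one threshold.

    scores_pos: scores of windows that contain a pedestrian.
    scores_neg: scores of windows that do not contain a pedestrian.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both score lists must be non-empty")
    misses = sum(1 for s in scores_pos if s < threshold)
    false_positives = sum(1 for s in scores_neg if s >= threshold)
    return misses / len(scores_pos), false_positives / len(scores_neg)


def det_curve(scores_pos, scores_neg):
    """DET points (threshold, miss_rate, fppw) for every observed score."""
    thresholds = sorted(set(scores_pos) | set(scores_neg))
    return [(t, *det_point(scores_pos, scores_neg, t)) for t in thresholds]


if __name__ == "__main__":
    positives = [0.9, 0.8, 0.2, 0.7]     # hypothetical scores
    negatives = [0.1, 0.3, -0.5, -1.0]   # hypothetical scores
    for point in det_curve(positives, negatives):
        print(point)
```

Sweeping the threshold trades one indicator against the other: raising it lowers FPPW but increases the miss rate.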

For online recognition, once the potential pedestrian locations are found by the laser radar, the candidate regions in the image are determined accordingly by the perspective mapping model. Each candidate region is rescaled to the normalized size of $64 \times 128$ pixels, and then its ROI-IHOG feature vector is extracted. Based on these steps, the classifier trained with SVM judges whether the candidate is a true pedestrian or not.

5. Experimental Results

To test the validity of the proposed method, several video sequences from realistic urban traffic scenarios are used to assess the performance of our pedestrian recognition experimental platform. First, the pedestrian candidate locations are estimated based on laser radar data processing and the space-image perspective mapping model. Some candidate region segmentation results are shown in Figure 10. In this way, potential pedestrian regions are located in the image, but some other obstacles (poles, shrubs, etc.) are also located as positives.

Second, the proposed ROI-IHOG+SVM algorithm is tested on several video sequences. In this step, pedestrian recognition relies only on ROI-IHOG+SVM applied to the entire image, without fusing the laser information. The recall could reach 93.8% under FPPW. The image size is pixels. The average detection time is about 600 ms/frame. Some detection results are shown in Figure 11.

Finally, fusing the information from the laser and vision sensors, each detected candidate region is scaled to $64 \times 128$ pixels, and its ROI-IHOG feature is extracted. The classifier trained with SVM then decides whether the candidate region is a pedestrian or not. With multisensor fusion, the average detection time is about 18 ms per candidate. Thus, if each image of the video sequence contains 5 candidate regions on average, a frame takes about 90 ms and the processing speed is about 11 frames/s, which satisfies the real-time requirement. Several recognition results (Figure 12) indicate that the proposed pedestrian detection approach based on multisensor fusion performs well and can provide effective support for active pedestrian safety protection.
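The frame-rate figure follows from simple arithmetic on the timings reported above, as the sketch below shows. The function name and the optional `overhead_ms` parameter are hypothetical; the text reports only the per-candidate time and the whole-image time.

```python
# Frame rate implied by a per-candidate classification time.
# Assumption: per-frame overhead outside classification is not reported,
# so overhead_ms defaults to zero.

def frames_per_second(candidates_per_frame: float,
                      ms_per_candidate: float = 18.0,
                      overhead_ms: float = 0.0) -> float:
    """Frames processed per second for a given number of candidates."""
    frame_ms = candidates_per_frame * ms_per_candidate + overhead_ms
    if frame_ms <= 0:
        raise ValueError("frame time must be positive")
    return 1000.0 / frame_ms


if __name__ == "__main__":
    fused = frames_per_second(5)      # 5 candidates at 18 ms each
    full_image = 1000.0 / 600.0       # whole-image search at 600 ms/frame
    print(f"fused: {fused:.1f} frames/s, full image: {full_image:.1f} frames/s")
    print(f"speed-up: {fused / full_image:.1f}x")
```

The estimate scales inversely with the number of candidates, so cluttered scenes with many laser targets lower the frame rate proportionally.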

6. Conclusions

A fast pedestrian recognition algorithm based on multisensor fusion is developed in this paper. Potential pedestrian candidate regions are located by laser scanning and the perspective mapping model, and the ROI-IHOG feature extraction method is then proposed to reduce the computational time cost. Moreover, SVM is used with a novel pedestrian sample dataset adapted to the urban road environment for online recognition. Pedestrian recognition is tested with radar alone, with vision alone, and with the two sensors fused. The fusion-based pedestrian recognition shows reliable and time-efficient performance. The processing speed reaches about 11 frames/s, which satisfies the real-time requirement. In future work, we will further study key technologies for pedestrian safety, such as pedestrian tracking, pedestrian behavior recognition, and conflict analysis between pedestrians and the host vehicle.

Acknowledgments

This work is partly supported by the National Science Foundation of China (nos. 51108208, 51278220), the Postdoctoral Science Foundation Funded Project of China (no. 20110491307), and the Fundamental Research Funds for the Central Universities of China (no. 201103146).