Abstract

This paper describes the design of a fast road information recognition system based on mobile terminals. First, based on the HOG algorithm, we study and verify the effects of different parameters on the performance of the algorithm. Second, we test 800 images randomly selected from the INRIA pedestrian dataset to obtain the optimal parameters for the mobile terminal and the ratio of detection window to video resolution. Then, under the same test conditions, we record the time overheads of SVMLight and LibSVM; the SVMLight training time is significantly less than that of LibSVM. Third, we design and implement a real-time road information recognition and warning system on the Windows and Android platforms. Its features include real-time pedestrian detection, voice warning, and road sign recognition. When the vehicle speed is less than 30 km/h, the video resolution is less than 720 × 576, and the detection window/image ratio is less than 1 : 50, the system can guarantee low delay and a high recognition rate (97.2%).

1. Introduction

The World Health Organization reported that about 1.24 million people die in road traffic accidents around the world each year [1]. According to the National Bureau of Statistics, vehicle ownership in China reached 137.4 million in 2013. In the decade from 2001 to 2010, nearly nine hundred thousand people in China died in various types of road accidents. The Chinese National Highway Traffic Safety Administration said that about ninety percent of all accidents happened because of driver negligence. A real-time road information collection and warning system has therefore become an urgent need. At the same time, the number of smartphone users in China has exceeded 500 million, and all kinds of mobile terminals (mobile phones, tablets, and ultrabooks) are growing in popularity. If mobile terminals could be combined with traditional driver assistance systems, the added value of mobile terminals would increase, and the gap left in vehicles without an assistance system could be filled to a certain extent. The traffic accidents caused by driver inattention are shown in Figure 1.

The number of mobile phone users in China has exceeded one billion, and 47 percent of mobile phone owners use a smartphone. According to statistical data published by IDC in 2012 (Table 1), the market share of Android mobile phones reached 68.8%, and the number of Android mobile phones in China was nearly 350 million. Table 1 lists global mobile phone shipments and the shares of the major operating systems.

2. Previous Work

2.1. Pedestrian Detection

Pedestrian detection systems (PDS) have been one of the most active research topics in computer vision and intelligent vehicles. Using a camera mounted on the vehicle to detect pedestrians, such a system can give the driver an early warning if a pedestrian ahead is in possible danger.

Broggi et al. [2] build a model of the head and shoulders from binary image templates of different sizes and identify pedestrians by comparing the edges of the input image with the templates; the method was used in the ARGO intelligent vehicle project at the University of Parma, Italy. Collins et al. [3] present a method based on optical flow, which calculates the residual flow in the moving area to detect pedestrians. Papageorgiou and Poggio [4] describe an object class in terms of an overcomplete dictionary of local, oriented, multiscale intensity differences between adjacent regions, efficiently computable as a Haar wavelet transform. Viola et al. [5] use AdaBoost to take advantage of both motion and appearance information to detect a walking person. Ronfard et al. [6] build on a general “body plan” methodology and a dynamic programming approach for efficiently assembling candidate parts into “pictorial structures.” Dalal and Triggs [7] propose the histogram of oriented gradients (HOG) algorithm and show that it performs very well in pedestrian detection.

HOG features are widely used for object detection. They represent the local appearance and shape of an image by the distribution of local intensity gradients or edge directions. The image is divided into small cells. A histogram of oriented gradients is computed for each cell, and the result is normalized over blocks of cells. The normalized block descriptor is called the HOG descriptor. Figure 2 shows the main steps of the algorithm.

Walk et al. [8] propose an improved HOG algorithm based on the histograms of flow (HOF) feature and the color self-similarity (CSS) feature, used together with an HIK SVM classifier.

2.2. Dataset

Researchers in the field of pedestrian detection have set up several open datasets for testing. The MIT pedestrian dataset, shown in Figure 3, is relatively simple. The INRIA person dataset, shown in Figure 4, is currently used for testing and evaluation; it contains complex backgrounds and a wider variety of human poses.

3. Overview of the Method

We design a real-time road information recognition and warning system for the Windows and Android platforms based on the HOG algorithm. Its features include real-time pedestrian detection, voice warning, and road sign recognition. First, based on the HOG algorithm, we study and verify the effects of different parameters on the performance of the algorithm. Second, we test 800 images randomly selected from the INRIA pedestrian dataset to obtain the optimal parameters for the mobile terminal and the ratio of detection window to video resolution. Then, under the same test conditions, we record the time overheads of both SVMLight and LibSVM.

Figure 5 shows the technology roadmap. A single camera, which is standard on mobile devices, is placed in the vehicle to capture road information. Processing this information includes pedestrian detection and road sign text extraction.

Figure 6 shows the system solution. The driver's attention should stay firmly fixed on the road ahead. If warnings were only displayed on the screen, the driver might look at the screen frequently, which would increase the chance of drawing attention away from the task of driving. We therefore design a voice broadcast to reduce this risk.

Our experimental environment is described in Tables 2 and 3. Table 2 lists the hardware environment.

Table 3 lists the development environment.

4. Kernel Algorithm for Pedestrian Detection

Figure 7 shows the main results of the HOG-based pedestrian detection algorithm.

4.1. Gamma Correction

Gamma correction can suppress noise and improve robustness to partial shade. Gamma normalization is defined as

$$I(x, y) = I(x, y)^{\gamma}.$$

We selected 100 images to test the recognition rate for different values of gamma. Figure 8 shows a sample of gamma correction (normalized).

Table 4 shows the mistake rate and missing rate for different gamma values.

Experiments show that, with a sample size of 100 and gamma values from 0.5 to 2, gamma normalization did not improve the performance of the algorithm.
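The gamma normalization tested above can be illustrated with a short Python sketch. It is not the implementation used in the experiments: the function name `gamma_correct` is hypothetical, and the sketch assumes 8-bit grayscale pixels that are scaled to [0, 1] before the power law is applied and scaled back afterwards.

```python
def gamma_correct(image, gamma):
    """Apply power-law (gamma) correction to an 8-bit grayscale image.

    `image` is a list of rows of integers in [0, 255]. Pixels are scaled
    to [0, 1] before exponentiation (an assumption of this sketch).
    """
    corrected = []
    for row in image:
        corrected.append([round(255.0 * (p / 255.0) ** gamma) for p in row])
    return corrected


if __name__ == "__main__":
    sample = [[0, 64, 128, 255]]
    # The source tests gamma values between 0.5 and 2.
    for g in (0.5, 1.0, 2.0):
        print(g, gamma_correct(sample, g))
```

A gamma below 1 brightens dark regions and a gamma above 1 darkens them, while black and white pixels stay unchanged.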

4.2. Gradient Calculation and the Template Selection

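The gradient of pixel $(x, y)$ is defined as

$$G_x(x, y) = H(x + 1, y) - H(x - 1, y), \qquad G_y(x, y) = H(x, y + 1) - H(x, y - 1).$$

Gradient magnitude is computed using the following:

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}.$$

Gradient direction is computed using the following:

$$\alpha(x, y) = \arctan\left(\frac{G_y(x, y)}{G_x(x, y)}\right).$$

Here $H(x, y)$ is the pixel value at $(x, y)$, $G_x(x, y)$ is the horizontal gradient at $(x, y)$, $G_y(x, y)$ is the vertical gradient at $(x, y)$, $G(x, y)$ is the gradient magnitude at $(x, y)$, and $\alpha(x, y)$ is the gradient direction at $(x, y)$.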
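The gradient magnitude and direction defined above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes centred horizontal and vertical differences, replicates border pixels, uses `atan2` instead of a plain arctangent to avoid division by zero, and folds the direction into the unsigned range [0, 180) degrees. The function name `gradients` is hypothetical.

```python
import math


def gradients(image):
    """Return (magnitude, direction) maps for a grayscale image.

    `image` is a list of rows of numbers. Centred differences are used,
    and border pixels are replicated (both are assumptions of this sketch).
    Directions are unsigned angles in degrees, in [0, 180).
    """
    h, w = len(image), len(image[0])

    def px(x, y):
        # Clamp coordinates so that border pixels are replicated.
        x = min(max(x, 0), w - 1)
        y = min(max(y, 0), h - 1)
        return image[y][x]

    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = px(x + 1, y) - px(x - 1, y)  # horizontal gradient
            gy = px(x, y + 1) - px(x, y - 1)  # vertical gradient
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang
```

For an image whose bottom row is brighter than the rows above it, the centre pixels of the middle row get a purely vertical gradient, that is, a direction of 90 degrees.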

We process the same image with different gradient templates. Figure 9 shows the resulting edge images.

Panel (b), processed with a simple template [], does not fully show the outline of the person. Panels (d) and (g), processed with [] and the Sobel operator, respectively, are severely blurred. With the diagonal I template, the left side of (e) is not clear, and with the diagonal II template, the right side of (f) is not clear. Panel (c), processed with a simple template [], is the best one because the human contour can be clearly distinguished from the background.

4.3. Bin

Using the INRIA person training set, we test the influence of the gradient space and the number of bins on the recognition rate. Figure 10 shows how the number of orientation bins influences pedestrian detection.
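To make the role of the bin count concrete, the following Python sketch accumulates the gradient magnitudes of one cell into orientation bins. It is a simplified illustration: the default of 9 bins over [0, 180) degrees is an assumption (the text above does not state the values tested), votes go to a single bin without interpolation, and the function name `cell_histogram` is hypothetical.

```python
def cell_histogram(magnitudes, angles, n_bins=9):
    """Build the orientation histogram of one cell.

    `magnitudes` and `angles` are flat sequences with one entry per pixel
    of the cell; angles are in degrees. Each pixel votes with its gradient
    magnitude for the single bin containing its unsigned direction.
    """
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for m, a in zip(magnitudes, angles):
        index = int((a % 180.0) // bin_width) % n_bins
        hist[index] += m
    return hist
```

With more bins each bin covers a narrower range of directions, so the histogram describes edge orientation more finely, at the cost of a longer descriptor.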

4.4. Normalization and Descriptor Block

Because of changing levels of local light and changing contrast between the background and the foreground, gradient magnitudes vary over a very large range. Local contrast normalization is used to deal with this problem. The cells are grouped into larger blocks, and contrast normalization is applied to each block. The final descriptor is the vector of cell histograms from all blocks in the detection window. To improve the recognition rate, adjacent blocks overlap by 0.5.
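The block normalization described above can be sketched as follows. The text does not say which norm is used, so the L2 norm here is an assumption, as are the function names `l2_normalize` and `block_descriptors`. With 2 × 2 blocks, a stride of one cell corresponds to the 0.5 overlap between adjacent blocks.

```python
import math


def l2_normalize(vec, eps=1e-6):
    """Scale a vector to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]


def block_descriptors(cell_hists, block_cells=2, stride_cells=1):
    """Concatenate the normalized histograms of all overlapping blocks.

    `cell_hists[row][col]` is the orientation histogram of one cell.
    Each block spans block_cells x block_cells cells and is normalized
    independently before being appended to the descriptor.
    """
    rows, cols = len(cell_hists), len(cell_hists[0])
    descriptor = []
    for by in range(0, rows - block_cells + 1, stride_cells):
        for bx in range(0, cols - block_cells + 1, stride_cells):
            block = []
            for cy in range(by, by + block_cells):
                for cx in range(bx, bx + block_cells):
                    block.extend(cell_hists[cy][cx])
            descriptor.extend(l2_normalize(block))
    return descriptor
```

Because blocks overlap, each interior cell contributes to several blocks and is normalized against different neighbourhoods each time.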

Figure 11 shows the influence of different block sizes on pedestrian detection.

Cell size is the number of pixels in a cell, and a block is composed of cells. The vertical axis is the recognition rate of pedestrian detection. Experiments show that the pedestrian recognition rate decreases as the cell becomes larger.

The performance of a 2 × 2 or 3 × 3 block size is better than that of a 1 × 1 or 4 × 4 block size. When the block size is 2 × 2 and the cell size is 8 × 8, the pedestrian recognition rate is the best.
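The effect of cell and block size on the descriptor can be made tangible by counting its length. The sketch below is illustrative only: the function name `descriptor_length` is hypothetical, the 9 orientation bins are an assumption, and blocks are assumed to advance by one cell, which equals the 0.5 overlap only for 2 × 2 blocks.

```python
def descriptor_length(win_w=64, win_h=128, cell=8, block_cells=2,
                      stride_cells=1, n_bins=9):
    """Number of values in the HOG descriptor of one detection window."""
    cells_x, cells_y = win_w // cell, win_h // cell
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    return blocks_x * blocks_y * block_cells * block_cells * n_bins


if __name__ == "__main__":
    # Compare the block sizes discussed in the text for 8 x 8 cells.
    for b in (1, 2, 3, 4):
        print(b, descriptor_length(block_cells=b))
```

Under these assumptions, a 64 × 128 window with 8 × 8 cells and 2 × 2 blocks has 7 × 15 blocks of 36 values each, so the descriptor length and the cost of classifying a window grow quickly with block size.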

4.5. Detection Window

The experiment involves 100 groups of images (64 × 128 detection window), and we perform detection at different video resolutions. The time overhead of computing the HOG features is recorded in Table 5.
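One reason the time overhead depends on video resolution is the number of window positions that must be evaluated per frame. The following sketch counts them for a single scale; the 8-pixel stride, the function name `count_windows`, and the two smaller resolutions are assumptions made for illustration and are not taken from Table 5.

```python
def count_windows(image_w, image_h, win_w=64, win_h=128, stride=8):
    """Count sliding-window positions at a single scale."""
    if image_w < win_w or image_h < win_h:
        return 0
    across = (image_w - win_w) // stride + 1
    down = (image_h - win_h) // stride + 1
    return across * down


if __name__ == "__main__":
    for w, h in [(320, 240), (640, 480), (720, 576)]:
        print(w, h, count_windows(w, h))
```

A multi-scale detector repeats this count for every level of an image pyramid, so the real workload is larger than the single-scale figure.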

By using multiple threads, a frame cache, and GPU technology, the system delay is kept to no more than 453.1 ms. Based on the above results, we conclude that as long as the video resolution is less than 720 × 576 and the detection window/image ratio is less than 1 : 50, the system can guarantee low delay and a high recognition rate.
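The text does not define how the detection window/image ratio is measured. One plausible reading, assumed here, is the ratio of window area to frame area; the helper name `window_image_ratio` is hypothetical. Under that reading, a 64 × 128 window in a 720 × 576 frame lies very close to the stated 1 : 50 threshold.

```python
from fractions import Fraction


def window_image_ratio(win_w, win_h, image_w, image_h):
    """Ratio of detection-window area to frame area, as an exact fraction."""
    return Fraction(win_w * win_h, image_w * image_h)


if __name__ == "__main__":
    ratio = window_image_ratio(64, 128, 720, 576)
    # Print the exact fraction and the equivalent "1 : N" form.
    print(ratio, "= 1 :", round(float(1 / ratio), 1))
```

If the ratio is instead meant as a ratio of side lengths, the numbers differ, so this should be read as an interpretation rather than the authors' definition.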

4.6. SVM

Training on 20 groups of HOG features under the same test conditions, we obtain the time overheads of SVMLight and LibSVM listed in Table 6.

Experiments show that, when the number of positive samples equals the number of negative samples and the resolution is the same, the SVMLight training time is significantly less than that of LibSVM.

Some positive samples are shown in Figure 12.

Some negative samples are shown in Figure 13.

Some hard examples are shown in Figure 14.
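The text does not describe how the hard examples are collected. A common practice with HOG and a linear SVM, assumed here, is to run an initial classifier over windows from negative images and keep the windows it wrongly accepts. The sketch below illustrates that idea; the function names and the toy weights are hypothetical.

```python
def svm_score(weights, bias, features):
    """Decision value of a linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mine_hard_negatives(weights, bias, negative_features):
    """Return the negative samples that the classifier scores as positive."""
    return [f for f in negative_features
            if svm_score(weights, bias, f) > 0.0]


if __name__ == "__main__":
    # Toy two-dimensional example with made-up weights.
    w, b = [1.0, -1.0], -0.5
    negatives = [[2.0, 0.0], [0.0, 2.0], [1.0, 0.2]]
    print(mine_hard_negatives(w, b, negatives))
```

The mined samples are then added to the negative set and the classifier is retrained, which usually reduces false alarms on cluttered backgrounds.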

4.7. Programme

Figure 15 shows the details of training and recognition: (a) illustrates the training process, and (b) is a schematic diagram of recognition.

5. Performance Comparison of Training Tools

CvSVM is the SVM training class of OpenCV. OLT (Object Detection and Localization Toolkit) is a HOG algorithm toolkit that supports the Windows and Linux operating systems.

We use OLT to generate the final training file, named Model_4BiSVMLight.alt. Training takes 1 hour and 13 minutes in the Ubuntu environment and 47 minutes in Windows. When the model file trained by OLT is tested on the INRIA pedestrian dataset, the correct rate increases to 97.2% (280/288).

6. Result

The actual test results show that the stable range of the pedestrian detection system is 2–12 meters and the valid range is 2–15 meters. The measured results are shown in Figure 16.

Combined with the relationship between speed and braking distance (Figure 17), we conclude that when the speed is less than 30 km/h, the system can support the driver effectively.
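To relate the speed limit to the detection range, it helps to estimate how far the vehicle travels while the system is still processing a frame. The sketch below uses the 453.1 ms delay reported in Section 4.5; it deliberately ignores driver reaction time and braking distance, which come from Figure 17 and are not reproduced here. The function name `distance_during_delay` is hypothetical.

```python
def distance_during_delay(speed_kmh, delay_ms):
    """Distance in meters covered at constant speed during the system delay."""
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return speed_ms * (delay_ms / 1000.0)


if __name__ == "__main__":
    d = distance_during_delay(30.0, 453.1)
    print(f"{d:.2f} m travelled before the warning can be issued")
    # Distance left inside the 12 m stable range once the delay has elapsed.
    print(f"{12.0 - d:.2f} m remaining")
```

At 30 km/h the vehicle covers a little under 4 meters during the delay, which leaves most of the 2–12 meter stable range for the driver to react and brake.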

7. Conclusion

When the block size is 2 × 2, the cell size is 8 × 8, and the simple gradient template of Figure 9(c) is used, the pedestrian recognition rate is the best. When the vehicle speed is less than 30 km/h, the video resolution is less than 720 × 576, and the detection window/image ratio is less than 1 : 50, the system can guarantee low delay (483 ms) and a high recognition rate (97.2%).

However, the recognition rate is still low when the object is partly occluded, and the supported vehicle speed is lower than average. Further research is needed to solve these problems.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.