Abstract

The use of night vision systems in vehicles is becoming increasingly common. Several approaches using infrared sensors have been proposed in the literature to detect vehicles in far infrared (FIR) images. However, these systems still have low vehicle detection rates, and their performance could be improved. This paper presents a novel method to detect vehicles using a far infrared automotive sensor. First, vehicle candidates are generated by applying a constant threshold to the infrared frame. Contours are then generated using a local adaptive threshold based on maximum distance, which decreases the number of regions to be classified and reduces the false positive rate. Finally, vehicle candidates are verified using a deep belief network (DBN) based classifier. A detection rate of 93.9% is achieved on a database of 5000 images and video streams. This result is approximately a 2.5% improvement over previously reported methods, and the false detection rate is also the lowest among them.

1. Introduction

On average, at least one person dies in a vehicle crash every minute worldwide. Auto accidents also injure at least ten million people each year, two or three million of them seriously [1]. To address this problem, Advanced Driver Assistance Systems (ADAS) are increasingly used to provide assistance and supplementary information to drivers. Current ADAS developments include many functions such as lane departure warning, forward collision warning, parking assistance, and night vision enhancement [2].

The vision sensor is one of the most popular sensors in ADAS, and many algorithms are designed around it, including vehicle detection, lane detection, pedestrian detection, and traffic sign recognition [3–6]. Among these tasks, vehicle detection is a popular research area. Monocular vision is often used for this task: Sivaraman's model treats vehicle detection as a two-class classification problem and trains an Adaboost classifier with Haar features [7]. Stereo vision is another common approach: Hermes et al. use stereo vision and propose a vehicle detection method based on density map clustering [8]. Additionally, motion based algorithms have been used for overtaking vehicle detection [9].

A large proportion of road accidents occur in conditions of low visibility, such as at night. Statistical data also demonstrate that more than half of fatal accidents occur at night. However, most existing vehicle detection systems and algorithms focus on daylight vehicle detection with visible spectrum cameras. Although some researchers have put effort into night-time vehicle detection based on vehicle lamp detection, the detection results are often affected by factors such as low illumination, light reflection on rainy days, and the camera exposure time [10]. To compensate for these limitations of visible spectrum cameras, far infrared (FIR) sensors are increasingly employed for night-time vehicle detection. They do not require any illumination source and rely purely on heat signatures from the environment to produce a gray scale image. FIR sensors can therefore detect the infrared heat signatures generated by vehicle parts such as the engine, wheels, and exhaust pipe. A group of typical images captured at night by a visible spectrum camera and an FIR camera is shown in Figure 1.

Compared with a visible spectrum camera, the image captured by an FIR camera lacks color and detailed information. Until now, not much work has been done specifically on vehicle detection in FIR images. In 2000, Andreone (University of Parma) designed an FIR camera based vehicle detection prototype car, which detected vehicles mainly by relying on the size, shape, and content distribution of high intensity image areas [11]. In 2010, Besbes et al. introduced a machine learning framework to this task and designed a vehicle detection method with SURF-based features and an SVM classifier [12]. Many researchers have also focused on the similar problem of detecting pedestrians in FIR images. Most of the proposed methods use a two-class classification framework with different features, such as HOG, and different classifiers, such as SVM and Adaboost [13, 14].

A review of the approaches proposed in the literature finds that vehicle detection in an FIR image is most commonly performed using a two-class classification framework. A newer approach in pattern recognition, deep learning, has so far been used in only a few studies. Classifiers such as SVM and Adaboost are shallow learning models because they can be modeled as a structure with one input layer, one hidden layer, and one output layer. Deep learning refers to a class of machine learning techniques that exploit hierarchical architectures for representation learning and pattern classification. In contrast to shallow models, deep learning can learn multiple levels of representation and abstraction, enabling a better understanding of the image data. The deep belief network (DBN) is a typical deep learning structure; it was first proposed by Hinton and has demonstrated success on simple image classification tasks such as MNIST [15].

In this work, a deep learning based vehicle detection algorithm for FIR images is proposed. In Section 2, the vehicle candidate generation method for FIR images is described. Candidate shape segmentation is then performed with a contour generation method in Section 3. In Section 4, vehicle verification is implemented using shape feature vectors and a deep belief network. The experiments and conclusion are presented in Sections 5 and 6, respectively.

2. Vehicle Candidate Generation

For vehicle detection in an image, two steps are usually performed: vehicle candidate generation (VCG) and vehicle candidate verification (VCV) [16]. In VCG, all image areas that could possibly contain vehicles are selected. In this step, prior knowledge of vehicles is often used, such as horizontal/vertical edges, symmetry, color, shadow, and texture. In VCV, the image areas selected in VCG are further verified to eliminate those that are not vehicles. In this step, a two-class classification framework is often used, and a classifier that can distinguish vehicles from nonvehicles is trained on a set of training images. Our work also follows this two-step framework.

FIR images reflect information on the temperature of objects. Therefore, an FIR image cannot show the detailed information available in the visible domain, such as texture, color, and shadows. Obviously, existing vehicle detection algorithms designed for visible spectrum cameras are not suitable for far infrared cameras because of the inherent differences between images generated in the IR and visible spectra. For this reason, vehicle candidate generation mainly focuses on hot spots in the image.

In the VCG step of our method, a low threshold is first applied to the pixel values. Pixels with values lower than the threshold are considered to be low temperature areas and are removed, while hot spots in the image are preserved. Since the relationship between the temperature of an object and its brightness in the FIR image is constant for a specific FIR camera, the threshold can be chosen manually. In our application, the threshold is set at 150, which corresponds to a temperature of 30°C. Figure 2 shows the original FIR image and the processed image.
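As an illustration, a minimal sketch of this thresholding step is given below, assuming OpenCV and an 8-bit single-channel input frame; the threshold value of 150 comes from the text, while the function and variable names are our own.

```python
import cv2

# Pixel value 150 corresponds to roughly 30 degrees C for our camera.
HOT_SPOT_THRESHOLD = 150

def extract_hot_spots(fir_frame):
    """Remove low-temperature pixels from an 8-bit FIR frame.

    Returns a binary mask: hot pixels become 255, cold pixels become 0.
    """
    _, hot_mask = cv2.threshold(fir_frame, HOT_SPOT_THRESHOLD, 255,
                                cv2.THRESH_BINARY)
    return hot_mask
```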

In the processed image, there obviously exist many hot spots that do not belong to vehicles, such as the hot road surface, hot road lamps, or other sources. To eliminate these sources of interference, a connected region searching algorithm is applied to the processed image, and all regions that fail either of the rules below are considered to be nonvehicle hot spots:

(1) Rule 1: connected regions with a length/width ratio below 0.3 or above 1.5 are considered to be nonvehicle hot spots.

(2) Rule 2: connected regions with fewer than 120 pixels are considered to be nonvehicle hot spots.

This further processing eliminates many nonvehicle hot spots. The remaining hot spots will all be considered to be vehicle candidates. As seen in Figure 3, two vehicle candidates are identified in this particular FIR image.
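A hedged sketch of this rule-based filtering is shown below; it uses OpenCV's connected component statistics and interprets the length/width ratio of Rule 1 as height over width, which is an assumption on our part.

```python
import cv2

def filter_vehicle_candidates(hot_mask):
    """Apply Rules 1 and 2 to the connected regions of the hot-spot mask.

    Returns the bounding boxes (x, y, w, h) of the surviving candidates.
    """
    n_labels, _, stats, _ = cv2.connectedComponentsWithStats(hot_mask)
    candidates = []
    for label in range(1, n_labels):        # label 0 is the background
        x, y, w, h, area = stats[label]
        ratio = h / w                       # Rule 1: length/width ratio
        if ratio < 0.3 or ratio > 1.5:
            continue
        if area < 120:                      # Rule 2: minimum pixel count
            continue
        candidates.append((x, y, w, h))
    return candidates
```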

3. Vehicle Candidate Contour Segmentation

In a traditional visible spectrum image-based vehicle detection framework, a two-class classifier is trained and used to classify all identified vehicle candidates. However, traditional image feature descriptors such as Haar and HOG are better suited to representing local pixel information, and in an FIR image such detailed information is scarce. Therefore, features such as the vehicle candidate shape or contours are used instead, which are based on global information rather than pixel values. In this section, contour segmentation is first performed to segment the full vehicle candidate objects.

One way to find continuous contours in a gray scale image is to use edge detection and trace continuous contours in the edge image. However, since edges are generally discrete and independent, the contours generated this way may appear broken and branching. Another method first transforms the gray scale image into a binary image and then uses the chain code method to generate contours in the binary image. For this method, it is critical that the binary image is of good quality. Traditional binary segmentation methods are usually based on a global threshold obtained from a pixel histogram. This type of method is suitable when the foreground has small internal variations, such as pedestrians in an FIR image. However, in our application, the lower part of the vehicle, such as the wheels, is usually brighter than the upper part, such as the windows. Therefore, a global threshold based method could easily eliminate the upper part of vehicles in an FIR image.

Based on the analysis above, a maximum distance based local adaptive threshold determination method is proposed to generate the binary segmentation. This method firstly sets a global threshold based on a histogram to get a binary image. Then, a maximum distance based local threshold will be further decided for each subregion around the edge of the binary image that has been generated. The main steps of the local threshold determination method are given below.

(1) Set the processing area of each vehicle candidate. Specifically, the original vehicle candidate areas are expanded by a factor of 2 vertically and 1.5 horizontally.

(2) Apply a median filter on the vehicle candidate area to eliminate obvious noise.

(3) Let $I_c$ denote the vehicle candidate area image. Set a global threshold on $I_c$ using the OTSU method to obtain a binary image $B$. Fill in the blank areas of $B$ and get the edge image of $B$, which is denoted $E$.

(4) Choose edge points with a five-point gap in $E$. Then, take each selected edge point as the center of a small neighborhood region and obtain a new local threshold for that region, decided using the maximum distance based method. This method is based on the concept that a threshold should divide a gray scale histogram into two parts: the best threshold produces the largest gap between the mean values of the two parts and the mean value of the whole image. The distance measurement function is

$$D(t) = \left|\mu_0(t) - \mu\right| + \left|\mu_1(t) - \mu\right|,$$

with

$$\mu = \sum_{i=0}^{L-1} i\,p_i, \qquad \mu_0(t) = \frac{\sum_{i=0}^{t} i\,p_i}{\sum_{i=0}^{t} p_i}, \qquad \mu_1(t) = \frac{\sum_{i=t+1}^{L-1} i\,p_i}{\sum_{i=t+1}^{L-1} p_i},$$

where $L$ is the total number of gray scale levels, $p_i$ is the proportion of gray scale level $i$, and $t$ is the threshold, $0 \le t \le L-1$. The threshold maximizing $D(t)$ is selected.

Then, transform the small region into a binary image with this new threshold and write the result back into the corresponding subregion of $B$.

(5) Repeat step (4) until the areas around all of the selected edge points have been processed and a full contour of the vehicle candidate is obtained.
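The core of step (4) is the maximum distance criterion. The sketch below implements the distance measure $D(t)$ as reconstructed above for a single neighborhood region; it is an illustration under our notation, not the authors' code.

```python
import numpy as np

def max_distance_threshold(region):
    """Select a local threshold for a small gray scale region.

    The best threshold t maximizes D(t) = |mu0(t) - mu| + |mu1(t) - mu|,
    the gap between the means of the two histogram parts and the mean
    of the whole region.
    """
    hist = np.bincount(region.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                    # proportion of each gray level
    levels = np.arange(256)
    mu = (levels * p).sum()                  # mean of the whole region

    best_t, best_d = 0, -1.0
    for t in range(255):
        w0, w1 = p[:t + 1].sum(), p[t + 1:].sum()
        if w0 == 0.0 or w1 == 0.0:           # skip degenerate splits
            continue
        mu0 = (levels[:t + 1] * p[:t + 1]).sum() / w0   # mean of lower part
        mu1 = (levels[t + 1:] * p[t + 1:]).sum() / w1   # mean of upper part
        d = abs(mu0 - mu) + abs(mu1 - mu)    # distance measure D(t)
        if d > best_d:
            best_t, best_d = t, d
    return best_t
```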

The processed result of the contour generation method is shown in Figure 4.

4. Vehicle Candidate Verification

In this section, a deep belief network (DBN) based vehicle candidate verification algorithm will be proposed.

Machine learning based methods are very popular and effective for vehicle candidate verification tasks in the visible spectrum. Among the many existing machine learning methods, SVM (support vector machine) and Adaboost are the two most common classifiers [17–20]. However, most classifiers, including these two, are based on a shallow learning model and can be modeled as a structure consisting of one input layer, one output layer, and a single hidden layer. Recently, a new machine learning structure called deep learning has been proposed, which exploits a hierarchical architecture for representation learning and pattern classification. The deep model is superior to existing shallow models because it can learn multiple levels of representation and abstraction of image data.

There are many types of deep architectures, such as deep belief networks (DBN) and deep convolutional neural networks (DCNN). The DBN is a typical deep learning structure, first proposed by Hinton et al. [21], and has been used in many tasks such as MNIST classification, 3D object recognition, and voice recognition. In our work, a DBN is applied and a classifier is trained for the vehicle candidate verification task.

4.1. Deep Belief Network (DBN) for Vehicle Candidate Verification

In this subsection, the overall architecture of the DBN classifier for vehicle candidate verification is first introduced.

Let $X$ be the set of training samples, which contains vehicle contour images and nonvehicle contour images generated manually by our group. $X$ consists of $N$ samples:

$$X = \{x_1, x_2, \ldots, x_N\}.$$

In $X$, $x_i$ is a training sample, and all samples are resized to the same fixed dimensions. $Y$ represents the labels corresponding to $X$, which can be written as

$$Y = \{y_1, y_2, \ldots, y_N\}.$$

In $Y$, $y_i$ is the label vector of $x_i$. If $x_i$ belongs to a vehicle, $y_i = (1, 0)$; otherwise, $y_i = (0, 1)$.

The purpose of the vehicle candidate verification task is to learn the mapping function from the training data to the label data based on a given training set. With this trained mapping function, unknown contour images can be classified as either vehicle or nonvehicle.

In this task, a DBN architecture is applied to address this problem. Figure 5 shows the overall architecture of the DBN. It is a fully interconnected deep belief network including one visible input layer $v$, $n$ hidden layers $h^1, \ldots, h^n$, and one visible label layer $La$ at the top. The number of neurons in the visible input layer equals the dimension of the training feature, that is, the number of pixels of a training sample. At the top, the $La$ layer has just two states, $(1, 0)$ and $(0, 1)$.

The learning process of the DBN has two main steps. In the first step, the parameters of each pair of adjacent layers are refined with the greedy layer-wise reconstruction method. This step is repeated until the parameters of all the hidden layers are fixed, and is also called the pretraining process. In the second step, the whole pretrained DBN is fine-tuned with the $La$ layer information based on back propagation. This second step can be considered the supervised training step.

4.2. Pretraining Method

In this subsection, the pretraining method of the DBN for vehicle candidate verification is presented.

Assume that the size of the upper layer is decided by the bilinear projection method described in Zhong et al.'s work [22]. In this step, the parameters of each pair of adjacent layers are refined using the greedy layer-wise reconstruction method proposed by Hinton et al. [21]. The visible input layer $v$ and the first hidden layer $h^1$ are taken here as an example; the other adjacent layer pairs use the same pretraining method.

The visible input layer $v$ and the first hidden layer $h^1$ construct a Restricted Boltzmann Machine (RBM). Let $I$ be the number of neurons in $v$ and $J$ the number of neurons in $h^1$. The energy of a state $(v, h^1)$ in this RBM is

$$E(v, h^1; \theta) = -\sum_{i=1}^{I}\sum_{j=1}^{J} w_{ij} v_i h_j^1 - \sum_{i=1}^{I} b_i v_i - \sum_{j=1}^{J} a_j h_j^1,$$

where $\theta = (w, b, a)$ are the parameters between the visible input layer $v$ and the first hidden layer $h^1$, $w_{ij}$ is the symmetric weight from input neuron $i$ in $v$ to hidden neuron $j$ in $h^1$, and $b_i$ and $a_j$ are the $i$th and $j$th biases of $v$ and $h^1$, respectively. The RBM then has the following joint distribution:

$$P(v, h^1; \theta) = \frac{1}{Z}\exp\left(-E(v, h^1; \theta)\right).$$

Here, $Z = \sum_{v}\sum_{h^1} \exp(-E(v, h^1; \theta))$ is the normalization parameter, and the probability that the model assigns to a visible vector $v$ is

$$P(v; \theta) = \frac{1}{Z}\sum_{h^1} \exp\left(-E(v, h^1; \theta)\right).$$

Then, the conditional distributions over the visible input states in $v$ and the hidden states in $h^1$ are given by the logistic function:

$$P(h_j^1 = 1 \mid v) = \sigma\Big(\sum_{i=1}^{I} w_{ij} v_i + a_j\Big), \qquad P(v_i = 1 \mid h^1) = \sigma\Big(\sum_{j=1}^{J} w_{ij} h_j^1 + b_i\Big),$$

where $\sigma(x) = 1/(1 + e^{-x})$.

Finally, the weights and biases, initialized with random Gaussian values, can be updated step by step with the Contrastive Divergence algorithm [23]. The update formulas are

$$\Delta w_{ij} = \varepsilon\left(\langle v_i h_j^1 \rangle_{\text{data}} - \langle v_i h_j^1 \rangle_{\text{recon}}\right),$$
$$\Delta b_i = \varepsilon\left(\langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{recon}}\right),$$
$$\Delta a_j = \varepsilon\left(\langle h_j^1 \rangle_{\text{data}} - \langle h_j^1 \rangle_{\text{recon}}\right),$$

where $\varepsilon$ is the learning rate.

Here, $\langle \cdot \rangle_{\text{data}}$ represents the expectation with respect to the data distribution and $\langle \cdot \rangle_{\text{recon}}$ represents the expectation with respect to the reconstruction distribution after one step of Gibbs sampling. The step size is set to 1.
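For concreteness, a minimal numpy sketch of one CD-1 update for a single RBM is given below; the learning rate and initialization details are our assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, a, lr=0.1):
    """One Contrastive Divergence (CD-1) update of an RBM.

    v0: batch of visible vectors, shape (batch, I)
    W:  weights (I, J);  b: visible biases (I,);  a: hidden biases (J,)
    """
    # Positive phase: hidden probabilities given the data.
    h0 = sigmoid(v0 @ W + a)
    # One Gibbs step: sample hiddens, reconstruct visibles, re-infer hiddens.
    h0_sample = (np.random.rand(*h0.shape) < h0).astype(float)
    v1 = sigmoid(h0_sample @ W.T + b)
    h1 = sigmoid(v1 @ W + a)
    # <.>_data - <.>_recon, averaged over the batch.
    n = v0.shape[0]
    W += lr * (v0.T @ h0 - v1.T @ h1) / n
    b += lr * (v0 - v1).mean(axis=0)
    a += lr * (h0 - h1).mean(axis=0)
    return W, b, a
```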

As mentioned previously, the whole pretraining process is performed from the lower layer pairs to the upper layer pairs $(v, h^1), (h^1, h^2), \ldots, (h^{n-1}, h^n)$, one pair at a time.
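The stacking itself can be sketched as follows, reusing the cd1_update and sigmoid helpers above; the layer sizes and epoch counts are placeholders, not the paper's settings.

```python
import numpy as np

def pretrain_dbn(data, layer_sizes, epochs=10, lr=0.1):
    """Greedy layer-wise pretraining of the DBN.

    data: training batch, shape (batch, layer_sizes[0])
    layer_sizes: neuron counts [input, h1, ..., hn]
    """
    weights, hidden_biases = [], []
    v = data
    for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = 0.01 * np.random.randn(n_vis, n_hid)  # small Gaussian init
        b = np.zeros(n_vis)                       # visible biases of this RBM
        a = np.zeros(n_hid)                       # hidden biases of this RBM
        for _ in range(epochs):
            W, b, a = cd1_update(v, W, b, a, lr)
        weights.append(W)
        hidden_biases.append(a)
        v = sigmoid(v @ W + a)  # hidden probabilities feed the next RBM
    return weights, hidden_biases
```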

4.3. Global Fine-Tuning

In the above unsupervised pretraining process, the greedy layer-wise algorithm is used to learn the DBN parameters. In this subsection, a traditional back propagation algorithm is used to fine-tune the parameters using the information of the label layer $La$.

Since the pretraining process has already identified strong initial parameters, the back propagation step only finely adjusts them so that locally optimal parameters can be found. At this stage, the learning objective is to minimize the classification error

$$E = \sum_{i=1}^{N} \left\| y_i - \hat{y}_i \right\|^2,$$

where $y_i$ and $\hat{y}_i$ are the real label and the output label of sample $x_i$ in the $La$ layer.
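A compact numpy sketch of one fine-tuning step is given below, assuming sigmoid activations throughout and the squared-error objective written above; it starts from the pretrained weights and biases and is only an illustration of the back propagation step, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fine_tune_step(x, y, weights, biases, lr=0.01):
    """One back propagation step over the pretrained layer stack.

    x: batch of contour images flattened to vectors, shape (batch, I)
    y: one-hot labels, shape (batch, 2)
    weights, biases: per-layer parameters produced by pretraining.
    """
    # Forward pass, keeping every layer's activations for the backward pass.
    acts = [x]
    for W, b in zip(weights, biases):
        acts.append(sigmoid(acts[-1] @ W + b))
    out = acts[-1]

    # Backward pass for the squared error E = sum ||y - out||^2.
    delta = (out - y) * out * (1.0 - out)
    for k in reversed(range(len(weights))):
        grad_W = acts[k].T @ delta / x.shape[0]
        grad_b = delta.mean(axis=0)
        if k > 0:  # propagate the error down before updating this layer
            delta = (delta @ weights[k].T) * acts[k] * (1.0 - acts[k])
        weights[k] -= lr * grad_W
        biases[k] -= lr * grad_b
    return weights, biases
```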

5. Experiments and Analysis

5.1. DBN Based Vehicle Verification Effect

The proposed DBN based vehicle verification method is trained on our image dataset, captured by a SAT NV628 FIR camera, as shown in Figure 6. The total numbers of samples for training and testing are 2700 and 500, respectively.

Using the proposed method, four different 2D-DBN architectures are applied. They all contain one visible layer and one label layer, but with one, two, three, and four hidden layers, respectively. In training, the critical parameters of the proposed 2D-DBN are kept fixed across the experiments, and the image samples for training are all resized to the same pixel dimensions.

The detection results of the four 2D-DBN architectures and two common shallow models (SVM and Adaboost) are shown in Table 1. The 2D-DBN with three hidden layers achieves the highest detection rate on the test set. It is also seen that the deep architectures perform much better than the existing shallow models.

5.2. System Overall Effect

All the methods described below are tested on the same image dataset containing 5000 images captured by our group. The dataset contains 6382 vehicles: around 37% in near range (less than 25 m), 61% in medium range (25 m to 75 m), and 12% in far range (more than 75 m). Some vehicle candidate generation results are shown in the left column of Figure 7. Based on the vehicle candidate generation results, the DBN based vehicle candidate verification method is then applied. Some vehicle candidate verification results are shown in Figure 7: the left column shows identified vehicle candidates marked in red and the right column shows the verified vehicles marked with blue rectangles.

The overall vehicle detection results are shown in Table 2, together with those of several state-of-the-art vehicle detection methods.

From the results shown in Table 2, it is seen that the proposed vehicle detection framework exhibits the lowest false positive (FP) rate while achieving the highest true positive (TP) rate, which is 2.5% higher than that of Besbes's method [12].

Figure 8 shows a group of vehicle sensing results from a continuous video. The blue rectangles represent correct vehicle detections and the red rectangles represent missed or false detections. From the results, it can be seen that some light spots with shapes similar to vehicles are recognized as vehicles, as in the fourth image. Vehicles that are not viewed from the rear are easily missed. In addition, strong occlusion between vehicles easily causes missed detections, because occluded vehicles may be merged into one object in the vehicle candidate generation step. Generally, most vehicles in the rear view are detected correctly, and pedestrians and bicycles are not falsely detected.

6. Conclusion

In this work, a new method is proposed for night-time vehicle detection in far infrared images. In contrast to existing methods, a maximum distance based local adaptive threshold determination method is proposed to generate vehicle candidate contours, and a deep learning framework is introduced to perform vehicle candidate verification. Overall, this two-step vehicle detection method achieves the highest vehicle detection rate among existing state-of-the-art methods. Additionally, the processing time is below 50 ms per frame, which satisfies the requirements of real-time applications.

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

This work has been supported by the National Natural Science Foundation of China under the Grants 61403172, 61203244, and 51305167, China Postdoctoral Science Foundation (2014M561592), China Postdoctoral Science Foundation Special Funding (2015T80511), Information Technology Research Program of Transport Ministry of China under the Grant 2013364836900, Natural Science Foundation of Jiangsu Province (BK20140555), and Jiangsu University Scientific Research Foundation for Senior Professionals (12JDG010, 14JDG028).