Abstract

This study proposes a new vehicle type recognition method that combines global and local features via a two-stage classification. To extract the continuous and complete global feature, an improved Canny edge detection algorithm with smooth filtering and non-maxima suppression abilities is proposed. To extract the local feature from four partitioned key patches, a set of Gabor wavelet kernels with five scales and eight orientations is introduced. Different from single-stage classification, where all features are incorporated into one classifier simultaneously, the proposed two-stage classification strategy leverages two types of features and classifiers. In the first stage, a vehicle is preliminarily recognized as large or small based on the global feature via a $k$-nearest neighbor probability classifier. Based on the preliminary result, the specific recognition of bus, truck, van, or sedan is achieved based on the local feature via a discriminative sparse representation based classifier. We evaluate the proposed method on public and established datasets involving various challenging cases, such as partial occlusion, poor illumination, and scale variation. Experimental results show that the proposed method outperforms existing state-of-the-art methods.

1. Introduction

Vehicle type recognition (VTR) is a key component of intelligent transportation systems (ITS) and has a wide range of applications, such as traffic flow statistics, intelligent parking systems, electronic toll collection systems, and access control systems [1]. For example, it can be utilized to realize automatic fare collection (AFC) according to vehicle type in paid parking lots or be applied in nonstop toll collection systems to realize automatic toll calculation at highway toll stations. Additionally, it can be used in traffic video monitoring to find and locate vehicles that violate traffic regulations or flee from an accident scene.

With the extensive use of traffic surveillance cameras, image-based VTR methods are attracting more and more attention from researchers. The vehicle face image contains rich information for the VTR, and extracting features from it leads to better recognition results. However, illumination change, scale variation, and partial occlusion badly degrade VTR performance in real-world traffic environments. To improve this performance, researchers have proposed many effective methods. These existing methods mainly consist of two key steps, namely, feature extraction and classifier design, which directly determine how well a VTR method works.

There are many typical features that can be applied to the VTR, such as edge based features [2, 3], color based features [4], symmetry based features [5–7], SIFT descriptor based features [8, 9], HOG descriptor based features [10], and Gabor filter based features [11]. The edge based feature extraction methods extract the edge of the vehicle image with a certain edge operator, such as the Sobel operator. The symmetry based methods use projection or corner detection algorithms that exploit the geometric symmetry of the vehicle face image in the spatial profile to detect and recognize the vehicle. These two kinds of methods can extract the geometrical contour of the vehicle image accurately and quickly with small storage space and little computation time. However, they are easily influenced by adverse factors such as illumination change, scale variation, and partial occlusion; when these factors occur, their feature extraction performance degrades. Therefore, these methods are commonly used to extract the global contour of the vehicle image, and the extracted features only apply to the preliminary recognition in the VTR.

Unlike the two kinds of methods mentioned above, feature extraction methods based on SIFT descriptors, HOG descriptors, or Gabor filters can extract structural details of the vehicle image at multiple scales and orientations, and they are insensitive to illumination change and scale variation. Therefore, they are commonly used for precise recognition. However, because they extract features at multiple scales and orientations, these methods generate a large amount of additional feature information compared with the original image, which increases the computational complexity of VTR algorithms.

Intuitively, global information means the holistically geometrical configuration of vehicle contour, while structural details are embedded in the local variations of vehicle appearance. Therefore, extracting both global geometrical information and local structural details from vehicle images through certain feature extraction methods and leveraging the extracted feature information via suitable classifiers will help improve the performance of the VTR.

In terms of classifier design, typical classifiers include KNN [3, 4], SVM [12–14], and ANN [15]. The KNN classifier has a simple principle and needs no training in advance. However, as the number of samples in the training set increases, its computation time increases accordingly. The methods based on SVM or ANN classifiers can effectively utilize various vehicle features and obtain good classification performance. However, they need to train classifier parameters in advance by collecting many samples of different vehicle types and can easily fall into a local optimum while training the classifier parameters. The classifier based on sparse representation has been successfully applied to face recognition due to its excellent characteristics: it involves no complex parameter training and only needs to consider the original image samples as a dictionary without any additional transformation [16]. Further research finds that if a discriminative dictionary is learned from the original dictionary via a suitable dictionary learning scheme before pattern recognition, then more accurate and reliable classification results can be achieved with the learned dictionary than with the original one [17].

Additionally, the above-mentioned classification methods adopt a single-stage classification strategy; that is, all features are incorporated into one classifier together to recognize the vehicle type. When the number of recognized vehicle types increases, single-stage methods need many training samples to train many classifier parameters, which inevitably increases the difficulty of classifier design for a given recognition performance [18].

To address the aforementioned limitations, this paper proposes a new VTR method that combines global and local features via a two-stage classification, in which the global and local features are jointly applied to the VTR and their advantages in expressing the vehicle's geometrical contour and structural details are leveraged by the proposed two-stage classification strategy. The proposed method enables an accurate and reliable VTR. First, the global feature is used to preliminarily recognize the type of a vehicle from the geometrical contour viewpoint, and the local feature is then used to recognize the specific type from the structural details viewpoint. Second, by exploiting a two-stage classification strategy, the total classification task is appropriately divided between two different classifiers; the design of each classifier is thus simplified, and its design difficulty is lowered accordingly. This improves the overall classification accuracy and reliability of the VTR compared with methods based on the single-stage classification strategy.

This paper advances the research on VTR with the following specific contributions. First, an improved Canny edge detection algorithm with smooth filtering and non-maxima suppression abilities is proposed to extract a continuous and complete global feature of the vehicle image. Second, the whole vehicle image is partitioned into four nonoverlapping patches based on the key parts of a vehicle, and the local feature is extracted from the four partitioned key patches by a set of Gabor wavelet kernels with five scales and eight orientations. When the vehicle is partially occluded, it can still be correctly recognized using the local feature extracted from the nonoccluded patches. Third, a $k$-nearest neighbor probability classifier (KNNPC) with a Hausdorff distance measure is proposed to improve the reliability of the first stage of classification, where the vehicle is preliminarily recognized as a large or small vehicle from the geometrical contour viewpoint. Fourth, a discriminative sparse representation based classifier (DSRC) that adopts a dictionary learning scheme based on the Fisher discrimination criterion is introduced in the second stage of classification, which enables a more specific classification based on the extracted local feature.

The rest of this paper is organized as follows. Section 2 presents the global and local feature extraction methods as well as the image partition method based on the key parts of a vehicle. Section 3 describes a two-stage classification strategy for the VTR. Experiments and analysis are shown in Section 4 to illustrate the effectiveness of the proposed VTR method. The final section summarizes this study and future research directions.

2. Feature Extraction

As mentioned previously, both the global geometrical contour and local structural details of a vehicle play important roles in the VTR. Therefore, there is a need to extract these features through corresponding feature extraction methods. In this paper, the global geometrical contour is extracted by an improved Canny edge detection algorithm with smooth filtering and non-maxima suppression abilities, and the local structural details are extracted by a set of Gabor wavelet kernels with multiple scales and orientations.

2.1. Global Feature Extraction

The edge of vehicle image contains rich contour information of the vehicle. Therefore, it is regarded as a global feature to preliminarily recognize the type of a vehicle in this paper.

Commonly, operators such as Sobel, Roberts, Prewitt, and Canny can be used to extract the edge of a vehicle. However, edge detection algorithms based on a single operator have their own limitations. For example, the Sobel and Prewitt operators can quickly detect the edge of an object but cannot produce a thin edge; therefore, they are unsuitable for accurate localization. The Roberts operator can locate the edge accurately but is sensitive to noise; therefore, it cannot effectively suppress the noise existing in the image. The Canny operator can smooth a strong edge and suppress noise. It can also extract an accurate and complete edge under good illumination; however, when the illumination becomes poor, it cannot detect a weak edge [19].

To achieve a better edge, we propose an edge detection method based on an improved Canny operator to extract the global feature of vehicle images. It exploits a double-threshold algorithm based on the Otsu method to self-adaptively determine the edge of a vehicle under illumination changes. Based on non-maxima suppression and double-threshold judgment, the proposed method can find a continuous and complete edge. The detailed steps are as follows.

Step 1. According to (1), smooth the input image $I(x, y)$ using a Gaussian filter to remove Gaussian noise [20]:

$$S(x, y) = G(x, y; \sigma) * I(x, y), \qquad G(x, y; \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right), \tag{1}$$

where $\sigma^{2}$ is the variance and $*$ indicates the convolution operation. In this paper, the value of $\sigma$ (and the corresponding filter window size) is chosen empirically so that good smoothing results are obtained.
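To make Step 1 concrete, the following minimal Python/NumPy sketch builds the Gaussian kernel $G(x, y; \sigma)$ and smooths an image with it. The function names, the kernel size of 5, and $\sigma = 1.4$ are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=5, sigma=1.4):
    """Build a normalized 2-D Gaussian kernel G(x, y; sigma)."""
    half = size // 2
    x, y = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def smooth(image, size=5, sigma=1.4):
    """Step 1: suppress Gaussian noise by convolving the image with G."""
    return convolve2d(image.astype(float), gaussian_kernel(size, sigma),
                      mode="same", boundary="symm")
```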

Step 2 (calculate gradient magnitude). The gradient of each pixel in the smoothed image $S(x, y)$ is determined by applying the Sobel operator. The Sobel operators for the $x$ and $y$ directions are, respectively,

$$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}.$$

In order to improve real-time performance, the gradient magnitude $M(x, y)$ and gradient direction $\theta(x, y)$ are determined by

$$M(x, y) = |G_x(x, y)| + |G_y(x, y)|, \qquad \theta(x, y) = \arctan\frac{G_y(x, y)}{G_x(x, y)},$$

where $G_x = S_x * S$ and $G_y = S_y * S$.
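A sketch of Step 2 under the same assumptions as above; the `|Gx| + |Gy|` magnitude follows the real-time simplification described in the text:

```python
def sobel_gradients(smoothed):
    """Step 2: gradient magnitude and direction via the Sobel operator."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    sy = sx.T                                     # Sobel kernel for the y direction
    gx = convolve2d(smoothed, sx, mode="same", boundary="symm")
    gy = convolve2d(smoothed, sy, mode="same", boundary="symm")
    magnitude = np.abs(gx) + np.abs(gy)           # fast |Gx| + |Gy| approximation
    direction = np.rad2deg(np.arctan2(gy, gx))    # gradient angle in degrees
    return magnitude, direction
```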

Step 3. Implement non-maxima suppression on the gradient magnitude calculated in Step 2 to determine the candidate edge pixels. We define a $3 \times 3$ mask template that traverses the entire image. In this template, if the gradient magnitude $M(x, y)$ of the central pixel is not less than that of the two neighboring pixels along the gradient direction $\theta(x, y)$, then we keep the maximal gradient magnitude and set the other gradient magnitudes to zero; that is, if $M(x, y)$ is the maximum, then let $N(x, y) = M(x, y)$; otherwise, let $N(x, y) = 0$. The specific comparison process (with directions taken modulo $180^\circ$) is as follows: if $\theta(x, y) \in [0^\circ, 22.5^\circ)$ or $\theta(x, y) \in [157.5^\circ, 180^\circ)$, then we compare $M(x, y)$ with $M(x, y-1)$ and $M(x, y+1)$; if $\theta(x, y) \in [22.5^\circ, 67.5^\circ)$, then we compare $M(x, y)$ with $M(x-1, y+1)$ and $M(x+1, y-1)$; if $\theta(x, y) \in [67.5^\circ, 112.5^\circ)$, then we compare $M(x, y)$ with $M(x-1, y)$ and $M(x+1, y)$; if $\theta(x, y) \in [112.5^\circ, 157.5^\circ)$, then we compare $M(x, y)$ with $M(x-1, y-1)$ and $M(x+1, y+1)$.
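The comparison logic of Step 3 can be sketched as follows (a straightforward loop version of our own; the direction binning mirrors the four cases above):

```python
def non_maxima_suppression(mag, direction):
    """Step 3: keep a pixel only if it is maximal along its gradient direction."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    angle = np.mod(direction, 180.0)              # fold angles into [0, 180)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:            # roughly horizontal gradient
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:                        # roughly 45 degrees
                n1, n2 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            elif a < 112.5:                       # roughly vertical gradient
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:                                 # roughly 135 degrees
                n1, n2 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                out[i, j] = mag[i, j]             # N(x, y) = M(x, y)
    return out                                    # elsewhere N(x, y) = 0
```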

Step 4. Double thresholds are used to determine strong and weak edges. We set two thresholds $T_h$ and $T_l$ with $T_h > T_l$. (i) If $N(x, y) \geq T_h$, then the pixel at $(x, y)$ is determined as an edge pixel, and we let $E(x, y) = 1$. (ii) If $N(x, y) < T_l$, then the pixel at $(x, y)$ is determined as a nonedge pixel, and we let $E(x, y) = 0$. (iii) If $T_l \leq N(x, y) < T_h$, then continue to search the neighborhood centered on the current pixel to find whether there is a pixel whose gradient magnitude exceeds $T_h$. If such a pixel exists, then the current pixel is also determined as an edge pixel and we let $E(x, y) = 1$; otherwise, it is determined as a nonedge pixel and we let $E(x, y) = 0$.
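Rules (i)–(iii) of Step 4 translate directly into the following sketch, which assumes a $3 \times 3$ search neighborhood for the uncertain pixels:

```python
def double_threshold(nms, t_low, t_high):
    """Step 4: double-threshold edge decision on the suppressed magnitudes."""
    h, w = nms.shape
    edges = np.zeros((h, w), dtype=np.uint8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if nms[i, j] >= t_high:                          # rule (i): strong edge
                edges[i, j] = 1
            elif nms[i, j] >= t_low:                         # rule (iii): uncertain
                if nms[i - 1:i + 2, j - 1:j + 2].max() >= t_high:
                    edges[i, j] = 1                          # a neighbor exceeds T_h
    return edges                                             # rule (ii): rest stay 0
```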

Different from the traditional Otsu algorithm [21], which determines only a single threshold, in this step we propose a self-adaptive algorithm to determine the two thresholds $T_l$ and $T_h$ based on the histogram of the gradient image $N$. Assume that the gradient magnitude ranges from zero to $L-1$ in $N$; that is, $g \in \{0, 1, \ldots, L-1\}$. We divide the pixels into three categories according to their gradient magnitude, namely, $C_0$, $C_1$, and $C_2$, where $C_0$ indicates nonedge pixels with range $[0, T_l)$; $C_2$ indicates edge pixels with range $[T_h, L-1]$; and $C_1$ indicates the pixels that cannot be definitely determined as edge or nonedge pixels, with range $[T_l, T_h)$. Let $n_g$ denote the number of pixels whose gradient magnitude is $g$, let $N_p$ denote the total number of pixels in the gradient image $N$, and let $p_g = n_g / N_p$ indicate the percentage of pixels whose gradient magnitude is $g$. The expectation of the gradient magnitude over the whole image is $\mu = \sum_{g=0}^{L-1} g\, p_g$.

The expectations of the gradient magnitude of the pixels in $C_0$, $C_1$, and $C_2$ are, respectively,

$$\mu_0 = \frac{1}{\omega_0} \sum_{g \in C_0} g\, p_g, \qquad \mu_1 = \frac{1}{\omega_1} \sum_{g \in C_1} g\, p_g, \qquad \mu_2 = \frac{1}{\omega_2} \sum_{g \in C_2} g\, p_g,$$

where $\omega_0 = \sum_{g \in C_0} p_g$, $\omega_1 = \sum_{g \in C_1} p_g$, and $\omega_2 = \sum_{g \in C_2} p_g$.

In order to determine $T_l$ and $T_h$, we define an evaluation function inspired by the traditional Otsu algorithm, the three-class between-class variance:

$$\sigma_B^2(k_1, k_2) = \omega_0 (\mu_0 - \mu)^2 + \omega_1 (\mu_1 - \mu)^2 + \omega_2 (\mu_2 - \mu)^2,$$

where $k_1$ and $k_2$ ($0 \leq k_1 < k_2 \leq L-1$) are the candidate class boundaries.

Calculate and compare $\sigma_B^2(k_1, k_2)$ for every candidate pair $(k_1, k_2)$, and let $(k_1^*, k_2^*) = \arg\max_{(k_1, k_2)} \sigma_B^2(k_1, k_2)$. Then we let $T_l = k_1^*$ and $T_h = k_2^*$; the two thresholds $T_l$ and $T_h$ are determined accordingly.
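A sketch of this self-adaptive threshold selection, assuming the gradient magnitudes are first quantized to $L = 256$ levels (cumulative sums let each candidate pair $(k_1, k_2)$ be evaluated in constant time):

```python
def adaptive_double_thresholds(nms, levels=256):
    """Choose (T_l, T_h) by maximizing the three-class between-class variance."""
    g = np.clip(nms * (levels - 1) / (nms.max() + 1e-12), 0, levels - 1).astype(int)
    p = np.bincount(g.ravel(), minlength=levels) / g.size    # histogram p_g
    grades = np.arange(levels, dtype=float)
    cum_w = np.cumsum(p)                                     # cumulative weights
    cum_m = np.cumsum(grades * p)                            # cumulative moments
    mu = cum_m[-1]                                           # global mean
    best, t_low, t_high = -1.0, 1, levels - 1
    for k1 in range(1, levels - 1):
        for k2 in range(k1 + 1, levels):
            w0, m0 = cum_w[k1 - 1], cum_m[k1 - 1]            # C0: [0, k1)
            w1, m1 = cum_w[k2 - 1] - w0, cum_m[k2 - 1] - m0  # C1: [k1, k2)
            w2, m2 = 1.0 - cum_w[k2 - 1], mu - cum_m[k2 - 1] # C2: [k2, L-1]
            if min(w0, w1, w2) <= 0.0:
                continue
            var_b = (w0 * (m0 / w0 - mu) ** 2 + w1 * (m1 / w1 - mu) ** 2
                     + w2 * (m2 / w2 - mu) ** 2)
            if var_b > best:
                best, t_low, t_high = var_b, k1, k2
    return t_low, t_high                                     # (T_l, T_h)
```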

2.2. Local Feature Extraction

The global feature can be used to recognize the type of a vehicle roughly, such as large or small. In order to further recognize a specific type, such as sedan, van, bus, or truck, other features to represent the local structural details of a vehicle need to be extracted.

2.2.1. Image Partition Based on Key Parts

Not all parts of a vehicle face image are useful for the VTR; only some key parts with salient features (e.g., vehicle roof, windscreen and rear-view mirrors, hood, and license plate) are informative. Additionally, partial occlusion often occurs in real-world traffic environments. If we partition the vehicle face image into several key patches, then even when partial occlusion occurs we can still recognize the vehicle type through the key parts in the nonoccluded patches. Therefore, we evenly partition the vehicle face image into four key patches from top to bottom: (i) vehicle roof, (ii) windscreen and rear-view mirrors, (iii) hood, and (iv) license plate, as shown in Figure 1.

2.2.2. Local Feature Extraction

Gabor wavelets, whose kernels act very similarly to mammalian visual cortical cells, have strong characteristics of spatial locality and orientation, making them a suitable choice for image feature extraction in the VTR [22]. Therefore, the Gabor wavelet representation of the vehicle image is introduced to extract the local features in every partitioned patch in this paper, which not only obtains better structural details at multiple scales and orientations but also improves the robustness to illumination change and partial occlusion. The Gabor wavelet kernels are defined by [22]

$$\psi_{u,v}(z) = \frac{\|k_{u,v}\|^2}{\sigma^2} \exp\!\left(-\frac{\|k_{u,v}\|^2 \|z\|^2}{2\sigma^2}\right) \left[ e^{i k_{u,v} \cdot z} - e^{-\sigma^2/2} \right], \tag{6}$$

where $u$ and $v$ define the orientation and scale of the Gabor kernels, respectively, $\|\cdot\|$ denotes the norm operator, $z = (x, y)$ represents the pixel coordinates, and the wave vector $k_{u,v}$ is defined as

$$k_{u,v} = k_v e^{i\phi_u},$$

where $k_v = k_{\max}/f^v$, $\phi_u = \pi u / 8$, $k_{\max}$ is the maximum frequency, and $f$ is the spacing factor between kernels in the frequency domain.

It is usual to use the Gabor wavelets at five different scales, $v \in \{0, 1, \ldots, 4\}$, and eight orientations, $u \in \{0, 1, \ldots, 7\}$, with the following parameters: $\sigma = 2\pi$, $k_{\max} = \pi/2$, and $f = \sqrt{2}$ [23].
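Under this parameterization, the kernel bank of (6) can be generated as follows; the $31 \times 31$ spatial support is our own illustrative choice:

```python
def gabor_kernel(u, v, size=31, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Gabor wavelet kernel of Eq. (6) at orientation u (0..7) and scale v (0..4)."""
    k = k_max / f ** v                              # k_v = k_max / f^v
    phi = np.pi * u / 8.0                           # phi_u = pi * u / 8
    kx, ky = k * np.cos(phi), k * np.sin(phi)       # wave vector components
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    z2 = x ** 2 + y ** 2                            # ||z||^2
    envelope = (k ** 2 / sigma ** 2) * np.exp(-k ** 2 * z2 / (2 * sigma ** 2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)  # DC-free
    return envelope * carrier

kernels = [[gabor_kernel(u, v) for u in range(8)] for v in range(5)]  # 40 kernels
```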

For Gabor feature extraction, we convolve the image with the set of Gabor wavelet kernels defined by (6) at every pixel $z = (x, y)$:

$$O_{u,v}(z) = I(z) * \psi_{u,v}(z),$$

where $O_{u,v}(z)$ is the convolution result corresponding to the Gabor wavelet kernel at orientation $u$ and scale $v$ (also called the Gabor feature image in this paper), $I(z)$ expresses the gray-level distribution of the image, and $*$ represents the convolution operator. Therefore, the set $S = \{O_{u,v}(z) : u \in \{0, \ldots, 7\},\ v \in \{0, \ldots, 4\}\}$ forms the Gabor wavelet representation of the image $I(z)$.

Applying the convolution theorem, we can derive every $O_{u,v}(z)$ via the fast Fourier transform (FFT) [24]:

$$O_{u,v}(z) = \mathcal{F}^{-1}\{\mathcal{F}[I(z)]\, \mathcal{F}[\psi_{u,v}(z)]\},$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ indicate the Fourier transform and inverse Fourier transform, respectively.
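A minimal FFT-based implementation of this convolution (zero padding to the full linear-convolution size, then cropping back to the image size; keeping the magnitude responses is an assumption on our part, as is the name `patch` for one of the four key patches from Section 2.2.1):

```python
def gabor_convolve_fft(image, kernel):
    """Compute O_{u,v} = F^{-1}{ F[I] . F[psi_{u,v}] } with zero padding."""
    h = image.shape[0] + kernel.shape[0] - 1
    w = image.shape[1] + kernel.shape[1] - 1
    f_img = np.fft.fft2(image, s=(h, w))
    f_ker = np.fft.fft2(kernel, s=(h, w))
    full = np.fft.ifft2(f_img * f_ker)              # full linear convolution
    r0, c0 = kernel.shape[0] // 2, kernel.shape[1] // 2
    return full[r0:r0 + image.shape[0], c0:c0 + image.shape[1]]

# Gabor wavelet representation of one key patch (magnitude responses)
gabor_images = [np.abs(gabor_convolve_fft(patch, kernels[v][u]))
                for v in range(5) for u in range(8)]
```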

To leverage the advantage of Gabor wavelets with five scales and eight orientations, we concatenate all the Gabor feature images in the set $S$ and derive an augmented feature vector $\chi$. Before the concatenation, we first downsample every $O_{u,v}(z)$ into $O_{u,v}^{(\rho)}(z)$ by a factor $\rho$ to reduce the space dimension and normalize it to zero mean and unit variance. We then transform every $O_{u,v}^{(\rho)}(z)$ into a vector $o_{u,v}^{(\rho)}$ by concatenating its columns. Finally, the reduced Gabor feature vector is defined as $\chi^{(\rho)} = \big((o_{0,0}^{(\rho)})^T, (o_{0,1}^{(\rho)})^T, \ldots, (o_{7,4}^{(\rho)})^T\big)^T$, where $T$ is the transpose operator.
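The downsampling and concatenation can be sketched as below; interpreting the factor $\rho = 64$ as keeping every 8th row and column ($8 \times 8 = 64$) is our reading, consistent with the dimensions reported in Section 4.3:

```python
def reduced_gabor_vector(gabor_images, rho=64):
    """Downsample each Gabor feature image by rho, normalize, and concatenate."""
    step = int(np.sqrt(rho))                        # rho = 64 -> every 8th row/col
    parts = []
    for o in gabor_images:
        d = o[::step, ::step].astype(float)
        d = (d - d.mean()) / (d.std() + 1e-12)      # zero mean, unit variance
        parts.append(d.flatten(order="F"))          # concatenate the columns
    return np.concatenate(parts)                    # 48*48*40/64 = 1440 entries
```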

3. Recognition

3.1. Two-Stage Classification Strategy

Unlike single-stage classification based methods, which need a more complicated classifier, more training samples, and more computational time to train classifier parameters, we propose a two-stage classification strategy based on two different types of classifiers and features. In the first stage of classification, we recognize the type of the test sample as large vehicle or small vehicle using the KNNPC based on the extracted global feature. Based on this, in the second stage of classification we further recognize the type of a large vehicle as bus or truck and the type of a small vehicle as van or sedan using the DSRC based on the extracted local feature. The detailed classification process is illustrated in Figure 2.

3.2. Preliminary Recognition Based on Global Feature and KNNPC

In the first stage of classification, we propose a robust classification method based on the global feature and the KNNPC. This method first estimates the cumulative probabilities of the test sample over its $k$-nearest neighbors, which may belong to different classes, and then selects the class with the maximum weight as the classification result. The selection of the $k$-nearest neighbors is based on an improved Hausdorff distance measure (IHDM), and the cumulative probabilities of the test sample are based on Gaussian kernel density estimation (KDE).

3.2.1. Improved Hausdorff Distance Measure

The Hausdorff distance (HD) is one of the commonly used measures for object matching. It calculates the distance between two point sets of edges in two-dimensional binary images without establishing correspondences. Compared with other measures, such as the Euclidean distance, the HD is more robust to noise and partial occlusion because it does not require point-to-point correspondences. In order to enhance the first stage of classification of the VTR, we introduce an IHDM based on a statistical scheme to calculate the HD between the test sample and the training samples [25].

The classical HD measure between two point sets $A = \{a_1, \ldots, a_{N_A}\}$ and $B = \{b_1, \ldots, b_{N_B}\}$ with sizes $N_A$ and $N_B$, respectively, is defined as

$$H(A, B) = \max\{h(A, B), h(B, A)\},$$

where $h(A, B)$ represents the directed distance from set $A$ to set $B$. The distance value of point $a$ to the set $B$ is defined as $d_B(a) = \min_{b \in B} \|a - b\|$, and the directed distance is denoted by

$$h(A, B) = \max_{a \in A} d_B(a),$$

where $\|\cdot\|$ represents the Euclidean norm.

Because the classical HD measure is sensitive to noise and partial occlusion, the scheme of the least trimmed square (LTS) is introduced. In the IHDM, the directed distance is defined by a linear combination of order statistics:

$$h_{\mathrm{LTS}}(A, B) = \frac{1}{K} \sum_{i=1}^{K} d_B(a)_{(i)}, \tag{12}$$

where $d_B(a)_{(i)}$ represents the $i$th distance value in the sorted sequence $d_B(a)_{(1)} \leq d_B(a)_{(2)} \leq \cdots \leq d_B(a)_{(N_A)}$, and $K = \lfloor \tau N_A \rfloor$. The parameter $\tau$, $0 < \tau \leq 1$, depends on the amount of occlusion. The measure is minimized by keeping the smaller distance values after the large distance values are eliminated.
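The IHDM can be sketched as follows, with edge pixels given as $(n, 2)$ coordinate arrays; the value $\tau = 0.8$ and the function names are illustrative assumptions:

```python
def lts_directed_distance(A, B, tau=0.8):
    """Directed LTS distance h_LTS(A, B): mean of the K smallest d_B(a) values."""
    diff = A[:, None, :] - B[None, :, :]              # pairwise coordinate diffs
    d = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)  # d_B(a) for every a in A
    d.sort()                                          # order statistics d_B(a)_(i)
    K = max(1, int(tau * len(d)))                     # trim the largest distances
    return d[:K].mean()

def ihdm(A, B, tau=0.8):
    """Improved (symmetric) Hausdorff distance measure."""
    return max(lts_directed_distance(A, B, tau), lts_directed_distance(B, A, tau))
```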

3.2.2. Kernel Density Estimation

Assume that the number of target classes is $C$, and for each class $c$ there are $n_c$ ($c = 1, \ldots, C$) samples. First, we obtain the $k$-nearest neighbors to the test sample in the training set using the proposed IHDM. Suppose that $P$ is the point set consisting of the edge points extracted from the test sample by the global feature extraction method proposed in Section 2.1, and $Q_j$ is the point set consisting of the edge points extracted from the $j$th training sample by the same method, $j = 1, \ldots, N$, where $N = \sum_{c=1}^{C} n_c$. According to (12), we can calculate the Hausdorff distance between $P$ and every $Q_j$, denoted $d_j$, $j = 1, \ldots, N$. Comparing all $d_j$, we obtain the $k$ smallest values, denoted $d_{(1)} \leq \cdots \leq d_{(k)}$. The training samples corresponding to the $k$ smallest values are regarded as the $k$-nearest neighbors of the test sample.

Then, the KDE method [26] is used to estimate the cumulative influences on the test sample from its $k$-nearest neighbors belonging to different classes. We use a Gaussian kernel function and set the window width parameter $h$ in the estimation through a coefficient $\lambda$, which narrows (larger $\lambda$) or expands (smaller $\lambda$) the influences of the neighbors at different distances. Finally, we get

$$w_c = \sum_{j=1,\; l_{(j)} = c}^{k} \exp\!\left(-\frac{d_{(j)}^2}{2h^2}\right), \quad c = 1, \ldots, C,$$

where $w_c$ is the weight of the test sample belonging to class $c$ and $l_{(j)} = c$ indicates that the $j$th nearest neighbor belongs to class $c$.

The final classification result is determined by

$$c^{*} = \arg\max_{c \in \{1, \ldots, C\}} w_c.$$
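Putting the IHDM and the Gaussian KDE weighting together, the KNNPC can be sketched as follows; the fixed window width `h` and the helper names are our own simplifications:

```python
def knnpc(test_points, train_point_sets, train_labels, k=5, h=1.0):
    """KNNPC: pick the class with the largest cumulative Gaussian-kernel weight."""
    d = np.array([ihdm(test_points, Q) for Q in train_point_sets])
    nn = np.argsort(d)[:k]                          # indices of k nearest neighbors
    weights = {}
    for j in nn:
        w = np.exp(-d[j] ** 2 / (2.0 * h ** 2))     # Gaussian kernel influence
        weights[train_labels[j]] = weights.get(train_labels[j], 0.0) + w
    return max(weights, key=weights.get)            # arg max_c w_c
```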

3.3. Precise Recognition Based on Local Feature and DSRC

To exploit the Gabor feature of the vehicle image, before the following precise recognition we first express all samples by their reduced Gabor feature vectors, computed by the local feature extraction method proposed in Section 2.2. Then, based on the reduced Gabor feature vectors, we set up the training set and test set to design the DSRC.

The core idea of the sparse representation based classification (SRC) methods is to represent a test sample by a sparse linear combination of training samples [27]. Suppose that there are $C$ classes of samples, and let $A = [A_1, A_2, \ldots, A_C]$ be the set of training samples, called the dictionary, where $A_c$ is the subset of training samples from class $c$. Let $y$ be a test sample. The procedure of the SRC is summarized as follows.

(i) Sparsely represent $y$ on $A$ via $\ell_1$-minimization:

$$\hat{x} = \arg\min_{x} \left\{ \|y - Ax\|_2^2 + \gamma \|x\|_1 \right\},$$

where $\gamma$ is a scalar constant.

(ii) Implement classification via

$$\mathrm{identity}(y) = \arg\min_{c} r_c, \qquad r_c = \|y - A_c \hat{x}_c\|_2,$$

where $\hat{x} = [\hat{x}_1; \hat{x}_2; \ldots; \hat{x}_C]$ and $\hat{x}_c$ is the coefficient vector associated with class $c$. Obviously, the SRC method classifies the test sample into the category with the smallest representation residual.
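A compact sketch of the SRC procedure, solving the $\ell_1$-regularized problem with plain iterative soft thresholding (ISTA) rather than any specific solver used in the paper:

```python
def ista_l1(A, y, gamma=0.01, iters=500):
    """Solve min_x ||y - A x||_2^2 + gamma * ||x||_1 by soft thresholding."""
    L = np.linalg.norm(A, 2) ** 2                   # step size from spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - (A.T @ (A @ x - y)) / L             # gradient step on the residual
        x = np.sign(g) * np.maximum(np.abs(g) - gamma / (2.0 * L), 0.0)
    return x

def src_classify(A_blocks, y, gamma=0.01):
    """SRC: assign y to the class with the smallest representation residual."""
    A = np.hstack(A_blocks)                         # dictionary [A_1, ..., A_C]
    x = ista_l1(A, y, gamma)
    residuals, start = [], 0
    for Ac in A_blocks:
        xc = x[start:start + Ac.shape[1]]           # coefficients of class c
        residuals.append(np.linalg.norm(y - Ac @ xc))
        start += Ac.shape[1]
    return int(np.argmin(residuals))                # arg min_c r_c
```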

Subsequent studies find that the employed dictionary plays an important role in sparse representation based image classification. While learning a dictionary from the training data has led to state-of-the-art results in image classification, many dictionary learning models harness only the one-sided discriminative information in either the representation coefficients or the representation residual, which limits their performance. In this paper, we propose a DSRC that adopts a dictionary learning scheme based on the Fisher discrimination criterion. A structured dictionary, whose atoms correspond to the class labels, is learned, so that both the representation residual and the representation coefficients can be used to distinguish different classes.

3.3.1. Dictionary Learning Based on Fisher Discrimination Criterion

Unlike methods based on a shared dictionary, we adopt a dictionary learning scheme based on the Fisher discrimination criterion [17], which learns a structured dictionary $D = [D_1, D_2, \ldots, D_C]$, where $D_c$ is the subdictionary associated with class $c$. Let $A = [A_1, A_2, \ldots, A_C]$ express the set of training samples with $C$ classes, and let $X$ be the sparse coefficient matrix of $A$ over $D$; that is, $A \approx DX$, where $X_c$ is the subset of coefficients for class $c$. We can write $X_c$ as $X_c = [X_c^1; \ldots; X_c^C]$, where $X_c^j$ is the coefficient matrix of $A_c$ over $D_j$. Besides requiring that $D$ should have powerful ability to represent $A$ (i.e., $A \approx DX$), we also require that $D$ should have powerful ability to distinguish the images in $A$. For this reason, the dictionary learning scheme based on the Fisher discrimination criterion is defined as follows:

$$J_{(D,X)} = \arg\min_{(D,X)} \left\{ r(A, D, X) + \lambda_1 \|X\|_1 + \lambda_2 f(X) \right\}, \tag{17}$$

where $r(A, D, X)$ is the discriminative data fidelity term; $\|X\|_1$ is the sparsity penalty; $f(X)$ is a discrimination term imposed on the coefficient matrix $X$; and $\lambda_1$ and $\lambda_2$ are scalar parameters. Each atom of $D$ is constrained to have a unit $\ell_2$-norm to avoid $D$ having an arbitrarily large norm, which would result in trivial solutions of the coefficient matrix $X$. Further, by means of the Fisher discrimination criterion, $f(X)$ is defined as $f(X) = \operatorname{tr}(S_W(X)) - \operatorname{tr}(S_B(X)) + \eta \|X\|_F^2$, where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, $S_W(X)$ and $S_B(X)$ indicate the within-class scatter and between-class scatter of $X$, respectively, $S_W(X) = \sum_{c=1}^{C} \sum_{x_i \in X_c} (x_i - m_c)(x_i - m_c)^T$, $S_B(X) = \sum_{c=1}^{C} n_c (m_c - m)(m_c - m)^T$, where $m_c$ and $m$ are the mean vectors of $X_c$ and $X$, respectively, and $n_c$ is the number of samples in class $c$; $\eta$ is a parameter.

Although the objective function in (17) is not jointly convex in $(D, X)$, it is convex with respect to each of $D$ and $X$ when the other is fixed. Therefore, the problem can be divided into two subproblems by optimizing $D$ and $X$ alternately: updating $X$ with $D$ fixed and updating $D$ with $X$ fixed. The alternating optimization is iterated to find the desired dictionary $D$ and coefficient matrix $X$.

Suppose that the dictionary $D$ is fixed; then the objective function in (17) reduces to a sparse representation problem for computing $X$. We can compute $X$ class by class. When computing $X_c$, all $X_j$, $j \neq c$, are fixed. The objective function in (17) is further simplified into

$$J_{X_c} = \arg\min_{X_c} \left\{ r(A_c, D, X_c) + \lambda_1 \|X_c\|_1 + \lambda_2 f_c(X_c) \right\}, \tag{18}$$

where $f_c(X_c) = \|X_c - M_c\|_F^2 - \sum_{j=1}^{C} \|M_j - M\|_F^2 + \eta \|X_c\|_F^2$; $M_c$ and $M$ are the mean vector matrices (taking the mean vector $m_c$ or $m$ as all the column vectors) of class $c$ and all classes, respectively. We can solve (18) to obtain $X_c$ using the improved iterative projection method (IPM) [28].

Then we discuss how to update $D$ when $X$ is fixed. We also update $D$ class by class; that is, when each $D_c$ is updated, all $D_j$, $j \neq c$, are fixed. The objective function in (17) is reduced to

$$J_{D_c} = \arg\min_{D_c} \left\{ \left\| A - D_c X^c - \sum_{j=1,\, j \neq c}^{C} D_j X^j \right\|_F^2 \right\}, \tag{19}$$

where $X = [X^1; \ldots; X^C]$, $X^c$ is the representation matrix of $A$ over $D_c$, and $X_j^c$ is the representation of $A_j$ over the subdictionary $D_c$. Equation (19) can be efficiently solved to obtain every $D_c$ via an algorithm like [29].

3.3.2. Classification Scheme

Using the dictionary obtained by the proposed dictionary learning scheme based on Fisher discrimination criterion to represent the test sample, both the representation residual and the representation coefficients will be discriminative, and hence we can make use of both of them to achieve more accurate classification results.

Let $\chi$ express the reduced Gabor feature vector of the test sample $y$; then sparsely represent $\chi$ on $D$ via $\ell_1$-minimization:

$$\hat{\alpha} = \arg\min_{\alpha} \left\{ \|\chi - D\alpha\|_2^2 + \gamma \|\alpha\|_1 \right\},$$

where $\gamma$ is a constant, $\hat{\alpha} = [\hat{\alpha}_1; \hat{\alpha}_2; \ldots; \hat{\alpha}_C]$, and $\hat{\alpha}_c$ is the coefficient subvector associated with the subdictionary $D_c$.

By considering the discrimination capability of both the representation residual and the representation vector, we define the following metric for classification:

$$e_c = \|\chi - D_c \hat{\alpha}_c\|_2^2 + w \|\hat{\alpha} - m_c\|_2^2,$$

where $m_c$ is the learned mean coefficient vector of class $c$ and $w$ is a preset weight to balance the contribution of the two terms to classification. The classification rule is defined as

$$\mathrm{identity}(y) = \arg\min_{c} e_c.$$
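Reusing the `ista_l1` solver sketched above, the DSRC decision can be illustrated as follows; `class_coef_means[c]` stands for the learned mean coefficient vector $m_c$ of class $c$, which in this sketch is assumed to be stored from the training coefficients $X$:

```python
def dsrc_classify(D_blocks, class_coef_means, chi, gamma=0.01, w=0.5):
    """DSRC: combine representation residual and coefficient-mean distance."""
    D = np.hstack(D_blocks)                         # learned dictionary [D_1..D_C]
    alpha = ista_l1(D, chi, gamma)                  # sparse code of the test vector
    errors, start = [], 0
    for c, Dc in enumerate(D_blocks):
        ac = alpha[start:start + Dc.shape[1]]       # subvector for subdictionary c
        residual = np.linalg.norm(chi - Dc @ ac) ** 2
        coef_term = np.linalg.norm(alpha - class_coef_means[c]) ** 2
        errors.append(residual + w * coef_term)     # metric e_c
        start += Dc.shape[1]
    return int(np.argmin(errors))                   # identity(y) = arg min_c e_c
```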

4. Experiments

4.1. Experiment Setup

To validate the proposed method, we constructed a dataset of 6,000 vehicle images. The vehicle images were captured by a camera fixed on an overpass at 640 × 480 pixels with 256 gray levels. Challenging vehicle images, partially occluded by other vehicles or captured in bad illumination conditions, make up about 10% of the whole dataset. The location of each vehicle is adjusted to the center of the image and cropped to a fixed size by manual operations in advance. Figure 3 shows example images of the dataset under various conditions.

To facilitate the VTR, all vehicle images are first divided into two datasets: large vehicles and small vehicles. The large vehicle dataset consists of two subdatasets, bus and truck; the small vehicle dataset consists of two subdatasets, van and sedan. Each subdataset contains 1,500 images.

All experiments are conducted on a computer with a 3 GHz CPU and 16 GB memory, and all programs are implemented and run in MATLAB R2014b.

4.2. Results of Global Feature Extraction

In order to verify the advantage of the improved Canny operator, the edge detection results based on three other operators, Sobel, Roberts, and Prewitt, are compared in Figure 4. As can be seen from Figure 4, the proposed method based on the improved Canny operator in Section 2.1 obtains a more accurate and complete edge than the methods based on the three other operators.

In addition, we compare the global feature extraction method based on the improved Canny operator with the method based on the traditional Canny operator. Comparative results are shown in Figure 5, where the original gray images are in the first column, the detection results of the traditional Canny operator are in the second column, and the detection results of the improved Canny operator are in the third column. To verify the performance of the proposed global feature extraction method under various illumination conditions, Figure 5(a) was captured in the morning on a fine day with good illumination, Figures 5(b) and 5(c) were captured at dusk on a cloudy day, and Figure 5(d) was captured in the afternoon on a fine day, but the bus is partially covered by shadow because the light is blocked by a nearby building. As can be seen from Figure 5, the method based on the improved Canny operator obtains a more continuous and complete edge for different kinds of vehicles than the method based on the traditional Canny operator, even when the illumination condition is poor.

4.3. Results of Local Feature Extraction

Based on the method proposed in Section 2.2.2, we use the Gabor wavelet kernels with five different scales and eight different orientations to extract the Gabor feature of every local patch of the detected vehicle image. Taking the hood patch as an example, the Gabor feature images extracted by the set of Gabor wavelet kernels with five scales and eight orientations are shown in Figure 6.

As can be seen from Figure 6, the feature extraction method based on the Gabor wavelet kernels can extract many structural details of local patch of vehicle image from multiple scales and multiple orientations, and the extracted Gabor feature images can be regarded as local feature for the VTR.

In this paper, the resolution of every patch is 48 × 48 pixels. After implementing the convolution operation, the dimension of the augmented feature vector $\chi$ reaches 92,160 (= 48 × 48 × 5 × 8). This high dimension results in slow computation and large memory occupation, which is adverse to the subsequent recognition and classification. Therefore, before implementing the VTR, we need to downsample $\chi$ using an appropriate downsampling factor $\rho$. In order to select an appropriate factor, we experiment on the augmented Gabor feature vector defined in Section 2.2.2 with six different downsampling factors: $\rho = 8$, 16, 32, 64, 128, or 256. Experimental results show that the corresponding average accuracy rates based on the DSRC proposed in Section 3.3 are 95.8%, 95.9%, 95.9%, 96.8%, 73%, and 34%, respectively. It is clear that when $\rho = 64$, the DSRC has the highest accuracy rate. Therefore, in this paper we let $\rho = 64$, and the dimension of the augmented Gabor feature vector is reduced to 1,440 (= 92,160/64) accordingly, which reduces the computational complexity of the VTR while assuring a high recognition accuracy.

4.4. Results of Two-Stage Classification

In order to demonstrate the performance of the proposed two-stage classification strategy, we introduce three evaluation criteria: precision, recall, and accuracy [30]. Their definitions are as follows: precision = TP/(TP + FP), recall = TP/(TP + FN), and accuracy = (TP + TN)/(TP + FN + FP + TN), where TP, FP, FN, and TN are abbreviations for true positives, false positives, false negatives, and true negatives, respectively.
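For completeness, these criteria in code form (a trivial helper of our own):

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Evaluation criteria of Section 4.4 from the four confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, accuracy
```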

We randomly select 400 samples as training samples and 400 samples as test samples from each of the four vehicle type datasets: bus, truck, van, and sedan.

4.4.1. Results of the First Stage of Classification

For the first stage of classification, we experiment on the whole dataset. We randomly select 1,200 samples as training samples and 400 samples as test samples. If the type of a test sample is recognized as bus or truck, the test sample is determined to be a large vehicle; similarly, if it is recognized as van or sedan, the test sample is determined to be a small vehicle. Table 1 shows the experimental results when the test samples are captured under good illumination and no occlusion, and Table 2 gives the results under bad illumination or partial occlusion.

As can be seen from Tables 1 and 2, the first stage of classification still has high accuracy and reliability, even though the test samples are captured under bad illumination or partial occlusion.

4.4.2. Results of the Second Stage of Classification

Based on the result of the first stage of classification, if the test sample is recognized as a large vehicle, the large vehicle dataset including the bus and truck images is used in the second stage of classification; similarly, if the test sample is recognized as a small vehicle, the small vehicle dataset including the van and sedan images is used. We again randomly select 1,200 samples as training samples and 400 samples as test samples from the large vehicle dataset or the small vehicle dataset in the second stage of classification. Table 3 shows the experimental results when the test samples are captured under good illumination and no occlusion, and Table 4 gives the results under bad illumination or partial occlusion.

As can be seen from Tables 3 and 4, although the performance of the second stage of classification slightly degrades compared with the first stage of classification, it still has very good reliability.

To verify that the dictionary learning scheme based on the Fisher discrimination criterion is effective, after the first stage of classification we use the traditional SRC method, which does not exploit this dictionary learning scheme, to implement the second stage of classification. The classification results under good illumination and no occlusion are shown in Table 5.

As can be seen from Tables 3 and 5, the proposed classification method exploiting the dictionary learning scheme based on the Fisher discrimination criterion is superior to the traditional method in terms of precision, recall, and accuracy. Therefore, exploiting this dictionary learning scheme in the second stage of classification is very effective for improving the recognition performance of the classifier for the VTR.

In order to demonstrate the efficacy of the two-stage classification strategy, the proposed KNNPC in Section 3.2 and the DSRC in Section 3.3 are used as single-stage classifiers to classify the four vehicle types, respectively. We again randomly select 1,200 samples as training samples and 400 samples as test samples from the whole dataset. The results of single-stage classification based on the KNNPC and the global feature and those based on the DSRC and the local feature are shown in Tables 6 and 7, respectively. It is clear that the proposed two-stage classification strategy surpasses the single-stage classification strategy in terms of precision, recall, and accuracy. Further analysis shows that the extracted global feature, with the KNNPC, has an excellent ability to separate large vehicles from small vehicles; however, when the four types of vehicles are mixed together, it becomes difficult for the global feature to distinguish buses from trucks in the large vehicle dataset or vans from sedans in the small vehicle dataset. Moreover, when the four types of vehicles are mixed together, the single-stage classification based on the DSRC and the local feature needs to train more classifier parameters simultaneously, using more training samples than when only two types are mixed, to reach a given recognition performance. Therefore, the performance of the single-stage classification based on the DSRC and the local feature degrades compared with the proposed two-stage classification strategy.

4.5. Comparison of Results with Other Methods

In order to compare our method with other popular methods, we test it on the dataset used in [31]. As in [31], experiments on daylight images and nighttime images are performed separately. Before implementing the classification, we first divide the dataset in [31] into two categories, a large vehicle dataset and a small vehicle dataset, where the large vehicle dataset consists of two vehicle types, bus and truck, and the small vehicle dataset consists of three vehicle types, passenger car, minivan, and sedan. Our method achieves an average classification accuracy of 96.3% on daylight images and 89.5% on nighttime images, better than the results of previous methods, as shown in Table 8. Additionally, we also test our method on the BIT-Vehicle dataset provided in [1]; our method achieves 90.1% classification accuracy, whereas the method in [1] reaches 88.11%.

The underlying reasons are as follows. The proposed improved Canny edge operator and the Gabor wavelet kernels extract discriminative global and local features for the VTR. The proposed two-stage classification strategy leverages the advantages of the extracted global and local features according to their characteristics; that is, the global feature, which represents the geometrical contour of a vehicle, is applied in the first stage of classification to determine whether the test sample is a large or small vehicle, and the local feature, which represents the structural details of a vehicle, is applied in the second stage to determine whether the sample is a bus or truck in the large vehicle dataset, or a van or sedan in the small vehicle dataset. The dictionary learning scheme based on the Fisher discrimination criterion learns a discriminative classifier for precise recognition in the second stage of classification. Finally, extracting the local feature from the four partitioned patches provides strong robustness to partial occlusion.

5. Conclusions

The two key steps in improving the VTR are feature extraction and classifier design. Based on the need to recognize vehicle types accurately and reliably, we propose a VTR method that combines global and local features via a two-stage classification. The improved Canny edge detection algorithm is capable of extracting a continuous and complete global feature. The employed Gabor wavelet kernels with five scales and eight orientations successfully extract the local feature. The proposed KNNPC realizes the preliminary recognition of a large or small vehicle based on the global feature, and the DSRC has a strong ability to recognize bus, truck, van, or sedan based on the local feature. As demonstrated by the experiments on our challenging dataset and the comparison datasets, the proposed method solves the VTR problem efficiently and outperforms existing state-of-the-art methods.

The study offers the possibility of developing more sophisticated VTR methods. First, this method can be extended to the VTR context involving more vehicle types. Second, more effective features and corresponding feature extraction algorithms can be adopted. Third, more discriminative classifiers can be incorporated into the two-stage classification.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (nos. 61304205, 61502240, 61203273, and 41301037), Natural Science Foundation of Jiangsu Province (no. BK20141002), and Innovation and Entrepreneurship Training Project of College Students (nos. 201710300051 and 201710300050).