Video analytics play a critical role in most recent traffic monitoring and driver assistance systems. In this context, the correct detection and classification of surrounding vehicles through image analysis has been the focus of extensive research in recent years. Most reported work on image-based vehicle verification makes use of supervised classification approaches and resorts to techniques such as histograms of oriented gradients (HOG), principal component analysis (PCA), and Gabor filters, among others. Unfortunately, existing approaches are lacking in two respects: first, comparison between methods using a common body of work has not been addressed; second, no study of the combination potential of popular features for vehicle classification has been reported. In this study, the performance of the different techniques is first reviewed and compared using a common public database. Then, the combination capabilities of these techniques are explored and a methodology is presented for the fusion of classifiers built upon them, also taking into account the vehicle pose. The study unveils the limitations of single-feature based classification and makes clear that fusion of classifiers is highly beneficial for vehicle verification.

1. Introduction

Vision-based scene understanding has been the focus of increasing interest for the last couple of decades due to its low cost and flexibility. Among the many fields of application of vision computing, advanced driver assistance systems play a leading role in pursuit of the ambitious goal of reducing car accidents. In particular, studies show that most accidents are caused by other vehicles [1]. Therefore, much effort has been devoted in recent years to vision-based vehicle detection.

Most video-based vehicle detection methods operate in two stages. The first stage addresses hypothesis generation and entails a quick search over the image to find potential vehicle locations. This stage typically relies on appearance-related features such as color [2], shadow [3], or edges [4]. Then, in the second stage, these hypotheses are further analyzed and a final decision is taken on whether vehicles are present in those locations. This paper focuses on the second stage, that is, hypothesis verification.

Vision-based vehicle hypothesis verification methods increasingly resort to learning-based methods, especially on account of the growing processing capabilities. The task is thus usually addressed as a supervised classification problem, in which candidates are classified into one of two classes, that is, vehicles or nonvehicles. In this context, the selection of features to train the classifiers plays a critical role.

Widespread techniques for feature extraction include principal component analysis, wavelet transform, histograms of oriented gradients (HOG), and Gabor filters. Wavelet transform was used in some of the early methods for vehicle verification [5, 6]. The simplest wavelet form is the Haar transform which provides a local analysis of the images, and has been used for feature extraction in many applications, such as image coding, compression, and retrieval. Gabor filters constitute an alternative to wavelets for joint space-frequency representation of images, and have been shown to be better suited for vehicle detection [7]. On the other hand, principal component analysis (PCA) is a well-known technique for feature extraction which has naturally also been used for vehicle images [8, 9]. Finally, histograms of oriented gradients are extensively applied for people detection, in spite of being relatively recent, and are now being explored for vehicle verification (e.g., [10]).

Unfortunately, although these methods are claimed to perform well for vehicle verification, the lack of common databases and of objective and comprehensive tests makes it difficult to have a quantitative measure of the performance of each method for vehicle/nonvehicle classification and of the comparison among them. In addition, although the scarce statistics published for each of them in the literature disclose a reasonably good performance, they also hint that we are still far from flawless classification.

In this context, the combination of the different techniques arises as the natural way to overcome the limitations of each of them and to exploit their heterogeneous nature in a common framework. Unfortunately, the fusion of these techniques has only scarcely been explored in the literature. As an example, in [7], Gabor and wavelet features are combined assuming that they produce complementary results. However, no comprehensive analysis of the fusion potential of feature extraction techniques for vehicle classification is reported in the literature.

In this work, the most representative state-of-the-art feature extraction techniques are assessed for vehicle classification, and an in-depth study of the combination of all of them is carried out. First, all these techniques are reviewed and a classification scheme based upon them is presented, which accounts for the pose and distance of the vehicle to the observer. We put a special emphasis on the design of affordable descriptors or the proposal of less-demanding configurations for existing descriptors. In this context, the use of explicit features, which are linked to some a priori knowledge on the vehicle appearance, is also explored, in the belief that they can provide fast and meaningful information for the ensemble. In the second part of the paper, a thorough analysis of the fusion capabilities of these techniques is made by considering the diversity of the sources and different normalization and combination procedures. A graphical illustration of the studied fusion approach is provided in Figure 1. Finally, a methodology is proposed to find the best combination of sources according to the vehicle pose. The use of a common public dataset allows us to objectively compare the methods among them and to assess the gain of the fusion approach, which is shown to be substantial.

2. Single-Feature Classifiers

Following the review of the state of the art, in this section the most successful and representative descriptors are selected and a classification scheme is built upon each of them in order to assess their individual performance and their limits. As stated in Section 1, the most common descriptors are principal component analysis (PCA), histograms of oriented gradients (HOG), and wavelet-based methods, particularly Gabor filters. The fundamentals of each of these methods are briefly reviewed, and descriptors and classifiers based on each of them are presented. Apart from these complex implicit features, the descriptors and classification performance of the most common explicit features, that is, symmetry and gradient, are also included.

A common methodology is employed to evaluate the classification performance of all the descriptors. This relies on a 5-fold 50% holdout cross-validation procedure; that is, for each experiment, half of the samples are randomly selected for the training set and the other half for the testing set. The final performance is measured in terms of the average accuracy (i.e., probability of correct classification) over the five experiments. The database in [11] is used, which is an open access vehicle image database. This dataset is representative in terms of number and variability as it contains 4000 vehicle images (selected to comprise different colors, sizes, vehicle types, etc.) and 4000 nonvehicle images acquired from traffic sequences under different lighting and weather conditions. In addition, vehicles are categorized in four regions (front, left, and right regions in the close/middle range and far range) depending on their relative position with respect to the camera. This will allow for the analysis of different feature combinations according to the vehicle pose.
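The evaluation protocol described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: `train_and_score` is a hypothetical callback that trains a classifier on the training indices and returns its accuracy on the testing indices, and the seed is arbitrary.

```python
import random


def repeated_holdout(n_samples, train_and_score, n_repeats=5,
                     train_fraction=0.5, seed=0):
    """Average accuracy over repeated random holdout splits.

    train_and_score(train_idx, test_idx) is a hypothetical user-supplied
    callback returning the accuracy measured on the test indices.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    n_train = int(round(train_fraction * n_samples))
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(indices)                 # new random split each repetition
        train_idx = indices[:n_train]        # half of the samples for training
        test_idx = indices[n_train:]         # the other half for testing
        accuracies.append(train_and_score(train_idx, test_idx))
    return sum(accuracies) / len(accuracies)
```

Since every descriptor is evaluated with the same routine, the resulting average accuracies are directly comparable across descriptors and image regions.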

2.1. Principal Component Analysis

The goal of principal component analysis is to derive a smaller set of features which accurately represent the original dataset. In particular, PCA finds the linear subspace of lower dimensionality that maximizes the variance of the original set, which is called principal subspace. A comprehensive description of this method can be found in [12].

Let us denote by $x_n \in \mathbb{R}^D$, $n = 1, \ldots, N$, the data points in the original space, and by $\bar{x}$ and $S$ their mean and covariance, respectively. As shown in [12], the maximization of the variance is an eigenvalue problem; namely, the principal subspace of dimension $M$ is composed of the eigenvectors associated with the $M$ largest eigenvalues of $S$.

In the case of image feature representation, each image of size $R \times C$ can be represented by a row feature vector $x \in \mathbb{R}^{RC}$. According to the above description, the principal subspace is given by the first $M$ eigenvectors of $S$. Finally, the projections of the original data points onto the directions given by these eigenvectors constitute the PCA features.
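The PCA feature extraction just described can be sketched with NumPy. This is an illustrative sketch under the assumption that the images have already been flattened into the rows of a matrix `X`; the function names are hypothetical and the subspace dimension is left as a free parameter.

```python
import numpy as np


def fit_pca(X, n_components):
    """X: (n_samples, n_pixels) matrix whose rows are flattened images."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0]              # sample covariance of the data
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]            # columns span the principal subspace


def pca_features(X, mean, basis):
    """Project centered samples onto the principal directions."""
    return (X - mean) @ basis
```

For high-resolution images the covariance matrix becomes large, in which case a singular value decomposition of the centered data is a common alternative that yields the same subspace.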

Regarding the design of the classifier, support vector machines (SVMs) deliver the best generalization error [13] and are thus used for the evaluation of implicit feature performance in this work. In particular, a linear SVM is used as a baseline for comparison of the methods. Table 1 summarizes the performance of this classifier over the PCA features described above for each image region as a function of the principal subspace dimensionality. As can be observed, the optimum dimensionality of the principal subspace varies for the different image regions (40 dimensions for the front close/middle region and 60 for the other regions), and the average detection rate when the appropriate operating point is set for each of them is 93.04%.

2.2. Histograms of Oriented Gradients

Histograms of oriented gradients describe an image by a dense set of local histograms of gradient orientations. The idea is that the image is divided into disjoint regions, known as cells, and then a histogram is built by counting occurrences of gradient orientations in each region. In practice, the orientation range ($0$–$180^\circ$ or $0$–$360^\circ$, depending on whether the sign of the gradient is considered) is divided into bins, and each pixel in the region votes for its corresponding bin. In the initial proposal by Dalal and Triggs [14] an additional normalization step is considered in which several adjacent cells are further grouped into blocks so as to relieve illumination and shadowing effects; the histograms of the cells in the block are concatenated and normalized according to the $L_1$ or $L_2$ norm. The complete final descriptor comprises the histograms of all the image blocks.

We propose several modifications with respect to the original HOG descriptor in order to better adapt to the addressed multifeature vehicle classification. Indeed, the real-time operation need poses a stringent constraint in the available processing resources and thus in the complexity of the descriptor. Therefore, we propose to adapt to the a priori known vehicle structure to modify the descriptor so that it is less demanding but still effective. In particular, vertical and horizontal edges are clearly preeminent in the vehicle rear owing to the rectangular structures in it, such as the back window, the license plate, the taillights, or the vehicle rear contour itself.

Hence, on the one hand, instead of using a dense grid of square cells as in [14], the image is only divided into vertical or horizontal cells, named V-HOG and H-HOG, respectively, as shown in Figure 2. An in-depth comparison of V-HOG and H-HOG is performed in [15], where V-HOG is proven to be more efficient than H-HOG. This was also intuitively expected: as the frequency of horizontal edges is higher than that of vertical edges, a larger number of cells is needed in H-HOG. This alternative scheme (V-HOG) also lowers the order of the processing complexity with respect to standard HOG as the number of cells grows. On the other hand, the normalization step in [14] entails a large computational overhead, while the performance gain proves to be small for the case of vehicles; hence it has been dispensed with. In contrast, the sign of the gradient is taken into account as it is informative in some cases.
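A simplified descriptor in the spirit of V-HOG can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the paper: "vertical cells" are interpreted as full-height column strips, votes are weighted by gradient magnitude, and the default numbers of cells and bins are arbitrary. In line with the text, the gradient sign is kept (orientations over the full circle) and no block normalization is applied.

```python
import numpy as np


def v_hog(image, n_cells=4, n_bins=8):
    """Signed-orientation histograms over vertical strips (assumed cell layout)."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                        # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), 2 * np.pi)    # signed orientation in [0, 2*pi)
    bins = np.minimum((angle / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    edges = np.linspace(0, img.shape[1], n_cells + 1).astype(int)
    descriptor = []
    for left, right in zip(edges[:-1], edges[1:]):
        # Each pixel of the strip votes for its orientation bin
        # (magnitude weighting is an assumption of this sketch).
        hist = np.bincount(bins[:, left:right].ravel(),
                           weights=magnitude[:, left:right].ravel(),
                           minlength=n_bins)
        descriptor.append(hist)
    return np.concatenate(descriptor)                # no block normalization
```

The descriptor length is simply the number of cells times the number of bins, which keeps the feature vector small compared with a dense grid of square cells.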

The performance of the vertical cell HOG descriptor (V-HOG) using SVM is summarized in Table 2. The optimum number of cells and number of orientation bins are selected for each region. As can be observed, the classification accuracy is significantly higher than that of PCA, with an average correct classification rate of 96.46%.

2.3. Gabor Filters

Among multiresolution transforms for image processing, Gabor filters display a number of advantages regarding the resolution orientation and aliasing. Hence, they have been broadly used for many applications relating to texture analysis (e.g., [16]) and for image-based object (and in particular vehicle) detection and classification. The spatial 2D Gabor filter is composed of a complex sinusoid carrier and a Gaussian envelope:

The Fourier transform of this Gabor function is thus a Gaussian function shifted from the origin: where and and for simplicity it is assumed that .

In order to capture the frequency content at different scales and orientations, a bank of Gabor filters is required. These filters can be readily obtained by scaling and rotating : where , , and . The parameters of the bank are thus the number of scales , the number of orientations , the maximum frequency , and the spacing between frequencies, . A graphical representation of the Gabor filter bank in the frequency domain is provided in Figure 3.

In particular, the log-Gabor variation [17] of the Gabor functions is used here. The frequency response of this filter in polar coordinates is
$$G(f, \theta) = \exp\left(-\frac{\log^2(f/f_0)}{2\log^2(\sigma_f/f_0)}\right)\exp\left(-\frac{(\theta-\theta_0)^2}{2\sigma_\theta^2}\right),$$
where $(f_0, \theta_0)$ is the center of the filter and $\sigma_f$ and $\sigma_\theta$ represent, respectively, the frequency and angular bandwidth. This family of log-Gabor filters has several advantages over the traditional Gabor functions. In particular, the latter have a nonzero DC component and therefore produce an excessive overlapping of the filters in the low frequencies. In contrast, log-Gabor filters cover the midfrequencies more uniformly and capture the highest frequencies. This is especially important since, as suggested by Field [17], the amplitude of natural images falls off by a factor of $1/f$; thus it has a tail at high frequencies.
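A single log-Gabor transfer function can be built directly in the frequency domain, as sketched below. This follows the usual polar log-Gabor form (Gaussian in log-frequency times Gaussian in angle); the parameter names and default values (`sigma_ratio`, `sigma_theta`) are illustrative assumptions, not the settings of the study.

```python
import numpy as np


def log_gabor(shape, f0, theta0, sigma_ratio=0.65, sigma_theta=np.pi / 8):
    """Frequency response of one log-Gabor filter on an FFT grid.

    f0 is the center frequency in cycles/pixel, theta0 the orientation, and
    sigma_ratio the ratio between frequency bandwidth and center frequency
    (all defaults are assumptions of this sketch).
    """
    rows, cols = shape
    v, u = np.meshgrid(np.fft.fftfreq(rows), np.fft.fftfreq(cols), indexing="ij")
    radius = np.hypot(u, v)
    radius[0, 0] = 1.0                                  # avoid log(0) at DC
    radial = np.exp(-np.log(radius / f0) ** 2 / (2 * np.log(sigma_ratio) ** 2))
    radial[0, 0] = 0.0                                  # log-Gabor has no DC component
    theta = np.arctan2(v, u)
    dtheta = np.arctan2(np.sin(theta - theta0), np.cos(theta - theta0))  # wrapped angle
    angular = np.exp(-dtheta ** 2 / (2 * sigma_theta ** 2))
    return radial * angular


def filter_image(image, transfer):
    """Apply a frequency-domain filter; the result is complex-valued."""
    return np.fft.ifft2(np.fft.fft2(image) * transfer)
```

A filter bank is obtained by evaluating `log_gabor` over a set of center frequencies and orientations; statistics of the response magnitudes can then serve as features.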

An in-depth description of the use of log-Gabor filters for vehicle verification, and proof of its superiority over traditional Gabor filters, is provided in our previous work [18]. As in the prior descriptors, the classification performance is evaluated by means of SVM. The optimum parameters are , , , , and (see [18]), and the associated accuracy rates are shown in Table 3, together with the values of the maximum frequency, , which varies for the different regions. As can be observed, this descriptor outperforms V-HOG and PCA in the close/middle region but falls below V-HOG in the far range. This unveils the necessity of classifier fusion, as discussed later.

2.4. Gradient

Aside from symmetry, gradient has traditionally been the most popular explicit feature for vehicle detection. In a previous work [19], we presented a new simple but powerful gradient-based descriptor. As opposed to more complex techniques, such as HOG, this descriptor makes use of the knowledge of the vehicle structure in such a way that it achieves high discrimination performance with a small feature set. In particular, two properties relating to vehicle gradients are used: on the one hand, most of the edges are vertical and horizontal, and on the other hand, there is a high density of edges on account of the rich texture in the vehicle rear.

These properties are used in [19] to build a two-feature descriptor based on the HOG scheme. The first feature measures the average distance to the vertical or horizontal direction: the smaller this distance is, the more likely the image corresponds to a vehicle. The second feature is the number of cells with high density of gradients, which discriminates between homogeneous background patches (e.g., belonging to the road and the sky) and vehicle instances. Please refer to [19] for more details on this descriptor.

In contrast to the implicit descriptors above, this gradient-based descriptor allows for the use of a generative model. In particular, a bivariate normal distribution is used to fit the data in the described two-feature space and a Bayesian classifier is used to evaluate its performance. Linear and quadratic classifiers are tested assuming, respectively, equal and different covariance matrices for the vehicle and nonvehicle classes. Exhaustive tests for both as a function of the cell size, , and the number of orientations, , can be found in [19]. The results are summarized in Table 4 for the optimum parameters, and , where it is clear that the quadratic classifier outperforms the linear classifier.
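The difference between the quadratic and the linear Bayesian classifiers can be sketched as follows. This is a generic illustration on an arbitrary two-feature space, assuming equal priors; the function names are hypothetical and the features are not those of [19].

```python
import numpy as np


def fit_gaussian(X):
    """Mean and covariance of the rows of X (one class)."""
    return X.mean(axis=0), np.cov(X, rowvar=False)


def log_gaussian(x, mean, cov):
    """Log-density of a multivariate normal distribution at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (maha + logdet + len(mean) * np.log(2 * np.pi))


def pooled_models(Xv, Xn):
    """Linear classifier: both classes share one pooled covariance matrix."""
    (mv, cv), (mn, cn) = fit_gaussian(Xv), fit_gaussian(Xn)
    shared = ((len(Xv) - 1) * cv + (len(Xn) - 1) * cn) / (len(Xv) + len(Xn) - 2)
    return (mv, shared), (mn, shared)


def is_vehicle(x, vehicle_model, nonvehicle_model):
    """Equal priors assumed: choose the class with the larger likelihood."""
    return log_gaussian(x, *vehicle_model) > log_gaussian(x, *nonvehicle_model)
```

The quadratic classifier is obtained by passing `fit_gaussian(Xv)` and `fit_gaussian(Xn)` directly to `is_vehicle`, so that each class keeps its own covariance; the linear one uses the output of `pooled_models`.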

2.5. Symmetry

The rear of vehicles is typically symmetrical with respect to the vertical axis. This feature has been widely used in the literature for the detection and classification of vehicles. Most of the reported works make use of the symmetry definition introduced in [20]. In this method, first vertical symmetry is checked for every row of a grayscale image, , by shifting the symmetry axis. If we denote by this symmetry axis and by , the horizontal shift with respect to it, the symmetry for the row is where all possible widths are hypothesized up to the size of , . In turn, the even and odd parts of , , and , in (5), are given by

As stated, the symmetry measure of the input image is computed by integrating 1D symmetry values in the vertical axis, :

The offset and scaling in (7) ensure that the symmetry value is in the range , so that it conveys a probability of the image holding a vehicle according to this feature. In particular, the values maximizing the matrix determine the center and bounding box of the hypothesized vehicle.

The distribution of this feature for the vehicle and nonvehicle samples in the database is shown in Figure 4 for the front close/middle range. The former is right-skewed and resembles a Rayleigh distribution; this is confirmed by the Kolmogorov-Smirnov test, as shown in [19]. In turn, the nonvehicle distribution is symmetrical and bell-shaped, like the Gaussian distribution, but has heavier tails; thus it is modeled by a Student's $t$-distribution. As in the case of the gradient-based descriptor, a Bayesian classifier is used to evaluate the discrimination power of the symmetry feature. The results are summarized in Table 5. As expected, the accuracy is more limited than for the previous descriptors due to the very simple nature of this feature (as a matter of fact, symmetrical structures may also arise in the background), but it is useful in a multicue approach, as shown later.
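The Bayesian decision on the symmetry feature can be sketched with the two densities named above. The distribution parameters (`sigma`, `dof`, `loc`, `scale`) would have to be fitted to the training data and are left as free arguments here; equal priors are assumed, so the posterior reduces to the normalized likelihood.

```python
import math


def rayleigh_pdf(x, sigma):
    """Rayleigh density, used here as the vehicle-class model."""
    if x < 0:
        return 0.0
    return x / sigma ** 2 * math.exp(-x ** 2 / (2 * sigma ** 2))


def student_t_pdf(x, dof, loc=0.0, scale=1.0):
    """Location-scale Student's t density, used as the nonvehicle-class model."""
    z = (x - loc) / scale
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + z * z / dof) ** (-(dof + 1) / 2) / scale


def vehicle_posterior(lik_vehicle, lik_nonvehicle):
    """Posterior of the vehicle class under equal priors (assumption)."""
    total = lik_vehicle + lik_nonvehicle
    return 0.5 if total == 0 else lik_vehicle / total
```

A sample would then be labeled as vehicle whenever `vehicle_posterior(rayleigh_pdf(s, sigma), student_t_pdf(s, dof, loc, scale))` exceeds 0.5, where `s` is its symmetry value.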

3. Combination of Classifiers

In the previous sections, a set of descriptors has been analyzed for characterization of vehicles and different classifiers have been presented to address vehicle verification. In this section, we aim to combine the information of the different classifiers so as to enhance the overall recognition performance.

Fusion of information from different sources can be performed at feature level, at matching score level, or at decision level. The latter combines the hard decisions from the different classifiers and is therefore too rigid since much information is lost throughout the classification chain. In turn, although integration at an earlier stage is bound to be more effective due to the richer information of the input data, fusion at feature level is rarely employed in practice as the relation between the feature spaces associated with different information sources is usually unknown. In addition, concatenation of different feature vectors leads to the curse of dimensionality problem [21]. Therefore, a postclassification scheme using the soft outputs of the different classifiers is preferred here.

The goal is thus to combine the output of the different classifiers and to generate a single scalar score. This will be used to make the final decision and also to give information on the confidence in the decision. Our ensemble consists of three classifiers based on implicit features, that is, PCA, V-HOG (i.e., the most cost-effective variation of HOG), and log-Gabor, and two classifiers using explicit features, namely, gradient- and symmetry-based classifiers. In addition, a classifier ensemble will be designed for each image region, since the performance of the above-mentioned classifiers differs region-wise. However, it must be taken into account that the nature of the output delivered by the classifiers is different. On the one hand, the gradient- and symmetry-based classifiers output likelihoods of the input samples given the vehicle and the nonvehicle classes, as the distributions of the data have been modeled by known functions (bivariate Gaussian for the gradient-based descriptor; Rayleigh and Student's $t$ for symmetry). Since there is no prior information on the classes, a priori probabilities are equal and posterior probabilities of each class are just the normalized likelihoods. In contrast, the other three classifiers, based on PCA, HOG, and log-Gabor, are built upon support vector machines and therefore do not provide probabilistic outputs. Instead, a soft value is output that measures the distance to the decision surface, $s$: if $s > 0$, the sample is classified as vehicle; if $s < 0$, as nonvehicle. Hence, a normalization scheme is necessary that transforms these values to a common range indicating the support for the hypothesis that the input vector submitted for classification comes from the vehicle class. In Section 3.1, the normalization schemes used are described. Once the classifier outputs are in the same domain, normalized scores are combined through a combination rule, as discussed in Section 3.2.

In addition, another key issue for the success of classifier combination is the diversity. Indeed, the classifiers in the ensemble should be as accurate as possible, while at the same time they should not make coincident errors. In other words, we expect that, if one classifier makes errors, there is another classifier that does not make errors in the same input samples (even if it does make errors in others). A number of measures have been proposed for diversity in the literature. Those are reviewed in Section 3.3 and applied to our classifier ensemble.

3.1. Normalization of Classifier Outputs

The objective of normalization is to have the output of the classifiers in the same range, so that fusion can be performed. As stated, the classifiers using explicit features, that is, those based on symmetry and gradient, deliver the likelihoods of the samples under each of the two classes, $p(x \mid \omega_v)$ and $p(x \mid \omega_n)$, where $\omega_v$ indicates the vehicle class and $\omega_n$ indicates the nonvehicle class. Since prior probabilities of vehicle and nonvehicle classes are equal, $P(\omega_v) = P(\omega_n) = 1/2$, the posterior probabilities are given by
$$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)}{p(x \mid \omega_v) + p(x \mid \omega_n)}, \quad i \in \{v, n\}.$$

In particular, we will only retain the probability of the vehicle class. On the other hand, a normalization rule is sought that transforms the soft output of SVM to the range $[0, 1]$, indicating the support of the vehicle class. Several normalization schemes have been proposed in the literature, such as min–max, $z$-score, tanh, or double sigmoid normalization (see [21] for a complete survey). In particular, min–max normalization is extensively used [22]. Although this technique is very efficient, it also lacks robustness to outliers; therefore its variant robust min–max is sometimes preferred [23]. In this study, the most popular methods, that is, robust min–max and double sigmoid normalization, are adopted and compared. These rules are described below, where the soft output is denoted by $s$ and the normalized output by $s'$.

(i) Min–max: $$s' = \frac{s - s_{\min}}{s_{\max} - s_{\min}},$$ where $s_{\min}$ and $s_{\max}$ denote, respectively, the minimum and maximum values of the SVM classifier for the dataset. This normalization transforms the values to the range $[0, 1]$ while maintaining their original distribution. As $s_{\min}$ and $s_{\max}$ are extracted from the dataset, this method is highly sensitive to outliers and robust min–max is preferred.

(ii) Robust min–max: it is similar to min–max, except that $s_{\min}$ and $s_{\max}$ are selected as the 5th and the 95th percentiles of the soft output distribution. As a result, tails are disregarded and the pernicious effect of outliers is avoided. The output distributions for the vehicle (genuine) and the nonvehicle (impostor) classes for the front close/middle region are shown in Figure 5. Soft outputs above zero should be mapped in the interval and soft values below zero in the interval , indicating negative and positive vehicle support, respectively. Therefore, the normalization rule is where is the 5th percentile of the genuine class and is the 95th percentile of the impostor class. The parameters () for PCA, V-HOG, and log-Gabor are, respectively, , , and for the front close/middle region. The values for the remaining regions are derived in the same manner as explained above and are as follows: for the left region, , and ; for the right region, ; and for the far region, .

(iii) Double sigmoid: this normalization rule is determined by the values and , between which the function is linear. In order to set these values, similar to min–max normalization, the 5th percentile of the vehicle class, , and the 95th percentile of the nonvehicle class, , are retained, and then , and . This way, , for and the support decreases linearly to for . The tails, in contrast, decrease nonlinearly.
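The three normalization rules can be sketched in plain Python. This is a generic illustration, not the exact parameterization of the study: the double sigmoid follows the common formulation with a reference point `t` and two edge widths `r1` and `r2`, all of which are assumptions here, and `min_max` clips values that fall outside the chosen bounds so that it can also be used with percentile-based (robust) bounds.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def min_max(s, lo, hi):
    """Map s linearly from [lo, hi] to [0, 1], clipping values outside."""
    return min(1.0, max(0.0, (s - lo) / (hi - lo)))


def robust_bounds(scores):
    """Bounds for robust min-max: 5th and 95th percentiles of the scores."""
    return percentile(scores, 5), percentile(scores, 95)


def double_sigmoid(s, t, r1, r2):
    """Double sigmoid: roughly linear in (t - r1, t + r2), saturating outside."""
    r = r1 if s < t else r2
    z = max(-700.0, min(700.0, -2.0 * (s - t) / r))   # guard against overflow
    return 1.0 / (1.0 + math.exp(z))
```

With `t = 0`, a soft output on the decision surface maps to a support of 0.5, which keeps the decision threshold of the SVM after normalization.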

3.2. Combination Rules

Let $x$ denote an input sample and let $D_i$, $i = 1, \ldots, L$, be the set of classifiers. As a result of normalization, each classifier delivers a value in the interval $[0, 1]$; that is, $d_i(x) \in [0, 1]$. This value is the support that classifier $D_i$ gives to the hypothesis that $x$ corresponds to the vehicle class, denoted by $\omega_v$. The overall degree of support, $\mu(x)$, is a combination of the individual supports given by the classifiers. Among the several combiners proposed in the literature (see [24], chapter 5, for an exhaustive survey), we adopt the most popular ones, that is, simple average and weighted average.

(i) Simple average: $$\mu(x) = \frac{1}{L} \sum_{i=1}^{L} d_i(x).$$

(ii) Weighted average: $$\mu(x) = \sum_{i=1}^{L} w_i d_i(x).$$ The weights $w_i$ are typically selected to minimize the variance of $\mu(x)$, with the restriction that $\sum_{i=1}^{L} w_i = 1$. As shown in [24], one way to find the weights is to assume that the approximation errors, $d_i(x) - t(x)$, are normally distributed with zero mean (in this case, the target value $t(x)$ is $1$ for vehicles and $0$ for nonvehicles). Under this assumption, the weights minimizing the variance of $\mu(x)$ are given by [24] $$w = \Sigma^{-1} I \left(I^T \Sigma^{-1} I\right)^{-1},$$ where $w = [w_1, \ldots, w_L]^T$, $\Sigma$ is the covariance matrix of the classifiers' approximation errors, and $I$ is a column vector of ones.
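Both combiners, together with the variance-minimizing weights, can be sketched with NumPy. The sketch assumes that the approximation errors of each classifier (normalized support minus a target of 1 for vehicles and 0 for nonvehicles) have been collected on a training set into a matrix with one column per classifier; the function names are hypothetical.

```python
import numpy as np


def simple_average(supports):
    """Mean of the normalized supports of all classifiers."""
    return float(np.mean(supports))


def min_variance_weights(errors):
    """Weights minimizing the variance of the combined support.

    errors: (n_samples, n_classifiers) array of approximation errors.
    The weights sum to one; individual weights may turn out negative
    when the errors are strongly correlated.
    """
    sigma = np.cov(errors, rowvar=False)      # covariance of the classifier errors
    ones = np.ones(sigma.shape[0])
    w = np.linalg.solve(sigma, ones)          # Sigma^{-1} 1
    return w / (ones @ w)                     # normalize by 1^T Sigma^{-1} 1


def weighted_average(supports, weights):
    """Weighted sum of the normalized supports."""
    return float(np.dot(weights, supports))
```

For uncorrelated errors the weights are inversely proportional to the error variances, so the more reliable classifiers dominate the final score.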

3.3. Diversity

The success of the ensemble depends to a large extent on the fact that the classifiers complement each other, that is, on the diversity of the classifier outputs. A number of diversity measures have been specifically proposed in the literature for binary output classifiers that classify samples as correct or incorrect (also called oracle output classifiers), both considering the members of the ensemble pairwise and all the classifiers of the ensemble together. The former is adopted here, as is common practice. Popular pairwise diversity measures include the disagreement measure and the Double-Fault measure [24]. Also, more general statistical measures of relationship such as the correlation coefficient and the $Q$-statistic are sometimes used as indicators of the diversity of the ensemble. They are all based on a table of the joint outputs of classifiers $D_i$ and $D_k$, as shown in Table 6. The entries in the table are the probabilities of the respective pair of correct/incorrect outputs; in the following, $a$ denotes the probability that both classifiers are correct, $b$ and $c$ the probabilities that only $D_i$ or only $D_k$ is correct, and $d$ the probability that both are wrong. In this work, the Double-Fault measure is used as the main diversity measure as we believe that it is more important to detect the classifiers that commit simultaneous errors than those that are simultaneously correct, especially when the individual classifiers already deliver fairly high correct classification rates, as is the case. In addition, the correlation coefficient and the values of the weights minimizing the variance in the weighted average rule are used as complementary indicators of the diversity whenever the Double-Fault measure is not sufficiently informative. The Double-Fault and correlation measures are defined below.

(i) Double-Fault measure: it gives the probability of both classifiers being wrong. According to Table 6, the measure is given by $$DF_{i,k} = d.$$

(ii) Correlation coefficient: the correlation between two binary classifier outputs is $$\rho_{i,k} = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}.$$
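The two pairwise measures can be computed from the oracle outputs of a pair of classifiers as sketched below. The entry names `a`, `b`, `c`, and `d` of the joint table are notation assumed for this sketch (both correct, only the first correct, only the second correct, both wrong).

```python
import math


def joint_table(correct_i, correct_k):
    """Probabilities of the four joint correct/incorrect outcomes."""
    n = len(correct_i)
    pairs = list(zip(correct_i, correct_k))
    a = sum(1 for ci, ck in pairs if ci and ck) / n            # both correct
    b = sum(1 for ci, ck in pairs if ci and not ck) / n        # only first correct
    c = sum(1 for ci, ck in pairs if not ci and ck) / n        # only second correct
    d = sum(1 for ci, ck in pairs if not ci and not ck) / n    # both wrong
    return a, b, c, d


def double_fault(correct_i, correct_k):
    """Probability that both classifiers are wrong on the same sample."""
    return joint_table(correct_i, correct_k)[3]


def correlation(correct_i, correct_k):
    """Correlation coefficient between two oracle outputs."""
    a, b, c, d = joint_table(correct_i, correct_k)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom > 0 else 0.0
```

A low Double-Fault value and a low correlation both indicate that the two classifiers tend to fail on different samples, which is the situation in which their fusion is most useful.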

The strategy for the construction of the ensemble is the following. First, the classifiers are independently trained using the same training set and the performance of the ensemble is evaluated on the testing set according to the combination rules explained in Section 3.2. This is repeated five times using 50% holdout cross-validation, and the average joint performance of the ensemble is derived. Then, a new smaller ensemble is proposed by selecting the two least diverse classifiers and removing the one featuring the worst performance, and the overall performance of the new ensemble is evaluated. Only if the overall performance of the ensemble is better (or at least similar) is a new iteration carried out by proposing a smaller ensemble using the same strategy. This is repeated iteratively, and the smallest ensemble obtained before the joint performance is noticeably degraded is selected. The strategy is carried out independently for the different image regions, as the response of the classifiers varies.
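The reduction strategy can be sketched as a greedy loop. This is a simplification made for illustration: the least diverse pair is chosen by Double-Fault alone, whereas the study also takes the correlation and the weights into account, and the `tolerance` that defines a noticeable degradation is a hypothetical parameter. `evaluate` is a hypothetical callback returning the cross-validated accuracy of a given ensemble.

```python
from itertools import combinations


def prune_ensemble(members, accuracy, double_fault, evaluate,
                   tolerance=0.002, min_size=2):
    """Greedily shrink an ensemble while its joint accuracy is preserved.

    accuracy: dict mapping classifier name to individual accuracy.
    double_fault: dict mapping frozenset({name_i, name_k}) to Double-Fault.
    evaluate: hypothetical callback, list of names -> joint accuracy.
    """
    current = list(members)
    score = evaluate(current)
    while len(current) > min_size:
        # Least diverse pair: the one with the largest Double-Fault value.
        pair = max(combinations(current, 2),
                   key=lambda p: double_fault[frozenset(p)])
        weakest = min(pair, key=lambda m: accuracy[m])
        candidate = [m for m in current if m != weakest]
        candidate_score = evaluate(candidate)
        if candidate_score < score - tolerance:
            break                              # noticeable degradation: stop
        current, score = candidate, candidate_score
    return current, score
```

Running the loop separately for each image region yields one reduced ensemble per region.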

4. Results and Performance of Classifier Ensemble

Different classifier ensembles are proposed according to the strategy described above for each image region. The performance of the classifiers is evaluated for the different combinations of normalization and fusion rules explained in Sections 3.1 and 3.2: robust min–max normalization with simple average combination (RMM-SA), robust min–max normalization with weighted average (RMM-WA), double sigmoid normalization with simple average (DS-SA), and double sigmoid normalization with weighted average (DS-WA). The results for the different regions are given in Tables 7–14, including the performance of the proposed ensembles and the diversity measures. As far as performance is concerned, apart from accuracy, recall and precision rates are also provided for completeness.
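For reference, the three reported rates can be computed from binary labels as in the following sketch, where 1 stands for vehicle and 0 for nonvehicle (the label encoding and function name are assumptions of this example).

```python
def classification_rates(y_true, y_pred):
    """Accuracy, precision, and recall for binary vehicle/nonvehicle labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)            # vehicles accepted
    tn = sum(1 for t, p in pairs if not t and not p)    # nonvehicles rejected
    fp = sum(1 for t, p in pairs if not t and p)        # nonvehicles accepted
    fn = sum(1 for t, p in pairs if t and not p)        # vehicles rejected
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

In the fused system, the predicted label would be obtained by thresholding the combined support at 0.5.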

4.1. Front Close/Middle Region

The ensemble involving all classifiers, based on PCA, V-HOG, log-Gabor (), symmetry (), and gradient () features, is almost flawless, as it achieves an overall performance of using the double sigmoid normalization and weighted average combination (see Table 7). The weights that decrease the variance of the approximation error are 0.28, 0.15, 0.39, 0.07, and 0.11 for PCA, V-HOG, , , and , respectively. The features yielding the largest Double-Fault measure are PCA and symmetry, that is, (see Table 8(a)). Hence, the first proposed reduced ensemble removes symmetry, as it is less accurate than PCA and also has the lowest weight of the ensemble. As shown in Table 7, the performance is then degraded in , which might not justify the computational saving (only one classifier is omitted). However, note that and also feature a almost as high and have the largest correlation (see Table 8(b)), so we may well try and also omit as the least accurate classifier among the pair. Observe that the accuracy remains the same as in the previous iteration, thus encouraging the use of the smaller ensemble PCA+V-HOG+. In addition, V-HOG and entail a high correlation coefficient and have the largest ; thus in the next iteration a reduced ensemble without V-HOG is tested. The accuracy decreases slightly, from to , as shown in Table 7. However, note that the ensemble composed of PCA and log-Gabor based classifiers involves a performance loss below , while the computational saving is significant (only 2 out of the 5 classifiers are used), which may well justify the use of the reduced ensemble.

4.2. Left Region

As in the front region, the best joint performance is attained by double sigmoid normalization within a weighted average combination scheme (see Table 9). The weights that minimize the variance of the approximation error in this scheme are 0.31, 0.11, 0.45, 0.09, and 0.04 for PCA, V-HOG, , , and . In this case, the greatest is obtained with the two explicit classifiers (symmetry- and gradient-based). Although the accuracy of is higher than that of , the weight of the former in the final decision is almost negligible, and it also displays a high correlation coefficient of 0.2478 with V-HOG (see Table 10(b)), so we proceed by leaving out the gradient-based classifier. Naturally, the results of the reduced ensemble are only slightly worse than those of the full ensemble. Among the remaining classifiers, the highest Double-Fault rates are observed for +PCA (13.2) and +V-HOG (13.6); thus one could try to remove symmetry in the next iteration. On the other hand, V-HOG has high correlation with and worse performance. Therefore, experiments leaving out either or V-HOG are performed. The ensemble PCA++ renders higher accuracy and involves a small loss of with respect to the previous iteration ensemble. A further thinning of the ensemble by leaving out symmetry is not worthwhile, as the accuracy falls below . Hence, the ensemble composed of PCA++ is retained.

4.3. Right Region

The full ensemble achieves a performance as high as using double sigmoid normalization and weighted average, as shown in Table 11. The weights of the classifiers are , , , and for PCA, V-HOG, , , and , respectively. Since the weight of V-HOG is almost negligible, this classifier is removed in the first iteration. Naturally, the accuracy obtained with the ensemble PCA+++ is almost the same. According to Table 12(a), the maximum Double-Fault is committed by and . Although has worse accuracy, has high correlation with and lower weight in the ensemble. Experiments confirm that removing yields higher accuracy than leaving out ( versus ). In fact, the performance of the reduced ensemble PCA++ is equal to that of PCA+++. Further reduction of the ensemble has been tested (symmetry is dismissed as PCA and only commit Double Faults), but the performance decays by .

4.4. Far Region

The best performance for the full ensemble in the far region is achieved using a robust min–max normalization scheme within a weighted average combination framework. The results of the different reduction iterations in the ensemble are given in Table 13. In this case, the least diverse classifiers are and , which have an associated Double-Fault of (see Table 14(a)). In the first iteration , which has worse accuracy, is left out and a reduced ensemble comprising PCA, V-HOG, , and is proposed. The performance of the reduced ensemble is optimized with a weighted combination using double sigmoid normalization and only decreases by with respect to the full ensemble. Among the remaining classifiers, V-HOG and have the greatest and correlation rates. However, a reduction of the ensemble by disregarding either or V-HOG results in a severe loss of accuracy, as shown in Table 13; the joint performance plummets, respectively, to and . In turn, removal of symmetry, which has the worst individual performance, results in an accuracy of , which is likewise unacceptable. In summary, an ensemble comprising at least PCA+V-HOG++ is required to surpass accuracy.
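The reduction iterations described for the four regions share a common pattern: locate the least diverse pair of classifiers, test the ensemble without each member of the pair, and keep the variant that performs best. The sketch below is a simplified version of one such iteration. It ranks pairs by Double-Fault only, whereas the analysis above also weighs correlation, individual accuracy, and ensemble weight. The `evaluate` callable is hypothetical and stands for whatever procedure measures the accuracy of a fused ensemble on validation data.

```python
from itertools import combinations


def most_redundant_pair(members, double_fault):
    """Return the pair of classifiers with the largest Double-Fault value.

    double_fault maps frozenset({name_a, name_b}) to the pairwise measure.
    """
    pairs = combinations(sorted(members), 2)
    return max(pairs, key=lambda pair: double_fault[frozenset(pair)])


def reduce_once(members, double_fault, evaluate):
    """Drop one member of the least diverse pair, chosen experimentally.

    evaluate(ensemble) is a hypothetical callable returning the accuracy of
    the fused ensemble given as a frozenset of classifier names.
    Returns (reduced_ensemble, accuracy_of_reduced_ensemble).
    """
    members = frozenset(members)
    first, second = most_redundant_pair(members, double_fault)
    candidates = [members - {first}, members - {second}]
    best = max(candidates, key=evaluate)
    return best, evaluate(best)
```

A caller would repeat `reduce_once` while the loss in accuracy with respect to the full ensemble stays within an acceptable margin, and stop otherwise, as done for the far region.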

5. Discussion

Fusion of classifiers has proven to greatly improve the performance of the individual classifiers. In fact, the descriptors and classifiers presented throughout this paper have been designed to exploit information of different nature regarding the appearance of the vehicle. As a result, the combination of all the classifiers performs better than any subset of combinations for all image regions, which confirms the diversity of the sources. Specifically, explicit features have been proven to provide valuable information even if their independent performance is limited. Nevertheless, some features have been proven to be more diverse than others. Accordingly, reduced ensembles of classifiers have been proposed for each image region, discarding the classifiers that produce little or residual gain. In particular, PCA and log-Gabor features are retained in all the reduced ensembles due to their high diversity. Remarkably, although V-HOG entails better individual performance than PCA, it has higher correlation with than the latter; therefore its contribution to the ensemble is smaller, and it is only selected in the reduced ensemble pertaining to the far region. In contrast, symmetry, which is by far the weakest classifier, has proven to provide diverse information and is thus included in the reduced ensemble of three of the four regions. Besides, weighted average combination, especially combined with double sigmoid normalization, has been shown to be significantly more effective than simple average, as the contributions of the classifiers can be adjusted according to their accuracy and diversity.

Table 15 compares the performance (in terms of accuracy) of the proposed classifier combinations with that of the individual classifiers. Both the full classifier ensembles and the reduced ensembles are reported. We observe that the reduced ensemble improves on the performance of the separate classifiers, even in the front close/middle region, where only two classifiers are used. Fusion is especially beneficial in the right and the far regions, which constitute the most challenging scenarios. Indeed, in the right region, the performance of all the individual classifiers is worse than that in the left region owing to the more heterogeneous nature of the traffic participants (slow vehicles, such as buses and trucks). Nevertheless, the combined accuracy is similar to that of the left region. Remarkably, in the far region the performance of the best individual classifier is boosted by almost 3% thanks to the classifier fusion, and in fact the achieved accuracy is almost as high as in the other regions.

6. Conclusions

In this paper the possibilities for the combination of different sources to achieve vehicle classification through image analysis have been studied. The first part of the study is devoted to the analysis of the individual performance of popular techniques for vehicle verification and the comparison among them. Classifiers based on Gabor and HOG features are shown to achieve the best results and to outperform PCA and other classifiers based on features such as symmetry and gradient. However, the outcome of the study discloses that, although these features do achieve high accuracy rates, their performance is limited under some scenarios. Interestingly enough, the performance of the Gabor-based classifier fades in the far range, whereas that of the HOG-based classifier falls short in the close/middle range, which already points to the necessity of feature combination. In the second part of the study, a methodology for the fusion of classifiers built upon the different features is presented. The experiments reveal that classifier fusion results in a substantial gain of performance, especially in the more challenging scenarios such as the far range, where it yields a gain of nearly 3% with respect to the best single-feature based classifier.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


This work was partially supported by the Ministerio de Economía y Competitividad of the Spanish Government under Project TEC2010-20412 (Enhanced 3D TV).