Abstract

Vehicle logo detection from images captured by surveillance cameras is an important step towards vehicle recognition, which is required by many applications in intelligent transportation systems and automatic surveillance. The task is challenging because logos are small targets and vary widely in shape, color, and illumination. A fast and reliable vehicle logo detection approach is proposed that follows the visual attention mechanism of human vision. Two pre-logo-detection steps, vehicle region detection and small RoI segmentation, rapidly focus on the small logo target. An enhanced Adaboost algorithm, combined with Haar-like and HOG features, is proposed to detect vehicles. An RoI that covers the logo is segmented based on prior knowledge about the position of logos relative to license plates, which can be accurately localized in frontal vehicle images. A two-stage cascade classifier, a hybrid of Gentle Adaboost and Support Vector Machine (SVM), is then applied to the segmented RoI, yielding precise logo positioning. Extensive experiments were conducted to verify the efficiency of the proposed scheme.

1. Introduction

With the development of traffic infrastructure and the automobile industry, intelligent transportation systems (ITS) are becoming ever more significant. As an important part of ITS, automatic vehicle recognition (AVR) has attracted considerable attention in recent years due to its commercial value and practical significance in road traffic monitoring and management. AVR is also pertinent to many intelligent surveillance applications [16]. For example, police often need to search a database of traffic images recorded at toll stations or checkpoints for a specific vehicle to help track suspects. In many situations, conventional license plate recognition techniques are insufficient to identify the vehicle. This is particularly relevant when license plate information is unreliable, for example, when the plate is forged.

To make vehicle identification reliable, it is necessary to introduce additional criteria that complement license plate recognition systems. The vehicle logo is the most visually distinguishable mark of an automobile brand and is considered one of the important components of an AVR system. A vehicle logo identifies the vehicle manufacturer, which can be used for a number of purposes, including vehicle verification, identification, and recognition. Once logos are accurately detected, recognizing the manufacturer becomes straightforward. Automatic logo detection is therefore an important enabler for applications in ITS and surveillance.

However, because road environments change constantly, vehicle images are captured with great variability caused by different backgrounds and lighting conditions. Vehicle motion and varying shooting angles of surveillance cameras distort logo shapes, and the small fraction of the image occupied by the logo adds further difficulty. A reliable logo detection algorithm must not only cope with variations in a logo's visual appearance but also distinguish logos from other small visual patterns that may appear on vehicles. Although classifying vehicle brands from logo information has been a subject of interest for several years, the bottleneck of logo detection remains largely unsolved. In some published works, the license plate position was used as the only clue to locate the logo area [7]. Using rear-view vehicle images, Dlagnekov and Belongie exploited SIFT features [8] to find logos and subsequently identify the vehicle manufacturer and model [9]. The approach, however, incurs heavy computational costs, which makes it impractical for real applications. Psyllos et al. further improved the SIFT-based matching scheme, considerably reducing computational cost and improving recognition performance [10]. However, the detection of interest points (keypoints) in SIFT is usually not robust to image variations in lighting, size, or color.

In the computer vision community, the de facto standard object detection technique is to slide a window across the image and apply a binary classifier at each window position to decide whether the target object is present. The best-known example is the Viola-Jones detector [11], which combines an Adaboost classifier with Haar-like image features. A Support Vector Machine (SVM) with Histograms of Oriented Gradients (HOG) features [12] is another representative method along the same lines. Although many successful results have been reported, including the detection of faces, pedestrians, and cars [11-15], sliding window approaches are generally slow and computationally expensive. In addition, a detector's robustness to variations such as scale and pose changes is still an unresolved problem.

Taking these problems into account, this paper aims to develop an integrated and efficient logo detection system for frontal vehicle images. By integration, we mean that several methodologies are incorporated into the system to compensate for the possible shortcomings of any single detection method. Specifically, the proposed scheme makes three main contributions. First, the attention mechanism of the human visual system is taken into account. When observing an image containing vehicles, people can quickly focus on objects such as logos if they are interested in them. Such attention mechanisms, that is, the processes of selecting and gating visual information based on the saliency in the image itself and on prior knowledge about scenes, objects, and their interrelations [17], have been an intriguing research area in vision science for many years [18-20]. Rather than delving into attention models, our intuition is to simulate this automatic focusing process by first extracting vehicle areas from the original image as a coarse representation and then segmenting an RoI from the detected vehicle. A logo can then be quickly and accurately localized within this small, focused RoI using the sliding window methodology.

Our second contribution is a knowledge-based approach for localizing the RoI that contains the logo. The layout of the logo and the other parts of the frontal region is similar in almost all vehicles; generally, the logo is directly above the license plate. This prior knowledge of the geometric relationship between the logo and the license plate, together with local symmetry characteristics, is exploited to search for and segment the RoI that covers the logo. To this end, the license plate area is located first and then used as the reference for the logo position. Incorporating prior knowledge into RoI segmentation greatly accelerates the logo search.

Following the estimation of the logo RoI, the last contribution of this paper is a new two-stage cascaded classification scheme, comprising Adaboost and SVM, that accurately localizes and detects logos. Although Adaboost proves satisfactory for logo detection, it not only requires long computation time but also has relatively high false alarm rates. To improve on Adaboost detection, several hybrid systems combining Adaboost and SVM have been published. For example, a combined classifier called the cascade-Adaboost-SVM classifier was proposed in [21]: when the Adaboost classifier of a layer cannot achieve a preset performance, it is replaced by an SVM, which is trained on the feature dimensions selected by Adaboost. Compared with that hybrid system, our cascaded classifier was designed differently and for a different purpose. The Gentle Adaboost and SVM classifiers are trained independently but applied serially: when Gentle Adaboost detects no logo in a given RoI, the SVM detector is invoked. By exploiting the complementary strengths of Adaboost and SVM, detection performance is substantially enhanced.

The rest of the paper is organized as follows. Section 2 gives an overview of the proposed system. Section 3 briefly introduces vehicle image acquisition and vehicle detection. Section 4 details RoI segmentation from the detected vehicle image and the two-stage approach for exact logo detection. Section 5 describes the experiments and reports the detection results, and Section 6 concludes the paper.

2. Overview of the Proposed Logo Detection System

The overall logo detection process is illustrated in Figure 1. To overcome the speed bottleneck of most object detection algorithms, our logo detection follows the attention principle to rapidly find the region of the vehicle image that contains the logo. In the first step, vehicles are detected by an improved Adaboost algorithm. Building on the cascade of boosted classifiers recently proposed by Negri et al. [16, 22], we introduce a multiresolution vehicle detection scheme with a new decision strategy that makes detection robust to environmental changes. To locate the logo accurately, an RoI containing the logo is then segmented based on the prior knowledge that the logo is located in the frontal area of the vehicle above the license plate. For this purpose, the license plate is located using edge extraction and morphological processing. Finally, a cascaded scheme of Gentle Adaboost and SVM detects the logo with high accuracy.

3. Image Collection and Vehicle Detection

The local police department of Dushu Lake Higher Education Town in Suzhou provided a large collection of vehicle images recorded by their traffic surveillance cameras over a one-week period. The images were captured by CCD cameras (SP-140N) installed at 10 different intersections from 7:30 to 21:50, covering a wide range of weather and illumination conditions. The minimum distance from a camera to the surveillance target area is about 15 feet. From more than 50,000 recorded images, 2500 images of different vehicles in four categories were selected: car, van, light truck, and bus. All of the images contain a frontal view of a single vehicle captured from variable distances. The original images are color images. A sample of the selected images is shown in Figure 2.

There are a variety of object detection methods, including the seminal work of Viola and Jones [11] and SVMs with Histograms of Oriented Gradients (HOG) [12]. To build the vehicle detection system, we prepared a training set of 1225 positive and 521 negative images, some of which are shown in Figures 3 and 4. A vehicle detector that combines HOG and Haar-like features with an Adaboost classifier was adopted, as outlined in Figure 5. Further details follow.

As elaborated in [6, 16], it is advantageous to combine Haar-like features with HOG in a cascade of boosted classifiers. A Haar-like feature is a scalar value representing the difference in average intensity between rectangular regions. Figure 6 shows the set of Haar filters used in our algorithm; each consists of two or three rectangles. To compute a filter's output on a given image region, the sum of the pixel values in the grey region is subtracted from the sum of the pixel values in the white region (with normalization by a coefficient for three-rectangle filters). The HOG descriptor uses a set of local histograms to describe objects; these histograms count occurrences of gradient orientations in local parts of the image. More specifically, the image is divided into small spatial regions (cells), and for each cell a local 1-D histogram of gradient directions is accumulated over the pixels in that cell. The HOG parameter settings of [16] were adopted, with 9 orientation bins and the cell and block sizes used there. The boosted classifier is trained following the paradigm proposed in [6].
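Haar-like filter responses such as those described above are usually computed in constant time from an integral image (summed-area table), as in the Viola-Jones detector [11]. The following Python/NumPy sketch is illustrative only: the function names, the horizontal orientation of the filters, and the factor of 2 used to balance the three-rectangle filter are our assumptions, not the exact filters of Figure 6.

```python
import numpy as np


def integral_image(img):
    """Summed-area table padded with a zero row/column: ii[y, x] == img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w x h rectangle whose top-left corner is (x, y)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def haar_two_rect(ii, x, y, w, h):
    """Horizontal two-rectangle filter: white (left half) minus grey (right half)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def haar_three_rect(ii, x, y, w, h):
    """Horizontal three-rectangle filter: white outer thirds minus grey middle third.
    The grey sum is weighted by 2 (assumed normalization) so a flat region gives 0."""
    third = w // 3
    white = rect_sum(ii, x, y, third, h) + rect_sum(ii, x + 2 * third, y, third, h)
    grey = rect_sum(ii, x + third, y, third, h)
    return white - 2 * grey
```

For example, on `np.arange(16).reshape(4, 4)` the two-rectangle response over the whole image is -16, since intensity increases from left to right. Once the table is built, every rectangle sum costs four lookups regardless of its size, which is what makes exhaustive Haar feature evaluation affordable.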

For vehicle detection in static images, a single detector often yields unsatisfactory results because of the complexity of the problem. The rationale of the cascade of boosted classifiers proposed in [16] is that generative and discriminative classifiers complement each other in enhancing detection performance. In our application, two important performance control parameters of [16] are fixed: the minimum acceptable correct detection rate and the maximum acceptable false alarm rate. To further improve vehicle detection, the images are first scaled down: the original high-resolution image is resized to three lower resolutions, and the Haar and HOG features of each scaled image are extracted and passed to the detector. The cascade Adaboost classifier generates hypotheses delineated by rectangular bounding boxes, which indicate potential vehicles. These hypotheses are scaled up by factors of 2, 4, and 8, respectively, and their bounding boxes are mapped back onto the full-resolution image, followed by a thresholding operation that discards hypotheses smaller than a size threshold. Two kinds of relationship can exist between the resulting bounding boxes. The first is containment, in which one box lies inside another; the second is intersection, in which boxes overlap each other. For containment, only the largest box is kept; for intersection, the positions of the boxes are averaged and rounded. The two situations and their handling are illustrated in Figures 7 and 8.
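The multiresolution fusion step can be sketched as follows. This is a hypothetical Python illustration rather than the authors' MATLAB code: boxes are assumed to be (x, y, w, h) tuples, detections are grouped by their downscaling factor (2, 4, or 8), and the minimum-area threshold is a placeholder because its value is not given here. Intersecting boxes are grouped transitively before averaging, which is one reasonable reading of "averaged and rounded".

```python
def upscale(boxes, factor):
    """Map (x, y, w, h) boxes found on an image downscaled by `factor` back to full resolution."""
    return [(x * factor, y * factor, w * factor, h * factor) for x, y, w, h in boxes]


def contains(a, b):
    """True if box b lies entirely inside box a."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax <= bx and ay <= by and bx + bw <= ax + aw and by + bh <= ay + ah


def intersects(a, b):
    """True if boxes a and b overlap with positive area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def fuse_detections(detections_by_factor, min_area):
    """Upscale, threshold by area, resolve containment, then average intersecting boxes."""
    boxes = [b for f, dets in detections_by_factor.items() for b in upscale(dets, f)]
    boxes = [b for b in boxes if b[2] * b[3] >= min_area]  # drop small hypotheses
    # Containment: drop any box lying inside another (keep the first of exact duplicates),
    # so only the largest box of a nested chain survives.
    kept = [b for i, b in enumerate(boxes)
            if not any(j != i and contains(o, b) and (o != b or j < i)
                       for j, o in enumerate(boxes))]
    # Intersection: group overlapping boxes transitively, then average and round each coordinate.
    fused, used = [], [False] * len(kept)
    for i in range(len(kept)):
        if used[i]:
            continue
        used[i] = True
        stack, group = [i], []
        while stack:
            k = stack.pop()
            group.append(kept[k])
            for j in range(len(kept)):
                if not used[j] and intersects(kept[k], kept[j]):
                    used[j] = True
                    stack.append(j)
        fused.append(tuple(round(sum(b[c] for b in group) / len(group)) for c in range(4)))
    return fused
```

For instance, a box found at factor 2 that encloses the upscaled box found at factor 4 absorbs it, while a tiny upscaled box from factor 8 is removed by the area threshold.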

The proposed improvement over Negri's vehicle detection method [16] described above performs satisfactorily, especially when multiple vehicles appear in one image. Figure 9 gives two examples of multiple-vehicle detection results. In contrast, both the original Viola-Jones method [11] and Negri's algorithm [16] failed at this task, as can be clearly observed in Figures 10 and 11.

4. RoI Segmentation and Logo Detection

4.1. Vehicle License Plate Positioning and RoI Segmentation

Given a detected vehicle, a region of interest (RoI) is extracted to focus on the small logo target. From a practical implementation point of view, prior knowledge about the layout of vehicle logos should be taken into account to enable fast logo detection. For almost all vehicles, the overall geometric arrangement of the logo, headlights, radiator, and license plate is similar, and the logo lies roughly on the vertical axis directly above the license plate. Accordingly, the license plate is localized first as the reference. Then an RoI, metaphorically termed the “vehicle face,” is defined and cropped from the lower part of the frontal vehicle image, as illustrated in Figure 12. The windshield and the two rear-view mirrors are excluded from the vehicle face. Further details are explained below.

License plate localization consists of two main steps. The first step is edge detection to extract the important structural information. The license plate region differs from other regions of the vehicle image mainly in its intensity changes, particularly along the vertical direction. Edge detection finds the local maxima of the gradient magnitude. Among the popular edge detection operators, the Sobel operator [23] was chosen for its trade-off between computational cost and performance. Figure 13(a) shows the result of filtering a preprocessed image with the Sobel operator; as can be seen, the boundary information is enhanced.
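A minimal NumPy sketch of the Sobel step follows, assuming a 2-D grayscale array. The border handling (edge pixels left at zero) and the magnitude threshold in `edge_map` are illustrative choices, not parameters from the paper; which of the two directional responses to emphasize for plates is likewise left open.

```python
import numpy as np


def sobel(img):
    """Sobel responses (gx, gy) of a 2-D grayscale array; border pixels are left at 0."""
    f = img.astype(np.float64)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    # gx: right column minus left column (weights 1, 2, 1), strong on vertical edges.
    gx[1:-1, 1:-1] = ((f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:])
                      - (f[:-2, :-2] + 2 * f[1:-1, :-2] + f[2:, :-2]))
    # gy: bottom row minus top row (weights 1, 2, 1), strong on horizontal edges.
    gy[1:-1, 1:-1] = ((f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:])
                      - (f[:-2, :-2] + 2 * f[:-2, 1:-1] + f[:-2, 2:]))
    return gx, gy


def edge_map(img, threshold):
    """Binary edge map: gradient magnitude at or above `threshold` (a tuning assumption)."""
    gx, gy = sobel(img)
    return np.hypot(gx, gy) >= threshold
```

On a vertical step edge of height 10, `gx` reaches 40 on the two columns adjacent to the step while `gy` stays zero, which is why the directional responses can be inspected separately before thresholding.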

To emphasize salient properties of the plate region, such as contrast and symmetry, while suppressing noise such as small holes, morphological processing [23] is applied in the second step. Among the basic morphological operations, erosion and closing were chosen. Erosion shrinks objects by eroding their boundaries; closing is a dilation, which expands objects, followed by an erosion. The effect of these operations is controlled by the choice of structuring element, which determines the shape and extent of the neighborhood over which pixels are dilated or eroded. A rectangular structuring element is used, with its size obtained by trial and error. After morphological processing, the license plate position is isolated, as shown in Figure 13(b).
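The erosion and closing operations with a rectangular structuring element can be sketched in NumPy as below. The structuring element size is a free parameter here (the paper's trial-and-error value is not reproduced), odd sizes are assumed so the element is centered, and the image border is treated as foreground during erosion so that closing does not shrink objects touching the edge. The order in which the operations are chained for plate isolation is also left open.

```python
import numpy as np


def _shifted_views(mask, se_h, se_w, pad_value):
    """Yield views of `mask` shifted over every offset of an se_h x se_w rectangle."""
    ph, pw = se_h // 2, se_w // 2
    padded = np.pad(mask, ((ph, se_h - 1 - ph), (pw, se_w - 1 - pw)),
                    mode="constant", constant_values=pad_value)
    rows, cols = mask.shape
    for dy in range(se_h):
        for dx in range(se_w):
            yield padded[dy:dy + rows, dx:dx + cols]


def erode(mask, se_h, se_w):
    """Binary erosion: a pixel survives only if the whole rectangle around it is foreground."""
    out = np.ones(mask.shape, dtype=bool)
    for view in _shifted_views(mask.astype(bool), se_h, se_w, True):
        out &= view
    return out


def dilate(mask, se_h, se_w):
    """Binary dilation: a pixel is set if any pixel in the rectangle around it is foreground."""
    out = np.zeros(mask.shape, dtype=bool)
    for view in _shifted_views(mask.astype(bool), se_h, se_w, False):
        out |= view
    return out


def close(mask, se_h, se_w):
    """Closing = dilation followed by erosion; fills small holes and gaps."""
    return erode(dilate(mask, se_h, se_w), se_h, se_w)
```

A one-pixel hole in a solid region disappears after closing with a 3 x 3 rectangle, whereas erosion alone enlarges it to the full 3 x 3 neighborhood.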

To narrow the presegmented vehicle face down to the Region of Interest (RoI) that contains the logo, prior knowledge is exploited with the license plate as the reference. In most cases, vehicle logos are located above the license plate against a background with characteristic texture. Suppose the license plate is located with estimated width w and height h; the rough location of the logo region can then be determined by a mask whose width and height are set relative to w and h. Figure 14 further illustrates the logo RoI segmentation.
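One way to express the plate-relative mask is the hypothetical helper below. The ratios `width_ratio` and `height_ratio` are placeholders standing in for the paper's mask proportions, which are not reproduced here; the RoI is assumed to be centered on the plate's vertical axis and to end at the plate's top edge, in image coordinates where y grows downward.

```python
def logo_roi(plate, width_ratio=1.0, height_ratio=2.0, image_shape=None):
    """Hypothetical logo RoI (x, y, w, h) directly above a plate box (x, y, w, h).

    width_ratio and height_ratio are placeholder proportions, not values from the paper.
    If image_shape (rows, cols) is given, the RoI is clipped to the image.
    """
    x, y, w, h = plate
    roi_w = round(width_ratio * w)
    roi_h = round(height_ratio * h)
    rx = round(x + w / 2.0 - roi_w / 2.0)  # centered on the plate's vertical axis
    ry = y - roi_h                         # RoI ends where the plate begins
    if image_shape is None:
        return (rx, ry, roi_w, roi_h)
    rows, cols = image_shape[:2]
    x0, y0 = max(rx, 0), max(ry, 0)
    x1, y1 = min(rx + roi_w, cols), min(ry + roi_h, rows)
    return (x0, y0, max(x1 - x0, 0), max(y1 - y0, 0))
```

For a plate at (100, 200) of size 80 x 20, the default placeholders give the RoI (100, 160, 80, 40).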

4.2. Two-Stage Classifier for Accurate Logo Detection

After the logo RoI is segmented from the lower frontal part of the vehicle, accurate logo detection is accomplished by logo feature extraction and a two-stage classifier composed mainly of Gentle Adaboost and SVM. The flowchart and the procedure of the approach are illustrated in Figures 15 and 16, respectively.

Building a reliable and accurate logo detection system requires sufficient representative training data, which consists of two types of images: negative and positive. Negative samples are non-logo patches randomly cropped from vehicle images away from the logo; positive samples are logo images. A total of 761 positive and 1177 negative images were collected, and all samples were resized to a common size. Some examples of the training images are shown in Figure 17.

As in most visual object detection procedures, the proposed logo detection approach classifies images based on simple features. We use HOG, which characterizes the distribution of edge directions [12], to describe logo features. The choice is motivated by the proven effectiveness of HOG in various visual object detection applications [6, 12, 16]. HOG is typically implemented by dividing the image into small connected regions and compiling, for each region, a histogram of the edge orientations of its pixels; the combination of these histograms captures local shape properties. HOG has important advantages over other descriptors; for example, it is largely invariant to small deformations and robust to outliers and noise. Figure 18 shows an example of HOG feature calculation.
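The sketch below computes HOG cell histograms with 9 unsigned orientation bins, matching the bin count stated in Section 3, and concatenates L2-normalized blocks of 2 x 2 cells. The 8-pixel cell, the central-difference gradients, and the hard (non-interpolated) binning are simplifying assumptions; the full descriptor of [12] additionally uses interpolated votes and Gaussian block weighting, among other refinements.

```python
import numpy as np


def hog_cells(img, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation (0-180 deg), weighted by magnitude."""
    f = img.astype(np.float64)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:, 1:-1] = f[:, 2:] - f[:, :-2]  # central differences, borders left at 0
    gy[1:-1, :] = f[2:, :] - f[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)  # hard binning
    ny, nx = f.shape[0] // cell, f.shape[1] // cell
    hist = np.zeros((ny, nx, bins))
    for cy in range(ny):
        for cx in range(nx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            hist[cy, cx] = np.bincount(idx[rows, cols].ravel(),
                                       weights=mag[rows, cols].ravel(), minlength=bins)
    return hist


def hog_descriptor(img, cell=8, bins=9, block=2, eps=1e-6):
    """Concatenate L2-normalized block x block groups of cell histograms (stride of one cell)."""
    h = hog_cells(img, cell, bins)
    feats = []
    for by in range(h.shape[0] - block + 1):
        for bx in range(h.shape[1] - block + 1):
            v = h[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(feats)
```

On a 16 x 16 patch containing a single vertical step edge, all the gradient energy falls into the 0-degree bin, and the one 2 x 2 block yields a 36-dimensional unit-length vector.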

With HOG features as the image descriptor, the typical sliding-window object detection methodology is adopted to identify and localize the logo in the segmented RoI. Such a window-based detector uses the HOG descriptor to decide whether a logo is present in each window, followed by appropriate postprocessing to merge nearby detections. Among the many possible off-the-shelf classifiers, Adaboost algorithms [11, 24] are preferred, as they are often among the fastest and most accurate approaches for object detection.

The Adaboost algorithm stems from boosting, which learns a strong classifier from a set of weak classifiers that are only slightly better than chance by iteratively reweighting the training samples. There are several variants of Adaboost, including Discrete Adaboost, Real Adaboost, and Gentle Adaboost [25, 26]. The output of Discrete Adaboost is binary, taking values in {-1, +1}, whereas Real Adaboost produces a real-valued, probability-based output. While Adaboost defines a particular method of reweighting data points, Gentle Adaboost modifies it to put less weight on outliers. Gentle Adaboost [26] generally performs better on noisy data and is more robust and stable than Real Adaboost, which is why it is chosen for our detection system. In our implementation, each weak classifier uses one feature from the feature pool combined with a simple binary thresholding decision.
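To make the weak-learner description concrete, here is a minimal sketch of Gentle Adaboost in its standard formulation, with each weak learner a regression stump that thresholds a single feature. It rests on our own assumptions (exhaustive threshold search, a fixed number of rounds, no cascade layers) and is not the GML toolbox implementation used in the experiments. Each stump is fitted by weighted least squares, so its two outputs are weighted means of the labels and lie in [-1, 1]; this bounded update is one reason Gentle Adaboost is less sensitive to outliers than Real Adaboost.

```python
import numpy as np


def fit_stump(X, y, w):
    """Weighted least-squares regression stump on one feature: f(x) = a if x[j] > t else b."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            right = X[:, j] > t
            wr, wl = w[right].sum(), w[~right].sum()
            # Weighted label means on each side minimize the weighted squared error.
            a = (w[right] * y[right]).sum() / wr if wr > 0 else 0.0
            b = (w[~right] * y[~right]).sum() / wl if wl > 0 else 0.0
            err = (w * (y - np.where(right, a, b)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, a, b)
    return best[1:]


def gentle_adaboost(X, y, rounds=10):
    """Train with labels y in {-1, +1}; returns a list of stumps (j, t, a, b)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.full(len(y), 1.0 / len(y))
    stumps = []
    for _ in range(rounds):
        j, t, a, b = fit_stump(X, y, w)
        f = np.where(X[:, j] > t, a, b)  # bounded in [-1, 1]
        w = w * np.exp(-y * f)           # down-weight correctly classified samples
        w /= w.sum()
        stumps.append((j, t, a, b))
    return stumps


def predict(stumps, X):
    """Sign of the additive model F(x), the sum of all stump outputs."""
    X = np.asarray(X, dtype=float)
    F = np.zeros(X.shape[0])
    for j, t, a, b in stumps:
        F += np.where(X[:, j] > t, a, b)
    return np.where(F >= 0, 1, -1)
```

In a detector, each row of `X` would be a HOG descriptor of one window, and the stump search would pick the single most discriminative descriptor component at each round.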

A Gentle Adaboost classifier consists of many layers. The first layers can usually reach their preset targets with few weak classifiers. However, as the number of layers increases, the remaining training samples become fewer and more similar, and the linear combination of more weak classifiers in later layers is not only time-consuming but also prone to overfitting. As a result, some logos cannot be detected correctly, as discussed further in the next section. To improve performance, a two-stage classification scheme combining Gentle Adaboost and SVM is proposed.

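SVM was developed from the theory of Structural Risk Minimization [27]. It maps the input feature space into a higher-dimensional feature space in which the margin between classes is maximized. SVMs are widely used, for example, as classifiers for remote sensing images. In a nonlinear binary classification problem, the decision function is

$$f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\Big),$$

where $\mathbf{x}$ is the input to be classified into $y \in \{-1, +1\}$, the $\mathbf{x}_i$ are the $N$ support vectors with labels $y_i$, the $\alpha_i$ are positive coefficients, and $b$ is the bias. The support vectors are a subset of the training vectors selected during the optimization process, and $K$ is a user-chosen kernel function. For the radial basis function (RBF) kernel,

$$K(\mathbf{x}_i, \mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}_i \rVert^2}{2\sigma^2}\right).$$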
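The decision function above can be evaluated directly, as in the sketch below. The support vectors, coefficients, bias, and σ in the usage lines are toy values chosen for illustration; in practice they are produced by training (the experiments used an SVM toolbox [28]).

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def svm_decision(x, support_vectors, labels, alphas, bias, sigma=1.0):
    """f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b), returned as -1 or +1."""
    score = sum(alpha * y_i * rbf_kernel(sv, x, sigma)
                for sv, y_i, alpha in zip(support_vectors, labels, alphas))
    return 1 if score + bias >= 0 else -1


# Toy model: one negative support vector at the origin, one positive at (2, 2).
sv = np.array([[0.0, 0.0], [2.0, 2.0]])
print(svm_decision([0.1, 0.0], sv, [-1, 1], [1.0, 1.0], 0.0))  # nearer the negative vector
print(svm_decision([2.0, 1.9], sv, [-1, 1], [1.0, 1.0], 0.0))  # nearer the positive vector
```

With the RBF kernel, each support vector's influence decays with distance, so a test point is pulled towards the label of the nearby support vectors; σ controls how quickly that influence fades.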

Gentle Adaboost and SVM were trained independently on the same HOG features. In the testing stage, Gentle Adaboost is applied first, and its result is returned if it detects a logo; otherwise, the SVM is invoked. By exploiting the complementary strengths of Adaboost and SVM, detection performance is substantially enhanced.
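The serial test-time logic can be sketched as follows. The two detector arguments are hypothetical stand-ins for trained Gentle Adaboost and SVM sliding-window detectors, each assumed to return a (possibly empty) list of (x, y, w, h) boxes; the essential property is that the SVM stage runs only when the first stage returns nothing.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]           # (x, y, w, h)
Detector = Callable[[object], List[Box]]  # hypothetical detector interface


def cascade_detect(roi, adaboost_detect: Detector,
                   svm_detect: Detector) -> Tuple[List[Box], str]:
    """Run Gentle Adaboost first; fall back to the SVM only if it finds no logo."""
    boxes = adaboost_detect(roi)
    if boxes:
        return boxes, "adaboost"
    return svm_detect(roi), "svm"
```

Because the second stage is skipped whenever the first succeeds, the average cost of the cascade stays below the sum of the two detectors' individual costs.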

5. Experiments

The software was implemented in MATLAB 7 with the Image Processing Toolbox. The test database consists of 274 vehicle images, all of which were first converted to grayscale.

First, the improved vehicle detection algorithm with HOG features described in Section 3 was tested, giving reliable and robust results, as illustrated in Figure 19. Within each detected vehicle image, the RoI is segmented based on the license plate position. The GML Adaboost toolbox [26] and an SVM toolbox [28] were used to implement the two-stage cascaded logo detector. The detection window starts at a fixed initial size; after each pass over the whole image, the window is enlarged by a fixed scale factor, and detection terminates once the window reaches a preset maximum size.
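The multi-scale scan can be sketched as a generator of square windows. The initial size, maximum size, and scale factor are hypothetical parameters because their values are not given here; the default step of 8 pixels follows the shift chosen later in this section. Each window would then be resized to the classifier's input size, described with HOG, and passed to the cascade.

```python
def sliding_windows(roi_w, roi_h, init_size, max_size, scale, step=8):
    """Yield square windows (x, y, size) covering a roi_w x roi_h RoI at growing scales.

    init_size, max_size, and scale are hypothetical; step=8 follows the chosen shift.
    """
    if scale <= 1.0:
        raise ValueError("scale must be > 1 so the window grows and the scan terminates")
    size = init_size
    while size <= min(max_size, roi_w, roi_h):
        for y in range(0, roi_h - size + 1, step):
            for x in range(0, roi_w - size + 1, step):
                yield x, y, size
        # Enlarge the window by `scale`, always growing by at least one pixel.
        size = max(size + 1, int(round(size * scale)))
```

For a 40 x 24 RoI with an initial size of 16, a maximum of 32, and a scale of 1.5, the scan visits eight 16-pixel windows and then three 24-pixel windows before the next size (36) exceeds the limit.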

To validate the cascade detector, the performance of three classifiers, namely Gentle Adaboost, SVM, and the cascade classifier, was compared on the 274 vehicle images. Several situations arise. The first, shown in Figure 19, covers cases where Gentle Adaboost alone fails to detect the logo but SVM succeeds, which illustrates the benefit of using SVM as the second stage of the proposed cascade. However, a single SVM is unsatisfactory on its own: as illustrated in Figure 20, an SVM alone may fail to find the correct logo position where Gentle Adaboost succeeds. The cascade therefore combines SVM and Gentle Adaboost so that each compensates for the other's limitations, resulting in more accurate logo detection. Over all 274 test images, the two-stage cascade classifier achieves a higher detection rate than either Gentle Adaboost alone or SVM alone. As expected, the cascade classifier requires more processing time, but the extra cost is modest: on average, Gentle Adaboost and a single SVM detector take 1.13 s and 1.21 s to detect a logo, respectively, whereas the two-stage cascade takes 1.42 s. This is acceptable for offline applications.

Several issues are important in the experiments with the proposed cascaded logo detector. The first is the initial size of the sliding window over the segmented RoI, which influences the detection rate. Because a theoretical analysis appears difficult, if not impossible, a number of trials were run instead. The detection rates obtained for a range of initial window sizes are reported in Table 1, which shows that a moderate size performs best.

As discussed above, logo detection scans the RoI with a sliding window of a default size. To handle the scale variation common to object detection, we adapt the sliding window size: after each scan with the current window size, the window is resized by a ratio r and the scan restarts. Trial and error showed that detection performance does not depend linearly on r. The experimental relationship between the detection rate and r is given in Figure 21, and the ratio used in our system was chosen based on this set of experiments.

Another factor that affects the detection rate is the number of pixels by which the sliding window shifts. As with the window size experiments above, different shift steps were compared; the corresponding detection rates are shown in Figure 22. As expected, a smaller step yields a higher detection rate at the cost of longer testing time. As a trade-off between testing time and detection rate, a step of 8 pixels was chosen.

The 274 test vehicle images carry logos from 11 different manufacturers, such as Toyota, Volkswagen, and Chevrolet. The logos in 260 of the 274 images were detected correctly, a correct detection rate of 260/274 ≈ 94.9%. The distribution of detection results over the eleven logo types is given in Table 2.

Examining the unsuccessful cases reveals several adverse factors. The first is lighting variation, which may adversely affect the area around a logo; as exemplified in Figure 23, lighting changes can make a logo hard to distinguish from nearby pixels. Another situation that often causes the proposed detection scheme to fail is the unusual logo shape of certain manufacturers. A typical example is the Audi logo, which occupies an elongated rectangular region, whereas the sliding window is square, as designed for most logos. This discrepancy causes the Audi logo to be only partially detected, as shown in Figure 23. Overcoming this limitation remains on the authors' research agenda, and progress will be reported elsewhere.

6. Conclusion

With increasing security awareness and the widespread use of surveillance cameras, there is an urgent need for vehicle identification and classification technologies that automatically identify vehicle manufacturers from recorded images. Automatic logo detection is therefore significant, as it enables effective identification of a vehicle's brand. In this paper, we propose a new approach to logo localization and detection in outdoor surveillance images, which follows a coarse-to-fine strategy inspired by the attentional mechanism of human vision. At the coarse scale, an improved Adaboost algorithm with Haar-like and HOG features performs vehicle detection. Then, a small RoI is segmented by integrating prior knowledge about the location of vehicle logos and their intrinsic local symmetry. Finally, logo candidate regions are classified by a two-stage cascaded classification strategy consisting of Gentle Adaboost and SVM, achieving precise logo localization and robust detection. The efficiency of the approach was verified using real surveillance images.

Acknowledgment

The project is supported by Suzhou Municipal Science And Technology Foundation Grants SS201109 and SYG201140.