Abstract

Logo recognition is an important problem in document imaging, advertisement, and intelligent transportation. Logos are studied with many approaches in these fields, and logo recognition is an essential subprocess in all of them. Within logo recognition, the descriptor is one of the most important components. Moments are powerful descriptors, but their performance in logo recognition has not been evaluated before, so it is unclear which moments are most appropriate for which kinds of logos. In this paper, we investigate the relationship between logo transformations and moments, that is, which moments suit logos subjected to different transformations. The open logo dataset from the University of Maryland is used, and the moments are compared on logos with noise, rotation, scaling, and combined rotation and scaling.

1. Introduction

Logo recognition plays an important role in pattern recognition. Logos are widely used in intelligent transportation, advertisement, and document retrieval, so logo recognition has attracted the attention of many researchers.

From the perspective of applications, research on logos can be classified into three categories: intelligent transportation, advertisement, and document retrieval. In intelligent transportation, logos can be used to identify vehicle types or to guide autonomous driving. By analyzing advertisements, producers can learn what consumers are interested in and promote products to them. In document retrieval, logos can be used to classify documents automatically, saving time and cost. In all of these applications, logo recognition is a core subprocess that cannot be ignored. This paper therefore focuses on logo recognition; logo localization and detection are not discussed. The logos considered here are those used for document retrieval. Although many techniques have been used to recognize logos, moments [1], which are powerful descriptors, have not been compared in terms of their ability to recognize logos.

The objective of this paper is to compare the representation abilities of moments used as descriptors to extract logo features. We find that different moments are suited to different kinds of logos, and these findings may help future researchers choose a descriptor. The remainder of this paper is organized as follows. Related work in intelligent transportation, advertisement, and document retrieval is reviewed in Section 2. The moments and their invariants are discussed in Section 3. The experimental setup is introduced in Section 4. The results of recognizing logos with moments are analyzed in Section 5. Finally, Section 6 concludes the paper and outlines plans for future research.

2. Related Work

Logo recognition is reviewed here from the perspectives of intelligent transportation, advertisement, and document retrieval. First, in intelligent transportation, Dai et al. [2] used Tchebichef moment invariants and a support vector machine to recognize vehicle logos; however, the computational load is heavy. Psyllos et al. [3] presented a vehicle-logo recognition algorithm based on an enhanced scale-invariant feature transform (SIFT) based feature-matching scheme, which can be used in a real-time framework. Llorca et al. [4] proposed a vehicle-logo recognition approach using histograms of oriented gradients and support vector machines on low-resolution logos captured by traffic cameras; however, the sliding-window technique is time-consuming. Yu et al. [5] proposed a fast and reliable vehicle-logo recognition system based on bag-of-words, in which vehicle-logo images were represented as histograms of visual words and classified by a support vector machine, reducing processing time. Mao et al. [6] proposed a method to detect vehicle logos in an image. It first applied horizontal and vertical direction filters to produce two new images and computed a saliency map from each. A binary image was then created from the saliency map, and finally the vehicle logos were localized. This method runs in real time.

Second, in advertisement applications, den Hollander and Hanjalic [7] used template matching to detect and classify logos in video stills; however, this method is very sensitive to the logo foreground. Hesson and Androutsos [8] used a five-dimensional co-occurrence of colors and wavelet decomposition coefficients of pixel pairs, which allows multiresolution analysis with wavelets. Phan and Androutsos [9] extended the color edge co-occurrence histogram (CECH) object detection scheme to compound color objects. However, this global descriptor cannot handle incomplete information or transformed versions of the original logos, and it cannot precisely describe the locality of logo traits. Qi et al. [10] proposed an effective trademark image retrieval solution that combines shape description and feature matching and can be generalized to other applications. Sun and Chen [11] proposed a recognition method for logo images captured by mobile phone cameras; using Zernike moment phase information, they presented a distinctive logo feature vector and an associated similarity measure. Zhang et al. [12] introduced a prelocating algorithm for rapid logo detection in unconstrained color images, in which the spatial connected component descriptor (SCCD) represents the logos and an effective connected component describes the pixel distribution information. Roy and Garain [13] presented a probabilistic approach for logo detection and localization in natural scene images: one probability distribution described the features, and another captured the shape geometry defined by the key points. However, this algorithm cannot handle logos mapped onto two different affine planes. Chu and Lin [14] used a pair-specific concept to capture relevant features between a query logo and a test image, used the mean-shift method to find candidate regions, and combined visual word histograms and visual patterns to describe logo objects; new features could be designed to further improve its performance on some logos. Sahbi et al. [15] designed a variational framework able to match and recognize multiple instances of multiple reference logos in image archives. It used an energy function that measures the quality of feature matching, captures feature co-occurrence and geometry, and controls the smoothness of the matching solution. Liu and Zhang [16] proposed multiple dictionary invariant sparse coding to automatically collect representative logo images from the internet without any human labeling or seed images; this method could be extended to more general images by adopting SIFT-like local features or HMAX-like features.

Finally, in the document image field, Doermann et al. [17] presented a multilevel staged approach to logo recognition, in which global invariants were used to prune the database and local affine invariants were used to obtain a more refined match. However, this algorithm was only evaluated on a small database. Doermann et al. [18] presented a novel application of algebraic and differential invariants to logo recognition. With invariants, the shape descriptors are unique and independent of the viewpoint, so they can be used to match logos. Algebraic invariants can be used when the whole shape of a logo is available, while differential invariants can be used when only part of the logo is available. Cesarini et al. [19] proposed a training approach for autoassociator-based artificial neural networks (AANN) designed specifically to deal with spot noise. The proposed algorithm, called spot-backpropagation (S-BP), was significantly more robust to spot noise than classical Euclidean norm-based backpropagation (BP). Chen et al. [20] proposed a method based on a modified line segment Hausdorff distance, which incorporates structural and spatial information to compute dissimilarity; as a result, it is faster than methods that use sets of points. Gori et al. [21] proposed an approach that improves the performance of multilayer perceptrons operating as autoassociators when classifying graphical items corrupted by spot noise. The Euclidean norm was replaced by a weighted norm whose weights depend on the image gradients, so that uniform color regions receive less importance. This method was time-consuming, and the authors considered only spot noise. Wang and Chen [22] proposed a method based on the boundary extension of feature rectangles. Wang [23] presented a simple and dynamic method to detect and recognize logos in document images; applying feedback and selecting proper features made the framework dynamic and interactive, and the feedback mechanisms made it more accurate. Li et al. [24] introduced a fast, segmentation-free, and layout-independent logo detection and recognition method that improved both recognition performance and running time. Hassanzadeh and Pourghassem [25] proposed an approach based on spatial and structural features of logo images, combining features derived from the horizontal and vertical histograms of the logo images with a KNN classifier. Pham et al. [26] presented an approach for detecting logos using contour-based features. First, outer contour strings (OCS) were computed from each graphic and text part of the document; second, two types of features were computed from each OCS; finally, a correction step handled wrongly segmented cases. Bagheri et al. [27] treated logo recognition as a multiclass classification problem and proposed a multiclass classification method that mainly combines the nearest neighbor classification algorithm with powerful binary classifiers.

3. Moments

Moments [28] are widely used as descriptors in pattern recognition and computer vision. In this paper, radial Tchebichef moment invariants, radial Tchebichef moments, Tchebichef moments, Krawtchouk moments, Krawtchouk moment invariants, Legendre moments [29], pseudo-Zernike moments [29], and Zernike moments [29] are used because these orthogonal moments have promising properties. Affine moment invariants [30] are also used to recognize logos. Different moments have different properties, so we aim to find the moments best suited to logos with noise, scaling, rotation, and combined rotation and scaling. The following part introduces the moments and their moment functions.

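Zernike and pseudo-Zernike moments are defined on the unit disk. For an image intensity function $f(x,y)$, the Zernike moment of order $n$ with repetition $m$ is defined in (1), and the pseudo-Zernike moment is defined in (3):

$$Z_{nm} = \frac{n+1}{\pi} \iint_{x^{2}+y^{2} \le 1} V_{nm}^{*}(x,y)\, f(x,y)\, dx\, dy, \tag{1}$$

where $V_{nm}^{*}$ is the complex conjugate of the Zernike function $V_{nm}$, defined as

$$V_{nm}(r,\theta) = R_{nm}(r)\, e^{jm\theta}, \qquad R_{nm}(r) = \sum_{s=0}^{(n-|m|)/2} (-1)^{s} \frac{(n-s)!}{s!\,\left(\frac{n+|m|}{2}-s\right)!\,\left(\frac{n-|m|}{2}-s\right)!}\, r^{n-2s}, \tag{2}$$

where $n \ge 0$; $|m| \le n$; $n-|m|$ is even; $r = \sqrt{x^{2}+y^{2}}$; $\theta = \tan^{-1}(y/x)$; and $j = \sqrt{-1}$.

$$PZ_{nm} = \frac{n+1}{\pi} \iint_{x^{2}+y^{2} \le 1} W_{nm}^{*}(x,y)\, f(x,y)\, dx\, dy, \tag{3}$$

where $|m| \le n$; $^{*}$ denotes complex conjugate; and the pseudo-Zernike function is $W_{nm}(r,\theta) = S_{nm}(r)\, e^{jm\theta}$, whose radial polynomial is defined in (4) as

$$S_{nm}(r) = \sum_{s=0}^{n-|m|} (-1)^{s} \frac{(2n+1-s)!}{s!\,(n+|m|+1-s)!\,(n-|m|-s)!}\, r^{n-s}. \tag{4}$$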
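To make the Zernike and pseudo-Zernike definitions above concrete, the following is a minimal sketch of their discrete approximation for a square image. It assumes that pixel centres are mapped linearly into $[-1,1]\times[-1,1]$ and that pixels outside the unit disk are discarded; this mapping, and the helper names `zernike_radial`, `pseudo_zernike_radial`, and `disk_moment`, are illustrative assumptions rather than the implementation used in the experiments.

```python
import math

import numpy as np


def zernike_radial(n: int, m: int, r: np.ndarray) -> np.ndarray:
    """Zernike radial polynomial R_nm(r); needs |m| <= n and n - |m| even."""
    m = abs(m)
    if m > n or (n - m) % 2:
        raise ValueError("Zernike needs |m| <= n and n - |m| even")
    out = np.zeros(np.shape(r))
    for s in range((n - m) // 2 + 1):
        coeff = (-1) ** s * math.factorial(n - s) / (
            math.factorial(s)
            * math.factorial((n + m) // 2 - s)
            * math.factorial((n - m) // 2 - s)
        )
        out += coeff * np.power(r, n - 2 * s)
    return out


def pseudo_zernike_radial(n: int, m: int, r: np.ndarray) -> np.ndarray:
    """Pseudo-Zernike radial polynomial S_nm(r); needs |m| <= n."""
    m = abs(m)
    if m > n:
        raise ValueError("pseudo-Zernike needs |m| <= n")
    out = np.zeros(np.shape(r))
    for s in range(n - m + 1):
        coeff = (-1) ** s * math.factorial(2 * n + 1 - s) / (
            math.factorial(s)
            * math.factorial(n + m + 1 - s)
            * math.factorial(n - m - s)
        )
        out += coeff * np.power(r, n - s)
    return out


def disk_moment(img: np.ndarray, n: int, m: int, pseudo: bool = False) -> complex:
    """Discrete approximation of Z_nm (or PZ_nm when pseudo=True).

    Assumed mapping: pixel centres of an N x N image go to [-1, 1]^2,
    first array index = x, second = y; pixels with r > 1 are ignored.
    """
    N = img.shape[0]
    c = (2 * np.arange(N) - N + 1) / N            # pixel-centre coordinates
    x, y = np.meshgrid(c, c, indexing="ij")
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = r <= 1.0
    radial = pseudo_zernike_radial if pseudo else zernike_radial
    v_conj = radial(n, m, r) * np.exp(-1j * m * theta)  # conjugate of R e^{jm theta}
    pixel_area = (2.0 / N) ** 2
    return (n + 1) / np.pi * np.sum(v_conj[inside] * img[inside]) * pixel_area
```

Because rotation only changes the phase factor $e^{jm\theta}$, the magnitude $|Z_{nm}|$ is the quantity usually taken as a rotation-invariant feature; for example, `abs(disk_moment(img, 4, 2))` and `abs(disk_moment(np.rot90(img), 4, 2))` agree up to floating-point error under this symmetric pixel mapping.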

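Legendre moments, like Zernike and pseudo-Zernike moments, are continuous orthogonal moments. However, they are defined on the unit square $[-1,1]\times[-1,1]$, and the Legendre moment of order $(p+q)$ is defined in (5) as

$$L_{pq} = \frac{(2p+1)(2q+1)}{4} \int_{-1}^{1}\int_{-1}^{1} P_{p}(x)\, P_{q}(y)\, f(x,y)\, dx\, dy, \tag{5}$$

where $P_{p}$ is the Legendre polynomial of order $p$:

$$P_{p}(x) = \frac{1}{2^{p}\, p!} \frac{d^{p}}{dx^{p}} \left(x^{2}-1\right)^{p}, \qquad x \in [-1,1]. \tag{6}$$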
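The Legendre moment integral can be approximated by a midpoint sum over pixels. The sketch below assumes the same pixel-centre mapping into $[-1,1]$ as is common for continuous moments (first array index as $x$) and evaluates $P_p$ with NumPy's `numpy.polynomial.legendre.Legendre` class; the function name `legendre_moments` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_moments(img: np.ndarray, max_order: int) -> np.ndarray:
    """Approximate L_pq for 0 <= p, q <= max_order on an N x N image.

    Assumption: pixel centres are mapped to [-1, 1] and the double
    integral is replaced by a midpoint sum with pixel width 2/N.
    """
    N = img.shape[0]
    c = (2 * np.arange(N) - N + 1) / N          # pixel centres in [-1, 1]
    dx = 2.0 / N
    # P[p, i] = P_p(c[i]), evaluated with the Legendre basis polynomials.
    P = np.stack([legendre.Legendre.basis(p)(c) for p in range(max_order + 1)])
    k = 2 * np.arange(max_order + 1) + 1
    norm = np.outer(k, k) / 4.0                 # (2p+1)(2q+1)/4
    # (P @ img @ P.T)[p, q] = sum_i sum_j P_p(x_i) P_q(y_j) f(x_i, y_j)
    return norm * (P @ img @ P.T) * dx * dx
```

For a constant image $f \equiv 1$, this gives $L_{00} = 1$, matching the continuous integral, while moments with odd $p$ vanish by symmetry.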

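Tchebichef [31] and Krawtchouk [32] moments are discrete orthogonal moments. They have minimal information redundancy and better image reconstruction ability than continuous orthogonal moments. Their main characteristic is that, unlike continuous orthogonal moments, they are defined directly in the discrete image domain. As a result, they do not require pixel coordinates to be mapped into a continuous domain such as the unit disk or unit square, and they avoid the discretization error that this mapping introduces. For an $N \times N$ image, the Tchebichef moment of order $(p+q)$ is defined in (7) as

$$T_{pq} = \frac{1}{\rho(p,N)\,\rho(q,N)} \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} t_{p}(x)\, t_{q}(y)\, f(x,y), \tag{7}$$

where $p, q = 0, 1, \ldots, N-1$; $N$ is the width or height of the image; $f(x,y)$ denotes the image intensity function; and $t_{p}$ is the discrete Tchebichef polynomial

$$t_{p}(x) = p! \sum_{k=0}^{p} (-1)^{p-k} \binom{N-1-k}{p-k} \binom{p+k}{p} \binom{x}{k}, \tag{8}$$

whose squared norm is

$$\rho(p,N) = \sum_{x=0}^{N-1} t_{p}(x)^{2} = (2p)! \binom{N+p}{2p+1}. \tag{9}$$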
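The Tchebichef definitions above can be checked with a short sketch that evaluates $t_p(x)$ exactly with integer binomials, computes $T_{pq}$ with matrix products, and reconstructs the image from the full set of moments. Direct evaluation of the explicit sum is only numerically comfortable for small $N$; for real images, recurrence relations are normally preferred. The function names are illustrative, and the first array index is assumed to be $x$.

```python
import math

import numpy as np


def tchebichef_poly(p: int, N: int) -> np.ndarray:
    """t_p(x) for x = 0..N-1, summed exactly with Python integers."""
    vals = []
    for x in range(N):
        s = 0
        for k in range(p + 1):
            s += ((-1) ** (p - k) * math.comb(N - 1 - k, p - k)
                  * math.comb(p + k, p) * math.comb(x, k))
        vals.append(math.factorial(p) * s)
    return np.array(vals, dtype=float)


def tchebichef_norm(p: int, N: int) -> float:
    """rho(p, N) = (2p)! * C(N + p, 2p + 1)."""
    return float(math.factorial(2 * p) * math.comb(N + p, 2 * p + 1))


def tchebichef_moments(img: np.ndarray, max_order: int) -> np.ndarray:
    """T_pq for 0 <= p, q <= max_order (rows of T are t_p sampled at x)."""
    N = img.shape[0]
    T = np.stack([tchebichef_poly(p, N) for p in range(max_order + 1)])
    rho = np.array([tchebichef_norm(p, N) for p in range(max_order + 1)])
    return (T @ img @ T.T) / np.outer(rho, rho)
```

With `max_order = N - 1`, the moments determine the image exactly: $f(x,y) = \sum_{p}\sum_{q} T_{pq}\, t_p(x)\, t_q(y)$, which in matrix form is `T.T @ moments @ T`.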

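Unlike the Tchebichef moment, the Krawtchouk moment can extract local information by tuning its two parameters $p_{1}$ and $p_{2}$. For an $N \times N$ image with intensity function $f(x,y)$, the Krawtchouk moment of order $(n+m)$ is defined as

$$Q_{nm} = \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} \bar{K}_{n}(x; p_{1}, N-1)\, \bar{K}_{m}(y; p_{2}, N-1)\, f(x,y), \tag{10}$$

where $0 < p_{1}, p_{2} < 1$ and the weighted Krawtchouk polynomials are

$$\bar{K}_{n}(x;p,M) = K_{n}(x;p,M) \sqrt{\frac{w(x;p,M)}{\rho(n;p,M)}}, \tag{11}$$

$$K_{n}(x;p,M) = {}_{2}F_{1}\!\left(-n,-x;-M;\tfrac{1}{p}\right) = \sum_{k=0}^{n} \frac{(-n)_{k}\,(-x)_{k}}{(-M)_{k}\,k!}\, p^{-k}, \tag{12}$$

$$w(x;p,M) = \binom{M}{x} p^{x} (1-p)^{M-x}, \qquad \rho(n;p,M) = (-1)^{n} \left(\frac{1-p}{p}\right)^{n} \frac{n!}{(-M)_{n}}, \tag{13}$$

and $(a)_{k}$ is the Pochhammer symbol, defined as

$$(a)_{k} = a(a+1)\cdots(a+k-1), \qquad (a)_{0} = 1. \tag{14}$$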
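A minimal sketch of the Krawtchouk definitions, evaluating the ${}_2F_1$ series term by term and weighting by the binomial distribution, is given below. It is intended for small images only (direct series evaluation becomes unstable as $N$ grows, where recurrences are normally used), and the names `pochhammer`, `weighted_krawtchouk`, `krawtchouk_matrix`, and `krawtchouk_moments` are hypothetical.

```python
import math

import numpy as np


def pochhammer(a: float, k: int) -> float:
    """Rising factorial (a)_k = a (a+1) ... (a+k-1), with (a)_0 = 1."""
    out = 1.0
    for i in range(k):
        out *= a + i
    return out


def weighted_krawtchouk(n: int, p: float, M: int) -> np.ndarray:
    """bar K_n(x; p, M) for x = 0..M, via the 2F1 series and binomial weight."""
    xs = range(M + 1)
    K = np.array([
        sum(pochhammer(-n, k) * pochhammer(-x, k)
            / (pochhammer(-M, k) * math.factorial(k)) * p ** (-k)
            for k in range(n + 1))
        for x in xs
    ])
    w = np.array([math.comb(M, x) * p ** x * (1 - p) ** (M - x) for x in xs])
    rho = (-1) ** n * ((1 - p) / p) ** n * math.factorial(n) / pochhammer(-M, n)
    return K * np.sqrt(w / rho)


def krawtchouk_matrix(p: float, N: int) -> np.ndarray:
    """Rows n = 0..N-1 of bar K_n(x; p, N-1), sampled at x = 0..N-1."""
    return np.stack([weighted_krawtchouk(n, p, N - 1) for n in range(N)])


def krawtchouk_moments(img: np.ndarray, p1: float, p2: float) -> np.ndarray:
    """Q_nm for all orders; first array index is x, second is y."""
    N = img.shape[0]
    return krawtchouk_matrix(p1, N) @ img @ krawtchouk_matrix(p2, N).T
```

Because the weighted polynomials are orthonormal, the matrix returned by `krawtchouk_matrix` is orthogonal, so the image is recovered as `Kx.T @ Q @ Ky`. Moving $p_1$ or $p_2$ away from 0.5 shifts the region where the low-order polynomials are concentrated, which is the mechanism behind the local-feature property mentioned above.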

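Affine moment invariants [30] are features for pattern recognition, computed from the moments of objects in images, whose values do not change under affine transformation. Their theory is connected to the theory of algebraic invariants, and they can be generated automatically using graph theory. An invariant is constructed from the integral

$$I(f) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \prod_{k,j=1}^{r} C_{kj}^{\,n_{kj}} \prod_{i=1}^{r} f(x_{i}, y_{i})\, dx_{i}\, dy_{i}, \tag{15}$$

where $C_{kj} = x_{k} y_{j} - x_{j} y_{k}$ is the cross-product of the points $(x_{k}, y_{k})$ and $(x_{j}, y_{j})$, that is, the oriented double area of the triangle whose vertices are $(x_{k}, y_{k})$, $(x_{j}, y_{j})$, and $(0, 0)$, with coordinates taken relative to the object's centroid; $r$ is the number of points; and $n_{kj}$ are nonnegative integers.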
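As a worked instance of the graph construction, take $r = 2$ points joined by a double edge ($n_{12} = 2$). Expanding $C_{12}^2$ gives $I = 2(\mu_{20}\mu_{02} - \mu_{11}^2)$, and normalizing by $\mu_{00}^4$ yields the well-known first affine moment invariant $I_1 = (\mu_{20}\mu_{02} - \mu_{11}^2)/\mu_{00}^4$. The sketch below evaluates both forms on intensity-weighted pixel points; treating pixels as weighted points, and the helper names, are assumptions for illustration. An affine map is emulated by transforming the points and scaling the weights by $|\det A|$, as an image density would be.

```python
import numpy as np


def _centred(xs, ys, w):
    """Total mass mu_00 and centroid-relative coordinates."""
    m00 = w.sum()
    return m00, xs - np.sum(w * xs) / m00, ys - np.sum(w * ys) / m00


def ami_i1_graph(xs, ys, w):
    """I1 from the graph integral with r = 2 points and n_12 = 2."""
    m00, xc, yc = _centred(xs, ys, w)
    C = np.outer(xc, yc) - np.outer(yc, xc)   # C[k, j] = x_k*y_j - x_j*y_k
    integral = w @ (C ** 2) @ w               # sum_k sum_j w_k w_j C_kj^2
    return integral / (2.0 * m00 ** 4)        # integral = 2(mu20 mu02 - mu11^2)


def ami_i1_closed(xs, ys, w):
    """I1 = (mu20 mu02 - mu11^2) / mu00^4 from central moments."""
    m00, xc, yc = _centred(xs, ys, w)
    mu20 = np.sum(w * xc * xc)
    mu02 = np.sum(w * yc * yc)
    mu11 = np.sum(w * xc * yc)
    return (mu20 * mu02 - mu11 ** 2) / m00 ** 4


def image_points(img):
    """Each pixel (row = y, column = x) becomes a point weighted by intensity."""
    ys, xs = np.indices(img.shape)
    return xs.ravel().astype(float), ys.ravel().astype(float), img.ravel().astype(float)
```

For an image `img`, `ami_i1_closed(*image_points(img))` gives its $I_1$ value. The graph form costs $O(P^2)$ for $P$ points, so the closed form is what one would use in practice; the graph form is shown only to connect the code to the integral above.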

4. Experimental Setup

To conduct a comprehensive comparison, the open logo dataset from the University of Maryland (Laboratory for Language and Media Processing (LAMP), LogoDataset, http://lampsrv02.umiacs.umd.edu/projdb/project.php?id=47) is used. The dataset contains 106 logos; the 50 original logos used in this experiment are shown in Figure 1.

To evaluate the representation abilities of different moments, four groups of logos are used. The first group consists of logos with different types of noise: zero-mean Gaussian white noise with variance 0.1 and with variance 0.2; Poisson noise; salt-and-pepper noise with densities 0.02, 0.05, 0.08, 0.1, 0.15, and 0.2; and zero-mean speckle noise with variance 0.05 and with variance 0.1. The second group consists of scaled logos, with scale factors 0.25, 0.5, 0.75, 1.2, and 2.0. The third group consists of rotated logos, with rotation angles of 30, 60, 90, 120, and 150 degrees. The final group consists of logos with both rotation and scaling: rotation by 30 degrees with scale factor 0.5, rotation by 60 degrees with scale factor 0.75, and rotation by 90 degrees with scale factor 1.2.
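The noise group can be emulated as sketched below for grayscale images with intensities in $[0,1]$. The formulas follow our understanding of MATLAB `imnoise` conventions (additive Gaussian noise with clipping; salt-and-pepper noise affecting about `density * numel` pixels; speckle noise $J = I + nI$ with $n$ uniform, zero mean, and variance `var`), but this is an assumption, not the exact code used to build the dataset. In particular, the Poisson model here treats `img * 255` as counts, an assumed 8-bit scale. All function names are hypothetical.

```python
import numpy as np


def gaussian_noise(img, mean, var, rng):
    """Additive Gaussian noise, clipped to [0, 1]."""
    noisy = img + mean + np.sqrt(var) * rng.standard_normal(img.shape)
    return np.clip(noisy, 0.0, 1.0)


def poisson_noise(img, rng, peak=255.0):
    """Poisson noise, treating img * peak as counts (assumed 8-bit scale)."""
    return np.clip(rng.poisson(img * peak) / peak, 0.0, 1.0)


def salt_and_pepper(img, density, rng):
    """About density * img.size pixels set to 0 or 1 with equal probability."""
    out = img.copy()
    u = rng.random(img.shape)
    out[u < density / 2] = 0.0
    out[(u >= density / 2) & (u < density)] = 1.0
    return out


def speckle_noise(img, var, rng):
    """Multiplicative noise img + n * img, n uniform with mean 0 and variance var."""
    n = np.sqrt(12.0 * var) * (rng.random(img.shape) - 0.5)
    return np.clip(img + n * img, 0.0, 1.0)


# The 11 noise settings of the first group (default arguments bind loop values).
NOISE_CONDITIONS = (
    [(f"gaussian var={v}", lambda im, rng, v=v: gaussian_noise(im, 0.0, v, rng))
     for v in (0.1, 0.2)]
    + [("poisson", lambda im, rng: poisson_noise(im, rng))]
    + [(f"salt&pepper d={d}", lambda im, rng, d=d: salt_and_pepper(im, d, rng))
       for d in (0.02, 0.05, 0.08, 0.1, 0.15, 0.2)]
    + [(f"speckle var={v}", lambda im, rng, v=v: speckle_noise(im, v, rng))
       for v in (0.05, 0.1)]
)
```

A noisy test set can then be produced by looping over `NOISE_CONDITIONS` with a seeded `np.random.default_rng`, so that repeated runs yield identical noisy logos.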

A K-nearest neighbor (KNN) classifier is used to recognize the logos. Let TN denote the total number of logos and CR the number of correctly recognized logos; the recognition rate is then CR/TN.
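The classification and scoring step can be sketched as follows: each test logo's moment feature vector is assigned the majority label among its $k$ nearest training vectors, and the recognition rate is CR/TN. The paper does not state $k$ or the distance, so $k = 1$ by default and the Euclidean distance are assumptions; in practice, moment features of very different magnitudes may also need standardization before distances are computed.

```python
import numpy as np


def knn_predict(train_X, train_y, test_X, k=1):
    """Majority vote among the k nearest training vectors (Euclidean distance).

    Ties between labels are broken toward the smallest label.
    """
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1, kind="stable")[:, :k]
    preds = []
    for row in nearest:
        labels, counts = np.unique(train_y[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


def recognition_rate(y_true, y_pred):
    """CR / TN: correctly recognized logos over total logos."""
    cr = int(np.sum(np.asarray(y_true) == np.asarray(y_pred)))
    tn = len(y_true)
    return cr / tn
```

Here `train_X` would hold one feature vector per original logo (for example, a vector of moment magnitudes) and `test_X` the vectors of the transformed or noisy logos.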

5. Experiments

Performance evaluation is increasingly important in pattern recognition and computer vision. Mikolajczyk and Schmid [33] compared shape context, steerable filters, PCA-SIFT, differential invariants, spin images, complex filters, moment invariants, and cross-correlation for different types of interest regions. Regarding the performance of different moments, Tsougenis et al. [34] compared them in the context of watermarking. To the best of our knowledge, the performance of moments in logo recognition has not been discussed before. This section presents comprehensive comparisons, carried out in turn on the four groups of logos. The moments compared are discrete orthogonal, continuous orthogonal, and affine moments: radial Tchebichef moment invariants (RTMIs), radial Tchebichef moments (RTMs), Tchebichef moment (TM), Krawtchouk moment (KM), Krawtchouk moment invariants (KMI), Legendre moment (LM), pseudo-Zernike moment (PZM), Zernike moment (ZM), and affine moment invariants (AMI). Some of them are standard moments, and the others are moment invariants. The logos with different noise are discussed first.

5.1. Logos with Different Noise

In this subsection, the ideal logo and the logos with different noise are shown in Figure 2.

Table 1 shows that KM and KMI achieve the highest recognition rate, 0.9709. Both moments are very robust to salt-and-pepper noise and Poisson noise. TM has similar properties to KM and KMI; however, when the salt-and-pepper noise density is 0.2, its recognition rate drops to 0.9. TM is not sensitive at all to zero-mean Gaussian white noise with variance 0.1, whereas all the other moments are somewhat sensitive to it. For zero-mean Gaussian white noise with variance 0.2, RTMIs have the highest rate and RTMs the second highest. Standard moments and their invariant versions behave similarly on noisy logos, as KM and KMI show. RTMIs and RTMs are also very robust to both levels of speckle noise, and TM, KM, and KMI also perform well on logos with speckle noise. AMI have the lowest recognition rate, mainly because they are very sensitive to every type of noise. RTMs have the second lowest average recognition rate; at a salt-and-pepper density of 0.2, their recognition rate is only 0.04. ZM also has difficulty recognizing logos when the salt-and-pepper density is high. In summary, if the logos contain a mixture of these noise types, KMI or KM is the most appropriate moment for the recognition task.

5.2. Logos with Scaling

In this subsection, scaled logos are discussed. The scale factors are 0.25, 0.5, 0.75, 1.2, and 2.0. The ideal logo and the scaled logos are shown in Figure 3.

In Table 2, TM and PZM both achieve the highest recognition rate, 1.0. Both moments are very robust whether the scale factor is larger or smaller than 1, so either can be used to recognize scaled logos. When the scale factor is moderate, LM and AMI can replace TM and PZM. For most of the other moments, the recognition rate follows the same pattern: it is poor when the factor is very small or very large, and higher as the factor approaches 1 from either side. ZM and RTMIs are two exceptions: their recognition rates increase as the scale factor increases.

5.3. Logos with Rotation

Figure 4 shows the rotated logos, and the recognition rates are given in Table 3.

In Table 3, RTMs have the highest average recognition rate. When the rotation angle is 90 degrees, RTMs and AMI both achieve a recognition rate of 100%. KMI has the second highest average recognition rate; however, it recognizes only some of the logos rotated by 90 degrees. RTMIs and RTMs behave similarly in most cases, but they differ markedly on logos rotated by 90 degrees. The recognition rates of the other moments are very poor; ZM has the lowest, only 0.212. Therefore, RTMs are the best choice for recognizing logos with rotation.

5.4. Logos with Rotation and Scaling

The sample logos of rotation and scaling are presented in Figure 5. The recognition rates of different moments are shown in Table 4.

In Table 4, the average recognition rates of all moments are very unsatisfactory: even RTMs, which have the highest rate, reach only 0.42. Logos with combined rotation and scaling are therefore very difficult to recognize. In the first two subgroups, RTMIs perform well. However, RTMIs have problems recognizing logos rotated by 90 degrees, so their overall rate is undesirable. LM also has the lowest recognition rate on the third subgroup. PZM, ZM, and AMI have some advantage in recognizing logos with 90-degree rotation and scale factor 1.2; on this subgroup, ZM's rate is higher than AMI's, and PZM's rate is the lowest of the three. However, ZM cannot recognize any logo in the first two subgroups. In short, RTMs are the best of the tested moments for logos with rotation and scaling, but their performance is far from satisfactory, and a more appropriate descriptor remains to be found in future work.

Table 5 summarizes the results: TM has the highest overall recognition rate, so if a dataset contains all four groups of logos, TM can be used as the descriptor. KM and KMI have an advantage in recognizing logos with different noise, and RTMIs and TM are the second-best choices. TM and PZM are good at recognizing scaled logos, and LM and AMI can be used as suboptimal descriptors. RTMs perform best on rotated logos, with RTMIs as a suboptimal alternative. For logos with rotation and scaling, RTMs are the first choice among these moments.

6. Conclusion

In this paper, we compare the performance of moments for logo recognition. The original dataset is taken from the University of Maryland, Laboratory for Language and Media Processing. The noise, rotation, scaling, and rotation-and-scaling groups were created with MATLAB R2011a. No single descriptor is appropriate for recognizing all objects, but a descriptor can be well suited to a particular kind of object. Following this idea, we find that KM, KMI, TM, and RTMIs are suitable for logos with Gaussian, salt-and-pepper, Poisson, and speckle noise; TM, PZM, LM, and AMI are suitable for scaled logos; and RTMs are suitable for discriminating rotated logos and can also be used for logos with both rotation and scaling. If a dataset includes all of these kinds of logos, TM can be used to complete the recognition task. To the best of our knowledge, no previous paper has evaluated the performance of logo recognition methods, let alone compared moment-based logo recognition methods. This paper can therefore offer constructive suggestions for future researchers.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported in part by Public Service Platform of Mobile Internet Application Security Industry under Grants Shenzhen Development and Reform Commission no. 2012720, Research on Key Technology in Developing Mobile Internet Intelligent Terminal Application Middleware under Grant no. JC201104210032A, and Research on Key Technology of Vision Based Intelligent Interaction under Grant no. JC201005260112A.