Abstract

Traffic sign detection and recognition systems have a variety of applications, such as autonomous driving, road sign inventory, and driver support systems. Machine learning algorithms provide useful tools for traffic sign identification tasks. However, classification algorithms depend on the preprocessing stage to obtain high accuracy rates. This paper proposes a road sign characterization method based on oriented gradient maps and the Karhunen-Loeve transform in order to improve classification performance. Dimensionality reduction may be important for portable applications on resource-constrained devices like FPGAs; therefore, our approach focuses on achieving good classification accuracy with a reduced number of attributes compared to some state-of-the-art methods. The proposed method was tested using the German Traffic Sign Recognition Benchmark, reaching a dimensionality reduction of 99.3% and a classification accuracy of 95.9% with a Multi-Layer Perceptron.

1. Introduction

Road signs provide visual information to drivers by warning them of possible danger on the road, setting rules for pedestrian protection, informing people about their environment, and so forth.

Traffic sign recognition (TSR) is a challenging task due to illumination changes, scene complexity, and weather conditions. Moreover, high motion velocity may cause blurred images when they are taken from a moving car. In addition, road signs may be disoriented, damaged, or partially occluded. All these factors (shown in Figure 1) may significantly reduce the performance of TSR systems.

In recent years, many approaches for road sign detection and recognition have been developed [1]. A variety of issues remain to be solved; common examples include high recognition accuracy, false positive avoidance, color invariance, partial occlusion, and real-time processing.

In this work, a method for road sign identification is proposed. This approach is based on oriented gradient maps and the Karhunen-Loeve transform as preprocessing steps. Oriented gradient maps provide a strong, color- and lighting-invariant source of information about the general appearance of traffic signs, while the Karhunen-Loeve transform makes dimensionality reduction possible without significant loss of information; furthermore, it allows a good traffic sign characterization using a minimal number of attributes, which is an important factor for applications on resource-constrained devices like FPGAs. Artificial neural networks perform the classification stage, due to their noise robustness and generalization abilities.

The rest of the document is organized as follows. In the next section, relevant work in the field of traffic sign recognition is briefly mentioned. The techniques related to the proposed approach are described in Section 3. In Section 4, the results are presented and discussed. Finally, in Section 5, conclusions are stated.

2. Relevant Work

Traffic signs are pictograms that provide warnings, rules, and general information to the driver. They are designed with specific colors and geometric shapes in order to be easily detected and recognized by humans. Moreover, intelligent systems take advantage of these particular features, in order to discriminate road signs from the background information.

The field of TSR has been mainly divided into detection and recognition stages. Detection refers to an automatic localization process while recognition intends to identify the pictograms. Even though these stages can be used independently, they are usually treated as interrelated and complementary tasks.

In traffic sign localization, RGB color is often exploited to discriminate regions of interest (RoI) [5–7]. However, since real traffic sign images are seriously affected by illumination changes, among other factors, many approaches have been proposed to use different color spaces, such as HSI [8, 9], HSV [10], and CIECAM [11].

After using RoI discrimination, many authors use shape information to correctly detect the signs. Dimension of RoIs, width/height, and edge/area ratios are used in [7] to discriminate incorrect candidates and categorize shapes. The authors of [6, 9] used size rejection and aspect ratio as geometric filtering. Blob signature is proposed in [8]. Other approaches make use of morphological filters to reduce the search space [5].

Once the detection process is complete, the next step is the road sign categorization. Two important elements in this stage will be considered: the features that describe the image and the classification tool. The discriminatory attributes in an image are decisive, because they make the classifier task more reliable, and therefore the recognition rate will be improved. In this field, a variety of features and classifiers have received attention. K-d trees, Random Forests, and Support Vector Machines (SVM) were analyzed for the classification of histograms of oriented gradients (HOG) descriptors in [5], and pixels of interest were used as features for an SVM in [9]. Foveal Systems for Traffic Signs (FOSTS) models imitate mechanisms of biological visual systems, and in [11] traffic signs are classified by means of these features. A Radial Basis Function neural classifier was used in [12]. A more thorough review of the state of the art in TSR systems is given in [1, 2].

Some relevant work based on the use of gradient orientation as image descriptor has been developed. In [13], gesture recognition is done by exploiting Motion Gradient Orientation (MGO) images to form motion features and Principal Component Analysis (PCA) for dimensionality reduction, and Sparse Bayesian classifiers perform the identification task. Liao and Chung in [14] proposed a novel Local Binary Pattern (LBP) named Local Gradient Orientation Binary Pattern (LGOBP) which captures the neighborhood gradient orientation information for face recognition applications.

3. The Proposed Approach

3.1. Oriented Gradient Maps

Gradients have been widely used for shape recognition [5, 15, 16]. Since gradients are vectors, they can be represented by a magnitude and a direction. In this study, gradient orientation is the main attribute; therefore, only the orientation is retained, restricted to its active range.

The basic idea behind choosing oriented gradients for image representation consists of finding a local feature for appearance and shape which can correctly characterize the object, in spite of the lighting and weather conditions of the original image.

The oriented gradient maps were computed following the methodology of Dalal and Triggs [15] and Hinterstoisser et al. [16]. Since the RGB and CIE-Lab color spaces are reported to produce similar results [15], the RGB color space is used in this research. Similarly, gradients are computed with the simplest convolution masks, and the gradient norm and orientation are then obtained at each pixel.
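The quantities named in this paragraph can be sketched as follows. This is a reconstruction under stated assumptions, not the authors' exact formulation: it assumes the centered one-dimensional mask $[-1, 0, 1]$ that Dalal and Triggs [15] report as the simplest choice, written out as explicit differences; the symbols $I$, $G_x$, $G_y$, and $\theta$ are notation chosen here for illustration; and the Manhattan form of the norm is inferred from the channel-selection rule described in the following paragraph.

```latex
\begin{aligned}
G_x(x, y) &= I(x + 1, y) - I(x - 1, y), \\
G_y(x, y) &= I(x, y + 1) - I(x, y - 1), \\
\lVert G(x, y) \rVert_1 &= \lvert G_x(x, y) \rvert + \lvert G_y(x, y) \rvert, \\
\theta(x, y) &= \operatorname{atan2}\bigl(G_y(x, y),\, G_x(x, y)\bigr).
\end{aligned}
```

Because $\theta$ depends only on the ratio of the two components, multiplying $I$ by a positive constant leaves it unchanged, which is the property the method relies on.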

For color images, gradients are calculated for each color channel, and at each pixel the channel whose gradient vector has the largest Manhattan norm is retained, forming a single gradient orientation map.
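The per-channel selection described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names `channel_gradients` and `oriented_gradient_map` are hypothetical, the centered `[-1, 0, 1]` differences are assumed from Dalal and Triggs [15], border pixels are simply left at zero gradient, and mapping the orientation to `[0, 2*pi)` is an assumption.

```python
import numpy as np


def channel_gradients(channel):
    """Centered differences I[x+1] - I[x-1] along x and y (borders stay zero)."""
    channel = np.asarray(channel, dtype=np.float64)
    gx = np.zeros_like(channel)
    gy = np.zeros_like(channel)
    gx[:, 1:-1] = channel[:, 2:] - channel[:, :-2]
    gy[1:-1, :] = channel[2:, :] - channel[:-2, :]
    return gx, gy


def oriented_gradient_map(image):
    """Return one gradient orientation per pixel for an H x W x C image.

    At every pixel, the channel whose gradient has the largest Manhattan
    norm |gx| + |gy| supplies the orientation.
    """
    image = np.asarray(image, dtype=np.float64)
    if image.ndim == 2:  # treat a grayscale image as a single channel
        image = image[:, :, np.newaxis]
    grads = [channel_gradients(image[:, :, c]) for c in range(image.shape[2])]
    gx = np.stack([g[0] for g in grads], axis=-1)
    gy = np.stack([g[1] for g in grads], axis=-1)
    manhattan = np.abs(gx) + np.abs(gy)
    best = np.argmax(manhattan, axis=-1)[..., np.newaxis]
    gx_best = np.take_along_axis(gx, best, axis=-1)[..., 0]
    gy_best = np.take_along_axis(gy, best, axis=-1)[..., 0]
    # Assumption: orientations are wrapped to the range [0, 2*pi).
    return np.mod(np.arctan2(gy_best, gx_best), 2.0 * np.pi)
```

Calling `oriented_gradient_map(img)` and `oriented_gradient_map(0.5 * img)` on the same image yields the same map, since a global change in brightness scales both gradient components equally.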

Figure 2 shows orientation maps for different images, where the robustness of the features to varying color and illumination can be seen. Gradient orientation maps are the system’s main source of information, but since the number of attributes is still large, a dimensionality reduction scheme is applied.

When the gradient of an image is computed, two equally important components can be calculated: magnitude and orientation. Gradient magnitude is affected by illumination conditions, and therefore contrast normalization is usually needed. On the other hand, gradient orientation is illumination invariant.

In [15], Dalal and Triggs used gradient magnitude for two purposes: (1) for color images, gradient orientation is computed for each color channel and the gradient orientation with the largest norm is taken, and (2) for histogram calculation, a weighted vote is performed on the pixel orientations, using gradient magnitude as the weighting factor. Our approach considers the gradient magnitude only to select the gradient orientation for color images. Gradient orientation is the main source of information and it is not further affected by the gradient magnitude. The Karhunen-Loeve transform is a tool that can be used to decide which features carry important information. As a consequence, our approach can be considered illumination invariant; furthermore, the number of features per image is lower than that of HOG. Additionally, although both methods use gradient orientation, the contrast normalization, cell voting, and histogram computation steps are not required; therefore, computation time is considerably lower than that of HOG. Our method can be improved by both the use of a hierarchical design and a classifier that considers the frequency of observations per class.

3.2. Karhunen-Loeve Transform for Image Classification

The Karhunen-Loeve transform (also known as Hotelling transform, Eigenvector transform, or Principal Component Analysis) analyses a data set representing observations composed of several correlated variables. Its goal is to extract the relevant information and express it by a set of orthogonal new variables, the principal components. The new representation is supposed to be in a more tractable, lower-dimensional form, without losing too much information.

Turk and Pentland [17] proposed a popular method for face recognition based on PCA, the so-called Eigenfaces. The present study makes use of their methodology in order to identify road signs.

Let $\mathbf{x}_i$ be an $N$-dimensional vector, corresponding to an image with $N$ pixels. Since we have $M$ different images, the training data set can be represented as an input matrix $X = [\mathbf{x}_1, \ldots, \mathbf{x}_M]$ of size $N \times M$. In order to center the data, the mean vector is computed as

$$\bar{\mathbf{x}} = \frac{1}{M} \sum_{i=1}^{M} \mathbf{x}_i.$$

When the mean vector is subtracted from every column of the data, the resulting matrix $\Phi$ has zero mean in every feature component. With the centered data $\Phi$, the covariance matrix can be defined as

$$C = \frac{1}{M} \Phi \Phi^{T},$$

where the resulting matrix has $N \times N$ elements. The rest of the algorithm is only feasible when the original images are small and $N$ is less than $M$.

The next step consists of the calculation of the eigenvalues $\lambda_j$ and the eigenvectors $\mathbf{u}_j$ of the covariance matrix. Subsequently, since the purpose of this study is data reduction, the $K$ most significant vectors, which correspond to the greatest eigenvalues, are chosen. Originally, the eigenvectors were represented by an $N \times N$ matrix $U$, but only the $K$ most significant vectors remain, so $U_K$ is $N \times K$. The projection of any vector $\mathbf{x}$ onto the new feature space is now given by

$$\mathbf{y} = U_K^{T} (\mathbf{x} - \bar{\mathbf{x}}),$$

where the new vector $\mathbf{y}$ is now $K$-dimensional.
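The steps of this subsection (centering, covariance, eigendecomposition, selection of the leading eigenvectors, and projection) can be sketched in NumPy as follows. This is an illustrative sketch under assumptions, not the authors' MATLAB code: `fit_klt` and `project` are hypothetical names, `X` is taken to hold one vectorized training image per column, `k` is the number of retained eigenvectors, and dividing the covariance by the number of images is an assumed normalization (it does not change the eigenvectors).

```python
import numpy as np


def fit_klt(X, k):
    """Fit the transform on a matrix whose columns are vectorized images.

    Returns the mean vector and the matrix whose k columns are the
    eigenvectors of the covariance matrix with the largest eigenvalues.
    """
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=1)
    centered = X - mean[:, np.newaxis]            # zero mean in every feature
    cov = centered @ centered.T / X.shape[1]      # square covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh: symmetric input
    order = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    return mean, eigvecs[:, order]


def project(x, mean, basis):
    """Project one vectorized image onto the retained eigenvectors."""
    return basis.T @ (np.asarray(x, dtype=np.float64) - mean)
```

A typical use would fit once on the training matrix, `mean, basis = fit_klt(X_train, 140)`, and then feed `project(x, mean, basis)` to the classifier for every training and test image, where `X_train` stands for the matrix of vectorized orientation maps.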

4. Results

All the experiments were performed on the German Traffic Sign Recognition Benchmark (GTSRB) [2]. The database contains more than 39,000 images for training and over 12,600 images for evaluation, with 43 different classes of traffic signs. Image sizes vary between 15 × 15 and 250 × 250 pixels. The GTSRB provides images with varying illumination, weather conditions, shadows, partial occlusion, graffiti, blurring, perspective, and rotation. The database also offers precomputed attributes per image: three different histogram of oriented gradients features (HOG 1, HOG 2, and HOG 3) and Haar-like features.

A Multi-Layer Perceptron (MLP) performed the categorization stage. An MLP is a type of feedforward artificial neural network that can be used to perform classification tasks. It is robust to noise and presents satisfactory nonlinear discriminative properties.
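As a rough illustration of the classifier's structure, the forward pass of a single-hidden-layer MLP can be sketched as follows. The paper does not state the activation functions, weight initialization, or training procedure, so the `tanh` hidden layer, the softmax output, and the random initialization below are assumptions, and `init_mlp` and `mlp_forward` are hypothetical names. Training is not shown.

```python
import numpy as np


def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Randomly initialize the weights of a one-hidden-layer MLP."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }


def mlp_forward(params, x):
    """Return class probabilities for one feature vector x."""
    hidden = np.tanh(params["W1"] @ x + params["b1"])   # assumed activation
    logits = params["W2"] @ hidden + params["b2"]
    logits = logits - logits.max()                      # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()                              # assumed softmax output
```

For the configuration reported in the results, this would correspond to `init_mlp(140, 240, 43)`: 140 projected attributes, 240 hidden neurons, and one output per GTSRB class.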

The proposed methodology assumes that all images have the same size. Since the GTSRB provides images with varying sizes, all of them were scaled to a fixed size using bilinear interpolation. Furthermore, this operation helps correct some geometric transformations, such as perspective, in images [10] (Figure 4). Although some affine transformations cannot be corrected without the use of additional perspective attributes, the classifier may achieve correct classification when images with these transformations are included in the training set.
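The rescaling step can be sketched with a plain NumPy bilinear interpolation. This is a generic illustration, not the routine the authors used in MATLAB: `resize_bilinear` is a hypothetical name, the pixel-center alignment convention is an assumption, and the 40 × 40 target in the usage note is a placeholder rather than the size selected in the paper.

```python
import numpy as np


def resize_bilinear(image, out_h, out_w):
    """Resize an H x W or H x W x C array using bilinear interpolation."""
    image = np.asarray(image, dtype=np.float64)
    in_h, in_w = image.shape[:2]
    # Map output pixel centers back to (clipped) input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = ys - y0
    wx = xs - x0
    if image.ndim == 3:  # broadcast the weights over the channel axis
        wy = wy[:, np.newaxis, np.newaxis]
        wx = wx[np.newaxis, :, np.newaxis]
    else:
        wy = wy[:, np.newaxis]
        wx = wx[np.newaxis, :]
    # Interpolate along x on the two neighboring rows, then along y.
    top = image[y0][:, x0] * (1.0 - wx) + image[y0][:, x1] * wx
    bottom = image[y1][:, x0] * (1.0 - wx) + image[y1][:, x1] * wx
    return top * (1.0 - wy) + bottom * wy
```

For example, `resize_bilinear(sign, 40, 40)` would bring a cropped sign of any size to a common 40 × 40 grid before the orientation map is computed.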

In order to test the proposed technique, several analyses were performed. Firstly, the system was tested for different image sizes and different numbers of neurons in the hidden layer. Figure 3 shows that the best classification rates were reached for two different image sizes, with 140 and 150 neurons in the MLP hidden layer, respectively. For data reduction purposes, the smaller size represents the best choice.

The aim of dimensionality reduction leads to the analysis of the minimum number of eigenvectors required to obtain the highest classification rate. The number of eigenvectors is decisive, because it provides the best representation of the data with the minimal amount of information and the highest discriminatory efficiency. It can be observed in Figure 5 that the best performance was obtained by both 120 and 140 eigenvectors. Data reduction is therefore in the range .
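The relation between the number of retained eigenvectors and the data reduction can be made concrete with a small helper. It assumes that reduction is measured as the fraction of input attributes removed; `dimensionality_reduction` is a hypothetical name, and the value of 10,000 input attributes used below is a placeholder, not the figure from the experiments.

```python
def dimensionality_reduction(n_original, n_retained):
    """Fraction of attributes removed when keeping n_retained of n_original."""
    if not 0 < n_retained <= n_original:
        raise ValueError("n_retained must be in the range 1..n_original")
    return 1.0 - n_retained / n_original


# Placeholder input size; the two eigenvector counts come from Figure 5.
for n_eigenvectors in (120, 140):
    reduction = dimensionality_reduction(10_000, n_eigenvectors)
    print(f"{n_eigenvectors} eigenvectors: {reduction:.1%} reduction")
```

Keeping fewer eigenvectors always yields the larger reduction, so the two best-performing settings bound the achievable range.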

After the optimal image size and the optimal number of eigenvectors were determined, the number of hidden neurons in the MLP was tuned, with the main purpose of increasing the classification accuracy. The results of this step are depicted in Figure 6.

Finally, the highest performance is obtained with 240 neurons, the selected image size, and 140 eigenvectors, achieving an identification accuracy of 95.9%. A comparison test was performed for a system with the same image size, number of hidden neurons, and number of eigenvectors, but with raw grayscale images (no oriented gradient maps were computed); the maximum identification accuracy was 93.39%, demonstrating the effectiveness of the proposed approach.

Once the neural network was trained, the average processing time per image during evaluation was about 0.0054 s. This was measured using MATLAB v.2011 on an iMac with an Intel Core i5 processor and 4 GB of RAM.

In 2012, a competition for TSR was launched using the GTSRB [2]. The authors report three baseline results: Linear Discriminant Analysis (LDA) on HOG features, k-nearest neighbors (k-NN) on HOG features, and human performance. In the competition, many teams participated, but only the top-ranking methods were reported. Table 1 shows a brief description and accuracies for those teams, the baseline results, and our approach. Convolutional neural networks (CNN) are used by two top-ranking methods. It is important to highlight that our approach uses a number of attributes at least 10 times lower than the compared methods, and the accuracy of the proposed method is comparable to that of those that use HOG attributes.

Table 1 shows four different methodologies that use HOG as a source of feature information. It can be seen that their accuracy is similar to that of the proposed method, but our method uses fewer attributes and works in real time (0.0054 s per image).

5. Conclusions

When attempting to recognize objects, data representation is decisive. Patterns may or may not be well discriminated, depending on the choice of the features that characterize those objects.

In the present work, road sign images were identified. Because of all the difficulties previously mentioned, it was important to generate discriminative and robust data. Gradient orientations provide a robust representation for images taken under extreme lighting conditions. Additionally, images carry a huge amount of information that can be hard to process, and as a consequence, dimensionality reduction plays an important role in this approach.

Experimental results demonstrate that the proposed method is robust and more accurate than processing raw images. When compared to some state-of-the-art methods, the number of attributes considered is at least ten times lower, which is a desirable condition for portable applications. Classification accuracy is similar to that of the methods that use HOG attributes. However, recognition performance can be further improved. In future work, the classification task could be separated into two substages, (1) shape classification and (2) road sign recognition, which may increase identification accuracy.

Acknowledgments

This work has been supported by the National Council of Science and Technology of Mexico (CONACYT) under the Grant no. 329483/229696 and the Universidad de Guanajuato through the Program of Institutional Enhancement PIFI-2012.