Abstract

A precise navigation map is crucial in many fields. This paper proposes a panorama-based method to detect and recognize lane markings and traffic signs on the road surface. Firstly, to deal with the limited field of view and the occlusion problem, a vision-based sensing system consisting of a surround view system and a panoramic system is designed. Secondly, to detect and identify traffic signs on the road surface, a sliding-window-based detection method is proposed; template matching and SVM (Support Vector Machine) are used to recognize the traffic signs. Thirdly, to avoid the occlusion problem, this paper utilizes vision-based ego-motion estimation to detect and remove other vehicles. As surround view images contain less dynamic information and fewer gray scales, an improved ICP (Iterative Closest Point) algorithm is introduced to ensure that the ego-motion parameters can be obtained. For panoramic images, an optical flow algorithm is used. The results from the surround view system help to filter the optical flow and optimize the ego-motion parameters; other vehicles are detected by the optical flow feature. Experimental results show that the method handles different kinds of lane markings and traffic signs well.

1. Introduction

Precise maps can provide essential information for many intelligent transportation systems (ITS), such as car navigation systems, autonomous driving systems, and advanced driver assistance systems (ADAS). Most existing maps are based on satellite and aerial photography, which are not precise enough to provide detailed information such as road markings. Although Google Street View [1] lets users see the details of a street, it is limited to an immense panorama and cannot provide information over a large-scale range; in other words, it cannot be used as a practical navigation map. Vision, however, can deliver a great amount of information, which makes it a powerful means of recognizing detailed road information. Therefore, vision-based detection and recognition of road markings is quite promising, and many efforts have been made to generate advanced navigation maps containing both lane markings and traffic signs.

Road markings have specific features since they are made according to construction rules and regulations. Figure 1 shows examples of typical lane markings in China. A lane marking can be described in terms of its color, number of lines, and broken-or-solid style. For example, “yellow-2 lines-solid” represents a double yellow solid line.

There has been extensive research on lane marking detection, and a wide variety of modeling, detection, and tracking techniques have been proposed [2–6]. Wang et al. [7] proposed a road geometry extraction system that extracted line segments and classified them into different types of lane markings by a synthetic analysis. Kim [8] introduced a robust real-time lane detection and tracking algorithm for local roads and highways in various challenging scenarios. Schindler and Lauren [9] presented a method for robust lane recognition by applying N-level-set-fitting to preprocessed image data from a single monochrome camera. Reference [10] introduced a method for high-accuracy road orthophoto mapping; its resolution is higher than that of Google’s satellite images, but no feature extraction is mentioned. Reference [11] introduced ground-texture-based localization for intelligent vehicles; it extracted edge information at the pixel level, but no traffic features are extracted.

Most existing research focuses only on the detection and tracking of lane markings in the host lane; the extraction of information on a larger scale is not considered, and little has been done on the recognition of multiple lanes. Furthermore, some important lane features are neglected, such as color, number of lines, and broken-or-solid style, which are important for a number of driving maneuvers like lane changing and overtaking. Another important fact is that vehicles pose a serious problem for lane marking recognition in large-scale images, since they block the lanes. Detecting vehicles on the road is significant for lane recognition, but not much work has been done in this respect.

For traffic sign detection and recognition, a number of techniques have been studied [12–18]. de la Escalera et al. [19] used a genetic algorithm to detect traffic signs and a neural network to classify them; it performed well in situations without shadows. Poor [20] introduced eigen-based traffic sign recognition, which uses principal component analysis (PCA) to choose the most effective components of traffic sign images for the classification of an unknown traffic sign. In [21], a fast road sign detection and recognition system was implemented in autonomous unmanned vehicles for urban surveillance and rescue.

However, most existing work focuses on the detection of roadside traffic signs, which are less affected by the rolling and pitching of the host vehicle. Little work has been done on detecting traffic signs on the road surface, which also provide essential information for a precise navigation map.

This paper presents a practical approach for the detection and recognition of various kinds of multilane markings based on panoramic cameras, which, compared with traditional cameras, cover a much larger field of view and provide more information. The panoramic system in this paper consists of two subsystems: a surround view system focused on host lane detection and a panoramic camera used for multilane detection. Section 2 introduces host lane recognition based on a rotation histogram. It employs the Hough transform to obtain the main direction of the road lane and performs a histogram analysis to get the color, number of lines, and broken-or-solid information. Ego-motion estimation based on ICP is adopted to get the position and pose of the platform vehicle in the surround view system. Section 3 introduces multilane recognition. A curve model is employed to fit lane markings in a panoramic image. Ego-motion estimation based on the optical flow method is adopted to detect other vehicles, and the result is optimized with the ego-motion parameters obtained in the surround view system. Section 4 describes traffic sign detection and recognition, which are based on template matching and the support vector machine (SVM). It uses the Fourier Descriptor as the main feature and applies histogram statistics to identify axisymmetric traffic signs. In Section 5, the proposed system is tested with a large number of real traffic images, and experimental results show that it can handle different kinds of lane markings and traffic signs accurately and robustly.

2. Host Lane Detection and Recognition

Host lane detection is important because the information of other lane markings can be inferred from the host lane's position, width, and curvature. Thus, the surround view system is employed; it consists of four fisheye cameras with a 180-degree field of view around the host vehicle and a processing module that performs viewpoint transformation and image stitching. To classify lane markings, three types of information are analyzed. The first is the color of the lines, for example, yellow or white; the second is the number of lines, for example, double or single; and the third is whether the line is solid or dashed. Our approach then follows a three-step strategy divided into preprocessing, recognition, and filtering phases.

2.1. Image Preprocessing

The preprocessing algorithm functions in three phases. Initially a pixel-by-pixel search is performed to look for features that identify lane markings. Features are extracted from a raw image using a color filter and a width filter. The color filter detects pixels which correspond to the probable colors of lane markings. Pixels are converted from RGB color space to HSV color space, which rearranges the geometry of RGB in an attempt to be more intuitive and perceptually relevant. Since lane markings are either white or yellow, thresholds are set to the three components, hue, saturation, and value, respectively, to determine the color of pixels [22]. The width filter is a modified edge filter which responds to edges of the correct width. Only pixels between a rising edge and a falling edge within a predefined range are detected. In the images taken by the proposed system, the vehicle is always in the middle and occupies a fixed region, so the corresponding pixels can be neglected during the feature extraction.
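The color filter described above can be sketched as follows. This is only an illustration: the threshold constants and the function name `classify_pixel` are assumptions made for this example, since the actual threshold values are not listed here.

```python
import colorsys

# Illustrative thresholds only (assumed values, not taken from the paper).
WHITE_MAX_SAT = 0.20        # white: low saturation ...
WHITE_MIN_VAL = 0.75        # ... and high brightness
YELLOW_HUE = (0.10, 0.20)   # hue range on a 0..1 scale (about 36 to 72 degrees)
YELLOW_MIN_SAT = 0.40
YELLOW_MIN_VAL = 0.40


def classify_pixel(r, g, b):
    """Return 'white', 'yellow', or None for an 8-bit RGB pixel."""
    # Convert from RGB to HSV; colorsys works on floats in the range 0..1.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s <= WHITE_MAX_SAT and v >= WHITE_MIN_VAL:
        return "white"
    if (YELLOW_HUE[0] <= h <= YELLOW_HUE[1]
            and s >= YELLOW_MIN_SAT and v >= YELLOW_MIN_VAL):
        return "yellow"
    return None
```

In practice the thresholds would need to be tuned to the cameras and lighting conditions; pixels classified as `None` are simply not lane-marking candidates for the color filter.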

Subsequently, these features are fused into a binary map by requiring feature pixels to be either white or yellow in color and to pass the width filter. All feature points are set to 1, whereas the rest are set to 0.

Finally, since lane markings are linear features, the Linear Hough Transform (LHT) is used to obtain straight lines by converting the problem in image space into a more easily solved peak detection problem in parameter space. The presence of other linear features can create ambiguities during the extraction; to reduce these ambiguities, interframe filtering is implemented. The two peaks in the parameter space corresponding to the two host lane markings are selected based on the assumption that they have the same orientation and width. Also, the peak in the Hough transform parameter space of the current frame should always be in the vicinity of the peak in the last frame.
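The idea of turning line extraction into peak detection in parameter space can be illustrated with a minimal accumulator. This sketch assumes the usual normal-form parametrization rho = x cos(theta) + y sin(theta); the function name `hough_peak` and the quantization (1 pixel, 1 degree) are choices made for the example, and the interframe filtering is not included.

```python
import math


def hough_peak(points, n_theta=180):
    """Return (rho, theta_deg, votes) for the strongest straight line.

    Each feature point (x, y) votes for every line
    rho = x*cos(theta) + y*sin(theta) passing through it; rho is quantized
    to whole pixels and theta to steps of 180 / n_theta degrees.
    """
    votes = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] = votes.get((rho, t), 0) + 1
    # The line is the cell of the accumulator with the most votes.
    (rho, t), count = max(votes.items(), key=lambda item: item[1])
    return rho, 180.0 * t / n_theta, count
```

For example, 100 feature pixels on the vertical line x = 5 plus a few stray pixels give a single dominant peak at rho = 5, theta = 0. Restricting the search to cells near the previous frame's peak would be one way to realize the interframe constraint.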

Figure 2 shows an example of the procedure. Figures 2(a)–2(c) are the raw image, the feature-fused binary map, and the detected lane markings.

2.2. Line Type Classification

The above processing extracts regions that contain lane markings. It is quite convenient to establish a histogram for feature expression. The horizontal axis of the histogram corresponds to pixel columns of the rectified image and the vertical axis corresponds to the number of edge pairs in each column. We call it the feature histogram. As shown in Figure 3, candidate lane markings produce square waves in the feature histogram.
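As a rough sketch of the feature histogram, the example below counts feature pixels per column of a binary map and extracts the column ranges where the count stays high (the "square waves"). Counting feature pixels instead of edge pairs is a simplification made for this example, and `min_height` is a hypothetical parameter.

```python
def feature_histogram(binary_map):
    """Count the feature pixels in each column of a rectified binary map."""
    return [sum(column) for column in zip(*binary_map)]


def square_waves(hist, min_height):
    """Return (start, end) column ranges where hist stays >= min_height."""
    runs, start = [], None
    for i, h in enumerate(hist):
        if h >= min_height and start is None:
            start = i                      # rising edge of a square wave
        elif h < min_height and start is not None:
            runs.append((start, i - 1))    # falling edge closes the wave
            start = None
    if start is not None:
        runs.append((start, len(hist) - 1))
    return runs
```

Two nearby square waves in the histogram would be a natural cue for a double line, and one wave for a single line, although the exact decision rules depend on thresholds not reproduced here.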

Whether a lane marking is broken or solid is closely related to the following features. represents the feature of a single line, represents the feature of a double line, is the length of the region, and is the number of sections. Consider

Using these features, we can judge whether a marking is a single or double line and whether it is solid or dashed. Obviously, should be close to 1 and close to 0 if it is a single solid line, while and should be larger than a suitable threshold if it is a solid-dashed line. Note that should be less than 10; otherwise, the result may be affected by noise. The detailed rules and thresholds are listed below:

2.3. Ego-Motion Estimation in Surround View System Based on ICP

The existence of vehicles in real traffic often causes false detection or recognition. This problem is much more serious in a larger field of view than in the view of surround view system. To reduce false alarms, ego-motion estimation is performed in this paper to detect vehicles. The visual field of a surround view image is small, and ego-motion estimation is not essential if only the information of host lanes is needed; however, the ego-motion estimation of surround view system is important for ego-motion estimation of panorama system which will be studied later in this paper.

The host vehicle is the main occlusion in the surround view image, and it can be detected and removed through ego-motion estimation. Traditional vehicle ego-motion estimation usually adopts the optical flow method, which requires a large number of corners in the image; however, the images obtained by the surround view system do not meet this requirement, so ICP is used here.

The ego-motion estimation system is shown in Figure 4; its inputs are the image sequences collected by the surround view system. The ego-motion parameters, including displacement and rotation, can be calculated by ICP [23]. Using the image sequence and the ego-motion parameters, we can reduce the interference in lane detection and recognition and restore the area beneath the vehicle.

Taking the vehicle coordinate system as reference, assuming that is a point on the road, it is on point when the car is on point and is on point , and then the relationship between and can be expressed as follows:

and , respectively, represent the rotation and translation variation of points between two adjacent frames; these two parameters can also be used to represent the rotation and translation variation of the car's position. ICP is adopted to calculate and , and the detailed process is as follows.
(1) Get the initial rotation and translation parameters and of the image sequence ;
(2) On the basis of the parameters and , calculated from the previous frame, a new image is obtained by transforming image ; and are then obtained by applying ICP to images and , which gives , ;
(3) Repeat step 2 until the whole image sequence has been processed;
(4) Derive the vehicle position from the ego-motion parameters, and amend the parameters of the previous step if the value predicted by the Kalman filter and the actual value differ greatly.
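The core of the ICP step can be sketched for 2D feature points as follows. This is a basic point-to-point ICP under assumptions made for the example, not the improved variant used in this paper: points are matched to their nearest neighbors, the rigid motion is solved in closed form, and the optional `init` argument plays the role of the parameters carried over from the previous frame. All function names are hypothetical, and the Kalman filter check is omitted.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def move(points, theta, tx, ty):
    """Apply the rigid motion (rotation theta, then translation) to points."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp_2d(src, dst, init=(0.0, 0.0, 0.0), iterations=10):
    """Estimate (theta, tx, ty) aligning src to dst, starting from init."""
    theta, tx, ty = init
    current = move(src, theta, tx, ty)
    for _ in range(iterations):
        # Match every current point to its nearest neighbor in dst.
        matched = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in current]
        d_theta, d_tx, d_ty = best_rigid_2d(current, matched)
        current = move(current, d_theta, d_tx, d_ty)
        # Compose the incremental motion with the running estimate.
        c, s = math.cos(d_theta), math.sin(d_theta)
        tx, ty = c * tx - s * ty + d_tx, s * tx + c * ty + d_ty
        theta += d_theta
    return theta, tx, ty
```

Nearest-neighbor matching only converges when the motion between frames is small relative to the spacing of the features, which is why a good initial guess from the previous frame matters.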

Because the host vehicle itself occupies a large area in a surround view image, some information on the road is blocked; however, an image containing only the lane markings and traffic signs on the road is a better choice for generating a navigation map. Thus, some image mosaic work is done in this paper.

The vehicle’s position and pose obtained by the ego-motion estimation can be used to restore the road area blocked by the vehicle through a fusion algorithm. The Poisson fusion algorithm is adopted in an iterative operation [24], in which part of the area blocked by the vehicle in one image is filled with the same, unblocked area of the road from a previous image. With this method, the vehicle in an image is replaced by the area of road it blocked; an example is shown in Figure 5. Image mosaic provides a possible way to optimize the map generation, but the details are beyond the scope of this paper.

3. Multilane Detection and Recognition

In order to detect and recognize multiple lanes, panorama-based methods are proposed. The panorama in this paper has a visual angle of 360°, and nearly all the lane markings on the road can be seen in it, which makes it particularly suitable for multilane detection. A panorama image has a blind area, and the area near the host vehicle is blocked, but this problem is solved by using the surround view system as a supplement. Extending the host lane information obtained by the surround view system is an important premise of multilane detection. Also, ego-motion estimation is used to remove the interference of other vehicles.

3.1. Multilane Detection and Recognition in Real Traffic

The field of view is quite small in a surround view image, so a lane can be described with a straight line model; in a panorama, however, a straight line model may cause large deviations, and suitable curve models are necessary for fitting multiple lanes.

The panorama images are transformed into top-view images before use; thus, the lane markings are limited to an area of about 50 m, and in this case quadratic polynomial models meet our demand well [25]. A schematic diagram is shown in Figure 6.

Multilane detection is similar to lane detection in surround view images in terms of feature detection, such as the use of the color filter, width filter, and binary map, but it requires better feature selection since there is more interference in a panorama image. The host lane detection provides position and width information, which can also be used in the multilane detection.

The linear Hough transform has been adopted to obtain straight lines in surround view images. New points are then searched along those fitted lines in both the forward and backward directions; if an edge point is found in the tangential direction of the previous points, it is added to the cluster. This method also works when the line is dashed or unclear. The process is shown in Figure 7.

In panorama system, similar method of clustering is used, and the model turns from straight line to curve. Least square method is used to seek parameters for quadratic curve fitting. The formula is as follows:

and represent those points clustered before; quadratic curve parameters , , could be calculated by the method of partial differential.
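A least-squares quadratic fit of this kind can be sketched as below. The model form y = a x^2 + b x + c and the function name `fit_quadratic` are assumptions made for the example; the three normal equations come from setting the partial derivatives of the squared error with respect to a, b, and c to zero.

```python
def fit_quadratic(points):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    # Power sums: s[k] = sum of x**k, t[k] = sum of y * x**k.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    # Augmented normal equations (one row per partial derivative).
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [vr - f * vi for vr, vi in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))
```

At least three clustered points with distinct x values are needed; with fewer, the normal equations are singular and the division fails.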

Under the assumption that standard lane markings in one image are parallel, other lane markings can be detected according to the width and curve equation of the host lane. For a lane that matches the assumption, it is easy to find and cluster its points; for lanes that do not strictly match, the points can still be clustered by extending the search area. Fitting curves for the different groups of points are then obtained by the least squares method. Since a panorama image provides a large field of view, we can detect nearly all the lane markings on the current road surface, which is impossible with ordinary camera images. A result is shown in Figure 8.

3.2. Vehicle Detection Based on Ego-Motion Estimation

As mentioned previously, the existence of vehicles in real traffic may cause false detection or recognition, especially in panorama images. A panorama contains rich features, which is unfavorable for ICP; for this reason, the optical flow method is used here to get the changes of position and pose by calculating the optical flow of corners in the images. However, the ordinary optical flow method is affected by the large number of moving objects in the field of view; thus, the ICP result is used to help optimize the ego-motion parameters in the panorama system. Both the host vehicle and other vehicles can be detected through the ego-motion results.

The surround view system and the panorama system are two independent parts of the platform; to make their data complementary, a unified calibration is performed, the details of which are not covered in this paper. The frame rate is 20 fps for the surround view system and 2 fps for the panorama system, so the results of ego-motion estimation in the surround view system cannot be used directly in the panorama system, and some additional steps are necessary.

With ICP, we obtain the ego-motion parameters and ; then, define a transformation matrix . Consider

The total change of position and pose from the first frame to the th frame can be obtained from the product of the transformation matrices, ; accordingly, . These transformation matrices will be used later to optimize the ego-motion parameters; the details are shown in Figure 9.
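The accumulation of motion by matrix products can be illustrated with homogeneous 3 × 3 matrices. This sketch assumes planar motion, a matrix of the usual form [[cos, -sin, tx], [sin, cos, ty], [0, 0, 1]], and that each step maps coordinates of one frame into the next; the function names are hypothetical.

```python
import math


def motion_matrix(theta, tx, ty):
    """Homogeneous 2D transform built from a rotation angle and a translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def accumulate(steps):
    """Chain frame-to-frame transforms; later steps are applied on the left."""
    total = motion_matrix(0.0, 0.0, 0.0)  # identity
    for step in steps:
        total = matmul(step, total)
    return total
```

With the frame rates given above (20 fps and 2 fps), roughly ten consecutive surround view transforms would be chained to cover the interval between two panorama frames, assuming the two streams are synchronized.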

The concept of optical flow was first proposed by Gibson in 1950. It uses the correlation of pixels between adjacent frames to calculate changes in the time domain, from which the motion of objects can be obtained.

In this paper, the pyramid-based sparse optical flow algorithm KLT is adopted to calculate and track corner features in an image and form the optical flow field. Define as feature points in the previous frame and as the corresponding points in the current frame; the optimal rotation and translation matrices can be obtained by minimizing the error shown in the following equation:

and are the ego-motion parameters and is the weight.

In a practical traffic environment, some moving objects, such as other vehicles or pedestrians, also form an optical flow field. It is necessary to filter the point set and point set to remove the interference points and keep only the valid points. The transformation matrices are used here as a reference to determine the filter range. is the transformation relation between the point sets; define as the maximum acceptable error, and then the filter rule can be described as follows:

A point that does not match the rule above is removed from the point set used to calculate the ego-motion parameters. For the remaining points, the weights can be calculated with the following formula:

The optimized ego-motion parameters could be gotten after finishing the filtering and calculation of weights. At the same time, the optical flow field is divided into reasonable values and abnormal values. Reasonable values are the points used to calculate the ego-motion parameters while abnormal values are the points moved out by the filter. In most cases, the abnormal values represent other moving objects in the surrounding environment. By clustering the abnormal values, other vehicles could be detected easily. An optical flow field of panorama and its vehicle detection results are shown in Figures 11 and 10.
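The split into reasonable and abnormal flow vectors can be sketched as follows. The example assumes that the reference motion is available as a homogeneous 3 × 3 matrix `m` and that a point pair is accepted when the predicted position lies within `max_error` pixels of the tracked position; both names are hypothetical, and the weighting step is not reproduced.

```python
import math


def split_flow(prev_pts, curr_pts, m, max_error):
    """Split tracked point pairs into (reasonable, abnormal) lists.

    A pair (p, q) is reasonable when the reference transform m maps p to
    within max_error of q; otherwise it likely belongs to a moving object.
    """
    reasonable, abnormal = [], []
    for p, q in zip(prev_pts, curr_pts):
        pred_x = m[0][0] * p[0] + m[0][1] * p[1] + m[0][2]
        pred_y = m[1][0] * p[0] + m[1][1] * p[1] + m[1][2]
        error = math.hypot(pred_x - q[0], pred_y - q[1])
        (reasonable if error <= max_error else abnormal).append((p, q))
    return reasonable, abnormal
```

The abnormal pairs returned here are the candidates that would subsequently be clustered to locate other vehicles.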

4. Traffic Signs Detection and Recognition

Apart from the traffic lane markings, there are many traffic signs on the road surface, which also provide useful information for navigation. Some common traffic signs are shown in Figure 12.

Traffic signs detection based on sliding window is performed in this paper. Also, a new classification method combining template matching and SVM is proposed to recognize those traffic signs.

4.1. Traffic Signs Detection Based on Sliding Window

Now that the traffic lane markings have been extracted, the detection region is effectively limited; a sliding window is created between the lane markings and scans the whole road.

The Otsu method is adopted in every sliding window for binary segmentation of the images [26, 27]. A binary image is easily affected by noise; thus, filtering and clustering are necessary. 8-connected component detection is used here to cluster the binary image, and the regions obtained by clustering are filtered with the following rules.
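For reference, Otsu thresholding of one window can be sketched in a few lines. The example assumes 8-bit gray levels given as a flat list and treats pixels brighter than the threshold as foreground; the function names are chosen for the example, and the 8-connected clustering is not included.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]            # number of pixels with level <= t
        sum0 += t * hist[t]      # sum of their gray levels
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue             # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, t):
    """Set pixels brighter than t to 1 and the rest to 0."""
    return [1 if p > t else 0 for p in pixels]
```

Computing the threshold per sliding window, rather than once per image, lets the segmentation adapt to local lighting on the road surface.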

In most cases, noise occupies only a small area in an image, so it is easy to remove it and keep only the traffic signs with an area constraint as follows:

Also, all the traffic signs lie in the middle of the sliding windows; in other words, only some specific regions need to be examined in an image, and the region constraint is as follows:

With the detection rules mentioned above, 600 images containing traffic signs in different situations are processed, and some results are shown in Figure 13.

4.2. Traffic Signs Recognition Based on Template Matching and SVM

Template matching is a method for finding the corresponding parts in an image according to known templates; some templates are shown in Figure 14. To judge the degree of matching between a template and the image, two parameters can be used.

SSD (Sum of Squared Differences) measures the squared difference between two images; the lower the SSD, the more similar the two images are.

NCC (Normalized Cross Correlation) is the other common parameter, ranging from −1 to 1; the image and template are exactly the same when and are totally different when .
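Both measures can be written compactly for two equally sized patches given as flat lists of gray levels. The zero-mean form of NCC is assumed here because it yields the stated range from −1 to 1; the function names are chosen for the example.

```python
import math


def ssd(a, b):
    """Sum of squared differences: 0 for identical patches, larger when they differ."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ncc(a, b):
    """Zero-mean normalized cross correlation, in the range -1..1."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den  # undefined (division by zero) for a constant patch
```

Note the different behavior under a brightness change: a patch and a brighter, higher-contrast copy of it have a large SSD but an NCC of 1, so the two measures can rank candidate matches differently.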

Some template matching experiments have been carried out in this paper; the results show that SSD provides a satisfying degree of discrimination while NCC does not. However, even SSD failed on some images because template matching is quite sensitive to scaling and rotation; even small changes can result in failure. In order to improve the robustness of traffic sign recognition, a recognition method based on SVM is proposed.

As mentioned above, because of scaling and rotation, the robustness of traffic sign recognition cannot be guaranteed; moreover, some traffic signs suffer from occlusion and wear, which makes recognition even worse. The Fourier Descriptor is invariant to scaling and rotation, so it can describe the features of objects precisely at different shooting angles, and it also weakens the influence of occlusion and wear.

The contour of a sign is regarded as a closed curve; every pixel on it has a coordinate (x, y), which is described by the complex number x + jy. The Fourier Descriptor is then calculated through the discrete Fourier transform and normalization. The Fourier Descriptor within a certain frequency range is chosen as the classification feature for recognition [28].
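A common way to compute such a descriptor is sketched below. The normalization used here is an assumption for the example: the DC term is dropped (translation), magnitudes are taken (rotation and starting point), and they are divided by the magnitude of the first harmonic (scale). The function name and `n_coeffs` are hypothetical, and the contour is assumed to be an ordered list of boundary points.

```python
import cmath
import math


def fourier_descriptor(contour, n_coeffs=8):
    """Normalized DFT magnitudes of a closed contour given as (x, y) points."""
    z = [complex(x, y) for x, y in contour]
    n = len(z)

    def dft(k):
        return sum(z[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))

    scale = abs(dft(1))  # first harmonic carries the overall size
    return [abs(dft(k)) / scale for k in range(2, 2 + n_coeffs)]


def sample_shape(n=32):
    """A synthetic, non-symmetric closed contour traced counterclockwise."""
    pts = []
    for i in range(n):
        a = 2 * math.pi * i / n
        r = 2.0 + 0.5 * math.cos(3 * a) + 0.3 * math.sin(2 * a)
        pts.append((r * math.cos(a), r * math.sin(a)))
    return pts
```

With this magnitude-based normalization, a contour and its mirror image traced in the same direction produce identical descriptors, which may be one reason why left-right mirrored signs are hard to separate with this feature alone.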

SVMs are supervised learning models with associated learning algorithms that analyze data and recognize patterns; they are mainly used for classification and regression analysis. Essentially, an SVM is a linear classifier that divides points in an n-dimensional space into different classes [29, 30]. Thus, it can be used to classify the features of traffic signs, in other words, to recognize them.

The Fourier Descriptors of 63 samples are used to test the classification performance of SVM, and the result is shown in Table 1.

We obtained accuracy rates higher than 90% for most signs using the Fourier Descriptor as the feature; however, it is difficult for SVM to distinguish “Turn left” from “Turn right,” as well as “Go straight and turn left” from “Go straight and turn right”; in other words, SVM alone should not be used to recognize axisymmetric images. For this case, a method combining template matching and SVM is proposed in this paper.

While classifying signs with SVM, axisymmetric signs and nonaxisymmetric signs are separated; for example, “Turn left” and “Turn right” belong to one class, “Go straight and turn left” and “Go straight and turn right” belong to another, while the rest of the signs belong to their own groups. The classes of axisymmetric signs will be recognized through the method of template matching. The accuracy is greatly improved since there are only two types of signs in each class and they are quite different in contour. The whole flowchart is shown in Figure 15.

5. Experimental Results

5.1. Experimental Platform

The proposed system was implemented on our experimental platform based on the CyberTiggo, which has four fisheye cameras mounted around the vehicle to collect surround view images and a Ladybug3 camera on the top to collect panorama images. An angle encoder for measuring the steering wheel angle and an odometer encoder are also installed on the platform. The on-board industrial computer has a 2.40 GHz processor and 4 GB of memory and runs Windows XP; real-time data acquisition and processing can be completed on it. Details are shown in Figure 16.

5.2. Results of Host Lane Recognition

1200 images with 2316 lane markings are used for evaluation of the surround view system, which cover a wide variety of situations. In Figure 17, several typical lane markings are shown in aerial view together with the recognition results. The quality of the lane markings is not consistent, and worn out markers are also included.

The recognition results are divided into two situations: true positives and false negatives. Two parameters, Precision and Recall, are used to measure the statistical results in Table 2. Table 3 shows the recognition result. The definitions of Precision and Recall are as follows:
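Assuming the standard definitions are the ones intended, Precision = TP / (TP + FP) and Recall = TP / (TP + FN), where TP, FP, and FN are the numbers of true positives, false positives, and false negatives. A minimal sketch, with a hypothetical function name and made-up counts in the usage note:

```python
def precision_recall(tp, fp, fn):
    """Standard precision and recall from detection counts."""
    precision = tp / (tp + fp)  # share of reported markings that are correct
    recall = tp / (tp + fn)     # share of actual markings that were found
    return precision, recall
```

For instance, 90 correct detections with 5 false alarms and 10 misses would give a precision of 90/95 and a recall of 0.9; these numbers are purely illustrative and are not results from this paper.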

The statistical result of host lane recognition is 97.19%, which indicates that the method proposed in this paper is practical; the main reasons for failed recognition are abrupt changes in lighting and badly worn lane markings.

5.3. Results of Multilane Recognition

More than 1000 images are used for evaluation of the panorama system, covering a wide variety of situations, such as tunnels, highways, and normal roads. Several results are shown in bird’s-eye view in Figure 18, including single dashed lines, single solid lines, and others; both curved and straight roads can be seen.

Like the recognition of host lanes, the recognition results of multilane are divided into true Positives and false Negatives. Precision and Recall are also used here to measure the statistical results in Table 4. The results of recognition are shown in Table 5.

As seen in Table 4, the recognition accuracy is lower when vehicle detection is not adopted, because other vehicles interfere with the feature detection. Compared with host lane recognition, both Precision and Recall are much lower in the panorama system; a possible reason is that panorama images are more complex than surround view images.

The statistical result of multilane recognition is 90.89%; abrupt changes in lighting and occlusion contribute most to wrong recognition.

5.4. Results of Traffic Signs Recognition

About 150 images are used for the evaluation of traffic sign recognition; most of them contain more than 3 traffic signs, and the recognition results can be seen in Figure 19. Precision and Recall are again used to verify the effectiveness of the sliding-window detection method adopted in this paper, and the statistical results are listed in Table 6. The results of recognition are shown in Table 7.

The sliding window method proves to be efficient in the detection of traffic signs; however, only 95.4% of all the traffic signs could be recognized correctly, and stains on the road and worn signs may be to blame.

6. Conclusions

This paper presents a vision-based approach for the detection and recognition of various kinds of road markings. A platform equipped with surround view system and panorama system is used to collect road information in a wide range.

For host lane detection and recognition based on the surround view system, the Hough transform is used to get the main direction of the road lane, and a histogram analysis is performed to obtain the features, including the color, number of lines, and dashed-or-solid style. Image compensation in the HSV color space further improved the results. In order to recognize host lanes and restore the road area blocked by the host vehicle, ego-motion estimation based on ICP is adopted to get the position and pose of the vehicle, and the road images are successfully restored through the Poisson fusion algorithm using the ego-motion parameters.

For detection and recognition of multiple lanes based on the panorama system, a method similar to that of the surround view system is used. In order to reduce the interference caused by other vehicles in the detection and recognition, ego-motion estimation based on the optical flow method is applied; the ego-motion parameters obtained in the surround view system are also used to optimize the result.

For traffic sign recognition, an approach combining template matching and SVM is proposed. The approach uses the Fourier Descriptor as the feature. Since SVM has difficulty in recognizing axisymmetric traffic signs, this paper proposes a novel algorithm that extracts features based on contour segments and then performs template matching to identify the axisymmetric signs.

Experimental results of detection and recognition of host lanes, multilane, and traffic signs substantiate the effectiveness and robustness of all the methods proposed in this paper.

The basic road information essential for a precise navigation map can be obtained through the methods proposed in this paper. Future work will focus on the specific process of generating a precise navigation map from the recognized road markings.

Conflict of Interests

The authors declare that they do not have any conflict of interests with the content of the paper.

Acknowledgments

This work was supported by the General Program of National Natural Science Foundation of China (61174178/51178268), the Major Research Plan of National Natural Science Foundation of China (91120018/91220301), and National Magnetic Confinement Fusion Science Program (2012GB102002).