Abstract

Great changes have taken place in automation and machine vision technology in recent years. Meanwhile, the demands for driving safety, efficiency, and intelligence have also increased significantly. More and more attention has been paid to research on the advanced driver-assistance system (ADAS) as one of the most important functions in intelligent transportation. Compared with traditional transportation, ADAS is superior in ensuring passenger safety, optimizing path planning, and improving driving control, especially in an autopilot mode. However, Level 3 autonomy and above is still unavailable due to the complexity of traffic situations, for example, the detection of a temporary road marked by traffic cones. In this paper, an analysis of traffic-cone detection is conducted to assist with path planning under special traffic conditions. A special machine vision system with two monochrome cameras and two color cameras was used to recognize the color and position of the traffic cones. The results indicate that this novel method could recognize red, blue, and yellow traffic cones with 85%, 100%, and 100% success rates, respectively, while maintaining 90% accuracy in traffic-cone distance sensing. Additionally, a successful autopilot road experiment was conducted, proving that combining color and depth information for recognition of temporary road conditions is a promising development for intelligent transportation of the future.

1. Introduction

With rapid economic development, new opportunities have emerged for the automobile industry. In recent years, both car ownership and driver numbers have increased sharply in China. According to data from the Ministry of Communications, before 2018, China already had over 300 million vehicles and 400 million drivers [1], and with the fast increase in the number of vehicles, some serious traffic issues have become noticeable. First, traffic safety continues to be very challenging. Globally, more than 1.25 million people die in traffic accidents annually, with the cumulative total having exceeded 38 million since the start of the automobile industry [2–4]. The situation in China is not optimistic either, because over 100 thousand people are injured or die in traffic accidents every year, costing the economy more than 10 billion Renminbi (RMB). Second, traffic jams have become more and more serious. This has become a global problem in both developed and developing countries because traffic is approaching or exceeding road capacity. According to the 2019 report from AutoNavi, rush-hour traffic jams occurred in over 57% of the cities in China, while 4% of the cities suffered heavy congestion [5]. Traffic jams increase travel time, gasoline consumption, and exhaust emissions while at the same time decreasing driving safety tremendously.

The advanced driver-assistance system (ADAS), an important part of intelligent transportation, was developed to overcome the above problems [6]. With developments in telecommunication services, sensing technologies, automation, and computer vision, ADAS research has achieved positive results in traffic resource integration, real-time vehicle status monitoring, and driving environment monitoring [7–10]. Generally, ADAS consists of active safety and passive safety. Passive safety relies on certain devices, such as safety belts, airbags, and bumpers, to protect passengers and reduce damage [11]. However, passive safety cannot improve driving safety by itself because 93% of traffic accidents are caused by the drivers' lack of awareness of the danger [12]. Also, it has been reported that 90% of dangerous accidents could have been avoided if the drivers had been warned just 1.5 seconds earlier [13]. Consequently, active safety, developed to sense and predict dangerous situations, has been considered an important part of modern vehicles. By exchanging data with other devices on the Internet of Things (IoT), active safety modules can assist drivers in making decisions based on the overall traffic status and replace traffic lights for adaptive scheduling of vehicles at intersections [14]. Active safety modules can also estimate the risk of current driving behaviors by analyzing dynamic information from nearby vehicles via telecommunication services and cloud computing. If the risk is high and might cause a collision, the vehicle can warn the driver to correct the driving behavior, and in urgent cases, the active safety modules can take over control of the vehicle to avoid a traffic accident [15]. The latest active safety modules have achieved the identification of traffic signs by applying deep learning technology. As a result, a vehicle can recognize a traffic warning or limitation and remind the driver not to violate the traffic rules [16].

In response to the need for intelligent transportation, ADAS research has focused on autopilots, with many countries (especially the US, Japan, and some European countries) investing a lot of money and effort into their development and making outstanding achievements [17]. Vehicular ad hoc network (VANET) technology, which provides channels for collecting real-time traffic information and scheduling vehicle crossings in intersection zones, offers a new approach to relieving traffic pressure when traditional governance cannot solve the congestion issue effectively. It reduces the average vehicle waiting time and improves traveling efficiency and safety by gathering relevant traffic-related data and optimizing scheduling algorithms [18–20]. Many accidents caused by a driver's inattention to traffic signs can be avoided if the warnings are noticed in advance. The traffic-sign recognition function, which includes traffic-sign detection and traffic-sign classification, has been developed to solve this issue via machine vision technology. Since camera-captured images include a lot of useless information, sliding-window technology has been used to locate the traffic-sign region in the image. Then, certain algorithms, such as the histogram of oriented gradients (HOG), support vector machine (SVM), random forest, and convolutional neural network (CNN), are used for feature detection and classification [21–23]. With sliding-window technology being rather time-consuming, some researchers have proposed other solutions for locating traffic-sign regions (i.e., regions of interest (ROI)), which decreased the average image processing time to 67 ms [24]. One of the most important functions of ADAS is collision avoidance, where warning technology senses potential accident risks based on certain factors, such as vehicle speed, space between vehicles, and so on [22]. By installing proper sensors, like radar, ultrasonic sensors, or infrared sensors, multiple target vehicles and objects within 150 m can be measured with precision and rapidly assessed for a safe distance [21, 24]. One obvious challenge, however, is that space information may be missing in certain blind spots that sensors cannot detect [23]. To solve this problem, vehicle-to-vehicle (V2V) communication and the Global Positioning System (GPS) have recently been introduced. Since then, collision avoidance warnings have been based not only on passive measurements but also on status data about nearby vehicles collected through active communication [25].

Even though many different measures have been used in danger detection, one issue remains challenging. Colorful traffic cones that temporarily mark roads for road maintenance control or accident field protection are often hard to detect and process by space sensors due to their small size. If neither the driver nor the ADAS notices the traffic cones on the road, serious human injuries and property damage may occur. Some fruitful research on detecting traffic cones has been conducted using cameras and LiDAR sensors, relying on technologies such as machine vision, image processing, and machine learning [26–28]. However, some problems have become noticeable. First, high-quality sensors like LiDAR are expensive, and manufacturers are not willing to install them without a sharp cost decrease. Second, machine learning technology requires a lot of system resources, and on-board computers are often not powerful enough. Thus, the overall objective of this study was to develop a cost-effective machine vision system that can automatically detect road traffic cones based on the cone distribution to avoid any potential accidents. This method was able not only to recognize traffic cones on the roads but also to sense their distance and assist the automatic vehicle control in navigating them smoothly. This required the development of algorithms for quick recognition of traffic cones by color and for sensing the corresponding distance data.

2. Materials and Methods

2.1. Experiment Car and Traffic Cones

An experimental car was designed with a length of 2600 mm, a width of 1500 mm, and a height of 1650 mm, and its powertrain was composed of a 4 Ah battery and an 80 kW DC motor, as shown in Figure 1.

The controlling system of the car contained an embedded computer (Intel i7 CPU, 8 GB RAM), a vehicle control unit (VCU), a battery management system (BMS), a brake controller, a DC motor controller, and a machine vision system, as shown in Figure 2. The embedded computer, which worked as the brain of the car, not only controlled the machine vision system to capture the road images but also sent appropriate commands to the VCU after processing the road images and analyzing the car status. The VCU performed as a bridge between the embedded computer and the onboard hardware. It collected the real-time status data of the car and sent it to the embedded computer; at the same time, it controlled the BMS, the DC motor controller, and the brake controller by relaying valid commands from the embedded computer. For safety reasons, the VCU rejected any invalid commands or any commands received in the presence of a component error. Each part of the controlling system communicated through the CAN bus at a 250 kbps baud rate, except for the machine vision system, which exchanged data with the embedded computer through Ethernet.
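As a rough illustration of this command path, the sketch below shows how a single command frame might be sent from the embedded computer to the VCU over a Linux SocketCAN interface. The CAN identifier, payload layout, and function name are hypothetical, since the paper does not disclose the VCU message protocol; the 250 kbps bit rate itself would be configured on the CAN interface (e.g., ip link set can0 type can bitrate 250000).

    #include <cstring>
    #include <net/if.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <linux/can.h>
    #include <linux/can/raw.h>

    // Hypothetical example: send one command frame to the VCU over SocketCAN.
    // The CAN ID (0x101) and the two-byte payload are placeholders, not the
    // actual protocol used in the paper.
    int sendVcuCommand(const char* ifname, unsigned char speedCmd, unsigned char brakeCmd)
    {
        int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
        if (s < 0) return -1;

        ifreq ifr{};
        std::strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        if (ioctl(s, SIOCGIFINDEX, &ifr) < 0) { close(s); return -1; }

        sockaddr_can addr{};
        addr.can_family  = AF_CAN;
        addr.can_ifindex = ifr.ifr_ifindex;
        if (bind(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) { close(s); return -1; }

        can_frame frame{};
        frame.can_id  = 0x101;        // placeholder command identifier
        frame.can_dlc = 2;
        frame.data[0] = speedCmd;     // placeholder: target speed command
        frame.data[1] = brakeCmd;     // placeholder: brake level command
        int ok = (write(s, &frame, sizeof(frame)) == sizeof(frame)) ? 0 : -1;
        close(s);
        return ok;
    }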

The red, blue, and yellow traffic cones that are widely used on the roads in China were 200 mm × 200 mm × 300 mm (length, width, and height, respectively) with a reflective stripe attached in the middle, as shown in Figure 1. The red and blue traffic cones were used for indicating the left and right edges of a temporary road, while the yellow ones specified the start and end of a road in this experiment.

2.2. Machine Vision System

Figure 3 shows the Smart Eye B1 camera system (consisting of four cameras) chosen for this research. Two monochrome cameras, which composed a stereo vision system, were used for sensing real-time 3-dimensional environment data, whereas the two color cameras captured color information. According to the specifications of the Smart Eye B1 camera system, its space prediction error is <6% within a detectable range of 0.5–60 m. Additionally, this camera system can automatically adjust its white balance. The resolution of all cameras was set to 1280 × 720, and the frame rate of all cameras was set to 12 fps. Two independent Ethernet links with a 100 Mbit bandwidth handled the data exchange for the monochrome and color cameras. The camera was placed 1500 mm above the ground to simulate the field of view in a sedan. Example images are shown in Figure 4(a).

2.3. Range Detection via Stereo Vision

In this experiment, the two monochrome cameras were used to build a stereo vision system. A point P(x, y, z) in the world coordinate system was projected into the two cameras with the coordinates P_left(x_l, y_l, z_l) and P_right(x_r, y_r, z_r). Since the height of the two cameras was the same, the values of y_l and y_r were equal, and the 3-dimensional coordinates could be reduced to 2-dimensional coordinates for analysis, as shown in Figure 5. Here, f is the cameras' focal length, while b is the baseline between the left and right cameras.

According to the triangle similarity law, the following relation exists:
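The original equation is not reproduced in this version of the text; under the geometry of Figure 5 (left camera at the origin, right camera offset by the baseline b along the x axis), the standard similar-triangle relation would read

    \frac{z}{f} = \frac{x}{x_l} = \frac{x - b}{x_r} = \frac{y}{y_l} = \frac{y}{y_r}.    (1)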

From equation (1), the x, y, and z values can be calculated with the following equations:
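Again as a reconstruction of the missing equation from the standard stereo model rather than the authors' original typesetting, solving relation (1) with the disparity x_l − x_r gives

    x = \frac{b \, x_l}{x_l - x_r}, \qquad y = \frac{b \, y_l}{x_l - x_r}, \qquad z = \frac{b \, f}{x_l - x_r}.    (2)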

A depth image D(x, y), which included the object's distance information in each pixel, was generated from the z values as a 32-bit floating-point matrix that could be visualized via the handleDisparityPointByPoint() API from the camera system's Software Development Kit (SDK). A processed depth image is presented in Figure 4(b), with the warmer colors indicating longer distances. The original depth image was converted from the 32-bit floating-point matrix to a color image because the floating-point values exceeded the 0–255 pixel range and could not be displayed directly on the current operating system.
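As an illustration of this conversion step, the sketch below normalizes a 32-bit floating-point depth matrix to the 0–255 range and maps it to a color palette using OpenCV. It assumes the depth data are already available as a CV_32FC1 cv::Mat; the SDK call that produces them is not shown, and the choice of color map is only an assumption.

    #include <opencv2/opencv.hpp>

    // Convert a 32-bit floating-point depth matrix (CV_32FC1) into an 8-bit
    // color image for display; with COLORMAP_JET, warmer colors correspond to
    // larger (more distant) values after normalization.
    cv::Mat depthToColor(const cv::Mat& depth)
    {
        cv::Mat depth8u;
        // Scale the float depth values into the displayable 0-255 range.
        cv::normalize(depth, depth8u, 0, 255, cv::NORM_MINMAX, CV_8UC1);

        cv::Mat colored;
        cv::applyColorMap(depth8u, colored, cv::COLORMAP_JET);
        return colored;
    }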

2.4. Traffic Cone Detection

Traffic cone detection, which was developed in C++ with the OpenCV library, consisted of four functions: color recognition, size and distance calculation, noise filtering, and traffic cone marking.

2.4.1. Color Recognition

All traffic cones had the same shape, size, and reflective stripes and differed only in color. Since the differences between the yellow, red, and blue colors were obvious, the cones could be distinguished by processing the color images during the daytime. The color detection algorithm is shown in equation (3). The red, green, and blue values in each pixel of the color image H(x, y) were used for ratio calculations that determined the color feature of that pixel. The thresholds T1 to T7 were set based on the experimental results:
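The exact ratio expressions and the thresholds T1–T7 are not reproduced here, so the following sketch only illustrates the general form of such a per-pixel channel-ratio test; the specific ratios and numeric thresholds are placeholders rather than the values used in the paper.

    #include <opencv2/opencv.hpp>

    enum class ConeColor { None, Red, Blue, Yellow };

    // Per-pixel color test based on channel ratios, in the spirit of equation (3).
    // The ratio forms and the thresholds below are placeholders, not the authors'
    // actual T1-T7 values.
    ConeColor classifyPixel(const cv::Vec3b& bgr)
    {
        const double B = bgr[0], G = bgr[1], R = bgr[2];
        const double sum = B + G + R + 1e-6;              // avoid division by zero
        const double r = R / sum, g = G / sum, b = B / sum;

        if (r > 0.45 && g < 0.30 && b < 0.30) return ConeColor::Red;     // placeholder thresholds
        if (b > 0.45 && r < 0.30 && g < 0.35) return ConeColor::Blue;    // placeholder thresholds
        if (r > 0.35 && g > 0.35 && b < 0.20) return ConeColor::Yellow;  // placeholder thresholds
        return ConeColor::None;
    }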

2.4.2. Size and Distance Calculation

When all traffic cone pixels in the image H(x, y) were marked, each traffic cone's size and distance were calculated, as shown in equation (4). The size S was the number of pixels in one isolated traffic cone area in H(x, y), while the distance D was the average gray value of the same area in the depth image D(x, y).
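A minimal sketch of this step, assuming a binary mask produced by the color test and a depth image aligned with it; connected-component analysis stands in for the "isolated traffic cone areas," and all names are illustrative.

    #include <opencv2/opencv.hpp>
    #include <vector>

    struct ConeCandidate { int size; double meanDepth; cv::Rect box; };

    // For each isolated cone-colored region in the binary mask, compute its
    // pixel count S and the average depth D over the same pixels (equation (4)).
    std::vector<ConeCandidate> measureCandidates(const cv::Mat& mask, const cv::Mat& depth)
    {
        cv::Mat labels, stats, centroids;
        int n = cv::connectedComponentsWithStats(mask, labels, stats, centroids);

        std::vector<ConeCandidate> candidates;
        for (int i = 1; i < n; ++i) {                      // label 0 is the background
            ConeCandidate c;
            c.size = stats.at<int>(i, cv::CC_STAT_AREA);   // S: number of pixels
            c.box  = cv::Rect(stats.at<int>(i, cv::CC_STAT_LEFT),
                              stats.at<int>(i, cv::CC_STAT_TOP),
                              stats.at<int>(i, cv::CC_STAT_WIDTH),
                              stats.at<int>(i, cv::CC_STAT_HEIGHT));
            c.meanDepth = cv::mean(depth, labels == i)[0]; // D: average depth in the region
            candidates.push_back(c);
        }
        return candidates;
    }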

2.4.3. Noise Filtering and Target Marking

Since various objects showed up in the color images with colors similar to those of the traffic cones, it was necessary to eliminate them as noise. Because the traffic cone size was in inverse proportion to the distance in the images, filtering of the fake traffic cone pixels was conducted based on the size S and the average distance data D, as shown in equation (5). A traffic cone candidate was rejected if S was less than the threshold at distance D and confirmed if S was equal to or larger than the threshold at D. Finally, minimal external rectangles were calculated to mark all of the confirmed traffic cones in the area as the detected traffic cones:
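Continuing the sketch above (and reusing its ConeCandidate struct), the filtering and marking steps might look as follows; the size-threshold model k/D and the constant k are placeholders, since the actual threshold values of equation (5) are not given in the text.

    #include <algorithm>
    #include <opencv2/opencv.hpp>
    #include <vector>

    // Keep a candidate only if its pixel count S is at least the expected
    // minimum size at its average depth D (equation (5)). The k/D model and
    // the constant k are placeholders, not the authors' calibrated values.
    bool isRealCone(int S, double D, double k = 5000.0)
    {
        const double threshold = k / std::max(D, 0.5);
        return S >= threshold;
    }

    // Mark each confirmed cone with its minimal external rectangle.
    void markCones(cv::Mat& image, const std::vector<ConeCandidate>& candidates)
    {
        for (const auto& c : candidates)
            if (isRealCone(c.size, c.meanDepth))
                cv::rectangle(image, c.box, cv::Scalar(255, 255, 255), 2);
    }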

3. Results and Discussion

The experiment was separated into a color marking test and a distance matching test. The color marking test was mainly focused on the traffic cone recognition, whereas the distance matching test validated the space measuring function. In addition, a road test was conducted to validate the algorithm’s stability and efficiency.

3.1. Traffic Cone Recognition Test

Twenty red traffic cones, fourteen blue cones, and sixteen yellow cones were manually placed in front of the experiment car. As shown in Figure 6, recognized traffic cones were marked by rectangles with the same colors as the bodies of the cones, whereas the unrecognized ones were marked with white rectangles. The blue and yellow traffic cones reached a 100% detection success rate, while the red ones were accurately detected 85% of the time. The three undetected red traffic cones were located close to the left and right edges of the image and stood on a section of the playground that was reddish in color. Also, one of them was 10 m away from the camera, and the other two were over 20 m away. The ground color might have influenced red color recognition.

3.2. Distance Matching Test

After the traffic cone marking process, the distance data matching test was conducted, and the experiment results are shown in Figure 7. The fourteen blue and sixteen yellow traffic cones were matched with the corresponding distance data from the depth image with a 100% accuracy rate. However, only 15 out of 20 red traffic cones had corresponding distance data in the pixel area of the depth image. Besides the three red traffic cones undetected in the recognition test, another two red ones on the left side, which were close to a blue pole, were mismatched in color and depth. The overlap might be the reason for this error. Consequently, 45 out of 50 traffic cones were successfully paired with their distance information, and the overall success rate was 90%. For the paired traffic cones, the error between the predicted and manually measured distances ranged from 2 cm to 1.1 m, increasing with the distance between the camera and the cone. This error was within 6% and was acceptable while the experiment car ran at a speed of 10 km/h.

3.3. Road Test

To simulate a temporary road, the traffic cones with red color were designated as the left road boundary and the blue ones were designated as the right road boundary. The yellow traffic cones were used to indicate the start and end of the temporary road. The distance between any two traffic cones of the same color was 5 m, and the width of the temporary road as marked by the red and blue cones was 3 m. The temporary road included a curve-line section and a straight-line section, and the road test images are shown in Figure 8.

The experiment demonstrated that the machine vision system could detect red, blue, and yellow traffic cones, and the experiment car in an autopilot mode could successfully navigate the temporary road at a speed of 10 km/h. Without the influence of similarly colored background, the recognition success rate increased. At times, one or two traffic cones were missing from a frame of the color and depth images, which might be explained as follows. First, some cones that were near the left and right edges of the images could not be paired in color and depth, as also happened in the initial static test. Since the distance between the car and the traffic cones near the edge of the image was quite long, this error would not impact driving safety. Moreover, 12 frames of color and depth images were captured per second, so the missing cones could be detected in the following frames as they moved away from the image boundary area. Second, traffic cones that were entering or leaving the images while the experiment car was moving might not have been detected if they showed up only partially. Once these traffic cones fully entered the images, this problem was solved automatically.

4. Conclusion

An image processing algorithm based on color and depth images was successfully applied to traffic cone detection. Each image frame, which included one color and one depth image, was captured and processed within 80 ms. The traffic cones were accurately recognized by color, with success rates of 85%, 100%, and 100% for red, blue, and yellow cones, respectively. Additionally, the distance was successfully sensed for 90% of the traffic cones by pairing the color and depth images. Some of the cones were missing in some of the image frames when they were located near the image edges, but they could be found in the following frames of the dynamic test. With 12 frames per second in the machine vision system, cones at the edges naturally came in and out of the field of view of the moving camera. This method was very effective on a temporary road marked by traffic cones of different colors. The advantages of using paired color and depth images for traffic cone detection can be summarized as follows. (1) This method is sensitive to small safety-related traffic cones. (2) It uses a highly efficient and stable algorithm for recognition processing. (3) It is a cost-effective solution for maintaining safe driving on temporary roads.

Data Availability

All data presented and analyzed in the study were obtained from laboratory tests at Beijing Information Science & Technology University in Beijing, China. All laboratory testing data are presented in the figures and tables in the article. We will be very pleased to share all our raw data. If needed, please contact us via e-mail: [email protected].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors wish to thank the National Defense Science and Technology Project (JCCPCX201705). The authors also appreciate the great support from Beijing Information Science & Technology University with Qin Xin Talents Cultivation Program (QXTCPA201903 and QXTCPB201901), Scientific Research Level Promotion Project (2020KYNH112), and School Research Fund (2025041).