Abstract

The Internet of Vehicles and information security are key components of a smart city. Real-time road perception is one of the most difficult tasks. Traditional detection methods require laborious manual parameter tuning and are susceptible to interference from object occlusion, lighting changes, and road wear. Designing a robust road perception algorithm therefore remains challenging. On this basis, we combine artificial intelligence algorithms with the 5G-V2X framework to propose a real-time road perception method. First, an improved model based on Mask R-CNN is implemented to improve the accuracy of lane-line feature detection. Then, linear and polynomial fitting of feature points in different fields of view are combined. Finally, the optimal parameter equation of the lane line is obtained. We tested our method in complex road scenes. Experimental results show that, combined with 5G-V2X, the method achieves faster processing and can sense road conditions robustly under various complex real-world conditions.

1. Introduction

The 5G network offers high transmission speed and low transmission delay, providing a more reliable communication environment for V2X. A 5G-based intelligent traffic management system has stronger management capabilities and robustness, which helps improve traffic flow. With the help of 5G-V2X, real-time perception of complex roads becomes possible. Therefore, 5G-V2X is key to a smart city [1, 2].

Lane detection is one of the most important tasks in understanding road scenes and is also the most complicated part. It uses extracted lane-marker information to locate the road and determine the relative position between the vehicle and the road. Vision-based lane detection is a relatively common solution, but lane markings vary widely: markings may be occluded by crowded vehicles, lines may be corroded and worn, and weather and other factors pose further challenges to lane detection [3–5].

Early cognitive algorithms for urban roads relied on manual design, which required a great deal of work. These methods use the Hough transform, random sample consensus, and similar techniques to segment the road area and detect lane lines [6]. Their obvious disadvantage is poor generalization: when the driving environment changes significantly, accuracy may drop sharply [7].

Convolutional neural networks have achieved great success in computer vision. Methods based on deep learning can directly learn domain knowledge from large datasets, greatly improving the ability to understand complex urban road scenes [8–10]. On this basis, we propose a new marker-detection method that combines a traditional detection algorithm with deep learning. First, we extract the overall road area by training Mask R-CNN [11–15]. The identified road area is used as a constraint region within which the lane marks are detected; the resulting discrete lane-line feature points are clustered using the least-squares method, and the lane lines are fitted in different fields of view using straight-line and curve-fitting models [16, 17].

The main contributions of this paper are twofold: (1) the model combines a convolutional neural network and 5G-V2X with a traditional algorithm, improving the speed of feature-point extraction; (2) in different fields of view, straight-line and curve-fitting models are used together, making the fitting result more accurate and yielding the optimal parameter equation of the lane line.

This article is organized as follows. Section 2 introduces related work. Section 3 describes the improved deep learning method used and the clustering and fitting algorithms for lane lines. Section 4 reports the results of the experiment. Section 5 provides our conclusions.

2. Related Work

In recent years, with the availability of parallel computing, the training process on large-scale data has accelerated. The convolutional neural network (CNN) has become a research hotspot and is widely used in computer vision and pattern recognition [18, 19]. A CNN can automatically learn the hierarchical features of an image, avoiding the blindness of manually designed and selected features, and shows excellent performance in object detection and instance segmentation tasks.

2.1. Semantic Segmentation

SegNet, a convolutional neural network, learns high-order features of a scene to perform road-scene segmentation. It classifies test images by applying a training algorithm to a common image dataset to generate training labels. A texture descriptor based on color-layer fusion obtains the maximum consistency of the road area [20], and offline and online information are combined to detect the road area. A combined hierarchical framework for road-scene segmentation can reliably estimate the topological structure of a scene and effectively identify traffic scenes of multilane roads [21]. The state-of-the-art instance segmentation methods Mask Scoring R-CNN and Cascade Mask R-CNN are both improvements based on Mask R-CNN, which proves the effectiveness of Mask R-CNN [22].

2.2. Lane Line Detection

Lane line detection is the most important part of the entire road surface inspection. In recent years, deep learning has enabled great success in computer vision. For the lane-line detection problem, a deep neural network is used to learn lane-line features, which improves the accuracy of feature extraction and suits complex road environments [23]. The University of Sydney used a CNN and an RNN to detect lane lines, with the CNN providing geometric information on lane-line structures for the RNN to use in detection. Kyungpook National University combined CNN and RANSAC algorithms to stably detect lane-line information even in complex road scenes [24]. The Baidu map project team proposed a dual-view convolutional neural network (DVCNN) for lane-line detection. The Korean Robotics and Computer Vision Laboratory proposed a method that extracts multiple regions of interest, merges regions that may belong to the same class, and uses a principal component analysis network (PCANet) and neural networks to classify candidate regions [25]. The same laboratory proposed a vanishing point guided network (VPGNet) [26] to solve lane-line and pavement-marking recognition and classification under complex weather conditions. The Ford Research and Innovation Center used the DeepLanes network to extract lane-line features acquired by cameras on both sides of a vehicle.

2.3. Clustering and Fitting

The lane-line features extracted by deep learning cannot be used directly; the feature points still need to be clustered and fitted. The main purpose of fitting is to depict the lane markings in a picture and display their locations in the road image [27]. In lane-line clustering and fitting algorithms, the commonly used road models are the linear model, the linear-parabolic model, least-squares (LS) curve fitting, cubic curves, and Bezier curves [28, 29].

2.4. 5G-V2X

The goal of the 5G-V2X communication system is to achieve accurate and efficient road-scene perception and accident-free, efficient collaborative autonomous driving [30]. The literature [31–33] proposed a network protocol based on edge computing and a new vehicular-network privacy-protection protocol to enhance road safety, intelligent transportation, and smart city systems. Literature [34] introduced a real-time communication method based on 5G-V2X, which reduces the energy and time costs of the system and improves the management efficiency of vehicular networks. Literature [35] proposed an intelligent-traffic vehicle detection model based on 5G-V2X, which can dynamically coordinate computing and content caching and effectively allocate network resources.

3. Complex Road Lane Line Detection and Processing

We divide the lane-line detection process into three steps. The first is to use deep learning to extract the feature points; the second is to cluster the extracted lane-line feature points; and the third is to fit the clustered points to obtain the lane-line model.

3.1. Feature-Point Extraction Based on Improved Model

Building deep models is a means of learning. Unlike traditional shallow learning, which relies on handcrafted features, deep learning uses deeper models, places a greater emphasis on feature learning, and trains on larger amounts of data. It can therefore better describe the internal correlations in the data.

As shown in Figure 1, Mask R-CNN consists of three parts. The first part is the backbone network, used for feature extraction. The second part is the head, used to obtain category scores and regression bounding boxes. The third part generates the mask.

The RPN in Mask R-CNN is the same as in Faster R-CNN, but with the added mask branch, the category, bounding box, and mask of each ROI are predicted in parallel. The ROIAlign operation of Mask R-CNN is shown in Figure 2.
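To make the ROIAlign idea concrete, the following is a minimal single-channel sketch, not the paper's implementation: each output bin is sampled at continuous coordinates by bilinear interpolation instead of being quantized to integer pixels, and one sample is taken per bin for brevity (the original uses four). The function names and the example ROI are our assumptions.

```python
# A minimal, single-channel ROIAlign sketch (one sample per bin).
import numpy as np

def bilinear(fmap, y, x):
    """Bilinearly interpolate feature map `fmap` at continuous (y, x)."""
    h, w = fmap.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (fmap[y0, x0] * (1 - dy) * (1 - dx)
            + fmap[y0, x1] * (1 - dy) * dx
            + fmap[y1, x0] * dy * (1 - dx)
            + fmap[y1, x1] * dy * dx)

def roi_align(fmap, roi, out_size=7):
    """roi = (y1, x1, y2, x2) in continuous feature-map coordinates.
    Each output bin is sampled at its center, with no quantization."""
    y1, x1, y2, x2 = roi
    bh, bw = (y2 - y1) / out_size, (x2 - x1) / out_size
    return np.array([[bilinear(fmap, y1 + (i + 0.5) * bh, x1 + (j + 0.5) * bw)
                      for j in range(out_size)] for i in range(out_size)])

fmap = np.random.rand(32, 32)
print(roi_align(fmap, (3.2, 4.7, 17.9, 21.3)).shape)  # (7, 7)
```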

The loss function of each ROI is redefined as

L = L_cls + L_box + L_mask.

In the above formula, both L_box and L_mask act only on positive ROIs. For the mask branch of each ROI, the output dimension is K·m², which encodes K binary masks of resolution m × m, one for each of the K categories. Therefore, a per-pixel sigmoid is applied for binary classification, and L_mask is defined as the average binary cross-entropy loss. For an ROI of ground-truth category k, L_mask is defined only on the k-th mask.

This definition of L_mask allows the network to generate a mask for each category with no competition between categories. A dedicated classification branch predicts the category label, thus decoupling the category and mask predictions. An FCN instead uses a per-pixel softmax and a multinomial cross-entropy loss, in which case the masks of different classes compete. Mask R-CNN uses a per-pixel sigmoid and a binary loss, avoiding this effect. Experiments prove that this is the key point for improving instance segmentation.
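As a concrete illustration of this decoupling, here is a minimal PyTorch-style sketch under assumed tensor shapes; it is not the authors' implementation. Only the mask of the ground-truth class of each positive ROI contributes to the loss, so classes do not compete.

```python
# Minimal sketch of the per-class sigmoid mask loss described above.
import torch
import torch.nn.functional as F

def mask_loss(mask_logits, gt_masks, gt_classes):
    """mask_logits: (N, K, m, m) raw outputs, one m x m mask per class.
    gt_masks:    (N, m, m) binary ground-truth masks of positive ROIs.
    gt_classes:  (N,) ground-truth class index of each ROI.
    Only the ground-truth class's mask enters the loss, so a per-pixel
    sigmoid is used and categories do not compete with each other."""
    n = mask_logits.shape[0]
    selected = mask_logits[torch.arange(n), gt_classes]   # (N, m, m)
    return F.binary_cross_entropy_with_logits(selected, gt_masks.float())

# Example: 4 positive ROIs, K = 3 classes, 28 x 28 masks.
logits = torch.randn(4, 3, 28, 28)
targets = torch.rand(4, 28, 28) > 0.5
classes = torch.tensor([0, 2, 1, 0])
print(mask_loss(logits, targets, classes))
```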

We use ResNet101 as the backbone of Mask R-CNN for feature extraction. ResNet comprises multiple computing blocks composed of convolution, bias, and batch normalization (BN). After training, some steps in the model are used only for forward propagation, and their redundant parameters can be reduced by parameter combination.

Parameter combination merges the five parameters of the bias and BN layers, namely the bias b, the BN mean μ, the variance σ², the scale γ, and the shift β, into two parameters a = γ/√(σ² + ε) and b′ = β + a(b − μ); the output can then be calculated as y = a·x + b′.
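A minimal numeric sketch of this parameter combination follows, with our own variable names; it folds the five bias/BN parameters into the two inference-time constants and checks the result against the unfolded computation.

```python
# Fold bias + BatchNorm parameters into a single scale and shift.
import numpy as np

def fold_bias_bn(bias, mu, var, gamma, beta, eps=1e-5):
    """Collapse the five parameters (bias b, BN mean mu, variance var,
    scale gamma, shift beta) into a scale a and shift b_new so that
    y = a * x + b_new reproduces BN(x + bias) at inference time."""
    a = gamma / np.sqrt(var + eps)
    b_new = beta + a * (bias - mu)
    return a, b_new

# Check against the unfolded computation on random data.
x = np.random.randn(1000)
bias, mu, var, gamma, beta = 0.3, 0.1, 2.0, 1.5, -0.2
a, b_new = fold_bias_bn(bias, mu, var, gamma, beta)
y_folded = a * x + b_new
y_ref = gamma * ((x + bias) - mu) / np.sqrt(var + 1e-5) + beta
print(np.allclose(y_folded, y_ref))  # True
```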

3.2. Driving Area Division

In this paper, the samples are labeled according to the Mask R-CNN labeling rules. The labeled sample images are sent to Mask R-CNN for training.

Mask R-CNN is well suited to detecting and segmenting lane markings, and its architecture is shown in Figure 3. However, the lane-marking features obtained through deep learning cannot be used directly. Although the lane-line features are learned by the deep network, the extracted features contain only the coordinate information of the lane lines. For lane lines formed by dashed segments, we still must identify which lane line each dashed segment belongs to, and this information is not available from discrete coordinate points. Moreover, a real driving scene has multiple lanes, and we must classify each lane line. Therefore, we propose a clustering method for lane-line feature points, which eliminates the interference between multiple lane lines and recovers their identities. This provides more accurate and comprehensive input for subsequent lane-line fitting. First, the pavement area is divided. Assume that during data acquisition the smart camera has a collection period of 10 milliseconds and the vehicle a top speed of 200 kilometers per hour; by calculation, the vehicle moves only about 0.56 meters per cycle. In principle, the camera can capture a clear image up to about 100 meters ahead of the vehicle, and lane lines farther away are difficult to recognize.

The original image acquired by the camera is shown in Figure 4. We take the upper edge of the overall road region identified by the neural network as the boundary of the region of interest; above it lies the sky, which contains no lane-marking information. The near-field area is the main part of our field of vision, where the lanes can be approximated as straight lines. The midfield area may contain a curve with small curvature. The far-field area is where the lane image reflects the curvature of the lane. After dividing the overall road information in this way, when designing the algorithm we need only attend to the near and middle fields of view of the image ahead of the vehicle within the divided road area.
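As an illustration of this division, the sketch below splits the detected road region into near-, mid-, and far-field row bands; the band fractions are our assumptions, not values from the paper.

```python
# Split the detected road region into near/mid/far row bands.
def split_fields(road_top, img_h, near_frac=0.5, mid_frac=0.3):
    """road_top: first image row of the detected road region (rows
    above it are sky). Returns (near, mid, far) as (row_start, row_end)
    bands, with the near field at the bottom of the image."""
    span = img_h - road_top
    near_start = img_h - int(span * near_frac)
    mid_start = near_start - int(span * mid_frac)
    return (near_start, img_h), (mid_start, near_start), (road_top, mid_start)

near, mid, far = split_fields(road_top=220, img_h=720)
print(near, mid, far)  # (470, 720) (320, 470) (220, 320)
```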

3.3. Clustering Feature Points

Let p_i = (x_i, y_i) denote the extracted feature-point coordinates. A feature segment l = {p_s, …, p_e, k, b, n} represents the lane-line marker-segment information, composed of a series of lane-line coordinates, where p_s and p_e are the starting and ending coordinates of the lane line, k and b are the line parameters of the clustered feature coordinates, and n is the number of coordinates in the current lane-line segment.

The feature coordinate points already included in a lane-line segment extracted by deep learning represent the current lane-line characteristic segment. We then use a least-squares line combined with the feature points to cluster the feature lines and judge the continuity of the horizontal position of the segments from the feature points. Because the lane lines are nearly vertical in the image, we fit x as a function of y, and the clustering (least-squares) equations are

k = (n Σ x_i y_i − Σ x_i Σ y_i) / (n Σ y_i² − (Σ y_i)²),
b = (Σ x_i − k Σ y_i) / n.

The horizontal constraint is given by

|x − (k·y + b)| ≤ Δd,

where Δd is the maximum deviation distance of the feature points. Under the perspective effect of the image, Δd decreases with distance. The feature-point clustering effect under this constraint is shown in Figure 5.
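The following sketch shows this clustering rule in code, assuming feature points are given as (x, y) pixel coordinates; the threshold value is illustrative.

```python
# Append a feature point to a segment when its horizontal deviation
# from the segment's least-squares line stays below a threshold.
import numpy as np

def fit_line(points):
    """Least-squares line x = k*y + b through (x, y) feature points.
    Fitting x as a function of y suits near-vertical lane lines."""
    pts = np.asarray(points, dtype=float)
    k, b = np.polyfit(pts[:, 1], pts[:, 0], 1)
    return k, b

def try_cluster(segment, point, d_max):
    """Add `point` to `segment` if it lies within d_max horizontally
    of the segment's current least-squares line."""
    k, b = fit_line(segment)
    x, y = point
    if abs(x - (k * y + b)) <= d_max:
        segment.append(point)
        return True
    return False

seg = [(100, 700), (104, 650), (109, 600)]
print(try_cluster(seg, (114, 550), d_max=6.0))  # True: near the line
print(try_cluster(seg, (300, 500), d_max=6.0))  # False: another lane
```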

3.4. Fitting Model Design

To obtain the optimal parameter equation of the lane line, we combine linear and polynomial fitting of the feature points in different fields of view.

3.4.1. Straight Line Fitting Model

To ensure that the regression line mainly includes normal data points whose error is close to zero, the noise points with the maximum error on the two sides of the regression line are removed. Considering the advantages and disadvantages of the Hough transform (HT) and least squares (LS), we propose a line-detection method that combines the two algorithms. First, HT is used to determine the approximate region of the line; then, the improved least-squares method determines the line parameters from the clustered points of each region. The algorithm flow is shown in Figure 6, and a code sketch follows the list.
(1) A probability-based Hough transform is performed on the lane-line features to obtain candidate straight lines.
(2) For each line obtained by HT detection, the feature points whose distance to the line is not greater than d are gathered into a set S.
(3) The regression-line parameters k and b of set S, together with the standard deviation σ of the residuals, are determined by the least-squares method.
(4) The feature points of S lying above the regression line form a subset S⁺, and those lying below it form a subset S⁻.
(5) In S⁺ and S⁻, the points with the greatest error are found, where the error of a point is its distance to the regression line.
(6) These two points are removed and the sets are updated; step 3 is repeated until the error is less than the stopping threshold.
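Below is a rough sketch of steps 1–6 using OpenCV's probabilistic Hough transform and NumPy least squares; the thresholds, the stopping rule, and the helper names are our assumptions rather than the paper's exact settings.

```python
# Combined HT + iterative least-squares line refinement (steps 1-6).
import cv2
import numpy as np

def refine_line(points, sigma_stop=1.0, max_iter=20):
    """Iteratively fit x = k*y + b to `points`, dropping the worst
    outlier above and below the regression line each round."""
    pts = np.asarray(points, dtype=float)
    for _ in range(max_iter):
        k, b = np.polyfit(pts[:, 1], pts[:, 0], 1)
        err = pts[:, 0] - (k * pts[:, 1] + b)       # signed residuals
        if err.std() < sigma_stop or len(pts) <= 3:
            break
        worst = {np.argmax(err), np.argmin(err)}    # one point per side
        pts = np.delete(pts, list(worst), axis=0)
    return k, b

def detect_lines(edge_img, feature_pts, d=8.0):
    """Steps 1-2: Hough lines pick candidate regions; feature points
    (an (N, 2) array) within distance d feed the LS refinement."""
    lines = cv2.HoughLinesP(edge_img, 1, np.pi / 180, threshold=40,
                            minLineLength=30, maxLineGap=10)
    fitted = []
    for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
        # distance from every feature point to the Hough line
        a, c, off = y2 - y1, x1 - x2, x2 * y1 - x1 * y2
        dist = np.abs(a * feature_pts[:, 0] + c * feature_pts[:, 1] + off)
        dist = dist / np.hypot(a, c)
        near = feature_pts[dist <= d]
        if len(near) >= 3:
            fitted.append(refine_line(near))
    return fitted
```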

3.4.2. Complex Road Curve Fitting Model

For intelligent vehicles with simple functions and low speed requirements, the lane-line position information obtained by the linear model can basically meet the requirements of lane-line detection, but it cannot adapt to real road environments of different shapes. The commonly used parabolic curve-fitting model cannot adapt at the junction of straight lines and curves. We therefore propose the following third-order polynomial fitting model.

Figure 7 shows an example of a lane model with two lanes and one exit. The model consists of several segments, each containing a set of clustered points. We use a third-order polynomial, x(y) = c0 + c1 y + c2 y² + c3 y³, to fit each lane line to the points of a segment. During fitting, each segment is an independent cubic spline model, where c0, c1, c2, and c3 constitute the parameter information of the function: c0 represents the lane boundary offset, c1 is used to derive the heading or yaw angle, c2 is used to derive the curvature, and c3 is used to derive the rate of change of the curvature. The left and right lane models are modeled separately, which also makes it possible to model nonparallel lane boundaries.
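A minimal sketch of fitting one such segment with NumPy follows; reading the coefficients as offset, heading, curvature, and curvature rate at the segment origin y = 0 matches the description above, while the synthetic data and names are ours.

```python
# Fit one cubic segment x(y) = c0 + c1*y + c2*y^2 + c3*y^3 and read
# off its geometry at the segment origin (y = 0).
import numpy as np

def fit_segment(xs, ys):
    """Fit a cubic spline segment to the clustered points of one lane
    segment and report the geometric meaning of its coefficients."""
    c3, c2, c1, c0 = np.polyfit(ys, xs, 3)     # highest power first
    return {
        "offset": c0,             # lateral offset of the lane boundary
        "heading": c1,            # dx/dy at y = 0 (heading / yaw)
        "curvature": 2 * c2,      # d2x/dy2 at y = 0
        "curvature_rate": 6 * c3, # d3x/dy3 at y = 0
    }

# Synthetic gently curving segment for illustration.
ys = np.linspace(0, 40, 15)
xs = 2.0 + 0.05 * ys + 1e-3 * ys**2 + 2e-5 * ys**3
print(fit_segment(xs, ys))
```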

3.5. Complex Road Multilane Line Constraint
3.5.1. Straight Line Merging Algorithm

The straight lines obtained from the Hough transform mostly carry valuable lane-line information, but some noise must still be eliminated. In principle, a lane line can be matched by only one straight line in the near or middle field. Therefore, we merge the straight lines within the near and middle fields of view. This paper introduces a similarity measure for this evaluation. As shown in Figure 5, we cluster lines of similar distance and direction into one class and then use least squares to fit the lane-line feature points on all the lines of a class.

Let P1 and P2 be the two end points of line l1, with inclination angle θ1, and let P3 and P4 be the two end points of line l2, with inclination angle θ2. The inclination angle of the straight line through P2 and P3 is θ3. Two lines are judged to belong to the same lane line when θ1, θ2, and θ3 agree within a tolerance.
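The sketch below implements one plausible form of this angle-based similarity test; the 5° tolerance and the function names are illustrative assumptions.

```python
# Merge two segments when their inclination angles and the angle of
# the bridging segment between them all agree within a tolerance.
import numpy as np

def inclination(p, q):
    """Inclination angle (radians, in [0, pi)) of the segment p -> q."""
    return np.arctan2(q[1] - p[1], q[0] - p[0]) % np.pi

def same_lane_line(l1, l2, ang_tol=np.deg2rad(5)):
    """l1 = (P1, P2), l2 = (P3, P4) with P2 and P3 the facing end
    points. Merge when theta1, theta2, and the bridging angle theta3
    of segment P2-P3 are pairwise within ang_tol."""
    (p1, p2), (p3, p4) = l1, l2
    t1, t2, t3 = inclination(p1, p2), inclination(p3, p4), inclination(p2, p3)
    def close(a, b):
        d = abs(a - b)
        return min(d, np.pi - d) < ang_tol   # angles wrap at pi
    return close(t1, t2) and close(t1, t3) and close(t2, t3)

l1 = ((100, 700), (130, 550))
l2 = ((136, 520), (160, 400))
print(same_lane_line(l1, l2))  # True: nearly collinear dashed marks
```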

3.5.2. Different Area Merging Algorithm

After obtaining the candidate marker lines in the different fields, we also need to merge the near- and midfield lines. For segmented lanes, the model of each zone must first be connected, and then the lane model is built. The connection method is divided into a straight-line model and a curve model. The straight-line connection compares the slopes of the lines; the curve connection compares the curvatures of the curves at the same point; and the connection is determined according to the distance between the segments to be merged. The specific method is as follows.

As shown in Figure 8(a), in the straight-line connection mode, A1 and A2 are the two end points of straight line L1, and B1 and B2 are the two end points of straight line L2. c1 and c2 are the vertical-axis coordinates of the intersections of the two straight lines with the dividing line, and k1 and k2 denote the slopes of L1 and L2, respectively. If the condition of equation (9), |k1 − k2| < T_k and |c1 − c2| < T_b, is satisfied, then L1 and L2 are connected to form a merged line segment. T_k is the slope-difference threshold for merging and T_b is the intercept-difference threshold; both can be set according to the actual situation.

As shown in Figure 8(b), in the connection mode at a curve, if the facing end point of curve S1 lies above that of S2, we extend the two end points of S1 and S2 to points B0 and C0 on the divider. If formula (9) is satisfied, we use the points of S1 together with the points of S2 to determine the new curved segment after the merge, with a threshold applied at the time of merging. In actual use, however, the selection of this threshold has a great impact; it fluctuates across different test sections and is not easy to set manually.
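The following sketch shows a possible form of the cross-field merge test, comparing slopes and the crossing points on the dividing line; the thresholds t_k and t_b are illustrative assumptions.

```python
# Join two candidate segments across the dividing line when their
# slopes and their crossings of the dividing row both agree.
def should_merge(seg_near, seg_far, y_divide, t_k=0.15, t_b=10.0):
    """Each segment is ((x1, y1), (x2, y2)). Compare slopes dx/dy and
    the x coordinates where the segments cross the dividing row."""
    def slope_and_cross(seg):
        (x1, y1), (x2, y2) = seg
        k = (x2 - x1) / (y2 - y1)           # dx/dy, robust for steep lanes
        x_cross = x1 + k * (y_divide - y1)  # extend to the dividing line
        return k, x_cross
    k1, c1 = slope_and_cross(seg_near)
    k2, c2 = slope_and_cross(seg_far)
    return abs(k1 - k2) < t_k and abs(c1 - c2) < t_b

near_seg = ((320, 700), (360, 480))
far_seg = ((365, 460), (385, 330))
print(should_merge(near_seg, far_seg, y_divide=470))  # True
```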

In fact, in the lane-line model design we made the idealizing assumption that the lane lines in the near- and midfield regions are approximately linear. This greatly reduces connection failures at corners.

4. Experiment and Analysis

4.1. Datasets and Validation

Our experiment was designed mainly to detect the real road information in real time, so the training and test datasets consisted of real road information. In this paper, we used the TuSimple lane dataset and TSD-MAX traffic scene dataset to verify the effectiveness of our method.

We labeled the lanes of the test dataset and checked the accuracy on the inverse-perspective-mapped (IPM) images. We defined the criterion for the lane line as

Accuracy = TP / (TP + FP + FN), (13)

where an input discrete point falling inside the true-value area is regarded as a positive checkpoint; TP is the total length of correctly detected lane marks; FN is the total length of missed detections and equals the total marked true value minus TP; and FP is the total length of false detections and equals all extracted lane-mark points minus TP. For the Mask R-CNN detection accuracy, formula (13) is also applicable as the intersection over union (IoU).
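In code, the metric reconstructed above can be computed from boolean lane masks as follows; this is an illustrative sketch, and for pixel masks it coincides with the IoU.

```python
# Accuracy = TP / (TP + FP + FN), which equals IoU for pixel masks.
import numpy as np

def lane_accuracy(pred_mask, gt_mask):
    """pred_mask, gt_mask: boolean arrays marking lane pixels.
    TP: predicted points inside the true-value area; FP: predictions
    outside it; FN: true-value pixels that were missed."""
    tp = np.logical_and(pred_mask, gt_mask).sum()
    fp = np.logical_and(pred_mask, ~gt_mask).sum()
    fn = np.logical_and(~pred_mask, gt_mask).sum()
    return tp / (tp + fp + fn)

pred = np.zeros((10, 10), bool)
pred[2:8, 4] = True
gt = np.zeros((10, 10), bool)
gt[3:9, 4] = True
print(lane_accuracy(pred, gt))  # 5 / (5 + 1 + 1) = 0.714...
```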

4.2. Complex Road Lane Line Detection Results

During the driving of an intelligent vehicle, false and missed detections often occur because of the complexity and diversity of the driving environment. Our proposed deep-learning-based lane-marker extraction algorithm effectively shields the extraction results from these uncertain factors, adapting to complex and varied real road environments. Figure 9 shows the results of training with our improved model. As shown in Figure 9, our model clearly identifies all the information.

Table 1 compares our improved method with several semantic segmentation methods on the TuSimple lane dataset. As shown in Table 1, our method is more accurate than most state-of-the-art methods on the TuSimple lane test dataset. SCNN places certain requirements on the location distribution of lane lines and suits the TuSimple lane dataset, which has a fixed number of lanes and few scene changes. Table 1 also compares the processing speeds of the models: our method takes only 150 ms to process a picture, which is better than the other methods.

4.3. Complex Road Lane Line Fitting Results

After extracting the feature points, we still need to fit the lane-line feature points. To verify the robustness of the lane-line fitting algorithm, we performed current-lane detection and multilane detection on multiple datasets. As shown in Table 2, the average accuracy was 97.61%, higher than that of the other fitting algorithms. Moreover, combining linear and polynomial fitting of feature points in different fields of view gives higher precision than the other algorithms. The detection accuracy therefore verifies the robustness of the proposed algorithm.

Table 3 shows the detection accuracy in different environments. N is the total number of frames, R_m is the missed positive rate, R_f is the false positive rate, and R_t is the true positive rate. R_m and R_f are calculated with formulas (14) and (15), respectively:

R_m = N_m / N, (14)
R_f = N_f / N, (15)

where N_m and N_f are the numbers of missed and falsely detected frames.
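Under the reconstruction above, the rates of Table 3 reduce to simple frame-count ratios; the sketch below additionally assumes, for illustration only, that every frame is counted exactly once as correct, missed, or false.

```python
# Frame-count ratios for the rates of Table 3 (illustrative only).
def detection_rates(n_total, n_missed, n_false):
    r_m = n_missed / n_total   # missed positive rate, formula (14)
    r_f = n_false / n_total    # false positive rate, formula (15)
    r_t = 1.0 - r_m - r_f      # true positive rate (our assumption)
    return r_m, r_f, r_t

print(detection_rates(n_total=1000, n_missed=12, n_false=8))
```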

Figures 10–14 show the results of fitting the lane lines in different scenes. The first column is the result of the line fitting, the second is the result of the curve fitting, and the third is the final combined result of the line and curve fitting. Other complicated situations are shown in Figure 15. As can be seen from the figures, our algorithm fits the real lane lines well and also performs well at corners.

4.4. Comparison Results

Figure 16 compares the response time of our method with that of the traditional method. The results show that our method responds faster, indicating that 5G-V2X greatly improves the perception of complex roads and meets real-time requirements.

5. Conclusions

In this article, we propose a real-time road perception method based on deep learning and 5G-V2X. Compared with traditional methods, it offers higher road perception ability and faster response time. Using linear and polynomial fitting of feature points in different fields of view, the lane markings can be extracted robustly under various complex practical conditions, and the optimal parameter equations of the lane lines can be obtained. The experimental results show that the method adapts well to various types of road scenes, with good detection and fitting performance in different scenarios, fast processing speed, wide applicability, and strong robustness. In future work, we will focus on optimizing the fitting algorithm to further improve the real-time performance of the proposed method in dense traffic scenarios.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Beijing Municipal Commission of Education Project (Nos. KM202111417001 and KM201911417001), the 2021 Vocational Education Project of China Association of Vocational and Technical Education (Grant No. SZ21C037), the National Natural Science Foundation of China (Grant Nos. 62102033, 61906017, 62006020, 61871039, and 62171042), the Collaborative Innovation Center for Visual Intelligence (Grant No. CYXC2011), and the Academic Research Projects of Beijing Union University (Nos. ZB10202003, ZK40202101, ZK120202104, and BPHR2020DZ02).