Abstract

Intelligent transportation applications based on the urban Internet of Things can improve the efficiency of government services and promote urban modernization. As smart cameras become more widely deployed in cities, artificial intelligence has become a key enabler of license plate recognition. An efficient license plate recognition algorithm not only improves the efficiency of traffic management but also reduces management costs. This paper proposes a network based on the ShuffleNetV2 dilated convolution (SDC) model, which includes two parts: license plate location and license plate recognition. The SDC model adopts ShuffleNetV2 as the backbone network and combines dilated convolution with global context blocks, which enhances the receptive field and feature expression ability of the model. For license plate location, CIOU loss considers not only the coverage area of the bounding box but also the center distance and aspect ratio. For license plate recognition, CTC loss trains the network on sequences and solves the sample alignment problem, which improves the accuracy of license plate recognition. The experiments show that the precision of the SDC model in license plate location is 98.7%, which is 5.2%, 5.5%, and 4.1% higher than that of Faster-RCNN, YOLOv3, and SSD, respectively. The precision of the SDC model in license plate recognition is 98.2%, which is 5.3%, 3.7%, and 2.9% higher than that of LPRNet, AlexNet, and RPNet, respectively.

1. Introduction

Intelligent transportation is an important foundation and guarantee for national economic and social development [1]. With the innovation of the Internet of Things and big data [2, 3], further promoting the digitization and intelligence of the transportation industry is essential and is conducive to its better and faster development. With the advancement of social modernization and the improvement in people's quality of life, the numbers of intelligent cameras and vehicles are growing rapidly, which poses great challenges for traffic management costs [4, 5]. Therefore, accurately obtaining traffic data and constructing traffic data processing models are prerequisites for building intelligent transportation, and this problem can be addressed by big data technology [6].

Based on the Internet of Things, intelligent transportation can improve not only traffic quality but also the efficiency of traffic management [7]. Advanced video monitoring equipment and intelligent identification methods extend management in time, space, and scope, which continuously improves the fineness of transportation systems. As the basic work of intelligent transportation, license plate recognition technology based on deep learning lays a good foundation for the subsequent analysis and decision-making of the Internet of Things.

In actual scenes, captured images contain many complex objects, such as people and vehicles. Before recognizing the license plate, we need to locate the vehicle in the image. Traditional license plate location methods are based on edge detection [8], color features [9], and mathematical morphology [10]. However, these methods are greatly affected by the external environment and image quality. Du et al. [11] proposed the SSD model with VGG19 as the base network. SSD predicts the position offset between each bounding box and the ground-truth box. However, SSD may predict position offsets that are too large and fall outside the range of the image. Redmon et al. [12] proposed the YOLOv3 model, which is built with Darknet-53 as the backbone. YOLOv3 constrains the predicted position offset to the range of 0 to 1 by applying a sigmoid activation, which solves the problem of excessively large position offsets. These two methods are one-stage methods [13], which have the advantage of fewer calculations and can save time, but they lack accuracy. Ren et al. [14] proposed the Faster-RCNN model, which is a two-stage method. It has great advantages in terms of accuracy, but it requires considerable computation and time [15].

At present, license plate recognition algorithms include template matching [16] and feature analysis matching methods [17]. The template matching method resizes the license plate characters and matches them against the template characters in the sample libraries. The feature analysis matching method extracts features and discriminates results according to the number and shape of the character contours. However, both are obviously affected by illumination, noise, and character occlusion. Zherzdev et al. [18] proposed the LPRNet model, which does not need to segment characters and effectively alleviates the gradient problem. However, its recognition accuracy in complex situations is not high and easily decreases during training. Xiu et al. [19] proposed an end-to-end license plate recognition algorithm based on the AlexNet model. The AlexNet model uses overlapping pooling, local normalization, and dropout to improve accuracy, but its convolution and pooling operations during training cause a loss of features. Xu et al. [20] proposed the RPNet model, in which the feature maps are shared and the loss functions are jointly optimized. However, the model lacks spatial information and a sufficient receptive field, which leads to incomplete feature extraction.

To solve the accuracy problems of the above algorithms, this paper proposes a network based on the ShuffleNetV2 [21] dilated convolution (SDC) model:
(1) The SDC model adopts ShuffleNetV2 as the backbone network and combines dilated convolution with global context blocks. Therefore, the receptive field and feature expression ability of the model are enhanced.
(2) For license plate location, CIOU loss considers not only the coverage area of the bounding box but also the center distance and aspect ratio. For license plate recognition, CTC loss trains the network on sequences and solves the sample alignment problem, which improves the accuracy of license plate recognition.

2. Materials and Methods

To address the problems regarding the accuracy of license plate recognition [22], this paper proposes a network based on the ShuffleNetV2 dilated convolution (SDC) model, which includes two parts: license plate location and license plate recognition.

2.1. Dilated Convolution

Because vehicle recognition systems in practical applications face different scenarios (e.g., road traffic and highway toll stations) [23], the license plate occupies different proportions of the whole picture. In the face of these complex situations, the receptive field of a standard convolution cannot solve our problem.

Dilated convolution adds dilation to an ordinary convolution [24], which increases the size of the receptive field in the calculation process. The dilation rate is used to control the interval between the points of the convolution kernel. In Figure 1, the left figure shows a standard convolution kernel (dilation rate of 1), with the receptive field represented as the blue region. The right figure represents the dilated convolution with a larger dilation rate; the dilation enlarges the receptive field, so the convolution covers a larger area.

The calculation of the dilated convolution receptive field is shown below:

$$F = (k - 1) \times r + 1,$$

where $F$ indicates the receptive field, $r$ indicates the dilation rate, and $k$ is the size of the convolution kernel.
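As an illustration of the relationship above, the following sketch (assuming PyTorch; the channel numbers and input size are arbitrary) shows that raising the dilation rate of a 3 × 3 convolution enlarges its coverage without changing the number of parameters or the output size:

```python
# A minimal sketch (not the authors' code) of how the dilation rate enlarges
# the receptive field of a 3x3 convolution, assuming PyTorch.
import torch
import torch.nn as nn

def effective_receptive_field(kernel_size: int, dilation: int) -> int:
    # Single-layer receptive field: F = (k - 1) * r + 1, as in the formula above.
    return (kernel_size - 1) * dilation + 1

x = torch.randn(1, 3, 32, 32)  # dummy feature map

standard = nn.Conv2d(3, 8, kernel_size=3, padding=1, dilation=1)
dilated = nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2)

print(standard(x).shape, effective_receptive_field(3, 1))  # 3x3 coverage
print(dilated(x).shape, effective_receptive_field(3, 2))   # 5x5 coverage, same output size
```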

2.2. License Plate Location

The conventional license plate location algorithms, including the one-stage and two-stage methods, cannot achieve a good balance of speed and accuracy in complex situations. Therefore, this paper uses an anchor-free location method based on the four corner points, and the calculation method is shown below:

$$f(x, y) = A \exp\left( - \left( \frac{(x - x_{gt})^{2}}{2\sigma_{x}^{2}} + \frac{(y - y_{gt})^{2}}{2\sigma_{y}^{2}} \right) \right),$$

where $A$ is the amplitude, $(x, y)$ is the predicted coordinate of the corner, $(x_{gt}, y_{gt})$ is the ground-truth coordinate, and $\sigma_{x}^{2}$ and $\sigma_{y}^{2}$ represent the variance.
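The following sketch is one possible reading of this Gaussian score (not the authors' implementation); the corner coordinates and variances are hypothetical values:

```python
# A minimal sketch of the Gaussian score for one predicted corner, following
# the reconstructed formula above; all numeric values are illustrative.
import math

def gaussian_corner_score(pred, gt, var_x, var_y, amplitude=1.0):
    """2D Gaussian score: close to the amplitude when the predicted corner is near the ground truth."""
    dx2 = (pred[0] - gt[0]) ** 2
    dy2 = (pred[1] - gt[1]) ** 2
    return amplitude * math.exp(-(dx2 / (2.0 * var_x) + dy2 / (2.0 * var_y)))

# Example: a corner predicted 2 px away from its ground-truth location.
print(gaussian_corner_score(pred=(102.0, 50.0), gt=(100.0, 50.0), var_x=4.0, var_y=4.0))
```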

This paper uses the lightweight ShuffleNetV2 network to build the model, which reduces the training time. The lightweight ShuffleNetV2 network contains multiple shuffle blocks. The channel split operation in the shuffle block divides the feature channels into two branches, which reduces the number of parameters in each branch and improves the operation speed. Combining global context blocks and dilated convolution enhances the spatial information of the model and increases its receptive field.
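For illustration, a stride-1 shuffle unit with channel split and channel shuffle might look as follows (a sketch assuming PyTorch; the channel counts are arbitrary, and the exact block configuration of the SDC model may differ):

```python
# A minimal sketch of a stride-1 ShuffleNetV2 unit (channel split + channel shuffle).
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class ShuffleUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False),  # depthwise conv
            nn.BatchNorm2d(half),
            nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Channel split: one branch is kept as identity, the other is transformed.
        x1, x2 = x.chunk(2, dim=1)
        out = torch.cat([x1, self.branch(x2)], dim=1)
        return channel_shuffle(out)  # mix information between the two branches

print(ShuffleUnit(64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```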

In Figure 2, the gray block represents the stem block with dilated convolution [25]. The stem block uses three small convolutions instead of one large-scale convolution to reduce the loss of information and is combined with dilated convolution to increase the receptive field. The loop module includes three shuffle blocks and a global context-dilated convolution (GC-DC) block [26]. The green block represents the shuffle block with a stride of 2, which is used to compress the width and height of the feature layer; it not only reduces the amount of calculation but also retains more feature information to improve the effect of feature extraction. The yellow block represents the shuffle block with a stride of 1, which is used to deepen the network. The orange block is the GC-DC block, which represents the global context block with dilated convolution. The blue line represents the skip connection, and the red line indicates the loop operation. The loop module is followed by a downsampling operation, which yields three feature maps with different scales. "Add" is the feature fusion operation. The blue block is the residual block, and the white block is the convolution. After applying the residual block and convolution, the located license plate image is obtained.
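The global context part of the GC-DC block can be sketched as follows, in the spirit of the cited global context block (assuming PyTorch; how the dilated convolution is attached inside the authors' GC-DC block is not detailed here, so this is illustrative only):

```python
# A minimal sketch of a global context (GC) block: softmax-attention global
# pooling followed by a bottleneck transform that is added back to the input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextBlock(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)   # context attention map
        hidden = max(channels // reduction, 1)
        self.transform = nn.Sequential(                      # bottleneck transform
            nn.Conv2d(channels, hidden, 1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
        )

    def forward(self, x):
        n, c, h, w = x.size()
        # Context modeling: softmax-weighted global pooling over all positions.
        weights = F.softmax(self.attn(x).view(n, 1, h * w), dim=-1)        # (n, 1, hw)
        context = torch.bmm(x.view(n, c, h * w), weights.transpose(1, 2))  # (n, c, 1)
        context = context.view(n, c, 1, 1)
        # Fusion: add the transformed global context back to every position.
        return x + self.transform(context)

print(GlobalContextBlock(128)(torch.randn(1, 128, 28, 28)).shape)
```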

The loss function used in license plate location is CIOU loss [27]. Compared with IOU loss [28], GIOU loss [29], and DIOU loss [27], it considers the overlapping area, the center point distance, and the aspect ratio, so the model converges faster and achieves higher regression accuracy when selecting bounding boxes:

$$\mathcal{L}_{CIOU} = 1 - IOU(B, B^{gt}) + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v,$$

$$v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2}, \qquad \alpha = \frac{v}{(1 - IOU) + v},$$

where $B$ and $B^{gt}$ are the two bounding boxes; $IOU(B, B^{gt})$ is the ratio of the intersection and union of these two boxes; $b$ and $b^{gt}$ denote the center points of the predicted and ground-truth boxes, respectively; $\rho(\cdot)$ represents the Euclidean distance between the center point of the bounding box and the center point of the ground truth; $c$ represents the diagonal distance of the minimum circumscribed rectangle of the predicted box and the ground-truth box; $w$ and $h$ are the width and height of the bounding box, respectively; and $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth box, respectively.
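A minimal sketch of CIOU loss for axis-aligned boxes, following the formula above (assuming PyTorch; the box coordinates at the end are illustrative):

```python
# A sketch of CIoU loss for boxes in (x1, y1, x2, y2) form; not the authors' code.
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    # Intersection over union.
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared center distance, normalised by the enclosing-box diagonal.
    cx_p, cy_p = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx_t, cy_t = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term.
    w_p, h_p = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w_t, h_t = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss(torch.tensor([50., 40., 150., 90.]), torch.tensor([55., 42., 160., 95.])))
```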

2.3. License Plate Recognition

Conventional license plate recognition methods require character segmentation before training, which increases the training time. Therefore, this paper uses the lightweight ShuffleNetV2 network as the backbone, combined with dilated convolution and CTC loss [30], and adopts end-to-end training without character segmentation. In Figure 3, the white block is the convolution, and the blue block is the pooling. The yellow block represents the shuffle block with a stride of 1. The part in the red box is the loop module. The black block represents the downsampling operation, and "Add" is the feature fusion operation. The orange block is the GC-DC block, which represents the global context block with dilated convolution.

The loss function used in the license plate recognition network is CTC loss. A traditional BP neural network is trained frame by frame, whereas CTC loss trains by sequence. Frame-based training requires aligned samples; however, the location and proportion of the license plate differ in each frame, which makes it difficult to align samples in practice. CTC loss does not require sample alignment: it only needs the output to map to the correct character sequence.

The CTC training objective is built from the following probabilities:

$$p(\pi \mid x) = \prod_{t=1}^{T} y_{\pi_t}^{t}, \quad (4)$$

$$p(l \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid x), \quad (5)$$

$$p(l \mid x) = \sum_{s=1}^{|l'|} \frac{\alpha_t(s)\,\beta_t(s)}{y_{l'_s}^{t}}. \quad (6)$$

In Formula (4), $x$ is the input data, $y$ is the output data (the per-frame character probabilities), and $\pi$ is an output path; $l$ is the sequence label, and $l'$ is the sequence label with blank characters. In Formula (5), $\mathcal{B}$ represents the set of many-to-one mappings from output paths to label sequences. In Formula (6), $p(l \mid x)$ is the sum of the probabilities that the input is $x$ and the output is $l$, $\alpha_t(s)$ is the forward recursive probability sum, and $\beta_t(s)$ is the reverse recursive probability sum.
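In practice, sequence training with CTC loss can be set up as in the following sketch (assuming PyTorch's nn.CTCLoss; the vocabulary size, time steps, and 7-character plate length are assumptions for illustration, not the paper's settings):

```python
# A minimal sketch of training a recognition head with CTC loss.
import torch
import torch.nn as nn

num_classes = 68          # e.g. provinces + letters + digits + blank (assumed size)
T, N = 18, 4              # time steps per plate, batch size (illustrative)

logits = torch.randn(T, N, num_classes, requires_grad=True)  # stand-in for network output
log_probs = logits.log_softmax(dim=2)
targets = torch.randint(1, num_classes, (N, 7))               # 7-character plates
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 7, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)   # index 0 reserved for the blank
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()   # in practice the log_probs come from the recognition network
print(loss.item())
```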

3. Results and Discussion

The dataset used in our experiments is the CCPD2019 dataset (https://github.com/detectRecog/CCPD). CCPD2019 is an open-source license plate dataset released by the University of Science and Technology of China and is an authoritative dataset in the field of license plate recognition. As shown in Figure 4, the license plate photos in CCPD2019 cover a variety of complex environments. The CPU used in our experiments is an Intel(R) Core(TM) i9-9820X CPU @ 3.30 GHz, and the GPU is an NVIDIA GeForce RTX 2080 Ti.

Precision and recall are used as the evaluation criteria in our experiments [31], as shown in Formulas (9) and (10):

$$Precision = \frac{TP}{TP + FP}, \quad (9)$$

$$Recall = \frac{TP}{TP + FN}, \quad (10)$$

where TP represents the number of license plates correctly determined by the model, FP represents the number of non-license-plates incorrectly determined as license plates by the model, and FN represents the number of license plates incorrectly determined as non-license-plates by the model.
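A minimal sketch of these two criteria computed directly from the counts (the TP, FP, and FN values below are placeholders, not experimental results):

```python
# Precision and recall from the counts defined above.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

print(precision(tp=987, fp=13), recall(tp=987, fn=20))  # hypothetical counts
```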

3.1. License Plate Location Experiment

This experiment adopts the anchor-free location method based on the four corner points. In the training stage, it is necessary to locate the four corner positions of the license plate in each image and obtain the corresponding coordinates. We feed the corresponding coordinates and the images into the model for training. The effect of license plate location is expressed by the Gaussian score. When the Gaussian score during training is higher than the set threshold, the model starts training the recognition part. The threshold in this experiment is set to 0.85. The license plate location results are shown in Figure 5.

The located image is not directly input into the recognition network. A cropping operation first removes the other parts of the image and retains only the license plate. Figure 6 compares the location results and the retained license plates.
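One way to realize this cropping step is a perspective warp from the four located corner points, as in the following sketch (assuming OpenCV; the corner coordinates and the 94 × 24 output size are illustrative, not taken from the paper):

```python
# A minimal sketch of cropping and rectifying the located plate from its four corners.
import cv2
import numpy as np

def crop_plate(image, corners, out_w=94, out_h=24):
    """corners: four (x, y) points ordered top-left, top-right, bottom-right, bottom-left."""
    src = np.float32(corners)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    M = cv2.getPerspectiveTransform(src, dst)           # homography from plate to rectangle
    return cv2.warpPerspective(image, M, (out_w, out_h))

frame = np.zeros((720, 1160, 3), dtype=np.uint8)        # placeholder for the located image
plate = crop_plate(frame, [(412, 530), (588, 538), (586, 584), (410, 576)])
print(plate.shape)  # (24, 94, 3)
```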

Our approach is compared with three classical license plate location algorithms (Faster-RCNN, YOLOv3, and SSD). The comparison results on the CCPD2019 dataset are shown in Table 1.

As seen from the results in Table 1 and Figure 7, for the precision evaluation criterion, our approach is 5.2% more precise than Faster-RCNN, 5.5% more precise than YOLOv3, and 4.1% more precise than SSD. For the recall evaluation criterion, the recall of our approach is 8.0% higher than that of Faster-RCNN, 5.1% higher than that of YOLOv3, and 7.1% higher than that of SSD.

3.2. License Plate Recognition Experiment

The image obtained in Section 3.1 is input into the license plate recognition network. The results of license plate recognition are shown in Figure 8.

Our approach is compared with three representative license plate recognition algorithms (LPRNet, AlexNet, and RPNet). The comparison results of the four license plate recognition algorithms are shown in Table 2.

As seen from the results in Table 2 and Figure 9, for the precision evaluation criterion, our approach is 5.3% more precise than LPRNet, 3.7% more precise than AlexNet, and 2.9% more precise than RPNet. For the recall evaluation criterion, the recall of our approach is 7.1% higher than that of LPRNet, 4.0% higher than that of AlexNet, and 1.8% higher than that of RPNet.

4. Conclusions

The combination of big data and the Internet of Things can achieve good results by training models with data obtained from urban intelligent cameras [32]. Big data helps to establish a comprehensive traffic information system. By integrating the "data warehouses" of different regions and fields, an integrated utilization mode of public transport information can be constructed. The Internet of Things and big data technology are research focuses and will have a profound impact on intelligent transportation.

This paper proposes the SDC model, which combines dilated convolution and global context blocks, thereby enhancing the receptive field and feature expression ability of the model. For license plate location, CIOU loss considers not only the coverage area of the bounding box but also the center distance and aspect ratio. For license plate recognition, CTC loss trains the network on sequences and solves the sample alignment problem, which improves the accuracy of license plate recognition.

As a part of intelligent transportation, license plate recognition technology based on deep learning lays a good foundation for the future analysis and decision-making of the Internet of Things [33]. However, the accuracy in complex situations is still not high. For example, on snowy and rainy days, license plates are prone to partial occlusion, which leads to recognition errors. In dim or nighttime environments, the lack of light leaves little color information in the picture, making it difficult to locate and recognize license plates. Therefore, adjusting the parameters of deep learning models for such complex situations remains a difficult problem in the Internet of Things.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the horizontal scientific research project "Campus Network Design Scheme" (HX2021251), in part by the Hubei Natural Science Foundation under Grant 2021CFB156, and in part by the JSPS KAKENHI under Grant JP21K17737.