Abstract
To reduce the risk that traffic congestion poses to residents and the urban transportation system, this paper extracts frequently congested areas and major trunk roads from taxi GPS (global positioning system) data and TPI (traffic performance index) data, and identifies traffic patterns and main trunk roads in the traffic grid, so as to analyze the evolution of traffic congestion and make effective suggestions. The results can help travelers avoid peak periods and congested sections and can support managers in optimizing urban planning and implementing efficient traffic management. The research process is as follows. First, the study area was divided into grids based on one week of taxi GPS data and the distribution characteristics of taxi operations in Qingdao. Second, two-dimensional grid traffic attribute information was constructed from two indicators: the number of vehicles and the average speed of passenger trajectories. Then, a congestion discriminant model based on three-dimensional traffic attribute information was established according to the variation of the number of position points in the grid. Finally, the TPI data were used to compare and evaluate the identification results of the two models and to identify frequently congested grids and main trunk roads. The case analysis showed that the identification result of the grid congestion status considering three-dimensional traffic attribute information (25.198%) was better than that considering two-dimensional traffic attribute information (23.997%).
1. Introduction
In recent years, the urbanization process has continued to advance from urban centers to urban suburbs, and the functions of the cities have been gradually enhanced. On the one hand, the increasing development of the city has promoted the growth of social economy, improved people’s living standards, and increased residents’ needs in economy, culture, education, technology, and other aspects. On the other hand, it has promoted the development of urban transportation and improved the way people travel. However, in the process of urban traffic development, the mismatch between road traffic facilities and people’s travel needs has led to traffic congestion, frequent traffic accidents, environmental pollution, and other problems, among which congestion has increasingly become the focus of people’s lives.
In the early stage, the traffic flow parameters used for traffic operation status identification were mainly obtained through questionnaires, induction coils, and similar methods, which took a lot of time and effort, had low efficiency, and generalized poorly. Subsequently, with the rapid development of traffic big data, more data sources have been used for traffic status identification research, such as floating car data, hand-held terminal data, and multisource internet data, and the number of traffic status identification methods has also increased.
Chen et al. [1] introduced an improved neural network structure and adaptive gradient learning algorithm in the process of discriminating the expressway traffic operation status, which realized the classification and identification of recurrent congestion and occasional congestion. Based on taxi GPS data, Fotouhi [2] proposed a traffic condition identification method based on the K-means clustering algorithm and obtained the best traffic condition identification effect by constantly adjusting parameters, which could correctly identify traffic conditions in most cases. Lu et al. [3] adopted a density-based clustering algorithm to identify seriously congested sections by defining spatiotemporal association rules. Liu [4] made a comprehensive evaluation by referring to multiple indexes, such as sample size, traffic flow speed, and speed difference, to identify the water accumulation traffic status. Yu et al. [5] comprehensively evaluated the traffic operation status according to traffic flow, average speed, and occupancy and made an experimental analysis with simulation technology. Zhang et al. [6] identified recurrently congested roads from a new perspective, namely, by considering the congestion intensity, congestion time, congestion position, congestion frequency, and other factors. Huang et al. [7] established a discriminant model of traffic operation status based on fuzzy C-means clustering covering traffic flow, vehicle speed, road occupancy, and other traffic attribute information. Yang et al. [8] built a traffic congestion status estimation model combining multiple kinds of information, such as vehicle density, speed, traffic inflow, and previous traffic status information, which had a good identification effect. Wang et al. [9] developed an algorithm for automatically detecting the traffic status of road intersections using the GPS trajectories of taxis and set different traffic rules for different intersections to achieve high detection quality. Zhang and Guo [10] built a quantitative model of road network traffic status and calculated the distance between the real-time traffic status and the most severe congestion status on the basis of historical time delay index data to quickly identify road network congestion status. Li et al. [11] proposed a road traffic congestion recognition method based on the support vector machine (SVM) with traffic volume and traffic density as parameters, which had a good recognition effect. Feng et al. [12] put forward a critical road recognition method by combining GPS trajectory data with an oriented weighted complex network. Ding et al. [13] presented a congestion recognition method using a convolutional neural network in deep learning, which could effectively detect traffic congestion images. Han et al. [14] introduced the Fourier transform into the oriented graphic convolutional method and constructed a subgraph to identify traffic congestion through spanning trees.
According to the above literature, in terms of data, different cities have their own geographical characteristics and traffic systems [15], and the calculation methods and meanings of traffic congestion evaluation indexes are defined according to those characteristics. Hence, the congestion evaluation indexes are not exactly the same. In the selection of research objects, given the differences among cities, the characteristics of urban traffic congestion are often not universal, and the measures to alleviate congestion are not general. In terms of traffic operation characteristics, scholars have mostly applied clustering algorithms to the classification of time periods to explore how the traffic operation status differs between periods, or have classified regional traffic congestion patterns, while rarely categorizing and analyzing the traffic operation characteristics of individual roads. However, in a complex road network, traffic conditions vary between different classes of road sections and even between road sections of the same class. The spatial and temporal distribution of traffic operation states differs with the surrounding level of economic development, the level of public services, and other factors [16]. Therefore, it is necessary to study the classification of road traffic operation status.
To solve the above problems, this paper meshes the study area according to its practical situation and proposes a grid-based congestion discriminant model, which reduces the impact of the map matching algorithm on the results. The internal attributes of each grid are extracted from the taxi GPS data, and traffic attributes of different dimensions are separately introduced into the congestion discriminant model to identify recurrent congestion areas and main trunk roads.
The paper is organized into four sections. Section 2 presents the research data and methods, introducing the data and the relevant theory of the congestion discrimination model. Section 3 presents the case study: it first explores the regional grid division to identify congestion-prone grids using the number of vehicles and the average speed, then introduces the variation law of the number of position points in the grid to establish a congestion discrimination model based on three-dimensional traffic attribute information, and finally compares and evaluates the identification results of the two models using TPI data to identify frequently congested grids and major roads. Section 4 is the conclusion, which summarizes the main research work and possible shortcomings and provides an outlook for future work. The flowchart of this article is shown in Figure 1.

2. Data and Methods
2.1. Data
2.1.1. Traffic Operation Index Data
The traffic operation index is usually combined with the real-time traffic status, road capacity, and other factors to evaluate the overall traffic operation status of the road network. The traffic operation index calculation model is usually established using different traffic flow parameters according to the road operation characteristics of different cities, where the index range is [0, 10], and the larger the value, the higher the congestion degree.
Python is used to obtain the TPI data of main trunk roads from the Qingdao public security traffic information service network. A total of 313,693 records were collected from 6:00 to 23:00 each day between April 16, 2018, and May 27, 2018, at a collection frequency of 5 minutes. After deleting the two weeks of data containing statutory holidays, a total of 28 days of data are included. Taking Taidong No.1 Road as an example, the data format includes arterial road name, traffic operation index, float, average speed, congestion level, etc., as shown in Table 1.
2.1.2. Taxi GPS Data
In recent years, with the development of positioning technology and wireless communication technology, taxis have basically achieved full coverage of GPS devices. Taxi GPS data record the location and related operation information during driving every 10–30 s, including the license plate number, longitude, latitude, instantaneous speed, time, passenger status, etc. In this paper, a total of 7.67 GB (123,317,809 records) of taxi operation data for Qingdao city, covering the consecutive week from September 9, 2019, to September 15, 2019, was obtained from the Qingdao Municipal Transport Bureau, and the data format is shown in Table 2.
In Table 2, CLBH is the taxi license plate number, JDZB and WDZB are the longitude and latitude coordinates of the taxi trajectory point, respectively, GPSSD and GPSSJ are the speed and time stamp when the taxi record is written, respectively, and ZKZT is the passenger status, which indicates whether there are passengers in the taxi. ZKZT takes the value 0 or 1, where 1 indicates carrying passengers and 0 indicates no passengers, which means that the taxi is driving empty or looking for passengers.
2.2. Congestion Discrimination Model
In the study of urban traffic congestion based on taxi GPS data, scholars mostly use map matching algorithms to improve the matching between location points and roads. However, this algorithm has the problems of long time consumption and high complexity. To solve this problem, scholars Shi et al. and Yang [17, 18] proposed a grid-based congestion discrimination model to reduce the impact of map matching errors on subsequent studies.
The occurrence of congestion on a road section is usually manifested by an increase in the number of vehicles and a decrease in vehicle speed within a certain range. Therefore, the average speed of taxi passenger trajectories and the number of vehicles are introduced and denoted by $V$ and $N$, respectively. The study area is divided into grids, and the traffic operation mode of a grid within time period $t$ is defined as $P_t = (V_t, N_t)$, where $V_t$ is the average speed of passenger trajectories in this time period and $N_t$ is the total number of vehicles in this time period. After data normalization, the Euclidean distance is used to measure the difference between $P_i$ and $P_j$ of the same grid in different time periods:
$$d(P_i, P_j) = \sqrt{(V_i - V_j)^2 + (N_i - N_j)^2}. \tag{1}$$
Over the consecutive time periods $t_1, t_2, \ldots, t_n$, where $P_t$ is the traffic operation mode of the grid within time period $t$, congestion requires that $P_t$ differ significantly from the average traffic status of the sample over the whole day, i.e., that it meet the following condition:
$$d(P_t, \bar{P}) > \delta, \tag{2}$$
where $\delta$ is the judgment threshold, set to 0.5 or 0.4, and $\bar{P} = (\bar{V}, \bar{N})$ is the average value of all grid traffic operation modes in all time periods $t_1$ to $t_n$, i.e.,
$$\bar{P} = \frac{1}{n} \sum_{i=1}^{n} P_{t_i}. \tag{3}$$
To reflect the decrease of the average speed in traffic congestion, the range of the speed parameter should also be specified, i.e., the average speed $V_t$ of all trajectories in the grid within time period $t$ should be lower than the mean over all grids from $t_1$ to $t_n$:
$$V_t < \bar{V}. \tag{4}$$
If the two conditions of equations (2) and (4) are satisfied at the same time, it is determined that traffic congestion occurs in this grid within time period $t$.
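As a minimal sketch of the two-condition rule in equations (2) and (4), the following Python fragment flags the congested periods of a single grid. It assumes the modes are already normalized to [0, 1]; the function names and the sample values are hypothetical and are not taken from the original study.

```python
import math


def mean_mode(modes):
    """Component-wise mean of all per-period modes, as in equation (3)."""
    n = len(modes)
    return tuple(sum(m[i] for m in modes) / n for i in range(len(modes[0])))


def euclidean(p, q):
    """Euclidean distance between two normalized modes, as in equation (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def congested_periods_2d(modes, delta=0.5):
    """Indices of periods that satisfy conditions (2) and (4) together."""
    p_bar = mean_mode(modes)
    return [
        t
        for t, p in enumerate(modes)
        if euclidean(p, p_bar) > delta  # condition (2): far from the daily mean
        and p[0] < p_bar[0]             # condition (4): slower than average
    ]


# Hypothetical normalized (average speed, vehicle count) pairs for one grid.
sample_modes = [(0.9, 0.1), (0.8, 0.2), (0.1, 1.0), (0.7, 0.3)]
print(congested_periods_2d(sample_modes))  # [2]
```

Only the third period is both far from the daily mean and slower than average, so it is the only one flagged.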
In July 2018, Yang Haiqiang improved the model again [19]. Taking into account the inconsistent units of the average speed and the number of vehicles, he used the Mahalanobis distance to calculate the differences between the traffic operation modes of each grid:
$$d_M(P_i, P_j) = \sqrt{(P_i - P_j)^{\mathsf{T}} \, \Sigma^{-1} \, (P_i - P_j)}, \tag{5}$$
where $\Sigma$ is the covariance matrix of the full sample $P_{t_1}, P_{t_2}, \ldots, P_{t_n}$, calculated as
$$\Sigma = \begin{pmatrix} \operatorname{cov}(V, V) & \operatorname{cov}(V, N) \\ \operatorname{cov}(N, V) & \operatorname{cov}(N, N) \end{pmatrix}, \qquad \operatorname{cov}(X, Y) = E\left[(X - \bar{X})(Y - \bar{Y})\right], \tag{6}$$
where $\bar{V}$ and $\bar{N}$ are the means of the average speed and of the number of vehicles in all grids from time period $t_1$ to $t_n$, respectively.
$P_t = (V_t, N_t)$ is the traffic operation mode of the grid within time period $t$; congestion requires that it differ significantly from the average traffic status $\bar{P} = (\bar{V}, \bar{N})$ of the sample over the whole day, i.e., that it meet the following condition:
$$d_M(P_t, \bar{P}) > k \cdot \frac{1}{n} \sum_{i=1}^{n} d_M(P_{t_i}, \bar{P}). \tag{7}$$
The right-hand side is the mean distance between the traffic operation mode of each period and the all-day average operation mode, multiplied by $k$, the discriminant index of the congestion degree. The larger the value of $k$, the higher the congestion degree.
In addition, the traffic operation mode in the grid may show many abnormal conditions, such as an increase in the number of vehicles together with an increase in speed, or a decrease in the number of vehicles together with an increase in speed. To eliminate these abnormal states when identifying congestion, screening conditions are set on the number and speed of vehicles, i.e., not only should equation (4) be satisfied, but $N_t$, the number of vehicles passing through the grid within time period $t$, should also be greater than the mean number of vehicles in all grids from $t_1$ to $t_n$:
$$N_t > \bar{N}. \tag{8}$$
If the three conditions of equations (4), (7), and (8) are satisfied at the same time, it is determined that traffic congestion occurs in this grid within time period $t$.
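The Mahalanobis-based rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of [19]: the covariance uses the population form (division by the number of periods), the threshold is taken as the mean distance multiplied by a factor `k`, and all function names and sample values are hypothetical.

```python
import math


def sample_stats(modes):
    """Mean vector and 2x2 population covariance of (speed, vehicles) pairs."""
    n = len(modes)
    v_bar = sum(v for v, _ in modes) / n
    c_bar = sum(c for _, c in modes) / n
    s_vv = sum((v - v_bar) ** 2 for v, _ in modes) / n
    s_cc = sum((c - c_bar) ** 2 for _, c in modes) / n
    s_vc = sum((v - v_bar) * (c - c_bar) for v, c in modes) / n
    return (v_bar, c_bar), (s_vv, s_vc, s_cc)


def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance for a 2x2 covariance given as (s_vv, s_vc, s_cc)."""
    s_vv, s_vc, s_cc = cov
    det = s_vv * s_cc - s_vc ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dv, dc = p[0] - q[0], p[1] - q[1]
    # Quadratic form with the closed-form inverse of a 2x2 symmetric matrix.
    quad = (s_cc * dv * dv - 2 * s_vc * dv * dc + s_vv * dc * dc) / det
    return math.sqrt(quad)


def congested_periods_mahalanobis(modes, k=1.0):
    """Indices of periods meeting the distance, speed, and vehicle screens."""
    mean, cov = sample_stats(modes)
    dists = [mahalanobis_2d(p, mean, cov) for p in modes]
    threshold = k * sum(dists) / len(dists)
    return [
        t
        for t, (p, d) in enumerate(zip(modes, dists))
        if d > threshold and p[0] < mean[0] and p[1] > mean[1]
    ]


# Hypothetical raw (average speed in km/h, vehicle count) pairs for one grid.
raw_modes = [(30, 10), (28, 14), (12, 60), (26, 18), (24, 23)]
print(congested_periods_mahalanobis(raw_modes, k=1.0))  # [2]
print(congested_periods_mahalanobis(raw_modes, k=2.0))  # []
```

Because the Mahalanobis distance rescales each attribute by the sample covariance, the raw speeds and counts can be compared without prior normalization.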
As the number of position points also changes periodically over time and reflects the change in grid traffic status to a certain extent, the number of position points in the grid is introduced as a further traffic attribute. Considering the average speed, the number of vehicles, and the number of position points in the grid, the traffic status is defined as a three-dimensional vector, namely, $P_t = (V_t, N_t, M_t)$, where $M_t$ represents the number of position points. Then, equation (9) is used to measure the difference between $P_i$ and $P_j$ of the same grid in different time periods:
$$d(P_i, P_j) = \sqrt{(V_i - V_j)^2 + (N_i - N_j)^2 + (M_i - M_j)^2}. \tag{9}$$
Compared with equation (1), the distance between grids increases after improving the model.
In equations (4) and (8), we set the range of vehicle speed and number when congestion occurs. Considering the driving time of vehicles under congestion, a further screening condition is added and quantified by equation (10). It means that when congestion occurs in time period $t$, the travel time of vehicles passing through the grid is long, so the number of GPS position points collected should be large. Therefore, $M_t$, the number of position points of all trajectories in this time period, should be greater than the mean number of position points in all grids from $t_1$ to $t_n$:
$$M_t > \bar{M}. \tag{10}$$
If equations (4), (7), (8), and (10) are satisfied at the same time, it is determined that traffic congestion occurs in this grid within this time period.
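A short Python sketch of the three-dimensional rule is given below. It assumes normalized attributes, uses the three-dimensional Euclidean distance for the distance condition, and takes the threshold as `k` times the mean distance; these choices, the function name, and the sample values are assumptions made for illustration.

```python
import math


def congested_periods_3d(modes, k=1.0):
    """Flag periods by distance plus speed, vehicle, and position-point screens.

    modes: list of normalized (avg_speed, vehicles, position_points) tuples.
    """
    n = len(modes)
    mean = tuple(sum(m[i] for m in modes) / n for i in range(3))
    dists = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, mean))) for p in modes
    ]
    threshold = k * sum(dists) / n
    return [
        t
        for t, (p, d) in enumerate(zip(modes, dists))
        if d > threshold
        and p[0] < mean[0]   # slower than average
        and p[1] > mean[1]   # more vehicles than average
        and p[2] > mean[2]   # more position points than average
    ]


# Hypothetical normalized values for one grid over five periods.
sample_3d = [
    (0.9, 0.1, 0.2),
    (0.8, 0.2, 0.1),
    (0.1, 1.0, 0.9),
    (0.2, 0.9, 0.1),
    (0.7, 0.3, 0.2),
]
print(congested_periods_3d(sample_3d))  # [2]
```

In this sample the fourth period is slow and crowded but has few position points, so the additional screen excludes it while the third period is retained.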
3. Case Description
3.1. Overview of the Study Area
Qingdao is located in the southeast of Shandong Peninsula. As the economic center of Shandong Province, the first batch of coastal open cities in China, and a popular coastal tourist destination, Qingdao has successfully held a number of major events in recent years, such as the SCO Summit, Boao Forum for Asia, Xiangshan Tourism Summit, and the 29th International Beer Festival, with a large population flow and rapidly growing economy. In 2018, Qingdao’s GDP (gross domestic product) exceeded 1.2 trillion yuan, up 7.4% year-on-year, and ranked second among prefecture-level cities in Shandong Province in terms of GDP per capita. In terms of traffic, the urban public transportation has achieved rapid development. Buses and taxis have become the main ways for residents to travel, and rail traffic also has gradually developed with the operation of Qingdao Metro Line 2, Line 3 and other lines, which provides convenience for citizens to travel. By the end of 2018, Qingdao has a large population with 2.826 million motor vehicles and 18.249 billion passenger trips/km of passenger turnover. However, traffic congestion has become an urgent problem as Qingdao ranked 25th in the list of cities according to the road network travel delay index during peak hours in the “Traffic Analysis Report of Major Cities in China in the First Quarter of 2019.”
As the main framework of the urban road network, main trunk roads carry a large volume of people and traffic, and hence they are chosen as the research objects. If a main trunk road has two directions, upstream and downstream, each direction is regarded as a separate main trunk road. On this basis, a total of 63 main trunk roads are collected in the study area to explore the spatial and temporal characteristics of trunk road traffic operation and to address the urban traffic congestion problem.
3.2. Data Preprocessing
3.2.1. Processing of Traffic Operation Index Data
Because of delays in collection time or the influence of bad weather, the traffic operation index data contain problems such as missing and duplicated records. Therefore, to ensure the integrity and accuracy of the data, the TPI data are screened and corrected as follows:
(1) Counting the upstream and downstream directions of a major trunk road as two trunk roads, the total number of trunk roads in the original data is 63. Fifteen of these trunk roads are relatively short or are missing a large amount of data. Hence, they are deleted, and the remaining 48 trunk roads are used for the study.
(2) If a main trunk road has two sets of data at a time point, the set with the larger fluctuation relative to the TPI data of the moments immediately before and after that time point is deleted.
(3) If the data at a certain time point are missing, the average TPI of the moments immediately before and after that time point is used for correction.
The study collected 313,693 pieces of raw traffic operation index data, and after processing, 275,520 pieces of data remained.
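The duplicate and missing-value rules can be sketched in Python as follows. This is a minimal illustration, assuming that each 5-minute slot of one trunk road is represented as a list of the TPI readings received for that slot (empty if missing, two or more if duplicated); the function name and sample values are hypothetical.

```python
def clean_tpi(slots):
    """Return one TPI value per slot after de-duplication and gap filling."""

    def neighbour_mean(values, i):
        # Mean of the known values immediately before and after slot i.
        near = [
            values[j]
            for j in (i - 1, i + 1)
            if 0 <= j < len(values) and values[j] is not None
        ]
        return sum(near) / len(near) if near else None

    # Start from the slots that hold exactly one reading.
    values = [s[0] if len(s) == 1 else None for s in slots]

    # Duplicates: keep the reading closest to the neighbouring slots.
    for i, readings in enumerate(slots):
        if len(readings) > 1:
            ref = neighbour_mean(values, i)
            if ref is None:
                values[i] = readings[0]
            else:
                values[i] = min(readings, key=lambda x: abs(x - ref))

    # Missing slots: fill with the mean of the neighbouring slots.
    for i, readings in enumerate(slots):
        if not readings:
            values[i] = neighbour_mean(values, i)

    return values


# Hypothetical readings: slot 1 is duplicated and slot 2 is missing.
print(clean_tpi([[2.0], [2.2, 6.0], [], [3.0]]))
```

Here the reading 6.0 is discarded because it fluctuates more against the neighbouring slot, and the missing slot is filled with the mean of its two neighbours.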
3.2.2. Processing of Taxi GPS Data
Because the data obtained by taxi GPS devices are large and complicated, and because of unavoidable factors such as vehicle instrument failure and errors in the data collection process, unreasonable data often appear, such as records with latitude and longitude beyond the study range, repeated records, and records with incomplete information. To improve the accuracy of the data, it is necessary to preprocess the original GPS data, which mainly includes the removal of unreasonable data and the extraction of effective data:
(1) Delete incomplete, repeated, and incorrect data. Five types of information are used from the taxi GPS data in this paper: vehicle number, longitude, latitude, time, and passenger status. If a record does not contain all five types of information, it is deleted. If a taxi has more than one record at a given time, only one record is retained and the repeated records are deleted. According to the city's maximum speed limit, records with an instantaneous speed greater than 100 km/h are unreasonable and are deleted.
(2) Remove unreasonable data. The time range of the target data is specified as 6:00 to 23:00 from September 9 to September 15, 2019, and data outside this range are removed. The longitude and latitude information in the GPS data is then used to screen the data against the scope of the target area. In this paper, the study area includes Shinan District, Shibei District, Licang District, and part of Laoshan District in Qingdao, which is delimited by a longitude range and a latitude range. The data within these bounds are retained, and the data beyond them are treated as deviation data and removed.
(3) Extract effective passenger trajectory data.
The passenger trajectory data of a taxi refers to the activity event of completing loading and unloading, i.e., when the passenger status of a taxi changes from 0 to 1, the data at the moment is marked as the loading data, and after that, the passenger status continues to be 1. When the passenger status changes from 1 to 0 at the first moment, the data at the time is recorded as the data of unloading. The GPS points between the loading point and the unloading point are connected as the passenger trajectory of the taxi (Figure 2).
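The extraction of passenger trajectories from the passenger status field can be sketched in Python as follows. The sketch assumes the records of a single taxi are already sorted by time and are held as dictionaries keyed by the field names of Table 2; trips that are already in progress at the first record, or unfinished at the last one, are discarded, which is an assumption of this illustration.

```python
def extract_passenger_trajectories(records):
    """Split one taxi's time-ordered records into loaded trips.

    A trip starts at the first record whose ZKZT changes from 0 to 1
    (loading point) and ends at the first record whose ZKZT returns
    to 0 (unloading point).
    """
    trajectories = []
    current = None
    previous_status = None
    for record in records:
        status = record["ZKZT"]
        if status == 1 and previous_status == 0:
            current = [record]              # loading point
        elif status == 1 and current is not None:
            current.append(record)          # still carrying passengers
        elif status == 0 and previous_status == 1 and current is not None:
            current.append(record)          # unloading point
            trajectories.append(current)
            current = None
        previous_status = status
    return trajectories


# Hypothetical status sequence for one taxi; GPSSJ is a simple counter here.
statuses = [1, 1, 0, 0, 1, 1, 0, 1]
sample = [{"GPSSJ": i, "ZKZT": s} for i, s in enumerate(statuses)]
trips = extract_passenger_trajectories(sample)
print([[r["GPSSJ"] for r in trip] for trip in trips])  # [[4, 5, 6]]
```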

3.2.3. Map Matching
Because of the influence of taxi GPS devices, satellites, and other factors, there is inevitably some deviation between the collected GPS data and the actual position. When the data are imported into the road network map, some points are found to be off-road, which affects the subsequent analysis. Therefore, map matching of the position point data is required before analysis so that the data are accurately mapped onto the road network.
When using the direct projection algorithm for map matching, the distance problem is considered first, and the road sections within a certain range near GPS points are selected as candidate road sections. When there are multiple candidate road sections, the road section closest to the vehicle position should be selected as the final matching road section by considering driving direction, projection distance, and other factors, and then the original GPS point is projected onto the matching road section. The projection point is the matching point of the original GPS point.
The basic principle of the direct projection method is shown in Figure 3. Consider the GPS point recorded at a given moment. First, two candidate road sections in the area adjacent to the current vehicle position are found according to the search radius, and the GPS point is projected onto each of them to obtain two projection points. Then, the metric of each candidate road is calculated from the projection distance and the angle between the vehicle's movement direction and the candidate road direction, as shown in equation (11).

In equation (11), the metric value of the candidate road section corresponding to the position point is a weighted combination of two terms, with separate weights for distance and vehicle direction: the projected distance from the GPS point to the candidate road, and the included angle between the driving direction of the vehicle and the direction of the candidate road section. Among all the roads to be matched, the larger the road metric value, the closer the candidate road segment is to the actual driving road of the vehicle, and the projection point on that road is the position of the vehicle after matching.
The map matching algorithm, written in Python, is used to correct the original taxi GPS points; it accurately relocates position points scattered around a road section onto the road network and ensures the effectiveness of the subsequent work.
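The direct projection idea can be sketched in Python as follows. Because the exact form of the road metric is not reproduced here, the score used below, a weighted sum of a distance term `1 - d / radius` and a direction term `cos(theta)`, is an assumption chosen only so that closer and better-aligned roads score higher. The sketch also assumes planar coordinates in meters, directed road segments, and headings measured counterclockwise from the x-axis; all names and default values are hypothetical.

```python
import math


def project_onto_segment(p, a, b):
    """Orthogonal projection of point p onto segment a-b, clamped to its ends."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)


def match_point(p, heading_deg, segments, radius=50.0, w_d=0.5, w_theta=0.5):
    """Return the projection of p on the best-scoring candidate segment.

    Candidates are the segments whose projection lies within `radius`.
    Returns None when no segment is close enough.
    """
    best_point, best_score = None, -math.inf
    for a, b in segments:
        q = project_onto_segment(p, a, b)
        d = math.dist(p, q)
        if d > radius:
            continue  # outside the search radius, not a candidate
        road_deg = math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))
        theta = math.radians(heading_deg - road_deg)
        # Assumed metric: larger when the road is closer and better aligned.
        score = w_d * (1.0 - d / radius) + w_theta * math.cos(theta)
        if score > best_score:
            best_point, best_score = q, score
    return best_point


# Hypothetical east-west and north-south segments crossing near the point.
roads = [((0.0, 0.0), (100.0, 0.0)), ((50.0, 0.0), (50.0, 100.0))]
print(match_point((48.0, 8.0), 0.0, roads))   # heading east
print(match_point((48.0, 8.0), 90.0, roads))  # heading north
```

With an eastward heading the point is matched to the east-west segment even though the north-south segment is nearer, which illustrates why the direction term is combined with the distance term.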
3.3. Area Meshing and Grid Traffic Information Extraction
3.3.1. Area Meshing
In the past, scholars mainly divided regions on a practical basis (administrative regions, roads, rivers, etc.). Division by administrative region or river is simple, but a single unit covers a wide area, which makes it suitable only for macro research. Road-based division can reveal traffic conditions in detail, but it is difficult to apply because roads comprise main trunk roads, secondary roads, and branches. Other scholars divide the area into fixed square cells, where the area and number of grids can be set, which is a simple and easy process. Therefore, to describe spatial structure differences more accurately, this paper uses a raster-based method and selects a grid cell of 500 m × 500 m to divide the study area into multiple square grids.
According to the spatial distribution of main trunk roads, the areas with a dense distribution of main trunk roads were selected and divided into square grids with a side length of 500 m, drawn from west to east and from south to north. Figure 4 shows the resulting grid area, whose main trunk roads mainly include Nanjing Road, Shandong Road, Harbin Road, Ningxia Road, Wenzhou Road, and other sections.

Each grid has different geographical information attributes and basic service facilities. Hence, the traffic information collected in different time periods is quite different. For a single grid, the latitude and longitude range of the grid is equivalent to limiting the travel range of taxi GPS. When traffic congestion occurs in the city, the clustering phenomenon usually occurs, i.e., the number of taxi vehicles in a specific range increases, the number of GPS position points collected increases, and the average speed of vehicles decreases.
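Assigning a GPS point to a 500 m grid cell can be sketched in Python as follows. The sketch uses a simple equirectangular approximation around a south-west origin, counts rows from south to north and columns from west to east starting at 1, and returns the index as (row, column); the origin, the index order, and the function name are assumptions of this illustration.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude


def grid_index(lon, lat, origin_lon, origin_lat, cell_m=500.0):
    """Return the 1-based (row, column) of the cell containing (lon, lat).

    Returns None for points west or south of the origin.
    """
    # One degree of longitude shrinks with the cosine of the latitude.
    meters_per_degree_lon = METERS_PER_DEGREE_LAT * math.cos(
        math.radians(origin_lat)
    )
    dx = (lon - origin_lon) * meters_per_degree_lon
    dy = (lat - origin_lat) * METERS_PER_DEGREE_LAT
    if dx < 0 or dy < 0:
        return None
    return (int(dy // cell_m) + 1, int(dx // cell_m) + 1)


# Hypothetical origin on the equator, used only to keep the arithmetic simple.
print(grid_index(120.01, 0.002, 120.0, 0.0))  # (1, 3)
```

Over a study area of a few kilometers this approximation is usually adequate; a projected coordinate system would be the more rigorous choice.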
3.3.2. Grid Traffic Information Extraction
The GPS data of taxis in Qingdao city from 6:00 to 23:00 on September 9 to September 15, 2019 (Monday to Sunday) were selected, and the time interval was set to 10 minutes. The average speed and the number of vehicles in every grid were extracted for each period, and the number of GPS position points in the grid was also counted. The longer a taxi stays in a grid, the more position points are collected there. The variation trend of each attribute of all grids within the week is shown in Figure 5.
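The per-grid, per-interval extraction can be sketched in Python as follows. Each record is assumed to be a tuple of grid index, taxi identifier, seconds since the start of the day's window, and instantaneous speed; the average speed is taken here as the plain mean of the point speeds, which is a simplifying assumption, and the function name is hypothetical.

```python
from collections import defaultdict


def aggregate_grid_slots(records, slot_seconds=600):
    """Summarize records into 10-minute slots per grid.

    records: iterable of (grid_id, taxi_id, seconds, speed_kmh) tuples.
    Returns {(grid_id, slot): {"avg_speed", "vehicles", "points"}}.
    """
    speeds = defaultdict(list)
    taxis = defaultdict(set)
    for grid_id, taxi_id, seconds, speed in records:
        key = (grid_id, int(seconds // slot_seconds))
        speeds[key].append(speed)
        taxis[key].add(taxi_id)
    return {
        key: {
            "avg_speed": sum(values) / len(values),  # mean point speed
            "vehicles": len(taxis[key]),             # distinct taxis
            "points": len(values),                   # GPS position points
        }
        for key, values in speeds.items()
    }


# Hypothetical records for grid (1, 5).
sample_records = [
    ((1, 5), "A", 0, 20.0),
    ((1, 5), "A", 30, 10.0),
    ((1, 5), "B", 599, 30.0),
    ((1, 5), "B", 600, 40.0),
]
summary = aggregate_grid_slots(sample_records)
print(summary[((1, 5), 0)])
```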

As can be seen from Figure 5, the variation trend of each attribute in the grid is clearly cyclical. From September 9 to September 12 (Monday to Thursday), daily taxi activity is higher, the morning and evening rush phenomenon is obvious, the grid average speed of all passenger trajectories ranges from 9.98 km/h to 29.2 km/h, and the numbers of vehicles and position points are larger. From September 13 to September 15 (Friday to Sunday), the curve fluctuates less, and the overall taxi operating status is better than on weekdays: the grid average speed of all passenger trajectories ranges from 15.49 km/h to 30.38 km/h, and the numbers of vehicles and position points decrease. These patterns are related to the daily travel characteristics of residents. The period from September 13 to September 15 is the Mid-Autumn Festival holiday, during which residents' travel times are relatively dispersed. On working days, by contrast, most residents travel at fixed rush hours for regular activities, so the number of vehicles on the roads increases sharply and the speed drops.
In addition, within the study scope, the number of vehicles and the number of position points show similar trends, i.e., with the increase of the number of vehicles and the number of position points, the average speed of all grids gradually decreases, and vice versa. Taking the grid (1, 5) on September 9 as an example, the number of vehicles, the number of position points, and the average speed values in each time interval are plotted, as shown in Figure 6.

From Figure 6, it can be seen that as the numbers of vehicles and position points in the grid increase, the average speed of most passenger trajectories gradually decreases. However, there are some abnormal points at which the number of vehicles in the grid is small and the average speed is also low; these do not reflect real congestion and must be deleted before the congestion state of the grid can be discerned. Conversely, if in a time period the numbers of vehicles and position points are large and the average vehicle speed is low, congestion may occur in that period.
Therefore, the basic scheme for judging traffic congestion is to compare each traffic attribute in the grid (the number of vehicles with passenger trajectories, the number of position points, and the average vehicle speed) with its average value. If the differences are large, i.e., the number of vehicles and the number of position points are higher than their averages and the average speed is lower than its average, then it can be determined that traffic congestion occurs in this grid. The accuracy of the congestion discrimination results can be evaluated using traffic performance index data.
3.3.3. Grid Congestion Identification Based on Two-Dimensional Traffic Attribute Information
In a grid, the time interval is 10 minutes, and the average speed of all vehicles passing through the grid, the number of vehicles, and the number of GPS position points of taxis in each time period are counted. Taking grid (1, 5) as an example, its attribute data on September 9, 2019 is shown in Table 3.
Firstly, the traffic state is defined as a two-dimensional vector by considering only the average speed and the number of vehicles in the grid, namely, $(V, N)$. The variation trend of the average speed and the number of vehicles in grid (1, 5) in a week is shown in Figure 7.

In Figure 7, the traffic status in grid (1, 5) showed morning and evening rush every day from September 9 to September 12, among which the evening peak from September 9 to 12 is more obvious, and congestion is more likely to occur during these morning and evening rush periods.
Because the two attributes in the grid have different units and scales, min-max normalization is used to scale the data to between 0 and 1. During the period from 6:00 to 23:00 on September 9, 2019, the mean of the average speed and the mean of the number of vehicles passing through the grid are 15.93 and 37, respectively. After normalizing the data, the average values of the two attributes are 0.31 and 0.51, respectively.
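Min-max normalization of one attribute series can be sketched in Python as follows; the function name and the sample values are hypothetical, and a constant series is mapped to zeros as a convention of this sketch.

```python
def min_max_normalize(values):
    """Scale a sequence linearly so that its minimum is 0 and its maximum is 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # constant series: no spread to scale
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical average speeds (km/h) of one grid in three periods.
print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

Each attribute is normalized separately, so that speed and vehicle counts contribute on a comparable scale when distances between traffic operation modes are computed.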
Based on the congestion discrimination method in 2.2, after data normalization, Euclidean distance is adopted to measure the differences in different time period of the same grid, and within the time period ,, the traffic grid operation mode is compared with the average traffic operation mode throughout the sample in the whole day, namely, equation (7), where the degree of the congestion index value of is set as 1, 2, and , respectively. The larger the value of , the more serious the congestion. When and if equation (7) does not hold, then it indicates that the current period of grid traffic condition is good. When and if equation (7) holds and simultaneously satisfies equations (4) and (8), i.e., the number of vehicle passengers in the grid increases and the average speed of the trajectory decreases, then the grid is determined to be congested. When and equations (4), (7), and (8) are simultaneously satisfied, it is determined that the traffic operation mode in the grid is abnormal and the congestion is serious.
The calculation results of traffic status discrimination in grid (1, 5) from 8:00 to 10:00 on September 9 are shown in Table 4.
According to the calculation, if the time interval is set as 10 minutes, there are 43 time intervals that satisfy the condition of equation (7) () on the same day. With the increase of the constraints and equations (4) and (8), there are 22 time intervals that satisfy the constraint condition simultaneously, namely, 7: 40–8: 20, 8: 40–9: 50, 17: 10–18: 30, 18: 40–19: 00. When , there are four time intervals that satisfy equations (4), (7), and (8) simultaneously, namely, 8: 00–8: 10 and 8: 50–9: 20, indicating that the traffic operation status of this grid is relatively serious in these two time periods.
The traffic congestion discriminant model based on two-dimensional traffic attribute information is used to extract the traffic operation status of each grid. The statistical results for all grids within the week are shown in Table 5, where 0 indicates no congestion and 1 indicates congestion.
The trend of the total number of congestion time stamps in all grids over time from Monday to Sunday is shown in Figure 8.

Figure 8 shows that congestion occurred at 4413 time stamps across all grids within the seven days. The number of congestion time stamps varies little from Monday to Thursday, whereas it is higher from Friday to Sunday, with Friday having the most. This is because Friday was the first day of the Mid-Autumn Festival holiday, when many residents travel home or attend gatherings, resulting in high travel demand.
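The daily totals discussed above amount to summing the 0/1 congestion labels over all grids and time slots for each day. A minimal sketch, assuming a hypothetical record layout of (day, grid, slot, flag) and made-up labels:

```python
from collections import Counter


def daily_congestion_totals(labels):
    """Sum the 0/1 congestion labels per day over all grids and time slots."""
    totals = Counter()
    for day, _grid, _slot, flag in labels:
        totals[day] += flag
    return dict(totals)


# Hypothetical labels: (day, grid index, slot start, 0/1 congestion flag).
labels = [
    ("Mon", (1, 5), "08:00", 1),
    ("Mon", (1, 5), "08:10", 0),
    ("Mon", (3, 2), "08:00", 1),
    ("Fri", (1, 5), "08:00", 1),
    ("Fri", (1, 5), "08:10", 1),
    ("Fri", (3, 2), "17:30", 1),
]
totals = daily_congestion_totals(labels)
print(totals)
```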
3.3.4. Grid Congestion Identification Based on Three-Dimensional Traffic Attribute Information
Taking grid (3, 2) as an example, the variation trend of the average speed, the number of vehicles and the number of position points in the grid within a week is shown in Figure 9.

As can be seen from Figure 9, the variation trend of traffic operation status in grid (3, 2) tends to be flat from Friday to Sunday, while on weekdays, the traffic status varies greatly, with multiple small peaks after the morning peak and relatively scattered congestion times.
For example, the traffic status discrimination results for grid (3, 2) from 16:00 to 18:00 on September 11 are shown in Table 6. The average vehicle speed, the number of vehicles, and the number of position points in the grid are 16.10, 41, and 149, respectively; after normalization, the corresponding values are 0.37, 0.57, and 0.49.
According to the calculation, there are totally 16 time stamps on that day, and they simultaneously satisfy the constraint conditions of equations (4), (7), (8), and (10) when , namely, 7: 50–8: 00, 8: 10–8: 50, 9: 50–10: 00, 10: 30–11: 10, 11: 40–11: 50, 16: 10–16: 20, 16: 40–17: 00, 17: 10–17: 40, 18: 50–19: 00, and there are no time stamps that satisfy the constraint condition simultaneously when , which indicates that the grid will not have serious congestion on that day.
The traffic congestion discriminant model based on the three-dimensional traffic attribute information is used to extract the traffic operation status in each grid, and the statistical results of the traffic status in all grids within a week are shown in Figure 10.

Figure 10 shows that congestion occurred at 4810 time stamps across all grids within the seven days. The number of congestion time stamps changes little from Monday to Thursday and is significantly higher from Friday to Sunday, with Friday having the most, followed by Sunday. This may be because people travel during the holiday and return for work at its end, which puts more vehicles on the road and makes congestion more likely.
3.3.5. Comparison and Analysis of Models
Taking grid (1, 5) as an example, as shown in Figure 11, the grid mainly contains Wenzhou Road (a main trunk road), where most of the taxi GPS position points are concentrated. Therefore, the traffic status of the whole grid can be estimated from the traffic status of Wenzhou Road.

The traffic performance index data for Wenzhou Road from 6:00 to 23:00 between September 9 and September 15, 2019, contain many missing values. With a 5-minute time interval, the total number of records should be 1428; in fact, only 58.68% of them are available, which makes it impossible to judge the identification results based on taxi GPS data from the TPI data alone. Therefore, it is necessary to evaluate the results in combination with the regular traffic operation pattern of Wenzhou Road.
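The expected record count follows directly from the observation window, and the short calculation below reproduces it. The number of available records is not stated in the text; the value computed here is only the approximate count implied by the reported 58.68% share.

```python
hours_per_day = 23 - 6          # observation window from 6:00 to 23:00
intervals_per_hour = 60 // 5    # one TPI record every 5 minutes
days = 7                        # September 9 to September 15, 2019

expected_records = hours_per_day * intervals_per_hour * days
available_share = 0.5868        # share of records reported as available
available_records = round(expected_records * available_share)

print(expected_records)                      # 1428
print(available_records)                     # approximate count implied by the share
print(expected_records - available_records)  # approximate number of missing records
```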
The main road situation in each grid is shown in Table 7. The identification results of the two models are evaluated according to the traffic operation status of the roads with a larger traffic flow in each grid, and the two models are compared and analyzed as shown in Figure 12 and Table 8.

As Figure 12 shows, the matching rate between the identification results of the improved model and TPI is higher than that of the original model on every day except Saturday. Table 8 shows that both the number of identified congestion timestamps and the matching rate with TPI have improved, even though adding the traffic state index to the original model widens the gap between grids and introduces additional constraints. In other words, grid congestion status identification based on three-dimensional traffic attribute information (25.198%) outperformed that based on two-dimensional traffic attribute information (23.997%).
In addition, 4065 of the congestion time points identified by the original model coincide with those identified by the improved model. However, the matching rate with TPI is below 30%. One reason is that TPI is a comprehensive traffic status index: its calculation considers not only the speed acquired in real time but also the traffic capacity of roads of different grades, so it reflects road traffic operation at the macro level. Taxi GPS data, by contrast, are real-time data. Hence, TPI is not a sensitive benchmark for evaluating the GPS-based results and cannot fully reflect the real-time traffic status. Likewise, the missing TPI data leave many congestion time points unmatched, a problem that will hopefully be solved in further research.
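The comparison above can be made concrete with a small sketch. The exact definition of the matching rate is not given in this section, so the function below assumes it is the share of identified congestion time stamps that TPI also marks as congested; the toy time stamps are hypothetical. The overlap shares at the end use only the totals reported in the text (4413, 4810, and 4065).

```python
def matching_rate(identified, reference):
    """Assumed definition: share of identified time stamps also in the reference set."""
    if not identified:
        return 0.0
    return len(identified & reference) / len(identified)


# Hypothetical congestion time stamps for one grid on one day.
gps_congested = {"08:00", "08:10", "08:20", "17:30"}
tpi_congested = {"08:10", "17:30", "18:00"}
toy_rate = matching_rate(gps_congested, tpi_congested)
print(toy_rate)

# Overlap between the two models, from the totals reported in the text.
two_dim_total = 4413     # two-dimensional model
three_dim_total = 4810   # three-dimensional model
common = 4065            # time points identified by both models
share_of_two_dim = round(common / two_dim_total, 3)
share_of_three_dim = round(common / three_dim_total, 3)
print(share_of_two_dim, share_of_three_dim)
```

Under these reported totals, roughly 92% of the time points flagged by the two-dimensional model are also flagged by the three-dimensional model, so the two models largely agree with each other even though both match TPI poorly.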
3.3.6. Evolution of Traffic Congestion
The traffic congestion status of each grid is obtained by the grid congestion status identification model based on the three-dimensional traffic attribute information, and the total frequency of congestion of each grid in seven days is counted. The results are shown in Figure 13.

As can be seen in Figure 13, when the threshold on the number of congestion time stamps is set to 125, there are 5 grids with recurrent congestion, namely, (1, 4), (1, 5), (5, 5), (2, 6), and (3, 6). When the threshold is set to 113, there are 13 grids with recurrent congestion, namely, (1, 1), (4, 1), (1, 4), (2, 4), (5, 4), (1, 5), (2, 5), (5, 5), (2, 6), (3, 6), (6, 6), (4, 7), and (5, 7). The main trunk roads in the recurrent congestion grids are then extracted, including Wenzhou Road, Harbin Road, Taidong No.1 Road, Liaoyang West Road, Shandong Road, Chongqing Road, Renmin Road, etc.
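The threshold-based selection of recurrent congestion grids can be sketched as a simple filter. The weekly counts below are hypothetical, and whether the comparison is inclusive (at least the threshold) is an assumption, since the text does not say.

```python
def recurrent_grids(weekly_counts, threshold):
    """Return grids whose weekly congestion count is at least `threshold` (assumed inclusive)."""
    return sorted(grid for grid, n in weekly_counts.items() if n >= threshold)


# Hypothetical weekly congestion time stamp counts per grid.
weekly_counts = {(1, 4): 131, (1, 5): 140, (2, 4): 118, (3, 3): 96, (4, 1): 113}

strict = recurrent_grids(weekly_counts, 125)
relaxed = recurrent_grids(weekly_counts, 113)
print(strict)
print(relaxed)
```

Lowering the threshold can only enlarge the selected set, which is consistent with the 5 grids at threshold 125 all reappearing among the 13 grids at threshold 113.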
Taking the data from September 9, 2019, as an example, the congestion frequency distribution of all grids in different time periods is shown in Figure 14, which shows that congestion is most frequent during 8:00–10:00 and 17:30–19:30.

Analyzing the evolution of road traffic congestion helps reveal the characteristics and regularities of road traffic operation. Therefore, to describe how congestion occurs, spreads, and dissipates, the periods 7:00–11:00 and 16:30–20:30 are selected to observe the evolution of recurrent congestion at 30-minute intervals, as shown in Figures 15 and 16.


As can be seen from Figure 15, the congestion range during the morning rush is generally large, and the number of congested grids is also large. From 7:00 to 7:30, only Hangan Viaduct and Renmin Road are congested, while the other grids are in good condition. As time goes by, the congestion gradually spreads from the northwest to the surrounding area and eventually forms a region, with more congested grids between 8:30 and 10:30, which indicates that most of the northwest area is residential. After 10:00, the congestion tends to dissipate, and the congestion near Chongqing Road (grids (5, 7), (6, 7), (5, 6), etc.), Weihai Road (grid (1, 2)), Yanji Road (grids (3, 3), (4, 3)), and other grids disappears. It can be inferred that most of these grids are workplaces and that the commute from residence to workplace is completed in the morning.
Compared with the morning rush hour, the evening rush hour shows a clearly different congestion range, and both the range and the number of congested grids are smaller, as shown in Figure 16. During 16:30–17:00, congestion occurs at Weihai Road, near Chongqing Road, and in the area to its east, while the traffic status of the other grids is good. The congestion gradually spreads to the surrounding area after 17:30, and there are many congested grids between 17:30 and 19:30. After 19:30, the congestion begins to dissipate, remaining only around Taidong No.1 Road (grid (1, 1)), Yan'an No. 3 Road (grid (2, 1)), Chongqing Road (grids (5, 7), (6, 7)), Harbin Road (grid (7, 6)), and Liaoyang West Road (grid (7, 4)). After 20:00, fewer grids are congested.
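The 30-minute snapshots used to trace the spread and dissipation of congestion could be derived from the 10-minute congestion labels by grouping them into windows. The sketch below is one possible way to do so; the record layout, the rule that a grid counts as congested in a window if it is flagged at least once, and the sample records are all assumptions.

```python
from collections import defaultdict


def congested_grids_by_window(records, window_minutes=30):
    """Map each window start (minutes after midnight) to the grids flagged in that window."""
    windows = defaultdict(set)
    for grid, minute, flag in records:
        if flag == 1:
            start = minute - minute % window_minutes
            windows[start].add(grid)
    return dict(windows)


# Hypothetical records: (grid, slot start in minutes after midnight, 0/1 flag).
records = [
    ((5, 7), 7 * 60, 1),       # 7:00
    ((5, 7), 7 * 60 + 10, 1),  # 7:10
    ((1, 2), 7 * 60 + 20, 0),  # 7:20
    ((1, 2), 7 * 60 + 40, 1),  # 7:40
    ((3, 3), 7 * 60 + 50, 1),  # 7:50
]
snapshots = congested_grids_by_window(records)
for start in sorted(snapshots):
    print(f"{start // 60}:{start % 60:02d}", sorted(snapshots[start]))
```

Comparing the grid sets of consecutive windows shows which grids become congested and which recover, which is the information the evolution maps convey.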
4. Conclusion
Traffic congestion on urban main trunk roads is a widespread problem, especially in first-tier and second-tier cities. To reduce the risks caused by congestion, this paper takes Qingdao as an example and uses a congestion discrimination model based on taxi GPS data and TPI data to extract the areas and main trunk roads with recurrent congestion. The results can provide data support for travelers to avoid rush hours and congested routes, for traffic managers to implement refined management and control, and for governments to optimize urban planning and design. The research also shows that adding location points to the traffic attribute information identifies the congestion status in the grid better. Within the research scope, the main trunk roads in the recurrent congestion grids mainly include Wenzhou Road, Harbin Road, Taidong No.1 Road, Liaoyang West Road, Shandong Road, Chongqing Road, Renmin Road, etc. Admittedly, this paper has limitations. For example, the traffic index is a comprehensive index that reflects road traffic status at a macro level; it cannot fully reflect the operation status of urban roads in real time, whereas taxi GPS data are real-time data. Hence, the traffic operation index is generally not sensitive enough to judge the identification results. These issues will be considered in further research.
Future research will focus on two aspects: the negative impact of missing TPI data on the results, and the analysis of the factors influencing traffic operation status in combination with POI data.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of Shandong Province, China (Grant no. ZR2021MF113).