Abstract
Predicting traffic operational conditions is crucial to urban transportation planning and management. A large variety of algorithms have been proposed to improve prediction accuracy. However, these studies were mainly based on complete data and did not discuss their vulnerability to massive missing data. Moreover, applying these algorithms is costly, because they require high-quality traffic data collected in real time on large-scale road networks. This paper aims to deduce the traffic operational conditions of a road network from a small number of critical segments, based on taxi GPS data in Xi’an, China. To identify these critical segments, we assume that the states of floating cars within different road segments are correlated and mutually representative, and we design a heuristic algorithm that uses the attention mechanism embedded in a graph neural network (GNN). The results show that the designed model, using only two critical segments that account for 2.7% of the road network, achieves higher accuracy than the conventional method. The proposed method is cost-efficient: it generates critical segment schemes that greatly reduce the cost of traffic information collection, and it is the more sensible choice when extremely high prediction accuracy is not required. Our research offers guidance for cost saving in various information acquisition techniques, such as route planning for floating cars or sensor layout.
1. Introduction
Traffic operational condition, measured by traffic flow, travel time, and vehicle speed, is an important indicator of the level of service of an urban road network. By predicting the traffic operational conditions of the road network, travelers can design efficient travel plans, including departure time, travel mode, and route, while traffic managers can develop strategies to respond to various traffic situations in advance [1–3]. Traffic state prediction has therefore long been a research hotspot. With various high-quality traffic datasets available, a vast number of elaborate machine learning algorithms [4–8] have been applied to this problem, pushing prediction accuracy to a fairly high level.
Although the principles of these algorithms differ, they are implemented in the same procedures or scenes, shown in Figures 1(a) and 1(b). In Figure 1(a), the traffic state of each segment is predicted independently [9]. Historical information is valuable for prediction; for example, correlation analysis is used to quantify the relevance between the historical traffic flow and the traffic flow within the current interval [10]. In Figure 1(b), the traffic states of segments in the road network are predicted cooperatively [11–13]. Entropy-based grey relation analysis has been used to choose lane segments that are strongly correlated with the lane segments to be predicted [14]. Convolutional neural networks have also been used to extract spatial features [15]. Both approaches rest on the hypothesis that traffic states at different locations of a road network are correlated and that this correlation is related to their spatial distribution [16]. In general, prediction in the second scene outperforms prediction in the first. We then consider a third scene: predicting the vehicle speed of segments all over the network using the available historical information of only a part of the segments (Figure 1(c)).
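Figure 1: Three scenes of traffic state prediction: (a) independent prediction for each segment; (b) cooperative prediction across segments; (c) prediction for all segments from the historical information of a part of the segments.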
We discuss a practical application of the third scene here. We take an experimental road network consisting of seventy-five directional segments in Xi’an, China (see Figure 2(a)). The mean and standard deviation of travel speed on each segment are shown in Figure 2(b). Suppose that sensors are installed on each segment to record the dynamics of the vehicle flow. Theoretically, the algorithms suited to the first or second scene could accomplish the prediction task well. However, once sensor malfunctions cause missing or erroneous data, those algorithms would not work. They require all sensors to be maintained frequently to guarantee the quality of real-time data. In practice, maintenance resources are usually limited, and it is hard to ensure that a large number of sensors remain fault-free simultaneously and over long periods. A sensible alternative is to maintain the sensors on only a small fraction of segments and to establish a prediction algorithm based on the incomplete data recorded on them. Then, even in the worst case, where only this fraction of sensors is in operation, the prediction accuracy for the whole road network still meets the demand. A challenge different from pure speed prediction then arises: how to identify the critical segments that determine our understanding of the traffic state of the whole road network?
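Figure 2: (a) Experimental road network in Xi’an; (b) mean and standard deviation of travel speed on each segment.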
To answer this question, we introduce the graph neural network (GNN), an extended deep learning model for graph data [17]. It performs well in finding complex relations among elements [18–20]. Combined with the attention mechanism [21], we construct a GNN-based machine learning model that takes the historical traffic information of the critical segments as input and predicts the link travel speed of every segment in the next time interval. The attention mechanism quantifies the contribution of each segment’s traffic information to the travel speed prediction of each link in the road network while the model pursues the downstream objective of minimizing the prediction error. We use this quantified contribution to design a heuristic algorithm that iteratively removes the segment with the minimal self-attention coefficient as the most trivial one. The segments that remain are identified as the critical segments. The results show that, in the experimental road network, the model using the traffic data of only two critical segments outperforms the conventional method using the historical average. The proposed method can significantly reduce the amount of traffic information that needs to be collected, at the expense of a slight loss in prediction accuracy. Here, we introduce some related research [22–24]. One approach handles the problem of missing travel speed values on some road segments, which is caused by the coarseness of vehicular crowdsensing data: it exploits the spatial-temporal causality among the travel speeds of road segments through a time-lagged correlation coefficient function and uses the local stationarity of the correlation coefficient to estimate the missing travel speeds [22].
However, the objective of our research is to actively reduce the data demand by identifying critical segments, whereas these previous studies aimed to passively reduce the negative impacts of missing data.
In summary, our main contributions are as follows:
(1) Because application constraints require traffic information to be collected on as few segments as possible in order to reduce acquisition cost, we put forward a new research issue: how to identify, in the most effective way, the critical segments that guarantee traffic state prediction accuracy for all segments in the road network.
(2) We propose a heuristic algorithm that uses the attention mechanism to select, as the critical segments, those segments whose missing traffic information is hard to recover from other segments.
(3) We conduct an experimental study showing that prediction with the data of 2.7% of the segments can meet the accuracy demand. The critical segment schemes are highly cost-efficient and suggest a cost-saving approach for various information acquisition techniques.
1.1. Problem Formation
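We describe the problem concisely as follows:

$$\min_{x} \; \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( y_{ij} - \hat{y}_{ij} \right)^{2} \qquad (1)$$

$$\text{s.t.} \quad \sum_{j=1}^{M} x_{j} \le C, \qquad (2)$$

$$x_{j} \in \{0, 1\}, \quad j = 1, \dots, M, \qquad (3)$$

where $y_{ij}$ is the true vehicle speed of segment $j$ in training sample $i$, $\hat{y}_{i} = (\hat{y}_{i1}, \dots, \hat{y}_{iM})$ is the vector of predicted speeds, $x_{j}$ is the decision variable of the $j$th segment with $x_{j} = 1$ if it is selected, and $C$ represents the cost limit; $N$ and $M$ denote the numbers of training samples and segments. Objective function (1) minimizes the mean square error between the true speed and the predicted speed on all the segments:

$$\hat{y}_{i} = f\left( \tilde{H}_{i} \right), \qquad (4)$$

$$\tilde{H}_{i} = \left\{ h_{ij} \mid x_{j} = 1, \; j = 1, \dots, M \right\}, \qquad (5)$$

where $h_{ij}$ is the history information of vehicle dynamics on segment $j$ in sample $i$ and $f$ represents the complicated prediction relation from the history information to the future vehicle speed. Equation (5) indicates that the history information is collected only from the selected critical segments.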
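As a minimal sketch of how objective (1) and equation (5) fit together, the following Python code evaluates the mean square error of a candidate selection of segments. The function names and the toy predictor `last_value_mean_predictor` are illustrative assumptions, not part of the original method; the predictor merely stands in for the learned prediction relation.

```python
from typing import Callable, List, Optional, Sequence

# One entry per segment; None where no history is collected.
History = List[Optional[List[float]]]


def mask_history(history: Sequence[List[float]], selected: Sequence[int]) -> History:
    """Keep history only on selected segments, in the spirit of equation (5)."""
    return [list(h) if x == 1 else None for h, x in zip(history, selected)]


def objective(
    samples: Sequence[Sequence[List[float]]],
    targets: Sequence[Sequence[float]],
    selected: Sequence[int],
    predictor: Callable[[History], List[float]],
    cost_limit: int,
) -> float:
    """Mean square error over all samples and segments, as in objective (1)."""
    if sum(selected) > cost_limit:
        raise ValueError("selection exceeds the cost limit")
    total, count = 0.0, 0
    for history, target in zip(samples, targets):
        predicted = predictor(mask_history(history, selected))
        for y, y_hat in zip(target, predicted):
            total += (y - y_hat) ** 2
            count += 1
    return total / count


def last_value_mean_predictor(masked: History) -> List[float]:
    """Hypothetical stand-in for the prediction relation: the mean of the
    latest observed speeds, used as the prediction for every segment."""
    latest = [h[-1] for h in masked if h is not None]
    if not latest:
        raise ValueError("at least one segment must be selected")
    mean = sum(latest) / len(latest)
    return [mean] * len(masked)


# Toy data: one sample, three segments, two historical speeds per segment.
samples = [[[30.0, 32.0], [20.0, 22.0], [40.0, 42.0]]]
targets = [[32.0, 22.0, 42.0]]
score_first = objective(samples, targets, [1, 0, 0], last_value_mean_predictor, 1)
score_second = objective(samples, targets, [0, 1, 0], last_value_mean_predictor, 1)
```

With this toy data, keeping the first segment gives a lower error than keeping the second, which illustrates why the choice of segments matters even before any learning is involved.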
The problem can then be divided into two subproblems. (a) Assignment of the decision variables: the contribution of each single segment to vehicle speed prediction for the road network is heterogeneous and is influenced by the road network topology and the traffic assignment. Critical segment identification is a combinatorial optimization problem, and we have to design a heuristic algorithm for this NP-hard problem. (b) Establishment of the prediction relation by a machine learning model: this is a typical nonlinear regression problem.
2. Method
2.1. GNN-Based Machine Learning Model for Vehicle Speed Prediction
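Among various GNN variants, the graph attention network [25] attracts our attention. We exploit the self-attention mechanism to explore the contribution of each segment to predicting vehicle speed for the whole network. Adapted to our problem, a single graph attentional layer is described as follows:

$$h_{i}^{(l+1)} = \sigma\left( \sum_{j \in \mathcal{N}_{i}} \alpha_{ij} W h_{j}^{(l)} \right), \qquad (6)$$

where $h_{i}^{(l)} \in \mathbb{R}^{F}$ is the hidden feature vector of segment $i$ in the $l$th layer. Initially, $h_{i}^{(0)}$ is the history information recorded on segment $i$, while the output of the last layer is the ultimate predicted speed of segment $i$. $\mathcal{N}_{i}$ is the set of neighbor segments of segment $i$. $W \in \mathbb{R}^{F' \times F}$ is a learnable weight matrix shared by every segment, where $F'$ and $F$ equal the dimensionality of the feature vector in the next layer and the current layer, respectively. $\sigma$ is the activation function, and $\alpha_{ij}$ is the attention coefficient indicating the contribution of the history information on segment $j$ to predicting the vehicle speed on segment $i$. $\alpha_{ij}$ is calculated by the attention mechanism as follows:

$$\alpha_{ij} = \frac{\exp\left( e_{ij} \right)}{\sum_{k \in \mathcal{N}_{i}} \exp\left( e_{ik} \right)}, \qquad e_{ij} = a\left( W_{a} h_{i}^{(l)} \,\Vert\, W_{a} h_{j}^{(l)} \right),$$

where $a$ is an independent feedforward network shared by any pair of $i$ and $j$, and $e_{ij}$ quantifies the importance of segment $j$ to segment $i$. $W_{a}$ is another learnable weight matrix, which transforms the hidden feature vectors into higher-level features before they are fed to $a$, and $\Vert$ is the concatenation operation.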
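The layer described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the graph is fully connected with self-loops, the shared scoring network is reduced to a single linear map followed by a Leaky ReLU (the model in this paper uses a deeper feedforward network), and all names (`attention_layer`, `score_weight`, `score_vector`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def leaky_relu(value: float, slope: float = 0.2) -> float:
    return value if value > 0 else slope * value


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over one row of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(
    features: List[Vector],
    weight: Matrix,
    score_weight: Matrix,
    score_vector: Vector,
) -> Tuple[List[Vector], Matrix]:
    """One graph attentional layer on a fully connected graph.

    Returns the next-layer features and the attention matrix alpha, where
    alpha[i][j] is the weight of segment j when predicting segment i.
    """
    count = len(features)
    projected = [matvec(score_weight, h) for h in features]
    alpha: Matrix = []
    for i in range(count):
        scores = []
        for j in range(count):
            pair = projected[i] + projected[j]  # concatenation of two vectors
            raw = sum(a * p for a, p in zip(score_vector, pair))
            scores.append(leaky_relu(raw))
        alpha.append(softmax(scores))
    messages = [matvec(weight, h) for h in features]
    width = len(messages[0])
    outputs = []
    for i in range(count):
        mixed = [
            sum(alpha[i][j] * messages[j][k] for j in range(count))
            for k in range(width)
        ]
        outputs.append([leaky_relu(v) for v in mixed])
    return outputs, alpha


# Toy example: three segments with two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, alpha = attention_layer(features, identity, identity, [0.5, -0.5, 0.25, 0.25])
self_attention = [alpha[i][i] for i in range(len(features))]
```

The diagonal entries collected in `self_attention` are the self-attention coefficients that the greedy selection relies on; each row of `alpha` sums to one because of the softmax normalization.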
Here, we consider the neighbor segments $\mathcal{N}_{i}$. In graph theory, a neighbor is a node linked directly to the current node. On a road network, the layout of segments is constrained by geographic location and there are no explicit links, so spatially close segments look like neighbors, such as segments 20, 22, and 69 in Figure 2. However, can we assert that segment 56 has no correlation with segment 62 just because they are on opposite sides of the network? Certainly not. A drastic increase of vehicle flow on segment 56 may give rise to congestion on segment 62 in the next time interval. The road network is a complex system that is not only an underlying topological structure but also a carrier of traffic dynamics. Thus, we treat the road network as a fully connected network in which any pair of segments has a link and can be fed into the machine learning model. The correlation strength on each link is quantified by the attention mechanism.
2.2. Attention-Based Greedy Algorithm for Critical Segments Identification
The previous section solves the second subproblem in Section “Problem Formation,” which is the establishment of the prediction relation by a machine learning model. For critical segment identification, we design a heuristic algorithm based on a by-product of the model, the attention coefficient $\alpha_{ij}$. After the model completes training, $\alpha_{ij}$ for each pair of segments is calculated on the test set. An indicator needs to be designed to decide heuristically which segment’s data is abandoned or retained in each step.
Many studies on the centrality of nodes in the field of complex systems [26, 27] have indicated that the effect of a node set is not a simple combination of the sorted effects of its individual nodes. The effects of the history information of different segments on vehicle speed prediction are redundant and replaceable. Then, what is irreplaceable? A segment $j$ with a higher self-attention $\alpha_{jj}$ is one whose speed prediction depends mainly on its own history information. The vehicle dynamics on this segment are more independent of the road network than those on segments with lower self-attention. If the data of this segment is missing, it is hard to extract useful features from other segments’ data for its prediction.
Following this clue, we propose a greedy algorithm that generates the critical segment scheme iteratively. In each step, the segment with the lowest self-attention (see the red symbols in Table 1) is removed in a greedy way:

$$j^{*} = \arg\min_{j \in S} \alpha_{jj}, \qquad S \leftarrow S \setminus \{ j^{*} \},$$

where $S$ is the set of remaining segments with history information. The machine learning model is retrained to renew $\alpha$ in every iteration. The iteration stops when the number of remaining segments drops to the cost limit $C$, as illustrated in Algorithm 1.
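The iteration described above can be sketched as the following loop. The callback `fit_self_attention` is a hypothetical placeholder for retraining the model on the remaining segments and reading off their self-attention coefficients; the fixed scores in the usage example are made-up values for illustration only.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def greedy_critical_segments(
    segments: Sequence[Hashable],
    cost_limit: int,
    fit_self_attention: Callable[[List[Hashable]], Dict[Hashable, float]],
) -> List[Hashable]:
    """Iteratively drop the segment with the lowest self-attention.

    fit_self_attention(remaining) is assumed to retrain the prediction model
    on the remaining segments and return their self-attention coefficients.
    """
    if cost_limit < 1:
        raise ValueError("cost_limit must be at least 1")
    remaining = list(segments)
    while len(remaining) > cost_limit:
        # Renew the coefficients, since they change once a segment is removed.
        scores = fit_self_attention(remaining)
        weakest = min(remaining, key=lambda segment: scores[segment])
        remaining.remove(weakest)
    return remaining


# Toy usage with fixed, made-up scores instead of a retrained model.
toy_scores = {"A": 0.10, "B": 0.30, "C": 0.90, "D": 0.80}
critical = greedy_critical_segments(
    ["A", "B", "C", "D"],
    cost_limit=2,
    fit_self_attention=lambda remaining: {s: toy_scores[s] for s in remaining},
)
```

Passing the retraining step as a callback keeps the selection logic independent of the model, so the same loop can be reused with a different removal indicator.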

Equation (6) is applied only to the remaining segments (the GNN blocks in Figure 3). For the removed segments, which have no history information, the predicted vehicle speed is computed from the hidden features of the remaining segments generated in the GNN blocks, as follows (the linear blocks in Figure 3):

$$\hat{y}_{i} = \sum_{j \in S} w_{ij} h_{j}, \qquad i \notin S,$$

where $h_{j}$ is the hidden feature of remaining segment $j$ and $w_{ij}$ is a learnable weight (regression coefficient). The difference between $w_{ij}$ and $\alpha_{ij}$ is that $w_{ij}$ is a constant, while $\alpha_{ij}$ changes per sample in different time intervals.
3. Experiment and Result
3.1. Data and Machine Learning Model Configuration
The data we use are vehicle trajectories recorded during ride-hailing orders within the second ring road area of Xi’an. The data come from the DiDi platform and span from 10/01/2016 to 11/29/2016. The GPS points in the dataset cover the whole road network in Figure 2 and are processed by routing to ensure that the data correspond to the actual road information. The collection interval of the GPS points is 2–4 s. The main fields in the dataset are the driver ID, order ID, timestamp, longitude, and latitude. After data preprocessing, the average vehicle speed on each segment is obtained for every 5 minutes between 6:00 AM and 10:00 PM. We take the vehicle speed and flow volume in the previous two hours as the historical traffic information to predict the vehicle speed in the next 5 minutes. The training set is the data of the first 48 days (10/01/2016–11/17/2016), and the test set for evaluation is the data of the last 12 days (11/18/2016–11/29/2016).
The detailed structure of the proposed model is shown in Figure 4. A single GNN block consists of two graph attentional layers using Leaky ReLU activation and one Conv1D layer using linear activation. The numbers of neurons in these layers are 32, 16, and 1, respectively. The attention coefficients calculated in the first graph attentional layer are the ones adopted in the greedy algorithm. A feedforward neural network consisting of three Conv1D layers using Leaky ReLU activation and one SoftMax layer is incorporated into each graph attentional layer to compute the attention coefficients. The numbers of neurons in these Conv1D layers are 16, 16, and 1, respectively. The hidden features output by the second graph attentional layer are taken as the inputs of the Linear Block. The Linear Block consists of one linear regression layer and one Conv1D layer using linear activation. The number of regression coefficients in the linear regression layer equals the number of remaining segments multiplied by the number of removed segments. The neural network is trained with the Adam optimizer, with a batch size of 256 and a learning rate of 10e−3.
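The arithmetic behind this setup can be checked with a short sketch. It assumes, as a simplification not stated in the text, that history windows do not cross from one day into the next and that a single series per segment is windowed; the function names are illustrative.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple

INTERVAL_MIN = 5            # aggregation interval
DAY_START_MIN = 6 * 60      # 6:00 AM
DAY_END_MIN = 22 * 60       # 10:00 PM
HISTORY_MIN = 120           # previous two hours


def intervals_per_day() -> int:
    return (DAY_END_MIN - DAY_START_MIN) // INTERVAL_MIN


def history_steps() -> int:
    return HISTORY_MIN // INTERVAL_MIN


def make_samples(day_series: Sequence[float]) -> List[Tuple[List[float], float]]:
    """Build (history window, next value) pairs from one day of one segment."""
    steps = history_steps()
    return [
        (list(day_series[t - steps:t]), day_series[t])
        for t in range(steps, len(day_series))
    ]


def split_days(first: date, last_train: date, last: date) -> Tuple[List[date], List[date]]:
    """Split the inclusive date range into training days and test days."""
    days = [first + timedelta(days=d) for d in range((last - first).days + 1)]
    train = [d for d in days if d <= last_train]
    test = [d for d in days if d > last_train]
    return train, test


train_days, test_days = split_days(date(2016, 10, 1), date(2016, 11, 17), date(2016, 11, 29))
one_day = [float(t) for t in range(intervals_per_day())]
day_samples = make_samples(one_day)
```

Under these assumptions, each day contains 192 five-minute intervals, each history window covers 24 intervals, and one segment yields 168 samples per day.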
3.2. Result of Critical Segments Identification for Vehicle Speed Prediction
The results are shown in Figure 5(a). The green circles represent the average prediction accuracy over the road network for a given number of critical segments selected by the greedy algorithm of Section 2.2, referred to as greedy algorithm I. For comparison, we design greedy algorithm II, which removes the data of the segment with the minimal contribution in each iteration:

$$j^{*} = \arg\min_{j \in S} c_{j},$$

where $S$ is the set of remaining segments and the contribution $c_{j}$ of segment $j$ is the sum of its contributions to the other segments (see the green symbols in Table 1):

$$c_{j} = \sum_{i \ne j} \alpha_{ij}.$$
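Figure 5: (a) Average prediction accuracy for different numbers of critical segments; (b) prediction errors on all the segments.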
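The two removal indicators can be contrasted with a small sketch. It assumes an attention matrix `alpha` in which `alpha[i][j]` is the contribution of segment j to the prediction on segment i, already averaged over samples, and it sums contributions over the remaining segments only; both points are assumptions made for illustration, and the numbers are made up.

```python
from typing import Dict, List, Sequence


def self_attention_scores(alpha: List[List[float]], remaining: Sequence[int]) -> Dict[int, float]:
    """Indicator of greedy algorithm I: the attention a segment pays to itself."""
    return {j: alpha[j][j] for j in remaining}


def contribution_scores(alpha: List[List[float]], remaining: Sequence[int]) -> Dict[int, float]:
    """Indicator of greedy algorithm II: the attention a segment receives from others."""
    return {
        j: sum(alpha[i][j] for i in remaining if i != j)
        for j in remaining
    }


def segment_to_remove(scores: Dict[int, float]) -> int:
    """Both algorithms remove the segment with the minimal indicator value."""
    return min(scores, key=scores.get)


# Toy attention matrix for three segments; each row sums to one.
alpha = [
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.2, 0.7],
]
remaining = [0, 1, 2]
removed_by_one = segment_to_remove(self_attention_scores(alpha, remaining))
removed_by_two = segment_to_remove(contribution_scores(alpha, remaining))
```

In this toy case the two indicators disagree: the first removes segment 1, which relies least on itself, while the second removes segment 0, which is attended to least by the others.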
Greedy algorithm II retains as critical segments those that receive the most attention from other segments, which is the more intuitive solution. Its results are shown by the yellow circles in Figure 5(a). In addition, we subjectively set a lower limit (the red horizontal line in Figure 5(a)) as a reference, given by the conventional historical-average method, which takes the average vehicle speed in the same time interval on historical days (weekdays and weekends are distinguished) as the predicted value for each segment.
We observe that (a) the accuracy of our GNN-based machine learning model with the complete data of all 75 segments is nearly 12% higher than that of the conventional method (the leftmost green circle in Figure 5(a)); (b) greedy algorithm I generates a scheme (called scheme I) containing only two critical segments, with which the prediction accuracy from incomplete data is still above the lower limit (the second green circle from the right in Figure 5(a)); (c) the scheme generated by greedy algorithm II that meets this demand (called scheme II) needs six critical segments (the sixth yellow circle from the right in Figure 5(a)); and (d) as the number of selected critical segments decreases, the prediction accuracy descends. In each iteration, greedy algorithm I is superior to greedy algorithm II.
The prediction errors of the proposed model on all the segments are shown in Figure 5(b). The values on the x-axis are the segment numbers in the road network. The blue bars are generated using incomplete data that contain only the history information on the critical segments of scheme I. The orange bars are generated using the complete data of the whole road network.
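The historical-average reference described above can be sketched as follows for a single segment. The data layout (a dictionary from calendar day to a list of per-interval speeds) and the function names are assumptions for illustration; the speeds are made-up values.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Key = Tuple[bool, int]  # (is weekend, interval index within the day)


def is_weekend(day: date) -> bool:
    return day.weekday() >= 5  # Saturday is 5, Sunday is 6


def fit_historical_average(history: Dict[date, List[float]]) -> Dict[Key, float]:
    """Average speed per interval, kept separately for weekdays and weekends."""
    sums: Dict[Key, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for day, speeds in history.items():
        for interval, speed in enumerate(speeds):
            cell = sums[(is_weekend(day), interval)]
            cell[0] += speed
            cell[1] += 1.0
    return {key: total / count for key, (total, count) in sums.items()}


def predict(table: Dict[Key, float], day: date, interval: int) -> float:
    """Predicted speed: the historical mean for the same interval and day type."""
    return table[(is_weekend(day), interval)]


# Toy history: two weekdays and one Saturday, two intervals per day.
history = {
    date(2016, 10, 3): [30.0, 40.0],
    date(2016, 10, 4): [34.0, 44.0],
    date(2016, 10, 8): [50.0, 60.0],
}
table = fit_historical_average(history)
weekday_prediction = predict(table, date(2016, 11, 18), 0)
weekend_prediction = predict(table, date(2016, 11, 19), 1)
```

This baseline needs no model training, which is why it serves as a convenient lower limit for judging whether a scheme with few critical segments is still worthwhile.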
3.3. Interpretability of Critical Segments from Traffic Perspective
To gain insight into the characteristics of the critical segments, we visualize scheme I and scheme II on the road network, as shown in Figure 6. We find that both schemes prefer segments on the margin of the road network. These segments sensitively perceive the external vehicle flow entering the network and the internal vehicle flow leaving it. Monitoring the flow information on them makes it convenient to estimate the total volume of vehicle flow as well as the level of congestion in the road network.
Figure 6: Scheme I and scheme II visualized on the road network, panels (a) and (b).
Segments 24 and 29, which run in the east-west direction, form the only expressway in this road network. Since the entrances and exits of the expressway are controlled by flyover crossings, the vehicle speed on it has strong continuity over time and is less disturbed by inflow from other segments. This result accords well with the logic of greedy algorithm I, which selects the critical segments with maximal self-attention. The right-turn lane on segment 29, which carries heavy traffic, is a main link between the expressway and the road network. Scheme I also indicates that the current vehicle dynamics on these two critical segments deserve the most attention if we want to foresee the traffic situation of the whole road network.
3.4. Interpretability of Critical Segments from Machine Learning Perspective
To analyze why scheme I is an efficient design (scheme I needs fewer critical segments than scheme II to meet the prediction demand), we examine the representation of each segment learned by the machine learning model, using a technique developed for the visualization of high-dimensional features called t-Distributed Stochastic Neighbor Embedding (t-SNE) (see Figure 7). Specifically, a two-dimensional embedding is generated from the hidden features output by the first layer of the GNN blocks by running the t-SNE algorithm, which tends to map the representations of perceptually similar states to nearby points [28]. In other words, nodes located close together in the subplots of Figure 7 indicate that the hidden features extracted from the history information recorded on these segments are highly similar. The two nodes representing the two critical segments selected in scheme I are far apart in both the morning peak hours (Figure 7(a)) and the evening peak hours (Figure 7(c)). Their features are uncorrelated, so together they have adequate ability to express the features of the other nodes. In contrast, several of the six nodes selected in scheme II are densely clustered, especially in the evening peak, as shown in Figure 7(d), where the seventy-five nodes are clustered into six categories by k-means and five of the critical segments selected in scheme II fall into the same category (green), indicating that the configuration of scheme II is highly redundant. This is why the number of selected segments in scheme II is triple that of scheme I, yet the performance of scheme II is no better than that of scheme I. We conclude that a scheme is favorable when the hidden feature embeddings of its critical segments belong to different categories across different time intervals.
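The redundancy argument above can be expressed as a simple check: given a cluster label for each segment (for example from k-means on the embedded features), count how many distinct clusters a scheme covers relative to its size. The function name, segment identifiers, and cluster labels below are all hypothetical.

```python
from typing import Dict, Hashable, Sequence


def cluster_coverage(labels: Dict[Hashable, int], scheme: Sequence[Hashable]) -> float:
    """Fraction of selected segments that bring a distinct cluster.

    A value of 1.0 means every critical segment lies in a different cluster;
    lower values indicate redundant selections.
    """
    if not scheme:
        raise ValueError("scheme must contain at least one segment")
    covered = {labels[segment] for segment in scheme}
    return len(covered) / len(scheme)


# Made-up cluster labels for eight segments, with six possible clusters (0 to 5).
labels = {"s1": 0, "s2": 3, "s3": 1, "s4": 1, "s5": 1, "s6": 1, "s7": 1, "s8": 4}
dispersed_scheme = ["s1", "s2"]                         # two segments, two clusters
redundant_scheme = ["s3", "s4", "s5", "s6", "s7", "s8"]  # six segments, five in one cluster
dispersed_coverage = cluster_coverage(labels, dispersed_scheme)
redundant_coverage = cluster_coverage(labels, redundant_scheme)
```

A scheme like the dispersed one scores 1.0, while a scheme with five of six segments in one cluster scores only one third, mirroring the contrast drawn between the two schemes in the text.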
Figure 7: t-SNE embeddings of the hidden features of the segments, panels (a) to (d).
3.5. Relation between Prediction Accuracy Improvement and the Number of Critical Segments
Apart from meeting the required prediction accuracy, we also consider efficiency, meaning the degree to which an equivalent amount of history information is converted into prediction accuracy. A representative case is demonstrated in Figure 8. We use greedy algorithm I to generate three schemes containing one, two, and three critical segments, respectively. The second scheme, containing two critical segments, is the same as the aforementioned scheme I. The first and second schemes are taken as one comparison pair, and the second and third schemes as another. When the number of critical segments rises from one to two, the prediction accuracy on eighteen segments improves markedly (see Figure 8(a), highlighted in orange), whereas when the number increases further to three, the improvement of prediction accuracy exceeds 2% on only one segment (Figure 8(b)). The number beside each segment in the figure is the reduction of prediction error (MAPE). Comparing Figure 8(a) with Figure 8(b), we find that the growth of prediction accuracy differs significantly although the increment in the number of critical segments is the same. In other words, the average benefit for vehicle speed prediction brought by a unit amount of history information, taken as one segment, differs among schemes.
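The per-segment comparison described above can be sketched as follows, using the standard definition of the mean absolute percentage error (MAPE). The function names and the speeds are illustrative assumptions; the 2% threshold is the one mentioned in the text.

```python
from typing import Dict, Hashable, List, Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


def mape_reduction(
    y_true: Sequence[float],
    pred_before: Sequence[float],
    pred_after: Sequence[float],
) -> float:
    """Reduction of MAPE (in percentage points) when moving to a richer scheme."""
    return mape(y_true, pred_before) - mape(y_true, pred_after)


def improved_segments(reductions: Dict[Hashable, float], threshold: float = 2.0) -> List[Hashable]:
    """Segments whose error reduction exceeds the threshold."""
    return [segment for segment, value in reductions.items() if value > threshold]


# Made-up speeds for two segments under two schemes.
truth = {"s1": [40.0, 50.0], "s2": [30.0, 60.0]}
before = {"s1": [36.0, 45.0], "s2": [27.0, 54.0]}
after = {"s1": [38.0, 48.0], "s2": [27.3, 54.6]}
reductions = {
    segment: mape_reduction(truth[segment], before[segment], after[segment])
    for segment in truth
}
marked = improved_segments(reductions)
```

In this toy case only the first segment improves by more than two percentage points, so it is the only one that would be highlighted.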
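Figure 8: Reduction of prediction error (MAPE) on each segment when the number of critical segments rises (a) from one to two and (b) from two to three.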
3.6. Relation between the Prediction Accuracy Improvement and the Number of Critical Segments
Collecting vehicle flow information on segments incurs a cost, whether by means of floating cars or sensors. In the practical application discussed in the introduction, the cost may be the maintenance cost or the production cost of sensors. While the prediction accuracy improves with the number of critical segments, the scheme cost also increases. Assuming that information collection for one segment is a unit cost, the cost-efficiency, which measures the balance between the performance and the cost of a scheme, is quantified from three quantities: the prediction accuracy of the current scheme, the prediction accuracy of the benchmark scheme containing only one critical segment, and the number $n$ of selected critical segments in the current scheme. The cost-efficiency of the schemes generated by greedy algorithm I shows an overall downward trend as $n$ increases. As shown in Figure 9, the curve descends rapidly before $n$ reaches 8 and then gradually flattens. Comparing Figure 5(a) with Figure 9, we consider that schemes containing a smaller number of critical segments are more advisable if extremely high prediction accuracy is not required. Besides maximizing prediction accuracy, the cost-efficiency can serve as a reference index to aid decision-making.
4. Discussions and Conclusions
The aim of our research is to identify a small number of critical segments so as to significantly reduce the amount of traffic information to be collected while accepting a slight loss in prediction accuracy, rather than blindly pursuing extremely high prediction accuracy. We draw the following conclusions:
(1) In the experimental road network, the average prediction accuracy of travel speed over all the segments, obtained by the prediction model using the historical traffic information collected from only 2.7% of the segments (the critical segments), is superior to that of the conventional method using the historical average. The proposed greedy method can efficiently identify the critical segments needed to understand the traffic state of the whole road network.
(2) Using t-SNE, a visualization technique for high-dimensional features, we find that a scheme of critical segments is favorable if the two-dimensional embeddings generated from the information features of the critical segments are dispersed, indicating that the traffic information of the critical segments is not redundant.
(3) The cost-efficiency of information acquisition, meaning the efficiency with which an equivalent amount of traffic information improves prediction accuracy, declines continuously as the information becomes richer. Traffic information acquisition should therefore weigh the acquisition cost against the prediction accuracy requirements of the traffic state.
(4) The results suggest a way to save cost in information acquisition techniques. For example, since the traffic flow information of only a small number of critical segments needs to be recorded, the trip distance of floating cars and the number of sensors to be installed or maintained can be cut down dramatically.
Our research can be improved in two directions in the future. First, more elaborate prediction models can be designed to establish a more precise relation between the historical traffic information and the predicted vehicle speed, such as existing hybrid models that combine a GNN with a recurrent neural network [29] and models that dynamically capture the spatial dependencies of traffic flows [30]. Second, the critical segment identification method can be further refined to find a better combination of critical segments. Both directions aim to make the curve in Figure 5(a) decline more slowly.
Data Availability
The data supporting the results of our study can be found at https://outreach.didichuxing.com/research/opendata/.
Conflicts of Interest
The authors declare that they have no conflicts of interest.