Abstract

Accurate traffic state prediction plays an important role in applications such as traffic guidance and travel planning. Complex spatio-temporal relationships make forecasting challenging. First, in terms of spatial correlation, some models consider only the road network structure and ignore the relative location relationships between nodes. Second, some models ignore the different impacts that nodes in the global road network have on traffic. To solve these problems, we propose a new traffic state forecasting model, the spatio-temporal attention-gated recurrent neural network (ST-AGRNN). In the proposed model, structure-based and location-based localized spatial features are obtained simultaneously by a graph convolutional network (GCN) and DeepWalk, respectively. The localized temporal features are obtained by a gated recurrent unit (GRU). An attention-based approach is used to obtain global spatio-temporal features. Experimental validation on two real-world public datasets shows that the ST-AGRNN model outperforms state-of-the-art methods.

1. Introduction

Traffic congestion is a common problem faced by almost all major cities. Because of traffic congestion, a lot of manpower and material resources are wasted every year. Accurate and real-time traffic state prediction is the basis to solve the problem of traffic congestion. On the one hand, people can plan their trips in advance through traffic-state information. On the other hand, traffic managers also conduct effective traffic guidance and management through traffic state prediction information. At the same time, traffic prediction is a typical spatio-temporal problem, and the inherent nonlinearity and complexity of traffic affect the accuracy of prediction. Therefore, integrated consideration of temporal and spatial characteristics is necessary for traffic state prediction.

Taking Figure 1 as an example, there are localized and global spatio-temporal correlations. Each node influences the traffic of its neighbors because it is physically connected to them in an upstream-downstream relationship; this is spatial dependence. Each node also affects its own state at the next time step; this is temporal dependence. Together, these are the localized spatio-temporal correlations. In addition, a busy intersection influences the traffic of the entire region, which is the global spatio-temporal correlation in the road network. Capturing these correlations is crucial to spatio-temporal data prediction.

In previous studies, various deep learning approaches were used to model spatio-temporal correlations, including stacked autoencoders (SAEs) [1], recurrent neural networks (RNNs) [2], generative adversarial networks (GANs) [3], transformers [4, 5], convolutional neural networks (CNNs) [6], and spatio-temporal graph convolutional networks (STGCN) [7]. SAEs acquire spatial and temporal correlations through unsupervised learning. RNNs extract temporal features through gating mechanisms. GANs extract spatio-temporal features through generators and discriminators, and transformers model spatial and temporal dependencies through an encoder–decoder architecture. CNNs and graph convolutional networks (GCNs) obtain spatial features through convolution operations. However, these methods only capture localized spatio-temporal correlations.

Recently, attention mechanisms have attracted growing interest. Because they are effective at identifying the relevance of inputs to a prediction, components with high relevance are given greater weight. They have been successfully applied in many fields, such as natural language processing (NLP) [8], computer vision (CV) [9, 10], and speech recognition [11]. Attention-based traffic forecasting has also developed rapidly in recent years. For example, the attention temporal graph convolutional network (A3T-GCN) [12] uses an attention mechanism to obtain global temporal and spatial correlations. However, it ignores location-based localized spatial information.

To obtain complex localized and global spatio-temporal correlations, we propose a novel deep learning architecture—spatio-temporal attention-gated recurrent neural network (ST-AGRNN)—for traffic state prediction. To fully exploit the localized spatio-temporal correlations, ST-AGRNN learns structure-aware graph embedding information through a GCN, and obtains position-aware information through DeepWalk. To tackle temporal dependencies, a gated recurrent unit (GRU) is used. Finally, in order to fully exploit the global spatio-temporal correlations, the attention mechanism is used to obtain spatio-temporal correlations about the networks.

The main contributions of this work are as follows:
(i) We propose a new localized spatial feature extraction method by combining DeepWalk with a GCN, where DeepWalk obtains position-aware information and the GCN obtains structure-aware graph embedding information.
(ii) Traffic state data are time series data: the current traffic state affects the traffic state at the next time step. A GRU is used to obtain the localized temporal correlation in the traffic data.
(iii) Attention mechanisms are introduced to obtain global spatio-temporal correlations across the network. Different nodes have different impacts on the traffic state, and the attention mechanism learns node weights from the historical traffic state, representing the global spatio-temporal correlations of the network.
(iv) Our experiments applying ST-AGRNN to traffic state prediction show that ST-AGRNN outperforms 12 state-of-the-art methods in terms of both accuracy and robustness on two benchmark datasets.

2. Literature Review

2.1. Traffic State Forecasting

Time series modeling and prediction are widely used in many fields [13, 14], and traffic state data are typical time series data. There are two main categories of traffic forecasting methods: statistical methods and machine learning methods. Statistical methods include the autoregressive integrated moving average (ARIMA), the Kalman filter (KF), Markov chains, exponential smoothing (ES), and Bayesian networks. In the 1970s, Ahmed and Cook [15] used ARIMA to predict short-term traffic flow. Hamed et al. [16] later applied a simple ARIMA model to predict traffic volumes in urban arterials. Subsequently, several variants of ARIMA emerged [17–19]. Kalman filtering excels in regression problems. Guo et al. [20] applied an adaptive Kalman filtering model to predict short-term traffic flow. Hinsbergen et al. [21] used a localized extended Kalman filter (L-EKF) to estimate traffic states. In addition, traffic prediction methods based on Markov chains, exponential smoothing, and Bayesian networks also perform well. For example, Qi et al. [22] proposed a hidden Markov model (HMM) to achieve short-term freeway traffic prediction during peak periods. Chan et al. [23] employed the hybrid exponential smoothing method and the Levenberg–Marquardt (LM) algorithm for short-term traffic flow forecasting. Wang et al. [24] used an improved Bayesian combination method (BCM) for short-term traffic flow prediction.

Statistical methods have some disadvantages, such as the inability to deal with nonlinear relationships between data. Machine learning methods, on the other hand, are more flexible. Machine learning methods are mainly divided into classical machine learning and deep learning.

Commonly used classical machine learning approaches include k-nearest neighbors (KNN), support-vector machine (SVM), random forest (RF), and decision tree (DT) methods. Cai et al. [25] proposed an improved KNN model to achieve short-term traffic multistep forecasting. Xu et al. [26] used kernel k-nearest neighbors (kernel-KNN) to predict road traffic states in time series. Cong et al. [27] presented a traffic flow prediction model based on the least squares support-vector machine, in which FOA automatically sets the model's two parameters to appropriate values. Xu et al. [28] used genetic programming (GP) and random forest (RF) techniques to achieve real-time crash prediction on freeways. Crosby et al. [29] proposed a spatially intensive decision tree for the prediction of traffic flow across the entire UK road network. Although classical machine learning methods are effective in identifying nonlinear relationships in traffic states, they still have many drawbacks. KNN models have low prediction accuracy for rare categories and a high computational cost when there are many features. It is difficult to choose a suitable kernel function for an SVM model. Random forests do not perform very well on high-dimensional sparse data. In addition, decision trees are prone to overfitting.

In order to solve the above problems, deep learning has been developed rapidly in recent years. The key to traffic prediction is to learn the temporal dependence and spatial dependence, where the methods to learn the temporal dependence are mainly recurrent neural networks (RNNs) and their variants long short-term memory (LSTM) and gated recurrent units (GRUs). Nejadettehad et al. [30] used three kinds of recurrent neural networks to predict short-term traffic flow. Van et al. [31] used recurrent neural networks to predict freeway travel time. Tian et al. [32] took advantage of LSTM to dynamically determine the optimal time lags to predict short-term traffic flow. Fu et al. [33] used LSTM and GRU methods to predict short-term traffic flow. These models consider the temporal dependence but ignore the spatial dependence in the road network. Therefore, they cannot accurately predict changes in the traffic state. Obtaining the temporal and spatial dependence is a prerequisite for accurate traffic prediction. There are also many models for the learning of spatial features. For example, Lv et al. [34] proposed a stacked autoencoder model to inherently learn the spatial and temporal correlations for traffic flow prediction. Yuan et al. [35] proposed a novel variable-wise weighted stacked autoencoder (VW-SAE) for hierarchical, layer-by-layer output-related feature representation. Ma et al. [36] proposed a convolutional neural network (CNN)-based model to learn traffic as images and predict large-scale, network-wide traffic speed. Wu et al. [37] proposed a model called CLTFP, which combines CNN and LSTM, to forecast future traffic flow. Jo et al. [38] adopted a convolutional neural network (CNN) to deal with map images representing traffic states and the model adopts images for both the input and the output of a CNN model to predict traffic speeds.

Although the above methods can handle spatial dependencies in traffic, CNNs are more suitable for Euclidean spatial structures such as pictures, and grids. Meanwhile, traffic road networks are complex networks, and the neighboring nodes are not fixed. Thus, the spatial features of the road network cannot be fully obtained by CNNs. In recent years, graph-based convolution operations have developed rapidly [39], and have become suitable for learning the structural features of graph types. He et al. [40] used LDA and GCN to tackle road link speed prediction. Li et al. [41] proposed a DCRNN model for obtaining spatio-temporal dependence in traffic flow forecasting; the model uses diffusion convolution to learn spatial dependence and a GRU to learn temporal dependence. Wu et al. [42] learned an adaptive dependency matrix via node embedding to obtain spatial dependency and temporal dependency through stacked dilated 1D convolution. Huang et al. [43] proposed a new graph attention network, cosAtt, to obtain spatial features through cosAtt and GCN and temporal features through a GLU. Roy et al. [44] consider important daily patterns and present-day patterns from traffic data in addition to spatio-temporal characteristics to improve the accuracy of predictions. However, these methods only consider the spatial features based on structure-aware graph embedding information, without considering the location information, so they cannot effectively obtain the spatial features.

2.2. Attention Mechanism

The attention mechanism has been a hot topic of neural network research in recent years, and it has achieved remarkable results in neural machine translation, image captioning, time series prediction, and other tasks. The attention mechanism originates from the study of human vision: it determines which part of the input needs to be attended to and allocates processing resources to the important parts. Bahdanau et al. [45] proposed the use of an attention mechanism in the decoder to decide which part of the input sentence should be attended to. Xu et al. [46] introduced the application of soft and hard attention mechanisms to image captioning. Li et al. [47] proposed convolutional self-attention, which further improves the Transformer's performance in time series forecasting. Daiya et al. [48] proposed a multimodal deep learning architecture for stock movement prediction. Zhou et al. [49] used a ProbSparse self-attention mechanism and a distilling operation to handle quadratic time complexity and memory usage. In the area of traffic state prediction, prediction methods based on attention mechanisms are also developing rapidly. Park et al. [50] proposed the use of temporal attention, spatial attention, and spatial sentinel vectors to obtain temporal and spatial dependencies. Wang et al. [51] proposed a novel spatial-temporal graph neural network model for traffic flow prediction, in which a learnable positional attention mechanism aggregates information from adjacent roads. Guo et al. [52] proposed a novel attention-based spatio-temporal graph convolutional network (ASTGCN) to model recent, daily, and weekly dependencies.

Inspired by the above studies and considering traffic location information and spatio-temporal characteristics, we learn both location- and structure-based information to obtain localized spatial features, learn localized temporal features through a GRU, and finally capture the global spatio-temporal features of traffic networks through the attention mechanism.

3. Methodology

3.1. Data Processing

Given a speed sequence $x_1, x_2, \ldots, x_n$ of length $n$ sampled at 5-minute intervals, let $T$ denote the length of the historical window. To predict the next 15 minutes (3 time steps), for example, the input samples of the model are constructed as shown in Figure 2. The input of sample 1 is $(x_1, \ldots, x_T)$, and its label is $(x_{T+1}, x_{T+2}, x_{T+3})$. The input of sample 2 is $(x_2, \ldots, x_{T+1})$, and its label is $(x_{T+2}, x_{T+3}, x_{T+4})$. Continuing in this way yields the entire input sample matrix. Predicting the next 30 minutes is similar: the input of sample 1 is unchanged, its label becomes $(x_{T+1}, \ldots, x_{T+6})$, and the sample matrix is built in the same sliding manner. The longer the prediction horizon, the longer the label sequence.
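The sample construction described above can be sketched as a sliding window. The following is a minimal illustration rather than the authors' preprocessing code; the function name `make_samples`, the history window of 4 steps, and the toy speed values are assumptions chosen only for the example.

```python
def make_samples(series, history, horizon):
    """Slide a window over `series` and return (input, label) pairs.

    `history` is the number of past steps fed to the model and `horizon`
    is the number of future steps to predict (3 steps = 15 minutes at
    5-minute intervals). Both names are illustrative.
    """
    samples = []
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        x = series[start:start + history]
        y = series[start + history:start + history + horizon]
        samples.append((x, y))
    return samples


# Toy speed readings at 5-minute intervals (made-up values).
speeds = [60, 58, 55, 50, 47, 45, 48, 52, 57, 61]

samples_15min = make_samples(speeds, history=4, horizon=3)  # 15 min = 3 steps
samples_30min = make_samples(speeds, history=4, horizon=6)  # 30 min = 6 steps
```

Note that the input of the first sample is identical for both horizons; only the label grows with the prediction length.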

3.2. Traffic State Prediction Based on ST-AGRNN

The structure of the ST-AGRNN model is shown in Figure 3. In order to fully capture the localized spatial dependencies, we propose a new spatial feature extraction method by combining DeepWalk with a GCN, where DeepWalk obtains position-aware information and the GCN obtains structure-aware graph embedding information. The localized temporal dependencies are captured using the gated recurrent unit network, and the road network global spatio-temporal dependencies are captured using the attention mechanism. The specific details of each part of the model are presented in the next subsections.

3.2.1. Localized Spatial Dependency

Consider the urban road network as an undirected graph $G = (V, E)$, where $V$ is the set of vertices in the graph and $E$ is the set of edges. Denote the adjacency matrix of the graph by $A$. $D$ denotes the degree matrix of the graph, where $D_{ii} = \sum_j A_{ij}$ denotes the number of adjacencies of each vertex. Moreover, the Laplacian matrix of the graph is expressed as $L = U \Lambda U^{T}$ (where $U$ is an orthogonal matrix composed of eigenvectors), and the Fourier transform and inverse transform of the graph can be expressed as $\hat{x} = U^{T} x$ and $x = U \hat{x}$, respectively. A two-layer graph convolutional neural network can be represented as follows:

$$f(X, A) = \sigma\left(\hat{A}\,\mathrm{ReLU}\left(\hat{A} X W_0\right) W_1\right)$$

where $X$ denotes the feature matrix of the nodes, $A$ denotes the adjacency matrix of the graph, and $\sigma(\cdot)$ is the activation function. $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ is calculated in the preprocessing step, where $\tilde{A} = A + I$ denotes the adjacency matrix with self-connections, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $W_0$ is the weight of the input layer to the hidden layer, and $W_1$ is the weight of the hidden layer to the output layer.
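To make the two-layer graph convolution concrete, the sketch below computes the symmetrically normalized adjacency matrix with self-connections and applies two propagation steps on a three-node path graph. It is an illustrative pure-Python version, not the TensorFlow implementation used in the experiments; the sigmoid output activation, the helper names, and the weight values are assumptions.

```python
import math


def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def sigmoid(m):
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in m]


def gcn_two_layer(adj, x, w0, w1):
    """Two propagation steps: sigmoid(A_hat ReLU(A_hat X W0) W1)."""
    a_hat = normalize_adjacency(adj)
    hidden = relu(matmul(matmul(a_hat, x), w0))
    return sigmoid(matmul(matmul(a_hat, hidden), w1))


# Path graph 0 - 1 - 2 with one feature per node (made-up values).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [2.0], [3.0]]
w0 = [[0.5, -0.5]]    # input -> hidden (1 x 2), illustrative weights
w1 = [[1.0], [1.0]]   # hidden -> output (2 x 1), illustrative weights

a_hat = normalize_adjacency(adj)
out = gcn_two_layer(adj, x, w0, w1)
```

Each output row mixes information from a node's one-hop and two-hop neighborhood, which is why this embedding reflects network structure rather than relative position.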

The GCN is a structure-based graph embedding algorithm that aggregates information from neighboring nodes via convolution. The resulting embedding cannot retain the positional relationships between nodes, which are very important in a traffic network. DeepWalk's objective function forces nodes that are close in shortest-path distance to be close in the embedding space [53]. In order to fully exploit the spatial features of the road network, we introduce the DeepWalk algorithm to learn a position embedding of the nodes.

A graph embedding algorithm based on random walks places nodes that are close in shortest-path distance close together in the embedding space. The resulting embedding space therefore also preserves the relative positional relationships between nodes. These relationships are an important complement to the structure-based embedding space and are necessary for modeling spatial features in traffic.

The random walk rooted at vertex $v_i$ is represented as $W_{v_i} = (W_{v_i}^{1}, W_{v_i}^{2}, \ldots, W_{v_i}^{k})$, where $W_{v_i}^{k}$ denotes the $k$th node in the path with $v_i$ as the root. Every node in the graph has a similar path, which yields a sequence matrix $S$. The corresponding graph embedding representation containing the location information is then obtained by the update procedure, the skip-gram algorithm. The embedding representation is denoted as $E$, and the final result is obtained by the fully connected layer:

$$Y_E = W_E E + b_E$$

where $E$ denotes the graph embedding representation, while $W_E$ and $b_E$ are the learnable weights and biases, respectively.
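The walk-generation step can be illustrated as follows. This is a minimal sketch of uniform random walks and the (center, context) pairs that a skip-gram model would be trained on; it omits the skip-gram training itself. The function names, walk length, window size, and the four-node toy graph are assumptions made for the example.

```python
import random


def random_walk(neighbors, root, length, rng):
    """Uniform random walk of at most `length` nodes starting at `root`."""
    walk = [root]
    while len(walk) < length:
        candidates = neighbors[walk[-1]]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


def build_walks(neighbors, walks_per_node, length, seed=0):
    """Generate `walks_per_node` walks rooted at every node of the graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(neighbors)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(neighbors, node, length, rng))
    return walks


def skipgram_pairs(walk, window):
    """(center, context) pairs within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Toy road graph as an adjacency list: 0 - 1 - 2 - 3.
road_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

walks = build_walks(road_graph, walks_per_node=2, length=5)
pairs = skipgram_pairs(walks[0], window=2)
```

Because nodes that are few hops apart co-occur in many walks, a skip-gram model trained on such pairs tends to embed them close together, which is the positional information the GCN embedding lacks.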

3.2.2. Localized Temporal Feature

Temporal dependence is another major problem in traffic prediction. Recurrent neural network (RNN) models are very effective for time series data processing, but they suffer from vanishing and exploding gradients. GRUs and LSTM networks are RNN variants that can effectively mitigate these problems.

A GRU is used to handle temporal dependence. $f(X_t, A)$ is the output of the GCN at time $t$, $X_t$ is the traffic state at the present moment, and $r_t$ is the reset gate that determines whether the information from the previous moment is retained: if it is 1, the information is carried to the next moment; if it is 0, it is ignored. $h_{t-1}$ is the hidden state at the previous moment. $u_t$ is the update gate, a value between 0 and 1 that determines how much information is remembered from the previous moment: if it is 1, more information is remembered; if it is 0, more is forgotten. $c_t$ is the current memory content, and $h_t$ is the output at the current moment.
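The gate behavior described above can be illustrated with a single GRU step. The sketch below assumes the standard GRU formulation with a scalar input and scalar hidden state for readability, and follows the convention in the text that an update gate near 1 keeps the previous state. The parameter names and values are hypothetical, and in the full model the input would be the GCN output rather than a raw number.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x_t, h_prev, p):
    """One GRU update for a scalar input and a scalar hidden state."""
    # Update gate: how much of the previous state to keep.
    u_t = sigmoid(p["w_u"] * x_t + p["v_u"] * h_prev + p["b_u"])
    # Reset gate: how much of the previous state feeds the candidate.
    r_t = sigmoid(p["w_r"] * x_t + p["v_r"] * h_prev + p["b_r"])
    # Candidate memory content.
    c_t = math.tanh(p["w_c"] * x_t + p["v_c"] * (r_t * h_prev) + p["b_c"])
    # u_t close to 1 keeps h_prev; u_t close to 0 adopts the candidate.
    return u_t * h_prev + (1.0 - u_t) * c_t


# Hypothetical parameters, all set to 0.5 for illustration.
names = ["w_u", "v_u", "b_u", "w_r", "v_r", "b_r", "w_c", "v_c", "b_c"]
params = {name: 0.5 for name in names}

h = 0.0
hidden_states = []
for x in [0.2, 0.4, 0.1]:   # made-up, normalized inputs
    h = gru_step(x, h, params)
    hidden_states.append(h)
```

The list `hidden_states` collects one hidden state per time step, which is the sequence a downstream attention layer would consume.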

3.2.3. Global Spatio-Temporal Correlations

Critical intersections in cities often have a large impact on regional traffic, and congestion at critical intersections is likely to evolve into congestion in the associated areas. In order to strengthen the modeling ability of traffic networks, this paper obtains global spatio-temporal correlations through the attention mechanism. All of the hidden states of the GRU network are used as the input of the attention network, and then the weights of each hidden state of the GRU are calculated to obtain the traffic information changes in the road network at each moment. The attention network is calculated as follows:

$$e_i = w_{(2)}\left(w_{(1)} h_i + b_{(1)}\right) + b_{(2)}$$

$$\alpha_i = \frac{\exp(e_i)}{\sum_{k=1}^{n} \exp(e_k)}$$

$$C_t = \sum_{i=1}^{n} \alpha_i h_i$$

where $e_i$ is the attention coefficient, $h_i$ is the GRU hidden state, $w_{(1)}$ and $w_{(2)}$ are the trainable weight parameters, $b_{(1)}$ and $b_{(2)}$ are the trainable bias values, $\alpha_i$ is the normalized attention coefficient, and $C_t$ is the attention-weighted output.
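The attention step can be sketched as scoring each hidden state, normalizing the scores with a softmax, and taking the weighted sum. The example below assumes a two-layer linear scoring function without an intermediate nonlinearity; the function name `attention_pool` and all weights and hidden states are illustrative, not values from the trained model.

```python
import math


def attention_pool(hidden_states, w1, b1, w2, b2):
    """Return (normalized weights, weighted sum) over the hidden states."""
    scores = []
    for h in hidden_states:
        # First layer: one output per row of w1.
        layer1 = [sum(w * v for w, v in zip(row, h)) + b
                  for row, b in zip(w1, b1)]
        # Second layer collapses to a single score per hidden state.
        scores.append(sum(w * v for w, v in zip(w2, layer1)) + b2)
    # Softmax with the maximum subtracted for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
               for d in range(dim)]
    return alphas, context


# Three made-up GRU hidden states of dimension 2.
hidden = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.8]]
w1 = [[1.0, 0.0], [0.0, 1.0]]   # illustrative weights
b1 = [0.0, 0.0]
w2 = [1.0, 1.0]
b2 = 0.0

alphas, context = attention_pool(hidden, w1, b1, w2, b2)
```

With these weights the score of each state is the sum of its components, so the third state receives the largest attention weight.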

3.3. Prediction Component

We predict future changes in traffic state based on historical traffic states. In the prediction component, we concatenate the attention output $C_t$ and the location-based graph embedding output $Y_E$ as follows:

$$Z = \mathrm{concat}(C_t, Y_E)$$

The concatenation result is used as the input of the fully connected layer, and the final traffic state is obtained by the sigmoid activation function. It is expressed as $\hat{Y} = (\hat{y}_{t+1}, \ldots, \hat{y}_{t+p})$, where $p$ is the number of predicted time steps, in the following form:

$$\hat{Y} = \mathrm{sigmoid}(W_o Z + b_o)$$

where $W_o$ and $b_o$ are the learnable weights and biases, respectively.

Algorithm 1: Training procedure of ST-AGRNN
Input: The number of training epochs $N$; the historical traffic state $X$; the traffic graph $G$; the window size of historical traffic state $T$; the predicted length of traffic state $p$;
Output: Learned ST-AGRNN model
1: Initialize parameters $\theta$;
2: Data processing;
3: For $n = 1, \ldots, N$ do
4:   Select real historical data $X_{t-T+1:t}$;
5:   Select real future data $X_{t+1:t+p}$;
6:   Input real historical data and the traffic graph $G$ into GCN and GRU to get the hidden states $H$;
7:   Input $H$ into attention to get $C_t$;
8:   Use DeepWalk on $G$ and get the embedding result $Y_E$;
9:   Concatenate $C_t$ and $Y_E$ and compute the prediction $\hat{Y}$;
10:  Optimize $\theta$ by minimizing the loss function;
11: End for

The training overview of the model is shown in Algorithm 1. We used Adam to optimize the model. We used TensorFlow to implement the proposed model.

4. Experiments

4.1. Experimental Settings

The software and hardware environments for the experiments were configured as follows: Python 3.6.2, NumPy 1.16.0, TensorFlow 1.14.0, and 64 GB of memory.

In this paper, we used speed and traffic flow to represent traffic states, with 80% of the data used as the training set and 20% as the test set. In the experiments, the speed was predicted for horizons of 15, 30, and 60 minutes, and the flow was predicted for horizons from 5 to 60 minutes, i.e., 12 time windows.

We use the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) to evaluate the models.
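For reference, the three metrics can be computed as in the following sketch, which uses their standard definitions. MAPE is reported in percent and assumes nonzero ground-truth values; the paper does not state how zero values are handled, and the sample numbers are made up.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must be nonzero)."""
    return 100.0 * sum(
        abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up ground-truth and predicted speeds.
y_true = [50.0, 60.0, 40.0]
y_pred = [45.0, 63.0, 40.0]

scores = {
    "MAE": mae(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
}
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE normalizes each error by the magnitude of the true value.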

4.2. Dataset Description

In the experiments, we used two real-world traffic datasets: PeMSD4 and PeMSD8 [43].

PeMSD4 contains traffic data from the San Francisco Bay Area collected by the Caltrans Performance Measurement System (PeMS), with 307 sensors on 29 roads. The dataset spans January to February 2018.

PeMSD8 refers to the traffic data in San Bernardino from July to August 2016, with 170 detectors on 8 roads.

4.3. Baselines

In this paper, the traffic state includes traffic speed and flow. For the traffic speed, we used the proposed model to predict 15, 30, and 60 minutes. The compared baseline models contain both traditional HA and ARIMA, along with neural network models such as STGCN [7], DCRNN [41], ASTGCN [52], GWN [42], LSGCN [43], and USTGCN [44].

In traffic flow forecasting, all models have a prediction window from 1 to 12, i.e., a prediction time from 5 minutes to 60 minutes, in 5-minute intervals. The baseline models compared included both traditional and neural network models, for a total of 11.

The details of the baseline models are as follows:
(1) HA: the average traffic information of the previous period is used as the forecast value.
(2) ARIMA: autoregressive integrated moving average.
(3) STGCN: spatio-temporal graph convolutional network, which consists of several spatio-temporal convolutional blocks.
(4) DCRNN: diffusion convolutional recurrent neural network, which obtains spatial dependencies through bidirectional random walks and temporal dependencies through an encoder–decoder structure with scheduled sampling.
(5) ASTGCN(r): three independent components with the same structure are used to obtain the recent, daily, and weekly dependencies in the traffic data. The spatio-temporal attention mechanism and spatio-temporal convolution are used to obtain the spatio-temporal dependencies within the components. For the sake of experimental fairness, only the recent component is used.
(6) GWN: a new adaptive dependency matrix is learned by node embedding to capture the hidden spatial dependencies in the data, and temporal dependence is obtained via a stacked dilated 1D convolutional component.
(7) LSGCN: the model uses a spatial gated block and gated linear unit (GLU) convolution to capture spatio-temporal features.
(8) USTGCN: the model obtains complex spatio-temporal correlations through the proposed unified spatio-temporal convolution strategy.
(9) STSGCN [54]: spatio-temporal synchronous graph convolutional network, which uses a spatio-temporal synchronous graph convolutional module to capture the complex localized spatio-temporal correlations and deploys multiple modules to capture the heterogeneities in localized spatio-temporal network series.
(10) STFGNN [55]: spatio-temporal fusion graph neural network, which uses spatio-temporal fusion graph neural modules and a gated CNN module to capture the spatio-temporal correlations.
(11) Z-GCNETs [56]: Z-GCNETs introduce new GCNs with a time-aware zigzag topological layer.
(12) STG-NCDE [57]: spatio-temporal graph neural controlled differential equation, which extends the concept and designs two NCDEs to capture the spatio-temporal correlations.
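As a point of reference for the simplest baseline, HA can be sketched as repeating the mean of the most recent observations for every forecast step. This is one common reading of the historical average; the window length and the function name are assumptions, since the exact averaging period is not specified here.

```python
def historical_average(history, window, horizon):
    """Forecast every future step as the mean of the last `window` values."""
    recent = history[-window:]
    avg = sum(recent) / len(recent)
    return [avg] * horizon


# Made-up flow readings; forecast 3 steps from the last 2 observations.
forecast = historical_average([60, 50, 40, 50], window=2, horizon=3)
```

Because the forecast ignores both the road network and any trend in the series, HA serves mainly as a lower bound for the learned models.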

4.4. Experimental Results

The traffic state prediction results for all baseline models and our model are shown in Tables 1 and 2. Table 1 shows that our proposed model performs better overall on the PeMSD4 and PeMSD8 datasets than the baseline models for 15-, 30-, and 60-minute traffic speed predictions. Taking the 15-minute speed forecast as an example, on the PeMSD4 dataset, our model is better than HA, ARIMA, DCRNN, STGCN, ASTGCN, GWN, LSGCN, and USTGCN with 53.14, 52.58, 11.85, 19.04, 43.86, 8.46, 17.93, and 15% lower MAE, with 52.41, 58.74, 19.72, 21.59, 40.40, 11.94, 19.45, and 12.26% lower RMSE, and with 60.97, 59.21, 19.02, 25.68, 47.83, 18.72, 25.17, and 22.77% lower MAPE, respectively. On the PeMSD8 dataset, our model is better than HA, ARIMA, DCRNN, STGCN, ASTGCN, LSGCN, and USTGCN with 48.73, 46.57, 13.24, 14.7, 31.87, 12.5, and 10.96% lower MAE, with 49.63, 57.49, 20.07, 20.99, 34.9, 15.51, and 3.72% lower RMSE, and with 53.8, 64.38, 21.55, 22.22, 42.4, 18.75, and 12.07% lower MAPE, respectively. These results show that ST-AGRNN performs well in both short- and long-term predictions. In particular, on the PeMSD4 dataset, the ST-AGRNN model is the best on all three evaluation metrics. On the PeMSD8 dataset, it is also the best for both short- and long-term prediction on every metric except RMSE, on which it is the second best.

HA and ARIMA are the worst performers because they do not capture spatio-temporal correlations effectively. Since STGCN has cumulative errors, it does not perform as well as DCRNN. DCRNN can effectively obtain complex spatial correlations through diffusion convolution. ASTGCN considers the periodicity of prediction, so it is better than STGCN for long-term prediction.

The spatial gate block of LSGCN integrates the proposed cosAtt and GCN, and in combination with a GLU can effectively extract complex spatio-temporal correlations. Meanwhile, the USTGCN model considers the important historical and present-day patterns in traffic data, in addition to the unified spatio-temporal convolution strategy. Therefore, its prediction performance is the second best.

Table 2 shows the results of traffic flow forecasting for horizons from 5 to 60 minutes, i.e., a prediction window from 1 to 12, with all of the results averaged. Compared with the baseline models, our proposed model performs the best overall in traffic flow prediction. According to Table 2, on the PeMSD4 dataset, our model is better than HA, ARIMA, STGCN, DCRNN, ASTGCN(r), GWN, LSGCN, STSGCN, STFGNN, Z-GCNETs, and STG-NCDE with 50.11, 43.75, 10.34, 10.60, 17.26, 23.78, 11.89, 10.47, 7.37, 2.71, and 1.24% lower MAE, with 49.35, 38.51, 14, 10.27, 14.81, 24.34, 11.39, 10.83, 7.71, 5.08, and 3.49% lower RMSE, and with 54.05, 47.02, 7.37, 9.59, 22.64, 25.91, 2.8, 7.84, 23.61, -0.23, and -0.39% lower MAPE, respectively; the two negative values mean that Z-GCNETs and STG-NCDE achieve a slightly lower MAPE than our model on this dataset. On the PeMSD8 dataset, our model is better with 57.11, 51.91, 14.57, 11.11, 18.08, 18.21, 15.67, 12.72, 11.74, 5.07, and 3.23% lower MAE, with 60.92, 47.76, 14.54, 12.17, 17.49, 22.96, 13.49, 13.61, 11.8, 7.8, and 6.69% lower RMSE, and with 66.96, 59.48, 18.42, 15.65, 20.87, 24.19, 17.76, 15.96, 13.11, 7.99, and 7.15% lower MAPE, respectively.
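The "percent lower" figures reported above are relative error reductions. Assuming they are computed relative to the baseline's error, which is the usual convention but is not stated explicitly, the calculation can be sketched as follows. The error values in the example are hypothetical and are not taken from Table 1 or Table 2.

```python
def relative_improvement(baseline_error, model_error):
    """Percentage by which `model_error` is lower than `baseline_error`.

    A negative result means the model's error is higher than the baseline's.
    """
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical errors, for illustration only.
better = relative_improvement(baseline_error=20.0, model_error=15.0)
worse = relative_improvement(baseline_error=10.0, model_error=10.1)
```

Under this convention, a value such as -0.23% indicates that the baseline's error is marginally lower than that of the proposed model on that metric.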

The STSGCN model considers both localized spatio-temporal correlations and the heterogeneities in spatio-temporal data. Therefore, its performance is better than that of STGCN, DCRNN, ASTGCN(r), GWN, and LSGCN. STFGNN obtains hidden spatio-temporal correlations by fusing spatial and temporal graph operations and integrating the gated convolution module at the same time. Z-GCNETs proposed new GCNs with a time-aware zigzag topological layer to obtain spatio-temporal correlations. The STG-NCDE model uses two neural controlled differential equations (NCDEs) to obtain the temporal and spatial correlations. Since the STSGCN model only extracts localized spatio-temporal correlations, its performance is inferior to that of STFGNN, Z-GCNETs, and STG-NCDE. The ST-AGRNN model obtains both localized and global spatio-temporal correlations and combines a location-based graph embedding representation to obtain localized spatial correlation. As a result, its overall performance on both datasets is better than that of all baseline models.

4.5. Case Study

We selected two nodes with heavy traffic from each of the two datasets to show the ground-truth and predicted curves: nodes 111 and 261 in PeMSD4 and nodes 9 and 112 in PeMSD8, as shown in Figures 4 and 5, respectively. The figures show that the model fits the trend well during periods of heavy traffic flow, between 7:00 and 9:00 a.m. and between 3:00 and 6:40 p.m. Figure 6 shows the change in the nodes' 15-minute speed. The traffic speed drops sharply at the peak times of the corresponding traffic flow.

4.6. Error for Each Length of Forecasting

Figure 7 shows the trend of the speed prediction error of each model on the two datasets. Although the error increases for all of the models as the prediction length increases, the error of our model is smaller than that of the baselines, and its growth is the flattest. This indicates that our model is more stable than the baseline models.

4.7. Ablation Experiments

In the traffic network, the road sections at different locations play different roles in traffic. Road sections in central areas have a greater impact on the surrounding traffic, while remote road sections play a small role in influencing traffic. These are the global spatio-temporal correlations. To verify the importance of global spatio-temporal correlations, we conduct ablation experiments on speed prediction.

The comparison of the traffic speed prediction results in Table 3 shows that the prediction error of the ST-AGRNN model with the attention mechanism is smaller overall than the error of ST-DWGRU [58] without the attention mechanism. Taking the 60-minute prediction results as an example, the MAE of ST-AGRNN on the PeMSD4 dataset is 7.3% smaller than that of ST-DWGRU, the RMSE is 9.4% smaller, and the MAPE is 8.2% smaller. The MAE of ST-AGRNN on the PeMSD8 dataset is 2.5% smaller than that of ST-DWGRU, the RMSE is 4.5% smaller, and the MAPE is 2.5% smaller. These results indicate that the ST-AGRNN model is more effective in obtaining complex spatio-temporal information.

5. Conclusions

A new traffic state prediction model is proposed, in which localized spatial correlation is obtained by a GCN and DeepWalk, localized temporal correlation is obtained by a GRU, and the global spatio-temporal correlations are obtained by the attention mechanism. The proposed model, ST-AGRNN, was tested with two publicly available datasets, namely, PeMSD4 and PeMSD8. In terms of traffic speed prediction, MAE improved by 15–53.14% and 10.96–48.73%, RMSE improved by 12.26–52.41% and 3.72–49.63%, and MAPE improved by 22.77–60.97% and 12.07–53.8% on the PeMSD4 and PeMSD8 datasets, respectively, compared to the baseline models. Meanwhile, the ST-AGRNN model also showed different degrees of improvement in traffic flow prediction compared with the baseline models. Overall, the results show that ST-AGRNN outperforms the baseline models and is more stable.

Data Availability

Previously reported traffic data were used to support this study and are publicly available. The prior study and datasets are cited at relevant places within the text as reference [43].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (61977001).