#### Abstract

Traffic prediction is key for Intelligent Transport Systems (ITS) to achieve traffic control and traffic guidance; the main challenge is that traffic flow has complex spatial-temporal dependence and nonlinear dynamics. To address the limited ability of current research to model complex and dynamic spatial-temporal dependencies, this paper proposes AGCN-T (Attention-based Graph Convolution Network and Transformer), a traffic flow prediction model that captures the spatial-temporal network dynamics of traffic flow by extracting dynamic spatial dependence and long-distance temporal dependence, thereby improving the accuracy of multistep traffic prediction. AGCN-T consists of three modules. In the spatial dependency extraction module, an adjacency matrix for the road network is constructed from the similarity of the historical traffic flow sequences of different loop detectors, using the Predictive Power Score (PPS) as the sequence similarity measure, to express latent spatial dependency; GCN is then applied to the adjacency matrix to capture the global spatial correlation, and a Transformer is used to capture dynamic spatial dependency from the most recent flow sequences. The dynamic spatial dependency is then merged with the global spatial correlation to obtain the overall spatial dependency pattern. In the temporal dependency extraction module, the temporal dependency pattern of each traffic flow sequence is learned by the temporal Transformer. The prediction module integrates both patterns to form spatial-temporal dependency patterns and performs multistep traffic flow prediction. Four sets of experiments on three real traffic datasets show that AGCN-T can effectively capture the dynamic spatial-temporal dependency of the traffic network and that its prediction performance and efficiency are better than those of existing baselines. In addition to traffic flow prediction, AGCN-T can also be applied to other spatial-temporal prediction tasks, such as passenger demand prediction and crowd flow prediction.

#### 1. Introduction

Traffic flow prediction is the prerequisite for traffic control and traffic guidance, and it is also key to the research and implementation of the Intelligent Transportation System (ITS). Traditionally, real-time traffic data are collected by in-ground loop detectors or traffic surveillance camera systems, which are fixed measurement stations that record flow, occupancy, speed, or video. Predicting one or more of these quantities one or several hours ahead is typically called short-term flow prediction [1]. In this work, we focus on traffic speed prediction, which predicts the future traffic speed at each loop detector from historical speeds.

The traffic flow at a loop detector is spatially affected by the dynamic traffic at other loop detectors in the traffic network and has multiple temporal characteristics such as proximity, periodicity, and trend. These complex dynamic spatial-temporal correlations make traffic flow prediction a very challenging task. Early studies regarded traffic flow as an independent time series and used time-series methods to predict it. The most representative is the ARIMA (Autoregressive Integrated Moving Average) model and its variants [2–4]. This type of method requires the data to be stationary and continuous and cannot adapt well to dynamic and complex traffic flow prediction. Subsequent research gradually considered spatial relationships and external background data, such as locations, weather, and events, and used traditional machine learning methods for modeling [5–8]. However, these models rely heavily on expert experience in related fields and require manual feature engineering. In recent years, as deep learning has become widely used, a large number of studies have adopted RNN (Recurrent Neural Network) and/or CNN (Convolutional Neural Network) models to automatically extract spatial-temporal correlation features from traffic flow, which greatly improves the prediction accuracy [9–12]. However, these works still have limitations: ① RNN-type methods may suffer from serious training problems due to vanishing and exploding gradients; ② CNN-type methods are based on a regular grid structure, but gridding the traffic network makes it impossible to effectively express the non-Euclidean spatial characteristics of the traffic network. To solve this problem, GNN (Graph Neural Network) methods, which model the traffic network as an irregular graph structure, have received much attention in recent years. Among GNNs, GCN (Graph Convolutional Network) has gained the most attention because of its natural fusion of structure (spatial characteristics in traffic flow) and attributes (temporal characteristics in traffic flow) [13–17]. At the same time, research based on the Graph Attention Network (GAT) is gradually increasing because its attention mechanism can effectively learn the dynamics in traffic flow [18].

However, GNNs still fall short in modeling traffic flow because of the following challenges. Firstly, in expressing spatial dynamics, existing GNNs rely heavily on a predefined network structure and do not consider or express time-varying spatial correlation. On the one hand, existing topology construction methods for traffic networks usually consider only the local connection characteristics of roads, constructing the topology from connectivity or distance and ignoring the global influence among roads, so the constructed traffic network may miss the correlation among distant roads that share similar temporal patterns. On the other hand, the spatial dependence between roads is not strictly stable, and the spatial pattern of traffic flow changes significantly over time. For example, two specific roads usually have a different relationship on weekdays and weekends, or in the morning peak and the evening peak. There are complex spatial dependencies between different road nodes, and the traffic flow between different areas at different times has complex temporal dependencies, so different roads have different traffic flow patterns. Secondly, when expressing temporal dynamics, existing GNNs are usually combined with RNN-type methods to capture short-term trends and long-term periodicity. However, RNN-type methods suffer from time-consuming iterative propagation and from exploding or vanishing gradients, and they cannot model the temporal change pattern well. Finally, current prediction models are generally oriented to single-step prediction, so their accuracy decreases significantly as the prediction horizon increases. In practical applications, however, multistep prediction is more valuable, and improving its accuracy has become an urgent problem.

To meet the above challenges, we study the problem of multistep traffic prediction and propose a new model, AGCN-T (Attention-based Graph Convolution Network and Transformer), which fuses GCN and Transformer. AGCN-T first builds the basic network topology by extracting the global dependencies among nodes in the traffic network based on the similarity among the historical traffic sequences of different loop detectors. On this basis, GCN is used to capture global spatial correlation. At the same time, a spatial Transformer network with the attention mechanism learns the local influence of different loop detectors and provides the ability to dynamically perceive changes in spatial dependence. Then, the temporal Transformer network is used to capture the dynamic temporal dependencies at each loop detector. Finally, the learned spatial-temporal features are fused and used for multistep traffic flow prediction. Note that the Transformer produces the multistep prediction in a single run instead of making repeated predictions based on earlier prediction results. In addition to traffic flow prediction, AGCN-T can also be applied to other spatial-temporal prediction tasks, such as passenger demand prediction and crowd flow prediction. The main contributions of the paper include the following:

(i) We adopt a network building method, PPS (Predictive Power Score), to learn hidden spatial dependencies among historical traffic sequences. PPS has an intuitive physical meaning, and the method can automatically discover invisible network structures from the data without the guidance of prior knowledge. This is the first time that PPS has been used in the construction of an adjacency matrix for a traffic network.

(ii) AGCN-T is proposed to model the dynamic spatial-temporal dependence for traffic flow prediction. The spatial dependency is learned by GCN and the spatial Transformer, which extract the global and local spatial relationships, respectively, to indicate the periodicity and trend separately. The temporal dependency is learned through the temporal Transformer. In particular, Transformer is used instead of RNN techniques to solve the multistep prediction problem.

(iii) Experiments are conducted on real traffic datasets, and the results show that AGCN-T performs better than the existing mainstream prediction methods.

The paper is organized as follows: Section 2 summarizes state-of-the-art research on traffic flow prediction. Section 3 formalizes the traffic prediction problem and presents the AGCN-T model. Section 4 discusses the experiment design and the performance of the tested models. Conclusions are drawn in Section 5.

#### 2. Literature Review

The field of traffic flow prediction has existed for almost five decades and covers a wide array of methodologies which can be divided into two categories, traditional statistical methods and machine learning methods.

##### 2.1. Traditional Statistical Methods

The research field of traffic prediction has evolved greatly since its inception in the late 1970s, when Ahmed et al. used ARIMA for highway traffic flow prediction for the first time [2]. Subsequently, ARIMA and its variants [3, 4], VAR [19], the Kalman filter [20], and other algorithms treated traffic flow as a time series and performed statistical analysis on historical traffic flow to predict future values. The advantage of these models is their simplicity, but their prediction performance is poor, and their ability to mine the potential correlations among traffic flows and various influencing factors is insufficient because most of them are based on linear assumptions.

##### 2.2. Machine Learning Methods

###### 2.2.1. Classical Machine Learning Methods

Because of the deficiencies of traditional statistical models, researchers turned to machine learning models, which can learn the nonlinear relationship between traffic flow and influencing factors and greatly improve the performance of traffic flow prediction. Typical methods include Support Vector Regression (SVR) [5], Bayesian models [6], kNN [7], and Random Forest Regression (RFR) [8]. However, these models rely on manual feature engineering rather than learning directly from raw data. Manually defined features often fail to capture the highly complex spatial-temporal correlations of traffic flow, resulting in information loss. In addition, the shallow and simple structures of these models limit their prediction power.

###### 2.2.2. Deep Learning Methods

As theoretical and technological advances emerged in the middle of the 2010s, researchers started to apply DNN models for traffic prediction.

These models can automatically extract and capture characteristics from data and have strong nonlinear data mining capabilities, showing superior performance. According to how they model spatial-temporal correlations, they are classified into temporal dependency models and spatial dependency models.

*(1) Temporal Dependency Models*. RNN and its variants, such as LSTM [9] and GRU [10], are neural network models that process sequential data well and are commonly used to model temporal dependency. However, these methods only treat traffic flow as a time series, and they easily lose information learned at earlier time steps, resulting in poor performance when the input is a long sequence. Unlike RNN, CNN is a fully convolutional network that captures the temporal trend of nodes through temporal convolutional layers; because the convolution does not rely on previous calculations, the elements of the sequence can be processed in parallel. However, the receptive field of CNN is usually small, which is not conducive to capturing and storing long-distance dependency information [11]. Since traffic flow has inherently complex spatial-temporal dependence, some studies use hybrid CNN-RNN models to achieve better prediction than single models [12]. For example, in [21], CNN is used to extract interday and intraday traffic flow patterns, and LSTM is used to learn the evolution of intraday traffic flow. Reference [22] uses the density peak clustering algorithm and the genetic algorithm to optimize the input so as to improve the accuracy. In recent years, the self-attention mechanism has been adopted in traffic prediction to effectively capture long-term dependence because it can be easily adapted to data sequences of different lengths. Some recent research applies the Transformer and achieves encouraging results [23].

*(2) Spatial Dependency Models*. To characterize spatial correlation, CNN is a natural choice because it captures local spatial features well [11]. However, CNN is designed for Euclidean spatial structures and requires a regular grid, which conflicts with the naturally non-Euclidean structure of the traffic network, so CNN cannot fully express the spatial association of traffic flow. GNNs present a new opportunity for traffic prediction because of their ability to capture spatial correlations in non-Euclidean network structures. Among GNNs, GCN is used for traffic prediction in most cases. For example, T-GCN [13] is the first research to introduce GCN into traffic flow prediction; DCRNN [14] uses diffusion GCN to describe the information diffusion process in spatial networks and uses RNNs to model temporal correlation; ST-GCN [15] adopts ChebNet to capture the spatial correlation of traffic flow. These models usually combine RNN and GCN to model temporal and spatial correlation, respectively. In addition to GCN, attention-based traffic flow prediction methods are gradually emerging. For example, ASTGCN [16] simultaneously employs graph convolutions and attention mechanisms to model the traffic flow; Graph WaveNet [17] proposes a new adaptive dependency matrix to capture hidden spatial dependencies; and STGAT [18] adopts a dual-path network with gating mechanisms and a residual architecture, which contains gated temporal convolution and graph attention layers. However, the local similarity of road space is not considered.

These studies show that GCN is a powerful tool for dealing with complex spatial-temporal dependencies and that the attention mechanism can be used to extract the dynamic changes of these dependencies. Although existing hybrid models of GNN and RNN have improved prediction performance, limitations remain, and three basic problems must be studied before applying GNN and attention to traffic prediction tasks. Firstly, how can the complex structure of the traffic network be represented accurately? Secondly, given that the predefined network structure is usually local and static, how can spatial dynamics be captured on this basis? Thirdly, how can the temporal pattern of traffic flow be learned better? Next, we present our solutions to these problems.

#### 3. Methodology

We first formalize the traffic flow prediction problem, then introduce the AGCN-T model, and finally elaborate on its details. The abbreviations, definitions, and notations used in the paper are listed in Table 1.

##### 3.1. Problem Formalization

Traffic flow prediction means predicting future traffic information from the historical information collected by a group of loop detectors. In this paper, the traffic speed collected by loop detectors is taken as the prediction object. In this section, we first give some definitions and then formalize the traffic prediction problem.

*Definition 1.* (Graph for Traffic Network $G$): The traffic network can be abstracted as a graph, which is represented as $G = (\mathcal{V}, \mathcal{E}, A)$, where $\mathcal{V}$ is the set of vertices in graph $G$ with $|\mathcal{V}| = N$ and represents the nodes in the traffic network, such as loop detectors; $\mathcal{E}$ represents the set of edges between loop detectors. $A \in \mathbb{R}^{N \times N}$ denotes the weighted adjacency matrix that is derived from the graph $G$. Element $a_{ij}$ in matrix $A$ describes the relationship strength between vertex $v_i$ and vertex $v_j$. Usually, larger values mean that the two vertices are more strongly correlated.

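*Definition 2.* (Feature Matrix $X$): $X \in \mathbb{R}^{N \times T}$ holds the recorded historical observations of the $N$ loop detectors, where $T$ represents the total number of historical time steps:

$$X = (x_1, x_2, \ldots, x_T), \qquad x_t = (x_t^1, x_t^2, \ldots, x_t^N)^\top,$$

where $x_t$ is the feature vector at time $t$ and $x_t^i$ is the speed detected by loop detector $i$ at time $t$.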

The formal definition of the traffic speed prediction problem is to find a mapping function $f$ such that we can infer the next $T'$ snapshots of graph $G$ from the $T$ historical observations:

$$(\hat{x}_{T+1}, \ldots, \hat{x}_{T+T'}) = f\big(G; (x_1, \ldots, x_T)\big).$$

The objective is to find the model parameters that minimize the error between the predicted speeds and the observed ones:

$$\theta^* = \arg\min_{\theta} L(x_t, \hat{x}_t; \theta),$$

where $x_t$ and $\hat{x}_t$ are the observed and predicted traffic at time $t$, respectively, $L$ is the loss function, and $\theta^*$ is the optimal set of parameters for the function $f$.
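As an illustration of this formulation, the following minimal sketch slices a speed matrix into (history, future) pairs with a sliding window, which is one common way to build training samples for a mapping from past observations to several future steps. The function name `make_windows` and the parameters `t_in` and `t_out` are hypothetical and are not taken from the paper; the toy data stand in for real detector readings.

```python
import numpy as np

def make_windows(X, t_in, t_out):
    """Slice X (detectors x time steps) into history/target pairs.

    t_in  : number of historical steps fed to the model (assumed name)
    t_out : number of future steps to predict (assumed name)
    """
    n_steps = X.shape[1]
    inputs, targets = [], []
    for start in range(n_steps - t_in - t_out + 1):
        inputs.append(X[:, start:start + t_in])
        targets.append(X[:, start + t_in:start + t_in + t_out])
    return np.stack(inputs), np.stack(targets)

# Toy data: 3 detectors, 10 time steps.
X_demo = np.arange(30, dtype=float).reshape(3, 10)
hist, fut = make_windows(X_demo, t_in=4, t_out=2)
```

Each target window directly follows its history window, so a model trained on these pairs emits all `t_out` steps at once rather than feeding its own predictions back in.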

##### 3.2. Framework of AGCN-T

Figure 1 gives an overview of AGCN-T, which contains three modules: the spatial dependency extraction module, the temporal dependency extraction module, and the prediction module. First, the spatial dependency extraction module constructs an adjacency matrix for the traffic network from the historical speed sequences of its nodes and uses the adjacency matrix as input to GCN to mine the global spatial dependency pattern; the spatial Transformer is then used to obtain the hourly dynamic spatial dependency. Second, the temporal dependency extraction module uses the temporal Transformer to learn the temporal dependency pattern of the historical traffic sequences. Finally, the prediction module integrates the learned spatial and temporal dependencies and performs multistep traffic flow prediction.

AGCN-T solves three difficult problems: ① adjacency matrix construction for traffic networks without connection features; ② extraction of spatial dynamic dependency; and ③ extraction of temporal dependency. In the next few sections, each part of AGCN-T is elaborated in more detail.

##### 3.3. Adjacency Matrix Construction

As we have mentioned, GCN is used in AGCN-T to extract spatial correlations in the traffic network. GCN needs a predefined adjacency matrix to perform graph convolution operations, so adjacency matrix construction is the key to the success of GCN.

Current mainstream research usually builds the adjacency matrix from the physical connectivity between nodes [16, 17] or from the distance between their latitude-longitude coordinates [24, 25]. The work in [26] obtains the hidden interdependence between historical traffic sequences with the Pearson correlation coefficient. However, this has two drawbacks. First, the Pearson correlation coefficient is best suited to expressing linear relationships, whereas traffic sequences are often nonlinearly related. Second, the correlation coefficient matrix is symmetric, whereas the mutual influence of detectors in the road network is not symmetric because of their upstream and downstream relationship. To this end, AGCN-T uses the PPS method to learn the correlation between nodes. This is the first time that PPS has been used in the construction of an adjacency matrix for a traffic network.

The PPS is a normalized index (ranging from 0 to 1) that tells us how much a variable $x$ could be used to predict a variable $y$. The higher the PPS index, the more decisive the variable $x$ is in predicting the variable $y$. Assuming that the traffic sequences detected by two detectors $v_i$, $v_j$ in the traffic network are $x^i$ and $x^j$, respectively, the ability of $x^i$ to predict $x^j$ represents the correlation between $v_i$ and $v_j$. In PPS, $x^j$ is treated as the target variable and $x^i$ as the only feature, and a regression decision tree is then fitted. In detail, the method divides the feature space of $x^i$. Each division examines all values of the feature one by one and selects the best one as the segmentation point according to the squared-error minimization criterion, and the resulting regression decision tree yields the MAE (Mean Absolute Error) value $\mathrm{MAE}_{\mathrm{model}}$. The MAE obtained by always predicting the median of $x^j$ is recorded as $\mathrm{MAE}_{\mathrm{naive}}$. The element $a_{ij}$ of the adjacency matrix is then obtained from the ability of $x^i$ to predict $x^j$:

$$a_{ij} = \mathrm{PPS}(x^i \rightarrow x^j) = 1 - \frac{\mathrm{MAE}_{\mathrm{model}}}{\mathrm{MAE}_{\mathrm{naive}}}.$$

PPS models the nonlinear correlation among traffic sequences and yields the adjacency matrix $A$, as shown in Figure 2.

The construction process described above does not consider the influence of each loop detector on itself, so we update the adjacency matrix by adding a self-loop:

$$\tilde{A} = A + I_N,$$

where $I_N$ is the identity matrix of size $N \times N$.

It is worth noting that in the above topology construction method, the direction dependence has been implicitly integrated into the adjacency matrix, so this paper adopts a simple and easy-to-calculate undirected graph instead of the complex directed graph.
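The construction described in this section can be sketched as follows. This is a simplified illustration, not the authors' implementation: the small tree `tree_predict` is a hand-written stand-in for a regression decision tree, the score is computed in-sample (library implementations of PPS typically use cross-validation), negative scores are clipped to 0, and all function names and the `depth` and `min_leaf` settings are assumptions.

```python
import numpy as np

def tree_predict(x, y, depth=3, min_leaf=5):
    """In-sample predictions of a small one-feature regression tree.

    Splits are chosen greedily by minimum squared error; each leaf
    predicts the mean of its targets.
    """
    n = len(y)
    pred = np.full(n, y.mean())
    if depth == 0 or n < 2 * min_leaf:
        return pred
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    csum, csum2 = np.cumsum(ys), np.cumsum(ys ** 2)
    best_sse, best_threshold = np.inf, None
    for k in range(min_leaf, n - min_leaf + 1):  # k = size of left child
        if xs[k - 1] == xs[k]:
            continue  # cannot split between equal feature values
        left_sse = csum2[k - 1] - csum[k - 1] ** 2 / k
        right_sse = (csum2[-1] - csum2[k - 1]) - (csum[-1] - csum[k - 1]) ** 2 / (n - k)
        if left_sse + right_sse < best_sse:
            best_sse, best_threshold = left_sse + right_sse, xs[k - 1]
    if best_threshold is None:
        return pred
    left = x <= best_threshold
    pred[left] = tree_predict(x[left], y[left], depth - 1, min_leaf)
    pred[~left] = tree_predict(x[~left], y[~left], depth - 1, min_leaf)
    return pred

def pps(x, y):
    """Score in [0, 1]: how well feature x predicts target y."""
    mae_model = np.mean(np.abs(y - tree_predict(x, y)))
    mae_naive = np.mean(np.abs(y - np.median(y)))
    if mae_naive == 0:
        return 0.0
    return max(0.0, 1.0 - mae_model / mae_naive)

def pps_adjacency(X):
    """Adjacency with self-loops from a (detectors x time steps) matrix."""
    n = X.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i, j] = pps(X[i], X[j])  # ability of x^i to predict x^j
    return A + np.eye(n)

# Toy sequences: x1 is a nonlinear function of x0, x2 is unrelated noise.
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, 300)
X_demo = np.stack([x0, x0 ** 2, rng.normal(size=300)])
A_tilde = pps_adjacency(X_demo)
```

In the toy data, `x0` predicts `x0 ** 2` well while the reverse direction loses the sign of `x0`, so the two off-diagonal entries differ. A symmetric linear measure such as the Pearson correlation cannot express this asymmetry.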

##### 3.4. Spatial Dependence Extraction

Traffic flows often show a multiscale correlation in the spatial dimension, including global dependence and local dependence. The traffic states at different loop detectors tend to correlate with each other, and the strength of these spatial correlations varies across locations and depends strongly on the underlying traffic network structure. This is global dependence. At the same time, traffic conditions are time-varying. For example, the strength of the dependence between adjacent road sections differs between the morning peak and the evening peak, which means that the spatial relationship is dynamic; this is local dependence. Establishing an extraction model for multiscale spatial dependence among nodes in the road network is the key to accurately capturing spatial dependence.

AGCN-T uses GCN to mine the global spatial dependence pattern, uses a spatial Transformer to obtain the local dynamic spatial dependence that evolves over time, and then fuses the learned global spatial features and local dynamic features. Combined with the adjacency matrix construction, the spatial dependence extraction module can be regarded as a general message-passing GNN for dynamic graph construction and feature learning.

###### 3.4.1. Global Spatial Dependence Extraction

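Taking the global correlation graph constructed by PPS as input, GCN is used to capture the inherent spatial dependence among nodes in the traffic network. The representation matrix obtained by GCN is as follows:

$$H_G = \mathrm{GCN}(\tilde{A}, X),$$

where $\tilde{A}$ is the adjacency matrix with self-loops, the size of $H_G$ is $N \times d$, and $d$ represents the dimension of the embedding vector. GCN is composed of two layers of convolutions:

$$H^{(l+1)} = \sigma\big(\hat{A} H^{(l)} W^{(l)}\big), \quad l = 0, 1, \qquad H^{(0)} = X, \quad H_G = H^{(2)},$$

where $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ is the symmetric normalized adjacency matrix, in which $\tilde{D}$ is the degree matrix of $\tilde{A}$, $W^{(l)}$ is the weight matrix of layer $l$, and $\sigma$ is the activation function.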
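A minimal NumPy sketch of the two-layer graph convolution is given below for illustration. It assumes that the input adjacency already contains self-loops, that the hidden layer uses ReLU, and that the output layer is linear; the function names, these activation choices, and the random weights are assumptions rather than details taken from the paper.

```python
import numpy as np

def normalize_adjacency(A_tilde):
    """Compute D^{-1/2} A_tilde D^{-1/2}, with D the row-sum degree matrix."""
    deg = A_tilde.sum(axis=1)          # positive because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_two_layer(A_tilde, X, W0, W1):
    """Two graph convolutions: ReLU hidden layer (assumed), linear output."""
    A_hat = normalize_adjacency(A_tilde)
    hidden = np.maximum(A_hat @ X @ W0, 0.0)
    return A_hat @ hidden @ W1

# Toy graph: 3 detectors on a line, self-loops already added.
A_demo = np.array([[1.0, 1.0, 0.0],
                   [1.0, 1.0, 1.0],
                   [0.0, 1.0, 1.0]])
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(3, 4))       # 4 historical steps per detector
W0 = rng.normal(size=(4, 8))
W1 = rng.normal(size=(8, 2))           # embedding dimension d = 2
A_hat_demo = normalize_adjacency(A_demo)
H_G = gcn_two_layer(A_demo, X_demo, W0, W1)
```

Each output row mixes the features of a detector with those of its neighbours, weighted by the normalized adjacency. If the adjacency is asymmetric, the row-sum normalization used here is only one of several possible choices.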

###### 3.4.2. Local Spatial Dependence Extraction

After applying GCN to capture the whole spatial correlations of traffic, we are seeking a way to capture the local spatial dependence. Here we use a spatial Transformer to capture the spatial dependence behind hourly timing information and express the dynamic impact of local changes on the traffic network structure.

The input and output of the spatial Transformer are two sequences of vectors. Note that the input is the flow matrix of the last 12 intervals, namely, $X_S = (x_{T-11}, \ldots, x_T)$, with 5 minutes per interval. In the spatial Transformer, each vector in the input sequence is linearly transformed into three vectors called query $Q$, key $K$, and value $V$ [27]. Each output vector is computed as a weighted sum of all the values, where the weights are the outputs of a softmax layer, and the inputs of the softmax layer are scaled dot products of the corresponding query with all keys. The spatial Transformer is calculated as follows:

$$Q = X_S W_Q, \qquad K = X_S W_K, \qquad V = X_S W_V,$$

$$S = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right), \qquad M = S V,$$

where $W_Q$, $W_K$, and $W_V$ are the weight matrices corresponding to $Q$, $K$, and $V$, respectively, $S$ is the attention matrix, $M$ is the attention output, and $d_k$ is the dimension of the vector $K$. After further learning through a three-layer feedforward neural network, the representation of local spatial dependence is obtained:

$$H_L = \mathrm{FFN}(M; W_f, b_f),$$

where $W_f$, $b_f$ are the parameters of the different layers in the feedforward neural network.
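The scaled dot-product attention step can be illustrated with the single-head NumPy sketch below. It assumes that each detector is one token whose features are its last 12 speed readings; the multihead split, the feedforward layers, and positional encoding are omitted, and the function names and random weights are purely illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(X_s, W_q, W_k, W_v):
    """Single-head attention over detectors.

    X_s : (N, 12) matrix, one row of recent speeds per detector (assumed layout)
    Returns the attention output (N, d_v) and attention matrix (N, N).
    """
    Q, K, V = X_s @ W_q, X_s @ W_k, X_s @ W_v
    d_k = K.shape[-1]
    S = softmax(Q @ K.T / np.sqrt(d_k))  # row i: weights detector i puts on all detectors
    return S @ V, S

rng = np.random.default_rng(0)
X_s = rng.normal(size=(5, 12))           # 5 detectors, last 12 intervals
W_q, W_k, W_v = (rng.normal(size=(12, 8)) for _ in range(3))
M, S = spatial_attention(X_s, W_q, W_k, W_v)
```

Because `S` is recomputed from the most recent readings, the pairwise weights change as traffic conditions change, which is what lets this branch express time-varying spatial dependence.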

###### 3.4.3. Fusion

The global spatial dependence learned by GCN, represented by $H_G$, and the local spatial dependence learned by the attention mechanism, represented by $H_L$, are combined to adjust the spatial correlation of nodes in the graph and thereby effectively model dynamic spatial dependence. Note that the multimodal fusion mechanism of the Transformer can convert different features into a unified sequence, resolve the problem of inconsistent multimodal input, and ensure the consistency of spatial-temporal feature fusion.

##### 3.5. Temporal Dependence Extraction

Traffic status at the same loop detector also exhibits strong correlations over time. To capture this temporal dependence, a temporal Transformer is used to model the sequential relationship within the flow sequence. On the one hand, this addresses the problem that RNN-type methods cannot extract long-term periodicity well. The Transformer abandons the recurrence mechanism of RNN in favor of the multihead self-attention mechanism, which avoids the “forgetting” problem and addresses the challenges faced by RNN. On the other hand, the attention mechanism and positional encoding strategy in the Transformer can dynamically capture the context-related characteristics of sequence data, enabling better traffic prediction over multiple time steps.

The calculation process of the temporal Transformer is the same as that of the spatial Transformer. The difference is that the input is a series of past traffic data, that is, the historical time series feature matrix $X$, instead of the traffic matrix of the last 12 intervals, and the output is a temporal dependence representation $H_T$.

##### 3.6. Prediction

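Spatial dependence and temporal dependence affect traffic flow together. Therefore, we first merge the learned spatial and temporal dependence to form the spatial-temporal feature $H_{ST}$ and then perform multistep prediction through two convolutional layers:

$$\hat{Y} = \mathrm{Conv}\big(\mathrm{Conv}(H_{ST})\big),$$

where $\mathrm{Conv}$ is a $1 \times 1$ convolutional operation. The goal of model training is to minimize the error between the actual traffic speed $Y$ and the predicted one $\hat{Y}$, so the loss function is the mean absolute error, averaged over all detectors and predicted time steps:

$$L = \mathrm{mean}\big(|Y - \hat{Y}|\big).$$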
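The prediction step can be illustrated with the sketch below, which relies on the fact that a 1 × 1 convolution is a linear map over channels applied independently at every detector. Fusing the two representations by concatenation, the ReLU between the two layers, and all names and shapes are assumptions made for this example; the paper does not specify them in the text above.

```python
import numpy as np

def conv1x1(H, W, b):
    """1x1 convolution as a per-detector linear map over channels.

    H : (N, C_in), W : (C_in, C_out), b : (C_out,)
    """
    return H @ W + b

def predict(H_s, H_t, W1, b1, W2, b2):
    """Fuse spatial and temporal features, then apply two 1x1 convolutions."""
    H_st = np.concatenate([H_s, H_t], axis=1)         # fusion by concatenation (assumed)
    hidden = np.maximum(conv1x1(H_st, W1, b1), 0.0)   # ReLU (assumed)
    return conv1x1(hidden, W2, b2)                    # (N, number of future steps)

def mae_loss(Y, Y_hat):
    """Mean absolute error over all detectors and predicted steps."""
    return float(np.mean(np.abs(Y - Y_hat)))

rng = np.random.default_rng(0)
H_s, H_t = rng.normal(size=(4, 6)), rng.normal(size=(4, 6))
W1, b1 = rng.normal(size=(12, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)       # 3 future steps
Y_hat = predict(H_s, H_t, W1, b1, W2, b2)
toy_loss = mae_loss(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
```

The last layer has one output channel per future step, so all steps are produced in a single forward pass.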

#### 4. Experimental Analyses

To verify the effectiveness of AGCN-T, four sets of experiments are designed to answer the following questions:

(1) Question 1: How does the overall traffic prediction performance of AGCN-T compare with various baselines?

(2) Question 2: How do the different submodules contribute to the model performance?

(3) Question 3: Is the PPS-based adjacency matrix construction method effective compared with the classic adjacency matrix construction methods for traffic networks?

(4) Question 4: How efficient is AGCN-T in terms of running time?

##### 4.1. Experiment Preparation

###### 4.1.1. Experimental Environment and Dataset

The experimental development environment is shown in Table 2.

Three real network-scale traffic speed datasets are utilized in the experiments.

PeMSD7 [15] collects traffic information from 228 monitoring stations in the California state highway system during the weekdays from May through June of 2012. Traffic speeds are aggregated every five minutes and normalized with Z-Score as inputs.

Seattle [28] is collected from inductive loop detectors deployed on four connected freeways (I-5, I-405, I-90, and SR-520) in the Greater Seattle area and contains traffic data from 323 sensor stations over the entirety of 2015 at 5-minute intervals.

Los-loop [13] contains traffic speeds collected by 207 loop detectors on the highways of Los Angeles County from March 1 to March 7, 2012. Traffic speeds are also aggregated every five minutes.

We use 60% of the data as the training set, 30% as the validation set, and 10% as the test set in strict chronological order.
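For illustration, a chronological 60/30/10 split together with Z-score normalization can be sketched as follows. Computing the mean and standard deviation per detector and from the training portion only is an assumption of this sketch (it avoids leaking later data into earlier stages), and the function names are hypothetical.

```python
import numpy as np

def chronological_split(X, train_pct=60, val_pct=30):
    """Split a (detectors x time steps) matrix along time, without shuffling."""
    n_steps = X.shape[1]
    i_train = n_steps * train_pct // 100
    i_val = n_steps * (train_pct + val_pct) // 100
    return X[:, :i_train], X[:, i_train:i_val], X[:, i_val:]

def zscore(train, *others):
    """Normalize every part with per-detector statistics of the training part."""
    mean = train.mean(axis=1, keepdims=True)
    std = train.std(axis=1, keepdims=True)
    std = np.where(std == 0, 1.0, std)   # guard against constant sequences
    return [(part - mean) / std for part in (train, *others)]

X_demo = np.arange(200, dtype=float).reshape(2, 100)
train, val, test = chronological_split(X_demo)
train_n, val_n, test_n = zscore(train, val, test)
```

Integer percentages are used for the cut points so that the boundaries do not depend on floating-point rounding.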

###### 4.1.2. Evaluation Standard

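Among the evaluation indicators used in the literature, the most common are the Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE), and the Root Mean Square Error (RMSE), which are calculated as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|,$$

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2},$$

where $y_i$ and $\hat{y}_i$ are the actual observed value and the predicted value of node $i$, respectively, and $n$ is the number of evaluated values.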
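The three metrics can be computed directly with NumPy, as in the short sketch below. The function names are illustrative, and the MAPE version assumes that no observed value is zero; real speed data may need a mask for zero or missing readings.

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean Absolute Percentage Error in percent (assumes y has no zeros)."""
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def rmse(y, y_hat):
    """Root Mean Square Error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

y_true = np.array([50.0, 100.0])   # toy observed speeds
y_pred = np.array([45.0, 110.0])   # toy predicted speeds
scores = (mae(y_true, y_pred), mape(y_true, y_pred), rmse(y_true, y_pred))
```

RMSE penalizes large errors more strongly than MAE, while MAPE expresses the error relative to the observed speed, so the three are usually reported together.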

###### 4.1.3. Baselines

We compare AGCN-T with three types of baseline methods, namely, classical statistical models such as ARIMA, DNN models that do not consider spatial information such as FC-LSTM, and GNN models based on spatial-temporal fusion such as T-GCN, ST-GCN, and DCRNN.

(1) ARIMA [2] predicts future data through a combination of statistical methods including autoregression, moving average, and differencing of historical data.

(2) The encoder and decoder in FC-LSTM [9] both include multiple LSTM layers and are followed by a fully connected layer for prediction.

(3) T-GCN [13] uses GCN to learn the spatial characteristics of the traffic network, uses GRU to learn the temporal characteristics, and then fuses them for prediction.

(4) The core of ST-GCN [15] is two ST-Conv blocks. Each ST-Conv block has two temporal gated convolutions with a spatial GCN in between, which extract temporal and spatial features, respectively.

(5) DCRNN [14] captures the spatial dependence by bidirectional diffusion random walks and captures the temporal dependence by an encoder-decoder architecture with scheduled sampling.

###### 4.1.4. Experimental Parameter Settings

We train our model using an Adam optimizer with a learning rate of 0.0001. The dimension of AGCN-T is 512, and the number of heads in the attention of the Transformer is 8. Dropout with is applied to the outputs of the graph convolution layer.

##### 4.2. Evaluation Results and Analysis

###### 4.2.1. Experiment 1: Prediction Performance of AGCN-T

To answer Question 1, Experiment 1 compares AGCN-T with the baselines by predicting the speed of each node on the three datasets (PeMSD7, Seattle, and Los-loop) 15, 30, 45, and 60 minutes into the future. The performance of each method is measured by MAE, MAPE, and RMSE. The experimental results are shown in Tables 3–5.

Tables 3–5 show that AGCN-T has the best overall performance, while ARIMA has the worst. This is because ARIMA is a traditional statistical method with limited ability to model complex traffic data with nonlinear characteristics. FC-LSTM, like ARIMA, does not consider the spatial dependence among nodes in the traffic network, so its prediction ability is lower than that of the other four methods, which do consider spatial dependence. By performing bidirectional diffusion graph convolution on an explicitly designed directed matrix to account for directionality, DCRNN outperforms ST-GCN and T-GCN. However, the influence of directionality is complicated and hard to measure. Instead, AGCN-T uses PPS to capture the dynamic directed spatial dependence in a data-driven manner, achieving similar results with a simple undirected adjacency matrix, and adopts the Transformer to enhance the dynamic spatial dependence, which demonstrates the effectiveness of the modeling. Furthermore, AGCN-T, with its combination of GCN and Transformer, consistently outperforms ST-GCN, which adopts three pairs of spatial and temporal units.

###### 4.2.2. Experiment 2: Sub-Module Comparison

To answer Question 2 and evaluate the contribution of the temporal dependence extraction, spatial dependence extraction, and spatial-temporal fusion modules in AGCN-T, we disassemble AGCN-T into four methods: AGCN-T-T, AGCN-T-A, AGCN-T-G, and AGCN-T. AGCN-T-T implements only the temporal dependence extraction module; AGCN-T-A implements the temporal extraction module and the global spatial dependence extraction submodule; AGCN-T-G implements the temporal extraction module and the local spatial dependence extraction submodule; and AGCN-T contains all modules. Tables 6–8 show the prediction performance of each method.

Tables 6–8 show that, as the prediction horizon increases, the overall performance of AGCN-T remains better than that of the other three variants, which indicates that using all modules together achieves the best prediction effect. At the same time, AGCN-T-G and AGCN-T-A both outperform AGCN-T-T, indicating the importance of spatial dependence extraction. The overall prediction effect of AGCN-T-G and AGCN-T-A is similar: AGCN-T-A is better on PeMSD7, and AGCN-T-G is slightly better on Los-loop, indicating that both local and global spatial dependencies are valuable for traffic flow prediction.

###### 4.2.3. Experiment 3: Effect of PPS

In the construction of the adjacency matrix, AGCN-T uses PPS instead of the distance between loop detectors (denoted Adj) or the Pearson correlation between traffic sequences (denoted Cov). To answer Question 3 and to study how different spatial structure representations influence the prediction effect, Experiment 3 changes the input of the GCN in AGCN-T to the adjacency matrix constructed by Adj, Cov, and PPS, respectively, and compares the resulting prediction performance. The prediction results are shown in Table 9.

Table 9 shows that, on all three data sets (PeMSD7, Seattle, and Los-loop), the adjacency matrix constructed by PPS enables AGCN-T to achieve the best performance. The adjacency matrix constructed from the distance between latitude-longitude pairs is only suitable for situations where there is a fixed spatial-temporal relationship between loop detectors, and it cannot reflect the dynamic influences of traffic flows. In contrast, Pearson correlation and PPS are suitable for expressing the dynamic influence among loop detectors because they reflect the degree to which speeds at different detectors change together. However, Pearson correlation tends to extract symmetric and linear influences, which do not conform to the asymmetric and nonlinear nature of traffic flow. PPS reflects the ability of one node to predict another node; it is not constrained by symmetry and is not restricted to linear correlation, so the constructed adjacency matrix is more in line with the characteristics of traffic flow.
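The contrast between a symmetric, linear measure and an asymmetric, nonlinear one can be illustrated with a toy example. The sketch below is not the PPS computation used in AGCN-T: PPS implementations typically fit a decision tree and compare its cross-validated error with a naive baseline, whereas here a simple binned-median predictor evaluated in-sample is substituted to keep the code dependency-free. The function names `pearson` and `predictive_score`, the bin count, and the synthetic data are all assumptions for illustration.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient (symmetric in x and y)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def predictive_score(x, y, bins=10):
    """PPS-style score for 'x predicts y': 1 - MAE(model) / MAE(naive), floored at 0.

    Simplification: the model predicts y by the median of y within equal-width
    bins of x, and is scored in-sample (no decision tree, no cross-validation).
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when x is constant

    def bin_of(v):
        return min(int((v - lo) / width), bins - 1)

    groups = {}
    for a, b in zip(x, y):
        groups.setdefault(bin_of(a), []).append(b)
    medians = {k: statistics.median(v) for k, v in groups.items()}

    mae_model = statistics.fmean(abs(b - medians[bin_of(a)]) for a, b in zip(x, y))
    naive = statistics.median(y)  # naive baseline: always predict the median of y
    mae_naive = statistics.fmean(abs(b - naive) for b in y)
    if mae_naive == 0:
        return 0.0
    return max(0.0, 1.0 - mae_model / mae_naive)


# Synthetic nonlinear relationship: y is fully determined by x, but not vice versa.
x = [i / 10 for i in range(-50, 51)]
y = [v * v for v in x]

print(pearson(x, y))           # close to 0: no linear relationship is detected
print(predictive_score(x, y))  # high: x predicts y well
print(predictive_score(y, x))  # close to 0: y cannot recover the sign of x
```

In this toy setting the linear measure reports no relationship at all, while the predictive score is high in one direction and near zero in the other, which is the kind of asymmetry a directed adjacency matrix can encode.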

###### 4.2.4. Experiment 4: Comparison of Efficiency

To answer Question 4, we compare the computation cost of AGCN-T with that of the baselines. Figure 3 gives the running time of each method on PeMSD7.

As can be seen from Figure 3, because it is a statistical method, ARIMA is significantly more efficient than the deep learning-based and GNN-based methods. AGCN-T is about twice as efficient as ST-GCN and DCRNN. This is because AGCN-T generates multistep predictions in one run, while the other two have to produce each result conditioned on previous predictions. FC-LSTM and T-GCN adopt RNNs, which require iterative training and learning, so their computational efficiency is lower than that of the Transformer, which calculates multistep predictions in one run.
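The difference between step-by-step and one-run multistep prediction can be made concrete by counting model invocations. The sketch below is purely illustrative: the function names, the toy "last value" models, and the horizon of 4 steps are hypothetical and stand in for real networks; it only shows that an autoregressive rollout needs one model call per predicted step, whereas a direct multistep model needs a single call.

```python
def autoregressive_forecast(step_model, history, horizon):
    """Predict `horizon` steps by feeding each prediction back as input."""
    window = list(history)
    preds, calls = [], 0
    for _ in range(horizon):
        nxt = step_model(window)      # one model call per predicted step
        calls += 1
        preds.append(nxt)
        window = window[1:] + [nxt]   # slide the window over the new prediction
    return preds, calls


def direct_forecast(multi_model, history, horizon):
    """Predict all `horizon` steps with a single model call."""
    return multi_model(history, horizon), 1


# Toy stand-ins for trained models: both simply repeat the last observed speed.
def last_value_step(window):
    return window[-1]


def last_value_multi(history, horizon):
    return [history[-1]] * horizon


history = [60.0, 58.0, 55.0]
ar_preds, ar_calls = autoregressive_forecast(last_value_step, history, horizon=4)
d_preds, d_calls = direct_forecast(last_value_multi, history, horizon=4)

print(ar_calls, d_calls)  # 4 model calls versus 1
```

With real networks the cost of each call also differs between architectures, so the call count is only one factor in the measured running time.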

In summary, the four sets of experiments show that the proposed AGCN-T improves multistep traffic prediction accuracy and also has clear advantages in prediction efficiency.

#### 5. Conclusions

In this paper, a novel traffic prediction model, AGCN-T, is proposed. PPS is used to construct the adjacency matrix so that asymmetric and nonlinear relationships in the data sets can be identified. AGCN-T dynamically models spatial dependency with GCN and a spatial Transformer, and temporal dependency with a temporal Transformer. Four sets of experiments on three real datasets demonstrate the superior performance of AGCN-T in multistep prediction. Based on AGCN-T, an experimental platform for traffic flow prediction has been implemented, and it can be used in practice after expansion and optimization.

However, AGCN-T only considers the traffic data itself and ignores the impact of other related information, such as traffic events like accidents [29]. Multisource data will be an important direction for traffic flow prediction in the future. How to use these data effectively for more accurate modeling will be studied in follow-up work on traffic prediction with multisource data.

#### Data Availability

The data used are openly available at https://github.com/I-am-YuLang/AGCN-T.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported by the Shaanxi Provincial Natural Science Foundation Project (No. 2020JM-533).