Abstract

Short-term Origin-Destination (OD) flow prediction plays a major part in the realization of the Smart Metro. It helps traffic managers implement dynamic control strategies that improve operational safety, and it helps passengers make reasonable travel plans, which improves the passenger experience. However, OD prediction faces two problems: the number of OD pairs is far larger than the number of metro stations, and the OD matrix is sparse. To resolve these two problems, a threshold-based method is first proposed to extract key OD pairs. OD passenger flow carries the attribute information of the origin and destination stations and exhibits similar temporal evolution characteristics, so both spatial and temporal correlation need to be considered in the prediction. A Pearson correlation matrix is used to build a virtual graph that models the virtual connections between OD pairs. A spatiotemporal virtual graph convolutional network (ST-VGCN), which combines the advantages of a graph neural network and a gated recurrent neural network, is proposed to identify spatial associations and temporal patterns simultaneously. The proposed method is evaluated on 39 days of real-world data from Shenzhen Metro and outperforms the benchmark methods. This work can contribute to the development of short-term OD flow forecasting and provide ideas for research on the real-time operation and management of rail transit. Furthermore, it can help to establish passenger flow prediction and early warning mechanisms so that large numbers of passengers can be evacuated quickly in an emergency.

1. Introduction

With the development of intelligent cities, the analysis and mining of big traffic data have attracted widespread attention in the field of intelligent transportation. As an important means of transportation for alleviating road traffic congestion in big cities, urban rail transit has gradually entered the era of network operation. At the same time, a large amount of passenger flow data is generated, which makes research on passenger flow prediction more and more important [1]. Most fundamentally, modern technology has greatly improved the safety of train operation [2], meeting residents' growing demand for safe travel. Shenzhen Metro, for example, carries up to 5 million passenger trips on an average day, which is more than one-third of the traffic flow of the entire city, and occupies an important position in the urban public transportation system. However, ever-increasing travel demand puts growing pressure on the operation of urban rail transit. Especially during the morning and evening rush hours, commuting demand is very large, so the operating efficiency of the metro directly affects the overall commuting efficiency of the city. In addition to research on the operation safety of train control systems [3], more and more researchers have paid attention to the passenger flow data generated by metro operations and conducted multilevel analysis [4] on it to ensure operation safety. Because of the enormous scale of current metro networks, passenger travel is highly complex. Most existing research focuses on the prediction of inbound and outbound passenger flow, daily ridership, hourly ridership, and sectional passenger flow. OD passenger flow reflects the direction of flow into and out of stations; it is a direct display of passenger travel needs and contains valuable information.

Because of the good connectivity between stations in the metro network, the travel demand of passengers is usually expressed by an abstract OD matrix. The elements of the matrix represent the OD passenger flow: each element is the number of trips between station O and station D within a specific time interval. The OD matrix describes the origin and destination of the ridership between any two stations, which is the foundation for analyzing travel behavior characteristics. Existing information platforms establish a channel of communication between the metro system and passengers, providing passengers with predictions of the metro network status and helping them plan their trips reasonably. At the same time, OD passenger flow prediction combined with real-time detection of passengers in densely crowded areas of metro stations [5] can better support operational safety. However, the prediction results may differ from the actual situation, especially during peak periods. This inaccuracy disrupts travel plans, creates mistrust of the information, and impairs the judgment of metro managers. Therefore, how to accurately predict the OD passenger flow of the whole network is an important issue for the refined and comprehensive prediction of the rail transit network. By predicting fine-grained passenger flow and its dynamic changes in the network state, metro managers can adjust the operation plan in time and carry out reasonable passenger flow organization and control [6], which is of great practical significance for traffic safety. Passenger flow organization and control includes, but is not limited to, adjusting schedules, adding services, disseminating information to passengers in response to demand surges, and incentivizing passengers to delay or change their travel time.

Research on metro OD passenger flow based on automatic fare collection (AFC) data mainly covers three aspects: the OD matrix, the main passenger flow direction of the origin station, and key OD pairs. Since the OD matrix is sparse and highly skewed, analysis of the main direction of the origin station shows that the key OD pairs reflect the travel routes with large demand, which largely determine the operation status of the metro network. They are also the main object of metro vehicle allocation, which places higher requirements on keeping trains efficient and orderly [7]. At the same time, an OD pair implicitly includes the information of station O and station D and varies regularly in time, like a time series. For instance, most of the flow in the morning rush consists of commuters whose routes are relatively fixed. OD pairs with O in residential areas and D in business areas show a similar regularity. That is, OD pairs show dependence on consecutive time intervals and on the similar spatial properties of their origin and destination stations. Therefore, how to determine the key OD pairs and capture the spatiotemporal dependence between them at the level of the metro network becomes a key problem.

In this work, we propose a method to extract key OD pairs in a metro network and predict their flow. From the historical data of the AFC system, we extract the OD matrix and then obtain the key OD pairs through filtering. We introduce a spatiotemporal virtual graph convolutional network (ST-VGCN) to model the spatiotemporal dependencies between key OD pairs. A virtual topology is used to establish a similarity graph. Graph convolution combined with GRU is used to extract spatiotemporal features and predict the key OD passenger flow. We carry out an experiment on a real-world dataset from Shenzhen Metro to demonstrate the advantage of ST-VGCN. The main contributions of this study are as follows:
(1) We obtain the OD matrix by processing AFC historical data and extract the key OD pairs that have a large demand for metro travel by setting reasonable thresholds.
(2) We propose a spatiotemporal virtual graph convolutional network (ST-VGCN) model, in which we establish a similarity graph based on the Pearson coefficient as a virtual topology, to capture the spatial and temporal dependencies between key OD pairs.
(3) We verify our method through a case study of Shenzhen Metro, and the validity of the model is confirmed by comparative tests.

The rest of this paper is organized as follows: In Section 2, we review related work on OD traffic forecast. In Section 3, we describe our main work on the key OD flow prediction. In Section 4, we evaluate our method based on real data and present our results and analysis. In Section 5, we summarize our paper and discuss several possible directions for future work.

2. Related Work

Most research concentrates on the prediction of road traffic flow and traffic speed [8]. More and more researchers focus on urban rail transit forecasting, but research on OD prediction remains relatively scarce. Considering the similarity between OD sequence data and the inbound/outbound data of urban rail transit stations, we first introduce relevant studies on traffic prediction and then summarize the research on OD passenger flow prediction.

Early researchers used methods based on statistical learning, such as Historical Average (HA) [1], Autoregressive Integrated Moving Average (ARIMA) [9], and the Vector Autoregressive model (VAR) [10]. Later, traditional machine learning and neural network methods were applied to traffic flow prediction. Wu et al. [11] studied the travel time prediction problem using support vector regression (SVR). Zhu et al. [8] proposed a linear conditional Gaussian Bayesian network model for short-term traffic flow forecasting, which took into account spatiotemporal characteristics and velocity information. Yang and Hou [12] used a hybrid model based on wavelet analysis and least squares support vector machine for short-term rail transit passenger flow prediction. Jiao et al. [13] proposed an improved Kalman filter model based on Bayesian combination and nonparametric regression, in which real-time passenger traffic is expressed as a deviation from historical data to mitigate the volatility of the original data.

Because of the good performance of deep learning models in other fields, many scholars have tried to use them in traffic prediction. Zhang et al. [14] presented DeepST, the first model to use a convolutional neural network to mine spatial dependence between grids. On this basis, ST-ResNet [15] adopted the framework of a residual convolutional network and considered time series of three different trends to mine the spatiotemporal relationship; external factors such as weekdays and weather were then integrated to improve the accuracy of traffic flow prediction. Liu et al. [16] used Long Short-Term Memory (LSTM) and a fully connected (FC) layer to predict the inbound and outbound passenger flow of metro stations. Ma et al. [17] converted Beijing metro passenger data (i.e., destination station) into images and applied Convolutional Neural Networks (CNN) with a bidirectional LSTM layer. Yang et al. [18] proposed a novel attention-based end-to-end neural network to predict the inbound and outbound passenger flow, which improved the prediction effectiveness.

All the above models can effectively capture complex temporal features, while the Graph Convolutional Network (GCN) effectively addresses spatial correlation on networks. Traditional neural networks can only extract features effectively from regular grid data and cannot handle complex, nonlinear traffic data well. In this case, graph convolutional neural networks need to be considered. Chen et al. [19] constructed topology, similarity (based on dynamic time-warping distance), and correlation graphs to represent the dependency between passengers at different transfer stations, and then used a variant of graph neural network to conduct demand forecasting. Defferrard et al. [20] applied a three-dimensional convolution operation to seamlessly capture irregular spatiotemporal dependence on the metro network. Guo et al. [21] proposed the ASTGCN model, which added an attention mechanism to take into account the dynamic influence of different time periods and places on adjacent time periods and places. Yan et al. [22] proposed a spatiotemporal graph convolution model (STGCN) that combines spatial-domain graph convolution with CNN. Zhao et al. [23] put forward a passenger flow prediction model based on the temporal graph convolutional network (T-GCN), which combines a graph convolutional network (GCN) and a gated recurrent unit (GRU). In this model, GCN learns complex nonlinear structures to capture spatial topology, while the gated recurrent unit learns the dynamic changes of traffic to capture temporal dependence.

There are many studies on the estimation and prediction of short-term traffic origins and destinations, especially for taxi and car-hailing travel [24]. Research on passenger flow OD prediction originated in residents' travel, road traffic, and urban public transportation networks, where it is called travel distribution prediction or traffic distribution prediction.

One important difficulty of OD prediction for traffic and public transport is the high dimension of the data. A network with $N$ stations contains $N \times N$ OD pairs, and fitting a separate model with a traditional method for each OD pair does not scale. Matrix/tensor decomposition is an effective way to handle the high dimensionality of OD matrix prediction, and many studies [25] have explored it. Deep learning is also one of the mainstream OD prediction methods. Because the data is high dimensional, Toqué et al. [26] applied LSTM networks only to selected high-traffic OD pairs. Wang et al. [27] improved a GCN + LSTM OD flow prediction network through multi-task learning. Shen et al. [28] combined CNN with a gravity model to predict the OD matrix of the metro system. The performance of deep learning models is usually affected by noise in the sparse metro OD matrix. To reduce the influence of noise, Zhang et al. [29] developed an index called OD attraction degree (ODAD) to mask unimportant OD pairs, showing that masking OD pairs close to zero can improve LSTM prediction. Meanwhile, a Channel-wise Attentive Split-CNN (CAS-CNN) model [30] was developed for metro OD matrix prediction. Gong et al. [31] raised the problem of delayed real-time data collection and discussed how to address it. Peyman et al. [32] considered the issue of delayed data availability, which is a challenge in the prediction of dynamic OD demand.

From traditional methods to current artificial intelligence methods, the goal has always been to better capture the regularity of the predicted quantity. Most predictions of inflow and outflow are at the station level, while the prediction of OD flow often involves stations across the entire network. Traditional methods mainly make predictions based on statistical laws, but they cannot reflect the influence of various factors, and their prediction accuracy is low. GRU is often used for simple time series prediction and is very effective there, but it does not capture the spatial information in rail transit OD passenger flow prediction. Using graph theory and convolution, GCN can capture both the physical space information of the metro network and the virtual space information carried by the OD passenger flow. Masking unimportant OD pairs, as proposed in previous methods, strongly inspired us. Therefore, in this work, we study how to extract important OD pairs. Based on the analysis of the properties of key OD passenger flow, we combine GRU with GCN to capture spatiotemporal information.

3. Methodology

3.1. Key OD Pairs Prediction Problem

The metro AFC system records the original travel data of passengers, including the card number, entry station number and time, transaction type (inbound or outbound), and other data. From these records, we obtain the entry and exit stations of each trip and the corresponding times. We aggregate the trips at a fixed time interval and count the travel demand from O to D in the period in which the passenger enters at O, regardless of whether the journey is completed within that period. We finally form an OD matrix $M \in \mathbb{R}^{N \times N}$, where $N$ represents the total number of metro stations in the dataset. We extract the key OD pair set as $K = \{od_1, od_2, \ldots, od_n\}$.

In our work, a virtual graph between key OD pairs, denoted as $G = (V, E)$, is established to represent the connection relationship between OD pairs. $V$ and $E$ denote the set of nodes and the set of edges, respectively. $V$ is equivalent to the set of key OD pairs, and the characteristics of each node are encoded by the traffic time series of the corresponding OD pair. The adjacency matrix $A \in \{0, 1\}^{n \times n}$, which only contains elements of 0 and 1, represents the virtual connection relationship between OD pairs. The corresponding element is 1 if there is an association relationship between two nodes; otherwise, it is 0. The data collected by the key OD pairs within $P$ consecutive moments is expressed as $X = (x_{t-P+1}, \ldots, x_{t-1}, x_t)$, where $x_t \in \mathbb{R}^{n}$ represents the data collected by each key OD pair at time $t$. The feature matrix $X \in \mathbb{R}^{n \times P}$ represents the attribute characteristics of the nodes, $n$ represents the number of key OD pairs, and $P$ represents the historical time sequence length. The purpose of our paper is to predict the passenger flow of the future $T$ time intervals according to the historical passenger flow data of key OD pairs in the previous $P$ time intervals, which can be expressed as the following learning function:

$$[x_{t+1}, \ldots, x_{t+T}] = f\left(G; (x_{t-P+1}, \ldots, x_{t-1}, x_t)\right)$$

where $x_t$ represents the values of all key OD pairs at time interval $t$ and $f$ is a mapping function.
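The learning function above maps a window of past intervals to a horizon of future intervals. As a minimal sketch of how such training samples could be cut from a chronological flow series, the following pure-Python helper may be useful; the function name `make_windows`, the parameter names `p` and `t`, and the toy numbers are assumptions made for illustration, not part of the original method description.

```python
from typing import List, Tuple

Vector = List[float]  # flows of all key OD pairs in one time interval


def make_windows(series: List[Vector], p: int, t: int) -> List[Tuple[List[Vector], List[Vector]]]:
    """Split a chronologically ordered series into (history, target) samples.

    Each sample pairs p consecutive intervals with the t intervals that follow.
    """
    if p <= 0 or t <= 0:
        raise ValueError("p and t must be positive")
    samples = []
    for start in range(len(series) - p - t + 1):
        history = series[start:start + p]
        target = series[start + p:start + p + t]
        samples.append((history, target))
    return samples


# Toy data (hypothetical): 6 intervals, 2 key OD pairs.
toy_series = [[float(i), float(10 * i)] for i in range(6)]
toy_samples = make_windows(toy_series, p=3, t=1)
```

With a history length of 3 and a horizon of 1, six intervals yield three overlapping samples; a series shorter than `p + t` simply yields none.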

3.2. Key OD Pairs Extraction

The OD matrix describes the number of trips between each OD pair in the system during a time interval. With $N$ stations in the metro system, the number of generated OD pairs is $N \times N$, which is very large. The heat map formed by the OD matrix at the 10-minute granularity is shown in Figure 1(a). As the color bar on the right side of Figure 1(a) indicates, a brighter color means a larger OD flow between the two stations. A large part of the graph is black, that is, most elements of the OD matrix are 0, which means that there is no passenger travel demand between most pairs of stations. Therefore, not all OD pairs need our attention. The passenger flow of noncritical OD pairs is scarce, and their travel demand is very random. Such OD pairs contribute very little, implying that relatively few key OD pairs account for the vast majority of the overall OD passenger flow. So we use the historical passenger flow data set $H = \{h_1, h_2, \ldots, h_L\}$ of each OD pair to filter OD pairs, where $L$ is the length of the selected historical data and $h_l$ represents the OD value of the $l$-th period. To extract the key OD pairs, we set three thresholds: a threshold $\alpha$ that governs the proportion of nonzero values, the randomness judgment value $\beta$, and the proportion $\gamma$ of values that are greater than the randomness judgment value. The key OD pairs meet the following conditions:

$$\frac{|H_0|}{L} < \alpha, \qquad \frac{|H_\beta|}{L} > \gamma$$

where $H_0 = \{h_l \in H : h_l = 0\}$ is the subset of $H$ whose elements are zero and $H_\beta = \{h_l \in H : h_l > \beta\}$ is the subset whose elements are greater than the randomness judgment value $\beta$.
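The threshold filter described above can be sketched in a few lines. The sketch below assumes the reading stated later in the experiments (fewer than 80% of the historical values are zero, and more than 10% of them exceed 10); the function names, parameter names, station ids, and toy counts are all hypothetical.

```python
from typing import Dict, List, Tuple

ODPair = Tuple[int, int]  # (origin station id, destination station id)


def is_key_od(history: List[int], zero_share_max: float,
              random_level: int, busy_share_min: float) -> bool:
    """Return True if one OD pair's flow history passes both filters.

    zero_share_max: the share of zero-valued intervals must stay below this.
    random_level: flows at or below this level are treated as random demand.
    busy_share_min: the share of intervals above random_level must exceed this.
    """
    if not history:
        return False
    length = len(history)
    zero_share = sum(1 for v in history if v == 0) / length
    busy_share = sum(1 for v in history if v > random_level) / length
    return zero_share < zero_share_max and busy_share > busy_share_min


def extract_key_od(flows: Dict[ODPair, List[int]], zero_share_max: float = 0.8,
                   random_level: int = 10, busy_share_min: float = 0.1) -> List[ODPair]:
    """Keep the OD pairs whose histories pass the threshold filter."""
    return sorted(od for od, hist in flows.items()
                  if is_key_od(hist, zero_share_max, random_level, busy_share_min))


# Toy histories (hypothetical station ids and counts), 10 intervals each.
toy_flows = {
    (1, 2): [12, 15, 0, 20, 11, 9, 14, 0, 13, 16],  # busy, commuter-like pair
    (3, 4): [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],         # almost always empty
    (5, 6): [1, 2, 0, 3, 1, 2, 1, 0, 2, 1],         # nonzero but small and random
}
key_pairs = extract_key_od(toy_flows)
```

In this toy example only the first pair survives: the second fails the zero-share test and the third never exceeds the randomness level.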

We selected 14 OD pairs and show them in Figure 1(b). The horizontal axis represents time. The figure shows that critical and noncritical OD pairs differ greatly and that different OD pairs follow different time-varying patterns. The color bar of OD14 is all black, indicating that there is no OD demand between the two stations. OD11–OD13 have some faint colors, indicating that the passenger travel demand of such noncritical OD pairs is very random. The remaining OD pairs show different color distributions, representing different passenger flow characteristics.

3.3. Virtual Graph Construction

Graph neural networks perform graph convolution based on the relationships between nodes. Because we conduct research and prediction on key OD pairs rather than on stations, we lose the dependency information that comes with the physical topology of the real stations. However, OD pairs themselves combine the spatial properties of their origin and destination stations. For example, OD pairs whose station O is in a residential area and whose station D is in an office area have similar properties. OD pairs are thus correlated: they may have similar traffic distribution characteristics because of similar functionality. Virtual connection edges can therefore be established to generate an adjacency matrix.

To measure the degree of correlation between two variables, researchers usually use the Pearson correlation coefficient. For time series, it measures how two continuous signals change together over time, and the coefficient expresses their relationship. The Pearson correlation coefficient can be calculated by the following formula:

$$r_{xy} = \frac{\sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{m} (x_k - \bar{x})^2}\,\sqrt{\sum_{k=1}^{m} (y_k - \bar{y})^2}}$$

where $x$ and $y$ represent two variables with $m$ observations $x_k$ and $y_k$, and $\bar{x}$ and $\bar{y}$ represent the means of the variables.

According to the above formula, we obtain the correlation matrix $R$ that describes the relation between key OD pairs. $R_{ij}$ represents the Pearson correlation coefficient between the observed historical passenger flow series of the $i$-th and $j$-th OD pairs. Classical GCNs encode adjacency between nodes to represent arbitrarily structured graphs. A binary-encoded adjacency matrix $A$ is usually used to represent the connectivity between nodes: $A_{ij} = 1$ if nodes $i$ and $j$ are directly connected in the graph, otherwise $A_{ij} = 0$. The larger the absolute value of the correlation coefficient, the stronger the correlation between the variables. We set a threshold $c$ to determine whether to establish a virtual connection between OD pairs. The formula is as follows:

$$A_{ij} = \begin{cases} 1, & |R_{ij}| \ge c \\ 0, & \text{otherwise} \end{cases}$$
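The virtual graph construction can be illustrated with NumPy. This is only a sketch under stated assumptions: the function name `virtual_adjacency` is hypothetical, an edge is assumed to exist when the absolute correlation is at least the threshold, and self-loops are left out here on the assumption that the graph convolution step adds the identity matrix itself.

```python
import numpy as np


def virtual_adjacency(flows: np.ndarray, c: float) -> np.ndarray:
    """Build a 0/1 adjacency matrix from pairwise Pearson correlations.

    flows: array of shape (n_pairs, n_intervals), one row per key OD pair.
    c: correlation threshold; |r| >= c creates a virtual edge (assumed rule).
    """
    corr = np.corrcoef(flows)              # (n_pairs, n_pairs) Pearson matrix
    corr = np.nan_to_num(corr, nan=0.0)    # constant rows give NaN; treat as uncorrelated
    adj = (np.abs(corr) >= c).astype(int)
    np.fill_diagonal(adj, 0)               # no self-loops at this stage (assumption)
    return adj


# Toy series (hypothetical): rows 0 and 1 rise together, row 2 falls, row 3 is unrelated.
toy_flows = np.array([
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [10, 8, 6, 4, 2],
    [1, 3, 2, 3, 1],
], dtype=float)
toy_adj = virtual_adjacency(toy_flows, c=0.8)
```

Because the absolute value is used, the strongly negatively correlated row 2 is also linked to rows 0 and 1, while the unrelated row 3 stays isolated.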

3.4. Spatiotemporal Virtual-Graph Convolution Network

After completing the virtual graph construction, we use a combined GCN and GRU model to capture the spatiotemporal dependencies of key OD pairs. The prediction framework of this paper is shown in Figure 2. The model consists of three parts: the input layer, the feature extraction layer, and the output layer. The input layer receives raw historical OD matrix data. Key OD pair extraction then generates the feature matrix $X$, which holds the historical flow of all key OD pairs over multiple time steps, and virtual graph construction generates the adjacency matrix $A$. The feature extraction layer is composed of a graph convolution module and a GRU module, and its inputs are $X$ and $A$. First, the graph convolution module receives the data from the input layer and learns both node features and structural information end-to-end through the graph convolution operation, obtaining rich node information and aggregating spatial features. The result, which carries the spatial features of each node, is then sent to the GRU module to capture the time series features. Finally, the output layer produces the prediction result.

There are two main categories of GCN methods: spectral-based and spatial-based. In our work, we use a graph convolutional network based on the spectral method. Given a feature matrix $X$ representing node features and an adjacency matrix $A$ representing structural features, the graph convolution operation computes the information of each node from the information of its related nodes. The core calculation formula is as follows:

$$f(X, A) = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,X\,W\right)$$

Here, $\tilde{A} = A + I$ represents the sum of the adjacency matrix $A$ and the identity matrix $I$; $\tilde{D}$ represents the degree matrix of $\tilde{A}$; $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ represents the normalized $\tilde{A}$; $W$ represents the weight matrix; and $\sigma$ denotes the activation function.
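The graph convolution described here (adjacency plus identity, symmetric degree normalization, weight matrix, activation) can be written compactly in NumPy. The following is a minimal sketch rather than the authors' implementation; the function names are hypothetical and ReLU is assumed as the activation function.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2} for a 0/1 adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # every degree is >= 1 thanks to self-loops
    d_inv_sqrt = np.diag(deg ** -0.5)
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def gcn_layer(x: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(norm_adj @ X @ W); ReLU is an assumed choice."""
    return np.maximum(normalized_adjacency(adj) @ x @ weight, 0.0)


# Toy graph (hypothetical): two connected nodes with one feature each.
toy_adj = np.array([[0, 1], [1, 0]])
toy_x = np.array([[1.0], [3.0]])
toy_w = np.array([[1.0]])
toy_out = gcn_layer(toy_x, toy_adj, toy_w)
```

In the toy graph each node ends up with the average of its own value and its neighbour's, which shows how the normalized adjacency mixes information between connected OD pairs.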

GRU is one of the most widely used recurrent neural networks for processing sequence data. It can be regarded as a combination of a reset gate and an update gate. Here it models the sequence information that has undergone graph convolution, capturing its temporal features and completing the prediction task. After the original sequence passes through the graph convolution layer, new sequence data containing spatial information is obtained as $f(x_t, A)$. We input this new sequence data into the GRU network. The feature extraction layer thus extends the basic GRU structure with the graph convolution operation. The resulting structure is shown in Figure 3.

This process can be described by the following equations:

$$u_t = \sigma\left(W_u\left[f(x_t, A), h_{t-1}\right] + b_u\right)$$

$$r_t = \sigma\left(W_r\left[f(x_t, A), h_{t-1}\right] + b_r\right)$$

$$c_t = \tanh\left(W_c\left[f(x_t, A), (r_t \odot h_{t-1})\right] + b_c\right)$$

$$h_t = u_t \odot h_{t-1} + (1 - u_t) \odot c_t$$

In the formulas, $h_{t-1}$ is the hidden state at time $t-1$; $x_t$ is the flow information of all key OD pairs at time $t$; $r_t$ is the reset gate in the GRU model, which controls how the new input information is integrated with the previous memory; the update gate $u_t$ indicates the amount of previous memory saved to the current time step; $c_t$ is the memory content stored at time $t$; $h_t$ is the output state at time $t$; and $W$ and $b$ denote the weights and biases of each gate. GRU takes the hidden state at time $t-1$ and the current key OD pair flow information that has undergone graph convolution as input, and obtains the flow state at time $t$. The key OD flow is predicted so that the forecast result is as close as possible to the actual traffic demand. Therefore, we need a loss function to estimate the degree of inconsistency between the predicted value and the real value. Our ultimate goal is to minimize the loss during the training process. The loss function chosen in this paper measures the gap between the true OD flow $Y_t$ and the predicted value $\hat{Y}_t$ over the length of the observation window.
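To make the data flow through the gates concrete, one time step of a GRU fed with graph-convolved input can be sketched in NumPy. This is an illustrative sketch under assumptions, not the authors' code: the parameter layout, the helper names `init_params` and `gc_gru_step`, the ReLU graph convolution, and all toy values are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def init_params(rng, in_dim: int, gc_dim: int, hidden: int) -> dict:
    """Random weights for the sketch (hypothetical layout, small scale)."""
    cat = gc_dim + hidden
    p = {"W_g": rng.normal(0, 0.1, (in_dim, gc_dim))}
    for gate in ("u", "r", "c"):
        p["W_" + gate] = rng.normal(0, 0.1, (cat, hidden))
        p["b_" + gate] = np.zeros(hidden)
    return p


def gc_gru_step(x_t, h_prev, norm_adj, p):
    """One GRU step whose input has first passed through a graph convolution.

    x_t: (n, in_dim) flows at the current interval; h_prev: (n, hidden).
    norm_adj: (n, n) normalized adjacency of the virtual graph.
    """
    g_t = np.maximum(norm_adj @ x_t @ p["W_g"], 0.0)      # graph-convolved input
    z = np.concatenate([g_t, h_prev], axis=1)
    u = sigmoid(z @ p["W_u"] + p["b_u"])                   # update gate
    r = sigmoid(z @ p["W_r"] + p["b_r"])                   # reset gate
    zc = np.concatenate([g_t, r * h_prev], axis=1)
    c = np.tanh(zc @ p["W_c"] + p["b_c"])                  # candidate memory
    return u * h_prev + (1.0 - u) * c                      # new hidden state


rng = np.random.default_rng(0)
n_pairs, hidden = 3, 4
norm_adj = np.full((n_pairs, n_pairs), 1.0 / n_pairs)      # toy fully connected graph
params = init_params(rng, in_dim=1, gc_dim=2, hidden=hidden)
h = np.zeros((n_pairs, hidden))
for x_t in (np.array([[0.2], [0.5], [0.1]]), np.array([[0.3], [0.4], [0.2]])):
    h = gc_gru_step(x_t, h, norm_adj, params)
```

The update gate decides how much of the previous hidden state is kept: when it saturates near 1, the step returns the previous state almost unchanged.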

4. Experiments

4.1. Data Description

Because few public benchmarks exist for metro passenger forecasting, we construct the dataset MetroSZ2020 from 39 consecutive days of metro smart card data from Shenzhen, China, which record the card number, origin number, destination number, and entry and exit time of each metro trip. Because some data in the Shenzhen metro network cannot be obtained, data from 205 stations are available after data cleaning. As shown in Table 1, MetroSZ2020 covers 205 metro stations in Shenzhen from August 23rd to September 30th, 2020. We select 6:00–24:00 as the metro operating period. We compute the OD matrix every 10 minutes, which yields 108 matrices per day, each containing 42,025 OD pairs.
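The aggregation into 10-minute OD counts can be sketched with the standard library. The record layout `(origin, destination, entry time)`, the function names, and the station ids below are assumptions for illustration; trips are binned by entry time, and only the operating period of 6:00–24:00 is kept.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

Trip = Tuple[int, int, datetime]  # (origin id, destination id, entry time), hypothetical layout

START_HOUR = 6                    # operating period 6:00-24:00, as in the text
INTERVAL_MIN = 10                 # 10-minute granularity, as in the text
SLOTS_PER_DAY = (24 - START_HOUR) * 60 // INTERVAL_MIN
N_STATIONS = 205                  # stations available after data cleaning


def slot_index(ts: datetime) -> int:
    """Map a timestamp to its 10-minute slot within the operating day."""
    minutes = (ts.hour - START_HOUR) * 60 + ts.minute
    return minutes // INTERVAL_MIN


def count_od(trips: Iterable[Trip]) -> Counter:
    """Count trips per (date, slot, origin, destination), binned by entry time."""
    counts = Counter()
    for origin, dest, entered in trips:
        if entered.hour < START_HOUR:
            continue  # outside the operating period
        counts[(entered.date(), slot_index(entered), origin, dest)] += 1
    return counts


# Toy trips (hypothetical station ids).
base = datetime(2020, 8, 23)
toy_trips = [
    (101, 205, base.replace(hour=6, minute=5)),
    (101, 205, base.replace(hour=6, minute=9)),
    (101, 205, base.replace(hour=6, minute=10)),
    (7, 8, base.replace(hour=23, minute=59)),
    (7, 8, base.replace(hour=5, minute=30)),   # before 6:00, dropped
]
toy_counts = count_od(toy_trips)
```

The constants reproduce the figures in the text: 18 operating hours at 10-minute granularity give 108 slots per day, and 205 stations give 205 × 205 = 42,025 OD pairs per matrix.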

The three thresholds $\alpha$, $\beta$, and $\gamma$ are set as 80%, 10, and 10%, respectively. We test different threshold combinations in experiments and obtain different numbers of key OD pairs. Considering hardware resources and model training, the above combination is finally selected. By analyzing OD matrix heat maps similar to Figure 1(a), it can be seen that setting $\alpha$ to 80% is meaningful. The randomness judgment value $\beta$ needs to change with the time granularity. The time granularity selected in this paper is 10 min. For the OD matrix of 205 stations, setting $\beta$ to 10 is already enough to indicate that the OD passenger flow has a certain regularity. For a station with regular passenger flow, the regularity takes a sustained period of time to show, so setting $\gamma$ to 10% is a reasonable choice. We extract the key OD pairs for which less than 80% of the historical time series values are zero and more than 10% of the values are greater than 10. A total of 490 key OD pairs are used as input, and we scale the data to [0, 1] by dividing by the maximum value in the data. For virtual graph construction, we set the threshold $c$ to 0.8. The data is divided into training data and validation data according to a split ratio of 0.8.
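The preprocessing described here (division by the maximum and a split at a ratio of 0.8) can be sketched as follows. Two points are assumptions rather than statements from the text: that the maximum is taken over the whole data set, and that the split is chronological with the first 80% used for training. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = time intervals, columns = key OD pairs


def max_scale(data: Matrix) -> Tuple[Matrix, float]:
    """Divide every value by the global maximum so that no value exceeds 1."""
    peak = max(max(row) for row in data)
    if peak <= 0:
        raise ValueError("data must contain a positive value")
    return [[v / peak for v in row] for row in data], peak


def chrono_split(data: Matrix, train_ratio: float = 0.8) -> Tuple[Matrix, Matrix]:
    """Split chronologically: the first share trains, the rest validates (assumed)."""
    cut = int(len(data) * train_ratio)
    return data[:cut], data[cut:]


# Toy data (hypothetical): 10 intervals, 2 key OD pairs.
toy_data = [[float(i), float(2 * i)] for i in range(10)]
scaled, peak = max_scale(toy_data)
train, val = chrono_split(scaled)
```

Keeping `peak` makes it possible to map predictions back to passenger counts by multiplying them by the same value.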

4.2. Model Configurations

Our experiments are run in the PyTorch environment on a workstation equipped with an Intel(R) Core(TM) i7-6800K processor (15 MB cache, up to 3.40 GHz), 16 GB of memory, and an NVIDIA GeForce GTX 1080 Ti graphics card. We train the model using the Adam optimizer. To obtain the best experimental results, we manually tune the number of hidden units and the number of training epochs, both of which may greatly affect the prediction precision. In Figure 4, the horizontal axis indicates the different parameter choices and the vertical axis indicates the variation of the different metrics. The red dots indicate the performance of each metric under the selected parameters. First, we test the number of training epochs in the set [500, 1,000, 1,500, 2,000, 3,000, 3,500, 4,000, 4,500, 5,000] and analyze the variation of the model performance. Figure 4(a) shows the metrics for different numbers of training epochs, and Figure 4(b) shows the metrics for different numbers of hidden units. As the number of training epochs increases, the evaluation metrics stabilize, with a turning point at 3,000. So we fix the number of training epochs at 3,000 and select the number of hidden units from the set [32, 64, 100, 128, 256]. As shown in Figure 4(b), the model becomes stable when the number of hidden units reaches 128. Therefore, the number of training epochs is set to 3,000 and the number of hidden units to 128. In addition, the time step is set to 12, the batch size to 64, and the learning rate to 0.001.

4.3. Evaluation Metrics

In this work, we choose root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination ($R^2$) as evaluation metrics. Each metric carries a different meaning. RMSE and MAE measure the error between the predicted value and the actual value. $R^2$ reflects the fitting effect of the model, that is, how well the prediction results represent the actual data. The smaller the error, the closer the predicted value is to the real value; the higher the degree of fit, the better the prediction effect of the model. The specific calculation formulae are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ represents the real OD information, $\hat{y}_i$ represents the predicted OD information, $\bar{y}$ represents the mean of the real OD information, and $n$ is the number of nodes.
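The three metrics can be computed with a few lines of standard-library Python. The sketch below uses the standard definitions, with the coefficient of determination computed against the mean of the observed values; the function names and the toy numbers are illustrative assumptions.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical flows for four intervals).
toy_true = [3.0, 5.0, 7.0, 9.0]
toy_pred = [2.0, 5.0, 8.0, 9.0]
toy_scores = (rmse(toy_true, toy_pred), mae(toy_true, toy_pred), r2(toy_true, toy_pred))
```

Note that `r2` is undefined when all observed values are identical, since the total sum of squares is then zero.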

4.4. Results and Discussion

In this section, we compare the proposed model with five baseline methods: three traditional time series models and two general deep learning models, namely (1) the Historical Average model (HA), (2) the Autoregressive Integrated Moving Average model (ARIMA), (3) the Support Vector Regression model (SVR), (4) the Graph Convolutional Network model (GCN), and (5) the Gated Recurrent Unit model (GRU). To verify the effectiveness of our model for the key OD flow prediction problem, we compare the evaluation metrics of all methods, as shown in Table 2.

We explore the performance of ST-VGCN for key OD pair flow prediction at the 10-minute time granularity. ST-VGCN performs much better than the HA model on all performance metrics, and its RMSE is reduced by approximately 26.15% and 0.6% compared to the ARIMA and SVR models, respectively. GCN and GRU are both deep learning methods, but GCN only focuses on spatial relations while GRU only focuses on temporal relations. The ST-VGCN model takes both into account, and its RMSE is reduced by about 38.62% and 30.74% compared to the GCN and GRU models, respectively. The other indicators also improve significantly. These comparisons verify the improvement of the proposed ST-VGCN model on all three indicators and validate its effectiveness.

The prediction results for the 10-minute granularity of key OD pair flow are visualized in Figure 5, where three different passenger travel patterns can be seen. Figures 5(a) and 5(c) have explicit spike moments, and the spike moment in Figure 5(c) is later than that in Figure 5(a). Analysis shows that for the OD pair in Figure 5(a), station O is in a residential area and station D is in an office area, whereas for the OD pair in Figure 5(c), station O is in an office area and station D is in a residential area, which is in line with the residents' commuting pattern. Figure 5(b) maintains a relatively flat trend: station D is an airport station, so a certain level of passenger travel demand persists throughout the day.

As can be seen, our model captures different passenger flow demands well. Accurate predictions can provide effective reference information for passenger travel. At the same time, the modeling and analysis of key OD pair flow can assist the emergency response [33] when an unexpected event occurs and reduce the lasting impact of emergencies.

5. Conclusions

Our work mainly defines and studies the key OD pair flow prediction problem and proposes a complete set of procedures for accomplishing the prediction. The key steps are to obtain key OD pairs, model the correlation between them, and use the ST-VGCN model to complete the prediction. The ST-VGCN model achieved the best prediction results on a real-world dataset when compared with five existing models. Through experiments and analysis, we have obtained the following conclusions:
(1) OD flow data is too sparse and its dimension is too high, yet most OD pairs contribute little to the overall passenger flow. So it is necessary to extract key OD pairs for research by setting a threshold filter.
(2) OD pairs exhibit certain correlations with each other because of the shared nature of their origin and destination stations and their time variation patterns. It makes sense to consider the spatiotemporal characteristics of OD passenger flow when predicting.
(3) The proposed ST-VGCN model combines temporal and spatial information to improve prediction ability. By establishing a virtual graph, GCN is used to capture spatial properties, and GRU is used to capture time series information.

Overall, research on key OD pair flow prediction can provide important insights for metro operation and management. In the future, the impact of multiple data sources needs to be considered, which can further improve the prediction ability of the model. How to determine the threshold values for filtering key OD pairs more reasonably is also an open question worth exploring. In addition, the problem that real-time OD information cannot be obtained because of trip duration is not explored in this study, and we will consider how to obtain real-time information in future work.

Data Availability

Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data are not available.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the Beijing Municipal Natural Science Foundation (Grant No. L201015); the National Key R&D Program of China (Grant No. 2020YFC0833104); and the Green, Intelligent and Safe Mining of Coal Resources (Grant No. 52121003).