#### Abstract

Bus passenger flow prediction is a critical component of advanced transportation information systems for public traffic management, control, and dispatch. With the development of artificial intelligence and the growing variety and volume of readily available traffic data, many previous studies have applied machine learning models to extract comprehensive correlations from transit networks and improve passenger flow prediction accuracy. The passenger flow at a station is strongly affected by various factors, such as the flow in the previous time step and whether the period is a peak or nonpeak hour, so extracting the key features from the data is essential for a passenger flow prediction model. Although neural networks, k-nearest neighbor, and some deep learning models have been adopted to mine the temporal correlations of passenger flow data, their lack of interpretability with respect to the influencing variables remains a major problem. Classical tree-based models can mine the correlations between variables and rank the importance of each variable. In this study, we present a method to extract the passenger flow of different routes at a station and implement an XGBoost model to quantify the contribution of each variable to passenger flow prediction. Compared with benchmark models, the proposed model reaches state-of-the-art prediction accuracy and computational efficiency on a real-world dataset. Moreover, the XGBoost model can interpret the predicted results: period is the most important variable for passenger flow prediction, which suggests that bus management during peak hours should be improved.

#### 1. Introduction

Passenger flow prediction is important for advanced transportation information systems (ATIS) and for multimodal traffic management planning. A comprehensive classification of the historical patterns of passenger flow at public bus stations can not only identify the hot spots of public transportation in a multimodal traffic network but also improve the accuracy of future passenger flow prediction. As a component of ATIS, accurate passenger flow prediction is a prerequisite for dynamic vehicle scheduling and public transport planning. The increasing need for short-term passenger flow prediction embedded in ATIS has led to a large body of research on passenger flow prediction in recent decades. Even before the recent breakthroughs in artificial intelligence, passenger flow prediction models had been gradually shifting from traditional statistical models to machine learning models. With the exponential growth of computational capability and of the volume of traffic data, many deep learning models, such as convolutional neural networks, recurrent neural networks, and their extensions, have been adopted for short-term passenger flow prediction in recent years.

However, prediction models still face several challenges. The first is how to extract accurate passenger flow from massive bus card data and vehicle operation data, since accurate data are the basis of any prediction model. The second is how much each variable contributes to the prediction. A station may be served by several routes, and buses of the same route may arrive several times during one prediction interval. The competition and complementation among these buses are critical factors in the size of the passenger flow, so it is necessary to incorporate two variables into current prediction models: the number of buses arriving on the same route and the number arriving on other routes during the prediction interval.

To address these issues, we propose a new model for bus passenger flow prediction that also ranks the influencing factors. In this paper, we first present a method to extract passenger flow from bus card data and vehicle operation data. Then, an XGBoost model is implemented to mine the temporal correlations in the time series data and quantify the contribution of each variable. To learn the correlations between different routes serving the same station, the prediction model takes into account the number of buses arriving on the same route and on different routes. Tests on a real-world dataset collected in Guangzhou show that the introduced preprocessing method is effective and that XGBoost achieves better prediction accuracy with strong interpretability compared with existing benchmark models. The contributions of this paper are summarized as follows:

1. An integrated process to extract passenger flow from raw data is presented, and a real-world dataset is used to test the proposed model.
2. The interaction between different routes can affect the passenger flow at a specific station. In this study, we consider the number of buses arriving during the prediction interval to improve prediction accuracy and to measure this influence.
3. Besides improving accuracy, this study mines the factors affecting passenger flow from massive bus card data and vehicle operation data. The results can contribute to bus station and route planning.

The rest of this paper is organized as follows. Section 2 discusses the existing literature. Section 3 defines XGBoost-related concepts and describes the methodology in detail. The data preprocessing procedures and evaluation criteria are presented in Section 4. The experimental results are shown in Section 5. Finally, we conclude the paper in Section 6.

#### 2. Literature Review

In recent years, a large number of passenger flow prediction models for public transportation have been built. Generally, these models can be divided into two categories: statistics-based models and machine learning-based models. Popular statistics-based models include the autoregressive, moving average, and autoregressive moving average models. For example, the autoregressive integrated moving average (ARIMA) model has been found to predict rail transit passenger flow with high accuracy [1]. Milenković et al. found a strong seasonal autocorrelation in the time series and proposed a seasonal ARIMA (SARIMA) method to predict railway passenger flow [2]. Tang et al. proposed a short-term passenger flow prediction framework and evaluated its performance with ARIMA, multiple linear regression, and support vector regression [3]. Zhang et al. built a two-step real-time prediction model that first made a rough prediction of bus passenger flow based on historical data and then calibrated it with the extended Kalman filter (EKF) [4]. Gong et al. put forward a three-stage framework to predict short-term passenger flow at bus stations: first, the arrival passenger count (APC) is predicted with SARIMA; then, the departure passenger count (DPC) is predicted with an event-based algorithm; finally, the waiting passenger count (WPC) is predicted with a Kalman filter, whose evolution functions are updated with the APC and DPC [5]. To improve the reliability of subway passenger flow prediction, Li et al. proposed a hybrid model combining a linear ARIMA model with nonlinear symbolic regression [6], and Ding et al. built models that jointly use ARIMA and generalized autoregressive conditional heteroscedasticity (GARCH) [7]. Wang et al. applied a hybrid model combining empirical mode decomposition (EMD) and ARIMA to predict short-term traffic speeds on highways [8].
Some researchers combined ARIMA with other methods, such as the bagging technique [9] and genetic programming [10], to improve prediction accuracy. However, as data volumes grow, statistics-based models show some limitations. First, they rely on strong assumptions that may not hold for traffic data, which makes it difficult to mine useful features. Moreover, they are not good at handling categorical variables in massive data.

The development of machine learning models has opened new opportunities to improve passenger flow prediction. For example, Liu et al. designed a new deep learning architecture called the modular convolutional neural network, informed by experiments with decision tree-based models, to solve the large-scale bus passenger flow prediction problem [11]. Liu and Chen proposed an unsupervised training model combining a stacked autoencoder (SAE) and a deep neural network (DNN) to forecast hourly bus passenger flow [12]. Many researchers have studied subway passenger flow forecasting. Li et al. introduced a dynamic radial basis function (RBF) neural network with dynamic input to predict outbound passenger flow at subway stations [13]. Zhang et al. combined a residual network (ResNet), a graph convolutional network (GCN), and long short-term memory (LSTM) into a deep learning architecture [14]. Zhao et al. proposed a three-stage framework based on an agglomerative hierarchical clustering (AHC) algorithm and tree-based models to select appropriate feature variables [15]. A hybrid spatial and temporal deep learning neural network (HSTDL-NET) has also been proposed [16]. Liu et al. developed an end-to-end deep learning architecture based on a recurrent neural network (RNN) and LSTM [17]. Hao et al. proposed an end-to-end deep learning framework based on an LSTM network to predict the number of passengers alighting at each station in the last few short-term periods [18]. Li et al. proposed a multistation passenger flow prediction method for subway stations that uses a dynamic weighted combination of support vector machines (SVM) and RBF networks to improve the stability of the prediction model [19]. Zhang et al. proposed a channel-wise attentive split convolutional neural network (CAS-CNN) to predict short-term origin-destination (OD) flow, the first application of a split convolutional neural network to short-term OD prediction in urban rail transit [20].
However, these models have been criticized for their poor interpretability and their need for substantial computing resources.

Ensemble tree methods can mitigate these deficiencies and have been widely applied in recent years. For example, Xu et al. [21] and Liu et al. [22] built gradient boosting decision tree (GBDT) models to predict bus passenger flow more accurately. Zhang et al. applied the LightGBM model to predict subway passenger flow, taking into account the influence of transfer passenger flow [23]. Liu et al. established a passenger flow forecasting model using random forest [24]. To further improve the accuracy and efficiency of passenger flow forecasting, researchers have combined other methods with ensemble tree models. A model combining singular spectrum analysis with an AdaBoost-weighted extreme learning machine was proposed in [25]. Lin and Tian introduced a hybrid model combining random forest and LSTM to predict subway passenger flow [26]. Dong et al. proposed a traffic flow prediction model combining wavelet decomposition and reconstruction with the extreme gradient boosting (XGBoost) algorithm [27]. Du et al. used a combined XGBoost and LSTM model for short-term traffic prediction at base stations [28]. Yun et al. built a locally optimal fusion model based on LSTM, LightGBM, and a dynamic regression device [29]. Wang et al. took multivariable linear regression (MLR), k-nearest neighbor (KNN), XGBoost, and the gated recurrent unit (GRU) as four seed models to build an ensemble regression model that accurately predicts short-term passenger flow in urban public transport [30]. The aim of this paper is to develop a prediction model from massive bus card and bus operation data and to determine how much each variable contributes to prediction accuracy; therefore, XGBoost is implemented.

#### 3. Methodology

In 2015, Chen et al. proposed an improved ensemble learning algorithm called extreme gradient boosting (XGBoost) [31]. Compared with other gradient boosting algorithms, XGBoost offers significant improvements in accuracy and speed for regression and classification problems. The model integrates multiple regression trees within the boosting framework to make decisions. In this section, the theory of XGBoost is introduced.

##### 3.1. Regression Tree

A regression tree is a binary tree used to solve regression problems. Assume that the regression tree has $M$ leaf nodes, so that it divides the input space into $M$ units $R_1, R_2, \ldots, R_M$. The regression tree model can be expressed as

$$f(x) = \sum_{m=1}^{M} c_m I(x \in R_m), \tag{1}$$

where $x$ is the sample, $f(x)$ is the predicted value, $I(\cdot)$ is the indicator function, which returns 1 if $x$ is in the subset $R_m$ and 0 otherwise, and $c_m$ is the output value on the unit $R_m$. To find the best division of the input space, the squared error is used to represent the prediction error on the training data:

$$\sum_{x_i \in R_m} \left( y_i - f(x_i) \right)^2, \tag{2}$$

where $y_i$ is the $i$th observed value and $f(x_i)$ is the $i$th predicted value. To minimize the overall error of the regression tree, it is only necessary to set the predicted value in each unit to the mean output of the samples it contains:

$$\hat{c}_m = \operatorname{ave}\left( y_i \mid x_i \in R_m \right), \tag{3}$$

where $\operatorname{ave}(\cdot)$ returns the average value of its argument. The optimization procedure is as follows. First, in the input space of the training dataset, the $j$th feature $x^{(j)}$ and a corresponding value $s$ are selected for each division. The input space is divided into two subregions: samples whose value of feature $j$ is not greater than $s$ go to the left subtree, and samples whose value is greater than $s$ go to the right subtree:

$$R_1(j, s) = \{ x \mid x^{(j)} \le s \}, \qquad R_2(j, s) = \{ x \mid x^{(j)} > s \}. \tag{4}$$

Then, all input variables are traversed to find the optimal splitting feature $j$ and the optimal splitting point $s$, which form the pair $(j, s)$ that minimizes the loss over the two subregions:

$$\min_{j, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j, s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j, s)} (y_i - c_2)^2 \right]. \tag{5}$$

Finally, the output value of each subregion is determined by

$$\hat{c}_m = \frac{1}{N_m} \sum_{x_i \in R_m(j, s)} y_i, \quad m = 1, 2, \tag{6}$$

where $N_m$ is the number of samples in $R_m(j, s)$.

This division of the input space is repeated recursively on each subregion until no further division is possible.
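
The exhaustive search over $(j, s)$ can be sketched in a few lines of pure Python. This is a minimal illustration, assuming squared-error loss and using every distinct observed value of a feature as a candidate split point; the function names `squared_error` and `best_split` are hypothetical and not from the paper.

```python
from typing import List, Optional, Tuple


def squared_error(values: List[float]) -> float:
    """Squared error of `values` around their mean, i.e. the loss when c_m = ave(y)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X: List[List[float]], y: List[float]) -> Optional[Tuple[int, float, float]]:
    """Exhaustively search the pair (j, s) minimising the two-region squared error.

    Returns (feature index j, split value s, loss), or None if no valid split exists.
    Samples with x[j] <= s go to the left region R1, the rest to R2.
    """
    best: Optional[Tuple[int, float, float]] = None
    for j in range(len(X[0])):
        # Candidate split points: every distinct observed value of feature j.
        for s in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            if not left or not right:
                continue  # both subregions must be non-empty
            loss = squared_error(left) + squared_error(right)
            if best is None or loss < best[2]:
                best = (j, s, loss)
    return best


# Toy data: feature 0 separates low and high outputs perfectly at s = 2.
X = [[1.0, 7.0], [2.0, 3.0], [3.0, 5.0], [4.0, 1.0]]
y = [1.0, 1.0, 5.0, 5.0]
print(best_split(X, y))  # (0, 2.0, 0.0)
```

Because each leaf's optimal constant is the mean of its samples, the loss of a candidate split is just the sum of the two within-region variances times their sizes; applying `best_split` recursively to each region would grow the full tree.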

##### 3.2. The Integration Process of the XGBoost Model

Boosting is an additive model and one of the ensemble learning methods, which complete learning tasks by constructing and combining multiple learners. The individual learners in boosting depend strongly on one another and must be generated serially; by combining them, a set of weak learners is upgraded into a strong learner, which reduces the bias of the model. XGBoost is a tree-based ensemble learning method that grows the ensemble by continuously adding regression trees, each built by splitting on features. Each time a tree is added, the model learns a new function to fit the residual of the previous prediction; that is, the $t$th regression tree is trained on the basis of the model obtained in the previous $t-1$ iterations. After training is completed and $K$ trees are obtained, each sample falls into one leaf node of each tree according to its features, and its predicted value is obtained from the scores of these leaves. The model obtained after the $t$th iteration is

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i), \tag{7}$$

where $\hat{y}_i^{(t)}$ is the predicted value of the $i$th sample in the $t$th round, $\hat{y}_i^{(t-1)}$ is the score of the $i$th sample from the model retained from the previous round, and $f_t(x_i)$ is the score of the $i$th sample given by the newly added regression tree. The details of the model for passenger flow prediction are introduced below.

Assuming that the XGBoost model integrates $K$ trees, the predicted value of each sample is

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}, \tag{8}$$

where $\mathcal{F}$ is the set of all regression trees and $f_k$ is one of the regression trees in $\mathcal{F}$. The integration process of the XGBoost model is illustrated in Figure 1.
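
The additive update, in which each new tree fits what the previous trees left unexplained, can be illustrated with a toy booster. This sketch assumes squared loss (so the residual $y_i - \hat{y}_i^{(t-1)}$ is exactly what the next tree should fit), a single input feature, and depth-1 trees ("stumps"); it is not XGBoost's regularized second-order algorithm, and `fit_stump` and `boost` are hypothetical names. The `learning_rate` factor mirrors XGBoost's shrinkage parameter; with the default 1.0 the update matches the plain additive form.

```python
from typing import Callable, List


def fit_stump(x: List[float], r: List[float]) -> Callable[[float], float]:
    """Fit a depth-1 regression tree to residuals r on one feature x (squared error).

    Assumes x has at least two distinct values.
    """
    best = None
    for s in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        c_left, c_right = sum(left) / len(left), sum(right) / len(right)
        loss = sum((v - c_left) ** 2 for v in left) + sum((v - c_right) ** 2 for v in right)
        if best is None or loss < best[0]:
            best = (loss, s, c_left, c_right)
    _, s, c_left, c_right = best
    return lambda v: c_left if v <= s else c_right


def boost(x: List[float], y: List[float], n_trees: int, learning_rate: float = 1.0):
    """Additive model: y_hat^(t) = y_hat^(t-1) + learning_rate * f_t(x), starting from 0."""
    trees = []
    pred = [0.0] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # what the next tree must fit
        f_t = fit_stump(x, residuals)
        trees.append(f_t)
        pred = [pi + learning_rate * f_t(xi) for pi, xi in zip(pred, x)]

    def predict(v: float) -> float:
        # The final prediction is the sum of the scores of all K trees.
        return sum(learning_rate * f(v) for f in trees)

    return predict


x, y = [1.0, 2.0, 3.0], [1.0, 2.0, 6.0]
for k in (1, 2, 3):
    model = boost(x, y, n_trees=k)
    sse = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y))
    print(k, sse)  # training error shrinks as trees are added
```

On this toy data the training squared error falls from 0.5 with one tree to 0.125 with two and 0.03125 with three, showing how each added tree corrects the remaining residual.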

In the passenger flow prediction problem, the goal is to find a suitable model and continuously optimize the parameters to minimize the difference between prediction and observation. Therefore, this paper defines the objective function of the XGBoost model and minimizes the objective function to find the best parameters.

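The objective function of the XGBoost model in this paper is expressed as

$$\mathrm{Obj} = \sum_{i=1}^{n} l\left( y_i, \hat{y}_i \right) + \sum_{k=1}^{K} \Omega(f_k), \tag{9}$$

where $l(\cdot)$ is the loss function and $n$ is the number of samples.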

The objective function has two parts. The first part is the loss function, which measures how well the model fits the training set; the smaller the value of the loss function, the better the fit. The second part is the regularization term, which measures the complexity of the model; penalizing complexity prevents overfitting and thus improves generalization. The regularization term is expressed as

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2, \tag{10}$$

where $T$ is the number of leaf nodes of the tree, $w_j$ is the score of the $j$th leaf node, $\gamma T$ penalizes the number of leaf nodes, $\frac{1}{2} \lambda \sum_{j} w_j^2$ is the regularization term on the leaf scores $w$, $\gamma$ is the coefficient that controls the number of leaf nodes, and $\lambda$ is the coefficient of the L2 penalty on the leaf scores.

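The objective function of the XGBoost model after the $t$th iteration is

$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left( y_i, \hat{y}_i^{(t-1)} + f_t(x_i) \right) + \Omega(f_t) + C, \tag{11}$$

where $C = \sum_{k=1}^{t-1} \Omega(f_k)$ is the regularization penalty of the trees built in the previous $t-1$ iterations.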

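Applying a second-order Taylor expansion to the loss term of the objective function gives

$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left( y_i, \hat{y}_i^{(t-1)} \right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + C, \tag{12}$$

where $g_i$ and $h_i$ are the first and second partial derivatives of the loss function with respect to $\hat{y}_i^{(t-1)}$:

$$g_i = \frac{\partial\, l\left( y_i, \hat{y}_i^{(t-1)} \right)}{\partial \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 l\left( y_i, \hat{y}_i^{(t-1)} \right)}{\partial \left( \hat{y}_i^{(t-1)} \right)^2}. \tag{13}$$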

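Since the loss term $l(y_i, \hat{y}_i^{(t-1)})$ and the regularization term $C$ of the previous iterations are constants, they have no effect on the optimization of the objective function at the $t$th iteration and can be omitted. The simplified objective function of the $t$th iteration is

$$\tilde{\mathrm{Obj}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t). \tag{14}$$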
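
For a concrete loss, $g_i$ and $h_i$ have closed forms. The sketch below assumes the common squared-error convention $l(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$; the paper does not state its exact loss, and the helper names are hypothetical. Because the third derivative of this loss is zero, the second-order expansion plus the dropped constant reproduces the exact loss, which the example checks numerically.

```python
from typing import Sequence, Tuple


def squared_loss(y: float, y_hat: float) -> float:
    # Assumed loss: l(y, y_hat) = 1/2 * (y - y_hat)^2
    return 0.5 * (y - y_hat) ** 2


def grad_hess(y: float, y_prev: float) -> Tuple[float, float]:
    # g_i and h_i: first and second derivatives of squared_loss w.r.t. y_hat at y_prev
    return y_prev - y, 1.0


def simplified_objective(ys: Sequence[float], prev_preds: Sequence[float],
                         tree_scores: Sequence[float]) -> float:
    """Sum over samples of g_i * f_t(x_i) + 1/2 * h_i * f_t(x_i)^2 (Omega omitted)."""
    total = 0.0
    for y, y_prev, f in zip(ys, prev_preds, tree_scores):
        g, h = grad_hess(y, y_prev)
        total += g * f + 0.5 * h * f ** 2
    return total


ys, prev, f = [3.0, 1.0], [1.0, 2.0], [0.5, -0.5]
exact = sum(squared_loss(y, p + s) for y, p, s in zip(ys, prev, f))
constant = sum(squared_loss(y, p) for y, p in zip(ys, prev))
print(exact, constant + simplified_objective(ys, prev, f))  # both 1.25
```

For losses such as the logistic loss, the same code structure applies with different `grad_hess` formulas, but the expansion is then only an approximation.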

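In a regression tree, each sample eventually falls into exactly one leaf node, so the samples can be grouped by the leaf they reach. The mapping function of the regression tree can be written as

$$f_t(x) = w_{q(x)}, \quad q: \mathbb{R}^d \rightarrow \{1, 2, \ldots, T\}, \tag{15}$$

where $q(x)$ maps a sample with $d$ features to the index of the leaf it falls into, $w_{q(x)}$ is the score of that leaf, and $I_j = \{ i \mid q(x_i) = j \}$ denotes the set of samples in the $j$th leaf node.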

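Grouping the samples by leaf, the objective function can be expressed as follows:

$$\tilde{\mathrm{Obj}}^{(t)} = \sum_{j=1}^{T} \left[ \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^2 \right] + \gamma T = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} \left( H_j + \lambda \right) w_j^2 \right] + \gamma T. \tag{16}$$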

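The quantities $G_j$ and $H_j$ in the above formula are defined as

$$G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i. \tag{17}$$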

Through the above formula, minimizing the objective function reduces to minimizing, for each leaf, a univariate quadratic function of the leaf score $w_j$. Setting its derivative $G_j + (H_j + \lambda) w_j$ to zero gives the optimal leaf score

$$w_j^{*} = -\frac{G_j}{H_j + \lambda}. \tag{18}$$

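Then, the objective function corresponding to these optimal scores is

$$\mathrm{Obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T. \tag{19}$$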

Equation (19) gives the score of the tree structure $q$: $\mathrm{Obj}^{*}$ acts as a function for scoring the tree structure, and the lower the score, the better the tree structure.
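
The leaf aggregation, optimal weights, and structure score can be computed directly from per-sample $g_i$ and $h_i$. A minimal sketch, assuming the squared-error gradients $g_i = \hat{y}_i^{(t-1)} - y_i$, $h_i = 1$ and a fixed, hypothetical assignment of samples to leaves (`leaf_of`); the function names are illustrative only.

```python
from typing import Dict, List, Tuple


def leaf_sums(g: List[float], h: List[float], leaf_of: List[int]) -> Dict[int, Tuple[float, float]]:
    """Aggregate G_j = sum of g_i and H_j = sum of h_i over the samples in each leaf j."""
    sums: Dict[int, Tuple[float, float]] = {}
    for gi, hi, j in zip(g, h, leaf_of):
        G, H = sums.get(j, (0.0, 0.0))
        sums[j] = (G + gi, H + hi)
    return sums


def optimal_weights(sums: Dict[int, Tuple[float, float]], lam: float) -> Dict[int, float]:
    # w_j* = -G_j / (H_j + lambda)
    return {j: -G / (H + lam) for j, (G, H) in sums.items()}


def structure_score(sums: Dict[int, Tuple[float, float]], lam: float, gamma: float) -> float:
    # Obj* = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T
    T = len(sums)
    return -0.5 * sum(G * G / (H + lam) for G, H in sums.values()) + gamma * T


# Squared loss with previous prediction 0: g_i = 0 - y_i, h_i = 1.
y = [1.0, 2.0, 6.0]
g = [0.0 - yi for yi in y]
h = [1.0] * len(y)
sums = leaf_sums(g, h, leaf_of=[0, 0, 1])  # samples 0 and 1 share leaf 0
print(optimal_weights(sums, lam=1.0))            # {0: 1.0, 1: 3.0}
print(structure_score(sums, lam=1.0, gamma=0.0))  # -10.5
```

With `lam=0.0` the optimal weights become the mean residuals in each leaf (1.5 and 6.0), i.e. the plain regression-tree outputs; a positive $\lambda$ shrinks the leaf scores toward zero.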

When the XGBoost model traverses all candidate split points of the features to split the tree nodes, it uses the above objective function as the evaluation function. For each candidate split, it computes how much the objective function decreases when a leaf is split into left and right subtrees, and it keeps the split point with the largest decrease; this decrease is called the gain. If no split point yields a positive gain, that is, if the improvement in the structure score does not exceed the threshold $\gamma$, no split is performed. In the $t$th iteration, the gain of splitting a leaf node is defined as

$$\mathrm{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma, \tag{20}$$

where the first term in brackets is the score of the left subtree after the split, the second term is the score of the right subtree after the split, and the third term is the score before the split; $G_L, H_L$ and $G_R, H_R$ are the sums of $g_i$ and $h_i$ over the samples in the left and right subtrees, respectively. $\gamma$ represents the complexity cost of adding a leaf node. The larger the value of $\mathrm{Gain}$, the lower the value of the objective function after splitting, and the better the tree structure.
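
The gain formula leads to a simple split finder: sort the samples by one feature, accumulate $G_L$ and $H_L$ from the left, and evaluate the gain at every boundary between distinct values. This sketch is a single-feature version of the exact greedy scan, assuming the same toy gradients as in the squared-loss example ($g = [-1, -2, -6]$, $h = 1$); production XGBoost also offers approximate and histogram-based split finding, which are not shown. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def split_gain(GL: float, HL: float, GR: float, HR: float, lam: float, gamma: float) -> float:
    """Gain of splitting a leaf into left/right children, following the Gain formula."""
    def score(G: float, H: float) -> float:
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def best_split_by_gain(x: List[float], g: List[float], h: List[float],
                       lam: float = 1.0, gamma: float = 0.0) -> Optional[Tuple[float, float]]:
    """Scan one feature in sorted order; return (threshold, gain) of the best positive-gain split."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    GL = HL = 0.0
    best = None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        GL += g[i]
        HL += h[i]
        if x[i] == x[nxt]:
            continue  # equal feature values cannot be separated
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > 0 and (best is None or gain > best[1]):
            best = (x[i], gain)  # left child: x <= threshold
    return best


g = [-1.0, -2.0, -6.0]
h = [1.0, 1.0, 1.0]
print(best_split_by_gain([1.0, 2.0, 3.0], g, h))             # threshold 1.0, gain ≈ 0.79
print(best_split_by_gain([1.0, 2.0, 3.0], g, h, gamma=1.0))  # None: gain never exceeds gamma
```

The second call shows the role of $\gamma$ as a pruning threshold: both candidate splits improve the structure score by less than 1, so with `gamma=1.0` the node is left unsplit.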

#### 4. Data and Evaluation Criteria

##### 4.1. The Format of Raw Data

In this study, the bus card data and vehicle operation data of 30 bus stations in Guangzhou from April 24 to May 20, 2018, were selected to evaluate the performance of the model. The formats of bus card data and vehicle operation data are listed in Tables 1 and 2, respectively. From the raw data, the passenger flow cannot be obtained directly. Therefore, we need to match the bus card data with the vehicle operation data. In this paper, bus card data and vehicle operation data are matched using vehicle ID number. The format of the matched data is shown in Table 3.

##### 4.2. The Preprocess of Data

From the raw dataset, we cannot directly obtain the number of passengers at a specific station during a prediction interval (30 min in this study), because the bus card data do not reflect the actual arrival times of passengers. To approximate the real passenger flow as closely as possible, passenger arrivals are assumed to follow a uniform distribution, as in previous studies. Under this assumption, the passengers boarding a bus at a station are distributed evenly over the period from the departure of the previous bus to the arrival of the current bus. The passenger flow is calculated as follows:

*Input*: the specific route ID, the site ID, and the matched data in Table 3

*Output*: the short-term passenger flow of the site

*Process*: from the data of the selected station on the selected bus route, find the records of the different bus shifts, obtain the time interval between consecutive shifts and the number of boarding passengers, distribute the passengers evenly over that interval, and then accumulate the results to obtain the short-term passenger flow. For convenience of analysis, the prediction interval of short-term passenger flow is set to 30 minutes. The calculation process is shown in Figure 2. Following this process, a total of 756 short-term passenger flow values are calculated for each route at a single station, at 30-minute intervals from 7 am to 9 pm (28 intervals per day over the 27 days from April 24 to May 20).
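
The uniform-allocation step can be sketched as follows. This is a minimal illustration, assuming that the matched data have already been reduced to one tuple per bus shift of the form (previous bus departure, current bus arrival, boardings), with times in minutes since midnight; `allocate_uniform` is a hypothetical name, the handling of zero-length windows is our assumption, and the caller must choose a window start for the first bus of the day (for example, the start of service).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def allocate_uniform(shifts: List[Tuple[float, float, float]], bin_minutes: int = 30) -> Dict[int, float]:
    """Spread each bus's boardings uniformly over its waiting window and sum per time bin.

    shifts: (window_start, window_end, boardings), times in minutes since midnight,
            where window_start is the previous bus's departure and window_end the
            current bus's arrival at the station.
    Returns {bin_start_minute: passenger_flow}.
    """
    flow: Dict[int, float] = defaultdict(float)
    for start, end, boardings in shifts:
        if end <= start:
            # Degenerate window (assumption): credit everything to the arrival bin.
            flow[int(end // bin_minutes) * bin_minutes] += boardings
            continue
        b = int(start // bin_minutes) * bin_minutes
        while b < end:
            # Share of the window that overlaps bin [b, b + bin_minutes).
            overlap = min(end, b + bin_minutes) - max(start, b)
            if overlap > 0:
                flow[b] += boardings * overlap / (end - start)
            b += bin_minutes
    return dict(flow)


# Hypothetical shifts: bus A waits 7:20-7:50 (12 boardings), bus B 7:50-8:10 (10 boardings).
shifts = [(440, 470, 12), (470, 490, 10)]
print(allocate_uniform(shifts))  # {420: 4.0, 450: 13.0, 480: 5.0}
```

Because boardings are split in proportion to the overlap with each bin, the total flow is conserved: the 22 boardings above end up as 4 + 13 + 5 passengers across the 7:00, 7:30, and 8:00 bins.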

##### 4.3. Evaluation Criteria

Mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE) have been commonly used in previous studies as criteria to measure the quality of prediction models. The calculation time (TIME) is used to measure the computational efficiency of prediction models.

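The formulas of MAPE, RMSE, MAE, and TIME are as follows:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|, \tag{21}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}, \tag{22}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \tag{23}$$

$$\mathrm{TIME} = t_{\mathrm{end}} - t_{\mathrm{start}}, \tag{24}$$

where $n$ is the number of samples, $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $t_{\mathrm{start}}$ and $t_{\mathrm{end}}$ are the times at which the model calculation starts and ends, respectively. The range of MAPE is $[0, +\infty)$; a MAPE of 0% means there is no error. RMSE reflects the magnitude of the deviation between the predicted and true values. TIME is measured in seconds.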
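
The four criteria translate directly into code. A minimal sketch using only the standard library; the function names are hypothetical, and raising an error when a true value is zero is our assumption (MAPE is undefined there, and off-peak half-hours may have zero flow; the paper does not say how such cases were handled). TIME is measured with `time.perf_counter`, a monotonic high-resolution clock.

```python
import math
import time
from typing import Any, Callable, Sequence, Tuple


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; undefined when a true value is 0."""
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined for zero observations")
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Return (result, TIME in seconds) = (fn(*args), t_end - t_start)."""
    t_start = time.perf_counter()
    result = fn(*args)
    t_end = time.perf_counter()
    return result, t_end - t_start


y_true = [10.0, 20.0, 40.0]
y_pred = [12.0, 18.0, 40.0]
print(mape(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

In practice `timed` would wrap the model's fit or predict call, so that TIME covers exactly the computation being compared across models.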

#### 5. Results

##### 5.1. Optimization of the Prediction Model

###### 5.1.1. Input Variable Selection

*Period*. The distributions of short-term bus passenger flow are shown in Figure 3. The passenger flow of most routes is higher in peak hours than in nonpeak hours. Bus passenger flow changes dynamically over a day and follows a similar daily pattern across workdays; however, the pattern on weekends differs from that on workdays, as previous studies have also shown. Therefore, whether a day is a weekend or a workday is also taken into consideration in the prediction model.

*Passenger flow of the previous period*. The passenger flow of the adjacent previous period is the closest indicator of the current state of passenger flow, so short-term bus passenger flow is related to the passenger flow of the adjacent previous period.

*The number of buses arriving on the predicted route during the prediction interval*. The calculated short-term passenger flow of conventional buses fluctuates. If, in a prediction interval, more buses of the predicted route arrive than usual, and more than on the other considered routes, passengers are more likely to take this route, so its passenger flow increases; otherwise, it decreases.

*The number of buses arriving on the other routes during the prediction interval*. In this article, routes that share successive stations with the predicted route are referred to as the other considered routes. In practice, passengers often take whichever of these routes arrives at the station first. The more buses of the other considered routes arrive, the more passenger flow they divert, and the smaller the passenger flow of the selected route.

Finally, the period, workday or weekend, the passenger flow of the previous period, the number of buses arriving on the predicted route during the prediction interval, and the number of buses arriving on the other routes during the prediction interval are selected as the input features of the XGBoost-based short-term passenger flow prediction model for conventional buses.
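
The five input features can be assembled into one row per interval as sketched below. This is an illustration under stated assumptions: the inputs are hypothetical aligned lists of interval start times, flows, and bus counts; "period" is taken as the index of the 30-minute interval counted from 7:00; and "workday" uses the calendar weekday only, so public holidays and adjusted working days (common in China) are ignored. The bus counts for the prediction interval are simply taken as given.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def build_features(
    times: Sequence[datetime],
    flows: Sequence[float],
    same_route_buses: Sequence[int],
    other_route_buses: Sequence[int],
    first_hour: int = 7,
    interval_minutes: int = 30,
) -> Tuple[List[List[float]], List[float]]:
    """Return (rows, targets); each row is
    [period, is_workday, previous_flow, same_route_buses, other_route_buses]."""
    rows, targets = [], []
    step = timedelta(minutes=interval_minutes)
    for k in range(1, len(times)):
        t = times[k]
        if times[k - 1] != t - step:
            continue  # no adjacent previous interval (e.g. first interval of a day)
        period = ((t.hour - first_hour) * 60 + t.minute) // interval_minutes
        is_workday = 1 if t.weekday() < 5 else 0  # Mon-Fri; holidays ignored here
        rows.append([period, is_workday, flows[k - 1], same_route_buses[k], other_route_buses[k]])
        targets.append(flows[k])
    return rows, targets


# Hypothetical records: Tuesday 2018-04-24 and Saturday 2018-05-05.
times = [datetime(2018, 4, 24, 7, 0), datetime(2018, 4, 24, 7, 30), datetime(2018, 4, 24, 8, 0),
         datetime(2018, 5, 5, 7, 0), datetime(2018, 5, 5, 7, 30)]
rows, targets = build_features(times, [5, 9, 14, 3, 4], [2, 3, 3, 1, 2], [4, 5, 6, 2, 2])
print(rows)
print(targets)
```

The first interval of each day is dropped because it has no adjacent previous period; an alternative would be to carry over the last value of the previous day, but that would mix overnight gaps into the lag feature.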

###### 5.1.2. Model Parameter Settings

The parameters of the XGBoost model include general parameters, booster parameters, and learning task parameters. The general parameters control the overall behavior, the booster parameters control the regression tree built at each step, and the learning task parameters specify the optimization objective. The general and learning task parameters of the XGBoost model, namely *booster*, *objective*, and *eval_metric*, are shown in Table 4. Predicting passenger flow is a regression problem, and the base learner of the XGBoost model used in this article is a regression tree. RMSE is used to evaluate the accuracy of the model.

The booster parameters are selected by grid search. Take the tuning of *n_estimators* as an example. The candidate values of *n_estimators* are 200, 300, 400, 500, and 600. First, we vary *n_estimators* while keeping the other parameters unchanged and measure the performance of each model with cross-validated RMSE. Then, the best value is selected; according to RMSE, the optimal *n_estimators* is 300. Following the same steps, the remaining parameters of the forecast model for this route are selected in turn: *min_child_weight*, *max_depth*, *gamma*, *subsample*, *colsample_bytree*, *reg_alpha*, *reg_lambda*, and *learning_rate*. The optimal values are shown in Table 5.
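
The one-parameter-at-a-time search can be expressed independently of any library. In this sketch, `cv_rmse` is a hypothetical callback that returns the cross-validated RMSE for a parameter dictionary; in practice it might wrap `xgboost.XGBRegressor` with `sklearn.model_selection.cross_val_score(..., scoring="neg_root_mean_squared_error")`, but a toy stand-in is used here so the sketch runs on its own. The fixed settings in `base` are assumptions, since Table 4's exact values are not reproduced; note that `"reg:squarederror"` was named `"reg:linear"` in older XGBoost releases.

```python
from typing import Any, Callable, Dict, Sequence


def sequential_grid_search(
    base_params: Dict[str, Any],
    grid: Dict[str, Sequence[Any]],
    cv_rmse: Callable[[Dict[str, Any]], float],
) -> Dict[str, Any]:
    """Tune one parameter at a time, in the order of `grid`, keeping the others fixed.

    cv_rmse(params) must return the cross-validated RMSE of a model built with params.
    """
    params = dict(base_params)
    for name, candidates in grid.items():  # dicts preserve insertion order
        scores = {value: cv_rmse({**params, name: value}) for value in candidates}
        params[name] = min(scores, key=scores.get)  # keep the value with the lowest RMSE
    return params


# Fixed general / learning-task settings (assumed) plus starting booster values.
base = {"booster": "gbtree", "objective": "reg:squarederror", "eval_metric": "rmse",
        "n_estimators": 100, "max_depth": 6}
grid = {"n_estimators": [200, 300, 400, 500, 600], "max_depth": [3, 5, 7]}


def toy_cv_rmse(p: Dict[str, Any]) -> float:
    # Stand-in for real cross-validation so the sketch runs without xgboost installed.
    return (p["n_estimators"] - 300) ** 2 / 1e4 + (p["max_depth"] - 5) ** 2


print(sequential_grid_search(base, grid, toy_cv_rmse))
```

Tuning parameters sequentially is much cheaper than a full Cartesian grid (8 evaluations here instead of 15), at the cost of possibly missing interactions between parameters; the order of keys in `grid` would follow the tuning order listed above.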

Once all the parameters are tuned, the construction of the XGBoost-based short-term passenger flow prediction model for conventional public transportation is complete.

##### 5.2. Comparison with Traditional Model

Many models currently exist for predicting the short-term passenger flow of conventional public transportation. Among them, the models that can perform multivariable short-term passenger flow prediction mainly include KNN, the BP neural network, and LSTM. To demonstrate the prediction performance of the proposed model, KNN, the BP neural network, and LSTM are established as benchmark models. In addition, an XGBoost prediction model without the number of arriving buses is implemented.

###### 5.2.1. KNN Regression Model

*sklearn.neighbors.KNeighborsRegressor* in Python is used to build a KNN regression model for short-term passenger flow prediction of conventional public transportation. The key parameters of the KNN regression model are *n_neighbors* and *weights*. Taking bus route No. 125 at Zhongshan 8th Road Station Substation 1 as an example, training yields *n_neighbors* = 5 and *weights* = *uniform*.
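
What `KNeighborsRegressor(n_neighbors=5, weights="uniform")` computes can be written out in plain Python: the prediction is the unweighted mean target of the nearest training samples. This sketch assumes Euclidean distance (scikit-learn's default Minkowski metric with `p=2`); `knn_predict` is a hypothetical name, and ties in distance are broken by target value here, which may differ from scikit-learn's tie-breaking.

```python
import math
from typing import Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[float],
                x: Sequence[float], n_neighbors: int = 5) -> float:
    """Uniform-weight KNN regression: average target of the n_neighbors closest samples."""
    # Pair each training target with its Euclidean distance to x, nearest first.
    ranked = sorted(zip((math.dist(row, x) for row in train_X), train_y))
    nearest = ranked[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)


train_X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
train_y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(knn_predict(train_X, train_y, [1.0], n_neighbors=3))  # 2.0
print(knn_predict(train_X, train_y, [1.0], n_neighbors=5))  # 5.4
```

The jump from 2.0 to 5.4 when `n_neighbors` grows from 3 to 5 shows why this parameter must be tuned: a larger neighborhood pulls in distant, dissimilar intervals. Because KNN is distance-based, features on different scales (for example, period index versus passenger counts) would normally be standardized first.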

###### 5.2.2. BP Neural Network Model

The *keras* deep learning framework is used to build a BP neural network model for short-term passenger flow prediction of conventional public transportation. Taking bus route No. 125 at the substation of Zhongshan 8th Road Station as an example, after repeated experiments, the BP neural network is given 70 input-layer nodes, 30 hidden-layer nodes, and 1 output-layer node; that is, the model structure is 70-30-1. The activation function of both the hidden layer and the output layer is the *relu* function. In addition, the number of training iterations of the BP neural network is 200, and the learning rate is 0.01.

###### 5.2.3. LSTM Model

The *keras* deep learning framework is used to build an LSTM model for short-term passenger flow prediction of conventional public transportation. Taking bus route No. 125 at the substation of Zhongshan 8th Road Station as an example, after repeated trials, the selected LSTM model has 20 input-layer nodes, 15 hidden-layer nodes, and 1 output-layer node; the time step is 4; the activation function of the hidden and output layers is the *relu* function; and the optimizer is *adam*. In addition, the number of training iterations of the LSTM is 50.

###### 5.2.4. XGBoost Model without the Number of Buses Arriving

Taking bus route No. 125 at the substation of Zhongshan 8th Road Station as an example, the optimal values of its parameters are shown in Table 6.

The MAPE, RMSE, and MAE of the above prediction models for each bus route are shown in Figure 4. Regardless of whether MAE, MAPE, or RMSE is considered, the proposed model has the lowest value at most stations. For a quantitative analysis of the prediction results, Table 7 compares the average values of the evaluation indicators of the five prediction models.


As Table 7 shows, the proposed model is more accurate than the other models, and using the number of arriving buses as an input feature improves the prediction performance of the XGBoost model. In particular, the proposed model is more accurate than the KNN regression model and the BP neural network model, and significantly more accurate than the XGBoost model without the number of arriving buses.

The calculation time of the XGBoost model is slightly shorter than that of the KNN regression model and much shorter than those of the BP neural network model and the LSTM model. Hence, the XGBoost model is faster than the KNN regression, BP neural network, and LSTM models.

##### 5.3. Robustness of the Model

To evaluate robustness, we compare the performance of the proposed model across time periods and across types of passenger flow distribution.

###### 5.3.1. Analysis by Time Period

First, the accuracy of the proposed model is compared between peak hours and nonpeak hours. The results are shown in Table 8. The MAPE of the proposed model during peak hours is significantly lower than during nonpeak periods. This may be because the passenger flow distribution during peak hours is relatively stable, with less volatility and less influence from other factors. At the same time, the absolute errors (RMSE and MAE) in nonpeak periods are relatively low, because nonpeak passenger flow is much smaller than peak passenger flow. Overall, the proposed model predicts well during peak hours.

###### 5.3.2. Analysis by the Type of Passenger Flow Distribution

Second, the accuracy of the proposed model is compared across different passenger flow distributions, which are divided into unimodal, bimodal, and other types. The comparison results are shown in Table 9.

Among the three types of passenger flow distribution, the proposed model achieves better prediction accuracy (MAPE) for the other type than for the unimodal and bimodal types. This may be because the other type includes multimodal distributions, which have more peak hours, and passenger flow during peak hours is relatively stable, with less volatility and less influence from other factors. In general, prediction accuracy is higher for the other type. The absolute errors (RMSE and MAE) of the other type are also lower, which may be because the average short-term passenger flow of the other type is higher than that of the unimodal and bimodal types.

##### 5.4. Influence Degree of Variables

A separate XGBoost prediction model is built for each station, and we analyze the weight of each variable in the passenger flow prediction. The contributions of the variables are similar across routes, so we average them over the routes for further analysis. The result is shown in Figure 5. Period contributes the most among all variables, followed by the date type, which indicates a strong temporal correlation in passenger flow data. Interestingly, buses on other routes contribute more than buses on the same route, which indicates fierce competition between routes. Moreover, different other routes differ in their importance for the prediction. Mining the relationships between routes is worth further study in future work.
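
Averaging importances over routes, as done for Figure 5, can be sketched as below. The per-route scores might come from a fitted model's `feature_importances_` attribute or from `Booster.get_score(importance_type="weight")`, which counts how often a feature is used to split; the paper does not state which importance type it used, so that choice is an assumption. Each route's scores are normalized to sum to 1 before averaging so that routes with more trees or splits do not dominate, and a feature absent from a route's dictionary counts as 0. The feature names and numbers below are hypothetical, not the paper's results.

```python
from typing import Dict, List, Tuple


def average_importance(per_route: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Normalise each route's importance scores to sum to 1, average over routes,
    and rank features from most to least important.

    A feature missing from a route's dict (e.g. never used for a split) counts as 0.
    """
    totals: Dict[str, float] = {}
    for scores in per_route:
        route_total = sum(scores.values())
        for name, value in scores.items():
            totals[name] = totals.get(name, 0.0) + value / route_total
    n = len(per_route)
    averaged = [(name, total / n) for name, total in totals.items()]
    return sorted(averaged, key=lambda kv: kv[1], reverse=True)


# Hypothetical split counts for two routes (not the paper's results).
routes = [
    {"period": 6, "date_type": 2, "prev_flow": 2},
    {"period": 4, "date_type": 4, "prev_flow": 2},
]
for name, share in average_importance(routes):
    print(f"{name}: {share:.2f}")
```

Here `period` ranks first with an average share of 0.50, followed by `date_type` (0.30) and `prev_flow` (0.20).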

#### 6. Conclusion

In this paper, we propose an ensemble tree method, XGBoost, to predict the passenger flow of bus routes. Since the passenger flow at a station is strongly influenced by the competition and complementation of other routes and of buses on the same route, we incorporate the number of buses arriving on the same route and on other routes during the prediction interval into the model to improve accuracy. Compared with the model that does not consider these two factors, MAPE, RMSE, and MAE are improved by 30.21%, 14.71%, and 15.58%, respectively. Moreover, the proposed model was compared with several benchmark models on the same Guangzhou data, and the results show that it achieves superior prediction performance. Notably, XGBoost consumes fewer computing resources than the deep learning model LSTM while achieving higher accuracy.

The proposed model also offers stronger reliability and interpretability than the benchmark models. We evaluated the model under different passenger flow types and periods, and it yields stable results. Moreover, analysis of the weights of the variables contributing to passenger flow prediction shows that temporal information is critical to the prediction model and that considering the competition between routes can improve accuracy.

In future work, we will incorporate more variables, such as spatial and daily variables, into the model to further improve accuracy. We will also extract passenger flow at shorter intervals to build short-term prediction models.

#### Data Availability

The data used to support the findings of this study have not been made available because of privacy concerns.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This paper is supported by the Shenzhen Science and Technology Innovation Committee (Grant No. JCYJ20170818142947240), the Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (Grant No. 2019KQNCX126), and the Science and Technology Planning Project of Guangdong Province (Grant No. 2018B020207005).