Special Issue: Operations Research for Transportation and Sustainable Development
ST-LSTM: A Deep Learning Approach Combined Spatio-Temporal Features for Short-Term Forecast in Rail Transit
Short-term forecasting of rail transit passenger flow is one of the most essential issues in urban intelligent transportation systems (ITS). Accurate forecasts support early warning of sudden surges in passenger flow and enable passengers to plan their trips appropriately, so a more accurate forecast model is of practical significance. The long short-term memory (LSTM) network has proved effective on data with temporal features, but on its own it cannot capture the correlation between time and space in rail transit. This paper therefore proposes a novel forecast model based on the LSTM network that combines spatio-temporal features (ST-LSTM). Unlike other forecast methods, the ST-LSTM network uses a new method to extract spatio-temporal features from the data and combines them as the input. In experiments, the ST-LSTM network performs better than the conventional models it is compared with.
1. Introduction
As cities grow, short-term traffic forecasting has become a core issue of ITS. Accurate short-term traffic forecasts provide technical support for the surveillance and early warning of passenger flow. Over the past few decades, many data analysis models have therefore been proposed to improve forecast accuracy. Among these models, the LSTM network is widely recognized as the most suitable for traffic forecasting. An LSTM unit has three gates, namely, the input gate, the forget gate, and the output gate, which adjust the state of the unit dynamically, so the LSTM network is able to capture features over longer time spans. Because traffic data are usually collected as time series, the LSTM network can provide higher accuracy in traffic forecasting.
In recent years, researchers have paid more attention to the spatial features of traffic flow. It is widely acknowledged that traffic forecasting is a problem with spatio-temporal complexity, i.e., a problem of spatial transportation along the temporal dimension. Zhao et al. [1] established a network by connecting several LSTM units to imitate the structure of urban traffic; however, it failed to imitate the structure of traffic at large urban scale. Chen et al. [2] proposed a method that processes spatial features using a sparse hybrid genetic algorithm. Liu et al. [3] proposed a model based on manifold similarity to capture spatial regularity from freeway data. These two approaches are sensitive to spatial features, but compared with the LSTM network they do not process temporal features well.
This paper studies the short-term forecast of rail transit. Unlike other modes of transportation, rail transit has stations at fixed positions, vehicles running at uniform speed, and a regular schedule. Because of these characteristics, the spatial correlation between stations can be transformed into a time cost. Based on this analysis, this paper proposes a new method to capture spatio-temporal features from rail transit data and feeds these features into a new model, the spatio-temporal long short-term memory network (ST-LSTM), which is based on the LSTM network. Compared with most existing methods, the proposed model achieves better accuracy and meets the real-time requirement.
2. Related Work
Many methods have been proposed to improve traffic forecasting, including historical average and smoothing [4, 5], dynamic linear methods [6, 7], traffic theory-based methods [8, 9], and machine learning methods [10, 11]. These approaches fall into two categories: parametric and nonparametric. The autoregressive integrated moving average (ARIMA) model is widely recognized as a classic parametric method. As early as the 1970s, Levin and Tsao found that the ARIMA model was the most statistically significant in traffic forecasting [12]. Parametric approaches have favorable properties and capture regular variations very well. However, traffic data usually show irregular variations. To solve this problem, researchers also turned to nonparametric approaches, such as nonparametric regression models [13], support vector machines (SVM) [14, 15], and recurrent neural networks [16, 17]. Afterwards, recurrent neural networks (RNNs) [18] were proposed to process temporal features, such as the evolutionary neural network (ENN) [19], the dynamic neural network (DNN) [20], and nonlinear autoregressive models with exogenous inputs (NARX) [21]. Among them, the RNN is widely recognized as a suitable method to capture the temporal features of passenger flow. However, previous studies showed that RNNs fail to capture long-term features because of vanishing and exploding gradients. To solve these problems, the long short-term memory neural network (LSTM NN) [22] was applied to traffic forecasting. In recent years, some approaches have been proposed to deal with the spatio-temporal complexity of traffic data; these are discussed in Section 1.
Unlike these methods, this paper proposes a new method to capture spatio-temporal features and a new network based on LSTM to forecast the exit passenger flow of rail transit. The remainder of this paper is organized as follows. Section 3 introduces the architecture of the ST-LSTM network. Section 4 presents experiments based on the data of Chongqing rail transit. Section 5 analyzes the experimental results, and future work is discussed at the end of the paper.
Short-term forecasting for rail transit is a problem with spatio-temporal complexity. Suppose the exit passenger flow of a given station is to be predicted. The temporal features describe the correlation between historical and current data, i.e., the previous exit passenger flow of that station. These features can be extracted directly because rail transit data are collected along the temporal dimension. The spatial features describe the movement of passenger flow across geographic positions, i.e., the sum of the estimated passenger flows arriving from the other stations. For every pair of stations, the spatial features include the volume and the cost of transportation between them. The volume of transportation is reflected by the passenger flow between the two stations; in the proposed model, the spatial correlation matrix (SCM) is used to calculate it. The cost of transportation is reflected by several factors, such as time cost, economic cost, and distance. Among them, time cost is the most suitable factor to reflect the spatial correlation, for the reasons given in Section 1, so the time cost matrix (TCM) is introduced to calculate the cost of transportation. The proposed model builds on three components: the passenger information system (PIS), the feature extraction method, and the ST-LSTM network. Each is explained in this section.
3.1. Passenger Information System
Sufficient data are the basis of an accurate forecast, and the PIS provides comprehensive data. The PIS is a large and complex network: various rail transit data are collected in real time through the gate system, the ticketing system, and the vehicle scheduling system. With the development of data acquisition technology, the PIS is able to provide sufficient support for short-term forecasting. Based on the card records, the entrance and exit passenger flows of each station are calculated at 10-minute intervals. Each card record in the database has six attributes: card identification, origin time, origin station, destination time, destination station, and date. The entrance passenger flow of a station in a time interval is the number of records whose origin station is that station and whose origin time falls in the interval; the exit passenger flow is obtained in the same way from the destination station and the destination time.
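The counting of entrance and exit flows from card records can be sketched as follows. This is a minimal illustration, not the PIS implementation: the dictionary keys (`origin_time`, `origin_station`, and so on), the helper names `slot_of` and `count_flows`, and the sample records are all hypothetical.

```python
from collections import Counter
from datetime import datetime

SLOT_MINUTES = 10  # flows are aggregated every 10 minutes, as in the text


def slot_of(ts: datetime) -> int:
    """Index of the 10-minute slot within the day that contains ts."""
    return (ts.hour * 60 + ts.minute) // SLOT_MINUTES


def count_flows(records):
    """Count entrance and exit flows per (station, date, slot).

    Each record is a dict holding the card-record attributes; the key
    names are assumptions made for this sketch.
    """
    entrance, exit_ = Counter(), Counter()
    for r in records:
        t_in, t_out = r["origin_time"], r["destination_time"]
        entrance[(r["origin_station"], t_in.date(), slot_of(t_in))] += 1
        exit_[(r["destination_station"], t_out.date(), slot_of(t_out))] += 1
    return entrance, exit_


# Two made-up trips from station 321 to station 334.
records = [
    {"card_id": "A", "origin_time": datetime(2017, 3, 1, 8, 3),
     "origin_station": 321,
     "destination_time": datetime(2017, 3, 1, 8, 27),
     "destination_station": 334},
    {"card_id": "B", "origin_time": datetime(2017, 3, 1, 8, 9),
     "origin_station": 321,
     "destination_time": datetime(2017, 3, 1, 8, 31),
     "destination_station": 334},
]
entrance, exit_ = count_flows(records)
```

Both trips enter station 321 in the same slot, so the entrance counter holds a single key with value 2, while the two exits fall in adjacent slots.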
3.2. Feature Extraction Method
The extraction of spatio-temporal features is one of the core problems of the proposed model. The model extracts temporal features and spatial features separately and then feeds them together into the ST-LSTM network. The temporal features can be extracted directly, because rail transit data are recorded along the temporal dimension. To extract the spatial features, the TCM and the SCM are integrated into the method.
3.2.1. Time Cost Matrix
Time cost is the most suitable factor to reflect the spatial correlation between stations, so the time costs between all pairs of stations constitute the TCM. Because schedules and passenger flows change, the TCM varies over time, and there is one TCM per time interval. The TCM is a square matrix with one row and one column per station in the rail transit system. Each entry is the average time cost from an origin station to a destination station over the historical time series, computed from the time costs of the individual card records (Section 3.1) between the two stations. The historical time series of a time interval spans a given number of weeks, with a step of one week.
The analysis shows that passenger flow follows people's weekly routine, i.e., from Monday to Sunday. For example, the passenger flow on a Thursday resembles that of the previous Thursday rather than that of the previous day. Therefore, to improve the extraction, each TCM entry for a time interval is the average time cost over the historical time series of that interval, which consists of the interval itself and the same period in several preceding weeks. The same method is used in the calculation of the spatial correlation matrix.
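A minimal sketch of the weekly averaging behind the TCM is given below. It assumes that the time cost of a trip is its destination time minus its origin time and that trips are selected by the slot of their origin time; neither choice is confirmed by the text. The function names and the sample trips are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SLOT_MINUTES = 10


def slot_start(ts):
    """Round a timestamp down to the start of its 10-minute slot."""
    return ts.replace(minute=ts.minute - ts.minute % SLOT_MINUTES,
                      second=0, microsecond=0)


def historical_slots(t, weeks):
    """Slot t plus the same slot in each of the `weeks` preceding weeks."""
    return {t - timedelta(weeks=k) for k in range(weeks + 1)}


def time_cost_matrix(trips, t, weeks):
    """Average time cost in minutes per (origin, destination) pair.

    `trips` holds (origin, destination, origin_time, destination_time)
    tuples. Assumption: time cost = destination time - origin time.
    """
    slots = historical_slots(t, weeks)
    total = defaultdict(float)
    count = defaultdict(int)
    for o, d, t_in, t_out in trips:
        if slot_start(t_in) in slots:
            total[(o, d)] += (t_out - t_in).total_seconds() / 60
            count[(o, d)] += 1
    return {pair: total[pair] / count[pair] for pair in total}


t = datetime(2017, 3, 16, 8, 0)  # a Thursday, 08:00 slot
trips = [
    (321, 334, datetime(2017, 3, 16, 8, 2), datetime(2017, 3, 16, 8, 26)),
    (321, 334, datetime(2017, 3, 9, 8, 5), datetime(2017, 3, 9, 8, 25)),
    # Wednesday trip: outside the weekly series, so it is ignored.
    (321, 334, datetime(2017, 3, 15, 8, 5), datetime(2017, 3, 15, 8, 50)),
]
tcm = time_cost_matrix(trips, t, weeks=2)
```

Only the two Thursday trips (24 and 20 minutes) contribute, so the entry for the pair (321, 334) is 22 minutes. A sparse dictionary stands in for the full matrix here; pairs with no observed trips are simply absent.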
3.2.2. Spatial Correlation Matrix
To forecast the exit passenger flow at a destination station in a given time interval, the passengers who set off from each origin station one time cost earlier have to be considered. The entrance passenger flow of the origin station at that earlier time is available. However, the forecast interval has not yet occurred, so the proportion of those entering passengers who are heading to the destination station is not available. To resolve this, a spatial factor is introduced. The spatial factor is the historical average probability that a passenger in the entrance passenger flow of the origin station travels to the destination station; it is computed from the number of passengers travelling between the two stations with a given origin time and the entrance passenger flow of the origin station at that time, over the historical time series defined in Section 3.2.1. When forecasting the exit passenger flow at the destination station, the spatial influence of an origin station is calculated by multiplying the spatial factor by the entrance passenger flow of that origin station. The spatial factors between all pairs of stations constitute the spatial correlation matrix. Because the factors vary with time, there is one SCM per time interval. Like the TCM, the SCM is a square matrix with one row and one column per station.
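The spatial factor can be illustrated with the short sketch below. It assumes that the factor is the mean, over the historical slots, of the ratio between origin-destination passengers and the entrance flow of the origin station; averaging the ratios (rather than dividing summed counts) is an assumption of this sketch, and all names and numbers are hypothetical.

```python
def spatial_factor(od_counts, entrance_counts, origin, dest, slots):
    """Historical share of passengers entering `origin` who go to `dest`.

    od_counts[(origin, dest, slot)] -> passengers from origin to dest
                                       whose origin time lies in `slot`
    entrance_counts[(origin, slot)] -> entrance flow of origin in `slot`
    `slots` is the historical series (same slot in preceding weeks).
    """
    ratios = []
    for s in slots:
        entered = entrance_counts.get((origin, s), 0)
        if entered > 0:  # skip slots in which nobody entered
            ratios.append(od_counts.get((origin, dest, s), 0) / entered)
    return sum(ratios) / len(ratios) if ratios else 0.0


# Made-up counts for two historical weeks.
od_counts = {(321, 334, "week-1"): 30, (321, 334, "week-2"): 20}
entrance_counts = {(321, "week-1"): 100, (321, "week-2"): 100}
factor = spatial_factor(od_counts, entrance_counts, 321, 334,
                        ["week-1", "week-2"])
```

With shares of 0.3 and 0.2 in the two weeks, the factor is 0.25, so an entrance flow of 80 passengers at station 321 would contribute an estimated 20 passengers to station 334.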
3.2.3. Extraction of Spatio-Temporal Features
The structure of the extraction method is shown in Figure 1. To forecast the exit passenger flow at a station in a given time interval, the temporal feature is the previous exit passenger flow at that station. The spatial feature is obtained by calculating the number of passengers who depart from other stations and will arrive at the station in that interval: for each other station in the set of stations, the spatial factor defined in Section 3.2.2 is multiplied by the entrance passenger flow of that station, and the products are summed.
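The extraction step can be sketched as below. Two simplifications are assumed and are not taken from the text: the temporal feature is the exit flow of the immediately preceding slot, and time costs are expressed in whole slots. The function name, dictionary layouts, and numbers are hypothetical.

```python
def extract_features(station, t, exit_flow, entrance_flow, factor, cost,
                     stations):
    """Return (temporal, spatial) features for forecasting `station` at slot t.

    exit_flow[(s, slot)], entrance_flow[(s, slot)] -> observed flows
    factor[(o, d)] -> spatial factor from o to d
    cost[(o, d)]   -> time cost from o to d, in whole slots (assumption)
    """
    # Temporal feature: exit flow of the same station in the previous slot.
    temporal = exit_flow.get((station, t - 1), 0)
    # Spatial feature: estimated arrivals summed over all other stations.
    spatial = 0.0
    for o in stations:
        if o == station:
            continue
        departed = t - cost[(o, station)]  # slot in which they entered
        spatial += factor[(o, station)] * entrance_flow.get((o, departed), 0)
    return temporal, spatial


stations = [1, 2, 3]
cost = {(1, 3): 2, (2, 3): 1}
factor = {(1, 3): 0.25, (2, 3): 0.5}
entrance_flow = {(1, 8): 40, (2, 9): 20}
exit_flow = {(3, 9): 17}
temporal, spatial = extract_features(3, 10, exit_flow, entrance_flow,
                                     factor, cost, stations)
```

Here station 1 contributes 0.25 × 40 and station 2 contributes 0.5 × 20, giving a spatial feature of 20, alongside a temporal feature of 17.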
3.3. Structure of ST-LSTM Network
Building on the LSTM network, the ST-LSTM network adds a fully connected layer that combines the temporal and spatial features. The model learns the best way to combine them during training.
The structure of the ST-LSTM network is shown in Figure 2. The inputs of the model are the temporal and spatial features, and the output is the forecast result. The model has four layers: the fully connected layer, the input layer, the hidden layer, and the output layer. The fully connected layer first combines the features and passes the result to the input layer, which computes the input of the hidden layer. The hidden layer has three gates (input gate, forget gate, and output gate) and maintains a state. The inputs of every gate are the hidden-layer input and the previous state. In Figure 2, the blue points are confluences, which stand for multiplications, and the dashed lines show the transmission of the previous state. The information flow is described by the outputs of the four layers, the intermediate variables and state of the hidden layer, the weight matrices and bias vectors of the layers, and the sigmoid function.
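To make the information flow concrete, the sketch below runs one forward step of a single-unit cell with scalar weights. It follows the layer order described above (fully connected, input, hidden with three gates, output) but uses a generic LSTM formulation; the paper's exact equations are not reproduced here, and the weight names in `w` as well as the use of `tanh` are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def st_lstm_step(temporal, spatial, c_prev, w):
    """One forward step of a single-unit ST-LSTM-like cell (scalar weights).

    `w` is a dict of hypothetical scalar weights and biases.
    Returns the forecast and the new hidden state.
    """
    # Fully connected layer: learnable combination of the two features.
    x = w["fc_t"] * temporal + w["fc_s"] * spatial
    # Input layer: maps the combined feature to the hidden-layer input.
    z = math.tanh(w["in"] * x)
    # Each gate sees the hidden-layer input and the previous state.
    i = sigmoid(w["ix"] * z + w["ic"] * c_prev + w["bi"])
    f = sigmoid(w["fx"] * z + w["fc"] * c_prev + w["bf"])
    o = sigmoid(w["ox"] * z + w["oc"] * c_prev + w["bo"])
    # New state: keep part of the old state, admit part of the new input.
    c = f * c_prev + i * math.tanh(w["cx"] * z)
    # Output layer: linear read-out of the gated state.
    y = w["out"] * (o * math.tanh(c))
    return y, c


names = ["fc_t", "fc_s", "in", "ix", "ic", "bi", "fx", "fc", "bf",
         "ox", "oc", "bo", "cx", "out"]
w = {name: 0.0 for name in names}
w["out"] = 2.0
y, c = st_lstm_step(temporal=17.0, spatial=20.0, c_prev=1.0, w=w)
```

With all weights except the read-out set to zero, every gate equals 0.5, so the state is halved and the forecast reduces to `tanh(0.5)`; in a trained network the gates would instead depend on the features and the previous state.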
During training, the cost function is evaluated after each forecast, and the model is improved by reducing its output. The cost function compares the forecast for a station in a time interval with the actual output.
3.4. Training Algorithm
The training algorithm has two parts: the extraction of spatio-temporal features and the training of the ST-LSTM network. The key point of training is to minimize the output of the cost function by adjusting the weight matrices and bias vectors. The training procedure is as follows.
Step 1. Obtaining the Inputs and Labels. Extract the temporal features and the spatial features for each time interval; these are the inputs of the model. Collect the exit passenger flow in each time interval as the labels.
Step 2. Initialization of the ST-LSTM Network. Initialize all weight matrices and bias vectors of the network.
Step 3. Fine-Tuning the Whole Network. Fine-tune the whole network by adjusting the weight matrices and bias vectors to minimize the output of the cost function. Training stops when the output meets the required criterion or the training time reaches the limit.
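The three steps can be sketched with a deliberately simple stand-in model. The code below fits a linear combination of the two features by gradient descent on the mean squared error; the linear model, the squared-error cost, the learning rate, and the use of an epoch limit as the training-time limit are all assumptions chosen to keep the stopping rule visible, not the ST-LSTM training itself.

```python
def train(inputs, labels, lr=0.01, tol=1e-6, max_epochs=10_000):
    """Fit y ~ a*temporal + b*spatial + c by gradient descent.

    Returns the fitted parameters and the last evaluated cost.
    """
    a = b = c = 0.0                    # Step 2: initialization
    n = len(inputs)
    cost = float("inf")
    for _ in range(max_epochs):        # stop when the epoch limit is reached
        ga = gb = gc = cost = 0.0
        for (tf, sf), y in zip(inputs, labels):
            err = a * tf + b * sf + c - y
            cost += err * err / n      # mean squared error
            ga += 2 * err * tf / n
            gb += 2 * err * sf / n
            gc += 2 * err / n
        if cost < tol:                 # stop when the cost is small enough
            break
        # Step 3: adjust the parameters to reduce the cost.
        a, b, c = a - lr * ga, b - lr * gb, c - lr * gc
    return (a, b, c), cost


# Step 1: inputs (temporal, spatial) and labels, made up so that
# label = 2*temporal + 3*spatial exactly.
inputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
labels = [2.0, 3.0, 5.0, 0.0]
params, final_cost = train(inputs, labels, lr=0.1)
```

On this toy data the loop ends through the cost criterion well before the epoch limit, and the fitted coefficients approach 2 and 3.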
Based on the data of Chongqing rail transit, four models are compared in the experiment: seasonal ARIMA (SARIMA) [23], support vector regression combined with particle swarm optimization (PSO-SVR) [24], the LSTM network [25], and the proposed ST-LSTM network. The forecast target is the exit passenger flow at 10-minute intervals. The four models are trained and tested on 100 stations. The details of each model are as follows.
SARIMA: The seasonal period S is 100, because the operating time runs from 6:20 to 23:00 (100 × 10 min). Following the processing method in [23], ARIMA (2,1,0) × (0,1,1) with seasonal period 100 is finally used.
PSO-SVR: The time period is 100 (100×10 min per day), and the limit of parameter combination of SVR is from , 0. to 0, 0.. The final parameter combination will be selected by PSO in the training.
LSTM network: The number of units is 10 and the time step is 100 (100×10 min per day).
ST-LSTM network: The number of units is 10 and the time step is 100 (100×10 min per day).
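The value 100 shared by the seasonal period, the time period, and the time step follows from the operating hours. A quick check, with a hypothetical helper name and an arbitrary date:

```python
from datetime import datetime, timedelta


def slots_per_day(opening, closing, slot_minutes=10):
    """Number of whole slots between opening and closing time."""
    return (closing - opening) // timedelta(minutes=slot_minutes)


opening = datetime(2017, 3, 1, 6, 20)
closing = datetime(2017, 3, 1, 23, 0)
period = slots_per_day(opening, closing)  # 16 h 40 min = 1000 min -> 100
```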
4.1. Data Description
The card records are provided by Chongqing City Transportation Development & Investment Group Co., Ltd. Compared with other targets, such as origin-destination (OD) volume, exit passenger flow is more accurate and has less missing data. Therefore, we calculate the exit passenger flow from 1 March 2017 to 31 March 2017 based on the dataset. The dataset contains more than 46 million card records. After processing, 600 thousand data points are obtained.
We use several criteria to compare the performance of the four models. Maximum error (ME) and mean absolute error (MAE) measure the accuracy of the models. Root mean square error (RMSE) reflects the stability of the models, since it is sensitive to large errors. Mean relative error (MRE) is the most suitable for comparing the performance of the four models. Each criterion is computed from the forecast data and the measured data.
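The four criteria can be written out as follows. These are the standard textbook definitions and may differ in detail from the paper's equations; in particular, taking MRE as the mean of per-sample relative errors and skipping samples with zero measured flow are assumptions of this sketch.

```python
import math


def maximum_error(forecast, measured):
    return max(abs(f - y) for f, y in zip(forecast, measured))


def mean_absolute_error(forecast, measured):
    return sum(abs(f - y) for f, y in zip(forecast, measured)) / len(measured)


def root_mean_square_error(forecast, measured):
    squared = sum((f - y) ** 2 for f, y in zip(forecast, measured))
    return math.sqrt(squared / len(measured))


def mean_relative_error(forecast, measured):
    # Samples with zero measured flow are skipped to avoid division by zero.
    pairs = [(f, y) for f, y in zip(forecast, measured) if y != 0]
    return sum(abs(f - y) / y for f, y in pairs) / len(pairs)


forecast = [110, 90, 50]
measured = [100, 100, 40]
me = maximum_error(forecast, measured)
mae = mean_absolute_error(forecast, measured)
rmse = root_mean_square_error(forecast, measured)
mre = mean_relative_error(forecast, measured)
```

In this example every absolute error is 10, so ME, MAE, and RMSE coincide at 10, whereas MRE (0.15) is larger for the low-volume sample, which illustrates why a relative criterion is useful when stations differ widely in volume.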
4.3. Training and Testing
Five-fold cross validation is used to evaluate the models. The data are divided into 5 subsets; each subset serves in turn as the testing set, and the remaining data form the training set. The experiments are therefore repeated 5 times for each station, after which the performance of the four models is collected. The experiments are conducted on a desktop computer with an Intel i7 3.20 GHz CPU and 16 GB of memory.
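The splitting scheme can be sketched as below. Whether the folds are contiguous or shuffled is not stated in the text; contiguous folds are assumed here, and the function name is hypothetical.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


splits = list(k_fold_splits(12, k=5))
```

Each sample appears in exactly one testing set, so every model is evaluated once on every part of the data.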
4.4. Experiment Result
The experimental results of the different models are shown in Table 1, where the operation time is averaged over all stations. Compared with SARIMA, PSO-SVR, and the LSTM network, the proposed ST-LSTM network achieves better performance. In terms of ME and MAE, the ST-LSTM network is more accurate than the other models. Moreover, in terms of RMSE, the ST-LSTM network is more stable. Therefore, the proposed ST-LSTM network is more suitable for the short-term forecast of rail transit.
5. Analysis of Result
After testing the models on 100 stations of Chongqing rail transit, we find that the ST-LSTM network achieves higher accuracy than the other models. However, its performance fluctuates across stations, as shown in Table 2. We therefore analyze the experimental results on the basis of a field investigation. The stations in Table 2 are sorted in descending order of passenger volume. For reasons of space, Table 2 shows only the stations with the top-10 and bottom-10 passenger volumes.
5.1. Base Volume
In our research, we find that base volume is one of the factors that influence the forecast. Figure 3 shows the performance on two stations, station No. 321 and station No. 334 of Chongqing rail transit. Both are located in a residential district of Chongqing. However, the monthly base volume of station No. 334 is only 5% of that of station No. 321. In the test, the MRE on station No. 321 is 13.52%, while the MRE on station No. 334 is 26.58%. We use these two results as samples to show the influence of base volume. The research suggests that stations with higher base volume usually have more prominent regional features. As a result, the passenger flows of these stations are more regular and less sensitive to emergent factors. Short-term forecasting on stations with low base volume is therefore one of the difficulties in rail transit forecasting.
Besides base volume, we find that randomness is another factor that influences the forecast. As shown in Figure 4, the base volumes of station No. 323 and station No. 123 are both around 900 thousand per month. However, the forecast performance on the two stations is quite different. In the test, the MRE on station No. 323 is 16.73%, while the MRE on station No. 123 is 38.37%. This phenomenon occurs at a few special stations, such as station No. 123, which is located in the university town of Chongqing. Compared with commuters, undergraduates have more freedom in choosing their travel time, so the passenger flow of stations next to universities is more random than that of other stations. Similarly, the passenger flow of stations next to railway stations or airports is related to the corresponding timetables, such as the flight schedule. Therefore, randomness from the environment cannot be neglected at several stations, and short-term forecasting on these stations is another difficulty in rail transit forecasting.
6. Conclusion and Future Work
Short-term forecasting for rail transit is an essential issue in ITS. We propose the ST-LSTM network, which combines temporal features and spatial features. To extract the spatial features, the TCM and the SCM are integrated into the method. Compared with the other models, the proposed model is more suitable for rail transit forecasting.
This study addresses the prediction of exit passenger flow, but a model that also includes entrance passenger flow would be more valuable for management. In addition, besides rail transit, ITS also covers the bus system and the taxi system, and the correlation between different modes of public transportation is worth considering. In the future, we will try to forecast other targets of rail transit and then consider the relation among different modes of transportation. Finally, a comprehensive system for rail transit will be built to produce more accurate short-term forecasts.
The data used in the experiments are freely available at https://drive.google.com/open?id=1RuH080U_9PHdh9B9VoOjNurODWnHAYaf and more data of Chongqing rail transit are available by contacting the corresponding author.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
This work is supported by the Central Universities under Grant 106112017CDJXY090001 and Chongqing Major Science and Technology Projects of Artificial Intelligence under Grant CSTC2017RGZN-ZDYF0150.
References
[1] Z. Zhao, W. Chen, X. Wu, P. C. Y. Chen, and J. Liu, “LSTM network: a deep learning approach for short-term traffic forecast,” IET Intelligent Transport Systems, vol. 11, no. 2, pp. 68–75, 2017.
[2] X. Chen, Z. Wei, X. Liu, Y. Cai, Z. Li, and F. Zhao, “Spatiotemporal variable and parameter selection using sparse hybrid genetic algorithm for traffic flow forecasting,” International Journal of Distributed Sensor Networks, vol. 13, no. 6, pp. 1–14, 2017.
[3] Q. Liu, Y. Cai, H. Jiang, X. Chen, and J. Lu, “Traffic state spatial-temporal characteristic analysis and short-term forecasting based on manifold similarity,” IEEE Access, vol. 6, pp. 9690–9702, 2017.
[4] K. Farokhi Sadabadi, M. Hamedi, and A. Haghani, “Evaluating moving average techniques in short-term travel time prediction using an AVI data set,” in Transportation Research Board 89th Annual Meeting, 2010.
[5] B. L. Smith and M. J. Demetsky, “Traffic flow forecasting: comparison of modeling approaches,” Journal of Transportation Engineering, vol. 123, no. 4, pp. 261–266, 1997.
[6] W. Min and L. Wynter, “Real-time road traffic prediction with spatio-temporal correlations,” Transportation Research Part C: Emerging Technologies, vol. 19, no. 4, pp. 606–616, 2011.
[7] X. Fei, C. C. Lu, and K. Liu, “A Bayesian dynamic linear model approach for real-time short-term freeway travel time prediction,” Transportation Research Part C: Emerging Technologies, vol. 19, no. 6, pp. 1306–1318, 2011.
[8] K. Nagel and M. Schreckenberg, “A cellular automaton model for freeway traffic,” Journal de Physique I, vol. 2, no. 12, pp. 2221–2229, 1992.
[9] L. Li, X. Chen, Z. Li, and L. Zhang, “Freeway travel-time estimation based on temporal-spatial queueing model,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1536–1541, 2013.
[10] X. Zhang and J. A. Rice, “Short-term travel time prediction,” Transportation Research Part C: Emerging Technologies, vol. 11, no. 3-4, pp. 187–210, 2003.
[11] Y. Wei and M. Chen, “Forecasting the short-term metro passenger flow with empirical mode decomposition and neural networks,” Transportation Research Part C: Emerging Technologies, vol. 21, no. 1, pp. 148–162, 2012.
[12] M. Levin and Y. D. Tsao, “On forecasting freeway occupancies and volumes (abridgment),” Transportation Research Record, vol. 773, pp. 47–49, 1980.
[13] A. Rosenblad, “J. J. Faraway: Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models,” Computational Statistics, vol. 24, no. 2, pp. 369-370, 2009.
[14] Y. Zhang and Y. Liu, “Traffic forecasting using least squares support vector machines,” Transportmetrica, vol. 5, no. 3, pp. 193–213, 2009.
[15] C.-H. Wu, J.-M. Ho, and D. T. Lee, “Travel-time prediction with support vector regression,” IEEE Transactions on Intelligent Transportation Systems, vol. 125, no. 6, pp. 515–523, 2004.
[16] I. Okutani and Y. J. Stephanedes, “Dynamic prediction of traffic volume through Kalman filtering theory,” Transportation Research Part B: Methodological, vol. 18, no. 1, pp. 1–11, 1984.
[17] H. Liu, H. van Zuylen, H. van Lint, and M. Salomons, “Predicting urban arterial travel time with state-space neural networks and Kalman filters,” Transportation Research Record, no. 1968, pp. 99–108, 2006.
[18] P. Lingras, S. Sharma, and M. Zhong, “Prediction of recreational travel using genetically designed regression and time-delay neural network models,” Transportation Research Record, vol. 13, no. 1, pp. 435–446, 2002.
[19] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Optimized and meta-optimized neural networks for short-term traffic flow prediction: a genetic approach,” Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 211–234, 2005.
[20] L. Shen, Freeway Travel Time Estimation and Prediction using Dynamic Neural Networks (Ph.D. Dissertation), Florida International University, 2008.
[21] X. Zeng and Y. Zhang, “Development of recurrent neural network considering temporal-spatial input dynamics for freeway travel time modeling,” Computer-Aided Civil and Infrastructure Engineering, vol. 28, no. 5, pp. 359–371, 2013.
[22] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short-term memory neural network for traffic speed prediction using remote microwave sensor data,” Transportation Research Part C: Emerging Technologies, vol. 54, pp. 187–197, 2015.
[23] S. V. Kumar and L. Vanajakshi, “Short-term traffic flow prediction using seasonal ARIMA model with limited input data,” European Transport Research Review, vol. 7, no. 3, 2015.
[24] W. Hu, L. Yan, K. Liu, and H. Wang, “A short-term traffic flow forecasting method based on the hybrid PSO-SVR,” Neural Processing Letters, vol. 43, no. 1, pp. 155–172, 2016.
[25] C. Xu, J. Ji, and P. Liu, “The station-free sharing bike demand forecasting with a deep learning approach and large-scale datasets,” Transportation Research Part C: Emerging Technologies, vol. 95, pp. 47–60, 2018.