A Deep Learning Model with Conv-LSTM Networks for Subway Passenger Congestion Delay Prediction
When urban rail transit is faced with a large number of commuter passengers during peak periods, passengers are often waiting for the next train because the subway is running at full load, which causes delays to the overall travel time of passengers. The calculation and prediction of the congestion delay in subway stations can guide the operation department and passengers to make better planning and selection. In this paper, we use a new method based on deep learning technology to evaluate the congestion delay of subway stations. Firstly, we use automatic fare collection (AFC) system data to evaluate the congestion delays of stations. Then, we use a convolutional long short-term memory (Conv-LSTM) network to extract spatial and temporal characteristics to solve the short-term prediction problem of the subway congestion delay in the network structure. The spatiotemporal variables include inbound passenger flow, outbound passenger flow, number of passengers delayed, and average delay time. As a spatiotemporal sequence, the input and prediction targets are both spatiotemporal three-dimensional tensors in the end-to-end training model. The effectiveness of the method is verified by a case study of the Chongqing Rail Transit. Experimental results show that Conv-LSTM is better than the benchmark models in capturing spatial and temporal correlation.
1. Introduction
With the rapid development of the national economy and the continuous improvement of the urbanization level, the number of passenger trips and construction projects of urban rail transit is also increasing rapidly. By the end of 2019, 40 cities in mainland China had opened urban rail transit, with annual ridership reaching 23.71 billion trips. Moreover, 65 cities have had their urban rail transit plans approved, and China's urban rail transit is in a period of great development and construction.
Although the subway has a large passenger capacity, traffic supply and demand are often imbalanced during peak periods. Because the carrying capacity of subway carriages and platforms is limited, some passengers must wait for the next train or even later ones. Passengers stranded on the platform wait longer, which lengthens their overall travel time. It is therefore very important for passengers and operation departments to accurately grasp the internal operating status of the subway network.
Subway congestion usually refers to crowding in carriages. When passengers cannot board a crowded subway car, comfort decreases and travel time increases. The part of the increase in travel time due to congestion is called congestion delay, which can be used as an indicator to reflect the current state of passenger flow at the station in real time. This idea is made practical by the large-scale application of automatic fare collection (AFC) system data. The AFC data record the card number, time, and location of each passenger trip. Therefore, it is possible to count the travel time and delays of passengers between origin-destination (OD) pairs in the subway system in real time through big data mining technology. This paper studies the travel time delay of subway passengers and judges the congestion state of a station by analyzing the average travel time delay of passengers waiting at the station. To allow subway operators and passengers to effectively grasp the future operating status of stations, we use advanced deep learning methods to predict station congestion.
The main contributions of this paper are as follows: (1) based on the calculation of passenger travel time using AFC data, we use the idea of control variables to eliminate interference factors and use the difference between the real travel time in the peak period and normal travel time in the off-peak period to evaluate passenger congestion delay. (2) The congestion delay of subway passenger flow in the whole network is represented by the image and time series. Among them, the image contains the spatial propagation of congestion delay between adjacent stations, and the time series contains the time dependence of subway station congestion delay. (3) We extend the traditional fully connected long short-term memory (FC-LSTM) network idea to the convolutional long short-term memory (Conv-LSTM) network, which has a convolution structure in both input-to-state and state-to-state transitions and can effectively capture spatiotemporal correlations of congestion delay. (4) The congestion delay of the whole Chongqing Metro network is calculated and predicted, and the effectiveness of the method is verified by the operation data. This is different from the traditional passenger flow forecast research, which is often limited to station or route-level forecasting.
The rest of this paper is organized as follows. Section 2 reviews past works and existing methods in the fields of congestion delay calculation and forecasting. Section 3 introduces how to use AFC data for congestion delay calculation. Section 4 describes the Conv-LSTM structure used in this paper to predict congestion delay. Section 5 analyzes the distribution of congestion delays in the Chongqing Rail Transit network. Section 6 briefly summarizes the work of this paper and puts forward the outlook.
2. Related Work
In the research field of the subway congestion delay problem, there is no complete and effective calculation and prediction method. However, in recent years, big data processing technology and artificial intelligence have developed rapidly, which provides us with new ideas and methods to study the subway congestion delay problem.
In the research field of subway passenger congestion delay, the existing literature mainly focuses on the evaluation and optimization of passenger travel time or waiting time. As early as 2009, Vansteenwegen proposed a linear programming method to optimize the Belgian railway's train timetable and found that the general waiting cost could be reduced by 40%. With the extensive use of the AFC system and the continuous development of big data technology in recent years, researchers began to use AFC data to study the waiting time and congestion problem of passengers and achieved many results. Yong-Sheng and En-Jian used a new estimation model based on the Bayesian inference formula to evaluate the travel time distribution of subway passengers and, using AFC data, showed that the walking, waiting, transfer, and in-vehicle travel times of subway passengers follow truncated normal distributions. Ingvardson et al. proposed a mixed distribution composed of a uniform distribution and a beta distribution to estimate the waiting time of passengers. They then used smart card data to verify that this method can improve the estimation of waiting time in the public transport model. Some scholars use mathematical programming to optimize train headway and use AFC data to verify the effectiveness of their models. Liu et al. optimized the departure interval of the subway transfer station by combining simulated annealing and parallel computing and used AFC data to verify that the model can effectively reduce the waiting time of passengers. Yin et al. proposed an integrated approach for the train scheduling problem on a bidirectional urban subway line to minimize the operational costs and passenger waiting time. The effectiveness of the method is verified by the operation data of the Beijing Subway. Luo et al. proposed a hybrid method, which combines the static traffic assignment model with the agent-based dynamic traffic simulation model to estimate the frequent congestion in the subway system.
In recent years, machine learning has made great progress in various practical applications. At present, some achievements have been made in the prediction of passenger flow and traffic congestion by using deep learning methods. Yang et al. proposed an enhanced long-term feature model based on the long short-term memory neural network (ELF-LSTM). It makes full use of the advantages of the long short-term memory (LSTM) neural network model in processing time series and overcomes the limitation that it cannot fully learn long-term time dependence due to time lag. Huaizhong et al. used deep learning to predict the passenger flow of a single subway station by considering the weather, holidays, ground transportation, and other factors. Wang et al. proposed a deep learning method with an error-feedback recurrent convolutional neural network (eRCNN) structure for continuous traffic speed prediction. They took the Beijing ring road as an example to demonstrate the feasibility of the model in identifying congestion sources. Chen et al. proposed a hybrid algorithm that combines the addition mode of seasonal-trend decomposition based on loess and the LSTM neural network (STL-LSTM) to mitigate the influences of irregular fluctuation and improve the performance of short-term subway ridership prediction. Ai et al. used Conv-LSTM to solve the problem of airport delay prediction in the network structure and verified the effectiveness of the model. Zheng et al. developed an attention-based Conv-LSTM module to extract the spatial and short-term temporal features, which can efficiently capture the complex nonlinearity of traffic flow. Sudatta et al. defined the vehicle congestion fraction in a block and used the LSTM neural network structure to predict the congestion of the street network. Moreover, some scholars use parallel structure models to predict congestion. Ma et al. proposed a parallel structure composed of a convolutional neural network (CNN) and a bidirectional long short-term memory (BLSTM) network to predict subway passenger flow. However, the calculation and analysis process of the parallel architecture is complex. Lin et al. took the event detection system as the research object and proposed an event detection framework based on generative adversarial networks (GANs) to solve the problem of insufficient event samples. Li et al. expanded the sample size and balanced the datasets by using a GAN and then extracted the temporal and spatial correlation of traffic flow and detected incidents by using the temporal and spatially stacked autoencoder (TSSAE). In short-term passenger demand forecasting, Ke et al. proposed the fusion convolutional long short-term memory network (FCL-Net) to address spatial dependencies, temporal dependencies, and exogenous dependencies within one end-to-end learning architecture.
In conclusion, by reviewing the existing results, we found that the inbound passenger flow at subway stations is generally influenced by the travel habits of passengers and the weather, so good prediction results can be obtained by using LSTM and its improved models. However, we found that the congestion of a subway station is related not only to the passenger flow in and out of the station but also to the congestion of the adjacent stations. Specifically, after passengers are stranded on the platform of one station because the carriages are full, the same will happen at the next station unless enough passengers alight there to free up space in the carriages. This forces us to extract spatial features effectively while considering time series data. Therefore, we make a fundamental adjustment to the traditional LSTM approach and adopt Conv-LSTM to extract the spatial and temporal characteristics of passenger flow congestion, so as to predict station passenger flow congestion more accurately.
3. Congestion Delay Calculation
Passenger flow congestion means that the movement of passengers is limited by other passengers and the state of the environment, increasing travel costs (travel time, physical consumption). Passenger congestion in a station arises when the limited space (station space, residual train capacity) and equipment capacity cannot meet the needs of passengers. Compared with other periods, the passenger volume in the peak period is significantly higher, and a large number of passengers gather in a short time in a local space, which easily leads to passenger congestion. Without early warning and effective management, this brings security risks to subway operation. However, there are many reasons for passenger delays. For example, passenger flow congestion, train delay, signal failure, and other objective factors will cause an increase in passenger travel time. When the passenger travel time exceeds a certain threshold, the trip deviates from the usual pattern, and a delay has likely occurred. Due to the congestion of passenger flow during the peak period, passengers are hindered by objective factors such as other passengers or control measures, resulting in extra time loss in the process of travel. Accordingly, the delay is expressed as the difference between the real travel time and the normal travel time.
Among them, the increase of travel time caused by passenger congestion is called congestion delay, which is the main research object of this paper. Congestion delay is mainly composed of walking delay and waiting delay. (1) The main reasons for walking delay include slow travel caused by passenger flow congestion, queuing caused by equipment capacity limitation, and increased travel distance caused by passenger flow organization adjustment in the station. (2) The main reason for the waiting delay is that passengers cannot get on the train in time due to the high full load rate. Therefore, this paper adopts the idea of control variables, selects specific dates to eliminate the interference of other factors (train delay and signal failure), and focuses on the impact of passenger flow congestion on travel time delay.
For passengers who need to transfer during their trip, AFC data only tell us where they enter and leave the system; they do not tell us where the passengers transfer. Moreover, when the travel time of such passengers increases due to passenger congestion, we cannot judge whether the increase occurs at the origin station or at the transfer station. Therefore, when we evaluate the degree of station congestion, we take nontransfer passengers as the research object. If these passengers experience a congestion delay, it can be inferred that other passengers entering the station during the same period face the same congestion situation.
3.1. Off-Peak Period for Normal Travel Time
We assume that, during the off-peak period, no passenger is left behind on the platform because of congestion in the carriages or on the platforms. In this case, the waiting time of passengers is approximately uniformly distributed, and the maximum waiting time of passengers is one departure interval in the off-peak period.
Since passengers' walking speed is approximately normally distributed, we assume that the inbound and outbound walking times of passengers follow normal distributions.
Some researchers consider that, for the same station, the path of entering and leaving the platform is the same, so they set the walking times to enter and leave the station to the same value. However, through investigation, we found that some stations have different routes for passengers to enter and leave the platform, passengers move in different directions on the stairs when entering and leaving the platform, and the capacity of the stairs is also different. Therefore, we calculate and analyze the walking times to enter and leave the platform separately. For station p, the walking times of passengers entering and leaving the platform are denoted by $t^{\mathrm{in}}_{p}$ and $t^{\mathrm{out}}_{p}$, respectively.
Taking the nontransfer route (p∼q) as the research object, the overall travel time can be given by
$$T_{pq} = t^{\mathrm{in}}_{p} + t^{\mathrm{w}}_{p} + t^{\mathrm{r}}_{pq} + t^{\mathrm{out}}_{q}, \tag{1}$$
where $T_{pq}$ denotes the overall travel time of passengers from station p to station q, $t^{\mathrm{w}}_{p}$ denotes the waiting time of passengers at station p, and $t^{\mathrm{r}}_{pq}$ denotes the total on-board time of passengers from station p to station q. When the train runs on time according to the timetable, $t^{\mathrm{r}}_{pq}$ is a fixed value.
According to the independence of each travel time element, the mean and variance of travel time can be given by
$$E(T_{pq}) = \mu^{\mathrm{in}}_{p} + \frac{h_{p}}{2} + t^{\mathrm{r}}_{pq} + \mu^{\mathrm{out}}_{q}, \qquad D(T_{pq}) = \left(\sigma^{\mathrm{in}}_{p}\right)^{2} + \frac{h_{p}^{2}}{12} + \left(\sigma^{\mathrm{out}}_{q}\right)^{2}, \tag{2}$$
where $E(T_{pq})$ and $D(T_{pq})$ denote the mean and variance of travel time of the route (p∼q), $\mu^{\mathrm{in}}_{p}$ and $\mu^{\mathrm{out}}_{q}$ denote the mean walking times of passengers in and out of the station, $\left(\sigma^{\mathrm{in}}_{p}\right)^{2}$ and $\left(\sigma^{\mathrm{out}}_{q}\right)^{2}$ denote the variances of the walking times of passengers in and out of the station, and $h_{p}$ denotes the departure interval of the train at station p during the off-peak period. The terms $h_{p}/2$ and $h_{p}^{2}/12$ are the mean and variance of a waiting time uniformly distributed on $[0, h_{p}]$. $t^{\mathrm{r}}_{pq}$ and $h_{p}$ can be obtained from urban rail train operation data; $E(T_{pq})$ and $D(T_{pq})$ can be calculated from AFC data. The time range can be divided into k periods $\{\tau_{1}, \tau_{2}, \ldots, \tau_{k}\}$. For passengers entering the station at $\tau_{i}$, the overall travel time can be given by
$$T_{pq}(\tau_{i}) = t^{\mathrm{in}}_{p} + t^{\mathrm{w}}_{p}(\tau_{i}) + t^{\mathrm{r}}_{pq} + t^{\mathrm{out}}_{q}. \tag{3}$$
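As a minimal sketch of the off-peak calculation described in this subsection, the function below adds up the means and variances of independent travel time components, assuming a waiting time that is uniform over one departure interval and normally distributed walking times. The function name, the `WalkTime` container, and the numbers in the example are illustrative assumptions, not values from the study.

```python
from dataclasses import dataclass


@dataclass
class WalkTime:
    """Mean (s) and variance (s^2) of a normally distributed walking time."""
    mean: float
    variance: float


def off_peak_travel_time_stats(walk_in, walk_out, headway, ride_time):
    """Mean and variance of off-peak travel time on a nontransfer route.

    Assumes the components are independent, the waiting time is uniform on
    [0, headway], and the on-board time is fixed by the timetable.
    """
    wait_mean = headway / 2.0          # mean of U(0, headway)
    wait_var = headway ** 2 / 12.0     # variance of U(0, headway)
    mean = walk_in.mean + wait_mean + ride_time + walk_out.mean
    variance = walk_in.variance + wait_var + walk_out.variance
    return mean, variance


# Hypothetical inputs: 90 s / 120 s mean walks, 6 min headway, 15 min ride.
mean_tt, var_tt = off_peak_travel_time_stats(
    WalkTime(90.0, 400.0), WalkTime(120.0, 625.0), headway=360.0, ride_time=900.0
)
```

In practice the walking time parameters are the unknowns: the mean and variance of travel time come from AFC data, the headway and on-board time from operation data, and the walking parameters are whatever makes the two sides agree.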
3.2. Peak Period for Congestion Delay
Congestion delay refers to the additional part of travel time caused by passenger flow congestion in the stations and carriages. It mainly includes the extra walking time caused by the passenger flow congestion in the walking link and the extra waiting time caused by the passenger flow congestion in the waiting link.
For the same station at the same time point, due to different up and down directions, the waiting situation of passengers is also different. As shown in Figure 1, this paper calculates the platform congestion delay time in the up and down directions of each station separately. For line l with n stations, the number of platforms is 2n.
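Taking the nontransfer route (p∼q) as the research object, the overall travel time in the peak period can be given by
$$T'_{pq} = t^{\mathrm{in}}_{p} + t^{\mathrm{w}}_{p} + d_{p} + t^{\mathrm{r}}_{pq} + t^{\mathrm{out}}_{q}, \tag{4}$$
where $t^{\mathrm{in}}_{p}$, $t^{\mathrm{w}}_{p}$, $t^{\mathrm{r}}_{pq}$, and $t^{\mathrm{out}}_{q}$ are the inbound walking time, waiting time, on-board time, and outbound walking time, and $d_{p}$ denotes the congestion delay time of passengers in the up or down direction of station p.
According to the independence of each travel time element, the mean and variance of travel time can be given by
$$E(T'_{pq}) = \mu^{\mathrm{in}}_{p} + \frac{h_{p}}{2} + E(d_{p}) + t^{\mathrm{r}}_{pq} + \mu^{\mathrm{out}}_{q}, \qquad D(T'_{pq}) = \left(\sigma^{\mathrm{in}}_{p}\right)^{2} + \frac{h_{p}^{2}}{12} + D(d_{p}) + \left(\sigma^{\mathrm{out}}_{q}\right)^{2}, \tag{5}$$
where $\mu$ and $\sigma^{2}$ denote the mean and variance of the corresponding walking time and $h_{p}$ denotes the departure interval at station p.
Using the walking times $t^{\mathrm{in}}_{p}$ and $t^{\mathrm{out}}_{q}$ obtained in the previous section, the congestion delay time of passengers at station p can be obtained by equation (6).
The time range of the station congestion study is divided into k periods $\{\tau_{1}, \tau_{2}, \ldots, \tau_{k}\}$. For a passenger a entering the station at $\tau_{i}$, the total delay time can be given by
$$d^{a}_{p}(\tau_{i}) = T'^{\,a}_{pq}(\tau_{i}) - E(T_{pq}), \tag{6}$$
where $T'^{\,a}_{pq}(\tau_{i})$ is the travel time of the passenger recorded by the AFC system and $E(T_{pq}) = \mu^{\mathrm{in}}_{p} + h_{p}/2 + t^{\mathrm{r}}_{pq} + \mu^{\mathrm{out}}_{q}$ is the mean normal travel time of the route.
If passengers arrive at the platform evenly, the average waiting time is equal to half of the departure interval. In other words, even if there is no congestion at the platform, half of the passengers wait longer than half of the departure interval.
Therefore, for a specific passenger, even if $d^{a}_{p}(\tau_{i}) > 0$, it is not certain that the passenger has a congestion delay. To avoid miscounting such passengers as delayed, the full departure interval is used as the maximum waiting time of passengers without a congestion delay. A passenger whose congestion delay exceeds the full departure interval is deemed to have been delayed. The number of passengers delayed can be given by
$$N^{\mathrm{d}}_{p}(\tau_{i}) = \sum_{q} \sum_{a \in A_{pq}(\tau_{i})} \mathbb{1}\left[ d^{a}_{p}(\tau_{i}) > h_{p} \right], \quad \tau_{i} \in K, \tag{7}$$
where a denotes the order number of passengers, the peak period set of rail transit is $K = \{\tau_{1}, \tau_{2}, \ldots, \tau_{k}\}$, and $A_{pq}(\tau_{i})$ denotes the collection of all passengers traveling along the route (p∼q) during the period $\tau_{i}$.
For station p, the proportion of passengers with congestion delay in the period $\tau_{i}$ can be given by
$$\rho_{p}(\tau_{i}) = \frac{N^{\mathrm{d}}_{p}(\tau_{i})}{\sum_{q} N_{pq}(\tau_{i})}, \tag{8}$$
where $\rho_{p}(\tau_{i})$ denotes the delay rate of passengers arriving at station p during the period $\tau_{i}$ and $N_{pq}(\tau_{i})$ denotes the number of all passengers traveling along the route (p∼q) in the period $\tau_{i}$.
For station p, the average congestion delay time of passengers entering the station in the period $\tau_{i}$ can be given by
$$\bar{d}_{p}(\tau_{i}) = \frac{\sum_{q} \sum_{a \in A_{pq}(\tau_{i})} d^{a}_{p}(\tau_{i})}{\sum_{q} N_{pq}(\tau_{i})}. \tag{9}$$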
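The per-period indicators described in this subsection can be sketched as follows, for one station and one period. This is only an illustration under stated assumptions: each passenger's delay is taken as the recorded travel time minus the mean normal (off-peak) travel time, a passenger counts as delayed when that delay exceeds one full departure interval, and the average delay is taken over all passengers in the period. The function name and the sample travel times are hypothetical.

```python
def congestion_delay_stats(travel_times, normal_travel_time, headway):
    """Return (n_delayed, delay_rate, avg_delay) for one station and period.

    travel_times: recorded travel times (s) of nontransfer passengers.
    normal_travel_time: mean off-peak travel time (s) on the same route.
    headway: departure interval (s), used as the delay threshold.
    """
    delays = [t - normal_travel_time for t in travel_times]
    n_total = len(delays)
    if n_total == 0:
        return 0, 0.0, 0.0
    # Only delays above a full headway are attributed to congestion.
    n_delayed = sum(1 for d in delays if d > headway)
    delay_rate = n_delayed / n_total
    avg_delay = sum(delays) / n_total
    return n_delayed, delay_rate, avg_delay


# Hypothetical period: four passengers, normal travel time 1290 s, headway 360 s.
n_delayed, delay_rate, avg_delay = congestion_delay_stats(
    [1300.0, 1500.0, 1800.0, 2100.0], normal_travel_time=1290.0, headway=360.0
)
```

Here the delays are 10 s, 210 s, 510 s, and 810 s, so two of the four passengers exceed the 360 s threshold.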
4. Deep Learning Forecasting
Passenger congestion delay has complex characteristics in spatial and temporal dimensions. The passenger congestion delay of a station at a certain time can be explained from two aspects. From the perspective of the temporal dimension, the passenger congestion delay of the next period can be regarded as the continuation of the passenger congestion delay of the previous period. From the perspective of the spatial dimension, the passenger congestion delay of a station is affected by the congestion delay of adjacent stations, and the congestion delay of adjacent stations has a certain spatial correlation. Therefore, we apply Conv-LSTM to deal with the spatial dependence, temporal dependencies, and the network topology properties of the subway passengers’ congestion delay. In this section, we will briefly review the traditional FC-LSTM structure and then explain the deep learning architecture and advantages of Conv-LSTM.
LSTM is a special form of the RNN structure, which is mainly used to solve the problems of gradient vanishing and gradient explosion when training on long sequences. In most RNNs, the hidden layer function H is simply an elementwise application of the sigmoid function. The LSTM architecture instead uses specially constructed memory cells to store information, which makes it better at discovering and exploiting long-term dependence in the data. In short, LSTM performs better on long sequences than an ordinary RNN.
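The main innovation of LSTM is its memory cell, which acts as an accumulator of state information. Whenever there is a new input, if the input gate is activated, the input information is accumulated into the cell. In addition, if the forget gate is activated, the past cell state may be "forgotten" in the process. Whether the latest cell output is propagated to the final state is further controlled by the output gate. Using memory cells and gates to control the information flow keeps the gradient trapped in the cell and prevents it from vanishing too quickly. FC-LSTM adds "peephole connections" to the traditional LSTM structure, allowing the gate layers to see the cell state. The inner structure of an FC-LSTM layer is shown in Figure 2. FC-LSTM can be regarded as a multivariate version of LSTM in which the inputs, cell outputs, and states are all 1D vectors. In this paper, we follow the standard FC-LSTM formulation, which is expressed as follows:
$$\begin{aligned} i_{t} &= \sigma\left(W_{xi} x_{t} + W_{hi} h_{t-1} + W_{ci} \circ c_{t-1} + b_{i}\right), \\ f_{t} &= \sigma\left(W_{xf} x_{t} + W_{hf} h_{t-1} + W_{cf} \circ c_{t-1} + b_{f}\right), \\ c_{t} &= f_{t} \circ c_{t-1} + i_{t} \circ \tanh\left(W_{xc} x_{t} + W_{hc} h_{t-1} + b_{c}\right), \\ o_{t} &= \sigma\left(W_{xo} x_{t} + W_{ho} h_{t-1} + W_{co} \circ c_{t} + b_{o}\right), \\ h_{t} &= o_{t} \circ \tanh\left(c_{t}\right). \end{aligned} \tag{10}$$
$i_{t}$, $f_{t}$, and $o_{t}$ denote the input gate, forget gate, and output gate. The input $x_{t}$ is a one-dimensional vector or scalar, and the hidden state $h_{t}$ can be given a different dimension. The weighted parameter matrices are $W$, which conduct a linear transformation between the vectors. $b$ are the intercept parameters. The operator "$\circ$" is the Hadamard product; $\sigma$ and tanh are the two nonlinear activation functions given by
$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. \tag{11}$$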
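To make the gate mechanics concrete, here is a minimal scalar sketch of one FC-LSTM step with peephole connections, written with the standard library only. The weight names in the dictionary (`xi`, `hi`, `ci`, and so on) are naming assumptions for this illustration, and in a real model each would be a matrix or vector rather than a single number.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fc_lstm_step(x, h_prev, c_prev, w):
    """One scalar FC-LSTM step with peephole terms; returns (h, c)."""
    # Input and forget gates peek at the previous cell state.
    i = sigmoid(w["xi"] * x + w["hi"] * h_prev + w["ci"] * c_prev + w["bi"])
    f = sigmoid(w["xf"] * x + w["hf"] * h_prev + w["cf"] * c_prev + w["bf"])
    # The cell accumulates new information and keeps part of the old state.
    c = f * c_prev + i * math.tanh(w["xc"] * x + w["hc"] * h_prev + w["bc"])
    # The output gate peeks at the updated cell state.
    o = sigmoid(w["xo"] * x + w["ho"] * h_prev + w["co"] * c + w["bo"])
    h = o * math.tanh(c)
    return h, c


WEIGHT_NAMES = ["xi", "hi", "ci", "bi", "xf", "hf", "cf", "bf",
                "xc", "hc", "bc", "xo", "ho", "co", "bo"]
# All-zero weights make every gate equal to 0.5, which is easy to check by hand.
zero_weights = dict.fromkeys(WEIGHT_NAMES, 0.0)
h_out, c_out = fc_lstm_step(x=1.0, h_prev=0.3, c_prev=2.0, w=zero_weights)
```

With all weights at zero the forget gate keeps half of the previous cell state, so the cell state goes from 2.0 to 1.0 and the hidden state is half of tanh(1.0).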
Because the internal gate of FC-LSTM is calculated by a similar feedforward neural network, this structure can deal with the time-series data well, but for spatial data, it will bring redundancy. The reason is that spatial data have strong local characteristics, but FC-LSTM cannot describe these local characteristics.
To better capture spatiotemporal relationships, we extend the traditional FC-LSTM idea to Conv-LSTM. The method is to replace the feedforward (fully connected) calculations in the input-to-state and state-to-state transitions of FC-LSTM with convolutions. By stacking multiple Conv-LSTM layers to form an encoding-forecasting structure, we can establish an end-to-end training model for short-term subway congestion delay prediction. Conv-LSTM can overcome the shortcomings of the traditional LSTM network in handling spatial dependence. Compared with FC-LSTM, Conv-LSTM transforms all inputs, outputs, hidden states, and gates from 1D vectors into 3D tensors. The comparison between Conv-LSTM and FC-LSTM is shown in Figure 3.
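As we defined, the grid of the subway congestion delay system in a spatial region is composed of P rows and Q columns. Each cell with a subway station in the grid has Z measurement scales varying with time. Therefore, the information at any time can be represented by a tensor $\mathcal{X} \in \mathbf{R}^{Z \times P \times Q}$, where $\mathbf{R}$ is the observed feature domain. Conv-LSTM determines the future state of a cell in the grid by its local neighbors' input and its past state. The key expressions are as follows:
$$\begin{aligned} i_{t} &= \sigma\left(W_{xi} * \mathcal{X}_{t} + W_{hi} * \mathcal{H}_{t-1} + W_{ci} \circ \mathcal{C}_{t-1} + b_{i}\right), \\ f_{t} &= \sigma\left(W_{xf} * \mathcal{X}_{t} + W_{hf} * \mathcal{H}_{t-1} + W_{cf} \circ \mathcal{C}_{t-1} + b_{f}\right), \\ \mathcal{C}_{t} &= f_{t} \circ \mathcal{C}_{t-1} + i_{t} \circ \tanh\left(W_{xc} * \mathcal{X}_{t} + W_{hc} * \mathcal{H}_{t-1} + b_{c}\right), \\ o_{t} &= \sigma\left(W_{xo} * \mathcal{X}_{t} + W_{ho} * \mathcal{H}_{t-1} + W_{co} \circ \mathcal{C}_{t} + b_{o}\right), \\ \mathcal{H}_{t} &= o_{t} \circ \tanh\left(\mathcal{C}_{t}\right). \end{aligned} \tag{12}$$
All the inputs $\mathcal{X}_{1}, \ldots, \mathcal{X}_{t}$, cell outputs $\mathcal{C}_{1}, \ldots, \mathcal{C}_{t}$, hidden states $\mathcal{H}_{1}, \ldots, \mathcal{H}_{t}$, and gates $i_{t}$, $f_{t}$, $o_{t}$ of the Conv-LSTM are 3D tensors whose last two dimensions are rows and columns. The operator $*$ denotes convolution, and $\circ$ is the Hadamard product, so the weight matrices become convolution filters.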
In this way, Conv-LSTM can be viewed as a model that operates on feature vectors laid out on a 2D grid. It predicts the features of a grid cell from the features of the surrounding cells. Therefore, we can make a short-term prediction of the subway congestion delay system from the spatiotemporal variables.
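The idea that each grid cell is updated from its local neighbors can be sketched with a single-channel Conv-LSTM step in plain Python. This is a toy illustration under simplifying assumptions (one feature channel, square odd-sized kernels, zero padding, scalar biases, convolution in its cross-correlation form), not the multi-layer network trained in this paper; all parameter names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv2d_same(grid, kernel):
    """Zero-padded 2D convolution (cross-correlation form) that keeps the grid size."""
    rows, cols = len(grid), len(grid[0])
    half = len(kernel) // 2  # kernel is assumed square with odd side length
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += kernel[dr + half][dc + half] * grid[rr][cc]
            out[r][c] = total
    return out


def conv_lstm_step(x, h_prev, c_prev, params):
    """One single-channel Conv-LSTM step on a rows x cols grid; returns (h, c)."""
    rows, cols = len(x), len(x[0])
    # Input-to-state and state-to-state terms are both convolutions.
    pre = {}
    for gate in ("i", "f", "c", "o"):
        from_x = conv2d_same(x, params["W_x" + gate])
        from_h = conv2d_same(h_prev, params["W_h" + gate])
        pre[gate] = [[from_x[r][c] + from_h[r][c] + params["b_" + gate]
                      for c in range(cols)] for r in range(rows)]
    h_new = [[0.0] * cols for _ in range(rows)]
    c_new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            old = c_prev[r][c]
            # Peephole terms use the Hadamard (elementwise) product.
            i = sigmoid(pre["i"][r][c] + params["W_ci"][r][c] * old)
            f = sigmoid(pre["f"][r][c] + params["W_cf"][r][c] * old)
            new = f * old + i * math.tanh(pre["c"][r][c])
            o = sigmoid(pre["o"][r][c] + params["W_co"][r][c] * new)
            c_new[r][c] = new
            h_new[r][c] = o * math.tanh(new)
    return h_new, c_new


def zero_params(rows, cols, k=3):
    """All-zero parameters, used only to make the example easy to check by hand."""
    params = {}
    for gate in ("i", "f", "c", "o"):
        params["W_x" + gate] = [[0.0] * k for _ in range(k)]
        params["W_h" + gate] = [[0.0] * k for _ in range(k)]
        params["b_" + gate] = 0.0
    for name in ("W_ci", "W_cf", "W_co"):
        params[name] = [[0.0] * cols for _ in range(rows)]
    return params


ones = [[1.0] * 3 for _ in range(3)]
summed = conv2d_same(ones, ones)  # each output counts its in-grid neighbors
h1, c1 = conv_lstm_step(ones, ones, [[2.0] * 3 for _ in range(3)], zero_params(3, 3))
```

The convolution is what distinguishes this step from the fully connected case: a 3 x 3 kernel lets a station's cell see only its immediate neighbors, which matches the intuition that congestion spreads between adjacent stations.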
The training steps of Conv-LSTM are as follows (Algorithm 1).
5. An Experimental Case
This paper takes the Chongqing subway network as an example to verify the model. In the Chongqing subway system, passengers tap their smart cards at the automatic fare collection gates of each subway station. The AFC system records the entrance and exit information of each passenger (e.g., transaction time and station ID). An example of card data is shown in Table 1. This study uses operation data from 40 working days of Chongqing Metro in September and October 2018. Firstly, the subway passenger delay rate and congestion delay index are calculated by the method in Section 3. Then, the RMSE, MAE, and $R^2$ values of the prediction results are calculated to evaluate the ability and effectiveness of the Conv-LSTM model.
We need to divide the subway network diagram into many small cells and ensure that each cell contains at most one subway station. Therefore, we set both the number of rows and the number of columns of the grid to 64. The dataset is divided into two parts: the first part is the training data (35 days), and the second part is the test data (5 days). We test Conv-LSTM models with different numbers of layers to determine the best structure. The future delay rate is predicted from historical observations such as the number of passengers entering or leaving the station, the delay rate, and the average delay time.
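The gridding and data split described above could be sketched as follows. The coordinate scaling, the error raised on a collision, and the chronological ordering of the split are assumptions made for this illustration; the station names and coordinates are placeholders, not Chongqing data.

```python
def rasterize_stations(stations, size=64):
    """Assign each (name, x, y) station to a (row, col) cell of a size x size grid.

    Raises ValueError if two stations fall into the same cell, because each
    cell is required to contain at most one station.
    """
    xs = [x for _, x, _ in stations]
    ys = [y for _, _, y in stations]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)

    def to_index(value, low, high):
        if high == low:
            return 0
        # Scale to [0, size) and clamp the maximum onto the last cell.
        return min(int((value - low) / (high - low) * size), size - 1)

    cells = {}
    for name, x, y in stations:
        cell = (to_index(y, min_y, max_y), to_index(x, min_x, max_x))
        if cell in cells:
            raise ValueError(f"{name} and {cells[cell]} share cell {cell}")
        cells[cell] = name
    return cells


def split_days(days, n_train=35):
    """Chronological split: the first n_train days train, the rest test."""
    return days[:n_train], days[n_train:]


# Placeholder stations with made-up planar coordinates.
stations = [("A", 0.0, 0.0), ("B", 10.0, 10.0), ("C", 5.0, 0.0)]
cells = rasterize_stations(stations)
train_days, test_days = split_days(list(range(40)))
```

If two stations collide, a finer grid or a different projection is needed; a 64 x 64 grid is the resolution the text reports as sufficient for this network.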
In this paper, root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination ($R^2$) are used to verify the prediction accuracy of the model:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_{i} - \hat{y}_{i}\right)^{2}}, \qquad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|y_{i} - \hat{y}_{i}\right|, \qquad R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n} \left(y_{i} - \bar{y}\right)^{2}}, \tag{13}$$
where $y_{i}$ denotes the ith actual value and $\hat{y}_{i}$ denotes the ith predicted value. $\bar{y}$ denotes the mean of all $y_{i}$, and n is the size of the test set. Table 2 shows the comparison of the prediction results between the proposed model and benchmark models. The results show that the Conv-LSTM network is superior to the benchmark models on all three indexes of prediction performance. Comparing the benchmark models, we find that the machine learning methods have better prediction performance than the traditional time series model. The CNN and Conv-LSTM networks have obvious advantages in capturing spatial relevance, which confirms the importance of considering spatial correlation in the prediction of subway congestion delay. Among them, Conv-LSTM achieves the best predictive performance measured by RMSE (0.0331), which is 6.5% lower than that of the CNN (0.0354). Conv-LSTM combines spatial features and time series features more effectively, and its convolutional state-to-state transitions let it capture spatial correlation better in the encoding network.
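For reference, the three evaluation metrics can be computed with a few lines of standard-library Python. The sample values below are made up for illustration and are not results from Table 2; only the final line uses the two RMSE figures quoted in the text to reproduce the reported relative reduction.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up delay rates for four station-period pairs.
actual = [0.1, 0.2, 0.3, 0.4]
predicted = [0.1, 0.25, 0.25, 0.4]
scores = (rmse(actual, predicted), mae(actual, predicted), r2(actual, predicted))

# Relative RMSE reduction of Conv-LSTM (0.0331) versus the CNN (0.0354).
relative_reduction = (0.0354 - 0.0331) / 0.0354
```

Note that `r2` is undefined when all actual values are equal (the denominator is zero), so a production version would need to guard that case.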
Figure 4 shows the actual situation of the delay rate of the Chongqing subway network, in which the red bar chart represents the upward direction, the blue bar chart represents the downward direction, and the height of the bar chart represents the size of the congestion delay rate. Figure 5 shows the forecast of the congestion delay rate of the Chongqing subway network. The darker the color, the higher the delay rate of the station. These two visualization graphs effectively reflect the good prediction effect of Conv-LSTM. According to the intuitive comparison between the actual delay and the predicted delay, we find that the model can effectively capture the spatiotemporal characteristics of each node and make an effective prediction.
In addition, we found some patterns in the training and prediction of the model. First, the stations with the highest congestion delay are mainly concentrated at the intersection of line 1 and line 3 and in the surrounding areas. This is mainly because line 3, the longest straddle-type monorail transit line in the world, has limited capacity. Moreover, line 3 passes through several important areas and transfer stations in Chongqing, attracting a large number of passengers. During the peak period, passengers may even have to wait for five trains before they can board. Second, congestion in the morning peak is highest between 7:30 and 8:30, and congestion in the evening peak is highest between 17:30 and 18:30, which is consistent with the commuting patterns of passengers. Third, congestion mainly occurs in areas within the inner ring, while congestion outside the inner ring is relatively unlikely. This is related to the layout of the subway network imposed by the special terrain of Chongqing. There are relatively few routes to the central city, which makes it easy for passengers to gather in the urban area and leads to congestion.
The method of combining passenger congestion delay distribution with visualization is helpful for the subway operation department to detect and forecast station congestion and provide a more reasonable basis for subsequent work plan arrangement and even subway network planning. At the same time, it can also provide a reference for passenger travel route planning.
6. Conclusions
Based on the analysis of the reasons for the delay of subway travel time, this paper uses the idea of control variables to propose a calculation method for passenger congestion delay at the subway network level. Because passenger flow congestion propagates between stations, the congestion of a station is related not only to its own historical congestion but also to the congestion of adjacent stations. Therefore, combining the temporal and spatial characteristics of passenger congestion, we use Conv-LSTM, an improved deep learning method based on CNN and FC-LSTM, to make a short-term prediction of subway station congestion delay. Conv-LSTM not only retains the advantages of FC-LSTM but also suits spatiotemporal data because of its convolution structure. We use a variety of benchmark models to evaluate the performance of the proposed model. The test results show that Conv-LSTM is satisfactory in solving the passenger congestion delay prediction problem of the subway station.
In this paper, an end-to-end deep learning structure based on spatiotemporal variables is used to realize the short-term prediction of the passenger congestion delay distribution, which makes it possible to grasp the congestion situation in the subway network in real time. On the one hand, it can help the operation management department to develop better management and planning schemes. On the other hand, it can help passengers grasp the congestion situation of subway stations and make better travel plans and choices. However, this paper also has shortcomings. For example, transfer passengers wait two or more times during a trip, and we cannot accurately determine the specific time and place of their congestion delay. In future work, we will discuss how to judge and calculate the congestion delay of transfer passengers and add it to the prediction model.
Data Availability
Access to data is restricted. The survey data source has certain confidentiality.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors thank Chongqing Rail Transit (Group) Co., Ltd., for providing the necessary data. This research was supported by the National Key Research and Development Program of China (2017YFB1200702).
References
X. Ruihua, L. Yanan, Z. Wei, and L. Sijie, “Empirical analysis of traveling backwards and passenger flows reassignment on a subway network with automatic fare collection (AFC) data and train diagram,” Transportation Research Record: Journal of the Transportation Research Board, vol. 2672, Article ID 036119811878139, 2018.
J. B. Ingvardson, O. A. Nielsen, S. Raveau, and B. F. Nielsen, “Passenger arrival and waiting time distributions dependent on train service frequency and station characteristics: a smart card data analysis,” Transportation Research Part C: Emerging Technologies, vol. 90, pp. 292–306, 2018.
J. Yin, L. Yang, T. Tang, Z. Gao, and B. Ran, “Dynamic passenger demand oriented metro train scheduling with energy-efficiency and waiting time minimization: mixed-integer linear programming approaches,” Transportation Research Part B: Methodological, vol. 97, pp. 182–213, 2017.
S. Xingjian, Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, “Convolutional LSTM network: a machine learning approach for precipitation nowcasting,” in Proceedings of the 28th International Conference on Neural Information Processing Systems, pp. 802–810, MIT Press, Cambridge, MA, USA, June 2015.