Abstract

Accurate traffic flow prediction is increasingly essential for successful traffic modeling, operation, and management. Traditional data-driven traffic flow prediction approaches have largely assumed restrictive (shallow) model architectures and do not leverage the large amount of environmental data available. Inspired by deep learning methods with more complex model architectures and effective data mining capabilities, this paper introduces the deep belief network (DBN) and long short-term memory (LSTM) to predict urban traffic flow considering the impact of rainfall. The rainfall-integrated DBN and LSTM can learn the features of traffic flow under various rainfall scenarios. Experimental results indicate that, with the consideration of the additional rainfall factor, the deep learning predictors achieve better accuracy than existing predictors and also yield improvements over the original deep learning models without rainfall input. Furthermore, the LSTM outperforms the DBN in capturing the time series characteristics of traffic flow data.

1. Introduction

Traffic flow prediction is an important component of traffic modeling, operation, and management. Accurate real-time traffic flow prediction can (1) provide information and guidance for road users to optimize their travel decisions and to reduce costs and (2) help authorities with advanced traffic management strategies to mitigate congestion. With the availability of high-resolution traffic data from intelligent transportation systems (ITS), traffic flow prediction has been increasingly addressed using data-driven approaches [1]. In this regard, traffic flow prediction is a time series problem: estimating the flow count at a future time based on the data collected over previous periods from one or more observation locations. The prediction models in the literature can be broadly divided into parametric and nonparametric models.

For parametric analysis, linear time series models have been widely applied. An early study in 1979 investigated the application of the autoregressive integrated moving-average (ARIMA) model to short-term traffic forecasting [2]. The results suggested that the ARIMA model could be used for one-time-interval prediction. Levin and Tsao found that ARIMA (0, 1, 1) was the most statistically significant for both local and express lane prediction [3]. Since then many researchers have focused on the ARIMA model and its extensions. A hybrid method combining Kohonen maps and the ARIMA model was developed to forecast traffic flow at horizons of half an hour and one hour, and its performance was found to be superior to that of ARIMA alone [4]. In addition, many other modified ARIMA models have been examined, such as subset ARIMA [5], ARIMA with explanatory variables [6], and seasonal ARIMA [7]. Another widely used method to predict traffic volume is the filtering approach, such as the Kalman filter model. Okutani and Stephanedes introduced Kalman filtering theory into this field, and their results indicated improved performance [8]. Based on Kalman filter theory, there have also been modifications [9] and hybrid models [10].

Due to the stochastic and nonlinear characteristics of traffic flow, it is difficult to overcome the limitations of parametric models. Nonparametric machine learning methods have therefore become increasingly popular. Artificial neural networks (NN) have been commonly utilized for this problem and can be regarded as the general pattern of machine learning application in traffic engineering [11]. Smith and Demetsky developed a NN model which was compared with traditional prediction methods, and their results suggest that the NN outperforms other models during peak conditions [12]. Dougherty et al. studied the back-propagation neural network (BPNN) for the prediction of traffic flow, speed, and occupancy, and the results show some promise [13]. Since then, NN approaches have commonly been used for traffic flow forecasting [14–16]. In addition, many hybrid NN models have been proposed to improve performance [17–20]. Other nonparametric models have also been studied, such as k-nearest neighbor (k-NN) models [21] and support vector regression [22].

With the fast development of ITS, it is possible to access huge amounts of traffic data as well as multisource environmental data. However, both typical parametric and nonparametric models tend to make assumptions that ignore additional influencing factors, owing to their shallow architectures, inability to deal with big data, and inefficient training strategies. Deep learning, a type of machine learning, has been developed in recent years to address this problem. Deep learning has shown its advantages in pattern recognition [23, 24] and classification [25, 26]. For traffic flow prediction, Huang et al. proposed a deep belief network (DBN) architecture for multitask learning, and experimental results show that the DBN could achieve about 5% improvement over the state of the art [27]. Another study used a stacked autoencoder model (SAE) to implement prediction considering explicitly the spatial and temporal correlations [28]. These papers are the pioneers in applying deep learning to traffic flow prediction.

Furthermore, to capture the time series characteristics in the training and prediction steps, another branch of deep learning models, the recurrent neural network (RNN), is designed to cope with time series prediction problems [29, 30]. For traffic data, a kind of time series data, the RNN can use memory cells to save the temporal information from previous time intervals [31, 32]. One of the most well-known RNN variants is the long short-term memory (LSTM) model, which can automatically adjust some hyperparameters and can capture the long temporal features of the input data [33]. The LSTM has been introduced into the traffic information prediction field, and experiments have shown promising results [34, 35]. The DBN and LSTM are introduced in detail below to represent the basic deep learning method and the advanced recurrent NN, respectively.

However, in most studies, the predictions have not considered weather factors as a multisource input. Whether deep learning can improve prediction accuracy under more realistic conditions with additional data sources, such as rainfall intensity, is an important issue that warrants investigation.

With regard to the impact of rainfall, there is general consensus that it significantly affects traffic flow characteristics and leads to congestion and accidents. Without a comprehensive understanding of the weather influence on traffic flow, traffic management authorities cannot consider relevant factors in operational policies to improve traffic efficiency and safety [36]. For rainfall-integrated traffic flow prediction using machine learning methods, Dunne and Ghosh combined the stationary wavelet transform and a BPNN to develop a predictor that chooses between a dry model and a wet model depending on whether rainfall is expected in the prediction hour [37]. Their results show that the rainfall-integrated predictor improves prediction accuracy during rainfall events. Deep learning tools provide a promising avenue to incorporate the impacts of rainfall in traffic flow prediction.

In this paper, a systematic procedure to predict traffic flow considering rainfall impact using deep learning methods is presented. The main contribution of this work is to investigate whether deep learning models can outperform typical methods when given multisource data. This study also aims to compare the advanced recurrent NN with the basic deep learning NN for the prediction of time series traffic flow. In addition, temporal and spatial characteristics are included through multiple sets of traffic detectors. In this regard, the features of traffic flow under various rainfall conditions can be learned after training with local arterial traffic and rainfall intensity data from Beijing, China. The methodology and results can facilitate improving traffic modeling, operation, and management performance. The findings are significant in that they quantify the prediction accuracy attainable with additional nontraffic data and can serve as a reference for the further consideration of other environmental factors.

Section 2 reviews the literature relating to the DBN, LSTM, and rainfall impact analysis. Section 3 develops the rainfall-integrated DBN (R-DBN) and the rainfall-integrated LSTM (R-LSTM) models with multisource data input considering the additional rainfall intensity. In Section 4, experiments are presented and the results are discussed. Finally, Section 5 provides a summary, concluding remarks, and future research directions.

2. Research Background

2.1. Deep Belief Network

The DBN is a type of deep neural network with many hidden layers and a large number of hidden units in each layer. The typical DBN is equivalent to a stack of Restricted Boltzmann Machine (RBM) models with an output layer on top. The DBN uses a fast, greedy unsupervised learning algorithm to train the RBMs and a supervised fine-tuning method to adjust the network with labeled data [38].

Each RBM has a visible layer $v$ and a hidden layer $h$, connected by undirected weights. For the stack of RBMs in the DBN, the hidden layer of one RBM is regarded as the visible layer of the next RBM. The parameter set of an RBM is $\theta = \{W, b, c\}$, where $w_{ij}$ is the weight between $v_i$ and $h_j$, and $b$ and $c$ are the biases of the visible and hidden layers. An RBM defines its energy as

$$E(v, h; \theta) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_{i,j} v_i w_{ij} h_j,$$

its joint probability distribution of $v$ and $h$ as

$$P(v, h; \theta) = \frac{\exp(-E(v, h; \theta))}{Z(\theta)}, \qquad Z(\theta) = \sum_{v, h} \exp(-E(v, h; \theta)),$$

and the marginal probability distribution of $v$ as

$$P(v; \theta) = \frac{1}{Z(\theta)} \sum_h \exp(-E(v, h; \theta)).$$

To obtain the optimal $\theta$ for a single data vector $v$, the gradient of the log-likelihood can be calculated from the marginal distribution above, given as

$$\frac{\partial \log P(v; \theta)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}},$$

where $\langle \cdot \rangle$ denotes the expectation under the distribution specified by the subscript that follows.

Because there are no connections between the units in the same layer, $\langle v_i h_j \rangle_{\mathrm{data}}$ is easy to obtain by calculating the conditional probability distributions, given as

$$P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_i v_i w_{ij}\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j w_{ij} h_j\Big),$$

where the activation function $\sigma(\cdot)$ is the sigmoid function.

Because $\langle v_i h_j \rangle_{\mathrm{model}}$ is intractable to compute exactly, the contrastive divergence (CD) learning method can be utilized, which uses reconstruction to minimize the difference of two Kullback-Leibler (KL) divergences. CD learning has proved to be practical and efficient and reduces the computational cost compared with the typical Gibbs sampling method [38–40].
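
As a minimal illustration of CD learning (a sketch, not the authors' implementation; the learning rate and variable names are assumptions), a single CD-1 update for a binary RBM can be written in numpy as follows.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.01):
    """One contrastive divergence (CD-1) step for a binary RBM.

    v0 : (batch, n_visible) training batch
    W  : (n_visible, n_hidden) weights; b, c : visible/hidden biases
    """
    # Positive phase: hidden probabilities and samples driven by the data
    ph0 = sigmoid(v0 @ W + c)
    h0 = (np.random.rand(*ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step to reconstruct the visible layer
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # Approximate gradient: <v h>_data - <v h>_reconstruction
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / v0.shape[0]
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```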

The weights in the DBN layers are trained with unlabeled data using the aforementioned fast, greedy unsupervised method. For prediction purposes, a supervised layer is added on top of the DBN to adjust the learned features with labeled data using an up-down fine-tuning algorithm. In this study, a fully connected layer is selected as the top layer, also using the sigmoid activation function.
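
To make the composition concrete, the following numpy sketch (an illustration assuming binary sigmoid units, not the authors' code) shows how the pretrained RBM weights and the supervised top layer form the prediction network that is then fine-tuned.

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def dbn_forward(x, rbm_params, top_W, top_b):
    """Propagate input x through the stacked, pretrained RBMs and the
    fully connected sigmoid output layer used for supervised fine-tuning.

    rbm_params   : list of (W, c) pairs from each RBM, bottom to top
    top_W, top_b : weights and bias of the output layer
    """
    h = x
    for W, c in rbm_params:
        h = sigmoid(h @ W + c)          # hidden activation of each RBM
    return sigmoid(h @ top_W + top_b)   # supervised prediction layer
```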

2.2. Long Short-Term Memory

The RNN models can take the time series characteristics into consideration, using a recurrent mechanism to save the temporal and spatial states of previous intervals. However, some early models, such as the time-delay neural network (TDNN) [31], fail to describe the long-term correlation of data. The training and application of an RNN to predict traffic information usually involve data with long time lags. In this regard, the LSTM model was developed to overcome this shortcoming.

The success of the LSTM relies on its ability to learn the time series characteristics and to determine several hyperparameters automatically from the data [33–35]. The LSTM model is composed of one input layer, one hidden layer, and one output layer. The hidden layer is regarded as a memory block with memory cells to memorize the long and short temporal features. Three gates are developed, namely, the input gate, output gate, and forget gate, to control the input, output, and continual time series characteristics, respectively. The state of the memory cell is recorded and recurrently self-connected, as shown in Figure 1. The input data, output data, and memory information of the hidden layer at interval $t$ are denoted as $x_t$, $h_t$, and $c_t$, respectively; $i_t$, $o_t$, and $f_t$ represent the outputs of the input gate, output gate, and forget gate, and $c_t$ the memory cell state.

The LSTM model is governed by the equations shown below:

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i),$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f),$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g(W_{xc} x_t + W_{hc} h_{t-1} + b_c),$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o),$$
$$h_t = o_t \odot \phi(c_t),$$

where $\sigma(\cdot)$ is the sigmoid activation function, $g(\cdot)$ and $\phi(\cdot)$ are the centered logistic sigmoid functions for the cell input and output, $W$ denotes the weight matrices, $b$ denotes the bias vectors, and $\odot$ denotes element-wise multiplication.

Unlike DBN training, labeled data are utilized to train the LSTM network in a back-propagation (BP) manner using the gradient descent optimization method. Owing to the recurrent mechanism, the time series information can be learned by the memory cell in the LSTM without specifying the time lags from previous intervals [34].
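
To illustrate the equations above, a single forward step of the memory block can be sketched in numpy as follows (an illustrative sketch only, with tanh standing in for the centered logistic activations $g$ and $\phi$, and with the weight matrices and biases collected in a dictionary p as an assumption).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of the peephole LSTM memory block described above."""
    i_t = sigmoid(x_t @ p["W_xi"] + h_prev @ p["W_hi"] + c_prev * p["w_ci"] + p["b_i"])
    f_t = sigmoid(x_t @ p["W_xf"] + h_prev @ p["W_hf"] + c_prev * p["w_cf"] + p["b_f"])
    c_t = f_t * c_prev + i_t * np.tanh(x_t @ p["W_xc"] + h_prev @ p["W_hc"] + p["b_c"])
    o_t = sigmoid(x_t @ p["W_xo"] + h_prev @ p["W_ho"] + c_t * p["w_co"] + p["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```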

2.3. Rainfall Impact

The reason we chose to consider rainfall intensity as the additional data source is that there is a general consensus that it significantly affects traffic flow characteristics. Conclusions from an early study implied that rainfall weather has negative impacts on the traffic flow system [41]. Since then, many papers have focused on studying the rainfall impact on principal road traffic characteristics such as capacity and operating speed. It is noted that both road capacity and vehicle speed will influence the traffic flow count. Thus, it is reasonable to consider this weather factor when implementing prediction using a deep learning method.

Previous studies examining the influence of weather were limited by the lack of data. In recent years, however, with access to high-resolution data from traffic agencies and weather stations, an increasing number of investigations have focused on the quantitative rainfall impacts on the traffic system. The speed-flow-occupancy relationship under adverse weather conditions was examined using dummy variables within a multiple regression model, and the results indicate that the capacity loss is marginal in light rain but about 15% in heavy rain [42]. A study estimated the rainfall impact on freeway road capacity and operating speed under different rainfall categories and found that the capacity reductions in heavy rain are greater than those recommended by the HCM2000 [43, 44]. More specifically, an investigation found that rainfall conditions of 0–0.01 in/h, 0.01–0.25 in/h, and >0.25 in/h cause 2%, 7%, and 14% capacity reductions and 2%, 4%, and 6% speed reductions, respectively [45]. For urban arterial flow count reductions, using data from the Tokyo Metropolitan Expressway, a study found that rainfall decreases flow by 4–7% in light rain and by up to 14% during heavy rain [46]. For Belgium, a paper focused on the effect of weather conditions on daily traffic flow and observed heterogeneity of the weather effects between different traffic count locations and homogeneity of the weather effects on upstream and downstream traffic at specific locations [47]. More recent works have focused on the development and calibration of generalized rainfall impact models, which have yielded good results for traffic management and simulation systems [48, 49].

3. Methodology

For accurate traffic flow prediction, we propose two deep learning methods to learn the effective features of traffic flow and rainfall data. A previous study based on local data in Beijing concluded that rainfall significantly affects traffic flow characteristics, including road capacity and operating speed [50]. The reduced road capacity and slower vehicle speed produce differences in flow count between weather conditions; in other words, even when all other factors remain the same, traffic flow counts will differ under different rainfall conditions. In most previous studies, this factor was not included in the input vector, resulting in inappropriate features learned by the predictors and inaccurate prediction outcomes.

In this study, together with the traffic flow from previous intervals, the rainfall intensity data for the corresponding previous intervals are added to both the training and the testing data sets. In the training stage, the feature of rainfall impact on traffic flow can be learned to reflect the reductions of flow under various rainfall conditions. Consequently, in the test stage, the model produces better predictions compared to those that neglect this weather feature [37].
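
As a sketch of how such rainfall-integrated samples can be assembled (the variable names, window length, and exact target definition are illustrative assumptions, not the authors' preprocessing code), consider:

```python
import numpy as np

def build_samples(flow, rain, lags=12, horizon=5):
    """Build rainfall-integrated training samples.

    flow    : 1-D array of flow counts per 2-minute interval
    rain    : 1-D array of rainfall intensity aligned to the same intervals
    lags    : number of previous intervals used as input
    horizon : intervals ahead to predict (e.g., 5 -> 10 minutes)
    """
    X, y = [], []
    for t in range(lags, len(flow) - horizon):
        # input vector = past flow counts concatenated with past rainfall
        X.append(np.concatenate([flow[t - lags:t], rain[t - lags:t]]))
        y.append(flow[t + horizon - 1])
    return np.asarray(X), np.asarray(y)
```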

Based on the input layer, a stack of RBMs is developed to extract the features of the data using the fast, greedy unsupervised learning method. The hidden layer of each RBM is regarded as the visible layer of the next RBM. Each RBM is trained separately on unlabeled data by the CD learning method to reconstruct its input. A fully connected output layer is developed on top of the RBMs to fine-tune the entire architecture with labeled data using an up-down algorithm. Unlike typical pattern recognition or classification tasks, the traffic flow data in this study are from one arterial, collected by four sequential detector sets which have spatial-temporal relationships. Thus, it is feasible to put the four related prediction tasks together to share the trained features and weights and to avoid unreasonable bias. This rainfall-integrated DBN is denoted as R-DBN, and its structure is shown in Figure 2.

Similar to the structure of the R-DBN, the input data of the R-LSTM also contain the rainfall data for the corresponding previous intervals along with the flow data. The model has one hidden layer acting as the memory block, which recurrently saves the time series characteristics of the traffic data. The network is trained with labeled data in a BP manner using the gradient descent optimization method. The structure of the R-LSTM is shown in Figure 3.
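
A minimal sketch of such an R-LSTM predictor is given below using Keras (an illustration only; the layer width, training settings, and variable names are assumptions, with each time step carrying the flow count and rainfall intensity as two input features).

```python
from tensorflow import keras

# X_train: (samples, lags, 2) with [flow, rainfall] per time step (assumed)
# y_train: flow count to be predicted for the target horizon (assumed)
model = keras.Sequential([
    keras.layers.LSTM(64, input_shape=(X_train.shape[1], X_train.shape[2])),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=50, batch_size=64, validation_split=0.1)
flow_pred = model.predict(X_test)
```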

The deep learning methodology adopted in this paper is as follows.

Step 1. Use the traffic data and rainfall data to prepare the training and test data sets for 10-minute and 30-minute prediction.

Step 2. Use the architecture optimization data set to decide the optimal R-DBN architecture parameters, including the input dimension, the number of hidden layers, the number of hidden units per layer, and the number of training epochs.

Step 3. Use the training data set to train the R-DBN and R-LSTM based on the optimal architectures.

Step 4. Use the test data set to examine prediction performance. Compare the results with the rainfall-integrated BPNN (R-BPNN) and the rainfall-integrated ARIMA (R-ARIMA). The regular DBN, LSTM, BPNN, and ARIMA without rainfall consideration are also considered as benchmarks.

4. Experiments

4.1. Data Description

An arterial segment from Deshengmen to Madian connecting the 2nd and 3rd Ring Roads in Beijing, China, is selected as the test site. A map of the test area is shown in Figure 4; the arterial has three lanes in each direction. This 2.1-kilometer arterial is representative of corridors in Beijing, which helps ensure that the model is transferable to other areas. Four sets of detectors are distributed in both directions. Traffic flow count data are recorded at 2-minute intervals and archived by the Beijing Traffic Management Bureau (BTMB). For weather data in the corresponding urban area, hourly rainfall intensity is recorded by the National Meteorological Center (NMC) of the China Meteorological Administration (CMA). The flow count data collected by the detectors from the three lanes are aggregated into one data point. The study period covers June to August 2013 and June to August 2014. The data from July 8 to July 14, 2013, are selected as the test set, the June 2013 data are used for model architecture optimization, and the rest of the 2013 and 2014 data form the training set. Most of the available data are used for training to ensure accuracy. The test week is selected because it contains four rainy days (July 8, 9, 10, and 14). It is noted that this study uses only one direction of the arterial as an experimental demonstration. The aggregated traffic flow count data for the four detector sets during the test week are shown in Figure 5, where the horizontal axis is time in 10-minute units and the vertical axis is the traffic flow count. The daily patterns of traffic flow count can be observed.

To evaluate the performance of the predictors, three performance measurements are selected: the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the root mean square error (RMSE), defined as

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \hat{x}_i \right|,$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{x_i - \hat{x}_i}{x_i} \right| \times 100\%,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2},$$

where $x_i$ and $\hat{x}_i$ are the observed and predicted flow data for interval $i$ and $n$ is the test sample size. The units of MAE and RMSE are both vehicles per hour (veh/h).
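
For reference, these three measures can be computed directly as in the following numpy sketch.

```python
import numpy as np

def evaluate(observed, predicted):
    """Return MAE (veh/h), MAPE (%), and RMSE (veh/h) of a prediction."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = observed - predicted
    mae = np.abs(err).mean()
    mape = np.abs(err / observed).mean() * 100.0
    rmse = np.sqrt((err ** 2).mean())
    return mae, mape, rmse
```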

4.2. Prediction Architecture

The architecture of the DBN predictor depends on several principal parameters, including the number of previous time intervals used for prediction, the number of hidden layers, the number of units in each hidden layer, and the number of epochs for training an RBM. To decide the model architecture, the number of previous time intervals is varied from 1 to 15 (2 minutes to 30 minutes), and the number of hidden layers from 1 to 10. The units in each RBM layer are chosen from 100 to 1000 in steps of 100, and the epochs are set from 10 to 50 in steps of 10. In this paper, the DBN predictor is used to predict the next 10-minute and 30-minute traffic flow. For each prediction horizon, a grid search is implemented over the architecture optimization data set to decide the most effective architecture for the predictor. The parameter choices of the R-DBN and DBN are listed in Table 1.
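
A schematic of this grid search is sketched below; train_and_score is a hypothetical helper standing in for training a candidate R-DBN with the given parameters on the architecture optimization set and returning its validation error.

```python
from itertools import product

lags_grid   = range(1, 16)            # previous intervals (2 to 30 minutes)
layers_grid = range(1, 11)            # number of stacked RBMs
units_grid  = range(100, 1001, 100)   # hidden units per layer
epochs_grid = range(10, 51, 10)       # RBM training epochs

best = None
for lags, layers, units, epochs in product(lags_grid, layers_grid, units_grid, epochs_grid):
    score = train_and_score(lags, layers, units, epochs)  # hypothetical helper
    if best is None or score < best[0]:
        best = (score, lags, layers, units, epochs)

print("best (error, lags, layers, units, epochs):", best)
```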

From Table 1, it is observed, somewhat surprisingly, that for all the cases the optimal number of hidden layers does not exceed 3; in other words, a DBN with three RBMs is sufficient for Beijing traffic flow analysis. For the number of previous intervals, it is noted that more intervals are needed to predict traffic flow over a longer horizon. For example, more than 20 previous intervals are required for a 30-minute prediction, while only 12 or 14 intervals are needed for a 10-minute horizon. Regarding the units in each hidden layer, the number should be neither too small nor too large. Furthermore, the optimal number of epochs ranges from 10 to 40. Using the grid search, the optimal DBN architecture is determined for arterial traffic flow prediction in Beijing. These results can provide references for future research and application.

In terms of the R-LSTM and LSTM, only one hidden layer is utilized in this paper, and the relevant number of previous time intervals is determined automatically by the model. Thus, a comparable architecture search for these hyperparameters is unnecessary.

4.3. Result Discussion

The test data set is from July 8 to July 14, which as noted earlier includes four rainy days. For comparison, several other predictors are selected, including the R-BPNN and R-ARIMA. The regular DBN, BPNN, and ARIMA without rainfall consideration are also implemented as benchmarks. For the BPNN and ARIMA, a grid search is also implemented because of the structural uncertainty introduced by the additional rainfall input. The BPNN has one hidden layer, with 400 and 400 units with rainfall consideration and 400 and 300 units without rainfall consideration, for the 10-minute and 30-minute horizons, respectively. For ARIMA, the best structure is ARIMA (1, 1, 2) with rainfall input and ARIMA (1, 2, 3) without rainfall input.
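
As an illustration of how these benchmark models can be fitted (a sketch with statsmodels rather than the authors' original implementation; the series names are assumptions), the selected orders correspond to:

```python
import statsmodels.api as sm

# train_flow / train_rain: training flow counts and rainfall intensities (assumed)
# test_flow / test_rain: the corresponding test-week series (assumed)
arima = sm.tsa.ARIMA(train_flow, order=(1, 2, 3)).fit()                      # regular ARIMA
r_arima = sm.tsa.ARIMA(train_flow, exog=train_rain, order=(1, 1, 2)).fit()   # R-ARIMA

flow_fc = arima.forecast(steps=len(test_flow))
r_flow_fc = r_arima.forecast(steps=len(test_flow), exog=test_rain)
```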

The MAE, MAPE, and RMSE between observed and predicted data are presented in Table 2. For all the predictors, the results show better performance for the 10-minute horizon than for the 30-minute horizon on all measurements. This finding is expected and consistent with other studies in which larger errors are observed for longer prediction horizons. In comparison with the predictors without rainfall input, the R-DBN and R-LSTM perform better for both time horizons, indicating that the accuracy of deep learning can be improved with multisource data inputs. This emphasizes the significance of using the rainfall-integrated model for traffic flow prediction. The rainfall feature can be learned by the deep models and then utilized for forecasting, indicating that the R-DBN and R-LSTM have the ability to handle multisource data and learn the underlying patterns of traffic flow under various rainfall conditions. Furthermore, the LSTM family performs better than the DBN family, both with and without the rainfall factor, which demonstrates the advantage of the LSTM in capturing the time series characteristics of traffic data.

For the comparison with other existing methods, we choose the BPNN and ARIMA to represent a shallow machine learning model and a typical parametric model, respectively. From Table 2, it is noted that these two rainfall-integrated shallow models are even less accurate than the LSTM and DBN without rainfall input. Furthermore, the BPNN has better accuracy than ARIMA both with and without rainfall consideration. Another interesting observation is that the R-BPNN works better than the regular BPNN, while the R-ARIMA is worse than the regular ARIMA. This indicates that the ability to handle multisource data is limited for a time series model like ARIMA. These results imply that, with additional data such as weather factors, deep learning can also outperform shallow machine learning and typical parametric predictors.

The performance of the R-LSTM is shown in Figure 6, where the horizontal axis is time in 10-minute units and the vertical axis is the traffic flow count. It can be observed that the predicted data describe the real situation well, both with and without rainfall influence.

5. Conclusion

In this paper, we investigate whether deep learning models can outperform traditional methods on traffic flow data when the rainfall factor is considered. This study also compares the advanced recurrent NN with the basic deep learning NN for the prediction of time series traffic data. For this purpose, we propose two deep learning methods, the R-DBN and the R-LSTM models. The DBN consists of a stack of RBMs and combines unsupervised pretraining with supervised fine-tuning. The LSTM uses a memory block with memory cells to memorize the long and short temporal features, which yields better performance for the prediction of time series data. The data used are from an arterial in Beijing, together with rainfall intensity data for the corresponding period. Test experiments show that the R-DBN and R-LSTM can effectively discover the features of traffic flow under various rainfall conditions, and the new models outperform the R-BPNN and R-ARIMA and also yield improvements over the original deep learning models without rainfall consideration. In addition, the LSTM family outperforms the DBN family in describing the time series characteristics of traffic data. In summary, the results highlight the importance of incorporating weather impacts and the significant improvements attainable in prediction accuracy. This work suggests that deep learning is promising with multisource data inputs.

For future work, more deep learning methods should be investigated, and the consideration of additional environmental factors would also be worthwhile.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.