Abstract

Traffic flow prediction is becoming increasingly crucial in Intelligent Transportation Systems. Accurate prediction results are a precondition for traffic guidance, management, and control. To improve prediction accuracy, this paper proposes a spatiotemporal traffic flow prediction method that combines k-nearest neighbor (KNN) with a long short-term memory network (LSTM), called the KNN-LSTM model. KNN is used to select the neighboring stations most related to the test station and thereby capture the spatial features of traffic flow. LSTM is used to mine the temporal variability of traffic flow: a two-layer LSTM network predicts traffic flow separately at each selected station. The final prediction is obtained by result-level fusion with the rank-exponent weighting method. Prediction performance is evaluated with real-time traffic flow data provided by the Transportation Research Data Lab (TDRL) at the University of Minnesota Duluth (UMD) Data Center. Experimental results indicate that the proposed model outperforms well-known prediction models, including autoregressive integrated moving average (ARIMA), support vector regression (SVR), wavelet neural network (WNN), deep belief networks combined with support vector regression (DBN-SVR), and LSTM models, with an average accuracy improvement of 12.59%.

1. Introduction

The accurate prediction of future traffic conditions (e.g., traffic flow, travel speed, and travel time) is a crucial requirement for Intelligent Transportation Systems (ITS): it helps administrators take adequate preventive measures against congestion and helps travelers make better-informed decisions. Among the different applications in ITS, traffic flow prediction has attracted significant attention over the past few decades, and it remains a challenging topic for transportation researchers.

Due to the stochastic characteristics of traffic flow, accurate traffic prediction is not a straightforward task. To deal with this issue, many techniques have been deployed to model the evolution of traffic circulation. Existing prediction schemes fall roughly into three categories: parametric methods, nonparametric methods, and hybrid methods. Parametric methods include the Autoregressive Integrated Moving Average method (ARIMA) [1], the Seasonal Autoregressive Integrated Moving Average method (SARIMA) [2, 3], and the Kalman filter [4, 5]. Parametric methods are widely used in traffic flow prediction, but they are sensitive to differences in traffic data across situations. Nonparametric methods include artificial neural networks (ANNs) [6-9], k-nearest neighbor (KNN) [10-14], support vector regression (SVR) [15, 16], and Bayesian models [17, 18]. Compared with parametric methods, nonparametric methods achieve better prediction performance, but they require a large amount of historical data and a training process. Hybrid methods mainly combine a parametric approach with a nonparametric approach [19-29]. Although the prediction accuracy of nonparametric and hybrid methods is superior to that of parametric methods, all these methods mainly consider data close to the prediction station, which cannot fully reveal the spatiotemporal characteristics of traffic flow data. Vlahogianni et al. [30] summarized existing traffic flow prediction algorithms from 2004 to 2013. Suhas et al. [31] conducted a systematic study to aggregate previous work on traffic prediction, highlight marked changes in trends, and provide research directions for future work. Lana et al. [32] summarized the latest technical achievements in the traffic prediction field, along with an insightful update of the main technical challenges that remain unsolved. Readers interested in the details of the models applied in traffic prediction can refer to these review papers.

With the widespread use of traditional traffic sensors and newly emerging traffic sensor technologies, a tremendous number of traffic sensors have been deployed on the existing road network, and a large volume of historical traffic data at very high spatial and temporal resolution has become available. It is a challenge to handle such big traffic data with conventional parametric methods. Most nonparametric methods, in turn, are shallow in architecture and cannot capture the deep correlations and implicit information in traffic data. Recently, deep learning, an emerging machine learning method, has drawn a lot of attention from both the academic and industrial fields, and traffic flow prediction based on deep learning has become a new trend.

Huang et al. [33] proposed a deep architecture for traffic flow prediction with deep belief networks (DBN) and multitask learning. Lv et al. [34] used a stacked autoencoder (SAE) model to learn generic traffic flow features. Duan et al. [35] evaluated the performance of the SAE model for traffic flow prediction at daytime and nighttime. Soua et al. [36] proposed a DBN-based approach to predict traffic flow from historical traffic flow, weather data, and event-based data; an extension of Dempster-Shafer evidence theory was used to fuse the traffic prediction beliefs coming from the stream data and event-based data models. Koesdwiady et al. [37] predicted traffic flow and weather data separately using DBN and merged the two predictions with data fusion techniques. Yang et al. [38] proposed a stacked autoencoder Levenberg-Marquardt model to improve prediction accuracy, with the Taguchi method used to optimize the model structure. Zhou et al. [39] introduced an adaptive boosting scheme for the stacked autoencoder network. Polson and Sokolov [40] developed a deep learning model to predict traffic flows, whose architecture combines a linear model fitted with regularization and a sequence of tanh layers. Zhang and Huang [41] employed a genetic algorithm to find the optimal hyperparameters of DBN models. In recent years, the recurrent neural network (RNN) has proved more practical than other deep learning structures for processing sequential data. Ma et al. [42] utilized a deep Restricted Boltzmann Machine and RNN architecture to model and predict traffic congestion. However, traditional RNNs suffer from vanishing and exploding gradients. The long short-term memory network (LSTM) was proposed to solve this problem. Because LSTM can automatically determine the optimal time lags and capture the features of time series over longer time spans, it can achieve better performance in traffic flow prediction. Ma et al. [43] developed an LSTM to capture the long-term temporal dependency of traffic sequences. Shao and Soong [44] utilized LSTM to learn more abstract representations of nonlinear traffic flow data. Although LSTM has been very successful in traffic flow prediction in recent years, the spatiotemporal characteristics of traffic flow have hardly been considered. Zhao et al. [45] proposed an origin-destination correlation matrix to represent the correlations of different links within the road network and used a cascade-connected LSTM to predict traffic flow. However, the architecture of that LSTM model was overly complicated and hard to comprehend, and its prediction results were not very stable or reliable across observation points.

In this paper, inspired by the successful application of LSTM in traffic flow prediction, the high spatiotemporal correlation of traffic flow data is exploited to improve prediction performance. A hybrid traffic flow prediction methodology based on KNN and LSTM is proposed. KNN is used to choose the neighboring stations most related to the test station. A multilayer LSTM is applied to predict traffic flow at each selected station. The final prediction is obtained by weighting the predicted values of all selected stations, with weights assigned by adjusting the weight dispersion measure of the rank-exponent method. The experimental results show that the proposed method is more accurate than most existing traffic prediction methods.

The main contributions of this paper are summarized as follows.

(1) A hybrid traffic flow prediction methodology combining KNN with LSTM is proposed, which utilizes the spatiotemporal characteristics of traffic flow data. Experimental results demonstrate that the proposed approach achieves an average accuracy improvement of 12.59% compared with ARIMA, SVR, WNN, DBN-SVR, and LSTM models.

(2) The prediction results are obtained by weighting the predicted values of all selected stations, with the weight dispersion measure adjusted by the rank-exponent method. Different from traditional weighting methods, the proposed method emphasizes the contribution of highly relevant stations to the prediction result.

(3) Classical understanding holds that stations closer to the prediction station are more strongly correlated with it than stations farther away. The experiments show that some farther stations are also correlated with the prediction station, while some closer stations are not selected. This is nevertheless consistent with the general fact that upstream and downstream traffic flow has a great influence on the prediction result.

The rest of this paper is organized as follows. Section 2 gives details of the hybrid traffic prediction method based on KNN and LSTM. Section 3 introduces the dataset used for the numerical experiments and the performance indexes. Section 4 presents the results and performance evaluation. Finally, conclusions and future research are given in Section 5.

2. Methodology

2.1. LSTM Network

RNN is a neural network specialized for processing time sequences. Different from conventional networks, RNN allows a “memory” of previous inputs to persist in the network's internal state, which can then influence the network output. Traditional RNN exhibits a superior capability for modeling nonlinear time sequence problems, such as speech recognition, language modeling, and image captioning. However, traditional RNN cannot be trained on time sequences with long time lags. LSTM was proposed to overcome this disadvantage. LSTM is a special kind of RNN designed to learn long-term dependencies. The LSTM architecture consists of a set of memory blocks. Each block contains one or more self-connected memory cells and three gates: the input gate, the forget gate, and the output gate. The typical structure of an LSTM memory block with one cell is shown in Figure 1. The input gate takes new input from outside and processes the newly arriving data. The forget gate decides when to forget the previous state and thus selects the optimal time lag for the input sequence. The output gate takes all the calculated results and generates the output of the LSTM cell.

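Let us denote the input time series as $x = (x^1, x^2, \ldots, x^T)$, where $T$ is the input sequence length. $I$ is the number of inputs, $H$ is the number of cells in the hidden layer, and $C$ is the number of memory cells. The subscripts $\iota$, $\phi$, and $\omega$ refer to the input gate, forget gate, and output gate, respectively. $w_{ij}$ is the weight of the connection from unit $i$ to unit $j$. $a_j^t$ is the network input to unit $j$ at time $t$, and $b_j^t$ is the value after the activation function in the same unit. $s_c^t$ is the state of cell $c$ at time $t$. $f$ is the activation function of the gates, and $g$ and $h$ are, respectively, the cell input and output activation functions. The LSTM model can be described by the following equations.

Input Gates

$$a_\iota^t = \sum_{i=1}^{I} w_{i\iota} x_i^t + \sum_{h=1}^{H} w_{h\iota} b_h^{t-1} + \sum_{c=1}^{C} w_{c\iota} s_c^{t-1} \tag{1}$$

$$b_\iota^t = f(a_\iota^t) \tag{2}$$

Forget Gates

$$a_\phi^t = \sum_{i=1}^{I} w_{i\phi} x_i^t + \sum_{h=1}^{H} w_{h\phi} b_h^{t-1} + \sum_{c=1}^{C} w_{c\phi} s_c^{t-1} \tag{3}$$

$$b_\phi^t = f(a_\phi^t) \tag{4}$$

Cells

$$a_c^t = \sum_{i=1}^{I} w_{ic} x_i^t + \sum_{h=1}^{H} w_{hc} b_h^{t-1} \tag{5}$$

$$s_c^t = b_\phi^t s_c^{t-1} + b_\iota^t g(a_c^t) \tag{6}$$

Output Gates

$$a_\omega^t = \sum_{i=1}^{I} w_{i\omega} x_i^t + \sum_{h=1}^{H} w_{h\omega} b_h^{t-1} + \sum_{c=1}^{C} w_{c\omega} s_c^{t} \tag{7}$$

$$b_\omega^t = f(a_\omega^t) \tag{8}$$

Cell Outputs

$$b_c^t = b_\omega^t h(s_c^t) \tag{9}$$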

Through the action of the different gates, the LSTM network is capable of handling arbitrary time lags in time sequences with long-term dependencies.
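As a rough illustration of the gate computations described above, the following sketch steps a single memory block with one cell and one scalar input through a short sequence. The function name `lstm_step`, the weight dictionary `params` and its values, and the choice of the logistic sigmoid for the gates and tanh for the cell input and output activations are assumptions made for illustration; this is not the trained network used in this paper.

```python
import math


def sigmoid(x):
    """Logistic function, assumed here as the gate activation."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, b_prev, s_prev, p):
    """One forward step of a single-cell LSTM memory block with peephole weights.

    x_t: scalar input at time t
    b_prev: cell output at time t-1
    s_prev: cell state at time t-1
    p: dict of hypothetical weights keyed "<source>_<target>"
    """
    # Input gate: sees the input, the previous output, and the previous state.
    b_in = sigmoid(p["x_in"] * x_t + p["h_in"] * b_prev + p["c_in"] * s_prev)
    # Forget gate: same inputs, its own weights.
    b_fg = sigmoid(p["x_fg"] * x_t + p["h_fg"] * b_prev + p["c_fg"] * s_prev)
    # Cell: keep part of the old state and add the gated new candidate.
    a_c = p["x_c"] * x_t + p["h_c"] * b_prev
    s_t = b_fg * s_prev + b_in * math.tanh(a_c)
    # Output gate: peeks at the updated state s_t rather than s_prev.
    b_out = sigmoid(p["x_out"] * x_t + p["h_out"] * b_prev + p["c_out"] * s_t)
    # Cell output.
    b_t = b_out * math.tanh(s_t)
    return b_t, s_t


# Hypothetical weights, all set to 0.5 purely for demonstration.
params = {key: 0.5 for key in (
    "x_in", "h_in", "c_in",
    "x_fg", "h_fg", "c_fg",
    "x_c", "h_c",
    "x_out", "h_out", "c_out",
)}

b, s = 0.0, 0.0
for x in [0.2, 0.5, 0.9]:
    b, s = lstm_step(x, b, s, params)
```

The state `s` carries information across steps, while the output `b` stays inside (-1, 1) because it is a sigmoid-gated tanh of the state.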

2.2. KNN Algorithm

The KNN algorithm is a nonparametric method used for classification and regression. It searches a database for data that are similar to the current data; the data found are called the nearest neighbors of the current data. In this paper, KNN is used to select the neighboring stations most related to the test station. Suppose there are $M$ stations in the road network. $X = (x^1, x^2, \ldots, x^N)$ is the historical traffic flow data at the test station, where $N$ is the sample data length. $X_m = (x_m^1, x_m^2, \ldots, x_m^N)$ is the historical traffic flow data at the $m$th station, which is different from the test station. The Euclidean distance [see (10)] is used to measure the correlation between the test station and the other stations.

$$d_m = \sqrt{\sum_{n=1}^{N} \left(x^n - x_m^n\right)^2} \tag{10}$$

According to the calculated distances, the K nearest neighbors are found, and these K stations are selected as the stations most related to the test station.
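The station selection described above can be sketched in a few lines. The station IDs and flow values below are made up for illustration, and the helper names `euclidean_distance` and `select_stations` are hypothetical; the sketch only shows ranking candidate stations by Euclidean distance to the test station and keeping the closest K.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equally long flow series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_stations(test_series, candidates, k):
    """Return the IDs of the k candidate stations closest to the test station.

    candidates maps a station ID to its historical flow series.
    """
    ranked = sorted(
        candidates,
        key=lambda sid: euclidean_distance(test_series, candidates[sid]),
    )
    return ranked[:k]


# Made-up flow samples (vehicles per 5 min) for the test station and 3 others.
test_flow = [120, 150, 180, 160]
neighbour_flows = {
    "A": [118, 149, 183, 158],
    "B": [60, 70, 90, 80],
    "C": [125, 140, 170, 165],
}

selected = select_stations(test_flow, neighbour_flows, k=2)
```

A smaller distance is read as a stronger correlation, so station "B", whose flow level differs most from the test station, is the one left out here.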

2.3. Proposed Method

Different from the conventional LSTM network, the proposed method first uses the KNN algorithm to select stations that are spatiotemporally correlated with the test station. A two-layer LSTM network is then applied to predict traffic flow separately at each selected station. The final prediction at the test station is obtained by weighting with the rank-exponent method. At time $t$, the traffic flow data at the test station is denoted as $x^t$. The traffic flow data for the stations near the test station is denoted as

$$X^t = \{x_1^t, x_2^t, \ldots, x_K^t\},$$

where $x_k^t$ is the traffic flow at the $k$th station selected by KNN. The predicted traffic flow at the selected stations and the test station can be calculated as

$$\hat{y}_k^{t+1} = W_k h_k^t + b_k, \tag{11}$$

where $h_k^t$ is the output of the LSTM hidden layer for station $k$, $W_k$ is the weight matrix between the hidden layer and the output layer, and $b_k$ is the bias term. The final prediction at the test station is obtained by weighting according to (12).

$$\hat{y}^{t+1} = \sum_{k=1}^{K} \alpha_k \hat{y}_k^{t+1}, \tag{12}$$

where $\alpha_k$ is the weight coefficient. The rank-exponent method of weights is used in this paper. It provides some flexibility by adjusting the weight dispersion measure, as shown in (13). The value of $p$ is set to 2 as indicated by the authors [46].

$$\alpha_k = \frac{(K - r_k + 1)^p}{\sum_{j=1}^{K} (K - r_j + 1)^p}, \tag{13}$$

where $r_k$ is the rank of the selected station, $K$ is the total number of selected stations, and $p$ is the weight dispersion measure.
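To make the weighting concrete, here is a minimal sketch assuming the standard rank-exponent form, in which the station of rank r among K selected stations receives a weight proportional to (K - r + 1) raised to the power p. The function names `rank_exponent_weights` and `fuse` and the three per-station predictions are hypothetical values chosen for illustration.

```python
def rank_exponent_weights(k, p=2.0):
    """Weights for ranks 1..k; rank 1 is the most related station.

    Assumed form: weight_r is proportional to (k - r + 1) ** p, normalised to sum to 1.
    """
    raw = [(k - r + 1) ** p for r in range(1, k + 1)]
    total = sum(raw)
    return [value / total for value in raw]


def fuse(predictions, p=2.0):
    """Result-level fusion of per-station predictions, ordered by rank."""
    weights = rank_exponent_weights(len(predictions), p)
    return sum(w * y for w, y in zip(weights, predictions))


# With k = 3 and p = 2 the raw scores are 9, 4, 1, so the weights are 9/14, 4/14, 1/14.
weights = rank_exponent_weights(3)

# Made-up next-step predictions from the three selected stations, best rank first.
fused = fuse([100.0, 110.0, 90.0])
```

Setting p to 0 gives equal weights, and larger p concentrates the weight on the top-ranked stations, which is how the dispersion measure emphasizes the most relevant stations.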

The flowchart of the proposed method is shown in Figure 2, and the detailed calculation process is shown as follows.

Step 1. Calculate the Euclidean distance between the neighboring stations and the test station according to (10).

Step 2. Select the $K$ stations most related to the test station.

Step 3. Predict traffic flow with the LSTM network separately at each selected station according to (11).

Step 4. Weight the prediction values of the selected stations according to (12).

Step 5. Calculate the RMSE of the predicted traffic flow.

Step 6. Repeat Steps 2-5 with different values of $K$.

Step 7. Find the smallest RMSE over all the different values of $K$.

Step 8. Obtain the predicted traffic flow at the test station for the $K$ with the smallest RMSE.
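The eight steps above can be sketched end to end as follows. To keep the example self-contained, the two-layer LSTM is replaced by a hypothetical stand-in, `persistence_forecast`, which simply predicts the previous value; the flow series, station names, and the assumption that the test station itself is ranked first (its distance to itself is zero) are illustrative only.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_exponent_weights(k, p=2.0):
    raw = [(k - r + 1) ** p for r in range(1, k + 1)]
    total = sum(raw)
    return [value / total for value in raw]


def rmse(actual, predicted):
    return math.sqrt(
        sum((y - yh) ** 2 for y, yh in zip(actual, predicted)) / len(actual)
    )


def persistence_forecast(series):
    """Stand-in for the LSTM: the forecast for step t is the value at t-1."""
    return series[:-1]


def search_best_k(test_series, neighbours, max_k):
    # Step 1: rank all stations by distance to the test station.
    stations = {"test": test_series}
    stations.update(neighbours)
    ranked = sorted(stations, key=lambda sid: distance(test_series, stations[sid]))
    actual = test_series[1:]
    best = None
    for k in range(1, max_k + 1):            # Step 6: try each K
        chosen = ranked[:k]                   # Step 2: K most related stations
        forecasts = [persistence_forecast(stations[sid]) for sid in chosen]  # Step 3
        weights = rank_exponent_weights(k)
        fused = [                             # Step 4: rank-exponent fusion
            sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(len(actual))
        ]
        error = rmse(actual, fused)           # Step 5
        if best is None or error < best[1]:   # Step 7: keep the smallest RMSE
            best = (k, error, fused)
    return best                               # Step 8


test_flow = [100, 120, 140, 130, 110]
neighbour_flows = {"up": [102, 118, 143, 128, 111], "far": [40, 45, 50, 48, 42]}
best_k, best_rmse, best_fused = search_best_k(test_flow, neighbour_flows, max_k=3)
```

In a real experiment the stand-in predictor would be replaced by a trained LSTM per station, and the RMSE would be computed on held-out data rather than on the series used for ranking.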

3. Experiments

3.1. Data Description

The data used to evaluate the performance of the proposed model was collected by mainline detectors and provided by the Transportation Research Data Lab (TDRL) at the University of Minnesota Duluth (UMD) Data Center, covering March 1st, 2015, to April 30th, 2015. The sampling period of the dataset was 5 min. We selected the road network in Figure 3 as the experiment area. The area mainly contains four expressways, numbered I394, I494, US169, and TH100. There are 36 stations in the experiment area; their locations and IDs are shown in Figure 3. Stations S339 and S448 are located near a transportation hub in the road network and were therefore selected as the test stations for traffic flow prediction. Because traffic flow on the same workday is similar across weeks, we used data from a single workday as training and test data to ensure prediction stability. We chose the traffic flow data on Tuesdays, although any workday from Monday to Friday could be chosen. The dataset contains 9 Tuesdays of traffic flow data in total and was divided into two parts: the first 8 days were used as the training sample, and the remaining day was used as the testing sample for measuring prediction performance. The most commonly used prediction interval is 5 min; we also set the prediction interval to 5 min, and the experimental results verify that this choice is reasonable.

Traffic flows at station S339 for 5 consecutive Tuesdays are shown in Figure 4, and typical traffic flows at station S339 and four neighboring stations are shown in Figure 5. Figure 4 shows small differences during rush hours, but the profiles of the traffic flows are basically consistent. Figure 5 shows some differences between stations, but the data distributions of the neighboring stations are similar to that of station S339. Because traffic flow data has high spatiotemporal correlation, exploiting these correlations is an effective way to improve traffic prediction accuracy.

3.2. Performance Indexes

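To evaluate prediction performance, the Root Mean Square Error (RMSE), which was the most frequently used metric of prediction performance in previous work, and the prediction accuracy (ACC) were chosen to measure the difference between the actual and predicted values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{14}$$

$$\mathrm{ACC} = 1 - \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{y_i} \tag{15}$$

where $n$ is the length of the prediction data and $y_i$ and $\hat{y}_i$ are the measured and predicted values for the $i$th validation sample, respectively.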
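The two indexes can be computed as in the sketch below. RMSE follows its usual definition; for ACC the sketch assumes one minus the mean absolute relative error, which requires nonzero measured values. The sample numbers are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(actual)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(actual, predicted)) / n)


def acc(actual, predicted):
    """Assumed accuracy definition: 1 - mean(|y - y_hat| / y); y must be nonzero."""
    n = len(actual)
    return 1.0 - sum(abs(y - yh) / y for y, yh in zip(actual, predicted)) / n


# Made-up measured and predicted flows for three validation samples.
measured = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

error = rmse(measured, forecast)     # sqrt((100 + 100 + 0) / 3)
accuracy = acc(measured, forecast)   # 1 - (0.10 + 0.05 + 0.00) / 3
```

RMSE is expressed in the same units as the flow and penalizes large errors, while ACC is a relative measure, so the two can rank models slightly differently.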

4. Results and Discussions

4.1. Results Analysis

In our experiment, stations S339 and S448, which are located in the two directions of the road network, are chosen as the test stations. The number of timesteps is an important hyperparameter: it sets the input size of the model and determines the number of LSTM blocks in each layer. Experiments show that prediction performance is best when the number of timesteps is set to 6. To validate the efficiency of the proposed method, its performance is compared with several representative approaches, including the ARIMA model, SVR, wavelet neural network (WNN), DBN, and LSTM. For the ARIMA baseline, which is fitted in its seasonal form (SARIMA), the AR and MA orders are set to 5 and 4, and the normal and seasonal differencing orders are set to 1 and 2. For the SVR model, the kernel function is the Radial Basis Function (RBF), the penalty parameter of the error term is 300, and the iteration number is 1000. For the WNN model, the number of hidden nodes is 6, the learning rate is 0.001, and the iteration number is 500. For the DBN model, a 3-layer architecture is used, and the number of nodes in each layer is set to 128 for simplicity.
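The role of the timesteps hyperparameter can be illustrated with a sliding-window sketch: each training sample consists of 6 consecutive 5-min observations, and the target is the observation that follows. The helper name `make_windows` and the choice to build windows within a single day are assumptions for illustration; a day of 5-min samples has 24 x 60 / 5 = 288 points.

```python
def make_windows(series, timesteps=6):
    """Split a flow series into (input window, next value) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - timesteps):
        inputs.append(series[start:start + timesteps])
        targets.append(series[start + timesteps])
    return inputs, targets


# Placeholder series standing in for one day of 5-min flow samples (288 points).
day = list(range(288))

X, y = make_windows(day, timesteps=6)
# Each of the 282 samples uses the previous 30 minutes to predict the next 5 minutes.
```

A larger window gives the network more history per sample but fewer samples per day, which is one reason the value is tuned experimentally.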

The predicted results of the different models and the real traffic flow within one day are shown in Figures 6 and 7. The predicted traffic flow follows patterns similar to the real traffic flow, and the predictions of the proposed KNN-LSTM model almost coincide with the measured data, especially in the morning and evening peak hours. The RMSE and ACC of the different models for stations S339 and S448 are shown in Table 1. The proposed method has the minimum RMSE. The average ACC of the proposed method is 95.75%, an improvement of 28.92%, 8.31%, 14.44%, 6.95%, and 4.32% over the other models. The traditional ARIMA model has the worst prediction performance because it assumes that traffic flow data is a stationary process, which is not always true in reality. The SVR and WNN methods achieve better RMSE and ACC than the ARIMA model, but they are weaker than the deep learning methods. The DBN model also has no obvious advantage over SVR.
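The reported figures can be cross-checked with simple arithmetic. The sketch below assumes that the five improvements are differences in percentage points and that they are listed in the same order as the comparison models (ARIMA, SVR, WNN, DBN, LSTM); both are assumptions, although the LSTM value agrees with the 91.43% accuracy quoted in the discussion.

```python
# Average ACC of the proposed KNN-LSTM model, in percent.
proposed_acc = 95.75

# Reported improvements, assumed to be in the order the baselines are listed.
gains = {"ARIMA": 28.92, "SVR": 8.31, "WNN": 14.44, "DBN": 6.95, "LSTM": 4.32}

# Mean improvement over the five baselines: 62.94 / 5.
mean_gain = sum(gains.values()) / len(gains)

# Implied baseline accuracies if the gains are percentage-point differences.
baseline_acc = {model: round(proposed_acc - gain, 2) for model, gain in gains.items()}
```

The mean of the five gains rounds to 12.59, matching the average improvement stated in the abstract.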

4.2. Discussions

In this paper, KNN is used to select the stations most related to the test station. Different values of $K$ give different prediction performance. We search over all possible values of $K$ and take the value with the minimum RMSE as optimal. The optimal $K$ is 10 for station S339 in our experiment, and the IDs of the selected stations are S339, S340, S341, S321, S337, S342, S338, S344, S336, and S293. The optimal $K$ is 6 for station S448, and the IDs of the selected stations are S448, S447, S446, S450, S737, and S452. As shown in Figure 3, almost all of the selected stations are located upstream or downstream of the test stations. Classical understanding holds that stations closer to the prediction station are more strongly correlated with it than stations farther away. In fact, some farther stations are also correlated with the prediction station: the closer station S343 is not selected for test station S339, and the closer station S451 is not selected for test station S448. This is nevertheless consistent with the general fact that upstream and downstream traffic flow has a great influence on the prediction result. When $K = 1$, only the temporal correlation is considered, and the average ACC is 91.43%, which is 4.32% lower than that of the proposed method. This indicates that spatiotemporal features play an important role in traffic prediction. These results verify the superiority and feasibility of KNN-LSTM, which employs KNN to capture spatial features and LSTM networks to mine temporal regularity.

5. Conclusions

In this paper, we proposed a spatiotemporal traffic flow prediction method that combines KNN and LSTM. KNN is used to select the neighboring stations most related to the test station, which reflect its spatiotemporal correlations. An LSTM network is applied to predict traffic flow separately at each selected station. LSTM is able to exploit the long-term dependency in traffic flow data and discover the latent feature representations hidden in the traffic flow, which yields better prediction performance. The final prediction at the test station is obtained by weighting with the rank-exponent method. We evaluated the performance of our model with real traffic data provided by TDRL and compared it with ARIMA, SVR, WNN, DBN, and LSTM models. The results show that the proposed model is superior to the other methods. Since traffic flow is affected by weather, incidents, and other factors, the impact of these factors will be studied further to improve prediction accuracy.

Data Availability

The data used in this paper were collected by mainline detectors and are provided by the Transportation Research Data Lab (TDRL) at the University of Minnesota Duluth (UMD) Data Center (http://www.d.umn.edu/~tkwon/TMCdata/TMCarchive.html). Researchers who need these data can download them from the website.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was partly supported by the National Key R&D Program of China (2018YFC0808706) and the National Natural Science Foundation of China (Grant no. 5157081053). The authors are also grateful to the UMD Data Center (TDRL) for providing the data.