Special Issue: Learning Methods for Urban Computing and Intelligence
Multistep-Ahead Prediction of Urban Traffic Flow Using GaTS Model
Mathematical models of traffic flow, built with statistical and machine learning methods, have been widely investigated for many applications, such as transportation planning and easing traffic pressure. However, many challenging problems remain. In this research, we mainly focused on three issues: (a) traffic flow data are nonnegative, so finding a proper probability distribution is essential; (b) the complex stochastic properties of traffic flow lead to nonstationary variance, i.e., heteroscedasticity; and (c) multistep-ahead prediction of traffic flow often performs poorly. To this end, we developed a Gamma distribution-based time series (GaTS) model. First, we transformed the original traffic flow observations into nonnegative real-valued data by using the Box-Cox transformation. Then, by specifying a generalized linear model with the Gamma distribution, the mean and variance of the distribution are regressed on past data and homochronous terms. The Bayesian information criterion is used to select the proper Box-Cox transformation coefficients and the optimal model structures. Finally, the proposed model is applied to urban traffic flow data collected in Dalian, China. The results show that the proposed GaTS has excellent prediction performance and represents the nonstationary stochastic properties well.
As the main driving force of development, traffic has significant effects on the flow of production factors and on daily life in the urban system. The intelligent transportation system (ITS) can effectively provide innovative services for different modes of transport and for traffic management [1, 2], such as transportation planning, traffic pressure easing, and traffic accident evaluation. It makes transport networks better informed, more coordinated, and more efficient for various users. ITS requires reliable real-time prediction of traffic information. Thus, accurate and timely traffic prediction is a challenging task that has attracted increasing attention.
Traffic flow exhibits complex, stochastic dynamics, so its analysis and prediction depend mainly on real-time or historical traffic data. As we show later, urban traffic flow data form a nonstationary stochastic process with heterogeneous variance. Time series models are therefore preferred for traffic flow prediction. On the other hand, control operations, such as variable speed limits (VSLs), are typically embedded in ITS. This suggests that prediction models in ITS should have concise structures that lend themselves naturally to control and operation design. Our study therefore focuses on developing a data-driven time series model that can predict the nonstationary distribution of traffic flow and has a concise structure suitable for stochastic control design in ITS.
To construct a statistical model for the nonnegative traffic flow data, we first investigate which probability distribution is suitable for describing the uncertainty of traffic flow. By detecting change points in the 24-hour traffic flow, we divide it into four groups whose distributions are distinguishable from one another. Based on the characteristics of the four groups, we propose Gamma distribution-based time series (GaTS) models motivated by the generalized linear model (GLM).
We take the original observations or their Box-Cox transformations [9, 10] as the response variables, which are treated as random variables generated by a Gamma distribution-based stochastic process. Moreover, we extract the homochronous term, i.e., the mean of the observations at the same time of day on previous days, from the historical observations and use it as an explanatory variable.
We use the Bayesian information criterion (BIC) to select the optimal model structure. In this way, the proposed model can predict not only the mathematical expectation but also the nonstationary variance from past observations. Furthermore, the homochronous term makes our model highly accurate in multistep-ahead prediction. Finally, real data collected in Dalian, China, are used to validate the performance of the proposed GaTS. The computational results on these real-world data indicate that the homochronous term helps to enhance multistep-ahead prediction performance. Moreover, GaTS has a linear structure, which makes it efficient and convenient for further control design in ITS.
The rest of this paper is organized as follows. In Section 2, we review studies on short-term traffic flow prediction. In Section 3, we present the data and the GaTS methodology, which models the stochastic process generating the traffic flow data and serves as the building block for prediction. In Section 4, we discuss the experimental results. Finally, Section 5 presents concluding remarks.
2. Literature Review
Over the past few decades, many mathematical models have been developed for traffic flow prediction by using statistical and machine learning methods. Regression-type models, including autoregressive models and support vector regression (SVR), have been used as parametric methods. Nonlinear models, such as artificial neural network (ANN) models, have also been applied to traffic flow prediction. Besides these parametric methods, nonparametric models, including the k-nearest neighbour (KNN) model, have also been constructed.
Among the regression-type parametric models, autoregressive integrated moving average (ARIMA) models have been widely used for predicting traffic flow [11–17]. Extensions of ARIMA have also been studied for traffic flow prediction. The space-time autoregressive integrated moving average model was proposed to capture the interrelationships among road links. Stathopoulos and Karlaftis designed a model for predicting traffic congestion on the basis of a multivariate time series state-space model. SVR has also been used for traffic prediction [20–23]. These regression models mainly focus on predicting the trend (mathematical expectation) of traffic flow and ignore its dispersion (variance).
ANN models are among the most commonly used nonlinear models in artificial intelligence. Most researchers have either applied ANNs with different architectures to the traffic prediction problem or treated an ANN model as a baseline for comparing a wide variety of classification methods [7, 24–28]. More recently, many researchers in the field have integrated ANNs with preprocessing methods, such as fuzzy methods, to improve their performance [29–32]. Wang et al. designed a traffic flow prediction model by integrating a fuzzy ANN with the Taguchi method, which they used to determine the number of sensors along the roadside, and demonstrated the benefits of the information collected by the detectors. Along similar lines, Quek et al. applied a fuzzy-based ANN to short-term traffic flow estimation; their results indicated that the model performed promisingly compared with a feed-forward (FF) ANN trained by backpropagation. ANNs and deep learning networks have achieved excellent performance on the traffic flow prediction problem [35–37]. However, their complex structures make it relatively difficult to build further traffic control designs on them.
Effectively modelling traffic flow variance can produce more accurate confidence intervals for short-term traffic flow forecasts and thus improve prediction reliability. Because the generalized autoregressive conditional heteroscedasticity (GARCH) model can describe the time-varying volatility structure of time series data, Kamarianakis et al. used it to predict the conditional variance of speed, with an ARIMA model as the mean equation. Similarly, GARCH was used to predict the time-dependent variance of 15-min traffic volume on the basis of a seasonal ARIMA model [39, 40]. Furthermore, Tsekeris and Stathopoulos used a fractionally integrated asymmetric power GARCH model, with an autoregressive fractionally integrated moving average model as the mean equation, for traffic volatility prediction and found that this combined model outperformed the ARIMA-GARCH model. Because of the stochastic characteristics of traffic flow series, Tsekeris and Stathopoulos also proposed another volatility model, the stochastic volatility model, for predicting urban traffic variability. Their evaluation showed that the stochastic volatility model produced more accurate forecasts of speed variance than GARCH.
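The conditional-variance recursion at the heart of these GARCH-type models can be sketched for the basic GARCH(1,1) case, $h_t = \omega + \alpha e_{t-1}^2 + \beta h_{t-1}$, where $e_t$ are the residuals of a mean equation such as ARIMA. This is a minimal illustration only: the function name and the parameter values in the usage lines are hypothetical, and the cited studies use richer variants (seasonal, fractionally integrated, asymmetric power).

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Illustrative GARCH(1,1) filter (hypothetical helper, not the cited authors' code).

    Returns conditional variances h_t = omega + alpha * e_{t-1}**2 + beta * h_{t-1}
    for the residuals e_t of some mean equation (e.g. ARIMA).
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    if not residuals:
        return []
    # Start the recursion at the unconditional variance omega / (1 - alpha - beta).
    h = [omega / (1.0 - alpha - beta)]
    for e in residuals[:-1]:
        h.append(omega + alpha * e * e + beta * h[-1])
    return h


# Hypothetical parameters: a large residual at t = 1 raises the variance at t = 2.
h = garch11_variance([0.0, 2.0, 0.0], omega=0.1, alpha=0.2, beta=0.7)
print([round(v, 2) for v in h])  # [1.0, 0.8, 1.46]
```

A large shock at time t−1 raises $h_t$, which is how such models widen forecast intervals after volatile periods.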
KNN is an essential method in the family of nonparametric methods. It predicts new data from a number of similar historical samples without formulating an explicit model [43, 44], which makes it most beneficial when little prior knowledge is available. Owing to its simplicity and good performance, KNN is increasingly popular in traffic prediction. Yu et al. proposed a KNN regression model for multiple-time-step prediction, using parameters measured every minute by a loop detector. Hou et al. presented a KNN-based model for short-term traffic flow prediction. The major limitation of the algorithm is that it requires tremendous computational resources when the amount of historical data is massive. The algorithm is also sensitive to outliers in the archived data.
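The idea behind KNN forecasting can be illustrated with a minimal sketch: the most recent window of observations is compared with every historical window of the same length, and the values that followed the k closest windows are averaged. The function name, the Euclidean distance, and the plain (unweighted) average are illustrative assumptions, not the settings of the cited studies.

```python
import math


def knn_forecast(history, window, k, horizon=1):
    """Hypothetical KNN forecaster (Euclidean distance, unweighted mean).

    Predicts the value `horizon` steps after the end of `history` by averaging
    the values that followed the k historical windows closest to the latest window.
    """
    query = history[-window:]
    candidates = []
    # Only windows whose successor `horizon` steps later lies inside the history.
    for i in range(len(history) - window - horizon + 1):
        pattern = history[i:i + window]
        target = history[i + window + horizon - 1]
        candidates.append((math.dist(pattern, query), target))
    if len(candidates) < k:
        raise ValueError("not enough history for the requested k, window and horizon")
    candidates.sort(key=lambda pair: pair[0])
    return sum(target for _, target in candidates[:k]) / k


# A strictly periodic toy series: after ...2, 3, 4 the next value is always 1.
print(knn_forecast([1.0, 2.0, 3.0, 4.0] * 5, window=3, k=2))  # 1.0
```

The loop over all historical windows also makes the computational cost noted above concrete: each forecast scans the entire archive.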
Besides the works mentioned above, Kalman filtering [47, 48], advanced kernel regression techniques [49, 50], and mixtures of multivariate Gaussian processes have also been used to predict traffic flow.
3.1. Data Description and Transformation
This research focuses on urban traffic flow data. The data under study were collected on Donglian Road, located in an important business centre of Dalian, China. Microwave radar vehicle detectors recorded the traffic flow every 15 minutes from 00:15 on Jan. 5, 2016, to 00:00 on Jan. 15, 2016, giving a total of 960 samples (Figure 1). As shown in Figure 1, the data exhibit clear periodicity, which suggests that information from the corresponding period on past days can help prediction. This motivates us to improve multistep-ahead prediction by using the homochronous term.
Figure 2 summarizes the boxplots for each sampling time of day across the 10 days. The box ranges at each sampling time show that the variance of the traffic flow is small at night and large in the daytime. Thus, the traffic data have time-varying variance, i.e., heteroscedasticity. Furthermore, we divide the data into four segments by using a change point detection method for periodic time series. Figure 3 shows the histograms of the four segments. Figures 3(a)–3(c) suggest that distributions for positive random variables, such as the log-normal and Gamma distributions, can be used. However, Figure 3(d), the histogram of the data collected from midnight to early morning, has only a single right tail. It cannot be approximated by a log-normal density, which always tends to zero at the origin, whereas a Gamma density with a shape parameter no greater than one decreases monotonically from the origin. Thus, we use the Gamma distribution to build the time series model.
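The per-time-of-day comparison behind Figure 2 can be sketched by treating the series as consecutive days of 96 fifteen-minute slots and computing the spread of each slot across days. The helper name is hypothetical, and using the sample variance instead of the box ranges plotted in Figure 2 is our simplifying assumption.

```python
import statistics

SLOTS_PER_DAY = 96  # 24 h sampled every 15 min


def variance_by_slot(flow, slots_per_day=SLOTS_PER_DAY):
    """Across-day sample variance of each time-of-day slot (hypothetical helper).

    `flow` is a flat list of consecutive counts covering whole days,
    e.g. 960 values for 10 days of 15-min data.
    """
    if len(flow) % slots_per_day:
        raise ValueError("series must cover whole days")
    n_days = len(flow) // slots_per_day
    if n_days < 2:
        raise ValueError("need at least two days to estimate a variance")
    return [
        statistics.variance([flow[d * slots_per_day + s] for d in range(n_days)])
        for s in range(slots_per_day)
    ]
```

For a heteroscedastic series of the kind shown in Figure 2, the returned values would be small for night-time slots and large for daytime slots.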
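Furthermore, we use the Box-Cox transformation to obtain a suitable positive real-valued time series:

$$y_t^{(\lambda)} = \begin{cases} \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt] \log y_t, & \lambda = 0. \end{cases} \tag{1}$$

We use BIC to select the proper $\lambda$.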
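Equation (1) and its inverse, which is needed to map predictions back to the original scale, can be sketched as follows. The log-Jacobian helper is our addition: likelihoods (and hence BIC values) computed on data transformed with different $\lambda$ are only comparable once this term is included, but how the comparison is handled here is not stated, so treat it as an assumption.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y, as in equation (1)."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inv_box_cox(z, lam):
    """Inverse Box-Cox transform, used to return predictions to the original scale."""
    if lam == 0:
        return math.exp(z)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("z lies outside the range of the transform for this lambda")
    return base ** (1.0 / lam)


def box_cox_log_jacobian(ys, lam):
    """Assumed correction term: sum of log|d y^(lam)/dy| = (lam - 1) * sum(log y).

    Adding it to a log-likelihood computed on transformed data expresses that
    log-likelihood on the original scale, so different lambdas can be compared.
    """
    return (lam - 1.0) * sum(math.log(y) for y in ys)
```

Because the inverse transform is monotonically increasing, interval endpoints predicted on the transformed scale can be mapped back one by one.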
3.2. Model Structure
Because the transformed $y_t^{(\lambda)}$ is positive real-valued, we assume that $y_t^{(\lambda)}$ obeys a Gamma distribution. Let $f(y \mid \mu_t, \sigma_t)$ denote its probability density function, with $\mu_t$ and $\sigma_t$ being the location and scale parameters, respectively. The conditional probability density of $y_t^{(\lambda)}$ is then the Gamma density $f(y_t^{(\lambda)} \mid \mu_t, \sigma_t)$, whose normalizing constant involves the Gamma function $\Gamma(\cdot)$.
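The Gamma density can be parametrized in several ways. A minimal sketch, assuming the GAMLSS-style convention in which $\mu$ is the mean and $\sigma$ the coefficient of variation (our assumption, not stated above), so that the shape is $1/\sigma^2$ and the variance is $\sigma^2\mu^2$, is:

```python
import math


def gamma_logpdf(y, mu, sigma):
    """Gamma log-density with mean mu and coefficient of variation sigma.

    Assumed parametrization: shape a = 1 / sigma**2, scale s = sigma**2 * mu,
    so that E[y] = a * s = mu and Var[y] = a * s**2 = sigma**2 * mu**2.
    """
    if y <= 0 or mu <= 0 or sigma <= 0:
        raise ValueError("y, mu and sigma must be positive")
    a = 1.0 / sigma ** 2
    s = sigma ** 2 * mu
    # log of y**(a-1) * exp(-y/s) / (Gamma(a) * s**a)
    return (a - 1.0) * math.log(y) - y / s - a * math.log(s) - math.lgamma(a)
```

Under this convention, $\sigma \geq 1$ gives a shape no greater than one and hence a density that decreases from the origin, matching a single right-tailed histogram such as Figure 3(d).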
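The time-varying parameters $\mu_t$ and $\sigma_t$ imply that the stochastic process generating $y_t^{(\lambda)}$ is nonstationary. To predict such a nonstationary process from historical data, $\mu_t$ and $\sigma_t$ are each regressed on an explanatory vector built from past observations and homochronous terms, with a separate maximum time lag for each variable in the regressions for $\mu_t$ and $\sigma_t$. The explanatory vector uses only observations available $h$ steps before $t$, which means that (3) is used for $h$-step-ahead prediction. The homochronous term is the mean of the observations at the same time of day as $t$ on the five days preceding the day containing $t$. Then, for the data set $\mathcal{D} = \{y_1^{(\lambda)}, \dots, y_N^{(\lambda)}\}$, the likelihood can be formulated as

$$L(\theta \mid \mathcal{D}) = p\big(y_1^{(\lambda)}, \dots, y_m^{(\lambda)}\big) \prod_{t=m+1}^{N} f\big(y_t^{(\lambda)} \mid \mu_t, \sigma_t\big),$$

where $p(\cdot)$ is the joint distribution of the $m$ initial observations needed to form the first explanatory vector and $\theta$ is the set of unknown parameters. Because the initial joint distribution is not a function of $\theta$, the parameter set can be estimated by solving the maximum likelihood estimation problem

$$\hat{\theta} = \arg\max_{\theta} \sum_{t=m+1}^{N} \log f\big(y_t^{(\lambda)} \mid \mu_t, \sigma_t\big).$$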
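A minimal sketch of the homochronous term, the explanatory vectors, and the objective of the maximum likelihood problem is given below. All helper names are hypothetical, and several choices are our assumptions rather than the specification above: log links for $\mu_t$ and $\sigma_t$, a single lag order shared by both regressions, the homochronous term entering only the $\mu_t$ regression, and the GAMLSS-style Gamma parametrization (mean $\mu$, coefficient of variation $\sigma$).

```python
import math

SLOTS_PER_DAY = 96  # 15-min sampling gives 96 observations per day


def homochronous(y, t, days=5, slots_per_day=SLOTS_PER_DAY):
    """Mean of y at the same time of day on the `days` days before the day of t."""
    if t - days * slots_per_day < 0:
        raise ValueError("not enough history before t")
    return sum(y[t - d * slots_per_day] for d in range(1, days + 1)) / days


def design_row(y, t, h, n_lags, use_homochronous):
    """Hypothetical explanatory vector for predicting y[t] from data up to t - h:
    an intercept, n_lags lagged values and, optionally, the homochronous term
    (which only uses data at or before t - 96, so it is available for h <= 96)."""
    row = [1.0] + [y[t - h - k] for k in range(n_lags)]
    if use_homochronous:
        row.append(homochronous(y, t))
    return row


def gamma_logpdf(y, mu, sigma):
    """Gamma log-density, assuming mean mu and coefficient of variation sigma."""
    a, s = 1.0 / sigma ** 2, sigma ** 2 * mu
    return (a - 1.0) * math.log(y) - y / s - a * math.log(s) - math.lgamma(a)


def neg_log_likelihood(beta_mu, beta_sigma, y, h, n_lags, start):
    """Negative conditional log-likelihood summed over t >= start.

    Assumed log links: log mu_t uses lags plus the homochronous term,
    log sigma_t uses lags only. `start` must leave five days of history.
    """
    total = 0.0
    for t in range(start, len(y)):
        x = design_row(y, t, h, n_lags, use_homochronous=True)
        z = design_row(y, t, h, n_lags, use_homochronous=False)
        mu = math.exp(sum(b * v for b, v in zip(beta_mu, x)))
        sigma = math.exp(sum(b * v for b, v in zip(beta_sigma, z)))
        total -= gamma_logpdf(y[t], mu, sigma)
    return total
```

The series passed in would be the transformed data $y_t^{(\lambda)}$. The coefficients could then be estimated by minimizing this function with a general-purpose optimizer, for example `scipy.optimize.minimize` applied to the concatenated coefficient vector.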
3.3. Evaluation Criteria for Prediction Performance
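To test the prediction performance of the models comprehensively, we calculate several evaluation criteria. The mean absolute error (MAE) and root mean square error (RMSE) measure the scale of the prediction error:

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2},$$

where $\hat{y}_t$ is the prediction of $y_t$ and $n$ is the number of evaluated samples.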
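Because these two criteria depend on the scale of the data and therefore cannot be used to compare models across data sets, we also compute the coefficient of determination from the observations and the estimated values:

$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2}. \tag{7}$$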
Here, $\bar{y}$ is the sample mean of $y_t$. Equation (7) shows that $R^2$ does not account for time-varying variance, so it is not suitable for evaluating GaTS models with a time-varying scale. To address this, we use the adjusted coefficient of determination $R^2_{\mathrm{adj}}$, in which the observations are weighted according to the estimated scale parameter and $\bar{y}$ is replaced by the correspondingly weighted mean. Note that $R^2_{\mathrm{adj}} = R^2$ if the scale parameter is estimated as a constant.
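The criteria above can be sketched as follows. MAE, RMSE, and $R^2$ follow their standard definitions; for the adjusted coefficient, the weights $w_t = 1/\hat{\sigma}_t^2$ are our assumption, chosen because they reduce the weighted version to the ordinary $R^2$ whenever the scale parameter is constant, as the text requires.

```python
import math


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r2(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def weighted_r2(y, y_hat, sigma_hat):
    """Assumed adjusted R^2: weights w_t = 1 / sigma_hat_t**2 and weighted mean y_w."""
    w = [1.0 / s ** 2 for s in sigma_hat]
    y_w = sum(wi * a for wi, a in zip(w, y)) / sum(w)
    ss_res = sum(wi * (a - b) ** 2 for wi, a, b in zip(w, y, y_hat))
    ss_tot = sum(wi * (a - y_w) ** 2 for wi, a in zip(w, y))
    return 1.0 - ss_res / ss_tot
```

With a constant `sigma_hat`, all weights are equal and `weighted_r2` returns the same value as `r2`.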
The appropriate model structure, determined by the time lags, is crucial for prediction. Note that both coefficients of determination increase monotonically with the complexity of the model, so they cannot be used for model structure selection. Instead, $\mathrm{BIC} = -2\log L(\hat{\theta}) + k\log N$ is used to evaluate GaTS, where $k$ is the number of estimated parameters and $N$ is the number of observations. The optimal $\lambda$ for the Box-Cox transformation (illustrated in Figure 4) and the model structures with the minimum BIC are selected. For fairness, all evaluations across the different transformations are conducted on the original data: predictions of the transformed data are mapped back through the inverse of the Box-Cox transformation (1), and all evaluation criteria are computed from the original data.
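The selection rule amounts to fitting every candidate combination of $\lambda$ and lag orders, computing its BIC, and keeping the minimum. The sketch below assumes each candidate has already been fitted; the labels and numbers in the usage lines are hypothetical. When candidates with different $\lambda$ are compared, their log-likelihoods must refer to the same data scale, for instance by adding the Box-Cox log-Jacobian $(\lambda - 1)\sum_t \log y_t$.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC = -2 log L + k log N; smaller is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(fits):
    """Pick the candidate with minimum BIC.

    `fits` maps a candidate label, e.g. (lam, lag_mu, lag_sigma), to a tuple
    (log_likelihood, n_params, n_obs) obtained by fitting that candidate.
    """
    scores = {label: bic(*fit) for label, fit in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits: the richer model gains 2 in log-likelihood but costs 3 parameters.
best, scores = select_by_bic({"a": (-100.0, 3, 100), "b": (-98.0, 6, 100)})
print(best)  # a
```

The $k\log N$ penalty is what stops BIC from favouring ever more complex lag structures, unlike the coefficients of determination.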
4.1. Model Structure Setup
Table 1 shows the one- to four-step-ahead prediction results (from h = 1, i.e., 15 minutes, to h = 4, i.e., 60 minutes) obtained by three kinds of models. The GaTS models are defined by the autoregressive variable and the homochronous term; they differ in whether the scale parameter is time-invariant or time-varying and in whether the homochronous term is included. For the GaTS models, the proper values of λ for the Box-Cox transformation are selected by BIC, and Figure 4 shows the BIC values for the candidates. The comparison models are a normal distribution-based model, GaTS applied to the original data without the Box-Cox transformation, and a log-normal distribution-based time series model. The 2nd to 5th columns of Table 1 give the maximum lags of each variable selected by BIC; × means that the variable is not part of that model, and 0 means that the variable was not selected by BIC. The coefficients of determination are reported for both the training data and the testing data.
4.2. Expectation Prediction
The differences between the training and testing values are minimal, which suggests that all models are well estimated, with neither overfitting nor underfitting. Bold numbers are the best values for each evaluation criterion. They indicate that the GaTS models with time-varying scale parameters and homochronous terms are optimal for one- to four-step-ahead prediction, and the testing values confirm that these models have the best prediction performance. In these models, the regression structures selected for the scale parameter are simpler than those selected for the location parameter, which is similar to the results of previous work. The homochronous terms, that is, the mean traffic flow at the same time of day over the five successive days before the day containing the prediction time t, represent the periodicity of μ_t rather than that of σ_t. Therefore, as expected, BIC did not select the homochronous terms for the regression of σ_t.
We also construct normal distribution-based models, GaTS models without the Box-Cox transformation, and log-normal distribution-based models, all listed in Table 1. Comparing their BIC values for each prediction step h, we find that the GaTS models with the Box-Cox transformation attain the minimum BIC. This suggests that GaTS is better suited to these data than the models based on the other distributions.
4.3. Range Estimates
Figures 5 and 6 illustrate the range estimation results for Jan. 14, 2016, in which 95% confidence intervals (CIs) are obtained from the predicted μ_t and σ_t. In Figure 5, the green areas are obtained from the four selected GaTS models in Table 1 (one for each prediction step), and the yellow areas from the corresponding normal distribution-based models. Figure 5(a) shows that, for one-step-ahead prediction, GaTS yields CIs similar to those of the normal distribution-based models. However, Figures 5(b)–5(d) indicate that the GaTS-based models produce narrower ranges than the normal distribution-based models for multistep-ahead prediction. Furthermore, for multistep-ahead prediction, the normal distribution-based models even produce negative lower CI bounds when the traffic flow is small, which contradicts the fact that traffic flow is nonnegative.
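Why a Gamma-based interval cannot cross zero while a normal one can is easy to see numerically. The sketch below uses made-up predicted parameters for a low-flow night-time slot and assumes, as in GAMLSS, that μ is the mean and σ the coefficient of variation. The Gamma quantiles are estimated by Monte Carlo with the standard library rather than computed exactly, and both function names are hypothetical.

```python
import random
from statistics import NormalDist, quantiles


def normal_interval(mean, sd, level=0.95):
    """Central interval of a normal distribution with the given mean and sd."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sd, mean + z * sd


def gamma_interval_mc(mu, sigma, n_draws=20000, seed=0):
    """Central 95% interval of a Gamma with mean mu and coefficient of variation
    sigma (assumed parametrization: shape 1/sigma**2, scale sigma**2 * mu),
    estimated from Monte Carlo draws."""
    rng = random.Random(seed)
    shape, scale = 1.0 / sigma ** 2, sigma ** 2 * mu
    draws = [rng.gammavariate(shape, scale) for _ in range(n_draws)]
    cuts = quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Hypothetical night-time prediction: mean flow 8 vehicles per 15 min, CV 0.6,
# so both models share the same mean and standard deviation (0.6 * 8 = 4.8).
mu, sigma = 8.0, 0.6
print(normal_interval(mu, sigma * mu))  # lower bound is negative
print(gamma_interval_mc(mu, sigma))     # lower bound stays positive
```

Even with identical mean and variance, the normal interval spills below zero, whereas every Gamma draw, and hence every Gamma quantile, is positive.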
In Figure 6, the green areas are again obtained from the four selected GaTS models in Table 1, and the yellow areas from log-normal distribution-based models with structures identical to those of these four GaTS models. Comparing Figures 5 and 6, we can see that both the Gamma and the log-normal models are more stable than the models that use the normal distribution directly, from one- to four-step-ahead prediction. However, the log-normal distribution gives a wider range than the Gamma distribution in the daytime, which is inconsistent with the actual traffic flow state and would affect the estimation and prediction of traffic states by ITS. Furthermore, the CIs obtained by GaTS closely approximate the heteroscedasticity shown in Figure 2. Thus, GaTS achieves more reasonable CI predictions.
This research focused on predicting urban traffic flow. By specifying a GLM with the Gamma distribution, we proposed GaTS to predict the nonstationary stochastic process of traffic flow. The objective of GaTS is to predict the probability distribution of traffic flow in real time. To this end, the Gamma distribution captures the stochastic properties of nonnegative traffic flow data, including the single-tailed histogram of the midnight period; the GLM structure accommodates the heteroscedasticity of traffic flow; and the homochronous term improves the accuracy of multistep-ahead prediction of the mathematical expectation.
The traffic flow data in this research were collected in Dalian, a large port in northern China and a major destination for Chinese tourists. The concentration of many different types of crowds not only brings traffic and environmental problems but also makes the use of the region's urban public space complicated and conflicting. Because transportation infrastructure is relatively difficult to improve, we focus on developing intelligent control and management software to relieve traffic congestion. ITS needs more accurate historical information about, and predictions for, the road network. Moreover, to control and regulate traffic flow, an accurate model should also have a simple structure. Thus, for this purpose, the proposed GaTS is more suitable than ANN and deep learning models. In addition, a series of GaTS models can be extended to model the joint distribution for joint prediction across multiple sensors.
Several research topics merit further consideration. We successfully specified a GLM with the Gamma distribution for an urban zone; however, the probability distributions for other zones, such as highways, should be investigated further. Potential external factors that are related to traffic flow and can be governed by ITS should be investigated and embedded into GaTS. On the basis of GaTS with these external factors, a cost function for controlling and regulating traffic flow should be constructed, and a corresponding optimization solver should be developed.
The data that support the findings of this study are available from the ITS database of the traffic police department in Dalian, China. Restrictions apply to the availability of these data, which were used under license for the current study, so they are not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of the traffic police department.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This research was funded by the National Natural Science Foundation of China (Grant Nos. U1560102, 61633006, and 61502074).
G. E. P. Box and D. R. Cox, "An analysis of transformations," Journal of the Royal Statistical Society: Series B (Methodological), vol. 26, no. 2, pp. 211–252, 1964.
G. Shen, X. Kong, and X. Chen, "Short-term traffic flow intelligent hybrid forecasting model and its application," Journal of Control Engineering and Applied Informatics, vol. 13, no. 3, pp. 65–73, 2011.
J. Z. Zhu, J. X. Cao, and Y. Zhu, "Traffic volume forecasting based on radial basis function neural network with the consideration of traffic flows at the adjacent intersections," Transportation Research Part C: Emerging Technologies, vol. 47, pp. 139–154, 2014.
X. X. Wang, L. H. Xu, and K. X. Chen, "Data-driven short-term forecasting for urban road network traffic based on data processing and LSTM-RNN," Arabian Journal for Science and Engineering, vol. 44, no. 4, pp. 3043–3060, 2019.
C. Chen, J. Hu, and Q. Meng, "Short-time traffic flow prediction with ARIMA-GARCH model," in 2011 IEEE Intelligent Vehicles Symposium (IV), pp. 607–612, Baden-Baden, Germany, June 2011.
H. H. Xie, X. H. Dai, and Y. Qi, "Improved K nearest neighbor algorithm for short-term traffic flow prediction," Journal of Traffic and Transportation Engineering (English Edition), vol. 14, no. 3, pp. 87–94, 2014.
S. Sun, "Infinite mixtures of multivariate Gaussian processes," in 2013 International Conference on Machine Learning and Cybernetics, pp. 1011–1016, Tianjin, China, July 2013.
L. Guo and H. Li, "Effect of foreign trade and FDI on Dalian's upgrading of industrial structure: an analysis based on the empirical data in 1990-2010," Journal of Lanzhou Commercial College, vol. 5, p. 20, 2012.