Abstract

Since the toll-free freeway policy for holidays was implemented, freeway congestion during the holidays has become a common phenomenon. To alleviate traffic pressure, traffic flow prediction during the holidays has become a problem of great concern. This paper proposes a hybrid prediction methodology combining the discrete Fourier transform (DFT) with support vector regression (SVR). The common trend in the traffic flow data is extracted using DFT by setting an appropriate threshold, and it is predicted by extreme extrapolation of the historical trend. The SVR method is applied to predict the residual series. Experimental results with measured data collected from toll stations in Jiangsu province, China, show that the proposed algorithm is more accurate than traditional methods and is an efficient method for traffic flow prediction during the holidays.

1. Introduction

With the rapid development of the social economy, the existing road infrastructure, massive as it is, still fails to meet people's travel demands [1–5], and traffic congestion has become a common phenomenon on freeways. During the holidays in particular, freeways often turn into parking lots. To address holiday congestion, intelligent transportation systems (ITS) have been widely implemented. Traffic flow prediction is fundamental to ITS and is a precondition for traffic guidance, planning, and control. Traffic flow is affected by people's willingness to travel, the weather, and many other factors. Some policies also cause more dramatic changes in traffic flow, such as the toll exemption for cars under 7 seats during important holidays in China, which makes traffic flow prediction more difficult. Moreover, holiday travel is mostly for leisure and tourism, which changes the composition of traffic flow: passenger car flow increases markedly, while truck flow decreases to some extent. Traffic flow around scenic spots in particular increases dramatically, and occasional congestion often occurs, which makes the traffic flow data more stochastic. The traffic flow distribution during the holidays is clearly different from that on workdays. However, most existing prediction models target workdays, and relatively few target the holidays. To improve transportation operation efficiency, accurate traffic flow prediction during the holidays has become a problem of great concern.

Traffic flow prediction is a complex issue, and a large number of algorithms have been proposed in the past few years. Existing prediction methods can be classified into three categories: parametric, nonparametric, and hybrid approaches. The parametric approach includes the historical average (HA) method, the autoregressive integrated moving average method (ARIMA) [6, 7], the seasonal autoregressive integrated moving average method (SARIMA) [8–10], and the Kalman filter (KF) [11, 12]. The nonparametric approach includes artificial neural networks (ANNs) [13–17], k-nearest neighbor (KNN) [18–22], support vector regression (SVR) [23, 24], and the Bayesian model [25, 26]. The hybrid approach mainly combines the parametric approach with the nonparametric approach [27–35]. Vlahogianni et al. [36] summarized existing traffic flow prediction algorithms, and readers interested in the details of the models applied to traffic prediction can refer to their review paper.

Traffic flow data presents a certain regularity, which makes it possible to predict the traffic flow. Decomposing the traffic flow data into different components is a very effective way to improve prediction accuracy. Various methods exist to retrieve the different components in traffic flow data, including the simple average trend [37], principal component analysis [38], empirical mode decomposition (EMD) [39, 40], wavelet methods [41], spectral analysis methods [42, 43], and the low-pass filter [44]. Since traffic flow data during the holidays is characterized by stochastic and highly nonlinear patterns, the existing decomposition methods cannot achieve good performance, and it is difficult to apply traditional prediction methods. Owing to the economic environment, traffic flow increases almost every year. Although the annual growth rate is not constant and there is a certain stochastic fluctuation, the trend component of the traffic flow data follows the same pattern from year to year while increasing slightly. To improve the accuracy of traffic flow prediction, it is therefore necessary to extract the trend component from the traffic flow data. Li et al. [45, 46] have discussed trend modeling for traffic time series and explained the benefit of detrending.

Although Dai et al. [47] have specifically studied traffic flow during the holidays, they focused on investigating holiday traffic flow from the spatial and temporal aspects. They used an ANN model to predict the traffic flow, but this method was not designed around the characteristics of holiday traffic flow. It remained a traditional prediction model and could not meet the requirements of practical engineering. This paper proposes a hybrid prediction methodology combining the discrete Fourier transform (DFT) with support vector regression (SVR). The traffic flow time series is decomposed into a common trend and a high-frequency residual time series. The common trend is extracted with the DFT algorithm by setting an appropriate threshold. The common trend reflects the endogenous characteristics, which are approximately stationary, and it is predicted by extreme extrapolation of the historical trend. The residual series, which includes fluctuations and bursts, reflects environment-dependent characteristics. As shown in Chen et al. [48] and Li et al. [49], the existence of bursts makes prediction difficult. This paper proposes a method to identify and preprocess bursts, and the residual series is then predicted by SVR. Finally, the traffic flow in the next holidays is predicted by combining the prediction results of the common trend and the residual series. Tests with traffic data collected from toll stations in Jiangsu province, China, show that the proposed algorithm has superior accuracy.

The rest of this paper is organized as follows. Section 2 gives details of the hybrid traffic prediction method based on DFT and SVR. Section 3 introduces the dataset used for our numerical experiments and presents the results of the different prediction algorithms. Finally, conclusions and directions for future research are given in Section 4.

2. Methodology

During the holidays, traffic flow data changes more violently and stochastically, which makes prediction more difficult. In this paper, a hybrid prediction method for the holidays is proposed based on DFT and SVR. The traffic flow data is composed of a common trend and a high-frequency residual time series. The traffic flow time series is transformed from the time domain into the frequency domain with DFT, and the trend is extracted by setting an appropriate threshold. Apart from the natural growth of traffic flow, the trend component changes in basically the same way from one holiday to the next. Therefore, the trend component is predicted by extreme extrapolation of the historical trend. For the residual, fluctuations and bursts are defined first. The mean and variance of the fluctuations are stable, but the bursts are highly random. The bursts in the residual component are preprocessed, and then the residual is predicted with SVR. The final prediction result is obtained by combining the trend and residual prediction results. Our approach is summarized in Figure 1.

2.1. Decomposition of Traffic Flow Time Series with DFT

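The purpose of traffic flow time series decomposition is to extract the trend and the residual, which can improve the prediction accuracy. Suppose the traffic flow data sampled over a $d$-day holiday can be written as the time series

$$x = \{x(1), x(2), \ldots, x(N)\}, \tag{1}$$

where $\Delta t$ is the sample time interval, $n$ is the data size in a day (if $\Delta t$ is 1 h, then $n = 24$), and $N = n \cdot d$ is the sample data length over the $d$-day holiday.

The spectrum of the traffic flow time series obtained by DFT can be represented as

$$F(k) = \sum_{i=1}^{N} x(i)\, e^{-j 2\pi (k-1)(i-1)/N}, \quad k = 1, 2, \ldots, N, \tag{2}$$

where $x(i)$ is the $i$th sample of $x$, $F(k)$ is the frequency spectrum of $x$, and $j$ is the imaginary unit.

Set a threshold $\alpha$ to remove the high-frequency variations, and the common trend of the traffic flow data can be extracted:

$$F_t(k) = \begin{cases} F(k), & |F(k)| \ge \alpha, \\ 0, & |F(k)| < \alpha, \end{cases} \tag{3}$$

$$x_t(i) = \frac{1}{N} \sum_{k=1}^{N} F_t(k)\, e^{j 2\pi (k-1)(i-1)/N}, \quad i = 1, 2, \ldots, N, \tag{4}$$

where $x_t(i)$ is the $i$th sample value of the common trend $x_t$.

Then, the traffic flow data is decomposed into the sum of the common trend $x_t$ and the residual $x_r$:

$$x(i) = x_t(i) + x_r(i). \tag{5}$$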
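The decomposition described above can be illustrated with a minimal sketch in standard-library Python. The function names (`dft`, `idft`, `decompose`) and the `ratio` parameter are hypothetical. The text does not state how the zero-frequency (mean) bin is treated, so this sketch assumes that it is always kept and that the threshold is set relative to the largest nonzero-frequency bin. Indices are 0-based in the code.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform (0-based indices)."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * i / n) for k in range(n)) / n
        for i in range(n)
    ]


def decompose(x, ratio=0.7):
    """Split x into (trend, residual) by magnitude-thresholding its spectrum."""
    spectrum = dft(x)
    # Assumption: the threshold is relative to the largest nonzero-frequency bin.
    alpha = ratio * max(abs(c) for c in spectrum[1:])
    # Assumption: the zero-frequency (mean) bin is always kept.
    kept = [spectrum[0]] + [c if abs(c) >= alpha else 0j for c in spectrum[1:]]
    trend = [v.real for v in idft(kept)]
    residual = [xi - ti for xi, ti in zip(x, trend)]
    return trend, residual


# Synthetic two-day hourly series: a daily cycle plus a weaker 6-hour ripple.
series = [
    100 + 40 * math.sin(2 * math.pi * i / 24) + 3 * math.sin(2 * math.pi * i / 6)
    for i in range(48)
]
trend, residual = decompose(series)
```

On this synthetic series the daily cycle and the mean end up in `trend`, and the weaker ripple ends up in `residual`. For long series, a fast Fourier transform routine would normally replace the direct summation used here for readability.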

2.2. Prediction of the Common Trend

The trend component represents the regular variations of the traffic flow data. Commuters are increasingly conscious of the ecological impacts of their trips, and a possible change of trend in forthcoming years has attracted growing attention. For holidays, the trend component of the traffic flow data follows the same pattern from year to year while increasing slightly. The increment is also basically the same each year, although the situation differs slightly in western countries. Therefore, the trend component is predicted in this paper by extreme extrapolation of the historical trend, that is, by extrapolating the maximum value of the trend component.

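Suppose that $x_t^{(1)}, x_t^{(2)}, \ldots, x_t^{(m)}$ are the common trends in $m$ consecutive years. The maximum value of the trend component in year $i$ can then be represented as

$$M_i = \max_{1 \le k \le N} x_t^{(i)}(k), \quad i = 1, 2, \ldots, m. \tag{6}$$

The change of the maximum value relative to the previous year can be computed from (6) as

$$r_i = \frac{M_i}{M_{i-1}}, \quad i = 2, 3, \ldots, m. \tag{7}$$

Regression analysis is carried out on $r_i$, and the regression equation is obtained as

$$r = \beta_0 + \beta_1 t, \tag{8}$$

where $\beta_0$ and $\beta_1$ are regression coefficients, $t$ is the time variable, and $r$ is the ratio coefficient of the trend component maximum value compared to that of the previous year.

The predicted trend component in year $m+1$ is calculated directly from the observed traffic flows in the $m$ consecutive years:

$$\hat{x}_t^{(m+1)} = r\, x_t^{(m)}, \tag{9}$$

where $r$ is the ratio coefficient calculated by (8) and $x_t^{(m)}$ is the trend component in year $m$.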
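The extrapolation step can be sketched as follows, under one reading of the text: the year-over-year ratio of the trend maxima is fitted with a straight line in time, and the fitted ratio for the next year scales the most recent trend. The function names and the toy numbers are hypothetical, and the ratio interpretation itself is an assumption.

```python
def fit_line(ts, ys):
    """Ordinary least-squares fit ys ~ a + b * ts; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def predict_trend(trends):
    """Extrapolate next year's trend from per-year trend series.

    trends: list of equal-length lists, oldest year first, at least 3 years
    (so that at least two year-over-year ratios are available for the fit).
    """
    peaks = [max(t) for t in trends]                      # yearly maxima
    ratios = [peaks[i] / peaks[i - 1] for i in range(1, len(peaks))]
    years = list(range(2, len(peaks) + 1))                # year index of each ratio
    a, b = fit_line(years, ratios)
    r_next = a + b * (len(peaks) + 1)                     # fitted ratio for next year
    # Assumption: the whole trend curve scales by the same ratio as its maximum.
    return [r_next * v for v in trends[-1]]


# Toy trends whose maxima grow by 10% per year (illustrative numbers only).
history = [[50.0, 100.0], [55.0, 110.0], [60.5, 121.0]]
predicted = predict_trend(history)
```

With a constant 10% growth in the maxima, the fitted ratio stays at about 1.1, so the predicted trend is the last observed trend scaled by that factor.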

2.3. Prediction of the Residual Component
2.3.1. Prediction Preprocessing

The residual series includes fluctuations and bursts. Let $\sigma$ denote the standard deviation of the entire residual time series, obtained as the square root of its variance. Following Li et al. [45], a point is defined as a "burst" if the magnitude of the residual at that point is larger than twice the standard deviation $\sigma$. Most existing prediction models assume that the model inputs have a certain smooth mapping relationship with the model outputs. The predicted value therefore tends to stay within about one standard deviation above or below the trend component. That is, existing prediction models have difficulty forecasting bursts. The intraday trend is an M-shaped curve that represents the regular variations of daily traffic flow. The morning and evening rush hours produce the two peaks, while one shallow valley and one deep valley appear at noon and midnight, respectively. This paper extracts the trend component using DFT, which smooths the M-shaped curve into a sum of a few sine functions and slightly enlarges the traffic flow value at noon. To improve the accuracy of prediction, a strategy is introduced to preprocess the residual component:

$$\tilde{x}_r(i) = \begin{cases} 2\sigma, & x_r(i) > 2\sigma, \\ x_r(i), & \text{otherwise}. \end{cases} \tag{10}$$

In (10), only the positive bursts are preprocessed, by limiting them to twice the standard deviation $\sigma$, while the negative bursts are not changed, which differs from existing processing methods. The experiments show that this processing helps suppress the influence of burst disturbances and improves prediction performance.
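The preprocessing rule described above, as we read it, can be sketched in a few lines. The function name is hypothetical, and the use of the population standard deviation (dividing by the series length) is an assumption, since the text does not say which estimator is used.

```python
import math


def clip_positive_bursts(residual):
    """Limit positive bursts to 2*sigma; leave negative bursts unchanged.

    Returns (preprocessed_residual, sigma).
    """
    n = len(residual)
    mean = sum(residual) / n
    # Assumption: population standard deviation over the entire residual series.
    sigma = math.sqrt(sum((r - mean) ** 2 for r in residual) / n)
    limit = 2.0 * sigma
    return [min(r, limit) for r in residual], sigma


# Illustrative residual with one negative and one positive burst.
raw = [-10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0]
clean, sigma = clip_positive_bursts(raw)
```

Only the positive burst at the end of `raw` is pulled down to `2 * sigma`. The negative burst at the start passes through untouched, which mirrors the asymmetric treatment described in the text.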

2.3.2. Support Vector Regression (SVR)

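SVR is widely used to deal with nonlinear problems and large-scale prediction problems. In this paper, the SVR method is applied to predict the preprocessed residual series. Consider a given training set $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^p$ is a $p$-dimensional input variable, $y_i$ is the corresponding output value, and $l$ represents the size of the training data.

Through a nonlinear mapping function $\varphi(\cdot)$, $x$ is mapped to a high-dimensional feature space, in which the optimal decision function is constructed as follows:

$$f(x) = w^{T} \varphi(x) + b, \tag{11}$$

where $w$ is the weight vector and $b$ is the bias value. SVR minimizes the following objective function:

$$\min_{w,\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad \begin{cases} y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \\ w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\ \xi_i,\ \xi_i^* \ge 0, \end{cases} \tag{12}$$

where $\frac{1}{2}\|w\|^2$ is the expression for the complexity of the decision function, $C$ is a penalty coefficient, and $\xi_i$ is the upper training error ($\xi_i^*$ is the lower) subject to the $\varepsilon$-insensitive factor.

This optimization problem can be transformed into a dual problem, and its solution is given by

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, K(x_i, x) + b, \tag{13}$$

where $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers that can be acquired by solving the dual problem and $K(x_i, x)$ is the Mercer kernel function that equals the inner product of $\varphi(x_i)$ with $\varphi(x)$.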

The most frequently used Mercer kernel functions are, for instance, the sigmoid, polynomial, and radial basis function (RBF) kernels. RBF is a common choice because it requires very few parameters to be set and has excellent overall performance. Therefore, RBF is selected as the kernel function in this paper.
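To make the dual-form solution concrete, the sketch below evaluates a kernel expansion of that form with an RBF kernel. It only evaluates the decision function for given multipliers and does not solve the dual problem. The function names, the `gamma` parameterization of the RBF kernel, and the numbers in the example are all assumptions for illustration.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2) (gamma form is assumed)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_i c_i * K(x_i, x) + bias.

    dual_coefs[i] plays the role of the difference of the two Lagrange
    multipliers attached to support vector i.
    """
    return bias + sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)
    )


# Hypothetical fitted model with two one-dimensional support vectors.
support = [[0.0], [1.0]]
coefs = [2.0, -1.0]
value = svr_decision([0.0], support, coefs, bias=0.5, gamma=1.0)
```

In practice, the multipliers and bias come from a solver. For example, scikit-learn's `sklearn.svm.SVR` with `kernel="rbf"` exposes `C`, `epsilon`, and `gamma` as tunable parameters, but the text does not say which implementation or parameter values were used.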

2.4. Proposed Prediction Model

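Suppose the traffic flow data sampled over a $d$-day holiday in $m$ consecutive years can be written as the matrix $X$:

$$X = \begin{bmatrix} x^{(1)} \\ x^{(2)} \\ \vdots \\ x^{(m)} \end{bmatrix} = \begin{bmatrix} x^{(1)}(1) & x^{(1)}(2) & \cdots & x^{(1)}(N) \\ x^{(2)}(1) & x^{(2)}(2) & \cdots & x^{(2)}(N) \\ \vdots & \vdots & \ddots & \vdots \\ x^{(m)}(1) & x^{(m)}(2) & \cdots & x^{(m)}(N) \end{bmatrix}, \tag{14}$$

where $\Delta t$ is the sample time interval, $n$ is the data size in a day, and $N = n \cdot d$ is the sample data length over the $d$-day holiday.

According to (3), (4), and (5), the traffic flow data in the $m$ consecutive years are decomposed into the sum of the trend components and the residuals:

$$x^{(i)} = x_t^{(i)} + x_r^{(i)}, \quad i = 1, 2, \ldots, m, \tag{15}$$

where $x_t^{(i)}$ are the trend components in the $m$ consecutive years and $x_r^{(i)}$ are the residuals.

The predicted traffic flow during the holidays in year $m+1$ is calculated directly by adding the trend and residual prediction results:

$$\hat{x}^{(m+1)} = \hat{x}_t^{(m+1)} + \hat{x}_r^{(m+1)}, \tag{16}$$

where $\hat{x}_t^{(m+1)}$ is the prediction result of the trend component with (9) and $\hat{x}_r^{(m+1)}$ is the prediction result of the residual component with SVR.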

3. Case Study

3.1. Data Description

Jiangsu is a highly developed province in China. In this paper, the data used to evaluate the performance of the proposed model were collected from toll stations in Jiangsu province on Tomb-sweeping Day and National Day from 2011 to 2015. The length of the traffic flow data interval has an important influence on the prediction results. An excessively short interval tends to make the prediction model unstable, while an excessively long interval loses the inherent characteristics of the traffic flow data. As shown in Ma et al. [50], prediction performance improves as the data aggregation level increases, because the data fluctuate less as the time interval becomes longer, and the prediction accuracy reaches its maximum when the aggregation level reaches one hour. Therefore, we selected one hour as the sampling interval, which also meets engineering management requirements. There are a total of 1200 data points (50 days from 2011 to 2015) in the dataset, divided into two sets. The first 960 data points (40 days from 2011 to 2014) are used as the training sample, while the remaining 240 data points (10 days in 2015) are used as the testing sample for measuring the prediction performance of the proposed model. The traffic flow on National Day from 2011 to 2014 is shown in Figure 2. Figure 2 shows that the traffic flow changes in basically the same way every year, while its value increases by around 10% per year. Within each year, the traffic flow is similar throughout the holiday except for the first and last days, which show sharply increased and decreased flow. The hourly distribution of traffic flow on each day follows the same pattern, with high values in the daytime and low values at night. The data show remarkable similarity and strong periodic features, but they also reveal stochastic properties. It is therefore necessary to decompose the traffic flow data in order to reveal its underlying characteristics.

3.2. Decomposition of Traffic Flow Time Series with DFT

The traffic flow data in Figure 2 increases year by year but has similar patterns during the whole holiday from 2011 to 2014. It has been pointed out in the literature that prediction accuracy is greatly affected by the existence of several trends and that the trends should be removed to improve prediction performance. DFT is a powerful analytical method for extracting the periodic components of a signal. This paper decomposes the traffic flow data into a trend and a residual component with DFT. The spectrum of the traffic flow on National Day in 2011 is shown in Figure 3, and three obvious components can be seen. The sampling period of the testing dataset is one hour, so $n = 24$ and the sample data length over a $d$-day holiday is $N = 24d$. If the data length is a power of 2, we use DFT directly. Otherwise, we pad the data to a power of 2 by appending zeros to the end of the original traffic data, which does not change the frequency resolution and satisfies the length condition of the DFT computation. Figure 3 shows that there are several peaks in the traffic flow spectrum and that the intervals between the peaks are large enough to meet the frequency resolution requirements. The main purpose of DFT here is to extract the trend components of the traffic flow sequences, which means retaining the low-frequency components and suppressing the high-frequency components. Our experiments show that the trend components of the traffic flow data can be extracted effectively by keeping only the component with the highest peak value. However, the highest peak is not really an impulse function, so a large threshold must be set to suppress the other frequency components. When the threshold $\alpha$ is set to 0.7 times the maximum value of the frequency spectrum, the trend components of the traffic flow sequences can be extracted efficiently according to (3) and (4). The trend components on National Day from 2011 to 2015 are shown in Figure 4. Compared with Figure 2, the pattern of the trend components from year to year is more obvious, so the trend component of the traffic flow becomes easier to predict. The residual series is obtained by subtracting the trend component from the traffic flow sequence.
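The zero-padding step mentioned above can be sketched as follows. The helper names are hypothetical, and the 168-sample input (for example, seven days of hourly data) is only an illustrative length, not a value taken from the text.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    p = 1
    while p < n:
        p *= 2
    return p


def zero_pad(x):
    """Append zeros to the end of x until its length is a power of two."""
    target = next_power_of_two(len(x))
    return list(x) + [0.0] * (target - len(x))


# Illustrative length only: 168 hourly samples are padded up to 256.
hourly = [1.0] * 168
padded = zero_pad(hourly)
```

A series whose length is already a power of two is returned unchanged, which matches the case where the transform is applied directly. The padded samples must be discarded again after the inverse transform so that the trend has the same length as the original series.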

3.3. Preprocessing of the Residual Series

The residual series includes fluctuations and bursts, and it represents the stochastic component of the traffic flow, which is affected by the weather, traffic incidents, roadway conditions, and other factors. The presence of large bursts in particular greatly affects the accuracy of traffic flow prediction. To improve the accuracy of the prediction, the residual series are preprocessed according to (10). The original traffic flow series, trend component, residual series, and preprocessed residual series on National Day in 2014 are shown in Figure 5.

3.4. Prediction Results and Performance Comparison
3.4.1. Measuring Performance of the Proposed Prediction Method

Several standard evaluation measurements are adopted in the experiments to evaluate the proposed method, including the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE). Although all of these indicators evaluate prediction performance, each emphasizes a different aspect. To evaluate the performance of the proposed model accurately, we adopt the three indicators together to represent the difference between the actual value and the predicted value:

$$\mathrm{MAE} = \frac{1}{L} \sum_{i=1}^{L} \left| \hat{x}(i) - x(i) \right|, \tag{17}$$

$$\mathrm{MAPE} = \frac{1}{L} \sum_{i=1}^{L} \frac{\left| \hat{x}(i) - x(i) \right|}{x(i)} \times 100\%, \tag{18}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{L} \sum_{i=1}^{L} \left( \hat{x}(i) - x(i) \right)^2}, \tag{19}$$

where $\hat{x}(i)$ is the prediction value, $x(i)$ is the actual observed value, and $L$ is the length of the test series.
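The three indicators can be computed with a short sketch like the one below. The function names and the toy numbers are hypothetical. MAPE is returned in percent and assumes that every observed value is nonzero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy values for illustration only.
observed = [100.0, 200.0]
forecast = [110.0, 180.0]
scores = (mae(observed, forecast), mape(observed, forecast), rmse(observed, forecast))
```

MAE and RMSE are in the units of the flow itself, with RMSE weighting large errors more heavily, while MAPE is scale-free. This difference is one reason to report all three together.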

3.4.2. Prediction Results

In this section, the traffic flows on Tomb-sweeping Day and National Day in 2015 are predicted with the proposed DFT-SVR model using the traffic flow data from 2011 to 2014. Figures 6(a) and 6(b) show the predicted and real traffic flow on Tomb-sweeping Day and National Day, respectively, for 4 different models: DFT-SVR, ARIMA, SVR, and EMD-SVR. The figure shows that the proposed DFT-SVR model has the best prediction performance and that its predicted values closely match the real values during most hours, especially during the rush hours. The prediction results of ARIMA and EMD-SVR are worse than those of the proposed model. Because the National Day holiday is longer than the Tomb-sweeping Day holiday, the traffic flow on National Day is relatively smooth. Tomb-sweeping Day, in contrast, is a traditional holiday for tomb-sweeping, and the traffic flow during this holiday is very high and more stochastic. Accordingly, the prediction accuracy in Figure 6 is higher on National Day than on Tomb-sweeping Day.

The prediction performances on Tomb-sweeping Day and National Day are shown in Tables 1 and 2. The proposed DFT-SVR method has better prediction performance than ARIMA, SVR, and EMD-SVR on both Tomb-sweeping Day and National Day. All algorithms have better prediction accuracy on long holidays than on short holidays, and the average prediction accuracy is 15.04% higher on National Day than on Tomb-sweeping Day. The prediction performance of the proposed DFT-SVR method improved by at least 3.69% on National Day and by at least 17.23% on Tomb-sweeping Day, indicating that the proposed method has a greater advantage in dealing with stochastic traffic flow data.

4. Conclusions

This paper presents a traffic flow prediction method for the holidays based on DFT and SVR. The trend components are extracted from holiday traffic flow data with DFT, and the trend is predicted by extreme extrapolation of the historical data. The residual component is preprocessed and then predicted with SVR. The final prediction result is obtained by combining the trend and residual prediction results. Real traffic data, collected from toll stations in Jiangsu province from 2011 to 2015, are used to evaluate the prediction performance of the proposed DFT-SVR model. The test results show that the proposed DFT-SVR model performs best compared with ARIMA, SVR, and EMD-SVR and that prediction accuracy is higher on long holidays than on short holidays. Since traffic flow during the holidays is affected by the weather and other factors, the impact of weather on traffic flow will be studied further in order to improve the prediction accuracy.

Data Availability

The data used in this paper were collected from the toll stations in Jiangsu province on Tomb-sweeping Day and National Day from 2011 to 2015. The data were supplied by the JiangSu Expressway Network Operation & Management Center in China. Because the data involve licensing and commercial confidentiality, they cannot be used freely. Researchers who wish to request these data can send an email to [email protected].

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This research was partly supported by the National Key R&D Program of China (2018YFC0808706) and the National Natural Science Foundation of China (Grant no. 5157081053). The authors are also grateful to the JiangSu Expressway Network Operation & Management Center for providing the data.