Mathematical Problems in Engineering

Volume 2015, Article ID 154703, 9 pages

http://dx.doi.org/10.1155/2015/154703

## Short-Term Traffic Flow Local Prediction Based on Combined Kernel Function Relevance Vector Machine Model

¹College of Transportation, Jilin University, Changchun 130025, China

²State Key Laboratory of Automobile Simulation and Control, Jilin University, Changchun 130025, China

³Jilin Province Key Laboratory of Road Traffic, Jilin University, Changchun 130025, China

Received 30 May 2015; Accepted 3 August 2015

Academic Editor: Michael Small

Copyright © 2015 Qichun Bing et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Short-term traffic flow prediction is one of the most important issues for adaptive traffic control systems and dynamic traffic guidance systems. To improve the accuracy of short-term traffic flow prediction, a short-term traffic flow local prediction method based on a combined kernel function relevance vector machine (CKF-RVM) model is proposed. The C-C method is used to calculate the delay time and embedding dimension, the number of neighboring points is determined using the Hannan-Quinn criterion, and the parameters of the CKF-RVM model are optimized with a genetic algorithm. Finally, the method is validated using inductive loop data measured on the north–south viaduct in Shanghai. The experimental results show that, in terms of MAPE, the CKF-RVM model improves on the GKF-RVM and GKF-SVM models by 31.1% and 52.7%, respectively. It also outperforms the other two models in terms of EC.

#### 1. Introduction

Short-term traffic flow prediction is an important basis for intelligent transportation systems (ITS). Real-time and accurate prediction information can be applied directly to advanced traffic management systems (ATMS) and advanced traffic information service systems (ATIS). Because of its importance, short-term traffic flow prediction has attracted great interest in the scientific community, and a large number of methods exist in the literature. These include spectral analysis models [1, 2], time series models [3, 4], regression models [5, 6], Kalman filtering models [7, 8], neural network models [9, 10], support vector machine models [11, 12], and wavelet network models [13]. Readers interested in the details of models applied to traffic flow prediction can refer to review papers such as [14–16]. With the development of chaos theory, recent studies such as [17–19] have found that short-term traffic flow time series exhibit nonlinear chaotic behavior. Therefore, chaotic prediction of short-term traffic flow has gained special attention. The prediction of chaotic time series can generally be classified into two categories: global prediction and local prediction. Global prediction methods use all phase points to describe the evolution law of the system and then predict the future value, and a number of researchers have applied them to chaotic time series. Karunasinghe and Liong [20] investigated the performance of artificial neural networks as global models for chaotic time series prediction in comparison with local prediction models. Dong et al. [21] applied the Elman neural network to short-term traffic flow prediction based on chaos analysis. Baydaroglu and Kocak [22] used a support vector regression (SVR) model to predict evaporation amounts, with phase space reconstruction used to prepare the SVR input data.

Local prediction methods select neighboring points to fit the short-term evolution trend of the phase points and then obtain the predicted value. They mainly include the local average prediction method [23], the weighted first-order local prediction method [24], the Lyapunov exponent prediction method [25], and the support vector machine model [26]. Because only a small number of phase points are fitted, local prediction methods have low computational complexity and a high degree of fit. Farmer and Sidorowich [27] showed that local prediction methods outperform global prediction methods under the same embedding dimension. Therefore, local prediction is adopted for short-term traffic flow prediction in this paper.
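To make the local idea concrete, here is a minimal sketch of the simplest local scheme, a zeroth-order local average predictor. It assumes the phase points and the value observed one step after each of them are already available, ranks neighbors by Euclidean distance, and fixes the number of neighbors `k` by hand. The function name `local_average_predict` is illustrative; this is not the CKF-RVM model proposed here, which fits an RVM to the neighboring points and chooses their number with the Hannan-Quinn criterion.

```python
import math


def local_average_predict(phase_points, successors, query, k):
    """Zeroth-order local prediction: average the successors of the k nearest phase points.

    phase_points[i] is a reconstructed phase point, successors[i] is the value
    observed one step after it, and query is the current phase point.
    """
    # Rank candidate neighbors by Euclidean distance to the query point.
    ranked = sorted(range(len(phase_points)),
                    key=lambda i: math.dist(phase_points[i], query))
    nearest = ranked[:k]
    # The local "model" is a constant: the mean of the neighbors' next values.
    return sum(successors[i] for i in nearest) / len(nearest)


points = [(0.0,), (1.0,), (2.0,), (10.0,)]
next_values = [1.0, 2.0, 3.0, 11.0]
print(local_average_predict(points, next_values, query=(1.1,), k=2))  # 2.5
```

Replacing the constant local model with a fitted regressor (first-order, SVM, or RVM) gives the higher-order local methods listed above.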

To obtain accurate predictions, we need to find the nonlinear prediction function. However, this function is hard to determine exactly because of internal and external disturbances. Determining a linear function, by contrast, is not hard: detecting linear relations has been a focus of research in statistics and machine learning for decades, and the resulting algorithms are well understood, well developed, and efficient. The two can be combined. Instead of fitting a nonlinear model directly, we can map the problem from the input space to a feature space through a nonlinear transformation defined by suitably chosen basis functions and then fit a linear model in the feature space. In RVM, these basis functions are given by a kernel function. A linear model in the feature space corresponds to a nonlinear model in the input space; this is the main idea behind the relevance vector machine (RVM) model. Owing to its theoretical advantages, RVM has attracted considerable attention in recent years [28–30]. This paper builds the short-term traffic flow forecasting model on RVM because of its ability to handle dynamic, nonlinear, and complex traffic flow time series, which makes it well suited to short-term traffic flow prediction.

For these reasons, and with the goal of improving the accuracy of short-term traffic flow prediction, we propose a short-term traffic flow local prediction method based on a combined kernel function relevance vector machine model. The remainder of this paper is structured as follows: Section 2 presents the phase space reconstruction theory. Section 3 describes the construction of the combined kernel function relevance vector machine model. Section 4 describes the experiment setup and case study. Section 5 draws conclusions.

#### 2. Phase Space Reconstruction Theory

Phase space reconstruction theory, proposed by Packard et al. [31], is a powerful tool for studying complex systems. According to chaos dynamics theory, a time series contains the useful information about a system and reflects its long-term evolution. Complex characteristics found in a time series may result from temporal evolution on a chaotic attractor, an object of fractal dimension created by the stretching and folding of phase space. If we can capture chaotic behavior in the traffic flow time series, we can improve our understanding of the inherent properties of the traffic flow system. Phase space reconstruction creates an attractor that is topologically equivalent to that of the original dynamical system using only the information in a scalar time series [32].

Phase space can be reconstructed using the delay coordinate method. Its basic idea is that the evolution of any single variable of a system is determined by the other variables with which it interacts, so information about those variables is implicitly contained in the history of any single variable. For a time series $\{x_i, i = 1, 2, \ldots, N\}$, the phase space can be reconstructed as

$$\mathbf{X}_i = \left(x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\right), \qquad i = 1, 2, \ldots, M, \quad M = N - (m-1)\tau,$$

where $\tau$ is the delay time, $m$ is the embedding dimension, and $M$ is the number of phase points.
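A minimal sketch of the delay-coordinate reconstruction, assuming the series is a plain Python list and using 0-based indices (so `points[0]` corresponds to the first phase point). The name `delay_embed` is illustrative.

```python
def delay_embed(x, m, tau):
    """Return the phase points (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    M = len(x) - (m - 1) * tau  # number of phase points
    if M <= 0:
        raise ValueError("time series too short for this (m, tau)")
    return [tuple(x[i + j * tau] for j in range(m)) for i in range(M)]


series = [3, 5, 8, 13, 21, 34, 55]
points = delay_embed(series, m=3, tau=2)
# M = 7 - (3 - 1) * 2 = 3 phase points
print(points)  # [(3, 8, 21), (5, 13, 34), (8, 21, 55)]
```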

The embedding dimension and delay time are the key parameters for phase space reconstruction. There are currently two views on how to select them. One view holds that the two parameters are independent and can be determined separately. Methods for calculating the delay time include the average displacement method [33], the mutual information method [34], and the autocorrelation function method [35]; methods for calculating the embedding dimension include the false nearest neighbors method [36], the Cao method [37], and the G-P method [38]. The other view holds that the two parameters are interrelated and should be determined simultaneously, as in the C-C method [39], which obtains the embedding dimension and delay time together. Compared with other methods, the C-C method has a low computational cost and strong robustness to noise. Therefore, the C-C method is used to determine the delay time $\tau$ and the embedding window width $\tau_w = (m-1)\tau$, and the embedding dimension is then calculated as $m = \tau_w/\tau + 1$. The principle of the C-C method is as follows.

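Let $\{x_i, i = 1, 2, \ldots, N\}$ denote the time series; phase space reconstruction yields the vector series $\{\mathbf{X}_i, i = 1, 2, \ldots, M\}$. The correlation integral of the embedded time series is

$$C(m, N, r, t) = \frac{2}{M(M-1)}\sum_{1 \le i < j \le M}\Theta\left(r - \lVert\mathbf{X}_i - \mathbf{X}_j\rVert_\infty\right), \qquad r > 0,$$

where $r$ is the neighborhood radius, $\mathbf{X}_i$ is a phase point in phase space, $t$ is the delay time, $m$ is the embedding dimension, $M = N - (m-1)t$ is the number of embedded points in phase space, $N$ is the length of the time series, $\lVert\cdot\rVert_\infty$ denotes the sup-norm, and $\Theta(\cdot)$ is the Heaviside unit function: $\Theta(x) = 0$ if $x < 0$ and $\Theta(x) = 1$ if $x \ge 0$. The correlation integral is a cumulative distribution function giving the probability that the distance between any two points in phase space is not greater than $r$. We define the test statistic

$$S(m, N, r, t) = C(m, N, r, t) - C^m(1, N, r, t).$$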
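The correlation integral can be computed directly from its definition. This O(M²) sketch assumes the phase points are tuples of floats; `correlation_integral` is a hypothetical helper name, and the Heaviside convention Θ(0) = 1 means that pairs at distance exactly r are counted.

```python
def correlation_integral(points, r):
    """C(m, N, r, t): fraction of pairs of phase points within sup-norm distance r."""
    M = len(points)
    if M < 2:
        raise ValueError("need at least two phase points")
    close_pairs = 0
    for i in range(M - 1):
        for j in range(i + 1, M):
            # sup-norm (Chebyshev) distance between X_i and X_j
            dist = max(abs(a - b) for a, b in zip(points[i], points[j]))
            # Heaviside step: Theta(r - dist) = 1 whenever dist <= r
            if r - dist >= 0:
                close_pairs += 1
    return 2.0 * close_pairs / (M * (M - 1))


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
print(correlation_integral(pts, r=1.0))  # 1 of the 3 pairs is within r
```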

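The time series $\{x_i, i = 1, 2, \ldots, N\}$ can be divided into $t$ disjoint subsequences of length $N/t$:

$$\{x_1, x_{1+t}, x_{1+2t}, \ldots\},\ \{x_2, x_{2+t}, x_{2+2t}, \ldots\},\ \ldots,\ \{x_t, x_{2t}, x_{3t}, \ldots\}.$$

The test statistic then becomes

$$S(m, N, r, t) = \frac{1}{t}\sum_{s=1}^{t}\left[C_s\!\left(m, \frac{N}{t}, r, t\right) - C_s^m\!\left(1, \frac{N}{t}, r, t\right)\right],$$

where $C_s$ denotes the correlation integral of the $s$th subsequence.

As $N \to \infty$, we can write

$$S(m, r, t) = \frac{1}{t}\sum_{s=1}^{t}\left[C_s(m, r, t) - C_s^m(1, r, t)\right].$$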

For fixed embedding dimension $m$ and delay time $t$, as $N \to \infty$, $S(m, r, t)$ is identically equal to 0 for all $r$ if the time series data are independent and identically distributed. However, actual time series data are finite and correlated, so $S(m, r, t)$ is generally not 0. Thus, the locally optimal times may be either the zero crossings of $S(m, r, t)$ or the times at which $S(m, r, t)$ shows the least variation with $r$, because this indicates that the points are nearly uniformly distributed. Hence, we use the largest and smallest radii to define the quantity

$$\Delta S(m, t) = \max_j S(m, r_j, t) - \min_j S(m, r_j, t).$$

$\Delta S(m, t)$ measures the maximum deviation of $S(m, r, t)$ over $r$. Therefore, the optimal delay time is the first zero crossing of $S(m, r, t)$ or the first local minimum of $\Delta S(m, t)$.

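According to the BDS statistic results, we select $m = 2, 3, 4, 5$ and $r_j = j\sigma/2$ ($j = 1, 2, 3, 4$), where $\sigma$ is the standard deviation of the time series, and calculate

$$\bar{S}(t) = \frac{1}{16}\sum_{m=2}^{5}\sum_{j=1}^{4} S(m, r_j, t),$$

$$\Delta\bar{S}(t) = \frac{1}{4}\sum_{m=2}^{5}\Delta S(m, t),$$

$$S_{\text{cor}}(t) = \Delta\bar{S}(t) + \left|\bar{S}(t)\right|,$$

where $\bar{S}(t)$ is the mean of $S(m, r_j, t)$ over all combinations of $m$ and $r_j$. The optimal delay time $\tau$ is the first local minimum of $\Delta\bar{S}(t)$, and the delay time window $\tau_w$ is the global minimum of $S_{\text{cor}}(t)$.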
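The whole C-C procedure can be sketched as a brute-force loop: for each candidate t, S(m, r_j, t) is computed on the t disjoint subsequences for m = 2, ..., 5 and r_j = jσ/2, then averaged, its spread over r is averaged over m, and the two are combined into S_cor(t). Several details are assumptions of this sketch rather than statements from the paper: subsequences are truncated to a common length ⌊N/t⌋, t is scanned from 1 to a user-chosen `max_t`, the global minimum of the averaged spread is used as a fallback when no local minimum appears, and the embedding dimension is obtained by rounding τ_w/τ before adding 1. All function names are illustrative.

```python
import statistics


def sup_norm_corr(points, r):
    """Correlation integral with the sup-norm: share of point pairs at distance <= r."""
    M = len(points)
    close = 0
    for i in range(M - 1):
        for j in range(i + 1, M):
            if max(abs(a - b) for a, b in zip(points[i], points[j])) <= r:
                close += 1
    return 2.0 * close / (M * (M - 1))


def embed(x, m, lag):
    return [tuple(x[i + k * lag] for k in range(m)) for i in range(len(x) - (m - 1) * lag)]


def s_stat(x, m, r, t):
    """S(m, N, r, t) = (1/t) * sum_s [C_s(m, N/t, r, t) - C_s(1, N/t, r, t)^m]."""
    length = len(x) // t
    total = 0.0
    for s in range(t):
        sub = x[s::t][:length]  # s-th disjoint subsequence {x_s, x_{s+t}, ...}
        # neighbors in `sub` are t samples apart in x, so lag 1 in `sub` is delay t in x
        total += sup_norm_corr(embed(sub, m, 1), r) - sup_norm_corr(embed(sub, 1, 1), r) ** m
    return total / t


def cc_method(x, max_t=10, dims=(2, 3, 4, 5)):
    """Return (tau, tau_w, m); requires len(x) // max_t - (max(dims) - 1) >= 2."""
    sigma = statistics.pstdev(x)
    radii = [j * sigma / 2 for j in (1, 2, 3, 4)]          # r_j = j * sigma / 2
    ds_bar, s_cor = [], []
    for t in range(1, max_t + 1):
        S = {(m, r): s_stat(x, m, r, t) for m in dims for r in radii}
        mean_s = sum(S.values()) / len(S)                  # S-bar(t)
        spread = [max(S[m, r] for r in radii) - min(S[m, r] for r in radii) for m in dims]
        mean_ds = sum(spread) / len(dims)                  # Delta S-bar(t)
        ds_bar.append(mean_ds)
        s_cor.append(mean_ds + abs(mean_s))                # S_cor(t)
    # delay time: first local minimum of Delta S-bar(t), global minimum as fallback
    tau = next((k + 1 for k in range(1, max_t - 1)
                if ds_bar[k] < ds_bar[k - 1] and ds_bar[k] <= ds_bar[k + 1]),
               ds_bar.index(min(ds_bar)) + 1)
    tau_w = s_cor.index(min(s_cor)) + 1                    # global minimum of S_cor(t)
    dim = round(tau_w / tau) + 1                           # from tau_w = (m - 1) * tau
    return tau, tau_w, dim


series = [((7 * i) % 13) + 0.5 * (i % 5) for i in range(60)]
print(cc_method(series, max_t=5))
```

The quadratic pair counting keeps the sketch readable; for long detector records a vectorized or tree-based neighbor search would be the practical choice.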

#### 3. Modeling of CKF-RVM Model

##### 3.1. The Principle of RVM Model

The relevance vector machine (RVM) model proposed by Tipping [40] is a sparse probabilistic model based on Bayesian principles. Compared with other intelligent algorithms, RVM has several advantages. For example, the kernel function of an RVM model need not satisfy Mercer’s condition. Moreover, it introduces a prior distribution over the weights, which drives most weights to zero and greatly reduces the computational complexity of the resulting model. The principle of the RVM model is as follows.

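Consider a data set $\{\mathbf{x}_n, t_n\}_{n=1}^{N}$, where $\mathbf{x}_n \in \mathbb{R}^d$ and $t_n \in \mathbb{R}$. The relationship between $\mathbf{x}_n$ and $t_n$ is

$$t_n = y(\mathbf{x}_n; \mathbf{w}) + \varepsilon_n = \sum_{i=1}^{N} w_i \phi_i(\mathbf{x}_n) + w_0 + \varepsilon_n, \qquad \phi_i(\mathbf{x}_n) = K(\mathbf{x}_n, \mathbf{x}_i),$$

where $\mathbf{w} = (w_0, w_1, \ldots, w_N)^T$ is the weight vector, $\varepsilon_n$ is an independent additive noise term with $\varepsilon_n \sim \mathcal{N}(0, \sigma^2)$, $\phi_i(\cdot)$ is a nonlinear basis function, and $K(\cdot, \cdot)$ is the kernel function. Therefore, $p(t_n \mid \mathbf{x}_n) = \mathcal{N}(t_n \mid y(\mathbf{x}_n), \sigma^2)$, that is, $t_n$ is normally distributed with mean $y(\mathbf{x}_n)$ and variance $\sigma^2$. Assuming the $t_n$ are independent of each other, the likelihood of the complete data set can be written as

$$p(\mathbf{t} \mid \mathbf{w}, \sigma^2) = (2\pi\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\lVert\mathbf{t} - \boldsymbol{\Phi}\mathbf{w}\rVert^2\right\},$$

where $\mathbf{t} = (t_1, \ldots, t_N)^T$ and $\boldsymbol{\Phi} = [\boldsymbol{\phi}(\mathbf{x}_1), \ldots, \boldsymbol{\phi}(\mathbf{x}_N)]^T$ is the $N \times (N+1)$ kernel function matrix in which $\boldsymbol{\phi}(\mathbf{x}_n) = [1, K(\mathbf{x}_n, \mathbf{x}_1), \ldots, K(\mathbf{x}_n, \mathbf{x}_N)]^T$.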

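Because the model has one weight per training sample plus a bias, maximum likelihood estimation of $\mathbf{w}$ and $\sigma^2$ would lead to severe overfitting. Therefore, sparse Bayesian learning is adopted, and a zero-mean Gaussian prior distribution is placed over $\mathbf{w}$:

$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N}\mathcal{N}\left(w_i \mid 0, \alpha_i^{-1}\right),$$

where $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \ldots, \alpha_N)^T$ is a vector of $N+1$ hyperparameters. Each weight is associated with its own hyperparameter, which controls the strength of the prior over that weight.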

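Having defined the prior distribution and the likelihood, we obtain the posterior distribution from Bayes’ theorem:

$$p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2) = \frac{p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})}{p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2)} = (2\pi)^{-(N+1)/2}\lvert\boldsymbol{\Sigma}\rvert^{-1/2}\exp\left\{-\frac{1}{2}(\mathbf{w} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{w} - \boldsymbol{\mu})\right\}.$$

The posterior covariance matrix and mean are, respectively,

$$\boldsymbol{\Sigma} = \left(\sigma^{-2}\boldsymbol{\Phi}^T\boldsymbol{\Phi} + \mathbf{A}\right)^{-1}, \qquad \boldsymbol{\mu} = \sigma^{-2}\boldsymbol{\Sigma}\boldsymbol{\Phi}^T\mathbf{t},$$

where $\mathbf{A} = \operatorname{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$.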

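The hyperparameters $\boldsymbol{\alpha}$ and $\sigma^2$ are estimated iteratively by maximizing the marginal likelihood $p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2)$ (type-II maximum likelihood). The update for $\boldsymbol{\alpha}$ is

$$\alpha_i^{\text{new}} = \frac{\gamma_i}{\mu_i^2},$$

where $\mu_i$ is the $i$th posterior mean weight and $\gamma_i = 1 - \alpha_i\Sigma_{ii}$, with $\Sigma_{ii}$ the $i$th diagonal element of the posterior covariance matrix computed with the current $\boldsymbol{\alpha}$ and $\sigma^2$.

The noise variance is updated iteratively as

$$\left(\sigma^2\right)^{\text{new}} = \frac{\lVert\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu}\rVert^2}{N - \sum_i \gamma_i}.$$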
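The iterative estimation of the weights, α, and the noise variance can be combined into a small training loop. The sketch below is a pure-Python illustration, not Tipping's reference implementation. It assumes a user-supplied kernel (a Gaussian kernel is included only for the example), starts from α_i = 1 and a heuristic noise variance, floors α at a small positive value for numerical safety, and prunes basis functions whose α exceeds 10⁹, since their weights are effectively fixed at zero. The names `train_rvm`, `posterior`, and `mat_inv` are hypothetical.

```python
import math


def gaussian_kernel(x, z, width=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * width ** 2))


def mat_inv(a):
    """Gauss-Jordan inverse with partial pivoting (fine for small dense matrices)."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0.0:
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def posterior(Phi, t, active, alpha, sigma2):
    """Sigma = (Phi^T Phi / sigma2 + A)^-1 and mu = Sigma Phi^T t / sigma2 on active columns."""
    cols = [[row[i] for row in Phi] for i in active]
    k = len(active)
    H = [[sum(a * b for a, b in zip(cols[p], cols[q])) / sigma2 for q in range(k)]
         for p in range(k)]
    for p in range(k):
        H[p][p] += alpha[active[p]]
    Sigma = mat_inv(H)
    phi_t = [sum(a * b for a, b in zip(c, t)) for c in cols]
    mu = [sum(Sigma[p][q] * phi_t[q] for q in range(k)) / sigma2 for p in range(k)]
    return cols, Sigma, mu


def train_rvm(X, t, kernel, n_iter=200, prune_at=1e9, alpha_floor=1e-6):
    N = len(X)
    # Row n is phi(x_n) = [1, K(x_n, x_1), ..., K(x_n, x_N)].
    Phi = [[1.0] + [kernel(x, c) for c in X] for x in X]
    active = list(range(N + 1))   # basis functions still in the model
    alpha = [1.0] * (N + 1)       # one hyperparameter per weight
    mean_t = sum(t) / N
    sigma2 = max(0.1 * sum((v - mean_t) ** 2 for v in t) / N, 1e-6)  # heuristic start
    for _ in range(n_iter):
        cols, Sigma, mu = posterior(Phi, t, active, alpha, sigma2)
        k = len(active)
        # gamma_i = 1 - alpha_i * Sigma_ii, clipped to [0, 1] against round-off
        gamma = [min(max(1.0 - alpha[active[p]] * Sigma[p][p], 0.0), 1.0) for p in range(k)]
        for p in range(k):
            # alpha_i <- gamma_i / mu_i^2 (floored so the matrix stays invertible)
            alpha[active[p]] = max(gamma[p] / max(mu[p] ** 2, 1e-300), alpha_floor)
        # sigma^2 <- ||t - Phi mu||^2 / (N - sum(gamma))
        resid = [t[n] - sum(cols[p][n] * mu[p] for p in range(k)) for n in range(N)]
        sigma2 = max(sum(e * e for e in resid) / max(N - sum(gamma), 1e-12), 1e-12)
        # Drop basis functions whose alpha diverged: their weights are pinned at zero.
        active = [i for i in active if alpha[i] < prune_at] or active[:1]
    cols, Sigma, mu = posterior(Phi, t, active, alpha, sigma2)
    return active, mu, Sigma, sigma2


# Example: 20 samples of sin(x) with an alternating +/-0.1 perturbation
# (a deterministic stand-in for measurement noise).
X = [(0.3 * n,) for n in range(20)]
t = [math.sin(x[0]) + 0.1 * (-1) ** n for n, x in enumerate(X)]
active, mu, Sigma, noise = train_rvm(X, t, lambda a, b: gaussian_kernel(a, b, 1.0))
print(len(active), "of", len(X) + 1, "basis functions kept; noise variance", noise)
```

The retained indices in `active` identify the relevance vectors (index 0 is the bias term).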

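Given a new sample $\mathbf{x}_*$, let $t_*$ be the corresponding predicted value. The predictive distribution is normal with mean $y_*$ and variance $\sigma_*^2$:

$$p\left(t_* \mid \mathbf{t}, \boldsymbol{\alpha}_{\text{MP}}, \sigma^2_{\text{MP}}\right) = \mathcal{N}\left(t_* \mid y_*, \sigma_*^2\right),$$

$$y_* = \boldsymbol{\mu}^T\boldsymbol{\phi}(\mathbf{x}_*), \qquad \sigma_*^2 = \sigma^2_{\text{MP}} + \boldsymbol{\phi}(\mathbf{x}_*)^T\boldsymbol{\Sigma}\boldsymbol{\phi}(\mathbf{x}_*),$$

where $\boldsymbol{\alpha}_{\text{MP}}$ and $\sigma^2_{\text{MP}}$ are the hyperparameter values at convergence, $y_*$ is the predictive mean, and $\sigma_*^2$ is the predictive variance.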
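Once training has converged, the predictive mean and variance follow from the posterior mean weights `mu`, the posterior covariance `Sigma`, and the noise variance `sigma2`. A minimal sketch, assuming `phi_star` already contains only the basis functions retained during training, in the same order as `mu`; `rvm_predict` is an illustrative name.

```python
def rvm_predict(phi_star, mu, Sigma, sigma2):
    """Predictive mean mu^T phi(x*) and variance sigma2 + phi(x*)^T Sigma phi(x*)."""
    k = len(phi_star)
    mean = sum(m * p for m, p in zip(mu, phi_star))
    # Quadratic form phi^T Sigma phi: the weight uncertainty seen at x*
    spread = sum(phi_star[i] * Sigma[i][j] * phi_star[j] for i in range(k) for j in range(k))
    return mean, sigma2 + spread


mean, var = rvm_predict([1.0, 0.5], mu=[2.0, 4.0],
                        Sigma=[[0.1, 0.0], [0.0, 0.2]], sigma2=0.05)
print(mean, var)  # mean 4.0, variance approximately 0.2
```

The variance has two parts: the irreducible noise `sigma2` and a model-uncertainty term that grows for inputs far from the relevance vectors.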

##### 3.2. The Construction of Combined Kernel Function

Traditional relevance vector machine models mostly use a single kernel function to perform the feature space mapping, which has achieved good performance in many practical applications. However, a single kernel function is of limited use when the sample data contain heterogeneous information. Therefore, this paper combines the Gaussian kernel function and the polynomial kernel function into a new combined kernel function of the form

$$K(\mathbf{x}, \mathbf{x}_i) = \lambda\exp\left(-\frac{\lVert\mathbf{x} - \mathbf{x}_i\rVert^2}{2\sigma_k^2}\right) + (1 - \lambda)\left[(\mathbf{x} \cdot \mathbf{x}_i) + 1\right]^{q},$$

where $\lambda$ is the weight coefficient, $0 \le \lambda \le 1$, $\sigma_k$ is the kernel width of the Gaussian kernel function, and $q$ is the order of the polynomial kernel function.
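A direct sketch of such a combined kernel, assuming the combination λ·(Gaussian) + (1 − λ)·(polynomial) with a Gaussian term exp(−‖x − z‖²/(2σ_k²)) and a polynomial term (x·z + 1)^q. The exact normalization of the kernel width and the offset of 1 in the polynomial term are assumptions of this sketch, and `combined_kernel` is an illustrative name.

```python
import math


def combined_kernel(x, z, lam, width, order):
    """lam * Gaussian(width) + (1 - lam) * polynomial(order), with 0 <= lam <= 1."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    dot = sum(a * b for a, b in zip(x, z))
    gaussian = math.exp(-sq_dist / (2.0 * width ** 2))   # local: decays with distance
    polynomial = (dot + 1.0) ** order                    # global: grows with the inner product
    return lam * gaussian + (1.0 - lam) * polynomial


print(combined_kernel((1.0, 0.0), (0.0, 1.0), lam=0.5, width=1.0, order=2))  # about 0.684
```

Setting `lam=1` or `lam=0` recovers the pure Gaussian or pure polynomial kernel, which is why the weight coefficient is tuned together with the other two parameters.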

Different kernel functions have different strengths, so if the weight coefficient of the combined kernel function is inappropriate, the combined kernel may perform worse than a single kernel function. Choosing a proper weight coefficient is therefore essential for the combined kernel function.

##### 3.3. Parameter Optimization Based on Genetic Algorithm

Three parameters of the combined kernel function need to be optimized. Commonly used parameter optimization methods include the cross validation method [41] and the grid search method [42]. However, these methods are computationally expensive and are often trapped in local optima. The genetic algorithm (GA) [43] is a heuristic search method inspired by Darwinian natural selection and has been widely used for high-dimensional parameter optimization problems in engineering and science. It differs from traditional search and optimization methods in four significant ways:

(1) Genetic algorithms search in parallel from a population of points. They are therefore less likely to be trapped in a local optimum than traditional methods, which search from a single point.

(2) Genetic algorithms use probabilistic selection rules, not deterministic ones.

(3) Genetic algorithms operate on chromosomes, which are encoded versions of the potential solutions’ parameters, rather than on the parameters themselves.

(4) Genetic algorithms use fitness scores obtained from objective functions, without derivatives or other auxiliary information.

Therefore, a genetic algorithm is used to obtain the optimal parameters of the combined kernel function. The specific steps are as follows.

*Step 1 (initialize the parameters).* Set the population size and the maximum number of generations: the population size is 20, and the maximum number of generations is 100.

*Step 2 (representation).* The three parameters to be optimized (the weight coefficient, the Gaussian kernel width, and the polynomial order) are binary coded to form the chromosomes.

*Step 3 (fitness function definition).* Cross validation is used to prevent overfitting and underfitting. In $k$-fold cross validation, the training data set is randomly divided into $k$ subsets; the RVM model is built on $k-1$ subsets, and the performance of the parameters is checked on the remaining subset, with each subset used once for validation. In this paper, fivefold cross validation is used, and the fitness function is defined as the mean absolute percentage error (MAPE) over the five validation folds on the training data set, so a lower fitness value indicates better parameters.

*Step 4 (creating a new population).* Selection, crossover, and mutation are applied to generate the new population. Chromosomes with better fitness values are selected using the roulette wheel method. The crossover probability is set to 0.8, and the mutation probability is set to 0.05.

*Step 5 (stopping criterion).* If the generation count reaches its maximum value, the iteration stops. Otherwise, Steps 3 and 4 are repeated.
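Steps 1–5 can be sketched as a compact binary-coded GA whose fitness is a k-fold cross-validated MAPE. Several details are assumptions rather than statements from the paper: each parameter gets a 16-bit gene decoded linearly into a user-given range (an integer polynomial order would be rounded by the caller), roulette-wheel weights are 1/(1 + MAPE) so that lower error means a larger slice, crossover is one-point, the mutation probability applies per bit, the population size is even, and the best chromosome seen so far is tracked separately. The model is passed in as a hypothetical callable `fit_predict(X_train, y_train, X_test)` standing in for "train the CKF-RVM with one candidate parameter set and predict"; the toy model at the end is purely illustrative.

```python
import random


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def cv_fitness(X, y, fit_predict, k=5, seed=0):
    """Average validation MAPE over k folds (lower is better)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)         # same random split for every candidate
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for f in range(k):
        held_out = set(folds[f])
        train = [i for i in idx if i not in held_out]   # the other k - 1 subsets
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in folds[f]])
        scores.append(mape([y[i] for i in folds[f]], preds))
    return sum(scores) / k


def decode(bits, bounds, n_bits):
    """Map each n_bits-long gene linearly onto its (low, high) range."""
    values = []
    for g, (low, high) in enumerate(bounds):
        gene = bits[g * n_bits:(g + 1) * n_bits]
        integer = int("".join(str(b) for b in gene), 2)
        values.append(low + (high - low) * integer / (2 ** n_bits - 1))
    return values


def genetic_search(objective, bounds, n_bits=16, pop_size=20, generations=100,
                   p_cross=0.8, p_mut=0.05, seed=0):
    """Minimize a nonnegative objective(params) with a simple binary-coded GA."""
    rng = random.Random(seed)
    length = n_bits * len(bounds)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best, best_err = None, float("inf")
    for gen in range(generations):
        errors = [objective(decode(c, bounds, n_bits)) for c in pop]   # Step 3
        for c, e in zip(pop, errors):
            if e < best_err:
                best, best_err = c[:], e
        if gen == generations - 1:                                    # Step 5
            break
        # Step 4: roulette-wheel selection, one-point crossover, bit-flip mutation
        parents = rng.choices(pop, weights=[1.0 / (1.0 + e) for e in errors], k=pop_size)
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < p_cross:
                cut = rng.randrange(1, length)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a, b]
        for c in children:
            for i in range(length):
                if rng.random() < p_mut:
                    c[i] = 1 - c[i]
        pop = children
    return decode(best, bounds, n_bits), best_err


# Toy run: `make_fit_predict` is hypothetical; its "model" is y = p[0] * x + p[1].
def make_fit_predict(p):
    return lambda X_tr, y_tr, X_te: [p[0] * x + p[1] for x in X_te]


X = list(range(1, 21))
y = [2.0 * x + 1.0 for x in X]
params, err = genetic_search(lambda p: cv_fitness(X, y, make_fit_predict(p)),
                             bounds=[(0.0, 4.0), (0.0, 4.0)])
print(params, err)  # params should move toward [2.0, 1.0]
```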

#### 4. Experiment Setup and Case Study

##### 4.1. Data Source

The experimental traffic flow data come from loop detectors on the north–south viaduct expressway in Shanghai, China. This segment includes 24 mainline detecting sections and 30 ramp detecting sections, equipped with 88 mainline loop detectors and 60 ramp loop detectors, respectively. The data were collected on five consecutive Mondays, from September 1, 2008, to September 29, 2008, at an original aggregation interval of 5 min. Figure 1 shows the traffic flow time series for the five Mondays.