Mathematical Problems in Engineering

Volume 2016 (2016), Article ID 5128528, 13 pages

http://dx.doi.org/10.1155/2016/5128528

## Fuzzy Prediction for Traffic Flow Based on Delta Test

Beijing Key Lab of Traffic Engineering, Beijing University of Technology, 100 Pingleyuan, Chaoyang District, Beijing 100124, China

Received 15 March 2016; Revised 9 July 2016; Accepted 4 August 2016

Academic Editor: Alberto Borboni

Copyright © 2016 Yang Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper presents a novel approach to one-step-forward prediction of traffic flow based on fuzzy reasoning. The successful construction of a competent Sugeno-type fuzzy inference system relies largely on a proper choice of input dimension and accurate estimation of structure parameters and rules. The first issue is addressed with a proposed method, based on the $\delta$-test, which can simultaneously determine the input dimension and reduce the noise level. In response to the second issue, two clustering techniques, based on nearest-neighbor clustering and Gaussian mixture models, are successively employed to determine the antecedent parameters and rules, and the consequent parameters are estimated by the least square estimation technique. A number of experiments have been performed on one week of traffic flow data to evaluate the proposed approach in terms of denoising, prediction performance, overfitting, and so forth. The experimental results demonstrate that the proposed prediction approach is effective in removing noise and constructing a competent and compact fuzzy inference system without significant overfitting.

#### 1. Introduction

Since intelligent transportation systems (ITS) emerged, research efforts have been continuously devoted to traffic flow prediction, as live and accurate prediction of traffic flow is a prerequisite for efficient traffic management and control. While a wide spectrum of prediction techniques, such as the Kalman filter [1] and its extension [2], support vector machines [3], Bayesian networks [4], and hybrid approaches [5–8], has been reported in the literature, the methodology behind these techniques is based either on influencing factors or on historical data of traffic flow. The influencing factors (e.g., weather) are used as inputs to the underlying traffic system, which outputs traffic flows that reflect traffic states. The correlation between the inputs and outputs, established using either an analytical model or a data-derived approach, is used as the primary basis for reasoning about future traffic flow. The methodology behind the historical data-based approaches, on the other hand, rests on the fundamental idea that the future depends on the past. For a sequence of traffic flow observations with equal sampling interval, called a traffic flow time series, the dependence between the traffic flow to be predicted at time $t+1$, $y_{t+1}$, and its past can be formulated as follows:

$$y_{t+1} = f(y_t, y_{t-1}, \ldots, y_{t-d+1}) + r_{t+1},$$

where $d$ is the number of past observations used and the term $r_{t+1}$, called the residual in this paper, is the part that cannot be accounted for by the model $f$, due to either a lack of functional determination or real noise. Therefore, the term *residual* in this paper is defined to include both modeling error and real noise.
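As an illustration of the historical data-based formulation, the following minimal sketch turns a traffic flow time series into input vectors of past observations and the value that follows each of them. The function name `make_lagged_samples`, the parameter `d` (the number of past observations used as input), and the sample flow values are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def make_lagged_samples(series, d):
    """Build (X, y) pairs for one-step-ahead prediction.

    Row i of X holds d consecutive observations (oldest first);
    y[i] is the observation that immediately follows them.
    """
    series = np.asarray(series, dtype=float)
    if d < 1 or d >= len(series):
        raise ValueError("d must satisfy 1 <= d < len(series)")
    n = len(series) - d                      # number of usable samples
    X = np.stack([series[i:i + d] for i in range(n)])
    y = series[d:]                           # target that follows each window
    return X, y

# Made-up vehicle counts, for illustration only.
flows = [10, 12, 15, 14, 13, 16]
X, y = make_lagged_samples(flows, d=3)
# X rows: [10, 12, 15], [12, 15, 14], [15, 14, 13]; y: [14, 13, 16]
```

Any regression model fitted on such `(X, y)` pairs plays the role of the model in the formulation above, and whatever it cannot explain ends up in the residual.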

The prediction performance by the methods based on influencing factors is considerably dependent on the quality of data collected for different factors which may not be guaranteed always. Therefore, in our ongoing work, the methodology based on historical data is adopted. Also, as only one type of data (i.e., traffic flow) is required, there is no need to manipulate units for different types of data. This paper reports our work on the prediction approach for one step ahead (i.e., short-term).

To model this dependence, a number of techniques have been proposed, ranging from the Kalman filter [1, 2], artificial neural networks [9], and nonparametric regression [10] to hybrid approaches [11]. Our research has focused on the modeling approach based on the fuzzy inference system (FIS), because it is tolerant to noise, resistant to uncertainty, and easy to combine with expert and field knowledge [12]. However, research activity in this area has been relatively sparse, with only a few notable contributions reported in the literature over the last decade. Zhang and Ye proposed a prediction methodology that uses a fuzzy logic system to fuse the outputs of two methods chosen from autoregressive integrated moving average, backpropagation neural networks, exponential smoothing, and the Kalman filter, resulting in four different combinations [6]. A similar idea has been adopted in [7], where the two methods combined by a fuzzy logic model are the historical mean and artificial neural network models. Paper [8] describes a hybrid methodology in which two fuzzy rule-based systems are constructed, one estimating the next flow based on the current flow only and the other predicting the one-step-ahead flow based on the current flow at the current location and at the upstream location. A genetic algorithm is used to tune the parameters of the fuzzy rule-based systems by minimizing the mean absolute relative error between the estimated and the observed values. Paper [13] presents an approach to short-term traffic flow prediction based primarily on a Sugeno fuzzy system (also known as a TSK fuzzy system). The initial structure is formed by partitioning the input vector space with the mean shift clustering algorithm and is subsequently optimized by the mean firing technique to eliminate redundant structure; finally, the remaining parameters are determined by particle swarm optimization with the aim of minimizing the root mean squared error (RMSE).

A multiple-input (or single input vector), single-output (MISO) FIS captures the conditional dependence between the inputs and the output, $y = f(x_1, \ldots, x_d)$, using rules of the following form with fuzzified inputs:

$$R^k:\ \text{IF } x_1 \text{ is } A_1^k \text{ AND } \cdots \text{ AND } x_d \text{ is } A_d^k,\ \text{THEN } y \text{ is } B^k, \quad k = 1, \ldots, K.$$

Whether the consequent part $B^k$ of the fuzzy rules is a fuzzy set or a linear function of the input vector yields the two common types of FIS, namely, Mamdani and Sugeno FISs. Although the prediction method proposed in this paper adopts the Sugeno approach, it would also be interesting to investigate the predictive capability of the Mamdani FIS.

To create a competent Sugeno FIS, careful decisions should be made on the following issues: (1) the input dimension: although all past traffic flow recordings should in principle be taken into account when predicting the future flow, it is impractical to use the whole set of historical data because the computational complexity grows exponentially with the input dimension (the curse of dimensionality) [14]; (2) parameters and rules: a large number of membership functions can reduce model error, but the model again suffers from increased computational complexity. To balance complexity against accuracy, a proper choice should be made on the number of membership functions, and the associated parameters should be determined in an optimal or near-optimal fashion. As a Sugeno FIS is adopted as the basis for the prediction, the set of parameters for the consequent part should also be optimized to improve modeling performance.

This paper proposes a novel approach to one-step-ahead prediction of traffic flow based on a Sugeno FIS (multistep-ahead prediction can be realized simply by recursive prediction), with a particular emphasis on the abovementioned issues. Firstly, a nonparametric residual variance estimation method, called the $\delta$-test [15], is used to measure the noise level as it is reduced by a wavelet-based denoising technique, and the input dimension is determined once the noise reaches a reasonable level. To address the second issue, clustering is performed using the nearest-neighbor clustering (NNC) method [16] to obtain the number of membership functions, and a Gaussian mixture model (GMM) [17] is subsequently applied to determine the parameters associated with the membership functions. Finally, the consequent parameters are obtained using the least square estimation (LSE) technique [18].
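To make the idea of nonparametric residual variance estimation concrete, the sketch below implements the commonly used first-nearest-neighbor form of the delta test: half the mean squared difference between each output and the output of the sample whose input is closest. This is a generic illustration and may differ from the exact estimator used in this paper; the function name `delta_test` and the synthetic sine data are assumptions for the example.

```python
import numpy as np

def delta_test(X, y):
    """First-nearest-neighbor delta test estimate of the residual variance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                       # treat a 1-D input as one feature
    # Pairwise squared Euclidean distances between all input vectors.
    diff = X[:, None, :] - X[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist2, np.inf)          # a point is not its own neighbor
    nn = np.argmin(dist2, axis=1)            # index of each nearest neighbor
    return 0.5 * np.mean((y[nn] - y) ** 2)

# Synthetic check (assumed data): a smooth function with and without noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1000)
clean = np.sin(2 * np.pi * x)
noisy = clean + rng.normal(0.0, 0.5, size=x.size)   # noise variance 0.25

est_clean = delta_test(x, clean)   # should be close to zero
est_noisy = delta_test(x, noisy)   # should land near the noise variance
```

Because no model is fitted, the estimate can be computed for each candidate input dimension and compared directly. The pairwise distance matrix makes this sketch quadratic in the number of samples, so a tree-based neighbor search would be preferable for long series.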

#### 2. Prediction Approach

##### 2.1. Approach Outline

The prediction approach presented in this paper uses a first-order Sugeno FIS as the basis of the predictor. This choice was made because the Sugeno FIS has been proved to be a universal approximator [19–21]. Additionally, FIS has been reported to be tolerant to noise and resistant to uncertainty [12]. To achieve a concrete implementation of the FIS, we follow the commonly adopted model structure: T-norm for conjunction operations, Gaussian membership functions, a linear function for the consequent part, and product inference of rules.
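The model structure just described can be sketched as a short forward pass. This is a minimal illustration under stated assumptions rather than the authors' implementation: the product is used as the T-norm, the rule outputs are combined by the weighted average that is usual for Sugeno systems, and the names `sugeno_predict`, `centers`, `sigmas`, and `coeffs` as well as the toy parameter values are hypothetical.

```python
import numpy as np

def sugeno_predict(X, centers, sigmas, coeffs):
    """Evaluate a first-order Sugeno FIS.

    X       : (n, d) input vectors
    centers : (K, d) Gaussian membership centers, one row per rule
    sigmas  : (K, d) Gaussian membership widths, one row per rule
    coeffs  : (K, d + 1) linear consequents; the last column is the bias
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    centers = np.asarray(centers, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    # Gaussian membership of every input component in every rule: (n, K, d).
    z = (X[:, None, :] - centers[None, :, :]) / sigmas[None, :, :]
    mu = np.exp(-0.5 * z ** 2)
    # Product T-norm over the d antecedents gives the firing strengths: (n, K).
    w = mu.prod(axis=2)
    # Normalize; assumes at least one rule fires with nonzero strength.
    w_norm = w / w.sum(axis=1, keepdims=True)
    # Linear consequent of each rule evaluated at each input: (n, K).
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    rule_out = X1 @ coeffs.T
    return (w_norm * rule_out).sum(axis=1)

# Toy system with two rules on a one-dimensional input.
centers = np.array([[0.0], [2.0]])
sigmas = np.array([[1.0], [1.0]])
coeffs = np.array([[1.0, 0.0],    # rule 1: y = x
                   [0.0, 5.0]])   # rule 2: y = 5
pred = sugeno_predict([[1.0]], centers, sigmas, coeffs)
# At x = 1 both rules fire equally, so the output is the mean of 1 and 5.
```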

Figure 1 schematically shows the framework used to build the FIS from historical observations of traffic flow. It is assumed that appropriate preprocessing has been carried out to exclude any outliers and fill in all missing recordings in the traffic flow time series. The algorithm first generates a set of input vectors by incrementing the dimension up to the maximum dimension specified by the user. For the set of input vectors generated, the $\delta$-test is used to estimate the residual variance for each input vector without explicitly building a model. Based on the estimated residual variances, the algorithm then evaluates whether any input vector satisfies the requirement set by the user. If none of them meets the requirement, which implies that the noise level remains high, the time series is processed by a wavelet-based technique to reduce the noise level. This process is repeated until the residual variance is reduced enough to meet the requirement. During the second step, two clustering algorithms (NNC and GMM) are successively employed to determine the number of membership functions and the associated parameters, respectively. The clustering results obtained from the GMM algorithm also help determine the rules of the FIS to be constructed. In our approach, the LSE technique is used to determine the set of consequent parameters that minimizes the mean square error (MSE) between the model output and the training samples. These steps and their specific implementation are discussed in detail in the following subsections.
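The final step of the framework, estimating the consequent parameters by least squares, can be illustrated as follows. Once the antecedent (membership) parameters are fixed, the output of a first-order Sugeno FIS is linear in the consequent parameters, so they can be obtained with an ordinary linear least-squares solve. The sketch assumes Gaussian memberships combined by product and normalized firing strengths; the function names, parameter names, and synthetic data are hypothetical and only serve the example.

```python
import numpy as np

def normalized_firing(X, centers, sigmas):
    """Normalized firing strengths (n, K) for Gaussian antecedents."""
    z = (X[:, None, :] - centers[None, :, :]) / sigmas[None, :, :]
    w = np.exp(-0.5 * (z ** 2).sum(axis=2))   # product of Gaussians
    return w / w.sum(axis=1, keepdims=True)

def fit_consequents(X, y, centers, sigmas):
    """Least-squares estimate of the (K, d + 1) consequent parameters."""
    n, d = X.shape
    w = normalized_firing(X, centers, sigmas)
    X1 = np.hstack([X, np.ones((n, 1))])      # append the bias input
    # Column block k of the design matrix holds w[:, k] * [x, 1].
    Phi = (w[:, :, None] * X1[:, None, :]).reshape(n, -1)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta.reshape(centers.shape[0], d + 1)

# Synthetic data generated from a known two-rule system (assumed values).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 4.0, size=(200, 1))
centers = np.array([[1.0], [3.0]])
sigmas = np.array([[1.0], [1.0]])
true_coeffs = np.array([[2.0, 1.0],
                        [-1.0, 4.0]])
w = normalized_firing(X, centers, sigmas)
X1 = np.hstack([X, np.ones((X.shape[0], 1))])
y = (w * (X1 @ true_coeffs.T)).sum(axis=1)    # noise-free targets

est_coeffs = fit_consequents(X, y, centers, sigmas)
```

With noise-free targets the known coefficients are recovered up to numerical precision; with real traffic data the same solve returns the parameters that minimize the MSE on the training samples for the given antecedents.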