Mathematical Problems in Engineering

Volume 2016, Article ID 9236156, 13 pages

http://dx.doi.org/10.1155/2016/9236156

## Short-Term Speed Prediction Using Remote Microwave Sensor Data: Machine Learning versus Statistical Model

^{1}Department of Automation, Tsinghua National Laboratory for Information Science and Technology (TNlist), Tsinghua University, Beijing 100084, China

^{2}Key Laboratory of Road and Traffic Engineering of Ministry of Education, Tongji University, Shanghai 201804, China

^{3}School of Transportation Science and Engineering, Harbin Institute of Technology, Harbin 150001, China

^{4}Department of Civil & Environmental Engineering, University of Washington, P.O. Box 352700, Seattle, WA 98195, USA

Received 11 December 2015; Accepted 19 January 2016

Academic Editor: Valentin Lychagin

Copyright © 2016 Han Jiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Recently, a number of short-term speed prediction approaches have been developed, most of which are based on machine learning or statistical theory. This paper examines the multistep-ahead prediction performance of eight different models using 2-minute travel speed data collected from three Remote Traffic Microwave Sensors located on a southbound segment of the 4th ring road in Beijing. Specifically, we consider five machine learning methods as candidates: Back Propagation Neural Network (BPNN), nonlinear autoregressive model with exogenous inputs neural network (NARXNN), support vector machine with a radial basis function kernel (SVM-RBF), support vector machine with a linear kernel (SVM-LIN), and Multilinear Regression (MLR). Three statistical models are also selected: Autoregressive Integrated Moving Average (ARIMA), Vector Autoregression (VAR), and the Space-Time (ST) model. The prediction results show the following: (1) the prediction accuracy of speed deteriorates as the number of prediction time steps increases for all models; (2) BPNN, NARXNN, and SVM-RBF clearly outperform two traditional statistical models, ARIMA and VAR; (3) the prediction performance of ANN is superior to that of SVM and MLR; (4) as the time step increases, the ST model consistently provides a lower MAE than ARIMA and VAR.

#### 1. Introduction

Collecting high-quality traffic information is key to the performance of Intelligent Transportation Systems (ITS). Accurate prediction of future traffic flow patterns is becoming more important in Advanced Traffic Management Systems (ATMS) and Advanced Traveler Information Systems (ATIS). Using forecasted information, such as traffic volume, travel time, and traffic conditions, travelers can replan their routes to save time and cost. Transportation agencies can also use forecasted information to manage the traffic system more efficiently. Travel speed is an important indicator of traffic conditions in road networks. Alongside common collection approaches such as loop detectors and GPS equipment, the Remote Traffic Microwave Sensor (RTMS) is an important nonintrusive device that directly detects the instantaneous travel speed of vehicles. An RTMS is installed at the side of the road and can detect moving or stationary objects without interrupting traffic flow. It can measure traffic volume, occupancy, and speed for multiple lanes simultaneously, in some cases even in severe environments. Because of its high measurement accuracy relative to a single loop detector [1], travel speed data collected from RTMS are used as the data source for constructing the prediction models in this paper.

Short-term traffic flow forecasting models rely on the regularity in historical data to predict traffic patterns in future time periods. A good prediction algorithm usually requires advanced techniques and computational capacity to capture the high-dimensional and nonlinear characteristics of traffic flow data. In the past few years, a large number of algorithms have been proposed to address traffic prediction problems. Vlahogianni et al. [2] summarized existing short-term traffic prediction algorithms up to 2003, and more recently Vlahogianni et al. [3] updated the review to cover the literature from 2004 to 2013. Van Lint and Van Hinsbergen [4] reviewed existing applications of neural networks and artificial intelligence in short-term traffic forecasting and classified prediction models into two major categories: parametric and nonparametric approaches. Existing traffic prediction algorithms include statistical prediction methods [5–10], neural networks [11–15], support vector regression [16–20], Kalman filter theory [21–26], and hybrid approaches [27–32].

For statistical models, Cetin and Comert [5] proposed a new statistical change-point detection algorithm to predict short-term traffic flow, in which the expectation maximization and CUSUM (cumulative sum) algorithms are implemented to detect shifts. Chandra and Al-Deek [7] incorporated the effect of upstream and downstream locations on the traffic at a specific location into a traditional Autoregressive Integrated Moving Average (ARIMA) model. Williams and Hoel [8] modeled traffic flow as a seasonal ARIMA process for forecasting. For neural networks, Ye et al. [11] used a neural network model to forecast traffic flow time series based on GPS data recorded at irregular time intervals. To improve forecasting performance, the neural network model was used to aggregate speed and acceleration information from the current forecasting segment and adjacent segments. Van Lint et al. [13] proposed a state-space neural network model that uses upstream and downstream traffic as model input to predict travel time in the presence of missing or corrupt input data. Ma et al. [14] developed a Long Short-Term Memory (LSTM) neural network to predict travel speed from RTMS detection data in Beijing; the proposed model can capture long-term temporal dependencies in time series and automatically determine the optimal time window. For support vector machines, Wu et al. [17] applied support vector regression (SVR) to travel time prediction and compared the proposed model with several traditional travel time prediction methods on a highway network. Zhang and Liu [18] combined a state-space approach with least squares support vector machines (LS-SVMs) to forecast the travel time index. Asif et al. [20] first analyzed spatiotemporal trends for individual links at the network level and then constructed an SVR model to predict travel speed in a large interconnected road network. For Kalman filter theory, Chen and Grant-Muller [22] proposed a Kalman filter type network to predict traffic flow and discussed the effect of the initial network parameters on prediction performance. Chien and Kuchipudi [23] used the Kalman filtering algorithm to predict travel time because it continuously updates the state variable as new observations become available. Their empirical results indicated that prediction based on historic path-based data performs better than prediction based on link-based data during peak hours. Wang et al. [26] proposed a new online-learning approach based on the extended Kalman filter (EKF) to predict highway travel time.

To improve prediction performance further, many scholars have proposed hybrid models that combine the advantages of different kinds of methods. Dimitriou et al. [27] used a genetic algorithm to structure an adaptive hybrid fuzzy rule-based system for forecasting traffic flow in urban arterial networks. Zheng et al. [28] introduced a neural network model combined with the theory of conditional probability and Bayes’ rule; in an experimental test on Singapore’s Ayer Rajah Expressway, the combined model outperformed the individual predictors. Dong et al. [32] proposed a hybrid support vector machine that combines statistical and heuristic models to account for the spatial-temporal patterns in traffic flow.

According to the reviewed literature, most traffic prediction models are based on statistical methods or machine learning techniques. These two types of models have different characteristics. Statistical models provide good theoretical interpretability with a clear computational structure. Machine learning models take a “black box” approach to predicting traffic conditions and often lack a good interpretation of the model; compared with statistical models, however, they are more flexible, requiring few or no prior assumptions about the input variables. In addition, they are more capable of handling outliers and missing or noisy data [33]. In this study, we compare the prediction performance of statistical models and machine learning models. For statistical models, we select ARIMA, Vector Autoregression (VAR), and Space-Time (ST). For machine learning models, we choose Artificial Neural Network (ANN), SVM, and Multilinear Regression (MLR) as candidates. The travel speed data come from RTMS detectors on the 4th ring road freeway in Beijing. The contributions of this paper are as follows: a comprehensive comparison of the speed prediction performance of different machine learning and statistical models; an analysis of prediction accuracy at different numbers of forecasting steps ahead; and an evaluation of the models’ performance under different scenarios.

The remainder of this paper is organized as follows. Section 2 briefly introduces the models used in this study. The data source and analysis are described in Section 3. Section 4 discusses the results and compares the prediction accuracy of the different models. Section 5 concludes the paper.

#### 2. Methodology

##### 2.1. Statistical Models

In this section, we briefly introduce three statistical methods (i.e., ARIMA, VAR, and ST) considered in this study.

###### 2.1.1. ARIMA

The Autoregressive Integrated Moving Average (ARIMA) model contains the following parameters: $p$ is the number of autoregressive terms, $d$ is the number of nonseasonal differences, and $q$ is the number of lagged forecast errors. An ARIMA model can be regarded as a generalization of the autoregressive moving average (ARMA) model. The mathematical formulation of an ARMA($p$, $q$) process is defined as follows:

$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}, \tag{1}$$

where $\{X_t\}$ is stationary, $\{\varepsilon_t\}$ is a normal white noise series with mean zero and variance $\sigma^2$, $\phi_i$ and $\theta_j$ are parameters for the autoregressive and the moving average terms, and the polynomials $\phi(B)$ and $\theta(B)$ have no common factors. Assuming $B X_t = X_{t-1}$, $B \varepsilon_t = \varepsilon_{t-1}$, $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$, and $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, the ARMA model can be written as follows:

$$\phi(B) X_t = \theta(B) \varepsilon_t. \tag{2}$$

The ARMA model requires the data series to be stationary. When the time series is nonstationary and therefore cannot be modeled directly as an ARMA process, the ARIMA model is used instead. In the ARIMA model, the integrated part of order $d$, denoted as $I(d)$, takes the $d$th difference of the original data, which can transform the original data into a stationary series. The mathematical equation of an ARIMA model is

$$\phi(B) (1 - B)^d X_t = \theta(B) \varepsilon_t. \tag{3}$$
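
The role of the integrated part can be illustrated with a minimal sketch in plain Python. It is not the estimation procedure used in this paper: it assumes the simplest nontrivial case, ARIMA(1, 1, 0) without an intercept, fits the single autoregressive coefficient by least squares on the differenced series, and then undoes the differencing to obtain a one-step forecast. The function names and the sample series are hypothetical.

```python
def difference(series, d=1):
    """Apply the d-th order difference (1 - B)^d to a list of numbers."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t (no intercept)."""
    num = sum(a * b for a, b in zip(series, series[1:]))
    den = sum(a * a for a in series[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series):
    """One-step forecast from an ARIMA(1, 1, 0) sketch (assumed order)."""
    diffs = difference(series, d=1)      # make the series (closer to) stationary
    phi = fit_ar1(diffs)                 # AR(1) on the differenced series
    next_diff = phi * diffs[-1]          # forecast of the next difference
    return series[-1] + next_diff        # undo the differencing


# Hypothetical toy series, chosen so the arithmetic is easy to follow.
toy = [0, 1, 3, 7, 15]
next_value = forecast_arima_110(toy)
```

In practice the orders $p$, $d$, and $q$ would be chosen from the data and the parameters estimated with a dedicated time series library; the sketch only shows how differencing and its inversion wrap around a stationary model.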

###### 2.1.2. VAR

The Vector Autoregression (VAR) model can capture the linear interdependencies among multiple time series and can therefore account for the effect of neighboring stations when predicting future speed. Here, a 3-equation model is used, and its formulation is defined as follows:

$$y_t = c + A_1 y_{t-1} + \cdots + A_p y_{t-p} + e_t, \tag{4}$$

where $y_t$ is the 3 × 1 vector of variables, $c$ is the constant term, $A_1$ through $A_p$ are coefficient matrices, and $e_t$ is the corresponding independently and identically distributed random vector with $E(e_t) = 0$ and time invariant positive definite covariance matrix $\Sigma$. Before applying the model, the characteristic polynomial is evaluated to ensure stability:

$$\det\left(I_3 - A_1 z - \cdots - A_p z^p\right) = 0, \tag{5}$$

where $I_3$ is a 3 × 3 identity matrix. The necessary and sufficient condition for stability is that all characteristic roots lie outside the unit circle.
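
As an illustration only, the following sketch handles the first-order case, VAR(1), in plain Python. For a single coefficient matrix, the stability condition above is equivalent to the spectral radius of that matrix being below 1, and the sketch estimates the spectral radius with Gelfand's formula (the $k$th root of the norm of the $k$th matrix power) rather than by solving the characteristic polynomial. The multistep forecast simply iterates the model with the noise term set to its mean of zero. All function names, the constant vector, and the coefficient matrix are hypothetical and are not estimates from the paper's data.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def frobenius(a):
    """Frobenius norm of a matrix."""
    return math.sqrt(sum(v * v for row in a for v in row))


def spectral_radius_estimate(a, squarings=10):
    """Estimate rho(A) as ||A^k||^(1/k), k = 2**squarings (Gelfand's formula).

    The matrix is renormalised after every squaring so that the powers
    neither overflow nor underflow; the log of the true norm is tracked.
    """
    norm = frobenius(a)
    if norm == 0.0:
        return 0.0
    b = [[v / norm for v in row] for row in a]
    log_norm = math.log(norm)            # log ||A^k|| for k = 1
    k = 1
    for _ in range(squarings):
        b = matmul(b, b)
        norm = frobenius(b)
        if norm == 0.0:                  # numerically nilpotent matrix
            return 0.0
        b = [[v / norm for v in row] for row in b]
        log_norm = 2.0 * log_norm + math.log(norm)
        k *= 2
    return math.exp(log_norm / k)


def forecast_var1(c, a, y, steps):
    """Iterate y_{t+1} = c + A y_t and return the list of multistep forecasts."""
    out = []
    for _ in range(steps):
        y = [c[i] + sum(a[i][j] * y[j] for j in range(len(y)))
             for i in range(len(y))]
        out.append(y)
    return out


# Hypothetical VAR(1) for stations A, B, C (values are illustrative only).
A1 = [[0.5, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.2]]
is_stable = spectral_radius_estimate(A1) < 1.0
```

The estimate is approximate and can be slow to converge when the spectral radius is very close to 1, so a borderline result would call for an exact eigenvalue computation.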

###### 2.1.3. Space-Time Model

The Space-Time (ST) model is a probabilistic modeling approach that can provide the point prediction of future observations [10]. In probabilistic speed prediction, the commonly used normal distribution is adopted for speed data. Thus, this study assumes that the speed at time $t + k$ at the target station, $y_{t+k}$, follows a normal distribution. The point prediction of $y_{t+k}$ is the mean, $\mu_{t+k}$, of the normal distribution. Then, $\mu_{t+k}$ is fitted by a linear combination of the present and past values of the speed series at all stations. For example, for station B, when $k = 1$ (i.e., 2-minute ahead prediction),

$$\mu_{t+1} = a_0 + \sum_{i \ge 0} \left( a_i\, y^{A}_{t-i} + b_i\, y^{B}_{t-i} + c_i\, y^{C}_{t-i} \right), \tag{6}$$

where $y^{A}_{t}$, $y^{B}_{t}$, and $y^{C}_{t}$ are the 2-minute average speeds at stations A, B, and C at time $t$; stations A and C are upstream and downstream of station B, respectively; and $a_0$, $a_i$, $b_i$, and $c_i$ are model coefficients. Predictor variables for $\mu_{t+k}$ are selected based on an analysis of the speed data from the first week of the dataset using a stepwise forward search (refer to [10] for details about the predictor variable selection algorithm).
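
A minimal sketch of the point prediction for station B follows, assuming for illustration that only the current speeds at the three stations were selected as predictors. The coefficients, speeds, and the residual standard deviation are hypothetical values, not fitted results; the standard deviation is introduced only to show that a normal predictive distribution also yields an interval around the point prediction.

```python
from statistics import NormalDist


def st_point_prediction(intercept, coefs, speeds):
    """Mean of the predictive normal: intercept + sum of coef * predictor."""
    return intercept + sum(coefs[name] * speeds[name] for name in coefs)


# Hypothetical coefficients and latest 2-minute speeds (km/h) at A, B, C.
coefs = {"A_t": 0.2, "B_t": 0.6, "C_t": 0.1}
speeds = {"A_t": 60.0, "B_t": 50.0, "C_t": 40.0}
mu = st_point_prediction(4.0, coefs, speeds)   # point prediction for station B

# Assumed residual standard deviation (illustrative, not from the paper).
sigma = 5.0
dist = NormalDist(mu, sigma)
lower, upper = dist.inv_cdf(0.025), dist.inv_cdf(0.975)   # central 95% interval
```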

##### 2.2. Machine Learning Models

In this section, we select three models, Artificial Neural Network, support vector machine, and Multilinear Regression, to predict travel speed; the following subsections briefly describe these three models.

###### 2.2.1. Artificial Neural Network

Artificial Neural Network (ANN) is a popular tool for traffic flow prediction because of its capability of handling multidimensional data, flexible model structure, strong generalization and learning ability, and adaptability [33]. Unlike the statistical methods, ANN does not require underlying assumptions about the data and is robust to missing and noisy inputs [33]. An ANN model is generally constructed as a multilayer system and is typically defined by three types of parameters: the interconnection pattern between different layers; the learning process for updating the weights of the layers; and the activation function that converts input to output activation. An ANN system can be represented as follows:

$$y = f_2\left( \sum_{j=1}^{m} w_j\, f_1\left( \sum_{i=1}^{n} w_{ij} x_i \right) \right), \tag{7}$$

where $x_i$ are the inputs and $y$ is the output, $n$ and $m$, respectively, represent the number of neurons in the input layer and hidden layer, and $f_1$ and $f_2$ are the transfer functions for the input layer and hidden layer. The weight matrices $w_{ij}$ and $w_j$, respectively, refer to the weight values for neurons in the input layer and hidden layer. To minimize the sum of estimated errors from ANN, a number of optimization algorithms have been developed, including back propagation, the Levenberg-Marquardt method, and genetic algorithms. Details about ANN can be found in [11–15, 33]. As an important member of the ANN family, the nonlinear autoregressive model with exogenous inputs neural network (NARXNN) places a delay line on the inputs, and the outputs are fed back to the input through another delay line. This is a further extension of the time delay neural network, since the NARXNN considers not only its own previous outputs but also the exogenous inputs [14].
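
The forward computation of a single-hidden-layer network can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the configuration used in the paper: biases are omitted, the inner transfer function is taken to be `tanh`, the outer one is the identity, and the weights are hypothetical. Training (for example, by back propagation) is not shown.

```python
import math


def forward(x, w_hidden, w_out, f_hidden=math.tanh, f_out=lambda v: v):
    """One forward pass of a single-hidden-layer network (biases omitted).

    x        : list of n inputs
    w_hidden : m rows of n weights (input -> hidden)
    w_out    : list of m weights (hidden -> output)
    """
    hidden = [f_hidden(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return f_out(sum(w * h for w, h in zip(w_out, hidden)))


# Hypothetical example: 3 lagged (scaled) speeds, 2 hidden neurons.
lags = [0.6, 0.5, 0.4]
w_hidden = [[0.1, 0.2, 0.3], [-0.3, 0.2, 0.1]]
w_out = [0.7, -0.4]
prediction = forward(lags, w_hidden, w_out)
```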

###### 2.2.2. Support Vector Machine

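The main idea of support vector regression is to map the data into a high-dimensional feature space through a nonlinear mapping and then construct a linear regression in this space. Consider a set of data points $\{(x_i, y_i)\}_{i=1}^{N}$ for regression, where $N$ is the number of training samples. The SVM regression function is formulated as follows:

$$f(x) = w^{T} \varphi(x) + b, \tag{8}$$

where $w$ is a vector in a feature space $F$, $b$ is the bias term, and $\varphi(x)$ is called the feature, which maps the input $x$ to a vector in $F$. Assume an $\varepsilon$-insensitive loss function:

$$L_{\varepsilon}\left(y, f(x)\right) = \begin{cases} 0, & \left| y - f(x) \right| \le \varepsilon, \\ \left| y - f(x) \right| - \varepsilon, & \text{otherwise}. \end{cases} \tag{9}$$

Then, $w$ and $b$ are estimated by solving the following optimization problem:

$$\min_{w, b}\; \frac{1}{2} \left\| w \right\|^2 + C \sum_{i=1}^{N} L_{\varepsilon}\left(y_i, f(x_i)\right), \tag{10}$$

where $\varepsilon$ is the maximum deviation allowed and $C$ represents the penalty associated with excess deviation during the training process, which controls the trade-off between the empirical risk and the smoothness of the model. The positive slack variables $\xi_i$ and $\xi_i^{*}$ are introduced to represent the size of the positive and negative excess deviation, respectively. Thus, (10) is transformed to the following constrained formulation:

$$\begin{aligned} \min_{w, b, \xi, \xi^{*}}\quad & \frac{1}{2} \left\| w \right\|^2 + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \\ \text{s.t.}\quad & y_i - w^{T} \varphi(x_i) - b \le \varepsilon + \xi_i, \\ & w^{T} \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\ & \xi_i,\; \xi_i^{*} \ge 0, \quad i = 1, \ldots, N. \end{aligned} \tag{11}$$

The first term of (11), $\frac{1}{2} \left\| w \right\|^2$, is the regularization term; thus, it controls the function capacity. The second term, $C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right)$, is the empirical error measured by the $\varepsilon$-insensitive loss function. Applying the appropriate Karush-Kuhn-Tucker (KKT) conditions to (11) yields the following dual form of the optimization problem:

$$\begin{aligned} \max_{\alpha, \alpha^{*}}\quad & -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \alpha_i - \alpha_i^{*} \right) \left( \alpha_j - \alpha_j^{*} \right) K(x_i, x_j) - \varepsilon \sum_{i=1}^{N} \left( \alpha_i + \alpha_i^{*} \right) + \sum_{i=1}^{N} y_i \left( \alpha_i - \alpha_i^{*} \right) \\ \text{s.t.}\quad & \sum_{i=1}^{N} \left( \alpha_i - \alpha_i^{*} \right) = 0, \quad 0 \le \alpha_i,\; \alpha_i^{*} \le C. \end{aligned} \tag{12}$$

Therefore, the SVM equation for nonlinear predictions becomes

$$f(x) = \sum_{i=1}^{N} \left( \alpha_i - \alpha_i^{*} \right) K(x_i, x) + b, \tag{13}$$

where $K(x_i, x) = \varphi(x_i)^{T} \varphi(x)$ is called the kernel function, and $\alpha_i$ and $\alpha_i^{*}$ are the solution to the dual problem. There are four conventional kernel functions: linear, radial basis function (RBF), polynomial, and sigmoid. In this study, we select two common functions, linear and RBF, to construct the SVM model. First, these two functions are widely used in prediction and classification. Second, the linear function has the advantages of simple construction and low computational time, while the RBF function has a nonlinear structure and produces reliable prediction performance when its parameters are tuned. For the linear function,

$$K(x_i, x_j) = x_i^{T} x_j. \tag{14}$$

For the RBF kernel function,

$$K(x_i, x_j) = \exp\left( -\gamma \left\| x_i - x_j \right\|^2 \right), \tag{15}$$

where $\gamma$ is the parameter in the kernel function.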
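
The building blocks of this formulation can be written out directly. The sketch below, in plain Python, implements the insensitive loss, the linear and RBF kernels, and the kernel-expansion prediction; it does not solve the dual problem, so the dual coefficients (the differences between the two sets of multipliers), the bias, and the support vectors are hypothetical inputs that a solver would normally supply. The function names are illustrative.

```python
import math


def eps_insensitive_loss(y, y_hat, eps):
    """Zero inside the eps-tube, linear in the excess deviation outside it."""
    return max(0.0, abs(y - y_hat) - eps)


def linear_kernel(u, v):
    """Inner product of two input vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma):
    """exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_vectors, dual_coefs, b, kernel):
    """Kernel expansion: sum_i dual_coefs[i] * K(x_i, x) + b.

    dual_coefs[i] stands for (alpha_i - alpha_i^*), assumed to come from a solver.
    """
    return sum(c * kernel(sv, x) for c, sv in zip(dual_coefs, support_vectors)) + b


# Hypothetical support vectors and coefficients, for illustration only.
svs = [[1.0, 0.0], [0.0, 1.0]]
coefs = [2.0, -1.0]
y_lin = svr_predict([1.0, 0.0], svs, coefs, 0.5, linear_kernel)
y_rbf = svr_predict([1.0, 0.0], svs, coefs, 0.5,
                    lambda u, v: rbf_kernel(u, v, gamma=0.5))
```

Passing the kernel as an argument keeps the prediction function identical for the linear and RBF variants, which mirrors the fact that the two models differ only in the choice of kernel.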

###### 2.2.3. Multilinear Regression

Compared with the two supervised algorithms above, multiple linear regression is simpler to construct and belongs to the regression learning category. In MLR, the prediction values can be calculated by the following equation:

$$\hat{y}_t = \beta_0 + \sum_{i=1}^{n} \beta_i x_{t-i}, \tag{16}$$

where $\hat{y}_t$ represents the prediction value at the $t$th period, the independent variable $x_{t-i}$ is the speed at the previous $i$th period, $n$ is the number of historical data points considered in MLR, and $\beta_0$ and $\beta_i$ are the regression parameters, which are optimized on the training samples. The prediction values in the testing dataset are estimated from (16).
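
To make the lag structure concrete, the sketch below builds the lagged samples from a speed series and evaluates the regression for given coefficients. Fitting is left to any ordinary least squares routine and is not shown; the function names, the coefficients, and the sample series are hypothetical.

```python
def lagged_samples(series, n):
    """Return (features, targets).

    features[k] holds the n speeds preceding targets[k], most recent first,
    so features[k][i - 1] is the speed i periods before the target.
    """
    feats, targs = [], []
    for t in range(n, len(series)):
        feats.append([series[t - i] for i in range(1, n + 1)])
        targs.append(series[t])
    return feats, targs


def mlr_predict(beta0, betas, lags):
    """Intercept plus the weighted sum of the lagged speeds."""
    return beta0 + sum(b * x for b, x in zip(betas, lags))


# Hypothetical 2-minute speeds (km/h) and coefficients, for illustration only.
speeds = [62.0, 60.0, 58.0, 55.0, 57.0]
X, y = lagged_samples(speeds, n=2)
y_hat = [mlr_predict(1.0, [0.5, 0.25], row) for row in X]
```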

#### 3. Data Description

The travel speed data used in this study were collected on the 4th ring road in Beijing. The selected segment stretches from Dongfengbei Bridge to Zhaoyang Bridge, and its total length is approximately 2.74 km. This segment experiences significant traffic congestion during peak hours. The speed data were collected from three adjacent stations, which are shown in Figure 1. The distance between two adjacent stations is about 1.4 km. Location A represents detector 9052, location B is detector 9053, and location C is detector 9054. All three detectors monitor southbound traffic at 2-minute intervals, 24 hours a day. Less than 3% of the data are missing at each of the three stations. The data collection period runs from December 1, 2014, to December 31, 2014, a total of 31 days. To validate the prediction performance of the different models and compare their prediction accuracy fairly, we divide the data into two parts: a training dataset and a testing dataset. The data collected in the first 21 days are used to optimize model parameters, and the data from the last 10 days are used to validate the models’ effectiveness. The data from the first 7 days at each station are plotted in Figure 2 to show the general trends. The three stations show similar patterns but different speed values. Figure 2 clearly shows a sharp reduction in speed during peak hours and also shows that traffic during the night is normally smooth, with little fluctuation. The speed data collected at locations A and B have similar distribution patterns and exhibit obvious periodicity, with low speeds in the evening peak hours. The speed detected at location C follows a different pattern from A and B: its speed values in the evening peak hours are lower than at the other locations, because traffic there is under high pressure and volumes are much higher in the evening peak hours.

The limitations of the data include erroneous samples and missing data. Erroneous samples, for example, speed values higher than the speed limit or negative speed values, are removed from the original dataset. Missing data can be attributed to many natural and man-made causes, for example, communication failures, malfunctioning devices, incorrect observations, or data transfer problems. To address these shortcomings, a historical-average-based method was used to impute the missing and removed data, which ensures that the selected speed samples are appropriate for model validation and evaluation in this study.
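
The cleaning and splitting steps described in this section can be sketched as follows. The exact form of the historical-average imputation is not spelled out in the text, so this sketch assumes one plausible variant: an invalid or missing value is replaced by the mean of the valid values observed in the same 2-minute slot on the other days. The function names and the speed limit in the example are hypothetical.

```python
SAMPLES_PER_DAY = 24 * 60 // 2   # 720 two-minute intervals per day


def clean_and_impute(days, speed_limit):
    """days: list of per-day lists of speeds (None marks a missing value).

    Values that are missing, negative, or above speed_limit are replaced by
    the average of the valid values in the same time slot across all days
    (assumed imputation rule). A slot with no valid value stays None.
    """
    def valid(v):
        return v is not None and 0 <= v <= speed_limit

    n_slots = len(days[0])
    slot_means = []
    for s in range(n_slots):
        vals = [day[s] for day in days if valid(day[s])]
        slot_means.append(sum(vals) / len(vals) if vals else None)
    return [[v if valid(v) else slot_means[s] for s, v in enumerate(day)]
            for day in days]


def split_train_test(days, n_train_days=21):
    """First n_train_days for training, the remaining days for testing."""
    return days[:n_train_days], days[n_train_days:]


# Tiny illustrative example with 2 slots per day instead of 720.
raw = [[50, None], [60, 40], [-5, 200]]
cleaned = clean_and_impute(raw, speed_limit=120)   # 120 km/h is an assumed limit
```

With 31 days of data, `split_train_test` returns 21 training days and 10 testing days, that is, 21 × 720 = 15,120 and 10 × 720 = 7,200 samples per station.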