#### Abstract

Recently, a number of short-term speed prediction approaches have been developed, most of which are based on machine learning or statistical theory. This paper examines the multistep ahead prediction performance of eight different models using 2-minute travel speed data collected from three Remote Traffic Microwave Sensors located on a southbound segment of the 4th ring road in Beijing City. Specifically, we consider five machine learning methods as candidates: Back Propagation Neural Network (BPNN), nonlinear autoregressive model with exogenous inputs neural network (NARXNN), support vector machine with a radial basis function kernel (SVM-RBF), support vector machine with a linear kernel (SVM-LIN), and Multilinear Regression (MLR). Three statistical models are also selected: Autoregressive Integrated Moving Average (ARIMA), Vector Autoregression (VAR), and the Space-Time (ST) model. The prediction results show the following: (1) the prediction accuracy of speed deteriorates as the number of prediction time steps increases for all models; (2) BPNN, NARXNN, and SVM-RBF clearly outperform two traditional statistical models, ARIMA and VAR; (3) the prediction performance of ANN is superior to that of SVM and MLR; (4) as the time step increases, the ST model consistently provides the lowest MAE compared with ARIMA and VAR.

#### 1. Introduction

Collecting high quality traffic information is key to the performance of Intelligent Transportation Systems (ITS). Accurate prediction of future patterns in traffic flow is becoming more important in Advanced Traffic Management Systems (ATMS) and Advanced Traveler Information Systems (ATIS). Using forecasted information, such as traffic volume, travel time, and traffic conditions, travelers can replan their routes to save time and cost. Furthermore, transportation agencies can improve the efficiency of traffic management based on forecasted information. Travel speed is an important indicator for estimating traffic conditions in road networks. Alongside common collection approaches such as loop detectors and GPS equipment, the Remote Traffic Microwave Sensor (RTMS) is another important nonintrusive device that directly detects the instantaneous travel speed of vehicles. An RTMS is installed at the side of the road and can detect moving or stationary objects without interrupting traffic flow. It can detect traffic volume, occupancy, and speed for multiple lanes simultaneously, sometimes even in severe environments. Because of its high measurement accuracy [1] compared with single loop detectors, travel speed data collected from RTMS are used as the data source to construct the prediction models in this paper.

Short-term traffic flow forecasting models rely on the regularity in historical data to predict traffic patterns in future time periods. A good prediction algorithm usually requires advanced techniques and computational power to capture the high-dimensional and nonlinear characteristics of traffic flow data. In the past few years, a large number of algorithms have been proposed to address traffic prediction problems. Vlahogianni et al. [2] summarized existing short-term traffic prediction algorithms up to 2003, and more recently Vlahogianni et al. [3] updated the literature review to cover 2004 to 2013. Van Lint and Van Hinsbergen [4] reviewed existing applications of neural networks and artificial intelligence in short-term traffic forecasting and classified prediction models into two major categories: parametric and nonparametric approaches. Existing traffic prediction algorithms include statistical prediction methods [5–10], neural networks [11–15], support vector regression [16–20], Kalman filter theory [21–26], and hybrid approaches [27–32].

For the statistical models, Cetin and Comert [5] proposed a new statistical change-point detection algorithm to predict short-term traffic flow, in which the expectation maximization and CUSUM (cumulative sum) algorithms are implemented to detect shifts. Chandra and Al-Deek [7] incorporated the effect of upstream and downstream locations on the traffic at a specific location into a traditional Autoregressive Integrated Moving Average (ARIMA) model. Williams and Hoel [8] modeled a seasonal ARIMA process for traffic flow forecasting. For neural networks, Ye et al. [11] used a neural network model to forecast traffic flow time series based on GPS data recorded at irregular time intervals. To improve forecasting performance, the neural network model was used to aggregate speed and acceleration information from the current forecasting segment and adjacent segments. Van Lint et al. [13] proposed a state-space neural network model that uses upstream and downstream traffic as model input to predict travel time in the presence of missing or corrupt input data. Ma et al. [14] developed a Long Short-Term Memory Neural Network (LSTM) to predict travel speed based on RTMS detection data in Beijing City; the proposed model can capture the long-term temporal dependency of time series and automatically determine the optimal time window. For support vector machines, Wu et al. [17] applied support vector regression (SVR) to travel time prediction and compared the proposed model with some traditional travel time prediction methods on a highway network. Zhang and Liu [18] combined a state-space approach and least squares support vector machines (LS-SVMs) to forecast a travel time index. Asif et al. [20] first analyzed spatiotemporal trends for individual links at the network level and then constructed support vector regression (SVR) models to predict travel speed in a large interconnected road network. For Kalman filter theory, Chen and Grant-Muller [22] proposed a Kalman filter type network to predict traffic flow and discussed the effect of the starting network parameters on prediction performance. Chien and Kuchipudi [23] used a Kalman filtering algorithm to predict travel time because it continuously updates the state variable as new observations arrive. Their empirical results indicated that, during peak hours, prediction based on historic path-based data performs better than prediction based on link-based data. Wang et al. [26] proposed a new extended Kalman filter (EKF) based online-learning approach to predict highway travel time.

To further improve prediction performance, many scholars have proposed hybrid models that combine the advantages of different kinds of methods. Dimitriou et al. [27] used a genetic algorithm to structure an adaptive hybrid fuzzy rule-based system for forecasting traffic flow in urban arterial networks. Zheng et al. [28] introduced a neural network model combined with the theory of conditional probability and Bayes’ rule; experimental tests on Singapore’s Ayer Rajah Expressway demonstrated that the combined model outperforms the singular predictors. Dong et al. [32] proposed a hybrid support vector machine that combines statistical and heuristic models to account for the spatial-temporal patterns in traffic flow.

According to the reviewed literature, most traffic prediction models are based on statistical methods or machine learning techniques. These two types of models have different characteristics. Statistical models provide good theoretical interpretability with a clear calculation structure. Machine learning models, by contrast, use a “black box” approach to predict traffic conditions and often lack a good interpretation of the model; however, compared with statistical models, they are more flexible, requiring little or no prior assumptions about the input variables. In addition, these approaches are more capable of processing outliers and missing and noisy data [33]. In this study, we compare the prediction performance of statistical models and machine learning models. Among statistical models, we select ARIMA, Vector Autoregression (VAR), and Space-Time (ST). Among machine learning models, we choose Artificial Neural Network (ANN), SVM, and Multilinear Regression (MLR) as candidates. The travel speed data come from RTMS detectors on the fourth ring freeway in Beijing City. The contributions of this paper are as follows: we comprehensively compare the speed prediction performance of different machine learning and statistical models; analyze the prediction accuracy under different numbers of forecasting steps ahead; and evaluate the models’ performance under different scenarios.

The remainder of the paper is organized as follows. Section 2 briefly introduces the models used in this study. The data source and analysis are provided in Section 3. Section 4 discusses the results and compares the prediction accuracies of the different models. Section 5 concludes the paper.

#### 2. Methodology

##### 2.1. Statistical Models

In this section, we briefly introduce three statistical methods (i.e., ARIMA, VAR, and ST) considered in this study.

###### 2.1.1. ARIMA

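The Autoregressive Integrated Moving Average (ARIMA) model contains the following parameters: $p$ is the number of autoregressive terms, $d$ is the number of nonseasonal differences, and $q$ is the number of lagged forecast errors. An ARIMA model can be regarded as a generalization of the autoregressive moving average (ARMA) model. An ARMA($p$, $q$) process is defined as follows:

$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q} \tag{1}$$

where $\{X_t\}$ is stationary, $\{\varepsilon_t\}$ is a normal white noise series with mean zero and variance $\sigma^2$, $\phi_1, \dots, \phi_p$ and $\theta_1, \dots, \theta_q$ are parameters for the autoregressive and the moving average terms, and the polynomials $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ and $\theta(z) = 1 - \theta_1 z - \cdots - \theta_q z^q$ have no common factors. Assuming $B X_t = X_{t-1}$, $B \varepsilon_t = \varepsilon_{t-1}$, $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$, and $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, the ARMA model can be written as follows:

$$\phi(B) X_t = \theta(B) \varepsilon_t \tag{2}$$

The ARMA model requires that the data series be stationary. When the time series is nonstationary, and therefore not adequately described by an ARMA model, the ARIMA model is used instead. In the ARIMA model, the integrated part with order $d$, denoted as $I(d)$, means the $d$th difference of the original data, which can transform the original data to a stationary series. The mathematical equation of an ARIMA model is

$$\phi(B) (1 - B)^d X_t = \theta(B) \varepsilon_t \tag{3}$$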
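
As an illustration of the differencing step and of how forecasts are mapped back to the original scale, the following is a minimal sketch of an ARIMA(1, 1, 0) forecaster in plain Python. It is not the estimation procedure used in this study: the function names are hypothetical, the autoregressive coefficient is fitted by simple least squares without an intercept, and no moving average terms are included.

```python
def difference(series, d=1):
    """Apply the d-th order difference (1 - B)^d to a list of numbers."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t (no intercept)."""
    num = sum(series[i] * series[i - 1] for i in range(1, len(series)))
    den = sum(x * x for x in series[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series, steps):
    """Multistep forecasts from an ARIMA(1, 1, 0) sketch (assumed simplification)."""
    diffs = difference(series, d=1)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # forecast of the next first difference
        level = level + last_diff     # undo the differencing to recover the speed
        forecasts.append(level)
    return forecasts
```

Each forecast is fed back as the starting point of the next step, which is how multistep ARIMA forecasts are obtained from a one-step model.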

###### 2.1.2. VAR

The Vector Autoregression (VAR) model can capture the linear interdependencies among multiple time series and thus can consider the effect of the neighboring stations in predicting the future speed. Here, a 3-equation model is used and its formulation is defined as follows:

$$y_t = c + A_1 y_{t-1} + \cdots + A_p y_{t-p} + e_t \tag{4}$$

where $y_t$ is the 3 × 1 vector of variables, $c$ is the constant term, $A_1$ through $A_p$ are coefficient matrices, and $e_t$ is the corresponding independently and identically distributed random vector with $E(e_t) = 0$ and time invariant positive definite covariance matrix $\Sigma$. Before applying the model, the characteristic polynomial is evaluated to ensure the stability:

$$\det\left(I - A_1 z - \cdots - A_p z^p\right) = 0 \tag{5}$$

where $I$ is a 3 × 3 identity matrix. The necessary and sufficient condition for stability is that all characteristic roots lie outside the unit circle.
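
To make the VAR recursion concrete, the sketch below computes a one-step forecast for the three stations from a constant vector and a list of coefficient matrices. The function name and the argument layout are assumptions for illustration; in practice the coefficients would come from a fitted model.

```python
def var_forecast(history, c, coeffs):
    """One-step VAR(p) forecast: c + A_1 y_t + ... + A_p y_{t-p+1}.

    history: list of observation vectors, oldest first (at least p entries).
    c: constant vector; coeffs: list [A_1, ..., A_p] of square matrices.
    """
    k = len(c)
    pred = list(c)
    for lag, A in enumerate(coeffs, start=1):
        y_lag = history[-lag]          # observation vector `lag` steps back
        for i in range(k):
            pred[i] += sum(A[i][j] * y_lag[j] for j in range(k))
    return pred
```

For example, with `c = [1, 1, 1]`, a single coefficient matrix equal to 0.5 times the identity, and latest speeds `[60, 50, 40]` at stations A, B, and C, the forecast is `[31.0, 26.0, 21.0]`.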

###### 2.1.3. Space-Time Model

The Space-Time (ST) model is a probabilistic modeling approach that can provide the point prediction of future observations [10]. In probabilistic speed prediction, the commonly used normal distribution is adopted for speed data. Thus, this study assumes that the speed at time $t+k$ at the target station, $y_{t+k}$, follows a normal distribution. The point prediction of $y_{t+k}$ is the mean, $\mu_{t+k}$, of the normal distribution. Then, $\mu_{t+k}$ is fitted by a linear combination of the present and past values of the speed series at all stations. For example, for station B, when $k = 1$ (i.e., 2-minute ahead prediction), the mean takes a form such as

$$\mu_{t+1} = a_0 + a_1 A_t + a_2 B_t + a_3 C_t + \cdots \tag{6}$$

where $A_t$, $B_t$, and $C_t$ are the 2-minute average speeds at stations A, B, and C at time $t$; stations A and C are upstream and downstream of station B; and $a_0, a_1, a_2, \dots$ are model coefficients. Predictor variables for $\mu_{t+k}$ are selected based on an analysis of the speed data from the first week of the dataset using a stepwise forward search (refer to [10] for details about the predictor variable selection algorithm).
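
The probabilistic reading of the ST model can be sketched as follows: the mean is a linear combination of station speeds, and the forecast is a normal distribution around that mean. The coefficient values, the choice of predictors, and the constant standard deviation `sigma` are all assumptions made for illustration; they are not taken from the fitted model.

```python
from statistics import NormalDist


def st_predictive(coeffs, predictors, sigma):
    """Normal predictive distribution with a linear-combination mean.

    coeffs: (a0, a1, ..., am), intercept first (hypothetical values).
    predictors: the m speed values used as predictors, e.g. (A_t, B_t, C_t).
    sigma: assumed predictive standard deviation in km/h.
    """
    mu = coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], predictors))
    return NormalDist(mu, sigma)


# Hypothetical coefficients and current speeds at stations A, B, and C.
dist = st_predictive((2.0, 0.1, 0.8, 0.1), (60.0, 50.0, 40.0), sigma=4.0)
point_forecast = dist.mean                           # point prediction
interval = (dist.inv_cdf(0.05), dist.inv_cdf(0.95))  # central 90% interval
```

The point forecast is the mean of the distribution, while the quantiles give a prediction interval that a purely deterministic model does not provide.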

##### 2.2. Machine Learning Models

In this section, we select three models, Artificial Neural Network, support vector machine, and Multilinear Regression, to predict travel speed; the following subsections briefly describe these three models.

###### 2.2.1. Artificial Neural Network

Artificial Neural Network (ANN) is a popular tool for traffic flow prediction because of its capability of handling multidimensional data, flexible model structure, strong generalization and learning ability, and adaptability [33]. Unlike the statistical methods, ANN does not require underlying assumptions regarding the data and is also robust to missing and noisy inputs [33]. An ANN model is generally constructed as a multilayer system and is typically defined by three types of parameters: the interconnection pattern between different layers; the learning process for updating the weights of the layers; and the activation function that converts input to output activation. An ANN system can be represented as follows:

$$\hat{y} = g\left(\sum_{j=1}^{m} w_j \, f\left(\sum_{i=1}^{n} v_{ji} x_i\right)\right) \tag{7}$$

where $n$ and $m$, respectively, represent the number of neurons in the input layer and hidden layer and $f$ and $g$ are the transfer functions for the input layer and hidden layer. The weight matrices $V = (v_{ji})$ and $W = (w_j)$, respectively, refer to the weight values for neurons in the input layer and hidden layer. To minimize the sum of estimated errors from ANN, a number of optimization algorithms have been developed, including Back Propagation Neural Networks, the Levenberg-Marquardt method, and genetic algorithms. Detailed information about ANN is given in [11–15, 33]. As an important member of the ANN family, the nonlinear autoregressive model with exogenous inputs neural network (NARXNN) allows a delay line on the inputs, and the outputs feed back to the input through another delay line. This is a further extension of the time delay neural network, since the NARXNN not only considers its own previous outputs but also incorporates the exogenous inputs [14].
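
A forward pass through a network with one hidden layer can be sketched in a few lines. The sketch assumes a tanh transfer function on the hidden units, a linear output, and no bias terms; these choices and the function name are illustrative assumptions rather than the configuration used in this study.

```python
import math


def ann_forward(x, V, W, f=math.tanh, g=lambda z: z):
    """Output of a one-hidden-layer network without bias terms.

    x: list of n inputs (e.g., lagged speeds).
    V: m rows of n weights connecting the inputs to each hidden neuron.
    W: m weights connecting the hidden neurons to the single output.
    f, g: transfer functions (assumed tanh and identity here).
    """
    hidden = [f(sum(v * xi for v, xi in zip(row, x))) for row in V]
    return g(sum(w * h for w, h in zip(W, hidden)))
```

Training algorithms such as back propagation adjust `V` and `W` so that the outputs of this forward pass match the observed speeds as closely as possible.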

###### 2.2.2. Support Vector Machine

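The main idea of support vector regression is to map data into a high-dimensional feature space through a nonlinear relationship and then construct a linear regression in this space. Given a set of data points $\{(x_i, y_i)\}_{i=1}^{n}$ for regression, $n$ is the number of training samples. The SVM regression function is formulated as follows:

$$f(x) = w \cdot \varphi(x) + b \tag{8}$$

where $w$ is a vector in a feature space $F$ and $\varphi(x)$ is called the feature, which maps the input $x$ to a vector in $F$. Assume an $\varepsilon$-insensitive loss function:

$$L_\varepsilon\bigl(y, f(x)\bigr) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases} \tag{9}$$

Then, $w$ and $b$ are estimated by solving the following optimization problem:

$$\min_{w, b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} L_\varepsilon\bigl(y_i, f(x_i)\bigr) \tag{10}$$

where $\varepsilon$ is the maximum deviation allowed; $C$ represents the associated penalty for excess deviation during the training process, which evaluates the trade-off between the empirical risk and the smoothness of the model. The positive slack variables $\xi_i$ and $\xi_i^*$ are incorporated, which represent the size of positive and negative excess deviation, respectively. Thus, (10) is transformed to the following constrained formulation:

$$\begin{aligned} \min_{w, b, \xi, \xi^*} \quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \\ \text{s.t.} \quad & y_i - w \cdot \varphi(x_i) - b \le \varepsilon + \xi_i, \\ & w \cdot \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\ & \xi_i, \xi_i^* \ge 0, \quad i = 1, \dots, n \end{aligned} \tag{11}$$

The first term of (11), $\frac{1}{2}\|w\|^2$, is the regularized term. Thus, it controls the function capacity. The second term, $C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$, is the empirical error measured by the $\varepsilon$-insensitive loss function. By applying the appropriate Karush-Kuhn-Tucker (KKT) conditions to (11), we have the following dual form of the optimization problem:

$$\begin{aligned} \max_{\alpha, \alpha^*} \quad & -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*) \\ \text{s.t.} \quad & \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C \end{aligned} \tag{12}$$

Therefore, the SVM equation for nonlinear predictions becomes

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b \tag{13}$$

where $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ is called the kernel function, and $\alpha_i$ and $\alpha_i^*$ are the solution to the dual problem. There are four conventional kernel functions: linear, radial basis function (RBF), polynomial, and sigmoid. In this study, we select two common functions, linear and RBF, to construct the SVM model. The first reason is that these two functions are widely used in prediction and classification. The second reason is that the linear function has the advantages of simple construction and low computational time, while the RBF function uses a nonlinear structure and produces reliable prediction performance given optimal parameters. For the linear function,

$$K(x_i, x_j) = x_i^{\mathsf{T}} x_j \tag{14}$$

For the RBF kernel function,

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right) \tag{15}$$

where $\gamma$ is the parameter of the kernel function.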
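
The building blocks of support vector regression are simple to write down once the dual solution is available. The sketch below shows the ε-insensitive loss, the linear and RBF kernels, and a prediction from given dual coefficients; solving the dual problem itself is not shown, and the function names and argument layout are illustrative assumptions.

```python
import math


def eps_insensitive_loss(y, f_x, eps):
    """Zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(y - f_x) - eps)


def linear_kernel(u, v):
    """Inner product of two input vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma):
    """Radial basis function kernel with width parameter gamma."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_vectors, dual_coefs, b, kernel):
    """Kernel expansion f(x) = sum_i dual_coefs[i] * K(x_i, x) + b.

    dual_coefs[i] stands for (alpha_i - alpha_i^*) from the dual solution.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + b
```

To predict with the RBF kernel, a fixed `gamma` can be bound first, for example `svr_predict(x, svs, coefs, b, lambda u, v: rbf_kernel(u, v, 0.1))`, where `0.1` is an arbitrary illustrative value.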

###### 2.2.3. Multilinear Regression

Compared with the above two supervised algorithms, multiple linear regression has a simpler construction and belongs to the regression learning category. In MLR, the prediction values can be calculated by the following equation:

$$\hat{y}_t = \beta_0 + \sum_{i=1}^{m} \beta_i x_{t-i} \tag{16}$$

where $\hat{y}_t$ represents the prediction value at the $t$th period. The independent variable $x_{t-i}$ is the speed data at the previous $i$th period, $m$ is the number of historical data points considered in MLR, and $\beta_0$ and $\beta_i$ are the regression parameters, which are optimized using the training samples. The prediction values in the testing dataset are estimated from (16).
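
A lag-based linear regression of this kind can be fitted by ordinary least squares. The sketch below builds the lagged samples, solves the normal equations with Gaussian elimination, and predicts from the most recent speeds. It is a plain-Python illustration with hypothetical function names, and it assumes the lagged inputs are not perfectly collinear.

```python
def lag_samples(series, m):
    """Return (X, y): each row of X holds the m previous values, most recent first."""
    X = [[series[t - i] for i in range(1, m + 1)] for t in range(m, len(series))]
    y = [series[t] for t in range(m, len(series))]
    return X, y


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_mlr(series, m):
    """Least-squares estimates [beta_0, beta_1, ..., beta_m] via the normal equations."""
    X, y = lag_samples(series, m)
    Z = [[1.0] + row for row in X]          # prepend the intercept column
    k = m + 1
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    Zty = [sum(z[a] * t for z, t in zip(Z, y)) for a in range(k)]
    return solve(ZtZ, Zty)


def predict_mlr(beta, recent):
    """Prediction from the m latest speeds, most recent first."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], recent))
```

The parameters would be estimated on the training series and then applied to lagged values from the testing period.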

#### 3. Data Description

The travel speed data used in this study were collected on the 4th ring road in Beijing. The selected segment stretches from Dongfengbei Bridge to Zhaoyang Bridge, with a total length of approximately 2.74 km. This segment experiences significant traffic congestion during peak hours. The speed data were collected from three adjacent stations, shown in Figure 1. The distance between adjacent stations is about 1.4 km. Location A is detector 9052, location B is detector 9053, and location C is detector 9054. All three detectors monitor southbound traffic at 2-minute intervals, 24 hours a day. The missing data at each of the three stations amount to less than 3%. The data collection period runs from December 1, 2014, to December 31, 2014, a total of 31 days. To validate the prediction performance of the different models and compare their accuracy fairly, we divide the data into a training dataset and a testing dataset: the data from the first 21 days are used to optimize model parameters, and the data from the last 10 days are used to validate model effectiveness.

The data from the first 7 days at each station are plotted in Figure 2 to show the general trends. The three stations show similar patterns but different speed values. Figure 2 clearly shows a sharp reduction of speed during peak hours and shows that traffic during the night is normally smooth, without fluctuation. The speed data collected at locations A and B have similar distribution patterns, with obvious periodicity and low speeds during evening peak hours. The speed detected at location C follows a different pattern: its speed values in the evening peak hours are lower than at the other locations, because traffic there is under high pressure and volumes are much higher in the evening peak.

The limitations of the data include erroneous samples and missing data. Inaccurate samples, for example, speed values higher than the speed limit or negative speed values, are removed from the original dataset. Missing data can be attributed to many natural and man-made causes, for example, communication failures, malfunctioning devices, incorrect observations, or data transfer problems. To address these shortcomings, a historical-average-based method has been implemented to impute the missing and removed data, which ensures that the selected speed samples are appropriate for model validation and evaluation in this study.
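
The cleaning, imputation, and splitting steps described above can be sketched as follows. The exact imputation rule is not specified in the text, so this sketch assumes one plausible reading: a missing 2-minute record is replaced by the average of the valid records at the same time of day on the other days. The speed limit is passed in as a parameter because its value is not stated, and all function names are hypothetical.

```python
SAMPLES_PER_DAY = 24 * 60 // 2   # 2-minute records give 720 samples per day


def clean(speeds, speed_limit):
    """Mark negative or above-limit readings as missing (None)."""
    return [s if s is not None and 0 <= s <= speed_limit else None for s in speeds]


def impute_historical_average(speeds, per_day=SAMPLES_PER_DAY):
    """Fill None with the mean of valid readings in the same time-of-day slot."""
    filled = list(speeds)
    for slot in range(per_day):
        valid = [s for s in speeds[slot::per_day] if s is not None]
        if not valid:
            continue                      # nothing to average for this slot
        avg = sum(valid) / len(valid)
        for idx in range(slot, len(speeds), per_day):
            if filled[idx] is None:
                filled[idx] = avg
    return filled


def split_train_test(speeds, train_days=21, per_day=SAMPLES_PER_DAY):
    """First `train_days` days for training, the remainder for testing."""
    cut = train_days * per_day
    return speeds[:cut], speeds[cut:]
```

With 31 days of 2-minute data, the split yields 15,120 training samples and 7,200 testing samples per station.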

**Figure 2:** Travel speed data from the first 7 days at each station: (a) Station A; (b) Station B; (c) Station C.

#### 4. Models Comparison and Results Discussion

For ANN, the Back Propagation Neural Network (BPNN) and NARXNN are selected as the candidate models in the comparison; both have one hidden layer with 50 neurons. We use the neural network tool in MATLAB to optimize their parameters. For SVM, we use the RBF and linear kernel functions; [31] provides a detailed introduction to the parameter optimization. The parameters of the ARIMA and VAR models are estimated by maximum likelihood using the forecast and vars packages in R. The coefficients of the ST model are estimated by minimum continuous ranked probability score (CRPS) estimation [34]. When forecasting future speed values, the best order of the ARIMA model is determined by the AIC values using the most recent 21 days of speed data. The VAR model is implemented with a maximal order of 10, and its best order is also selected based on the AIC values, using the differenced speed data.
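
For a normal predictive distribution, the CRPS has a well-known closed form, which makes minimum CRPS estimation practical. The sketch below evaluates that closed form and averages it over a set of forecasts; the optimizer that would search over the ST coefficients is not shown, and the function names are hypothetical.

```python
import math


def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for the observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # standard normal density
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))            # standard normal CDF
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


def mean_crps(mus, sigmas, ys):
    """Average CRPS over a set of forecasts; smaller is better."""
    scores = [crps_normal(m, s, y) for m, s, y in zip(mus, sigmas, ys)]
    return sum(scores) / len(scores)
```

Minimum CRPS estimation would choose the coefficients that make `mean_crps` as small as possible on the training data. As the predictive spread shrinks toward zero, the CRPS approaches the absolute error, so it can be read as a probabilistic generalization of the MAE.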

For all the prediction algorithms, the data from the first 21 days are used for training and the data from the last 10 days are used for validation. Two performance measures, the mean absolute error (MAE) and the mean absolute percentage error (MAPE), are used to evaluate the multistep prediction performance of the different models. The unit of the MAE is km per hour. MAE and MAPE are calculated as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|, \qquad \mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \tag{17}$$

where $n$ is the number of observations, $y_t$ is the actual speed at time $t$ at a given station, and $\hat{y}_t$ is the predicted speed. To further evaluate the performance of all models, both one-step and multistep ahead predictions (i.e., 3-step (6 minutes), 5-step (10 minutes), and 10-step (20 minutes)) are considered. Tables 1, 2, and 3 provide the MAE and MAPE values of the different models for different forecasting horizons. Note that, in Table 1, bold values indicate the smallest MAE and MAPE values.

Figure 3 shows the one-step-ahead prediction results of the models compared with the observed speed data over five days. The left column shows the prediction results of the machine learning methods and the right column shows those of the statistical models. For machine learning, we only show the comparison among BPNN, SVM-RBF, and MLR. For the statistical models, only the prediction performances of ARIMA and ST are compared. Figure 4 displays the correlation between observed and predicted values for five models (three machine learning models and two statistical models) over ten days, and reports the correlation coefficient to evaluate the agreement between observed and predicted values. Figure 5 shows the frequency distribution of the prediction errors of the five models over ten days. The x-axis is the error (error = predicted value − observed value), and the y-axis indicates the frequency in different error ranges. In the figure, the reported rate is the percentage of errors that fall within the range of ±10%, which is used to estimate the prediction strength of the models. Figure 6 shows the frequency distribution of the relative errors of the five models over ten days. The x-axis is the relative error (relative error = (predicted value − observed value)/observed value), and the y-axis indicates the frequency in different error ranges. Similarly, in Figure 6, the reported rate is the percentage of relative errors that fall within the range of ±10%, which is used to estimate the performance of the models.
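
The evaluation measures used in this section are straightforward to compute. The sketch below gives plain-Python versions of MAE, MAPE, the correlation coefficient, and the share of relative errors within ±10%. The function names are hypothetical, MAPE is expressed in percent, and the observed speeds are assumed to be nonzero.

```python
import math


def mae(obs, pred):
    """Mean absolute error, in the unit of the data (km/h for speed)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error in percent; requires nonzero observations."""
    return 100.0 * sum(abs(o - p) / o for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Correlation coefficient between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def within_rate(obs, pred, tol=0.10):
    """Percentage of relative errors (pred - obs) / obs within +/- tol."""
    hits = sum(1 for o, p in zip(obs, pred) if abs((p - o) / o) <= tol)
    return 100.0 * hits / len(obs)
```

For multistep evaluation, the same functions would be applied separately to the predictions made 1, 3, 5, and 10 steps ahead.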

**Figure 3:** One-step-ahead prediction results compared with observed speed data: (a) machine learning models, station A; (b) statistical models, station A; (c) machine learning models, station B; (d) statistical models, station B; (e) machine learning models, station C; (f) statistical models, station C.

**Figure 4:** Correlation between observed and predicted values: (a) Station A; (b) Station B; (c) Station C.

**Figure 5:** Frequency distribution of prediction errors: (a) Station A; (b) Station B; (c) Station C.

**Figure 6:** Frequency distribution of relative errors: (a) Station A; (b) Station B; (c) Station C.

From Figures 3 to 6, several conclusions can be drawn: (1) BPNN produces better prediction results, with a higher correlation coefficient and a higher share of errors within ±10%, than the other models; (2) SVM-RBF outperforms ARIMA, because machine learning models have complex structures and strong learning ability; (3) by considering the correlation between spatial and temporal characteristics, ST also produces high prediction accuracy, similar to that of SVM-RBF; (4) MLR is inferior to ARIMA and SVM-RBF, with the lowest values of both indicators, owing to its simple calculation structure.

Based on the reported values in the tables and the corresponding figures, several interesting findings can be obtained:

(1) As expected, the prediction accuracy of speed deteriorates as the number of prediction time steps increases for all models. The results in Tables 1, 2, and 3 show that the MAE and MAPE values for 10-step ahead forecasting are significantly larger than those for 1-step ahead forecasting. The figures lead to similar results: as the prediction step increases, the predicted values of all models fluctuate more, and the accuracy and stability of multistep ahead prediction decrease compared with 1-step ahead prediction.

(2) Comparing machine learning and statistical models, BPNN, NARXNN, and SVM-RBF clearly outperform two traditional statistical models, ARIMA and VAR. The reason is that these three machine learning models have complex structures and strong learning ability. By considering the correlation between spatial and temporal characteristics, ST also produces high prediction accuracy. SVM-LIN produces prediction results similar to those of ARIMA, and MLR is inferior to ARIMA and VAR.

(3) Among the machine learning models, the prediction performance of ANN is superior to that of SVM and MLR. Furthermore, BPNN and NARXNN achieve similarly high accuracy. Within SVM, because the RBF kernel is more flexible than the linear function, SVM-RBF clearly outperforms SVM-LIN. Because of its simple calculation structure, the prediction accuracy of MLR is lower than that of ANN and SVM.

(4) Among the statistical models, as the time step increases, the ST model consistently provides the lowest MAE values compared with ARIMA and VAR. The ST model is therefore preferred over the VAR and ARIMA models when predicting multiple time points into the future. A possible reason is that the ST model uses spatial and temporal information from neighboring stations observed at time $t$ to directly predict the future speed value at time $t+k$. By contrast, the ARIMA and VAR models treat the prediction for time $t+1$ as an observed value and use it to predict the speed at time $t+2$; this procedure is repeated $k$ times to forecast the speed value at time $t+k$. Thus, the prediction error may accumulate over multiple steps when using the ARIMA and VAR models.

(5) Across the three stations, Tables 1, 2, and 3 and Figures 3, 4, 5, and 6 show that the prediction accuracy at stations A and B is relatively higher than that at station C. The reason is that the speed data collected at station C vary sharply from low values to high values, as can be seen in Figure 2. Unlike the distribution patterns at stations A and B, the speed at station C increases rapidly from 20 km/h to about 80 km/h. Moreover, Figure 4 shows that the speed samples at station C fall mainly into two clusters: a low value part and a high value part. Because samples with a continuous increase in speed are lacking, the sharp fluctuation in the speed distribution influences the prediction performance of the models.
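
The difference between iterated and direct multistep forecasting, discussed in finding (4), can be sketched with a univariate example. The iterated function feeds each prediction back into a one-step model, whereas the direct function fits a separate regression for horizon $k$. This is a simplified illustration with hypothetical function names; it uses a single series, not the three-station ST model itself.

```python
def iterated_forecast(last_value, c, phi, k):
    """Apply the one-step model y_{t+1} = c + phi * y_t k times,
    treating each prediction as if it were an observation."""
    y = last_value
    for _ in range(k):
        y = c + phi * y
    return y


def fit_direct(series, k):
    """Least-squares fit of y_{t+k} = a + b * y_t for a fixed horizon k."""
    xs, ys = series[:-k], series[k:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def direct_forecast(last_value, a, b):
    """Predict k steps ahead in one shot from the latest observation."""
    return a + b * last_value
```

In the iterated scheme, any error in the one-step coefficients is compounded $k$ times, while the direct scheme estimates the $k$-step relationship from the data in a single fit.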

Although the prediction results above show that complex machine learning models achieve higher prediction accuracy than statistical methods, selecting a proper model in actual applications remains challenging. As Karlaftis and Vlahogianni [33] suggested, prediction accuracy is a very important indicator, but model simplicity and suitability should also be considered. Kirby et al. [35] stated that accuracy should not be the sole determinant in selecting a prediction methodology; other issues should be considered, such as the time and effort required for model development, the skills and expertise required, the transferability of the results, and adaptability to changing behaviors [33, 35]. Between classical statistical models and machine learning algorithms, researchers frequently prefer higher prediction accuracy over the explanatory power of the model. Karlaftis and Vlahogianni [33] summarized some criteria for model selection between NN and statistical models. We therefore applied a case study to compare the prediction performance of machine learning and statistical models. The conclusion of this study is that complex machine learning models offer nonlinear fitting ability and robustness to missing data but low explanatory power, whereas statistical methods can reach high prediction performance through their inherent explanatory structure.

#### 5. Conclusions

This paper evaluated the multistep prediction performance of machine learning and statistical models using speed data collected from three RTMS located on the 4th ring road in Beijing City. The data were collected from December 1, 2014, to December 31, 2014, at 2-minute intervals. In the performance comparison, we chose five machine learning methods (BPNN, NARXNN, SVM-RBF, SVM-LIN, and MLR) and three conventional statistical models (ARIMA, VAR, and the ST model). We first provided a brief introduction to each model. We then optimized the model parameters using the data collected in the first 21 days. Finally, we compared the prediction accuracies of the different models based on the data collected in the last 10 days. Several conclusions can be drawn from the results of this study. First, the prediction accuracy of speed decreases as the number of prediction time steps increases for all models. Second, BPNN, NARXNN, and SVM-RBF are superior to two traditional statistical models, ARIMA and VAR. Third, among the machine learning models, the prediction performance of ANN is better than that of SVM and MLR. Fourth, among the statistical models, as the time step increases, the ST model consistently provides the lowest MAE values compared with ARIMA and VAR. The statistical models have good theoretical interpretability, while machine learning models are more flexible, requiring little or no prior assumptions about the input variables. In addition, machine learning models are more capable of processing outliers and missing and noisy data. Through a case study in Beijing based on observed data, this study provides a useful application of speed prediction and a comparison of prediction performance.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This research was partly supported by the National Natural Science Foundation of China (Grant nos. 51138003, 51329801, and G030601), Natural Science Foundation of Heilongjiang Province (Grant no. QC2013C047), and Hi-Tech Research and Development Program of China (Grant no. 2014BAG03B04).