Journal of Sensors

Volume 2017 (2017), Article ID 7074143, 15 pages

https://doi.org/10.1155/2017/7074143

## Ensemble Learning for Short-Term Traffic Prediction Based on Gradient Boosting Machine

^{1}Institute of Transportation Engineering, Department of Civil Engineering, Tsinghua University, Beijing 100084, China

^{2}School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China

Correspondence should be addressed to Senyan Yang; senyanyang@126.com

Received 19 December 2016; Revised 5 March 2017; Accepted 19 March 2017; Published 4 May 2017

Academic Editor: Fanli Meng

Copyright © 2017 Senyan Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Short-term traffic prediction is vital for intelligent traffic systems and is influenced by neighboring traffic conditions. Gradient boosting decision trees (GBDT), an ensemble learning method, is proposed for short-term traffic prediction based on the traffic volume data collected by loop detectors on a freeway. At each iteration, a new simple decision tree is added and trained on the error of the whole ensemble learned so far. The relative importance of variables can be quantified during the training process of GBDT, indicating the interaction between the input variables and the response. The influence of neighboring traffic conditions on prediction performance is identified by combining the traffic volume data collected by different upstream and downstream detectors as the input, which can also improve prediction performance. The relative importance of the input variables differs among the 15 GBDT models, and the impact of the upstream traffic condition is not balanced with that of the downstream. The prediction accuracy of GBDT is generally higher than that of SVM and BPNN for different steps ahead, and the accuracy of multistep-ahead models is lower than that of 1-step-ahead models. For 1-step-ahead models, the prediction errors of GBDT are smaller than those of SVM and BPNN for both peak and nonpeak hours.

#### 1. Introduction

Massive traffic data have been constantly collected from a variety of sensors, such as inductive loop detectors, GPS-equipped vehicles, and mobile phones [1], promoting the development of data-driven intelligent transportation systems (ITS) [2]. Short-term traffic prediction is one of the most dynamic and typical research topics in ITS, aiming at estimating the traffic state in the near future (within a few minutes) based on historical traffic data [3, 4]. The predicted traffic information is especially useful for travelers to make better travel plans in the pretrip stage or to reschedule en route [5]. Accurate short-term traffic prediction is the first important step for real-time route guidance [6] and is critical in advanced travelers' information systems (ATIS) and advanced traffic management systems (ATMS) [7].

Traditional statistical approaches for short-term traffic prediction, such as ARIMA [8] and the Kalman filtering technique [9], take advantage of the significant temporal dependencies in historical univariate time series of traffic variables. These methods usually assume a model structure beforehand and then estimate the model parameters from the historical data, so they offer good interpretability. However, their prediction accuracy is easily affected by unstable traffic conditions, such as those at peak hours [10].

Nonstationarity and nonlinearity are basic characteristics of traffic variables [11]. A variety of data-driven approaches have been applied to short-term traffic prediction, capturing the nonlinear relationships among the variables. Higher prediction accuracy can be achieved by these nonparametric machine learning (ML) methods, including the Back Propagation Neural Network (BPNN) [12, 13], the Support Vector Machine (SVM) [14, 15], and the K-nearest neighbor algorithm (KNN) [16]. These are supervised learning methods, so the target variables need to be prepared in the dataset beforehand, and the focus is on learning the relationship between the response and the predictors [17]. The underlying information in massive traffic data can be efficiently captured by these ML methods, achieving good prediction performance but lacking interpretability [18].

Because freeway traffic conditions are independent of signalization, most short-term traffic prediction algorithms have been developed and verified on freeway traffic data [3]. In past decades, most research focused on predicting traffic variables at one specific site of interest, considering only the effect of that site's own previous traffic information. In fact, the traffic prediction performance for a given site is considerably influenced by the neighboring traffic conditions. Spatial and temporal correlations have therefore been taken into account in short-term traffic prediction [6, 19, 20]. The traffic condition at a specific site is closely related to the upstream and downstream traffic conditions. A multivariate traffic flow prediction model was constructed that improved prediction performance by incorporating the upstream traffic flow series as the transfer function input of ARIMA [21]. The influence of upstream and downstream traffic on the traffic condition of a given site is not symmetric [22]. The relationship between the current traffic speed at a given location and the past traffic speeds at upstream and downstream locations was explored through cross-correlation analysis [10].

The information provided by the traffic variables of neighboring sites can be used to improve traffic prediction performance for the given site [10]. In this study, based on the freeway traffic data collected by the detectors, the historical upstream and downstream traffic volumes are included among the input variables of the prediction models. In fact, the traffic state variations of adjacent detectors are correlated. For many ML models, the effects of the input variables on the model output are difficult to interpret, and when redundant or irrelevant variables are added, the prediction performance may become worse.

In order to capture the complex nonlinearity of traffic variation and identify the importance of variables, the gradient boosting decision trees (GBDT) method, a tree-based ensemble learning method, is proposed for short-term traffic prediction in this study. GBDT is a relatively new, robust, and accurate method in the machine learning field, which can handle different types of variables and identify the effects of upstream or downstream traffic on the traffic prediction of a given site, achieving excellent performance over classical methods. The main goals of this study are to identify the relative importance of the input variables and to enhance the accuracy of short-term traffic prediction.

Ensemble learning is one of the most popular and promising machine learning approaches; it improves prediction performance by combining large numbers of weak base models [23]. The most commonly used ensemble techniques include boosting, bagging, and stacking. Different from other ML methods, ensemble learning allows the interaction between the input variables and the prediction model to be interpreted and the relative importance of critical factors to be identified [24]. Tree-based ensemble methods, which combine multiple simple decision trees, have been applied to prediction and classification problems in the transportation field, including random forests, gradient boosting machines, and boosted regression trees. The prediction or classification output of such a model is the weighted sum or vote of the predictions of the base trees. The random forest algorithm, embedded in the AdaBoost framework, has been applied to estimate and predict traffic flow and congestion [25]. Stochastic gradient boosting has been used to identify crashes with superior classification performance [26]. The nonlinear relationships in traffic accident data and the main effects of crucial variables have been investigated with boosted regression trees [27].

Additionally, the tree-based models built under the bagging framework of the random forest algorithm are trained independently by uniformly and randomly sampling with replacement from the original dataset, which strengthens robustness and allows parallel training. At each splitting node of the base trees, features are randomly selected [28]. Significantly different from the random forest, the tree-based models of GBDT are trained sequentially, and each base model is added to correct the error produced by the previous tree models. At each step, the samples misclassified by the previous models are more likely to be selected as training data, producing more accurate predictions. Compared with a simple single-tree model, GBDT is more stable, with better prediction performance and interpretability obtained by combining the outputs of the base trees [24].

The main contribution of this study is the construction of short-term traffic flow prediction models based on the gradient boosting machine, which simultaneously account for the influence of upstream and downstream traffic conditions and achieve higher prediction accuracy than conventional machine learning methods. The GBDT algorithm provides a flexible framework that adopts different combinations of upstream and downstream historical traffic volumes as input variables; it can capture the complex nonlinearity of traffic, uncover hidden traffic patterns, and identify the relative importance of variables, while remaining interpretable. In addition, GBDT is robust to outliers among the variables and performs well even with partly erroneous data that have not been cleaned [26].

#### 2. Methodology

A single decision tree is a fast but unstable algorithm, easily affected by small perturbations in the training data [18], but its performance can be significantly improved by ensemble techniques [26]. The gradient boosting decision trees (GBDT) algorithm can be viewed as combining the strengths of boosting algorithms and decision trees. Friedman [29] proposed gradient boosting machines (GBM) based on a gradient descent formulation of boosting, suitable for both regression and classification problems. The boosting framework is essentially a constructive strategy of ensemble formation: at each iteration, a new weak base model is added and trained with respect to the error of the whole ensemble built so far, and these base learners need only achieve an error rate slightly better than random guessing [30].

The approximation accuracy and execution speed of gradient boosting can generally be improved by randomly subsampling the training data to fit the base learner at each iteration; this variant, called stochastic gradient boosting [31], is employed for short-term traffic volume prediction in this study, simultaneously considering the influence of upstream and downstream traffic. The output of the short-term traffic prediction model is the traffic volume at a future time at the given site, and the input is the historical volume at the past 1, 2, or 3 time steps at the given site and its adjacent sites. Like other supervised learning methods, GBDT needs to be trained on a dataset with target labels, denoted as $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ are the input variables and $y_i$ are the corresponding labels of the response variable. To find the optimal combination of trees, the GBDT algorithm adopts the forward stagewise technique and minimizes the loss function by sequentially adding a new base learner (a single tree) to the expansion at each iteration, without adjusting the parameters of the trees that have already been added [23]. The loss function of using the estimated function $F(x)$ to predict $y$ on the training data is defined as

$$L(F) = \sum_{i=1}^{N} L\bigl(y_i, F(x_i)\bigr).$$
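As a concrete illustration of how the input and output described above can be assembled, the following sketch stacks the past few volumes of each detector into a feature row and takes the given site's volume one step ahead as the label. The detector names, the synthetic data, and the choice of 3 lags are assumptions for illustration, not the paper's exact configuration:

```python
import numpy as np

def build_lagged_dataset(series_by_site, target_site, n_lags=3, horizon=1):
    """Stack the past n_lags volumes of every detector as features and take
    the target site's volume `horizon` steps ahead as the label."""
    sites = list(series_by_site)
    T = len(series_by_site[target_site])
    X, y = [], []
    for t in range(n_lags - 1, T - horizon):
        # features: volumes at t, t-1, ..., t-n_lags+1 for each detector
        row = [series_by_site[s][t - k] for s in sites for k in range(n_lags)]
        X.append(row)
        y.append(series_by_site[target_site][t + horizon])
    return np.array(X), np.array(y)

# synthetic 5-min volume series for the target site and its neighbors
rng = np.random.default_rng(0)
volumes = {name: rng.integers(50, 150, size=100).astype(float)
           for name in ("upstream", "target", "downstream")}
X, y = build_lagged_dataset(volumes, "target", n_lags=3, horizon=1)
print(X.shape, y.shape)   # → (97, 9) (97,)
```

With 3 detectors and 3 lags each, every sample has 9 features; dropping or adding neighboring detectors simply changes the columns of `X`, which is how the different input combinations in this study can be formed.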

With regard to continuous response variables, the classical squared-error loss is employed in this prediction model, resulting in consecutive error-fitting during the process:

$$L\bigl(y, F(x)\bigr) = \frac{1}{2}\bigl(y - F(x)\bigr)^{2}.$$

In the boosting framework, when the algorithm is repeated for $M$ iterations, the overall ensemble function estimate is expressed in the additive functional form

$$F_M(x) = F_0(x) + \sum_{m=1}^{M} f_m(x),$$

where $F_0(x)$ is the initial guess and $f_m(x)$, $m = 1, \dots, M$, are the function increments. The new base learners are constructed to be maximally correlated with the negative gradient of the loss function [30]. For the $m$th iteration, the negative gradient is defined as

$$g_m(x_i) = -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}.$$
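For the squared-error loss used here, the negative gradient works out to the ordinary residual $y_i - F_{m-1}(x_i)$, which is why each new tree is simply fitted to the current prediction errors. A quick numerical check of this identity, using made-up numbers rather than the paper's data:

```python
import numpy as np

y = np.array([3.0, -1.0, 2.5])        # observed responses y_i
F = np.array([2.0,  0.5, 2.5])        # current ensemble predictions F_{m-1}(x_i)

def loss(F_hat):
    return (y - F_hat) ** 2 / 2       # squared-error loss, per sample

# finite-difference estimate of the negative gradient -dL/dF
eps = 1e-6
numeric_neg_grad = -(loss(F + eps) - loss(F)) / eps
residual = y - F

print(numeric_neg_grad, residual)     # the two agree: trees are fitted to residuals
```

The agreement (up to finite-difference error) confirms that, under squared-error loss, "fit the negative gradient" and "fit the residuals" are the same operation.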

The negative gradient $g_m(x)$ is the local direction in which the loss decreases most rapidly at $F_{m-1}(x)$. Let $h(x; \theta_m)$ denote the base learner model fitted to this negative gradient; the gradient descent step length $\rho_m$ is then computed by the line search

$$\rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L\bigl(y_i, F_{m-1}(x_i) + \rho\, h(x_i; \theta_m)\bigr).$$

At each step, the new base tree is added to correct the mistakes made by the previous base learners [18]. Thus, the current model is updated as

$$F_m(x) = F_{m-1}(x) + \rho_m\, h(x; \theta_m).$$

To sum up, the generic gradient boosting decision trees algorithm for regression is shown in Algorithm 1. ($F_0(x)$ is initialized as a single-terminal-node decision tree, i.e., a constant.)
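To make the algorithm concrete, the following is a minimal from-scratch sketch of stochastic gradient boosting for regression with squared-error loss, using single-split stumps as base learners, a constant shrinkage factor in place of the per-iteration line search, and accumulated split gain as a crude relative-importance score. All names and the shrinkage/subsample settings are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def fit_stump(X, r):
    """Best single-split regression stump fitted to residuals r (squared error)."""
    best = None
    base_sse = ((r - r.mean()) ** 2).sum()
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # largest value would empty the right side
            left = X[:, j] <= thr
            sse = ((r[left] - r[left].mean()) ** 2).sum() \
                + ((r[~left] - r[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, r[left].mean(), r[~left].mean())
    sse, j, thr, left_val, right_val = best
    return (j, thr, left_val, right_val), base_sse - sse  # stump and its split gain

def predict_stump(stump, X):
    j, thr, left_val, right_val = stump
    return np.where(X[:, j] <= thr, left_val, right_val)

def gbdt_fit(X, y, n_trees=50, lr=0.1, subsample=0.7, seed=0):
    rng = np.random.default_rng(seed)
    F0 = y.mean()                                # initial guess: a constant model
    F = np.full(len(y), F0)
    stumps, importance = [], np.zeros(X.shape[1])
    for _ in range(n_trees):
        # stochastic gradient boosting: fit each stump on a random subsample
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        residual = y - F                         # negative gradient of squared-error loss
        stump, gain = fit_stump(X[idx], residual[idx])
        importance[stump[0]] += gain             # accumulate split gain per feature
        F = F + lr * predict_stump(stump, X)     # shrunken update of the ensemble
        stumps.append(stump)
    return F0, lr, stumps, importance / importance.sum()

def gbdt_predict(model, X):
    F0, lr, stumps, _ = model
    return F0 + lr * sum(predict_stump(s, X) for s in stumps)

# toy example: the label depends almost entirely on the first feature
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 0] > 0, 2.0, -2.0) + 0.1 * rng.standard_normal(200)
model = gbdt_fit(X, y)
mse = np.mean((y - gbdt_predict(model, X)) ** 2)
importance = model[3]
print(mse, importance)   # mse far below var(y); importance dominated by feature 0
```

The normalized `importance` vector mirrors the relative-importance scores reported for the GBDT models in this study: features whose splits reduce the training error the most receive the largest shares, so an uninformative detector contributes little.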