Research Article  Open Access
Rivindu Weerasekera, Mohan Sridharan, Prakash Ranjitkar, "Implications of Spatiotemporal Data Aggregation on Short-Term Traffic Prediction Using Machine Learning Algorithms", Journal of Advanced Transportation, vol. 2020, Article ID 7057519, 21 pages, 2020. https://doi.org/10.1155/2020/7057519
Implications of Spatiotemporal Data Aggregation on Short-Term Traffic Prediction Using Machine Learning Algorithms
Abstract
Short-term traffic prediction is a key component of Intelligent Transportation Systems. It uses historical data to construct models for reliably predicting traffic state at specific locations in road networks in the near future. Despite being a mature field, short-term traffic prediction still poses some open problems related to the choice of optimal data resolution, prediction of nonrecurring congestion, and the modelling of relevant spatiotemporal dependencies. As a step towards addressing these problems, this paper investigates the ability of Artificial Neural Networks, Random Forests, and Support Vector Regression algorithms to reliably model traffic flow at different data resolutions and respond to unexpected traffic incidents. We also explore different feature selection methods to identify and better understand the spatiotemporal attributes that most influence the reliability of these models. Experimental results indicate that data aggregation does not necessarily achieve good performance for multivariate spatiotemporal machine learning models. The models learned using high-resolution 30-second input data outperformed the corresponding baseline ARIMA models by . Furthermore, feature selection based on Recursive Feature Elimination resulted in models that outperformed those based on linear correlation-based feature selection.
1. Introduction
Traffic congestion results in significant monetary losses in countries around the world, with the cost of traffic congestion in 2014 estimated to be billion in the US alone [1]. A significant amount of effort has been put into reducing congestion in cities. In many cities, it is becoming impractical to build new roads or to expand existing roads, and it is becoming all the more important to make the best use of the available resources. Intelligent Transportation Systems, Advanced Traffic Management Systems, and route guidance systems use real-time data of traffic flow gathered from various sensors. In such systems, short-term traffic prediction, which helps make decisions based on predictions of traffic in the near future, is more useful than just using the real-time data of traffic conditions. The field of short-term traffic prediction is over 30 years old, with early work utilizing Box-Jenkins ARIMA methods [2]. Recent approaches still use variations of the original ARIMA models, for example, seasonal ARIMA [3, 4], but there has been a shift towards using machine learning algorithms to address the traffic prediction challenges [5]. Although such models based on machine learning algorithms have been shown to be more reliable than the traditional ARIMA models, there are still many open problems [6]. These include building responsive algorithms that are able to predict nonrecurring congestion, determining the optimum data resolution, and identifying and modelling the important spatiotemporal dependencies in traffic data. The study described in this paper is a step towards addressing these challenges.
We make the following key contributions:
(i) Explore the effect of the resolution of multivariate spatiotemporal input data on the accuracy of short-term traffic prediction models; we specifically consider models built using Artificial Neural Networks, Support Vector Regression, and Random Forests.
(ii) Evaluate the responsiveness of these predictive models to nonrecurring congestion events. Specifically, we study the reliability of the predictions provided by these models in the presence of unexpected events such as accidents.
(iii) Identify the spatiotemporal traffic attributes that most influence the performance of these models and their ability to model the complex dependencies in traffic data.
We illustrate these contributions using historical data of volume and occupancy measurements on a highway in Auckland (New Zealand). We first motivate the need for the proposed study by discussing related work in Section 2. Next, Section 3 describes the dataset and methodology used to build and evaluate the predictive models, and Section 4 describes the machine learning algorithms used to build these models. Section 5 describes the hypotheses and measures used for experimental evaluation, and Section 6 analyzes the corresponding experimental results. Finally, Section 7 discusses the conclusions and directions for future work.
2. Background
Many algorithms have been developed for short-term traffic prediction, which is a complex problem influenced by a variety of factors such as the resolution (i.e., the aggregation level) of the input and output data, and spatiotemporal dynamics. We review some of the related work in this section.
Although studies in the existing literature predominantly use data aggregated over 5 min and 15 min intervals, some prior studies have investigated the effect of data resolution on the reliability of the predictions provided by the corresponding models; the results have, however, been inconclusive. For instance, Park et al. [7] investigated the effect of aggregation on travel time prediction and considered aggregation levels from 2 min to 60 min in the context of an ARIMA model. They concluded that higher levels of aggregation were required to forecast route travel time than when forecasting link travel times. Dougherty and Cobbett [8] constructed a neural network model for making predictions and found that data aggregated over 5 min intervals gives better results than data aggregated over 1 min intervals. Vlahogianni and Karlaftis [9] looked at aggregation levels and although they found that temporal aggregation may distort critical traffic flow information, they also concluded that further research was necessary to determine the optimum aggregation level(s).
The use of high-resolution data is challenging for multiple reasons. First, for some statistical models used for short-term traffic state prediction, it is necessary to ensure that the input data and the output data have the same aggregation level, but this constraint can be relaxed when machine learning algorithms are used to build predictive models. Second, while research shows that high-resolution data (as expected) includes more accurate measurements (for example, Martin et al. [10] state that inductive loops are “one of the most accurate count and presence detectors”), it also makes the noise in sensor measurements more distinct. Although data from these inductive loops can represent individual vehicles in the network, computational models developed to capture the flow of vehicles between segments or links in the network need to be robust to such noise and be able to capture spatiotemporal dynamics in order to exploit the information encoded in high-resolution data. Studies based on univariate time-series methods often perform aggregation to smooth out the variability in higher-resolution data [9]; however, these data smoothing techniques result in loss of information (and sensitivity) and make it difficult for the corresponding models to capture the spatiotemporal dynamics of traffic flow. In the study reported in this paper, we fixed the resolution of the output data (i.e., for the predictions being made) and examined the effect of different input data aggregation levels on the prediction accuracy.
There has been considerable research on analyzing the effects of spatiotemporal dynamics. For instance, Kamarianakis and Prastacos [11] used a Spatiotemporal Autoregressive Moving Average (STARIMA) model to incorporate data from links upstream of the link of interest in their prediction model, and Chandra and Al-Deek [12] found that vector autoregressive models that incorporate data from links neighbouring the link of interest perform better than ARIMA models that do not consider the data from the neighbouring links. Yang et al. [13] found that a sparse selection of neighbours chosen based on the level of correlation with the link of interest improves performance. Min and Wynter [14] showed that a multivariate spatiotemporal model with templates was able to provide very good prediction accuracy. However, these models depend on fixed correlation matrices that are modified infrequently. As a result, it is difficult for these models to track changes or to capture sudden (or significant) changes between congested and free-flowing traffic conditions.
In addition to the approaches that build on the ARIMA models [2–4, 11, 14], models based on machine learning and probabilistic estimation algorithms have also been explored because they are well-suited to model the complex spatiotemporal relationships in data. Popular approaches include Artificial Neural Networks (ANN) [15–19], Support Vector Machines (SVM) [20–24], k-Nearest Neighbours (kNN) [25–29], Kalman Filters [30–32], Bayesian Networks [33–35], and Random Forests [36, 37]. For instance, existing work has explored various ANN configurations. Wang et al. [19] developed a space-time delay neural network (STDNN) that included 22 links in central London and showed that this model outperforms a STARIMA model. Hodge et al. [38] used a binary neural network that incorporates spatiotemporal data for traffic prediction. Vlahogianni et al. [18] used a neural network model optimized with genetic algorithms and found that incorporating spatial and temporal data was helpful for multistep predictions. More recently, there have been efforts to use deep neural network architectures, including deep belief networks [39, 40] and stacked autoencoders [41].
There is no agreement in the literature regarding the number of upstream and downstream links (neighbouring any link of interest) that should be considered while building the predictive models. While some algorithms consider just one upstream or downstream link [24, 29], others consider a variable number of upstream and downstream links [38]. For an extensive review of spatiotemporal forecasting, please see Ermagun and Levinson [42]. As noted in Vlahogianni et al. [6], capturing spatial attributes in traffic data from a freeway is still an open problem.
Most existing work on short-term traffic prediction focuses on typical conditions [21]. Traffic is (on average) inherently periodic with daily or weekly patterns, and many studies exploit this periodicity in their algorithms. However, accurate predictions are arguably more useful in situations of nonrecurring congestion such as accidents where periodic patterns do not hold. Of the studies that do not leave out nonrecurring congestion in their input data, a common approach is to create multiple models to deal with different conditions. For example, Dunne and Ghosh [43] used a model with nonlinear preprocessing in cases of congestion. Fusco et al. [44] reported good performance during nonrecurring congestion with a SARMA model, while a Bayesian Network performed better during recurring congestion. An online-SVR-based model was found to predict nonrecurring congestion accurately by Castro-Neto et al. [21]. Pan et al. [45] also highlight some of the challenges in capturing moving bottlenecks and nonrecurring congestion. See Vlahogianni et al. [6], Ermagun and Levinson [42], Oh et al. [46], and Oh et al. [47] for a more comprehensive overview of the existing literature in short-term traffic prediction.
In this study, we explore three machine learning algorithms that have demonstrated the ability to incorporate spatiotemporal data in predictive models built for intelligent transportation and other applications. Specifically, we explore (1) Artificial Neural Networks (ANN), (2) Support Vector Regression (SVR), and (3) Random Forests (RF). We chose ANN and SVR because they are the machine learning algorithms most widely used to build predictive models in the literature. We chose Random Forests because it is an ensemble learning algorithm that requires only a small number of parameters to be tuned. Please note that the primary objective of our study was not to introduce new algorithms. Instead, we make three key contributions. First, we examine how the predictive accuracy of models based on these algorithms changes as a function of the aggregation level of the input data. Second, we explore the ability of these models to respond accurately to nonrecurring congestion conditions. Third, we identify the spatiotemporal attributes that most influence the predictive accuracy of these models and their ability to model the complex dependencies in traffic data.
3. Methodology
This section introduces the study area and data and provides a mathematical formulation of the short-term traffic prediction problem (Section 3.1). This is followed by a description of the data preprocessing steps used in the proposed study (Section 3.2).
3.1. Study Area and Mathematical Formulation
This study was carried out in a section of State Highway 1 (SH1) in Auckland, New Zealand. We considered data from 45 segments along SH1 from the suburb of Papakura towards Auckland City (see Figure 1). On average, there are three lanes of roadway in each direction, and we only considered lanes going northbound in this study. The average length of a segment was , with the length varying between and .
Traffic can be measured in different ways. The most common sensor used to collect traffic data is the Inductive Loop Detector, which comes in different forms. Dual loop detectors, which have two inductive loops placed a short distance apart, are able to accurately capture the speed of a vehicle going over them, the volume (i.e., count of vehicles passing the detector), and occupancy (i.e., the amount of time a vehicle was over the detector). However, most of the loops in many cities (including Auckland) are single loop detectors, which can measure volume and occupancy but can only estimate vehicle speed as a function of these measured values and the average effective vehicle length. Research shows that measuring speed with a constant effective vehicle length can lead to errors of up to [48]. Using these derived speed estimates for making decisions can lead to misleading results—we thus did not use speed data in this study.
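To illustrate why single-loop speed estimates depend so strongly on the assumed vehicle length, the following minimal sketch computes the speed implied by a volume and occupancy reading. It uses the standard relationship between count, occupancy, and effective vehicle length; the function name, parameter names, and example values are hypothetical and are not taken from the study.

```python
def estimate_speed_mps(volume, occupancy_fraction, interval_s, effective_length_m):
    """Speed (m/s) implied by a single-loop reading.

    volume: vehicles counted during the interval
    occupancy_fraction: share of the interval the loop was occupied (0..1)
    interval_s: length of the measurement interval in seconds
    effective_length_m: ASSUMED constant vehicle-plus-detector length
    """
    if volume <= 0 or occupancy_fraction <= 0:
        return None  # no vehicles detected, so speed is undefined
    occupied_time_s = occupancy_fraction * interval_s
    # Each vehicle occupies the loop for effective_length / speed seconds.
    return volume * effective_length_m / occupied_time_s


# Hypothetical reading: 10 vehicles in 30 s at 10% occupancy.
speed_short = estimate_speed_mps(10, 0.1, 30, effective_length_m=6.0)
speed_long = estimate_speed_mps(10, 0.1, 30, effective_length_m=9.0)
print(speed_short, speed_long)
```

The estimate scales linearly with the assumed effective length, so a mix of cars and trucks that differs from the assumed length translates directly into a proportional speed error.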
The fundamental model of traffic flow established by traffic engineers considers the relationship between three key traffic variables: (1) flow (volume), (2) density, and (3) speed. Since density is difficult to measure directly, occupancy is frequently used as a substitute [49]. It is not possible to accurately and comprehensively describe the current state of traffic using only information about flow. For example, if 200 vehicles pass over a detector during a interval, this could correspond to free-flow conditions during early mornings and evenings, but it could also correspond to highly congested conditions due to an accident during peak hours. The combination of both volume and occupancy uniquely defines the current state of traffic. Unlike many existing studies that have only considered flow when making predictions, which does not define the traffic state uniquely, we consider both volume and occupancy because they each provide useful information. Together they help eliminate ambiguities, such as those described above.
For each predictive model, the input vector is of the form:
$$\mathbf{x}_t = \left[ v_{1,t}, o_{1,t}, \ldots, v_{S,t}, o_{S,t}, \; v_{1,t-1}, o_{1,t-1}, \ldots, v_{S,t-1}, o_{S,t-1}, \; \ldots, \; v_{1,t-T+1}, o_{1,t-T+1}, \ldots, v_{S,t-T+1}, o_{S,t-T+1} \right]$$
where $v_{s,t}$ and $o_{s,t}$ denote volume and occupancy (respectively) of segment $s$ at timestep $t$, $S$ is the total number of segments, and $T$ is the total number of historical timesteps considered. The output of each such model is the volume or occupancy aggregated over the subsequent five-minute interval for each specific segment of interest. This output is a function of the input vector; for example, if traffic volume is to be predicted for segment $s$, the output of the models is $\hat{v}_s = f(\mathbf{x}_t)$. The goal of each machine learning algorithm is to build a model of this functional relationship between the inputs and outputs. The learned model can then be used to predict the output for any given input.
3.2. Data Processing
Data from 30 days of April 2016 was collected for 45 segments on the motorway. In order to get segment level data from loop detectors, individual values were aggregated across the lanes (volume data was summed, and occupancy was averaged) for each segment and at each point in time. We use the volume and occupancy values of all segments in the past 20 timesteps, resulting in an input vector with 45 × 2 × 20 = 1800 attributes. To ensure that each segment has data from a reasonable number of upstream and downstream segments, predictions are only made for segments on the motorway (see Figure 1). Recall that volume and occupancy readings were reported every 30 seconds, which corresponds to 86400 timesteps over the 30 days. A naive aggregation would have resulted in smaller datasets of 8640 samples and 2880 samples for 5 min and 15 min aggregation, respectively. To minimize the imbalance in the size of the datasets, a sliding window approach was used, resulting in a new sample being generated every 30 seconds for all the aggregation levels. The final size of the input dataset, with 20 timesteps included in each input sample, was thus 86370 samples for 30 s resolution, 86190 for 5 min, and 85790 for 15 min aggregation. Also, to ensure a fair comparison, the output is aggregated over the same time period for each model for all input time resolutions. That is, the amount of time represented in the input depends on the resolution of the data, whereas in the output, all models consider the aggregated values over the interval from when the final input reading was taken to five minutes past this time.
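The sliding-window construction described above can be sketched for a single segment as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names are hypothetical, only volume is shown (summed within each window; occupancy would presumably be averaged), and the input timesteps are taken to be consecutive non-overlapping windows that end at the final input reading.

```python
HISTORY = 20   # input timesteps per sample (from the text)
HORIZON = 10   # number of 30 s readings in the five-minute output window


def make_sample(volume, t, k, history=HISTORY, horizon=HORIZON):
    """Build one (inputs, target) pair for a single segment.

    volume: list of 30 s volume counts
    t: index of the final input reading
    k: 30 s readings per input timestep (1 for 30 s, 10 for 5 min, 30 for 15 min)
    """
    first_end = t - (history - 1) * k
    if first_end - k + 1 < 0 or t + horizon >= len(volume):
        raise IndexError("window does not fit inside the series")
    # Each input timestep aggregates the k readings that end at `end`.
    inputs = [sum(volume[end - k + 1:end + 1])
              for end in range(first_end, t + 1, k)]
    # The target always covers the five minutes after the final input reading.
    target = sum(volume[t + 1:t + 1 + horizon])
    return inputs, target


def readings_per_sample(k, history=HISTORY, horizon=HORIZON):
    """Number of raw 30 s readings spanned by one sample."""
    return history * k + horizon


# Sliding t forward by one reading yields a new sample every 30 seconds.
for k in (1, 10, 30):
    print(k, 86400 - readings_per_sample(k))
```

The printed values (86370, 86190, 85790) match the dataset sizes reported above, which is consistent with each sample spanning 20 input timesteps plus the ten readings of the output window.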
The dataset was preprocessed to remove some extreme values that were highly unlikely. First, we used winsorization [50] to set the upper bound of the values in the dataset. Winsorization, a common approach for dealing with outliers, replaces all values above and below a certain percentile with the value of that percentile. In this paper, we set the upper percentile to so that all values above this percentile are replaced by the value of this percentile. If a standard normal distribution is assumed, this choice of upper bound corresponds to clipping values that are standard deviations from the mean. Figure 2 shows volume values from segment 23 before and after winsorization.
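As a minimal sketch of upper-tail winsorization, the function below caps every value above a chosen percentile at the value of that percentile. The nearest-rank percentile rule, the function name, and the percentile used in the example are assumptions made for illustration; they are not the settings used in the study.

```python
import math


def winsorize_upper(values, upper_pct):
    """Replace values above the `upper_pct` percentile with that percentile.

    Uses the nearest-rank definition of a percentile (an assumption).
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(upper_pct * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


# Hypothetical readings with one implausible spike.
readings = [5, 1, 9, 3, 7, 2, 8, 4, 6, 1000]
print(winsorize_upper(readings, upper_pct=90))
```

Only the spike is altered; all other readings keep their original values and order.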
Second, we scaled each attribute in the input data to lie ; this scaling was especially crucial for producing stable results with Support Vector Regression and Artificial Neural Networks. Scaling was performed using the training data, and the corresponding scaling constants were applied to the test data. The occupancy values always stayed between and in the input and output, and no additional processing was needed to constrain the data to this range. Nonstationary time-series data is typically transformed into stationary data before applying time-series models. However, traffic data is considered to be cyclostationary and we model short-term traffic prediction as a multivariate pattern recognition problem with all data assumed to arise from the same underlying distribution. Thus, we did not perform any transformations to make the data stationary. Also, although the periodic nature of traffic can be exploited to improve the prediction accuracy of the learned models, doing so will make it difficult to reliably and efficiently identify and respond to nonrecurring congestion conditions (also see Section 6.2).
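The point about computing scaling constants on the training data only can be sketched as below. Min-max scaling to the unit interval is an assumption made for this illustration, and the function names are hypothetical.

```python
def fit_min_max(train_column):
    """Learn scaling constants from the training data only."""
    return min(train_column), max(train_column)


def apply_min_max(column, lo, hi):
    """Scale a column with previously learned constants."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]  # constant attribute carries no information
    return [(v - lo) / span for v in column]


train_volume = [10, 20, 30]
test_volume = [20, 40]

lo, hi = fit_min_max(train_volume)
print(apply_min_max(train_volume, lo, hi))
print(apply_min_max(test_volume, lo, hi))
```

Because the constants come from the training data, a test value larger than anything seen in training maps outside the unit interval, which is expected and avoids leaking test-set statistics into the model.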
Training of the models was accomplished using data from the first 20 days (57,600 samples), and data corresponding to the remaining ten days was used for testing. The parameters of each model were tuned using the training dataset. Next, we briefly discuss the algorithms that we used to build the models for short-term traffic prediction.
4. Machine Learning Algorithms
In this section, we describe the three machine learning algorithms used to build the predictive models explored in this paper: Artificial Neural Networks (Section 4.1), Support Vector Regression (Section 4.2), and Random Forests (Section 4.3).
4.1. Artificial Neural Network
Feedforward neural networks or multilayer perceptrons are the most common Artificial Neural Network (ANN) models. A neural network is composed of neurons arranged in layers with each layer containing one or more neurons. Each neuron is connected to all the neurons in its adjacent layers, and neurons within a layer are not connected. Each neuron takes a linear weighted sum of all its inputs $x_i$ (from the layer before it), with weights $w_i$, and passes it through a nonlinear activation function $f$ to produce the output $y$:
$$y = f\left(\sum_{i} w_i x_i\right)$$
Each such output is then used as an input to the next layer of neurons until the final (i.e., output) layer is reached. The weights associated with each neuron may be initialized randomly to enable each neuron to potentially learn a different function of its inputs.
The weights associated with each neuron are the parameters defining the neural network model, and these parameters are estimated by minimizing a loss function that measures the difference between the output values estimated by the network and the ground-truth values included in the training data. For regression problems, the squared error between the estimated and ground-truth output values is generally used as the loss function. The backpropagation algorithm is then used to calculate the gradient of this error and to propagate this gradient back through the network (towards the input layer) to update the weights of each neuron by gradient descent. Stochastic gradient descent algorithms are used widely to update the weights, and we used a stochastic gradient-based optimizer called Adam that is computationally efficient and is known to scale well to larger datasets [51]. All parameters of this optimizer were set to their default values.
Although the nonlinear activation function in a neural network has traditionally been the sigmoid function, empirical results have indicated that the rectified linear unit (ReLU) activation function improves the ability to model complex relationships and reduces the time taken to train the model [52]. We thus used the ReLU activation function in a network with three hidden layers, each with 150 neurons. We performed 400 iterations of learning with minibatches of data with 200 samples (each).
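The forward pass through such a network can be sketched in a few lines. This is an illustrative toy with hypothetical weights, not the trained model: bias terms are omitted to mirror the weighted-sum description above, and the linear output layer is an assumption that is common for regression.

```python
def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)


def dense_layer(inputs, weights, activation):
    """weights[j][i] is the weight from input i to neuron j of this layer."""
    return [activation(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]


def forward(inputs, hidden_layers, output_weights):
    """Feed the inputs through each hidden layer, then a linear output layer."""
    activations = inputs
    for weights in hidden_layers:
        activations = dense_layer(activations, weights, relu)
    return dense_layer(activations, output_weights, lambda z: z)


# Toy network: 2 inputs -> 1 hidden layer of 2 neurons -> 1 output.
hidden = [[[1.0, -1.0],
           [0.5, 0.5]]]
output = [[2.0, 2.0]]
print(forward([2.0, 3.0], hidden, output))
```

In the configuration described above, `hidden_layers` would hold three weight matrices of 150 neurons each, and the weights would be learned with Adam rather than set by hand.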
4.2. Support Vector Regression
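For classification problems, a Support Vector Machine computes a decision boundary that maximizes the margin between this boundary and the closest data sample. Support Vector Regression (SVR) uses a similar approach for regression problems: errors corresponding to estimated values within an $\varepsilon$ distance from the ground-truth values are ignored. More specifically, given a set of training data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the objective is to find a function $f(x)$ that produces at most $\varepsilon$ deviation from the actual target values $y_i$ for the training data and is as flat as possible [53]. For instance, a linear function $f(x) = \langle w, x \rangle + b$ is flat if it has a small $w$; this can be accomplished by minimizing $\lVert w \rVert^2$. Since a function that satisfies all the required constraints may not exist, some slack variables $\xi_i, \xi_i^*$ are introduced to allow for some errors. We then obtain the following formulation for SVR:
$$\min_{w,\, b,\, \xi,\, \xi^*} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right)$$
$$\text{subject to} \quad y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \quad \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i,\, \xi_i^* \ge 0,$$
where the constant $C > 0$ determines the trade-off between the flatness of $f$ and the extent to which deviations larger than $\varepsilon$ are tolerated.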
We can also incorporate nonlinear kernel functions to extend SVR to nonlinear problems. Popular kernels include the linear kernel and the Radial Basis Function (RBF) kernel; nonlinear kernels such as the RBF implicitly map the input sample into a higher-dimensional space that results in better separation (for classification) or estimation of values (for regression). We chose a linear kernel for SVR because it provided better results in our experiments.
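The idea of ignoring errors smaller than a tolerance can be made concrete with the epsilon-insensitive loss. The sketch below evaluates the standard linear SVR objective in its unconstrained form (flatness term plus penalized slack); the function names, the trade-off constant `C`, and the toy data are illustrative assumptions, and no optimization is performed.

```python
def linear_predict(w, b, x):
    """f(x) = <w, x> + b for a linear SVR model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(w, b, X, y, epsilon, C):
    """0.5 * ||w||^2 plus C times the total slack needed by the samples."""
    flatness = 0.5 * sum(wi * wi for wi in w)
    slack = sum(epsilon_insensitive_loss(yi, linear_predict(w, b, xi), epsilon)
                for xi, yi in zip(X, y))
    return flatness + C * slack


X = [[1.0], [2.0]]
y = [1.05, 3.0]   # first sample lies inside the tube, second outside
print(svr_objective(w=[1.0], b=0.0, X=X, y=y, epsilon=0.1, C=10.0))
```

The first sample contributes nothing because its error is below the tolerance, so only the second sample and the flatness term determine the objective value.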
4.3. Random Forest
Random Forest (RF) [54] is an ensemble method for building classification or regression models. Ensemble methods combine predictions from multiple models to improve accuracy. In an RF, the ensemble is a set of decision trees trained on subsets of the full dataset. Each subset is selected by a technique known as bagging or bootstrap aggregation. If the training set is defined as input vectors $X = x_1, \ldots, x_n$ and the corresponding (target) output values $Y = y_1, \ldots, y_n$, $B$ decision trees will be created as follows:
for $b$ in $1 \ldots B$ do
    Pick $n$ training samples randomly with replacement; call this subset $X_b, Y_b$
    Train a decision tree $f_b$ using $X_b, Y_b$, where each split in a decision tree is based on a random subset of the attributes
end for
In other words, each subset created by sampling from the training set with replacement results in a decision tree. The prediction for any test input $x'$ is then the average of the predictions from each decision tree:
$$\hat{f}(x') = \frac{1}{B} \sum_{b=1}^{B} f_b(x')$$
This approach ensures that individual trees are not highly correlated because of a small number of strong predictors. RF methods are popular because they provide some robustness to noisy data with outliers. They are also able to focus on attributes most useful to the regression or classification task under consideration and ignore attributes that are less relevant. In our study, we used an RF with 100 trees.
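The bagging and averaging steps can be sketched with a deliberately simplified ensemble. To keep the example short, each tree here is a depth-1 regression tree (a stump), which is an assumption of this sketch; real Random Forests grow deeper trees and draw a fresh attribute subset at every split. All names and the toy data are hypothetical.

```python
import random


def fit_stump(X, y, n_features_to_try, rng):
    """Depth-1 regression tree whose split uses a random subset of attributes."""
    mean_y = sum(y) / len(y)
    best = None  # (squared error, feature, threshold, left mean, right mean)
    for f in rng.sample(range(len(X[0])), n_features_to_try):
        for threshold in sorted(set(row[f] for row in X)):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((t - lm) ** 2 for t in left)
                   + sum((t - rm) ** 2 for t in right))
            if best is None or err < best[0]:
                best = (err, f, threshold, lm, rm)
    if best is None:  # no valid split found: fall back to the mean
        return lambda row: mean_y
    _, f, threshold, lm, rm = best
    return lambda row: lm if row[f] <= threshold else rm


def fit_forest(X, y, n_trees, n_features_to_try, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx],
                               n_features_to_try, rng))
    return trees


def predict_forest(trees, row):
    """Average the predictions of all trees."""
    return sum(tree(row) for tree in trees) / len(trees)


# Toy data: the first attribute is informative, the second is constant.
X = [[i / 20, 0.0] for i in range(20)]
y = [0.0 if i < 10 else 10.0 for i in range(20)]
forest = fit_forest(X, y, n_trees=100, n_features_to_try=1)
print(predict_forest(forest, [0.0, 0.0]), predict_forest(forest, [0.95, 0.0]))
```

Trees that happen to draw only the uninformative attribute fall back to the sample mean, so the averaged prediction is pulled towards the middle; this is the price paid for decorrelating the trees.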
5. Hypotheses and Measures
We experimentally evaluated the following hypotheses regarding the predictive models learned using the machine learning algorithms:
(1) The learned models are able to disregard the amplification of noise and variations in high-resolution data and provide higher accuracy than models that do not use high-resolution data.
(2) The learned models are responsive to nonrecurring congestion events such as accidents, and this ability improves with the increase in the resolution of data.
(3) The learned models are able to capture the complex spatiotemporal evolution of traffic by assigning higher importance to volume and occupancy attributes extracted from segments near the segment of interest.
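As baselines for comparison, wherever appropriate, we used two established methods for volume prediction in existing literature (ARIMA, historical average). To experimentally evaluate the hypotheses, we used three measures: accuracy, root mean square error (RMSE), and mean absolute error (MAE), defined as follows:
$$\text{Accuracy} = 1 - \frac{1}{N} \sum_{i=1}^{N} \frac{\lvert \hat{y}_i - y_i \rvert}{y_i}, \qquad \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2}, \qquad \text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert,$$
where $\hat{y}_i$ is the predicted value and $y_i$ is the ground-truth value of the $i$th data sample, and $N$ is the number of samples.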
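A minimal sketch of the three measures follows. RMSE and MAE use their standard definitions; the accuracy function assumes that accuracy is one minus the mean absolute percentage error, which is an assumption of this sketch, and it skips samples whose ground-truth value is zero to avoid division by zero. Function names and data are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))


def accuracy(y_true, y_pred):
    """ASSUMPTION: one minus the mean absolute percentage error."""
    ratios = [abs(p - t) / t for t, p in zip(y_true, y_pred) if t != 0]
    return 1.0 - sum(ratios) / len(ratios)


observed = [100, 200]
predicted = [110, 180]
print(mae(observed, predicted), rmse(observed, predicted),
      accuracy(observed, predicted))
```

A relative measure of this kind penalizes the same absolute error more heavily when few vehicles are present, whereas MAE and RMSE are expressed in the units of the predicted variable.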
To quantify responsiveness to nonrecurring conditions, we computed these measures over samples that were representative of nonrecurring conditions. Specifically, a sample was considered nonrecurring if its output value $y_i$ differed from the weekly seasonal mean of the predicted variable by more than two standard deviations:
$$\lvert y_i - \mu \rvert > 2\sigma,$$
where $\sigma$ is the standard deviation and $\mu$ is the mean of the values of the predicted variable during the corresponding time period for that day of the week.
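The selection rule for nonrecurring samples can be sketched as a simple filter. The use of the population standard deviation, the function names, and the toy values are assumptions of this illustration.

```python
import statistics


def seasonal_stats(history):
    """Mean and standard deviation of the values seen in the same
    time-of-week slot (ASSUMPTION: population standard deviation)."""
    return statistics.mean(history), statistics.pstdev(history)


def is_nonrecurring(value, mu, sigma, n_sigmas=2.0):
    """True when the value deviates from the seasonal mean by more than
    `n_sigmas` standard deviations."""
    return abs(value - mu) > n_sigmas * sigma


# Hypothetical volumes for one time slot on the same weekday.
mu, sigma = seasonal_stats([100, 110, 90, 100])
print(is_nonrecurring(120, mu, sigma), is_nonrecurring(110, mu, sigma))
```

Error measures restricted to the samples flagged by this rule then describe how well a model tracks departures from the usual weekly pattern.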
6. Experimental Results
This section discusses the results of experimentally evaluating the three hypotheses listed in Section 5. We summarize the results in Sections 6.1, 6.2, and 6.4 and examine the computational efficiency of the proposed models in Section 6.3. Unlike results reported in many papers, our predictive models considered different traffic conditions such as peak and offpeak traffic at different times of the week, including weekends and public holidays. Recall that we explore different aggregation levels ranging from 30 s to 15 min for the input data, but the output of each model is the volume or occupancy of vehicles (in a particular segment in the highway) aggregated over a period of five minutes; see Section 3.1 for more details.
6.1. Using HighResolution Data
As stated in Section 3.2, the predictive models were constructed using the training set and evaluated on the test set. We repeated the trials with different random initializations to check that the performance of the models was stable.
The results summarized in Table 1 show that all three machine learning algorithms performed better with the 30 s aggregation level for input data than with the 5 min and 15 min aggregation levels. While the increase in prediction accuracy with resolution may not be surprising, it is important to note that the increase in resolution also amplifies the noise and minor variations in the data. As baselines for comparison, we considered two established methods for volume prediction in the existing literature (ARIMA, historical average). For the ARIMA models, we applied a square-root transformation in addition to the first-order difference and verified their stationarity. To compare the outputs from these methods with the outputs from the learned models, we evaluated all models at the same output resolution of 5 min. For instance, for the 30 s aggregation level, the aggregated output value was obtained by iterating and aggregating the output over ten one-step-ahead predictions. Also, results for the 15 min input aggregation level were obtained by first applying the Stram-Wei temporal disaggregation [55] to extract 5 min aggregated values from the 15 min aggregated data. ARIMA (2, 1, 2) models were used for predicting volume at the and input aggregation levels, ARIMA (2, 1, 1) models were used for predicting occupancy at the and aggregation levels, and ARIMA (4, 1, 0) models were used for the input aggregation level. These models were selected experimentally using the Box-Jenkins method.
 
Table 1 note: Standard deviations across segments are reported in parentheses, and numbers in boldface show the best results.
The results in Table 1 indicate that the models corresponding to the 30 s input aggregation level provide an average accuracy improvement of over the ARIMA approach and an average improvement over the historical average baseline. Note that these results include both recurring and nonrecurring congestion events; we examine the nonrecurring events in more detail in Section 6.2. To confirm the significance of these results, we conducted Diebold–Mariano (DM) tests for predictive accuracy [56]. The DM test compares the forecast accuracy of a pair of forecast methods. The test’s null hypothesis is that the two forecasts have the same accuracy. The null hypothesis is rejected if the computed DM statistic falls outside the critical range for the required significance level under a standard normal distribution; for example, for a significance of , the null hypothesis is rejected if the DM statistic . We used MSE as the error metric. Table 2 shows the DM test statistic for each pair of models. Except for the SVR and RF pair, all pairs of models have significantly different levels of accuracy.
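The DM statistic under a squared-error loss can be sketched as follows. This is a simplified version that uses only the lag-0 variance of the loss differential; for multi-step or overlapping forecasts, the usual test also includes autocovariance terms, which are omitted here. The function name and the error values are hypothetical.

```python
import math
import statistics


def diebold_mariano(errors_a, errors_b):
    """Simplified DM statistic comparing two forecasts under squared-error loss.

    errors_a, errors_b: forecast errors of the two methods on the same samples.
    ASSUMPTION: the loss differentials are treated as serially uncorrelated.
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    var_d = statistics.pvariance(d)
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    return statistics.mean(d) / math.sqrt(var_d / len(d))


errors_model_a = [1.0, 2.0, 3.0, 4.0]
errors_model_b = [1.0, 1.0, 1.0, 1.0]
print(diebold_mariano(errors_model_a, errors_model_b))
```

A positive statistic indicates that the first method has the larger squared errors on average; swapping the two arguments flips the sign.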
 
Table 2 note: Critical value: ; numbers in boldface indicate pairs of models that are not significantly different.
Table 3, which summarizes the results of predicting occupancy, indicates similar trends. Although all three predictive models based on machine learning algorithms performed well, the model based on the Random Forest algorithm (Section 4.3) provided the highest accuracy.
 
Table 3 note: Standard deviations across segments are reported in parentheses, and numbers in boldface show the best results.
Next, the average accuracy and MAE at different times of the day, for the three different data aggregation levels, are shown in Figure 3. For each algorithm, the accuracy increases with the resolution. Overall, we observe that the performance of the learned predictive models improves significantly with the increase in resolution despite the associated amplification of noise and minor variations in data.
The results discussed so far support the first hypothesis that predictive models based on machine learning algorithms are able to disregard the amplification of noise in high-resolution data and provide higher accuracy than models that do not use the high-resolution data. The lower accuracy values during overnight hours can be explained by the accuracy being represented as a percentage of vehicles and the average number of vehicles overnight being significantly lower; this is confirmed by the lower MAE values for the same period.
6.2. Nonrecurring Congestion
Next, we evaluated the second hypothesis by examining the responsiveness of the predictive models to nonrecurring congestion events. We did so by only evaluating the trained predictive models on a subset of the test set comprising samples that were significantly different from historical average values. The results are summarized in Tables 4 and 5. We observe that the models built using input data at the 30 s aggregation level outperform the models that use input data at the 5 min and 15 min aggregation levels. Among the learned models, the model based on the ANN algorithm provides marginally better performance than that based on the RF algorithm for volume predictions, while the converse is true for occupancy predictions. Furthermore, we observe that the learned predictive models provide better performance than the models based on historical average and ARIMA, which are established methods for short-term traffic prediction.
 
Standard deviations across segments are reported in parentheses and numbers in boldface show the best results. 
 
Standard deviations across segments are reported in parentheses, and numbers in boldface show the best results. 
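The selection of test samples that differ significantly from historical averages can be sketched as follows. This is a minimal illustration only: the rule used here (a deviation of more than k standard deviations from the historical mean of the same time-of-day slot) and all names and numbers are assumptions, since the exact criterion is not restated in this section.

```python
from collections import defaultdict
from statistics import mean, pstdev


def historical_profile(slots, volumes):
    """Mean and standard deviation of volume for each time-of-day slot."""
    by_slot = defaultdict(list)
    for slot, volume in zip(slots, volumes):
        by_slot[slot].append(volume)
    return {slot: (mean(vs), pstdev(vs)) for slot, vs in by_slot.items()}


def nonrecurring_indices(profile, test_slots, test_volumes, k=2.0):
    """Indices of test samples deviating by more than k std devs (assumed rule)."""
    flagged = []
    for i, (slot, volume) in enumerate(zip(test_slots, test_volumes)):
        mu, sigma = profile[slot]
        if abs(volume - mu) > k * sigma:
            flagged.append(i)
    return flagged


# Toy historical data: four observations for each of two time slots.
train_slots = ["06:40"] * 4 + ["12:00"] * 4
train_volumes = [50, 52, 48, 50, 30, 31, 29, 30]
profile = historical_profile(train_slots, train_volumes)

test_slots = ["06:40", "06:40", "12:00"]
test_volumes = [51, 20, 30]
flagged = nonrecurring_indices(profile, test_slots, test_volumes)
print(flagged)
```

Only the flagged samples would then be used when computing the error metrics for the nonrecurring-congestion evaluation.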
To further explore the responsiveness of the learned models, we examined a known (i.e., reported) breakdown along the motorway in more detail. Figure 4(a) compares the average volume of traffic on segment 23 of SH1 on Thursdays with the traffic volume on a specific Thursday, April 21, 2016. The data corresponding to this date was in the test dataset, that is, it was not used to train the predictive models. Figure 4(a) shows that there was a significant deviation from the average traffic around 6:40 am on April 21, 2016. As reported on Twitter, there was a breakdown near SH1 on that day (see Figure 4(b)). More specifically, the Ellerslie on-ramp mentioned in the tweet is near segment 27 of SH1, a few segments away from segment 23.
Figures 5(a)–5(c) show how the learned predictive models are able to track the traffic volume corresponding to this event, with each of the three different input data aggregation levels. For comparison, the figures also include the performance of the ARIMA approach. We observe in Figure 5(a) that using the high-resolution input data aggregation level enabled the learned models to predict the change in traffic volume at almost the same timestep when the nonrecurring event occurred, whereas there is a lag when the other two aggregation levels are used; the performance is significantly worse with the baseline ARIMA model.
For additional examples of how the models predicted during nonrecurring congestion, see Figure 6. These plots indicate that the ANN model at the highest-resolution input aggregation level responds very quickly to nonrecurring congestion. The SVR-based models and the coarser-resolution models tend to smooth out shocks to traffic and are better at smoothing out the noise in typical congestion conditions. The RF-based learned models tend to provide good overall performance that lies in between that provided by the ANN-based and SVR-based models.
Figure 7 shows that an ANN-based learned model at the input data aggregation level accurately predicts traffic volume on a public holiday. Recall that this model had no information about the day of the week and the seasonal mean. Overall, these results support the second hypothesis that the models based on machine learning algorithms and high-resolution data are more responsive to nonrecurring congestion.
6.3. Computational Efficiency and Practical Scalability
Table 6 summarizes the training time and testing time of the proposed models when they are built and evaluated on an Intel Core desktop computer. The time taken to generate a forecast was under 0.1 seconds for all models. The training time, even in the most extreme case, was under 20 minutes. Since the training process can easily be parallelized to create models for all segments on a network, and this can be done in an initial offline phase, we believe these methods can be easily implemented for forecasts over the entire traffic network.

We did not optimize our algorithms; computational performance could have been improved by using fewer training samples or by tuning the algorithms’ parameters, for example, by using a smaller number of trees in the Random Forest or a smaller neural network. The different algorithms take different amounts of time for training and testing. For example, models based on the (linear) SVR algorithm have the lowest training time and testing time, whereas the nonlinear SVR models have a much longer training time (on the order of one hour for one model) and did not perform as well as the linear model. The ANN-based models take longer to train but are fast during testing, whereas the RF-based ensemble models take longer both to train and to test.
Overall, we believe that models based on these machine learning methods will scale to large road networks. The retraining of the models can be undertaken as new data comes in over several weeks or months, enabling the system to adapt to changes in the road network.
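The per-segment parallelization mentioned in this section could be organized along the following lines. This is only a sketch under stated assumptions: a least-squares linear model stands in for the ANN, SVR, and RF models, the data are synthetic, and the helper names `train_segment_model` and `train_all_segments` are hypothetical. A thread pool is used for brevity; a process pool may be preferable for CPU-bound training.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def train_segment_model(segment_id, X, y):
    """Fit a stand-in linear model (with intercept) for one road segment."""
    X_bias = np.column_stack([X, np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
    return segment_id, weights


def train_all_segments(datasets, max_workers=4):
    """Train one independent model per segment, submitting the jobs to a pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(train_segment_model, segment_id, X, y)
            for segment_id, (X, y) in datasets.items()
        ]
        return dict(future.result() for future in futures)


# Synthetic data: five segments with three input attributes each.
rng = np.random.default_rng(0)
datasets = {}
for segment_id in range(5):
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + segment_id  # intercept differs per segment
    datasets[segment_id] = (X, y)

models = train_all_segments(datasets)
print(sorted(models))
```

Because each segment's model is trained independently, the same structure also supports periodic retraining of individual segments as new data arrive.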
6.4. Attribute Selection
Next, we evaluate the third hypothesis regarding the ability to model the complex spatiotemporal evolution of traffic. To do so, we first identify the attributes that most influence the performance of the learned predictive models.
One common approach for identifying informative attributes is to compute the Pearson correlation coefficient between the target variable and each of the input attributes [42]. However, the Pearson correlation coefficient is not able to capture nonlinear relationships that may exist between the input attributes and the target variable. We therefore used the Recursive Feature Elimination (RFE) approach to select the most relevant (i.e., informative) attributes [58, 59]. RFE works by iteratively considering an increasingly smaller subset of attributes, dropping (in each iteration) the attributes considered to be the least relevant. In each iteration, we removed the 10 attributes ranked the lowest in terms of importance.
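The elimination loop described above can be sketched as follows. The sketch is illustrative rather than the implementation used in this study: an ordinary least-squares fit on standardized inputs stands in for the ANN, SVR, and RF models, the absolute coefficient serves as the importance score, the data are synthetic, and the function names are hypothetical. Only the step size of 10 attributes per iteration is taken from the text.

```python
import numpy as np


def linear_importance(X, y):
    """Stand-in importance: |coefficient| of a least-squares fit on standardized inputs."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    coef, *_ = np.linalg.lstsq(X_std, y - y.mean(), rcond=None)
    return np.abs(coef)


def recursive_feature_elimination(X, y, n_keep, step=10):
    """Refit repeatedly, dropping the `step` least important attributes each time."""
    remaining = np.arange(X.shape[1])
    while remaining.size > n_keep:
        importance = linear_importance(X[:, remaining], y)
        n_drop = min(step, remaining.size - n_keep)
        drop_positions = np.argsort(importance)[:n_drop]  # lowest importance first
        remaining = np.delete(remaining, drop_positions)
    return remaining


# Synthetic data: 60 attributes, of which only three influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 60))
y = 3.0 * X[:, 5] - 2.0 * X[:, 17] + 1.5 * X[:, 42] + rng.normal(scale=0.1, size=500)

selected = recursive_feature_elimination(X, y, n_keep=10, step=10)
print(sorted(selected.tolist()))
```

The model is refitted after every elimination step, so the importance of an attribute is always judged in the context of the attributes that remain.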
There are different ways to characterize the importance of attributes in RF-based models. Since any RF is a collection of decision trees, the Gini importance of each attribute in all decision trees can be averaged, for instance, to arrive at the importance of the attribute. In the case of an ANN, the weights of the first layer of an ANN-based model can provide insight into the attributes that contributed significantly to making the predictions. In a similar manner, the weights assigned to each attribute of a linear SVM can be used to identify the relative importance of the attributes [60].
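One way to write the three importance scores described above is given below. The notation is introduced here for illustration and is not taken from the paper: T is the number of trees, G_t(j) is the Gini importance of attribute j in tree t, W^{(1)} is the first-layer weight matrix of the ANN with H hidden units, and w_j is the linear SVR weight of attribute j. Aggregating the first-layer weights by summing their absolute values is an assumption; other aggregations are possible.

```latex
\begin{align}
I_{\mathrm{RF}}(j)  &= \frac{1}{T} \sum_{t=1}^{T} G_t(j), \\
I_{\mathrm{ANN}}(j) &= \sum_{h=1}^{H} \bigl| W^{(1)}_{hj} \bigr|, \\
I_{\mathrm{SVR}}(j) &= \lvert w_j \rvert .
\end{align}
```

For the weight-based scores to be comparable across attributes, the input attributes need to be on a common scale, for example through standardization.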
Figures 8(a), 9, and 10 visualize the relative ranking of each of the 1800 input attributes considered by the models for traffic prediction at a particular segment (segment 23 in these figures). The darker shades represent the more informative attributes. For each figure, the plot on the left visualizes the volume attributes and the plot on the right visualizes the occupancy attributes. In each of these plots, the columns going from left to right along the x-axis represent the segments in spatial order along the motorway from the south to the north. Along the y-axis, the first row is the most recent timestep and the top row is the oldest timestep; for example, for the 30-second aggregation level for input data, row 20 corresponds to the data from 10 minutes before the current timestep. Overall, we observed that all three models provide a higher rank to neighbouring segments over a few timesteps.
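The relation between a row of these plots and the corresponding time lag, and between a flat attribute index and its position in the plots, can be sketched as follows. The ordering of the 1800 attributes is an assumption made for illustration (all volume attributes first, then all occupancy attributes, each block ordered by timestep and then by segment), as is the 30 × 30 grid used in the example; neither is stated in this section.

```python
def row_to_lag_seconds(row, aggregation_seconds):
    """Time lag represented by a (1-based) row of the importance plot."""
    return row * aggregation_seconds


def locate_attribute(index, n_timesteps, n_segments):
    """Map a flat attribute index to (variable, row, segment), all 1-based.

    Assumed layout: [volume block | occupancy block], each block stored
    timestep by timestep, with segments in spatial order within a timestep.
    """
    block = n_timesteps * n_segments
    variable = "volume" if index < block else "occupancy"
    row, segment = divmod(index % block, n_segments)
    return variable, row + 1, segment + 1


# Row 20 at the 30-second aggregation level is 600 s, i.e., 10 minutes back.
print(row_to_lag_seconds(20, 30) / 60)

# Hypothetical grid of 30 timesteps x 30 segments x 2 variables = 1800 attributes.
print(locate_attribute(1492, n_timesteps=30, n_segments=30))
```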
A more careful examination of the results indicated that the predictive models based on SVR and RF assign higher importance to volume attributes than to occupancy attributes when making decisions. Also, the same set of attributes does not contribute significantly to the performance of all three models. For all three models, the attributes that are considered important change when the resolution of the input data changes. For instance, for the models based on the 30-second aggregation level (i.e., highest resolution), the set of attributes considered to be important for decision-making mostly included values (of volume and occupancy) from nearby spatial locations and timesteps. The number of attributes corresponding to nearby downstream segments is high for the higher-resolution models, especially when predicting nonrecurring congestion events. For the models based on the two coarser aggregation levels, on the other hand, the set of attributes considered to be important also included values from more distant segments. These results add to the current knowledge about representing information for short-term traffic prediction. For instance, some recent research found that having more than one timestep of data from neighbouring locations only provides minor improvements in performance [13]. Our results, on the other hand, indicate that volume and occupancy values from multiple neighbouring locations and timesteps may be important for accurate prediction of traffic, depending on the resolution of the input data.
To further analyze the importance of the attributes, we considered the relative importance of different subsets of these ranked attributes. We observed that the performance, specifically accuracy, flattens out once a limited number of the top-ranked attributes have been included. Figure 11 shows the performance of the three models at one aggregation level as a function of the number of attributes considered, with the attributes ordered in decreasing order of importance. A similar result was observed for the other two aggregation levels.
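A curve of this kind can be produced by refitting a model on the top-k ranked attributes for increasing k and recording the held-out error. The sketch below shows the procedure on synthetic data, with a least-squares model standing in for the learned models and MAE used in place of accuracy; the ranking, the data, and the function name `holdout_mae` are assumptions for illustration.

```python
import numpy as np


def holdout_mae(X_train, y_train, X_test, y_test, columns):
    """Fit a stand-in least-squares model on the given columns; return test MAE."""
    A = np.column_stack([X_train[:, columns], np.ones(len(X_train))])
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([X_test[:, columns], np.ones(len(X_test))])
    return float(np.mean(np.abs(B @ w - y_test)))


# Synthetic data: 40 attributes, of which only three influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 40))
y = 3.0 * X[:, 5] - 2.0 * X[:, 17] + 1.5 * X[:, 30] + rng.normal(scale=0.1, size=600)
X_train, X_test, y_train, y_test = X[:400], X[400:], y[:400], y[400:]

# Hypothetical ranking: absolute coefficients of a full linear fit, largest first.
full_coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
ranking = np.argsort(-np.abs(full_coef))

curve = {
    k: holdout_mae(X_train, y_train, X_test, y_test, ranking[:k])
    for k in (1, 2, 3, 10, 20, 40)
}
for k, err in curve.items():
    print(f"top {k:2d} attributes: MAE = {err:.3f}")
```

In this toy setting the error drops sharply while the informative attributes are being added and then stays essentially flat, which is the qualitative behaviour described above.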
Finally, we compared the performance of the RFE approach for ranking attributes with the more common correlation-based approach and an approach that chose important attributes randomly; we considered the performance of the corresponding models under normal conditions and in the presence of nonrecurring congestion events. Tables 7 and 8 as well as Figures 12 and 13 indicate that the RFE approach outperforms the other two approaches for ranking attributes. In fact, in the case of nonrecurring congestion, the prediction accuracy using correlation-based attribute selection is similar to that with a random selection of the important attributes. One explanation for the poor performance of correlation-based feature selection is that the features most likely to be highly correlated with the output correspond to the road segments closest to the segment under consideration. However, in most cases, these features give redundant information. Segments further away may contain information about situations such as queues building up or a spike in traffic; such features are not necessarily correlated with the output but are quite informative for predictions. RFE provides an opportunity to identify these dependencies, and the experimental results show that it is a much better choice for accurate traffic prediction, especially with nonrecurring congestion events. The experimental results also support the hypothesis that the predictive models based on the machine learning algorithms capture the complex spatiotemporal evolution of traffic by assigning higher importance to the attributes that are more relevant to the prediction task.
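The redundancy argument can be made concrete with a three-attribute toy example. Two attributes play the role of adjacent segments that carry almost the same signal, and a third plays the role of a more distant segment with independent information. The data, the coefficients, and the use of a joint least-squares fit as the model-based ranking are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x0 = rng.normal(size=n)             # adjacent segment
x1 = x0 + 0.1 * rng.normal(size=n)  # second adjacent segment, nearly a copy of x0
x2 = rng.normal(size=n)             # more distant segment, independent information
X = np.column_stack([x0, x1, x2])
y = x0 + 0.8 * x2 + 0.1 * rng.normal(size=n)

# Correlation-based ranking: absolute Pearson correlation with the target.
pearson = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
top2_by_correlation = set(np.argsort(-pearson)[:2].tolist())

# Model-based ranking: absolute coefficients of a joint fit on standardized inputs,
# which is the quantity a single elimination step would use in this sketch.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
coef, *_ = np.linalg.lstsq(X_std, y - y.mean(), rcond=None)
top2_by_model = set(np.argsort(-np.abs(coef))[:2].tolist())

print("Pearson:", np.round(pearson, 2), "->", sorted(top2_by_correlation))
print("Joint fit:", np.round(np.abs(coef), 2), "->", sorted(top2_by_model))
```

Here the correlation ranking keeps both near-duplicate attributes and discards the distant one, whereas the joint fit recognizes that the second adjacent attribute adds little once the first is present and keeps the distant attribute instead.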

 
Traffic volume predictions under nonrecurring conditions with input data; given a constrained number of features, in most cases, the RFE method achieves better performance compared to random and correlation-based feature selection. 
7. Conclusions
Traffic congestion results in significant monetary losses in countries around the world. Short-term traffic prediction helps make decisions based on predictions of traffic in the near future and is more useful than just using the real-time data of traffic conditions. Despite being a mature field, short-term traffic prediction poses many open problems such as the (a) choice of the optimal input data resolution; (b) reliable prediction and efficient tracking of nonrecurring congestion events; and (c) accurate modelling of the complex spatiotemporal dependencies influencing traffic estimation. We have explored the construction and use of predictive models based on three established machine learning algorithms for addressing the aforementioned problems. Specifically, we investigated the use of Artificial Neural Network (ANN), Support Vector Regression (SVR), and Random Forest (RF) algorithms and evaluated the predictive performance of these models for three different input data aggregation levels. For each learned model, the output was a prediction (of volume or occupancy) over a fixed period, although the same methodology can be used to provide predictions over other intervals as well. Our experiments indicate the following.
(i) Aggregation of high-resolution data to a lower resolution is not required for accurate forecasting with machine learning algorithms. Aggregation may actually have a negative effect on accuracy for these multivariate models. Our results indicate that machine learning algorithms are able to extract useful information from high-resolution data despite the corresponding amplification of noise and variability in the sensor measurements.
(ii) By not explicitly exploiting the periodic characteristics in traffic, the machine learning models studied here perform equally well under both recurring and nonrecurring congestion without requiring any special changes to the models. The corresponding experimental results also indicate that these learned models are able to capture the underlying complex, spatiotemporal evolution of traffic.
(iii) Recursive Feature Elimination provides a good ranking of attributes for short-term traffic prediction. The more commonly used linear Pearson correlation coefficient-based feature selection [42] provides poor prediction accuracy, similar to that with a random selection of features, in the presence of nonrecurring congestion. Furthermore, feature selection enables us to visualize and better understand the spatiotemporal patterns modeled by the machine learning models.
These results open up multiple directions for further research. First, we will incorporate these findings in more sophisticated machine learning algorithms for short-term traffic prediction. For instance, the complex, nonlinear relationships influencing traffic flow may be modeled well using deep network architectures, especially when high-resolution input data is considered. We will also consider other datasets in order to generalize the findings reported in this paper based on data from a single highway. Second, we will build on the indicated ability to track nonrecurring congestion events in order to consider both accidents and weather conditions. This will require the underlying algorithms to model additional variables and their effect on traffic flow. Furthermore, we will explore network-wide traffic predictions towards the long-term objective of effective use of resources for the smooth flow of traffic under a wide range of circumstances.
Data Availability
The terms of use of the data used in this study do not allow the authors to distribute or publish the data directly. However, these data can be obtained directly from NZTA through APIs on the following web page: https://www.nzta.govt.nz/trafficandtravelinformation/infoconnectsectionpage/.
Conflicts of Interest
Mr. Rivindu Weerasekera (BE (Hons)) is a doctoral candidate at the University of Auckland, New Zealand. He holds a first class honors degree in Electrical and Electronics Engineering from the University of Auckland. His research interests focus on the intersection of Intelligent Transportation Systems and Machine Learning. Dr. Mohan Sridharan (Ph.D.) is a senior lecturer in the School of Computer Science at the University of Birmingham (UK). He was previously a senior lecturer in the Department of Electrical and Computer Engineering at The University of Auckland (NZ), and a faculty member at Texas Tech University (USA) where he is currently an Adjunct Associate Professor of Mathematics and Statistics. He received his Ph.D. in Electrical and Computer Engineering from The University of Texas at Austin (USA). Dr. Sridharan’s primary research interests include knowledge representation and reasoning, interactive machine learning, cognitive systems, and computational vision, in the context of adaptive robots and agents. Dr. Prakash Ranjitkar (Ph.D., MEng, BEng (Civil)) is a senior lecturer in Transportation Engineering in the Department of Civil and Environmental Engineering and a founding member of the Transportation Research Centre (TRC) at the University of Auckland, New Zealand. He has over 19 years of academic, research, and consulting work experience in a range of transport and other infrastructure engineering projects. He has a strong research interest in modelling and simulation of traffic, Intelligent Transportation Systems, traffic operations and management, traffic safety, human factors, and applications of advanced technologies in transportation. Prior to joining the University of Auckland in 2007, Prakash worked for the University of Delaware in the USA (2006–2007) and before that at Hokkaido University in Japan (2001–2006). He is a member of the IPENZ Transportation Group and the Institute of Transportation Engineers (USA). 
He is an Editorial Board Member for the Open Transportation Journal and a reviewer for the Journal of Transportation Research Board, Journal of Eastern Asia Society for Transportation Studies, Journal of Intelligent Systems, and IEEE Transactions on Intelligent Transportation Systems.
Acknowledgments
The authors would like to thank Mike Duke from Auckland’s Joint Transport Operations Centre (JTOC) for helping them obtain access to the data used for experimental evaluation in this paper.
References
[1] D. Schrank, B. Eisele, T. Lomax, and J. Bak, Urban Mobility Scorecard, International Transport Forum, Paris, France, 2015.
[2] M. S. Ahmed and A. R. Cook, “Analysis of freeway traffic time-series data by using Box-Jenkins techniques,” Transportation Research Record, vol. 722, p. 116, 1979.
[3] B. L. Smith, B. M. Williams, and R. Keith Oswald, “Comparison of parametric and nonparametric models for traffic flow forecasting,” Transportation Research Part C: Emerging Technologies, vol. 10, no. 4, pp. 303–321, 2002.
[4] B. M. Williams and L. A. Hoel, “Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: theoretical basis and empirical results,” Journal of Transportation Engineering, vol. 129, no. 6, pp. 664–672, 2003.
[5] M. G. Karlaftis and E. I. Vlahogianni, “Statistical methods versus neural networks in transportation research: differences, similarities and some insights,” Transportation Research Part C: Emerging Technologies, vol. 19, no. 3, pp. 387–399, 2011.
[6] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Short-term traffic forecasting: where we are and where we’re going,” Transportation Research Part C: Emerging Technologies, vol. 43, pp. 3–19, 2014.
[7] D. Park, L. R. Rilett, B. J. Gajewski, C. H. Spiegelman, and C. Choi, “Identifying optimal data aggregation interval sizes for link and corridor travel time estimation and forecasting,” Transportation, vol. 36, no. 1, pp. 77–95, 2009.
[8] M. S. Dougherty and M. R. Cobbett, “Short-term inter-urban traffic forecasts using neural networks,” International Journal of Forecasting, vol. 13, no. 1, pp. 21–31, 1997.
[9] E. Vlahogianni and M. Karlaftis, “Temporal aggregation in traffic data: implications for statistical characteristics and model choice,” Transportation Letters, vol. 3, no. 1, pp. 37–49, 2011.
[10] P. T. Martin, Y. Feng, and X. Wang, Detector Technology Evaluation (MPC03154), International Transport Forum, Paris, France, 2003.
[11] Y. Kamarianakis and P. Prastacos, “Forecasting traffic flow conditions in an urban network: comparison of multivariate and univariate approaches,” Transportation Research Record: Journal of the Transportation Research Board, vol. 1857, no. 1, pp. 74–84, 2003.
[12] S. R. Chandra and H. Al-Deek, “Predictions of freeway traffic speeds and volumes using vector autoregressive models,” Journal of Intelligent Transportation Systems, vol. 13, no. 2, pp. 53–72, 2009.
[13] S. Yang, S. Shi, X. Hu, and M. Wang, “Spatiotemporal context awareness for urban traffic modeling and prediction: sparse representation based variable selection,” PLoS One, vol. 10, no. 10, pp. 1–22, 2015.
[14] W. Min and L. Wynter, “Real-time road traffic prediction with spatiotemporal correlations,” Transportation Research Part C: Emerging Technologies, vol. 19, no. 4, pp. 606–616, 2011.
[15] X. Ban, C. Guo, and G. Li, “Application of extreme learning machine on large scale traffic congestion prediction,” in Proceedings of ELM-2015 Volume 1: Theory, Algorithms and Applications (I), J. Cao, K. Mao, J. Wu, and A. Lendasse, Eds., pp. 293–305, Springer International Publishing, Cham, Switzerland, 2016.
[16] S. Dunne and B. Ghosh, “Weather adaptive traffic prediction using neurowavelet models,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 370–379, 2013.
[17] S. Sun, R. Huang, and Y. Gao, “Network-scale traffic modeling and forecasting with graphical lasso and neural networks,” Journal of Transportation Engineering, vol. 138, no. 11, pp. 1358–1367, 2012.
[18] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Optimized and meta-optimized neural networks for short-term traffic flow prediction: a genetic approach,” Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 211–234, 2005.
[19] J. Wang, I. Tsapakis, and C. Zhong, “A space-time delay neural network model for travel time prediction,” Engineering Applications of Artificial Intelligence, vol. 52, pp. 145–160, 2016.
[20] M. T. Asif, J. Dauwels, C. Y. Goh et al., “Spatiotemporal patterns in large-scale traffic speed prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 2, pp. 794–804, 2014.
[21] M. Castro-Neto, Y.-S. Jeong, L. D. Han, D. Lee, and Han, “Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions,” Expert Systems with Applications, vol. 36, no. 3, pp. 6164–6173, 2009.
[22] A. Cheng, X. Jiang, Y. Li, C. Zhang, and H. Zhu, “Multiple sources and multiple measures based traffic flow prediction using the chaos theory and support vector regression method,” Physica A: Statistical Mechanics and Its Applications, vol. 466, pp. 422–434, 2016.
[23] Y.-S. Jeong, Y.-J. Byon, M. M. Castro-Neto, and S. M. Easa, “Supervised weighting-online learning algorithm for short-term traffic flow prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 4, pp. 1700–1707, 2013.
[24] B. Yao, C. Chen, Q. Cao et al., “Short-term traffic speed prediction for an urban corridor,” Computer-Aided Civil and Infrastructure Engineering, vol. 32, no. 2, 2016.
[25] P. Cai, Y. Wang, G. Lu, P. Chen, C. Ding, and J. Sun, “A spatiotemporal correlative k-nearest neighbor model for short-term traffic multistep forecasting,” Transportation Research Part C: Emerging Technologies, vol. 62, pp. 21–34, 2016.
[26] H. Chang, Y. Lee, B. Yoon, and S. Baek, “Dynamic near-term traffic flow prediction: system-oriented approach based on past experiences,” IET Intelligent Transport Systems, vol. 6, no. 3, p. 292, 2012.
[27] G. A. Davis and N. L. Nihan, “Nonparametric regression and short-term freeway traffic forecasting,” Journal of Transportation Engineering, vol. 117, no. 2, pp. 178–188, 1991.
[28] S. Oh, Y.-J. Byon, and H. Yeo, “Improvement of search strategy with k-nearest neighbors approach for traffic state prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1146–1156, 2015.
[29] D. Xia, B. Wang, H. Li, Y. Li, and Z. Zhang, “A distributed spatial-temporal weighted model on MapReduce for short-term traffic flow forecasting,” Neurocomputing, vol. 179, pp. 246–263, 2016.
[30] J. Guo, W. Huang, B. M. Williams, and Williams, “Adaptive Kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification,” Transportation Research Part C: Emerging Technologies, vol. 43, pp. 50–64, 2014.
[31] I. Okutani and Y. J. Stephanedes, “Dynamic prediction of traffic volume through Kalman filtering theory,” Transportation Research Part B: Methodological, vol. 18, no. 1, pp. 1–11, 1984.
[32] Y. Xie, Y. Zhang, and Z. Ye, “Short-term traffic volume forecasting using Kalman filter with discrete wavelet decomposition,” Computer-Aided Civil and Infrastructure Engineering, vol. 22, no. 5, pp. 326–334, 2007.
[33] B. Ghosh, B. Basu, and M. O’Mahony, “Bayesian time-series model for short-term traffic flow forecasting,” Journal of Transportation Engineering, vol. 133, no. 3, pp. 180–189, 2007.
[34] E. Horvitz, A. Johnson, S. Raman, and L. Liao, “Prediction, expectation, and surprise: methods, designs, and study of a deployed traffic forecasting service,” in Proceedings of the Conference on Uncertainty in Artificial Intelligence, pp. 1–10, Edinburgh, Scotland, July 2005.
[35] A. Pascale and M. Nicoli, “Adaptive Bayesian network for traffic flow prediction,” in Proceedings of the 2011 IEEE Statistical Signal Processing Workshop (SSP), pp. 177–180, IEEE, Nice, France, June 2011.
[36] B. Hamner, “Predicting travel times with context-dependent random forests by modeling local and aggregate traffic flow,” in Proceedings of the IEEE International Conference On Data Mining, ICDM, pp. 1357–1359, IEEE, Sydney, Australia, December 2010.
[37] N. Zarei, M. A. Ghayour, and S. Hashemi, “Road traffic prediction using context-aware random forest based on volatility nature of traffic flows,” in Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, vol. 7802, pp. 196–205, Springer, Berlin, Germany, 2013, Lecture Notes in Computer Science.
[38] V. J. Hodge, R. Krishnan, J. Austin, J. Polak, and T. Jackson, “Short-term prediction of traffic flow using a binary neural network,” Neural Computing and Applications, vol. 25, no. 7–8, pp. 1639–1655, 2014.
[39] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: deep belief networks with multitask learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2191–2201, 2014.
[40] R. Soua, A. Koesdwiady, and F. Karray, “Big-data-generated traffic flow prediction using deep learning and Dempster-Shafer theory,” in Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), pp. 3195–3202, Vancouver, Canada, July 2016.
[41] Y. Lv, Y. Duan, W. Kang, Z. Li, and F. Y. Wang, “Traffic flow prediction with big data: a deep learning approach,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2015.
[42] A. Ermagun and D. Levinson, “Spatiotemporal traffic forecasting: review and proposed directions,” Transport Reviews, vol. 38, no. 6, pp. 786–814, 2018.
[43] S. Dunne and B. Ghosh, “Regime-based short-term multivariate traffic condition forecasting algorithm,” Journal of Transportation Engineering, vol. 138, no. 4, pp. 455–466, 2011.
[44] G. Fusco, C. Colombaroni, and N. Isaenko, “Short-term speed predictions exploiting big data on large urban road networks,” Transportation Research Part C: Emerging Technologies, vol. 73, pp. 183–201, 2016.
[45] T. L. Pan, A. Sumalee, R. X. Zhong, and N. Indrapayoong, “Short-term traffic state prediction based on temporal-spatial correlation,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1242–1254, 2013.
[46] S. Oh, Y.-J. Byon, K. Jang, and H. Yeo, “Short-term travel-time prediction on highway: a review of the data-driven approach,” Transport Reviews, vol. 35, no. 1, pp. 4–32, 2015.
[47] S. Oh, Y.-J. Byon, K. Jang, and H. Yeo, “Short-term travel-time prediction on highway: a review on model-based approach,” KSCE Journal of Civil Engineering, vol. 22, no. 1, pp. 298–310, 2018.
[48] Z. Jia, C. Chen, C. Ben, and P. Varaiya, “The PeMS algorithms for accurate, real-time estimates of g-factors and speeds from single-loop detectors,” in Proceedings of the ITSC 2001. 2001 IEEE intelligent transportation systems. Proceedings (Cat. No.01TH8585), pp. 536–541, Oakland, CA, USA, August 2001.
[49] P. Ryus, M. Vandehey, L. Elefteriadou, G. Richard Dowling, and K. Barbara Ostrom, Highway Capacity Manual 2010: Number 273, Transportation Research Board, Washington, DC, USA, 2010.
[50] D. Ghosh and A. Vogt, “Outliers: an evaluation of methodologies,” Joint Statistical Meetings, vol. 2012, pp. 3455–3460, 2012.
[51] D. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in Proceedings of the International Conference on Learning Representations, pp. 1–13, Banff, Canada, April 2014.
[52] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the Advances in Neural Information Processing Systems, pp. 1–9, New York, NY, USA, (NIPS 2012).
[53] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.
[54] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[55] D. O. Stram and W. W. S. Wei, “A methodological note on the disaggregation of time series totals,” Journal of Time Series Analysis, vol. 7, no. 4, pp. 293–302, 1986.
[56] F. X. Diebold and R. S. Mariano, Comparing Predictive Accuracy, National Bureau of Economic Research Inc., Cambridge, MA, USA, 1994, https://ideas.repec.org/p/nbr/nberte/0169.html.
[57] @NZTA Akld & Nthlnd. @NZTA Akld & Nthlnd, 2015, https://twitter.com/NZTAAkl/status/722855139099426816.
[58] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using support vector machines,” Machine Learning, vol. 46, no. 1–3, pp. 389–422, 2002.
[59] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2009, Springer Series in Statistics.
[60] Y.-W. Chang and C.-J. Lin, “Feature ranking using linear SVM: causation and prediction challenge,” in Proceedings of the 3rd JMLR Workshop on Causality and Conference WCCI2008, no. 2, pp. 53–64, London, UK, 2008.
Copyright
Copyright © 2020 Rivindu Weerasekera et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.