Research Article  Open Access
Muhammad Faisal Iqbal, Muhammad Zahid, Durdana Habib, Lizy Kurian John, "Efficient Prediction of Network Traffic for Real-Time Applications", Journal of Computer Networks and Communications, vol. 2019, Article ID 4067135, 11 pages, 2019. https://doi.org/10.1155/2019/4067135
Efficient Prediction of Network Traffic for Real-Time Applications
Abstract
Accurate real-time traffic prediction is required in many networking applications like dynamic resource allocation and power management. This paper explores a number of predictors and searches for a predictor which has high accuracy and low computation complexity and power consumption. Many predictors from three different classes, including classic time series, artificial neural networks, and wavelet transform-based predictors, are compared. These predictors are evaluated using real network traces. Comparison of accuracy and cost, both in terms of computation complexity and power consumption, is presented. It is observed that a double exponential smoothing predictor provides a reasonable trade-off between performance and cost overhead.
1. Introduction
Internet traffic has grown tremendously in the past decade due to the advent of new technologies, industries, and applications. In wireless networks, the growth in traffic is being driven by both increased smartphone subscriptions and a continued increase in average data volume per subscription. Mobile internet traffic grew by 70% between 2016 and 2017 [1]. In the future, traffic generated by smartphones will dominate even more than it does today. Internet of Things (IoT), cloud computing, and datacenter applications are on the rise. Networks have to cope with ever-increasing traffic demands and must provide good quality of service to the users. As a result, efficient utilization of networking equipment has become crucial. This task can be accomplished effectively if network traffic can be predicted accurately. Accurate traffic prediction has applications in many networking areas such as energy savings [2], network resource management [3], and wireless sensor networks [4].
One application of traffic prediction is power saving. Traffic prediction is being used in core routers of the Internet to save a significant amount of power. With increasing traffic demands and computational requirements, the number and complexity of processors used in these routers are on the rise, resulting in greater power consumption. High power consumption of equipment and the subsequent rise in cooling costs result in increased operational expenditure of the network. If the network traffic can be predicted accurately, additional processors in these core routers can be turned off during low-traffic periods to save power. A precisely predicted future traffic load will help us design greener traffic-aware networks [5–8].
Second, traffic prediction is required to use network resources efficiently. Accurate prediction of network traffic at access points enables efficient resource allocation to ensure good quality of service [9, 10]. Applications also include congestion control, admission control, network bandwidth allocation, and anomaly detection. With applications like YouTube, Netflix, and video chat, the amount of video traffic in the network has also increased. Many techniques have been proposed to predict video traffic [11–15]. Furthermore, the task of detecting and preventing network abuse is becoming very difficult with the growing amount of traffic and complexity of networks. One countermeasure against network abuse is anomaly detection. A significant deviation from the normal behavior of traffic can be used to detect an attack. The performance of anomaly detection is directly related to the traffic prediction accuracy [16, 17].
Finally, in wireless sensor networks, energy saving is of extreme importance as the quality of service they provide depends on the energy supply. Networked microsensor technology is viewed as one of the key enabling technologies for the 21st century. Cheap smart devices with multiple onboard sensors, networked through wireless links and the Internet and deployed in large numbers, provide unprecedented opportunities for instrumenting and controlling homes, cities, and the environment. In addition, networked microsensors provide the technology for a broad spectrum of systems in the defense arena, generating new capabilities for reconnaissance and surveillance as well as other tactical applications. Network uptime can be enhanced significantly if unused nodes are powered off. Accurate prediction of future demands is necessary in order to power up these nodes on time and optimize energy saving [4, 18, 19].
The objective of this study is to find a traffic predictor suitable for real-time applications. This predictor needs to be accurate and lightweight in terms of computation cost and power consumption. In the quest for such a predictor, this paper makes the following contributions:
(i) We compare existing schemes to find their effectiveness for real-time applications. Previously proposed traffic predictors are classified into three categories: classic time series-based predictors, artificial neural network-based predictors, and wavelet transform-based predictors. A detailed analysis of accuracy and overhead of these three types of predictors is presented in this study.
(ii) Overhead in terms of both computation cost and power consumption is presented, whereas previous work has mainly focused on accuracy.
(iii) We propose a new metric called Error_Energy Score (EEScore) to compare the performance of predictors. This metric combines accuracy and energy consumption into a single global performance score for comparing predictors.
The remainder of the paper is organized as follows. Section 2 describes the previous work related to traffic prediction. In Section 3, we present a brief description of three classes of forecasting predictors. Evaluation methodology is described in Section 4. Finally, Section 5 provides the experimental results, and we conclude in Section 6.
2. Related Work
Accurate traffic prediction is useful in numerous networking applications. Consequently, it has been studied by researchers, and many different traffic predictors have been proposed. We can categorize these predictors into three broad classes:
2.1. Time Series Predictors
Many researchers have used time series models for predicting network traffic [20, 21]. One application is network design and capacity planning, which can be performed efficiently if traffic behavior is understood and its growth is predicted accurately. For this purpose, the evolution of network traffic was studied by researchers [22]. They used autoregressive integrated moving average (ARIMA) models to predict yearly growth in traffic. This kind of application usually requires an offline analysis of traffic. Therefore, the complexity of the predictor is not a big concern in this application. However, online prediction is needed in many applications like network security. To detect anomalies, it was proposed to predict traffic by statistically separating traffic into its components (e.g., trends, bursts, and noise) and using the components separately to predict traffic [16]. Autoregressive moving average (ARMA) was used to model these traffic components. Similar separation and prediction strategies were proposed by other researchers [23–25].
2.2. Neural Network-Based Predictors
Artificial intelligence and neural networks have also found applications in network traffic prediction. An artificial neural network is an excellent tool for finding complex patterns in input data and has been used for prediction of network traffic [26–28]. Deep-learning algorithms are used for traffic prediction in wireless mesh networks in [29], where researchers propose a network traffic prediction method based on a deep-learning architecture and the spatiotemporal compressive sensing method.
2.3. Wavelet Transform-Based Predictors
Some applications require prediction of traffic at different resolutions in different situations. For example, in a Message Travel Time Advisor (MTTA), the timescale of prediction depends on the query presented to it. For small messages, short range prediction is required, and for long messages, the prediction should be on a larger timescale. Wavelet transform was found suitable for multiscale prediction as it naturally transforms a signal into multiple resolutions [30, 31]. Researchers also studied sensitivity of different parameters of wavelet transform for the purpose of traffic prediction [27].
In this study, we focus on short-term prediction of traffic for applications like dynamic power management of network processors. For such an application, the complexity and power consumption of the predictor are major concerns. We study the complexity and power consumption of predictors, while previous research has focused only on accuracy. To our knowledge, none of the previous work has studied the power and performance overhead of traffic predictors. Our poster presentation [32] presents some initial results from this study. We study three different categories of predictors for one-step-ahead prediction and explore the trade-off between their performance and complexity. Section 3 provides a brief introduction to these prediction techniques.
3. Traffic Prediction Techniques
In this section, we briefly describe representative predictors in the three categories of traffic prediction techniques.
3.1. Classic Time Series Predictors
3.1.1. Last Value (LV) Predictor
This is the simplest predictor and works well in many applications. This predictor uses the last observed value as prediction for the next interval.
3.1.2. Windowed Moving Average (MA)
In this technique, the average of n past observations is used as the prediction for the next interval. This technique gives the same weight to all n previous observations. The number of past observations (n) used is called the order of MA.
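The two baseline predictors described so far can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation; the function names and the sample traffic rates are hypothetical.

```python
def last_value_predict(history):
    """Last Value (LV): the most recent observation is the forecast."""
    return history[-1]


def moving_average_predict(history, n):
    """Windowed MA of order n: equal-weight mean of the last n observations."""
    window = history[-n:]
    return sum(window) / len(window)


# Hypothetical traffic rates (e.g., packets per interval), for illustration only.
rates = [10.0, 12.0, 11.0, 13.0, 14.0]
lv = last_value_predict(rates)          # uses only the newest sample
ma3 = moving_average_predict(rates, 3)  # mean of the last three samples
```

Both predictors need no training; the only tunable choice is the MA order n.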
3.1.3. Double Exponential Smoothing (DES)
Exponential smoothing assigns exponentially lower weights to older observations. Single exponential smoothing does not work well when there is a trend in the data [33]. A trend means that the average value of the time series increases or decreases with time. Double exponential smoothing adds a trend component to the estimate and is considered more appropriate for data with trends. Table 1 presents the equations for DES. The parameter α defines the speed at which older values are damped. When α is close to 1, dampening is quick, and when α is close to 0, dampening is slow. The values of α and γ are obtained using nonlinear optimization techniques and are learned during the training phase of the predictor (see Section 4 for more details on training).
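As a rough sketch of how a DES predictor operates, the code below uses the textbook level-and-trend recurrences of double exponential smoothing. It is an assumption that this matches the exact form in Table 1, which is not reproduced in this text. The initialization (first value as level, first difference as trend), the function name, and the parameter values are also illustrative assumptions; in the paper α and γ are learned during training.

```python
def des_forecast(series, alpha, gamma):
    """One-step-ahead forecast by double exponential smoothing.

    level_t = alpha * x_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = gamma * (level_t - level_{t-1}) + (1 - gamma) * trend_{t-1}
    forecast for t+1 = level_t + trend_t
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: first sample as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
    return level + trend


# Hypothetical parameters; a perfectly linear series is continued along its trend.
forecast = des_forecast([1.0, 2.0, 3.0, 4.0, 5.0], alpha=0.5, gamma=0.3)
```

Each prediction needs only two stored values (level and trend) and a handful of multiply-add operations, which is consistent with the low cost the paper reports for DES.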

3.1.4. Autoregression (AR)
The next value of the series is the weighted sum of the previous observations. The weights are called autoregressive coefficients. Autoregression is similar to regression with a slight difference. Regression analysis provides a best-fit mathematical equation between the dependent and independent variables, whereas in the autoregression technique, a signal is regressed on itself to exploit the autocorrelation structure. The number of past observations used for regression defines the order of AR. The AR model with order n is represented as AR(n).
3.1.5. Autoregression Moving Average (ARMA)
This model uses the moving average (MA) of the previous error terms in addition to autoregression (AR). The error term is defined as the difference between actual and predicted values. The final prediction is the sum of the AR component and the MA component. The orders of AR and MA can be different. An ARMA model with AR order n and MA order m is represented as ARMA(n, m).
Figure 1 shows the block diagram of an ARMA(n, m) predictor. This predictor requires one queue of size n for holding past observations, one queue of size m for past errors, n registers for storing autoregression coefficients, and m registers for storing MA coefficients. Each past observation and error term is multiplied by its respective coefficient to calculate the AR and MA components. The AR and MA components are then added together to get the final prediction. Similar requirements for all other predictors are presented in Table 1.
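The queue-and-register structure described above can be sketched as follows. This is an illustrative model only: the class name, the coefficient values, and the zero-filled initial queues are assumptions, and any constant (mean) term is omitted for brevity.

```python
from collections import deque


class ArmaPredictor:
    """ARMA(n, m) one-step predictor with fixed, pre-trained coefficients."""

    def __init__(self, ar_coeffs, ma_coeffs):
        self.ar = list(ar_coeffs)  # n autoregression coefficients
        self.ma = list(ma_coeffs)  # m moving-average coefficients
        # Queues hold the newest value first; they start zero-filled (assumption).
        self.obs = deque([0.0] * len(self.ar), maxlen=len(self.ar))
        self.err = deque([0.0] * len(self.ma), maxlen=len(self.ma))
        self.last_pred = 0.0

    def update(self, observed):
        """Record a new observation and the error of the previous prediction."""
        self.err.appendleft(observed - self.last_pred)
        self.obs.appendleft(observed)

    def predict(self):
        """Sum of the AR component and the MA component."""
        ar_part = sum(c * x for c, x in zip(self.ar, self.obs))
        ma_part = sum(c * e for c, e in zip(self.ma, self.err))
        self.last_pred = ar_part + ma_part
        return self.last_pred


# Hypothetical ARMA(2, 1) coefficients, for illustration only.
predictor = ArmaPredictor(ar_coeffs=[0.5, 0.25], ma_coeffs=[0.1])
predictor.update(4.0)
first = predictor.predict()
predictor.update(2.0)
second = predictor.predict()
```

Per prediction, this costs n + m multiplications and roughly as many additions, plus the two queue updates.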
3.2. Artificial Neural Network (ANN) Based Predictors
Figure 2 shows a block diagram for a very simple ANN. This ANN has “n” neurons in the input layer, two neurons in the middle layer, and one neuron in the output layer. The weights are learned during the training phase. Table 1 shows the computation and storage required for making a prediction with the ANN predictor shown in Figure 2. The requirements seem similar to those of an ARMA predictor, but the ANN has the additional overhead of the activation function f. The simplest activation function is the sigmoid function $f(x) = 1/(1 + e^{-x})$. Evaluating this function involves calculating an exponential, which requires on the order of x multiplications. Figure 2 shows that even this simple ANN (with only one hidden layer of two neurons) requires a lot more processing per prediction than an ARMA predictor. For this research, the fast ANN library [34] is used.
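To make the extra work of the activation function concrete, here is a small sketch of the forward pass for the n-input, two-hidden-neuron, one-output network described above. It does not use the library cited in the paper. The bias terms, the sigmoid on the output neuron, the function names, and all weights are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic activation; inputs are assumed scaled so exp() cannot overflow."""
    return 1.0 / (1.0 + math.exp(-x))


def ann_predict(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: n inputs -> len(hidden_weights) hidden neurons -> 1 output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical weights for eight normalized past samples and two hidden neurons.
past = [0.2, 0.4, 0.1, 0.3, 0.5, 0.2, 0.4, 0.3]
prediction = ann_predict(
    past,
    hidden_weights=[[0.1] * 8, [-0.2] * 8],
    hidden_biases=[0.0, 0.1],
    output_weights=[0.7, -0.3],
    output_bias=0.05,
)
```

Compared with the ARMA sketch of the same input size, this adds one exponential and one division per neuron on top of the multiply-accumulate work.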
3.3. Wavelet-Based Predictors
The prediction using wavelets involves three steps: wavelet decomposition, signal extension, and signal reconstruction [35]. Wavelet decomposition is performed using a wavelet prototype function called an analyzing wavelet or a mother wavelet. This step divides the signal into a lowpass output called Approximation and a highpass output called Detail. The wavelet decomposition function can be applied recursively to the approximations to get further levels of approximations and details.
The final output can be thought of as a tree such that, as we move down level by level, we see coarser and coarser versions of the signal. Figure 3 shows a three-level decomposition of a signal x. At each level, the decomposition involves convolving the signal with low-pass and high-pass wavelet filters and downsampling the results to get the coefficients of the approximation and the detail, respectively. At any level, the original signal is the sum of the approximation at that level plus the details at that level and all lower levels, i.e., for a three-level decomposition of signal x, $x = A_3 + D_3 + D_2 + D_1$. The smooth part of x is captured by $A_3$, and the details are captured by $D_1$, $D_2$, and $D_3$. These signals are obtained by inverse wavelet transform of the corresponding approximation and detail coefficients. A model (e.g., AR) is fitted on the approximation and details, which are extended by predicting the next values using this model. Finally, the extended approximations and details are combined to get the predictions for the original time series. The overhead of wavelet-based prediction is also shown in Table 1. For each level of decomposition, we have to perform convolution twice, once to get the approximation and once to get the detail. Convolution is itself a multiply-accumulate-intensive operation whose cost grows with both the signal length and the filter length. The inverse wavelet transform (additional convolution operations) is used to get the approximation and detail signals. We then have to fit AR models on these signals, which adds further overhead. Wavelet-based prediction is clearly a very expensive technique.
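The additive structure of the decomposition (signal = coarsest approximation plus all details) can be checked with a small sketch. To stay short, it swaps in the Haar wavelet, the simplest mother wavelet, instead of the db3 filter used in the paper's experiments. It covers only decomposition and reconstruction, not the AR-based signal extension. All function names are hypothetical, and the signal length is assumed to be divisible by 2 to the power of the number of levels.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level: low-pass (approximation) and high-pass (detail), downsampled by 2."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one level, doubling the length."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def decompose(x, levels):
    """Return (approximation coefficients at the last level, [cD_1, ..., cD_levels])."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def component(coeffs, level, is_detail):
    """Inverse-transform one coefficient set alone into a full-length signal."""
    zeros = [0.0] * len(coeffs)
    sig = haar_inverse_step(zeros, coeffs) if is_detail else haar_inverse_step(coeffs, zeros)
    for _ in range(level - 1):
        sig = haar_inverse_step(sig, [0.0] * len(sig))
    return sig


# Hypothetical 8-sample signal, three-level decomposition.
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
cA3, cDs = decompose(x, 3)
A3 = component(cA3, 3, is_detail=False)
D1, D2, D3 = (component(cDs[j], j + 1, is_detail=True) for j in range(3))
recon = [a + d1 + d2 + d3 for a, d1, d2, d3 in zip(A3, D1, D2, D3)]
```

Even in this stripped-down form, every level touches the data twice on the way down and again on the way up, before any per-component AR model is applied.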
4. Experimental Methodology
We evaluate the performance of all techniques described above using real network traces. Figure 4 shows our prediction methodology. The original trace contains the arrival time of each packet. We calculate the traffic rate for each time interval t and predict the traffic rate for the next interval. The trace is divided into two parts. The initial 25% of the trace constitutes the training set, and the remainder is used to test the prediction accuracy. During the training phase, we identify the optimal model parameters such that the error between the predicted and actual values is minimized. During the evaluation phase, outputs are predicted using inputs (the test data) that were not used for training, and the error between predicted and actual values is calculated. The traces used in this study are described next.
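The preprocessing described above can be sketched as follows. This is a simplified illustration under stated assumptions: the traffic rate is taken to be a packet count per interval (the paper does not say here whether packets or bytes are counted), arrival times are integer milliseconds, and the function names and sample timestamps are hypothetical.

```python
def traffic_rate(arrival_times_ms, interval_ms):
    """Count packets in consecutive fixed-length intervals."""
    if not arrival_times_ms:
        return []
    start = arrival_times_ms[0]
    n_bins = (arrival_times_ms[-1] - start) // interval_ms + 1
    counts = [0] * n_bins
    for t in arrival_times_ms:
        counts[(t - start) // interval_ms] += 1
    return counts


def split_trace(series, train_fraction=0.25):
    """First part for training, the remainder for testing."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Hypothetical sorted packet arrival times in milliseconds.
arrivals_ms = [0, 20, 50, 120, 310, 330, 380, 390]
counts = traffic_rate(arrivals_ms, interval_ms=100)
train, test = split_trace(counts)
```

Keeping the split chronological, rather than shuffling, matters here because the predictors rely on the ordering of samples.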
4.1. Traces
A large number of real network traces are used to study the performance of predictors. The details of the set of traces are described below.
4.1.1. CAIDA Traces
This dataset contains anonymized passive traffic traces from CAIDA’s equinix-chicago and equinix-sanjose monitors on high-speed internet backbone links [36]. Both monitors are connected to OC192 links. Each trace is one hour long and was captured in 2011. Twenty traces from 2011 are available on the website, and we used 16 of them. The other four traces had significant packet drops due to monitoring inaccuracies and are not included in this analysis.
4.1.2. University of Auckland Traces
This set of traces, also known as AUCKII, captures the traffic between the University of Auckland and its ISP [37]. All external connections from the university to the outside world pass through this measurement point. Most traces were targeted at 24-hour runs, but hardware failures have resulted in most traces being significantly shorter. All non-IP traffic has been discarded, and only TCP, UDP, and ICMP traffic is available in the traces. Twenty traces were selected at random from all available traces.
4.1.3. Bellcore Research Traces
These are four historical traces recorded in 1989 and used by almost all previous studies. Each trace contains a million packet arrivals seen on an Ethernet at the Bellcore Morristown Research and Engineering facility. Two of the traces are LAN traffic (with a small portion of transit WAN traffic), and two are WAN traffic [38].
4.2. Predictability Metric
In this paper, we use normalized mean square error (NMSE) to compare the performance of predictors. This metric is widely used for evaluating prediction performance. It is the ratio of the mean square error to the variance of the series:
$$\mathrm{NMSE} = \frac{1}{N \sigma^2} \sum_{t=1}^{N} \left( x_t - \hat{x}_t \right)^2 \qquad (1)$$
where $x_t$ is the actual value of the time series at time $t$, $\hat{x}_t$ is the predicted value of $x_t$, and $N$ is the total number of predictions. $\sigma^2$ is the variance of the time series during prediction. This metric compares the performance of the predictor with a trivial predictor (one which always predicts the mean of the time series). In the case of this trivial predictor (mean predictor), NMSE = 1. If NMSE > 1, the predictor is worse than the trivial predictor. NMSE = 0 in the case of a perfect predictor.
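A direct way to compute this metric is sketched below. The function name and sample values are hypothetical. The variance is computed as the population variance over the evaluated samples, which is an assumption chosen so that the mean predictor scores exactly 1.

```python
def nmse(actual, predicted):
    """Mean square error divided by the variance of the actual series."""
    n = len(actual)
    mean = sum(actual) / n
    variance = sum((x - mean) ** 2 for x in actual) / n
    mse = sum((x - p) ** 2 for x, p in zip(actual, predicted)) / n
    return mse / variance


actual = [1.0, 2.0, 3.0, 4.0]
perfect_score = nmse(actual, actual)      # perfect predictor
mean_score = nmse(actual, [2.5] * 4)      # trivial mean predictor
```

A constant series has zero variance, so the metric is undefined there; the sketch does not guard against that case.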
5. Results and Discussion
In this section, we study and characterize the traffic traces and present the results of the prediction techniques. Understanding the behavior of traffic is important, so we first study traffic behavior.
5.1. Is Network Traffic Purely Random?
The first question that comes to mind when trying to predict any traffic is whether it has some pattern or is random. Of course, if traffic is purely random white noise, any predictor will do poorly on it. In fact, it can be statistically shown that a mean-value predictor is the best we can do in terms of minimizing mean square error for such a random series [33]. There are two methods which are generally used to evaluate the nonrandomness in data.
5.1.1. Runs Test
This is a statistical test to evaluate the randomness in data. The null hypothesis for this test is that the data in a given series appear in a random order. A run is defined as a series of consecutive increasing values or decreasing values. The number of increasing or decreasing values is the length of the run. In a random dataset, the probability that the next value is larger or smaller than the i-th value follows a binomial distribution, which forms the basis of the runs test [33]. We used a function from Matlab® to test the randomness of the traffic traces. For all of the traces used in this study, this test rejected the null hypothesis at the 95% confidence level, i.e., it concluded that these traces are not random and should be predictable if modeled properly.
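For readers who want to reproduce a test of this kind, the sketch below implements the common runs up-and-down variant with a normal approximation. It is an assumption that this corresponds to the variant the authors ran in Matlab, since the text does not name the function. Ties between consecutive values are simply dropped, which is also an assumption.

```python
import math


def runs_up_down_z(series):
    """Z statistic of the runs up-and-down test (normal approximation).

    With n usable points, a random series has an expected number of runs
    (2n - 1) / 3 with variance (16n - 29) / 90.
    """
    # +1 for an increase, -1 for a decrease; equal neighbours are skipped.
    signs = [1 if b > a else -1 for a, b in zip(series, series[1:]) if b != a]
    if len(signs) < 2:
        raise ValueError("series too short or constant")
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    n = len(signs) + 1
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return (runs - expected) / math.sqrt(variance)


def is_nonrandom(series, critical=1.96):
    """Reject the randomness hypothesis at roughly the 95% level."""
    return abs(runs_up_down_z(series)) > critical


trend = list(range(50))               # one long run: too few runs
zigzag = [i % 2 for i in range(50)]   # alternates every step: too many runs
```

Both too few runs (strong trends) and too many runs (rapid oscillation) count as evidence against randomness, which is why the test is two-sided.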
5.1.2. Autocorrelation Factor (ACF) Plots
Autocorrelation is similar to the correlation coefficient. However, instead of the correlation between two different variables, it is the correlation between two values of the same variable at times $t$ and $t + h$ [33]. ACF can be used for two purposes: (1) to detect nonrandomness in data and (2) to identify an appropriate time series model if the data are nonrandom. The randomness in data is ascertained by measuring ACF at different time lags. The lags represent the number of previous values (or previous intervals) in the time series. If the data are random, ACF will be zero or near zero for any and all time lags. If the data are nonrandom, then ACF has a significant value at one or more time lags. We studied the ACF of all the traffic traces and found that almost all the traces show nonrandom behavior.
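The sample autocorrelation at each lag can be computed as in the sketch below. The function names are hypothetical. The ±1.96/√n band is the usual large-sample approximation for the 95% confidence lines of a white-noise series, stated here as an assumption about how such reference lines are typically drawn.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + h] - mean) for t in range(n - h)) / c0
        for h in range(max_lag + 1)
    ]


def confidence_bound(n, z=1.96):
    """Approximate 95% band for the ACF of a random series of length n."""
    return z / math.sqrt(n)


values = [1.0, 2.0, 3.0, 4.0, 5.0]
coeffs = acf(values, 2)
bound = confidence_bound(len(values))
```

Lags whose coefficients fall outside the band are the ones that suggest exploitable structure.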
The ACF plot for a typical traffic trace is shown in Figure 5(b). The horizontal axis shows time lag (h = 0, 1, 2, 3, …) and the vertical axis shows correlation coefficient. The plot also has some reference horizontal lines. The middle line is at 0, and the other two dotted lines are 95% confidence lines.
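Figure 5: ACF plots for (a) a randomly generated trace, (b) a typical traffic trace, and (c) a trace with weak correlation (au18).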
Figure 5(b) shows that the correlation at lag 1 is moderate (approximately 0.6); then, it decreases gradually to zero. Such a pattern is the autocorrelation plot signature of “moderate autocorrelation,” which in turn provides moderate predictability if modeled properly. If the data were truly random, then all the coefficients would have been close to zero, inside the dotted lines. Figure 5(a) shows the ACF plot of a randomly generated trace. We see that all the lags except lag 0 (which is 1 by definition) are close to zero for this trace. Real network traces, however, show nonrandom behavior. Most of the traces we used have an ACF similar to Figure 5(b), but there were a few traces (4 out of 40) which showed very weak correlation. Figure 5(c) shows one such example. The ACF drops very low (0.2) at lag 1, and almost all of the lags are close to zero. This behavior is not truly random, and we still have some hope of predictability even in these traces.
5.2. Accuracy of Predictors
This section presents an accuracy comparison of all the prediction techniques described in Section 3. For MA predictors, order 8 gave the best results for all traces, and the results for order 8 are presented. For both AR and ARMA predictors, the orders of the AR and MA processes were varied from 0 to 20 and the best-performing order was chosen. In most cases, AR(8) and ARMA(9, 8) gave the best results, and the results with these orders are presented. For the ANN-based predictor, we used one input layer, one output layer, and one hidden layer of neurons. The number of input neurons was fixed at eight, i.e., eight previous samples were used to predict the next value. The number of neurons in the middle layer was varied from 1 to 10, and the results for the best-performing ANN (with 4 neurons in the middle layer) are presented in this section. For wavelet-based predictions, we used two-level decomposition and evaluated the predictor with different mother filters. For most of the experiments, db3 gave the best performance. The results with db3 (as the mother filter) are presented in this section. Representative predictors from each category are compared. We tried many other predictors for which we do not present results. For example, in the classic predictors category, we tried exponential smoothing (ES) and triple exponential smoothing (TES) in addition to DES. ES did not provide good accuracy, and TES increased the complexity without increasing the accuracy. So, we skipped these two predictors. We also tried ANNs with different configurations, i.e., with different numbers of layers and different numbers of neurons in each layer. The results were consistent with previous research. The configuration of the ANN predictor which was found suitable by previous research is presented in this study. The same is true for wavelet-based prediction and for the choice of orders of the classic predictors.
5.2.1. CAIDA Traces
We applied the testing methodology presented in Section 4 to the 16 CAIDA traces. The normalized mean square error is shown in Figure 6. The NMSE value for all traces is less than 1 for all the predictors, showing that these traces are generally very predictable. DES is a clear winner in all these traces. For most of the traces, DES gives an NMSE value less than 0.4, which means that DES is able to explain more than 60% of traffic variation. The ARMA predictor also performs very well for these traces. Wavelet-based prediction does not provide good prediction quality. This result agrees with findings in [27] and contradicts the results in [31]. This unsatisfactory performance of wavelet-based prediction may be due to the effect of boundary conditions when applying the wavelet transform to a finite-length time series [27]. We did not study the boundary conditions in this paper, and interested readers can refer to [27] for a detailed study on the effects of boundary conditions.
5.2.2. University of Auckland Traces
Figure 7 shows the performance of the predictors for the traces from the University of Auckland. Most of the predictors have NMSE less than 1, which indicates that traffic in these traces is also predictable. The last-value predictor performs very badly. Its NMSE value is greater than 1 for many traces. The best prediction performance for these traces is shown by the ARMA predictor. This result agrees with previous research [31, 39] which has argued that the ARMA predictor is suitable for such traffic since it is capable of capturing both short- and long-range dependence. ANN also performs well for these traces. However, ANN has significantly more complexity than ARMA, as we have seen in Section 3. It consumes more resources without providing any additional benefit. The third best predictor for these traces is DES. DES is the cheapest among the predictors (except LV). So, in situations where the cost of the predictor matters, DES seems to provide a very good balance between cost and accuracy. Also, note that, for traces au5, au6, and au18, none of the predictors show reasonable performance. In fact, these three are the traces which showed limited predictability when we studied the ACF characteristics. The plot in Figure 5(c) is for the au18 trace. au5 and au6 also show similar ACF plots.
5.2.3. BC Traces
Figure 8 shows how the predictors perform for BC traces. For all these traces, ARMA, DES, and ANN all perform very well. For the fourth trace, the NMSE value is very close to zero for all the predictors. Trace bc4 captures only external traffic and contains long periods of inactivity. So, most of the predictors show very good behavior for this trace.
5.3. Effect of Increasing Prediction Interval on Accuracy
We have thus far presented results with a prediction interval of 100 ms, i.e., the traffic rate is observed over 100 ms intervals, and a prediction is made for the next 100 ms. Figure 9 shows the effect of increasing the prediction interval on the performance of the predictors for the au7 trace. Increasing the interval size works as a low-pass filter and filters out high-frequency noise. Short-term traffic variations are smoothed out, and thus the performance of all the predictors improves as the prediction interval increases. All other traces also exhibit the same behavior. A previous study [31] reported that the predictability graphs show concave behavior, i.e., there is a sweet spot for traffic prediction. We did not observe any such behavior, and all the traces consistently exhibit the behavior of Figure 9. Figure 9 shows that NMSE for ANN increases slightly when we increase the prediction interval from 2 to 5 seconds. This may be due to the fact that we are using only 25% of the trace for training. Increasing the prediction interval decreases the number of data points in the training portion, which might affect accuracy. Researchers have provided a detailed study of the effect of changing the size of the training data on prediction results [21].
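One simple way to obtain a coarser prediction interval from an existing series is to merge adjacent intervals, as sketched below. The function name is hypothetical, and discarding a trailing partial interval is an assumption made for simplicity. The sketch also makes visible why fewer training points remain at coarser intervals.

```python
def rebin(counts, factor):
    """Merge every `factor` consecutive intervals into one longer interval."""
    usable = len(counts) - len(counts) % factor  # drop a trailing partial interval
    return [sum(counts[i:i + factor]) for i in range(0, usable, factor)]


# Hypothetical per-100 ms packet counts merged into 300 ms intervals.
fine = [1, 2, 3, 4, 5, 6, 7]
coarse = rebin(fine, 3)
```

Summing neighbouring samples acts as the low-pass filter mentioned above, while shrinking the series length by the same factor.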
5.4. Power and Performance Overhead of Traffic Predictors
One of our main objectives in this paper is to find a predictor which accurately predicts traffic while consuming the minimum amount of power. Table 1 shows the computations and data storage requirements of these predictors. In this study, we focus only on power and performance overhead during the prediction phase. A predictor needs to be trained only once and that overhead can be ignored. In other situations, where traffic behavior changes over time, we may need to retrain the predictors. But, this training is required very rarely as previous research has shown that traffic behavior remains steady over time [31, 39].
We implemented these predictors in software and measured the performance and energy overhead of these software predictors on a simple 2-issue processor. Table 2 shows the specification of the processor.

We used a one-hour-long trace and measured the performance and power using the GEMS full-system simulator. Table 3 shows instructions per prediction and energy per prediction for each type of predictor when the predictors are implemented in software. We see that ANN-based and wavelet-based predictors require considerably more instructions than the other predictors. The energy is measured by augmenting GEMS [40] with the Wattch power simulator [41]. The power numbers are presented using the most aggressive clock gating, “cc3,” provided by Wattch and are for 90 nm technology. We measured the total energy consumed for the execution of the traffic trace and divided the total energy by the number of predictions to get the energy per prediction. It is interesting to note that DES has the minimum number of instructions per prediction and, consequently, the minimum energy per prediction. We have seen from the accuracy results in Section 5.2 that ANN and ARMA also give very good results for most of the traces. However, when comparing energy consumption, we can see that DES is the lowest energy-consuming predictor. It is also comparable in accuracy to high-cost predictors like ANN, which makes it very useful for applications like one-step-ahead traffic prediction for power management. Energy consumption of the ARMA predictor is also fairly low compared to ANN and wavelet. Although ANN performs well in most situations, the power and performance cost associated with it makes it suitable only for offline applications like network design and capacity planning.

5.5. Combining Accuracy and Energy Metrics
The accuracy and energy consumption results for all the studied predictors are presented in Sections 5.2 and 5.4, respectively. Now, we combine the accuracy and energy metrics into a single function which we call Energy_Error Score or EEScore. This metric is defined based on a general technique of combining multiple metrics into a global measure described in [42]. This combined global metric allows us to compare all the predictors using a single number and can help us declare the “winner.” As accuracy (described in terms of NMSE) and energy consumption (μJ per prediction) come from two different distributions and have different scales, we first standardize these metrics by calculating their standard score (z-score). Now, the Energy_Error Score is
$$\mathrm{EEScore} = w_E \, Z_{\mathrm{energy}} + w_A \, Z_{\mathrm{NMSE}} \qquad (2)$$
where $Z_{\mathrm{energy}}$ and $Z_{\mathrm{NMSE}}$ are the z-scores of energy per prediction and NMSE, $w_E$ is the relative weight of the energy metric in the EEScore, and $w_A$ is the relative weight of the accuracy metric in the EEScore, such that $w_E + w_A = 1$.
For example, $w_E = w_A = 0.5$ gives equal weight to energy and error. Relative weights can be set based on the particular application and scenario. For our application, we give equal weight to accuracy and energy consumption. As we are trying to minimize both energy and error, the predictor with the lowest value of EEScore will be the winner.
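The scoring procedure can be sketched as a weighted sum of per-metric z-scores. Everything concrete in the example is an assumption for illustration: the function names, the use of the population standard deviation, and especially the numbers, which are made-up placeholders for three hypothetical predictors and are not results from the paper.

```python
import statistics


def z_scores(values):
    """Standard scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population std dev (assumption)
    return [(v - mean) / std for v in values]


def ee_scores(nmse_by_predictor, energy_by_predictor, energy_weight=0.5):
    """Weighted sum of the energy and error z-scores; lower is better."""
    names = list(nmse_by_predictor)
    z_error = z_scores([nmse_by_predictor[k] for k in names])
    z_energy = z_scores([energy_by_predictor[k] for k in names])
    error_weight = 1.0 - energy_weight  # the two weights sum to 1
    return {
        k: energy_weight * ze + error_weight * zr
        for k, ze, zr in zip(names, z_energy, z_error)
    }


# Placeholder inputs for hypothetical predictors P1..P3 (NOT measured values).
nmse_avg = {"P1": 0.2, "P2": 0.4, "P3": 0.6}
energy_avg = {"P1": 1.0, "P2": 2.0, "P3": 3.0}
scores = ee_scores(nmse_avg, energy_avg)
best = min(scores, key=scores.get)
```

Because each metric is centered before weighting, the scores of all predictors sum to zero, so a negative score means better than the average predictor.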
Figure 10 shows the EEScore for all the techniques studied. This number is calculated based on average NMSE and average energy consumption per prediction across all the traces. These numbers are standardized and combined using equation (2) to get the EEScore. Note that an EEScore of 0 corresponds to the performance of the average predictor. A negative number means the predictor’s EEScore is less than the mean, and a positive score means it is greater than the mean. Our goal is to minimize the EEScore so that both the error and energy consumption metrics are minimized. From the figure, we see that the DES (double exponential smoothing) predictor has the lowest EEScore, and hence it can be declared the “best one” if we give equal weights to accuracy and energy consumption. Note that LV performs poorly because it has the lowest accuracy, and wavelet performs badly because it has the highest cost without giving any benefit in accuracy.
6. Conclusions
We have provided a performance and power comparison of three different classes of predictors using a large number of real network traces. Our results indicate that network traffic is generally predictable. Furthermore, the choice of predictor depends on the characteristics of the network: different predictors are suitable for traces from different sources, while the same predictor performs consistently well for all the traces from the same source. In power-critical online applications, DES and ARMA show promising accuracy with minimal energy overhead. The ANN-based predictor performed consistently well but has high power and computation overhead. We have proposed a new metric that combines accuracy and power consumption into a single number. Based on this metric, DES emerged as the predictor of choice when accuracy and energy consumption are viewed collectively.
Data Availability
The network traces used in this study were taken from three different sources. The University of Auckland traces can be found at http://wand.net.nz/wits/auck/2. The Bellcore traces are available at http://ita.ee.lbl.gov/html/contrib/BC.html. The network traces from CAIDA were supplied under license. Requests for access to these data should be made to CAIDA directly (http://www.caida.org).
Disclosure
Some preliminary results from this work appeared as a poster presentation in the 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
References
[1] Ericsson Website, Ericsson Mobility Report, 2017, https://www.ericsson.com/en/mobilityreport/reports/november2017.
[2] R. Li, Z. Zhao, X. Zhou, and H. Zhang, “Energy savings scheme in radio access networks via compressive sensing-based traffic load prediction,” Transactions on Emerging Telecommunications Technologies, vol. 25, no. 4, pp. 468–478, 2012.
[3] U. Paul, M. Buddhikot, and S. R. Das, “Opportunistic traffic scheduling in cellular networks,” in Proceedings of IEEE International Symposium on Dynamic Spectrum Access Networks, pp. 339–348, Bellevue, WA, USA, October 2012.
[4] Z. Li, J. Bi, and S. Chen, “Traffic prediction-based fast routing algorithm for wireless multimedia sensor networks,” International Journal of Distributed Sensor Networks, vol. 9, no. 5, Article ID 176293, 2013.
[5] R. Li, Z. Zhao, J. Zheng, C. Mei, Y. Cai, and H. Zhang, “The learning and prediction of application-level traffic data in cellular networks,” IEEE Transactions on Wireless Communications, vol. 16, no. 6, pp. 3899–3912, 2017.
[6] M. F. Iqbal, Workload-Aware Network Processors, The University of Texas at Austin, Austin, TX, USA, 2013.
[7] H. Kawase, Y. Mori, H. Hasegawa, and K.c. Sato, “Dynamic router performance control utilizing support vector machines for energy consumption reduction,” IEEE Transactions on Network and Service Management, vol. 13, no. 4, pp. 860–870, 2016.
[8] M. F. Iqbal and L. K. John, “Efficient traffic aware power management for multicore communications processors,” in Proceedings of IEEE/ACM Conference on Architecture for Network and Communication Systems, Austin, TX, USA, July 2012.
[9] A. Svigelj, R. Sernec, and K. Alic, “Network traffic modeling for load prediction: a user-centric approach,” IEEE Network, vol. 29, no. 4, pp. 88–96, 2015.
[10] R. Sivakumar, E. A. Kumar, and G. Sivaradje, “Prediction of traffic load in wireless networks using time series model,” in Proceedings of International Conference on Process Automation, Control and Computing (PACC), pp. 1–6, Coimbatore, India, July 2011.
[11] S.-J. Yoo, “Efficient traffic prediction scheme for real-time VBR MPEG video transmission over high-speed networks,” IEEE Transactions on Broadcasting, vol. 48, no. 1, pp. 10–18, 2002.
[12] W. Xu and A. G. Qureshi, “Adaptive linear prediction of MPEG video traffic,” in Proceedings of Fifth International Symposium on Signal Processing and its Applications, vol. 1, pp. 67–70, Brisbane, Australia, August 1999.
[13] A. Abdennour, “Short-term MPEG-4 video traffic prediction using ANFIS,” International Journal of Network Management, vol. 15, no. 6, pp. 377–392, 2005.
[14] A. D. Doulamis, N. D. Doulamis, and S. D. Kollias, “Recursive non linear models for on line traffic prediction of VBR MPEG coded video sources,” in Proceedings of IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, vol. 6, pp. 114–119, Como, Italy, July 2000.
[15] A. S. Soares, T. P. Oliveira, and J. S. Barbar, “Computer network traffic prediction: a comparison between traditional and deep learning neural networks,” International Journal of Big Data Intelligence, vol. 3, no. 1, p. 28, 2016.
[16] J. Jiang and S. Papavassiliou, “Enhancing network traffic prediction and detection via statistical network traffic separation and combination strategies,” Computer Communications, vol. 29, no. 10, pp. 1627–1638, 2006.
[17] P. Romirer-Maierhofer, M. Schiavone, and A. D’Alconzo, “Device specific traffic characterization for root cause analysis in cellular networks,” in Proceedings of International Workshop on Traffic Monitoring and Analysis, Barcelona, Spain, April 2015.
[18] C.J. Yi, Y. Geng, H. Dai, Liang Liu, and L. Ning, “Switching algorithm with prediction strategy for maximizing lifetime in wireless sensor network,” International Journal of Distributed Sensor Networks, vol. 11, no. 11, Article ID 592093, 2015.
[19] Y. Cai and L. Yu, “Sensor network traffic load prediction with Markov random field theory,” in Proceedings of IEEE International Conference on Computer Science and Network Technology, Harbin, China, December 2015.
[20] S. Basu, A. Mukherjee, and S. Klivansky, “Time series models for internet traffic,” in Proceedings of Fifteenth Annual Joint Conference of the IEEE Computer and Communications Societies Conference on Computer Communications - Volume 2, INFOCOM’96, pp. 611–620, Washington, DC, USA, March 1996.
[21] M. F. Zhani, H. Elbiaze, and F. Kamoun, “Analysis of prediction performance of training based models using real network traffic,” International Journal of Computer Applications in Technology, vol. 37, no. 1, 2010.
[22] N. K. Groschwitz and G. C. Polyzos, “A time series model of long-term NSFNET backbone traffic,” in Proceedings of ICC/SUPERCOMM’94 - 1994 International Conference on Communications, New Orleans, LA, USA, May 1994.
[23] D. Shen and J. L. Hellerstein, “Predictive models for proactive network management: application to a production web server,” in Proceedings of IEEE/IFIP Network Operations and Management Symposium “The Networked Planet: Management Beyond 2000”, pp. 833–846, Honolulu, HI, USA, April 2000.
[24] J. L. Hellerstein, B. Fan, and P. Shahabuddin, “An approach to predictive detection for service management,” in Proceedings of Sixth IFIP/IEEE International Symposium on Integrated Network Management, pp. 309–322, Boston, MA, USA, May 1999.
[25] A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E. D. Kolaczyk, and N. Taft, “Structural analysis of network traffic flows,” ACM SIGMETRICS Performance Evaluation Review, vol. 32, no. 1, p. 61, 2003.
[26] M. F. Zhani, H. Elbiaze, and F. Kamoun, “SNFAQM: an active queue management mechanism using neurofuzzy prediction,” in Proceedings of 12th IEEE Symposium on Computers and Communications, pp. 381–386, Aveiro, Portugal, July 2007.
[27] H. Feng and Y. Shu, “Study on network traffic prediction techniques,” in Proceedings of International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1041–1044, Beijing, China, June 2005.
[28] A. D. Doulamis, N. D. Doulamis, and S. D. Kollias, “An adaptable neural-network model for recursive nonlinear traffic prediction and modeling of MPEG video sources,” IEEE Transactions on Neural Networks, vol. 14, no. 1, pp. 150–166, 2003.
[29] L. Nie, X. Wang, L. Wan, S. Yu, H. Song, and D. Jiang, “Network traffic prediction based on deep belief network and spatiotemporal compressive sensing in wireless mesh backbone networks,” Wireless Communications and Mobile Computing, vol. 2018, Article ID 1260860, 10 pages, 2018.
[30] H. Zhao and N. Ansari, “Wavelet transform-based network traffic prediction: a fast online approach,” Journal of Computing and Information Technology, vol. 20, no. 1, 2012.
[31] Y. Qiao, J. Skicewicz, and D. Peter, “An empirical study of the multiscale predictability of network traffic,” in Proceedings of 13th IEEE International Symposium on High performance Distributed Computing, Honolulu, HI, USA, June 2003.
[32] M. Faisal Iqbal and L. K. John, “Power and performance analysis of network traffic prediction techniques,” in Proceedings of IEEE International Symposium on Performance Analysis of Systems & Software, Austin, TX, USA, April 2012.
[33] NIST/SEMATECH, e-Handbook of Statistical Methods, National Institute of Standards and Technology (NIST), Gaithersburg, MD, USA, 2012.
[34] S. Nissen, Implementation of a Fast Artificial Neural Network Library (FANN), University of Copenhagen, Nørregade, Denmark, 2003.
[35] A. Graps, “An introduction to wavelets,” IEEE Computational Science and Engineering, vol. 2, no. 2, pp. 50–61, 1995.
[36] K. C. Claffy, D. Andersen, and H. Paul, The CAIDA Anonymized 2011 Internet Traces, CAIDA, La Jolla, CA, USA, 2011.
[37] WAND WITS, Auckland II Traces, 2000, http://wand.net.nz/wits/auck/2.
[38] B. Morristown, Bellcore Traces, 1989, http://ita.ee.lbl.gov/html/contrib/BC.html.
[39] A. Sang and S. Q. Li, “A predictability analysis of network traffic,” in Proceedings of Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, San Diego, CA, USA, August 2000.
[40] M. M. K. Martin, D. J. Sorin, B. M. Beckmann et al., “Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset,” ACM SIGARCH Computer Architecture News, vol. 33, no. 4, p. 92, 2005.
[41] D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: a framework for architectural-level power analysis and optimizations,” in Proceedings of 27th Annual International Symposium on Computer Architecture, pp. 83–94, New York, NY, USA, June 2000.
[42] M. Chignell, T. Tong, S. Mizobuchi, T. Delange, W. Ho, and W. Walmsley, “Combining multiple measures into a single figure of merit,” in Proceedings of 7th International Conference on Advances in Information Technology, Elsevier, Bangkok, Thailand, August 2015.
Copyright
Copyright © 2019 Muhammad Faisal Iqbal et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.