#### Abstract

Accurate real-time traffic prediction is required in many networking applications like dynamic resource allocation and power management. This paper explores a number of predictors and searches for a predictor which has high accuracy and low computation complexity and power consumption. Many predictors from three different classes, including classic time series, artificial neural networks, and wavelet transform-based predictors, are compared. These predictors are evaluated using real network traces. Comparison of accuracy and cost, both in terms of computation complexity and power consumption, is presented. It is observed that a double exponential smoothing predictor provides a reasonable tradeoff between performance and cost overhead.

#### 1. Introduction

Internet traffic has grown tremendously in the past decade due to the advent of new technologies, industries, and applications. In wireless networks, the growth in traffic is being driven by both increased smartphone subscriptions and a continued increase in average data volume per subscription. Mobile internet traffic grew by 70% between 2016 and 2017 [1]. In the future, traffic generated by smartphones will dominate even more than it does today. Internet of Things (IoT), cloud computing, and data-center applications are on the rise. Networks have to cope with ever-increasing traffic demands and must provide good quality of service to the users. As a result, efficient utilization of networking equipment has become crucial. This task can be accomplished effectively if network traffic can be predicted accurately. Accurate traffic prediction has applications in many networking areas like energy savings [2], network resource management [3], and wireless sensor networks [4].

One application of traffic prediction is power saving. Traffic prediction is being used in core routers of the Internet to save a significant amount of power. With increasing traffic demands and computational requirements, the number and complexity of processors used in these routers are on the rise, resulting in greater power consumption. High power consumption of equipment and the subsequent rise in cooling costs result in increased operational expenditure of the network. If the network traffic can be predicted accurately, additional processors in these core routers can be turned off during low traffic times to save power. A precisely predicted future traffic load will help us design greener traffic-aware networks [5–8].

Second, traffic prediction is required to efficiently use network resources. Accurate prediction of network traffic at access points enables efficient resource allocation to ensure good quality of service [9, 10]. Applications also include congestion control, admission control, network bandwidth allocation, and anomaly detection. With new applications like Youtube, Netflix, and video chats, the amount of video traffic has also increased in the network. Many techniques have been proposed to predict video traffic [11–15]. Furthermore, the task of detecting and preventing network abuse is becoming very difficult with growing amount of traffic and complexity of networks. One countermeasure against network abuse is *Anomaly Detection*. A significant deviation from normal behavior of traffic can be used to detect an attack. The performance of anomaly detection is directly related to the traffic prediction accuracy [16, 17].

Finally, in wireless sensor networks, energy savings is of extreme importance as the quality of service they provide depends on the energy supply. Networked microsensors technology is viewed to be among the most enabling technologies for the 21st century. Cheap smart devices with multiple on-board sensors, networked through wireless links and the Internet and deployed in large numbers, provide unprecedented opportunities for instrumenting and controlling homes, cities, and the environment. In addition, networked microsensors provide the technology for a broad spectrum of systems in the defense arena, generating new capabilities for reconnaissance and surveillance as well as other tactical applications. Network uptime can be enhanced significantly if unused nodes are powered off. Accurate prediction of future demands is necessary in order to power up these nodes on time and optimize energy saving [4, 18, 19].

The objective of this study is to find a traffic predictor suitable for real-time applications. This predictor needs to be accurate and lightweight in terms of computation cost and power consumption. In the quest for such a predictor, this paper makes the following contributions:

(i) We compare existing schemes to find their effectiveness for real-time applications. Previously proposed traffic predictors are classified into three categories: classic time series-based predictors, artificial neural networks-based predictors, and wavelet transform-based predictors. A detailed analysis of accuracy and overhead of these three types of predictors is presented in this study.

(ii) Overhead in terms of both computation cost and power consumption is presented, whereas previous work has mainly focused on accuracy.

(iii) We propose a new metric called Energy_Error Score (EE-Score) to compare the performance of predictors. This metric combines accuracy and energy consumption into a single global performance score for comparing predictors.

The remainder of the paper is organized as follows. Section 2 describes the previous work related to traffic prediction. In Section 3, we present a brief description of three classes of forecasting predictors. Evaluation methodology is described in Section 4. Finally, Section 5 provides the experimental results, and we conclude in Section 6.

#### 2. Related Work

Accurate traffic prediction is useful in numerous networking applications. Consequently, it has been studied by researchers, and many different traffic predictors have been proposed. We can categorize these predictors into three broad classes:

##### 2.1. Time Series Predictors

Many researchers have used time series models for predicting network traffic [20, 21]. One application is network design and capacity planning, which can be performed efficiently if traffic behavior is understood and its growth is predicted accurately. For this purpose, evolution in network traffic was studied by researchers [22]. They used autoregressive integrated moving average (ARIMA) mathematical models to predict yearly growth in traffic. This kind of application usually requires an offline analysis of traffic. Therefore, complexity of the predictor is not a big concern in this application. However, online prediction is needed in many applications like network security. To detect anomalies, it was proposed to predict traffic by statistically separating traffic into its components (e.g., trends, bursts, and noise) and using the components separately to predict traffic [16]. Autoregressive moving average (ARMA) was used to model these traffic components. Similar separation and prediction strategies were proposed by other researchers [23–25].

##### 2.2. Neural Networks-Based Predictors

Artificial intelligence and neural networks have also found applications in network traffic prediction. Artificial neural network is an excellent tool to find complex patterns in input data and has been used for prediction of network traffic [26–28]. Deep-learning algorithms are used for traffic prediction in wireless mesh networks in [29] where researchers propose a network traffic prediction method based on a deep-learning architecture and the spatiotemporal compressive sensing method.

##### 2.3. Wavelet Transform-Based Predictors

Some applications require prediction of traffic at different resolutions in different situations. For example, in a Message Travel Time Advisor (MTTA), the timescale of prediction depends on the query presented to it. For small messages, short range prediction is required, and for long messages, the prediction should be on a larger timescale. Wavelet transform was found suitable for multiscale prediction as it naturally transforms a signal into multiple resolutions [30, 31]. Researchers also studied sensitivity of different parameters of wavelet transform for the purpose of traffic prediction [27].

In this study, we focus on short-term prediction of traffic for applications like dynamic power management of network processors. For such an application, the complexity and power consumption of the predictor are major concerns. We study the complexity and power consumption of predictors, while previous research has focused only on accuracy. To our knowledge, none of the previous work has studied power and performance overhead of traffic predictors. Our poster presentation [32] presents some initial results from this study. We study three different categories of predictors for one-step-ahead prediction and explore the tradeoff between their performance and complexity. Section 3 provides a brief introduction of these prediction techniques.

#### 3. Traffic Prediction Techniques

In this section, we briefly describe representative predictors in the three categories of traffic prediction techniques.

##### 3.1. Classic Time Series Predictors

###### 3.1.1. Last Value (LV) Predictor

This is the simplest predictor and works well in many applications. This predictor uses the last observed value as prediction for the next interval.

###### 3.1.2. Windowed Moving Average (MA)

In this technique, the average of *n* past observations is used as the prediction for the next interval. This technique gives the same weight to all *n* previous observations. The number of past observations (*n*) used is called the order of MA.
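The two predictors described so far can be sketched in a few lines. This is a minimal illustration rather than the implementation evaluated in this study; the names `last_value_predict` and `MovingAveragePredictor` are hypothetical, and averaging a partially filled window during start-up is an assumption.

```python
from collections import deque


def last_value_predict(history):
    """LV predictor: the forecast is simply the most recent observation."""
    return history[-1]


class MovingAveragePredictor:
    """Windowed moving average of order n (equal weight on the last n samples)."""

    def __init__(self, order):
        # A bounded deque drops the oldest sample automatically.
        self.window = deque(maxlen=order)

    def update(self, observation):
        self.window.append(observation)

    def predict(self):
        # Assumption: before the window fills, average what has been seen so far.
        return sum(self.window) / len(self.window)


ma = MovingAveragePredictor(order=3)
for rate in [10.0, 20.0, 30.0, 40.0]:
    ma.update(rate)
print(ma.predict())  # mean of the last three samples (20, 30, 40)
```

Both predictors need only a small queue and, for MA, one division per prediction, which is why they serve as the low-cost baseline.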

###### 3.1.3. Double Exponential Smoothing (DES)

Exponential smoothing assigns exponentially lower weights to older observations. Single exponential smoothing does not work well when there is a trend in the data [33]. A trend means that the average value of the time series increases or decreases with time. Double exponential smoothing adds a trend component to the estimate and is considered more appropriate for data with trends. Table 1 presents the equation for DES. *α* defines the speed at which older values are damped. When *α* is close to 1, dampening is quick, and when *α* is close to 0, dampening is slow. The values of *α* and *γ* are obtained using nonlinear optimization techniques and are learned during the training phase of the predictor (see Section 4 for more details on training).
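As an illustration, the sketch below uses the common level-plus-trend (Holt) formulation of double exponential smoothing. It is an assumption that this matches the DES equation in Table 1; the class name and the initialization (level set to the first sample, trend set to zero) are also hypothetical choices made for brevity.

```python
class DoubleExponentialSmoothing:
    """One-step-ahead DES predictor with smoothing factors alpha and gamma."""

    def __init__(self, alpha, gamma):
        self.alpha = alpha    # weight on the newest observation (level)
        self.gamma = gamma    # weight on the newest level change (trend)
        self.level = None
        self.trend = None

    def update(self, x):
        if self.level is None:
            # Assumed initialization: first sample as level, no trend yet.
            self.level, self.trend = float(x), 0.0
            return
        prev_level = self.level
        self.level = self.alpha * x + (1 - self.alpha) * (prev_level + self.trend)
        self.trend = self.gamma * (self.level - prev_level) + (1 - self.gamma) * self.trend

    def predict(self):
        # Forecast for the next interval: current level plus current trend.
        return self.level + self.trend


des = DoubleExponentialSmoothing(alpha=0.5, gamma=0.5)
for rate in [100.0, 110.0, 120.0, 130.0]:
    des.update(rate)
print(des.predict())
```

Each prediction costs a handful of multiplications and additions and keeps only two state values, which is consistent with the low overhead attributed to DES in this study.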

###### 3.1.4. Autoregression (AR)

The next value of the series is the weighted sum of the previous observations. The weights are called autoregressive coefficients. Autoregression is similar to regression with a slight difference. Regression analysis provides a best-fit mathematical equation between the dependent and independent variables, whereas in autoregression technique, a signal is regressed with itself to exploit the autocorrelation structure. The number of past observations used for regression defines the order of AR. The AR model with order *n* is represented as AR(*n*).
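The text does not state how the autoregressive coefficients were estimated. One standard option, shown here purely as an assumption, is to solve the Yule-Walker equations with the Levinson-Durbin recursion on the sample autocovariances. The function names `autocovariance`, `fit_ar`, and `predict_ar` are hypothetical.

```python
def autocovariance(series, lag):
    """Biased sample autocovariance of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t + lag] - mean)
               for t in range(n - lag)) / n


def fit_ar(series, order):
    """Estimate AR(order) coefficients via Levinson-Durbin (Yule-Walker)."""
    r = [autocovariance(series, k) for k in range(order + 1)]
    coefs = [0.0] * (order + 1)   # coefs[j] multiplies x[t-j]; index 0 unused
    err = r[0]                    # prediction error variance (nonconstant series assumed)
    for k in range(1, order + 1):
        kappa = (r[k] - sum(coefs[j] * r[k - j] for j in range(1, k))) / err
        updated = coefs[:]
        updated[k] = kappa
        for j in range(1, k):
            updated[j] = coefs[j] - kappa * coefs[k - j]
        coefs = updated
        err *= (1.0 - kappa * kappa)
    return coefs[1:]


def predict_ar(history, coefs, mean):
    """Weighted sum of the most recent mean-centered observations."""
    recent = history[-1:-len(coefs) - 1:-1]   # newest observation first
    return mean + sum(c * (x - mean) for c, x in zip(coefs, recent))


train = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2]
phi = fit_ar(train, order=2)
print(predict_ar(train, phi, sum(train) / len(train)))
```

Once the coefficients are learned, a prediction is only *n* multiply-accumulate operations, so the training cost does not affect the per-prediction overhead.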

###### 3.1.5. Autoregression Moving Average (ARMA)

This model uses the moving average (MA) of the previous error terms in addition to autoregression (AR). The error term is defined as the difference between actual and predicted values. The final prediction is the sum of the AR component and the MA component. The orders of AR and MA can be different. An ARMA model with AR order *n* and MA order *m* is represented as ARMA(*n*, *m*).

Figure 1 shows the block diagram of an ARMA(*n*, *m*) predictor. This predictor requires one queue of size *n* for holding past observations, one queue of size *m* for past errors, *n* registers for storing autoregression coefficients, and *m* registers for storing MA coefficients. Each past observation and error term is multiplied with its respective coefficient to calculate the AR and MA components. The AR and MA components are then added together to get the final prediction. Similar requirements for all other predictors are presented in Table 1.
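The queue-and-register structure described above maps directly onto a small class. This is a sketch under stated assumptions: the coefficients are taken as already trained, the queues start at zero, the series is not mean-centered, and the class name `ARMAPredictor` is hypothetical.

```python
from collections import deque


class ARMAPredictor:
    """One-step ARMA(n, m) predictor built from two queues and two coefficient sets."""

    def __init__(self, ar_coefs, ma_coefs):
        self.ar_coefs = list(ar_coefs)   # n autoregression coefficients
        self.ma_coefs = list(ma_coefs)   # m moving-average coefficients
        # Queues hold the newest value at index 0; assumed to start at zero.
        self.observations = deque([0.0] * len(self.ar_coefs), maxlen=len(self.ar_coefs))
        self.errors = deque([0.0] * len(self.ma_coefs), maxlen=len(self.ma_coefs))
        self.last_prediction = 0.0

    def predict(self):
        ar_part = sum(c * x for c, x in zip(self.ar_coefs, self.observations))
        ma_part = sum(c * e for c, e in zip(self.ma_coefs, self.errors))
        self.last_prediction = ar_part + ma_part
        return self.last_prediction

    def update(self, observation):
        # Error term: actual value minus the prediction made for this interval.
        self.errors.appendleft(observation - self.last_prediction)
        self.observations.appendleft(observation)


arma = ARMAPredictor(ar_coefs=[0.5], ma_coefs=[0.5])
for rate in [10.0, 12.0]:
    arma.predict()
    arma.update(rate)
print(arma.predict())
```

Passing an empty `ma_coefs` list reduces the same class to a plain AR(*n*) predictor, since the MA sum is then zero.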

##### 3.2. Artificial Neural Network- (ANN-) Based Predictors

Figure 2 shows a block diagram for a very simple ANN. This ANN has *n* neurons in the input layer, two neurons in the middle layer, and one neuron in the output layer. The weights are learned during the training phase. Table 1 shows the computation and storage required for making a prediction with the ANN predictor shown in Figure 2. The requirements seem similar to those of an ARMA predictor, but the ANN has the additional overhead of the activation function *f*. The simplest activation function is the sigmoid function $f(x) = 1/(1 + e^{-x})$. This function involves calculating an exponent, which requires on the order of *x* multiplications. Figure 2 shows that even this simple ANN (with only one hidden layer of two neurons) requires a lot more processing for prediction than an ARMA predictor. For this research, the Fast ANN library [34] is used.
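The forward pass of such a network can be sketched as follows. This is not the library used in this study; the function names are hypothetical, the weights are placeholders rather than trained values, and a linear output neuron is an assumption.

```python
import math


def sigmoid(x):
    """Sigmoid activation; inputs are assumed scaled so exp() does not overflow."""
    return 1.0 / (1.0 + math.exp(-x))


def ann_predict(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: n inputs -> sigmoid hidden layer -> one linear output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Placeholder weights for a network with 3 inputs and 2 hidden neurons.
past_rates = [0.2, 0.4, 0.6]
hidden_w = [[0.1, 0.2, 0.3], [-0.3, 0.2, 0.1]]
hidden_b = [0.0, 0.1]
print(ann_predict(past_rates, hidden_w, hidden_b, [0.7, 0.3], 0.05))
```

Compared with an ARMA prediction, each hidden neuron adds a full weighted sum over the inputs plus one exponential, which is where the extra per-prediction cost comes from.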

##### 3.3. Wavelet-Based Predictors

The prediction using wavelets involves three steps: wavelet decomposition, signal extension, and signal reconstruction [35]. Wavelet decomposition is performed using a wavelet prototype function called an *analyzing wavelet* or a *mother wavelet*. This step divides the signal into a low-pass output called *Approximation* and a high-pass output called *Detail*. The wavelet decomposition function can be applied recursively to the approximations to get further levels of approximations and details.

The final output can be thought of as a tree such that as we move down level by level, we see coarser and coarser versions of the signal. Figure 3 shows a three-level decomposition of a signal *x*. At each level, the decomposition involves convolution of the signal with low-pass and high-pass wavelet filters and downsampling the results to get the coefficients of approximations and details, respectively. At any level, the original signal is the sum of the approximation at that level plus details at all lower levels, i.e., for a three-level decomposition of signal *x*, $x = A_3 + D_3 + D_2 + D_1$. The smooth part of *x* is captured by the approximation $A_3$, and details are captured by $D_1$, $D_2$, and $D_3$. These signals are obtained by inverse wavelet transform of the corresponding approximation and detail coefficients. A model (e.g., AR) is fitted on the approximation and details, which are extended by predicting the next values using this model. Finally, the extended approximations and details are combined to get the predictions for the original time series. Overhead of wavelet-based prediction is also shown in Table 1. For each level of decomposition, we have to perform convolution twice, once to get the approximation and once to get the detail. Convolution is an operation. Inverse wavelet transform (additional convolution operations) is used to get the approximation and detail signals. We further have to perform AR on the approximation and detail signals, which adds overhead. Wavelet-based prediction is clearly a very expensive technique.
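The additive property of the decomposition can be checked numerically. The sketch below substitutes the Haar wavelet for simplicity, which is an assumption, since the experiments in this study use other mother wavelets. The function names are hypothetical, and the signal length is assumed to be divisible by 2 raised to the number of levels.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One decomposition level: low-pass (approximation) and high-pass (detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Undo one decomposition level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def decompose(x, levels):
    """Return final approximation coefficients and detail coefficients per level."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)          # details[0] is level 1
    return approx, details


def component(coeffs, level, is_detail):
    """Inverse-transform one coefficient set alone, with all others set to zero."""
    zeros = [0.0] * len(coeffs)
    sig = haar_inverse_step(zeros, coeffs) if is_detail else haar_inverse_step(coeffs, zeros)
    for _ in range(level - 1):
        sig = haar_inverse_step(sig, [0.0] * len(sig))
    return sig


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
cA3, (cD1, cD2, cD3) = decompose(x, 3)
A3 = component(cA3, 3, False)
D1, D2, D3 = component(cD1, 1, True), component(cD2, 2, True), component(cD3, 3, True)
rebuilt = [a + d3 + d2 + d1 for a, d3, d2, d1 in zip(A3, D3, D2, D1)]
print(rebuilt)
```

Even in this simplest case, every level requires two filtering passes for decomposition plus further passes for reconstruction, before any model is fitted to the components.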

#### 4. Experimental Methodology

We evaluate the performance of all techniques described above using real network traces. Figure 4 shows our prediction methodology. The original trace contains the arrival time of each packet. We calculate the traffic rate for each time interval *t* and predict the traffic rate for the next interval. The trace is divided into two parts. The initial 25% of the trace constitutes the training set, and the remainder is used to test the prediction accuracy. During the training phase, we identify the optimal model parameters such that the error between the predicted and actual values is minimized. During the evaluation phase, outputs are predicted using inputs (the test data) that were not used for training, and the error between predicted and actual values is calculated. The traces used in this study are described next.
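The preprocessing steps can be sketched as below. This is an illustration only: the function names are hypothetical, arrival times are assumed to be sorted, and the traffic rate is taken here as packets per interval, which is an assumption about the unit.

```python
def traffic_rate(arrival_times, interval):
    """Count packet arrivals in consecutive intervals of the given length."""
    if not arrival_times:
        return []
    start = arrival_times[0]
    n_bins = int((arrival_times[-1] - start) // interval) + 1
    counts = [0] * n_bins
    for t in arrival_times:
        counts[int((t - start) // interval)] += 1
    return counts


def split_trace(series, train_fraction=0.25):
    """Use the initial part of the series for training and the rest for testing."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Arrival times in milliseconds, binned into 100 ms intervals.
arrivals_ms = [0, 50, 120, 310, 350, 390]
rates = traffic_rate(arrivals_ms, interval=100)
train, test = split_trace(rates)
print(rates, train, test)
```

Keeping the split chronological, rather than random, matters here because the predictors are evaluated on their ability to forecast intervals they have not yet seen.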

##### 4.1. Traces

A large number of real network traces are used to study the performance of predictors. The details of the set of traces are described below.

###### 4.1.1. CAIDA Traces

This dataset contains anonymized passive traffic traces from CAIDA’s *equinix-chicago* and *equinix-sanjose* monitors on high-speed internet backbone traffic links [36]. Both monitors are connected to OC192 links. The traces are of one-hour long duration captured in the year 2011. 20 traces from the year 2011 are available on the website. We used 16 out of these 20. The other four traces had significant packet drops due to monitoring inaccuracies and are not included in this analysis.

###### 4.1.2. University of Auckland Traces

This set of traces, also known as *AUCK-II*, captures the traffic between the University of Auckland and its ISP [37]. All external connections from the university to the outside world pass through this measurement point. Most traces were targeted at 24-hour runs, but hardware failures have resulted in most traces being significantly shorter. All non-IP traffic has been discarded, and only TCP, UDP, and ICMP traffic is available in the traces. 20 traces are selected at random from all available traces.

###### 4.1.3. Bellcore Research Traces

These are four historical traces recorded in 1989 and used by almost all previous studies. Each trace contains a million packet arrivals seen on an Ethernet at the Bellcore Morristown Research and Engineering facility. Two of the traces are LAN traffic (with a small portion of transit WAN traffic), and two are WAN traffic [38].

##### 4.2. Predictability Metric

In this paper, we use normalized mean square error (NMSE) to compare the performance of predictors. This metric is widely used for evaluating prediction performance. It is the ratio of mean square error to the variance of the series:

$$\mathrm{NMSE} = \frac{1}{\sigma^{2}} \cdot \frac{1}{N} \sum_{t=1}^{N} \left(x_t - \hat{x}_t\right)^{2}, \tag{1}$$

where $x_t$ is the actual value of the time series at time $t$, $\hat{x}_t$ is the predicted value of $x_t$, and $N$ is the total number of predictions. $\sigma^{2}$ is the variance of the time series during prediction. This metric compares the performance of the predictor with a trivial predictor (one which always predicts the mean of the time series). In the case of this trivial predictor (mean predictor), NMSE = 1. If NMSE > 1, the predictor is worse than the trivial predictor. NMSE = 0 in the case of a perfect predictor.
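A direct computation of this metric, written as a minimal sketch with a hypothetical function name `nmse`, makes the two reference points (mean predictor and perfect predictor) easy to verify. The population variance of the test series is used as the normalizer, which is an assumption about the exact estimator.

```python
def nmse(actual, predicted):
    """Mean square error divided by the variance of the actual series."""
    n = len(actual)
    mean = sum(actual) / n
    variance = sum((x - mean) ** 2 for x in actual) / n
    mse = sum((x - p) ** 2 for x, p in zip(actual, predicted)) / n
    return mse / variance


actual = [1.0, 2.0, 3.0, 4.0]
mean_forecast = [2.5] * len(actual)      # trivial predictor: always the mean
print(nmse(actual, mean_forecast))       # reference value for the trivial predictor
print(nmse(actual, actual))              # perfect predictor
```

Because the error is normalized by the variance, NMSE values can be compared across traces with very different traffic volumes.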

#### 5. Results and Discussion

In this section, we study and characterize the traffic traces and present the results of the prediction techniques. Understanding the behavior of traffic is important, so we first study traffic behavior.

##### 5.1. Is Network Traffic Purely Random?

The first question that comes to mind when trying to predict any traffic is whether it has some pattern or is random. Of course, if traffic is purely random white noise, any predictor will do poorly on it. In fact, it can be statistically shown that a mean-value predictor is the best we can do in terms of minimizing mean square error for such a random series [33]. Two methods are generally used to evaluate the nonrandomness in data.

###### 5.1.1. Runs Test

This is a statistical test to evaluate the randomness in data. The null hypothesis for this test is that the data in a given series appear in a random order. A run is defined as a series of consecutive increasing values or decreasing values. The number of increasing or decreasing values is the length of the run. In a random dataset, the probability that the (*i* + 1)th value is larger or smaller than the *i*th value follows a binomial distribution, which forms the basis of the runs test [33]. We used a Matlab® function to test the randomness of the traffic traces. For all of the traces used in this study, the test rejected the null hypothesis at the 95% confidence level, i.e., it concluded that these traces are not random and should be predictable if modeled properly.
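For illustration, a runs up-and-down test can be written with the usual large-sample normal approximation, in which the number of runs in a random series of length *n* has mean (2*n* − 1)/3 and variance (16*n* − 29)/90. This sketch is not the Matlab routine used in this study; the function name is hypothetical, and dropping tied consecutive values is an assumption.

```python
import math


def runs_up_down_z(series):
    """Z statistic for the runs up-and-down test (normal approximation)."""
    # Sign of each successive difference; ties are dropped (assumption).
    signs = [1 if b > a else -1 for a, b in zip(series, series[1:]) if b != a]
    if len(signs) < 2:
        raise ValueError("need at least two non-zero successive differences")
    n = len(signs) + 1
    # A new run starts whenever the direction of change flips.
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return (runs - expected) / math.sqrt(variance)


z = runs_up_down_z(list(range(20)))   # steadily increasing series: a single run
print(z, abs(z) > 1.96)               # |z| > 1.96 rejects randomness at the 95% level
```

Too few runs (long monotone stretches) and too many runs (rapid alternation) both push the statistic away from zero, so the test is two-sided.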

###### 5.1.2. Autocorrelation Factor (ACF) Plots

Autocorrelation is similar to the correlation coefficient. However, instead of the correlation between two different variables, it is the correlation between two values of the same variable at times *t* and *t* + *h* [33]. ACF can be used for two purposes: (1) to detect nonrandomness in data and (2) to identify an appropriate time series model if the data are nonrandom. The randomness in data is ascertained by measuring ACF at different time lags. The lags represent the number of previous values (or previous intervals) in the time series. If data are random, ACF will be zero or near zero for any and all time lags. If data are nonrandom, then ACF has a significant value at one or more time lags. We studied the ACF of all the traffic traces and found that almost all the traces show nonrandom behavior.
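The sample autocorrelation at each lag can be computed as in the sketch below, together with the approximate 95% bound of ±1.96/√*N* that is commonly drawn on ACF plots. The function names are hypothetical, and the large-sample bound is an assumption about how the confidence lines are obtained.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + h] - mean) for t in range(n - h)) / denom
        for h in range(max_lag + 1)
    ]


def confidence_bound(n, z=1.96):
    """Approximate 95% bound for the ACF of a random series of length n."""
    return z / math.sqrt(n)


rates = [3, 4, 6, 7, 6, 4, 3, 2, 3, 5, 7, 8, 6, 5, 3, 2]
coeffs = acf(rates, max_lag=4)
bound = confidence_bound(len(rates))
for lag, r in enumerate(coeffs):
    print(lag, round(r, 3), abs(r) > bound)
```

Lags whose coefficient falls outside the bound indicate structure that a time series model can exploit.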

The ACF plot for a typical traffic trace is shown in Figure 5(b). The horizontal axis shows time lag (*h* = 0, 1, 2, 3, …) and the vertical axis shows correlation coefficient. The plot also has some reference horizontal lines. The middle line is at 0, and the other two dotted lines are 95% confidence lines.

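**Figure 5:** ACF plots for (a) a randomly generated trace, (b) a typical traffic trace, and (c) a weakly correlated traffic trace (au18).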

Figure 5(b) shows that the correlation at lag 1 is moderate (approximately 0.6) and then decreases gradually to zero. Such a pattern is the autocorrelation plot signature of “moderate autocorrelation,” which in turn provides moderate predictability if modeled properly. If the data were truly random, all the coefficients would be close to zero, inside the dotted lines. Figure 5(a) shows the ACF plot of a randomly generated trace. We see that all the lags except lag 0 (which is 1 by definition) are close to zero for this trace. Real network traces, however, show nonrandom behavior. Most of the traces we used have an ACF similar to Figure 5(b), but there were a few traces (4 out of 40) which showed very weak correlation. Figure 5(c) shows one such example. The ACF drops very low (0.2) at lag 1, and almost all of the lags are close to zero. This behavior is not truly random, and we still have some hope of predictability even in these traces.

##### 5.2. Accuracy of Predictors

This section presents an accuracy comparison of all the prediction techniques described in Section 3. For MA predictors, an order of 8 gave the best results for all traces, and the results for order 8 are presented. For both AR and ARMA predictors, the order of the AR and MA processes was varied from 0 to 20 and the best performing order was chosen. In most cases, AR(8) and ARMA(9, 8) gave the best results, and the results with these orders are presented. For the ANN-based predictor, we used one input layer, one output layer, and one hidden layer of neurons. The number of input neurons was fixed to eight, i.e., eight previous samples were used to predict the next value. The number of neurons in the middle layer was varied from 1 to 10, and the results for the best performing ANN (with 4 neurons in the middle layer) are presented in this section. For wavelet-based predictions, we used two-level decomposition and evaluated the predictor for different mother filters. For most of the experiments, db3 gave the best performance. The results with db3 (as the mother filter) are presented in this section.

Representative predictors from each category are compared. We tried many other predictors for which we do not present the results. For example, in the classic predictors category, we tried exponential smoothing (ES) and triple exponential smoothing (TES) in addition to DES. ES did not provide good accuracy, and TES increased the complexity without increasing the accuracy. So, we skipped these two predictors. We also tried ANN with different configurations, i.e., with different numbers of layers and different numbers of neurons in each layer. The results were consistent with previous research. The configuration of the ANN predictor which was found suitable by previous research is presented in this study. The same is true of wavelet-based prediction and the choice of orders for the classic predictors.

###### 5.2.1. CAIDA Traces

We applied the testing methodology presented in Section 4 to the 16 CAIDA traces. The normalized mean square error is shown in Figure 6. The NMSE value for all traces is less than 1 for all the predictors, showing that these traces are generally very predictable. DES is a clear winner in all these traces. For most of the traces, DES gives an NMSE value less than 0.4, which means that DES is able to explain more than 60% of traffic variation. The ARMA predictor also performs very well for these traces. Wavelet-based prediction does not provide good prediction quality. This result agrees with findings in [27] and contradicts the results in [31]. This unsatisfactory performance of wavelet-based prediction may be due to the effect of boundary conditions when applying wavelet transform to a finite length time series [27]. We did not study the boundary conditions in this paper, and interested readers can refer to [27] for a detailed study on effects of boundary conditions.

###### 5.2.2. University of Auckland Traces

Figure 7 shows the performance of the predictors for traces from the University of Auckland. Most of the predictors have NMSE less than 1, which indicates that traffic in these traces is also predictable. The last-value predictor performs very badly. Its NMSE value is greater than 1 for many traces. The best prediction performance for these traces is shown by the ARMA predictor. This result agrees with the previous research [31, 39], which has argued that the ARMA predictor is suitable for the traffic since it is capable of capturing both short- and long-range dependence. ANN also performs well for these traces. However, ANN has significantly more complexity than ARMA, as we have seen in Section 3. It consumes more resources without providing any additional benefits. The third best predictor for these traces is DES. DES is the cheapest among the predictors (except LV). So, in situations where the cost of the predictor matters, DES seems to provide a very good balance between cost and accuracy. Also, note that, for traces au5, au6, and au18, none of the predictors show reasonable performance. In fact, these three are the traces which showed limited predictability when we studied the ACF characteristics. The plot in Figure 5(c) is for the au18 trace. au5 and au6 also show similar ACF plots.

###### 5.2.3. BC Traces

Figure 8 shows how the predictors perform for BC traces. For all these traces, ARMA, DES, and ANN all perform very well. For the fourth trace, the NMSE value is very close to zero for all the predictors. Trace bc4 captures only external traffic and contains long periods of inactivity. So, most of the predictors show very good behavior for this trace.

##### 5.3. Effect of Increasing Prediction Interval on Accuracy

We have thus far presented results with a prediction interval of 100 ms, i.e., the traffic rate is observed for 100 ms intervals, and a prediction is made for the next 100 ms. Figure 9 shows the effect of increasing the prediction interval on the performance of the predictors for the au7 trace. Increasing the interval size works as a low-pass filter and filters out high-frequency noise. Short-term traffic variations are smoothed out, and thus, the performance of all the predictors improves as the prediction interval increases. All other traces also exhibit the same behavior. A previous study [31] reported that the predictability graphs show concave behavior, i.e., there is a sweet spot for traffic prediction. We did not observe any such behavior, and all the traces consistently exhibit the behavior of Figure 9. Figure 9 shows that NMSE for ANN increases slightly when we increase the prediction interval from 2 to 5 seconds. This may be due to the fact that we are using only 25% of the trace for training. Increasing the prediction interval decreases the number of data points in the training portion, which might affect accuracy. Researchers have provided a detailed study of the effect of changing the size of the training data on prediction results [21].
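Moving to a longer prediction interval amounts to re-binning the series. The sketch below, with a hypothetical function name `aggregate`, sums consecutive fine-grained intervals into coarser ones; dropping an incomplete trailing group is an assumption.

```python
def aggregate(series, factor):
    """Merge every `factor` consecutive intervals into one coarser interval."""
    usable = len(series) - len(series) % factor   # drop an incomplete final group
    return [sum(series[i:i + factor]) for i in range(0, usable, factor)]


# Ten 100 ms counts merged into 500 ms intervals (factor 5).
fine = [12, 3, 9, 14, 2, 10, 4, 11, 13, 2]
coarse = aggregate(fine, factor=5)
print(coarse)
```

The coarse series has fewer and smoother points, which illustrates both effects noted above: short-term variation is averaged out, and the training portion shrinks by the same factor.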

##### 5.4. Power and Performance Overhead of Traffic Predictors

One of our main objectives in this paper is to find a predictor which accurately predicts traffic while consuming the minimum amount of power. Table 1 shows the computations and data storage requirements of these predictors. In this study, we focus only on power and performance overhead during the prediction phase. A predictor needs to be trained only once and that overhead can be ignored. In other situations, where traffic behavior changes over time, we may need to retrain the predictors. But, this training is required very rarely as previous research has shown that traffic behavior remains steady over time [31, 39].

We implemented these predictors in software and measured the performance and energy overhead of these software predictors on a simple 2-issue processor. Table 2 shows the specification of the processor.

We used a one-hour long trace and measured the performance and power using the GEMS full-system simulator. Table 3 shows instructions per prediction and energy per prediction for each type of predictor when the predictors are implemented in software. We see that ANN-based and wavelet-based predictors require considerably more instructions than other predictors. The energy is measured by augmenting GEMS [40] with the Wattch power simulator [41]. The power numbers are presented using the most aggressive clock gating “cc3” provided by Wattch and are for 90 nm technology. We measured the total energy consumed for the execution of the traffic trace and divided the total energy by the number of predictions to get the energy per prediction. It is interesting to note that DES has the minimum number of instructions per prediction and, consequently, the minimum energy per prediction. We have seen from the accuracy results in Section 5.2 that ANN and ARMA also give very good results for most of the traces. However, when comparing energy consumption, we can see that DES is the lowest energy-consuming predictor. It is also comparable in accuracy to high-cost predictors like ANN, which makes it very useful for applications like one-step-ahead traffic prediction for power management. Energy consumption by the ARMA predictor is also fairly low compared to ANN and wavelet. Although ANN performs well in most situations, the power and performance cost associated with it make it suitable only for offline applications like network design and capacity planning.

##### 5.5. Combining Accuracy and Energy Metrics

The accuracy and energy consumption results for all the studied predictors are presented in Sections 5.2 and 5.4, respectively. Now, we combine the accuracy and energy metrics into a single function which we call Energy_Error Score or EE-Score. This metric is defined based on a general technique of combining multiple metrics into a global measure described in [42]. This combined global metric allows us to compare all the predictors using a single number and can help us declare the “winner.” As accuracy (described in terms of NMSE) and energy consumption (μJ per prediction) come from two different distributions and have different scales, we first standardize these metrics by calculating their standard score (*z*-score). Now, the Energy_Error Score is

$$\text{EE-Score} = w_E \, z_{\text{energy}} + w_A \, z_{\text{NMSE}}, \tag{2}$$

where $z_{\text{energy}}$ and $z_{\text{NMSE}}$ are the standardized energy and accuracy metrics, $w_E$ is the relative weight of the energy metric in the EE-Score, and $w_A$ is the relative weight of the accuracy metric in the EE-Score, such that $w_E + w_A = 1$.

For example, $w_E = w_A = 0.5$ gives equal weight to energy and error. Relative weights can be set based on the particular application and scenario. For our application, we give equal weight to accuracy and energy consumption. As we are trying to minimize both energy and error, the predictor with the lowest value of EE-Score will be the winner.
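The computation can be sketched as a weighted sum of *z*-scores with weights that add up to one. The function names and the predictor labels `P1` to `P3` are hypothetical, the numbers are made-up placeholders rather than measurements from this study, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of values to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def ee_scores(energy, error, w_energy=0.5):
    """Combine standardized energy and error into one score per predictor."""
    w_error = 1.0 - w_energy            # the two weights sum to one
    names = list(energy)
    z_energy = z_scores([energy[k] for k in names])
    z_error = z_scores([error[k] for k in names])
    return {k: w_energy * e + w_error * a
            for k, e, a in zip(names, z_energy, z_error)}


# Placeholder inputs (not results from this study).
energy_uj = {"P1": 1.0, "P2": 2.0, "P3": 3.0}
avg_nmse = {"P1": 0.3, "P2": 0.5, "P3": 0.4}
scores = ee_scores(energy_uj, avg_nmse)
print(min(scores, key=scores.get), scores)
```

Since both inputs are standardized, the scores are centered on zero, and changing `w_energy` shifts the ranking toward energy-critical or accuracy-critical applications.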

Figure 10 shows the EE-Score for all the techniques studied. This number is calculated based on average NMSE and average energy consumption per prediction across all the traces. These numbers are normalized and combined using equation (2) to get the EE-Score. Note that an EE-Score of 0 corresponds to the performance of the average predictor. A negative number means the predictor’s EE-Score is less than the mean, and a positive score means it is greater than the mean. Our goal is to minimize the EE-Score so that both error and energy consumption metrics are minimized. From the figure, we see that the DES (double exponential smoothing) predictor has the lowest EE-Score, and hence, it can be declared the “best one” if we give equal weights to accuracy and energy consumption. Note that LV performs poorly because it has the lowest accuracy, and wavelet performs badly because it has the highest cost without giving any benefit in accuracy.

#### 6. Conclusions

We have provided a performance and power comparison of three different classes of predictors using a large number of real network traces. Our results indicate that network traffic is generally predictable. Furthermore, the choice of the predictor is dependent on the characteristics of the network. We found different predictors suitable for traces from different sources. The same predictor performs consistently well for all the traces from the same source. Also, in power critical online applications, DES and ARMA show promising accuracy with minimal energy overhead. The ANN-based predictor performed consistently well but has high power and computation overhead. We have proposed a new metric to combine accuracy and power consumption into a single number. Based on this metric, DES emerged as the predictor of choice when accuracy and energy consumption are viewed collectively.

#### Data Availability

The network traces used in this study are taken from three different sources. University of Auckland traces can be found at http://wand.net.nz/wits/auck/2. Bellcore traces are available at http://ita.ee.lbl.gov/html/contrib/BC.html. The network traces from CAIDA were supplied under license. Request to access these data should be made to CAIDA directly (http://www.caida.org).

#### Disclosure

Some preliminary results from this work appeared as a poster presentation in the 2012 IEEE International Symposium on Performance Analysis of Systems & Software.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.