Research Article  Open Access
Normality of Ethernet Traffic at Large Time Scales
Abstract
We give a quantitative description of the time scales at which Ethernet traffic becomes Gaussian. We focus on the normality of the accumulated traffic data under different time scales. The investigation is carried out graphically with quantile-quantile (QQ) plots and numerically with statistical tests. The results indicate that the larger the time scale, the closer the Ethernet traffic is to normal.
1. Introduction
Experimental research on internet traffic (traffic for short), including Ethernet traffic, shows that fractional Gaussian noise (fGn) may serve as a model in the unifractal sense; see, for example, [1–3]. This implies that traffic is Gaussian [4]. However, non-Gaussian models, such as stable processes, have also been reported; see, for example, [5–8]. Therefore, the normality of traffic is an issue worth investigating.
Research described in [9, 10] revealed a scaling phenomenon of traffic. Taking the scales of traffic into account, whether a traffic trace is Gaussian or not depends on the time scale. Paxson and Floyd [10] and Feldmann et al. [9] claimed that traffic is Gaussian at time scales larger than one second. That property was further confirmed qualitatively by [11]. Note that the real-traffic data used in [9, 10] were recorded in the 1980s and 1990s and are publicly accessible [12]. Thus, one second, as the critical time point, corresponds to the data in [12] and to the internet infrastructure of that time.
Though research shows that the statistics of traffic have remained the same from the internet of the last century to that of recent years [13], the value of the critical time point, say one second, may have become vague with the development of high-speed networking. Therefore, when using the same data as those used in [1, 3, 9, 10], we measure an interval by its packet count, that is, the number of packets within the interval, and consider the number of bytes of the packets within that interval.
Let $x(t_i)$ be a sample record of a traffic time series, where $t_i$ ($i = 1, 2, \ldots$) is the series of time stamps, $t_i$ being the time stamp of the $i$th packet. The series $x(t_i)$ therefore represents the size of the $i$th packet at time $t_i$. In this research, instead of using $x(t_i)$, we use $x(i)$ to represent the size of the $i$th packet. On an interval-by-interval basis, the accumulated traffic, denoted by $y(n)$, is given by
$$y(n) = \sum_{i=(n-1)T+1}^{nT} x(i), \quad n = 1, 2, \ldots, \tag{1}$$
where $T$ is the interval width, which plays the role of the time scale. Thus, $y(n)$ stands for the accumulated bytes of arriving traffic in the $n$th interval. The statistics of $y(n)$ may differ considerably when $T$ is small (small time scale) or large (large time scale) [1, 9, 10].
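The interval-by-interval accumulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interval width is counted in packets, intervals do not overlap, and a trailing incomplete interval is dropped. The function name `accumulate_traffic` and the packet sizes are hypothetical and are not taken from the traces.

```python
from typing import List, Sequence


def accumulate_traffic(packet_sizes: Sequence[int], width: int) -> List[int]:
    """Sum packet sizes over consecutive, non-overlapping blocks of `width` packets.

    Assumption: the interval width is a packet count, and a trailing block
    with fewer than `width` packets is discarded.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    n_blocks = len(packet_sizes) // width
    return [
        sum(packet_sizes[k * width:(k + 1) * width])
        for k in range(n_blocks)
    ]


# Hypothetical packet sizes in bytes, for illustration only.
sizes = [64, 1518, 64, 576, 1518, 64, 128]
print(accumulate_traffic(sizes, 2))  # [1582, 640, 1582]
print(accumulate_traffic(sizes, 3))  # [1646, 2158]
```

Increasing `width` shortens the accumulated series, which is why the sample-size limits of the normality tests matter at the largest time scales.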
This research utilizes four real-traffic traces, listed in Table 1, which were measured on an Ethernet at the Bellcore Morristown Research and Engineering facility in 1989 [12]. The statistical properties originally described in the early literature, for example, [1, 3], turn out to be ubiquitous in today's traffic, according to the research in [13]. Thus, the traffic trace BCAug89, which was measured in 1989, keeps its value for describing traffic patterns today.

Figure 1 illustrates four series of the real-traffic trace BCAug89. Note that the statistics of the packet-indexed series $x(i)$ are consistent with those of the time-stamped series $x(t_i)$, but with $x(i)$ the time scale is represented by the interval width $T$ in (1), which is independent of the networking speed. Figure 2 shows the accumulated traffic of BCAug89 for several interval widths $T$.
The paper aims to determine quantitatively the minimum interval width at which the accumulated Ethernet traffic traces become Gaussian, based on the accumulated bytes of the packets within an interval.
The remainder of this paper is organized as follows. Section 2 briefly introduces the commonly used normality tests and the idea of the QQ plot. The graphical and numerical results are presented in Section 3 and discussed in Section 4. Section 5 concludes the paper.
2. Statistical Investigation for Accumulated Traffic
In this section, we discuss the normality tests for the following null and alternative hypotheses: $H_0$: the data are sampled from a normal distribution; $H_1$: the data are not sampled from a normal distribution.
Many statistical tests have been proposed to find out whether a sample is drawn from a normal distribution or not [14], including the Shapiro-Wilk test, D’Agostino’s test, the Jarque-Bera test, the Anderson-Darling test, the Cramér-von Mises criterion, the Lilliefors test, Pearson’s chi-squared test, and the Shapiro-Francia test.
The absence of exact solutions for the sampling distributions has generated a large number of simulation studies exploring the power of these statistics. Convincing evidence from these studies is that the convergence of the sampling distributions to their asymptotic forms is very slow. The paper [15], which compares the Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors, and Anderson-Darling tests, concludes that the Shapiro-Wilk test has the best power for a given significance level, followed closely by the Anderson-Darling test. On the other hand, some publications recommend the Jarque-Bera test [16, 17], although it is not without weakness: it has low power for distributions with short tails. Therefore, we mainly consider the three normality tests listed below.
2.1. Shapiro-Wilk Test
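The Shapiro-Wilk test tests the null hypothesis that a sample $x_1, \ldots, x_n$ came from a normally distributed population [18]. The test statistic is
$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$
where $x_{(i)}$ is the $i$th order statistic; $\bar{x}$ is the sample mean; and the coefficients $a_i$ are given by
$$(a_1, \ldots, a_n) = \frac{m^{\top} V^{-1}}{\left(m^{\top} V^{-1} V^{-1} m\right)^{1/2}},$$
where $m = (m_1, \ldots, m_n)^{\top}$; $m_1, \ldots, m_n$ are the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and $V$ is the covariance matrix of those order statistics. It is worth mentioning that the Shapiro-Wilk test is restricted in sample size; the R function shapiro.test, for instance, accepts between 3 and 5000 observations.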
2.2. Anderson-Darling Test
The Anderson-Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution [19, 20]. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values are distribution-free. When applied to testing whether a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting most departures from normality [21, 22]. The sample size, however, must not be too small; the function ad.test of the R package nortest, for instance, requires more than 7 observations.
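To make the Anderson-Darling statistic concrete, the Python sketch below computes $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(y_{(i)}) + \ln\left(1 - F(y_{(n+1-i)})\right)\right]$ for the normal case, with the mean and standard deviation estimated from the sample. It is an illustration only, not the nortest routine used in this work: the function name `anderson_darling_normal` and the clipping bounds are assumptions, and no p-value is computed.

```python
import math
from statistics import NormalDist, fmean, stdev
from typing import Sequence


def anderson_darling_normal(sample: Sequence[float]) -> float:
    """Return the A^2 statistic for normality, with estimated mean and std."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mu = fmean(sample)
    sigma = stdev(sample)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        raise ValueError("sample has zero variance")
    std_normal = NormalDist()
    # Normal CDF evaluated at the standardized order statistics.
    cdf = [std_normal.cdf((x - mu) / sigma) for x in sorted(sample)]
    # Assumption: clip the CDF away from 0 and 1 so that log() stays finite.
    cdf = [min(max(p, 1e-12), 1.0 - 1e-12) for p in cdf]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)  # 0-based i, so the weight 2i + 1 equals 2(i + 1) - 1
    )
    return -n - total / n
```

A small value of the statistic indicates close agreement with the fitted normal distribution, while large values signal a departure, most strongly when it occurs in the tails.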
2.3. Jarque-Bera Test
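The Jarque-Bera test is a goodness-of-fit test of whether sample data have the skewness and kurtosis matching a normal distribution [23, 24]. The test statistic JB is defined as
$$JB = \frac{n}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right),$$
where
$$S = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}}, \qquad K = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{2}}.$$
Here $n$ is the sample size, $S$ is the sample skewness, and $K$ is the sample kurtosis. If the data come from a normal distribution, the JB statistic asymptotically has a chi-squared distribution with two degrees of freedom, so the statistic can be used to test the hypothesis that the data are from a normal distribution. For small samples, the chi-squared approximation is overly sensitive, often rejecting the null hypothesis when it is in fact true. Thus, the JB test should be applied only to large samples, according to finite-sample studies.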
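As an illustration of the statistic described above, the following Python sketch computes JB from the sample skewness and kurtosis together with its asymptotic p-value. It relies on the fact that the chi-squared distribution with two degrees of freedom has survival function $\exp(-x/2)$. It is not the fBasics routine used in this work, and the function name `jarque_bera` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def jarque_bera(sample: Sequence[float]) -> Tuple[float, float]:
    """Return (JB statistic, asymptotic p-value) for a sample."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(sample) / n
    # Central moments with the 1/n normalization.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of the chi-squared distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value
```

Because the p-value comes from the asymptotic distribution, it is only trustworthy for long accumulated series, in line with the caveat about small samples.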
Besides statistical tests, there is another informal but powerful tool for assessing the normality of a series, namely, the normal probability plot. This graphical tool is often called the quantile-quantile plot (QQ plot) of the standardized data against the standard normal distribution. The correlation between the sample quantiles and the normal quantiles measures how well the data are modeled by a normal distribution. For normal data, the points in the QQ plot should fall approximately on a straight line, indicating a high positive correlation.
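The QQ-plot idea can be sketched numerically as follows. The Python code below pairs the standardized order statistics with standard normal quantiles and reports their Pearson correlation. The plotting positions $(i - 0.5)/n$ are one common convention and are an assumption here, as are the function names `qq_points` and `qq_correlation`.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def qq_points(sample: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair theoretical normal quantiles with standardized sample quantiles."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    if sd == 0:
        raise ValueError("sample has zero variance")
    ordered = sorted((x - mean) / sd for x in sample)
    std_normal = NormalDist()
    # Assumption: plotting positions (i - 0.5) / n for i = 1, ..., n.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def qq_correlation(sample: Sequence[float]) -> float:
    """Pearson correlation between theoretical and empirical quantiles."""
    points = qq_points(sample)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A correlation close to 1 corresponds to points lying near a straight line in the QQ plot; skewed or heavy-tailed data bend away from the line and lower the correlation.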
3. Graphical and Statistical Results
In this section, we present the graphical and numerical results for all four Ethernet traffic series, that is, the pAug.TL, pOct.TL, OctExt.TL, and OctExt4.TL data. Figures 3, 4, 5, and 6 show the QQ plots of the four accumulated traffic series under different time scales.
To obtain a more complete and more objective inference about the normality of the series, we apply three popular normality tests, that is, the Shapiro-Wilk test, the Anderson-Darling test, and the Jarque-Bera test. Using the software R, we mainly rely on functions of the packages “fBasics” and “nortest” to carry out the statistical tests. The results of each test under the different time scales are presented in Tables 2, 3, 4, and 5. In particular, since the Anderson-Darling test requires a minimum sample size, no result is reported for the time scale at which the accumulated series falls below that size.
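The procedure of testing the accumulated series at several time scales can be sketched as below. This Python illustration is not the R workflow used in this work: it accumulates a synthetic series at each interval width, applies only the asymptotic Jarque-Bera p-value, and records no result when the accumulated series is too short. The packet sizes, the list of widths, the cutoff `min_samples = 8`, and all function names are assumptions made for the example.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def accumulate(sizes: Sequence[float], width: int) -> List[float]:
    """Sum sizes over consecutive, non-overlapping blocks of `width` packets."""
    blocks = len(sizes) // width
    return [sum(sizes[k * width:(k + 1) * width]) for k in range(blocks)]


def jb_p_value(sample: Sequence[float]) -> float:
    """Asymptotic Jarque-Bera p-value (chi-squared, 2 degrees of freedom)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return math.exp(-jb / 2.0)


def scan_time_scales(
    sizes: Sequence[float], widths: Sequence[int], min_samples: int = 8
) -> Dict[int, Optional[float]]:
    """Map each interval width to a p-value, or None if the series is too short."""
    results: Dict[int, Optional[float]] = {}
    for width in widths:
        series = accumulate(sizes, width)
        if len(series) < min_samples:
            results[width] = None  # too few samples for a test result
        else:
            results[width] = jb_p_value(series)
    return results


# Synthetic, hypothetical packet sizes in bytes (not the Bellcore traces).
rng = random.Random(0)
packets = [rng.choice([64, 64, 64, 576, 1518]) for _ in range(20000)]
for width, p in scan_time_scales(packets, [1, 10, 100, 1000, 5000]).items():
    print(width, "too few samples" if p is None else f"p = {p:.4f}")
```

The synthetic packets are independent draws, so this example only illustrates the mechanics; real traces are long-range dependent and may need much larger interval widths before normality is no longer rejected.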




4. Discussions
Graphically, Figures 3, 4, 5, and 6 lead to the following findings.
(i) Among the four series, pAug.TL requires the smallest time scale to be Gaussian.
(ii) The pAug.TL and pOct.TL data seem more likely to be normal than the other two series at each corresponding time scale.
(iii) The OctExt.TL and OctExt4.TL series exhibit similar normality behavior. However, only at quite large time scales do the theoretical normal quantiles and the empirical quantiles show a high positive correlation.
(iv) The OctExt4.TL series seems to be even stricter on the time scale: it requires a larger minimum time scale than the other series to be Gaussian.
Numerically, as could be expected, the test results given in Tables 2, 3, 4, and 5 provide evidence that the larger the time scale, the more normal the accumulated traffic series. Specifically,
(i) the normality behavior of the pAug.TL data “surpasses” that of the others according to the test results; that is, at the given significance level, the null hypothesis of normality cannot be rejected once the time scale exceeds a threshold;
(ii) the pOct.TL and OctExt.TL series show comparable normality behavior, and need the time scale to reach a minimum value for the null hypothesis not to be rejected at the given significance level;
(iii) for the OctExt4.TL series, the null hypothesis is not rejected at the given significance level only when the time scale exceeds its own threshold.
The previous discussions are for the Ethernet traffic, but the methods may also be a reference for other types of time series, such as those in [25–28].
5. Conclusions
We have discussed the normality of the Ethernet traffic data under different time scales using several normality tests (the Shapiro-Wilk, Anderson-Darling, and Jarque-Bera tests). The graphical results from the QQ plots are consistent with the numerical results, and together they provide quantitative evidence of the large time scales at which the investigated Ethernet traffic traces become normal.
Acknowledgments
This work was in part supported by the 973 plan under the project grant number 2011CB302800, the National Natural Science Foundation of China under the project grant numbers 11101158, 61272402, 61070214, 60873264, and “the Fundamental Research Funds for the Central Universities”. We appreciate W. Willinger, W. Leland, and D. Wilson with Bellcore, Morristown, who provided us with their data in this research.
References
[1] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, “On the self-similar nature of Ethernet traffic (extended version),” IEEE/ACM Transactions on Networking, vol. 2, no. 1, pp. 1–15, 1994.
[2] D. McDysan, QoS & Traffic Management in IP & ATM Networks, McGraw-Hill, New York, NY, USA, 2000.
[3] P. Abry and D. Veitch, “Wavelet analysis of long-range dependent traffic,” IEEE Transactions on Information Theory, vol. 44, no. 1, pp. 2–15, 1998.
[4] W. Willinger and V. Paxson, “Where mathematics meets the internet,” Notices of the American Mathematical Society, vol. 45, no. 8, pp. 961–970, 1998.
[5] Ph. Barbe and W. P. McCormick, “Heavy-traffic approximations for fractionally integrated random walks in the domain of attraction of a non-Gaussian stable distribution,” Stochastic Processes and Their Applications, vol. 122, no. 4, pp. 1276–1303, 2012.
[6] R. G. Garroppo, S. Giordano, M. Pagano, and G. Procissi, “Testing α-stable processes in capturing the queuing behavior of broadband teletraffic,” Signal Processing, vol. 82, no. 12, pp. 1861–1872, 2002.
[7] A. Karasaridis and D. Hatzinakos, “Network heavy traffic modeling using α-stable self-similar processes,” IEEE Transactions on Communications, vol. 49, no. 7, pp. 1203–1214, 2001.
[8] G. Terdik and T. Gyires, “Lévy flights and fractal modeling of internet traffic,” IEEE/ACM Transactions on Networking, vol. 17, no. 1, pp. 120–129, 2009.
[9] A. Feldmann, A. C. Gilbert, W. Willinger, and T. G. Kurtz, “The changing nature of network traffic: scaling phenomena,” ACM SIGCOMM Computer Communication Review, vol. 28, no. 2, pp. 5–29, 1998.
[10] V. Paxson and S. Floyd, “Wide area traffic: the failure of Poisson modeling,” IEEE/ACM Transactions on Networking, vol. 3, no. 3, pp. 226–244, 1995.
[11] A. Scherrer, N. Larrieu, P. Owezarski, P. Borgnat, and P. Abry, “Non-Gaussian and long memory statistical characterizations for Internet traffic with anomalies,” IEEE Transactions on Dependable and Secure Computing, vol. 4, no. 1, pp. 56–70, 2007.
[12] http://www.sigcomm.org/ITA/.
[13] P. Borgnat, G. Dewaele, K. Fukuda, P. Abry, and K. Cho, “Seven years and one day: sketching the evolution of internet traffic,” in Proceedings of the 28th Conference on Computer Communications (INFOCOM '09), pp. 711–719, Rio de Janeiro, Brazil, April 2009.
[14] H. C. Thode, Jr., Testing for Normality, vol. 164 of Statistics: Textbooks and Monographs, Marcel Dekker, New York, NY, USA, 2002.
[15] N. Razali and Y. B. Wah, “Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests,” Journal of Statistical Modeling and Analytics, vol. 2, no. 1, pp. 21–33, 2011.
[16] D. N. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 4th edition, 2002.
[17] G. G. Judge, R. C. Hill, W. E. Griffiths, H. Lütkepohl, and T. C. Lee, Introduction to the Theory and Practice of Econometrics, John Wiley & Sons, New York, NY, USA, 2nd edition, 1988.
[18] S. S. Shapiro and M. B. Wilk, “An analysis of variance test for normality: complete samples,” Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.
[19] T. W. Anderson and D. A. Darling, “Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes,” Annals of Mathematical Statistics, vol. 23, pp. 193–212, 1952.
[20] T. W. Anderson and D. A. Darling, “A test of goodness of fit,” Journal of the American Statistical Association, vol. 49, pp. 765–769, 1954.
[21] M. A. Stephens, “EDF statistics for goodness of fit and some comparisons,” Journal of the American Statistical Association, vol. 69, pp. 730–737, 1974.
[22] M. A. Stephens, “Tests based on EDF statistics,” in Goodness-of-Fit Techniques, R. B. D’Agostino and M. A. Stephens, Eds., pp. 97–193, Marcel Dekker, New York, NY, USA, 1986.
[23] C. M. Jarque and A. K. Bera, “Efficient tests for normality, homoscedasticity and serial independence of regression residuals,” Economics Letters, vol. 6, no. 3, pp. 255–259, 1980.
[24] C. M. Jarque and A. K. Bera, “Efficient tests for normality, homoscedasticity and serial independence of regression residuals: Monte Carlo evidence,” Economics Letters, vol. 7, no. 4, pp. 313–318, 1981.
[25] C. Cattani, G. Pierro, and G. Altieri, “Entropy and multifractality for the myeloma multiple TET 2 gene,” Mathematical Problems in Engineering, vol. 2012, Article ID 193761, 14 pages, 2012.
[26] C. Cattani, “On the existence of wavelet symmetries in archaea DNA,” Computational and Mathematical Methods in Medicine, vol. 2012, Article ID 673934, 21 pages, 2012.
[27] C. Toma, “Advanced signal processing and command synthesis for memory-limited complex systems,” Mathematical Problems in Engineering, vol. 2012, Article ID 927821, 13 pages, 2012.
[28] E. G. Bakhoum and C. Toma, “Specific mathematical aspects of dynamics generated by coherence functions,” Mathematical Problems in Engineering, vol. 2011, Article ID 436198, 10 pages, 2011.
Copyright
Copyright © 2013 Zhiping Lu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.