Abstract

We investigate the stationarity property of the accumulated Ethernet traffic series. We apply several widely used stationarity and unit root tests, namely the Dickey-Fuller test and its augmented version, the Phillips-Perron test, and the Kwiatkowski-Phillips-Schmidt-Shin test and some of its generalizations, to assess the stationarity of the traffic traces at different time scales. The quantitative results provide evidence that, as the time scale increases, the accumulated traffic series are more likely to be stationary.

1. Introduction

Stationarity testing is essential in time series analysis. According to the meaning of nonstationarity discussed in [1, 2], investigating time-varying spectra turns out to be a natural way of testing the nonstationarity of time series [3–7]. However, one may encounter difficulties in nonstationarity testing of long-range dependent (LRD) network traffic (traffic for short) because the LRD property implies $1/f$ noise, whose spectrum diverges at zero frequency [8]. In addition to time-frequency distributions, there are many other approaches in this regard, such as bootstrap testing [9] and others [10–12]. However, conventional approaches may not be suitable for testing the nonstationarity of LRD traffic, as can be seen from Abry and Veitch [13] and Grossglauser and Bolot [14]. On that difficulty, one may also refer to a 1976 paper by Mandelbrot [15]. Abry and Veitch [13] proposed a test method for LRD traffic that investigates the time invariance of the Hurst parameter, $H$. However, in engineering terms, a time-varying $H$ may not always imply nonstationarity but rather multifractality in general; see [16, 17] for details. Therefore, further research on the nonstationarity of LRD traffic is desired.

The paper aims at providing the following contributions.
(i) Different scales are taken into account for investigating the stationarity of the accumulated traffic data through unit root tests.
(ii) The quantitative description of the large scales for the Ethernet traffic to be stationary is given.

The remainder of this paper is organized as follows. In Section 2, the datasets are described. In Section 3, several widely used unit root tests and stationarity tests are presented. In Section 4, the numerical results of the statistical tests are reported and discussed. Finally, Section 5 presents some concluding remarks.

2. Datasets

This research utilizes four real Ethernet traffic traces, listed in Table 1, which were measured on an Ethernet at the Bellcore Morristown Research and Engineering facility [18] in 1989. See [19] for more details about the datasets.

Denote by $x(t_i)$ a traffic time series, where $t_i$ ($i = 1, 2, \ldots$) is the time-stamp series, which indicates the time stamp of the $i$th packet. Note that $x(t_i)$ represents the packet size of the $i$th packet at time $t_i$. This research uses $x(i)$ to represent the packet size of the $i$th packet on a packet-by-packet basis. Further, on an interval-by-interval basis, we consider the accumulated traffic, denoted by $x_T(n)$. It is given by

$$x_T(n) = \sum_{t_i \in [nT, (n+1)T)} x(t_i),$$

where $T$ is the interval width. It is in fact the accumulation scale. Without causing confusion, we still call it the time scale (or scale for short) throughout the paper. Thus, $x_T(n)$ stands for the accumulated bytes of arrival traffic in the $n$th interval with the time scale $T$. We shall exhibit an interesting phenomenon: the stationarity of $x_T(n)$ of the Ethernet traffic traces depends considerably on the time scale $T$. Qualitatively speaking, $x_T(n)$ is nonstationary if $T$ is small, while it is stationary when $T$ is large. The key point of this research is to give a quantitative description of the $T$ at which $x_T(n)$ changes from the nonstationary case to the stationary one.
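The accumulation step described above can be sketched in a few lines. This is only a minimal illustration, not the procedure used to produce the results in this paper: the function name `accumulate_traffic`, the choice of measuring time from the first packet, and the half-open interval convention are all assumptions made for the example, and the packet values are made up.

```python
def accumulate_traffic(timestamps, sizes, scale):
    """Sum packet sizes (bytes) per interval of width `scale`.

    Assumptions (not taken from the paper): timestamps are sorted in
    ascending order, time is measured from the first packet, and the
    n-th interval is the half-open range [n*scale, (n+1)*scale).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if len(timestamps) != len(sizes):
        raise ValueError("timestamps and sizes must have equal length")
    if not timestamps:
        return []
    start = timestamps[0]
    n_bins = int((timestamps[-1] - start) // scale) + 1
    series = [0] * n_bins  # intervals without packets keep the value 0
    for t, size in zip(timestamps, sizes):
        series[int((t - start) // scale)] += size
    return series


# Hypothetical packets: arrival times in seconds and sizes in bytes.
times = [0.0, 0.4, 1.2, 1.9, 3.5]
sizes = [64, 1518, 64, 128, 512]
fine = accumulate_traffic(times, sizes, 1.0)
coarse = accumulate_traffic(times, sizes, 2.0)
```

Increasing `scale` shortens the accumulated series and smooths it, which is why the choice of time scale can change the outcome of a stationarity test.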

3. Statistical Tools

Computer scientists are concerned with the stationarity property of traffic; see, for example, [13]. Nevertheless, a consensus on that property may not be reached because different attributes of traffic can be considered. For instance, our research investigates the stationarity property of traffic at different time scales using the accumulation scale as the attribute, which, to the best of our knowledge, is rarely reported. For that reason, we explain our results in what follows.

On the one hand, Figures 1, 2, 3, and 4 show the trajectories of the accumulated traffic under different time scales. Visually, there seems to be some nonstationarity in all four series, especially under the large time scales. However, more objective methods are needed to verify this impression. On the other hand, Figures 5, 6, 7, and 8 present the spectra of the accumulated traffic under different time scales, which clearly indicate the existence of long-range dependence in the series.

It is important to understand the statistical properties of the data before fitting quantitative models to it. In statistics, one of the most investigated branches is that of unit root and stationarity testing. Suppose a discrete time stochastic process can be written as an autoregressive process. If 1 is a root of the characteristic equation, the stochastic process is said to have a unit root or, alternatively, to be integrated of order one, denoted $I(1)$, and it is non-stationary. If the other roots of the characteristic equation lie inside the unit circle (that is, have a modulus less than one), then the first difference of the process is stationary. This is the original idea of the unit root testing proposed in 1979, and many extensions and modifications have been developed since. In the following, we concentrate on the existing stationarity tests, as well as some of their variants and generalizations, to identify the existence of a unit root in the series.
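As a worked illustration of the unit root idea, consider the simplest case of a first-order autoregression. The symbols $y_t$, $a$, $m$, $\varepsilon_t$, and $\sigma^2$ are notation assumed here for the example only, and the starting value $y_0 = 0$ is likewise an assumption.

```latex
\begin{align*}
% AR(1) process with white-noise innovations of variance \sigma^2
y_t &= a\, y_{t-1} + \varepsilon_t \\
% characteristic equation and its single root
m - a &= 0 \quad\Longrightarrow\quad m = a \\
% unit root case a = 1 (random walk) with y_0 = 0: variance grows with t
y_t &= \sum_{s=1}^{t} \varepsilon_s, \qquad \operatorname{Var}(y_t) = t\,\sigma^2 \\
% the first difference is white noise, hence stationary
\Delta y_t &= y_t - y_{t-1} = \varepsilon_t
\end{align*}
```

For $|a| < 1$ the variance settles at $\sigma^2 / (1 - a^2)$, whereas for $a = 1$ it grows without bound, which is what makes the unit root case non-stationary.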

The standard Dickey-Fuller (DF) test [20] is based on i.i.d. errors and has the unit root as its null hypothesis. The DF test is valid if the time series is well characterized by a first-order autoregressive (AR(1)) process with white noise errors. Many time series, however, have a more complicated dynamic structure than is captured by a simple AR(1) model. Said and Dickey (1984) [21] augmented the basic autoregressive unit root test to accommodate autoregressive moving-average (ARMA($p, q$)) models with unknown orders; their test is referred to as the augmented Dickey-Fuller (ADF) test. The Phillips-Perron (PP) test [22], in contrast to the ADF test, is non-parametric and allows for some heterogeneity and serial correlation in the innovations. A class of so-called efficient unit root tests was proposed by Elliott et al. (1996) (hereafter ERS) [23]; their test statistics come very close to the power envelope for a wide range of alternatives and can have substantially higher power than the ADF or PP unit root tests. Unlike unit root tests, stationarity tests are far less numerous; they take stationarity as the null hypothesis and the unit root as the alternative. The most extensively used stationarity test is that of Kwiatkowski et al. (1992), known as KPSS [24], which is intended to complement unit root tests and can be used to distinguish short-memory from long-memory stationary processes.
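To make the basic Dickey-Fuller idea concrete, the sketch below computes the DF t-statistic by ordinary least squares in the regression of the first difference on a constant and the lagged level. It is a minimal illustration under stated assumptions, not the implementation used for the results in this paper: the function name `dickey_fuller_stat` is hypothetical, no lagged differences are included (so this is the plain DF test rather than the ADF test), and the simulated series are only for demonstration. The statistic must be compared with Dickey-Fuller critical values (asymptotically about -3.43 at 1% and -2.86 at 5% for the constant-only case), not with Student-t quantiles.

```python
import math
import random


def dickey_fuller_stat(y):
    """t-statistic of rho in: y[t] - y[t-1] = alpha + rho * y[t-1] + error.

    Strongly negative values are evidence against the unit root null.
    """
    if len(y) < 4:
        raise ValueError("need at least 4 observations")
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxd = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    rho = sxd / sxx                      # OLS slope on the lagged level
    alpha = mean_d - rho * mean_x        # OLS intercept
    ssr = sum((d - alpha - rho * x) ** 2 for x, d in zip(lagged, diffs))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return rho / std_err


# Simulated examples: white noise (stationary) and its cumulative sum
# (a random walk, which has a unit root).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
```

For the white-noise series the statistic lies far below the critical values, so the unit root null is rejected; for the random walk it is typically much closer to zero.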

There exist many other unit root and stationarity tests, as well as generalizations and combinations of the ones mentioned above; see [10, 25–29] for supplementary material on unit root and stationarity testing.

However, one fundamental problem is that there is still no consensus on the optimal choice of the stationarity test. Therefore, to be more objective, we take advantage of several common tests to make a comprehensive study of the Ethernet traffic data in the following empirical study.

4. Discussion

In this section, we present the statistical testing results for the four series, that is, the pAug.TL, pOct.TL, OctExt.TL, and OctExt4.TL data. Our interest lies in identifying the existence of a unit root in the series. We adopt the most widely used tests, namely ADF, PP, ERS, and KPSS, in the empirical application. Using the software R, we mainly rely on functions from the packages “urca” and “fUnitRoots” to carry out the statistical tests. For the KPSS test, the null hypothesis is that the series is stationary, while the alternative is that it is $I(1)$; for the ADF, PP, and ERS tests, as noted in Section 3, the unit root is the null hypothesis. The values of each test under the different time scales are presented in Tables 2, 3, 4, and 5. It is worth mentioning that we also tried the Robinson (1994) test [30] to test the stationarity of the series, given their identified long-range dependence features. However, according to the results of Ferrara et al. (2010) [31], this test is appropriate only for series whose sample size is sufficiently large, a requirement that could not be satisfied in our case.
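For a test with stationarity as the null hypothesis, the KPSS level statistic can be sketched directly from its definition: the sum of squared partial sums of the demeaned series, scaled by the squared sample size and a long-run variance estimate with Bartlett weights. This is a minimal illustration rather than the "urca" or "fUnitRoots" implementation: the function name `kpss_level_stat` and the explicit `lags` argument are assumptions of this example. Large values are evidence against stationarity; the commonly tabulated critical values for the level case are about 0.463 at 5% and 0.739 at 1%.

```python
def kpss_level_stat(y, lags):
    """KPSS statistic for the null of level stationarity.

    `lags` is the Bartlett-kernel truncation lag used in the long-run
    variance estimate; lags = 0 gives the plain sample variance.
    """
    n = len(y)
    if n < 2 or lags < 0 or lags >= n:
        raise ValueError("need n >= 2 and 0 <= lags < n")
    mean = sum(y) / n
    resid = [v - mean for v in y]
    # Sum of squared partial sums of the demeaned series.
    partial, total = 0.0, 0.0
    for e in resid:
        partial += e
        total += partial ** 2
    # Long-run variance with Bartlett weights 1 - s / (lags + 1).
    long_run_var = sum(e * e for e in resid) / n
    for s in range(1, lags + 1):
        gamma = sum(resid[t] * resid[t - s] for t in range(s, n)) / n
        long_run_var += 2.0 * (1.0 - s / (lags + 1)) * gamma
    if long_run_var <= 0.0:
        raise ValueError("long-run variance estimate is not positive")
    return total / (n * n * long_run_var)


# Toy inputs: an alternating series and a steadily trending one.
stat_alternating = kpss_level_stat([1, -1, 1, -1], 0)   # 0.125
stat_trending = kpss_level_stat(list(range(100)), 0)
```

The alternating series gives a small statistic, whereas the trending series gives a value far above the 1% critical value, so level stationarity would be rejected for it.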

As seen in Tables 2–5, as the time scale increases, the accumulated traffic series is more likely to be stationary, which is in contrast to the intuitive impression given by the trajectory figures. Moreover, at the 1% significance level, when the time scale is small, the accumulated traffic is judged to be non-stationary and to possess a unit root. Specifically, we have the following findings.
(i) The accumulated traffic of the pOct.TL series could be regarded as non-stationary until the time scale reaches a certain value, while for the other three series stationarity cannot be rejected once the time scale exceeds their respective thresholds (see Tables 2–5). Thus, among the four series, the pOct.TL series seems the least likely to be stationary under each time scale.
(ii) Considering the series of pAug.TL, pOct.TL, OctExt.TL, and OctExt4.TL, it seems that their stationarity behaviors are quite similar and hard to distinguish from each other under the corresponding time scales.
(iii) For all four series, when the time scale is small, the accumulated traffic has a unit root and at the same time exhibits some long-range dependence behavior, which indicates that a stationary process could be obtained after first-order differencing.

The previous discussion concerns Ethernet traffic, but the methods may also serve as a reference for other types of time series, for instance, those in [32–35]. Our future research will address the general description of network traffic rather than Ethernet traffic alone.

5. Conclusions

In this paper, we have carried out several widely used tests of stationarity in order to study the stationarity property of the accumulated Ethernet series at different time scales. The quantitative results reveal that, as the time scale increases, the investigated accumulated Ethernet traffic is more likely to be stationary, which coincides with the normality investigation results for the same series and provides useful empirical evidence for traffic data modeling under large time scales.

Acknowledgments

This work was in part supported by the 973 Plan under Project Grant no. 2011CB302800, the National Natural Science Foundation of China under Project Grant nos. 11101158, 61272402, 61070214, and 60873264, and “the Fundamental Research Funds for the Central Universities”. The authors acknowledge W. Willinger, W. Leland, and D. Wilson with Bellcore, Morristown, who provided the data in this research.