Abstract

Fractional Gaussian noise (fGn) with long-range dependence (LRD), a self-similar process, is a widely used model of Internet traffic. It is indexed by its Hurst parameter $H$, which is linearly related to its fractal dimension $D$. On the one hand, the fractal dimension $D$ of traffic measures local self-similarity. On the other hand, LRD is a global property of traffic, which is characterized by its Hurst parameter $H$. With fGn, however, both the self-similarity and the LRD of traffic are measured by $H$ alone, which limits how accurately fGn can model traffic. Recently, the generalized Cauchy (GC) process was introduced to model traffic with the flexibility to measure the fractal dimension $D$ and the Hurst parameter $H$ of traffic separately. A fundamental problem, however, is whether the GC model conforms to real traffic better than single-parameter models, such as fGn, in general, irrespective of the traffic traces used in experimental verification. The solution to that problem remains unknown but is desired for model evaluation in traffic theory and for model selection in specific issues, such as queuing analysis relating to the autocorrelation function (ACF) of arrival traffic. The key contribution of this paper is our solution to that fundamental problem (see Theorem 3.17), with the following features in analysis: (i) set-valued analysis of the traffic of the fGn type; (ii) set-valued analysis of the traffic of the GC type; (iii) revealing the generality mentioned above by comparing metrics of the traffic of the fGn type with those of the GC type.

1. Introduction

This paper explores the modeling of Internet traffic (traffic for short), which plays a role in telecommunications [1]. Let $x(t_i)$ be an arrival traffic function, implying the number of bytes in the $i$th packet arriving at $t_i$, where $t_i$ is the timestamp of the $i$th packet [2]. To avoid confusion, we use $x(t)$ and $x(i)$ to represent a traffic time series in the continuous case and the discrete one, respectively, where $x(i)$ implies the size of the $i$th packet. Note that the traffic statistics for $x(t_i)$ correspond to the statistics of the traffic time series represented by either byte count or packet size [3].

The pioneer in the stochastic modeling of traffic was the Danish scientist A. K. Erlang; see Bojkovic et al. [4]. As early as the 1920s, he carried out experimental work on the statistics of traffic in telephony networks and introduced traffic models of the Poisson type [5, 6]. Erlang’s work was so successful in characterizing the old telephony traffic that it was applied as a law in traffic engineering; see, for example, Yue et al. [7], Papoulis [8], Gibson [9], Cooper [10], Pitts and Schormans [11], and McDysan [12]. Note that the autocorrelation function (ACF) of traffic of the Poisson type, which is Markovian, decays exponentially [13]. In fact, the ACF of a Markov process decays exponentially [14]. The Poisson-type models fit the traffic in old telephony networks, which are circuit switched; see, for example, [9], Le Gall [15], Lin et al. [16], Manfield and Downs [17], Reiser [18], and Lu [19]. Those types of models, however, fail to effectively characterize the traffic in the Internet, which is packet switched. As a matter of fact, the ACF, the probability density function (PDF), and the power spectrum density (PSD) function of traffic follow power laws; see, for example, Resnick [20], Csabai [21], Leland et al. [22], Beran et al. [23], López-Ardao et al. [24], and Cleveland and Sun [25]. Therefore, analyses of system responses in the Internet have to take into account arrival traffic with long-range dependence (LRD); see, for example, Tsybakov and Georganas [26], Norros [27], Fishman and Adan [28], Li and Zhao [29], Dahl and Willemain [30], and Kingman [31].

Theoretically, on the one hand, Taqqu’s Theorem relates a heavy-tailed PDF in power law to a hyperbolically decayed ACF, that is, a power law-type ACF [3, 32]. On the other hand, the Fourier transform connects a hyperbolically decayed ACF with $1/f$ noise (power law-type PSD); see, for example, Li [33].

Note that, before the Internet’s worldwide prevalence, in the 1970s, Tobagi et al. [34] reported a noticeable behavior of traffic, which is called “burstiness” [12]. It implies that, if one observes traffic over a long period of time, there would be no packets transmitted for a while, then a flurry of transmission, then no transmission for another long period of time, and so on. This also means that traffic has intermittency. In 1986, Jain and Routhier [35] further described the intermittency or burstiness of traffic using the term “packet trains.” They inferred that traffic is neither a Poisson process nor a compound Poisson one [35]. The results in [34, 35] are quite qualitative, but they may be considered early work with respect to fractal-type traffic. The concept of packet trains is interesting [36], but we utilize the concept of fractal time series for traffic modeling in this paper.

The early literature quantitatively describing the statistical properties of traffic from the viewpoint of fractals includes Csabai [21], Leland et al. [22], Beran et al. [23], Paxson and Floyd [37], and Crovella and Bestavros [38]. Those scientists revealed some of the main properties of traffic, such as LRD and asymptotic self-similarity. The traffic model described in [22, 23, 37, 39–43], to cite just a few, is the fGn that was introduced by Mandelbrot and Van Ness in mathematics [44].

The model of fGn is characterized by a single parameter $H$, called the Hurst parameter. Its limitation in accurately modeling traffic was noticed by Paxson and Floyd [37], and Tsybakov and Georganas [39]. Paxson and Floyd noted that “it might be difficult to characterize the correlations over the entire traffic traces with a single Hurst parameter [37, Section 7.4].” They suggested that “further work is required to fully understand the correlational structure of wide-area traffic [37].” Tsybakov and Georganas remarked that “the class of exactly self-similar processes, that is, fGn or fractional Brownian motion (fBm), is too narrow for modeling actual network traffic [39, Section II].” The authors of [37, 39] qualitatively stated the limitation of fGn in traffic modeling without mentioning how to overcome it. In this regard, Beran [45, page 101-102] suggested developing a sufficiently flexible class of parametric correlation models. The key point of Beran’s idea is that the ACF of an LRD series may be fitted by a correlation model with several parameters instead of one, but he did not mention what the concrete parametric correlation models are.

Li and Lim recently reported a two-parameter traffic model called the GC process, with demonstrations based on sets of real-traffic traces, in [46, 47]. Li [48] discussed its simulation. Nevertheless, whether or not it agrees with traffic better than single-parameter models, such as fGn, in general remains an unsolved problem. Therefore, it may be useful, especially for traffic engineers, to exhibit that generality. Motivated by this, we aim in this paper at presenting a solution based on abstract analysis, more precisely, set-valued analysis in Hilbert spaces, to thoroughly reveal that generality, irrespective of the traces used in experimental verification. To the best of our knowledge, set-valued analysis of traffic models is rarely seen.

The rest of the paper is organized as follows. Related work is explained in Section 2. The set-valued analysis is presented in Section 3. An application case is demonstrated in Section 4. Discussions are provided in Section 5, followed by our conclusions.

We first briefly review the ACFs of the fGn and the GC process in turn. Then, the fractal dimension and the Hurst parameter are discussed.

2.1. fGn

The continuous fGn is the derivative of the smoothed fractional Brownian motion (fBm) in the sense of generalized functions over the Schwartz space of test functions; see Kanwal [49] for generalized functions.

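Denote by $C_{\mathrm{fGn}}(\tau)$ the ACF of the fGn as the increment process of the fBm of the Weyl type. Then, for time lag $\tau \in \mathbb{R}$, where $\mathbb{R}$ is the set of real numbers,
$$C_{\mathrm{fGn}}(\tau) = \frac{V_H \varepsilon^{2H-2}}{2}\left[\left(\frac{|\tau|}{\varepsilon}+1\right)^{2H} + \left|\frac{|\tau|}{\varepsilon}-1\right|^{2H} - 2\left|\frac{\tau}{\varepsilon}\right|^{2H}\right],$$
where $0 < H < 1$ is the Hurst parameter, $\varepsilon > 0$ is used by smoothing the fBm so that the smoothed fBm is differentiable, and $V_H = (H\pi)^{-1}\Gamma(1-2H)\cos(H\pi)$ [44]. The PSD of fGn is given by [50]
$$S_{\mathrm{fGn}}(\omega) = V_H \sin(H\pi)\,\Gamma(2H+1)\,|\omega|^{1-2H},$$
where $\omega$ is angular frequency.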

FGn includes three classes of time series. When $0.5 < H < 1$, the ACF is positive and finite for all $\tau$. It is nonintegrable and the corresponding series is LRD. For $0 < H < 0.5$, the integral of the ACF is zero, corresponding to series with short-range dependence (SRD). Besides, for $0 < H < 0.5$, the ACF changes its sign and becomes negative for some $\tau$ proportional to $\varepsilon$. FGn reduces to white noise when $H = 0.5$.

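The ACF of fGn in the discrete case is given by
$$r(k) = \frac{V_H}{2}\left[(|k|+1)^{2H} + \bigl||k|-1\bigr|^{2H} - 2|k|^{2H}\right],$$
where $V_H = r(0)$ is the strength of the fGn, $k \in \mathbb{Z}$, and $\mathbb{Z}$ is the set of integers. To avoid confusion, we often consider ACFs for $k \geq 0$ in the normalized case in what follows, as an ACF is an even function. Thus, for $k \geq 0$, one has
$$r(k) = 0.5\left[(k+1)^{2H} - 2k^{2H} + |k-1|^{2H}\right]. \quad (2.4)$$
Considering the right side of (2.4) as the finite second-order difference of $0.5k^{2H}$ and approximating it with the second-order differential of $0.5k^{2H}$ yields the following equation, whose right side is quite accurate for sufficiently large $k$ [51]:
$$0.5\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right] \approx H(2H-1)k^{2H-2}. \quad (2.5)$$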
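The quality of the power-law approximation to the discrete ACF of fGn can be checked numerically. The following is a minimal sketch, assuming the standard normalized fGn ACF $0.5[(k+1)^{2H} - 2k^{2H} + |k-1|^{2H}]$ and its approximation $H(2H-1)k^{2H-2}$; the function names `fgn_acf` and `fgn_acf_power_law` and the value $H = 0.8$ are illustrative choices, not taken from the paper.

```python
import numpy as np


def fgn_acf(k, hurst):
    """Normalized discrete-time ACF of fGn at integer lags k >= 0."""
    k = np.asarray(k, dtype=float)
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1.0) ** h2 - 2.0 * k ** h2 + np.abs(k - 1.0) ** h2)


def fgn_acf_power_law(k, hurst):
    """Power-law approximation H(2H-1)k^(2H-2); only meaningful for k >= 1."""
    k = np.asarray(k, dtype=float)
    return hurst * (2.0 * hurst - 1.0) * k ** (2.0 * hurst - 2.0)


if __name__ == "__main__":
    H = 0.8  # illustrative Hurst parameter in the LRD range (0.5, 1)
    lags = np.array([1, 10, 100, 1000])
    exact = fgn_acf(lags, H)
    approx = fgn_acf_power_law(lags, H)
    for k, e, a in zip(lags, exact, approx):
        print(f"k={k:5d}  exact={e:.6f}  approx={a:.6f}  rel.err={(a - e) / e:+.3%}")
```

Under these assumptions the relative error shrinks quickly as the lag grows, which is why the approximation is used only away from the smallest lags.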

2.2. GC Process

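A random function $x(t)$ is called the GC process if it is stationary Gaussian with the ACF given by
$$C_{\mathrm{GC}}(\tau) = \left(1 + |\tau|^{\alpha}\right)^{-\beta/\alpha}, \quad (2.6)$$
where $0 < \alpha \leq 2$ and $\beta > 0$. When $\alpha = \beta = 2$, one gets the usual Cauchy process, the ACF of which is expressed by
$$C(\tau) = \left(1 + |\tau|^{2}\right)^{-1},$$
which is used in geostatistics; see Chilès and Delfiner [52].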

The PSD of the GC process is given by (see [47]) where and

The PSD of the GC process for is given by (see [53]) On the other hand, for is given by The above exhibits the power law of . The GC process is LRD if $0 < \beta < 1$ and is SRD if $\beta > 1$.

As noted in [53], “the GC process is non-Markovian since does not satisfy the triangular relation given by which is a necessary condition for a Gaussian process to be Markovian, see Todorovic [54].” In fact, up to a multiplicative constant, the Ornstein-Uhlenbeck process is the only stationary Gaussian Markov process, see Lim and Muniandy [55] and Wolpert and Taqqu [56].

2.3. Fractal Dimension and the Hurst Parameter

On the one hand, the fractal dimension, denoted by $D$, of traffic is a measure to characterize its local self-similarity or irregularity. On the other hand, the Hurst parameter $H$ is used to measure its statistical dependence; see Mandelbrot [57]. Thus, we respectively use $D$ and $H$ to describe the local property and the global property of $x(t)$; see Li and Lim [46, 47] and Li and Zhao [58]. In fact, if the ACF $C(\tau)$ of $x(t)$ is sufficiently smooth on $(0, \infty)$ and if
$$C(0) - C(\tau) \sim c|\tau|^{\alpha} \quad \text{for } |\tau| \to 0,$$
where $c$ is a constant and $\alpha$ is the fractal index of $x(t)$, then $D$ of $x(t)$ is expressed by
$$D = 2 - \frac{\alpha}{2}; \quad (2.14)$$
see, for example, Kent and Wood [59], Hall and Roy [60], Chan et al. [61], and Adler [62]. Applying the binomial series to the ACF of fGn gives the fractal index $\alpha = 2H$ for fGn. Therefore, one has
$$D_{\mathrm{fGn}} = 2 - H.$$
The fGn, as the incremental process of the fBm of the Weyl type, is stationary. Its $D$ happens to be linearly related to its $H$; see [57, page 27] and Gneiting and Schlather [63]. Hence, a single-parameter model fails to separately capture the local irregularity and the LRD of traffic.

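Recall that a self-similar process $x(t)$ with the self-similarity index $\kappa$ requires $x(at) \equiv a^{\kappa}x(t)$ for $a > 0$, where $\equiv$ denotes equality in joint finite distribution. It is known that a stationary Gaussian random function that is not exactly self-similar may satisfy a weaker self-similar property known as local self-similarity. Taking into account the definition of local self-similarity provided in [59–62], we say that a Gaussian stationary process is locally self-similar of order $\alpha$ if its ACF satisfies, for $\tau \to 0$,
$$C(\tau) = 1 - c|\tau|^{\alpha}\left\{1 + O\left(|\tau|^{\gamma}\right)\right\}, \quad \gamma > 0.$$
The fractal dimension of a locally self-similar process of order $\alpha$ is given by (2.14). Therefore, for the GC process, we have the asymptotic expressions given by
$$C_{\mathrm{GC}}(\tau) \sim 1 - \frac{\beta}{\alpha}|\tau|^{\alpha} \ (\tau \to 0), \qquad C_{\mathrm{GC}}(\tau) \sim |\tau|^{-\beta} \ (\tau \to \infty).$$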

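Note that traffic is LRD if its ACF satisfies
$$C(\tau) \sim c|\tau|^{-\beta} \quad (\tau \to \infty),$$
where $0 < \beta < 1$. Denote by $D$ and $H$ the fractal dimension and the Hurst parameter of traffic of the GC type, respectively. Then, according to (2.19), one has
$$D = 2 - \frac{\alpha}{2}, \qquad H = 1 - \frac{\beta}{2}. \quad (2.21)$$
Replacing $\alpha$ and $\beta$, respectively, by $D$ and $H$ according to (2.21), we have
$$C_{\mathrm{GC}}(\tau) = \left(1 + |\tau|^{4-2D}\right)^{-(1-H)/(2-D)},$$
where $D$ is independent of $H$. Thus,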
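The relations between the GC parameters and the pair $(D, H)$ are simple enough to check by direct computation. The sketch below assumes the GC ACF in the form $(1 + |\tau|^{\alpha})^{-\beta/\alpha}$ with $0 < \alpha \leq 2$ and $\beta > 0$, together with $D = 2 - \alpha/2$ and $H = 1 - \beta/2$; all function names are hypothetical. The values $(\alpha, \beta) = (0.020, 0.028)$ are the estimates reported for the trace in Section 4.

```python
import numpy as np


def gc_acf(tau, alpha, beta):
    """ACF of the generalized Cauchy process, (1 + |tau|^alpha)^(-beta/alpha)."""
    if not (0.0 < alpha <= 2.0) or beta <= 0.0:
        raise ValueError("require 0 < alpha <= 2 and beta > 0")
    tau = np.abs(np.asarray(tau, dtype=float))
    return (1.0 + tau ** alpha) ** (-beta / alpha)


def gc_fractal_dimension(alpha):
    """Fractal dimension D = 2 - alpha/2 (local property)."""
    return 2.0 - alpha / 2.0


def gc_hurst(beta):
    """Hurst parameter H = 1 - beta/2 (global property; LRD when 0 < beta < 1)."""
    return 1.0 - beta / 2.0


def gc_params_from_d_h(d, h):
    """Inverse map: (alpha, beta) = (4 - 2D, 2 - 2H)."""
    return 4.0 - 2.0 * d, 2.0 - 2.0 * h


if __name__ == "__main__":
    alpha, beta = 0.020, 0.028  # estimates quoted in Section 4
    print("D =", gc_fractal_dimension(alpha), " H =", gc_hurst(beta))
    print("ACF at lags 0, 1, 10:", gc_acf([0, 1, 10], alpha, beta))
```

Because $\alpha$ and $\beta$ enter the two maps separately, changing one of $D$ and $H$ leaves the other untouched, which is the flexibility a single-parameter model lacks.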

3. Set-Valued Analysis

A physically measured traffic trace has single history with finite length. Without losing generality, the maximum possible length of a traffic series is assumed as . Let be a space containing all ACFs, including ACFs of real traffic. Let be an ACF of a real-traffic series. Define the norm of as an inner product given by Then, the inner space given by is a Hilbert space when all limits are included [64, 65].

Remark 3.1. is a finite-dimensional normed space.

Now, we consider the following consequences of a linear normed space with finite dimensions.

Lemma 3.2. In a linear finite-dimensional space, all norms are equivalent [66].

Lemma 3.3. Every finite-dimensional subspace of a linear normed space is closed [67].

Lemma 3.4. Let be a Hilbert space and be a closed subspace of . Let , . Then there exists a unique element satisfying [66, 67], Aubin [68].
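In finite dimensions, the unique closest element guaranteed by Lemma 3.4 is the orthogonal projection, which can be computed by least squares. The following is a minimal sketch under that assumption: the vector `r` stands in for a sampled ACF, the columns of `basis` span a hypothetical subspace, and neither is taken from the paper.

```python
import numpy as np


def project_onto_subspace(r, basis):
    """Orthogonal projection of r onto the column span of `basis`."""
    coeffs, *_ = np.linalg.lstsq(basis, r, rcond=None)
    return basis @ coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r = rng.normal(size=50)            # stand-in for a sampled ACF
    basis = rng.normal(size=(50, 3))   # three spanning vectors (illustrative)
    p = project_onto_subspace(r, basis)
    # The residual is orthogonal to the subspace, which characterizes the
    # unique minimizer of the distance from r to the subspace.
    print("max |<basis_j, r - p>| =", np.max(np.abs(basis.T @ (r - p))))
    print("distance from r to subspace =", np.linalg.norm(r - p))
```

Any other element of the subspace lies at least as far from `r` as `p` does, which is the finite-dimensional content of the lemma.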

From the above, we obtain the following theorem. Its proof is straightforward according to Lemmas 3.2–3.4.

Theorem 3.5. Let be an ACF of a real-traffic series. Let be a closed subspace of . Then, there exists a unique such that [64, 65].

Let be the error. Its norm is defined by Let the functional of be . Then, is convex. Thus, the optimal approximation of in can be expressed by

Suppose has parameters such that Then, the error by taking the approximation (3.5) as a traffic model is a function of . To clarify this point, we utilize the cost function of dimensions expressed by The partial derivative of with respect to parameters, which will be zero at the minimum, yields [69] Let be the solution of . Then, is the optimal approximation of in .

The above discussions draw attention to the fact that an optimal approximation of in may have parameters. Obviously, an approximation of is related to a subspace of as can be seen from Theorem 3.5. For this reason, we, below, consider the extensions of the fGn’s ACF towards constructing the ACF of the GC process.

Definition 3.6. Let be a Hilbert space equipped with a distance . When is a subset of , the distance from to is denoted by [70, 71].

Definition 3.7 (see [70]). Let be a sequence of subspaces of a Hilbert space . Then, the subset is the upper limit of the sequence . Besides, the subset is the lower limit of . A subset is said to be the limit or the set limit of if

Considering the above terms, one has the lemma below.

Lemma 3.8. Any monotone sequence of subsets has a limit [70].

According to Lemma 3.8, therefore, the following holds.

Corollary 3.9 (see [70, 71]). Let be a family of increasing closed subspaces of a Hilbert space . Then,

We now turn to constructing the ACF of the GC process.

Corollary 3.10. .

Proof. According to (2.5), this corollary results.

Let Then, is the set containing the ACF of fGn. Therefore, we have the following remark.

Remark 3.11. . Besides, it is closed according to Lemma 3.3.

We now construct the second space. Let be the set containing ACFs of traffic in the form for . Then, According to Corollary 3.10, element in is an approximation of the ACF of fGn. Hence, we have . Based on , we further construct a space as follows.

Proposition 3.12. The following is an extension of , where ;

Proof. equals to for , meaning . Thus, this proposition results.

Remark 3.13. is nonintegrable for because . Clearly, . In addition, it is closed according to Lemma 3.3.

The space can be further extended into the following.

Proposition 3.14. The following is an extension of ; where , , .

Proof. is a special case of for , implying . Thus, Proposition 3.14 holds.

According to Proposition 3.14, therefore, we have the remarks below.

Remark 3.15. is nonintegrable for because . Clearly, . It is closed according to Lemma 3.3.

Remark 3.16. Proposition 3.14 presents a class of parametric ACF structures.

From the above, we have the theorem below.

Theorem 3.17. Let be an ACF of real traffic. Then,

Proof. Because , Theorem 3.17 holds according to Corollary 3.9.
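The nesting argument behind the proof can be illustrated numerically: the best fit over a larger model family is never worse than the best fit over a family it contains. The sketch below is only an illustration under stated assumptions. It uses the GC ACF $(1 + |k|^{\alpha})^{-\beta/\alpha}$ as the two-parameter family, and, as a stand-in for a single-parameter family, the sub-family with $\alpha = 2 - \beta$, which imposes the fGn-type coupling $D = 2 - H$; this sub-family is not the fGn ACF itself. The target ACF is synthetic, and all names and grid values are hypothetical.

```python
import numpy as np


def gc_acf(lags, alpha, beta):
    """GC-type ACF (1 + |k|^alpha)^(-beta/alpha)."""
    lags = np.abs(np.asarray(lags, dtype=float))
    return (1.0 + lags ** alpha) ** (-beta / alpha)


def best_mse(target, lags, params):
    """Smallest mean square error over a finite list of (alpha, beta) pairs."""
    return min(float(np.mean((target - gc_acf(lags, a, b)) ** 2)) for a, b in params)


def compare_nested_families(true_alpha=0.5, true_beta=0.3, n_lags=200):
    """Return (best MSE over the full grid, best MSE over the coupled sub-grid)."""
    lags = np.arange(n_lags)
    target = gc_acf(lags, true_alpha, true_beta)  # synthetic target, not real data
    betas = np.linspace(0.05, 0.95, 19)
    alphas = np.linspace(0.1, 2.0, 20)
    coupled = [(2.0 - b, b) for b in betas]       # one free parameter (D = 2 - H)
    full = [(a, b) for a in alphas for b in betas] + coupled  # superset of `coupled`
    return best_mse(target, lags, full), best_mse(target, lags, coupled)


if __name__ == "__main__":
    mse_full, mse_coupled = compare_nested_families()
    print(f"two-parameter family: {mse_full:.3e}")
    print(f"coupled sub-family:   {mse_coupled:.3e}")
```

Since the coupled list is contained in the full list by construction, the first number cannot exceed the second; how large the gap is depends on how far the target's $(D, H)$ pair lies from the line $D = 2 - H$.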

Theorem 3.17 exhibits the generality of the GC process in accurate modeling of traffic. In what follows, we let and so as to be consistent with (2.6) in computations.

In the end of this section, we note that the purpose for using the abstract expression of -parameter model (3.5) as well as (3.6) and (3.7) is simply to mention the concept of multiparameter model of ACF. For traffic, the GC model equipped with two parameters can be well explained because one parameter is the fractal index for local property and the other the LRD index for global one.

4. Application of Theorem 3.17 to Traffic Modeling

As an application of Theorem 3.17, we show the ACF modeling of of real-traffic trace named by AMP-1131669938-1.psize, which was collected by the US National Laboratory for Applied Network Research (NLANR) in November 2005 [73]. We first model it in . Then, we compare it with that in (i.e., fGn model). Because , we use in this section.

Denote the measured ACF of by . Denote by and the modeled ACFs in and , respectively. Let be the mean square error (MSE) by using and be the MSE by using . For the sake of demonstration, we use (4.1) for the MSE in and (4.2) for that in ;

Figure 1(a) shows the first 2048 points of AMP-1131669938-1.psize. Figure 1(b) is the right part, that is, the part for , of the measured ACF with the block size and average count = 30. By least squares fitting, we obtain the estimates $(\alpha, \beta)$ = (0.020, 0.028). Thus, we have with . Figure 1(c) shows and Figure 1(d) indicates that fits well with . Figure 2 illustrates the estimates of and . According to (2.21), $H$ and $D$ of that series are equal to 0.986 and 1.990, respectively.
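A measured ACF obtained by averaging over blocks can be sketched as follows. This is a hedged illustration, not the authors' procedure: the series is split into consecutive blocks, a biased sample ACF is computed per block, and the per-block estimates are averaged. The average count of 30 follows the text; the block size of 1024, the maximum lag, and the white-noise input are assumptions made only so that the example runs.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Biased sample ACF of one block, normalized so that the lag-0 value is 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    denom = float(np.dot(x, x))  # zero only for a constant block
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(max_lag + 1)])


def block_averaged_acf(x, block_size, n_blocks, max_lag):
    """Average the sample ACFs of `n_blocks` consecutive blocks of length `block_size`."""
    x = np.asarray(x, dtype=float)
    if block_size * n_blocks > x.size:
        raise ValueError("series too short for the requested blocks")
    if max_lag >= block_size:
        raise ValueError("max_lag must be smaller than block_size")
    blocks = x[: block_size * n_blocks].reshape(n_blocks, block_size)
    return np.mean([sample_acf(b, max_lag) for b in blocks], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = rng.normal(size=30 * 1024)  # placeholder data, not a traffic trace
    acf = block_averaged_acf(series, block_size=1024, n_blocks=30, max_lag=50)
    print("first five averaged ACF values:", np.round(acf[:5], 4))
```

Averaging over blocks reduces the variance of the estimate at each lag, at the price of limiting the largest usable lag to less than one block length.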

With least squares fitting in , however, we have with . Figure 3(a) plots the and Figure 3(b) shows the data fitting in . Figures 1(d) and 3(b) exhibit an application case of (3.16) in Theorem 3.17. Judging from them, it is obvious that the GC process is more effective with that trace for both short-term and long-term lags.

Purely from the viewpoint of curve fitting, a fitting error of the order of $10^{-3}$ in may not be too large. The unsatisfactory point of the modeling in lies in two aspects. One is that may overestimate autocorrelations of traffic for small lags (around the knee of the ACF curve). The other is that it may underestimate autocorrelations for large lags, as evidenced by Figure 3(b). Refer to Li and Lim [46] for more cases regarding modeling real-traffic traces in .

5. Discussion

A conventional method to assess whether a model is appropriate is goodness-of-fit test in statistics ([3, 69], Press et al. [74]). However, it still needs sets of traffic data involved in the test. In fact, experimental processing of specific sets of real traffic, no matter how many traces are involved in experimental verification or goodness-of-fit test, may not deterministically infer the generality of the GC process expressed by Theorem 3.17, theoretically speaking.

Recall that an ACF of arrival traffic has a considerable impact on queuing systems, see, for example, Hajek and He [75], Livny et al. [72], Li and Hwang [76, 77], Wittevrongel and Bruneel [78], and Geist and Westall [79]. Therefore, using the ACF of the arrival traffic of the GC type may bring in considerable advances in practice, such as system analysis or evaluation, which we will work on in the future.

One significance of the GC model is that it separately characterizes the local self-similarity and the LRD. In the case study in the previous section, we have $H = 0.986$ and $D = 1.990$ for AMP-1131669938-1.psize. Both $H$ and $D$ are large for this trace, since $0.5 < H < 1$ for LRD and $1 \leq D < 2$. Note that a large value of $H$ corresponds to strong LRD, while a large value of $D$ implies high local irregularity. This phenomenon of traffic was demonstrated with more real-traffic traces in [46]. It may not be satisfactorily observed using single-parameter models, for example, fGn, due to the restrictive relationship $D = 2 - H$.

Another significance of the GC model is that it explains a complicated phenomenon of traffic, which was observed by Paxson and Floyd [37] and Feldmann et al. [80], and which was stated like this. Traffic has robust long-term persistence at large time scales but high irregularity at small time scales. This phenomenon may be described by , where and are the fractal dimension and the Hurst parameter of traffic in the th interval on an interval-by-interval basis for , respectively. This complicated phenomenon of traffic can be well characterized by the GC model because is independent of , refer to [46] for the demonstrations of this phenomenon with real traffic. Again, we note that it may not be described by single parameter models, such as fGn. In fact, because and are restricted by .

The third significance of the traffic model of the GC type can be summarized as follows. It is well known that the amount of traffic accumulated in the interval $[0, t]$ is upper bounded by $\sigma + \rho t$, where $\sigma$ and $\rho$ are constants and $t > 0$, see Cruz [81]. Obviously, a tightened bound of is particularly desired in practice, such as delay computations. By applying the GC model to the traffic bound, we have the tightened bound expressed by where is a small-scale factor, is a large-scale factor, and , is the unit step function, see Li and Zhao [58] for details. For instance, if we let , , , and , then we have a tightened bound given by The conventional traffic bound, that is, the right side of (5.1), is a special case of (5.2) for . We should emphasize that the fractal dimension and the Hurst parameter in (5.2) have to be considered in the sense of the GC model of traffic [58].

Our future work lies in two directions. One is to explore more specific significances of the GC model of traffic in practical issues, for example, queuing. The other is to study whether the GC model of random processes may provide new explanations for the random phenomena in nonlinear time-varying systems or complex systems discussed by Dong et al. [82–84], Shen et al. [85–87], Chen et al. [88], and Sheng et al. [89, 90].

6. Conclusions

FGn, which is a self-similar process with LRD for $0.5 < H < 1$ and a widely used model in traffic engineering, was proposed as a traffic model by Leland et al. [22], Beran et al. [23], and Paxson and Floyd [37], based on their data processing of sets of real-traffic traces. The GC process, which is a locally self-similar process with LRD for $0 < \beta < 1$, was recently reported by Li and Lim [46], also based on processing the same sets of traffic traces as those in [22, 37]. However, experimental processing of real traffic that relies on selected sample records is limited, in methodology, when it comes to evaluating abstractly which model conforms better to real traffic independently of the selected sample records. The theoretical significance of this paper is to provide an abstract assessment, in terms of the generality described by (3.16) in Theorem 3.17, that the GC model conforms to real traffic better than single-parameter models, for example, fGn, regardless of any sample records of traffic. This may serve as a theoretical supplement with respect to the traffic model of the GC type. In addition, we have given our construction procedure for the ACF of the GC process in Hilbert spaces with the technique of extensions based on fGn.

Acknowledgments

This paper was supported in part by the National Natural Science Foundation of China under the project Grant numbers 60873264 and 61070214, the 973 plan under the project number 2011CB302801/2011CB302802, and by the University of Macau. Thanks go to NLANR for allowing the authors to use its real-traffic data in the research.