Abstract

A log-regression estimator based on the so-called modified Allan variance (MAVAR) has recently been proposed for estimating the memory parameter of Internet traffic data. Simulations have shown that this estimator achieves higher accuracy and better confidence than other methods. In this paper we present a rigorous study of the MAVAR log-regression estimator. In particular, under the assumption that the signal process is a fractional Brownian motion, we prove that it is consistent and asymptotically normally distributed. Finally, we discuss its connection with the wavelet estimators.

1. Introduction

It is well known that different kinds of real data (hydrology, telecommunication networks, economics, and biology) display self-similarity and long-range dependence (LRD) on various time scales. By self-similarity we refer to the property that a dilated portion of a realization has the same statistical characterization as the original realization. This can be well represented by a self-similar random process with a given scaling exponent (Hurst parameter). The long-range dependence, also called long memory, emphasizes the long-range time correlation between past and future observations and it is thus commonly equated to an asymptotic power law behaviour of the spectral density at low frequencies or, equivalently, to an asymptotic power-law decrease of the autocovariance function, of a given stationary random process. In this situation, the memory parameter of the process is given by the exponent characterizing the power law of the spectral density. (For a review of historical and statistical aspects of the self-similarity and the long memory see [1].)

Although a self-similar process cannot be stationary (and thus cannot be LRD), these two properties are often related in the following sense. Under the hypothesis that a self-similar process has stationary (or weakly stationary) increments, the scaling parameter enters the description of the spectral density and covariance function of the increments, yielding an asymptotic power law whose exponent is determined by the scaling parameter. Under this assumption, we can say that the self-similarity of the process is reflected in the long-range dependence of its increments. The most paradigmatic example of this connection is provided by the fractional Brownian motion and by its increment process, the fractional Gaussian noise [2].

In this paper we consider the problem of estimating the Hurst parameter of a self-similar process with weakly stationary increments. Among the different techniques introduced in the literature, we focus on a method based on the log-regression of the so-called modified Allan variance (MAVAR). The MAVAR is a well-known time-domain quantity generalizing the classic Allan variance [3, 4], and it was first proposed as a traffic analysis tool in [5]. In a series of papers [5-7], its performance has been evaluated by simulation and by comparison with real IP traffic. These works have pointed out the high accuracy of the method in estimating the Hurst parameter and have shown that it achieves better confidence than the well-established wavelet log-diagram.

The aim of the present work is to substantiate and enrich these results from the theoretical point of view, studying the rate of convergence of the estimator toward the memory parameter. In particular, our goal is to establish the limit properties and the precise asymptotic normality of the MAVAR log-regression estimator in order to compute the related asymptotic confidence intervals. This is achieved under the assumption that the signal process is a fractional Brownian motion. Although this hypothesis may look restrictive (indeed this estimator is designed for a larger class of stochastic processes), the results obtained are a first step toward the mathematical study of the MAVAR log-regression estimator. To our knowledge, there are no similar results in the literature. The present paper also provides the theoretical foundations (mathematical details and proofs) for [8]. Indeed, the formulas obtained analytically here have been implemented and numerically tested in [8] for different choices of the regression weights, and the numerical evidence was shown to be in good agreement with the theoretical results proven here.

For a survey of Hurst parameter estimators for fractional Brownian motion, we refer to [9, 10]. However, we stress once again that the MAVAR estimator is not specifically targeted at fractional Brownian motion: it was designed for, and has been successfully applied to, more general processes.

Our theorems can be viewed as a counterpart of the already established results concerning the asymptotic normality of the wavelet log-regression estimators [11-13]. Indeed, although the MAVAR can be related in some sense to a suitable Haar-type wavelet family (see [14] for the classical Allan variance), the MAVAR and wavelet log-regression estimators do not coincide, as the regression runs on different parameters (see Section 5). Hence, we adopt a different argument, which in turn allows us to avoid some technical difficulties due to the poor regularity of the Haar-type functions.

The paper is organized as follows. In Section 2 we recall the properties of self-similarity and long-range dependence for stochastic processes and the definition of fractional Brownian motion; in Section 3 we introduce the MAVAR and its estimator, with their main properties; in Section 4 we state and prove the main results concerning the asymptotic normality of the estimator; in Section 5 we make some comments on the link between the MAVAR and the wavelet estimators and on the modified Hadamard variance, which is a generalization of the MAVAR; in the Appendix we recall some results used in the proofs.

2. Self-Similarity and Long-Range Dependence

In the sequel we consider a centered real-valued stochastic process , that can be interpreted as the signal process. Sometimes it is also useful to consider the -increment of the process , which is defined, for every and , as

In order to reproduce the behavior of the real data, it is commonly assumed that satisfies one of the two following properties: (i) self-similarity; (ii) long-range dependence.

(i) The self-similarity of a centered real-valued stochastic process , starting from zero, refers to the existence of a parameter , called Hurst index or Hurst parameter of the process, such that, for all , it holds In this case we say that is an -self-similar process.

(ii) We first recall that a centered real-valued stochastic process is weakly stationary if it is square integrable and its autocovariance function, , is translation invariant, namely, if In this case we also set . If is a weakly stationary process, we say that it displays long-range dependence, or long memory, if there exists such that the spectral density of the process, , satisfies the condition for some finite constant , where we write as , if . Due to the correspondence between the spectral density and the autocovariance function, given by the long-range condition (2.4) can often be stated in terms of the autocovariance of the process as for and some finite constant .

Notice that if is a self-similar process, then it obviously cannot be weakly stationary. On the other hand, assuming that starts from zero and is an -self-similar process with weakly stationary increments, that is, the quantity does not depend on , it turns out that the autocovariance function is given by with , which is clearly not translation invariant. Consequently, denoting by its -increment process (see (2.1)), the autocovariance function of is such that ([15]) In particular, if , the process displays long-range dependence in the sense of (2.6) with . Under this assumption, we thus embrace the two main empirical properties of a wide collection of real data.

A basic example of the connection between self-similarity and long-range dependence is provided by the fractional Brownian motion [2]. This is a centered Gaussian process, starting from zero, with autocovariance function given by (2.8), where [16] It can be proven that is a self-similar process with Hurst index , which corresponds, for , to the standard Brownian motion. Moreover, its increment process, called fractional Gaussian noise, turns out to be a weakly stationary Gaussian process [2, 17].
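As a minimal illustration of the objects just introduced, the following Python sketch evaluates the autocovariance of a fractional Brownian motion and of its unit-lag increments (fractional Gaussian noise), and draws a sample path by Cholesky factorization. It assumes the standard form of the covariance, R(s, t) = (sigma^2 / 2) (|s|^{2H} + |t|^{2H} - |t - s|^{2H}), and unit time steps; the function names `fbm_covariance`, `fgn_autocovariance`, and `simulate_fbm` are illustrative and do not come from the paper.

```python
import numpy as np


def fbm_covariance(s, t, hurst, sigma2=1.0):
    """Assumed standard fBm covariance: (sigma2/2)(|s|^2H + |t|^2H - |t-s|^2H)."""
    s = np.asarray(s, dtype=float)
    t = np.asarray(t, dtype=float)
    two_h = 2.0 * hurst
    return 0.5 * sigma2 * (np.abs(s) ** two_h + np.abs(t) ** two_h
                           - np.abs(t - s) ** two_h)


def fgn_autocovariance(k, hurst, sigma2=1.0):
    """Autocovariance at integer lag k of the unit-lag increments of fBm."""
    k = np.asarray(k, dtype=float)
    two_h = 2.0 * hurst
    return 0.5 * sigma2 * (np.abs(k + 1) ** two_h - 2.0 * np.abs(k) ** two_h
                           + np.abs(k - 1) ** two_h)


def simulate_fbm(n, hurst, seed=0):
    """Sample X_0 = 0, X_1, ..., X_n on the integer grid via Cholesky.

    The cost is cubic in n, so this is only meant for short paths.
    """
    grid = np.arange(1, n + 1, dtype=float)
    cov = fbm_covariance(grid[:, None], grid[None, :], hurst)
    chol = np.linalg.cholesky(cov)
    rng = np.random.default_rng(seed)
    path = chol @ rng.standard_normal(n)
    return np.concatenate(([0.0], path))
```

For a Hurst index of 1/2 the covariance reduces to min(s, t) and the increments are uncorrelated, which gives a quick sanity check of both functions.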

In the next sections we will perform the analysis of the modified Allan variance and of the related estimator of the memory parameter.

3. The Modified Allan Variance

In this section we introduce and recall the main properties of the modified Allan variance (MAVAR) and of the log-regression estimator of the memory parameter based on it.

Suppose that is a centered real-valued stochastic process, starting from zero, with weakly stationary increments. Let be the “sampling period” and define the sequence of times taking and setting , that is, .

Definition 3.1. For any fixed integer , the modified Allan variance (MAVAR) is defined [3] as where we set . For we recover the well-known Allan variance.

Let us assume that a finite sample of the process is given, and that the observations are taken at times . In other words we set for .

A standard estimator for the modified Allan variance (MAVAR estimator) is given by for and .
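The estimator can be sketched in a few lines of Python. The code assumes the standard form of the MAVAR estimator used in the traffic-analysis literature: for each of the n_p = n - 3p + 1 admissible window positions, sum p consecutive second differences taken at lag p, square the result, average over the positions, and divide by 2 tau0^2 p^4. The name `mavar_estimate` and its arguments are hypothetical.

```python
import numpy as np


def mavar_estimate(x, p, tau0=1.0):
    """MAVAR estimate at scale p from samples x_1..x_n (assumed standard form)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    n_p = n - 3 * p + 1          # number of admissible window positions
    if p < 1 or n_p < 1:
        raise ValueError("need 1 <= p <= n/3")
    # Second differences at lag p: x[i+2p] - 2 x[i+p] + x[i], length n - 2p.
    second_diff = x[2 * p:] - 2.0 * x[p:-p] + x[:-2 * p]
    # Sliding sums of p consecutive second differences, via cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(second_diff)))
    window_sums = csum[p:] - csum[:-p]   # length n_p
    return float(np.sum(window_sums ** 2) / (2.0 * tau0 ** 2 * p ** 4 * n_p))
```

Two deterministic checks follow directly from the formula: a linear sequence has vanishing second differences, so the estimate is zero, while for the quadratic sequence x_k = k^2 every second difference at lag p equals 2p^2, so the estimate equals 2p^2 when tau0 = 1.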

For , let us set so that the process turns out to be weakly stationary for each fixed , with , and we can write

3.1. Some Properties

Let us further assume that is an -self-similar process (see (2.2)), with . Applying the covariance formula (2.8), it holds with and where is the polynomial of degree given by

Since we are interested in the limit for , we consider the approximation of the two finite sums in (3.6) by the corresponding double integral, namely, Computing the integral and inserting the result in (3.6), we get where From (3.5) and (3.9), we get
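The power-law behaviour of the MAVAR in the scale can be checked numerically without any simulation. The sketch below computes the exact MAVAR of a fractional Brownian motion as a quadratic form in the covariance matrix and measures the log-log slope between two scales. It assumes the standard MAVAR definition (the variance of the average of p second differences at lag p, divided by 2 tau0^2 p^2), the standard fBm covariance, and sampling times t_i = tau0 * i; the helper names are hypothetical.

```python
import numpy as np


def fbm_mavar_exact(p, hurst, tau0=1.0, sigma2=1.0):
    """Exact MAVAR of fBm at scale p, computed as coef^T R coef / (2 tau0^2 p^2)."""
    times = tau0 * np.arange(1, 3 * p + 1, dtype=float)
    # Coefficients of (1/p) * sum_{i=1..p} (X_{i+2p} - 2 X_{i+p} + X_i).
    coef = np.zeros(3 * p)
    coef[:p] = 1.0
    coef[p:2 * p] = -2.0
    coef[2 * p:] = 1.0
    coef /= p
    two_h = 2.0 * hurst
    s, t = times[:, None], times[None, :]
    cov = 0.5 * sigma2 * (np.abs(s) ** two_h + np.abs(t) ** two_h
                          - np.abs(t - s) ** two_h)
    variance = coef @ cov @ coef
    return float(variance / (2.0 * tau0 ** 2 * p ** 2))


def loglog_slope(p_small, p_large, hurst):
    """Slope of log(MAVAR) against log(p) between two scales."""
    rise = np.log(fbm_mavar_exact(p_large, hurst)) - np.log(fbm_mavar_exact(p_small, hurst))
    return float(rise / (np.log(p_large) - np.log(p_small)))
```

For large scales the slope returned by `loglog_slope` should approach 2H - 2, which is the exponent the log-regression estimator is built to recover. At small scales the slope deviates noticeably from this value, which is why the asymptotic analysis lets the scale grow.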

Under the above hypotheses on , one can also prove that the process satisfies the stationary condition To verify this condition, we write explicitly the covariance as Setting we get This immediately provides the stationary condition (3.12).

To better understand the behavior of the covariance as varies, we apply the covariance formula (2.8) and get Thus, from (3.14), where Inserting (3.17) in (3.15), we obtain where as before, and Now we set with and an integer number in and we study the asymptotic behavior of for .

The limit relation that is, implies that and so, for , where . We can conclude that

3.2. The MAVAR Log-Regression Estimator

Let be the sample size, that is, the number of the observations.

Definition 3.2. Let such that , and let be a vector of weights satisfying the conditions The MAVAR log-regression estimator associated to the weights is defined as

Roughly speaking, the idea behind this definition is to use the approximation in order to get, by (3.11) and (3.26), where . Thus, given the data , the following procedure is used to estimate :

(i) compute the modified Allan variance by (3.2), for integer values , with ;

(ii) compute the weighted MAVAR log-regression estimator by (3.27) in order to get an estimate of ;

(iii) estimate by .

In the sequel we will give, under suitable assumptions, two convergence results in order to justify these approximations and to get the rate of convergence of toward . Obviously, we need to take as in order to reach jointly the above two approximations.
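The three-step procedure above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the weights are taken to be the ordinary least-squares weights, which sum to zero and satisfy sum_l w_l log(p_l) = 1; the MAVAR estimator has the standard form from the traffic-analysis literature; and the Hurst parameter is recovered from the estimated exponent alpha as H = (alpha + 2) / 2, consistently with a MAVAR scaling of order tau^(2H-2). All function names are hypothetical.

```python
import numpy as np


def mavar_estimate(x, p, tau0=1.0):
    """MAVAR estimate at scale p (assumed standard form)."""
    x = np.asarray(x, dtype=float)
    n_p = x.size - 3 * p + 1
    if p < 1 or n_p < 1:
        raise ValueError("need 1 <= p <= n/3")
    second_diff = x[2 * p:] - 2.0 * x[p:-p] + x[:-2 * p]
    csum = np.concatenate(([0.0], np.cumsum(second_diff)))
    window_sums = csum[p:] - csum[:-p]
    return float(np.sum(window_sums ** 2) / (2.0 * tau0 ** 2 * p ** 4 * n_p))


def regression_weights(scales):
    """Least-squares weights: sum(w) = 0 and sum(w * log(scales)) = 1."""
    logs = np.log(np.asarray(scales, dtype=float))
    centered = logs - logs.mean()
    return centered / np.sum(centered ** 2)


def mavar_log_regression(x, scales, tau0=1.0):
    """Return (alpha_hat, hurst_hat) from a weighted log-regression of the MAVAR."""
    weights = regression_weights(scales)
    log_mavar = np.log([mavar_estimate(x, p, tau0) for p in scales])  # step (i)
    alpha_hat = float(np.dot(weights, log_mavar))                      # step (ii)
    hurst_hat = (alpha_hat + 2.0) / 2.0                                # step (iii)
    return alpha_hat, hurst_hat
```

A typical call would be `mavar_log_regression(np.cumsum(rng.standard_normal(20000)), [8, 16, 32, 64])` with `rng = np.random.default_rng(0)`, that is, a simulated Brownian path, for which the Hurst estimate should be close to 1/2. Because the weights sum to zero, the result does not depend on `tau0`. Scales that are too small bias the slope, in line with the requirement that the scale grow with the sample size.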

4. The Asymptotic Normality of the Estimator

From now on we will always assume that is a fractional Brownian motion (with ) so that the process is also Gaussian. Under this assumption, and with the notation introduced before, we can state the following results.

Theorem 4.1. Let be a sequence of integers such that Let be a given integer, the vector and, analogously, set . Then, it holds where is a suitable symmetric matrix.

From this theorem, as an application of the δ-method, we can state the following result.

Theorem 4.2. Let be defined as in (3.27), for some finite integer and a weight-vector satisfying (3.26). If is a sequence of integers such that then where , the column-vector is such that , and .

Let us stress that from the above result, and due to the condition , it follows that the estimator is consistent.

Before starting the proof of the above theorems, we need the following lemma.

Lemma 4.3. Let be a sequence of integers such that For two given integers , with , set and . Then where is a finite quantity.

Proof. Since , without loss of generality we can assume that for each . Recall the notation , and set and . From the definition of the MAVAR estimator and applying Wick’s rule for jointly Gaussian random variables (see the Appendix), we get where . Since, by (3.20), the function only depends on and as , we rewrite the last line as This term, multiplied by , is equal to It is easy to see that this quantity converges (to a finite strictly positive limit) if and only if converges. From (3.25) it holds Thus, the quantity (4.10) is controlled in the limit by the sum which is convergent.

Proof of Theorem 4.1. As before, without loss of generality, we can assume that for each . Moreover, set again and . For a given real vector , let us consider the random variable defined as a linear combination of the empirical variances as follows: In order to prove the convergence stated in Theorem 4.1, we have to show that the random variable converges to the normal distribution with zero mean and variance . To this purpose, we note that where is the random vector with entries , for and , and is the diagonal matrix with entries By Lemma 4.3, if is the symmetric matrix with when , it holds therefore, condition of Lemma A.2 is satisfied.
In order to verify condition of Lemma A.2, let denote the covariance matrix of the random vector , and let denote its spectral radius. By Lemma A.3, we have where is the covariance matrix of the subvector . Applying the spectral radius estimate , and from equality (3.19), we then have In order to conclude, it is enough to note that

Proof of Theorem 4.2. By assumptions (4.3) on the sequence , and in particular from the condition , and inequality (3.11), it holds Thus, from Theorem 4.1, we get where is the vector with elements Now we observe that if , then, by (3.26) and (3.27), we have Moreover, and . Therefore, by an application of the δ-method, the convergence (4.20) entails where which concludes the proof.

Remark 4.4. By Lemma 4.3, and since by definition , an estimate of , with , is given by Therefore, setting equal to the corresponding estimate of divided by , that is and from (4.6), we obtain the convergence which can be used to obtain an asymptotic confidence interval for the parameter .

4.1. An Alternative Representation of the Covariance

It is well known that an FBM (with and ) has the following stochastic integral representation (see [2, 16]): Using this representation and Itô’s formula, we can obtain another representation of the covariance and, consequently, another formula for the coefficient .

Indeed, assuming without loss of generality so that , by (3.3) and (4.28), we can write for and where By Itô’s formula we get where Inserting this expression in (3.4), we obtain It follows, using Itô’s formula again and after the change of variables and ,

5. Some Comments

5.1. The Modified Allan Variance and the Wavelet Estimators

Suppose that is a self-similar process with weakly stationary increments and consider the generalized process defined through the set of identities In short, we write . From this definition and with the notation introduced in Section 3, we can rewrite the MAVAR as and its related estimator as Now we claim that, for fixed, the quantity resembles a family of discrete wavelet transforms of the process , indexed by and . To see this, let us fix and and set and , so that , for all . With this choice of the sequence of times, it is not difficult to construct a function such that An easy check shows that the function satisfies (5.5). Notice also that the components , , are suitably translated and renormalized Haar functions.

In the case , corresponding to the classical Allan variance, the function is exactly given by the Haar mother wavelet, as was already pointed out in [14].

Though the wavelet representation could be convenient in many respects, the Haar mother wavelet does not satisfy one of the conditions which are usually required in order to study the convergence of the estimator (see condition (W2) in [18]). Moreover, there is a fundamental difference between the two methods: in the wavelet setting the log-regression is done over the scale parameter for fixed, while the MAVAR log-regression given in (3.27) is performed over and for fixed.

5.2. The Modified Hadamard Variance

Further generalizing the notion of the MAVAR, one can define the modified Hadamard variance (MHVAR). For fixed integers , and , set where . Notice that for we recover the modified Allan variance. The MHVAR is again a time-domain quantity which has been introduced in [7] for the analysis of the network traffic. A standard estimator for this variance is given by for and .

Similarly to the analysis performed for the MAVAR, let us set so that we can write This suggests that convergence results, similar to Theorems 4.1 and 4.2, can be achieved also for the MHVAR and its related log-regression estimator.

5.3. The Case of Stationary Processes

In applications, MAVAR and MHVAR are also used to estimate the memory parameter of long-range dependent processes. This general case is not included in our analysis (which is restricted to the fractional Brownian motion) and requires a more involved investigation. To our knowledge, there are no theoretical results in this direction.

Appendix

In this appendix we recall Wick’s rule for jointly Gaussian random variables and some facts used in the proofs.

Wick’s Rule
Let us consider a family of jointly Gaussian random variables with zero mean. Wick’s rule is a formula that provides an easy way to compute the quantity , for any index-set (see, e.g., [19]).

Since the ’s are zero-mean random variables, if has odd cardinality, we trivially get . We then assume that , for some . To recall Wick’s rule, it is convenient to introduce the following graph representation. To the given index-set we associate a vertex-set indexed by the distinct elements of , and to every vertex we attach as many half-edges as the number of times the index appears in . In particular there is a one-to-one correspondence between the set of half-edges and , while . Gluing together two half-edges attached to vertices and , we obtain the edge . Performing this operation recursively over all remaining half-edges, we end up with a graph , with vertex set and edge-set . Let denote the set of graphs (possibly not distinguishable) obtained by performing this “gluing procedure” in all possible ways.

With this notation, and for all index-sets with even cardinality, Wick’s rule for a family of jointly centered Gaussian random variables provides the identity

Example A.1. Consider the quantity , for . By the graphical representation, we take a vertex set with two half-edges attached to each vertex and perform the gluing operation on the half-edges. We then obtain the following three graphs (identified by their edges): , . Thus, from Wick’s rule, we get the identity that has been used throughout the paper.
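The gluing procedure can be made concrete with a short Python sketch that enumerates all pairings of the half-edges and sums the corresponding products of covariances. The function names `pairings` and `wick_moment` are hypothetical, and the covariance matrix in the usage note is an arbitrary illustrative choice.

```python
def pairings(items):
    """Yield every way of splitting the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def wick_moment(indices, cov):
    """E[prod_k X_{indices[k]}] for centered jointly Gaussian X with covariance cov.

    Repeated indices are allowed: each occurrence is one half-edge.
    """
    indices = list(indices)
    if len(indices) % 2 == 1:
        return 0.0                      # odd moments of centered Gaussians vanish
    total = 0.0
    for matching in pairings(indices):  # one graph per gluing of the half-edges
        product = 1.0
        for a, b in matching:
            product *= cov[a][b]
        total += product
    return total
```

With `cov = [[1.0, 0.5], [0.5, 2.0]]`, the call `wick_moment([0, 0, 1, 1], cov)` enumerates the three graphs of Example A.1: one pairing each variable with itself and two pairing the variables across. This gives E[X_0^2] E[X_1^2] + 2 E[X_0 X_1]^2 = 2 + 2(0.25) = 2.5.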

Now we recall some facts used in the proof of Theorem 4.1.

Denote by the spectral radius of a matrix , then Moreover the following lemmas hold.

Lemma A.2 (see [13]). Let be a sequence of centered Gaussian random vectors and denote by the covariance matrix of . Let be a sequence of deterministic symmetric matrices such that (1), (2). Then converges in distribution to the normal law .

Lemma A.3 (see [11]). Let be an integer and a covariance matrix. Let be an integer such that . Denote by the top left submatrix with size and by the bottom right submatrix with size , that is, Then .

Acknowledgments

The authors are grateful to Stefano Bregni for having introduced them to this subject, proposing open questions and providing useful insights. They are also grateful to Marco Ferrari for a helpful remark on the first version of this work. This work was partially supported by GNAMPA (2011).