Computational and Mathematical Methods in Medicine

Volume 2019, Article ID 4025305, 12 pages

https://doi.org/10.1155/2019/4025305

## Biases in the Simulation and Analysis of Fractal Processes

Correspondence should be addressed to Didier Delignières; didier.delignieres@umontpellier.fr

Received 11 March 2019; Accepted 20 June 2019; Published 3 December 2019

Academic Editor: Marko Gosak

Copyright © 2019 Clément Roume et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Fractal processes have recently received growing interest, especially in the domain of rehabilitation. More precisely, the evolution of fractality with aging and disease, suggesting a loss of complexity, has inspired a number of studies that tried, for example, to entrain patients with fractal rhythms. This kind of study requires relevant methods for generating fractal signals and for assessing the fractality of the series produced by participants. In the present work, we engaged in a cross validation of three methods of generation and three methods of analysis. We generated exact fractal series with the Davies–Harte (DH) algorithm, the spectral synthesis method (SSM), and the ARFIMA simulation method. The series were analyzed by detrended fluctuation analysis (DFA), the power spectral density (PSD) method, and ARFIMA modeling. Results show that some methods of generation present systematic biases: DH presented a strong bias toward white noise in fBm series close to the 1/*f* boundary, and SSM produced series with a larger variability around the expected exponent, as compared with the other methods. In contrast, ARFIMA simulations provided quite accurate series, without major bias. Concerning the methods of analysis, DFA tended to systematically underestimate the exponents of fBm series. In contrast, PSD yielded overestimated exponents for fBm series. With DFA, the variability of estimates tended to increase for fGn series as they approached the 1/*f* boundary and reached unacceptable levels for fBm series. The highest levels of variability were produced by PSD. Finally, the ARFIMA methods generated the best series and provided the most accurate and least variable estimates.

#### 1. Introduction

The repeated measurement of physiological or behavioral events (stride durations, heartbeat intervals, and intertap intervals) is typically characterized by a marked variability. For a long time, this variability was just considered a random perturbation, without any functional significance. However, a number of authors, during the last two decades, showed that these biological series present a typical long-range correlation structure over time and especially a statistically self-similar (fractal) pattern [1–4]. Fractal processes have recently received growing interest, especially in the domain of rehabilitation. More precisely, the evolution of fractality with aging and disease, suggesting a loss of complexity [5], has inspired a number of studies that tried, for example, to entrain patients with fractal rhythms [6–8].

In this domain, authors are confronted with two main methodological problems. The first one concerns the evaluation of the level of long-range correlations in physiological series. A number of different methods have been proposed, and their respective qualities were systematically assessed in comparative studies [9–12]. The second one refers to the generation of exact fractal signals, necessary for providing experimental devices (metronomes and virtual environments) with controlled long-range correlation properties. A number of methods have been proposed for simulating such series [13–15]. The assessment of estimation methods, on the one hand, and of simulation methods, on the other hand, raises a typical problem of circularity, the performance and biases of the former being analyzed on the basis of the latter, and vice versa [11]. When a bias is identified, it remains difficult to attribute the problem to the method of simulation or to the method of assessment. In order to overcome this problem, we propose in the present paper a cross-validation study, combining three methods of generation and three methods of analysis.

We first propose a formal introduction of the three main domains of definition of long-range correlated processes: the fractional Brownian motion framework [16], the spectral domain [17], and the autoregressive fractionally integrated moving average (ARFIMA) processes [18].

#### 2. Theoretical Models

##### 2.1. The fBm/fGn Model

The *fractional Brownian motion* (fBm) denoted as *B*_{H}(*t*) is a mathematical model of continuous stochastic process introduced by Mandelbrot and Van Ness [16] as a generalization of the Brownian motion, where increments do not need to be independent. *B*_{H}(*t*) is characterized by the Hurst parameter (*H*), which can take any real value within the interval ]0, 1[. This value gives information about the nature and the strength of the correlation between successive increments in the process. If *H* is below 0.5, *B*_{H}(*t*) is underdiffusive, and its increments are anticorrelated. In contrast, if *H* is above 0.5, *B*_{H}(*t*) is overdiffusive, and its increments are positively correlated. In the case that *H* equals 0.5, *B*_{0.5}(*t*) is an ordinary Brownian motion (normal diffusion), and its increments are an uncorrelated Gaussian white noise.

fBm has some fundamental properties. The first one is that its variance grows as a power function of the length of the time interval observed, with an exponent 2*H*:

$$\operatorname{Var}\left[B_H(t)\right] \propto |t|^{2H}. \tag{1}$$

The second is that fBm is a fractal process, characterized by statistical self-similarity: for any *a* > 0,

$$B_H(at) \triangleq a^H B_H(t), \tag{2}$$

where ≜ denotes equality in distribution. By definition, *B*_{H}(*t*) is fully described by its autocovariance function *γ*_{B}(*t*, *s*):

$$\gamma_B(t, s) = \frac{\sigma^2}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right). \tag{3}$$

As previously indicated, the successive increments of *B*_{H}(*t*) can be correlated, and the Hurst exponent informs about the nature of this correlation; thus, the derivative of *B*_{H}(*t*) should be a stationary correlated noise. However, this derivative cannot be calculated, because in theory Brownian motion describes an infinitely broken continuous trajectory. In other words, nondifferentiability is a fundamental property of *B*_{H}(*t*). One can still estimate this derivative using a discrete version of *B*_{H}(*t*) where increments are defined on a time interval *m*. This estimate is a discrete process called *fractional Gaussian noise* (fGn), denoted as *G*_{m}(*i*):

$$G_m(i) = \frac{B_H(i + m) - B_H(i)}{m}. \tag{4}$$

As for the fBm, the fGn is fully described by its autocovariance function, which is easily derived from equation (3) with *m* = 1:

$$\gamma_{G_1}(\tau) = \frac{\sigma^2}{2}\left(|\tau + 1|^{2H} - 2|\tau|^{2H} + |\tau - 1|^{2H}\right). \tag{5}$$
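For readers who want to experiment, equation (5) is straightforward to evaluate numerically. The following Python/NumPy sketch (the function name is ours) computes the autocovariance of a unit-variance fGn and illustrates the three correlation regimes discussed above.

```python
import numpy as np

def fgn_autocovariance(H, max_lag):
    """Autocovariance gamma(tau) of a unit-variance fGn, equation (5)."""
    tau = np.arange(max_lag + 1, dtype=float)
    return 0.5 * (np.abs(tau + 1) ** (2 * H)
                  - 2 * np.abs(tau) ** (2 * H)
                  + np.abs(tau - 1) ** (2 * H))

gamma_persistent = fgn_autocovariance(0.8, 5)      # H > 0.5: positive covariances
gamma_antipersistent = fgn_autocovariance(0.2, 5)  # H < 0.5: negative covariances
gamma_white = fgn_autocovariance(0.5, 5)           # H = 0.5: uncorrelated noise
```

For *H* = 0.5, all covariances beyond lag 0 vanish exactly, recovering white noise.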

##### 2.2. The Spectral Model

Stochastic fractal processes can also be defined in the frequency domain, on the basis of a scaling law that relates power (i.e., squared amplitude) to frequency according to an inverse power function, with an exponent *β* [9, 10]. For an fGn process with exponent *H*, the power spectrum can be expressed as follows [12, 19]:

$$S(f) \propto \frac{1}{f^{2H - 1}}, \tag{6}$$

and for the corresponding fBm process [12, 20]:

$$S(f) \propto \frac{1}{f^{2H + 1}}. \tag{7}$$

Then the power spectrum of fGn/fBm processes has the general form

$$S(f) \propto \frac{1}{f^{\beta}}, \tag{8}$$

with *β* ∈ ]−1, 1[ for fGn and *β* ∈ ]1, 3[ for fBm. This suggests that fGn and fBm could be considered as a continuum, with both families being characterized in the time domain by a scaling exponent *α*, *α* = *H* for fGn and *α* = *H* + 1 for fBm. This assumption has been exploited by Peng et al. [4] in the conception of the detrended fluctuation analysis (see Section 3). The scaling exponent *α* is linearly related to the spectral exponent *β* over the whole fGn/fBm continuum by *α* = (*β* + 1)/2 [21]. The case *β* = 1 defines the so-called "1/*f* noise," which represents the boundary between fGn and fBm processes.

##### 2.3. The ARFIMA Model

A third approach to long-range correlated processes is provided by the autoregressive fractionally integrated moving average (ARFIMA) models [18]. This approach is an extension of the ARIMA (autoregressive, integrated, moving average) framework, introduced by Box et al. [22], which was intended to represent a variety of short-term relationships in time series. ARIMA models are potentially composed of three components. The autoregressive component supposes that the current observation *y*(*t*) is determined by a weighted sum of the *p* previous observations, plus a random perturbation *ε*(*t*):

$$y(t) = \sum_{i=1}^{p} \phi_i\, y(t - i) + \varepsilon(t). \tag{9}$$

The moving average component supposes that the current observation depends on the value of the random perturbations that affected the *q* preceding observations, plus its own specific perturbation:

$$y(t) = \varepsilon(t) - \sum_{j=1}^{q} \theta_j\, \varepsilon(t - j). \tag{10}$$

Finally, the differencing parameter *d* indicates the number of differencing operations that should be applied to the series before modeling. An ARIMA model is the combination of these three components and can be designated by the respective orders of the three processes as (*p*, *d*, *q*). ARIMA modeling has been used either for generating time series with specified *p*, *d*, *q* parameters or for determining the best *p*, *d*, *q* combination for accounting for a given series [22, 23].

ARIMA models can be more conveniently expressed using the so-called *backshift operator*, defined as

$$B^k\, y(t) = y(t - k). \tag{11}$$

The generic ARIMA (*p*, *d*, *q*) model can then be rewritten as

$$\phi(B)\,(1 - B)^d\, y(t) = \theta(B)\,\varepsilon(t), \tag{12}$$

where *ϕ*(*B*) and *θ*(*B*) are, respectively, the autoregressive and the moving average operators, represented as polynomials in the backshift operator: $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$ [24]. In the initial formulation of the model, the *d* parameter was an integer [22]. Granger and Joyeux [18] showed that it was possible to provide this model with long-range dependence properties by allowing the differencing parameter *d* to take on fractional values, thereby obtaining an ARFIMA model.

Here we focus on the simplest model, ARFIMA(0, *d*, 0), which is supposed to only contain long-range correlations. Using the backshift operator notation, this model is expressed as follows [25]:

$$(1 - B)^d\, y(t) = \varepsilon(t), \tag{13}$$

with

$$(1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k = \sum_{k=0}^{\infty} \frac{\Gamma(k - d)}{\Gamma(k + 1)\,\Gamma(-d)}\, B^k. \tag{14}$$

Granger and Joyeux [18] derive a filter *A*(*B*) from equations (13) and (14) and demonstrate that the process can be rewritten as

$$y(t) = A(B)\,\varepsilon(t) = \sum_{k=0}^{\infty} \frac{\Gamma(k + d)}{\Gamma(k + 1)\,\Gamma(d)}\, \varepsilon(t - k), \tag{15}$$

where *d* is a measure of the intensity of long-range correlations in the series. Note, however, that ARFIMA models account only for stationary series, *d* being bounded within the interval ]−0.5, 0.5[. In other words, ARFIMA models remain limited to fGn series. *d* is related to the spectral exponent *β* and the scaling exponent *α* by the following linear equations:

$$\beta = 2d, \qquad \alpha = d + 0.5. \tag{16}$$

Each of these theoretical frameworks provides specific methods for generating fractal signals or for assessing the fractality of empirical series. In the present work, we engaged in a cross validation of three methods of generation and three methods of analysis. We selected one simulation method and one estimation method in each previously presented domain of definition. Our rationale is that biases revealed by all three analysis methods should be attributed to the generation method, and conversely, biases that appear whatever the generation method should originate in the analysis method.

#### 3. Methods

In order to explore the whole fGn/fBm continuum, we first generated series from *α* = 0.1 to 0.9, by steps of 0.1, and from *α* = 1.1 to 1.9, by steps of 0.1. These values were used in most previous similar studies [9, 10, 26]. Additionally, in order to analyze more closely the behavior of simulation and analysis methods around the 1/*f* boundary, we generated series from *α* = 0.91 to 1.09, by steps of 0.01. This range of exponents has rarely been considered in the literature (for a noticeable exception, see [27]). However, this focus on the 1/*f* boundary seems of particular interest, because the problems of fGn/fBm classification are concentrated within this interval [10] and also because one could have some doubts about the hypothesis of continuity around the 1/*f* boundary [28].

We used three methods of generation: the Davies–Harte algorithm, the spectral synthesis method, and the ARFIMA simulation method. These methods are detailed below. For each selected *α* value and with each method, we generated 120 series of 1024 data points. In this section, all methods are written in the discrete time and frequency domain; for reading convenience, we keep the variable *t* for discrete time domain and *f* for discrete frequency domain.

##### 3.1. Davies–Harte Algorithm (DH)

We used the algorithm proposed by Davies and Harte [13] for generating fGn series of length *N* (*N* being a power of 2). As previously indicated, an fGn process is fully described by its autocovariance function (see equation (5)). One can then deduce the exact spectral power *S*(*f*) expected for this autocovariance function, from the discrete Fourier transform of the following sequence *γ* of covariance values *γ*_{G1} defined by equation (5): *γ*_{G1}(0), *γ*_{G1}(1), …, *γ*_{G1}((*N*/2)−1); *γ*_{G1}(*N*/2), *γ*_{G1}((*N*/2)−1), …, *γ*_{G1}(1):

$$S(f) = \sum_{t=0}^{N-1} \gamma(t)\, \exp\left(-i 2\pi \frac{ft}{N}\right), \tag{17}$$

where *f* = 0, 1, …, *N* − 1. It is important to check that *S*(*f*) ≥ 0 for all *f*. Negativity would indicate that the sequence is not valid.

Let *W*(*f*), where *f* = 0, …, *N* − 1, be a white Gaussian noise. The randomized spectral amplitudes, *V*(*f*), are calculated according to the following equations:

$$\begin{aligned}
V(0) &= \sqrt{S(0)}\, W(0), \\
V(f) &= \sqrt{\tfrac{1}{2}S(f)}\left[W(2f - 1) + i\, W(2f)\right], \qquad 1 \le f < \tfrac{N}{2}, \\
V\!\left(\tfrac{N}{2}\right) &= \sqrt{S\!\left(\tfrac{N}{2}\right)}\, W(N - 1), \\
V(f) &= \overline{V(N - f)}, \qquad \tfrac{N}{2} < f \le N - 1.
\end{aligned} \tag{18}$$

Finally, the first *N* elements of the discrete Fourier transform of *V* are used to compute the simulated series *x*(*t*):

$$x(t) = \frac{1}{\sqrt{N}} \sum_{f=0}^{N-1} V(f) \exp\left(i 2\pi \frac{ft}{N}\right), \tag{19}$$

where *t* = 1, 2, …, *N*.
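For illustration, here is a minimal Python/NumPy sketch of the Davies–Harte procedure (the function name is ours). It follows the standard circulant-embedding formulation, with an embedding of length 2*N* and a complex Gaussian randomization of the spectrum, which is statistically equivalent to the interleaved real noise of equation (18) but more compact.

```python
import numpy as np

def davies_harte_fgn(H, N, rng=None):
    """Exact fGn of length N by circulant embedding of the fGn
    autocovariance (Davies-Harte method, sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # Autocovariance gamma(0) .. gamma(N) of a unit-variance fGn (equation (5))
    tau = np.arange(N + 1, dtype=float)
    gamma = 0.5 * ((tau + 1) ** (2 * H) - 2 * tau ** (2 * H)
                   + np.abs(tau - 1) ** (2 * H))
    # First row of the circulant embedding: gamma(0..N-1), gamma(N), gamma(N-1..1)
    row = np.concatenate([gamma[:-1], gamma[-1:], gamma[-2:0:-1]])
    S = np.fft.fft(row).real        # exact spectrum; must be nonnegative
    if np.any(S < -1e-8):
        raise ValueError("circulant embedding is not nonnegative definite")
    M = 2 * N
    # Complex Gaussian randomization of the spectrum, then Fourier transform
    W = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    x = np.fft.fft(np.sqrt(np.maximum(S, 0)) * W) / np.sqrt(M)
    return x.real[:N]               # first N elements give the fGn series
```

With *H* = 0.5 the procedure reduces to white Gaussian noise, which provides a quick sanity check.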

We first generated fGn series for *H* values ranging from 0.1 to 0.9, by steps of 0.1. In order to explore more precisely the performance of DFA close to the 1/*f* boundary, we also generated fGn series for *H* values ranging from 0.91 to 0.99, by steps of 0.01. A second set of fGn series was generated, for *H* values ranging from 0.1 to 0.9, by steps of 0.1, and was integrated for obtaining fBm series for each corresponding *H* value (i.e., for *α* values ranging from 1.1 to 1.9). Finally, we generated fGn series for *H* ranging from 0.01 to 0.09 and integrated them for obtaining fBm series close to the 1/*f* boundary (i.e., for *α* values ranging from 1.01 to 1.09).

##### 3.2. Spectral Synthesis Method (SSM)

The spectral synthesis method is designed to produce fBm and fGn series *x*(*t*) based on the characteristics of the power spectral density *S*(*f*) of these signals. In other words, the idea of SSM is to generate the right kind of *S*(*f*), one that gives rise to an fBm or fGn series with an exponent 0 < *H* < 1.

Since *S*(*f*) is obtained as the square of the modulus of the Fourier transform *X*(*f*) of the signal and is proportional to 1/*f*^{β} (see equation (8)),

$$S(f) = |X(f)|^2 \propto \frac{1}{f^{\beta}}. \tag{20}$$

We need to generate a complex series *X*(*f*) with a modulus *r*(*f*) proportional to 1/*f*^{β/2}. *r*(*f*) is obtained as follows:

$$r(f) = \frac{W_{gn}(f)}{f^{\beta/2}}, \tag{21}$$

where *W*_{gn} is a white Gaussian noise and *f* is the frequency, ranging from 1 to *N*/2.

To generate *X*(*f*), we also need to generate a random phase in radians:

$$\varphi(f) = 2\pi\, W_{un}(f), \tag{22}$$

where *W*_{un} is a white noise with uniform distribution.

We can then create the complex coefficients:

$$a(f) = r(f)\cos\left(\varphi(f)\right), \qquad b(f) = r(f)\sin\left(\varphi(f)\right). \tag{23}$$

These coefficients are extended to the whole range of expected values (in respect of the Shannon theorem), *a*(*f*) from (*N*/2) + 1 to *N* being equal to *a*(*f*) from (*N*/2) to 1, and *b*(*f*) from (*N*/2) + 1 to *N* being equal to *b*(*f*) from (*N*/2) to 1. Then, the complex series *X*(*f*) is generated as follows:

$$X(f) = a(f) + i\, b(f). \tag{24}$$

Finally, the inverse Fourier transform of this complex series is computed to obtain the time series:

$$x(t) = \operatorname{Re}\left[\frac{1}{N}\sum_{f=1}^{N} X(f) \exp\left(i 2\pi \frac{ft}{N}\right)\right], \tag{25}$$

where *t* = 1, 2, …, *N*.
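A minimal Python/NumPy sketch of SSM (the function name is ours) could look as follows. It draws the moduli and phases of equations (21) and (22) and mirrors the coefficients with conjugate symmetry, a common implementation choice that makes the inverse transform real.

```python
import numpy as np

def ssm_series(beta, N, rng=None):
    """Spectral synthesis: amplitudes proportional to f**(-beta/2) with
    random phases, inverse-Fourier-transformed to the time domain (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.arange(1, N // 2 + 1, dtype=float)
    r = rng.standard_normal(N // 2) / f ** (beta / 2)   # moduli, equation (21)
    phi = 2 * np.pi * rng.random(N // 2)                # phases, equation (22)
    X_half = r * np.exp(1j * phi)                       # a(f) + i b(f)
    # Assemble the full spectrum with Hermitian symmetry (real output)
    X = np.zeros(N, dtype=complex)
    X[1:N // 2 + 1] = X_half
    X[N // 2 + 1:] = np.conj(X_half[-2::-1])
    return np.fft.ifft(X).real
```

Fitting the log-log periodogram of a synthesized series recovers a slope close to −*β*, since the amplitudes are drawn directly from the target scaling law.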

##### 3.3. ARFIMA Simulation Method

We used the Matlab code proposed by Fatichi [14], based on equation (12) and, in the present case, as we limited ourselves to ARFIMA(0, *d*, 0) models, on equation (13). This code just bounds the summation in equation (15) to *k* = 100. We used this procedure for generating fGn series (*d* ∈ ]−0.5, 0.5[). In order to obtain the whole set of series we needed, we computed fBm series by cumulative summation.
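We did not reproduce Fatichi's Matlab code here; a hedged Python/NumPy equivalent of the truncated-filter idea (the function name is ours; the Γ ratios of equation (15) are computed recursively to avoid overflow) might look as follows.

```python
import numpy as np

def arfima0d0(d, N, K=100, rng=None):
    """Simulate ARFIMA(0, d, 0) by applying the filter A(B) of
    equation (15) to Gaussian white noise, truncated at k = K (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # Filter weights a_k = Gamma(k + d) / (Gamma(k + 1) Gamma(d)),
    # built by the recursion a_k = a_{k-1} * (k - 1 + d) / k, a_0 = 1
    a = np.ones(K + 1)
    for k in range(1, K + 1):
        a[k] = a[k - 1] * (k - 1 + d) / k
    eps = rng.standard_normal(N + K)
    return np.convolve(eps, a, mode="valid")   # length-N series
```

An fBm-like series is then obtained by cumulative summation, e.g., `np.cumsum(arfima0d0(0.3, 1024))`. Note that *d* = 0 makes every weight beyond *a*₀ vanish, so the output reduces to white noise.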

We now present the three estimation methods we used. Note that, for better readability, the exponents of the simulated series and the estimates are expressed or converted into *α* metrics. Estimates are denoted with a circumflex ($\hat{\alpha}_{DFA}$, $\hat{\alpha}_{PSD}$, $\hat{\alpha}_{ARFIMA}$).

##### 3.4. Detrended Fluctuation Analysis

The DFA algorithm works as follows, for a series *x*(*t*) of length *N*, where *t* = 1, 2, …, *N*. The series is first integrated, by computing for each *t* the accumulated departure from the mean of the whole series:

$$X(t) = \sum_{i=1}^{t} \left[x(i) - \bar{x}\right]. \tag{26}$$

This integrated series is then divided into *k* nonoverlapping intervals of length *n*. The last *N* − *kn* data points are excluded from analysis. Within each interval, a least squares line is fitted to the data. The series *X*(*t*) is then locally detrended by subtracting the theoretical values *X*^{th}(*t*) given by the regression. For a given interval length *n*, the characteristic size of fluctuation for this integrated and detrended series is calculated by the following equation:

$$F(n) = \sqrt{\frac{1}{kn} \sum_{t=1}^{kn} \left[X(t) - X^{th}(t)\right]^2}. \tag{27}$$

In the original algorithm, this computation is repeated over all possible interval lengths, for example, from *n* = 10 to *n* = *N*/2 [9]. In the present paper, we applied the *evenly spaced averaged* version of DFA [29], which significantly reduces the variability of estimates. This procedure consists in dividing the (log) abscissa into *P* bins of length (log(*n*_{max}) − log(*n*_{min}))/*P*, starting from log(*n*_{min}). The *P* bins are defined as follows:

$$\operatorname{bin}(p) = \left]\, \log(n_{\min}) + (p - 1)\,\frac{\log(n_{\max}) - \log(n_{\min})}{P},\ \log(n_{\min}) + p\,\frac{\log(n_{\max}) - \log(n_{\min})}{P} \,\right], \tag{28}$$

where *p* = 1, 2, …, *P*. In the present analyses, we set *n*_{min} = 10, *n*_{max} = 512, and *P* = 18.

Within each bin(*p*), the average interval length $\langle n \rangle_p$ and the average fluctuation size $\langle F(n) \rangle_p$ are computed. A power law is expected:

$$\langle F(n) \rangle_p \propto \langle n \rangle_p^{\alpha}. \tag{29}$$

The exponent estimate ($\hat{\alpha}_{DFA}$) is obtained as the slope of the double logarithmic plot of $\langle F(n) \rangle_p$ as a function of $\langle n \rangle_p$.
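The steps above can be sketched in Python/NumPy as follows (the function name is ours, and the binning details may differ marginally from [29]; this is an illustration, not the exact code used in the paper).

```python
import numpy as np

def dfa(x, n_min=10, n_max=None, P=18):
    """Evenly spaced averaged DFA (sketch): log-spaced bins of interval
    lengths, slope of log <F(n)> versus log <n>."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n_max = N // 2 if n_max is None else n_max
    X = np.cumsum(x - x.mean())                    # integration, equation (26)
    edges = np.logspace(np.log10(n_min), np.log10(n_max), P + 1)
    log_n, log_F = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):      # one pass per bin
        ns, Fs = [], []
        for n in range(int(np.ceil(lo)), int(hi) + 1):
            k = N // n
            if k < 2:
                continue
            segs = X[:k * n].reshape(k, n)
            t = np.arange(n)
            # least squares detrending within each interval, then RMS
            res = [seg - np.polyval(np.polyfit(t, seg, 1), t) for seg in segs]
            ns.append(n)
            Fs.append(np.sqrt(np.mean(np.concatenate(res) ** 2)))
        if ns:
            log_n.append(np.mean(np.log10(ns)))
            log_F.append(np.mean(np.log10(Fs)))
    return np.polyfit(log_n, log_F, 1)[0]          # slope = alpha estimate
```

White noise should yield an estimate near α = 0.5, and its cumulative sum (an ordinary random walk) an estimate near α = 1.5.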

##### 3.5. Power Spectral Density Analysis (PSD)

This method works on the basis of the periodogram obtained by the fast Fourier transform algorithm and exploits the power law given by equation (8). $\hat{\beta}$ is estimated as the negative of the slope (−*β*) of the line relating log(*S*(*f*)) to log(*f*).

In the present paper, we used the improved version proposed by Fougere [30] and modified by Eke et al. [10], designated as ^{low}PSD_{we}. This method uses a combination of preprocessing operations: first, the mean of the series is subtracted from each value, and then a parabolic window is applied: each value in the series is multiplied by the following function:

$$W(t) = 1 - \left(\frac{2t}{N + 1} - 1\right)^2, \tag{30}$$

where *t* = 1, 2, …, *N*.

Secondly, a bridge detrending is performed by subtracting from the data the line connecting the first and last points of the series:

$$\tilde{x}(t) = x(t) - \left[x(1) + \frac{(t - 1)\left(x(N) - x(1)\right)}{N - 1}\right], \tag{31}$$

where *t* = 1, 2, …, *N*.

The Fourier transform is applied on the modified series $\tilde{x}(t)$, and the fitting of *β* excludes the high-frequency power estimates (*f* > 1/8 of the maximal frequency). This method was shown by Eke et al. [10] to provide more reliable estimates of the spectral index. To allow a direct comparison between methods, *β* was then converted into *α* metrics by the simple linear transform $\hat{\alpha}_{PSD} = (\hat{\beta} + 1)/2$.
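The preprocessing chain of ^{low}PSD_{we} can be sketched in Python/NumPy as follows (the function name is ours; this illustrates the operations described above rather than reproducing the exact code of [10]).

```python
import numpy as np

def lowpsd_we(x):
    """^low PSD_we sketch: mean subtraction, parabolic window, bridge
    detrending, then -beta fitted on the lowest 1/8 of frequencies."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    t = np.arange(1, N + 1)
    y = x - x.mean()                                       # mean subtraction
    y = y * (1.0 - (2.0 * t / (N + 1) - 1.0) ** 2)         # parabolic window
    y = y - (y[0] + (t - 1) * (y[-1] - y[0]) / (N - 1))    # bridge detrending
    power = np.abs(np.fft.rfft(y)) ** 2
    f = np.arange(1, N // 2 + 1)
    keep = f <= (N // 2) / 8            # exclude f > 1/8 of maximal frequency
    slope = np.polyfit(np.log(f[keep]), np.log(power[1:][keep]), 1)[0]
    beta_hat = -slope
    return (beta_hat + 1) / 2           # conversion into alpha metrics
```

As with DFA, white noise should give an estimate near α = 0.5 and a random walk an estimate near α = 1.5.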

##### 3.6. ARFIMA Modeling

The *d* parameter was estimated by fitting an ARFIMA(0, *d*, 0) model to the series using the Whittle approximation of the maximum likelihood estimator. We used the ARFIMA(*p*, *d*, *q*) estimator package for Matlab proposed by Inzelt [31] on the Matlab central file exchange platform.

As previously explained, ARFIMA modeling holds only for fGn series, *d* being bounded within the interval ]−0.5, 0.5[. For fBm series, Diebolt and Guiraud [32] proposed to apply ARFIMA modeling to the corresponding fGn (obtained by differencing) and then to estimate the theoretical fractional parameter of the fBm series by adding 1 to the *d* value obtained from the fGn. This strategy, however, requires an a priori classification of the series as fGn or fBm. Some solutions, based on the preliminary application of methods working indifferently on fGn and fBm, such as DFA or PSD, have been proposed but yielded substantial percentages of misclassification around the 1/*f* boundary [9, 10].

In order to apply ARFIMA modeling indifferently on fGn and fBm series, as with DFA and PSD, we used the following strategy. The algorithm consists in finding the best Whittle approximation of the maximum likelihood estimator by constrained optimization. According to ARFIMA properties, the output parameter $\hat{d}$ is bounded within the interval [−0.4999, 0.4999], the algorithm yielding the upper bound value when it does not converge within the interval. When $\hat{d}$, obtained from the original series, was equal to 0.4999, the series was considered as fBm, and the algorithm was applied on the differenced time series in order to obtain the $\hat{d}$ estimate. This parameter estimate was then converted into *α* metrics by a simple linear transform: $\hat{\alpha}_{ARFIMA} = \hat{d} + 1.5$ in this nonstationary case, and $\hat{\alpha}_{ARFIMA} = \hat{d} + 0.5$ otherwise.
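The package of [31] implements a full ARFIMA(*p*, *d*, *q*) estimator; as an illustration of the strategy described above, the following Python/NumPy sketch (function names ours) uses a simple grid-searched Whittle objective for the ARFIMA(0, *d*, 0) case and applies the bound-detection rule to switch between the fGn and fBm branches.

```python
import numpy as np

def whittle_d(x, bound=0.4999):
    """Whittle estimate of d for ARFIMA(0, d, 0), by grid search over the
    scale-concentrated objective (simplified stand-in for [31])."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    j = np.arange(1, N // 2)                   # Fourier frequencies (no 0, no Nyquist)
    lam = 2 * np.pi * j / N
    I = np.abs(np.fft.fft(x - x.mean())[j]) ** 2 / (2 * np.pi * N)  # periodogram
    log_g0 = -2 * np.log(2 * np.sin(lam / 2))  # log spectral shape, per unit d

    def objective(d):
        g = np.exp(d * log_g0)                 # ARFIMA(0, d, 0) spectral shape
        return np.log(np.mean(I / g)) + np.mean(np.log(g))

    ds = np.linspace(-bound, bound, 2001)
    return ds[np.argmin([objective(d) for d in ds])]

def estimate_alpha(x):
    """If the estimate hits the upper bound, treat the series as fBm and
    re-estimate on its differences (strategy described in the text)."""
    d = whittle_d(x)
    if d >= 0.4999 - 1e-6:
        return whittle_d(np.diff(x)) + 1.5     # nonstationary (fBm) branch
    return d + 0.5                             # stationary (fGn) branch
```

On white noise the estimate should fall near α = 0.5, while a random walk should trigger the boundary rule and yield an estimate near α = 1.5.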

#### 4. Results

We present in Figure 1 the relationships between the true *α* and the mean $\hat{\alpha}$ values, for each simulation and estimation method. If we consider the global shape of results around the identity line, it seems obvious that estimation methods performed better when series were generated by the method belonging to the same domain (see the three graphs lying along the descending diagonal). This was already observed by Eke et al. [11].