Abstract

Extreme events, which are usually characterized by generalized extreme value (GEV) models, can exhibit long-term memory, whose impact needs to be quantified. It is known that extreme recurrence intervals can characterize the significant influence of long-term memory better than the GEV model. Our statistical analyses based on time series datasets following the Lévy stable distribution confirm that the stretched exponential distribution can describe a wide spectrum of memory behavior, ranging from exponentially distributed intervals (without memory) to power-law distributed ones (with strong memory or fractal scaling property), extending the previous evaluation of the stretched exponential function using Gaussian/exponential distributed random data. A further derivation and discussion of a historical paradox (i.e., the residual waiting time tends to increase with an increasing elapsed time under long-term memory) are also provided, based on the theoretical analysis of the Bayesian law and the stretched exponential distribution.

1. Introduction

Extreme events in complex systems have been widely explored for decades, including natural hazards such as extreme climate events [1], megalandslides [2], and earthquakes [3, 4] that pose severe economic, social, and environmental challenges. The clustering of extreme events implies the existence of long-term memory [5, 6]. Such clustering has been widely observed in river water levels [7], ocean temperature fluctuations [8], large-scale climate temperature [9], and so on. The generalized extreme value (GEV) distribution model (or the interval model) is designed to analyze the maximum within an interval (see Figure 1(a), where the artificial random data are generated using the Lévy stable distribution, which will be further discussed in Section 2). According to the traditional extreme value theorem, these maxima converge to one of the three generalized extreme value distributions: Fréchet, Gumbel, and Weibull [10]. Although the GEV model has achieved many successful empirical results [10, 11, 12], it is a statistical model based on independently and identically distributed (i.i.d.) data that investigates the characteristics of the probability density distribution without the impact of temporal memory [13]. The tail of the distribution, which has low probability but high impact, cannot be predicted accurately using traditional extreme value statistics, since it is impossible to obtain an effective description from the spatial probability density distribution or under the i.i.d. assumption.

Previous studies have confirmed that the recurrence time analysis (see Figure 1(b)) is a powerful tool to characterize the temporal scaling properties and derive quantitative risk estimates of hazardous events [14]. This method uses experimental data more efficiently and characterizes the physical correlations across time scales. Meanwhile, previous studies show that the recurrence of extreme events does not necessarily follow a pure memoryless Poisson distribution [15, 16].

This study aims at investigating extreme events with memory using two major steps. First, we will identify the memory effect embedded in extreme events. The memory effect can be characterized by the autocorrelation function [17, 18], given, for example, a normalized time series $\{x_i\}$ (where $i = 1, \ldots, N$):

$$C(s) = \langle x_i x_{i+s} \rangle = \frac{1}{N-s}\sum_{i=1}^{N-s} x_i x_{i+s} \sim s^{-\gamma}, \tag{1}$$

with a power-law decay and the correlation exponent $\gamma$. In this study, we use the detrended fluctuation analysis (DFA) [21] to detect this long-term correlated behavior with Hurst exponent $H$. For long-term correlated data, the Hurst exponent equals [22–25]

$$H = 1 - \frac{\gamma}{2}, \tag{2}$$

where $0 < \gamma < 1$. The stretched exponential distribution with the correlation exponent $\gamma$ proposed in [17–20] is then adopted to characterize the recurrence time.
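As an illustration of how a Hurst exponent can be estimated with DFA, the following is a minimal sketch and not the implementation used in this study. The function name `dfa_hurst`, the choice of window sizes, and the first-order (linear) detrending are assumptions made for the example.

```python
import numpy as np

def dfa_hurst(x, scales=None, order=1):
    """Estimate the DFA fluctuation exponent of a 1-D series (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    profile = np.cumsum(x - x.mean())  # integrated (profile) series
    if scales is None:
        # Log-spaced window sizes between 16 and n/8 (an arbitrary illustrative choice).
        scales = np.unique(
            np.logspace(np.log10(16), np.log10(n // 8), 12).astype(int)
        )
    fluct = []
    for s in scales:
        n_win = n // s
        # Non-overlapping windows of length s, one window per row.
        segments = profile[: n_win * s].reshape(n_win, s)
        t = np.arange(s)
        # Fit a polynomial trend to every window at once (one column per window).
        coeffs = np.polyfit(t, segments.T, order)
        trend = np.vander(t, order + 1) @ coeffs
        resid = segments.T - trend
        fluct.append(np.sqrt(np.mean(resid ** 2)))
    # The slope of log F(s) versus log s is the fluctuation exponent.
    slope, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return slope
```

For uncorrelated noise the estimate should lie close to 0.5, while long-term correlated stationary data should give values between 0.5 and 1. The estimate depends on the range of window sizes, so the fitted scaling range should be checked before the exponent is interpreted.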

Second, we will explore the influence of temporal memory on the forecast of extreme events based on the artificial data following the Lévy stable distribution. Given the memory of previous events, the estimate of the probability of a future event may be improved. To directly quantify this influence, we refer to Davis et al. [26]: “The longer it has been since the last earthquake, the longer the expected time till the next?” Therefore, we will apply the stretched exponential distribution, which is a widely used statistical model describing temporal memory features, to explore the possible “paradox” between the residual waiting and elapsed times. By extending the numerical analysis in the literature [26–28], Sornette and Knopoff proposed a rigorous statistical framework for a quantitative conditional probability response and found that this framework is very sensitive to the assumed distribution [29]. Here, we attempt to offer a derivation of this paradox (residual waiting time increases with the elapsed time under long-term memory), based on the theoretical analysis of the Bayesian law and the stretched exponential distribution.

It is also noteworthy that we select the Lévy stable distribution to quantify the heavy-tailed distribution of time series when analyzing extreme events. Caution is needed when generating time series data with long-term memory, because the non-Gaussian distribution feature of power-law processes cannot be well analyzed using traditional statistical models, such as the Gaussian distribution and lognormal distribution [17, 18, 30]. Based on extensive successful investigations of the Lévy stable distribution in real-world applications [31, 32], here we characterize the heavy-tail behavior of time series using the Lévy stable distribution with a stability index $\alpha$ ($0 < \alpha \le 2$). We apply the Lévy stable distribution and the Hurst exponent in linear fractional stable noise (LFSN) to simulate heavy-tail and long-term memory processes and then investigate the extremal behavior using the methods described above.

The rest of this work is organized as follows. In Section 2, we introduce the LFSN model and explain the simulation parameters used to test the extreme value statistical behavior. In Section 3, we show the defects of traditional extreme value statistical models in describing the temporal behavior. The influence of temporal memory of the recurrence interval is then described using the stretched exponential distribution. In addition, based on the Bayesian theory and the stretched exponential statistics model, the “paradox” mentioned above is deduced in principle. Conclusions are drawn in Section 4.

2. Methods

2.1. Random Number Generation

The Lévy stable distribution includes four parameters: stability index ($\alpha$), skewness parameter ($\beta$, with $-1 \le \beta \le 1$), scale parameter ($\sigma$), and location parameter ($\mu$). We employ the Lévy stable distribution to provide insight into the heavy-tail probability distribution, and the simulated heavy-tail fluctuation process is controlled by the stability index $\alpha < 2$ in this study [33]. In the following, we use the random number generation method for the Lévy stable distribution proposed by Chambers et al. [34]. More details of the algorithm can be found in [35].
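The generator of Chambers et al. can be sketched for the symmetric case (zero skewness) as follows. This is a minimal sketch under that restriction, not the full four-parameter algorithm. The function name `symmetric_stable_rvs` and its argument names are hypothetical.

```python
import numpy as np

def symmetric_stable_rvs(alpha, size, scale=1.0, loc=0.0, rng=None):
    """Draw symmetric alpha-stable variates (skewness 0) by the
    Chambers-Mallows-Stuck transformation of a uniform angle and an
    exponential variate."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    rng = np.random.default_rng() if rng is None else rng
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # unit-mean exponential
    x = (
        np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha)
    )
    return loc + scale * x
```

Two limiting cases give a quick sanity check: `alpha = 2` yields Gaussian variates with variance 2 for unit scale, and `alpha = 1` yields Cauchy variates. For `alpha < 2` the sample variance does not converge, which is the heavy-tail property exploited in this study.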

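Linear fractional stable motion (LFSM) is a generalization of fractional Brownian motion (fBm) [36]. LFSN, which is the increment process of LFSM, displays both abnormal fluctuations and long-term memory through the Hurst exponent $H$ and the stability index $\alpha$. The LFSM stochastic process is given as follows [37]:

$$L_{\alpha,H}(t) = \int_{-\infty}^{\infty} f_{\alpha,H}(a, b; t, x)\, M_{\alpha}(dx), \tag{3}$$

where

$$f_{\alpha,H}(a, b; t, x) = a\left[(t-x)_{+}^{H-1/\alpha} - (-x)_{+}^{H-1/\alpha}\right] + b\left[(t-x)_{-}^{H-1/\alpha} - (-x)_{-}^{H-1/\alpha}\right], \tag{4}$$

in which $a$ and $b$ are real constants with $|a| + |b| > 0$, $x_{+} = \max(x, 0)$, $x_{-} = \max(-x, 0)$, $0 < H < 1$, and $M_{\alpha}$ is a standard symmetric $\alpha$-stable Lévy random measure on $\mathbb{R}$. Linear fractional stable noise, as the LFSM increment process, is stated as

$$Y_{\alpha,H}(t) = L_{\alpha,H}(t+1) - L_{\alpha,H}(t), \tag{5}$$

where $Y_{\alpha,H}(t)$ presents long-term memory when $H > 1/\alpha$, and it reduces to the fractional Gaussian noise when $\alpha = 2$. In our artificial data generation, the stability index $\alpha$, the Hurst parameter $H$ (which corresponds to the correlation exponent $\gamma$; see (2)), and the number of generated data $N$ are fixed.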
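To make the construction of LFSN more concrete, the sketch below approximates the noise by a truncated discrete moving average of symmetric stable innovations. It is a crude illustration and not the generator used in this study. The assumptions are the kernel choice $a = 1$, $b = 0$, a unit-step discretization, the truncation length `memory`, and the restriction to $H \ge 1/\alpha$. The function name `lfsn_moving_average` is hypothetical.

```python
import numpy as np

def lfsn_moving_average(n, alpha, hurst, memory=2000, rng=None):
    """Approximate linear fractional stable noise by a truncated moving
    average with weights (j + 1)**d - j**d, where d = H - 1/alpha >= 0."""
    d = hurst - 1.0 / alpha
    if not (0.0 < alpha <= 2.0 and 0.0 < hurst < 1.0 and d >= 0.0):
        raise ValueError("this sketch needs 0 < alpha <= 2, 0 < H < 1, H >= 1/alpha")
    rng = np.random.default_rng() if rng is None else rng
    m = n + memory - 1
    # Symmetric alpha-stable innovations (Chambers-Mallows-Stuck, skewness 0).
    v = rng.uniform(-np.pi / 2, np.pi / 2, m)
    w = rng.exponential(1.0, m)
    eps = (
        np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha)
    )
    # Discretized increment kernel; the j = 0 lower term is set to zero so
    # that d = 0 reduces to independent innovations.
    j = np.arange(memory, dtype=float)
    lower = j ** d
    lower[0] = 0.0
    kernel = (j + 1.0) ** d - lower
    # 'valid' keeps only outputs that use a full kernel window, giving length n.
    return np.convolve(eps, kernel, mode="valid")
```

A call such as `lfsn_moving_average(10000, alpha=1.8, hurst=0.8)` returns a series with both heavy tails and positive dependence. The truncation at `memory` lags cuts off the slowest part of the correlation decay, so a spectral or FFT-based generator is preferable when an accurate long-range structure is required.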

2.2. Influence of Long-Term Memory and Non-Gaussian Processes on GEV Statistics

In the classical GEV model, one assumes that $X_1, X_2, \ldots, X_n$ are independent and identically distributed data described by the cumulative distribution function $F(x)$. The maximum value $M_n = \max(X_1, \ldots, X_n)$ is also an element of the original data. Hence, the distribution of the maxima satisfies

$$P(M_n \le x) = P(X_1 \le x, \ldots, X_n \le x) = F^{n}(x). \tag{6}$$

According to the Fisher-Tippett extreme value theorem, if there exist constant sequences $a_n > 0$ and $b_n$ such that $P[(M_n - b_n)/a_n \le x] \to G(x)$ as the number of data $n \to \infty$, where $G(x)$ is a nondegenerate distribution function, then $G(x)$ must be one of the three types of extreme value distributions, depending on the distribution of the original data [9]. For original data following a power-law distribution, the limit is the Fréchet distribution, or type II distribution, which is defined as

$$G(x) = \begin{cases} 0, & x \le \mu, \\ \exp\left\{-\left[(x-\mu)/\sigma\right]^{-\xi}\right\}, & x > \mu, \end{cases} \tag{7}$$

where $\mu$ is the location parameter, $\sigma$ represents the scale parameter, and $\xi$ is the shape parameter.
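The relation between the parent distribution and the distribution of the block maximum for i.i.d. data can be checked numerically. The following minimal sketch extracts block maxima and evaluates an empirical distribution function. The helper names `block_maxima` and `empirical_cdf_at` are hypothetical.

```python
import numpy as np

def block_maxima(x, block_size):
    """Return the maximum of each consecutive, non-overlapping block.
    Trailing samples that do not fill a whole block are dropped."""
    x = np.asarray(x, dtype=float)
    n_blocks = x.size // block_size
    return x[: n_blocks * block_size].reshape(n_blocks, block_size).max(axis=1)

def empirical_cdf_at(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return float(np.mean(np.asarray(sample) <= x))
```

For uniform data on $[0, 1]$ the parent distribution function is $F(x) = x$, so the empirical distribution of maxima over blocks of size $n$ evaluated at $x$ should approach $x^{n}$. Applying the same two functions to a long-term correlated series and to its shuffled copy gives the kind of comparison discussed for Figure 2.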

Figure 2 shows a comparison of the probability density distributions of the block maxima for the correlated data and the uncorrelated shuffled data, with the block size corresponding to annual maxima. The distribution of the long-term memory data (blue) shifts to the left compared with that of the uncorrelated sequence, and its left tail is clearly broader, which is consistent with the result reported in [18]. It is noteworthy that the skewness of the probability density distribution of the correlated data following the Lévy stable distribution considered in this study is apparently larger than that of data following the Gaussian and exponential distributions considered in [18]. This discrepancy arises mainly because, compared with the Gaussian and exponential distributions, the Lévy stable distribution is dominated by its central part while retaining a heavy tail. It is also clear that the distribution of the long-term memory data (the blue line in Figure 2) is more spread out than that of the uncorrelated sequence, especially on the left-hand side. This result indicates the influence of memory, or the role of correlation, which clusters many large events in certain time intervals while the maximum values in other periods are generally small. Because the large values are still identified as annual maxima, the right tail of the extreme value distribution is almost unaffected by the correlations. The GEV model therefore cannot respond clearly to the temporal behavior.

When we investigate the extreme value problem, estimating the extreme value associated with a given return period is an important task. For the $T$-year maximum value $x_T$, the corresponding probability is $G(x_T) = 1 - 1/T$. Therefore, the maximum value of the return period $T$ is estimated as the quantile of the probability $1 - 1/T$, and we can get

$$x_T = \mu + \sigma\left[-\ln\left(1 - \frac{1}{T}\right)\right]^{-1/\xi}, \tag{8}$$

where $\mu$, $\sigma$, and $\xi$ are the location, scale, and shape parameters of the Fréchet distribution.

Here, we let $T = 100$ and then estimate the hundred-year maximum for the two data sets with different memory behavior (analyzed in Figure 2). Through (8), we obtain the hundred-year maxima of the correlated and the shuffled data and can compare them. It is noteworthy that the obtained return value is strongly one-sided in a practical sense, and the traditional extreme value model does not describe time-related behavior.
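The return-level calculation can be illustrated with a short sketch. It assumes the Fréchet form $G(x) = \exp\{-[(x-\mu)/\sigma]^{-\xi}\}$ for $x > \mu$, where the symbols $\mu$, $\sigma$, and $\xi$ for location, scale, and shape are notation chosen for this example. The function names are hypothetical, and the parameter values in any call are illustrative rather than fitted values from this study.

```python
import math

def frechet_cdf(x, mu, sigma, xi):
    """Frechet (type II) distribution function under the assumed parametrization."""
    if x <= mu:
        return 0.0
    return math.exp(-(((x - mu) / sigma) ** (-xi)))

def frechet_return_level(T, mu, sigma, xi):
    """Quantile of the Frechet distribution at probability 1 - 1/T,
    i.e. the level exceeded on average once every T blocks."""
    if T <= 1:
        raise ValueError("return period T must exceed 1")
    p = 1.0 - 1.0 / T
    return mu + sigma * (-math.log(p)) ** (-1.0 / xi)
```

Substituting the return level back into the distribution function recovers $1 - 1/T$, which is a convenient consistency check. Because the level depends only on the fitted marginal parameters, two series with the same fitted parameters but different temporal ordering receive the same return level, which illustrates why this quantity carries no information on clustering.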

3. Results and Discussion

3.1. Statistics of Extreme Recurrence Times

We analyzed the return intervals $r_q$ between events exceeding a threshold $q$ to obtain the return time statistics of long-term correlated time series. For uncorrelated data, such as “white noise,” the return intervals are also uncorrelated and follow the exponential distribution according to Poisson statistics [12]. When the return interval is affected by long-term correlation, its distribution $P_q(r)$ exhibits a significantly slower decay than the Poisson exponential distribution. This slower decay can be captured by the stretched exponential distribution [38, 39]:

$$P_q(r) = \frac{a_\gamma}{R_q}\exp\left[-b_\gamma\left(\frac{r}{R_q}\right)^{\gamma}\right], \tag{9}$$

where the exponent $\gamma$ is the correlation exponent that characterizes the memory of the data, the parameters $a_\gamma$ and $b_\gamma$ are independent of $q$, and $R_q$ is the average of the return interval at the given threshold $q$. In a study of the universality of (9), the return interval distributions of data with four different original distributions (Gaussian, exponential, power-law, and lognormal) were fitted in [14], and the results show that the stretched exponential agrees well with the Gaussian data and also performs well for the other distributions. It is also worth noting that the stretched exponential distribution of the recurrence time can be derived exactly from a deeper process, namely, the Hawkes process of interevent triggering [40].
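Extracting return intervals from a series and rescaling them by their mean can be sketched as follows. The function names `return_intervals` and `rescaled_interval_density` are hypothetical, and the histogram binning is left to the caller.

```python
import numpy as np

def return_intervals(x, q):
    """Time steps between successive exceedances of the threshold q."""
    idx = np.flatnonzero(np.asarray(x) > q)
    return np.diff(idx)

def rescaled_interval_density(intervals, bins):
    """Histogram estimate of the density of r / R_q, where R_q is the mean
    return interval. This density corresponds to R_q * P_q(r)."""
    r = np.asarray(intervals, dtype=float)
    mean_r = r.mean()
    density, edges = np.histogram(r / mean_r, bins=bins, density=True)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, density, mean_r
```

For uncorrelated data the mean return interval is close to the reciprocal of the exceedance probability, and the rescaled density should follow a simple exponential. Running the same functions on a correlated series and on its shuffled copy separates the effect of memory from the effect of the marginal distribution.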

Figure 3 shows the rescaled distribution $R_q P_q(r)$ of the return intervals as a function of $r/R_q$, where $R_q$ is the mean return interval, for both the original data (red symbols) and the shuffled data (rescaled by $10^{-1}$, shown by the black symbols). In both cases, since $R_q P_q(r)$ reflects only the variation of the ratio $r/R_q$, the application of the recurrence time analysis is no longer limited to the actual threshold $q$. Figure 3 also confirms that the distribution function is exponential for the shuffled data.

Compared with the results of the uncorrelated data, the influence of the exponent $\gamma$ in (9) makes the return intervals exhibit an obvious two-stage differentiation. More intuitively, return intervals both well below and well above the mean return interval $R_q$ ($r \ll R_q$ and $r \gg R_q$) are considerably more frequent for records with memory than for the uncorrelated data. This means that the mean $R_q$ is a poor description, because the analyzed object has no typical scale, or the “characteristic” scale is missing, which is broadly referred to as a “scale-free” phenomenon [41]. It also implies that the distribution changes from the exponential ($\gamma = 1$) to the power-law distribution ($\gamma \to 0$) when the index $\gamma$ decreases (the degree of correlation increases) [42]. The stretched exponential distribution is therefore a subslow decay distribution between the exponential distribution and the power-law distribution with $0 < \gamma < 1$, in which the power-law relation is a statistical form of fractal that emphasizes the similarity of all scales [19, 43]. Therefore, the subslow decay of the stretched exponential distribution is in fact a scale-free statistical form, or the result of a transition from a nonsimilar structure to a fully statistical fractal structure.
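The exponential limit can be made explicit under one additional assumption that is not stated in the text: if the rescaled density $a\exp(-b x^{\gamma})$, with $x = r/R_q$, is required to integrate to one and to have unit mean, the two constants follow from gamma functions of $\gamma$ alone. The sketch below computes them. The function name is hypothetical, and fitted values from data may differ from these constrained values.

```python
import math

def stretched_exp_constants(gamma):
    """Constants (a, b) of a * exp(-b * x**gamma) on x >= 0 fixed by
    unit normalization and unit mean (assumed constraints)."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    g1 = math.gamma(1.0 / gamma)
    g2 = math.gamma(2.0 / gamma)
    b = (g2 / g1) ** gamma                 # from the unit-mean condition
    a = gamma * b ** (1.0 / gamma) / g1    # from the normalization condition
    return a, b
```

For `gamma = 1` both constants equal one, which recovers the plain exponential density. For smaller `gamma` both constants grow, which raises the density at very short intervals while the tail decays more slowly.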

3.2. The “Paradox” Phenomenon of the Residual Waiting Time under Long-Term Memory Effect

According to the discussion in the above section, the “cluster” phenomenon means that the data have a long memory effect, and the occurrence of events no longer follows a simple memoryless Poisson process with exponentially distributed intervals. The slowly decaying stretched exponential distribution reflects the occurrence of events as a “scale-free” process. This temporal behavior can be quantified from prior events, so the occurrence of the next event becomes correspondingly predictable. In general, it reflects the dependence of the last return interval on the previous interval. The waiting time to the next event in the time interval also follows the stretched exponential distribution.

Bunde et al. [17] discussed the existence of memory effects according to the simulation data and obtained the “paradox” phenomenon of the residual waiting time that increases with increasing and . Here, we try to derive a specific demonstration and quantification to the abnormal results, based on the fitting parameters for the stretched exponential distribution shown in Figure 3. In the following derivation, we adopt the numerical analysis method introduced by Sornette and Knopoff [29].

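First, we assume $P(r)$ to be the return interval distribution, and the unknown residual waiting time $\tau$ satisfies the prior hypothesis that no event has occurred during the elapsed time $t$ since the last event. According to the Bayesian conditional probability theorem, the distribution function of $\tau$ satisfies

$$P(\tau \mid t) = \frac{P(t+\tau)}{\int_{t}^{\infty} P(r)\,dr}. \tag{10}$$

Here, we let the expected waiting time $\langle\tau\rangle(t)$ be a function of the elapsed time $t$ and analyze how it varies with $t$. From (10), the expected waiting time is calculated as

$$\langle\tau\rangle(t) = \int_{0}^{\infty} \tau\,P(\tau \mid t)\,d\tau = \frac{\int_{0}^{\infty} \tau\,P(t+\tau)\,d\tau}{\int_{t}^{\infty} P(r)\,dr}. \tag{11}$$

To develop some intuition, we first analyze the Poisson exponential distribution $P(r) = R_q^{-1}\exp(-r/R_q)$. From (10), we can get

$$P(\tau \mid t) = \frac{1}{R_q}\exp\left(-\frac{\tau}{R_q}\right). \tag{12}$$

Corresponding to the exponential distribution without memory, the estimate of the waiting time does not depend on the elapsed time $t$, with $\langle\tau\rangle = R_q$. As for the stretched exponential, from (9) and (10), we get

$$P(\tau \mid t) = \frac{\exp\left[-b_\gamma\left(\frac{t+\tau}{R_q}\right)^{\gamma}\right]}{\int_{t}^{\infty}\exp\left[-b_\gamma\left(\frac{r}{R_q}\right)^{\gamma}\right]dr}. \tag{13}$$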

We then calculate (13) using the Gauss-Kronrod integration method. Figure 4 shows the conditional distribution $P(\tau \mid t)$ of the residual waiting time $\tau$ at three values of the elapsed time $t$, based on the fitting results of Figure 3. It is evident that $P(\tau \mid t)$ develops a progressively broader tail as $t$ increases; that is, the probability of a large residual waiting time increases as the elapsed time increases. The answer to the question behind the anomalous result is therefore affirmative, and this property is evidently connected with the slow decay of the return interval distribution compared with the Poisson exponential distribution.
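The dependence of the expected residual waiting time on the elapsed time can be reproduced qualitatively with a short numerical sketch. It makes two simplifications that should be read as assumptions: a plain trapezoidal rule on a truncated grid replaces the Gauss-Kronrod quadrature, and the constants of the stretched exponential are fixed by unit normalization and unit mean rather than taken from the fits in Figure 3. Times are measured in units of the mean return interval, and the function names are hypothetical.

```python
import math
import numpy as np

def stretched_exp_pdf(x, gamma):
    """Density a * exp(-b * x**gamma) with unit norm and unit mean (assumed)."""
    b = (math.gamma(2.0 / gamma) / math.gamma(1.0 / gamma)) ** gamma
    a = gamma * b ** (1.0 / gamma) / math.gamma(1.0 / gamma)
    return a * np.exp(-b * np.asarray(x, dtype=float) ** gamma)

def _trapezoid(y, x):
    """Composite trapezoidal rule on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def expected_residual_time(t, gamma, tau_max=2000.0, n=400001):
    """Mean residual waiting time given that a time t has elapsed without
    an event, from the conditional density P(t + tau) / integral_t^inf P."""
    tau = np.linspace(0.0, tau_max, n)
    p = stretched_exp_pdf(t + tau, gamma)
    survival = _trapezoid(p, tau)  # truncated integral of P from t upward
    return _trapezoid(tau * p, tau) / survival
```

With `gamma = 1` the function returns approximately one for any elapsed time, which is the memoryless case. With `gamma = 0.5` it rises from about 1 at `t = 0` to about 1.58 at `t = 1`, in line with the increase of the residual waiting time with elapsed time. The truncation at `tau_max` must be enlarged for smaller `gamma`, because the tail of the density becomes heavier.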

When we further investigate the relationship between the expected waiting time $\langle\tau\rangle$ (see (11)) and the elapsed time $t$ based on the fitting results in Figure 3, we find that the expected waiting time depends on $t$, in contrast to the exponential distribution of the memoryless data. The result clearly displays the effect of different long-term correlations, where the expected residual time to the next event increases with increasing $t$. At the same time, the degree of the anomalous behavior increases, and this change is enhanced by decreasing the correlation exponent $\gamma$. This result implies that the memory effect exists not only between adjacent return intervals but also, within the unfinished interval, between the waiting time $\tau$ and the elapsed time $t$. This result confirms the finding of the previous statistical analysis [17].

4. Conclusion

This study investigates the influence of the memory effect on extreme value models, based on random data generated by the Lévy stable distribution, which differs from the previous evaluations using the Gaussian or exponential distribution in [17, 18]. Combining non-Gaussian statistics and the memory effect, the LFSN is used to simulate the experimental data. The simulation result shows that the stretched exponential distribution provides a reliable way to estimate the scaling behavior of extreme event intervals, generalizing the previous evaluation of the stretched exponential function for random data following the Gaussian and exponential distributions. Using the Bayesian conditional statistical principle in conjunction with the stretched exponential distribution, we also theoretically validate the “anomalous” behavior identified by various studies [17, 18] (where the residual waiting time can increase with an increasing elapsed time under long-term memory, or the so-called “anomalous residence time”), which may shed light on real-world extreme event prediction.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant nos. 11572112, 41628202, and 41330632 and the Fundamental Research Funds for the Central Universities under Grant no. 2017B21614.