#### Abstract

We search for the signature of universal properties of extreme events, theoretically predicted for Axiom A flows, in a chaotic and high-dimensional dynamical system. We study the convergence of GEV (Generalized Extreme Value) and GP (Generalized Pareto) shape parameter estimates to the theoretical value, which is expressed in terms of the partial information dimensions of the attractor. We consider a two-layer quasi-geostrophic atmospheric model of the mid-latitudes, adopt two levels of forcing, and analyse the extremes of different types of physical observables (local energy, zonally averaged energy, and globally averaged energy). We find good agreement in the shape parameter estimates with the theory only in the case of more intense forcing, corresponding to a strong chaotic behaviour, for some observables (the local energy at every latitude). Due to the limited (though very large) data size and to the presence of serial correlations, it is difficult to obtain robust statistics of extremes in the case of the other observables. In the case of weak forcing, which leads to weaker chaotic conditions with regime behaviour, we find, unsurprisingly, worse agreement with the theory developed for Axiom A flows.

#### 1. Introduction and Motivation

The investigation of extreme events is extremely relevant for a range of disciplines in mathematical, natural, and social sciences and engineering. Understanding the large fluctuations of the system of interest is of great importance from a theoretical point of view, but also when it comes to assessing the risk associated with low probability and high impact events. In many cases, in order to gauge preparedness and resilience properly, one would like to be able to quantify the return times for events of different intensity and take suitable measures for preventing the expected impacts. Prominent examples are weather and climate extremes, which can have a huge impact on human society and natural ecosystems. The present uncertainty in the future projections of extremes makes their study even more urgent and crucial [1].

In practical terms, the main goal behind the study of extreme events is to understand the properties of the highest quantiles of the variable of interest. A fundamental drawback comes from the fact that extreme events are rare, so that it is difficult to collect satisfactory statistics from the analysis of a time series of finite length. Additionally, in the absence of a strong mathematical framework, it has its limits to make quantitative statements about the probability of occurrence of events larger than observed. Therefore, statistical inference based on empirical models tends to suffer from the lack of predictive power. A theoretical framework for analysing extreme events is provided by extreme value theory (EVT). After the early contributions by Fisher and Tippett [2], EVT was introduced by Gnedenko [3], who discovered that under rather general conditions the extreme events associated with stochastic variables can be described by a family of parametric distributions. As a result of this, the problem of studying the extremes of a stochastic variable can be reduced to estimating the parameters of a known probability distribution. The presumably most important parameter of such a distribution is called the shape parameter, which determines the qualitative properties of the distribution and, in particular, whether it has a finite or infinite upper point or, more concretely, whether extremes are bounded by an absolute finite maximum or not.

The two most important properties of extremes—their rare occurrence and their unusually high or low magnitude—constitute the basis of two popular methods of EVT, the block maxima (BM) and the peak-over-threshold (POT) approaches. The BM approach aims at finding the limiting distribution of maxima of independent identically distributed random (i.i.d.r.) variables separated into blocks of size , as . Under rather general conditions and for many different parent distributions, the limiting distributions of block maxima belong to the Generalized Extreme Value (GEV) distribution family; this is the point of view originally proposed by Gnedenko [3]. The POT approach aims at understanding the statistical properties of the exceedances of the random variable above a given threshold . Under the same conditions as for the BM approach, if is very high, the distribution of the threshold exceedances belongs to the Generalized Pareto (GP) distribution family [4, 5]. The existence of well-defined functional forms for the distributions describing extreme events provides predictive power: one can in principle compute the return time of events of yet unobserved magnitude. It is remarkable that these two points of view on extremes (which lead to different selection procedures and different choices of the events classified as extremes) are, in fact, equivalent. For a given random variable , the GEV and GP distributions describing its extreme events have the same asymptotic behaviour. Additionally, it is possible to express the parameters characterizing one distribution as a function of the parameters characterizing the other distribution; in particular, the shape parameter is the same for both.

Problems in applying EVT to actual time series result from the fact that, typically, the observed data feature a certain degree of serial correlations [6]. Note that the setting behind the construction of EVT can be extended by relaxing the hypothesis of independence of the random variables, as long as correlations are decaying sufficiently fast. This is of clear relevance when trying to use EVT for studying observables of deterministic dynamical systems. In this case, in fact, the underlying dynamics determines the existence of correlations between the values of observables at different times, and one can easily guess that, when the dynamical system is chaotic, there is good hope of deriving EVT for its observables [7]. Obtaining the true limiting EVT can be extremely hard, even in simple dynamical systems [8], or, in that case, it can be argued that they do not even exist [9]. When analysing finite time series, the convergence of the estimated GEV or GP shape parameters to near-zero asymptotic values can be very slow. The speed of convergence depends on the type of parent distribution [10] and can be additionally slowed down by correlations [11, 12]. Due to the fact that the dataset size is always limited, there is typically a difference between the asymptotic GEV or GP parameters and the estimated ones; finite-size estimates are generally biased. For example, the GEV shape parameter of a simple Gaussian process is 0, but, for any finite time series, we would estimate typically a negative shape parameter [2].

When performing statistical inference using the BM or POT method (fitting the GEV or GP model, resp., to data), it is crucial to have an appropriate protocol of selection of “good" candidates for extremes [12]. On the one hand, if the chosen blocks (for the BM method) are too short or the threshold (for the POT method) is too low, the approximation of the limit model is likely to be inadequate. Hence, the verification of the agreement between the statistical model and the available data is essential, which can be done based on goodness-of-fit tests, like the Kolmogorov-Smirnov [13], Anderson-Darling [14], or Pearson’s chi-squared tests [15]. On the other hand, if the blocks are too large or the threshold is too high, the number of extremes may be insufficient for a reliable estimation of the parameters, and the uncertainty becomes very high. As discussed later in the paper, Coles [12] shows how to derive an optimal choice for the value of the block size or the threshold, in such a way as to verify that we are close to the asymptotic level as required by EVT but we use the available data as efficiently as possible. In this regard, we introduce a new way of choosing an optimal block size or threshold based on a concept of accuracy of estimation, which can be actually carried out only when the true shape parameter is known from theory.

Classical EVT has been extended and adapted to analyse extremes of observables of chaotic dynamical systems, where the sensitive dependence on initial conditions is fundamentally responsible for generating a de facto stochastic process. The reader is referred to Lucarini et al. [7] for a detailed overview of the field of EVT for dynamical systems. It is possible to establish an EVT for observables of chaotic dynamical systems when one considers Axiom A dynamical systems. These are rather special chaotic dynamical systems that are uniformly hyperbolic on their attractor (so that stable and unstable directions are well separated), which supports a so-called Sinai-Ruelle-Bowen (SRB) measure. Such an invariant measure has physical relevance because it is stable against weak stochastic perturbations [16, 17]. One of the great merits of Axiom A dynamical systems is that they allow for deriving rigorous and robust statistical mechanical properties for purely deterministic background dynamics. Despite having deterministic dynamics, when looking at their observables, they behave just like generators of stochastic processes. While Axiom A systems are rather special and indeed not generic, they have great relevance for applications if one takes into account the chaotic hypothesis, which indicates that high-dimensional chaotic systems behave at all practical purposes as if they were Axiom A [18, 19].

Several studies dealing with EVT for dynamical systems reveal a link between the statistical properties of the extremes and geometric (and possibly in turn global dynamical) characteristics of the system producing these extremes [8, 20–24]. The main findings are that when suitable observables are chosen for the dynamical system of interest, it is possible to relate the GEV or GP parameters describing the extremes to basic properties of the dynamics and especially to the geometry of the measure supported by the attractor. In particular, depending on the choice of the observable, one can associate the most important parameter of the GEV or GP distribution, the shape parameter, with the information dimension of the measure supported on the attractor or with the partial information dimension along the stable and unstable directions of the flow [23]. These partial dimensions are well defined everywhere on the chaotic attractor, possibly with a variation with location, and also for nonuniformly hyperbolic systems [25], beside Axiom A systems. However, Axiom A systems possess an ergodic SRB measure which lends itself to a universality of the shape parameter for all sufficiently smooth observables; the local or pointwise (partial) dimensions take the same value almost everywhere [26]. In this case, the uniform shape parameter can be related to the (partial) Kaplan-Yorke dimension(s), which is (are) defined by the global dynamical characteristic numbers, the Lyapunov exponents. Clearly, this is an asymptotic result, and one must expect that differences emerge on preasymptotic levels when different observables are studied. In other words, this theory does not make predictions regarding the convergence of shape parameter estimates, the analysis of which is the main objective of this paper. Via the connection with fractal dimensions, it can be said that the analysis of extremes acts as a microscope able to assess the fine scale properties of the invariant measures.

Some preliminary numerical tests show that there is no obvious convergence to the predicted asymptotic shape parameter in low-dimensional cases [23]. Bódai [9] examined the convergence to the GEV distribution in the case of extremes of site variables in the Lorenz 96 model [27], investigating separately a range of cases extending from weak to strong chaos. He found that when considering configurations supporting weak chaos with a low-dimensional attractor, the theoretical results obtained in the context of the Axiom A hypothesis are hard to verify. For lower dimensions, up to a dimension of about 5, shape parameter estimates fluctuate greatly rather than converge, while block maxima data can be shown not to conform to a GEV model, and for somewhat larger dimensions, up to 9 in the study, estimates could diverge from the predicted value while data already conform to a GEV model. Good agreement with the theory was found only in the highly turbulent case possessing a higher-dimensional attractor, about 30, supporting the basic idea behind the chaotic hypothesis. Also in this case, nonetheless, very slow convergence was found.

In a previous analysis performed on higher-dimensional, intermediate complexity models with degrees of freedom, very slow (if any) convergence to EVT distributions could be found in the case of extremes of local temperature observables [28]. In another analysis of a similar model [29], the agreement of the distribution of global energy extremes with a member of the GEV family was indeed good, yet large uncertainty remained on the value of the shape parameter, and no stringent test was made to make sure that the estimate was stable against changes in the block size considered in the BM analysis. Clearly, the specific choice of the observable and the degree of chaoticity of the underlying dynamics is of primary relevance regarding the convergence to the limiting GEV or GP distribution.

In this work, we use a quasi-geostrophic (QG) atmospheric model of intermediate complexity, featuring 1056 degrees of freedom, to analyse extremes of different types of observables: local energy (defined at each grid point), zonally averaged energy, and the average value of energy over the mid-latitudes. Our main objective is to compare the estimated GEV and GP shape parameters with a shape parameter derived, based on the theory referred to above, from the properties of the attractor and of the measure supported on it along the stable, unstable, and neutral directions. We refer to this as the “theoretical shape parameter.” Thus, we explore numerically the link between the purely statistical properties of extreme events based on EVT and the dynamical properties of the system producing these extremes. We perform simulations applying two different levels of forcing: a strong forcing, producing a highly chaotic behaviour of the system, and a weak forcing, producing a less pronounced chaotic behaviour. The dimensionality of the attractor is much larger in the former than in the latter case. This work goes beyond the previously mentioned studies, based on more simple dynamical systems, in a sense that with our model we can study the convergence for observables being different physical quantities or representing different spatial scales/characteristics of the same physical quantity. Additionally, compared to previous studies also performed on intermediate complexity models, we consider longer time series and a variety of observables. Our model is simple compared to a GCM (General Circulation Model) but contains two of the main processes relevant for mid-latitude atmospheric dynamics: baroclinic and barotropic instabilities. Hence, we contribute to bridging the gap between the analysis of extremes in simple and very high-dimensional dynamical systems, as in the case of the GCMs used for atmospheric and climate simulations, by using a model that simulates to a certain degree Earth-like atmospheric processes and allows also for computing with feasible computational costs some dynamical system properties, like Lyapunov exponents or Kaplan-Yorke dimensions. The properties of the model have been extensively studied by Schubert and Lucarini [30, 31].

Based on numerical results (Sebastian Schubert, personal communication), the model is expected to be nonhyperbolic, but one can assume that the chaotic hypothesis applies to it (in the case of sufficiently high forcing levels inducing a strongly chaotic behaviour of the system), and so in analysing the convergence of shape parameter estimates one can take the predicted theoretical shape parameter as reference. We also assume that the symmetry of the model with respect to longitude, which introduces a central direction besides the directions of expansion and contraction in phase space, does not alter the ergodicity of the system at a practical level, and hence the true shape parameter remains uniform.

Although we use an idealised model, our results are transferable to time series obtained from more realistic model simulations or from measurements. By understanding the differences among the analysed observables, we gain insight into the statistical properties of extremes of geophysical observables with different spatial scales. By using two forcing levels, we are able to study the convergence to theoretical shape parameters related to different chaotic systems: one exhibiting fast decaying correlations and another one characterised by slower decaying correlations. These aspects are relevant in the case of geophysical applications where one deals also with time series on several spatial scales and with different degrees of correlations.

The structure of this article is as follows. Section 2 gives a theoretical overview, describing the block maxima and the peak-over-threshold approaches. In Section 3, we present our model, the performed simulations, and the applied methods. In Section 4, we discuss our results regarding the statistics of extremes for strong forcing and for weak forcing. We summarise and discuss our results in Section 5.

#### 2. Elements of Extreme Value Theory

Let us consider , where is a sequence of i.i.d.r. variables with common distribution function . The extremal types theorem [2, 3] states that if there exist sequences of constants and , so that the distribution of normalised , that is, , converges for to a nondegenerate distribution function , then is one of three possible types of the so-called extreme value distributions, having the cumulative distribution function where , , for , and for . Note that the form of on the first line of (1) is the generic form, valid for [12], as its limit as is the form written out on the second line. Similar comments apply to (2), (3), and (4).

represents the GEV family of distributions with three parameters: the location parameter , the scale parameter , and the shape parameter . The shape parameter describes the tail behaviour and determines to which one of the three types of extreme value distributions belongs. If , the tail decays exponentially, and is a type I extreme value distribution or Gumbel distribution. If , the tail decays polynomially, and belongs to type II or Fréchet distribution. If , the domain of the distribution has an upper limit and is referred to as a type III or Weibull distribution.

Under the same conditions, for which the distribution of converges to the GEV distribution, the exceedances above a threshold reaching the upper right point of the distributions of , given that , are asymptotically distributed according to the Generalized Pareto (GP) distribution family [12]where for , , and . has two parameters: the scale parameter and the shape parameter . The shape parameter describes again the tail behaviour and determines to which one of the three types of GP distributions belongs. If , the tail of the distribution decays exponentially; if , the tail decays polynomially; and if , the distribution is bounded [4, 5, 32]. If convergence to the GEV and GP distributions is realised, and . As a result, once we estimate the parameters for the GEV, we can derive the corresponding GP parameters (including the threshold ) and vice versa [12].

From the values of the GEV or GP parameters, it is possible to infer the expected return levels or extreme quantiles. Return levels are obtained from the GEV distribution by inverting (1): where and represents the return period. In the case of the GP distribution, the -observation return level (i.e., the level that is exceeded on average every observations) can be derived from (2): where represents the probability of an individual observation exceeding the threshold [12]. By plotting the GEV (GP) return level () against the return period () on a logarithmic scale, the plot is linear if (), convex if (), and concave if ().

In the case of a correlated stationary stochastic process, the same GEV limit laws apply as for i.i.d.r. variables if certain conditions, regarding the decay of serial correlation, are fulfilled [7, 33, 34]. By stationary, we refer to a sequence of correlated variables whose joint probability distribution is time-invariant. However, an important restriction is that, as an effect of serial correlation, we have to introduce an effective size for the block, which is smaller than the actual number of observations contained in it. This can enhance the bias in the parameter estimation, appearing as a slower or delayed convergence of the block maxima distribution to the limiting GEV distribution [11, 12]. Another possible effect of serial correlation is the appearance of extremes at consecutive time steps (clusters). If an extreme value law does exist in this case, then , where is called the extremal index and ( denotes the limiting distribution of BM from the correlated sequence and the one from an uncorrelated sequence, having the same marginal distribution). Clusters of extremes represent a problem especially when applying the POT approach. A widely adopted method to get rid of the correlated extremes is declustering, which basically consists of identifying the maximum excess within each cluster and fitting the GP distribution to the cluster maxima [34–36].

As mentioned before, several studies on EVT for observables of dynamical systems relate the GEV and GP shape parameters to certain properties of the system itself. In the case of the so-called “distance” observables, one can relate the GEV and GP parameters to basic geometrical properties of the attractor [8, 20, 21, 24]. The distance observables are functions of the Euclidean distance between one point on the attractor and the orbit . The function is chosen in a way to have a global maximum for , so that large values of correspond to recurrences of the orbit near . Depending on the choice of the function , the extremes of the distance observables can have positive, negative, or vanishing value for the shape parameter. In particular, when is chosen to have the form for , the shape parameter is negative, and it is proportional to the inverse of the Kaplan-Yorke or information dimension of the SRB measure supported by the attractor; see below for further details [20, 21, 23].

While recurrence properties are indeed important for characterising a system, distance observables are not well suited for studying some basic physical properties, such as, in the case of fluids, energy or enstrophy. Hence, Holland et al. [22] studied the extremes of smooth functions which take their maximum on the attractor in a point where the corresponding level surface of is tangential to the unstable manifold of the attractor, referring to them as “physical” observables. They found a relationship between the GEV shape parameter and some geometrical properties of the measure supported by the attractor dealing with the properties of the unstable and stable directions in the tangent space. The results of Holland et al. [22] were reexamined by Lucarini et al. [23], using the POT approach for physical observables of Axiom A systems. They considered the time-continuous time series of physical observables and found that for all nonpathological physical observables the shape parameter can be written as with defined as where , , and are the partial information dimensions of the SRB measure supported by the attractor restricted to the stable, unstable, and neutral (i.e., central) directions. As mentioned in Section 1, these local or pointwise (partial) dimensions take the same value almost everywhere on the attractor if one considers smooth observables of systems possessing an ergodic measure. is equal to the number of positive Lyapunov exponents [26], is equal to the number of zero Lyapunov exponents, and , where is the Kaplan-Yorke dimension with denoting the Lyapunov exponents of the system, in a descending order, and is such that is positive and is negative. We remark that a more general point of view, taking into consideration possible geometrical degeneracies, suggests that , and, additionally, [7].

According to (5) the shape parameter is always negative (due to the compactness of the attractor), and it is close to zero in the case of systems having large values of the Kaplan-Yorke dimension. Furthermore, it shows a universal property of extremes, which does not depend on the chosen observable but only on the geometry of the measure. In what follows, we will focus on comparing (5) with statistically inferred GEV and GP shape parameters in the case of energy extremes of the model investigated here, which is described next.

#### 3. Model Description and Methods

We consider a spectral quasi-geostrophic (QG) 2-layer atmospheric model similar to the one introduced by Phillips [37]. Specifically, our model, including the simulation code, is the same as in [30] and is a modified version of the one presented in [38]. The model represents synoptic scale mid-latitude atmospheric dynamics based on the quasi-geostrophic approximation, which assumes hydrostatic balance and allows only small departures from the geostrophic balance. The model features baroclinic conversion and barotropic stabilisation processes and simulates a turbulent jet-like zonal flow when suitable values are chosen for the parameters of the system. The reader is referred to [39] for a detailed physical and mathematical description of quasi-geostrophic approximation for mid-latitude atmospheric dynamics.

##### 3.1. Model Description

The model domain is a rectangular channel with latitudinal and longitudinal coordinates . represents the Equator and corresponds to the north Pole. We assume periodicity along the -direction, so that corresponds to the length of the parallel at 45°N. The vertical structure of the model atmosphere consists of only two discrete layers: this is the minimal vertical resolution needed to represent baroclinic processes [39]. Five vertical pressure levels define the two layers with boundaries at hPa (surface level), hPa, hPa, hPa, and hPa (top level). The geostrophic stream function is defined at levels and , and , where the quasi-geostrophic vorticity equation for the mid-latitude -plane (7)-(8) is applied, while the vertical velocity is specified at level , where the thermodynamic energy equation (9) is valid.

The model is described by the following equations in terms of the barotropic stream function , baroclinic stream function , and temperature : In the above, we expressed the advection in terms of the Jacobian operator defined as . represents the static stability parameter [39]. We define the stability parameter , where is the Rossby radius of deformation. The name and values of model parameters are listed in Table 1.

The vertical velocity is set to 0 at the top level, , and is defined through Ekman pumping at the surface level, , which parameterises the dissipative processes occurring in the boundary layer. Subgrid-scale processes are represented by momentum and heat diffusion terms. The system is driven by a Newtonian cooling term that involves the restoration temperature field: denotes the forced meridional temperature difference and quantifies the external forcing in the model. In the performed simulations, no time-dependence of is assumed, with the aim of creating time series of a deterministic equivalent of a stationary process. If is sufficiently large, the system reaches a steady state featuring a turbulent atmospheric flow with sensitive dependence on initial conditions. The physical processes responsible for limited predictability are in general the baroclinic and barotropic instability. The Newtonian cooling provides the so-called baroclinic forcing to the system and activates a set of energy exchanges and transformations summarised by the framework of the Lorenz energy cycle. See discussion in [39].

Using hydrostatic approximation [39], we obtain for our vertical discretization that . Thus, the three model equations (7)–(9) can be reduced to two equations with two variables and . The model equations can be transformed [30] into a nondimensional form using the scaling factors in Table 1. In the following, we use nondimensional quantities, if not indicated otherwise. As mentioned, the channel is periodic in the -direction. At the meridional boundaries we set the meridional velocity and the zonally integrated zonal velocity to 0, and . For these boundary conditions the solution of the model equations is where denotes the nondimensional stream function and indices and represent the real and the imaginary coefficients. We apply a spectral cut-off at in both and directions. Hence, the total dimension of the model phase space is .

##### 3.2. Methods

We substitute (11) into the evolution equations, perform a Galerkin projection, and eventually integrate numerically the nondimensional model equations using the fourth-order Runge-Kutta scheme. We perform simulations with two different forced meridional temperature differences, K and K. In the case of strong forcing (K), the system has a Kaplan-Yorke (KY) dimension of 585.95 with 222 positive Lyapunov exponents, so that , , and . Note that the presence of a second neutral direction is related to the existence of a rotational symmetry in the system and to the fact that we consider a spectral model. This feature is of little relevance for the analysis below. We produce stationary time series of 96,576 years with a time step of 0.7 hours. In the case of weak forcing (K), the system has a KY dimension of 39.31 with 17 positive Lyapunov exponents, so that , , and . We produce stationary time series of 485,760 years with a time step of 2.8 hours. The spectral coefficients of the stream functions are recorded every 5.5 hours with either forcing. The Lyapunov exponents are obtained by the same simulation code as the one used in [30], based on the method of Benettin et al. [40].

The spectral output of the model is transformed into the grid point space using the Fast Fourier Transform resulting in grid points with in the and directions. We refer to the grid points by indices , where . We analyse extremes of the total energy observables defined in nondimensional form below. For our extreme value analysis, we consider only the “mid-latitudes” of the QG model, which we define as the region between the latitudes and , that is, the latitudinally central 0.5 fraction of the whole domain. The total energy () is the sum of the kinetic energy of the lower () and upper () layers and of the available potential energy (): where represents the discrete time coordinate. The components of the right side of (12) are defined for each grid point as with the zonal components of the horizontal velocities and and the meridional components of the horizontal velocities , , and .

We obtain the zonally averaged energy by averaging along longitudes the value of the local energy (12): and we derive the average mid-latitude energy by averaging the local energy over the area corresponding to the mid-latitudes: The energy observables are analysed in their nondimensional form. The physical values expressed in units of J/Kg () can be obtained by multiplying the nondimensional values by the factor ().

Although we record the model output, as stated above, every 5.5 hours, we save only the maximum values over one month in the case of strong forcing and over three months in the case of weak forcing. We estimate the GEV and GP parameters based on block maxima and threshold exceedances obtained from the monthly, respectively, 3-monthly, maxima series. Such an operation has no effect on the subsequent GEV analysis. Instead, it might modestly impact the GP analysis, as some above-threshold events might be lost, because they could be masked by a larger event occurring within the same 1-month or 3-month period. Nonetheless, since we consider very high thresholds and an extremely low fraction of events, the risk of losing information is negligible. The GEV and GP parameters are inferred by maximum likelihood estimation (MLE), as described by Coles [12]. We estimate the GEV and GP parameters, as well as the confidence intervals, using the MATLAB® functions* gevfit* and* gpfit*. The computed confidence intervals contain the true value of the parameters with a probability of 95%. The autocorrelation coefficients and histograms are obtained based on 1000 years of the “raw” simulated time series.

#### 4. Extreme Value Statistics in the QG Model

As mentioned before, the simulations are performed using two different configurations, where the value of the parameter , describing the baroclinic forcing, is set to 133 K and 40 K, respectively. As we see below, in the case of strong forcing we find good agreement between the results of the statistical inference and the theory, for local observables at least, even if the speed of convergence of the estimated shape parameters (not predicted by the theory), to the value that is predicted by the theory, is rather diverse among the considered observables. In the case of weak forcing and the resulting weakly turbulent behaviour, the results of the statistical inference analysis are not in so good agreement with the theory, and we find that for the different observables the shape parameter estimates have varying nonmonotonic dependence on the block size. We will investigate possible reasons for such a behaviour.

##### 4.1. Strong Forcing ( K)

Before presenting the results related to the statistics of extreme events, we outline some general statistical properties of the analysed observables. As emphasised in Sections 1 and 2, correlations have an effect on the convergence of the distribution of block maxima (threshold exceedances) to the GEV (GP) distribution. The autocorrelation coefficient for a stationary signal can be defined as follows: where represents the time and the time lag and denotes the mean and the variance of [41].

By taking the ergodic hypothesis, we estimate the autocorrelation coefficient for the local energy according to (16) and obtain for each grid point . We calculate the integrated autocorrelation time scale [42] according to (if the decay of the autocorrelation is exponential, the integrated autocorrelation time scale is equal to the -folding time). We set (corresponding to about 140 days) as an upper limit for the integration in order to avoid the noisy tail of the autocorrelation coefficient. We proceed the same way in the case of the zonally averaged and average mid-latitude energy to obtain and . The integrated autocorrelation time scales, expressed in days, are shown in Figure 1(a). As expected, the weakest autocorrelations are recorded for the local observables, yielding about 1-2 days. Because of the information propagation along parallels due to the prevailing zonal winds, the zonal average time series are impacted by a low-pass filtering as a result of averaging along a latitudinal band; thus, the correlations become stronger. For these zonally averaged observables, as opposed to the local ones, the integrated autocorrelation changes substantially with latitudes. We observe a minimum in the middle of the channel (≈3.5 days) and an increase outwards to the boundaries (≈15 days). Through averaging over the area of mid-latitudes, the zonally averaged time series with different properties are merged together. The resulting time series has an integrated autocorrelation time scale of about days. Note that if in a time series of length with reasonably fast (e.g., similar to exponential) decay of correlations the integrated autocorrelation time scale is , then one can deduce that the time series has approximately effectively independent entries, where is the time interval. As discussed above, this impacts, for example, the effective length of the blocks over which maxima are computed and can have important effects in determining when the asymptotic behaviour of the EVT statistics is valid.

**(a)**

**(b)**

**(c)**

**(d)**

Figures 1(b)–1(d) illustrate the histograms and the approximated probability density functions (PDFs) of our observables. Although, due to the -effect, the dynamical properties of the flow as a function of latitudes are not exactly symmetric with respect to the meridional middle of the channel, our estimations of statistical quantities and density functions exhibit approximate meridional symmetries with respect to the center of the channel. Therefore, in the case of the local and zonally averaged observables only half of the channel’s meridional extension (at every second latitude) is shown. The strongest skewness and the longest right tails are observed in the case of the PDFs of the local observables. After spatial averaging, the PDFs become more symmetric and almost similar to a Gaussian distribution (which would look like a parabola on a semilogarithmic scale), according to the central limit theorem.

In what follows, we present the results of the EVT analysis starting with the local observables. We first discuss the convergence of the shape parameter for GEV and GP, then the convergence of the GP modified scale parameter (to be introduced below), and, at the end, the convergence of return levels. Taking advantage of the fact that statistics are uniform in the zonal direction, we concatenate the monthly maxima series for every second longitude one after the other in the -direction, thus increasing the data length to about (from about ) years. Therefore, we can estimate the GEV and GP shape parameters for larger block sizes and higher thresholds than in the case of the zonally averaged or average mid-latitude observables. Although the time series at every second longitude are correlated with each other, the correlation almost vanishes at the block size of 8 years, being below 0.15 at every latitude. In other words, correlations are very weak at extreme levels, which is the only important condition for the GEV limit laws to apply in the case of a stationary process [12]. Block sizes smaller than 8 years are not relevant for our analysis, since (as presented below) much larger ones are needed to approach the theoretical shape parameter. In the case of the POT approach, we use the same argument of choosing very high thresholds, above which the correlations are extremely weak.

The theory discussed in Section 2 indicates that the true (asymptotic) GEV shape parameter is given by , as expressed by (5), which corresponds to approximately −0.002, and is indicated by the straight line in Figure 2. Note that the range of the theoretical values derived taking into consideration possible geometrical degeneracies, according to what is described in Section 2, is too small to be visible in this case. We define the* precision * of estimation as half of the width of the 95% maximum likelihood confidence interval. It is more common to define the precision by the standard deviation of the estimates. For a Gaussian distribution the 95% confidence interval is larger than the standard deviation by a factor of approximately 2. However, this distinction does not matter for our purposes. Besides, we have a* single* estimate only, and so we can obtain only the confidence interval not the standard deviation of the distribution of estimates. Additional to the precision, we define the* trueness* of a* single* estimate as the distance between and : . Note that the latter is different from the usual definition in that the reference from which we measure the distance is , not the true value of the distribution from which the BM data is drawn. In fact, strictly, the BM data is not drawn from a GEV distribution that we are fitting, and hence we cannot even talk about the true value of an underlying GEV distribution. We emphasise that our interest is the convergence to the asymptotic value, which is why we take a reference value in our definition other than customary. Accordingly, we shall refer to the “bias” of the estimator, again different from customary, as the expected trueness. We remark that, since our estimates are obtained based on one realisation instead of several realisations yielding a distribution of estimates, our trueness approximates the bias of the estimates as long as . We emphasise that we are able to calculate here because we know the true ; this is not the case in practice when facing just a measured time series. Obviously, we aim at obtaining a joint optimisation by having the bias and the precision as small as possible. Clearly, optimality requires a compromise between these two requirements. When we apply the BM method and increase the block size , the number of blocks and of BM decreases; thus, the estimation of becomes more and more uncertain, and increases monotonically. At the same time, for increasing we expect a (not necessarily monotonic) convergence of our estimated shape parameter to the true value, so that the actual bias should (on the long run) decrease with . Clearly, instead, our approximation decreases only until a certain block size, above which it becomes more uncertain with increasing values of , because less BM are available. We choose as optimal block size the smallest block size for which the estimate of the bias is lower than the estimate of the precision . On the scale of variation ranges of and , we have . With this we obtain a single number that can quantify the* accuracy* of estimation. This measure of accuracy provides here a basis for comparing different observables with regard to the speed of convergence or a basis for assessing the degree of nonuniformity of estimates of various observables of interest (in terms of the range of accuracy values), as a finite-data-size deviation from the uniformity predicted by theory. We have verified that the optimal choice for is virtually unchanged when we use an alternative definition of the accuracy, where one aims at minimizing , borrowing an idea concerning the optimality of MLE estimators (results not shown).

First we assess the uniformity for the local observables. Figure 2(a) shows the GEV shape parameter estimates against exponentially increasing block sizes of years (), for different latitudes. The estimated GEV shape parameters seem to converge monotonically for every latitude to . The monotonic convergence is pointed out also in panel (b) in terms of . In this diagram, we display too, by which we can determine the optimal and the accuracies of estimation. These accuracies, depending on the latitude, have a range of . At the same time, the value of ranges from several tens of years to a few hundreds of years depending on the latitude we are considering. This is unsurprising, because the speed of convergence to the asymptotic level is not universal. As a consequence, when finite block sizes are considered, extremes of different observables can feature rather distinct properties. The slow convergence suggests that customary choices like yearly maxima are not always good enough for an accurate modelling of extremes. Figure 2(c), giving a different view of the same data seen in panel (a), illustrates the estimated GEV shape parameter as a function of latitudes for various block sizes. For small block sizes, we observe a slight latitudinal dependence of the shape parameter. This latitudinal structure flattens as one increases the block size, and the estimated shape parameters get closer to the theoretical value. According to Figure 2(c), universality emerges as we approach the asymptotic level.

To assess the goodness of fit, we perform a one-sample Kolmogorov-Smirnov-test (KS-test) [13] at 5% significance level using the MATLAB function KS-test. We remark that the KS-test is performed in the case of each block size based on the whole BM data, meaning that this amount of data decreases as we increase the block size. The shape parameter values for which the KS-test value is above 0.05 (i.e., the hypothesis that the distribution of BM is a GEV distribution cannot be rejected) are marked by circle markers in Figure 2(a). We define as the smallest block size for which , , and . Figure 2 points out that the KS-test suggests a good fit already at smaller block sizes than the optimal block size, , and for lower shape parameter values than the best estimate, . Thus, a very important conclusion is that the value of the KS-test is not an appropriate measure for the convergence to the limiting distribution. More precisely, it indicates that we have indeed agreement with a member of the GEV family of distributions, but we cannot say what is the error from the asymptotic value of the parameters. We emphasise that , just like , depends on the time series length, and it would be even smaller if shorter time series were considered. This implies that, in the case of applications with less data, the results of the KS-test are even less reliable. The misleading property of values was also shown by Bódai [9], who studied the convergence to the GEV distribution of extremes of site variables in the Lorenz 96 model and found values above the significance level in the cases where the theoretical prediction did not even apply, and the shape parameter did not converge. The goodness-of-fit test was in the mentioned study a Pearson’s chi-squared test. Misleading values based on the KS-test were pointed out also by Faranda et al. [8] in the case of the BM approach in simple systems. A slow convergence of the estimated GEV shape parameters and a poor quality of diagnostic tools (return level and quantile plots) for small block sizes were also found by Vannitsem [28] in the case of local temperature extremes in a three-layer QG model with orography.

Figure 2(d) illustrates the GP shape parameter estimates as a function of decreasing exceedance ratio (the fraction of above-threshold data) , which is equivalent to an increasing threshold. To ensure direct comparability between the BM and POT approaches of EVT, sample values of the threshold are chosen corresponding to the sample values of the block size in such a way that , where is the data amount in a year. Thus, the number of threshold exceedances is equal to the number of block maxima. By comparing the GP shape parameter (Figures 2(d)–2(f)) with the GEV shape parameter (Figures 2(a)–2(c)), we generally observe the same features as above. More precisely, the changes of the GP shape parameter as a function of exponentially decreasing exceedance ratio are very similar to the variation of the GEV shape parameter according to an exponentially increasing block size. Both GEV and GP shape parameters seem to converge to . This is also consistent with theoretical results according to which the two distributions are asymptotically equivalent [12, 23]. However, we expect that in the case of finite block sizes (i.e., in the case of every practical application) differences might emerge in the estimates of the GEV and GP shape parameters. Although in the case of consistent estimations one would expect that at large block sizes, corresponding to low exceedance ratios, the difference between them should be small, as it is the case for our estimators. Besides the mentioned similarities, we observe some differences between the estimates of the GEV and GP shape parameters. These differences concern, for example, their latitudinal dependence (less pronounced in the case of the GP shape parameter) or the width of the confidence intervals (larger in the case of the GP shape parameter, indicating larger estimation uncertainty). The most relevant difference is, however, that the GP shape parameter seems to converge faster to . This is unsurprising as in many applications it is usually suggested to use the POT over the BM method as the former is less data-hungry and provides (usually) a faster convergence [7].

We perform another test to check whether the GP distribution is a good approximation for the distribution of threshold exceedances based on our data and consider the GP modified scale parameter. The GP scale parameter depends on the chosen threshold according to [12], where represents the asymptotic shape parameter, is the lowest threshold at which the GP distribution is a reasonable model for exceedances, and represents any other threshold . The scale parameter can be reparameterised yielding the modified scale , which should converge to a nonzero value. Figure 3 illustrates the modified scale parameter estimate (calculated based on the finite-size GP parameter estimates, i.e., taking threshold dependent GP shape parameter estimates instead of ) as a function of the exceedance ratio . We observe estimates of relatively stable to further decreases of (for . Note that in this case there is no universality in the value of the modified scale parameter, as for stochastic variables one has that the upper right endpoint of the distribution is given by . Such an endpoint is clearly observable-specific.

Having practical applications in mind, the BM and POT methods aim at obtaining statistical estimates of either return levels or expected return periods, for even unobserved extreme events. Figures 4(a) and 4(b) show GEV and GP return level plots for the local observables based on a fixed block size, years, and corresponding (as explained) %, respectively, at five different latitudes (every second latitude from the southern meridional boundary to the channel centre). We compute the GEV return levels according to (3) and the GP return levels based on (4) and estimate the 95% confidence intervals using the delta method described by Coles [12]. The GEV and GP return level plots look very similar, except two minor differences. One emerges simply from the different equations for the GEV and GP distributions, leading to slightly different definitions of return levels (as described in Section 2 and in more detail in [12]), and affects short return periods, and the other one comes from the larger uncertainty in the estimation of the GP parameters compared to the GEV parameters and results in slightly wider confidence intervals in the case of the GP return levels. The main message of Figures 4(a)-4(b) is, however, that the GEV and GP return level estimates using the chosen and fit the empirical data quite well, which is in agreement with the results of the KS-test reported above. The 95% confidence intervals of the estimated return levels (continuous lines) contain the empirical return levels (dot markers) or are very near to them, except a few very high extremes at some latitudes. The return level is almost linear to the logarithm of the return period, showing the effect of a shape parameter very close to 0 (see (3) and (4)).

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

If the GEV distribution is an adequate model for extreme events for a certain block size, one expects return levels with a certain return period not to change much anymore with increasing block size. Figures 4(c)–4(e) show indeed that, above a certain block size, the estimated return levels for three different return periods (, , and years) are stable against further increase of . But it also shows that the longer the return period, the slower the convergence. While in the case of the -year return period we obtain stable return level estimates already at , in the case of -years the return level estimates are still increasing for . Here we experience the practical effect of the issue mentioned above, namely, that the KS-test suggests a good fit even for . This implies that the estimation of return levels with long return periods can be erroneous even if the KS-test does not reject the GEV distribution. We also notice that the return levels are underestimated if the block size is too small, and this underestimation is more severe in the case of return levels with longer return periods. We come to the same conclusion by considering the convergence of the GP return levels (not shown), as suggested already by the similarity between panels (a) and (b).

After having discussed in detail the convergence in the case of the local observables, we proceed with the results for the zonally averaged observables. Figure 5 illustrates the GEV and GP shape parameters for the zonally averaged observables (a, c, d, f) as well as the estimated bias and precision of the inferred shape parameters (b, e). As mentioned above, in the case of the zonally averaged observables, we have shorter time series ( instead of years). Because of this, results for the accuracies of estimates cannot be “fairly” compared to the accuracies found for local observables. Nevertheless, we produce the same type of diagrams suitable to determine the accuracies and show it in Figures 5(b) and 5(e). Clearly, the range of accuracy values depending on the latitude and the maximal value of the accuracies (i.e., of the bias at the optimal block size) are both considerably larger than those for the local observables. What is fair to compare, however, is the range of biases for a certain block size where the confidence of the estimates is high, , and the amount of data does not affect significantly the parameter estimate. In this regard, the zonal observables display a much larger nonuniformity regarding the shape parameter estimates. Otherwise, the estimates feature typically a monotonic change towards the theoretical value (up to at least the optimal block size), what can be seen as convergence.

Our observation that the estimated shape parameters depend strongly on the considered latitude has to do with the effect of serial correlation on the convergence to the limiting distribution. We obtain weak autocorrelations, fast convergence to , and low bias in the middle of the channel, versus strong autocorrelations, slow convergence, and large bias at the margins of the channel. As already mentioned before, the stronger the serial correlation the less the number of uncorrelated data in a block, and the larger the block sizes needed in order to approach the same bias (see also [12]). Thus, the latitudinal structure of the GEV shape parameter estimates (Figure 5(c)) is related to the one of the integrated autocorrelation time scale (Figure 1(a), dotted line with circle markers). By increasing the block size, this latitudinal structure flattens, and the estimated shape parameters seem to approach . Nonetheless, we note that, due to the presence of (relatively) large statistical uncertainty on the shape parameter, we cannot make more precise statements on the success of the analysis.

We present now the analysis of extremes of the average mid-latitude observable. Figure 6 shows the GEV and GP shape parameter estimates for the average mid-latitude observable and their estimated bias and precision as a function of the block size and exceedance ratio, respectively. In the case of the average mid-latitude energy, we have the same amount of data as in the case of the zonally averaged energy. Similar to the zonally averaged observables, the estimated GEV and GP shape parameters seem to approach the theoretical shape parameter, but, when more stringent definitions for selecting the extremes are used, the bias is relatively large, being about at the optimal block size in the case of the GEV and about at the optimal exceedance ratio in the case of the GP shape parameter. Again, also in this case, our analysis is limited by the amount of available data.

**(a)**

**(b)**

**(c)**

**(d)**

In short, our numerical results do allow for conclusions regarding the universality of extremes, as predicted by the theory presented in Section 2. However, considering the most various observables one would typically see a nonuniformity in the finite-size shape parameter estimates simply because of their distinct convergence properties (not predicted by the theory). The observables that we found in our study to have the fastest converging shape parameter estimates are the local observables at every latitude and the zonally averaged observables at central latitudes, where the autocorrelation has a minimum. However, convergence is very slow and is additionally slowed down by the presence of serial correlations in the time series. Thus, the estimated shape parameters are relatively far from the theoretical value, as given by the accuracy, in the case of several latitudes of the zonally averaged observables (especially marginal latitudes exhibiting strong autocorrelations) and in the case of the average mid-latitude observable. This slow convergence in combination with the finite size of the data makes the actual observation of the theoretical limit extremely difficult.

##### 4.2. Weak Forcing ( K)

Before analysing the extreme events for weak forcing, we discuss some statistical (and dynamical) properties of our observables, which influence directly the statistics of extremes. Figure 7(a) shows the integrated autocorrelation time scales for the three observables: local, zonally averaged, and average mid-latitude energy. We compute the integrated autocorrelation time scale according to the method described in Section 4.1 for the strong forcing. In the case of weak forcing, however, we set the time lag (corresponding to about 400 days) as the upper limit for the integration, according to the slow decay of the autocorrelation (especially in the case of the zonally averaged and average mid-latitude observables). The integrated autocorrelation time scales are substantially higher than for strong forcing: around 10 days in the case of the local, about 30–48 days in the case of the zonally averaged observables, and approximately 45 days for the average mid-latitude observable. Figure 7(b) shows the time series of the local observables at the central latitude (at two different longitudes and ) and suggests two alternating states of our system: one with strong fluctuations and another one with reduced fluctuations. Thus, it seems that our system exhibits a regime behaviour, which definitely supports the presence of strong correlations.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

In contrast to the case of strong forcing, the zonal averages of the local energy observables show remarkable deviations from a Gaussian behaviour, even more than the PDFs of the local energy observables (Figures 7(c)–7(e)). One has that the PDFs of the zonally averaged observables typically have a marked skewness and very strong kurtosis and often contain rather pronounced “shoulders,” where smoothness is basically lost. The presence of large kurtosis indicates that there is significant positive spatial correlation of the energy along a longitude. The presence of skewness indicates that there is asymmetry between the occurrences of anomalies of either sign. Another particular property of the spatial energy field for weak forcing is the strong anticorrelation (especially in the case of the zonally averaged observables) between time series at central and marginal latitudes (not shown). Accordingly, the “shoulders” appear in different parts of the PDFs at different latitudes: on the left in the case of central latitudes and on the right in the case of marginal latitudes. We conclude that the regime behaviour is connected to nontrivial spatial structures, with the system living in a transitional range where one can still distinguish long-lived unstable waves amidst chaos. We note that such conditions are different from what is foreseen by the chaotic hypothesis, and, therefore, the statistics of extremes might not converge (according to our finite-size dataset) to what is predicted by the theory developed for Axiom A systems.

For the analysis of extreme events, we use a similar procedure as in the case of strong forcing (K) and concatenate the three-monthly maxima series for every second longitude one after the other in the -direction. Thus, we increase the length of available data for the local observables to about (from about ) years. Although the time series at every second longitude are correlated with each other, the correlation almost vanishes at extreme levels, being below 0.1 for every latitude in the case of the 8-year BM. We define the GP exceedance ratios so that the number of threshold exceedances corresponds to the number of block maxima, as described in Section 4.1.

In the case of weak forcing, the theoretical shape parameter is −0.03, shown by the grey horizontal line in Figure 8. The grey shading represents the range of theoretical values resulting from taking into consideration possible geometrical degeneracies according to the limits described in Section 2. We plot the GEV shape parameter against exponentially increasing block sizes of years, where for the local observables and for the zonally averaged and average mid-latitude observables. Focusing first on the local observables, we notice a nonmonotonic change of the shape parameter with increasing block sizes. For block sizes smaller than 30 years, the shape parameter even reaches nonphysical positive values for certain latitudes. This change of sign of the estimated shape parameters is similar to what has been observed by Vannitsem [28] in the case of local temperature extremes in a more realistic QG model with orography. The nonmonotonic changes and the positive shape parameter estimates have to do with the fact that, if the block size is not large enough, we select events from both regimes (more and less fluctuating), thus “contaminating” the statistics of extremes, whereas if the block size is large enough, only extremes from the more unstable regime are selected. Figure 8(a) also shows that the estimated shape parameter seems to converge at almost every latitude to a value which is lower than the theoretical shape parameter, yet near to the range of values obtained taking into consideration possible geometrical degeneracies; see Section 2. As discussed above, this is in fact unsurprising given the qualitative properties of the system in the low forcing regime.

In the case of the zonally averaged and average mid-latitude observables, we cannot detect any convergence. This is an expected result, considering the statistical and dynamical characteristics of our data and the fact that the length of the time series is in this case even shorter than for the local observables. As an effect of the “shoulders” in the PDFs, we obtain very uncertain estimates even for large block sizes, and the KS-tests reject the hypothesis of a GEV model in these cases. The shape parameter estimates have a large latitudinal spread due to the varying form of PDFs with different latitudes. Except for the differences between the GEV (Figures 8(a)–8(c)) and GP (Figures 8(d)–8(f)) shape parameters at small block sizes and high exceedance ratios, both methods show us basically the same picture. The misleading property of the KS-test values is underlined by Figure 8. Even in the case of the zonally averaged and average mid-latitude observables, where we cannot detect any convergence at all, we find for a wide range of block sizes and exceedance ratios (circle markers).

#### 5. Summary and Discussion

In this paper, we have studied the convergence of statistically estimated GEV and GP shape parameters to the theoretical shape parameter, which, following the mathematical findings reported in [7, 20–23], can be expressed in terms of the partial Kaplan-Yorke dimensions along the unstable, neutral, and stable directions [22, 23]. We have analysed a quasi-geostrophic 2-layer atmospheric model. We have studied the extremes of different types of energy observables: local, zonally averaged, and average mid-latitude energy. We have performed simulations with two different forcing levels: a strong forcing (K), producing a highly chaotic behaviour of the system, and a weak forcing (K), producing a less pronounced chaotic behaviour. In the case of strong (weak) forcing, we produce time series of about () years, representing a deterministic equivalent to a stationary process. We have estimated the GEV and GP shape parameters for exponentially increasing block sizes and exponentially decreasing exceedance ratios (fractions of above-threshold events), that is, increasing thresholds, by performing maximum likelihood estimation. For comparability, we have chosen the GP thresholds so that the number of threshold exceedances corresponds to the number of block maxima. We have taken advantage of the fact that statistics are uniform in the zonal direction and use the data from every second longitude for the analysis of extreme events, thus increasing the length of available data for the local observables to about () years in the case of strong (weak) forcing.

We start the discussion of our results with the strong forcing regime. In this case, we observe a roughly monotonic increase of the estimated GEV (GP) shape parameters towards the theoretical value . The estimated shape parameters seem to converge to in the case of the local observables at every latitude and in the case of the zonally averaged observables at central latitudes. Thus, our numerical results allow for robust conclusions regarding the universality of extremes, according to the theory presented in Section 2. However, in the case of several (especially marginal) latitudes of the zonally averaged observables, as well as for the average mid-latitude observable, the estimated shape parameter is relatively far from the theoretical one. For these observables, the amount of data seems to be not enough to approach asymptotic levels; thus, we cannot make more precise statements on the success of the analysis. Even in this extremely chaotic case, the convergence is very slow, suggesting that customary choices like yearly maxima are not always the best option for an accurate modelling of extremes.

Despite the predicted universal asymptotic properties of extremes, if we consider a certain block size (threshold), we find that the shape parameter estimates are different among the observables and latitudes. Thus, in view of finite-size estimates, extremes show rather diverse properties. The speed of convergence to the asymptotic level is not universal. The local observables exhibit high-frequency fluctuations, as an effect of boundary fluxes, and, at the same time, the fastest convergence of the shape parameter estimates to the theoretical value. Since the energy is transported mostly along the zonal direction by the zonal mean flow, by averaging along a latitudinal band the highest frequencies are filtered out, and fluctuations with lower frequencies become stronger. In the case of the zonally averaged observables, we obtain weak autocorrelations and fast convergence to in the middle of the channel where the baroclinicity is the strongest, versus high autocorrelations and slow convergence at the margins of the channel, where instead the baroclinicity is weak. The stronger the serial correlation, the less the number of uncorrelated data in a block, and the larger the block sizes needed in order to approach the same bias (see also [12]). By averaging over the mid-latitude area, one merges zonally averaged time series exhibiting different autocorrelations. Thereby, the convergence to is faster than in the case of the zonally averaged observables at marginal latitudes. To sum up, a very important conclusion of our study is the existence of latitude-dependent finite-size differences, as a counterpart to the universal asymptotic properties.

We assume that the extremely slow convergence has to do mainly with the fact that is negative but very close to 0. Based on and on the estimated GP modified scale parameter, one is able to estimate according to Lucarini et al. [23] the absolute maximum, which is the upper end point of the GP distribution, as mentioned in Section 4.1. By performing a very rough estimation (and neglecting the weak latitude-dependence of the GP modified scale parameter), the absolute maximum in the case of the local observables , which is about 200 times the mean local energy value (see Figure 1) and 20 times larger than some of the largest estimated return levels obtained for the largest return times considered here (see Figure 4). This means that extremes are bounded, and an absolute maximum does exist, but the tail is extremely stretched out, and ultra-long simulations are needed to explore this absolute maximum, so that at all practical purposes it can be treated as being infinitely large, as in the case of a Gumbel distribution where the shape parameter vanishes. Our results point out the (sometimes unexpected) existence of a discrepancy between the existence of a mathematical limit and the actual possibility of practically observing it. Note that if the asymptotic shape parameter is lower, the absolute maximum will be much closer to the maximum observed within a long yet finite time series, as it is shown in a recent study on temperature extremes in Southern Pakistan [43].

Our conclusions regarding the convergence of the estimated shape parameter to are confirmed by results based on the GP modified scale and return level estimates, in the case of the local observables. We point out, however, that the longer the return period, the slower the convergence of the estimated return levels to their asymptotic values, and the larger the underestimation of the asymptotic return levels if we consider small block sizes (low thresholds).

In the case of weak forcing, temporal and spatial correlations are very strong due to a regime behaviour of our system, which exhibits two well-defined regimes: a more unstable one with stronger fluctuations and a less unstable one with reduced fluctuations. Due to such a regime behaviour the statistics of extreme events is “contaminated”: if the block size (threshold) is not large (high) enough, we select events from both regimes, whereas if it is large (high) enough, only extremes from the more unstable regime are selected. This induces nonmonotonic changes of the estimated shape parameters by increasing the block size (threshold) and leads to the appearance of positive, that is, nonphysical, or very low shape parameter estimates. In the case of the local observables, the estimated shape parameters seem to converge at almost every latitude to a value which is lower (≈−0.06) than the theoretical shape parameter (). Furthermore, in the case of the zonally averaged and average mid-latitude observables, we cannot detect any convergence at all. The inconsistency of our numerical results with the theory is, in fact, unsurprising given the qualitative properties of the system in the low forcing regime, which do not resemble characteristics of Axiom A systems, at least on the finite time scales we are able to explore based on the available data.

Our results show that with increasing block size or threshold the shape parameters of the GEV and GP distributions are becoming more and more similar, according to the asymptotic equivalence of the two models [12, 23]. Both methods show us basically the same picture regarding the statistical properties of extreme events. Despite the mentioned similarities, we observe also some differences between the two approaches. The convergence to the limiting distribution seems to be somewhat faster in the case of the POT approach. This is in agreement with the well-established fact that the POT approach produces often more accurate predictions in the case of applications [12, 32]. Despite the faster convergence, however, the best GP shape parameter estimates (defined in Section 4.1) do not approximate more accurately than the best GEV shape parameter estimates. Therefore, the advantage of the POT approach compared with the BM approach is irrelevant in the case of very long time series.

We use the Kolmogorov-Smirnov test (KS-test) to verify the fit of the GEV (GP) distribution to the distribution of extremes, selected as block maxima (threshold exceedances). Our results show that the KS-test is merely an indicator of the fit quality and does not show whether the convergence to the correct GEV (GP) distribution is reached or not. The KS-test suggests a good fit to the GEV (GP) distribution even in the cases when the distance between the estimated and the asymptotic shape parameter is substantial and even if no convergence can be detected. The misleading property of values of the KS and Pearson’s chi-squared tests was also pointed out in previous studies in the case of more simple systems [8, 9]. In this work, we estimate the GEV and GP parameters performing maximum likelihood estimation [12], but it would be relevant to find out to what extent other estimation procedures, like the L-moments [44] or probability-weighted moments methods [45], would change the results.

Concluding, we would like to emphasise some key messages one can get from our results:(i)Indeed, we have been able to find the signature of the universal properties of the extremes of physical observables in strongly chaotic dynamical systems, as predicted in the case of Axiom A systems. Nonetheless, given the availability of very long yet finite time series, we have been able to find more convincing results (yet with a relatively large uncertainty) only for specific observables, because in the case of observables featuring serial correlations it is extremely hard to collect robust statistics of extremes.(ii)We have observed that in the case of strong forcing the estimate of the shape parameter increases monotonically towards its asymptotic value for stricter and stricter criteria of selection of extremes. This corresponds to the fact that we manage to collect more detailed information on the local properties of the attractor and of the measure supported on it near the point of absolute maximum of the observable, and thus we explore all the dimensions of the attractor.(iii)We also remark that agreement of the results with the theory of extremes of observables of dynamical systems developed in the context of Axiom A flows cannot be found in the case of the weakly chaotic flow featuring regime behaviour and strong spatial and temporal correlations, as these features suggest strong deviations from the conditions behind the chaotic hypothesis. Note that conceptually analogous results had been reported in a simple model in [9].(iv)We note that the predicted and estimated shape parameters are extremely small so that the statistics of extremes is virtually indistinguishable, up to ultra-long return periods, from what would be predicted by a Gumbel distribution (), which emerges as the statistical model of reference for physical extremes in high-dimensional chaotic systems, and corresponds in the case of fluids to the existence of a well-developed turbulent state.(v)We conclude by noting that in some cases of great practical relevance one finds results in contradiction with the basic tenets of the theory of extremes of dynamical systems, suggesting that one should never find block maxima distributed according to the Fréchet distribution, which allows for arbitrarily large extremes. The precipitation, as opposed to geophysical fields like temperature or pressure, is a nonsmooth intermittent field with multifractal properties in space and time [46, 47], as a result of the complex chain of multiscale physical processes ranging from large scale water vapour transport on scales of m and days, convection occurring on scales of m and hours, and phase transitions occurring at microscopic level and on ultrashort time scales. As a result of the intermittency of the rainfall, a very large (yet finite) amount of water can precipitate at a specific location, with little or no precipitation occurring nearby, as in the case of localised intense thunderstorms. Instead, extremely large anomalies of temperature or pressure cannot be reached as the climate has efficient mechanisms to dissipate them via, for example, waves. Indeed, the analysis of block maxima of rain gauge readings shows many cases where the Fréchet distribution appears as the optimal model [48, 49]. This is a result of the fact that in order to observe the actual physical limit of rainfall one should observe the system for an impossibly long time and that closed physical budgets exist locally (and on a finite spatial domain) for the water balance (involving evaporation from surface and horizontal convergence of water transport), not in the precipitation per se. One can expect that such an anomalous behaviour is reduced if one chooses a smoother, better defined observable, such as the spatial average of precipitation over a region.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors would like to thank Sebastian Schubert, Christian Franzke, Maida Zahid, and Richard Blender for useful discussions. The authors are indebted to Sebastian Schubert for his support in performing some simulations and providing the code for computing the Lyapunov Exponents and Kaplan-Yorke dimensions. Valerio Lucarini acknowledges the many exchanges on these topics with Davide Faranda, Antonio Speranza, and Renato Vitolo. Valerio Lucarini also acknowledges support received from the Sfb/Transregio Project TRR181 and from the StG-ERC Project NAMASTE (Grant No. 257106). Valerio Lucarini and Tamás Bódai are grateful for support from the CRESCENDO Project (Grant no. 641816). Vera Melinda Gálfi acknowledges funding from the International Max Planck Research School on Earth System Modelling (IMPRS-ESM).