Abstract
The ensemble-based Kalman filter requires a very large ensemble (e.g., 10,000 members) to identify relevant error covariances at great distances for multidimensional geophysical systems. However, the limited ensemble sizes affordable in practice introduce sampling errors. This study proposes a covariance localization (CL) scheme based on a modified Cholesky decomposition, namely the covariance localization scheme with modified Cholesky decomposition (CLMC). Our main idea is to utilize a modified Cholesky (MC) decomposition technique to estimate the background error covariance matrix; meanwhile, we employ a truncated singular value decomposition of the background error covariance to improve the ensemble increment and avoid imbalance in the system. To verify that the proposed method can effectively mitigate sampling errors, numerical experiments are conducted on the Lorenz-96 model and on a large-scale model (the SPEEDY model). The results show that the CLMC method outperforms the CL method for different data assimilation parameters (ensemble sizes and localizations). Furthermore, one year of assimilation experiments on the SPEEDY model shows that the 1-day forecast RMSEs obtained by CL are approximately equal to the 5-day forecast RMSEs of CLMC, so the CLMC method has potential advantages for long-term forecasting. The proposed CLMC method therefore shows good prospects for widespread application in atmospheric general circulation models.
1. Introduction
Data assimilation (DA) is the combination of observations with “prior knowledge” (e.g., numerical models and model outputs) to achieve an estimate of the true state of a system (cf. Katzfuss et al. [1]). In sequential data assimilation, the traditional Kalman filter (KF; Kalman and Bucy [2]) is used for the estimation and prediction of linear problems. For complex multidimensional systems, the ensemble Kalman filter (EnKF; Evensen [3]), derived from merging the linear estimation theory of the Kalman filter with nonlinear Monte Carlo estimation, has been proposed and is widely applied in large-scale models (cf. Evensen and Van Leeuwen [4]). However, the number of ensemble members is always much smaller than the number of model components, so sampling errors cannot be avoided. Sampling errors can introduce numerous other problems (e.g., inbreeding and filter divergence) into the ensemble filter. Inbreeding describes the underestimation of the analysis error covariance (cf. Furrer and Bengtsson [5]), and filter divergence occurs when the analysis error distribution moves away from the truth, which can be caused by underestimating the background error covariance. Such problems can be addressed with the covariance inflation method, which multiplies the background error covariance by an inflation factor and prevents the underestimation of the covariance (cf. Anderson and Anderson [6]). However, the size of the inflation factor depends on many factors, including the type of filter and the dynamical system used (cf. Hamill et al. [7]). Thus, covariance inflation does not address all problems, especially the spurious long-range correlations between distant observations and model components that arise in the EnKF. It is common to use localization schemes to remove such spurious correlations.
Localization of the covariance matrix in the EnKF is necessary for large-scale models because traditional ensembles are limited to only 10–100 states, which is considerably less than the degrees of freedom of the model. Localization methods are typically implemented by taking the Schur product of the sample covariance with a correlation matrix or localization function; such methods include covariance localization (CL; Hamill et al. [7]) and observation localization (OL; Hunt et al. [8]). Alternatively, the domain is decomposed, as in domain localization (DL; Houtekamer and Mitchell [9]), and a separate analysis is conducted for each subdomain. The CL method is commonly employed as an ad hoc treatment of the spurious correlation problem. The OL and DL methods are frequently combined to weight observations according to their distance. Additionally, other localization techniques, such as local analysis (LA; Sakov and Bertino [10]), background error CL (BL; Greybush et al. [11]), and observation error CL (RL; Greybush et al. [11]), have been devised. Although such localization can be effective in avoiding spurious correlations and limiting the negative effects of sampling errors, it simultaneously introduces imbalances into the system (cf. Kepert [12]). In particular, covariance localization is a major driver of potential imbalance in the EnKF. Efforts are being made to find approaches that avoid imbalance in the system. The imbalance is primarily produced by the computation of analysis increments. The incremental analysis update (IAU) procedure is used to prevent imbalance generation during the computation of the local increments (cf. Kleist and Ide [13]). In addition, it is possible to avoid imbalances by exploiting a shorter assimilation window (Houtekamer and Zhang [14]).
In the localization scheme framework, the localization parameters (i.e., the localization length and the weighting function), key indicators of the localization scheme, are often manually configured. These parameters are unknown a priori and can usually be found only by trial and error. Kirchgessner et al. [15] showed that the optimal localization radii of DL and OL are linearly related to the local observation dimension, and that OL performs better than DL. With higher-resolution models, the use of a narrower localization radius reduces the sampling error, but it limits the use of observations. Miyoshi and Kondo [16] proposed a multiscale localization (500 km and 900 km) approach for the EnKF to find analysis increments in independent sub-observation spaces. However, it incurs a large computational cost when the number of available observations is large. In addition, to preserve the larger-scale structures from a limited ensemble size, Kondo et al. [17] employed a dual-localization approach to analyze small-scale and large-scale analysis increments separately using spatial smoothing. For complex geophysical models, although the impact of the observations decreases with longer localization ranges, some issues of localization require brief discussion. First, distance-based localization methods employ unimodal segmentation functions (e.g., GC functions) to construct the corresponding matrices, which have difficulties in “localizing” nonlocal observations (Bai et al. [18]). These observations exhibit multimodal correlations with model components (cf. Fertig et al. [19]; De La Chevrotière and Harlim [20]). Furthermore, in some cases, the model components or the corresponding observations may not have correlated physical locations, which may run against the principles of distance-based methods (cf. Bai et al. [21]).
Fuzzy control functions have been used to yield more precise observation weights in place of localization functions (e.g., GC functions) (cf. Mingheng et al. [22]; Mingheng et al. [23]). For an idealized intermediate atmospheric general circulation model (AGCM), the negative effects of localization could be avoided with a sufficiently large ensemble size. Miyoshi et al. [24] showed that using large ensemble sizes (up to 10,240 members) could capture long-range error correlations in the implementation of the local ensemble transform Kalman filter (LETKF; Miyoshi et al. [25]); however, preventing memory overflow becomes essential when employing such large ensembles. Kondo and Miyoshi [26] modified the EnKF code to run without the covariance localization prerequisite and found that using 10,240 ensemble members also reveals the long-range structural features of the background error covariance matrix. The aforementioned studies commonly considered the effect of horizontal scales on assimilation. Farchi and Bocquet [27] considered a realistic extension of the LEnSRF using covariance localization in the vertical direction. Wang et al. [28] introduced a multiscale local gain form ensemble transform Kalman filter (MLGETKF) to allow simultaneous updates of multiple scales for the forecast ensemble mean and perturbations by assimilating all observations at once. Recently, the Cholesky decomposition method has been applied to the estimation of the forecast error covariance matrix (cf. Kang and Deng [29]).
In the abovementioned studies, the use of a multiscale localization scheme or of large ensembles (e.g., 10,000 members) has so far been necessary to mitigate sampling errors. However, there are few investigations into the estimation of the forecast error covariance matrix itself. Here, a covariance localization scheme with modified Cholesky decomposition (CLMC) for estimating the background error covariance is proposed. The effectiveness of the proposed CLMC method is examined with the Lorenz-96 and SPEEDY models. The article is organized as follows: Section 2 reviews the ensemble Kalman filter, covariance localization, and modified Cholesky decomposition, and presents the proposed covariance localization with modified Cholesky decomposition. Section 3 describes the configuration of the twin models and analyzes the experimental results. Section 4 provides the discussion and conclusions.
2. Background
2.1. Ensemble Kalman Filter (EnKF)
The EnKF is an ensemble-based Kalman filtering scheme proposed by Evensen [3]; it has since evolved into many forms (cf. Tippett et al. [30]; Houtekamer and Mitchell [9]; Evensen and Van Leeuwen [4]; Evensen [31]; Burgers et al. [32]; Anderson [33]). These methods can be roughly classified into two categories: deterministic filters (cf. Burgers et al. [32]) and stochastic filters (cf. Whitaker and Hamill [34]). The main difference between them is whether or not perturbed observations are introduced to maintain the correct statistics of the filter.
The implementation of the EnKF updates the a priori estimate of the atmosphere valid at time $t$ with the information in the new observation to reach the latest estimate of the atmosphere (equation (1)). For this purpose, the Kalman gain matrix is used to dispatch appropriate weights between the observation error covariance and the background error covariance (equation (2)). In all equations, the superscripts $a$ and $b$ refer to the analysis and background variables, respectively. The observation operator $H$ denotes the mapping from the model space to the observation space. The model operator $M$ denotes the transfer of the estimate to the next analysis time (equation (3)):

$$x^a_t = x^b_t + K_t \left( y_t - H x^b_t \right), \tag{1}$$

$$K_t = P^b_t H^T \left( H P^b_t H^T + R \right)^{-1}, \tag{2}$$

$$x^b_{t+1} = M\left( x^a_t \right), \tag{3}$$

where $N$ denotes the ensemble size and $t$ represents the atmosphere valid time. For the Kalman gain in equation (2), the ensemble is usually used to approximate the estimates $P^b_t H^T$ and $H P^b_t H^T$ (cf. Houtekamer and Mitchell [35]; Ghil and Malanotte-Rizzoli [36]). In the subsequent equations, we omit the time index for ease of notation.
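As a minimal illustration of equations (1) and (2), a stochastic EnKF analysis step can be sketched in Python; the toy dimensions, variable names, and use of NumPy are our own assumptions, not part of the original formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_analysis(Xb, y, H, R):
    """Stochastic EnKF analysis: update each background member with a
    perturbed observation, using the ensemble-estimated Kalman gain."""
    n, N = Xb.shape
    A = (Xb - Xb.mean(axis=1, keepdims=True)) / np.sqrt(N - 1)  # anomalies
    HA = H @ A
    K = A @ HA.T @ np.linalg.inv(HA @ HA.T + R)  # P^b H^T (H P^b H^T + R)^-1
    # Perturbed observations, one draw from N(y, R) per member.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(y.size), R, size=N).T
    return Xb + K @ (Y - H @ Xb)
```

For a scalar state with background variance 1 and observation variance 1, the analysis mean is pulled roughly halfway toward the observation, as equation (2) predicts.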
For the stochastic EnKF, small spurious correlations between the ensemble background and the observations lead the analysis to move away from the truth. Whitaker and Hamill [34] introduced the deterministic ensemble square root filter (EnSRF). However, when the EnSRF computes the analysis error covariance with a relatively small ensemble, the Kalman gain is susceptible to contamination by sampling error, making it likely that the true analysis error is much larger than the ensemble-estimated one. The ensemble may become too concentrated, to the detriment of the assimilation results one would expect. It is common to use covariance inflation or covariance localization to compensate for this particular problem. However, covariance inflation alone does not address all problems associated with undersampling; covariance localization (CL) is essential for data assimilation.
There are two reasons for using covariance localization in the EnKF, usually listed in the literature as “spurious covariance” and insufficient ensemble rank (cf. Sakov and Bertino [10]). “Spurious covariance” means that the limited ensemble size produces nonzero sample covariances between elements of the state vector that are remote from each other and physically unconnected. It can lead to excessive reduction of the state error covariance in the analysis, resulting in ensemble collapse and filter divergence. Insufficient ensemble rank denotes that the ensemble size is smaller than the number of model components. Therefore, the proper use of the ensemble size is one of the most basic rules in the design of EnKF systems.
2.2. Covariance Localization (CL)
The CL method modifies and updates equation (2) by multiplying a distance-based correlation coefficient matrix elementwise with the background error covariance (the Schur product). It is one of the localization methods applied to large-scale systems. The core idea behind this operation is to modify each element of the background error covariance so as to increase the rank of the covariance matrix.
However, the localized background error covariance matrix can hardly be computed explicitly except for small-scale systems. Instead, if we have enough observations, we can approximately compute the terms $P^b H^T$ and $H P^b H^T$ in equation (2) from the ensemble. Algorithm 1 summarizes the analysis step of the resulting generic CL method in detail.

In Algorithm 1, the ensemble anomalies are formed around each member’s position relative to the observations, the observation operator with all rows set to zero except the current one selects a single observation, and the loop runs over the observation members. Although the introduction of covariance inflation and localization techniques can overcome these problems in the EnKF, they are restricted to acting on the background error covariance matrix, and the ensemble size remains finite. Therefore, the local sample covariance matrix may still be rank-deficient. To avoid this problem, the background error covariance matrix is estimated directly by a modified Cholesky decomposition, and a sparse approximation replaces the sample covariance. The aim is to avoid limiting the ensemble size as well as to prevent memory overflow.
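The Schur-product localization of the gain can be sketched as follows; the hard-cutoff correlation matrix used here is a deliberately simple stand-in for the taper functions discussed later, and the dimensions are illustrative:

```python
import numpy as np

def localized_gain(Xb, H, R, rho):
    """Kalman gain from ensemble anomalies, with a Schur (elementwise)
    product between the sample covariance and a localization matrix rho."""
    N = Xb.shape[1]
    A = (Xb - Xb.mean(axis=1, keepdims=True)) / np.sqrt(N - 1)
    Pb_loc = rho * (A @ A.T)          # Schur product raises the rank
    return Pb_loc @ H.T @ np.linalg.inv(H @ Pb_loc @ H.T + R)

# Cutoff localization: correlations beyond 2 grid points are zeroed.
n, N = 10, 5
rng = np.random.default_rng(1)
Xb = rng.standard_normal((n, N))
d = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
rho = (d <= 2).astype(float)
H = np.zeros((1, n))
H[0, 0] = 1.0                          # observe component 0 only
K = localized_gain(Xb, H, np.eye(1), rho)
```

Components farther than the cutoff from the observed point receive exactly zero gain, which is how localization suppresses spurious long-range increments.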
2.3. Modified Cholesky Decomposition
The accuracy of the background error covariance estimate has been the core problem for covariance localization schemes using limited ensemble sizes. Recently, some researchers have employed the Cholesky decomposition to improve the accuracy of background error covariance estimates (cf. Ueno and Tsuchiya [37]; Nino-Ruiz et al. [38]; Kang and Deng [29]; Boyles and Katzfuss [39]). In this regard, we estimate the inverse of the sparse background error covariance matrix by using a modified Cholesky decomposition. The Cholesky decomposition is also known as the square root decomposition method; its core principle is to represent a symmetric positive definite matrix as the product of a lower triangular matrix and its transpose. Consider $N$ (approximately Gaussian) samples $x^{[1]}, \dots, x^{[N]}$ of random vectors with moments $x^{[k]} \sim \mathcal{N}(0, B)$ for $1 \le k \le N$, where $x_i$ indicates the $i$th model component across all samples. The modified Cholesky decomposition is derived by regressing each component on its preceding components (Nino-Ruiz [40]).
Then, the estimate of $B^{-1}$ can be computed as follows:

$$\hat{B}^{-1} = L^{T} D^{-1} L, \tag{5}$$

$$\hat{B} = L^{-1} D L^{-T}, \tag{6}$$

where $L$ is a unit lower triangular matrix holding the negative regression coefficients and $D$ is a diagonal matrix of residual variances. For equations (5) and (6), the eigenstructure of $\hat{B}^{-1}$ depends on the structure of $L$. Based on the structural features of $L$, sparse estimates of $\hat{B}^{-1}$ are obtained by imposing zeros on the elements of $L$. The main zero components in $L$ are the following: if two model components are conditionally independent for a given localization radius $r$, the corresponding elements in $L$ are zero. Furthermore, based on the CL approach, the conditional independence of background errors between different model components can be achieved by using local neighborhoods (cf. Ueno and Tsuchiya [37]). For a given model component, its neighborhood is formed by the components within the localization radius $r$. If a model component exceeds this distance, the correlation between the corresponding background errors is set to zero.
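A sketch of the estimate in equations (5) and (6), with ordinary least squares standing in for the truncated-SVD regression used by the authors (a simplifying assumption of ours):

```python
import numpy as np

def modified_cholesky(X, radius):
    """Estimate B^{-1} = L^T D^{-1} L from an n x N sample matrix X.
    Row i of the unit lower triangular L holds minus the coefficients of
    the regression of component i on its predecessors within `radius`;
    the diagonal D holds the residual variances."""
    n, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)            # anomalies
    L = np.eye(n)
    d = np.empty(n)
    d[0] = A[0] @ A[0] / (N - 1)
    for i in range(1, n):
        pred = [j for j in range(i) if i - j <= radius]
        Z = A[pred]                                   # predecessor anomalies
        beta, *_ = np.linalg.lstsq(Z.T, A[i], rcond=None)
        resid = A[i] - beta @ Z
        L[i, pred] = -beta
        d[i] = resid @ resid / (N - 1)
    return L, d                # B_inv = L.T @ np.diag(1/d) @ L
```

With a radius covering all predecessors and $N > n$, $L^T D^{-1} L$ reproduces the inverse of the sample covariance exactly; a smaller radius yields the sparse, localized estimate.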
2.4. Covariance Localization with the Modified Cholesky Decomposition
This section presents the estimation of the covariance matrix with the modified Cholesky decomposition for covariance localization (CLMC). First, according to equation (5), the structure of $\hat{B}^{-1}$ depends on the structure of $L$ when estimating the background error covariance. If we assume that the correlation between model components is local and that there is no correlation outside the localization radius $r$, we obtain a lower triangular sparse estimate of $L$, and $\hat{B}^{-1}$ is sparse and localized. According to the literature (cf. Nino-Ruiz et al. [38]; Boyles and Katzfuss [39]), the regression for each model component can only involve its predecessors, so the model components must be ordered (labeled) before computing $L$. For the grid model, we mainly considered column order.
The estimation procedure for $L$ is as follows:
1. For the $i$th model component, we use the truncated singular value decomposition method to estimate the coefficients $\beta_{ij}$ of the regression on its local predecessors:

$$x_i = \sum_{j \in P(i,r)} \beta_{ij}\, x_j + \varepsilon_i, \tag{7}$$

where $P(i,r)$ represents the predecessor indexes of model component $i$ when the localization radius is $r$, and $x_j$ denotes the $j$th model component, which precedes $x_i$ under the condition $j \in P(i,r)$. Figure 1(a) shows the local predecessors and neighborhood of model element 28 when column order is employed. $\varepsilon_i$ represents the regression error of the $i$th model component, and the residual variances form the diagonal matrix

$$D = \operatorname{diag}\left(\operatorname{var}(\varepsilon_1), \dots, \operatorname{var}(\varepsilon_n)\right). \tag{8}$$

The coefficients $\beta_{ij}$ can be calculated by solving the least-squares problem of minimizing the squared residual norm of equation (7) over the ensemble; the simplest solution is given by the pseudoinverse of the predecessor anomaly matrix.
2. Create the sparse matrix $L$ from the regression coefficients: $L_{ii} = 1$, $L_{ij} = -\beta_{ij}$ for $j \in P(i,r)$, and zero elsewhere.
The residuals $\varepsilon_i$ in equation (8) are never exactly zero, and they implicitly provide covariance inflation. Explicit covariance inflation may therefore not be required for the CLMC assimilation scheme.
Once $\hat{B}$ is estimated, we present in the next section how to avoid the imbalance effect when working with it. Here, the analysis formulation is as follows:

$$X^a = X^b + \left(\hat{B}^{-1} + H^T R^{-1} H\right)^{-1} H^T R^{-1} \left(Y^s - H X^b\right), \tag{9}$$

where $Y^s = [\,y + \varepsilon^{[1]}, \dots, y + \varepsilon^{[N]}\,]$, with $\varepsilon^{[k]} \sim \mathcal{N}(0, R)$ for $1 \le k \le N$, is the matrix of perturbed observations. For equation (9), we can obtain an equivalent form by using the basic matrix decomposition identities:

$$X^a = X^b + \hat{B} H^T \left(H \hat{B} H^T + R\right)^{-1} \left(Y^s - H X^b\right), \tag{10}$$

where $N$ is the number of ensemble members. The term $Y^s - H X^b$ in equation (10) represents the innovation with respect to the perturbed observations. Equation (10) is well known as the EnKF dual formulation, and equation (9) is the incremental form of the primal formulation. Using equation (10), we conclude that the computational effort of the method is bounded accordingly (equation (11)).
2.5. Balance and Truncated Singular Value Decomposition (SVD)
In the above section, we introduced the CL method with the modified Cholesky decomposition. However, covariance localization is a major source of significant potential imbalance in the EnKF, especially for short localization distances. Thus, how to effectively reduce the sources of imbalance has become one of the key issues for data assimilation. There are several main ways to reduce the sources of imbalance, such as modifying the initial spin-up procedure and the incremental analysis update (IAU) (cf. Bloom et al. [41]; Kleist and Ide [13]). In practical operations, modifying the initial spin-up procedure or applying the IAU has so far been essential to mitigate imbalance in the system. In our localization scheme, we use the sparse background error covariance $\hat{B}$ as an alternative to the sample covariance; the distribution of the model components and the analyzed point in the spatial dimensions is shown in Figures 1(b) and 1(c), which give schematic diagrams of the observations used in the localization scheme around a given grid point. For efficiency, we compute the distances only for observations near the analyzed point. Valid local observations reside within the circle of the local range radius $r$, which is determined according to this distance. The physical distance between the analyzed point and the observation position is calculated before the state estimate is updated. The closer an observation is to the analyzed point, the greater its weight, and vice versa.
Although we use a modified Cholesky decomposition to estimate $\hat{B}$, nearby model components still exhibit strong correlation; the correlation is retained within a defined distance but eliminated beyond it. However, for high-dimensional systems, the localized covariance becomes (nearly) full rank, and we have to deal with a very large covariance matrix (even exploiting symmetry, it requires $n(n+1)/2$ elements). One does not want to represent such a matrix in memory, and it is preferable to formulate the filter in some matrix-free manner. Most matrix-free formulations rely on one of two things: domain decomposition or the Schur product. Here, we employ the truncated SVD to avoid representing such a matrix explicitly in memory.
Suppose that we have a truncated SVD of $\hat{B}$ given by the following equation:

$$\hat{B} \approx U_k \Sigma_k U_k^{T}, \tag{12}$$

where $U_k$ is an $n \times k$ matrix with orthonormal columns, $\Sigma_k$ is a $k \times k$ diagonal matrix of the leading singular values, and $k \le \min(n, N)$. Because $\hat{B}$ is symmetric positive definite, equation (12) is a truncated singular value decomposition. In the truncated SVD scheme, if the ensemble size is smaller than the number of local observations, the local ensemble perturbations are manipulated in the ensemble space; in the opposite case, they are manipulated in the observation space. Let $Z$ be an $n \times (N+1)$ matrix such that

$$\hat{B} = \frac{1}{N} Z Z^{T}. \tag{13}$$

Note that $Z$ represents ensemble perturbations, which we call the “augmented ensemble” for simplicity, and its columns span the ensemble increments. Then, $Z$ is a solution to equation (12) with $N+1$ perturbations. For equation (13), how can we efficiently construct $Z$ from the original $n \times N$ perturbation matrix $W$, whose empirical covariance is

$$\hat{B} = \frac{1}{N-1} W W^{T}? \tag{14}$$

We want to construct a matrix $Z$ that has the same empirical covariance matrix and which is centered. Since $Z$ has rank at most $N$, we need to find a matrix $Z$ such that

$$\frac{1}{N} Z Z^{T} = \frac{1}{N-1} W W^{T}, \tag{15}$$

$$Z\, \mathbf{1}_{N+1} = 0. \tag{16}$$

For $v = \mathbf{1}_{N+1}/\sqrt{N+1} - e_1$, we define $\lambda$ (a multiplicative inflation factor, $\lambda = 1$ when no inflation is applied) and the $(N+1) \times (N+1)$ Householder matrix $U$ whose coefficients are as follows:

$$U = I_{N+1} - \frac{2}{v^{T} v}\, v v^{T}. \tag{17}$$

It can be easily checked that $U U^{T} = I_{N+1}$ (i.e., $U$ is an orthogonal matrix) and $U(\mathbf{1}_{N+1}/\sqrt{N+1}) = e_1$, the first basis vector.

Let $\tilde{Z}$ be the $n \times (N+1)$ matrix whose first column is zero and whose other columns are those of $\lambda \sqrt{N/(N-1)}\, W$, that is,

$$\tilde{Z} = \lambda \sqrt{\frac{N}{N-1}}\, \begin{bmatrix} 0 & W \end{bmatrix}, \qquad Z = \tilde{Z}\, U.$$

By construction, $Z$ is a solution of equations (15) and (16).
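The construction of the centered augmented ensemble can be sketched as follows; a Householder reflection is used here as one concrete choice of the orthogonal matrix, and the normalization is our assumption:

```python
import numpy as np

def augmented_ensemble(W, lam=1.0):
    """Build N+1 centered perturbations Z with the same empirical covariance
    as the N columns of W, optionally scaled by a multiplicative inflation
    factor lam. A Householder reflection plays the role of the orthogonal
    matrix U."""
    n, N = W.shape
    # Householder matrix mapping ones/sqrt(N+1) to the first basis vector.
    v = np.full(N + 1, 1.0 / np.sqrt(N + 1))
    v[0] -= 1.0
    U = np.eye(N + 1) - 2.0 * np.outer(v, v) / (v @ v)
    # First column zero; rescale so that Z Z^T / N = W W^T / (N - 1).
    Zt = lam * np.sqrt(N / (N - 1)) * np.hstack([np.zeros((n, 1)), W])
    return Zt @ U
```

Because $U$ is orthogonal, the product leaves $Z Z^T$ unchanged while rotating the zero column into the direction of the ones vector, which centers the augmented ensemble.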
The study of covariance localization is mainly limited to the effect of each observation on the analyzed point within a fixed horizontal distance and a fixed vertical distance. In addition, the model components within the background covariance matrix are correlated with each other. Therefore, in the context of the sequential assimilation scheme, the elements of the Kalman gain matrix are set to zero beyond these distances (cf. Houtekamer and Zhang [14]; equation (20)).
The observations around a grid point are given different weights based on their physical distance from that grid point. These weights are mainly implemented through the inverse of the Schur product of the observation covariance matrix and the correlation coefficient matrix (cf. Hunt et al. [8]). The traditional localization function is an adjustable function whose shape is similar to the probability density function of a normal distribution. In addition, a Gaussian function can be fitted by the widely used Gaspari-Cohn (GC) function with compact support (cf. Gaspari and Cohn [42]), where the observations inside the localization region are given different observation weights and the GC function is zero outside this region. The step function is as follows:

$$\rho(d) = \begin{cases} 1, & d \le c, \\ 0, & d > c, \end{cases} \tag{19}$$

where $c$ denotes the adjustable cutoff radius and $d$ the distance. For observation weighting, there are mainly weights inside the observation region and weights outside the observation region (usually set to zero). Therefore, equation (19) is also considered domain localization (DL) (cf. Kirchgessner et al. [15]).
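For reference, the Gaspari-Cohn fifth-order piecewise rational function with compact support can be coded as follows (the vectorized NumPy form is our own):

```python
import numpy as np

def gaspari_cohn(d, c):
    """Gaspari-Cohn compactly supported correlation function:
    1 at distance 0, smoothly decaying, exactly 0 beyond 2c."""
    r = np.abs(np.asarray(d, dtype=float)) / c
    w = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r < 2.0)
    # -1/4 r^5 + 1/2 r^4 + 5/8 r^3 - 5/3 r^2 + 1         for 0 <= r <= 1
    w[m1] = (((-0.25 * r[m1] + 0.5) * r[m1] + 0.625) * r[m1]
             - 5.0 / 3.0) * r[m1] ** 2 + 1.0
    # 1/12 r^5 - 1/2 r^4 + 5/8 r^3 + 5/3 r^2 - 5r + 4 - 2/(3r)  for 1 < r < 2
    w[m2] = ((((r[m2] / 12.0 - 0.5) * r[m2] + 0.625) * r[m2]
              + 5.0 / 3.0) * r[m2] - 5.0) * r[m2] + 4.0 - 2.0 / (3.0 * r[m2])
    return w
```

The two polynomial branches join continuously at $r = 1$ (both equal $5/24$), which is what makes the taper smooth in practice.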
3. Twin Models and Experimental Settings
3.1. Experimental Settings for the Lorenz-96 Model
For the simple 40-dimensional Lorenz-96 model (cf. Lorenz and Emanuel [43]), the model parameters are chosen as follows: to ensure sufficient integration frequency for each assimilated variable in the model, all 40 grid points are observed. We set the time step to 0.05 (corresponding to 6 h of atmosphere valid time), and this 0.05 time step is labeled as one DA step. To introduce more observation error in a more challenging way, the observation error covariance is set to $R = 3.05\,I$. The observation error variance of 3.05 is selected to validate the assimilation performance of CL and CLMC, mainly to test whether the CLMC method still achieves better DA performance when larger observation errors are introduced. The Lorenz-96 model state $x = (x_1, \dots, x_{40})$ evolves according to the following equation:

$$\frac{dx_j}{dt} = \left(x_{j+1} - x_{j-2}\right) x_{j-1} - x_j + F, \qquad j = 1, \dots, 40,$$

with cyclic boundary conditions ($x_{j+40} = x_j$) and forcing term $F$.
Here, the default ensemble size is 25 and the number of observation members is 20. The input parameters (e.g., inflation factor, localization radius, and ensemble size) are also considered. The initial conditions are generated from the filter’s ensemble members. The model was first integrated forward over several years of real-world time without assimilating any variables. The initial forecast ensemble members are similar except for the imposition of additional Gaussian noise. The true values are produced by randomly adding small perturbations that excite the instability. In this case, the initial ensemble members move away from the model attractor, and there is no clear relationship between the spread and the error. The spin-up period of one and a half years is sufficient to eliminate any unfavorable effects (cf. Hoteit et al. [44]). A constant multiplicative inflation factor (cf. Anderson and Anderson [6]) is used to enlarge the ensemble spread and prevent filter divergence. The default forcing strength is $F = 8$. A set of data assimilation experiments for each method is performed for 365 days, using the default settings and different assimilation parameters (inflation and localization). The manually tuned optimal parameters are shown in Table 1.
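The Lorenz-96 model and its time stepping can be sketched as follows; the fourth-order Runge-Kutta integrator is our choice, since the text does not specify one:

```python
import numpy as np

def lorenz96_tendency(x, F=8.0):
    """dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """One RK4 step; dt = 0.05 corresponds to about 6 h of 'weather'."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

The constant state $x_j = F$ is a (unstable) fixed point of the model, so a spin-up run is typically started from this state plus small perturbations, as described above.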
3.2. Experimental Settings for the SPEEDY Model
The moderately complex SPEEDY model (cf. Ritchie [45]; Miyoshi [46]) mimics realistic weather dynamics. Its assimilation variables comprise temperature, the zonal wind component, the meridional wind component, specific humidity, surface pressure, and precipitation (cf. Bourke [47]). It has eight vertical levels in the sigma coordinate system. The SPEEDY model creates an output file every 6 hours for weather prediction (cf. Greybush et al. [11]); a more detailed description of the SPEEDY model is available in Molteni [48]. The initial ensemble of states is generated randomly by the model with a spin-up period of 1 year (cf. Kondo and Miyoshi [26]). The nature run starts from a stationary standard atmosphere (i.e., no wind) at 0000 UTC 1 January. The total number of model components far exceeds the ensemble size in all filter cases.
For the SPEEDY model, identical-twin observations are generated by adding Gaussian-distributed random numbers to the nature run every 6 h at radiosonde-like locations for 8 levels (100 hPa, 200 hPa, 360 hPa, 520 hPa, 800 hPa, 850 hPa, 880 hPa, 925 hPa). The standard deviations of the observation errors for the zonal wind, temperature, specific humidity, and surface pressure are fixed at 1.0 ms^{−1}, 1.0 K, 1.0 gkg^{−1}, and 1.0 hPa, respectively.
3.3. Validation Statistics
If there are $n$ model components, the root mean square error (RMSE; cf. equation (18) in Bai et al. [21]) and the ensemble spread (cf. equation (22) in Hoteit et al. [44]) are applied to evaluate the assimilation performance:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\bar{x}^a_i - x^t_i\right)^2}, \qquad \mathrm{Spread} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \sigma_i^2},$$

where the subscript $i$ denotes the index of the state components and $k$ refers to the index of the ensemble members; $\bar{x}^a_i$ and $x^t_i$ denote the $i$th analyzed (ensemble mean) value and the true value, respectively; and $\sigma_i^2 = \frac{1}{N-1} \sum_{k=1}^{N} \left(x^a_{i,k} - \bar{x}^a_i\right)^2$ denotes the ensemble variance of component $i$. The analysis quality of the assimilation algorithm can be measured more comprehensively by averaging the computations in space. Usually, better assimilation results correspond to smaller RMSEs.
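The two diagnostics can be computed directly from their definitions:

```python
import numpy as np

def rmse(xa_mean, x_true):
    """Root mean square error of the analysis ensemble mean over n components."""
    return np.sqrt(np.mean((xa_mean - x_true) ** 2))

def ensemble_spread(Xa):
    """Square root of the spatially averaged ensemble variance (Xa: n x N)."""
    return np.sqrt(np.mean(np.var(Xa, axis=1, ddof=1)))
```

In a well-tuned filter, the spread should roughly match the RMSE; a spread much smaller than the RMSE is the signature of the inbreeding problem discussed in the introduction.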
3.4. Experimental Results for the Lorenz96 Model
This section analyzes the assimilation results of the CL and CLMC algorithms using the Lorenz-96 model. Sensitivity experiments are conducted on the CL and CLMC algorithms by configuring different localization scales with the default ensemble size, the fixed observation members, the forcing strength, and the manually tuned optimal inflation value (Table 1). To improve the reliability of the experiments, each experiment was repeated twice, and the mean value was applied as the evaluation criterion for each algorithm.
The power spectral density (PSD; cf. Sakov and Bertino [10]) criterion is used to test the performance of the CL and CLMC algorithms. Figure 2 shows how the PSD of the CL and CLMC algorithms changes with different localization scales. The blue curve “Forecast” represents the Fourier spectrum of the ensemble singular values of the forecast values, and the green curve “CL” and the red curve “CLMC” represent the Fourier spectra of the ensemble singular values of the analysis values. The vertical coordinate indicates the PSD of the ensemble mean, and the area under each curve represents the variance.
Figure 2 shows the influence of the CL and CLMC algorithms on the PSD for different localization radii. Here, the default ensemble size, observation fraction, observation error variance, forcing strength, and the optimized inflation factor are configured. The results show that, as the spectral component increases, the areas under the curves gradually become smaller and the PSD values fall below the forecast values for both the CL and CLMC algorithms. In addition, the two algorithms behave differently. (1) The area under the curve for the CLMC algorithm is always smaller than that for the CL algorithm; moreover, as the spectral component increases, the PSD values of both methods drop far below the forecast values. (2) The modified Cholesky decomposition keeps the PSD magnitude of the CLMC algorithm between 0.10 and 0.20, whereas in the CL scheme the magnitudes lie between 0.10 and 0.30 when the localization radius is 10 (Figure 2(a)). When the localization radius is increased to 15, the PSD magnitudes of the CLMC algorithm lie between 0.05 and 0.15 and those of the CL algorithm between 0.13 and 0.25 (Figure 2(b)). It is worth noting that the PSD values of both methods become small when the localization radius is 20; however, the PSD value of CLMC is still smaller than that of CL. Overall, the CLMC algorithm exhibits better performance than the CL algorithm.
In the local analysis process, it is known from equation (1) that the update of the Kalman gain plays an essential role in the analysis value and affects the final analysis accuracy. Here, the default ensemble size, observation fraction, observation error variance, forcing strength, and the optimized inflation factor and localization radius are used (Table 1). Figure 3 shows the impact of the CL and CLMC methods on the Kalman gain for different localization radii. In this case, the Kalman gain is a 40 × 20 matrix. The horizontal axes in Figure 3 indicate the index numbers of the corresponding observations, and the vertical axes indicate the index numbers of the state variables. The Kalman gain values were obtained by the CL and CLMC methods, as well as without using any localization scheme. The figure is similar to Figure 3 of Sakov and Bertino [10]. The results show that the Kalman gain without the localization scheme retains the spurious correlations from long-range observations, whereas the CL and CLMC methods eliminate the spurious long-range correlations in the gain matrix. Furthermore, we observed that the Kalman gain matrix exhibits significant diagonality and symmetry due to the use of the localization schemes (both CL and CLMC). However, the superior performance of the CLMC method may be more significant compared with the CL scheme (as seen from the colorbar scale). In terms of the gain obtained for each observation ($K_{ii}$), increasing the localization radius helps the CL and CLMC methods obtain a small Kalman gain, whereas the CL, CLMC, and no-localization schemes all have larger Kalman gain values when the observation index approaches 20 (fourth column of Figure 3).
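The damping of long-range gain entries performed by CL is commonly implemented as a Schur (elementwise) product with a compactly supported Gaspari-Cohn correlation function. The sketch below illustrates this on a 40 × 20 gain for a cyclic 40-variable domain with observations at every other grid point; the radius value and the random gain matrix are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn fifth-order taper; r = distance / localization radius."""
    r = np.abs(r)
    f = np.zeros_like(r)                      # zero beyond r = 2 (compact support)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r <= 2.0)
    x = r[m1]
    f[m1] = -0.25*x**5 + 0.5*x**4 + 0.625*x**3 - (5/3)*x**2 + 1.0
    x = r[m2]
    f[m2] = (1/12)*x**5 - 0.5*x**4 + 0.625*x**3 + (5/3)*x**2 - 5.0*x + 4.0 - (2/3)/x
    return f

# 40 state variables on a cyclic domain, observations at every other grid point
n, radius = 40, 5.0
obs_idx = np.arange(0, n, 2)
dist = np.abs(np.arange(n)[:, None] - obs_idx[None, :]).astype(float)
dist = np.minimum(dist, n - dist)             # cyclic (ring) distance
rho = gaspari_cohn(dist / radius)             # (40, 20) taper matrix

# Schur (elementwise) product zeroes out long-range entries of the gain
K_raw = np.random.default_rng(1).standard_normal((n, obs_idx.size))
K_loc = rho * K_raw
```

Applied to the gain (or to the background covariance before the gain is formed), the taper produces exactly the banded, near-diagonal structure visible in the localized panels of Figure 3.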
Time series of the RMSEs for zonal wind (ms^{−1}) at the fourth model level (∼510 hPa) with different ensemble sizes: (a) ensemble size = 75, (b) ensemble size = 100, (c) ensemble size = 125, and (d) ensemble size = 200. Here, 14600 assimilation steps correspond to 10 years, from 0000 UTC 1 January 1982 to 0000 UTC 1 January 1992.

The localization method helps the ensemble Kalman filter eliminate spurious long-range correlations. However, the choice of localization parameters is unknown a priori, and most researchers obtain localization and inflation-factor parameters sufficient for the assimilation problem at hand through continuous iterative experiments. In this study, the inflation factor and localization are manually tuned to their optimal values. Here, the default ensemble size, number of observations, observation error variance, forcing strength, and the optimized inflation factor are used (Table 1). Figure 4 shows the time series of the RMSEs for the CL and CLMC algorithms with different localization radii. One assimilation cycle is approximately equivalent to 6 hours of "weather", so the 1460 assimilation cycles correspond to one year (365 days). The results show that the CL and CLMC methods achieve better filter performance with increasing localization radius. However, both methods exhibit large RMSEs at the initial moment owing to the instability of the system state, and their RMSEs converge as the assimilation step increases. After a period of assimilation steps, the RMSEs no longer decrease and settle into a convergent trend; this may be due to the optimized inflation factor and the default ensemble size of 25, since small inflation and a large localization radius are desired.
In addition, it is worth noting that after about 50 days of assimilation, the CL method exhibits an inferior balance (Figure 4(c)); this may be due to the missing part of the analysis increment when updating the local state variables, which leads to poor assimilation results. The CLMC method, by contrast, shows better assimilation results and remains well balanced throughout the assimilation process.
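The RMSE time series shown in Figure 4 is, in outline, the root-mean-square difference between the analysis ensemble mean and the truth at each cycle. A minimal sketch follows; the array shapes and names are assumptions, not the paper's code.

```python
import numpy as np

def rmse_series(analysis_mean, truth):
    """RMSE of the analysis ensemble mean against the truth at each step.

    analysis_mean, truth : (n_steps, n_grid) arrays.
    Returns one RMSE value per assimilation step.
    """
    return np.sqrt(((analysis_mean - truth) ** 2).mean(axis=1))

# Toy example: 1460 six-hour cycles (one model year) on a 40-point grid,
# with a constant 0.3 error at every grid point, giving an RMSE of 0.3
truth = np.zeros((1460, 40))
analysis = truth + 0.3
series = rmse_series(analysis, truth)
```

Convergence of such a series, after the initial transient, is what the text refers to as the "convergent trend" of the filter.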
3.5. Experimental Results for the SPEEDY Model
In the abovementioned experiments, the effectiveness of the CL and CLMC methods was checked on the toy Lorenz-96 model with different assimilation parameters. To further investigate their application to large-scale models, in this section the performance of the CL and CLMC methods is examined on the SPEEDY model. All experiments are initialized with an initial ensemble randomly chosen from the nature run at 0000 UTC 1 January (cf. Miyoshi [46] and Kang et al. [49]). Here, to avoid manual optimization of the inflation parameters, the adaptive covariance inflation method is applied (cf. Miyoshi [50]).
To investigate the assimilation results of the CL and CLMC methods in large-scale models, an experiment was conducted with different ensemble sizes, since this choice plays an essential part in the DA process. In practical applications, the feasible ensemble size is usually smaller than the number of degrees of freedom of the system (or the dimension of the unstable subspace or error subspace). The other assimilation parameters, such as the localization scale (both horizontal and vertical), the number of observations, and the observation error variance (1.0 ms^{−1}), take their default values (Table 2), and the adaptive covariance inflation method is employed to enlarge the ensemble spread and prevent filter divergence. Figure 5 shows the time series of the RMSEs of the CLMC and CL methods for the zonal wind at the fourth model level (∼510 hPa). The results show that increasing the ensemble size improves the assimilation results of both methods. When the ensemble size is 100 or 125, the RMSE profiles of the CLMC and CL methods show similar assimilation behavior (Figures 5(b) and 5(c)). However, as the assimilation step increases, in particular for ensemble sizes of 100, 125, and 200, the RMSEs of both methods present peaks at the same assimilation steps (Figures 5(b) to 5(d)). This may be due to the adaptive inflation at larger ensemble sizes, since small inflation and broad localization scales are expected as the ensemble size increases. The RMSEs of the CLMC method are much smaller than those of CL when the ensemble size is 200, which implies that a larger ensemble size benefits the CLMC method (Figure 5(d)). Overall, the CLMC method outperforms CL, especially for larger ensemble sizes.
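The benefit of larger ensembles can be illustrated directly: for truly independent variables, the sampled covariance between two distant components is nonzero purely through sampling noise and shrinks roughly like 1/sqrt(n_ens). The toy sketch below is unrelated to the SPEEDY setup; the ensemble sizes are borrowed from the experiment for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
# True covariance is the identity: distant variables are uncorrelated,
# so any sampled long-range covariance is entirely a sampling artifact.
spurious = {}
for n_ens in (75, 100, 125, 200):
    ens = rng.standard_normal((n_ens, n))        # (members, variables)
    sample_cov = np.cov(ens, rowvar=False)       # (40, 40) sample covariance
    # covariance between variable 0 and the variable halfway around the domain
    spurious[n_ens] = abs(sample_cov[0, n // 2])
```

Localization suppresses exactly these artifacts, which is why the gap between CL and CLMC narrows, but does not vanish, as the ensemble grows.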
To better understand the mechanisms behind the CL and CLMC methods and to test their applicability to real atmospheric systems, the spatial distribution of the time-averaged analysis RMSEs of the zonal wind for the CLMC and CL methods is examined from 0000 UTC 1 March 1982 to 0000 UTC 1 April 1982 with different ensemble sizes. The other assimilation parameters (localization scale, number of observations, and observation error variance of 1.0 ms^{−1}) take their default values. Figure 6 displays these spatial distributions. The results show that the CL and CLMC methods exhibit inconsistent assimilation results when the ensemble size is 75 or 100. (1) The time-averaged analysis RMSEs of the CL method are particularly pronounced, especially when the ensemble size is 100, in the longitude region W–W (where observations are sparse); there the averaged analysis RMSE of the zonal wind reaches about 0.152 ms^{−1}, while the corresponding time-averaged analysis RMSEs of the CLMC method are relatively small (about 0.116 ms^{−1}) (Figures 6(c) and 6(d)). (2) In the high latitudes, the time-averaged analysis RMSEs reach 0.148 ms^{−1} for the CL method (Figure 6(c)) and about 0.135 ms^{−1} for the CLMC method (Figure 6(d)). Where observations are sparse, it is worth noting that EnKF-based CL methods are susceptible to large adaptive covariance inflation factors; with a fixed localization scale, smaller covariance inflation factors are expected as the ensemble size increases. When the ensemble size is 125, the spatial distributions of the time-averaged analysis RMSEs of the CL and CLMC methods again differ. (1) The CL scheme yields larger RMSEs, mainly in the low latitudes (W–E), where the averaged analysis RMSEs are approximately 0.120 ms^{−1} (Figure 6(e)).
(2) In most regions, the CLMC method obtains smaller analysis RMSEs (the maximum value reaches 0.112 ms^{−1}) (Figure 6(f)). This is probably because the CLMC scheme retains the main characteristics of the background error covariance for larger localization radii, whereas the CL scheme neglects some short- to medium-range correlations. When the ensemble size is 200, over the Arctic, especially near W–E where observations are relatively sparse, the time-averaged analysis RMSEs reach almost 0.04 ms^{−1} for the CLMC method (Figure 6(h)) and about 0.10 ms^{−1} for the CL scheme (Figure 6(g)); for reference, observations are available only at the positions shown at the intersections, with an error standard deviation of 1.0 ms^{−1}. In general, these results demonstrate that both the CL and CLMC techniques tested here produce a well-constrained analysis, and the error is significantly reduced with increasing ensemble size. However, when the ensemble size increases from 75 to 125, the error of the CL scheme does not decrease significantly in the few areas where observations are sparsely distributed. This may be due to the broader localization scale and the larger inflation factor used in the CL scheme, since less inflation and a wider localization scale are expected as the ensemble size increases (Figures 6(a), 6(c), and 6(e)). Therefore, the superior behavior of the CLMC method may be even more significant in practical applications.
Finally, the stability of the CL and CLMC methods under the adaptive covariance inflation factor is explored by examining the time-averaged analysis ensemble spread instead of the time-averaged analysis RMSEs (cf. Kondo and Miyoshi [26]). The results show that in regions with few observations, the effect of the CL and CLMC methods on the time-averaged analysis ensemble spread grows with increasing ensemble size, whereas in regions with many observations, the time-averaged analysis ensemble spread values are surprisingly small (Figures 7(a) to 7(h)). Thus, the adaptive covariance inflation factors exhibit strong robustness across different ensemble sizes. Table 1 lists the global averages of the adaptive covariance inflation parameters for CL (left value in the last row) and CLMC (right, bold value in the last row) at the fourth model level.
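Adaptive covariance inflation schemes of the kind cited (cf. Miyoshi [50]) typically estimate the multiplicative factor from innovation statistics, using the expectation that the innovations d = y − H(x^b) satisfy <d d^T> ≈ ρ HP^bH^T + R. The following is a simplified, innovation-based sketch in that spirit, not the exact scheme used in the paper; the function name and its arguments are illustrative assumptions.

```python
import numpy as np

def estimate_inflation(innov, HPHt_trace, R_trace):
    """Innovation-based estimate of the multiplicative inflation factor.

    innov      : (p,) innovation vector d = y - H(x^b)
    HPHt_trace : trace of the ensemble background covariance in observation space
    R_trace    : trace of the observation-error covariance R
    Solves <d'd> = rho * tr(HP^bH^T) + tr(R) for rho, floored at zero.
    """
    return max(0.0, (innov @ innov - R_trace) / HPHt_trace)

# Example: an innovation whose squared norm exceeds tr(R) by exactly tr(HP^bH^T)
# implies no inflation deficit, i.e. a factor of 1.0
rho = estimate_inflation(np.array([2.0, 0.0]), HPHt_trace=2.0, R_trace=2.0)
```

In practice such point estimates are smoothed in time before being applied, which is one reason the globally averaged factors in Table 1 vary so little across ensemble sizes.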
The previous experimental cases used the analysis RMSEs to examine the effectiveness of the CL and CLMC methods: the CLMC method outperforms CL, but both are susceptible to local instability because relatively large adaptive covariance inflation factors are used. To prevent such a situation, the short-term forecast RMSEs of the CL and CLMC methods are examined with the default observations, observation error covariance, localization scale, and adaptive covariance inflation factors. Here, the ensemble size is 200 members, and the analysis ensemble mean is used as the initial condition for the ensemble members. For all variables (zonal wind, temperature, specific humidity, and surface pressure), the average 6-day forecast RMSEs over 245 forecast cases from 0000 UTC 1 March to 0000 UTC 1 May are investigated (Figure 8). The results show that, for all variables except specific humidity, the profiles of the average 6-day forecast RMSEs never cross: the average forecast RMSEs of the CLMC method are always smaller than those of CL over the entire 6-day forecast from the initial time. By calculating the average 6-day forecast RMSEs over the 245 forecast cases, we find that the 1-day forecast RMSEs of CL are approximately equal to the 5-day forecast RMSEs of CLMC. This suggests that the CLMC method has potential advantages for relatively large ensemble sizes and when the correlation of short-term forecast errors is considered. However, the results may be overly optimistic, since the experimental settings assume fully known observation error statistics and use a simplified, low-resolution model. In general, the CLMC method clearly outperforms the CL method in short-term forecasting.
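Averaging forecast RMSEs over many cases, one value per lead time, as done for Figure 8, amounts to the following reduction. Shapes and names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def mean_forecast_rmse(forecasts, truths):
    """Average RMSE per forecast lead time over many forecast cases.

    forecasts, truths : (n_cases, n_leads, n_grid) arrays.
    Returns one averaged RMSE per lead time.
    """
    # spatial RMSE for each (case, lead) pair, then average over cases
    per_case = np.sqrt(((forecasts - truths) ** 2).mean(axis=2))
    return per_case.mean(axis=0)

# Toy example: 5 cases, 6 daily leads, 8 grid points, constant 0.5 error,
# so every lead time averages to an RMSE of 0.5
truths = np.zeros((5, 6, 8))
forecasts = truths + 0.5
curve = mean_forecast_rmse(forecasts, truths)
```

Comparing two such curves lead time by lead time is how the "1-day CL ≈ 5-day CLMC" statement in the text is read off.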
4. Conclusion
In ensemble DA, a limited ensemble size leads to sampling error and imbalance in the DA system. This paper therefore proposes a covariance localization method with modified Cholesky decomposition (CLMC) to mitigate sampling error and to estimate the inverse of the background error covariance matrix. Several advantages of this approach are listed below. (1) The ensemble covariance is typically rank deficient, and Schur-product localization produces a full-rank matrix whose large dimensionality is unfavorable for storage. In the proposed method, a predefined sparse structure is imposed on the inverse covariance via the modified Cholesky decomposition: if two distant model components are uncorrelated, the corresponding entries of the inverse covariance matrix are zero, thus avoiding excessive memory use. (2) Since the only nonzero entries of the Cholesky factors correspond to model components close to each other, imposing this sparse structure on the inverse background covariance matrix is itself a form of covariance localization. (3) To facilitate the storage of the background error covariance matrix, a tunable singular value decomposition method is applied to the background error matrix.
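The construction in points (1) and (2) can be sketched as follows: each variable is regressed only on nearby predecessor variables, so the regression factor L is banded and the precision matrix B^{−1} = L^T D^{−1} L is sparse by construction. This is a minimal illustration of the idea (the paper's scheme has additional structure); the function name and the banded predecessor set are assumptions.

```python
import numpy as np

def modified_cholesky_precision(ens, radius):
    """Estimate B^{-1} = L^T D^{-1} L via a modified Cholesky decomposition,
    regressing each variable only on predecessors within `radius` grid points.

    ens : (n_members, n_grid) ensemble of background states.
    Returns the (n_grid, n_grid) estimated precision (inverse covariance).
    """
    m, n = ens.shape
    X = ens - ens.mean(axis=0)                    # ensemble perturbations
    L = np.eye(n)                                 # unit lower-triangular factor
    d = np.empty(n)                               # conditional variances
    d[0] = X[:, 0] @ X[:, 0] / (m - 1)
    for i in range(1, n):
        pred = np.arange(max(0, i - radius), i)   # local predecessors only
        A = X[:, pred]
        beta, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
        resid = X[:, i] - A @ beta
        L[i, pred] = -beta                        # (L x)_i = x_i - sum_j beta_j x_j
        d[i] = resid @ resid / (m - 1)
    return L.T @ np.diag(1.0 / d) @ L             # banded, hence sparse, precision

# With a large i.i.d. standard-normal ensemble the estimate approaches the
# identity, and entries beyond the local bandwidth are exactly zero.
rng = np.random.default_rng(3)
Binv = modified_cholesky_precision(rng.standard_normal((2000, 10)), radius=2)
```

Because entries outside the band are structurally zero rather than tapered toward zero, only O(n × radius) values need to be stored, which is the memory advantage claimed in point (1).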
To verify the effectiveness of the proposed method, a series of experiments was conducted on the simple Lorenz-96 model and the SPEEDY model. The Lorenz-96 results show that the RMSEs obtained by CLMC are smaller than those of the CL method under sparse observations. In addition, increasing the localization radius allows more observational information to be used and improves the assimilation results of the proposed algorithm. The SPEEDY results show that increasing the ensemble size helps both the CL and CLMC methods obtain better assimilation results. However, the differences between the CL and CLMC methods in the spatial distributions of the analysis ensemble spread and the analysis RMSEs become more obvious. This may be because adaptive covariance inflation is susceptible to large inflation factors in the filtering process and reflects the changing characteristics of the background field of the chaotic model, which is not a good choice for the DA system; investigating adaptive covariance localization methods is thus a topic for future work. In addition, by calculating the average 6-day forecast RMSEs over 245 forecast cases, we find that the 1-day forecast RMSEs of CL are approximately equal to the 5-day forecast RMSEs of CLMC. This suggests that the CLMC method has potential advantages for relatively large ensemble sizes when the correlation of short-term forecast errors is considered.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
Mingheng Chang performed conceptualization, methodology, software, validation, original draft preparation, and reviewing. Hongchao Zuo supervised conceptualization and methodology. Jikai Duan performed investigation and editing.
Acknowledgments
This work was supported by the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) under grant no. 2019QZKK0103 and the National Natural Science Foundation of China under grant no. 422750699. This work was also supported by the Fundamental Research Funds for the Central Universities, Lanzhou University, under grant no. lzujbky2021sp61. Many thanks to Pavel Sakov for providing the publicly available EnKF-Matlab software.