#### Abstract

This study considers a new hybrid three-dimensional variational (3D-Var) and ensemble Kalman filter (EnKF) data assimilation (DA) method in an imperfect-model framework, named the space-expanded ensemble localization Kalman filter (SELKF). In this method, localization is applied directly to the ensemble anomalies through a Schur product, rather than to the full error covariance of the state as in the EnKF. At the same time, the correction space of the analysis increment is expanded to a higher-dimensional space, so that the rank of the forecast error covariance is significantly increased. This scheme reduces the spurious correlations in the covariance and approximates the full-rank background error covariance well. Furthermore, a deterministic scheme is used to generate the analysis anomalies. The results show that the SELKF outperforms the perturbed EnKF for relatively small ensemble sizes, especially when the length scale is relatively long or the observation error covariance is relatively small.

#### 1. Introduction

Data assimilation (DA) combines information from a short-term model forecast with information from observations to form an initial estimate of the state of a dynamic system. The three-dimensional variational (3D-Var) method [1, 2] is known as an effective scheme for DA systems, and 4D-Var [3, 4] is an extension of 3D-Var with an extra time dimension. However, because it uses a static background error covariance, a DA system based on 3D-Var lacks flow-dependent error statistics, which can yield an inaccurate estimate of the state.

An alternative is the Kalman filter (KF) and its variant, the extended Kalman filter (EKF) [5, 6], which estimate both the state and its time-evolving uncertainty in a linear or weakly nonlinear dynamic system. The ensemble Kalman filter (EnKF) [7, 8] is a variant of the EKF that uses an ensemble-based covariance to approximate the forecast uncertainty. Hybrid schemes, which aim to combine the advantages of 3D-Var and the EnKF, have attracted much attention in recent years; examples include MLEF [9], En4DVar [10], the incorporation of ensemble covariance in GSI [11], PODEn4DVar [12], and IEnKF [13]. Localization is often used to reduce spurious correlations in ensemble-based covariance matrices [14–16]. However, little research has addressed localizing the ensemble anomalies directly and expanding the correction space of the analysis increment, and deterministic schemes for generating the analysis anomalies in a hybrid framework have also received little attention.

The aim of this paper is to present a new method that localizes the ensemble anomalies directly, rather than the full forecast or analysis error covariance, and increases the rank of the ensemble-based covariance matrix. To this end, a new hybrid 3D-Var and EnKF scheme, named the space-expanded ensemble localization Kalman filter (SELKF), is proposed. In the new method, the localization operation is applied directly to the ensemble anomalies. At the same time, the correction space of the analysis increment is expanded from the space spanned by the ensemble perturbations to a new space with a larger dimension. Furthermore, a deterministic scheme is applied to form the analysis anomalies. We perform a number of experiments to compare the performance of the SELKF against the traditional (perturbed) EnKF, exploring its capabilities for different ensemble sizes, correlation length scales, inflation factors, observation time intervals, and observation error variances.

#### 2. The New Scheme: SELKF

We first define the notation used in this paper: *I*: number of ensemble members; *N*: dimension of the state vector; *M*: dimension of the observation vector; *r*: rank of the truncated square root of the correlation matrix **S**.

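In the EnKF, the ensemble-based covariances are defined as

$$\mathbf{P}^{f} = \frac{1}{I-1}\,\mathbf{X}^{f}\left(\mathbf{X}^{f}\right)^{T}, \qquad \mathbf{P}^{a} = \frac{1}{I-1}\,\mathbf{X}^{a}\left(\mathbf{X}^{a}\right)^{T}, \tag{1}$$

where $\mathbf{P}^{f}$ and $\mathbf{P}^{a}$ denote the forecast error covariance and the analysis error covariance, respectively, and $\mathbf{X}^{f}$ and $\mathbf{X}^{a}$ denote the forecast and analysis ensemble anomalies, respectively, in which each column is the deviation of an ensemble member from the ensemble mean.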
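As a concrete illustration of an ensemble-based covariance, the following minimal sketch builds the anomalies and the sample covariance from an ensemble matrix. The function names, the `1/(I - 1)` normalization, and the toy dimensions are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

def ensemble_anomalies(E):
    """Return the ensemble mean and the anomalies of an N x I ensemble matrix E."""
    mean = E.mean(axis=1, keepdims=True)
    return mean[:, 0], E - mean

def sample_covariance(E):
    """Ensemble-based covariance X X^T / (I - 1), with X the anomaly matrix."""
    _, X = ensemble_anomalies(E)
    return X @ X.T / (E.shape[1] - 1)

rng = np.random.default_rng(0)
E_demo = rng.standard_normal((6, 4))        # toy case: N = 6 variables, I = 4 members
P_demo = sample_covariance(E_demo)
rank_demo = np.linalg.matrix_rank(P_demo)   # cannot exceed I - 1
```

The rank bound of `I - 1` is what makes a small ensemble restrictive: the analysis increment can only lie in the low-dimensional space spanned by the anomalies.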

Unlike the perturbed EnKF, the ensemble transform Kalman filter (ETKF) [14] uses a deterministic scheme to generate the analysis anomalies, in which the transformation from forecast ensemble anomalies to analysis ensemble anomalies can be written as
where **V** and are the eigenvector matrix and the eigenvalue matrix of , respectively. ETKF is theoretically equivalent to a 3D-Var DA method in the linear framework, named MLEF [9], in which
where is a control variable with dimension *I*, which is the ensemble size, and is the observation operator. The MLEF constructs the analysis ensemble anomalies in the same form as (2).
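The deterministic transformation described above can be sketched as follows. This is a generic symmetric ETKF-style square root, not necessarily the exact form of (2): it assumes a linear observation operator `H`, anomalies scaled so that the forecast covariance is `Xf @ Xf.T`, and hypothetical variable names.

```python
import numpy as np

def etkf_transform(Xf, H, R):
    """Deterministic ETKF-style update of normalized anomalies.

    Xf : N x I anomalies scaled so that Pf = Xf @ Xf.T (assumed convention).
    H  : M x N linear observation operator.
    R  : M x M observation error covariance.
    """
    Y = H @ Xf                                   # anomalies in observation space
    C = Y.T @ np.linalg.solve(R, Y)              # I x I symmetric matrix
    lam, V = np.linalg.eigh(C)                   # eigenvalues and eigenvectors of C
    lam = np.clip(lam, 0.0, None)                # guard against round-off
    T = V @ np.diag(1.0 / np.sqrt(1.0 + lam)) @ V.T
    return Xf @ T

rng = np.random.default_rng(1)
N, I, M = 8, 5, 4
E = rng.standard_normal((N, I))
Xf = (E - E.mean(axis=1, keepdims=True)) / np.sqrt(I - 1)
H = np.eye(N)[::2]                               # observe every other variable
R = 0.5 * np.eye(M)
Xa = etkf_transform(Xf, H, R)

# Reference: Kalman filter analysis covariance for the same Pf, H, and R.
Pf = Xf @ Xf.T
K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
Pa_kalman = (np.eye(N) - K @ H) @ Pf
```

In this sketch `Xa @ Xa.T` reproduces the Kalman analysis covariance without any random observation perturbations, and the symmetric form of `T` keeps the analysis anomalies centred on zero.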

The correction to the forecast state in the MLEF (or the ETKF) is restricted to the ensemble space. If the ensemble size is relatively small, only a suboptimal analysis can be obtained because of this restricted subspace. In addition, the original ETKF has no localization, which is known to be a powerful way of reducing sampling errors by suppressing spurious correlations in an error covariance [15].

In the following, a space-expanded ensemble localization Kalman filter (SELKF) is presented. In the SELKF, the dimension of the correction space is expanded from *I* to *Ir*, where *r* is the spectral truncation dimension of the correlation matrix **S**, and the localization is incorporated directly into the ensemble anomalies, as was done by Buehner [17], rather than being applied to the full error covariance.

Introduce a new control variable and define
where indicates a diagonal matrix with the elements of on the diagonal, is the deviation of from the mean , and denotes the truncated square root of the correlation matrix **S**. is an *N* × *Ir* matrix, and **w** is a vector with dimension *Ir*

where is a vector with dimension , corresponding to .

The following identity can be derived [17]:
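An identity of this kind can be checked numerically. The sketch below assumes the usual construction in which each anomaly column is turned into a diagonal matrix and multiplied by a square root of the correlation matrix; the names `expanded_anomalies` and `S_half`, the random stand-in for the square root, and the omission of any normalization factor are assumptions for illustration.

```python
import numpy as np

def expanded_anomalies(X, S_half):
    """Stack diag(x_i) @ S_half for every anomaly column x_i.

    X      : N x I ensemble anomalies.
    S_half : N x r (possibly truncated) square root of the correlation matrix.
    Returns an N x (I * r) matrix.
    """
    blocks = [x[:, None] * S_half for x in X.T]   # same as diag(x_i) @ S_half
    return np.hstack(blocks)

rng = np.random.default_rng(2)
N, I, r = 10, 3, 4
X = rng.standard_normal((N, I))
X -= X.mean(axis=1, keepdims=True)                # centre the ensemble
S_half = rng.standard_normal((N, r))              # stand-in square root
X_exp = expanded_anomalies(X, S_half)

lhs = X_exp @ X_exp.T                             # covariance from expanded anomalies
rhs = (X @ X.T) * (S_half @ S_half.T)             # Schur product with S_half S_half^T
rank_raw = np.linalg.matrix_rank(X @ X.T)
rank_exp = np.linalg.matrix_rank(lhs)
```

The two sides agree to round-off, so localizing the anomalies in this way is equivalent to a Schur product on the covariance, while `rank_exp` exceeds `rank_raw`, which is the rank increase the method relies on.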

If is linear or weakly nonlinear, we define where

Thus, we can obtain a new cost function and its gradient

The analysis solution can then be calculated

From (4), we can get a relationship between and as follows:
where is the analysis error covariance with respect to **w**, and it is equal to the inverse of the Hessian matrix of cost function (9).
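The relationship between the minimizer, the Hessian, and the analysis error covariance in control space can be illustrated with a short sketch. It assumes a standard quadratic cost function of the form J(w) = ½ wᵀw + ½ (d − Yw)ᵀR⁻¹(d − Yw), with Y the expanded anomalies mapped to observation space and d the innovation, and a linear observation operator; the paper's cost function (9) may differ in notation or scaling, and all names here are hypothetical.

```python
import numpy as np

def analyse_in_control_space(xf, X_exp, H, R, y):
    """Closed-form minimizer of the assumed quadratic cost function in w."""
    Y = H @ X_exp                          # expanded anomalies in observation space
    d = y - H @ xf                         # innovation
    Rinv_Y = np.linalg.solve(R, Y)
    hessian = np.eye(Y.shape[1]) + Y.T @ Rinv_Y
    w = np.linalg.solve(hessian, Rinv_Y.T @ d)
    Pw = np.linalg.inv(hessian)            # analysis error covariance w.r.t. w
    return xf + X_exp @ w, w, Pw

rng = np.random.default_rng(4)
N, K, M = 6, 9, 3                          # K plays the role of I * r
X_exp = 0.3 * rng.standard_normal((N, K))
H = np.eye(N)[:M]
R = 0.2 * np.eye(M)
xf = rng.standard_normal(N)
y = rng.standard_normal(M)
xa, w, Pw = analyse_in_control_space(xf, X_exp, H, R, y)

# Cross-checks: the gradient vanishes at w, and the state update matches the
# Kalman gain form built from the covariance X_exp @ X_exp.T.
gradient = w - (H @ X_exp).T @ np.linalg.solve(R, y - H @ xf - H @ X_exp @ w)
P_loc = X_exp @ X_exp.T
gain = P_loc @ H.T @ np.linalg.inv(H @ P_loc @ H.T + R)
xa_kalman = xf + gain @ (y - H @ xf)
```

Working in control space means the linear system has the size of **w** rather than the size of the state, which is why a small truncation rank keeps the cost manageable.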

We define where .

Thus, the analysis ensemble anomalies can be calculated in which

and is a vector with the corresponding elements of .

Finally, the analysis ensemble is deterministically generated as follows:

To reduce the computational cost, some transformations can be made, as done in [9], where and are the eigenvector matrix and the eigenvalue matrix of , respectively.

A similar scheme [18–20] (hereafter called LW) was proposed to localize in the ensemble perturbation space. However, there are two major differences between the LW and the SELKF. (1) The dimension of the control variable in the SELKF is *Ir*, which is much smaller than *IN* in the LW, owing to the truncated spectral expansion used in the SELKF; the computational cost is therefore significantly reduced. (2) The analysis anomalies are generated quite differently. In Lorenc 2003, no deterministic scheme is used to update the analysis ensemble, and in Wang et al. 2008a, 2008b, an ETKF scheme is used directly to form the analysis anomalies. In this paper, however, the analysis anomalies are generated from the localized square root, , of the analysis error covariance matrix, as shown in equations (14) and (15). This scheme retains the localization and explicitly reduces the spurious correlations in the analysis error covariance matrix.

#### 3. Experiments and Results

The performance of the new method is investigated in the Lorenz96 system, which is often used as a proxy for real atmospheric systems [12, 21]. It is a chaotic system in which a small perturbation of the initial state can result in a very different forecast.
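The sensitivity to initial conditions can be illustrated with a minimal sketch of the standard Lorenz96 equations, dx_j/dt = (x_{j+1} − x_{j−2}) x_{j−1} − x_j + F with cyclic indices. The values used here (40 variables, F = 8, a fourth-order Runge-Kutta step of 0.05) are the commonly used defaults and are assumptions, not necessarily the settings of this study.

```python
import numpy as np

def lorenz96_tendency(x, forcing=8.0):
    """Right-hand side (x_{j+1} - x_{j-2}) * x_{j-1} - x_j + F, cyclic in j."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def rk4_step(x, dt=0.05, forcing=8.0):
    """Advance the state by one fourth-order Runge-Kutta step."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Spin up from a slightly perturbed equilibrium, then follow two trajectories
# whose initial states differ by 1e-6 in a single variable.
x = 8.0 * np.ones(40)
x[19] += 0.01
for _ in range(1000):
    x = rk4_step(x)
xa, xb = x.copy(), x.copy()
xb[0] += 1e-6
for _ in range(400):
    xa, xb = rk4_step(xa), rk4_step(xb)
separation = np.linalg.norm(xa - xb)
```

After 400 steps the two trajectories are far apart relative to the initial difference, which is the behaviour that makes this model a demanding test for an assimilation scheme.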

The system can be written as where , and . The initial state of this simulation is defined as

The model used in this experiment is not perfect, with an error covariance of , where **A** is a matrix with 1 on the diagonal and 0.5 on the sub- and superdiagonals. The observation operator is defined as , in which is a perturbed value with variance of . We assume that all grid points are observed every time steps. The observations are generated by adding a random to the truth at observation times. We assume that each variable is observed independently; that is, the observation error covariance is diagonal.
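A minimal sketch of this setup is given below. The tridiagonal structure of **A** follows the description above, but the scaling factor `q`, the observation error variance, and the use of Gaussian noise are placeholder assumptions, since the actual values are not reproduced here.

```python
import numpy as np

def tridiagonal_correlation(n, off_diag=0.5):
    """Matrix with 1 on the diagonal and off_diag on the sub- and superdiagonals."""
    return np.eye(n) + off_diag * (np.eye(n, k=1) + np.eye(n, k=-1))

def sample_noise(cov, rng):
    """Draw one zero-mean Gaussian vector with covariance cov."""
    return np.linalg.cholesky(cov) @ rng.standard_normal(cov.shape[0])

def observe(truth, obs_var, rng):
    """Observe every variable with independent Gaussian errors (diagonal R)."""
    return truth + np.sqrt(obs_var) * rng.standard_normal(truth.shape)

rng = np.random.default_rng(5)
n = 40
A = tridiagonal_correlation(n)
q = 0.01                                   # hypothetical model error scaling
model_error = sample_noise(q * A, rng)     # added to the forecast at each step
truth = rng.standard_normal(n)             # stand-in for the true state
obs = observe(truth, obs_var=1.0, rng=rng)
```

The Cholesky factor exists because this tridiagonal matrix is positive definite, so correlated model error samples can be drawn directly from it.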

Here, we use a compactly supported weighting function as the Schur product function to reduce the spurious correlations, as follows:
where , denotes the length scale, and indicates the distance between two points (observation points and grid points) [22]. To approximate the correlation matrix **S** in the SELKF, the truncation strategy simply retains the eigenvalues greater than or equal to 1 together with the corresponding eigenvectors, while the full **S** is used in the EnKF. The inflation strategy for both methods is simply defined as
where .
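The localization and truncation steps can be sketched as follows. The sketch assumes that the compactly supported function is the widely used fifth-order Gaspari-Cohn function and that distances are measured cyclically on the model ring; both are assumptions, as are the function names and the length scale `c = 4`. The truncation rule (keep eigenvalues of at least 1) follows the description above.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Fifth-order compactly supported correlation; zero beyond distance 2c."""
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / c
    w = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    rn, rf = r[near], r[far]
    w[near] = -0.25 * rn**5 + 0.5 * rn**4 + 0.625 * rn**3 - (5.0 / 3.0) * rn**2 + 1.0
    w[far] = (rf**5 / 12.0 - 0.5 * rf**4 + 0.625 * rf**3 + (5.0 / 3.0) * rf**2
              - 5.0 * rf + 4.0 - 2.0 / (3.0 * rf))
    return w

def cyclic_distance_matrix(n):
    """Pairwise grid distances on a ring of n points."""
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    return np.minimum(d, n - d)

def truncated_sqrt(S, threshold=1.0):
    """Square root of S built from eigenpairs whose eigenvalue is >= threshold."""
    lam, vecs = np.linalg.eigh(S)
    keep = lam >= threshold
    return vecs[:, keep] * np.sqrt(lam[keep])

n, c = 40, 4.0
S = gaspari_cohn(cyclic_distance_matrix(n), c)
S_half = truncated_sqrt(S)
r_trunc = S_half.shape[1]                  # truncated rank r
```

A longer length scale concentrates the spectrum of **S** in fewer leading eigenvalues, so fewer eigenpairs pass the threshold and the control vector becomes shorter.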

The new method is compared with the EnKF after a 2000-step spin-up of the model. The performance with different length scales of the correlation matrix and different inflation factors is investigated first. Each root mean square error (RMSE) is an average of 50 runs over 20000 simulation time steps. We use in the first experiment.

Figure 1 shows the RMSE versus the length scale and the covariance inflation factor . When the ensemble size is large enough, for example, , the SELKF performs as well as the EnKF; as the number of ensemble members decreases, however, the SELKF performs better than the EnKF, and the smaller the ensemble size, the larger the difference. Interestingly, the optimal performance of the EnKF is achieved at a relatively small inflation factor, especially with a relatively large ensemble size, for example, for , whereas the best results of the SELKF are obtained at a much larger inflation factor, for example, for . When the length scale increases, however, a slightly larger inflation factor gives better performance for the EnKF, as shown in Figure 1(b). Note also that as the number of ensemble members decreases, the optimal inflation factor of the EnKF becomes much larger, for example, 2 through 3 when .

Figure 1: RMSE versus the length scale and the covariance inflation factor. Panels: (a) SELKF, (b) EnKF, (c) SELKF, (d) EnKF, (e) SELKF, (f) EnKF.

It is also noted that when the length scale is relatively large, for example, through 10, the EnKF diverges severely at a relatively small ensemble size. In contrast, the SELKF performs well with an appropriate inflation factor, even when the length scale is very large. Smaller length scales remove more of the spurious correlations in the covariance; however, they also destroy the physical balance [23]. The SELKF is therefore more applicable than the EnKF, especially when the ensemble size is limited and the length scale is relatively large.

Figures 1(a) and 1(b) demonstrate that the SELKF and the EnKF perform comparably when the ensemble size is large enough. In the following, we investigate the sensitivity of the two methods to varying observation time intervals and observation errors. Here, we choose the parameters that give the best performance for each method: for the SELKF and for the EnKF, and for both.

Table 1 shows that the shorter the observation time interval, the better the performance of both methods, as expected. When the observation error variance is relatively large, the EnKF performs a little better than the SELKF for relatively small observation intervals. If becomes larger, the SELKF performs better than the EnKF even when is relatively large. When is small enough, however, the performance of the SELKF improves markedly, whereas the EnKF performs much worse than the SELKF, and even worse than it does in the case of relatively large . This is partly because the EnKF makes use of observation perturbations to generate the analysis ensemble. If the observation error variance is not large enough, the space spanned by the analysis ensemble cannot accommodate the truth. In contrast, the deterministic construction of the analysis anomalies in the SELKF avoids that problem and benefits from the better accuracy of .
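For comparison, the role of the observation perturbations in the perturbed EnKF can be seen in a minimal sketch of its analysis step. The function name, the optional localization matrix `S`, and the choice to centre the perturbations are assumptions for illustration and do not reproduce the exact configuration used in the experiments.

```python
import numpy as np

def enkf_perturbed_obs(Ef, y, H, R, rng, S=None):
    """Stochastic EnKF analysis: each member assimilates a perturbed observation.

    Ef : N x I forecast ensemble; y : observation vector of length M.
    S  : optional N x N localization matrix applied to Pf by a Schur product.
    """
    n_members = Ef.shape[1]
    X = Ef - Ef.mean(axis=1, keepdims=True)
    Pf = X @ X.T / (n_members - 1)
    if S is not None:
        Pf = Pf * S                                  # covariance localization
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    eps = np.linalg.cholesky(R) @ rng.standard_normal((y.size, n_members))
    eps -= eps.mean(axis=1, keepdims=True)           # centre the perturbations
    return Ef + K @ (y[:, None] + eps - H @ Ef)

rng = np.random.default_rng(6)
N, I, M = 6, 10, 3
Ef = rng.standard_normal((N, I))
H = np.eye(N)[:M]
R = 0.5 * np.eye(M)
y = rng.standard_normal(M)
Ea = enkf_perturbed_obs(Ef, y, H, R, rng)

# With centred perturbations the analysis mean equals the Kalman update of the mean.
Xf = Ef - Ef.mean(axis=1, keepdims=True)
Pf = Xf @ Xf.T / (I - 1)
K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
mean_expected = Ef.mean(axis=1) + K @ (y - H @ Ef.mean(axis=1))
```

The spread of the analysis ensemble here depends on the random draws `eps`, whose size is set by the observation error covariance; a deterministic square-root update removes that source of sampling noise.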

#### 4. Summary and Discussion

In order to reduce the spurious correlation in the ensemble forecast covariance and expand the limited correction space of the analysis increment in the EnKF, a new method called SELKF is presented.

The new method is compared with the traditional EnKF in the Lorenz96 system with model errors. The results demonstrate that when the ensemble size is relatively large, the EnKF performs as well as the SELKF, provided that the length scale is relatively small and the observation error covariance stays in a proper range. As the ensemble size decreases, the SELKF outperforms the EnKF, especially when the length scale is relatively large, where the EnKF diverges rapidly. This may be partly because the SELKF uses an expanded correction space, which can accommodate the true analysis increment within the space spanned by the expanded ensemble anomalies, even for a relatively small ensemble size. The SELKF also performs much better than the EnKF when the observation error covariance is very small. This is partly due to the deterministic scheme used to form the analysis anomalies: the SELKF uses an exact factorization of the analysis error covariance and thus avoids the suboptimality resulting from the stochastic nature of the perturbed EnKF.

However, the SELKF has some shortcomings. If the length scale is very small, the truncation of the correlation matrix yields a relatively large truncated rank , which makes the computation relatively expensive. This indicates that the SELKF is better suited to systems with relatively long length scales. In addition, if the ensemble size is relatively large, the best performance of the SELKF is achieved at a much larger inflation factor than in the EnKF; the reason for this remains an open question.

#### Acknowledgments

This work was partly supported by the National Science Foundation of China (Grant no. 41105063). The authors are grateful to the anonymous reviewers for helpful comments.