Research Article  Open Access
Qin Xu, Li Wei, "Formulations for Estimating Spatial Variations of Analysis Error Variance to Improve Multiscale and Multistep Variational Data Assimilation", Advances in Meteorology, vol. 2018, Article ID 7931964, 17 pages, 2018. https://doi.org/10.1155/2018/7931964
Formulations for Estimating Spatial Variations of Analysis Error Variance to Improve Multiscale and Multistep Variational Data Assimilation
Abstract
When the coarseresolution observations used in the first step of multiscale and multistep variational data assimilation become increasingly nonuniform and/or sparse, the error variance of the firststep analysis tends to have increasingly large spatial variations. However, the analysis error variance computed from the previously developed spectral formulations is constant and thus limited to represent only the spatially averaged error variance. To overcome this limitation, analytic formulations are constructed to efficiently estimate the spatial variation of analysis error variance and associated spatial variation in analysis error covariance. First, a suite of formulations is constructed to efficiently estimate the error variance reduction produced by analyzing the coarseresolution observations in one and twodimensional spaces with increased complexity and generality (from uniformly distributed observations with periodic extension to nonuniformly distributed observations without periodic extension). Then, three different formulations are constructed for using the estimated analysis error variance to modify the analysis error covariance computed from the spectral formulations. The successively improved accuracies of these three formulations and their increasingly positive impacts on the twostep variational analysis (or multistep variational analysis in first two steps) are demonstrated by idealized experiments.
1. Introduction
Multiple Gaussians with different decorrelation length scales have been used at NCEP to model the background error covariance in variational data assimilation (Wu et al. [1], Purser et al. [2]), but mesoscale features are still poorly resolved in the analyzed incremental fields even in areas covered by remotely sensed highresolution observations, such as those from operational weather radars (Liu et al. [3]). This problem is common for the widely adopted singlestep approach in operational variational data assimilation, especially when patchy highresolution observations, such as those remotely sensed from radars and satellites, are assimilated together with coarseresolution observations into a highresolution model. To solve this problem, multiscale and multistep approaches were explored and proposed by several authors (Xie et al. [4], Gao et al. [5], Li et al. [6], and Xu et al. [7, 8]). For a twostep approach (or the first two steps of a multistep approach) in which broadly distributed coarseresolution observations are analyzed first and then locally distributed highresolution observations are analyzed in the second step, an important issue is how to objectively estimate or efficiently compute the analysis error covariance for the analyzed field that is obtained in the first step and used to update the background field in the second step. To address this issue, spectral formulations were derived by Xu et al. [8] for estimating the analysis error covariance. As shown in Xu et al. [8], the analysis error covariance can be computed very efficiently from the spectral formulations with very (or fairly) good approximations for uniformly (or nonuniformly) distributed coarseresolution observations and, by using the approximately computed analysis error covariance, the twostep analysis can outperform the singlestep analysis under the same computational constraint (that mimics the operational situation).
The analysis error covariance functions computed from the spectral formulations in Xu et al. [8] are spatially homogeneous, so their associated error variances are spatially constant. Although such a constant error variance can represent the spatial averaged value of the true analysis error variance, it cannot capture the spatial variation in the true analysis error variance. The true analysis error variance can have significant spatial variations, especially when the coarseresolution observations become increasingly nonuniform and/or sparse. In this case, the spatial variation of analysis error variance and associated spatial variation in analysis error covariance need to be estimated based on the spatial distribution of the coarseresolution observations in order to further improve the twostep analysis. This paper aims to explore and address this issue beyond the preliminary study reported in Xu and Wei [9]. In particular, as will be shown in this paper, analytic formulations for efficiently estimating the spatial variation of analysis error variance can be constructed by properly combining the error variance reduction produced by analyzing each and every coarseresolution observation as a single observation, and the estimated analysis error variance can be used to further estimate the related variation in analysis error covariance. The detailed formulations are presented for onedimensional cases first in the next section and then extended to twodimensional cases in Section 3. Idealized numerical experiments are performed for onedimensional cases in Section 4 and for twodimensional cases in Section 5 to show the effectiveness of these formulations for improving the twostep analysis. Conclusions follow in Section 6.
2. Analysis Error Variance Formulations for OneDimensional Cases
2.1. Error Variance Reduction Produced by a Single Observation
When observations are optimally analyzed in terms of the Bayesian estimation (see chapter 7 of Jazwinski [10]), the background state vector b is updated to the analysis state vector a with the following analysis increment: and the background error covariance matrix B is updated to the analysis error covariance matrix A according towhere R is the observation error covariance matrix, is the innovation vector (observation minus background in the observation space), is the observation vector, denotes the observation operator, and is the linearized . For a single observation, say, at in the onedimensional space of , the inverse matrix reduces to , so the th diagonal element of A in (1b) is simply given by where , (or ) is the background (or observation) error variance, is the background error correlation function, denotes the th point in the discretized analysis space , and is the number of grid points over the analysis domain. The length of the analysis domain is , where is the analysis grid spacing and is assumed to be much larger than the background error decorrelation length scale .
Note that is a continuous function of , so (2) can be written into also as a continuous function of , whereis the error variance reduction produced by analyzing a single observation at . The error variance reduction in (3) decreases rapidly as increases, and it becomes much smaller than it peak value of at as increases to . This implies that the error variance reduction produced by analyzing sparsely distributed coarseresolution observations can be estimated by properly combining the error variance reduction computed by (3) for each coarseresolution observation as a single observation. This idea is explored in the following three subsections for onedimensional cases with successively increased complexity and generality: from uniformly distributed coarseresolution observations with periodic extension to nonuniformly distributed coarseresolution observations without periodic extension.
2.2. Uniform CoarseResolution Observations with Periodic Extension
Consider that there are coarseresolution observations uniformly distributed in the above analysis domain of length with periodic extension, so their resolution is . In this case, the error variance reduction produced by each observation can be considered as an additional reduction to the reduction produced by its neighboring observations, and this additional reduction is always smaller than the reduction produced by the same observation but treated as a single observation. This implies that the error variance reduction produced by analyzing the coarseresolution observations, denoted by , is bounded above by ; that is, where denotes the summation over for the observations. The equality in (4) is for the limiting case of only. The inequality in (4) implies that the domainaveraged value of is larger than the true averaged reduction estimated by , where is the domainaveraged analysis error variance estimated by the spectral formulation for onedimensional cases in Section 2.2 of Xu et al. [8].
The domainaveraged value of can be computed by where denotes the integration over the analysis domain, denotes the summation over for the grid points, and is used in the last step. By extending with the analysis domain periodically, can be also estimated analytically as follows:where denotes the integration over the infinite space of , = = = is used in the second to last step, and is used with in the last step. For the doubleGaussian form of used in (5) of Xu et al. [8], we have . The analytically derived value in (5b) is very close to (slightly larger than) the numerically computed value from (5a). With the domainaveraged value of adjusted from to , can be estimated byThe analysis error variance, , is then estimated by
As shown by the example in Figure 1 (in which = 110.4 km and = 10 so km is close to km), the estimated in (7) has nearly the same spatial variation as the benchmark that is computed precisely from (1b), although the amplitude of spatial variation of , defined by , is slightly smaller than that of the true , defined by . As shown in Figure 2, the amplitude of spatial variation of benchmark decreases rapidly to virtually zero and then exactly zero (or increases monotonically toward its asymptotic upper limit of m^{2} s^{−2}) as decreases to 0.5 and then to (or increases toward ∞), and this decrease (or increase) of the amplitude of spatial variation of with is closely captured by the amplitude of spatial variation of the estimated as a function of .
Using the estimated in (7), the previously estimated analysis error covariance matrix, denoted by with its th element obtained from the spectral formulations, can be modified into , , or with its th element given byThe formulation in (8a) is conventional, as in of Purser et al. [2] or originally of Rutherford [11], in which the covariance is modified by applying separately to each entry (indexed by and ) of to retain the selfadjointness. The second equation in (8a) shows that the conventional approach can be viewed alternatively as plus a correction term, the last term in (8a). Ideally, the correction term should completely offset the deviation of from the true covariance, but the correction term in (8a) offsets only a part of the deviation.
For the case in Figure 1, the benchmark analysis error covariance matrix, denoted by , is computed precisely from (1b) and is plotted in Figure 3, while the deviations of , , , and from the benchmark are shown in Figures 4(a), 4(b), 4(c), and 4(d), respectively. As shown, the deviation becomes increasingly small when is modified successively to , , and . Note that the correction term in (8a) is modulated by . This modulation has a chessboard structure, while the desired modulation revealed by the tobecorrected deviation of in Figure 4(a) has a banded structure (along the direction of + = constant, perpendicular to the diagonal line). This explains why the correction term in (8a) offsets only a part of the deviation as revealed by the deviation of in Figure 4(b). On the other hand, the correction term in (8b) is modulated by . This modulation not only retains the selfadjointness but also has the desired banded structure, so the correction term in (8b) is an improvement over that in (8a), as shown by the deviation of in Figure 4(c) versus that of in Figure 4(b). However, as revealed by Figure 4(c), the deviation of still has two significant maxima (or minima) along each band on the two sides of the diagonal line of = , while the tobecorrected deviation of in Figure 4(a) has a single maximum (or minimum) along each band. This implies that the function form of is not sufficiently wide for the correction. As a further improvement, this function form is widened to for the correction term in (8c), so the deviation of in Figure 4(d) is further reduced from that of in Figure 4(c).
(a)
(b)
(c)
(d)
When an estimated is used to update the background error covariance in the second step for analyzing the highresolution observations in the nested domain, the accuracy of the secondstep analysis depends not only, to a certain extent, on the number of iterations performed by the minimization algorithm but also on the accuracy of the estimated over the nested domain plus its extended vicinities within the distance of outside the nested domain. Here, is the decorrelation length scale of defined by according to (4.3.10) of Daley [12], and (=4.45 km for the case in Figures 1 and 3) can be easily computed as a byproduct from the spectral formulation. Over this extended nested domain, the relative error (RE) of the estimated with respect to the benchmark can be measured by where denotes the unit matrix in the subspace associated with the grid points in the extended nested domain and thus (or ) is the submatrix of (or A) associated only with the grid points in the extended nested domain and denotes the Frobenius norm of () defined by the square root of the sum of the squared absolute values of the elements of the matrix in () according to (2.2–4) of Golub and Van Loan [13]. The REs of , , and can be measured by the same form of Frobenius norm ratio as that defined for in (9). The REs of , , , and are computed for the case in Figure 1 and listed in the first column of Table 1. As shown by the listed values, the RE becomes increasingly small when is modified successively to , , and , and this is consistent with and also quantifies the successively reduced deviation shown in Figures 4(a)–4(d).

2.3. Nonuniform CoarseResolution Observations with Periodic Extension
Consider that the coarseresolution observations are now nonuniformly distributed in the analysis domain of length with periodic extension, so their averaged resolution can be defined by . The spacing of a concerned coarseresolution observation, say the th observation, from its right (or left) adjacent observation can be denoted by (or ). Now we can consider the following two limiting cases.
First, we consider the case of with (or with ). In this case, the concerned th observation collapses onto the same point with its right (or left) adjacent observation, that is, the th [or th] observation. The two collapsed observations should be combined into one superobservation with a reduced error variance from to . The error variance reduction produced by this superobservation still can be estimated by (3) but withOn the other hand, without superObbing, the error variance reduction produced by the two collapsed observations will be overestimated by (3) with By comparing (10b) with (10a), it is easy to see that this overestimation can be corrected if the error variance is inflated from to for each of the two collapsed observations.
Then, we consider the case of and . In this case, the concerned th observation collapses with its two adjacent observations, that is, the th and th observations. The three collapsed observations should be combined into one superobservation with a reduced error variance from to . The error variance reduction produced by this superobservation still can be estimated by (3) but withOn the other hand, without superObbing, the error variance reduction produced by the three collapsed observations will be overestimated by (3) with By comparing (10d) with (10c), it is easy to see that this overestimation can be corrected if the error variance is inflated from to for each of the three collapsed observations.
Based on the above analyses, when the error variance reduction produced by the th observation is estimated by (3), the error variance should be adjusted for this observation unless . In particular, its error variance can be adjusted from to with given empirically by Note that for = 0, so the adjusted error variance is which recovers the result derived from (10c)(10d). Note also that for and (or and ), so the adjusted error variance is which recovers the result derived from (10a)(10b). Clearly, for = = , = 0, so is not adjusted which recovers the result for uniformly distributed coarseresolution observations.
The above results suggest that should be modified into This modification can improve the similarity of the spatial variation of to that of the true error variance reduction, denoted by , but the maximum (or minimum) of , denoted by (or ), is usually not very close to that of . The maximum (or minimum) of can be closely estimated by (or ), the maximum (or minimum) of computed by (6) for uniform coarseresolution observations but with decreased to (or increased to ), where (or ) is the minimum (or maximum) spacing between two adjacent observations among all nonuniformly distributed coarseresolution observations in the onedimension analysis domain. By adjusting to and to , the error variance reduction can be estimated by where .
The analysis error variance is then estimated by as in (7), except that is computed by (12a) instead of (6). As shown by the example in Figure 5, the estimated captures closely not only the maximum and minimum but also the spatial variation of the benchmark computed from (1b). Using this estimated , the previously estimated from the spectral formulation can be modified into , , or with its th element given by the same formulation as shown in (8a), (8b), or (8c). For the case in Figure 5, the benchmark is plotted in Figure 6, while the deviations of , , , and from the benchmark are shown in Figures 7(a), 7(b), 7(c), and 7(d), respectively. As shown, the deviation becomes increasingly small when the estimated analysis error covariance matrix is modified successively to , , and .
(a)
(b)
(c)
(d)
As explained in Section 2.2, the accuracy of the secondstep analysis depends on the accuracy of the estimated over the extended nested domain (i.e., the nested domain plus its extended vicinities within the distance of on each side outside the nested domain), while the latter can be measured by the smallness of the RE of the estimated A with respect to the benchmark , as defined for in (9). The REs of , , , and computed for the case in Figure 5 are listed in the first column of Table 2. As listed, the RE becomes increasingly small when is modified successively to , , and , which quantifies the successively reduced deviation shown in Figures 7(a)–7(d).

2.4. Nonuniform CoarseResolution Observations without Periodic Extension
Consider that the coarseresolution observations are still nonuniformly distributed in the onedimensional analysis domain of length but without periodic extension. In this case, their produced error variance reduction still can be estimated by (12a) except for the following three modifications.
(i) The maximum (or minimum) of , that is, (or ), should be found in the interior domain between the leftist and rightist observation points.
(ii) For the leftist (or rightist) observation that has only one adjacent observation to its right (or left) in the onedimensional analysis domain, its error variance is still adjusted from to but is calculated by setting [or () = 0] in (11a) for calculating in (11b).
(iii) Note from (12a) that and thus as moves outward far away from the leftist (or rightist) measurement point and thus far away from all the observations points. In this case, if (as for the case in this section), then estimated by in (12a) may become unrealistically negative as moves outward beyond the leftist (or rightist) measurement point, denoted by . To avoid this problem, (12a) is modified into where is a factor defined byIt is easy to see from (12b) that for and thus , as , so the estimated in (12b) can never become unrealistically negative.
The analysis error variance is estimated by as in (7), except that is computed by (12a) [or (12b)] for within (or beyond) the interior domain. As shown by the example in Figure 8, the estimated captures closely the spatial variation of the benchmark not only within but also beyond the interior domain. Using this estimated , can be modified into , , or with its th element given by the same formulation as shown in (8a), (8b), or (8c). For the case in Figure 8, the benchmark (not shown) has the same interior structure (for interior grid points and ) as that for the case with periodic extension in Figure 6, but significant differences are seen in the following two aspects around the four corners (similar to those seen from Figures 7(a) and 11(a) of Xu et al. [8]). (i) The element value becomes large toward the two corners along the diagonal line (which is consistent with the increased analysis error variance toward the two ends of the analysis domain as shown in Figure 8 in comparison with that in Figure 5). (ii) The element value becomes virtually zero toward the two offdiagonal corners (because there is no periodic extension). The deviations of , , , and from the benchmark are shown in Figures 9(a), 9(b), 9(c), and 9(d), respectively, for the case in Figure 8. As shown, the deviation becomes increasingly small when the estimated analysis error covariance matrix is modified successively to , , and . The REs of , , , and are listed in the first column of Table 3. As listed, the RE becomes increasingly small when is modified successively to , , and , which quantifies the successively reduced deviation shown in Figures 9(a)–9(d).

(a)
(b)
(c)
(d)
3. Analysis Error Variance Formulations for TwoDimensional Cases
3.1. Error Variance Reduction Produced by a Single Observation
For a single observation, say, at in the twodimensional space of , the inverse matrix in (1b) also reduces to , so the th diagonal element of is given by the same formulation as in (2) except that (or ) is replaced by (or ). Here, denotes the th point in the discretized analysis space with , (or ) is the number of analysis grid points along the (or ) direction in the twodimensional analysis domain. The length (or width) of the analysis domain is (or ) and is assumed to be much larger than the background error decorrelation length scale in x, where (or ) is the grid spacing in the (or ) direction and is assumed for simplicity.
Since is a continuous function of x, the aforementioned formulation for the th diagonal element of can be written into also as a continuous function of x, where is the error variance reduction produced by analyzing a single observation at . This reduction decreases rapidly and becomes much smaller than it peak value of at as increases to and beyond.
3.2. Uniform CoarseResolution Observations with Periodic Extension
Consider that there are coarseresolution observations uniformly distributed in the above analysis domain of length and width with periodic extension along and , so their resolution is /, where , (or ) denotes the number of observations uniformly distributed along the (or ) direction in the twodimensional analysis domain, and is assumed (so ). In this case, as explained for the onedimensional case in Section 2.2, the error variance reduction produced by each observation can be considered as an additional reduction to the reduction produced by its neighboring observations. This additional reduction is smaller than the reduction produced by a single observation, so the error variance reduction produced by analyzing the coarseresolution observations is bounded above by , which is similar to that for the onedimensional case in (4). For the same reason as explained for the onedimensional case in (4), this implies that the domainaveraged value of is larger than the true averaged reduction estimated by , where is the domainaveraged analysis error variance estimated by the spectral formulation for twodimensional cases in Section 2.3 of Xu et al. [8].
The domainaveraged value of can be computed by where denotes the integration over the twodimensional analysis domain, denotes the summation over for the grid points, and is used in the last step. By extending with the analysis domain periodically in both the and directions, can be estimated analytically as follows:where denotes the integration over the entire space of x, is used in the second to last step, and is used with in the last step. For the doubleGaussian form of used in Section 4 of Xu et al. [8], we have . The derived value in (15b) is very close to the numerically computed value from (15a).
With the domainaveraged value adjusted from to , can be estimated by the same formulation as in (6) except that is replaced by x. The analysis error variance is then estimated by As shown by the example in Figure 10, the estimated in (16) is very close to the benchmark computed precisely from (1b), and the deviation of from the benchmark is within (−0.21, 0.35) m^{2}s^{−2}. On the other hand, the constant analysis error variance ( = 6.7 m^{2}s^{−2}) estimated by the spectral formulation deviates from the benchmark widely from −1.91 to 2.22 m^{2}s^{−2}.
(a)
(b)
Using the estimated in (16), the previously estimated analysis error covariance matrix, denoted by with its th element obtained from the spectral formulation, can be modified into , , or with its th element given by the same formulation as in (8a), (8b), or (8c) except that (or ) is replaced by (or ). Again, as explained in Section 2.2 but for the twodimensional case here, the accuracy of the secondstep analysis depends on the accuracy of the estimated over the extended nested domain, that is, the nested domain plus its extended vicinities within the distance of outside the nested domain. Here, is the decorrelation length scale of defined by according to (4.3.12) of Daley [12], and (=4.52 km for the case in Figure 10) can be easily computed as a byproduct from the spectral formulation. Over this extended nested domain, the relative error (RE) of each estimated A with respect to the benchmark A computed precisely from (1b) can be defined in the same way as that for in (9), except that the extended nested domain is twodimensional here. The REs of , , , and computed for the case in Figure 10 are listed in the first column of Table 4. As listed, the RE becomes increasingly small when is modified successively to , , and .

3.3. Nonuniform CoarseResolution Observations with Periodic Extension
Consider that the coarseresolution observations are now nonuniformly distributed in the analysis domain of length and width with periodic extension, so their averaged resolution can be defined by . The spacing of a concerned coarseresolution observation, say the th observation, from its th adjacent observation (among the total 4 adjacent observations), can be denoted by . Now we can consider the limiting case of for (≤4) adjacent observations with for the remaining (≥0) adjacent observations. In this case, the concerned th observation collapses onto the same point with its adjacent observations. The collapsed observations should be combined into one superobservation with a reduced error variance from to . The error variance reduction produced by this superobservation still can be estimated by (14) but withOn the other hand, without superObbing, the error variance reduction produced by the collapsed observations will be overestimated by (14) with By comparing (17b) with (17a), it is easy to see that this overestimation can be corrected if the error variance is inflated from to for each of the collapsed observations.
Based on the above analyses, when the error variance reduction produced by the concerned th observation is estimated by (14), the error variance should be adjusted for this observation unless for , and 4. In particular, can be adjusted to with given empirically by where denotes the summation over for the four adjacent observations nearest to the concerned th observation. With given by (18a), the adjusted recovers not only the inflated observation error variance derived above for each limiting case [with for (≤4) and for the remaining (≥0) observations] but also the original observation error variance for uniformly distributed coarseresolution observations.
The above results suggest that should be modified into This modification can improve the similarity of the spatial variation of to that of the true error variance reduction, denoted by , but the maximum (or minimum) of , denoted by (or ), is usually not very close to that of . The maximum (or minimum) of can be estimated by (or ), the maximum (or minimum) of computed for uniformly distributed observations but with decreased to (or increased to ), where (or ) is the minimum (or maximum) spacing of adjacent observations among all nonuniformly distributed coarseresolution observations in the twodimension analysis domain. Specifically, (or ) is estimated by with and is estimated by with , where denotes the th observation point, denotes the observation point that is th nearest to , min_{m} (or max_{m}) denotes the minimum (or maximum) over index for all the coarseresolution observation points in the twodimension analysis domain, denotes the summation over from 1 to , and is the total number of adjacent observation points (nearest to ) used for estimating (with ) or (or ). By adjusting to and to , the error variance reduction can be estimated by where .
The analysis error variance is then estimated by as in (16), except that is computed by (19a). As shown by the example in Figure 11, the estimated (x) is fairly close to the benchmark , and the deviation of from the benchmark (x) is within (−2.40, 4.20) m^{2}s^{−2}. On the other hand, the constant analysis error variance ( m^{2}s^{−2}) estimated by the spectral formulation deviates from the benchmark widely from −9.98 to 3.83 m^{2}s^{−2}. Using this estimated , the previously estimated from the spectral formulation can be modified into , , or with its th element given by the same twodimensional version of (8a), (8b), or (8c) as explained in Section 3.2. The REs of , , , and computed for the case in Figure 11 are listed in the first column of Table 5. As listed, the RE becomes increasingly small when is modified successively to , , and .

(a)
(b)
3.4. Nonuniform CoarseResolution Observations without Periodic Extension
Consider that the coarseresolution observations are still nonuniformly distributed in the analysis domain of length and width but without periodic extension. In this case, their averaged resolution is still defined by . To estimate their produced error variance reduction, we need to modify the formulations constructed in the previous subsection with the following preparations. First, we need to identify four nearcorner observations among all the coarseresolution observations. Each nearcorner observation is defined as the one that nearest to one of the four corners of the analysis domain. Then, we need to identify (or ) nearboundary observations associated with each boundary (or boundary), where (or ) is estimated by the nearest integer to (or ). The total number of nearboundary observations is thus given by