Abstract

Assimilating observations to a land surface model can further improve soil moisture estimation accuracy. However, assimilation results largely rely on forecast error and generally cannot maintain a water budget balance. In this study, shallow soil moisture observations are assimilated into Common Land Model (CoLM) to estimate the soil moisture in different layers. A proposed forecast error inflation and water balance constraint are adopted in the Ensemble Transform Kalman Filter to reduce the analysis error and water budget residuals. The assimilation results indicate that the analysis error is reduced and the water imbalance is mitigated with this approach.

1. Introduction

As a key component of land underlying surface, soil plays an essential role in the exchange of energy, heat, water vapor and carbon dioxide between the land surface and atmosphere [1]. Soil moisture is an important land surface variable that can directly affect vegetation respiration, transpiration, and various chemical reactions. These actions will ultimately result in the redistribution of surface energy and water by changing the surface albedo, soil heat capacity, surface evaporation, and vegetation growth [2, 3].

Many studies have investigated soil moisture assimilation in different locations and on different time scales. For instance, a study assimilating the near-surface observations focused on the effect of ensemble size and observation depth [4]. The dual-based data assimilation system for assimilating satellite observations of soil moisture reproduces the temporal evolution of daily soil moisture, especially under freezing conditions [5]. The assimilation of satellite data to a land surface model can further improve soil moisture estimation accuracy [69], which can also interactively or simultaneously estimate model parameters and surface variables [10, 11].

Generally speaking, soil moisture estimation is mainly from land surface model simulation and observation. Land surface models (e.g., Common Land Model, CoLM) [1215], which use observations of surface meteorological conditions and precipitation, can characterize the exchange of energy and water within the surface layer. Soil moisture simulations from land surface models can provide temporally and spatially continuous series, but limited to the specification of the forcing variables, uncertain model parameters describing the exchange of energy and water with the atmosphere and models, and imperfect model physical processes [16].

The advantage of observation is that it can obtain the value of the observed variable on the represented time and space [17]. However, in situ soil moisture observations can only represent a very spatially limited range and cannot meet the needs of regional studies due to the heterogeneity of surface variables [18, 19]. Satellite remote sensing data are only roughly available at shallow layers and cannot provide complete soil moisture profiles [20, 21].

Based on observations and model simulations, land surface data assimilation can provide an analysis of soil moisture with less uncertainty and better coverage. The assimilation results are largely limited to meteorological forcing error, which is one of the error sources in model simulation. Therefore, forcing data with long term and high precision are very important for improving the simulation capability of land surface models [2224]. Many studies use reanalysis data directly or corrected reanalysis data with monthly observations [25]. However, observations from approximately 200 stations were only offered by China for international exchange to construct such datasets. Therefore, their quality over China mainland is questionable. Recently, a new China mainland forcing field (including near-surface air temperature, relative humidity, surface pressure, wind speed, precipitation, downward shortwave radiation, and longwave radiation) is constructed by fusing hourly observation, reanalysis, and remote sensing data [26], which is evaluated in this study.

On the other hand, the assimilation results crucially rely on the estimation accuracy of the forecast and observation error covariance matrices for any assimilation methods [27, 28]. The Ensemble Transform Kalman Filter (ETKF) is a commonly used assimilation method, that has been widely studied and applied in atmospheric science research [29, 30]. In ETKF, the forecast perturbations are transformed into analysis perturbations by multiplying a transformation matrix and the eigenvector decomposition of a matrix of the ensemble size is used to construct the transform matrix [31, 32]. Past research on ensemble based land data assimilation methods have found that the ensemble spread decreases during model integration because the ensemble forecasts use the same boundary and meteorological forcing. In ETKF, the forecast error is initially estimated as the perturbed forecast states minus their ensemble mean. The sampling errors in such estimations, resulting from the limited ensemble size as well as the poor initial perturbations and model error, can generally lead to an underestimate of the forecast error covariance matrix and can eventually result in filter divergence [33, 34]. Therefore, using inflation techniques to address the underestimation of forecast error becomes increasingly important [35].

Basically, there are two approaches for estimating the multiplicative inflation factors based on the innovation statistic: the moment estimation [36, 37] and the maximum likelihood estimation [3840] which is applied in this study. Liang et al. [38] compared the two methods using Lorenz-96 model as a test bed and found out that the maximum likelihood approach leads to a smaller analysis error than the moment estimation approach (in their Figure ). This is a reason for us to use maximum likelihood estimation in this study. Moreover, the −2 log-likelihood can be used as the cost function to estimate both inflation factor and threshold layer, while this cannot be done if the moment estimation is applied. The effect of forecast error inflation is investigated for reducing the analysis error.

In addition to reducing the analysis error in land surface data assimilation, obtaining a balanced or closed water budget is extremely important for analyzing the terrestrial hydrological cycle. The terrestrial water budget simulated by land surface models (LSMs) is a key process of the global hydrologic cycle and an important part of the physical consistency [41]. The physical consistency (i.e., closure of the water budgets) can be maintained by constructing water balance constraint in land surface models [42]. A balanced or closed water budget is important for estimating runoff, developing and validating hydrological models, and better understanding land-atmosphere water exchange processes [43]. However, state updating in land surface data assimilation generally degrades the water budget balance, because the analysis increment compensates for system biases or errors but does not conserve water. If the degree of water imbalance is excessive, it is reasonable to develop a data assimilation system that can remove or reduce the imbalance of water. Therefore, the water balance constraint technique is often adopted in the assimilation procedure to compensate for this defect. Pan and Wood [42] derived a two-stage constrained Kalman filter solution in which the first stage is a traditional Kalman filter and the second stage imposes a water balance constraint in an optimal manner. Yilmaz et al. [43, 44] introduced a weak constraint solution to the conventional data assimilation systems to reduce the water budget imbalance. In this study, the water balance constraint is adopted for the ETKF assimilation scheme with forecast error inflation. The effect of the water balance constraint is investigated for reducing the water budget residual.

As a case study, the ETKF assimilation scheme with a water balance constraint is applied to the Automatic Weather Station (AWS) located in the Daxing district of Beijing (39°37N, 116°25E). The new forcing data are interpolated to the in situ observation station and the corresponding accuracy is evaluated. Then, the interpolated forcing data are used to drive CoLM, and the shallow soil moisture observations are assimilated to estimate the soil moisture in different layers. The analysis errors and water budget residuals are compared among the different schemes. The assimilation results can provide some experiences for satellite data assimilation for future research.

The rest of the paper is organized as follows. The model and data are introduced in Section 2. The modified ETKF schemes with forecast error inflation and a water balance constraint are described in Section 3. The assimilation results of the observation station are presented in Section 4. Lastly, Section 5 provides discussions and the conclusions are in Section 6.

2. Model and Data

2.1. Model

The development of Common Land Model (CoLM) dates back to the mid-1990s, and it is based on land surface model (LSM) [45]. The biosphere-atmosphere transfer scheme (BATS) [46] and the Institute of Atmospheric Physics land surface model (IAP 94) [47] also account for different physical processes, such as glaciers, lakes, wetlands, and dynamic vegetation. It can be described as a community effort and that has been widely adopted by multiple climate models. Many researchers have studied land state simulation and assimilation based on this generally used model. For instance, assimilating the MODIS-based albedo and snow cover fraction can improve snow depth simulation [48], and the estimation of water and energy fluxes during the growing season is useful for data-limited heterogeneous farmland [49].

The model performance has been validated in very extensive field data, including sites adopted by the Project for the Intercomparison of Land-Surface Parameterization Schemes and others [13]. There are 10 unevenly spaced soil layers (see Table 1), one vegetation layer, and 5 snow layers (depending on snow depth) in CoLM. The model parameters, including global terrain, elevation, land use/vegetation, a land-water mask, and soil types are available from the United States Geological Survey (USGS) at 30-arc-second resolution. The CoLM is set as the forecast model to simulate soil moisture in this study.

2.2. Site Data

The Automatic Weather Station (AWS) studied in this paper is located in the Daxing district of Beijing (39°37N, 116°25E; see Figure 1) and the dominant vegetation type in the area is corn/wheat. The observation items conclude air temperature/humidity/pressure, wind speed/direction, precipitation, radiation, soil temperature/moisture, and so on, and the collection frequency is every 10 minutes from 2008 to 2010. The depths of the soil moisture sensor are 5, 10, 20, 40, 60, and 100 cm. The data that are obviously beyond the range of physical possibility are rejected, and the gaps are filled using interpolation [50]. The shallow soil moisture observations (at 5 cm) are assimilated to CoLM in this study and the other soil moisture observations are used for validation.

2.3. Forcing Data

The atmospheric forcing data used to run CoLM mainly included the near-surface air temperature, relative humidity, surface pressure, wind speed, precipitation, downward shortwave radiation, and longwave radiation. The accuracy of atmospheric forcing data is very important to the simulation of the land surface model [22, 23, 51]. In many studies, global atmospheric forcing datasets have been mainly derived from reanalysis datasets, such as the ERA-interim dataset [52], the Climate Forecast System Reanalysis (CFSR) dataset [25], and the Princeton forcing dataset [53]. However, only observations from approximately 200 stations in China have been used in the international exchanges to construct such datasets. Therefore, their quality over China mainland is questionable. To compensate for this defect, a method of applying monthly observations to correct the reanalysis data is widely used, which may lead to insufficient accuracy at hourly scale and low spatial resolution.

Recently, a new technique is applied to construct an atmospheric forcing dataset for China mainland with a resolution of 1 × 1 km [26]. Firstly, a partial thin-plate smoothing spline with orography and reanalysis data as explanatory variables to ground-based observations is used to estimate the trend surface. Secondly, a simple kriging procedure is used to correct the residual. In Section 4, this dataset is accessed through realistic meteorological observation data.

3. Methodology

3.1. Forecast and Observation System

Using notations similar to those of Ide et al. [54], a nonlinear discrete-time forecast system is written aswhere is the time step index; is the n-dimensional state vector ( in CoLM); the superscripts “” and “” specify forecast and analysis respectively; and is the nonlinear forecast operator (CoLM in this study). The major objective of this study is to assimilate the shallow soil moisture observations to improve the estimation of the variables related to the water budget; that is, the 22-dimensional vector in which and are the 10-dimensional vectors specifying the soil moisture and the soil ice, respectively, at the 10 vertical levels shown in Table 1, and the units are volume percentage %  ; the scalars CWC and SWE specify the canopy water content and the snow water equivalent, respectively.

The observation system can be written aswhere is the soil moisture observations that are only available at the depth of 5 cm at 0000 UTC every day; is a 22-dimensional vector that linearly interpolates the soil moisture at depths of 2.8 cm and 6.2 cm to the depth of 5 cm; and is the observation error which is assumed to be statistically independent from the forecast error, is time-uncorrelated, and has mean zero and covariance .

3.2. ETKF with Forecast Error Inflation and Water Balance Constraint

To improve the assimilation efficiency, the forecast error inflation technique is usually applied to the assimilation procedure. In addition to reducing analysis error, land data assimilation of soil moisture generally degrades the water balance as a result of state updates. Following Yilmaz et al. [43, 44], the water budget balance residual at time can be expressed aswhere is the 22-dimensional unit vector weighted by 1/3600 to unify the dimensions, specifies precipitation from atmospheric forcing data, and and specify the diagnostic states of evapotranspiration and runoff. The water budget balance residual represents the difference of the soil moistures at the current time step and previous time step, considering precipitation, evapotranspiration, and runoff. The detailed procedure of ETKF assimilation with the proposed forecast error inflation and water balance constraint scheme is described as follows:

3.2.1. Forecast Step

Calculate the th perturbed forecast state at time aswhere is the th perturbed forecast state at time and is the th perturbed analysis state at time ; is the CoLM forced by the th perturbed atmospheric forcing data; and is the total number of ensemble members (select as 100 in this study).

The mean forecast state of the variables related to the water budget is defined asThe forecast error matrix and the forecast error covariance matrix are initially estimated asrespectively.

Then the forecast error matrix is inflated to the following form:and the forecast error covariance matrix is adjusted aswhere is a diagonal matrix with the diagonal elements and is the inflation layer that is selected by (11). For the soil moistures in the layers shallower than , is a scalar that inflates the forecast errors; the soil moisture in the layers deeper than are not updated, and then the corresponding elements of (i.e., the related to in the layer from to 10) are set to zero; the forecast errors of the water budget variables other than soil moistures are not inflated, and then the corresponding elements of (i.e., the related to , , and ) are set to unit. To cover the observations at 5 cm depth, the threshold layer should be larger than 3.

Because the observation and forecast errors are assumed to be statistically independent of each other, time-uncorrelated, and with mean zero and covariance and , the innovation statistic is normally distributed with mean zero and covariance . Therefore, the inflation factor can be estimated by minimizing the following −2 log-likelihood of the innovation statistic [38, 39]; that is,The total objective function value during the entire assimilation period iswhich is treated as a function of the threshold layer . For a given , the inflation factors are estimated by minimizing (10) and the value of is also calculated through (11). Then the threshold layer is selected as the smallest number such that is the smallest value among (). The forecast error inflation is conducted using the selected and the corresponding inflation factors .

3.2.2. Analysis Step

Calculate the analysis state aswhere is the weight vector, which is estimated by minimizing the following objective function with a weak water balance constraint [43]where is a 22-dimensional vector similar to vector but where components corresponding to the soil moisture in the layers deeper than are modified to zero, which is different from Yilmaz et al. [43, 44]. One has thatis the error variance of , and is the th perturbed realizations of . For the linear observation operator, the estimated weight vector had the following analytic form:where is the posterior variance of ; that is,

Then, calculate a perturbed analysis state aswhere is the th column of the matrix [31]. The analysis states of the variables other than those related to the water budget are set as the corresponding forecast states. Lastly, set and return to the Forecast Step for the next iteration. The complete assimilation procedure is shown in Figure 2.

3.3. Validation Statistics

To compare the estimated 10 layers of soil moisture with the 6 levels of soil moisture observations, a root-mean-square-error statistic is used:where is the linear interpolation operator that interpolates the soil moisture analysis to all of the depths of the observations and lev specifies the level of observations.

To test the effect of the forecast error inflation, a Chi-square statistic [55] is used in this study. The statistic at time is defined asIf the forecast and observation error covariance matrix are correctly estimated, the innovation statistic is normally distributed with mean zero and covariance ; then will follow a Chi-square distribution with the number of observations as its degrees of freedom [56]. The value of in this study can be easily calculated and should be close to 1. Therefore the inflation effect can be effectively evaluated.

To illustrate the water budget balance at a location, the water budget balance residual is calculated as follows:If an assimilation scheme maintains a water budget balance, the water balance residual should be close to 0.

4. Experimentation

4.1. Evaluation of the Atmospheric Forcing Data

In the following experiments at the Daxing Automatic Weather Station, the atmospheric forcing data with high precision are obtained from the observations. To evaluate the accuracy of interpolated forcing data (described in Section 2.3), CoLM is driven using the interpolated data, the CFSR data, and the ERA-interim data as the meteorological forcing for one year (from June 1 2009 to May 31 2010). The simulated results are then interpolated to the 6 observation layers and the corresponding RMSE are plotted in Figure 3. Because the CFSR data are used to estimate the trend surface when generating the interpolated forcing data, the latter performs better than the former in almost all 6 layers. In the shallow layers (5 cm and 10 cm) and deep layers (60 cm and 100 cm), the RMSE of the interpolated data is smaller than the ERA-interim data but is converse in the middle layers (20 cm and 40 cm). This may be due to the soil moisture in the deep layers fluctuating slower than in the middle and shallow layers, and also the interpolation errors in the shallow layers being smaller than in the middle and deep layers. On the whole, the interpolated data perform better than the CFSR data and the ERA-interim data. Whether this is a general principal or is site dependent will be validated further using more observations in the near future.

Next, the interpolated forcing data are used to drive CoLM from June 1 2009 to August 31 2010 to simulate the soil moisture at the station. The results in the first year are treated as spin-up and the last three months are used for the assimilation experiments. The time-mean values of observed and interpolated forcing data, as well as the corresponding RMSEs of the seven variables, are listed in Table 2. The simulated soil moistures using interpolated forcing data are interpolated to the observation layer and are compared with the soil moisture observations. The time-mean RMSEs for the 6 observation layers are listed in Table 3. Generally speaking, the interpolated forcing data are comparable with the observed forcing data, especially for near-surface air temperature and relative humidity. The simulated soil moisture using interpolated forcing data is different from the observed values but can be improved by assimilating the observation to CoLM, which is introduced in the last section.

4.2. Assimilation Results

In this section, the interpolated data are used to drive CoLM and the soil moisture observation at the depth of 5 cm at 0000 UTC is assimilated every day from June 1 2010 to August 31 2010. The four variables strongly related to soil moisture (near-surface air temperature, precipitation, downward shortwave radiation, and longwave radiation) are perturbed with a 5% error scale to generate the ensemble forecast states. The initial values used during the assimilation period are the model forecasts after one year’s spin-up. The assimilated soil moisture observations are plotted in Figure 4. The observation error scale is set as 0.5%. The following four ETKF assimilation settings are examined.ETKF: traditional ETKF without forecast error inflation and water balance constraint.ETKF-Inf: ETKF with forecast error inflation but without water balance constraint.WCETKF: ETKF with water balance constraint but without forecast error inflation.WCETKF-Inf: ETKF with forecast error inflation and water balance constraint.

The selected inflation layer is 5 for WCETKF-Inf and 6 for ETKF-Inf using the criterion of (11) described in Section 3.2.1. The estimated 10-layer soil moistures are interpolated to the 6 observation layers and the corresponding RMSEs of these assimilation schemes as well as the corresponding mean values are listed in Table 4. For all of the four different assimilation schemes, the RMSEs are generally smaller than those of the simulations (the second row in Table 3), except in separate layers. The mean RMSE of ETKF-Inf (3.88) is smaller than that of ETKF (4.87), whereas the mean RMSE of WCETKF-Inf (4.39) is smaller than that of WCETKF (6.02), indicating that the forecast error inflation can improve the assimilation accuracy. However, the mean RMSE of WCETKF (6.02) is larger than that of ETKF (4.87), whereas the mean RMSE of WCETKF-Inf (4.39) is larger than that of ETKF-Inf (3.88), indicating that the water balance constraint may result in assimilation accuracy loss to some extent.

The ensemble members of the ETKF-Inf scheme are plotted in Figure 5, which indicates the uncertainty of our analysis state to some extent. It shows that the ensemble spreads in the very shallow and deep layers are relatively small whereas those in the middle layers are slightly large, which is consistent with the fluctuation of the used forcing data. This may be due to the assimilated observation being in the shallow layer (at the depth of 5 cm) and the soil moisture in the deep layers fluctuating slower than in the shallow layers. Therefore the corresponding uncertainty in the very shallow and deep layers seems to be small, whereas that in the middle layers may be larger.

4.3. Water Budget Residual

The water balance constraint is conducted in this study to reduce the water budget residual. For the two assimilation schemes with inflating forecast error on selected inflation layers, the water budget residual (see (19)) of ETKF-Inf scheme is 0.1580 mm, whereas it is only 0.0662 mm of WCETKF-Inf scheme, indicating that the water balance constraint can reduce the water budget residual.

5. Discussions

5.1. Inflation

It is widely recognized that the initial estimate of ensemble forecast errors should be inflated to improve the assimilated results. In this study, a method for estimating the multiplicative inflation factors by minimizing the −2 log-likelihood of the innovation statistic is investigated. The forecast errors of soil moisture in the layers shallower than are inflated, while the threshold layer is selected by the criterion described in Section 3.2.1. To examine the effect of the select criterion, the assimilation experiments are conducted with the inflation layer from 3 to 10. The mean RMSE and the value of the objective function as a function of the inflation layer are plotted in Figure 6. For the assimilation schemes with and without a water balance constraint, the selected inflation layers are both consistent with the smallest mean RMSE, indicating that the selected criteria described in Section 3.2.1 are effective.

Another statistic to evaluate the effect of the forecast error inflation is the Chi-square statistic (see (20)) described in Section 3.3. For WCETKF-Inf and ETKF-Inf assimilation schemes with selected inflation layers (5 and 6, resp.), as well as the two assimilation schemes without a forecast error inflation (ETKF and WCETKF), the corresponding time series of from June 1 2010 to August 31 2010 are plotted in Figure 7. It shows that the values are remarkably close to 1 if the forecast error is inflated. In contrast, if the initially estimated forecast error (i.e., ) is used in the assimilation procedure, the values of depart extremely from 1. This indicates that the forecast error inflation method used in this study is effective for correctly estimating the error statistics.

The −2 log-likelihood function of the innovation statistic seems like a good objective function to quantify the goodness-of-fit of the forecast error covariance matrix. In fact, suppose that the forecast and observation errors are statistically independent of each other, time-uncorrelated, and normally distributed with zero mean and covariance and ; then the innovation statistic will also follow a normal distribution with mean zero and covariance . Therefore the inflation factor estimation method based on the −2 log-likelihood function will incorporate the information of the whole distribution other than that of the moment. In most cases of this study, the −2 log-likelihood function corresponds to a smaller RMSE.

5.2. Water Balance Constraint

Assimilation of hydrological observations (e.g., soil moisture) can improve the estimates of hydrological variables but generally results in water balance degradation. To compensate for this defect, a water balance constraint in the assimilation procedure becomes increasingly important.

To evaluate the effect of the water balance constraint in this study, the water balance residual (see (19)) is calculated for the inflation layer from 3 to 10. The results of ETKF-Inf and WCETKF-Inf assimilation schemes are plotted in Figure 8. The figure clearly shows that, for each scheme, the water balance residual increases as the inflation layer increases. However, for a fixed inflation layer, the water balance residual of the WCETKF-Inf assimilation scheme is smaller than that of the ETKF-Inf assimilation scheme. This illustrates that the water balance constraint is effective for maintaining the water balance and the inflation layer selection is very important.

Another problem that requires attention is that, for a fixed inflation layer, the mean RMSE of the WCETKF-Inf assimilation is usually larger than that of the ETKF-Inf assimilation scheme. In land data assimilation systems, the state updates usually produce a water budget imbalance residual. The analysis error may increase if a water balance constraint is conducted, which may not be experiment dependent but may be the general case. In fact, the cost function of the assimilation scheme with a water balance constraint is the summation of three terms that represent the forecast errors, observation errors, and the water budget imbalance, respectively. The corresponding assimilation results can be treated as a trade-off in reducing the analysis error and water budget residual. Although it may lead to the analysis error increasing, the water balance constraint can make the land surface model more consistent in the physical structure. A better understanding of the water budget can also help to facilitate model parameterization development, better understand the hydrological processes, improve our knowledge of land-atmosphere water exchange and related physical mechanisms and, therefore, help to improve our ability to develop models.

6. Conclusion

In this study, interpolated forcing data is evaluated and the observations in the shallow layer are assimilated to CoLM to better estimate soil moisture. The forecast error is inflated to improve the analysis state accuracy and the water balance constraint is adopted to reduce the water budget residual in the assimilation procedure. The experimental results illustrate that the adaptive forecast error inflation can reduce the analysis error, whereas the proper inflation layer can be selected based on the −2 log-likelihood function of the innovation statistic. The water balance constraint can result in a substantially reduced water budget residual, at a low cost of assimilation accuracy loss. The assimilation scheme can potentially be applied to assimilate the remote sensing data and we plan to do this in the near future.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

The datasets are provided by the Haihe Experiments of Professor Liu SM (http://westdc.westgis.ac.cn/haihe). This work is supported by the National Basic Research Program of China (Grants nos. 2012CB956203 and 2015CB953703), the National Natural Science Foundation of China (Grant no. 41405098), and the R&D Special Fund for Nonprofit Industry (Meteorology, Grant no. GYHY201206008).