Advances in Meteorology

Volume 2018, Article ID 9310838, 7 pages

https://doi.org/10.1155/2018/9310838

## Research on Fusing Multisatellite Soil Moisture Data Based on Bayesian Model Averaging

^{1}School of Information Engineering, East China Jiaotong University, Nanchang 330013, China

^{2}National Meteorological Information Center, Beijing 100081, China

Correspondence should be addressed to Shan Wang; patrick_shan@163.com

Received 12 February 2018; Revised 13 May 2018; Accepted 15 May 2018; Published 25 June 2018

Academic Editor: Jifu Yin

Copyright © 2018 Shan Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Soil moisture (SM) is an important physical quantity that reflects the land surface condition. Of the many ways to measure SM, satellite microwave remote sensing is now considered the primary method because it can provide real-time, high-resolution data. However, SM data obtained by satellite remote sensing deviate from reference data obtained from ground stations. To improve the accuracy of SM forecasts, this study proposes the use of a Bayesian model averaging (BMA) method to integrate multisatellite SM data. First, China was divided into eight regions. Then, SM data observed by three satellites (FY3B, SMOS, and WINDSAT) were fused using the BMA method and a traditional averaging method. Finally, SM data were predicted using data from ground observation stations as a reference standard. Following the fusion process, three parameters (standard deviation, correlation coefficient, and root mean square deviation) were used to evaluate the fusion results, which revealed the superiority of the BMA method over the traditional averaging method.

#### 1. Introduction

Soil moisture (SM) is an important parameter in research on surface climate and numerical weather forecasting, hydrological forecasting, agricultural drought monitoring, and land surface model prediction. Therefore, obtaining high-quality SM data is important for these activities [1–3]. There are many ways to obtain SM data, such as ground-based observation and satellite microwave remote sensing [4]. Because the distribution of ground observation stations is sparse, satellite microwave remote sensing has become the primary method of obtaining real-time, high-resolution SM data. Microwave sensors mounted on satellites can obtain large quantities of information on shallow SM. This information has been used in research on the validation of the inversion of satellite-derived SM products, as well as in applications of land surface models and numerical forecasting models [5, 6].

Since 1978, several satellite-borne active and passive microwave sensors have been launched successfully by China and other countries. For example, microwave sensors for detecting SM have been deployed on the Aqua/AMSR-E, Coriolis/WINDSAT, MetOp-A/ASCAT, SMOS/MIRAS, and FY3B/MWRI satellites. However, because of the influence of land surface roughness, vegetation, and the inversion algorithm adopted, SM data derived from satellite remote sensing often deviate from ground observations [7]. How to integrate multisource remote sensing satellite data to achieve better results is therefore an open question. The traditional method of fusing SM data takes the average of the observational data of all satellites as the fusion result; in other words, each satellite model is assigned the same weight. In practice, however, each satellite model behaves differently in different regions [8]. Taking China as an example, SM derived by FY3B shows two extreme states: wet conditions in the humid areas of northern and southern China and dry conditions in the arid northwestern areas. SM derived by SMOS is dry in most parts of the country, and differences in its spatial distribution are not obvious. SM derived by WINDSAT is relatively dry over the whole country but relatively moist in the middle and lower reaches of the Yangtze River and in the Northeast [9]. In addition, the traditional fusion method does not incorporate observational data from ground-based stations.

The objective of this study was to improve the accuracy of SM forecasts using a Bayesian model averaging (BMA) method to obtain time-varying weights. The a posteriori probability weights assigned to each sensor model in a specific period reflect the inherent uncertainty of that sensor model. Several related applications were reported early in the development of the BMA method [10, 11]. The first application of the BMA method was to calibrate forecast ensembles in which the sea level pressure and other weather variables obeyed Gaussian distributions [12]. In 2007, the BMA method was applied to precipitation, which does not obey a Gaussian distribution [13]. In 2013, the BMA method was applied to the prediction of daily mean temperature in the Huaihe River Basin, China. A BMA probabilistic forecasting model for each site in the basin was established dynamically using the regional forecasting techniques of the TIGGE multimodel super ensemble forecasting system [14].

The daily mean volumetric soil water content recorded in 2012 at 376 automatic soil moisture (ASM) observation stations was used in this study as the reference standard. This paper mainly studies the surface soil moisture data (0–10 cm). Satellite data were taken from three passive microwave remote sensing satellites (FY3B, SMOS, and WINDSAT) [15, 16]. SM data observed by the satellites were fused by the BMA method. The remainder of this paper is organized as follows. Section 2 introduces both the theory of the BMA method and the algorithm adopted. In Section 3, the method for evaluating the BMA model is described. Section 4 presents the original data and the SM forecast values, taking each region as the research object. Finally, our conclusions are stated in Section 5.

#### 2. Principles of the BMA Method

##### 2.1. Bayesian Model Averaging Method

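Bayesian theory provides a set of ideas based on probabilistic statistical methods applicable to the fusion of information from different sources. The Bayesian formula is defined as

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},$$

where $P(A)$ is the prior probability of $A$, $P(B \mid A)$ is the likelihood of $B$ given $A$, and $P(A \mid B)$ is the a posteriori probability of $A$ given $B$.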

The BMA method is a statistical postprocessing method based on Bayesian theory that combines multiple statistical models to produce a prediction. If $y$ is the forecast variable (the SM data after fusion), $f = \{f_1, f_2, \ldots, f_K\}$ is the set of prediction results of the $K$ possible models (the prediction results of the satellites), and $y^T$ is the training data (the previously observed satellite remote sensing SM data), the BMA prediction model can be written as follows:

$$p(y \mid f_1, \ldots, f_K, y^T) = \sum_{k=1}^{K} w_k\, p_k(y \mid f_k, y^T),$$

where $w_k = p(f_k \mid y^T)$ is the BMA weight, which represents the a posteriori probability that the prediction of the $k$th satellite model is the optimal prediction result. Greater weights mean higher accuracy of the prediction result, and the sum of all model weights is 1. The term $p_k(y \mid f_k, y^T)$ represents the probability density function (PDF) of the predicted variable for a given sample and model condition. For each possible model, it is necessary only to consider the proportion that it occupies throughout the prediction process.

The synthetic prediction of variable $y$ based on the BMA method uses the a posteriori probabilities as weights: the PDFs of all the models are weighted to realize the probabilistic prediction of the variable. In the study of satellite remote sensing of SM data, $p_k(y \mid f_k, y^T)$ can be regarded as a normal distribution function whose expectation is a simple linear function of the single prediction result, $a_k + b_k f_k$. The variance is $\sigma^2$, and

$$y \mid f_k, y^T \sim N\left(a_k + b_k f_k,\ \sigma^2\right),$$

where $a_k$ and $b_k$ can be calculated by the linear regression method. Therefore, we can obtain the expectation of the BMA forecast as

$$E\left[y \mid f_1, \ldots, f_K, y^T\right] = \sum_{k=1}^{K} w_k \left(a_k + b_k f_k\right).$$
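The expectation of the BMA forecast can be illustrated with a minimal Python sketch. The function name `bma_mean` and all numerical values (weights, regression coefficients, and satellite SM values) are hypothetical and are not taken from this study; they only show how a weighted, bias-corrected fusion differs from the equal-weight average.

```python
def bma_mean(weights, coeffs, forecasts):
    """Expectation of the BMA forecast: sum_k w_k * (a_k + b_k * f_k)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("BMA weights must sum to 1")
    return sum(w * (a + b * f)
               for w, (a, b), f in zip(weights, coeffs, forecasts))


# Hypothetical values for three satellite models (assumptions for illustration).
weights = [0.5, 0.3, 0.2]                        # w_k, sum to 1
coeffs = [(0.02, 0.9), (0.05, 0.8), (0.0, 1.0)]  # (a_k, b_k) from linear regression
forecasts = [0.20, 0.25, 0.30]                   # f_k, volumetric SM

fused = bma_mean(weights, coeffs, forecasts)     # BMA expectation
equal = sum(forecasts) / len(forecasts)          # traditional equal-weight average
```

With these illustrative numbers the BMA expectation is pulled toward the model with the largest weight, whereas the traditional average treats all three models identically.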

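At this point, the predicted value is a definite value, which can be compared with the predicted value of a single model. We denote space and time by subscripts $s$ and $t$, such that $f_{kst}$ denotes the $k$th forecast in the ensemble for location $s$ and time $t$ [12, 17]. The corresponding BMA predictive variance is

$$\mathrm{Var}\left[y_{st} \mid f_{1st}, \ldots, f_{Kst}, y^T\right] = \sum_{k=1}^{K} w_k \left( \left(a_k + b_k f_{kst}\right) - \sum_{i=1}^{K} w_i \left(a_i + b_i f_{ist}\right) \right)^2 + \sigma^2,$$

where $w_k$ is the BMA weight of model $k$, $a_k + b_k f_{kst}$ is its bias-corrected forecast, and $\sigma^2$ is the variance of the normal PDF.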

The key is to solve for the parameters $w_k$ and $\sigma^2$ in the above model. They are estimated by the maximum likelihood method, with the expectation-maximization (EM) algorithm used to find the maximum (see Section 2.2 for details of the specific algorithm).

##### 2.2. Parameter Estimation

The EM algorithm is a method for obtaining maximum likelihood estimates of parameters [18, 19]. Raftery et al. [12] proposed using the EM algorithm to estimate the weights and the variance for the case where the forecast variable obeys a normal distribution. The EM algorithm is iterative, and it alternates between two steps: the E (or expectation) step and the M (or maximization) step [20]. The two steps of the EM algorithm are as follows:

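First, the weights and the variance are initialized ($j = 0$):

$$w_k^{(0)} = \frac{1}{K}, \qquad \sigma^{2(0)} = \frac{1}{Kn} \sum_{k=1}^{K} \sum_{s,t} \left( y_{st} - \left(a_k + b_k f_{kst}\right) \right)^2,$$

where $y_{st}$ is the reference observation at location $s$ and time $t$ and $n$ is the number of observations in the training data.

Its logarithmic likelihood function is

$$\ell\left(w_1, \ldots, w_K, \sigma^2\right) = \sum_{s,t} \log \left( \sum_{k=1}^{K} w_k\, g\left(y_{st} \mid f_{kst}, \sigma^2\right) \right),$$

where $g\left(y_{st} \mid f_{kst}, \sigma^2\right)$ is the normal PDF with mean $a_k + b_k f_{kst}$ and variance $\sigma^2$.

*E step*: for each iteration $j$, replace $j$ with $j + 1$ and calculate

$$\hat{z}_{kst}^{(j)} = \frac{w_k^{(j-1)}\, g\left(y_{st} \mid f_{kst}, \sigma^{2(j-1)}\right)}{\sum_{i=1}^{K} w_i^{(j-1)}\, g\left(y_{st} \mid f_{ist}, \sigma^{2(j-1)}\right)}.$$

*M step*: its weight value is

$$w_k^{(j)} = \frac{1}{n} \sum_{s,t} \hat{z}_{kst}^{(j)}.$$

We can then obtain the variance

$$\sigma^{2(j)} = \frac{1}{n} \sum_{s,t} \sum_{k=1}^{K} \hat{z}_{kst}^{(j)} \left( y_{st} - \left(a_k + b_k f_{kst}\right) \right)^2.$$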

The E and M steps are repeated, updating the parameter values at each iteration and checking for convergence, until convergence is achieved.
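The iteration can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the forecasts passed in are assumed to be already bias-corrected (that is, they stand for $a_k + b_k f_{kst}$), a single variance is shared by all models, location and time are flattened into one sample index, and convergence is judged by the change in the log-likelihood. The function name `bma_em` and the synthetic data are hypothetical.

```python
import math


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bma_em(forecasts, obs, tol=1e-8, max_iter=1000):
    """Estimate BMA weights and a common variance with the EM algorithm.

    forecasts[k][n]: bias-corrected forecast of model k for sample n
    obs[n]:          reference observation for sample n
    """
    K, N = len(forecasts), len(obs)
    # Initialization: equal weights, pooled mean squared error as variance.
    w = [1.0 / K] * K
    var = sum((obs[n] - forecasts[k][n]) ** 2
              for k in range(K) for n in range(N)) / (K * N)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E step: posterior probability z[n][k] that model k is best for sample n.
        z, ll = [], 0.0
        for n in range(N):
            dens = [w[k] * normal_pdf(obs[n], forecasts[k][n], var) for k in range(K)]
            total = sum(dens)
            ll += math.log(total)
            z.append([d / total for d in dens])
        # M step: update the weights, then the common variance.
        w = [sum(z[n][k] for n in range(N)) / N for k in range(K)]
        var = sum(z[n][k] * (obs[n] - forecasts[k][n]) ** 2
                  for n in range(N) for k in range(K)) / N
        var = max(var, 1e-12)  # guard against a collapsing variance
        # Stop when the log-likelihood no longer changes appreciably.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return w, var


# Synthetic data (assumption): model 0 has the smallest errors.
obs = [0.2 + 0.1 * math.sin(n) for n in range(60)]
forecasts = [
    [o + 0.01 * math.cos(3 * n) for n, o in enumerate(obs)],
    [o + 0.05 * math.cos(5 * n + 1) for n, o in enumerate(obs)],
    [o + 0.10 * math.sin(7 * n + 2) for n, o in enumerate(obs)],
]
weights, variance = bma_em(forecasts, obs)
```

In this synthetic setting the model with the smallest errors should receive the largest weight, which is the behavior the BMA weights are intended to capture.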

#### 3. Evaluation Criteria

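The SM data obtained from the automatic ground observation stations were used as the reference standard. The multisatellite SM data were recorded as $f_n$ ($n = 1, \ldots, N$), the mean value of which was $\bar{f}$. The reference (calibration) SM data were recorded as $r_n$, the mean value of which was $\bar{r}$. To evaluate the SM data, we introduce three parameters to measure the calibration results, which are the standard deviation (SD), correlation coefficient ($R$), and root mean square deviation (RMSD), using the SM data from the ground observation stations as the reference standard. For ground-observed SM data and satellite-derived SM data, the SD has the following formula:

$$\sigma_r = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(r_n - \bar{r}\right)^2}, \qquad \sigma_f = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(f_n - \bar{f}\right)^2}.$$

The value of $R$ between the corrected SM data and the reference SM data can be calculated as follows:

$$R = \frac{\frac{1}{N} \sum_{n=1}^{N} \left(f_n - \bar{f}\right)\left(r_n - \bar{r}\right)}{\sigma_f\, \sigma_r},$$

and the RMSD can be calculated as

$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left[\left(f_n - \bar{f}\right) - \left(r_n - \bar{r}\right)\right]^2}.$$

The smaller the SD and the RMSD and the closer $R$ is to 1, the better the result. A Taylor diagram can characterize these three parameter values and illustrate the results more clearly (Section 4).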
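The three evaluation parameters can be computed with a short Python sketch. It assumes the population form of the SD (division by the number of samples) and the centered RMSD that a Taylor diagram displays; the function name `taylor_stats` and the sample values are hypothetical and are not data from this study.

```python
import math


def taylor_stats(f, r):
    """Return (SD of f, SD of r, correlation R, centered RMSD) for two series."""
    n = len(f)
    if n == 0 or n != len(r):
        raise ValueError("series must be non-empty and of equal length")
    f_mean = sum(f) / n
    r_mean = sum(r) / n
    sd_f = math.sqrt(sum((x - f_mean) ** 2 for x in f) / n)
    sd_r = math.sqrt(sum((y - r_mean) ** 2 for y in r) / n)
    if sd_f == 0.0 or sd_r == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    cov = sum((x - f_mean) * (y - r_mean) for x, y in zip(f, r)) / n
    corr = cov / (sd_f * sd_r)
    rmsd = math.sqrt(sum(((x - f_mean) - (y - r_mean)) ** 2
                         for x, y in zip(f, r)) / n)
    return sd_f, sd_r, corr, rmsd


# Hypothetical satellite-derived and station SM values (assumptions).
satellite = [0.10, 0.20, 0.30, 0.40]
station = [0.20, 0.40, 0.60, 0.80]
sd_f, sd_r, corr, rmsd = taylor_stats(satellite, station)
```

With these definitions the three quantities satisfy $\mathrm{RMSD}^2 = \sigma_f^2 + \sigma_r^2 - 2 \sigma_f \sigma_r R$, which is the geometric relationship that allows them to be drawn on a single Taylor diagram.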

#### 4. Regional SM Data Fusion Results

In some studies, the SM data of each province were taken as the research object. However, it is more appropriate to divide the overall area into several regions based on the characteristics of drought and flooding. Zhu [21] used the rotated empirical orthogonal function to divide the eastern and western regions of China into seven drought and flood areas. In this study, China was divided into eight regions, each of which was covered equally by the three satellites. In the case of Northeast China, for example, we randomly selected 9/10 of all the data for training purposes. According to the Bayesian model, the EM algorithm was used to derive the weights of each satellite, and then the remaining 1/10 of the data were processed using the BMA method and the traditional averaging method to obtain the fused values and to draw the Taylor diagrams for comparison. Figures 1 and 2 depict the experimental results of the training for the Northeast region. Figure 1 describes the distribution of raw data in.