Satellite Soil Moisture and Its Applications
Research on Fusing Multisatellite Soil Moisture Data Based on Bayesian Model Averaging
Soil moisture (SM) is an important physical quantity that reflects land surface conditions. SM can be measured in many ways, but satellite microwave remote sensing is now considered the primary method because it can provide real-time, high-resolution data. However, SM data obtained by satellite remote sensing deviate to some extent from reference data obtained at ground stations. To improve the accuracy of SM forecasts, this study proposes using a Bayesian model averaging (BMA) method to integrate multisatellite SM data. First, China was divided into eight regions. Then, SM data observed by three satellites (FY3B, SMOS, and WINDSAT) were fused using the BMA method and a traditional averaging method, with data from ground observation stations as the reference standard. Finally, three parameters (standard deviation, correlation coefficient, and root mean square deviation) were used to evaluate the fusion results, which showed that the BMA method outperforms the traditional averaging method.
Soil moisture (SM) is an important parameter in surface climate research, numerical weather forecasting, hydrological forecasting, agricultural drought monitoring, and land surface modeling. Obtaining high-quality SM data is therefore important for all of these activities [1–3]. SM data can be obtained in several ways, such as ground-based observation and satellite microwave remote sensing. Because ground observation stations are sparsely distributed, satellite microwave remote sensing has become the primary means of obtaining real-time, high-resolution SM data. Microwave sensors mounted on satellites can obtain large quantities of information on shallow SM. This information has been used to validate satellite-derived SM retrieval products and in applications of land surface models and numerical forecasting models [5, 6].
Since 1978, China and other countries have successfully launched several satellite-borne active and passive microwave sensors. For example, microwave sensors for detecting SM have been deployed on Aqua/AMSR-E, Coriolis/WINDSAT, MetOp-A/ASCAT, SMOS/MIRAS, and FY3B/MWRI. However, because of land surface roughness, vegetation, and the retrieval algorithm adopted, SM data derived from satellite remote sensing often deviate from ground observations. How to integrate multisource satellite remote sensing data to achieve better results is therefore worth investigating. The traditional method of fusing SM data takes the average of the observations from each satellite as the fusion result; in other words, every satellite model is assigned the same weight. In practice, however, each satellite model behaves differently in different regions. Taking China as an example, SM derived from FY3B shows two extremes: wet in the humid areas of northern and southern China and dry in the arid northwest. SM derived from SMOS is dry in most parts of the country, and its spatial differences are not obvious. SM derived from WINDSAT is relatively dry across the whole country but relatively moist in the middle and lower reaches of the Yangtze River and in Northeast China. In addition, the traditional fusion method does not incorporate observations from ground-based stations.
The objective of this study was to improve the accuracy of SM forecasts by using a Bayesian model averaging (BMA) method to obtain time-varying weights. The a posteriori probability weight assigned to each sensor model in a given period reflects that model's inherent uncertainty. The BMA method has been applied in a number of earlier studies [10, 11]. Its first application in weather forecasting was to calibrate forecast ensembles of sea level pressure and other weather variables that follow Gaussian distributions. In 2007, the BMA method was extended to precipitation, which does not follow a Gaussian distribution. In 2013, the BMA method was applied to predict daily mean temperature in the Huaihe River Basin, China, where a BMA probabilistic forecasting model was established dynamically for each site in the basin using the regional forecasting techniques of the TIGGE multimodel superensemble forecasting system.
Daily mean soil volumetric water content from 376 automatic soil moisture observation stations (ASM) in 2012 was used as the reference standard in this study, which focuses on surface soil moisture (0–10 cm). The satellite data come from three passive microwave remote sensing satellites (FY3B, SMOS, and WINDSAT) [15, 16], and the SM data they observe were fused using the BMA method. The remainder of this paper is organized as follows. Section 2 introduces the theory of the BMA method and the algorithm adopted. Section 3 describes the method for evaluating the BMA model. Section 4 presents the original data and the SM forecast values for each region. Finally, Section 5 states our conclusions.
2. Principles of the BMA Method
2.1. Bayesian Model Averaging Method
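Bayesian theory provides a probabilistic framework for fusing information from different sources. For mutually exclusive and exhaustive hypotheses $A_1, \dots, A_K$ and observed data $B$, Bayes' formula is
$$P(A_k \mid B) = \frac{P(B \mid A_k)\, P(A_k)}{\sum_{l=1}^{K} P(B \mid A_l)\, P(A_l)}.$$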
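As a minimal illustration, the Python sketch below applies Bayes' formula to a finite set of competing hypotheses, such as the three satellite models. The function name `posterior_probabilities` and the example priors and likelihoods are hypothetical. It shows how prior probabilities and data likelihoods combine into posterior probabilities that sum to 1, which is the property BMA relies on when it interprets its weights as posterior model probabilities.

```python
def posterior_probabilities(priors, likelihoods):
    """Bayes' formula over mutually exclusive, exhaustive hypotheses A_1..A_K.

    priors[k]      -- P(A_k); the priors should sum to 1
    likelihoods[k] -- P(B | A_k) for the observed data B
    Returns the list of posteriors P(A_k | B).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B | A_k) P(A_k)
    evidence = sum(joint)                                   # P(B) by total probability
    if evidence == 0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return [j / evidence for j in joint]


# Hypothetical example: three models with equal priors
post = posterior_probabilities([1 / 3, 1 / 3, 1 / 3], [0.2, 0.1, 0.5])
print(post)
```

With equal priors the posteriors are simply the likelihoods renormalized, so the best-fitting model receives the largest share.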
The BMA method is a statistical postprocessing method, based on Bayesian theory, that combines multiple statistical models to produce a prediction. Let $y$ be the forecast variable (the fused SM data), $f_k$ ($k = 1, \dots, K$) the prediction of the $k$th of $K$ possible models (the SM retrieved by each satellite), and $y^T$ the training data (historical satellite remote sensing SM observations). The BMA prediction model can then be written as
$$p(y \mid f_1, \dots, f_K, y^T) = \sum_{k=1}^{K} w_k \, p_k(y \mid f_k, y^T),$$
where $w_k = p(f_k \mid y^T)$ is the BMA weight of model $k$, that is, the a posteriori probability that satellite model $k$ gives the best prediction. A larger weight indicates a more accurate model over the training period; the weights are nonnegative and sum to 1. The term $p_k(y \mid f_k, y^T)$ is the probability density function (PDF) of the predicted variable given the training data and model $k$. Each model therefore contributes to the prediction in proportion to its weight.
In BMA, the prediction of $y$ uses the posterior probabilities as weights: the PDFs of all models are weighted and summed to give a probabilistic prediction of the variable. For satellite-derived SM, $p_k(y \mid f_k, y^T)$ can be taken as a normal distribution whose mean is a simple linear function of the single-model prediction and whose variance is $\sigma^2$:
$$y \mid f_k \sim N\!\left(a_k + b_k f_k,\ \sigma^2\right),$$
where $a_k$ and $b_k$ can be obtained by linear regression. The expectation of the BMA forecast is therefore
$$E(y \mid f_1, \dots, f_K) = \sum_{k=1}^{K} w_k \left(a_k + b_k f_k\right).$$
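To make the normal-mixture model described above concrete, the sketch below evaluates the BMA predictive density (a weighted sum of normal PDFs with means $a_k + b_k f_k$ and a common variance $\sigma^2$) and its expectation for a single location and time. It is a minimal Python/NumPy sketch; the function names and all numerical values in the usage lines are hypothetical, not results from this study.

```python
import numpy as np


def bma_mean(f, w, a, b):
    """BMA expectation: sum_k w_k (a_k + b_k f_k)."""
    f, w, a, b = map(np.asarray, (f, w, a, b))
    return float(np.sum(w * (a + b * f)))


def bma_pdf(y, f, w, a, b, sigma2):
    """BMA predictive density at a scalar y: sum_k w_k N(y; a_k + b_k f_k, sigma2)."""
    f, w, a, b = map(np.asarray, (f, w, a, b))
    means = a + b * f                                    # bias-corrected forecasts
    comps = np.exp(-((y - means) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return float(np.sum(w * comps))


# Hypothetical inputs for one location and time (not values from this study)
f = [0.30, 0.15, 0.22]    # FY3B, SMOS, WINDSAT retrievals (m3/m3)
w = [0.3, 0.2, 0.5]       # BMA weights, summing to 1
a = [0.01, 0.02, 0.0]     # regression intercepts a_k
b = [0.95, 1.1, 1.0]      # regression slopes b_k
print(bma_mean(f, w, a, b))
print(bma_pdf(0.22, f, w, a, b, sigma2=0.05 ** 2))
```

The expectation is the deterministic fused value; the density keeps the spread of the individual models, which is what makes BMA a probabilistic forecast.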
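This expectation is a deterministic value that can be compared directly with the prediction of a single model. Denoting space and time by subscripts $s$ and $t$, so that $f_{kst}$ is the $k$th forecast in the ensemble for location $s$ and time $t$, the model for each observation $y_{st}$ is [12, 17]
$$p(y_{st} \mid f_{1st}, \dots, f_{Kst}) = \sum_{k=1}^{K} w_k \, g_k(y_{st} \mid f_{kst}),$$
where $g_k(\cdot \mid f_{kst})$ is the normal density with mean $a_k + b_k f_{kst}$ and variance $\sigma^2$.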
The key is to estimate the parameters $w_k$ and $\sigma^2$ of this model ($a_k$ and $b_k$ having been obtained by linear regression). They are estimated by maximum likelihood; because the likelihood cannot be maximized analytically, the expectation-maximization (EM) algorithm is used to find the maximum (see Section 2.2 for details).
2.2. Parameter Estimation
The EM algorithm is a method for obtaining maximum likelihood estimates of parameters [18, 19]. Raftery et al. proposed using the EM algorithm to estimate the BMA weights and variance when the forecast variable follows a normal distribution. The EM algorithm is iterative and alternates between two steps, the E (expectation) step and the M (maximization) step, as follows:
First, the weights and the variance are initialized as $w_k^{(0)}$ and $\sigma^{2(0)}$; equal weights, $w_k^{(0)} = 1/K$, are a common choice.
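The log-likelihood function, with $g_k$ the normal component density of model $k$, is
$$\ell(w_1, \dots, w_K, \sigma^2) = \sum_{s,t} \log\!\left( \sum_{k=1}^{K} w_k \, g_k(y_{st} \mid f_{kst}) \right).$$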
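E step: introduce an unobserved variable $z_{kst}$ that equals 1 if model $k$ gives the best forecast for location $s$ and time $t$ and 0 otherwise. At iteration $j$, replace each $z_{kst}$ with its expected value given the current parameter estimates:
$$\hat{z}_{kst}^{(j)} = \frac{w_k^{(j-1)} \, g_k\!\left(y_{st} \mid f_{kst}, \sigma^{(j-1)}\right)}{\sum_{l=1}^{K} w_l^{(j-1)} \, g_l\!\left(y_{st} \mid f_{lst}, \sigma^{(j-1)}\right)}.$$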
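M step: the weights are updated as
$$w_k^{(j)} = \frac{1}{n} \sum_{s,t} \hat{z}_{kst}^{(j)},$$
where $n$ is the number of space–time pairs $(s, t)$ in the training set.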
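The variance is then updated as
$$\sigma^{2(j)} = \frac{1}{n} \sum_{s,t} \sum_{k=1}^{K} \hat{z}_{kst}^{(j)} \left( y_{st} - a_k - b_k f_{kst} \right)^2.$$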
The E and M steps are repeated, each iteration not decreasing the log-likelihood, until the parameter estimates converge.
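The EM procedure described above can be sketched in Python/NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the forecasts $f_{kst}$ are flattened over $(s, t)$ into an array of shape `(n, K)`, $a_k$ and $b_k$ come from a per-satellite least-squares fit, the initialization uses equal weights and the mean squared residual as the starting variance, and convergence is declared when the log-likelihood changes by less than `tol`.

```python
import numpy as np


def normal_pdf(y, mean, var):
    """Normal density N(y; mean, var), vectorized through NumPy broadcasting."""
    return np.exp(-((y - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def fit_bias(f, y):
    """Per-satellite linear regression y ~ a_k + b_k * f_k, with f of shape (n, K)."""
    K = f.shape[1]
    a, b = np.empty(K), np.empty(K)
    for k in range(K):
        b[k], a[k] = np.polyfit(f[:, k], y, 1)   # polyfit returns [slope, intercept]
    return a, b


def bma_em(f, y, a, b, tol=1e-8, max_iter=1000):
    """EM estimation of the BMA weights w_k and a common variance sigma^2.

    f: (n, K) forecasts f_kst flattened over (s, t); y: (n,) observations y_st.
    """
    n, K = f.shape
    mu = a + b * f                       # bias-corrected means a_k + b_k f_kst
    resid2 = (y[:, None] - mu) ** 2      # squared residuals, shape (n, K)
    w = np.full(K, 1.0 / K)              # assumed initialization: equal weights
    var = float(resid2.mean())           # assumed initial variance
    loglik_old = -np.inf
    for _ in range(max_iter):
        # E step: expected z_kst given the current w and sigma^2
        dens = w * normal_pdf(y[:, None], mu, var)
        total = np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        z = dens / total
        loglik = float(np.log(total).sum())      # log-likelihood at the current parameters
        # M step: update the weights and the variance from the expected z_kst
        w = z.mean(axis=0)
        var = float((z * resid2).sum() / n)
        if abs(loglik - loglik_old) < tol:
            break
        loglik_old = loglik
    return w, var
```

Because the weights are means of responsibilities that sum to 1 across models, they automatically stay nonnegative and sum to 1; a satellite whose bias-corrected retrievals track the observations most closely accumulates the largest weight.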
3. Evaluation Criteria
The SM data obtained from the automatic ground observation stations are used as the reference standard and denoted $r_n$ ($n = 1, \dots, N$), with mean $\bar{r}$; the satellite-derived or calibrated (fused) SM data are denoted $f_n$, with mean $\bar{f}$. Three parameters are used to evaluate the calibration results against the reference: the standard deviation (SD), the correlation coefficient ($R$), and the root mean square deviation (RMSD). For the satellite-derived data the SD is
$$\sigma_f = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(f_n - \bar{f}\right)^2},$$
and the SD $\sigma_r$ of the ground-observed data is defined in the same way with $r_n$ and $\bar{r}$.
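The correlation coefficient $R$ between the corrected SM data and the reference SM data is
$$R = \frac{\frac{1}{N} \sum_{n=1}^{N} \left(f_n - \bar{f}\right)\left(r_n - \bar{r}\right)}{\sigma_f \, \sigma_r},$$
and the RMSD, in the centered form used in Taylor diagrams, is
$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left[\left(f_n - \bar{f}\right) - \left(r_n - \bar{r}\right)\right]^2}.$$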
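The three evaluation parameters can be computed as in the minimal NumPy sketch below. The function name `evaluate` is hypothetical, and the RMSD is assumed to be the centered form (each series' mean removed), which is the quantity a Taylor diagram displays; if the uncentered RMSD is intended, compute it directly from `f - r` instead.

```python
import numpy as np


def evaluate(f, r):
    """Return (SD of f, SD of r, correlation R, centered RMSD) against reference r."""
    f = np.asarray(f, dtype=float)
    r = np.asarray(r, dtype=float)
    fa, ra = f - f.mean(), r - r.mean()       # anomalies about each series' mean
    sd_f = np.sqrt(np.mean(fa ** 2))          # population SD (1/N), as in the formula
    sd_r = np.sqrt(np.mean(ra ** 2))
    corr = np.mean(fa * ra) / (sd_f * sd_r)
    rmsd = np.sqrt(np.mean((fa - ra) ** 2))   # centered RMSD
    return sd_f, sd_r, corr, rmsd


# Hypothetical station series (m3/m3), not data from this study
ref = [0.10, 0.20, 0.30, 0.25, 0.15]
fused = [0.12, 0.18, 0.31, 0.24, 0.16]
print(evaluate(fused, ref))
```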
The smaller the SD and RMSD, and the closer $R$ is to 1, the better the result. A Taylor diagram displays all three parameters together and illustrates the results more clearly (Section 4).
4. Regional SM Data Fusion Results
Some studies have taken the SM data of each province as the research object. It is more appropriate, however, to divide the overall area into several regions according to drought and flood characteristics; Zhu, for instance, used rotated empirical orthogonal function analysis to divide eastern and western China into seven drought and flood areas. In this study, China was divided into eight regions, each of which is covered by all three satellites. Taking Northeast China as an example, we randomly selected 9/10 of all the data for training and used the EM algorithm to derive the BMA weight of each satellite. The remaining 1/10 of the data were then fused using the BMA method and the traditional averaging method, and the results were compared using Taylor diagrams. Figures 1 and 2 depict the experimental results of the training for the Northeast region.
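The train/test protocol described above can be sketched as follows. This is a minimal NumPy sketch assuming that the data for one region are arranged as an array of shape `(n_samples, 3)` holding the FY3B, SMOS, and WINDSAT values; the helper names `split_train_test`, `fuse_average`, and `fuse_bma` are hypothetical, and the fixed random seed is an assumption added for reproducibility. The weights `w` and coefficients `a`, `b` would come from the EM fit on the training part.

```python
import numpy as np


def split_train_test(n_samples, train_fraction=0.9, seed=0):
    """Randomly split sample indices into a training part and a held-out part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return np.sort(idx[:n_train]), np.sort(idx[n_train:])


def fuse_average(f):
    """Traditional fusion: every satellite gets the same weight 1/K (rows = samples)."""
    return np.asarray(f, dtype=float).mean(axis=1)


def fuse_bma(f, w, a, b):
    """BMA deterministic fusion: sum_k w_k (a_k + b_k f_k) for each sample."""
    f = np.asarray(f, dtype=float)
    return (np.asarray(a) + np.asarray(b) * f) @ np.asarray(w)


# Hypothetical usage: 100 samples, 90 for training and 10 held out
train_idx, test_idx = split_train_test(100)
print(len(train_idx), len(test_idx))
```

With equal weights and no bias correction (`a = 0`, `b = 1`), `fuse_bma` reduces to `fuse_average`, which makes the averaging method a special case of the BMA fusion.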
Figure 1 shows the distribution of the raw data in Northeast China, including the ground observations and the three satellite observations. The horizontal axis indexes the stations, and the vertical axis gives the soil moisture value. The curves show that soil moisture has strong spatial variability. The figure also shows that the WINDSAT curve follows the ground observations (ASM) most closely, whereas the FY3B and SMOS curves are far from the ASM values. Figure 2 shows the PDF curves of the satellite observations and of the BMA fusion product as functions of soil moisture (volumetric water content). The PDF curves show that, of the three satellites, WINDSAT is closest to the fusion result and has the highest weight. The weights obtained for the Northeast region were 0.268 (FY3B), 0.211 (SMOS), and 0.521 (WINDSAT).
To evaluate the quality of the fusion results, we used SD, $R$, and RMSD to measure the systematic deviation and plotted them on a Taylor diagram for an intuitive representation. In a Taylor diagram, the distance from the origin (the radius) represents the SD of the data, and the angular position of a point, read on the outer arc, gives its $R$ with the reference data. The distance from the reference point, shown by arcs centered on that point, gives the RMSD of each point. Some difficulties arose in drawing the Taylor diagrams; these were addressed by applying quality control to the data.
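The geometry of the Taylor diagram follows from the law of cosines: with $\sigma_f$ the SD of a data set, $\sigma_r$ that of the reference, and $R$ their correlation, the centered RMSD satisfies $\mathrm{RMSD}^2 = \sigma_f^2 + \sigma_r^2 - 2\sigma_f\sigma_r R$. The minimal Python sketch below places a point at radius $\sigma_f$ and angle $\arccos R$ and checks that its distance to the reference point $(\sigma_r, 0)$ equals that RMSD. The function names and the input values are hypothetical, not read from the figures of this study.

```python
import math


def taylor_point(sd, corr):
    """Position on a Taylor diagram: radius = SD, angle from the x-axis = arccos(R)."""
    theta = math.acos(corr)          # corr must lie in [-1, 1]
    return sd * math.cos(theta), sd * math.sin(theta)


def distance_to_reference(sd, corr, sd_ref):
    """Distance from a plotted point to the reference point, which sits at (sd_ref, 0)."""
    x, y = taylor_point(sd, corr)
    return math.hypot(x - sd_ref, y)


# Hypothetical values
d = distance_to_reference(sd=0.06, corr=0.7, sd_ref=0.07)
e = math.sqrt(0.06 ** 2 + 0.07 ** 2 - 2 * 0.06 * 0.07 * 0.7)   # law of cosines
print(d, e)
```

This is why a single point on the diagram encodes all three evaluation parameters at once.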
Figures 3–5 present Taylor diagrams for Northeast China, northern North China, and the middle and lower reaches of the Yangtze River, respectively. Each diagram shows six points: the automatic ground-based observations (ASM), the observations from the three satellites (FY3B, SMOS, and WINDSAT), the fusion result of the traditional averaging method (AVER), and the BMA result. Figure 3 shows that, in Northeast China, WINDSAT has an SD of about 0.07, an $R$ above 0.6, and an RMSD of about 0.06; all three indices are better than those of SMOS and FY3B, consistent with the results above. For the fused data, the averaging method gives an SD of about 0.075, an $R$ of about 0.6, and an RMSD of about 0.06, whereas the BMA method gives an SD of 0.06, an $R$ of 0.7, and an RMSD of 0.05. All three parameters improve with the BMA method, showing that it predicts more accurately and fuses the data more effectively than the averaging method.
In Figure 4, all three parameters (SD, $R$, and RMSD) of WINDSAT in northern North China are better than those of SMOS and FY3B. For the fused data, the averaging method yields a larger SD than the BMA method, an approximately equal $R$, and a larger RMSD, so the BMA method still gives the better fusion. Figure 5 shows that the evaluation indices of the satellite data follow the same pattern as in the two regions above; the results of the averaging and BMA methods are similar, although the BMA method is still slightly better. However, environmental and other factors led to insufficient data for the Northwest and Southwest regions, which produced unsatisfactory results: these areas are at high altitude, have harsh weather, and have few meteorological stations. This will be investigated in future research.
To compare the overall performance of the BMA method and the traditional averaging method across regions, we plotted the results for multiple regions in a single Taylor diagram. Figure 6 shows the results for five regions: regions 1 to 5 are Northeast China, northern North China, southern North China, the middle and lower reaches of the Yangtze River, and eastern Northwest China, respectively. In the figure, 1FY denotes the result of FY3B in region 1; correspondingly, 1SM denotes the result of SMOS and 1WI the result of WINDSAT in region 1. The BMA points lie below the AVER points, meaning that the SD and RMSD of the averaging results are greater than those of the BMA results; furthermore, the $R$ of the BMA results is closer to 1. The BMA method therefore performs better than the traditional averaging method in all five regions.
Table 1 compares the fusion characteristics based on the regional division. The first three columns give the previous division of regions, the division used in this paper, and the latitude and longitude range of each region. The next two columns give the number of fused data and the number of sites at which the BMA fusion outperforms the averaging method. The last column gives the proportion of sites at which the BMA method gives the better result. This proportion is 78.96%, 57.33%, and 92.46% in the Northeast, eastern Northwest, and southern Southwest regions, respectively, whereas it is below 50% in all other regions. In the regions below 50%, the BMA weights of the satellites are each about 1/3, similar to the equal weights of the averaging method. The small amounts of data obtained for these regions because of environmental and other factors account for the poor results. Nevertheless, the overall BMA fusion result is clearly superior to that of the traditional averaging method.
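The last column of Table 1 can be reproduced with a per-station comparison such as the minimal NumPy sketch below. The criterion used here, a lower per-station RMSD over the held-out days, is an assumption, since the exact site-level criterion is not stated; the function name `share_bma_better` and the toy arrays are hypothetical.

```python
import numpy as np


def share_bma_better(obs, bma, avg):
    """Percentage of stations where BMA has a lower RMSD than the simple average.

    All arrays have shape (n_stations, n_days); obs holds the ASM reference values.
    """
    obs, bma, avg = (np.asarray(x, dtype=float) for x in (obs, bma, avg))
    rmsd_bma = np.sqrt(np.mean((bma - obs) ** 2, axis=1))   # per-station RMSD of BMA
    rmsd_avg = np.sqrt(np.mean((avg - obs) ** 2, axis=1))   # per-station RMSD of AVER
    return 100.0 * np.mean(rmsd_bma < rmsd_avg)


# Hypothetical toy data: two stations, two days
obs = np.zeros((2, 2))
bma = np.array([[0.1, 0.1], [0.3, 0.3]])
avg = np.array([[0.2, 0.2], [0.2, 0.2]])
print(share_bma_better(obs, bma, avg))
```

When the fitted weights are all close to 1/3, the BMA and averaging fusions nearly coincide and this percentage can fall to around or below 50%, consistent with the regions discussed above.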
5. Conclusions
This research used the BMA method to fuse multisatellite microwave remote sensing SM data in order to improve the prediction of SM at unmeasured points. The Taylor diagrams show that the BMA fusion results are better than those of the traditional averaging method in different regions. Across regions, the proportion of sites at which the BMA fusion performs better reaches 78.96% in the Northeast, 57.33% in the eastern Northwest, and 92.46% in the southern Southwest. We conclude that, in the prediction of SM data, fusing SM data with the BMA method not only accounts for model uncertainty but also improves the accuracy of the predicted values.
Although the evaluation parameters improved, how to make the forecast more accurate still needs to be clarified. In future work, we will use additional ground-based SM observation stations and more sophisticated equipment to obtain more SM data and augment the original database.
The data used to support the findings of this study are included within the supplementary information (in txt format).
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
The authors wish to acknowledge the National Meteorological Center for providing the data and to thank teacher Shi for her guidance and support. This work was supported in part by the Major Program of the National Natural Science Foundation of China (no. 91437220), the Jiangxi Province Science Foundation for Youths (no. 20171ACB21038), and the Jiangxi Municipal Science and Technology Project.
The supplementary materials include the daily data observed in 2012 by three satellites (FY3B, SMOS, and WindSat) and by 376 automatic soil moisture stations (ASM, 0–10 cm), arranged by region.
C. Haishan and Z. Jing, “Impact of interannual soil moisture anomaly on simulation of extreme climate events in China. Part II: sensitivity experiment analysis,” Chinese Journal of Atmospheric Sciences, vol. 37, no. 1, pp. 1–13, 2013, (in Chinese).
M. Zhuguo, F. Congbin, X. Li et al., “Some problems in the study on the relationship between soil moisture and climatic change,” Advance in Earth Sciences, vol. 16, no. 4, pp. 563–568, 2001, (in Chinese).
S. Shuanghe, J. Long, Z. Ying et al., “A practical simulation model for cropped soil water prediction,” Scientia Meteorologica Sinica, vol. 16, no. 3, pp. 240–248, 1996, (in Chinese).
Y. Zhuang, C. Shi, R. Shen et al., “Quality evaluation of multi-microwave remote sensing soil moisture products over China,” Journal of the Meteorological Sciences, vol. 35, no. 3, pp. 289–296, 2015.
J. A. Hoeting, D. Madigan, A. E. Raftery et al., “Bayesian Model Averaging: a Tutorial,” Statistical Science, vol. 14, no. 4, pp. 382–401, 1999.
J. M. Sloughter, A. E. Raftery, and T. Gneiting, “Probabilistic quantitative precipitation forecasting using Bayesian model averaging,” Department of Statistics, University of Washington, Seattle, WA, USA, 2006, Tech. Rep. 496.
J. Liu, Z. Xie, and L. Zhao, “BMA probabilistic forecasting for the 24-h TIGGE multi-model ensemble forecasts of surface air temperature,” Chinese Journal of Atmospheric Sciences, vol. 37, no. 1, pp. 43–53, 2013.
Z. F. Zhang, “Development of quality control program for soil moisture data of automatic station,” Journal of Arid Land Geography, vol. 36, no. 1, pp. 101–108, 2013.
Z. Dongbin, “Automatic soil moisture observation data quality analysis and reference site selection,” Technical Report, National Meteorological Information Center, Beijing, China, 2013, (in Chinese).
A. E. Raftery, “Bayesian model selection in structural equation models,” in Testing Structural Equation Models, K. A. Bollen and J. S. Long, Eds., SAGE Publications, Thousand Oaks, CA, USA, 1993.
A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society Series B, vol. 39, no. 1, pp. 1–38, 1977.
G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Wiley, Hoboken, NJ, USA, 1997.
Z. Yafen, “Zhu Yafen’s drought and flood division in eastern China and evolution of drought and flood in North China during 5.30,” Journal of Geography, vol. 58, no. S1, pp. 100–107, 2003.