Abstract

During recent decades, various methods for downscaling satellite soil moisture (SM) products, which incorporate geophysical variables such as land surface temperature and vegetation, have been studied to improve their spatial resolution. Most of these studies have used least squares regression models built from those variables and have demonstrated partial improvement in the downscaled SM. This study introduces a new downscaling method based on support vector regression (SVR) that includes the geophysical variables with locational weighting. Compared against in situ SM, the SVR downscaling method exhibited a smaller root mean square error (0.07 versus 0.09 m3·m−3) and a larger average correlation coefficient (0.68 versus 0.62) than the conventional method. In addition, the SM downscaled using the SVR method had a greater statistical resemblance to the original advanced scatterometer SM. A residual magnitude analysis for each model with two independent variables was performed, which indicated that only the residuals from the SVR model were not significantly correlated with the independent variables, suggesting more effective performance than the regression models, in which the independent variables contributed significantly to the residual magnitude. The spatial variations of the downscaled SM products were affected by the seasonal patterns in temperature-vegetation relationships, and the SVR downscaling method showed more consistent performance in terms of seasonal effect. Based on these results, the suggested SVR downscaling method is an effective approach to improve the spatial resolution of satellite SM measurements.

1. Introduction

Remotely sensed soil moisture (SM) offers increased spatial coverage and improved temporal continuity and has thus resulted in substantial changes in our understanding of the global water cycle [1, 2]. Nevertheless, the relatively coarse spatial resolution of approximately 10 km for passive/active microwave satellite remote sensing datasets is the main reason they cannot be effectively applied to hydrological studies at a regional scale [3]. The issue of scale mismatch between remotely sensed and in situ SM has also been considered unavoidable and has been critically evaluated using coarse satellite measurements, particularly in areas with nonhomogeneous land cover [4]. Thus, downscaling techniques that improve the spatial resolution of remotely sensed SM are important for matching it with in situ datasets and enabling practical applications.

To resolve this problem, synergistic approaches that disaggregate microwave remote sensing SM measurements using visible/infrared (VIS/IR) sensors with enhanced spatial resolution have been applied in previous studies [5–9]. This approach is based on the relationship among SM, the land surface temperature (Ts), and the normalized difference vegetation index (NDVI), which theoretically forms a triangular shape because of the evaporative cooling effect [10, 11]. However, the downscaling methods based on this relationship are considered semiempirical. Previous SM downscaling research has consisted largely of variations in the regression formula based on these three related variables. Chauhan et al. [6] introduced surface albedo into this method to strengthen the relationship between SM and land parameters and applied it to 25 km SM data from a special sensor microwave imager and 1 km land parameters from the advanced very high resolution radiometer. A comparison of the 1 km SM and in situ SM revealed fairly similar trends, with a root mean square error (RMSE) that ranged from 0.005 to 0.037 m3·m−3. The introduction of surface albedo was later adopted in Yu et al. [7] and Choi and Hur [5]. Piles et al. [8] introduced brightness temperature (TB) instead of surface albedo to downscale SM from the Soil Moisture and Ocean Salinity (SMOS) mission together with the other variables.

The polynomial regression formula applied in previous downscaling studies has been shown to have good performance. However, the method features innate errors that result from reducing the highly complex and nonlinear relationship between Ts under nonhomogeneous vegetation conditions and SM to a polynomial model [12, 13]. Thus, there is a need to find and employ a different regression model to better capture the inherent complexity.

A support vector machine (SVM), an alternative method to downscale the SM, is a machine-learning algorithm that provides a nonlinear generalization solution to datasets through structural risk minimization and is based on the solid theoretical foundation of Vapnik–Chervonenkis theory [14–17]. The initial applications of SVM targeted optical character recognition and object recognition tasks using support vector (SV) classifiers [18–20]; its application in regression and time series prediction was subsequently adopted [18]. Support vector regression (SVR) in remote sensing research has often been applied to predict variables that appear as responses to other input variables [21, 22]. Kaheil et al. [23] suggested downscaling algorithms for the Southern Great Plains 1997 (SGP 97) experiment that use SVM and assimilation with ground SM measurements. The SVM method was specifically used to tune the downscaled image based on the relationship between the original and approximated coarse scale image. Keramitsoglou et al. [24] applied SVR, along with other regression methods, to downscale Meteosat Second Generation data using moderate resolution imaging spectroradiometer (MODIS) NDVI and emissivity in order to find the preferred methodology. These studies using SVR have evaluated SM downscaling methods by comparing them within identically structured calculation methods in which only the input variables varied [5, 8]. However, SVR has rarely been applied to remote sensing datasets in East Asia. Thus, a comparison of downscaling methods in this area is necessary because the existing methods for downscaling SM are inadequate.

In this study, a methodology to downscale active microwave SM based on Ts and NDVI using SVR is suggested. It builds an optimized regression model that considers the spatial pattern of the original dataset to obtain a finer, more accurate SM distribution relative to the conventional VIS/IR downscaling methods. This research is unique because it offers a cross comparison between the newly suggested SVR downscaling method and conventional methods. The downscaled SM was evaluated against in situ measurements from nine measurement sites within a 150 km × 125 km study area of the Korean Peninsula from March to November 2012. The polynomial regression downscaling method was also applied in the same study area for comparative evaluation.

2. Study Area and Dataset Descriptions

2.1. Study Area

The study area in southwestern South Korea encompasses the area from 35.0 to 36.3°N and 126.6 to 128.4°E for a total of 18,750 km2 (Figure 1). Cropland and mixed forest are the dominant land covers. The area was selected for its representative land cover characteristics and the availability of in situ measurements, the locations and characteristics of which are described in Table 1. The land cover types were considered because surface properties such as vegetation types, soil types, land uses, and topography could affect the SM retrieval algorithm that is based on microwave sensor observations [25, 26].

The annual precipitation at the measurement sites ranges from 1300 to 1800 mm, with the heaviest rainfall occurring during the summer, and the annual mean temperature ranges from 10.6 to 13.2°C [27]. The western part of the study area generally consists of plains that are used as cropland, while the eastern part is of higher altitude and mostly forested (Figure 1). The land cover is classified using the MODIS yearly land cover type data with the International Geosphere-Biosphere Programme (IGBP) global vegetation classification scheme [28].

2.2. In Situ SM Measurements

The nine in situ SM measurement stations that were used in this study were installed by the Rural Development Administration (RDA), Korea. The measurement sites were distributed so as to approximately cover the study area, and SM was measured within 0 to 10 cm depth with time-domain reflectometry (TDR) at an hourly time step from March to November 2012. TDR and frequency-domain reflectometry (FDR) sensors are the most commonly used techniques to measure soil water content [29]. TDR measures the propagation time of an electromagnetic wave along the transmission line to determine the dielectric permittivity, while FDR measures the capacitance. Previous studies have demonstrated good agreement in SM measurements between the two approaches [30–32]. Note that the difference in measurement depth between microwave satellite SM data and in situ data is an unavoidable limitation [4]. However, because the geophysical variables adopted in this study represent the surface properties observed using an optical satellite sensor, the measuring depth difference was disregarded.

2.3. Advanced Scatterometer SM

The advanced scatterometer (ASCAT) is an active microwave sensor aboard the European Space Agency’s (ESA) Meteorological Operational (MetOp-A) satellite. It began operation in 2006 and measures the radar backscatter at C-band (5.255 GHz) with vertical transmit–vertical receive (VV) polarization. Since the measurement is performed using two satellite tracks, dual 550 km-wide swaths are produced, covering 82% of the Earth daily. While the SM retrieval from passive microwave observations with 12.5 km resolution is mainly based on the linkage between TB and geophysical variables, the retrieval of the SM from ASCAT uses a time series-based approach that scales the backscattering coefficient between its lowest and highest values, and the result is presented as the degree of saturation [33, 34]. The SM values are estimated as the relative variation between the wettest (100%) and driest (0%) values. The dataset used in this study was the daily ASCAT relative SM processed by the Integrated Climate Data Centre in Hamburg. Relative SM was converted to volumetric SM (m3·m−3) by applying the porosity of each soil texture to enable comparison with the in situ measurements (Table 2).
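The scaling and conversion described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the processing chain used for the ASCAT product: the function names, the dry and wet reference backscatter values, and the porosity of 0.45 are hypothetical (the porosity values actually used are those of Table 2), and the conversion is assumed to be a simple multiplication of the degree of saturation by porosity.

```python
import numpy as np

def relative_sm(sigma0, sigma_dry, sigma_wet):
    """Scale backscatter (dB) between the driest and wettest reference
    values and return the degree of saturation in percent (0 to 100)."""
    ms = (sigma0 - sigma_dry) / (sigma_wet - sigma_dry) * 100.0
    return np.clip(ms, 0.0, 100.0)

def volumetric_sm(relative_percent, porosity):
    """Convert relative SM (%) to volumetric SM (m3/m3), assuming that
    saturation corresponds to the soil porosity."""
    return relative_percent / 100.0 * porosity

# Hypothetical backscatter values and references, for illustration only.
sigma0 = np.array([-14.0, -11.0, -8.0])
rel = relative_sm(sigma0, sigma_dry=-14.0, sigma_wet=-8.0)
vol = volumetric_sm(rel, porosity=0.45)  # porosity is an assumed value
print(rel, vol)
```

With these inputs, the driest observation maps to 0% and the wettest to 100%, so the volumetric values span 0 to the assumed porosity.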

2.4. Moderate Resolution Imaging Spectroradiometer

The MODIS on board the Earth Observing System (EOS) Terra (10:30/22:30) and Aqua (01:30/13:30) satellites uses 36 spectral bands to observe characteristics of the atmosphere, land, and ocean. The MODIS products used in this study were the 1 km resolution daily daytime Ts (MOD11A1) and the 1 km resolution 16-day NDVI (MOD13A2) from the Terra satellite. The Ts is retrieved from TB using the generalized split-window algorithm [35]. Cloudy pixels are excluded from the retrieval process since thermal infrared signals do not penetrate clouds and are thus confounded with cloud-top temperature. The NDVI is calculated as the normalized ratio of the near IR and red bands, reflecting the chlorophyll and mesophyll in the vegetation canopy [36, 37]. The level 2 daily surface reflectance product, from which the 16-day MOD13A2 NDVI product is generated, is corrected for ozone absorption, molecular scattering, and aerosols [38]. To establish statistically significant regression models, only days with more than 90% cloud-free pixels were used.
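As a hedged illustration of the two operations mentioned above, the following Python sketch computes the NDVI from near IR and red reflectance and checks whether a Ts scene passes the 90% cloud-free threshold. The function names are hypothetical, and the sketch assumes that cloudy pixels are stored as NaN, which may differ from the fill values used in the actual MODIS files.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference of near IR and red reflectance.
    Assumes nir + red is nonzero for every pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def cloud_free_fraction(ts):
    """Fraction of pixels with a valid Ts value (NaN marks cloud here)."""
    return np.mean(~np.isnan(ts))

# Hypothetical 2 x 2 Ts scene (K) with one cloudy pixel.
ts_scene = np.array([[290.0, 295.0],
                     [np.nan, 300.0]])
usable = cloud_free_fraction(ts_scene) > 0.90  # threshold from the text
print(ndvi(0.5, 0.1), cloud_free_fraction(ts_scene), usable)
```

In this toy scene only 75% of the pixels are valid, so the day would be rejected.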

3. Methods

3.1. Preprocessing of Remote Sensing Images

The Ts and NDVI from MODIS, which originally have 1 km spatial resolution, were uniformly disaggregated to a spatial resolution of 500 m and were then aggregated to 12.5 km resolution by applying arithmetic means as follows:

$$\overline{\mathrm{NDVI}}_{12.5\,\mathrm{km}} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\mathrm{NDVI}_{ij}, \qquad \overline{T_s}_{12.5\,\mathrm{km}} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} T_{s,ij},$$

where $\overline{\mathrm{NDVI}}_{12.5\,\mathrm{km}}$ is the 12.5 km averaged NDVI, $\overline{T_s}_{12.5\,\mathrm{km}}$ is the 12.5 km averaged Ts, and $m$ and $n$ are the numbers of 500 m pixels in rows and columns within a 12.5 km ASCAT pixel, respectively. Downscaling the 12.5 km ASCAT SM requires the difference between the 500 m and 12.5 km values of Ts and NDVI; thus, the 500 m Ts and NDVI products were upscaled to 12.5 km resolution to calculate the difference between the products at the two resolutions.
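The resampling steps can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names and the synthetic NDVI grid are hypothetical, uniform disaggregation is assumed to mean replicating each 1 km value into four 500 m pixels, and a 12.5 km ASCAT pixel is assumed to cover a block of 25 × 25 pixels at 500 m.

```python
import numpy as np

def disaggregate_uniform(grid, factor):
    """Replicate each pixel into factor x factor finer pixels."""
    return np.repeat(np.repeat(grid, factor, axis=0), factor, axis=1)

def aggregate_mean(grid_500m, block=25):
    """Arithmetic mean over block x block windows (25 x 500 m = 12.5 km)."""
    rows, cols = grid_500m.shape
    if rows % block or cols % block:
        raise ValueError("grid size must be a multiple of the block size")
    blocks = grid_500m.reshape(rows // block, block, cols // block, block)
    return np.nanmean(blocks, axis=(1, 3))

# Synthetic 25 km x 25 km NDVI field at 1 km resolution (illustrative only).
ndvi_1km = np.linspace(0.1, 0.9, 625).reshape(25, 25)
ndvi_500m = disaggregate_uniform(ndvi_1km, 2)   # 50 x 50 pixels
ndvi_coarse = aggregate_mean(ndvi_500m)         # 2 x 2 pixels at 12.5 km

# Difference between fine and coarse values, used as a downscaling input.
delta = ndvi_500m - disaggregate_uniform(ndvi_coarse, 25)
print(ndvi_coarse.shape, delta.shape)
```

Because 12.5 km is not an integer multiple of 1 km, the intermediate 500 m grid lets each coarse pixel be built from a whole number of fine pixels. By construction, the differences in `delta` average to zero within each 12.5 km block.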

3.2. SM Downscaling Using Polynomial Regression

The performance of the suggested downscaling method using SVR was evaluated against the downscaled SM calculated with the conventional polynomial method using the same input variables. Carlson et al. [10] suggested a polynomial regression formula relating SM, NDVI, and Ts under different climatic conditions and land cover types as follows:

$$\mathrm{SM} = \sum_{i=0}^{n}\sum_{j=0}^{n} a_{ij}\,\mathrm{NDVI}^{i}\,T_s^{j},$$

where $n$ is the order of the polynomial, chosen to represent the dataset reasonably, and $a_{ij}$ is the regression coefficient at a specific day and scene for analysis.

In this study, the equation is applied with and to yield second-order polynomial equation as follows:
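A least squares fit of a second-order polynomial of this kind can be sketched in Python. The block below is a minimal illustration rather than the authors' implementation: it assumes that all cross terms NDVI^i·Ts^j with i and j from 0 to 2 are retained and that Ts has been scaled beforehand, and the names `design_matrix` and `fit_polynomial` as well as the synthetic data are hypothetical.

```python
import numpy as np

def design_matrix(ndvi, ts, order=2):
    """Columns NDVI**i * Ts**j for i, j = 0..order (assumed term set)."""
    cols = [ndvi**i * ts**j
            for i in range(order + 1)
            for j in range(order + 1)]
    return np.column_stack(cols)

def fit_polynomial(ndvi, ts, sm, order=2):
    """Ordinary least squares estimate of the regression coefficients."""
    A = design_matrix(ndvi, ts, order)
    coef, *_ = np.linalg.lstsq(A, sm, rcond=None)
    return coef

# Synthetic, noise-free training data (illustrative only).
rng = np.random.default_rng(0)
ndvi = rng.uniform(0.2, 0.8, 200)
ts = rng.uniform(0.0, 1.0, 200)           # Ts assumed scaled to [0, 1]
sm_true = 0.3 + 0.2 * ndvi - 0.15 * ts - 0.05 * ndvi * ts

coef = fit_polynomial(ndvi, ts, sm_true)
sm_fit = design_matrix(ndvi, ts) @ coef
print(coef.shape, np.max(np.abs(sm_fit - sm_true)))
```

In practice the coefficients would be estimated at the coarse scale for each day and then applied to the fine-scale Ts and NDVI. That step is omitted here because its exact form is not given above.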

3.3. SM Downscaling Using Support Vector Regression

The SM downscaling procedure using SVR consisted of two parts. The remote sensing images (ASCAT SM, MODIS Ts, and NDVI) were preprocessed for use in the SVR process, and high-resolution SM data were then produced using the training and prediction procedure of the SVR. The downscaling methodology suggested in this study combines the conventional VIS/IR synergistic downscaling method with the image approximation concept by introducing the locational information of latitude and longitude as additional input variables. Figure 2 shows the entire procedure for the suggested downscaling.

The SVM is a machine learning method based on nonlinear transformations of the covariates that was developed by Vapnik in the early 1990s [39]. Vapnik later extended the SVM to regression [14]. The model includes a training phase in which the associated input and target output datasets are learned based on statistical learning theory [40].

Of the various SVM tools, LibSVM, built by Chang and Lin [41], was used in this study. The radial basis function (RBF) was selected as the kernel as follows:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right),$$

where $\gamma$ is the bandwidth that determines the under- or overfitting loss [42]. The input vector $x$ consists of {Ts, NDVI, latitude, and longitude}, where $x_i$ is the input vector at the position of pixel $i$ and $x_j$ is that at the position of pixel $j$. The selection of the RBF kernel was based on previous studies that showed its superiority over other kernel functions for both classification and regression tasks [43–45]. Two RBF parameters, gamma and penalty, were optimized using a grid search algorithm and n-fold cross validation, both of which have been widely used in the literature [46–48].

In n-fold cross validation, the original sample is randomly divided into n subsamples of equal size. A single subsample is held out as validation data for assessing the model, and the remaining n − 1 subsamples are used as training data. The cross validation process is then repeated n times so that each subsample is used exactly once for validation. This approach has been widely used in SVM research [49–51], and it is provided as a basic function of the LibSVM tool mentioned above. A three-fold cross validation was used in this study, and the selected parameters are shown in Table 3. Since the selection of geophysical variables (Ts and NDVI) was based on theory [3, 6], variable selection was omitted. All of the variables were scaled to [0, 1] to even out quantitative differences among them. To obtain valid regression models with a sufficient number of samples, only satellite images from cloud-free days were used; thus, a total of 55 days from March to November 2012 were available.
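The building blocks of this procedure (scaling to [0, 1], the RBF kernel, and the n-fold split) can be sketched in plain NumPy. This is a minimal illustration and does not call LibSVM: the function names, the three sample input vectors, and the gamma value of 0.5 are hypothetical, and the input ordering (Ts, NDVI, latitude, longitude) is an assumption.

```python
import numpy as np

def minmax_scale(X):
    """Scale each column to [0, 1]; assumes no column is constant."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def rbf_kernel(xi, xj, gamma):
    """RBF kernel exp(-gamma * ||xi - xj||^2) between two input vectors."""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def n_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train, validation) index arrays; each sample is used
    exactly once for validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        train = np.concatenate([folds[m] for m in range(n_folds) if m != k])
        yield train, folds[k]

# Hypothetical inputs: Ts (K), NDVI, latitude, longitude.
X = np.array([[290.0, 0.30, 35.1, 126.8],
              [300.0, 0.60, 35.6, 127.5],
              [310.0, 0.45, 36.2, 128.3]])
Xs = minmax_scale(X)
k_same = rbf_kernel(Xs[0], Xs[0], gamma=0.5)
k_far = rbf_kernel(Xs[0], Xs[2], gamma=0.5)
splits = list(n_fold_indices(9, 3))   # three-fold split of nine samples
print(k_same, k_far, [len(v) for _, v in splits])
```

A grid search would repeat the three-fold loop for each candidate pair of gamma and penalty values and keep the pair with the lowest validation error. Including latitude and longitude in the input vector means the kernel value decreases with geographic distance as well as with differences in Ts and NDVI.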

3.4. Statistical Analysis Methods

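The following four indices, the bias, the RMSE, the correlation coefficient (R), and the index of agreement (IOA), were used for the statistical evaluation:

$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{SM}_{\mathrm{sat},i} - \mathrm{SM}_{\mathrm{in\,situ},i}\right),$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{SM}_{\mathrm{sat},i} - \mathrm{SM}_{\mathrm{in\,situ},i}\right)^{2}},$$

$$R = \frac{\sum_{i=1}^{N}\left(\mathrm{SM}_{\mathrm{sat},i} - \overline{\mathrm{SM}}_{\mathrm{sat}}\right)\left(\mathrm{SM}_{\mathrm{in\,situ},i} - \overline{\mathrm{SM}}_{\mathrm{in\,situ}}\right)}{\sqrt{\sum_{i=1}^{N}\left(\mathrm{SM}_{\mathrm{sat},i} - \overline{\mathrm{SM}}_{\mathrm{sat}}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(\mathrm{SM}_{\mathrm{in\,situ},i} - \overline{\mathrm{SM}}_{\mathrm{in\,situ}}\right)^{2}}},$$

$$\mathrm{IOA} = 1 - \frac{\sum_{i=1}^{N}\left(\mathrm{SM}_{\mathrm{sat},i} - \mathrm{SM}_{\mathrm{in\,situ},i}\right)^{2}}{\sum_{i=1}^{N}\left(\left|\mathrm{SM}_{\mathrm{sat},i} - \overline{\mathrm{SM}}_{\mathrm{in\,situ}}\right| + \left|\mathrm{SM}_{\mathrm{in\,situ},i} - \overline{\mathrm{SM}}_{\mathrm{in\,situ}}\right|\right)^{2}},$$

where $\mathrm{SM}_{\mathrm{sat}}$ is the satellite-observed or modeled SM, $\mathrm{SM}_{\mathrm{in\,situ}}$ is the ground-measured SM, an overbar denotes the mean, and $N$ is the number of paired samples. In this study, the averaged R value instead of each R value from each site was employed in accordance with previous studies [5, 52–54] because of the limited number of SM samples from both the ground and satellites. For the in situ measurements, it is difficult to obtain data simultaneous with the overpass time of the satellite; for the satellite data, cloud cover causes gaps in the MODIS land data (Ts and NDVI), making it impossible to obtain a downscaled SM. The IOA ranges from 0 to 1, with higher index values indicating a smaller mean square error and better agreement between the modeled values and observations [54].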
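The evaluation indices can be computed with a few lines of Python. The sketch below is a minimal illustration that assumes the bias is defined as satellite minus in situ SM and that the IOA follows the widely used Willmott form; the function names and the four sample values are hypothetical.

```python
import numpy as np

def bias(sat, obs):
    """Mean difference, satellite minus in situ (assumed sign convention)."""
    return np.mean(sat - obs)

def rmse(sat, obs):
    """Root mean square error."""
    return np.sqrt(np.mean((sat - obs) ** 2))

def pearson_r(sat, obs):
    """Linear correlation coefficient."""
    return np.corrcoef(sat, obs)[0, 1]

def ioa(sat, obs):
    """Index of agreement (Willmott form, assumed), between 0 and 1."""
    obs_mean = obs.mean()
    num = np.sum((sat - obs) ** 2)
    den = np.sum((np.abs(sat - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return 1.0 - num / den

# Hypothetical paired samples (m3/m3): satellite SM with a constant offset.
obs = np.array([0.20, 0.25, 0.30, 0.22])
sat = obs + 0.03
print(bias(sat, obs), rmse(sat, obs), pearson_r(sat, obs), ioa(sat, obs))
```

The example shows why several indices are reported together: a constant offset leaves R at 1 while the bias, the RMSE, and the IOA all register the disagreement.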

4. Results and Discussion

The polynomial regression and SVR models were established to perform daily-scale evaluations of SM variability. The averaged linear correlation coefficient between the original and downscaled products was 0.55 for both models. Both models performed relatively well in disaggregating the coarse-scale original SM product and were thus considered suitable for downscaling the original SM dataset.

4.1. Evaluation of Downscaled SM Compared with In Situ SM

The original ASCAT SM and the SM from each downscaling algorithm were compared against nine in situ SM measurements in the study region. Figure 3 shows the temporal variation in the in situ SM measurements, the 12.5 km SM from ASCAT, and the 1 km SM downscaled using SVR and polynomial regression, together with their response to daily rainfall events. Although the characteristics of the temporal patterns are site-specific, all three remotely sensed measurements approximately followed the patterns of the in situ SM. The 1 km SM downscaled using polynomial regression also showed a temporal pattern similar to that of the 12.5 km and in situ SM measurements but occasionally underestimated some values substantially. This was most visible in comparison with the downscaled SM from SVR. In Geumsan, Yeongdong, Wanju, and Jeonju, the underestimation by the polynomial downscaling is more apparent than elsewhere. Overall, compared with the in situ SM, the 1 km SM downscaled using SVR had more realistic values than that using polynomial regression, and its pattern was very similar to the original 12.5 km SM.

Since the comparison of the downscaled SM methods showed clear differences, particularly at some sites, the correlations between the two independent variables (NDVI and Ts) and the residual magnitudes of each regression model's prediction of the 12.5 km SM were analyzed using the p-test to evaluate how the variables affect the regression models. Figure 4 shows the time series of R between the residual magnitude and the variables with the corresponding significance. The statistical results are summarized in Table 4. The p value, a measure of statistical significance, is the marginal significance level, that is, the probability of the given event occurring under the assumption that the null hypothesis is true, and the slope indicates whether there is a linear relationship between the independent variable x and the dependent variable y. For the polynomial regression model, the residual magnitude showed a significant and relatively strong correlation on average with both independent variables, with average R values of 0.32 and 0.41. By comparison, the SVR model showed a weaker and insignificant correlation, with averaged R values of 0.10 and 0.17. The difference between the degrees of correlation of the two models was likely caused by the methodological difference of the SVR model, which additionally considered locational weighting.
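The residual magnitude analysis can be sketched in Python as follows. This is a hedged illustration, not the test used in the study: the exact significance test is not specified above, so a permutation test is substituted here, and the function name and the synthetic data (in which the error grows linearly with NDVI) are hypothetical.

```python
import numpy as np

def residual_correlation(predicted, observed, variable, n_perm=500, seed=0):
    """Correlation, permutation p value, and slope between the residual
    magnitude |predicted - observed| and an independent variable."""
    resid_mag = np.abs(predicted - observed)
    r = np.corrcoef(resid_mag, variable)[0, 1]
    rng = np.random.default_rng(seed)
    perm_r = np.array([
        np.corrcoef(rng.permutation(resid_mag), variable)[0, 1]
        for _ in range(n_perm)
    ])
    # Share of shuffled correlations at least as strong as the observed one.
    p = (np.sum(np.abs(perm_r) >= abs(r)) + 1) / (n_perm + 1)
    slope = np.polyfit(variable, resid_mag, 1)[0]
    return r, p, slope

# Synthetic case: the model error is proportional to NDVI.
ndvi = np.linspace(0.2, 0.8, 50)
observed = np.full(50, 0.25)                 # coarse-scale SM (m3/m3)
predicted = observed + 0.1 * ndvi
r, p, slope = residual_correlation(predicted, observed, ndvi)
print(round(r, 3), round(p, 3), round(slope, 3))
```

A small p value with a positive slope, as in this constructed case, corresponds to the behavior reported for the polynomial model and NDVI; a large p value would indicate that the residuals carry no detectable dependence on the variable.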

Meanwhile, the signs of the slope coefficients for each variable were opposite between the two regression models. For the polynomial regression model, NDVI showed a positive relationship and Ts showed a negative relationship with the SM residual magnitude, while the signs were reversed for the SVR model. Considering that, for most dates, the relationship between the two variables and the residual magnitudes of the SVR models was insignificant, with large p values (average values of 0.41 and 0.24, resp.), the signs were only meaningful for the polynomial regression models. The positive slope coefficients between the NDVI and the residual magnitude can be explained as a result of the increased uncertainty of the microwave SM retrieval for areas with denser vegetation [55]. The negative slope coefficients between Ts and the residual magnitude were partially due to the relationship of Ts with vegetation. While the two variables were assumed to be mutually independent in the regression models, similar water stress conditions produced two variables that were negatively correlated, partly due to evaporative cooling [12, 56]. In addition, the highest R values were found during the growing season, from mid-May to mid-September, demonstrating that seasonal patterns occurred in the residuals of the polynomial regression models. This result was also partly explained by the substantial underestimating tendency of the polynomial regression model found at some sites. In the case of Jeonju, this pattern could be attributed to the highest annual mean air temperature, based on the observed significant positive correlation between the regression model’s residual magnitude and Ts (Table 1). In addition, in a pixel-by-pixel inspection for days with extreme underestimation at each site (not shown here), the underestimations were found to have occurred in pixels in which Ts was substantially higher than the average for that particular day.

As shown in Tables 5–7, each of the remotely sensed SM measurements was quantitatively evaluated by comparing it with the in situ SM measurements. The average value of the nine in situ SM measurement sites was 0.23 m3·m−3, and among the remotely sensed products, the 1 km SM downscaled using polynomial regression had the nearest value (0.24 m3·m−3) but the highest RMSE. The average downscaled SM using SVR was 0.26 m3·m−3 with a standard deviation (SD) of 0.05 m3·m−3, similar to that of the 12.5 km ASCAT SM measurement (0.05 m3·m−3) (Table 7). The R values between in situ SM and 12.5 km ASCAT SM, downscaled SM using polynomial regression, and downscaled SM using SVR were 0.66, 0.62, and 0.68, respectively.

Figure 5 presents the overall error distribution for each remotely sensed SM measurement. Since a difference in SM indicates an error in the remotely sensed SM relative to the in situ SM measurement, an ideal histogram would have a steep and narrow form centered on zero, thus indicating a normal distribution with a zero mean [57]. While the original coarse scale SM had a positive bias on average with an RMSE of 0.072 m3·m−3, for the downscaled SM using polynomial regression, the RMSE was the same as that of the original ASCAT SM but with a higher SD (0.072 m3·m−3). In the case of the SVR, it also had a positive bias with a decreased RMSE (0.065 m3·m−3) and SD (0.056 m3·m−3) (Figure 5). Thus, these results indicate that SVR offers better performance in reducing the error of the downscaled satellite SM. The R values between the satellite SM and the corresponding in situ measurements showed better results for the SVR downscaling method, with an increase from 0.62 to 0.68 as previously mentioned (Tables 6 and 7). The IOA values, which are more sensitive to extreme values in estimating the model agreement, showed differences with the R value at some sites (Jeongeup and Wanju). However, on average, they also indicated the SVR results to be a better estimation for in situ SM.

Figure 6 shows the two-dimensional Taylor diagram [58] summarizing the statistics for the three ASCAT SM products compared with the in situ SM measurements from nine sites. This diagram shows the statistical values between the in situ data and the original SM and the SM downscaled using SVR and the polynomial method. While the ranges of the R values for the three SM products were similar, from approximately 0.4 to 0.9, there were apparent differences in the distributions of the ratio of the SDs and the RMSE. Although statistical resemblances were found between the results of the 12.5 km ASCAT SM (diamond) and 1 km SVR SM (circle), the diagram indicates a clearly higher SD for the 1 km poly SM (triangle). In particular, the SVR SM points were clustered most closely around the ideal arc drawn with a dashed line. The results of the polynomial downscaling were more sparsely distributed on the diagram, with an isolated point representing the result at the Hapcheon site; the larger RMSE was probably a result of the geophysical characteristics at the Hapcheon site, since the corresponding ASCAT pixel contained a mixed land cover of forest and cropland. Generally, the SM retrievals showed weak agreement with the in situ measurements at some sites, such as Hapcheon; however, the R values of the downscaled SM were largely improved, even if the range of that improvement was small.
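The quantities plotted on a Taylor diagram can be computed as in the Python sketch below. It is a minimal illustration with hypothetical sample values and function name, and it assumes that the diagram uses the centered (bias-removed) RMS difference, which is the quantity linked to the SD ratio and R by the law of cosines; this may differ from the RMSE reported in the tables.

```python
import numpy as np

def taylor_stats(model, obs):
    """Return the SD ratio, R, and centered RMS difference normalized
    by the SD of the observations."""
    sd_m, sd_o = np.std(model), np.std(obs)
    r = np.corrcoef(model, obs)[0, 1]
    crmsd = np.sqrt(np.mean(((model - model.mean())
                             - (obs - obs.mean())) ** 2))
    return sd_m / sd_o, r, crmsd / sd_o

# Hypothetical in situ and downscaled SM series (m3/m3).
obs = np.array([0.20, 0.25, 0.30, 0.22, 0.27])
model = np.array([0.24, 0.26, 0.35, 0.21, 0.31])
ratio, r, crmsd_n = taylor_stats(model, obs)

# Law of cosines linking the three statistics on the diagram.
identity_gap = crmsd_n**2 - (ratio**2 + 1.0 - 2.0 * ratio * r)
print(round(ratio, 3), round(r, 3), round(crmsd_n, 3))
```

A product that plots near an SD ratio of 1 with high R, as described for the SVR results, therefore also has a small centered RMS difference.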

4.2. Spatial Distribution of Downscaled SM

The 12.5 km ASCAT SM and the 1 km SM obtained using the two downscaling methods were spatially compared with daily mappings of each type of data on dry and wet days (Figure 7). The overall spatial variations of the 1 km SM were approximately similar to those of the 12.5 km data but with more finely distributed characteristics. While the eastern part of the study area with forested land cover had a higher average SM of approximately 0.5 to 0.6 m3·m−3, the western part with primarily cropland land cover had more temporal variation according to meteorological events. Compared with the 1 km SM mapping using polynomial regression, the 1 km SM mapping using SVR was clearly more similar to the 12.5 km SM mapping, owing to the downscaling algorithm that uses each pixel’s position as a predictive variable. Under wet conditions (Figures 7(c) and 7(d)), the spatial patterns of the downscaled SM from the polynomial regression are more evenly distributed and rely more on the distribution of Ts than under dry conditions (Figures 7(a) and 7(b)). Under wet conditions, the original ASCAT SM shows relatively dry patterns in the western part of the study area, while Ts and the SM downscaled using polynomial regression show neither higher temperatures nor drier patterns, respectively, in the same region. Piles et al. [8] also reported a more consistent and similar spatial variability of the downscaled SM product relative to the original SMOS SM under dry soil conditions. Thus, a consideration of positional weighting would allow substantial performance improvement of the SM downscaling based on Ts-NDVI.

Figure 8 shows the distribution of the seasonal mean differences between the ASCAT SM uniformly disaggregated to 1 km and the downscaled SM generated using each methodology. On average, the difference between the original data and the downscaled SM using the SVR was clearly less than that using the polynomial downscaling. Although there were clear per-pixel discrepancies in the polynomial-downscaled SM, the differences in the SVR-downscaled SM were more evenly distributed, regardless of location. This characteristic resulted from the methodological difference between the two methods: the SVR downscaling considered the positional weight, while the polynomial downscaling did not. The difference in the polynomial-downscaled SM was negatively biased and was larger in the southeastern region of the study area, where the elevation is higher and the land cover is dominated by mixed forests. It is probable that the higher uncertainty in the NDVI for densely vegetated areas erroneously affected the regression model. The largest differences in SM for both products were found during the summer (June to August) when the vegetation growth reached its peak, and this might have affected the relationship between the SM and Ts. Similar seasonal differences in the error pattern for the downscaled SM were also found in Merlin et al. [3], and that study adopted a separate downscaling algorithm to reduce the seasonal discrepancy in downscaling performance by considering the controlling variable of SM for each pixel. In addition, in a future study, the depth discrepancy between satellite- and ground-based SM measurements should be corrected when comparing the downscaled SM product with in situ data by estimating the profile satellite SM values [59, 60].

5. Conclusions

Downscaling methods for remotely sensed SM datasets are among the most important topics in related research fields since they provide a solution to low spatial resolution. This study proposed a new downscaling method using SVR and evaluated it by comparison with in situ SM measurements and with the results of a conventional downscaling method. The RMSE decreased after downscaling using SVR from 0.08 to 0.07 m3·m−3, and the R increased from 0.66 to 0.68; the bias remained the same at 0.03 m3·m−3. Considering that the improvements and deterioration of the downscaled SM evened out on average, valid improvements in accuracy should be noticed at the nine sites selected for validation. The statistics were better than those of the polynomial downscaling method, which had an RMSE of 0.09 m3·m−3, an R of 0.62, and a bias of −0.02 m3·m−3. In the correlation analysis between the independent variables (NDVI and Ts) and the residual magnitude between the 12.5 km ASCAT SM and the predicted SM from each regression method, only the polynomial regression residual magnitudes showed significant results, which were positively correlated with NDVI and negatively correlated with Ts. In a spatial comparison among the SM mappings at two scales, the 1 km SM using SVR followed the spatial distribution of the original scale (12.5 km) better than the 1 km SM using polynomial regression. In the spatial distribution of the seasonally averaged differences between the original and the downscaled SM contents, the SVR downscaling method showed a more consistent performance, given the seasonal effect. Based on these results, the suggested SVR downscaling method can be used to improve the spatial resolution of satellite SM while offering better performance than the conventional downscaling method.
However, this study had several limitations: first, the remote sensing data were difficult to obtain due to missing products; second, it took considerable time to preprocess the dataset and execute the model to obtain the downscaled SM; and lastly, the complexity of the algorithm required considerable memory for a wide range of tasks [61]. In a future study, these limitations will be addressed by applying various remote sensing and assimilation datasets. This method can be extended to various fields that require fine-resolution SM datasets, such as large-scale water-related natural disasters, because antecedent SM information can be effectively used to predict landslides, droughts, dust outbreaks, and agricultural water deficiencies [62, 63].

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2016R1A2B4008312); Space Core Technology Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2014M1A3A3A02034789); and Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2017R1A6A3A11034250; NRF-2017R1D1A1B03028129).