Estimating Salt Concentrations Based on Optimized Spectral Indices in Soils with Regional Heterogeneity
Soil salinity is one of the most damaging environmental problems worldwide, especially in arid and semiarid regions. The objectives of this study were to improve the inversion accuracy of soil salt content (SSC) in soils with spectral heterogeneity by using optimized spectral indices. Soil samples at a 0–20 cm depth were taken from Keriya Oasis (98 soil samples), Ugan-Kuqa Oasis (49 soil samples), and Ebinur Lake Basin (57 soil samples). SSC and spectral reflectance (SR) of all the 204 soil samples were determined. To comprehensively analyze the field-collected hyperspectral data, various band combinations were used to calculate a normalized difference spectral index (NDSI) and ratio spectral index (RSI). Then, the relationships between the indices and SSC were examined, and the most robust relationships were demonstrated. The partial least squares regression (PLSR) method was utilized to develop a predictive model of SSC, and the variable importance in the projection (VIP) method was used during modeling. 
The results revealed that (i) the salinized soils in different regions had apparent differences in both reflectance and spectral curve morphology, but the optimized spectral indices method effectively overcame the regional heterogeneity of salinized soil hyperspectral characteristics, and the correlation with SSC was consistently good, with correlation coefficients up to 0.748 at the 0.001 level of significance; (ii) the VIP filtering method effectively selected the optimal independent variables for the model, and the model that combined the two optimal indices was more accurate than the models based on a single optimized index (R2Pre = 0.83 and RMSEPre = 2.31 g·kg−1); (iii) although the global modeling accuracy was significantly lower than the local modeling accuracy because the salt-sensitive bands of salinized soils differ between regions, cross-validation analysis showed that the global model could still predict soil salinization with reasonable accuracy (R2Pre = 0.69 and RMSEPre = 8.45 g·kg−1). The methods developed in this study can be applied in other arid and semiarid areas. In addition, the study provides a reference for aerospace hyperspectral remote sensing of cross-regional soil salinization.
1. Introduction
Soil salinization can lead to decline in soil fertility, decrease in crop productivity, and deterioration of the ecological environment, which are the main factors restricting the sustainable development of agricultural production and the ecological environment [1–3]. The rational development, improvement, and treatment of salinized soil require rapid, accurate, and dynamic information acquisition about salinized soil. Traditional soil salinization monitoring adopts the field fixed-point survey method, which is time-consuming, laborious, and destructive, with few measuring points and poor representativeness [4, 5]. It is difficult to obtain salt information across a large salinization area quickly and dynamically. Hyperspectral remote sensing has overcome the shortcomings of traditional artificial ground monitoring methods and has become an advanced method in the field of soil salinization monitoring due to its substantial advantages of multiple bands and continuous, abundant information and high quantitative inversion accuracy.
Construction of a soil salinity hyperspectral quantitative inversion model is one of the essential aspects of soil salinization hyperspectral remote sensing monitoring. A large number of studies have carried out the salt hyperspectral quantitative inversion of salinized soils in different areas. Weng et al. [6, 7] used the Yellow River Basin in Shandong Province of China as the research area. The hyperspectral quantitative inversion model for soil salinity based on a laboratory spectrum and Hyperion image was established using the univariate regression method, and the determination coefficients of the model were 0.873 and 0.627, respectively. Zhang et al. also studied salinized soil in the Yellow River Basin by constructing a partial least squares model based on hyperspectral vegetation indices, but the coefficient of determination for the model was only 0.58. Kumar et al. used the Indo-Gangetic plains of India as the study area and constructed a linear regression model with Hyperion data and salinity parameters of electrical conductivity (EC), sodium adsorption ratio (SAR), and exchangeable sodium percentage (ESP), with determination coefficients of 0.777, 0.801, and 0.804, respectively. Zeng et al. established partial least squares regression (PLS) and orthogonal projection to latent structure (OPLS) models for soil salinity from field-measured hyperspectral data collected in the Hetao Irrigation District of Inner Mongolia. The soil salt estimation model had a coefficient of determination up to 0.755. An et al. studied a hyperspectral field estimation stepwise multiple linear regression model for salt content in coastal saline soils of the Yellow River delta with a coefficient of determination of 0.867. Rocha Neto et al. used a variety of regression methods (extreme learning machine (ELM), ordinary least squares regression (OLS), multilayer perceptron (MLP), and partial least squares regression (PLSR)), based on laboratory spectroscopy, to construct a hyperspectral prediction model for conductivity of saline soil in the Morada Nova Irrigation District. The highest accuracy of the model reached an adjusted R2 = 0.953. Wang et al. established a bootstrap-BP neural network SSC prediction model using field-measured hyperspectral data over the Ebinur Lake Wetland National Nature Reserve, and the prediction accuracy reached R2 = 0.95 based on the optimal spectral indices. These studies can be used as references for follow-up research that seeks to use hyperspectral remote sensing to monitor the status of soil salinization.
Soil salinization has spatiotemporal variability, and the main factors affecting its spatial and temporal heterogeneity are as follows: (i) The effect of soil formation on soil water and salt: under the prevailing climatic conditions, the soil parent material releases soluble salts during weathering and erosion, and the salt gradually accumulates to form salinized land. Human cultivation and maturation, irrigation siltation, and salinization are the main soil-forming processes. (ii) Local microtopography plays a crucial role in the accumulation of soil salinity. Generally, in low-lying areas, the soil particles are fine, the water permeability is weak, and the groundwater is shallow. Under hot climatic conditions, the water evaporates and leaves the salt behind; the salt accumulates and eventually forms a patch of salinized land. (iii) The influence of groundwater depth and water quality on soil water and salt: the main reason for a change in the soil salt balance is a change in water income and expenditure. In arid areas, evaporation is much greater than precipitation, surface runoff is basically absent, groundwater becomes the primary source of water, and groundwater quality and depth vary in space and time. In space, the groundwater upstream of the alluvial fan is deeper and less saline, while the downstream groundwater is shallower and more highly mineralized. As the groundwater table rises and the salt concentration of the groundwater increases, the accumulation of salt in the upper soil layer accelerates. In time, the change is due to the seasonal variation of groundwater depth. Generally, the water level is high from April to October because of agricultural irrigation. From March to February, due to winter soil freezing and phreatic evaporation, the groundwater depth increases and the water level reaches its minimum. (iv) The impact of land use on soil water and salt: the land use pattern directly affects the nature of the soil and has a significant impact on the soil water cycle and the change of salt content.
Hyperspectral prediction models for soil salinity are, at present, based on data from a specific area. However, the bands that are sensitive to soil salinization in hyperspectral inversion differ between regions because of regional heterogeneity: monitoring results can vary with soil organic matter, soil salinity, salt ion composition, and other factors in a particular region. Therefore, it is challenging to apply a salt hyperspectral prediction model built in one region to other regions. Reducing soil spectral heterogeneity and improving the prediction accuracy and universality of the model have long been central concerns for remote sensing researchers. Mashimbye et al., in a study of the hyperspectral inversion of the conductivity of different types of salinized soils in South Africa, showed that first-order differential treatment could effectively eliminate the influence of soil type and other factors, thus highlighting the correlation between spectral reflectance and electrical conductivity. Shi et al. conducted first-order differential processing and spectral classification modeling on 1581 soil samples from 16 soil types collected from Tibet, Xinjiang, Heilongjiang, and Hainan, China, and found that the prediction accuracy of first-order differential-spectral classification-local modeling was significantly higher than that of first-order differential-global modeling. The authors also indicated that first-order differential treatment performs better than the original data. Consequently, first-order differential preprocessing may be an effective way to further improve the universality and accuracy of the global model presented in this paper.
Remote sensing monitoring of soil salinization can exploit the wide-area coverage of remote sensing technology to establish a unified quantitative inversion model for soil salinization at the national or even global scale to meet the needs of large-scale monitoring. Nevertheless, at large scales, there are apparent differences in the hyperspectral characteristics of soils with different parent materials and soil-forming processes, which may be critical factors that reduce the accuracy of large-scale inversion models relative to regional inversion models. Identifying and solving this problem is necessary for successfully establishing a unified quantitative model for soil salinization at the national and global scales. Therefore, based on the differences in hyperspectral characteristics of salinized soil in three typical salinization areas in Xinjiang, China, this study set out to construct a hyperspectral quantitative inversion model of soil salinization for Xinjiang, to provide a reference for future large-scale monitoring of soil salinization by spaceborne hyperspectral remote sensing.
2. Materials and Methods
2.1. Study Area
Xinjiang, China (Figure 1), is located in central Eurasia on the northwestern border of China, with a total area of 1,664,900 km2, accounting for one-sixth of China’s total land area. The topography is characterized by “three mountains and two basins.” Xinjiang is inland and far from the sea, with a distinctly temperate continental climate. With the Tianshan Mountains as the boundary, the temperature in southern Xinjiang is higher than in northern Xinjiang, and the precipitation in northern Xinjiang is higher than in southern Xinjiang. The oases are distributed in the basin margins and river basins. The total oasis area accounts for 5% of the total area and has typical oasis ecological characteristics. According to the data from China’s Second Soil Census, the soil types distributed in Xinjiang plain oases are mainly anthrosols, gleysols, fluvisols, phaeozems, and solonchaks.
The area of saline soil in Xinjiang accounts for 22% of the total area of saline soil in China, and it is also the largest saline soil distribution area in the country. Xinjiang is known as the salinized soil museum in arid areas with a wide range of distribution and variety. As the most significant agricultural land reserve area in northwestern China, the total area of salinized land in the Xinjiang irrigation area accounts for 32.07% of the total area. Soil salinization has become a significant obstacle to the efficient utilization, sustainable development, and construction of modern agriculture in Xinjiang. The formation of salinized soil in Xinjiang has gone through a long process. Southern Xinjiang began at least from the Cretaceous. The subsequent Quaternary river-lake sediments, mainly the modern sediments, played a critical role in the redistribution of soil water and salt. The deep saline water in soil participated in the regional water-salt cycle in the form of fissure water and confined water, which has a history of millions of years. In arid desert conditions, due to the meager precipitation and extreme evaporation, the salt in the crust continues to accumulate on the surface to form a typical saline soil with high salt content and thick salt layer. The unique geological and geomorphological conditions of Xinjiang have formed a broad closed-flow area, which makes the vast basins and low plains of Xinjiang an enrichment area for salt migration. Xinjiang’s natural geographical environment, coupled with unsustainable utilization of water and land resources, will inevitably cause more land to face soil salinization problems. Accurate and efficient determinations of local soil salinity will be beneficial to local water resources management, agricultural land-use planning, and prevention of soil salinization expansion.
Ebinur Lake Basin is the lowest depression and water-salt pool in the southwestern margin of Junggar Basin; the range was 44°19′∼45°11′N and 82°02′∼83°42′E. The climate type is a temperate semiarid continental monsoon climate. Frequent dust and salt dust storms on the particular terrain are remarkable characteristics of this region. In recent years, Ebinur Lake has shrunk dramatically in volume, owing to climate change and human influence, leaving a saline playa greater than 500 km2 [17, 18]. The delta oasis of the Ugan and Kuqa rivers is located to the south of Tianshan Mountains and the north of the Tarim Basin; it is a typical and complete piedmont alluvial-pluvial fan plain. The actual study area is bounded by 82°05′E∼83°42′E longitude and 40°55′N∼41°50′N latitude. The climate exhibits extreme aridity. The type of soil in the study area is mostly light-soil land and sandy loam soil, and the primary component of the salinized soil is chloride. Approximately 26% of the nonagricultural land is heavily salinized . The Keriya Oasis (36°43′∼37°13′N, 81°08′∼81°47′E) is located on the southern edge of the Taklamakan Desert, at the northern foot of the Kunlun Mountains. This region has a temperate continental arid climate characterized by hyperaridity. The oasis is situated on a fluvial plain with relatively flat terrain, loose soil, high salt concentrations, and low soil fertility .
2.2. Field Sampling and Laboratory Analysis
The study in Xinjiang focused mainly on the oases, which support human settlement, with the aim of establishing a hyperspectral model for soil salinity retrieval in the oases. Soil samples were collected from three typical, large sample areas.
Soil samples from Ugan-Kuqa Oasis were obtained during the 11 days from July 7, 2016, to July 17, 2016, with 49 samples collected. We took the collected soil samples back to the laboratory for drying and did not process them further before going to the Keriya Oasis. We chose to sample in the field in July because the sunshine is abundant and intense and the evaporation is high, so soil salinization is distinct, and because the low precipitation allows field soil sampling to proceed normally. Because the groundwater table is very high in September and October, which are the months of most serious soil salinization in the year, field sampling at this time is also effective. Keriya Oasis soil samples were obtained from September 19, 2016, to September 28, 2016, with 98 samples collected during ten days. After the sampling work, we took the soil samples back to the laboratory for drying (together with the soil samples from the Ugan-Kuqa Oasis), and then we went to Ebinur Lake Basin in October. The soil sampling in the Ebinur Lake Basin lasted from October 16, 2016, to October 25, 2016, during which 57 samples were collected over ten days. Similarly, the soil samples were taken back to the laboratory and dried together with the samples from the other two areas. In January 2017, after all the soil samples were dried, we concentrated on the indoor laboratory work.
A total of 204 sites were chosen for soil sampling based on the visual determination of land cover conditions and salinity. Quadrats (30 m × 30 m) were set, their positions were recorded by GPS, and the 5-point mixing method was used for sample collection. Soil samples were collected from the soil surface (0–20 cm), bagged, and brought to the laboratory. The collected soil samples were air-dried to eliminate the influence of water content; sieved using a 1 mm sieve to remove large debris, stones, and vegetation; and then split into two groups (one for chemical analysis and the other for spectral measurement). After grinding and sieving all 204 soil samples, the salt content and indoor hyperspectral data of the soil samples were measured. The salt content data of each point are in one-to-one correspondence with the hyperspectral data. Soil salt content was determined by preparing soil solutions with a soil-water ratio of 1 : 5, and the salt content and electrical conductivity (EC 1 : 5) of the solution were measured using an Orion 115A+ [21, 22].
2.3. Soil Sample Spectrometry
The hyperspectral data of all sampling points were measured together on January 15, 2017. Spectral reflectance data for soil samples were obtained using a FieldSpec-3 portable spectroradiometer (Analytical Spectral Devices, Inc., USA) with a wavelength range of 350–2500 nm. Spectral measurements were carried out in a dark room with spectral resolutions of 1.4 nm (350–1000 nm) and 2 nm (1000–2500 nm), with a sampling interval of 1 nm. The light source was a 50 W halogen lamp placed 50 cm from the surface of the soil sample at a zenith angle of 15°. The ground soil sample was placed in a black container with a diameter of 20 cm and a depth of 2 cm. After filling, the surface was flattened. The sensor used a probe with a 25° field of view, held 15 cm above the soil sample. The area viewed by the probe was much smaller than the area of the black container, so the measured spectrum came only from the soil sample. Before each measurement, a dark current correction and a white reference panel calibration were performed. Ten spectral curves were collected for each sample, and their arithmetic average, calculated with ViewSpec-Pro (V6.0.11) software, was used as the reflectance of the soil sample to minimize instrument noise. The low-SNR edge bands (350∼399 nm and 2401∼2500 nm) were removed, and a Savitzky–Golay filter was applied to smooth and denoise the 400∼2400 nm data of the 204 soil samples. These hyperspectral data were later used to calculate the optimized spectral indices, and then the correlation between the measured salt content data and the optimized spectral indices was obtained.
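The preprocessing chain described above (averaging ten scans, dropping the edge bands, Savitzky–Golay smoothing) can be sketched in a few lines. This is only an illustration: the text does not report the filter window or polynomial order, so the 5-point quadratic weights below are an assumption, as are the function names and the synthetic scans.

```python
# 5-point quadratic Savitzky-Golay weights (window and order are assumptions;
# the source does not state the settings that were actually used).
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def mean_spectrum(scans):
    """Average repeated scans of one sample, band by band."""
    return [sum(band) / len(scans) for band in zip(*scans)]


def trim_edges(wavelengths, reflectance, low=400, high=2400):
    """Drop the low-SNR edge bands outside [low, high] nm."""
    kept = [(w, r) for w, r in zip(wavelengths, reflectance) if low <= w <= high]
    return [w for w, _ in kept], [r for _, r in kept]


def savgol5(values):
    """Smooth interior points; the two points at each end are left unchanged."""
    out = list(values)
    for k in range(2, len(values) - 2):
        out[k] = sum(c * values[k + d] for c, d in zip(SG5, range(-2, 3)))
    return out


# Synthetic example: ten noiseless scans on a 1 nm grid from 350 to 2500 nm.
wavelengths = list(range(350, 2501))
scans = [[0.10 + 1e-4 * w + 0.001 * s for w in wavelengths] for s in range(10)]
wl, refl = trim_edges(wavelengths, mean_spectrum(scans))
smoothed = savgol5(refl)
```

Because a quadratic Savitzky–Golay filter reproduces straight lines exactly, the synthetic linear spectrum passes through unchanged, which makes a convenient sanity check before the filter is applied to real, noisy spectra.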
2.4. Selection of Spectral Soil Indices
One of the most effective approaches to explore the relationships between soil properties and hyperspectral data is a comparative analysis of ratio spectral indices (RSIs) and simple normalized difference spectral indices (NDSIs), which are calculated from narrowband reflectance spectra. We used NDSI and RSI (equations (1) and (2)) to identify the wavelengths and optimal spectral indices that could estimate soil salt content. The spectral indices are defined as

$$\mathrm{NDSI}(R_i, R_j) = \frac{R_i - R_j}{R_i + R_j}, \tag{1}$$

$$\mathrm{RSI}(R_i, R_j) = \frac{R_i}{R_j}, \tag{2}$$

where R is the spectral reflectance value and the subscripts i and j are wavelengths in nanometers (nm).
Spectral indices (RSIs and NDSIs) were calculated for the smoothed soil hyperspectral reflectance using all possible combinations of available bands (i nm and j nm) in the 400–2400 nm spectral region. The indices were calculated with the Two-Band Combination of Optimized Indices software V1.0 (No: 2018R11S177501, developed in Java). We then examined the correlation between in situ soil salt content and the spectral indices (NDSIs and RSIs) and plotted 2-dimensional maps of the correlation coefficients (r). The r maps, made with Matlab (Version 2012), allow the evaluation of the different band combinations and the selection of sensitive spectral indices (RSIs and NDSIs) to study soil salt content. The most effective indices (NDSIs and RSIs) were those with the highest correlation coefficient (r) with soil salt content at the 0.001 significance level.
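The exhaustive two-band search can be illustrated with a small sketch, taking NDSI as (Ri − Rj)/(Ri + Rj) and RSI as Ri/Rj. The function names and the three-band toy data are hypothetical, and the code is not the software used in the study; it only shows the logic of scoring every ordered band pair by its Pearson correlation with SSC.

```python
from math import sqrt


def ndsi(ri, rj):
    """Normalized difference spectral index of two reflectances."""
    return (ri - rj) / (ri + rj)


def rsi(ri, rj):
    """Ratio spectral index of two reflectances."""
    return ri / rj


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_band_pair(spectra, wavelengths, ssc, index_fn):
    """Return (r, band_i, band_j) for the band pair with the largest |r|."""
    best = (0.0, None, None)
    for i, wi in enumerate(wavelengths):
        for j, wj in enumerate(wavelengths):
            if i == j:
                continue
            values = [index_fn(s[i], s[j]) for s in spectra]
            r = pearson_r(values, ssc)
            if abs(r) > abs(best[0]):
                best = (r, wi, wj)
    return best


# Toy data (hypothetical): three bands, five samples; only band 2000 tracks SSC.
wavelengths = [1000, 1900, 2000]
ssc = [1.0, 5.0, 10.0, 20.0, 40.0]
spectra = [
    [0.20, 0.30, 0.31],
    [0.25, 0.30, 0.35],
    [0.18, 0.30, 0.40],
    [0.26, 0.30, 0.50],
    [0.21, 0.30, 0.70],
]
r_ndsi, ni, nj = best_band_pair(spectra, wavelengths, ssc, ndsi)
r_rsi, ri_band, rj_band = best_band_pair(spectra, wavelengths, ssc, rsi)
```

On a 1 nm grid from 400 to 2400 nm the same loop visits roughly four million ordered pairs, so a vectorized implementation would be preferable in practice; the nested loop is kept here only for readability.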
2.5. Variable Importance in the Projection Method
The VIP selection method (equation (3)) was first published by Wold and coauthors. VIP scores summarize the influence of individual variables on the PLS model. They provide a measure for selecting the x variables that contribute the most to the explanation of the y variance. For a given model and data set, there will always be only one VIP scores vector summarizing all components and y variables. The VIP score for the j variable is given as

$$\mathrm{VIP}_j = \sqrt{\frac{\sum_{f=1}^{F} W_{jf}^{2} \cdot SSY_f \cdot J}{SSY_{total} \cdot F}}, \tag{3}$$

where $W_{jf}$ is the weight value for the j variable and f component, $SSY_f$ is the sum of squares of the explained variance for the f component, and J is the number of variables. $SSY_{total}$ is the total sum of squares explained by the dependent variable, and F is the total number of components. $W_{jf}^{2}$ gives the importance of the j variable in each f component, and $\mathrm{VIP}_j$ is a measure of the global contribution of the j variable in the complete PLS model. In the case of a one-dimensional y space, $SSY_f = b_f^{2} t_f' t_f$ and $SSY_{total} = b^{2} T'T$ hold, where T is the variable scores matrix and b is the PLSR inner relation vector of coefficients.
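As a small numerical illustration of the VIP definition, the sketch below evaluates VIP_j = sqrt(J · Σ_f W_jf² · SSY_f / (SSY_total · F)) from a weight matrix and per-component explained sums of squares. The function name and the toy numbers are hypothetical, and SSY_total is assumed here to be the sum of the per-component SSY_f values.

```python
from math import sqrt


def vip_scores(weights, ssy):
    """VIP score of each variable.

    weights[j][f] is the weight of variable j on component f,
    ssy[f] is the explained sum of squares of component f.
    """
    n_vars = len(weights)   # J, number of variables
    n_comp = len(ssy)       # F, number of components
    ssy_total = sum(ssy)    # assumption: total = sum over components
    scores = []
    for row in weights:
        weighted = sum(w_jf ** 2 * s_f for w_jf, s_f in zip(row, ssy))
        scores.append(sqrt(n_vars * weighted / (ssy_total * n_comp)))
    return scores


# Toy example (hypothetical): two variables, two components.
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
toy_ssy = [3.0, 1.0]
toy_vip = vip_scores(toy_weights, toy_ssy)
```

In the toy case the first variable loads only on the component that explains three quarters of the y variance, so it receives the larger score.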
2.6. Partial Least-Squares Regression Method
Partial least-squares regression (PLSR) is a multivariate regression method that specifies a linear relationship between a set of dependent response variables, Y, and a set of predictor variables, X. It is a popular modeling technique applied in chemometrics and commonly used for quantitative spectral analysis. To select the optimum number of factors and avoid overfitting, we calibrated the model with leave-one-out cross-validation (LOOCV), choosing the number of factors that minimized the predicted residual sum of squares (PRESS) and hence the cross-validated RMSE. In LOOCV, each sample is left out of the calibration dataset in turn, and the model is calibrated on the remaining samples and used to predict the omitted one. Wold et al. described the PLSR method in detail. The PLSR process was performed using DPS® (Version 16.05). The predictive ability of the best selected PLSR model was assessed using the R2 and RMSE for the independent validation dataset.
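The selection of the number of PLSR factors by leave-one-out PRESS can be sketched as follows. This is a minimal single-response PLS (NIPALS-style) written with plain lists, not the DPS implementation used in the study; all function names and the toy data are hypothetical.

```python
from math import sqrt


def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model with up to n_comp components."""
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centered X
    f = [v - y_mean for v in y]                              # centered y
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(load)
        Q.append(q)
    return x_mean, y_mean, W, P, Q


def pls1_predict(model, x):
    """Predict one sample by repeating the deflation steps of the fit."""
    x_mean, y_mean, W, P, Q = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in zip(W, P, Q):
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat


def loocv_press(X, y, n_comp):
    """Predicted residual sum of squares from leave-one-out cross-validation."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    return press


def choose_components(X, y, max_comp):
    """Number of components with the smallest LOOCV PRESS."""
    return min(range(1, max_comp + 1), key=lambda a: loocv_press(X, y, a))


# Toy data (hypothetical): y = 2*x1 - x2 + 1, so two components fit exactly.
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y_demo = [1.0, 4.0, 2.0, 6.0, 3.0, 9.0]
```

With real spectral indices the PRESS curve usually flattens or rises after a few factors, and the minimum marks the point beyond which extra factors mainly fit noise.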
2.7. Model Validation
Validation of the models is a crucial step to ensure model quality. Once all the developed models were tested, models with (a) a high R2, indicating a robust linear relationship, and (b) a low root mean square error (RMSE) were preferred. These two quantitative criteria, which compare measured and predicted values, were calculated with the equations listed in Table 1. The PLSR model that best met all the model selection and validation criteria was selected and used to predict the SSC.
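The two validation criteria can be computed as in the sketch below. Table 1 is not reproduced in this section, so the exact formulas are an assumption: R2 is taken here as 1 − SS_res/SS_tot and RMSE as the root of the mean squared difference between measured and predicted SSC. Function names and sample values are hypothetical.

```python
from math import sqrt


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted SSC values (g/kg).
measured_demo = [2.0, 4.0, 6.0, 8.0]
predicted_demo = [2.5, 3.5, 6.5, 7.5]
rmse_demo = rmse(measured_demo, predicted_demo)
r2_demo = r_squared(measured_demo, predicted_demo)
```

If the study instead used the squared Pearson correlation between measured and predicted values as R2, the two definitions would differ whenever the predictions are biased.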
3.1. Statistical Analysis of SSC
For modeling, the samples were sorted from low to high according to the SSC, and based on the equidistance method, the calibration set and validation set were chosen. The characteristics of the descriptive statistics (Table 2) show that the calibration and validation sets of Ebinur Lake Basin corresponded to the following: the maximum values of SSC were 56.4 g·kg−1 and 27.7 g·kg−1, respectively; the minimum values were 0 g·kg−1 and 1.1 g·kg−1, respectively; and the mean values were 8.33 g·kg−1 and 8.21 g·kg−1, respectively. The coefficients of variation for the calibration and validation sets were 157% and 110%, respectively. The calibration and validation sets of the Ugan-Kuqa Oasis corresponded to the following: the maximum values of SSC were 69.8 g·kg−1 and 65.1 g·kg−1, respectively; the minimum values were 0 g·kg−1 and 3.9 g·kg−1, respectively; and the mean values were 16.78 g·kg−1 and 24.06 g·kg−1, respectively. The coefficients of variation for the calibration and validation sets were 111% and 92%, respectively. The calibration and validation sets of Keriya Oasis corresponded to the following: the maximum values of SSC were 17.8 g·kg−1 and 12.9 g·kg−1, respectively; the minimum values were 0 g·kg−1 and 0.5 g·kg−1, respectively; and the mean values were 5.36 g·kg−1 and 4.225 g·kg−1, respectively. The coefficients of variation for the calibration and validation sets were 81% and 104%. The variation coefficient (CV) reflects the distinctiveness of soil salt content, suggesting the degree of human activity effects on soil salt content. A greater coefficient value indicates that soil is more disturbed by human activities . Variation coefficient ranges are the following: CV < 15% is mild variation; 15% < C.V < 36% is medium variation; and C.V > 36% is highly variable. In this study, SSC had variation coefficients greater than 36% in the three regions. 
The results showed that SSC had the greatest dispersion in the study area, which was largely impacted by human activities.
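The sorting-based split and the variation classes described above can be sketched as follows. The calibration-to-validation ratio is not stated in the text, so sending every third sorted sample to validation is an assumption, as are the function names, the use of the sample standard deviation, and the handling of the class boundaries at exactly 15% and 36%.

```python
from statistics import mean, stdev


def equidistant_split(ssc, step=3):
    """Sort samples by SSC and send every step-th one to the validation set.

    Returns (calibration_indices, validation_indices) into the original list.
    """
    order = sorted(range(len(ssc)), key=lambda k: ssc[k])
    validation = order[step - 1::step]
    calibration = [k for pos, k in enumerate(order) if (pos + 1) % step != 0]
    return calibration, validation


def coefficient_of_variation(values):
    """Coefficient of variation in percent (sample standard deviation)."""
    return 100.0 * stdev(values) / mean(values)


def variation_class(cv_percent):
    """Classes from the text: mild (< 15%), medium (15-36%), high (> 36%)."""
    if cv_percent < 15:
        return "mild"
    if cv_percent <= 36:
        return "medium"
    return "high"


# Hypothetical SSC values (g/kg) for six samples.
ssc_demo = [5.0, 1.0, 9.0, 3.0, 7.0, 11.0]
cal_idx, val_idx = equidistant_split(ssc_demo)
cv_demo = coefficient_of_variation(ssc_demo)
```

Taking validation samples at regular positions along the sorted SSC values keeps the two sets comparable in range, which is the purpose of the equidistance method.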
Descriptive statistics of the other measured soil properties are given in Table 3. The unit of soil electrical conductivity (EC 1 : 5) in Table 3 is mS·cm−1, the unit of soil organic matter (SOM) is g·kg−1, and the unit of soil compaction (SC) is kPa.
Table 4 presents a statistical analysis of the eight major ions in the soil. The measurement unit of the eight major ions is g·kg−1.
3.2. Hyperspectral Analysis of Salinized Soil in Different Areas
Figure 2 summarizes the reflectance spectra curves of soil in the different study areas (Ebinur Lake Basin, Ugan-Kuqa Oasis, and Keriya Oasis). Figure 2(a) shows that the three spectral curves have clearly different characteristics. In the visible and near-infrared range (400–1300 nm), the soil reflectance in the Ugan-Kuqa Oasis was relatively high and the soil reflectance in the Ebinur Lake Basin and Keriya Oasis was relatively low. In the near-infrared part (1300–2400 nm), Ugan-Kuqa Oasis soil reflectance was highest, followed by the Keriya Oasis soil, with the lowest soil reflectance in the Ebinur Lake Basin. This result may be due to differences in soil texture in different areas.
Spectral curves of nonsaline soil (SSC: 0 g·kg−1) were selected for analysis to eliminate the influence of salt on the differences between soil spectral curve characteristics in the different regions. Figure 2(b) shows that the slope of the soil reflectance curve in the 400–600 nm band is greatest for Keriya Oasis and approximately the same for the Ugan-Kuqa Oasis and the Ebinur Lake Basin. In the water absorption zone, the absorption depth and absorption area near 1400 and 1900 nm of the Ugan-Kuqa Oasis soil were significantly greater than those of the Ebinur Lake Basin and Keriya Oasis soils. The reflectance of the soil in Ugan-Kuqa Oasis was smooth after a falling trend at 2100–2300 nm, whereas the Keriya Oasis and Ebinur Lake Basin soils showed a fluctuating downward trend. In addition, the soils in the Ebinur Lake Basin and the Keriya Oasis had a broad, weak absorption peak in the 800–1300 nm range, but this was not obvious in the Ugan-Kuqa Oasis. Throughout the entire spectrum, the spectral curves of soils in the two areas (Ebinur Lake Basin and Keriya Oasis) were close to each other, whereas there was a large difference between these and the spectral curve of the soil in Ugan-Kuqa Oasis.
3.3. Correlation between Soil Salt Content and Reflectivity Data in Different Regions
In this paper, the spectral indices (NDSIs and RSIs) were optimized by hyperspectral data from different study areas, and the two-dimensional maps of correlation (r) between optimized spectral indices and SSC are shown in Figure 3. From the analysis, the NDSIs and RSIs with the highest correlation were found in the visible and near-infrared (VNIR) spectral range in the Ebinur Lake Basin. Accordingly, the NDSI (R2086, R1880) and RSI (R2013, R1892) within this region had a maximum correlation of , respectively. Furthermore, a spectral region (2000–2200 nm) with the combination of green and red over the NIR region showed a relatively good correlation with SSC . In Ugan-Kuqa Oasis, the spectral ranges (800–900 nm and 1000–1400 nm) correlated relatively well with SSC when only combined with the NIR region . The NDSI (R1246, R1237) and RSI (R1246, R1237) within this region had a maximum correlation of , respectively. Additionally, the spectral region formed with 1500–1800 nm and 1000–1700 nm regions showed good correlation with SSC . In Keriya Oasis, the strength of the relationship between optimized spectral indices (NDSIs and RSIs) were moderate for SSC (Figures 3(e) and 3(f)). SSC was best correlated with NDSI ((R2006, R1896), (R2007, R1896), and (R2008, R1896)) and RSI ((R2001, R1896), (R2003, R1896), (R2005, R1896), (R2006, R1896), (R2007, R1896), (R2008, R1896), (R2001, R1897), (R2003, R1897), and (R2001, R1898)) , and the most significant region was broad between 2000 and 2200 nm.
According to the soil spectral data for the three research areas, the spectral indices (NDSIs and RSIs) were optimized. As seen, the spectral region formed with 800–1000 nm and 1300–1400 nm, the 1500–1700 nm, 1800–1900 nm, and 2000–2100 nm regions showed good correlation with SSC . NDSI (R1332, R1287) and RSI (R1329, R1287) within this region had a maximum correlation . The optimized spectral indices revealed a good correlation with soil salt content in the study areas and illustrated the potential predictive ability for estimating the SSC.
3.4. Inversion Accuracy Analysis of Local Modeling and Global Modeling
3.4.1. Establishment of Local and Global Models
In this paper, the predictive variables of the PLSR models were selected with the VIP rule, under which another variable is added only if it increases the accuracy of the models. The influence of each spectral index in the PLSR model is illustrated in Figure 4 with the corresponding VIP values. The VIP method revealed the importance of the optimized indices for SSC. In Ebinur Lake Basin, the highest VIP values were found for NDSI ((R2013, R1892), (R2074, R1880), (R2073, R1880), (R2075, R1880), (R2086, R1880), (R2082, R1880), (R2083, R1880), (R2087, R1880), (R2089, R1880), and (R2088, R1880)) and RSI ((R2013, R1892), (R2074, R1880), (R2073, R1880), (R2075, R1880), and (R2086, R1880)) among the optimized spectral indices. In Ugan-Kuqa Oasis, the optimized spectral indices NDSI ((R1246, R1237), (R2227, R2219), and (R819, R811)) and RSI ((R1246, R1237), (R2227, R2219), (R836, R825), (R1247, R1237), (R1176, R1169), (R1245, R1237), and (R836, R826)) had the maximum VIP values. In the Keriya Oasis, the optimized spectral indices NDSI ((R2006, R1896), (R2008, R1896), (R2007, R1896), and (R2005, R1897)) and RSI ((R2003, R1896), (R2005, R1896), (R2006, R1896), (R2001, R1896), (R2001, R1897), and (R2001, R1898)) had the maximum VIP values.
The best predictive variables for the PLSR model were selected, and ten models were developed for estimating SSC in the research areas; the selected variables and details of PLSR model accuracy are given in Figure 4 and Table 3. The scatter plots (Figure 5) of predicted versus measured SSC for the PLSR models illustrate their predictive ability for SSC estimation, assessed by R2Pre and RMSEPre on the independent validation dataset.
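As an illustration of how VIP scores arise from a PLSR fit, the sketch below implements single-response PLS with the NIPALS algorithm and the standard VIP formula. It is a hedged example rather than the study's implementation: the function names and toy data are hypothetical, and the cutoff of 1 applied at the end is a common convention that is assumed here.

```python
from math import sqrt

def center(values):
    """Return the values with their mean subtracted."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_vip(x_columns, y, n_components):
    """VIP score of each predictor in a PLS1 model fitted by NIPALS.

    x_columns: one list of sample values per predictor (e.g. per spectral index).
    """
    X = [center(col) for col in x_columns]
    yc = center(y)
    n_vars, n_obs = len(X), len(yc)
    weights, explained = [], []          # w_a and SS_a = q_a^2 * (t_a . t_a)
    for _ in range(n_components):
        w = [dot(col, yc) for col in X]  # direction of maximum covariance with y
        norm = sqrt(dot(w, w))
        if norm < 1e-12:                 # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(w[j] * X[j][i] for j in range(n_vars)) for i in range(n_obs)]
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in X]   # X loadings
        q = dot(yc, t) / tt                   # y loading
        # Deflate X and y before extracting the next component.
        X = [[x - ti * p[j] for x, ti in zip(X[j], t)] for j in range(n_vars)]
        yc = [yi - q * ti for yi, ti in zip(yc, t)]
        weights.append(w)
        explained.append(q * q * tt)
    total = sum(explained)
    if total == 0:
        return [0.0] * n_vars
    return [
        sqrt(n_vars * sum(ss * w[j] ** 2 for ss, w in zip(explained, weights)) / total)
        for j in range(n_vars)
    ]

# Toy predictors (made-up values): x0 drives y, x2 is an alternating nuisance term.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x1 = [a + b for a, b in zip(x0, x2)]
y = [2.0 * v for v in x0]

vip = pls1_vip([x0, x1, x2], y, n_components=2)
selected = [j for j, score in enumerate(vip) if score > 1.0]  # assumed cutoff
```

Because the squared VIP scores average to 1 across predictors, a score above 1 marks a variable that contributes more than average to explaining the response.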
3.4.2. Analysis of Predictive Model Accuracy
Using the VIP method in the previous step, we selected the optimal band combinations for the two indices. We then used the selected band combinations as the independent variables and SSC as the dependent variable to establish PLSR models of the relationship between them. In total, ten models were built: nine local models and one global model. Three models were set up for each study area, based on the optimal NDSI band combinations, the optimal RSI band combinations, and the combination of the optimal NDSI and RSI bands. As seen from Figure 5, in Ebinur Lake Basin the NDSI-based model had an accuracy of R2Pre = 0.77 and RMSEPre = 3.81 g·kg−1, the RSI-based model had R2Pre = 0.76 and RMSEPre = 3.84 g·kg−1, and the model combining the two indices had R2Pre = 0.83 and RMSEPre = 2.31 g·kg−1. In Ugan-Kuqa Oasis, the NDSI-based model had R2Pre = 0.68 and RMSEPre = 7.56 g·kg−1, the RSI-based model had R2Pre = 0.62 and RMSEPre = 9.76 g·kg−1, and the model combining the two indices had R2Pre = 0.71 and RMSEPre = 5.46 g·kg−1. In Keriya Oasis, the NDSI-based model had R2Pre = 0.64 and RMSEPre = 4.86 g·kg−1, the RSI-based model had R2Pre = 0.63 and RMSEPre = 4.31 g·kg−1, and the model combining the two indices had R2Pre = 0.68 and RMSEPre = 3.31 g·kg−1. In every sample area, the model combining the two kinds of optimized spectral indices was more accurate than either single-index model. Therefore, when establishing the global model, we used only the combination of the two optimized spectral indices, and the accuracy of the global model was R2Pre = 0.63 and RMSEPre = 8.36 g·kg−1.
In summary, the prediction accuracy of the combined model in Ebinur Lake Basin was the highest of all the models, with R2Pre reaching 0.83. The combined bands for this model were NDSI ((R2013, R1892), (R2074, R1880), (R2073, R1880), (R2075, R1880), (R2086, R1880), (R2082, R1880), (R2083, R1880), (R2087, R1880), (R2089, R1880), and (R2088, R1880)) and RSI ((R2013, R1892), (R2074, R1880), (R2073, R1880), (R2075, R1880), and (R2086, R1880)). This raises the question of whether a local model can be applied to the other regions, and how its predictions compare with those of the global model.
To further verify the accuracy of the prediction models, comparative interpolation maps of the measured and predicted values of soil salinity in each region were prepared. Figure 6 shows that the predictions for the three regions are sound and that the predicted interpolation maps are consistent with the measured maps.
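For reference, the two accuracy measures quoted here can be computed as in the sketch below. It assumes that R2Pre is the coefficient of determination on the validation samples and that RMSEPre is the root mean square error in the units of SSC (g·kg−1). The function names and example values are hypothetical.

```python
from math import sqrt

def r2_score(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def rmse(measured, predicted):
    """Root mean square error, in the same units as the inputs."""
    n = len(measured)
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

# Made-up validation values (g/kg), for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2_pre = r2_score(measured, predicted)
rmse_pre = rmse(measured, predicted)
```

A higher R2Pre together with a lower RMSEPre indicates a better model, which is the basis on which the combined-index models are ranked above the single-index models.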
3.4.3. Analysis of Cross-Validation Model Accuracy
The accuracy analysis and verification above show that the local models established by combining the two optimized spectral indices have the better predictive ability. To identify the most stable model, a verification set of 15 samples was used uniformly, and the robustness of the models was evaluated by cross validation among the three study areas. This gave a total of 18 verification schemes, and the results are shown in Table 5. The three models derived from the Ebinur Lake Basin were applied to the Ugan-Kuqa Oasis and the Keriya Oasis. Of these, the combined model predicted best in Ugan-Kuqa Oasis and Keriya Oasis, with accuracies of R2Pre = 0.43 and RMSEPre = 9.68 g·kg−1 and R2Pre = 0.41 and RMSEPre = 10.89 g·kg−1, respectively. The three models derived from the Ugan-Kuqa Oasis were applied to the Ebinur Lake Basin and the Keriya Oasis. The combined model predicted best in Keriya Oasis and Ebinur Lake Basin, with accuracies of R2Pre = 0.42 and RMSEPre = 11.14 g·kg−1 and R2Pre = 0.38 and RMSEPre = 11.97 g·kg−1, respectively. The three models of the Keriya Oasis were validated in the Ebinur Lake Basin and Ugan-Kuqa Oasis. The combined model predicted best in the Ebinur Lake Basin and Ugan-Kuqa Oasis, with accuracies of R2Pre = 0.31 and RMSEPre = 11.02 g·kg−1 and R2Pre = 0.29 and RMSEPre = 12.31 g·kg−1, respectively. Thus, each local model performed poorly when predicting the other sample plots, although the combined model always yielded better prediction accuracy than the single-index models. The best transferred model was the combined model of Ebinur Lake Basin applied to Ugan-Kuqa Oasis. However, its prediction accuracy was only R2Pre = 0.43 and RMSEPre = 9.68 g·kg−1, far below that of the global model (R2Pre = 0.69 and RMSEPre = 8.45 g·kg−1), indicating that the local models lack stability and universality and are applicable only to their respective study areas.
The results in Table 5 demonstrate the existence of regional heterogeneity. A local model established in one region cannot be applied to other regions, whereas the global model obtains a better R2. This shows that the optimized spectral indices can reduce the effect of heterogeneity to some extent.
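The count of 18 verification schemes follows from applying each of the three local model types from one region to each of the two other regions. The short sketch below enumerates them; the region and model labels come from the text, while the function name is hypothetical and the evaluation step is left out.

```python
from itertools import permutations, product

REGIONS = ("Ebinur Lake Basin", "Ugan-Kuqa Oasis", "Keriya Oasis")
MODELS = ("NDSI", "RSI", "NDSI+RSI")

def cross_validation_schemes(regions=REGIONS, models=MODELS):
    """List (source region, target region, model type) for every transfer test.

    A model is never validated on the region it was built in, so only
    ordered pairs of distinct regions are generated.
    """
    return [
        (source, target, model)
        for (source, target), model in product(permutations(regions, 2), models)
    ]

schemes = cross_validation_schemes()
```

Each tuple identifies one row of a transfer table: the model type fitted in the source region is used to predict the 15 verification samples of the target region, and R2Pre and RMSEPre are computed from those predictions.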
This study showed that the spectral characteristics of salinized soils in different regions were significantly different. The reasons for this spectral heterogeneity may lie in the following aspects: (i) First, because of regional differences, the soil formation conditions, soil formation processes, and soil parent materials differ, resulting in substantial differences in soil minerals. Minerals, which make up the largest share of soil mass, have the most significant impact on soil spectral characteristics. (ii) Second, the composition of soil salts differed among regions. The salt in soil is a mixture of many single salts, each with its own spectral characteristics. Therefore, soil salinity formed by mixing salts in different proportions also has different spectral characteristics. (iii) Third, salts and soil particles combine in different ways, and the soil salt content in Ugan-Kuqa Oasis was higher than that in Ebinur Lake Basin and Keriya Oasis. Salt must reach a certain concentration in the soil before it exists in crystalline form, appearing as visible salt particles, whereas in soil with a lower content it is present as salt molecules. The spectral characteristics of soils in Ugan-Kuqa Oasis therefore differ from those in Ebinur Lake Basin and Keriya Oasis. (iv) Fourth, the composition of organic matter differs among soils in different areas; that is, the proportions of humic acid, humin, and fulvic acid differ. Climate is one of the essential factors affecting the composition of soil organic matter. Because Ebinur Lake Basin, Ugan-Kuqa Oasis, and Keriya Oasis lie in different climatic zones, the organic matter composition of their soils differs somewhat, and organic matter of different composition has its own spectral characteristics, which are reflected in the soil spectra.
The methods developed in this paper have the following advantages: (i) The continuous narrow-band spectrum contains more salt information, and the two-band optimization algorithm applied to massive spectral data can extract the band pairs most strongly correlated with soil salinity in two dimensions and rapidly optimize over complex hyperspectral parameters. This deeper mining of hyperspectral data further improves the accuracy of hyperspectral estimation of soil salinity, reduces the impact of environmental factors on modeling, and offers better sensitivity than a traditional single band. (ii) VIP is a useful method for selecting essential bands, and it was used in this paper to further filter the independent variables, which improves the practicability of the model. (iii) Compared with the single-index models, the combined model achieves the best prediction. This may be because combining multiple spectral indices makes full use of the characteristics of each index, so that their advantages complement one another. Alternatively, it may be related to the modeling algorithm used in the study, because partial least squares regression is more efficient when the variables are highly linearly correlated with one another. However, whether the optimized spectral index method can be successfully applied to hyperspectral remote sensing monitoring of soil salinization at a larger scale and across a full area must be further studied and verified.
By analyzing the differences in the hyperspectral characteristics of salinized soils in three regions (Ebinur Lake Basin, Ugan-Kuqa Oasis, and Keriya Oasis) and using the optimized spectral indices approach to improve the modeling accuracy, this study examined the hyperspectral quantitative inversion of salinity in salinized soil with spectral heterogeneity. The major conclusions are as follows: (i) The salinized soils in different regions had apparent differences in both reflectance and spectral curve morphology, but the optimized spectral index method effectively overcame the regional heterogeneity of salinized soil hyperspectral characteristics, and the correlation with SSC was consistently good (correlation coefficient up to 0.748). (ii) The regional heterogeneity of salinized soil hyperspectral characteristics caused the salt-sensitive bands of salinized soils to differ among regions. As a result, the global modeling accuracy was significantly lower than the local modeling accuracy. (iii) The VIP filtering method effectively selected the optimal independent variables, and the combined models always gave the best modeling results. (iv) The cross-validation results showed that the local models had definite regional limitations, and the global model had the best prediction ability, with R2Pre = 0.69 and RMSEPre = 8.45 g·kg−1. This research approach provides a valid reference for hyperspectral remote sensing monitoring of large-scale soil salinization across regions.
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was jointly supported by multiple grants from the National Science Foundation of China (nos. 41761077 and U1603241), the College of Resources and Environment Science, and the Key Lab of Oasis Ecology at Xinjiang University. The authors thank the research partners who assisted for long and strenuous hours in collecting field data.