Abstract

Spatial interpolation of meteorological parameters, which are closely related to the earth's surface, plays an important role in climatological studies. However, most traditional spatial interpolation methods ignore the geographic semantics of the interpolation sample points in practical applications. This paper proposes an improved inverse-distance weighting interpolation algorithm that considers geographic semantics (S-IDW); it adds geographic semantic similarity to the traditional IDW formula and adjusts the weight coefficients accordingly. In the interpolation process, the geographic semantic differences between sample points and estimation points are considered comprehensively. In this study, 3 groups of land surface temperature data from 2 different areas were selected for experiments, and several commonly used spatial interpolation methods were compared. Experimental results indicated that S-IDW outperformed IDW and several existing spatial interpolation methods, although some abnormal values and interpolation outliers remained. This method provides new insight into improving estimation accuracy, compensating for missing data, and correcting errors in spatial attributes related to meteorological parameters.

1. Introduction

Spatial interpolation of meteorological parameters aims to obtain relatively accurate descriptions of spatial attributes related to climatological dynamics and weather patterns from a set of reasonably located samples [1]. Traditionally, sampling observation is the best way to obtain the regional mean conditions, because it ensures equal sampling opportunities for each location in the region. However, in practical applications the observation sampling points are sparse and randomly distributed [1]. For example, the location of the sample points is systematic and changes smoothly. Furthermore, most meteorological models are at present built from samples taken at observation stations. Spatial interpolation is widely used to transform discrete observation data into a continuous surface so as to better measure the spatial distribution pattern of data elements [2]. Familiar spatial interpolation methods, such as IDW, Kriging, Spline, and the trend surface method, have been widely used in different fields, but most of them have limitations in application. In distance weighting methods, the results depend strongly on distance, which makes them unsuitable for large areas [3]. The Kriging method can adopt different variogram forms and parameters for different sampling data points, which gives it a certain flexibility. However, because the variogram form must first be determined and its parameters fitted, it loses the high efficiency of the original inverse-distance weighting method. Kriging variograms must be selected manually, and the computation increases sharply when there are too many combinations of variograms [4]. The Spline method is not suitable for sparse and finite sampling points and is usually applied to high-density sample points [5]. The trend surface method relies more heavily on the existing spatial distribution trend of the interpolated elements [6].

Consequently, many authors have continued to explore and improve spatial interpolation methods [7]. For instance, some researchers introduced terrain complexity and elevation factors into inverse-distance weighting [8, 9], and Li et al. brought the harmonic weighting coefficient of azimuth into distance weighting interpolation [2]. The natural neighborhood relationship has been introduced into distance weighting interpolation [10], as has the fuzzy trigonometric function [3]. Others took into account the spatiotemporal variation characteristics of geographical factors and introduced time-series data to remove some numerical fluctuations in time, as in spatiotemporal weighting Kriging and spatiotemporal inverse-distance weighting interpolation [9]. The methods proposed by these authors have achieved remarkable academic impact and showed high spatial autocorrelation, but most of them are numerical interpolation methods that do not consider geographic semantics.

This work is inspired by gradient theory in the field of image processing, where the gradient is the first-order differential of the gray value and reflects the rate of change between adjacent pixels in the X and Y directions [11]. Where the gradient of an image is large, the type of land cover tends to change, as at the boundary between land and water. Existing research based on remote sensing image inversion, such as land surface temperature (LST), vegetation index, and moisture index, is to some extent a model of the relationship between remote sensing signals or data and surface applications [12]. For example, the temperature near residential buildings is quite different from that of forest land or water bodies. The air temperature over some exposed land surfaces, like building roofs and pavement, is higher than that in the shade of forests. Therefore, geographic semantics are indispensable for the geospatial description of surface remote sensing pixel information. Some authors have put forward a semantic Kriging method, which has achieved excellent results, but problems remain, such as the complexity of calculating the variogram once semantic similarity is introduced [13–15]. In addition, the prediction of multivariable meteorological factors by embedding geographic semantics into Bayesian networks weakens the influence of parameter uncertainty but lacks knowledge of meteorological modeling [16]. Although the aforementioned spatial interpolation methods perform well in different applications, there is still scope for improvement by introducing geographic semantics into the spatial interpolation process. Furthermore, the use of information semantics is growing in the fields of spatial statistics and environmental modeling [17, 18].

This paper introduces geographic semantics into inverse-distance weighting spatial interpolation by embedding hierarchical geographic semantics into the spatial interpolation model and using semantic similarity to measure factor weight. The remainder of the paper is organized as follows: (1) the S-IDW method is explained in the Methodology section; (2) the findings are discussed in the Experimental Results and Comparison section; (3) finally, our conclusions and directions for subsequent research are presented in the Conclusions section.

2. Methodology

The S-IDW integrates the geographic semantic knowledge into the inverse-distance weighting interpolation method. Considering the effect of distance on interpolation results, the influence of land-use type on land surface temperature interpolation is added. The S-IDW reconsiders the interpolation weight, increases the weight of the same land-use type, and reduces the weight of different land-use types on the basis of distance, constructing the S-IDW method [19].

In the S-IDW, the first step is to calculate the semantic similarity of geographic entities. The formula is as follows:

In equations (1)–(4), $\hat{Z}_j$ is the estimated value of the $j$th point to be interpolated; $Z_i$ is the measured value of the $i$th discrete point; $d_{ij}$ is the distance between the $i$th discrete point and the $j$th point to be interpolated; $\varphi_j$ is the latitude of the point to be interpolated; $\varphi_i$ is the latitude of the discrete point; $\lambda_j$ is the longitude of the point to be interpolated; $\lambda_i$ is the longitude of the discrete point; $n$ is the number of measured sample points participating in the interpolation; $p$ is the power exponent, which controls how quickly the weight coefficient decreases as the distance between the point to be interpolated and the sample point increases. When $p$ is larger, closer sample points are given higher weight; when $p$ is smaller, the weight is distributed more evenly over all sample points. When $p = 1$, the method is called inverse-distance weighting, a common and simple spatial interpolation method. When $p = 2$, it is called the inverse-distance squared method, which is often used in practical applications. In this study, a fixed value of $p$ is taken.
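For orientation, the classical IDW estimator described by these definitions can be written in its textbook form as below. The notation is assumed here for illustration and is not a reproduction of equations (1)–(4): $\hat{Z}_j$ is the estimate at target point $j$, $Z_i$ the measured value at sample $i$, $w_{ij}$ the weight, $p$ the power exponent, and $d_{ij}$ a planar Euclidean distance on the longitude and latitude pairs $(\lambda, \varphi)$, which is itself an assumption about how the distance is computed.

```latex
\hat{Z}_j = \sum_{i=1}^{n} w_{ij}\, Z_i ,
\qquad
w_{ij} = \frac{d_{ij}^{-p}}{\sum_{k=1}^{n} d_{kj}^{-p}} ,
\qquad
d_{ij} = \sqrt{(\lambda_i - \lambda_j)^2 + (\varphi_i - \varphi_j)^2}
```

Because the weights sum to 1 for each target point, the estimate is a convex combination of the sample values; raising $p$ concentrates the weight on the nearest samples.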

is the semantic similarity between the th point to be interpolated and the th discrete point, and the value range is . Semantic similarity refers to the degree to which two concepts can replace each other in the same context without changing the semantic structure of the text [19]. The larger the change of semantic structure, the smaller the similarity; the smaller the change of semantic structure, the greater the similarity. In this study, a comprehensive semantic similarity algorithm for geographic ontology is adopted. On the basis of analyzing the influencing factors of semantic distance similarity, the weighted sum method is used to calculate semantic distance similarity, concept attribute similarity, and information similarity. The calculation formula is as follows [20]:
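As a minimal sketch of how a semantic similarity could enter the IDW weights, the following Python function multiplies each inverse-distance weight by a similarity value. The multiplicative combination, the [0, 1] similarity range, the default exponent of 2, the planar distance, and the names `s_idw` and `toy_similarity` are all assumptions made for illustration; the exact S-IDW weighting is defined by the equations of this section.

```python
import math


def s_idw(target, samples, similarity, p=2.0):
    """Semantic-weighted IDW estimate at one target point (illustrative sketch).

    target     -- (x, y, land_use) of the point to be interpolated
    samples    -- iterable of (x, y, land_use, value) for the discrete points
    similarity -- callable(land_use_a, land_use_b) -> number assumed in [0, 1]
    p          -- power exponent; 2.0 is only an illustrative default
    """
    tx, ty, t_use = target
    numerator = 0.0
    denominator = 0.0
    for x, y, use, value in samples:
        d = math.hypot(x - tx, y - ty)  # planar Euclidean distance (assumption)
        if d == 0.0:
            return value  # target coincides with a sample: reuse its value
        # ASSUMPTION: similarity scales the inverse-distance weight multiplicatively.
        w = similarity(t_use, use) / d ** p
        numerator += w * value
        denominator += w
    if denominator == 0.0:
        raise ValueError("all weights are zero; no usable sample")
    return numerator / denominator


def toy_similarity(a, b):
    """Hypothetical similarity: 1.0 for equal land-use labels, 0.5 otherwise."""
    return 1.0 if a == b else 0.5


samples = [(1.0, 0.0, "forest", 20.0), (0.0, 1.0, "built-up", 30.0)]
target = (0.0, 0.0, "forest")
plain = s_idw(target, samples, lambda a, b: 1.0)   # reduces to ordinary IDW
semantic = s_idw(target, samples, toy_similarity)  # forest sample weighs double
```

With two equidistant samples, ordinary IDW returns their mean, whereas the semantic variant pulls the estimate toward the sample that shares the target's land-use type.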

The semantic similarity is calculated by referring to the hierarchical structure of geographical entities in Table 1 and Figure 1. In formulas (5)–(8), is the semantic distance, which refers to the shortest path between any two concept nodes and b in the ontology hierarchy, and is the regulating factor. In this paper,  = 8. represents the similarity of concept attributes between concept nodes and b. The function is the set of entity attributes, and is the number of attributes. In addition, , is a real number, and its value is controlled at [0, 2max (IC (a, b))], where . The information quantity is defined as the function of the occurrence probability of concept a. In equation (5), when the land-use types of the th point to be interpolated and the th discrete point are equivalent,; when the land-use types of the th point to be interpolated and the th discrete point are not equivalent, .
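The notion of semantic distance as the shortest path between two concept nodes can be sketched with a breadth-first search over a small hierarchy. The tree below is a hypothetical toy example, not the GB/T21010-2017 hierarchy of Figure 1, and the information content function uses the common negative-log-probability definition, which is an assumption here rather than the formula of this paper.

```python
from collections import deque
import math

# Hypothetical toy hierarchy (child -> parent); NOT the tree of Figure 1.
PARENT = {
    "cultivated land": "land use",
    "forest land": "land use",
    "paddy field": "cultivated land",
    "dry land": "cultivated land",
    "arbor forest": "forest land",
}


def semantic_distance(a, b, parent=PARENT):
    """Number of edges on the shortest path between concepts a and b."""
    graph = {}
    for child, par in parent.items():
        graph.setdefault(child, set()).add(par)
        graph.setdefault(par, set()).add(child)
    if a not in graph or b not in graph:
        raise KeyError("unknown concept")
    queue = deque([(a, 0)])
    seen = {a}
    while queue:  # breadth-first search returns the shortest path length
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("concepts are not connected")


def information_content(probability):
    """ASSUMPTION: IC(a) = -log p(a), for occurrence probability p(a) in (0, 1]."""
    return -math.log(probability)
```

Sibling concepts such as "paddy field" and "dry land" are two edges apart, while concepts in different branches are separated by a longer path through their common ancestor.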

In equation (5), , , and ; , and are the adjustment coefficients of semantic distance similarity, concept attribute similarity, and information similarity, respectively; . and are geographical entities. For the convenience of calculation, GB/T21010-2017 classification of land use in China and its meaning are used to extract geographical entities. The semantic attributes of geographical entities are shown in Table 1, and the ontological hierarchical network structure of land-use status classification is shown in Figure 1. Based on equation (5), the semantic similarity for some geographical entities of land-use type ontology is calculated as shown in Table 2.
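The weighted-sum combination of the three component similarities can be illustrated as follows. The coefficient values, the requirement that they sum to 1, and the Jaccard-style attribute similarity are hypothetical choices made for this sketch; they are not the values or formulas used to produce Table 2.

```python
def attribute_similarity(attrs_a, attrs_b):
    """ASSUMPTION: shared attributes over all attributes (Jaccard index)."""
    attrs_a, attrs_b = set(attrs_a), set(attrs_b)
    union = attrs_a | attrs_b
    if not union:
        return 1.0  # two empty attribute sets are treated as identical
    return len(attrs_a & attrs_b) / len(union)


def combined_similarity(sim_distance, sim_attribute, sim_information,
                        alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of three component similarities, each assumed in [0, 1].

    alpha, beta, gamma are hypothetical adjustment coefficients that are
    assumed here to sum to 1, so the result also stays in [0, 1].
    """
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("coefficients are assumed to sum to 1")
    return alpha * sim_distance + beta * sim_attribute + gamma * sim_information
```

Keeping the coefficients normalized means that identical concepts (all three components equal to 1) receive a combined similarity of 1.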

3. Experimental Results and Comparison

3.1. Experimental Design and Error Metric

The experimental study was carried out using land surface temperature (LST) data from the Landsat 8 OLI-TIRS satellite. Because the surface environment is complex and changeable, LST shows different characteristics in different surface environments. To explore spatial interpolation accuracy across different areas and different land surface temperatures, LST data at different times were selected in the 2 study areas, and 3 distinct LST conditions (high temperature, low temperature, and normal temperature) were used in the experiments. The interpolation accuracy of traditional numerical interpolation methods is often closely related to the density of the discrete points. In this paper, the discrete points and the points to be valued are selected randomly and distributed evenly: 15 points to be valued and 60 discrete points were randomly selected in the experiment. Assuming that the LST values of the 15 points are missing or abnormal, we use the 60 discrete points with known LST values to interpolate the 15 points in order to compensate for and correct the missing or abnormal values. The interpolation methods compared are Kriging, IDW, Natural, Spline, and S-IDW. On this basis, we compared the results of the 5 interpolation methods with the original LST values of the 15 points to be valued. The experimental flow chart of semantic inverse-distance weighting interpolation is shown in Figure 2.
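The hold-out design described above (15 points treated as missing and estimated from 60 known points) can be sketched as follows. The function names, the fixed random seed, and the tuple layout are assumptions for illustration, and this plain random split does not enforce an even spatial distribution of the points.

```python
import random


def split_points(points, n_targets=15, seed=0):
    """Randomly split points into (known, targets), holding out n_targets."""
    if n_targets > len(points):
        raise ValueError("not enough points to hold out")
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(points)
    rng.shuffle(shuffled)
    return shuffled[n_targets:], shuffled[:n_targets]


def hold_out_estimates(points, interpolate, n_targets=15, seed=0):
    """Return (observed, estimated) value lists for the held-out targets.

    points      -- list of (x, y, land_use, value) tuples (hypothetical layout)
    interpolate -- callable(target_without_value, known_points) -> estimate
    """
    known, targets = split_points(points, n_targets, seed)
    observed = [t[3] for t in targets]
    estimated = [interpolate(t[:3], known) for t in targets]
    return observed, estimated
```

Any interpolator with the same call shape can be passed as `interpolate`, so the same 15 held-out points are used to compare all methods.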

In the experiment, the accuracy of the estimated values is evaluated by means of the root mean square error (RMSE) [21], mean absolute error (MAE), mean absolute percentage error (MAPE) [16], and the ratio of the variance of the estimated values to the variance of the observed values (RVAR) [22, 23]. The formal definition of each indicator is given as follows:

In formulas (10)–(13), $n$ is the total number of measured values; $Z_i$ is the measured value of the $i$th discrete point; $\hat{Z}_i$ is the estimated value of the $i$th point to be interpolated; $\bar{Z}$ is the average of the measured values at the discrete points; $\bar{\hat{Z}}$ is the average of the estimated values at the points to be evaluated; $\sigma_{\hat{Z}}^2$ is the variance of the estimated values at the points to be interpolated; $\sigma_Z^2$ is the variance of the measured values at the discrete points. Under ideal conditions, the best fit between measured and estimated values gives RMSE ≈ 0, MAE ≈ 0, MAPE ≈ 0, and RVAR ≈ 1.
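The four indicators can be computed with the standard definitions sketched below. This is a minimal illustration under assumptions: MAPE is expressed in percent, the population variance is used for RVAR (the ratio is the same for sample variance when both series have equal length), and the function names are chosen here.

```python
import math
from statistics import pvariance


def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(e - o) for o, e in zip(observed, estimated)) / n


def mape(observed, estimated):
    """Mean absolute percentage error in percent; observed values must be nonzero."""
    n = len(observed)
    return 100.0 / n * sum(abs((o - e) / o) for o, e in zip(observed, estimated))


def rvar(observed, estimated):
    """Ratio of the variance of the estimates to the variance of the observations."""
    return pvariance(estimated) / pvariance(observed)
```

An RVAR below 1 indicates that the interpolated values vary less than the observations, which is the smoothing typical of weighted-average interpolators.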

3.2. Analysis and Discussion of the Interpolation Results in Study Area-1

To verify the interpolation effect of S-IDW under three temperature environments, remote sensing images of study area-1 acquired on January 11, 2018, April 17, 2018, and August 9, 2013 were used. The corresponding cloud cover of the images is 0.54%, 0.05%, and 4.76%, respectively. Under these conditions, LST inversion is carried out and the inversion data are extracted and processed. The distribution of the 60 discrete points and 15 points to be valued in study area-1 is shown in Figure 3.

The interpolation results and the interpolation accuracy of the five methods under low-temperature conditions in study area-1 are given in Tables 3–5. As shown in Table 3, the interpolation results of S-IDW for 8 of the 15 points to be valued are closer to the land surface temperature than those of the other 4 interpolation methods. Overall, the statistical analysis and Pearson correlation analysis of the five interpolation methods indicate that the MAE, MAPE, and RMSE of S-IDW are closer to the ideal best-fit values than those of the other 4 interpolation methods. As far as RVAR is concerned, Natural and Spline are better than S-IDW, but S-IDW is better than Kriging and IDW. In terms of Pearson correlation, the results of S-IDW, Kriging, IDW, and Natural interpolation are significantly correlated with LST at the 0.01 level (two-tailed); the correlation between the S-IDW results and LST is the strongest, with r = 0.959, whereas the correlation between the Spline results and LST is the weakest, with r = 0.616, significant only at the 0.05 level (two-tailed).
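The two-tailed significance levels quoted for the Pearson coefficients can be checked with the usual t-test for a correlation coefficient. The sketch below assumes that each correlation is computed over the 15 points to be valued (13 degrees of freedom) and uses the critical t values for 13 degrees of freedom from standard tables; the function names are chosen here for illustration.

```python
import math

# Two-tailed critical t values from standard tables, keyed by degrees of freedom.
T_CRITICAL = {13: {0.05: 2.160, 0.01: 3.012}}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def is_significant(r, alpha, n=15):
    """True if r differs from zero at level alpha (two-tailed); n=15 is assumed."""
    return abs(t_statistic(r, n)) > T_CRITICAL[n - 2][alpha]
```

Under these assumptions, r = 0.959 passes the 0.01 threshold, while r = 0.616 passes the 0.05 threshold but not the 0.01 threshold, in line with the levels reported above.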

The interpolation results and accuracy of the five methods under normal temperature conditions in study area-1 are given in Tables 6–8. As shown in Table 6, the interpolation results of S-IDW for 8 of the 15 points to be estimated are closer to the land surface temperature than those of the other 4 interpolation methods. Overall, the statistical analysis and Pearson correlation analysis of the 5 interpolation methods show that the MAE and RMSE of S-IDW are closer to the ideal best-fit values than those of the other 4 interpolation methods. In terms of MAPE, Natural is better than S-IDW, but S-IDW is better than Kriging, IDW, and Spline. In terms of RVAR, Spline is better than S-IDW, but S-IDW is better than Kriging. In terms of Pearson correlation, the results of S-IDW, Kriging, IDW, Natural, and Spline are significantly correlated with LST at the 0.01 level (two-tailed); the correlation between the S-IDW results and LST is the strongest, with r = 0.930.

The interpolation results and accuracy of the five methods under high-temperature conditions in study area-1 are given in Tables 9–11. As shown in Table 9, the interpolation results of S-IDW for 4 of the 15 points to be valued are closer to the land surface temperature than those of the other 4 interpolation methods. Overall, the statistical analysis and Pearson correlation analysis of the 5 interpolation methods indicate that the MAE and RMSE of S-IDW are closer to the ideal best-fit values than those of the other 4 interpolation methods. As far as MAPE is concerned, the 1.252% of S-IDW is higher than the 0.781% of IDW and the 0.057% of Spline but lower than the 1.818% of Kriging and the 1.253% of Natural. In terms of RVAR, 0.983 of Natural is better than 0.827 of S-IDW, but S-IDW is better than Kriging and Natural. In terms of Pearson correlation, the interpolation results of S-IDW, Kriging, IDW, Natural, and Spline are significantly correlated with LST at the 0.01 level (two-tailed); the correlation coefficient r between the S-IDW results and LST is 0.914, stronger than the 0.843 of IDW, 0.794 of Kriging, 0.791 of Natural, and 0.669 of Spline.

3.3. Analysis and Discussion of Interpolation Results in Study Area-2

To verify the interpolation effect of S-IDW under three temperature environments, remote sensing images of study area-2 acquired on February 11, 2017, April 19, 2018, and July 10, 2013 were used. The corresponding cloud cover of the images is 0.54%, 0.05%, and 4.76%, respectively. Under these conditions, LST inversion is carried out and the inversion data are extracted and processed. The distribution of the 60 discrete points and 15 points to be valued in study area-2 is shown in Figure 4.

The interpolation results and accuracy of the 5 methods under low-temperature conditions in study area-2 are given in Tables 12–14. As shown in Table 12, the interpolation results of S-IDW for 9 of the 15 points to be valued are closer to the land surface temperature than those of the other 4 interpolation methods. The interpolation result of S-IDW at point 11 is 9.052°C, which deviates from the LST value more than the results of the other 4 interpolation methods. Overall, the statistical analysis and Pearson correlation analysis of the 5 interpolation methods show that the MAE, MAPE, RVAR, and RMSE of S-IDW are closer to the ideal best-fit values than those of the other 4 interpolation methods. In terms of Pearson correlation, the results of S-IDW, Kriging, IDW, Natural, and Spline are significantly correlated with LST at the 0.01 level (two-tailed), and the correlation coefficient r between the S-IDW results and LST is 0.890.

The interpolation results and accuracy of the 5 methods under normal temperature conditions in study area-2 are given in Tables 15–17. As shown in Table 15, the interpolation results of S-IDW for 6 of the 15 points to be valued are closer to the land surface temperature than those of the other 4 interpolation methods. The LST value of point 11 is 25.245°C, and the S-IDW interpolation result at this point is 30.266°C, which deviates from the LST value more than the results of the other 4 interpolation methods. Overall, the statistical analysis and Pearson correlation analysis of the 5 interpolation methods show that the MAE and RMSE of S-IDW are closer to the ideal best-fit values than those of the other 4 interpolation methods. In terms of MAPE, the 1.856% of S-IDW is higher than the 1.723% of Natural but lower than the values of Kriging, IDW, and Spline. As far as RVAR is concerned, the 1.051 of Natural and the 0.844 of IDW are closer to the ideal value than the 0.806 of S-IDW. In terms of Pearson correlation, the results of S-IDW and Natural interpolation are significantly correlated with LST at the 0.05 level (two-tailed); the correlation between the S-IDW results and LST is the strongest, with r = 0.620.

The interpolation results and accuracy of the 5 methods under high-temperature conditions in study area-2 are given in Tables 18–20. As shown in Table 18, the interpolation results of S-IDW for 5 of the 15 points to be valued are closer to the land surface temperature than those of the other 4 interpolation methods. The LST value of point 2 is 37.981°C and the interpolation result of S-IDW at point 2 is 35.398°C, which deviates from the LST value more than the Spline result but less than the Kriging, IDW, and Natural results; point 10 behaves similarly. Overall, the statistical analysis and Pearson correlation analysis of the 5 interpolation methods show that the MAE, MAPE, and RMSE of S-IDW are closer to the ideal best-fit values than those of the other 4 interpolation methods. As far as RVAR is concerned, the 0.870 of Kriging and the 0.813 of Natural are closer to the ideal value than the 0.695 of S-IDW. In terms of Pearson correlation, the results of S-IDW, Kriging, IDW, Natural, and Spline interpolation are significantly correlated with LST at the 0.01 level (two-tailed); the correlation between the S-IDW results and LST is the strongest, with r = 0.906.

4. Conclusions

In this paper, we propose S-IDW, a novel spatial interpolation algorithm for meteorological parameters that considers geographic semantics. The geographical semantic similarity and the weight between known observation points and estimated points are considered comprehensively, which makes the interpolation result of IDW more reasonable. We selected 2 study areas with abundant land-use types to analyze the interpolation under different temperature conditions and used 4 statistical indicators to evaluate the interpolation accuracy. The interpolation results of the 5 interpolation methods were also compared by Pearson correlation analysis. The experimental results show that the accuracy of S-IDW is generally higher than that of the inverse-distance weighting method, Kriging, natural neighbor interpolation, and spline function interpolation, although some abnormal values and interpolation outliers remain. Comparing the five methods, the interpolation results of S-IDW are generally closer to the measured LST values than those of the other four. The MAE and RMSE of S-IDW are closer to the ideal best-fit values than those of the other 4 interpolation methods in all six experiments, whereas for MAPE and RVAR at least one other method performs better in several experiments; the correlation between the S-IDW results and LST is the strongest in most experiments. Under the above experimental conditions, the interpolation results of S-IDW are more accurate and stable.

Note that, after checking the sample points involved in the calculation, we found that semantic interpolation is slightly less effective than traditional numerical interpolation when many of the sample points share the same surface type. When the interpolation points are more homogeneous, semantic interpolation behaves similarly to numerical interpolation, and other interpolation methods have clear advantages in purely numerical interpolation. For example, the Kriging interpolation method is widely applicable and can better reflect a variety of terrain changes. The Spline interpolation method is suitable for gradually changing surfaces, such as temperature, elevation, groundwater level height, or pollution level. IDW interpolation is suitable for data with high density and uniform distribution. In our experiment, when the interpolation points are of a single type, the advantage of semantic interpolation is not obvious, and it can even perform worse than numerical interpolation. By contrast, when more types are present, the semantic interpolation method is clearly better.

However, there are still shortcomings in our study that need to be addressed in further research. First, for the future development of semantic interpolation, we hope to consider the continuity of time to make up for some missing data and to combine the time factor [9, 16] with the semantic interpolation method to study spatiotemporal semantic interpolation. In addition, we will attempt to integrate density, direction, elevation, and other influencing factors [6–10] of interpolation points into semantic interpolation and develop multifactor semantic interpolation methods. Moreover, to handle the complexity and uncertainty of predicting spatial attributes in most real-world problems, deep learning and artificial intelligence technology [16, 24], including logical and statistical learning algorithms, can be considered as a future extension of this work in the age of Big Data.

Data Availability

The S-IDW data used to support the findings of this study are available from the corresponding author upon request via email.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Junli Li and Ruijie Gan conceived and designed the new analytical approach. Wenjun Wu, Ruijie Gan, Junli Li, and Xiu Cao wrote the paper. Xinxin Ye, Jie Zhang, and Hongjiao Qu advised on the methods applied in the study. Ruijie Gan performed the experimental analyses. All authors read and approved the final manuscript.

Acknowledgments

This research was financially supported by the National Natural Science Foundation of China (Grant no. 41571400) and supported in part by the Open Research Fund Program of Anhui Province Key Lab of Farmland Ecological Conservation and Pollution Prevention.