Stochastic Systems: Modeling, Analysis, Synthesis, Control, and their Applications to Engineering
The Use of Geographically Weighted Regression for the Relationship among Extreme Climate Indices in China
The changing frequency of extreme climate events generally has profound impacts on our living environment and decision-makers. Based on the daily temperature and precipitation data collected from 753 stations in China during 1961–2005, the geographically weighted regression (GWR) model is used to investigate the relationship between the index of frequency of extreme precipitation (FEP) and other climate extreme indices including frequency of warm days (FWD), frequency of warm nights (FWN), frequency of cold days (FCD), and frequency of cold nights (FCN). Assisted by some statistical tests, it is found that the regression relationship has significant spatial nonstationarity and the influence of each explanatory variable (namely, FWD, FWN, FCD, and FCN) on FEP also exhibits significant spatial inconsistency. Furthermore, some meaningful regional characteristics for the relationship between the studied extreme climate indices are obtained.
1. Introduction
There is general agreement that changes in the frequency or intensity of extreme climate events are likely to exert a much greater impact on nature and humanity than shifts in the mean value. Starting from IPCC (1996), many scientists have stressed the importance of studying extreme climate events [3–5]. In the research field of extreme temperature and precipitation events, indices that are based on either fixed thresholds or relative thresholds are commonly used. To the best of our knowledge, most previous studies of climate extremes focus on individual extreme climate indices; investigations of the relationships between them are relatively rare.
As for the relationship between extreme climate indices, researchers generally assume that it is stationary over space and use an ordinary linear regression (OLR) model to analyze it. Nevertheless, an OLR model can only represent a global relationship and hardly takes into consideration the variations in relationships over space; in other words, space or location has seldom been incorporated explicitly. In this context, there has recently been a surge of work on the inclusion of spatial effects in climate models. A geographically weighted regression (GWR) model, which extends the traditional regression framework by allowing regression coefficients to vary with individual locations (spatial nonstationarity), is an effective method of utilizing spatial information to address this issue [8–13]. GWR produces locally linear regression estimates for every point in space. For this purpose, weighted least squares methodology is used, with weights based on the distance between each observation and all the others in the sample. GWR allows the exploration of variation of the parameters as well as the testing of the significance of this variation. The GWR technique has been applied to analyze spatial data in a number of areas such as geography, econometrics, epidemiology, and environmental science [14–16].
China is strongly influenced by the East Asian monsoon. During the winter half year, the climate is mostly cold and dry. Cold days and strong winds accompanied by dust storms are the major climate features, particularly in northern China. During the summer period, the rain belt moves gradually from south to north, bringing a hot and humid climate to eastern China. The regional characteristics of extreme climate are particularly prominent in China. The purpose of this paper is to analyze the spatially varying impacts of some temperature extreme indices on one precipitation extreme index in China. In this paper, relative thresholds based on the 1961–1990 base period were first used to build some extreme indices, namely, FEP (frequency of extreme precipitation), FWD (frequency of warm days), FWN (frequency of warm nights), FCD (frequency of cold days), and FCN (frequency of cold nights). The spatial distributions of these indices were then analyzed. In order to investigate the relationship among these indices, a GWR model was utilized to study how FEP was affected by the other indices. Moreover, two statistical tests were carried out to check our conjectures, and some promising results were obtained.
The rest of the paper is organized as follows. Section 2 presents the data source, gives the definitions of extreme climate indices used in this paper, and briefly outlines the method of GWR. Results for annual mean extreme climate indices over China are displayed in Section 3. Section 4 provides a conclusion.
2. Data and Method
2.1. Experimental Data
The experimental data sets used in this paper consist of daily maximum and minimum temperatures and daily precipitation observed at 753 meteorological stations in China from January 1, 1961 to December 31, 2005, which were provided by the National Meteorological Information Center of the China Meteorological Administration. To ensure that the study rests on reliable data, a station was retained only if the missing data in each month amounted to no more than three days. The data collected from the 504 stations (Figure 1) that comply with this requirement were therefore utilized in this work. The remaining missing values at these 504 stations were imputed by linear interpolation.
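The screening and imputation steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, missing days are assumed to be marked with `None`, and gaps at either end of a series are assumed to take the nearest observed value, a choice the paper does not specify.

```python
def month_is_usable(values, max_missing=3):
    """True if a month of daily values has at most max_missing gaps (None)."""
    return sum(v is None for v in values) <= max_missing


def interpolate_missing(values):
    """Fill None entries by linear interpolation between the nearest observed
    neighbours. Gaps at either end take the nearest observed value (assumption)."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled
```

A station would pass the screen only if `month_is_usable` holds for every month of its record; `interpolate_missing` is then applied to each retained series.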
2.2. Extreme Climate Indices
Numerous temperature indices have been used in previous studies of climate events. Some indices involve arbitrary thresholds, such as the number of hot days exceeding 35°C and summer days exceeding 25°C. As indicated by Manton et al., these are suitable for regions with little spatial variability in climate, but arbitrary thresholds are inappropriate for regions spanning a broad range of climates. In China, climates vary widely from the monsoon region in the eastern part to the westerly region in the northwestern part of the country, so no single temperature threshold would mark an extreme event in all regions. For this reason, some studies have used weather and climate indices based on statistical quantities such as the 10th (5th) or 90th (95th) percentile [20, 21]; detailed information can be found in the European Climate Assessment & Dataset (ECA&D) Indices List (http://www.knmi.nl/). Percentile-based temperature indices apply the same upper and lower percentiles in all regions, while the corresponding absolute thresholds vary from site to site. The same indices have also been used in a regional climate study of the Caribbean region.
As this study covers a broad region in China, the climate indices chosen are based on the 10th and 90th percentiles. The extreme climate indices studied in this paper include FEP, FWD, FWN, FCD, and FCN, whose definitions are described in detail in Table 1. The relative (percentile) thresholds of these indices were calculated from the 1961–1990 base period. For each station, the values of FEP, FWD, FWN, FCD, and FCN were then averaged over the period 1961–2005; the averages are still denoted as FEP, FWD, FWN, FCD, and FCN in order to facilitate the following discussion.
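As an illustration of how a percentile-based frequency index might be computed, the Python sketch below counts, for each year, the days on which a daily series lies beyond a base-period percentile and then averages the counts over all years. It is a simplified sketch under stated assumptions: the exact definitions are those of Table 1, which may differ (for example, ECA&D-style indices use calendar-day percentiles), a single threshold is taken over the whole base period, and all function names are hypothetical.

```python
import math


def percentile(sample, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(sample)
    if not xs:
        raise ValueError("empty sample")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def mean_annual_exceedances(daily_by_year, base_years, q=90.0, above=True):
    """Average number of days per year beyond the base-period q-th percentile.

    daily_by_year maps a year to its list of daily values (e.g. daily maximum
    temperature). With above=False, days below the threshold are counted.
    """
    base = [v for year in base_years for v in daily_by_year[year]]
    threshold = percentile(base, q)  # one threshold for all days (simplification)
    counts = []
    for year in sorted(daily_by_year):
        days = daily_by_year[year]
        if above:
            counts.append(sum(v > threshold for v in days))
        else:
            counts.append(sum(v < threshold for v in days))
    return sum(counts) / len(counts)
```

Under these assumptions, a warm-day frequency would use daily maximum temperature with `q=90, above=True`, and a cold-night frequency would use daily minimum temperature with `q=10, above=False`.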
2.3. Geographically Weighted Regression (GWR)
The technique of linear regression estimates parameters that link the explanatory variables to the response variable. However, when this technique is applied to spatial data, questions arise concerning the stationarity of these parameters over space. In "normal" regression, it is generally assumed that the modeled relationship holds everywhere in the study area; that is, the regression parameters are "whole-map" statistics. In many situations this is not the case, however, as mapping the residuals (the difference between the observed and predicted data) may reveal. The realization in the statistical and geographical sciences that a relationship between an explanatory variable and a response variable in a linear regression model is not always constant across a study area has led to the development of regression models allowing for spatially varying coefficients. Many different solutions have been proposed for dealing with spatial variation in the relationship. One of them, developed by Brunsdon et al., has been labelled geographically weighted regression (GWR). It provides an elegant and easily grasped means of modeling such relationships by incorporating the spatial characteristics of the data, allowing the regression coefficients to depend on covariates such as the longitude and latitude of the meteorological stations. Specifically, it is a nonparametric model of spatial drift that relies on a sequence of locally linear regressions to produce estimates for every point in space by using a subsample of data from nearby observations. That is to say, this technique allows the modeling of relationships that vary over space by introducing distance-based weights to provide estimates for each variable and each geographical location. Thus the spatial variation of the regression relationship can be effectively analyzed, and the inherent regularities of the spatial data can be better understood through the estimated coefficients at different locations.
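An ordinary linear regression (OLR) model can be expressed by
$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
where $y_i$, $i = 1, 2, \ldots, n$, are the observations of the response variable $Y$, $\beta_j$ ($j = 0, 1, \ldots, p$) represents the regression coefficients, $x_{ij}$ is the $i$th value of the explanatory variable $X_j$, and $\varepsilon_i$ are normally distributed error terms with zero mean and constant variance.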
In the GWR model, the global regression coefficients are replaced by local parameters
$$y_i = \beta_0(u_i, v_i) + \sum_{j=1}^{p} \beta_j(u_i, v_i)\, x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{2.2}$$
where $(u_i, v_i)$ denotes the longitude and latitude coordinates of the $i$th meteorological station, $(y_i; x_{i1}, x_{i2}, \ldots, x_{ip})$ represent the observed values of the response and explanatory variables at $(u_i, v_i)$, $\beta_0(u_i, v_i)$ is the intercept, and $\beta_j(u, v)$ ($j = 1, 2, \ldots, p$) are unknown coefficient functions of spatial locations, which represent the strength and type of relationship that the $j$th explanatory variable $X_j$ has to the response variable $Y$. Additionally, $\varepsilon_i$ are error terms which are generally assumed to be independent and identically distributed variables with mean 0 and common variance $\sigma^2$. It is worth noticing that the OLR model is actually a special case of the GWR model where $\beta_j(u_i, v_i)$ ($j = 0, 1, \ldots, p$) are constant for all $i$.
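The coefficient function vector $\beta(u_i, v_i) = (\beta_0(u_i, v_i), \beta_1(u_i, v_i), \ldots, \beta_p(u_i, v_i))^T$ for the $i$th observation in GWR can be estimated via the locally weighted least square procedure as
$$\hat{\beta}(u_i, v_i) = \left(X^T W(u_i, v_i) X\right)^{-1} X^T W(u_i, v_i) Y, \tag{2.3}$$
where $X$ is the $n \times (p+1)$ design matrix whose $i$th row is $(1, x_{i1}, \ldots, x_{ip})$, $Y = (y_1, y_2, \ldots, y_n)^T$, and
$$W(u_i, v_i) = \mathrm{diag}\left(K_h(d_{i1}), K_h(d_{i2}), \ldots, K_h(d_{in})\right) \tag{2.5}$$
is a diagonal weight matrix, ensuring that observations near to the location $(u_i, v_i)$ have greater influence than those far away. Here, $d_{ij}$ denotes the distance between two observed locations $(u_i, v_i)$ and $(u_j, v_j)$, which can be calculated as
$$d_{ij} = R \arccos\left(\sin v_i \sin v_j + \cos v_i \cos v_j \cos(u_i - u_j)\right),$$
where $R$ is the earth radius, namely, 6371 kilometers. In (2.5), $K_h(\cdot) = K(\cdot/h)/h$ with $K(\cdot)$ being the Gaussian kernel function and $h$ being the bandwidth, which can be estimated by some data-driven procedures such as the cross-validation (CV) method, the generalized cross-validation (GCV) procedure, or the corrected Akaike information criterion ($\mathrm{AIC}_c$). In this paper, the CV method was employed to select the optimal $h$, which was chosen to minimize
$$\mathrm{CV}(h) = \sum_{i=1}^{n} \left(y_i - \hat{y}_{(i)}(h)\right)^2, \tag{2.8}$$
where $\hat{y}_{(i)}(h)$ is the fitted value of $y_i$ under bandwidth $h$ with the observation at location $(u_i, v_i)$ omitted from the fitting process.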
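The distance and weighting step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the spherical law of cosines is assumed for the great-circle distance, the Gaussian kernel is written without its normalising constant (a constant factor cancels in weighted least squares), and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # earth radius stated in the text


def great_circle_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    c = (math.sin(phi1) * math.sin(phi2)
         + math.cos(phi1) * math.cos(phi2) * math.cos(lam1 - lam2))
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return EARTH_RADIUS_KM * math.acos(c)


def gaussian_weight(distance_km, bandwidth_km):
    """Unnormalised Gaussian kernel weight for a given distance and bandwidth."""
    return math.exp(-0.5 * (distance_km / bandwidth_km) ** 2)


def weight_row(i, coords, bandwidth_km):
    """Diagonal of the weight matrix for station i.

    coords is a list of (longitude, latitude) pairs in degrees.
    """
    lon_i, lat_i = coords[i]
    return [gaussian_weight(great_circle_km(lon_i, lat_i, lon_j, lat_j),
                            bandwidth_km)
            for lon_j, lat_j in coords]
```

The weight is 1 at the regression point itself and decays smoothly with distance, so stations several bandwidths away contribute almost nothing to the local fit.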
Although GWR is very appealing in analyzing spatial nonstationarity, from the statistical viewpoint, two critical questions still remain. One is the goodness-of-fit test, that is, an OLR model is compared to a GWR model to see which one provides the better fit. Usually, a GWR model can fit a given data set better than an OLR model. However, the simpler a model, the easier it can be applied and interpreted in practice. If a GWR model does not perform significantly better than an OLR model, there is no significant drift in any of the model parameters, and we will prefer an OLR model in practice. On the other hand, if a GWR model significantly outperforms an OLR model, we will be concerned with the second question, that is, whether each coefficient function estimate exhibits significant spatial variation over the studied area [11, 25]. If the answer to this question is positive, the characteristics of the data will be investigated in more detail.
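To compare the goodness-of-fit of a GWR model and an OLR model, a simplified procedure is summarized as follows.
(1) Formulate the hypothesis
$$H_0: \beta_j(u_i, v_i) = \beta_j \ \text{for all } i \text{ and } j \quad \text{versus} \quad H_1: \text{not all } \beta_j(u, v) \text{ are constant}.$$
(2) Construct the test statistic
$$T = \frac{Y^T\left[(I - H) - (I - L)^T (I - L)\right] Y}{Y^T (I - L)^T (I - L)\, Y}.$$
Here, $H = X(X^T X)^{-1} X^T$, $I$ is an identity matrix of order $n$, and $L$ is an $n \times n$ matrix whose $i$th row is $x_i^T \left(X^T W(u_i, v_i) X\right)^{-1} X^T W(u_i, v_i)$ with $x_i^T = (1, x_{i1}, \ldots, x_{ip})$. If $H_0$ is true, the test statistic is to be
$$T = \frac{\varepsilon^T\left[(I - H) - (I - L)^T (I - L)\right] \varepsilon}{\varepsilon^T (I - L)^T (I - L)\, \varepsilon}, \tag{2.12}$$
where $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T$.
(3) Test the hypothesis. The $p$ value should be calculated as
$$p = P_{H_0}(T > t) = P\left(\varepsilon^T\left[(I - H) - (t + 1)(I - L)^T (I - L)\right] \varepsilon > 0\right), \tag{2.13}$$
where $t$ is the observed value of $T$ in (2.12). Since it is difficult to derive the null distribution of $T$ theoretically, the three-moment approximation procedure [26, 27], devised to approximate the distribution of a quadratic form of normal variables such as $\varepsilon^T\left[(I - H) - (t + 1)(I - L)^T (I - L)\right] \varepsilon$, was used to compute the $p$ value defined in (2.13). Given a significance level $\alpha$, if $p < \alpha$, the null hypothesis should be rejected. Otherwise, we may conclude that the GWR model cannot improve the fit significantly in comparison with the OLR model.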
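In order to test whether each coefficient function estimate exhibits significant variation over the studied area, we employed the following method. Its main steps are summarized as follows.
(a) For a given $k$ ($k = 0, 1, \ldots, p$), formulate the hypothesis
$$H_0: \beta_k(u_1, v_1) = \beta_k(u_2, v_2) = \cdots = \beta_k(u_n, v_n) \quad \text{versus} \quad H_1: \text{not all } \beta_k(u_i, v_i) \text{ are equal}.$$
(b) Construct the test statistic
$$T_k = \frac{Y^T B_k^T \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right) B_k Y}{Y^T (I - L)^T (I - L)\, Y},$$
where $B_k$ is the $n \times n$ matrix whose $i$th row is $e_k^T \left(X^T W(u_i, v_i) X\right)^{-1} X^T W(u_i, v_i)$, and $I$ and $L$ are the identity matrix of order $n$ and the GWR hat matrix, respectively. Here, $\mathbf{1}$ is an $n \times 1$ column vector with unity for each element, and $e_k$ is a $(p+1) \times 1$ column vector which takes value 1 for the $(k+1)$th element and zero for the other elements. Under the null hypothesis $H_0$, the test statistic is simplified as
$$T_k = \frac{\varepsilon^T B_k^T \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}^T\right) B_k\, \varepsilon}{\varepsilon^T (I - L)^T (I - L)\, \varepsilon}. \tag{2.17}$$
(c) Test the hypothesis. The $p$ value is
$$p_k = P_{H_0}(T_k > t_k) = P\left(\varepsilon^T\left[B_k^T \left(I - \tfrac{1}{n} \mathbf{1}\mathbf{1}^T\right) B_k - t_k (I - L)^T (I - L)\right] \varepsilon > 0\right), \tag{2.18}$$
where $t_k$ is the observed value of $T_k$ in (2.17). Similar to the goodness-of-fit test, the three-moment approximation procedure was used to derive the $p$ value defined in (2.18). Given a significance level $\alpha$, if $p_k < \alpha$, reject $H_0$; accept $H_0$ otherwise.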
3. Analysis of Results
In this part, we will carry out numerical experiments for the OLR model and GWR model. All programs are written in Matlab.
3.1. Spatial Distributions of Extreme Climate Indices
Based on the values of FWD, FWN, FCD, FCN, and FEP, Figure 2 presents the spatial distributions for each of them over the 504 stations in China.
As shown in Figure 2, FWD, FWN, FCD, FCN, and FEP exhibit some regional features. FWD generally ranges from 16 to 29 times per year, with the larger values mainly located in the north and the east of China. FWN ranges from 18 to 35 times per year; taking the Yangtze River as the boundary, FWN values in the north are generally larger than those in the south. FCD ranges from 14 to 26 times per year; in particular, it takes small values of about 14–18 times per year in most parts of northwest China. FCN is about 13–28 times per year and takes small values in southern China. Furthermore, FEP values are between 9 and 33 times per year. In most of the country, FEP varies from 25 to 33 times per year; only at some stations in southern Xinjiang and Tibet does it lie between 9 and 17 times per year.
3.2. The Fitted Geographically Weighted Regression Model
In order to clarify the relationship among these extreme climate indices at the 504 stations in China, so that useful information can be provided to decision-makers to help them reduce the damage caused by extreme weather, a GWR model was fitted by considering FEP as the response variable $Y$ and FWD, FWN, FCD, and FCN as the explanatory variables $X_1$, $X_2$, $X_3$, and $X_4$, respectively. Letting $n$ be equal to 504 and $p$ equal to 4 and letting $(y_i; x_{i1}, x_{i2}, x_{i3}, x_{i4})$ be the observations of the variables at the location $(u_i, v_i)$, the model (2.2) can be expressed as
$$y_i = \beta_0(u_i, v_i) + \sum_{j=1}^{4} \beta_j(u_i, v_i)\, x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, 504, \tag{3.1}$$
based on the data collected from the 504 stations.
When we apply a fixed Gaussian kernel function, the minimum score of (2.8) is obtained when the bandwidth $h$ equals approximately 240 km. Thus, the weighting matrix $W(u_i, v_i)$ is estimated with this bandwidth. Based on (2.3), the coefficient estimates $\hat{\beta}_j(u_i, v_i)$ ($j = 0, 1, \ldots, 4$; $i = 1, 2, \ldots, 504$) are calculated by the locally weighted least square approach. Hence, the strength and type of relationship that FWD (FWN, FCD, FCN) has with FEP over the 504 stations in China can be studied.
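The fitting and bandwidth-selection steps can be sketched in Python for the simplified case of a single explanatory variable, where the locally weighted least squares solution has a closed form. This is an illustrative sketch, not the authors' Matlab programs: the paper's model has four explanatory variables, the distance matrix `dist` is assumed to be precomputed in kilometres, and all function names are hypothetical.

```python
import math


def local_fit(i, x, y, dist, h, leave_out=False):
    """Weighted least squares fit of y = b0 + b1 * x centred on station i.

    dist[i][j] is the distance between stations i and j, h is the bandwidth.
    With leave_out=True the observation at station i itself is omitted.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for j in range(len(x)):
        if leave_out and j == i:
            continue
        w = math.exp(-0.5 * (dist[i][j] / h) ** 2)  # Gaussian kernel weight
        sw += w
        swx += w * x[j]
        swy += w * y[j]
        swxx += w * x[j] * x[j]
        swxy += w * x[j] * y[j]
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        raise ValueError("local design is singular; try a larger bandwidth")
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1


def cv_score(x, y, dist, h):
    """Leave-one-out cross-validation score: sum of squared prediction errors."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = local_fit(i, x, y, dist, h, leave_out=True)
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


def select_bandwidth(x, y, dist, candidates):
    """Pick the candidate bandwidth with the smallest CV score."""
    return min(candidates, key=lambda h: cv_score(x, y, dist, h))
```

Calling `local_fit(i, x, y, dist, h)` for every station `i` yields one intercept and one slope per location, which is what makes the coefficient surfaces mappable. A grid search over `candidates` is only one way to minimize the CV score.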
Because Wheeler [28–30] raised the issue of multicollinearity, the correlation coefficients of the independent variables and those of the GWR coefficient estimates are presented in Tables 2 and 3, respectively.
As shown in Tables 2 and 3, correlation coefficients of the independent variables as well as that of the GWR coefficient estimates are all not large, except for that between and , as well as and , whose absolute values are more than 0.5. It indicates that has a positive correlation with , while it has a negative correlation with . We ignore the correlation between the independent variables in this paper.
After conducting the goodness-of-fit test, the computed $p$ value is smaller than the significance level 0.05. Thus, the GWR model describes the regression relationship significantly better than the OLR model, which indicates that the relationship between FEP and FWD, FWN, FCD, and FCN has spatial nonstationarity. Define
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
to measure the goodness of fit of the regression relationship on the given data set, where $\hat{y}_i$ is the fitted value of $y_i$ and $\bar{y}$ is the mean of the observations. The $R^2$ values for the OLR and GWR models are 0.3953 and 0.7750, respectively, which indicates that the GWR model captures a larger share (77.50%) of the variance of FEP, based on the climate indices FWD, FWN, FCD, and FCN, than the OLR model (39.53%). The prediction errors (i.e., residual errors) for the OLR and GWR models are presented in Figure 3, which shows that the prediction error of the GWR model and its standard error are both lower than those of the OLR model.
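The goodness-of-fit measure can be computed as in the short Python sketch below. It assumes the usual coefficient of determination, which is consistent with the "share of variance captured" interpretation given above; the function name is hypothetical.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - RSS / TSS."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("observed and fitted must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # residual SS
    tss = sum((o - mean_obs) ** 2 for o in observed)           # total SS
    if tss == 0.0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - rss / tss
```

Applied once to the OLR fitted values and once to the GWR fitted values at the same stations, this gives two directly comparable numbers.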
Furthermore, the statistical significance tests for the variations of the coefficient functions are carried out. The obtained results show that all the regression coefficient estimates vary significantly with location, that is, the influence of each explanatory variable (viz., FWD, FWN, FCD, and FCN) on the response variable FEP has spatial inconsistency. All $p$ values of the relevant tests for the GWR model (3.1) are presented in Table 4.
In order to visualize these spatial inconsistencies, Figure 4 shows the geographic distributions of the estimated GWR coefficient functions in China. As the intercept estimate $\hat{\beta}_0$ carries little interpretive meaning, its plot is omitted here.
Figure 4(a) shows that the values of $\hat{\beta}_1$ (the coefficient of FWD) are between −3.5 and 2.6. Negative values of $\hat{\beta}_1$ can be observed in most of mainland China, and the largest absolute values are located in the northern and western parts of the Xinjiang region. A few stations with positive values of $\hat{\beta}_1$ are concentrated in the southern part of Tibet, Gansu, Chongqing, and the eastern part of north China and east China.
As Figure 4(b) shows, the values of $\hat{\beta}_2$ (the coefficient of FWN) are between −1.3 and 0.28, and some stations with positive values of $\hat{\beta}_2$ are concentrated in Jilin, northern Inner Mongolia, the eastern coast, and Hainan. For China as a whole, however, many areas show negative values, especially the Xinjiang and Tibet regions as well as the middle Yellow River valley and the southern part of Northeast China.
From Figure 4(c), it can be seen that the values of $\hat{\beta}_3$ (the coefficient of FCD) are between −1.3 and 4.6. Its value is positive in most parts of the country, and it is larger in western China than in eastern China. Scattered stations with negative values can be found in the northern part of Inner Mongolia and in south China, especially in Yunnan and Guangdong Province.
As for $\hat{\beta}_4$ (the coefficient of FCN), it can be found in Figure 4(d) that its values are between −2.3 and 0.57. Negative values occur in western and central China, while positive values can be found in the north of northeast China, the north of north China, and south China.
On the basis of the above analysis, some regional characteristics for the relationship between the studied extreme climate indices can be observed. In western China, FEP increases with the increase of FCD, while it decreases with the increase of FWD, FWN, and FCN. In southern China, FEP increases with the increase of FCN, while it decreases with the increase of FWD, FWN, and FCD. In the northern part of northeast China, FEP increases with the increase of FCD and FCN, while it decreases with the increase of FWD and FWN. The impacts of FCN and FCD on the FEP are roughly the opposite over almost all China.
4. Conclusions
Based on the Chinese daily temperature and precipitation data collected at 753 meteorological stations from 1961 to 2005, the spatial distributions of the numbers of days that experience extreme temperature or precipitation events (i.e., FEP, FWD, FWN, FCD, and FCN) in China are described, and the relationship among them is investigated by a GWR model. The main conclusions can be summarized as follows.
(1) FWD, FWN, FCD, FCN, and FEP exhibit different spatial variations. FWD has larger values of about 24–29 times per year, mainly in northeast China. North of the Yangtze River, FWN has larger values of 24–35 times per year. FCD has larger values of about 18–26 times per year in most parts of China except the northwest. As for FCN, most of China has larger values of about 18–28 times per year, except for the south. Except at some stations in southern Xinjiang and Tibet, FEP has larger values of 17–33 times per year.
(2) With respect to how FWD, FWN, FCD, and FCN affect FEP, the GWR model is significantly superior to the OLR model at the significance level 0.05. Furthermore, the statistical tests indicate that the influence of each explanatory variable (viz., FWD, FWN, FCD, and FCN) on FEP has spatial inconsistency.
(3) Some regional features are detected for the relationship between the studied extreme climate indices. In western China, FCD has a positive effect on FEP, which is contrary to that of FWD, FWN, and FCN. In southern China, by contrast, FCN has a positive effect on FEP, while FWD, FWN, and FCD have negative effects. The effects of FCD as well as FCN on FEP are positive in the northern part of Northeast China, while those of FWD and FWN are negative. Meanwhile, FCN and FCD have opposite influences on FEP over most of China.
This work is supported by the National Natural Science Foundation of China (Grant nos. 60675013, 10531030) and the National Basic Research Program of China (973 Program) (Grant no. 2007CB311002).
J. T. Houghton, L. G. M. Filho, B. A. Callander, N. Harris, A. Kattenberg, and K. Maskell, Eds., Climate Change 1995: The Science of Climate Change. Contribution of Working Group I to the Second Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge University Press, Cambridge, UK, 1996.
C. Brunsdon, A. S. Fotheringham, and M. E. Charlton, “Geographically weighted regression: a method for exploring spatial nonstationarity,” Geographical Analysis, vol. 28, no. 4, pp. 281–298, 1996.
A. S. Fotheringham, M. Charlton, and C. Brunsdon, “The geography of parameter space: an investigation of spatial non-stationarity,” International Journal of Geographical Information Systems, vol. 10, no. 5, pp. 605–627, 1996.
A. Fotheringham, C. Brunsdon, and M. Charlton, Geographically Weighted Regression: The Analysis of Spatially Varying Relationships, Wiley, Chichester, UK, 2002.
C. Brunsdon, M. Aitkin, S. Fotheringham, and M. Charlton, “A comparison of random coefficient modelling and geographically weighted regression for spatially non-stationary regression problems,” Geographical and Environmental Modelling, vol. 3, no. 1, pp. 47–62, 1999.