Abstract

This paper presents space-time kriging within a multi-Gaussian framework for time-series mapping of particulate matter less than 10 μm in aerodynamic diameter (PM10) concentration. To account for the spatiotemporal autocorrelation structures of monitoring data and to model the uncertainties attached to the prediction, conventional multi-Gaussian kriging is extended to the space-time domain. Multi-Gaussian space-time kriging presented in this paper is based on decomposition of the PM10 concentrations into deterministic trend and stochastic residual components. The deterministic trend component is modelled and regionalized using the temporal elementary functions. For the residual component which is the main target for space-time kriging, spatiotemporal autocorrelation information is modeled and used for space-time mapping of the residual. The conditional cumulative distribution functions (ccdfs) are constructed by using the trend and residual components and space-time kriging variance. Then, the PM10 concentration estimate and conditional variance are empirically obtained from the ccdfs at all locations in the study area. A case study using the monthly PM10 concentrations from 2007 to 2011 in the Seoul metropolitan area, Korea, illustrates the applicability of the presented method. The presented method generated time-series PM10 concentration mapping results as well as supporting information for interpretations, and led to better prediction performance, compared to conventional spatial kriging.

1. Introduction

Outdoor air pollution has been known as one of the risk factors that affect human health directly and/or indirectly [14]. In Korea, it is reported that long-term exposure to ambient air pollution has a reasonable association with tuberculosis, cardiovascular diseases, and preterm delivery [57]. Thus, periodic monitoring and management of air pollution are required for exposure assessment for effective health management.

In Korea, several air pollutants including particulate matter less than 10 μm in aerodynamic diameter (PM10), sulfur dioxide (SO2), carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), and particulate matter less than 2.5 μm in aerodynamic diameter (PM2.5) have been periodically collected at several monitoring stations. Based on this real-time monitoring of air pollution, air quality levels are provided to the public domain [8]. Due to the few stations, however, it is very difficult to analyze the spatial characteristics and spatiotemporal dynamics of air pollutants over a wide study area during the predefined time interval [9]. To overcome these difficulties, spatial interpolation or prediction is routinely applied to the sparse air pollutants observations to obtain exhaustive concentration values over the study area.

Among various spatial interpolation methods, geostatistical kriging has been widely applied to spatial interpolation tasks, due to its ability to account for spatial autocorrelation structures inherent to sample data and to integrate auxiliary data [10, 11]. When kriging is applied for spatial interpolation, spatial autocorrelation structures are quantified by variogram which denotes the spatial variability between samples as a function of distance [10]. If only sparsely sampled data are available, distinct spatial autocorrelation structures may not be captured from the sample data. As a result, spatial interpolation results would not show better prediction performance, compared to other deterministic interpolation methods such as inverse distance weighting. If data are collected at a limited number of locations but continuously in a time domain such as air pollutants, temperature, and precipitation, temporal autocorrelation information may complement the lack of spatial autocorrelation information and improve the prediction performance for spatial interpolation tasks. Regarding the processing of this kind of space-poor but time-rich data, conventional geostatistical kriging, which was developed for considering spatial autocorrelation information only, can be extended to space-time kriging [12]. Space-time kriging or simulation has been applied to time-series mapping of various environmental variables such as air pollutants, temperature, and precipitation [1316]. Despite its great potential for time-series mapping, however, uncertainties attached to the interpolation have not been fully accounted for. Most approaches have focused on the generation and interpretation of spatiotemporal mapping results. To the author’s knowledge, very few studies have been conducted using stochastic simulation [14] and local uncertainty assessment based on space-time kriging that does not require heavy computational cost is not fully considered. Recently, Park [17] presented a multi-Gaussian framework for time-series mapping of environmental variables. As the case study in [17] was carried out in the very small area, however, its applicability should be thoroughly investigated.

The main objective of this paper is to present space-time kriging capable of providing uncertainty assessment information and time-series mapping of PM10 concentrations. Within a spatial time-series framework [14, 17], conventional spatial multi-Gaussian kriging is extended to a space-time domain and its potential is illustrated via a case study of monthly PM10 concentration mapping in the Seoul metropolitan area, Korea.

2. Study Area and Data

A case study was conducted in the Seoul metropolitan area of Korea which includes 66 provincial districts in Seoul city, Incheon city, and Gyeonggi province (Figure 1). The metropolitan area covers approximately 11.78% of the entire land area of Korea and accounts for 49.07% of the entire population of Korea, as of 2014 and 2010, respectively [7, 18]. The study area comprised various types of land-covers, including the large urban areas of Seoul city and Incheon city located in the central and western parts of the study area, and the forests and agricultural lands (78.46% of the whole study area) located in the northern and eastern parts of the study area.

A monthly PM10 concentration dataset collected at 94 monitoring stations in the study area from January 2007 to December 2011 (60 months) was downloaded from the AirKorea website [8] and used for the case study. As shown in Figure 1, each district in the Seoul metropolitan city includes one station, but there are very few monitoring stations in other districts in Gyeonggi province, which comprised nearly half (47.74%) of the study area. This location information on the monitoring stations implies that relatively large uncertainties may be attached to the sparsely sampled locations. Within the administrative boundaries, 500 m interval grid points were generated and PM10 concentrations were mapped at these points. It should be noted that the main purpose of this case study is to exemplify the analytical procedures and potential of the geostatistical approach presented in this paper, not to reveal detailed local characteristics of PM10 concentrations in the study area.

3. Method

Figure 2 illustrates the entire procedure for the multi-Gaussian spatial time-series approach presented in this paper.

3.1. Multi-Gaussian Spatial Time-Series Approach

The time-series PM10 concentration at each observation station was regarded as spatial time-series data and these spatial time-series datasets were modelled as spatially correlated time-series random function models [14, 17]. Suppose that denotes the spatially indexed time-series PM10 concentrations within the discrete time domain and the continuous space domain . In geostatistics, the uncertainty at any unmonitored location is usually modeled through the conditional cumulative distribution function (ccdf) which states the probability that the unknown attribute value does not exceed a certain threshold value [10]. If this ccdf is extended to a space-time domain, the uncertainty at a certain location and time is modeled via the following ccdf of :where “” denotes conditioning to the local neighboring data both spatially and temporally.

In this study, multi-Gaussian space-time kriging was adopted for ccdf modeling as a parametric approach. Under the assumption that the ccdf at any location follows a Gaussian or normal distribution, the multi-Gaussian approach aims to predict the mean and variance, which are the two parameters that define the Gaussian ccdf [10]. To satisfy the assumption of a Gaussian ccdf, all monitoring data are transformed into a Gaussian space by normal score transform [11]. Then, the transformed spatial time-series dataset () will follow a standard Gaussian distribution with a mean of 0 and a standard deviation of 1, and its ccdf is expressed aswhere is a standard Gaussian cumulative distribution function. and are mean and variance values obtained by using the given neighboring data, respectively. These two values, which are required to define the Gaussian distribution, correspond to the simple kriging estimate and simple kriging variance, respectively.

To fully characterize the ccdf in (2), PM10 concentrations were modelled by decomposing the data into a deterministic trend component and a stochastic residual component [14, 17, 19]. The trend component is related to overall PM10 concentration patterns such as seasonal or regional variations. Meanwhile, the residual component, which is regarded as a second-order stationary random variable, includes local variations of PM10 concentration at a certain time and location and is the main target of geostatistical analysis. Since the multi-Gaussian approach was adopted in this study, this decomposition was applied to the normal score transformed dataset like where and denote the trend and residual components in a Gaussian space, respectively.

3.2. Trend Component Modeling

The trend component in (3) was modelled by weighted linear combination of elementary temporal profile functions presented in [14]. Suppose that is a normal score transformed PM10 concentration at a certain monitoring station during the time period of 60 months (, ). The trend component at the th monitoring station () is expressed as a weighted sum of elementary temporal profile functions. Many elementary temporal profile functions can be applied for trend modeling. For example, periodicity and linear trend can be accounted for by combining linear and trigonometric functions. In this study, a spatially averaged time-series set computed from 94 monitoring stations [14] was used as the elementary temporal profile function for its simplicity:where and correspond to the intercept and slope, respectively, in linear regression and are related to the similarity between a normal score transformed time-series at a certain monitoring station and the spatially averaged time-series set.

The above two regression coefficients are only available at monitoring stations after linear regression. Thus, they should be interpolated at all grid points in the study area for all time intervals in order to obtain the trend component distributions. If reasonable correlations are observed between two coefficients, simple cokriging, which can account for both the autocorrelation structures of the two coefficients and the cross-correlation structure between them, can be applied for spatial interpolation. Otherwise, univariate kriging or another deterministic interpolation method is independently applied to each coefficient. After regionalization or interpolation of the two coefficients, a trend component over the study area at each month was obtained by combining the interpolated coefficients with the spatially averaged time-series set.

3.3. Residual Component Modeling

The residual components, which are regarded as the second-order stationary random variable and subject to the main geostatistical analysis, were modelled via space-time kriging.

Spatiotemporal correlation structures required for the application of space-time kriging were first quantified via variogram modeling. The experimental spatiotemporal variogram is defined aswhere and denote spatial lag distance and temporal interval, respectively, and is the number of data pairs within the class of spatiotemporal lags.

Among various spatiotemporal variogram models, a product-sum variogram model in [20] was applied to model the experimental spatiotemporal variogram. The product-sum variogram model is expressed as the combination of marginal spatial and temporal variograms (i.e., purely spatial variogram and purely temporal variogram) as [20]where is the marginal spatial variogram and the marginal temporal variogram. and are the sill values of and , respectively.

Some parameters in (6) are also defined as [21]where is a sill value of the spatiotemporal variogram model.

After completion of the spatiotemporal variogram modeling, the residual components at all grid points in the study area for all time intervals were obtained by a linear combination of neighboring sample residual values within the predefined spatiotemporal search window via simple space-time kriging. The simple space-time kriging estimate and variance were computed aswhere is a simple kriging weighting value assigned to the neighboring sample residuals and is a spatiotemporal covariance value between the estimation grid and neighboring sample locations. Since the residuals have a zero mean value, a constant mean value required for simple kriging is set to 0 and does not appear in (8).

3.4. ccdf Modeling

The space-time kriging estimate and variance for the residuals were used for fully characterizing the ccdf in a Gaussian space in (2). More specifically, the residual estimate at any grid point was added to the trend component at the corresponding grid point and then used as a mean value of the ccdf. Since the trend component was assumed to be deterministic, the space-time kriging variance was directly used as the variance value of the ccdf.

The ccdfs at all grid points in the study area were already fully known after applying the normal score back-transform. Then, certain statistics could be used as PM10 concentration estimates and uncertainty measures. The PM10 concentration value was empirically estimated from the expected value of the corresponding normal score back-transformed quantiles in the original space after discretizing the ccdf with many quantiles in the Gaussian space, as presented in [22, 23]where denotes the normal score back-transformed values of the quantiles in the Gaussian space.

Like the computation of the expected value of the ccdf, conditional variance was also computed using (10) [23] and used as a measure of uncertainty:

Unlike kriging variance that provides only the proximity from the sample data, conditional variance can provide information on the spread of the conditional probability distribution function or the steepness of the ccdf and thus can be used as a quantitative measure of the uncertainty. The larger the conditional variance, the greater the uncertainty attached to the prediction.

In addition to the computation of PM10 concentration estimates and uncertainty measures from the ccdf, a probability of exceeding a certain critical concentration level can be easily computed. Based on this probability and the PM10 concentration estimates, misclassification risks, which are associated with the classification of the study areas into hazardous and safe classes, can be computed and then used for decision supporting information.

Two misclassification risks and were considered in this study. Risk , which is the probability of wrongly classifying any location as hazardous (i.e., false positive), is defined as [10]where is a critical PM10 concentration level.

Risk , which is a probability of wrongly classifying any location as safe (i.e., false negative), is given as [10]

3.5. Validation

The prediction performance of multi-Gaussian space-time kriging was quantitatively evaluated by leave-one-out cross validation since kriging is an exact interpolator. After one monitoring station was temporarily eliminated, kriging using the remaining stations was conducted to predict the PM10 concentration at the eliminated monitoring station. This procedure was repeated for all monitoring stations. Then the prediction performance was quantified using the linear correlation coefficient between the true PM10 concentration and the mean absolute error (MAE).

4. Results and Discussion

4.1. Trend Component Modeling Result

After preparing time-series PM10 concentration datasets, normal score transform was first applied using GSLIB [11]. Figure 3 shows a spatially averaged time-series that was computed from normal score transformed PM10 concentrations and used as the elementary temporal profile function. During the 5-year period from 2007 to 2011, a decreasing pattern was observed from April to August, but the increase in PM10 concentration commenced in fall and continued to winter. However, the winter PM10 concentration exhibited a different pattern each year. This overall pattern may be related to yellow dust in spring and meteorological factors such as wind, relative humidity, and precipitation. In winter and spring, the relatively stable atmospheric condition with high relative humidity and yellow dust contributes to the increase in PM10 concentration, respectively; meanwhile, the low PM10 concentration in summer is due to the washout effect by precipitation, as reported in previous studies [24, 25].

Regression between the spatially averaged time-series set and the time-series set at each monitoring station was conducted and two regression coefficients are presented in Figure 4. If the intercept and slope values approach zero and one, respectively, the time-series at the monitoring station is very similar to the spatially averaged time-series set. The different similarities at the monitoring stations led to differences of trend components, and hence the residual components, which are the main targets of space-time kriging, also varied across the study area.

The next step for the regionalization or estimation of trend components at unmonitored locations was to interpolate the intercept and slope values in Figure 4. The linear correlation coefficient between the two coefficients at 94 monitoring stations was very low (−0.08), so an independent univariate ordinary kriging was applied to the two coefficients. By combining the interpolated regression coefficients with the spatially averaged time-series set in Figure 3, the trend components during the considered time period were retrieved and used for ccdf modeling.

4.2. Residual Component Modeling Result

After computing trend components at each monitoring station, the residual components that could not be explained by the trend components were computed at each monitoring station. The modified Fortran routines of De Cesare et al. [21] were used to compute the experimental spatiotemporal variogram. The marginal spatial and temporal experimental variograms of the residuals with the fitted models are given in Figures 5(a) and 5(b), respectively. The marginal spatial variogram (Figure 5(a)) showed large relative nugget effects, but a reasonable temporal autocorrelation structure with an effective range of about 7 months was observed in the marginal temporal variogram (Figure 5(b)). This result implies that to account for temporal autocorrelation information during the interpolation could improve prediction performance, compared to the interpolation case with only spatial autocorrelation information. Figure 5(c) presents the experimental spatiotemporal variogram surface of the residuals. From this figure, the spatiotemporal variogram model, which satisfies the constraints in (8), was finally estimated and then used as an input variogram model for space-time kriging. Space-time kriging was applied to obtain the residuals at all grid points in the study area by using the spatiotemporal variogram model of the residuals. The Edinburgh space-time geostatistics Fortran program [26] was used to implement space-time kriging of the residuals.

4.3. PM10 Concentration Mapping and Uncertainty Analysis Results

The simple space-time kriging estimate of the residuals was added to the interpolated trend components and then used as a mean value for the Gaussian ccdf at all locations. The simple space-time kriging variance was also used as the variance of the Gaussian ccdf, as in (2). After constructing ccdfs at all locations, the PM10 concentration estimate and conditional variance were computed using (9) and (10), respectively. All postprocessing was implemented by Fortran programming and ArcGIS was used for visualization.

Only the PM10 concentration mapping results for two months in 2011 are given in Figure 6, due to space limitation. The PM10 concentration in April was much higher than that in August due to less precipitation and yellow dust transported to Korea by prevailing westerly winds in April. In April, relatively high PM10 concentrations were observed in northern Incheon, Dongducheon, Pyeongtaek, and Gwangju due to the large concentrations either at the monitoring stations in those cities or at the nearby monitoring stations. The PM10 concentration in August was relatively high in the northern Incheon, Gimpo, Dongducheon, Hwaseong, Seongnam, Gwangju, and northern Icheon. In both months, the northern Incheon, Dongducheon, and Gwangju showed relatively high concentrations, but low concentrations were observed in Seoul city.

The spatial distribution of conditional variance that measures the uncertainty for prediction is given in Figure 7. A large conditional variance was observed in some concentration areas (e.g., northern Incheon and Pyeongtaek in April and Gimpo in August, resp.) where the PM10 concentration values at monitoring stations fluctuated greatly both temporally and spatially. Some areas with very few or even no monitoring showed relatively large conditional variance which is similar to conventional kriging variance. This uncertainty statistic revealed that the conditional variance, which provides information on both the sample variations and the sample configuration, can be used as supporting information to interpret the PM10 concentration mapping result.

To generate misclassification risk maps, the probability of exceeding a certain threshold value was first mapped. The atmospheric environmental standard in Korea is defined only for an annual average (25 μm/m3) or a 24-hour average (100 μm/m3) [8]. Thus, it is not feasible to directly use the atmospheric environmental stand value as the threshold, since the monthly PM10 concentration was considered in this study. Since the ccdfs were established at all locations in the study area, a variety of probability maps could easily be generated by applying different threshold values. For an illustration purpose, the PM10 concentration of 80 μm/m3 was used as the threshold. By combining the classification result with the exceeding probability using a PM10 concentration of 80 μm/m3 as the critical threshold, the risk and risk maps were generated, as shown in Figure 8. By definition, risk is only mapped where the PM10 concentration exceeds the predefined threshold. On the contrary, risk is defined where risk is not mapped. In the risk map in Figure 8(a), the false positive probability is relatively low (i.e., less than 0.3), but not negligible. The risk map in Figure 8(b) shows very large variations of the false negative probability, which is greater than 0.7 in the northern part of the study area including Pocheon and Yeoncheon. A large misclassification risk was also found around the areas that are classified as hazardous (i.e., exceeding the PM10 concentration of 80 μm/m3). Although choosing proper probability thresholds is difficult or subjective, these misclassification risk maps, which cannot be provided by deterministic interpolation methods or kriging algorithms without ccdf modeling, can be useful information for further decision-making or interpretations. For example, the areas showing high misclassification risk values can be considered as candidates for further monitoring or in-depth investigations.

4.4. Validation Results

To quantitatively evaluate the prediction performance of space-time kriging, leave-one-out cross validation was carried out and error statistics such as the linear correlation coefficient with the true values and MAE were computed. Spatial ordinary kriging, which considers only spatial autocorrelation information, was also applied for comparison purpose.

Figure 9 presents the scatter-plots with error statistics computed from leave-one-out cross validation. Although the underestimation of high values and overestimation of low values were observed in both results, this mismatch arising from the smoothing effects of kriging was relatively weakened in the validation result of space-time kriging. The linear correlation coefficients for space-time kriging and spatial ordinary kriging were 0.92 and 0.87, respectively. Space-time kriging also showed an improvement of 13.23% in MAE, compared to that of spatial ordinary kriging. Similar to the previous case study result in Park [17], these quantitative evaluation results confirmed that the incorporation of temporal autocorrelation information via space-time kriging improved the prediction performance and generated reliable mapping results for space-poor and time-rich data such as PM10 concentrations.

5. Conclusions

A geostatistical approach based on spatiotemporal multi-Gaussian kriging was presented for time-series mapping of PM10 concentrations. Unlike conventional space-time kriging and spatial kriging, which provide the estimate and kriging variance only, the presented approach generated rich interpretable by-products as well as the PM10 estimates. From a case study in the Seoul metropolitan area of Korea, multi-Gaussian space-time kriging accounted for temporal autocorrelation information as well as spatial autocorrelation information and generated reliable mapping results that outperformed those of conventional spatial kriging. In addition, the presented approach produced uncertainty measures and misclassification risks from the ccdf modeling that are useful for interpretation or decision-making.

To strengthen the major findings of this study, several outstanding issues should be addressed in future work. First, several auxiliary variables such as the proximity to major roads and weather data will be integrated within the framework of the present study in order to generate much more reliable PM10 concentration mapping results. In relation to uncertainty modeling, the multi-Gaussian approach adopted herein may not be appropriate for datasets with a strong positively skewed distribution which may be often observed in air pollutant concentrations. Thus, the extension of the conventional spatial indicator approach [10, 11] to the space-time domain and the comparison with the multi-Gaussian approach presented herein will also be included in future work.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2015R1A1A1A05000966). This work was also supported by Inha University Research Grant. The author thanks Dr. L. Spadavecchia for providing the Edinburgh space-time geostatistics program.