Abstract

Over the past few years, hierarchical Bayesian models have been extensively used for modeling the joint spatial and temporal dependence of big spatio-temporal data which commonly involves a large number of missing observations. This article represented, assessed, and compared some recently proposed Bayesian and non-Bayesian models for predicting the daily average particulate matter with a diameter of less than 10 (PM10) measured in Qatar during the years 2016–2019. The disaggregating technique with a Markov chain Monte Carlo method with Gibbs sampler are used to handle the missing data. Based on the obtained results, we conclude that the Gaussian predictive processes with autoregressive terms of the latent underlying space-time process model is the best, compared with the Bayesian Gaussian processes and non-Bayesian generalized additive models.

1. Introduction

Many of the environmental data contain different scales of variability over space and time. For example, scientists from environmental and public health sciences are typically interested in modeling the evolving process of the air pollution during the time over specified locations. Such a stochastic process is often high-dimensional, large, and complicated with nonstationary structures, so the traditional statistical methods are hampered by the need of advanced statistical techniques to specify the spatio-temporal dependency. This can be quite practical with modern computers with high-level computational programming. The spatio-temporal modeling of and (particulate matter with diameters of less than 10 and 2.5 micrometers, respectively) is rapidly becoming an important component of most air quality studies [13]. Particulate matter (also called particle pollution) is a mixture of solid particles and liquid droplets found in the air as a result of dust, soot, dirt, smoke caused by road transportation, and complex chemical reactions in the atmosphere such as sulfur dioxide and nitrogen oxides. Exposure to particle pollution is a public health hazard and can cause acute and chronic heart and lung diseases [4]. The larger the values of particulate matters (PM) are, the more harm on short and long terms of public health is. World Health Organization’s (WHO) air quality guidelines recommend that the annual and 24-hour mean concentrations should not, respectively, exceed 20 and 50 microgram per cubic metre (μg/m3) for PM10, and 10 and 25 μg/m3 for PM2.5. Countries with fast-developed infrastructure such as Qatar usually suffer from relatively high levels of PM air pollutant. In this regard, the WHO classified the air quality in Qatar as poor and unsafe. In this paper, we focus our attention on modeling the PM10 in Qatar because the PM2.5 data is inaccessible. The most recent data indicates that country’s annual average 24 hr PM10 concentration levels were ranged from 126.69 μg/m3 to 184.55 μg/m3, which exceeds the recommended maximum of 50 μg/m3 [46].

Several authors have developed spatio-temporal models for analyzing the ambient of air pollution. Research in this field is back in dates to Cressie [7] and Goodall and Mardia [8]. Cressie et al. [9] compared the performance of Markov-random field with the familiar geestatistical approach in prediction the PM10 concentrations around the city of Pittsburgh, United States of America. The authors did not model the joint temporal and spatial structure of observations. They first modeled the purely spatial structure of observations for one particular day, then they incorporated the temporal component in their final statistical modeling. Sun et al. [10] developed a spatial forecasting distribution for unmeasured daily log PM10 average concentration given from ten locations in Vancouver, Canada. At each monitoring site, Sun et al. [10] showed that the autoregressive model of order one (AR(1)) described quite well the daily log PM10 average values. The authors did not consider the Bayesian models in their approach. Golam Kibria et al. [11] proposed a multivariate spatial prediction methodology in a Bayesian approach that is suited for spatio-temporal data observed at a small set of ambient monitoring stations at successive time points. They demonstrated the usefulness of their approach by mapping PM2.5 at monitoring sites with different start-up times in the city of Philadelphia, USA. Golam Kibria et al. [11] did not compare the performance of their model with other non-Bayesian models that are commonly used in literature. Shaddick and Wakefield [12] and Sahu and Mardia [13] used short-term spatio-temporal predictive analysis for modeling the PM2.5 and PM10 concentration levels. Zidek et al. [3] presented predictive distributions on nonmonitored PM10 values in Vancouver, Canada. Smith et al. [14] proposed a spatio-temporal model to predict the weekly averages of particulate matter concentrations within three southeastern states in the USA. Sahu et al. [15] modeled the PM2.5 by combining the rural background and the urban areas into one process. Cocchi et al. [1] developed a hierarchical Bayesian model for the daily average PM10 values. Pollice and Lasinio [2] developed a Bayesian-based kriging method for estimating the daily average PM10 concentration levels. Wikle et al. [16] provided an excellent review of classical and Bayesian approaches for analyzing spatio-temporal data. Although Taylor et al. [6], Ahmadi et al. [5], and others have studied the relationship between some environmental conditions and particulate matter levels assessing the air quality based on particulate matter levels in different locations in Qatar, they did not study the spatio-temporal variability of PM10. The main objective of this research is to develop space-time models for daily PM10 air pollution levels in Qatar for the four years, 2016-2019 comparing the hierarchical Bayesian approach with other spatio-temporal recent methods. We develop spatial interpolation and forecasting model using iterative Markov chain Monte Carlo (MCMC) computation setup which is an effective method for modeling a data with large number of missing values [17]. To the best of our knowledge, this will be the first study in Qatar, and we hope that this research will be helpful to protect the environment and public health in Qatar.

The rest of this article is organized as follows: Section 2 provides a brief review of two-stage hierarchical Bayesian models that have been used for modeling spatio-temporal data. A numerical example is given in Section 3 to demonstrate that the Bayesian approach accurately predicts the daily average PM10 values with a large number of missing values comparing with non-Bayesian models. Finally, a conclusion is given in Section 4.

2. Hierarchical Spatio-Temporal Model

When data is collected at different points in space and time, we should use a model that can, simultaneously, describe the dependency structure coming from the three sources of variations: time variation, space variation, and joint variability between time and space. Such a model is called a space-time model (or spatio-temporal model, where spatio refers to space and temporal refers to time).

In this article, we develop hierarchical models to predict the daily PM10 concentration levels which vary over time and locations. The PM10 concentration levels do not often follow the normal distribution. Thus, we usually model these values on the square-root scale or we use the log-transformation to stabilize the variance and enforce normality and to stabilize the variance [18]. We consider the square-root scale to alleviate the departure from normality in our research data.

Let and denote the two units of time where represents the longer unit (e.g., year), and represents the shorter unit (e.g., day). Let denote the observed value of the concentration, after any necessary transformation, at a given location and over a given discrete time . We assume that the spatial location is a two-dimensional vector describing the latitude-longitude (or equivalently northing and easting coordinates), and the time unit is typically hour, day, month, or year. We also assume that the is observed at monitoring sites denoted by and at time points denoted by two indices and so that the total number of observations is denoted by . In this article, we denote all the missing data by , whereas all the observed data will be denoted by .

The first stage of the hierarchy assumes that the observed values , where , can be decomposed into a true (latent) spatio-temporal process with an error term . More specifically, the data (or measurement error) model in the first stage of the hierarchy is

The error term is assumed to be a Gaussian white noise process with mean zero and constant variance , which is often called the nugget effect absorbing microscale variability.

The second stage assumes that the true process has a systematic mean and a spatio-temporal error term. The mean can be modeled based on the past values of the unobserved variable or/and based on some relevant covariates. Typically, can be specified in the following formula: where is an spatio-temporal residual random intercept assumed to follow , where , is the site invariant spatial variance, and is the spatial correlation matrix. In this article, we consider four modelings for . The first one assumes that the random effects, , for all locations and times . This implies that the model in (1)–(2) will be the simple regression model. The other models for are assumed to be Gaussian process, independent over the time, which is specified in Sections 2.1, 2.2, 2.3, and 2.4.

2.1. Matérn Spatio-Temporal Covariance Function

For spatio-temporal modeling, we usually assume that the random effects process is a weakly stationary Gaussian with a zero mean and a valid isotropic covariance function. A valid covariance function implied that the covariance matrix is positive definite, and isotropic means that the separation vector between the two locations only depends on the distance and not on the direction. The class of the spatio-temporal covariance functions can be separable, product-sum, metric, and sum-metric [19]. In this paper, we use the separable covariance model which is simply the product of the pure spatial covariance function, , by the pure temporal one, , given by where is the separating spatial distance, and is the temporal distance for any pair of points in the spatial and temporal study domain.

The Matérn family provides a general choice of covariance functions. For each time , the Matérn covariance is given by: where is the modified Bessel function of second kind of order which is a parameter controlling the smoothness of the realized random field [20], is the standard gamma function, and is a parameter which controls the decay rate in the correlation as the distance increases. Popular special cases of Matérn model are (i) leads to exponential covariance function and (ii) Gaussian model, when .

2.2. Gaussian Process Model Specification

Consider in (2), where , is the design matrix of spatially and temporally varying -covariates and is the -dimensional vector of regression coefficients. Thus, the Gaussian process (GP) two-stage model can be written as

We assume that the random effect process is independent from the white noise process . Note that some of the covariates may vary spatially and not temporally or vice versa.

One advantage of using Bayesian model is that we can use it to handle any missing data. This can be done by using the Markov chain Monte Carlo, MCMC, computation where the missing data is simulated from the distribution defined by (5) at each MCMC iteration using Gibbs sampling. The Gibbs sampler requires that the full conditional distributions of the parameters are given in a closed form.

The logarithm of the joint posterior distribution of the missing data and the parameters in this case is given by: where is the prior distribution [21] and we refer the readers to Bakar and Sahu ([21], Appendix A) to obtain the Gibbs sampling using the full conditional distributions of .

2.3. Autoregressive Model Specification

The autoregressive process (AR) model in Equations (1) and (2) can be written in the hierarchical form (e.g., [18] model) as follows: where is parameter of the first-order autoregressive model and . Note that if , we get the GP model given by (5). Also, note that the autoregressive model requires to specify an independent spatial model with initial values of for each , with mean and covariance obtained from the Matérn covariance function in (4) with the same set of parameters. In this case, logarithm of the joint posterior distribution of the missing data and the parameters in this case is given by: where is the prior distribution for the parameter ([21], see Appendix B).

2.4. Gaussian Predictive Processes with Autoregressive Model Specification

We now introduce the case that the process in (2) has a nonstationary covariance structure. We follow Sahu and Mukhopadhyay [22] by choosing the integer out of the sample prediction performance (crossvalidation) to get a smaller number of locations (denoted by knot), . Given , we assume that is a Gaussian process with zero mean vector and covariance function given in (4). Thus, the nonstationary model is defined after replacing in (2) by hence, is a vector of -independent realization at each time point and with the same Gaussian process in (4). Define , we have where is the crosscorrelation matrix between and having elements and , and is the correlation matrix of so that , for . Clearly, the process shows nonstationary structure with variance function that is given by

The advantage of using the nonstationary model in (10) is the flexibility in the surface which is based on linear functions of . When is very small compared with , this will lead to reduce the computational burden, especially for big data which is usually the case of the spatio-temporal data. Moreover, using nonstationary models usually provides more accurate results in prediction the nonstationary process. We specify the hierarchical Gaussian predictive processes (GPP) as follows: where is given in (10). The process , at the knots, can be modeled according to the autoregressive model where for and . We assume that the initial conditions has normal distribution with mean zero and covariance matrix , and both can be obtained from the Matérn covariance function defined in (4), where is an matrix much lower dimensional than of dimension .

In this case, logarithm of the joint posterior distribution of the missing data and the parameters in this case is given by: where is the prior distribution for the parameter and the Gibbs sampling procedure of these parameters can be obtained from the full conditional distributions required provided in the Appendix.

3. Numerical Example

3.1. Data Description

The data used in this article is obtained from three different sources. The first one was collected by the air pointer and the meteorological station and is managed by the Environmental Science Center at Qatar University over the years 2016-2019. It has hourly pollutant including PM10 measured in μg/m3, temperature, Temp, measured in degree Celsius, and relative humidity, RH, with several missing values.

First, we sort the data into an ascending order by date, and then, we impute the missing values of the PM10 by averaging the two nonmissing values before and after this missing value. When two or more successive missing values exist, we impute them by the corresponding monthly average. We do the same for the hourly missing values of temperature and relative humidity. After that, we aggregate the hourly data into daily data for which we fit our models. The amended data from this resource is irregular time series which started on the fourth of January 2016 and ended on the twenty-eighth of August 2019. Table 1 provides a summary statistics of this data. Results suggest that the PM10 levels increased during the time, and the majority of missing observations (with more than 52%) have been occurred during the year 2018.

The second source of the data was the regular monthly observations obtained from nine meteorological stations over the years 2016-2019. The total number of observations is where the stations are located in the following sites: Qatar University, Abu Samra, Al Khor, Al Ruwais, Al Wakrah, Doha Airport, Dukhan, Mukenis-Al Karanaah, and Umm Said. Figure 1 shows the map locations of 9 meteorological sites. Although this data provides the monthly average values of temperature in degree Celsius and relative humidity (RH), it does not include measures for any type of pollutant. Table 2 summarizes the descriptive statistics of this data. We use this data to interpolate the daily temperature and relative humidity by disaggregating technique at each location. We also use our own portable devices to collect daily PM10 values, as a third source of data, over the period of time from November 5th to December 31th of 2020 close to these sites. We use these devices to collect and simulate more daily PM10 concentrations over the aforementioned locations. In particular, we recorded the average of the PM10 taken randomly in the morning and evening of each day over this period. Then, we use this collected data to simulate a new data in each location. Finally, we merge the simulated data with the previous one and obtained the data that we use in this article.

The final data has 1037 observations (see Figure 2) which has three variables (PM10, temperature, and relative humidity) measured over irregular time over the years 2016-2020 in nine spatial locations.

The top panel in Figure 3 shows the distribution of the daily average PM10 concentrations over the years 2016-2020, whereas the bottom one in the same figure shows these averages by the nine sites. Clearly that the distribution of the PM10 concentrations, in all locations, are right skewed with very high extreme values. Thus, to stabilize the variance and reduce the departure from normality, we transform the original scale to square root scale.

3.2. Parameter Estimates, Model Validations, Prediction, and Comparison Results

The main objective of this research is to develop a hierarchical Bayesian model that can be used to select and validate the best model which fits the daily average of the PM10 air pollutant levels over different locations in Qatar. We consider the two covariates, temperature and relative humidity, for spatial interpolation of the PM10 concentration at a new location and any time.

First, we use the MCMC algorithm with Gibbs sampler, based on 5,000 iterations discarding the first 1,000 values, to impute the missing data as explained in Appendix, and then, we utilize this data to predict the values of at new location . The posterior predictive distribution is where are the model parameters and (see [17]). We fit the Bayesian GP, AR, and GPP models described in Section 2 using the spTimer package [21, 23] and compare these models with the non-Bayesian generalized additive model (GAM) using the R package mgcv [2426]. We use the crossvalidation method to evaluate the predictive performance of these models. Here, data at locations where are used to fit the model while data at other sites are used to assess the model using the mean squared error (MSE), mean absolute error (MAE), mean absolute prediction error (MAPE), relative bias (rBIAS), and relative mean separation (rMSEP) defined, respectively, as follows: where is the total number of nonmissing observations, is the posterior predictive value of , and are the arithmetic means of for .

Specifically, we use the data from Abu-Samra and Doha airport stations with observations for validation purposes. The data from the remaining six stations with total number of observations are used for model fitting (see Figure 1). The parameter estimates (posterior) for the Bayesian spatio-temporal models are given in Table 3. The 95% credible intervals for the parameters suggest that most of regression parameters for the AR and GP models are statistically significant, whereas the GPP model suggest that the variables are significant. In all models, the nugget effect is small. On the other hand, the spatial decay parameter for GP model and for GPP model suggesting that the effective ranges will approximately be 909 and 3000 kilometers, respectively. These two values are unusual associated with very large spatial variance values () suggesting that the PM10 concentrations in Qatar are not significantly different over the space.

After fitting the three models, we perform the crossvalidation and select the best one. Table 4 summarized the validation statistics for the GAM, GP, and AR models. Clearly, the MSE, MAE, MAPE, rBIAS, and rMSEP are smaller for the Bayesian spatio-temporal methods providing better predictive performance compared to the non-Bayesian additive model. For example, the Bayesian AP, GP, and GPP models reduced the MSE by about 63.7%, 66.4%, and 79.6%, respectively, compared with the non-Bayesian GAM model. We conclude from this table that the Bayesian spatio-temporal GPP model is the best model that can be used to predict the PM10 concentrations.

Figure 4 shows the time series plot of the true and fitted values for Qatar University location using the GPP model. We clearly see that the fitted values are very close to the true values. To further demonstrate the usefulness of the spatio-temporal GPP model for predicting the linear trend surfaces of the PM10 values with their standard values, we illustrate a prediction map for the 29th of March 2018 and 15 May of 2019 over a one-kilometer square grid (see Figure 5). Obviously, the graphs show that the GPP model correctly represented the PM10 concentration level over the space and time.

4. Conclusion

The potential of applying nonstationary hierarchical Bayesian spatio-temporal models in PM10 prediction with a large number of missing values is presented in this paper. The predicting model is developed by comparing the Gaussian predictive processes (GPP) with Gaussian processes (GP), autoregressive (AR) Bayesian models, and non-Bayesian generalized additive model (GAM) models using the datasets from the state of Qatar. The numerical results show that the GPP model outperforms other alternative models providing forecasting with good accuracy and interpretability. We applied the disaggregating technique and simulated a daily spatio-temporal PM10 data using the available and collected data. Then, we used the Markov chain Monte Carlo with Gibbs sampler to impute the missing data in real collected data. We believe that our statistical data analysis approach will give similar results for future available real data. In many applications, the support vector machine (SVM) algorithm has shown a superior forecasting performance compared with several evolutionary algorithms. This could be an interesting possible extension to this article where further research can be done by comparing the performance of the SVM algorithms with Bayesian models in predicting the daily average PM10 concentration levels.

Appendix

Full Conditional Distributions for Gibbs Sampling

In order to fit the model, we use Gibbs sampling [27] by repeatedly simulating the missing observations and parameter models from the full conditional distribution: (1)Sampling of missing values from its conditional distribution , at each iteration(2)Sampling and from the conditional Inverse Gamma distribution so that (3)Sampling of from the full conditional distribution where where the sample values is restricted in the interval .(4)Sampling of from the full conditional distribution where (5)Sampling of from the full conditional distribution , where

For ,

For , where . (6)Sampling of from the full conditional distribution , where where is a unit vector of dimension .(7)Sampling of. The full conditional distribution of is calculated based on Metropolis-Hastings algorithm from the prior and likelihood terms involving which is given by where and denotes all the data and parameters except for . We adopt the Metropolis-Hastings algorithm to obtain the sample from (A.7) by considering the proposal distribution as the normal distribution on the log-scale for with the mean at the current value and the variance with an acceptance rate between 15% and 40% as suggested by Gelman et al. [28]. The sample value is accepted with probability

Data Availability

The data used in the article is obtained by the Environmental Science Center at Qatar University and can be made available upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are grateful to the academic editor, Hong Wei-Chiang, and three anonymous peer reviewers for their comments and suggestions. This research was supported by the Qatar National Research Fund with a grant number UREP25-010-1-003.