#### Abstract

This paper develops and empirically compares two Bayesian and empirical Bayes space-time approaches for forecasting next-day hourly ground-level ozone concentrations. The comparison involves the Chicago area in the summer of 2000 and measurements from fourteen monitors as reported in the EPA's AQS database. One of these approaches adapts a multivariate method originally designed for spatial prediction. The second is based on a state-space modeling approach originally developed and used in a case study involving one week in Mexico City with ten monitoring sites. The first method proves superior to the second in the Chicago Case Study, judged by several criteria, notably root mean square predictive accuracy, computing times, and calibration of 95% predictive intervals.

#### 1. Introduction

This paper compares two methods for temporally forecasting next-day hourly ground-level ozone concentrations over spatial regions. Software for implementing both methods along with demo files can be downloaded from http://enviro.stat.ubc.ca/. The paper focuses on a case study involving Chicago during the summer of 2000. The methods can be used to forecast the maximum eight-hour average ozone concentration, which is reported for many urban areas. For example, on June 27, 2009 the AIRNow website forecasts a maximum for Chicago of between 0 and 50 ppb, rating that as “Good.” In contrast, for that day in one part of Los Angeles, the rating was “Unhealthy for sensitive groups,” meaning a forecast maximum of between 101 and 150 ppb.

These forecasts are needed to forewarn susceptible groups of high ozone concentrations that are associated with acute health effects. Such effects are well documented in the air quality criterion document (Ozone [1]. See http://oaspub.epa.gov/eims/eimsapi.dispdetail?deid=149923), the basis of the recommendations made in 2007 to the US Environment Protection Agency by its Clean Air Scientific Advisory Committee for Ozone, in which the third author served. In fact, the accumulated body of evidence was so strong that the committee recommended strengthening the air quality standards for this criterion pollutant to meet the requirements of the US Clean Air Act. In particular, the evidence pointed to a strong association between morbidity as well as reduced lung function and high levels of ground-level ozone concentrations. This points to a need for enhanced near-term forecasting methods.

One general method for making such forecasts relies on the fusion of measured hourly ozone concentration values and simulated values obtained from chemical transport models (CTMs) such as CMAQ. Two papers [2, 3] develop methods for doing this, albeit by different approaches unrelated to those in this paper. In future work, all these methods should be compared in domains where CTM data are available. Even without that hypothetical comparison, those in this paper have the advantage of being available in domains where they are not. These Bayes-empirical Bayes methods offer the flexibility needed to characterize environmental space-time processes, while fully representing the various kinds of uncertainty involved in their construction. Both have been developed for and successfully used to model hourly ozone air pollution concentrations in other contexts.

The first method in this paper denoted by M1 adapts a multivariate method developed for modeling space-time fields [4–7]. A univariate version of that method for hourly ozone concentrations is the subject of a companionable paper [8], which compares it with a state space model but for spatial prediction, not temporal forecasting, and it does so in a different geographical region. The goal there is mapping the ozone field for another requirement of the US Clean Air Act of 1970, namely, the protection of human welfare including such things as crop yields. M1 needs some new theory, which is presented in the sequel along with a demonstration on how it may be applied.

The second method denoted by M2 uses a method originally developed for modeling hourly ozone concentrations in Mexico City [9]. That method and the models on which it is based along with the computational algorithms used to implement it seem to have been quite successful in that application. Moreover, even though it was developed for use in Mexico City for one specific week, a strong prima facie case can be made for its applicability in other weeks and jurisdictions and that is why we assess the performance of that method here. Much recent work has been done in modeling random space-time pollution fields [10–12]. As one of the photo-oxidants, ozone is produced in the same way in all temporal and spatial domains by a complex interaction of oxides of nitrogen () with volatile organic compounds (VOCs) in the presence of heat and sunlight. In most modern urban environments such as Chicago and Mexico City, vehicle emissions are a prime source of the and VOCs [1]. Furthermore the prima facie case is supported by an exploratory data analysis that shows very similar daily cycles in our domain as those observed in the Mexico City application. This led us to match, to the greatest possible extent [8], the method used there in our adaptation of it, and no originality is claimed for it.

The main finding in this paper is that in the case study M1 outperforms M2 in a number of ways. First is its computational efficiency. To run the M2 approach, it often took about a week or so to get the results, while M1 only took about ten to twelve hours at the same Linux server. Thus, M2 would not be suitable for making 24 ahead forecasts, while M1 running on a faster processor could be used for that purpose. We also found that M1 produced more accurate forecasts than M2, as measured by their root-mean-squared-prediction errors. Moreover, M1’s predictive error bands proved to be better calibrated. In other space-time domains, a similar assessment would have to be made to select a forecasting procedure, and M2 may be superior in some. Overall, we believe that the value of this paper lies in the guidance on how that assessment could be made and the source of software that can be used for it.

The layout of this paper will now be described. Section 2 presents both approaches to forecasting hourly ozone concentrations. Section 2.1 introduces M1. Its forecasting (posterior) distribution is developed, and the corresponding pointwise predictive intervals at each gauged site constructed. Section 2.1 also extends the results to forecasting -step-ahead responses for any . Section 2.2 reviews M2. Section 3 implements these two methods in a case study involving Chicago and data from the US EPA’s AQS air quality database. Section 3.1 presents and compares the results for one-day-ahead predictions. Finally, Section 4 summarizes these results and gives our conclusions.

#### 2. Methodology

This section presents our two temporal forecasting approaches. Although both are general and can be used in other contexts, we develop them as methods for forecasting an hourly response tomorrow given data up to today. The measured value of the response is available and serves as a “test value” at each of monitoring sites for a comparative assessment of the two forecasted methods.

##### 2.1. Method M1

###### 2.1.1. Basic Theory

The general approach [7], on which M1 is based, assumes a multivariate space-time field of -dimensional random response vectors indexed by their locations, a finite number of sites in a specified geographical domain. These sites need not lie on a lattice.

The general theory involves a geographical region that includes monitored (i.e., gauged) sites and unmonitored (i.e., ungauged) sites. Although this paper requires only the part of that theory for those that are monitored, we state it in generalable to link this paper to its companion publication [8]. The dimensional response vectors at these sites for times are combined in the response matrix . This matrix can be partitioned as , where contains the response vectors at the ungauged sites and , those at the gauged sites. The theory posits ([7], p. 145-146) that where denotes the non-site-specific covariates (their site-specific counterparts could be included in the response vectors); denotes the matrix of site-specific random covariate coefficients; denotes the covariance matrix among the responses at any given time point. The hyperparameters for this hierarchical Bayes model are , and , with representing the variance component of between its rows. Here, denotes the Generalized Inverted Wishart distribution, a conjugate prior for normal matrix distributions. Separability of the hypercovariant matrices for both the response and random coefficient matrices is assumed for computational simplicity. Invoking Box's celebrated dictum that all models are wrong, we defend this assumption by the good performance of the resulting method as seen in the empirical assessment provided in the sequel.

Validating the modeling assumptions above will usually require a transformation of the random responses, a square root transformation in this paper's case study. Then systematic components such as the temporal trend over the whole regions will need to be removed. These can be accurately inferred from the typically large dataset formed by aggregating the data over all sites and times. Finally, something needs to be done to eliminate autocorrelation in the temporal sequence of responses. For example, the temporal series can often be filtered using a regional time series model without site-specific parameters. However, our relative abundance of data leads us here to a different approach described in detail in the next subsection, that splits the transformed, detrended residuals into separate, disjoint subsequences of responses, which are separated widely enough in time as to be uncorrelated and hence independent under our Gaussian sampling model. In our experience, the residuals obtained after these steps have been taken usually satisfy the model assumptions above, and these comprise the response vectors in M1. So the model above can then be applied to each subsequence and type II maximum likelihood estimators found for the hyperparameters. These can subsequently be averaged across the subsubsequences to get an overall estimate. While this approach would be less efficient than a full-data approach under a correctly specified model, it avoids the risk of model misspecification in complex situations like that of the case study. The forecasting model developed below can then be applied, and the preliminary steps above reversed, to get the forecasts back on the scale of the raw data.

To elaborate on our distributional assumptions, the GIW prior for can be defined, through the Bartlett decomposition, as follows: where , and . Note that has an inverted Wishart distribution with hyperparameters ; the matrix is the hypermean of , and the matrix gives the covariance between the rows of . Denote the set of hyperparameters as .

Given the observations at the gauged sites (i.e., ), the predictive distribution of is completely determined, it is the distribution required in our companionable paper on spatial prediction [8]. Furthermore given partially observed responses at gauged sites, the predictive distribution of the missing responses at gauged sites given these observations can be derived after the hyperparameters have been estimated by an empirical Bayes approach ([7], p. 300–303), and that is how the theory is used in this paper.

Deriving that forecasting model requires a general result that concerns a sequence of response vectors of which are observed, and lies in the future. Then with the superscripts “*m*” and “*o*,” respectively, standing for “missing” and “observed,” we may further partition the random response matrix already partitioned above, as with and . Forecasting requires the predictive posterior distribution of given in the following result ([7], p. 160-161).

Theorem 1. *Let
**Conditional on the hyperparameters , the marginal posterior distribution is a matrix distribution:
**
where
**
where , and is assumed to be , denoting spatial and between hour correlations, respectively.*

*Remark 2. * This theorem gives the joint predictive distribution for future response vectors given observed responses. As coordinates of those future vector are observed, this distribution yields in turn the conditional predictive distribution for the unobserved coordinate responses, That is how the theorem is applied in the next section.

###### 2.1.2. One-Day-Ahead Forecasts

For expository simplicity, we describe the general method M1 in terms of the goal of forecasting ozone concentrations at a specific hour on Day 121 and each of monitoring sites based on data collected at those sites during the preceding days, that being the objective in the case study described in the next section. However, we emphasize that M1 is generally applicable to other days and geographical domains with appropriate modifications. In fact, the approach can be adapted for use with other environmental processes. With that caveat, we now describe M1 in this subsection. Note that in our description, hour refers to the period between midnight and 1 AM and so on.

To begin, we follow the standard practice of transforming the hourly data by taking their square roots to achieve a more nearly Gaussian data distribution [13]. Hereafter these transformed values will be our “responses.” M1 then partitions the sequence of responses into blocs of hours each. For (hereafter referred to as Case 2), the last bloc spans the 24 hour period from hour *k* + 1 on day 120 to hour on Day 121, and the first from hour *k* + 1 of Day 1 to hour of day 2. However (Case 1) is different, and there the blocs correspond to the days from 1 to 121. In either case, each bloc yields a 24-dimensional multivariate response vector with an unspecified covariance structure. Reformulating our model in this way gives us the advantage of avoiding the challenging task of specifying the complex short-term autocovariance structure, which varies over the day. But it does require the multivariate theory described in the previous subsection.

The next step in developing the forecast model would generally require the removal of any systematic, regional components in the series. In particular, it is necessary to learn which covariates/predictors to include in the design matrix so that in application of the method, site-specific coefficients can be fitted allowing deviations from the regional baselines established for them at the preliminary stage. Note that the EnviRo.stat software (see [14]) referred to in Section 1 automatically estimates the those baseline coefficients as prior hypermeans, using a maximum likelihood-based approach, the key preliminary step here is actually the identification of . Note also that in our application the covariates need to be adapted in form to conform to the temporal span of the response vectors (the so-called “blocs” introduced below), once hour has been specified.

Finally, we need to address the autocovariance structure. While the responses are primarily an AR(2) time series after removing their diurnal pattern [13], as we do in the case study, we need to allow for a lag 1 autocorrelation in series of the response vectors to capture small but potentially significant longer-term dependence [15]. To eliminate it, we use the approach described in the previous subsection and split the observed response vectors into two groups: those with odd numbers and those with even numbers. For each of the resulting subseries, the model assumptions will hold approximately, and the hyperparameters can be estimated as described in the previous subsection. More detail is given below where we turn to a more precise description of M1.

In Case 2 where , the 24-hour bloc containing hour includes measured responses for the hours on Day 120. Thus we may first apply Theorem 1 with designating the last bloc in our construction above, to get a joint predictive distribution for its associated “future” response vector. Then we compute the marginal conditional predictive distribution for hour implied by that joint distribution, given these data from Day 120 to finish the construction.

However, Case 1 is more difficult and the one we treat first. Among options considered by the authors for this case, was changing the bloc length to say so that the last bloc could reach back to hour on Day 120, which contains a measured response. That approach was discarded since the sequence of 7 bloc sequences would not synchronize with weeks. The latter are important structural features of the process that reflect the changing daily traffic patterns and give us the single covariate we have in the case study. Instead we accomplish the same thing, another way, which does preserve the week, namely, we apply Theorem 1 with to get the initial joint predictive distribution. The required marginal conditional distribution is then obtained by conditioning on all the data from Day 120.

Note that in both cases, the future response or responses in Theorem 1 depend on the bloc immediately preceding them. To avoid the need to model in that dependence, we simply eliminate that data vector. We did explore the result of keeping it in and ignoring that dependence and found virtually no difference in the Case Study. We now turn to a more precise description of M1.

For that we need some notation. Let denote the coordinate of the unobserved response vector for blocs and gauged site , while denotes the observed response for bloc , , , . For bloc , Sites , and hours between and inclusive, that is, , let denote the random response. Throughout .

The two cases referred to above are as follows.

*Case 1 (predict the response for the last hour day 121). * Hyperparameter estimates for its predictive distribution conditional on observed data are found first for the odd blocs , as described in the previous subsection, using . Repeat this procedure for the even blocs, letting , . Finally average the resulting pairs of hyperparameter estimates to get overall estimates.

Construction of the predictive distribution of the response at hour , begins with the corresponding matrix-variate observed responses, made by extracting the corresponding responses from Day 1 to Day 118 according to the above constructed or . Now suppose hypothetically that Day 120’s responses have not been observed and suppose the corresponding matrix-variate missing responses, are constructed as follows:
where is the unobserved future response vectors for days 121 and , the observed response vector of Day 120. Hence we have and in Theorem 1. Thus the predictive posterior distribution of can be obtained by applying (6). To obtain that of given , since in reality the latter are observed, one can decompose and as follows:
where and for . Hence, the predictive posterior distribution of is given by

To get the joint predictive distribution of responses at these gauged sites for Day 121's last hour, let be such that and for . Let . The joint predictive distribution of the unobserved response , that is, , is also a multivariate *t*-distribution:
where , and , that completes the description of M1 in this case.

*Case 2 (predict the response for hour on day 121). * For bloc , Sites , and hours between and inclusive, that is, , let , denote the random response. Now let consist of unobserved responses and observed ones at each of the gauged sites. What we need to do now at each site is to use the hours leading up to the first hour without data on Day 121, that is, hour for which the forecast is needed. To do this, we can use any data from hours 1 : on Day 121 that may be available, supplemented by the data from the preceding hours on Day 120, . Thus we create a dimensional response vector, . The same routine as in Case 1 using odd and even blocs, is then used with the remaining data to obtain parameter estimates, albeit with these shifted 24 dimensional hourly response vectors.

For hour , holds the observed responses from Day 1 to the one ending on day 119. To predict the responses one-day-ahead at gauged sites in this field, we have and in Theorem 1 with rearranged so that all missing responses are at the beginning of the response vector. Specifically, let , be 1 at the element and 0 otherwise, for and ; , be 1 at the element and 0 otherwise, for and . Let . Applying Theorem 1 yields

Notice that is . To obtain the predictive distribution for the unobserved responses, we first need to decompose , and as follows: , where , and
Applying standard theory, for the multivariate distribution yields the predictive distribution of the unobserved response given data as a *t*-distribution given by
This completes the description of M1 in this case.

###### 2.1.3. Multi-Day Ahead Forecasts

This subsection generalizes M1 to get a method that provides an -step-ahead forecast, (). Let be the total number of days of observed responses. As before, we consider the multivariate setting, being the total number of response coordinates and , the total number of gauged sites. As well, we generalize the forecasting problem in two ways, namely, to forecasting the response on the last hour of the day; hour on the day.

*Case 1 (predict the final response for the last hour on day ). *Here the odd bloc responses are . Note that if or for some . The even bloc responses are . Note that if and if for some .

*Remark 3. * Notice that the total number of observations in data submatrices can be different for each , being an odd or even number. Denote by , the total number of observed responses. Then for the odd-day-responses and , for the even-day-responses.

As in Section 2.1, we obtain the estimates of hyperparameters by averaging those two sets of estimates given odd or even bloc responses, respectively. Given these final estimates, we now obtain the predictive posterior distributions given all hourly observations up to Day in Theorem A.1. They are also multivariate *t*-distributions. The proof and details can be seen in Appendix A.1.

*Case 2 (predict the response for hour on day ). * Here the odds block responses are . Note that if and if for some . The even block responses are . Note that if or for some . We do the same thing here to obtain the “final” estimates for hyperparameters. Thus we are able to get the predictive posterior distribution given all observations up to bloc and estimates for hyperparameters.

*Remark 4. * We also let be the total number of observed response variables. So is for the odds-day-response blocs and , for the even-day-response blocs.

Given the final estimates in Section 2.1, the predictive posterior distributions given all hourly observations up to days are also multivariate -distribution. The proof and details can be seen in Appendix A.2.

*Remark 5. * From Theorem A.2, the predictive distribution for the unobserved response variables from day to is the product of a sequence of matrix- and distributions. This implies no analytic form can be found for the response variable at the (for ) hour of the day at gauged site ().

##### 2.2. Method M2

An alternative approach to M1, through dynamic linear modeling, can also be used for forecasting and would seem an obvious choice, being an amalgamation of state-space time series models. Let be the response vector across all sites at hour . As in the previous subsections, responses are square root transformed hourly ozone concentrations. Exploratory analysis of these transformed data, found 24- and 12-hour diurnal cycles, pointing to the approach in Huerta et al. [9] for ozone that is based on the same patterns.

Let be the common temporal trend coefficient across the spatial sites at hour . Furthermore, and denotes the coefficients of the periodic components with respect to 24- and 12-hour diurnal cycles, respectively. These two components are for . Let be the state vector at hour and . Furthermore, denotes the Euclidean intersite distance matrix for all site pairs. In this model, is the range parameter, , the variance parameter, and , the phase parameters. Assume a constant covariance matrix for the state parameter vector and denote it by . That covariance, which accounts for the spatial correlation between sites, is given by . The vector of hyperparameters, , is identical to that in Huerta et al. [9] based on assessments made for Dou et al. [8].

Thus, the measurement and state equations of the DLM are given by with initial information: . The hyperparameters , , and are also identical to those in Huerta et al. [9]. One can obtain the posterior distribution of the state parameters at the last known time point, , that is, , using the Kalman filter, a smoothing method and the Metropolis-within-Gibbs sampling algorithm [8, 9, 16–18]. We omit details on updating and forecasting the state parameters given the model parameters and observations up to current time point.

Given the distribution of the state parameters at the last time point, , the observed responses until time , and the model parameters, , the -step-ahead prediction is given by for . Note that , and can be obtained by application of a standard method [8, 9, 17]. Here and for the one-day-ahead prediction in the Case Study. For any fixed , the predictive response, , can also be obtained by the MCMC (Markov Chain Monte Carlo) method. More specifically, at iteration , suppose we have updated the vector of model parameters: using the FFBS (forward-filtering-backward-sampling) algorithm [8, 9, 16, 18]. That is, one has Then, the predictive response at iteration , can be drawn from (15), that is, Consequently, the predictive responses are obtained by the sample means of (, where denotes the total number of iteration after burn-in period; ). The empirical predictive intervals at the 95% nominal level can be obtained as the corresponding sample quantiles.

#### 3. Case Study

This section implements the forecasting methods in the last section for one Chicago summer (from May 1 to August 31) using data for that urban area taken from the EPA's AQS database (2000). These extracted data come from fourteen irregularly distributed monitoring stations measuring hourly ozone concentrations in parts per billion (ppb), which, to assure the validity of our Gaussian model assumptions, are square-root-transformed as noted in the last section. Each has few missing values under the EPA 1997 Standard in 1997 (i.e., 80 parts per billion for the eight-hour ground level ozone concentrations) during the overall time span across all available sites in this region.

To assess the model's performance for temporal forecasting, 14 sites are selected as “gauged” sites (i.e., ) and their observed responses on Day 121 are set aside as test values. Figure 1 shows the geographical locations of these fourteen gauged sites.

To explore these data further, weekday and hourly effects were computed for each site by averaging the transformed hour values over each of the seven weekdays over the whole summer. We found these effects to be very similar from one gauged site to the next. Thus, since “bloc” is the unit of time , our approach puts the appropriate zero-one elements into the to mark off the progression of blocs as the progressed. Baseline hourly effects are represented in the hypermean function and are automatically fitted by software, EnviroStat.1.0.1. This then represents the overall diurnal pattern, while allowing site-specific deviations within the model.

##### 3.1. Results and Comparisons

The two methods considered in this paper were applied at all fourteen gauged sites (GSs for short) to predict the twenty-four left out (test) and square-root-transformed hourly observations on Day 121. For all sites, plots showing the observations during the six days leading up to test Day 121, as well as the twenty-four forecasts by both methods for that day, may be seen in Dou et al. [19]. These plots also include the 95% pointwise predictive intervals for that day for M1 and M2. For brevity, we include only figures that present some noteworthy features of these predictors.

To begin, Figure 2 for GS 1 shows a flat ozone field at this location on Day 119 and strongly varying one on Day 120 with two peaks. M1 does better than M2 in following not only the overall trend but tracking the turns quite well. All twenty-four of the test values lie within the 95% predictive credibility band although that for hour 12 overlaps the upper boundary. M2 follows ozone's peaks fairly well, but it forecasts valleys that do not turn up—the strong structure provided by the sines and cosines reduces this model's flexibility and capacity to track the series in this case. Those harmonics point to two twelve-hour cycles despite the general lack of same in the observed series during the days before.

In contrast to the case of spatial prediction [8] where generally four peaks are seen in the sinusoidal curves bounding the DLM's 95% predictive credibility bands, here there are just two. In fact, four peaks would be expected since the mean model's components of variation has sines and cosines that are squared when they enter into the posterior variance. Thus their valleys turn into peaks, one every six hours since both twelve- and twenty-four hour cycles are present in the mean. So why only two?

The answer seems to lie in the fact that the random coefficients for the twelve hour components of variance in the forecast at any monitoring site, is not as uncertain in forecasting at that site than they are in spatial prediction at other sites, which may be a substantial distance from the monitoring sites. Consequently these components, although small, would have large posterior variances in the predictor than these components of the forecaster at one of the monitoring sites. In other words, much more information is available in the data leading up to the last day for the forecasts for the test values at that site than is available at a remote and unmonitored sites.

Incidentally, the lower bounds for M2 forecasts can go below zero so in practice would need to be truncated. Moreover, few of the test values lie within the 95% credibility band.

Huerta et al. [9] give similar plots for three stations in their case study for Mexico City using the method that led to our method M2. These plots differ from ours in that their plotted ordinates represent ozone while ours represent square-root-transformed ozone. We elected to keep the latter since that is the scale on which our analysis was done and hence the one that provides maximal diagnostic benefit. Furthermore our square root scale does not risk exaggerating the observed differences between both methods being considered. At the same time, we recognize the practical importance of publishing forecasts on the ozone rather than square-root ozone scale, and hence Table 1 presents comparisons on the former scale.

Huerta et al. [9] focus on just one week, so their forecasts are for their last Day 7, based on the six preceding days (whose observed hourly concentrations are plotted). In contrast, since our forecasts are based on the entire summer, our last day is Day 121 and we plot the observed values for this day (as well as the preceding seven). Although the daily amplitudes of the sinusoids in Huerta et al. [9] vary from day to day, the periodicity is very consistent over those days, with two fairly distinct peaks each day. In contrast, the ozone patterns over the seven days preceding our forecast Day 121, differ markedly from one day to another. Thus while the data series for Day 120 shows two very distinct peaks, that for Day 116 is nearly flat, and that for Day 118 (essentially) shows only one peak, after a monotone increasing trend rising to the end of that day. Finally Day 7 in their plot shows good agreement at all three of their stations, between their “predictive median” and the test data values. In contrast, our averages of 500 MCMC generated predicted responses for M2 disagree markedly with many of the hourly test values on Day 121. We would conjecture that these discrepancies occur because the data series on Day 121 tends to be much flatter than on all the preceding days except Day 116. (It may also derive from the additional (temperature) data Huerta et al. [9] had to enhance their forecasts that we did not have).

We do not see in any of the plots in Huerta et al. [9], the four peaks seen in some of our 95% credibility bands, suggesting that the uncertainty in the coefficients of the twelve-hour cyclical components is quite well resolved by the six days of data preceding their test day. The credibility bands for their version of M2 contain all the test values for all three of their sites, as do M2’s bands in Figure 2. (However, that is not the case for all fourteen of our stations as noted below. Moreover, their retrospective 95% credibility bands for all three sites are too narrow and fail to contain a large fraction of the observed values on a number of their days, e.g., on Sep 10 and Sep 12 at the Xalostoc site).

A summary very similar to that above for GS 1 also applies to the omitted figures for GS's 2–4, 7–13, although for GS 11 M1 forecasts diverge from the test values over the final four hours, and for GS 13 M1 underestimates those test values in the middle of the day.

Figure 3 shows that GS 5 is different. Although M2’s forecast series has two peaks of moderate height, the bounds for the credibility bands have four of them consistent with two twelve-hour cycles. None of the forecast series tracks the series of test values well.

Summaries similar to that for GS 5 applies to GS 6 with the exception that M1 forecasts track the test value series quite well unlike M2 and to GS 14, where both methods underestimate the test values, M1 being closer overall than M2.

Figure 4 plots the width of the 95% pointwise predictive credibility bands generated by the BSP at each of the twenty-four hours of Day 121. Starting from around 9 AM, these bands tend to increase and continue to do so until the last hour at 11 PM, reflecting the increasing uncertainty about the forecasts since increasingly fewer responses are observed as time increasing.

Figure 5 is a similar plot to that above, but this one for M2 instead of M1. These lengths are close to each other for various sites, exhibiting a wiggly periodic behavior across all gauges sites, a characteristic previously observed in Dou et al. [8, 18]. Although these lengths are very close to each other, M2 actually underestimates the predictive variances at gauged sites as seen in Figure 6, which shows the coverage probabilities of M1 and M2, and also shows a slightly overestimated predictive variance for M1, at the 95% nominal level.

Table 1 presents the root-mean-square-predictive error (RMSPE) of the predictive responses on the 121st day at each one of fourteen gauged sites using these two approaches. At GS , the RMSPE of the prediction at hour can be computed by where is the predictive response at hour of Day 121 and , the corresponding observed response at the same hour, same day, and same site. M2 has a larger RMSPE over all the gauged sites compared with M1. M1 has the smallest RMSPE across most gauged sites.

#### 4. Discussion and Conclusions

For forecasting ground-level hourly ozone concentrations in a Chicago summer, M1 seems better than M2. It seems more accurate, and its 95% predictive credibility interval is better calibrated. However, in any practical application M1 and M2 would need to be assessed in the same manner as in this paper before making a final selection. It should be noted that a new model also based on the dynamic linear approach has been proposed by Sahu et al. [11] for ozone modeling. It would be interesting to compare this new approach with the methods in this paper.

Those methods M1 and M2 are two quite different approaches to modeling space-time process and comparing and contrasting them at a more fundamental level seems worthwhile. To begin, both are quasi-Bayesian models in that they rely on some preliminary data analyses. Thus the diurnal cycles are identified for the M2 mean function, while regional non-site-specific weekday effects are found for M1. Both methods can then incorporate predictors or covariates in their parametric mean functions with random coefficients as well as reflect diurnal patterns of variation. M1 proceeds with this in two steps. First, regional time-dependent covariates or predictors are identified for the construction of the design matrix , in his case day-of-the-week bloc effects. Secondly, it estimates hypermeans for this predictor's coefficients as well as for the multivariate bloc vector's responses, in this case the hourly effects. At the same time, it allows site-specific deviations from these baseline estimates through the random mean coefficients. The case study suggests that these random coefficients capture site-specific hourly effects quite well. In contrast, M2 builds regional features and daily variation into its mean response function through the incorporation of mean trends and periodic components before implementation. Thus, its prescribed mean is fairly structured with Fourier components to describe daily 12- and 24-hour cycles. In contrast, M1 incorporates all general trends and diurnal patterns in the hypermean for its random coefficients and then allows site-specific deviations from this hypermean at all sites. The former is more flexible than the latter, in allowing the coefficients to change over time, but the second is more flexible than the first in allowing an arbitrary shape for the daily pattern of variation and allowing site-specific trends.

Both approaches put spatial covariance structures on their mean models as well as on the residuals. In contrast to M2, M1 does not require a nonstationary spatial covariance structure, and the form of the spatial covariance matrix is completely unspecified at level one of the Bayesian hierarchy. This is not important for the Chicago analysis where the spatial ozone field is quite flat, but we believe it would be an important difference between the models in say Los Angeles or Seattle, where M1 would be favored. M2 prescribes its temporal correlation structure through the structure of its mean function, notably a random walk model for its model coefficient vector. In contrast, M1's 24-hour bloc covariance matrix is unspecified at level one of the hierarchical model, leaving the data a big role in determining its form. However, this feature comes at the price of an assumption that the 24 autocovariance matrix is separable from the spatial covariance. Moreover, the covariance is constant over time. Both of these assumptions are limitations of M1.

Both M1 and M2 rely on both autocorrelation as well as temporal correlation for forecasting next day ozone levels. We believe responses will be somewhat autocorrelated from day to day and that feature can be exploited to enhance the forecasting performance. As formulated, M2 does borrow that additional strength, where M1 loses in the way we have implemented is parent, the BSP, by dropping a day to avoid having to formulate a multivariate time series model for the vectors of daily bloc responses. However, this is not strictly necessary. The more general version of the BSP approach does allow for that correlation, and in principle we would have estimated the hyperparameters that approach, suffered the consequences of possible misspecification and increased the computational burden of implementing M1. Thus M1 was formulated under the assumption of uncorrelated responses between days, unlike M2 which makes no such assumption, with the goal of ensuring timely 24 ahead ozone forecasts.

M2 has a much more general parent, in the dynamic linear model (DLM) and undoubtedly other implementations of the DLM could be made that retained its positive features while overcoming some of the limitations of M2 noted above. For example, a nonstationary spatial covariance could undoubtedly be used. As well the random walk model which has serious limitations could be replaced by say a more reasonable model like an AR(1), albeit with an added parameter burden. That would in turn further restrict the number of monitoring sites it could realistically handle in an urban area. As it stands, M1 computational efficiency enables it to handle a much larger number of sites than M2 in an urban area such as the greater Los Angeles area, which has 30 sites well beyond the reach of M2.

Although any ozone forecast for hourly concentrations 24 hours in advance cannot be much better than the baseline estimate, we have included Case 1 for completeness. Its Equation (9) is the basis of the forecast for that case. That equation actually gives a predictive distribution for all the hourly concentrations in Day 121 and could be used to forecast them all. However, the forecasts would not be very good compared to those given by the method in Case 2. The reason is that the latter exploits the strong AR(2) structure in any consecutive sequence of 24 hourly responses, unlike the former which assumes the daily vectors of responses are conditionally independent as an approximation made mainly for computational expediency. Thus for all other hours the forecaster in Case 2 should be used when the data in Day 120 are available. Note that within the Bayesian framework, the unconditional distribution of 24 dimensional response vectors are not independent, a feature that Case 1 exploits.

We have not considered the realistic case where only a limited number of hours of Day 120 data are available. That is because this case would be just a formalistic extension of methods M1 and M2.

Finally, we would emphasize that the results in Section 2.1 have been generalized in that section, another way in which M1 and M2 go beyond the limited application in the Case Study. Moreover, our approach for turning BSP into the temporal forecasting tool in M1 could well be used for any univariate time series. The approach would avoid the need to capture autocorrelation at fine temporal scales, something that can be difficult to do as in the ozone case, where the AR structure varies over the day.

Overall, we have found that for forecasting Chicago's next day ozone concentration levels, M1 would be more practical and more accurate than M2. With its well-calibrated forecast intervals, it seems a promising methodology for practical application.

#### Appendix

#### A. Supplementary Results

##### A.1. Theorem A.1 and Its Proof

Theorem A.1. *Let =, and . Then, one has the following predictive distributions: *(i)*, where
*(ii)* The predictive distribution of , the unobserved response on the day at gauged site , is t-distributed:
where
*

*Proof . *The result is straightforward by Theorem 1, where and ; decompose and as follows

Hence, we have
where , and are given in Theorem A.1.

We have , that is, the unobserved response of the last hour of the day at Gauged Site (). Hence, we have
that is, .

##### A.2. Theorem A.2 and Its Proof

Theorem A.2. *Let , : , where , and , for and . We also let , where . One then has the following predictive distributions: *(i) *where
*(ii)*where , , and are given in (A.16) and , , and , in (A.19). *

*Proof. *(i) The result is straightforward by Theorem 1 where and ;

(ii) denote for . And let . We will have the following results (details can be referred to in [19]):
for . From (i) in Theorem A.2, we have
where

We first decompose , and as follows:

Consequently, we have (a)(b)where
We then decompose , , and as follows:
Hence the predictive distribution of , that is, is given by
where

Therefore, we have
The predictive distribution for has no analytic form.

#### Acknowledgments

This work was partially supported by funding from the Pacific Institute of the Mathematical Sciences as well as the Natural Science and Engineering Research Council of Canada.