We present a detailed evaluation of the seasonal performance of the Community Multiscale Air Quality (CMAQ) modelling system and the PSU/NCAR meteorological model coupled to a new Numerical Emission Model for Air Quality (MNEQA). The combined system simulates air quality at a fine resolution (3 km as horizontal resolution and 1 h as temporal resolution) in north-eastern Spain, where problems of ozone pollution are frequent. An extensive database compiled over two periods, from May to September 2009 and 2010, is used to evaluate meteorological simulations and chemical outputs. Our results indicate that the model accurately reproduces hourly and 1-h and 8-h maximum ozone surface concentrations measured at the air quality stations, as statistical values fall within the EPA and EU recommendations. However, to further improve forecast accuracy, three simple bias-adjustment techniques—mean subtraction (MS), ratio adjustment (RA), and hybrid forecast (HF)—based on 10 days of available comparisons are applied. The results show that the MS technique performed better than RA or HF, although all the bias-adjustment techniques significantly reduce the systematic errors in ozone forecasts.

1. Introduction

As a result of combined emissions of nitrogen oxides and organic compounds, large amounts of ozone are found in the planetary boundary layer. Tropospheric ozone is considered one of the worst pollutants in the lower troposphere. At high concentrations, ozone is toxic to plants and reduces crop yield [13]. Some authors as [4] suggest that the effects on plants of indirect radiative forcing by ozone could contribute more to global warming than direct radiative forcing due to tropospheric ozone does. Ozone is a respiratory irritant to humans, and it damages both natural and man-made materials such as stone, brickwork, and rubber [5]. All these harmful effects are significant in Southern Europe, especially in the western and eastern Mediterranean area [69] as in summer solar radiation that exacerbates the effects of ozone. This is the case in areas of northern Spain located near urban and industrial areas, and especially those lying downwind of such areas where local ozone precursors are absent [10, 11]. Consequently, the environmental benefits of monitoring, quantifying, modeling, and forecasting the dose and exposure of the human population, vegetation, and materials to ozone are an essential precondition for assessing the scale of ozone impact and devising control strategies [12].

In the last three decades, significant progress has been made in air-quality modelling systems; Eulerian grid models [13, 14] represent the most sophisticated class of atmospheric models and they are most often used for problems that are too complex to solve using simple models. With continuing advances, Eulerian-grid modelling is increasingly used in research to assess the air and health impacts of future emission scenarios [15]. Air-quality Eulerian models have become a useful tool for managing and assessing photochemical pollution and they represent a complement that could reduce the often costly activity of air-quality monitoring.

Modelling, however, suffers from a number of limitations as its capacity to simulate atmospheric processes will always be imperfect since model assumptions, data limitations, and an incomplete understanding of physical/chemical processes introduce errors into the simulation results. In addition, since numerical simulations are based on the solution of differential equations (i.e., changes of quantities over time and space), even a perfect numerical model cannot be expected to reproduce observations because the initial state of the atmosphere is imperfectly known and the short-term temporal variability in the driving meteorology and emissions are not accurately quantified [16]. In order to improve air-quality model predictions it is crucial to identify the sources of error; emission data is always an important issue in this task as one of the main uses of air-quality models is to influence action to be taken (via emission abatement measures) to prevent unhealthy levels of air pollution. All these factors demonstrate the necessity to evaluate model results through an extensive validation before they can be used and relied upon [17].

In an attempt to meet such requirements, various studies have been performed in several areas [1820]. Some authors [21, 22] studied photo-oxidant dynamics in the north-western part of the Mediterranean area. In addition, for north-eastern Spain, several studies have evaluated the performance of the model MM5-EMICAT2000-CMAQ using a range of horizontal resolutions, comparing different photochemical mechanisms, and testing the capacity of the model to predict high ozone concentrations during typical summer episodes [2325].

We now report the validation of a new mesoscale air-quality modelling system evaluated in north-eastern Spain, called AQM.cat, although the methodology and the modelling system can be applied to any area. It is important to notice that, although the validation is applied over the same area and using the same meteorological and photochemical models, MM5 and CMAQ, as in previous studies, the validation covers a longer period (5 months in year 2009 and 12 months from May 2010 to the end of April 2011) and the system uses a new emission model, the Numerical Emission Model for Air Quality, MNEQA [26]. This model consists of a highly disaggregated emission inventory of gaseous pollutants and particulate matter. A preliminary version of this air-quality modelling system was evaluated [27], and results showed the capacity of the system to forecast ozone concentrations with sufficient accuracy. Since this validation, the modelling system has been continuously improved: we use higher horizontal resolution than in previous versions (from 9 km to 3 km), in 2010 we changed the value of the vertical coefficient of turbulent diffusivity, , in the CMAQ model [19], and we also updated the emission model. Thus, this study used the most up-to-date version of the model, MNEQAv4.0. In addition, testing that the air-quality model is capable of simulating changes from a given initial state reasonably well, or that the model can be used for regulatory purposes in accordance with [28, 29], we now try to improve forecast accuracy through postprocessing of model results using appropriate bias-adjustment techniques. Utilization of a postprocessing bias-adjustment method incorporates recent model forecasts combined with observations, to adjust current model forecasts. Previous biases in the forecast values are used to estimate the systematic errors in the forecast. Conceptually, once the future bias has been estimated, it can be removed from the forecast; such an adjusted forecast should be statistically more accurate than the forecast based on the raw model output.

It is difficult to justify the use of bias-adjustment techniques to analyse model deficiencies or performance, but it is useful if a simple bias correction can lead to significantly improved forecasts from an operational forecast viewpoint [29]. For several decades, postprocessing of model predictions has been used to improve forecasts of weather-related surface variables such as temperature, dewpoint, or precipitation. One of the principal reasons for this is that despite years of refinement and improvement, meteorological models still contain significant errors [30, 31]. Air-quality modelling systems have even greater errors because they include the meteorological model errors and also the photochemical and emission model errors. Numerous sensitivity studies have indicated the important role that emissions play in air-quality model forecasts [3234], and that highly uncertain emissions are the principal propagator of uncertainty in air-quality modelling systems. In addition to these modelling errors, there are two important issues that are influencing the atmosphere photochemistry, particularly in the Mediterranean areas. The first one is the role of urban and suburban aerosols in ozone formation, as aerosols modify UV solar radiation reaching at the surface and therefore the ozone production. Previous works relative to this topic, as [35], conclude that the reason that the solar UV at the suburban areas is higher than that measured in urban areas could be attributed to the presence of aerosols layer in the planetary boundary layer (PBL). The second issue to consider is the influence of stratospheric ozone on tropospheric ozone; some authors as [36] shows that changes in photodissociation rates induced by changes in ozone column densities may significantly affect lower tropospheric ozone.

In recent years, several studies have been published that focus on bias correction in operational air-quality forecasts [2937]. A second reason for the use of bias-adjustment techniques in air-quality forecasting is that the measurement sites used in the evaluation may not be representative of the forecast concentrations averaged over an area equal to that of a model grid cell [30].

The objectives of this study are to evaluate the MM5-MNEQA-CMAQ air-quality modelling system, certify the efficacy of postprocessing techniques to improve operational ozone forecasts, and analyse and compare several bias-adjustment techniques.

2. Experimental Set-Up and Models

In the following sections we comment on the characteristics of the area studied and the data used for the evaluation of meteorological and photochemical simulations and give a description of the modelling system.

2.1. Area Characteristics and Data Used

The area of study was Catalonia in north-eastern Spain. The population of Catalonia recently reached seven million, most of whom live in and around the city of Barcelona. Catalonia is a Mediterranean area with complex topography, bounded by the Pyrenees to the north and by the Mediterranean Sea to the south and east. From a geographic point of view, the territory can be divided into three different areas. One area runs more or less parallel to the coastline and includes a coastal plain, coastal mountain range, and precoastal depression. The second area is a central depression, and the third area includes the Pyrenean foothills and the Pyrenees Mountains proper (Figure 1). This complex topography induces an extremely complicated flow structure because of the overlap of local mesoscale circulations with the synoptic flow. Modelling results indicate that during summertime, sea-breeze flow and channelling effects due to terrain features strongly influence the dynamics of the circulatory patterns. This therefore affects the location of the pollutant plumes [38], as the main emission sources in the area of study are located on the coast, especially in the Barcelona conurbation and the Tarragona industrial zone. Nitrogen monoxide and dioxide () from the metropolitan area of Barcelona represent 23% of total emissions in the area of study, and 26% of total traffic emissions. Nonmethane volatile organic compounds (NMVOCs) from the industrial area of Tarragona represent 64% of total industrial emissions in the area of study. Biogenic sources are also of great importance near the Mediterranean coast; they represent approximately 30% of the total annual NMVOC emissions in the MNEQA model. These emissions are particularly important during summertime because of the higher temperatures and intense solar radiation, which promote the photochemical build-up of ozone and other pollutants.

Meteorological modelling results were evaluated using a set of 35 surface meteorological stations distributed throughout Catalonia that belong to the Catalan Meteorological Service. The evaluation included wind speed and wind direction measured at 10 m above ground level (a.g.l.), and air temperature and specific humidity measured at 1.5 m a.g.l. The air-quality evaluation, focussed on ozone concentrations, was performed using hourly automatic measurements of ozone (O3) concentration reported by 40 air-quality surface stations (Figure 1) named the XVPCA (Xarxa de Vigilància i Previsió de la Contaminació Atmosfèrica) that belong to the Environmental Department of the regional Catalan authorities. The measurement stations, 40 out of a total of 51, were selected for comparison with the modelling results since they are considered representative of the scale covered by the model and they also cover the whole of the territory considered with an accurate territorial distribution. The XVPCA network includes rural, urban, and suburban stations that provide air-quality data from intensive traffic, and industrial and background areas.

The evaluation covers two main periods, from May to September in both 2009 and 2010, hereafter referred to as summer 2009 and summer 2010. This is because during summertime, maximum ozone concentrations are reached in Southern Europe. However as the model was also run for several months from May 2010 to May 2011, a winter 2011 evaluation is included in order to improve our knowledge of model performance and to understand the reasons behind the biases and, therefore, potential ways of improving the model in the future. Both summer periods are characterized by typical summertime situations with high-pressure conditions favouring the development of thermal circulations, such as mountain winds and sea-breeze circulations (especially in the coastal area) that advect pollutants from the highly urbanized and industrialized coastal zones to the inland regions. The strength of the on-shore flows and the complex topography of the northwest Mediterranean coast produce several pollutant injections due to topographic forcing. As the sea-breeze front advances inland and reaches the mountain ranges located nearly parallel to the coast, topographic injections occur at different altitudes [38, 39]. A number of studies have shown that during summertime, layering and accumulation of pollutants such as ozone and aerosols take place along eastern Spain [40]. These mechanisms, together with the development of the thermal internal boundary layers in the coastal areas [36], are mainly responsible for the high levels of air pollutants observed, resulting in adverse air quality and an increased potential for health problems.

In winter, the meteorological situation is dominated by high-pressure conditions that favour the development of local circulations, especially mountain and valley winds, and by regional wind systems such as the Cierzo (north-western) and Tramontana (northern) winds, that are associated with the blocking effect of the Alps and Pyrenees during Alpine cyclogenesis events [41]. The former situation is associated with stagnant air masses with high levels of air pollutants, especially particulate matter, while the latter situation is associated with low levels of air pollutants.

2.2. Modelling Approach

Meteorological numerical simulations were performed using the PSU/NCAR mesoscale model, MM5 [42], version 3.7. The MM5 model was configured with three nested domains that have grids of 27, 9, and 3 km (Figure 1), with a two-way interface with the smallest grid. The innermost domain, D1, covers grid cells; D2, cells; D3; the inner domain corresponding to Catalonia covers grid cells. The reason for using this relatively small, innermost domain, D1, is that this domain is used by the Catalan Meteorological Service (for his own purposes and for studying air quality). For this domain, four-dimensional data assimilation (FDDA) based on Newtonian relaxation or “nudging” is used, because of its capacity to assimilate a wide variety of observations, especially those that are not direct model state variables (e.g., satellite data, radar, etc.). The output of this domain is integrated into the air-quality system in order to increase the accuracy of the predictions. Initial and boundary conditions for domain D1 are updated every six hours with data from the European Centre for Medium-range Weather Forecast global model (ECMWF) with a resolution. The boundary layer processes are calculated using the MRF scheme based on [43]; the Grell scheme [44] is used for cumulus parameterization, while the microphysics is parameterized using the Schultz scheme [45]. For the land surface scheme, a five-layer soil model is applied in which the temperature is predicted using the vertical diffusion equation for layers that are 0.01, 0.02, 0.04, 0.08, and 0.16 m below the surface layer, with the assumption of a fixed substrate [46]. Solar radiation is parameterized using the cloud-radiation scheme [47]. The vertical resolution includes 32 levels, 20 below (approximately) 1500 m, with the first level at approximately 15 m and the domain top at about 100 hPa. The distribution of the vertical layers, with higher resolution in the lower levels, is a common practice [1848]. The temporal length of meteorological simulations was 48 h and they were performed for each day of each period. A one-hour time-step resolution is used in all domains and models. MM5 hourly output files are processed using the Meteorology-Chemistry Interface Processor (MCIP) version 3.2 for the CMAQ model.

The photochemical model used in this study to simulate pollutant dispersion is the US Environmental Protection Agency (EPA) model-3/CMAQ model [49]. This model, supported by the US EPA, undergoes continuous development and consists of a suite of programs for conducting air-quality model simulations. The CMAQ version 4.6 simulations use the CB-05 chemical mechanism and associated EBI solver [50] including the gas-phase reactions involving nitrogen pentoxide (N2O5) and water (H2O), and it removes obsolete mechanism combinations in the three phases: gas, aerosol (solid or liquid), and aqueous. In addition to these changes, we use version 4.6 which includes modifications in the aerosol module, AERO4 [51, 52] with a preliminary treatment of sea salt emissions and chemistry. For treating clouds in the model, we use the asymmetric convective module, ACM [53]. Additional details regarding the latest release of CMAQ can be found on the Community Modelling and Analysis System (CMAS) [54]. The CMAQ model uses the same configuration as the MM5 simulation. Boundary conditions and initial values for domain D1 come from a vertical profile supplied by CMAQ itself, while boundary and initial conditions for domains D2 and D3 are supplied by domain D1 and D2, respectively. The model is executed for 48 h, taking the first 24 h as spin-up time to minimise the effects of initial conditions.

MNEQA is an emissions model devised by the Mesoscale and Microscale Atmospheric Modelling and Research Group (MAiR) of the Department of Astronomy and Meteorology at the University of Barcelona [25]. In this paper we use the new version 4.0 of the model which is applied to the area of Catalonia, but few modifications should be required to use the model in other regions, with adequate use of local data. MNEQA takes into account different emission sources and distinguishes between surface and elevated sources. Related to the MNEQA vertical structure, we consider only 9 levels, because all emissions are released into the atmosphere at a height below approximately 310 m, including those coming from the major point source stacks. For MM5 and CMAQ, we consider 32 levels, 20 below (approximately) 1500 m, with the first level at approximately 15 m and the top of the domain at about 100 hPa. MNEQA includes emissions from natural sources, such as dust particles from erosion and hydrocarbons emitted by vegetation, as well as several anthropogenic sources, such as traffic, industry and residential consumption. Chemical speciation is computed in the model using [55]. This procedure will generate some errors, but for some mobile sources the accurate information is not available in Europe; for the emission factor we have used the European guidelines, which should be adequate for gas and aerosol mechanisms in CMAQ. Monthly and weekly profiles from [56] are applied to determine the value of an emission for each month and day of the year.

Nested domains are commonly applied to air-quality modelling systems because the constituent meteorological, emissions, and photochemistry models must deal with grid variability and various domain ranges. As a result of the variability in spatial resolution, the MNEQA method differs from one domain to another. For smaller domains such as D3, MNEQA uses a bottom-up method to calculate pollutant emissions. This involves working on each type of source in a particular way using local information. MNEQA incorporates an industrial emissions inventory from the Catalan Environmental Department. Traffic and residential consumption emissions are calculated by emission factors using different traffic and residential parameters (length of roads, average speeds on roads, and vehicle type distribution and population) [25]. Natural emissions are computed in MNEQA using different parameterizations, and several MM5 model outputs. To incorporate particles from dust erosion, MNEQA uses the windblown dust model from [57, 58], whilst biogenic emissions are incorporated via the method described by [59]. For larger domains (D1 and D2), MNEQA uses a top-down method, which incorporates into the model pollutant emissions from [60]. Europe and a small section of North Africa are covered by the EMEP domain, with a 50 × 50 km2 grid resolution. Emissions are computed from national data referring to 11 different sectors (combustion plants, production processes, solvents, waste treatment, agriculture, etc.) and seven pollutants (CO, NH3, NMVOC, , PM2.5, and PMcoarse). The top-down method consists of a disaggregation model based on soil uses of Corine Land Class 2000, with 250 m resolution, coupled with different statistical functions, including socioeconomic variables [61]. We associate EMEP sectors with CLC2000 soil uses to allocate geographical distributions to emissions with horizontal resolutions of 9 km (D2) and 27 km (D1). Statistical functions are used as weighting functions to accurately distribute an emission value over the different grid cells. Finally, the emissions are merged for every grid cell using a geographical information system (GIS), because the photochemical model does not distinguish between the various types of sources; all that the CMAQ model requires is one emission value for each grid cell, each time step, and each chemical species. MNEQAv4.0 is executed daily to calculate hourly emissions for 24- and 48-hour simulations.

3. Statistical Air-Quality Modelling SystemEvaluation

As an air-quality modelling system is a conjunction of three models: meteorological, photochemical, and emission, and since the last of these has already been compared with other emission models [25], in this section the results of the MM5 meteorological model and CMAQ photochemical model will be evaluated using the classic approach; we compare discrete values corresponding to surface measurements and model results. The statistics selected to drive the evaluation are those recommended by [1662]. In order to establish a criterion for evaluating the performance of the meteorological model, but not to establish a rigid criterion for model acceptance or rejection (i.e., not a pass/fail test), we have adopted the benchmarks suggested by [63, 64] and recommended by [16], in the knowledge that these benchmarks were originally proposed for 24 h averages over a whole year. The mathematical definition and the associated benchmarks are summarized in Table 1.

3.1. Evaluation of the Meteorological Model Performance

As mentioned before, the model results were evaluated by comparison with data from a set of surface meteorological stations distributed over Catalonia. The root mean square error (RMSE), the mean absolute gross error (MAGE), the mean bias (MB), and the index of agreement for the meteorological parameters were calculated for hourly data provided by both the model and meteorological station observations (see Table 1 for definition) resulting in a daily statistical value (Figure 2), a statistical value for the whole period (Table 2), and a hourly statistical value (Figure 3). Wind statistics and wind direction are calculated for wind speeds higher than 0.5 m s−1, as wind direction is not reliable for lower speeds. The computation of statistical parameters is straightforward for wind speed, specific humidity, and temperature, but the circular nature of wind direction makes it difficult to obtain the corresponding statistics. To avoid this problem we used a modified wind direction, wherein 360° was either added to or subtracted from the predicted value to minimise the absolute difference between the observed and predicted wind directions [3865]. For example, if the prediction is 10° and the corresponding observation is 340°, then a predicted value of 370° is used.

To evaluate the accuracy of the meteorological predictions of the air-quality modelling system, in Table 2 global statistical values for the whole period are compared with the benchmarks in Table 1 recommended by [1662] Denby. The results show that statistical values corresponding to specific humidity and wind speed fall within recommendations or are very close to them. Wind speed is slightly overestimated by the model, while wind direction statistics show a slightly greater dispersion, with 69.35° (2009) and 70.90° (2010) as MAGE values. This characteristic is associated with the complex topography and it is reproduced by several authors working in this area with different meteorological models [2266]. For temperature, the underestimation (−0.88 K and −0.56 K) is higher than that suggested (−0.50 K), but MAGE and IOA values fall within the recommendations. There are some reasons for this slight underestimation; we may require a more accurate representation of surface properties, mainly soil moisture which is too high during summertime in the simulated area (personal communication) and therefore leads to an underestimation of surface heat fluxes.

Figures 2 and 3 show the daily and hourly evolution, respectively, of RMSE and MB values corresponding to wind speed, wind direction, temperature, and specific humidity for the period studied. Wind speed (Figure 2(a)) shows an RMSE of between 1 and 3 m s−1 and an MB of between 0 and 2 m s−1 for the whole period. As we commented in Section 2.1, typical summer weather conditions are anticyclonic with a slight pressure gradient favouring the development of mesoscale circulations such as a sea-breeze regime on the coast and mountain winds inland. The wind speed associated with these circulation patterns is reproduced quite well by the model, although it tends to overestimate wind speed, only slightly during the day but more at night, with values of MB between 0.5 and 1.0 m s−1 (Figure 3(a)). Figure 2(b) shows the daily evolution of RMSE and MB values corresponding to wind direction, with values ranging between 60° and 90°, and −10° and 10°, respectively, whilst Figure 3(b) shows the hourly evolution of RMSE and MB for the same variable. During the diurnal hours, RMSE values decrease strongly indicating that the model performs well; however, at night, the model does not reproduce very weak winds accurately [48], with the RMSE values increasing to 80–90 degrees. There is a large difference between MB and MAGE values for wind direction. The reason is because the mean error takes into account the sign of the forecasting error for long periods because positive and negative deviations compensate each other, however, the mean absolute error solves the aggregation problem of the mean error. Positive and negative deviations do not compensate each other. Therefore the mean absolute error can be used to evaluate the sum of single deviations. The poor performance and uncertainties in wind direction could be attributed to the complex topography of the area studied, to random turbulent processes that cannot be simulated by the models, or to subgrid variations in terrain and land use. It is therefore unlikely that these errors can be reduced much further. For air temperature, daily evolutions of RMSE and MB are presented in Figure 2(c). For most of the period studied, the RMSE is between 2 and 3 degrees, while the MB is between 0 and −1.5 degrees. These results highlight the tendency of the MM5 model to underestimate the air temperature at a height of 1.5 m during diurnal hours (Figure 3(c)). At night, the model slightly overestimates the temperature. This results in a negative MB value for the whole period. Finally, Figures 2(d) and 3(d) show daily and hourly evolutions of RMSE and MB for specific humidity. Daily RMSE is between 1 and 2.5 g kg−1 while MB values range from −1 to 1 g kg−1. Similarly to the case of wind direction and temperature evaluation, the model reproduces specific humidity better during daytime than during nighttime.

The performance of the meteorological model agrees with several previous studies of meteorological applications for air-quality modelling [18, 19], especially those based on the area of study [22], where classical statistics for surface fields have been reported. In comparison with previous results of this air-quality modelling [26], the use of higher horizontal resolution yields better results, especially for temperature and wind direction. Meteorological predictions over complex terrain require fine horizontal and vertical resolution for resolving complex mesoscale circulation patterns, especially during nighttime. It is known that during the day, under weak synoptic-scale conditions, the lower part of the atmosphere (the atmospheric boundary layer) is mainly governed by thermal effects; buoyancy is the dominant mechanism driving turbulence, which is assumed to be correctly described by existing similarity theories in the model. In contrast, during the night, turbulent mixing is greatly reduced or even completely suppressed, or it becomes intermittent. Various observations and studies [6769] have revealed a wide variety of nocturnal boundary layer situations, with sporadic and intermittent turbulence. In addition, this stable stratification in a nonuniform terrain induces local circulations, such as drainage flows [9], and leads to several phenomena such as gravity waves, density currents [70], intrusions, and meandering, with the frequent presence of low-level jets [71]. The misrepresentation of these effects can lead to incorrect estimates of vertical turbulent transport in the nocturnal boundary layer, resulting in an erroneous amount of exchange of heat and momentum over wide areas of the planet [48]. If we focus on the mesoscale, namely, the basin scale, as in this study, an incorrect treatment of the stable boundary layer can produce errors of several degrees in the temperature at 1.5 m and moisture levels that are too high or too low and can lead to 100% errors in the wind speed or direction if the local physiographical features are not well represented in the model [48].

3.2. Evaluation of the Photochemical Model

Statistical metrics for photochemical model performance assessment are calculated for surface ozone concentrations measured at 40 measurement sites in the 3 × 3 km2 modelling domain. Measurement stations are listed in Table 3 which shows their emplacement as well as the average observed ozone concentration values versus the simulated values for the summers of 2009 and 2010. The results indicate that the model has a tendency to underestimate the ground-level ozone concentration at different stations. In addition, if we plot the time evolution of average hourly ozone concentrations provided by the air-quality model at the 40 air-quality stations for the summer 2009 and 2010 periods (Figure 4), we can see that the model overestimates the ozone concentration during nighttime and underestimates it during the day, which is more significant in summer 2009 than in 2010. Observed and modelled values show quite significant correlation, with coefficient correlations of 0.733 and 0.637 for 2009 and 2010, respectively.

The diurnal underestimation could be caused by the following: sea-breeze and land-breeze circulations that are difficult to reproduce, associated with the important role of circulation patterns in photochemical simulations [72]; a tendency to underpredict ozone precursors (nitrogen oxides, carbon monoxide, and volatile organic compounds) in air-quality modelling systems [73]; the uncertainty in the emissions coming from the MNEQA model and in its temporal distribution; underestimation of temperature in the MM5 model [74], which is more important in summer heat episodes. Three main sources of nocturnal overestimation could be as follows: the model does not represent nocturnal physicochemical processes accurately enough [23]; the MNEQA emission model may not calculate nighttime emissions properly; meteorological parameters, such as wind speed, wind direction, and vertical mixing, are not well reproduced by the model when the synoptic forcing is weak and the ambient winds are light and variable [4875]. This nocturnal overestimation is reduced in summer 2010, probably due to several different reasons, such as the following: changes in the minimum value of the vertical coefficient of turbulent diffusivity, , from a constant value of 1 m2 s−1 (summer 2009) to a variable value depending on the urban fraction, from 0.5 m2 s−1 for rural areas to 2 m2 s−1 for urban areas; emissions coming from the MNEQA model are updated. This improvement consists of using a top-down methodology, as commented on in Section 2.2, in order to include emissions from different sectors such as extraction and distribution of fossil fuels and geothermal energy, solvent and other product use, road transport, waste treatment and disposal, agriculture, and shipping emissions. These sectors were not previously considered in the inner domain, as we did not have this information which was not required for the use of a bottom-up methodology.

The US Environmental Protection Agency [28] developed guidelines indicating that it is inappropriate to establish a rigid criterion for model acceptance or rejection (i.e., no pass/fail test). However, building on past ozone modelling applications [27] a common range of values for bias, error, and accuracy has been established. The three multisite metrics used are the unpaired peak prediction peak accuracy (UPA), the mean normalised bias error (MNBE), and the mean normalised gross error (MNGE). These statistics and the accepted EPA criteria [27] are showed in Table 1. The statistical parameters are applied to hourly and to both 1-hour (1 h) and 8-hour (8 h) maximum ozone surface concentrations. Observation-prediction pairs were often excluded from the analysis if the observed concentration was below a certain cut-off; the cut-off levels varied from study to study but often a level of 60 ppb was used [73], thus removing the influence of low concentrations such as nighttime values. To further analyse the importance of the cut-off values, in Table 4 we analyse the degree to which the model lies within the recommended statistics using different cut-off values (60, 50, 40, 30, and 20 μg m−3).

It can be seen that the CMAQ model meets the US EPA-recommended criteria (5–15% for the MNBE and 30–35% for the MNGE) for acceptable model performance when the statistics are computed for cut-off levels from 20 to 60 μg m−3 for hourly and 1 h and 8 h maximum concentrations during both summer 2009 and 2010 periods. There is an exception for MNBE and for MNGE hourly values which are slightly out of range from 60 and 20 μg m−3 cut-off, respectively. MNBE and MNGE values are similar for the different cut-off values with a slight tendency for MNGE values to decrease as the cut-off value increases. For MNBE, the tendency is the opposite; it increases as the cut-off value increases. We also see that for 20 to 60 μg m−3 cut-off values, the MB is negative but increases its absolute values as the cut-off value increases, indicating that the model increases its underprediction; the number of points (stations) where the model underestimates ozone concentration increases.

In order to adopt a cut-off suitable for air-quality evaluation and taking into account that the main objective of this study is to analyse the performance of the model at forecasting 1 h and 8 h maximum ozone concentrations, we adopted 60 μg m−3 as the cut-off value. This value is applied to all the data pairs used in the statistical analysis, with the following exceptions: (i) data pairs corresponding to average summer observed and simulated ozone concentrations for the stations presented in Table 3 and those corresponding to Figures 4 and 5 showing the time evolution of average hourly ozone concentrations provided by the air-quality model and bias-corrected forecast at the 40 air-quality stations for the summer 2009 and 2010 periods; in both cases no cut-off value has been applied; (ii) data pairs corresponding to the evaluation of winter 2011, in which case we adopt 35 μg m−3 as a cut-off value.

Thus, adopting the cut-off value of 60 μg m−3, in the following paragraph we will extend the discussion of the last column of Table 4 as it corresponds to this cut-off. Statistical parameters indicate that the model shows a tendency to underestimate ground-level ozone concentration, as MB and MNBE values are negative for hourly and 1 h and 8 h maximum surface ozone concentrations. During summer 2009, and for hourly values, MNBE falls just outside the recommended criterion; it is −15.22%, whilst the other statistics fall within the recommended values. For 1 h and 8 h maximum surface ozone concentrations, all the statistics are within the recommended values. During summer 2010, the model behaviour is similar, with an MNBE hourly value of −19.27%, just outside the recommended criterion, with MNGE, UPA, and IOA values within the EPA recommendations. The performance of the air-quality modelling system agrees with some previous results on air-quality modelling in the area studied [2224], although those studies used different meteorological and emission models and different horizontal and vertical resolution, and the air-quality model was evaluated over a different period, the year 2004.

Model evaluation for winter 2010 (see Table 5) shows a quite different behaviour. Statistical parameters indicate that the model shows a tendency to overestimate ground-level ozone concentrations, as MB and MNBE values are positive for hourly and 1 h and 8 h maximum surface ozone concentrations. In this case, all the values are within the recommendations except the MNBE for 1 h maximum value of surface ozone concentration, which is 17.85%. The comparison between the results shown in Tables 4 and 5 indicates that the model underestimates ozone concentrations during summertime and overestimates them, although to a lesser extent, during winter. The reason for this behaviour can be attributed to different causes, some of which are strongly controlled by meteorological factors such as solar radiation and temperature, and recirculations at the subgrid scale which are not well simulated by the meteorological model, and some others related to a more pronounced seasonal variation of emissions which is not well reproduced by the emission model. However, at present, we do not have sufficient information on how to account for this summer-winter variation and further work will be necessary.

To explore forecast performance as a function of the measurement emplacement, we classified the 40 monitoring stations as rural, urban, or suburban. A monitor location is considered representative of an urban measurement when, within an area of 1 km2, at least 90% of the terrain is urbanized; it is considered rural when, within an area of 100 km2, at most 10% of the terrain is urbanized; it is considered suburban when neither the urban nor the rural definition applies [76].

Table 6 shows the results of the statistical parameters MNBE, MNGE, and IOA as a function of measurement emplacement. We chose MNBE and MNGE values to estimate the bias and gross error of the air-quality forecast because we can compare these values with EPA recommendations. We complete the table with IOA because this parameter provides a measure of the agreement and accuracy of the forecast.

Our results show that for MNBE values, the best result is found for 1 h maximum ozone concentrations during summer 2009 in urban areas, and the worst for hourly values during summer 2010 in urban areas. For MNGE values, the best result is found for 8 h maximum ozone concentration during summer 2010 in rural areas, and the worst for hourly values during summer 2010 in urban areas. Finally, for IOA values the best results, similarly, are found for 8 h maximum ozone concentration in all areas during both summers and the worst for hourly values at urban sites during summer 2010. We can conclude that the worst results are found for hourly values in urban areas. This characteristic is mainly due to the important variability of traffic emissions, which is difficult to reflect in the emission model, and the poor representations of the nocturnal physicochemical processes in the photochemical model [24].

3.2.1. Random and Systematic Error

As well as the statistical validation, the random and systematic root mean square errors, and , defined in expressions (1) and (2) were computed in order to evaluate the intrinsic error in the model and the random error [77]. One has and values are modelled and observed concentrations, respectively; and are the least squares regression coefficients derived from the linear regression between and is the total number of model/observation pairs. These new measurements help to identify the sources or types of error, which can be of considerable help in refining a model. The represents the portion of the error that is attributable to systematic model errors; the represents random errors in the model or model inputs that are less easily addressed. For a good model, the random portion of the RMSE is much larger than the systematic portion, whereas a high systematic value indicates a poor model.

Results are given in Table 7 for each summer, 2009 and 2010. For the case studied, results for 1 h and 8 h peak surface ozone concentrations show that systematic error values are higher than random ones, which implies that the air-quality system still has to be improved and refined. To analyse these results more thoroughly, we plan to carry out and expand detailed analysis to identify the key factors that influence these prediction biases, such as sensitivity to synoptic conditions, difference between meteorological models, emissions model adjustments, and the influence of boundary conditions on CMAQ simulations over Catalonia.

3.2.2. Modelling Quality Objectives for Ozone “Uncertainty” Defined by Directive EC/2008/50

In 2008, a new European air-quality directive was ratified by the European Parliament [78]. This directive replaced earlier directives with the intention of simplifying and streamlining reporting. Whilst previous directives largely based on assessments and reporting measurement data, this new directive places more emphasis on the use of models to assess air-quality within zones and agglomerations. The increased focus on modelling allows the Member States more flexibility in reporting assessment and the potential to reduce the cost of air quality monitoring. However, modelling, like monitoring, requires expert implementation and interpretation. Models must also be verified and validated before they can be confidently used for air-quality assessment or management [16].

The quality objectives for a model are given as a percentage uncertainty. Uncertainty is then further defined in the directive as follows: “The uncertainty for modelling is defined as the maximum deviation of the measured and calculated concentration levels for 90% of individual monitoring points, over the period considered, by the limit value (or target value in the case of ozone), without taking into account the timing of the events. The uncertainty for modelling shall be interpreted as being applicable in the region of the appropriate limit value (or target value in the case of ozone). The fixed measurements that have to be selected for comparison with modelling results shall be representative of the scale covered by the model.”

Since values of the uncertainty are to be calculated, a mathematical formula would have made the meaning much clearer. As such, the term “model uncertainty” remains open to interpretation. Despite this, [16] suggests that it should be called the relative directive error (RDE) and defines it mathematically at a single station as follows: where is the closest observed concentration to the limit value (LV) concentration, or the target value for ozone, and is the correspondingly ranked modelled concentration. The maximum of this value found at 90% of the available stations is then the maximum relative directive error (MRDE), whose value recommended in the European Directive EC/2008/50 is 50%. MRDE values calculated with the AQM.cat model are 42% in 2009 and 43% in 2010, within the regulatory framework recommended in the European Directive EC/2008/50.

3.2.3. Bias-Adjustment Techniques

As we comment in the Introduction, in recent years several studies have been published that focus on the application of bias-correction methods to operational air-quality forecasting. A range from highly sophisticated mathematical algorithms to simple algorithms has been studied and evaluated. Among them, a sophisticated technique that has been used in recent years is the Kalman Filter (KF) predictor forecast method [79], described in detail in [80]. This technique has been shown to improve the accuracy of forecasts and represents the current state of the art in bias-adjustment techniques. Even so, techniques based on very simple mathematical algorithms have been incorporated into air-quality modelling systems, showing similar performance to that achieved with the KF technique in improving the accuracy of forecast models by reducing forecast errors [2931]. For this reason, in order to improve the AQM.cat ozone forecasts and to evaluate the performance of bias-adjustment techniques for future operational ozone forecasts, three simple bias-correction approaches are applied and analysed. However, we do not rule out the possibility of incorporating the KF technique into the air-quality modelling system in future work.

The first approach, hereafter referred to as mean subtraction (MS) [29], is the standard and most-used bias correction in meteorological analysis. The second approach, hereafter referred to as multiplicative ratio adjustment (RA) [29], is an alternative correction that may be more suitable for ozone predictions since corrected mixing ratios will always be a nonnegative technique, and the third approach applied is the hybrid forecast, hereafter referred to as (HF) [31].

The bias-correction algorithms are applied to hourly and 1 h maximum surface ozone concentrations. The mathematical formulation of each technique is presented and commented below.

The MS method is based on an additive correction to correct the model forecast, , given in where corresponds to each ozone monitoring site and to the hourly value. The correction given by expression (7) is calculated as an evaluation of the systematic error within the model, , and the observations, , over a number of days (). The parameter is taken as 10 days in this study. The selection of a 10-day period for the calculation of the running mean bias correction is based on the results of [30]. One has The second method considered is the multiplicative RA bias correction. The mathematical formulation of this method is given by expression (8). It is based on the application of a correction factor over the forecast model. This coefficient is calculated as the quotient between the additions of observed and modelled values over the last 10 days. One has The HF approach is based on the simple assumption that the model is capable of predicting the change in the pollutant mixing ratio from one day to the next, due to changes in the synoptic or large-scale forcing [31]. In this way, the forecast value can be improved by combining the observed value at the previous time with the forecast model change from the previous to current time. The bias-adjusted hybrid forecast for a future time can be represented as where are observations at time , and and are modelled ozone forecast values at time and , respectively. Here is 24 h.

Corrected mixing ratios, using bias-adjustment techniques, should be nonnegative, but this is not always guaranteed with the mean subtraction or the hybrid forecast. To solve this problem and avoid negative ozone concentrations using MS or HF methods we applied a filter based in excluding corrections higher than a prescribed value which is a 50% of the value forecasted by the model. This filter is applied in a very few occasions, as usually corrections are a small fraction of the forecasted values.

The impact of the application of the bias-correction techniques to surface ozone concentrations forecast by the model is to significantly improve the ozone concentration results (see Figure 5).

Concerning the statistical parameter values, MNBE, MNGE, and UPA are within the EPA recommendations (see Tables 8 and 9). In this way, hourly surface ozone concentration underestimation is reduced by between 9% and 15%, while 1 h and 8 h maximum underestimation is reduced by between 12% and 25%. In addition, the performance changes to a slight overestimation. However, in some cases the application of the bias-correction techniques makes the forecast worse or the errors are very similar. As an example, we observe that IOA is reduced by 14% or 8% when RA is applied to 1 h and 8 h maximum ozone surface concentrations in summer 2010. The reason for this behaviour is that the RA mathematical algorithm increases the width of the concentration distribution around the mean value and therefore increases the standard deviation as can be seen in Table 10. The RA method reproduces very high levels of ozone concentration, but this forecast does not always provide realistic values. In the operational daily forecasts we observed situations where the quotient between observations and model forecasts is greater than 1.25, which indicates that in these situations the bias-corrected forecast does not provide good results and the correction is excluded from the analysis.

Histograms of model forecast errors (bias error) of the daily 1 h and 8 h maximum surface ozone concentrations are shown in Figure 6, with and without applying bias-correction techniques. The negative bias of the air-quality modelling system predicting ozone concentrations is observed as a shift of the peak in the distribution toward negative values, while the distributions shift the peak close to zero when the model forecast is corrected. In some cases the bias-correction techniques enlarge the height of the distribution peak, therefore increasing the number of forecasts with bias close to zero and reducing the width of the distribution with the consequent decrease of the standard deviation, . To evaluate the shift in the peak and the width of the distribution when we apply bias-correction techniques to air-quality modelling system outputs, we adjust the distribution to a Gaussian function. The results of this adjustment are presented in Table 10.

The quantitative results shown in Tables 8 and 9 and the histograms in Figure 6 demonstrate that MS is the technique that leads to the greatest adjustment. In addition, the operational daily forecast contributes to some conclusions regarding the accuracy of each technique in different situations. MS and RA provided better results than HF when the area studied was dominated by stagnant meteorological situations, prevailing mesoscale circulations such as a sea breeze, which are mainly associated with high ozone episodes. However, the mathematical formulation of MS produces distributions with lower standard deviations if we compare them with those corresponding to RA. Otherwise, HF provides the best results when it is applied to meteorological conditions characterized by changes in the synoptic or large-scale forcing, assuming that the model is capable of predicting the change in the pollutant mixing ratio from one day to the next.

In relation to the MRDE, the application of bias-correction techniques also improves air-quality forecasts, reducing the MRDE value significantly (Table 11); the HF technique minimises this value. For this reason, the air-quality modelling system presented in this paper can be used for the aims the directive considers.

4. Conclusions

This paper describes the evaluation of a coupled regional air-quality modelling system used to simulate ozone over the north-western Mediterranean area (Catalonia) during two periods from May to September in both 2009 and 2010. The air-quality modelling system consists of the MM5 mesoscale model, the MNEQA emission model, and the CMAQ photochemical model. Although the same meteorological and photochemical models have been applied in Catalonia in the past, they have been evaluated either over shorter periods or using a coarser horizontal resolution. The meteorological and photochemical forecasts are compared with observations from 35 surface meteorological stations belonging to the Catalan Meteorological Service and 40 air-quality surface stations of the Environmental Department of the Catalan regional authorities (Spain).

This study demonstrates the capacity of the air-quality modelling system MM5-MNEQA-CMAQ to forecast ozone concentrations with sufficient accuracy, as the statistics fall within the recommended EPA and European performance goals. Daytime forecasts for hourly and 1 h and 8 h maximum surface ozone concentrations indicate satisfactory behaviour of the model for different cut-off values, from 20 to 60 μg m−3; however, during summer periods, modelled ozone concentrations are underestimated during the day while during the night they are overestimated. This nocturnal overestimation is partially reduced in summer 2010, probably due to several improvements, such as the following: changes in the minimum value of the vertical coefficient of turbulent diffusivity, , from a constant value of 1 m2 s−1 to a variable value depending on the urban fraction; emissions coming from the MNEQA model are improved, updating the values of those coming from biogenic, traffic, and industrial sources. However, despite of these improvements, the air quality model still presents some differences between measurements. The reasons for all these shortfalls could be associated with the uncertainty of the emissions model, mainly during nighttime and in urban areas; the poor representations of the nocturnal physicochemical processes in the photochemical model; the failure of the meteorological model to reproduce certain parameters, such as wind direction and air temperature, which is underestimated, mainly during the daytime; finally, some factors of uncertainty as the modification of UV solar radiation reaching at the surface due to the presence of urban and suburban aerosols layers in the ABL [35], and changes in stratospheric ozone which could influence tropospheric ozone concentrations [36]. For the winter period analysed, statistical parameters indicate that the model shows a tendency to overestimate hourly and 1 h and 8 h maximum surface ozone concentrations, which could be attributed mostly to meteorological and emission factors. Surface ozone concentration forecasts are more accurate in urban and rural areas for 1 h and 8 h maximum ozone concentrations, whilst the worst forecasts are of hourly values during summertime in urban areas. Results from systematic and random errors show that, although the air-quality model forecasts meet the EPA guidelines and the European directive, systematic error values are higher than random ones, which implies that the air-quality system still has to be improved and refined.

Assuming, as tested, that the air-quality model is capable of simulating the changes from a given initial state reasonably well, one method for reducing model errors (including the uncertainty of the meteorological, emission and photochemical models, and the fact that measurement sites used in the evaluation may not be representative of the forecast concentrations averaged over an area equal to that of a model grid cell) is to apply postprocessing bias-adjustment techniques to the modelling system outputs. In this study, three methods, MS, RA, and HF, are coupled and applied to the air-quality modelling system. Results reveal a significant improvement in the statistical parameters considered in the operational ozone forecast evaluation. The analysis of each bias-correction technique applied to the air-quality modelling system outputs concludes that the MS method provides the best forecasts and minimises the width of the concentration distribution. MS and RA provide better results in stagnant meteorological situations, while HF better reproduces situations when it is applied to meteorological conditions characterized by changes in the synoptic or large-scale forcing, assuming that the model is capable of predicting the change in the pollutant mixing ratio from one day to the next. Related to this technique corrected forecasts fail when the quotient between observations and model forecasts exceed a determined value.

The final performance of bias-adjusted forecast techniques depends on the performance of the model to which the bias-adjusted technique is applied. Since bias-adjusted techniques can only reduce systematic errors inherent in the model, additional improvements in the model physics and emissions inventory and chemistry are needed to reduce both systematic and random model errors to further improve forecast performance.


This research was supported by the Spanish Government through the project CICYT CGL 2009-12797-C03-02. The authors gratefully acknowledge the assistance of technicians at the Catalan Environmental Agency for providing information regarding the emissions inventory and air-quality measurements. Thanks are extended to the Catalan Meteorological Service for providing the initial and boundary meteorological fields for executing MM5.