Abstract

As a result of considerable changes in rural areas in Northern Thailand, the frequency and intensity of haze outbreaks from particulate pollution, particularly fine particulate matter (PM2.5), have increased in this region. To supplement ground-based monitoring where PM2.5 observations are limited, this study applied multivariate linear regression to predict PM2.5 concentrations in 2020. The predictors were aerosol optical depth (AOD); the meteorological parameters wind velocity, temperature, and relative humidity; and the gaseous pollutants SO2, NO2, CO, and O3, all from ground-based measurements at three locations in Northern Thailand: Chiang Mai, Lampang, and Nan provinces. Two multivariate linear regression models were developed. The first (model 1) is a generic model with AOD, temperature, relative humidity, and wind speed as predictors. The second (model 2) adds the gaseous pollutants SO2, NO2, CO, and O3. In general, the regression models, fitted to hourly 2020 data from the three provinces, adequately characterized PM2.5 concentrations. Model 2 performed well in predicting PM2.5 concentrations at Chiang Mai (R2 = 0.52) and Lampang (R2 = 0.60). It also improved the prediction of PM2.5 concentration relative to model 1 in both the wet and dry seasons. However, model uncertainties remain, and these provide a foundation for further study.

1. Introduction

Northern Thailand is a region with frequent air pollution problems from biomass burning, particularly at the beginning of the year, from January to April [1, 2]. Moreover, severe haze events in this region caused by particulate pollution have become more intense and frequent in recent years. Widespread biomass-burning emissions, combined with human activities that emit particulate pollutants, are important contributors to air quality problems in this region [3–5]. In addition, certain meteorological and topographic effects produce conditions favorable to the accumulation of air pollution in Northern Thailand.

As rural settlement in Northern Thailand changes under urban pressure, the frequency and intensity of haze outbreaks from particulate pollution, particularly fine particulate matter (PM2.5), have increased in this region [6, 7]. Several factors exacerbate pollution and smoke [8]. These include the burning of stubble in preparation for the coming rains and rice planting, and weather conditions that favor the accumulation of particulate matter, as in the Chiang Mai basin. Narrow valleys that form basins in which air pollution is trapped also contribute. Moreover, air pollution in Northern Thailand originates not only from domestic sources but also from long-range transport influenced by meteorological factors such as wind, temperature, and humidity [2]. According to AirVisual data, at the beginning of 2020 the measured PM2.5 level in Chiang Mai, one of the largest cities in Northern Thailand, reached 330 over one weekend, while levels across the region ranged from 100 to 390.

Collecting PM2.5 data is the first stage of air quality monitoring. Precise PM2.5 concentrations are measured with a dedicated sampler that collects particles on a filter membrane. However, many PM2.5 sampling and monitoring facilities examine only a few critical sampling points, which cannot accurately represent conditions across the full study area. A lack of ground-monitoring stations and sampling locations for spatial interpolation may therefore introduce errors into the monitoring data [9]. To fill this information gap, researchers are widely investigating methods that estimate particulate matter at a given scale in locations with few monitors. Approaches to estimating PM2.5 fall into two types: ground-level monitor-based estimation and satellite-based estimation [10, 11]. Common ground-level monitor-based approaches include (a) land use regression (LUR) models, (b) generalized additive mixed models, (c) hierarchical models, and (d) geostatistical interpolation. Satellite-based (monitor-free) approaches rely on remote sensing techniques [11]. Satellite remote sensing of aerosol properties, such as aerosol optical depth (AOD) retrieved from the Moderate Resolution Imaging Spectroradiometer (MODIS), is increasingly applied to air quality monitoring [12]. AOD depends on the radiation wavelength, aerosol size, vertical profile, and particle size distribution. In the visible and near-infrared bands, the particle sizes to which AOD retrievals are sensitive range from 0.1 to 2 µm, which is very close to the particle size range of PM2.5.

Meanwhile, many methods have been developed to estimate surface PM2.5 from satellite-measured AOD, including simple linear regression [13], multivariable statistical methods [14], mixed-effects models [15], and chemical transport models. The goal is to establish a quantitative relationship between satellite-measured AOD and PM2.5 concentrations that enables continuous, real-time monitoring of air quality at specific locations [16]. Earlier studies described the PM2.5–AOD relationship with simple linear regression, which is appropriate only if the relationship between the two is stable over time. Van Donkelaar et al. [17] proposed using the PM2.5/AOD ratio to predict PM2.5, but this approach was fraught with uncertainty owing to a lack of ground-monitored PM2.5 data. According to Wang et al. [18] and Rogula-Kozlowska et al. [19], pollutants including organic carbon, ammonium nitrate, nitrogen dioxide, carbon monoxide, and sulfate are important contributors to PM2.5. The effects of AOD, meteorological conditions, and gaseous pollutants on variations in PM2.5 concentrations have therefore been widely studied. For example, Zhao et al. [20] proposed a multivariate linear regression model to predict PM2.5 concentrations in Beijing. Its predictors were AOD obtained through remote sensing, meteorological factors from ground monitoring (wind velocity, temperature, and relative humidity), and gaseous pollutants (SO2, NO2, CO, and O3). Their results indicated that the regression model based on annual data performed well in estimating PM2.5 concentrations. Liu et al. [21] investigated the relationship of particulate matter (PM10 and PM2.5) with gaseous pollutants and several meteorological parameters in Beijing. They reported high PM2.5 and PM10 concentrations throughout the 9-year observation period (2004–2012). They also found significant positive correlations between the particulates and CO and NOx, suggesting that both traffic-related emissions and combustion sources were major contributors. In addition, the wind profile appeared to be a key factor in the variation of particle concentrations.

To supplement ground-based monitoring, this study aimed to establish a quantitative model for the continuous estimation of PM2.5 in Northern Thailand. The study used multivariate linear regression, with PM2.5 concentration as the dependent variable. The independent variables were AOD; the meteorological variables wind speed, temperature, and relative humidity; and the gaseous pollutants SO2, NO2, CO, and O3. Three provinces in upper Northern Thailand (Chiang Mai, Lampang, and Nan) were chosen as case studies based on the locations of Thai Pollution Control Department monitoring stations, and all input data required by the model were obtained for 2020.

2. Methodology

In this study, two multivariate linear regression models were created. The first is a generic model (model 1) with aerosol optical depth (AOD) and the meteorological parameters temperature, relative humidity, and wind speed as predictors. The second (model 2) adds the chemical species nitrogen dioxide (NO2), sulfur dioxide (SO2), ozone (O3), and carbon monoxide (CO). To test whether including gaseous pollutants improved the prediction of PM2.5 concentration, model 2 was compared with the generic model of [22] (model 1):

$$\mathrm{PM}_{2.5} = (\alpha + \varepsilon) + \beta_1\,\mathrm{AOD} + \beta_2\,\mathrm{Temp} + \beta_3\,\mathrm{RH} + \beta_4\,\mathrm{WSPD}$$

where PM2.5 is the mass concentration at ground level (μg/m3), AOD is derived from MODIS (dimensionless), Temp is temperature (°C), RH is relative humidity (%), and WSPD is wind velocity (m/s). β1, …, β4 are the slopes corresponding to the respective variables, and (α + ε) is the intercept.
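The coefficient-estimation step is not written out above. Assume ordinary least squares, the usual estimator for multivariate linear regression. The model 1 coefficients for N hourly samples can then be written in matrix form as below. Here X, y, and β0 are notation introduced for this sketch, with β0 standing for the model intercept. Model 2 would simply append SO2, NO2, CO, and O3 columns to X.

$$
\mathbf{X} =
\begin{pmatrix}
1 & \mathrm{AOD}_1 & \mathrm{Temp}_1 & \mathrm{RH}_1 & \mathrm{WSPD}_1 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
1 & \mathrm{AOD}_N & \mathrm{Temp}_N & \mathrm{RH}_N & \mathrm{WSPD}_N
\end{pmatrix},
\qquad
\mathbf{y} =
\begin{pmatrix}
\mathrm{PM}_{2.5,1} \\ \vdots \\ \mathrm{PM}_{2.5,N}
\end{pmatrix}
$$

$$
\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4)^{\mathsf{T}}
= \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert^{2}
= \left(\mathbf{X}^{\mathsf{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}
$$

The closed form requires XᵀX to be invertible, that is, no predictor may be an exact linear combination of the others.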

Model 2, which adds gaseous pollutants, is based on Zhao et al. [20]. They applied a multivariate linear regression model to estimate PM2.5 in Beijing, using PM2.5 data from Beijing stations for 2015 and the Aqua-MODIS 550 nm Collection 6 AOD product. The equation is as follows:

$$\mathrm{PM}_{2.5} = (\alpha + \varepsilon_1) + \beta_1\,\mathrm{AOD} + \beta_2\,\mathrm{Temp} + \beta_3\,\mathrm{RH} + \beta_4\,\mathrm{WSPD} + \beta_5\,\mathrm{SO_2} + \beta_6\,\mathrm{NO_2} + \beta_7\,\mathrm{CO} + \beta_8\,\mathrm{O_3}$$

where SO2, NO2, CO, and O3 are the ground-level concentrations of the four pollutants; β1, β2, …, β8 are the slopes corresponding to the respective variables; and (α + ε1) is the intercept.
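Both models are ordinary multiple linear regressions: model 1 is model 2 without the four gas terms. The sketch below, in plain Python (3.8+), estimates the intercept and slopes by solving the least-squares normal equations. It is a minimal illustration on synthetic, noise-free data with invented coefficient values, not the software used in this study. In practice a statistics package would normally be used. Solving (XᵀX)β = Xᵀy directly can also be less numerically stable than QR-based solvers when predictors are strongly correlated.

```python
import random
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, slope_1, ..., slope_k] from the normal equations (X^T X) b = X^T y."""
    x = [[1.0, *r] for r in rows]  # prepend a column of ones for the intercept
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free hourly records (hypothetical values, not PCD data).
PREDICTORS = ["AOD", "Temp", "RH", "WSPD", "SO2", "NO2", "CO", "O3"]
TRUE_COEF = [10.0, 40.0, 0.5, -0.2, -3.0, 1.5, 0.8, 20.0, 0.1]  # intercept first
rng = random.Random(42)
rows = [[rng.uniform(0.0, 1.0) for _ in PREDICTORS] for _ in range(50)]
pm25 = [TRUE_COEF[0] + sum(b * v for b, v in zip(TRUE_COEF[1:], r)) for r in rows]

model2 = fit_ols(rows, pm25)                    # all eight predictors
model1 = fit_ols([r[:4] for r in rows], pm25)   # AOD, Temp, RH, WSPD only
for name, coef in zip(["intercept"] + PREDICTORS, model2):
    print(f"{name:>9}: {coef:8.3f}")
```

Model 1's predictors are a subset of model 2's. The least-squares fit of model 2 therefore cannot have a lower in-sample R2 than model 1 on the same data, so an out-of-sample comparison is the fairer test of the added gas terms.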

This study used measurement data from the Pollution Control Department (PCD) in Northern Thailand. Figure 1 shows the three urban ground-monitoring stations selected as case studies: the Municipal Office in Nan, the Provincial Government Center in Chiang Mai, and the Meteorological Station in Lampang. This area of upper Northern Thailand is influenced by the southwest and northeast monsoons (https://www.tmd.go.th/en/archive/thailand_climate.pdf). The PM2.5 monitoring data were therefore separated into two seasons (wet season: May to October; dry season: November to April). As shown in Table 1, the mean PM2.5 concentration in the dry season exceeded that in the wet season. Throughout 2020, hourly mass concentrations of PM2.5 and the four principal air pollutants (NO2, CO, SO2, and O3) were gathered from these stations, together with meteorological data (temperature, wind speed, and relative humidity). The quality assurance/quality control (QA/QC) procedures were based on protocols of the US Environmental Protection Agency (EPA) [23]. The performance criterion for sampling was to ensure quantified data for all PM2.5 exposures and microenvironmental concentrations. QA was based on the following principles: (1) all procedures should be carefully planned, tested, and performed according to standard operating procedures (SOPs) approved by the study director; (2) every piece of data must be fully traceable; and (3) any deviations or irregularities must be recorded. Gaps occurred regularly in the PCD measurements, but data were missing only 15% of the time [3].
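The seasonal split and the missing-data bookkeeping described above can be sketched in a few lines of Python (3.8+). The sketch assumes that hourly records are tagged with their calendar month and that missing hours are stored as None. The record values are hypothetical, and the study's own QA/QC processing is not reproduced.

```python
from statistics import fmean
from typing import Dict, List, Optional, Tuple

WET_MONTHS = {5, 6, 7, 8, 9, 10}  # May-October; November-April is the dry season


def season(month: int) -> str:
    """Classify a calendar month (1-12) into the study's wet or dry season."""
    return "wet" if month in WET_MONTHS else "dry"


def seasonal_summary(
    records: List[Tuple[int, Optional[float]]]
) -> Dict[str, Tuple[float, float]]:
    """Return {season: (mean PM2.5 over valid hours, fraction of hours missing)}.

    Each record is a (month, pm25) pair; pm25 is None when the hour is missing.
    Seasons with no valid hours are omitted.
    """
    summary = {}
    for s in ("wet", "dry"):
        values = [v for m, v in records if season(m) == s]
        valid = [v for v in values if v is not None]
        if not valid:
            continue
        summary[s] = (fmean(valid), 1 - len(valid) / len(values))
    return summary


# Hypothetical hourly records (month, PM2.5 in ug/m3); None marks a missing hour.
records = [(1, 80.0), (3, 120.0), (3, None), (6, 15.0),
           (8, 25.0), (11, None), (12, 40.0), (9, None)]
print(seasonal_summary(records))
```

In this toy record, the dry-season mean (80 µg/m3) exceeds the wet-season mean (20 µg/m3), and two of the five dry-season hours are missing.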

For this study, the Aqua-MODIS 550 nm Collection 6 AOD product was also retrieved. MODIS (Moderate Resolution Imaging Spectroradiometer) is a medium-resolution imaging spectroradiometer carried on the Terra and Aqua satellites of the US Earth Observing System, and it provides daily aerosol data to scientists worldwide [24]. The standard MODIS level-2 (L2) product has a spatial resolution of 10 km, whereas the recently released MODIS Collection 6 (C6) product (MYD04_3K) has a resolution of 3 km. MODIS C6 also includes improvements in sensor calibration, cloud detection, lookup table structure, radiative transfer computation, and gas absorption corrections [25]. However, Munchak et al. [25] reported poorer retrieval performance over urban surfaces, which is a clear limitation for air quality applications. Despite this limitation, the product provides additional capabilities for researching aerosols on a local scale. These include resolving small-scale AOD gradients and point sources, retrieving aerosols in patchy cloud fields, and retrieving data closer to coastlines. In addition, the AOD data have been validated against 10 stations in various regions: Beijing, XiangHe, SACOL, and Taihu, China; Taiwan; Moscow, Russia; Minsk, Belarus; Moldova; Kyiv, Ukraine; and Belsk, Poland. Across these stations, MODIS Collection 6 AOD had R values of 0.769–0.948, RMSE of 0.148–0.324, and MAE of 0.102–0.3 [26].

The coefficient of determination (R2), root mean square error (RMSE), and standard error (SE) were used to assess the accuracy and precision of the two models. Each statistic was computed over the N paired samples of observed and predicted PM2.5.
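The formulas for these statistics are not reproduced here, so the Python sketch below (3.8+) uses common textbook definitions, which are assumptions. R2 is one minus the ratio of the residual to the total sum of squares, RMSE is the square root of the mean squared residual, and SE is the sample standard deviation of the residuals. The toy example (hypothetical values) shows why the choice of definition matters. A model that overestimates every hour by a constant 10 µg/m3 has RMSE = 10 µg/m3 but SE = 0, and its R2 under this definition is 0.8, whereas the squared Pearson correlation would be 1.

```python
from math import sqrt
from statistics import fmean, stdev


def evaluate(observed: list[float], predicted: list[float]) -> dict[str, float]:
    """R2, RMSE, and SE of predicted versus observed PM2.5 (ug/m3).

    Assumed definitions (not necessarily the study's exact formulas):
      R2   = 1 - sum((P - O)^2) / sum((O - mean(O))^2)
      RMSE = sqrt(sum((P - O)^2) / N)
      SE   = sample standard deviation of the residuals P - O
    """
    n = len(observed)
    resid = [p - o for o, p in zip(observed, predicted)]
    mean_o = fmean(observed)
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / n),
        "SE": stdev(resid),
    }


# Hypothetical example: every hour is overestimated by exactly 10 ug/m3.
obs = [20.0, 40.0, 60.0, 80.0]
pred = [30.0, 50.0, 70.0, 90.0]
print(evaluate(obs, pred))  # R2 = 0.8, RMSE = 10.0, SE = 0.0
```

Under these assumed definitions, RMSE well above SE indicates a large systematic bias in the predictions.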

3. Results and Discussion

Figures 2–4 show scatter plots of models 1 and 2 fitted to the annual, wet-season, and dry-season PM2.5 data from Chiang Mai, Lampang, and Nan for 2020. The fitted lines are least-squares estimates of the linear trend through the scattered points. Chiang Mai, Lampang, and Nan lie in upper Northern Thailand, where the climate is dominated by the southwest and northeast monsoons. The seasonality of the data therefore became clear once the PM2.5 monitoring data were classified into the wet (May to October) and dry (November to April) seasons. As shown in Table 1, mean PM2.5 concentrations in the dry season substantially exceeded those in the wet season. In general, the dry season in Northern Thailand is characterized by heavy pollution and the wet season by light pollution.
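The least-squares line drawn through each scatter plot can be computed in closed form from the means and centered sums of the points. The Python sketch below (3.9+) assumes observed PM2.5 on the x-axis and modelled PM2.5 on the y-axis, although the figures' axis assignment is not stated here, and it uses hypothetical points.

```python
from statistics import fmean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical points: observed PM2.5 on the x-axis, modelled PM2.5 on the y-axis.
observed = [10.0, 20.0, 30.0, 40.0]
modelled = [18.0, 24.0, 36.0, 42.0]
a, b = fit_line(observed, modelled)
print(f"modelled = {a:.2f} + {b:.2f} * observed")  # modelled = 9.00 + 0.84 * observed
```

A perfect model would give a = 0 and b = 1. A slope below 1 together with a positive intercept, as in this toy case, means the model compresses the range of the observed values.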

Table 2 shows the R2 and error measures for the fits of models 1 and 2 in Chiang Mai, Lampang, and Nan. For the annual data, model 1 had R2 values of 0.02–0.36, RMSE of 25.22–35.11 µg/m3, and SE of 21.72–30.27 µg/m3. Model 2 outperformed model 1, with R2 values of 0.18–0.26 and 0.13–0.42 for the wet and dry seasons, respectively, compared with 0.09–0.22 and 0.03–0.17 for model 1. The RMSE and SE of model 2 were 31.47–42.10 and 6.56–7.08 µg/m3, respectively, during the wet season and 55.06–90.44 and 26.96–34.07 µg/m3 during the dry season. Model 2 captured the PM2.5 trend better in the dry season than in the wet season, as shown by its higher dry-season R2 values, particularly for Chiang Mai (0.40) and Lampang (0.32). However, in the dry season model 2 had larger RMSE and SE than model 1, at 76.91 and 34.07 µg/m3, respectively, for Chiang Mai and 90.44 and 25.81 µg/m3 for Lampang.

The higher R2 of model 2 relative to model 1 is mainly due to the addition of gaseous air pollutants, which are included in the seasonal datasets used for model building. Furthermore, PM2.5 changed so substantially between seasons that random selection of test data could not guarantee that the test data were distributed in proportion to the two seasons. Uncertainty in the land surface also affects the accuracy of the MODIS product, which may limit its use in air quality applications [25]. Moreover, the overestimation of modeled PM2.5 is similar to the results of Zhao et al. [20], who applied a multivariate linear regression model for short-term prediction of PM2.5 in Beijing. PM2.5 ground-monitoring stations were predominantly located in cities and metropolitan areas, with only a few in suburbs. Systematic variations intrinsic to the tapered element oscillating microbalance (TEOM) PM2.5 measurement technique also introduce uncertainty into the PM2.5 data [27]. As a result, the estimation accuracy was reduced [25]. Importantly, the proposed model does not represent the complex mechanisms of PM2.5 formation, which would affect its performance.
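One way to address the seasonal-representativeness concern above is to draw the test set separately within each season, so that the wet/dry proportions of the test data match those of the full record. The Python sketch below (3.9+) uses a hypothetical helper that is not the procedure used in this study. It stratifies by the May–October wet-season definition and uses a fixed random seed so that the split is reproducible.

```python
import random
from collections import defaultdict

WET_MONTHS = {5, 6, 7, 8, 9, 10}  # May-October; other months are dry


def stratified_split(months: list[int], test_frac: float,
                     seed: int = 0) -> tuple[list[int], list[int]]:
    """Split record indices into (train, test), sampling the test share within each season."""
    by_season: dict[str, list[int]] = defaultdict(list)
    for i, m in enumerate(months):
        by_season["wet" if m in WET_MONTHS else "dry"].append(i)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: list[int] = []
    test: list[int] = []
    for indices in by_season.values():
        shuffled = indices[:]  # copy so the grouping itself is not modified
        rng.shuffle(shuffled)
        k = round(test_frac * len(shuffled))
        test.extend(shuffled[:k])
        train.extend(shuffled[k:])
    return sorted(train), sorted(test)


# Hypothetical record months: 60 dry-season hours (January) and 40 wet-season hours (June).
months = [1] * 60 + [6] * 40
train, test = stratified_split(months, test_frac=0.25)
print(len(train), len(test))  # 75 25
```

With a 25% test share, the sketch places 15 dry-season and 10 wet-season hours in the test set, preserving the 60:40 seasonal ratio of the full record.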

Sulfate (SO4²⁻) and nitrate (NO3⁻) have been reported to be the two most sensitive parameters contributing to PM2.5 in the atmosphere [28]. Sulfate can be generated by the oxidation of SO2 by OH in the gas phase or by dissolved H2O2 or O2 in clouds [29]. During the day, nitrate production is dominated by the reaction of NO2 with the hydroxyl radical (OH) followed by condensation, whereas at night, heterogeneous reactions of the nitrate radical (NO3) or dinitrogen pentoxide (N2O5) predominate [30]. The results can also be compared with those of other methodologies, such as three-dimensional (3D) chemical model simulations. Pimonsree et al. [31] simulated PM2.5 in Northern Thailand with the Community Multiscale Air Quality (CMAQ) modeling system, using the Fire Inventory from NCAR (FINN) and a modified FINN with fire radiative power (FRP). These simulations estimated dry-season PM2.5 concentrations of 238 and 92 µg/m3, respectively, whereas the average PM2.5 concentration estimated by model 2 was 121.58 µg/m3. Model 2 was thus about 50% lower than the FINN simulation (IOA = 0.25) and 31% higher than the modified FINN with FRP simulation (IOA = 0.55) of the high-pollution episode (March 2012).

PM2.5 prediction is challenging because PM2.5 is influenced by several variables, including weather conditions and environmental seasonality, all of which affect the regression results. Aside from weather, PM2.5 may be correlated with SO2, NO2, CO, and O3. Table 3 shows the correlations between PM2.5 and the other measured variables. During the study period, mean PM2.5 concentrations correlated strongly with CO, NO2, and SO2 concentrations and with AOD. The PM2.5–AOD relationship varies in space and time [32]. Nevertheless, AOD has been widely used to retrieve ground-level PM2.5 concentrations because of its wide coverage and spatially continuous distribution. Whereas NO2 and SO2 directly affect PM2.5, CO has an indirect effect. CO reacts with the hydroxyl radical to form CO2 and a hydrogen atom, which combines with O2 to form HO2. When NO is present, HO2 reacts mainly with NO, converting it to NO2, a precursor of particulate nitrate. These results indicate that the variables employed in the regression model are reasonable. The model's predictive validity improved considerably after the gaseous chemical species were included to capture the impact of gaseous contaminants on PM2.5 production. This improvement was especially marked during the dry season, when biomass burning is a significant source of CO and NO in this region. However, this study estimated PM2.5 at only three locations and disregarded regional variations in the atmosphere. To improve prediction on a large scale, the spatial heterogeneity of PM2.5 data should be taken into account, as suggested by Ma et al. [33, 34]. In addition, the latest version of the AOD product, Aqua-MODIS 550 nm Collection 6.1, should be used for improved prediction, because its retrieval algorithms are designed to decrease uncertainty [26].

Even though model 2 outperformed model 1, some uncertainties remain. First, the PM2.5 measurements themselves are uncertain for several reasons; in particular, systematic errors in the TEOM method for measuring PM2.5 concentrations have been reported. Second, despite the high quality of the 3 km AOD product, uncertainties in the AOD data reduced the prediction accuracy. Third, the model as a whole leaves certain questions unresolved. The mechanism by which PM2.5 is formed was not considered in this study, which would have reduced the model accuracy. Spatial and temporal heterogeneity can also cause uncertainty, because location and time of day affect PM2.5 concentrations. The model described here provides predictions for a small area only. Whether it can predict PM2.5 concentrations across a broader geographical and temporal range merits further investigation. The approach could be a useful technique for supplementing ground-based monitoring; however, further development, including the incorporation of spatial heterogeneity, is needed.

4. Conclusion

This study applied multivariate linear regression to predict PM2.5 concentrations at three locations in Northern Thailand in 2020: Chiang Mai, Lampang, and Nan Provinces. The predictors were remotely sensed aerosol optical depth (AOD), meteorological elements from ground monitoring (wind velocity, temperature, and relative humidity), and gaseous contaminants (SO2, NO2, CO, and O3). Two multivariate linear regression models were used. Model 1, a generic model, contains AOD and the meteorological factors temperature, relative humidity, and wind speed. Model 2 adds the gaseous contaminants SO2, NO2, CO, and O3. Model performance was evaluated with several statistical indicators: the coefficient of determination (R2), root mean square error (RMSE), and standard error (SE). The findings show that the model performed well at capturing annual PM2.5 concentrations in Northern Thailand, with R2 values of 0.39–0.60. By including gaseous pollutants, model 2 characterized PM2.5 concentrations better than model 1. However, large discrepancies between modeled and observed concentrations remained. To improve the model, the nonlinear processes of PM2.5 formation and the spatial heterogeneity of PM2.5 data should be considered in large-scale prediction to supplement ground-based measurements. Additionally, the Aqua-MODIS 550 nm Collection 6.1 AOD product, which uses improved retrieval techniques to reduce uncertainty, should be used for improved prediction.

Data Availability

The data used in this study were provided with permission by Thailand’s Pollution Control Department (PCD) and hence cannot be made freely available. Access to these data can be obtained by contacting the PCD at http://www.pcd.go.th/.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Teerachai Amnuaylojaroen conceived and designed the experiments; performed the experiments; analysed and interpreted the data; contributed the reagents, materials, analysis tools, and data; and wrote the manuscript.

Acknowledgments

This research was funded by the National Research Council of Thailand and the University of Phayao, Thailand. The author thanks the Thai Pollution Control Department for providing ground-based measurement data and the National Research Council of Thailand for financial support.