Abstract
The lockdown and strict regulation measures implemented by the Chinese government in response to the outbreak of the COVID-19 pandemic not only slowed the spread of the virus but also had a positive effect on nationwide air quality. In this study, we extended our previous research on remotely sensed estimation of PM2.5 concentrations in the Yangtze River Delta (YRD) region of China from 2019 to the strict regulation period of 2020 (i.e., 24 Jan 2020 to 31 Aug 2020). Unlike the aerosol optical depth (AOD) based methods developed in previous studies, we examined whether moderate resolution imaging spectroradiometer (MODIS) top-of-atmosphere (TOA) reflectance (i.e., MODIS TOA) at 21 bands can be used to estimate PM2.5 concentrations in the YRD region. Two random forests (i.e., TOA-sig RF and TOA-all RF) trained on different MODIS TOA datasets were developed, and the results showed that the TOA-sig RF model performed better, with an R² of 0.81 (RMSE = 8.07 μg/m3), than the TOA-all RF model, with an R² of 0.79 (RMSE = 9.13 μg/m3). The monthly averaged PM2.5 reached its highest value of 50.81 μg/m3 in the YRD region in January 2020 and decreased sharply from February to August 2020. The seasonal mean PM2.5 concentrations derived by the TOA-sig RF model were 47.74, 32.14, and 21.04 μg/m3 in winter, spring, and summer, respectively, during the strict regulation period of 2020, much lower than those in 2019. Our research demonstrated that PM2.5 concentrations could be effectively estimated by using MODIS TOA reflectance at 21 bands and a random forest.
1. Introduction
Air pollution, especially fine particulate matter (i.e., PM2.5) pollution, has attracted strong public attention in Asian countries over the last two decades [1, 2]. According to the World Health Organization (WHO), the two biggest developing countries, i.e., India and China, have seen their air quality worsen and now rank among the most polluted countries in the world [3–5]. To better control air pollution, the Chinese government has established a large network of over 1500 monitoring stations distributed in urban and suburban areas to observe the dynamic change of PM2.5 concentrations [6, 7]. Although these monitoring stations provide accurate and continuous daily data for the public, it is still difficult to obtain PM2.5 concentrations over wide areas because the monitoring network is sparsely distributed [8, 9]. Satellite remote sensing has therefore become the most widely used method for obtaining ground-level PM2.5 concentrations worldwide [10–13].
Most previous studies retrieved the aerosol optical depth (AOD) from satellite remote sensing imagery (e.g., the moderate resolution imaging spectroradiometer (MODIS) and the ozone monitoring instrument (OMI)) to estimate PM2.5 concentrations [4, 9, 14–18]. These AOD products, with resolutions of 1-17.6 km, can reveal the overall distribution of PM2.5 concentrations within an area. However, AOD-based estimation has two drawbacks. First, since AOD reflects the integral of the extinction coefficient over the total atmospheric column, the AOD-PM2.5 relationship is highly affected by meteorological parameters such as the planetary boundary layer height (PBLH) and relative humidity (RH). Previous studies have shown that vertically and humidity corrected AOD correlates much more strongly with ground-level PM2.5 concentrations [13, 19, 20]. Second, most currently used AOD products are derived with the dark target algorithm, which leaves AOD values missing over areas with high surface reflectance (e.g., deserts and built-up areas) [21]. Therefore, the AOD gaps must be filled before data modeling. Several methods, including merging different types of AOD products and applying machine learning algorithms, have been employed to obtain full AOD coverage. For example, He and Huang [22] combined two different types of MODIS AOD products (i.e., 3 km dark target AOD and 10 km deep blue AOD) to improve the daily coverage of AOD in China. Zhao et al. [23] developed a random forest using 15 meteorological parameters to fill the AOD gaps in Beijing-Tianjin-Hebei, and their results showed that the random forest could be effectively used to obtain 100% coverage of AOD.
Considering the above-mentioned disadvantages of AOD products, some efforts have been made to estimate PM2.5 by using the top-of-atmosphere (TOA) reflectance directly from remote sensing imagery [14, 17]. For example, Shen et al. [6] developed a convolutional neural network (CNN) using satellite TOA reflectance at the blue, red, and NIR bands of the MODIS product (hereinafter referred to as MODIS TOA) and meteorological fields to predict PM2.5 in the Wuhan region of China. Their results indicated that the CNN driven by TOA reflectance could explain 87% of the PM2.5 variability. Yang et al. [7] further examined the use of MODIS TOA reflectance at the same three bands for PM2.5 prediction in the Yangtze River Delta (i.e., YRD) region of China and obtained a site-based cross-validation R² of 0.87. These works demonstrated that PM2.5 concentrations can also be derived from satellite TOA reflectance. However, the correlation of MODIS TOA reflectance at other bands with PM2.5 concentrations has not yet been examined, and whether these datasets can be used for PM2.5 prediction is still unknown.
The outbreak of the COVID-19 pandemic, which started at the end of 2019, has triggered an unprecedented slowdown in global economic growth [24–26]. To better control the spread of the virus, a lockdown with strict regulation measures was implemented by the Chinese government, which not only slowed the spread of the virus but also had a positive effect on nationwide air quality [27–29]. Therefore, in this paper, we explored the correlation of MODIS TOA reflectance at 22 bands with PM2.5 concentrations in the YRD region from 2019 to the strict regulation period of 2020 (i.e., 24 Jan 2020 to 31 Aug 2020). Since machine learning algorithms (e.g., convolutional neural networks and random forests) have been shown to be useful methods with high predictability, we developed two random forests using MODIS TOA reflectance at different bands to derive PM2.5 in the YRD region. To the best of our knowledge, this is the first study to develop random forests using MODIS TOA reflectance at 22 bands to obtain PM2.5 concentrations.
2. Materials and Methods
2.1. Study Area
The eastern coastal region with an area of 219000 square kilometers, i.e., the Yangtze River Delta (YRD) region of China, was selected as our study area (Figure 1). The study area covers 25 cities and is characterized by a subtropical monsoon climate, with wet and hot summers and relatively dry and cold winters. As one of the largest urban agglomerations in China, the YRD region has seen its economy grow rapidly during the last 20 years and has now become the most developed region in China. Unfortunately, the air quality of the YRD region has gradually deteriorated as large amounts of exhaust gas from vehicles and industrial factories have been emitted into the air, and the annual concentration of the primary air pollutant, i.e., PM2.5, reached 65 μg/m3 over the last five years.

2.2. Data
2.2.1. MODIS TOA Reflectance
The MODIS Level-1B product with a resolution of 1 km from 2019 to the strict regulation period of 2020 (i.e., 24 Jan 2020 to 31 Aug 2020) was downloaded for this study (https://ladsweb.modaps.eosdis.nasa.gov/). All MODIS images were reprojected to the World Geodetic System 1984 (WGS84) in IDL. As only 22 bands (i.e., b1-b22) are available in the MODIS Level-1B product at 1 km resolution, we extracted the MODIS TOA reflectance at these 22 bands for data modeling.
2.2.2. Ground-Level PM2.5 Observations
The hourly PM2.5 observations from 2019 to the strict regulation period of 2020 were collected from the web site of China Environmental Monitoring Center. There are 158 monitoring stations in the entire study area (Figure 1). The PM2.5 values recorded as NaN (i.e., not a number) were discarded before data integration.
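The preprocessing described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the column names (`station`, `time`, `pm25`) are hypothetical, and averaging the valid hourly records into a daily mean per station is an assumption, since the text only states that NaN records were discarded and that daily PM2.5 concentrations were used later.

```python
import numpy as np
import pandas as pd


def daily_mean_pm25(hourly: pd.DataFrame) -> pd.DataFrame:
    """Drop NaN records, then average hourly PM2.5 per station and calendar day."""
    # Discard records whose PM2.5 value is NaN, as described in the text.
    valid = hourly.dropna(subset=["pm25"])
    # Assumed aggregation: simple mean of the remaining hourly values per day.
    daily = (
        valid.assign(date=valid["time"].dt.floor("D"))
        .groupby(["station", "date"], as_index=False)["pm25"]
        .mean()
    )
    return daily


# Hypothetical hourly records for one station (one value is missing).
hourly = pd.DataFrame(
    {
        "station": ["S1"] * 4,
        "time": pd.to_datetime(
            [
                "2020-01-24 00:00",
                "2020-01-24 01:00",
                "2020-01-24 02:00",
                "2020-01-25 00:00",
            ]
        ),
        "pm25": [40.0, np.nan, 60.0, 30.0],
    }
)
daily = daily_mean_pm25(hourly)
```

In practice one might also require a minimum number of valid hours per day before accepting a daily mean; the text does not specify such a threshold.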
2.2.3. Auxiliary Parameters
The meteorological data were obtained from the Goddard Earth Observing System Data Assimilation System GEOS-5 Forward Processing (GEOS-5 FP) product. GEOS-5 FP has finer native horizontal and temporal resolution (hourly and 3-hourly data) than the older GEOS-5.2.0 version. Ten parameters from different fields were included in this study (Table S1), and all of the meteorological parameters were interpolated to the kilometer-scale grid cells used for data modeling with a bilinear interpolation algorithm in IDL.
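The study performed this interpolation in IDL. As a rough Python analogue, the sketch below interpolates a coarse gridded field to arbitrary target points with SciPy's `RegularGridInterpolator`, whose `method="linear"` option is bilinear on a two-dimensional grid. The grid extent, grid spacing, and the synthetic field are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical coarse meteorological grid (degrees); spacing is illustrative.
lats = np.linspace(30.0, 32.0, 9)
lons = np.linspace(119.0, 121.5, 9)
lat_grid, lon_grid = np.meshgrid(lats, lons, indexing="ij")

# Synthetic field that varies linearly with latitude and longitude, so the
# bilinear result can be checked against the exact value.
field = 2.0 * lat_grid + 3.0 * lon_grid

# method="linear" on a 2-D regular grid performs bilinear interpolation.
interpolator = RegularGridInterpolator((lats, lons), field, method="linear")

# Target points as (lat, lon) pairs, e.g. centers of fine grid cells or sites.
targets = np.array([[31.1, 120.2], [30.0, 119.0], [31.87, 121.33]])
values = interpolator(targets)
```

The same interpolator can be evaluated at every cell center of the fine modeling grid to resample a whole field.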
The MODIS monthly vegetation index (MVI) product (spatial resolution: 1 km) was used in this study (https://ladsweb.modaps.eosdis.nasa.gov/). We also downloaded the major roadways (shape file) from Baidu StreetMap and then calculated the road density in ArcGIS for data modeling. Finally, the MODIS TOA reflectance at 22 bands and the other auxiliary parameters at each PM2.5 monitoring site were extracted for model development.
2.3. Model Development and Validation
Previous studies have demonstrated that the MODIS TOA reflectance at three bands (i.e., the blue, red, and NIR bands) could be employed for PM2.5 estimation in different regions [6, 7]. Beyond these three bands, our research found that the MODIS TOA reflectance at other bands also exhibited significant correlation with PM2.5 concentrations. However, no study has clarified whether MODIS TOA reflectance at other bands could be used as a proxy for PM2.5 estimation. Therefore, the purpose of this paper was to examine the use of MODIS TOA reflectance at 22 bands in deriving PM2.5 concentrations in the YRD region. Here, we developed two random forests that used MODIS TOA reflectance at different bands to estimate the PM2.5 concentrations in the YRD region. The model development consisted of three steps, and Figure 2 gives the flow chart of the model development.
(1) First, the significance of the correlation between MODIS TOA reflectance at each band and daily PM2.5 concentrations was tested, and the nonsignificantly correlated bands were removed before modeling. On this basis, the selected bands along with the other parameters were employed as the model inputs in the random forest (hereinafter referred to as the TOA-sig RF model).
(2) Second, we trained a random forest by using MODIS TOA reflectance at 22 bands without the significance test (hereinafter referred to as the TOA-all RF model), since the random forest is applicable even with highly correlated variables [30]. The importance of each variable could be obtained via the importance index (i.e., the increase of mean square errors, IncMSE).
(3) Finally, to assess the model accuracy, the tenfold cross-validation approach (10-fold CV) was carried out. We first randomly split the dataset into 10 parts, where 9 parts were used for model training and the remaining part was used for prediction. We repeated this process 10 times, and the averaged result was used as the final accuracy. The coefficient of determination (R²) and the root mean square error (RMSE) were also used to evaluate the agreement between the estimated and observed PM2.5 concentrations.
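The tenfold cross-validation in step (3) can be sketched with scikit-learn as below. This is an illustrative setup under stated assumptions, not the authors' implementation: the data are synthetic, the number of trees and the random seed are arbitrary, and the software used in the study for the random forest is not stated in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold


def ten_fold_cv(X, y, n_splits=10, seed=0):
    """Return the fold-averaged R² and RMSE of a random forest regressor."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    r2_scores, rmse_scores = [], []
    for train_idx, test_idx in kfold.split(X):
        # Train on 9 parts, predict the held-out part.
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        predicted = model.predict(X[test_idx])
        r2_scores.append(r2_score(y[test_idx], predicted))
        rmse_scores.append(np.sqrt(mean_squared_error(y[test_idx], predicted)))
    # The averaged result over the 10 folds is used as the final accuracy.
    return float(np.mean(r2_scores)), float(np.mean(rmse_scores))


# Synthetic stand-in for the predictor matrix (TOA bands plus auxiliary data).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=300)

cv_r2, cv_rmse = ten_fold_cv(X, y)
```

A sample-based split like this one mixes records from the same station across folds; a site-based split (for example with `GroupKFold` keyed on station) would be a stricter alternative.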

3. Results
3.1. Descriptive Statistics
There are a total of 34 independent variables for model construction, and a pairwise correlation test among them was carried out first. The results showed that the MODIS TOA reflectance was more strongly correlated among the visible bands than among the near infrared and middle infrared bands. It should be noted that the MODIS TOA reflectance at band 22 (i.e., b22) had far fewer available values than the other bands; the b22 data were thus discarded to improve the stability of the random forest model in this study.
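A screening of this kind might look like the following sketch, which computes the fraction of valid values per variable, drops sparsely populated columns, and then builds the pairwise correlation matrix. The data are synthetic, and the 50% availability threshold is a hypothetical choice; the text does not state the criterion used to discard b22.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples = 200

# Synthetic reflectance columns: b3 is built to track b1 closely.
b1 = rng.normal(size=n_samples)
frame = pd.DataFrame(
    {
        "b1": b1,
        "b3": b1 + 0.1 * rng.normal(size=n_samples),
        "b22": rng.normal(size=n_samples),
    }
)
# Make b22 sparse: only the first 20 records keep a valid value.
frame.loc[frame.index[20:], "b22"] = np.nan

# Fraction of non-missing values per variable.
valid_fraction = frame.notna().mean()

# Hypothetical rule: keep variables with at least 50% valid values.
kept = frame.loc[:, valid_fraction >= 0.5]

# Pairwise Pearson correlation among the remaining variables.
correlation = kept.corr()
```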
The daily PM2.5 ranged from 1.00 μg/m3 in summer to 407 μg/m3 in winter for the entire study period. Specifically, the seasonal mean PM2.5 concentration of 2019 was 63.11 μg/m3 in winter, while the value sharply decreased to 35.08 μg/m3 in 2020. The same trend was also found in spring from 2019 (~50.10 μg/m3) to the strict regulation period of 2020 (~32.48 μg/m3).
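Seasonal means such as those quoted above can be computed from daily records with a simple grouping. In this sketch the month-to-season mapping (winter as December to February, and so on) and the assignment of each record to its calendar year are assumptions, since the text does not define the season boundaries; the numbers are made up for illustration.

```python
import pandas as pd

# Assumed meteorological seasons by calendar month.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_mean(daily: pd.DataFrame) -> pd.Series:
    """Mean PM2.5 per (calendar year, season) from daily records."""
    year = daily["date"].dt.year.rename("year")
    season = daily["date"].dt.month.map(SEASON_OF_MONTH).rename("season")
    return daily.groupby([year, season])["pm25"].mean()


# Hypothetical daily values, not data from the study.
daily = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-01-10", "2019-02-10", "2020-01-30", "2020-04-01"]
        ),
        "pm25": [60.0, 66.0, 40.0, 30.0],
    }
)
means = seasonal_mean(daily)
```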
3.2. Modeling and Validation
In this paper, we extended our previous research on PM2.5 estimation in the YRD region from 2019 to the strict regulation period of 2020. We first examined the correlation between MODIS TOA reflectance at 21 bands and PM2.5 concentrations, and the results showed that the MODIS TOA reflectance at 14 bands (i.e., b1~b4, b6~b7, b9, b11~b12, b16~b17, and b19~b21) was significantly correlated with PM2.5 concentrations, as their p values were all less than 0.01 (Table 1).
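A band screening of this kind can be sketched as follows. The use of the Pearson correlation test is an assumption, as the text reports significance at the 0.01 level but does not name the test; the band values are synthetic and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def significant_bands(bands, pm25, alpha=0.01):
    """Return {band: (r, p)} for bands whose correlation with PM2.5 has p < alpha."""
    selected = {}
    for name, values in bands.items():
        # Use only records where both the band and PM2.5 are available.
        ok = ~(np.isnan(values) | np.isnan(pm25))
        r, p = pearsonr(values[ok], pm25[ok])
        if p < alpha:
            selected[name] = (float(r), float(p))
    return selected


# Synthetic example: b9 depends on PM2.5, b5 is unrelated noise.
rng = np.random.default_rng(42)
pm25 = rng.gamma(shape=4.0, scale=10.0, size=500)
bands = {
    "b9": 0.002 * pm25 + rng.normal(0.0, 0.02, size=500),
    "b5": rng.normal(0.2, 0.02, size=500),
}
selected = significant_bands(bands, pm25)
```

Bands returned by such a function would form the inputs of the TOA-sig RF model, while the TOA-all RF model skips this step.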
The MODIS TOA reflectance at the bands that showed significant correlation with PM2.5 concentrations was used for training the random forest (i.e., TOA-sig RF). The TOA-all RF model was also trained by using MODIS TOA reflectance at all bands. Figure 3 gives the performance of the TOA-sig RF model and the TOA-all RF model; we achieved an overall R² of 0.81 and 0.79 for the TOA-sig RF and TOA-all RF models, respectively. The RMSE values were 8.07 μg/m3 and 9.13 μg/m3 for the TOA-sig RF and TOA-all RF models, respectively, indicating that the TOA-sig RF model performed better than the TOA-all RF model. The cross-validation results showed that the TOA-sig RF model achieved a slightly higher CV R² (~0.80) than the TOA-all RF model (~0.78). The CV RMSE of the TOA-sig RF model was 8.24 μg/m3, lower than that of the TOA-all RF model (~9.20 μg/m3).

Figure 3: Performance of the TOA-sig RF and TOA-all RF models. (a) Model fitting of TOA-sig RF. (b) Model fitting of TOA-all RF. (c) Model validation of TOA-sig RF. (d) Model validation of TOA-all RF.
We also validated the predictive power of the TOA-sig RF and TOA-all RF models in the four seasons of 2019 and in the three seasons of the strict regulation period of 2020 (Table 2). The model fitting and validation results all indicated that the TOA-sig RF model showed higher predictability of PM2.5 concentrations than the TOA-all RF model. Specifically, the PM2.5 concentrations derived by the TOA-all RF model had a fitted R² of 0.83, 0.77, and 0.75 for winter, spring, and summer, respectively, in 2020. The RMSE values were 10.96 and 9.80 μg/m3 in winter and spring, respectively, higher than in summer (~8.23 μg/m3). The TOA-sig RF model achieved better performance than the TOA-all RF model, as its R² (RMSE) values were 0.84 (~10.10 μg/m3), 0.78 (~9.54 μg/m3), and 0.77 (~8.00 μg/m3) for winter, spring, and summer, respectively, in 2020.
3.3. PM2.5 Prediction in YRD Region
Given that the TOA-sig RF model performed better than the TOA-all RF model, the annual and seasonal mean PM2.5 with full coverage was derived by using the TOA-sig RF model. Figure 4 gives the spatial distribution of the average PM2.5 from 2019 to the strict regulation period of 2020. The overall PM2.5 concentrations of these two years exhibited the same spatial pattern, with higher values in the north of the YRD and lower values in the south. We also found that the average PM2.5 concentrations in the strict regulation period of 2020 were much lower than those in 2019. The seasonal mean PM2.5 derived from the TOA-sig RF model was higher in each season of 2019 than in the corresponding season of 2020 (Figure 5). Specifically, the average PM2.5 concentrations ranged from 21 to 86 μg/m3 in the winter of 2020 and then decreased sharply to 17-46 μg/m3 in the spring of 2020. The seasonal mean PM2.5 concentrations were only 14-26 μg/m3 in the summer of 2020. This is because large amounts of industrial and vehicle exhaust gas were emitted directly into the air until the outbreak of COVID-19, which increased the concentrations of air pollutants. The lockdown with strict regulation measures implemented by the Chinese government in 2020 not only slowed the spread of the virus but also had a positive effect on nationwide air quality, as the PM2.5 concentrations decreased sharply from February to August.

Figure panels: (a) The spatial distribution of average PM2.5 concentrations. (b) The average PM2.5 concentrations of monitoring stations.

4. Discussion
In this study, we examined the use of a random forest for deriving PM2.5 in the YRD region of China from MODIS TOA reflectance at 22 bands. The benefits of our method are as follows. First, the MODIS TOA reflectance instead of MODIS AOD was used as the input variable of the random forest. The MODIS AOD products retrieved with the dark target or deep blue algorithm have missing AOD values over different land types (e.g., forests and buildings) [23, 31, 32]. Although various methods have been employed to fill the gaps in AOD values, the intermediate step of using AOD always increases the uncertainty of the model. Additionally, even the full-coverage AOD products filled by various methods showed unstable correlation with PM2.5 concentrations. Therefore, some efforts have been made to predict PM2.5 by skipping the AOD retrieval process [6, 7]. These studies showed that PM2.5 concentrations could be successfully predicted by using satellite TOA reflectance at the blue, red, and NIR bands. However, whether the MODIS TOA reflectance at other bands could be used for estimating PM2.5 concentrations was still unknown. Thus, in this paper, we examined the use of MODIS TOA reflectance at other bands for obtaining PM2.5 concentrations, and the results showed that other bands (e.g., b12 and b9), in addition to the bands used for AOD retrieval, also exhibited significant correlation with PM2.5 concentrations.
Second, the MODIS TOA reflectance has the same spatial resolution (~1 km) as the multiangle implementation of atmospheric correction (MAIAC) AOD, but the retrieval process of MAIAC AOD is much more complex than taking the TOA reflectance directly from the satellite remote sensing imagery. We demonstrated the stability of the remote sensing and machine learning method for PM2.5 estimation in different periods in this study. Two random forests using different datasets were developed, and the results showed that the overall coefficients of determination (R²) for the entire study period and for the separate years were all close to 0.8 for both models. The high predictability of the random forest has also been validated in other study areas [30, 33]. The results also showed that the lockdown with strict regulation measures implemented by the Chinese government had a positive effect on nationwide air quality, as the average PM2.5 concentrations during the strict regulation period were much lower than those of 2019.
In this paper, we confirmed the importance of PBLH in explaining the PM2.5 variability in the YRD region, which was consistent with the findings of our previous research [7]. We examined a total of 30 meteorological parameters for explaining the PM2.5 variability, and the results showed that only 10 parameters exhibited a relatively high correlation with PM2.5 concentrations in the YRD region. In addition to PBLH, RH and wind speed also ranked among the top five important parameters, indicating that the PM2.5 concentrations were more affected by meteorological fields than by the land use variables. Among the MODIS TOA reflectance at 21 bands, b9 (438~448 nm) and b12 (545~556 nm) exhibited much higher correlation with PM2.5 concentrations than the other bands in the YRD region. It is reasonable that the MODIS TOA reflectance at b1, b3, and b7 exhibited lower correlation with PM2.5 concentrations than b9 and b12, as the AOD products retrieved by using these three bands (i.e., b1, b3, and b7) were also not significantly correlated with PM2.5 concentrations in most study areas [11, 34]. The derived PM2.5 concentrations indicated that our model using MODIS TOA reflectance at 21 bands could be applied for PM2.5 estimation in the YRD region and in other regions with similar climatic and topographic conditions. However, some limitations should also be noted. First, there are 36 bands in the MODIS Level-1B product, and we only examined the relationship between MODIS TOA reflectance at 21 bands and PM2.5 concentrations. Therefore, whether the MODIS TOA reflectance at the other 14 bands could be used for PM2.5 estimation still needs to be further validated. Second, there are still other parameters that affect PM2.5 concentrations, such as the distribution of industrial pollution sources; therefore, future research will focus on how these parameters influence PM2.5 concentrations over the entire study area [9].
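The variable ranking discussed above relies on the IncMSE index. A rough scikit-learn analogue is permutation importance scored with negative mean squared error, where the reported importance is the increase in MSE after a variable is shuffled. The sketch below is not the authors' implementation: it uses synthetic data and hypothetical variable names, and it permutes the fitted data rather than out-of-bag samples, so its values are not directly comparable with IncMSE values from other software.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic predictors with hypothetical names; the first one dominates the target.
rng = np.random.default_rng(0)
names = ["PBLH", "RH", "b9"]
X = rng.normal(size=(400, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.1, size=400)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# With scoring="neg_mean_squared_error", importances_mean is the mean
# increase in MSE caused by shuffling each variable.
result = permutation_importance(
    model, X, y, scoring="neg_mean_squared_error", n_repeats=5, random_state=0
)
inc_mse = dict(zip(names, result.importances_mean))

# Variables ordered from most to least important.
ranking = sorted(inc_mse, key=inc_mse.get, reverse=True)
```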
5. Conclusion
Our research is unique in that we examined the use of MODIS TOA reflectance at 21 bands, rather than AOD products or the TOA reflectance at the three bands used for AOD retrieval, for PM2.5 prediction in the YRD region from 2019 to the strict regulation period of 2020. The results of the two random forests using different datasets showed that the TOA-sig RF model exhibited higher predictability of PM2.5 concentrations than the TOA-all RF model. The annual mean PM2.5 concentrations derived by the TOA-sig RF model presented a reasonable spatial distribution in the YRD region, with higher values in the north of the YRD and relatively low values in the south. Our results demonstrated that the MODIS TOA reflectance at 21 bands could also be successfully used for estimating PM2.5 concentrations with relatively high accuracy.
Data Availability
The data that support the findings of this study are available from the first author (email: [email protected]) at Minjiang University.
Additional Points
Highlights. (i) The possibility of using satellite TOA reflectance at 22 bands for PM2.5 estimation in YRD was validated. (ii) The TOA-sig RF model explained 81% of the variability in PM2.5 concentrations. (iii) The comparative spatial distribution of PM2.5 concentrations pre- and post-COVID outbreak in YRD was illustrated.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
Lijuan Yang was responsible for the methodology, software, writing-original draft, and project administration. Tingting Shi carried out data processing and visualization and model validation. Junjie Wu performed data downloading and processing. Musheng Lin was responsible for data visualization, investigation, and validation. Shuai Wang wrote, reviewed, and edited the manuscript.
Acknowledgments
The authors would like to thank NASA for the use of MODIS and GEOS-FP data. This work was supported by the Initial Scientific Research Fund of Talents in Minjiang University (No. MJY20001) and the Science and Technology Department of Fujian Province (Nos. 2015H0029 and 2021R0123) and the program of Fujian Educational Bureau (No. 2021JAT200435).
Supplementary Materials
Table S1: the meteorological parameters of the GEOS-FP. (Supplementary Materials)