Abstract

Urbanization, industrialization, and regional economic integration have developed rapidly in China in recent years, and air pollution has attracted increasing attention. PM2.5 is the main particulate matter in air pollution, so predicting PM2.5 accurately and effectively has become a concern of experts and scholars. To address this problem, this paper proposes an atmospheric PM2.5 concentration prediction algorithm based on time series and the interactive multiple model. PM2.5 concentration data are collected with a monitor at different air quality levels. Time series models are built from the historical PM2.5 concentration data using the autoregressive (AR) model, with one model for each of three air quality levels. The three models are then each converted to a state equation by the autoregressive-Kalman filter (AR-Kalman) approach. The proposed interactive multiple model (IMM) algorithm is compared with the AR model algorithm and the AR-Kalman prediction algorithm. The results show that the proposed IMM algorithm predicts PM2.5 more accurately than the other two approaches.

1. Introduction

Air pollution is composed of harmful gases and particulate matter. PM2.5 is one kind of particulate matter and one of the main indicators affecting air quality. Air pollution is not only a key issue in scientific research but also a prominent social issue in public life. Therefore, many experts and scholars at home and abroad have done a great deal of research on PM2.5 concentration prediction. A number of prediction methods [1] have been developed, such as principal component analysis [2], regression analysis [3, 4], neural networks [5–8], genetic algorithms, time series [9, 10], and the Kalman filter [11]. Azid et al. used principal component analysis (PCA) [12] to analyse the major components affecting air quality and predicted the air pollutant concentration with a neural network [13]. However, this method is easily trapped in local minima, which affects the accuracy of air quality prediction. Zhou et al. proposed a hybrid method that combines ensemble empirical mode decomposition (EEMD) with a neural network for predicting PM2.5 concentration [14]. But the EEMD method [15] requires a large amount of computation and time and can retain residual noise in the decomposition process. Ping et al. developed a hybrid strategy to predict the PM2.5 concentration in Beijing, Tianjin, and Hebei [16]. However, its results are sensitive to the selection of parameters and to the training time. Wang et al. used the autoregressive integrated moving average (ARIMA) model to build the PM2.5 concentration time series and trained an SVM on the noise deviation [17]. Although this algorithm is simple and robust, it requires considerable computing time and equipment. Voukantsis and Dagoumas used principal component analysis and a neural network [8] to compare the air quality in Thessaloniki and Helsinki [18]. Michanowicz et al. used the AERMOD model (AMS/EPA regulatory model) to predict the PM2.5 concentration [19]. Many experts and scholars have proposed the Kalman filtering algorithm [20, 21] to predict PM2.5 concentration.

The abovementioned algorithms still have great limitations in accuracy and effectiveness because they mostly predict PM2.5 concentration for a certain period of time or with a single model. In different weather conditions, atmospheric PM2.5 concentration levels are different. A single model is only suitable for predicting PM2.5 concentration under one kind of weather condition. Once the weather changes, the PM2.5 concentration level also changes, and the single model produces errors and cannot be relied on for accurate prediction. The multiple model algorithm is widely used in the field of control [22, 23] and in other fields [24–26], but not in the field of weather prediction. Thus, the multiple model algorithm under different weather conditions is selected here to reduce the errors and make the prediction more accurate.

The method above takes different weather conditions into account when predicting PM2.5. Based on multiple model theory [27–29] and the interactive multiple model algorithm, the main objective of this study is to predict PM2.5 concentration accurately according to different air quality levels, providing a feasible and effective method for air pollution prediction.

2. Materials and Methods

2.1. Data Collection

To evaluate the prediction accuracy of the proposed method, PM2.5 concentration data were obtained from a monitoring station in Beijing, shown in Figure 1. The data were collected at different air quality levels from Sept. 29th to Oct. 1st, Mar. 21st to 23rd, and Oct. 14th to 16th in 2018. The information sequence, which includes PM2.5, NO2, SO2, CO, PM10, and O3 concentrations (μg/m3), temperature, and humidity, is used to evaluate the performance of the proposed methods. The time series model is then established and combined with the proposed method to predict the PM2.5 concentration.

Figure 2 shows the monitoring area. The red circle represents a monitoring point in the figure.

Figure 3 shows the air quality in three months, with different colors representing different air quality levels: green represents an excellent air quality level, yellow represents a good level, and orange represents a slightly polluted level. The specific air quality assessment standard is shown in Table 1.

2.2. Time Series Model

Prediction methods for PM2.5 concentration mainly include numerical model methods and statistical methods. Time series analysis belongs to the statistical category and can be used to extract the relevant information, so that the structure and pattern of the time series can be used to deduce the future trend of the system. Time series models include the autoregressive (AR) model, the moving average (MA) model, and the autoregressive moving average (ARMA) model. Since the 20th century, various parameterization methods such as linear regression, smoothing, and autoregressive processes have been widely used in weather forecasting and have achieved good results. In this paper, the AR(p) model is given as follows:

$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + \cdots + \varphi_p x_{t-p} + \varepsilon_t,$$

where $\varphi_1, \varphi_2, \ldots, \varphi_p$ are the autoregressive coefficients and $\varepsilon_t$ is a sequence of independent, identically distributed random variables, that is, white noise. The sequence $x_t$ is then described by the p-order autoregressive model AR(p).
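As an illustration of how the coefficients of an AR(p) model might be estimated, the following minimal sketch fits them by ordinary least squares with NumPy. The function names `fit_ar` and `predict_next`, the absence of an intercept term, and the synthetic series are assumptions made for this example; the estimation procedure actually used in the paper is not stated.

```python
import numpy as np


def fit_ar(series, p):
    """Estimate AR(p) coefficients by ordinary least squares (no intercept)."""
    x = np.asarray(series, dtype=float)
    # Column i of the design matrix holds the lag-(i+1) values x[t-i-1].
    lags = np.column_stack([x[p - i - 1: len(x) - i - 1] for i in range(p)])
    target = x[p:]
    phi, *_ = np.linalg.lstsq(lags, target, rcond=None)
    return phi


def predict_next(series, phi):
    """One-step-ahead forecast from the last len(phi) observations."""
    x = np.asarray(series, dtype=float)
    p = len(phi)
    newest_first = x[-1: -p - 1: -1]  # x[t-1], x[t-2], ..., x[t-p]
    return float(np.dot(phi, newest_first))


# Synthetic, noise-free AR(2) series used only to exercise the functions.
demo = [1.0, 2.0]
for _ in range(30):
    demo.append(0.6 * demo[-1] + 0.3 * demo[-2])
demo_phi = fit_ar(demo, p=2)
demo_forecast = predict_next(demo, demo_phi)
```

In practice the order p would be chosen from the data, for example with an information criterion, and a separate model would be fitted for each air quality level.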

2.3. Autoregressive-Kalman Filter Method

For complex atmospheric dynamic systems, it is difficult for traditional methods to measure every predicted variable. The Kalman filter, however, can directly use limited and indirect measurement information to estimate the missing information and predict the future trend of the atmospheric dynamic system. Because the collected PM2.5 concentration data form a time series, the PM2.5 concentration time series model and the Kalman filtering approach are combined in this paper. The AR model is established first and then transformed into a state equation for Kalman filtering. The state space model of PM2.5 concentration at different air quality levels is written as follows:

$$\begin{aligned} X(k) &= A X(k-1) + w(k-1), \\ Z(k) &= H X(k) + v(k), \end{aligned}$$

where $X(k)$ is the $n$-dimensional state vector, $Z(k)$ is the $m$-dimensional observation vector, $A$ and $H$ are the $n \times n$ state transition matrix and the $m \times n$ observation matrix, respectively, and $w$ and $v$ represent the process and measurement noises, respectively. The state equation in the AR-Kalman method [26] is given as follows:

Meanwhile, the measurement equation is expressed as

According to the Technical Regulation on Ambient Air Quality Index, the air pollution index is divided into different levels (excellent, good, slightly polluted, and so on). In this paper, only the models for three of these air quality levels are used for research and simulation.
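One common way to write an AR(p) model as a state equation is the companion form, in which the state vector stacks the p most recent values. The sketch below assumes that form, which may differ from the one used in [26], and pairs it with a single predict/update cycle of a standard Kalman filter. The function names and the noise covariances `Q` and `R` are illustrative assumptions.

```python
import numpy as np


def ar_to_state_space(phi):
    """Companion-form matrices for x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t."""
    p = len(phi)
    A = np.zeros((p, p))
    A[0, :] = phi                # first row carries the AR coefficients
    A[1:, :-1] = np.eye(p - 1)   # remaining rows shift the lagged values down
    H = np.zeros((1, p))
    H[0, 0] = 1.0                # only the newest value is observed
    return A, H


def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle of a standard Kalman filter."""
    x_pred = A @ x                           # state prediction
    P_pred = A @ P @ A.T + Q                 # covariance prediction
    r = z - H @ x_pred                       # residual (innovation)
    S = H @ P_pred @ H.T + R                 # residual covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ r                   # state update
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

With `A, H = ar_to_state_space(phi)`, the first component of `A @ x` is the AR one-step forecast, so calling `kalman_step` once per new measurement yields a filtered estimate from which the next forecast can be read.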

2.4. Interactive Multiple Model Approach

In different weather conditions, atmospheric PM2.5 concentration levels are different. The interactive multiple model (IMM) uses two or more models to describe the possible states of a process; it is a soft switching algorithm that estimates the state of the system through effective weighted fusion. When the PM2.5 concentration changes, the IMM method in this paper switches to the corresponding PM2.5 concentration model, which gives the IMM strong robustness. Under the assumption that a specific model is in effect at the current moment, IMM obtains the initial condition of the filter matched to that model by mixing the state estimates generated by all filters at the previous moment. Each model is run through a standard Kalman filter [30], and the updated state estimates of all filters are then combined by weighting to obtain the final state and covariance estimates at time k [31–33]. Accordingly, time series models of PM2.5 concentration are established at different air quality levels in this paper. The model at different air quality levels can be represented as follows:

Each model has a prior probability. It is given as follows:

The transition probability from model $i$ to model $j$ is denoted as follows:

Next, we need to calculate this transition probability. For the random process, we set the state space $E = \{1, 2, 3\}$, where $E$ is the set of three PM2.5 concentration levels; levels 1, 2, and 3 represent excellent, good, and slightly polluted air quality, respectively. The air quality level is treated as a discrete-time Markov chain on the state space $E$, and its transition probability is given as follows:

From (9), the air quality level can be approximately regarded as a Markov chain. In this paper, the Markov state transition probability matrix is obtained as follows:
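A transition probability matrix of this kind can be estimated in several ways. As a minimal sketch, the code below counts observed level-to-level transitions in a sequence of air quality levels and normalizes each row. The cut-offs of 50 and 100 in `to_level` are assumed from the concentration ranges quoted for models 1 to 3 in the simulation section, and the counting estimator itself is an assumption, not necessarily the procedure used in the paper.

```python
import numpy as np


def to_level(pm25):
    """Map a PM2.5 reading to level 1, 2, or 3 (assumed cut-offs: 50 and 100)."""
    if pm25 <= 50:
        return 1
    if pm25 <= 100:
        return 2
    return 3


def transition_matrix(levels, n_states=3):
    """Row-normalized counts of transitions between consecutive levels."""
    counts = np.zeros((n_states, n_states))
    for current, following in zip(levels[:-1], levels[1:]):
        counts[current - 1, following - 1] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Rows with no observed departures fall back to a uniform distribution.
    return np.where(row_sums > 0, counts / safe_sums, 1.0 / n_states)
```

For example, `transition_matrix([to_level(v) for v in readings])` turns a list of concentration readings (a hypothetical variable here) into a 3 × 3 matrix whose rows each sum to one.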

The flow chart of IMM is shown in Figure 4:

In Figure 4, $\hat{X}_j(k-1)$ and $P_j(k-1)$ represent the state and covariance of each model before input interaction; $\hat{X}_{0j}(k-1)$ and $P_{0j}(k-1)$ represent the initial state and initial covariance of each model after input interaction; $\hat{X}_j(k)$ and $P_j(k)$ are the updated state vector and covariance matrix of each model; $\hat{X}(k)$ is the prediction value; and $P(k)$ is the estimation covariance matrix, with $j = 1, 2, 3$.

To apply the IMM algorithm, we establish a different PM2.5 time series model for each air quality level. Models 1, 2, and 3 are the PM2.5 concentration models for the excellent, good, and slightly polluted air quality levels, respectively. When the time series model is combined with the Kalman filter, models 4, 5, and 6 are established for the same three air quality levels, respectively. The state space of the models is written as follows:

$$X(k) = A_j X(k-1) + G_j W_j(k-1), \quad j = 1, 2, 3,$$

where $X(k)$ is the state vector; this is the target state equation of model $j$. The measurement equation is given as follows:

$$Z(k) = H_j X(k) + V_j(k),$$

where $A_j$ is the state transition matrix of model $j$, $W_j(k)$ is the white process noise with covariance matrix $Q_j$, $V_j(k)$ is the observation noise sequence with zero mean and covariance matrix $R_j$, $H_j$ is the observation matrix of model $j$, and $G_j$ is the noise correlation matrix of model $j$. There are four steps in the IMM algorithm: reinitialization, model filtering, probability updating, and prediction. In this paper, we use a homogeneous Markov chain to switch between the different models and calculate the transition probability matrix using submodel predictions. The specific steps of the IMM algorithm are as follows:

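Step 1 (reinitialization). The predicted probability of model $j$ from time $k-1$ to time $k$ is as follows:

$$\bar{c}_j = P\{M_j(k) \mid Z^{k-1}\} = \sum_{i=1}^{3} p_{ij}\, \mu_i(k-1),$$

where $M_j(k)$ is the selected model at time $k$, $Z^{k-1}$ denotes the measured values up to time $k-1$, $p_{ij}$ is the transition probability from model $i$ to model $j$, and $\mu_i(k-1)$ is the probability of model $i$ at time $k-1$. Based on the model probability, the estimated results of each filter are weighted and merged. The mixing probability of model $i$ is expressed as follows:

$$\mu_{i|j}(k-1) = \frac{p_{ij}\, \mu_i(k-1)}{\bar{c}_j}.$$

Each Kalman filter is initialized as follows:

$$\begin{aligned} \hat{X}_{0j}(k-1) &= \sum_{i=1}^{3} \mu_{i|j}(k-1)\, \hat{X}_i(k-1), \\ P_{0j}(k-1) &= \sum_{i=1}^{3} \mu_{i|j}(k-1) \left\{ P_i(k-1) + \left[\hat{X}_i(k-1) - \hat{X}_{0j}(k-1)\right] \left[\hat{X}_i(k-1) - \hat{X}_{0j}(k-1)\right]^{T} \right\}, \end{aligned}$$

where $\hat{X}_{0j}(k-1)$ is the reinitialized state vector at time $k-1$, $\bar{c}_j$ is the normalization constant for model $j$, $P_{0j}(k-1)$ is the corresponding covariance of the state estimate of the filter, $i, j = 1, 2, 3$, and $\mu_{i|j}(k-1)$ is the initial weight of each filter.

Step 2 (model filtering). The state prediction of model $j$ is

$$\hat{X}_j(k \mid k-1) = A_j \hat{X}_{0j}(k-1),$$

where $\hat{X}_{0j}(k-1)$ is the reinitialized state at time $k-1$. Here $A_j$, $H_j$, and $G_j$ are the state transition, observation, and noise matrices of model $j$, and $Q_j$ and $R_j$ are the process and measurement noise covariance matrices. The predicted covariance matrix is as follows:

$$P_j(k \mid k-1) = A_j P_{0j}(k-1) A_j^{T} + G_j Q_j G_j^{T}.$$

The Kalman gain is as follows:

$$K_j(k) = P_j(k \mid k-1) H_j^{T} S_j(k)^{-1}, \quad S_j(k) = H_j P_j(k \mid k-1) H_j^{T} + R_j.$$

The updated covariance matrix is obtained as follows:

$$P_j(k) = \left[I - K_j(k) H_j\right] P_j(k \mid k-1).$$

The state estimate is updated as

$$\hat{X}_j(k) = \hat{X}_j(k \mid k-1) + K_j(k)\, r_j(k).$$

The residual is as follows:

$$r_j(k) = Z(k) - H_j \hat{X}_j(k \mid k-1).$$

Step 3 (probability updating). The likelihood function is as follows:

$$\Lambda_j(k) = \frac{1}{\sqrt{\left|2\pi S_j(k)\right|}} \exp\left\{-\frac{1}{2}\, r_j(k)^{T} S_j(k)^{-1} r_j(k)\right\}.$$

The weight of each Kalman filter in the IMM algorithm is updated as follows:

$$\mu_j(k) = \frac{\Lambda_j(k)\, \bar{c}_j}{c}, \quad c = \sum_{j=1}^{3} \Lambda_j(k)\, \bar{c}_j, \quad \bar{c}_j = \sum_{i=1}^{3} p_{ij}\, \mu_i(k-1),$$

where $p_{ij}$ denotes the transition probability from model $i$ to model $j$ and $\mu(k) = [\mu_1(k), \mu_2(k), \mu_3(k)]^{T}$ is the model probability vector.

Step 4 (prediction). The prediction result is obtained from the updated probability of each model, and the hybrid prediction is computed as follows:

$$\hat{X}(k) = \sum_{j=1}^{3} \mu_j(k)\, \hat{X}_j(k).$$

The estimation covariance matrix is given as follows:

$$P(k) = \sum_{j=1}^{3} \mu_j(k) \left\{ P_j(k) + \left[\hat{X}_j(k) - \hat{X}(k)\right] \left[\hat{X}_j(k) - \hat{X}(k)\right]^{T} \right\}.$$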
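The four steps can be sketched as one function call per time step. The code below is a minimal illustration of a generic IMM cycle built on standard Kalman filters, not the authors' implementation. The function name `imm_step`, its argument layout, and the convention that `Q` already denotes the process noise covariance mapped into state space are assumptions made for this example.

```python
import numpy as np


def imm_step(xs, Ps, mu, Pi, z, models):
    """One IMM cycle.

    xs, Ps : per-model state estimates and covariances at time k-1
    mu     : model probabilities at time k-1, shape (M,)
    Pi     : transition matrix, Pi[i, j] = P(model j at k | model i at k-1)
    z      : measurement at time k, shape (m,)
    models : list of (A, H, Q, R) tuples, one per model
    """
    M = len(models)

    # Step 1: reinitialization (input interaction).
    c_bar = Pi.T @ mu                            # predicted model probabilities
    mix = (Pi * mu[:, None]) / c_bar[None, :]    # mix[i, j]: weight of filter i for model j
    x0, P0 = [], []
    for j in range(M):
        xj = sum(mix[i, j] * xs[i] for i in range(M))
        Pj = sum(mix[i, j] * (Ps[i] + np.outer(xs[i] - xj, xs[i] - xj))
                 for i in range(M))
        x0.append(xj)
        P0.append(Pj)

    # Step 2: model filtering, one standard Kalman filter per model.
    new_xs, new_Ps, lik = [], [], np.zeros(M)
    for j, (A, H, Q, R) in enumerate(models):
        x_pred = A @ x0[j]
        P_pred = A @ P0[j] @ A.T + Q
        r = z - H @ x_pred                       # residual
        S = H @ P_pred @ H.T + R                 # residual covariance
        K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
        new_xs.append(x_pred + K @ r)
        new_Ps.append((np.eye(len(x_pred)) - K @ H) @ P_pred)
        # Gaussian likelihood of the residual under model j.
        lik[j] = (np.exp(-0.5 * r @ np.linalg.solve(S, r))
                  / np.sqrt(np.linalg.det(2.0 * np.pi * S)))

    # Step 3: probability updating.
    mu_new = lik * c_bar
    mu_new = mu_new / mu_new.sum()

    # Step 4: prediction (weighted combination of the model estimates).
    x = sum(mu_new[j] * new_xs[j] for j in range(M))
    P = sum(mu_new[j] * (new_Ps[j] + np.outer(new_xs[j] - x, new_xs[j] - x))
            for j in range(M))
    return new_xs, new_Ps, mu_new, x, P
```

Calling `imm_step` once per measurement and feeding its first three outputs back in as `xs`, `Ps`, and `mu` runs the filter over a whole series. The sketch does not guard against all likelihoods underflowing to zero, which a production version would need to handle.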

3. Simulation Analysis and Prediction

To verify the feasibility and effectiveness of the proposed IMM algorithm, simulations are carried out to predict PM2.5 concentration. The PM2.5 concentration data from Sept. 29th 00:00 to Oct. 1st 00:00, Mar. 21st 00:00 to 23rd 00:00, and Oct. 14th 00:00 to 16th 00:00 in 2018 (sampled every ten minutes) are selected as the experimental data.

PM2.5 concentration in the atmosphere changes continually with the environment. To handle this, switching between the different models is governed by the Markov chain; in the experimental study, the switching is determined by the Markov transition probability matrix. The Markov transition probability is given as follows:

Initial state probability is . Initial value . Initial covariance .

The state space equation of model 1 is given as follows:

The state space equation of model 2 is given as follows:

The state space equation of model 3 is given as follows:

Model 1 can predict PM2.5 concentrations in the range 0∼50; its predictive effect is shown in Figure 5. Because model 1 is established from data with PM2.5 concentrations between 0 and 50, its predictions do not reflect the true values when the PM2.5 concentration exceeds 50, that is, at other air quality levels.

In Figure 6, model 2 is used to predict PM2.5 concentration. The model has a small error when the PM2.5 concentration is between 50 and 100, but it cannot predict PM2.5 concentration at other air quality levels. As with model 1, this is because model 2 is established from data with PM2.5 concentrations between 50 and 100.

In Figure 7, model 3 is used to predict PM2.5 concentration. When the PM2.5 concentration is between 100 and 150, the model predicts the trend of PM2.5 concentration well. As with models 1 and 2, model 3 can only be used to predict PM2.5 concentrations in its own range, from 100 to 150.

In Figure 8, PM2.5 concentration is predicted by model 4 (AR-Kalman). When the PM2.5 concentration is between 0 and 50, the prediction error is small, and the predictive effect of model 4 is better than that of model 1. Like model 1, model 4 can only be used to predict PM2.5 concentrations from 0 to 50.

In Figure 9, model 5 is used to predict PM2.5 concentration. When the PM2.5 concentration is between 50 and 100, the prediction error of model 5 is smaller than that of model 2.

In Figure 10, the PM2.5 concentration is predicted by model 6, which has a higher prediction accuracy than model 3.

In Figure 11, the interactive multiple model method is used to predict PM2.5 concentration. Figure 11 shows that the IMM method predicts PM2.5 concentration effectively at all air quality levels and has a better predictive effect than the single models. To compare the prediction effects of the various models more directly, the performance indicators of each prediction model are given in Table 2. Table 2 shows that the prediction error of the AR model is significantly larger than that of the IMM method, although Figures 5–7 indicate that each single model is effective at its own air quality level. Table 2 also shows that the AR-Kalman model is better than the AR model. However, Figures 8 to 10 show that each AR-Kalman model is only applicable at its corresponding air quality level and cannot predict PM2.5 concentration accurately at other levels. In Table 2, when the air quality level is excellent, IMM is more accurate than the single AR-Kalman model. A single model can only achieve high prediction accuracy under the corresponding air quality conditions, whereas IMM can predict PM2.5 concentration accurately at different air quality levels.

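The performance indicators in Table 2 are defined by the following metrics. The mean absolute error (MAE) is defined as follows:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|,$$

where $y_i$ is the observation, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean value of the observed PM2.5 concentration, and $N$ is the number of sample points. The second indicator is the mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|.$$

The root mean square error (RMSE) is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}.$$

The mean square error (MSE) is defined as follows:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2.$$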
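For reference, these four indicators can be computed with a few lines of NumPy. The sketch below uses the standard textbook definitions; the function names are chosen for this example, and MAPE is returned in percent and assumes that no observation is zero.

```python
import numpy as np


def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))


def mape(y, y_hat):
    """Mean absolute percentage error, in percent (observations must be nonzero)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))


def mse(y, y_hat):
    """Mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(mse(y, y_hat)))
```

Because RMSE and MSE square the errors, they penalize occasional large misses more heavily than MAE does, which is one reason to report several indicators side by side.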

The validity and rationality of the proposed method can be proved by analysing and comparing various indicators.

4. Conclusions

In this paper, AR time series models of PM2.5 concentration are established at different air quality levels and used to predict PM2.5 concentration. The Kalman filter is introduced by transforming the AR model into state equation form, which gives the AR-Kalman hybrid prediction method; this method is also used to predict PM2.5 concentration at different air quality levels. The AR model and the AR-Kalman method are then compared for PM2.5 concentration prediction, and the comparison indicates that the AR-Kalman model predicts more accurately than the AR model. Finally, the proposed interactive multiple model method is applied to predict PM2.5 concentration from historical data at different air quality levels. The mean absolute error is used as an evaluation index for the prediction models. The comparison indicates that the interactive multiple model (IMM) predicts more accurately than the single AR model and the single AR-Kalman model, with lower prediction error for PM2.5 concentration at different air quality levels.

In future work, temperature, pressure, humidity, and other factors in the weather environment will be considered. A nonlinear PM2.5 model will be built, and a dynamically updated transition matrix may be considered to predict PM2.5 concentration.

Data Availability

The data used to support the findings of this study have not been made available because data sharing is not applicable to this article as no new data were created or analysed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (grant nos. 61873006, 61473034, and 61673053), National Key Research and Development Project (grant nos. 2018YFC1602704 and 2018YFB1702704), and Beijing Major Science and Technology Special Projects (grant no. Z181100003118012).