Abstract

With rapid urbanization, environmental pollution has drawn worldwide attention. Accurate air quality prediction is important for alleviating severe pollution and protecting public health. A novel hybrid model that combines a spatiotemporal correlation analysis method with a simple yet effective network is proposed to forecast the concentrations of air pollutants. First, a new calculation method that considers both the distance and the concentrations among stations is proposed, on the basis of grey relation analysis, to select the stations strongly correlated with the target station. Second, a fully convolutional network based on a temporal convolutional network is proposed to extract spatiotemporal features efficiently. In addition, meteorological factors and other pollutants are selected as auxiliary factors to further improve the prediction accuracy. To verify the validity of the proposed model, air quality and meteorological data collected from 40 monitoring stations in Fushun, an important industrial city in China, are used for air quality prediction. The performance of the models is evaluated by a series of metrics. The RMSE, MAE, and $R^2$ values of the proposed model are 12.505, 8.214, and 0.884 for forecasting the next-hour concentration. Compared with conventional models and other hybrid models, the proposed model achieves better air quality prediction performance.

1. Introduction

With the rapid development of industrial technology, air pollution is becoming more and more serious, which has aroused worldwide concern [1]. Many studies have shown that air pollution causes respiratory diseases, heart diseases, and cancers in humans [2–4]. Consequently, accurate air quality prediction is of great significance for alleviating air pollution and protecting public health. Air pollutants include six kinds (PM2.5, PM10, SO2, NO2, CO, and O3), among which PM2.5 is an important indicator of pollution status [5]. Therefore, PM2.5 is selected as the prediction object of this study. The methods of PM2.5 prediction can be divided into three categories: statistical methods, machine learning, and deep learning.

Statistical methods, which mainly include the autoregressive moving average (ARMA) model [6, 7], the autoregressive integrated moving average (ARIMA) model [8], and the multiple linear regression (MLR) model [9], construct mathematical models to fit historical time-series curves and then forecast based on the established models [10]. In recent years, machine learning has developed rapidly, and many scholars have begun to apply machine learning methods, mainly support vector regression (SVR), random forest (RF), and artificial neural networks (ANNs), to time-series prediction problems [11–13]. Among these methods, ANNs such as back propagation (BP) and radial basis function (RBF) networks have been widely applied to the prediction of air pollutants because they can fit the nonlinear mechanisms of atmospheric phenomena well [14, 15]. Although ANNs can achieve nonlinear fitting, they struggle to meet the demand for high prediction accuracy.

Deep learning, developed from ANNs, builds multiple hidden layers between the input layer and the output layer to extract the nonlinear relationship between input and output. It has been successfully applied to a wide range of fields such as computer vision [16], image recognition [17], time-series prediction [18], and natural language processing [19]. Long short-term memory (LSTM) was used to predict PM2.5 concentrations and proved more effective than a conventional BP network [20]. A deep learning model based on 1D ConvNets and a bidirectional gated recurrent unit (GRU) was applied to predict the concentrations of air pollutants in Beijing, which showed that GRU was superior to traditional methods such as the support vector machine (SVM) [21]. The temporal convolutional network (TCN) was introduced in 2018 to deal with time-series problems such as machine translation, speech synthesis, and natural language processing [22]. TCN does not contain complex recurrent structures such as the gating mechanisms of GRU or LSTM. Compared with recurrent neural networks (RNNs), TCN is simpler and clearer, and it can efficiently extract features from long sequences. A multi-channel TCN was proposed for time-series prediction, and the experimental results showed that TCN was better than LSTM and GRU at retaining information over long sequences [23].

However, these studies consider only the air pollutants and meteorological factors of the target prediction station, and the influence of adjacent stations is not considered. Numerous studies have shown that air pollutants constantly diffuse and spread over time and space under the influence of factors such as wind speed and humidity [24, 25]. The pollutant concentrations at the target prediction station are influenced by adjacent stations, so it is necessary to incorporate data from other stations into air quality prediction [4]. The long short-term memory neural network extended (LSTME) model, which considers historical data from all stations together with auxiliary factors, was proposed to capture the spatiotemporal correlation among stations [26]. Nonetheless, not all stations are highly correlated, because of the distance among stations and the degree of pollutant spread. If the variables of weakly correlated stations are fed into a model, its computational complexity increases and its prediction accuracy can decrease.

As a result, prior spatiotemporal correlation analysis among stations is critical in the modeling process. Pearson’s correlation coefficient was used to select adaptive k-nearest neighboring stations, and a spatiotemporal convolutional long short-term memory neural network extended (C-LSTME) model was proposed for predicting air quality concentration [27]. Pearson’s correlation coefficient was adopted to measure the relationship between PM2.5 in Beijing and air pollutants in its surrounding cities, and a hybrid model based on a convolutional neural network (CNN) and LSTM was introduced to predict daily PM2.5 concentrations [28]. A spatiotemporal causal convolutional neural network (ST-CausalConvNet) for short-term PM2.5 prediction was proposed, with Pearson’s correlation coefficient used to analyze spatiotemporal correlation; experimental results showed that ST-CausalConvNet was better than LSTM and GRU on a case study from Beijing [29]. However, these methods do not consider the influence of the distance among stations in the analysis of spatiotemporal correlation.

In view of the above problems, a novel hybrid model that combines a spatiotemporal correlation analysis method with a simple yet effective network is proposed to forecast the concentrations of air pollutants. The main contributions of this study are as follows:

(1) A new correlation calculation method named distance grey relation analysis (DGRA) is constructed, which introduces the inverse distance weight on the basis of grey relation analysis. The DGRA method combines the distance and the concentrations among stations to calculate the correlation coefficient. With the DGRA method, the stations strongly correlated with the target station can be properly selected as spatiotemporal factors of the model.

(2) An effective and simple model named the spatial-temporal convolutional network (STCN) is proposed based on an improved TCN structure. The STCN model consists of a 1D convolutional layer and multi-channel TCN layers, which sufficiently extract the spatial-temporal information among multiple stations while avoiding information leakage.

(3) Pearson’s correlation coefficient is applied to calculate the correlation between other factors and the PM2.5 concentrations of the target station. The selected variables are used as auxiliary factors to further improve the prediction accuracy.

(4) Data from Fushun City are adopted to verify the performance of the proposed model. Compared with many other models (SVR, MLP, LSTM, GRU, TCN, CNN-LSTM, and CNN-GRU), the experimental results show that the new hybrid model has high accuracy and stability for PM2.5 prediction.

The remaining part of this article is organized as follows. Section 2 is the data source and analysis. Section 3 is a detailed description of the proposed model. Section 4 introduces the experiments and discussions. Finally, the conclusion of this article is given in Section 5.

2. Data Source and Analysis

The data source and analysis are described in this section. After performing data cleaning operations, the spatiotemporal correlation among multiple stations is analyzed by the DGRA method, and Pearson’s correlation coefficient is calculated to select auxiliary factors for the improvement of prediction accuracy.

2.1. Data Source

The data are collected from 40 environmental monitoring devices deployed by the partner company in Fushun City, Liaoning Province, China. The 40 stations are located in four districts in the west of Fushun City, namely Wanghua District, Shuncheng District, Dongzhou District, and Xinfu District. The specific distribution is shown in Figure 1. The 40 monitoring stations collect data on six air pollutants (PM2.5, PM10, SO2, NO2, CO, and O3) and four meteorological variables (humidity, air pressure, wind speed, and wind direction). The data information is shown in Table 1, which includes the range, mean, and unit of each variable. The data are sampled continuously once an hour, from 0:00 to 23:00 every day. Data from January 1, 2020, to December 31, 2020, are used as the data set in this study.

Reliable data are the basis of accurate prediction, but values are often missing because of sensor malfunctions and errors, power outages, computer system crashes, and pollutant levels lower than detection limits [30]. If these abnormal data are not processed, the prediction results are likely to deviate considerably. Consequently, early data processing and analysis are essential [31]. Because too much data is missing at stations 06, 13, 19, 20, 33, and 35, the data of these stations are eliminated. For the remaining stations, linear interpolation is used to fill in the missing data and min-max normalization is used to normalize the data. Of the processed data, 80% is used as the training data set, and the remaining 20% is used as the test data set.
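The preprocessing steps described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the split is taken to be chronological and the normalization is computed over the whole series, neither of which the text specifies. The function names and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def linear_interpolate(values: List[Optional[float]]) -> List[float]:
    """Fill None gaps linearly; gaps at the edges copy the nearest valid value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no valid observations")
    filled = list(values)
    for i in range(known[0]):                       # leading gap
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):     # trailing gap
        filled[i] = values[known[-1]]
    for left, right in zip(known, known[1:]):       # interior gaps
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(values: List[float], train_fraction: float = 0.8) -> Tuple[List[float], List[float]]:
    """Assumption: the first 80% of hours form the training set."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


# Hypothetical hourly readings with missing values (None).
raw = [10.0, None, 30.0, None, None, 60.0, 50.0, None, 20.0, 10.0]
filled = linear_interpolate(raw)
scaled = min_max_normalize(filled)
train, test = chronological_split(scaled)
```

In practice the minimum and maximum are often computed on the training portion only, so that no information from the test period enters the scaling.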

PM2.5 concentrations of stations 01, 23, 25, and 31 from January 1, 2021, to January 10, 2021, are shown in Figure 2. Over a given period, the PM2.5 concentrations of different stations show the same trend to a certain extent. This indicates that the change in PM2.5 concentrations at one station is affected by surrounding stations. For this kind of complex spatiotemporal relationship, it is important to select an appropriate method to evaluate the strength of the correlation among multiple stations. The stations selected as strongly correlated with the target station serve as the input of the subsequent model and directly affect the prediction of PM2.5 concentrations.

2.2. The Analysis of Spatiotemporal Correlation by the DGRA Method

Identifying the correlation among multiple variables is a key research problem in time-series prediction. Grey relation analysis (GRA) is an analysis method based on grey system theory, which measures the degree of correlation according to the similarity or difference in the development trends of factors [32]. GRA does not require large samples, nor does it require the data to follow a typical distribution. Therefore, it has been widely applied to agricultural economy [33], water conservancy [34], macroeconomics [35], and other fields. The air quality monitoring stations in this study are located in four main urban districts of Fushun City, and their distribution is relatively dense. As a result, the correlation coefficients calculated by the GRA method are generally high, so too many stations would be selected for forecasting. Too many stations increase the computational complexity and add redundant information, which is not conducive to improving the prediction accuracy.

Consequently, we propose a new correlation calculation method namely distance grey relation analysis (DGRA) method, which introduces a distance weighted matrix on the basis of GRA. The DGRA method can combine distance and concentrations to calculate the correlation coefficient among stations, which can better represent the correlation of stations and help to select the appropriate input variables. The specific calculation formulas are as follows.

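To begin with, a sequence matrix $X$ is determined based on the data of $n$ stations, where each column represents the PM2.5 concentrations of one station over $m$ time steps. In this matrix, each column is used in turn as the primary reference sequence, and the remaining columns are used as comparison sequences.

$$X = \begin{bmatrix} x_1(1) & x_2(1) & \cdots & x_n(1) \\ x_1(2) & x_2(2) & \cdots & x_n(2) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(m) & x_2(m) & \cdots & x_n(m) \end{bmatrix} \tag{1}$$

where $n$ denotes the number of monitoring stations and $m$ represents the number of time steps.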

Then, the sequence difference between the reference sequence and the comparison sequence is calculated.

$$\Delta_{ij}(t) = \left| x_i(t) - x_j(t) \right| \tag{2}$$

where $\Delta_{ij}(t)$ denotes the difference between station $i$ and station $j$ at time $t$, $x_i(t)$ and $x_j(t)$, respectively, represent the concentration values of station $i$ and station $j$ at time $t$, and $i$ and $j$ range from 1 to $n$.

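After that, the correlation coefficient is calculated by the following equation:

$$\xi_{ij}(t) = \frac{\Delta_{\min} + \rho\,\Delta_{\max}}{\Delta_{ij}(t) + \rho\,\Delta_{\max}} \tag{3}$$

where $\xi_{ij}(t)$ denotes the correlation between the reference sequence and the comparison sequence at time $t$. $\Delta_{\min} = \min_j \min_t \Delta_{ij}(t)$ is the two-level minimum difference, and $\Delta_{\max} = \max_j \max_t \Delta_{ij}(t)$ is the two-level maximum difference. $\rho$ represents the distinguishing coefficient, which usually takes the value 0.5.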

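Next, the average value of the correlation coefficients between the reference sequence and the comparison sequence over all time steps is taken as the correlation degree, which is shown in equation (4). At the same time, the grey relation matrix is constructed in equation (5).

$$r_{ij} = \frac{1}{m} \sum_{t=1}^{m} \xi_{ij}(t) \tag{4}$$

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{bmatrix} \tag{5}$$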
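The grey relation steps above (sequence difference, correlation coefficient, and correlation degree) can be sketched as follows. This is a minimal illustration that assumes the standard GRA form with a distinguishing coefficient of 0.5 and inputs that have already been normalized. The function name and the toy series are hypothetical, and the distance weighting of the DGRA method is not included.

```python
from typing import List


def grey_relation_matrix(series: List[List[float]], rho: float = 0.5) -> List[List[float]]:
    """series[i][t] is the (normalized) concentration of station i at time t.

    Returns an n x n matrix whose entry [i][j] is the grey relation degree of
    comparison station j with respect to reference station i.
    """
    n = len(series)
    m = len(series[0])
    if n < 2 or any(len(s) != m for s in series):
        raise ValueError("need at least two series of equal length")
    matrix = [[1.0] * n for _ in range(n)]          # a station fully matches itself
    for i in range(n):
        # Absolute differences between reference i and each comparison j.
        diffs = [[abs(series[i][t] - series[j][t]) for t in range(m)] for j in range(n)]
        others = [d for j in range(n) if j != i for d in diffs[j]]
        d_min, d_max = min(others), max(others)     # two-level minimum and maximum
        if d_max == 0.0:
            continue                                # all series identical: degree stays 1
        for j in range(n):
            if j == i:
                continue
            coeffs = [(d_min + rho * d_max) / (diffs[j][t] + rho * d_max) for t in range(m)]
            matrix[i][j] = sum(coeffs) / m          # average over time steps
    return matrix


# Toy example: station 1 tracks station 0 closely, station 2 moves the opposite way.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.2, 4.1],
    [4.0, 3.0, 2.0, 1.0],
]
relation = grey_relation_matrix(toy)
```

With reference station 0, the closely tracking station receives a higher degree than the opposing one, which is the ordering the method relies on when ranking candidate stations.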

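Now, the distance is calculated according to the longitude and latitude information of each monitoring station. The calculation is the great-circle distance, written here in haversine form:

$$d_{ij} = 2R_e \arcsin\left( \sqrt{ \sin^2\left( \frac{\varphi_i - \varphi_j}{2} \right) + \cos\varphi_i \cos\varphi_j \sin^2\left( \frac{\lambda_i - \lambda_j}{2} \right) } \right) \tag{6}$$

where $d_{ij}$ denotes the distance between station $i$ and station $j$, $\varphi_i$ and $\varphi_j$ are the latitudes of stations $i$ and $j$, $\lambda_i$ and $\lambda_j$ are the longitudes of stations $i$ and $j$, and $R_e$ is the radius of the earth, which is 6378 km.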
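As an illustration of the distance step, the following sketch computes the great-circle distance between two stations with the haversine formula and the earth radius of 6378 km given above. The haversine form is an assumption about how the distance is evaluated, and the coordinates are hypothetical, not those of the actual stations.

```python
import math

EARTH_RADIUS_KM = 6378.0  # radius used in the text


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float,
                    radius_km: float = EARTH_RADIUS_KM) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical station coordinates (latitude, longitude).
station_a = (41.85, 123.90)
station_b = (41.88, 123.96)
distance_ab = great_circle_km(*station_a, *station_b)
```

For stations only a few kilometres apart, as in a dense urban network, the result is insensitive to the exact choice between the haversine and the spherical law of cosines forms.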

According to the results obtained by equation (6), the distance weight between station $i$ and station $j$ is calculated by equation (7). Meanwhile, the distance weight matrix is constructed in equation (8).

Finally, the correlation coefficient matrix calculated by the DGRA method is obtained by the following equation:

The calculation results of four stations (station 01, station 17, station 18, and station 26) by the DGRA method are shown in Figure 3. The coefficients calculated by the DGRA method are within the range of 0–1. The closer the coefficient is to 1, the stronger the correlation between two stations; the closer it is to 0, the weaker the correlation. As can be seen from the figure, station 26 (purple line), located in the center, has a great impact on other stations, with correlation coefficients all above 0.5, while station 18 (green line), in a remote location, has a weak impact on other stations. Nonetheless, the diffusion and spread of PM2.5 are affected not only by geographical location but also by meteorological and other gaseous factors. For example, station 17 (orange line) and station 10 are close together, but they are not highly correlated. Based on the above analysis, the DGRA method proposed in this study considers both the concentrations and the distance among stations, so it can evaluate the spatiotemporal correlation among multiple stations well. For station 01, which is marked in red in Figure 3, the correlation coefficients with the other stations are all above 0.5, indicating strong mutual influence between station 01 and the other stations. Therefore, station 01, located in the center of the four districts, is taken as an example to forecast PM2.5 concentrations.

2.3. Selection of Auxiliary Factors

To improve the prediction accuracy of the model, it is crucial to identify the correlation between the various influencing factors and PM2.5 concentrations before the model is built, which ensures that the model uses proper input features for prediction. PM2.5 is affected by many factors, but not all of them are strongly correlated with it. To reduce redundant information, Pearson’s correlation coefficient is used to calculate the correlation among factors. The factors strongly related to PM2.5 are selected as auxiliary input variables.

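Suppose one time series is the vector $X = (x_1, x_2, \ldots, x_n)$ and the other time series is the vector $Y = (y_1, y_2, \ldots, y_n)$. The correlation coefficient between them is calculated by the following equation:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{10}$$

where $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$. If $r > 0$, there is a positive correlation; if $r < 0$, there is a negative correlation. The closer $|r|$ is to 1, the stronger the linear correlation between $X$ and $Y$.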
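Pearson’s correlation coefficient as defined above can be computed directly from its definition. The sketch below uses only the standard library; the function name and the sample series are illustrative.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


# Illustrative series: one rises with the target, one falls.
target = [1.0, 2.0, 3.0, 4.0]
r_pos = pearson(target, [2.0, 4.0, 6.0, 8.0])
r_neg = pearson(target, [8.0, 6.0, 4.0, 2.0])
```

Because the sign only indicates direction, factors would be ranked by the absolute value of the coefficient when choosing auxiliary inputs.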

The correlation coefficients between the PM2.5 concentrations of station 01 and the other air pollutants and meteorological factors are calculated. Figure 4 identifies the one factor that is only weakly correlated with PM2.5. As a result, all factors except that one are used as auxiliary variables to predict PM2.5 concentrations.

3. Network Model

The principles of time-series CNN and the structures of the TCN and STCN models are introduced in detail in this section. The overall structure of the DGRA-STCN model is shown in Figure 5 and mainly includes three parts. First, the spatiotemporal correlation is analyzed by the DGRA method, and the stations strongly correlated with the target station are selected to form a spatial fusion matrix. The PM2.5 concentration at time $t$ is affected by the concentrations in the hours before $t$. Therefore, after the matrix is built, a sliding time window is applied to construct the temporal sequence, which is fed to the STCN layer for training. Second, the variables found to be strongly correlated by Pearson’s correlation coefficient are taken as auxiliary factors and trained by a single TCN layer. Third, the outputs of the two parts are concatenated and fed to the fully connected layer, which produces the final prediction results. The detailed algorithm is shown in Algorithm 1.
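The sliding-window construction mentioned above can be sketched as follows. This is an illustrative helper, not the authors' implementation: the function name, the row layout (one row per hour, one column per selected station), and the choice of column 0 as the target station are assumptions.

```python
from typing import List, Tuple


def sliding_windows(rows: List[List[float]], target_col: int, window: int,
                    horizon: int = 1) -> Tuple[List[List[List[float]]], List[float]]:
    """Turn a time-ordered table into (window of past rows, future target) pairs.

    rows[t] holds the values of all selected stations at hour t. Each sample is
    `window` consecutive rows, and its label is the target column `horizon`
    hours after the end of the window.
    """
    samples, targets = [], []
    last_start = len(rows) - window - horizon
    for start in range(last_start + 1):
        end = start + window
        samples.append(rows[start:end])
        targets.append(rows[end + horizon - 1][target_col])
    return samples, targets


# Hypothetical table: 30 hours, 2 stations; column 0 is the target station.
table = [[float(t), 10.0 * t] for t in range(30)]
X, y = sliding_windows(table, target_col=0, window=24, horizon=1)
```

Changing `horizon` to 2 or 4 yields the labels for multi-hour-ahead prediction from the same input windows.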

3.1. Time-Series CNN

A convolutional neural network (CNN) is a kind of feed-forward neural network that includes convolutional computation. One-dimensional convolution (1D Conv) is often applied to time-series problems. Like the commonly used two-dimensional convolution, 1D Conv applies a convolution kernel to the input, but the kernel moves along a single dimension, namely the time dimension, rather than along two dimensions. In PM2.5 prediction, the time axis can be treated as a spatial dimension, just like the height or width of a two-dimensional image.

As shown in Figure 6, the convolution kernel in each convolutional layer is applied to process data from multiple stations. The abscissa shows concentrations of multiple stations, and the ordinate represents time dimension. The convolution kernel moves along the time dimension for the convolution calculation, and the feature matrix is finally obtained.

3.2. TCN

TCN mainly consists of three parts, namely causal convolution, dilated convolution, and residual connection. The principles of each part are described in detail as below.

3.2.1. Causal Convolution

In forecasting, the forecast value is influenced only by the current and previous moments and must not be influenced by future information. Therefore, causal convolution is introduced to avoid information leakage. As shown in Figure 7, for a four-layer network, only four units, all at or before the current time step, participate in the convolution, which effectively avoids information leakage.

3.2.2. Dilated Convolution

Dilated convolution was originally proposed to solve the problem of image semantic segmentation. It is widely applied in many fields such as medical image processing, human activity recognition, and network intrusion detection [36–39]. “Holes” are added to the kernel to expand the receptive field. For example, a 3 × 3 convolution kernel can have a 5 × 5 (dilation rate = 2) or larger receptive field with the same number of parameters and the same amount of computation, so that no further downsampling is required. Compared with standard convolution, dilated convolution has one more hyperparameter, the dilation rate, which is the spacing between adjacent points of the convolution kernel. For a one-dimensional sequence input $x$ with a convolution kernel $f$, the dilated convolution operation $F$ on element $s$ of the sequence is defined as follows:

$$F(s) = (x *_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s - d \cdot i} \tag{11}$$

where $d$ is the dilation factor, $k$ is the convolution kernel size, and $s - d \cdot i$ indicates the past direction.
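A dilated causal convolution on a one-dimensional sequence can be written directly from the definition above. The sketch assumes zero padding on the left, so that the output has the same length as the input; the function name and the sample values are illustrative.

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], kernel: Sequence[float],
                          dilation: int = 1) -> List[float]:
    """Output at step s sums kernel[i] * x[s - dilation * i] over the kernel taps.

    Only the current and earlier inputs are read, so no future value can leak
    into the output. Positions before the start of the sequence count as zero.
    """
    out = []
    for s in range(len(x)):
        total = 0.0
        for i, weight in enumerate(kernel):
            idx = s - dilation * i
            if idx >= 0:                 # implicit zero padding on the left
                total += weight * x[idx]
        out.append(total)
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = dilated_causal_conv1d(signal, kernel=[1.0, 1.0], dilation=2)

# Causality check: changing the last input leaves all earlier outputs unchanged.
perturbed = signal[:-1] + [100.0]
result_perturbed = dilated_causal_conv1d(perturbed, kernel=[1.0, 1.0], dilation=2)
```

With a kernel of two ones and a dilation of 2, each output is the current value plus the value two steps earlier, which makes the skipped positions easy to see.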

3.2.3. Residual Structure

The residual structure was first proposed in ResNet [40]. It alleviates the vanishing gradient problem caused by increasing depth in deep neural networks and has been applied in image classification, object recognition, and other tasks [41–43]. For TCN, the receptive field can be enlarged by increasing the number of layers, the kernel size, and the dilation rate. When a prediction task needs a large receptive field, TCN may require as many as 10 layers, and as the depth of the network increases, the degradation problem occurs. Introducing the residual structure effectively mitigates this degradation of deep networks.
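The relation between depth, kernel size, dilation rate, and receptive field can be made concrete with a short calculation. The sketch assumes one dilated causal convolution per layer and dilation rates that double from layer to layer (1, 2, 4, ...); the text does not state the dilation schedule, so both are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input steps seen by one output of a stack of dilated causal convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)


def layers_needed(target: int, kernel_size: int) -> int:
    """Smallest number of layers with doubling dilations whose receptive field covers `target`."""
    if kernel_size < 2:
        raise ValueError("kernel_size must be at least 2 for the field to grow")
    dilations = []
    while receptive_field(kernel_size, dilations) < target:
        dilations.append(2 ** len(dilations))
    return len(dilations)


# Kernel size 3 with dilations 1, 2, 4, 8 covers 31 past steps.
field = receptive_field(3, [1, 2, 4, 8])
depth_for_24_hours = layers_needed(24, kernel_size=3)
```

Under these assumptions, a kernel size of 3 needs four layers to cover a 24-step input window, and every doubling of the required history adds roughly one more layer, which is where residual connections become useful.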

3.3. STCN Model

Networks combining CNN and TCN have been widely applied to text classification, flow rate measurement, and other fields [44, 45]. In this study, a simple model named STCN, which combines time-series CNN and multi-channel TCN, is proposed for spatiotemporal feature extraction. The architecture of the STCN model is illustrated in Figure 8. The model uses only convolution operations, which process data in parallel and improve efficiency, to address the difficulty that previous methods have in extracting spatiotemporal features. The model mainly consists of two parts. First, time-series CNN is applied to extract the spatial features of the input data and reduce its dimensionality. Second, multi-channel TCN layers are constructed, each of which includes many units. All units share the same structure, which includes dilated causal convolution, weight normalization, an activation function, and dropout. The multi-channel TCN layers extract temporal features of the data multiple times and concatenate the results of each channel, while the causal convolutions avoid information leakage. This study constructs a three-channel TCN network, and the number of units in each channel is given in the experiments.

(1) Data: The PM2.5 concentration data of the stations over all time steps;
(2) The location information of the stations;
(3) The auxiliary factors from the target prediction station.
(4) Result: The prediction results of PM2.5 concentrations for the next few hours;
(5) The evaluation metrics, including mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination ($R^2$).
(6) According to the DGRA method, the correlation coefficients of the stations are calculated using equations (1) to (9).
(7) The stations correlated with the target prediction station are selected by setting a threshold to construct the spatial fusion matrix.
(8) The temporal sequence is constructed as the input of the model by using sliding windows.
(9) for each sample do
(10)   use an STCN layer to extract spatiotemporal features of the input and obtain representation 1.
(11) end
(12) According to Pearson’s correlation coefficient, the air pollutants and meteorological factors related to PM2.5 are selected as auxiliary factors.
(13) for each sample do
(14)   use a TCN layer to extract temporal features of the auxiliary factors and obtain representation 2.
(15) end
(16) Concatenate representation 1 and representation 2.
(17) Use the fully connected layer to obtain the final prediction values.
(18) return prediction values, RMSE, MAE, and $R^2$

4. Experiments and Discussions

In this section, we first discuss the results of hyperparameter settings (Section 4.1). Subsequently, three metrics are introduced to evaluate the effect of models (Section 4.2). We then analyze the influence of related station selection and effects of spatial correlation and auxiliary factors (Sections 4.3 and 4.4). Finally, we compare results with baselines to comprehensively evaluate the effectiveness of the proposed model (Section 4.5).

4.1. Experimental Settings

Selecting appropriate hyperparameters is a difficult but important task in deep learning, because it directly affects the performance of the models. The following settings are determined through a series of comparative experiments. The size of the convolution kernel is set to 3 so that it can slide along the time axis when the convolution operation is performed. According to previous studies, if the time window is too short, the extracted information will be insufficient; if it is too long, the model has difficulty retaining the information. Consequently, the length of the time window is set to 24; that is, the historical data of the past 24 hours are used to predict the data of the next few hours. Stochastic gradient descent (SGD) is selected as the optimizer to improve training speed, with a learning rate of 0.005, and the dropout rate is set to 0.5 to avoid overfitting. To extract sufficient features from the data, the number of units is set to 8. The details are shown in Table 2.

4.2. Metrics

Three metrics are applied in this study to evaluate the effectiveness of the models: root-mean-square error (RMSE), mean absolute error (MAE), and the coefficient of determination ($R^2$).

RMSE: root-mean-square error is the square root of the mean of the squared deviations between the predicted values and the true values over the $n$ observations. It is sensitive to very large errors in a group of measurements, so it reflects the precision of the prediction well. The smaller the RMSE is, the better the model is. The calculation formula is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{12}$$

MAE: mean absolute error is the average of the absolute values of the deviations between the predicted values and the true values over all samples. Because positive and negative errors cannot cancel each other out, it accurately reflects the size of the actual prediction error. The smaller the MAE is, the better the model is. The calculation formula is as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{13}$$

$R^2$: the coefficient of determination is also known as goodness of fit, which refers to the degree to which the regression line fits the observed values. The closer the value of $R^2$ is to 1, the better the independent variables explain the dependent variable. The calculation formula is as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{14}$$

where $y_i$ is the real value of the output, $\hat{y}_i$ is the predicted value of the output, $n$ is the sample size, and $\bar{y}$ denotes the mean of all real values.
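The three metrics can be computed as in the following sketch, which follows the standard definitions of RMSE, MAE, and the coefficient of determination. The function names and the sample values are illustrative and are not results from this study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Illustrative observed and predicted values.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
scores = (rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

RMSE penalizes the single error of 2 more heavily than MAE does, which is why the two metrics are usually reported together.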

4.3. The Influence of Related Station Selection

For the target prediction station 01, the impact of different thresholds on the accuracy of the DGRA-STCN model is analyzed and discussed. The results are shown in Table 3. When the threshold is set too high, too little input information is available; when it is set too low, too much input information causes redundancy. For example, when the threshold is 0.92 and 2 stations are selected (RMSE 22.979, MAE 13.164, $R^2$ 0.636), the results show that too few surrounding monitoring stations do not produce good predictions, so full consideration of spatiotemporal correlation is essential. When the threshold is 0.82 and 16 stations are selected as the input of the model (RMSE 24.844, MAE 14.958, $R^2$ 0.565), the results show that too much input information leads to redundancy and increased computational complexity, which is also not conducive to prediction. Through experiments, the threshold is finally set to 0.88, and the 11 relevant stations serve as the inputs of the network (RMSE 12.505, MAE 8.214, $R^2$ 0.884), with which the model achieves the best prediction performance. This group of experiments also shows that the DGRA method is an effective way to evaluate the spatiotemporal correlation of multiple monitoring stations.
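The effect of the threshold on the number of selected stations can be illustrated with a small sketch. The station identifiers and coefficient values below are hypothetical and are not taken from Table 3; only the threshold values 0.92, 0.88, and 0.82 come from the text.

```python
from typing import Dict, List


def select_stations(coefficients: Dict[str, float], threshold: float) -> List[str]:
    """Return the stations whose correlation with the target is at least `threshold`."""
    return sorted(station for station, value in coefficients.items() if value >= threshold)


# Hypothetical correlation coefficients between the target station and five others.
coefficients = {"02": 0.91, "05": 0.89, "07": 0.85, "09": 0.93, "11": 0.80}

strict = select_stations(coefficients, 0.92)    # few stations, little input information
balanced = select_stations(coefficients, 0.88)
loose = select_stations(coefficients, 0.82)     # many stations, more redundancy
```

Lowering the threshold can only add stations, so the threshold acts as a single knob that trades input information against redundancy.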

4.4. Effects of Spatial Correlation and Auxiliary Factors

To evaluate the influence of spatiotemporal analysis and auxiliary factors on the prediction accuracy, two experiments are designed to verify the effect of the DGRA-STCN model. The experimental results are shown in Table 4. The first experiment uses only the air pollutant data and auxiliary factors from the target prediction station and does not add data from other stations. Its results (RMSE 16.568, MAE 11.946, $R^2$ 0.722) show that considering the spatial characteristics of multiple stations is very important. PM2.5 concentrations at one station vary with historical air pollutant levels and with the diffusion of PM2.5 from other stations. As a result, considering only the data from the target prediction station is not sufficient for modeling. The second experiment considers only spatial correlation; that is, the branch of the DGRA-STCN model in which the auxiliary factor sequence passes through the TCN layer is removed, and only the stations strongly correlated with the target prediction station serve as input, which is passed through the STCN layer. Its results (RMSE 13.955, MAE 9.598, $R^2$ 0.848) show that adding auxiliary factors, for example humidity, is necessary to improve model accuracy. These auxiliary factors, which affect the diffusion and propagation of air pollutants, must be considered in modeling. To sum up, the DGRA-STCN model proposed in this study comprehensively considers the data of neighboring stations and the auxiliary factors of the target station and achieves the best prediction performance.

4.5. Performance Comparison of Different Models

To verify the prediction performance of the DGRA-STCN model proposed in this article, it is compared with support vector regression (SVR), multilayer perceptron (MLP), and five deep learning models (TCN, LSTM, GRU, CNN-LSTM, and CNN-GRU). SVR is a machine learning algorithm derived from the support vector machine and used for regression. MLP, also known as ANN, contains an input layer, an output layer, and several hidden layers to perform classification and regression tasks. LSTM and GRU are variants of RNN, which is a common algorithm for dealing with time-series problems. For SVR, MLP, LSTM, GRU, and TCN, the air pollutants and auxiliary factors at the target prediction station are used as inputs. For the CNN-LSTM, CNN-GRU, and DGRA-STCN models, data from adjacent stations and auxiliary factors are used as inputs. In CNN-LSTM and CNN-GRU, CNN is used to extract spatial features of the data, while LSTM or GRU is used to extract temporal features. Figure 9 shows the results of the eight models for the next 1-hour prediction. Figure 10 shows the comparison of predicted values and observed values for 200 hours in the test set.

Compared with the six deep learning models, SVR (RMSE 21.813, MAE 14.334, $R^2$ 0.624) and MLP (RMSE 20.342, MAE 13.545, $R^2$ 0.683) perform worst. This is because these simple machine learning models cannot extract features from air pollutant data well, which leads to a poor fit. Among TCN, LSTM, and GRU, the performance of TCN (RMSE 18.700, MAE 12.988, $R^2$ 0.704) is poor compared with the other two models. Figure 10(a) shows the prediction results of the TCN model (the orange line represents observed values and the blue line represents predicted values). The TCN model predicts the general trend of PM2.5 concentrations, but there is a large deviation between the predicted and observed values at many times. This is because, when little input information is available, TCN misses some important information despite its fast processing speed, which results in poor predictions. The prediction performance of LSTM (RMSE 17.893, MAE 11.577, $R^2$ 0.710) and GRU (RMSE 18.601, MAE 11.714, $R^2$ 0.681) is almost the same. The two models are similar in structure, with GRU being simpler and LSTM extracting features more fully. On the whole, both predict better than TCN because they retain information better when the input information is limited. However, their prediction accuracy is still not high, and their stability is poor.

CNN-LSTM (RMSE 17.306, MAE 11.272, 0.717) and CNN-GRU (RMSE 17.230, MAE 9.948, 0.748), which add data from surrounding stations, fit the observed values better on the whole. This indicates that it is important to take the spatiotemporal characteristics of surrounding stations and the influence of auxiliary factors into account when modeling. Figure 10(b) shows the performance of the CNN-LSTM model, in which the orange line represents observed values and the blue line represents predicted values. However, the predictions of both models are poor at some moments of high concentration, which are marked with red circles, and this lowers their evaluation scores.

Figure 10(c), in which the orange line represents observed values and the green line represents predicted values, shows the prediction results of the DGRA-STCN model (RMSE 12.505, MAE 8.214, 0.884) proposed in this study. The DGRA-STCN model achieves the best predictive accuracy of all the compared models. The predicted trend is basically the same as the observed one, and at many moments, such as those marked with black circles, the predicted values almost coincide with the observed values. In addition, at some moments of high concentration, such as those marked with red circles, the predicted values are closer to the observed values than those of the other models. In conclusion, the DGRA-STCN model demonstrates high predictive accuracy and stability, both for the overall trend and when concentrations fluctuate greatly.
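The error measures quoted for each model can be computed as in the following minimal sketch. It is an illustration, not the authors' evaluation code. The function names and the toy arrays are hypothetical, and the third function assumes that the unlabeled third value reported for each model is the coefficient of determination, which this section does not state explicitly.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination (ASSUMED to be the third reported metric)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Toy data for illustration only; these are not values from the Fushun data set.
observed = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 37.0])

print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Lower RMSE and MAE indicate smaller errors. Because RMSE squares the errors before averaging, it penalizes large misses, such as those at high-concentration moments, more heavily than MAE does.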

To verify the temporal stability of the proposed DGRA-STCN model, we also report the results of three models (CNN-LSTM, CNN-GRU, and DGRA-STCN) for 2-hour-ahead and 4-hour-ahead concentration prediction. Table 5 shows that the performance of all models deteriorates as the prediction horizon increases. A possible reason is that the correlation between the predicted value and the input data decreases as the horizon grows. Compared with CNN-LSTM and CNN-GRU, the proposed DGRA-STCN model performs clearly better for 2-hour prediction (RMSE 21.191, MAE 12.305, 0.693) and 4-hour prediction (RMSE 26.385, MAE 16.778, 0.611). This suggests that the DGRA-STCN model extracts spatiotemporal features more effectively when predicting concentrations over longer horizons. Compared with the contrast models, the DGRA-STCN model has greater temporal stability and generalization ability.
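The multi-horizon setting can be illustrated with a minimal sketch of how input windows and targets might be paired for a given horizon. This section does not describe the paper's actual input construction, so the function `make_windows` and its parameters `lookback` and `horizon` are hypothetical, and a single univariate series stands in for the multi-station input.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Pair each window of `lookback` past values with the value `horizon` steps ahead.

    Hypothetical helper for illustration; not the authors' preprocessing.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1  # number of usable samples
    if n <= 0:
        raise ValueError("series too short for this lookback and horizon")
    X = np.stack([series[i:i + lookback] for i in range(n)])
    # The target lies `horizon` steps after the last value in the window.
    y = np.array([series[i + lookback + horizon - 1] for i in range(n)])
    return X, y

# Toy hourly series 0, 1, ..., 9 (illustrative only).
series = np.arange(10.0)
X1, y1 = make_windows(series, lookback=3, horizon=1)  # next-hour targets
X4, y4 = make_windows(series, lookback=3, horizon=4)  # 4-hour-ahead targets

print(X1[0], y1[0])  # window [0, 1, 2] paired with target 3
print(X4[0], y4[0])  # same window paired with target 6
```

With a larger horizon, the same window is paired with a target further in the future, which is consistent with the weaker input-target correlation suggested above as a reason for the drop in accuracy.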

5. Conclusion

In this study, a novel hybrid model named DGRA-STCN is proposed for air quality prediction. First, the DGRA method, which considers both the distance and the concentrations among stations, is proposed to calculate the spatiotemporal correlation coefficients of all monitoring stations. According to the obtained correlation coefficients, stations strongly related to the target station are selected as model inputs. To further improve the accuracy of the model, we add auxiliary factors that are highly correlated with the target pollutant according to the Pearson correlation coefficient. Finally, the input features are extracted through the STCN layer and the TCN layer, respectively, and the final predicted values are then obtained through the fully connected layer. The data set of Fushun City is used to verify the effectiveness of the model, and the experimental results show that the DGRA-STCN model outperforms the compared methods in concentration prediction. The DGRA-STCN model predicts the overall trend better and is more accurate when there are large fluctuations and sudden changes in the data. In addition, the temporal stability of the DGRA-STCN model is clearly better than that of CNN-LSTM and CNN-GRU.

Although high prediction accuracy is achieved in this study, the spatiotemporal feature extraction model can be further improved to meet the demand for even higher accuracy. The monitoring stations are irregularly distributed, so their spatial relationships form a non-Euclidean structure. In future work, we will therefore consider using a graph convolutional network (GCN), which is well suited to non-Euclidean data, to extract spatiotemporal features from different stations and achieve higher-precision air quality prediction.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under grant numbers 62173008, 61873007, and 61603009.