#### Abstract

To enhance the forecasting accuracy of PM_{2.5} concentrations, a novel decomposition-ensemble approach with a denoising strategy is proposed in this study. It is an improved approach under the effective “denoising, decomposition, and ensemble” framework, designed especially for the nonlinear and nonstationary features of PM_{2.5} concentration data. In the proposed approach, wavelet denoising is first applied as a noise elimination tool to remove the noise from the original data. Then, variational mode decomposition (VMD) is implemented to decompose the denoised data into components. Next, the kernel extreme learning machine (KELM), a popular machine learning algorithm, is employed to forecast each extracted component individually. Finally, the forecasted results are aggregated into an ensemble result as the final forecast. Using hourly PM_{2.5} concentration data from Xi’an as sample data, the empirical results demonstrate that the proposed hybrid approach performs significantly better, in terms of accuracy, than all benchmarks, including single forecasting techniques and similar approaches with other decomposition methods. The robustness results also indicate that the proposed hybrid approach can be recommended as a promising forecasting tool for capturing and exploring complicated time series data.

#### 1. Introduction

PM_{2.5} refers to particulate matter in the atmosphere with a diameter of less than or equal to 2.5 microns, also known as particulate matter that can enter the lungs [1–3]. Although PM_{2.5} is only a small fraction of the Earth’s atmospheric composition, it has an important impact on air quality [4]. It exerts a negative influence on social life, for instance by increasing the risk of disease and impeding economic development [5–10]. Moreover, air pollution worsens with the development of industry and the increasing number of fuel-powered cars (Maji et al. 2018). Accurate forecasting of PM_{2.5} concentration therefore has very important guiding significance: it enables people to make correct decisions, reduces economic losses, and benefits people’s life and health. Hence, it is necessary to predict the PM_{2.5} concentration efficiently and accurately.

At present, many scholars have conducted extensive research in this field. These studies can be roughly divided into four categories: time series models [11–14], econometric models [15–17], artificial intelligence (AI) models [18–20], and hybrid models [21, 22]. In particular, time series approaches, as traditional forecasting approaches, are often used to predict PM_{2.5} concentration. For example, the moving average (MA), autoregressive (AR), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models are often used for predicting PM_{2.5} and PM_{10} concentrations [14, 23]. Econometric models are also often used for PM_{2.5} forecasting because of their better interpretability [16].

However, owing to the impact of multiple related factors, PM_{2.5} concentration data exhibit nonstationary and nonlinear features. Thus, artificial intelligence methods, which can capture and approximate such complicated data features, have been extensively utilized to forecast PM_{2.5} [18]. Meanwhile, a variety of machine learning techniques have been used for training and modelling the data patterns in this field [24]. Furthermore, deep learning can discover the mapping relations underlying PM_{2.5} concentration through nonlinear learning, which may yield good predictive performance.

In addition, considering the limitations of a single model, some scholars have proposed hybrid models that combine the advantages of different models to obtain more stable and accurate results [20, 25, 26]. Such hybrid approaches involve different ways of combining forecasting models. In general, they can be divided into approaches that first extract different components through decomposition and then combine them with a forecasting model [21, 27–29], and approaches that add parameter optimization algorithms [30, 31].

As mentioned above, although many scholars have adopted different forecasting approaches or proposed different hybrid models in past research, most existing combined models do not comprehensively consider the problems of data noise processing, data feature capture, and forecasting technique selection. Regarding data noise processing, many previous studies used different denoising methods for the original data, such as singular spectrum analysis [32], the Fourier transform, and the wavelet transform [33–38]. Among them, the wavelet analysis method can process the original data more conveniently and efficiently; its advantage is that the noise is almost completely suppressed while the characteristic peaks reflecting the original signal are well preserved ([39, 40]; Dimitriou and Kassomenos 2014). Regarding data feature capture, the common approaches include the EMD family and the VMD approach. VMD has been widely applied to decompose time series in different fields, such as wind speed prediction and power load prediction [41–44], and has achieved relatively good results. Moreover, compared with EMD, VMD can avoid the endpoint divergence problem of EMD and better extract the components of PM_{2.5} concentration data. Regarding forecasting techniques, machine learning approaches are popularly incorporated into forecasting in complex systems, especially in the field of PM_{2.5} [30]. In particular, KELM has the advantages of few training parameters, fast learning speed, and strong generalization ability, and it can obtain especially good forecasting results on nonlinear data sets [45].

Considering the noise introduced by the way PM_{2.5} concentration data are collected and the denoising characteristics of the wavelet transform, we first choose wavelet denoising; second, we choose VMD decomposition because of the complexity of the time series data; finally, considering the validity of KELM’s data fitting, KELM is selected as the prediction model. This study therefore proposes a novel hybrid forecasting method, WAV-VMD-KELM, which improves forecasting accuracy by systematically considering noise processing, data feature capture, and forecasting techniques. First, the wavelet transform is used to denoise the data; then, variational mode decomposition (VMD) is adopted to decompose the denoised data; next, the kernel extreme learning machine (KELM) is employed to predict the decomposed components; and finally, the component forecasts are aggregated. In other words, a new hybrid denoising-decomposition-ensemble algorithm is proposed based on comprehensive consideration of the cause of the noise and the nonlinearity and nonstationarity of PM_{2.5} concentration. The two main contributions of this study are as follows:

(1) Focusing on the problems of noise processing, data feature capture, and forecasting technique selection for PM_{2.5} concentration, a novel hybrid forecasting approach, WAV-VMD-KELM, is proposed to improve forecasting accuracy through denoising, decomposition, individual forecasting, and ensemble of results.

(2) The novel decomposition-ensemble approach with a denoising strategy is applied to forecast hourly PM_{2.5} concentration data in Xi’an. The experimental results show that the new hybrid approach achieves better forecasting performance than the benchmarks and can significantly improve the forecasting level of PM_{2.5}.

The rest of this paper is organized as follows. Section 2 introduces the methodology used in this research. Section 3 presents the experiments, analyses, and discussion of the forecasting results. Finally, Section 4 presents the conclusions of this study.

#### 2. Methodology

Section 2.1 gives an overview of the proposed decomposition-ensemble approach with denoising strategy, and Sections 2.2–2.4, respectively, describe the related techniques: wavelet denoising, EMD and VMD, and KELM.

##### 2.1. Framework

As shown in Figure 1, the entire process includes wavelet denoising, multiscale analysis, PM_{2.5} concentration forecasting, and evaluation of approaches, which are summarized as follows:

(1) *Wavelet Denoising.* For the original PM_{2.5} concentration time series, wavelet denoising is adopted as an effective approach to process the nonstationary time series. Section 2.2 gives the detailed process of wavelet denoising.

(2) *Multiscale Analysis.* VMD is applied to decompose the denoised sequence into several modes, and these modes reveal, through their low and high frequencies, the different characteristics hidden in the PM_{2.5} concentration time series.

(3) *PM_{2.5} Concentration Forecasting.* For the different components, the artificial intelligence tool KELM is selected as the forecasting technique for modelling the corresponding components. Then, the forecasting results of the different components are aggregated into the final forecasting result.
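The three steps above can be sketched as a single pipeline. The following Python sketch is purely illustrative (the paper’s experiments were run in MATLAB): `wavelet_denoise`, `decompose`, and `forecast_component` are hypothetical stand-ins for the actual wavelet denoising, VMD, and KELM steps, kept deliberately trivial so that only the data flow of the denoising-decomposition-ensemble framework is visible.

```python
import numpy as np

def wavelet_denoise(x):
    """Stand-in for wavelet denoising (paper: db6 wavelet, 3 levels, hard threshold)."""
    kernel = np.ones(3) / 3.0
    return np.convolve(x, kernel, mode="same")

def decompose(x, k):
    """Stand-in for VMD: split x into k components that sum back to x."""
    return [x / k for _ in range(k)]

def forecast_component(c):
    """Stand-in forecaster (paper: KELM): naive last-value forecast."""
    return c[-1]

def wav_vmd_kelm(x, k=3):
    """Denoise, decompose, forecast each component, and aggregate."""
    denoised = wavelet_denoise(np.asarray(x, dtype=float))
    modes = decompose(denoised, k)
    return sum(forecast_component(m) for m in modes)
```

Because the toy components sum exactly back to the denoised series, the ensemble forecast here equals the last denoised value; in the real pipeline each component is forecast by its own trained KELM before aggregation.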

##### 2.2. Wavelet Denoising

The wavelet denoising method was first proposed in [34]. It is a nonlinear denoising method that is approximately optimal in the sense of minimum mean square error. Figure 2 shows the wavelet denoising procedure, and each step is described in detail as follows.

Wavelet denoising is based on wavelet decomposition, so some concepts of wavelet decomposition are introduced first. The wavelet transform takes a basic wavelet *ψ*(*t*), shifts it by a displacement *τ*, scales it by a scale factor *a*, and forms the inner product with the signal to be analyzed *X*(*t*), that is,

$$WT_X(a,\tau)=\frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty}X(t)\,\psi^{*}\!\left(\frac{t-\tau}{a}\right)\mathrm{d}t,\quad a>0.$$

In this formula, *a* is the scale factor, whose effect is to stretch or compress the basic wavelet *ψ*(*t*); *τ* is the displacement, which can be positive or negative; both *a* and *τ* are continuous variables.

As mentioned above, the noisy time series can be expressed as *y*(*t*) = *f*(*t*) + *e*(*t*), where *f*(*t*) is the real signal and *e*(*t*) is white noise. Taking the wavelet transform of both sides of this equation gives

$$WT_y(a,\tau)=WT_f(a,\tau)+WT_e(a,\tau).$$

According to the properties of the wavelet transform, the wavelet transform of the actual measured signal equals the sum of the wavelet transforms of its component signals. After an orthogonal wavelet transform, the correlation within the signal *y*(*t*) can be removed to the greatest extent, and most of the signal energy is concentrated on a small number of wavelet coefficients with relatively large amplitudes. By contrast, after the wavelet transform, the noise is distributed across all time positions at every scale, and its amplitudes are not very large. Based on this principle, the wavelet coefficients of the noise are reduced to the greatest extent at each scale of the wavelet transform, and the signal is then reconstructed from the processed wavelet coefficients, so as to achieve the purpose of suppressing the noise [37].

So, the threshold denoising of a one-dimensional signal can be divided into three steps: selection of an appropriate wavelet transform, threshold processing of the wavelet coefficients, and wavelet reconstruction.

(1) *Selection of the appropriate wavelet transform (wavelet decomposition of the signal).* Select a wavelet, determine the decomposition level *N*, and perform an *N*-layer wavelet decomposition of the signal. In general, the wavelet basis should be chosen by weighing support length, vanishing moments, symmetry, regularity, and similarity, because each wavelet basis has its own characteristics in signal processing and no single basis achieves the optimal denoising effect for all kinds of signals. The Daubechies (dbN) and Symlets (symN) wavelets are two families of wavelet bases frequently used in denoising. The number of decomposition layers is also a very important choice: the larger it is, the more clearly the characteristics of noise and signal differ, which favors their separation; on the other hand, more decomposition layers increase the distortion of the reconstructed signal, which degrades the final denoising effect to some extent. A suitable decomposition scale is therefore selected after comprehensive consideration.

(2) *Threshold processing of the wavelet coefficients.* The choice of threshold directly affects the denoising effect, and different thresholds yield different results. A threshold function is a rule for correcting the wavelet coefficients, and different functions reflect different strategies for handling them. The two most common threshold functions are the hard and soft ones, with the Garrote function lying between them. The hard threshold function is superior to the soft threshold method in the sense of mean square error. In this study, the hard threshold is selected to denoise the data: when the absolute value of a wavelet coefficient is less than the given threshold, it is set to 0; when it is greater than the threshold, it remains unchanged, i.e.,

$$\hat{w}=\begin{cases}w, & |w|\ge\lambda,\\ 0, & |w|<\lambda,\end{cases}$$

with the threshold value

$$\lambda=s\sqrt{2\ln N},$$

where *s* is the standard deviation of the noise and *N* is the length of the data.

(3) *Wavelet reconstruction.* The processed wavelet coefficients are reconstructed: based on the low-frequency coefficients of the *N*th layer of the wavelet decomposition and the thresholded high-frequency coefficients of the first to the *N*th layers, the estimate of the original signal is obtained by wavelet reconstruction.
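To make the three steps concrete, here is a minimal, self-contained Python/NumPy sketch (the paper itself works in MATLAB with a db6 wavelet; an orthonormal Haar wavelet is substituted here purely to keep the code short). It performs a 3-level decomposition, hard-thresholds the detail coefficients with the universal threshold λ = s√(2 ln N), estimating s from the finest details, and reconstructs the signal:

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform: approximation and detail."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def haar_idwt(a, d):
    """Inverse of one Haar level."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def hard_threshold(w, lam):
    """Keep coefficients whose magnitude exceeds lam; zero the rest."""
    return np.where(np.abs(w) > lam, w, 0.0)

def denoise(x, levels=3):
    """Multi-level Haar decomposition + hard thresholding + reconstruction."""
    x = np.asarray(x, dtype=float)
    n = len(x)                      # length must be divisible by 2**levels
    a, details = x, []
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)           # finest level first
    # universal threshold: noise std estimated from the finest detail level
    s = np.median(np.abs(details[0])) / 0.6745
    lam = s * np.sqrt(2.0 * np.log(n))
    details = [hard_threshold(d, lam) for d in details]
    for d in reversed(details):     # rebuild from the coarsest level up
        a = haar_idwt(a, d)
    return a
```

In the paper the same scheme is applied with the db6 basis; only the filter pair changes, not the threshold logic.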

##### 2.3. Variational Mode Decomposition

EMD is a commonly used decomposition method, and in this paper it serves as a comparison for VMD. EMD was proposed in [46]. The main purpose of the algorithm is to decompose signals into characteristic modes. Its advantage is that it does not use any predefined function as a basis but generates the intrinsic mode functions adaptively from the analyzed signal. It can be used to analyze nonlinear and nonstationary signal sequences with high signal-to-noise ratio and good time-frequency localization.

VMD is a novel nonrecursive and adaptive signal decomposition method that can accommodate much more sampling and noise than popular decomposition methods such as empirical mode decomposition. The main goal of VMD is to decompose a time series into a discrete set of band-limited modes *u*_{k}, where each mode *u*_{k} is considered to be compact around a center pulsation *ω*_{k} that is determined during the decomposition.

For example, the time series *f*(*t*) is decomposed into a set of modes *u*_{k} around center pulsations *ω*_{k} by solving the following constrained variational problem [39, 40]:

$$\min_{\{u_k\},\{\omega_k\}}\left\{\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)\ast u_k(t)\right]e^{-j\omega_k t}\right\|_2^2\right\}$$

subject to

$$\sum_{k=1}^{K}u_k(t)=f(t),$$

where *K* is the number of modes, *δ* and ∗ denote the Dirac distribution and the convolution operator, and {*u*_{k}} and {*ω*_{k}} represent the set of modes and the set of center pulsations, respectively.

The above constrained variational problem can be converted into an unconstrained variational problem by introducing Lagrange multipliers *λ*, which gives the augmented Lagrangian

$$L(\{u_k\},\{\omega_k\},\lambda)=\alpha\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)\ast u_k(t)\right]e^{-j\omega_k t}\right\|_2^2+\left\|f(t)-\sum_{k=1}^{K}u_k(t)\right\|_2^2+\left\langle\lambda(t),\,f(t)-\sum_{k=1}^{K}u_k(t)\right\rangle,$$

where *α* is a balance parameter, *λ* is the Lagrange multiplier, and the quadratic penalty term $\|f-\sum_k u_k\|_2^2$ accelerates the rate of convergence.

Furthermore, the equation can be solved by the alternating direction method of multipliers (ADMM), which finds the saddle point of the augmented Lagrangian *L* through a sequence of iterative suboptimizations. The solutions for *u*_{k}, *ω*_{k}, and *λ* are therefore updated as follows:

$$\hat{u}_k^{\,n+1}(\omega)=\frac{\hat{f}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\dfrac{\hat{\lambda}(\omega)}{2}}{1+2\alpha\left(\omega-\omega_k\right)^2},$$

$$\omega_k^{\,n+1}=\frac{\int_0^{\infty}\omega\left|\hat{u}_k^{\,n+1}(\omega)\right|^2\mathrm{d}\omega}{\int_0^{\infty}\left|\hat{u}_k^{\,n+1}(\omega)\right|^2\mathrm{d}\omega},$$

$$\hat{\lambda}^{\,n+1}(\omega)=\hat{\lambda}^{\,n}(\omega)+\tau\left(\hat{f}(\omega)-\sum_{k=1}^{K}\hat{u}_k^{\,n+1}(\omega)\right),$$

where $\hat{f}(\omega)$, $\hat{u}_i(\omega)$, $\hat{u}_k^{\,n+1}(\omega)$, and $\hat{\lambda}(\omega)$ denote the Fourier transforms of *f*(*t*), *u*_{i}(*t*), *u*_{k}^{n+1}(*t*), and *λ*(*t*), and *n* is the number of iterations.

The number of modes *K* needs to be determined before applying the VMD method, and there is no theory on the optimal selection of this parameter.
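For illustration, the ADMM updates above can be condensed into a minimal Python/NumPy sketch (the paper used MATLAB; this toy version omits the signal mirroring of the original VMD implementation, works only on the positive half of the spectrum, and fixes K, α, and τ by hand):

```python
import numpy as np

def vmd(f, K=2, alpha=2000.0, tau=0.1, n_iter=500):
    """Minimal VMD: ADMM updates in the Fourier domain (no mirroring)."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    freqs = np.fft.fftfreq(N)            # normalized frequencies
    half = freqs >= 0                    # analytic-signal convention
    f_hat = np.fft.fft(f)
    u_hat = np.zeros((K, N), dtype=complex)
    lam_hat = np.zeros(N, dtype=complex)
    omega = np.linspace(0.1, 0.4, K)     # initial center pulsations
    for _ in range(n_iter):
        for k in range(K):
            # Wiener-filter update of mode k around its center pulsation
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = np.where(
                half,
                (residual + lam_hat / 2.0)
                / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2),
                0.0,
            )
            # center pulsation: power-weighted mean of the mode's spectrum
            power = np.abs(u_hat[k][half]) ** 2
            omega[k] = np.sum(freqs[half] * power) / (np.sum(power) + 1e-12)
        # dual ascent enforcing sum_k u_k = f on the positive half
        lam_hat[half] += tau * (f_hat[half] - u_hat.sum(axis=0)[half])
    u = 2.0 * np.real(np.fft.ifft(u_hat, axis=1))  # modes in the time domain
    return u, np.sort(omega)
```

On a two-tone test signal the two recovered center pulsations land on the two tone frequencies, which is exactly the band-separation behavior VMD is used for here.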

##### 2.4. Kernel Extreme Learning Machine

For a single-hidden-layer neural network, assume *N* arbitrary samples $(X_i, t_i)$, where $X_i=[x_{i1},x_{i2},\ldots,x_{in}]^{T}\in\mathbb{R}^{n}$ and $t_i=[t_{i1},t_{i2},\ldots,t_{im}]^{T}\in\mathbb{R}^{m}$. A single-hidden-layer neural network with *L* hidden nodes can then be expressed as

$$\sum_{i=1}^{L}\beta_i\,g\!\left(W_i\cdot X_j+b_i\right)=o_j,\quad j=1,\ldots,N,$$

where $g(x)$ is the activation function, $W_i=[w_{i1},w_{i2},\ldots,w_{in}]^{T}$ is the input weight vector, $\beta_i$ is the output weight, $b_i$ is the bias of the *i*th hidden node, and $W_i\cdot X_j$ denotes the inner product of $W_i$ and $X_j$. The objective of single-hidden-layer network learning is to minimize the output error, which can be expressed as

$$\sum_{j=1}^{N}\left\|o_j-t_j\right\|=0.$$

That is to say, there exist *β*_{i}, *W*_{i}, and *b*_{i} such that

$$\sum_{i=1}^{L}\beta_i\,g\!\left(W_i\cdot X_j+b_i\right)=t_j,\quad j=1,\ldots,N.$$

This can be represented in matrix form as

$$H\beta=T,$$

where *H* is the hidden-layer output matrix, *β* is the output weight matrix, and *T* is the expected output.

In order to train the single-hidden-layer neural network, we seek $\hat{W}_i$, $\hat{b}_i$, and $\hat{\beta}$ such that

$$\left\|H(\hat{W}_i,\hat{b}_i)\,\hat{\beta}-T\right\|=\min_{W_i,\,b_i,\,\beta}\left\|H(W_i,b_i)\,\beta-T\right\|,$$

which is equivalent to minimizing the loss function

$$E=\sum_{j=1}^{N}\left(\sum_{i=1}^{L}\beta_i\,g\!\left(W_i\cdot X_j+b_i\right)-t_j\right)^{2}.$$

Traditional gradient-based algorithms can be used to solve such problems, but a basic gradient-based learning algorithm needs to adjust all the parameters during the iterations [45]. In the ELM algorithm, once the input weights *W*_{i} and the hidden-layer biases *b*_{i} are randomly determined, the output matrix *H* of the hidden layer is uniquely determined, and training the single-hidden-layer neural network is transformed into solving the linear system $H\beta=T$. The output weights *β* can then be determined as

$$\hat{\beta}=H^{+}T,$$

where *H*^{+} is the Moore–Penrose generalized inverse of *H*. It can be proved that the norm of this solution is minimal and the solution is unique.
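As a concrete illustration of this training rule (an illustrative Python/NumPy sketch, not the paper’s MATLAB code), the input weights and biases are drawn at random and β is obtained in one step from the Moore–Penrose pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, L=50):
    """Basic ELM: random hidden layer, output weights via pseudo-inverse."""
    n_features = X.shape[1]
    W = rng.normal(size=(n_features, L))   # random input weights (fixed, not trained)
    b = rng.normal(size=L)                 # random hidden biases
    H = np.tanh(X @ W + b)                 # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T           # beta = H^+ T (minimum-norm least squares)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because only β is solved for, training reduces to a single linear-algebra step, which is the source of ELM’s fast learning speed mentioned above.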

Kernel functions have strong nonlinear mapping ability, which can overcome the curse of dimensionality. For linearly inseparable problems, a kernel function can map the data into a high-dimensional space where they become linearly separable [45]. In KELM, the feature map of the hidden layer *h*(*x*) remains unknown and is replaced by its corresponding kernel function *K*(*u*, *v*); the number of hidden-layer nodes *L* also does not need to be set. The detailed structure of KELM is shown in Figure 3.

#### 3. Experimental Results and Analysis

The PM_{2.5} concentration data used in this research are hourly data collected from Xi’an, Shaanxi province; the data were obtained from http://www.cnemc.cn/. These data serve as our experimental data, and two popular evaluation criteria are used to verify the forecasting results of the hybrid approach. In this section, we verify the effectiveness of our hybrid approach through the experiment of PM_{2.5} forecasting in Xi’an. The experiment was performed in the MATLAB R2018a environment running on Windows 10 with a 64-bit 2.00 GHz AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx.

##### 3.1. Data Description

Xi’an, located in the west of China, is a famous tourist city with a long history of civilization and an important city for the One Belt and One Road initiative. However, the large amount of automobile exhaust and industrial exhaust in Xi’an has caused serious air pollution: haze persists in Xi’an, and the air quality problem has become increasingly serious. Air quality not only affects tourists’ travel experience but also has a potential impact on people’s health. Hourly PM_{2.5} data were collected from January 1, 2019, to December 31, 2019, as illustrated in Figure 4. Before prediction, some basic descriptive statistical analyses were used for a preliminary exploration of the data. Table 1 lists the mean, standard deviation (std.), and min-max of the PM_{2.5} concentration data. As depicted in Table 1 and Figure 4, the PM_{2.5} concentration time series is complex, with nonlinear and nonstationary characteristics, which is reflected in the maximum and minimum values and the continuity of the data. More importantly, we found some values below 3 that differed from the adjacent observations by a factor of ten; such values were replaced by the mean of the data immediately before and after them. The data from 0:00 on January 1, 2019, to 24:00 on December 24, 2019, were selected as the in-sample training set, and the remaining data from 0:00 on December 25, 2019, to December 31, 2019, were taken as the out-of-sample forecast set.

In this study, a common time series forecasting scheme is used: the historical observations {*x*_{t−1}, *x*_{t−2}, …, *x*_{t−p}} serve as inputs to calculate the predicted value *x*_{t+h−1}, where *h* and *p* indicate the forecasting horizon and the lag order. Figure 5 details the criteria for dividing the data under this scheme. In this study, the lag order is *p* = 12 and the forecasting horizon is *h* = 1.
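This sliding-window construction can be written, for example, as follows (illustrative Python; the function name is ours, not from the paper):

```python
import numpy as np

def make_supervised(series, p=12, h=1):
    """Build (X, y): each X row holds p lagged values, y the value h steps ahead."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(p, len(series) - h + 1):
        X.append(series[t - p:t])      # inputs x_{t-p}, ..., x_{t-1}
        y.append(series[t + h - 1])    # target x_{t+h-1}
    return np.array(X), np.array(y)
```

With *p* = 12 and *h* = 1 as in the paper, each sample uses the previous 12 hourly observations to predict the next hour.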

##### 3.2. Evaluation Criteria

Next, we use two commonly used evaluation criteria, the mean absolute error (MAE) and the root mean square error (RMSE), to evaluate the forecasting performance of each approach. MAE reflects the overall level of the forecasting errors, while RMSE measures the deviation between the actual values and the predicted results:

$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\hat{y}_i\right|,\qquad \mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2},$$

where *N* is the number of testing samples, $y_i$ is the *i*th observed value, and $\hat{y}_i$ is the *i*th forecasted value.
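These two criteria translate directly into code; a quick Python check of the definitions:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the forecasting errors."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large deviations more heavily."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```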

##### 3.3. Benchmarking Models

In the field of air pollution forecasting, SVR, BP, ELM, and KELM are popular artificial intelligence forecasting techniques. In addition, such single forecasting techniques are often combined with denoising and decomposition-ensemble methods to improve forecasting performance. For the proposed WAV-VMD-KELM approach, four single benchmarks, i.e., SVR, BP, ELM, and KELM, and four denoising-forecasting benchmarks, i.e., WAV-SVR, WAV-BP, WAV-ELM, and WAV-KELM, are formed for comparing forecasting accuracy from the perspectives of the denoising method and the forecasting technique. In these hybrid benchmarks, the first step corresponds to the denoising process and the second step to the individual forecasting process. Finally, seven denoising-decomposition-ensemble benchmarks are constructed: WAV-EMD-SVR, WAV-EMD-BP, WAV-EMD-ELM, WAV-EMD-KELM, WAV-VMD-SVR, WAV-VMD-BP, and WAV-VMD-ELM.

##### 3.4. Empirical Results

A comprehensive comparison of the proposed decomposition-ensemble approach with denoising strategy against other popular forecasting approaches and similar denoising-ensemble approaches is conducted. For a clear discussion, the comparison results are analyzed from the following four perspectives. First, the four single benchmarking models are compared with each other. Second, the four denoising single benchmarking models are compared with those of the first experiment to verify the effectiveness of wavelet denoising. Third, the proposed WAV-VMD-KELM is compared with other similar denoising-decomposition-ensemble counterparts to verify its superiority. Finally, the major findings of the empirical study are summarized.

###### 3.4.1. Performance Comparison of Single Methods

Four single benchmarking models, i.e., SVR, BP, ELM, and KELM, are applied to forecast the PM_{2.5} concentrations. The forecasting performances of the four single models are evaluated via MAE and RMSE, and the results are reported in Table 2. Compared with the other three artificial intelligence (AI) models, KELM achieves the best accuracy under both evaluation criteria.

At the MAE and RMSE evaluation level, one interesting conclusion can be found: KELM is clearly the best of all the AI models, and the possible reason is that the other single models are less suited to the complex time series of PM_{2.5} concentration, whereas the kernel mapping of KELM can better fit it.

###### 3.4.2. Performance Comparison of Denoising Forecasting Model

In the actual process of signal acquisition, the collected signal is inevitably disturbed by noise from the environment and other factors, mainly stemming from the data sources. Therefore, wavelet denoising is first carried out on the data, and the single approaches mentioned above are then used to predict the denoised data and are compared with each other.

(1) *Denoising results.* In the first step of the proposed work, wavelet denoising is employed to remove the white noise generated by the collection process from the PM_{2.5} data. Figure 6 shows the contrast between the denoising results and the original data. It can clearly be seen that, after wavelet denoising, the data become flat where the changes were sudden. This smoothing helps restore the true state of the data and also helps the models’ forecasting. The parameters of the wavelet denoising are as follows: wavelet basis function db6; decomposition level 3; a hard threshold of 4.88 applied to the three layers of coefficients. The RMSE between the denoised data and the original data is 5.88.

(2) *Forecasting results.* In the second step, the single models, i.e., SVR, BP, ELM, and KELM, are employed as individual forecasting tools. To keep the comparison consistent, the parameters are the same as before. Table 3 shows the comparison results of the four denoising-forecasting models in terms of MAE and RMSE. From these results, two important findings can be obtained. First, wavelet denoising yields an obvious improvement for all single models, which proves that wavelet denoising is effective for this kind of sensor data. The main reason is that PM_{2.5} data are mainly collected through sensors, which may introduce noise due to the equipment; the wavelet denoising designed in this study can effectively remove such noise, and the results are thereby greatly improved. Second, in terms of both evaluation criteria, KELM’s forecasting is the best, possibly because the KELM approach can better adapt to such nonstationary time series data.

###### 3.4.3. Performance Comparison of Denoising-Decomposition-Ensemble Models

The seven different denoising-decomposition-ensemble approaches use the denoising-forecasting technology as individual denoising and forecasting tools, and these approaches are then performed and compared with each other. In particular, EMD and VMD are selected as the two decomposition methods in order to compare the effectiveness of decomposition.

(1) *Decomposition results.* In the first step of this stage, EMD and VMD are used to decompose the denoised PM_{2.5} concentration data. Figures 7 and 8 partially show the decomposition results of EMD and VMD. In Figure 7, the IMFs are listed from the highest to the lowest frequency, and the last component is the residue. It is obvious from Figure 7 that the complex PM_{2.5} concentration series (also see Figure 5) can be divided via EMD into a number of simple components, which further helps to make modelling easier. In addition, IMFs 1–3 follow a random walk between 0 and 100, while IMFs 4–6 reveal regular periodic features with different cycles; IMF 7 and the residue show smooth central tendencies. Based on these components, simple models can characterize the features of the data, which helps enhance the forecasting accuracy. In Figure 8, the components are likewise shown from the highest to the lowest frequency, and it can clearly be seen that the complex PM_{2.5} concentration data can be divided via VMD into simple components, on which simple models can obtain better results.

(2) *Forecasting results.* In this step, the denoising-decomposition-ensemble models WAV-EMD-BP, WAV-EMD-SVR, WAV-EMD-ELM, WAV-EMD-KELM, WAV-VMD-BP, WAV-VMD-SVR, and WAV-VMD-ELM are used as forecasting tools to model the previously extracted components. The parameters here are the same as before. Table 4 displays the comparison results of the seven hybrid models in terms of MAE and RMSE. An important conclusion can clearly be drawn from these results: the decomposition technique, especially VMD, is verified for PM_{2.5} concentration data forecasting in terms of accuracy. In terms of MAE and RMSE, three important findings can be obtained.

First, the decomposition technique improves the forecasting accuracy, which proves that decomposition-ensemble technology is effective for such time series forecasting; the main reason is that the sequence is divided into different components, each with a certain regularity, so the same forecasting model achieves better results than it does on the undecomposed data. Second, the effect of VMD is better than that of EMD, mainly because VMD avoids the endpoint divergence problem of EMD and can better identify the characteristics of the PM_{2.5} concentration data. Third, the novel hybrid approach proposed here achieves the best results in this experiment.

###### 3.4.4. Discussion

The effectiveness of the proposed model is discussed in this section.

The experimental results show that wavelet denoising is effective for processing the data collected by sensors, that forecasting after EMD decomposition is better than forecasting without decomposition, and that VMD decomposition is better than EMD.

Besides, it is apparent that, compared with the WAV-EMD counterparts, the WAV-VMD-KELM approach has lower forecasting errors in terms of MAE and RMSE, which demonstrates the superiority of VMD over EMD. Figures 9 and 10 make clear that the developed hybrid approach has the best forecasting results, which explains the effectiveness of the combination strategy of the proposed approach and its advantage in forecasting performance. To conveniently show the forecasting availability of the WAV-VMD-KELM approach, Figure 11 shows its forecasting curve for the PM_{2.5} concentration. As shown in Figure 11 and Table 4, the developed hybrid forecasting approach obtains the best forecasting results among the approaches considered in this research.

In summary, the WAV-VMD-KELM approach is clearly better than the comparison approaches, with smaller errors than all the other approaches considered in this research.

#### 4. Conclusions

In the past few years, many alternative approaches have become available for PM_{2.5} concentration forecasting, and these works have contributed to improving forecasting performance to a certain degree. Yet there are still some deficiencies in previous articles; for instance, the importance of the noise produced by the equipment is often neglected. Thus, to solve the problems mentioned above, a new hybrid forecasting method incorporating wavelet denoising is developed. Based on the experimental results and analysis, we reach the following conclusions:

(1) In the experiments, WAV-VMD-KELM achieves the best performance among the comparison approaches.

(2) As a decomposition method, VMD has better predictive ability than EMD, as it is more effective in capturing the various features hidden in the original datasets.

(3) Wavelet denoising can improve the forecasting accuracy, which indicates that it is necessary to remove the noise, particularly when the data are obtained by sensors.

Overall, with more accurate predictions, the proposed hybrid approach presents superior performance beyond the alternative approaches, offering a novel and feasible method in the field of PM_{2.5} concentration forecasting. Furthermore, this new and viable option can also be applied to many other complex areas, such as tourism demand forecasting, wind speed forecasting, economic growth forecasting, product sales forecasting, and traffic flow forecasting. Nevertheless, it is worth noticing that some limitations still exist. First, this research considers only the PM_{2.5} time series and does not include other influencing factors. Another shortcoming is that this paper adopts only single-objective optimization for forecasting the PM_{2.5} time series, without trying multiobjective versions. Hence, future research on PM_{2.5} time series forecasting should highlight the following aspects: the usage of other possible factors (meteorological factors in particular) and the application of multiobjective optimization algorithms. In addition, it would also be practicable to establish deep learning-enabled approaches, which are more effective and have the potential to be another significant research direction for future work [12, 47].

#### Data Availability

PM_{2.5} concentration data are hourly data collected from China National Environmental Monitoring Centre (CNEMC, http://www.cnemc.cn/) in this research.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

#### Acknowledgments

This research work was partly supported by the Fundamental Research Funds for the Central Universities under grant no. xpt012020022, the National Natural Science Foundation of China under grant no. 71904153, and the Project Funded by China Postdoctoral Science Foundation under grant no. 2018M53598.