Analysis and Models in Interdisciplinary Mathematics
Forecasting SO2 Pollution Incidents by means of Elman Artificial Neural Networks and ARIMA Models
An SO2 emission episode at a coal-fired power station occurs when the series of bihourly averages of SO2 concentration, computed from readings taken at 5-minute intervals, exceeds a specified value. Advance prediction of these pollution episodes is very important for companies generating electricity by burning coal, since it allows them to take appropriate preventive measures. In order to forecast SO2 pollution episodes, three different methods were tested: Elman neural networks, autoregressive integrated moving average (ARIMA) models, and a hybrid method combining both. The three methods were applied to a time series of SO2 concentrations registered at a control station in the vicinity of a coal-fired power station. The results showed that the hybrid method performed better than the Elman networks and the ARIMA models. The best prediction was obtained 115 minutes in advance by the hybrid model.
1. Introduction
Coal-fired power stations are a major source of atmospheric pollutants, SO2 being one of the most significant. Mixed with rain, SO2 is transformed into sulfuric acid, producing acid rain. The wind can transport this pollutant thousands of kilometers before it settles on the ground, causing various negative effects. Sulfuric acid causes respiratory irritation, sometimes leading to damage to lung tissue.
European legislation on air pollution from coal-fired power stations establishes limits for the emissions of SO2. Specifically, it imposes a limit on the average of 24 consecutive concentrations of SO2 taken at 5-minute intervals, that is, on a two-hour (bihourly) average. An emission episode is said to occur when the series of bihourly averages is greater than a threshold set by current regulations. The interest is in predicting emission episodes at least one hour in advance.
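As a rough illustration of this definition, the following sketch computes the moving average of 24 consecutive 5-minute readings and flags the windows that exceed a limit. The function names, the synthetic series, and the threshold of 200 are assumptions made for the example; the regulatory threshold is not stated in the text.

```python
import numpy as np

WINDOW = 24  # 24 consecutive 5-minute readings span two hours


def bihourly_average(conc):
    """Mean of every run of WINDOW consecutive readings (one value per full window)."""
    conc = np.asarray(conc, dtype=float)
    kernel = np.ones(WINDOW) / WINDOW
    return np.convolve(conc, kernel, mode="valid")


def episode_mask(conc, threshold):
    """True wherever the bihourly average is greater than the threshold."""
    return bihourly_average(conc) > threshold


# Synthetic data: a flat background of 10 with a one-hour peak of 500.
series = np.full(100, 10.0)
series[40:52] = 500.0

avg = bihourly_average(series)
mask = episode_mask(series, threshold=200.0)  # hypothetical limit, for illustration only
print(len(avg), avg.max(), int(mask.sum()))
```

Because the average is taken over two hours, a short peak is diluted by the surrounding background, so an episode depends on both the height and the duration of the peak.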
Forecasting SO2 levels can be addressed through mathematical models such as autoregressive-moving-average (ARMA) models or artificial neural network (ANN) models. An ARMA model was used by Hassanzadeh et al. in 2009 to forecast SO2 levels for five stations. According to their results, an ARMA (2,2) model provides reliable predictions. Kandya and Mohan studied the forecasting of SO2 (and other pollutants) using five statistical techniques. Although each technique has its own advantages and limitations, they found that the one-day prediction autoregressive integrated moving average (ARIMA) technique scores well over the other techniques. Goyal et al. pointed out that linear models such as multiple linear regression (MLR) and ARIMA fail to predict extreme concentrations of pollutants.
Neural networks have also been used to predict concentrations of SO2 and other pollutants emitted by a power plant. Mok and Tam used three-layered feed-forward artificial neural networks to predict the daily SO2 concentration 5 days in advance. Nunnari et al. compared multilayer perceptron (MLP) models with a neurofuzzy approach and an autoregressive-moving-average model with exogenous inputs (ARMAX). The results confirmed the superiority of the MLP model over the others. Pérez et al. compared the forecasts produced by three different methods (MLP, multiple linear regression (MLR), and persistence methods). They concluded that the MLP models achieved more accurate regression results and better predictions. Fernández de Castro et al. found good results for predicting SO2 levels half an hour in advance in the neighborhood of a power plant using neural networks. Cortina et al. (2008) compared an adaptive linear neural network (ADALINE) and a generalized regression neural network (GRNN) for the prediction of pollution levels due to the chemical industry and electricity generation in Salamanca (Spain). Prediction experiments were carried out for 1, 12, and 24 hours in advance. They concluded that a linear regression network needs less adjustment of parameters than a nonlinear regression network, thus facilitating its implementation; however, to obtain better results with a linear regression network, they need to search for a pattern scheme. Abdul-Wahab and Al-Alawi used a neural network model to forecast SO2 concentration levels at a refinery in Oman; they also analyzed the effect of five meteorological parameters that were expected to affect the SO2 concentrations. Zhang et al. compared the performance of several statistical methods in SO2 forecasting. The results showed that all the methods mentioned can be used in time series analysis of air pollutants, although the denoising BP neural network has some advantages, mainly relating to its strong memory and learning ability.
Hybrid ARIMA-ANN models have also been applied in atmospheric pollutant forecasting. Tseng et al. showed that the hybrid performed better than ARIMA or ANN alone. Díaz-Robles et al. developed a hybrid model combining ARIMA and ANN to predict extreme events of particle emission in a city in Chile. They concluded that the hybrid model performs better than either of the models used separately.
In this study we analyzed the usefulness of Elman recurrent neural networks in forecasting SO2 emission episodes at a coal-fired power station, given the capacity of this type of ANN to work with temporal data, and compared them with an ensemble method that improves the prediction of an ARIMA model using Elman recurrent neural networks. The proposed method differs from the ARIMA-ANN combinations described above in two respects: the kind of neural network used (an Elman neural network instead of a back-propagation recurrent neural network) and the way the network is used, which is not to fit the residuals of the ARIMA model but to predict the SO2 concentration in the half hour before the time of interest.
2. Materials and Methods
2.1. The Database
The data for the present research were obtained from a coal-fired power station in northern Spain. They contain the average 5-minute concentrations of SO2 measured during the year 2012 at a control station located in the neighborhood of the power station. Figure 1 shows the time series for the whole year. As can be appreciated, there are some SO2 peaks in the line representing the SO2 concentrations registered at the control station, some of them corresponding to emission episodes.
The data from January 1 to June 30 were used to construct one model, while the data for July were used to validate it. A second test was also carried out, in which the models were trained with data from January 1 to November 30 and the SO2 records for December were used for validation. The purpose of using these two training sets was to compare the results obtained with different amounts of training information and also to predict the SO2 concentration in two different seasons (summer and winter, resp.).
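The two training and validation splits can be sketched as boolean masks over the 5-minute time stamps of 2012. This is only an illustration: it assumes a complete record with no missing readings, and the `split` helper and variable names are hypothetical.

```python
import numpy as np

# One time stamp per 5-minute reading in 2012 (a leap year), assuming no gaps.
stamps = np.arange(
    np.datetime64("2012-01-01T00:00"),
    np.datetime64("2013-01-01T00:00"),
    np.timedelta64(5, "m"),
)


def split(stamps, train_end, valid_end):
    """Masks for training (before train_end) and validation (train_end up to valid_end)."""
    train_end = np.datetime64(train_end)
    valid_end = np.datetime64(valid_end)
    train = stamps < train_end
    valid = (stamps >= train_end) & (stamps < valid_end)
    return train, valid


# Train on January to June, validate on July.
summer_train, summer_valid = split(stamps, "2012-07-01", "2012-08-01")
# Train on January to November, validate on December.
winter_train, winter_valid = split(stamps, "2012-12-01", "2013-01-01")

print(int(summer_train.sum()), int(summer_valid.sum()))
print(int(winter_train.sum()), int(winter_valid.sum()))
```

Under the no-gaps assumption, each 31-day validation month holds 31 × 288 = 8,928 readings.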
2.2.1. The ARIMA Models
ARIMA models are the most generally used class of models for forecasting time series that can be made stationary by transformations such as differencing and logging. The acronym ARIMA stands for autoregressive integrated moving average. Lags of the differenced series appearing in the forecasting equation are called autoregressive terms, while lags of the forecast errors are called moving average terms. A time series which needs to be differenced to be made stationary is said to be an integrated version of a stationary series.
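A nonseasonal ARIMA model is classified as an ARIMA $(p, d, q)$ model, where
(i) $p$ is the number of autoregressive terms,
(ii) $d$ is the number of nonseasonal differences,
(iii) $q$ is the number of lagged forecast errors in the prediction equation.
The generalized form of ARIMA can be described as follows:
$$\phi(B)\,\Phi(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t = \theta(B)\,\Theta(B^{s})\,\varepsilon_t,$$
where $B$ is the backward shift operator, $d$ is the nonseasonal order of differences, $D$ is the seasonal order of differences, and $\phi$, $\Phi$, $\theta$, and $\Theta$ are polynomials in $B$ and $B^{s}$. Here $y_t$ denotes the observed series, $\varepsilon_t$ the white-noise error term, and $s$ the seasonal period.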
Forecasting based on ARIMA models, commonly known as the Box-Jenkins approach, comprises the following stages:
(i) model identification,
(ii) parameter estimation,
(iii) diagnostic checking.
These stages are repeated until a suitable model for the given data has been identified. In this research we used a variation of the Hyndman and Khandakar algorithm, which combines unit root tests, minimization of the AIC, and maximum likelihood estimation (MLE) to obtain the ARIMA models. Using this algorithm speeds up the model identification process.
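The idea of selecting a model order by minimizing the AIC can be illustrated with a deliberately simplified stand-in. The sketch below is not the Hyndman and Khandakar algorithm: it fits purely autoregressive models by least squares (no moving average terms, no unit root tests, no maximum likelihood estimation) and keeps the order with the lowest AIC. The function names and the simulated series are assumptions made for the example.

```python
import numpy as np


def ar_aic(y, p, max_p):
    """AIC of a least-squares AR(p) fit, using the same samples for every p."""
    y = np.asarray(y, dtype=float)
    target = y[max_p:]
    n = target.size
    # Column k holds the series lagged by k steps, aligned with the target.
    lags = [y[max_p - k: y.size - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n)] + lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    sigma2 = np.mean((target - X @ coef) ** 2)
    # Gaussian AIC up to a constant; parameters: p lags, intercept, variance.
    return n * np.log(sigma2) + 2 * (p + 2)


def select_order(y, max_p=5):
    """Return the AR order in 0..max_p with the lowest AIC, plus all AIC values."""
    aics = {p: ar_aic(y, p, max_p) for p in range(max_p + 1)}
    return min(aics, key=aics.get), aics


# Simulated AR(2) series used only to exercise the selection.
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
y = np.zeros(2000)
for t in range(2, 2000):
    y[t] = 0.6 * y[t - 1] - 0.5 * y[t - 2] + noise[t]

order, aics = select_order(y)
print(order)
```

A series that needs differencing would first be passed through `np.diff`, which plays the role of the integration order in this simplified setting.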
2.2.2. Recurrent Neural Networks Models
For the present research a kind of partially recurrent neural network (RNN) called an Elman network was employed. An Elman RNN is a network with an initial configuration based on a regular feedforward neural network.
An Elman network has a layer called a context layer. The neurons in the context layer, which are called context neurons, hold a copy of the output given by the neurons of the hidden layer to the output one (Figure 2). In the following computing step, information that was given as an output by the hidden layer is used as a new input information for this layer.
The strength of the relationships between neurons in an Elman RNN is indicated by their weights. For this kind of neural network, the weight values are chosen randomly when the process is initiated and are then adjusted during training in order to optimize them. The exception is the weights from the hidden layer to the context layer, which do not change during training because the context neurons need to receive the output of the hidden layer exactly as it is calculated.
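The dynamics of the Elman RNN are described by the following equations:
$$x(k) = f\left(W^{ch}\,x^{c}(k) + W^{ih}\,u(k)\right),$$
$$x^{c}(k) = W^{hc}\,x(k-1),$$
$$y(k) = W^{ho}\,x(k),$$
where $f$ is the hyperbolic tangent function, $u(k)$ is the input of the network at discrete time $k$, $y(k)$ is the output of the network at discrete time $k$, $x^{c}(k)$ are the nodes of the context layer, $x(k)$ are the nodes of the hidden layer, $W^{ch}$ is the weight matrix of the context-hidden layer, $W^{ih}$ is the weight matrix of the input-hidden layer, $W^{hc}$ is the weight matrix of the hidden-context layer, and $W^{ho}$ is the weight matrix of the hidden-output layer.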
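A forward pass through an Elman network can be sketched as follows. This is a minimal illustration under stated assumptions rather than the configuration used in the study: the class and attribute names are hypothetical, the hidden-to-context connection is taken to be a plain copy (fixed weight of one), the output layer is assumed linear, and the weights are random and untrained.

```python
import numpy as np


class ElmanNetwork:
    """Forward pass only: input -> hidden (tanh) -> output, with a context layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_ih = rng.normal(scale=0.5, size=(n_hidden, n_in))      # input -> hidden
        self.W_ch = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # context -> hidden
        self.W_ho = rng.normal(scale=0.5, size=(n_out, n_hidden))     # hidden -> output
        self.context = np.zeros(n_hidden)

    def step(self, u):
        """Process one time step and store the hidden state in the context layer."""
        hidden = np.tanh(self.W_ih @ u + self.W_ch @ self.context)
        self.context = hidden.copy()  # assumed fixed copy connection, not trained
        return self.W_ho @ hidden

    def run(self, inputs):
        """Reset the context and process a whole input sequence."""
        self.context = np.zeros_like(self.context)
        return np.array([self.step(u) for u in inputs])


# Two inputs and 11 hidden neurons, fed the same input vector three times.
net = ElmanNetwork(n_in=2, n_hidden=11, n_out=1)
outputs = net.run(np.ones((3, 2)))
print(outputs.ravel())
```

Feeding the same input at consecutive steps gives different outputs, because the context layer carries the previous hidden state forward; this is the memory that a plain feedforward network lacks.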
The Elman recurrent neural network models were trained using the Levenberg-Marquardt algorithm. This procedure is a modification of the Gauss-Newton method, designed to minimize a sum of squares of nonlinear functions by combining that technique with the steepest-descent algorithm. The Levenberg-Marquardt algorithm, whose application is currently very common for RNNs, was chosen because it does not suffer from the slow convergence problems reported for the methods from which it is derived.
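The way the Levenberg-Marquardt step blends Gauss-Newton with steepest descent can be seen in a small generic example. The sketch below fits a two-parameter exponential curve rather than a neural network; the function names, the damping schedule (divide or multiply by 10), and the test problem are assumptions chosen for illustration.

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, theta, lam=1e-2, iters=100):
    """Minimize the sum of squared residuals with a damped Gauss-Newton step."""
    theta = np.asarray(theta, dtype=float)
    cost = np.sum(residual(theta) ** 2)
    for _ in range(iters):
        r, J = residual(theta), jacobian(theta)
        # Small lam gives a Gauss-Newton step; large lam gives a short gradient step.
        step = np.linalg.solve(J.T @ J + lam * np.eye(theta.size), -J.T @ r)
        candidate = theta + step
        new_cost = np.sum(residual(candidate) ** 2)
        if new_cost < cost:
            theta, cost, lam = candidate, new_cost, lam / 10  # accept, trust the model more
        else:
            lam *= 10  # reject, damp more strongly
        if np.linalg.norm(step) < 1e-12:
            break
    return theta


# Noise-free test problem: y = a * exp(b * x) with a = 2 and b = -1.5.
x = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(-1.5 * x)


def residual(theta):
    return theta[0] * np.exp(theta[1] * x) - y


def jacobian(theta):
    e = np.exp(theta[1] * x)
    return np.column_stack([e, theta[0] * x * e])


fitted = levenberg_marquardt(residual, jacobian, theta=[1.0, -1.0])
print(fitted)
```

Rejecting steps that increase the cost keeps the iteration stable far from the minimum, while the shrinking damping term recovers the fast Gauss-Newton convergence close to it.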
2.2.3. The Proposed Hybrid Model
Since their introduction in the 1970s, ARIMA models have been used for the forecasting of linear time series. As has already been reported in previous studies, these kinds of models perform poorly when predicting nonlinear problems. In order to overcome their limitations, a hybrid model is proposed in the present research. This hybrid model consists of two main steps:
(i) training of an Elman recurrent neural network in order to mimic the temporal linear behavior of the SO2 time series, predicting some output values that will be the input of the ARIMA model,
(ii) selection of an ARIMA model that will model the time variation of the SO2 concentration, using as input values the prediction of the recurrent neural network.
Therefore, as may be observed in Figure 3, the proposed hybrid model uses the SO2 concentrations predicted by the Elman recurrent neural network model two to three hours in advance as input for an ARIMA model in order to achieve a more accurate prediction of the SO2 concentration. The main goal of this hybrid model is not only to achieve a generally more accurate prediction of the SO2 concentration at all times, but also to improve the detection of pollution incidents. In other words, this hybrid model was chosen in order to improve the capability of the previous model to detect incidences of pollution as early as possible. The SO2 concentration at each moment is calculated as the sum of the concentration 5 minutes before and the concentration at that moment, both given by the recurrent neural network, plus the increase in the SO2 concentration from the previous moment, all divided by two.
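The combination rule in the last sentence can be read in more than one way, so the following sketch shows only one possible interpretation. It assumes that `rnn` holds the network predictions at consecutive 5-minute steps and that `increase` holds the estimated change in concentration from the previous step, supplied externally; both names and the sample numbers are hypothetical.

```python
import numpy as np


def combine(rnn, increase):
    """(rnn[t-1] + rnn[t] + increase[t]) / 2 for every step t >= 1."""
    rnn = np.asarray(rnn, dtype=float)
    increase = np.asarray(increase, dtype=float)
    return (rnn[:-1] + rnn[1:] + increase[1:]) / 2.0


# Illustrative values only, not data from the study.
rnn_prediction = [40.0, 50.0, 70.0]
increase = [0.0, 6.0, 10.0]

combined = combine(rnn_prediction, increase)
print(combined)
```

Under this reading, the result is the average of two estimates for the same moment: the network prediction itself, and the previous network prediction shifted by the estimated increase.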
3. Results and Discussion
3.1. Results of the ARIMA Model
The best model found using the Hyndman and Khandakar algorithm was the ARIMA . Figure 4 shows the residual autocorrelation function (a) and the residual partial autocorrelation function (b). The root-mean-square error (RMSE) obtained for the training data set (data from January to November) with this ARIMA model was 8.7446. The Ljung-Box statistic was used to check the adequacy of the model. The value for the Ljung-Box statistic was 0.9821, and therefore it can be stated that the residuals were independently distributed or, in other words, that the residuals from the ARIMA model have no correlation. The error obtained when the ARIMA model was applied to the validation data set corresponding to the month of December was 8.9629. Similarly, when the equivalent model was trained with data from January to June, the RMSE obtained was 8.9101. In this case, the value of the Ljung-Box statistic was 0.8347, while the RMSE of the model applied to the data of July was 10.60121. As can be observed in Figure 5, the main weakness of the ARIMA model is that, in spite of its good average performance, it seems unable to predict pollution incidents because it cannot reproduce the real peaks of SO2 concentration (see Figure 1).
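The two diagnostics used here can be computed as in the sketch below, which implements the RMSE and the Ljung-Box Q statistic, $Q = n(n+2)\sum_{k=1}^{h} \hat{\rho}_k^2/(n-k)$, where $\hat{\rho}_k$ is the residual autocorrelation at lag $k$. The function names, the choice of 10 lags, and the simulated residuals are assumptions made for the example; converting Q to a p-value would additionally require the chi-square distribution.

```python
import numpy as np


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    n = e.size
    denom = np.sum(e ** 2)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = np.sum(e[k:] * e[:-k]) / denom  # sample autocorrelation at lag k
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# Simulated residuals: uncorrelated noise versus a strongly correlated random walk.
rng = np.random.default_rng(0)
white = rng.standard_normal(1000)
walk = np.cumsum(white)

q_white = ljung_box_q(white, max_lag=10)
q_walk = ljung_box_q(walk, max_lag=10)
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(q_white, q_walk, error)
```

A small Q (equivalently, a large p-value) is consistent with uncorrelated residuals, while a large Q indicates that the model has left structure in the residuals unexplained.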
3.2. Results of the Recurrent Neural Network Model
Several Elman recurrent neural network architectures were tested in order to find the one with the best generalization characteristics for the data. The input variables employed were the SO2 concentrations 2 and 3 hours before the current time, recorded every five minutes.
The best configuration was an Elman neural network with 11 neurons in the hidden layer. The activation function employed was the hyperbolic tangent, while the learning rate was 0.1 and the momentum 0.9. Using the data from January to November for training gave an RMSE of 5.5225, with convergence achieved after 934 epochs. For the model trained with data from January to June, convergence was achieved after 632 epochs with a training RMSE of 7.8013. The validation result for December, using the model trained with the data from January to November, was 7.6121, while the validation RMSE for July, using the model trained with the data from January to June, was 8.0347.
3.3. Results of the Hybrid Model
The proposed hybrid model was applied to the same database. First, a hybrid model was trained using the values of the SO2 concentration from January to November as input data. The RMSE for the training data set was 5.0238, while the RMSE obtained using the validation subset corresponding to the month of December was 6.6850. Then another model was trained using data from January to June, obtaining a training RMSE of 6.6014; when this model was validated with data from the month of July, the RMSE obtained was 6.5356. The Elman recurrent neural networks needed 1023 and 712 epochs, respectively, to converge. Figure 6 shows the results of applying the hybrid model to the month of December. This figure represents a total of 8,928 measurements, showing their real and predicted values. It can be observed that the hybrid model is able to predict the SO2 pollution incidents, although in some cases it is not able to predict their maximum values.
4. Conclusions
In the present research, the utility of three different mathematical models (ARIMA, Elman recurrent neural network, and a hybrid model) to predict SO2 emission episodes of a coal-fired station was analyzed. Emission episodes correspond to peaks in the time series of SO2 concentration.
The ARIMA model was not able to reproduce emission episodes, just the general trend of the time series. The Elman recurrent neural network performed better given its capacity to detect emission episodes. However, the best results were obtained with a hybrid model that applies the ARIMA model to the Elman neural network output.
The results obtained with the hybrid model made it possible to predict emission episodes 115 minutes in advance, which is a sufficient response time to take preventive measures.
We would like to remark that the main advantages of the method proposed in the present research are linked, on the one hand, with the capability of Elman recurrent neural networks to perform sequence prediction that is beyond the power of a standard backpropagation recurrent neural network and, on the other hand, with how these capabilities are used in our hybrid model. In the proposed hybrid model, the predictions obtained from the Elman recurrent neural network are used as input values for an ARIMA model that, having information from 30 to 5 minutes in advance, is able to predict the SO2 concentration 55 minutes in advance.
References
A. Kandya and M. Mohan, “Forecasting the urban air quality using various statistical techniques,” in Proceedings of the 7th International Conference on Urban Climate, Yokohama, Japan, 2009.
G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, Calif, USA, 1976.
K. M. Mok and S. C. Tam, “Short-term prediction of SO2 concentration in Macau with artificial neural networks,” Energy and Buildings, vol. 28, no. 3, pp. 279–286, 1998.
B. M. Fernández de Castro, J. M. Prada Sánchez, W. González Manteiga, M. Febrero Bande, J. L. Bermúdez Cela, and J. J. Hernández Fernández, “Prediction of SO2 levels using neural networks,” Journal of the Air & Waste Management Association, vol. 53, no. 5, pp. 532–539, 2003.
S. A. Abdul-Wahab and S. M. Al-Alawi, “Prediction of sulfur dioxide (SO2) concentration levels from the Mina Al-Fahal Refinery in Oman using Artificial Neural Networks,” American Journal of Environmental Sciences, vol. 4, no. 5, pp. 473–481, 2008.
J. Zhang, H. Jiang, Z. Chen, X. Li, and Y. Lu, “The comparison of environmental time series statistical prediction methods,” in Proceedings of the Asia Pacific Conference on Environmental Science and Technology Advances in Biomedical Engineering, vol. 6, pp. 267–272, 2012.
T. C. Mills, Time Series Techniques for Economists, Cambridge University Press, Cambridge, UK, 1990.
R. J. Hyndman and Y. Khandakar, “Automatic time series forecasting: the forecast package for R,” Journal of Statistical Software, vol. 27, no. 3, pp. 1–22, 2008.
J. L. Elman, “Finding structure in time,” Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
G. M. Ljung and G. E. P. Box, “On a measure of lack of fit in time series models,” Biometrika, vol. 65, no. 2, pp. 297–303, 1978.