Research Article | Open Access
Herui Cui, Pengbang Wei, Yupei Mu, Xu Peng, "SARIMA-Orthogonal Polynomial Curve Fitting Model for Medium-Term Load Forecasting", Discrete Dynamics in Nature and Society, vol. 2016, Article ID 9649682, 9 pages, 2016. https://doi.org/10.1155/2016/9649682
SARIMA-Orthogonal Polynomial Curve Fitting Model for Medium-Term Load Forecasting
Seasonal components are a key factor in time series modeling for medium-term electric load forecasting. In this paper, a seasonal ARIMA model is developed, but the SAR and SMA parameters turn out to be nonsignificant in most cases during model order selection. To address this issue, a hybrid time series model based on the HP filter is used to extract sequences with different frequencies and to analyze the interactions among various factors. Finally, an integrative forecast is made for electricity consumption from January to November 2014. The empirical results demonstrate that the HP filter-based method can reduce the relative error caused by the interaction between the trend component and the seasonal component.
To a certain extent, medium-term power consumption is affected by seasonal factors, historical consumption, and consumption peaks caused by unexpected events. Following current prediction techniques, these factors are categorized into the long-term trend ($T$), the seasonal fluctuation ($S$), the cyclical volatility ($C$), and the irregular volatility ($I$). Because their influences are superimposed on each other, model construction becomes difficult.
In recent years, many studies have addressed these four fluctuation factors in power load forecasting. Neural network methods are often used to predict electric load [1, 2]; however, given the available data volume, time series models fit the characteristics of the sequence better than neural network models. Azadeh et al. combined seasonal fluctuation and forecasting nonlinearity in a fuzzy system with data mining techniques to analyze monthly electricity demand in Iran. SVM models can also be used to analyze the effects of seasonal fluctuation and long-term trend. Sequences with a significant trend can be analyzed with models combined with neural network methods [6–8]. Hence, trend and seasonal characteristics are key factors affecting accuracy in medium-term load forecasting. The SARIMA model, which can eliminate the effects of seasonal and irregular factors, is well suited to forecasting monthly electricity consumption [9, 10]. Sequence decomposition methods are often used to analyze the superimposed effects of seasonal variation and long-term growth or decline. Among them, the Hodrick-Prescott (HP) filter has certain advantages in series decomposition [12, 13].
The seasonal ARIMA model takes the seasonal fluctuation of a sequence fully into account. However, because of the interaction among the four fluctuations, $T$, $S$, $C$, and $I$, and between the seasonal and nonseasonal factors, the seasonal parameters are nonsignificant in most practical applications. Based on spectral analysis, the HP filter separates the data sequence by frequency and thereby relieves the superimposed impact of the fluctuations. In this paper, the HP filter is used to obtain a sequence with a significant trend and a sequence with significant periodicity. By modeling them separately and analyzing them jointly, the model relieves the mutual influence of the changing trend and improves prediction precision. In addition, to address the model order problem, a long-memory test is conducted on the original data sequence. The result shows that the sequence does not follow a standard random walk process, which suggests new directions for power load forecasting.
The purpose of this paper is to design an accurate prediction model. The remainder of the paper is organized as follows. Section 2 introduces the principles of the methods. Section 3 describes the power load forecasting process using the traditional method and the improved method and discusses the results. The last section draws conclusions.
2. Forecasting Models
2.1. SARIMA Model
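SARIMA (seasonal autoregressive integrated moving average), denoted as SARIMA$(p,d,q)(P,D,Q)_s$, extends the traditional ARIMA model. It can eliminate the influence of periodicity during prediction and is therefore widely applied to forecasting seasonal time series [14, 15]. The model can be written as

$$\phi_p(B)\,\Phi_P(B^s)\,\nabla^d\nabla_s^D y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t, \tag{1}$$

where $B$ is the backward shift operator, satisfying $B^k y_t = y_{t-k}$, and $s$ is the seasonal period. The integers $p$, $q$, $P$, and $Q$ are the orders of the AR, MA, SAR, and SMA components, respectively. The integers $d$ and $D$ are the numbers of regular and seasonal differences, respectively: a nonstationary time series $y_t$ is transformed into a stationary series $w_t = \nabla^d\nabla_s^D y_t$ by the difference operators $\nabla = 1 - B$ and $\nabla_s = 1 - B^s$. The nonseasonal polynomials

$$\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p,\qquad \theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q \tag{2}$$

are polynomials in $B$ of degrees $p$ and $q$, and the seasonal polynomials

$$\Phi_P(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps},\qquad \Theta_Q(B^s) = 1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs} \tag{3}$$

are polynomials in $B^s$ of degrees $P$ and $Q$. The term $\varepsilon_t$, a current disturbance with mean 0 and variance $\sigma^2$, is regarded as the estimated residual at time $t$; the $\varepsilon_t$ are independent and identically distributed normal random variables.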
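To see how the multiplicative SARIMA form mixes nonseasonal and seasonal lags, the autoregressive side $\phi_p(B)\Phi_P(B^s)$ can be expanded numerically. The following is a minimal Python sketch, assuming NumPy; the coefficient values are hypothetical and are not estimates from this paper.

```python
import numpy as np

def lag_poly(coeffs, step=1):
    """Coefficient array of 1 - c1*B^step - c2*B^(2*step) - ...
    Index k of the returned array holds the coefficient of B^k."""
    poly = np.zeros(len(coeffs) * step + 1)
    poly[0] = 1.0
    for i, c in enumerate(coeffs, start=1):
        poly[i * step] = -c
    return poly

def expand_sar(phi, Phi, s):
    """Coefficients of the combined operator phi_p(B) * Phi_P(B^s)."""
    # np.convolve multiplies two polynomials given as coefficient arrays
    return np.convolve(lag_poly(phi), lag_poly(Phi, step=s))

# Hypothetical values: nonseasonal AR(2) and seasonal SAR(1) with s = 12
combined = expand_sar([0.5, 0.2], [0.3], s=12)
for lag, c in enumerate(combined):
    if c != 0:
        print(f"B^{lag}: {c:+.2f}")
```

The expansion shows cross terms at lags 13 and 14 ($+\phi_1\Phi_1$ and $+\phi_2\Phi_1$), which is why a multiplicative seasonal model uses fewer free parameters than an unrestricted AR model of order 14.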
In the process of the seasonal time series analysis, there are three questions that need to be analyzed.
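(1) Stationarity Test. Stationarity of the time series is the premise for building a SARIMA model. A series $y_t$ is called weakly stationary (or covariance stationary) when its mean, variance, and autocovariances, given in formula (4), are constants that do not depend on $t$:

$$E(y_t) = \mu,\qquad \operatorname{Var}(y_t) = \sigma^2,\qquad \operatorname{Cov}(y_t, y_{t-k}) = \gamma_k. \tag{4}$$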
The ADF unit root test can be used to check whether the sequence is stationary. If it is nonstationary, differencing is applied repeatedly until the differenced sequence is stationary. The resulting stationary sequence is denoted as $w_t$.
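The regular and seasonal difference operators are straightforward to apply directly. The sketch below assumes NumPy and a synthetic series; in practice each candidate differenced series would then be checked with an ADF test (statsmodels, for example, provides `adfuller` in `statsmodels.tsa.stattools`), which is not reproduced here.

```python
import numpy as np

def difference(y, d=1, D=0, s=12):
    """Apply (1 - B)^d (1 - B^s)^D to the series y.

    Each regular difference shortens the series by 1 observation,
    each seasonal difference by s observations."""
    w = np.asarray(y, dtype=float)
    for _ in range(D):
        w = w[s:] - w[:-s]          # seasonal difference: y_t - y_{t-s}
    for _ in range(d):
        w = w[1:] - w[:-1]          # regular difference: y_t - y_{t-1}
    return w

# Hypothetical monthly series: linear trend plus a fixed 12-month pattern
t = np.arange(48)
y = 100 + 2.0 * t + 10 * np.sin(2 * np.pi * t / 12)
w = difference(y, d=1, D=1, s=12)
print(len(w), np.allclose(w, 0))
```

The seasonal difference removes the 12-month pattern and turns the linear trend into a constant, which the regular difference then removes.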
(2) Seasonal Analysis. Before the seasonal analysis, the autocorrelation function is defined as

$$\rho_k = \frac{\sum_{t=1}^{n-k}\left(w_t - \bar{w}\right)\left(w_{t+k} - \bar{w}\right)}{\sum_{t=1}^{n}\left(w_t - \bar{w}\right)^2}, \tag{5}$$

where $w_t$ is a stationary sequence of length $n$ and $\bar{w}$ is its mean. By comparing the autocorrelation function with its confidence interval, the periodicity and the cycle of $w_t$ can be determined. The sequence is then seasonally adjusted according to the additive model

$$w_t = TC_t + S_t + I_t, \tag{6}$$

where $TC_t$ denotes the long-term trend and cycle volatility, $S_t$ the seasonal fluctuation, and $I_t$ the irregular volatility.
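The autocorrelation function above and the confidence-interval check translate directly into code. This minimal sketch assumes NumPy and the common large-sample approximation $\pm 1.96/\sqrt{n}$ for the 95% band; the series is synthetic.

```python
import numpy as np

def acf(w, max_lag):
    """Sample autocorrelation rho_k for k = 0..max_lag."""
    w = np.asarray(w, dtype=float)
    dev = w - w.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[: len(w) - k] * dev[k:]) / denom
                     for k in range(max_lag + 1)])

def significant_lags(w, max_lag, z=1.96):
    """Lags whose autocorrelation lies outside the band +-z/sqrt(n)."""
    rho = acf(w, max_lag)
    band = z / np.sqrt(len(w))
    return [k for k in range(1, max_lag + 1) if abs(rho[k]) > band]

# Hypothetical series: a 12-month cycle plus small noise
rng = np.random.default_rng(0)
t = np.arange(120)
w = np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(120)
rho = acf(w, 24)
print(round(rho[12], 2), round(rho[6], 2))
print(significant_lags(w, 24))
```

A strongly positive $\rho_{12}$ together with a strongly negative $\rho_6$ is the signature of a 12-month cycle.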
(3) Model Order Selection and Model Prediction. First, the seasonally adjusted sequence is denoted as $w_t^{sa}$. The lag structure and the confidence intervals of the autocorrelation and partial autocorrelation functions are analyzed to determine the orders of the AR($p$), MA($q$), SAR($P$), and SMA($Q$) terms and to build the model. Then, under the minimum mean square error criterion, the $l$-step-ahead prediction is the conditional expectation of $w_{t+l}$ given the information available at time $t$:

$$\hat{w}_t(l) = E\left(w_{t+l} \mid w_t, w_{t-1}, \ldots\right). \tag{7}$$

When high-order terms are required in the model, a long-memory test can be performed on the stationary sequence $w_t$.
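The conditional-expectation forecast is easiest to see for a purely autoregressive model, where it is computed recursively. The sketch below assumes NumPy, fits an AR($p$) model by least squares, and forecasts by replacing future shocks with their mean of zero. It omits the MA and seasonal terms of a full SARIMA model, which a library such as statsmodels would normally handle; `fit_ar` and `forecast_ar` are hypothetical helper names.

```python
import numpy as np

def fit_ar(w, p):
    """Least-squares estimates (c, phi_1..phi_p) of w_t = c + sum phi_i w_{t-i} + e_t."""
    w = np.asarray(w, dtype=float)
    # Row for time t holds [1, w_{t-1}, ..., w_{t-p}]
    X = np.column_stack([np.ones(len(w) - p)] +
                        [w[p - i: len(w) - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, w[p:], rcond=None)
    return coef[0], coef[1:]

def forecast_ar(w, c, phi, steps):
    """l-step forecasts as conditional expectations: future shocks are set
    to their mean 0, and earlier forecasts stand in for unobserved values."""
    history = list(np.asarray(w, dtype=float))
    out = []
    for _ in range(steps):
        nxt = c + sum(phi[i] * history[-1 - i] for i in range(len(phi)))
        out.append(nxt)
        history.append(nxt)
    return np.array(out)

# Hypothetical AR(2) process with phi = (0.6, -0.3)
rng = np.random.default_rng(1)
w = np.zeros(500)
for t in range(2, 500):
    w[t] = 0.6 * w[t - 1] - 0.3 * w[t - 2] + rng.standard_normal()
c, phi = fit_ar(w, 2)
print(np.round(phi, 2))
print(forecast_ar(w, c, phi, steps=3))
```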
2.2. Method of ARFIMA Model
Long-memory analysis, which examines departures from the random walk process, was proposed by H. E. Hurst in 1951 in his study of the relationship between river flow and reservoir storage capacity. He introduced rescaled range (R/S) analysis for detecting long memory. Researchers have since applied this method to financial series and used it to build Autoregressive Fractionally Integrated Moving Average (ARFIMA) models [16, 17]. The R/S procedure is as follows.
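First, a sequence $x_t$ of length $N$ is divided into $A$ contiguous, non-overlapping subintervals $I_a$ ($a = 1, \ldots, A$), each of length $n$, so that $An \le N$. Denoting the elements of $I_a$ by $x_{k,a}$ ($k = 1, \ldots, n$), each subinterval is described by

$$e_a = \frac{1}{n}\sum_{k=1}^{n} x_{k,a},\qquad X_{k,a} = \sum_{i=1}^{k}\left(x_{i,a} - e_a\right),\quad k = 1, \ldots, n,$$

where $e_a$ is the average of subinterval $I_a$ and $X_{k,a}$ is its cumulative deviation. The range $R_{I_a} = \max_k X_{k,a} - \min_k X_{k,a}$ is the difference between the maximum and minimum cumulative deviations, and $S_{I_a} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(x_{k,a} - e_a\right)^2}$ is the standard deviation of the subinterval. Averaging $R_{I_a}/S_{I_a}$ over all subintervals gives the rescaled range $(R/S)_n$, which obeys

$$(R/S)_n = c\, n^{H},$$

where $H$, the Hurst index, is the exponent of $n$ and $c$ is a constant. Taking logarithms of both sides gives

$$\log (R/S)_n = \log c + H \log n.$$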
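The Hurst index $H$ is then estimated by OLS regression of $\log (R/S)_n$ on $\log n$ over several interval lengths $n$. Finally, long memory is judged by the following standard criteria:

$$\begin{cases} H = 0.5: & \text{independent increments (random walk)},\\ 0.5 < H < 1: & \text{persistent series with long memory},\\ 0 < H < 0.5: & \text{antipersistent series}. \end{cases}$$

When $0.5 < H < 1$, the original sequence is likely to have long memory; however, this alone does not guarantee that an ARFIMA model is suitable for medium-term load forecasting.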
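The R/S procedure can be sketched in a few lines. This assumes NumPy, non-overlapping windows, and the population standard deviation for $S$; the window sizes and test series are illustrative choices, not the paper's settings. A white-noise series should give $H$ near 0.5 (classical R/S is biased slightly upward in small samples), whereas its cumulative sum, a random walk analyzed in levels, gives $H$ close to 1.

```python
import numpy as np

def rescaled_range(x, n):
    """Average R/S over the contiguous subintervals of length n."""
    x = np.asarray(x, dtype=float)
    ratios = []
    for a in range(len(x) // n):
        seg = x[a * n:(a + 1) * n]
        dev = np.cumsum(seg - seg.mean())      # cumulative deviation X_{k,a}
        r = dev.max() - dev.min()              # range R
        s = seg.std()                          # standard deviation S (ddof=0)
        if s > 0:
            ratios.append(r / s)
    return np.mean(ratios)

def hurst_exponent(x, sizes):
    """OLS slope of log(R/S)_n on log n, i.e., the Hurst index H."""
    log_n = np.log(sizes)
    log_rs = np.log([rescaled_range(x, n) for n in sizes])
    slope, _intercept = np.polyfit(log_n, log_rs, 1)
    return slope

rng = np.random.default_rng(42)
noise = rng.standard_normal(4096)
sizes = [16, 32, 64, 128, 256, 512]
print(round(hurst_exponent(noise, sizes), 2))
print(round(hurst_exponent(np.cumsum(noise), sizes), 2))
```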
2.3. Orthogonal Polynomial Curve Fitting
Orthogonal polynomial curve fitting is an improvement on ordinary least squares (OLS). OLS assumes that the independent variable is measured exactly, which is unrealistic in most cases; when the errors in the independent variable become large enough, an OLS-based prediction model produces corresponding errors. Orthogonal polynomial curve fitting addresses this situation: its basic principle is to minimize the sum of squared orthogonal distances from all points to the fitting curve. In the OLS method, the fitting polynomial is

$$\hat{y} = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m,$$

and it is fitted by the least squares criterion, which minimizes the sum of squared distances between the predicted and actual values:

$$\min Q = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n}\Bigl(y_i - \sum_{j=0}^{m} a_j x_i^j\Bigr)^2.$$

The undetermined coefficients $a_j$ are then obtained by setting the partial derivatives of $Q$ with respect to each $a_j$ to zero. Orthogonal polynomial curve fitting builds on the OLS method by taking the errors of both the dependent and the independent variables into account. Its fitting polynomial is

$$\hat{y} = a_0 + a_1 \hat{x} + a_2 \hat{x}^2 + \cdots + a_m \hat{x}^m,$$

where $\hat{x}$ is the predicted value of the independent variable $x$. The squared orthogonal distance error of point $i$ is

$$d_i^2 = \varepsilon_i^2 + \delta_i^2,\qquad \varepsilon_i = x_i - \hat{x}_i,\quad \delta_i = y_i - \hat{y}_i,$$

where $\varepsilon_i$ and $\delta_i$ are the random errors of $x$ and $y$, respectively. The criterion of orthogonal polynomial curve fitting is therefore

$$\min \sum_{i=1}^{n}\left(\varepsilon_i^2 + \delta_i^2\right).$$
Combining the orthogonal distance criterion with the OLS method allows the polynomial model to achieve a better fit.
The objective function can be expressed as

$$\min F = \sum_{i=1}^{n} d_i^2,$$

where $(x_i, y_i)$ represents a real data point, $y = f(x)$ represents the fitted curve, and $d_i$ represents the orthogonal distance from the real point to the fitted curve.
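The parametric equation of the fitted curve near a point can be defined as

$$x = x_0 + t\cos\theta,\qquad y = y_0 + t\sin\theta,$$

where $(x_0, y_0)$ is a point on the fitted curve, $\theta$ is the angle between its tangent and the abscissa axis, and $t$ is the parameter. The orthogonal distance from $(x_i, y_i)$ to this line is $d_i = (x_i - x_0)\sin\theta - (y_i - y_0)\cos\theta$, so the objective function becomes

$$F(x_0, y_0, \theta) = \sum_{i=1}^{n}\left[(x_i - x_0)\sin\theta - (y_i - y_0)\cos\theta\right]^2.$$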
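Taking the partial derivatives of $F$ with respect to $x_0$, $y_0$, and $\theta$ and setting them to zero yields the minimum error and the fitted curve. Solving the resulting equation set gives

$$x_0 = \bar{x},\qquad y_0 = \bar{y},\qquad \tan 2\theta = \frac{2\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}\left[(x_i - \bar{x})^2 - (y_i - \bar{y})^2\right]},$$

where $\bar{x}$ and $\bar{y}$ are the mean values of the sequences $x$ and $y$.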
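The closed-form solution for the tangent line can be checked numerically. The sketch below assumes NumPy and implements only the straight-line case of the orthogonal criterion (a total-least-squares line fit through the centroid); it is not a full orthogonal fit of a higher-degree polynomial, and the function names are hypothetical.

```python
import math
import numpy as np

def orthogonal_line_fit(x, y):
    """Fit a straight line minimizing the sum of squared orthogonal distances.

    Returns (x0, y0, theta): the line passes through the centroid (x0, y0)
    at angle theta to the abscissa axis."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0, y0 = x.mean(), y.mean()
    u, v = x - x0, y - y0
    # tan(2*theta) = 2*Sxy / (Sxx - Syy); atan2 selects the root that
    # minimizes (rather than maximizes) the orthogonal residuals
    theta = 0.5 * math.atan2(2 * np.sum(u * v), np.sum(u ** 2 - v ** 2))
    return x0, y0, theta

def orthogonal_residuals(x, y, x0, y0, theta):
    """Signed distances d_i = (x_i - x0) sin(theta) - (y_i - y0) cos(theta)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (x - x0) * math.sin(theta) - (y - y0) * math.cos(theta)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
x0, y0, theta = orthogonal_line_fit(x, y)
print(x0, y0, round(math.tan(theta), 6))
```

Because both coordinates enter the distance symmetrically, the fitted line does not change if the roles of $x$ and $y$ are swapped, unlike an OLS regression line.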
2.4. Hodrick-Prescott Filter
Hodrick and Prescott first proposed the Hodrick-Prescott (HP) filter in their analysis of postwar U.S. business cycles. The method treats a time series as a combination of components with different frequencies [14, 15]. It divides the sequence into two components whose sum is the original sequence, $Y_t = Y_t^T + Y_t^C$, where $Y_t^T$ is the long-term trend and $Y_t^C$ is the short-term fluctuation. The decomposition minimizes the loss function

$$\min_{\{Y_t^T\}} \left\{ \sum_{t=1}^{N}\left(Y_t - Y_t^T\right)^2 + \lambda \sum_{t=2}^{N-1}\left[\left(Y_{t+1}^T - Y_t^T\right) - \left(Y_t^T - Y_{t-1}^T\right)\right]^2 \right\},$$

where $\lambda = \sigma_C^2/\sigma_T^2$ is the smoothing parameter, and $\sigma_C$ and $\sigma_T$ represent the standard deviations of $Y_t^C$ and of the second difference of $Y_t^T$, respectively. As $\lambda$ increases, the estimated trend responds less to short-term changes in the series: the larger $\lambda$ is, the smoother the estimated trend, and as $\lambda$ tends to infinity, the estimated trend approaches a linear function. As a rule of thumb, $\lambda = 14400$ is used for monthly data.
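Setting the derivative of the HP loss function to zero gives the linear system $(I + \lambda K^{\top}K)\,Y^T = Y$, where $K$ is the second-difference matrix. The sketch below solves it directly, assuming NumPy and a dense solver, which is adequate for about 120 monthly observations. statsmodels also provides an implementation (`statsmodels.tsa.filters.hp_filter.hpfilter`, whose smoothing argument is named `lamb` and which returns the cycle before the trend).

```python
import numpy as np

def hp_filter(y, lam=14400.0):
    """Hodrick-Prescott decomposition y = trend + cycle.

    The trend solves (I + lam * K'K) trend = y, where K is the (N-2) x N
    second-difference matrix; this is the first-order condition of the loss."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    K = np.zeros((n - 2, n))
    for i in range(n - 2):
        K[i, i:i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lam * K.T @ K, y)
    return trend, y - trend

# Hypothetical monthly series: linear growth, a 12-month cycle, and noise
t = np.arange(120)
rng = np.random.default_rng(3)
y = 1000 + 15 * t + 80 * np.sin(2 * np.pi * t / 12) + 20 * rng.standard_normal(120)
trend, cycle = hp_filter(y, lam=14400)
print(np.round(cycle[:12], 1))
```

A linear input passes through unchanged, because its second differences are zero and both terms of the loss vanish.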
In this paper, the HP filter is applied to the series without prior seasonal adjustment, dividing the original sequence into two sequences with distinct dominant frequencies. Modeling each sequence separately weakens their mutual influence and makes the model more accurate.
2.5. Error Estimation Methods
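The models are evaluated by the relative error (RE), the mean square error (MSE), the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the mean absolute error (MAE), defined as follows, where $y_t$ is the actual value, $\hat{y}_t$ is the predicted value, and $n$ is the number of predicted points:

$$\mathrm{RE}_t = \frac{\hat{y}_t - y_t}{y_t}\times 100\%,\qquad \mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(\hat{y}_t - y_t\right)^2,\qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}},$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{\hat{y}_t - y_t}{y_t}\right|\times 100\%,\qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|\hat{y}_t - y_t\right|.$$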
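The error measures in this section can be computed together. This sketch assumes NumPy; the sign convention for RE (prediction minus actual, divided by actual) is an assumption, which matters little because Figure 7 plots its absolute value.

```python
import numpy as np

def error_metrics(actual, predicted):
    """RE (per point, %), MSE, RMSE, MAPE (%), and MAE."""
    y = np.asarray(actual, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    err = y_hat - y
    mse = np.mean(err ** 2)
    return {
        "RE": err / y * 100,
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAPE": np.mean(np.abs(err / y)) * 100,
        "MAE": np.mean(np.abs(err)),
    }

m = error_metrics([100.0, 200.0], [110.0, 190.0])
print(m["RE"], m["MAPE"], m["RMSE"], m["MAE"])
```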
2.6. Sequence Analysis and Combination Model Building
The improved model is based on separating the original sequence by filtering analysis. Then according to the characteristics of each sequence, the models can be established for forecasting. The detailed process is as follows.
(1) According to the HP filtering principle, the original sequence $Y_t$, regarded as a superposition of waves with different frequencies, is divided into the trend sequence $Y_t^T$ and the fluctuation sequence $Y_t^C$.
(2) The trend sequence $Y_t^T$ is treated as a function of time $t$, and its scatter plot is drawn. The orthogonal error from each point to the fitting curve is denoted as $d_i$, $i = 1, \ldots, n$. Orthogonal polynomial curve fitting combined with the OLS method is then used to minimize the sum of squared errors $\sum_{i=1}^{n} d_i^2$.
(3) From the polynomial fitted in the previous step, predictions of the trend sequence are obtained and denoted as $\hat{Y}_t^T$.
(4) The stationarity of the fluctuation sequence $Y_t^C$ is tested. If it is stationary, correlation analysis is applied to it directly; otherwise, it is differenced until it becomes stationary. The resulting stationary sequence is denoted as $w_t^C$.
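(5) An ARIMA model is built for the stationary sequence obtained in step (4), with its orders determined from the autocorrelation and partial autocorrelation functions.
(6) The adequacy of the ARIMA model is checked through its residual sequence.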
(7) From the ARIMA model, predictions of the fluctuation sequence are obtained and denoted as $\hat{Y}_t^C$.
(8) The final prediction is obtained from the HP filter decomposition by summing the two component forecasts:

$$\hat{Y}_t = \hat{Y}_t^T + \hat{Y}_t^C.$$
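The eight steps can be assembled into a compact pipeline. This is a hedged sketch rather than the authors' implementation: it assumes NumPy, swaps in an ordinary least-squares polynomial (`numpy.polynomial.Polynomial.fit`) for the orthogonal fit, skips the stationarity test and differencing of step (4), and models the fluctuation sequence with a pure AR(12) model instead of a full ARIMA specification. `hybrid_forecast` and its default arguments are hypothetical, and the data are synthetic.

```python
import numpy as np

def hp_trend(y, lam):
    """HP trend: solve (I + lam * K'K) trend = y, K = second-difference matrix."""
    n = len(y)
    K = np.diff(np.eye(n), n=2, axis=0)        # (n-2) x n, rows [1, -2, 1]
    return np.linalg.solve(np.eye(n) + lam * K.T @ K, y)

def ar_forecast(w, p, steps):
    """Fit AR(p) with intercept by least squares and forecast recursively."""
    X = np.column_stack([np.ones(len(w) - p)] +
                        [w[p - i: len(w) - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, w[p:], rcond=None)
    hist = list(w)
    for _ in range(steps):
        hist.append(coef[0] + sum(coef[i] * hist[-i] for i in range(1, p + 1)))
    return np.array(hist[-steps:])

def hybrid_forecast(y, steps, lam=14400.0, degree=4, p=12):
    """HP split, polynomial trend forecast, AR fluctuation forecast, then sum."""
    y = np.asarray(y, dtype=float)
    trend = hp_trend(y, lam)                   # step (1): Y^T
    cycle = y - trend                          # step (1): Y^C
    t = np.arange(len(y))
    t_future = np.arange(len(y), len(y) + steps)
    # Steps (2)-(3): OLS polynomial as a stand-in for the orthogonal fit
    trend_hat = np.polynomial.Polynomial.fit(t, trend, degree)(t_future)
    cycle_hat = ar_forecast(cycle, p, steps)   # steps (5)-(7), simplified
    return trend_hat + cycle_hat, trend_hat, cycle_hat   # step (8)

# Hypothetical monthly data: 120 in-sample months, 11-month horizon
t = np.arange(120)
rng = np.random.default_rng(7)
y = 2000 + 20 * t + 150 * np.sin(2 * np.pi * t / 12) + 30 * rng.standard_normal(120)
total, trend_hat, cycle_hat = hybrid_forecast(y, steps=11)
print(np.round(total, 0))
```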
Because the improved model forecasts the two component sequences separately, prediction error enters the analysis twice. In theory, this could compound the errors and reduce prediction accuracy. In practice, however, the HP filter weakens the mutual influence of the factors (including the long-term trend and the seasonal fluctuation), and the integrative forecast applies a model suited to the distinct characteristics of each of the two sequences. As a result, the trend can be fitted effectively, its influence on the seasonal pattern is reduced, and higher prediction accuracy can be achieved.
According to the above steps, the specific process of the improved model is shown in Figure 1.
3. Empirical Analysis
The example uses monthly electric power consumption data for China from January 2004 to November 2014: the data from January 2004 to December 2013 are used for model building, and the data from January 2014 to November 2014 are used to test the prediction error.
3.1. The Forecasting of the Seasonal-ARIMA Model
Figure 2 plots the monthly electricity consumption from January 2004 to December 2013. The original sequence shows both an intercept and a trend and is nonstationary. The result of the ADF unit root test with intercept and trend is shown in Table 1.
Note to Table 1: Prob. is the p value; the smaller the p value, the more significant the result. Marked values indicate rejection of the null hypothesis at the 95% confidence level.
The ADF unit root test shows that the original sequence is nonstationary and that its first-order difference is stationary at the 5% significance level. The autocorrelation function of the first-order difference sequence reveals a seasonal period of 12. Therefore, the additive model is used to seasonally adjust the sequence. The partial autocorrelation and autocorrelation analysis is shown in Figure 3.
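One common way to estimate the additive model (6) is the classical moving-average decomposition, sketched below as an assumption rather than as the exact procedure used here. It assumes NumPy and an even period $s$, estimates $TC_t$ with a centered 2×12 moving average, estimates $S_t$ as month-wise means of the detrended series normalized to sum to zero, and returns the seasonally adjusted series $y_t - S_t$. The test data are synthetic.

```python
import numpy as np

def additive_seasonal_adjust(y, s=12):
    """Classical additive decomposition y = TC + S + I (even period s).

    Returns the seasonally adjusted series and the s seasonal indices."""
    y = np.asarray(y, dtype=float)
    weights = np.r_[0.5, np.ones(s - 1), 0.5] / s   # centered 2 x s moving average
    half = s // 2
    tc = np.full(len(y), np.nan)                    # TC undefined at both ends
    tc[half:len(y) - half] = np.convolve(y, weights, mode="valid")
    detrended = y - tc
    idx = np.array([np.nanmean(detrended[m::s]) for m in range(s)])
    idx -= idx.mean()                               # seasonal effects sum to zero
    seasonal = np.tile(idx, len(y) // s + 1)[:len(y)]
    return y - seasonal, idx

# Hypothetical series: linear trend plus a zero-sum 12-month pattern
pattern = np.array([5, 3, 1, -1, -3, -5, -5, -3, -1, 1, 3, 5], dtype=float)
t = np.arange(60)
y = 50 + 3 * t + np.tile(pattern, 5)
adjusted, idx = additive_seasonal_adjust(y)
print(np.round(idx, 3))
```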
At the 95% confidence level, the confidence interval of the correlation coefficient is approximately $\pm 1.96/\sqrt{n}$, where $n$ is the number of observations.
Figure 3 shows that some partial autocorrelation coefficients and some autocorrelation coefficients lie outside the confidence interval. Based on the autocorrelation and partial autocorrelation diagrams, the seasonal ARIMA model is estimated with the OLS method, and the model terms are adjusted according to the significance of their coefficients. The parameter estimates are shown in Table 2.
After the model order is determined through parameter significance tests, the fitted time series model contains the nonseasonal autoregressive terms AR(1) and AR(2), the nonseasonal moving average terms MA(1) and MA(11), and the seasonal autoregressive term SAR(1). Using this model, electricity consumption from January to November 2014 is forecast; the results are shown in Table 6.
3.2. The Forecasting of the Seasonal-ARFIMA Model
During model order selection, the MA(11) term has a significant influence on modeling and prediction. This indicates that errors from distant observations still influence current monthly electricity consumption to some extent. We therefore infer that monthly electricity consumption may have long-memory characteristics, and this conjecture is tested with the R/S long-memory method.
The estimated Hurst exponent satisfies the criterion $0.5 < H < 1$, so monthly electricity consumption has long-memory characteristics; that is, the current value is influenced by distant observations.
3.3. Integrative Model
The HP filter is applied to the original sequence with the smoothing parameter $\lambda$ set to 14400, and the decomposition results are shown in Figure 4. The blue curve is the original sequence. The red curve is the long-term trend, which shows that power consumption grew at a roughly constant rate from 2004 to 2013. The green curve is the cyclical and irregular component, whose fluctuation range widens over time.
Using the HP filter, the original sequence is separated into a sequence $Y_t^T$ with the long-term trend and a sequence $Y_t^C$ with the remaining fluctuation properties.
The separation result is shown in Figure 5: the trend sequence $Y_t^T$ approximates a smooth curve, whereas the fluctuation sequence $Y_t^C$ fluctuates up and down around zero.
From a statistical perspective, the trend sequence $Y_t^T$ can be treated as a function of time: let the independent variable $t$ represent time, let $Y_t^T$ be the dependent variable, and fit a curve describing $Y_t^T$ as a function of $t$. The best fit is achieved with a fourth-order polynomial, and the relative error of the curve fitting is shown in Figure 6.
The fitting polynomial is of fourth degree, $\hat{Y}_t^T = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4$.
Using the fitting polynomial, the trend sequence $Y_t^T$ is forecast for January to November 2014; the results are shown in Table 3.
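Fitting and extrapolating the trend takes a few lines. This sketch assumes NumPy and indexes time as $t = 1, \ldots, 120$ starting in January 2004 (an assumption about the time origin). Ordinary least squares via `numpy.polynomial.Polynomial.fit` replaces the orthogonal criterion, so coefficients obtained this way would generally differ from those of an orthogonal fit; `fit_trend_polynomial` is a hypothetical helper.

```python
import numpy as np

def fit_trend_polynomial(trend, degree=4, steps=11):
    """Fit the trend as a polynomial in t and extrapolate `steps` periods.

    Ordinary least squares stands in for the orthogonal criterion."""
    trend = np.asarray(trend, dtype=float)
    t = np.arange(1, len(trend) + 1)                 # t = 1 for the first month
    poly = np.polynomial.Polynomial.fit(t, trend, degree)
    rel_err = (poly(t) - trend) / trend * 100        # in-sample relative error, %
    t_future = np.arange(len(trend) + 1, len(trend) + steps + 1)
    return poly, rel_err, poly(t_future)

# Hypothetical smooth trend over 120 months
t = np.arange(1, 121)
trend = 1000 + 12 * t + 0.05 * t ** 2
poly, rel_err, forecast = fit_trend_polynomial(trend)
print(np.round(forecast, 1))
print(np.round(poly.convert().coef, 4))              # coefficients a_0 ... a_4 in t
```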
The fluctuation sequence $Y_t^C$ is analyzed with a time series model, and its parameters are adjusted according to their significance; the estimates are recorded in Table 4.
Using this model, the fluctuation sequence $Y_t^C$ is forecast, and the prediction results from January to November 2014 are shown in Table 5.
Finally, according to the HP filter principle, $\hat{Y}_t = \hat{Y}_t^T + \hat{Y}_t^C$, the prediction of electricity consumption from January to November 2014 is obtained, as shown in Table 6.
3.4. Model Error Analysis
Figure 7 plots the absolute values of the relative errors from Table 6. Except for March, May, and June, the relative errors of the integrative model are smaller than those of the SARIMA model. For the longer horizons from July to November, the integrative model outperforms the SARIMA model. In March, May, and June, the improved model does not improve on the SARIMA predictions; this is partly an objective feature of the experimental data and partly a sign that the improved method is not yet perfect. The prediction deviates considerably in March, and this deviation originates in the forecast of one of the component sequences.
Based on the prediction results, the prediction errors are analyzed and summarized in Table 7. The comparison shows that the prediction accuracy of the improved model is significantly higher.
As shown in Table 7, the improved model has the smallest MSE, which indicates that its predictive ability is the most stable. Its MAPE is 1.817%, lower than the 2.643% of the SARIMA model, so its predictions are closer to the actual values. The MAE leads to the same conclusion.
In this paper, the HP filter is used to decompose the time series data into sequences with different trend characteristics, which relieves the mutual interference between the different fluctuation components. As a result, the relative error of load forecasting is reduced, and multistep prediction performance is improved.
The R/S test shows that monthly electricity consumption has the properties of a long-memory process. This conclusion may be affected by the limited amount of data. However, in the actual order analysis, higher-order AR or MA terms do affect power load forecasting. Therefore, the long memory of the time series can be considered when building a SARFIMA model for medium-term power load forecasting.
The authors declared that there is no conflict of interests related to this paper.
The authors sincerely acknowledge the financial support from the National Natural Science Foundation of China (no. 71471061).
- M. Meng, D. Niu, and W. Sun, “Forecasting monthly electric energy consumption using feature extraction,” Energies, vol. 4, no. 10, pp. 1495–1507, 2011.
- B. Yang and Y. Sun, “An improved neural network prediction model for load demand in day-ahead electricity market,” in Proceedings of the 7th World Congress on Intelligent Control and Automation (WCICA '08), pp. 4419–4424, Chongqing, China, June 2008.
- S. S. Pappas, L. Ekonomou, D. C. Karamousantas, G. E. Chatzarakis, S. K. Katsikas, and P. Liatsis, “Electricity demand loads modeling using AutoRegressive Moving Average (ARMA) models,” Energy, vol. 33, no. 9, pp. 1353–1360, 2008.
- A. Azadeh, M. Saberi, S. F. Ghaderi, A. Gitiforouz, and V. Ebrahimipour, “Improved estimation of electricity demand function by integration of fuzzy system and data mining approach,” Energy Conversion and Management, vol. 49, no. 8, pp. 2165–2177, 2008.
- M. De Felice, A. Alessandri, and F. Catalano, “Seasonal climate forecasts for medium-term electricity demand forecasting,” Applied Energy, vol. 137, pp. 435–444, 2015.
- C. Hamzacebi and H. A. Es, “Forecasting the annual electricity consumption of Turkey using an optimized grey model,” Energy, vol. 70, pp. 165–171, 2014.
- D. Niu and M. Meng, “Research on seasonal increasing electric energy demand forecasting: a case in China,” Chinese Journal of Management Science, vol. 8, no. 2, pp. 108–112, 2010.
- N. Dongxiao, C. Zhiye, X. Mian, and X. Hong, “Combined optimum gray neural network model of the seasonal power load forecasting with the double trends,” Proceedings of the CSEE, vol. 22, pp. 29–32, 2002.
- Y. Wang, J. Wang, G. Zhao, and Y. Dong, “Application of residual modification approach in seasonal ARIMA for electricity demand forecasting: a case study of China,” Energy Policy, vol. 48, pp. 284–294, 2012.
- Q. Zhanjun, “Medium and long-term load forecasting based on census X12-SARIMA model,” in Proceedings of the CSU-EPSA, vol. 28, pp. 34–38, 2014.
- Z. Ma and Q. Shu, “Short term load forecasting based on ESPRIT integrated algorithm,” Power System Protection and Control, vol. 43, no. 7, pp. 90–96, 2015.
- X. Zhu, W. Luan, and Y.-S. Zhu, “Analysis of macroeconomic time series by the HP filter,” in Proceedings of the 2nd International Conference on E-Business and E-Government (ICEE '11), pp. 3003–3006, Shanghai, China, May 2011.
- Y. He, B. Wang, J. Wang, W. Xiong, and T. Xia, “Correlation between Chinese and international energy prices based on a HP filter and time difference analysis,” Energy Policy, vol. 62, pp. 898–909, 2013.
- R. S. Pindyck and D. L. Rubinfeld, Econometric Models and Economic Forecasts, China Machine Press, 4th edition, 1999.
- F. Huanhuan and Z. Lingyun, EViews Statistical Analysis and Application, China Machine Press, Beijing, China, 2009.
- C. W. Granger and R. Joyeux, “An introduction to long-memory time series models and fractional differencing,” Journal of Time Series Analysis, vol. 1, no. 1, pp. 15–29, 1980.
- Z. Huiming and L. Jing, Bayesian Econometric Model, Science Press, Beijing, China, 2009.
- A. Lahiani and O. Scaillet, “Testing for threshold effect in ARFIMA models: application to US unemployment rate data,” International Journal of Forecasting, vol. 25, no. 2, pp. 418–428, 2009.
- N. Chaâbane, “A hybrid ARFIMA and neural network model for electricity price prediction,” International Journal of Electrical Power & Energy Systems, vol. 55, pp. 187–194, 2014.
- R. S. Tsay, Analysis of Financial Time Series, John Wiley & Sons, New York, NY, USA, 3rd edition, 2010.
Copyright © 2016 Herui Cui et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.