Short-Term City Electric Load Forecasting with Considering Temperature Effects: An Improved ARIMAX Model
Short-term electric load is strongly affected by weather, especially by temperature in summer. Such external factors can produce mutation structures (abrupt changes) in load data, so city electric load cannot be forecast as easily as under normal conditions. This paper analyzes the relationship between city electricity load and daily temperature and proposes an improved ARIMAX model to handle mutation data structures. The improved ARIMAX model yields smaller information criterion values (AIC and SBC) than the classic method, and its relative error is lower than that of the AR, ARMA, and Sigmoid-Function ANN models, so its forecasts fit the observed load more accurately. The improved model is therefore valuable for load data with mutation structures and is an effective technique for forecasting electric load under temperature effects.
1. Introduction
Short-term load forecasting (STLF) is mainly used to forecast the power load over the next few days to a week [1–3]. It plays an important role in modern electricity Demand Side Management (DSM), because its accuracy directly affects the economic cost borne by operators in the electricity market. Accurate load forecasting supports the secure, stable, and economic operation of the power grid, helps in arranging maintenance plans sensibly, and allows power system dispatch to be optimized and production cost to be reduced.
Short-term daily peak power load in summer fluctuates regularly and shows an obvious periodic pattern. It is strongly affected by temperature, wind, precipitation, and other meteorological factors, and the load data contain significant mutation structures [4–6]. Traditional approaches to power load forecasting include regression models, gray models, support vector machines, neural networks, and time series models. Ramón Cancelo et al. described the models used by Red Eléctrica de España (REE) to forecast electricity load from one day to one week ahead. Hippert et al. applied large neural networks to electricity load forecasting to handle nonlinear time series data. Felipe Amaral and Castro Souza used smooth transition periodic autoregressive (STPAR) models for short-term load forecasting. Amjady and Keynia proposed a neural network learning algorithm based on a modified harmony search technique; the algorithm searches the solution space in various directions, which helps it avoid overfitting and trapping in local minima and dead bands. Wangdi et al. used an ARIMAX model to determine predictors of malaria for the subsequent month, and their tests showed a marked improvement in prediction accuracy. Chadsuthi et al. studied seasonal leptospirosis transmission and its association with rainfall and temperature using an ARIMAX model, and found that factoring in rainfall (with an 8-month lag) yields the best model for the northern region.
The intelligent algorithms above are effective in dealing with mutation structures. However, they are not ideal in practical operation because of limitations in data and laboratory equipment, and their generalization capability is weak. Traditional time series forecasting methods, in turn, emphasize the role of time without considering the effects of external factors, so their forecasting accuracy is poor [13, 14].
Based on the above research, this paper proposes an improved ARIMAX model that combines traditional time series analysis with regression analysis to forecast short-term electric load, which gives it strong practical value in the short-term power load forecasting field. The model fills the gap left by methods that ignore external effects on electric load. The prediction results show that the improved ARIMAX model has smaller information criterion values than the AR or ARMA models [15, 16].
2. Sigmoid-Function ANN Model
An artificial neural network (ANN) is a practical forecasting technique for short-term electric load, especially for nonlinear data. The basic concepts of the ANN are the state of each network unit and the sigmoid function that maps this state to the unit's output.
The first-order derivative of the output unit with respect to its state is also needed, because the training algorithm uses it when correcting the weights and thresholds.
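For reference, the sigmoid activation that gives this model its name, and its first-order derivative, are usually written as below. This is the standard logistic form; the symbols $s_j$ (state of unit $j$) and $o_j$ (output of unit $j$) are notation assumed for this sketch.

```latex
o_j = f(s_j) = \frac{1}{1 + e^{-s_j}}, \qquad
f'(s_j) = f(s_j)\bigl(1 - f(s_j)\bigr) = o_j\,(1 - o_j).
```

The derivative depends only on the unit's output, which is why weight corrections in a sigmoid network can be computed from quantities already available after the forward pass.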
The specific algorithm of the Sigmoid-Function ANN model is as follows:
(1) Initialize every weight and threshold to a small random number.
(2) Provide the training samples, each consisting of an input vector and its expected output. Steps (3) to (5) are carried out for each input sample.
(3) Compute the actual output and the state of the hidden units in the network.
(4) Calculate the training error.
(5) Correct the weights and thresholds.
(6) After the last sample has been processed, judge whether the stopping criterion is met.
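The training loop above can be sketched in a few lines of Python. This is a minimal illustration with one hidden layer and per-sample gradient descent, not the configuration used in this paper: the function name `train_sigmoid_ann`, the hidden-layer size, learning rate, epoch limit, and tolerance are all assumptions chosen for the example.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def train_sigmoid_ann(samples, n_hidden=3, lr=0.5, epochs=2000, tol=1e-3, seed=0):
    """Train a one-hidden-layer sigmoid network (hypothetical settings).

    samples: list of (input_vector, target) pairs with targets scaled into (0, 1).
    Returns (predict_function, error_of_last_epoch).
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: small random weights; the last entry of each row is the threshold (bias).
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        # Step 3: hidden-unit outputs and the actual network output.
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w_hid]
        out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
        return hidden, out

    error = float("inf")
    for _ in range(epochs):
        error = 0.0
        for x, target in samples:  # Step 2: loop over the training samples.
            hidden, out = forward(x)
            diff = target - out
            error += 0.5 * diff * diff  # Step 4: accumulate the training error.
            # Step 5: corrections use the sigmoid derivative f' = f * (1 - f).
            delta_out = diff * out * (1.0 - out)
            delta_hid = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                         for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] += lr * delta_out * hidden[j]
            w_out[-1] += lr * delta_out
            for j in range(n_hidden):
                for i in range(n_in):
                    w_hid[j][i] += lr * delta_hid[j] * x[i]
                w_hid[j][-1] += lr * delta_hid[j]
        if error < tol:  # Step 6: stop once the epoch error is small enough.
            break
    return (lambda x: forward(x)[1]), error
```

Because the output unit is a sigmoid, load values would have to be scaled into the interval (0, 1) before training and scaled back afterwards.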
3. Time Series Theory
Time series analysis is a typical time-domain analysis method. It reveals the internal laws of a sequence from the perspective of autocorrelation.
Typical time-domain analysis steps are the following:
(1) Observe the features of the sequence.
(2) Select an appropriate model to fit according to the features computed by SAS.
(3) Test and optimize the model.
(4) Use the fitted model to infer the nature of the sequence.
The core of the time series analysis method was proposed by the statisticians George E. P. Box and Gwilym M. Jenkins in their book Time Series Analysis: Forecasting and Control, in which it is called the autoregressive integrated moving average (ARIMA) model. Some important concepts are given here.
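Stationarity. Let $\{X_t\}$ be a time series. If, for any positive integer $m$, any $t_1, t_2, \ldots, t_m \in T$, and any integer $\tau$,
$$F_{t_1, t_2, \ldots, t_m}(x_1, x_2, \ldots, x_m) = F_{t_1+\tau, t_2+\tau, \ldots, t_m+\tau}(x_1, x_2, \ldots, x_m),$$
then $\{X_t\}$ is called a strictly stationary time series.
White Noise. If a time series $\{X_t\}$ satisfies, for all $t, s \in T$,
$$E X_t = \mu, \qquad \gamma(t, s) = \begin{cases} \sigma^2, & t = s, \\ 0, & t \neq s, \end{cases}$$
then $\{X_t\}$ is called a white noise sequence, written as $X_t \sim WN(\mu, \sigma^2)$.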
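Whether a sequence behaves like white noise is usually checked with a portmanteau statistic on its sample autocorrelations. The sketch below computes the Ljung-Box Q statistic in plain Python as an illustration; SAS reports a test of this kind in its autocorrelation check, and the function names here are assumptions for the example.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    rho = autocorrelations(series, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))
```

Under the white noise hypothesis, Q approximately follows a chi-square distribution with `max_lag` degrees of freedom, so with 6 lags a value above about 12.59 rejects white noise at the 5% level.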
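Definition 1. For a centered (zero-mean) series $\{x_t\}$, a model with the following structure is called an autoregressive moving average model, abbreviated as ARMA$(p, q)$:
$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}, \qquad \phi_p \neq 0,\ \theta_q \neq 0,$$
where $\{\varepsilon_t\}$ is a zero-mean white noise sequence.
Introducing the delay operator $B$, it can also be presented as
$$\Phi(B)\, x_t = \Theta(B)\, \varepsilon_t,$$
where $\Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\Theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$.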
Cointegration Theory. The cointegration theory was put forward by Engle and Granger in 1987. A model can be fitted without requiring all sequences to be stationary, provided that a cointegration relationship exists among them. The typical cointegration test is the Engle-Granger (EG) test [18–20].
Definition 2. Supposing that the response sequence and the input variable sequences are all stationary, a regression model of the response sequence on the input variable sequences is established.
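The regression structure in Definition 2 is commonly written in transfer-function form. The block below is a sketch of that standard form and may differ in detail from the expression intended here; the symbols $y_t$ (response), $x_{it}$ (the $i$th input), $l_i$ (its delay), $a_t$ (zero-mean white noise), and the polynomial names are assumed notation.

```latex
y_t = \mu + \sum_{i=1}^{k} \frac{\Theta_i(B)}{\Phi_i(B)}\, B^{l_i} x_{it} + \varepsilon_t,
\qquad
\varepsilon_t = \frac{\Theta(B)}{\Phi(B)}\, a_t .
```

In this form $\mu$ is the mean, $\Phi_i(B)$ and $\Theta_i(B)$ are the autoregressive and moving average coefficient polynomials attached to the $i$th input, and $\Phi(B)$ and $\Theta(B)$ are the corresponding polynomials of the residual sequence $\varepsilon_t$.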
In the actual modeling process, an improved ARIMAX model is proposed to forecast the short-term electric load. The specific process is displayed below.
4. The Improved ARIMAX Modeling Process
4.1. Modeling Steps
The modeling steps are as follows:
(1) Perform a logarithmic transformation on the original response sequence and the input sequences so that they meet the homogeneity of variance assumption.
(2) Check the stationarity of the log-transformed sequences. If they are stationary, move on to the next step; if not, difference the logarithmic sequences and test stationarity again, continuing with second-order and higher differences until stationarity is satisfied.
(3) Establish the ARMA model for the N-order differenced stationary input sequences.
(4) Establish the ARMA model for the N-order differenced stationary response sequence.
(5) Examine the cross-correlation coefficients between the stationary N-order differenced logarithmic input and response sequences to determine the structure of the improved ARIMAX model. This step is the improvement over the traditional ARIMAX model; the revised ARIMAX model is then fitted with this structure.
(6) Check that the residual sequence of the fitted model is a zero-mean white noise sequence.
Based on the above steps, the improved ARIMAX model can be applied to the load forecasting process.
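Steps (1) and (2) are simple transformations and can be sketched as follows. The helper names and the example values are hypothetical, and the stationarity test itself (for example an ADF test) is left to a statistics package.

```python
import math


def log_transform(series):
    """Step (1): natural logarithm of a strictly positive series."""
    return [math.log(v) for v in series]


def difference(series, order=1):
    """Step (2): apply first differencing `order` times."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Hypothetical daily peak loads, for illustration only.
peak_load = [812.0, 830.5, 845.2, 861.0, 858.3]
log_diff_load = difference(log_transform(peak_load), order=1)
```

Each round of differencing shortens the series by one observation, so the input and response sequences should be differenced to the same order to stay aligned.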
4.2. Modeling Flowchart
See Figure 1.
5. Load Forecasting with ARMA Model
5.1. Load Data
The table in the appendix (Table 12) shows the daily maximum power load and the maximum temperature in a city from 1st June to 14th August. In this section, classical time series models and ANN models are first applied to these data to forecast the load. In Section 6, an improved ARIMAX model is established and its prediction accuracy is compared with theirs [21–23].
5.2. Establishing ARMA Model
Time series analysis of the load data in SAS yields the autocorrelation table. Table 1 shows that the autocorrelation coefficients of the sequence are always positive [26–28]. It can be inferred that the daily peak power load is a nonstationary series with a monotonic trend, as shown in Figure 2.
The partial autocorrelation table is obtained at the same time. Table 2 shows that only the first-order partial autocorrelation coefficient is significantly greater than two standard errors. The remaining partial autocorrelation coefficients decline rapidly to zero and fluctuate randomly within the range of two standard errors. The partial autocorrelation can thus be regarded as cutting off after the first order.
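The quantities read from Tables 1 and 2 can be reproduced in outline with the Durbin-Levinson recursion. The sketch below is a plain-Python illustration rather than the SAS procedure used here; the function names are assumptions, and the two-standard-error band is approximated by $2/\sqrt{n}$.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson recursion)."""
    rho = acf(series, max_lag)
    phi_prev = []  # coefficients of the order k-1 autoregression
    result = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def outside_two_standard_errors(coefficients, n):
    """1-based lags whose coefficient exceeds the approximate band 2 / sqrt(n)."""
    bound = 2.0 / math.sqrt(n)
    return [lag for lag, c in enumerate(coefficients, start=1) if abs(c) > bound]
```

A partial autocorrelation that leaves the band only at lag 1 is the usual signature of a first-order autoregressive structure.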
According to the white noise test, the p value of the test statistic is less than 0.05; thus the sequence is not white noise. The AR(1) model suggested by the first-order cutoff is then applied to forecast the power load data. In the residual autocorrelation test for this model, the p value of the statistic is larger than 0.05; thus the model is adequate.
After the SAS processing, model can be presented as
In order to optimize the ARMA model, the minic option is used to detect the best order . Setting model as , the option detects the . The model is presented as
6. Load Forecasting with Improved ARIMAX Model
6.1. Testing Statistics
For every model type considered in the ADF test, the p value of the test statistic is significantly greater than 0.05, so the daily maximum power load series is markedly nonstationary. The following analysis is therefore conducted on a nonstationary data sequence.
First, a logarithmic transformation is performed on the original sequences.
After this transformation the sequences meet the homogeneity of variance assumption. The white noise test indicates that the first log-transformed sequence is not white noise, and the unit root test gives a p value significantly greater than 0.05, which suggests that the sequence is nonstationary with at least one unit root. The analysis of the other log-transformed sequence is similar [23–25].
Second, first-order differencing is applied to both logarithmic sequences to obtain stationary sequences.
Third, the stationarity test and the white noise test are applied to the first-order differenced logarithmic sequences. The p value of the white noise test is greater than 0.05, which means that the differenced sequences are purely random white noise sequences, and the p value of the unit root statistic is less than 0.05, showing that they are stationary. This completes the test analysis.
6.2. Computing and Sequences Model
Firstly, the model is established. The test shows that is a stationary white noise sequence; thus the fitting model is
Secondly, the model is established (see Table 6). The test results obtained by SAS from Tables 3 to 5 show that is a stationary white noise sequence (the value in Table 3 is larger than 0.05, while the value in Table 4 is smaller than 0.05). The best order for model is . Therefore, the fitting model is or model. The constant term is not significant, using the noint option to remove the intercept. The final fitting model is shown as 
6.3. Computing Load Data with Improved ARIMAX Model
The above model is used to filter (prewhiten) the input variable sequence and the response variable sequence. The cross-correlation coefficients between the input variable and the response variable are then calculated from the filtered sequences with the ARIMA procedure.
Table 7 shows that only the cross-correlation coefficient at lag 0 is significantly nonzero, which means that there is no lagged effect between the response sequence and the input sequence. The model is therefore built on same-period values.
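The lag check described here can be illustrated with a sample cross-correlation function. The sketch assumes the two sequences have already been filtered and have equal length; the function names are hypothetical, and the significance band is the usual approximation $2/\sqrt{n}$.

```python
import math


def cross_correlation(x, y, max_lag):
    """Sample cross-correlation between x[t - k] and y[t] for k = 0..max_lag."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    scale = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    return {k: sum(dx[t - k] * dy[t] for t in range(k, n)) / scale
            for k in range(max_lag + 1)}


def significant_lags(ccf, n):
    """Lags whose coefficient lies outside about two standard errors."""
    bound = 2.0 / math.sqrt(n)
    return [k for k, r in ccf.items() if abs(r) > bound]
```

If `significant_lags` returns only lag 0, the input enters the regression in the same period as the response, which is the situation reported in Table 7.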
The regression analysis in Table 8 shows that the final regression coefficient is 0.37098.
The statistics test is carried out on residual sequence, showing that the residual sequence is stationary white noise sequence (). The fitted model for residual sequence is , and is zero mean white noise sequence [32–35].
Table 7 shows a significant correlation at lag 0 between the two sequences. The same-period model between the two sequences is established from the parameter estimates in Table 8 and the tests in Tables 9 and 10. The p value in Table 9 is larger than 0.05; thus the autocorrelation check of residuals shows that the model is effective for forecasting loads:
The load from 15th to 31st August is forecast with the improved ARIMAX model.
Converting the forecast results back from the logarithmic scale gives the maximum load for the next 15 days, which is shown in Table 11.
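Because the model is fitted to first-order differences of the logarithmic load, the forecasts have to be cumulated and exponentiated to return to load units. The sketch below shows that back-transformation under this assumption; the function name and the numbers in the usage line are hypothetical.

```python
import math


def undo_log_difference(last_log_level, forecast_log_diffs):
    """Cumulate forecast log-differences from the last observed log level, then exponentiate."""
    level = last_log_level
    loads = []
    for d in forecast_log_diffs:
        level += d
        loads.append(math.exp(level))
    return loads


# Hypothetical values: last observed load of 900 and three forecast log-differences.
forecast_loads = undo_log_difference(math.log(900.0), [0.01, -0.005, 0.02])
```

Each forecast builds on the previous one, so an error in an early log-difference carries through to all later load values.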
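MAE (Mean Absolute Error) is computed as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|,$$
where $\hat{y}_i$ is the prediction value, $y_i$ is the actual value, and $n$ is the sample size.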
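The MAE definition translates directly into code. The function below is a minimal sketch; its name and argument order are assumptions for the example.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

For instance, predictions of 10, 12, and 9 against actual values of 11, 12, and 12 have absolute errors 1, 0, and 3, giving an MAE of 4/3.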
Table 11 shows that the improved ARIMAX model has the smallest MAE, followed by the Sigmoid-Function ANN model and then the AR model, while the ARMA model has the largest. Together with the AIC and SBC criteria in Table 11, this indicates that the improved ARIMAX model is better than the S-ANN, AR, and ARMA models. The revised ARIMAX model is thus more effective and gives more accurate load forecasts.
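The AIC and SBC criteria both penalize the maximized log-likelihood by the number of estimated parameters, and a smaller value indicates a better model. The sketch below uses the standard definitions with a Gaussian likelihood evaluated from the residuals; the function names are assumptions, and statistical packages may differ in details such as which parameters are counted.

```python
import math


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood with the variance estimated as RSS / n."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return -2.0 * log_likelihood + 2.0 * n_params


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

SBC penalizes each extra parameter by $\ln n$ rather than 2, so for samples with more than seven observations it favors more parsimonious models than AIC does.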
It can be seen in Figure 3 that the blue line is the actual daily maximum power load data, while the red line is the forecasting data of improved ARIMAX. The difference between improved ARIMAX model and actual power load data is the minimum among these models. Residual stationarity and white noise test show that the residual is stationary white noise sequence, showing that . There is second-order delay correlation between and . The final fitting model is
Based on the above analysis, the improved ARIMAX model can effectively extract the autocorrelation information in load data. As a method for short-term load forecasting, it gives more accurate predictions than traditional time series models, which is of high value in engineering applications. The relative error analysis of the ARMA and improved ARIMAX models confirms that the revised model has higher prediction accuracy than the usual forms.
|:||The state of network unit|
|:||Output (hidden) layer unit|
|:||Initial value of weight or threshold|
|:||Mean of time series|
|:||Random interference coefficient|
|:||-order moving average coefficient polynomials|
|:||Residual sequence moving average coefficient polynomials|
Conflict of Interests
The authors declare no conflict of interests.
The authors gratefully acknowledge the financial support from the National Natural Science Foundation of China (no. 71471061).
K. Huarng and H.-K. Yu, “A dynamic approach to adjusting lengths of intervals in fuzzy time series forecasting,” Intelligent Data Analysis, vol. 8, no. 1, pp. 3–27, 2004.
K. Wangdi, P. Singhasivanon, T. Silawan, S. Lawpoolsri, N. J. White, and J. Kaewkungwal, “Development of temporal modelling for forecasting and prediction of malaria infections using time-series and ARIMAX analyses: a case study in endemic districts of Bhutan,” Malaria Journal, vol. 9, article 251, 2010.
S. Chadsuthi, C. Modchang, Y. Lenbury, S. Iamsirithaworn, and W. Triampo, “Modeling seasonal leptospirosis transmission and its association with rainfall and temperature in Thailand using time-series and ARIMAX analyses,” Asian Pacific Journal of Tropical Medicine, vol. 5, no. 7, pp. 539–546, 2012.
X. H. Yang, D. X. She, Z. F. Yang, Q. H. Tang, and J. Q. Li, “Chaotic Bayesian method based on multiple criteria decision making (MCDM) for forecasting nonlinear hydrological time series,” International Journal of Nonlinear Sciences & Numerical Simulation, vol. 10, no. 11-12, pp. 1595–1610, 2009.
H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, Cambridge University Press, Cambridge, UK, 1997.
G. G. Szpiro, “Forecasting chaotic time series with genetic algorithms,” Physical Review E—Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, vol. 55, no. 3, pp. 2557–2568, 1997.