Abstract

Support vector regression (SVR) and neural networks (NN) are both tools from the artificial intelligence field that have been successfully applied to various problems, especially time series forecasting. However, traditional SVR and NN cannot accurately describe intricate time series characterized by high volatility, nonstationarity, and nonlinearity, such as wind speed and electricity price series. This study proposes an ensemble approach based on the 5-3 Hanning filter (5-3H) and wavelet denoising (WD) techniques, in conjunction with SVR and NN models tuned by artificial intelligence optimization algorithms. To confirm the validity of the proposed model, two case studies are conducted on wind speed series from Gansu Province in China and electricity prices from New South Wales in Australia. The computational results reveal that cuckoo search (CS) outperforms both particle swarm optimization (PSO) and the genetic algorithm (GA) with respect to convergence and global searching capacity, and that the proposed CS-based hybrid model is effective and feasible in generating more reliable and skillful forecasts.

1. Introduction

In the contemporary energy market, the demand for electricity is soaring owing to economic and social development, while reserves of fossil fuel for power generation are being exhausted and ecological problems are increasing. Under these conditions, renewable, clean, and nonpolluting energy is becoming an alternative to fossil fuel, and wind energy satisfies these requirements. Meanwhile, with the increasing generation of wind power and its growing integration into the grid, electricity generation based on wind energy has been playing an increasing role in China. The installed wind power capacity increased by approximately 200% between 2005 and 2009 [1]. Despite the high cost of wind power plants, wind power has unique advantages, especially at remote locations in China that are rich in wind energy resources.

Wind speed series from the northwest of China, specifically Wuwei City and Jinchang City in Gansu Province, exhibit complex characteristics such as high volatility, nonstationarity, and nonlinearity. To operate efficiently in the wind power market, forecasting wind power production is essential for farm owners and helps producers make decisions on the sale of energy, thus increasing production and profits. If the wind speed for the following period can be predicted accurately, the total amount of active power that each generator on a wind farm can produce can be determined. Wind speed prediction is therefore receiving more and more attention [2].

However, as a result of the complicated characteristics of wind speed, such as chaotic fluctuation, nonstationarity, and nonlinearity, forecasting remains a most challenging task. To predict wind speed efficiently, research on forecasting wind power or wind speed has been devoted to the development of reliable and effective tools, and many different methods have been reviewed and proposed in [3–5].

As wind speed is highly volatile and nonstationary, some additional preprocessing techniques have been proposed to remove irregular wind speed values, such as the empirical mode decomposition (EMD) method [6–8] and the wavelet transform (WT) or wavelet denoising (WD) method [9–14]. WT has been widely applied to represent a signal in both the frequency and time domains, and it has recently been used extensively to analyze nonstationary and highly fluctuating series. It decomposes the original complicated data into several wavelet components, one of which is smooth and reflects the inherent, real information. Because the factors behind wind speed fluctuation are complex, the wavelet transform is used as a preprocessing procedure to obtain better performance. In [13], WT is used to decompose the original wind speed series into a detail signal and an approximation signal in order to remove the abnormal fluctuation of the wind speed series before further modeling. Catalão et al. [9] propose artificial neural networks combined with WT for short-term wind power forecasting in Portugal. Their model is both effective and novel, outperforming persistence, ARIMA, and NN approaches. EMD is based upon the local characteristic time scale of the signal and can decompose a complicated signal into a number of intrinsic mode functions (IMFs) for further modeling [15].

Zhang et al. [16] stated that the frequently used approaches to wind speed series forecasting can be classified into statistical models and artificial intelligence (AI) algorithms. The former establish time series models to predict future speed by mining the information contained in the historical signals. Time series methods include the Autoregressive Integrated Moving Average (ARIMA) model, which is used for forecasting wind power in US wind farms in [17, 18]. Some of these models outperform the persistence model. The Autoregressive Conditional Heteroscedastic (ARCH) model is combined with the ARIMA model to take the heteroscedasticity of the residual series into account [19]. An ARMA-GARCH-M framework is employed to examine 26 regional wind power energy markets in the US using daily average wind speed [20]. It revealed that wind speed displays time-varying volatility and that the relationship between the mean and the volatility of wind speed series differs across locations. In [21], the ARIMA-ARCH model is employed to predict the wind speed series itself, and it is demonstrated that the ARIMA-ARCH model offers better performance than a single ARIMA model.

Another approach is the intelligent algorithm models, which build a nonlinear model to fit the historical wind speed series by minimizing the training error, such as Artificial Neural Networks (ANN). ANN is widely used in many fields, such as stock price [22], electricity price [23], load forecasting [24, 25], gas consumption [26], and wind speed [27, 28]. A typical artificial neural network, the Backpropagation Neural Network (BPNN) [29], is in effect a mapping from the input vector(s) to the output that requires no prior knowledge of the correlation between the data. It has been proven mathematically that BPNN can implement any complicated nonlinear mapping and approximate an arbitrary nonlinear function with satisfactory accuracy [30]. By learning the historical data pattern, BPNN can be effectively utilized to predict the series over a new horizon. Similarly, support vector regression (SVR) is also designed to capture the nonlinear patterns in time series [1, 31, 32], and it has been observed to model nonlinear wind speed with excellent performance. Nevertheless, one disadvantage of the method is the difficulty of selecting the parameter values of the support vector machine, because the way these values are selected affects the generalization performance remarkably. In this paper, metaheuristic optimization algorithms (GA, PSO, and CS) are applied to select the parameter values.

Owing to the chaotic fluctuation, nonstationarity, and nonlinearity of wind speed series, hybrid models combining linear and artificial intelligence methods are often proposed in research on wind speed forecasting. Liu et al. [33] proposed two hybrid methods: the ARIMA-ANN and ARIMA-Kalman models. The ARIMA model is utilized to determine the structure of the ANN and to initialize the measurement and state equations of the Kalman filter. Su et al. [34] combined ARIMA and the Kalman filter to predict the daily mean wind speed in the west of China. To develop a novel hybrid model that is adapted to the data set and to increase the fitting accuracy, this approach used Particle Swarm Optimization (PSO) to optimize the parameters of the ARIMA model. Both approaches perform well and have been applied to wind speed forecasting. A hybrid of ARIMA and ANN is employed in [35]. The ARIMA models were used to forecast the linear pattern, and ANNs were then built on the resulting errors to forecast the nonlinear tendencies that ARIMA could not identify. The results reveal that these hybrid models have a higher forecasting accuracy than the single ARIMA and ANN models.

A large amount of research has been directed to the development of reliable and accurate wind speed and power prediction models. However, it is difficult to conclude which model is the best, because a model may perform well at one site but not at others. In other words, a potential best forecasting model at one site is not guaranteed to work well at another site. This paper discusses forecasting accuracy at different sites and in different months based on a preprocessing method, and compares a new optimization algorithm with some conventional optimization algorithms used in forecasting models. In most cases, statistical tools can provide accurate results in short-term, medium-term, and long-term prediction. However, for the very short-term and short-term horizons, the effect of atmospheric dynamics on the wind speed becomes more important, so in these cases the use of physical approaches becomes important. This paper explores the accuracy of very short-term (10-minute) 3-step forecasting using statistical approaches.

The main contributions of this paper are as follows. Several standard forecasting models (SVR, BP, and Elman) are used to forecast wind series, and each of these models performs well. To improve accuracy further, two additional kinds of techniques are proposed in this paper. The first is to use the 5-3 Hanning filter and wavelet denoising as a preprocessing procedure. The second is a new metaheuristic algorithm, cuckoo search, which is introduced to optimize the parameters of SVR and is compared with grid search (GS) and two conventional optimization algorithms (GA and PSO). To demonstrate that the proposed method is effective, the electricity price in New South Wales is also used to build the proposed models, with satisfying results.

This paper is organized as follows. The theories underlying the proposed approach are described in Section 2, including the 5-3 Hanning filter and WD, SVR, BP, Elman, and the optimization algorithms. Section 3 presents the proposed hybrid model. In Section 4, numerical results and an evaluation of forecasting performance in the case studies are shown. Section 5 provides some conclusions and suggestions.

2.1. The Data Preprocessing Method

Proper data preprocessing can effectively remove useless information, such as outliers and noise, from a time series. As wind speed is highly volatile and nonstationary, some preprocessing procedures are introduced to remove irregular wind speed values and outliers in the electricity price.

2.1.1. The Proposed 5-3 Hanning Filter (5-3H) Technique

The 5-3H method is short for the five-three-Hanning median smoothing method (“five” stands for median-of-five smoothing, “three” for median-of-three smoothing, and “H” for Hanning smoothing). This method, presented by Tukey, smooths the original data three times in succession to generate the final smoothed estimates. Tukey introduces three steps for the signal preprocessing: five-point moving median smoothing, three-point moving median smoothing, and Hanning moving average smoothing, respectively. The flowchart of this method is illustrated in Figure 1.

Let the original data be $x_t$, $t = 1, 2, \ldots, N$, where $N$ is the length of the time series $\{x_t\}$. The steps are as follows.

Step 1. Five-point moving median smoothing. The original data $x_{t-2}$, $x_{t-1}$, $x_t$, $x_{t+1}$, $x_{t+2}$, sorted in ascending order, are written as $x_{(1)}$, $x_{(2)}$, $x_{(3)}$, $x_{(4)}$, $x_{(5)}$. The five-point moving median smoothed signal is
$$x'_t = x_{(3)} \quad \text{for } t = 3, \ldots, N-2,$$
where $x'_t$ is the $t$th point in the first smoothed series and $x_t$ is the $t$th point in the original signal. The four items then missing from the series ($x'_1$, $x'_2$, $x'_{N-1}$, $x'_N$) are estimated separately.

Step 2. Three-point moving median smoothing. The smoothed signal from the first step is smoothed again with a three-point moving median to form the second smoothed estimates. The series $x'_{t-1}$, $x'_t$, $x'_{t+1}$, sorted from small to large, is written as $x'_{(1)}$, $x'_{(2)}$, $x'_{(3)}$. The three-point moving median smoothed signal is
$$x''_t = x'_{(2)} \quad \text{for } t = 4, \ldots, N-3,$$
where $x''_t$ is the $t$th point in the second smoothed time series. The six items then missing from the series (three at each end) are estimated separately.

Step 3. Hanning moving average smoothing. The second smoothed signal is passed through a Hanning filter to produce the final smoothed signal. For a Hanning smooth,
$$x'''_t = \tfrac{1}{4}x''_{t-1} + \tfrac{1}{2}x''_t + \tfrac{1}{4}x''_{t+1} \quad \text{for } t = 5 \text{ to } N-4,$$
where $x'''_t$ is the $t$th point in the final smoothed signal. The items missing at the two ends of the series are estimated separately.

Step 4. Compute the median absolute deviation (MAD). MAD reflects the absolute dispersion of the original data. The median of $\{x_t\}$ can be written as
$$M = \operatorname{median}(x_1, x_2, \ldots, x_N),$$
and MAD can be expressed as
$$\mathrm{MAD} = \operatorname{median}\left(|x_t - M|\right), \quad t = 1, \ldots, N.$$

Step 5. Set a threshold to remove outliers and smooth the data. In this paper, the threshold value is set to 0.3. The smoothed series $x'''_t$ replaces those original data that need to be replaced, as indicated by a logical variable valued either 0 or 1; this gives the preliminary 5-3H values. Replacing the eight values at the beginning and end of the preliminary 5-3H values then gives the final 5-3H values.
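The three smoothing passes and the outlier replacement can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the end points that the filters cannot reach are simply kept at their raw values, and the outlier rule (a point is replaced when its distance from the smoothed value exceeds the threshold times MAD) is an assumed reading of Step 5.

```python
import numpy as np

def smooth_53h(x):
    """Median-of-5, then median-of-3, then Hanning (1/4, 1/2, 1/4).

    Assumption: the four values at each end that the cascade cannot
    reach are left at their raw values.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Pass 1: running median of five, defined for 0-based indices 2..n-3.
    m5 = np.array([np.median(x[t - 2:t + 3]) for t in range(2, n - 2)])
    # Pass 2: running median of three on the first pass, indices 3..n-4.
    m3 = np.array([np.median(m5[k - 1:k + 2]) for k in range(1, len(m5) - 1)])
    # Pass 3: Hanning weights on the second pass, indices 4..n-5.
    h = 0.25 * m3[:-2] + 0.5 * m3[1:-1] + 0.25 * m3[2:]
    out = x.copy()
    out[4:n - 4] = h
    return out

def replace_outliers(x, threshold):
    """Replace flagged points by their 5-3H value (assumed decision rule)."""
    x = np.asarray(x, dtype=float)
    smoothed = smooth_53h(x)
    mad = np.median(np.abs(x - np.median(x)))
    flag = np.abs(x - smoothed) > threshold * mad  # logical 0/1 variable
    return np.where(flag, smoothed, x), flag

# A ramp with one spike at index 10.
x = np.arange(20, dtype=float)
x[10] = 100.0
cleaned, flag = replace_outliers(x, threshold=0.3)
```

In this toy series only the spike is flagged, and it is replaced by a value close to the underlying ramp.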

2.1.2. Wavelet Transform (WT)

The WT method is an effective mathematical method for analyzing a signal by decomposing it into various frequencies. WTs can be categorized into two kinds: the Continuous Wavelet Transform (CWT) and the Discrete Wavelet Transform (DWT). In the DWT, the wavelets are discretely sampled. A key advantage of WTs over Fourier transforms is their temporal resolution: they capture both frequency and location (time) information. In this work, DWT is used to decompose the original wind speed data.

WT decomposes a signal into several detail components and an approximation component, where the approximation component contains the low-frequency information, the most essential part for identifying the signal, and the detail components reveal the noise of the signal. Figure 2 is a wavelet decomposition tree displaying the decomposition procedure. Firstly, the original data is decomposed into an approximation component $A_1$ and a detail component $D_1$; then $A_1$ is further decomposed into another approximation component $A_2$ and detail component $D_2$ if it is necessary to analyze the signal at a higher resolution level. This process continues until a suitable number of levels is reached.

The original wind speed data is decomposed into several components, one approximation component and multiple detail components, to reflect the characteristics of the wind speed data on different levels. The approximation is designed to present the main trend of the original wind speed and the details are designed to present the stochastic volatilities on different levels. A suitable number of levels can be determined by comparing the similarity between the approximation and the original wind speed.
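To make the split into one approximation and one detail component concrete, the following sketch performs a single-level decomposition, shrinks the detail coefficients, and reconstructs the series. It deliberately uses the Haar wavelet and a fixed soft threshold because both fit in a few lines of NumPy; these are stand-ins chosen for illustration, not the db3 basis or the threshold selection rule used in this paper, and all function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # pad odd-length input with its last value
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def denoise_one_level(x, thr):
    """Soft-threshold the detail coefficients and reconstruct the series."""
    approx, detail = haar_dwt(x)
    detail = np.sign(detail) * np.maximum(np.abs(detail) - thr, 0.0)
    return haar_idwt(approx, detail)[:len(x)]

series = [1.0, 3.0, 2.0, 2.0]
unchanged = denoise_one_level(series, thr=0.0)    # perfect reconstruction
smoothest = denoise_one_level(series, thr=10.0)   # details removed entirely
```

With a zero threshold the series is recovered exactly; with a threshold larger than every detail coefficient only the approximation (the pairwise trend) survives.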

2.2. Artificial Intelligence Algorithm

Zhang et al. [16] considered that statistical models are not perfect in forecasting. Most statistical models assume that the data are normally distributed, whereas wind speed series are not [36]. Additionally, the stochastic and intermittent characteristics of wind speed series require more complex models and functions to capture the nonlinear trends and relations, whereas statistical models are built on the hypothesis that a linear correlation structure exists among the time series values [37]. Consequently, wind speed series are difficult to forecast accurately with statistical models. To address these problems, AI models, mainly ANN and SVR, have attracted more and more attention for accurate short-term wind speed prediction.

2.2.1. Artificial Neural Network (ANN)

An ANN consists of interconnected artificial neurons which are programmed to imitate the natural properties of biological neurons. It has been widely used in forecasting time series, especially data that are not normally distributed, such as wind speed.

2.2.2. Backpropagation Neural Network (BPNN)

In this work, a backpropagation (BP) network is adopted as one of the comparative approaches for short-term wind speed forecasting. As shown in Figure 3, the BP network contains an input layer, at least one hidden layer, and an output layer, which map an input vector to an output scalar via the activation functions of the neurons. With $n$ inputs and $m$ hidden neurons, the output $h_j$ of the $j$th hidden node can be calculated as
$$h_j = f\left(\sum_{i=1}^{n} w_{ij}\, x_{t-i}\right), \quad j = 1, \ldots, m,$$
where $w_{ij}$ denotes the connection weight from the $i$th input node to the $j$th hidden node, $x_{t-i}$ is the past wind speed $i$ steps behind $x_t$, and $f(\cdot)$ denotes a sigmoid activation function in the hidden layer. Then, the wind speed prediction can be estimated by
$$\hat{x}_t = g\left(\sum_{j=1}^{m} w_j\, h_j\right),$$
where $w_j$ denotes the connection weight from the $j$th hidden node to the output node, $\hat{x}_t$ denotes the forecasted wind speed at the $t$th sampling instant, and $g(\cdot)$ is a linear activation function for the output layer. The nonlinear mapping capability of the ANN is achieved by minimizing the overall error between the actual wind speed and the predicted wind speed through the Levenberg-Marquardt (LM) algorithm [38].
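The forward pass described above (sigmoid hidden layer, linear output) can be sketched in a few lines. The function and variable names are hypothetical, bias terms are omitted because the description does not mention them, and training by the Levenberg-Marquardt algorithm is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_forward(lagged_speeds, w_hidden, w_output):
    """One forward pass of a single-hidden-layer network.

    lagged_speeds: shape (n,), the n most recent past wind speeds
    w_hidden:      shape (n, m), weights from input i to hidden node j
    w_output:      shape (m,), weights from hidden node j to the output
    """
    hidden = sigmoid(lagged_speeds @ w_hidden)   # sigmoid hidden layer
    return float(hidden @ w_output)              # linear output layer

# Toy check: with zero hidden weights every hidden node outputs 0.5.
lags = np.array([5.1, 4.8, 5.3])
prediction = bp_forward(lags, np.zeros((3, 4)), np.ones(4))
```

Here four hidden nodes each contribute 0.5, so the toy prediction equals 2.0; real weights would come from training on the historical series.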

2.2.3. Elman Recurrent Neural Network (ERNN)

The Elman recurrent neural network (ERNN), introduced by Elman, is a well-known recurrent topology [39]. In a typical ERNN, the hidden layer neurons are fed by the outputs of the context neurons and the input neurons (Figure 4). Context neurons store the previous states of the hidden neuron outputs and thus act as memory units [40]. This recurrent topology makes the ERNN more sensitive to historical data, increasing its capacity to deal with dynamic information. In addition, it is not necessary to use state variables as the input or the training data. Its dynamic characteristics are provided by its internal connections, which make the network more suitable for modeling time-varying systems. This is also an important factor that makes the ERNN superior to feed-forward neural networks, such as multilayer perceptrons (MLP) and radial basis function (RBF) networks.

Because the ERNN training algorithm is mainly based on the gradient descent method, a number of problems may arise [41]: the network converges slowly, and training may be inefficient; as the network structure and weights are not trained concurrently, a good dynamic approximation performance cannot be guaranteed; and the lack of a global search capacity makes it easy to fall into a local optimum.
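The role of the context neurons can be illustrated with a small recurrence. No update equations are given in the text, so the sketch below assumes the standard Elman form, in which the hidden state depends on the current input and on the previous hidden state held by the context units; the tanh activation and all names are assumptions made for illustration.

```python
import numpy as np

def elman_run(series, w_input, w_context, w_output):
    """Run a scalar series through a one-layer Elman network.

    w_input:   shape (m,), weights from the scalar input to m hidden nodes
    w_context: shape (m, m), weights from the context units to the hidden nodes
    w_output:  shape (m,), weights from the hidden nodes to the scalar output
    """
    context = np.zeros(len(w_input))          # memory starts empty
    outputs = []
    for x_t in series:
        hidden = np.tanh(w_input * x_t + w_context @ context)
        outputs.append(float(w_output @ hidden))
        context = hidden                       # context copies the hidden state
    return np.array(outputs)

# With a single hidden node, the second output still reflects the first input.
out = elman_run([0.5, 0.0], np.array([1.0]), np.array([[1.0]]), np.array([1.0]))
```

Even though the second input is zero, the second output is nonzero because the context unit carries the earlier hidden state forward, which is the memory effect described above.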

2.2.4. Support Vector Regression (SVR)

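Developed by Vapnik [42], support vector machines (SVMs) are one of the most widely used models based on statistical learning theory. A nonlinear mapping $\varphi(\cdot): \mathbb{R}^n \to \mathbb{R}^{n_h}$ is defined to map the training data set (input data) $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ into a high dimensional feature space $\mathbb{R}^{n_h}$ (which may have infinite dimensions) (Figure 5). Then, in this high dimensional feature space, there theoretically exists a linear function, $f$, to formulate the nonlinear relationship between input data and output data. Such a linear function, namely, the SVR function, is given as follows:
$$f(\mathbf{x}) = \mathbf{w}^{T}\varphi(\mathbf{x}) + b, \quad (14)$$
where $f(\mathbf{x})$ denotes the forecasting values and the coefficients $\mathbf{w}$ ($\mathbf{w} \in \mathbb{R}^{n_h}$) and $b$ ($b \in \mathbb{R}$) are adjustable. As mentioned above, the SVM method aims to minimize the empirical risk,
$$R_{\text{emp}}(f) = \frac{1}{N}\sum_{i=1}^{N} \Theta_{\varepsilon}\left(y_i, \mathbf{w}^{T}\varphi(\mathbf{x}_i) + b\right),$$
where $\Theta_{\varepsilon}(y, f(\mathbf{x}))$ is the $\varepsilon$-insensitive loss function (Figure 6) and is defined as follows:
$$\Theta_{\varepsilon}(y, f(\mathbf{x})) = \begin{cases} |f(\mathbf{x}) - y| - \varepsilon, & \text{if } |f(\mathbf{x}) - y| \ge \varepsilon, \\ 0, & \text{otherwise.} \end{cases}$$

In addition, $\Theta_{\varepsilon}(y, f(\mathbf{x}))$ is employed to find an optimal hyperplane in the high dimensional feature space (Figure 5) that maximizes the distance separating the training data. So, the SVR focuses on finding the optimal hyperplane and minimizing the training error between the training data and the $\varepsilon$-insensitive loss function.

Then, the SVR minimizes the overall errors
$$\min_{\mathbf{w}, b, \xi, \xi^{*}} \; R(\mathbf{w}, \xi, \xi^{*}) = \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \quad (17)$$
with the constraints
$$y_i - \mathbf{w}^{T}\varphi(\mathbf{x}_i) - b \le \varepsilon + \xi_i^{*}, \qquad -y_i + \mathbf{w}^{T}\varphi(\mathbf{x}_i) + b \le \varepsilon + \xi_i, \qquad \xi_i, \xi_i^{*} \ge 0, \quad i = 1, \ldots, N.$$

The first term in (17) employs the concept of maximizing the distance between two separated training data sets and is used to regularize weight sizes, to penalize large weights, and to maintain the flatness of the regression function. The second term penalizes the training errors between $f(\mathbf{x})$ and $y$ by using the $\varepsilon$-insensitive loss function. $C$ is a parameter that trades off the two terms. Training errors above $\varepsilon$ are denoted as $\xi_i^{*}$, and training errors below $-\varepsilon$ are denoted as $\xi_i$ (Figure 6).

After solving the quadratic optimization problem with inequality constraints, the parameter vector $\mathbf{w}$ of (14) is obtained:
$$\mathbf{w} = \sum_{i=1}^{N}\left(\beta_i^{*} - \beta_i\right)\varphi(\mathbf{x}_i),$$
where $\beta_i^{*}$, $\beta_i$ are obtained by solving a quadratic program and are the Lagrangian multipliers. Finally, the SVR regression function is obtained in the dual space as
$$f(\mathbf{x}) = \sum_{i=1}^{N}\left(\beta_i^{*} - \beta_i\right)K(\mathbf{x}_i, \mathbf{x}) + b,$$
where $K(\mathbf{x}_i, \mathbf{x}_j)$ is called the kernel function, and the value of the kernel equals the inner product of the two vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ in the feature space, $\varphi(\mathbf{x}_i)$ and $\varphi(\mathbf{x}_j)$, respectively; that is, $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$. Any function that meets Mercer’s condition [42] can be used as the kernel function.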
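The building blocks of this subsection, namely the epsilon-insensitive loss, a Gaussian kernel, and the dual-form regression function, can be written out directly. The sketch below evaluates them for given multipliers and does not solve the quadratic program. The kernel form exp(-||x - x'||^2 / (2 sigma^2)) is an assumed parameterization of the Gaussian kernel, and `beta_diff` is a hypothetical name for the differences between the two sets of Lagrangian multipliers.

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """Zero inside the eps-tube, linear outside it."""
    return np.maximum(np.abs(np.asarray(f) - np.asarray(y)) - eps, 0.0)

def gaussian_kernel(X1, X2, sigma):
    """Kernel matrix between two sample sets of shape (n1, d) and (n2, d)."""
    X1, X2 = np.atleast_2d(X1), np.atleast_2d(X2)
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(X_new, X_train, beta_diff, b, sigma):
    """Dual-form prediction: kernel-weighted sum over training points plus b."""
    return gaussian_kernel(X_new, X_train, sigma) @ beta_diff + b

losses = eps_insensitive_loss([1.0, 1.0, 1.0], [1.05, 1.5, 0.4], eps=0.1)
pred = svr_predict(np.array([[0.0]]), np.array([[0.0], [1.0]]),
                   np.array([1.0, -1.0]), b=0.5, sigma=1.0)
```

The first residual lies inside the tube and costs nothing, while the other two are charged only for the part of the error that exceeds eps.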

2.3. Artificial Intelligent Optimization Algorithm

The empirical results show that the selection of the two parameters $C$ and $\sigma$ (the parameter of the Gaussian kernel function) in SVR influences the forecasting accuracy significantly. To further improve the forecasting accuracy of wind speed, we have employed different evolutionary algorithms (GA, PSO, and CS) for parameter determination and to identify which algorithm is suited to specific data patterns.

2.3.1. Genetic Algorithm (GA)

GA was first developed by John Holland et al. in the 1960s. It is an effective algorithm for nonlinear global optimization that was inspired by the biological evolution process. Owing to its simplicity and robustness, it is especially suitable for solving complicated optimization problems, and it has been used extensively in various forecasting and optimization fields. The GA approach is as follows.
(i) Select a group of random candidate solutions.
(ii) Iterate the following steps until a stopping criterion is reached:
(1) compute the fitness values of the candidate solutions in accordance with the adaptive condition,
(2) produce the next generation according to the proportionate principle (a candidate with higher fitness is more likely to be chosen),
(3) perform crossover and mutation operations on the candidate solutions to generate new ones.
(iii) Return the solutions.
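The steps listed above can be sketched as a small real-coded GA. The specific operators are assumptions chosen for brevity and are not taken from this paper: fitness is 1/(1 + cost), which presumes a nonnegative cost, crossover is an arithmetic blend of two parents, mutation adds Gaussian noise, and the best candidate is carried over unchanged (elitism).

```python
import numpy as np

def genetic_algorithm(cost, lower, upper, pop_size=30, n_gen=60,
                      mutation_rate=0.2, mutation_scale=0.3, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pop = rng.uniform(lower, upper, (pop_size, len(lower)))     # (i) random candidates
    history = []
    for _ in range(n_gen):                                      # (ii) iterate
        costs = np.array([cost(x) for x in pop])
        fitness = 1.0 / (1.0 + costs)                           # (1) higher is better
        elite = pop[np.argmin(costs)].copy()
        history.append(costs.min())
        probs = fitness / fitness.sum()                         # (2) proportionate selection
        parents = pop[rng.choice(pop_size, size=(pop_size, 2), p=probs)]
        mix = rng.random((pop_size, 1))                         # (3) arithmetic crossover
        children = mix * parents[:, 0] + (1.0 - mix) * parents[:, 1]
        mutate = rng.random(children.shape) < mutation_rate     #     Gaussian mutation
        children = children + mutate * rng.normal(0.0, mutation_scale, children.shape)
        pop = np.clip(children, lower, upper)
        pop[0] = elite                                          # keep the best so far
    costs = np.array([cost(x) for x in pop])
    best = np.argmin(costs)
    history.append(costs[best])
    return pop[best], costs[best], history                      # (iii) return solution

def sphere(x):
    return float(np.sum(x ** 2))

best_x, best_cost, history = genetic_algorithm(sphere, [-5.0, -5.0], [5.0, 5.0])
```

Because the elite candidate is always retained, the best cost recorded per generation never increases. In the SVR setting the two coordinates would be the kernel and penalty parameters and the cost a validation error.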

2.3.2. Particle Swarm Optimization (PSO)

The PSO algorithm was first proposed by Kennedy and Eberhart [43], inspired by the social swarming behavior of animals moving in large groups (birds and insects in particular). Like other swarm-based techniques, the algorithm contains a number of individuals refining their knowledge of the given search space. In this search space, the individuals, called particles, each have a position and a velocity. The PSO algorithm works by attracting the particles to positions of high fitness in the search space. Each particle has a memory and adjusts its trajectory according to two pieces of information: the best position that it has visited so far and the global best position attained by the whole swarm. The whole swarm can be considered as a society; the first piece of information results from the particle’s memory of its past states, and the second results from the collective experience of all individuals of the society. A fitness evaluation function in PSO takes each particle’s position and assigns it a fitness value. The position of highest fitness value visited by the swarm is called the global best, and each particle keeps track of it. The position of highest fitness value that a particle has personally visited is called its local best.
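A compact sketch of this mechanism is given below. The velocity update with an inertia weight and two acceleration coefficients is the commonly used form and is an assumption here, as are the coefficient values and the names; high fitness is expressed as low cost.

```python
import numpy as np

def pso(cost, lower, upper, n_particles=20, n_iter=100,
        inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = len(lower)
    pos = rng.uniform(lower, upper, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    # Each particle remembers its own best position (the local best).
    pbest, pbest_cost = pos.copy(), np.array([cost(x) for x in pos])
    g = np.argmin(pbest_cost)
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]        # swarm-wide best
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Pull each particle toward its own best and toward the global best.
        vel = (inertia * vel
               + c_personal * r1 * (pbest - pos)
               + c_global * r2 * (gbest - pos))
        pos = np.clip(pos + vel, lower, upper)
        costs = np.array([cost(x) for x in pos])
        improved = costs < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], costs[improved]
        g = np.argmin(pbest_cost)
        gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    return gbest, gbest_cost

def sphere(x):
    return float(np.sum(x ** 2))

best_position, best_value = pso(sphere, [-5.0, -5.0], [5.0, 5.0])
```

On this smooth test function the swarm contracts around the origin; for SVR tuning the cost would instead be a validation error evaluated at each particle's parameter pair.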

2.3.3. Cuckoo Search (CS)

The cuckoo search (CS) algorithm is a new metaheuristic optimization algorithm proposed by Yang and Deb in 2009 [44], based on a stochastic global search and on the obligate brood-parasitic behavior of cuckoos, which lay their eggs in the nests of host birds. In this optimization algorithm, each nest represents a potential solution. Cuckoos choose nests in which the host has recently laid eggs, so that their own eggs hatch first, since a cuckoo egg usually hatches earlier than the host’s eggs. In addition, by mimicking the host chicks, a cuckoo chick can deceive the host bird into providing more food. If the host birds discover that an alien cuckoo egg has been laid (with probability $p_a$), they either throw the egg out or abandon the nest and build a completely new nest in a new location. New eggs (solutions) laid by a cuckoo are placed by Levy flights around the current best solutions, and the Levy flight behavior speeds up the local search.

In sum, two search capabilities are used in cuckoo search: global search (diversification) and local search (intensification), controlled by a switching/discovery probability ($p_a$). Yang and Deb simplified the cuckoo parasitic breeding process into the following three idealized rules.
(i) Each cuckoo lays only one egg at a time and randomly chooses a nest in which to lay it.
(ii) Eggs of high quality survive to the next generation.
(iii) The host bird of a nest in which a cuckoo lays its egg can discover the alien egg with a probability $p_a$. The host bird then either throws the egg out of the nest or abandons its nest to build a new one in a new location. The number of available nests is fixed under these rules.

To better understand these rules, they can be transformed into the following steps.

Step 1. A cuckoo randomly chooses a nest to hatch only one egg. An egg represents a potential best solution.

Step 2. To maximize the survival probability of their eggs, the cuckoo birds search for the most suitable nests by Levy flight. According to the elitist selection principle, the best egg (minimum solution) will survive to the next generation and will have the opportunity to grow into a mature cuckoo bird. In this step, the aim of the cuckoo algorithm is intensification.

Step 3. The number of available nests (population) is fixed. The alien egg laid by a cuckoo bird is discovered by the host with a probability $p_a$, and this egg is thrown away or the host abandons the nest to build a completely new one in a new location (with a new random solution). In this step, the aim of the cuckoo algorithm is diversification.

For minimization problems the quality or fitness function value may be the reciprocal of the objective function. Each egg in a nest represents a solution and the cuckoo egg represents a new solution. Therefore, there is no difference between an egg, a nest, and a solution.

When generating new solutions $x_i^{(t+1)}$ for, say, a cuckoo $i$, a Levy flight is performed as follows:
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \text{Levy}(\lambda), \quad (21)$$
where $\alpha > 0$ is the step size and should be related to the scales of the problem of interest. In most cases, $\alpha = 1$ is proposed. Equation (21) is essentially the stochastic equation for a random walk. In addition, a random walk is a Markov chain whose next status depends only on the current status (the first term in (21)) and the transition probability (the second term in (21)). The product $\oplus$ means entrywise multiplication, which is similar to that used in PSO, but the random walk via Levy flight is more efficient in exploring the search space, because its step length is much longer in the long run.

The Levy flight provides a random walk whose random step length is drawn from a Levy distribution:
$$\text{Levy} \sim u = t^{-\lambda}, \quad 1 < \lambda \le 3,$$
which has an infinite variance with an infinite mean. Here, the steps essentially form a random walk process with a power-law step-length distribution with a heavy tail. Some of the new solutions are generated by Levy flight around the best solution obtained so far, which intensifies the local search (intensification). However, a substantial fraction of the new solutions should be generated by far-field randomization (diversification), at locations far enough from the current best solution; this ensures that the system will not be trapped in a local optimum.

The simple flowchart of the cuckoo search algorithm is presented in Figure 7.
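The three steps above can be sketched as follows. Several details are assumptions made for illustration rather than the configuration used in this paper: Levy-distributed steps are drawn with Mantegna's algorithm (exponent 1.5), each new egg is compared with the nest it started from, abandoned nests are replaced by uniformly random solutions, the step size is set to 0.1, and the current best nest is never abandoned.

```python
import math
import numpy as np

def levy_steps(rng, shape, beta=1.5):
    """Heavy-tailed step lengths via Mantegna's algorithm (assumed generator)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(cost, lower, upper, n_nests=15, pa=0.25, alpha=0.1,
                  n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = len(lower)
    nests = rng.uniform(lower, upper, (n_nests, dim))     # fixed number of nests
    costs = np.array([cost(x) for x in nests])
    for _ in range(n_iter):
        # New eggs by Levy flight: x_new = x + alpha * Levy (entrywise).
        trial = np.clip(nests + alpha * levy_steps(rng, nests.shape), lower, upper)
        trial_costs = np.array([cost(x) for x in trial])
        better = trial_costs < costs                       # better eggs survive
        nests[better], costs[better] = trial[better], trial_costs[better]
        # Discovery: each nest is abandoned with probability pa.
        abandon = rng.random(n_nests) < pa
        abandon[np.argmin(costs)] = False                  # keep the best nest
        n_new = int(abandon.sum())
        nests[abandon] = rng.uniform(lower, upper, (n_new, dim))
        costs[abandon] = np.array([cost(x) for x in nests[abandon]], dtype=float)
    best = np.argmin(costs)
    return nests[best], costs[best]

def sphere(x):
    return float(np.sum(x ** 2))

best_nest, best_cost = cuckoo_search(sphere, [-5.0, -5.0], [5.0, 5.0])
```

The Levy step with greedy acceptance plays the intensification role, while the random replacement of discovered nests supplies diversification.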

3. The Proposed Hybrid Model

Because of the high volatility, nonstationarity, and nonlinearity of wind speed series, several tools are introduced to preprocess the data so as to forecast accurately.

The procedure for applying the proposed method to predict the 10 min wind speed is illustrated in Figure 8 and described as follows.

Step 1. Apply the 5-3H method to detect the outliers and then replace them by the 5-3H values.
In this step, after a large number of experiments, we set the threshold parameter to 0.3. The results show that 5-3H not only detects the outliers effectively but also smooths the original data to some extent while capturing most of the trend of the wind speed.

However, some slight white noises still exist in the series after 5-3H. Hence, it is necessary to further smooth via wavelet in Step 2.

Step 2. Denoise the 5-3H values by wavelet decomposition with the db3 wavelet basis function and reconstruct the series.

In this approach, we adopt db3 as the wavelet basis function and decompose the data to only one level. Because the data have already been smoothed by 5-3H, and after many experiments, we find that a one-level decomposition gives the most effective denoising; deeper decomposition could denoise excessively and discard useful information in the original data. For threshold selection, we use a popular method, the Birgé-Massart method. After filtering, the high-frequency part of the wind speed, that is, the white noise, is smoothed so that the series is better suited to forecasting.

Step 3. Use three popular artificial intelligent algorithms, BP, Elman, and SVR, to fit the models and predict the future values of one day.

We find that SVR performs best among these three models. To further improve the performance of SVR, three artificial intelligence optimization algorithms are applied in Step 4.

Step 4. Conduct the GA, PSO, and CS to optimize the two main parameters of SVR and make a comparison with the conventional approach of grid search.

Grid search, a nonheuristic algorithm, is used in this paper as the conventional way to search for the best SVR parameters $C$ and $\sigma$. Although grid search can find the best accuracy (the global optimum) on its grid, it is time-consuming when a larger range of $C$ and $\sigma$ must be covered, and a metaheuristic algorithm can find the global optimum more efficiently. Therefore, and in order to further improve forecasting accuracy, GA, PSO, and CS are employed to search for the two main parameters of SVR.
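For comparison with the metaheuristics, the grid search baseline amounts to an exhaustive loop over candidate parameter pairs. In the sketch below the grid ranges (powers of two) and the function `toy_validation_error` are hypothetical placeholders; in practice the error would come from fitting SVR with each pair and scoring it on held-out data.

```python
import itertools
import numpy as np

def grid_search(validation_error, C_grid, sigma_grid):
    """Evaluate every (C, sigma) pair and return (error, C, sigma) of the best."""
    best = None
    for C, sigma in itertools.product(C_grid, sigma_grid):
        err = validation_error(C, sigma)
        if best is None or err < best[0]:
            best = (err, float(C), float(sigma))
    return best

def toy_validation_error(C, sigma):
    # Hypothetical error surface with its minimum at C = 8, sigma = 0.5.
    return (np.log2(C) - 3.0) ** 2 + (np.log2(sigma) + 1.0) ** 2

C_grid = 2.0 ** np.arange(-5, 11)       # assumed search range
sigma_grid = 2.0 ** np.arange(-5, 6)    # assumed search range
best_err, best_C, best_sigma = grid_search(toy_validation_error, C_grid, sigma_grid)
n_evaluations = len(C_grid) * len(sigma_grid)
```

The cost of the baseline grows with the product of the two grid sizes (176 evaluations here), which is the motivation for replacing it with GA, PSO, or CS on larger ranges.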

4. Analysis and Discussion of the Applicative Case Studies

4.1. Data Presentation

To validate the proposed forecasting method, three cases are introduced. The first two are 10 min average wind speed series from 70-meter wind towers at two sites in four seasonal months (January, April, July, and October in 2011, which are the representative months for each quarter of the year). The first site is located in the Jiling Shoal, Jinchang City, with longitude of 101.7999, latitude of 38.5248, and altitude of 2195.000. The second wind tower is in Qingtu Lake, Wuwei City, with longitude of 103.6201, latitude of 39.1031, and altitude of 1298.000. For each wind tower in each month, we draw 744 samples and make a 3-step forecast. The first 600 samples are used to build a model, which then predicts the remaining 144 ($6 \times 24$) points ($144 \times 10 = 1440$ minutes, which amounts to a whole day). To further validate the universality of the approach, the electricity price data from New South Wales (NSW) in January 2012 are also used as the third case. The raw data of the three cases are illustrated in Figures 9–11.
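The 600/144 split and the construction of training pairs can be sketched as below. The number of lagged inputs and the reading of "3-step forecasting" as a direct three-steps-ahead target are assumptions made for illustration, and the series is a placeholder rather than the measured wind speed.

```python
import numpy as np

def make_supervised(series, n_lags, horizon):
    """Build (lagged inputs, target) pairs for direct horizon-step-ahead forecasting."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])        # the n_lags most recent values
        y.append(series[t + horizon - 1])     # the value horizon steps ahead
    return np.array(X), np.array(y)

series = np.arange(744, dtype=float)          # placeholder for 744 ten-minute samples
train, test = series[:600], series[600:]      # 600 for fitting, 144 for testing
X_train, y_train = make_supervised(train, n_lags=4, horizon=3)
```

With 144 ten-minute points, the test segment covers 1440 minutes, that is, one full day.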

4.2. Error Evaluation

Table 1 shows the results of the proposed intelligent algorithms for forecasting the wind speed by 3 steps in Jiling Shoal and Qingtu Lake and the electricity price by 3 steps in NSW in 2011. We refer to PBP, PElman, and PSVR as the prediction methods after preprocessing. To validate the proposed approach, we mainly contrast the results of PBP and BP, PElman and Elman, and PSVR and SVR. The mean absolute error (MAE), mean square error (MSE), mean absolute percentage error (MAPE), and symmetric mean absolute percentage error (SMAPE) are utilized to measure the prediction accuracy of these three models [45]. The MAE values can be calculated by

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|,$$

the values of MSE can be computed by

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,$$

the MAPE values can be calculated by

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%,$$

and the values of SMAPE can be computed by

$$\mathrm{SMAPE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|y_i - \hat{y}_i\right|}{\left(\left|y_i\right| + \left|\hat{y}_i\right|\right)/2} \times 100\%,$$

where $y_i$ and $\hat{y}_i$ signify the actual and predicted values at time $i$, respectively, and $N$ is the number of predicted points. Table 1 lists the MAE, MSE, MAPE, and SMAPE values for the PBP and BP, PElman and Elman, and PSVR and SVR models.
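The four error measures can be computed with a few lines of code. The sketch below uses the usual textbook definitions. The SMAPE variant (absolute error divided by the mean of the absolute actual and predicted values) is an assumption, since several variants are in use, and the sample numbers are purely illustrative rather than taken from Table 1.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE, in percent (assumed variant: mean of |actual| and |predicted|)."""
    total = sum(abs(a - p) / ((abs(a) + abs(p)) / 2) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Illustrative values only
actual = [4.0, 5.0, 8.0]
predicted = [5.0, 5.0, 6.0]

errors = {
    "MAE": mae(actual, predicted),
    "MSE": mse(actual, predicted),
    "MAPE": mape(actual, predicted),
    "SMAPE": smape(actual, predicted),
}
```

MAPE is undefined when an actual value is zero, which can matter for calm periods in wind speed data. SMAPE only fails when both the actual and the predicted value are zero.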

Table 1 lists the MAE, MSE, MAPE, and SMAPE values obtained through the proposed methods. To further facilitate understanding of the performance of the improved approaches, the decrease in relative error (RE) for each of the four error measures is calculated from the forecasting results of these three models by

$$\mathrm{RE}_{i,j} = \frac{E^{\mathrm{ref}}_{i,j} - E_{i,j}}{E^{\mathrm{ref}}_{i,j}} \times 100\%,$$

where $E$ denotes one of the four error measures, the reference model in our case is the model without preprocessing, model $i$ ($i = 1, 2, 3$) represents one of three models, and $j$ ($j = 1, 2, 3, 4$) stands for one of four seasonal months. A positive RE value therefore means that the error is smaller than that of the reference model. The RE values are listed in Table 2 and illustrated in Figure 15. Additionally, to provide a comprehensive evaluation of the performance of the proposed methods, the average (Ave.) error criterion is introduced, which is computed according to

$$\mathrm{Ave}_{i} = \frac{1}{4}\sum_{j=1}^{4}\mathrm{RE}_{i,j}.$$
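As a worked illustration of the RE and Ave. criteria, the sketch below assumes that RE is the percentage reduction of an error measure relative to the reference model (so that a positive value means a smaller error) and that Ave. is the arithmetic mean of the RE values over the four seasonal months. The MAE numbers are hypothetical and are not taken from Tables 1 and 2.

```python
def relative_error_reduction(ref_error, model_error):
    """Percentage by which model_error is lower than ref_error (positive = improvement)."""
    return 100.0 * (ref_error - model_error) / ref_error


def average(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


# Hypothetical MAE values for January, April, July, and October
bp_mae = [0.80, 0.60, 0.50, 0.40]     # reference model without preprocessing
pbp_mae = [0.60, 0.57, 0.50, 0.42]    # same model after preprocessing

re_values = [relative_error_reduction(r, m) for r, m in zip(bp_mae, pbp_mae)]
ave = average(re_values)
```

With these numbers the RE values are approximately 25%, 5%, 0%, and -5%, giving an average of about 6.25%. The negative entry shows how a single month in which preprocessing hurts still enters the average.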

4.3. Simulating Results Analysis

Steps 1 and 2. Apply the 5-3H filter and wavelet denoising to preprocess the wind speed series in Jiling Shoal in the four seasonal months of 2011.

In Figures 12-13, the first to fourth subplots show the original and preprocessed wind speed series in January, April, July, and October, respectively. Figure 14 illustrates the raw electricity price series in NSW. The preprocessed data are visibly smoother as a result of the 5-3H outlier filtering and the wavelet denoising. The 5-3H algorithm not only detects outliers but, more importantly, also smooths the data. Through these preprocessing approaches, the tendencies of the wind speed and electricity price series become clearer and easier to forecast, as the next step illustrates.
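The outlier-filtering stage can be sketched as follows, assuming that 5-3H takes its classical form: a running median of width 5, then a running median of width 3, then a Hanning smoother with weights 1/4, 1/2, 1/4, with a point flagged as an outlier when it departs from the smoothed series by more than a threshold. The threshold value, the treatment of end points, and the sample series are hypothetical, and the wavelet denoising stage is not shown.

```python
from statistics import median


def running_median(x, width):
    """Running median of odd width; end points without a full window are kept."""
    half = width // 2
    out = list(x)
    for i in range(half, len(x) - half):
        out[i] = median(x[i - half:i + half + 1])
    return out


def hanning(x):
    """Three-point Hanning smoother with weights 1/4, 1/2, 1/4; end points are kept."""
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = 0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1]
    return out


def filter_53h(x, threshold):
    """Replace points further than `threshold` from the 5-3H smoothed series."""
    smooth = hanning(running_median(running_median(x, 5), 3))
    outliers = [i for i, (v, s) in enumerate(zip(x, smooth)) if abs(v - s) > threshold]
    cleaned = list(x)
    for i in outliers:
        cleaned[i] = smooth[i]
    return cleaned, outliers


# Illustrative wind speed values with one obvious spike at index 3
speeds = [5.0, 5.1, 5.2, 30.0, 5.3, 5.2, 5.1, 5.0, 5.2, 5.1]
cleaned, outliers = filter_53h(speeds, threshold=3.0)
```

Because the two medians run before the Hanning weights are applied, an isolated spike does not leak into the smoothed reference series. This is why the same procedure both detects outliers and smooths the data.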

Step 3. Use three popular artificial intelligence algorithms, BP, Elman, and SVR, to fit the models and predict the values of the following day.

As listed in Table 1, PBP, PElman, and PSVR denote the models built after data preprocessing. Almost all the accuracies of PBP, PElman, and PSVR are better than those of BP, Elman, and SVR. The more explicit results, the RE values, are shown in Table 2 and Figure 15. Table 2 reports the percentage improvement in accuracy of the 3 models under four error measures (MAE, MSE, MAPE, and SMAPE). Figures 16-17 show the forecasting results of wind speed in a certain month and of electricity price in NSW. Figure 15, drawn from the Ave. results in Table 2, shows the average percentage improvement in RE values. As shown in the PBP column of Table 2, almost all the RE values of MAE are positive, which implies that almost all of the MAE values attained by PBP are smaller than those obtained by BP. In addition, almost all the RE values of these 3 models are positive, implying smaller errors than those of the original approaches.

Figure 15 gives a comprehensive result, the average percentage of RE values, for the three cases. In Jiling Shoal, PBP shows an evidently greater improvement than the other two models, while PBP and PSVR perform better than PElman in Qingtu Lake, and in NSW PElman outperforms the other two by a wide margin. As shown in Figure 14, the data preprocessing is more effective in NSW because it removes more outliers from the electricity price series than from the wind speed series. In conclusion, both Table 2 and Figure 15 demonstrate the good performance of the proposed preprocessing methods. Given that the total installed wind power capacity in China in 2011 was 62364.2 MW, even this slight improvement in accuracy could save a large amount of money.

In particular, Table 1 shows that PSVR performs best among the three models. To further improve the accuracy of wind speed forecasting, we apply three artificial intelligence optimization algorithms to improve the performance of PSVR in Step 4.

Step 4. Apply CS to optimize the two main parameters of PSVR and compare it with the conventional approaches GS, GA, and PSO.

Using metaheuristic algorithms such as GA and PSO to optimize the hyperparameters of SVR can generally attain better accuracy than using a nonheuristic conventional method such as grid search (GS). However, as Moghram and Rahman [46] noted, no model or algorithm that forecasts effectively in one wind farm can be assumed to apply to every wind farm, because wind speed differs between wind farms and various location-specific factors influence the wind speed patterns. To find the algorithm best suited to forecasting wind speed in Jiling Shoal and Qingtu Lake, it is therefore necessary to compare different algorithms. In the next part, we choose the most commonly used algorithms (both artificial intelligence and non-artificial-intelligence algorithms: GS, GA, and PSO) to optimize the hyperparameters of SVR and then contrast them with a new metaheuristic algorithm (CS) optimizing SVR. The final results are shown in Tables 3-4 and Figures 18-21.
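For readers unfamiliar with cuckoo search, the following is a minimal sketch of a simplified variant, not the exact procedure or settings used in this study. It draws Lévy-flight steps with Mantegna's method, lets each new solution replace a randomly chosen nest when it is better, and then re-initializes the worst fraction `pa` of nests. The population size, `pa`, `alpha`, the iteration count, the search bounds, and the objective `toy_error` (a stand-in for the validation error of an SVR with two hyperparameters searched on a log2 scale) are all hypothetical.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def clip(x, lo, hi):
    """Keep x inside [lo, hi]."""
    return max(lo, min(hi, x))


def cuckoo_search(objective, bounds, n_nests=15, pa=0.25, n_iter=200, alpha=0.01, seed=0):
    """Minimize `objective` over the box `bounds`; returns (best, best_value, history)."""
    rng = random.Random(seed)

    def random_nest():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    nests = [random_nest() for _ in range(n_nests)]
    fitness = [objective(nest) for nest in nests]
    history = [min(fitness)]
    for _ in range(n_iter):
        # Levy-flight phase: each nest proposes a new solution, which replaces
        # a randomly chosen nest only if it has a lower objective value.
        for i in range(n_nests):
            candidate = [clip(x + alpha * (hi - lo) * levy_step(rng), lo, hi)
                         for x, (lo, hi) in zip(nests[i], bounds)]
            cand_fit = objective(candidate)
            j = rng.randrange(n_nests)
            if cand_fit < fitness[j]:
                nests[j], fitness[j] = candidate, cand_fit
        # Abandonment phase: re-initialize the worst fraction pa of nests.
        order = sorted(range(n_nests), key=lambda k: fitness[k], reverse=True)
        for k in order[:int(pa * n_nests)]:
            nests[k] = random_nest()
            fitness[k] = objective(nests[k])
        history.append(min(fitness))
    best_idx = fitness.index(min(fitness))
    return nests[best_idx], fitness[best_idx], history


def toy_error(params):
    """Hypothetical error surface with its minimum at (3, -2) in log2 space."""
    return (params[0] - 3) ** 2 + (params[1] + 2) ** 2


search_bounds = [(-5.0, 10.0), (-10.0, 5.0)]
best, best_err, history = cuckoo_search(toy_error, search_bounds)
```

Unlike grid search, the number of objective evaluations here is set by the population size and iteration count rather than by the extent of the search range. Occasional long Lévy steps and the re-initialized nests help the search leave local minima.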

Figures 20-21 show the forecasting results of wind speed in a certain month and of electricity price. Table 3 displays four kinds of forecasting error indexes of PSVR optimized by GS, GA, PSO, and CS in the three cases. The four resulting models are denoted PSVR, GA-PSVR, PSO-PSVR, and CS-PSVR, where PSVR uses the GS method to search its parameters. The boldfaced numbers denote the best accuracy in each row. The bottom row of Table 3, "Total line," counts the number of times each algorithm attains the best accuracy, which is illustrated in Figure 18. CS-PSVR clearly performs best, attaining the best accuracy 21 times, more often than any of the other three algorithms. However, the performance of GA-PSVR and PSO-PSVR falls slightly behind that of GS.

Table 4 gives the RE values of GA-PSVR, PSO-PSVR, and CS-PSVR relative to PSVR and thus shows their respective gains in accuracy. Figure 19 is drawn from the Ave. values in Table 4. Table 4 and Figure 19 show that CS-PSVR is the best, while GA-PSVR and PSO-PSVR are worse than PSVR. However, the CS column of Table 4 shows that CS-PSVR improves the Ave. accuracy by less than 0.5% in Jiling Shoal and by about 0.3% to 0.6% in Qingtu Lake. This indicates that artificial intelligence optimization algorithms have only a limited capability to improve wind speed forecasting in Jiling Shoal and Qingtu Lake. By contrast, CS-PSVR performs very well in forecasting the electricity price in NSW: it raises accuracy by more than 5% on average and by as much as 12.52% in MSE. This is because the electricity price has more apparent seasonal trends, while the wind speed series fluctuates more intensely and randomly. In conclusion, comparing the average RE values of GA-PSVR, PSO-PSVR, and CS-PSVR in NSW shows that the proposed CS-PSVR method outperforms the other three algorithms.

5. Conclusion

Wind power has been growing rapidly around the world, and wind speed forecasting plays an important role in wind energy. Accurate wind speed prediction is becoming increasingly important for improving and optimizing renewable wind power generation. In particular, reliable short-term wind speed prediction can enable model predictive control of wind turbines and real-time optimization of wind farm operation. In this paper we utilize the 5-3H and wavelet denoising methods to preprocess the original data and then apply the BP, Elman, and SVR models to make a 3-step prediction at 10-minute resolution. Finally, we adopt GA, PSO, and CS to optimize the PSVR. We find that 5-3H combined with wavelet denoising improves the accuracy of the BP network, Elman network, and SVR in forecasting wind speed at the two sites and electricity price in NSW. These results show that the outlier removal and denoising ability of 5-3H combined with wavelet denoising can be applied to wind speed forecasting in the Jiling Shoal and Qingtu Lake and to electricity price prediction in NSW. Regarding the optimization of the two main hyperparameters of SVR, a new metaheuristic intelligent optimization algorithm, cuckoo search, outperforms the traditional methods GS, GA, and PSO.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper was supported by the R&D Special Fund for Public Welfare Industry (meteorology) (GYHY201206013), the Key Technology Integration and Application of the Chinese Meteorological Administration Projects (CMAGJ2013M35): the popularization and application of low frequency oscillation on the monthly scale of north drought forecast in China, and the Gansu Provincial Meteorological Service Center Innovation Fund (2013-10).