Abstract
Power systems are put at risk when a power-grid collapse occurs. As a clean and renewable resource, wind energy plays an increasingly vital role in reducing air pollution, and wind power generation has become an important way to produce electrical power. Accurate wind power and wind speed forecasting is therefore needed. In this research, a novel short-term wind speed forecasting portfolio is proposed using the following three procedures: (I) data preprocessing: apart from the regular normalization preprocessing, the data are preprocessed through empirical mode decomposition (EMD), which reduces the effect of noise on the wind speed data; (II) artificially intelligent parameter optimization introduction: the unknown parameters in the support vector machine (SVM) model are optimized by the cuckoo search (CS) algorithm; (III) parameter optimization approach modification: an improved parameter optimization approach, called the SDCS model, based on the CS algorithm and the steepest descent (SD) method is proposed. The comparison results show that the simple and effective portfolio EMD-SDCS-SVM produces promising predictions and performs better than the individual forecasting components, with very small root mean squared errors and mean absolute percentage errors.
1. Introduction
The demand for clean and renewable energy resources has increased significantly because the acid emissions and air pollution caused by burning fossil fuels have heavily polluted the environment worldwide. As a clean and renewable resource, wind energy plays an increasingly vital role in energy supply, and wind power generation has become an important way to generate electrical power. However, the stochastic fluctuation of wind makes it difficult to forecast [1–3]. Therefore, efforts to improve the accuracy of wind speed forecasting continue, so as to lower the likelihood of a power-grid collapse.
Wind speed forecasting is an important foundation and prerequisite for the prediction of wind power generation. More accurate wind speed forecasts can reduce the costs of rotating equipment and operation and raise the limit on wind power penetration. At the same time, precise wind speed prediction helps the dispatching department adjust its schedule in a timely manner, which reduces the impact of wind power on the grid, effectively avoids the adverse effect of wind farms on the power system, and enhances the competitiveness of wind power in the electricity market.
In the literature, statistical and neural network-based methods are the two kinds of models most pervasively used to forecast wind speed [4–7]. With the development of artificial intelligence techniques, several artificial intelligence methods have been presented, such as artificial neural networks, fuzzy logic methods, and the support vector machine. Guo et al. [8] presented a wind speed strategy based on the chaotic time series modeling technique and the Apriori algorithm. Barbounis et al. [9] employed three different types of neural network (NN) models to forecast the hourly wind speed (up to 3 days) in a wind park located on the Greek island of Crete. However, there are several unknown parameters in the NN model. Thus, many researchers have indicated the need to optimize the parameters in the NN model to improve wind speed forecasting accuracy. Wang and Hu [10] improved the performance of the back propagation (BP) NN model in the wind speed forecasting field by optimizing the parameters in the BP model. Both kinds of models, that is, the statistical and the NN-based models, have been used by Azad et al. [11] to solve the long-term wind speed forecasting problem for two stations in Malaysia. However, wind speed forecasting results obtained by the neural network models are not always superior to those obtained by other models. Chen and Yu [12] developed a new model by integrating the unscented Kalman filter with the support vector regression-based state-space model. Comparison results indicated that the newly proposed model outperforms the NN model. Apart from the NN models, the parameter optimization strategy has also been applied to other wind speed forecasting models. Gani et al. [13] combined the firefly algorithm with the SVM algorithm for short-term wind speed forecasting, where the firefly algorithm is used to optimize the parameters of the SVM, and obtained accurate forecasting results.
Compared with artificial intelligence models, statistical approaches are less expensive and intrusive and, hence, more practical in forecasting wind power generation. Statistical models are widely used for short-term wind forecasting, predicting wind conditions several hours in advance, which is particularly useful for wind power generation [14]. However, their results for nonlinear wind speed time series are often unsatisfactory, especially in multistep prediction, and the error increases significantly as the prediction horizon is extended. The new paradigm of big data stream mobile computing is quickly gaining momentum [15], while wind speed forecasting results have been applied to many different areas [16].
The existing wind speed forecasting models have the following disadvantages. First, some of them take no account of the randomness, instability, and large fluctuation of wind speed data, which may lead to a high forecasting error. Therefore, in this research, the empirical mode decomposition (EMD) technique is utilized to adaptively decompose the original wind speed data into a finite number of intrinsic mode functions with similar properties for modeling. Second, the traditional parameter estimation methods, such as moment estimation or likelihood estimation, are not dynamic and require solving equations with a great deal of calculation. Therefore, the artificial intelligence parameter estimation method named the cuckoo search (CS) algorithm is used in this paper to estimate the unknown parameters in the forecasting model. Third, although some researchers have applied artificial intelligence approaches to parameter estimation, they adopted the original approach without considering its deficiencies. Thus, in this paper, the steepest descent (SD) method is used to improve the CS algorithm so as to enhance the convergence rate.
Based on the above motivations, this research proposes a new short-term wind speed forecasting portfolio which not only maintains the characteristics of the wind speed data but also automatically estimates the unknown parameters in the forecasting model with a considerable convergence rate. The portfolio is built through the following three procedures: (I) data preprocessing: apart from regular normalization preprocessing, the data are preprocessed through the EMD model, which reduces the effect of noise on the wind speed data; (II) artificially intelligent parameter optimization introduction: the unknown parameters in the support vector machine (SVM) model are optimized by the cuckoo search (CS) algorithm; (III) parameter optimization approach modification: although the original CS algorithm is simple and efficient, it has disadvantages such as insufficient search vigor and slow search speed during the latter part of the search. Therefore, this paper proposes an improved parameter optimization approach based on the CS algorithm and the steepest descent (SD) method, which is abbreviated as the SDCS model. The performance of the developed EMD-SDCS-SVM model has been compared with that of the individual forecasting components using the following two error evaluation criteria: the root mean squared error and the mean absolute percentage error.
The paper is organized as follows: Section 2 introduces related methodologies, Section 3 presents the simulation examples and discussions, and the last section presents concluding remarks.
2. Related Methodologies
2.1. Data Preprocessing Approach
Data preprocessing is a common way to improve forecasting accuracy, especially for data with high noise and different scales. This paper focuses on handling these two problems by using the EMD model and the normalization preprocessing approach, respectively.
2.1.1. Empirical Mode Decomposition Model
The EMD model is an adaptive decomposition approach proposed by Baccarelli et al. [15]. It is used in a wide range of applications, especially in dealing with nonlinear time series. The EMD model decomposes the original time series into several sequences with different scales, called intrinsic mode functions (IMFs), as well as a residual sequence. All IMFs must satisfy two requirements:
(a) The number of extreme points (all maximum and minimum points are included) must be equal to the number of zero crossings or differ from it by no more than one.
(b) In all cases, the average of the envelopes defined by the local maxima and minima must be zero.
With the above two limitations, a signal sequence can be decomposed with the assistance of the EMD method [16] through the following steps.
Step 1. Calculate all the local extrema (including all the minimum and maximum values).
Step 2. Connect the local maxima by a cubic spline to generate the upper envelope and similarly produce the lower envelope by connecting all the local minima with a cubic spline interpolation, represented by $u(t)$ and $l(t)$, respectively.
Step 3. Calculate the average value of the two envelopes by
$$m_1(t) = \frac{u(t) + l(t)}{2}. \tag{1}$$
Step 4. Calculate the difference $h_1(t)$ between the data $x(t)$ and $m_1(t)$ by
$$h_1(t) = x(t) - m_1(t). \tag{2}$$
Step 5. Judge whether $h_1(t)$ satisfies the two requirements of the IMFs. If not, regard $h_1(t)$ as the original signal sequence; then $h_{11}(t) = h_1(t) - m_{11}(t)$, where $m_{11}(t)$ is the mean envelope of $h_1(t)$. Repeat this process $k$ times until $h_{1k}(t)$, which is calculated by $h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t)$, is an IMF. The first IMF sequence is obtained by
$$c_1(t) = h_{1k}(t). \tag{3}$$
Step 6. Calculate the first residual sequence according to
$$r_1(t) = x(t) - c_1(t). \tag{4}$$
Step 7. Regard $r_1(t)$ as the raw data and return to Step 1 to repeat this procedure until the final residue $r_n(t)$ turns into either a monotonic function or a function from which no more IMF sequences can be extracted.
Finally, the original signal sequence is decomposed into
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t). \tag{5}$$
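The sifting procedure of Steps 1 to 7 can be illustrated with a minimal sketch. It is not the implementation used in this paper: for brevity, the envelopes are piecewise linear rather than cubic splines, the end points are held at the nearest extremum, and sifting stops on a relative-change threshold `tol` instead of an explicit check of the two IMF requirements. The names `local_extrema`, `envelope`, and `emd` are illustrative.

```python
import math

def local_extrema(x):
    """Indices of interior local maxima and minima of the sequence x (Step 1)."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(idx, x):
    """Envelope through the points (i, x[i]), i in idx (Step 2).
    Assumption: linear interpolation stands in for the cubic spline."""
    n = len(x)
    knots = [0] + idx + [n - 1]
    vals = [x[idx[0]]] + [x[i] for i in idx] + [x[idx[-1]]]
    env, k = [], 0
    for t in range(n):
        while k < len(knots) - 2 and t > knots[k + 1]:
            k += 1
        w = (t - knots[k]) / (knots[k + 1] - knots[k])
        env.append(vals[k] + w * (vals[k + 1] - vals[k]))
    return env

def emd(x, max_imfs=8, max_sift=50, tol=0.05):
    """Decompose x into a list of IMFs and a final residue."""
    residue, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # residue is nearly monotonic: nothing left to extract
        h = residue[:]
        for _ in range(max_sift):
            maxima, minima = local_extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            upper, lower = envelope(maxima, h), envelope(minima, h)
            mean = [(u + l) / 2 for u, l in zip(upper, lower)]      # Step 3
            h_new = [a - m for a, m in zip(h, mean)]                # Step 4
            change = (sum((a - b) ** 2 for a, b in zip(h, h_new))
                      / (sum(a * a for a in h) + 1e-12))
            h = h_new
            if change < tol:  # assumed stopping rule for Step 5
                break
        imfs.append(h)
        residue = [r - c for r, c in zip(residue, h)]               # Step 6
    return imfs, residue

# Synthetic test signal: fast oscillation + slow oscillation + trend.
signal = [math.sin(2 * math.pi * t / 10) + 0.5 * math.sin(2 * math.pi * t / 50)
          + 0.01 * t for t in range(200)]
imfs, residue = emd(signal)
```

By construction, summing all returned IMFs and the residue reproduces the input sequence, which mirrors the final decomposition identity of the EMD method.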
2.1.2. Normalization Preprocessing
To improve the training efficiency and the generalization ability of the SVM model, normalization preprocessing is applied to the IMF sequences obtained by the EMD model. Normalization preprocessing is defined as follows:
$$x'_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \tag{6}$$
where $x_i$ and $x'_i$ represent the original data sequence and the preprocessed data sequence, respectively, and $x_{\min}$ and $x_{\max}$ denote the minimum and the maximum data in the original data sequence, respectively.
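As a small illustration of this scaling, the sketch below maps a sequence to the interval [0, 1] using its minimum and maximum and provides the inverse mapping needed to return forecasts to the original units. The function names are illustrative, and storing the minimum and maximum so that test inputs can be scaled with the training setting is an assumption about usage.

```python
def min_max_normalize(seq):
    """Scale seq to [0, 1]; also return the (min, max) used for the scaling."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        raise ValueError("a constant sequence cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in seq], lo, hi

def denormalize(scaled, lo, hi):
    """Invert the scaling, e.g. to express forecasts in m/s again."""
    return [v * (hi - lo) + lo for v in scaled]

scaled, lo, hi = min_max_normalize([2.0, 4.0, 6.0])
restored = denormalize(scaled, lo, hi)
```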
2.2. Support Vector Machine Model
The SVM model is the core of statistical machine learning theory. It can surmount difficulties that appear in traditional machine learning methods, such as the curse of dimensionality, a tendency to fall into local optima, and over-learning. In addition, it has great generalization ability [17]. Therefore, the SVM model has long been an attractive tool with powerful capabilities in solving classification and regression problems. In this paper, we mainly focus on the SVM model for regression.
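Suppose that there are $N$ in-sample data points (or training samples) $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ denotes the input vector and $y_i$ is the targeted output corresponding to the input vector $x_i$. The main purpose of the SVM for regression is to find a function $f(x)$ which satisfies the following two requirements: (a) the deviation between $f(x_i)$ and $y_i$ is no greater than a given positive real number $\varepsilon$, for all $i$, and (b) $f(x)$ is as flat as possible. In the SVM algorithm, $f(x)$ is defined by the formula
$$f(x) = w^{T}\varphi(x) + b, \tag{7}$$
where $\varphi(\cdot)$ is a nonlinear mapping, $b$ is the threshold value, and the unknown coefficients $w$ and $b$ can be estimated by solving the following optimization problem:
$$\min_{w,\,b,\,\xi,\,\xi^{*}} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \quad \text{s.t.} \quad \begin{cases} y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \\ w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\ \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \ldots, N, \end{cases} \tag{8}$$
where $C$ denotes the penalty coefficient, $\xi_i$ and $\xi_i^{*}$ are two slack variables, and $\varepsilon$ is the tube size. Problem (8) can be solved by introducing two Lagrange multipliers $\alpha_i$ and $\alpha_i^{*}$ and minimizing the following Lagrange function [14]:
$$W(\alpha, \alpha^{*}) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})K(x_i, x_j) + \varepsilon\sum_{i=1}^{N}(\alpha_i + \alpha_i^{*}) - \sum_{i=1}^{N} y_i(\alpha_i - \alpha_i^{*}) \tag{9}$$
subject to
$$\sum_{i=1}^{N}(\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C, \quad i = 1, \ldots, N. \tag{10}$$
This calculation results in
$$f(x) = \sum_{i=1}^{N}(\alpha_i - \alpha_i^{*})K(x_i, x) + b, \tag{11}$$
where $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$ is called the kernel function. The following four types of kernel functions are commonly used [18, 19]: (a) linear kernel function: $K(x_i, x_j) = x_i^{T}x_j$, (b) polynomial kernel function: $K(x_i, x_j) = (\gamma x_i^{T}x_j + r)^{d}$, (c) sigmoid kernel function: $K(x_i, x_j) = \tanh(\gamma x_i^{T}x_j + r)$, and (d) Gaussian kernel function: $K(x_i, x_j) = \exp\left(-\|x_i - x_j\|^{2}/(2\sigma^{2})\right)$, where $\gamma$, $r$, $d$, and $\sigma$ are kernel parameters.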
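To make the role of the kernel concrete, the following sketch writes out the four kernel families named above and the kernel expansion used for prediction. The parameter names `gamma`, `r`, `d`, and `sigma`, and the exact parameterization of each kernel, follow common conventions and are assumptions rather than the settings of this paper; `coeffs` stands for the differences of the two Lagrange multipliers, which a solver would supply.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1.0, r=0.0, d=2):
    return (gamma * dot(u, v) + r) ** d

def sigmoid_kernel(u, v, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(u, v) + r)

def gaussian_kernel(u, v, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def svr_predict(x, support_vectors, coeffs, b, kernel):
    """Kernel expansion: sum of coeff_i * K(x_i, x) plus the threshold b."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coeffs)) + b

# Toy usage with hand-picked (not trained) coefficients.
prediction = svr_predict([0.0, 0.0], [[0.0, 0.0], [1.0, 1.0]], [0.5, 0.0], 1.0,
                         gaussian_kernel)
```

A smaller `sigma` makes the Gaussian kernel decay faster with distance, so each training point influences a narrower neighbourhood of the input space.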
2.3. Artificial Intelligence Parameter Optimization
2.3.1. Original Cuckoo Search Algorithm
The CS algorithm was first developed by Sun et al. [20] in 2007. It is derived from the behavior of cuckoos, which lay their eggs in the nests of other birds so that those birds hatch the eggs for them. However, once the host birds discover the cuckoo eggs, they either throw the eggs away or abandon their nests and build new nests elsewhere. The CS algorithm is constructed based on three assumptions: (a) only one egg is laid by each cuckoo in a randomly selected nest; (b) the best nests will be carried over to the following generations; and (c) the number of available host nests is a constant, and the probability of an egg laid by a cuckoo being discovered by the host bird is $p_a \in [0, 1]$.
In the CS algorithm, each nest represents a solution. The pseudo code of the CS technique [21] presented in Algorithm 1 can aid in understanding the CS process.

The Lévy flight mentioned in the pseudo code of Algorithm 1 is generated according to
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \mathrm{L\acute{e}vy}(\lambda), \tag{12}$$
where $\alpha > 0$ is the step size, which should be related to the scale of the problem of interest. The product $\oplus$ indicates entrywise multiplication. A Lévy flight is considered when the step-lengths are distributed according to the following probability distribution:
$$\mathrm{L\acute{e}vy} \sim u = t^{-\lambda}, \quad 1 < \lambda \le 3, \tag{13}$$
which has an infinite variance. Here, the consecutive steps of a cuckoo search essentially form a random walk process that obeys a power-law step-length distribution with a heavy tail.
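A compact sketch of the CS loop may help fix ideas. It is an illustration under stated assumptions, not a reproduction of Algorithm 1: heavy-tailed steps are drawn with Mantegna's algorithm (exponent `beta = 1.5`), the step size `alpha` is scaled by the width of each search interval, and the best nest is never abandoned. The names `levy_step`, `cuckoo_search`, and `sphere` are illustrative.

```python
import math
import random

def levy_step(rng, beta=1.5):
    """One heavy-tailed step via Mantegna's algorithm (assumed generator)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def cuckoo_search(f, bounds, n_nests=15, pa=0.25, alpha=0.01, n_iter=200, seed=0):
    """Minimise f over the box `bounds`; returns (best_nest, best_value)."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]

    def random_nest():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    nests = [random_nest() for _ in range(n_nests)]
    fit = [f(x) for x in nests]
    for _ in range(n_iter):
        # Each cuckoo proposes a new egg by a Levy flight from its nest.
        for i in range(n_nests):
            new = clip([xi + alpha * (hi - lo) * levy_step(rng)
                        for xi, (lo, hi) in zip(nests[i], bounds)])
            j = rng.randrange(n_nests)      # randomly selected host nest
            f_new = f(new)
            if f_new < fit[j]:              # keep the better egg
                nests[j], fit[j] = new, f_new
        # Hosts discover alien eggs with probability pa and rebuild the nest.
        best = min(range(n_nests), key=fit.__getitem__)
        for i in range(n_nests):
            if i != best and rng.random() < pa:
                nests[i] = random_nest()
                fit[i] = f(nests[i])
    best = min(range(n_nests), key=fit.__getitem__)
    return nests[best], fit[best]

def sphere(x):
    return sum(v * v for v in x)

best_x, best_f = cuckoo_search(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

Because a nest is replaced only by a better egg and the best nest is exempt from abandonment, the best objective value never gets worse from one generation to the next.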
2.3.2. Modified Cuckoo Search Method
Similar to other metaheuristic algorithms, the original CS algorithm is simple and efficient; however, it has disadvantages such as insufficient search vigor and slow search speed during the latter part of the search. As one of the oldest optimization algorithms, the steepest descent (SD) method [22] is simple and intuitive. Currently, there are many effective optimization algorithms established on the basis of this algorithm. In order to overcome the CS’s shortcoming of slow convergence rate, the SD method is used to modify the CS algorithm, and the modified model is abbreviated as the SDCS model. In the SDCS model, the following equation substitutes for (13):where is defined by
The SDCS process can be expressed by the following procedures.
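Step 1. Initialize the initial point $x_0$ and the end error $\varepsilon > 0$, and set $k = 0$.
Step 2. Calculate $\nabla f(x_k)$. If $\|\nabla f(x_k)\| \le \varepsilon$, terminate the iteration and output the value of $x_k$. Otherwise, go to Step 3.
Step 3. Set $p_k = -\nabla f(x_k)$.
Step 4. Conduct a one-dimensional search. Get the value of $t_k$ which satisfies the equation $f(x_k + t_k p_k) = \min_{t \ge 0} f(x_k + t p_k)$; then set $x_{k+1} = x_k + t_k p_k$, $k = k + 1$, and return to Step 2.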
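The four steps above can be sketched as follows. This is a plain steepest descent routine, not the full SDCS hybrid; the one-dimensional search is carried out by golden-section search on the bracket `[0, t_max]`, which is an assumption, as are the function names and the quadratic test function.

```python
import math

def golden_section(phi, a=0.0, b=1.0, tol=1e-8):
    """Minimise a unimodal one-dimensional function phi on [a, b]."""
    inv = (math.sqrt(5) - 1) / 2
    c, d = b - inv * (b - a), a + inv * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - inv * (b - a)
        else:
            a, c = c, d
            d = a + inv * (b - a)
    return (a + b) / 2

def steepest_descent(f, grad, x0, eps=1e-6, t_max=1.0, max_iter=1000):
    x = list(x0)                                    # Step 1
    for _ in range(max_iter):
        g = grad(x)                                 # Step 2
        if math.sqrt(sum(gi * gi for gi in g)) <= eps:
            break
        p = [-gi for gi in g]                       # Step 3
        t = golden_section(                         # Step 4: line search
            lambda s: f([xi + s * pi for xi, pi in zip(x, p)]), 0.0, t_max)
        x = [xi + t * pi for xi, pi in zip(x, p)]
    return x

def quad(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2

def quad_grad(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]

x_min = steepest_descent(quad, quad_grad, [0.0, 0.0])
```

For this test function the minimizer is (1, -2), and the exact line-search step always lies between 0.25 and 0.5, so the bracket `[0, 1]` is sufficient.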
The step size and step-length distribution function of the CS algorithm can be improved by using the steepest descent method, owing to its simplicity and flexibility. The final optimal solution is obtained by repeatedly adjusting the step size and the step-length distribution function.
2.4. Proposed Novel Model
Based on the above methodologies, we propose a novel short-term wind speed forecasting portfolio with three steps (Figure 1): (I) data preprocessing: both the regular normalization preprocessing model and the EMD approach are used for data preprocessing, which reduces the effect of noise and different scales on the wind speed data; (II) artificially intelligent parameter optimization introduction: the unknown parameters in the SVM model are optimized by the CS algorithm; (III) parameter optimization approach modification: although the original CS algorithm is simple and efficient, it has disadvantages such as insufficient search vigor and slow search speed during the latter part of the search. Therefore, this paper proposes an improved parameter optimization approach based on the CS algorithm and the steepest descent (SD) method, which we call the SDCS model. The final forecasting model is called the EMD-SDCS-SVM model.
The performance of the SVM depends on a good set of parameters, including the penalty parameter and the parameter of the kernel function. Parameter adjustment and selection for the support vector machine is still a difficult issue in the research field, because the generalization performance of the support vector machine is closely related to the selection of specific parameters in the model. The penalty coefficient $C$ and the kernel parameter $\sigma$ must be selected by the user. In practical applications, however, controlling the forecasting complexity is difficult because $C$ and $\sigma$ must be adjusted simultaneously.
(1) The Penalty Coefficient $C$. The penalty coefficient $C$ balances the model between complexity and training error, so that the model generalizes better. Furthermore, $C$ controls the robustness of the forecasting model. Different training groups have different optimal values of $C$. For forecasting problems, if $C$ is too small, the punishment for misfitted samples in the sample data is small. As a result, the training error becomes larger, and the system’s generalization ability is poorer. When new data are forecast by the model, the fitting error will be very high, and the phenomenon of “less learning” will appear. On the contrary, if $C$ is too large, the weight of the regularization term $\frac{1}{2}\|w\|^{2}$ will be smaller. Although the fitting error on the available data is very low, the fitting error on new data is very high. This is the so-called “over-learning” phenomenon, and the generalization ability of the model is still very poor. Each sample data group has at least one suitable $C$ that makes the SVM generalization performance the best. Therefore, the correct choice of $C$ can improve the prediction accuracy of the model.
(2) The Kernel Function Parameter $\sigma$. For the kernel function of the SVM, the linear kernel function, polynomial kernel function, radial basis kernel function, and sigmoid function are the most commonly used. The width $\sigma$ of the radial basis function is the same for all kernel functions in the model and reflects the width of the inner-product kernel around each input. If $\sigma$ is too small, it will lead to overfitting, that is, memorization of the training group. If $\sigma$ is too large, it will make the SVM discriminant function too smooth. The kernel width $\sigma$ and the penalty coefficient $C$ affect the shape of the prediction curve of the support vector machine from different angles. In practical applications, a penalty coefficient or kernel width that is too large or too small will make the generalization performance of the support vector machine worse.
Based on the analysis of the influence of each parameter on the performance of the SVM, we put forward a time series forecasting model that uses the modified cuckoo search (SDCS) algorithm to optimize the SVM parameters. It not only maintains the characteristics of the time series but also selects the parameters of the SVM automatically, which eliminates the blindness and randomness caused by manual selection. The main procedures of the EMD-SDCS-SVM model are as follows.
Procedure 1. Collect wind speed time series data. Use the EMD to preprocess the wind speed data and reconstitute the new wind speed time series, which will be treated as the training sample of the SVM model.
Procedure 2. Determine the ranges of $C$ and $\sigma$, the maximum step, the minimum step, and the maximum number of iterations. Set the probability of an egg laid by a cuckoo being discovered by the host bird, and initialize the number of host nests. Each nest corresponds to a two-dimensional vector $(C, \sigma)$.
Procedure 3. Search for the optimum value of the two-dimensional vector $(C, \sigma)$ according to the SDCS algorithm; the detailed steps that need to be implemented in this procedure are shown in Figure 2.
Procedure 4. Use the optimum parameter values obtained in Procedure 3 and the processed data obtained in Procedure 1 to construct the forecasting model and obtain the forecasting results.
3. Simulation Examples and Discussions
3.1. Data Division and Parameter Initialization
Wind speed data recorded by four wind turbines (numbered #1, #2, #3, and #4) during the period from Jan 2, 2011, to Jan 6, 2011, with a time resolution of 10 minutes are used to verify the effectiveness of the newly proposed hybrid model. The data from Jan 2 to Jan 5 are adopted as the in-sample data (i.e., training data), while those on Jan 6 are used as the out-of-sample data (i.e., testing data).
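The sizes of the two subsets follow directly from the sampling interval. The sketch below enumerates the 10-minute time stamps for the five days and splits them at midnight on Jan 6; it assumes a complete record with no missing observations, which the data description does not state explicitly, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

def ten_minute_stamps(start, end):
    """Time stamps from start (inclusive) to end (exclusive), 10 minutes apart."""
    stamps, t = [], start
    while t < end:
        stamps.append(t)
        t += timedelta(minutes=10)
    return stamps

stamps = ten_minute_stamps(datetime(2011, 1, 2), datetime(2011, 1, 7))
split = datetime(2011, 1, 6)

# In-sample (training) indices: Jan 2 to Jan 5; out-of-sample (testing): Jan 6.
train_idx = [i for i, t in enumerate(stamps) if t < split]
test_idx = [i for i, t in enumerate(stamps) if t >= split]
```

With 144 observations per day, a complete record gives 576 training points and 144 testing points per turbine.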
Step 1. The original wind speed series are decomposed into a high-frequency component and a low-frequency component, which represent the noise signal and the main features of the wind speed series, respectively (see Figures 3(a)–3(c)).
Step 2 (data splitting and normalization). The wind speed series obtained after noise reduction are split into a training set, whose input and output sets are used to train the parameters of the SVM, and a test set, whose inputs and outputs are used to test the model’s forecasting effectiveness. To establish the model, the training datasets and the input test sets are normalized with the same setting (see Figure 3(d)).
Step 3 (initialization: an SVM with two parameters). The penalty coefficient and the kernel function parameter are shown in Figure 3(e). The number of SVM parameters to be optimized determines the dimension of the candidate solutions in the SDCS algorithm.
Step 4 (optimization). The objective function of the SDCS algorithm is given as follows:
Step 5 (SVM construction). The best solution obtained by the SDCS algorithm is used as the final parameter set for SVM training and construction. Training terminates when the maximum number of iterations is reached or when no further improvement is obtained (see Figures 3(d)–3(e)).
Step 6 (EMD-SDCS-SVM construction for the test dataset). The forecasts for the output test sets are generated by feeding the input test sets into the established optimal SVM (see Figure 3(e)).
Step 7 (evaluation). The quality of the EMD-SDCS-SVM is assessed by the indices SDCS and SVM, which present the validity and informativeness of EMD, respectively. For a comprehensive evaluation, MAPE is calculated as well.
To employ the methodologies introduced in Section 2 of this paper, the parameters contained in the models are initialized as follows: in the CS algorithm, the number of the host nests is initialized as , and the probability of an egg laid by a cuckoo being discovered by the host bird is given as . The Gaussian kernel function is chosen for the SVM method. In the GA, the maximum number of iterations is initialized as 50, and the population size is 100. The crossover probability is 0.3 and the mutation probability is 0.1. When the CS algorithm and the GA are adapted for SVM optimization, the search interval of the penalty coefficient is set to , while the search interval of the kernel function parameter is set to .
3.2. Data Preprocessing Results
Wind speed data are first preprocessed by the EMD method. Figure 4 shows the IMFs and residue results obtained by the EMD method for the four wind turbines. As indicated in Figure 4, for the #2 and #3 wind turbines, 7 IMF sequences are extracted from the original wind speed training dataset, while 6 IMF sequences are extracted for the other two wind turbines. According to the principle of denoising, eliminating the high-frequency sequence from the IMF sequences helps to obtain a cleaner data sequence, that is, a data sequence with lower noise. In this paper, the first IMF sequence obtained by the EMD method is eliminated from the original data sequence to improve the accuracy of wind speed forecasting. The denoising performed by the EMD method for the four wind turbines is visualized in Figure 5. The final results after denoising with the EMD method and the normalization operation are also presented in Figure 5.
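The denoising rule described here amounts to reconstructing the series from all components except the first IMF. A minimal sketch is given below; the function name, the `drop` argument, and the toy numbers are illustrative and are not taken from the wind turbine data.

```python
def denoise(imfs, residue, drop=1):
    """Rebuild the series from the residue and all IMFs except the first `drop`.
    Equivalent to subtracting the dropped IMFs from the original series."""
    kept = imfs[drop:]
    return [sum(c[t] for c in kept) + residue[t] for t in range(len(residue))]

# Toy decomposition: one noisy high-frequency IMF, one smoother IMF, a residue.
toy_imfs = [[1.0, -1.0, 1.0], [0.5, 0.5, 0.5]]
toy_residue = [2.0, 2.0, 2.0]
clean = denoise(toy_imfs, toy_residue)
```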
3.3. Forecasting Results
To validate the effectiveness of the EMD-SDCS-SVM model in wind speed forecasting, the model is used to forecast wind speed with four horizons: 1-step-ahead, 2-step-ahead, 4-step-ahead, and 6-step-ahead. The forecasting results obtained by this model are compared with those obtained by the method without parameter optimization, EMD-SVM, the method with unmodified parameter optimization, EMD-CS-SVM, and another method with parameter optimization, EMD-GA-SVM, where GA is the abbreviation for the Genetic Algorithm [23].
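One common way to build training samples for an h-step-ahead forecast is the direct strategy, in which each input holds the most recent observations and the target is the value h steps later. The sketch below illustrates this construction; the direct strategy, the lag count `n_lags`, and the function name are assumptions, since the input design is not specified in the text.

```python
def make_samples(series, n_lags, horizon):
    """Pair each window of n_lags past values with the value horizon steps ahead."""
    inputs, targets = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        inputs.append(series[t - n_lags:t])      # observations up to time t-1
        targets.append(series[t + horizon - 1])  # value horizon steps after t-1
    return inputs, targets

series = list(range(10))
X2, y2 = make_samples(series, n_lags=3, horizon=2)
```

With a 10-minute resolution, horizons of 1, 2, 4, and 6 steps correspond to forecasts 10, 20, 40, and 60 minutes ahead.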
Figure 6 presents the forecasting results of the four EMD-based models. In this figure, the wind speed at the center of the circular rings is the smallest, with a value of 0, and the larger the radius, the larger the wind speed value. The difference in radius between two adjacent circular rings is 5. As shown in Figure 6, the forecasting results obtained by these EMD-based models fit the actual wind speed data best when the forecasting horizon is 1-step-ahead, while the fit is the worst in the 6-step-ahead situation; that is, the deviation between the wind speed data forecast by the models and the actual wind speed data becomes larger as the forecasting horizon increases. In addition, the EMD-SVM and the EMD-GA-SVM methods deviate much more significantly from the actual data than the other models.
In addition, the forecast results obtained by these models are analyzed according to the Quantile-Quantile (Q-Q) plot. The $f$ quantile corresponding to a datum $q(f)$ means that approximately a decimal fraction $f$ of the data can be found below the datum. The quantile is calculated in the following manner: sort the data in a sequence in ascending order. The sorted data $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ have ranks $i = 1, 2, \ldots, n$. Then, the quantile value $f_i$ for the datum $x_{(i)}$ is computed by
$$f_i = \frac{i - 0.5}{n}.$$
The 0.25, 0.5, and 0.75 quantiles are called the lower quantile, the median, and the upper quantile, respectively. The Q-Q plot is used to compare the quantiles of two samples. If the two samples come from the same type of distribution, the plot will be a straight line. A straight reference line that passes through the lower quantile and the upper quantile is helpful for assessing the Q-Q plot. The greater the distance from this reference line, the more likely it is that the two samples come from populations with different distributions. The vertical and the horizontal axes of the Q-Q plot are the estimated quantiles from the two samples, respectively. If the sizes of these two samples are the same, the Q-Q plot is just a plot of the sorted data in the first sample against the sorted data in the second sample. As an example, Figure 7 provides an empirical Q-Q plot of the quantiles of the actual wind speed sequence versus the quantiles of the forecast data for the #4 wind turbine, with one subplot for each of the wind speed data sequences forecast by the EMD-SVM model, the EMD-CS-SVM model, the EMD-GA-SVM model, and the EMD-SDCS-SVM model. The straight line shown in each subplot is the extrapolated line which joins the lower and the upper quantiles, and the vertical axis and the horizontal axis in each subplot are the estimated quantiles from the corresponding forecast data sequence and the actual data sequence. As observed from Figure 7, the forecast values are sometimes larger than the actual values (corresponding to the plus symbols located above the straight line) and sometimes smaller than the actual values (corresponding to the plus symbols located below the straight line). Figure 7 also reveals that the EMD-SDCS-SVM model fits the actual wind speed data best when compared to the other three models.
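For two samples of equal size, the points of a Q-Q plot can be obtained simply by sorting both samples, as noted above. The sketch below computes these pairs together with the quantile levels of the sorted data; the convention (i - 0.5)/n for the level of the i-th sorted value is a common choice and is assumed here, and the function names and numbers are illustrative.

```python
def quantile_levels(n):
    """Quantile level assigned to each of n sorted data points (assumed convention)."""
    return [(i - 0.5) / n for i in range(1, n + 1)]

def qq_pairs(actual, forecast):
    """Points of an empirical Q-Q plot for two samples of equal size:
    (sorted actual value, sorted forecast value)."""
    if len(actual) != len(forecast):
        raise ValueError("this sketch only handles samples of equal size")
    return list(zip(sorted(actual), sorted(forecast)))

levels = quantile_levels(4)
pairs = qq_pairs([3.0, 1.0, 2.0], [30.0, 10.0, 20.0])
```

Points lying close to a straight line indicate that the forecast and actual sequences follow the same type of distribution.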
3.4. Forecasting Error Comparison
Results presented in Section 3.3 provide graphical visualization of the performance of the different forecasting models. In this section, the superior performance of the EMD-SDCS-SVM model is shown quantitatively. To do this, two error evaluation criteria, the root mean squared error (RMSE) and the mean absolute percentage error (MAPE), are adopted and defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}, \qquad \mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%,$$
where $N$ is the number of data points in the out-of-sample data and $y_i$ and $\hat{y}_i$ are the actual value and the forecasted value, respectively.
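Both criteria are straightforward to compute from paired actual and forecast values, as the short sketch below shows. The function names and the sample numbers are illustrative and are not results from this study; MAPE is undefined when an actual value is zero, which the sketch does not handle.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error, in the units of the data."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n

example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_mape = mape([2.0, 4.0], [1.0, 5.0])
```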
From Table 1 and Figure 8, it can be seen that, compared with the ANN forecasting models, the SVM models achieve favorable forecasting accuracy; in particular, in the four- and six-step-ahead forecasting results, the SVM is superior to the ANN models BPNN, Elman NN, and WNN.
The forecasting error results of these four models with different forecasting horizons are given in Table 2 and Figure 9. As observed from Table 2 and Figure 9, the forecasting error values become larger as the forecasting horizon increases. For the #1, #2, and #4 wind turbines, the EMD-SDCS-SVM model always obtains more accurate wind speed forecasting results than the other three models. In addition, for the #3 wind turbine, the EMD-SDCS-SVM model is superior to both the EMD-SVM and the EMD-CS-SVM models, which means that the proposed novel model EMD-SDCS-SVM has made promising predictions and performs better than its individual forecasting components.
4. Conclusions
Wind speed forecasting plays a significant part in the economical and secure operation of wind farm systems. Accurate forecasting results have a significant influence on the economy. Recently, academia and industry have paid more attention to wind speed forecasting. More accurate forecasting could reduce costs and risks, improve the security of power systems, and help administrators develop an optimal action program, thereby enhancing the economic and social benefits of power-grid management. Therefore, it is highly desirable to develop techniques that improve the accuracy of wind speed forecasting. However, individual models do not always achieve a desirable performance. A properly selected hybrid model can reduce certain negative effects that are inherent to each of these individual models; moreover, the hybrid forecasting model can make full use of the advantages of each of the individual models and is less sensitive, in certain cases, to the factors that make the individual models perform in an undesirable manner.
In this paper, to enhance forecasting capacity, the proposed combined model integrates three procedures: the data preprocessing procedure, the artificial intelligence parameter optimization introduction procedure, and the parameter optimization approach modification procedure. The SVM model used in this paper can handle data with nonlinear features, and the SD technique is adopted to enhance the convergence speed of the CS algorithm, which is utilized to optimize the parameters in the SVM model. The effectiveness and robustness of the proposed approach have been successfully tested on real wind speed data sampled at four wind turbines. Based on the Q-Q plot and the error comparison, results show that the developed portfolio EMD-SDCS-SVM has made promising predictions and performs better than its individual forecasting components, with very small MAPE and RMSE values. For instance, the average MAPE values of the combined model were 0.7138%, 1.0281%, 4.8394%, 0.9239%, and 7.3367%, which are lower than those of BPNN, WNN, and Elman NN. By improving forecasting accuracy and stability, a large amount of money and energy could be saved in the wind farm. The hybrid model can be applied to forecast the wind speed used in wind power scheduling, producing various benefits: savings on economic dispatching, lower production costs, and a smaller spinning reserve capacity for the electrical power system. This model is also useful for supporting wind farm decision making in practice. The combined forecasting model, which has high precision, is a promising model for use in the future. In addition, this hybrid model can be utilized in other forecasting fields, such as product sales forecasting, tourism demand forecasting, early warning and flood forecasting, and traffic-flow forecasting.
Competing Interests
The authors declare that they have no competing interests.