Abstract

A hybrid model combining Ensemble Empirical Mode Decomposition (EEMD) and Least Square Support Vector Machine (LSSVM) is proposed to improve short-term wind speed forecasting precision. The EEMD is first used to decompose the original wind speed time series into a set of subseries. LSSVM models are then established to forecast these subseries. The partial autocorrelation function is adopted to analyze the inner relationships within the historical wind speed series in order to determine the input variables of the LSSVM model for each subseries. Finally, the superposition principle is employed to sum the predicted values of all subseries as the final wind speed prediction. The performance of the hybrid model is evaluated with six metrics. Compared with LSSVM, Back Propagation Neural Networks (BP), Auto-Regressive Integrated Moving Average (ARIMA), the combination of Empirical Mode Decomposition (EMD) with LSSVM, and the hybrid of EEMD with ARIMA, the proposed hybrid model performs better on all six metrics. Furthermore, scatter diagrams of predicted versus actual wind speed and histograms of prediction errors are presented to verify the superiority of the hybrid model in short-term wind speed prediction.

1. Introduction

With rapid economic development and increasing environmental pollution, renewable, clean, and pollution-free wind energy shows promising application prospects. In 2016, the total installed wind power capacity worldwide increased to 486.7 GW, with 54.8 GW newly installed; the corresponding global growth rate was 11.8%. In China, the newly installed wind power capacity was 23.4 GW and the total capacity reached 168.7 GW, a growth rate of 14.0% [1]. To promote the large-scale application of wind energy, electricity generated from wind power needs to be connected to the grid so that it can be transferred to areas that consume large amounts of electric energy [2, 3]. Furthermore, while wind energy is being transformed into electricity, accurate control is needed to obtain higher generation efficiency [4, 5]. However, the intermittent and random nature of wind speed poses great challenges to the integration of wind power and to the control of wind turbines. Therefore, accurate short-term wind speed prediction models should be built to overcome these challenges [6, 7].

There are mainly three kinds of wind speed prediction methods, namely, physical, statistical, and hybrid approaches. The principle underlying the physical model is to find the mapping relation among wind speed, temperature, pressure, and moisture and to build thermodynamic formulae from numerical weather prediction (NWP) models [8]. This kind of model is good at medium- and long-term wind speed prediction. However, detecting and collecting these data requires many sensors, which can be very expensive. Furthermore, solving this kind of model requires complex calculations. These disadvantages limit the application of the physical model. The statistical model is used more widely than the physical model for short-term and ultra-short-term wind speed prediction. Building a statistical model is relatively simple: it seeks the rules hidden in the wind data by studying the historical wind speed time series and uses these rules to forecast wind speed without considering meteorological conditions [9]. The basic idea of the hybrid approach is to combine different approaches (numerical weather models, statistical approaches, and machine learning algorithms), retaining the advantages of each. Because different prediction models can complement one another in capturing the properties of wind speed time series, a hybrid method may perform much better than any individual forecasting model [8, 9].

In this paper, a hybrid model that combines Ensemble Empirical Mode Decomposition (EEMD) with Least Square Support Vector Machine (LSSVM) is proposed to forecast short-term wind speed accurately. The hybrid model makes use of the decomposition ability of the EEMD algorithm as well as the forecasting strength of the LSSVM model. The EEMD is first applied to decompose the original wind speed time series into a set of relatively stable subseries, each involving only one mode. Then the partial autocorrelation function (PACF) is used to analyze the characteristics of each subseries so as to determine suitable inputs of the LSSVM model for that subseries. Next, the LSSVM model is employed to produce one-step-ahead and multi-step-ahead predictions for the corresponding subseries. Finally, the forecasts of all the subseries are integrated as the final wind speed predictions. To validate the prediction performance of the proposed hybrid model (EEMD-LSSVM), five further models are used for comparison: LSSVM, Back Propagation (BP) Neural Networks, Auto-Regressive Integrated Moving Average (ARIMA), the hybrid of EEMD with ARIMA, and the hybrid of Empirical Mode Decomposition (EMD) with LSSVM. All models produce one-step-ahead and multi-step-ahead predictions for a 10-min interval wind speed time series collected from a wind farm in Galicia, Spain. Furthermore, the scatter diagrams of predicted versus actual wind speed and the histograms of prediction errors of the different models are discussed in detail. They demonstrate that the short-term wind speed prediction obtained by the proposed hybrid model has a higher correlation with the actual wind speed series. The forecasting results show that the proposed hybrid model outperforms the compared models for short-term wind speed prediction.

The remainder of this paper is organized as follows. Section 2 discusses the related work on short-term wind speed predictions. Section 3 overviews the Ensemble Empirical Mode Decomposition. Section 4 reviews the basic principle of least square support vector machine. Section 5 presents the hybrid EEMD-LSSVM model for prediction of wind speed. Section 6 gives experimental and comparison results of two cases. Section 7 summarizes the conclusions. Acknowledgments are listed at the end.

2. Related Work

In this section, we review a few popular approaches for prediction of short-term wind speed. Time series models, such as autoregressive moving average (ARMA) [10] and ARIMA [11], are widely used for short-term wind speed prediction. In recent years, some improved time series models have been proposed for wind forecasting. Kavasseri and Seetharaman [12] apply the fractional-ARIMA model on wind speed forecasting on the day-ahead and two-day-ahead horizons at four potential wind generation sites located in North Dakota, USA. The simulated results show that fractional-ARIMA outperforms ARIMA when wind speed series appear to have long-memory characteristics. Erdem and Shi [13] put forward four approaches based on ARMA for prediction of hourly wind speed, and satisfactory simulation results were obtained. Ambach and Schmid [14] built a time-varying regression model for short-term wind speed prediction, composed of Autoregressive Fractionally Integrated Moving Average (ARFIMA) and Asymmetric Power Generalized Autoregressive Conditional Heteroscedasticity (APARCH). The forecasting results prove that the ARFIMA-APARCH model has a higher prediction precision, as compared with Support Vector Regression (SVR). However, prediction performances of the time series model depend on the selection of model parameters. Furthermore, the time series model cannot capture the nonlinear component of the wind speed series, which limits the forecasting accuracy of the model. Salcedo-Sanz et al. [15] present the application of evolutionary-based approaches to optimize the hyperparameters in Support Vector Machines (SVM) and discuss the performance of evolutionary SVR algorithm for wind speed prediction in a wind farm located at the south of Spain. References [16, 17] study the performance of hourly ahead wind speed forecasting by LSSVM, in which the typical kernel function (linear, polynomial, and Gaussian kernels) and the related parameters on the forecasting accuracy are investigated. 
An adaptive robust methodology (Bayesian model averaging) has been developed to forecast hourly wind speed at two North Dakota sites [18]. This Bayesian model averaging methodology provides a unified approach to tackle the challenging model selection issue in wind speed forecasting. By incorporating domain knowledge about wind speed as a prior, a Bayesian structural break model is proposed to forecast very short-term wind speed, and it is tested with wind speed data collected from utility-scale wind turbines [19]. Cadenas and Rivera [20] make use of Artificial Neural Networks (ANN) to forecast the hourly wind speed series collected from observation points located in La Venta, Oaxaca, Mexico. The results show that this model has high prediction accuracy for short-term wind speed forecasting. The application of different ANN models (Adaptive Linear Element, BP, and radial basis function) to one-hour-ahead wind speed forecasting is investigated in [21, 22], where the ANN models are found to be successful and feasible. Flores et al. [23] compare the wind speed prediction performance of ARMA and ANN models, both optimized by Genetic Algorithms (GA). Their results demonstrate that the ANN model outperforms the ARMA model. Hong et al. [24] propose a Multilayer Feed-forward Neural Network (MFNN) to predict wind speed at different time horizons. The MFNN model performs better than the traditional forecasting models. Wu et al. [25] make use of the single multiplicative neuron model and iterated nonlinear filters to build two models to forecast wind speed. Forecasting results on hourly wind speed data collected from three stations show that the two proposed models are better than ARMA, ANN, Kernel Ridge Regression, and the single multiplicative neuron model. 
Although both time series models and machine learning models can improve the prediction accuracy to some extent after their parameters are optimized, highly complex wind speed time series are still beyond their descriptive capability. Because of the various inherent modes in wind speed series, a single model, such as ARIMA or ANN, cannot completely capture the characteristics of wind speed data [26]. This limits the performance of both kinds of model for wind speed prediction.

Recently, hybrid approaches have focused on improving wind speed prediction precision. Different hybrid wind speed prediction models have been proposed in the literature in order to benefit from the unique capabilities of single models [27–37]. Salcedo-Sanz et al. [27] propose the hybridization of the fifth-generation mesoscale physical forecasting model (MM5) with neural networks for short-term wind speed prediction at a wind park located at Albacete in Spain. This hybrid model integrates a global numerical weather prediction model and observations at different heights as initial and boundary conditions for the MM5 model. Subsequently, an ANN model is employed to forecast the wind speed by using the outputs of the MM5 model. The wind speed forecast obtained by the hybrid model is close to the measured value. Based on different global forecasting models and banks of regression SVM, Ortiz-García et al. [28] improve a wind speed prediction system on a wind farm in southeast Spain. Several SVM structures are adopted to manage the diversity in input data arising from different global forecasting models and several parameterizations of a mesoscale model. The system implementing SVM outperforms a similar system using multilayer perceptrons. Shukur and Lee [29] combine ARIMA, ANN, and the Kalman filter (KF) into a new algorithm to forecast daily wind speed, and the proposed hybrid model outperforms the ARIMA, ANN, and KF models. Guo et al. [30] propose a hybrid model composed of Seasonal Auto-Regression Integrated Moving Average (SARIMA) and LSSVM for forecasting the monthly wind speed of two observation stations located in the Hexi Corridor in China. The simulation results show that the hybrid SARIMA and LSSVM model has a stronger forecasting ability than ARIMA, LSSVM, SARIMA, and ARIMA-SVM. Liu et al. [31] discuss two hybrid models (ARIMA-ANN, ARIMA-Kalman) for prediction of hourly wind speed. 
The results of two cases show that the forecasting precisions of both hybrid ARIMA-ANN and ARIMA-Kalman models are higher than those of single models. Yu et al. [32] combine the Gaussian Mixture Copula Model (GMCM) and Gaussian Process Regression (GPR) as a wind speed forecasting method. The results show that GMCM-GPR has a higher prediction accuracy than GMCM-ARIMA and GMCM-SVR. Bouzgou and Benoudjit [33] discuss a Multiple Architecture System (MAS) for wind speed prediction, which consists of Multiple Linear Regression (MLR), Multi-Layer Perceptrons (MLP), Radial Basis Functions (RBF), and SVM combined by different integration strategies. The MAS improves the forecasting precision to some extent, compared with the traditional methods. Chen and Yu [34] integrate the Unscented Kalman Filter (UKF) with SVM as an approach to predict wind speed collected from three observation stations located in Massachusetts, USA. The results show that the hybrid UKF-SVM model outperforms ANN, SVR, and Autoregressive Integrated with Kalman filter (AR-Kalman) in both one-step- and multi-step-ahead forecasting in terms of accuracy. However, such hybrid models suffer from the problem of selecting proper submodels. Furthermore, assigning a suitable weight to each submodel is also a challenge. These defects limit the prediction performance of the hybrid model. The main procedure of other hybrid models is to utilize signal processing technology to decompose the original wind speed time series into a set of subseries and then to build prediction models for each subseries. Liu et al. [35] combine Wavelet, Wavelet Packet, Time Series Analysis, and ANN to predict half-hourly wind speed data. In their study, the results show that the hybrid models with signal processing technology have better forecasting precision than ARIMA and other traditional models. 
Guo et al. [36] combine Empirical Mode Decomposition (EMD) with a Feed-Forward Neural Network (FNN) as a hybrid model for forecasting monthly and daily wind speed. The results show that the modified EMD-FNN model has better performance for multi-step-ahead prediction than traditional models. Hu et al. [37] combine the Empirical Wavelet Transform (EWT), Coupled Simulated Annealing (CSA), and LSSVM as a hybrid model for short-term wind speed forecasting. Liu et al. [38] combine Extreme Learning Machines (ELM) with four signal decomposing algorithms for wind speed prediction. The forecasting results show that the hybrid models are better than single ELM models.

It can be seen that combining signal processing technology with an efficient forecasting model is a promising way to improve wind speed prediction accuracy. The main data decomposition techniques are Wavelet Decomposition (WD) and EMD. WD has strong time-frequency analysis ability, and hybrid models based on WD have stronger prediction ability than the traditional models for wind speed prediction. However, WD is very sensitive to parameter selection. Some issues, such as the selection of the decomposition level and the choice of the type and order of the mother wavelet used in WD, have not been solved. These problems hinder further application of WD to wind speed prediction. EMD decomposes the data according to their own inherent time scales and therefore avoids the need to set a basis function. This essential difference enables EMD to overcome the defects of WD and gives it strong processing capacity for nonstationary and nonlinear data sets. However, further study reveals that EMD suffers from the mode mixing problem, which limits the decomposition effect [39]. The EEMD algorithm was therefore proposed; it overcomes the mode mixing problem by using an assistant procedure of adding white noise. As an efficient prediction algorithm, LSSVM overcomes the disadvantages of ANN, such as the requirement of a large training set, getting trapped in local extrema, and overfitting.

3. Overview of Ensemble Empirical Mode Decomposition

The EEMD algorithm was proposed by Wu and Huang in 2009 [40] to overcome the mode mixing problem of EMD, in which a single intrinsic mode function (IMF) contains signals of different scales, or a signal of the same scale is distributed over different IMFs. The basic idea of the EEMD is that the observed data combine the real time series and noise. Although individual observations of the time series exhibit different noise levels, their ensemble mean is close to the real time series. Based on this idea, the EEMD algorithm uses the assistant procedure of adding white noise. The noise-added signal is then decomposed into a series of IMFs by using EMD. The added white noise series cancel one another out in the ensemble average. Therefore, the EEMD algorithm shows a more powerful processing capacity for complex and unstable time series data sets than EMD. The flowchart of the EEMD algorithm is given in Figure 1.

The EEMD modeling process is given as follows.

Step 1. Add a white noise series $n_i(t)$ to the original time series $x(t)$. The noise-added signal is as follows:
$$x_i(t) = x(t) + n_i(t).$$

Step 2. Make use of the EMD algorithm to decompose $x_i(t)$ into several IMFs $c_{i,j}(t)$ and a residual $r_i(t)$, where $c_{i,j}(t)$ is the $j$th IMF at the $i$th trial and $r_i(t)$ is the residual at the $i$th trial. The implementation process of EMD is shown as follows:
(1) Let $s(t) = x_i(t)$ and $h(t) = s(t)$.
(2) Find all the local maxima and minima of $h(t)$.
(3) Construct the upper envelope $u(t)$ and the lower envelope $l(t)$ by interpolating all the local maxima and all the local minima, respectively, with a cubic spline function.
(4) Calculate the average value of the upper and lower envelopes, $m(t) = [u(t) + l(t)]/2$. Extract the average value from $h(t)$ and define $d(t)$ as
$$d(t) = h(t) - m(t).$$
Check the properties of $d(t)$: if $d(t)$ satisfies the stop criteria for screening an IMF, the IMF is set as $c_{i,j}(t) = d(t)$ and the residual error can be calculated with $r(t) = s(t) - c_{i,j}(t)$. Otherwise, replace $h(t)$ by $d(t)$ and go back to step (2).
(5) If $r(t)$ satisfies the stop criteria of EMD, then $r_i(t) = r(t)$ is the residual error. Otherwise, replace $s(t)$ and $h(t)$ by $r(t)$ and go back to step (2).

After the sifting process, the time series $x_i(t)$ can be described as the sum of all IMFs and a residual error:
$$x_i(t) = \sum_{j=1}^{J} c_{i,j}(t) + r_i(t),$$
where $J$ is the number of IMFs; $c_{i,j}(t)$ is the $j$th IMF when adding the $i$th noise; $r_i(t)$ is the final residual error when adding the $i$th noise.

Step 3. Repeat Steps 1 and 2 $M$ times with different white noise $n_i(t)$, $i = 1, 2, \ldots, M$.

Step 4. Calculate the ensemble mean of all the corresponding IMFs over the $M$ trials as the final result. The calculation formula is as follows:
$$c_j(t) = \frac{1}{M} \sum_{i=1}^{M} c_{i,j}(t), \quad j = 1, 2, \ldots, J,$$
where $c_j(t)$ represents the $j$th IMF of the EEMD decomposition results; $J$ is the number of IMFs.
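
Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names, the fixed number of sifting passes (used here in place of a formal IMF stop criterion), the end-point treatment of the envelopes, the noise amplitude `noise_std`, and the ensemble size `n_trials` are all hypothetical choices, and the residual is simply defined as whatever remains after subtracting the averaged IMFs.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _envelope_mean(h):
    """Mean of the cubic-spline upper and lower envelopes; None if too few extrema."""
    t = np.arange(len(h))
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Assumption: pin both envelopes to the end points (simple boundary handling).
    up_idx = np.concatenate(([0], maxima, [len(h) - 1]))
    lo_idx = np.concatenate(([0], minima, [len(h) - 1]))
    upper = CubicSpline(up_idx, h[up_idx])(t)
    lower = CubicSpline(lo_idx, h[lo_idx])(t)
    return (upper + lower) / 2.0


def emd(x, max_imfs=8, n_sift=10):
    """Plain EMD; a fixed number of sifting passes stands in for the stop criterion."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _j in range(max_imfs):
        h = residual.copy()
        m = _envelope_mean(h)
        if m is None:                 # residual has too few extrema: stop
            break
        for _k in range(n_sift):
            h = h - m                 # subtract the envelope mean
            m = _envelope_mean(h)
            if m is None:
                break
        imfs.append(h)
        residual = residual - h       # remove the extracted IMF
    return imfs, residual


def eemd(x, n_trials=50, noise_std=0.2, max_imfs=8, seed=0):
    """Average the IMFs obtained from n_trials noise-perturbed copies of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    imf_sum = np.zeros((max_imfs, len(x)))
    for _trial in range(n_trials):
        noisy = x + noise_std * np.std(x) * rng.standard_normal(len(x))
        imfs, _res = emd(noisy, max_imfs=max_imfs)
        for j, c in enumerate(imfs):  # a trial may yield fewer than max_imfs IMFs
            imf_sum[j] += c
    imfs_mean = imf_sum / n_trials
    residual = x - imfs_mean.sum(axis=0)  # defined so that all parts sum back to x
    return imfs_mean, residual
```

Calling `eemd(x)` on a one-dimensional array returns an array of averaged IMFs (one per row) and a residual of the same length as `x`.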

4. The Basic Principle of Least Square Support Vector Machine

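The LSSVM proposed by Suykens and Vandewalle in 2000 [41] is a modified version of SVM based on the least squares theory. Compared with SVM, the LSSVM replaces a complex quadratic programming problem with a set of relatively simple linear equations. Consider a given training data set $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^n$ is the input of the sample space, $y_i \in \mathbb{R}$ is the output of the sample space, and $N$ is the size of the training sample. In the feature space, the regression model can be formulated as
$$y(x) = w^{T} \varphi(x) + b,$$
where $w$ is the weight vector; $b$ is a bias term; $\varphi(\cdot)$ is a nonlinear function that maps the input space into the feature space.

The solution for the regression equation can be realized by minimizing the fitting error of the training data. The expression is given as follows:
$$\min_{w, b, e} J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^{2}, \quad \text{s.t. } y_i = w^{T} \varphi(x_i) + b + e_i, \; i = 1, 2, \ldots, N,$$
where $\gamma$ is the adjustment parameter; $e_i$ is the error between the real value and the estimated value of the $i$th sample.

To solve this optimization problem, the following Lagrange function is constructed:
$$L(w, b, e, \alpha) = J(w, e) - \sum_{i=1}^{N} \alpha_i \left[ w^{T} \varphi(x_i) + b + e_i - y_i \right],$$
where $\alpha_i$ represents the Lagrange multiplier.

According to the Karush-Kuhn-Tucker (KKT) conditions, the following expression can be obtained:
$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N} \alpha_i \varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \alpha_i = 0, \quad \frac{\partial L}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i, \quad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow w^{T} \varphi(x_i) + b + e_i - y_i = 0. \quad (9)$$

After eliminating $w$ and $e_i$, the final mathematical expression (9) can be rewritten as
$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix},$$
where $\mathbf{1} = [1, 1, \ldots, 1]^{T}$ is an $N$-dimension vector; $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]^{T}$ represents the coefficient vector; $y = [y_1, y_2, \ldots, y_N]^{T}$; $\Omega$ represents the kernel function matrix whose element at the $i$th ($i = 1, 2, \ldots, N$) row and $j$th ($j = 1, 2, \ldots, N$) column is $\Omega_{ij} = \varphi(x_i)^{T} \varphi(x_j) = K(x_i, x_j)$. $K(\cdot, \cdot)$ is a kernel function. The radial basis function (RBF) is often used as the kernel function in LSSVM. The expression of the RBF is as follows:
$$K(x_i, x_j) = \exp\left( -\frac{\| x_i - x_j \|^{2}}{2 \sigma^{2}} \right),$$
where $\sigma$ is the hyperparameter of the RBF.

Therefore, the regression equation of the LSSVM model can be written as
$$y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b.$$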
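
Because training reduces to a single linear system in the bias and the Lagrange multipliers, an LSSVM regressor takes only a few lines of NumPy. The following is a minimal sketch under assumptions, not the authors' code: the function names are hypothetical, the RBF kernel is written with $2\sigma^2$ in the denominator, and the default values of the regularization parameter `gamma` and kernel width `sigma` are placeholders rather than tuned values.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every pair of rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]            # bias b, multipliers alpha


def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Evaluate y(x) = sum_i alpha_i K(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Usage: fit a noise-free sine curve and predict at the training inputs.
X = np.linspace(0.0, 2.0 * np.pi, 40).reshape(-1, 1)
y = np.sin(X).ravel()
b, alpha = lssvm_fit(X, y)
y_fit = lssvm_predict(X, b, alpha, X)
```

The first row of the linear system enforces that the multipliers sum to zero, which is the KKT condition obtained by differentiating with respect to the bias.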

5. The Hybrid of EEMD with LSSVM Model for Prediction of Wind Speed

Short-term wind speed series often include many intrinsic mode functions with different frequencies and exhibit complex nonlinear characteristics, so it is difficult for a single model to forecast wind speed accurately. Signal decomposition algorithms can decompose a wind speed time series into a set of relatively stable subseries and reduce the modeling difficulty. The subseries decomposed by EEMD include only one scale, or similar scales, of wind speed and are therefore easier to predict. The advantages of LSSVM make it a suitable prediction model. Therefore, a hybrid of EEMD with LSSVM (EEMD-LSSVM) is proposed to further improve short-term wind speed forecasting precision. EEMD is first used to decompose the wind speed time series into a set of subseries, and then LSSVM is applied to build a proper model for prediction of each subseries according to its own characteristics. The flow diagram of the proposed hybrid EEMD-LSSVM model is given in Figure 2. The EEMD-LSSVM model mainly includes the following steps. Firstly, the EEMD technique is used to decompose the original wind speed time series into a set of relatively stable subseries. Then the PACF technique is used to analyze the properties of every subseries so as to build a proper LSSVM model for its prediction. Next, one-step- and multi-step-ahead forecasting are conducted for each subseries. Finally, the predicted values of all subseries are integrated as the final prediction results of the wind speed.

5.1. Wind Speed Time Series Decomposition Based on EEMD

By analyzing the characteristics of wind speed, EEMD is used to decompose the original wind speed series into a series of IMFs $c_j(t)$ ($j = 1, 2, \ldots, J$) and a residual error series $r(t)$ ($t = 1, 2, \ldots, T$), where $J$ is the number of components and $T$ is the length of the time series. The decomposition process of the EEMD for the wind speed time series mainly includes the following steps. Firstly, a white noise series is added to the original wind speed time series. Then the EMD algorithm is adopted to decompose the white-noise-added wind speed signal, and the IMFs are obtained. This process is repeated many times, with different white noise added every time. Finally, the ensemble mean method is applied to obtain the final IMFs. In this way, the stochastic and volatile original wind speed series is transformed into a set of subseries that include only a single mode or similar modes and can be accurately captured by the LSSVM model. This decomposition process decreases the difficulty of forecasting wind speed. The subseries acquired by the EEMD decomposition correspond to modes with different frequencies. The subseries with the highest frequency represents the most irregular and stochastic component of the original wind speed. If the input-output relationship of the prediction model for this subseries is not appropriate, the forecast will be poor and the prediction precision will be reduced. The other subseries have relatively lower frequencies, weaker volatility, and different properties. To ensure the forecasting precision for the wind speed series, a proper input-output mapping relationship must be found for every subseries.

5.2. LSSVM Prediction Model for Wind Speed Subseries

The task of this section is to build a proper LSSVM prediction model for every subseries and then to conduct one-step- and multi-step-ahead forecasting. As different subseries have different frequencies, their properties are different. Therefore, the correlation structure of every subseries must be determined in order to build an appropriate LSSVM model for it. To determine the proper input variables of the LSSVM model for wind speed prediction, PACF is applied to analyze every subseries. The PACF can efficiently identify the correlation between the current value and several previous values of a wind speed time series. If a PACF value lies beyond the 95% confidence interval, the correlation is considered to be strong; otherwise the correlation is weak [42]. Therefore, after the PACF graph of a subseries is plotted, the number of lags beyond the 95% confidence interval is counted, and this gives the input number of the LSSVM model. We define one of the subseries acquired by EEMD decomposition as $\{c(1), c(2), \ldots, c(T)\}$, and the first $m$ data are used to determine the training parameters of the LSSVM model. With $p$ denoting the input number, the input vector Input and the output vector Output can be written as
$$\text{Input} = \begin{bmatrix} c(1) & c(2) & \cdots & c(p) \\ c(2) & c(3) & \cdots & c(p+1) \\ \vdots & \vdots & & \vdots \\ c(m-p) & c(m-p+1) & \cdots & c(m-1) \end{bmatrix}, \quad \text{Output} = \begin{bmatrix} c(p+1) \\ c(p+2) \\ \vdots \\ c(m) \end{bmatrix}.$$

The parameters with the best performance are finally selected after comparing the fitting errors of LSSVM model with different parameters for the training sample of wind speed. By using the same method, the LSSVM prediction models can be built for other subseries. These trained LSSVM models are then used to conduct one-step- and multi-step-ahead prediction.
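
The lag selection and the construction of the training pairs described in this subsection can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the PACF is estimated with the Durbin-Levinson recursion, and the 95% band is approximated by $\pm 1.96/\sqrt{n}$, which is a common large-sample choice that the source does not spell out.

```python
import numpy as np


def pacf(x, max_lag):
    """Partial autocorrelations for lags 1..max_lag via the Durbin-Levinson recursion."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    acf = np.array([np.dot(x[: n - k], x[k:]) for k in range(max_lag + 1)])
    acf = acf / acf[0]
    phi = np.zeros((max_lag + 1, max_lag + 1))
    out = np.zeros(max_lag)
    for k in range(1, max_lag + 1):
        num = acf[k] - np.dot(phi[k - 1, 1:k], acf[k - 1:0:-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], acf[1:k])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        out[k - 1] = phi[k, k]
    return out


def n_significant_lags(x, max_lag=20):
    """Count lags whose PACF lies outside the approximate 95% band +/- 1.96/sqrt(n)."""
    bound = 1.96 / np.sqrt(len(x))
    return int(np.sum(np.abs(pacf(x, max_lag)) > bound))


def make_supervised(series, p):
    """Inputs are rows [c(t-p), ..., c(t-1)]; the matching output is c(t)."""
    s = np.asarray(series, dtype=float)
    X = np.array([s[i:i + p] for i in range(len(s) - p)])
    y = s[p:]
    return X, y
```

For a subseries `c`, `p = n_significant_lags(c)` followed by `X, y = make_supervised(c[:m], p)` yields the training inputs and outputs for the first `m` points.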

5.3. Aggregate Calculation and Evaluation Indicators of Wind Speed Prediction

The integration method is adopted to sum the one-step- and multi-step-ahead forecasting results of all the subseries as the prediction results of the wind speed at the corresponding horizon. The proposed hybrid EEMD-LSSVM wind speed forecasting model makes use of the idea “decomposition, single forecast, and integration.” In Sections 5.1 and 5.2, the original wind speed is decomposed into several relatively stable subseries, and a proper LSSVM model is built for each subseries. In addition, one-step-ahead and multi-step-ahead predictions are produced for all the subseries. Here, one-step-ahead and multi-step-ahead predictions are defined as follows. Suppose that we are at the time index $h$ and are interested in forecasting $x_{h+l}$, where $l \geq 1$. The time index $h$ is called the forecast origin and the positive integer $l$ is the forecast horizon. Let $\hat{x}_h(l)$ be the prediction of $x_{h+l}$; we refer to $\hat{x}_h(l)$ as the $l$-step-ahead prediction of $x_t$ at the forecast origin $h$. When $l = 1$, we refer to $\hat{x}_h(1)$ as the one-step-ahead prediction of $x_t$ at the forecast origin $h$.
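
The source does not state whether the multi-step-ahead predictions are produced recursively or by training a separate model for each horizon. The sketch below assumes the recursive strategy, in which each one-step forecast is fed back as an input for the next step; `recursive_forecast` and `one_step_model` are hypothetical names, and `one_step_model` stands for any trained one-step predictor that maps the last `p` values to the next value.

```python
import numpy as np


def recursive_forecast(one_step_model, history, p, horizon):
    """Iterate a one-step predictor, feeding every forecast back as an input."""
    window = list(np.asarray(history, dtype=float)[-p:])
    forecasts = []
    for _step in range(horizon):
        y_hat = float(one_step_model(np.array(window)))
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]   # slide the input window forward by one step
    return forecasts


# Usage with a toy predictor that adds 1 to the most recent value.
toy_model = lambda w: w[-1] + 1.0
print(recursive_forecast(toy_model, [1.0, 2.0, 3.0], p=2, horizon=3))  # [4.0, 5.0, 6.0]
```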

The prediction values of all subseries are integrated as the final forecasting value of wind speed. The integration expression is given as follows:
$$\hat{x}(t) = \sum_{j=1}^{J} \hat{c}_j(t) + \hat{r}(t),$$
where $\hat{c}_j(t)$ is the prediction value for the $j$th IMF; $\hat{r}(t)$ is the forecasting result of the residual series; $\hat{x}(t)$ is the final prediction result of the original wind speed.

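To evaluate the forecasting precision of the hybrid model, the root mean square error (RMSE), the mean absolute percentage error (MAPE), the mean absolute error (MAE), the correlation coefficient (CC), the sum square error (SSE), and the standard deviation of error (SDE) are used as the test indices. The expressions of the six indices are given as follows:
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (x_t - \hat{x}_t)^{2}},$$
$$\text{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{x_t - \hat{x}_t}{x_t} \right| \times 100\%,$$
$$\text{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| x_t - \hat{x}_t \right|,$$
$$\text{CC} = \frac{\sum_{t=1}^{N} (x_t - \bar{x})(\hat{x}_t - \bar{\hat{x}})}{\sqrt{\sum_{t=1}^{N} (x_t - \bar{x})^{2} \sum_{t=1}^{N} (\hat{x}_t - \bar{\hat{x}})^{2}}},$$
$$\text{SSE} = \sum_{t=1}^{N} (x_t - \hat{x}_t)^{2},$$
$$\text{SDE} = \sqrt{\frac{1}{N-1} \sum_{t=1}^{N} (e_t - \bar{e})^{2}},$$
where $N$ is the number of test samples; $x_t$ represents the real wind speed value at time $t$; $\hat{x}_t$ is the predicted wind speed value at time $t$; $\bar{x}$ stands for the average value of all the real wind speed values; $\bar{\hat{x}}$ stands for the average of the wind speed forecast values; $e_t = x_t - \hat{x}_t$ is the prediction error at time $t$; $\bar{e}$ represents the average prediction error.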
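
The six indices can be computed together in a few lines. The sketch below is illustrative: the function name `evaluate` is hypothetical, MAPE is returned in percent, and SDE is taken as the sample standard deviation of the errors with an $N - 1$ denominator, which is an assumption about the normalization.

```python
import numpy as np


def evaluate(actual, predicted):
    """Return RMSE, MAPE (in %), MAE, CC, SSE, and SDE for two equal-length series."""
    x = np.asarray(actual, dtype=float)
    x_hat = np.asarray(predicted, dtype=float)
    e = x - x_hat                      # prediction errors
    n = len(x)
    return {
        "RMSE": np.sqrt(np.mean(e**2)),
        "MAPE": 100.0 * np.mean(np.abs(e / x)),
        "MAE": np.mean(np.abs(e)),
        "CC": np.corrcoef(x, x_hat)[0, 1],
        "SSE": np.sum(e**2),
        "SDE": np.sqrt(np.sum((e - e.mean())**2) / (n - 1)),  # assumed N - 1 form
    }


# Usage on a toy example with errors of +1 and -1.
scores = evaluate([2.0, 4.0, 6.0, 8.0], [1.0, 5.0, 5.0, 9.0])
```

MAPE divides by the actual value, so it is undefined whenever an observed wind speed is exactly zero; such samples would need separate handling.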

6. Experimental and Comparison Results of Two Cases

6.1. Case One
6.1.1. Wind Speed Data Set

The 10-minute interval wind speed data from a wind farm in Sotavento Galicia, Spain, collected from March 10, 2014, to March 19, 2014, are used to demonstrate the modeling capabilities of the proposed model for one-step-ahead and multi-step-ahead wind speed predictions. The study includes 1440 observed wind speed samples. The 10-minute interval wind speed time series is shown in Figure 3, which displays the volatile characteristics of wind speed over time. The wind speed samples are divided into a training set and a test set. The first nine days of wind speed data (1296 data points) are selected as training data, which are used to determine the structure and parameters of the hybrid EEMD-LSSVM model. The tenth day of wind speed data is chosen as the test data, which are used to verify the forecasting effectiveness of the hybrid EEMD-LSSVM model. To evaluate the prediction performance of the proposed EEMD-LSSVM model, BP, LSSVM, EMD-LSSVM, EEMD-ARIMA, and ARIMA are used for comparison in one-step-ahead and multi-step-ahead prediction of wind speed. The evaluation indexes (RMSE, MAPE, MAE, CC, SSE, and SDE) as well as scatter diagrams and histograms of prediction residuals are applied to verify the forecasting performance of these models.
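
The sample counts follow directly from the sampling interval, as the short sketch below shows. The helper `split_by_days` and the placeholder series are hypothetical; only the interval, the number of days, and the nine-day/one-day split come from the text.

```python
SAMPLES_PER_DAY = 24 * 60 // 10   # one reading every 10 minutes -> 144 per day


def split_by_days(series, n_train_days):
    """Return (train, test), using the first n_train_days days for training."""
    cut = n_train_days * SAMPLES_PER_DAY
    return series[:cut], series[cut:]


wind = list(range(10 * SAMPLES_PER_DAY))   # placeholder for the 10-day record
train, test = split_by_days(wind, n_train_days=9)
print(len(wind), len(train), len(test))    # 1440 1296 144
```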

6.1.2. Wind Speed Series Decomposed by EEMD

Figure 4 shows the EEMD decomposition results, which include nine IMFs and one residual. It can be seen from Figure 4 that the subseries obtained by EEMD are much more stable than the original wind speed time series. Although the first two subseries (IMF 1 and IMF 2) fluctuate over a large range, the other subseries are highly stable. Furthermore, the subseries become more stable and their fluctuations become weaker from IMF 3 to IMF 9. To evaluate the decomposition effect of the EEMD algorithm, Table 1 gives the maximum, minimum, mean, and standard deviation of the original wind speed and of all the subseries. Table 1 demonstrates that the subseries have a much smaller range between minimum and maximum than the original wind speed series. For example, the maximum wind speed is 15.68 m/s and the minimum is 0.35 m/s, so the span of the original wind speed is 15.33 m/s. Among all the subseries, IMF 7 has the largest span between minimum and maximum, 6.7888 m/s, which is less than half of the original wind speed span. IMF 9 has the smallest span: its largest value is 0.0011 m/s, its smallest value is −0.0004 m/s, and its span is 0.0015 m/s. At the same time, the standard deviation of the original wind speed series is 2.5897, and the standard deviations of the subseries are much smaller than this value. It can be seen that the values of the subseries stay close to their mean values and have smaller volatility.

6.1.3. Prediction of Each Subseries and Aggregate Calculation for Wind Speed Prediction

To establish an appropriate LSSVM model for prediction of every subseries, the PACF is applied to explore the characteristics of every subseries. Figure 5 shows the PACF analysis result for IMF 1. The PACF graph demonstrates that, among all the lags, the first four are beyond the threshold corresponding to the 95% confidence interval. Therefore, the input number of the LSSVM model for IMF 1 is chosen as 4. By adopting the same method, the input numbers of the LSSVM models for the remaining subseries are obtained and given in Table 2. The input number of the LSSVM model clearly differs between subseries. As the first several subseries have relatively high frequencies, the current value is correlated with several previous values. As the frequency decreases, the subseries become more and more stationary and the current value is related only to the immediately preceding one. For example, the last four subseries have low frequencies and their input numbers are found to be 1. In this study, RBF is chosen as the kernel function of the LSSVM model. The parameters of each LSSVM model are selected by comparing the fitting performance of different parameter values on the training samples of wind speed and keeping the best. Then the trained LSSVM models are used to produce one-step-ahead, two-step-ahead, and three-step-ahead predictions for the corresponding subseries. Finally, the integration method is adopted to sum the prediction results of all subseries as the final prediction of the original wind speed.

6.1.4. Comparative Analysis

In this section the prediction results of the hybrid EEMD-LSSVM model are compared with other five models. Table 3 gives the six indexes (RMSE, MAPE, MAE, CC, SSE, and SDE) of six models (LSSVM, ARIMA, EMD-LSSVM, BP, EEMD-ARIMA, and EEMD-LSSVM) for one-step-ahead (10-minute interval), two-step-ahead (20-minute interval), and three-step-ahead (30-minute interval) predictions. It can be seen that the hybrid EEMD-LSSVM model has the lowest values of RMSE, MAPE, MAE, SSE, and SDE, as well as the largest value of CC, for one-step-ahead wind speed forecasting. The RMSE, MAPE, MAE, CC, SSE, and SDE obtained by the EEMD-LSSVM for one-step-ahead prediction are 0.3819, 4.435%, 0.2634, 0.9872, 21.0020, and 0.3830, respectively. All these values demonstrate that the EEMD-LSSVM model has a stronger forecasting capacity than LSSVM, ARIMA, EMD-LSSVM, EEMD-ARIMA, and BP for one-step-ahead wind speed prediction. Furthermore, for two-step-ahead and three-step-ahead predictions, the hybrid EEMD-LSSVM model also outperforms other models in terms of six indexes. More specifically, for two-step-ahead prediction the EEMD-LSSVM model achieves the lowest MAPE of 5.90% while higher MAPE of 14.58%, 15.37%, 6.53%, 6.80%, and 16.59% are obtained by LSSVM, ARIMA, EMD-LSSVM, EEMD-ARIMA, and BP. In addition, for three-step-ahead prediction, the EEMD-LSSVM model outperforms other models with the lowest MAPE of 7.954% as opposed to the higher MAPE of 18.08%, 19.38%, 9.557%, 9.677%, and 20.57% for LSSVM, ARIMA, EMD-LSSVM, EEMD-ARIMA, and BP. Analysis of the six evaluation indexes proves that the hybrid EEMD-LSSVM model has higher forecasting accuracy than other models for two-step-ahead and three-step-ahead wind speed prediction. It also demonstrates that the hybrid EEMD-LSSVM model can improve the wind speed forecasting precision by combining the respective advantages of EEMD and LSSVM. 
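
For readers who wish to reproduce a table of this kind, the six indexes can be computed as in the sketch below. It assumes the conventional definitions of RMSE, MAPE, MAE, CC (Pearson correlation), and SSE, and it assumes that SDE is the sample standard deviation of the prediction errors; the function name `evaluate` is hypothetical. Under these definitions SSE = N·RMSE², and the one-step-ahead values reported above (21.0020 and 0.3819) are consistent with roughly 144 test points.

```python
import numpy as np

def evaluate(actual, predicted):
    """Six evaluation indexes under conventional (assumed) definitions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    sse = float(np.sum(err ** 2))
    return {
        "RMSE": float(np.sqrt(sse / len(err))),
        "MAPE": float(np.mean(np.abs(err / actual)) * 100.0),  # percent
        "MAE": float(np.mean(np.abs(err))),
        "CC": float(np.corrcoef(actual, predicted)[0, 1]),
        "SSE": sse,
        # Sample standard deviation of the errors (assumed form of SDE).
        "SDE": float(np.std(err, ddof=1)),
    }

# Implied number of test points from the reported one-step-ahead values.
implied_points = 21.0020 / 0.3819 ** 2
```
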
The percentage improvement of the hybrid EEMD-LSSVM model over other models in terms of RMSE, MAPE, MAE, SSE, and SDE is calculated by using the expression

$$P_{E} = \frac{E_{c} - E_{h}}{E_{c}} \times 100\%,$$

where $E_{c}$ denotes the RMSE, MAPE, MAE, SSE, or SDE of the contrast model and $E_{h}$ is the corresponding index of the hybrid EEMD-LSSVM model.

As the index CC represents the correlation coefficient between the predicted wind speed and the real wind speed, a higher CC indicates a more precise prediction result. Therefore, the following formula is adopted to calculate the percentage improvement of the EEMD-LSSVM over other models in terms of CC:

$$P_{CC} = \frac{CC_{h} - CC_{c}}{CC_{c}} \times 100\%,$$

where $CC_{c}$ denotes the CC of the contrast model and $CC_{h}$ is the CC of the hybrid EEMD-LSSVM model.

The percentage improvements of the hybrid EEMD-LSSVM model over LSSVM, ARIMA, and BP models, as well as the hybrid EMD-LSSVM and EEMD-ARIMA models, in terms of RMSE, MAPE, MAE, CC, SSE, and SDE are shown in Table 4. It can be seen that the percentage improvements of the hybrid EEMD-LSSVM model over LSSVM, ARIMA, and BP models at the three horizons of one-step-ahead, two-step-ahead, and three-step-ahead prediction in terms of RMSE, MAPE, MAE, SSE, and SDE are all over 50%. For one-step-ahead wind speed prediction, the improvements of the hybrid EEMD-LSSVM model over LSSVM, ARIMA, and BP in terms of RMSE are 52.15%, 52.30%, and 53.46%, respectively. Moreover, for two-step-ahead wind speed prediction, the SSE improvements of the hybrid EEMD-LSSVM model over LSSVM, ARIMA, and BP reach 82.03%, 82.20%, and 83.34%, respectively. In addition, for three-step-ahead prediction, the improvements of the EEMD-LSSVM over LSSVM, ARIMA, and BP models in terms of RMSE, MAPE, MAE, SSE, and SDE are greater than 50%, with the least value of 52.14% and the highest value of 78.03%. All these results demonstrate that the LSSVM, ARIMA, and BP models cannot completely capture the wind speed series information and therefore have relatively low forecasting accuracy, whereas the hybrid EEMD-LSSVM model makes use of the EEMD decomposition ability to decrease the wind speed prediction difficulty and improve the prediction precision. The improvements of the hybrid EEMD-LSSVM model over the EMD-LSSVM model in terms of RMSE, MAPE, MAE, CC, SSE, and SDE for one-step-ahead wind speed prediction are 15.60%, 12.46%, 13.30%, 0.51%, 28.77%, and 15.17%, respectively. For two-step-ahead prediction, these improvements are 11.55%, 9.65%, 11.84%, 0.58%, 21.78%, and 12.57%, respectively. For three-step-ahead prediction, they are 11.11%, 16.77%, 19.15%, 1.16%, 20.98%, and 13.64%, respectively. Therefore, the hybrid EEMD-LSSVM model has better performance than the EMD-LSSVM model at all three forecasting levels. 
For one-step-ahead wind speed prediction, the improvements of the hybrid EEMD-LSSVM model over the EEMD-ARIMA model in terms of RMSE, MAPE, MAE, CC, SSE, and SDE are 4.88%, 11.03%, 8.83%, 0.13%, 9.52%, and 4.46%, respectively. These improvements are, respectively, 6.83%, 13.24%, 14.76%, 0.36%, 13.21%, and 8.07% for two-step-ahead prediction. These values of EEMD-LSSVM over EEMD-ARIMA for three-step-ahead prediction are 4.43%, 17.81%, 16.63%, 0.07%, 8.68%, and 6.31%, respectively. These demonstrate that the EEMD-LSSVM outperforms EEMD-ARIMA at different wind speed forecasting levels.
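
The improvement percentages can be checked with a short calculation. The sketch below assumes that the improvement for error-type indexes is the relative reduction with respect to the contrast model, and that the improvement for CC is the relative increase; the function names are hypothetical. Two consistency checks use only numbers quoted in this section: the two-step-ahead MAPE values of 6.53% (EMD-LSSVM) and 5.90% (EEMD-LSSVM) give the reported 9.65% improvement, and, because SSE equals N times the squared RMSE on a common test set, an RMSE improvement of 15.60% implies the reported SSE improvement of 28.77%.

```python
def improvement_lower_is_better(contrast, hybrid):
    """Relative reduction in percent, for RMSE, MAPE, MAE, SSE and SDE."""
    return (contrast - hybrid) / contrast * 100.0

def improvement_higher_is_better(contrast, hybrid):
    """Relative increase in percent, for CC."""
    return (hybrid - contrast) / contrast * 100.0

def sse_gain_from_rmse_gain(rmse_gain_percent):
    """SSE improvement implied by an RMSE improvement on the same test set."""
    kept = 1.0 - rmse_gain_percent / 100.0   # fraction of RMSE that remains
    return (1.0 - kept ** 2) * 100.0         # SSE scales with RMSE squared

# Two-step-ahead MAPE: EEMD-LSSVM against EMD-LSSVM and EEMD-ARIMA.
mape_gain_emd = improvement_lower_is_better(6.53, 5.90)
mape_gain_arima = improvement_lower_is_better(6.80, 5.90)

# One-step-ahead: SSE improvement implied by the RMSE improvement.
sse_gain_emd = sse_gain_from_rmse_gain(15.60)
sse_gain_arima = sse_gain_from_rmse_gain(4.88)
```

The agreement between the implied and the reported SSE improvements suggests that all models were evaluated on the same set of test points.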

Figures 6–8 give the real wind speed series and the one-step-ahead, two-step-ahead, and three-step-ahead predictions of EEMD-LSSVM, LSSVM, ARIMA, BP, EEMD-ARIMA, and EMD-LSSVM. It can be seen from Figure 6 that, for one-step-ahead wind speed prediction, although all these models follow the tendency of the real wind speed series, the hybrid EEMD-LSSVM, EEMD-ARIMA, and EMD-LSSVM models fit better than the LSSVM, ARIMA, and BP models. Furthermore, the EEMD-LSSVM shows more powerful forecasting ability, especially when abrupt changes occur. This is because wind speed series are combinations of different intrinsic mode functions, so sudden changes often appear in the series. Single models, such as LSSVM, ARIMA, and BP, can hardly describe the complex components of wind speed series, especially when sudden changes occur. EEMD can decompose a wind speed series into a set of subseries with relatively simple modes, which decreases the extent of abrupt changes. Furthermore, EEMD decomposes better than EMD because it overcomes the mode mixing problem of EMD. Therefore, the EEMD-LSSVM model achieves the highest accuracy for one-step-ahead prediction among these models. Figure 7 gives the two-step-ahead wind speed forecasting results of these six models. The prediction result of the hybrid EEMD-LSSVM model follows the changes of the original wind speed. However, the forecasting results of the other models have relatively larger deviations from the real wind speed series. The three-step-ahead wind speed forecasting is demonstrated in Figure 8. It can be seen that the EEMD-LSSVM has the best prediction performance. From Figures 6–8, it can be seen that the capacity of these models to track the change tendency of wind speed decreases as the prediction horizon increases. However, the extent of this decrease differs among the models. LSSVM, ARIMA, and BP show the largest decrease, and the hybrid EEMD-LSSVM model retains the best forecasting capacity. 
This is probably because the hybrid EEMD-LSSVM model combines the decomposition ability of EEMD and forecasting ability of LSSVM and it has stronger capacity to deal with the stochastic uncertainty brought by increase of prediction step. It can be concluded that the hybrid EEMD-LSSVM model has the greatest prediction ability for different wind speed forecasting levels.

Figure 9 shows the scatter diagrams of the predicted values of these six models versus the actual wind speed for one-step-ahead, two-step-ahead, and three-step-ahead prediction. If the predicted wind speed were exactly the same as the real wind speed, all the data points would lie along the dashed line shown in Figure 9. However, because of the prediction errors shown in Table 3, such ideal predicted data points do not exist. It can be seen that for one-step-ahead, two-step-ahead, and three-step-ahead wind speed predictions, the results of the hybrid EEMD-LSSVM model are the closest to the dashed line among all these forecasting models; namely, the predicted values of the EEMD-LSSVM have the highest correlation with the real wind speed. The prediction results of the other models have relatively larger deviations from the real wind data. This phenomenon is consistent with the index CC given in Table 3. The CC values of the hybrid EEMD-LSSVM model for one-step-ahead, two-step-ahead, and three-step-ahead wind speed predictions are 0.9872, 0.9788, and 0.9583, respectively, which are the highest values among all these forecasting models at each horizon. Judging from the scatter diagrams and the CC index, although the deviations between the predicted values of all these models and the real wind speed increase to some extent as the prediction horizon increases, the hybrid EEMD-LSSVM model has the highest correlation between the predicted values and the real wind speed at all three prediction levels.

Figure 10 shows histograms of prediction residuals of these six models versus the actual wind speed for one-step-ahead, two-step-ahead, and three-step-ahead predictions. For one-step-ahead, two-step-ahead, and three-step-ahead predictions, the forecasting errors of the hybrid EEMD-LSSVM model are distributed tightly around zero while the prediction errors of the other five models are distributed more widely. For one-step-ahead wind speed forecasting, the errors of LSSVM, ARIMA, and BP models exceed the range of (−2.5, 2.5). However, the hybrid EEMD-LSSVM, EEMD-ARIMA, and EMD-LSSVM models do not have any large error and all their forecasting errors fall within (−2.5, 2.5). This is probably because there are some abrupt changes in the wind speed series and single prediction models lack the ability to capture these changes, which leads to relatively large forecasting errors. As the hybrid models can make use of the decomposition to weaken the amplitudes of these abrupt changes, a set of subseries that do not have changes with large amplitudes is acquired. The wind speed series decomposition process decreases the difficulty of the subsequent forecasting and enables these hybrid models to forecast more accurately. It can be seen that the EEMD-LSSVM prediction errors are distributed more closely around 0 and have a narrower distribution range than those of EMD-LSSVM. This is because EEMD has stronger decomposition ability than EMD, so the proposed hybrid model with EEMD can forecast wind speed more accurately than EMD-LSSVM. The hybrid EEMD-LSSVM model also demonstrates higher forecasting precision as compared with EEMD-ARIMA. For two-step-ahead wind speed prediction, the errors of these models are distributed more widely than their own errors of one-step-ahead forecasting. This demonstrates that with the increase of predictive time length, the prediction errors of all models increase to some extent as the stochastic uncertainty becomes stronger. 
Compared with the other models, the prediction errors obtained by EEMD-LSSVM remain the most closely distributed around 0. For three-step-ahead wind speed forecasting, the errors of the three single models, namely, LSSVM, ARIMA, and BP, reach or exceed 5. However, the errors of the hybrid models remain relatively small. This indicates that the hybrid models have higher prediction precision than the single models. It can also be noticed that the prediction errors of the hybrid EEMD-LSSVM model are closer to 0 than those of EMD-LSSVM and EEMD-ARIMA. This indicates that the EEMD-LSSVM can predict three-step-ahead wind speed more precisely than EMD-LSSVM and EEMD-ARIMA. The above analysis of the histograms of prediction residuals of these models demonstrates that the hybrid EEMD-LSSVM model can capture the volatility and randomness of the original wind speed and has higher forecasting accuracy.

To further compare the forecasting precision of these models quantitatively, the absolute relative error (ARE) of every predicted wind speed point is calculated by using the following expression:

$$\mathrm{ARE} = \left| \frac{v - \hat{v}}{v} \right| \times 100\%,$$

where $v$ is the real wind speed and $\hat{v}$ is the predicted wind speed.

After the ARE of every predicted wind speed point is calculated, the percentages of the ARE values that fall below the thresholds of 1%, 4%, 7%, 10%, and 15% with respect to the original wind speed time series are given in Table 5. It can be seen that, for all three wind speed prediction levels, the percentages for the hybrid model are larger than those of the other models. More specifically, for one-step-ahead prediction, the percentages of ARE values below 1% for the LSSVM, ARIMA, and BP models are 7.64%, 6.25%, and 6.25%, respectively, whereas that value for the hybrid EEMD-LSSVM model is 25.00%. This clearly shows that the EEMD-LSSVM is superior to LSSVM, ARIMA, and BP in obtaining predicted wind speed points that are within 1% error, which demonstrates a high forecasting precision. Furthermore, this percentage for EEMD-LSSVM is 10.42 percentage points higher than that of EMD-LSSVM and 6.25 percentage points higher than that of EEMD-ARIMA. This indicates that EEMD-LSSVM has a stronger ability to obtain high-precision predicted wind speed points than EMD-LSSVM and EEMD-ARIMA. The comparative analysis of the wind speed prediction results of these six models demonstrates that the proposed EEMD-LSSVM is a powerful method for wind speed forecasting.
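
The ARE statistics of this kind can be obtained as in the following sketch. It assumes that ARE is the absolute prediction error divided by the real wind speed, expressed in percent, and that each table entry is the share of test points whose ARE lies below the given threshold; the function names `are_percent` and `share_below` are hypothetical.

```python
import numpy as np

def are_percent(actual, predicted):
    """Absolute relative error of every point, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted) / np.abs(actual) * 100.0

def share_below(are, thresholds=(1, 4, 7, 10, 15)):
    """Percentage of points whose ARE is below each threshold."""
    are = np.asarray(are, dtype=float)
    return {t: float(np.mean(are < t) * 100.0) for t in thresholds}

# Toy example with four points (illustrative values, not data from the study).
toy_actual = [10.0, 10.0, 10.0, 10.0]
toy_predicted = [10.05, 10.5, 11.2, 8.0]
toy_shares = share_below(are_percent(toy_actual, toy_predicted))
```

Because the thresholds are cumulative, the shares are non-decreasing from the 1% column to the 15% column, which is a quick check on any table built this way.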

6.2. Case Two

In order to further illustrate the forecasting ability of the proposed EEMD-LSSVM model, another example using a different wind speed data set is provided for a wind farm in Galicia, Spain. The same approaches as in Case One are adopted to analyze the effect of wind speed prediction. Table 6 gives the performance comparison of different models for forecasting wind speed for three different horizons. It can be seen that the proposed hybrid EEMD-LSSVM model has the lowest values of RMSE, MAPE, MAE, SSE, and SDE, as well as the highest value of CC, for all the three different prediction horizons. This clearly demonstrates that the EEMD-LSSVM model has the highest prediction precision among all these methods for one-step-ahead, two-step-ahead, and three-step-ahead predictions. The percentage improvements of the EEMD-LSSVM over other models are given in Table 7. Furthermore, Figures 11–13 give the trend plots of real wind speed and one-step-ahead, two-step-ahead, and three-step-ahead predictions by the EEMD-LSSVM, LSSVM, ARIMA, BP, EEMD-ARIMA, and EMD-LSSVM. Figure 14 shows the scatter diagrams of the predicted wind speed values of these six models versus the actual wind speed for one-step-ahead, two-step-ahead, and three-step-ahead predictions. Figure 15 gives histograms of prediction residuals of these six models versus the actual wind speed for one-step-ahead, two-step-ahead, and three-step-ahead predictions. In addition, Table 8 lists the percentages of the ARE values that belong to the intervals of <1%, <4%, <7%, <10%, and <15% with respect to the original wind speed time series. As shown in Tables 6–8 and Figures 11–15, the generality of the hybrid EEMD-LSSVM method is retained and the same conclusions presented in Case One can also be obtained.

7. Conclusions

The wind speed series often include many different modes and therefore have strong instability and randomness. The volatility of wind speed makes the traditional forecasting models fail to predict wind speed accurately. To further improve the wind speed forecasting precision, a hybrid model combining the EEMD algorithm and the LSSVM model is proposed. The EEMD algorithm is applied to reduce the stochastic volatility of the original wind speed series and acquire a set of subseries. According to the characteristics of every subseries, the PACF is used to explore the lag number of every subseries so as to determine the input variables of the LSSVM models for forecasting the subseries. To verify their forecasting effectiveness, the hybrid EEMD-LSSVM, LSSVM, ARIMA, BP, EEMD-ARIMA, and EMD-LSSVM models are applied to produce one-step-ahead, two-step-ahead, and three-step-ahead predictions for 10-minute wind speed series collected from a wind farm in Galicia, Spain. Compared with the other five models (LSSVM, BP, ARIMA, EEMD-ARIMA, and EMD-LSSVM), the forecasting results demonstrate that the hybrid EEMD-LSSVM model has smaller RMSE, MAPE, MAE, SSE, and SDE values, as well as a larger CC value, for short-term wind speed prediction at three different time intervals. Furthermore, by analyzing the scatter diagrams and CC index of these models, the predicted wind speed values of the hybrid EEMD-LSSVM model are shown to have a higher correlation with the real wind speed series. The histograms show that the prediction residuals of the hybrid EEMD-LSSVM model are smaller and closer to 0. Based on the analysis, the following conclusions can be drawn. (a) The hybrid EEMD-LSSVM model has the best performance of short-term wind speed prediction among all these models. This advantage is especially obvious when the prediction step increases, since the forecasting precision of the EEMD-LSSVM decreases much more slowly than that of the other models. 
(b) The EEMD-LSSVM, EEMD-ARIMA, and EMD-LSSVM models have more precise prediction results than those of LSSVM, ARIMA, and BP models. This demonstrates that the signal processing technique (EMD and EEMD) can reduce the nonstationarity of wind speed series and improve the forecasting precision. (c) With the increase of the predictive time length, the stochastic uncertainty becomes stronger and increases the difficulty of wind speed forecasting, while the hybrid EEMD-LSSVM model keeps the best prediction ability compared with other models. Therefore, it demonstrates that the proposed hybrid EEMD-LSSVM model is a powerful tool for both one-step-ahead and multi-step-ahead wind speed predictions.

Disclosure

Aiqing Kang and Qingxiong Tan are the co-first authors of this paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Aiqing Kang and Qingxiong Tan contributed equally to this work.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 51379080, no. 41571514), the Fundamental Research Funds for the Central Universities (no. 2017KFYXJJ204), and Hubei Provincial Collaborative Innovation Center for New Energy Microgrid in China, Three Gorges University (no. 2015KJX09).