Research Article  Open Access
He Jiang, Luo Shihua, Yao Dong, "Multidimensional Meteorological Variables for Wind Speed Forecasting in Qinghai Region of China: A Novel Approach", Advances in Meteorology, vol. 2020, Article ID 5396473, 19 pages, 2020. https://doi.org/10.1155/2020/5396473
Multidimensional Meteorological Variables for Wind Speed Forecasting in Qinghai Region of China: A Novel Approach
Abstract
The accurate, efficient, and reliable forecasting of wind speed is a hot research topic in wind power generation and integration. However, available forecasting models focus on forecasting the wind speed using historical wind speed data and ignore multidimensional meteorological variables. The objective is to develop a hybrid model with multidimensional meteorological variables for forecasting the wind speed accurately. The complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is applied to handle the nonlinearity of the wind speed. Then, the original wind speed will be decomposed into a series of intrinsic model functions with specified numbers of frequencies. A quadratic model that considers the twoway interactions between factors is used to pursue accurate forecasting. To reduce the model complexity, Gram–Schmidtbased feature selection (GSFS) is applied to extract the important meteorological factors. Finally, all the forecasting values of IMFs will be summed by assigning weights that are carefully determined by the whale optimization algorithm (WOA). The proposed forecasting approach has been applied on six datasets that were collected in Qinghai province and is compared with several stateoftheart wind speed forecasting models. The forecasting results demonstrate that the proposed model can represent the nonlinearity of the wind speed and deliver better results than the competitors.
1. Introduction
The depletion of conventional fossil fuels and the deterioration of the ecological environment have been receiving increasing concern in recent years. It is inevitable that fossil fuels will be restricted or replaced by renewable energy resources, which include wind, solar, geothermal, and biomass ones [1, 2]. As a type of clean, inexhaustible, and environmentally preferable renewable energy, wind energy has been developed rapidly since the 1990s around the world and has received widespread attention [3]. Based on the annual report by the Global Wind Energy Council, the global market was 52.573 GW of the wind power installed capacity in 2017, thereby resulting in a cumulative installed capacity of 539.581 GW [4]. Figure 1 presents the top 10 countries in the world in terms of wind power installed capacity from January to December 2017. The wind power is a function of the cube of the wind speed; hence, small differences in the wind speed forecast cause large differences in the wind power forecast [5]. Moreover, the wind speed is an important variable that is related to the wind turbine output power, and the accurate forecast of wind speed facilitates the improvement of the efficiency of wind turbines and the enhancement of the stability of the power supply [6]. However, the wind speed exhibits the stochastic fluctuations and irregular intermittency due to the influence of several meteorological factors, and its distribution is not normal. Therefore, it is also critically challenging to forecast the wind speed accurately. Many scientific techniques have been developed for forecasting the wind speed. These algorithms can be divided into five types: physicalbased methods, statisticalbased methods, persistence methods, intelligent methods, and hybrid methods [7]. Physicalbased methods, which are suitable for longterm wind speed forecasts, apply parametric models according to the physical changes in the atmosphere [8], such as the numerical weather prediction (NWP) model [9]. Statisticalbased methods, such as statistical regression [10], the autoregressive moving average (ARMA) models [11], Bayesian models [12], and Markov models [13], depend on the historical wind speed data and forecast the future pattern [6]. These models are suitable for forecasting shortterm wind speed but have difficulties if the wind speed series exhibit highfrequency variations. Persistence methods, which are used in shortterm wind speed forecasts, are based on the assumption that the wind speed at time k+1 will not change significantly from the wind speed at time k [14]. The intelligent methods apply machine learning and artificial intelligence algorithms to forecast the wind speed and include Kalman filtering [15], artificial neural networks (ANNs) [16], support vector machine [17], extreme learning machine (ELM) [18], and highdimensional feature selection [19]. The most popular intelligent methods for forecasting wind speed are ANNs. Zhang et al. proposed a new hybrid model that combines a sharedweight long shortterm memory network (SWLSTM) and Gaussian process regression (GPR), and they applied SWLSTMGPR to wind speed prediction cases [20]. Li et al. developed an innovative framework for multistep wind speed prediction that uses a recursive neural network that is based on the wind speed and the turbulence intensity. According to experimental results, the proposed model dramatically outperforms conventional machine learning models on multistep wind speed prediction [21].
Currently, the main focus has shifted from single methods to advanced hybrid methods, which combine techniques such as statistical and machine learning approaches and utilize the characteristics of individual methods. Sheela and Deepa used a hybrid computing model that integrates a multilayer perceptron network and selforganizing feature maps for wind speed prediction [22–24]. Shukur and Lee proposed a hybrid model that was based on ARIMA for forecasting the daily wind speed in Iraq and Malaysia; to improve the forecasting accuracy, an ANN and a Kalman filter (KF) could be utilized to overcome uncertainty and nonlinearity problems [25]. Doucoure et al. used wavelet decomposition and ANNs with predictability analysis to forecast wind speed time series [26]. Liu et al. developed a novel wind speed prediction model that was based on the WPD (wavelet packet decomposition), a convolutional neural network and a convolutional long shortterm memory network, and they demonstrated that the proposed model can perform best in wind speed 1step to 3step predictions [27]. Zhu et al. proposed a wind speed forecasting model by integrating Gaussian process regression with a long shortterm memory network, and the results demonstrated that the proposed method improves the forecast accuracy [28]. Cai et al. proposed a novel filtering strategy that integrates the statistical predictors and the NWP model outputs into a unified framework. Based on the proposed filtering strategy, a combined predictor, namely, SVRSDAUKF (support vector regression, stacked denoising autoencoder, and unscented Kalman filter), is proposed and validated [29].
In addition, wind speed time series exhibit nonlinearity and nonstationarity due to the weather and geographical factors. This may lead to incorrect conclusions regarding the activity of the wind speed when the techniques are used to forecast [15]. Therefore, several researchers have attempted to improve the forecasting precision via the addition of a preprocessing stage for time series decomposition and feature selection [30]. Time series decomposition approaches have been widely used to decompose the original complexstructured series into several simple series, such as the wavelet transform (WT) [31] and variational mode decomposition (VMD) [32]. Feature selection approaches facilitate the selection of suitable inputs for the forecasting models, which enhances the performances of the models substantially. Various feature selection approaches have yielded satisfactory results, such as Gram–Schmidt orthogonalization (GSO) [33], the least absolute shrinkage and selection operator (LASSO) [19], and forward regression [34].
The main contribution of this paper is the developed hybrid forecasting model, which uses random forest, a Gram–Schmidtbased feature selection method, and the whale optimization algorithm. The proposed method has inherited all the advantages of the above approaches, which are embedded together. Instead of combining them in parallel, random forest and the Gram–Schmidtbased feature selection method are embedded in the whale optimization algorithm; this embedding is a challenging task in terms of computation. The objective function value of the whale optimization algorithm is determined by these two methods. The details will be presented in Section 2, which describes the use of the complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN), random forest (RF), the Gram–Schmidtbased feature selection method (GSFS), and the whale optimization algorithm (WOA) to solve several chained problems.(i)The first methodology (CEEMDAN in Section 2.1) addresses the problem that the signals that are detected from the measured objects are often mixed with various levels of noise. Wavelet analysis can filter out weak or useless signals and extract the information that users need. This paper introduces a data preprocessing strategy that is designed based on the theory of decomposition and reconstruction; it filters the noise from the original wind speed efficiently and searches for the intrinsic values of the wind speed data. Finally, this preprocessing procedure utilizes wavelet packet decomposition for denoising and multiscale wavelet decomposition to complete data preprocessing, and it guarantees the accurate forecasting of the wind speed.(ii)The second methodology (RF in Section 2.2) is a machine learning model that is utilized as a forecasting model after selecting all the important variables. This forecasting model is stable since it weakens the correlations between variables to address the multicollinearity issue. Furthermore, it is designed based on bootstrapping, which theoretically guarantees its stability and accurate forecasting results. In summary, compared with traditional programming methods, the utilized machine learning algorithm simplifies the code and improves the execution performance of the code. Second, machine learning technology can be used to solve problems that cannot be solved via traditional methods. Furthermore, machine learning can adapt to new data; namely, the system can be personalized according to the environment.(iii)The third methodology (GSFS in Section 2.3) is an effective and efficient variable selection approach that is used to select the important variable after the preprocessing procedure. This approach is designed based on an orthogonalization method, which makes the selected variables orthogonal with each other. Furthermore, in contrast to other penalized variable selection methods, this novel approach does not introduce any bias into the model; hence, all nuisance variables can be eliminated effectively.(iv)The fourth methodology (WOA in Section 2.4) is a whale optimization algorithm, which is a new swarm intelligence optimization algorithm that simulates the predation behavior of humpback whales in the ocean. The algorithm has the advantages of simple structure, few parameters, strong search performance, and easy implementation. It is applied in the proposed model in an embedded way to determine the optimal weights for assignment to machine learning models. The fitness function in WOA is determined by the machine learning model and the variable selection method.
The hybrid forecasting model, namely, RFGSFSWOA, is proposed in Section 3. The experimental results and the corresponding analysis are presented in Section 4. Finally, Section 5 presents the conclusions of this study.
2. Materials and Methods
In this section, four techniques, namely, CEEMDAN, random forest, GSFS, and WOA, will be introduced. The proposed methodology is developed based on these important techniques.
2.1. Complementary Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)
The complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) originates from the ensemble empirical mode decomposition (EEMD) algorithm. To solve the mode mixing problem, adaptive white noise that has a specified standard deviation can be added at every stage of the decomposition, and a unique residue can be calculated to generate each mode. Moreover, the decomposition is completed with a negligible error; hence, the CEEMDAN provides superior spectral separation of the modes and an accurate reconstruction of the original signal at a lower computational cost [35, 36].
The process of the CEEMEDAN is as follows. Step 1: add realizations of a white Gaussian noise series and generate the corresponding signal: where is the original signal, is the number of the trials, is the noise standard deviation, and denotes the realizations of the white Gaussian noise series. Step 2: decompose realizations via EMD to obtain their first intrinsic model function (IMF): Step 3: calculate the first residue (at the first stage ): Step 4: decompose the noiseadded residues until their first EMD mode and define the (k+1)th IMF of the CEEMDAN: where the operator produces the kth mode of the EMD. Step 5: calculate the th residue: where , is the total number of modes. Step 6: repeat step 4 and step 5 for the next k until the obtained residue is not suitable for decomposition. The final residue is expressed as Hence, the original signal can be expressed as
2.2. Random Forest
Random forest is an ensemble learning method that uses decision trees as base learners and measures the importance of each variable. Random forest is a variant of bagging (bootstrap aggregating), which considers all attributes or variables when establishing a base learner. Decision trees tend to overfit the data, thereby leading to low bias but high variance. To overcome this drawback, random forest constructs decision trees and integrates the results of all individual trees to generate the final outcome [37]. In classification, the mode of the classes that are predicted by the trees is adopted as the output, while, in regression, the mean of the predictions of all individual trees is used.
Since bootstrapping is used to generate the sample from which an individual tree is grown, only approximately 63.2% of the observations are used to produce each base learner, and the remainder of the training set can be used as a validation set: suppose that there are samples. The probability that not all the samples are chosen in the training set is which is if . For data , with representing the observations and representing the response variable, the outofbag (OOB) estimate, which is denoted by , is produced by base learners [32, 33]. Given an ensemble of classifiers , it is defined as follows:where is the set that consists of all possible values of the response and is the subsample that is applied to produce base learner . The OOB estimate of the generalization error is expressed aswhere is the training set.
Both bagging and random forest measure the importance of each variable via the following procedure: (1) calculate the OOB error of each observation and average them to obtain the OOB error of the forest. (2) Permute the values of the jth feature among the training set. (3) Compute the OOB error again on this perturbed data set, and the difference in the OOB error before and after the permutation can be used to measure the importance of the jth variable, where a larger difference corresponds to higher importance of variable j.
2.3. Gram–SchmidtBased Feature Selection Method (GSFS)
Feature selection substantially facilitates the reduction of the model complexity and the enhancement of the forecasting accuracy when establishing a forecasting model. Many feature selection methods are designed based on penalized regression, which shrinks the estimate; these methods yield inaccurate results. In this study, a subset selection method, namely, the Gram–Schmidtbased feature selection method (GSFS) [38], is used for feature selection. The objective of GSFS is to identify all the relevant features that contribute to the model forecasting. The details of GSFS are presented as follows.
GSFS iterations: let be the empty set and initialize and to 0. Here, denotes the initial forecasting values. Calculate the pseudoresponse and select the optimal variable by regressing on :where . Then, and let . The objective in building is to add the optimal variables into the empty set.
In the next steps, we will conduct orthogonalization before selecting each new feature. If , orthogonalize to via the Gram–Schmidt method. Similar to the first step, compute the pseudoresponse and considerwhere . Update and . The objective of this procedure is to select variables that are orthogonal with each other sequentially and add them into an empty set.
The main advantage of GSFS is that it does not involve any penalized regression. It detects the relevant features by examining each feature sequentially. Although it may take more time, it will not miss any important data information. In the second step, GSFS computes the pseudoresponse as the target vector, which can be used to find the most strongly correlated features with the pseudoresponse. This pseudoresponse differs from the ordinary response that is applied in regression or classification problems. It is updated in every iteration of GSFS to eliminate the effects of selected features. An orthogonalization procedure that is based on the Gram–Schmidt method is applied to make all the features orthogonal with each other. This procedure removes the dependencies among features and shortens the computation time for inverting the projection matrix.
The determination of plays an important role in guaranteeing the accuracy and the selection consistency. Let ; namely, let the number of features increase with the sample size. We must impose the following regularity condition before proving the theorem.(C1). It follows that . Assume that the noise term is independent of each feature and let and be constants; namely,(C2) for Let , which represents the noise level and let and . Assume that there exists such that(C3).(C4).(C5)There exists such that and .
Theorem 1. Suppose that regularity conditions (C1)–(C5) are satisfied. Let and Then, , where denotes the set of important variables.
Theorem 1 establishes the selection consistency of the GSFS method and provides a theoretical guarantee on the choice of . Due to the selection consistency, the feature selection method can detect all the important features as the sample size approaches infinity. Here, contains the features that are selected in step . To select the optimal model with all the important features, highdimensional information criteria (HDIC) will be applied after GSFS. The proof of Theorem 1 is similar to [34] and is omitted here.
2.4. Whale Optimization Algorithm (WOA)
In 2016, Mirjalili and Lewis [39] proposed the whale optimization algorithm (WOA), which is a novel metaheuristic that mimics the bubblenet foraging behavior of humpback whales in searching for their prey. It is illustrated in Figure 2(a). The humpback whales attack small fishes or schools of krill that are close to the surface by swimming in a shrinking circle and creating distinctive bubbles along a spiralshaped path. The algorithm includes two phases: the first phase is encircling a prey and applying the spiral bubblenet attacking method, and the second phase is searching randomly for a prey. The details of each phase are presented as follows.
2.4.1. Encircling Prey
The humpback whales dive downwards and generate bubbles in a spiral shape, which makes the prey (typically krill) feel threatened and, hence, gather together. It is assumed that there are N humpback whales in a ddimensional search space. The position of each whale is denoted by . The position of the prey represents the globally optimal solution to the problem. The humpback whales can identify the prey’s position and encircle it. However, there is no priori information about the optimal solution. Therefore, in the WOA, the current best position is adopted as the position that corresponds to the optimal solution. Once this position has been identified, other humpback whales will gather around it and update their position as follows [36]:where is the current iteration, is the current position, is the current best position, and and are the coefficient vectors:where and are 2 random numbers that are in the interval (0, 1) and is a convergence factor, which decreases linearly as the iteration proceeds. Define as the iteration number and as the maximum allowed number of iterations. Factor is defined as
Two mechanisms are designed to describe the hunting behavior of the humpback whales, namely, the shrinking encircling mechanism and the spiral position updating mechanism, as illustrated in Figures 2(b) and 2(c). Shrinking encircling is realized as the convergence factor decreases, while, in spiral position updating, the spiral movement of the humpback whales is simulated by the following:where is the distance between the humpback and the prey, is the shape parameter of the spiral, and is a random number that is in the interval (−1, 1). Since the humpback whales perform these two behaviors simultaneously, in each iteration, each of these mechanisms has a probability of 0.5 to be utilized. The corresponding model is as follows:where is a random number in (0, 1).
2.4.2. Search for Prey
To enhance the exploration in WOA, instead of updating the positions of the humpback whales based on the best position so far, a random position is chosen to guide the search accordingly. with random values that are greater than 1 or less than −1 is applied to force the position to move far away from the best position.
The model of searching for prey is defined as [40]where is a random position (or a random whale) that is chosen from the current population.
3. Proposed Hybrid Forecasting Model: RFGSFSWOA
In this paper, the proposed RFGSFSWOA is defined by the following procedure. Step 1: the original wind speed time series is decomposed into I IMFs (let IMF_{I} = R (n)) by following Step 1–Step 6 in Section 2.1. Step 2: a quadratic model (QM) is established for each : where are variables (first order) in the original dataset; are quadratic variables (second order) that are constructed by calculating the componentwise product of and ; are the corresponding coefficients of the variables and quadratic variables; and are independent Gaussian noise errors that follow . quadratic models will be obtained, which are denoted by . As variables are included in the QM, the computational cost is high even if the value of is moderate. Therefore, a variable selection method is urgently needed to reduce the computational cost. Step 3: to reduce the computational cost of QM in Step 2, GSFS is used to complete the variable selection task. With slight abuse of notation, is denoted by . Let , where represents the initial forecasting value, , and the optimal variables are selected by regressing on , which is expressed as where . Then, and let . In the next steps, the data will be orthogonalized. If , orthogonalize to via the Gram–Schmidt method. Similar to the first step, compute the pseudoresponse and consider where . Update and . Step 4: random forest model is built over the variables that were selected in Step 3. Step 5: assign a weight to each , and the final forecast model is provided by the ensemble model (EM):with , where is obtained via the WOA algorithm.
The RFGSFSWOA algorithm is presented in detail as Algorithm 1. The proposed combination forecasting method, namely, RFGSFSWOA, has the following advantages. Instead of applying a single decision tree, RF grows many decision trees and, hence, can handle the correlation among the features. It bootstraps samples, which promotes the diversity of the base learners and, consequently, reduces the number of variables of the model. Furthermore, RF outperforms bagging as the number of base learners increases. It trains fast since it applies a fraction of the features instead of considering all possible features. The time that is required for training RF is often shorter than that for bagging. The overfitting problem of RF can be controlled well via the selection of a suitable number of features in each decision tree. GSFS is a greedy feature selection method, which is a type of forward selection method. In contrast to penalized feature selection methods such as LASSO and SCAD, it does not introduce any bias into the model. WOA is a metaheuristic method that can obtain the globally optimal solutions of nonlinear optimization problems. A flowchart of the proposed RFGSFSWOA is presented in Figure 3. The RFGSFSWOA algorithm is presented in the Appendix.

4. Case Studies
This section describes the data collection process, establishes the forecasting assessment criteria, and presents, discusses, and analyzes the experiment results. The main objective of the experiments is to evaluate the performance of the proposed hybrid forecasting model, namely, RFGSFSWOA.
4.1. Data Collection
Qinghai province is located in the northeast of the QinghaiTibetan Plateau, and most of the area is rich in wind energy resources; hence, it is suitable for the installation of largescale wind farm. Thus, it is meaningful to investigate sites in this region. The hourly data were collected between 0 : 00 and 24 : 00 from 1/1/2014 to 12/31/2014 at six sites in the Qinghai region. The dataset that is used in this study was collected by National Renewable Energy Laboratory (NREL) and is available at https://www.nrel.gov/gis/tools.html. To simplify the description, the six sites are denoted by Site 1, Site 2, Site 3, Site 4, Site 5, and Site 6. The longitudes and latitudes of these locations are specified in Figure 4. Five meteorological factors, namely, the temperature (T), pressure (P), relative humidity (RH), wind precipitable water (PW), and wind direction (WD), are considered in the forecasting of the wind speed for the Qinghai region, which facilitates the design of wind power plants to efficiently generate electricity for the local population [41]. There are four seasons in each year: spring (Mar., Apr., and May), summer (Jun., Jul., and Aug.), autumn (Sep., Oct., and Nov.), and winter (Dec., Jan., and Feb.). The data are divided into training data and forecasting data as follows: 20 days in each season are randomly chosen as the training data (20 days × 24 hours/days = 480) for building the model, and the forecasting data include 5 days (5 days × 24 hours/day = 120) that are randomly chosen from the remaining days in each season. The performances of various forecasting models are evaluated in the four seasons. The training sets and forecasting sets for the six sites are described in Table 1. The forecasting sets account for approximately 20% of the total data, which is reasonable.

4.2. Forecasting Assessment Criteria
To evaluate the results of the forecasting models quantitatively, four assessment measures, namely, the mean absolute error (MAE), root mean square error (RMSE), mean absolute percent error (MAPE), and Theil inequality coefficient (TIC), are applied. All are calculated between the actual data and the forecasted values, and the smaller their values, the better the forecasting results. These criteria are expressed as follows:where is the size of the forecasting sample and and are the actual value and the forecasted value, respectively, for time period .
4.3. Experiment Results and Corresponding Analysis
The experimental settings are described in detail in Table 2. The parameters in the complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) are set as follows: the noise standard deviation (Nstd) is set as 0.2, the number of realizations (Nr) is set as 500, and the maximum allowed number of sifting iterations (MaxIter) is set as 500. The regularization parameters in LASSO and SCAD are chosen via 10fold crossvalidation. In the experiments on the neural networks, the maximum number of iterations (MaxIterNN) is set as 2000 for both BP and Elman. Each activation function from the hidden layer to the output layer is set as the Tansig function (Func). The number of hidden neurons is crucial since it mainly determines the network complexity. To determine the optimal number of hidden neurons, values are selected from the following set: . Furthermore, Elman is trained using backpropagation through time, and the corresponding weights are selected based on a uniform distribution. Based on the experimental results, the gradient descent with momentum is set as 0.82, and the adaptive learning rate is set as 0.02. In GSFS, we set q as 30; hence, 30 variables will be chosen after the GSFS procedure, and, thus, there are 1200 variables in the quadratic model. The number of trees in RF is set as 3000, and the maximum number of iterations (MaxIterRF) is 500. RF is implemented using MATLAB function TreeBagger. The fitness function of WOA is selected as “F1,” which is the square error loss function, and the range is (0, 1). The weights are determined after normalization such that the sum of all the weights is 1.

To evaluate the forecasting performance of the proposed hybrid model for forecasting hourly wind speed, experiments for six sites and four seasons are designed in this paper. Four types of assessment criteria are used in all experiments to evaluate the performance. The proposed hybrid forecasting model is RFGSFSWOA, and six forecasting models, namely, an Elman neural network, ElmanLASSO, ElmanSCAD, a BP neural network (BP), BPLASSO, and BPSCAD, are applied as comparison models to evaluate the performance of the developed hybrid model. Wind speed data exhibit volatility or instability due to changes in the weather; hence, this study conducts a preprocessing step before building all the models; namely, the complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is used to eliminate the noise and smooth the wind speed time series data before the developed hybrid model and six comparison models are established for hourly wind speed forecasting in four seasons at six sites.
In each season, 20 days of hourly wind speed time series are selected randomly as the training phase for the construction of the models, and 5 days of wind speed time series are selected randomly as the forecasting phase. Thus, 80% of the data will be used as training data; the remaining data (20%) will be used as test data. The reason for this splitting scheme is that the model can be trained sufficiently with more observations (80% of the data) and overfitting can be avoided. However, the number of observations in the test data is also important: the test accuracy cannot be evaluated accurately if the number of observations is too small (less than 20%). The multiple output strategy (5 days are used as outputs), which is one of the multistep forecasting techniques, is applied in the experiments.
The wind speed is influenced by several meteorological variables. Consequently, seven forecasting models consider multidimensional meteorological factors, such as the temperature, pressure, relative humidity, precipitable water, and wind direction, as dependent variables in multivariate models. The use of multivariable inputs increases the difficulty of determining the model parameters.
Four performance criteria, namely, MAPE, RMSE, MSE, and TIC, for each forecasting model in the four seasons at the six sites, are listed in Tables 3–5. Figures 5–7 present the forecasting results of the seven models for observation sites 1–6, which successfully illustrate that the proposed hybrid model RFGSFSWOA yields closer results to the real wind speed data than any other forecasting model in all instances. Furthermore, the developing hybrid framework realizes the best MAPE, RMSE, MSE, and TIC values. For hourly wind speed forecasting with multidimensional meteorological variables, the forecasting performance is described in detail in Tables 3–5 and Figures 5–8, from which the following findings are obtained.(I)From Table 3 and Figure 5, it is found that Elmanbased models, namely, Elman, ElmanLASSO, and ElmanSCAD, outperform the BPbased models, namely, BP, BPLASSO, and BPSCAD, and that Elman and BP yield the worst forecasting results. For instance, at Site 1, the developed hybrid model, namely, RFGSFSWOA, outperforms the other six forecasting models. The RFGSFSWOA model realizes the smallest MAPE of 2.44% in summer, and similar results are obtained in terms of the MAE, RMSE, and TIC criteria.(II)According to Table 4 and Figure 6, the proposed RFGSFSWOA still outperforms other forecasting models by realizing lower values of MAE, RMSE, MAPE, and TIC. The forecasting models that were established based on Elman substantially outperform the models that were constructed based on BP. ElmanSCAD outperforms BPSCAD. For example, the MAPE of ElmanSCAD is 7.61, which is critically lower than that of BPSCAD, which is 10.37 in spring at Site 3. In autumn at Site 4, the RMSE of ElmanSCAD is 0.27, which is approximately 34.15% lower than that of BPSCAD. Furthermore, similar to the results from Table 3 and Figure 5, the SCADrelated forecasting models outperform the LASSOrelated forecasting models. This is mainly because SCAD can better handle the collinearity of variables than LASSO. Similar phenomena are observed in Table 5 and Figure 7. RFGSFSWOA realizes remarkable forecasting performances.(III)According to Figure 8, which presents the average MAPE and RMSE values of the compared forecasting methods in the four seasons, RFGSFSWOA realizes the lowest MAPE values over all seasons, which are 4.04%, 3.73%, 3.02%, 3.08%, 2.81%, and 4.13% at the six sites. ElmanSCAD still outperforms ElmanLASSO and BPSCAD. The worst value is obtained by BP, which is approximately 20%. BPLASSO and BPSCAD slightly outperform BP, which does not perform as well as Elman.



(a)
(b)
Therefore, from the discussions above, the proposed hybrid forecasting model, namely, RFGSFSWOA, performs well in the wind speed forecasting task by combining the advantages of machine learning module RF, feature selection approach GSFS, and metaheuristic method WOA.
4.4. Statistical Test
To further evaluate the performance of the proposed forecasting model, a nonparametric statistical test, namely, the Friedman test, is conducted. The forecasting performances of the compared methods differ among the six sites [42]. The following hypothesis, namely, the null hypothesis , along with the alternative hypothesis , is considered in the Friedman test:where , with representing the jth rank of algorithms on a dataset of size B. The Friedman statistic is defined as follows [43]:
This statistic follows chisquare distribution . An Fdistribution statistic, namely, , is calculated as follows:
Suppose that there are significant differences among the comparing forecasting methods. Then, the null hypothesis will be rejected. If this occurs, critical differences (CDs) will be applied for further comparisons, which is known as a post hoc test. A BonferroniDunn test is used with CD values , where is determined as 2.291 in [44]. The Friedman test is described in detail in Table 6. All the values are much larger than the critical value = 2.42. This suggests that the null hypothesis should be rejected and a post hoc test is needed. Using the abovedescribed method, the CD value is calculated as , which is used as a benchmark for comparison. The proposed RFGSFSWOA significantly outperforms BPLASSO, BP, and Elman since the differences between the average ranks of these methods exceed 2.85. For instance, the difference between RFGSFSWOA and BPLASSO is 4.67 for MAE, which is larger than 2.85. Compared to other methods, the proposed method does not show strong advantages in forecasting. However, based on the above analysis, the proposed method delivers remarkable forecasting results.

5. Conclusions
Accurate wind speed forecasting has attracted the attention of researchers of wind energy for decades. This paper proposed and investigated a novel hybrid wind speed forecasting model that combines the advantages of CEEMDAN, GSFS, RF, and WOA. The proposed model has been applied to data from six sites in Qinghai province to forecast the wind speed. The forecasting performances are evaluated using four statistical measures: MAE, RMSE, MAPE, and TIC. The forecasting results demonstrate that the proposed hybrid model outperforms other compared forecasting models, namely, Elman, ElmanLASSO, ElmanSCAD, BP, BPLASSO, and BPSCAD, which are famous wind speed forecasting models and variable selection methods. Moreover, the Elmanbased forecasting models deliver better results than the BPbased models, and the SCADbased models outperform the LASSObased models. Furthermore, by using variable selection methods such as GSFS, LASSO, and SCAD, the model performances are boosted substantially. Thus, variable selection is necessary since it can be used to select the meteorological factors that make the most important contributions to the forecasting model while discarding useless factors. To further evaluate the performance of the proposed model, a nonparametric test, namely, the Friedman test, is conducted to evaluate the forecasting results. The test results are consistent with the findings of the analysis of the results. Consequently, according to the experimental results, the proposed hybrid model forecasts the wind speed accurately and efficiently [38, 45].
Data Availability
The data that are used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (Grant nos. 71901109, 71761016, 61973145, 71861012, and 18ATJ001), Natural Science Foundation of Jiangxi, China (no. 20181BAB211020), Jiangxi Double Thousand Plan, Postdoctoral Foundation of Jiangxi Province (no. 2018KY08), Scientific Research Fund of Jiangxi Provincial Education Department (Grant nos. GJJ180287, GJJ180247, and GJJ190264), and Human and Social Science Foundation of Jiangxi Province (no. TJ19202).
References
 N. A. Shrivastava, K. Lohia, and B. K. Panigrahi, “A multiobjective framework for wind speed prediction interval forecasts,” Renewable Energy, vol. 87, pp. 903–910, 2016. View at: Publisher Site  Google Scholar
 E. T. Renani, M. F. M. Elias, and N. A. Rahim, “Using datadriven approach for wind power prediction: a comparative study,” Energy Conversion and Management, vol. 118, pp. 193–203, 2016. View at: Google Scholar
 I. Okumus and A. Dinler, “Current status of wind energy forecasting and a hybrid method for hourly predictions,” Energy Conversion and Management, vol. 123, pp. 365–371, 2016. View at: Publisher Site  Google Scholar
 Global Wind Energy Council, “Global wind statistics,” Global Wind Energy Council, Brussels, Belgium, 2017, http://gwec.net/globalfigures/graphs/>. View at: Google Scholar
 Y. Noorollahi, M. A. Jokar, and A. Kalhor, “Using artificial neural networks for temporal and spatial wind speed forecasting in Iran,” Energy Conversion and Management, vol. 115, pp. 17–25, 2016. View at: Publisher Site  Google Scholar
 A. B. Asghar and X. Liu, “Adaptive neurofuzzy algorithm to estimate effective wind speed and optimal rotor speed for variablespeed wind turbine,” Neurocomputing, vol. 272, pp. 495–504, 2018. View at: Publisher Site  Google Scholar
 J. Shi, J. Guo, and S. Zheng, “Evaluation of hybrid forecasting approaches for wind speed and power generation time series,” Renewable and Sustainable Energy Reviews, vol. 16, no. 5, pp. 3471–3480, 2012. View at: Publisher Site  Google Scholar
 J. Zhao, Y. Guo, X. Xiao, J. Wang, D. Chi, and Z. Guo, “Multistep wind speed and power forecasts based on a WRF simulation and an optimized association method,” Applied Energy, vol. 197, pp. 183–202, 2017. View at: Publisher Site  Google Scholar
 F. Cassola and M. Burlando, “Wind speed and wind energy forecast through Kalman filtering of Numerical Weather Prediction model output,” Applied Energy, vol. 99, pp. 154–166, 2012. View at: Publisher Site  Google Scholar
 O. Ait Maatallah, A. Achuthan, K. Janoyan, and P. Marzocca, “Recursive wind speed forecasting based on Hammerstein autoregressive model,” Applied Energy, vol. 145, pp. 191–197, 2015. View at: Publisher Site  Google Scholar
 E. Erdem and J. Shi, “ARMA based approaches for forecasting the tuple of wind speed and direction,” Applied Energy, vol. 88, no. 4, pp. 1405–1414, 2011. View at: Publisher Site  Google Scholar
 S. Baran, “Probabilistic wind speed forecasting using Bayesian model averaging with truncated normal components,” Computational Statistics & Data Analysis, vol. 75, no. 6, pp. 227–238, 2014. View at: Publisher Site  Google Scholar
 Z. Song, Y. Jiang, and Z. Zhang, “Shortterm wind speed forecasting with Markovswitching model,” Applied Energy, vol. 130, no. 5, pp. 103–112, 2014. View at: Publisher Site  Google Scholar
 C. W. Potter and M. Negnevitsky, “Very shortterm wind forecasting for Tasmanian power generation,” IEEE Transactions on Power Systems, vol. 21, no. 2, pp. 965–972, 2006. View at: Publisher Site  Google Scholar
 C. D. Zuluaga, M. A. Álvarez, and E. Giraldo, “Shortterm wind speed prediction based on robust Kalman filtering: an experimental comparison,” Applied Energy, vol. 156, pp. 321–330, 2015. View at: Publisher Site  Google Scholar
 G. Li and J. Shi, “On comparing three artificial neural networks for wind speed forecasting,” Applied Energy, vol. 87, no. 7, pp. 2313–2320, 2010. View at: Publisher Site  Google Scholar
 X. Kong, X. Liu, R. Shi, and K. Y. Lee, “Wind speed prediction using reduced support vector machines with feature selection,” Neurocomputing, vol. 169, pp. 449–456, 2015. View at: Publisher Site  Google Scholar
 L. Wang, X. Li, and Y. Bai, “Shortterm wind speed prediction using an extreme learning machine model with error correction,” Energy Conversion and Management, vol. 162, pp. 239–250, 2018. View at: Publisher Site  Google Scholar
 D. Ambach and W. Schmid, “A new highdimensional time series approach for wind speed, wind direction and air pressure forecasting,” Energy, vol. 135, pp. 833–850, 2017. View at: Publisher Site  Google Scholar
 Z. Zhang, L. Ye, H. Qin et al., “Wind speed prediction method using shared weight long shortterm memory network and Gaussian process regression,” Applied Energy, vol. 247, pp. 270–284, 2019. View at: Publisher Site  Google Scholar
 F. Li, G. Ren, and J. Lee, “Multistep wind speed prediction based on turbulence intensity and hybrid deep neural networks,” Energy Conversion and Management, vol. 186, pp. 306–322, 2019. View at: Publisher Site  Google Scholar
 K. G. Sheela and S. N. Deepa, “Neural network based hybrid computing model for wind speed prediction,” Neurocomputing, vol. 122, pp. 425–429, 2013. View at: Google Scholar
 J. Wang, W. Yang, P. Du, and T. Niu, “A novel hybrid forecasting system of wind speed based on a newly developed multiobjective sine cosine algorithm,” Energy Conversion and Management, vol. 163, pp. 134–150, 2018. View at: Publisher Site  Google Scholar
 J. Wang, W. Yang, P. Du, and Y. Li, “Research and application of a hybrid forecasting framework based on multiobjective optimization for electrical power system,” Energy, vol. 148, pp. 59–78, 2018. View at: Publisher Site  Google Scholar
 O. B. Shukur and M. H. Lee, “Daily wind speed forecasting through hybrid KFANN model based on ARIMA,” Renewable Energy, vol. 76, pp. 637–647, 2015. View at: Publisher Site  Google Scholar
 B. Doucoure, K. Agbossou, and A. Cardenas, “Time series prediction using artificial wavelet neural network and multiresolution analysis: application to wind speed data,” Renewable Energy, vol. 92, pp. 202–211, 2016. View at: Publisher Site  Google Scholar
 H. Liu, X. Mi, and Y. Li, “Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network,” Energy Conversion and Management, vol. 166, pp. 120–131, 2018. View at: Publisher Site  Google Scholar
 S. Zhu, X. Yuan, Z. Xu, X. Luo, and H. Zhang, “Gaussian mixture model coupled recurrent neural networks for wind speed interval forecast,” Energy Conversion and Management, vol. 198, p. 111772, 2019. View at: Publisher Site  Google Scholar
 H. Cai, X. Jia, J. Feng et al., “A combined filtering strategy for short term and long term wind speed prediction with improved accuracy,” Renewable Energy, vol. 136, pp. 1082–1090, 2019. View at: Publisher Site  Google Scholar
 C. Li, Z. Xiao, X. Xia, W. Zou, and C. Zhang, “A hybrid model based on synchronous optimisation for multistep shortterm wind speed forecasting,” Applied Energy, vol. 215, pp. 131–144, 2018. View at: Publisher Site  Google Scholar
 A. Tascikaraoglu, B. M. Sanandaji, K. Poolla, and P. Varaiya, “Exploiting sparsity of interconnections in spatiotemporal wind speed forecasting using Wavelet Transform,” Applied Energy, vol. 165, pp. 735–747, 2016. View at: Publisher Site  Google Scholar
 W. Sun and M. Liu, “Wind speed forecasting using FEEMD echo state networks with RELM in Hebei, China,” Energy Conversion and Management, vol. 114, pp. 197–208, 2016. View at: Publisher Site  Google Scholar
 A. A. Abdoos, “A new intelligent method based on combination of VMD and ELM for short term wind power forecasting,” Neurocomputing, vol. 203, pp. 111–120, 2016. View at: Publisher Site  Google Scholar
 H. Jiang and Y. Dong, “Global horizontal radiation forecast using forward regression on a quadratic kernel support vector machine: case study of the Tibet Autonomous Region in China,” Energy, vol. 133, pp. 270–283, 2017. View at: Publisher Site  Google Scholar
 M. E. Torres, M. A. Colominas, G. Schlotthauer, and P. Flandrin, “A complete ensemble empirical mode decomposition with adaptive noise,” in Proceedings of the IEEE international conference on acoustics, speech and signal processing, ICASSP, pp. 4144–4147, Prague, Czech Republic, May 2011. View at: Google Scholar
 H. Liu, X. Mi, and Y. Li, “Comparison of two new intelligent wind speed forecasting approaches based on wavelet packet decomposition, complete ensemble empirical mode decomposition with adaptive noise and artificial neural networks,” Energy Conversion and Management, vol. 155, pp. 188–200, 2018. View at: Publisher Site  Google Scholar
 L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001. View at: Publisher Site  Google Scholar
 C. K. Ing and T. L. Lai, “A stepwise regression method and consistent model selection for highdimensional sparse linear models,” Statistica Sinica, vol. 21, pp. 1473–1513, 2011. View at: Publisher Site  Google Scholar
 S. Mirjalili and A. Lewis, “The whale optimization algorithm,” Advances in Engineering Software, vol. 95, pp. 51–67, 2016. View at: Publisher Site  Google Scholar
 M. Mafarja and S. Mirjalili, “Whale optimization approaches for wrapper feature selection,” Applied Soft Computing, vol. 62, pp. 441–453, 2018. View at: Publisher Site  Google Scholar
 J. Marin, D. Vázquez, A. M. López, B. Amores, and B. Leibe, “Random forests of local experts for pedestrian detection,” in IEEE International Conference on Computer Vision (ICCV), IEEE, pp. 2592–2599, Sydney, NSW, Australia, December 2013. View at: Google Scholar
 J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006. View at: Google Scholar
 R. L. Iman and J. M. Davenport, “Approximations of the critical region of the fbietkan statistic,” Communications in Statistics  Theory and Methods, vol. 9, no. 6, pp. 571–595, 1980. View at: Publisher Site  Google Scholar
 O. J. Dunn, “Multiple comparisons among means,” Journal of the American Statistical Association, vol. 56, no. 293, pp. 52–64, 1961. View at: Publisher Site  Google Scholar
 M. M. Mafarja and S. Mirjalili, “Hybrid Whale Optimization Algorithm with simulated annealing for feature selection,” Neurocomputing, vol. 260, pp. 302–312, 2017. View at: Publisher Site  Google Scholar
Copyright
Copyright © 2020 He Jiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.