Abstract

Short-term wind speed forecasting is crucial to the utilization of wind energy, and it has been employed widely in turbine regulation, electricity market clearing, and preload sharing. However, wind speed is inherently fluctuating, and accurate wind speed prediction is challenging. This paper proposes a hybrid short-term wind speed forecasting approach based on a novel signal processing algorithm, a wrapper-based feature selection method, a state-of-the-art optimization algorithm, ensemble learning, and an efficient artificial neural network. Variational mode decomposition (VMD) is employed to decompose the original wind time-series into sublayer modes. The binary bat algorithm (BBA) is used to complete the feature selection. A Bayesian optimization (BO) fine-tuned online sequential extreme learning machine (OSELM) is proposed to forecast the low-frequency sublayers of VMD, and a Bagging-based ensemble OSELM is proposed to forecast the high-frequency sublayers of VMD. Two experiments were conducted on 10 min datasets from the National Renewable Energy Laboratory (NREL), and the performance of the proposed model was compared with various representative models. The experimental results indicate that the proposed model is more accurate than the comparison models. Among the thirteen models, the proposed VMD-BBA-EnsOSELM model obtains the best prediction accuracy, and its mean absolute percent error (MAPE) is always less than 0.09.

1. Introduction

Wind energy has grown substantially over the past two decades [1] and has become one of the primary renewable energy sources. However, wind energy is highly variable, which affects the stable operation of the grid. Wind speed prediction can enhance wind farm operations and reduce the influence of wind energy on the grid. As the installed capacity of wind energy increases year by year [2], the industry needs more accurate wind speed prediction, making this subject an essential topic in energy research. Over the past decade, scholars have proposed many wind speed prediction methods. These methods fall into four categories, i.e., (1) physical methods, (2) statistical methods, (3) artificial intelligence methods, and (4) hybrid methods.

Physical methods use fluid dynamics principles to establish numerical weather prediction (NWP) models. These methods require massive computation and are not suitable for short-term wind speed prediction [3]. Statistical methods analyze the patterns in historical data and establish linear prediction models. Representative methods include autoregressive (AR) [4], autoregressive moving average (ARMA) [5], autoregressive integrated moving average (ARIMA) [6], and pattern sequence similarity (PSF) [7] models. However, these methods cannot characterize the nonlinear relationships in wind data, which limits their prediction precision.

Artificial intelligence methods are good at modeling nonlinear relationships. Among the AI models, the most widely used ones are artificial neural networks (ANNs) [8] and support vector machines (SVMs) [9]. However, ANNs have multilayer structures that contain many parameters to adjust, and the SVM is sensitive to its parameters and requires massive computation on large data sets. The extreme learning machine (ELM) is a simple neural network [10]. Compared to ANNs, ELM has a single hidden layer and therefore fewer network parameters; compared to SVM, ELM is more efficient. Consequently, ELM is an excellent predictor [11]. For instance, Liu et al. [12] used ELM to forecast the high-frequency sublayers obtained by VMD-SSA. Fu et al. [13] proposed a hybrid approach based on dominant ingredient chaotic analysis and ELM. However, in these studies, ELM operates in an offline (batch) mode and cannot support real-time learning. To address this issue, the online sequential extreme learning machine (OSELM) was introduced. Zhang et al. [14] proposed an online sequential outlier robust extreme learning machine (OSORELM) for short-term wind speed prediction. Tian et al. [15] proposed an adaptive OSELM to further improve ELM's prediction ability.

Ensemble learning, such as Bagging [16] and Boosting [17], combines multiple weak predictors to complete the forecasting. Bagging reduces the prediction variance and improves the stability of the base predictors. Zontul et al. [18] proposed a Bagging-based decision tree algorithm for wind speed prediction. Emeksiz and Demir [19] used the Bagging algorithm to estimate wind speed. Boosting can effectively enhance the performance of a weak predictor. Peng et al. [17] used an AdaBoost neural network to address the low-accuracy problem of a single predictor. Liu et al. [20] proposed a model combining the AdaBoost algorithm with multilayer perceptron (MLP) neural networks.

Besides ensemble learning, hybrid methods can improve the prediction robustness and accuracy of a single model. In a hybrid model, signal decomposition algorithms are employed to reduce the prediction complexity. Representative algorithms are wavelet decomposition (WD), wavelet packet decomposition (WPD), empirical mode decomposition (EMD), and ensemble empirical mode decomposition (EEMD). For instance, Fei and He [21] proposed a hybrid prediction method that combined WD and the relevance vector machine. Liu et al. [22] presented a novel approach based on WPD and convolutional long short-term memory (ConvLSTM) networks. Zhang et al. [23] developed a model combining EMD, ANN, and SVM. Tian et al. [24] proposed a prediction approach using EEMD and the extreme learning machine (ELM). However, the above decomposition methods have shortcomings. For instance, the wavelet-based approaches do not support adaptive processing; EMD cannot avoid mode mixing; and EEMD can add extra white noise into the wind data. A novel signal processing method, variational mode decomposition (VMD), was proposed to overcome these obstacles. It breaks down the original wind speed time-series into a set of band-limited sublayer modes named intrinsic mode functions (IMFs). These IMFs are more stationary and thus easier to predict. For instance, Zhang et al. [25] presented a hybrid model of the ANN, VMD, and Lorenz disturbance, and the results proved the stable prediction performance of the model. Gendeel et al. [26] presented an ANN prediction model with VMD, and the comparison results indicated that the model obtained significant improvements in forecasting accuracy.

Feature selection methods can improve the computational efficiency of the hybrid models. The typical filter-based approaches are partial autocorrelation function (PACF) and information theory methods. Sun et al. [27] applied PACF to identify the correlation between the decomposed components of EEMD. Memarzadeh and Keynia [28] used mutual information (MI) for feature selection. Huang et al. [29] used conditional mutual information (CMI) to analyze the correlation between the input features. Compared with the filter methods, the metaheuristic optimization-based wrapper approach can produce better accuracy. Sun et al. [27] used the binary-value gravitation search algorithm (BGSA) to improve the regression performance. Liu et al. [30] used the binary-coded genetic algorithm (BGA) for feature selection. Recently, the binary bat algorithm (BBA) has been proposed for feature selection. Compared with other metaheuristic algorithms, BBA has fewer parameters to adjust and can obtain better accuracy. Naik et al. [31] used the BBA to identify the relevant subset of features for the machine-learning tasks. Xie et al. [32] applied BBA to realize test-cost-sensitive attribute reductions. Liu et al. [33] used BBA to remove redundant features for image steganalysis effectively. Since BBA is superior to PACF, it is employed for feature selection in this paper.

Besides the feature selection, the metaheuristic optimization algorithms can be used to seek the optimal parameters of the prediction models to promote the predictors’ performance on the datasets [34]. Among the metaheuristic algorithms, the genetic algorithm (GA) [35] and particle swarm optimization (PSO) [36] have been widely used in wind speed prediction. Although they are suitable for optimizing the model parameters, they need massive calculations and are vulnerable to improper parameter initialization. In the past few years, Bayesian optimization (BO) has emerged as a powerful tool for fine-tuning hyperparameters. Specifically, BO is capable of optimizing expensive black-box objective functions. Compared with the evolutionary computation methods, BO can achieve desirable results with fewer iterations. For instance, Cho et al. [37] used BO to fine-tune deep neural networks. The experimental results indicated that BO is a robust solution compared to the existing solutions. Muhuri and Biswas [38] used BO to optimize task scheduling. Their approach obtained optimal schedules without violation of the constraints. The experimental results indicated that BO is sample-efficient and can significantly outperform existing optimizers.

This paper proposes a novel approach for short-term wind speed prediction based on the above issues. The proposed model combines VMD, BBA, OSELM, BO, and Bagging. The contributions of the paper are as follows:
(1) VMD is utilized to preprocess the original wind time-series into more stationary sublayers for prediction. Compared to EMD and its variants, this approach is more robust to data noise.
(2) BBA is employed to complete the feature selection. Compared to PACF, BBA achieves better prediction accuracy.
(3) BO-optimized OSELM, referred to as BO-OSELM, is used to forecast the low-frequency sublayers of VMD. Compared to ELM, OSELM provides the capability of online learning. In addition, BO is used to optimize the structure of OSELM.
(4) Bagging-based ensemble OSELM, referred to as Bagging-OSELM, is employed to forecast the high-frequency sublayers of VMD. Bagging-OSELM shows better stability and accuracy than OSELM and AdaBoost-OSELM.

The remaining part of the paper proceeds as follows: Section 2 introduces the proposed hybrid model, Section 3 presents the experimental results and discussion, and Section 4 draws the conclusions.

2. The Proposed Hybrid Model

In this section, the proposed hybrid model, referred to as VMD-BBA-EnsOSELM, is presented. This approach combines VMD, BBA, BO, Bagging, and OSELM. The architecture of the proposed model is shown in Figure 1. The process of the proposed method is as follows:
(1) VMD is utilized to decompose the denoised original data set into stationary sublayers.
(2) The BBA feature selection method is applied to retain critical features from the sublayers produced by VMD. The past twenty data points of the wind speed are chosen as the candidate feature set, and BBA determines the most relevant features among them.
(3) BO-OSELM is adopted to forecast the low-frequency sublayers obtained by VMD, with BO optimizing the parameters of OSELM.
(4) Bagging-OSELM is adopted to forecast the high-frequency sublayers obtained by VMD.
(5) All the forecasting results of BO-OSELM and Bagging-OSELM are aggregated to produce the final prediction results.
(6) The proposed model is evaluated and compared with twelve comparison models, including the GPR model, the LSSVR model, the LSTM model, the OSELM model, the AdaBoost-OSELM model, the Bagging-OSELM model, the BBA-OSELM model, the BO-OSELM model, the PSO-OSELM model, the EMD-BBA-OSELM model, the EEMD-BBA-OSELM model, and the VMD-BBA-OSELM model.
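
As a rough illustration only, the sketch below strings the six steps above into one Python function. All helper names (vmd_decompose, bba_select_features, fit_bo_oselm, fit_bagging_oselm) are hypothetical placeholders for the components detailed in Sections 2.1-2.5, and the split between low- and high-frequency sublayers (n_low_freq) is an illustrative assumption, not a value given in the paper.

```python
# Minimal pipeline sketch; helper functions are hypothetical stand-ins for Sections 2.1-2.5.
def forecast_wind_speed(series, n_modes=10, n_low_freq=5):
    imfs = vmd_decompose(series, n_modes)                # step (1): VMD sublayers
    sublayer_forecasts = []
    for k, imf in enumerate(imfs):
        X, y = bba_select_features(imf, max_lag=20)      # step (2): BBA feature selection
        if k < n_low_freq:
            model = fit_bo_oselm(X, y)                   # step (3): BO-OSELM for low-frequency modes
        else:
            model = fit_bagging_oselm(X, y)              # step (4): Bagging-OSELM for high-frequency modes
        sublayer_forecasts.append(model.predict(X))
    return sum(sublayer_forecasts)                       # step (5): aggregate the sublayer forecasts
```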

2.1. Variational Mode Decomposition

The VMD algorithm was developed to overcome the limitations of EMD [39]. It decomposes an original signal into IMFs and has shown significant advantages in time-series forecasting [40] and fault diagnosis [41]. The core principle of VMD is to obtain the IMFs by solving the following constrained optimization problem:

$$\min_{\{u_k\},\{\omega_k\}}\left\{\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2\right\}$$

subject to

$$\sum_{k=1}^{K}u_k(t)=f(t),$$

where $u_k$ denotes the $k$th IMF; $\omega_k$ is the central frequency of each IMF in the Fourier frequency domain; and $\delta(t)$ represents the Dirac function. The constraint conditions are (1) the original signal $f(t)$ equals the sum of all the IMFs, and (2) the sum of the modal bandwidths is minimal. Moreover, an augmented Lagrangian is introduced as

$$L\left(\{u_k\},\{\omega_k\},\lambda\right)=\alpha\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2+\left\|f(t)-\sum_{k=1}^{K}u_k(t)\right\|_2^2+\left\langle\lambda(t),\,f(t)-\sum_{k=1}^{K}u_k(t)\right\rangle,$$

where $\alpha$ denotes a penalty factor, guaranteeing the decomposition precision, and $\lambda$ is a Lagrangian multiplier to assure the rigidity of the constraint conditions. The optimal solution of the above optimization problem is obtained by alternately updating the modes and central frequencies in the frequency domain:

$$\hat{u}_k^{\,n+1}(\omega)=\frac{\hat{f}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\hat{\lambda}(\omega)/2}{1+2\alpha\left(\omega-\omega_k\right)^2},\qquad \omega_k^{\,n+1}=\frac{\int_0^{\infty}\omega\left|\hat{u}_k(\omega)\right|^2\,d\omega}{\int_0^{\infty}\left|\hat{u}_k(\omega)\right|^2\,d\omega},$$

where $u_k$ is an IMF, $\hat{u}_k$ and $\hat{f}$ are the Fourier transforms of $u_k$ and $f$, and $n$ denotes the number of iterations used to resolve the problem.
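
For a concrete illustration, the sketch below decomposes a wind speed series with the third-party vmdpy package (assumed installed); the number of modes follows Section 3.2, the remaining parameter values are common illustrative choices, and the file name is a placeholder.

```python
import numpy as np
from vmdpy import VMD  # third-party VMD implementation (assumed available)

wind = np.loadtxt("wind_speed.csv")  # placeholder: a single-column 10 min wind speed series

alpha = 2000   # bandwidth-constraint (penalty) factor
tau = 0.0      # noise tolerance of the Lagrangian multiplier update
K = 10         # number of modes, as used in Section 3.2
DC = 0         # do not force the first mode to a DC component
init = 1       # initialize the centre frequencies uniformly
tol = 1e-7     # convergence tolerance

# u: (K, T) array of IMFs; u_hat: their spectra; omega: centre-frequency trajectories
u, u_hat, omega = VMD(wind, alpha, tau, K, DC, init, tol)
```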

2.2. Binary Bat Algorithm

Inspired by the echolocation behaviour of bats, a novel metaheuristic algorithm, named the bat algorithm [42], was developed. In this algorithm, each bat uses echolocation to detect prey. In each iteration, a bat actively adjusts its loudness and pulse emission rate according to the distance to the prey. Firstly, each bat $i$ is initialized with a position $x_i$, a velocity $v_i$, and a frequency $f_i$. Then, at each iteration $t$, the bat is updated according to the following equations:

$$f_i=f_{\min}+\left(f_{\max}-f_{\min}\right)\beta,$$
$$v_i^j(t)=v_i^j(t-1)+\left[x_i^j(t-1)-x_*^j\right]f_i,$$
$$x_i^j(t)=x_i^j(t-1)+v_i^j(t),$$

and the loudness $A_i$ and pulse emission rate $r_i$ are updated as

$$A_i^{t+1}=\alpha A_i^t,\qquad r_i^{t+1}=r_i^0\left[1-\exp(-\gamma t)\right],$$

where $\beta\in[0,1]$ denotes a randomly generated number; $x_i^j(t)$ denotes the value of decision variable $j$ for bat $i$ at time step $t$; $x_*^j$ represents the current global best solution for decision variable $j$; and $\alpha$ and $\gamma$ are user-specified constants (Algorithm 1).

Bat Algorithm (f)
Input: Target function f(x)
Initialize the bat population x_i with the velocity v_i, the pulse frequency f_i,
the pulse rates r_i, and the loudness A_i, i = 1, 2, ..., n.
For each bat i, do
 Employ equations (5)–(7) to produce new solutions.
 If rand > r_i, then
  Choose one candidate solution from the optimal solutions.
 If rand < A_i and f(x_i) < f(x_*), then
  Accept the newly proposed solutions.
  Update r_i and A_i by equations (8) and (9).
Return the current best x_*.

For feature selection, a binary version of the bat algorithm is used, in which each bat's position is restricted to binary values. A transfer function maps the continuous velocity to a bit-flipping probability, and the new position is calculated as follows:

$$V\!\left(v_i^j(t)\right)=\left|\frac{2}{\pi}\arctan\!\left(\frac{\pi}{2}v_i^j(t)\right)\right|,$$

$$x_i^j(t+1)=\begin{cases}\left(x_i^j(t)\right)^{-1}, & \text{rand}<V\!\left(v_i^j(t+1)\right),\\[4pt] x_i^j(t), & \text{otherwise},\end{cases}$$

where $\left(x_i^j(t)\right)^{-1}$ denotes the complement of the binary bit $x_i^j(t)$.
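
The following NumPy sketch illustrates the V-shaped transfer function and bit-flipping rule above as used in the standard BBA; the 20-lag candidate mask matches the setup of Section 2, and the random initialization is illustrative only.

```python
import numpy as np

def v_shaped_transfer(v):
    # V-shaped transfer function mapping a real-valued velocity to a flip probability in [0, 1)
    return np.abs((2.0 / np.pi) * np.arctan((np.pi / 2.0) * v))

def update_binary_position(position, velocity, rng):
    # Flip each bit (feature selected / not selected) with probability V(v)
    flip = rng.random(position.shape) < v_shaped_transfer(velocity)
    new_position = position.copy()
    new_position[flip] = 1 - new_position[flip]
    return new_position

# Example: a candidate feature mask over the past 20 lags
rng = np.random.default_rng(0)
mask = rng.integers(0, 2, size=20)
velocity = rng.standard_normal(20)
mask = update_binary_position(mask, velocity, rng)
```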

2.3. Online Sequential Extreme Learning Machine

ELM is a feedforward network with a single hidden layer. The mathematical expression of ELM is

$$H\beta=T,$$

where $\beta$ is the output weight vector between the single hidden layer and the output layer, and $H$ is the hidden layer output matrix. The optimal solution of $\beta$ can be obtained by

$$\beta=H^{\dagger}T,$$

where $H^{\dagger}$ is the Moore–Penrose generalized inverse of $H$ and $T$ is the training-target matrix.
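
A minimal NumPy sketch of ELM training and prediction following the equations above; the sigmoid activation is an illustrative choice, since the recovered text does not state the activation function used.

```python
import numpy as np

def train_elm(X, T, n_hidden=10, rng=np.random.default_rng(0)):
    # Input weights and biases are assigned randomly and never updated (the ELM principle)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden layer output matrix (sigmoid activation)
    beta = np.linalg.pinv(H) @ T             # Moore-Penrose solution for the output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```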

OSELM is an online sequential learning algorithm [43]. The algorithm can be divided into two phases: the initialization phase and the online learning phase. In the initialization phase, given an initial training dataset with hidden layer output matrix $H_0$ and target matrix $T_0$, the output weight vector is calculated as follows:

$$P_0=\left(H_0^{T}H_0\right)^{-1},\qquad \beta_0=P_0H_0^{T}T_0.$$

Then, the online learning process starts, and the algorithm learns the data block by block. In the $(k+1)$th iteration, a new block of observations with hidden layer output matrix $H_{k+1}$ and target matrix $T_{k+1}$ arrives, and the output weight vector is updated as follows:

$$P_{k+1}=P_k-P_kH_{k+1}^{T}\left(I+H_{k+1}P_kH_{k+1}^{T}\right)^{-1}H_{k+1}P_k,$$
$$\beta_{k+1}=\beta_k+P_{k+1}H_{k+1}^{T}\left(T_{k+1}-H_{k+1}\beta_k\right).$$
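
The two OSELM phases can be sketched in NumPy as follows; this is a bare-bones illustration of the recursive update above (sigmoid activation assumed), not the authors' implementation.

```python
import numpy as np

class OSELM:
    """Bare-bones online sequential ELM following the update equations above."""
    def __init__(self, n_inputs, n_hidden=10, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        # Random input weights and biases are fixed, as in the standard ELM
        self.W = rng.standard_normal((n_inputs, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.P = None
        self.beta = None

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # sigmoid activation

    def init_phase(self, X0, T0):
        # Initialization phase: batch least-squares solution on the first data chunk
        H0 = self._hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0)
        self.beta = self.P @ H0.T @ T0

    def learn_block(self, Xk, Tk):
        # Online learning phase: recursive update for a newly arrived data block
        Hk = self._hidden(Xk)
        I = np.eye(Hk.shape[0])
        self.P = self.P - self.P @ Hk.T @ np.linalg.inv(I + Hk @ self.P @ Hk.T) @ Hk @ self.P
        self.beta = self.beta + self.P @ Hk.T @ (Tk - Hk @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta
```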

2.4. Bayesian Optimization

Consider the global optimization problem of an objective function $f$,

$$x^{*}=\arg\min_{x\in\mathcal{X}}f(x),$$

where $f$ is an expensive black-box function and $\mathcal{X}$ is the design space of $x$; $f$ can be evaluated at arbitrary points in $\mathcal{X}$. A sequential exploration process is then carried out: at iteration $n$, a location $x_n$ is selected at which to evaluate $f$ and observe $y_n=f(x_n)$. After $N$ evaluations, the exploration process terminates and a final optimal location $x^{*}$ is returned as the optimization result. In the problem of wind speed forecasting, the black-box function $f$ maps the hyperparameters of a wind speed prediction model to its prediction error on a validation dataset; such an $f$ is nonconvex and expensive to evaluate. Bayesian optimization [44] exploits all previous observations of the objective function to make the sequential exploration process efficient. It can be described as a sequential model-based optimization method. Initially, a probabilistic surrogate model is specified to represent the prior belief about the objective function, and the posterior belief is updated as $f$ is evaluated sequentially. The posterior belief represents the knowledge about $f$ gained from the observations. Typical probabilistic surrogate models include Gaussian process regression, the sparse pseudo-input Gaussian process, the sparse spectrum Gaussian process, random forests, and gradient boosting decision trees. An acquisition function is used to explore the design space $\mathcal{X}$ by incorporating the posterior belief; it balances exploration and exploitation when selecting the next evaluation of $f$. As a utility function, it measures how promising a candidate point is: the acquisition function returns the utility estimate of candidate points and selects the point $x_{n+1}$ that maximizes the utility. The main acquisition functions are PI (probability of improvement), EI (expected improvement), and UCB (upper confidence bound). Bayesian optimization has been demonstrated to be a powerful tool for optimal design problems, such as industrial control [45], robotics [46], and chemical experiments [47]. In this paper, a novel Bayesian optimization algorithm, named DART-EI Bayesian optimization, is proposed for the wind speed forecasting models. The process of the algorithm is described in Algorithm 2. The probabilistic surrogate model is DART (dropouts meet multiple additive regression trees) [48], and the acquisition function is the EI. In each iteration of the Bayesian optimization process, the next query point is calculated as follows:

$$x_{n+1}=\arg\max_{x\in\mathcal{X}}\mathrm{EI}(x)=\arg\max_{x\in\mathcal{X}}\mathbb{E}\left[\max\left(y^{*}-\mu(x),\,0\right)\right],$$

where $y^{*}$ denotes the best current value, $\mu(x)$ represents the DART model's prediction mean, and $\mathrm{EI}(\cdot)$ denotes the expected improvement.

Bayesian Optimization (f, X, M, N)
Input: Target function f; hyperparameter space X; the number of initialization points M; the number of iterations N
Result: Optimal hyperparameter x*
Sample M initial points and build the data set D = {(x_i, f(x_i))}, i = 1, ..., M, from the hyperparameter space X
For n = M + 1 to N do
 Fit a surrogate model g on the data set D, where g denotes a DART model
 Select x_n = argmax_{x in X} EI(x; g), where EI represents the acquisition function
 Evaluate y_n = f(x_n) and update D = D ∪ {(x_n, y_n)}
Return the best observed hyperparameter x*

Since the performance of the OSELM model is affected by the number of hidden neurons, BO is utilized in this paper to achieve the optimal performance of OSELM. The objective function of BO is the 4-fold cross-validation prediction error of OSELM: the input variable is the number of hidden neurons $n_h$ (a hyperparameter of OSELM), and the output variable is the mean absolute percent error of the 4-fold cross-validation. The objective function is defined as follows:

$$n_h^{*}=\arg\min_{n_h}\mathrm{MAPE}_{\mathrm{CV}}\left(n_h\right),$$

where $\mathrm{MAPE}_{\mathrm{CV}}\left(n_h\right)$ denotes the 4-fold cross-validation loss on the training data set. Besides, the acquisition function is critical because it determines the exploration–exploitation behaviour of BO; in this paper, EI is employed as the acquisition function.
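
As one possible illustration of this setup, the sketch below tunes the hidden-layer size of the OSELM class sketched in Section 2.3 by 4-fold cross-validation MAPE. It relies on scikit-optimize's Gaussian-process surrogate with the EI acquisition as a stand-in for the paper's DART surrogate, and X_train/y_train are placeholders for a prepared sublayer training set.

```python
import numpy as np
from skopt import gp_minimize          # GP surrogate used here as a stand-in for DART
from skopt.space import Integer

def cv4_mape(n_hidden, X, y, folds=4):
    # 4-fold cross-validation MAPE of an OSELM with n_hidden neurons (uses the OSELM sketch above)
    idx = np.array_split(np.arange(len(y)), folds)
    errors = []
    for k in range(folds):
        train = np.concatenate([idx[i] for i in range(folds) if i != k])
        model = OSELM(X.shape[1], n_hidden=n_hidden)
        model.init_phase(X[train], y[train])
        pred = model.predict(X[idx[k]])
        errors.append(np.mean(np.abs((pred - y[idx[k]]) / y[idx[k]])))  # assumes nonzero targets
    return float(np.mean(errors))

# Search the hidden-layer size in [10, 200] (Section 3.2) with the EI acquisition function
result = gp_minimize(lambda p: cv4_mape(int(p[0]), X_train, y_train),
                     [Integer(10, 200)], acq_func="EI", n_calls=30, random_state=0)
best_n_hidden = result.x[0]
```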

2.5. Bagging

Bagging is an efficient ensemble learning algorithm [49] that can significantly improve the performance of the base learner. In this paper, Bagging-OSELM is introduced to complete the prediction of the high-frequency sublayers of VMD. Initially, the bootstrap sampling method is used to draw two hundred sample data sets $D_1, D_2, \ldots, D_{200}$ from the given training data set $D$. Then, an OSELM $h_i$ is constructed for each data set $D_i$, and the final ensemble model $H$ is built by averaging the prediction values of $h_1, h_2, \ldots, h_{200}$. The detailed Bagging-OSELM algorithm is described as follows (Algorithm 3):

Bagging-OSELM Algorithm
For i = 1 to 200 do
 Draw a training data set D_i from D through the bootstrap method.
 Build a base OSELM h_i on D_i.
Return the ensemble H(x) = (1/200) Σ_{i=1}^{200} h_i(x).
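
A NumPy sketch of Algorithm 3, reusing the OSELM class sketched in Section 2.3; the hidden-layer size of 10 follows Section 3.2 and is illustrative.

```python
import numpy as np

def fit_bagging_oselm(X, y, n_estimators=200, rng=np.random.default_rng(0)):
    # Draw 200 bootstrap samples and fit one OSELM per sample (Algorithm 3)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(y), size=len(y))   # sampling with replacement
        model = OSELM(X.shape[1], n_hidden=10)
        model.init_phase(X[idx], y[idx])
        models.append(model)
    return models

def predict_bagging(models, X):
    # The final ensemble output is the average of the individual OSELM predictions
    return np.mean([m.predict(X) for m in models], axis=0)
```
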
2.6. The Performance Evaluation Metrics

In this paper, the performance of the involved models is evaluated by the mean absolute error (MAE), the mean absolute percent error (MAPE), and the root mean square error (RMSE). The smaller these metrics, the better the model performs. The MAE, MAPE, and RMSE are defined as

$$\mathrm{MAE}=\frac{1}{N}\sum_{t=1}^{N}\left|\hat{y}_t-y_t\right|,$$
$$\mathrm{MAPE}=\frac{1}{N}\sum_{t=1}^{N}\left|\frac{\hat{y}_t-y_t}{y_t}\right|,$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\hat{y}_t-y_t\right)^2},$$

where $\hat{y}_t$ and $y_t$ denote the predicted and observed values at time $t$, respectively, and $N$ represents the number of data points.
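
The three metrics can be computed directly from the definitions above, for example:

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))

def mape(y_true, y_pred):
    # expressed as a fraction, consistent with the MAPE < 0.09 reported in the abstract
    return np.mean(np.abs((y_pred - y_true) / y_true))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
```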

Besides, the improved percentage indices $P_{\mathrm{MAE}}$, $P_{\mathrm{MAPE}}$, and $P_{\mathrm{RMSE}}$ are used to compare the performance of two models. They are defined as

$$P_{\mathrm{MAE}}=\frac{\mathrm{MAE}_1-\mathrm{MAE}_2}{\mathrm{MAE}_1}\times 100\%,\quad P_{\mathrm{MAPE}}=\frac{\mathrm{MAPE}_1-\mathrm{MAPE}_2}{\mathrm{MAPE}_1}\times 100\%,\quad P_{\mathrm{RMSE}}=\frac{\mathrm{RMSE}_1-\mathrm{RMSE}_2}{\mathrm{RMSE}_1}\times 100\%,$$

where the subscripts 1 and 2 denote the comparison model and the proposed model, respectively.
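
Each index is the percentage reduction of an error metric, so a single helper covers all three:

```python
def improvement(metric_comparison, metric_proposed):
    # Percentage reduction of an error metric (MAE, MAPE, or RMSE) achieved by the proposed model
    return (metric_comparison - metric_proposed) / metric_comparison * 100.0
```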

2.7. Pearson’s Test

Pearson’s test can evaluate the prediction capability of the involved models. In Pearson’s test, the correlation coefficient is calculated to describe the degree of association between the observed data and the predicted data. If the correlation coefficient is 0, then the observed and the predicted values are not correlated. If the coefficient is 1, the observed and the predicted values are 100% correlated. The larger the Pearson correlation coefficient is, the better the model is. Pearson’s correlation coefficient can be described as follows:where is the actual data, is the forecasting data, and are the means of the actual data and the forecasting data, respectively, and denotes the number of data points.

3. Case Study

3.1. Wind Speed Data Description

In this paper, two wind speed time-series are used to evaluate the proposed model. These data were collected from the 135-m research towers of the NREL (National Renewable Energy Laboratory) from January 2012 to August 2012. The descriptive statistics of the data are given in Table 1. Each data set contains 1800 points at a 10 min interval and is divided into a training data set and a test data set: the training data set consists of points 1–1700, and the test data set consists of points 1701–1800. The two wind time-series are depicted in Figures 2 and 3, respectively.

3.2. Parameter Settings

In this paper, two kinds of wind speed prediction models are implemented: single models and hybrid models. The single models are the GPR model, the LSSVR model, the LSTM model, the OSELM model, the AdaBoost-OSELM model, the Bagging-OSELM model, the BBA-OSELM model, the BO-OSELM model, and the PSO-OSELM model. The hybrid models include the EEMD-BBA-OSELM model, the EMD-BBA-OSELM model, the VMD-BBA-EnsOSELM model, and the VMD-BBA-OSELM model. All the models are developed in the Anaconda environment.

In the GPR model, the kernel function is rational quadratic. In the LSSVR model, the kernel function is RBF, and gamma is 0.01. In the LSTM model, the number of neurons is 40. In the OSELM models, the number of hidden neurons is 10. In the BBA-based models, the maximum time lag is 20 for selecting relevant input features. In the EMD-BBA-OSELM model, the number of EMD trials is 100. In the EEMD-BBA-OSELM model, the number of EEMD trials is 100, and the standard deviation of the Gaussian noise is 0.05. In the VMD-based models (VMD-BBA-OSELM and VMD-BBA-EnsOSELM), the number of modes for VMD decomposition is 10. In the PSO-OSELM model and the VMD-BBA-EnsOSELM model, the number of hidden neurons of OSELM is selected by PSO and BO, respectively, and the search range is [10, 200].

3.3. Experimental Results

In this section, the forecasting results for wind speed series 1 and 2 are depicted in Figures 4 and 5. The prediction evaluation results for wind speed series 1 and 2 are presented in Tables 2 and 3. The improving percentages of the comparison models by the proposed model for wind speed series 1 and 2 are shown in Tables 4 and 5. The results of Pearson's test for wind speed series 1 and 2 are given in Tables 6 and 7.

3.4. The Comparisons and Analysis

From the above section, it can be seen that the prediction results for both wind speed series show similar patterns. The comparison and discussion of the prediction results are as follows:
(1) Among the single models, the OSELM model is the most efficient, while the LSTM model is the least efficient. For instance, in series 1, the calculation times of the OSELM model and the LSTM model are 0.02 s and 165.10 s, respectively. In series 2, the calculation times of the OSELM model and the LSTM model are 0.01 s and 140.12 s, respectively.
(2) The ensemble algorithms improve the prediction accuracy. In series 1, from the OSELM model to the AdaBoost-OSELM model, the MAPE is reduced by 55.73%; from the OSELM model to the Bagging-OSELM model, the MAPE is reduced by 57.80%. In series 2, from the OSELM model to the Bagging-OSELM model, the MAPE is reduced by 64.36%. Besides, Bagging is superior to AdaBoost. For instance, in series 1, from the AdaBoost-OSELM model to the Bagging-OSELM model, the MAPE is reduced by 4.67%. In series 2, from the AdaBoost-OSELM model to the Bagging-OSELM model, the MAE is reduced by 23.44% and the RMSE is reduced by 20.51%.
(3) The optimization algorithms improve the prediction accuracy. For instance, in series 1, from the OSELM model to the PSO-OSELM model, the MAPE is reduced by 53.01% and the RMSE is reduced by 13.01%; from the OSELM model to the BO-OSELM model, the MAE is reduced by 40.06% and the MAPE is reduced by 67.00%. In series 2, from the OSELM model to the PSO-OSELM model, the MAPE is reduced by 66.34%.
(4) The BBA feature selection algorithm improves the prediction performance. For instance, in series 1, from the OSELM model to the BBA-OSELM model, the MAPE is reduced by 56.67%. In series 2, from the OSELM model to the BBA-OSELM model, the MAPE is reduced by 62.93%.
(5) BO is superior to BBA and PSO. For instance, in series 1, from the BBA-OSELM model to the BO-OSELM model, the MAE is reduced by 30.11%, the MAPE is reduced by 23.83%, and the RMSE is reduced by 29.95%; from the PSO-OSELM model to the BO-OSELM model, the MAE is reduced by 32.90% and the RMSE is reduced by 30.70%. In series 2, from the BBA-OSELM model to the BO-OSELM model, the MAPE is reduced by 40.95%; from the PSO-OSELM model to the BO-OSELM model, the MAE is reduced by 39.58%.
(6) The signal decomposition algorithms improve the prediction accuracy. For instance, in series 1, from the BBA-OSELM model to the EMD-BBA-OSELM model, the MAE is reduced by 10.88% and the RMSE is reduced by 11.96%. Meanwhile, both EEMD and VMD are superior to EMD. For instance, in series 1, from the EMD-BBA-OSELM model to the EEMD-BBA-OSELM model, the MAE is reduced by 51.86%; from the EMD-BBA-OSELM model to the VMD-BBA-OSELM model, the MAE is reduced by 45.00% and the RMSE is reduced by 47.21%. In series 2, from the BBA-OSELM model to the EMD-BBA-OSELM model, the RMSE is reduced by 29.03%.
(7) The proposed model performs best among all the involved models. For instance, in series 1, from the OSELM model to the VMD-BBA-EnsOSELM model, the MAE is reduced by 68.56%, the MAPE is reduced by 82.66%, and the RMSE is reduced by 69.27%; from the BBA-OSELM model to the VMD-BBA-EnsOSELM model, the MAE is reduced by 63.34%, the MAPE is reduced by 59.99%, and the RMSE is reduced by 64.29%; from the Bagging-OSELM model to the VMD-BBA-EnsOSELM model, the MAE is reduced by 62.81% and the RMSE is reduced by 63.67%; from the BO-OSELM model to the VMD-BBA-EnsOSELM model, the MAE is reduced by 47.55%, the MAPE is reduced by 47.47%, and the RMSE is reduced by 49.02%; from the VMD-BBA-OSELM model to the VMD-BBA-EnsOSELM model, the MAE is reduced by 33.36% and the RMSE is reduced by 32.35%. In series 2, from the OSELM model to the VMD-BBA-EnsOSELM model, the MAE is reduced by 74.07% and the MAPE is reduced by 89.62%.
(8) The Pearson correlation coefficient of the proposed model is higher than those of the comparison models in both series 1 and series 2.

3.5. The Sensitivity Analysis

The proposed method requires the number of VMD decomposition modes to be preconfigured. In this section, several cases are examined to assess the sensitivity to the number of modes: the proposed model performs one-step predictions for wind time-series 1 with various numbers of modes, and the forecasting results are shown in Table 8. From Table 8, it can be concluded that the prediction errors of the proposed model decrease as the number of decomposition modes increases. For instance, when the number of modes grows from 4 to 5, the RMSE index is reduced by 6.62%; when the number of modes grows from 5 to 6, the MAPE index is reduced by 19.90%; when the number of modes grows from 6 to 7, the MAPE index is decreased by 9.19%; when the number of modes grows from 7 to 8, the MAE index is decreased by 19.23%.

4. Conclusion

Short-term wind speed forecasting is significant to wind energy development, and it is widely applied to turbine regulation, electricity market clearing, and preload sharing. This paper has presented a novel hybrid forecasting method based on VMD, BBA, BO, Bagging, and OSELM. In the proposed VMD-BBA-EnsOSELM model, VMD is used to decompose the original wind time-series into stationary subseries, BBA is used to complete the feature selection, and BO-OSELM and Bagging-OSELM are utilized to complete the wind speed prediction. Two experiments were conducted on the NREL datasets to verify the superiority of the proposed method. Twelve comparison models were compared with the proposed method, including the GPR model, the LSSVR model, the LSTM model, the OSELM model, the AdaBoost-OSELM model, the Bagging-OSELM model, the BBA-OSELM model, the BO-OSELM model, the PSO-OSELM model, the EMD-BBA-OSELM model, the EEMD-BBA-OSELM model, and the VMD-BBA-OSELM model. The experimental results and Pearson's test indicate that (1) BBA is suitable for feature selection; (2) Bagging can outperform AdaBoost in enhancing the prediction capability of OSELM; (3) BO can be superior to PSO for effectively improving the accuracy of a hybrid wind prediction model; and (4) the proposed method achieves the best prediction performance among the involved models. In conclusion, the proposed model fully utilizes the virtues of VMD, BBA, BO, Bagging, and OSELM, and it is suitable for short-term wind speed forecasting. Future research will focus on extending the proposed model to multistep wind speed prediction.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was fully supported by the National Natural Science Foundation of China (grant no. 51308553).