Multiperiod-Ahead Wind Speed Forecasting Using Deep Neural Architecture and Ensemble Learning
Accurate forecasting of wind speed plays a fundamental role in enabling reliable operation and planning for large-scale integration of wind turbines. Accurate wind speed forecasting (WSF) is difficult to obtain because of the intermittent and random nature of wind energy. In this paper, a multiperiod-ahead WSF model based on the analysis of variance, stacked denoising autoencoder (SDAE), and ensemble learning is proposed. The analysis of variance classifies the training samples into different categories. The SDAE, a deep learning architecture, is then built for unsupervised feature learning in each category. An ensemble of extreme learning machines (ELMs) is applied to fine-tune the SDAE for multiperiod-ahead wind speed forecasting. Experimental results demonstrate that the proposed model outperforms the classic WSF methods considered, including the single SDAE-ELM, ELMAN, and adaptive neuro-fuzzy inference system (ANFIS).
Wind has been one of the most rapidly growing energy sources due to its clean and free nature. However, the intrinsic volatility and deviation of wind energy impose major challenges on the operation and planning of power systems. Wind speed plays a pivotal role in the wind power output. Since wind power has a functional relationship with wind speed [2–5], accurate WSF can enhance the performance of wind power prediction. Prediction of multiperiod-ahead wind speed and the resulting multiperiod-ahead wind power production enables energy scheduling strategies to maintain operational stability and improve the cost performance of microgrids with wind turbines. Numerous approaches have been proposed to capture the characteristics of the wind speed series. Generally speaking, WSF models can be classified into three types: physical methods, statistical methods, and artificial intelligence (AI) methods.
Physical models combine meteorological attributes to improve the forecasting performance. Numerical weather prediction (NWP) is a well-known physical approach. A wind speed model has been applied that utilizes the Kalman filter method to minimize the forecasting errors of an NWP model. However, an NWP model runs only a few times a day and is limited by the uncertainties in its initial atmospheric conditions; it is thus not suitable for fitting short-term wind speed series, which evidently restricts its practical application.
Statistical methods mainly design a mathematical architecture to represent the inner characteristics of wind speed series [10, 11]. Statistical models such as autoregressive integrated moving average (ARIMA) models are constructed to deal with the linear and nonlinear features of wind speed series. Hourly wind speed series forecasting has been conducted by combining the ARIMA and generalized autoregressive conditional heteroskedasticity (GARCH) models, and the forecasting ability was compared with that of the physical models. To evaluate the wind energy at a certain site, a large number of probability density functions (PDFs) have been used to analyze the statistical characteristics of wind speed series. The frequency analysis of wind speed was carried out with two PDFs, the Pearson type V and the Burr distributions. The bimodal Weibull & Weibull PDF was applied to depict the distribution of wind speed series and showed better fitting performance than the single Weibull function.
However, statistical models generalize weakly when learning nonlinear features, especially for wind speed series.
With their capacity for mapping nonlinear features, intelligent methods are receiving great attention in the WSF area. An evident advantage of AI methods is that they predict the future wind speed series without any predefined mathematical model. As a representative AI method, the artificial neural network (ANN), owing to its ability to handle noisy data, has offered a variety of approaches for multiperiod-ahead WSF, such as the modified EMD-based ANN, the radial basis function neural network (RBFNN), and the feed-forward neural network. The RBFNN, the adaptive linear element neural network, and the BP neural network were employed for wind speed prediction. A wind speed forecasting strategy was also proposed based on a two-hidden-layer BPNN.
Compared with traditional neural networks, deep learning algorithms can capture the deep features of the wind speed series. Deep learning algorithms have been successfully applied in computer vision and speech signal processing. They have therefore been used to tackle the complex characteristics of the wind speed series and have achieved notable results in this domain [22–24]. A deep learning model for multistep wind speed forecasting was established by combining the LSTM network with the ELM network. The experimental results demonstrate that the deep learning algorithms perform best among the benchmark models. In the WSF field, most approaches produce the predicted wind speed values directly from raw wind speed without effectively analyzing the nonstationary volatility of the wind speed series before building the architectures.
In this paper, a novel multiperiod-ahead WSF method is developed. To mitigate the impact of volatile wind speed series, the analysis of variance is applied to classify the time series into several categories, so that the samples grouped together share a similar fluctuation level. Furthermore, the SDAE is adopted to perform unsupervised learning to extract hidden low-level nonlinear features and remove the noise data. Meanwhile, the fine-tuning process of the SDAE is optimized by the ELM-based ensemble learner. Finally, the 15-min, 1-h, 4-h, 8-h, and 24-h wind speed series are employed to evaluate the proposed model. The contributions of this paper are as follows.
(1) The training samples are classified into several categories according to the variance of the wind speed series. The inherent volatility of the wind speed series is fully taken into consideration in the subsequent research.
(2) The SDAE is built in each category obtained from (1) to remove the noise data and reduce the dimension of the time series.
(3) The top networks used for the fine-tuning phase in the training procedure of deep learning architectures mainly consist of a single learner, whose performance is easily affected by noise data and new condition data. To improve the inferential accuracy, we propose an ensemble predictor of ELMs based on the bootstrap sampling approach to quantify the irregular wind speed series.
The rest of the paper is organized as follows. In the next section, a brief formulation for multiperiod-ahead WSF is introduced. The analysis of variance, SDAE architecture, and corresponding ensemble learners are presented in Section 3. The numerical results are shown in Section 4. A conclusion is drawn in Section 5.
2. Related Background
2.1. Multiperiod-Ahead Wind Speed Forecasting
The wind speed series can be denoted as $\{v_1, v_2, \ldots, v_n\}$, where $v_i$ is the average wind speed in the past 10-min. For multiperiod-ahead WSF, the future wind speed value $v_{i+\tau}$ is obtained by utilizing the previous $N$ data, where $i$ is the index of the wind speed series and $\tau$ represents the predicted horizon. We assume that the wind speed is a realization of an explicit function of the form
$$v_{i+\tau} = f(v_i, v_{i-1}, \ldots, v_{i-N+1}; \theta),$$
where $\theta$ is the parameters space and $f$ is the function modeling the wind speed. The parameters space is estimated by minimizing the loss function, which is denoted as
$$\hat{\theta} = \arg\min_{\theta} \sum_{i} L(v_{i+\tau}, \hat{v}_{i+\tau}),$$
where $v_{i+\tau}$ is the real wind speed value and $\hat{v}_{i+\tau}$ is the predicted value from function $f$.
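The formulation above amounts to turning one long series into pairs of an input window and a target that lies a fixed number of steps ahead. The following is a minimal sketch of that construction; the function name `make_samples` and its parameters `n_lags` and `horizon` are illustrative names, not taken from the paper.

```python
def make_samples(series, n_lags, horizon):
    """Pair each window of n_lags past values with the value `horizon` steps ahead."""
    inputs, targets = [], []
    # i is the index of the most recent observation inside each input window
    for i in range(n_lags - 1, len(series) - horizon):
        inputs.append(series[i - n_lags + 1 : i + 1])
        targets.append(series[i + horizon])
    return inputs, targets


series = [5.1, 5.4, 4.9, 6.0, 6.3, 5.8, 5.5]
X, y = make_samples(series, n_lags=3, horizon=2)
print(X[0], y[0])  # [5.1, 5.4, 4.9] 6.3
```

A larger `horizon` leaves fewer usable pairs for the same series, which is one reason long-horizon training sets tend to be smaller.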
2.2. Autoencoder Neural Network
As a three-layer network, the autoencoder maps the input data into the output value to perform the reconstructed procedure. It aims at minimizing the reconstruction error by performing unsupervised learning. Figure 1 depicts the structure of autoencoder.
The autoencoder utilizes a deterministic function to obtain the hidden representation $h$ from the input data $x$ according to
$$h = f(Wx + b),$$
where $f$ is the activation function, $W$ is the weight matrix, and $b$ is the bias vector. Then the output layer maps $h$ into the reconstructed vector $z$ by
$$z = f(W'h + b'),$$
where $W'$ and $b'$ are the weight matrix and bias vector of the output layer. The optimal parameters space is searched by minimizing the reconstruction loss as given by
$$\theta^{*} = \arg\min_{\theta} L(x, z).$$
However, the training process of the autoencoder only focuses on copying the input to the output, which can cause overfitting and cannot guarantee the quality of the extracted features.
3. Wind Speed Forecasting Strategy
3.1. Analysis of Variance for Wind Speed Series
The wind speed series is highly varying due to its randomness and fluctuation. In the proposed methodology, the variance of each wind speed series is first computed to categorize the training samples according to
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(v_i - \bar{v}\right)^2,$$
where $n$ is the number of wind speed values $v_i$ in the sample and $\bar{v}$ is their mean.
Then the variance series is sorted in ascending order as illustrated in Figure 2.
As shown in Figure 2, efficacious intervals are defined between the minimum and maximum variance, which can be denoted as $[a_k, a_{k+1})$, $k = 1, 2, \ldots, r$. We can derive the following rule from the value of the variance $\sigma^2$.
If the variance of one training sample meets the condition $a_k \le \sigma^2 < a_{k+1}$, the training sample is added to the corresponding category $k$.
Gathering samples with a similar fluctuation level into one group alleviates the influence of volatility in the wind speed series. Each predictive SDAE, whose parameters space is built in the corresponding category, can then have a better approximation capacity.
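The categorization rule can be sketched as follows. The paper does not state how the interval boundaries are chosen, so equal-width intervals between the minimum and maximum training variance are assumed here, and the function names `fit_intervals` and `category_of` are hypothetical.

```python
from statistics import pvariance


def fit_intervals(samples, n_categories):
    """Return interval edges spanning the variances of the training samples."""
    variances = sorted(pvariance(s) for s in samples)
    low, high = variances[0], variances[-1]
    width = (high - low) / n_categories  # assumption: equal-width intervals
    return [low + k * width for k in range(n_categories + 1)]


def category_of(sample, edges):
    """Index of the interval containing the sample variance (clamped at both ends)."""
    var = pvariance(sample)
    for k in range(len(edges) - 2):
        if var < edges[k + 1]:
            return k
    return len(edges) - 2


train = [[5, 5, 5, 5], [5, 6, 5, 6], [2, 8, 3, 9], [1, 10, 2, 12]]
edges = fit_intervals(train, n_categories=3)
labels = [category_of(s, edges) for s in train]
print(labels)  # [0, 0, 1, 2]
```

Clamping means a test sample whose variance falls outside the training range is still routed to the nearest category rather than rejected.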
3.2. Stacked Denoising Autoencoders
An autoencoder (AE) is an artificial neural network with a single input layer, hidden layer, and output layer. The AE learns a latent representation of the input vectors by minimizing the reconstruction error in an unsupervised way.
The training procedure of a denoising autoencoder (DAE) is similar to AEs except for the corrupted input, as shown in Figure 3.
The corrupted version of the inputs, obtained by adding noise to the raw inputs through a stochastic mapping, is encoded by
$$h = s(W\tilde{x} + b),$$
where $\tilde{x}$ is the corrupted input produced by the stochastic mapping, $h$ is the latent representation of the hidden layer, $s$ is the logistic activation function, $W$ is the weight matrix connecting the input layer and hidden layer, and $b$ is the bias vector of the units belonging to the hidden layer.
The decoding part maps $h$ back into the original feature space by
$$z = s(W'h + b'),$$
where $z$ is the reconstructed inputs, $W'$ is the weight matrix connecting the hidden layer and output layer, and $b'$ is the bias vector of the units belonging to the output layer.
An error between the reconstructed inputs and the raw inputs always remains. The optimal DAE model can be obtained by
$$\theta^{*} = \arg\min_{\theta} L(x, z),$$
where $x$ is the raw inputs, $z$ is the reconstructed inputs, and $\theta$ is the parameters space containing the weight matrices and bias vectors.
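A single denoising autoencoder pass can be sketched in NumPy as below. The paper does not specify the corruption process or the form of the loss, so masking noise (randomly zeroing inputs) and a squared reconstruction error are assumed; the 10-to-8 layer size follows the first SDAE layer described later, and all names are illustrative.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def dae_forward(x, W, b, W_out, b_out, corruption=0.2, rng=None):
    """Corrupt x, encode it, decode it, and score the result against the clean x."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= corruption  # assumption: masking noise
    x_tilde = x * mask                        # corrupted input
    h = sigmoid(x_tilde @ W + b)              # latent representation
    z = sigmoid(h @ W_out + b_out)            # reconstruction
    loss = np.mean(np.sum((x - z) ** 2, axis=1))  # assumption: squared error
    return h, z, loss


rng = np.random.default_rng(0)
x = rng.random((5, 10))  # five samples, ten inputs scaled to [0, 1]
W, b = rng.normal(0.0, 0.1, (10, 8)), np.zeros(8)
W_out, b_out = rng.normal(0.0, 0.1, (8, 10)), np.zeros(10)
h, z, loss = dae_forward(x, W, b, W_out, b_out, rng=rng)
print(h.shape, z.shape)  # (5, 8) (5, 10)
```

The loss compares the reconstruction with the clean input rather than the corrupted one, which is what pushes the hidden layer to learn noise-robust features.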
The SDAE is composed of stacked DAEs that aim at learning more robust features, as presented in Figure 4. After the former layer is trained, the raw inputs are utilized to generate the latent representation that serves as the input to the next DAE. This is the pretraining process. It should be pointed out that this process is unsupervised and label information is not required.
When the pretraining procedure has been completed, the top layer is used to fine-tune the parameters space with the loss function. The fine-tuning stage applies the desired forecasting target values for supervised learning of the parameters in the neural network. To improve the ability to fit big data, an ensemble of ELMs is applied.
3.3. The Ensemble of ELMs
Most conventional top networks of deep learning architectures have only a single learner, which may degrade performance when handling big data with noise and new condition data. To tackle this problem, an ensemble of ELMs based on the bootstrap sampling technique is applied to perform the fine-tuning procedure.
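Given $n$ output samples $x_i$ from the SDAE and the corresponding target data $t_i$, the outputs from the ELM with an activation function $g$ can be described as
$$o_i = \sum_{j=1}^{K} \beta_j\, g(w_j \cdot x_i + b_j), \quad i = 1, 2, \ldots, n,$$
where $w_j = [w_{j1}, \ldots, w_{jN}]^{T}$ is the weight vector connecting the input layer and the $j$th of the $K$ hidden nodes. In ELM, the weights $w_j$ and $b_j$ are randomly generated without considering the input data. $N$ is the number of units in the input layer. $\beta_j = [\beta_{j1}, \ldots, \beta_{jM}]^{T}$ is the weight vector connecting the $j$th hidden node and the output layer, and $M$ is the number of units in the output layer. $b_j$ is the bias of the $j$th hidden node. The outputs from the ELM can be rewritten as
$$H\beta = T,$$
where $H$ is the output matrix of the hidden nodes and $T$ is the target matrix. To obtain a better approximation, ELM aims to solve
$$\min_{\beta} \left\| H\beta - T \right\|.$$
As described by Huang et al., the weight matrix connecting the hidden layer and output layer can be easily determined as
$$\beta = \left(\frac{I}{C} + H^{T}H\right)^{-1} H^{T} T,$$
where $I$ is the identity matrix and $C$ is the regularization constant.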
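The closed-form training of a single ELM can be sketched as follows. This assumes the regularized least-squares solution with a constant `C`, sigmoid hidden units, and uniform random hidden weights; the class and its defaults are illustrative rather than the authors' implementation.

```python
import numpy as np


class ELM:
    """Single-hidden-layer ELM: random hidden layer, output weights solved in closed form."""

    def __init__(self, n_hidden=8, C=1e3, seed=0):
        self.n_hidden, self.C = n_hidden, C
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        # Hidden weights and biases are drawn at random and never updated.
        self.W = self.rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, self.n_hidden)
        H = self._hidden(X)
        # beta = (I / C + H^T H)^{-1} H^T T, solved without forming the inverse
        A = np.eye(self.n_hidden) / self.C + H.T @ H
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


rng = np.random.default_rng(1)
X = rng.random((200, 4))
T = np.sin(X.sum(axis=1))
model = ELM().fit(X, T)
train_mse = np.mean((model.predict(X) - T) ** 2)
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a numerical-stability choice; both give the same output weights in exact arithmetic.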
Given these advantages of the ELM, we apply it in an ensemble learner that synthesizes the forecasting results of all base learners to generate the final result. In the ensemble learner, the bootstrap sampling technique is used to improve diversity among the base learners. Overall, the ensemble learner is applied to perform the fine-tuning process of the SDAE and enhance the forecasting performance. The training process of the ensemble learner is briefly described in Algorithm 1.
Algorithm 1: Training process of the ELM-based ensemble learner.
Input: the data output from the SDAE
(1) Initialization: set the size of the ensemble learner, the number of visible, hidden, and output nodes in each ELM, the length of the training samples for each ELM, and the threshold.
(2) Training procedure:
while the size of the ensemble is smaller than the preset size:
    (a) select the training samples by bootstrap sampling
    (b) train the base predictor, namely, an ELM
    (c) filter the test data and compute the error
    (d) add the base predictor to the ensemble learner
Output: the ensemble learner
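Algorithm 1 can be sketched in NumPy as below. Two points are assumptions rather than statements from the paper: the threshold is read as an upper bound on a base learner's validation RMSE, and each bootstrap draw has the same length as the training set. The helper names and the `max_tries` guard are hypothetical.

```python
import numpy as np


def train_elm(X, T, n_hidden, C, rng):
    """Train one ELM: random hidden layer, regularized least-squares output weights."""
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    return (1.0 / (1.0 + np.exp(-(X @ W + b)))) @ beta


def train_ensemble(X, T, X_val, T_val, size, threshold,
                   n_hidden=8, C=1e3, max_tries=1000, seed=0):
    rng = np.random.default_rng(seed)
    ensemble, tries = [], 0
    while len(ensemble) < size and tries < max_tries:
        tries += 1
        idx = rng.integers(0, len(X), len(X))  # bootstrap: sample with replacement
        model = train_elm(X[idx], T[idx], n_hidden, C, rng)
        rmse = np.sqrt(np.mean((elm_predict(model, X_val) - T_val) ** 2))
        if rmse <= threshold:  # assumption: keep only sufficiently accurate learners
            ensemble.append(model)
    return ensemble


def ensemble_predict(ensemble, X):
    """Average the base learners' outputs (requires a non-empty ensemble)."""
    return np.mean([elm_predict(m, X) for m in ensemble], axis=0)


rng = np.random.default_rng(2)
X = rng.random((300, 4))
T = np.sin(X.sum(axis=1))
ensemble = train_ensemble(X[:200], T[:200], X[200:], T[200:], size=5, threshold=np.inf)
forecast = ensemble_predict(ensemble, X[200:])
```

With `threshold=np.inf` every base learner is accepted; a finite threshold may return fewer than `size` learners once `max_tries` is exhausted.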
3.4. The WSF Strategy
The proposed model is composed of the analysis of variance, the SDAE, and the ELM-based ensemble learner, forming an efficient hybrid system that profits from the inherent advantages of each method.
The working principle of the proposed model is divided into two parts: the training procedure and prediction procedure, as illustrated in Figure 5.
The following steps summarize the training procedure:
(1) Compute the variance of each training sample and sort the variances in ascending order.
(2) Classify the training samples into several categories according to the variance and record the intervals.
(3) Build the SDAE architecture for feature learning in each category.
(4) Train the ELM-based ensemble learner, taking the output from the SDAE as input for the ensemble predictor.
The main steps in the prediction procedure are as follows:
(1) Obtain the variance of the single testing sample and match it to the corresponding interval.
(2) Extract features from the input data with the SDAE of the matched category.
(3) Make a prediction utilizing the ensemble learner with the input data composed of outputs from the SDAE.
(4) Compute the average of the fitting outputs of the ensemble learners as the forecasting result.
4. Numerical Results
4.1. Performance Criteria
After the analysis of variance and the training of the deep learning architecture and the ELM-based ensemble learner are finished, performance criteria in terms of the root mean square error (RMSE), the mean absolute error (MAE), the bias (BIAS), the standard deviation of the error (SDE), and the average percentage error (APE) are estimated on the testing data. In these criteria, $\hat{v}_i$ denotes the predicted value from the proposed model, $v_i$ the actual value of wind speed, and $N$ the size of the testing data.
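The five criteria can be computed as in the sketch below. It uses the common textbook definitions, with the error taken as predicted minus actual and the APE taken as the mean of absolute error over actual value; the paper's exact definitions, in particular for the APE and the sign of the BIAS, may differ.

```python
import math


def forecast_metrics(actual, predicted):
    """RMSE, MAE, BIAS, SDE, and APE under the assumed definitions."""
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]  # assumption: predicted - actual
    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    sde = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)
    # assumption: APE as a fraction; undefined when an actual value is zero
    ape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"RMSE": rmse, "MAE": mae, "BIAS": bias, "SDE": sde, "APE": ape}


metrics = forecast_metrics(actual=[4.0, 5.0, 6.0, 5.0], predicted=[5.0, 5.0, 5.0, 7.0])
print(metrics["MAE"], metrics["BIAS"])  # 1.0 0.5
```

Under these definitions the identity RMSE² = BIAS² + SDE² holds, which gives a quick consistency check on reported numbers.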
4.2. Data Collection
The 15-min, 1-h, 4-h, 8-h, and 24-h wind speed series collected over six months from three different wind farms in the northern part of China are used to analyze the performance of the forecasting model. The original wind speed series consists of average observations in the past five minutes. For the 15-min horizon, the experimental series are obtained by taking every 1 data item. For the 1-h, 4-h, 8-h, and 24-h horizons, the training and testing sets are collected by taking every 3, 15, 31, and 95 data items, respectively. The sizes of the multiperiod training samples and testing samples are listed in Table 1.
Figure 6 shows the trends of different training samples. However, noise components inevitably exist in the raw wind speed series. In order to reduce the negative effects caused by the noise data and depict the characteristics of randomness and intermittency of wind speed, the analysis of variance has been effectively employed. Consequently, SDAE can be built to further perform features learning from the training samples.
4.3. Experimental Setting
In the three experiments, to capture the characteristics of the wind speed series, we use the historical values to forecast the future wind speed. To further improve the fitting ability of the proposed method, k-fold cross validation partitions the training set into $k$ different subsets. Then $k-1$ of these subsets are selected to train the model and the remaining one is used for testing the performance. The root mean square error is taken as the criterion to select the optimal model.
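The k-fold partitioning can be sketched as follows. Contiguous, unshuffled folds are assumed here because the samples come from a time series; the paper does not say how the folds were drawn, and the function name is illustrative.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
print([len(val) for _, val in folds])  # [4, 3, 3]
```

Each candidate model would be trained on `train`, scored by RMSE on `validation`, and the configuration with the lowest average RMSE kept.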
In the proposed model, the analysis of variance classifies the training samples into $r$ categories and the size of the ensemble model is set as $n$. The number of neurons in each SDAE is 10-8-6-4. As a result, for the SDAE with 10 inputs, there are 80, 48, and 24 connecting weights in the successive layers, respectively. The ELM used as the base learner in the ensemble model has 8 hidden neurons. In total, there are $152r + 33nr$ nonlinear parameters trained by the proposed learning algorithm.
To further demonstrate the improvements derived from the analysis of variance and ensemble learning, the single SDAE-ELM model, which has the same architecture as the base learner in the proposed model, is selected as one of the benchmark models. Thus, there are 185 weights to be trained when the single SDAE-ELM model is used to predict the wind speed series.
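The parameter counts quoted above can be checked with a few lines of arithmetic. The layer sizes and the figure of 33 parameters per ELM are taken from the text; the function name `total_parameters` is illustrative.

```python
layers = [10, 8, 6, 4]  # SDAE layer sizes from the text
sdae_weights = [a * b for a, b in zip(layers, layers[1:])]
sdae_total = sum(sdae_weights)
elm_params = 33  # per-ELM count as stated in the text
single_sdae_elm = sdae_total + elm_params


def total_parameters(r, n):
    """Total for r variance categories, each with one SDAE and n ELMs."""
    return r * (sdae_total + elm_params * n)


print(sdae_weights, sdae_total, single_sdae_elm)  # [80, 48, 24] 152 185
```

Setting `r = 1` and `n = 1` recovers the single SDAE-ELM benchmark, so the two counts in the text are consistent with each other.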
The ELMAN neural network, a recurrent neural network, can memorize the dependencies in the wind speed series by feeding the outputs of the hidden layer back to the hidden layer. Therefore, the ELMAN neural network, which is composed of four layers, namely, the input layer, hidden layer, recurrent layer, and output layer, is suitable for modeling and predicting the fluctuation of wind speed series. In this paper, 14 neural nodes are used in the hidden layer, with the sigmoid function as the activation function. The prediction procedure of the ELMAN neural network can be expressed as
$$h(t) = f\big(W_1 x(t) + W_2 h(t-1)\big), \qquad y(t) = g\big(W_3 h(t)\big),$$
where $h(t)$ denotes the output of the hidden layer; $x(t)$ is the input time series at time $t$; $f$ and $g$ are the activation functions of the hidden and output layers; and $W_1$, $W_2$, and $W_3$ are the connecting weights between the input layer and hidden layer, between the recurrent layer and hidden layer, and between the hidden layer and output layer, respectively.
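The recurrence of the ELMAN benchmark can be sketched in NumPy as below. A linear output layer and the absence of bias terms are assumptions, since the text names only three weight matrices; the 14 hidden nodes and the sigmoid activation follow the description, while the input size of 10 and the random weights are purely illustrative.

```python
import numpy as np


def elman_forward(x_seq, W_in, W_rec, W_out):
    """Run an Elman network over a sequence and return one output per time step."""
    h = np.zeros(W_rec.shape[0])  # context (recurrent) layer starts at zero
    outputs = []
    for x_t in x_seq:
        # The hidden state depends on the current input and the previous hidden state.
        h = 1.0 / (1.0 + np.exp(-(W_in @ x_t + W_rec @ h)))
        outputs.append(W_out @ h)  # assumption: linear output layer
    return np.array(outputs)


rng = np.random.default_rng(3)
n_inputs, n_hidden = 10, 14
W_in = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
W_out = rng.normal(0.0, 0.1, (1, n_hidden))
y_seq = elman_forward(rng.random((6, n_inputs)), W_in, W_rec, W_out)
print(y_seq.shape)  # (6, 1)
```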
The ANFIS method, which combines the advantages of the fuzzy inference system and the neural network, is a powerful tool for modeling time series. In this paper, we assign 5 fuzzy sets to each input. Hence, the number of linear parameters to be trained during the learning process depends on $n$, the length of the input, which is set to 10 in this paper.
4.4. The Analysis of Variance for the Training Samples
The analysis of variance, as a strong nonparametric tool, is conducted to extract the components with similar volatility level from the raw wind speed series. Figure 7 illustrates the variance of the wind speed series in ascending order for 15-min, 1-h, 4-h, 8-h, and 24-h horizons.
To further improve forecasting accuracy, we map the similar components into one group. An SDAE architecture and an ELM-based ensemble learner are built in each group. Table 2 shows some of the intervals of the variance and the size of the corresponding group.
4.5. Multiperiod-Ahead Wind Speed Forecasting
To evaluate the feasibility and accuracy of the proposed model, several simulations are conducted on the 15-min, 1-h, 4-h, 8-h, and 24-h horizons. Figures 8–12 exhibit the predicted wind speed series and the corresponding actual values from the proposed model and the classical WSF models, including the single SDAE-ELM, ELMAN, and ANFIS models.
The predicted series generated by the different models for multiperiod-ahead wind speed forecasting, shown in Figures 8–12, illustrate the accuracy and stability of the proposed model: its predictive performance is the highest at each time scale. In other words, the proposed model has the lowest prediction error compared with the three classical WSF models. In addition, the classical models perform worse in the long-horizon forecasts, since the fluctuation of the long-horizon series is very high.
Here, the aforementioned five statistical metrics are utilized to demonstrate the better approximation capacity of the proposed model. Table 3 records the performance criteria for the proposed model, the single SDAE-ELM, ELMAN, and ANFIS for the different horizons. As shown in Table 3, the proposed model achieves higher accuracy than the other classical models. Compared with the single SDAE-ELM, which has the same architecture as the base learner in the proposed model, at the 1-h forecasting horizon the proposed model reduces the RMSE from 1.59 to 1.38, the MAE from 1.26 to 0.98, and the APE from 0.87 to 0.68, and improves the BIAS from -1.12 to -0.67. The same holds for the comparison with ELMAN and ANFIS. The 4-h forecasting results are also recorded in Table 3: the proposed model yields improvements of 0.13, 0.35, 0.72, 0.13, and 0.39 in RMSE, MAE, BIAS, SDE, and APE, respectively, over the single SDAE-ELM. Besides, the improvement of the proposed model in the performance criteria compared with the classical models on 15-min, 1-h, 4-h, 8-h, and 24-h WSF is shown in Tables 4–8. The performance of all forecasting models degrades as the horizon is prolonged. However, at the 24-h horizon, the proposed model still generates values closer to the real wind speed than the benchmark models, which demonstrates that the proposed model is more efficient and precise for multiperiod-ahead wind speed forecasting. Finally, in order to further determine the optimal forecasting model, the residuals of the above-mentioned models at the different horizons are given in Figure 13.
As exhibited in Figure 13, most residuals from the proposed model are smaller than those of the benchmark models, with very few exceptions. Overall, the residuals of the proposed model nearly cancel out in short-term WSF. For the long-term horizons, the proposed model still shows an evident fitting capacity. Accordingly, the proposed approach performs much better in terms of forecasting accuracy.
This paper proposes a hybrid architecture employing the analysis of variance, stacked denoising autoencoders, and ELM-based ensemble predictors to conduct multiperiod-ahead wind speed forecasting. This model has been evaluated on three subsets of data collected from a wind farm. Experimental studies reveal that the proposed model has the best approximation compared with the other alternatives, including the single SDAE-ELM, ELMAN, and ANFIS models, in terms of RMSE, MAE, BIAS, SDE, and APE for the different horizons. The superiority of the proposed model is attributed to the analysis of variance, the deep learning architecture, and the ensemble technique. The first categorizes the training and testing samples by similar volatility level to reduce the probability of noise data in the subsequent procedure. The second effectively performs unsupervised learning to extract the stochastic features in each wind speed category. Finally, benefiting from its robustness, accuracy, and stability, the ELM-based ensemble predictor cancels out the errors of the base learners. In addition, the application of the proposed model in other fields deserves further investigation.
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflicts of interest.
This work is supported by the National Natural Science Foundation of China: Robust Distributed Model Predictive Control for Load Frequency of Interconnected Power Systems under Wind Power Intervention under Grant 61803154, and the Hebei Province Science and Technology Plan Project: Research on Energy Storage Control of Wind-solar Complementary Generation Grid-connected System under Grant 15214511.
G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, “Extreme learning machine for regression and multiclass classification,” IEEE Transactions on Systems, Man, and Cybernetics Part B-Cybern, vol. 42, no. 2, pp. 513–529, 2012.