Abstract

With the development of wind power technology, the security, power quality, and stable operation of the power system face new challenges. In this paper, we therefore apply a recently developed machine learning technique, the relevance vector machine (RVM), to day-ahead wind speed forecasting. We combine a Gaussian kernel function and a polynomial kernel function to obtain a mixed kernel for the RVM. The RVM is then compared with a back propagation neural network (BP) and a support vector machine (SVM) for wind speed forecasting in four seasons in terms of accuracy and speed; the forecast results demonstrate that the proposed method is reasonable and effective.

1. Introduction

Over the past decade, many countries have paid significant attention to wind power generation because it is pollution-free, clean, and renewable. At the end of 2012, worldwide installed wind power capacity reached 282.2 GW, an increase of almost 20% over the 240 GW installed at the end of 2011. By the end of 2020, total global installed capacity is projected to reach 1150 GW, with wind power generation exceeding 2800 TWh and accounting for about 12% of global electricity demand; by the end of 2030, installed capacity is projected to exceed 2500 GW, with generation reaching 6600 TWh, about 23% of global electricity demand [1]. The introduction of wind power on such a large scale has led many domestic and foreign scholars to study wind power technology further. Wind forecasting, a basic link in wind power research, is one of the effective ways to address the resulting challenges and plays an important role in the safe and economic operation of the power grid, so a growing number of researchers have recently paid attention to it.

Wind forecasting techniques can be clustered into two main groups. The first group comprises physical methods, which take physical considerations such as temperature and local terrain into account. In [2, 3], numerical weather prediction (NWP) models were used directly for wind speed and wind energy predictions.

The second group comprises statistical methods. Conventional ones are based on random time-series models, such as the autoregressive model (AR), the moving average model (MA), the autoregressive moving average model (ARMA), and the autoregressive integrated moving average model (ARIMA). Kamal and Jafri [4] established an ARMA model and found that, for both long-term and short-term predictions, the values of variances and wind speed were acceptable within a 95% confidence interval. A fractional-ARIMA (f-ARIMA) model was used by Kavasseri and Seetharaman [5] for day-ahead and two-day-ahead wind speed forecasting. Their results showed that forecast accuracy was significantly improved with the f-ARIMA model compared to the previous method.

Apart from the forecasting techniques mentioned above, machine learning algorithms such as the artificial neural network (ANN), the Bayesian network (BN), and the support vector machine (SVM) are often adopted for time series-based wind prediction. Bilgili et al. [6] investigated a model based on the ANN method and spatial correlation for monthly wind speed prediction without any topographic details or other meteorological data. The prediction results showed that the maximum MAE of the developed model was 14.13%, while the minimum was 4.49%. Welch [7] compared three types of neural networks (multilayer perceptron neural network, simultaneous recurrent neural network, and Elman recurrent neural network) for short-term prediction of wind speed, with all networks trained using particle swarm optimization (PSO). Mohandes et al. [8] established an SVM model to predict wind speed and compared it with a multilayer perceptron (MLP) ANN model. The results showed that the SVM model gave a lower root mean square error than the MLP ANN model. Larson and Westrick [9] used a support vector classifier to estimate the forecasting error, obtaining lower mean square error and mean absolute percentage error than the traditional SVM.

However, the SVM has a number of significant practical limitations. For example, it does not provide probabilistic predictions, and its kernel function must satisfy Mercer’s condition. To overcome these limitations, Tipping [10] and Tipping and Faul [11] proposed an advanced function estimation technique, the relevance vector machine (RVM). RVM utilizes a more flexible and sparser function without extra regularization parameters; it is an inherent machine learning technique [12].

This paper establishes an RVM model to forecast day-ahead wind speed in Jiangsu and compares it with BP and SVM models. The results show that the proposed method is more effective and robust and avoids the overfitting problem of traditional nonparametric regression models. The rest of the paper is organized as follows. A brief review of the theory of RVM learning for regression is provided in Section 2. Then, in Section 3, several forecasting examples are given to contrast the RVM model with the other models. Finally, Section 4 gives the conclusions.

2. Classical Relevance Vector Machine

The relevance vector machine (RVM), based on a fully Bayesian framework, is a sparse probabilistic model and is currently an active research area [13]. The principle is as follows.

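Given a training dataset $\{\mathbf{x}_i, t_i\}_{i=1}^{N}$, where $\mathbf{x}_i$ are the $d$-dimensional input vectors and $t_i$ are the corresponding outputs, we assume that the targets are independent and contaminated with zero-mean random noise $\varepsilon_i$, where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ and $\mathcal{N}(0, \sigma^2)$ is the normal distribution with mean $0$ and variance $\sigma^2$. The output can be defined as

$$t_i = y(\mathbf{x}_i; \mathbf{w}) + \varepsilon_i, \tag{1}$$

where $\mathbf{w} = (w_0, w_1, \ldots, w_N)^{T}$ are the weights and $y(\mathbf{x}; \mathbf{w})$ is a linearly weighted sum of basis functions, which are generally fixed and nonlinear.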

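Consider

$$y(\mathbf{x}; \mathbf{w}) = \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i) + w_0 = \boldsymbol{\phi}(\mathbf{x})^{T} \mathbf{w}, \tag{2}$$

where

$$\boldsymbol{\phi}(\mathbf{x}) = \left[1, K(\mathbf{x}, \mathbf{x}_1), K(\mathbf{x}, \mathbf{x}_2), \ldots, K(\mathbf{x}, \mathbf{x}_N)\right]^{T} \tag{3}$$

and $K(\cdot, \cdot)$ is a kernel function.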
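As an illustration of the basis expansion above, the following minimal Python sketch stacks the vectors $\boldsymbol{\phi}(\mathbf{x}_i)^{T}$ into an $N \times (N+1)$ matrix. The function names `gaussian_kernel` and `design_matrix`, the `width` parameter, and the toy inputs are assumptions made for this example, not part of the original study.

```python
import numpy as np

def gaussian_kernel(x, z, width=1.0):
    """Gaussian kernel; `width` is an assumed bandwidth parameter."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / width**2))

def design_matrix(X, kernel):
    """Row i is [1, K(x_i, x_1), ..., K(x_i, x_N)], i.e. phi(x_i) transposed."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Phi = np.ones((n, n + 1))          # first column is the bias term w_0
    for i in range(n):
        for j in range(n):
            Phi[i, j + 1] = kernel(X[i], X[j])
    return Phi

# Toy inputs (hypothetical values, one feature per sample).
X_demo = np.array([[0.0], [1.0], [2.0]])
Phi_demo = design_matrix(X_demo, gaussian_kernel)
print(Phi_demo.shape)
```

The model output for all training inputs is then the matrix-vector product of this matrix with the weight vector.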

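Therefore, the probability formula for the relevance vector machine model is

$$p(t_i \mid \mathbf{x}_i) = \mathcal{N}\left(t_i \mid y(\mathbf{x}_i; \mathbf{w}), \sigma^2\right), \tag{4}$$

where $\mathcal{N}\left(t_i \mid y(\mathbf{x}_i; \mathbf{w}), \sigma^2\right)$ is the normal distribution with mean $y(\mathbf{x}_i; \mathbf{w})$ and variance $\sigma^2$.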

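Because we assume that the targets are independent, the likelihood of the complete dataset can be defined as

$$p(\mathbf{t} \mid \mathbf{w}, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{ -\frac{1}{2\sigma^2} \left\| \mathbf{t} - \boldsymbol{\Phi}\mathbf{w} \right\|^2 \right\}, \tag{5}$$

where $\mathbf{t} = (t_1, \ldots, t_N)^{T}$ and $\boldsymbol{\Phi} = [\boldsymbol{\phi}(\mathbf{x}_1), \ldots, \boldsymbol{\phi}(\mathbf{x}_N)]^{T}$ is the $N \times (N+1)$ design matrix.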

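Because the model has as many parameters as training examples, maximum likelihood estimation of $\mathbf{w}$ and $\sigma^2$ would easily lead to overfitting. To avoid this, sparse Bayesian principles are used to place a zero-mean Gaussian prior distribution over the weights:

$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}\left(w_i \mid 0, \alpha_i^{-1}\right), \tag{6}$$

where $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \ldots, \alpha_N)^{T}$ is an $(N+1)$-dimensional vector and each weight $w_i$ corresponds to its own hyperparameter $\alpha_i$, which controls how far from zero that weight is allowed to deviate [13]. Having defined the prior, the posterior distribution over all unknown parameters follows from Bayes’ rule:

$$p(\mathbf{w}, \boldsymbol{\alpha}, \sigma^2 \mid \mathbf{t}) = \frac{p(\mathbf{t} \mid \mathbf{w}, \boldsymbol{\alpha}, \sigma^2)\, p(\mathbf{w}, \boldsymbol{\alpha}, \sigma^2)}{p(\mathbf{t})}. \tag{7}$$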

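However, we cannot easily obtain a full analytical solution to (7), because its normalizing integral cannot be evaluated in closed form; thus, we decompose the posterior according to

$$p(\mathbf{w}, \boldsymbol{\alpha}, \sigma^2 \mid \mathbf{t}) = p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2)\, p(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{t}).$$

Then, the posterior distribution over the weights can be written as

$$p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2) = \frac{p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})}{p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2)}. \tag{8}$$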

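With the prior distribution and the likelihood defined, and because both are Gaussian, Bayes’ theorem gives the posterior distribution over the weights in closed form:

$$p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2) = (2\pi)^{-(N+1)/2} \left|\boldsymbol{\Sigma}\right|^{-1/2} \exp\left\{ -\frac{1}{2} (\mathbf{w} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{w} - \boldsymbol{\mu}) \right\}. \tag{9}$$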

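Then the covariance and the mean of the posterior over $\mathbf{w}$ are expressed as

$$\boldsymbol{\Sigma} = \left( \sigma^{-2} \boldsymbol{\Phi}^{T} \boldsymbol{\Phi} + \mathbf{A} \right)^{-1}, \tag{10}$$

$$\boldsymbol{\mu} = \sigma^{-2} \boldsymbol{\Sigma} \boldsymbol{\Phi}^{T} \mathbf{t}, \tag{11}$$

where $\mathbf{A} = \operatorname{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$.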
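The posterior statistics can be sketched in a few lines of NumPy, assuming the standard sparse Bayesian forms $\boldsymbol{\Sigma} = (\sigma^{-2}\boldsymbol{\Phi}^{T}\boldsymbol{\Phi} + \mathbf{A})^{-1}$ and $\boldsymbol{\mu} = \sigma^{-2}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{T}\mathbf{t}$ with $\mathbf{A} = \operatorname{diag}(\boldsymbol{\alpha})$. The function name `weight_posterior` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def weight_posterior(Phi, t, alpha, sigma2):
    """Posterior mean and covariance of the weights for fixed alpha and sigma2."""
    A = np.diag(alpha)                                  # prior precisions
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + A)     # posterior covariance
    mu = Sigma @ Phi.T @ t / sigma2                     # posterior mean
    return mu, Sigma

# Synthetic regression problem (hypothetical values, not wind data).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 4))
w_true = np.array([1.0, -2.0, 0.0, 0.5])
t = Phi @ w_true + 0.01 * rng.normal(size=20)

# With a nearly flat prior (tiny alpha) the mean approaches least squares.
mu, Sigma = weight_posterior(Phi, t, alpha=np.full(4, 1e-6), sigma2=1e-4)
print(mu)
```

With a very small $\alpha_i$ the prior is almost flat and the posterior mean approaches the ordinary least-squares solution, which is a convenient sanity check.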

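To calculate $\boldsymbol{\alpha}$ and $\sigma^2$ by maximum likelihood, we need only maximize the term $p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2)$:

$$p(\mathbf{t} \mid \boldsymbol{\alpha}, \sigma^2) = \int p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \boldsymbol{\alpha})\, d\mathbf{w} = (2\pi)^{-N/2} \left|\mathbf{C}\right|^{-1/2} \exp\left\{ -\frac{1}{2} \mathbf{t}^{T} \mathbf{C}^{-1} \mathbf{t} \right\}, \tag{12}$$

where $\mathbf{C} = \sigma^2 \mathbf{I} + \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^{T}$ and $\mathbf{A} = \operatorname{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$.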
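For numerical work it is usually more convenient to evaluate the logarithm of this quantity. The sketch below assumes the standard form with $\mathbf{C} = \sigma^2\mathbf{I} + \boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}^{T}$; the function name `log_marginal_likelihood` and the toy values are hypothetical.

```python
import numpy as np

def log_marginal_likelihood(Phi, t, alpha, sigma2):
    """log p(t | alpha, sigma2) for a zero-mean Gaussian with covariance C."""
    n = len(t)
    C = sigma2 * np.eye(n) + Phi @ np.diag(1.0 / alpha) @ Phi.T
    _, logdet = np.linalg.slogdet(C)        # stable log-determinant
    quad = t @ np.linalg.solve(C, t)        # t^T C^{-1} t without explicit inverse
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Degenerate check: with Phi = 0 the targets are pure noise, so C = sigma2 * I.
Phi0 = np.zeros((3, 2))
t0 = np.array([1.0, 2.0, 3.0])
value = log_marginal_likelihood(Phi0, t0, alpha=np.ones(2), sigma2=2.0)
print(value)
```

Using `slogdet` and `solve` instead of an explicit determinant and inverse is a design choice for numerical stability, not something prescribed by the method.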

In related Bayesian models, this quantity is referred to as the “marginal likelihood” [14, 15] or the “evidence for the hyperparameters” [16]; its maximization is known as the “type II maximum likelihood method” or the “evidence procedure.”

Because the values of $\boldsymbol{\alpha}$ and $\sigma^2$ that maximize (12) cannot be obtained in closed form, an iterative reestimation method is needed. Differentiating (12) with respect to $\alpha_i$ and setting the derivative equal to 0 gives [17]

$$\alpha_i^{\text{new}} = \frac{\gamma_i}{\mu_i^2}, \tag{13}$$

where $\mu_i$ is the $i$th posterior mean weight from (11), and the quantity $\gamma_i$ is defined as

$$\gamma_i = 1 - \alpha_i \Sigma_{ii}, \tag{14}$$

where $\Sigma_{ii}$ is the $i$th diagonal element of the posterior weight covariance (10).

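When $\alpha_i$ is large, $w_i$ is highly constrained by the prior, so that $\Sigma_{ii} \approx \alpha_i^{-1}$ and $\gamma_i \approx 0$; conversely, when $\alpha_i$ is small, $\gamma_i \approx 1$. Thus $\gamma_i \in [0, 1]$ can be interpreted as a measure of how well the corresponding parameter $w_i$ is determined by the data.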

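Meanwhile, for the noise variance $\sigma^2$, differentiating (12) and setting the derivative equal to 0 gives [16]

$$(\sigma^2)^{\text{new}} = \frac{\left\| \mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu} \right\|^2}{N - \sum_{i} \gamma_i}, \tag{15}$$

where $N$ is the number of data examples.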
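One reestimation step can be sketched as follows, assuming the standard updates $\alpha_i^{\text{new}} = \gamma_i / \mu_i^2$ with $\gamma_i = 1 - \alpha_i \Sigma_{ii}$ and $(\sigma^2)^{\text{new}} = \|\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu}\|^2 / (N - \sum_i \gamma_i)$. The helper names and the synthetic data are assumptions for this example.

```python
import numpy as np

def weight_posterior(Phi, t, alpha, sigma2):
    """Posterior mean and covariance of the weights."""
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(alpha))
    mu = Sigma @ Phi.T @ t / sigma2
    return mu, Sigma

def reestimate(Phi, t, alpha, mu, Sigma):
    """One update of the hyperparameters alpha and the noise variance."""
    gamma = 1.0 - alpha * np.diag(Sigma)     # how well each weight is determined
    alpha_new = gamma / mu**2                # assumes no posterior mean is exactly 0
    resid = t - Phi @ mu
    sigma2_new = float(resid @ resid) / (len(t) - gamma.sum())
    return alpha_new, sigma2_new, gamma

# Synthetic data (hypothetical): the second weight is truly zero.
rng = np.random.default_rng(1)
Phi = rng.normal(size=(30, 3))
t = Phi @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=30)

alpha = np.ones(3)
mu, Sigma = weight_posterior(Phi, t, alpha, sigma2=0.01)
alpha_new, sigma2_new, gamma = reestimate(Phi, t, alpha, mu, Sigma)
print(gamma, alpha_new, sigma2_new)
```

In this toy setting the first weight is well determined by the data, so its $\gamma_i$ is close to 1 and its updated $\alpha_i$ is close to $1/\mu_i^2$.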

In practice, because many of the $\alpha_i$ are found to tend to infinity during reestimation, the posterior distribution (9) of the corresponding weights becomes highly peaked at zero. In this optimization process, those weights $w_i$ become zero, and the vectors from the training set that are associated with the remaining nonzero weights are called relevance vectors.

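At convergence of the hyperparameter estimation procedure, we use the posterior distribution over the weights for predictions, conditioned on the maximizing values $\boldsymbol{\alpha}_{MP}$ and $\sigma^2_{MP}$, which are the optimum values obtained through iteration. Then the predictive distribution for a new target $t_*$ can be written as

$$p(t_* \mid \mathbf{t}, \boldsymbol{\alpha}_{MP}, \sigma^2_{MP}) = \int p(t_* \mid \mathbf{w}, \sigma^2_{MP})\, p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}_{MP}, \sigma^2_{MP})\, d\mathbf{w}. \tag{16}$$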

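Because both terms in the integrand are Gaussian, we can easily compute the predictive distribution:

$$p(t_* \mid \mathbf{t}, \boldsymbol{\alpha}_{MP}, \sigma^2_{MP}) = \mathcal{N}\left(t_* \mid y_*, \sigma_*^2\right), \tag{17}$$

where the new input is $\mathbf{x}_*$ and

$$y_* = \boldsymbol{\mu}^{T} \boldsymbol{\phi}(\mathbf{x}_*), \tag{18}$$

$$\sigma_*^2 = \sigma^2_{MP} + \boldsymbol{\phi}(\mathbf{x}_*)^{T} \boldsymbol{\Sigma}\, \boldsymbol{\phi}(\mathbf{x}_*). \tag{19}$$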

Finally, we obtain the regression model of the RVM in (18) and its predictive variance in (19).
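The predictive mean and variance can be evaluated directly once the posterior statistics are known. The sketch below assumes the standard forms $y_* = \boldsymbol{\mu}^{T}\boldsymbol{\phi}(\mathbf{x}_*)$ and $\sigma_*^2 = \sigma^2_{MP} + \boldsymbol{\phi}(\mathbf{x}_*)^{T}\boldsymbol{\Sigma}\boldsymbol{\phi}(\mathbf{x}_*)$; the function name `predict` and the numbers are hypothetical.

```python
import numpy as np

def predict(phi_star, mu, Sigma, sigma2):
    """Predictive mean and variance for one new basis vector phi(x*)."""
    mean = float(phi_star @ mu)
    var = float(sigma2 + phi_star @ Sigma @ phi_star)   # noise + weight uncertainty
    return mean, var

# Hand-checkable toy values (not taken from the study).
mu = np.array([1.0, 2.0])
Sigma = np.diag([0.1, 0.2])
phi_star = np.array([1.0, 3.0])
mean, var = predict(phi_star, mu, Sigma, sigma2=0.5)
print(mean, var)   # mean = 1*1 + 2*3 = 7, var = 0.5 + 0.1*1 + 0.2*9 = 2.4
```

The variance has two parts: the estimated noise level and the uncertainty in the weights, which is what gives the RVM its probabilistic predictions.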

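In summary, the forecast process consists of the following steps:
(1) initialize the noise variance $\sigma^2$ and the hyperparameters $\boldsymbol{\alpha}$;
(2) compute the posterior statistics of the weights, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$;
(3) compute $\gamma_i$ and update $\boldsymbol{\alpha}$ and $\sigma^2$;
(4) if the procedure has not converged, go back to step (2); otherwise, go to step (5);
(5) if $\alpha_i$ tends to infinity, delete the corresponding weight $w_i$;
(6) obtain the predictive mean from $y_* = \boldsymbol{\mu}^{T} \boldsymbol{\phi}(\mathbf{x}_*)$.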
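The steps above can be sketched as a single training loop. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `fit_rvm`, the initial values, the pruning threshold `alpha_max`, the tolerance `tol`, and the synthetic data are all hypothetical, and weights are pruned during the iterations for simplicity rather than only after convergence.

```python
import numpy as np

def fit_rvm(Phi, t, n_iter=200, alpha_max=1e9, tol=1e-3):
    """Iterative sparse Bayesian regression; returns kept columns and posterior."""
    n, m = Phi.shape
    alpha = np.ones(m)                       # step (1): initial hyperparameters
    sigma2 = 0.1 * float(np.var(t)) + 1e-12  # step (1): assumed initial noise variance
    keep = np.arange(m)                      # indices of weights still in the model
    for _ in range(n_iter):
        P, a = Phi[:, keep], alpha[keep]
        # step (2): posterior statistics of the remaining weights
        Sigma = np.linalg.inv(P.T @ P / sigma2 + np.diag(a))
        mu = Sigma @ P.T @ t / sigma2
        # step (3): reestimate alpha and sigma2
        gamma = 1.0 - a * np.diag(Sigma)
        a_new = np.maximum(gamma / np.maximum(mu**2, 1e-300), 1e-12)
        resid = t - P @ mu
        sigma2 = max(float(resid @ resid) / max(n - gamma.sum(), 1e-12), 1e-12)
        # step (4): convergence test on the relative change of alpha
        converged = np.max(np.abs(a_new - a) / a) < tol
        alpha[keep] = a_new
        # step (5): delete weights whose alpha has grown very large
        keep = keep[a_new < alpha_max]
        if converged or keep.size == 0:
            break
    # final posterior over the surviving (relevance) weights
    P = Phi[:, keep]
    Sigma = np.linalg.inv(P.T @ P / sigma2 + np.diag(alpha[keep]))
    mu = Sigma @ P.T @ t / sigma2
    return keep, mu, Sigma, sigma2

# Synthetic example: only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
Phi_demo = rng.normal(size=(50, 10))
w_true = np.zeros(10)
w_true[1], w_true[4] = 2.0, -3.0
t_demo = Phi_demo @ w_true + 0.05 * rng.normal(size=50)
keep, mu, Sigma, sigma2 = fit_rvm(Phi_demo, t_demo)
print(keep, mu)
```

Step (6) then amounts to multiplying the basis vector of a new input, restricted to the kept columns, by the returned `mu`.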

3. Illustrative Examples

3.1. Data and Pretreatment

In this study, wind speed values recorded throughout 2008 at a wind farm in Jiangsu are taken as the training samples; all data have a sampling interval of 15 min. To evaluate the performance of the proposed model, we establish an RVM model for 96-point wind speed prediction (one day at 15 min resolution) and compare it with SVM and BP in terms of forecast accuracy, model running time, and model complexity. Based on the historical data in each quarter, we establish forecasting models to predict the day-ahead wind speed on March 25, June 26, September 29, and December 28.
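The 96-point horizon follows from the 15 min sampling interval: 24 h × 60 min / 15 min = 96 samples per day. The short sketch below shows this arithmetic and one possible way to arrange a yearly series into daily rows; the helper `split_into_days` and the zero-filled placeholder array are assumptions for illustration and do not reproduce the measured data or the input features used in the study.

```python
import numpy as np

MINUTES_PER_DAY = 24 * 60
STEP_MINUTES = 15
POINTS_PER_DAY = MINUTES_PER_DAY // STEP_MINUTES   # 96 samples per day

def split_into_days(series):
    """Reshape a 1-D series into rows of one day each, dropping any partial day."""
    series = np.asarray(series, dtype=float)
    n_days = series.size // POINTS_PER_DAY
    return series[: n_days * POINTS_PER_DAY].reshape(n_days, POINTS_PER_DAY)

# Placeholder for one year of samples (2008 was a leap year: 366 days).
year_2008 = np.zeros(366 * POINTS_PER_DAY)
days = split_into_days(year_2008)
print(POINTS_PER_DAY, days.shape)
```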

Because the input vectors contain different kinds of physical quantities, all input data are first normalized, both to make the variables comparable and to avoid problems such as increased training time. Each variable is normalized to $[0, 1]$ by

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{20}$$

where $x_{\max}$ and $x_{\min}$ are, respectively, the maximum value and the minimum value of each variable.
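A minimal sketch of this preprocessing step, assuming column-wise min-max scaling to $[0, 1]$; the function name `min_max_normalize` and the sample values are hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column of X to [0, 1]; also return the per-column min and max."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span, x_min, x_max

# Two variables on different scales (made-up values).
X_raw = np.array([[2.0, 10.0],
                  [4.0, 20.0],
                  [6.0, 40.0]])
X_norm, x_min, x_max = min_max_normalize(X_raw)
print(X_norm)
```

The returned minimum and maximum should be stored so that forecasts can be mapped back to physical units and so that test data are scaled with the training statistics.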

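To evaluate the forecasting performance, the mean relative error (MRE) and the root mean square error (RMSE) are used; they are defined as

$$\text{MRE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| v_i - \hat{v}_i \right|}{v_i} \times 100\%, \tag{21}$$

$$\text{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( v_i - \hat{v}_i \right)^2 }, \tag{22}$$

where $v_i$ and $\hat{v}_i$ indicate the actual and forecast values of wind speed at time $i$, and $N$ is the number of forecast points.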
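Both error measures are straightforward to compute. The sketch below assumes the usual definitions (MRE as the mean of absolute errors relative to the actual values, returned as a fraction rather than a percentage); the function names and sample numbers are hypothetical.

```python
import numpy as np

def mre(actual, forecast):
    """Mean relative error as a fraction; multiply by 100 for a percentage."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast) / np.abs(actual)))

def rmse(actual, forecast):
    """Root mean square error, in the same unit as the inputs (e.g. m/s)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

# Made-up wind speeds in m/s.
v_actual = [4.0, 5.0, 10.0]
v_forecast = [5.0, 5.0, 8.0]
print(mre(v_actual, v_forecast))    # (0.25 + 0 + 0.2) / 3 = 0.15
print(rmse(v_actual, v_forecast))   # sqrt((1 + 0 + 4) / 3)
```

Note that the relative error is undefined when an actual wind speed is zero, so calm periods would need separate handling.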

3.2. Select Kernel Function of RVM

Because the kernel function of the RVM does not need to satisfy Mercer’s condition, there is a certain degree of freedom in selecting it. The basic idea of the hybrid kernel function [17] is that several kernel functions of different natures can be combined so that their properties are integrated and reflected in the new kernel.

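In this paper, we combine a Gaussian kernel function, which is local in nature, and a polynomial kernel function, which is global in nature, to obtain a mixed kernel with both properties:

$$K(\mathbf{x}, \mathbf{x}_i) = \lambda K_{G}(\mathbf{x}, \mathbf{x}_i) + (1 - \lambda) K_{P}(\mathbf{x}, \mathbf{x}_i), \tag{23}$$

where $K_{G}$ is the RBF kernel function; $K_{P}$ is the binomial (second-degree polynomial) kernel function; $\lambda$ is the weight of the kernel function; and $\delta$ is the width of the RBF kernel. The grid search method is employed to obtain the optimal values of $\lambda$ and $\delta$.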
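The following sketch illustrates a weighted combination of a Gaussian and a second-degree polynomial kernel together with a simple grid search over the weight and the width. Several details are assumptions made for this example: the explicit kernel forms $\exp(-\|\mathbf{x}-\mathbf{z}\|^2 / (2\,\text{width}^2))$ and $(\mathbf{x}^{T}\mathbf{z} + 1)^2$, the function names, the grids, and the synthetic data. For brevity the candidates are scored with a plain kernel ridge regressor as a stand-in, not with the RVM itself.

```python
import itertools
import numpy as np

def mixed_kernel(X, Z, lam, width, degree=2):
    """lam * Gaussian kernel + (1 - lam) * polynomial kernel, for 2-D arrays X, Z."""
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    k_rbf = np.exp(-sq_dist / (2.0 * width**2))   # local behaviour
    k_poly = (X @ Z.T + 1.0) ** degree            # global behaviour
    return lam * k_rbf + (1.0 - lam) * k_poly

def grid_search(X_tr, t_tr, X_va, t_va, lams, widths, ridge=1e-3):
    """Return (validation RMSE, lam, width) of the best grid point."""
    best = None
    for lam, width in itertools.product(lams, widths):
        K = mixed_kernel(X_tr, X_tr, lam, width)
        # Stand-in model: kernel ridge regression (NOT the RVM).
        coef = np.linalg.solve(K + ridge * np.eye(len(t_tr)), t_tr)
        pred = mixed_kernel(X_va, X_tr, lam, width) @ coef
        err = float(np.sqrt(np.mean((pred - t_va) ** 2)))
        if best is None or err < best[0]:
            best = (err, lam, width)
    return best

# Synthetic one-dimensional data (hypothetical).
rng = np.random.default_rng(0)
X_all = rng.uniform(-1.0, 1.0, size=(40, 1))
t_all = np.sin(3.0 * X_all[:, 0])
lams, widths = [0.0, 0.25, 0.5, 0.75, 1.0], [0.1, 0.5, 1.0]
best = grid_search(X_all[:30], t_all[:30], X_all[30:], t_all[30:], lams, widths)
print(best)
```

In practice the grid would be evaluated with the forecasting model itself and a held-out portion of the historical data.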

3.3. Different Forecasting Models

To evaluate the performance of the proposed model, RVM, BP, and SVM models are established for day-ahead wind speed predictions on March 25, June 26, September 29, and December 28. Table 1 compares the forecast accuracy of the three models in each quarter, and Table 2 compares their test time and number of vectors or neurons. The specific forecasting results are presented in Figure 1.

The comparison shows that the forecast accuracy of the proposed method is higher than that of the other models. The average RMSE of the RVM model is only 0.1408 m/s, lower than those of BP and SVM, which are 0.1902 m/s and 0.1546 m/s, respectively. The average MRE of RVM is 1.72%, while those of BP and SVM are 3.69% and 2.08%, respectively. Forecast accuracy differs between seasons. In spring and winter, wind speed changes are relatively small, so higher accuracy can be obtained; in summer and autumn, wind speed at the coast fluctuates strongly, so the accuracy decreases.

Table 2 shows clearly that the RVM model reduces model complexity; the average number of vectors or neurons involved is only 34.75, compared with 335 for BP and 261 for SVM. Although RVM typically utilizes dramatically fewer kernel functions, its generalization performance is comparable to that of SVM. Table 2 also shows that RVM yields a sparser model and converges quickly; its test time is 293.47 s, shorter than those of the BP (938.21 s) and SVM (661.17 s) models.

4. Conclusions

In this paper, RVM is proposed for day-ahead wind speed forecasting. First, we combine a Gaussian kernel function and a polynomial kernel function to obtain a mixed kernel for the RVM model. Then, we compare RVM with BP and SVM for wind speed forecasting in four seasons in terms of accuracy and speed. The simulation results show that the proposed method is more effective and robust and performs better than the BP neural network and the SVM model in terms of forecast accuracy, model running time, and model complexity. The results therefore support the feasibility of RVM for wind speed prediction.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grants 51277052, 51107032, and 61104045.