#### Abstract

Prior knowledge, such as the wind speed probability distribution estimated from historical data and the wind speed fluctuation between the maximal and minimal values in a certain period of time, provides much more information about the wind speed, so it is necessary to incorporate it into wind speed prediction. First, a method of estimating the wind speed probability distribution from historical data is proposed based on Bernoulli’s law of large numbers. Second, in order to describe the wind speed fluctuation between the maximal and minimal values in a certain period of time, the probability distribution estimated by the proposed method is incorporated into the training data and the testing data. Third, a support vector regression model for wind speed prediction is proposed based on standard support vector regression. Finally, experiments predicting the wind speed in a certain wind farm show that the proposed method is feasible and effective and that the model’s running time and prediction errors can meet the needs of wind speed prediction.

#### 1. Introduction

Wind power is a clean, renewable energy source that will play an increasingly important role in the future electricity supply [1]. Unfortunately, due to the stochastic and nonstationary nature of wind, wind power is variable and uncontrollable, which makes it difficult to maintain the balance between the supply and the demand of electricity that the electricity system requires [2]. Wind speed prediction is a key point in the management of wind farms because the wind speed is directly related to the power produced by each of the farm’s turbines. It is therefore usually the basis of wind power forecasts, and increasing the accuracy of wind speed prediction is necessary for the effective use of wind energy.

At present, there are mainly two kinds of wind speed prediction methods: one is based on physical models, and the other is based on historical data. Prediction methods based on physical models often use numerical weather prediction (NWP) data [3, 4]. Methods based on NWP do not focus on the wind speed at a farm’s turbines but on the wind speed of a region, so they must also solve the problem of mapping the wind speed of a region to the wind speed at a certain wind generator. Methods based on historical data predict the wind speed by using correlations among the historical data. In 2008, Louka et al. [5] improved wind speed forecasts for wind power prediction using Kalman filtering. In 2012, Cao et al. [6] presented a comparative analysis of the wind speed prediction accuracy of univariate and multivariate ARIMA models with their recurrent neural network counterparts. In 2013, Woods et al. [7] developed a method to produce synthetic time series of wind power at several locations based on a measured time series of wind speed from a reference site.

In the 1990s, Vapnik et al. [8, 9] proposed support vector machines (SVMs), including support vector classifications (SVCs) and support vector regressions (SVRs). SVMs address statistical learning problems with small samples by solving a convex quadratic optimization problem, and thereby avoid the local minima that neural network algorithms cannot avoid. SVMs use a kernel function to map the data in the original space to a high dimensional feature space and then solve the nonlinear decision problem in that space. Thus, SVMs can successfully overcome the curse of dimensionality and have good generalization ability. However, standard SVMs focus on historical data and cannot incorporate prior knowledge into the learning process, which may cause their generalization ability to decrease. Therefore, in 2009, Guan et al. [10] proposed a modified method that incorporated prior knowledge into cancer classification based on gene expression data to improve accuracy. In 2011, Zhang et al. [11] proposed a fully Bayesian methodology for generalized kernel mixed models, which are extensions of generalized linear mixed models in the feature space induced by a reproducing kernel. In 2012, Liu and Xue [12] focused on designing a new class of kernels to incorporate prior information into the training process of support vector regressions. Currently, SVMs receive extensive attention and are studied by more and more scholars from different points of view [13–22].

In 2011, Zhou et al. [23] presented a systematic study on fine tuning of LS-SVM model parameters for one-step ahead wind speed prediction, and Ortiz-García et al. [24] proposed an improvement to an existing wind speed prediction system using banks of regression support vector machines for a final regression step in the prediction system.

However, for the problem of wind speed prediction in practice, there is much prior knowledge. For example, the wind speed has a certain probability distribution in a season or in a day, and the probability distribution can be estimated from historical wind speed data. As the probability distribution provides much more information about the wind speed, it is necessary to incorporate it into the wind speed prediction. Also, in a wind farm, the output wind speed at a fixed time is the mean value of many values measured during a certain period of time. The larger the difference between the maximal and the minimal measured values in that period, the greater the fluctuation of the wind speed during the period; conversely, the smaller the difference, the smaller the fluctuation. The mean value, however, does not convey this prior knowledge at all. Therefore, in order to decrease the wind speed prediction errors, it is necessary to find a way to incorporate this prior knowledge into the wind speed prediction. Existing methods often use the historical wind speed data directly to predict the wind speed, instead of extracting further information from the data, which makes the prediction errors difficult to decrease. In this paper, in order to decrease the prediction errors of the wind speed at a fixed time, the probability distribution of historical wind speed data is estimated and incorporated into the training data and testing data to provide information about the wind speed fluctuation. Then a support vector regression model for wind speed prediction is proposed in combination with the standard SVR.

This paper is structured as follows. Section 2 gives the preliminaries. Section 3 presents the method of estimating the probability distribution of the historical wind speed data. Section 4 incorporates the prior knowledge about the wind speed fluctuation into the training data and testing data and then establishes the $\varepsilon$-support vector regression method for wind speed prediction incorporating probability prior knowledge (PPK-$\varepsilon$-SVR). Section 5 reports two experiments with historical wind speed data from a wind farm in Gansu province, and Section 6 draws the conclusions.

#### 2. Preliminaries

In this section, we briefly review some relevant knowledge of probability theory and the standard support vector regression often used in applications.

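Let $(\Omega, \mathcal{F})$ be a measurable space, $P$ a probability measure defined on $\mathcal{F}$, so that $0 \le P(A) \le 1$ for any $A \in \mathcal{F}$. We call $P(A)$ the probability of event $A$ occurring, and $P(A)$ indicates the possibility of event $A$ occurring. Assume that $X$ is a random variable on $(\Omega, \mathcal{F}, P)$, $F(x)$ is the distribution function, $E(X)$ is the expectation, and $D(X)$ is the variance of random variable $X$, respectively.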

*Definition 1 (see [25]).* Let $F(x)$ be the distribution function of random variable $X$. If there exists a nonnegative and integrable function $f(x)$ satisfying
$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \tag{1}$$
then we call $f(x)$ the probability density of continuous random variable $X$.

*Definition 2 (see [25]).* Let $\{X_n\}$ be a sequence of random variables. $\{X_n\}$ is said to be independent if, for any finite $n \ge 2$, the random variables $X_1, X_2, \ldots, X_n$ are independent.

*Definition 3 (see [25]).* Let the sequence of random variables $\{X_n\}$ be independent. $\{X_n\}$ is said to be independent and identically distributed if they have the same distribution function $F(x)$.

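*Definition 4 (see [25]).* Let $\{X_n\}$ be a sequence of random variables and let $a$ be a constant. If, for any $\varepsilon > 0$, we have
$$\lim_{n \to \infty} P\{|X_n - a| < \varepsilon\} = 1 \tag{2}$$
or
$$\lim_{n \to \infty} P\{|X_n - a| \ge \varepsilon\} = 0, \tag{3}$$
then the sequence $\{X_n\}$ is said to converge in probability to $a$, denoted by $X_n \xrightarrow{P} a$.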

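*Theorem 5 (Bernoulli’s law of large numbers [25]).* Let $n_A$ be the number of times event $A$ occurs in $n$ independent repeated experiments, and let $p$ be the probability of event $A$ occurring in each experiment. Then, for any $\varepsilon > 0$, we have
$$\lim_{n \to \infty} P\left\{\left|\frac{n_A}{n} - p\right| < \varepsilon\right\} = 1, \tag{4}$$
that is,
$$\lim_{n \to \infty} P\left\{\left|\frac{n_A}{n} - p\right| \ge \varepsilon\right\} = 0. \tag{5}$$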

*Remark 6.* Bernoulli’s law of large numbers proves the stability of frequency in theory. In other words, the frequency $n_A/n$ converges in probability to $p$. Then, for a sufficiently large $n$, the frequency $n_A/n$ almost equals the probability $p$. Hence, if the number of experiments is very large, the frequency can be treated as the probability in practice.
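The frequency stability described in Remark 6 can be illustrated with a small simulation. This is only a sketch: the event probability `p = 0.3`, the sample sizes, and the function name `relative_frequency` are illustrative choices, not values from the paper.

```python
import random


def relative_frequency(p, n, rng):
    """Fraction of n independent trials in which an event of probability p occurs."""
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n


rng = random.Random(0)  # fixed seed so that repeated runs give the same numbers
p = 0.3                 # assumed event probability, chosen only for illustration
for n in (100, 10_000, 200_000):
    freq = relative_frequency(p, n, rng)
    print(f"n={n:>7}: frequency={freq:.4f}, |frequency - p|={abs(freq - p):.4f}")
```

As the number of trials grows, the printed gap between frequency and probability tends to shrink, which is the practical justification for replacing probabilities by frequencies in Section 3.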

##### 2.1. Regression Problem

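Suppose that
$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\} \tag{6}$$
is a given training set, where $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$, $i = 1, 2, \ldots, l$. The regression problem is to find a real valued function $f(x)$ on $\mathbb{R}^n$ according to data (6) to predict the output $y$ for any given input $x$.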

*Remark 7.* For the above regression problem, if the real valued function $f(x)$ is linear, then we call it a linear regression problem. If the real valued function $f(x)$ is nonlinear, then we call it a nonlinear regression problem.

##### 2.2. $\varepsilon$-Support Vector Regression

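Support vector regression (SVR) differs from conventional regression in that it maps the training data (6) into a high dimensional reproducing kernel Hilbert space. In $\varepsilon$-SVR, the goal is to find a function $f(x)$ which has at most $\varepsilon$ deviation from the actually obtained targets $y_i$ for all the training data and at the same time is as flat as possible. $\varepsilon$-SVR uses the $\varepsilon$-insensitive loss function
$$|y - f(x)|_{\varepsilon} = \max\{0, |y - f(x)| - \varepsilon\} \tag{7}$$
to solve the optimization problem
$$\begin{aligned} \min_{w, b, \xi, \xi^*} \quad & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) \\ \text{s.t.} \quad & (w \cdot \Phi(x_i)) + b - y_i \le \varepsilon + \xi_i, \\ & y_i - (w \cdot \Phi(x_i)) - b \le \varepsilon + \xi_i^*, \\ & \xi_i, \xi_i^* \ge 0, \quad i = 1, 2, \ldots, l, \end{aligned} \tag{8}$$
where $f(x) = (w \cdot \Phi(x)) + b$, data (6) are mapped to a higher dimensional feature space by the function $\Phi$, and the constant $C > 0$ determines the tradeoff between the flatness of $f$ and the amount up to which deviations larger than $\varepsilon$ are tolerated. Similar to support vector classification, $w$ may be a huge vector variable, so we solve the dual problem
$$\begin{aligned} \min_{\alpha^{(*)}} \quad & \frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)K(x_i, x_j) + \varepsilon\sum_{i=1}^{l}(\alpha_i^* + \alpha_i) - \sum_{i=1}^{l} y_i(\alpha_i^* - \alpha_i) \\ \text{s.t.} \quad & \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, 2, \ldots, l, \end{aligned} \tag{9}$$
where $K(x_i, x_j) = (\Phi(x_i) \cdot \Phi(x_j))$ is a kernel function.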

As a result, SVR has a sparse representation of solutions and hence is relatively fast in training and testing. SVR is the most common application form of SVM and has been popular for regression and function estimation problems in the past decades.
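To make the role of the $\varepsilon$-insensitive loss and of the tradeoff constant concrete, the following minimal sketch evaluates the loss and the resulting objective for a linear function. It assumes the usual convention that, for fixed $w$ and $b$, the smallest feasible slack for each datum equals its $\varepsilon$-insensitive loss; the function names and the toy numbers are hypothetical.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def primal_objective(w, b, inputs, targets, C, eps):
    """0.5*||w||^2 + C * (sum of eps-insensitive losses) for f(x) = w.x + b."""
    flatness = 0.5 * sum(wi * wi for wi in w)
    total_loss = 0.0
    for x, y in zip(inputs, targets):
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        total_loss += eps_insensitive_loss(y, prediction, eps)
    return flatness + C * total_loss


# Toy data (made up): the first point lies inside the tube, the second outside.
inputs = [[1.0], [2.0]]
targets = [1.0, 3.0]
print(primal_objective([1.0], 0.0, inputs, targets, C=2.0, eps=0.5))
```

A larger `C` penalizes points outside the tube more heavily, while a larger `eps` widens the tube and leaves more points with zero loss, which is where the sparsity of the solution comes from.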

#### 3. Method of Estimating the Probability Distribution of Historical Wind Speed Data

In practical problems, a large amount of historical wind speed data is available. In order to extract more information from these data, in this section we give a method to estimate the probability distribution of historical wind speed data.

##### 3.1. Method

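Assume that $f(x)$ in Figure 1 is the probability density of independent and identically distributed random variables $X_1, X_2, \ldots, X_n$, and that the number and the frequency of them falling into interval $[x_i, x_{i+1}]$ are $n_i$ and $n_i/n$, respectively. By Definition 1, the probability of falling into interval $[x_i, x_{i+1}]$ (namely, the area of the curved-edge trapezoid under $f(x)$ over this interval) is
$$P\{x_i \le X \le x_{i+1}\} = \int_{x_i}^{x_{i+1}} f(x)\,dx. \tag{10}$$
By Theorem 5, we have
$$P\{x_i \le X \le x_{i+1}\} \approx \frac{n_i}{n}. \tag{11}$$
With the infinitesimal method, the area of the curved-edge trapezoid can be approximately substituted by the area of the straight-edged trapezoid on the same base, namely,
$$\int_{x_i}^{x_{i+1}} f(x)\,dx \approx \frac{f(x_i) + f(x_{i+1})}{2}\,(x_{i+1} - x_i). \tag{12}$$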

By formulas (10), (11), and (12) we have Similarly, for , and , we have

From formulas (13) and (14) we can obtain that

##### 3.2. Algorithm

Suppose that is the wind speed at a fixed point . Then can be seen as a random variable defined on time set . The samples data can be seen as a sequence of sample data coming from independent random variables . Supposing that the probability density function of wind speed is , then the steps of estimating the probability density are as follows.

*Step 1. *Denote . Insert points between and equidistantly, and interval is divided into intervals with the same length.

*Step 2. *Count the number of samples falling into interval , and we have . By formula (11), the probability of samples falling into interval is .

*Step 3. *Calculate with formula (15).

*Step 4. *Connecting the points in turn with smooth curves, the probability density of wind speed sample data can be obtained.
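The steps above can be sketched in a few lines of Python. Note the assumptions: the sketch takes the sample minimum and maximum as the interval end points, and it replaces formula (15) by the simpler midpoint (histogram) approximation, in which the density at the middle of each small interval is the frequency divided by the interval length. The function name `estimate_density` and the sample size are hypothetical.

```python
import math
import random


def estimate_density(samples, num_bins):
    """Return (midpoints, densities) from equal-width bins over [min, max]."""
    a, b = min(samples), max(samples)
    if a == b:
        raise ValueError("samples must not all be equal")
    width = (b - a) / num_bins
    counts = [0] * num_bins
    for x in samples:
        # Clamp the index so that x == b falls into the last bin.
        i = min(int((x - a) / width), num_bins - 1)
        counts[i] += 1
    n = len(samples)
    midpoints = [a + (i + 0.5) * width for i in range(num_bins)]
    # Frequency counts[i]/n approximates the probability of the bin (Theorem 5);
    # dividing by the bin width turns it into a density value.
    densities = [c / (n * width) for c in counts]
    return midpoints, densities


rng = random.Random(42)
samples = [rng.gauss(0.0, 1.0) for _ in range(20_000)]  # standard normal test data
midpoints, densities = estimate_density(samples, 40)
center = min(range(len(midpoints)), key=lambda i: abs(midpoints[i]))
print("estimated density near 0:", round(densities[center], 3))
print("true standard normal density at 0:", round(1 / math.sqrt(2 * math.pi), 3))
```

Joining the points `(midpoints[i], densities[i])` with a smooth curve corresponds to Step 4. The estimate gets rougher as the number of intervals grows relative to the sample size, because each interval then holds fewer samples.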

##### 3.3. Experiments

In order to verify the effectiveness of the above method, in this subsection, experiments are carried out with sample data drawn from a normal distribution and from an exponential distribution, respectively.

*Experiment 1.* Suppose that $X$ is a standard normal random variable; then its probability density is
$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < +\infty, \tag{16}$$
and the graph of $f(x)$ is shown in Figure 2.

Suppose that the sample data
are independent and identically distributed coming from the probability density and . Insert 99 points in interval equidistantly, and interval is divided into small intervals with the same length. With the above proposed method, we can obtain function shown in Figure 2 which is the estimation of the standard normal density .

*Experiment 2. *Supposing that is a random variable following an exponential distribution, then the probability density is
and the graph of is shown in Figure 3. With the similar steps in the above experiment, suppose that the sample data are independent and identically distributed coming from the probability density and . Insert 99 points in interval equidistantly, and interval is divided into small intervals with the same length. With the proposed method above, we can obtain function which is the estimation of exponential density (see Figure 3).

##### 3.4. Results Analysis

From Figure 2 we can see that the density estimated from the sample data is not as smooth as the standard normal density, but the rough shapes of the two functions are almost the same. Similar conclusions can be drawn from Figure 3. That is to say, the method of estimating the probability density based on Bernoulli’s law of large numbers and the infinitesimal method is effective, and it lays a foundation for establishing the PPK-$\varepsilon$-SVR.

#### 4. Support Vector Regression Method Incorporating Probability Prior Knowledge

In this section, we aim to predict the wind speed at a fixed point .

##### 4.1. Incorporating the Prior Knowledge about the Wind Speed Fluctuation into the Training Data and Testing Data

Supposed that is the probability density estimated with the above method, is the initial time, and is the wind speed at , denoted by . In practice, the wind speed at a fixed point is often measured many times for every certain period of time (where ), and the mean value is output as the predicted wind speed at the fixed point . In other words, the wind speed at is the mean value of the measured values from to , denoted by , where . Of course, the mean value can represent the wind speed at in a sense, but in some cases the mean value is quite different from . For example, if the measured wind speed is the same value during a certain period of time , then “the mean value is the wind speed ” holds with a high probability. Conversely, if the measured wind speed fluctuates wildly during a certain period of time , then “the mean value is the wind speed ” holds with a very low probability.

Hence, in order to incorporate this prior knowledge into the wind speed prediction, the training datum is converted into , where

In fact, from formulas (19) and (20) we can see that the larger the is, the larger the is and, furthermore, the smaller the is. That is to say, the possibility of “wind speed at is ” is very small. On the other hand, the large illustrates that the wind speed from to fluctuates wildly and “the mean value is the wind speed ” holds with a low probability (namely, the possibility of “wind speed at is ” is very small), which is in accord with the information provided by . Thus, the probability provides the fluctuation about the wind speed during a certain period of time . Therefore, datum contains the prior knowledge provided by the historical data.

Let be the times at a fixed point , and let be a training set. Denote ; then training set (21) can be rewritten by where . The problem of wind speed prediction is to find a real valued function on according to training set (22) to predict the wind speed for any given input .
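The conversion of a training datum can be sketched as follows. This is a hypothetical stand-in, not the paper's formulas (19) and (20): it assumes that the probability attached to a datum is one minus the estimated probability mass between the minimal and maximal measured values, which matches the qualitative behavior described above (a wider spread gives a smaller probability). The density is assumed to be piecewise constant on given bin edges, and all names are illustrative.

```python
def mass_between(edges, densities, lo, hi):
    """Integrate a piecewise-constant density over the interval [lo, hi]."""
    total = 0.0
    for left, right, d in zip(edges[:-1], edges[1:], densities):
        overlap = min(hi, right) - max(lo, left)
        if overlap > 0:
            total += d * overlap
    return total


def convert_datum(t, measurements, edges, densities):
    """Map (time, raw measurements) to ((time, probability), mean wind speed).

    Assumption: probability = 1 - mass of the estimated density between the
    minimal and maximal measurement, so a wide spread gives a small value.
    """
    mean_speed = sum(measurements) / len(measurements)
    lo, hi = min(measurements), max(measurements)
    probability = 1.0 - mass_between(edges, densities, lo, hi)
    return (t, probability), mean_speed


# Made-up uniform density on [0, 4] m/s, used only to exercise the functions.
edges = [0.0, 2.0, 4.0]
densities = [0.25, 0.25]
print(convert_datum(0.0, [2.0, 2.0, 2.0], edges, densities))  # steady wind
print(convert_datum(1.0, [1.0, 2.0, 3.0], edges, densities))  # fluctuating wind
```

Both toy periods have the same mean wind speed, yet the converted inputs differ, so a regression model trained on them can distinguish steady from fluctuating periods.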

##### 4.2. $\varepsilon$-Support Vector Regression Method for Wind Speed Prediction Incorporating Probability Prior Knowledge

For the above problem of wind speed prediction, the $\varepsilon$-support vector regression method incorporating probability prior knowledge (PPK-$\varepsilon$-SVR) can be constructed as follows.

*Step 1. *Obtain the training set
where .

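*Step 2.* Select a proper kernel function $K(x, x')$ and penalty parameter $C > 0$.

*Step 3.* Construct and solve the convex quadratic programming problem
$$\begin{aligned} \min_{\alpha^{(*)}} \quad & \frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)K(x_i, x_j) + \varepsilon\sum_{i=1}^{l}(\alpha_i^* + \alpha_i) - \sum_{i=1}^{l} y_i(\alpha_i^* - \alpha_i) \\ \text{s.t.} \quad & \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, 2, \ldots, l, \end{aligned} \tag{24}$$
where $x_i$ and $y_i$ denote the $i$th input and output in the training set of Step 1, to obtain the optimal solution
$$\bar{\alpha}^{(*)} = (\bar{\alpha}_1, \bar{\alpha}_1^*, \ldots, \bar{\alpha}_l, \bar{\alpha}_l^*)^{T}. \tag{25}$$

*Step 4.* Choose a component $\bar{\alpha}_j$ or $\bar{\alpha}_k^*$ of $\bar{\alpha}^{(*)}$ in the open interval $(0, C)$. If $\bar{\alpha}_j$ is chosen, then
$$\bar{b} = y_j - \sum_{i=1}^{l}(\bar{\alpha}_i^* - \bar{\alpha}_i)K(x_i, x_j) + \varepsilon. \tag{26}$$
If $\bar{\alpha}_k^*$ is chosen, then
$$\bar{b} = y_k - \sum_{i=1}^{l}(\bar{\alpha}_i^* - \bar{\alpha}_i)K(x_i, x_k) - \varepsilon. \tag{27}$$

*Step 5.* Construct the decision function
$$f(x) = \sum_{i=1}^{l}(\bar{\alpha}_i^* - \bar{\alpha}_i)K(x_i, x) + \bar{b}. \tag{28}$$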
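Steps 4 and 5 can be sketched as follows, under the common convention that the decision function is a kernel expansion with coefficients `betas[i]` equal to the difference between the starred and unstarred dual variables, plus a bias. The RBF kernel, the parameter name `gamma`, and the toy coefficients are assumptions for illustration; in practice the coefficients come from solving the quadratic program in Step 3.

```python
import math


def rbf_kernel(u, v, gamma):
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def bias_from_component(j, inputs, targets, betas, eps, gamma, unstarred):
    """Bias from a dual component strictly between 0 and C (Step 4).

    unstarred=True uses the unstarred multiplier of datum j (adds eps);
    unstarred=False uses the starred one (subtracts eps).
    """
    expansion = sum(beta * rbf_kernel(xi, inputs[j], gamma)
                    for xi, beta in zip(inputs, betas))
    return targets[j] - expansion + (eps if unstarred else -eps)


def decision_function(x, inputs, betas, bias, gamma):
    """Kernel expansion plus bias (Step 5)."""
    return sum(beta * rbf_kernel(xi, x, gamma)
               for xi, beta in zip(inputs, betas)) + bias


# Toy values (made up): a single support vector with coefficient 1.0.
inputs, targets, betas = [(0.0, 1.0)], [2.0], [1.0]
bias = bias_from_component(0, inputs, targets, betas, eps=0.1, gamma=0.5, unstarred=True)
print(decision_function((0.0, 1.0), inputs, betas, bias, gamma=0.5))
```

Only the data with nonzero coefficients contribute to the sum, so prediction cost scales with the number of support vectors rather than with the size of the training set.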

*Remark 8. *Solving regression problems with support vector regression, the kernel function can be selected according to prior knowledge, such as the characteristics of the problem, or the training set. More details about selecting kernel function with prior knowledge will be investigated in another paper.

#### 5. Experiments

In this part, we take a wind farm in Gansu province as an example. For a fixed point , in order to predict the wind speed (m/s) at (Hours), we recorded the wind speed from November 2006 to April 2008 and found that the wind speed had a periodicity with the change of seasons, months, or days. Therefore, the probability distribution of wind speed in the period of the previous year (month or day) can be incorporated into the wind speed prediction of the corresponding period in this year (month or day). In this wind farm, the wind speed is output every ten minutes, so there are 144 sets of data in a day and more than four thousand sets of data in a month. As SVR focuses on the statistical learning problems for small size samples and the wind speed had a periodicity with the change of days, the experiment is aimed at the short-term wind speed prediction, and we choose 144 sets of data to carry out the experiment. Here, without loss of generality, we take the wind speed prediction on 1 April 2008 as an example.

##### 5.1. Estimation of the Probability Density of Historical Wind Speed Data

Supposed indicates that the wind speed at time is . Wind speed was measured ten times during every ten minutes, and the mean value is output as the wind speed at time . By the proposed method in Section 3, the probability density of the wind speed data on 1 April 2008 was estimated and the graph of is shown in Figure 4.

##### 5.2. Incorporating the Prior Knowledge about the Wind Speed Fluctuation into the Training Data and Testing Data

Denote that and . By the estimated probability density of the wind speed data on 1 April 2008 and formula (19), the probability can be calculated. The wind speed sample data on 1 April 2008 are and data (29) are converted into by the method in Section 4.1, where . And the data in data (30) is the training data to train a model to predict the wind speed for the given .

Similarly, if we want to predict the wind speed for the given on 2 April 2008, we can estimate the probability density of wind speed on 1 April 2008 and use the data as the training data.

##### 5.3. $\varepsilon$-Support Vector Regression Method for Wind Speed Prediction Incorporating Probability Prior Knowledge

In the experiment, a grid search method based on 5-fold cross-validation is chosen to determine model parameters, where , , and the kernel function is the radial basis function (RBF). In order to predict the wind speed for the given with training data in data (30), we run the experiment with PPK-$\varepsilon$-SVR and standard $\varepsilon$-SVR; the results are shown in Tables 1 and 2, respectively. The wind speeds of training data, normalized wind speeds of testing data, and wind speeds of testing data with PPK-$\varepsilon$-SVR are shown in Figures 5(a), 5(b), and 5(c), respectively. The wind speeds of training data, normalized wind speeds of testing data, and wind speeds of testing data with standard $\varepsilon$-SVR are shown in Figures 6(a), 6(b), and 6(c), respectively.
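The parameter selection procedure, a grid search scored by 5-fold cross-validation, can be sketched independently of the regression model. To stay self-contained, the sketch below plugs in a simple RBF kernel smoother as a stand-in model instead of an SVR solver; the grid values, the synthetic sine data, and all function names are assumptions for illustration only.

```python
import itertools
import math
import random


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(fit, X, y, grid, k=5, seed=0):
    """Return (best_params, best_mse); fit(X, y, **params) returns a predictor."""
    folds = k_fold_indices(len(X), k, random.Random(seed))
    keys = sorted(grid)
    best_params, best_mse = None, math.inf
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        fold_mses = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            predict = fit([X[i] for i in train], [y[i] for i in train], **params)
            errors = [(predict(X[i]) - y[i]) ** 2 for i in held_out]
            fold_mses.append(sum(errors) / len(errors))
        mse = sum(fold_mses) / k  # average validation error over the k folds
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse


def fit_rbf_smoother(X_train, y_train, gamma):
    """Stand-in model (not SVR): RBF-weighted average of the training targets."""
    def predict(x):
        weights = [math.exp(-gamma * (x - xi) ** 2) for xi in X_train]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed; fall back to the plain mean
            return sum(y_train) / len(y_train)
        return sum(w * yi for w, yi in zip(weights, y_train)) / total
    return predict


X = [i / 59 for i in range(60)]
y = [math.sin(2 * math.pi * x) for x in X]  # synthetic data, not wind speeds
print(grid_search_cv(fit_rbf_smoother, X, y, {"gamma": [0.01, 1.0, 100.0]}))
```

With an SVR solver, `fit` would take the penalty parameter and the kernel width as keyword arguments, and the grid would hold the candidate values of both.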

**Figure 5.** Results with PPK-$\varepsilon$-SVR: (a) wind speeds of training data; (b) normalized wind speeds of testing data; (c) wind speeds of testing data.

**Figure 6.** Results with standard $\varepsilon$-SVR: (a) wind speeds of training data; (b) normalized wind speeds of testing data; (c) wind speeds of testing data.

Following the same steps as for predicting the wind speed for the given , we repeat the experiment 50 times to predict the wind speed for the given (namely, the first 50 wind speeds monitored on 2 April 2008). The average mean squared errors are shown in Table 3, where the numbers after ± are the standard deviations.
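The summary statistic reported for repeated runs, an average mean squared error followed by a standard deviation, can be computed as in the sketch below. The error values are placeholders rather than results from the paper, and the use of the sample standard deviation (dividing by the number of runs minus one) is an assumption.

```python
import statistics


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def summarize_runs(mses):
    """Return (average MSE, sample standard deviation) over repeated runs."""
    return statistics.mean(mses), statistics.stdev(mses)


run_mses = [1.0, 2.0, 3.0]  # placeholder values, one per repeated experiment
average, spread = summarize_runs(run_mses)
print(f"{average:.4f} ± {spread:.4f}")
```

A smaller spread across runs indicates a more stable method, which is how the standard deviations are read in the result analysis.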

##### 5.4. Result Analysis

From Tables 1 and 2, we can see that the mean squared errors of training data and testing data with PPK-$\varepsilon$-SVR are all smaller than the corresponding ones with standard $\varepsilon$-SVR. Comparing Figure 5(a) with Figure 6(a) and Figure 5(c) with Figure 6(c), we find that the predicted wind speeds of training data and testing data with PPK-$\varepsilon$-SVR are closer to the initial wind speeds than those with standard $\varepsilon$-SVR. This illustrates that the prediction error of PPK-$\varepsilon$-SVR is smaller than that of standard $\varepsilon$-SVR in predicting the wind speed for the given . From Figure 5(b), the difference between the normalized initial wind speed and the normalized predicted wind speed equals the square root of the mean squared error in Table 1, which shows the effectiveness of PPK-$\varepsilon$-SVR.

From Table 3, we can see that the average mean squared error (namely, the average prediction error) of PPK-$\varepsilon$-SVR is smaller than that of standard $\varepsilon$-SVR, which illustrates that the PPK-$\varepsilon$-SVR method is more accurate than standard $\varepsilon$-SVR. Moreover, the standard deviation of PPK-$\varepsilon$-SVR is also smaller than that of standard $\varepsilon$-SVR, which illustrates that PPK-$\varepsilon$-SVR is more stable than standard $\varepsilon$-SVR. In addition, the running time of PPK-$\varepsilon$-SVR is less than one minute, which shows that the model’s running time can meet the needs of wind speed prediction in application.

#### 6. Conclusions

In this paper, a method of estimating the probability density of historical wind speed data is proposed, and the estimated probability density is used to describe the wind speed fluctuation between the maximal value and the minimal value in a certain period of time. The prior knowledge provided by historical wind speed data is then incorporated into the training data and the testing data. Based on standard $\varepsilon$-SVR, a support vector regression method for wind speed prediction incorporating probability prior knowledge is proposed. Comparative experiments show that the proposed PPK-$\varepsilon$-SVR is feasible and effective and that the model’s running time can meet the needs of wind speed prediction in application. How to incorporate prior knowledge into the selection of the kernel function to decrease the prediction error further is left for future study.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work is supported by the 973 Program (no. 2012CB215201), the National Natural Science Foundation of China (no. 61073121), the Natural Science Foundation of Hebei Province of China (nos. F2012402037, A2012201033), and the Natural Science Foundation of Hebei Education Department (no. Q2012046).