#### Abstract

This paper presents a novel approach for accurately modeling and ultimately predicting wind speed for selected sites when only incomplete data sets are available. A seasonal simulation for the synthetic generation of wind speed data is carried out with the Markov chain Monte Carlo technique, using only one month of data from each season. This limited-data model was used to produce synthesized data that sufficiently captured the seasonal variations of wind characteristics. The model was validated by comparing wind characteristics obtained from meteorological tower time series data in two countries with those of the Markov chain Monte Carlo simulations. The comparison demonstrated that one month of wind speed data from each season was sufficient to generate synthetic wind speed data for that season.

#### 1. Introduction

One of the most challenging features of wind energy applications is the uncertainty of the wind resource. Wind speed, and hence wind energy potential, has a key influence on the profitability of a wind farm and on the management of transmission and distribution networks by utility system operators. Accurate and reliable data on both long-term and short-term wind characteristics are essential for site selection and technology specification [1]. Wind data acquisition typically spans one to two years and is carried out with anemometers mounted on 60 m meteorological towers. Because of the stochastic nature of the wind resource, long-term wind speed records are required to characterize wind patterns, but in some cases wind data may be corrupted or missing for short periods, making accurate and reliable prediction of wind characteristics difficult. Operational wind speed prediction is also challenging: historical data play a major role in predicting future wind conditions, so both resource assessment and operational prediction depend on long-term, reliable, and complete historical records. Wind speed prediction has been addressed by a number of researchers, with approaches typically based either on the analysis of historical wind speed time series [2] or on models driven by a range of meteorological and/or local topographical input data. Lei et al. [3] provide a comprehensive overview of wind speed forecasting and power prediction methods using numerical and statistical approaches and conclude that different models have different strengths depending on the inputs and on the application, such as short-term or long-term prediction. Wind speed data are typically analyzed to characterize the potential annual resource, presented, for example, as a Weibull probability density function [4], for the estimation of wind energy potential. Foley et al. [5] describe wind speed forecasting as physical based (utilizing statistical methods), learning based (using artificial intelligence), or hybrid (utilizing a combination of both). Researchers have applied various combinations of physical-based methods and have further differentiated between temporal and/or spatial wind characteristics. Watts and Jara [6] use statistical analysis to predict wind energy in Chile, concluding that wind is a potential contributor to that country's energy mix. Cadenas and Rivera [7] and Fyrippis et al. [8] performed similar analyses to evaluate the wind potential of Mexico and Greece, respectively. Costa et al. [9] present a review of alternative techniques for short-term prediction of wind power. Ramirez-Rosado et al. [10] compared two short-term forecasting methods, producing daily wind predictions and predicted power loads, while Landberg [11] uses a model-based approach driven by meteorological data to provide short-term predictions of wind power production up to 36 hours ahead. Researchers have also used variants of physical-based systems; for example, Louka et al. [12] use Kalman filtering in an attempt to improve wind speed and wind power forecasts. Calif and Schmitt [13] introduce a lognormal stochastic equation to evaluate the wind spectrum and identify a scaling equation for evaluating the impact of wind turbulence on wind power production. Several researchers have employed CALMET [14, 15], a diagnostic wind model, to predict wind speed.

Learning-based techniques predominantly employ advanced analysis using neural- or fuzzy-based approaches or hybrid models. Guo et al. [16] present a hybrid wind speed prediction system which combines historical daily wind speed averages with artificial neural networks to forecast daily average wind speeds for a one-month period, with additional novel approaches presented by Liu et al. [17] and Han et al. [18].

An emerging research area is the use of Markov chain Monte Carlo (MCMC) simulation techniques to evaluate and model wind speed characteristics. Several researchers have used first-order MC models. Jones and Lorenz [19] used 8-hour averages of UK wind speed data recorded over one year with an 11-state MC model. They concluded that the probability density functions (PDFs) and the autocorrelation functions (ACFs) were accurate but that a second-order MC model could provide a better fit. Sahin and Sen [20] used hourly averages of wind speed data recorded at 10 different stations in the northwestern region of Turkey with an 8-state MC model. They concluded that the second and third autocorrelation coefficients are significant and that the MC model can sufficiently preserve most of the statistical parameters of wind characteristics. Nfaoui et al. [21] used hourly averages of wind speed data recorded in Tangiers to build a 12-state model. They also concluded that most of the statistical parameters of the wind speed time series are reproduced, and they attribute the lack of similarity in the ACFs to the nature of the Markov process.

Some researchers have included wind direction in an attempt to improve the MC model. Ettoumi et al. [22] used 3-hour averages of wind speed and wind direction data recorded in Algeria with 9-state and 3-state MC models for wind direction and wind speed, respectively, stating that both models fit well. They suggest that combining a Weibull distribution with the MC model would provide a better representation of wind data. Karatepe [23] used hourly averages of wind speed data recorded in Turkey to directly generate the economic value of wind electricity using a 16-state MC model. The author states that these synthetic data could be used as an input for long-term economic analysis of wind farm projects, as well as for site assessments when long-term wind records are not available. This research indicates that the MC model is a suitable method of reproducing the general statistical characteristics of measured wind data, yet the optimal state size and order still need further investigation. Hocaoğlu et al. [24] used hourly averages of wind speed data recorded in Turkey to show the effect of state size on synthetic generation and concluded that there is a direct correlation between the number of states and model accuracy.

Kaminsky et al. [25] determined that real wind data contain more low-frequency components than those produced by first- and second-order MC models. Although a second-order MC model is more representative of the real data in terms of ACFs and power spectral densities (PSDs), further improvements are required to preserve the long-wave information. Shamshad et al. [26] extended the approach to 12-state MC models, preserving the statistical characteristics but at the expense of the ACFs, and concluded that this is due to the intrinsic nature of MC models. The general shape of the spectral density function was similar, but both models failed to retain distinctive peaks, even though the second-order MC model performed better. The authors suggest that removing periodic/seasonal components before analysis may improve the results.

The following researchers used first-, second-, and third-order MC models for both wind speed and wind power data. Papaefthymiou and Klöckl [27] used wind speed data recorded over 2 years at several time steps. Applying models based on 10 min, 30 min, and 60 min averages of wind speed and wind power data resulted in a reduction of state size without loss of information. Their research determined that 10 min averages and a third-order MC model provide a better fit in terms of ACFs. Brokish and Kirtley [28] suggest that MC models should not be used for time steps shorter than 15 to 40 minutes, depending on the order of the MC and the state size. They used different time steps of wind speed/power data and different state sizes to determine when MC models are appropriate for synthetic wind speed/power data generation. Wu et al. [29] used field-measured wind power data recorded once every minute for 9 months to examine the effect of different numbers of states and of seasonal periodicity. Seasonal periodicity was examined by using the data for a specific month to form the transition matrix, which was then used to simulate synthetic data for that month. They concluded that, when the number of states in the transition matrix is properly selected, the MCMC method is an effective technique for generating wind power time series, and that incorporating seasonality in the simulation process yields relative improvements in accuracy.

Erto et al. [30] proposed a wind speed parameter estimation method that reduces the requirements for long-term on-site anemometric monitoring by exploiting initial information about the parameters to be estimated via MCMC. The simulation results showed that the proposed method, based on one month of data and a priori information, could successfully compete with maximum likelihood estimates obtained from one year of measured data. Pang et al. [31] focused on the Bayesian estimation of Weibull parameters using MCMC techniques. The authors stated that the advantage of MCMC is that the posterior uncertainty about the parameters can be evaluated exactly from the MCMC sample, without any need for asymptotic approximations. They claim that MCMC provides an alternative method for estimating the parameters of wind speed distributions.

Finally, Negra et al. [32] developed a synthetic wind speed generator based on MC concepts. Instead of constructing a transition probability matrix, they constructed a wind speed probability table from extrapolated wind speed statistical data which was based on 7 years of wind measurements. From this table, a set of synthetic time series was generated and compared with observed time series in terms of ACFs, averages, PDFs, and seasonal characteristics. The authors claim that the models provide a good fit in PDFs and general seasonal characteristics, but that second-order correlation provided slightly undercorrelated results and third-order correlation produced overcorrelated solutions.

This paper proposes the use of stochastic simulation techniques for synthetic wind data generation and the potential development of a tool for assessing the profitability of wind farm projects, supporting accurate prediction for system operations, and informing wind energy markets. The research summarized above suggests that MC models produce accurate results in terms of PDFs and general statistical characteristics of the data, in particular the mean, median, standard deviation, variance, quartiles, minimum and maximum values, and Weibull distribution parameters. However, because of their short memory, they are unable to reproduce the persistence and/or periodic structure of wind data, which manifests itself in the ACFs and PSDs. This paper addresses this issue and attempts to retain the integrity of the data while reducing complexity. Instead of using a full year of data, the data sets are divided into seasonal subsets (thereby keeping the individual seasonal information), and considerably less data are used to simulate each whole subset while preserving its main statistical characteristics.
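The persistence that MC models tend to lose is usually quantified with the sample ACF. The sketch below computes the standard (biased) lag-$k$ sample autocorrelation $r_k = \sum_t (x_t - \bar{x})(x_{t+k} - \bar{x}) / \sum_t (x_t - \bar{x})^2$ for a list of wind speeds. The function name `sample_acf` and the example values are illustrative assumptions, not code or data from this study.

```python
def sample_acf(series, max_lag):
    """Biased sample autocorrelation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    if c0 == 0:
        raise ValueError("series has zero variance")
    acf = []
    for k in range(max_lag + 1):
        # Sum of lagged cross-products, normalized by the lag-0 term
        ck = sum(dev[t] * dev[t + k] for t in range(n - k))
        acf.append(ck / c0)
    return acf


# Hypothetical hourly speeds (m/s): a slowly varying record keeps a high lag-1 value.
speeds = [3.1, 3.4, 3.9, 4.2, 4.0, 3.6, 3.2, 2.9, 3.0, 3.5]
print([round(r, 2) for r in sample_acf(speeds, 3)])
```

Comparing such ACFs for observed and synthetic series is one way to see how much persistence a given MC order preserves.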

#### 2. Experimental Design, Data, and Model

##### 2.1. Data and Modeling Strategy

This research employs two different data sets obtained from operational meteorological towers located in the USA and Turkey. The first tower is located at the National Wind Technology Center (NWTC), Colorado, USA, and provided hourly average wind speed time series data at 10 m, 20 m, and 50 m for a period of 12 months (01.12.2005–01.12.2006). The second tower is located in the Süpürgelik region of Yalova in Turkey and provided hourly average wind speed time series data at 30 m for a period of 9 months (01.12.2005–01.09.2006). The two countries are both located in the northern hemisphere but have different seasonal characteristics, which provides one way of determining whether the model can be applied to other data sets.

The objective of this research is to develop a model that can accurately reflect seasonal variations using limited data from an annual time series data set. To address this, both data sets were arbitrarily divided into categories identified as winter, spring, summer, and autumn. As both data sets were obtained from locations in the northern hemisphere, the seasons were defined as winter: December, January, and February; spring: March, April, and May; summer: June, July, and August; and autumn: September, October, and November. The first month of each season was selected and used to generate synthetic data for the related season, with the aim of using less data in the simulation process. First- and second-order transition matrices were produced from the measured data for the selected months and used to generate synthetic data. Figure 1 displays the histograms of the actual data for each season: Figure 1(a) shows the frequency distribution of each season for Turkey, identified as TR, and Figure 1(b) shows the frequency distribution of each season for Colorado, identified as US.

*Figure 1: Histograms of observed wind speeds for each season: (a) Turkey (TR); (b) Colorado (US).*
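The seasonal split described above can be sketched as follows. The sketch assumes each hourly record is available as a hypothetical `(month, wind_speed)` pair and keeps only the first month of each season (December, March, June, September) for building the transition matrices; the record layout and helper names are illustrative.

```python
# Meteorological seasons as defined in the text (northern hemisphere);
# the first month listed for each season is the one used for training.
SEASONS = {
    "winter": (12, 1, 2),
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
}


def season_of(month):
    """Return the season name for a calendar month (1-12)."""
    for name, months in SEASONS.items():
        if month in months:
            return name
    raise ValueError(f"invalid month: {month}")


def training_speeds(records, season):
    """Keep wind speeds from the first month of `season` only.

    `records` is an iterable of hypothetical (month, wind_speed) pairs."""
    first_month = SEASONS[season][0]
    return [speed for month, speed in records if month == first_month]


records = [(12, 3.2), (12, 4.1), (1, 5.0), (3, 6.3)]
print(training_speeds(records, "winter"))  # -> [3.2, 4.1]
```

The transition matrices built from these training speeds are then used to simulate the whole season, including the months that were not used for training.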

##### 2.2. Markov Chain Monte Carlo Simulation Technique

In a first-order MC, the next state of a stochastic process depends only on its current state. In a second-order MC, the next state depends on both the present state and the most recent previous state. This generalizes to higher-order MCs: in a $k$th-order MC, the probability of a future state depends on the $k$ most recent states, including the present one, and not on any earlier state. The order of the chain therefore indicates how many present and previous time steps are taken into account when calculating the transition probability of the next state. These probabilities are stored in the cells of a matrix called the transition probability matrix. The size of the matrix depends on the number of states, which should be chosen when discretizing the random variable by considering its nature as well as the modeling purpose. In order to calculate transition probabilities, a further assumption, based on the definition of an MC, is made.
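As a minimal illustration of state discretization, the sketch below assumes equal-width wind speed intervals between a chosen minimum and maximum. The interval scheme used in this study is not specified at this point, so `make_bounds` and `to_state` are hypothetical helpers.

```python
def make_bounds(v_min, v_max, n_states):
    """Equal-width state boundaries [b0, b1, ..., bn] (an assumed scheme)."""
    width = (v_max - v_min) / n_states
    return [v_min + k * width for k in range(n_states + 1)]


def to_state(v, bounds):
    """0-based index of the state whose interval [b_k, b_(k+1)) contains v.

    Values below b0 fall into state 0; values at or above the top
    boundary fall into the last state."""
    n_states = len(bounds) - 1
    for k in range(n_states):
        if v < bounds[k + 1]:
            return k
    return n_states - 1


bounds = make_bounds(0.0, 12.0, 4)  # [0.0, 3.0, 6.0, 9.0, 12.0]
states = [to_state(v, bounds) for v in [0.4, 2.9, 3.0, 7.5, 12.0]]
print(states)  # -> [0, 0, 1, 2, 3]
```

More states give a finer representation of the wind speed distribution but, as discussed below, more transition probabilities to estimate from the same data.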

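A discrete-time stochastic process $\{X_t\}$ is an MC if, for $t = 0, 1, 2, \ldots$ and all states,

$$
P(X_{t+1} = i_{t+1} \mid X_t = i_t, X_{t-1} = i_{t-1}, \ldots, X_1 = i_1, X_0 = i_0) = P(X_{t+1} = i_{t+1} \mid X_t = i_t). \tag{1}
$$

Equation (1) implies that the probability distribution of the state at time $t+1$ depends only on the state at time $t$ ($i_t$). Consider the following:

$$
P(X_{t+1} = j \mid X_t = i) \tag{2}
$$

for all states $i$ and $j$ and all $t$.

If one assumes that the conditional probability stated in (2) is independent of $t$, then

$$
P(X_{t+1} = j \mid X_t = i) = p_{ij}, \tag{3}
$$

where $p_{ij}$ is the probability that, given the system is in state $i$ at time $t$, it will be in state $j$ at time $t+1$. The $p_{ij}$'s are often referred to as the transition probabilities for the MC. Moreover, (3) is often called the stationary assumption, which implies that the probability of moving from state $i$ to state $j$ during one period does not change over time [33].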

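The transition probability matrix (TPM) for a first-order MC with $s$ states can be written as

$$
\mathbf{P} =
\begin{bmatrix}
p_{11} & p_{12} & \cdots & p_{1s} \\
p_{21} & p_{22} & \cdots & p_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
p_{s1} & p_{s2} & \cdots & p_{ss}
\end{bmatrix}. \tag{4}
$$

The elements of $\mathbf{P}$ are nonnegative and, given that the state at time $t$ is $i$, the process must be somewhere at time $t+1$, which means that the elements in each row must sum to one.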

For all $i$ and $j$: (i) $p_{ij} \ge 0$; (ii) $\sum_{j=1}^{s} p_{ij} = 1$.

The formula in (5) is used to calculate the transition probabilities:

$$
p_{ij} = \frac{n_{ij}}{\sum_{k=1}^{s} n_{ik}}, \tag{5}
$$

where $n_{ij}$ represents the number of transitions from state $i$ to state $j$ during one period.
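Equation (5) amounts to counting consecutive state pairs and normalizing each row by its total. The sketch below assumes the wind speeds have already been mapped to 0-based state indices. The text does not say how a state with no observed outgoing transitions is handled, so the sketch makes such a state absorbing, an explicit assumption that keeps every row summing to one.

```python
def first_order_tpm(states, n_states):
    """Estimate p_ij = n_ij / sum_k n_ik from a sequence of 0-based states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(states, states[1:]):  # consecutive pairs (i -> j)
        counts[i][j] += 1
    tpm = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No transitions observed out of state i: assume it stays put.
            tpm.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            tpm.append([c / total for c in row])
    return tpm


tpm = first_order_tpm([0, 0, 1, 1, 0, 1], 2)
print(tpm)  # row 0 has counts [1, 2], row 1 has counts [1, 1]
```

With hourly data and a realistic number of states, most of the mass in such a matrix typically sits on and near the diagonal, as reported in Section 3.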

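If the transition probability in the $i$th row at the $j$th state is $p_{ij}$, the cumulative probability can be calculated [20] using (6) as

$$
P_{ij} = \sum_{k=1}^{j} p_{ik}. \tag{6}
$$

A second-order transition probability matrix can be written as

$$
\mathbf{P}^{(2)} =
\begin{bmatrix}
p_{111} & p_{112} & \cdots & p_{11s} \\
p_{121} & p_{122} & \cdots & p_{12s} \\
\vdots & \vdots & \ddots & \vdots \\
p_{ss1} & p_{ss2} & \cdots & p_{sss}
\end{bmatrix}, \tag{7}
$$

where $p_{ijk}$ is the probability of moving to state $k$ given that the previous state was $i$ and the current state is $j$; the matrix therefore has $s^2$ rows, one for each pair $(i, j)$, and $s$ columns. If the transition probability in the $(i, j)$th row at the $k$th state is $p_{ijk}$, the cumulative probability can be calculated [26] by using the following:

$$
P_{ijk} = \sum_{l=1}^{k} p_{ijl}. \tag{8}
$$

For synthetic data generation using random numbers, the transition probability matrix should be transformed into a cumulative probability transition matrix (CPM) by taking successive summations of its elements in each row. There is a significant tradeoff between model complexity and model accuracy. Increasing the dimension of the state space can yield more accurate modeling results [24]; however, finer discretization also increases model complexity by introducing a large number of parameters that are difficult to estimate from data [26].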
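The second-order counts and the conversion of a TPM row into a CPM row by successive summation can be sketched as below. The dictionary keyed by `(previous, current)` state pairs is one possible layout of a matrix with one row per state pair, and the fallback for pairs never seen in the data (stay in the current state) is an assumption of this sketch.

```python
from itertools import accumulate


def second_order_tpm(states, n_states):
    """p_ijk = n_ijk / sum_l n_ijl, rows keyed by (previous i, current j)."""
    counts = {(i, j): [0] * n_states
              for i in range(n_states) for j in range(n_states)}
    for i, j, k in zip(states, states[1:], states[2:]):  # triples i -> j -> k
        counts[(i, j)][k] += 1
    tpm = {}
    for (i, j), row in counts.items():
        total = sum(row)
        if total == 0:
            # Unseen pair: assume the chain stays in the current state j.
            tpm[(i, j)] = [1.0 if k == j else 0.0 for k in range(n_states)]
        else:
            tpm[(i, j)] = [c / total for c in row]
    return tpm


def to_cpm(row):
    """Cumulative row P_k = p_1 + ... + p_k, with the last entry forced to 1.0."""
    cum = list(accumulate(row))
    cum[-1] = 1.0  # guard against floating-point round-off
    return cum


tpm2 = second_order_tpm([0, 1, 0, 1, 1], 2)
cpm2 = {pair: to_cpm(row) for pair, row in tpm2.items()}
print(cpm2)
```

The number of rows grows with the square of the number of states, which is the complexity cost noted above: a one-month record may leave many state pairs with few or no observations.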

The MCMC simulation procedure for synthetic generation of wind speed time series is therefore achieved using the following steps.

*Step 1.* Define the states of the associated MC and construct the TPM.

*Step 2.* Construct the CPM.

*Step 3.* Generate uniformly distributed random numbers between 0 and 1.

*Step 4.* Select an initial state randomly, say, $i$.

*Step 5.* Compare the value of the random number with the elements of the $i$th row of the CPM. The next state is the state $j$ for which the random number is greater than the cumulative probability of the previous state, $P_{i,j-1}$, but less than or equal to the cumulative probability $P_{ij}$. The next random number is then compared with the elements of the $j$th row of the CPM.

*Step 6.* A transition from state $i$ to state $j$ in the CPM can be converted into a wind speed value by using the following:

$$
V = V_{j}^{l} + U \left( V_{j}^{u} - V_{j}^{l} \right), \tag{9}
$$

where $V_{j}^{l}$ and $V_{j}^{u}$ are the lower and upper boundaries of state $j$ and $U$ is the uniform random number.

By repeating Steps 5 and 6 for each uniform random number, any desired length of synthetic wind speed data can be generated.
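Steps 3 to 6 can be combined into a small generator. This sketch assumes a first-order CPM given as a list of rows whose last entry is 1.0, plus a list of state boundaries. It draws a second uniform number to place the speed inside the selected state, which is one reading of Step 6 (the text could also mean reusing the Step 5 number). For a second-order model, the row lookup would use the (previous, current) state pair instead of the current state alone.

```python
import bisect
import random


def generate(cpm, bounds, n_steps, seed=None):
    """Synthetic wind speeds from a first-order CPM (Steps 3 to 6)."""
    rng = random.Random(seed)
    n_states = len(cpm)
    state = rng.randrange(n_states)  # Step 4: random initial state
    out = []
    for _ in range(n_steps):
        u = rng.random()  # Step 3: uniform number in [0, 1)
        # Step 5: pick j with P_(i,j-1) <= u < P_ij; boundary ties have zero
        # probability, and bisect_right skips zero-probability states.
        nxt = min(bisect.bisect_right(cpm[state], u), n_states - 1)
        # Step 6: place the speed inside state nxt with a second uniform draw.
        lo, hi = bounds[nxt], bounds[nxt + 1]
        out.append(lo + rng.random() * (hi - lo))
        state = nxt
    return out


# Hypothetical 2-state chain: state 0 is 0-5 m/s, state 1 is 5-10 m/s.
cpm = [[0.9, 1.0], [0.1, 1.0]]
synthetic = generate(cpm, [0.0, 5.0, 10.0], 24, seed=1)
print([round(v, 1) for v in synthetic])
```

Because the number of steps is a free parameter, a CPM estimated from one month can be run for the length of a whole season, which is the use made of it in this study.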

#### 3. Results

In order to test the validity of the model, a combination of visual evaluation, comparison of general statistical parameters (descriptive statistics and Weibull distribution parameters), and goodness-of-fit tests was used. In the transition probability matrices of both the first- and second-order MC models, the probability mass is concentrated on and around the diagonal elements. This implies that the next wind speed will most likely be in the same state as the current wind speed and that transitions between distant states are infrequent. General statistical parameters of observed and generated wind speeds for both locations are presented in Tables 1, 2, 3, and 4. Comparison of the descriptive statistics and the Weibull distribution parameters indicates that the simulated values are very close to the observed ones.
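Weibull parameters can be estimated in several ways, and the text does not say which estimator produced the tabulated values. As an assumption, the sketch below uses the empirical (Justus) approximation $k = (\sigma/\bar{v})^{-1.086}$ and $c = \bar{v}/\Gamma(1 + 1/k)$, alongside the descriptive statistics listed in the introduction; `weibull_empirical` and `summary` are hypothetical names.

```python
import math
import statistics


def weibull_empirical(speeds):
    """Weibull shape k and scale c via the empirical (Justus) approximation.

    k = (sigma / mean) ** -1.086 and c = mean / Gamma(1 + 1/k).
    The choice of estimator is an assumption of this sketch."""
    mean = statistics.fmean(speeds)
    sd = statistics.stdev(speeds)
    k = (sd / mean) ** -1.086
    c = mean / math.gamma(1.0 + 1.0 / k)
    return k, c


def summary(speeds):
    """Descriptive statistics plus Weibull k and c for one series."""
    q1, _, q3 = statistics.quantiles(speeds, n=4)
    k, c = weibull_empirical(speeds)
    return {
        "mean": statistics.fmean(speeds),
        "median": statistics.median(speeds),
        "std": statistics.stdev(speeds),
        "variance": statistics.variance(speeds),
        "q1": q1,
        "q3": q3,
        "min": min(speeds),
        "max": max(speeds),
        "weibull_k": k,
        "weibull_c": c,
    }
```

Calling `summary` on an observed series and on each synthetic realization gives side-by-side values of the kind compared in the tables.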

The frequency distributions of observed and generated wind speed time series for both locations are presented in Figures 2 and 3. Both the visual evaluation of the histograms and the quantitative comparison of the Weibull distribution parameters of observed and generated data indicate good agreement between the series for both locations. Even though 30% of the simulations failed the goodness-of-fit tests, the fit of the PDFs is satisfactory. The Ansari-Bradley test was used to test the agreement between the probability distributions of the simulated and actual series (Table 5).

*Figure 2: Frequency distributions of observed and generated wind speed time series, panels (a) to (d).*

*Figure 3: Frequency distributions of observed and generated wind speed time series, panels (a) to (c).*

The Ansari-Bradley test is a nonparametric test that requires the random variables to be mutually independent and to come from continuous populations with equal medians, but it does not require the assumption of a normal distribution. The null hypothesis is that the $X$ and $Y$ variables have the same, unspecified, probability distribution. The alternative hypothesis is that $X$ and $Y$ come from distributions that have the same median and shape but different dispersions [34].
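SciPy provides this test as `scipy.stats.ansari`. For illustration, the dependency-free sketch below computes the Ansari-Bradley statistic (the sum, over the first sample, of the scores $\min(r, N - r + 1)$ for pooled ranks $r$) and a two-sided p-value from the normal approximation. It averages scores over tied values and uses the exact permutation variance of those scores; it is not necessarily the configuration used for Table 5.

```python
import math


def ansari_bradley(x, y):
    """Ansari-Bradley statistic W for sample x and a two-sided normal p-value."""
    m, n = len(x), len(y)
    N = m + n
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Score for 1-based pooled rank r: small at both ends, large in the middle.
    scores = [min(r, N - r + 1) for r in range(1, N + 1)]
    # Average the scores over each run of tied values.
    i = 0
    while i < N:
        j = i
        while j + 1 < N and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = sum(scores[i:j + 1]) / (j - i + 1)
        for t in range(i, j + 1):
            scores[t] = avg
        i = j + 1
    w = sum(s for s, (_, group) in zip(scores, pooled) if group == 0)
    a_bar = sum(scores) / N
    mean_w = m * a_bar
    # Permutation variance of a sum of m scores drawn without replacement.
    var_w = m * n / (N * (N - 1)) * sum((s - a_bar) ** 2 for s in scores)
    if var_w <= 0:
        raise ValueError("all observations are tied; test is undefined")
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return w, p_value
```

A large p-value means the test found no evidence of a dispersion difference between the observed and synthetic series; a small one means their spreads differ.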

Considering that the variability of the simulation output is affected by the stochastic nature of the input wind speed data as well as by the random nature of the MC model itself, the results indicate that the MCMC simulation technique is sufficient to preserve most of the statistical characteristics and stochastic behavior of wind speed time series. The accuracy can be further improved by using a second-order MC model.

The simulation outputs for the 10 m and 50 m heights of the Colorado data were also examined. While the 10 m output fits better than the 20 m output, the 50 m output is a poorer fit than the 20 m output. However, the agreement can be improved by increasing the state size and by considering the shape of the PDF of the observed wind speed when determining the state intervals. These results indicate that one month of wind speed time series is sufficient to generate accurate synthetic wind speed time series for the related season.
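One way to let the shape of the observed PDF determine the state intervals, offered here only as an illustrative assumption, is to place the boundaries at empirical quantiles so that each state holds roughly the same share of the observations. With heavily tied data, adjacent boundaries can coincide, and such empty states would need to be merged before a TPM is built.

```python
import statistics


def quantile_bounds(speeds, n_states):
    """State boundaries at empirical quantiles (equal-probability states).

    Returns n_states + 1 boundaries running from min(speeds) to max(speeds)."""
    cuts = statistics.quantiles(speeds, n=n_states, method="inclusive")
    return [min(speeds)] + cuts + [max(speeds)]


skewed = [0.1 * x * x for x in range(1, 41)]  # hypothetical right-skewed speeds
print(quantile_bounds(skewed, 4))
```

Compared with equal-width intervals, this places narrower states where observations are dense and wider states in the sparse high-speed tail.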

#### 4. Conclusion

The collection of long-term wind speed measurements is essential for the economic evaluation of any wind energy project and hence for its potential to obtain financial support. In some cases, prolonged and continuous data for resource assessment may not be available, and there is a need for tools that provide accurate and reliable simulations of wind speed data based on limited data sets. This creates an opportunity for producing synthesized data sets based on the statistical parameters of a wind regime. In this paper, a limited data set was used to produce synthesized data: using observed wind speed data from selected months, synthetic wind speed data were generated for the related seasons. The comparisons between the actual data and the simulations showed that the statistical characteristics were satisfactorily reproduced. The most important result is therefore that only one month of wind speed data was sufficient to reproduce most of the general statistical characteristics and the stochastic behavior of wind speed time series for the related season. This result suggests that Markov chain models could be used to complete missing data. The study also showed that the models used in this approach are affected by the characteristics of the data set, which manifest themselves mainly in the probability distributions. Taking these probabilistic characteristics into account when discretizing the states of a Markov chain model would provide a better representation of the actual wind pattern. Further study is needed to determine the sensitivity of the simulation outputs to different probability distributions. It is also expected that applying a continuous-time Markov process may improve the accuracy, especially in the reproduction of missing data.

#### Acknowledgments

The authors would like to thank Dr. Ahmet Duran Şahin for providing the wind speed data for Turkey. The authors gratefully acknowledge the financial support for this research which was provided by Agri-Futures Nova Scotia.