#### Abstract

Wind farm siting relies on in situ measurements and statistical analysis of the wind distribution. Current statistical methods rely on distribution functions, and the one known to provide the best fit to the nature of the wind is the Weibull distribution function. It is relatively straightforward to parameterize wind resources with the Weibull function when the distribution fits what the function represents, but the estimation process gets complicated if the wind is diverse in terms of speed and direction. In this study, data from a 101 m meteorological mast were used to test several estimation methods. The available data display seasonal variations, with low wind speeds in different seasons and effects of moderately complex surroundings. The results show that the maximum likelihood method is much more successful than the industry-standard WAsP method when diverse winds with a high proportion of low wind speeds occur.

#### 1. Introduction

One of the first and most crucial steps for wind farm investment is to investigate the characteristics of the local wind resources. The aim is to select the optimum locations for the turbines, maximizing income and minimizing cost. The analysis most commonly relies on in situ measurements collected through conventional meteorological masts (met. masts) [1, 2]. Recently, remote sensing devices with vertical wind profiling capabilities have also been accepted by the market, but these devices have limitations, mostly related to the complexity of the terrain [3–6]. Therefore, spatial modeling based on point measurements is practically the only acceptable methodology for the wind site assessment step. A conventional met. mast is usually erected at a location that is considered representative of the area of interest. More than one met. mast may be necessary if the wind characteristics are particularly variable in an area. Cup anemometers, vanes, sonics, and atmospheric sensors are used as measurement devices. It is important that data collection has a high recovery rate in order to accurately capture the wind resources, and a rate of 90% is ideally preferable. The final aim is to use the collected data for creating sector wise and generalized wind statistics for the measurement location. The generalized wind statistics can be used to create regional wind climates, also known as wind atlases. Since the atlas represents the larger domain, one can statistically transfer the in situ measurements to the desired wind turbine locations. After several iterations of the process, the best production numbers are calculated, which will maximize the income of the wind farm.

For many years, wind data have been represented by the Rayleigh distribution function [7, 8]. Although the two functions are similar and come from the same statistical methodology, for roughly three decades the Weibull distribution function has been the choice of experts, mainly because it is considered to be more representative of the natural wind characteristics [9]. One of the first studies that used the Weibull function as the main statistical methodology was the European Wind Atlas [10], which led to the industry-standard Wind Atlas Methodology. The Weibull distribution function (equation (1)) can be presented as a cumulative distribution function (CDF) (equation (2)) or a probability density function (PDF) (equation (3)), where *u* is the wind speed, *A* is the scale parameter (m/s), and *k* is the unitless shape parameter [11].
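For reference, the two-parameter Weibull distribution is conventionally written as below. This is a sketch in standard textbook notation, with the symbol $u$ assumed here for the wind speed; the layout may differ from the paper's own equations (2) and (3).

```latex
F(u) = 1 - \exp\left[-\left(\frac{u}{A}\right)^{k}\right],
\qquad
f(u) = \frac{\mathrm{d}F}{\mathrm{d}u}
     = \frac{k}{A}\left(\frac{u}{A}\right)^{k-1}
       \exp\left[-\left(\frac{u}{A}\right)^{k}\right],
\qquad u \ge 0,\; A > 0,\; k > 0.
```

Setting $k = 2$ recovers the Rayleigh distribution, which is why the two functions are described as closely related.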

The common way of using the distribution function is to fit the measured wind speed frequency data to the probability density function, which gives the relationship between the wind speed and its frequency in terms of the two-parameter Weibull distribution. Thereafter, one can define the observational climate of the location by recording only two characteristics, *A* and *k*.

Although the method looks straightforward, there are known problems. The most important one is diverse wind conditions occurring with high frequency during the year, which add extra uncertainty to the computed fitting parameters; this is the case for all methods in use [12]. In this study, a dataset with a high recovery rate from a semicomplex terrain at 101 m height was used to compare different estimation methods of the Weibull parameters. The aim was to compare the most widely used methods at a height that is representative of common wind turbines.

In the following sections, the experimental setup and the available wind data are presented first, followed by a discussion of distribution functions for wind energy use, specifically the Weibull distribution, and of the five estimation methods applied to the data. Finally, the results are compared and discussed.

#### 2. Experimental Setup and General Wind Characteristics

Izmir Institute of Technology (IZTECH) is located on the west coast of Turkey, around 40 km to the west of the city of Izmir at the village of Urla. The peninsula surrounding the area has more than 20% of the total installed wind power capacity of Turkey. In August 2017, a 101 m mast was erected on the university campus at N38.3332° E26.6326°, with several instruments mounted (Table 1). The purpose of the new met. mast is to support academic studies on wind speed distributions, atmospheric stability, and short-term predictions in complex terrain. Therefore, the mast is also equipped with two 3D sonic anemometers and with measurement devices at different heights.

Data collection started on the 1st of August 2017 and the campaign is ongoing. The best recovery rate, over 99.9% for a full year of data, is available between the 21st of December 2017 and the 21st of December 2018, totaling 52191 10-minute samples. The available data were filtered with two rules: (i) samples are kept only if the standard deviation of the channel is greater than zero and (ii) wind speeds below the calm threshold of 0.3 m/s for the cup anemometers and 0.4 m/s for the vanes are ignored. After filtering, the final recovery rate for the channels dropped to 99%. The data can therefore be considered to be of high quality in terms of availability. The air density was measured to be 1.19 kg/m^{3} on average and changes by only ±0.01 kg/m^{3} throughout the day in the monthly and yearly averages but can occasionally reach 1.01 kg/m^{3} in an individual 10-minute sample.

The location of the met. mast was chosen for research purposes (Figure 1), such that each wind sector has a different combination of terrain and roughness classifications. The main wind direction is the first sector, S1, where the northerly wind occurs more than 45% of the time. S1 is located 5 km away from the coastline and has a flat terrain at 50 m a.s.l., covered with grass. S2 is only 1 km away from the coastline and is under the influence of the sudden roughness change. S3, S4, and S5 are occupied by a small village with narrow streets, where the tallest houses are approximately 6.5 m high. The village is built parallel to the coastline and the sea is nearly 1.25 km from the met. mast at its closest location. S6 and S7 are occupied by the IZTECH campus, with the buildings closest to the mast being 4 m high at maximum, while further away (> 12 m distance), there are taller but more sparsely located university buildings. A hill parallel to the coastline with an average height of 350 m a.s.l. covers the grounds of S8–S12. The vegetation of these sectors is characterized by low bushes (max. 30 cm) mixed with grassland. The only unique sector within this range is sector S9, where five 3 MW wind turbines are located more than 5 km away from the mast. Due to the long distance, it is assumed by the author that the met. mast is not under the influence of these turbines (Table 2).

Vertical wind profiles were calculated with all the data and the yearly statistics show an almost logarithmic vertical wind speed profile except for the minor speed up at 30 m height (Figure 2(a)). The wind directional turn shows minor terrain effects above 50 m where the averaged wind direction becomes nearly stable (Figure 2(b)). All five wind speed channels were also analyzed for diurnal statistics, which demonstrate that the measurement location is characterized by an almost laminar flow at night and quite unstable conditions during daytime (Figure 3). Based on the yearly averaged wind speed values, the omnidirectional wind shear was calculated with the power law function (equation (4)) as 0.18, being within the safety limits of 0.0 and 0.2 as the IEC 61400-1 standard suggests. The dominant northerly wind direction’s wind shear is even lower, at 0.09. The wind sectors from the urban areas, S3 to S5, display wind shear and turbulence values above the acceptable ranges; it must be noted, however, that few data points were available for these sectors, hindering the wind shear calculations, as discussed later in the manuscript.
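The power-law relation behind the wind shear value can be sketched as follows. This is a minimal illustration, assuming the common form U2/U1 = (z2/z1)^α for equation (4); the function names and the input speeds are hypothetical and are not measurements from the mast.

```python
import math


def shear_exponent(u_low, z_low, u_high, z_high):
    """Power-law exponent alpha from mean wind speeds at two heights."""
    return math.log(u_high / u_low) / math.log(z_high / z_low)


def extrapolate_speed(u_ref, z_ref, z_target, alpha):
    """Mean wind speed at z_target from a reference speed and height."""
    return u_ref * (z_target / z_ref) ** alpha


# Hypothetical yearly mean speeds (m/s) at 30 m and 101 m.
alpha = shear_exponent(u_low=6.0, z_low=30.0, u_high=7.4, z_high=101.0)
print(f"alpha = {alpha:.3f}")
```

With more than two measurement heights, the exponent is often obtained instead from a straight-line fit of ln(U) against ln(z).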

Monthly statistics of the top-mounted anemometer, WS101, show high wind speeds during the fall and winter periods, while the averaged wind speeds are lower in the summer and spring, as expected due to the local weather conditions (Table 3). Wind data were split into 12 equal wind direction sectors, based on the vane measurements at 98 m coupled with the top-mounted cup anemometer at 101 m. The resulting directional wind speed statistics show the highest energy density in sector 1 and sector 7, which are centered at 0° and 180°, respectively (Table 4).
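The sector split described above can be sketched as follows, assuming 12 sectors of 30° with sector 1 centered at 0° (so that sector 7 is centered at 180°). The function names and the sample directions are hypothetical.

```python
from collections import Counter


def sector_of(direction_deg, n_sectors=12):
    """Return the 1-based sector index; sector 1 is centered at 0 degrees."""
    width = 360.0 / n_sectors
    # Shift by half a sector so that 345..15 degrees falls into sector 1.
    return int(((direction_deg + width / 2.0) % 360.0) // width) + 1


def sector_frequencies(directions, n_sectors=12):
    """Relative frequency of samples in each sector."""
    counts = Counter(sector_of(d, n_sectors) for d in directions)
    total = len(directions)
    return {s: counts.get(s, 0) / total for s in range(1, n_sectors + 1)}


# Hypothetical vane readings in degrees.
freqs = sector_frequencies([352.0, 3.5, 10.0, 44.0, 181.0, 178.0, 270.0, 5.0])
print(freqs[1], freqs[7])
```

Each 10-minute wind speed sample is then assigned to the sector of its paired direction reading before the sector wise statistics are computed.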

#### 3. Distribution Analysis

In the last decade, several other distribution functions have been proposed, among which the most commonly used in the literature seem to be the following: Rayleigh, Weibull, Lognormal, Gamma, Pearson, Kappa, Erlang, and Gumbel, or bimodal combinations of these [13–17]. Among the referenced studies, the general conclusion is that the two- or three-parameter Weibull distribution is the best distribution function to describe the wind characteristics [11, 18]. This is also reflected in industry applications, where any major wind energy or wind field modeling tool employs the two-parameter Weibull function (e.g., WAsP and WindPRO).

In the current study, the selected dataset is analyzed with the sector wise Weibull distribution function. The two-parameter Weibull function is applied to the dataset split by wind direction into sectors of 30° each. The relationship between the mean wind speed *U* and the Weibull parameters is given in equation (5), where *A* is the scale and *k* is the shape parameter, as in equation (1). Power density, P_{d}, can also be computed from the third moment of the distribution (see equation (6)). Gamma (Γ) is the Euler Gamma Function (equation (7)).
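The moment relations can be illustrated with a short sketch. It assumes the standard results U = A·Γ(1 + 1/k) for the mean and P_d = ½·ρ·A³·Γ(1 + 3/k) for the power density; the factor ½·ρ and the function names are assumptions of this sketch and may not match the exact form of equations (5) and (6). The default air density is the site average quoted earlier in the text.

```python
import math


def weibull_mean(A, k):
    """Mean wind speed of a Weibull distribution (first moment)."""
    return A * math.gamma(1.0 + 1.0 / k)


def weibull_power_density(A, k, rho=1.19):
    """Power density in W/m^2 from the third moment of the distribution."""
    return 0.5 * rho * A ** 3 * math.gamma(1.0 + 3.0 / k)


# Hypothetical parameters: A = 8 m/s, k = 2 (the Rayleigh case).
print(weibull_mean(8.0, 2.0), weibull_power_density(8.0, 2.0))
```

Because the power density scales with the cube of *A*, small errors in the fitted scale parameter are amplified in the energy estimate.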

Several different methods have been studied extensively in the past by the wind energy community. A review of the literature shows a great variety of formulations for estimating the Weibull distribution parameters (Table 5). The literature includes the Empirical Method (EM), Power Density Method (PDM), Graphical Method (GM), Maximum Likelihood Method (MLM), Modified (Weighted) Maximum Likelihood Method (MMLM), Moment Methods (MM), and Least Square Method (LSM). It is also observed that the Wind Atlas Analysis and Application Program (WAsP) (official website: http://www.WAsP.dk) method has limited use in such studies. Most of the studies using the WAsP method through the WAsP software do not compare it to other methods [34, 35], but see [36]. In the current study, a selection of methods from the literature was used for the estimation of the Weibull parameters. Methods were eliminated on the basis of their similarities. Between the similar methods EM and PDM and between GM and LSM, the EM and the LSM were chosen, respectively. Moment methods have similarities with WAsP; therefore they are not included. Five methods are shortlisted for the study: EM, MLM, MMLM, LSM, and WAsP.

##### 3.1. Empirical Method (EM)

One of the most well-known and simple methods for estimating the Weibull parameters is the Empirical Method (EM), derived from the power density function [25, 27, 29, 30, 32]. The energy factor, E_{f}, is calculated as the ratio between the mean of the cubes of all wind speeds and the cube of the mean wind speed (equation (8)). After E_{f} is calculated, one can easily use a derived numerical solution for *k* and *A* (equation (9)). The simplicity of the equation makes it very useful for initial calculations, but it has also been observed that the fit is not as good as that of other methods when there are low wind speeds and high turbulence, in which case the measured distribution is not a smooth curve. Nevertheless, it was chosen for this study as a reference formulation in order to explore the possible differences with more advanced methods.
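The energy-factor calculation can be sketched as follows. The relation k ≈ 1 + 3.69 / E_f² is the approximation commonly quoted in the literature for energy-factor-based estimation and is assumed here to stand in for equation (9), which is not reproduced in this text; the scale parameter then follows from the mean wind speed. Function names are hypothetical.

```python
import math


def energy_factor(speeds):
    """Ratio of the mean cubed wind speed to the cube of the mean speed."""
    n = len(speeds)
    mean = sum(speeds) / n
    mean_cube = sum(u ** 3 for u in speeds) / n
    return mean_cube / mean ** 3


def fit_energy_factor(speeds):
    """Estimate (A, k) from the energy factor; assumed form of equation (9)."""
    ef = energy_factor(speeds)
    k = 1.0 + 3.69 / ef ** 2            # commonly quoted approximation
    mean = sum(speeds) / len(speeds)
    A = mean / math.gamma(1.0 + 1.0 / k)  # invert U = A * Gamma(1 + 1/k)
    return A, k
```

No iteration is needed, which is what makes this approach attractive for a first estimate.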

##### 3.2. Maximum Likelihood Method (MLM)

Another well-known and widely used method is the Maximum Likelihood Method (MLM) [20–29]. In MLM, the *k* parameter is found by iterating equation (10) over the *n* samples, starting from an initial shape parameter of *k* = 2. After *k* is found within the desired limits with equation (11), the scale parameter *A* can be computed with equation (12).
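A minimal sketch of the iteration is given below, assuming the usual maximum likelihood relations for the two-parameter Weibull distribution: k = [Σ uᵏ ln u / Σ uᵏ − (1/n) Σ ln u]⁻¹ and A = [(1/n) Σ uᵏ]^(1/k). The averaging of successive iterates is a damping choice made in this sketch for stability and is not taken from the source; the function name is hypothetical. All samples must be strictly positive and not all identical.

```python
import math


def fit_mlm(speeds, k0=2.0, tol=1e-8, max_iter=500):
    """Estimate (A, k) by damped fixed-point iteration of the MLE equation."""
    n = len(speeds)
    logs = [math.log(u) for u in speeds]  # requires u > 0
    mean_log = sum(logs) / n
    k = k0
    for _ in range(max_iter):
        powers = [u ** k for u in speeds]
        ratio = sum(p * l for p, l in zip(powers, logs)) / sum(powers)
        k_new = 1.0 / (ratio - mean_log)
        k_next = 0.5 * (k + k_new)        # damping (assumption of this sketch)
        converged = abs(k_next - k) < tol
        k = k_next
        if converged:
            break
    A = (sum(u ** k for u in speeds) / n) ** (1.0 / k)
    return A, k
```

Because every sample enters the sums directly, no binning of the wind speeds is required for this method.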

##### 3.3. Modified Maximum Likelihood Method (MMLM)

There is an alternative version of MLM in which the wind frequencies are modified [20, 21, 26, 27, 30, 32]. The method is mostly preferred when there is a large amount of missing data. All wind frequency values are weighted based on the available data; therefore the method is called the Modified Maximum Likelihood Method (MMLM) (it is sometimes also called the Weighted Maximum Likelihood Method in the literature). Equation (11) is rewritten with the addition of the frequency, knowing that all the samples are above 0 m/s (equation (13)). The same iteration as in MLM is applied, and the modified scale parameter A_{m} is calculated as in equation (14).
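The frequency-weighted variant can be sketched on binned data as follows. The sketch assumes the form usually given for this method, in which each bin center u_i is weighted by its relative frequency f(u_i) and the sums are normalized by f(u ≥ 0); the 0.5 m/s bin width follows the binning mentioned later in the text, and the function names and the damping step are assumptions of this sketch.

```python
import math
from collections import Counter


def bin_frequencies(speeds, width=0.5):
    """Return bin centers and relative frequencies for fixed-width bins."""
    counts = Counter(int(u // width) for u in speeds)
    n = len(speeds)
    indices = sorted(counts)
    centers = [(i + 0.5) * width for i in indices]
    freqs = [counts[i] / n for i in indices]
    return centers, freqs


def fit_mmlm(centers, freqs, k0=2.0, tol=1e-8, max_iter=500):
    """Estimate (A_m, k) from bin centers and their relative frequencies."""
    total = sum(freqs)  # f(u >= 0); equals 1 when every sample is binned
    logs = [math.log(u) for u in centers]
    mean_log = sum(f * l for f, l in zip(freqs, logs)) / total
    k = k0
    for _ in range(max_iter):
        w = [f * u ** k for f, u in zip(freqs, centers)]
        ratio = sum(wi * l for wi, l in zip(w, logs)) / sum(w)
        k_next = 0.5 * (k + 1.0 / (ratio - mean_log))  # damped update
        converged = abs(k_next - k) < tol
        k = k_next
        if converged:
            break
    A = (sum(f * u ** k for f, u in zip(freqs, centers)) / total) ** (1.0 / k)
    return A, k
```

Working from a frequency table rather than from raw samples is what allows the frequencies to be re-weighted when part of the record is missing.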

##### 3.4. Least Square Method (LSM)

A less preferred method, but one that is known to be more accurate for diverse frequency distributions, is the Least Square Method (LSM) [21, 23, 30, 37]. The Weibull function is transformed into a linear function of the form “*y* = Gain · *x* + Offset.” To make this transition, the natural logarithm of both sides of equation (2) is taken twice, which leads to the desired format as in equation (15). The left side of the equation can be computed from the cumulative frequencies of the wind speed variable, and a fit algorithm can be used to calculate *k* and *A*.
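The linearization can be sketched as follows, assuming the standard transformed form ln(−ln(1 − F(u))) = k·ln(u) − k·ln(A), so that the gain is *k* and the offset is −k·ln(A). The cumulative frequencies are evaluated at the upper edges of 0.5 m/s bins, and an ordinary unweighted straight-line fit is used; both are choices of this sketch, and the function names are hypothetical.

```python
import math


def empirical_cdf(speeds, width=0.5):
    """Upper bin edges and cumulative relative frequencies at those edges."""
    n = len(speeds)
    counts = [0] * (int(max(speeds) // width) + 1)
    for u in speeds:
        counts[int(u // width)] += 1
    edges, cum, running = [], [], 0
    for i, c in enumerate(counts):
        running += c
        edges.append((i + 1) * width)
        cum.append(running / n)
    return edges, cum


def fit_lsm(edges, cum_freq):
    """Estimate (A, k) by a straight-line fit in the transformed coordinates."""
    # Points with F = 0 or F = 1 cannot be transformed and are skipped.
    pts = [(math.log(u), math.log(-math.log(1.0 - F)))
           for u, F in zip(edges, cum_freq) if u > 0.0 and 0.0 < F < 1.0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    k = sxy / sxx                  # gain
    offset = my - k * mx           # offset = -k * ln(A)
    return math.exp(-offset / k), k
```

Note that an unweighted fit gives every bin edge the same influence, including sparsely populated bins at the tails of the distribution.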

##### 3.5. WAsP Weibull Method

This method uses a different approach for estimating the Weibull parameters; it assumes that the wind speeds above the mean value are most likely to create the maximum power. Therefore, Weibull fitting that uses only the data above the mean wind speed value is considered to be more realistic by the method’s developers [38]. Nevertheless, that does not mean it is more effective for diverse frequency distributions; therefore it has been selected as one of the methods to be studied. The reference documentation about the method is not detailed, but the WAsP method is one of the most used methodologies through the WAsP software family.

The first step in the method is to define a parameter that gives the probability of the wind speeds above the mean value, which can be calculated through the cumulative distribution function (CDF) with the Weibull parameters (equation (2)). If the mean wind speed is applied to the function, the cumulative value will be the total sum of the probabilities of the wind speeds below the mean value; therefore 1 − *F*(*U*) becomes the proportion of the values above the mean value, *U*_{p} (equation (16)).

When the natural logarithm of both sides of the equation is taken (equation (17)), one can write the parameter *A* as a function of *k* or vice versa and calculate both in two steps. The scale parameter *A* can be written as a function of *k* through the power density function (equation (6)), which is set equal to the mean of the cubed wind speed samples (equation (18)). If parameter *A* is singled out, it can be written as a function of *k* (equation (19)). If equations (17) and (19) are merged as in equation (20), one can use iterative numerical steps to solve for *k* and place the result in equation (19) to calculate *A*.
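The two conditions described above can be sketched numerically as follows. This is one reading of the procedure in the text, not the implementation used in the WAsP software: the scale is taken from A(k) = [mean(u³) / Γ(1 + 3/k)]^(1/3), and *k* is found by bisection so that the fitted probability of exceeding the observed mean, exp(−(U/A)ᵏ), matches the observed proportion *U*_{p}. The bracket limits and the function name are assumptions of this sketch.

```python
import math


def fit_wasp_like(speeds, k_lo=0.5, k_hi=10.0, tol=1e-10):
    """Estimate (A, k) by matching the third moment and the share above the mean."""
    n = len(speeds)
    mean = sum(speeds) / n
    mean_cube = sum(u ** 3 for u in speeds) / n
    frac_above = sum(1 for u in speeds if u > mean) / n  # observed U_p

    def scale(k):
        # Scale that reproduces the observed mean cubed speed for a given k.
        return (mean_cube / math.gamma(1.0 + 3.0 / k)) ** (1.0 / 3.0)

    def residual(k):
        # Fitted minus observed probability of exceeding the mean speed.
        return math.exp(-((mean / scale(k)) ** k)) - frac_above

    lo, hi = k_lo, k_hi
    if residual(lo) * residual(hi) > 0.0:
        raise ValueError("no sign change in the bracket; widen k_lo/k_hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    return scale(k), k
```

Because the third moment is matched exactly, the fitted distribution reproduces the measured power density by construction, while the low-speed part of the histogram has little influence on the result.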

Omnidirectional data and grouped sectors were used in the analysis (see Table 2). Sectors were grouped based on their common roughness and obstacle types. Group I includes sectors 1 and 2, with the northerly winds and the highest number of samples. The urban area located in sectors 3, 4, and 5 constitutes Group II, which has the lowest number of samples and a total energy density calculated to be close to zero. Therefore, the results for Group II might be misleading; nevertheless they are still presented for the sake of completeness. The university zone, sectors 6 and 7, constitutes Group III, and the sectors covered with hills, from 8 to 12, form Group IV. The collected data were analyzed in wind speed bins of 0.5 m/s and all available data for the selected sectors were processed for the calculation.

#### 4. Results

In order to understand the accuracy of a Weibull estimation method, most studies treat the question as one of statistical error and calculate the root mean square error (RMSE) based on the measured data (equation (21)), where *y*_{i} is the measured value and *x*_{i} is the value calculated from the Weibull parameters. However, this measure can be misleading because it places the same importance on every range of wind speed and the total sum of errors is not weighted by the possible energy density. Therefore, in the current study, the accuracy is also evaluated through the power density function because this is the real effective difference between estimation methods when it comes to calculating the wind energy production. It is common to calculate the power density based on the two datasets as in equation (22), where *ρ* is the air density and *f*_{m} is the frequency for the given wind speed range. The values can be used to derive an error percentage value, *ε* (equation (23)), compared to the measured value.
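The two error measures can be sketched as follows. The sketch assumes RMSE = √[(1/n) Σ (y_i − x_i)²], a binned power density P_d = ½·ρ·Σ u³·f_m, and a signed percentage error ε = 100·(P_model − P_measured) / P_measured; the sign convention and the function names are assumptions, since equations (21) to (23) are not reproduced in this text.

```python
import math


def rmse(measured, modeled):
    """Root mean square error between measured and modeled frequencies."""
    n = len(measured)
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(measured, modeled)) / n)


def power_density_from_bins(centers, freqs, rho=1.19):
    """Power density in W/m^2 from bin-center speeds and relative frequencies."""
    return 0.5 * rho * sum(f * u ** 3 for u, f in zip(centers, freqs))


def power_density_error(p_model, p_measured):
    """Signed error of the modeled power density in percent."""
    return 100.0 * (p_model - p_measured) / p_measured
```

The cubic weighting in the power density means that a misfit at high wind speeds changes *ε* far more than an equally large misfit at low wind speeds, whereas the RMSE treats both alike.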

The omnidirectional results (Figure 4 and Table 6) show a minor error in power density prediction, as low as ±3-4% for the methods EM, MLM, and WAsP, which is already within the limits of statistical uncertainty [39], while the frequency levels show that almost 25% of the whole dataset is below 3 m/s, which is the cut-in wind speed for most turbines. When the sector wise results are examined (Figures 5–7 and Table 6), it is also seen that low wind speeds below 3-4 m/s are common and can affect the calculations for all the groups. Among the compared methods, MLM appears to be the best performing overall, with a maximum error of 3% in power density calculations in the sectors without urban areas. The WAsP method gives the second lowest percentage error, arriving at results similar to those of EM, with the difference being that the WAsP method has a better fit for wind speeds close to the mean level, which is intentional by the design of the methodology, as described in subsection “WAsP Weibull Method.” MMLM and LSM produce high error percentages in nearly every sector and even in the omnidirectional fit. RMSE and *R*^{2} values show an inverse relationship, as expected: when the RMSE increases, the *R*^{2} parameter decreases, and vice versa. However, there is no clear relationship between these statistical measures and the calculated power density estimation errors.

#### 5. Conclusion

In this study, wind speed data in the form of 10-minute statistics from a 101 m met. mast with top-mounted cup anemometers were used to compare different methods for the estimation of Weibull parameters. The measurement period was from 27 December 2017 to 27 December 2018, that is, one full year of data with over 99% recovery rate. A high proportion of the dataset (over 20%) consisted of low wind speeds, which are known to cause difficulties in the parameter estimation, and this is the main driver that motivated the study. The vertical wind characteristics from the met. mast show that the measurement location has wind shear within the limits of the industry standards. Basic statistics by month and for 12 equal wind sectors were calculated. The sectors are divided into four groups (Figure 1). The results leading to the conclusion were produced with data from these groups. Nevertheless, Group II does not have enough data to compare the methods; therefore, it is excluded from the results.

The omnidirectional and sector wise calculations for five different estimation methods were tested and the results are presented with the observed statistics. The distribution fits were analyzed through root mean square errors (RMSE) and *R*^{2} parameters in addition to the power density error function (*ε*). The results show that the Maximum Likelihood Method (MLM) had the best performance for the dataset. A clear link has not been found between *ε* and RMSE and/or *R*^{2}, contrary to several previous studies, where conclusions on performance were made based on these statistical parameters. A selected list of these referenced studies is given in Table 5. The other estimation methods tested showed high uncertainty and higher power density calculation errors. These two results lead to the conclusion that power density error estimation is the most effective way of checking the distribution fit quality of an estimation method, even though similar studies focus only on the statistical terms of RMSE and/or *R*^{2}. Based on the results of Group III, it is evident that a low number of samples combined with low wind speeds causes high uncertainty and deviation in any method.

Several wind data analysis tools in the world use the WAsP Weibull fitting method as the core estimation method for Weibull parameters through the WAsP software. Nevertheless, it is demonstrated that, in the case of diverse winds, the Maximum Likelihood Method is much more stable and has a better correlation with measured data. For these types of datasets, it is advisable to create observational wind statistical data by means of MLM and apply the Weibull parameters to the model.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this article.

#### Acknowledgments

This project has been supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) (Grant no. 215M384).