Abstract

This paper presents the development of a wind power forecasting model based on gene expression programming (GEP) for one of the major wind farms in Sri Lanka, Pawan Danavi. With the ever-increasing demand for renewable power generation, Sri Lanka has started harnessing electricity from wind power. Though the initial establishment cost of wind farms is high, analyses have shown that wind power generation is economically sustainable in the long term. In this context, forecasting the wind power generation at Sri Lankan wind farms is important in many ways. However, limited research has been carried out in Sri Lanka to predict wind power generation under a changing climate. Therefore, to address this research gap, a model was developed to forecast wind power generation from two climatic factors, viz. on-site wind speed and ambient temperature. The results showcased the robustness and accuracy of the proposed GEP-based forecasting model (with R² = 0.92, index of agreement = 0.98, and RMSE = 259 kW). Moreover, the results of the study were compared against three different forecasting models and found to be comparable in terms of model accuracy. The GEP-based model is advantageous over machine learning techniques because it yields an explicit mathematical expression relating the wind power to the weather indices. As an acceptable relationship was found between wind power generation and climatic factors, the proposed model facilitates the future projection of wind power generation from forecasted climatic factors. Though the application of GEP in the field of wind power generation is reported in a few research publications, this is the first research in which GEP is employed to model the power generation with respect to weather indices.

1. Introduction

It is projected that the world energy demand will increase by 4–5% in 2021 [1]. Though the world still mainly relies on fossil fuel and coal [2, 3], nonrenewable power generation creates many environmental issues. Although the transition to renewable energy generation is being adopted by many countries, catering to the ever-increasing energy demand solely through renewable energy is challenging because of the volatile and unpredictable nature of renewable energy sources. In this context, sustainable or renewable energy generation using solar, wind, hydro, geothermal, biomass, and marine waves is widely researched.

Wind power is one of the most economical and environmentally friendly energy sources in the world [4], though the initial establishment cost is comparatively high. Nevertheless, the operation and maintenance costs are lower [5]. In 2020, 24.8% of the United Kingdom’s electricity supply was generated by wind, second only to natural gas among the energy sources [6]. Many countries review their potential in wind power generation not only for economic reasons but also because of its environmental friendliness [7–9]. The wind power potential was investigated in the Sri Lankan context as well [10], and wind farms are being constructed in some of the identified areas. The present wind power capacity of the country is around 230 MW, and the establishment of more wind farms is in progress to expand renewable energy generation. Therefore, forecasting wind power generation in Sri Lanka is highly useful for the power utilities, policymakers, and the government.

Researchers have presented power prediction models that use artificial neural networks (ANN) to forecast the electricity generation of wind farms in Sri Lanka [11, 12]. Even though those prediction models are highly accurate, their black box nature is one of the major drawbacks of machine learning-based models [13, 14]. By contrast, Ekanayake et al. [11] have presented the use of multiple linear regression (MLR) and power regression (PR) for developing a relationship between wind power generation and the weather indices of wind speed and ambient temperature. They compared their work to that of Peiris et al. [12] and showcased the advantage over ANN of obtaining direct and presentable formulae to predict wind power generation. More importantly, the literature showcases a handful of related studies in which genetic programming (GP) techniques were used [15–17]. GP is a predictive tool based on artificial intelligence that develops a program and generates a computer-based model to find the optimized solution [18]. Though both GP and genetic algorithm (GA) were developed based on Darwin’s natural selection, the way they represent solutions is different. GA presents the solution as strings of bits called chromosomes, while GP presents the solution as nonlinear entities of different shapes and sizes [19]. The solution of GP is in the form of a parse tree of varying size and shape, making it a versatile approach for prediction problems.

By contrast, this paper presents a forecasting model developed based on gene expression programming (GEP) to predict the electricity generation of a wind power farm. GEP is an emerging technique widely used in forecasting time series variables [20]. It is transparent and mathematically expresses the related nonlinear functions used for forecasting. As per the literature, GEP is increasingly applied in a wide range of real-world applications due to its high effectiveness and efficiency [21, 22].

Abbas et al. [23] have investigated the droughts in Urmia Lake in Iran using GEP with delays of several months. High prediction accuracy was found for GEP when the results were compared against the drought indices. Mehdizadeh et al. [24] have carried out a similar study for six locations in Turkey. However, they used a hybrid wavelet GEP model for the predictions and showcased its superiority over conventional GEP. In addition, Karimi et al. [25] have used wavelet-based GEP to predict the short-term and long-term streamflow in the Filyos River, Turkey. They found mixed results for their comparison models using ANN, adaptive network-based fuzzy inference system (ANFIS), and GEP.

Researchers have therefore used different models to compare their suitability and adaptability to many real-world problems. GEP has been extensively used in civil and structural engineering applications to predict structural performance. Formulations for the unconfined compressive strength and elastic modulus of clay soil with bottom ash were presented using GEP modeling [26]. Güllü’s study thus showcased the potential of GEP as a tool for developing formulations and functions for nonlinear variations. A similar study was carried out by Onyelowe et al. [27] to determine several important soil parameters (California bearing ratio and unconfined compressive strength) for expansive soils treated with improved composites of rice husk ash. Their findings showcased impressive results using GEP. In addition, Kalop et al. [28] and Iqbal et al. [29, 30] have investigated the use of GEP to predict the degradation of tensile strength of glass fiber reinforced polymer rebars in reinforced concrete. They showcased the applicability of GEP in structural engineering applications. However, Iqbal et al. [29] revealed that a higher accuracy can be achieved using ANFIS compared with ANN and GEP. Because the literature shows mixed results, case-by-case analysis is highly important.

Furthermore, applying GEP alongside artificial neural network (NN) models and autoregressive integrated moving average (ARIMA) models to predict oil prices, Mostafa and El-Masri [31] revealed that GEP outperforms traditional statistical techniques. Thus, irrespective of the discipline, GEP has been successfully used in applied research.

Even though great success was achieved in applying GEP in regression, classification, automatic model design, combinatorial optimization, and real parameter optimization [32], GEP has been scarcely used in the field of wind power generation [33]. In a related field, GEP has produced better outcomes than multiple linear regression, ANN, and genetic programming-simulated annealing in the prediction of electricity demand [34]. Nonetheless, the suitability of GEP is yet to be explored in forecasting wind power generation using meteorological factors. Addressing that research gap, this paper presents the development of a GEP-based wind power forecasting model for the Pawan Danavi wind farm in Sri Lanka. The ultimate goal of forecasting electricity generation is achieved by applying the time-dependent variables of wind speed and ambient temperature at the wind farm.

2. Gene Expression Programming

Gene expression programming comes under the umbrella of evolutionary algorithms and was first introduced by Ferreira [35]. In GEP, a computer program is encoded in one or multiple genes, and GEP is therefore a modified version of GP. Candidate solutions are represented as fixed-length linear strings, which are later expressed as parse trees (GEP expression trees) of different sizes and shapes [35]. Each chromosome of GEP contains a fixed-length list of symbols drawn from a function set (for example, arithmetic operations) and a terminal set (variables and constants). Each gene has two parts, the “head” and the “tail.” Heads are formed by functions and terminals, whereas tails are formed only by terminals.
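The head and tail layout can be illustrated with a minimal sketch. In Ferreira's formulation, a head of length h combined with functions of at most n_max arguments requires a tail of length t = h(n_max − 1) + 1, which guarantees that every gene decodes to a valid expression tree. The function set, the symbol names, and the example gene below are illustrative assumptions and are not taken from the model developed in this paper.

```python
import math
import operator

# Illustrative function set (an assumption): symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "Q": (1, math.sqrt),  # square root as a one-argument example
}


def tail_length(head_length, max_arity):
    """Tail length that keeps every gene syntactically valid."""
    return head_length * (max_arity - 1) + 1


def evaluate_gene(gene, terminals):
    """Decode a gene level by level into a tree and evaluate it."""
    arities = [FUNCTIONS[s][0] if s in FUNCTIONS else 0 for s in gene]

    # Step 1: attach children to each symbol in reading order.
    children = []
    next_free = 1  # index of the next symbol not yet attached to a parent
    for position, arity in enumerate(arities):
        if position >= next_free:
            break  # the remaining symbols are non-coding
        children.append(list(range(next_free, next_free + arity)))
        next_free += arity

    # Step 2: evaluate from the last coding symbol back to the root.
    values = [0.0] * len(children)
    for position in reversed(range(len(children))):
        symbol = gene[position]
        if symbol in FUNCTIONS:
            func = FUNCTIONS[symbol][1]
            values[position] = func(*(values[c] for c in children[position]))
        else:
            values[position] = terminals[symbol]
    return values[0]
```

For example, with a head of length 4 and two-argument functions, the tail has 5 symbols. The gene `Q*-+abcda` (head `Q*-+`, tail `abcda`) decodes to sqrt((a − b) × (c + d)), and its final symbol is non-coding.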

The functions, terminals, cost function, control parameters, and termination criteria are considered to be the main components of GEP. GEP generates a random initial population and converts each individual into an expression tree to represent the solution in the form of a mathematical expression [35]. The predicted values are then compared with the targets, and the fitness value of each individual is determined. Individuals are selected by roulette wheel sampling, and the fittest individuals are passed to the next generation. This process is repeated until a chromosome with the required fitness is obtained, at which point the modeling is terminated (Figure 1).
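The selection step described above can be sketched as follows. This is a minimal illustration, assuming that fitness values are non-negative and that higher is better (an error measure such as RMSE would first need a transformation, for example 1/(1 + RMSE), which is an assumption here). The function names are hypothetical, and the genetic operators (mutation, transposition, recombination) are omitted.

```python
import random


def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    spin = rng.uniform(0.0, total)
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if spin < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


def next_generation(population, fitnesses, rng=random):
    """Copy the best individual unchanged and fill the rest by roulette wheel."""
    best = max(zip(fitnesses, population), key=lambda pair: pair[0])[1]
    chosen = [
        roulette_wheel_select(population, fitnesses, rng)
        for _ in range(len(population) - 1)
    ]
    return [best] + chosen
```

Copying the best individual unchanged keeps the best fitness from decreasing between generations, while the roulette wheel gives weaker individuals a smaller but nonzero chance of being selected.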

The main advantage of GEP over traditional methods is that candidate functions are generated randomly and the best fit is selected without predefining the functional form in the modeling [35]. The nature of GEP allows the evolution of more complex programs composed of several subprograms. Moreover, the results of the GEP analysis provide transparent programs.

3. Pawan Danavi Wind Farm and Data

Pawan Danavi wind farm is located in Kalpitiya (8.2382 N, 79.7576 E), Sri Lanka (shown in Figure 2), an area identified as one of the better geographical locations for a wind farm because it has wind throughout the year [36]. Power generation at the wind farm started in 2012 with the installation of 12 wind turbines, each with a nameplate capacity of 850 kW, adding up to a total generation capacity of 10.2 MW. The turbines have a blade diameter of 58 m and are placed at a height of 65 m. The generated electricity is fed to the national electricity grid of the country.

The monthly power generation data of 5 years (60 data sets from January 2015 to December 2019) were obtained from the wind farm authorities (Lanka Transformers Private Limited, Sri Lanka). The power generation during the said period is illustrated in Figures 3(a)–3(e). A clear annual pattern can be observed in the power generation. The months from November to April have lower power generation, whereas the peaks can be seen around July. The whole country experiences high winds over the months of June–August. Therefore, a relationship can be clearly seen between wind power generation and wind potential.

During these 5 years, the power generation varied between 113 kW and 3,064 kW with a mean of 1,062.2 kW. Significant statistical parameters for the input and output data sets are given in Table 1. The standard deviation is high, reflecting the monthly and seasonal variations of the power generation with respect to the various climatic factors.

The skewness of the power generation data set is 0.6, and thus, the distribution is moderately skewed. However, the kurtosis is negative; therefore, the data set has less in the tails than the normal distribution. This showcases the flatness of the data set.
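The shape statistics discussed above can be computed as in the following minimal sketch. It assumes the population (moment-based) definitions of skewness and excess kurtosis; the paper does not state which estimator was used, so bias-corrected sample estimators would give slightly different values. The function names are illustrative.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: zero for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3: negative means lighter tails than the normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0
```

Under this convention, a negative excess kurtosis corresponds to the flatter, lighter-tailed behavior described for the power generation data.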

Peiris et al. [12] have clearly stated that climate variables like mean wind speed and mean ambient temperature have a significant relationship to the corresponding wind power generation. Therefore, monthly climatic data for mean wind speed and mean ambient temperature were obtained from the in-house meteorological station at the wind farm for the corresponding period. Figure 4 showcases the monthly mean wind speed and the monthly mean ambient temperature on-site over time. The observational statistics show that the wind speed varies between 2.4 m/s and 11 m/s, and the temperature varies between 33.4°C and 41.7°C. The other statistical parameters are given in Table 1.

4. Methodology

As stated, Ekanayake et al. [11] and Peiris et al. [12] have carried out extensive analyses to identify the relationships between wind power generation and the wind speed and ambient temperature. They used ANN, MLR, and PR with the following generalized relationship to predict the power generation from the independent variables:

$$P = f(d_0, d_1) \qquad (1)$$

where $P$ is the wind power generation, $d_0$ is the wind speed, and $d_1$ is the ambient temperature.

A similar relationship (Equation (1)) was formed in this paper using GEP to determine the prediction capabilities and then to compare the prediction accuracy with ANNs and regression models. GeneXproTools 5.0 was used to implement the GEP model, with the monthly mean wind speed and the monthly mean ambient temperature as the independent variables. The terminal set included the input variables. The mathematical functions +, −, ×, /, Exp, Ln, Inv, X2, 3Rt, Min2, Max2, Avg2, Not, Atan, and Tanh were used in this research study. The fitness function was defined in terms of the root mean squared error (RMSE), calculated as shown in the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{P}_i - P_i\right)^2}$$

where $\hat{P}_i$ is the value predicted by the model for record $i$, $P_i$ is the target value for record $i$, and $n$ is the number of records. When a model fits perfectly, $\hat{P}_i = P_i$ for every record, resulting in $\mathrm{RMSE} = 0$.
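As a minimal sketch, the RMSE described above can be computed as follows. The function and argument names are illustrative and do not reflect the internals of GeneXproTools.

```python
import math


def rmse(predicted, target):
    """Root mean squared error between predicted and target values."""
    if len(predicted) != len(target) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared_errors) / len(target))
```

A perfect fit returns 0, and the value has the same unit as the power generation, which makes it directly comparable with the observed range of the data.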

The possible solutions are represented in GeneXproTools as chromosomes with a special architecture, which consists of genes and a linking function that links the genes. Each gene is composed of three domains: the head, the tail, and the constant domain. Addition was used as the linking function to link the mathematical terms encoded in each gene. The genetic operators under the optimal evolution strategy were selected for the modeling. The parameters used in developing the GEP model are summarized in Table 2. Since this is a regression problem, RMSE, a common fitness function for regression problems, was used as the fitness function. The fitness plots provided by GeneXproTools were inspected manually to decide the stop condition. The data set was randomly split into a training set and a validation set in a ratio of 3 : 1. The model was then evaluated considering 30-fold cross-validation (CV) accuracy.
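The random 3 : 1 split can be sketched as below. The fixed seed and the function name are assumptions made for reproducibility of the illustration; the paper does not report how the random split was seeded.

```python
import random


def split_train_validation(records, train_fraction=0.75, seed=0):
    """Shuffle the records and split them into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With the 60 monthly records used in this study, a 3 : 1 ratio gives 45 training records and 15 validation records.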

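The performance of the prediction model was evaluated in terms of several statistical indicators. The degree of correlation (R²) between the actual and predicted power values was calculated by using the following equation:

$$R^2 = \frac{\left[\sum_{i=1}^{n}\left(P_i - \bar{P}\right)\left(\hat{P}_i - \bar{\hat{P}}\right)\right]^2}{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2 \, \sum_{i=1}^{n}\left(\hat{P}_i - \bar{\hat{P}}\right)^2}$$

where $P_i$ and $\hat{P}_i$ are the actual power generation and the predicted power generation, respectively, $\bar{P}$ and $\bar{\hat{P}}$ are the corresponding means of the power generations, and $n$ is the number of data points. In addition, the ratio of RMSE to the standard deviation of actual data (RSR) was checked for power generation. The ratio is given in the following equation:

$$\mathrm{RSR} = \frac{\mathrm{RMSE}}{\sigma_P}$$

where $\sigma_P$ is the standard deviation of actual power generation. Furthermore, the index of agreement (IA) was calculated to check the accuracy of the developed model. This is given in the following equation:

$$\mathrm{IA} = 1 - \frac{\sum_{i=1}^{n}\left(P_i - \hat{P}_i\right)^2}{\sum_{i=1}^{n}\left(\left|\hat{P}_i - \bar{P}\right| + \left|P_i - \bar{P}\right|\right)^2}$$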

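In addition, the mean absolute error (MAE) was calculated between the predicted and observed power generations. The MAE is given in the following equation:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|P_i - \hat{P}_i\right|$$

where $P_i$ and $\hat{P}_i$ are the observed and predicted power generations for data point $i$, and $n$ is the number of data points.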
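The four indicators described above can be sketched in a few lines. This is an illustration under stated assumptions: R² is taken as the squared Pearson correlation, the index of agreement follows Willmott's standard form, and the standard deviation in RSR is the population form (division by n). The function names are hypothetical.

```python
import math


def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_a * var_p)


def rsr(actual, predicted):
    """RMSE divided by the (population) standard deviation of the actual data."""
    n = len(actual)
    mean_a = sum(actual) / n
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    return rmse / std_a


def index_of_agreement(actual, predicted):
    """Willmott's index of agreement: 1 indicates a perfect match."""
    mean_a = sum(actual) / len(actual)
    numerator = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    denominator = sum(
        (abs(p - mean_a) + abs(a - mean_a)) ** 2
        for a, p in zip(actual, predicted)
    )
    return 1.0 - numerator / denominator


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

A constant offset between predictions and observations leaves R² at 1 while degrading RSR, IA, and MAE, which is one reason the indicators are reported together rather than relying on R² alone.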

Furthermore, the relative variable importance of the two variables (wind speed and temperature) was considered to evaluate the effect of each variable on the model accuracy. For a given variable, the variable importance is computed by first randomizing its values and then calculating the decrease in the R² between the predicted and actual power values. Finally, the results are normalized such that all variable importance values add up to 1.
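The procedure above can be sketched as a permutation-style importance measure. The sketch assumes that "randomizing" means shuffling the values of one variable across records and that the accuracy measure is the squared Pearson correlation; the function names, the seed, and the clipping of negative drops to zero are assumptions of this illustration.

```python
import random


def r_squared(actual, predicted):
    """Squared Pearson correlation (0.0 if either series is constant)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        return 0.0
    return cov ** 2 / (var_a * var_p)


def variable_importance(model, rows, actual, seed=0):
    """Normalized drop in R^2 when each input column is shuffled in turn."""
    rng = random.Random(seed)
    rows = [tuple(row) for row in rows]
    baseline = r_squared(actual, [model(*row) for row in rows])
    drops = []
    for col in range(len(rows[0])):
        shuffled = [row[col] for row in rows]
        rng.shuffle(shuffled)
        permuted = [
            row[:col] + (value,) + row[col + 1:]
            for row, value in zip(rows, shuffled)
        ]
        drop = baseline - r_squared(actual, [model(*row) for row in permuted])
        drops.append(max(drop, 0.0))  # a negative drop is treated as no importance
    total = sum(drops)
    return [d / total for d in drops] if total > 0 else [0.0] * len(drops)
```

Here `model` is any callable taking the wind speed and the temperature as arguments, and the returned values add up to 1 whenever at least one variable affects the predictions.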

5. Results and Discussion

Figures 5(a)–5(c) show the subexpression trees corresponding to the developed GEP model. The final model was developed by cumulating all the subexpression trees. The constants appearing in the trees are those used in the model (given in Equation (6)), whereas d0 and d1 are the wind speed and temperature, respectively.

The predicted wind power was plotted against the actual wind power generation as shown in Figure 6. Figure 6(a) showcases the relationship for the training data sets, while Figure 6(b) presents that for the test data sets.

The coefficient of determination (R²) clearly indicates the accuracy of the developed GEP model in wind power prediction. The values are above 0.9, which demonstrates the robustness of the model. In addition, the dashed line (in orange) shows the 45° line, where the predicted and actual power generations are equal to each other. The trend lines based on the model data almost overlap this perfect-match line, which is a good indication of the quality of the developed prediction model.

In addition, RMSE, which is a measure of the spread of the residuals, was calculated and found to be 0.259 MW for training and 0.239 MW for testing. Furthermore, RSR did not exceed 0.29 (RSR = 0.29 for training and RSR = 0.26 for testing). This is an indication of the accuracy of the GEP model. The calculated MAE is 0.17 MW (for both training and testing), which is low relative to the magnitude of the power generation. The index of agreement (IA) presented a value of 0.977 for training and 0.982 for testing, both close to 1. This indicates the nearly perfect fit of the prediction model. In addition, the Nash number was found to be 0.915 for training and 0.931 for testing, and the bias was found to be 2% and 4.2%, respectively. These statistics also showcase the performance of the GEP model that was developed. They are presented in Table 3 as a summary.
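The reported RSR and Nash values can be cross-checked with a short worked relation. The check assumes that the Nash number denotes the Nash–Sutcliffe efficiency (NSE) in its usual form and that the standard deviation $\sigma_P$ of the actual power generation $P_i$ is computed with division by $n$, as RMSE is; $\hat{P}_i$ denotes the predicted power generation and $\bar{P}$ the mean of the actual values. Neither assumption is stated explicitly in the text.

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(P_i - \hat{P}_i\right)^2}{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2} = 1 - \frac{n\,\mathrm{RMSE}^2}{n\,\sigma_P^2} = 1 - \mathrm{RSR}^2$$

$$\text{Training: } 1 - 0.29^2 \approx 0.916, \qquad \text{Testing: } 1 - 0.26^2 \approx 0.932$$

Under these assumptions, the values agree with the reported 0.915 and 0.931 to within the rounding of RSR to two decimal places.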

Figure 7 presents the variable importance of the two variables: wind speed (d0) and temperature (d1). It shows that wind speed has higher variable importance than temperature in both the training and testing processes. This observation is further reflected in the histograms of the two variables, where wind speed demonstrates higher variation than temperature (refer to Figure 8), and in the correlation plots (refer to Figure 9).

The CV performance of the model is given in Table 4. It can be clearly seen that the best and average CV fitness values are the same, for both the training and validation processes of the GEP model. The R² values demonstrate the same behavior, and Table 2 further justifies the accuracy of the model.

The objective of this study was to express wind power generation as a function of wind speed and temperature. The mathematical expression given in Equation (6) is obtained at the end of the GEP process to achieve this objective. It defines the relationship between the input variables and the output variable as the sum of three subexpressions, one for each subexpression tree in Figure 5, written in terms of the wind speed (d0), the temperature (d1), and the numerically optimized coefficients.

Though the function derived by GEP is complex, a simple computer program is sufficient to calculate its outcome. Therefore, the power generation with respect to the wind speed (d0) and temperature (d1) can be easily calculated.

As stated in the introduction, researchers have carried out wind power forecasting for the Pawan Danavi wind farm using MLR, PR, and ANN [11, 12]. The performance of the GEP-based model is compared in Table 5 with the forecasting models developed by applying the aforementioned techniques. The GEP model is comparable with the machine learning and statistical models. However, ANN performs the best (highest R² and lowest RMSE). Nevertheless, ANN needs more calculations [37] to express the wind power generation in terms of the input variables (that is, the weather indices) in the way Equation (6) does. Therefore, the GEP model, with a slightly lower R² and a slightly higher RMSE, can be considered an acceptable forecasting model. In addition, GEP outperforms MLR and PR in terms of the R² and RMSE values.

6. Conclusions

A wind power forecasting model based on GEP was developed in this research work for one of the major wind farms in Sri Lanka, namely, Pawan Danavi. The results showcase the robustness of the proposed model. The process is computationally less expensive and reliable. The comparative results confirm the accuracy of the forecasting model. The ANN-based approach had the highest accuracy, while the GEP-based model ranked second. However, the model produced by ANN is hard to interpret, whereas the GEP-based approach results in an interpretable model with comparable accuracy. The analysis of the variable importance shows that wind speed has more importance than temperature. Though the relationship obtained between wind power generation and the climatic factors of wind speed and ambient temperature is complicated, potential wind power generation can be forecast from the available climatic parameters using a simple subroutine. Therefore, the GEP-based forecasting model is expected to attract more attention from the stakeholders of the wind farm. Nevertheless, more data would strengthen the performance of the modeling process and thus its interpretation. In addition, higher resolution data would produce a better understanding of the prediction. Furthermore, it would be useful to understand how the performance of the turbines changes with time. The model that was developed has not considered the service gaps of the turbines, and long-term service breaks can influence the accuracy of the model.

The projected climatic data can be extracted and bias-corrected to match realistic future climatic data. These data can be fed to the proposed GEP-based model to forecast future wind power generation. The projected wind power values can be used by relevant authorities to bring sustainable solutions to the energy demand of Sri Lanka. Similar models can be developed for other wind farms as well to demonstrate their importance. Furthermore, more climatic zones in Sri Lanka can be investigated with the proposed model to assess the wind power potential. The model accuracy can be further improved by feeding more data into the modeling.

Data Availability

All data used in this study may be available from the corresponding author upon request only for research purposes after obtaining permission from Lanka Transformers Limited (LTL Holdings), Sri Lanka.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are thankful to Lanka Transformers Limited (LTL Holdings), Sri Lanka, for providing data.