#### Abstract

Since wind power depends directly on wind speed, long-term wind speed forecasting (WSF) plays an important role in wind farm installation. WSF is also essential for control, energy management, and scheduled wind power generation in a wind farm. This paper investigates 30-days-ahead WSF. Nonlinear Autoregressive (NAR) and Nonlinear Autoregressive Exogenous (NARX) neural networks (NN) with different network settings are used for this purpose. The essence of the study is a comparison of the effect of two activation functions (namely, tansig and logsig) on time series forecasting performance, since the activation function is a core element of any artificial neural network model. Wind speed data were collected from meteorological stations in Malaysia located in Kuala Lumpur, Kuantan, and Melaka. With the tansig activation function, both NARNN and NARXNN gave promising outcomes, with a very small error between actual and predicted wind speed and better results than the corresponding networks using the logsig transfer function.

#### 1. Introduction

Not only is the world’s total consumption of electricity rapidly increasing, but greenhouse gas (GHG) emissions from fossil-fuel power generation are increasing as well. World electricity generation grew at an average annual rate of 2.7% from 2003 to 2015, and this growth is expected to continue until 2030 [1]. Approximately 40% of the world’s total GHG emissions come from electricity generation, where most of the industry uses fossil fuels, namely, coal and oil [2]. GHG emissions are considered hazardous for the human race; fortunately, fossil fuels can be replaced by renewable energy sources, namely, wind, solar, biomass, and rain, to name a few. Demand for wind energy is increasing in order to counter the greenhouse effect and make efficient use of the surrounding energy resources. Because it is free and widely available, wind energy is considered to be the most efficient and technologically advanced renewable energy source accessible [1]. Site selection for wind turbine installation is crucial for maximizing wind energy production, and wind power can be generated only when the available wind speed is higher than the wind turbine’s cut-in wind speed. In addition, wind power is proportional to the cube of wind speed; therefore, a slight change in wind speed produces a much larger change in wind power. Consequently, progress in wind speed prediction for wind energy conversion systems will help lessen the risk of installing wind turbines at unproductive sites.
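The cubic dependence mentioned above can be illustrated with a small numerical sketch. It assumes the standard textbook expression P = 0.5 · ρ · A · v³ · Cp, which is not stated in this paper; the rotor radius, air density, and power coefficient below are hypothetical values chosen only for illustration.

```python
import math


def wind_power_watts(speed_ms, rotor_radius_m=20.0, air_density=1.225, power_coeff=0.4):
    """Power captured by a turbine, P = 0.5 * rho * A * v^3 * Cp.

    All default parameter values are illustrative assumptions,
    not data from the stations studied in this paper.
    """
    swept_area = math.pi * rotor_radius_m ** 2
    return 0.5 * air_density * swept_area * speed_ms ** 3 * power_coeff


base = wind_power_watts(5.0)
faster = wind_power_watts(5.5)  # wind speed raised by 10%
gain_percent = (faster / base - 1.0) * 100.0
print(f"Power gain for a 10% increase in wind speed: {gain_percent:.1f}%")
```

Because only the ratio of the two powers is used, the gain of roughly 33% does not depend on the assumed turbine parameters; it follows from 1.1³ ≈ 1.331.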

Wind speed is the most challenging factor for wind power generation because its variation in nature is chaotic. A wind turbine is also limited by its cut-out wind speed, i.e., wind power generation is stopped when the wind speed is very high. WSF therefore plays a very important role in optimum planning and in wind energy applications. Time series forecasting of wind speed relies on wind data recorded over time; for example, a one-month-ahead wind speed forecast can be developed from historical weather or wind data [3]. Forecasting of wind speed can be divided into four time categories: very short-term (VST), short-term (ST), medium-term (MT), and long-term (LT) forecasting. VST refers to WSF less than 30 minutes ahead. ST forecasting covers horizons from 1 hour to less than 72 hours [4]; it can be used to control wind turbines in real time and to plan load dispatch. MT wind speed forecasting covers horizons from 6 hours to 1 day ahead and helps to manage the power system and secure the operation of wind turbines. Lastly, LT forecasting is useful for optimizing operation cost and scheduling maintenance. It can also save cost when operators need to schedule wind project maintenance and construction. Wind projects often require turbines to be taken down during the commissioning of new turbines, and this can take from hours to weeks depending on the weather. LT forecasting of wind speed can minimize scheduling errors and in turn increase the reliability of the electric power grid and reduce power market ancillary service costs [5–7]. Forecasting wind speed is very difficult because wind speeds are chaotic and depend on the earth’s rotation and on local conditions such as topography, temperature, and pressure.
Methodologically, wind speed prediction can be classified into four groups, i.e., physical, statistical, artificial intelligence (AI), and hybrid methods [6, 8]. In this study, AI methods, namely, NAR and NARX neural networks, have been chosen for wind speed forecasting because of their higher forecasting accuracy and because no mathematical model is required.

The artificial neural network (ANN) is among the most promising artificial intelligence techniques. A neural network emulates not only the human brain but also the way knowledge is gained through a learning process [9]. In the last few years, ANN has proven to be a promising technique for time series prediction, assessment of energy, and pattern recognition. For time series forecasting, several ANN types are used, for instance, the Nonlinear Autoregressive Exogenous Neural Network (NARXNN), the Nonlinear Autoregressive Neural Network (NARNN), and the Recurrent Neural Network (RNN). In this paper, NARNN and NARXNN have been used for wind speed forecasting in the chosen areas in Malaysia. Since real-world happenings are dynamic and depend on their current state, only a nonlinear system can properly depict them. In such dynamic and nonlinear cases, neural network structures such as the dynamic RNN, the NAR network, and the NARX network are very advantageous. One of the major benefits of such structures is that they can accept dynamic inputs represented by time series sets. A neural network (NN) is a non-parametric method for time series forecasting, in which knowledge of the process that generates the time series is not crucial. Although the NAR and NARX models use the past values of the time series to predict future values, the RNN model does not need past time series values or delays as inputs [10, 11].

Several researchers have reported different ANN models for WSF, with horizons ranging from a few seconds to more than one year ahead. Guo et al. [12] proposed a hybrid backpropagation neural network (BPNN) for one-year-ahead WSF in order to remove seasonal effects from wind speed data from 2001 to 2016 in Minqin, China; their proposed BPNN shows a lower mean absolute percentage error (MAPE) of 28.16% in comparison to a single BPNN. Lui and coauthors [13] employed a hybrid of Empirical Mode Decomposition and artificial neural networks (EMD-ANN) to forecast wind speed and eliminate its randomness. For WSF, a highly satisfactory result was obtained with ANN compared to that of the Autoregressive Integrated Moving Average (ARIMA) method. Masseran et al. [14] considered 10 wind stations to find the areas in Malaysia with the most potential. Although the wind speed in Malaysia is quite low compared to other countries, Mersing was found to have considerably higher wind speed than the other wind stations in Malaysia, i.e., around 18.2% of power is produced from the Mersing wind station. One-day-ahead WSF was carried out by Li and Shi [15] using three ANNs in North Dakota, United States of America (USA). Azad et al. [6] considered two meteorological stations in Malaysia for long-term WSF using ANN. They found a lower mean absolute error (MAE) of 0.8 ms^{−1} using their proposed algorithm. Short-term WSF at La Venta, Oaxaca, Mexico, was carried out by Cadenas and Rivera [16] using ANN. The accuracy of the proposed ANN is satisfactory based on its error level, i.e., MAE (0.0399) and MSE (0.0016). In addition, Cadenas and Rivera [17] proposed a hybrid ARIMA-ANN model in 2010 for average WSF at three places in Mexico. The accuracy of the hybrid model was higher than that of the single ARIMA and ANN models. Jiang et al. [18] applied a v-SVM model for WSF to address the similar fluctuation information between adjacent wind turbine generators.
The proposed Variant Support Vector Machine (v-SVM) showed better accuracy in comparison to the Epsilon Support Vector Machine (ε-SVM) model. Men et al. [19] applied a mixture density neural network (MDNN) for ST wind speed and wind power forecasting in Taiwan using wind farm data. The MDNN had a three-layer architecture in which different numbers of hidden layers and nodes were used for each layer, and this method was effective for multistep-ahead wind power and wind speed forecasting.

In terms of activation functions in NN, B. Karlik and A. V. Olgac [20] used Bi-polar sigmoid, Uni-polar sigmoid, Hyperbolic tangent (tansig), Conic Section, and Radial Basis Function (RBF) for the evaluation of the Multi-Layer Perceptron (MLP) architecture along with Generalized Delta rule learning. They found that the Hyperbolic tangent (tansig) activation function was more accurate than the other functions at 100 and 500 iterations. Regression problems can be solved by the Random Vector Functional Link Neural Network (RVFLNN), for which the tansig function statistically gives superior results compared to the other two functions (logsig, tribas) [21]. ANN activation functions were also compared for forecasting flows at the outlet of the Khosrow Shirin watershed in Iran, where tansig-ANN gave superior results compared with logsig-ANN and a conventional hydrological model [22]. Moreover, tansig-ANN provided 94% accuracy, compared with 84% for logsig-ANN, for psychological variables in ascertaining potential archers [23]. M. Vafaeipour et al. [24] investigated wind velocity prediction using a neural network with two activation functions in Tehran, Iran, and found that the tansig activation function works better than the logsig activation function. Their suggestions were based on the mean square error (MSE), root mean square error (RMSE), and correlation coefficient (R) performance indicators.

AI methods have been found to be the most effective approach to long-term WSF since they do not require a mathematical model other than their own universal algorithm for future time series prediction. This paper therefore uses NARNN and NARXNN for WSF. With both of these networks, two different activation functions, namely, “tansig” and “logsig,” have been used separately to find the most suitable one for NARNN and NARXNN. In previous studies, both of these activation functions were widely used for various neural network applications including WSF. However, no study has yet explored the effect of different activation functions in time series networks to find the most effective one. The activation function is a core component of any neural network model, because it adds nonlinearity and enables the network to converge during backpropagation. So, if one activation function is better than the other, it would significantly enhance the time series prediction performance of the network by improving the derivative and thus the convergence. The contribution of this study is therefore an examination of the performance of two activation functions, hyperbolic tangent sigmoid (tansig) and logistic sigmoid (logsig), when used in different time series networks (NAR and NARX) with different time series datasets but with the same network parameters and architectures. Such analysis will reveal whether either activation function consistently performs better than the other in different conditions, so that future researchers can choose the proper activation function when conducting neural network-based time series forecasting tasks.

Here, the actual wind speed data from meteorological department of Malaysia has been used for training and testing of NAR and NARX neural networks. To evaluate the proposed models, indicators, namely, MAE, MAPE, and RMSE, have been used.

The layout of the paper is as follows: wind speed in Malaysia is presented in Section 2. Artificial intelligence models are described in Section 3. Accuracy of evolution method is displayed in Section 4. In Section 5, the results and discussion are presented. Finally, the conclusion of this study is presented in Section 6.

#### 2. Wind Speed in Malaysia

All countries depend heavily on the energy sector in their development, and the world's demand for energy is increasing day by day. According to British Petroleum, the utilization of primary energy grew by 2.2% from 2013 to 2017. In energy consumption, the largest increase among the fuel types was in natural gas, followed by oil. Notwithstanding this demand, renewable energy still does not have a large share of the total energy portfolio compared with nonrenewable energy. In 2017, the most consumed global energy source was oil, at approximately 34.2%. Likewise, Malaysia is highly dependent on fossil fuel, which accounts for over 90% of its energy generation, due to the lack of renewable energy sources. The Malaysian government is therefore emphasising renewable energy for power generation, in particular wind energy projects, even though wind energy development faces a major drawback because Malaysia lies in a low wind speed zone [28].

Some of the projects have not achieved their intended goals. SK Najid et al. proposed a 150 kW wind turbine project at Pulau Terumbu Layang-Layang in Sabah. According to them, that was the first wind turbine installed in Malaysia [29]. The project was expanded by Universiti Kebangsaan Malaysia (UKM) and combined with a diesel system to supply power to an army base and the nearest resort. Moreover, they extended the study to a wind speed potential analysis comparing other areas of Malaysia. In that study, Pulau Terumbu was considered to be a promising area for wind power generation compared to other areas in Malaysia [30]. In addition, the most famous wind turbine project was installed by Tenaga Nasional Berhad (TNB) on Perhentian Island. This 100 kW project was hybridized with a 100 kW photovoltaic system and a 100 kW diesel generator set. They recorded minimum and maximum wind speeds of 3.6 m/s and 15.6 m/s, respectively [31].

Malaysia is a country located in the south-east part of Asia. It borders Thailand, Indonesia, and Brunei. The total coastline of Malaysia is about 4675 km long [32]. For this reason, Malaysia recognizes the importance of renewable energy (RE) as a source for generating electricity instead of fossil fuel. A program known as the Small Renewable Energy Power (SREP) program was adopted to boost the development of RE, but unfortunately it did not deliver the expected results. The pace of RE development is slow, and the total volume of electricity generated from RE is still small. Subsequently, the Malaysian parliament passed the Renewable Energy Act 2011 (Act 725), a national energy policy, in 2011 for implementation [33–36]. The reported wind power production target for 2015 was 985 MW. However, only around 400 MW had been produced by early 2015, and only about 50% of the original target was fulfilled [37], as the target was reported to be impossible to achieve. The targets for 2020 and 2030 are 2080 MW and 4000 MW, respectively. At present, wind energy projects in Malaysia are employed only for education and research purposes.

The climate of Malaysia is characterized by four seasons: the first intermonsoon (April), the southwest monsoon (mid-May to September), the second intermonsoon (October), and the northeast monsoon (November to March) [6, 14]. In Malaysia, the wind flow is uniform; the maximum wind flow occurs in the afternoon and the minimum occurs before sunrise. Figure 1 shows the average monthly wind speed in Kuala Lumpur, Melaka, and Kuantan. The average wind speed at all three places is 6-12 km/h. Wind data at one-hour intervals were collected from the Malaysian Meteorological Department (Table 1) over a period of 4 months, from January to April 2017.

#### 3. Artificial Neural Network

##### 3.1. Nonlinear Autoregressive Neural Network

Wind speed is a chaotic time series. A linear mathematical model has difficulty predicting the wind because it varies randomly in a real environment. Thus, the fleeting transients and high variation of wind speed need to be predicted by a nonlinear model, as in (1). For this, NARNN can be used for effective nonlinear time series forecasting. The NARNN can be defined as given in [38, 39]:

$$y(t) = h\big(y(t-1), y(t-2), \ldots, y(t-p)\big) + \varepsilon(t), \tag{1}$$

where $y$ is the data series of wind speed at time $t$, $p$ is the input delay of the wind speed series, and $h$ denotes a transfer function. The training of the neural network aims to estimate the function $h$ by optimizing the network weights and neuron biases. The $y$ series of wind speed is determined up to the approximation term $\varepsilon(t)$, which stands for the error tolerance. The endogenous input of NARNN can be expressed as follows, given in [10, 40]:

$$y(t) = f\big(y(t-1), y(t-2), \ldots, y(t-d)\big), \tag{2}$$

where $d$ is the delay of the input. NARNN consists of one input layer, one or more hidden layers, and one output layer. NARNN is dynamic and recurrent, with a feedback connection as shown in Figure 2. Both the hyperbolic tangent (tansig, (3)) and sigmoid (logsig, (4)) functions have been implemented here in MATLAB with the built-in narnet() function for NARNN to compare the network accuracies for wind speed forecasting. These MATLAB functions were used with their default settings as given in [41]. To obtain better performance from the network, the topology of NARNN was optimized by trial and error. It should be noted that the system becomes more complex as the number of neurons increases. The trial and error procedure found that a single hidden layer with 20 hidden neurons yields the best accuracy. Levenberg-Marquardt Backpropagation (LMBP) has been chosen as the only training algorithm of NARNN as it is faster and more accurate than other training algorithms [42]. Although the logsig transfer function has been used intensively in previous studies, the tansig function offers more advantages, i.e., it provides stronger gradients than logsig and thus reduces the chance of neuron saturation. In addition, the tansig function avoids ‘biasing’ of the gradients as explained in [43]. Moreover, according to [44], networks with a large amount of connectivity (as presented in this study) are trained faster with the backpropagation algorithm when an antisymmetric activation function, e.g., the tansig function, is used. In this study, the ‘trainlm’ function of MATLAB has been used with default settings for the LMBP [45]. Since the one-step-ahead value is being forecasted, an open loop (series-parallel) structure is used instead of a closed loop (parallel) structure, which is typically used for multistep-ahead forecasting. The two activation functions are

$$\operatorname{tansig}(x) = \frac{2}{1 + e^{-2x}} - 1, \tag{3}$$

$$\operatorname{logsig}(x) = \frac{1}{1 + e^{-x}}. \tag{4}$$
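The gradient argument made for the two activation functions can be checked numerically. The sketch below is in plain Python rather than MATLAB and assumes the standard closed forms of tansig and logsig; the derivative helper names are hypothetical and are not part of any toolbox.

```python
import math


def tansig(x):
    """Hyperbolic tangent sigmoid; output lies in (-1, 1) and is antisymmetric."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def logsig(x):
    """Logistic sigmoid; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tansig_grad(x):
    """Derivative of tansig: 1 - tansig(x)^2 (helper name is illustrative)."""
    return 1.0 - tansig(x) ** 2


def logsig_grad(x):
    """Derivative of logsig: s * (1 - s) with s = logsig(x)."""
    s = logsig(x)
    return s * (1.0 - s)


# Compare gradients close to the origin, where neurons are not saturated.
for x in (0.0, 0.5, 1.0):
    print(f"x={x:.1f}  tansig'={tansig_grad(x):.4f}  logsig'={logsig_grad(x):.4f}")
```

At the origin the tansig derivative is 1 while the logsig derivative is 0.25, which is one way to read the "stronger gradients" remark. Both derivatives shrink toward zero as |x| grows, so the advantage is most visible for moderately sized inputs; the zero-centred output of tansig is a separate property.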

##### 3.2. Nonlinear Autoregressive Exogenous Neural Network

The Nonlinear Autoregressive Exogenous input model for time series prediction was proposed in [46]. NARXNN can be used for effective nonlinear time series forecasting, and its time series can be defined as follows [10]:

$$y(t) = f\big(y(t-1), \ldots, y(t-p),\; x(t-1), \ldots, x(t-p)\big), \tag{5}$$

where $p$ is the number of past values, $y(t)$ is the predicted time series, and $x(t)$ is another, external time series. The external time series $x(t)$ can be one-dimensional or multidimensional. The NARXNN prediction estimates future values from the last output values together with the exogenous input. In this study, the wind speed at time t-1, y(t-1), is used as the input time series, and the temperature [47, 48] at time t-1, x(t-1), is used as the exogenous input. The single output is y(t). NARXNN and NARNN are therefore very similar; the difference is that NARX uses temperature as an external input. Figure 3 shows the architecture of NARXNN.
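To make the input structure concrete, the following sketch assembles one-step-ahead training rows from a wind speed series and a temperature series. It is an illustration in plain Python, not the MATLAB routine used in the study; the function name, the `delays` parameter, and the sample values are all hypothetical.

```python
def build_narx_rows(y, x, delays):
    """Build (inputs, targets) for one-step-ahead, open-loop training.

    Each input row holds y(t-1), ..., y(t-delays) followed by
    x(t-1), ..., x(t-delays); the matching target is y(t).
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    if delays < 1 or delays >= len(y):
        raise ValueError("delays must be between 1 and len(y) - 1")
    inputs, targets = [], []
    for t in range(delays, len(y)):
        y_lags = [y[t - k] for k in range(1, delays + 1)]
        x_lags = [x[t - k] for k in range(1, delays + 1)]
        inputs.append(y_lags + x_lags)
        targets.append(y[t])
    return inputs, targets


# Hypothetical hourly samples, for illustration only.
wind_speed = [2.1, 2.4, 2.0, 2.8, 3.1]
temperature = [27.0, 27.5, 28.1, 29.0, 28.4]

rows, targets = build_narx_rows(wind_speed, temperature, delays=2)
print(rows[0], "->", targets[0])
```

Dropping the `x_lags` part of each row gives the purely autoregressive (NAR) input, which is the only structural difference between the two networks described here.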

#### 4. Accuracy of Evolution Method

The prime goal of WSF is to obtain satisfactory accuracy using NARNN and NARXNN, and thus to select potential areas for further wind turbine installation. The accuracy of WSF can be determined by (6)-(8). Here, three indicators have been used for long-term WSF, namely, mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) [49, 50]. Mean absolute error is expressed as [51]

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|. \tag{6}$$

MAE is used as an uncertainty measurement indicator to assess the risk of trusting the prediction. The MAE is a measure of the average absolute error, and its advantage is that it is easier for nonspecialists to understand [6].

Mean absolute percentage error is expressed as [52]

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|. \tag{7}$$

The MAPE makes the comparison of results between the two models easier because it is percentage-based [53].

Root mean square error is expressed as [54]

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \tag{8}$$

where $y_i$ and $\hat{y}_i$ are the real and predicted wind speed, respectively, and $n$ is the number of data points.
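The three indicators named in this section can be computed as in the following sketch, which assumes their standard textbook definitions. The sample values are invented for illustration and are not taken from the study's data.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero (e.g. calm hours), so such
    points would need separate handling in practice.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical wind speeds, for illustration only.
actual = [2.0, 4.0, 5.0]
predicted = [2.5, 3.0, 5.0]

print(f"MAE={mae(actual, predicted):.3f}")
print(f"MAPE={mape(actual, predicted):.2f}%")
print(f"RMSE={rmse(actual, predicted):.3f}")
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two can rank models differently on the same test set.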

#### 5. Results and Discussion

The wind speed data were collected from the Malaysian Meteorological Department (MMD) for three regions in Malaysia from January to April 2017, at one-hour intervals for all places. The main objective of this study is to compare the performance of the activation functions of NAR and NARX neural networks for 1-month-ahead WSF in three different regions of Malaysia. For both NARNN and NARXNN, the first three months of wind speed data have been used for training and the last month of data has been used for testing. The process of WSF by NARNN and NARXNN is shown in Figure 4. The only difference between the two is in the input: for NARXNN, the output y(t) also takes account of the external data, as it appears in (5). Although NARX has the flexibility to include an exogenous input and can improve results by modelling external dependencies, NAR models are a good alternative because of their simplicity, as discussed in [10, 55]. In the experiments, both approaches are evaluated.
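The three-month/one-month split described above is chronological rather than random. A minimal sketch of such a split is given below; it assumes a complete hourly record from January to April 2017 with no missing hours, and the function name and placeholder series are hypothetical.

```python
def chronological_split(series, test_size):
    """Split a time series into (train, test) without shuffling.

    The last `test_size` samples form the test set, so the model is
    never trained on observations that come after the test period.
    """
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


HOURS_PER_DAY = 24
DAYS_JAN_TO_APR_2017 = [31, 28, 31, 30]  # 2017 is not a leap year

total_hours = sum(DAYS_JAN_TO_APR_2017) * HOURS_PER_DAY
hourly_speed = [0.0] * total_hours  # placeholder for the measured series

april_hours = DAYS_JAN_TO_APR_2017[-1] * HOURS_PER_DAY
train, test = chronological_split(hourly_speed, april_hours)
print(len(train), len(test))
```

Under the no-gaps assumption this gives 2160 training hours (January to March) and 720 test hours (April); real station records with missing readings would yield smaller counts.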

Figures 5 and 6 show the WSF for these places when using the tansig and logsig transfer functions with NAR and NARX, respectively. From Figures 5(a) and 5(b), it can be seen that the tansig function results in greater WSF accuracy (MAE 0.014, MAPE 14.79%, and RMSE 1.102) than the logsig function (MAE 0.041, MAPE 16.78%, and RMSE 1.281) for Kuala Lumpur, based on Table 3. For Kuantan, the accuracy of the tansig function (MAE 0.025, MAPE 19.27%, and RMSE 1.15) is greater than that of the logsig function (MAE 0.134, MAPE 28.84%, and RMSE 1.788), as shown in Figures 5(c) and 5(d). As shown in Figures 5(e) and 5(f), the tansig function also gives better precision (MAE 0.029, MAPE 10.79%, and RMSE 0.583) than the logsig function (MAE 0.339, MAPE 11.03%, and RMSE 0.858) for Melaka.

**Figure 5:** WSF using NARNN. (a) Tansig (KL); (b) Logsig (KL); (c) Tansig (Kuantan); (d) Logsig (Kuantan); (e) Tansig (Melaka); (f) Logsig (Melaka).

**Figure 6:** WSF using NARXNN. (a) Tansig (KL); (b) Logsig (KL); (c) Tansig (Kuantan); (d) Logsig (Kuantan); (e) Tansig (Melaka); (f) Logsig (Melaka).

Figure 6 shows the 1-month-ahead WSF at these places using the tansig and logsig functions of NARXNN. For Kuala Lumpur, the performance of the tansig function (MAE 0.046, MAPE 14.22%, and RMSE 1.231) is slightly higher than that of the logsig function (MAE 0.058, MAPE 12.04%, and RMSE 1.028), as shown in Figures 6(a) and 6(b). For Kuantan, as shown in Figures 6(c) and 6(d), the accuracy of the logsig function (MAE 0.880, MAPE 22.55%, and RMSE 1.485) is lower than that of the tansig function (MAE 0.550, MAPE 20.46%, and RMSE 1.212). For Melaka, the tansig and logsig functions give MAE 0.434, MAPE 11.23%, and RMSE 0.853 and MAE 0.180, MAPE 15.15%, and RMSE 1.28, respectively, with the tansig function outperforming the logsig function in terms of accuracy, as shown in Figures 6(e) and 6(f).

The agreement between predicted and measured values can be expressed by the correlation coefficient (R), which lies between -1 and 1. R indicates how well a regression model fits the data. The scatter of predicted against measured wind speed for the two activation functions of NARNN at these places is shown in Figure 7. As expected, most of the predicted and measured values lie close to the diagonal line in all cases. With the tansig function, the correlation coefficients for Kuala Lumpur, Kuantan, and Melaka were 0.9671, 0.9463, and 0.9703, respectively. With the logsig function, the correlation coefficients for Kuala Lumpur, Kuantan, and Melaka were 0.9503, 0.8606, and 0.9645, respectively. From Figure 8, the correlation coefficients for tansig-NARX were found to be close to 1 (Kuala Lumpur: 0.9665, Kuantan: 0.9288, and Melaka: 0.9780), whereas those for logsig-NARX deviated more from 1 (Kuala Lumpur: 0.9514, Kuantan: 0.9115, and Melaka: 0.9561). Overall, the correlation coefficients for all cases lie between 0.86 and 0.98, which is close to 1. Both tansig-NAR and tansig-NARX are shown to be slightly better than logsig-NAR and logsig-NARX for wind speed forecasting at all of these places.
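For reference, the correlation coefficient between measured and predicted values can be computed as in the sketch below. It assumes that R here is the ordinary Pearson coefficient, which is the usual meaning in regression plots of this kind; the function name and sample values are illustrative.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    covariance = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    spread_m = sum((m - mean_m) ** 2 for m in measured)
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance / math.sqrt(spread_m * spread_p)


# Hypothetical values: predictions that follow the measurements closely.
measured = [2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [2.2, 2.9, 4.3, 4.8, 6.1]
print(f"R = {pearson_r(measured, predicted):.4f}")
```

A value of R near 1 shows that predictions rise and fall with the measurements, but it does not by itself rule out a constant offset or scaling error, which is why it is reported alongside MAE, MAPE, and RMSE.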

**Figure 7:** Predicted versus measured wind speed for NARNN. (a) Tansig-KL (R = 0.9671); (b) Logsig-KL (R = 0.9503); (c) Tansig-Kuantan (R = 0.9463); (d) Logsig-Kuantan (R = 0.8606); (e) Tansig-Melaka (R = 0.9703); (f) Logsig-Melaka (R = 0.9645).

**Figure 8:** Predicted versus measured wind speed for NARXNN. (a) Tansig-KL (R = 0.9665); (b) Logsig-KL (R = 0.9514); (c) Tansig-Kuantan (R = 0.9288); (d) Logsig-Kuantan (R = 0.9115); (e) Tansig-Melaka (R = 0.9780); (f) Logsig-Melaka (R = 0.9561).

Figures 9 and 10 show the success rates of the forecasted results, where the y axis represents the number of instances, i.e., the number of test data points, and the x axis represents the error in percent. Figures 9 and 10 therefore show what percentage of the test data falls in the low- and high-error regions. It is noticeable that the tansig function provides better success rates than the logsig function for these places. At Kuala Lumpur, tansig-NAR provides an 85% success rate at an error of 22%, while logsig-NAR achieves a 78% success rate at an error of 20%. At Kuantan, tansig-NAR places around 57% of instances within 15% error, while logsig-NAR places around 55% of instances within 22% error. At Melaka, the success rate with tansig is around 95% at 29% error, whereas with logsig 83% of instances are delivered at 18% error.

**Figure 9:** Percentage of instances versus error bin (%) for NARNN. (a) Tansig (KL); (b) Logsig (KL); (c) Tansig (Kuantan); (d) Logsig (Kuantan); (e) Tansig (Melaka); (f) Logsig (Melaka).

**Figure 10:** Percentage of instances versus error bin (%) for NARXNN. (a) Tansig (KL); (b) Logsig (KL); (c) Tansig (Kuantan); (d) Logsig (Kuantan); (e) Tansig (Melaka); (f) Logsig (Melaka).

Figure 10 shows “percentage of instances” vs. “error bin in percentage” for the two activation functions of NARX. With tansig, the success rates for Kuala Lumpur, Kuantan, and Melaka are around 85%, 64%, and 96%, respectively. With logsig, the success rates for Kuala Lumpur, Kuantan, and Melaka are around 76%, 53%, and 64%, respectively. However, logsig-NARX provides a slightly higher percentage of instances than tansig-NARX in Kuantan, while the error percentage of tansig-NARX is better than that of logsig-NARX. It can be seen that tansig delivers better success rates than logsig for all three places.
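The "percentage of instances" at a given error level can be computed in several ways, and the text does not spell out the exact procedure. The sketch below shows one plausible reading, in which the success rate is the share of test points whose absolute percentage error does not exceed a chosen threshold; the function name and sample values are hypothetical.

```python
def success_rate(actual, predicted, max_error_percent):
    """Percentage of test points with absolute percentage error <= threshold.

    This is an assumed definition for illustration; actual values must be
    nonzero for the percentage error to be defined.
    """
    within = sum(
        1
        for a, p in zip(actual, predicted)
        if abs((a - p) / a) * 100.0 <= max_error_percent
    )
    return 100.0 * within / len(actual)


# Hypothetical wind speeds; their percentage errors are about 10, 25, 10, and 5.
actual = [2.0, 4.0, 5.0, 10.0]
predicted = [2.2, 3.0, 5.5, 10.5]

print(success_rate(actual, predicted, 15.0))
print(success_rate(actual, predicted, 7.0))
```

With this definition, a model is better when it reaches a high success rate at a low error threshold, which is how the tansig and logsig curves are compared in the discussion of Figures 9 and 10.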

Table 2 shows the four key parameters of NARNN and NARXNN, namely, epoch, time, performance, and number of hidden neurons. In this study, the number of epochs and the number of neurons were fixed, whereas the other two parameters varied with the input characteristics, i.e., the fluctuation of wind speed. For NARNN, the network performance values of the tansig function are 1.02, 1.21, and 1.34 for Kuala Lumpur, Kuantan, and Melaka, respectively. The logsig network, on the other hand, shows its better performance value of 1.39 at Kuantan and its poorer value of 2.11 at Melaka. Training with the tansig function was completed in the shortest time, 55 s, for Kuantan, in comparison with Kuala Lumpur and Melaka. With the logsig function, 65 s is needed for Kuantan and KL, which is lower than for Melaka. For NARXNN, the network performance of the tansig function is lowest, at 1.09, for Kuantan as compared with the other two areas. For Kuantan, Kuala Lumpur, and Melaka, the network performance values of the logsig function are 1.38, 1.36, and 1.29, respectively. With the tansig function, the operation times for Kuala Lumpur, Kuantan, and Melaka are 66 s, 99 s, and 78 s, respectively. With the logsig function, the operation times for the same places are 70 s, 112 s, and 80 s, in that order. Based on the above discussion, the tansig activation function compares favourably with the logsig function not only in network performance but also in operation time.

Three performance indicators are used to measure the accuracy of WSF for the three regions with the two transfer functions of NARNN and NARXNN, as shown in Table 3. Firstly, considering MAE, the tansig function shows its best result, 0.014, for KL as compared with the other two wind stations. The logsig function also provides its best result, an MAE of 0.041, for the KL wind station in comparison to Kuantan and Melaka. The MAE results of both tansig-NARNN and logsig-NARNN are lower than the MAE of 0.8 m/s reported by Azad et al. [6] for long-term wind speed forecasting in Malaysia. Secondly, considering MAPE, the lowest value was found for Melaka when using the tansig function (MAPE of 10.79), while the MAPE values for Kuala Lumpur and Kuantan are 14.79 and 19.27, respectively. The lowest MAPE value among these places when using the logsig function is 11.03, for the Melaka station. Thirdly, considering RMSE, the tansig function provides the smallest value (RMSE of 0.583) for Melaka, whereas the other two areas show similar RMSE values of around 1.15. Moreover, the logsig function shows its lowest RMSE value, 0.858, for Melaka. The RMSE value of Kuala Lumpur is almost similar to that of Kuantan (RMSE of around 1.788). For NARXNN, the three performance indicators, namely, MAE (0.0317), MAPE (9.53), and RMSE (0.833), showed lower values for tansig in comparison to the logsig transfer function, as shown in Table 3. From Table 3, it can be concluded that the tansig function displays lower error based on the RMSE, MAE, and MAPE indicators for wind speed forecasting.

In support of the above outcome, Table 4 shows the outcomes of different studies that used both tansig and logsig activation functions for various forecasting tasks. These studies also found tansig to be the better activation function. Therefore, it can be concluded that the tansig activation function should be used in NAR and NARX neural networks to obtain better accuracy in time series forecasting tasks. The primary reason is that the logsig function is more prone to neuron saturation. If an input value is large, the logsig function makes the gradient close to zero, whereas the tansig function provides a much greater gradient. Therefore, for the same number of epochs, the logsig function makes NARNN learn less than the tansig function. This is why, for exactly the same epochs, topology, initial weights, and other settings, tansig provides better accuracy than logsig, as presented above. It can therefore be said that the outcomes of some previous similar studies on time series forecasting which used only logsig, such as [56, 57], could have been better if tansig had been used.

#### 6. Conclusion

The forecasting of wind speed plays an important role in wind energy production, one of the most rapidly growing renewable energy sources in the world. Accurate wind speed forecasting is key to improving and optimizing wind power generation. Specifically, long-term wind speed forecasts can support model predictive management of wind turbines as well as real-time expansion of wind farm operation. Overall, WSF is important for engineering, operational, and financial reasons. In this paper, the accuracy of the proposed NARNN and NARXNN with two different activation functions, namely, tansig and logsig, is evaluated for WSF using four statistical indicators, including MAE, MAPE, and RMSE. The most suitable model can be identified from the values of MAE, MAPE, and RMSE. On average, tansig-NARNN gave promising results (MAE 0.0082, MAPE 11.39%, and RMSE 0.86) compared to logsig-NARNN (MAE 0.0163, MAPE 15.36%, and RMSE 1.13). Likewise, logsig-NARXNN (MAE 0.10, MAPE 15.40%, and RMSE 1.16) gave higher average errors than tansig-NARXNN (MAE 0.06, MAPE 9.06%, and RMSE 0.53). The comparison between the tansig and logsig functions was carried out under controlled conditions by keeping the network settings (e.g., topology, number of epochs, number of hidden neurons, and initial weights) fixed. Since the tansig function provides better results in both neural networks (NAR and NARX) with identical network settings and input data, it is the more suitable activation function. Based on the error evaluation, tansig-NARNN and tansig-NARXNN can be used effectively for long-term wind speed forecasting.
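The controlled comparison described above (same topology, epochs, hidden neurons, and initial weights, with only the activation function changed) can be sketched as follows. This is not the configuration used in the study: it assumes a synthetic sine series instead of the measured wind data, plain gradient descent instead of Levenberg-Marquardt backpropagation, and a linear output layer. All names (`make_lagged`, `train_nar`, `ACTIVATIONS`) are hypothetical.

```python
import numpy as np

def make_lagged(y, n):
    """Build NAR inputs: row i holds y[i], ..., y[i+n-1]; the target is y[i+n]."""
    X = np.column_stack([y[i:len(y) - n + i] for i in range(n)])
    return X, y[n:]

# Each entry: (activation, derivative expressed through the activation output a).
ACTIVATIONS = {
    "tansig": (np.tanh, lambda a: 1.0 - a ** 2),
    "logsig": (lambda z: 1.0 / (1.0 + np.exp(-z)), lambda a: a * (1.0 - a)),
}

def train_nar(X, t, activation, hidden=8, epochs=500, lr=0.05, seed=0):
    """Train a one-hidden-layer NAR net by gradient descent; return training RMSE."""
    f, df = ACTIVATIONS[activation]
    rng = np.random.default_rng(seed)  # fixed seed: identical initial weights per run
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        a = f(X @ W1 + b1)                 # hidden-layer activations
        err = a @ W2 + b2 - t              # linear output minus target
        delta = np.outer(err, W2) * df(a)  # error backpropagated to the hidden layer
        W2 -= lr * (a.T @ err) / len(t)
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ delta) / len(t)
        b1 -= lr * delta.mean(axis=0)
    pred = f(X @ W1 + b1) @ W2 + b2
    return float(np.sqrt(np.mean((pred - t) ** 2)))

# Synthetic stand-in for a wind speed series (assumption, not the study's data).
data_rng = np.random.default_rng(1)
series = np.sin(0.2 * np.arange(300)) + 0.05 * data_rng.normal(size=300)
X, t = make_lagged(series, n=4)  # n is the input delay

for name in ACTIVATIONS:
    # Only the activation changes; every other setting is held fixed.
    print(name, "training RMSE:", train_nar(X, t, name))
```

Holding the seed fixed is what makes the two runs comparable: any difference in the reported RMSE then comes from the activation function alone. The sketch makes no claim about which function wins on this synthetic series.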

Apart from the control and optimization of wind farm operation, forecasting the behaviour of wind resources can provide valuable information for energy managers, energy policy makers, and electricity traders. Moreover, forecasting information can also help in scheduling the operation, repair, and replacement of wind generators and conversion lines.

The tansig and logsig activation functions are compared and investigated to improve the performance of the proposed neural networks. The performance of ANNs depends heavily on the selection of activation functions. Compared to logsig, tansig learned more effectively in the training process and was selected as the best nonlinear activation function for both the hidden and output layers of the NAR and NARX neural networks for predicting nonlinear wind speed behaviour. This is considered to be one of the most significant findings of this study. In addition, this study will help practitioners gain valuable knowledge about ANNs relative to the more widely used conceptual wind speed forecasting approaches.

Although a large amount of research and development is under way in this field, further investigations are required in the following areas to develop wind speed forecasting using tansig and logsig activation functions:

(i) to apply more effective activation functions for similar applications

(ii) to utilize activation functions in hybrid artificial neural networks

(iii) to develop short-term and very-short-term wind speed forecasting models for different areas in Malaysia.

#### Nomenclature

| Abbreviation | Definition |
|---|---|
| ANN | Artificial neural network |
| NAR | Nonlinear Autoregressive |
| NARX | Nonlinear Autoregressive Exogenous |
| WSF | Wind speed forecasting |
| GHG | Greenhouse gas |
| VST | Very-short-term |
| ST | Short-term |
| MT | Medium-term |
| LT | Long-term |
| AI | Artificial intelligence |
| RBNN | Radial basis neural network |
| RNN | Recurrent Neural Network |
| EMD-ANN | Empirical Mode Decomposition and Artificial Neural Networks |
| ARIMA | Autoregressive Integrated Moving Average |
| v-SVM | Variant Support Vector Machine |
| ε-SVM | Epsilon Support Vector Machine |
| MLP | Multilayer Perceptron |
| MDNN | Mixture density neural network |
| LMBP | Levenberg-Marquardt Backpropagation |
| MAPE | Mean absolute percentage error |
| MAE | Mean absolute error |
| RMSE | Root mean square error |
| R | Coefficient of correlation |
| R^{2} | Coefficient of determination |
| RVFLNN | Random vector functional link neural network |
| MMD | Malaysian Meteorological Department |
| KL | Kuala Lumpur |
| SREP | Small renewable energy power program |
| tansig | Hyperbolic tangent sigmoid |
| logsig | Logistic sigmoid |
| TNB | Tenaga Nasional Berhad |

*Symbols*

| Symbol | Definition |
|---|---|
| y | Data series |
| n | Input delay |
| | Error tolerance |
| y(t) | Time series |
| x(t) | External time series |
| | Real wind speed |
| | Predicted wind speed |

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

The authors would like to express their gratitude to the Ministry of Higher Education of Malaysia and University Malaya (ERGS nos. ER0142013A, RP015C-13AET) and High Impact Research Grant (HIR-D000006-16001) for funding and providing facilities to conduct the research.