Research Article  Open Access
Weight Optimization in Recurrent Neural Networks with Hybrid Metaheuristic Cuckoo Search Techniques for Data Classification
Abstract
Recurrent neural networks (RNNs) have been widely used as a tool for data classification. Such a network can be trained with gradient descent back propagation. However, traditional training algorithms have drawbacks, such as a slow convergence rate, and they are not guaranteed to find the global minimum of the error function, since gradient descent may get stuck in local minima. As a solution, nature-inspired metaheuristic algorithms provide derivative-free ways to optimize complex problems. This paper proposes hybrid training algorithms based on the Cuckoo Search (CS) metaheuristic, inspired by the brood behavior of the cuckoo bird, to train the Elman recurrent network (ERN) and the back propagation Elman recurrent network (BPERN), in order to achieve a fast convergence rate and to avoid the local minima problem. The proposed CSERN and CSBPERN algorithms are compared with the artificial bee colony using the BP algorithm and with other hybrid variants on selected benchmark classification problems. The simulation results show that the computational efficiency of the ERN and BPERN training process is greatly enhanced when coupled with the proposed hybrid method.
1. Introduction
Artificial neural networks (ANNs) are well-known techniques with the ability to classify nonlinear problems and are experimental in nature [1]. They can give near-accurate solutions for clearly or inaccurately formulated problems and for phenomena that are only understood through experiment. An alternative neural network approach is the recurrent neural network (RNN), which has internal feedback loops that allow it to store previous states and thus learn from past history [2–5]. The RNN model has been implemented in various applications, such as forecasting of financial data [6], electric power demand [7], tracking water quality and minimizing the additives needed for filtering water [7], and data classification [8]. In order to exploit the dynamical processing power of recurrent neural networks, researchers have developed a number of schemes by which gradient methods, and in particular back propagation learning, can be extended to recurrent neural networks [8].
Werbos [9] introduced the back propagation through time approach, which approximates the time evolution of a recurrent neural network as a sequence of static networks to which gradient methods apply. Simple recurrent networks were first trained by Elman with the standard back propagation (BP) learning algorithm, in which errors are calculated and weights updated at each time step. The BP is not as effective as the back propagation through time (BPTT) learning algorithm, in which the error signal is propagated back through time [10]. However, certain properties of RNNs make many of these algorithms less capable, and it often takes a huge amount of time to train a network of even moderate size. In addition, the complex error surface of an RNN makes many training algorithms more prone to being trapped in local minima [5]. The back propagation (BP) algorithm is the best-known method for training networks. However, the BP algorithm suffers from two main drawbacks, namely a low convergence rate and instability. These are caused by the possibility of being trapped in a local minimum and the chance of overshooting the minimum of the error surface [11–17].
To overcome the weaknesses of the above algorithms, there has been much research on dynamic system modeling with recurrent neural networks. This dynamic modeling ability yields a kind of neural network superior to conventional feed forward neural networks, because the system outputs are functions of both the current inputs and the network's inner states [18, 19]. Ahmad et al. [20] investigated a new method using a fully connected recurrent neural network (FCRNN) and the back propagation through time (BPTT) algorithm to distinguish Arabic alphabet letters from "alif" to "ya." The technique was also used to improve people's knowledge and understanding of Arabic words. In 2010, Xiao, Venayagamoorthy, and Corzine trained a recurrent neural network integrated with particle swarm optimization and the BP algorithm (PSO-BP) to provide optimal weights, avoid the local minima problem, and identify the frequency-dependent impedance of power electronic systems such as rectifiers, inverters, and AC-DC converters [5]. The experimental results showed that the proposed method successfully identified the impedance characteristic of the three-phase inverter system; it not only systematically helps the training process avoid being trapped in local minima, but also performs better than both the simple BP and PSO algorithms. Similarly, Zhang and Wu [21] used an adaptive chaotic particle swarm optimization (ACPSO) algorithm for the classification of crops from synthetic aperture radar (SAR) images. In simulations, ACPSO was found to be superior to the back propagation (BP), adaptive BP (ABP), momentum back propagation (MBP), particle swarm optimization (PSO), and resilient back propagation (RPROP) methods [21]. Aziz et al.
[22] carried out a study on the performance of the particle swarm optimization algorithm in training an Elman RNN, comparing classification accuracy and convergence rate with an Elman recurrent network trained by the BP algorithm. The simulated results show that the proposed Elman recurrent network particle swarm optimization (ERNPSO) algorithm is better than the back propagation Elman recurrent network (BPERN) in terms of classification accuracy. However, in terms of convergence time, BPERN is much better than the proposed ERNPSO algorithm.
Cheng and Shen [23] proposed an improved Elman RNN to calculate radio propagation loss with a three-dimensional parabolic equation method, in order to decrease calculation time and improve the approximation performance of the network. The results show that the improved Elman network is efficient and feasible for predicting propagation loss compared with the simple Elman RNN. However, the Elman RNN loses significant data needed to train the network for predicting propagation. Wang et al. [24] used the Elman RNN to compute total nitrogen (TN), total phosphorus (TP), and dissolved oxygen (DO) at three different sites of Lake Taihu during the period of water diversion. The conceptual form of the Elman RNN for the different parameters was derived by means of principal component analysis (PCA) and validated on a water quality diversion dataset. The values of TN, TP, and DO calculated by the model were closely related to their respective measured values. The simulated results show that PCA can efficiently reduce the input parameters for the Elman RNN and can precisely compute and forecast the water quality parameters during the period of water diversion, although the approach is not free of the local minima problem.
In [25], an LM algorithm based on Elman and Jordan recurrent neural networks was used to forecast the annual peak load of the Java, Madura, and Bali interconnection for 2009–2011. The study checked the forecasting accuracy of the proposed LM based recurrent networks over the given period. From the simulation results, it is clear that the proposed LM based recurrent neural network performs better than the LM based feed forward neural network. After reviewing the above algorithms, it is found that traditional ANN training has drawbacks: a slow convergence rate, with no guarantee of finding the global minimum of the error function, since gradient descent may get stuck in local minima.
To overcome these weaknesses and to improve the convergence rate, this work proposes new hybrid metaheuristic search algorithms that make use of Cuckoo Search via Levy flight (CS) by Yang and Deb [26] to train the Elman recurrent network (ERN) and the back propagation Elman recurrent network (BPERN). The main goals of this study are to improve convergence to global minima, decrease the error, and accelerate the learning process through hybridization. The proposed algorithms, called CSERN and CSBPERN, imitate animal behaviour and are valuable for global optimization [27, 28]. The performance of the proposed CSERN and CSBPERN is verified on selected benchmark classification problems and compared with the artificial bee colony using the BPNN algorithm and other similar hybrid variants.
The remainder of the paper is organized as follows. Section 2 describes the proposed method. Results and discussion are presented in Section 3. Finally, the paper is concluded in Section 4.
2. Proposed Algorithms
In this section, we describe the proposed use of Cuckoo Search (CS) to train the Elman recurrent network (ERN) and the back propagation Elman recurrent network (BPERN).
2.1. CSERN Algorithm
In the proposed CSERN algorithm, each best nest represents a possible solution, that is, the weight space and the corresponding biases for ERN optimization. The weight optimization problem and the size of the population represent the quality of the solution. In the first epoch, the weights and biases are initialized with CS and then passed on to the ERN, where the network output is calculated. In the next cycle, CS updates the weights with the best possible solution and continues searching for the best weights until either the last cycle/epoch of the network is reached or the target MSE is achieved. The CS is a population based optimization algorithm; it starts with a random initial population. In the proposed CSERN algorithm, the weight space and the corresponding biases for ERN optimization are initialized by the weight values and matrices given in (1) and (2) as follows:

$$w_{jk} = \operatorname{rand}(0, a) + b, \tag{1}$$

$$W^{(i)} = \bigl[w_{11}, w_{12}, \ldots, w_{JK}\bigr], \tag{2}$$

where $w_{jk}$ is the $jk$th weight value in a weight matrix. The $\operatorname{rand}$ in (1) is a random number in the range $(0, a)$, where $a$ is any constant parameter for the proposed method (less than 1) and $b$ is a bias value. Hence, the list of weight matrices is given as follows:

$$W = \bigl\{W^{(1)}, W^{(2)}, \ldots, W^{(N)}\bigr\}. \tag{3}$$

From the neural network process, the sum of squared errors is easily computed for every weight matrix in $W$. For the ERN structure, a three-layer network with one input layer, one hidden or "state" layer, and one output layer is used. Each layer has its own index variable: $k$ for output nodes, $j$ (and $h$) for hidden nodes, and $i$ for input nodes. In a feed forward network, the input vector $x$ is propagated through a weight layer $V$:

$$net_j(t) = \sum_{i=1}^{n} v_{ji}\, x_i(t) + \theta_j, \tag{4}$$

where $n$ is the number of inputs and $\theta_j$ is a bias.
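The nest encoding described above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: each Cuckoo nest is one flat vector holding every ERN weight and bias for an n-input, m-hidden, k-output Elman network, and all identifiers (`nest_size`, `init_nests`, `unpack`, the value of `a`) are our own assumptions.

```python
import numpy as np

def nest_size(n, m, k):
    # input->hidden + recurrent (hidden->hidden) + hidden->output + biases
    return n * m + m * m + m * k + m + k

def init_nests(num_nests, n, m, k, a=0.5, rng=None):
    """Random initial population; each value drawn in (0, a) with a < 1."""
    rng = rng or np.random.default_rng(0)
    return rng.uniform(0.0, a, size=(num_nests, nest_size(n, m, k)))

def unpack(nest, n, m, k):
    """Split a flat nest back into the ERN weight matrices and bias vectors."""
    i = 0
    V = nest[i:i + n * m].reshape(m, n); i += n * m   # input -> hidden
    U = nest[i:i + m * m].reshape(m, m); i += m * m   # recurrent layer
    W = nest[i:i + m * k].reshape(k, m); i += m * k   # hidden -> output
    bh = nest[i:i + m]; i += m                        # hidden-layer biases
    bo = nest[i:i + k]                                # output-layer biases
    return V, U, W, bh, bo

# a population of 10 nests for a 4-5-3 network (as used for Iris later)
nests = init_nests(num_nests=10, n=4, m=5, k=3)
V, U, W, bh, bo = unpack(nests[0], 4, 5, 3)
```

Evaluating one nest then amounts to unpacking it and running the Elman forward pass to obtain the nest's SSE.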
In a simple recurrent network, the input vector is similarly propagated through a weight layer, but also combined with the previous state activation through an additional recurrent weight layer $U$:

$$net_j(t) = \sum_{i=1}^{n} v_{ji}\, x_i(t) + \sum_{h=1}^{m} u_{jh}\, y_h(t-1) + \theta_j, \qquad y_j(t) = f\bigl(net_j(t)\bigr), \tag{5}$$

where $m$ is the number of "state" nodes. The output of the network is in both cases determined by the state and a set of output weights $W$:

$$net_k(t) = \sum_{j=1}^{m} w_{kj}\, y_j(t) + \theta_k, \qquad y_k(t) = g\bigl(net_k(t)\bigr), \tag{6}$$

where $g$ is an output function. Hence, the error can be calculated as follows:

$$e_k(t) = d_k(t) - y_k(t). \tag{7}$$

The performance index for the network is given as follows:

$$E = \sum_{k} e_k^2(t). \tag{8}$$

In the proposed method the average sum of squares is the performance index, and it is calculated as follows:

$$E_{avg} = \frac{1}{N}\sum_{i=1}^{N} E_i, \tag{9}$$

where $y_k$ is the output of the network when the $i$th input is presented, $e_k = d_k - y_k$ is the error for the output layer, $E_{avg}$ is the average performance, $E$ is the performance index, and $N$ is the number of Cuckoo populations in the $i$th iteration. At the end of each epoch, the list of average sums of squared errors of the $i$th iteration is calculated as follows:

$$SSE = \bigl\{E_{avg}^{1}, E_{avg}^{2}, \ldots, E_{avg}^{N}\bigr\}. \tag{10}$$

The Cuckoo Search retains the nest with the minimum sum of squared errors (MSE), which is found when all the inputs have been processed for each population of the Cuckoo nest. Hence, the best Cuckoo nest is calculated as follows:

$$x_{best} = \min\{SSE\}. \tag{11}$$

The rest of the average sums of squares are considered as the other Cuckoo nests. A new solution for Cuckoo $i$ is generated using a Levy flight according to the following equation:

$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \operatorname{Levy}(\lambda). \tag{12}$$

Hence, the movement of the other Cuckoo $x_i$ toward $x_{best}$ can be drawn from (13) as follows:

$$\Delta X = \alpha \oplus \operatorname{Levy}(\lambda)\,\bigl(x_{best} - x_i^{(t)}\bigr). \tag{13}$$

The Cuckoo Search can move from $x_i$ toward $x_{best}$ through a Levy flight; this can be written as

$$x_i^{(t+1)} = x_i^{(t)} + \Delta X, \tag{14}$$

where $\Delta X$ is a small movement of $x_i$ toward $x_{best}$. The weights and biases for each layer are then adjusted as follows:

$$W^{(t+1)} = W^{(t)} + \Delta X. \tag{15}$$

The pseudocode for the CSERN algorithm is given in Pseudocode 1.
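The Levy-flight move that proposes new nests can be sketched as follows. Mantegna's algorithm is one standard way to draw Levy-stable step lengths; the values of `alpha` and `lam` here, and all function names, are assumptions of this sketch rather than parameters from the paper.

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(dim, lam=1.5, rng=None):
    """Levy-stable step lengths via Mantegna's algorithm."""
    rng = rng or np.random.default_rng(1)
    sigma = (gamma(1 + lam) * sin(pi * lam / 2) /
             (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / lam)

def move_toward_best(nest, best, alpha=0.01, rng=None):
    """Small Levy-scaled movement of a nest x_i toward the best nest."""
    delta = alpha * levy_step(nest.size, rng=rng) * (best - nest)
    return nest + delta

# one candidate move of a nest of 8 weights toward an all-zero best nest
best = np.zeros(8)
candidate = move_toward_best(np.full(8, 0.3), best)
```

The entry-wise product plays the role of the stepwise multiplication in the Levy-flight update, so each weight moves by its own random step length.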

2.2. CSBPERN Algorithm
In the proposed CSBPERN algorithm, each best nest represents a possible solution, that is, the weight space and the corresponding biases for BPERN optimization. The weight optimization problem and the size of the population represent the quality of the solution. In the first epoch, the best weights and biases are initialized with CS and then passed on to the BPERN, where the network output is calculated. In the next cycle, CS updates the weights with the best possible solution and continues searching for the best weights until either the last cycle/epoch of the network is reached or the target MSE is achieved.
The CS is a population based optimization algorithm; like other metaheuristic algorithms, it starts with a random initial population.
In CSBPERN, the weight values of a matrix are calculated with (1) and (2) as given in Section 2.1, and the weight matrix is updated with (3). From the neural network process, the sum of squared errors (SSE) is easily computed for every weight matrix in $W$. For the BPERN structure, a three-layer network with one input layer, one hidden or "state" layer, and one output layer is used. In the CSBPERN network, the input vector is propagated through a weight layer using (4). In a simple recurrent network, the input vector is not only propagated through a weight layer, but also combined with the previous state activation through an additional recurrent weight layer $U$, as given in (5). The output of the network in both cases is determined by the state and a set of output weights $W$, as given in (6).
According to gradient descent, each weight change in the network should be proportional to the negative gradient of the cost with respect to the specific weight:

$$\Delta w \propto -\frac{\partial E}{\partial w}. \tag{16}$$

Thus, the error for the output nodes is calculated as follows:

$$\delta_k(t) = \bigl(d_k(t) - y_k(t)\bigr)\, g'\bigl(net_k(t)\bigr), \tag{17}$$

and for the hidden nodes the error is given as follows:

$$\delta_j(t) = f'\bigl(net_j(t)\bigr)\sum_{k}\delta_k(t)\, w_{kj}. \tag{18}$$

Thus the weights and biases are simply changed for the output layer as

$$\Delta w_{kj} = \eta\,\delta_k(t)\, y_j(t), \tag{19}$$

and for the input layer the weight change is given as

$$\Delta v_{ji} = \eta\,\delta_j(t)\, x_i(t). \tag{20}$$

Adding a time subscript, the recurrent weights can be modified according to (21) as follows:

$$\Delta u_{jh} = \eta\,\delta_j(t)\, y_h(t-1). \tag{21}$$

The network error for CSBPERN is calculated using (7) from Section 2.1, and the performance indices for the network are measured with (8) and (9). At the end of each epoch, the list of average sums of squared errors of the $i$th iteration is calculated with (10). The Cuckoo Search retains the minimum SSE, which is found when all the inputs have been processed for each population of the Cuckoo nest; hence, the best Cuckoo nest is calculated using (11). A new solution for Cuckoo $i$ is generated using a Levy flight according to (12). The movement of the other Cuckoo $x_i$ toward $x_{best}$ is controlled through (13), and the Cuckoo Search moves from $x_i$ toward $x_{best}$ through the Levy flight as written in (14). The weights and biases for each layer are then adjusted with (15).
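The gradient descent updates described above can be sketched as one BPERN training step. This is a minimal sketch under assumed shapes and names: log-sigmoid activations on both layers (so the derivative is $y(1-y)$), output and hidden error terms, and updates to the feed forward, recurrent, and bias weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpern_step(x, y_prev, d, V, U, W, bh, bo, eta=0.4):
    """One gradient step for a 3-layer Elman network; updates weights in place."""
    net_h = V @ x + U @ y_prev + bh       # hidden net input with recurrence
    y_h = sigmoid(net_h)                  # state activation
    net_o = W @ y_h + bo
    y_o = sigmoid(net_o)                  # network output
    delta_k = (d - y_o) * y_o * (1 - y_o)            # output-node error
    delta_j = y_h * (1 - y_h) * (W.T @ delta_k)      # hidden-node error
    W += eta * np.outer(delta_k, y_h)     # hidden -> output update
    V += eta * np.outer(delta_j, x)       # input -> hidden update
    U += eta * np.outer(delta_j, y_prev)  # recurrent-weight update
    bo += eta * delta_k
    bh += eta * delta_j
    return y_o, y_h                       # new state feeds the next time step

rng = np.random.default_rng(0)
V, U, W = rng.normal(size=(5, 4)), rng.normal(size=(5, 5)), rng.normal(size=(2, 5))
bh, bo = np.zeros(5), np.zeros(2)
y_o, y_h = bpern_step(np.ones(4), np.zeros(5), np.array([1.0, 0.0]), V, U, W, bh, bo)
```

In CSBPERN this local gradient step is interleaved with the Cuckoo Search's global exploration of the nest population.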
The pseudocode for CSBPERN algorithm is given in Pseudocode 2.

3. Result and Discussion
3.1. Datasets
This study focuses on two criteria for the performance analysis: (a) achieving a low mean square error (MSE) and (b) achieving high average classification accuracy on the test data of each benchmark problem. Benchmark datasets taken from the UCI Machine Learning Repository were used to validate the accuracy of the proposed algorithms. For the experiments, the data were arranged into training and testing sets; the algorithms were trained on the training set, and their accuracy was measured on the corresponding test set. The workstation used for the experiments is equipped with a 2 GHz processor and 2 GB of RAM, running Microsoft Windows XP (Service Pack 3). Matlab R2010a was used to simulate the proposed algorithms. Seven classification problems are selected: Thyroid Disease [29], Breast Cancer [30], IRIS [31], Glass [32], Australian Credit Card Approval [33], Pima Indian Diabetes [34], and 7-Bit Parity [35, 36]. The following algorithms are analyzed and simulated on these problems: (a) the conventional back propagation neural network (BPNN) algorithm; (b) the artificial bee colony back propagation (ABCBP) algorithm; (c) the artificial bee colony neural network (ABCNN) algorithm; (d) the artificial bee colony Levenberg-Marquardt (ABCLM) algorithm; (e) the Cuckoo Search Elman recurrent network (CSERN) algorithm; (f) the Cuckoo Search back propagation Elman recurrent network (CSBPERN) algorithm. To compare the performance of the proposed CSERN and CSBPERN algorithms with the conventional BPNN, ABCBP, and ABCLM, the same network parameters are used: the number of hidden layers, the number of nodes in the hidden layer, the weight initialization range, and the learning rate. A three-layer NN is used for training and testing the model.
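The two evaluation criteria above can be sketched directly; the function names here are ours, not from the paper, and one-hot target encoding is assumed.

```python
import numpy as np

def mse(targets, outputs):
    """Mean square error over all output units and patterns."""
    t, o = np.asarray(targets, float), np.asarray(outputs, float)
    return float(np.mean((t - o) ** 2))

def accuracy_percent(targets, outputs):
    """With one-hot targets, the predicted class is the largest output."""
    hits = np.argmax(targets, axis=1) == np.argmax(outputs, axis=1)
    return float(np.mean(hits)) * 100.0

# three two-class patterns: the first two are classified correctly
t = np.array([[1, 0], [0, 1], [1, 0]])
o = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])
# accuracy_percent(t, o) -> 66.67 percent (2 of 3 correct)
```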
For all problems, the NN structure has a single hidden layer of five nodes, while the numbers of input and output nodes vary according to the dataset. From the input layer to the hidden layer and from the hidden layer to the output layer, the log-sigmoid activation function is used as the transfer function.
While the simple Elman neural network (SENN) uses the pure linear (purelin) activation function for the output layer, a learning rate of 0.4 is selected for all tests. All algorithms were tested with initial weights and biases randomly initialized in the same range; for each problem, one trial is limited to 1000 epochs. A total of 20 trials are run for each dataset to validate these algorithms. For each trial, the network results are stored in a result file. The mean square error (MSE), the standard deviation of the mean square error (SD), the number of epochs, and the average accuracy are recorded in a separate file for each trial of each selected classification problem.
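The trial protocol just described can be sketched as a small driver loop: 20 independent runs per dataset, each capped at 1000 epochs, with the mean and standard deviation of the final MSE recorded. `train_once` is a stand-in for any of the compared algorithms and is an assumption of this sketch.

```python
import numpy as np

def run_trials(train_once, num_trials=20, max_epochs=1000, seed=42):
    """Run independent trials and summarize the final MSE values."""
    rng = np.random.default_rng(seed)
    final_mse = [train_once(rng, max_epochs) for _ in range(num_trials)]
    return float(np.mean(final_mse)), float(np.std(final_mse))

# stand-in trainer: returns a pretend final MSE for one trial
mean_mse, sd_mse = run_trials(lambda rng, epochs: rng.uniform(0.0, 0.01))
```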
3.2. Wisconsin Breast Cancer Classification Problem
The Breast Cancer dataset was created by William H. Wolberg. It deals with breast tumor tissue samples collected from different patients, and the analysis classifies each tumor as benign or malignant. The dataset consists of 9 inputs and 2 outputs with 699 instances. The input attributes are the clump thickness, uniformity of cell size, uniformity of cell shape, amount of marginal adhesion, single epithelial cell size, frequency of bare nuclei, bland chromatin, normal nucleoli, and mitoses. The network architecture used for the Breast Cancer Classification Problem consists of 9 input nodes, 5 hidden nodes, and 2 output nodes.
Table 1 illustrates that the proposed CSERN and CSBPERN algorithms perform better than the BPNN, ABCNN, ABCBP, and ABCLM algorithms. The proposed algorithms achieve small MSE (–, 0.00072) and SD (–, 0.0004) with 99.95 and 97.37 percent accuracy, respectively. Meanwhile, the other algorithms fall behind the proposed ones with large MSE (0.271, 0.014, 0.184, and 0.013) and SD (0.017, 0.0002, 0.459, and 0.001) and lower accuracy. Similarly, Figure 1 shows the MSE convergence of the compared algorithms. From the simulation results, it is clear that the proposed algorithms perform better than the BPNN, ABCNN, ABCBP, and ABCLM algorithms in terms of MSE, SD, and accuracy.

3.3. IRIS Classification Problem
The Iris flower multivariate dataset was introduced by Fisher to demonstrate discriminant analysis in pattern recognition and machine learning, that is, finding linear combinations of features that separate two or more classes. It is perhaps the best-known database in the pattern recognition literature. There are 150 instances, 4 inputs, and 3 outputs in this dataset. Classification of the Iris dataset maps petal width, petal length, sepal length, and sepal width into three species classes: Iris setosa, Iris versicolor, and Iris virginica. The selected network structure for the Iris classification dataset is 4-5-3, that is, 4 input nodes, 5 hidden nodes, and 3 output nodes. In total, 75 instances are used for the training dataset and the rest for the testing dataset.
Table 2 compares the performance of the proposed CSERN and CSBPERN algorithms with the BPNN, ABCNN, ABCBP, and ABCLM algorithms in terms of MSE, SD, and accuracy. From Table 2 it is clear that the proposed algorithms perform better, achieving lower MSE and SD and higher accuracy than the BPNN, ABCNN, ABCBP, and ABCLM algorithms. Figure 2 illustrates the MSE convergence of the algorithms and confirms that the proposed algorithms outperform the others in terms of MSE, SD, and accuracy.

3.4. Thyroid Classification Problem
This dataset is taken from the UCI Machine Learning Repository and is based on the "Thyroid Disease" problem. It consists of 21 inputs, 3 outputs, and 7200 patterns. Each case contains 21 attributes, which can be allocated to any of three classes, namely hyper-, hypo-, and normal function of the thyroid gland, based on patient query data and patient examination data. The selected network architecture for the Thyroid classification dataset is 21-5-3, that is, 21 input nodes, 5 hidden nodes, and 3 output nodes.
Table 3 summarizes the performance of all the algorithms in terms of MSE, SD, and accuracy. From the table, it is easy to see that the proposed CSERN and CSBPERN algorithms have small MSE and SD and high accuracy, while the BPNN, ABCNN, ABCBP, and ABCLM algorithms still have large MSE and SD with low accuracy. Figure 3 also shows the MSE convergence of the proposed and compared algorithms. From the simulation results, the proposed algorithms perform better in terms of MSE, SD, and accuracy than the other compared algorithms.

3.5. Diabetes Classification Problem
This dataset consists of 768 examples, 8 inputs, and 2 outputs, and contains information about chemical changes in a female body whose disparity can cause diabetes. The feed forward network topology for this problem is set to 8-5-2. The target error for the Diabetes Classification Problem is set to 0.00001 and the maximum number of epochs is 1000. It is evident from Table 4 that the proposed CSERN and CSBPERN algorithms perform better than the BPNN, ABCNN, ABCBP, and ABCLM algorithms in terms of MSE, SD, and accuracy. From Table 4, the proposed algorithms have MSE of –, 0.039 and SD of –, 0.003, and achieved 99.96 and 89.53 percent accuracy, respectively. Meanwhile, the other algorithms, BPNN, ABCNN, ABCBP, and ABCLM, have MSE of 0.26, 0.131, 0.2, and 0.14, SD of 0.026, 0.021, 0.002, and 0.033, and accuracy of 86.96, 68.09, 88.16, and 56.09 percent, which is considerably lower than the proposed algorithms. Figure 4 shows the MSE convergence of the algorithms for the Diabetes Classification Problem.

3.6. Glass Classification Problem
The Glass dataset, taken from the UCI Machine Learning Repository, is used in criminal investigation to separate glass splinters into six classes: float-processed or non-float-processed building windows, vehicle windows, containers, tableware, and headlamps. The dataset comprises 9 inputs and 6 outputs with 214 examples. The selected feed forward network architecture is 9-5-6.
Table 5 summarises the comparative performance of the algorithms. From the table it is clear that the proposed algorithms outperform the others. The proposed CSERN and CSBPERN algorithms achieve small MSE of –, 0.0005, SD of –, 0.0002, and high accuracy of 99.96 and 97.81 percent. Meanwhile, the BPNN, ABCNN, ABCBP, and ABCLM algorithms have larger MSE of 0.36, –, 0.025, and 0.005, SD of 0.048, 0.003, 0.002, and 0.009, and accuracy of 94.04, 91.93, 94.09, and 93.96 percent, which is lower than the proposed algorithms. Figure 5 shows the MSE convergence of the algorithms across epochs. Overall, the proposed algorithms perform better than the other compared algorithms in terms of MSE, SD, and accuracy.

3.7. Australian Credit Card Approval Classification Problem
This dataset, taken from the UCI Machine Learning Repository, contains the details of credit card applications. The Australian Credit Card dataset consists of 690 instances, 51 inputs, and 2 outputs. Each example represents a real credit card application and whether the bank (or a similar institution) granted the card. All attribute names and values have been changed to meaningless symbols to protect the privacy of the data. The selected NN architecture is 51-5-2.
Table 6 gives the detailed results of the proposed and compared algorithms, showing that the proposed CSERN and CSBPERN algorithms achieve high accuracy of 99.92 and 85.75 percent, with MSE of –, 0.021 and SD of –, 0.0091. Meanwhile, the other algorithms, that is, BPNN, ABCNN, ABCBP, and ABCLM, have accuracy of 88.89, 76.79, 77.78, and 89.99 percent, SD of 0.015, 0.012, 0.005, and 0.04, and MSE of 0.271, 0.13, 0.055, and 0.17, which is considerably larger than the proposed algorithms. Similarly, Figure 6 shows the MSE convergence of the algorithms for the Australian Credit Card Approval Classification Problem; the proposed algorithms again achieve better results than the compared algorithms.

3.8. Seven-Bit Parity Classification Problem
The parity problem is one of the most popular initial testing tasks and a very demanding classification problem for neural networks. In the parity problem, if a given input vector contains an odd number of ones, the corresponding target value is 1; otherwise the target value is 0. The N-bit parity training set consists of 2^N training pairs, with each training pair comprising an N-length input vector and a single binary target value. The 2^N input vectors represent all possible combinations of N binary digits. The selected NN architecture is 7-5-1. Table 7 gives a detailed summary of the algorithms in terms of MSE, SD, and accuracy. From the table, it is clear that the proposed CSERN and CSBPERN algorithms perform better than the BPNN, ABCNN, ABCBP, and ABCLM algorithms in terms of MSE, SD, and accuracy. The proposed algorithms have MSE of –, 0.052 and SD of –, 0.005, and achieve 99.98 and 89.28 percent accuracy. Meanwhile, the BPNN, ABCNN, ABCBP, and ABCLM algorithms converge with MSE of 0.26, 0.10, 0.12, and 0.08, SD of 0.014, 0.015, 0.008, and 0.012, and 85.12, 67.85, 82.12, and 69.13 percent accuracy, which is considerably lower than that of the proposed algorithms. Finally, Figure 7 shows the MSE convergence of the algorithms for the 7-Bit Parity Classification Problem.
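The N-bit parity training set described above can be generated in a few lines; `parity_dataset` is an illustrative name of our own.

```python
import numpy as np
from itertools import product

def parity_dataset(n):
    """All 2^n binary input vectors, each paired with its parity target."""
    X = np.array(list(product([0, 1], repeat=n)))
    y = X.sum(axis=1) % 2        # odd count of ones -> target 1
    return X, y

X, y = parity_dataset(7)
# 2^7 = 128 input vectors of length 7; exactly half have odd parity
```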

4. Conclusion
This paper has studied the data classification problem using the dynamic behavior of an RNN trained by the nature-inspired metaheuristic Cuckoo Search algorithm, which provides a derivative-free way to optimize complex problems. The paper has also proposed new metaheuristic Cuckoo Search based ERN and BPERN algorithms in order to achieve a fast convergence rate and to avoid the local minima problem of conventional RNN training. Unlike the existing algorithms, the proposed CSERN and CSBPERN imitate animal behaviour and are valuable for global convergence. The convergence behaviour and performance of the proposed CSERN and CSBPERN are simulated on selected benchmark classification problems; specifically, 7-Bit Parity and selected UCI benchmark classification datasets are used for training and testing the network. The performances of the proposed models are compared with the artificial bee colony using the BPNN algorithm and other hybrid variants. The simulation results show that the proposed CSERN and CSBPERN algorithms are far better than the baseline algorithms in terms of convergence rate; furthermore, CSERN and CSBPERN achieved higher accuracy and lower MSE on all the designated datasets.
Summary of Acronyms, Mathematical Symbols, and Their Meanings Used
ABP:  Adaptive back propagation 
ANN:  Artificial neural network 
ACPSO:  Adaptive chaotic particle swarm optimization 
BP:  Back propagation 
BPERN:  Back propagation Elman recurrent network 
BPTT:  Back propagation through time 
CS:  Cuckoo Search 
CSERN:  Cuckoo Search Elman recurrent network 
CSBPERN:  Cuckoo Search back propagation Elman recurrent network 
DO:  Dissolved oxygen 
ERN:  Elman recurrent network 
ERNPSO:  Elman recurrent network particle swarm optimization 
FCRNN:  Fully connected recurrent neural network 
MBP:  Momentum back propagation 
PCA:  Principal component analysis 
PSO:  Particle swarm optimization 
PSOBP:  Particle swarm optimization back propagation 
RPROP:  Resilient back propagation 
RNN:  Recurrent neural network 
SAR:  Synthetic aperture radar 
SSE:  Sum of square errors 
TN:  Total nitrogen 
TP:  Total phosphorus 
$w$:  Weight value at each layer in the feed forward network 
$u$:  Weight value at each additional layer in the recurrent feedback 
$\theta$:  Bias values for the network 
$W$:  Total weight matrix for the network 
$f$:  Output function for the hidden layer 
$g$:  Output function for the output layer 
$net_j$:  Net activation function for the hidden layer 
$net_k$:  Net input activation function for the output layer 
$d$:  Actual output 
$y$:  Predicted output 
$E_{avg}$:  Average performance 
$E$:  Performance index 
$\operatorname{rand}$:  Random function for generating random variables 
$e$:  Error at the output layer 
$x_i$:  Cuckoo nest at position $i$ 
$x_{best}$:  Cuckoo nest at the best position 
$x_i^{(t+1)}$:  New solution for Cuckoo $i$ 
$\oplus$:  Stepwise multiplication 
$\Delta X$:  Small movement of Cuckoo $x_i$ towards $x_{best}$ 
$\delta_k$:  Error for output nodes 
$\delta_j$:  Error for hidden nodes 
$\Delta w$:  Change in weights for the layers 
$\Delta\theta$:  Change in bias weights for the layers 
$\Delta u$:  Change in weights for the recurrent layer. 
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Authors’ Contribution
All authors equally contributed to this paper.
Acknowledgments
This work is supported by Fundamental Research Grant Scheme (FRGS) Vote no. 1236 from MoHE Malaysia and Program Rakan Penyelidikan University of Malaya (PRPUM) Grant Vote no. CG0632013.
References
1. Y. Zhang and L. Wu, “Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network,” Expert Systems with Applications, vol. 36, no. 5, pp. 8849–8854, 2009.
2. L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice Hall, Englewood Cliffs, NJ, USA, 1994.
3. S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, NY, USA, 1994.
4. M. H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press, London, UK, 1995.
5. P. Xiao, G. K. Venayagamoorthy, and K. A. Corzine, “Combined training of recurrent neural networks with particle swarm optimization and backpropagation algorithms for impedance identification,” in Proceedings of the IEEE Swarm Intelligence Symposium (SIS '07), pp. 9–15, April 2007.
6. C. L. Giles, S. Lawrence, and A. C. Tsoi, “Rule inference for financial prediction using recurrent neural networks,” in Proceedings of the IEEE/IAFE Conference on Computational Intelligence for Financial Engineering, pp. 253–259, IEEE, March 1997.
7. S. Li, D. C. Wunsch, E. O'Hair, and M. G. Giesselmann, “Wind turbine power estimation by neural networks with Kalman filter training on a SIMD parallel machine,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '99), pp. 3430–3434, Washington, DC, USA, July 1999.
8. N. M. Nawi, A. Khan, and M. Z. Rehman, “CSBPRNN: a new hybridization technique using cuckoo search to train back propagation recurrent neural network,” in Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013), vol. 285 of Lecture Notes in Electrical Engineering, pp. 111–118, 2014.
9. P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
10. P. Coulibaly, F. Anctil, and J. Rousselle, “Real-time short-term natural water inflows forecasting using recurrent neural networks,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '99), vol. 6, pp. 3802–3805, IEEE Press, July 1999.
11. N. M. Nawi, R. S. Ransing, and N. A. Hamid, “BPGD-AG: a new improvement of back-propagation neural network learning algorithms with adaptive gain,” Journal of Science and Technology, vol. 2, pp. 83–102, 2011.
12. A. Khan, N. M. Nawi, and M. Z. Rehman, “A new back-propagation neural network optimized with cuckoo search algorithm,” in Computational Science and Its Applications—ICCSA 2013: Proceedings of the 13th International Conference, Ho Chi Minh City, Vietnam, June 24–27, 2013, Part I, vol. 7971 of Lecture Notes in Computer Science, pp. 413–426, Springer, Berlin, Germany, 2013.
13. A. Khan, N. M. Nawi, and M. Z. Rehman, “A new cuckoo search based Levenberg-Marquardt (CSLM) algorithm,” in Computational Science and Its Applications—ICCSA 2013: Proceedings of the 13th International Conference, Ho Chi Minh City, Vietnam, June 24–27, 2013, Part I, vol. 7971 of Lecture Notes in Computer Science, pp. 438–451, Springer, Berlin, Germany, 2013.
14. N. M. Nawi, A. Khan, and M. Z. Rehman, “A new Levenberg-Marquardt based back propagation algorithm trained with cuckoo search,” Procedia Technology, vol. 11, pp. 18–23, 2013.
15. N. M. Nawi, M. Z. Rehman, and A. Khan, “A new bat based back-propagation (BAT-BP) algorithm,” in Advances in Systems Science: Proceedings of the International Conference on Systems Science 2013 (ICSS 2013), vol. 240 of Advances in Intelligent Systems and Computing, pp. 395–404, Springer, 2014.
16. N. M. Nawi, R. Ghazali, and M. N. M. Salleh, “The development of improved back-propagation neural networks algorithm for predicting patients with heart disease,” in Information Computing and Applications: Proceedings of the 1st International Conference, ICICA 2010, Tangshan, China, October 15–18, 2010, vol. 6377 of Lecture Notes in Computer Science, pp. 317–324, Springer, Berlin, Germany, 2010.
17. Y. Zhang, S. Wang, G. Ji, and P. Phillips, “Fruit classification using computer vision and feedforward neural network,” Journal of Food Engineering, vol. 143, pp. 167–177, 2014.
18. T. G. Barbounis, J. B. Theocharis, M. C. Alexiadis, and P. S. Dokopoulos, “Long-term wind speed and power forecasting using local recurrent neural network models,” IEEE Transactions on Energy Conversion, vol. 21, no. 1, pp. 273–284, 2006.
19. E. D. Übeyli, “Recurrent neural networks employing Lyapunov exponents for analysis of Doppler ultrasound signals,” Expert Systems with Applications, vol. 34, no. 4, pp. 2538–2544, 2008.
20. A. M. Ahmad, S. Ismail, and D. F. Samaon, “Recurrent neural network with backpropagation through time for speech recognition,” in Proceedings of the IEEE International Symposium on Communications and Information Technologies: Smart InfoMedia Systems (ISCIT '04), pp. 98–102, October 2004.
21. Y. Zhang and L. Wu, “Crop classification by forward neural network with adaptive chaotic particle swarm optimization,” Sensors, vol. 11, no. 5, pp. 4721–4743, 2011.
22. M. F. A. Aziz, H. N. A. Hamed, and S. M. H. Shamsuddin, “Augmentation of Elman Recurrent Network learning with particle swarm optimization,” in Proceedings of the 2nd Asia International Conference on Modelling & Simulation (AMS '08), pp. 625–630, May 2008.
23. F. Cheng and H. Shen, “An improved recurrent neural network for radio propagation loss prediction,” in Proceedings of the International Conference on Intelligent Computation Technology and Automation (ICICTA '10), pp. 579–582, May 2010.
24. H. Wang, Y. Gao, Z. Xu, and W. Xu, “A recurrent neural network application to forecasting the quality of water diversion in the water source of Lake Taihu,” in Proceedings of the International Conference on Remote Sensing, Environment and Transportation Engineering (RSETE '11), pp. 984–988, June 2011.
25. Y. Tanoto, W. Ongsakul, and C. O. P. Marpaung, “Levenberg-Marquardt recurrent networks for long-term electricity peak load forecasting,” Telkomnika (Telecommunication, Computing, Electronics and Control), vol. 9, pp. 257–266, 2013.
26. X.-S. Yang and S. Deb, “Cuckoo search via Lévy flights,” in Proceedings of the World Congress on Nature and Biologically Inspired Computing (NABIC '09), pp. 210–214, Coimbatore, India, December 2009.
27. X.-S. Yang and S. Deb, “Engineering optimisation by cuckoo search,” International Journal of Mathematical Modelling and Numerical Optimisation, vol. 1, no. 4, pp. 330–343, 2010.
28. M. Tuba, M. Subotic, and N. Stanarevic, “Modified cuckoo search algorithm for unconstrained optimization problems,” in Proceedings of the 5th European Computing Conference (ECC '11), pp. 263–268, April 2011.
29. J. R. Quinlan, P. J. Compton, K. A. Horn, and L. Lazarus, “Inductive knowledge acquisition: a case study,” in Proceedings of the 2nd Australian Conference on Applications of Expert Systems, pp. 137–156, 1986.
30. W. H. Wolberg and O. L. Mangasarian, “Multisurface method of pattern separation for medical diagnosis applied to breast cytology,” Proceedings of the National Academy of Sciences of the United States of America, vol. 87, no. 23, pp. 9193–9196, 1990.
31. R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936.
32. I. W. Evett and E. J. Spiehler, “Rule induction in forensic science,” in Knowledge Based Systems in Government, pp. 107–118, 1987.
33. J. R. Quinlan, “Simplifying decision trees,” International Journal of Man-Machine Studies, vol. 27, no. 3, pp. 221–234, 1987.
34. R. A. Jacobs, “Increased rates of convergence through learning rate adaptation,” Neural Networks, vol. 1, no. 4, pp. 295–307, 1988.
35. 7-Bit Parity Training data, http://homepages.cae.wisc.edu/~ece539/data/parity7r.
36. 7-Bit Parity Testing data, http://homepages.cae.wisc.edu/~ece539/data/parity7t.
Copyright
Copyright © 2015 Nazri Mohd Nawi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.