#### Abstract

Multistep-ahead prediction of a chaotic time series is a difficult task that has attracted increasing interest in recent years. The focus of this work is the development of nonlinear neural network models for multistep chaotic time series prediction. The literature contains a wide range of approaches, but their success depends on the predictive performance of the individual methods, and the most popular neural models are based on statistical and traditional feed-forward neural networks. Such models, however, may present disadvantages when long-term prediction is required. In this paper a focused time-lagged recurrent neural network (FTLRNN) model with gamma memory is developed for different prediction horizons. It is observed that this predictor performs remarkably well for short-term as well as medium-term predictions. For chaotic time series generated from nonlinear differential equations, such as Mackey-Glass and Duffing, the FTLRNN-based predictor performs consistently well for depths of prediction ranging from short term to long term, with only slight deterioration after the prediction horizon k is increased beyond 50. For real-world, highly complex, and nonstationary time series such as Sunspots and Laser, the proposed predictor performs reasonably for short-term and medium-term predictions, but its prediction ability drops for long-term-ahead prediction. Even so, these are the best prediction results obtainable, considering that these are nonstationary time series; in fact, no other NN configuration tried could match the performance of the FTLRNN model. The authors evaluated the performance of this FTLRNN model in predicting the dynamic behavior of the typical chaotic Mackey-Glass and Duffing time series and of two real-world chaotic time series, the monthly sunspots and laser data.
A static multilayer perceptron (MLP) model is also attempted and compared against the proposed model on performance measures such as the mean squared error (MSE), normalized mean squared error (NMSE), and correlation coefficient (r). The standard back-propagation algorithm with a momentum term has been used for both models.

#### 1. Introduction

Predicting the future, the goal of many research activities over the last century, is an important problem for humans, arising from the fear of unknown phenomena and calamities all around a world whose many variables show highly nonlinear and chaotic behavior. Chaotic time series have many applications in various fields of science, for example, astrophysics, fluid mechanics, medicine, the stock market, and weather, and are also useful in engineering applications such as speech coding [1] and radar modeling of electromagnetic wave propagation and scattering [2]. The chaotic interconnected complex dynamical systems in nature are characterized by high sensitivity to initial conditions, which results in long-term unpredictability. Dynamical reconstruction seems extremely difficult, even in the era of supercomputers, not because of computational complexity, but due to the inaccessibility of perfect inputs and state variables. Many different methods have been developed to deal with chaotic time series prediction. Among them, neural networks occupy an important place, being an adequate model of nonlinearity and nonstationarity.

Inspired by the structure of the human brain and the way it is supposed to operate, neural networks are parallel computational systems capable of solving a number of complex problems in such diverse areas as pattern recognition, computer vision, robotics, control, and medical diagnosis, to name just a few [3]. Neural networks are an effective tool for performing nonlinear input-output mappings and prediction [4]. Predicting a chaotic time series using a neural network is of particular interest [5]. Not only is it an efficient method to reconstruct a dynamical system from an observed time series, but it also has many applications in engineering problems like radar noise cancellation [6], radar [7], demodulation of chaotic secure communication systems [8], and spread spectrum/code division multiple access (CDMA) systems [9, 10]. It is already established that, under appropriate conditions, neural networks are able to uniformly approximate any complex continuous function to any desired degree of accuracy [11]. Later, similar results were published independently in [12]. It is these fundamental results that allow us to employ neural networks in time series prediction. Since neural network models do not need any a priori assumption about the underlying statistical distribution of the series to be predicted, they are commonly classified as a “data-driven” approach, in contrast to the “model-driven” statistical methods. Neural networks, as instruments in the broad sense, can learn complex nonlinear mappings from a set of observations [13]. The static MLP network has gained immense popularity from the numerous practical applications published over the past decade; there seems to be substantial evidence that the multilayer perceptron indeed possesses an impressive ability [14]. There have been some theoretical results that try to explain the reasons for this success in [15, 16].
Most applications are based on feed-forward neural networks, such as the back-propagation (BP) network [17] and Radial basis function (RBF) network [18, 19]. It has also been shown that modeling capacity of feed-forward neural networks can be improved if the iteration of the network is incorporated into the learning process [20].

Several methods with different performance measures have been attempted in the literature to predict chaotic time series. The Mackey-Glass chaotic time series has been predicted for short-term ahead prediction with a percentage error of 20% [21]. A new class of wavelet network was developed with a standard deviation of 0.0029 for short-term ahead prediction of the Mackey-Glass chaotic time series and of the annual sunspots for 1-step ahead prediction [22]. A recurrent predictor neural network applied to the monthly sunspots chaotic time series achieved, for 6-months ahead prediction, a prediction accuracy of 0.992 and a root mean squared error (RMSE) of 4.419; for 10-months ahead prediction, an accuracy of 0.980 and an RMSE of 7.050; for 15-months ahead prediction, an accuracy of 0.9222 and an RMSE of 13.658; and for 20-months ahead prediction, an accuracy of 0.866 and an RMSE of 16.793 [23]. A radial basis function network with an orthogonal-least-squares fuzzy model has also been applied to the monthly sunspots and, for the Mackey-Glass chaotic time series, achieved an error of 0.0015 [24]. A hybrid network was attempted for the Mackey-Glass time series with iterative prediction and a normalized mean square error (NMSE) of 0.053 [25]. An Elman neural network predicted the yearly sunspots for 1-year ahead prediction with an error of 30.2931 and a prediction accuracy of 0.9732 [26].

From a scrupulous review of the related research work, it is noticed that no simple model has so far been available for long-term prediction of chaotic time series. It is necessary to develop a simple model that is able to perform short-, medium-, and long-term predictions of chaotic time series with reasonable accuracy. In view of the remarkable ability of neural networks to learn from instances, they are a potential candidate for designing a versatile predictor (forecaster) for chaotic time series. Hence, in this paper a novel focused time-lagged recurrent neural network model with a gamma memory filter is proposed as an intelligent tool for predicting the two differential-equation-generated Mackey-Glass and Duffing time series and the two real-world monthly sunspots and Laser chaotic time series, not only for short-term but also for long-term prediction, because such networks acquire temporal processing ability through the realization of short-term memory and information about the preceding units, which is important when long-term prediction is required. The Mackey-Glass chaotic time series was first proposed as a model for white blood cell production; the Duffing chaotic time series describes a specific nonlinear circuit or the hardening-spring effect observed in many mechanical problems; the monthly sunspot number is a good measure of solar activity, which has a period of 11 years, the so-called solar cycle, and solar activity has a major effect on earth, climate, space weather, satellites, and space missions; and the laser time series is highly nonlinear. These chaotic time series are a good benchmark for the proposed model.
The various parameters, like the number of hidden layers, the number of processing elements in the hidden layer, the step size, the different learning rules, the various transfer functions like tanh, sigmoid, linear tanh, and linear sigmoid, the different error norms, the different memories (TDNN, Laguerre, and gamma filter), and different combinations of training and testing samples are exhaustively varied and experimented with for obtaining the optimal values of the performance measures, as mentioned in the flow chart. The obtained results indicate the superior performance of the estimated dynamic FTLRNN-based model with gamma memory over the MLPNN in various performance measures such as mean square error (MSE), normalized mean square error (NMSE), and correlation coefficient (r) on the testing as well as the training data set. The proposed network is trained for up to 20 000 epochs for obtaining improved values of the performance measures. The experimentation process is demonstrated in the flow chart of Figure 1. This paper is organized as follows. In Section 2 the static MLP model is presented, and the learning procedure is explained. In Section 3 the proposed FTLRNN model is explained. In Section 4 the performance measures and their importance are discussed. Section 5 explains the significance of the benchmark chaotic time series. Section 6 explains the experimental procedure and analysis. Section 7 summarizes the evaluation results and analyses for the proposed model. Finally, concluding remarks on the empirical findings are provided in Section 8.

#### 2. Static NN-Based Model

Static neural networks typically use the multilayer perceptron (MLP) as a backbone. They are layered feed-forward networks typically trained with static back propagation. The MLP-based model has a solid foundation [27, 28]. The main reason for this is its ability to model simple as well as complex functional relationships. This has been proven through a number of practical applications [29]. In [11] it is shown that all continuous functions can be approximated to any desired accuracy, in terms of the uniform norm, with a network of one hidden layer of sigmoid (or hyperbolic tangent) hidden units and a layer of linear or tanh output units. That result, however, does not explain how many units to include in the hidden layer. This is discussed in [30], where a significant result is derived on the approximation capabilities of two-layer perceptron networks when the function to be approximated shows certain smoothness. The biggest advantage of using an MLP NN for approximation of the mapping from input to output of a system resides in its simplicity and the fact that it is well suited for online implementation. The objective of training is then to determine a mapping from the set of training data to the set of possible weights so that the network will produce predictions $\hat{y}(t)$ which in some sense are close to the true outputs $y(t)$. The prediction error approach is based on the introduction of a measure of closeness in terms of the mean square error (MSE) criterion:

$$E(w) = \frac{1}{2N} \sum_{t=1}^{N} \left[ y(t) - \hat{y}(t \mid w) \right]^2 .$$

The weights are then found as

$$\hat{w} = \arg\min_{w} E(w)$$

by some kind of iterative minimization scheme

$$w^{(i+1)} = w^{(i)} + \eta^{(i)} d^{(i)},$$

where $i$ specifies the current iterate, $d^{(i)}$ is the search direction, and $\eta^{(i)}$ is the step size.
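The iterative scheme above, with the search direction formed from the negative gradient plus a momentum term (the training rule used for both models in this paper), can be sketched as follows; the quadratic objective at the end is purely illustrative:

```python
import numpy as np

def train_momentum(grad, w0, eta=0.1, beta=0.9, iters=200):
    """Iterative minimization w(i+1) = w(i) + eta*d(i), where the
    search direction d is a momentum-smoothed negative gradient."""
    w = np.asarray(w0, dtype=float)
    d = np.zeros_like(w)
    for _ in range(iters):
        d = beta * d - grad(w)   # momentum term plus steepest descent
        w = w + eta * d          # fixed step size eta
    return w

# Illustrative objective E(w) = (w - 3)^2, whose gradient is 2(w - 3).
w_star = train_momentum(lambda w: 2.0 * (w - 3.0), w0=[0.0])
```

With a momentum coefficient below one, the update low-pass filters the gradient sequence, which damps oscillations across steep error-surface directions.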

When the NN has been trained, the next step is to evaluate it. This is done by a standard method in statistics called independent validation [31]. It is never a good idea to assess the generalization properties of an NN based on training data alone. This method divides the available data into two sets, namely, a training data set and a testing data set. The training data set is next divided into two partitions: the first partition is used to update the weights in the network, and the second partition is used to assess (or cross-validate) the training performance. The testing data set is then used to assess how well the network has generalized. The learning and generalization ability of the estimated NN-based model is assessed on the basis of certain performance measures such as MSE, NMSE, and the regression ability of the NN, by visual inspection of the correlation coefficient characteristics for the different outputs of the system under study.
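The data partition described above can be sketched as follows; the 60/15/25 split matches the proportions used later in the experiments, and the sequential (in-order) split is an assumption suited to time series:

```python
import numpy as np

def partition(series, train=0.60, cv=0.15):
    """Split a series into training, cross-validation, and testing
    segments in temporal order (60/15/25 by default)."""
    n = len(series)
    i = int(n * train)        # end of the training partition
    j = i + int(n * cv)       # end of the cross-validation partition
    return series[:i], series[i:j], series[j:]

tr, cv, te = partition(np.arange(100))
```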

#### 3. FTLRNN Model

Time-lagged recurrent networks (TLRNs) are MLPs extended with short-term memory structures. Here, a “static” NN (e.g., an MLP) is augmented with dynamic properties [14]. This, in turn, makes the network reactive to the temporal structure of information-bearing signals. For an NN to be dynamic, it must be given memory. These memories may be classified into “short-term” and “long-term” memories. Long-term memory is built into an NN through supervised learning, whereby the information content of the training data set is stored (in part or in full) in the synaptic weights of the network [32]. However, if the task at hand has a temporal dimension, some form of “short-term” memory is needed to make the network dynamic. One simple way of building short-term memory into the structure of an NN is through the use of time delays, which can be applied at the input layer of the network (focused). A short-term memory structure transforms a sequence of samples into a point in the reconstruction space [33]. This memory structure is incorporated inside the learning machine. This means that instead of using a window over the input data, the processing elements (PEs) created are dedicated to storing either the history of the input signal or the PE activations.

The input PEs of an MLP are replaced with a tap delay line, which is followed by the MLPNN. This topology is called the focused time-delay NN (TDNN). The focused topology only includes the memory kernels connected to the input layer. This way, only the past of the input is remembered. The delay line of the focused TDNN stores the past samples of the input. The combination of the tap delay line and the weights that connect the taps to the PEs of the first hidden layer is simply a set of linear combiners followed by a static nonlinearity. Typically, a gamma short-term memory mechanism is combined with nonlinear PEs in restricted topologies called focused. Basically, the first layer of the focused TDNN is a filtering layer, with as many adaptive filters as PEs in the first hidden layer. The outputs of the linear combiners are passed through a nonlinearity (of the hidden-layer PE) and are then further processed by the subsequent layers of the MLP for system identification, where the goal is to find the weights that produce a network output that best matches the present output of the system by combining the information of the present and a predefined number of past samples (given by the size of the tap delay line) [32]. The size of the memory layer depends on the number of past samples needed to describe the input characteristics in time. This number depends on the characteristics of the input and the task. This focused TDNN can still be trained with static back propagation, provided that a desired signal is available at each time step. This is because the tap delay line at the input layer does not have any free parameters, so the only adaptive parameters are in the static feed-forward path.
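A minimal sketch of the tap delay line that feeds the focused TDNN might look like this; the window length (`taps`) is an illustrative parameter:

```python
import numpy as np

def tap_delay_matrix(x, taps):
    """Build the focused TDNN input: each row holds the current sample
    and its delayed versions x(t), x(t-1), ..., x(t-taps+1)."""
    rows = [x[t - taps + 1 : t + 1][::-1] for t in range(taps - 1, len(x))]
    return np.array(rows)

X = tap_delay_matrix(np.array([0.0, 1.0, 2.0, 3.0, 4.0]), taps=3)
# the first row corresponds to t = 2: [x(2), x(1), x(0)]
```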

The memory PE receives in general many inputs and produces multiple outputs, which are delayed versions of the combined input $x(t)$,

$$x_k(t) = g_k(t) * x(t), \quad k = 0, 1, \dots, K,$$

where $g_k(t)$ is a delay function (the memory kernel of the $k$th tap).

These short-term memory structures can be studied by linear adaptive filter theory if $g(t)$ is a linear operator. It is important to emphasize that the memory PE is a short-term memory mechanism, to make clear the distinction from the network weights, which represent the long-term memory of the network.

There are basically two types of memory mechanisms: memory by delay and memory by feedback. We seek the most general linear delay operator (a special case of the autoregressive moving average model), where the memory traces $g_k(t)$ are recursively computed from the previous memory trace $g_{k-1}(t)$. This memory PE is the generalized feed-forward memory PE. It can be shown that the defining relationship for the generalized feed-forward memory PE is

$$g_k(t) = g(t) * g_{k-1}(t), \quad k \geq 1,$$

where $*$ is the convolution operation, $g(t)$ is a causal time function, and $k$ is the tap index. Since this is a recursive equation, $g_0(t)$ should be assigned a value independently. This relationship means that the next memory trace is constructed from the previous memory trace by convolution with the same function $g(t)$, the memory kernel, yet unspecified. Different choices of $g(t)$ will provide different choices for the projection space axes. When we apply the input $x(t)$ to the generalized feed-forward memory PE, the tap signals become

$$x_k(t) = g_k(t) * x(t),$$

the convolution of the input $x(t)$ with the memory kernel $g_k(t)$. For $k = 0$, we have

$$x_0(t) = g_0(t) * x(t),$$

where $g_0(t)$ may be specified separately. The projection $y(t)$ of the input signal is obtained by linearly weighting the tap signals according to

$$y(t) = \sum_{k=0}^{K} w_k \, x_k(t).$$

The most obvious choice for the basis is to use the past samples of the input signal directly, that is, the tap signal becomes $x_k(t) = x(t-k)$. This choice corresponds to

$$g(t) = \delta(t-1).$$

In this case $g_k(t) = \delta(t-k)$ is also a delta function (the delta-function operator used in the tap delay line). The memory depth is strictly controlled by the number of taps $K$; that is, the memory traces store the past $K$ samples of the input. The time-delay NN uses exactly this choice of basis.

The gamma memory PE attenuates the signals at each tap because it is a cascade of leaky integrators with the same time constant (the gamma model). The gamma memory PE is a special case of the generalized feed-forward memory PE, where

$$g(t) = \mu (1-\mu)^{t-1}, \quad t \geq 1,$$

and $g_0(t) = \delta(t)$. The gamma memory is basically a cascade of low-pass filters with the same time constant $1-\mu$. The overall impulse response of the gamma memory is

$$g_k(t) = \binom{t-1}{k-1} \mu^k (1-\mu)^{t-k}, \quad t \geq k,$$

where $\binom{t-1}{k-1}$ is a binomial coefficient defined by

$$\binom{t-1}{k-1} = \frac{(t-1)!}{(k-1)!\,(t-k)!}.$$

For integer values of $t$ and $k$, the overall impulse response for varying $t$ represents a discrete version of the integrand of the gamma function, hence the name of the memory.

The gamma memory PE has a multiple pole at $z = 1-\mu$ that can be adaptively moved along the real axis; that is, the gamma memory can implement only low-pass ($0 < \mu < 1$) or high-pass ($1 < \mu < 2$) transfer functions. The high-pass transfer function creates an extra ability to model fast-moving signals by alternating the signs of the samples in the gamma PE (the impulse response for $\mu > 1$ has alternating signs). The depth-in-samples parameter $D$ is used to compute the number of taps $K$ contained within the memory structure of the network, with $D \approx K/\mu$.
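The gamma memory taps can be computed with the recursion $x_k(t) = (1-\mu)\,x_k(t-1) + \mu\,x_{k-1}(t-1)$, which follows from the leaky-integrator kernel above; a minimal sketch (with $\mu = 1$ the structure reduces to a plain tap delay line):

```python
import numpy as np

def gamma_memory(x, taps, mu):
    """Gamma memory taps: x_k(t) = (1-mu)*x_k(t-1) + mu*x_{k-1}(t-1),
    a cascade of identical leaky integrators; x_0 is the input itself.
    With mu = 1 this reduces to an ideal tap delay line."""
    n = len(x)
    g = np.zeros((n, taps + 1))
    g[:, 0] = x
    for t in range(1, n):
        for k in range(1, taps + 1):
            g[t, k] = (1 - mu) * g[t - 1, k] + mu * g[t - 1, k - 1]
    return g

imp = np.zeros(6)
imp[0] = 1.0                            # unit impulse
G = gamma_memory(imp, taps=2, mu=1.0)   # mu = 1: pure delays
```

For $\mu < 1$ each tap smears the impulse over time, trading temporal resolution for a longer memory depth with the same number of taps.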

#### 4. Performance Measures

Three different types of statistical performance evaluation criteria were employed to evaluate the performance of these models developed in this paper. These are as follows.

*MSE (mean square error)*

The mean square error is given by

$$\mathrm{MSE} = \frac{\displaystyle\sum_{j=0}^{P-1} \sum_{i=0}^{N-1} (d_{ij} - y_{ij})^2}{N P},$$

where $P$ is the number of output PEs, $N$ is the number of exemplars in the data set, $y_{ij}$ is the network output for exemplar $i$ at PE $j$, and $d_{ij}$ is the desired output for exemplar $i$ at PE $j$.

*NMSE (normalized mean square error)*

The normalized mean square error is defined by

$$\mathrm{NMSE} = \frac{P \, N \, \mathrm{MSE}}{\displaystyle\sum_{j=0}^{P-1} \frac{N \sum_{i=0}^{N-1} d_{ij}^2 - \left( \sum_{i=0}^{N-1} d_{ij} \right)^2}{N}},$$

where $P$, $N$, and $d_{ij}$ are as above.

*Correlation coefficient (r)*

The mean square error (MSE) can be used to determine how well the network output fits the desired output, but it does not necessarily reflect whether the two sets of data move in the same direction. For instance, by simply scaling the network output, we can change the MSE without changing the directionality of the data. The correlation coefficient solves this problem. By definition, the correlation coefficient between a network output $x$ and a desired output $d$ is

$$r = \frac{\frac{1}{N}\sum_{i} (x_i - \bar{x})(d_i - \bar{d})}{\sqrt{\frac{1}{N}\sum_{i} (x_i - \bar{x})^2} \, \sqrt{\frac{1}{N}\sum_{i} (d_i - \bar{d})^2}}.$$

The correlation coefficient is confined to the range $-1 \leq r \leq 1$.
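A direct transcription of the three measures (single-output case), illustrating the point above that scaling or shifting the output changes the MSE but not the correlation coefficient:

```python
import numpy as np

def mse(d, y):
    """Mean square error between desired d and output y."""
    return float(np.mean((d - y) ** 2))

def nmse(d, y):
    """MSE normalized by the variance of the desired signal, so a
    predictor no better than the series mean scores about 1."""
    return mse(d, y) / float(np.var(d))

def corr_coef(d, y):
    """Pearson correlation coefficient between d and y."""
    dc, yc = d - d.mean(), y - y.mean()
    return float(np.sum(dc * yc) / np.sqrt(np.sum(dc ** 2) * np.sum(yc ** 2)))

d = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * d + 1   # scaled and shifted output: large MSE, perfect correlation
```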

#### 5. Benchmark Chaotic Time Series

In science, chaos is used as a synonym for irregular behavior whose long-term evolution is essentially unpredictable. Chaotic differential equations exhibit not only irregular behavior but are also unstable with respect to small perturbations of their initial conditions. Consequently, it is difficult to forecast the future of time series based on chaotic differential equations; they are therefore a good benchmark for a neural network design algorithm.

##### 5.1. Mackey-Glass Time Series

The Mackey-Glass equation is a time-delay differential equation, first proposed as a model of white blood cell production [34]. It is often used in practice as a benchmark because of its nonlinear chaotic characteristics. Chaotic time series do not converge or diverge in time, and their trajectories are highly sensitive to initial conditions. Data are generated by using the fourth-order Runge-Kutta method. The equation is given by

$$\frac{dx(t)}{dt} = \frac{a \, x(t-\tau)}{1 + x^{10}(t-\tau)} - b \, x(t),$$

where $a$ and $b$ are constant coefficients and $\tau$ is the time delay. The Mackey-Glass time series is shown in Figure 2.
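A sketch of the data generation, under the usual benchmark assumptions (a = 0.2, b = 0.1, τ = 17 — standard values, not necessarily those selected in the paper), with the delayed term held fixed within each Runge-Kutta step, a common simplification when the step size divides the delay:

```python
import numpy as np

def mackey_glass(n, a=0.2, b=0.1, tau=17, x0=1.2, h=1.0):
    """Generate n samples of dx/dt = a*x(t-tau)/(1+x(t-tau)^10) - b*x(t)
    with an RK4 step; the delayed term is frozen over each step."""
    x = np.full(n + tau, x0, dtype=float)          # constant initial history
    f = lambda xt, xd: a * xd / (1.0 + xd ** 10) - b * xt
    for t in range(tau, n + tau - 1):
        xd = x[t - tau]                            # delayed value for this step
        k1 = f(x[t], xd)
        k2 = f(x[t] + 0.5 * h * k1, xd)
        k3 = f(x[t] + 0.5 * h * k2, xd)
        k4 = f(x[t] + h * k3, xd)
        x[t + 1] = x[t] + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x[tau:]

s = mackey_glass(500)
```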

##### 5.2. Duffing Time Series

The Duffing time series describes a specific nonlinear circuit or the hardening-spring effect observed in many mechanical problems [35]. The Duffing equation is a nonlinear differential equation given as

$$\ddot{x}(t) + \delta \, \dot{x}(t) + \alpha \, x(t) + \beta \, x^3(t) = \gamma \cos(\omega t),$$

where $\gamma$ is the driving force amplitude, $\delta$ is the damping constant, $\omega$ is the driving frequency, and $\alpha$ and $\beta$ are the linear and cubic stiffness coefficients. The chaotic time series is shown in Figure 3.
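A sketch of how such a series could be generated with classical fourth-order Runge-Kutta; the damping and forcing values are illustrative chaotic-regime choices for the Ueda form of the equation ($\alpha = 0$, $\beta = 1$), not the parameters used in the paper:

```python
import numpy as np

def duffing(n, h=0.01, k=0.05, B=7.5):
    """Integrate x'' + k*x' + x^3 = B*cos(t) (Ueda form of the Duffing
    equation) with RK4 on the state s = (x, x')."""
    def f(t, s):
        x, v = s
        return np.array([v, -k * v - x ** 3 + B * np.cos(t)])
    s = np.array([0.1, 0.0])       # illustrative initial condition
    out = np.empty(n)
    t = 0.0
    for i in range(n):
        out[i] = s[0]
        k1 = f(t, s)
        k2 = f(t + h / 2, s + h / 2 * k1)
        k3 = f(t + h / 2, s + h / 2 * k2)
        k4 = f(t + h, s + h * k3)
        s = s + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return out

x = duffing(2000)
```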

##### 5.3. Sunspot Time Series

A sunspot number is a good measure of solar activity, which has a period of 11 years, the so-called solar cycle. Solar activity has a major effect on earth, climate, space weather, satellites, and space missions, and is thus an important value to be predicted. But due to the intrinsic complexity of its time behavior and the lack of a quantitative theoretical model, the prediction of the solar cycle is very difficult. Many prediction techniques have been examined on the yearly sunspot number time series as an indicator of solar activity. However, more recent studies use the international monthly sunspot time series, which has better time resolution and accuracy. In particular, a nonlinear dynamics approach has been developed in [36], and prediction results are compared between several prediction techniques from both statistical and physical classes. There has been a lot of work on the controversial issue of the nonlinear characteristics of solar activity [36–39]; several recent analyses have provided evidence for low-dimensional deterministic nonlinear chaotic behavior of the monthly smoothed sunspot time series [36–38]. The data considered are the monthly variations from January 1749 to December 2006, a total of 3096 samples, demonstrated in Figure 4. The series is normalized to a fixed range. The monthly smoothed sunspot number time series is downloaded from the SIDC (World Data Center for the Sunspot Index) [40].

##### 5.4. Laser Time Series

The laser data were recorded from a far-infrared (FIR) laser in a chaotic state. The measurements were made on an 81.5-micron 14NH3 cw FIR laser, pumped optically by a line of an N2O laser via the vibrational aQ(8, 7) transition. The basic laser setup can be found in [41]. The intensity data were recorded by a LeCroy oscilloscope. The data set was made available worldwide during a time series prediction competition organized by the Santa Fe Institute; being a highly nonlinear data set, it has since been used in benchmark studies. The time series has 1000 sample points, which have been rescaled. The time series is shown in Figure 5.

#### 6. Experimental Results

The choice of the number of hidden layers and the number of hidden units in each hidden layer is critical [42]. It has been established that an MLPNN with only one hidden layer and a sufficient number of neurons acts as a universal approximator of nonlinear mappings [43]. The tradeoff between accuracy and complexity of the model should be resolved accurately [44]. In practice, it is very difficult to determine the number of neurons necessary to achieve a desired degree of approximation accuracy, so the number of units in the hidden layer is frequently determined by trial and error. To determine the weight values, one must have a set of examples of how the output should relate to the inputs. The task of determining the weights from these examples is called training or learning and is basically a conventional estimation problem: the weights are estimated from the examples in such a way that the network, according to some metric, models the true relationship as accurately as possible. Since learning is a stochastic process, the learning curve may be drastically different from run to run. In order to compare the performance of particular search methodologies, or the effects different parameters have on a system, one needs to obtain the average learning curve over a number of runs so that the randomness can be averaged out. Exhaustive and careful experimentation has been carried out to determine the configuration of the static MLP model and the optimal proposed FTLRNN model with gamma memory for short-term and long-term ahead predictions, using 60% training, 15% cross-validation, and 25% testing samples for the considered benchmark chaotic time series.
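The trial-and-error search over hidden-layer sizes can be sketched with a minimal one-hidden-layer MLP trained by back-propagation with momentum, the training rule used in this paper; the toy target (a sine) and all hyperparameters are illustrative, not taken from the experiments:

```python
import numpy as np

def mlp_train(X, d, hidden, eta=0.05, beta=0.9, epochs=2000, seed=0):
    """One-hidden-layer MLP (tanh hidden, linear output), full-batch
    back-propagation with momentum. Returns the final training MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (hidden, X.shape[1])); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (1, hidden)); b2 = np.zeros(1)
    vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
    vW2 = np.zeros_like(W2); vb2 = np.zeros_like(b2)
    n = len(X)
    for _ in range(epochs):
        h = np.tanh(X @ W1.T + b1)        # hidden activations
        y = h @ W2.T + b2                 # linear output layer
        e = y - d                         # output error
        gW2 = (e.T @ h) / n; gb2 = e.mean(0)
        dh = (e @ W2) * (1 - h ** 2)      # back-propagate through tanh
        gW1 = (dh.T @ X) / n; gb1 = dh.mean(0)
        vW2 = beta * vW2 - eta * gW2; W2 += vW2   # momentum updates
        vb2 = beta * vb2 - eta * gb2; b2 += vb2
        vW1 = beta * vW1 - eta * gW1; W1 += vW1
        vb1 = beta * vb1 - eta * gb1; b1 += vb1
    pred = np.tanh(X @ W1.T + b1) @ W2.T + b2
    return float(np.mean((pred - d) ** 2))

X = np.linspace(-1, 1, 40).reshape(-1, 1)
d = np.sin(3 * X)
# trial-and-error over hidden-layer sizes, as described in the text
scores = {n_h: mlp_train(X, d, n_h) for n_h in (2, 8)}
```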
It is found that the performance of the selected model is optimal for 38, 21, 15, and 43 neurons in the hidden layer with regard to the MSE, NMSE, and correlation coefficient performance on the testing data sets for the Mackey-Glass, Duffing, monthly sunspots, and Laser time series, respectively. The different parameters, like the transfer function, learning rule, step size, and momentum values, are mentioned in Table 1 for the Mackey-Glass and Duffing chaotic time series and in Table 2 for the monthly sunspots and Laser time series.

When we attempted to increase the number of hidden layers and the number of processing elements in the hidden layer, the performance of the model was not seen to improve significantly; on the contrary, training takes too long because of the complexity of the model. As there is a single input and a single output for the given system, the number of input and output processing elements is chosen as one. The NN models are trained three times with different weight initializations, with 1000 iterations of the static back-propagation algorithm with momentum term for both models. All the possible variations of the model, such as the number of hidden layers, the number of processing elements in each hidden layer, the different transfer functions like tanh, linear tanh, sigmoid, and linear sigmoid in the output layer, and the different supervised learning rules like momentum, conjugate gradient, and quick propagation, are attempted for 10-step ahead prediction for the Mackey-Glass, Duffing, and Laser time series and for 6-months ahead prediction for the monthly sunspots time series. The results are placed in Table 3 for the Mackey-Glass and Duffing time series and in Table 4 for the real-time monthly sunspots and Laser time series for the different learning rules on the testing data set.

Also, the various error norms are varied, and the FTLRNN model is trained and tested for the optimum transfer function. The results are placed in Table 5 for the artificial Mackey-Glass and Duffing time series and in Table 6 for the real-time monthly sunspots and Laser time series. It is clear from Table 5 that for the tanh transfer function and its optimal error norm the value of MSE is minimum and the correlation coefficient is maximum for the Mackey-Glass chaotic time series. For the Duffing chaotic time series, the optimal values of MSE, NMSE, and correlation coefficient are obtained for the linear tanh transfer function, as can be seen from Table 5.

Similarly, for the real-time monthly sunspots time series, the minimum value of MSE and the maximum value of the correlation coefficient are obtained for the tanh transfer function and its optimal error norm. For the Laser time series, the minimum value of MSE and the maximum value of the correlation coefficient also resulted for the tanh transfer function.

Then, on these resulting optimal parameters, the FTLRNN model is trained and tested for short-term (1-, 5-, and 10-step) and long-term (20-, 50-, and 100-step) ahead prediction. The FTLRNN structure is the MLP extended with short-term memory structures, so the optimal parameters obtained for the FTLRNN model are also used for training and testing the MLPNN. On the same optimal parameters, the static MLPNN model was attempted, and the performance measures like MSE, NMSE, and correlation coefficient for the short-term (1-, 5-, and 10-step) and long-term (20-, 50-, and 100-step) ahead predictions were obtained, as stated in Table 7 for the Mackey-Glass chaotic time series, Table 8 for the Duffing chaotic time series, Table 9 for the Laser time series, and Table 10 for the monthly sunspots time series. It is obvious from Tables 7, 8, 9, and 10 that, for all the time series considered, for short-term and long-term ahead predictions the performance of the FTLRNN model is optimal on the test data set for number of taps = 6, tap delay = 1, and trajectory length = 50 with regard to the values of the correlation coefficient (r), MSE, and NMSE.
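Constructing the k-step-ahead training pairs from a scalar series can be sketched as follows; `delay_taps` plays the role of the input window fed to the network:

```python
import numpy as np

def k_step_pairs(x, delay_taps, k):
    """Input/target pairs for k-step-ahead prediction: the input at time t
    is the window x(t-delay_taps+1..t), the target is x(t+k)."""
    X, d = [], []
    for t in range(delay_taps - 1, len(x) - k):
        X.append(x[t - delay_taps + 1 : t + 1])
        d.append(x[t + k])
    return np.array(X), np.array(d)

X, d = k_step_pairs(np.arange(10.0), delay_taps=3, k=2)
# first pair: input [x(0), x(1), x(2)], target x(4)
```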

Then the training and testing samples are varied on the proposed optimal FTLRNN model, from 10% to 80% as training samples in increments of 10% and from 75% to 5% as testing samples in decrements of 10%, keeping the cross-validation (CV) exemplars constant at 15%. The performance measures were obtained and compared on the testing and training data sets to gauge the performance and robustness of the FTLRNN. The results are placed in Table 11 for 1-step ahead prediction, Table 12 for 5-step ahead prediction, Table 13 for 10-step ahead prediction, Table 14 for 20-step ahead prediction, Table 15 for 50-step ahead prediction, and Table 16 for 100-step ahead prediction for the Mackey-Glass and Duffing chaotic time series.

In a similar way, for the real-time monthly sunspots and Laser time series, the training samples were varied from 10% to 80% in increments of 10% and the testing samples from 75% to 5% in decrements of 10%, keeping the cross-validation (CV) exemplars constant at 15%. The performance measures were obtained and compared on the testing and training data sets to gauge the performance and robustness of the FTLRNN. The obtained results are placed in Tables 17, 18, 19, 20, and 21 for the monthly sunspots and Laser time series for the various steps ahead prediction.

Next, the optimum values of the performance measures were identified across the training and testing data-partition combinations, as mentioned in Tables 11 to 16 for the Mackey-Glass and Duffing chaotic time series for all the cases of multistep ahead prediction, and similarly for the real-time monthly sunspots and Laser time series for all the multistep ahead predictions.

Then, for the optimal combination of training and testing samples that resulted for each step ahead prediction for all the considered time series, the number of epochs is varied from 2000 to 20 000 in steps of 2000, and the proposed FTLRNN model is trained again to observe more prominent values of the performance measures for all the chaotic time series and all the steps ahead prediction.

#### 7. Discussion

It can be clearly observed that the dynamic FTLRNN model with gamma memory outperforms the static MLP not only for short-term prediction but also for long-term prediction, on the testing as well as the training data set. From the results of Table 7, for the Mackey-Glass chaotic time series, it is noticed that up to 20-step ahead prediction the performance measure values of the MLP and the dynamic FTLRNN differ only slightly, but for long 50- and 100-step ahead prediction the values of MSE, NMSE, and correlation coefficient (r) for the FTLRNN model are significantly improved as compared to the static MLP. For the Duffing time series, it is observed from Table 8 that for both short- and long-step ahead predictions the proposed dynamic FTLRNN with gamma memory filter clearly outperforms the static MLP with regard to the performance metrics MSE, NMSE, and correlation coefficient (r).

Also, for the real monthly sunspots time series, it is observed from Table 9 that for the 1-, 6-, and 12-months-ahead predictions the performance metric values of the MLP and the dynamic FTLRNN deviate only slightly, but for the 18- and 24-months-ahead predictions the performance metric values of the FTLRNN are significantly better than those of the static MLP. For the Laser time series, Table 10 shows that for both short- and long-term-ahead predictions the proposed dynamic FTLRNN with gamma memory filter clearly outperforms the static MLP in terms of MSE, NMSE, and correlation coefficient.

Next, with the resulting optimal parameters, the FTLRNN model was trained for different combinations of training and testing samples, varying from 10% to 80% as training samples and from 75% down to 5% as testing samples while keeping the cross-validation set constant at 15%, for short-term and long-term step-ahead predictions, in order to assess the robustness of the model and obtain significant performance-measure results for each combination. Tables 11 to 16 show the performance metric values from one-step-ahead to 100-step-ahead predictions for the different combinations of training and testing samples for the equation-generated Mackey-Glass and Duffing time series.
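The k-step-ahead prediction task amounts to pairing each input window with the sample k steps beyond it, as in this sketch; the window length d is an illustrative parameter, not one reported in the text:

```python
import numpy as np

def make_kstep_pairs(series, k, d=4):
    """Pair each length-d input window with the sample k steps
    after the window's last element (direct k-step-ahead targets)."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(len(series) - d - k + 1):
        X.append(series[t:t + d])        # input window x(t) .. x(t+d-1)
        y.append(series[t + d + k - 1])  # target x(t+d+k-1)
    return np.array(X), np.array(y)
```

With k = 1 this reduces to ordinary one-step prediction; pushing k to 50 or 100 makes the target progressively harder to infer from the window, which is why performance tends to degrade with the prediction depth.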

Similarly, for the real-world time series, the training and testing samples were varied in the combinations listed in Tables 17 to 21, for one-month-ahead to twenty-four-months-ahead predictions of the monthly sunspots time series and for one-step to fifty-step-ahead predictions of the Laser time series.

For the training/testing combination at which the MSE and NMSE are minimum and the correlation coefficient is closest to unity for short-term and long-term predictions of each chaotic time series, the FTLRNN with gamma filter was then trained for 2000 to 20 000 epochs in steps of 2000 to obtain more significant values and to observe the network performance with regard to the performance measures. The results are plotted in Figure 6 for the 1- and 5-step-ahead predictions, Figure 7 for the 10- and 20-step-ahead predictions, and Figure 8 for the 50- and 100-step-ahead predictions, showing the MSE and NMSE as the number of epochs is varied, and in Figure 9, showing the correlation coefficient as the number of epochs is varied, for the 1-, 5-, 10-, 20-, 50-, and 100-step-ahead predictions of the Mackey-Glass chaotic time series. It is observed that up to 50-step-ahead prediction the performance metrics deviate only slightly, but for the long-term 100-step-ahead prediction, beyond 12 000 training epochs the MSE and NMSE values decrease and the correlation coefficient increases substantially.
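The epoch sweep amounts to retraining the selected network at each budget and recording the metrics. A generic sketch follows; the `train_fn` and `evaluate_fn` stand-ins are assumptions replacing the actual FTLRNN training and evaluation runs:

```python
def epoch_sweep(train_fn, evaluate_fn, budgets=range(2000, 20001, 2000)):
    """Retrain from scratch at each epoch budget and keep the metrics,
    so the effect of longer training can be read off directly."""
    results = {}
    for n_epochs in budgets:
        model = train_fn(n_epochs)              # fresh training run
        results[n_epochs] = evaluate_fn(model)  # e.g. (MSE, NMSE, r)
    return results

# Illustrative stand-ins only; real runs would train the FTLRNN.
sweep = epoch_sweep(train_fn=lambda e: e,
                    evaluate_fn=lambda m: 1.0 / m)
```

Retraining from scratch at each budget, rather than checkpointing one long run, keeps the ten measurements independent of one another.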

Also, close visual inspection of Figures 10, 11, 12, and 13 shows that the output of the proposed FTLRNN model closely follows the desired output for the 1-, 5-, 10-, and 20-step-ahead predictions.

From Figures 14 and 15, it is clear that the output of the network deviates slightly from the desired output for the long 50- and 100-step-ahead predictions.
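The Mackey-Glass series analysed in these figures is generated from the delay differential equation dx/dt = a·x(t−τ)/(1 + x(t−τ)^10) − b·x(t). A simple Euler-integration sketch is given below; the parameter values a = 0.2, b = 0.1, τ = 17 are the standard benchmark choices and are assumed here, since the text does not list them:

```python
import numpy as np

def mackey_glass(n, a=0.2, b=0.1, tau=17, dt=1.0, x0=1.2):
    """Euler integration of the Mackey-Glass delay differential equation
    dx/dt = a*x(t-tau)/(1 + x(t-tau)**10) - b*x(t)."""
    steps = int(tau / dt)
    x = np.full(n + steps, x0)       # constant history for the delay term
    for t in range(steps, n + steps - 1):
        x_tau = x[t - steps]
        x[t + 1] = x[t] + dt * (a * x_tau / (1 + x_tau ** 10) - b * x[t])
    return x[steps:]
```

With τ = 17 the resulting series is chaotic, which is what makes the 50- and 100-step-ahead horizons genuinely hard: small state errors grow exponentially with the prediction depth.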

For the Duffing time series, the results are shown in Figure 16 for the 1- and 5-step-ahead predictions, Figure 17 for the 10- and 20-step-ahead predictions, and Figure 18 for the 50- and 100-step-ahead predictions, showing the MSE and NMSE as the number of epochs is varied, and in Figure 19, showing the correlation coefficient for the 1-, 5-, 10-, 20-, 50-, and 100-step-ahead predictions. It is observed that for short-term prediction and up to 50-step-ahead prediction the results do not deviate much, but for the 100-step-ahead prediction the performance measure values improve considerably with longer training.

Close inspection of Figures 20 and 21 shows that for the 1- and 5-step-ahead predictions the output of the FTLRNN closely follows the desired output. From Figures 22 and 23, for the 10- and 20-step-ahead predictions, the FTLRNN output deviates slightly over the first 20 samples, after which it closely follows the desired output. Similarly, Figures 24 and 25 show that the FTLRNN output follows the desired output for the long-term 50- and 100-step-ahead predictions.
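The Duffing series discussed here comes from the forced Duffing oscillator x″ + δx′ + αx + βx³ = γ·cos(ωt). The Euler-integration sketch below uses one common chaotic parameter choice; all of these coefficient values are assumptions, since the text does not report them:

```python
import numpy as np

def duffing(n, dt=0.01, delta=0.3, alpha=-1.0, beta=1.0,
            gamma=0.5, omega=1.2, x0=1.0, v0=0.0):
    """Euler integration of the forced Duffing oscillator
    x'' + delta*x' + alpha*x + beta*x**3 = gamma*cos(omega*t)."""
    x, v = np.empty(n), np.empty(n)
    x[0], v[0] = x0, v0
    for t in range(n - 1):
        acc = (gamma * np.cos(omega * t * dt)
               - delta * v[t] - alpha * x[t] - beta * x[t] ** 3)
        v[t + 1] = v[t] + dt * acc
        x[t + 1] = x[t] + dt * v[t]
    return x
```

The cubic restoring term gives the double-well potential whose forced, damped dynamics produce the chaotic series the predictor is trained on.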

Similarly, for the real monthly sunspots time series, the results are plotted in Figures 26 and 27 for the MSE and NMSE, respectively, as the number of epochs is varied for the 1- and 6-months-ahead predictions, in Figures 28 and 29 for the 12-, 18-, and 24-months-ahead predictions, and in Figure 30 for the correlation coefficient for all the months-ahead predictions. For the short-term 1- and 6-months-ahead predictions the performance values improve slightly. For the 12- and 18-months-ahead predictions the results also deviate only slightly, but for the 24-months-ahead prediction with 10 000 training epochs the values of MSE, NMSE, and correlation coefficient improve significantly, to an MSE of 0.00824, an NMSE of 0.23236, and a correlation coefficient of 0.88460, whereas for 1000 training epochs the MSE and NMSE are 0.0339 and 0.57734 and the correlation coefficient is 0.8012.

Also, close inspection of Figures 31, 32, and 33 shows that for the short-term 1-, 6-, and 12-months-ahead predictions the FTLRNN output closely follows the desired output, while Figures 34 and 35 show that for the long-term 18- and 24-months-ahead predictions the FTLRNN output deviates slightly from the desired output.

For the Laser time series, the results are plotted in Figures 36 and 37 for the 1-, 5-, and 10-step-ahead predictions and in Figures 38 and 39 for the 20- and 50-step-ahead predictions, showing the MSE and NMSE as the number of epochs is varied, and in Figure 40 for the correlation coefficient for all the step-ahead predictions. It is observed that for the short step-ahead predictions the performance metrics deviate only slightly, but for the 50-step-ahead prediction the MSE and NMSE error values and the correlation coefficient improve substantially: for 1000 training epochs the MSE and NMSE are 0.03447 and 0.89215 and the correlation coefficient is 0.44330, whereas when the number of training epochs is increased to 20 000 the MSE and NMSE are 0.154 and 0.412 and the correlation coefficient improves significantly to 0.789.

Also, close inspection of Figures 41, 42, and 43 shows that for the short-term 1-, 5-, and 10-step-ahead predictions the FTLRNN output closely follows the desired output, while Figures 44 and 45 show that for the long-term 20- and 50-step-ahead predictions the FTLRNN output deviates slightly from the desired output.

#### 8. Conclusions

It is seen that the focused time-lagged recurrent network with gamma memory is able to predict the differential-equation-generated Mackey-Glass and Duffing time series and the real-world monthly sunspots and Laser time series quite elegantly in comparison with the multilayer perceptron (MLP). A static NN configuration such as the MLP-based model fails to cope with the underlying nonlinear dynamics of all the time series for short-term and long-term-ahead predictions. It is observed that the MSE and NMSE of the proposed focused time-lagged recurrent neural network (FTLRNN) dynamic model, on the testing as well as the training data set, are significantly better than those of the static MLP NN. In addition, the correlation coefficient of this model on the testing and training exemplars is much higher than that of the MLP NN for the short-term and long-term-ahead predictions of the chaotic time series considered. The FTLRNN was trained on different combinations of training samples and evaluated on the corresponding testing data sets to establish the robustness and sustainability of the FTLRNN model for all the step-ahead predictions of the differential-equation-generated and real-world chaotic time series. For the proposed FTLRNN model, the output closely follows the desired output and learns the true trajectory for short-term prediction; for long-term prediction the output deviates slightly for the benchmark chaotic time series considered, as discussed. It is inferred from the experiments that the FTLRNN model with gamma memory has learned the dynamics of the chaotic time series quite well, with reasonable accuracy on the testing data set (data not used for training), as compared to the multilayer perceptron network. On the contrary, the static MLP performs poorly: on the one hand it yields much higher MSE and NMSE on the testing data sets, and on the other hand its correlation coefficient on the testing data set is far less than unity.
Hence the focused time-lagged recurrent neural network with gamma memory filter outperforms the static MLP-based neural network for both short-term and long-term-ahead predictions. Finally, the number of epochs was varied from 2000 to 20 000 in steps of 2000, and the network was retrained to determine how the model's performance depends on the number of training epochs.