Impact of Noise on a Dynamical System: Prediction and Uncertainties from a Swarm-Optimized Neural Network
An artificial neural network (ANN) trained with particle swarm optimization (PSO) was developed for time series prediction. The hybrid ANN+PSO algorithm was applied to the short-term prediction of the Mackey-Glass chaotic time series. The prediction performance was evaluated and compared with other studies available in the literature. We also present properties of the dynamical system through a study of the chaotic behaviour obtained from the predicted time series. Next, the hybrid ANN+PSO algorithm was complemented with a Gaussian stochastic procedure (called stochastic hybrid ANN+PSO) in order to obtain a new estimator of the predictions, which also allowed us to compute the uncertainties of the predictions for noisy Mackey-Glass chaotic time series. Thus, we studied the impact of noise for several cases with a white noise level from 0.01 to 0.1.
1. Introduction
The prediction of time series plays an important role in many fields of science with practical applications, such as engineering, biology, physics, and meteorology. In particular, because of their dynamical properties, the analysis and prediction of chaotic time series have been of interest to the scientific community. Chaotic time series are usually modeled by delay-differential equations; standard examples are the Mackey-Glass system and the Ikeda equation. Many methods have been used in chaotic time series analysis. In recent decades, however, different types of artificial neural networks (ANN) have been widely used for forecasting chaotic time series, for example, backpropagation networks, radial basis function networks, and recurrent networks.
On the other hand, the analysis of real-life time series requires taking into account the propagation of input uncertainties. The observed data can be contaminated by different types of instrumental noise, such as white noise or noise proportional to the signal (the latter arises mainly from instrumental calibration). In the modeling of chaotic time series, the impact of noise can be treated as an errors-in-variables problem, where the noise is propagated into the prediction model. In the literature, the impact of noise on chaotic time series prediction has barely been considered. We can find studies where the algorithms were tested from a theoretical point of view (e.g., see [8–12]) and works where the implementation was applied to real-life time series (e.g., see [9, 13, 14]). In addition, some authors have proposed modifications to the standard methods in order to improve the prediction performance in the presence of noise [9, 14].
In this work, we used the Mackey-Glass chaotic time series to study short-term prediction with an artificial neural network optimized with a particle swarm algorithm (ANN+PSO). The method was applied to noiseless and noisy chaotic time series. In order to propagate the input noise, this hybrid algorithm was complemented with a Gaussian stochastic procedure that computes a new estimator of the predictions and their uncertainties. Note that ANNs have been used in combination with PSO in several applications. Principally, these applications include feed-forward neural network training [15–18], design of recurrent neural networks, design of radial basis function networks, and neural network control for nonlinear processes. In addition, several versions of PSO are available in the literature (e.g., see the reviews [22–24]), but our application uses a standard PSO with inertia weight. This choice is based on the following reasons: (1) this version of PSO is easy to understand and implement because of its simple concept and learning strategy; (2) PSO with inertia weight and PSO with constriction factor are mathematically equivalent, and PSO with constriction factor can be considered a special case of PSO with inertia weight [22, 26] (note that this equivalence can be applied to other improved PSO algorithms that include a varying inertia weight schedule); (3) the inertia weight PSO algorithm is quite stable with respect to population changes; (4) the advantages and disadvantages of PSO variants depend on the problem to be solved [22–24]; (5) as a first approach to the study of the effect of noise on dynamical systems using an ANN combined with the inertia weight PSO algorithm, the present study may motivate and help researchers working in the field of evolutionary algorithms to develop new hybrid models or to apply other existing PSO models to this problem.
To the best of the authors' knowledge, no previous application forecasts noisy chaotic time series, such as the one presented here, using a hybrid method that combines an ANN with a PSO algorithm.
Organization of this paper is as follows. In Section 2, we present a detailed description of the hybrid ANN+PSO method. Sections 3 and 4 present the simulation, algorithm implementation, and the principal results obtained for the forecasting of noiseless chaotic time series and noisy time series, respectively. Finally, conclusions are given in Section 5.
2. Hybrid ANN+PSO Algorithm
Artificial neural networks (ANNs) resemble biological neural networks in that they perform functions collectively and in parallel through connected nodes. Thus, ANNs are a family of biologically inspired statistical learning algorithms.
In this study, we consider one of the most successful and frequently used types of neural networks: the multilayer feed-forward neural network, which is usually trained with a backpropagation learning algorithm (gradient descent on the error). Here, the ANN was implemented by replacing standard backpropagation with particle swarm optimization (PSO).
PSO is a population-based optimization tool, where the system is initialized with a population of random particles and the algorithm searches for optima by updating generations. In each iteration, the velocity of each particle is calculated according to the following formula:
$$v_i(t+1) = w\,v_i(t) + c_1 r_1 \left(p_i - x_i(t)\right) + c_2 r_2 \left(p_g - x_i(t)\right), \tag{1}$$
where $x_i$ and $v_i$ denote a particle position and its corresponding velocity in a search space, respectively. $t$ is the current step number, $w$ is the inertia weight, $c_1$ and $c_2$ are the acceleration constants, and $r_1$, $r_2$ are elements from two random sequences in the range $(0, 1)$. $x_i(t)$ is the current position of the particle, $p_i$ is the best one of the solutions that this particle has reached, and $p_g$ is the best of the solutions that all the particles have reached. In general, the value of each component in $v_i$ can be clamped to the range $[-v_{\max}, v_{\max}]$ to control excessive roaming of particles outside the search space [28, 29]. After calculating the velocity, the new position of each particle is
$$x_i(t+1) = x_i(t) + v_i(t+1). \tag{2}$$
The procedure to calculate the output values, using the input values, is described in detail in .
The net inputs ($net_j$) are calculated for the hidden neurons from the input neurons. For a neuron $j$ in the hidden layer, one has
$$net_j = \sum_{i} w_{ji}\,x_i + b_j, \tag{3}$$
where $x$ is the vector of training inputs, $w_{ji}$ is the weight of the connection between input neuron $i$ and hidden neuron $j$, and the term $b_j$ corresponds to the bias of hidden neuron $j$, which enters its activation. Training with PSO is very different from the traditional training methods. Each weight is treated as a position coordinate with an associated velocity: the position corresponds to the weight of a neuron, and the velocity is used to update the weight. Starting from these inputs, the outputs ($h_j$) of the hidden neurons are calculated, using a transfer function $f$ associated with the neurons of this layer:
$$h_j = f\left(net_j\right). \tag{4}$$
The transfer functions can be linear or nonlinear. We used one hidden layer with $f$ as a hyperbolic tangent function (tansig) and $g$ as a linear function in the output layer:
$$f(n) = \frac{2}{1 + e^{-2n}} - 1, \tag{5}$$
$$g(n) = n. \tag{6}$$
All the neurons of the ANN have an associated activation value for a given input pattern, and the algorithm then computes the error for each neuron, except those of the input layer. After the output values are found, the weights of all layers of the network are updated by PSO, using (1) and (2). The velocity controls how much the position is updated. At each step, PSO evaluates each candidate set of weights on the data set. The network with the highest fitness is considered the global best. The other weights are updated based on the global best network rather than on their personal error or fitness. In this paper, we used the mean square error (MSE) to determine network fitness for the entire training set:
$$\mathrm{MSE} = \frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2, \tag{7}$$
where $y_k$ is the real data and $\hat{y}_k$ is the calculated output value obtained from the normalized output of the network. This process was repeated for the total number of patterns $N$ in the training set. For a successful process, the objective of the algorithm is to update all the weights so as to minimize the total root mean squared error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2}. \tag{8}$$
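The forward pass and the fitness evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names (`unpack`, `forward`, `mse`, `rmse`) and the flat parameter layout are assumptions, and the 4-6-1 layer sizes are taken from the architecture reported later in the text.

```python
import numpy as np

# Layer sizes: 4 inputs, 6 hidden neurons, 1 output (the 4-6-1 layout
# reported later in the text). The flat layout below is an assumption.
N_IN, N_HID, N_OUT = 4, 6, 1
DIM = N_HID * N_IN + N_HID + N_OUT * N_HID + N_OUT  # weights + biases


def unpack(theta):
    """Split a flat parameter vector (one PSO particle) into layer arrays."""
    i = 0
    W1 = theta[i:i + N_HID * N_IN].reshape(N_HID, N_IN)
    i += N_HID * N_IN
    b1 = theta[i:i + N_HID]
    i += N_HID
    W2 = theta[i:i + N_OUT * N_HID].reshape(N_OUT, N_HID)
    i += N_OUT * N_HID
    b2 = theta[i:i + N_OUT]
    return W1, b1, W2, b2


def forward(theta, X):
    """Network output for an (n_patterns, 4) input array."""
    W1, b1, W2, b2 = unpack(theta)
    hidden = np.tanh(X @ W1.T + b1)       # tansig hidden layer
    return (hidden @ W2.T + b2).ravel()   # linear output layer


def mse(theta, X, y):
    """Mean square error over the whole set (the PSO fitness)."""
    return float(np.mean((y - forward(theta, X)) ** 2))


def rmse(theta, X, y):
    """Root mean squared error over the whole set."""
    return float(np.sqrt(mse(theta, X, y)))
```

Keeping all weights and biases in one flat vector means that a PSO particle position can be passed to `mse` directly, without any conversion step.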
In PSO, the inertial weight , the constants and , the number of particles , and the maximum speed of particle summarize the parameters to synchronize for their application in a given problem. Then, an exhaustive trial-and-error procedure was applied to tune the PSO+ANN parameters. Firstly, the effect of population is analyzed for values of 25 to 100 individuals in the swarm. For other applications, some authors have shown that a larger swarm increases the number of function evaluations to converge to an error limit . In addition, Shi and Eberhart  illustrated that the population size has hardly any effect on the performance of a swarm algorithm. Figure 1(a) shows that the best population to solve the problem is of 50 individuals. Next, the effect of is analyzed for values of 0.1 to 0.9. Figure 1(b) shows the values of that favoured the search of the particles and accelerated the convergence. This figure shows that for a linearly decreasing inertia weight starting at 0.7 and ending at 0.5, the PSO+ANN presents a good convergence. In other aspect, a usual choice for the acceleration coefficients and is . The effect of variation of constants was evaluated for the commonly used values of and such as 1.49 and 2.00 [31, 32]. For this analysis, presents a better convergence than other values. Table 1 shows the selected parameters for this hybrid algorithm.
The step-by-step approach of PSO+ANN can be summarized as follows.
Step 1. Initialize the positions (weights and biases) and velocities of a group of particles randomly. The particles represent the weight vectors of the ANN, including biases. The dimension of the search space is therefore the total number of weights and biases.
Step 2. The ANN is evaluated using the initial particle positions in PSO. The learning error produced by the ANN is treated as the fitness value of each particle for its initial weights and biases. The current best fitness achieved by particle $i$ is set as $pbest_i$. The $pbest$ with the best value is set as $gbest$, and this value is stored.
Step 3. Evaluate the desired optimization fitness function (7) over a given data set.
Step 4. Compare the evaluated fitness value of each particle ($F_i$) with its $pbest_i$ value. If $F_i$ is better than $pbest_i$, then $pbest_i$ is set to $F_i$ and the current position is stored as the best coordinates found by that particle so far.
Step 5. The objective function value is calculated for the new positions of each particle. If a better position is achieved by an agent, its $pbest$ value is replaced by the current value. As in Step 2, the $gbest$ value is selected among the $pbest$ values. If the new $gbest$ value is better than the previous $gbest$ value, it is replaced by the current value and this value is stored; the corresponding particle is then the one with the overall best fitness in the swarm.
Step 6. The learning error at the current epoch is reduced by changing the particle positions, which updates the weights and biases of the network. Change the velocity and location of each particle according to the movement equations (1) and (2). The new sets of positions (weights and biases) are produced by adding the calculated velocity to the current position. Then, the new sets of positions are used to produce a new learning error in the ANN.
Step 7. If the stopping condition is met (either the minimum learning error or the maximum number of iterations), stop; otherwise, return to Step 3.
Step 8. The optimum weights and biases for the ANN model are those obtained by PSO, which completes the training of the ANN.
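The steps above can be sketched as a short training loop. This is only an illustrative sketch under stated assumptions: the swarm size of 50 and the inertia weight decreasing linearly from 0.7 to 0.5 follow the text, whereas `c1 = c2 = 1.49`, `v_max`, the iteration count, the initialization ranges, the toy data set, and the function names (`train_pso`, `make_fitness`) are hypothetical choices made for the example.

```python
import numpy as np


def train_pso(fitness, dim, n_particles=50, n_iter=200,
              w_start=0.7, w_end=0.5, c1=1.49, c2=1.49,
              v_max=1.0, seed=0):
    """Minimize `fitness` with inertia-weight PSO; returns (gbest, value)."""
    rng = np.random.default_rng(seed)
    # Step 1: random positions (weights and biases) and velocities.
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))
    v = rng.uniform(-v_max, v_max, (n_particles, dim))
    # Step 2: initial fitness, personal bests and global best.
    pbest_x = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = int(np.argmin(pbest_f))
    gbest_x, gbest_f = pbest_x[g].copy(), float(pbest_f[g])
    for it in range(n_iter):
        # Linearly decreasing inertia weight.
        w = w_start + (w_end - w_start) * it / max(n_iter - 1, 1)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Step 6: velocity update, clamping, and position update.
        v = w * v + c1 * r1 * (pbest_x - x) + c2 * r2 * (gbest_x - x)
        v = np.clip(v, -v_max, v_max)
        x = x + v
        # Steps 3-5: evaluate, then refresh personal and global bests.
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest_x[better] = x[better]
        pbest_f[better] = f[better]
        g = int(np.argmin(pbest_f))
        if pbest_f[g] < gbest_f:
            gbest_x, gbest_f = pbest_x[g].copy(), float(pbest_f[g])
    # Steps 7-8: the loop stops at the iteration limit and returns the best.
    return gbest_x, gbest_f


def make_fitness(X, y, n_hidden):
    """Build the MSE fitness of a one-hidden-layer tanh/linear network."""
    n_in = X.shape[1]
    k = n_hidden * n_in

    def fitness(theta):
        W1 = theta[:k].reshape(n_hidden, n_in)
        b1 = theta[k:k + n_hidden]
        W2 = theta[k + n_hidden:k + 2 * n_hidden]
        b2 = theta[-1]
        y_hat = np.tanh(X @ W1.T + b1) @ W2 + b2
        return float(np.mean((y - y_hat) ** 2))

    return fitness, k + 2 * n_hidden + 1


# Toy regression problem (hypothetical data, not the Mackey-Glass series).
rng = np.random.default_rng(1)
X_toy = rng.uniform(-1.0, 1.0, (200, 4))
y_toy = np.sin(X_toy.sum(axis=1))
fitness, dim = make_fitness(X_toy, y_toy, n_hidden=6)
best_theta, best_mse = train_pso(fitness, dim, n_iter=100)
```

This sketch stops only on the iteration limit; a minimum-error threshold, the other stopping condition named in Step 7, could be added as an early `break` inside the loop.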
In our time series analysis, if the input noise level contribution is available, the RMSE in the training phase will be computed as follows: where is the noise level of each -element. Note that , for a white noise assumption.
Henceforth, we refer to the hybrid ANN+PSO defined above as the standard ANN+PSO.
2.1. The Stochastic ANN+PSO
As defined so far, the standard ANN+PSO does not propagate the noise level of the input data. Nevertheless, once the standard ANN+PSO has been executed and has provided the optimal topology, we can apply an additional method in order to compute the uncertainty of the prediction.
Note that once the topology is established (number of hidden layers, neurons in each hidden layer, transfer functions, and weights and biases), the neural network acts as a function (called function ANN) whose output depends only on the input vector (see (4)). The idea is to generate simulations from the input data via a Gaussian random number generator in order to propagate the intrinsic data noise through the function ANN.
For each -element of the input time series, we generate -simulations as where the input noise level is known. is a random number generator following a Gaussian distribution with mean zero and standard deviation equal to .
Finally, for the th element, each input data set provides an output . These are used in the estimation of a new estimator of prediction () and an error on the prediction () as follows:
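The stochastic procedure can be sketched as a simple Monte Carlo propagation through the trained network. This is a hedged illustration: the function name `stochastic_predict`, the number of simulations, and the use of the sample mean and sample standard deviation as the estimator and its error are assumptions made for the example, and `predict` stands for any already-trained function ANN.

```python
import numpy as np


def stochastic_predict(predict, X, sigma, n_sim=1000, seed=0):
    """Propagate Gaussian input noise through a trained predictor.

    predict : callable mapping an (n_patterns, n_inputs) array to an
              (n_patterns,) array of predictions (the "function ANN").
    sigma   : input noise level (scalar for white noise).
    Returns the per-pattern mean prediction and its standard deviation.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    sims = np.empty((n_sim, X.shape[0]))
    for s in range(n_sim):
        # Perturb every input value with zero-mean Gaussian noise.
        X_s = X + rng.normal(0.0, sigma, size=X.shape)
        sims[s] = predict(X_s)
    # Assumed estimators: sample mean and sample standard deviation.
    return sims.mean(axis=0), sims.std(axis=0, ddof=1)
```

As a sanity check, for a predictor that simply sums four independent inputs, the returned spread should be close to $\sqrt{4}\,\sigma = 2\sigma$.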
3. Noiseless Chaotic Time Series Prediction
We computed the chaotic time series from the Mackey-Glass time-delay differential system [1, 33], which is described as follows: where (unitless) is the series in the time and is the time delay. Here, we assumed that , , and . Note that if , the time series shows a chaotic behaviour [33, 34]. The nominal Mackey-Glass time series is obtained from numerical integration by a fourth order Runge-Kutta method. This series was computed with a time sampling of 1 second. Thus, is derived from with for , where is the time horizon considered.
Mackey-Glass chaotic time series with is considered as the nominal case (without noise contribution). Here, we generate two thousand data points ().
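A nominal series of this kind can be generated with the sketch below. It assumes the usual form of the Mackey-Glass equation, $dx/dt = a\,x(t-\tau)/(1 + x^{c}(t-\tau)) - b\,x(t)$, and the commonly used benchmark values $a = 0.2$, $b = 0.1$, $c = 10$, and $\tau = 17$; these values, the constant initial history `x0`, and the internal step `dt` are assumptions for the example. The fourth order Runge-Kutta scheme, the 1-second sampling, and the two thousand points follow the text.

```python
import numpy as np


def mackey_glass(n_points=2000, tau=17.0, a=0.2, b=0.1, c=10.0,
                 x0=1.2, dt=0.1, sample_every=10):
    """Integrate dx/dt = a*x(t-tau)/(1 + x(t-tau)**c) - b*x(t) with RK4.

    Requires tau > 0. Output is sampled every `sample_every * dt` time units.
    """
    lag = int(round(tau / dt))            # delay expressed in steps
    n_steps = n_points * sample_every
    x = np.empty(lag + n_steps + 1)
    x[:lag + 1] = x0                      # constant history on [-tau, 0]

    def f(x_now, x_del):
        return a * x_del / (1.0 + x_del ** c) - b * x_now

    for i in range(lag, lag + n_steps):
        d0 = x[i - lag]                   # x(t - tau)
        d1 = x[i - lag + 1]               # x(t + dt - tau)
        dm = 0.5 * (d0 + d1)              # midpoint by linear interpolation
        k1 = f(x[i], d0)
        k2 = f(x[i] + 0.5 * dt * k1, dm)
        k3 = f(x[i] + 0.5 * dt * k2, dm)
        k4 = f(x[i] + dt * k3, d1)
        x[i + 1] = x[i] + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    # Keep one sample per time unit, dropping the initial history.
    return x[lag + sample_every::sample_every][:n_points]


series = mackey_glass()                   # 2000 points, 1-second sampling
```

The delayed term at the half step is obtained by linear interpolation of the stored history, which is a simplification; a dedicated delay-differential-equation solver would treat it more carefully.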
From this data set, the input is created as a vector using points of the time series spaced apart; that is, . The output is generated with the value .
According to the standard analysis of the Mackey-Glass chaotic time series, we consider four nonconsecutive points in the chaotic time series in order to predict the short-term : where this standard test assumes and [6, 34].
For this input, the first thousand data sets were used for learning (training), while the remaining ones were used for validating the prediction (prediction). In the ANN+PSO implementation for the nominal case, the optimum number of neurons found for the hidden layer was six; that is, the architecture is described as 4-6-1.
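The construction of the input/output patterns and the training/prediction split can be sketched as follows. The four inputs and the split after the first thousand patterns follow the text; the spacing `delta = 6` and the horizon `horizon = 6` are the values usually adopted in this benchmark and are assumptions here, as is the function name `make_patterns`.

```python
import numpy as np


def make_patterns(series, n_lags=4, delta=6, horizon=6):
    """Build inputs [x(t-18), x(t-12), x(t-6), x(t)] and targets x(t+6)."""
    series = np.asarray(series, dtype=float)
    first = (n_lags - 1) * delta          # earliest t with a full history
    last = len(series) - horizon          # exclusive upper bound on t
    t = np.arange(first, last)
    offsets = np.arange(n_lags - 1, -1, -1) * delta   # [18, 12, 6, 0]
    X = series[t[:, None] - offsets[None, :]]
    y = series[t + horizon]
    return X, y


# Usage with a placeholder series (any 1-D array of samples works).
demo = np.arange(100, dtype=float)
X_all, y_all = make_patterns(demo)
X_train, y_train = X_all[:50], y_all[:50]   # the text uses the first 1000
X_pred, y_pred = X_all[50:], y_all[50:]
```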
Figure 2 presents a comparison between recorded and predicted values of the Mackey-Glass time series for the training and prediction phases. This figure shows that, for training and validation phases, the nominal and reconstructed values are in total agreement. In fact, for training, we computed a remainder average, , of and a remainder maximum, , of . Similar results are obtained for the prediction phase, with a maximum of and an average of .
Table 2 shows the RMSE (for short-term prediction of Mackey-Glass chaotic time series) from different computational methods obtained from literature, for example, the backpropagation NN , the conjugate gradient ANN , the product operator -norm , and the fuzzy system  (see references in Table 2). In the ANN+PSO configuration used here, the RMSE = 0.014 indicates that the performance prediction is in good agreement with other methods. Clearly, the inclusion of the PSO approach allows us to improve methods based on ANN without PSO, for example, the conjugate gradient ANN (RMSE = 0.229) and the backpropagation NN (RMSE = 0.026).
3.1. Chaotic Behaviour
Since the noiseless Mackey-Glass time series is a known system, it is possible to assess the ability of the ANN+PSO method to reproduce its chaotic behaviour. Figure 3 shows a representation of the chaotic attractor obtained from the Mackey-Glass time series. This figure shows that the system operates in a high-dimensional regime. The Mackey-Glass system is an infinite-dimensional system (because it is a time-delay equation) and, thus, has an infinite number of Lyapunov exponents ($\lambda_i$). The Lyapunov exponents of a dynamical system are among the invariants that characterize the attractors of the system in a fundamental way. Table 3 compares the four largest Lyapunov exponents of the Mackey-Glass system reported in the literature with the Lyapunov exponents obtained with the ANN+PSO method.
An appropriate cutoff value for the number of exponents can be related to the Lyapunov dimension. This idea was originally explored by Kaplan and Yorke, who conjectured that this dimension ($D_{KY}$) is equal to the information dimension. In our case, $D_{KY}$ is computed as 2.10. Note that Farmer reported both a fractal dimension and a Lyapunov dimension calculated from the Kaplan-Yorke conjecture.
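For reference, the Kaplan-Yorke (Lyapunov) dimension is commonly defined as $D_{KY} = j + \sum_{i=1}^{j} \lambda_i / |\lambda_{j+1}|$, where the exponents are sorted in decreasing order and $j$ is the largest index for which the partial sum is non-negative. A minimal sketch of this formula is given below; the function name is hypothetical and the exponents in the usage line are illustrative numbers only, not the values of Table 3.

```python
import numpy as np


def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke dimension from a list of Lyapunov exponents."""
    lam = np.sort(np.asarray(exponents, dtype=float))[::-1]  # descending
    csum = np.cumsum(lam)
    nonneg = np.nonzero(csum >= 0.0)[0]
    if len(nonneg) == 0:
        return 0.0                  # even the largest exponent is negative
    j = int(nonneg[-1]) + 1         # number of exponents with sum >= 0
    if j == len(lam):
        return float(j)             # spectrum too short to find a cutoff
    return float(j + csum[j - 1] / abs(lam[j]))


# Illustrative spectrum (made-up numbers): 2 + 0.5 / 1.0
d_example = kaplan_yorke_dimension([0.5, 0.0, -1.0])
```

Because only the leading exponents up to the first negative partial sum enter the formula, a truncated spectrum such as the four largest exponents is sufficient whenever the cutoff index falls inside it.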
4. Noisy Chaotic Time Series Prediction
In the previous section, the ANN+PSO proved to be an efficient method for the prediction of chaotic time series. Nevertheless, the effects of noise on the hybrid ANN+PSO implementation have not yet been studied.
In order to study the impact of noise on chaotic time series prediction, we constructed the noisy time series by adding a noise contribution to the nominal (noiseless) case. The Mackey-Glass noisy chaotic time series, $x_{\sigma}(t_k)$, is generated as
$$x_{\sigma}(t_k) = x(t_k) + \epsilon_k,$$
where $\epsilon_k$ is the particular contribution of noise on the $k$-element. It is estimated as $\epsilon_k = \sigma_k\,G_k$, with $G_k$ drawn from a Gaussian random number generator with zero mean and unit standard deviation.
Note that $\sigma_k$ corresponds to the noise level considered. Here, we assume that the original data are affected by white noise; that is, the noise level is the same in each $k$-element, $\sigma_k = \sigma$ (for clarification, although the noise level is the same at each time, the noise contribution is not, because the latter depends on the Gaussian random number generator). Five white noise levels between 0.01 and 0.1 are considered. These values correspond approximately to 1%, 4%, 6%, 9%, and 11% of the peak-to-peak amplitude of the nominal case. Figure 4 shows the noisy chaotic time series for $\sigma$ equal to 0.01 (green), 0.04 (blue), and 0.1 (red). As expected, the noisy time series with $\sigma = 0.01$ is the closest to the nominal case. However, the cases with $\sigma = 0.04$ and $\sigma = 0.1$ show a more modified shape with respect to the noiseless case, in particular with $\sigma = 0.1$.
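The construction of a noisy series can be sketched in a few lines. The function name `add_white_noise` and the sine wave used as a stand-in for the nominal series are assumptions for the example; the three noise levels are those shown in Figure 4.

```python
import numpy as np


def add_white_noise(series, sigma, seed=0):
    """Add zero-mean Gaussian noise with the same level sigma to every point."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    # Same noise level everywhere, but a different random draw per element.
    return series + sigma * rng.standard_normal(series.shape)


# Placeholder nominal series (a sine wave standing in for Mackey-Glass).
nominal = 0.9 + 0.4 * np.sin(np.linspace(0.0, 60.0, 2000))
noisy = {sigma: add_white_noise(nominal, sigma, seed=42)
         for sigma in (0.01, 0.04, 0.1)}
```

Using the same seed for every level, as here, makes the three noisy series differ only by the scale of the noise, which can be convenient when comparing them visually.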
4.1. Noise Effect on ANN+PSO
The standard ANN+PSO is applied to our noisy time series, which provides the optimum topology and the prediction. Then, the stochastic ANN+PSO is run in order to obtain a new prediction estimator and the uncertainty of the prediction.
Impact on Architecture. For each noisy time series, in the standard ANN+PSO implementation, we carry out a detailed study of the architecture. To determine the optimum number of neurons in the hidden layer, the RMSE is computed for different numbers of hidden neurons (from two up to thirty), as presented in Figure 5. For each series, the optimum is obtained when the RMSE reaches a minimum. As expected, the characterization of the architecture is strongly related to the noise level in the input data. For low noise (such as 0.01), the optimum is clearly identified from Figure 5; in contrast, in the most contaminated case, the selection depends on the fourth decimal of the RMSE (0.1292, 0.1291, and 0.1293 for 19, 20, and 21 neurons in the hidden layer, resp.). The RMSE and the optimum number of hidden neurons are presented in Table 4. Using these values and according to the trend seen in Figure 5, we fit a linear model, which provides a correlation with a slope of 0.0085. Although the optimum for one of the noise levels is not well described by this model, we find a clear linear correlation between the RMSE and the optimum number of hidden neurons for the different noise levels. In this context, as an illustration, the inset (top-right side of Figure 5) shows the relation between the optimum number of hidden neurons and the noise level, together with its best linear fit. Therefore, the impact of noise on the architecture of this hybrid neural network, for contributions lower than 0.1, can be characterized by a linear correlation of the RMSE with the optimum number of hidden neurons, and of the latter with the input noise.
The Prediction Performance. As an illustration, the predictions obtained for noisy case , from the standard ANN+PSO () and the stochastic ANN+PSO () procedures, are presented in Figure 6. As expected, even on this high noise level case, the and predictions are in total agreement. Actually, the RMSE obtained from both methods is the same (in the approximation of the third decimal) for each noisy case. For this reason, the RMSE shown in Table 4 represents the RMSE of both methods.
On the other hand, as expected, the RMSE increases with the noise level (see Figure 7). For example, we obtained RMSE values of 0.0138 and 0.13 for the noiseless case and the noisy case with $\sigma = 0.1$, respectively. From Figure 7, we observe a linear correlation between the RMSE and the input noise level. The best-fit model, obtained without considering the RMSE of the noiseless case, shows a strong linear correlation. Therefore, we confirm that a higher noise level in the input data leads to a poorer prediction, with an RMSE that grows linearly with the input noise level.
Also, the ratio with respect to the nominal case (third column in Table 4) can be used to study the impact of noise on the performance efficiency of our implementation. The bottom-right panel of Figure 7 shows the performance efficiency against the input noise level. In the worst case, the performance efficiency changes by about one order of magnitude with respect to the noiseless case. Even so, the standard and stochastic ANN+PSO prove to be a powerful tool for predicting chaotic time series.
In the literature, we did not find a similar implementation (in terms of prediction horizon, type and level of noise, etc.) that allows a straightforward comparison of results. As an example, we can contrast our results with those presented by Sheng et al. (2012). They applied an Echo State Network (ESN) based on dual estimation to a noisy Mackey-Glass time series (with a sampling of 2 seconds) contaminated with white noise. However, their prediction horizon was one step ahead, which is shorter than ours. Nevertheless, a plain comparison can be made. Depending on the method, they obtained an RMSE of 0.05 for the Generic ESN (hereafter GESN) and 0.04 for the CKF/KF based ESN (henceforth CESN). In this context, the impact of the noise on the performance efficiency is lower in the ANN+PSO implementation than in the ESN. In fact, we have a performance efficiency of 9.4, while they obtained 1161 and 33.5 for GESN and CESN, respectively.
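As a quick check of the figure quoted for ANN+PSO, and assuming that the performance efficiency is the ratio of the RMSE in the noisiest case to the RMSE in the noiseless case (an interpretation that is consistent with the RMSE values of 0.13 and 0.0138 reported above), the arithmetic is:

```latex
\frac{\mathrm{RMSE}_{\mathrm{noisy}}}{\mathrm{RMSE}_{\mathrm{noiseless}}}
  = \frac{0.13}{0.0138} \approx 9.4
```

Under the same reading, a ratio close to one would mean that the noise leaves the prediction error essentially unchanged, so smaller values indicate a method that is less affected by noise.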
Prediction Uncertainties. One of the main goals of this work is to estimate the uncertainty of the prediction. The prediction and the error bars obtained from the stochastic ANN+PSO, for the noisy time series with the strong noise contribution, are presented in Figure 8. We confirm that, in this case, our forecast and the input data agree at one sigma (about 68.3% confidence level) when the error bars are considered. The uncertainties obtained are presented in the lower panel of Figure 8. We found a minimum and maximum uncertainty of 0.024 and 0.13, respectively. The average uncertainty is lower than the input noise level, which shows the impact of the error propagation in our method. According to Figure 8, no relationship between the uncertainties and time is apparent.
Finally, from Figures 6 and 8, we have shown that ANN+PSO (with the standard and/or the stochastic implementation) is a robust tool for the short-term prediction of time series affected by white noise. In addition, the ANN+PSO method can now provide, for the first time, an estimate of the uncertainty of the prediction.
5. Conclusions
In this paper, a hybrid algorithm based on an artificial neural network and particle swarm optimization (ANN+PSO) is used in the short-term prediction of the Mackey-Glass chaotic time series. In addition, a study of the impact of noise on our hybrid method is presented. Based on the results and discussion presented in this study, we draw the following conclusions.
(i) The current value and the past values used have a strong influence on the training and predicting capabilities of the chosen network.
(ii) In the noiseless case, the simulation shows that this hybrid ANN+PSO algorithm is a very powerful tool for predicting chaotic time series, and the low deviations found with the proposed method show an accuracy comparable with that of other methods available in the literature.
(iii) In noisy cases, we have shown that the hybrid ANN+PSO is a robust tool for the short-term prediction of chaotic time series affected by white noise.
(iv) The impact of noise on the topology and performance efficiency of the ANN+PSO is important. However, this study shows that the error propagation through the ANN+PSO has a linear behaviour, which generates a linear relationship between the RMSE (optimization parameter) and the input noise level. Therefore, the PSO optimization provides a linearity which ensures that the neural network will converge to an appropriate solution, even if a noise contribution is present.
(v) For noisy cases, although a straightforward comparison with the literature is unavailable, the performance efficiency indicates that the standard/stochastic ANN+PSO implementation is affected to a lesser degree than other similar implementations.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors acknowledge the support from the Research Directorship of the University of La Serena (DIULS).
References
J. D. Hamilton, Time Series Analysis, Princeton University Press, Princeton, NJ, USA, 1994.
A. Girard, C. Rasmussen, J. Quinonero-Candela, and R. Murray-Smith, “Gaussian process priors with uncertain inputs - application to multiple-step ahead time series forecasting,” in Advances in Neural Information Processing Systems, MIT Press, 2003.
C. Zhang, H. Shao, and Y. Li, “Particle swarm optimization for evolving artificial neural network,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 4, pp. 2487–2490, October 2000.
E. A. Grimaldi, F. Grimaccia, M. Mussetta, and R. E. Zich, “PSO as an effective learning algorithm for neural network applications,” in Proceedings of the International Conference on Computational Electromagnetics and Its Applications, pp. 557–560, November 2004.
Y. Shi and R. Eberhart, “Modified particle swarm optimizer,” in Proceedings of the IEEE International Conference on Evolutionary Computation, pp. 69–73, May 1998.
R. C. Eberhart and Y. Shi, “Comparing inertia weights and constriction factors in particle swarm optimization,” in Proceedings of the Congress on Evolutionary Computation (CEC '00), vol. 1, pp. 84–88, July 2000.
S. H. Lee and I. Kim, “Time series analysis using fuzzy learning,” in Proceedings of the International Conference on Neural Information Processing, vol. 6, pp. 1577–1582, Seoul, Republic of Korea, October 1994.
Z. Qin and Y. Tang, Uncertainty Modeling for Data Mining. A Label Semantics Approach, Zhejiang University Press, Hangzhou, China; Springer, Berlin, Germany, 2014.
G. G. Yen, Multi-Objective Machine Learning, Springer, Berlin, Germany, 2006.
J. Kaplan and J. Yorke, Functional Differential Equations and Approximation of Fixed Points, H. O. Peitgen and H. O. Walther, Eds., Springer, New York, NY, USA, 1979.