Wavelet Network: Online Sequential Extreme Learning Machine for Nonlinear Dynamic Systems Identification
A single hidden layer feedforward neural network (SLFN) with the online sequential extreme learning machine (OSELM) algorithm has been introduced and applied successfully in many regression problems. However, using SLFN with OSELM as a black box for nonlinear system identification may produce models of the identified plant whose responses are inconsistent from a control perspective. The reason can be attributed to the random initialization procedure of the SLFN hidden node parameters in the OSELM algorithm. In this paper, a single hidden layer feedforward wavelet network (WN) is introduced with OSELM for nonlinear system identification, aimed at achieving better generalization performance by reducing the effect of the random initialization procedure.
In the literature, Huang et al. presented an extreme learning machine (ELM) algorithm for a single hidden layer feedforward neural network (SLFN), which has attracted many researchers’ interest because it offers significant advantages such as better generalization performance, fast learning speed, ease of implementation, and minimal human intervention. The main principle of the ELM algorithm is the random initialization of the SLFN input weights, biases, and activation function parameters, where the hidden nodes are either additive or radial basis function (RBF) nodes, together with the analytical determination of the output weights in a single step using the least squares solution.
However, the ELM algorithm is based on a fixed network structure, which means that once the hidden node activation function parameters (e.g., the RBF centers and impact factors) are initialized, they are not tuned during the learning phase. Therefore, the random initialization of the SLFN hidden node parameters may affect the modeling performance, and improving the SLFN performance requires higher network complexity, which may lead to ill-conditioning; this means that an ELM may not be robust enough to capture variations in the data.
Wavelet networks (WNs) have been applied successfully in many applications related to system identification and black-box modelling of nonlinear dynamic systems with different batch training algorithms [5–7]. The main advantage of WN is that the hidden node (wavelet activation function) parameters can be initialized analytically. Moreover, the localization properties of wavelet decomposition in both the time and frequency domains allow better generalization in nonlinear system identification problems.
The similarity between wavelet decomposition theory and single hidden layer feedforward neural networks (SLFNs) inspired Cao et al. in [9, 10] to propose a composite function wavelet neural network (CFWNN) structure to be used with ELM in different applications. The wavelet function parameters, namely, translation and dilation, are initialized from the input data, taking into account the domain of the input space. The results on several real-world benchmark data sets showed that the proposed CFWNN method can achieve better performance.
More recently, a new WN structure with dual wavelet activation functions in the hidden layer nodes was introduced by Javed et al. with the ELM algorithm and called the summation wavelet extreme learning machine (SW-ELM). The proposed SW-ELM showed good accuracy and generalization performance by reducing the impact of the random initialization procedure: the wavelets and the other hidden node parameters are adjusted a priori, before learning.
For many industrial applications where online sequential learning algorithms are preferred over batch learning algorithms, an online sequential extreme learning machine (OSELM) algorithm has been introduced for SLFN (NN-OSELM) with additive and RBF hidden node functions; it showed better generalization performance and faster training than other well-known sequential learning algorithms [11, 12]. However, NN-OSELM has not yet been verified on nonlinear system identification problems for control applications. It has been stated that if the RBF network is trained from random initial weights for each subset, it could converge to a different minimum, corresponding to a different set of weights. This random initialization procedure may cause unacceptable learning behaviour in system identification applications and may lead to a different open loop response of the identified system regardless of the modeling accuracy.
To overcome these differences, the RBF can be replaced by a wavelet function, based on the fact that wavelets can control the order of approximation and the regularity through some of their key mathematical properties and have an explicit analytical initialization form [13, 14]. In this regard, a feedforward WN with the OSELM algorithm (WN-OSELM) was previously introduced for nonlinear system identification, where the initialization of the wavelet activation function parameters played a big role in ensuring faster and better learning performance than NN-OSELM. However, that WN-OSELM method was based on fixed numbers of input features and hidden nodes, which is not optimal in every case.
In this paper, a feedforward WN framework is introduced with OSELM (WN-OSELM) to limit the impact of random initialization of the SLFN hidden node parameters: the wavelet parameters are initialized using a density function with a recursive algorithm, and the input weights and biases are initialized using the Nguyen-Widrow approach. Moreover, the optimal input features and number of hidden nodes are selected using the sequential forward search approach (SQFS) and the final prediction criterion (FPEC), respectively.
The simulations are carried out on three nonlinear systems, taking into account the optimum number of hidden nodes and input features for both WN and NN to ensure the generalization property.
In this section, a brief review of the traditional wavelet neural networks and the NN-OSELM is presented.
2.1. Wavelet Network
The standard form of a WN with one output is

$$\hat{y}(x) = \sum_{i=1}^{L} w_i \Psi_i(x), \tag{1}$$

where $\Psi_i$ refers to the activation function of hidden node $i = 1, \dots, L$, with $i \in \mathbb{Z}$, where $\mathbb{Z}$ represents the set of all integers. The wavelet activation functions can be any wavelet mother function (e.g., Morlet, Mexican hat, or the first derivative of the Gaussian function). The symbol $w_i$ refers to the connection weights between the hidden nodes and the output node, while $x = [x_1, \dots, x_n]^{T}$ is the input vector applied to the $n$ input nodes. The multivariate wavelet mother function in the hidden nodes can be determined by the product of $n$ scalar wavelet mother functions $\psi$ as follows:

$$\Psi_i(x) = \prod_{j=1}^{n} \psi\left(\frac{x_j - t_{ij}}{d_{ij}}\right). \tag{2}$$

The wavelet function parameters $t_{ij}$ and $d_{ij}$ are called the translation and dilation (scale) parameters, respectively. The translation parameter determines the central position of the wavelet, whereas the dilation parameter controls the spread of the wavelet. These parameters can be defined from the available data set.
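The weighted sum of product wavelets described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`morlet`, `wavelon`, `wn_output`), the convention of dividing by the dilation, and the modulation constant 5 in the Morlet wavelet are assumptions made for the example, not details taken from the paper.

```python
import math

def morlet(z):
    # Real-valued Morlet mother wavelet.
    # The modulation constant 5.0 is an assumed, commonly used value.
    return math.cos(5.0 * z) * math.exp(-0.5 * z * z)

def wavelon(x, t, d):
    # Multivariate wavelet of one hidden node: the product of scalar
    # wavelets, one per input dimension, each shifted by its translation
    # t[j] and scaled by its dilation d[j] (assumed convention).
    out = 1.0
    for xj, tj, dj in zip(x, t, d):
        out *= morlet((xj - tj) / dj)
    return out

def wn_output(x, weights, translations, dilations):
    # Single-output wavelet network: weighted sum of hidden-node outputs.
    return sum(
        w * wavelon(x, t, d)
        for w, t, d in zip(weights, translations, dilations)
    )
```

With this convention, a hidden node whose translation vector equals the input returns 1 regardless of its dilations, which is a quick sanity check when wiring up the parameters.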
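For single-layer feedforward neural networks with hidden nodes governed by additive sigmoid or RBF activation functions and based on ELM theory, an online sequential extreme learning machine (OSELM) was developed in a unified way to deal with industrial applications in which the training data arrive one by one or chunk by chunk. To explain the principles of OSELM, suppose an SLFN with $L$ hidden nodes and RBF activation functions is expressed as

$$f(x) = \sum_{i=1}^{L} \beta_i\, \phi(\lVert x - c_i \rVert), \tag{3}$$

where $\phi$ is a Gaussian-type radial basis function, $c_i$ is the center vector of neuron $i$, and $\beta_i$ is its output weight. Now, for a number $N$ of input/output training samples that equals the number of SLFN hidden nodes (i.e., $N = L$), if the input weights, biases, and hidden node parameters are randomly assigned independently of the training data, a simple analytic way to calculate the estimated output weights $\hat{\beta}$ is to apply the least squares solution to the cost function

$$\min_{\beta} \lVert H\beta - T \rVert, \tag{4}$$

where $H$ is the SLFN hidden layer output matrix and $T$ is the real output. Here, the estimated output weights can be determined by inverting $H$ in a single step, so that zero training error can be realized. However, when the number of training samples is greater than the number of hidden nodes ($N > L$), the estimated weights are obtained using the pseudoinverse of $H$, which gives a small nonzero training error,

$$\hat{\beta} = H^{\dagger} T, \tag{5}$$

where $H^{\dagger}$ is the pseudoinverse of $H$, and the solution given by (5) can be rewritten as

$$\hat{\beta} = (H^{T} H)^{-1} H^{T} T. \tag{6}$$

Based on the above, the learning procedure of the OSELM algorithm consists of two phases, namely, the initial phase and the sequential learning phase. In the initial phase, suppose an initial chunk of $N_0$ training samples $\{(x_i, t_i)\}_{i=1}^{N_0}$ is received such that $N_0 \ge L$; then the estimated weights can be found by

$$\beta_0 = K_0^{-1} H_0^{T} T_0, \tag{7}$$

where

$$K_0 = H_0^{T} H_0, \tag{8}$$

while $x_i$ is the input vector applied to the input nodes.

Now, for the sequential phase, suppose another chunk of $N_1$ samples $\{(x_i, t_i)\}_{i=N_0+1}^{N_0+N_1}$ is received; then the minimization problem becomes

$$\min_{\beta} \left\lVert \begin{bmatrix} H_0 \\ H_1 \end{bmatrix} \beta - \begin{bmatrix} T_0 \\ T_1 \end{bmatrix} \right\rVert, \tag{9}$$

where

$$H_1 = \begin{bmatrix} h(x_{N_0+1}) \\ \vdots \\ h(x_{N_0+N_1}) \end{bmatrix}, \qquad T_1 = \begin{bmatrix} t_{N_0+1} \\ \vdots \\ t_{N_0+N_1} \end{bmatrix}, \tag{10}$$

with $h(x)$ denoting the row of hidden node outputs for the input $x$, and then

$$\beta_1 = K_1^{-1} \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}^{T} \begin{bmatrix} T_0 \\ T_1 \end{bmatrix}, \tag{11}$$

where

$$K_1 = K_0 + H_1^{T} H_1. \tag{12}$$

To express $\beta_1$ in terms of $\beta_0$, $K_1$, $H_1$, and $T_1$, the detailed derivation can be found in the OSELM literature, and the result is

$$\beta_1 = \beta_0 + K_1^{-1} H_1^{T} (T_1 - H_1 \beta_0). \tag{13}$$

For the subsequent chunks of data samples, the recursive least squares solution is applicable, and the previous arguments can be generalized for $k \ge 0$. Suppose the $(k+1)$th chunk of $N_{k+1}$ samples is received; then a recursive algorithm for updating $\beta$ can be written as follows:

$$K_{k+1} = K_k + H_{k+1}^{T} H_{k+1}, \qquad \beta_{k+1} = \beta_k + K_{k+1}^{-1} H_{k+1}^{T} (T_{k+1} - H_{k+1} \beta_k), \tag{14}$$

and, with $P_k = K_k^{-1}$, (14) becomes

$$P_{k+1} = P_k - P_k H_{k+1}^{T} \left( I + H_{k+1} P_k H_{k+1}^{T} \right)^{-1} H_{k+1} P_k, \tag{15}$$

$$\beta_{k+1} = \beta_k + P_{k+1} H_{k+1}^{T} (T_{k+1} - H_{k+1} \beta_k). \tag{16}$$

Finally, the OSELM algorithm for SLFN can be summarized in the following steps:
(1) Initialize the network parameters (input weights, biases, and hidden node parameters) randomly.
(2) Determine $K_0$ (and hence $P_0 = K_0^{-1}$) and $\beta_0$ using (8) and (7) for the initial chunk of $N_0$ samples.
(3) Set $k = 0$.
(4) Determine $P_{k+1}$ and $\beta_{k+1}$ using (15) and (16) for the next chunk of sample data.
(5) Set $P_k = P_{k+1}$, $\beta_k = \beta_{k+1}$, and $k = k + 1$.
(6) Go to Step (4) until all training data have been processed.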
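The two phases summarized in the steps above can be sketched in plain Python for a single-output network that receives one sample at a time after the initial chunk. The function names (`mat_inverse`, `oselm_init`, `oselm_update`) are hypothetical, the hidden layer outputs are assumed to be computed elsewhere and passed in as rows, and the one-sample update is the standard recursive least squares form rather than a transcription of the authors' implementation.

```python
def mat_inverse(a):
    # Gauss-Jordan elimination with partial pivoting on [a | I].
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("H0^T H0 is singular; use a richer initial chunk")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def oselm_init(H0, T0):
    # Initial phase: P0 = (H0^T H0)^-1 and beta0 = P0 H0^T T0.
    # H0 is a list of hidden-output rows, T0 a list of scalar targets.
    L = len(H0[0])
    K0 = [[sum(h[i] * h[j] for h in H0) for j in range(L)] for i in range(L)]
    P = mat_inverse(K0)
    HtT = [sum(h[i] * t for h, t in zip(H0, T0)) for i in range(L)]
    beta = [sum(P[i][j] * HtT[j] for j in range(L)) for i in range(L)]
    return P, beta

def oselm_update(P, beta, h, t):
    # Sequential phase for one new sample (hidden-output row h, target t).
    # Because P is symmetric, P h h^T P equals the outer product (P h)(P h)^T.
    L = len(h)
    Ph = [sum(P[i][j] * h[j] for j in range(L)) for i in range(L)]
    denom = 1.0 + sum(h[i] * Ph[i] for i in range(L))
    P_new = [[P[i][j] - Ph[i] * Ph[j] / denom for j in range(L)]
             for i in range(L)]
    err = t - sum(h[i] * beta[i] for i in range(L))
    gain = [sum(P_new[i][j] * h[j] for j in range(L)) for i in range(L)]
    beta_new = [beta[i] + gain[i] * err for i in range(L)]
    return P_new, beta_new
```

A useful check of such a sketch is that, after all samples have been presented sequentially, the output weights agree with the batch least squares solution computed over the whole data set.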
3. Nonlinear System Identification Using WN-OSELM
Many nonlinear systems can be represented using the nonlinear autoregressive with exogenous inputs (NARX) model. Taking single-input-single-output systems as an example, this can be expressed by the following nonlinear difference equation:

$$y(k) = f\big(y(k-1), \dots, y(k-n_y),\; u(k-1), \dots, u(k-n_u)\big) + e(k),$$

where $f(\cdot)$ is an unknown nonlinear mapping function, $n_y$ is the number of lagged feedback predictors (output), $n_u$ is the number of lagged feedforward predictors (input), and $e(k)$ is the fitting residual, assumed to be bounded and uncorrelated with the inputs. Several methods can be applied to realize the NARX model, including polynomials, neural networks, and wavelet networks. In this work, the proposed wavelet network uses the NARX series-parallel structure shown in Figure 1, where the real plant outputs are assumed to be available and can be used for prediction so that the stability of the network is guaranteed.
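To make the series-parallel arrangement concrete, the following sketch builds the regressor vectors from recorded input and output sequences, using the measured past outputs rather than the model's own predictions. The function name `narx_regressors` and the ordering of the regressors (lagged outputs first, then lagged inputs) are assumptions made for illustration.

```python
def narx_regressors(u, y, n_u, n_y):
    # Build one-step-ahead training pairs for a series-parallel NARX model.
    # u, y: input and measured output sequences of equal length.
    # n_u, n_y: numbers of lagged inputs and lagged outputs.
    start = max(n_u, n_y)  # first index with a full set of lags
    X, T = [], []
    for k in range(start, len(y)):
        past_y = [y[k - i] for i in range(1, n_y + 1)]
        past_u = [u[k - i] for i in range(1, n_u + 1)]
        X.append(past_y + past_u)  # network input at step k
        T.append(y[k])             # target: measured output at step k
    return X, T
```

Each row of `X` is then fed to the network input nodes and the matching entry of `T` is the training target.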
On the other hand, since the type of activation function has a strong impact on the network efficacy, the activation function of the WN hidden nodes is selected to be the Morlet wavelet function. The Morlet wavelet function was used by the authors in [3, 9, 10] and showed better generalization capability, which was demonstrated statistically in different applications; note that the activation function used in the earlier WN-OSELM work was the first derivative of the Gaussian function.
It is important to state that the OSELM algorithm for WN is similar to that for SLFN introduced in Section 2.2. The only difference is the initialization procedure of the input weights and the wavelet activation function parameters, which are determined from the available data set. Care should be taken with the WN parameters so that they give the OSELM algorithm a good starting point for the training phase. Assigning the wavelet activation function parameters, namely, dilation and translation, is a critical issue: wavelet functions with a small dilation value are low frequency filters and may make some wavelets too local, whereas with increasing dilation the wavelet behaves as a high frequency filter, and in both cases the components of the gradient of the cost function can affect the speed of convergence and the accuracy of the learning algorithm. In this paper, the wavelet activation function parameters are initialized using a density function and a recursive algorithm, based on the initial input data that is available. Consider WN in the form of (1) with input nodes, and hidden nodes, and suppose chunks of input/output data set are available for training; then the input domain space can be defined by which holds the input item in all observed samples. Now, consider the dilation parameter defined as the center of the parallelepiped; then . To initialize, a point is selected between and so it divides interval into two parts. To find the point , the “density” functions estimated from the available noisy input/output observations are used such that the point is the center of gravity of ,where Here, the assumptions that and are followed, the same as in . Recursively, the same procedure is applied to each subinterval for all input nodes to obtain the parameters in matrix form.
However, the number of data samples required to fill the matrix appropriately should be greater than the number of hidden nodes.
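The recursive idea described above, splitting an input interval at a data-driven point and repeating the procedure on each subinterval, can be illustrated for a single input dimension as follows. This sketch is a simplified stand-in and not the authors' procedure: the split point is taken as the plain mean of the samples falling in the interval instead of the center of gravity of an estimated density function, the translation is placed at the split point, and the dilation is set to a fraction `xi` of the interval width. The name `init_wavelet_params` and the default `xi=0.5` are assumptions.

```python
def init_wavelet_params(samples, a, b, n_wavelets, xi=0.5):
    # Returns a list of (translation, dilation) pairs for one input dimension.
    # Intervals are processed breadth-first, so coarse wavelets come first.
    params = []
    queue = [(a, b)]
    while queue and len(params) < n_wavelets:
        lo, hi = queue.pop(0)
        inside = [x for x in samples if lo <= x <= hi]
        # Assumed stand-in for the density-based center of gravity:
        # the mean of the samples that fall inside the interval.
        p = sum(inside) / len(inside) if inside else 0.5 * (lo + hi)
        if not (lo < p < hi):
            p = 0.5 * (lo + hi)  # guard against a degenerate split
        params.append((p, xi * (hi - lo)))
        # Recurse on the two subintervals created by the split point.
        queue.append((lo, p))
        queue.append((p, hi))
    return params
```

Running the same routine once per input dimension yields one row of translations and dilations per input, which can then be arranged into the parameter matrices used by the hidden nodes.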
On the other hand, the WN input layer weights and biases are initialized using the Nguyen-Widrow approach, which is explained in detail by Nguyen and Widrow. This algorithm creates initial weight and bias values so that the active region of each neuron in the input layer is distributed evenly across the input space. Therefore, the training phase works faster because each area of the input space has neurons that cover the active region of the data set, and this is the main advantage of the Nguyen-Widrow approach over purely random weights and biases.
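The initialization procedure of the Nguyen-Widrow approach, for a network with $n$ input nodes and $L$ hidden nodes, is as follows:
(1) Initialize small random input weights $w_{ij}$, the weights from the input nodes to the hidden nodes.
(2) Compute the scale factor $\kappa = \gamma L^{1/n}$, where $\gamma$ is a constant.
(3) Compute $w_{ij} = \kappa\, w_{ij} / \lVert w_j \rVert$, where $w_j$ is the weight vector of hidden node $j$.
(4) Initialize the bias values randomly between $-\kappa$ and $\kappa$.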
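The four steps above can be sketched as follows. The numeric choices in this sketch are assumptions based on the commonly described form of the Nguyen-Widrow method, not values confirmed by the paper: initial weights drawn uniformly from [-0.5, 0.5] and a scale constant `gamma` of 0.7. The function name `nguyen_widrow` is hypothetical.

```python
import math
import random

def nguyen_widrow(n_inputs, n_hidden, gamma=0.7, seed=None):
    # Returns (weights, biases, scale) for the input-to-hidden layer.
    rng = random.Random(seed)
    # Scale factor grows with the number of hidden nodes and shrinks
    # with the input dimension (gamma = 0.7 is an assumed common value).
    scale = gamma * n_hidden ** (1.0 / n_inputs)
    weights, biases = [], []
    for _ in range(n_hidden):
        # Step 1: small random weights (assumed range [-0.5, 0.5]).
        w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        norm = math.sqrt(sum(v * v for v in w))
        # Step 3: rescale so the weight vector has length equal to scale.
        weights.append([scale * v / norm for v in w])
        # Step 4: bias drawn uniformly between -scale and +scale.
        biases.append(rng.uniform(-scale, scale))
    return weights, biases, scale
```

After this initialization every hidden node has a weight vector of the same length, so the active regions of the nodes differ only in orientation and offset, which is what spreads them across the input space.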
4. Simulation and Examples
The simulations compare WN-OSELM and NN-OSELM applied to three nonlinear dynamic systems, namely, the magnetic levitation (MagLev) system, the continuous stirred tank reactor (CSTR), and the robot arm system. A comparison of this method with conventional nonlinear dynamic system identification methods is not conducted here because a comparison between NN-OSELM and some conventional methods has already been reported. All input/output data sets are generated and all simulations are carried out in the MATLAB 7.11 environment running on an ordinary PC with a 2.27 GHz CPU. Moreover, the collected data sets of the plants are detailed in Table 1, where the first two rows show the ranges of the amplitudes and widths of the input pulses.
The third and fourth rows give the output values with their corresponding units and the sampling time for each plant. For each plant, half of the samples are used for training and the other half for testing and validation.
In addition, Table 2 defines the architecture of the WN and NN models for each plant, where the activation function of WN is the Morlet wavelet function and that of NN is a sigmoid function. For the NARX structure, the input features and the number of hidden nodes differ between the models because the best performance of each model is sought by selecting the most suitable delayed inputs/outputs and the optimal number of hidden nodes. The first two rows in Table 2 show the numbers of delayed inputs and delayed outputs (regressors); both were determined using a heuristic approach called sequential forward search (SQFS).
Moreover, the third row states the optimal number of hidden layer nodes based on the final prediction criterion (FPEC). The last row of Table 2 gives the size of the initial chunk of training data samples, which should be greater than the number of hidden nodes and has been set to the number of hidden nodes plus 50. Note that any integer offset equal to or larger than 0 can be used; the offset only ensures that, in the initial stage, the number of hidden nodes does not exceed the number of presented data samples.
In order to compare the model output with the actual plant output, the relative error index $J$ can be used to measure the performance of the identified model. It is defined as

$$J = \sqrt{\frac{\sum_{k=1}^{N} \big(\hat{y}(k) - y(k)\big)^{2}}{\sum_{k=1}^{N} \big(y(k) - \bar{y}\big)^{2}}},$$

where $y(k)$ and $\hat{y}(k)$ are the output measurements over the data set and the associated one-step-ahead predictions, respectively. $N$ refers to the length of the validation data set and $\bar{y}$ refers to the mean of the measured output over the validation data set, which can be determined by

$$\bar{y} = \frac{1}{N} \sum_{k=1}^{N} y(k).$$

Note that $J$ is also called the normalized error because the denominator term $\sqrt{\sum_{k=1}^{N} (y(k) - \bar{y})^{2}}$ is considered as the standard deviation of the measured output and is always constant for a given test data set; therefore the best model has the minimum $J$ on the validation data set. To express the performance in percentage form, the following formula is applicable:

$$\mathrm{PI} = (1 - J) \times 100\%.$$

Furthermore, by comparing the PI of the simulated model outputs over the validation data set, the performance of the models can be evaluated, where the best model has the maximum PI on the validation data.
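As a small worked illustration of the performance measure, the sketch below computes a normalized relative error and a percentage performance index from measured and predicted outputs. The exact normalization used here (root of the squared prediction error divided by the root of the squared deviation from the mean, and PI taken as one minus that ratio, in percent) is an assumption consistent with the description above; the function names are hypothetical.

```python
import math

def relative_error_index(y_true, y_pred):
    # Normalized error: prediction error energy relative to the spread
    # of the measured output around its mean (assumed form).
    mean_y = sum(y_true) / len(y_true)
    num = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    den = sum((yt - mean_y) ** 2 for yt in y_true)
    return math.sqrt(num / den)

def performance_index(y_true, y_pred):
    # Percentage form: 100 for a perfect model, 0 for a model that
    # does no better than predicting the mean of the measured output.
    return (1.0 - relative_error_index(y_true, y_pred)) * 100.0
```

For instance, predicting the constant mean of the measured output gives a relative error of 1 and hence a PI of 0, while a perfect one-step-ahead prediction gives a PI of 100.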
4.1. Example 1: Magnetic Levitation System
The equation of motion for this system stated in  iswhere is the displacement of the magnet ball, is the current that controls the electromagnet flux, is the mass of the magnet ball, and is the gravitational constant. The factor is a tacky friction parameter (damping constant), and is a field power constant. These parameters can be found in . The magnetic levitation system data was collected at a sampling interval of 0.01 seconds to form the input and output time series data set shown in the Figure 2.
The simulations of WN and NN on the system are carried out and the results are shown in Figure 3. The difference is notable: the PI of WN reached 97.50%, whereas the PI of NN reached 87.20%, which means that WN captures the plant dynamics much better than NN.
On the other hand, the root mean square error (RMSE) curves of the training phase for both models are shown in Figure 4. It is clear that WN converges much better than NN. Note that this system has a fast dynamic response, yet WN was able to extract the plant dynamics correctly owing to the wavelet localization property provided by the translation and dilation parameters.
4.2. Example 2: Continuous Stirred Tank Reactor (CSTR)
The dynamics of the CSTR system  can be described by where is the fluid level, is the concentration result at the output, is the flow rate of first liquid feed source with concentration , and is the flow rate of the second feed with concentrations . The CSTR data sets for both training and testing are shown in Figure 5, with a sampling time of 0.2 seconds.
The simulation results in Figure 6 show that WN performs much better than NN: the PI was 91.66% for WN and 80.13% for NN. Note that NN suffers at sharp discontinuities, while WN, with its decomposition properties, was able to handle the sharp discontinuities with better responses.
The learning evaluation in terms of RMSE for both the NN and WN models of the CSTR is shown in Figure 7. It is clear that WN is superior to NN in terms of fast convergence.
4.3. Example 3: Robot Arm
The system consists of a single-link robot arm that controls the movement of the arm . The equation of motion dynamics iswhere is the angle of the arm and is the DC motor torque. To obtain input and output data sets, different torque levels are applied to get different arm angles of total 8000 samples and Figure 8 shows 4000 samples for the training data set for learning process.
The simulation results for the robot arm system in Figure 9 show that the PI of WN is again better than that of NN: the PI reached 89.79% with WN and 79.37% with NN.
On the other hand, the training RMSE curves are shown in Figure 10. For the robot arm system, the NN curve declines gradually with nonsmooth convergence, while the WN curve is stable from the beginning and reaches a much better RMSE value.
From all three examples, it is clear that OSELM with WN converges in a jump-like manner and far more smoothly than with NN. This can be explained as follows: since the sequential implementation of the least squares solution is applied here, all the convergence results of recursive least squares (RLS) solutions also apply, which means that the distribution of eigenvalues of has a variable impact on the speed of convergence. In the NN case, the eigenvalue spread problem cannot be eliminated due to the lack of knowledge of caused by the random initialization of parameters, which is not the case for WN, whose initial parameters are chosen in a known way. Therefore, initial parameters chosen a priori from the available data set are very useful for enhancing the OSELM convergence performance. Note that, in the case of NN, the best of 50 trials (this is stated in ) is selected for the comparison. Table 3 shows the elapsed time in seconds of the learning process for both WN and NN. In all simulation tests WN showed a faster learning process than NN, which can be explained by the proper initialization of the WN parameters that enabled faster convergence even when the numbers of WN inputs and hidden nodes were larger than those of NN, as in the case of the robot arm system. However, it is important to mention that the initialization time of the WN parameters is higher than that of NN because WN uses a recursive method whereas NN uses a random process.
A wavelet network is proposed with OSELM as a nonlinear system identification scheme for control purposes in many industrial applications, where sequential learning algorithms do not require retraining whenever new data is received. The wavelet network parameters are initialized using a density function and a recursive algorithm. The numbers of inputs and hidden nodes are selected using SQFS and FPEC, respectively; this differs from the earlier approach, in which the numbers of inputs and hidden nodes were fixed. The simulations are carried out on three nonlinear dynamic systems and compared with SLFN-OSELM. The results show that the initialization procedure of the network parameters plays a big role in the OSELM performance and consequently in the performance of the generated model. In other words, the training phase of NN was unstable because of the random initialization of the network parameters, while the training curve of WN was smooth and stable owing to the analytical initialization of the wavelet mother function parameters. Moreover, the models identified by WN showed better modeling accuracy than those of NN. Overall, the results showed the superiority of WN over NN in modelling performance and learning speed, with a very short learning time and better accuracy in many tests. The proposed algorithm can therefore provide solutions for real-time identification of time-varying systems in some critical applications.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
G. Zhao, Z. Shen, C. Miao, and Z. Man, “On improving the conditioning of extreme learning machine: a linear case,” in Proceedings of the 7th International Conference on Information, Communications and Signal Processing (ICICS '09), Piscataway, NJ, USA, December 2009.
D. M. Salih, S. B. M. Noor, M. H. Marhaban, and R. M. K. R. Ahmad, “Wavelet network based online sequential extreme learning machine for dynamic system modeling,” in Proceedings of the 9th Asian Control Conference (ASCC '13), pp. 1–5, Istanbul, Turkey, June 2013.
B. Zhou, A. Shi, F. Cai, and Y. Zhang, “Wavelet neural networks for nonlinear time series analysis,” in Advances in Neural Networks – ISNN 2004, vol. 3174 of Lecture Notes in Computer Science, pp. 430–435, Springer, 2004.
D. Nguyen and B. Widrow, “Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '90), pp. 21–26, San Diego, Calif, USA, June 1990.
L. Ljung, System Identification Theory for the User, Prentice-Hall, Upper Saddle River, NJ, USA, 1999.
C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, Prentice-Hall, New York, NY, USA, 1974.
H. Mikulaš and M. Kamensky, “Constrained magnetic levitation control,” in Proceedings of the 16th IFAC World Congress, Prague, Czech Republic, 2005.
K. Ankit, B. Mohit, and N. V. Paras, “Implementation of neural model predictive control in continuous stirred tank reactor system,” International Journal of Scientific & Engineering Research, vol. 4, no. 6, pp. 1989–1996, 2013.
M. Vaezi and M. A. Nekouie, “Adaptive control of a robotic arm using neural networks based approach,” International Journal of Robotics and Automation, vol. 1, no. 5, pp. 87–99, 2010.