
Research Article | Open Access


Ahmad Banakar, "Lyapunov Stability Analysis of Gradient Descent-Learning Algorithm in Network Training", International Scholarly Research Notices, vol. 2011, Article ID 145801, 12 pages, 2011. https://doi.org/10.5402/2011/145801

Lyapunov Stability Analysis of Gradient Descent-Learning Algorithm in Network Training

Academic Editor: L. Simoni
Received: 17 Mar 2011
Accepted: 13 May 2011
Published: 05 Jul 2011

Abstract

The Lyapunov stability theorem is applied to guarantee the convergence and stability of the learning algorithm for several networks. The gradient-descent learning algorithm and its variants are among the most useful algorithms for training such networks. To guarantee the stability and convergence of the learning process, the upper bound of the learning rate must be investigated. Here, the Lyapunov stability theorem is developed and applied to several networks in order to guarantee the stability of the learning algorithm.

1. Introduction

Science has evolved from an attempt to understand and predict the behavior of the universe and the systems within it. Much of this owes to the development of suitable models that agree with observations. These models are either in a symbolic form, which humans use, or in a mathematical form derived from physical laws. Most systems are causal and can be categorized as either static, where the output depends only on the current inputs, or dynamic, where the output depends not only on the current inputs but also on past inputs and outputs. Many systems, for example time-series systems, also possess unobservable inputs that cannot be measured but that affect the system’s output. These inputs are known as disturbances and complicate the modeling process.

To cope with the complexity of dynamic systems, there have been significant developments in the field of artificial neural networks (ANNs) during the last three decades, and these have been applied to identification and modeling [1–5]. One major motivation for proposing these different types of networks is the need to predict the dynamic behavior of the many complex systems existing in nature. The ANN is a powerful method for approximating a nonlinear system and for mapping between input and output data [1]. Recently, wavelet neural networks (WNNs) have been introduced [6–10]. Such networks employ wavelets as the activation function in the hidden layer. Because of the localized analysis that wavelets provide jointly in the frequency and time domains, combined with the learning ability of the ANN, the WNN yields a superior system model for complex and seismic applications. The majority of wavelet-function applications are limited to problems of small dimension [11], although the WNN can handle large-dimension problems as well [6]. Owing to their dynamic behavior, recurrent networks are better suited to modeling dynamic systems than static feed-forward networks [12–19]. It has already been shown that recurrent networks are less sensitive to noise, with relatively smaller network size and simpler structure. Their long-term prediction property also makes them more powerful in dealing with dynamic systems. Recurrent networks are less sensitive to noise because they can recognize and generate periodic waves in spite of the presence of a large amount of noise; that is, the network is able to regenerate the original periodic waves while learning the teacher signals with noise [2]. For unknown dynamic systems, a recurrent network results in a smaller network than a feed-forward network [12, 20]. For time-series modeling, it generates a simpler structure [15–23] and gives long-term predictions [22, 24]. A recurrent network used for system modeling learns and memorizes information in terms of embedded weights [21].

Different methods based on gradient descent have been introduced for learning the network parameters. Learning methods such as backpropagation-through-time [16, 17] and the real-time recurrent learning algorithm [18] can be applied to adjust the parameters of feed-forward or recurrent networks. In [19], the quasi-Newton method was applied to improve the rate of convergence. In [9, 23], using the Lyapunov stability theorem, a mathematical way was introduced to calculate the upper bound of the learning rate for recurrent and feed-forward wavelet neural networks based on the network parameters. Here, the Lyapunov stability theorem is developed and applied to several networks, and the learning procedure of the proposed networks is considered.

2. Methodology

2.1. Gradient-Descent Algorithm

The gradient-descent (GD) learning can be achieved by minimizing the performance index $J$ as follows:
$$J = \frac{1}{2\cdot P\cdot y_r^2}\sum_{p=1}^{P}\left(\hat{Y}(p) - Y(p)\right)^2, \qquad (2.1)$$
where $y_r = \max_{p=1}^{P} Y(p) - \min_{p=1}^{P} Y(p)$, $\hat{Y}$ is the output of the network, $Y$ is the actual data, and $P$ is the number of data points. The reason for using a normalized mean square error is that it provides a universal platform for model evaluation, irrespective of the application and of the target-value specification, when selecting an input to the model.
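
As a quick illustration (not part of the original paper), the following Python sketch evaluates the normalized performance index $J$ of (2.1) for a batch of targets $Y$ and network outputs $\hat{Y}$; the array names, example values, and the use of NumPy are assumptions made for this example.

    import numpy as np

    def normalized_mse(y_hat, y):
        """Performance index J of (2.1): squared error normalized by the
        number of data points P and the squared target range y_r**2."""
        y_hat = np.asarray(y_hat, dtype=float)
        y = np.asarray(y, dtype=float)
        P = y.shape[0]              # number of data points
        y_r = y.max() - y.min()     # target range y_r
        return np.sum((y_hat - y) ** 2) / (2.0 * P * y_r ** 2)

    # Example with a small synthetic batch (illustrative values)
    Y = np.array([0.0, 1.0, 2.0, 3.0])
    Y_hat = np.array([0.1, 0.9, 2.2, 2.8])
    print(normalized_mse(Y_hat, Y))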

In the batch-learning scheme employing the $P$-point data set, the change in any parameter is governed by
$$\Delta\upsilon(q) = \sum_{p=1}^{P}\Delta_p\upsilon(q), \qquad (2.2)$$
and the parameter update equation is
$$\upsilon(q+1) = \upsilon(q) + \Delta\upsilon(q). \qquad (2.3)$$
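
The following minimal sketch (illustrative only; the toy one-parameter model, the data, and the learning rate are assumptions, not from the paper) implements the batch update (2.2)-(2.3) with $\Delta\upsilon = -\eta\,\partial J/\partial\upsilon$ from (2.11), using the normalized index $J$ of (2.1).

    import numpy as np

    # Toy example: fit y = w*x by batch gradient descent on the index J of (2.1).
    # One parameter upsilon = w; its batch change is delta_w = -eta * dJ/dw,
    # and the update is w(q+1) = w(q) + delta_w, as in (2.2)-(2.3) and (2.11).
    X = np.array([0.0, 1.0, 2.0, 3.0])
    Y = 2.0 * X                      # targets generated by w = 2
    P = len(Y)
    y_r = Y.max() - Y.min()
    w, eta = 0.0, 1.0
    for q in range(200):
        Y_hat = w * X
        dJ_dw = np.sum((Y_hat - Y) * X) / (P * y_r ** 2)   # gradient of (2.1)
        w = w + (-eta * dJ_dw)                             # (2.3) with (2.11)
    print(w)                          # converges toward 2.0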

2.2. Lyapunov Method in Analysis of Stability

Consider a dynamic system which satisfies
$$\dot{x} = f(x, t), \qquad x(t_0) = x_0, \qquad x \in \mathbb{R}. \qquad (2.4)$$

The equilibrium point $x^{*} = 0$ is stable (in the sense of Lyapunov) at $t = t_0$ if for any $\varepsilon > 0$ there exists a $\delta(t_0, \varepsilon) > 0$ such that
$$\left\|x(t_0)\right\| < \delta \;\Longrightarrow\; \left\|x(t)\right\| < \varepsilon, \quad \forall t \geq t_0. \qquad (2.5)$$

Lyapunov Stability Theorem
Let $V(x,t)$ be a nonnegative function with derivative $\dot{V}$ along the trajectories of the system. Then:
(i) the origin of the system is locally stable (in the sense of Lyapunov) if $V(x,t)$ is locally positive definite and $-\dot{V}(x,t) \geq 0$ locally in $x$ and for all $t$;
(ii) the origin of the system is globally uniformly asymptotically stable if $V(x,t)$ is positive definite and decrescent and $-\dot{V}(x,t)$ is positive definite.
To carry out the stability analysis of the networks trained with the GD learning algorithm, we define the discrete Lyapunov function
$$V(k) = E(k) = \frac{1}{2}\left[e(k)\right]^2. \qquad (2.6)$$
The change of the Lyapunov function is
$$\Delta V(k) = V(k+1) - V(k) = \frac{1}{2}\left[e^2(k+1) - e^2(k)\right]. \qquad (2.7)$$
From
$$e(k+1) = e(k) + \Delta e(k) \;\Longrightarrow\; e^2(k+1) = e^2(k) + \left[\Delta e(k)\right]^2 + 2\cdot e(k)\cdot\Delta e(k), \qquad (2.8)$$
it follows that
$$\Delta V(k) = \Delta e(k)\cdot\left[e(k) + \frac{1}{2}\cdot\Delta e(k)\right]. \qquad (2.9)$$
The difference of the error is
$$\Delta e(k) = e(k+1) - e(k) \approx \left[\frac{\partial e(k)}{\partial\upsilon}\right]^{T}\Delta\upsilon, \qquad (2.10)$$
where $\upsilon$ is the learning parameter and $e(k) = \hat{y}(k) - y(k)$ is the error between the plant output and the present network output, with
$$\Delta\upsilon = -\eta\cdot\frac{\partial J}{\partial\upsilon}. \qquad (2.11)$$
By using (2.10), (2.11), and (2.1) and substituting them into (2.9),
$$\Delta V(k) = \left[\frac{\partial e(k)}{\partial\upsilon}\right]^{T}\Delta\upsilon\cdot\left\{e(k) + \frac{1}{2}\left[\frac{\partial e(k)}{\partial\upsilon}\right]^{T}\Delta\upsilon\right\},$$
$$\Delta V(k) = \left[\frac{\partial e(k)}{\partial\upsilon}\right]^{T}\left(-\eta\cdot\frac{\partial E(k)}{\partial\upsilon}\right)\cdot\left\{e(k) + \frac{1}{2}\left[\frac{\partial e(k)}{\partial\upsilon}\right]^{T}\left(-\eta\cdot\frac{\partial E(k)}{\partial\upsilon}\right)\right\},$$
$$\Delta V(k) = \left[\frac{\partial e(k)}{\partial\upsilon}\right]^{T}(-\eta)\cdot\frac{1}{P\cdot y_r^2}\cdot e(k)\cdot\frac{\partial\hat{y}(k)}{\partial\upsilon}\cdot\left\{e(k) + \frac{1}{2}\left[\frac{\partial e(k)}{\partial\upsilon}\right]^{T}(-\eta)\cdot\frac{1}{P\cdot y_r^2}\cdot e(k)\cdot\frac{\partial\hat{y}(k)}{\partial\upsilon}\right\},$$
$$\Delta V(k) = -e^2(k)\cdot\frac{1}{2}\cdot\frac{\eta}{P\cdot y_r^2}\cdot\left(\frac{\partial\hat{y}(k)}{\partial\upsilon}\right)^2\cdot\left\{2 - \frac{\eta}{P\cdot y_r^2}\cdot\left(\frac{\partial\hat{y}(k)}{\partial\upsilon}\right)^2\right\}, \qquad (2.12)$$
where $y_r = \max_{p=1}^{P} y(p) - \min_{p=1}^{P} y(p)$.
Therefore,
$$\Delta V(k) = -\lambda\cdot e^2(k), \qquad (2.13)$$
where $\lambda = \dfrac{1}{2}\cdot\dfrac{\eta}{P\cdot y_r^2}\cdot\left(\dfrac{\partial\hat{y}(k)}{\partial\upsilon}\right)^2\cdot\left\{2 - \dfrac{\eta}{P\cdot y_r^2}\cdot\left(\dfrac{\partial\hat{y}(k)}{\partial\upsilon}\right)^2\right\}$.
From the Lyapunov stability theorem, stability is guaranteed if $V(k)$ is positive and $\Delta V(k)$ is negative. From (2.6), $V(k)$ is already positive, so the stability condition reduces to $\Delta V(k)$ being negative. Therefore, $\lambda > 0$ is required for all models.
Because $\frac{1}{2}\cdot\frac{\eta}{P\cdot y_r^2}\cdot\left(\frac{\partial\hat{y}(k)}{\partial\upsilon}\right)^2 > 0$, the convergence condition reduces to
$$2 - \frac{\eta}{P\cdot y_r^2}\cdot\left(\frac{\partial\hat{y}(k)}{\partial\upsilon}\right)^2 > 0 \;\Longrightarrow\; \frac{\eta}{P\cdot y_r^2}\cdot\left(\frac{\partial\hat{y}(k)}{\partial\upsilon}\right)^2 < 2 \;\Longrightarrow\; \eta < \frac{2\cdot P\cdot y_r^2}{\left(\partial\hat{y}(k)/\partial\upsilon\right)^2}. \qquad (2.14)$$
The admissible learning rate $\eta$ therefore lies in a fixed range. Since $2\cdot P\cdot y_r^2$ does not depend on the model, the value $\eta_{\max}$ that guarantees convergence is obtained from the maximum of $\left|\partial\hat{y}(k)/\partial\upsilon\right|$. Therefore,
$$0 < \eta < \eta_{\max}, \qquad (2.15)$$
where $\eta_{\max} = \dfrac{2\cdot P\cdot y_r^2}{\max\left(\partial\hat{y}(k)/\partial\upsilon\right)^2}$.
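
A direct transcription of the bound (2.15) is sketched below; the values of $P$, $y_r$, and the derivative bound are placeholders that would come from the data set and the specific model, not from the paper.

    def eta_max(P, y_r, dyhat_dparam_max):
        """Learning-rate upper bound (2.15):
        eta_max = 2 * P * y_r**2 / max(d y_hat / d upsilon)**2,
        where dyhat_dparam_max is an upper bound on |d y_hat / d upsilon|."""
        return 2.0 * P * y_r ** 2 / dyhat_dparam_max ** 2

    # Example: P = 100 data points, target range y_r = 1.5, and a parameter
    # whose output derivative is bounded by 2 (illustrative numbers).
    print(eta_max(P=100, y_r=1.5, dyhat_dparam_max=2.0))   # 112.5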

3. Experimental Results

In this section, the proposed stability analysis is applied to several networks. The selected networks are the neurofuzzy (ANFIS) model [25, 26], the wavelet neurofuzzy model, and the recurrent wavelet network.

3.1. Example 1: Convergence Theorems of the TSK Neurofuzzy Model

The TSK model has a linear or nonlinear relationship of the inputs, $w_m(X)$, in the output space. The rules of the TSK model have the following form:
$$R_m: \text{if } \mathbf{x} \text{ is } \mathbf{A}_m \text{ then } y \text{ is } w_m(X). \qquad (3.1)$$
A linear form of $w_m(X)$ in (3.1) is
$$w_m(X) = w_{m0} + w_{m1}x_1 + \cdots + w_{mn}x_n. \qquad (3.2)$$
By taking Gaussian membership functions, with the number of fuzzy sets for each input equal to the number of rules, the firing strength of rule (3.1) can be written as
$$\mu_{A_m}(\mathbf{x}) = \prod_{i=1}^{n}\exp\!\left(-\left(\frac{x_i - x_{mi}}{\sigma_{mi}}\right)^2\right), \qquad (3.3)$$
where $x_{mi}$ and $\sigma_{mi}$ are the center and standard deviation of the Gaussian membership functions, respectively. By applying the T-norm (product operator) to the membership functions of the premise parts of the rules and the weighted-average (gravity) method for defuzzification, the output of the TSK model can be defined as
$$\hat{Y} = \frac{\sum_{m=1}^{M}\mu_{A_m}(\mathbf{x})\cdot w_m(\mathbf{x})}{\sum_{m=1}^{M}\mu_{A_m}(\mathbf{x})}. \qquad (3.4)$$
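
For concreteness, a minimal sketch of the TSK forward pass (3.1)-(3.4) is given below, assuming Gaussian premises (3.3), linear consequents (3.2), and the weighted-average defuzzification (3.4); the parameter shapes and the example numbers are illustrative assumptions, not values from the paper.

    import numpy as np

    def tsk_output(x, centers, sigmas, w):
        """TSK model of (3.1)-(3.4) with Gaussian premises and linear consequents.
        centers, sigmas: (M, n) membership parameters; w: (M, n+1) coefficients
        with w[m] = [w_m0, w_m1, ..., w_mn]."""
        x = np.asarray(x, dtype=float)
        # Firing strengths: product T-norm of Gaussian memberships, (3.3)
        mu = np.exp(-(((x - centers) / sigmas) ** 2)).prod(axis=1)
        # Linear consequents w_m(X), (3.2)
        w_m = w[:, 0] + w[:, 1:] @ x
        # Weighted-average (gravity) defuzzification, (3.4)
        return np.sum(mu * w_m) / np.sum(mu)

    # Example with M = 2 rules and n = 2 inputs (all numbers are illustrative)
    centers = np.array([[0.0, 0.0], [1.0, 1.0]])
    sigmas = np.array([[1.0, 1.0], [1.0, 1.0]])
    w = np.array([[0.5, 1.0, -1.0], [0.0, 2.0, 0.5]])
    print(tsk_output(np.array([0.3, 0.7]), centers, sigmas, w))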

Theorem 3.1. The asymptotic learning convergence of the TSK neurofuzzy model is guaranteed if the learning rates of the different learning parameters obey the following upper bounds:
$$0 < \eta_w < 2\cdot P\cdot y_r^2, \qquad 0 < \eta_\sigma < \frac{2\cdot P\cdot y_r^2}{\max_m\left|w_m(X)\right|^2\cdot\left(2/\sigma_{\min}^3\right)^2}, \qquad 0 < \eta_x < \frac{2\cdot P\cdot y_r^2}{\max_m\left|w_m(X)\right|^2\cdot\left(2/\sigma_{\min}^2\right)^2}. \qquad (3.5)$$

Proof. Equation (2.15) for neurofuzzy models can be written as
$$0 < \eta_\upsilon < \frac{2\cdot P\cdot y_r^2}{\max\left|\partial\hat{Y}_{\mathrm{NF}}/\partial\upsilon\right|^2}. \qquad (3.6)$$
The partial derivatives of the model output with respect to the learning parameters are
$$\frac{\partial\hat{Y}_{\mathrm{NF}}}{\partial w_{m0}} = \beta_m, \qquad \frac{\partial\hat{Y}_{\mathrm{NF}}}{\partial w_{mi}} = x_i\cdot\beta_m, \qquad \frac{\partial\hat{Y}_{\mathrm{NF}}}{\partial x_{mi}} = w_m(\mathbf{X})\cdot\frac{\beta_m}{\mu_{A_m}}\cdot\left(1 - \beta_m\right)\cdot 2\cdot\frac{x_i - x_{mi}}{\sigma_{mi}^2}, \qquad \frac{\partial\hat{Y}_{\mathrm{NF}}}{\partial\sigma_{mi}} = w_m(\mathbf{X})\cdot\frac{\beta_m}{\mu_{A_m}}\cdot\left(1 - \beta_m\right)\cdot 2\cdot\frac{\left(x_i - x_{mi}\right)^2}{\sigma_{mi}^3}. \qquad (3.7)$$
Because $\beta_m = \mu_{A_m}(\mathbf{X})/\sum_{m=1}^{M}\mu_{A_m}(\mathbf{X}) \leq 1$ for all $m$, and since the local models share the same variables $\mathbf{X}$, the bounds (3.5) follow directly from (3.7).
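
Under the assumption that an upper bound on $|w_m(X)|$ and the smallest standard deviation $\sigma_{\min}$ are available for the trained model, the bounds of Theorem 3.1 can be transcribed directly; the following sketch and its input values are illustrative only.

    def tsk_learning_rate_bounds(P, y_r, w_max, sigma_min):
        """Upper bounds (3.5) of Theorem 3.1 for the TSK parameters.
        w_max: upper bound on |w_m(X)| over rules and data;
        sigma_min: smallest Gaussian standard deviation."""
        base = 2.0 * P * y_r ** 2
        eta_w = base
        eta_sigma = base / (w_max ** 2 * (2.0 / sigma_min ** 3) ** 2)
        eta_x = base / (w_max ** 2 * (2.0 / sigma_min ** 2) ** 2)
        return eta_w, eta_sigma, eta_x

    # Illustrative values only
    print(tsk_learning_rate_bounds(P=100, y_r=1.5, w_max=3.0, sigma_min=0.5))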

3.2. Example 2: Convergence Theorems of Recurrent Wavelet Neuron Models

Each neuron in the proposed recurrent neuron models is a summation or a multiplication of a sigmoid activation function (SAF) and a wavelet activation function (WAF), as shown in Figure 1. The Morlet wavelet function is used in the recurrent models. The proposed neuron models are used in a one-hidden-layer feed-forward neural network, as shown in Figure 2.

The output of the feed-forward network is given by
$$\hat{Y}_{\mathrm{WNN}} = \sum_{l=1}^{L} W_l\cdot y_l, \qquad (3.8)$$
where $y_l$ is the output of the $l$th S-W neuron, $W_l$ is the weight between hidden neuron $l$ and the output neuron, and $L$ is the number of hidden neurons, with
$$y_j(k) = y_{\theta j}(k) + y_{\psi j}(k). \qquad (3.9)$$

The functions $y_{\theta j}$ and $y_{\psi j}$ are the outputs of the SAF and the WAF of the $j$th S-W neuron in the hidden layer, respectively. They are expressed as
$$y_{\theta j}(k) = \theta\!\left(\sum_{i=1}^{n} C^{S}_{ji}\cdot x_i(k)\right), \qquad y_{\psi j}(k) = \psi\!\left(\sum_{i=1}^{n} C^{W}_{ji}\cdot x_i(k)\right), \qquad (3.10)$$
where $x_i$ is the $i$th input, and $C^{S}$ and $C^{W}$ are the weights applied to the input signal for the SAF and the WAF in each hidden neuron, respectively.
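
A minimal sketch of the non-recurrent summation S-W neuron and the feed-forward network output (3.8)-(3.10) is given below, assuming a Morlet wavelet with unit dilation and zero translation as the WAF; the weight shapes and example values are assumptions for illustration, not the paper's trained parameters.

    import numpy as np

    def sigmoid(z):
        """Sigmoid activation function (SAF)."""
        return 1.0 / (1.0 + np.exp(-z))

    def morlet(z, a=1.0, b=0.0):
        """Morlet wavelet activation function (WAF)."""
        u = (z - b) / a
        return np.exp(-u ** 2) * np.cos(5.0 * u)

    def summation_sw_wnn(x, C_S, C_W, W):
        """Feed-forward network (3.8)-(3.10) built from summation S-W neurons
        (no recurrent feedback): y_j = theta(C_S[j] . x) + psi(C_W[j] . x),
        Y_WNN = sum_j W[j] * y_j."""
        x = np.asarray(x, dtype=float)
        y_theta = sigmoid(C_S @ x)       # SAF output of each hidden neuron
        y_psi = morlet(C_W @ x)          # WAF output of each hidden neuron
        return W @ (y_theta + y_psi)

    # Example: n = 3 inputs, L = 2 hidden S-W neurons (weights are illustrative)
    rng = np.random.default_rng(0)
    C_S, C_W, W = rng.normal(size=(2, 3)), rng.normal(size=(2, 3)), rng.normal(size=2)
    print(summation_sw_wnn(np.array([0.2, -0.1, 0.4]), C_S, C_W, W))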

To prove the convergence of the recurrent networks, the following facts are needed.
Fact 1: let $g(y) = y\,e^{-y^2}$. Then $|g(y)| < 1$ for all $y \in \Re$.
Fact 2: let $f(y) = y^2 e^{-y^2}$. Then $|f(y)| < 1$ for all $y \in \Re$.
Fact 3: let $\theta(y) = 1/(1 + e^{-y})$ be a sigmoid function. Then $|\theta(y)| < 1$ for all $y \in \Re$.
Fact 4: let $\psi_{a,b}(y) = e^{-((y-b)/a)^2}\cos\!\left(5\,(y-b)/a\right)$ be a Morlet wavelet function. Then $|\psi_{a,b}(y)| < 1$ for all $y, a, b \in \Re$.
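
These bounds can be spot-checked numerically; the sketch below evaluates each function on a dense grid (the grid range, resolution, and the choice $a = 1$, $b = 0$ are arbitrary assumptions made for this check).

    import numpy as np

    # Numerical spot-check of Facts 1-4 on a dense grid (the facts themselves
    # are the analytic bounds used in the convergence proofs).
    y = np.linspace(-10.0, 10.0, 200001)
    a, b = 1.0, 0.0
    checks = {
        "Fact 1: |y*exp(-y^2)|": np.abs(y * np.exp(-y ** 2)),
        "Fact 2: |y^2*exp(-y^2)|": np.abs(y ** 2 * np.exp(-y ** 2)),
        "Fact 3: |sigmoid(y)|": np.abs(1.0 / (1.0 + np.exp(-y))),
        "Fact 4: |Morlet(y)|": np.abs(np.exp(-((y - b) / a) ** 2)
                                      * np.cos(5.0 * (y - b) / a)),
    }
    for name, vals in checks.items():
        print(name, "max =", vals.max())   # none of the maxima exceeds 1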

(a) Summation Sigmoid-Recurrent Wavelet
Suppose $Z = \sum_{i=1}^{n} C^{S}_{ji}\cdot x_i(k)$ and $S = \sum_{i=1}^{n} C^{W}_{ji}\cdot x_i(k) + Q^{W}_{j}\cdot y_{j\psi}(k-1)$.
From Facts 3 and 4, for the parameter $W$ in all models,
$$\frac{\partial\hat{y}}{\partial W_j} = y_j < \left|y_{j\psi} + y_{j\theta}\right| < 1 + 1 = 2. \qquad (3.11)$$
Therefore, $0 < \eta_W < \left(2\cdot P\cdot y_r^2\right)/2^2 = \left(P\cdot y_r^2\right)/2$.
The derivatives of the model output with respect to the other learning parameters are
$$\frac{\partial\hat{y}(k)}{\partial C^{W}_{ji}} = x_i(k)\cdot W_j\cdot\psi'\!\left(\sum_{i=1}^{n} C^{W}_{ji}\cdot x_i(k) + Q^{W}_{j}\cdot y_{j\psi}(k-1)\right) < 1\cdot 1\cdot\left|-\frac{2}{a}\cdot\frac{S-b}{a}\cdot e^{-((S-b)/a)^2}\cos\!\left(5\cdot\frac{S-b}{a}\right) - e^{-((S-b)/a)^2}\cdot\frac{5}{a}\cdot\sin\!\left(5\cdot\frac{S-b}{a}\right)\right| < \left\{\frac{2}{a_{\min}}\cdot 1\cdot 1 + \frac{5}{a_{\min}}\right\}\cdot 1 < 7. \qquad (3.12)$$
Therefore, $0 < \eta_{C^W} < \left(2\cdot P\cdot y_r^2\right)/7^2 = \left(2\cdot P\cdot y_r^2\right)/49$. Similarly,
$$\frac{\partial\hat{y}(k)}{\partial C^{S}_{ji}} = x_i(k)\cdot W_j\cdot\theta'\!\left(\sum_{i=1}^{n} C^{S}_{ji}\cdot x_i(k)\right) < 1\cdot 1\cdot\theta(Z)\cdot\left(1 - \theta(Z)\right) < 1\cdot 1 = 1. \qquad (3.13)$$
Therefore, $0 < \eta_{C^S} < \left(2\cdot P\cdot y_r^2\right)/1^2 = 2\cdot P\cdot y_r^2$. Finally,
$$\frac{\partial\hat{y}(k)}{\partial Q^{W}_{j}} = W_j\cdot y_{j\psi}(k-1)\cdot\psi'\!\left(\sum_{i=1}^{n} C^{W}_{ji}\cdot x_i(k) + Q^{W}_{j}\cdot y_{j\psi}(k-1)\right) < 1\cdot 1\cdot\left|\psi'(S)\right| < 7, \qquad (3.14)$$
by the same bound on $\psi'$ as in (3.12). Therefore, $0 < \eta_{Q^W} < \left(2\cdot P\cdot y_r^2\right)/7^2 = \left(2\cdot P\cdot y_r^2\right)/49$.
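
Collecting the derivative bounds 2, 7, 1, and 7 obtained in (3.11)-(3.14), the learning-rate upper bounds of the summation model can be tabulated as in the following sketch; $P$ and $y_r$ are placeholders for the actual data set.

    def summation_sw_learning_rate_bounds(P, y_r):
        """Learning-rate upper bounds for the summation sigmoid-recurrent-wavelet
        model, using the derivative bounds 2, 7, 1, and 7 from (3.11)-(3.14)."""
        base = 2.0 * P * y_r ** 2
        return {
            "eta_W": base / 2 ** 2,    # from (3.11)
            "eta_CW": base / 7 ** 2,   # from (3.12)
            "eta_CS": base / 1 ** 2,   # from (3.13)
            "eta_QW": base / 7 ** 2,   # from (3.14)
        }

    # Illustrative values only
    print(summation_sw_learning_rate_bounds(P=100, y_r=1.5))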

(b) Multiplication Sigmoid-Recurrent Wavelet
From Facts 3 and 4, suppose $Z = \sum_{i=1}^{n} C^{S}_{ji}\cdot x_i(k)$ and $S = \sum_{i=1}^{n} C^{W}_{ji}\cdot x_i(k) + Q^{W}_{j}\cdot y_{j\psi}(k-1)$.
For the parameter $W$ in all networks,
$$\frac{\partial\hat{y}}{\partial W_j} = y_j = y_{j\psi}\cdot y_{j\theta} < 1\cdot 1 < 1. \qquad (3.15)$$
Therefore, $0 < \eta_W < \left(2\cdot P\cdot y_r^2\right)/1 < 2\cdot P\cdot y_r^2$. For the input weights of the WAF,
$$\frac{\partial\hat{y}(k)}{\partial C^{W}_{ji}} = x_i(k)\cdot W_j\cdot\theta\!\left(\sum_{i=1}^{n} C^{S}_{ji}\cdot x_i(k)\right)\cdot\psi'\!\left(\sum_{i=1}^{n} C^{W}_{ji}\cdot x_i(k) + Q^{W}_{j}\cdot y_{j\psi}(k-1)\right) < 1\cdot 1\cdot 1\cdot\left|\psi'(S)\right| < 7, \qquad (3.16)$$
using the bound on $\psi'$ from (3.12). Therefore, $0 < \eta_{C^W} < \left(2\cdot P\cdot y_r^2\right)/7^2 = \left(2\cdot P\cdot y_r^2\right)/49$. For the input weights of the SAF,
$$\frac{\partial\hat{y}(k)}{\partial C^{S}_{ji}} = x_i(k)\cdot W_j\cdot\theta'\!\left(\sum_{i=1}^{n} C^{S}_{ji}\cdot x_i(k)\right)\cdot\psi\!\left(\sum_{i=1}^{n} C^{W}_{ji}\cdot x_i(k) + Q^{W}_{j}\cdot y_{j\psi}(k-1)\right) < 1\cdot 1\cdot\theta(Z)\cdot\left(1 - \theta(Z)\right)\cdot 1 < 1. \qquad (3.17)$$
Therefore, $0 < \eta_{C^S} < \left(2\cdot P\cdot y_r^2\right)/1^2 = 2\cdot P\cdot y_r^2$. For the recurrent weight,
$$\frac{\partial\hat{y}(k)}{\partial Q^{W}_{j}} = W_j\cdot y_{j\psi}(k-1)\cdot\theta\!\left(\sum_{i=1}^{n} C^{S}_{ji}\cdot x_i(k)\right)\cdot\psi'\!\left(\sum_{i=1}^{n} C^{W}_{ji}\cdot x_i(k) + Q^{W}_{j}\cdot y_{j\psi}(k-1)\right) < 1\cdot 1\cdot 1\cdot\left|\psi'(S)\right| < 7. \qquad (3.18)$$
Therefore, $0 < \eta_{Q^W} < \left(2\cdot P\cdot y_r^2\right)/7^2 = \left(2\cdot P\cdot y_r^2\right)/49$.

3.3. Example 3: Convergence Theorems of the Wavelet Neurofuzzy (WNF) Model

In the WNF model, the consequent part of each fuzzy rule corresponds to a sub-WNN consisting of wavelets with a specified dilation value; that is, whereas the TSK fuzzy model uses a linear function of the inputs, here $w_m(X) = \hat{Y}_{\mathrm{WNN}_m}$. Figure 1 shows the proposed WNN model, which uses a combination of sigmoid and wavelet activation functions as the hidden neuron (Figure 2 without the recurrent part) in the consequent part of each fuzzy rule.
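
A minimal sketch of the WNF forward pass under these definitions is given below; it reuses the Gaussian premise of (3.3) and replaces each TSK consequent with a local (non-recurrent) sigmoid-wavelet network. All shapes and example values are assumptions made for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def morlet(z, a=1.0, b=0.0):
        u = (z - b) / a
        return np.exp(-u ** 2) * np.cos(5.0 * u)

    def wnf_output(x, centers, sigmas, C_S, C_W, W):
        """Wavelet neurofuzzy model: the TSK structure (3.4) with the m-th
        consequent replaced by the output of a local (non-recurrent)
        sigmoid-wavelet network. Assumed shapes: centers, sigmas: (M, n);
        C_S, C_W: (M, L, n); W: (M, L)."""
        x = np.asarray(x, dtype=float)
        mu = np.exp(-(((x - centers) / sigmas) ** 2)).prod(axis=1)  # firing strengths
        beta = mu / mu.sum()                                        # normalized, beta_m
        y_wnn = np.array([W[m] @ (sigmoid(C_S[m] @ x) + morlet(C_W[m] @ x))
                          for m in range(centers.shape[0])])        # local WNN outputs
        return float(beta @ y_wnn)

    # Example: M = 2 rules, n = 2 inputs, L = 3 hidden S-W neurons per local WNN
    rng = np.random.default_rng(1)
    M, n, L = 2, 2, 3
    print(wnf_output(np.array([0.4, -0.2]),
                     centers=rng.normal(size=(M, n)), sigmas=np.ones((M, n)),
                     C_S=rng.normal(size=(M, L, n)), C_W=rng.normal(size=(M, L, n)),
                     W=rng.normal(size=(M, L))))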

Theorem 3.2. The asymptotic learning convergence is guaranteed if the learning rates of the different learning parameters obey the following upper bounds:
$$0 < \eta_\sigma < \frac{2\cdot P\cdot y_r^2}{\left|\hat{Y}_{\mathrm{WNN}}\right|^2_{\max}\cdot\left(2/\sigma_{\min}^3\right)^2}, \qquad 0 < \eta_x < \frac{2\cdot P\cdot y_r^2}{\left|\hat{Y}_{\mathrm{WNN}}\right|^2_{\max}\cdot\left(2/\sigma_{\min}^2\right)^2}, \qquad 0 < \eta_w < \frac{2\cdot P\cdot y_r^2}{\max\left|\partial\hat{Y}_{\mathrm{WNN}}/\partial w\right|^2}, \qquad 0 < \eta_{C^S} < \frac{2\cdot P\cdot y_r^2}{\max\left|\partial\hat{Y}_{\mathrm{WNN}}/\partial C^{S}\right|^2}, \qquad 0 < \eta_{C^W} < \frac{2\cdot P\cdot y_r^2}{\max\left|\partial\hat{Y}_{\mathrm{WNN}}/\partial C^{W}\right|^2}, \qquad (3.19)$$

where $\eta_w$, $\eta_{C^S}$, and $\eta_{C^W}$ are the learning rates of the parameters of the consequent parts of the fuzzy rules, and $\eta_\sigma$ and $\eta_x$ are those of the premise parts. $C^{S}$ and $C^{W}$ are the weights applied to the input signal for the sigmoid and wavelet activation functions of the local WNNs in each hidden neuron, respectively. $x_m$ and $\sigma_m$ are the center and standard deviation of the Gaussian membership functions of rule $m$ in the WNF model, respectively.

Proof. Equation (2.15) for WNF models can be written as
$$0 < \eta_\upsilon < \frac{2\cdot P\cdot y_r^2}{\max\left|\partial\hat{Y}_{\mathrm{WNF}}/\partial\upsilon\right|^2}, \qquad \frac{\partial\hat{Y}_{\mathrm{WNF}}}{\partial w} = \beta_m\cdot\frac{\partial\hat{Y}_{\mathrm{WNN}_m}}{\partial w}, \qquad \frac{\partial\hat{Y}_{\mathrm{WNF}}}{\partial C^{S}} = \beta_m\cdot\frac{\partial\hat{Y}_{\mathrm{WNN}_m}}{\partial C^{S}}, \qquad \frac{\partial\hat{Y}_{\mathrm{WNF}}}{\partial C^{W}} = \beta_m\cdot\frac{\partial\hat{Y}_{\mathrm{WNN}_m}}{\partial C^{W}}. \qquad (3.20)$$
Because $\beta_m = \mu_{A_m}(\mathbf{X})/\sum_{m=1}^{M}\mu_{A_m}(\mathbf{X}) \leq 1$ for all $m$, the bounds on $\eta_w$, $\eta_{C^S}$, and $\eta_{C^W}$ in (3.19) are easily derived.
From (2.15) and (3.4), for the parameters $\sigma$ and $x$ we have
$$\frac{\partial\hat{Y}_{\mathrm{WNF}}}{\partial\sigma} = \hat{Y}_{\mathrm{WNN}_m}\cdot\frac{\beta_m}{\mu_{A_m}}\cdot\left(1 - \beta_m\right)\cdot 2\cdot\frac{\left(x_i - x_{mi}\right)^2}{\sigma_{mi}^3} = \frac{\hat{Y}_{\mathrm{WNN}_m}\cdot\left(1 - \beta_m\right)}{\sum_{m=1}^{M}\mu_{A_m}}\cdot 2\cdot\frac{\left(x_i - x_{mi}\right)^2}{\sigma_{mi}^3}, \qquad \frac{\partial\hat{Y}_{\mathrm{WNF}}}{\partial x} = \hat{Y}_{\mathrm{WNN}_m}\cdot\frac{\beta_m}{\mu_{A_m}}\cdot\left(1 - \beta_m\right)\cdot 2\cdot\frac{x_i - x_{mi}}{\sigma_{mi}^2} = \frac{\hat{Y}_{\mathrm{WNN}_m}\cdot\left(1 - \beta_m\right)}{\sum_{m=1}^{M}\mu_{A_m}}\cdot 2\cdot\frac{x_i - x_{mi}}{\sigma_{mi}^2}, \qquad (3.21)$$
and therefore the remaining bounds in (3.19) are derived.

4. Conclusion

In this paper, a developed Lyapunov stability theorem was applied to guarantee the convergence of the gradient-descent learning algorithm in network training. The experimental examples showed that the upper bound of the learning rate can be easily obtained using this theorem; consequently, an adaptive learning algorithm can guarantee a fast and stable learning procedure.

References

1. K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
2. S. Z. Qin, H. T. Su, and T. J. McAvoy, “Comparison of four neural net learning methods for dynamic system identification,” IEEE Transactions on Neural Networks, vol. 3, no. 1, pp. 122–130, 1992.
3. T. Yabuta and T. Yamada, “Learning control using neural networks,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '91), pp. 740–745, Sacramento, Calif, USA, April 1991.
4. P. Frasconi, M. Gori, and G. Soda, “Local feedback multilayered networks,” Neural Computation, vol. 7, no. 1, pp. 120–130, 1992.
5. J. C. Patra, R. N. Pal, B. N. Chatterji, and G. Panda, “Identification of nonlinear dynamic systems using functional link artificial neural networks,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 29, no. 2, pp. 254–262, 1999.
6. Q. Zhang and A. Benveniste, “Wavelet networks,” IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 889–898, 1992.
7. J. Zhang, G. G. Walter, Y. Miao, and W. Lee, “Wavelet neural networks for function learning,” IEEE Transactions on Signal Processing, vol. 43, no. 6, pp. 1485–1497, 1995.
8. T. I. Boubez and R. L. Peskin, “Wavelet neural networks and receptive field partitioning,” in Proceedings of the IEEE International Conference on Neural Networks, pp. 1544–1549, San Francisco, Calif, USA, March 1993.
9. A. Banakar and M. F. Azeem, “Artificial wavelet neural network and its application in neuro-fuzzy models,” Applied Soft Computing, vol. 8, no. 4, pp. 1463–1485, 2008.
10. Q. Zhang, “Using wavelet network in nonparametric estimation,” IEEE Transactions on Neural Networks, vol. 8, no. 2, pp. 227–236, 1997.
11. A. Benveniste, B. Juditsky, B. Delyon, Q. Zhang, and P. Y. Glorennec, “Wavelets in identification,” in Proceedings of the 10th IFAC Symposium on System Identification (SYSID '94), Copenhagen, Denmark, July 1994.
12. X. D. Li, J. K. L. Ho, and T. W. S. Chow, “Approximation of dynamical time-variant systems by continuous-time recurrent neural networks,” IEEE Transactions on Circuits and Systems, vol. 52, no. 10, pp. 656–660, 2005.
13. B. Srinivasan, U. R. Prasad, and N. J. Rao, “Back propagation through adjoints for the identification of nonlinear dynamic systems using recurrent neural models,” IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 213–228, 1994.
14. P. Frasconi and M. Gori, “Computational capabilities of local-feedback recurrent networks acting as finite-state machines,” IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1520–1525, 1996.
15. R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts,” Neural Computation, vol. 3, pp. 79–87, 1991.
16. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” in Parallel Distributed Processing I, D. E. Rumelhart and J. L. McClelland, Eds., pp. 675–695, MIT Press, Cambridge, Mass, USA, 1986.
17. P. Werbos, “Generalization of backpropagation with application to a recurrent gas market model,” Neural Networks, vol. 1, pp. 339–356, 1988.
18. R. J. Williams and D. Zipser, “A learning algorithm for continually running fully recurrent networks,” Neural Computation, vol. 1, pp. 270–280, 1989.
19. R. J. Williams and D. Zipser, “Mechanical system modeling using recurrent neural networks via quasi-Newton learning methods,” Applied Mathematical Modeling, vol. 19, no. 7, pp. 421–428, 1995.
20. C.-F. Juang, “A TSK-type recurrent fuzzy network for dynamic systems processing by neural network and genetic algorithms,” IEEE Transactions on Fuzzy Systems, vol. 10, no. 2, pp. 155–170, 2002.
21. C.-H. Lee and C.-C. Teng, “Identification and control of dynamic systems using recurrent fuzzy neural networks,” IEEE Transactions on Fuzzy Systems, vol. 8, no. 4, pp. 349–366, 2000.
22. P. A. Mastorocostas and J. B. Theocharis, “A recurrent fuzzy-neural model for dynamic system identification,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 32, no. 2, pp. 176–190, 2002.
23. S. J. Yoo, Y. H. Choi, and J. B. Park, “Generalized predictive control based on self-recurrent wavelet neural network for stable path tracking of mobile robots: adaptive learning rates approach,” IEEE Transactions on Circuits and Systems, vol. 53, no. 6, pp. 1381–1394, 2006.
24. T. G. Barbounis, J. B. Theocharis, M. C. Alexiadis, and P. S. Dokopoulos, “Long-term wind speed and power forecasting using local recurrent neural network models,” IEEE Transactions on Energy Conversion, vol. 21, no. 1, pp. 273–284, 2006.
25. T. Takagi and M. Sugeno, “Fuzzy identification of systems and its applications to modeling and control,” IEEE Transactions on Systems, Man and Cybernetics, vol. 15, no. 1, pp. 116–132, 1985.
26. J. S. R. Jang, “ANFIS: adaptive-network-based fuzzy inference system,” IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.

Copyright © 2011 Ahmad Banakar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
