Abstract

We propose a new type of neural adaptive control based on dynamic neural networks. For a class of unknown nonlinear systems, a neural identifier-based feedback linearization controller is used first. Dead-zone and projection techniques are applied to ensure the stability of the neural identification. Four types of compensators are then addressed, and the stability of the closed-loop system is proven.

1. Introduction

Feedback control of nonlinear systems is a big challenge for engineers, especially when complete model information is not available. A reasonable solution is to identify the nonlinear system, so that an adaptive feedback controller can be designed based on the identifier. Neural networks seem to be a very effective tool for identifying complex nonlinear systems when we have no complete model information or even consider the controlled plant as a “black box”.

Neuroidentifiers can be classified as static (feedforward) or dynamic (recurrent) [1]. Most publications on nonlinear system identification use static networks, for example multilayer perceptrons, which are implemented to approximate the nonlinear function on the right-hand side of the dynamic model equations [2]. The main drawback of these networks is that the weight updating uses information on the local data structure (local optima) and the function approximation is sensitive to the training data [3]. Dynamic neural networks can successfully overcome this disadvantage and also behave adequately in the presence of unmodeled dynamics, because their structure incorporates feedback [4–6].

Neurocontrol seems to be a very useful tool for unknown systems because it is model-free, that is, the controller does not depend on a model of the plant. Many kinds of neurocontrol have been proposed in recent years. For example, supervised neurocontrol [7] is able to clone human actions: the neural network inputs correspond to the sensory information perceived by the human, and the outputs correspond to the human control actions. Direct inverse control [1] uses an inverse model of the plant cascaded with the plant, so that the composed system results in an identity map between the desired response and the plant response, but the absence of feedback reduces its robustness. Internal model neurocontrol [8] uses forward and inverse models within the feedback loop. Adaptive neurocontrol has two kinds of structure: indirect and direct adaptive control. Direct neuroadaptive control realizes the controller by a neural network directly [1]. The indirect method combines a neural network identifier with adaptive control, and the controller is derived from the online identification [5].

In this paper we extend our previous results in [9, 10]. In [9], the neurocontrol was derived by the gradient principle, so the neural control is locally optimal. No restriction on the weights is needed there, because the controller does not include the inverse of the weights. In [10], we assumed that the inverse of the weights exists, so the learning law was the normal one. The main contributions of this paper are as follows: (1) a special weight updating law is proposed to ensure the existence of the neurocontrol; (2) four different robust compensators are proposed. By means of a Lyapunov-like analysis, we derive stability conditions for the neuroidentifier and the adaptive controller. We show that the neuroidentifier-based adaptive control is effective for a large class of unknown nonlinear systems.

2. Neuroidentifier

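The controlled nonlinear plant is given as
$$\dot{x}_t = f(x_t, u_t, t), \quad x_t \in \Re^n,\ u_t \in \Re^n, \tag{1}$$
where $f(\cdot)$ is an unknown vector function. In order to realize indirect neural control, a parallel neural identifier is used as in [9, 10] (in [5] the series-parallel structure is used):
$$\dot{\hat{x}}_t = A\hat{x}_t + W_{1,t}\,\sigma(\hat{x}_t) + W_{2,t}\,\phi(\hat{x}_t)\,\gamma(u_t), \tag{2}$$
where $\hat{x}_t \in \Re^n$ is the state of the neural network, $W_{1,t}, W_{2,t} \in \Re^{n\times n}$ are the weight matrices, and $A \in \Re^{n\times n}$ is a stable matrix. The vector function $\sigma(\cdot) \in \Re^n$, and $\phi(\cdot) \in \Re^{n\times n}$ is a diagonal matrix. The function $\gamma(\cdot)$ is selected such that $\|\gamma(u_t)\|^2 \le \bar{u}$; for example, $\gamma(\cdot)$ may be the linear saturation function
$$\gamma(u_t) = \begin{cases} u_t, & \text{if } |u_t| < b,\\ \bar{u}, & \text{if } |u_t| \ge b. \end{cases} \tag{3}$$
The elements of $\sigma(\cdot)$ and $\phi(\cdot)$ are selected as monotone increasing functions; a typical choice is the sigmoid function
$$\sigma_i(\hat{x}_t) = \frac{a_i}{1 + e^{-b_i \hat{x}_t}} - c_i, \tag{4}$$
where $a_i, b_i, c_i > 0$. In order to avoid $\phi(\hat{x}_t) = 0$, we select
$$\phi_i(\hat{x}_t) = \frac{a_i}{1 + e^{-b_i \hat{x}_t}} + c_i. \tag{5}$$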

Remark 1. The dynamic neural network (2) has been discussed by many authors, for example [4, 5, 9, 10]. The Hopfield model is a special case of this network with $A = \operatorname{diag}\{a_i\}$, $a_i = -1/(R_i C_i)$, $R_i > 0$, and $C_i > 0$, where $R_i$ and $C_i$ are the resistance and capacitance at the $i$th node of the network, respectively.

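Let us define the identification error as
$$\Delta_t = \hat{x}_t - x_t. \tag{6}$$
Generally, the dynamic neural network (2) cannot follow the nonlinear system (1) exactly. The nonlinear system may be written as
$$\dot{x}_t = A x_t + W_1^0\,\sigma(x_t) + W_2^0\,\phi(x_t)\,\gamma(u_t) - \tilde{f}_t, \tag{7}$$
where $W_1^0$ and $W_2^0$ are the initial matrices of $W_{1,t}$ and $W_{2,t}$, which satisfy
$$W_1^0 \Lambda_1^{-1} W_1^{0T} \le \bar{W}_1, \qquad W_2^0 \Lambda_2^{-1} W_2^{0T} \le \bar{W}_2. \tag{8}$$
$\bar{W}_1$ and $\bar{W}_2$ are a priori known matrices, and the vector function $\tilde{f}_t$ can be regarded as modeling error and disturbances. Because $\sigma(\cdot)$ and $\phi(\cdot)$ are chosen as sigmoid functions, they clearly satisfy the following Lipschitz property:
$$\tilde{\sigma}^T \Lambda_1 \tilde{\sigma} \le \Delta_t^T D_\sigma \Delta_t, \qquad \big(\tilde{\phi}_t\,\gamma(u_t)\big)^T \Lambda_2 \big(\tilde{\phi}_t\,\gamma(u_t)\big) \le \bar{u}\,\Delta_t^T D_\phi \Delta_t, \tag{9}$$
where $\tilde{\sigma} = \sigma(\hat{x}_t) - \sigma(x_t)$, $\tilde{\phi} = \phi(\hat{x}_t) - \phi(x_t)$, and $\Lambda_1$, $\Lambda_2$, $D_\sigma$, and $D_\phi$ are known positive constant matrices. The error dynamics are obtained from (2) and (7):
$$\dot{\Delta}_t = A\Delta_t + \tilde{W}_{1,t}\,\sigma(\hat{x}_t) + \tilde{W}_{2,t}\,\phi(\hat{x}_t)\,\gamma(u_t) + W_1^0\,\tilde{\sigma} + W_2^0\,\tilde{\phi}\,\gamma(u_t) + \tilde{f}_t, \tag{10}$$
where $\tilde{W}_{1,t} = W_{1,t} - W_1^0$ and $\tilde{W}_{2,t} = W_{2,t} - W_2^0$. As in [4, 5, 9, 10], we assume that the modeling error is bounded.

(A1) The unmodeled dynamics $\tilde{f}$ satisfy
$$\tilde{f}_t^T \Lambda_f^{-1} \tilde{f}_t \le \bar{\eta}, \tag{11}$$
where $\Lambda_f$ is a known positive constant matrix.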

If we define
$$R = \bar{W}_1 + \bar{W}_2 + \Lambda_f, \qquad Q = D_\sigma + \bar{u} D_\phi + Q_0, \tag{12}$$
and the matrices $A$ and $Q_0$ are selected to fulfill the following conditions:
(1) the pair $(A, R^{1/2})$ is controllable and the pair $(Q^{1/2}, A)$ is observable,
(2) the local frequency condition [9] is satisfied:
$$A^T R^{-1} A - Q \ge \frac{1}{4}\big[A^T R^{-1} - R^{-1} A\big]\, R\, \big[A^T R^{-1} - R^{-1} A\big]^T, \tag{13}$$
then the following assumption can be established.

(A2) There exist a stable matrix $A$ and a strictly positive definite matrix $Q_0$ such that the matrix Riccati equation
$$A^T P + P A + P R P + Q = 0 \tag{14}$$
has a positive solution $P = P^T > 0$.

This condition is easily fulfilled if we select $A$ as a stable diagonal matrix. The next theorem states the learning procedure of the neuroidentifier.
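For intuition, in the scalar case ($n = 1$) the Riccati equation (14) reduces to the quadratic $r p^2 + 2 a p + q = 0$, and the frequency condition (13) reduces to $a^2 / r \ge q$, since the bracketed terms on its right-hand side vanish for scalars. The following Python sketch, whose function and argument names are hypothetical, returns the smaller positive root when it exists. It only illustrates why a sufficiently stable $A$ makes the condition easy to meet and is not a general matrix Riccati solver.

```python
import math


def scalar_riccati(a, r, q):
    """Smaller positive root p of r*p**2 + 2*a*p + q = 0, or None.

    Scalar analogue of (14) with A = a < 0 (stable), R = r > 0, Q = q > 0.
    """
    if a >= 0 or r <= 0 or q <= 0:
        raise ValueError("expected a < 0, r > 0, q > 0")
    disc = a * a - r * q  # nonnegative exactly when a**2 / r >= q
    if disc < 0:
        return None  # no real solution, hence no positive solution
    # Both roots are positive when a < 0; return the smaller one.
    return (-a - math.sqrt(disc)) / r
```

For example, `scalar_riccati(-2.0, 1.0, 3.0)` returns a positive root, whereas `scalar_riccati(-1.0, 4.5, 0.5)` returns `None` because $a^2/r < q$; making $a$ more negative restores solvability.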

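Theorem 2. Subject to assumptions A1 and A2 being satisfied, if the weights $W_{1,t}$ and $W_{2,t}$ are updated as
$$\dot{W}_{1,t} = -s_t K_1 P \Delta_t\, \sigma^T(\hat{x}_t), \qquad \dot{W}_{2,t} = -s_t \operatorname{Pr}\big[K_2 P\, \phi(\hat{x}_t)\, \gamma(u_t)\, \Delta_t^T\big], \tag{15}$$
where $K_1, K_2 > 0$, $P$ is the solution of the Riccati equation (14), and $\operatorname{Pr}[\omega]$ is the projection function which, with $\omega = K_2 P\, \phi(\hat{x}_t)\, \gamma(u_t)\, \Delta_t^T$, is defined as
$$\operatorname{Pr}[\omega] = \begin{cases} \omega, & \text{condition},\\[4pt] \omega + \dfrac{\|\tilde{W}_{2,t}\|^2}{\operatorname{tr}\big(\tilde{W}_{2,t}^T (K_2 P) \tilde{W}_{2,t}\big)}\,\omega, & \text{otherwise}, \end{cases} \tag{16}$$
where the “condition” is $\|\tilde{W}_{2,t}\| < r$ or $[\|\tilde{W}_{2,t}\| = r$ and $\operatorname{tr}(-\omega \tilde{W}_{2,t}) \le 0]$, $r < \|W_2^0\|$ is a positive constant, and $s_t$ is the dead-zone function
$$s_t = \begin{cases} 1, & \text{if } \|\Delta_t\|^2 > \lambda_{\min}^{-1}(Q_0)\,\bar{\eta},\\ 0, & \text{otherwise}, \end{cases} \tag{17}$$
then the weight matrices and the identification error remain bounded, that is,
$$\Delta_t \in L_\infty, \qquad W_{1,t} \in L_\infty, \qquad W_{2,t} \in L_\infty, \tag{18}$$
and for any $T > 0$ the identification error fulfills the following tracking performance:
$$\frac{1}{T}\int_0^T \|\Delta_t\|^2_{Q_0}\, dt \le \kappa \bar{\eta} + \frac{\Delta_0^T P \Delta_0}{T}, \tag{19}$$
where $\kappa$ is the condition number of $Q_0$, defined as $\kappa = \lambda_{\max}(Q_0)/\lambda_{\min}(Q_0)$.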

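Proof. Select a Lyapunov function as
$$V_t = \Delta_t^T P \Delta_t + \operatorname{tr}\big\{\tilde{W}_{1,t}^T K_1^{-1} \tilde{W}_{1,t}\big\} + \operatorname{tr}\big\{\tilde{W}_{2,t}^T K_2^{-1} \tilde{W}_{2,t}\big\}, \tag{20}$$
where $P \in \Re^{n\times n}$ is a positive definite matrix. According to (10), the derivative is
$$\dot{V}_t = \Delta_t^T (PA + A^T P)\Delta_t + 2\Delta_t^T P \tilde{W}_{1,t}\,\sigma(\hat{x}_t) + 2\Delta_t^T P \tilde{W}_{2,t}\,\phi(\hat{x}_t)\,\gamma(u_t) + 2\Delta_t^T P \tilde{f}_t + 2\Delta_t^T P\big[W_1^0 \tilde{\sigma} + W_2^0 \tilde{\phi}\,\gamma(u_t)\big] + 2\operatorname{tr}\big\{\dot{\tilde{W}}_{1,t}^T K_1^{-1} \tilde{W}_{1,t}\big\} + 2\operatorname{tr}\big\{\dot{\tilde{W}}_{2,t}^T K_2^{-1} \tilde{W}_{2,t}\big\}. \tag{21}$$
Since $\Delta_t^T P W_1^0 \tilde{\sigma}_t$ is a scalar, using (9) and the matrix inequality
$$X^T Y + (X^T Y)^T \le X^T \Lambda^{-1} X + Y^T \Lambda Y, \tag{22}$$
where $X, Y \in \Re^{n\times k}$ are any matrices and $\Lambda$ is any positive definite matrix, we obtain
$$2\Delta_t^T P W_1^0 \tilde{\sigma}_t \le \Delta_t^T P W_1^0 \Lambda_1^{-1} W_1^{0T} P \Delta_t + \tilde{\sigma}_t^T \Lambda_1 \tilde{\sigma}_t \le \Delta_t^T \big(P \bar{W}_1 P + D_\sigma\big)\Delta_t,$$
$$2\Delta_t^T P W_2^0 \tilde{\phi}_t\,\gamma(u_t) \le \Delta_t^T \big(P \bar{W}_2 P + \bar{u} D_\phi\big)\Delta_t. \tag{23}$$
In view of the matrix inequality (22) and (A1),
$$2\Delta_t^T P \tilde{f}_t \le \Delta_t^T P \Lambda_f P \Delta_t + \bar{\eta}. \tag{24}$$
So we have
$$\dot{V}_t \le \Delta_t^T\big[PA + A^T P + P(\bar{W}_1 + \bar{W}_2 + \Lambda_f)P + (D_\sigma + \bar{u} D_\phi + Q_0)\big]\Delta_t + 2\operatorname{tr}\big\{\dot{\tilde{W}}_{1,t}^T K_1^{-1} \tilde{W}_{1,t}\big\} + 2\Delta_t^T P \tilde{W}_{1,t}\,\sigma(\hat{x}_t) + \bar{\eta} - \Delta_t^T Q_0 \Delta_t + 2\operatorname{tr}\big\{\dot{\tilde{W}}_{2,t}^T K_2^{-1} \tilde{W}_{2,t}\big\} + 2\Delta_t^T P \tilde{W}_{2,t}\,\phi(\hat{x}_t)\,\gamma(u_t). \tag{25}$$
Since $\dot{\tilde{W}}_{1,t} = \dot{W}_{1,t}$ and $\dot{\tilde{W}}_{2,t} = \dot{W}_{2,t}$, if we use (A2), we have
$$\dot{V}_t \le 2\operatorname{tr}\big\{\big[K_1^{-1}\dot{W}_{1,t}^T + P\Delta_t\,\sigma^T(\hat{x}_t)\big]\tilde{W}_{1,t}\big\} + \bar{\eta} - \Delta_t^T Q_0 \Delta_t + 2\operatorname{tr}\big\{\big[K_2^{-1}\dot{W}_{2,t} + P\,\phi(\hat{x}_t)\,\gamma(u_t)\,\Delta_t^T\big]\tilde{W}_{2,t}\big\}. \tag{26}$$

(I) If $\|\Delta_t\|^2 > \lambda_{\min}^{-1}(Q_0)\,\bar{\eta}$, using the updating law (15) we can conclude that
$$\dot{V}_t \le 2\operatorname{tr}\big\{\big[-\operatorname{Pr}[P\,\phi(\hat{x}_t)\,\gamma(u_t)\,\Delta_t^T] + P\,\phi(\hat{x}_t)\,\gamma(u_t)\,\Delta_t^T\big]\tilde{W}_{2,t}\big\} - \Delta_t^T Q_0 \Delta_t + \bar{\eta}. \tag{27}$$
(a) If $\|\tilde{W}_{2,t}\| < r$ or $[\|\tilde{W}_{2,t}\| = r$ and $\operatorname{tr}(-\omega\tilde{W}_{2,t}) \le 0]$, then $\dot{V}_t \le -\lambda_{\min}(Q_0)\|\Delta_t\|^2 + \bar{\eta} < 0$.
(b) If $\|\tilde{W}_{2,t}\| = r$ and $\operatorname{tr}(-\omega\tilde{W}_{2,t}) > 0$, then
$$\dot{V}_t \le -2\operatorname{tr}\Big\{\frac{\|\tilde{W}_{2,t}\|^2}{\operatorname{tr}\big(\tilde{W}_{2,t}^T (K_2 P)\tilde{W}_{2,t}\big)}\, K_2^{-1}\omega\,\tilde{W}_{2,t}\Big\} - \Delta_t^T Q_0 \Delta_t + \bar{\eta} \le -\Delta_t^T Q_0 \Delta_t + \bar{\eta} < 0. \tag{28}$$
Hence $V_t$ is bounded. Integrating (27) from $0$ to $T$ yields
$$V_T - V_0 \le -\int_0^T \Delta_t^T Q_0 \Delta_t\, dt + \bar{\eta}\, T. \tag{29}$$
Because $\kappa \ge 1$, we have
$$\int_0^T \Delta_t^T Q_0 \Delta_t\, dt \le V_0 - V_T + \int_0^T \bar{\eta}\, dt \le V_0 + \bar{\eta}\, T \le V_0 + \kappa \bar{\eta}\, T, \tag{30}$$
where $\kappa$ is the condition number of $Q_0$.

(II) If $\|\Delta_t\|^2 \le \lambda_{\min}^{-1}(Q_0)\,\bar{\eta}$, the weights become constants and $V_t$ remains bounded. Moreover,
$$\int_0^T \Delta_t^T Q_0 \Delta_t\, dt \le \int_0^T \lambda_{\max}(Q_0)\|\Delta_t\|^2\, dt \le \frac{\lambda_{\max}(Q_0)}{\lambda_{\min}(Q_0)}\,\bar{\eta}\, T \le V_0 + \kappa \bar{\eta}\, T. \tag{31}$$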
From (I) and (II), $V_t$ is bounded, so (18) holds. From (20) and $\tilde{W}_{1,t} = W_{1,t} - W_1^0$, $\tilde{W}_{2,t} = W_{2,t} - W_2^0$, we know that $V_0 = \Delta_0^T P \Delta_0$. Using (30) and (31), (19) is obtained. The theorem is proved.

Remark 3. The weight update law (15) uses two techniques. The dead-zone $s_t$ is applied to overcome the robustness problem caused by the unmodeled dynamics $\tilde{f}_t$. In the presence of disturbances or unmodeled dynamics, adaptive procedures may easily become unstable. The lack of robustness of parameter identification was demonstrated in [11] and became a hot issue in the 1980s. The dead-zone method is one of the simple and effective tools for this problem. The second technique is the projection approach, which guarantees that the parameters remain within a constrained region and does not alter the properties of the adaptive law established without projection [12]. The projection approach proposed in this paper is explained in Figure 1. We want to force $W_{2,t}$ to stay inside the ball with center $W_2^0$ and radius $r$. If $\|\tilde{W}_{2,t}\| < r$, we use the normal gradient algorithm. When $W_{2,t} - W_2^0$ is on the ball and the vector $\dot{W}_{2,t}$ points either inside or along the ball, that is, $(d/dt)\|\tilde{W}_{2,t}\|^2 = 2\operatorname{tr}(-\omega\tilde{W}_{2,t}) \le 0$, we also keep this algorithm. If $\operatorname{tr}(-\omega\tilde{W}_{2,t}) > 0$, then $\operatorname{tr}\big[-\big(\omega + \big(\|\tilde{W}_{2,t}\|^2 / \operatorname{tr}(\tilde{W}_{2,t}^T (K_2 P)\tilde{W}_{2,t})\big)\omega\big)\tilde{W}_{2,t}\big] < 0$, so $(d/dt)\|\tilde{W}_{2,t}\|^2 < 0$ and $\dot{W}_{2,t}$ is directed toward the inside of the ball, that is, $W_{2,t}$ will never leave the ball. Since $r < \|W_2^0\|$, $W_{2,t} \ne 0$.
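The role of the dead-zone can be made concrete with a scalar sketch of the update law. The code below assumes that (15) has the gradient-descent form $\dot{W}_1 = -s_t K_1 P \Delta_t \sigma(\hat{x}_t)$ and $\dot{W}_2 = -s_t K_2 P \phi(\hat{x}_t)\gamma(u_t)\Delta_t$ for $n = 1$, and it omits the projection step; all function and argument names are hypothetical.

```python
def dead_zone(delta_sq, eta_bar, lam_min_q0):
    """Dead-zone switch (17): 1 while ||Delta||^2 > eta_bar / lambda_min(Q0), else 0."""
    return 1.0 if delta_sq > eta_bar / lam_min_q0 else 0.0


def weight_rates(delta, sigma_hat, phi_gamma, p, k1, k2, eta_bar, lam_min_q0):
    """Scalar weight derivatives in the spirit of (15), without projection.

    delta      identification error x_hat - x
    sigma_hat  sigma(x_hat)
    phi_gamma  phi(x_hat) * gamma(u)
    p          scalar solution of the Riccati equation (14)
    """
    s = dead_zone(delta * delta, eta_bar, lam_min_q0)
    w1_dot = -s * k1 * p * delta * sigma_hat
    w2_dot = -s * k2 * p * phi_gamma * delta
    return w1_dot, w2_dot


def euler_update(w1, w2, rates, dt):
    """One explicit Euler step of the weights with step size dt."""
    w1_dot, w2_dot = rates
    return w1 + dt * w1_dot, w2 + dt * w2_dot
```

When the squared error falls inside the dead-zone, both rates are zero and the weights are frozen, which is what prevents the bounded modeling error from driving the parameters away.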

Remark 4. Figure 1 and (7) show that the initial conditions of the weights influence the identification accuracy. In order to find good initial weights, we design an offline method. From the above theorem, we know that the weights will converge to a zone. Starting from any initial weights $W_1^0$ and $W_2^0$, after a time $T_0$ the identification error should become smaller, that is, $W_{1,T_0}$ and $W_{2,T_0}$ are better than $W_1^0$ and $W_2^0$. We use the following steps to find the initial weights.
(1) Start from any initial value for $W_1^0 = W_{1,0}$, $W_2^0 = W_{2,0}$.
(2) Perform identification until the training time reaches $T_0$.
(3) If $\|\Delta(T_0)\| < \|\Delta(0)\|$, take $W_{1,T_0}$, $W_{2,T_0}$ as the new $W_1^0$ and $W_2^0$, and go to step (2) to repeat the identification process.
(4) If $\|\Delta(T_0)\| \ge \|\Delta(0)\|$, stop this offline identification; $W_{1,T_0}$, $W_{2,T_0}$ are now the final initial weights.
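The offline procedure of Remark 4 is a simple restart loop and can be sketched as follows. The callback `identify` is hypothetical: it is assumed to run the identifier for a time $T_0$ from the given weights and to return the final weights together with the error norms at the start and at the end of the run. The iteration cap `max_loops` is also an addition of this sketch.

```python
def find_initial_weights(identify, w1, w2, max_loops=50):
    """Restart loop of Remark 4.

    identify(w1, w2) -> (w1_T0, w2_T0, err_start, err_end) is a
    user-supplied (hypothetical) routine that runs the identifier
    for a time T0 starting from the weights (w1, w2).
    """
    for _ in range(max_loops):
        w1_new, w2_new, err_start, err_end = identify(w1, w2)  # step (2)
        if err_end < err_start:
            w1, w2 = w1_new, w2_new  # step (3): accept and repeat
        else:
            return w1_new, w2_new    # step (4): stop with the last weights
    return w1, w2  # cap reached without meeting the stopping rule
```

A toy callback that moves the weights toward a fixed target and reports the distance to it is enough to exercise the loop; in practice `identify` would integrate (2) and (15) against recorded plant data.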

Remark 5. Since the updating rate is $K_i P$ ($i = 1, 2$), and $K_i$ can be selected as any positive definite matrix, the learning process (15) of the dynamic neural network does not depend on the solution of the Riccati equation (14).

Remark 6. Let us notice that the upper bound (19) turns out to be “sharp”, that is, in the case of not having any uncertainties (exactly matching case: $\tilde{f} = 0$) we obtain $\bar{\eta} = 0$ and, hence,
$$\limsup_{T\to\infty} \frac{1}{T}\int_0^T \|\Delta_t\|^2_{Q_0}\, dt = 0, \tag{32}$$
from which, for this special situation, the asymptotic stability property ($\Delta_t \to 0$ as $t \to \infty$) follows. In general, only asymptotic stability “in average” is guaranteed, because the dead-zone parameter $\bar{\eta}$ can never be set to zero.

3. Robust Adaptive Controller Based on Neuroidentifier

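From (7) we know that the nonlinear system (1) may be modeled as
$$\dot{x}_t = A x_t + W_1^0\,\sigma(x_t) + W_2^0\,\phi(x_t)\,\gamma(u_t) - \tilde{f}_t = A x_t + W_{1,t}\,\sigma(\hat{x}_t) + W_{2,t}\,\phi(\hat{x}_t)\,\gamma(u_t) - \big[\tilde{f}_t + \tilde{W}_{1,t}\,\sigma(\hat{x}_t) + \tilde{W}_{2,t}\,\phi(\hat{x}_t)\,\gamma(u_t) + W_1^0\,\tilde{\sigma}_t + W_2^0\,\tilde{\phi}_t\,\gamma(u_t)\big]. \tag{33}$$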

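Equation (33) can be rewritten as
$$\dot{x}_t = A x_t + W_{1,t}\,\sigma(\hat{x}_t) + W_{2,t}\,\phi(\hat{x}_t)\,\gamma(u_t) + d_t, \tag{34}$$
where
$$d_t = -\big[\tilde{f}_t + \tilde{W}_{1,t}\,\sigma(\hat{x}_t) + \tilde{W}_{2,t}\,\phi(\hat{x}_t)\,\gamma(u_t) + W_1^0\,\tilde{\sigma}_t + W_2^0\,\tilde{\phi}_t\,\gamma(u_t)\big]. \tag{35}$$
If the updating law of $W_{1,t}$ and $W_{2,t}$ is (15), then $W_{1,t}$ and $W_{2,t}$ are bounded. Using assumption (A1), $d_t$ is bounded, with $\bar{d} = \sup_t \|d_t\|$.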

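The objective of adaptive control is to force the nonlinear system (1) to follow an optimal trajectory $x_t^* \in \Re^r$, which is assumed to be smooth enough. This trajectory is regarded as a solution of a nonlinear reference model:
$$\dot{x}_t^* = \varphi(x_t^*, t), \tag{36}$$
with a fixed initial condition. If the trajectory has points of discontinuity at some fixed moments, we can use any smooth approximating trajectory. In the case of the regulation problem, $\varphi(x_t^*, t) = 0$ and $x^*(0) = c$, where $c$ is constant. Let us define the state trajectory error as
$$\Delta_t^* = x_t - x_t^*. \tag{37}$$
From (34) and (36) we have
$$\dot{\Delta}_t^* = A x_t + W_{1,t}\,\sigma(\hat{x}_t) + W_{2,t}\,\phi(\hat{x}_t)\,\gamma(u_t) + d_t - \varphi(x_t^*, t). \tag{38}$$
Let us select the control action $\gamma(u_t)$ in the linear form
$$\gamma(u_t) = U_{1,t} + \big[W_{2,t}\,\phi(\hat{x}_t)\big]^{-1} U_{2,t}, \tag{39}$$
where $U_{1,t} \in \Re^n$ is the direct control part and $U_{2,t} \in \Re^n$ is a compensation of the unmodeled dynamics $d_t$. As $\varphi(x_t^*, t)$, $x_t^*$, $W_{1,t}\,\sigma(\hat{x}_t)$, and $W_{2,t}\,\phi(\hat{x}_t)$ are available, we can select $U_{1,t}$ as
$$U_{1,t} = \big[W_{2,t}\,\phi(\hat{x}_t)\big]^{-1}\big[\varphi(x_t^*, t) - A x_t^* - W_{1,t}\,\sigma(\hat{x}_t)\big]. \tag{40}$$
This choice relies on $\phi(\hat{x}_t)$ in (5) being different from zero and on $W_{2,t} \ne 0$, which is ensured by the projection approach in Theorem 2. Substituting (39) and (40) into (38), the error equation becomes
$$\dot{\Delta}_t^* = A\Delta_t^* + U_{2,t} + d_t. \tag{41}$$
Four robust algorithms may be applied to compensate $d_t$.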

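(A) Exact Compensation
From (34) and (2) we have
$$d_t = \big(\dot{x}_t - \dot{\hat{x}}_t\big) - A\big(x_t - \hat{x}_t\big). \tag{42}$$
If $\dot{x}_t$ is available, we can select $U_{2,t}$ as $U^a_{2,t} = -d_t$, that is,
$$U^a_{2,t} = A\big(x_t - \hat{x}_t\big) - \big(\dot{x}_t - \dot{\hat{x}}_t\big). \tag{43}$$
So the ODE which describes the state trajectory error is
$$\dot{\Delta}_t^* = A\Delta_t^*. \tag{44}$$
Because $A$ is stable, $\Delta_t^*$ is globally asymptotically stable:
$$\lim_{t\to\infty} \Delta_t^* = 0. \tag{45}$$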

(B) An Approximate Method
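If $\dot{x}_t$ is not available, an approximate method may be used:
$$\dot{x}_t = \frac{x_t - x_{t-\tau}}{\tau} + \delta_t, \tag{46}$$
where $\delta_t$ is the differential approximation error and $\tau > 0$. Let us select the compensator as
$$U^b_{2,t} = A\big(x_t - \hat{x}_t\big) - \Big(\frac{x_t - x_{t-\tau}}{\tau} - \dot{\hat{x}}_t\Big). \tag{47}$$
So $U^b_{2,t} = U^a_{2,t} + \delta_t$, and (44) becomes
$$\dot{\Delta}_t^* = A\Delta_t^* + \delta_t. \tag{48}$$
Define a Lyapunov-like function as
$$V_t = \Delta_t^{*T} P_2 \Delta_t^*, \qquad P_2 = P_2^T > 0. \tag{49}$$
The time derivative of (49) is
$$\dot{V}_t = \Delta_t^{*T}\big(A^T P_2 + P_2 A\big)\Delta_t^* + 2\Delta_t^{*T} P_2 \delta_t, \tag{50}$$
where $2\Delta_t^{*T} P_2 \delta_t$ can be estimated as
$$2\Delta_t^{*T} P_2 \delta_t \le \Delta_t^{*T} P_2 \Lambda P_2 \Delta_t^* + \delta_t^T \Lambda^{-1}\delta_t, \tag{51}$$
where $\Lambda$ is any positive definite matrix. So (50) becomes
$$\dot{V}_t \le \Delta_t^{*T}\big(A^T P_2 + P_2 A + P_2 \Lambda P_2 + Q_2\big)\Delta_t^* + \delta_t^T \Lambda^{-1}\delta_t - \Delta_t^{*T} Q_2 \Delta_t^*, \tag{52}$$
where $Q_2$ is any positive definite matrix. Because $A$ is stable, there exist $\Lambda$ and $Q_2$ such that the matrix Riccati equation
$$A^T P_2 + P_2 A + P_2 \Lambda P_2 + Q_2 = 0 \tag{53}$$
has a positive solution $P_2 = P_2^T > 0$. Defining the seminorm
$$\|\Delta_t^*\|^2_{Q_2} = \limsup_{T\to\infty}\frac{1}{T}\int_0^T \Delta_t^{*T} Q_2 \Delta_t^*\, dt, \tag{54}$$
where $Q_2 = Q_2^T > 0$ is the given weighting matrix, the state trajectory tracking can be formulated as the following optimization problem:
$$J_{\min} = \min_{u_t} J, \qquad J = \|x_t - x_t^*\|^2_{Q_2}. \tag{55}$$
Note that
$$\limsup_{T\to\infty}\frac{1}{T}\big(\Delta_0^{*T} P_2 \Delta_0^*\big) = 0. \tag{56}$$
Integrating (52) with (53) and using (56), based on the dynamic neural network (2), the control law (47) makes the trajectory tracking error satisfy
$$\|\Delta_t^*\|^2_{Q_2} \le \|\delta_t\|^2_{\Lambda^{-1}}. \tag{57}$$
A suitable selection of $\Lambda$ and $Q_2$ ensures that the Riccati equation (53) has a positive solution and makes $\|\Delta_t^*\|^2_{Q_2}$ small enough if $\tau$ is small enough.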
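A minimal Python sketch of the approximate compensator may help to fix the signs. It assumes that (47) reads $U^b_{2,t} = A(x_t - \hat{x}_t) - \big((x_t - x_{t-\tau})/\tau - \dot{\hat{x}}_t\big)$, with the delayed difference taken from two stored samples of the state; the helper names are hypothetical and plain lists are used instead of a matrix library.

```python
def mat_vec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def backward_difference(x_now, x_prev, tau):
    """Delayed-difference estimate of dx/dt, i.e. (46) without the error term."""
    return [(xn - xp) / tau for xn, xp in zip(x_now, x_prev)]


def compensator_b(A, x, x_hat, x_prev, x_hat_dot, tau):
    """Compensator in the spirit of (47).

    x, x_prev   plant state at times t and t - tau
    x_hat       identifier state at time t
    x_hat_dot   identifier derivative at time t, available from (2)
    """
    ident_err = [xi - xh for xi, xh in zip(x, x_hat)]
    x_dot_est = backward_difference(x, x_prev, tau)
    a_term = mat_vec(A, ident_err)
    return [at - (xd - xhd) for at, xd, xhd in zip(a_term, x_dot_est, x_hat_dot)]
```

A smaller $\tau$ reduces the approximation error $\delta_t$ but amplifies measurement noise in the difference quotient, so in practice $\tau$ is a trade-off rather than something to shrink without limit.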

(C) Sliding Mode Compensation
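If $\dot{x}_t$ is not available, the sliding mode technique may be applied. Let us define a Lyapunov-like function as
$$V_t = \Delta_t^{*T} P_3 \Delta_t^*, \tag{58}$$
where $P_3$ is a solution of the Lyapunov equation
$$A^T P_3 + P_3 A = -I. \tag{59}$$
Using (41), its time derivative is
$$\dot{V}_t = \Delta_t^{*T}\big(A^T P_3 + P_3 A\big)\Delta_t^* + 2\Delta_t^{*T} P_3 U_{2,t} + 2\Delta_t^{*T} P_3 d_t. \tag{60}$$
According to the sliding mode technique, we may select $U_{2,t}$ as
$$U^c_{2,t} = -k P_3^{-1}\operatorname{sgn}(\Delta_t^*), \qquad k > 0, \tag{61}$$
where $k$ is a positive constant and
$$\operatorname{sgn}(\Delta_{i,t}^*) = \begin{cases} 1, & \Delta_{i,t}^* > 0,\\ 0, & \Delta_{i,t}^* = 0,\\ -1, & \Delta_{i,t}^* < 0, \end{cases} \qquad \operatorname{sgn}(\Delta_t^*) = \big[\operatorname{sgn}(\Delta_{1,t}^*), \ldots, \operatorname{sgn}(\Delta_{n,t}^*)\big]^T \in \Re^n. \tag{62}$$
Substituting (59) and (61) into (60) gives
$$\dot{V}_t = -\|\Delta_t^*\|^2 - 2k\,\Delta_t^{*T}\operatorname{sgn}(\Delta_t^*) + 2\Delta_t^{*T} P_3 d_t \le -\|\Delta_t^*\|^2 - 2k\|\Delta_t^*\| + 2\lambda_{\max}(P_3)\|\Delta_t^*\|\,\|d_t\| = -\|\Delta_t^*\|^2 - 2\|\Delta_t^*\|\big(k - \lambda_{\max}(P_3)\|d_t\|\big). \tag{63}$$
If we select
$$k > \lambda_{\max}(P_3)\,\bar{d}, \tag{64}$$
where $\bar{d}$ is the bound of $d_t$ defined after (35), then $\dot{V}_t < 0$. So
$$\lim_{t\to\infty}\Delta_t^* = 0. \tag{65}$$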
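The effect of the gain condition (64) can be checked on a scalar toy problem. The following Python sketch integrates the error equation (41) with an explicit Euler scheme for $A = a = -1$, so that (59) gives $P_3 = 1/2$, and with a sinusoidal disturbance of amplitude 0.8. The plant, the disturbance, the step size, and all names are assumptions chosen for this illustration only.

```python
import math


def sign(v):
    """Scalar sign function as in (62): returns 1, 0 or -1."""
    return (v > 0) - (v < 0)


def simulate(k, a=-1.0, d_amp=0.8, delta0=2.0, dt=1e-3, t_end=10.0):
    """Euler simulation of d(Delta)/dt = a*Delta + U + d(t), cf. (41).

    U = -k * sign(Delta) / p3 is the scalar form of (61);
    k = 0 switches the compensator off.
    """
    p3 = -1.0 / (2.0 * a)  # scalar Lyapunov equation (59): 2*a*p3 = -1
    delta = delta0
    for i in range(int(round(t_end / dt))):
        d = d_amp * math.sin(i * dt)  # bounded disturbance, |d| <= d_amp
        u = -k * sign(delta) / p3
        delta += dt * (a * delta + u + d)
    return delta


if __name__ == "__main__":
    # Here lambda_max(P3) * d_bar = 0.5 * 0.8 = 0.4, so k = 0.5 satisfies (64).
    print("with compensation:   ", simulate(0.5))
    print("without compensation:", simulate(0.0))
```

In the discrete-time simulation the error does not settle exactly at zero but chatters in a band whose width is proportional to the step size, which is the numerical counterpart of the chattering discussed in Remark 7.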

(D) Local Optimal Control
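If $\dot{x}_t$ is not available and is not approximated as in (B), a locally optimal compensator may be used. In order to analyze the tracking error stability, we introduce the following Lyapunov function:
$$V_t(\Delta_t^*) = \Delta_t^{*T} P_4 \Delta_t^*, \qquad P_4 = P_4^T > 0. \tag{66}$$
Using (41), its time derivative is
$$\dot{V}_t = \Delta_t^{*T}\big(A^T P_4 + P_4 A\big)\Delta_t^* + 2\Delta_t^{*T} P_4 U_{2,t} + 2\Delta_t^{*T} P_4 d_t, \tag{67}$$
where $2\Delta_t^{*T} P_4 d_t$ can be estimated as
$$2\Delta_t^{*T} P_4 d_t \le \Delta_t^{*T} P_4 \Lambda_4 P_4 \Delta_t^* + d_t^T \Lambda_4^{-1} d_t. \tag{68}$$
Substituting (68) into (67), and adding and subtracting the terms $\Delta_t^{*T} Q_4 \Delta_t^*$ and $U^{dT}_{2,t} R_4 U^d_{2,t}$ with $Q_4 = Q_4^T > 0$ and $R_4 = R_4^T > 0$, we obtain
$$\dot{V}_t \le \Delta_t^{*T}\big(A^T P_4 + P_4 A + P_4 \Lambda_4 P_4 + Q_4\big)\Delta_t^* + 2\Delta_t^{*T} P_4 U^d_{2,t} + U^{dT}_{2,t} R_4 U^d_{2,t} + d_t^T \Lambda_4^{-1} d_t - \Delta_t^{*T} Q_4 \Delta_t^* - U^{dT}_{2,t} R_4 U^d_{2,t}. \tag{69}$$
Because $A$ is stable, there exist $\Lambda_4$ and $Q_4$ such that the matrix Riccati equation
$$A^T P_4 + P_4 A + P_4 \Lambda_4 P_4 + Q_4 = 0 \tag{70}$$
has a positive solution. So (69) becomes
$$\dot{V}_t \le -\big(\Delta_t^{*T} Q_4 \Delta_t^* + U^{dT}_{2,t} R_4 U^d_{2,t}\big) + \Psi\big(U^d_{2,t}\big) + d_t^T \Lambda_4^{-1} d_t, \tag{71}$$
where
$$\Psi\big(U^d_{2,t}\big) = 2\Delta_t^{*T} P_4 U^d_{2,t} + U^{dT}_{2,t} R_4 U^d_{2,t}. \tag{72}$$
We reformulate (71) as
$$\Delta_t^{*T} Q_4 \Delta_t^* + U^{dT}_{2,t} R_4 U^d_{2,t} \le \Psi\big(U^d_{2,t}\big) + d_t^T \Lambda_4^{-1} d_t - \dot{V}_t. \tag{73}$$
Then, integrating each term from $0$ to $T$, dividing each term by $T$, and taking the upper limit as $T \to \infty$, we obtain
$$\limsup_{T\to\infty}\frac{1}{T}\int_0^T \Delta_t^{*T} Q_4 \Delta_t^*\, dt + \limsup_{T\to\infty}\frac{1}{T}\int_0^T U^{dT}_{2,t} R_4 U^d_{2,t}\, dt \le \limsup_{T\to\infty}\frac{1}{T}\int_0^T d_t^T \Lambda_4^{-1} d_t\, dt + \limsup_{T\to\infty}\frac{1}{T}\int_0^T \Psi\big(U^d_{2,t}\big)\, dt + \limsup_{T\to\infty}\frac{1}{T}\int_0^T \big(-\dot{V}_t\big)\, dt. \tag{74}$$
Since $V_T \ge 0$, the last term is not positive. In view of the definition of the seminorm (54), we have
$$\|\Delta_t^*\|^2_{Q_4} + \|U^d_{2,t}\|^2_{R_4} \le \|d_t\|^2_{\Lambda_4^{-1}} + \limsup_{T\to\infty}\frac{1}{T}\int_0^T \Psi\big(U^d_{2,t}\big)\, dt. \tag{75}$$
This fixes a tolerance level for the trajectory tracking error. So the control goal now is to minimize $\Psi(U^d_{2,t})$ and $\|d_t\|^2_{\Lambda_4^{-1}}$. To minimize $\|d_t\|^2_{\Lambda_4^{-1}}$, we should minimize $\Lambda_4^{-1}$. From (13), if $Q_4$ is selected such that (70) has a solution, we can choose the minimal $\Lambda_4^{-1}$ as
$$\Lambda_4^{-1} = A^{-T} Q_4 A^{-1}. \tag{76}$$
To minimize $\Psi(U^d_{2,t})$, we assume that, at the given positive $t$, $x(t)$ and $\hat{x}(t)$ are already realized and do not depend on $U^d_{2,t}$. We call $U^d_{2,t}$ the locally optimal control, because it is calculated based only on “local” information. The optimization problem is
$$\min_{U^d_{2,t}} \Psi\big(U^d_{2,t}\big) = 2\Delta_t^{*T} P_4 U^d_{2,t} + U^{dT}_{2,t} R_4 U^d_{2,t}, \qquad \text{subject to } A_0\big(U_{1,t} + U^d_{2,t}\big) \le B_0. \tag{77}$$
This is a typical quadratic programming problem. Without the restriction, $U^d_{2,t}$ is selected according to the linear squares optimal control law:
$$U^d_{2,t} = -2 R_4^{-1} P_4 \Delta_t^*. \tag{78}$$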

Remark 7. Approaches (A) and (C) are exact compensations of $d_t$; Approach (A) needs the information of $\dot{x}_t$. Because Approach (C) inserts the sliding mode control $U^c_{2,t}$ into the closed-loop system, chattering occurs in the control input, which may excite unmodeled high-frequency dynamics. To eliminate chattering, a boundary layer compensator can be used; it offers a continuous approximation to the discontinuous sliding mode control law inside the boundary layer and guarantees that the output tracking error stays within any neighborhood of the origin [13].
Finally, we give the following design steps for the robust neurocontrollers proposed in this paper.
(1) According to the dimension of the plant (1), design a neural network identifier (2) which has the same dimension as the plant. In (2), $A$ can be selected as a stable matrix. $A$ influences the dynamic response of the neural network: larger eigenvalues of $A$ make the neural network slower. The initial conditions for $W_{1,t}$ and $W_{2,t}$ are obtained as in Remark 4.
(2) Perform online identification. The learning algorithm is (15) with the dead-zone in Theorem 2. Assuming that the upper bound of the modeling error is known, we can give a value for $\bar{\eta}$. $Q_0$ is chosen such that the Riccati equation (14) has a positive definite solution; $R$ can be selected as any positive definite matrix because $\Lambda_1^{-1}$ is an arbitrary positive definite matrix. The updating rate in the learning algorithm (15) is $K_1 P$, and $K_1$ can be selected as any positive definite matrix, so the learning process does not depend on the solution $P$ of the Riccati equation (14). The larger $K_1 P$ is selected, the faster the neuroidentifier converges.
(3) Use the robust control (39) and one of the compensators (43), (47), (61), and (78).

4. Simulation

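In this section, a two-link robot manipulator is used to illustrate the proposed approach. Its dynamics can be expressed as follows [14]:
$$M(\theta)\ddot{\theta} + V(\theta, \dot{\theta})\dot{\theta} + G(\theta) + F_d(\dot{\theta}) = \tau, \tag{79}$$
where $\theta \in \Re^2$ consists of the joint variables, $\dot{\theta} \in \Re^2$ denotes the link velocities, $\tau$ is the vector of generalized forces, $M(\theta)$ is the positive definite inertia matrix, $V(\theta, \dot{\theta})$ is the centripetal-Coriolis matrix, $G(\theta)$ is the gravity vector, and $F_d(\dot{\theta})$ is the friction vector. If we define $x_1 = \theta = [\theta_1, \theta_2]$ as the joint position, $x_2 = \dot{\theta}$ as the joint velocity, and $x_t = [x_1, x_2]^T$, (79) can be rewritten in state space form [15]:
$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = H(x_t, u_t), \tag{80}$$
where $u_t = \tau$ is the control input and
$$H(x_t, u_t) = -M(x_1)^{-1}\big[C(x_1, x_2)\dot{x}_1 + G(x_1) + F\dot{x}_1 - u_t\big]. \tag{81}$$
Equation (80) can also be rewritten as
$$\dot{x}_1 = \int_0^t H(x_\tau, u_\tau)\, d\tau + H(x_0, u_0). \tag{82}$$
So the dynamics of the two-link robot (79) are in the form of (1) with
$$f(x_t, u_t, t) = \int_0^t H(x_\tau, u_\tau)\, d\tau + H(x_0, u_0). \tag{83}$$
The values of the parameters are as follows: $m_1 = m_2 = 1.53$ kg, $l_1 = l_2 = 0.365$ m, $r_1 = r_2 = 0.1$, $v_1 = v_2 = 0.4$, $k_1 = k_2 = 0.8$. Let us define $\hat{x} = [\hat{\theta}_1, \hat{\theta}_2]^T$ and $u = [\tau_1, \tau_2]^T$; the neural network for control is represented as
$$\dot{\hat{x}} = A\hat{x} + W_{1,t}\,\sigma(\hat{x}_t) + W_{2,t}\,\phi(\hat{x})\,u. \tag{84}$$
We select $A = \begin{bmatrix} -1.5 & 0\\ 0 & -1 \end{bmatrix}$, $\phi(\hat{x}_t) = \operatorname{diag}\big(\phi_1(\hat{x}_1), \phi_2(\hat{x}_2)\big)$, $\sigma(\hat{x}_t) = [\sigma_1(\hat{x}_1), \sigma_2(\hat{x}_2)]^T$, and
$$\sigma_i(\hat{x}_i) = \frac{2}{1 + e^{-2\hat{x}_i}} - \frac{1}{2}, \qquad \phi_i(\hat{x}_i) = \frac{2}{1 + e^{-2\hat{x}_i}} + \frac{1}{2}, \tag{85}$$
where $i = 1, 2$. We used Remark 4 to obtain suitable $W_1^0$ and $W_2^0$, starting from random values with $T_0 = 100$. After 2 loops, $\|\Delta(T_0)\|$ does not decrease, so we take $W_{1,300}$ and $W_{2,300}$ as the new $W_1^0$ = 0.513.82.31.51 and $W_2^0$ = 3.122.785.524.021. For the update laws (15), we select $\bar{\eta} = 0.1$, $r = 5$, and $K_1 P = K_2 P = \begin{bmatrix} 5 & 0\\ 0 & 2 \end{bmatrix}$. For identification, we select the generalized forces as
$$\tau_1 = 7\sin t, \qquad \tau_2 = 0. \tag{86}$$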

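Now we check the neurocontrol. We assume that the robot is changed at $t = 480$; after that, $m_1 = m_2 = 3.5$ kg, $l_1 = l_2 = 0.5$ m, and the friction becomes the disturbance $D\sin((\pi/3)t)$, where $D$ is a positive constant. We compare the neurocontrol with the PD control
$$\tau_{\mathrm{PD}} = -10\big(\theta - \theta^*\big) - 5\big(\dot{\theta} - \dot{\theta}^*\big), \tag{87}$$
where $\theta_1^* = 3$ and $\theta_2^*$ is a square wave. So $\varphi(\theta^*) = \dot{\theta}^* = 0$.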

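The neurocontrol is (39):
$$\tau_{\mathrm{neuro}} = \big[W_{2,t}\,\phi(\hat{x})\big]^{+}\big[\varphi(x_t^*, t) - A x_t^* - W_{1,t}\,\sigma(\hat{x})\big] + \big[W_{2,t}\,\phi(\hat{x})\big]^{+} U_{2,t}. \tag{88}$$
$U_{2,t}$ is selected to compensate the unmodeled dynamics. Since $f$ is unknown, method (A), exact compensation, cannot be used.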

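(B) $D = 1$. The link velocity $\dot{\theta}$ is measurable, so, as in (43),
$$U_{2,t} = A\big(\theta - \hat{\theta}\big) - \big(\dot{\theta} - \dot{\hat{\theta}}\big). \tag{89}$$
The results are shown in Figures 2 and 3.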

(C) $D = 0.3$. $\dot{\theta}$ is not available, so the sliding mode technique may be applied. We select $U_{2,t}$ as in (61):
$$U_{2,t} = -10 \times \operatorname{sgn}\big(\theta - \theta^*\big). \tag{90}$$
The results are shown in Figures 4 and 5.

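(D) $D = 3$. We select $Q = 1/2$, $R = 1/20$, and $\Lambda = 4.5$; the solution of the Riccati equation
$$A^T P_t + P_t A + P_t \Lambda P_t + Q = -\dot{P}_t \tag{91}$$
is $P = \begin{bmatrix} 0.33 & 0\\ 0 & 0.33 \end{bmatrix}$. Without restriction on $\tau$, the linear squares optimal control law is
$$U_{2,t} = -2R^{-1} P\big(\theta - \theta^*\big) = -\begin{bmatrix} 20 & 0\\ 0 & 20 \end{bmatrix}\big(\theta - \theta^*\big). \tag{92}$$
The results of the local optimal compensation are shown in Figures 6 and 7.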

These results show that the neurocontrol is robust and effective when the robot is changed.

5. Conclusion

By means of Lyapunov analysis, we establish bounds for both the identifier and the adaptive controller. The main contribution of this paper is that we give four different compensation methods and prove the stability of the resulting neural controllers.