Abstract

Adaptive dynamic programming (ADP) has proven to be an effective method for the optimal control of nonlinear systems. However, as the structure of ADP requires the control input to satisfy an initial admissible control condition, the control performance may deteriorate under abrupt parameter changes or system failures. In this paper, we introduce the multiple models idea into ADP: multiple subcontrollers run in parallel to supply initial conditions for different environments, and a switching index is set up to decide the appropriate initial conditions for the current system. With this strategy, the proposed multiple model ADP achieves optimal control for systems with jumping parameters. The convergence of multiple model adaptive control based on ADP is proved, and simulation shows that the proposed method can improve the transient response of the system effectively.

1. Introduction

In recent years, multiple model adaptive control (MMAC) has been a research focus for improving the transient response of nonlinear systems. In practical control processes, the system dynamics may change abruptly due to system failure or parameter change. Traditional adaptive control methods cannot deal with this kind of change, resulting in poor transient response or even system instability. According to multiple model adaptive control theory, multiple models are established to cover the system uncertainty, and corresponding multiple controllers are constructed [1]. Based on the switching mechanism, at every moment the controller corresponding to the model closest to the current system is selected as the current controller. Thus, the transient response and the control performance are greatly improved.

Since the 1990s, multiple model adaptive control based on index switching functions has obtained satisfactory results for linear systems, linear time-varying systems with jumping parameters, and stochastic systems with stochastic disturbances. However, for nonlinear systems, there is still no unified research method or satisfactory result. Among the main MMAC research lines for nonlinear systems, multiple model adaptive control based on neural networks has attracted more and more attention [2–4]. Because neural networks show outstanding performance in approximating nonlinear systems, the system uncertainty can be turned into uncertainty in the weights and structure of the neural networks. Thus, multiple model adaptive control for nonlinear systems can be designed based on the change of the weights and structure of the neural networks.

In recent years, neural networks (NNs) and fuzzy logic have been widely used to handle the control problem of nonlinear systems owing to their fast adaptability and excellent approximation ability. For systems without complete model information, or systems regarded as "black boxes," neural networks show great advantages. For uncertain nonlinear discrete-time systems with dead-zone input, [5] introduces NNs to approximate the unknown functions in the transformed systems, so that the tracking error converges while the dead zone is handled by an adaptive compensation term. Fuzzy logic systems are used to approximate the unknown functions to achieve control of discrete-time systems with backlash [6, 7] or input constraints [8].

Combining dynamic programming, neural networks, and reinforcement learning [9], adaptive dynamic programming (ADP) alleviates the "curse of dimensionality" of traditional dynamic programming and provides a practical scheme for the optimal control of nonlinear systems. ADP adopts two neural networks: a critic neural network to approximate the cost function and an actor neural network to approximate the control strategy, so that the optimality principle can be satisfied [10, 11]. In 2002, Murray et al. first proposed the iterative ADP algorithm for continuous-time systems. Iterative ADP updates the policy and the value function by policy and value iteration [12, 13]. However, iterative ADP can only be calculated offline because of the long computation time caused by the uncertain number of iterations. In recent years, online ADP strategies have been proposed widely [14–17]. They can obtain the optimal solution in an adaptive manner rather than by offline calculation.

Paper [18] proposed an ADP tracking strategy which does not require any knowledge of the drift dynamics of the system, which means it has the adaptivity to deal with model uncertainty. However, as in most existing online ADP methods, the controller needs the initial control to satisfy the admissible condition for the corresponding system [15, 19, 20]. Thus, once the system endures an abrupt parameter change and the control signal at the change moment does not satisfy the initial admissible condition after the change, the ADP controller can no longer make the state track the desired trajectory. In this paper, we introduce MMAC into ADP: multiple models are established to cover the uncertainty of the system, and, correspondingly, multiple subcontrollers are constructed and run in parallel. A switching index function is introduced to decide the most accurate model describing the current system. Once there is a model switch, the corresponding controller is selected to provide its current state and control signal as the initial condition of the system. Based on this idea, we design multiple fixed models when the submodels are precisely known, and, for imprecisely estimated models, multiple fixed models and one adaptive model are combined to obtain an improved transient response.

This paper is organized as follows. The system with jumping parameters is described in Section 2. Then, a transformed ADP tracking control scheme is introduced and proved convergent in Section 3. In Section 4 the main structure of MMAC based on ADP is described, and two kinds of MMAC strategies are introduced for precisely known submodels and imprecisely estimated models. Simulation experiments are shown in Section 5, and Section 6 concludes this paper.

2. Problem Description

Consider the following nonlinear discrete-time system with jumping parameters: where represents the system state and the constrained control input is denoted by , where is the constraint bound of the th actuator. is a time-varying parameter satisfying the following assumption.

Assumption 1. is a piecewise constant function with respect to , , where is a finite integer. It does not change frequently; that is, the time between two different constant values is long enough, and it will finally settle at one constant.

The objective of the tracking problem is to design an optimal controller with a constrained control signal so that the output state tracks the following desired trajectory in an optimal manner:

As shown in [21], (2) can generate a large class of trajectories satisfying the requirements of most applications, including unit steps, sinusoidal waveforms, and damped sinusoids.

3. Trajectory Tracking Based on ADP

For the following nonlinear discrete-time system without jumping parameters, define the following tracking error: We have . Combining (2), (3), and (4), the following dynamic equation with respect to , , and is obtained:

Rewrite (2) and (5) in the following matrix form [21]: Further, the system can be rewritten as the following transformed dynamics in terms of the control input : where and satisfies .
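Since the display equations here did not survive extraction, the following is only a sketch of the standard tracking-error transformation commonly used in this type of ADP tracking design, written with assumed symbols ($x_k$ state, $r_k$ reference, $e_k$ tracking error, $\psi$ the reference dynamics); the paper's exact notation may differ:
\[
e_k = x_k - r_k, \qquad r_{k+1} = \psi(r_k),
\]
\[
e_{k+1} = f(e_k + r_k) + g(e_k + r_k)\,u_k - \psi(r_k),
\]
\[
X_k = \begin{bmatrix} e_k \\ r_k \end{bmatrix}, \qquad X_{k+1} = F(X_k) + G(X_k)\,u_k .
\]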

The infinite-horizon scalar cost function can be defined as where ; is defined as and is positive definite. To deal with the constrained control input, we employ the following function [22]: where is a diagonal matrix defined as , , is a positive definite diagonal matrix, and is a one-to-one function satisfying whose first derivative is bounded by a constant; at the same time, it should be a monotonically increasing odd function. Consider
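As a hedged illustration of the constrained-input cost described above (the exact expressions were lost), a commonly used non-quadratic form built from a saturating one-to-one function $\varphi$ (typically $\tanh$) and the actuator bound $\bar\lambda$ is:
\[
J(X_k) = \sum_{i=k}^{\infty} \Big[ X_i^{\mathsf T} Q X_i + W(u_i) \Big], \qquad
W(u) = 2 \int_{0}^{u} \big(\bar\lambda\,\varphi^{-1}(v/\bar\lambda)\big)^{\mathsf T} R \,\mathrm{d}v ,
\]
with $Q$ positive definite and $R$ a positive definite diagonal matrix; this $\varphi$ satisfies exactly the monotonicity, oddness, and bounded-derivative conditions listed above.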

Definition 2 (see [23]). A control policy is said to be admissible if is continuous, , stabilizes (3), and, for every initial state , is finite.

According to the Bellman optimality principle and the first-order necessary condition, the theoretical optimal control law can be calculated as where and the theoretical HJB equation is derived as where
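For concreteness, a sketch of the resulting Bellman equation and constrained optimal control law under the same assumed notation (not a reconstruction of the paper's equations) is:
\[
J^*(X_k) = \min_{u_k}\Big\{ X_k^{\mathsf T} Q X_k + W(u_k) + J^*(X_{k+1}) \Big\},
\]
\[
u_k^* = \bar\lambda\,\varphi\!\left( -\tfrac{1}{2}\,(R\bar\lambda)^{-1} G(X_k)^{\mathsf T}\, \frac{\partial J^*(X_{k+1})}{\partial X_{k+1}} \right),
\]
where the second line follows from the first-order necessary condition $\partial\big[W(u_k) + J^*(X_{k+1})\big]/\partial u_k = 0$.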

In the following part of this section, an online actor-critic structure is introduced to solve the optimal tracking problem: the critic neural network (NN) is designed to approximate the value function, and the actor NN is designed to approximate the optimal control signal.

(1) Critic NN. A two-layer NN is utilized as the critic NN to approximate the value function where and are constant target weights of the hidden layer and output layer, respectively, is the activation function, is the bounded approximation error, and is the number of neurons in the hidden layer. , , and the gradient of are assumed to be bounded as , , and , respectively.

The actual output of the critic NN is given as where and are the estimates of and .

Then, the approximate HJB function error can be derived as follows:

The goal of the critic NN is to minimize the following function:

Using the gradient-descent method, the update law of the critic NN is given as

In this paper, we select the activation function of the critic NN as , so we have where and . Then
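As a rough illustration of this critic structure (not the paper's exact update law, whose equations were lost), the following sketch implements a two-layer critic with tanh hidden units and a plain gradient-descent step on the squared Bellman residual, with the hidden-layer weights V_c held fixed as assumed in the Appendix; all names and the residual form are illustrative assumptions:

    import numpy as np

    def critic_value(X, V_c, W_c):
        """J_hat(X) = W_c^T * tanh(V_c^T X): two-layer critic with fixed hidden weights V_c."""
        return float(W_c @ np.tanh(V_c.T @ X))

    def critic_update(W_c, V_c, X_k, X_next, utility_k, lr_c):
        """One gradient-descent step on E_c = 0.5 * e_c^2, where the Bellman residual is
        e_c = J_hat(X_k) - (utility_k + J_hat(X_{k+1})) and utility_k is the one-step cost."""
        phi_k = np.tanh(V_c.T @ X_k)
        e_c = critic_value(X_k, V_c, W_c) - (utility_k + critic_value(X_next, V_c, W_c))
        return W_c - lr_c * e_c * phi_k        # dE_c/dW_c = e_c * phi_k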

(2) Actor NN. To obtain the optimal control input, a two-layer NN is utilized as the actor NN to approximate : where and are constant target weights of the hidden layer and output layer, respectively, is the corresponding activation function, is the bounded approximation error, and is the number of neurons in the hidden layer. and are assumed to be bounded as and .

The actual output of the actor NN is given as where and are the estimates of and , respectively. Using (17), the actual approximation target is

The goal of the actor NN is to minimize the following function: where the actor NN approximation error is defined as Using the gradient-descent method, the update law of the actor NN is given as

In this paper, the activation function of the actor NN is selected as . Define ; we have

Finally, the optimal control signal is obtained as follows:
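Similarly, a hedged sketch of the actor side: the actor output is passed through the bound-scaled tanh so that the control stays inside the actuator limits regardless of the current weights (cf. Remark 3 below); the target the actor is pulled toward would come from the critic via the actual approximation target above, which we simply take as an argument here. Names and shapes are assumptions:

    import numpy as np

    def actor_control(X, V_a, W_a, lam_bar):
        """u = lam_bar * tanh(W_a^T tanh(V_a^T X)); every component of u is confined
        to [-lam_bar_i, lam_bar_i] by construction."""
        return lam_bar * np.tanh(W_a.T @ np.tanh(V_a.T @ X))

    def actor_update(W_a, V_a, X_k, target, lr_a):
        """One gradient-descent step on E_a = 0.5 * ||e_a||^2, where e_a is the error
        between the actor's pre-saturation output and the critic-derived target."""
        phi_a = np.tanh(V_a.T @ X_k)
        e_a = W_a.T @ phi_a - target
        return W_a - lr_a * np.outer(phi_a, e_a)    # dE_a/dW_a = phi_a * e_a^T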

Remark 3. To obtain the optimal control policy, the actor NN is designed to approximate so that the control signal can be strictly restricted within the given constraints by using the function as in (30). In some cases, the actor NN approximates directly, resulting in control signals outside the constraints due to unsuitable weights in the initial period.

Theorem 4. For the nonlinear discrete-time system given by (7), let the weight tuning laws of the critic NN and actor NN be given by (20) and (28), respectively, and let the initial weights of the actor NN reflect the initial admissible control of system (7). Then there exist positive constants and such that the system state and the estimation errors of the two networks are all uniformly ultimately bounded (UUB).

The proof of Theorem 4 is given in the Appendix.

In contrast with traditional ADP tracking strategies, the above method does not require knowledge of the system drift dynamics. In this way, it provides some adaptability: for systems with different drift dynamics, this method can still make the system state track the desired trajectory. However, for different systems, different initial admissible control conditions are required.

4. Multiple Model Control Scheme Based on ADP

In this section, we first propose the multiple model ADP for systems with accurately known submodels. Then, an adaptive ADP main controller is introduced so that the new multiple model ADP can deal with estimated submodels.

4.1. Multiple Model ADP with Accurately Known Submodels

In this section, we consider the case where known submodels reflect the system dynamics at every working point precisely, as follows: where

According to the idea of multiple model adaptive control, it is natural to design independent multiple subcontrollers to track the target trajectory in parallel and to use a switching index function to decide the best controller for the current system. The main structure of the multiple model ADP controller for accurately known submodels is shown in Figure 1.

For every submodel , according to Theorem 4, if initial weights of the actor NN reflecting the initial admissible control are given and the weights of the two NNs are tuned online according to (20) and (28), respectively, with appropriate learning rates, then the output states can track the desired trajectory in an optimal manner. Thus, multiple subcontrollers can be constructed as follows: where and are the learning rates of the two NNs for model .
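As an illustration of how one such subcontroller might be organized, the following sketch bundles a single submodel with its own critic/actor weights, learning rates, and actuator bound; the class, its attribute names, and the model.predict interface are assumptions introduced only for illustration:

    import numpy as np

    class ADPSubcontroller:
        """Illustrative container for the ADP tracker of one submodel.

        Each subcontroller keeps its own critic and actor weights, its own
        learning rates, and initial actor weights that encode an admissible
        control for its submodel; it tracks the reference independently.
        """
        def __init__(self, model, W_c0, V_c, W_a0, V_a, lr_c, lr_a, x0, lam_bar):
            self.model = model                    # estimated submodel (assumed .predict(x, u))
            self.W_c, self.V_c = W_c0, V_c        # critic output/hidden weights
            self.W_a, self.V_a = W_a0, V_a        # actor weights (initially admissible)
            self.lr_c, self.lr_a = lr_c, lr_a     # learning rates for this submodel
            self.state = x0
            self.lam_bar = lam_bar                # actuator bound

        @property
        def actor_weights(self):
            return self.W_a

        def step(self, r_k):
            """Apply the current actor policy to the submodel and advance one step."""
            e_k = self.state - r_k                          # tracking error
            X_k = np.concatenate([e_k, r_k])                # augmented state fed to the NNs
            u = self.lam_bar * np.tanh(self.W_a.T @ np.tanh(self.V_a.T @ X_k))
            self.state = self.model.predict(self.state, u)  # advance the submodel
            # the critic/actor weight updates of (20) and (28) would be applied here; omitted
            return u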

At every moment, the following index function is calculated to show the matching degree between the current system and every model: where is the forgetting factor and the model error is given as where

At every moment, the most accurate model describing the current system is selected, and, at the switching point, the corresponding controller is selected to control the system.
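A minimal sketch of such a switching index, written as an exponentially forgotten accumulation of squared model errors with forgetting factor alpha (the exact definition of the index did not survive extraction, so this is an assumed but standard form):

    import numpy as np

    def update_index(J_prev, model_error, alpha):
        """Recursive form of the accumulated index: J_i(k) = alpha * J_i(k-1) + ||e_i(k)||^2,
        with 0 < alpha < 1 the forgetting factor."""
        e = np.asarray(model_error)
        return alpha * J_prev + float(e @ e)

    def select_best_model(index_values):
        """The submodel with the smallest accumulated error index is taken as the one
        that best describes the current system; its controller is switched in."""
        return int(np.argmin(index_values))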

4.2. Multiple Model ADP with Estimated Submodels

In Section 4.1, we discussed multiple model ADP for accurately known submodels. Because the submodels are precisely known, a subcontroller can control the system directly once its submodel matches the current system. In practice, however, submodels are estimated and somewhat imprecise. In this case, the control scheme of Section 4.1 cannot obtain a satisfactory control result, as it lacks adaptivity.

For a system with jumping parameters as in (1), the system can be viewed as a set of different dynamic characteristics described by different fixed parameters. The ADP tracking controller discussed in Section 3 does not require any knowledge of the drift dynamics of the system, which means it has the adaptivity to deal with model uncertainty. However, the controller needs the initial control to satisfy the admissible condition for the corresponding system. Thus, once the system endures an abrupt parameter change and the control signal at the change moment does not satisfy the initial admissible condition after the change, the ADP controller cannot make the state track the desired trajectory accurately.

The main idea of multiple model adaptive ADP for estimated submodels is to design a main controller that provides adaptivity and to use multiple submodels and subcontrollers to guarantee the initial admissible control condition after the system changes. The structure of the multiple model ADP is shown in Figure 3.

The main procedure of multiple model adaptive ADP tracking control is as follows (a schematic sketch is given after the list):

(1) Multiple models are established to cover the uncertainty of the system.
(2) Multiple ADP subcontrollers are set up according to the multiple models.
(3) The multiple independent subcontrollers run in parallel to track the same reference trajectory.
(4) At every moment, a switching index function is calculated to decide the model closest to the current system.
(5) Once a model switch is indicated by the switching index function, the state and control of the corresponding subcontroller are selected as the initial condition of the main ADP controller for the new stage.
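Putting the five steps together, the following is a rough sketch of one time step of the scheme; all interfaces (model.predict, the subcontroller attributes, and the main controller's reinitialize and control methods) are hypothetical names used only to make the procedure concrete, not the paper's implementation:

    import numpy as np

    def mm_adp_step(x_k, r_k, submodels, subcontrollers, main_ctrl, indices, alpha, best_prev):
        """One time step of multiple model adaptive ADP (steps (1)-(5) above)."""
        # (3) all independent subcontrollers keep tracking the reference in parallel
        for ctrl in subcontrollers:
            ctrl.step(r_k)

        # (4) update the switching index of every submodel from its one-step prediction error
        for i, model in enumerate(submodels):
            e_i = x_k - model.predict(main_ctrl.prev_state, main_ctrl.prev_control)
            indices[i] = alpha * indices[i] + float(e_i @ e_i)
        best = int(np.argmin(indices))

        # (5) on a model switch, hand the matching subcontroller's state and actor
        #     weights to the main ADP controller as its new initial condition
        if best != best_prev:
            main_ctrl.reinitialize(state=subcontrollers[best].state,
                                   actor_weights=subcontrollers[best].actor_weights)

        # the main controller alone drives the plant
        u_k = main_ctrl.control(x_k, r_k)
        return u_k, best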

4.2.1. Design of Main Controller

For the system with jumping parameters as in (1), we adopt the ADP controller of Section 3 as the main controller. However, different initial parameters, including the initial admissible control and the initial state, should be given according to the control and state of the most accurate model, as shown in Figure 3.

4.2.2. Establishment of Multiple Models and Subcontrollers

According to the different working conditions, multiple estimation models are constructed to cover the system uncertainty. It has to be ensured that, for every working condition, there is at least one model close enough to the corresponding plant.

Multiple models are set up as where .

Theoretically, for a given system with jumping parameters, the control performance improves with more submodels. However, too many submodels also increase the computational burden and may cause frequent model switching. A suitable number of submodels is chosen based on experience and simulation.

According to Theorem 4, for model , we can find corresponding initial weights of the actor NN reflecting the initial admissible control , and appropriate learning rates and , so that the output state of model can track the desired trajectory if the weights of the two NNs are tuned according to (20) and (28), respectively.

Thus, multiple ADP subcontrollers corresponding to the multiple models can be designed as follows: where is the initial admissible control, is the initial state, and are the selected learning rates of the critic and actor NNs, respectively, for the th model. Figure 2 shows the structure of the subcontrollers.

The role of the ADP subcontrollers is to supply initial parameters , including the initial state and the initial weights of the actor NN and , which reflect the initial control , for the main ADP controller: As parameter changes and model switches can happen at any moment, the multiple independent subcontrollers run in parallel at all times.

4.2.3. Choice of Switching Mechanism

At every moment, the switching function determines which model is closest to the system and which group of initial parameters should be given to the main controller. To avoid an incorrect switch caused by the performance at a single point, we employ the following accumulation of the model error as the index function: where is the forgetting factor, the model estimation error of the th model is defined as , and the model state used for comparison is defined as follows:

At every moment, the best model describing the current system is selected, and the state and actor NN weights of the corresponding subcontroller are taken as the initial parameters of the main controller for the new stage of the system.

Remark 5. Theoretically, the multiple model ADP with estimated submodels in Section 4.2 remains effective if the estimated submodels describe the system precisely. In fact, the multiple model ADP with accurate submodels in Section 4.1 can be viewed as a special case of the multiple model ADP with estimated submodels.

Remark 6. If the submodels are precisely known, the model error becomes zero when a submodel matches the current system, and the model indexed by the switching performance function (33) is normally the most accurate. However, in a few cases, such as when there is disturbance, (33) may index the wrong model as the best one. Moreover, in most cases, precise jumping parameters are hard to obtain. Therefore, we prefer the latter multiple model scheme of Section 4.2, with a main ADP controller, as the final optimal control strategy.

Remark 7. For the multiple model adaptive ADP described in Section 4.2, convergence can hardly be proved if there are infinitely many model switches. In order to facilitate convergence and stability, the following assumption is made. Assume that a model switch starts at time step ; we set up a period and a tracking error limit to improve the transient response. If the period has elapsed or the tracking error falls within the limit, the switching between the submodels is stopped and the main ADP controller keeps working. Thus, the optimal tracking control problem can be viewed the same as in Section 3, and convergence is guaranteed by Theorem 4.
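Remark 7 can be implemented with a simple guard; here T_max and eps stand in for the period and the tracking-error limit mentioned above (the names are ours, not the paper's):

    def switching_enabled(k, k_switch, tracking_error, T_max, eps):
        """Keep comparing submodels only while the time since the last switch is within
        T_max and the tracking error is still above eps; afterwards only the main ADP
        controller remains active, so the setting of Theorem 4 applies."""
        return (k - k_switch) <= T_max and tracking_error > eps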

5. Simulation

In this section, an experiment is constructed to show the effectiveness of the proposed method.

Consider the following nonlinear system: where is the state vector with initial state and the control input is bounded by . Compared with system (1), the drift dynamics and the input dynamics can be denoted as where the jumping parameters satisfy

The control objective is to force the system state to track the following target trajectory in an optimal manner: where .

In this example, the cost function (9) is designed with , and , where is the two-dimensional identity matrix. The two known estimated submodels consist of the same input dynamics as in (42) and the following drift dynamics: where and and .

The proposed ADP control algorithm is applied to construct the subcontrollers and the main controller. The two-layer critic and actor NNs are designed with five neurons in their hidden layers, and the activation functions of both adopt the function. For the subcontrollers, the initial weights of the critic NN are set as random values in , and the initial weights of the actor NN are selected to reflect the initial admissible control for the corresponding submodels.

(1) Single Model ADP. For the nonlinear system with jumping parameters as in (42), only the first submodel and subcontroller are adopted. From Figure 4, we can see that the system state tracks the desired trajectory perfectly in the first stage. However, after the parameter change at the 50th time step, the adopted submodel does not match the system, and the controller cannot track the desired trajectory any more. In Figure 4, the output state after step 55 is omitted as the state diverges far from the desired trajectory.

(2) Multiple Model ADP. The proposed multiple model ADP strategy is adopted to control the system with jumping parameters. The forgetting factor in the switching index function (39) is set to .

Figure 5 shows that the model switching process matches the system changes perfectly. For example, after the parameter changes from to at the 50th time step, the selected model is switched from the model with to the model with .

The control result obtained by the proposed multiple model ADP method is shown in Figures 6 and 7. The states deviate from the desired trajectory when the parameters change. However, as the most precise model is selected and the corresponding initial parameters are given to the main controller, the system states track the desired trajectory again after a short transient process. Figure 8 shows that the control input is bounded in .

6. Conclusion

This paper proposes an ADP-based multiple model adaptive control scheme for nonlinear systems with jumping parameters. The system uncertainty is covered by multiple submodels, and the corresponding multiple subcontrollers are constructed and run in parallel. A switching mechanism is introduced to decide the most precise model and the corresponding initial parameters, so that the initial admissible condition is satisfied at all times. The proposed method realizes optimal tracking control for nonlinear systems with jumping parameters and greatly improves the transient response and control quality. As a model switch occurs only after some period of model error accumulation, the transient response may still not be fully satisfactory. Future work will focus on designing a scheme to improve the control quality further by introducing an upper limit on the state or adopting multiple set-points.

Appendix

Proof of Theorem 4. To simplify the proof, we consider the case where the weights of the hidden layers of the two NNs remain fixed after some tuning time.
Define the weight estimation errors of the actor and critic networks as and , respectively. For simplification, denote and , where and are the fixed estimation weights after tuning. Denote and . Assume is bounded as .
Consider the following positive definite Lyapunov candidate: where , , , , and denotes the maximum singular value of . The first difference of is given as First, considering , substituting (7) and (13), and then applying the Cauchy-Schwarz inequality [24], we obtain The optimal closed-loop system is upper bounded as where is a positive constant. Define ; we get Next, consider . Substituting (15) and (16) reveals Further, combining (17) and (18), we have Thus, we obtain the dynamics with respect to the weight estimation error of the critic NN as Denote ; we have Third, consider . Considering (13) and substituting (16) and (23) reveal where .
According to (17) and (A.12), can be rewritten as Since , substituting (A.13), we get where
Then, substituting (A.5), (A.10), and (A.14) into (A.2) yields where Therefore, if any of the following inequalities holds: and the learning rates of the two networks are selected as and for nonlinear systems with optimal closed-loop bounds described as , then, according to the Lyapunov extensions [25], the system states and the weight estimation errors of the critic and actor NNs are UUB.
Finally, using (23) and (24), Thus, we have This completes the proof.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP) (Grant no. 20130006110008) and National Natural Science Foundation of China (Grant no. 61473034).