Discrete Dynamics in Nature and Society

Volume 2016 (2016), Article ID 6023892, 12 pages

http://dx.doi.org/10.1155/2016/6023892

## Multiple Model Adaptive Tracking Control Based on Adaptive Dynamic Programming

^{1}School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China

^{2}College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China

^{3}School of International Studies, Communication University of China (CUC), Beijing 100024, China

Received 25 December 2015; Accepted 17 February 2016

Academic Editor: Filippo Cacace

Copyright © 2016 Kang Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Adaptive dynamic programming (ADP) has been shown to be an effective method for the optimal control of nonlinear systems. However, because the structure of ADP requires the control input to satisfy the initial admissible control condition, control performance may deteriorate after an abrupt parameter change or a system failure. In this paper, we introduce the multiple model idea into ADP: multiple subcontrollers run in parallel to supply multiple initial conditions for different environments, and a switching index is set up to select the appropriate initial conditions for the current system. With this strategy, the proposed multiple model ADP achieves optimal control for systems with jumping parameters. The convergence of multiple model adaptive control based on ADP is proved, and the simulation shows that the proposed method can effectively improve the transient response of the system.

#### 1. Introduction

In recent years, multiple model adaptive control (MMAC) has been a research focus for improving the transient response of nonlinear systems. In practical control processes, system dynamics may change abruptly due to system failure or parameter change. Traditional adaptive control methods cannot deal with this kind of change, which results in poor transient response or even system instability. In multiple model adaptive control, multiple models are established to cover the system uncertainty, and corresponding multiple controllers are constructed [1]. Based on the switching mechanism, at every moment the controller corresponding to the model closest to the current system is selected as the current controller. Thus, the transient response and the control performance are greatly improved.

Since the 1990s, multiple model adaptive control based on an index switching function has obtained satisfactory results for linear systems, linear time-variant systems with jumping parameters, and stochastic systems with stochastic disturbances. However, for nonlinear systems, there is still no unified research method or satisfactory result. Among the main MMAC approaches for nonlinear systems, multiple model adaptive control based on neural networks has attracted more and more attention [2–4]. Because neural networks show outstanding performance in approximating nonlinear systems, they can turn the system uncertainty into uncertainty in the weights and structure of the networks. Thus, multiple model adaptive control for nonlinear systems can be designed based on the change of weights and structure of neural networks.

In recent years, neural networks (NNs) and fuzzy logic have been widely used to handle the control problem of nonlinear systems owing to their fast adaptability and excellent approximation ability. For systems without complete model information or systems regarded as a “black box,” neural networks show great advantages. For uncertain nonlinear discrete-time systems with dead-zone input, [5] introduces NNs to approximate the unknown functions in the transformed systems, so that the tracking error converges with the dead zone handled by an adaptive compensation term. Fuzzy logic systems are used to approximate the unknown functions to achieve control for discrete-time systems with backlash [6, 7] or input constraints [8].

By combining dynamic programming, neural networks, and reinforcement learning [9], adaptive dynamic programming (ADP) overcomes the “curse of dimensionality” of traditional dynamic programming and provides a practical scheme for the optimal control of nonlinear systems. ADP adopts two neural networks, a critic neural network to approximate the cost function and an actor neural network to approximate the control strategy, so that the optimality principle can be satisfied [10, 11]. In 2002, Murray first proposed the iterative ADP algorithm for continuous-time systems. Iterative ADP updates the policy and the value function by policy iteration and value iteration [12, 13]. However, iterative ADP can only be computed offline, because the number of iterations, and hence the computation time, is not known in advance. In recent years, online ADP strategies have been widely proposed [14–17]. They obtain the optimal solution adaptively rather than by offline calculation.

Paper [18] proposed an ADP tracking strategy that does not require any knowledge of the drift dynamics of the system, which gives it some ability to deal with model uncertainty. However, as in most existing online ADP methods, the controller needs the initial control to satisfy the admissible condition for the corresponding system [15, 19, 20]. Thus, once the system undergoes an abrupt parameter change and the control signal at the moment of change does not satisfy the initial admissible condition for the changed system, the ADP controller can no longer make the state track the desired trajectory. In this paper, we introduce MMAC into ADP: multiple models are established to cover the uncertainty of the system, and, correspondingly, multiple subcontrollers are constructed and run in parallel. A switching index function is introduced to decide which model describes the current system most accurately. Once a model switch occurs, the corresponding controller is selected to provide its current state and control signal as the initial condition of the system. Based on this idea, we design multiple fixed models for the case in which the submodels are precisely known. For imprecisely estimated models, multiple fixed models and one adaptive model are combined to obtain an improved transient response.

This paper is organized as follows. The system with jumping parameters is described in Section 2. A transformed ADP tracking control scheme is then introduced and proved convergent in Section 3. In Section 4, the main structure of MMAC based on ADP is described, and two MMAC strategies are introduced, one for precisely known submodels and one for imprecise models. Simulation experiments are shown in Section 5, and Section 6 concludes the paper.

#### 2. Problem Description

Consider the following nonlinear discrete-time system with jumping parameters:where represents system state and constrained control input is denoted by , where is the constraint bound of the th actuator. is a time-varying parameter satisfying the following assumption.

*Assumption 1. * is a piecewise constant function with respect to , , where is a finite integer. does not change frequently; that is, the time between two different constants is long enough. And will finally settle at one constant.

The objective of the tracking problem is to design an optimal controller with constrained control signal so that the output state can track the following desired trajectory in an optimal way:

As shown in [21], (2) can generate a large class of trajectories that satisfies the requirements of most applications, including unit steps, sinusoidal waveforms, and damped sinusoids.
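As a small illustration of how such a command generator produces a sinusoid, the sketch below assumes the generator has the form r(k+1) = ψ(r(k)) used in [21] and picks the linear special case ψ(r) = A r with A a planar rotation matrix. The function name `generate_reference`, the rotation angle `omega`, and the initial value are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

def generate_reference(psi, r0, steps):
    """Iterate r(k+1) = psi(r(k)) and return the trajectory as a (steps+1, n) array."""
    r = np.asarray(r0, dtype=float)
    traj = [r.copy()]
    for _ in range(steps):
        r = np.asarray(psi(r), dtype=float)
        traj.append(r.copy())
    return np.array(traj)

# Assumed example: a rotation by omega radians per step generates cos/sin waveforms.
omega = 0.1
A = np.array([[np.cos(omega), -np.sin(omega)],
              [np.sin(omega),  np.cos(omega)]])
traj = generate_reference(lambda r: A @ r, [1.0, 0.0], 100)
```

With this choice the two components of `traj` follow cos(ωk) and sin(ωk); a constant ψ(r) = r would give a step reference, and scaling A by a factor below one would give a damped sinusoid.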

#### 3. Trajectory Tracking Based on ADP

For the following nonlinear discrete-time system without jumping parameters, define the following tracking error: We have . Combining (2), (3), and (4), the following dynamic equation with respect to , , and is given:

Rewrite (2) and (5) in the following matrix form [21]:Further, the system can be rewritten as the following transformed dynamics in terms of control input :whereand satisfies .
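The construction of the augmented system can be sketched in code. The sketch assumes the affine form x(k+1) = f(x(k)) + g(x(k)) u(k), the reference generator r(k+1) = ψ(r(k)), the tracking error e(k) = x(k) − r(k), and the augmented state X(k) = [e(k); r(k)], following the formulation of [21]. The helper `make_augmented` and the toy functions `f`, `g`, and `psi` are hypothetical.

```python
import numpy as np

def make_augmented(f, g, psi, n):
    """Build F and G such that X(k+1) = F(X(k)) + G(X(k)) u(k), with X = [e; r].

    f(x) returns an array of length n, g(x) an (n, m) array, psi(r) an array of length n.
    """
    def F(X):
        e, r = X[:n], X[n:]
        r_next = np.asarray(psi(r), dtype=float)
        # Error dynamics: e(k+1) = f(e + r) - psi(r) + g(e + r) u(k)
        return np.concatenate([np.asarray(f(e + r), dtype=float) - r_next, r_next])

    def G(X):
        e, r = X[:n], X[n:]
        gx = np.asarray(g(e + r), dtype=float).reshape(n, -1)
        # The reference generator is not driven by the control input.
        return np.vstack([gx, np.zeros((n, gx.shape[1]))])

    return F, G

# Assumed toy scalar plant: x(k+1) = 0.5 x(k) + u(k), constant reference.
f = lambda x: 0.5 * x
g = lambda x: np.ones((1, 1))
psi = lambda r: r
F, G = make_augmented(f, g, psi, n=1)

X = np.array([1.0, 2.0])          # e = 1, r = 2, so x = 3
u = np.array([0.5])
X_next = F(X) + G(X) @ u          # x(k+1) = 2.0 and r(k+1) = 2.0, so e(k+1) = 0.0
```

Only the input matrix G enters the control law in this formulation, which is why the drift term F does not need to be known by the controller.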

The infinite-horizon scalar cost function can be defined aswhere ; is defined as is positive definite. To deal with constrained control input, we employ the following function [22]:where is a diagonal matrix defined as , , is a positive definite diagonal matrix, and is a one-to-one function satisfying and its first derivative is bounded by a constant. At the same time, it should be a monotonic increasing odd function. Consider
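For a single actuator, the constrained-input penalty of [22] has a closed form that is easy to check numerically. The sketch assumes the one-to-one function is tanh and that the penalty is W(u) = 2∫₀ᵘ tanh⁻¹(s/ū) ū R ds with scalar bound ū and weight R; these symbols and the function names are assumptions for illustration.

```python
import numpy as np

def input_penalty(u, ubar, R):
    """Closed form of 2 * integral_0^u atanh(s/ubar) * ubar * R ds for |u| < ubar."""
    z = u / ubar
    return 2.0 * ubar * R * (u * np.arctanh(z) + 0.5 * ubar * np.log(1.0 - z**2))

def input_penalty_numeric(u, ubar, R, n=20001):
    """Trapezoidal approximation of the same integral, used only as a cross-check."""
    s = np.linspace(0.0, u, n)
    y = 2.0 * ubar * R * np.arctanh(s / ubar)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(s)))

w_closed = input_penalty(0.5, 1.0, 1.0)
w_numeric = input_penalty_numeric(0.5, 1.0, 1.0)
```

The penalty is zero at u = 0, even in u, and grows steeply as |u| approaches ū, which is what discourages control signals near the actuator limit.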

*Definition 2 (see [23]). *A control policy is said to be admissible if is continuous, , stabilizes (3), and, for every initial state , is finite.

According to Bellman optimal principle and the first-order necessary condition, theoretical optimal control law can be calculated aswhereand theoretical HJB equation is derived aswhere
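The first-order condition can be sketched as follows. The notation is assumed rather than taken from the paper: X is the augmented state, G the input matrix of the transformed dynamics, γ the discount factor, Q₁ the state weight, Ū the diagonal matrix of actuator bounds, R the positive definite diagonal input weight, and the one-to-one function is tanh as in [22].

```latex
% Bellman equation for the discounted tracking cost (assumed notation)
V^{*}(X(k)) = \min_{u(k)} \Big[ X(k)^{T} Q_{1} X(k) + W(u(k)) + \gamma V^{*}(X(k+1)) \Big],
\qquad
W(u) = 2 \int_{0}^{u} \tanh^{-T}\!\big(\bar{U}^{-1} s\big)\, \bar{U} R \, ds .

% Stationarity with respect to u(k), using X(k+1) = F(X(k)) + G(X(k)) u(k)
2 \bar{U} R \tanh^{-1}\!\big(\bar{U}^{-1} u(k)\big)
  + \gamma\, G(X(k))^{T} \frac{\partial V^{*}(X(k+1))}{\partial X(k+1)} = 0 .

% Solving for the control gives a law that is bounded by construction
u^{*}(k) = \bar{U} \tanh\!\Big( -\frac{\gamma}{2} \big(\bar{U} R\big)^{-1} G(X(k))^{T}
  \frac{\partial V^{*}(X(k+1))}{\partial X(k+1)} \Big) .
```

Under these assumptions each component of u* lies strictly inside its actuator bound, because tanh maps into (−1, 1).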

In the remainder of this section, an online actor-critic structure is introduced to solve the optimal tracking problem: the critic neural network (NN) is designed to approximate the value function, and the actor NN is designed to approximate the optimal control signal.

*(1) Critic NN*. A two-layer NN is utilized as the critic NN to approximate the value function where and are constant target weights of the hidden layer and output layer, respectively, is the activation function and is bounded approximation error, and is the number of neurons in hidden layer. , , and gradient of are assumed to be bounded as , , and , respectively.

The actual output of the critic NN is given aswhere and are the estimations of and .

Then, the approximate HJB function error can be derived as follows:

The goal of critic NN is to minimize the following function:

Using the gradient-descent method, the update law of the critic NN is given as

In this paper, we select the activation function of critic NN as , so we havewhere and . Then
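A gradient-descent critic update of this kind can be sketched as follows. The sketch assumes a critic that is linear in its output weights, V̂(X) = wᵀφ(X), with a fixed random hidden layer and tanh activation, and it uses the Bellman residual e = ρ + γV̂(X(k+1)) − V̂(X(k)) as the error to be minimized. The names `critic_features` and `critic_step`, the learning rate, and the sample data are hypothetical and need not match the paper's update law.

```python
import numpy as np

def critic_features(X, V_hidden):
    """Hidden-layer output phi(X) = tanh(V_hidden^T X); V_hidden is kept fixed."""
    return np.tanh(V_hidden.T @ X)

def critic_step(w, phi_now, phi_next, stage_cost, gamma, alpha):
    """One gradient-descent step on E = 0.5 * e^2 with respect to the output weights."""
    d = gamma * phi_next - phi_now      # gradient of the residual e with respect to w
    e = stage_cost + w @ d              # Bellman residual for this transition
    return w - alpha * e * d, e

rng = np.random.default_rng(0)
V_hidden = rng.normal(size=(2, 5))      # assumed: 2 augmented states, 5 hidden neurons
phi_now = critic_features(np.array([1.0, 2.0]), V_hidden)
phi_next = critic_features(np.array([0.5, 2.0]), V_hidden)

w = np.zeros(5)
w, e0 = critic_step(w, phi_now, phi_next, stage_cost=1.0, gamma=0.95, alpha=0.05)
_, e1 = critic_step(w, phi_now, phi_next, stage_cost=1.0, gamma=0.95, alpha=0.05)
```

On a repeated sample the residual shrinks by the factor 1 − α‖d‖² per step, so the learning rate has to be small relative to the feature magnitude for the update to be stable.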

*(2) Actor NN*. To obtain the optimal control input, a two-layer NN is utilized as the actor NN to approximate : where and are constant target weights of the hidden layer and output layer, respectively, is corresponding activation function, is the bounded approximation error, and is the number of neurons in hidden layer. and are assumed to be bounded as and .

The actual output of the actor NN is given aswhere and are the estimations of and , respectively. Using (17), the actual approximation target is

The goal of the actor NN is to minimize the following function:where the actor NN approximation error is defined asUsing the gradient-descent method, the update law of the actor NN is given as

In this paper, activation function of actor NN is selected as . Define ; we have

Finally, optimal control signal is obtained as follows:

*Remark 3. *To obtain the optimal control policy, the actor NN is designed to approximate so that the control signal can be strictly restricted within the given constraints by using the function as in (30). In some other designs, by contrast, the actor NN approximates directly, which can drive the control signal outside the constraints because of unsuitable weights in the initial period.
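The point of Remark 3 can be illustrated with a short sketch. It assumes the actor output is passed through a saturating map u = Ū tanh(·), so the control stays inside the actuator bounds even when the weights are badly initialized. The function `actor_control`, the layer sizes, and the deliberately large random weights are hypothetical.

```python
import numpy as np

def actor_control(X, W_a, V_a, ubar):
    """Bounded control: the NN output is squashed by tanh and scaled by the actuator bounds."""
    hidden = np.tanh(V_a.T @ X)         # hidden-layer activations
    pre_saturation = W_a.T @ hidden     # raw actor output, unbounded
    return ubar * np.tanh(pre_saturation)

rng = np.random.default_rng(1)
V_a = rng.normal(size=(4, 6))                 # assumed: 4 augmented states, 6 hidden neurons
W_a = 100.0 * rng.normal(size=(6, 2))         # deliberately poor, very large initial weights
ubar = np.array([0.5, 2.0])                   # assumed actuator bounds

u = actor_control(np.array([1.0, -1.0, 0.3, 0.0]), W_a, V_a, ubar)
raw = W_a.T @ np.tanh(V_a.T @ np.array([1.0, -1.0, 0.3, 0.0]))
```

Here `raw` is the value a directly approximating actor would apply, which is not limited by `ubar`, while `u` respects the bounds for any weights.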

Theorem 4. *For nonlinear discrete-time system given by (7), let the weight tuning laws of the critic NN and actor NN be given by (20) and (28), respectively, and let the initial weight of the actor NN reflect the initial admissible control of system (7). There exist positive constants and such that system state and estimation errors of two networks are all uniformly ultimately bounded (UUB).*

Proof of Theorem 4 is shown in the Appendix.

In contrast with traditional ADP tracking strategies, the above method does not require knowledge of the system drift dynamics. It therefore offers some adaptability: for systems with different drift dynamics, the method can still make the system state track the desired trajectory. However, an initial admissible control is still required for each such system.

#### 4. Multiple Model Control Scheme Based on ADP

In this section, we first propose the multiple model ADP for systems with accurately known submodels. We then introduce an adaptive ADP main controller so that the new multiple model ADP can handle submodels that are only estimated.

##### 4.1. Multiple Model ADP with Accurately Known Submodels

In this section, we consider the case that known submodels can reflect system dynamics at every working point precisely as follows:where

Following the idea of multiple model adaptive control, it is natural to design multiple independent subcontrollers that track the target trajectory in parallel and to use a switching index function to decide which controller should control the current system. The main structure of the multiple model ADP controller for accurately known submodels is shown in Figure 1.
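A switching rule of this kind can be sketched as follows. The index used here, J_i(k) = α e_i(k)² + β Σ_j λ^(k−j) e_i(j)², is a form common in the MMAC literature and is an assumption, not necessarily the index defined in this paper. Here e_i(k) denotes the prediction error of submodel i, and `switching_sequence` and all numerical values are hypothetical.

```python
import numpy as np

def switching_sequence(errors, alpha=1.0, beta=1.0, lam=0.8):
    """Return, for each step, the index of the submodel with the smallest switching index.

    errors has shape (steps, n_models); errors[k, i] is the prediction error of model i at step k.
    """
    errors = np.asarray(errors, dtype=float)
    acc = np.zeros(errors.shape[1])     # exponentially forgotten sum of squared errors
    selected = []
    for e in errors:
        sq = e ** 2
        acc = lam * acc + sq
        index = alpha * sq + beta * acc
        selected.append(int(np.argmin(index)))
    return selected

# Assumed scenario: model 0 matches the plant for 20 steps, then a parameter jump makes model 1 match.
errors = np.vstack([np.tile([0.01, 1.0], (20, 1)),
                    np.tile([1.0, 0.01], (20, 1))])
selected = switching_sequence(errors)
```

The forgetting factor trades off switching speed against sensitivity to noise: a smaller `lam` reacts faster to a jump, while a larger one avoids switching on isolated error spikes.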