Abstract

This study develops an adaptive dynamic programming (ADP) scheme for uncertain systems to achieve the robust trajectory tracking. In this framework, the augmented state is first established via combining the tracking error and reference trajectory, where the robust tracking control problem can be resolved using the regulation control strategy. Then, the robust control problem of uncertain system can be represented as an optimal control problem of nominal system, which provides a new pathway to address the robust control problem. To realize the optimal control, the derived Hamilton–Jacobi–Bellman equation (HJBE) is solved by training a critic neural network (CNN). Finally, two innovative critic learning techniques are suggested to calculate the unknown NN weights, where the convergence of NN weights can be guaranteed. Simulations are carried out to demonstrate the effectiveness of the proposed method.

1. Introduction

Considering the various and complex working conditions, some systems cannot avoid suffering from the uncertainties and nonlinearities, which brings challenge for higher control performance of these systems [1, 2]. Designing a robust controller for uncertain systems that can accommodate such uncertainties has always been an important yet challenging problem. To retain the robustness property, various robust control methods have been developed against model uncertainties [38]. Conventional robust control methods, however, were studied mainly via offline means. Moreover, the majority of existing robust control designs were devoted to addressing regulation issues [9], while the tracking problems of dynamic systems are even more difficult to be tackled. Although some satisfactory results have been yielded, the developed optimal control is implemented offline [10, 11].

Adaptive dynamic programming (ADP) [12] is a new subject formed by the development of artificial intelligence and control. One of the well-known merits of the ADP method is that it can yield an approximation solution [13, 14] due to the employed critic neural network (CNN). Hence, it has been widely tailored and applied to solve the unknown cost function for optimal control. However, it is worth mentioning that the classical ADP framework requires an extra actor NN [12] to estimate the optimal control action, which results in a more complex ADP structure. To further reduce the computational burden, an innovative identifier-critic-based ADP structure was reported in [15], where only one CNN is used, and a new adaptive law is introduced to retain the convergence of CNN weights. Owing to its learning ability, recent work has also been carried out to solve specific robust control problems [16, 17]. Nevertheless, most existing ADP methods aim at the optimal or robust control regulation problem only.

In fact, the trajectory tracking performance of uncertain systems is an important index to realize the stable and reliable work such as robotic systems. The tracking control of the robotic systems is mainly reflected in the tracking reference signal of each joint, so as to realize the end tracking to the desired trajectory. Therefore, it is necessary to design an appropriate controller for the uncertain nonlinear system to ensure the optimal tracking performance. Recently, some ADP-based tracking control results have been studied. In [18], an optimal tracking controller designed by the ADP was incorporated into steady-state control, by which the steady-state tracking response can be guaranteed, but the suboptimal control problem was obtained. In [19], an optimal tracking control scheme with an iterative learning method is introduced. Nevertheless, the aforementioned results are devoted to realizing the optimal tracking control without considering the uncertain system dynamics.

Inspired by the above discussions, we will develop a novel ADP-based online adaptive critic method for robust trajectory tracking of systems with uncertain dynamics. To realize this purpose, the augmented state is established in terms of the tracking error and reference trajectory simultaneously, such that we can design the tracking controller from the perspective of regulation control problem. Then, the robust control problem is represented as an optimal control problem of the nominal systems with an appropriate index function; thus, a Hamilton–Jacobi–Bellman equation (HJBE) can be derived. Furthermore, two critic learning laws are presented to online calculate the unknown CNN weights; thus, the solution of HJBE can be resolved and the convergence of the CNN weights can be retained simultaneously. Finally, simulation results are provided to exemplify efficiency of the proposed method.

The main contributions of this study are summarized as follows:(1)Different from the classical robust control synthesis, the robust trajectory tracking problem is represented as an equivalent optimal regulation problem. Then, we can obtain the online solution using the critic learning algorithms based on ADP scheme.(2)Based on the ADP framework, a simple CNN is used to reconstruct the cost function on the premise of ensuring convergence, by which the actor NN applied in the existing ADP structure is eliminated.(3)A new adaptive critic learning law driven via CNN weights error is extracted to online training the CNN weights, where the convergence of the updated CNN weights can be guaranteed.

The rest of the study is arranged as follows. In Section 2, the system dynamics and problem formulation are described. Section 3 gives the robust tracking control design using the ADP scheme. Simulation results are provided in Section 4. Section 5 summarizes the main content of this study.

2. Problem Description

Consider the following system with uncertain dynamics:where denotes the state variables, is the control input, and are the differentiable functions with , and and are the uncertain terms with and . In this study, we presume the uncertainties and with [10, 11].

The objective of this study is achieving the trajectory tracking control of uncertain system (1); hence, the tracking error between the system output and the reference trajectory is designed aswhere the reference trajectory is Lipschitz continuous.

According to system (1) and reference trajectory , the time derivative of is calculated as

Furthermore, define augmented state , and then, an augmented system can be derived along with (3):where

Then, we have new uncertainties and as

According to the above development, one can find that the original tracking control problem can be transformed into a regulation problem of system (4) [20]. Different from the existed result [20], there exists the uncertain terms and in (4), making a robust control problem. Inspired by [10, 21], we will further exploit the robust trajectory tracking control action via the optimal equation.

Now, the nominal plant of system (4) can be defined:

Then, the following cost function (8) should be minimized via an optimal control [20, 22, 23]:where is the discount factor; this can guarantee the boundedness of the cost function even if the reference trajectory and the embedded feedforward control action do not converge to zero. , where and denote the appropriate symmetric matrices.

The cost function, defined by , is the optimal value of denoted in (8), i.e.,

Using Bellman’s optimality principle to in (9), we know the derivative of satisfieswith

Applying the optimal control theory, the control solution of the optimization problem is

To implement the tracking control, we choose the utility function as

Remark 1. As discussed in [22], this study constructs an equivalence between the robust trajectory tracking problem and an optimal control problem of system (4) with the cost function (9), which gives a novel idea to resolve the robust trajectory tracking problem.

3. Solving Robust Tracking Control via ADP

This section will find an optimal solution for in (12). Within the ADP framework, a CNN is first constructed to update the optimal cost function; then, a CNN self-learning is proposed to learn robust tracking control solution. To improve the convergence of the CNN weights and transient performance, an adaptive learning scheme based on the estimated CNN weight error is developed to approximate the CNN weights; thus, the robust tracking control policy can be obtained.

3.1. CNN Design

Applying the universal approximation property, the estimated cost function is constructed via an CNN given in (14), which can be considered as a continuous function:where is the ideal CNN weight, denotes the activation function, is the number of hidden neurons, and is the construction error. Its derivative along giveswith and .

Generally, the optimal CNN weight is to resolve; thus, the approximated involved in cost function is expressed viawith being the approximated CNN weight. Its derivative is

Based on (14) and (15), we can rewrite (12) as

The practical optimal controller becomes

From (19), one can claim that the tracking performance (i.e., ) heavily relies on the estimation performance of . Nevertheless, most of the traditional learning algorithms based on ADP cannot make the CNN weights converge to the true values. Therefore, an actor NN needs to be introduced to stabilize the system. To overcome this issue, we will present a novel self-learning algorithm to online learn the CNN weights without using the actor NN.

3.2. Self-Learning Robust Tracking Control

This section will develop a self-learning algorithm to online learn the CNN weights, so as to obtain the robust tracking control. To this end, based on Section 2, we have the estimated Hamiltonian function as

To realize the self-learning for CNN weights, we define the objective function . Then, based on the standard steepest descent algorithm, the critic learning law can be designed aswith being the learning gain.

From (20), we rewrite Hamiltonian function aswhere is the residual error due to the neural network approximation.

Denote ; then, recalling (20) and (22), we have ; let ; then, we have with , . Thus, the weight estimation error dynamic can be given as

Then, we include some results as follows.

Theorem 1. Consider system (7) with the learning law (21); the CNN weights’ error is uniform ultimate boundedness (UUB).

Proof. One chooses a Lyapunov function as ; then, we can derive its time derivative:Applying some mathematical operation on (24), we haveConsider the Cauchy–Schwarz inequality and assumption ; we can claim that for andFrom the Lyapunov theory, we can obtain the CNN weights error is UUB.
In this case, using the same Lyapunov method, we can get with . This UUB stability can refer to [24].

3.3. Adaptive Learning Robust Tracking Control

To realize the fast convergence of CNN weights, this section will develop a novel adaptive learning algorithm. To this end, we first substitute (15) into (10) aswith being the HJBE approximate error.

To approximate the CNN weight with the novel adaptive critic learning algorithm, we first introduce two variables to redefine (27), such that

Then, based on (28), we rewrite (27) as

According to (29), the unknown CNN weights are involved. Thus, the adaptive critic learning law can be designed to online approximate the CNN weights . For this purpose, two auxiliary matrices and are denoted aswith being the positive constant. Thus, the solution of and can be calculated based on (30) as

Furthermore, another auxiliary vector is defined along with and as

Based on (29) and (31), one can derive with a bounded variable, i.e., for a positive constant . Then, the auxiliary vector from (30)–(32) can be rewritten aswhere is the estimation error of NN weights.

Therefore, an adaptive critic learning law can be designed via (33) aswhere is the learning gain.

It is noted that the critic learning law (34) contains the weights , which can drive the estimate toward to its true values, i.e., . This helps to achieve tracking .

Before proving the convergence property, the property of matrix is examined:

Lemma 1 (see [25, 26]). If the variable given in (28) satisfies the persistent excitation (PE) condition, then the matrix presented in (31) is positive define.

We can include the advantage of the designed learning technique as follows.

Theorem 2. For (29) with critic learning law (34), the weights error will exponentially converge to a compact set around zero provided that the vector is persistently excited.

Proof. According to Lemma 1, we have the minimal eigenvalue of as . One chooses a Lyapunov function as ; then, its time derivative can be calculated asFrom (35), we can conclude that the estimation error converge to , and the magnitude is decided by the construction error and the excitation level . Ideally, ; we have that the convergence of estimated error can be guaranteed, i.e., . This completes the proof.
Consider system (7) with practical optimal control (19) and critic learning law (34); the error and augmented system state are uniformly ultimately bounded (UUB) provided that the vector satisfies the PE condition. Then, the practical optimal control in (19) is convergent to its optimal solution given in (18), i.e., .

3.4. Analysis for Robust Trajectory Tracking Performance

This section will show the tracking performance of controlled system.

Theorem 3. Considering the augmented plant (7) and cost function (8), the optimal control action (19) can guarantee the tracking error is UUB.

Proof. Based on (10) and (12), we haveUsing (36), we apply the optimal control (19) to the uncertain system (4). Because we can consider the cost function as a Lyapunov function and the fact , then its time derivative can be derived asThen, we haveConsider the fact , and . We denote and as the minimal and maximal eigenvalues of a matrix, respectively. From the fact and the optimal cost function being bounded by , we can finally derive (38):To obtain , the tracking error should satisfy the following condition with being the constant. Moreover, we denote . Then, we have that applying the optimal action (19) to the uncertain system (4), the tracking error is UUB. This completes the proof.

4. Simulations’ Verification

This section will first present numerical simulations to show the effectiveness of proposed method and then apply proposed methods to the 2-DOF robotic system to illustrate the applicability.

4.1. Numerical Simulations

In this section, we will provide numerical simulation results to illustrate the effectiveness of the developed method in terms of an uncertain system [27]:where is the system state, denotes the system input, and the terms with and with and are the uncertain parameters.

The system initial state is set as and the desired trajectory is chosen as and . Then, from the fact and , we can construct the augmented system as (4). Hence, we can obtain

To obtain the optimal control solution, we set matrices , . Moreover, the discounted factor in cost function (8) is set as .

The other parameters in this simulation are selected as and ; the parameters in uncertain terms are selected as , , and .

To obtain the optimal control action, we apply a single NN to reformulate HJBE (27), i.e., estimate ; its convergence of weights can be guaranteed using the learning law (34). In the process of simulation, we set the activation function of CNN as and the initial values for the CNN weights are chosen as . After the online learning of , the CNN weights are convergence to its certain values, which are displayed in Figure 1. Then, the online updated CNN weights are applied to robust trajectory tracking implementation. Figure 2 gives the tracking performance of uncertain system (44). To better show the effectiveness of the developed method, the tracking error between practical system output and desired trajectory is provided in Figure 3, which can converge to zero, and the proposed tracking control action is also displayed in Figure 3, which is bounded and smooth.

To show the effectiveness of the proposed single critic NN-based ADP, a critic-actor NN-based ADP method [18] is proposed to use for comparison. The critic-actor weights can be found in Figure 4, and the corresponding control performances are given in Figure 5. We have that the proposed single critic NN-based ADP has the advantages of fast convergence speed and high control accuracy.

To further show the robustness of the suggested control, we will give other two cases: (1) , , and and (2) , , and . Then, the tracking performance and control action can be yielded in Figures 69, where we have that even under different uncertainties, the system can effectively track the reference trajectory; i.e., the robustness of the designed controller can be guaranteed.

Finally, some simulations for (21) are given in Figures 10 and 11. From these results, we have that the convergence of CNN weights in Figure 10 is more than that in Figure 1 and used adaptive learning 34; this lead to a poor tracking performance, as shown in Figure 11.

4.2. Application to Robotic Systems

In this section, comparative simulations are tested out via a 2-DOF robotic system, to illustrate the feasibility of the presented adaptive robust tracking control method. We first give the dynamics of the robotic system as [28]where denotes the join variables, is the generalized forces, is the inertia matrix, represents the Coriolis/centripetal vector, is the gravity vector, and is the friction vector. This study denotes . and are uncertain because the unknown load and unmodeled frictions. The inertia matrix iswhere . , and .

The centripetal vector iswhere and .

The friction vector and gravity matrix denotewhere and .

Based on the above facts, we can define the state equation of system (42) aswhere , , , , and .

To realize experimental verification, the dynamic parameters of this 2-DOF robot are given in Table 1. The upper bound of uncertainty is given:

To compete the tracking control, the sinusoidal signals , , , and are chosen as the desired trajectories. The regressor vector of CNN is designed as . Because the uncertainties in and are existed due to issues such as the different load to be picked, then we consider the following cases.

Case 1. In this case, we will test the tracking response under the load . The control parameters are selected as , , , , and .
Figure 12 shows the learning results of the online updated weights using the proposed method (34). Then, based on the solved results, the profiles of the tracking performance are given in Figures 13 and 14, in which one can find that both the proposed control method (34) can make both , and , track the given command perfectly. This can be further found in Figure 15, where the tracking errors are given. Figure 16 shows the control action, which is bounded.
Since the robotic system uncertainty mostly comes from the uncertainty of end load, then we set a Case 2; the maximum end load is given in this case.

Case 2. In this case, we will test the tracking response under the load , i.e., maximum end load. The control parameters are given as: , , , , and .
Based on the learning results in Figure 12, the tracking response in this case is given in Figures 17 and 18, and the corresponding tracking errors are given Figure 19. From Figures 1719, we have that even if the end load of the robot becomes larger, axis 1 and axis 2 of the robotic system can track accurately on the basis of previous learning.
Finally, we give some results for proposed learning (21), as shown Figures 20 and 21, in which we have that although the accurate tracking of the robot can finally be realized, compared with the learning law (34), the learning time of neural network weight is too long, which will reduce the transient of tracking.
The above simulation results show the effectiveness of the proposed control method and the feasibility of the learning algorithm.

5. Conclusion

This study aims at resolving the robust trajectory tracking of uncertain systems, which adopts the ideas of ADP presented for optimal control. The basic concept is to represent the robust control problem as an equivalent optimal control problem for the nominal system, where the augmented state is considered. Thus, we can resolve this robust tracking control problem using an optimal control method. To obtain the optimal control solution, two adaptive critic learning techniques are developed via ADP scheme, where the parameter estimation method is adopted. The closed-loop system stability and robust tracking performance are rigorously proved; simulation results are given to exemplify the feasibility of the developed method. In our future work, we will extend the proposed idea to address the robust tracking control problem with several unmatched dynamics cases, which allows to carry out practical experimental validations based on existing test rigs in our lab.

Data Availability

The codes used to support the findings of this study are available from corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by National Key R&D Program (863 Program) (2017YFB0102800).