Abstract

The feedback PID method has long been the mainstay of navigation control for unmanned surface vessels (USVs). However, as intelligent control matures, the USV can be navigated more effectively. Based on the characteristics of USV navigation control, this paper presents a parallel action-network ADHDP method. This method connects an adaptive controller in parallel with the action network of the ADHDP. The adaptive controller adopts an RBF neural network approximation based on Lyapunov stability analysis to ensure system stability. The simulation results show that the parallel action-network ADHDP method has adaptive control characteristics and can navigate the USV more accurately and rapidly. In addition, this method can eliminate the overshoot of the ADHDP controller when navigating the USV in various situations. Therefore, the adaptive stability design greatly improves the navigation control and effectively overcomes the limitation of the ADHDP algorithm. Thus, this adaptive scheme can serve as one of the intelligent ADHDP control methods and provides a foundation for the development of an intelligent USV controller.

1. Introduction

The development of science and technology is accelerating the shift from informatization to intellectualization, and many systems now face new problems of intelligent control. Intelligent control can be widely applied to the unmanned surface vessel (USV) thanks to its learning and adaptive ability. At present, control research for the USV mainly focuses on path following, path planning, formation control, and experiments.

Among these topics, path following has been studied the most. In 2017, Shin et al. proposed a path-following control for the USV based on an identified dynamic model [1]. In 2018, Qin et al. solved the path-tracking control of the USV with input saturation and full-state constraints [2], and Qu et al. presented an exponential path-tracking control of the USV with external disturbances and system uncertainties [3]. These methods generally require a precise model of the USV or of the external environment, which is difficult to establish.

Path planning is another major research topic. In 2017, Kim et al. optimized the path of the USV under environmental loads using a genetic algorithm [4]. In 2018, Lyu and Yin used a path-guided potential-field method to achieve path planning in restricted waters [5]. In 2019, Wang et al. adopted an improved grey wolf optimizer to optimize the USV trajectory [6]. This is a new intelligent method, but it still has some shortcomings.

In addition, in 2017, Klinger et al. evaluated intelligent-controller performance through USV experiments [7]. In 2018, Conte et al. developed a ROS-based multi-agent architecture for the USV and tested its path planning [8], and Dai et al. combined several methods to achieve UAV formation control [9]. For more detailed development and trend analyses, please refer to the review papers [10–12].

However, the nonlinear adaptive optimal control of the USV has not been fully investigated. This is an important task in intelligent USV control, and further research is needed. Adaptive optimal control of the USV usually requires solving the Hamilton–Jacobi–Bellman (HJB) equation, which is difficult to solve in most cases except for linear quadratic systems, and the classical dynamic programming method suffers from the curse of dimensionality. Fortunately, in recent years, the approximate dynamic programming (ADP) method has arisen with neural network approximation [13]. Liu et al. give the most recent developments of ADP theory and its advances in industrial control [14]. Yang et al. presented a guaranteed-cost neural tracking control for a class of uncertain nonlinear systems using ADP [15]. These studies make ADP suitable for optimal control that minimizes a performance index. Thus, ADP can also be used to learn and adjust the navigation according to the USV's conditions.

In practice, however, relying on ADP alone cannot achieve sufficiently accurate adjustment, especially under external influences, while adaptive control has the ability to adapt to different environments. If the advantages of these two kinds of control are combined, USV navigation can be adjusted more quickly and accurately. Therefore, in this paper, the two methods are combined for navigating the USV. Based on an analysis of the USV, this paper presents a parallel action-network ADP method, which connects an adaptive controller in parallel with the action network of the ADP. This parallel adaptive controller adopts an RBF neural network approximation based on Lyapunov stability analysis. The simulation results show that the parallel action-network ADP has adaptive control characteristics and can navigate the USV more accurately and rapidly. In addition, this method can eliminate the overshoot of the ADP controller when navigating the USV in various situations. Thus, this adaptive scheme can serve as one of the intelligent ADP control methods and as a foundation for the development of an intelligent USV controller.

2. The ADHDP Control Method

The action-dependent heuristic dynamic programming (ADHDP) is one of the main approaches in the ADP family. The critic and action networks of the ADHDP are usually built on BP neural networks [16, 17]. As shown in Figure 1, the main function of the critic network is to use the system state to approximate the cost function J(t). The correctness of the cost function guides the action network toward the optimal approximation. If the output u(t) of the action network is received as part of the input to the critic network, a new critic network is formed. The new critic network conceals a controlled-object model and approximates the new cost function Ĵ(t). The output Ĵ(t) is an estimate of the cost function at the next time step. Thus Ĵ(t) ≈ J(t + 1), and the ADHDP can perform control without a mathematical plant model.

2.1. The Problem Formulation

The action-critic scheme of the ADHDP and its symbols are shown in Figure 2. Assume that a nonlinear system is set up as

x(t + 1) = f(x(t), u(t)),

where x(t) is the system state at time t, u(t) is the control action at time t, and f is the state transition function of the nonlinear system.

In order to let the system run in an optimal state for the dynamic programming algorithm, the cost function must be defined as [18]

J(t) = Σ_{k=0}^{∞} γ^k U(t + k),

where γ is the discount factor (0 < γ ≤ 1) and U(t) is the utility function at each time step. The objective is to choose the control actions u(t), t = 0, 1, 2, …, so that the cost function defined in equation (2) is minimized [19].
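As a small numerical illustration of this discounted sum (the utility values below are hypothetical, and the infinite horizon is truncated for illustration):

```python
# Illustrative computation of the discounted cost J(t) = sum_k gamma**k * U(t+k).
# The per-step utilities are hypothetical and the infinite sum is truncated.

def discounted_cost(utilities, gamma=0.95):
    """Sum gamma**k * U[k] over a finite utility sequence."""
    return sum((gamma ** k) * u for k, u in enumerate(utilities))

utilities = [1.0, 0.5, 0.25, 0.0]          # hypothetical utilities U(t+k)
J = discounted_cost(utilities, gamma=0.5)  # 1.0 + 0.25 + 0.0625 = 1.3125
```

A smaller γ weights near-term utility more heavily, which is why the algorithm below adjusts the control at every time step rather than only at the end of a maneuver.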

2.2. The Critic Network

As shown in Figure 2, the input vector to the critic network is the state–action pair

I(t) = [x(t), u(t)].

The input to the i-th hidden-layer node is calculated as

q_i(t) = Σ_j w_c1,ij(t) I_j(t),

where w_c1 denotes the input-to-hidden weights of the critic network.

The σ in the hidden layer of Figure 2 denotes the nonlinear sigmoidal function. Thus, the output of the hidden layer is

p_i(t) = (1 − exp(−q_i(t))) / (1 + exp(−q_i(t))).

The "/" in the output layer means that a linear function is adopted, so the output of the critic network is

Ĵ(t) = Σ_i w_c2,i(t) p_i(t).

The ADHDP approximates the optimal solution by minimizing the new cost function. This is achieved by training the new critic network to minimize the following error measure over time:

E_c(t) = (1/2) e_c²(t),  e_c(t) = γĴ(t) − [Ĵ(t − 1) − U(t)],

where w_c denotes the critic network weights. The output Ĵ(t) at time t is an estimate of the dynamic programming cost at time t + 1, that is, the cost function at the next time step.

Then, the gradient descent algorithm is used to update the critic network weights and minimize E_c(t) at each time step by

w_c(t + 1) = w_c(t) + Δw_c(t),  Δw_c(t) = l_c(t) [−∂E_c(t)/∂w_c(t)],

where l_c(t) is the learning rate of the critic network at time t, which decreases to a very small value over time [20].
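A minimal sketch of such a critic: a single hidden layer of bipolar sigmoid units with a linear output, trained by gradient descent on the temporal-difference error above. The network sizes, learning rate, and inputs are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Minimal single-hidden-layer critic sketch: bipolar sigmoid hidden units,
# linear output, trained by gradient descent on E_c = 0.5 * e_c**2 with
# e_c = gamma*J(t) - (J(t-1) - U(t)).  All sizes and rates are assumptions.

rng = np.random.default_rng(0)
n_in, n_hid = 3, 6                        # input = [state components, action]
W1 = rng.normal(0.0, 0.1, (n_hid, n_in))  # input-to-hidden weights
W2 = rng.normal(0.0, 0.1, n_hid)          # hidden-to-output weights

def sigmoid(q):
    """Bipolar sigmoid (1 - e^-q) / (1 + e^-q)."""
    return (1 - np.exp(-q)) / (1 + np.exp(-q))

def critic(I):
    """Forward pass: returns the cost estimate J and hidden outputs p."""
    p = sigmoid(W1 @ I)
    return W2 @ p, p

def critic_update(I, J_prev, U, gamma=0.95, lc=0.05):
    """One gradient-descent step on the temporal-difference error."""
    global W1, W2
    J, p = critic(I)
    e_c = gamma * J - (J_prev - U)
    dp = 0.5 * (1 - p ** 2)               # derivative of the bipolar sigmoid
    grad_W2 = e_c * gamma * p             # dE_c/dW2
    grad_W1 = e_c * gamma * np.outer(W2 * dp, I)
    W2 = W2 - lc * grad_W2
    W1 = W1 - lc * grad_W1
    return e_c
```

Repeated calls to `critic_update` shrink |e_c|, which is exactly the "minimize E_c at each time step" loop described above.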

2.3. The Action Network

As shown in Figure 2, the input to the action network is the system state x(t).

The input to the i-th hidden-layer node of the action network is calculated as

h_i(t) = Σ_j w_a1,ij(t) x_j(t).

The hidden layer uses the nonlinear sigmoidal function; thus, the output of the hidden layer is

g_i(t) = (1 − exp(−h_i(t))) / (1 + exp(−h_i(t))).

The input to the output layer is calculated as

v(t) = Σ_i w_a2,i(t) g_i(t).

The output layer also uses the nonlinear sigmoidal function; thus, the output of the action network is

u(t) = (1 − exp(−v(t))) / (1 + exp(−v(t))).

When the critic network has been trained, the objective of training the action network is to minimize the following index:

E_a(t) = (1/2) e_a²(t),  e_a(t) = Ĵ(t) − U_c(t),

where U_c(t) is the desired ultimate objective of the cost function, taken as zero here.

Then, the gradient descent algorithm is also used to update the action network weights by minimizing E_a(t) with

w_a(t + 1) = w_a(t) + Δw_a(t),  Δw_a(t) = l_a(t) [−∂E_a(t)/∂w_a(t)],

where l_a(t) is the learning rate of the action network at time t, which also decreases to a very small value over time [20].

After the optimal cost function is obtained, the optimal control can be obtained by

u*(t) = arg min_{u(t)} Ĵ(t);

that is, to obtain the optimal control output of the action network, it is necessary to drive the cost function Ĵ(t) toward zero. This also makes the critic network output as close to zero as possible.
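The chained update above (action weights adjusted through the critic's cost estimate) can be sketched with a toy one-parameter policy and a stand-in critic. Both `j_hat` and the policy form are hypothetical illustrations of the mechanism, not the paper's networks:

```python
import numpy as np

# Toy sketch of the action-network update: a one-parameter "policy"
# u = tanh(w_a * x) is adjusted by gradient descent so that a stand-in
# critic estimate J_hat(x, u) is driven toward zero.  Both functions and
# all constants are hypothetical illustrations.

def j_hat(x, u):
    """Stand-in critic: minimized (J_hat = 0) when u = -x."""
    return (u + x) ** 2

def action(x, w_a):
    """Toy one-parameter action 'network'."""
    return np.tanh(w_a * x)

def action_update(x, w_a, la=0.5, h=1e-5):
    """One gradient step on E_a = 0.5 * J_hat**2 (finite-difference chain rule)."""
    u = action(x, w_a)
    dJ_du = (j_hat(x, u + h) - j_hat(x, u - h)) / (2 * h)
    du_dw = (action(x, w_a + h) - action(x, w_a - h)) / (2 * h)
    return w_a - la * j_hat(x, u) * dJ_du * du_dw

w_a = 0.5
for _ in range(5000):
    w_a = action_update(0.8, w_a)
# the learned action tanh(w_a * 0.8) approaches -0.8, where J_hat -> 0
```

In the real ADHDP, `dJ_du` is backpropagated through the trained critic network rather than computed by finite differences; the sketch only shows why driving Ĵ toward zero pulls the action toward the optimum.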

As shown in Figure 3, the dashed lines indicate that the error should eventually be reduced to zero by calculating the error of each operation and the increments of the critic and action network weights.

3. The Parallel Action-Network ADHDP: The Adaptive ADHDP Method

An adaptive controller is added to the USV navigation control to ensure system stability, because the gradient descent algorithm cannot guarantee stability when updating the ADHDP parameters [21] and may lead to a local optimum in some cases. Therefore, a stability design based on Lyapunov theory greatly increases the stability of the navigation control and effectively overcomes the limitation of the ADHDP algorithm.

Based on the error minimization in equation (18), the error e and its differential ė are used as the inputs to both the action network and the adaptive controller, as shown in Figure 4. After receiving these inputs, the adaptive control algorithm is computed separately.

When the two control signals are obtained, the smaller of the adaptive-control and action-network outputs is selected as the input to the controlled object, which also helps the stability of the USV navigation system. In this way, the advantages of the ADHDP and the adaptive control can be integrated to achieve navigation stability for the USV.

The adaptive control process is as follows. Suppose that the nonlinear system is

ẍ = f(x, ẋ) + g(x, ẋ)u,  e = r − x,

where r is the tracking navigation command, the error e is the difference between the command and the system output x, x is the system state, u is the control input, and f and g are the nonlinear state functions of the USV system.

Supposing that f and g are known, the ideal control is designed as

u* = (1/g(x, ẋ)) [−f(x, ẋ) + r̈ + Kᵀ E],

where E = [e, ė]ᵀ and K = [k₁, k₂]ᵀ is a coefficient vector. By substituting equations (18) and (19) into equation (17) and simplifying, the system error satisfies

ë + k₂ė + k₁e = 0.

To design K, the roots of the corresponding polynomial s² + k₂s + k₁ = 0 must lie in the left half of the complex plane. Then, as t tends to infinity, e(t) → 0 and ė(t) → 0.
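This left-half-plane condition is easy to verify numerically; the second-order monic form s² + k₂s + k₁ used below is an assumption for illustration:

```python
import numpy as np

# Check that the roots of an assumed second-order error polynomial
# s**2 + k2*s + k1 lie strictly in the left half of the complex plane,
# which guarantees e(t) -> 0 as t -> infinity.

def is_hurwitz(k1, k2):
    roots = np.roots([1.0, k2, k1])
    return bool(np.all(roots.real < 0))
```

For a monic second-order polynomial this condition reduces to k₁ > 0 and k₂ > 0, so choosing positive gains suffices.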

For the adaptive control, an RBF neural network is used for the approximation of f. The input to the network is x, and the output of the network is

f̂(x) = Ŵᵀ h(x),

where Ŵ is the estimate of the network weights, h_j(x) is the Gaussian basis function of the j-th node, and h(x) is the vector of Gaussian basis functions; please see [22]. Suppose that the matrix P is symmetric positive definite and satisfies the following Lyapunov equation:

Λᵀ P + P Λ = −Q,

where Λ is the closed-loop error matrix and Q is a unit matrix. After obtaining the matrix P, the RBF network weights are adapted by the law

dŴ/dt = γ_w Eᵀ P b h(x),

where b = [0, 1]ᵀ and γ_w > 0 is the adaptation gain.
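The two computational pieces involved can be sketched as follows; the centres, widths, and the 2×2 closed-loop error matrix are illustrative assumptions:

```python
import numpy as np

# Sketch of the RBF approximation and the Lyapunov-equation solve used by
# the parallel adaptive controller.  Centres, widths, and the 2x2
# closed-loop error matrix below are illustrative assumptions.

def rbf(x, centers, width=1.0):
    """Gaussian basis vector h_j(x) = exp(-||x - c_j||**2 / (2*width**2))."""
    x = np.asarray(x, dtype=float)
    return np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * width ** 2))

def rbf_output(x, W, centers):
    """Network output f_hat(x) = W^T h(x)."""
    return W @ rbf(x, centers)

def solve_lyapunov(Lam, Q):
    """Solve Lam^T P + P Lam = -Q by vectorisation (row-major flattening)."""
    n = Lam.shape[0]
    I = np.eye(n)
    M = np.kron(Lam.T, I) + np.kron(I, Lam.T)
    p = np.linalg.solve(M, -Q.flatten())
    return p.reshape(n, n)

# Example: closed-loop error matrix for e'' + k2*e' + k1*e = 0 with k1=2, k2=3
Lam = np.array([[0.0, 1.0], [-2.0, -3.0]])
P = solve_lyapunov(Lam, np.eye(2))  # symmetric positive definite
```

Given P, the adaptation law above adjusts the RBF weights online so that the approximation error stays bounded in the Lyapunov sense.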

After the weights of the RBF neural network are obtained, the output of this network can be obtained by equation (21). Finally, the optimal control signal can be obtained by equation (19).

4. The USV Model for Navigating Control

The ADHDP is based on the system state and does not require a controlled-object model. Thus, the ADHDP is also known as a data-driven control method [23], which enables online learning and control [24]. In this study, therefore, the USV model is used only as a simulation object, not for the controller design. Nevertheless, the choice of a suitable controlled-object model is still very important for the numerical experiment.

The motion model and state space of the USV used in this paper are summarized in [25]. This model was proposed by Abkowitz and other scholars with mathematical integrity and rigor. The standard state space of the USV is expressed as

ẋ = A x + B δ,

where x = [v, r, ψ]ᵀ collects the sway velocity, yaw rate, and heading, and δ is the rudder angle of the USV.

Here, m is the mass of the USV, and m′ is its dimensionless mass during calculation, computed as m′ = m/((1/2)ρL³), where ρ is the sea-water density and L is the length between perpendiculars; I_z is the moment of inertia, with dimensionless form I′_z = I_z/((1/2)ρL⁵); U is the USV speed; and S is the reference area. The other related intermediate variables can be calculated from the width B of the USV, the block coefficient C_b, the rudder-blade area A_R, and the draught d; please see [25].

Considering the effect of the marine environment on the motion of the USV, a white-noise interference w(t) is added to the model. Thus,

ẋ = A x + B δ + w(t).
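Since the full Abkowitz-type state space is given in [25], a simplified first-order Nomoto heading model with additive white noise can serve as a stand-in to illustrate the simulation setup. The gain, time constant, noise level, and the proportional rudder law below are all assumptions, not the paper's model or controller:

```python
import numpy as np

# Simplified heading-dynamics stand-in for the simulation object: a
# first-order Nomoto model  T*r_dot + r = K*delta  with additive white
# noise on the heading.  K, T, the noise level, and the rudder law are
# assumptions for illustration only.

rng = np.random.default_rng(1)
K, T, dt = 0.2, 10.0, 0.1           # assumed Nomoto gain, time constant, step

def step(psi, r, delta, noise=0.0):
    """One Euler step of heading psi [rad] and yaw rate r [rad/s]."""
    r_dot = (K * delta - r) / T
    r_new = r + dt * r_dot
    psi_new = psi + dt * r_new + noise
    return psi_new, r_new

psi, r = 0.0, 0.0
target = np.deg2rad(90.0)           # navigate from north (0 deg) to east (90 deg)
for _ in range(5000):               # 500 s of simulated time
    delta = np.clip(2.0 * (target - psi), -0.6, 0.6)  # simple rudder law
    w = 1e-4 * rng.standard_normal()                  # white-noise interference
    psi, r = step(psi, r, delta, noise=w)
```

Despite the noise, the heading settles near the 90° target, which is the kind of closed-loop behaviour the numerical experiment in Section 5 compares across controllers.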

5. The Numerical Experiment

5.1. The Parameters of the Control System

The utility function of the adaptive ADHDP controller can be designed as the quadratic form

U(t) = e²(t) + ė²(t),

where e(t) is the error in equation (18) at time t, and ė(t) is its differential with respect to time.

MATLAB/Simulink is used for the simulation. S-functions are used to build the related models and algorithms because MATLAB has no ADHDP toolbox. The parameters of the simulated USV model are shown in Table 1.

5.2. Simulation and Results

(1) The navigation target is set to east (90°) and the initial heading is north (0°). The simulation time is 50 seconds. The navigation responses of the PID and the adaptive ADHDP control are shown in Figure 5, and the detailed comparison of the result curves in Figure 5 is given in Table 2. From the data in Figure 5 and Table 2, it can be seen that the combination of the ADHDP and adaptive control shortens the adjusting time and eliminates overshoot. The data in the figures and tables also prove that this combination can adjust the USV heading quickly and accurately, and it needs only one navigation change.

(2) When navigating the USV in a narrow river, the adjusting frequency and time increase significantly. For the simulation of this condition, the navigation target is set to east (90°) for the first 50 seconds, with the initial heading at north (0°). At the 50th second, the target is set to north (0°), so the heading should turn 90°; this stage runs for another 50 seconds. At the 100th second, the target is set to east (90°) again, so the heading should turn back 90°; this stage also runs for 50 seconds. The simulation results are shown in Figure 6. According to these results, the PID controller takes a long time to adjust the USV heading, and its overshoot is very large. In contrast, the adaptive ADHDP navigation control is significantly improved in both overshoot and rapidity compared with the PID controller.

(3) Figure 7 shows the control effect and response under the above continuous and sharp navigation changes with the ADHDP alone, compared with the adaptive ADHDP. It can be seen from Figure 7 that the ADHDP controller has a larger overshoot under a continuous and sharp navigation change.
This is because the ADHDP relies on data-driven learning from previous experience, which makes it difficult to produce a correct response within a short period of time. After combining with the adaptive control, the adaptive ADHDP avoids overshoot under abrupt state changes and ensures safe driving of the USV.

6. Discussion and Conclusion

The ADHDP control is a data-driven method that can be used without a plant model. In the simulation, the adjusting time of the ADHDP controller is much shorter than that of the PID controller, but its overshoot is larger. Therefore, an adaptive control is added to eliminate the overshoot of the ADHDP controller under continuous and sharp navigation changes.

In addition, the simulation proves the feasibility of the adaptive ADHDP control method. Although it requires combining the adaptive method, this control method does not need a model of the controlled object, which simplifies the controller design, and it also increases the speed and stability of the controller. Thus, these methods lay a foundation for the development of the intelligent USV and have practical value for further enhancing intelligent USV control.

Data Availability

The data used to support the findings of this study have been deposited in the Adaptive Navigating Control Based on the Parallel Action-Network ADHDP for Unmanned Surface Vessel repository (7697143).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the NSFC Projects of China under grant nos. 61403250, 51779136, and 51509151.