#### Abstract

To address course tracking control of under-actuated USVs in complex external environments, an adaptive control law is designed on the basis of the dynamic model of ship motion by constructing an iterative sliding mode function and applying Lyapunov stability theory. RBF neural network control and adaptive control are integrated into the control algorithm, yielding an adaptive-neural network iterative sliding mode heading tracking controller for unmanned surface vessels. The RBF neural network is trained online with a reinforcement learning algorithm, which enhances its approximation performance. In addition, particle swarm optimization with a shrinkage factor is applied to optimize the control parameters, which enhances the adaptability and robustness of the control system. Performance experiments verify the effectiveness and practicability of the adaptive iterative sliding mode control algorithm for unmanned surface vessels based on reinforcement learning and particle swarm optimization. Comparative tracking experiments further show that the overall performance of the proposed USV course tracking control system is better than that of a neural network sliding mode control system optimized by a genetic algorithm. These results indicate that fusing multiple algorithms can improve the performance of the control system.

#### 1. Introduction

An unmanned surface vessel (USV) is a vessel mainly used to perform dangerous tasks or tasks unsuitable for manned vessels, and it has broad application demand and good development prospects. Studying USV motion control therefore has both theoretical significance and practical value. Autonomous technology reduces a USV's dependence on remote operators and wireless transmission bandwidth, and it enables USVs to cooperate with other surface vehicles, underwater submersibles, and low-altitude vehicles. Course tracking control is one of the core technologies of USV autonomous navigation and has become a research hot spot in academia. At the same time, given the under-actuated nature of USVs and the changeable environmental disturbances they face, engineering applications call for an appropriate and effective control system that improves USV course tracking performance.

In response to the specific operational requirements of USVs, researchers in China and abroad have applied various control methods and their improved versions to USV motion control systems. In [1], addressing the autonomous navigation control requirements of a twin-hull channel-type USV, the course control process was analyzed, fuzzy set theory was used to construct the course control fuzzy sets, and a fuzzy PID heading control algorithm was designed; full-scale sailing tests verified that the algorithm has strong environmental adaptability and converges rapidly. To address course tracking control of the USV "Lanxin," a fuzzy adaptive PID motion controller was designed and implemented in [2]; a Kalman filter was used to fuse the USV's attitude data, and the embedded controller of the "Lanxin" USV was designed and implemented. Full-scale results show that, under real marine disturbances, the heading tracking controller achieves good dynamic and static performance and good disturbance rejection. Reference [3] studies cooperative heading and speed control of USVs based on a fuzzy adaptive algorithm; it avoids the hull motion modeling required by conventional methods, improves the fuzzy control strategy and algorithm, incorporates adaptive factors, and improves the adaptability and robustness of the control method under uncertainty in the surface environment. In [4], to further improve the adaptability of path tracking for autonomous berthing, a PID controller performs heading control, fuzzy control is incorporated into the line-of-sight method, and autonomous berthing control of the unmanned boat is achieved.
In [5], after establishing a model of a network-based USV heading control system, a criterion for the asymptotic stability of the control system in a networked environment was derived using the Lyapunov stability principle and convex analysis, and a network-based heading controller and observer were obtained. Using the backstepping method and Lyapunov theory, a nonlinear heading controller was designed in [6], and simulation results showed that the controller has good dynamic performance and robustness. To handle limited control input, a radial basis function (RBF) neural network [7] was used to construct an online compensator, which was integrated into sliding mode control to achieve robust USV heading control under control input saturation.

Because sliding mode control (SMC) is invariant to system parameter variations and external disturbances, and its design procedure is relatively simple, it is well suited to controlling nonlinear dynamic systems. In [8], a sliding mode controller was designed by constructing a Lyapunov energy function, which simplified the design of a nonlinear course-keeping controller. However, constructing a Lyapunov function alone cannot guarantee that the resulting controller has excellent control performance and robustness. Reference [9] proposed a nonlinear feedback control method in which the sliding mode control error is modulated by a sine function and fed back to the controller; compared with linear feedback control, the method is safer and more energy-efficient. Based on a nonlinear iterative sliding mode variable structure, an incremental feedback algorithm for ship heading control was constructed in [10] and applied to the design of a USV heading control system; simulations verified that the algorithm is insensitive to model perturbations and to changes in external disturbances. To eliminate steady-state error, [11] incorporated an integral term into the sliding mode design to construct a linear active disturbance rejection integral sliding mode heading controller, which accelerates system convergence; simulations show that the designed controller has a simple structure and tracks the desired course quickly and accurately. To overcome shortcomings of conventional sliding mode switching, such as slow convergence, long convergence time, and severe chattering, a new terminal sliding mode control with a novel reaching law was proposed for USV course control in [12].
In [14], an adaptive dynamic surface sliding mode controller based on a neural network [13] was proposed to overcome model uncertainty, and simulations show that the ship quickly stabilizes on a specified heading.

However, fixed parameters severely limit the adaptability and robustness of SMC systems, and these parameters need to be optimized jointly and online. Swarm intelligence optimization algorithms can optimize multiple parameters globally and simultaneously, which offers a new way to solve this problem. In [15], an intelligent autopilot controller for USVs was designed with a convolutional neural network and an ant colony optimizer; compared with PID control, the new intelligent predictive controller performs better. In [16], particle swarm optimization was applied to online optimization of the parameters of a USV course controller, improving the controller's adaptability and disturbance rejection. To address the uncertain factors in a sliding mode control law designed with an RBF network, an improved genetic algorithm was adopted to optimize the parameters of the RBF network [17] online, yielding an ideal sliding mode control law [18] for accurate USV course tracking. It has also been reported that, under the same conditions, the differential evolution algorithm (DEA) [19] converges faster than the genetic algorithm (GA) and the particle swarm algorithm (PSA).

The design of ship motion control systems is almost always based on a mathematical model of ship maneuvering motion. At present, the models commonly used in ship course tracking control design are mainly the Nomoto model, the Fossen model, and the MMG model [19]. Some studies design the control system with the Nomoto model. For example, in [20], the disturbance factors during USV sailing were analyzed and, based on the USV motion model, the disturbances were classified and modeled; an adaptive mathematical model of USV heading motion under combined wind, wave, and current disturbances was then established based on fuzzy control theory [21] and the Nomoto model. For heading control of unmanned boats in dynamic environments, dynamic surface control was introduced on the basis of the Nomoto model to handle the nonlinear characteristics of the control system and ensure global stability, and an RBF neural network was used to approximate the uncertain terms globally, eliminating the influence of the marine environment and achieving precise heading control [22]. A robust adaptive steering control method for USVs was designed by incorporating an adaptive fuzzy system into dynamic surface control to handle model uncertainty and input saturation [23, 24]. In the design of a nonlinear USV controller, introducing a fractional-order control operator and nonlinear polynomials increases the number of tuning parameters and makes controller optimization more difficult. Therefore, building on the membrane system algorithm, the particle swarm algorithm was used to optimize the cell membrane algorithm so as to avoid local optima and improve the efficiency and adaptability of the controller [25].
Accounting for the impact of unmodeled dynamics and marine environmental disturbances on the motion control of unmanned surface craft, [26] applied trajectory linearization control to the heading control of the "Lanxin" unmanned surface craft in actual engineering: a disturbance observer was combined with trajectory linearization control, and a trajectory linearization heading controller based on time-scale separation of fast and slow loops was proposed. In [27], because USV heading tracking is easily affected by wind, wave, and current disturbances, a robust heading tracking controller based on a linear parameter varying (LPV) model was proposed, and full-scale ship experiments verified its effectiveness.

Other studies use the Fossen model for control design. For example, [28] addresses large external disturbances and unknown heading model parameters in USV heading control: based on the Fossen model of USV motion, a dynamic feedback control algorithm was constructed using the control performance index as the event-triggering factor together with switching among multiple identification models. In [29], to improve the robustness of the USV motion system under different sea states, the state error and the longitudinal speed were used as independent variables within the Fossen model to construct an online self-optimizing PID method that handles the influence of the longitudinal-speed coupling term on USV heading, and Lyapunov theory was used to verify the stability of the control algorithm. Tests under different sea states showed that the control parameters can be optimized online according to the longitudinal speed and heading error, demonstrating strong disturbance rejection and robustness. Considering external disturbances such as ocean currents, [30] designed USV heading and speed control algorithms using adaptive sliding mode techniques on the basis of the Fossen model, stabilizing the heading angle and longitudinal speed tracking errors. Based on Lyapunov theory and cascade system theory, uniform semiglobal exponential stability and uniform global asymptotic stability of the trajectory tracking control system were proved.

Compared with the Nomoto and Fossen models, the MMG model has the advantage that the control inputs are the main engine speed and the rudder angle, which matches how a ship is actually driven. In [31], for course keeping of a podded-propulsion USV, a fast nonsingular terminal sliding mode course-keeping strategy combining linear sliding mode and nonsingular terminal sliding mode was presented. A fast nonsingular terminal sliding mode course controller based on an RBF neural network and a fuzzy algorithm was proposed in [32]: a disturbance observer compensates the disturbance to reduce the control gain, and the RBF neural network together with the fuzzy algorithm approximates the sign function to reduce switching chattering. A type-2 fuzzy neural network [33] was added to sliding mode control to handle system uncertainty effectively and enhance the nonlinear representation capability of the approach, thereby strengthening control performance. Similarly, for ship trajectory tracking, [19] used the MMG model for iterative sliding mode control design; the resulting control variables are the main engine speed and rudder angle, consistent with how modern merchant ships are actually driven, and a parameter group optimization unit [18] was integrated into the control system to improve the adaptability and robustness of the controller.

Inspired by these results, this paper uses the iterative sliding mode construction method to design a second-order iterative sliding mode surface based on the comprehensive "Fossen + MMG" motion model [34]. The heading control problem is thereby transformed into the stabilization of the sliding mode surface, and the heading control law is then derived with Lyapunov stability theory.

Because the control law contains uncertain system terms and unknown external disturbance terms, it cannot be computed directly. The uncertain system terms are therefore approximated by an RBF neural network, the bounds of the unknown external disturbance terms are estimated by adaptive control, and the adaptive iterative sliding mode control law is obtained. By defining a chattering measure of the control input and a reinforcement learning signal, the RBF neural network parameters are adjusted online with reinforcement learning to improve the network's approximation performance and further suppress chattering of the control variables.

In addition, the designed control system has a large number of design parameters whose combination strongly affects control performance, and subjective manual selection does not necessarily make the system robust. The particle swarm optimization algorithm is therefore integrated into the control system in a closed loop to optimize the control parameters online. The resulting control law can effectively cope with model uncertainty and sea-state disturbances. This design scheme, which fuses two optimization algorithms, lays a foundation for trajectory tracking control of under-actuated USVs under complex working conditions and for autonomous berthing control of under-actuated unmanned ships.

#### 2. Ship Motion Model

It is difficult to incorporate the ship's sailing speed and external disturbances into control design with the Nomoto model. The Fossen model takes the ship's longitudinal thrust and steering moment as controller outputs, which is convenient for designing and analyzing control forces and moments, but these quantities cannot be obtained directly. Using the MMG model for under-actuated ship motion control, by contrast, allows them to be converted further into the rudder angle and the diesel engine throttle, that is, the inputs of the control system. Since ship motion is driven by the diesel engine speed and the rudder angle, control variables designed with the MMG model are consistent with actual practice.

To exploit the advantages of both models, this article merges them. Starting from the Fossen model, only the longitudinal propeller thrust and the steering moment of the steering gear, which act in the two degrees of freedom of surge and yaw and play the main role in the MMG model, are considered; forces in the other degrees of freedom, which have relatively little influence on navigation, are treated as disturbance forces, for example the longitudinal and lateral forces generated after steering. Furthermore, the under-actuated ship motion model is transformed into an affine nonlinear system [35] with the rudder angle and propeller speed as control variables.

The longitudinal thrust of the propeller and the steering moment of the steering gear are expressed as follows, where is the thrust deduction fraction of the propeller; is the longitudinal coordinate of the point of action of the rudder lateral force; is the correction factor for the steering-induced lateral force on the hull; is the distance from the point of action of that hull lateral force to the center of the ship; and is the rudder angle of the steering gear. The expressions for and are as follows [36], where is the propeller speed; and are the seawater density and the propeller diameter, respectively; and are the propeller thrust coefficient and advance coefficient, respectively; is the rudder blade area; is the slope of the lift coefficient with respect to the angle of attack ; is the effective inflow velocity at the rudder blade; and is the effective angle of attack. In the range of , the approximation of can be used.
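The thrust and rudder-force expressions cited from [36] are missing from this text. As a minimal sketch, assuming the standard MMG forms, namely effective thrust $X_P=(1-t_P)\rho n^2 D_P^4 K_T(J_P)$, rudder normal force $F_N=\tfrac12\rho A_R f_\alpha U_R^2\sin\alpha_R$, and rudder-induced yaw moment $N_R=-(x_R+a_H x_H)F_N\cos\delta$, the quantities described above could be evaluated as follows. All function and symbol names are hypothetical, and $K_T$ is passed in as an already-evaluated number rather than computed from $J_P$.

```python
import math

RHO_SEAWATER = 1025.0  # kg/m^3, a typical seawater density (assumed value)


def propeller_surge_force(n, d_p, k_t, t_p, rho=RHO_SEAWATER):
    """Effective longitudinal thrust X_P = (1 - t_P) * rho * n^2 * D_P^4 * K_T.

    n: propeller speed [rev/s]; d_p: propeller diameter [m];
    k_t: thrust coefficient K_T(J_P), already evaluated; t_p: thrust deduction fraction.
    """
    thrust = rho * n ** 2 * d_p ** 4 * k_t
    return (1.0 - t_p) * thrust


def rudder_normal_force(a_r, f_alpha, u_r, alpha_r, rho=RHO_SEAWATER):
    """Rudder normal force F_N = 0.5 * rho * A_R * f_alpha * U_R^2 * sin(alpha_R)."""
    return 0.5 * rho * a_r * f_alpha * u_r ** 2 * math.sin(alpha_r)


def rudder_yaw_moment(f_n, delta, x_r, a_h, x_h):
    """Steering moment N_R = -(x_R + a_H * x_H) * F_N * cos(delta).

    x_r and x_h are longitudinal coordinates, negative aft of midship.
    """
    return -(x_r + a_h * x_h) * f_n * math.cos(delta)


if __name__ == "__main__":
    x_p = propeller_surge_force(n=10.0, d_p=0.5, k_t=0.2, t_p=0.2)
    f_n = rudder_normal_force(a_r=0.3, f_alpha=2.5, u_r=3.0, alpha_r=math.radians(10))
    n_r = rudder_yaw_moment(f_n, delta=math.radians(10), x_r=-2.5, a_h=0.2, x_h=-2.25)
    print(f"X_P = {x_p:.1f} N, F_N = {f_n:.1f} N, N_R = {n_r:.1f} N*m")
```

With the rudder aft of midship ($x_R<0$, $x_H<0$), a positive normal force produces a positive steering moment, which matches the sign convention assumed here.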

For an under-actuated USV with a single propeller and a single rudder, the comprehensive "Fossen + MMG" motion model is as follows, where and are the position coordinates of the ship's center; , , and represent the surge velocity, sway velocity, and yaw rate of the USV, respectively; is the heading angle of the USV; is the current speed; is the current direction; , , and are high-order hydrodynamic terms; , , and are inertial parameters including added mass; is the correction term for the viscous hydrodynamic moment of the hull; is the propeller gain; is the rudder control gain; and , , and represent the unmeasured external marine disturbance forces and moment, including wind, waves, and ocean currents. In actual engineering, the propeller and rudder gains are bounded, and their expressions are as follows, where is related to factors such as the ship type, the propeller size, and the propeller's position relative to the hull. It is difficult to calculate theoretically and is usually obtained from empirical formulas, from which it is known that ; generally varies within 0∼1; and are always positive, so is always positive. In addition, is the ship length; generally varies within 0∼1; varies within −0.5∼−0.4; according to the steering constraint , is a positive constant term; and , , and are all greater than zero, so is also a positive constant.

#### 3. Design of Course Tracking Control for USV with Iterative Sliding Mode

##### 3.1. Design of Iterative Sliding Mode Control with Hyperbolic Tangent Function Based on Lyapunov Stability

The under-actuated USV motion model in (3) is used to design an iterative sliding mode course tracking controller. The control objective is for the actual course of the USV to track its desired course , that is, . Defining the course deviation as and taking as the goal, course control is transformed into the stabilization of a scalar system.

The sliding mode determines both the control performance of the variable structure system and the control quality in the sliding phase. Since the rudder angle input of the USV is limited to (−35°, 35°), the control gain should be reduced appropriately when the deviation is large, to avoid saturating the steering gear, and increased appropriately when the deviation is small, so that the USV settles on the desired course as quickly as possible.

A linear sliding mode cannot meet both of these requirements. For this reason, the nonlinear hyperbolic tangent function, which has saturation characteristics, is introduced into the sliding mode design. The hyperbolic tangent function is expressed as

$$\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}. \tag{5}$$

From (5), the slope of the function is largest at x = 0; as |x| increases, the slope decreases nonlinearly and approaches zero. The hyperbolic tangent function therefore satisfies both the control objective and the steering rudder constraints. In addition, using the hyperbolic tangent function in the control design guarantees that the control signal is bounded, with explicitly known bounds.
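A quick numerical check of the two properties claimed above: the slope of tanh, $1-\tanh^2 x$, is largest at the origin and decays toward zero, and scaling tanh by a limit keeps the output inside that limit. The 35° scaling in the hypothetical `bounded_command` helper is an illustrative assumption tied to the rudder range mentioned earlier, not the paper's control law.

```python
import math


def tanh_slope(x: float) -> float:
    """Derivative of tanh: d/dx tanh(x) = 1 - tanh(x)**2, maximal (= 1) at x = 0."""
    return 1.0 - math.tanh(x) ** 2


def bounded_command(x: float, limit_deg: float = 35.0) -> float:
    """Map any real x into [-limit_deg, limit_deg] via a scaled tanh."""
    return limit_deg * math.tanh(x)


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(f"x={x:4.1f}  slope={tanh_slope(x):.4f}  command={bounded_command(x):7.3f} deg")
```

Large deviations therefore produce a gentle, saturated response, while small deviations see the full gain, which is the behavior the sliding mode design relies on.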

The nonlinear iterative sliding mode controller incorporating the hyperbolic tangent function is designed in the following steps.

The first step is to design a nonlinear sliding mode for the course deviation as follows, where is the sliding mode surface function of the heading deviation and and are positive real numbers.

It can be seen from (6) that when , ; that is, when is small, converges exponentially, and when is large, converges approximately linearly, close to . Together, and set the maximum slope of the nonlinear sliding mode. As a result, the heading control problem is transformed into the stabilization of .

The second step is to construct a first-order functional relationship between the sliding mode surface and the control rudder angle . Using the hyperbolic tangent function again, the second sliding mode is designed as follows, where and are positive real numbers.

Given (6), it follows from (7) that when , ; the control objective is thus further transformed into the stabilization of . To ensure that converges faster than , the parameters are preset so that .

Expanding (7) gives the following, where , .

The third step is to design the USV course tracking control law with Lyapunov stability theory. The following Lyapunov function is constructed:

From (9), it follows that

To make the system asymptotically stable, that is, to ensure that (10) is less than or equal to zero, the following form is constructed, where is a positive real number.

This yields

Therefore, if satisfies (12), the designed control system is asymptotically stable. This stability condition helps suppress chattering on the sliding mode surface and allows the system to adapt dynamically to uncertainty and external disturbances.
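Equations (9) to (12) are not reproduced in this text. As a hedged illustration of the reasoning they follow, the derivation below assumes the common quadratic candidate $V=\tfrac12 s^2$ for a generic sliding variable $s$ and an exponential reaching law with gain $k>0$; both choices are assumptions, and the paper's actual expressions may differ.

```latex
\begin{aligned}
V &= \tfrac{1}{2}s^{2} \ge 0, \qquad \dot V = s\,\dot s,\\
\dot s &= -k\,s,\quad k>0 \;\Rightarrow\; \dot V = -k\,s^{2} = -2kV \le 0,\\
V(t) &= V(0)\,e^{-2kt} \;\Rightarrow\; s(t) = s(0)\,e^{-kt} \to 0 .
\end{aligned}
```

Because $\dot V$ vanishes only at $s=0$, the sliding variable decays exponentially; the control law itself is then obtained by solving the chosen reaching law for the control input, which is the role of substituting (8) into (12).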

Finally, substituting (8) into (12) yields the rudder angle control law:

Under the action of the rudder angle control law in (13), the course control error of the under-actuated USV converges asymptotically, and eventually tends to zero.

However, because the nonlinear function of the system model contains high-order hydrodynamic terms, it cannot be computed directly, and the external disturbance cannot be measured directly either. The rudder angle control law in (13) therefore needs further improvement.

##### 3.2. Design of Adaptive Controller with RBF Neural Network and Adaptive Algorithm

Since neural networks have strong nonlinear approximation ability, an RBF neural network is introduced in this paper to approximate . In addition, an adaptive algorithm is adopted to estimate the bound of the unknown external disturbance, which varies as the course of the USV changes.

Based on these ideas, this section designs an adaptive-neural network iterative sliding mode course tracking control algorithm, whose architecture is shown in Figure 1.

In Figure 1, the system input is the desired course and the output is the actual course . The course deviation obtained through feedback is fed into the nonlinear iterative sliding mode controller, which computes the control rudder angle .

The terms and in the sliding mode control law (13) are estimated by the RBF neural network and the adaptive algorithm, respectively, and the two estimates are then substituted back into the control law.

The input-output relationship of the RBF neural network is as follows, where is the network output, is the ideal network weight, is the network approximation error, and is the Gaussian function, computed as follows, where is the network input and and are the parameters of the Gaussian function (its center and width).
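The Gaussian function is not reproduced above. A minimal sketch, assuming the widely used form $h_j(x)=\exp\!\left(-\lVert x-c_j\rVert^2/(2b_j^2)\right)$ with center $c_j$ and width $b_j$, and the output $\hat f(x)=W^{\mathsf T}h(x)$ (the approximation error term is dropped because it is unknown at run time). Function names and the example network size are hypothetical.

```python
import numpy as np


def gaussian_basis(x, centers, widths):
    """Hidden-layer outputs h_j(x) = exp(-||x - c_j||^2 / (2 * b_j^2)).

    x: input vector, shape (d,); centers: shape (m, d); widths: shape (m,).
    """
    sq_dist = np.sum((centers - x) ** 2, axis=1)  # ||x - c_j||^2 for every node j
    return np.exp(-sq_dist / (2.0 * widths ** 2))


def rbf_output(x, weights, centers, widths):
    """Network output f_hat(x) = W^T h(x); the approximation error term is omitted."""
    return float(weights @ gaussian_basis(x, centers, widths))


if __name__ == "__main__":
    # Hypothetical 2-input network with 5 nodes spread along the diagonal.
    centers = np.linspace(-1.0, 1.0, 5)[:, None] * np.ones((1, 2))
    widths = np.full(5, 0.5)
    weights = np.array([0.1, -0.2, 0.3, 0.0, 0.5])
    print(rbf_output(np.array([0.2, -0.1]), weights, centers, widths))
```

Each node responds most strongly near its center, and the width controls how quickly that response falls off, which is why Section 3.3.1 treats the widths as parameters worth learning.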

For the RBF network in Figure 1, the input is taken as , and the output is given as follows, where cannot be obtained in practice, so is used as the estimate of ; is the estimation error, that is, ; and is the estimate of obtained by the RBF neural network.

Substituting the RBF network output in (16) into (13) gives

Assumption: for the disturbance , the estimation error , and the approximation error of the RBF neural network, there exists an ideal bound such that holds.

Accordingly, the following adaptive rudder angle control law can be designed, where ; is a design parameter whose role is to suppress chattering of the control rudder angle; and is the estimate of .

Substituting the adaptive control law (18) into in (8) gives

It then follows that the relation below holds, where is the estimation error of , and .

Furthermore, using Lyapunov stability theory, adaptive laws are designed for the RBF neural network weights and the external disturbance bound.

The following global Lyapunov function is constructed, where and are design parameters.

Differentiating (21) gives

To ensure that (22) is nonpositive, the following adaptive laws are selected:

Substituting (23) into (22) gives

Therefore, under the rudder angle control law (18), the course tracking error of the under-actuated USV motion system converges gradually and eventually approaches zero, so the designed adaptive-neural network iterative sliding mode control system is stable.

##### 3.3. Parameter Optimization of Adaptive Course Tracking Controller

From the control characteristics of a USV, the magnitude of the course change is closely related to the magnitude of the rudder angle. In actual operation, excessive rudder angles should be avoided when steering, except in emergencies, to reduce the risk of heeling. If the controller design is unreasonable or its parameters are unsuitable, the frequency and amplitude of rudder fluctuation increase. Rudder chattering should therefore be minimized to prolong equipment life and improve control performance [15]. Hence, if the parameters of the nonlinear iterative sliding mode controller can be adjusted and optimized for different external environments and working conditions, the performance of the control system will improve.

In the controller designed in Section 3.2, only the weights of the RBF neural network can learn and be adjusted according to the adaptive law; the Gaussian centers and widths lack this ability. Similarly, the seven control parameters (, , , , , , , and ) in the rudder angle control law cannot be adjusted. Every parameter in the controller affects control performance to some degree, and because there are many of them, they are hard to optimize fully by hand, which makes the expected goals difficult to reach [17].

Therefore, to improve control performance and extend the service life of the system, reinforcement learning [37, 38] and the particle swarm algorithm [39, 40] are applied to optimize the RBF network parameters and the design parameters of the iterative sliding mode control system, respectively. The optimized outputs are then integrated into the nonlinear iterative sliding mode course tracking control system for the USV, giving the designed system adaptive capability.

###### 3.3.1. Self-Learning of Parameters in RBF Neural Network

Parameter learning of an RBF neural network is usually realized with a supervised (tutor-based) learning algorithm [41]. However, the expected value of the unknown model term *f* under different working conditions is not known, so no explicit tutor signal is available. This article therefore replaces supervised learning with the reinforcement learning algorithm in [42]. Using relatively coarse training data and relying only on an evaluation signal, that is, the reinforcement signal, the algorithm assesses the quality of the control effect through interaction with the environment and strengthens the RBF network through "reward" and "penalty," improving its approximation performance. In this section, the reinforcement signal is calculated from the chattering measure of , and the reinforcement learning method above is used to realize self-learning of the RBF neural network parameters.

In each learning cycle, each neural network output corresponds to a variation of the width parameter of one Gaussian function, and the width parameter is adjusted as follows, where represents the *j*th width parameter at time ; is the width parameter in the next cycle; and is the variation of the *j*th width parameter at time .

The output error of the RBF neural network is defined as follows, where is the expected value of the unknown .

Since the control effect of the system can indirectly reflect whether the parameters are appropriate, the chattering of the control rudder angle is used to approximate this error [43].

Here, is the expected rudder-angle chattering, whose absolute value is the sum of the control changes over *n* iteration periods.

Over *n* cycles, if increases, increases; otherwise, decreases. The real-time change of is thus used to adjust the parameters of the neural network dynamically. is calculated as follows [19], where is the period, and are the rudder angles in adjacent periods, indicates the number of accumulated periods, and is defined as follows.

The trend of the system output changes when chattering occurs. The chattering in the last cycle is obtained from (28).
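The exact chattering formula and its auxiliary function are not reproduced here. One plausible reading, assumed for illustration only, is that the measure accumulates the absolute rudder-angle change between adjacent control periods over the last *n* periods. The function name and argument layout are hypothetical.

```python
from typing import Sequence


def chattering_measure(rudder_angles: Sequence[float], n: int) -> float:
    """Accumulated |delta(k) - delta(k-1)| over the last n control periods.

    rudder_angles: rudder-angle history sampled once per control period, oldest first.
    n: number of accumulated periods.
    """
    if n <= 0 or len(rudder_angles) < 2:
        return 0.0
    recent = list(rudder_angles)[-(n + 1):]  # n differences need n + 1 samples
    return float(sum(abs(curr - prev) for prev, curr in zip(recent, recent[1:])))


if __name__ == "__main__":
    smooth = [0.0, 1.0, 2.0, 3.0, 4.0]     # steady rudder ramp
    jittery = [0.0, 3.0, -2.0, 4.0, -1.0]  # rudder reversing every period
    print(chattering_measure(smooth, 4), chattering_measure(jittery, 4))  # prints 4.0 19.0
```

Both histories end near the same rudder angle, but the reversing sequence scores almost five times higher, which is the kind of contrast the reinforcement signal is meant to pick up.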

The variation of the width parameter is then expressed as follows, where is the learning rate factor.

The learning process of the *j*th width parameter can thus be calculated as

Similarly, the learning algorithm for the center parameter of the Gaussian function is
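The update $b_j(k+1)=b_j(k)+\Delta b_j(k)$ can be sketched as below. The variation term used here, $\Delta b_j=-\eta\,e\,h_j$ with $e$ the measured chattering minus the expected chattering, is a hypothetical stand-in for the missing expressions, chosen only to show the reward/penalty mechanics; the paper's actual variation term may differ. The same pattern would apply to the center parameters.

```python
import numpy as np


def update_widths(widths, activations, chattering, expected_chattering,
                  eta=0.01, min_width=1e-3):
    """One learning cycle b_j(k+1) = b_j(k) + delta_b_j(k) for every Gaussian node.

    HYPOTHETICAL variation term (not taken from the paper):
        delta_b_j = -eta * e * h_j,  with e = chattering - expected_chattering,
    so e > 0 acts as a penalty and e < 0 as a reward, scaled by node activation h_j.
    """
    error = chattering - expected_chattering
    delta = -eta * error * np.asarray(activations, dtype=float)
    # Clamp so every width stays strictly positive and the Gaussian remains defined.
    return np.maximum(np.asarray(widths, dtype=float) + delta, min_width)


if __name__ == "__main__":
    widths = np.array([0.5, 0.5, 0.5])
    activations = np.array([0.9, 0.4, 0.1])  # h_j(x) from the current input
    print(update_widths(widths, activations, chattering=3.0, expected_chattering=1.0))
```

Scaling by the activation means that nodes contributing most to the current output receive the largest correction, a common design choice in gradient-free tuning of RBF networks.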

###### 3.3.2. Parameter Optimization of Iterative Sliding Mode Controller

The purpose of parameter optimization for the iterative sliding mode controller is to obtain a suitable set of initial parameters (, , , , , , and ) by constructing a performance index and finding its optimum. The ITAE index [44] is chosen:

$$J=\int_{0}^{\infty} t\,\lvert e(t)\rvert\,dt$$

where $J$ is the defined index, $t$ is time, and $e(t)$ is the course tracking error.
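In practice the ITAE integral is evaluated on sampled data over a finite horizon. A minimal sketch using the trapezoidal rule; the function name, the 60 s horizon, and the exponential error profiles in the usage block are illustrative assumptions.

```python
import numpy as np


def itae(t, e):
    """ITAE = integral of t * |e(t)| dt, via the trapezoidal rule on samples (t_k, e_k)."""
    t = np.asarray(t, dtype=float)
    y = t * np.abs(np.asarray(e, dtype=float))
    # Trapezoidal rule written out explicitly to avoid version-specific NumPy helpers.
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)


if __name__ == "__main__":
    t = np.linspace(0.0, 60.0, 601)           # 60 s horizon, 0.1 s step (illustrative)
    e_fast = 10.0 * np.exp(-t / 5.0)          # course error decaying quickly [deg]
    e_slow = 10.0 * np.exp(-t / 15.0)         # course error decaying slowly [deg]
    print(itae(t, e_fast), itae(t, e_slow))   # the slower response scores worse
```

The time weighting $t$ is what makes ITAE penalize errors that persist late in the response, so parameter sets producing sluggish or oscillating course tracking receive a higher cost.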

The optimization aims to minimize the ITAE index of the control system and to find the optimal parameter combination through repeated simulation experiments. The complete desired course of the USV is used as the optimization data in each run, and searching for the best parameter combination by computer is more efficient than manual tuning.
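In simulation, the ITAE integral is evaluated on sampled data over a finite horizon. The sketch below approximates it with the trapezoidal rule; the function name, the 500 s horizon, the 0.1 s step, and the exponential test error are illustrative assumptions. For an error $e(t) = e_0 e^{-t/\tau}$, the exact value over an infinite horizon is $e_0 \tau^2$, which gives a quick check.

```python
import numpy as np

def itae(t, e):
    """ITAE index: integral of t * |e(t)| dt over the sampled horizon (trapezoidal rule)."""
    t = np.asarray(t, dtype=float)
    y = t * np.abs(np.asarray(e, dtype=float))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

t = np.arange(0.0, 500.0 + 1e-9, 0.1)   # one 500 s course period, 0.1 s step
e = 30.0 * np.exp(-t / 20.0)            # hypothetical decaying course error [deg]
J = itae(t, e)                          # close to 30 * 20**2 = 12000
```

Because the error is weighted by time, a slowly decaying residual error is penalized far more than an early transient of the same size, which is why ITAE favors fast settling without long tails.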

Particle swarm optimization (PSO) [45] was proposed in 1995. PSO searches for the optimal solution in a $D$-dimensional search space with a population of $N$ particles. The *i*th particle is represented by the vector $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, its position in the search space, which represents a potential optimal solution; the fitness value of each position is calculated with the designed fitness function. Similarly, the velocity of the *i*th particle is $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$, its individual extreme value is $P_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$, and the extreme value of the population is $P_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$.

During optimization, the velocity of each particle is adjusted dynamically according to its own experience and the movement of the other particles, so that the swarm searches the solution space [46]. The velocity and position of each particle are updated by

$$v_{id}^{k+1} = \omega v_{id}^{k} + c_1 r_1 \left(p_{id} - x_{id}^{k}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{k}\right),$$

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1},$$

where $\omega$ is the inertia factor; $d = 1, 2, \ldots, D$ and $i = 1, 2, \ldots, N$; $k$ is the iteration number; $v_{id}$ is the particle velocity; $c_1$ and $c_2$ are acceleration factors; and $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1].
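The update rule above can be written in vectorized form, with each row of the arrays holding one particle. This is a minimal sketch assuming NumPy; the function name is hypothetical, and `Generator.random` draws from the half-open interval [0, 1), a negligible difference from [0, 1] in practice.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w, c1, c2, rng):
    """One inertia-weight PSO update.
    x, v, pbest: arrays of shape (N, D); gbest: array of shape (D,)."""
    r1 = rng.random(x.shape)       # fresh random numbers per particle and dimension
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x_new = x + v_new              # position moves by the new velocity
    return x_new, v_new

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(5, 3))   # 5 particles in a 3-dimensional space
v = np.zeros_like(x)
x1, v1 = pso_step(x, v, pbest=x, gbest=x[0], w=0.7, c1=2.0, c2=2.0, rng=rng)
```

Drawing `r1` and `r2` per dimension, rather than once per particle, is the usual choice and keeps the search from being confined to a single line through the particle's attractors.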

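To prevent blind search, the velocity update is adjusted according to the PSO with shrinkage (constriction) factor in [47], and (34) becomes

$$v_{id}^{k+1} = \chi\left[v_{id}^{k} + c_1 r_1 \left(p_{id} - x_{id}^{k}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{k}\right)\right],$$

where

$$\chi = \frac{2}{\left|2 - \varphi - \sqrt{\varphi^{2} - 4\varphi}\right|}, \qquad \varphi = c_1 + c_2, \quad \varphi > 4,$$

and $\chi$ is the shrinkage factor, which controls the convergence speed of the swarm so that PSO obtains high-quality solutions. However, once $c_1$ and $c_2$ are chosen by hand, $\chi$ is also determined and fixed.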
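For the common choice $c_1 = c_2 = 2.05$ ($\varphi = 4.1$), the shrinkage factor evaluates to about 0.7298. The sketch below computes $\chi$ and applies the constricted velocity update for a single coordinate; the function names and the example values of $c_1$ and $c_2$ are illustrative assumptions, not the article's settings.

```python
import math

def shrinkage_factor(c1, c2):
    """chi = 2 / |2 - phi - sqrt(phi^2 - 4 phi)| with phi = c1 + c2 (requires phi > 4)."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("the shrinkage factor requires c1 + c2 > 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))

def constricted_velocity(v, x, p_i, p_g, c1, c2, r1, r2):
    """v(k+1) = chi * [v(k) + c1 r1 (p_i - x) + c2 r2 (p_g - x)] for one coordinate."""
    chi = shrinkage_factor(c1, c2)
    return chi * (v + c1 * r1 * (p_i - x) + c2 * r2 * (p_g - x))

chi = shrinkage_factor(2.05, 2.05)    # approximately 0.7298
```

Because $\chi$ depends only on $c_1 + c_2$, fixing the acceleration factors by hand fixes $\chi$ as well.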

In the later stage of the search, PSO slows down and easily falls into local extrema [48]. The method in [47], known as the fixed contraction factor method, limits the effectiveness of PSO. To address this problem, this article introduces a chaotic model to improve PSO. The chaos optimization algorithm (COA) is sensitive to initial values and offers high calculation accuracy and global convergence [49], so it is used to adjust the PSO parameters through the logistic mapping model $z_{k+1} = \mu z_k (1 - z_k)$ [50]. The chaotic variable is mapped into the interval between the upper and lower bounds of the inertia factor to give the adjusted inertia factor, and the other adjusted parameters are likewise kept between their own upper and lower bounds.
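A chaotic sequence from the logistic map can be mapped linearly into the bounds of the inertia factor. This is a minimal sketch under assumptions: $\mu = 4$ (the fully chaotic regime), the linear mapping, the seed value, and the bounds 0.4 and 0.9 are illustrative choices rather than the article's settings, and the function names are hypothetical.

```python
def logistic_sequence(z0, steps, mu=4.0):
    """Iterate z_{k+1} = mu * z_k * (1 - z_k); mu = 4 gives chaotic behaviour on [0, 1]."""
    if not 0.0 < z0 < 1.0 or z0 in (0.25, 0.5, 0.75):
        raise ValueError("z0 must lie in (0, 1) and avoid points that collapse to a fixed point")
    seq = [z0]
    for _ in range(steps - 1):
        seq.append(mu * seq[-1] * (1.0 - seq[-1]))
    return seq

def chaotic_inertia(z, w_min=0.4, w_max=0.9):
    """Map a chaotic value z in [0, 1] linearly into [w_min, w_max] (assumed mapping)."""
    return w_min + (w_max - w_min) * z

seq = logistic_sequence(0.1, 50)            # one value per iteration, 50 iterations
weights = [chaotic_inertia(z) for z in seq]
```

Unlike a fixed or linearly decreasing inertia factor, the chaotic sequence keeps revisiting both large and small values, which helps the swarm escape local extrema late in the search.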

In this process, the quantities exchanged between the chaos particle swarm optimization (CPSO) module and the SIMULINK module are the particles and their fitness values. First, the chaotic particle swarm produces particles; next, each particle is assigned directly to the parameters of the course tracking controller; then, the control system in the SIMULINK module is run with the USV motion model to obtain the evaluation index; the index is sent back to the CPSO module; finally, the termination condition is checked. The specific procedure is as follows:

Step 1. Initialize the CPSO parameters, randomly generate the positions and velocities of all particles, and compute the initial quantities used in equations (34) and (35).

Step 2. Assign each particle of the chaotic swarm to the controller parameters, run the USV course tracking control, and calculate the fitness value with the performance index function.

Step 3. For each particle, take the position with the best fitness value along its trajectory as its current individual extreme value.

Step 4. Among all particles of the chaotic swarm, take the position with the best fitness value as the current population extreme value.

Step 5. Check the termination condition. If it is not satisfied, return to Step 2; otherwise, end the parameter optimization.
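The loop of Steps 1 to 5 can be sketched end to end. In the sketch below, a hypothetical first-order Nomoto heading model with a PD rudder law stands in for the SIMULINK model and the iterative sliding mode controller, the fitness is the ITAE of the course error, and the inertia factor follows a logistic-map sequence. All model constants, gain bounds, swarm settings, and function names are illustrative assumptions; only the 50-iteration limit and the 30° course come from Section 4.1.1. To keep the sketch short, it uses a chaotic inertia factor instead of the shrinkage factor.

```python
import numpy as np

def simulate_itae(params, K=0.5, T=10.0, psi_d=30.0, t_end=150.0, dt=0.2):
    """Hypothetical stand-in for one SIMULINK run: first-order Nomoto model
    T*r' + r = K*delta, psi' = r, steered by a PD rudder law (kp, kd).
    Returns the ITAE of the course error (rectangle rule)."""
    kp, kd = float(params[0]), float(params[1])
    psi, r, itae = 0.0, 0.0, 0.0
    for k in range(int(t_end / dt)):
        e = psi_d - psi                                  # course error [deg]
        delta = max(-35.0, min(35.0, kp * e - kd * r))   # rudder limit [deg]
        r += dt * (K * delta - r) / T                    # Euler step, yaw rate
        psi += dt * r                                    # Euler step, heading
        itae += (k * dt) * abs(e) * dt
    return itae

def chaotic_pso(fitness, lower, upper, n_particles=10, iters=50, c1=2.0, c2=2.0,
                w_min=0.4, w_max=0.9, z0=0.37, seed=0):
    """PSO with a logistic-map inertia factor; returns (best_x, best_f, history)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    v_max = 0.2 * (upper - lower)                        # velocity clamp (assumption)
    # Step 1: random initial positions and velocities
    x = lower + rng.random((n_particles, lower.size)) * (upper - lower)
    v = rng.uniform(-v_max, v_max, size=x.shape)
    # Step 2: each particle sets the controller gains; simulate and score
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = int(np.argmin(pbest_f))
    history, z = [float(pbest_f[g])], z0
    for _ in range(iters):                               # Step 5: fixed iteration budget
        z = 4.0 * z * (1.0 - z)                          # logistic map
        w = w_min + (w_max - w_min) * z                  # chaotic inertia factor
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (pbest[g] - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, lower, upper)
        f = np.array([fitness(p) for p in x])            # Step 2 again
        better = f < pbest_f                             # Step 3: individual extrema
        pbest[better] = x[better]
        pbest_f[better] = f[better]
        g = int(np.argmin(pbest_f))                      # Step 4: population extremum
        history.append(float(pbest_f[g]))
    return pbest[g].copy(), float(pbest_f[g]), history

best_gains, best_itae, history = chaotic_pso(simulate_itae, [0.1, 0.0], [5.0, 50.0])
```

Replacing `simulate_itae` with a call into the actual closed-loop model leaves the optimization loop unchanged, which is the point of keeping the CPSO and simulation modules separate.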

#### 4. Computer Simulation

In actual navigation, the USV must perform tasks under different working conditions. Course keeping alone cannot meet these requirements, so variable-course tracking control is needed. To verify the effectiveness of the designed USV course tracking controller based on neural network reinforcement learning and the chaotic particle swarm optimization algorithm, simulation experiments are performed on the USV “Lanxin” from [17].

##### 4.1. Variable Course Tracking Experiment

###### 4.1.1. Experimental Initial Environment and Parameter Settings

In the simulation, the desired course of the USV alternates between 30° and 0° with a period of 500 s. The initial course of the ship is 0°, and the initial speed is 8.5 kn. The external marine disturbances are set as follows: wind speed 10 m/s and wind direction 110°; current velocity 2 m/s and current direction 110°; and wave height 3 m with an encounter frequency of 0.75 Hz.

The RBF network has 41 hidden-layer nodes. The center component c1 is evenly distributed on [−20, 20], c2 on [−1, 1], c3 on [−0.5, 0.5], and c4 on [−0.25, 0.25]. For the chaotic particle swarm, the maximum number of iterations is 50.

###### 4.1.2. Results and Analysis of Tracking Control Experiments for Variable Course

The designed USV course tracking controller based on reinforcement learning and the chaotic particle swarm optimization algorithm is applied to variable-course tracking control experiments on the under-driven USV under the conditions preset in Section 4.1.1. The results are shown in Figures 3–6.

Figure 3 shows the desired and actual course curves of the under-driven USV. The actual course fully tracks the desired heading at about 170 s and holds it thereafter, and the output curve is smooth without chattering. The effectiveness and stability of the designed course tracking controller are thus verified from the perspective of heading output.

Figure 4 shows the controller input, that is, the rudder angle control input. The rudder angle curve is smooth and continuous without chattering. When the course error approaches zero, the control rudder angle also approaches zero, so the rudder is used reasonably; moreover, the peak rudder angle is small and the transition is relatively gentle, which benefits the service life of the steering gear. The adaptability and anti-disturbance capability of the designed course tracking controller are thus verified from the perspective of rudder angle input.

Figure 5 shows the neural network approximation curve. Within one heading cycle, the RBF network approximates the unknown term of the model above, and within 100 s after each heading change the error is only about 0.1. This verifies that the RBF neural network trained by reinforcement learning provides fast and stable approximation.

Figure 6 shows the actual value and the estimated bound of the sum of the external environmental disturbances and the approximation error. Each time the course changes, the wind and wave disturbance changes, and the estimated bound changes with it. The designed adaptive law for the variable bound therefore estimates the disturbance well, which verifies the robustness of the designed course tracking controller.

Taken together, the results in Figures 3–6 verify the effectiveness and practicality of the designed USV course tracking controller based on neural network reinforcement learning and the chaotic particle swarm optimization algorithm.

##### 4.2. Comparative Tracking Experiment of Variable Course

###### 4.2.1. Experimental Initial Environment and Parameter Settings

In the simulation, the desired course of the USV alternates between 30° and 0° with a period of 500 s. The initial course of the ship is 0°, and the initial speed is 8.5 kn. The external marine disturbances are set as follows: wind speed 10 m/s and wind direction 110°; current velocity 2 m/s and current direction 110°; and wave height 3 m with an encounter frequency of 0.75 Hz.

The RBF network has 41 hidden-layer nodes. The center component c1 is evenly distributed on [−20, 20], c2 on [−1, 1], c3 on [−0.5, 0.5], and c4 on [−0.25, 0.25]. For the chaotic particle swarm, the maximum number of iterations is 50.

###### 4.2.2. Results and Analysis of Comparative Tracking Control Experiments for Variable Course

To further highlight the advantages of the designed USV course tracking controller based on neural network reinforcement learning and the chaotic particle swarm optimization algorithm, a comparative tracking control experiment is carried out in this part.

In our previous research [16], three control algorithms were ranked by three performance indicators, “stabilization time,” “overshoot,” and “chattering”: the intelligent control algorithm based on an IGA-optimized RBF neural network performed best, followed by the sliding mode control algorithm based on a fuzzy neural network, and then the sliding mode control algorithm based on an RBF neural network. Specifically, [16] used the same ship model parameters and a step input signal for simulation. Figure 7 shows the heading output and the control rudder angle input of the three methods; under the same conditions, the intelligent control algorithm based on the IGA-optimized RBF neural network achieves the highest tracking accuracy of the three.

A comparative tracking control experiment is therefore performed between the proposed control system and the control system designed in [16]. Both control systems are applied to the variable-course tracking experiment under the conditions preset in Section 4.2.1, and the results are shown in Figures 8 and 9.

Figure 8 shows the course output curves of the two control algorithms applied to the course tracking control of the under-driven USV. Under the same conditions, including external disturbances from wind, waves, and current, both track the desired heading stably. However, the rise time of the proposed controller is 15 s shorter than that of the controller in [16]; the course output of the proposed controller is stable without overshoot, whereas the latter shows slight overshoot; and the proposed controller rejects external disturbances better. The stability and rapidity of the proposed control system are therefore better than those of the control system based on the improved genetic algorithm.

Figure 9 shows the rudder angle input curves of the two control algorithms. The control input of the proposed controller is smooth without chattering, whereas that of the controller in [16] shows a small amount of chattering. The proposed controller thus further suppresses the chattering of the control system, and its rudder angle chattering is weaker than that of the controller in [16].

In practice, weaker rudder angle chattering means a lower load, which helps protect the steering gear. The durability and energy-saving performance of the designed course tracking controller are therefore verified from the perspective of steering gear protection.

Taken together, Figures 8 and 9 show that the adaptive and robust performance of the proposed USV control system is further improved compared with the control system designed based on the improved genetic algorithm.

#### 5. Conclusion

To address the course tracking control problem of an under-driven USV in a complex external environment, the control law is designed by constructing an iterative sliding mode function and applying Lyapunov stability theory. Because the uncertainty and disturbance terms in the control law are difficult to calculate directly, an RBF neural network is introduced to approximate the system uncertainty, and adaptive control is used to estimate the bound of the unknown external disturbance. In addition, the particle swarm optimization algorithm with a shrinkage factor is integrated with the control system in a closed loop, which optimizes multiple control parameters together, avoids the uncertain influence of manually selected parameters, and significantly improves the robustness of the control system.

The comparative tracking experiment verifies that the adaptive and robust performance of the proposed USV control system is further improved compared with the control system designed based on the improved genetic algorithm. This further indicates that fusing multiple algorithms improves the performance of the control system.

#### Data Availability

The relevant data used for experimental verification have been listed in the text.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest, financial or otherwise.

#### Authors’ Contributions

All authors contributed equally.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant nos. 52071201 and 51909155 and supported by the Basic Science (Natural Science) Research Project of Universities in Jiangsu Province under Grant no. 21KJB580009 and the “Blue Project” for outstanding young backbone teacher talent project in Jiangsu Province in 2021.