A Novel Particle Swarm Optimization Algorithm Based on Reinforcement Learning Mechanism for AUV Path Planning
In order to solve the problems of rapid path planning and effective obstacle avoidance for autonomous underwater vehicle (AUV) in 2D underwater environment, this paper proposes a path planning algorithm based on reinforcement learning mechanism and particle swarm optimization (RMPSO). A feedback mechanism of reinforcement learning is embedded into the particle swarm optimization (PSO) algorithm by using the proposed RMPSO to improve the convergence speed and adaptive ability of the PSO. Then, the RMPSO integrates the velocity synthesis method with the Bezier curve to eliminate the influence of ocean currents and save energy for AUV. Finally, the path is developed rapidly and obstacles are avoided effectively by using the RMPSO. Simulation and experiment results show the superiority of the proposed method compared with traditional methods.
Autonomous underwater vehicle (AUV) has now become a hotspot area in recent years, especially the multi-AUV system due to the high parallelism, robustness, and collaboration of high efficiency [1–6]. The path planning and obstacle avoidance of AUV are the fundamental issues in the path planning research field. Therefore, a path planning algorithm is applied to plan an effective path and to avoid obstacles autonomously in complex underwater environment, and it is still an open challenging issue in AUV [7–9].
There are a number of achievements reported about studies of path planning in underwater environments. The AUV path planning algorithm can be divided into local path planning and global path planning according to the environment information. Local path planning, like rolling window algorithm  and artificial potential field method [11, 12], aims to avoid the obstacles quickly when a robot’s sensor detects the surrounding obstacles. Global path planning, such as A algorithm [13, 14], fast search algorithm , Dijkstra algorithm , probabilistic roadmaps , estimation of distribution algorithm (EDA) based approach , is a kind of path planning method used when the map environment is known.
Particle swarm optimization (PSO) is a well-known evolutionary algorithm that is generally regarded as an effective optimization method to solve path planning problems . Roberge , Yan , and Kang  have adopted the PSO algorithm for obstacle avoidance and trajectory optimization of AUV path planning. Simulation experiments show that the PSO algorithm has excellent robustness and a fast convergence speed. However, the local optimal solution is obtained for the PSO algorithm. It is fast for the convergence speed of the algorithm in the initial stage of the search, and in the later stage of the search, the convergence speed of the algorithm is slow. Thus, a few improvements of this method have emerged. A cubic spline optimization algorithm based on an improved PSO algorithm is proposed to address multi-AUV path search problems. The path search is regarded as the parameter optimization of a particular cubic spline. In this manner, the convergence of the algorithm can be significantly improved [23, 24]. Compared with the standard PSO algorithm, the new hybrid PSO-LPM algorithm  for an AUV can find better trajectories and successfully implement real-time avoidance of static obstacles and moving obstacles. However, the effect of ocean current is not considered in these algorithms. Zeng  employed the quantum particle swarm optimization (QPSO) algorithm for the path search of AUVs. Although the navigation of AUV in ocean current is studied, the problem of convergence rate of the algorithm still exists. The problem of convergence rate is that the convergence rate is not considered for the exiting PSO algorithm while the ocean current influence on AUV is studied. However, for the proposed RMPSO in the manuscript, not only is the ocean current influence on the AUV convergence speed considered, but also the convergence speed is improved substantially, so the path planning is acquired quickly.
This paper proposes an improved particle swarm optimization algorithm based on the reinforcement mechanism (RMPSO) integrating reinforcement learning mechanism with particle swarm optimization algorithm. The inertia weight ω can determine the speed of convergence of the particle swarm algorithm. The value of ω is reduced linearly with time for the traditional PSO which cannot adjust the value of ω adaptively. The introduction of the reinforcement mechanism can adjust the value of ω adaptively and make the algorithm converge faster. The contributions and novelties of the proposed algorithm can be summarized as follows.(1)In order to make the model of environment more accurate and reduce the amount of calculation, a convex polygon obstacle is built firstly, and then the area of obstacle is extended to form a dangerous area. The MAKLINK phase-free network diagram is constructed by using the dangerous area.(2)In order to save energy consumption of AUV, an ocean current evaluation function is proposed and it is combined with velocity synthesis method. According to ocean current evaluation function, the new fitness function is introduced in the proposed RMPSO to implement obstacle avoidance and estimate the statement of particles.(3)By combining reinforcement learning mechanism with PSO, the parameter of particle swarm optimization can be adjusted adaptively according to the surrounding environment. The simulation results demonstrate the effectiveness of the proposed method compared with traditional methods.
The rest of the paper is organized as follows. Section 2 introduces the AUV underwater path planning problem and the thinking of applying PSO to the MAKLINK diagram. Section 3 presents the RMPSO algorithm for path planning and energy saving. Simulation results and experimental tests are provided in Section 4. Finally, the conclusions and future work are given in Section 5.
2. Problem Statement and Preliminaries
Our mission is to conduct a detailed investigation for the suspicious target point through the AUV in the underwater environment. The underwater environment is harsh and complex with a number of obstacles and ocean currents. The algorithm of path planning and obstacle avoidance, after the AUV receives the information of targets, is discussed in detail. The underwater environment is modeled as shown in Figure 1. Because the sizes of AUV and target appear very small in the vast ocean, both of them are regarded as mass points and their shapes are neglected. In Figure 1, the black area indicates obstacles and the red pentagram represents the suspicious target. There are some arrow lines indicating the ocean current. In this environment, the ocean current velocity is generally less than 1 knot, which is in the region where the AUV can be balanced to reach the target directly. A constant ocean current model is introduced to represent the ocean current effect on AUV.
In order to guarantee that the AUV accomplishes the task and gets access to the target successfully, it is necessary to ensure that obstacles can be avoided and the influence of ocean current on AUV can be eliminated. Therefore, the RMPSO algorithm is proposed, and it can deal with ocean current and plan a safe trajectory rapidly.
2.1. Modeling of Underwater Environment
Since the obstacles in the marine environment are basically irregular concave convex polygons, the MAKLINK graph theory for environmental model is applied to model the obstacles more accurately.
2.1.1. Modeling of Obstacles
To avoid obstacles effectively when the AUV works, a convex polygon obstacle model is established by using the Graham algorithm  according to the boundary information of the obstacle. The dotted line is shown in Figure 2(a). The area of obstacle is extended, and this extended distance is m. The solid line range is shown in Figure 2(a), the space between the dotted line and the solid line is named Danger Zone. Lastly, the distance between AUV and obstacle meets .
2.1.2. Modeling of Two-Dimensional Map
Assume that there are m obstacles in the map and the ith obstacle has n vertices. The mathematical model of obstacle can be expressed as follows:m, n are the numbers of obstacles and vertices, respectively; means that the ith obstacle is the jth vertex coordinate, i = 1, 2, 3,…, m and j = 1, 2, 3,…, n.
The working environment of AUV can be modeled aswhere B represents the environmental boundary.
A 2D map with AUV is shown in Figure 2(b), and there are four polygon obstacles, so m is set as 4, where yellow and black area represent Danger Zone and obstacles, respectively. The dotted line in Figure 2(b) represents the phase-free network graph based on MAKLINK graph theory. The MAKLINK graph algorithm is introduced, shown as Algorithm 1.
2.2. Thinking in PSO
The PSO, as a kind of evolutionary algorithm, imitates birds' foraging behavior to solve the optimization problem . Particles optimize themselves continuously by tracking two extremums in each iteration. The first extremum is the optimal solution found by each particle itself, which is called individual extreme value. The second extremum is the optimal solution found by all particles at present, which is called global extremum value. The location of the particle will be close to the two peak positions, namely, individual extreme value and global extremum value, and the optimal position will be searched.
Assume that there is a d-dimensional search space, and there are two kinds of attributes for each particle, namely, current position and velocity , and are given:where is similar to , , and ; is the dth dimension of the ith particle’s velocity. It is limited to the interval to avoid the explosion of the particles. and represent the individual extreme value and the global extremum value, respectively. Coefficients and are two pseudorandom scalar values. The superscript in (3) denotes the tth iteration. The acceleration coefficients and are 2 for almost all applications. The factor ω is the inertial weight, and this inertial weight plays the role of balancing the global search (large inertial weight) with the local search (small inertial weight). The performance of the algorithm is improved by adaptation parameter during the optimization process significantly. The flow chart of the PSO algorithm applied in path planning is shown in Figure 3.
3. Main Algorithm
In order to further explain how the PSO algorithm is improved into RMPSO, mathematical models and formulas with some discussions are given.
3.1. Fitness Function
The evaluation of particles position is determined by fitness function; then, the PSO algorithm optimizes the position of particles according to fitness value of particles. The traditional fitness function is given:where
Equation (5) gives the Euclid distance of the particle generation path, where and represent the ith particle in the dth dimension.(1)Penalty function: The penalty function P (i) is introduced in (7) to make sure that AUV is not close to obstacles and is defined as follows: where E is a positive constant and it is far greater than the fitness value of other particles. When obstacles are not traversed by a path formed by particles, P (i) = E. Otherwise, P (i) = 0. The fitness function is updated: where 0.5 and 100 can be obtained by adjusting parameters of program. Equation (8) effectively prevents the particles from producing obstacles when the PSO algorithm iterates.(2)Ocean current evaluation function: After obstacles can be avoided by AUV successfully, in the next step, the ocean current evaluation function combined with the velocity synthesis approach  is designed to estimate the influence of ocean current on AUV.
As shown in Figure 4, a moving coordinate system is established. A velocity vector Vc in the moving coordinate system represents the ocean current speed affecting the AUV. VE and VA represent the vectors of synthesis and AUV’s speed. The angles between VA/VE/VC and X-axis are defined as α3, α2, α1, respectively.
Where Vcn is the vertical component of ocean current on VE, and Van is the vertical component of AUV on VE. When Vcn and Van are opposite, the side effects of ocean current will be canceled, namely, . When the ocean current speed is known, the ocean current can be used for path planning by adjusting the speed of AUV and the direction of each road section. The desired speed VE and path angle α3 of AUV can be figured out:
Assume that the speed of AUV is known, then the PSO generates the absolute value of the difference between the expected angle and the actual angle of each path, and the absolute value is . Considering that each path length will affect the energy consumption of AUV, the angle difference and path length are considered comprehensively, and the ocean current evaluation function of each path segment is . The path ocean current evaluation function generated by each particle can be defined as follows:where α3 (id), α (id) and X (id) are the expected angle, actual angle, and path length of the dth segment path generated by the ith particle, respectively. The dimension of particles is D, which indicates the number of path points generated by each particle in the map. The total number of path points including starting point and ending point is d + 2, and the number of path segments generated is d + 1. When the ocean current is taken into consideration, the fitness function is updated:
According to (11), not only are obstacles effectively avoided, but also the influence on AUV of ocean current is taken into account.
3.2. PSO Weight Updating Function
The traditional PSO algorithm has fast convergence speed in the early stage and slow convergence speed in the later stage, and it cannot adjust ω adaptively. The increase of ω appropriately can improve the global search ability of the algorithm, and the decrease of ω improves the local search ability. Therefore, a reinforcement learning mechanism [30, 31] is integrated with PSO to overcome the shortcomings of the PSO algorithm and plan the optimal path rapidly.(1)Reinforcement learning mechanism: As shown in Figure 5, F (i) is the adaptation value function that can calculate the fitness value of each particle according to the environment information, and F (i) can also calculate the global optimal solution and local optimal solution of the particle population. Then, the inertia weights ω (i) of the particles are adjusted adaptively according to F (i). When the global optimal solution, local optimal solution, and inertia weights ω (i) are obtained, the velocities and positions of particles are updated according to (1) and (2). These steps mentioned above are repeated to optimize the inertia weights ω (i). The adaptation value function F (i) is defined as (11) and ω (i) is defined as (13). The inertia weight update formula of the traditional PSO algorithm is shown as (12). T is the number of current iterations, and the inertia weight ω (i) decreases linearly with the increase of T. Compared with the inertia weight formula of the traditional PSO, the inertia weight for RMPSO algorithm can be adjusted adaptively according to the surrounding environment, so the convergence rate of RMPSO increases.(2)PSO weight updating function: Based on the above-mentioned reinforcement mechanism, (13) is introduced into PSO weight updating function to improve the convergence speed and accuracy of the PSO algorithm. The following equation is defined:where 0.3, 0.03, and 0.63 can be obtained by adjusting parameters of program; ω (i) represents the weight of the ith particle, and this weight has a positive relationship with F (i) in (13). It is necessary to improve the global optimization ability when the fitness of particles becomes larger. At the same time, the particle’s weight will be increased according to (13), which can improve the capability of global optimization. When the particle fitness value becomes smaller, leading smaller ω (i), the local optimization ability becomes stronger.
The velocities Vid of particles in the PSO algorithm are updated as
3.3. Bezier Curve
A Bezier curve of degree n can be represented aswhere t indicates the normalized time variable; is the coordinate vector of the ith control point with xi and yi in the X and Y coordinates, respectively. denotes the Bernstein basis polynomials, representing the base function in the expression of a Bezier curve and given by
From (15) and (16), the parameter equation for each control point can be generated as follows:where t is the range of [0, 1]. The Bezier curve is invariance under translation and rotation, and this is called geometry invariance property. In addition, the Bezier curve starts at the starting point (t = 0) and stops at the ending (t = 1). In other words, P0 = R (0) and Pn = R (1). The Bezier curve has some control points which form a control polygon as shown in Figure 6.
By sampling four equidistance points in the polyline generated by the RMPSO algorithm, these four points are P0, P1, P2, P3. A series of dense points P (t) are obtained from (17), and then the smooth curve is obtained by connecting these points.
3.4. The Whole Procedure of the Proposed Algorithm
The flow of the improved PSO algorithm is shown in Algorithm 2. The whole path planning algorithm is a loop procedure that is repeated until an AUV gets close to the target. The AUV moves to its corresponding position, and ocean currents and obstacles are taken into consideration at the same time. The procedure is performed through iterations to calculate the optimal solution of path cost.
In Figure 7, the whole path planning process includes the model of underwater environment, path planning, and path optimization. In order to express the ocean environment information integrally, the Graham algorithm is presented to construct polygon obstacle model, and a 2D map is built based on MAKLINK graph theory.
Firstly, a 2D map is established by MAKLINK theory and the Graham algorithm. Then, a suboptimal path is planned based on the Dijkstra algorithm. When the RMPSO is used to continuously optimize the suboptimal path through iteration, the obstacles will be avoided effectively based on MAKLINK phase-free network graph. Finally, in order to further optimize the path, the Bezier curve is used to smooth the optimized path.
4. Experiment Results and Analysis
The effectiveness of the RMPSO algorithm is demonstrated, and simulations are carried out with different ocean environment. At the same time, the path lengths of ant colony algorithm (ACO), PSO, Dijkstra, and RMPSO in the same map are compared. The algorithm convergence rates of the ACO, PSO, and RMPSO are also compared. The value of is set as 1.875 m, which is three times the AUV’s length. The number of iterations and the population size are set as 150 and 80; the dimension and the maximum particle speed are set as 8 and 3; the learning factors C1 and C2 are set as 2 and 2, respectively.
4.1. Validity Simulations
Figure 8 illustrates the environment model and the initial path planning route. The underwater workspace is designed as 200m × 200 m. The starting position of AUV is (20, 180), and the target position is (160, 90), which are represented by “Start” and “Target,” respectively. The yellow area and the black area are the Danger Zone and obstacles, respectively. The initial path is formed using the Dijkstra algorithm and is shown in green in Figure 8. The nodes through which the route passes are S, P1, P2, P3, P4, P5, P12, P12, P11, T, respectively. It can be clearly seen that the green path along the dotted line is not the optimal path and its length is 252.26 m.
In Figure 8, the Dijkstra algorithm determines the node, and it also determines the real line where the node is located, so each segment of the path can move in the real line and the path line will not intersect with obstacles when moving. The positions of P1, P2, P3, P4, P5, P12, P13, P11 are adjusted continuously to determine the shortest path of each segment, and finally the optimal path is obtained.
4.1.1. Result in Stationary Environment
Based on the Dijkstra algorithm, the final path optimization is performed by employing the algorithm RMPSO. The red route in Figure 9(a) is the final path optimization path result. The AUV get close to the edge of the Danger Zone to avoid the obstacle, and the path length is only 176.2 m, which is shorter than the length of initial path planning. Figures 9(b) and 9(c) show the convergence trend of the RMPSO algorithm which has converged to the optimal value in the 35th iteration of the algorithm.
4.1.2. Result in Ocean Current Environment
The serious influence of ocean currents in the complex underwater environment may lead to AUV mission failure. It is assumed that the AUV speed is 1.5 m/s, the current speed is 0.3 m/s, and the direction is -75°. In Figure 10(a), the red solid line is the optimized path of the RMPSO algorithm under the influence of ocean currents. When the influence of ocean currents is considered comprehensively, the position of the path point is changed in each section of the path, and a tortuous path is formed.
In order to save AUV energy, the Bezier curve smoothing path is used to reduce the number of turning points. Shown as the solid blue line path in Figure 10(b), the smoothed path has no obvious turning points.
4.2. Comparison Studies
In order to more clearly compare the differences between the ACO algorithm , Dijkstra algorithm , PSO algorithm, and RMPSO algorithm in terms of convergence speed and total path length, the ocean current influence is not considered for the above algorithms based on the environment.
4.2.1. Result of the ACO Algorithm
Path No. 4 in Figure 11 shows the optimization results of the ant colony algorithm, and it can be seen that the path is not the shortest path and that the total path length is 188.86 m.
4.2.2. Result of the PSO Algorithm
Path No. 1 in Figure 11 shows the optimization results of the particle swarm algorithm, and it can be seen that the path is almost the same as those of the RMPSO algorithm, with a total path length of 179.38 m.
4.2.3. Comparison and Analysis
Figure 12 shows the comparison of the convergence speed of three optimization algorithms, namely, the ant colony algorithm, PSO algorithm, and RMPSO algorithm. It can be seen that the red curve representing RMPSO algorithm converges much faster than the other two optimization algorithms for the same number of convergences. Table 1 shows the specific values of convergence of the three optimization algorithms. Because the ant colony algorithm relies on pheromone concentration to look for paths and there exists a lot of instability, so it starts to converge only after the 56th iteration. The inertia weight of the PSO algorithm decreases linearly with the number of iterations and cannot take into account the current particles' condition, so it starts to converge only in the 45th iteration. The RMPSO can adjust the inertia weights according to the surrounding environment, and the algorithm starts to converge in the 22nd iteration, which is much better than the other two algorithms.
With the same number of iterations, the RMPSO algorithm converges to the optimal value in the 22nd iteration, while the ant colony and particle swarm algorithm converge to the optimal value only in 45th and 56th iterations, respectively.
Table 2 shows the comparison of the total path lengths of the four optimization algorithms. The Dijkstra algorithm has the longest path which is 252.26 m, while the final path length of the RMPSO algorithm converges to only 176.20 m. Obviously, the RMPSO algorithm is not only shorter than the standard particle swarm algorithm in terms of path length, but also much faster than the other two optimization algorithms in terms of convergence speed.
4.2.4. Comparison Based on Raster Maps
Raster maps have the advantage of being able to simply model and easily validate new algorithms. Therefore, we apply RMPSO to raster maps and compare it with the PSO and A∗ algorithms.
Figure 13 shows the paths planned by three algorithms, with black squares as obstacles, PSO algorithm for line 1, RMPSO algorithm for line 2, and A algorithm for line 3. Through Table 3, we can learn that the path length of RMPSO planning is better than those of the PSO and A algorithms, where RMPSO has 3.75 m more than A. Because the A algorithm relies on the size of the adaptation value of each raster in the raster map, there are many corners in the planned path due to the calculation of each raster, so the path length of the A algorithm is more than those of both PSO and RMPSO.
4.3. Real Experiment and Results
To further test the performance of the proposed RMPSO algorithm in a real environment, the underwater experiment platform is established, shown as Figure 14. The underwater experiment platform is mainly composed of underwater vehicle . The size of the underwater vehicle is shown in Figure 14, with a length of 0.625 m, a width of 0.457 m, and a height of 0.326 m; 8 thrusters are used for driving to achieve free cornering. The underwater vehicle has a maximum power of 1500 W, weighing 20 Kg, and can be equipped with cameras, DVL, and other sensors. The experiment was conducted on a lake with an area of about 3000 square meters.
The underwater vehicle platform is composed of a control host CPU model i5, 8G RAM, 120G SSD with Windows 10 operating system. It is also equipped with a 15″ high brightness LED screen, which can display real-time underwater images, attitude, underwater vehicle depth, temperature, and other information. The ground control platform controls the vehicle operation through cables. The experiment was conducted on the lake surface to verify the algorithm to facilitate observation of the experiment. The real-time position of the underwater vehicle is obtained through the control platform and aerial photography, and then the trajectory of the underwater vehicle is plotted.
As shown in Figure 15, the convex polygonal obstacle endpoints are created by floating balls, and then the MKALINK 2D environment model is completed on the computer. The blue and green pentagons are the starting and terminal points, respectively. The black polygon is the obstacle, and the yellow area is the Danger Zone.
The vehicle speed is set as 1.5 m/s in the experiment. In Figures 16, Figure 16(a) shows the trajectory of the optimized algorithm-controlled robot operation. The yellow line is the path planned by the ant colony algorithm, and the blue line is the path planned by the RMPSO algorithm. It can be seen that although the ant colony algorithm effectively avoids obstacles, the path length is longer than that of the RMPSO algorithm. Figure 16(b) shows the planning path of the RMPSO algorithm considering the case with the influence of ocean currents, and the red curve is the path map after Bezier curve optimization. In the experimental process, the control board can largely reduce the frequent control of the robot through this path and also effectively avoid the obstacles. The stability of underwater vehicle is improved greatly by using the proposed RMPSO to obtain the shortest and smoothed trajectory, resulting in saving energy efficiently.
5. Conclusion and Future Work
We propose an improved particle swarm algorithm based on reinforcement mechanism combined with the ocean current model to solve the path planning problem and energy consumption problem of AUV. Simulation results demonstrate that the proposed RMPSO path planning algorithm can effectively avoid obstacles in the MAKLINK undirected network graph. By comparing various algorithms with RMPSO, the convergence speed of the proposed algorithm is found to be much faster, but there are a few disadvantages for the stability. In future research, the RMPSO algorithm will have practical applications in the fields of underwater search, rescue, and investigation. The statistical analysis will be used to prove the superiority of RMPSO, improve the stability of the algorithm, and use it to solve underwater 3D path planning problems .
The research data used to support the findings of this study are included within the article. The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This work was supported by the Fundamental Research Funds for the Central Universities (B200202163), the National Natural Science Foundation of China (61703098), the Natural Science Foundation of Jiangsu Province (BK20160699), and the Fujian Provincial Key Laboratory of Coast and Island Management Technology Study (FJCIMTS2019-03).
H. Q. Huang, R. D. Shi, J. Zhou et al., “Attitude determination method integrating square-root cubature kalman filter with expectation-maximization for inertial navigation system applied to underwater glider,” Review of Scientific Instruments, vol. 90, no. 9, 2019.View at: Publisher Site | Google Scholar
D. C. Yong, Z. X. Yu, J. Ya, and M. Sankaran, “Fuzzy Dijkstra algorithm for shortest path problem under uncertain environment,” Applied Soft Computing, vol. 12, no. 3, pp. 1231–1237, 2012.View at: Google Scholar
X. X. Wu, L. B. Guo, and J. Wang, “Mobile robot path planning algorithm based on particle swarm optimization of cubic splines,” Robot, vol. 31, no. 6, pp. 556–560, 2009.View at: Google Scholar