Penetration Planning and Design Method of Unmanned Aerial Vehicle Inspired by Biological Swarm Intelligence Algorithm
Unmanned aerial vehicles (UAVs) are increasingly used in logistics transportation, but they are forbidden to fly in some airspace. Reasonable path planning and design is one of the key factors in ensuring UAV safety. To improve the success rate of UAV maneuver penetration, a method of UAV penetration path planning and design is proposed. Among biological swarm intelligence algorithms, the ant colony algorithm has strong path planning ability. Based on models of UAV planning and threat factors, an improved ant colony algorithm is used for UAV penetration path planning and design, and the path with the highest pheromone content is taken as the planned path. Several principles are given for applying the ant colony algorithm to UAV penetration path planning. By introducing heuristic information into the improved ant colony algorithm, convergence is completed faster for the same number of iterations. Compared with the classical method, the total number of steps is reduced by 56% with 50 ants and 200 iterations, and 62% fewer steps are needed to complete the first iteration. The optimal trajectory planned by the improved ant colony algorithm is found to be smoother and to be the shortest path satisfying the constraints.
UAVs have become key equipment developed by countries all over the world. Maximizing communication effectiveness, using energy rationally, and avoiding collisions have long been key research areas in the UAV field. With the proliferation of drones, they face defense shields that integrate multiplatform sensors and interception equipment. To improve the survival probability and task performance of UAVs, many typical penetration strategies and technologies are used, such as electronic countermeasures, decoy deception, stealth, multiple UAVs, and maneuvering, among which maneuvering penetration is one of the key methods [1–4]. Peng et al. proposed a three-dimensional multiconstraint route planning method for UAV low-altitude penetration based on a coevolutionary multiagent genetic algorithm. An improved adaptive genetic algorithm has been used for route planning problems. Peng et al. proposed a UAV path planning method based on disturbed fluid and trajectory propagation in a static environment. Szczerba et al. studied a robust algorithm for real-time route planning that takes into account various mission constraints, including minimum route leg length, maximum turning angle, route distance constraint, and a fixed approach vector to the goal position. An improved A* algorithm and particle swarm optimization (PSO) have also been proposed to solve UAV path planning in complex environments [9, 10]. Improved quantum differential evolution with multiple strategies can serve as a global optimization method in UAV path planning applications.
On UAV logistics routes, multiple radars are often deployed near specific areas to form an obstacle area that prevents or intercepts UAVs to protect infrastructure and pedestrians. These obstacles pose a great threat to UAV penetration. A UAV can avoid radar detection only by further increasing its stealth ability or by maneuvering, and stealth ability is largely determined by the shape design of the UAV. Once the task performance of the UAV is fixed, the stealth shape design is greatly constrained, and stealth coatings offer only limited improvement. Therefore, more emphasis should be placed on avoiding radar detection through maneuvering.
In the terminal phase of flight, where the UAV is most easily intercepted, common maneuvers include snake maneuvers and random maneuvers, but these often fail to escape radar detection. The flight trajectory of the UAV therefore needs to be designed to balance the needs of the mission target and penetration as far as possible. Methods for UAV flight trajectory design include the constrained two-point boundary value optimization method, the Voronoi diagram method [13, 14], Dijkstra's algorithm [15, 16], the A* algorithm [9, 17, 18], the Rapidly-Exploring Random Tree (RRT) algorithm [19, 20], the D* path search algorithm [21, 22], the Probabilistic Roadmap (PRM) algorithm, genetic algorithms [6, 24], etc.
Different algorithms have their own advantages and disadvantages, but all are grounded in particular mathematical theories. With the development of artificial intelligence, swarm intelligence algorithms based on deep reinforcement learning [25–28] and optimization methods inspired by various biological swarms [10, 29–32] have emerged. Because the ant colony algorithm is a swarm intelligence algorithm with strong path planning properties, this paper uses it to study the path planning and design method for UAV penetration.
The innovations and main contributions of this paper are as follows. (1) Based on models of UAV planning and threat factors, an improved ant colony algorithm is used for UAV penetration path planning and design. (2) The path with the highest pheromone content is proposed as the planned path. (3) Several principles are given for using the ant colony algorithm in UAV penetration path planning. (4) By introducing heuristic information into the improved ant colony algorithm, convergence is completed faster for the same number of iterations.
The rest of this paper is organized as follows. Section 2 introduces the environment model of the UAV. Section 3 describes the path planning and design method for UAV penetration based on an improved ant colony algorithm. Section 4 presents the experimental design and compares and discusses the experimental results. Section 5 gives the conclusion and outlook.
For a UAV flying close to the target, anti-UAV devices with different threat levels are deployed according to the importance of the target. The threat to the UAV comes mainly from ground-based radar and UAV catchers, whose effectiveness depends on the radar detection range, and the radar echo signal strength is inversely proportional to the fourth power of the distance. Therefore, the probability of being detected by a radar can be expressed as
$$P(R) = \frac{K}{R^4},$$
where $R$ is the distance from the radar and $K$ is a constant related to radar performance. Areas where the detection probability exceeds 10% are defined as radar areas; that is, the radius of a radar area is $R_r = \sqrt[4]{10K}$. The closer a point is to the radar, the higher the probability of being detected. The threat degree of a point is obtained by accumulating the detection probabilities of all radars.
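The threat map described above can be sketched as follows. This is a minimal Python illustration that assumes the detection model $P(R) = K/R^4$, clipped at 1; the function names, the radar layout, and the value of `k` are hypothetical and only show how per-radar probabilities accumulate into a threat degree for each grid cell.

```python
import math


def detection_probability(dist, k):
    """Assumed model P(R) = k / R**4, clipped so it never exceeds 1."""
    if dist <= 0.0:
        return 1.0  # at the radar site itself, detection is certain
    return min(k / dist ** 4, 1.0)


def radar_radius(k, threshold=0.1):
    """Radius at which k / R**4 falls to the threshold (10% in the text)."""
    return (k / threshold) ** 0.25


def threat_map(n_rows, n_cols, radars, k):
    """Threat degree of each cell: sum of the detection probabilities of all radars.

    radars: list of (row, col) radar positions in grid units (hypothetical layout).
    Because probabilities are summed, a cell covered by several radars can exceed 1.
    """
    return [
        [sum(detection_probability(math.hypot(r - rr, c - rc), k) for rr, rc in radars)
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


if __name__ == "__main__":
    k = 1000.0  # hypothetical radar constant
    print("radar-area radius:", radar_radius(k))  # (1000 / 0.1) ** 0.25, about 10 cells
    grid = threat_map(20, 20, [(5, 5), (15, 15)], k)
    print("threat at a radar site:", round(grid[5][5], 3))
    print("threat at a far corner:", round(grid[0][19], 3))
```

Summing rather than combining probabilities follows the text's "accumulating" description; a stricter model could use $1 - \prod_i (1 - P_i)$ instead.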
Each point on the map lies at a different distance from the target position. The guidance, navigation, and control (GNC) system of a typical UAV can measure the current distance to the target point and guides the UAV to approach the target in the shortest time while meeting various constraints. Therefore, the distance from the UAV to the target point also needs to be considered in the model.
A shorter flight path means a shorter dwell time for the UAV, which greatly reduces both mission time and risk. Taking into account the survival probability at each point and the distance to the target point, a cost function based on the shortest-track criterion, which includes the length of the track, is used as the performance index of the track.
The UAV departs from its starting position and flies towards the target point. The turning angle of the UAV is limited by its turning ability and is assumed to lie between −30° and 30°.
The grid size is also considered in the model, because a reasonable grid size ensures that the planned track satisfies the minimum turning radius constraint of the UAV. A particular type of UAV is studied, with an assumed level flight speed of 3600 km/h and a maximum maneuver overload of 15 g. The minimum turning radius, calculated from the flight speed and the maximum maneuver overload, is 8 km. Because each step can turn only between −30° and 30°, the UAV needs three steps to complete a 90° turn under maximum maneuverability. The grid size for 2D track planning is therefore chosen so that the spacing between adjacent horizontal and vertical grid lines is greater than 1/3 of the minimum turning radius of the UAV. The number of grid cells and the simulation range are set to be consistent with the actual scene.
3. Proposed Methodology
The ant system and the ant colony system were first proposed by Dorigo et al. and by Dorigo and Gambardella in the 1990s [34, 35]. Studying ant foraging, they found that although the behavior of a single ant is relatively simple, the colony as a whole exhibits intelligent behavior; for example, it can find the shortest path to a food source in different environments. This is possible because ants communicate through an information mechanism: they release a substance called pheromone along their path, can perceive it, and tend to follow paths with a high pheromone concentration. Each passing ant leaves more pheromone on the path, forming a positive feedback mechanism. In this way, the whole colony converges on the shortest path to the food source.
The basic idea of applying the ant colony algorithm to an optimization problem is as follows. The walking path of each ant represents a feasible solution of the problem, and the paths of the whole colony constitute the solution space. Ants on shorter paths deposit more pheromone per node, so over time the pheromone concentration on short paths increases and more and more ants choose them. Under the action of positive feedback, the whole colony concentrates on the best path, which corresponds to the optimal solution of the problem.
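The positive-feedback effect can be illustrated with the classic two-branch (double-bridge) setting. The Python sketch below is a deterministic, expected-value simplification rather than a stochastic simulation, and the branch lengths, colony size, and parameters are hypothetical. Each round, the share of ants on each branch is proportional to $\tau^\alpha$, each ant deposits $Q/L$ on its branch, and all pheromone evaporates by the factor $1-\rho$.

```python
def double_bridge(short_len=1.0, long_len=2.0, n_ants=10, rho=0.3, q=1.0,
                  alpha=1.0, iterations=50):
    """Expected-value model of ants choosing between a short and a long branch.

    Returns the fraction of ants choosing the short branch in each iteration.
    """
    tau_short, tau_long = 1.0, 1.0  # equal initial pheromone on both branches
    history = []
    for _ in range(iterations):
        w_s, w_l = tau_short ** alpha, tau_long ** alpha
        p_short = w_s / (w_s + w_l)  # share of ants taking the short branch
        history.append(p_short)
        # every ant deposits q / length on the branch it used (ant-cycle style)
        deposit_short = n_ants * p_short * q / short_len
        deposit_long = n_ants * (1.0 - p_short) * q / long_len
        # evaporation followed by this round's deposits
        tau_short = (1.0 - rho) * tau_short + deposit_short
        tau_long = (1.0 - rho) * tau_long + deposit_long
    return history


if __name__ == "__main__":
    history = double_bridge()
    print(f"first round: {history[0]:.2f}, last round: {history[-1]:.4f}")
```

Because shorter branches receive larger deposits per ant, the share of ants on the short branch grows every round, which is the positive feedback described above.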
The basic process of UAV penetration path planning and design based on the ant colony algorithm is shown in Figure 1. First, appropriate initial values are assigned to all nodes on the grid of the region to form the initial pheromone matrix. All ants are placed at the starting point and move towards the target at the same time. The ants tend to move towards places with more pheromone, but each ant makes mistakes with a small probability, so not all of them move in the direction with the most pheromone. During the movement, each ant selects an adjacent feasible node according to the state transition rule and roulette-wheel selection until all ants reach the target point. After each cycle, the global pheromone is updated according to the path of each ant: pheromone evaporates at nodes that no ant passed through, while the nodes that ants passed through are reinforced according to the pheromone update rule. This process is repeated until the optimized route is obtained.
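The state transition step, including the threat factor described in this section, and roulette-wheel selection can be sketched in Python as follows. This is a minimal sketch rather than the authors' NetLogo code: each feasible neighbor $j$ receives the weight $\tau_j^\alpha \eta_j^\beta \mu_j^\gamma$ with threat factor $\mu_j = 1/(T_j + \varepsilon)$, where the small constant $\varepsilon$ (`eps`) is an assumption that keeps the weight finite outside radar areas. The neighbor labels and the pheromone, visibility, and threat values are hypothetical.

```python
import random
from collections import Counter


def transition_probabilities(candidates, tau, eta, alpha, beta,
                             threat=None, gamma=0.0, eps=1e-3):
    """State transition probabilities over the feasible neighbors of an ant.

    p_j is proportional to tau_j**alpha * eta_j**beta * mu_j**gamma, where the
    threat factor mu_j = 1 / (T_j + eps). eps is an assumption that keeps mu_j
    finite outside radar areas (T_j = 0); gamma = 0 disables the threat term.
    """
    weights = []
    for j in candidates:
        mu = 1.0 / (threat[j] + eps) if threat is not None else 1.0
        weights.append(tau[j] ** alpha * eta[j] ** beta * mu ** gamma)
    total = sum(weights)
    return [w / total for w in weights]


def roulette_select(candidates, probs, rng):
    """Roulette-wheel selection: one spin in [0, 1) picks the slice it lands in."""
    spin = rng.random()
    cumulative = 0.0
    for node, p in zip(candidates, probs):
        cumulative += p
        if spin < cumulative:
            return node
    return candidates[-1]  # guards against floating-point round-off in the sum


if __name__ == "__main__":
    candidates = ["N", "NE", "E"]             # feasible neighbors after the turn constraint
    tau = {"N": 1.0, "NE": 2.0, "E": 1.0}     # hypothetical pheromone levels
    eta = {"N": 0.5, "NE": 1.0, "E": 0.5}     # hypothetical visibility values
    threat = {"N": 0.0, "NE": 0.8, "E": 0.0}  # hypothetical threat degrees
    rng = random.Random(0)
    for gamma in (0.0, 1.0):
        probs = transition_probabilities(candidates, tau, eta, 4, 1, threat, gamma)
        counts = Counter(roulette_select(candidates, probs, rng) for _ in range(10_000))
        print(f"gamma={gamma}:", [round(p, 3) for p in probs], dict(counts))
```

With `gamma = 0` the threat term is ignored and the high-pheromone neighbor NE is chosen about 94% of the time; with `gamma = 1` its threat degree makes it an unlikely choice, while the randomness of the wheel still lets ants occasionally "make mistakes".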
In addition to releasing pheromone on the path, ants also make decisions when they encounter pheromone. Because they make mistakes with a small probability, they use roulette-wheel selection to make steering decisions.
(1) State transition strategy
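The probability that ant $k$ moves from node $i$ to an adjacent node $j$ is
$$p_{ij}^k = \begin{cases} \dfrac{\tau_j^{\alpha}\,\eta_j^{\beta}}{\sum_{s \in \text{allowed}_k} \tau_s^{\alpha}\,\eta_s^{\beta}}, & j \in \text{allowed}_k, \\ 0, & \text{otherwise}. \end{cases}$$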
Here, $p_{ij}^k$ is the probability that ant $k$ transfers from node $i$ to node $j$; $\text{allowed}_k$ is the set of all adjacent feasible nodes, where feasibility is constrained by the turning direction; $\tau_j$ is the pheromone strength stored at feasible node $j$; and $\eta_j$ is the visibility of node $j$. The pheromone sensitivity factor $\alpha$ characterizes the relative importance of the pheromone and reflects the role of the information accumulated by ants during their movement. The visibility sensitivity factor $\beta$ characterizes the relative importance of the heuristic factor in path selection. At the beginning of the iteration, the heuristic factors of the 8 adjacent nodes in the 3 × 3 (Jiugong) neighborhood differ little, so the local predictability of the target point is weak. The pheromone differences between nodes are also small at this stage, so ants tend to choose blindly during state transition and have difficulty reaching the target node quickly. Setting visibility information speeds up the ants' arrival at the target node; the visibility is set according to the distance between the current node and the target point. Based on the state transition probabilities of the feasible nodes around an ant, roulette-wheel selection is used to decide the ant's state transition direction.
(2) Design of threat factor
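The ant colony algorithm must not only find the optimal path but also respect the constraints of the radar areas. A threat factor is therefore introduced to strengthen the tendency to avoid threats when selecting track nodes, make the selection more predictable, and accelerate convergence to the optimal or a suboptimal track. Following the design of the pheromone sensitivity factor and the visibility sensitivity factor, the threat factor is introduced into the state transition probability as follows:
$$p_{ij}^k = \begin{cases} \dfrac{\tau_j^{\alpha}\,\eta_j^{\beta}\,\mu_j^{\gamma}}{\sum_{s \in \text{allowed}_k} \tau_s^{\alpha}\,\eta_s^{\beta}\,\mu_s^{\gamma}}, & j \in \text{allowed}_k, \\ 0, & \text{otherwise}, \end{cases}$$
where $\mu_j$ is the threat factor of node $j$ and $\gamma$ is the threat sensitivity factor.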
Because an ant is more likely to move to a node with a greater transition probability, the threat factor is taken as the reciprocal of the threat cost, $\mu_j = 1/T_j$, where $T_j$ is the threat degree of node $j$; nodes with lower threat are thus preferred.
(3) Strategy for pheromone update
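The pheromone is updated according to
$$\tau_j(t+1) = (1-\rho)\,\tau_j(t) + \Delta\tau_j, \qquad \Delta\tau_j = \sum_{k=1}^{m} \Delta\tau_j^k,$$
where $m$ is the number of ants.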
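Here, $\rho$ is the pheromone volatilization (evaporation) coefficient, so $1-\rho$ is the pheromone persistence coefficient, and $\Delta\tau_j^k$ is the pheromone increment at node $j$ after ant $k$ arrives, which is determined by
$$\Delta\tau_j^k = \begin{cases} \dfrac{Q}{L_k}, & \text{if ant } k \text{ passes node } j \text{ in this iteration}, \\ 0, & \text{otherwise}. \end{cases}$$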
Dorigo defined three ant system models: the ant-density system, the ant-quantity system, and the ant-cycle system. The ant-cycle system is adopted here, where $Q$ is the pheromone intensity constant released by an ant and $L_k$ is the total length of the trajectory of ant $k$. Appropriate values of $Q$ and the volatilization factor $\rho$ can be determined through simulation experiments.
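The ant-cycle update on node pheromone can be sketched in Python as follows. Nodes are grid cells `(row, col)`, the path length is taken as the sum of Euclidean step lengths, and each ant deposits $Q/L_k$ once on every distinct node of its path; these choices and the function names are assumptions for illustration.

```python
import math


def path_length(path):
    """Total Euclidean length of a path given as a list of (row, col) cells."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def update_pheromone(tau, ant_paths, rho, q):
    """Ant-cycle update: tau_j <- (1 - rho) * tau_j + sum_k delta_tau_j^k.

    tau: dict mapping node -> pheromone; ant_paths: paths of ants that reached the target.
    Ant k adds q / L_k to every distinct node it visited (once per node, an assumption).
    """
    delta = dict.fromkeys(tau, 0.0)
    for path in ant_paths:
        deposit = q / path_length(path)
        for node in set(path):
            delta[node] += deposit
    return {node: (1.0 - rho) * tau[node] + delta[node] for node in tau}


if __name__ == "__main__":
    tau = {(r, c): 1.0 for r in range(3) for c in range(3)}
    diagonal = [(0, 0), (1, 1), (2, 2)]                # length 2 * sqrt(2)
    detour = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]  # length 4
    new_tau = update_pheromone(tau, [diagonal, detour], rho=0.3, q=1.0)
    print(round(new_tau[(1, 1)], 4), round(new_tau[(0, 1)], 4), round(new_tau[(2, 0)], 4))
```

Nodes on the shorter diagonal path gain more pheromone than nodes on the detour, and unvisited nodes only evaporate.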
4. Experimental Design and Result Analysis
Because NetLogo can simulate well the behavior of micro-level individuals, the emergence of macro-level patterns, and the relationships between them, the experiments are mainly implemented in the NetLogo programmable modeling environment. For the default turtles, the list variables “path-, path-x” are added to record the patch coordinates visited by each turtle.
For the default patches, four variables “temp chemical, chemical, visibility and thread” are added to record the temporary pheromone amount, actual pheromone amount, visibility, and threat degree at the patch position, respectively.
The following routines are defined in NetLogo:
“setup”: initializes the visibility and thread variables of each patch according to its distance from the target point and from the threat areas. Visibility decreases with distance from the target, eventually decaying to 0. The threat degree of a point is obtained by accumulating the detection probabilities of the different radars. The initial pheromone value is set to 1, and the initial heading range of the ants at the starting point is 0° to 90°.
“go”: performs ant boundary detection, runs the iterations, and updates the patch pheromone, the patch color, the list of patches passed by each ant, etc. After an ant reaches the target point, the temp chemical value of each patch it passed through is increased.
“turn-to-chemical”: calculates the state transition probability of equation (3) and steers the ant according to roulette-wheel selection.
“chemical-scent-at-angle,” “visibility-at-angle,” and “threat-at-angle”: these three routines calculate the pheromone strength, visibility, and threat degree of the patches around the ant (8 patches in total) at a given angle.
“update-color”: the patch color is updated according to pheromone intensity information.
“update-chemical”: after each iteration, the temp chemical values accumulated by all ants during the iteration are copied to the chemical variable, and the pheromone from the previous round partially evaporates.
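Putting the pieces together, the following Python sketch loosely mirrors the `setup`, `go`, and `update-chemical` routines on an 8-connected grid. It is a hypothetical port, not the authors' NetLogo model: the ±30° turning constraint is omitted for brevity (ants may step to any in-bounds neighbor), visibility is assumed to be $1/(1 + d_{\text{target}})$, and the threat factor uses the assumed form $1/(T + \varepsilon)$. Ants that do not reach the target within `max_steps` deposit nothing.

```python
import math
import random

# 8-connected neighborhood (the 3 x 3 "Jiugong" grid around an ant)
MOVES = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def run_aco(size, start, target, threat, n_ants=20, iterations=30, alpha=4.0,
            beta=1.0, gamma=1.0, rho=0.3, q=1.0, max_steps=1000, seed=0):
    """Grid ACO sketch; threat maps (row, col) -> threat degree (missing cells = 0).

    Returns the shortest successful path found, as a list of (row, col) cells.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    # setup: uniform initial pheromone, distance-based visibility, threat factor
    chemical = dict.fromkeys(cells, 1.0)
    visibility = {cell: 1.0 / (1.0 + math.dist(cell, target)) for cell in cells}
    mu = {cell: 1.0 / (threat.get(cell, 0.0) + 1e-3) for cell in cells}
    best_len, best_path = math.inf, None

    for _ in range(iterations):
        temp_chemical = dict.fromkeys(cells, 0.0)
        for _ in range(n_ants):  # go: each ant walks from start towards the target
            path = [start]
            while path[-1] != target and len(path) < max_steps:
                r, c = path[-1]
                options = [(r + dr, c + dc) for dr, dc in MOVES
                           if 0 <= r + dr < size and 0 <= c + dc < size]
                weights = [chemical[o] ** alpha * visibility[o] ** beta * mu[o] ** gamma
                           for o in options]
                path.append(rng.choices(options, weights=weights)[0])  # roulette wheel
            if path[-1] != target:
                continue  # the ant ran out of steps and deposits nothing
            length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
            for cell in set(path):
                temp_chemical[cell] += q / length
            if length < best_len:
                best_len, best_path = length, path
        # update-chemical: evaporate the old pheromone, then add this round's deposits
        for cell in cells:
            chemical[cell] = (1.0 - rho) * chemical[cell] + temp_chemical[cell]
    return best_path


if __name__ == "__main__":
    zone = {(r, c): 1.0 for r in range(4, 8) for c in range(4, 8)}  # hypothetical threat zone
    path = run_aco(12, (0, 0), (11, 11), zone, n_ants=20, iterations=20)
    if path is None:
        print("no ant reached the target")
    else:
        print(len(path), "cells; crosses threat zone:", any(cell in zone for cell in path))
```

The temporary buffer `temp_chemical` keeps deposits from one round from influencing other ants in the same round, which matches the copy step described for `update-chemical`.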
4.1. Threat Free Experiments
First, the threat of the radar areas is not included, and the parameters $\alpha$ and $\beta$ are set to 4 and 1, respectively, which increases the importance of the pheromone. The volatilization coefficient $\rho$ is 0.3, and 200 iterations are carried out. The results are shown in Figure 2. The optimal flight trajectory is the straight line connecting the two points, which is consistent with reality.
Figure 3 shows that in the first iteration, the time for all ants to reach the target without pheromone information is 5.7 times that with pheromone information, indicating that pheromone accelerates convergence. As pheromone accumulates further, the time for all ants to reach the target does not decrease significantly. This indicates that random steering plays a role: it prevents the search from falling into a local optimum and favors global search.
4.2. Effects of Different Ant Numbers and Iteration Times on the Model
The parameter settings in this section remain unchanged, and the focus is on the impact of different numbers of ants and iterations on the model. The experimental results are shown in Figure 4.
Table 1 gives the comparison results. Groups 1, 2, and 3 show that reducing the number of ants per iteration reduces the total number of steps and shortens the computation time; the number of ants mainly affects the final amount of pheromone. Groups 4, 5, and 6 show that when the number of iterations is small, too many ants make the planned trajectory less effective. In groups 7, 8, and 9, the number of iterations is too small, and the planned trajectories are poor.
The number of iterations must therefore exceed a certain threshold for the ant colony algorithm to be stable. The number of ants has little influence on the performance of the algorithm and mainly affects the amount of pheromone; however, when the number of iterations is small, too many ants make it hard to stabilize at the optimal value. The choice of the number of iterations and the number of ants should follow the principle: “The number of iterations must be greater than the minimum number required for stable convergence, and the number of ants should be appropriate, but not too large.”
4.3. Experiments with Different Numbers of Threat Zone
Based on the previous simulation analysis, the number of ants is set to 52 and the number of iterations to 80. Parameters $\alpha$ and $\beta$ are 4 and 1, respectively, and the volatilization coefficient $\rho$ is 0.3. Each threat area is defined by its center coordinates and threat radius and is drawn on the map as a gray area. First, an experiment with two threat areas is investigated.
The experimental results in Table 2 and Figure 5 show that the path planning meets the threat constraints while obtaining the shortest path. The two radar areas are arranged along the longitudinal direction of the path, with the following settings.
As Table 3 and Figure 6 show, after a threat area is placed between the starting point and the target point, the ant colony algorithm plans optimal paths on both sides of the threat area to avoid it. The position of the threat area is then further adjusted, as shown in Table 4 and Figure 7.
The number of threat areas is then increased to 3 and 4, with the starting point still in the lower left corner.
Because threat zone 3 is very close to the target point, the UAV makes a large turn just before the target point (Table 5 and Figure 8), which is consistent with reality. At the same time, the planned optimal path is very smooth, which demonstrates the strong path planning ability of the ant colony algorithm.
Table 6 and Figure 9 show that although the initial position of the UAV is very close to threat zones 3 and 4, the algorithm can still plan the optimal flight trajectory. Near threat zone 2, however, two planned trajectories emerge, which indicates that they are equally good. Comparing the first iteration with later iterations shows that, with more constraints, the first iteration converges slowly owing to the lack of pheromone, and convergence accelerates greatly once pheromone has accumulated. Pheromone therefore has a strong accelerating effect on convergence under multiple constraints.
4.4. Influence of Different Parameter Values on the Model
Taking the experiment with three threat zones as an example, the parameters of the ant colony algorithm are analyzed in Figure 10. The main parameters affecting the state transition strategy are the pheromone sensitivity factor $\alpha$, the visibility sensitivity factor $\beta$, the threat factor $\gamma$, and the pheromone volatilization coefficient $\rho$. The other settings are 52 ants and 80 iterations.
Experiment group 1 in Table 7 shows that an overly large pheromone sensitivity factor makes the influence of the threat on path planning too small, so the constraints are violated, whereas an overly small one makes convergence difficult. At the same time, a large pheromone sensitivity factor accelerates iterative convergence and greatly reduces the total number of steps. In experiment group 2, the visibility sensitivity factor has little effect on the final result. In experiment group 3, no good path planning is achieved when the threat factor is too large or too small. In experiment group 4, path planning is best when the volatilization factor is 0.1; when it is 0.4, the path cannot fully converge. This indicates that the volatilization factor should not be too large; otherwise, iterative convergence is slow.
Therefore, the pheromone sensitivity factor α, the visibility sensitivity factor β, and the threat sensitivity factor γ should be chosen moderately, and, subject to the other constraints, the pheromone volatilization coefficient ρ should be as small as possible.
4.5. Experiment of Increased Mesh Fineness
Because only the map resolution is changed, while the flight speed, flight direction, and map size of the UAV remain the same, the generated optimal trajectory is almost identical to that obtained with the coarser grid. With the higher grid resolution, the generated trajectory is clearer.
4.6. Problem Extension
In the threat-free case, the pheromone importance parameter $\alpha$ and the heuristic (visibility) parameter $\beta$ in equation (4) are fixed. At the beginning, emphasizing pheromone lets more ants quickly reach the target by following the pheromone released in the previous round. However, as ants gradually reach the target point, fewer ants remain en route, and because their turning is partly random and does not fully follow the pheromone, it takes a long time for all ants to reach the target. At this stage, adding more heuristic information and increasing the weight of visibility lets ants perceive the direction of the target more clearly, so they move towards it faster and convergence accelerates. Because the guidance, navigation, and control (GNC) system of a UAV can generally measure the distance to the current target point and guides the UAV to approach the target in the shortest time while meeting various constraints, the distance to the target point can also be included in the heuristic information to speed up convergence. In the improved heuristic, $d_{ij}$ is the distance between the current node $i$ and the next node $j$, and $d_{jT}$ is the distance from node $j$ to the target point after the UAV moves to it. Figure 13 shows the comparison between the improved algorithm and the original algorithm after setting the visibility.
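The exact form of the improved heuristic is an assumption here: the Python sketch below uses the goal-aware form $\eta_{ij} = 1/(d_{ij} + d_{jT})$, built from the two distances just defined, and compares it with the classical step-length visibility $\eta_{ij} = 1/d_{ij}$. All names are hypothetical; the point is only that including $d_{jT}$ makes the neighbor facing the target stand out.

```python
import math

# Offsets of the 8 neighbors in the 3 x 3 grid around the current node
NEIGHBORS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def classic_eta(current, nxt, target):
    """Classical ACO visibility: reciprocal of the step length d_ij (ignores target)."""
    return 1.0 / math.dist(current, nxt)


def goal_aware_eta(current, nxt, target):
    """Assumed improved visibility: 1 / (d_ij + d_jT)."""
    return 1.0 / (math.dist(current, nxt) + math.dist(nxt, target))


def neighbor_scores(current, target, eta):
    """Heuristic value of each of the 8 neighbors of the current node."""
    r, c = current
    return {(r + dr, c + dc): eta(current, (r + dr, c + dc), target)
            for dr, dc in NEIGHBORS}


if __name__ == "__main__":
    current, target = (5, 5), (5, 15)
    for name, eta in (("classic", classic_eta), ("goal-aware", goal_aware_eta)):
        scores = neighbor_scores(current, target, eta)
        best = max(scores, key=scores.get)
        print(f"{name}: best neighbor {best}, score {scores[best]:.4f}")
```

With the classical form, all four axis-aligned neighbors tie, so the heuristic gives no sense of direction; with the goal-aware form, the neighbor towards the target, `(5, 6)`, scores highest.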
Table 8 shows that, for the same number of iterations, the number of steps required by the improved ant colony algorithm is greatly reduced and convergence is completed faster, which verifies the performance of the improved algorithm.
4.7. Performance Comparison with Classical Algorithms
This section compares the improved ant colony algorithm with the RRT algorithm and the A* algorithm and analyzes the advantages and disadvantages of the ant colony algorithm. The comparison is carried out in the four-threat-zone scenario.
The path planning algorithm based on the Rapidly-Exploring Random Tree (RRT) can effectively solve path planning problems in high-dimensional spaces with complex constraints by performing collision detection on sampling points in the state space. Its characteristic is that it searches high-dimensional spaces quickly and effectively, with random sampling points in the state space guiding the search towards unexplored regions. Taking the initial point as the root node, it grows a random tree by adding leaf nodes through random sampling. When a leaf node reaches the target point or enters the target area, a path from the initial point to the target point can be found in the tree.
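The RRT procedure described above can be illustrated with a minimal goal-biased variant in Python. Circular obstacles stand in for threat zones; the step size, goal bias, goal tolerance, collision-check resolution, and all names are assumptions for this sketch, and no path smoothing is applied.

```python
import math
import random


def rrt(start, goal, obstacles, bounds, step=1.0, goal_tol=1.0,
        goal_bias=0.1, max_iters=5000, seed=0):
    """Goal-biased 2D RRT.

    obstacles: list of circular threat zones (cx, cy, radius).
    bounds: (xmin, xmax, ymin, ymax) of the sampling area.
    Returns a list of points from start to goal, or None if none is found.
    """
    rng = random.Random(seed)
    xmin, xmax, ymin, ymax = bounds

    def point_free(p):
        return all(math.dist(p, (cx, cy)) > r for cx, cy, r in obstacles)

    def edge_free(a, b, checks=10):
        # collision check at evenly spaced points along the segment a -> b
        return all(point_free((a[0] + (b[0] - a[0]) * k / checks,
                               a[1] + (b[1] - a[1]) * k / checks))
                   for k in range(checks + 1))

    nodes, parent = [start], [None]
    for _ in range(max_iters):
        if rng.random() < goal_bias:
            sample = goal  # occasionally pull the tree straight towards the goal
        else:
            sample = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        t = min(step, d) / d  # move at most one step towards the sample
        new = (near[0] + t * (sample[0] - near[0]), near[1] + t * (sample[1] - near[1]))
        if not edge_free(near, new):
            continue
        nodes.append(new)
        parent.append(i_near)
        if math.dist(new, goal) <= goal_tol and edge_free(new, goal):
            path = [goal] if new != goal else []
            i = len(nodes) - 1
            while i is not None:  # walk back up the tree to the root
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


if __name__ == "__main__":
    zones = [(10.0, 10.0, 3.0), (5.0, 14.0, 2.0)]  # hypothetical threat zones
    path = rrt((1.0, 1.0), (19.0, 19.0), zones, (0.0, 20.0, 0.0, 20.0))
    print("no path found" if path is None else f"{len(path)} waypoints")
```

Because each extension heads towards a random sample, the raw path zigzags; this is consistent with the smoothness comparison discussed next.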
A disadvantage of RRT is that it has difficulty finding paths in environments with narrow channels, because the small area of a narrow channel makes it unlikely to be hit by random samples. Narrow channels can, however, arise in UAV path planning, especially when air defense devices are densely arranged around key targets. As Figure 14 shows, the path planned by the ant colony algorithm is clearly smoother and closer to reality than that planned by RRT.
The improved A* algorithm combines the advantages of Dijkstra's algorithm and heuristic search. The priority of a node in the queue is the sum of the distance from the starting point to the node and the estimated distance from the node to the end point. Guided by this evaluation function, the heuristic search improves the efficiency of the algorithm and guarantees that an optimal path is found, provided that the estimated distance never overestimates the true remaining distance.
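The A* procedure described above can be sketched on an 8-connected grid in Python. Cells inside threat zones are simply treated as blocked, and the Euclidean distance to the goal serves as the estimated cost; because it never overestimates the true remaining cost on this grid, the returned path is optimal for the grid. The grid setup and names are hypothetical, and diagonal moves may cut past blocked corners in this simplified version.

```python
import heapq
import math

# 8-connected moves on the planning grid
MOVES = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def astar(size, start, goal, blocked):
    """A* search: priority f = g (cost so far) + h (Euclidean distance to goal).

    blocked: set of (row, col) cells treated as impassable (e.g. inside threat zones).
    Returns (cost, path) or None if the goal cannot be reached.
    """
    open_heap = [(math.dist(start, goal), 0.0, start)]
    best_g = {start: 0.0}
    came_from = {}
    closed = set()
    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue  # stale heap entry
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return g, path[::-1]
        closed.add(cur)
        for dr, dc in MOVES:
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in blocked:
                continue
            new_g = g + math.hypot(dr, dc)  # 1 for straight, sqrt(2) for diagonal
            if new_g < best_g.get(nxt, math.inf):
                best_g[nxt] = new_g
                came_from[nxt] = cur
                heapq.heappush(open_heap, (new_g + math.dist(nxt, goal), new_g, nxt))
    return None


if __name__ == "__main__":
    threat_cells = {(r, c) for r in range(3, 7) for c in range(3, 7)}  # hypothetical zone
    cost, path = astar(10, (0, 0), (9, 9), threat_cells)
    print(round(cost, 3), path)
```

Grid-based A* paths are restricted to 45° headings, which is one reason they can contain extra turns compared with a smoothed trajectory.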
Figure 15 shows that the A* algorithm also has good path planning ability. However, unnecessary turns occur in the section between threat zones 3 and 1 and in the section between threat zones 1 and 2, and the path just passes over threat zone 2, which increases the path length. Compared with the ant colony algorithm, A* does not obtain the optimal flight trajectory.
In comparison, the optimal trajectory planned by the ant colony algorithm is not only smooth but also the shortest path satisfying the constraints, which neither the RRT algorithm nor the A* algorithm achieves here. The ant colony algorithm does have a high computation time; as the software and hardware performance of UAV onboard computers improves, this problem can be mitigated.
4.8. Deficiencies and Future Improvement
4.8.1. 3D Path Planning with Terrain Constraints Is Not Considered
Because radar has very poor target recognition ability at low elevation angles, flying at low altitude helps the UAV avoid radar detection and further improves its survivability. Therefore, terrain height should be considered in designing the UAV's three-dimensional trajectory. Some UAVs already have terrain-matching navigation capability, which is well suited to track planning that considers terrain height.
Terrain height determines the minimum flight altitude of the UAV, and the lower the flight altitude, the better the penetration ability. Mountains can therefore also be regarded as threats, and a mountain threat modeling method can be adopted, for example
$$z(x, y) = \sum_{i=1}^{n} h_i \exp\left[-\left(\frac{x - x_i}{x_{si}}\right)^2 - \left(\frac{y - y_i}{y_{si}}\right)^2\right],$$
where $h_i$, $(x_i, y_i)$, and $(x_{si}, y_{si})$ are the peak height, peak center, and peak attenuation coefficients of the $i$-th peak in the peak simulation model, respectively. The higher the mountain, the greater the cost of flying over it; the lower the mountain, the smaller the cost. Similarly, the mountain height can be added to the state transition probability formula to complete three-dimensional path planning.
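A widely used exponential peak model for such terrain can be sketched in Python as follows; treating it as the model meant here is an assumption, and the peak parameters are hypothetical. The resulting height could then feed a terrain cost term in the same way the threat degree feeds the threat factor.

```python
import math


def terrain_height(x, y, peaks):
    """Exponential peak model (assumed form):
    z(x, y) = sum_i h_i * exp(-((x - x_i) / xs_i)**2 - ((y - y_i) / ys_i)**2)

    peaks: list of (h_i, x_i, y_i, xs_i, ys_i) = peak height, peak centre,
    and attenuation coefficients along x and y.
    """
    return sum(h * math.exp(-((x - xc) / xs) ** 2 - ((y - yc) / ys) ** 2)
               for h, xc, yc, xs, ys in peaks)


if __name__ == "__main__":
    peaks = [(1500.0, 20.0, 30.0, 6.0, 8.0), (900.0, 50.0, 45.0, 10.0, 5.0)]  # hypothetical
    for point in [(20.0, 30.0), (35.0, 38.0), (0.0, 0.0)]:
        print(point, round(terrain_height(*point, peaks), 1))
```

Larger attenuation coefficients give broader, gentler peaks; smaller ones give narrow, steep peaks.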
4.8.2. The Results of the Ant Colony Algorithm May Diverge as the Number of Iterative Steps Increases
This phenomenon occurs when there are few path constraints. Because of the randomness of the ants and the few constraints, multiple paths may approximate the optimal solution, as shown in Figure 16. Pheromone also accumulates on the same non-optimal paths, so the difference in path length is reduced or even ignored, and the solution cannot converge to the optimal solution.
In this case, the final solution can be stabilized at the optimal solution by continuing to increase the number of iterations. This explains why 200 iterations are required when there is no threat area, whereas 80 iterations are sufficient for path planning when threat zones are present.
4.8.3. The Ability of Path Planning under Sudden Threat Is Not Considered
In practice, the intelligence obtained in advance may differ from the actual situation, or the UAV's airborne early warning radar may detect new threats during flight. New track planning must then be carried out to avoid the threats and achieve penetration. To simulate a sudden threat during flight, path planning is first carried out with the initial intelligence, and then a threat point suddenly appears on the selected optimal route; the path replanning ability of the algorithm then needs to be investigated.
Analysis of the experimental results shows that the ant colony algorithm performs better when there are more path constraints. The more iterations, the more stable the path planning result; however, more iterations increase the computation time, and more ants make convergence harder. Therefore, the number of iterations and the number of ants should follow this principle: the number of iterations must be greater than the minimum number required for stable convergence, and the number of ants should be appropriate but not too large. When threat zones are present, the ant colony algorithm has very good path planning performance. Regarding parameter design, a large pheromone sensitivity factor accelerates iterative convergence and greatly reduces the total number of steps, but an overly large one makes the influence of the threat on path planning too small, so the constraints are violated. The visibility sensitivity factor and the threat factor should be neither too large nor too small; otherwise, the path planning result deteriorates. The volatilization factor should not be too large; otherwise, iterative convergence is slow. The smaller the volatilization factor, the faster the convergence and the better the path planning result. Therefore, the pheromone sensitivity factor $\alpha$, the visibility sensitivity factor $\beta$, and the threat factor $\gamma$ should be chosen moderately, and, subject to other constraints, the pheromone volatilization coefficient $\rho$ should be as small as possible. In addition, where possible, increasing the mesh fineness makes the generated trajectory clearer.
The improved ant colony algorithm greatly reduces the total number of steps and completes convergence faster for the same number of iterations. Compared with the RRT and A* algorithms, the optimal trajectory planned by the ant colony algorithm is not only smooth but also the shortest path satisfying the constraints. The experimental results show that pheromone significantly accelerates the convergence of the model. Because of the randomness of the ant colony algorithm, the simulation results differ from run to run; as the number of iterations increases, the path planning solution gradually stabilizes at the optimal value.
The raw/processed data required to reproduce these findings cannot be shared at this time because they also form part of an ongoing study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This work was supported by the National Natural Science Foundation of China (Nos. U1734208 and 61806212) and the Natural Science Foundation of Hunan Province (Nos. 2021JJ40693 and 2019JJ50724).
Z. Peng, J. Wu, and J. Chen, “Three-dimensional multi-constraint route planning of unmanned aerial vehicle low-altitude penetration based on coevolutionary multi-agent genetic algorithm,” Journal of Central South University of Technology, vol. 18, no. 5, pp. 1502–1508, 2011.
B. Siciliano and O. Khatib, Eds., Springer Handbook of Robotics, Springer Handbooks, Springer International Publishing, 2016.
W. Othman and N. Shilov, “Deep reinforcement learning for path planning by cooperative robots: existing approaches and challenges,” in Proceedings of the 28th Conference of Open Innovations Association FRUCT, pp. 349–357, Saint-Petersburg, Russia, 2021.