Theory and Applications of Complex Cyber-Physical InteractionsView this Special Issue
Research Article | Open Access
Jingchuan Wang, Ruochen Tai, Jingwen Xu, "A Bi-Level Probabilistic Path Planning Algorithm for Multiple Robots with Motion Uncertainty", Complexity, vol. 2020, Article ID 9207324, 16 pages, 2020. https://doi.org/10.1155/2020/9207324
A Bi-Level Probabilistic Path Planning Algorithm for Multiple Robots with Motion Uncertainty
For improving the system efficiency when there are motion uncertainties among robots in the warehouse environment, this paper proposes a bi-level probabilistic path planning algorithm. In the proposed algorithm, the map is partitioned into multiple interconnected districts and the architecture of proposed algorithm is composed of topology level and route level generating from above map: in the topology level, the order of passing districts is planned combined with the district crowdedness to achieve the district equilibrium and reduce the influence of robots under motion uncertainty. And in the route level, a MDP method combined with probability of motion uncertainty is proposed to plan path for all robots in each district separately. At the same time, the number of steps for each planning is dependent on the probability to decrease the number of planning. The conflict avoidance is proved, and optimization is discussed for the proposed algorithm. Simulation results show that the proposed algorithm achieves improved system efficiency and also has acceptable real-time performance.
In the running of multiple robots in warehouses, motion uncertainties happen unavoidably since there exist some disabled components of robots, the instability of networks, and the interference of people walking. Under this case, robots would not follow the designed paths and the coupled relationship between temporal and space domain for paths is broken. And there is no doubt that other robots are disturbed by the ones where motion uncertainties happen. Finally, the whole multirobot system would be influenced and this brings about chaos or even breakdown of the whole system. It can be seen that this poses a great challenge to the path planning of multiple robots in warehouses. Therefore, taking motion uncertainty into consideration in the path planning of multiple robots is an issue worthy of attention and research.
As a kind of method taking motion uncertainty into consideration for the discretized map, Ma et al.  designed the path planning algorithm which contains relevant restricted principles that an agent cannot enter the vertex that is occupied by some other agents at the last time step. The conflicts can still occur and this would cause the breakdown of the multirobot system in warehouses. Another kind of method which can deal with the issues caused by motion uncertainties needs to replan the paths of multiple robots when the samplings of robots contain abnormal information. Li et al.  proposed a two-stage trajectory planning scheme where a directed graph with cost variable connection is established and cost variable depends on the reachability, the risk of collision, and different state constraints. Gombolay et al.  presented a centralized algorithm that handles tightly intercoupled temporal and spatial constraints and scales to environment with large number of robots, which aims at improving the efficiency firstly. However, the above method cannot guarantee the real-time performance. In , the authors proposed a hybrid architecture which combines centralized coordination with distributed freedom of action to achieve an appropriate interplay. The architecture takes the new sampling for the state of robots at each time step, and path planning algorithm would be adopted to coordinate paths of multiple robots when sampling information containing motion uncertainties . However, the real-time performance cannot be guaranteed and the efficiency of the whole system is reduced due to too frequent planning. The algorithm introduced in [5, 6] based on generalized probabilistic roadmaps (GPRM) also coordinates paths of multiple robots under motion uncertainty by using the feedback information. In the algorithm, the passive planning strategy is introduced in the multiple traveling salesman problem (MTSP) solution methodology. However, the system efficiency is low and the scalability of the algorithm cannot be guaranteed because far more points need to be sampled. A novel method in  based on the sampling considering the motion uncertainty would propagate this delay. Robots’ paths need to be replanned adopting the Pareto-optimal plan repair scheme. However, the problem concerned with the decreased system efficiency is still not solved.
As a classical method devoted to solving the uncertainty problem, the Markov decision process (MDP) is introduced to the field of multiple agents. Corresponding multiagent MDPs (MMDPs) have been designed for multiple agent planning. In , approximating methods are introduced to be combined with MMDPs for the path planning problem of a team of homogeneous microair vehicles (MAVs) towards a set of goals. This algorithm can guarantee that there are no conflicts. Ma et al.  proposed the use of robust plan-execution policies to control how each robot proceeds along its path with a 2-level MDP solver that generates valid plans. To optimize the solutions, a novel optimal solver is designed in  for the problem of motion uncertainties in the transition-independent MMDPs. In , an equilibrium policy has been proposed in MMDPs. Based on the notion of macroactions, the path planning problem is transferred into the decentralized partially observable Markov decision processes (Dec-POMDPs) in [12, 13]. Multiagent reinforcement learning (MARL) [14–20] solves the learning task in multiagent systems (MASs). Nevertheless, the reasoning agent would deal with learning tasks which grow exponentially according to the number of agents. Therefore, the computational complexity of these methods based on MDPs is so high that the real-time scalability performance cannot be guaranteed.
Recently, the planning methods based on homotopy classes have been proposed to solve the problem of motion uncertainties. In [21, 22], the coordination space is established, which is the base for the later path planning. By selecting homotopy classes of the paths which are collision-free, a centralized controller is designed for the path planning problem under motion uncertainties. Furthermore, the encoding based on priorities, which is called the priority graphs, is introduced for the homotopic solutions to solve the coordination problem in . For [21–23], the planning method based on homotrophy classes can be proved that there are no conflicts for the coordinating paths and make sure that all robots will eventually reach their destinations even when some robots are temporarily stopped by a delay disturbance. However, the quality of solutions is decreased to a great extent since this kind of method only considers the next step of motions and all the robots would follow the previous planned paths.
Although the above methods ensure that conflict can be avoided among the planning of multiple agents under motion uncertainty, problem still exists. The efficiency of a multirobot system with motion uncertainty cannot be guaranteed. Therefore, the first objective of this paper is to design an algorithm which can plan conflict-free paths for multiple robots even when motion uncertainty happens. And another objective is to improve the system efficiency that includes the makespan of the whole system and the number of planning.
To attain these goals, in this paper, the map is partitioned into multiple districts and we propose a bi-level planning architecture that is composed of topology level and route level generating from above map.
In the topology level, the passing orders of districts are planned based on the weight of paths in topology map. And in this process, the weight changes dynamically according to the distribution of robots; the aim of planning in the topology level is to achieve the district equilibrium, which can in turn reduce the influence of robots under motion uncertainty and promote the system efficiency. In the route level, a MDP-based probabilistic method is proposed to plan path for all robots in each district separately. The MDP is combined with the probability of motion uncertainty to improve efficiency. And the number of steps for each planning is dependent on the probability to decrease the number of planning. We proved that the designed algorithm is conflict-free. And we also validated the system efficiency improvement of designed algorithm from the optimization discussion and simulation.
The remainder of this paper is organized as follows. Section 2 provides the problem statement. In Section 3, the path planning algorithm for multiple robots is introduced. Section 4 proves that the proposed algorithm is conflict-free and discusses the optimization. Simulation validation results are reported in Section 5. Section 6 concludes our research.
2. Problem Formulation
The problem that this paper attempts to solve is the path planning of multiple AGVs with motion uncertainty in warehouses. In this problem, warehouse environment is discretized and simplified as a graph whose vertices correspond to locations and whose edges correspond to transitions between locations. Figure 1 shows the discretized map.
Let be a set of robots. Time is discretized into time steps. To formally describe a plan, the path for a robot is a map which means that the location of in each time step belongs to the vertices . A robot can either remain stationary or move to an adjacent vertex. Robots are allowed to move simultaneously but restricted to move on the same edge in opposite directions or move at the same vertex simultaneously to ensure that there are no conflicts among multiple robots. Figure 2 gives the illustrations of these restricted conditions.
Let be a set of tasks which are generated continually. For each robot that has been assigned a task, it has a start position (pickup point) and a goal position (storage point). And the makespan of all the tasks is . During the running of robots, motion uncertainty, which means that robots have to stop immediately, could happen at any time. And the probability of the uncertainty is .
This paper assumes the following about the scheduling of multiple AGVs in warehouses:(i)All feasible routes are two-way bidirectional streets, which means that two robots can drive side by side and robots can drive in both directions on each feasible way.(ii)The warehouse is made up of shelves and feasible ways, which is shown in Figure 1 (not limited to this scenario).(iii)The locations of all the robots are known.(iv)The probability of the uncertainty is known.(v)The robots in the state of delay will return to normal. And after the recovery, these robots will be put into operation in the warehouse system.(vi)The task allocation is not taken into consideration in this paper.
In this paper, letters of the subscripts which are not in the parentheses are used for identification. Letters or numbers of the subscripts which are in the parentheses are used for index.
Therefore, the problem studied in this paper can be formulated in the following equationfd1:
3. The Bi-Level Path Planning Algorithm for Multiple Robots
In this section, the bi-level path planning algorithm is introduced. The aim of this structure is that district equilibrium would be taken into consideration in the topology level for reducing the influence of robots under motion uncertainty and distributed path planning in each district can improve system efficiency in the route level even when motion uncertainty occurs. The architecture of the algorithm is shown in Figure 3. It is divided into two levels generating from the map partition: topology level and route level. In the topology level, the algorithm plans the order of passing districts. In the route level, the algorithm plans paths for robots in each district separately, considering the motion uncertainty. And the introduction of map partition is as follows.
3.1. Map Partition
In this paper, the discretized map in Figure 1 is partitioned into multiple districts as shown in Figure 3. For the topology level, the generation of the topology map is listed as follows: (1) each vertex corresponds to district ; (2) each edge represents that districts and are connected. Let the horizontal edges and vertical edges be and . For the route level, all feasible ways of each district in the discretized map of Figure 1 are shown in the route map.
3.2. Topology Level
In the topology level, the first step of the planning problem introduced in Section 3 is to plan the passing orders of districts for the robots which have been assigned tasks. At each time step, there might be multiple tasks assigned to robots. The number of these tasks is and the robots corresponding to these tasks are . According to the task which has been assigned to the robot , the districts which contain the start position and goal position of are and , respectively (here and represent the index of the corresponding district). To formally describe the plan results, the passing order of districts for a robot is a map .
In this paper, is defined as the macroscopical time, which is the independent discrete variable in the topology level. With the increase of , the robot will move from the start district to the goal district . Different from the discrete time step, the macroscopic time represents the temporal space and corresponds to the district in topology level. Therefore, the change of represents the transfer of the robot in the topology level. And if districts and are different, then the corresponding edge that robot will pass in the graph is defined as followsfd2:
In this formulation, the edge represents the transition from to , which is the moving direction for the robot . For the robot which has been assigned the task, there is the corresponding macroscopical time and when reaches the start district and goal district , respectively. Then, the planning problem concerned with the districts can be formulated as follows:
In this formulation of the optimization objective, is the length of the edge , which is the corresponding distance between the center of the two districts. Due to the length difference of the edges in the graph , the durations of passing theses edges are different. In our problem, the length of the horizontal edge is twice that of the vertical edge. Therefore, the duration of passing the horizontal edge is twice that of the vertical edge. For the other environment map with different layout, the same principle applies. According to this principle, the passing order for the robot should satisfy the following:
Based on this principle, equation (2) can be rewritten as
In the optimization objective of equation (3), represents the crowdedness of the edge at macroscopic time (here we assume that and are different). And it is defined as follows:where and are the normalized crowdedness of the district and , respectively. And the crowdedness of each district is denoted as , which represents the number of robots in each district at the macroscopic time . It can be obtained by the passing orders for each robot. Furthermore, these values are changing dynamically since robots are moving. Then, the method of normalization can be adopted to calculate and .
Furthermore, due to the motion uncertainty of robots themselves, robots would not arrive at the planned district precisely. Therefore, at eachmacroscopic time, there are multiple robots which are assigned new tasks and need new path planning in current warehouse environment plan, the normalized crowdedness in equation (6) should be recalculated according to the distribution and passing orders of all robots in topology level. To be specific, assuming that one robot is located at district and will move to district when there are some new tasks and the planning is needed at the macroscopic time , if the edge corresponding to the transition between and belongs to , then this robot would still stay at this edge at next time in equation (6). If the edge belongs to , then this robot would move to next edge at macroscopic time .
Therefore, the optimization objective in equation (3) considers the length of the edges which robots would pass and the crowdedness of the related districts, whose aim is to make sure that the paths of robots would not be too long and the districts which have been planned for these robots would not be too crowded. And this design can guarantee the balance among different districts and efficiency of the whole warehousing system.
The constraints listed in equation (3) make sure that the algorithm can plan paths for robots from their start position to the goal position in topology level.
To solve the optimization problem in equation (3), K shortest path planning algorithm  is adopted to obtain the multiple shortest paths for each robot which needs to be planned at the moment. According to the planned paths, the passing order can be generated by using equations (4) and (5). The optimization objective can be calculated based on the rules introduced above, and the optimal solution among all the combinations can be selected, which contains the information concerned with the passing order for the robots assigned with new tasks in the topology level.
Figure 4 shows an example of the planning results in topology level. And it can be seen that the planning results follow the analysis above; the algorithm in the topology level prefers to choose the paths which are shorter and less crowded since this can minimize the optimization objective mentioned in equation (3).
3.3. Route Level
In the bi-level architecture, when the planning of the topology level has been finished, according to the distribution of the robots, the planning of each district would be conducted for the robots in this district distributedly.
In the route level, the route map is made up of all the feasible ways in each district. And the local route planning is to plan the actions of all the robots in the district being planned.
In the planning of each district, considering the motion uncertainty of robots and the probability , the planning step is introduced that should be intuitively related to ( not only refers to the planning step at each planning but also refers to the interval between each two plannings). The aim of this idea is that the planning step can be adjusted dynamically based on and some redundant planning in [11, 12] would not be needed anymore. Here, we define the planning step as the follows:
It can be seen from this definition that there is an inverse relationship between the planning step and the probability of motion uncertainty . When gets larger, which means that there is higher probability that robots would not follow the previous planned path, the planning step becomes smaller and the algorithm would plan paths for robots more conservatively. The first aim of computation for probability of motion uncertainty is to guarantee the real-time performance of the planning in route level since the traditional replanning all the paths from the current position to the goal position. However, the planning step is dependent on the probability of the motion uncertainty. Another aim of this is the planning step is adjusted based on , which can prevent the situation that frequent replanning might destroy the optimized planned paths.
In the route level, the planning of each district can be formulated as the following optimization problem:
In the formulation of this optimization objective, represents one policy and there are policies which are . Each policy contains the information concerned with actions of the robots at each time step in the district. And for each policy, where indicates the policy at the first time step for all the robots in the district being planned. And where indicates the policy at the first time step for the robot and is the number of the robots located in the district being planned. And represent the step policy of .
indicates the expectation of the reward which is similar to that of MDP. The aim of designing this objective function is that this can obtain the probabilistic optimal solution since there exists motion uncertainty of robots. is the discount factor. has relationship with as equation (9). The aim of equation (9) is that the discount factor can be adjusted dynamically based on and when is high; the long-term reward is not reliable anymore and the path planning based on MDP is concerned more on the optimization of total short-term reward.
represents the state from the start time step when the current district needs to be planned to the time step which is time steps after the start time step. And for , it contains states for all the robots in the district, which means that . The state indicates the state for the robot after the action is taken. defines the reward for each state based on the distribution of the robots in the corresponding district.
Therefore, according to the Markov decision process in , the optimization objective in equation (8) implies the expectation of the rewards for the state from the start time step to after time steps. And it can be rewritten as follows:where indicates the possible state after the action is taken at the state since there exists the motion uncertainty of robots, represents the reward of the robots in this district at the state , and is the probability that the state of robots in this district would turn to when the action is taken at the state . The probability can be calculated as follows:where is the state of the robot after the action is taken at state and is the probability that the state of the robot will change into after the action is taken at state . And this probability is dependent on the motion uncertainty and can be represented as follows:
For the reward of the policy , it is related to the crowdedness of the district and the distances among robots in the district, which can be defined as follows:where is the normalized crowdedness of the district being planned when the policy is taken. And the crowdedness of the district being planned can be calculated as follows:where is the distance between and at state . Then, the crowdedness needs to be normalized by using the principle of the softmax according to the following formula:
In equation (13), indicates the value of the robot at state . It is defined as the distance between the robot and the next district which heads to since the passing order has been planned in the topology level. The rule for calculating this distance is based on the Manhattan distance.
To avoid the conflicts among multiple robots, the factor is introduced. And in equation (13) is defined as follows:
It can be seen from equation (16) that when there would be a conflict between robot and , is set at ∞. When a robot would move to the location at next time step where another robot is staying at for now, then is set at . Otherwise, is set at 1.
According to equations (13) to (16), the first aim of this design is to prevent that each district would be too crowded since this would cause the coordination space to be too small, which means that once there happens motion uncertainty in some robots, it would be hard for the algorithm to find the solution and the system efficiency would be decreased. Another aim for the set of is to prevent the conflicts and make sure that one robot would not choose the action which might cause it to be next to another one. And this is beneficial for the motion coordination of robots since the motion uncertainty of one would cause it to still stay at the previous location, which increases the probability of the conflicts. Therefore, this setting can decrease the probability of the situation like that.
To solve the optimization problem in equation (8), the policy set needs to be generated in the district and the policy with the minimum objective would be selected as the solution. And the principle for the candidate action set of the policies is that robots would not be far away from the next district after taking this action. There are two benefits for this principle. Firstly, the farther the distance from the next district, the larger the objective function in equation (8). Secondly, due to the introduction of this principle, the candidate actions will decrease, which can in turn lower the computational complexity. According to above principle, there are two kinds of candidate action sets when robot is in different locations of each district:(1)When robot is in any location of district and not coming into the boundary, the candidate action set of robot is denoted as . The action means that robot drives to the other side of two-way street at next time step, means that robot follows current side of street, and means that robot stays still at next time step. For example, in Figure 5, robots and are in the district and not coming into boundary at next time step. Therefore, their candidate action set includes , , and as orange arrows show. And as shown in Figure 5, the boundary between two districts is defined as an area composed of four grids, and one half of the boundary belongs to one district and the other belongs to the neighboring district. Considering that path planning in the route level is conducted separately in each district, robot should only follow one direction (drive to the right) in the process of traversing the boundary to the neighboring district, for avoiding the conflict.(2)When robot is in the boundary of district or coming into the boundary, the candidate action set of robot is denoted as or according to the rule that robot should only drive to the right in the process of traversing the boundary. For example, robot in Figure 5 is coming into the boundary at next time step and its candidate action set is which means that it needs to drive to the right side of two-way street firstly. and are in the boundary and their candidate action set is .
According to the above candidate action set, the generation process of robot and policy is shown in Figure 6. And it can obtain multiple policies for each robot. Then, can be generated with the combination of multiple policies of each robot in the district.
4. Discussion and Analysis
4.1. Conflict Avoidance
In this paper, we propose a bi-level path planning algorithm. The architecture of the proposed method is divided into topology level and route level. And in the topology level, the passing order of districts is planned for robots. There is no conflict in topology level because the districts are allowed to traverse for multiple robots at the same time. Therefore, the conflict avoidance of the proposed algorithm in the paper mainly depends on whether or not it could plan conflict-free path in route level. And we try to prove it in the following proposition.
Proposition 1. The path planning in route level could plan conflict-free paths for all robots in the district.
Proof. In route level of the proposed method, the path for robots in the district is planned for many times and steps with MDP at each time. Therefore, we try to use mathematical induction to prove the conflict-free path can always be planned by the proposed method in the district at each time. And we firstly describe some symbols of some variables to facilitate the expression of proof.
For any district with m robots, the route map can be denoted as . And robots in the district can be denoted as . represents the th path planning in each district. In the th planning: the beginning of time step is denoted as , and the planning step is . The solution with MDP is and planned path of for any robot is . The proof detail is as follows:(a)Firstly, when , prove that the path planning is conflict-free: At the first time of path planning, robots are leaving the boundary in each district. These robots of each district are heading to different directions and located in the different sides of two-way street in the map. Therefore, there always exists one solution that makes sure that all robots go on with their current side of two-way street in the time steps and no robots from the opposite direction run in the same side of two-way street. And the planned paths for robots meet equation (17). It can be concluded that reward of the above solution according to equation (16) will not be and could be feasible solution of the optimization objective, which could prove that the proposed method could plan conflict-free path for all robots when .(b)Secondly, when , suppose that the planned path in the th planning is conflict-free and prove the planned path in the th planning is also conflict-free: Considering that the path planning in the th planning is conflict-free, the ending location of planned path for any robot satisfies equation (17), which means that there does not exist the situation of direction conflict and location. On the other hand, the ending location of the th path planning is equivalent to the beginning location of the th path planning. If there exists where represents the next location from with the action of going up and represents the next location from with the action of going left, equation (18) can be concluded. It means that each robot can go left or up without conflict with other robots. Therefore, it can construct a candidate solution that makes robots heading to the same direction run in the same side of two-way streets in the steps of path. And when robots heading to the different direction run in the same side of two-way street, the planned path will not have conflict with other robots. Therefore, it can be proved that the planned path in the th planning is conflict-free.and when robots go to the boundary in the last time planning, considering that the robots always follow one direction when they are coming into boundary or traversing on the boundary, the district-distributed way of the proposed method in route level will not have conflict between robots from different districts.
To summarize, there always would be conflict-free solution for path planning in the route level with the proposed method.
For a multirobot system with each robot subject to uncertainty, directly computing the optimal solution is very difficult and it must recompute many times during the running of robots. Therefore, it is hard to directly prove the optimization or near optimization. In this section, we will discuss the optimization of the proposed method. Considering that the architecture of proposed method in this paper is divided into two levels and paths are planned separately, we discuss the optimization of proposed method from following two parts:(1)For path planning in topology level: in the topology level, the optimal orders of districts for all robots can be obtained and the reason is as follows. Firstly, K shortest path planning algorithm  is adopted to obtain the multiple orders of passing districts for each robot. And combinations of orders of passing districts among robots can be generated. When K is large enough, all possible combinations of orders of passing districts can be obtained. Secondly, equation (3) is put up with in the topology level, which aims to minimize the corresponding distance between the center of the two districts and total crowdedness between districts among all the robots assigned with new tasks. And above optimization objective can be calculated based on all possible combinations of orders of passing of districts for robots.(2)For path planning in route level: the proposed method takes path planning for all robots in each district for many times and plans paths of steps for the robots at each time until all robots leave the district. So, it is difficult to directly prove the optimization. In this part, we compare above method with existing method in  to discuss the optimization of the proposed method. Similar to the proposed method, the existing method also takes path planning in each district for many times. The difference is that at each time planning, the existing method plans the path of static steps with the MDP method and executes the first steps of them. The optimization objective is shown in equation (19). can make sure that the destination of planned path is the location where robots are leaving from the current district. In the existing method in , is also intuitively related to following equation (7). The existing method in  has proved the optimization.
For discussing the optimization of proposed method, we try to discuss the similarity of first α steps for solution at each time planning compared with existing method in  in the following two situations:(a)When the motion certainty probability is low : In this situation, is close to according to (7). And it is obvious that the first steps of solution from the proposed method and existing method in  are close to the same. For example, Figure 7 shows a situation where three robots are located at different states, and Table 1 shows the different solutions of path planning for the situation in Figure 7 using above two methods when is different ( is set to 10 for the existing method in ). As shown in Table 1, when is 0.1 or 0.2, first steps of the proposed method are the same as the existing method in .(b)When the motion certainty probability is high (): In this situation, consider that both the proposed method and existing method in  choose the MDP method where the discount factor exists in optimization objective and has relationship with according to equation (9). Therefore, when is high, in (19) will be lower. This means the last steps of solution have little influence on the computation for equation (19). And in this situation, the first steps of the proposed method are close to the existing method in . For example, in Table 1, when is close to according to (19), it is obvious that the solution from the proposed method is close to first α steps of solution from the existing method in . Figure 7 shows a situation where three robots are located at different states, and Table 1 shows the different solutions of above two methods when is different. As shown in Table 1, in the example of Figure 7, when is 0.4 or 0.5, although the solution of above two methods is not exactly the same, the solution from the proposed method is close to the first steps of solution from the existing method in .
5. Simulation Result and Analysis
In order to evaluate the proposed method, simulations were implemented under the Matlab environment with Intel 3.60 GHz Core i7-4790U CPU and 8G RAM. The map of warehouse is shown in Figure 1.
The motion uncertainty ranges from 0 to to simulate the situations under different motion modes. The number of robots is set at 10, 20, 30, 40, and 50. The number of tasks is 1000.
The makespan, number of planning steps, and computation time are contrasted with those of the existing method in [21, 22] under a different number of robots. And for the existing method in [21, 22], the probability corresponding to the disturbance intensity whether during the following second the robot will be prevented from moving is set to and the lower bound which represents the best possible travel time is set to the shortest distance from start to end for each robot. The average of 20 repeated simulations is shown in Tables 2–4. The tasks have been generated before simulations.