Research Article  Open Access
Multiagent Task Allocation in Complementary Teams: A Hunter-and-Gatherer Approach
Abstract
Consider a dynamic task allocation problem in which tasks are distributed over an environment at locations unknown to the agents. This paper considers each task as comprising two sequential subtasks, detection and completion, where each subtask can only be carried out by a certain type of agent. We address this problem using a novel nature-inspired approach called “hunter and gatherer.” The proposed method employs two complementary teams of agents: one agile in detecting tasks (hunters) and another skillful in completing them (gatherers). To minimize the collective cost of task accomplishment in a distributed manner, a game-theoretic solution is introduced to couple agents from complementary teams. We utilize market-based negotiation models to develop incentive-based decision-making algorithms relying on the new notions of “certainty and uncertainty profit margins.” The simulation results demonstrate that employing two complementary teams of hunters and gatherers can effectively increase the number of tasks completed by agents compared to conventional methods, while the collective cost of accomplishment is minimized. In addition, the stability and efficacy of the proposed solutions are studied using Nash equilibrium analysis and statistical analysis, respectively. It is also shown numerically that the proposed solutions function fairly; that is, for each type of agent, the overall workload is distributed equally.
1. Introduction
Multirobot systems are expected to undertake important roles in a wide variety of fields such as urban search and rescue (USAR) [1, 2], agricultural field operations [3], security patrols [4, 5], environmental monitoring [6], and industrial procedures [7]. Studies have shown that multirobot systems have an advantage over single-robot systems, offering more reliability, redundancy, and time efficiency when the tasks are inherently distributed [8]. Nonetheless, the problem of multirobot task allocation (MRTA) poses many critical challenges that have called for investigation over the past two decades [9, 10]. In particular, the complexity of MRTA problems increases significantly in a dynamic environment, where the number and location of tasks are unknown to agents [11, 12]. Thus, robots need to explore the environment to find tasks before accomplishing them. In real-world problems, any robot designated to perform one of the tasks in [1–6] needs to be sufficiently dexterous, i.e., adequately equipped for physical tasks such as object manipulation or rubble removal in a rescue mission [13], which inevitably makes the robot relatively heavy and incapable of agile exploration. Consequently, the dynamic problem can be turned into a problem where each task is composed of sequential subtasks, each of which can be done only by a certain type of agent. In this case, for each type of subtask, a robot team of the appropriate type must be employed. This poses an unexplored MRTA problem, and the coupling and cooperation between those complementary teams are the motivation of this work.
In the context of MRTA, notable attention has been devoted to revealing various aspects of dynamic problems [14–17]. For instance, Lerman et al. [18] present a mathematical model of a general dynamic task allocation mechanism where robots use only local sensing and no direct communication between them is needed for task allocation. Disregarding communication between agents is a deficiency, because the information exchanged by the agents plays a crucial role in the functionality of a decentralized multiagent system in a dynamic environment. In this regard, Liemhetcharat and Veloso [19] introduce a novel weighted synergy graph model to capture interactions among agents. In this model, agents work together on a task through communication, where the weight of an edge indicates the communication cost between agents.
In contrast to the way that Liemhetcharat and Veloso [19] utilize communication among agents, other works employ communication to frame negotiations among agents. For instance, Chapman et al. [20] pursue a decentralized game-theoretic approach in which planning is achieved via negotiation between agents. Although the results show that their approach is robust to restrictions on the agents’ communication and observation range, it does not allow agents to have differing costs for performing the same task, which makes it inapplicable to a wide variety of real-world problems. In contrast, Michael et al. [21] propose a distributed market-based coordination algorithm to address the dynamic MRTA problem, in which agents bid for task assignments based on each agent’s cost for accomplishing tasks. However, whereas tasks in real-world dynamic MRTA problems are not fully observable by all agents, the authors assume that the agents know all tasks at any time. This assumption is too strong and does not fully reflect a dynamic environment. In this regard, Sariely and Balch [22] consider a truly dynamic environment, present a real-time single-item auction-based task allocation method for the multirobot exploration problem, and investigate new bid evaluation strategies.
While the works reviewed above [20–22] present different approaches to the dynamic MRTA problem, none of them considers the agents’ capabilities when developing the assignment algorithm. On this subject, Wu et al. [23] take agents’ capabilities into account in order to form teams by developing a novel market-based task allocation method based on the Gini coefficient. Although the authors demonstrate that the proposed method can effectively improve the number of tasks completed by a robot system, the effect of cooperation and coupling between the formed teams remains uninvestigated. In a similar effort, Shiroma and Campos [24] model agents’ capabilities as actions and utilize a single-round auction to form teams and then coordinate agents of the same team. Here too, coupling and cooperation of the formed teams are left unexplored, though the developed framework successfully resolves the required allocation issues. Likewise, Prorok et al. [25] model the multirobot system as a community of species considering agents’ capabilities and then present decentralized and centralized methods to efficiently control the heterogeneous teams of robots, without regard to interaction and collaboration between those teams. Given the review above, little attention has been paid to the cooperation and coupling between robot teams, formed based on agents’ capabilities, in addressing a dynamic MRTA problem.
As discussed earlier, the number and location of tasks are unknown to agents in a dynamic environment [11, 12]. In this case, robots need to explore the environment to find tasks before accomplishing them. Since tasks in real-world problems such as a rescue mission usually require immense effort to complete, a suitable robot needs to be equipped with various sensors and devices, much more complex mechanisms, and a higher number of actuators. As a consequence, the robots inevitably become heavy and ponderous and cannot explore the environment agilely and efficiently. Motivated by this complexity, this paper proposes a nature-inspired approach called “hunter and gatherer” which employs two teams of robots: a team of agile robots that can quickly explore an environment and detect tasks, called “hunters,” and a team of dexterous robots that accomplish detected tasks, called “gatherers.” In effect, we turn a dynamic MRTA problem into a problem where each task is composed of two sequential subtasks: exploration and completion. When there are synchronization and precedence (SP) constraints, which specify an ordering constraint for time-extended assignment (TA) problems [26], the MRTA is referred to as a TA:SP problem [27]. To the best of the authors’ knowledge, the MT-MR-TA:SP problem has not been tackled in the literature so far, although it is a ubiquitous problem in a wide variety of fields such as USAR and agricultural field operations.
Consider USAR at a disaster site in which a number of victims are stranded in unknown locations and need immediate rescue. Each victim is a task that needs to be detected first and then rescued by an operation that typically involves several dexterous actions, such as providing logistics support, rubble removal, object manipulation, and in situ medical assessment and intervention [13]. To accomplish those tasks, a rescue robot needs a heavy-duty manipulator, high-power actuators, a tracked locomotion mechanism, high-capacity batteries, and many sorts of sensors, cameras, and communication devices, which make the robot relatively heavy, ponderous, and incapable of agile search operations. Hence, let us consider that each task comprises two sequential subtasks, detection and completion, where each subtask can only be carried out by a certain type of robot. Thus, the case becomes an ST-MR-TA:SP or MT-MR-TA:SP problem. In the USAR example, hunters can be a group of small and lightweight unmanned aerial vehicles (UAVs) that search the site to locate victims, and gatherers can be a group of maxi-sized [13] heavy-duty unmanned ground vehicles (UGVs) that rescue detected victims relying on their dexterity.
Based on the proposed hunter-and-gatherer scheme, we present a game-theoretic solution that considers coupling and cooperation between complementary agents divided into different teams by (1) utilizing market-based negotiation models, namely, auction [28–30] and reverse auction, and (2) introducing decentralized incentive-based decision-making algorithms. The proposed algorithms rely on the new notions of certainty and uncertainty profit margins (CPM and UPM) that, respectively, determine the levels of confidence and conservativeness of each agent in negotiations to minimize the collective cost of task accomplishment. To enhance the effectiveness of the proposed algorithms, a multitask-planning algorithm is developed for gatherer agents that enables them to queue multiple tasks in their action plan and thus find the optimal solution for completing a group of tasks rather than completing them one by one. We show that employing two complementary teams of hunters and gatherers can effectively increase the number of tasks completed by agents, while the collective cost of accomplishment is minimized. Moreover, the stability and efficacy of the assignment algorithms are demonstrated by a Nash equilibrium analysis and simulation experiments, respectively. In addition, we investigate the distribution of workload, i.e., the total costs and accomplishments of a mission, among agents and show that the proposed algorithms function fairly; that is, for each type of agent, the overall workload is distributed equally, and all agents of the same type behave analogously under similar characteristics.
The remainder of this paper is organized as follows: The problem statement and formulation are presented in Section 2. In Section 3, the methodology including conceptual frameworks, reasoning mechanisms, and algorithms is proposed. Nash equilibrium analysis is carried out in Section 4. In Section 5, statistical analysis on simulation results is presented followed by a concluding discussion in Section 6.
2. Problem Statement
In this section, the problem of hunter-and-gatherer mission planning (HGMP) in the context of dynamic MRTA is explained. Assume that there are tasks distributed randomly over the environment, E. We consider a case in which the number and the locations of tasks are unknown to agents before the execution of the HGMP. The set of tasks is denoted as in which each task is split into hunting and gathering subtasks, i.e., with , where and represent hunting and gathering subtasks, respectively. In this case, the set of agents is defined as that comprises two teams of hunters and gatherers , where and . The cost associated with for accomplishment of is denoted as , and is the cost associated with for accomplishment of .
2.1. Assumptions
Throughout this paper, the following are assumed:
(1) Tasks are stationary; that is, they are fixed to their locations.
(2) The cost of accomplishing a task is equal to the distance that an agent travels to do the task. An agent is considered done with a task when it reaches the task’s location.
(3) All agents of the same team are identical.
(4) All agents are rational; that is, they intend to maximize their expected utility.
(5) All agents are fully autonomous and have their own utility functions; that is, no global utility function exists.
(6) Agents from complementary teams can communicate with each other using a stably connected network.
Now, the HGMP problem can be stated as follows: Suppose that there exists a tuple for the mission such that HGMP = (E, α, Τ). denotes the assignment function which assigns m tasks to agents such that . Under assumptions (1)–(6), the global objective is to minimize the collective cost of :where and are binary decision variables for and :
In (1), weighting parameters and are introduced to sum the relative collective costs of the complementary teams, accounting for the physical differences between the agent types (Table 1).

This problem has a global objective which can be achieved by determining the binary decision variables optimally. These variables need to be determined by the agents throughout explorations and negotiations in a distributed manner. Since agents are rational, each agent’s objective is to maximize its own expected utility. As a consequence, the objectives of agents may be conflicting during the HGMP. Hence, the methodology should be developed so that it handles these conflicts in order to maximize the effectiveness of the HGMP and achieve the global objective .
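As a minimal sketch of the weighted objective described above, the collective cost can be computed as a weighted sum of the costs of all assigned hunting and gathering subtasks. The names `hunt_cost`, `gather_cost`, `x`, `y`, `w_h`, and `w_g` are hypothetical stand-ins for the cost terms, binary decision variables, and weighting parameters, whose original symbols are not recoverable here.

```python
from typing import Sequence


def collective_cost(
    hunt_cost: Sequence[Sequence[float]],
    gather_cost: Sequence[Sequence[float]],
    x: Sequence[Sequence[int]],
    y: Sequence[Sequence[int]],
    w_h: float = 1.0,
    w_g: float = 1.0,
) -> float:
    """Weighted sum of the costs of all assigned subtasks.

    hunt_cost[i][k]   : cost for hunter i to detect task k (assumed layout)
    gather_cost[j][k] : cost for gatherer j to complete task k (assumed layout)
    x[i][k], y[j][k]  : 1 if the subtask is assigned to that agent, else 0
    """
    hunting = sum(
        x[i][k] * hunt_cost[i][k]
        for i in range(len(x))
        for k in range(len(x[i]))
    )
    gathering = sum(
        y[j][k] * gather_cost[j][k]
        for j in range(len(y))
        for k in range(len(y[j]))
    )
    return w_h * hunting + w_g * gathering
```

In the distributed setting of the paper no single agent evaluates this quantity; a function like this is only useful for scoring a finished mission in simulation.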
3. Methodology
3.1. Conceptual Frameworks
Hunters are assigned to explore the unknown environment. There is as the incentive reward for a hunter, denoted as , who detects a task, denoted as . However, the detected task can only be completed in cooperation with a gatherer. Thus, an extra incentive, denoted as , is added to motivate agents from complementary teams to cooperate. Hunters and gatherers engage in negotiation processes to reach agreements for completing the tasks and sharing between themselves. A negotiation involves a hunter who has detected a task on one side and one or more gatherers on the other side. An agreement determines which gatherer is assigned to complete the detected task and how large its share of is. Let us denote and as the proportions that and receive from for accomplishment of , respectively. Also, the gatherer who completes the detected task receives as a gathering incentive when the task is completed. Since all agents are rational, they intend to maximize their incentives by accomplishing more tasks through building up more cooperation.
To establish the process by which agents come into an agreement, we define an online board on which each hunter announces the location of its new detection to find gathering partners for starting a negotiation process. Each gatherer follows the announcements on the online board to choose a waiting hunter for negotiation by analyzing the location information shared by each waiting hunter. A gatherer then sends a readiness message to the chosen hunter to start a negotiation.
We consider two possible scenarios in order to develop reasoning mechanisms for agents to negotiate and cooperate: (1) a waiting hunter receives only one readiness message and (2) the waiting hunter receives more than one readiness message. The first scenario resembles a bargaining or reverse auction process, as there is only one buyer who aims to bargain for the most affordable option. The second scenario is similar to an auction process, where usually more than one buyer is interested in a specific object. We utilize these two market-based methods as negotiation models between negotiating agents. In addition, the number of waiting hunters on the board, denoted as , may be more than one. In this case, the question of how a gatherer chooses a hunter among the waiting agents is addressed in Section 3.5. For the time being, we assume that gatherers already know how to choose a partner and we focus on the negotiation reasoning mechanisms.
Fundamentals of reasoning mechanisms are discussed in the next section, and next, we will explain how agents rely on their reasoning mechanisms to behave in the reverse auction and auction scenarios in Sections 3.3 and 3.4, respectively.
3.2. Reasoning Mechanism
In this section, reasoning mechanisms for both hunters and gatherers are developed to establish their behavior during a mission, which determines the way that they communicate, negotiate, and cooperate. Since the fundamentals of the reasoning mechanisms are similar for both types of agents, for the sake of brevity, we consider a general agent defined as with , where .
We now introduce the CPM and UPM for . The CPM is a circular margin with a radius of , in which is certain about a profitable agreement even if its share in is zero. The UPM is a margin between two concentric circles with radii of and , in which is uncertain about making a profit in an agreement; that is, its profit strongly depends on its proportion of . Furthermore, cannot make any profit beyond its UPM even if it receives entirely.
Figure 1 shows the CPM and UPM as two concentric circles with as the center. The agent compares its cost for accomplishing the task with its CPM and UPM to realize its state to make profitable decisions during the negotiation.
The following statements explain the states of with respect to its cost for accomplishing :(i)State 1: if , then can make a profit regardless of its proportion of (ii)State 2: if , then the profit of depends on its proportion of (iii)State 3: if , then cannot make any profit even if it receives all of
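The three states above amount to comparing the agent's cost with the two radii. A minimal sketch follows, assuming hypothetical names `r_cpm` and `r_upm` for the CPM radius and the outer UPM radius; which comparisons are strict is also an assumption, since the original inequalities are not recoverable here.

```python
def classify_state(cost: float, r_cpm: float, r_upm: float) -> int:
    """Return 1, 2, or 3 according to where the cost falls relative to the margins.

    Assumed boundaries (not taken from the source):
      state 1: cost <= r_cpm          (profit regardless of share)
      state 2: r_cpm < cost <= r_upm  (profit depends on share)
      state 3: cost > r_upm           (no profit even with the full share)
    """
    if not 0.0 <= r_cpm <= r_upm:
        raise ValueError("expected 0 <= r_cpm <= r_upm")
    if cost <= r_cpm:
        return 1
    if cost <= r_upm:
        return 2
    return 3
```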
We formalize the CPM and UPM for by defining and . If accomplishes , then it receives as an incentive. Since is certain about receiving , we have . Hence, we define by introducing as a scaling parameter for the CPM:
In addition, receives a proportion of that will be determined by the negotiation process, so is uncertain about its share of . Thus, we define by introducing as a scaling parameter for the UPM:
Altogether, for involved in a negotiation, the utility function defined below determines its payoff.
Definition 1 (utility function). gives the profit earned by for accomplishing and building up a cooperation. The utility function of is defined as
Now, we define a profit interval for , regarding its state for accomplishing , by which it evaluates its results in a negotiation. A profit interval is an interval for that guarantees the negotiation’s profitability. According to assumption (4), wants to maximize its payoff, so in each negotiation, definitely makes a nonnegative profit such that
This can be written as
The overlap of and (7) yields the profit interval for . The overlap in all three states is expressed as follows: If is in state 1, then and the left side of (7) is negative. Hence, the overlap is ; that is, in state 1, makes a profit regardless of its share in . If is in state 2, then we have . Therefore, the overlap gives . And if is in state 3, then we have , so the left side of (7) is greater than one. Hence, the overlap of (7) and is a null set; that is, the task is not profitable. Accordingly, the profit interval of for accomplishing is defined as , where and denote lower and upper bounds, respectively.
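The profit interval can be sketched numerically under one explicit assumption: the utility is linear in the agent's share, so the break-even share interpolates linearly between the CPM radius (break-even at zero share) and the outer UPM radius (break-even at full share). The names `r_cpm` and `r_upm` are hypothetical, and this linear form is an illustration consistent with the three cases described above, not necessarily the exact expression in (7).

```python
from typing import Optional, Tuple


def profit_interval(
    cost: float, r_cpm: float, r_upm: float
) -> Optional[Tuple[float, float]]:
    """Return (lower, upper) bounds on the profitable share, or None if unprofitable.

    Assumes utility(p) = r_cpm + p * (r_upm - r_cpm) - cost for a share p in [0, 1].
    """
    if r_upm <= r_cpm:
        raise ValueError("the outer UPM radius must exceed the CPM radius")
    # Smallest share with non-negative utility, before intersecting with [0, 1].
    break_even_share = (cost - r_cpm) / (r_upm - r_cpm)
    if break_even_share > 1.0:
        return None  # state 3: even the full share cannot cover the cost
    # A negative break-even share (state 1) is clipped to zero.
    return (max(0.0, break_even_share), 1.0)
```

For example, with `r_cpm = 5` and `r_upm = 15`, a cost of 10 needs at least half of the shared incentive, while a cost of 20 yields no interval at all.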
3.3. The First Scenario: Reverse Auction
Consider the scenario shown in Figure 2(a) and suppose that has detected at the cost of , it has posted an announcement on the online board, and it receives a readiness message only from . This message is a request for a quotation message; that is, offers a proportion for sharing and decides to accept or reject the offer. Accordingly, we explain how makes offers and makes an acceptance or rejection decision, using the proposed reasoning mechanisms.
Figure 2: Negotiation scenarios: (a) reverse auction; (b) auction.
According to the process illustrated in Figure 3, calculates the lower bound, the midpoint, and the upper bound of its profit interval for making 3 offers. Since is making offers to , it should send offers using , as follows:
According to Figure 3, at the first decision node, makes an offer regarding the explained process, and then at the second step, decides to accept or reject the offer. Algorithm 1 illustrates the bargaining procedure for . In this algorithm, denotes the gatherer agent that has sent the readiness message. In line 3 in Algorithm 1, is assigned to the vector “offers” in a random order.

Besides, uses its own profit interval to make an acceptance or rejection decision. For each received offer made by , if the offer is inside , then accepts the offer. Otherwise, it rejects the offer.
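The offer and acceptance logic of this scenario can be sketched as follows. Two points are assumptions rather than statements from the source: the hunter's and gatherer's shares are taken to sum to one, so an offer to the gatherer is one minus the hunter's own share, and the hunter is taken to try its three offers one after another in random order until one is accepted. Function and parameter names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def hunter_offers(hunter_interval: Interval) -> List[float]:
    """Gatherer shares implied by the lower bound, midpoint, and upper bound
    of the hunter's own profit interval (assumes shares sum to one)."""
    lower, upper = hunter_interval
    hunter_shares = [lower, (lower + upper) / 2.0, upper]
    return [1.0 - share for share in hunter_shares]


def reverse_auction(
    hunter_interval: Interval,
    gatherer_interval: Interval,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Return the accepted gatherer share, or None if every offer is rejected."""
    offers = hunter_offers(hunter_interval)
    (rng or random).shuffle(offers)  # offers are tried in random order
    g_lower, g_upper = gatherer_interval
    for offer in offers:
        # The gatherer accepts any offer inside its own profit interval.
        if g_lower <= offer <= g_upper:
            return offer
    return None
```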
3.4. The Second Scenario: Auction
Consider the auction scenario shown in Figure 2(b) and suppose that has detected at cost , it has posted an announcement on the online board, and it receives readiness messages from gatherers. In this case, holds an auction and selects the winner, where gatherers bid for sharing to win the detected task and complete it. Accordingly, both types of agents’ reasoning mechanisms need to be investigated.
We utilize the “second-price sealed-bid auction” as the negotiation framework, in which the winning bidder is the agent who has placed the highest bid and pays an amount equal to the second highest bid to the hunter holding the auction. In this auction, , a gatherer bidding in the auction, can bid its valuation. Since it will not pay as much as it bids if it wins, still has a chance to get a positive benefit from the auction. Therefore, the main advantage of the second-price auction over the first-price auction is that truthful bidding is an optimal strategy in a second-price auction, and as a result, the auction is ensured to converge to an optimal solution. Truthful bidding means that a bidder bids exactly as much as it values the object [31]. To that end, we first explain how bids using its profit interval and then discuss the way that chooses the winning bidder. bids its valuation, which is the lower bound of its profit interval. Since is making an offer to by bidding, it should send the bid using , as follows:
Besides, chooses the winning bidder after a single round of bidding. Firstly, chooses the winning bidder, , as the one with the maximum bid in the set of bids, denoted as b, such that
Secondly, checks if the second highest bid satisfies the minimum acceptable bid determined by its profit interval. Since the share of in must satisfy (6), the minimum acceptable bid is the lower bound, , of its profit interval such that
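The single-round auction described above can be sketched as follows. As an assumption, each gatherer's bid is expressed as the share it leaves to the hunter, that is, one minus the lower bound of the gatherer's own profit interval; the names `gatherer_lower_bounds` and `hunter_lower_bound` are hypothetical.

```python
from typing import Dict, Optional, Tuple


def run_auction(
    gatherer_lower_bounds: Dict[str, float],
    hunter_lower_bound: float,
) -> Optional[Tuple[str, float]]:
    """Second-price sealed-bid auction over shares of the cooperation incentive.

    Returns (winner, hunter_share) or None if the hunter rejects all bids.
    """
    if len(gatherer_lower_bounds) < 2:
        raise ValueError("the auction scenario needs at least two bidders")
    # Truthful bid: offer the hunter everything above the gatherer's own minimum.
    bids = {name: 1.0 - lb for name, lb in gatherer_lower_bounds.items()}
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    second_price = bids[ranked[1]]  # the winner pays the second highest bid
    # The hunter accepts only if the price meets its own minimum acceptable share.
    if second_price < hunter_lower_bound:
        return None
    return winner, second_price
```

In this sketch the winner keeps `1 - second_price` of the shared incentive, which is never less than its own lower bound because its bid was the highest.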
3.5. Multitask Planning
In Figure 4, suppose that has detected . is the only gatherer agent that can send a readiness message to start a negotiation with because is busy with gathering . In this case, is an inefficient plan where .
Alternatively, if were able to plan for multiple tasks at a time, it could gather at a lower cost. Accordingly, to prevent such ineffective planning in the HGMP, a multitask-planning algorithm for gatherers is proposed in the following.
Let us define an action plan in which queues multiple tasks to accomplish in the future such that , where with as the maximum number of tasks that gatherers can queue. Each task has a profile in the gatherer’s action plan containing required information: , where denotes the temporary cost calculated by the agent for . through are the tasks that is already planned to accomplish (actual tasks). In addition, denotes the path that follows to accomplish . Now, assume that wants to add a new task to as , when . The first step is choosing a waiting hunter, and the second step is negotiating with the chosen agent. The negotiation processes have been discussed before, so here we focus on the procedure that chooses a waiting hunter to fill up .
The proposed method relies on the CPM and UPM to develop the gatherers’ reasoning mechanism so that fills up effectively. To that end, a three-step process in which chooses a waiting hunter agent for negotiation is proposed. Before starting the process, follows the online board and lists the waiting hunters in ordered by their waiting time; that is, the oldest is the first in the list. The process steps are as follows:
Step 1: considers the highest-priority task from , denoted as . Then, plans the shortest multi-destination temporary path, using the A* search algorithm [32], denoted as , to gather all tasks in plus . When is generated, the temporary cost of each task must be updated in each task’s profile.
Step 2: verifies the feasibility of making a profit from . Thus, it checks whether is in state 3 regarding . If is not in state 3, then goes to the next step. Otherwise, withdraws and starts over from the first step.
Step 3: examines the effect of choosing on the actual tasks in . When generates in step 1, it may have for the actual tasks in . For this reason, checks whether (6) is still true for the newly calculated temporary costs of each actual task. If (6) is true for all actual tasks, is verified for starting a negotiation process; otherwise, withdraws and starts over from the first step.
Algorithm 2 illustrates the multitask-planning procedure for . This algorithm is developed as a function for choosing a candidate task detected by a waiting hunter by considering . Note that the output of this algorithm is not a task in the agent’s action plan, i.e., . The output is a candidate task detected by a waiting hunter that can potentially be added to as , depending on the negotiation process.
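The three-step selection can be sketched as below, under several assumptions that are not taken from the source: a brute-force search over visiting orders stands in for the A* planner (reasonable only for short queues), the temporary cost of a task is taken to be the length of the path leg that ends at it, and the profitability check of step 3 is summarized by a per-task threshold `max_profitable_cost`. All names are hypothetical.

```python
from itertools import permutations
from math import dist
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def shortest_path(start: Point, stops: Dict[str, Point]) -> Dict[str, float]:
    """Brute-force shortest visiting order; returns the leg cost of each task."""
    best_order: Tuple[str, ...] = ()
    best_length = float("inf")
    for order in permutations(stops):
        points = [start] + [stops[name] for name in order]
        length = sum(dist(a, b) for a, b in zip(points, points[1:]))
        if length < best_length:
            best_order, best_length = order, length
    points = [start] + [stops[name] for name in best_order]
    return {n: dist(a, b) for n, a, b in zip(best_order, points, points[1:])}


def choose_partner(
    position: Point,
    plan: Dict[str, Point],
    max_profitable_cost: Dict[str, float],
    waiting: List[Tuple[str, Point]],
    r_upm: float,
) -> Optional[str]:
    """Return the first waiting task (oldest first) that passes all three steps."""
    for name, location in waiting:
        # Step 1: temporary path through the actual tasks plus the candidate.
        legs = shortest_path(position, {**plan, name: location})
        # Step 2: withdraw if the candidate itself falls in state 3.
        if legs[name] > r_upm:
            continue
        # Step 3: every actual task must stay profitable at its new cost.
        if all(legs[task] <= max_profitable_cost[task] for task in plan):
            return name
    return None
```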

Figure 5 illustrates an example in which fills out its action plan where . All sequences happen before accomplishing . In sequence 1, has already in its action plan and chooses to negotiate. In sequence 2, has reached an agreement for accomplishing and adds it to its action plan as . Moreover, there are two candidates, and is chosen to negotiate because is not feasible and fails to satisfy the condition mentioned in step 2. In sequence 3, has reached an agreement for accomplishing and adds it to its action plan as . In addition, there are two candidate tasks, namely, and . cannot be verified in step 3, though it is feasible itself and passes step 2. Therefore, chooses for negotiation. In sequence 4, has reached an agreement for accomplishing and adds it to its action plan as . Moreover, it chooses to negotiate because it passes all 3 steps. Although choosing causes change in the path to , it does not bring into state 3 for .
3.6. Decision-Making Algorithms
Firstly, we propose a distributed decision-making algorithm determining the exploration, detection, and negotiation procedure for in the HGMP. We utilize the distributed approach to ensure the reliability of the multiagent system (MAS), since centralized MASs may not be robust because they rely on a single central unit. In addition, the nature of the proposed reasoning mechanism enables agents to make decisions independently of any central unit. Furthermore, the profit margins limit all interactions between agents to local regions within the environment, so there is no need for a central unit. Even in the case of auctions, each hunter who has detected a task holds a local auction and plays the role of an auction organizer temporarily and locally. Therefore, a distributed decision-making algorithm best fits the proposed reasoning mechanisms. Accordingly, Algorithm 3 illustrates the decision-making procedure for . In each iteration, explores the environment to detect a task. When detects a task, denoted as , it announces the location on the online board and waits to receive readiness messages. According to the number of readiness messages that receives, it starts a reverse auction or auction negotiation process to reach an agreement. If reaches an agreement, then it starts exploring the environment again. Otherwise, announces its detection on the online board again and repeats the same procedure. denotes the iteration number, and denotes the maximum number of iterations in a mission.
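The hunter's loop described above can be sketched with the environment and negotiation steps injected as callables. This is an illustrative skeleton, not a reproduction of Algorithm 3; `explore`, `count_readiness`, `reverse_auction`, and `auction` are hypothetical hooks, and announcing on the board is folded into `count_readiness`.

```python
from typing import Callable, Optional


def hunter_mission(
    explore: Callable[[], Optional[str]],
    count_readiness: Callable[[str], int],
    reverse_auction: Callable[[str], bool],
    auction: Callable[[str], bool],
    max_iterations: int,
) -> int:
    """Run the hunter loop and return the number of agreements reached."""
    agreements = 0
    pending: Optional[str] = None  # detected task still waiting for a partner
    for _ in range(max_iterations):
        if pending is None:
            pending = explore()  # may return None if nothing was detected
            if pending is None:
                continue
        # Announce (or re-announce) the detection and count readiness messages.
        messages = count_readiness(pending)
        if messages == 0:
            continue  # keep waiting on the board
        # One message -> reverse auction; several messages -> auction.
        negotiate = reverse_auction if messages == 1 else auction
        if negotiate(pending):
            agreements += 1
            pending = None  # agreement reached: resume exploring
    return agreements
```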

Secondly, we present a distributed decision-making algorithm determining the negotiation and accomplishment procedure in the HGMP for based on the explained reasoning mechanisms. Algorithm 4 illustrates the decision-making procedure for . In each iteration, when , manages its action plan by first calling the “choose partner” function and then negotiating with the chosen hunter, if one is available. If the negotiation succeeds, then it adds the new task to and updates . Moreover, in each iteration, follows to gather tasks in . When a task is gathered, updates by removing the accomplished task.
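One iteration of the gatherer's procedure can be sketched as a pure function over its task queue. This is a simplified illustration rather than Algorithm 4 itself: the queue order is assumed to coincide with the path order, movement is reduced to a flag `reached_next_task`, and `choose_partner` and `negotiate` are hypothetical hooks.

```python
from typing import Callable, List, Optional


def gatherer_step(
    plan: List[str],
    capacity: int,
    choose_partner: Callable[[List[str]], Optional[str]],
    negotiate: Callable[[str], bool],
    reached_next_task: bool,
) -> List[str]:
    """Return the updated action plan after one iteration."""
    plan = list(plan)  # work on a copy so the caller's list is untouched
    if len(plan) < capacity:
        candidate = choose_partner(plan)
        # Add the candidate only if one exists and the negotiation succeeds.
        if candidate is not None and negotiate(candidate):
            plan.append(candidate)
    if reached_next_task and plan:
        plan.pop(0)  # the task at the head of the queue has been gathered
    return plan
```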

4. Nash Equilibrium Analysis
It is important to study the stability of the proposed algorithms to ensure that agents do not have motivation to change their behavior during the HGMP, i.e., to make sure that agents can make optimal decisions in the scenarios and do not vacillate in negotiations and task accomplishments. In this section, we study the stability of the proposed algorithms in both reverse auction and auction scenarios.
4.1. The First Scenario: Reverse Auction
Consider a hunter and a gatherer agent whose preferences over outcomes are given by the utility functions and , respectively. As shown in Figure 3, the model in which agents negotiate in the first scenario is a simplified reverse auction or bargaining process. According to assumption (6), each agent obtains sufficient information about all actions and utilities. Thus, the model turns into a perfect-information extensive-form game which resembles a sharing game. We know that every (finite) perfect-information game in extensive form has a pure-strategy Nash equilibrium (PSNE) [33]. However, the existence of a PSNE does not necessarily ensure that the output of the first scenario is a PSNE. It strongly depends on the decision-making algorithm of each agent. Therefore, we need to prove that the output of the proposed reasoning mechanisms in the first scenario is a PSNE.
According to the proposed reasoning mechanisms, each agent calculates a profit interval to make the most profitable decision. To be specific, makes its best response by making offers that fall into its profit interval. Similarly, makes its best response to the scenario by accepting the offers within its profit interval. In other words, the decision of each agent is its best possible response to the scenario, and it knows that the counterpart agent is also making its best response. We know that the strategy profile in which each agent is making its best response to another agent is a PSNE [33]. Consequently, the HGMP’s outcome is a PSNE in the first scenario.
Although the model itself ensures the existence of PSNE and the reasoning mechanisms’ outcome is a PSNE, the desirability of PSNE is still a considerable concern. The following numerical example explains the details on how scaling parameters can affect the PSNE in the first scenario.
In the reverse auction scenario, pure strategies for and are defined as and , respectively, where A and R stand for acceptance and rejection actions of , respectively. Now, let us assume that , , and . If we have and for accomplishing , then offers are calculated from (8), as follows: , , and . Hence, {(A, A, A), offer1} is one of the equilibria; that is, accepts the first offer which results . On the contrary, if we only change scaling parameters of such that , then {(A, A, A), offer1} is no longer a PSNE. Instead, {(R, R, R), offer1} is a PSNE; that is, rejects the first offer. In conclusion, the desirability of PSNE in the first scenario can be guaranteed by designating appropriate scaling parameters , , , and .
4.2. The Second Scenario: Auction
In the second scenario, we investigate the existence of a NE through a theorem based on the CPM and UPM concepts. The theorem states three conditions that characterize the NE in an auction process. We prove the theorem by contradiction; that is, we show that no agent involved in an auction scenario has a motivation to deviate from a strategy profile that satisfies all three conditions.
Theorem 1. Consider the HGMP in the second scenario associated with the second-price sealed-bid auction with participation of a hunter and gatherers whose preferences over outcome are given by the utility functions and , respectively. Then, is a Nash equilibrium if and only if conditions (i) and (ii) are satisfied for and condition (iii) is satisfied for the hunter agent:
(i) , i.e., the winner submitted a sufficiently high bid
(ii) , i.e., the winner’s valuation is sufficiently high
(iii) , i.e., the second highest bid satisfies the minimum bid determined by the hunter
Proof. If (i) does not hold, , then has an interval to increase its bid, , in which it can lower its share to and place even a higher bid than and win the auction. Hence, has a motivation to deviate and increase its payoff. If (ii) does not hold, , then for , denoted as the winner, we have ; that is, its payoff is negative. Therefore, it can deviate by submitting a losing bid and increasing its payoff to 0. Finally, if (iii) does not hold, , then the hunter agent’s payoff is negative for the second highest bid. Thus, it can deviate by rejecting all bids and increase its payoff to 0 because it has a strong motivation to hold another auction in the following iterations and avoid a negative payoff.
Nevertheless, the existence of a NE does not by itself ensure that the scenario’s output is a NE; that depends on the decision-making algorithm of each agent. In this regard, each gatherer involved in the auction scenario places a bid according to (9). This means each gatherer agent bids its own valuation, i.e., . Accordingly, conditions (i) and (ii) are always true because the winner has not only placed the highest bid among all bidders but also does not have a negative payoff. Besides, the hunter agent uses (10) to choose the winning bidder and (11) to verify that the second highest bid satisfies the minimum requirement. Hence, condition (iii) is also true. In conclusion, according to Theorem 1 and the decision-making algorithms of all agents participating in an auction, the result of the auction scenario is a NE.
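The auction logic can be sketched in a few lines. The code below is a textbook second-price sealed-bid auction with a reserve check, not the paper's equations (9) to (11): the function names, the gatherer labels, the bid values, and the way the hunter's minimum bid is applied are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

Result = Optional[Tuple[str, float]]


def run_second_price_auction(bids: Dict[str, float], min_bid: float) -> Result:
    """Return (winner, price), or None if the hunter rejects all bids."""
    if len(bids) < 2:
        return None  # a second price needs at least two bidders
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    second_price = ranked[1][1]
    # The hunter only accepts when the second highest bid meets its minimum.
    if second_price < min_bid:
        return None
    return winner, second_price


def payoff(valuation: float, result: Result, name: str) -> float:
    """Winner's payoff is valuation minus price; everyone else gets 0."""
    if result is None or result[0] != name:
        return 0.0
    return valuation - result[1]


def best_deviation_gain(valuations: Dict[str, float], name: str,
                        min_bid: float, candidates: Iterable[float]) -> float:
    """Largest payoff gain `name` can get by bidding something other than
    its valuation while every other bidder stays truthful."""
    truthful = payoff(valuations[name],
                      run_second_price_auction(valuations, min_bid), name)
    best = truthful
    for bid in candidates:
        trial = dict(valuations)
        trial[name] = bid
        best = max(best, payoff(valuations[name],
                                run_second_price_auction(trial, min_bid), name))
    return best - truthful


# Hypothetical valuations for three gatherers and a hypothetical minimum bid.
valuations = {"g1": 6.0, "g2": 9.0, "g3": 4.0}
print(run_second_price_auction(valuations, min_bid=5.0))
print([best_deviation_gain(valuations, g, 5.0, range(13)) for g in valuations])
```

In this toy setting no gatherer gains by deviating from truthful bidding over the tested candidate bids, which mirrors the no-deviation reasoning used in the proof.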
5. Simulation Results
In this section, we present simulation results to (1) validate the fairness of the proposed algorithms, i.e., to ensure that the overall workload is distributed equally among agents of both types, by comparing agents’ effectiveness in a set of experiments and analyzing the results with paired t-test and ANOVA [34] methods, (2) study the effect of profit margins on the total effectiveness of the HGMP, (3) demonstrate the efficacy of the proposed multitask-planning algorithm for gatherers by investigating its effect on the HGMP’s total effectiveness, and (4) verify the functionality of the hunter-and-gatherer scheme, i.e., treating each task as two sequential exploration and completion subtasks, by comparing the HGMP with a basic alternative method in which each agent does both hunting and gathering tasks itself.
To simulate the proposed approaches, we developed a multirobot simulation platform in MATLAB from scratch. The platform can run simulations on any custom map, and the number of agents of each type is adjustable. We provide some basic functions for each type of agent to enable them to maneuver over the given environment. For gatherers, we utilized the based motion planning algorithm, which enables them to move between two points in a grid environment. In addition, we provided a basic frontier-based exploration algorithm [35] for hunters. The number of tasks is also adjustable, and tasks are located randomly over the environment. We also provided a perpetual mode for the simulations in which, for each gathered task, another task is distributed randomly in the environment. Accordingly, at each iteration, a fixed number of tasks, adjustable for each mission, is available in the environment. Furthermore, in the perpetual mode, each explored and known grid cell of the environment turns back into an unknown cell after a certain number of iterations. The perpetual mode makes the analysis more accurate and evidence-based.
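As an illustration of the frontier idea used for hunters, the following sketch marks a known free cell as a frontier when it touches an unknown cell, and picks the closest frontier by Manhattan distance. It is a simplified Python stand-in, not the MATLAB platform's code: the cell encoding, the function names, and the distance measure (which ignores obstacles) are assumptions.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]

# Hypothetical cell encoding for the occupancy grid.
UNKNOWN, FREE, OBSTACLE = -1, 0, 1


def find_frontiers(grid: List[List[int]]) -> List[Cell]:
    """Return free cells that have at least one 4-connected unknown neighbor."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def nearest_frontier(grid: List[List[int]], position: Cell) -> Optional[Cell]:
    """Pick the frontier with the smallest Manhattan distance to `position`."""
    frontiers = find_frontiers(grid)
    if not frontiers:
        return None  # nothing left to explore
    return min(frontiers,
               key=lambda cell: abs(cell[0] - position[0]) + abs(cell[1] - position[1]))


example_grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OBSTACLE, UNKNOWN],
    [FREE, FREE, FREE],
]
print(find_frontiers(example_grid))
print(nearest_frontier(example_grid, (2, 0)))
```

In a perpetual-mode simulation, cells that revert to unknown would simply create new frontiers on the next call.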
All simulations have been executed under the following conditions: (1) the environment is sectioned as an grid of tiles where , (2) the quantities of each type of agent are adjusted as and , (3) there are always tasks in the environment, (4) the maximum number of iterations is determined as , (5) the rewards are assigned to be , and (6) we considered the weighting parameters as .
5.1. Fairness of the HGMP
To demonstrate that the accomplishment workload is distributed equally for each type of agent, the concept of fairness is introduced. To that end, we define an effectiveness factor for each agent of both types based on their costs and accomplishment. Then, using the statistical analysis, we prove the fairness of the HGMP by comparing effectiveness of different agents of each type. Let and denote the effectiveness of and the number of tasks hunted by the agent, respectively, as follows:
Similarly, and denote the effectiveness of and the total number of tasks gathered by the agent, respectively, such that
Figure 6(a) shows the statistical results of for all hunters in 200 missions. As , an ANOVA test has been applied to the collected data to statistically assess the fairness of the HGMP for hunters. The ANOVA test has been applied as follows: , , and , where denotes the average of for in 200 tests and denotes the significance level. According to the results of the ANOVA test, , , and . Since and , we retain the null hypothesis. Thus, the test indicates that , which means that there is no significant difference between averages of hunters’ effectiveness in 200 tests.
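For readers who want to reproduce this kind of check, a one-way ANOVA F statistic can be computed as sketched below. This is a generic textbook computation, not the authors' analysis script; the function names are hypothetical, the sample values are made up, and the critical value must be looked up for the chosen significance level and degrees of freedom.

```python
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, between-group df, within-group df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def retain_null(f_stat: float, f_critical: float) -> bool:
    """Retain the equal-means hypothesis when F stays below the critical value."""
    return f_stat < f_critical


# Made-up effectiveness samples for three hypothetical hunters.
samples: List[List[float]] = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(one_way_anova_f(samples))
```

Each inner list would hold one hunter's effectiveness values across missions, so a fairness check amounts to calling `retain_null` with the tabulated critical value.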
In addition, as , a paired t-test has been applied to the data to investigate the fairness of the HGMP for gatherers. The hypothesis testing has been done in a manner such that , , , , , and . According to the test, . Since , we retain the null hypothesis. Therefore, the test indicates that , as illustrated in Figure 6(b), which means that there is no significant difference between averages of gatherers’ effectiveness in 200 tests.
Both statistical analyses indicate that all agents of the same type behave analogously under similar characteristics. In fact, this analysis numerically validates the Nash equilibrium analysis proved for the HGMP. It means that if the fairness concept investigated above is not valid for the HGMP and favors certain agents unfairly, then there are strong motivations for other agents to deviate from the proposed negotiation structure.
5.2. Effects of Agents’ Profit Margins on Mission’s Effectiveness
The effects of scaling parameters of profit margins, , , , and , on the total effectiveness of the HGMP need to be investigated in order to show the functionality of the CPM and UPM for both types of agents. To that end, we define an effectiveness factor for the HGMP, , which is the ratio of the total number of completed tasks, , and the collective cost of the whole mission, , as follows:
We ran the algorithms for all values of and that are multiples of 0.025 such that and , while . has been calculated for each set of values for and , as illustrated in Figure 7. This figure shows the correlation between the total effectiveness of the HGMP and the profit margin parameters of gatherers. The yellow area shows the region in which the total effectiveness is maximum. In this figure, the horizontal and vertical axes are and , respectively, i.e., the scaling parameters of gatherers, and the color mapping represents the total effectiveness of the HGMP, i.e., . According to the results, is vanishingly small when , which means agents cannot reach an agreement for completing the detected tasks. Furthermore, reaches its maximum when . Next, falls gradually when because each gatherer’s CPM and UPM are so large that the agent does not fall into state 3 and easily reaches any agreement. As a result, each gatherer accomplishes a significant number of tasks inefficiently, which reduces .
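The structure of such a parameter sweep can be sketched as follows. Only the ratio of completed tasks to collective cost and the 0.025 step come from the text; the parameter range, the function names, and especially `toy_mission`, which is a placeholder standing in for a full simulated mission, are assumptions for illustration and produce no real results.

```python
from typing import Callable, Dict, List, Tuple


def total_effectiveness(completed_tasks: int, collective_cost: float) -> float:
    """Ratio of completed tasks to the collective cost of the mission."""
    if collective_cost <= 0:
        return 0.0
    return completed_tasks / collective_cost


def grid_values(lower: float, upper: float, step: float) -> List[float]:
    """Evenly spaced values from lower to upper (inclusive), rounded to
    suppress floating-point noise."""
    count = int(round((upper - lower) / step))
    return [round(lower + i * step, 10) for i in range(count + 1)]


def sweep(run_mission: Callable[[float, float], Tuple[int, float]],
          first_values: List[float],
          second_values: List[float]) -> Dict[Tuple[float, float], float]:
    """Evaluate the effectiveness for every pair of scaling parameters."""
    results = {}
    for a in first_values:
        for b in second_values:
            completed, cost = run_mission(a, b)
            results[(a, b)] = total_effectiveness(completed, cost)
    return results


def toy_mission(a: float, b: float) -> Tuple[int, float]:
    """Placeholder mission: NOT the simulator, just a deterministic stand-in."""
    completed = int(100 * max(0.0, 1.0 - abs(a + b - 1.0)))
    return completed, 50.0


values = grid_values(0.0, 1.0, 0.5)  # hypothetical range and coarse step
heatmap = sweep(toy_mission, values, values)
print(heatmap[(0.5, 0.5)])
```

Replacing `toy_mission` with a call into an actual simulator and using a 0.025 step would yield the kind of grid that a color map like Figure 7 visualizes.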
In the same way, we ran the algorithms for all values of and that are multiples of 0.05 such that and , while . has been calculated for each set of values for and , as illustrated in Figure 8. Basically, this figure explains the correlation between the total effectiveness of the HGMP and the profit margin parameters of hunters. The yellow area shows the area in which the total effectiveness is maximum. In this figure, the horizontal and vertical axes are and , respectively, i.e., the scaling parameters of hunters, and the color mapping represents the total effectiveness of the HGMP, i.e., . Accordingly, is too low when approximately, which means the CPM and UPM of hunters are too small and only a few agreements are reached. Then, for , increases gradually to reach its maximum and then again decreases.
According to the proposed reasoning mechanism, when the scaling parameter of an agent’s CPM decreases, the agent gets less confident. And when the scaling parameter of an agent’s UPM increases, the agent gets less conservative. In this regard, for both types of agents, the best strategy to reach the maximum of is neither being completely confident nor being fully conservative, but a combination of both leads to the optimum result.
The oblique yellow area in Figure 7(b), exposing the maximum values of , is much narrower than the one in Figure 8(b). It shows that the CPMs and UPMs of gatherers have a more distinct influence on than the ones of hunters. The rationale behind this dissimilarity is that hunters rely on their CPMs and UPMs after hunting a task, i.e., after accomplishing a task, and then consider them only for finding a gatherer to complete the task. On the contrary, gatherers consider their CPMs and UPMs before gathering a task, i.e., before any accomplishment. Consequently, this difference causes a much more distinct influence of gatherers’ CPMs and UPMs on .
5.3. The Effect of Multitask Planning on the HGMP’s Effectiveness
In this section, we aim to study the effect of the proposed multitaskplanning algorithm for gatherers on the total effectiveness of the HGMP defined in (16). Accordingly, we investigate the effect of , which is the queue size of each gatherer, on . To that end, we ran 200 missions for each value of , varying from 1 to 10, and measured in each mission, as illustrated in Figure 9.
To understand how much increases when changes from to , we applied a paired t-test to the two collected data sets. The first data set contains 200 measures of for , and the second data set comprises 200 measures of for . The test has been conducted considering , , , , , and , where and denote the average of for the first and second data sets, respectively. According to the test result, , , and . Since and , we reject . Therefore, the results indicate that increases by more than 70% when changes from 1 to 10. Moreover, the results also show that the HGMP remains fair for gatherer agents as increases. Figure 10 demonstrates that there is no significant difference between the effectiveness of the two gatherers for each value of .
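The paired comparison can be reproduced with a short computation of the t statistic on the per-mission differences. This is a generic sketch rather than the authors' script: the function name is hypothetical, the sample numbers are made up, and the resulting statistic still has to be compared against the critical value for the chosen significance level.

```python
import math
from typing import Sequence, Tuple


def paired_t_statistic(first: Sequence[float],
                       second: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples.

    Each pair (first[i], second[i]) is assumed to come from the same
    mission index under the two settings being compared.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t_stat = mean_diff / math.sqrt(var_diff / n)
    return t_stat, n - 1


# Made-up effectiveness values for two hypothetical queue sizes.
small_queue = [1.0, 1.0]
large_queue = [2.0, 4.0]
print(paired_t_statistic(small_queue, large_queue))
```

With 200 missions per setting, the two input lists would each hold 200 values and the test would have 199 degrees of freedom.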
Besides, Figure 11 shows how the HGMP’s total effectiveness converges for different values of in a manner such that , , and . According to the results, as the value of increases, becomes more variable and the convergence time decreases, while improves significantly, as shown before.
5.4. Functionality Validation of the HGMP by a Comparison
In this section, we analyze the functionality of the proposed hunter-and-gatherer scheme. As discussed before, we consider a dynamic problem to be a TA:SP problem where each task is composed of two sequential detection and completion subtasks. Although we have discussed different aspects of the proposed approach in the previous sections, here we explicitly compare it with an alternative approach in which a single type of agent performs both exploration and completion of tasks.
According to the rationale behind the hunter-and-gatherer approach, hunters must be more agile and cost-efficient in exploration and maneuvering. Therefore, we first plotted the total effectiveness of the HGMP with respect to which ranges from , i.e., , to , i.e., . Second, we ran the alternative approach described above to judge the HGMP’s functionality. Since there is no hunter-and-gatherer scheme in this approach, we only have one type of agent and the obtained total effectiveness depends on the ratio. With this comparison, we wanted to answer the following question: Is the HGMP profitable compared to the alternative method? Figure 12 shows the results of the simulations run for that purpose. The answer is that it depends on the ratio, which is why we arrive at a criterion for the HGMP to be profitable. According to the results, for , the HGMP has a distinct advantage in terms of over the alternative model for any value of . Furthermore, the HGMP still remains advantageous for . Consequently, it is economical to employ the HGMP for the stated dynamic problem if and only if the hunter and gatherer agents satisfy . In other words, if we employ two robots of different types as a hunter and a gatherer such that the hunter’s cost for following a certain path is less than 0.6 of the gatherer’s cost for following the same path, then employing the HGMP will be profitable. Considering the USAR example, the hunter can be a small UAV, while the gatherer would necessarily be a heavy-duty UGV. If we consider the cost as the power consumption, then the criterion is easily satisfied.
A screen capture video of the simulation results is available as supplementary material with this paper and at the YouTube link “youtu.be/HJuiP5DMZfo.”
6. Conclusion
Inspired by the problem of “MRTA in an unknown environment,” we proposed the idea of task allocation based on coupling and cooperation between complementary teams in a hunter-and-gatherer scheme. Furthermore, this work presented distributed reasoning mechanisms relying on the notions of certainty and uncertainty profit margins, in which levels of confidence and conservativeness are modeled, and an effective multitask-planning algorithm for gatherers that allows them to queue multiple tasks and find the optimal solution for completing a group of tasks rather than completing them one by one. By comparing the proposed hunter-and-gatherer scheme with an alternative method in which a single type of agent performs both exploration and completion of tasks, we established a criterion to judge the profitability of the proposed method. Examining the real-world problems mentioned earlier suggests that the profitability criterion is reasonably satisfiable. We also found that extreme behavior of an agent, being too confident or too conservative, hurts the total effectiveness of the mission. Furthermore, statistical analysis demonstrates a significant improvement in total effectiveness due to the multitask-planning algorithm. However, while the computational complexity of executing the multitask-planning algorithm multiplies as an agent’s queue size increases, the total effectiveness of the HGMP does not increase linearly.
Future work will consider the problem of adjusting the scaling parameters by an agent during a mission to achieve optimal performance from both the agent and team points of view. We also intend to develop a multirobot exploration algorithm based on the notions of profit margins in the context of dynamic MRTA problems and investigate the effect of different multirobot exploration algorithms on the HGMP.
Data Availability
The data and source code used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by Lamar University via internal grants.
References
[1] R. R. Murphy, Disaster Robotics, The MIT Press, Cambridge, MA, USA, 2014.
[2] H. Kitano, S. Tadokoro, and I. Noda, “RoboCup Rescue: search and rescue in large-scale disasters as a domain for autonomous agents research,” in Proceedings of the IEEE SMC’99 Conference, Tokyo, Japan, October 1999.
[3] A. Bechar and C. Vigneault, “Agricultural robots for field operations: concepts and components,” Biosystems Engineering, vol. 149, pp. 94–111, 2016.
[4] J. N. K. Liu, M. Wang, and B. Feng, “iBotGuard: an Internet-based Intelligent robot security system using invariant face recognition against intruder,” IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), vol. 35, no. 1, pp. 97–105, 2005.
[5] T. Theodoridis and H. Hu, “Toward Intelligent security robots: a survey,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 6, pp. 1219–1230, 2012.
[6] M. Dunbabin and L. Marques, “Robots for environmental monitoring: significant advancements and applications,” IEEE Robotics & Automation Magazine, vol. 19, no. 1, pp. 24–39, 2012.
[7] K. Jose and D. K. Pratihar, “Task allocation and collision-free path planning of centralized multirobots system for industrial plant inspection using heuristic methods,” Robotics and Autonomous Systems, vol. 80, pp. 34–42, 2016.
[8] L. E. Parker, D. Rus, and G. S. Sukhatme, “Multiple mobile robot systems,” in Springer Handbook of Robotics, Springer, Berlin, Heidelberg, Germany, 2008.
[9] B. P. Gerkey and M. J. Mataric, “Multirobot task allocation: analyzing the complexity and optimality of key architectures,” in Proceedings of the IEEE International Conference on Robotics and Automation (Cat. No.03CH37422), Taipei, Taiwan, September 2003.
[10] L. Luo, N. Chakraborty, and K. Sycara, “Provably-good distributed algorithm for constrained multirobot task assignment for grouped tasks,” IEEE Transactions on Robotics, vol. 31, no. 1, pp. 19–30, 2015.
[11] R. Simmons, D. Apfelbaum, W. Burgard, D. Fox, and M. Moors, “Coordination for multirobot exploration and mapping,” in Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, AAAI Press, Austin, TX, USA, July 2000.
[12] M. J. Matarić, G. S. Sukhatme, and E. H. Østergaard, “Multirobot task allocation in uncertain environments,” Autonomous Robots, vol. 14, no. 2, pp. 255–263, 2003.
[13] R. R. Murphy, “Search and rescue robotics,” in Springer Handbook of Robotics, B. Siciliano and O. Khatib, Eds., Springer, Berlin, Heidelberg, Germany, 2008.
[14] A. Zhu and S. X. Yang, “A neural network approach to dynamic task assignment of multirobots,” IEEE Transactions on Neural Networks, vol. 17, no. 17, pp. 1278–1287, 2006.
[15] J. Turner, Q. Meng, G. Schaefer, A. Whitbrook, and A. Soltoggio, “Distributed task rescheduling with time constraints for the optimization of total task allocations in a multirobot system,” IEEE Transactions on Cybernetics, vol. 48, no. 9, pp. 2583–2597, 2018.
[16] D. Zhu, H. Huang, and S. X. Yang, “Dynamic task assignment and path planning of multiAUV system based on an improved self-organizing map and velocity synthesis method in three-dimensional underwater workspace,” IEEE Transactions on Cybernetics, vol. 43, no. 2, pp. 504–514, 2013.
[17] S. Marangoz, M. F. Amasyalı, E. Uslu, F. Çakmak, N. Altuntaş, and S. Yavuz, “More scalable solution for multirobot-multitarget assignment problem,” Robotics and Autonomous Systems, vol. 113, pp. 174–185, 2019.
[18] K. Lerman, C. Jones, A. Galstyan, and M. J. Matarić, “Analysis of dynamic task allocation in multirobot systems,” The International Journal of Robotics Research, vol. 25, no. 3, pp. 225–241, 2006.
[19] S. Liemhetcharat and M. Veloso, “Weighted synergy graphs for effective team formation with heterogeneous ad hoc agents,” Artificial Intelligence, vol. 208, pp. 41–65, 2014.
[20] A. C. Chapman, R. A. Micillo, R. Kota, and N. R. Jennings, “Decentralised dynamic task allocation: a practical game–theoretic approach,” in Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems, pp. 915–922, Budapest, Hungary, May 2009.
[21] N. Michael, M. M. Zavlanos, V. Kumar, and G. J. Pappas, “Distributed multirobot task assignment and formation control,” in Proceedings of the IEEE International Conference on Robotics and Automation, Pasadena, CA, USA, September 2008.
[22] S. Sariely and T. Balch, “Efficient bids on task allocation for multirobot exploration,” in Proceedings of the Nineteenth International Florida Artificial Intelligence Research Society, Melbourne Beach, FL, USA, May 2006.
[23] D. Wu, G. Zeng, L. Meng, W. Zhou, and L. Li, “Gini coefficient-based task allocation for multirobot systems with limited energy resources,” IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 1, pp. 155–168, 2018.
[24] P. M. Shiroma and M. F. M. Campos, “CoMutaR: a framework for multirobot coordination and task allocation,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA, October 2009.
[25] A. Prorok, M. A. Hsieh, and V. Kumar, “The Impact of diversity on optimal control policies for heterogeneous robot swarms,” IEEE Transactions on Robotics, vol. 33, no. 2, pp. 346–358, 2017.
[26] B. P. Gerkey and M. J. Matarić, “A formal analysis and taxonomy of task allocation in multirobot systems,” The International Journal of Robotics Research, vol. 23, no. 9, pp. 939–954, 2004.
[27] E. Nunes, M. Manner, H. Mitiche, and M. Gini, “A taxonomy for task allocation problems with temporal and ordering constraints,” Robotics and Autonomous Systems, vol. 90, pp. 55–70, 2017.
[28] D.-H. Lee, S. A. Zaheer, and J.-H. Kim, “A resource-oriented, decentralized auction algorithm for multirobot task allocation,” IEEE Transactions on Automation Science and Engineering, vol. 12, no. 4, pp. 1469–1481, 2015.
[29] D.-H. Lee, “Resource-based task allocation for multirobot systems,” Robotics and Autonomous Systems, vol. 103, pp. 151–161, 2018.
[30] N. Sullivan, S. Grainger, and B. Cazzolato, “Sequential single-item auction improvements for heterogeneous multirobot routing,” Robotics and Autonomous Systems, vol. 115, pp. 130–142, 2019.
[31] M. J. Osborne, An Introduction to Game Theory, Oxford University Press, New York, NY, USA, 2003.
[32] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Pearson, London, UK, 3rd edition, 2010.
[33] Y. Shoham and K. Leyton-Brown, Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, Cambridge University Press, New York, NY, USA, 1st edition, 2008.
[34] D. C. Montgomery, G. C. Runger, and N. F. Hubele, Engineering Statistics, Wiley, New York, NY, USA, 5th edition, 2010.
[35] B. Yamauchi, “Frontier-based exploration using multiple robots,” in Proceedings of the Second International Conference on Autonomous Agents, pp. 47–53, ACM Press, 1998.
Copyright
Copyright © 2020 Mehdi Dadvar et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.