Abstract

Game Theory is a promising approach to coalition formation in multiagent systems. This paper focuses on the importance of distributed computation and of the dynamic formation and reformation of pursuit groups in pursuit-evasion problems. In order to address this task, we propose a decentralized coalition formation algorithm based on the Iterated Elimination of Dominated Strategies (IEDS). This Game Theory process is commonly used to solve problems by iteratively withdrawing dominated strategies. Furthermore, we have used Markov Decision Process (MDP) principles to control the motion strategy of the agents in the environment. The simulation results demonstrate the feasibility and the validity of the given approach in comparison with different decentralized methods.

1. Introduction

A Multiagent System (MAS) is an organized set of intelligent agents situated in an environment that cooperate to coordinate their behaviors and resolve complex problems. Coalition formation, considered a major focus in social systems, is an important method of cooperation. In general, coalition formation between agents is goal-directed and short-lived: coalitions are formed to achieve a specific objective and dissolve once it is accomplished. Coalition formation has received a considerable amount of attention in recent research [1, 2]. Further, some research activities are based on the notion of Organization, which allows the coalition of agents in the form of groups as well as the cooperation between their members. In this vein, Ferber et al. [3] proposed the AALAADIN organizational model based on three principal axes, Agent, Group, and Role, used together to describe concrete agent organizations (AGR).

Multiagent cooperative pursuit is a well-known multiagent problem [4, 5]. Depending on the numbers of pursuers and evaders involved, the pursuit-evasion problem can be classified into single-object pursuit and multiobject pursuit. This problem has been addressed in many references through a diverse set of approaches and techniques, such as Organization [6], in which we used the principles of the AGR organizational model to propose a pursuit coalition formation algorithm. Also, in order to equip each pursuit group with a dynamic access mechanism, we have introduced a flexible organizational model extended from AGR through the application of fuzzy logic principles, which determines the membership degree of each pursuer in relation to each group [7].

Furthermore, Cai et al. introduced an economical auction mechanism (MPMEGBTBA) [8], where an advanced task negotiation process based on Task Bundle Auction was proposed in order to allocate tasks dynamically through dynamic coalition formation of multiple agents. We can also find several research works treating the pursuit problem through different principles such as Graph Theory [9], Polygonal Environments [10, 11], Data Mining [12], and Reinforcement Learning [13]. In this kind of problem, pursuers and evaders come in different types. The type of a pursuer denotes its pursuit capacity. In contrast, the type of an evader reflects the number and types of pursuers required to perform its capture. The value of an evader indicates the expected rewards that should be returned to the relevant pursuers once the capture is achieved.

Game Theory offers a simple way to model situations of conflict and to study the interactions between self-interested agents. The classic question relating game theory to multiagent systems is "what is the best action that an agent can perform?" This principle has been widely used in multiagent pursuit problems [14–17]. Negotiation based on Game Theory is focused on the value and rewards of each agent, which appropriately reflect the objective of the agent's negotiation (satisfying the goal of each agent). The main advantage provided by Game Theory algorithms in MAS is the coordination that emerges through the hypothesis of mutual rationality of the agents. Therefore, Game Theory algorithms can coordinate autonomous rational agents without an explicit coordination mechanism integrated into the agents' model. They also provide different methods that define the optimal agent coalitions in several types of problems. On the downside, these algorithms frequently assume the agents to be perfectly rational. Moreover, Game Theory algorithms are focused on the value of the optimal solution and overlook the most efficient method to achieve it.

In this paper we focus on the Iterated Elimination of Dominated Strategies (IEDS), a Game Theory technique, to propose a coalition formation algorithm for pursuit-evasion problems. A strategy is the complete specification of the agent's behavior in any situation (in the case of an extensive form game, it specifies what behavior the agent must undertake according to the set of information provided). Moreover, we have used Markov Decision Process (MDP) principles in order to control the motion strategy of each agent. MDP provides a formalism to model and resolve planning and learning problems under uncertainty [18, 19].

The paper is organized as follows: in Section 2, we discuss the main related works based on the same principles used in this paper. In Section 3, we focus on the pursuit-evasion problem, giving a detailed explanation of the simulation environment and its contents; we also describe the environmental agents by defining the principal characteristics of pursuers and evaders. In Section 4, the Iterated Elimination of Dominated Strategies principle is described and illustrated with an application example of this Game Theory process. In Section 5, the basic principles of the Markov Decision Process are presented; in particular, we clarify its primary functions, namely, the reward and transition functions. In Section 6, we introduce the distributed coalition formation algorithm with a detailed explanation of the coalition process. A simulation of the pursuit-evasion game example is shown in Section 7; in this part we describe our simulation environment in detail and present the results achieved in comparison with outcomes based on other theories. Finally, Section 8 contains concluding remarks.

2. Related Works

There exist many works based on game-theoretic principles regarding the PE problem, such as [14], where the author described the control of a team of autonomous agents tracking an intelligent evader in a nonaccurately mapped terrain, based on a method that calculates Nash equilibrium policies by solving an equivalent zero-sum matrix game. In this example, among all Nash equilibria, the evader selects the one that optimizes its deterministic distance to the pursuers' team. In order to resolve problems often encountered in pursuit-evasion game algorithms, such as computational complexity and the lack of universality, Dong et al. [15] propose a hybrid algorithm founded on an improved dynamic artificial potential field and differential game, where the Nash equilibrium solution is optimal for both pursuer and evader in a barrier-free zone, and the algorithm adapts flexibly to environment changes around the pursuit elements. Moreover, in [16], Amigoni and Basilico presented an approach to compute the optimal pursuer strategy that maximizes the probability of capturing the target in a given environment. This approach is based on the definition of a game-theoretic pursuit-evasion model as well as on its resolution through mathematical programming.

More recently, Lin et al. [20] proposed a pursuit-evasion differential game based on Nash strategies involving limited observations. On the one hand, the evader undertakes the standard feedback Nash strategy. On the other hand, the pursuers undertake Nash strategies based on the proposed novel concept of best achievable performance indices. This model has potential applications in cases where several weakly equipped pursuing vehicles are tracking a well-equipped unmanned vehicle.

In relation to PE, MDP is usually used to provide motion planning for the mobile pursuers through the maximization of the rewards obtained during the pursuit. In [21], a Partially Observable Markov Decision Process (POMDP) algorithm is used to search for a mobile target in a known graph. The main objective is to ensure the capture of the targets via the clearing of the graph in minimal time. In [22], the authors propose a new approach, the Continuous Time Markov Decision Process (CTMDP), to address the PE problem. Compared to MDP, CTMDP takes into account the impact of the transition time between states, providing strong robustness against changes in transition probabilities. In [23], the authors proposed an innovative approach entirely based on MDP with the aim of resolving sequential multiagent decision problems by allowing agents to reason explicitly about specific coordination mechanisms. In other words, they defined a value iteration algorithm to compute optimal policies that recognizes and reasons about Coordination Problems.

Furthermore, we can consider other works based on MDP and IEDS, such as [24], in which an exact dynamic programming algorithm for partially observable stochastic games (POSGs) is developed. It is also proven that, when applied to finite-horizon POSGs, the algorithm iteratively eliminates very weakly dominated strategies without first forming a normal form representation of the game. Several other types of coordination mechanisms are currently used, such as Stochastic Clustering Auctions (SCAs) [25, 26], a class of cooperative auction methods based on the modified Swendsen-Wang method; SCA permits each robot to reconstitute the tasks that have been linked and applies to heterogeneous teams. Other mechanisms are market-based, such as TraderBots [27], applied to greedy agents in order to provide a detailed analysis of the requirements for robust and efficient multirobot coordination in dynamic environments. From the point of view of Game Theory, some research activities [28] investigated the optimal coordination approach for multiagent foraging: they established the equivalence between the optimal solution of the MAS and the equilibrium of the corresponding game, and then introduced the evolutionarily stable strategy to help resolve the equilibrium selection problem of traditional Game Theory.

3. Problem Description

In this section, we focus on the cooperation problem in which pursuers situated in a bounded toroidal grid environment have to capture evaders of different types. The sets P = {P_1, P_2, ..., P_N} and E = {E_1, E_2, ..., E_m} represent the collections of pursuers and evaders, respectively. Pursuer and Evader represent the roles that the agents can play. Each evader E_j is characterized by a type Re_j, which indicates how many pursuers are required to capture it. Here we suppose that the pursuers can evaluate the evaders' types after localizing them. There exist some fixed obstacles with different shapes and sizes in the environment; each position can be associated with a mapping Occ: S → {0, 1}, such that if Occ(s) = 1, then the cell s is an obstacle.

In our proposal, the strategies of each pursuer are guided by determining factors that reflect the individual development of the pursuer during the execution of the assigned tasks. These factors are detailed as follows.

Self-Confidence Degree. In multiagent systems, each agent must be able to execute the services requested by the other agents. The self-confidence degree is the assessment of the agent's success in relation to the assigned tasks. It is denoted SC and computed as SC = N_acc / N_par, where N_acc is the number of tasks that the agent has accomplished and N_par is the number of tasks in which the agent has participated.

The Credit. In the case where the agent cannot perform a task, its credit is affected. The credit of an agent is denoted Cr and calculated as Cr = 1 − N_ab / N_par, where N_ab is the number of tasks abandoned by the agent.

Environment Position. The position of the agent in the environment is a crucial criterion for the pursuit sequences, because the capture is easier when the pursuer is closer to the evader. The position factor Pos is computed as Pos = 1 / Dist(S_p, S_e), where S_p is the state (cell) of the pursuer, S_e is the state (cell) of the evader, and Dist(S_p, S_e) = √((x_p − x_e)² + (y_p − y_e)²) is the distance between the pursuer and the evader; (x_p, y_p) are the Cartesian coordinates of the pursuer and (x_e, y_e) are the Cartesian coordinates of the evader.

In order to distinguish the different coalitions, each pursuer belonging to a coalition calculates the value returned to itself through this strategy. This computation is based on the factors characterizing the pursuers. For example, for a pursuer P_i belonging to the coalition Co, the value of this coalition in relation to this pursuer is calculated as

V(P_i, Co) = Coef_1 × SC + Coef_2 × Cr + Coef_3 × Pos, (5)

where Coef_k is the coefficient of each factor.
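As a concrete illustration, the following Python sketch computes the three factors and the weighted value of (5). The function names, the credit normalization, and the coefficient values are our assumptions for illustration; the paper does not specify the coefficients.

```python
import math

def self_confidence(n_accomplished: int, n_participated: int) -> float:
    # SC = N_acc / N_par: share of assigned tasks the agent accomplished.
    return n_accomplished / n_participated if n_participated else 0.0

def credit(n_abandoned: int, n_participated: int) -> float:
    # Cr decreases with the number of abandoned tasks (assumed normalization).
    return 1.0 - n_abandoned / n_participated if n_participated else 1.0

def position_factor(pursuer_xy, evader_xy) -> float:
    # Pos grows as the pursuer gets closer to the evader (inverse distance).
    dist = math.hypot(pursuer_xy[0] - evader_xy[0], pursuer_xy[1] - evader_xy[1])
    return 1.0 / (1.0 + dist)

def coalition_value(sc: float, cr: float, pos: float,
                    coefs: tuple = (0.4, 0.3, 0.3)) -> float:
    # V(Pi, Co) = Coef_1*SC + Coef_2*Cr + Coef_3*Pos (illustrative weights).
    return coefs[0] * sc + coefs[1] * cr + coefs[2] * pos
```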

On the basis of these values and using IEDS method, our mechanism will be able to select the optimal pursuit coalition for each evader detected as detailed in Section 6.

4. The Iterated Elimination of Dominated Strategies (IEDS)

The coalition is a set of pursuers required to capture the detected evaders. Within a coalition, each pursuer must correspond to a specific strategy. In our proposal, a pure strategy defines a specific pursuit group integration that the pursuer will follow in every possible and attainable situation during the pursuit. Such coalitions are not random or drawn from a distribution, as would be the case with mixed strategies. A strategy S_i dominates another strategy S_i′ if and only if, for every potential combination S_−i of the other players' actions, U(S_i, S_−i) > U(S_i′, S_−i), where U is the function that returns the results obtained through the application of a specific strategy.

Consider the strategic game shown in Table 1, where the column player has three pure strategies and the row player has only two (a). The values shown in each cell represent the expected payoffs returned to the players when the corresponding strategies are selected. Playing Center is always better than playing Right for the column player. Consequently, we can assume he will eventually stop playing Right, because it is a dominated strategy (b), and we can ignore the Right column after its elimination. Now the row player has a dominated strategy, Up. Eventually the row player stops playing Up; then the Up row gets eliminated (c). Finally, two choices remain, Down-Left and Down-Center, and the column player notices that it can only win by playing Left (d). So we can deduce that the IEDS solution is (Down, Left).
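The elimination process can be automated. Below is a minimal Python sketch of IEDS for a two-player game in strategic form. Since the text does not reproduce the payoffs of Table 1, the matrix values here are hypothetical numbers chosen only to follow the same elimination order (Right, then Up, then Center).

```python
def ieds(row_pay, col_pay):
    """Iterated elimination of strictly dominated pure strategies in a
    two-player bimatrix game. Returns the surviving row/column indices."""
    rows = list(range(len(row_pay)))
    cols = list(range(len(row_pay[0])))
    changed = True
    while changed:
        changed = False
        # Row player: s is eliminated if some other surviving row t pays
        # strictly more against every surviving column.
        for s in rows[:]:
            if any(all(row_pay[t][c] > row_pay[s][c] for c in cols)
                   for t in rows if t != s):
                rows.remove(s)
                changed = True
        # Column player: the symmetric check over surviving rows.
        for s in cols[:]:
            if any(all(col_pay[r][t] > col_pay[r][s] for r in rows)
                   for t in cols if t != s):
                cols.remove(s)
                changed = True
    return rows, cols

# Hypothetical payoffs (not the actual Table 1 values):
row_pay = [[1, 1, 0],   # Up
           [2, 3, 0]]   # Down
col_pay = [[0, 3, 1],   # column player's payoffs
           [4, 3, 2]]
print(ieds(row_pay, col_pay))  # ([1], [0]) -> (Down, Left)
```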

5. Markov Decision Process Principles

Markov Decision Processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. In cooperative multiagent systems, MDP allows the formalization of sequential decision problems. This process only models cooperative systems in which the reward function is shared by all players. An MDP is defined by the tuple ⟨n, S, A, T, R⟩ as follows: n is the number of agents in the system; S corresponds to the set of agents' states; A = A_1 × ⋯ × A_n defines the set of joint actions of the agents, where A_i is the set of local actions of agent i; T is the transition function, which returns the probability T(s, a, s′) that the agent goes into the state s′ if it runs the joint action a from state s; R defines the reward function, where R(s, a, s′) represents the reward obtained by the agent when it transits from state s to state s′ by executing action a.
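For reference, this typed sketch captures the ⟨n, S, A, T, R⟩ tuple defined above; the field names and type aliases are ours, not the paper's notation.

```python
from typing import Callable, NamedTuple, Sequence

State = tuple[int, int]   # a grid cell (assumption: the grid world of Section 3)
Action = str              # e.g., "Up", "Down", "Left", "Right"

class MDP(NamedTuple):
    """The <n, S, A, T, R> tuple from the definition above."""
    n: int                                        # number of agents
    states: Sequence[State]                       # S: set of states
    actions: Sequence[Action]                     # A: joint actions
    T: Callable[[State, Action, State], float]    # T(s, a, s') -> probability
    R: Callable[[State, Action, State], float]    # R(s, a, s') -> reward
```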

5.1. Reward Function

In an MDP problem, the next states selected are the states returning the maximum definitive reward. In our proposal, we have used heuristic functions in order to calculate the immediate reward of each state. The reward function defines the goals that the pursuers have to achieve and identifies the environmental obstacles. To calculate this function, we relied on the agents' environment position detailed in Section 3, which allows a fair distribution of the rewards over the environmental cells. The reward of each state concerned is calculated as R(s) = R_max / Dist(s, S_e), where R_max is the maximum reward and Dist(s, S_e) is the distance value between the cell s and the evader's cell.

Regarding the distribution of the rewards in the standard cells, we note that the reward function is inversely proportional to the distance function.
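A small sketch of this inverse-distance heuristic follows. We add 1 to the denominator so the value stays finite at the evader's own cell, and we treat obstacle cells as yielding no reward; both choices are our assumptions, not details given in the paper.

```python
import math

R_MAX = 1.0  # maximum reward (illustrative value)

def immediate_reward(cell, evader_cell, obstacles) -> float:
    """Heuristic reward of a cell: maximal at the evader's position and
    inversely proportional to the distance from it."""
    if cell in obstacles:
        return 0.0  # obstacle cells carry no reward (assumption)
    dist = math.hypot(cell[0] - evader_cell[0], cell[1] - evader_cell[1])
    return R_MAX / (1.0 + dist)
```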

Figure 1 illustrates a part of our simulation environment detailed in Section 7. The values displayed in the different cells represent the gains generated by the reward function; these dynamic rewards are awarded to any pursuer situated in the cell concerned during the pursuit. R1(C_i) is the reward obtained if the pursuer concerned tracks the first evader; R2(C_i) is the reward obtained if the pursuer concerned tracks the second evader; i is the index of the cell (occupied or free).

5.2. Transition Function

The transition probabilities T(s, a, s′) describe the dynamism of the environment. They play the role of the next-state function in problem-solving search, knowing that every state could be the possible next state according to the action undertaken in the current state. Our approach is developed in a grid-of-cells environment where each agent can move to four neighboring states: Up, Down, Left, and Right.

The transition probabilities of the pursuers are based on the reward degree: the probability of moving to a neighboring cell is proportional to the reward of that cell relative to the rewards of all neighboring cells, that is, T(s, a, s′) = R(s′) / ∑_{s″ ∈ N(s)} R(s″), where N(s) denotes the neighboring cells of s. The linkages between the evader and each pursuer shown in Figure 2 reflect the optimal trajectories provided by the application of the method proposed in this section during each pursuit step.
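The rule can be sketched as follows: each of the four moves is taken with probability proportional to the reward of the cell it leads to. The normalization over the four neighbors, the toroidal wrap-around, and the uniform fallback for zero total reward are our assumptions.

```python
def move_probabilities(state, rewards, grid_size):
    """Transition probabilities toward the four neighbouring cells,
    proportional to their immediate rewards (rewards: dict cell -> reward)."""
    x, y = state
    n = grid_size
    # Toroidal grid: coordinates wrap around the edges.
    neighbours = [((x - 1) % n, y), ((x + 1) % n, y),
                  (x, (y - 1) % n), (x, (y + 1) % n)]
    total = sum(rewards[s] for s in neighbours)
    if total == 0:
        return {s: 1 / len(neighbours) for s in neighbours}  # uniform fallback
    return {s: rewards[s] / total for s in neighbours}
```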

6. Coalition Formation Algorithm Based on IEDS

A number of coalition formation algorithms have been developed to define which of the potential coalitions should actually be formed. To do so, they typically compute a value for each coalition, known as the coalition value, which provides an indication of the expected results that could be derived if this coalition were constituted. Then, having calculated all the coalitional values, the optimal coalition to form can be selected. We employ an iterative algorithm in order to determine the optimal coalitions of agents. It begins with a complete set of coalitions (agent-strategy combinations) and iteratively eliminates the coalitions with lower contribution values to MAS efficiency. The pseudocode of our algorithm is shown in Algorithm 1.

N: the number of pursuers
i: indicator of the chase iteration
k: index of the coalition's beginning
Calculate the possible coalitions;
while (evaders remain) do
  Calculate the value of each coalition;
  while (number of coalitions > 1) do
    Eliminate the dominated strategy of the current pursuer;
    Pass the update to the next pursuer;
  end while
  Assign the pursuers' roles according to the
  selected coalition;
  Chase iteration; i++;
end while
if (capture = true) then
  j = k;
  while (j <= i)
    Update(Rew(s_j));
    j++;
  end while
else
  The guilty pursuers pay some fines;
end if
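The inner elimination loop of Algorithm 1 can be sketched as a round-robin procedure in which the pursuers take turns removing their worst surviving coalition. The sketch below assumes the value table has been precomputed with (5); all names are illustrative, and the treatment of coalitions that exclude a pursuer is our assumption.

```python
def select_coalition(values: dict) -> str:
    """Round-robin iterated elimination: values[c][p] is the value of
    coalition c for pursuer p. Each pursuer in turn eliminates the
    surviving coalition with the lowest value for itself, until one remains."""
    surviving = list(values)
    pursuers = sorted({p for c in values for p in values[c]})
    turn = 0
    while len(surviving) > 1:
        p = pursuers[turn % len(pursuers)]
        # Coalitions that do not include p are worthless to p (assumption).
        worst = min(surviving, key=lambda c: values[c].get(p, float("-inf")))
        surviving.remove(worst)   # p's dominated strategy is withdrawn
        turn += 1                 # the update is passed to the next pursuer
    return surviving[0]
```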

First, the algorithm calculates all the possible coalitions (PC) that the pursuers can form, before filtering them as needed. The expected number of possible coalitions to form is calculated as PC = GC × NC, where N is the number of pursuers in the environment, m is the number of evaders detected, and GC and NC are given by (10) and (11) below.

In order to distribute the calculation of the possible coalitions among the pursuers, the possible general coalitions (GC) are calculated first. A general coalition enrolls all the pursuers required to capture the set of evaders detected:

GC = C(N, Re_1 + Re_2 + ⋯ + Re_m). (10)

The general coalitions generated will be equitably distributed among the agents playing the role Pursuer. Specifically, each general coalition will be composed of m pursuit groups, one per detected evader. From each general coalition generated through (10), a number of possible coalition formations (NC) is computed:

NC = (Re_1 + Re_2 + ⋯ + Re_m)! / (Re_1! × Re_2! × ⋯ × Re_m!). (11)

This decentralized technique aims to balance the computation of the possible coalition formations among the pursuers; the method is further detailed in Section 7 via its application to the case study. Note that the value of each coalition generated, in relation to each pursuer it contains, is calculated according to (5). Each pursuer shares the coalitions calculated with the others to start the coalition selection process. Secondly, we apply the Iterated Elimination of Dominated Strategies principle with the aim of finding the optimal coalition, where each strategy is represented by a possible coalition formation. Alternately, each pursuer eliminates the coalition with the lowest value in relation to itself and sends the update to the next pursuer concerned. Pursuers are then assigned in accordance with the selected coalition, and each pursuer performs only one chase iteration. The algorithm repeats these instructions until the end of the chase. When no evaders remain and the captures are accomplished, rewards are attributed to each of the participating pursuers; each member's reward is determined as Rew = V_e / n_c, where V_e is the value of the captured evader and n_c is the number of the coalition's members.
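Under our reading of (10) and (11), the counts can be checked numerically. With the Section 7 case study (ten pursuers and two type-IV evaders, each requiring four pursuers), this reproduces the 45 general coalitions reported in Table 2; the figure of 70 formations per general coalition is what this reading yields, not a value stated in the text.

```python
from math import comb, factorial, prod

def general_coalitions(n_pursuers: int, evader_types: list) -> int:
    # GC: choose the pursuers needed to capture all detected evaders (eq. (10)).
    return comb(n_pursuers, sum(evader_types))

def formations_per_gc(evader_types: list) -> int:
    # NC: ways to split a general coalition into one group per evader (eq. (11)).
    return factorial(sum(evader_types)) // prod(factorial(t) for t in evader_types)

# Case study of Section 7: N = 10 pursuers, two evaders of type IV (Re = 4).
print(general_coalitions(10, [4, 4]))  # 45, as reported in Table 2
print(formations_per_gc([4, 4]))       # 70 under this reading
```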

Otherwise, in the case of capture failure, the guilty pursuers must pay some fines to the rest of the coalition's members. These fines are calculated in the following manner:

Fine = ∑_{s ∈ S_g} R(s),

where S_g = {s_i : i ≥ k} is the set of states regarding the guilty pursuer and k represents the index of the coalition's beginning.

Figure 3 shows the flow chart of this pursuit algorithm, summarizing the different steps explained in this section, from the detection to the capture of the existing evaders.

7. Simulation Experiments

In order to evaluate the approach presented in this paper, we run our pursuit-evasion game on an example taking place in a rectangular two-dimensional grid of 100 × 100 cells. The environment also contains some fixed, solid obstacles. As regards the environmental agents, our simulations are based on ten pursuers and two evaders of type Re = IV; Figure 4 details how an evader of this type can be captured. Each agent is marked with an ID number. Both pursuers and evaders have the same speed (one cell per iteration) and an excellent communication system. The pursuers' teams are fully capable of determining their actual positions, and the evaders disappear once their capture is accomplished. When the capture of an evader is performed, the coalition created for this pursuit is automatically dissolved.

Table 2 summarizes the results obtained after applying the decentralized computation of the possible coalitions to this case study, according to the process explained in Section 6. In this case, according to (10), there are 45 possible general coalitions (GC), which are distributed among the existing pursuers as shown in Table 2. From each general coalition, a number of coalitions (NC) is generated according to (11).

Moreover, we have studied the number of possible coalitions generated in parallel by the pursuers in relation to the number of existing pursuers, as shown in Figure 5. Compared with the centralized method, in which a single pursuer computes all the possible coalitions, the decentralized method significantly decreases the computation time by dividing the work among the existing pursuers.

In order to vary the types of coordination mechanisms used in our simulations, we compare this work with our recent pursuit-evasion research activity based on the AGR organizational model [6], as well as with the results achieved by the auction mechanism illustrated in Case C [8]. Note that both of these methods are based on decentralized coalition formation.

Case A: pursuit based on the AGR organizational model [6].
Case B: our new approach based on the Iterated Elimination of Dominated Strategies (IEDS) principle.
Case C: pursuit based on an economical auction mechanism (MPMEGBTBA) [8].

The results shown in Figure 6 represent the average capturing time achieved over forty different simulation case studies (episodes), from the beginning to the end of each one. In order to showcase the difference between the cases, we take into consideration the iteration concept, which counts the number of state changes of each agent during the pursuits.

In the first case (AGR), the average capturing time obtained equals 144.225 iterations. We then note an interesting decrease to 100.57 iterations after the application of MPMEGBTBA, due to the appropriate role attribution provided by this auction mechanism. However, the application of the IEDS coalition formation algorithm yielded an average capturing time of 78 iterations.

Figure 7 shows the development of the pursuers' reward function during the same pursuit period for the different cases; the outcomes reflect the improvement brought by the dynamic formation and reformation of the pursuit teams.

Finally, we have focused on the average pursuers' rewards obtained per chase iteration during a full pursuit. In Figure 8, the y-axis represents the value of the rewards achieved by a pursuer and the x-axis represents the chase iterations. The results shown in this figure reveal a certain similarity between AGR and MPMEGBTBA, in which the average pursuer's rewards reach 0.59 and 0.507, respectively, whereas with IEDS the average result increases to 0.88.

The results shown in Figure 9 represent the internal learning development (self-confidence development) of the pursuers during the pursuit for the three cases. The positive results are due to the grouping and the equitable task sharing between the different pursuit groups imposed by the coordination mechanisms applied. Moreover, we note the superiority of the results obtained through IEDS over the other cases, driven by the dynamism of the coalition formations and the optimality of the task sharing provided by our algorithm.

Table 3 summarizes the main results achieved; we deduce that the pursuit algorithm based on the Iterated Elimination of Dominated Strategies (IEDS) outperforms the algorithm based on the AGR organizational model as well as the auction mechanism based on MPMEGBTBA, regarding both the development of rewards and the capturing time. The leading cause of this result is the dynamism of our coalitional groups: this flexible mechanism improves the intelligence of the pursuers concerning displacements and reward acquisition, knowing that the team reward is optimal when each pursuer undertakes the best path.

8. Conclusion

This paper presents a decentralized coalition formation method based on Game Theory principles for different types of pursuit; the proposed method demonstrates the positive impact of the dynamism of the coalition formations. Firstly, we derived our coalition algorithm from the Iterated Elimination of Dominated Strategies; this process allows us to determine the optimal pursuit coalition strategy according to Game Theory principles. Secondly, we focused on the Markov Decision Process as the motion strategy of our pursuers in the environment (grid of cells). To highlight our proposal, we developed a comparative study between our algorithm, a decentralized coalition strategy based on the AGR organizational model, and an auction mechanism based on MPMEGBTBA. The simulation results shown in this paper demonstrate that the algorithm based on IEDS is feasible and effective.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61375081) and by a special fund project of Harbin science and technology innovation talents research (no. RC2013XK010002).