Mathematical Problems in Engineering

Volume 2016, Article ID 5192423, 11 pages

http://dx.doi.org/10.1155/2016/5192423

## A New Decentralized Approach of Multiagent Cooperative Pursuit Based on the Iterated Elimination of Dominated Strategies Model

^{1}Harbin Institute of Technology, Computer Science and Technology, Harbin 150001, China

^{2}Department of Computer Science, University of Khenchela, 40000 Khenchela, Algeria

Received 20 June 2016; Revised 7 September 2016; Accepted 25 September 2016

Academic Editor: Vladimir Turetsky

Copyright © 2016 Mohammed El Habib Souidi and Songhao Piao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Game Theory is a promising approach for forming coalitions in multiagent systems. This paper focuses on distributed computation and on the dynamic formation and reformation of pursuit groups in pursuit-evasion problems. To address this task, we propose a decentralized coalition formation algorithm based on the Iterated Elimination of Dominated Strategies (IEDS), a Game Theory process commonly used to solve problems that require dominated strategies to be withdrawn iteratively. Furthermore, we use Markov Decision Process (MDP) principles to control the motion strategy of the agents in the environment. The simulation results demonstrate the feasibility and validity of the proposed approach in comparison with other decentralized methods.

#### 1. Introduction

A Multiagent System (MAS) is an organized set of intelligent agents situated in an environment that cooperate in order to coordinate their performance and resolve complex problems. Coalition formation, considered a major focus in social systems, is an important method of cooperation. In general, coalition formation between agents is goal-directed and short-lived: coalitions are formed to achieve a specific objective and dissolve when it is accomplished. Coalition formation has received a considerable amount of attention in recent research [1, 2]. Further, some research activities are based on the notion of Organization, which allows agents to form coalitions as groups and their members to cooperate. In this vein, Ferber et al. [3] proposed the AALAADIN organizational model, based on three principal axes, Agent, Group, and Role (AGR), which are used together to describe concrete agent organizations.

Multiagent cooperative pursuit is a well-known multiagent problem [4, 5]. Under identical conditions for pursuers and evaders, the pursuit-evasion problem can be classified into single-object pursuit and multiobject pursuit. This problem has been considered in many references, and its solutions have been inspired by an equally diverse set of approaches and techniques. One of them is Organization [6], in which we used the principles of the AGR organizational model to propose a pursuit coalition formation algorithm. Also, in order to equip each pursuit group with a dynamic access mechanism, we introduced a flexible organizational model extended from AGR through the application of fuzzy logic principles, which determine the membership degree of each pursuer in relation to each group [7].

Furthermore, Cai et al. introduced an economical auction mechanism (MPMEGBTBA) [8], in which an advanced task negotiation process based on Task Bundle Auction allocates tasks dynamically through the dynamic coalition formation of multiple agents. Several other studies treat the pursuit problem through different principles, such as Graph Theory [9], Polygonal Environment [10, 11], Data Mining [12], and Reinforcement Learning [13]. In this kind of problem, pursuers and evaders come in different types. The type of a pursuer denotes its pursuit capacity, whereas the type of an evader reflects the number and type of pursuers required to capture it. The value of an evader indicates the expected rewards that should be returned to the relevant pursuers once the capture is achieved.

Game Theory can be considered as the simplest way to model situations of conflict, and it studies the interactions between self-interested agents. The classic question relating Game Theory to multiagent systems is “what is the best action that an agent can perform?” This principle has been widely used in multiagent pursuit problems [14–17]. Negotiation based on Game Theory focuses on the value and rewards of each agent, which appropriately reflect the objective of the agent’s negotiation (satisfying the goal of each agent). The main advantage provided by Game Theory algorithms in MAS is the coordination that emerges from the hypothesis of mutual rationality of the agents. Game Theory algorithms can therefore coordinate autonomous rational agents without a coordination mechanism explicitly integrated in the agent model. They also provide different methods that define the optimal agent coalitions in several types of problems. On the other hand, a disadvantage of these algorithms is that agents are frequently assumed to be perfectly rational. Moreover, Game Theory algorithms focus on the value of the optimal solution and overlook the most efficient method to achieve it.

In this paper, we focus on the Iterated Elimination of Dominated Strategies (IEDS), a Game Theory technique, to propose a coalition formation algorithm for pursuit-evasion problems. A strategy is the complete specification of the agent’s behavior in any situation (in the case of an extensive form game, it specifies the behavior the agent must undertake according to the information provided). Moreover, we use Markov Decision Process (MDP) principles to control the motion strategy of each agent. MDP provides a formalism for modeling and solving planning and learning problems under uncertainty [18, 19].

The paper is organized as follows. In Section 2, we discuss the main related works based on the same principles used in this paper. In Section 3, we describe the pursuit-evasion problem, giving a detailed explanation of the simulation environment and its contents, and we define the principal characteristics of the pursuers and evaders. In Section 4, the Iterated Elimination of Dominated Strategies principle is described and illustrated with an example application of this Game Theory process. In Section 5, the basic principles of the Markov Decision Process are presented, and we clarify its primary functions, namely the reward and transition functions. In Section 6, we introduce the distributed coalition formation algorithm with a detailed explanation of how a coalition progresses. A simulation of the pursuit-evasion game is shown in Section 7, where we describe our simulation environment in detail and compare the results achieved with outcomes based on other theories. Finally, Section 8 contains concluding remarks.

#### 2. Related Work

Many works apply game-theoretic principles to the pursuit-evasion (PE) problem. In [14], the author described the control of a team of autonomous agents tracking an intelligent evader in an inaccurately mapped terrain, based on a method that calculates the Nash equilibrium policies by solving an equivalent zero-sum matrix game. In this example, among all Nash equilibria, the evader selects the one that optimizes its deterministic distance to the pursuers’ team. In order to resolve problems often encountered in pursuit-evasion game algorithms, such as computational complexity and lack of universality, Dong et al. [15] proposed a hybrid algorithm founded on an improved dynamic artificial potential field and a differential game. In it, the Nash equilibrium solution is optimal for both pursuer and evader in the barrier-free zone of the pursuit-evasion game, and the algorithm is applied flexibly in accordance with environment changes around the pursuit elements. Moreover, in [16], Amigoni and Basilico presented an approach to calculate the optimal pursuer’s strategy that maximizes the probability of capturing the target in a given environment. This approach is based on the definition of a game-theoretic pursuit-evasion model and on its resolution through mathematical programming.

More recently, Lin et al. [20] proposed a pursuit-evasion differential game based on Nash strategies involving limited observations. On the one hand, the evader undertakes the standard feedback Nash strategy. On the other hand, the pursuers undertake Nash strategies based on the proposed novel concept of best achievable performance indices. This model has potential applications in cases where several weakly equipped pursuing vehicles are tracking a well-equipped unmanned vehicle.

In relation to PE, MDP is usually used to provide motion planning for the mobile pursuers by maximizing the rewards obtained during the pursuit. In [21], a Partially Observable Markov Decision Process (POMDP) algorithm is used to search for a mobile target in a known graph. The main objective is to ensure the capture of the targets by clearing the graph in minimal time. In [22], the authors propose a new approach, the Continuous Time Markov Decision Process (CTMDP), to address the PE problem. Compared with MDP, CTMDP takes into account the impact of the transition time between states, which provides strong robustness against changes in transition probability. In [23], the authors proposed an innovative approach based entirely on MDP with the aim of resolving sequential multiagent decision problems by allowing agents to reason explicitly about specific coordination mechanisms. In other words, they defined a value iteration algorithm that computes optimal policies and that recognizes and reasons about coordination problems.

Furthermore, we can consider other works based on MDP and IEDS such as [24], in which an exact dynamic programming algorithm for partially observable stochastic games (POSGs) is developed. It is also proven that, when applied to finite-horizon POSGs, the algorithm iteratively eliminates very weakly dominated strategies without first forming a normal form representation of the game. Several other types of coordination mechanisms are currently used, such as Stochastic Clustering Auctions (SCAs) [25, 26], a class of cooperative auction methods based on the modified Swendsen-Wang method. This method permits each robot to reconstitute the tasks that have been linked, and it applies to heterogeneous teams. Other mechanisms are market-based, such as TraderBots [27], which is applied to greedy agents in order to provide a detailed analysis of the requirements for robust and efficient multirobot coordination in dynamic environments. From the point of view of Game Theory, some research activities [28] investigated the optimal coordination approach for multiagent foraging. They established the equivalence between the optimal solution of the MAS and the equilibrium of the game for the same case, and then introduced the evolutionarily stable strategy to help resolve the equilibrium selection problem of traditional Game Theory.

#### 3. Problem Description

In this section, we focus on the cooperation problem in which pursuers situated in a bounded toroidal grid environment have to capture evaders of different types. Pursuers and evaders, each forming a collection, are the roles that the agents can play. Each evader is characterized by a type that indicates how many pursuers are required to capture it. Here we suppose that the pursuers can evaluate the evaders’ types after localization. The environment contains some fixed obstacles of different shapes and sizes, described by a mapping over positions that marks whether a given position is an obstacle.

In our proposal, the strategies of each pursuer are guided by determining factors that reflect the individual development of the pursuer during the execution of the assigned tasks. These factors are detailed as follows.
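The environment described above can be sketched in a few lines of Python. This is only an illustrative model: the names `ToroidalGrid`, `Evader`, `is_obstacle`, and `required_pursuers` are assumptions introduced here, as is the choice of 4-connected movement.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Cell = Tuple[int, int]


@dataclass(frozen=True)
class Evader:
    position: Cell
    required_pursuers: int  # the evader's type: pursuers needed for capture


@dataclass
class ToroidalGrid:
    width: int
    height: int
    obstacles: Set[Cell]  # fixed obstacle cells

    def wrap(self, cell: Cell) -> Cell:
        # On a torus, leaving one edge re-enters from the opposite edge.
        return (cell[0] % self.width, cell[1] % self.height)

    def is_obstacle(self, cell: Cell) -> bool:
        # Plays the role of the obstacle mapping over positions.
        return self.wrap(cell) in self.obstacles

    def neighbors(self, cell: Cell) -> List[Cell]:
        # Assumed 4-connected moves, skipping obstacle cells.
        x, y = cell
        candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        return sorted(self.wrap(c) for c in candidates if not self.is_obstacle(c))


grid = ToroidalGrid(width=5, height=5, obstacles={(0, 1)})
evader = Evader(position=(2, 2), required_pursuers=4)
print(grid.neighbors((0, 0)))
```

The wrap-around in `wrap` is what makes the grid toroidal: the cell to the left of column 0 is the last column of the same row.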

*Self-Confidence Degree*. In multiagent systems, each agent must be able to execute the services requested by the other agents. The self-confidence degree is the assessment of the agent’s success in relation to the assigned tasks. It is computed from the number of tasks that the agent has accomplished and the number of tasks in which the agent has participated.

*The Credit*. When an agent cannot perform a task, its credit is affected. The credit of an agent is calculated from the number of tasks abandoned by the agent.

*Environment Position*. The position of the agent in the environment is a crucial criterion for the pursuit sequences, because the capture will be easier if the pursuer is closer to the evader. The position Pos is computed from the state (cell) of the pursuer, the state (cell) of the evader, and Dist, the distance between the pursuer and the evader, which is obtained from the Cartesian coordinates of the pursuer and of the evader.
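One plausible formalization of the three factors, consistent with the quantities named above, is sketched below. The symbols $C_s$ (tasks accomplished), $C_t$ (tasks participated in), $C_b$ (tasks abandoned), $(x_p, y_p)$, and $(x_e, y_e)$ are notation assumed here for illustration. The ratio forms, the Euclidean distance, and in particular the decreasing function chosen for Pos are assumptions, not formulas confirmed by the text.

```latex
\begin{align}
\mathrm{Conf}   &= \frac{C_s}{C_t}, \\
\mathrm{Credit} &= 1 - \frac{C_b}{C_t}, \\
\mathrm{Dist}   &= \sqrt{(x_p - x_e)^2 + (y_p - y_e)^2}, \\
\mathrm{Pos}    &= \frac{1}{1 + \mathrm{Dist}}.
\end{align}
```

Under these assumptions all three factors lie in $[0, 1]$ and grow as the pursuer becomes more reliable or moves closer to the evader.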

In order to distinguish the different coalitions, each pursuer belonging to a coalition calculates the value returned to itself through this strategy. This computation is based on the factors characterizing the pursuers. For example, when a pursuer belongs to the coalition (Co), the value of this coalition in relation to this pursuer is calculated from these factors, each weighted by its coefficient Coef.
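A minimal sketch of such a coalition value is given below, assuming a weighted sum of the three factors. The factor formulas (success ratio, one minus the abandonment ratio, and an inverse-distance position score), the class `PursuerRecord`, and the default coefficients are all assumptions made for illustration rather than the exact definitions used in the paper.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PursuerRecord:
    accomplished: int       # tasks the pursuer has accomplished
    participated: int       # tasks the pursuer has participated in
    abandoned: int          # tasks the pursuer has abandoned
    cell: Tuple[int, int]   # Cartesian coordinates of the pursuer


def self_confidence(p: PursuerRecord) -> float:
    # Assumed form: share of participated tasks that were accomplished.
    return p.accomplished / p.participated if p.participated else 0.0


def credit(p: PursuerRecord) -> float:
    # Assumed form: credit drops with the share of abandoned tasks.
    return 1.0 - p.abandoned / p.participated if p.participated else 1.0


def position_score(p: PursuerRecord, evader_cell: Tuple[int, int]) -> float:
    # Assumed form: closer pursuers score higher (Euclidean distance).
    return 1.0 / (1.0 + math.dist(p.cell, evader_cell))


def coalition_value(p, evader_cell, coefs=(1.0, 1.0, 1.0)) -> float:
    # Weighted sum of the three factors; coefs plays the role of Coef.
    factors = (self_confidence(p), credit(p), position_score(p, evader_cell))
    return sum(c * f for c, f in zip(coefs, factors))


pursuer = PursuerRecord(accomplished=3, participated=4, abandoned=1, cell=(0, 0))
value = coalition_value(pursuer, evader_cell=(3, 4))
print(round(value, 4))
```

Changing `coefs` shifts the emphasis between reliability and proximity, which is the role the text assigns to the coefficient of each factor.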

On the basis of these values and using the IEDS method, our mechanism is able to select the optimal pursuit coalition for each detected evader, as detailed in Section 6.

#### 4. The Iterated Elimination of Dominated Strategies (IEDS)

The coalition is a set of pursuers required to capture the evaders detected. In the coalition, each pursuer must correspond to a specific strategy. In our proposal, a pure strategy defines a specific pursuit group’s integration that the pursuer will follow in every possible and attainable situation during the pursuit. Such coalitions may not be random or drawn from a distribution, as in the case of mixed strategies. A strategy dominates another strategy if and only if, for every potential combination of the other players’ actions, it yields a better result than the other strategy, where the result is given by a function that returns the outcome obtained through the application of a specific strategy.
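In standard game-theoretic notation, the dominance condition can be written as below. The symbols are assumed here for illustration: $s_i$ and $s_i'$ are two strategies of player $i$, $s_{-i}$ is a combination of the other players' strategies drawn from $S_{-i}$, and $u_i$ is the payoff function of player $i$. The strict form is shown; whether the paper intends strict or weak dominance is an assumption.

```latex
s_i \text{ dominates } s_i'
\iff
\forall\, s_{-i} \in S_{-i}:\quad
u_i(s_i, s_{-i}) > u_i(s_i', s_{-i})
```

The weak variant replaces $>$ with $\geq$ and requires the inequality to be strict for at least one $s_{-i}$.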

Consider the strategic game shown in Table 1, where the column player has three pure strategies (Left, Center, and Right) and the row player has only two (Up and Down) (a). The values shown in each cell represent the expected payoffs returned to the players when the corresponding strategies are selected. For the column player, playing Center is always better than playing Right. Consequently, we can assume that the column player will eventually stop playing Right because it is a dominated strategy (b), and the Right column can be ignored after its elimination. The row player now has a dominated strategy, Up. Eventually the row player stops playing Up, so the Up row is eliminated (c). Finally, two choices remain, Down-Left and Down-Center, and the column player notices that it can only win by playing Left (d). We can therefore deduce that the IEDS solution is (Down, Left), with the payoff given in the corresponding cell of Table 1.
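The elimination sequence described above can be reproduced with a short Python sketch of IEDS for a two-player game, using strict dominance by pure strategies. The payoff values in `PAYOFFS` are hypothetical: they were chosen only so that the eliminations follow the same order as in the text (Right, then Up, then Center) and are not the values of Table 1.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical payoffs: (row strategy, column strategy) -> (row payoff, column payoff)
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("Up", "Left"): (1, 1), ("Up", "Center"): (1, 3), ("Up", "Right"): (4, 2),
    ("Down", "Left"): (2, 4), ("Down", "Center"): (3, 2), ("Down", "Right"): (0, 1),
}


def strictly_dominated(candidate: str, own: List[str], opponents: List[str],
                       payoff: Callable[[str, str], int]) -> bool:
    # True if some other remaining strategy beats `candidate`
    # against every remaining opponent strategy.
    return any(
        all(payoff(other, opp) > payoff(candidate, opp) for opp in opponents)
        for other in own if other != candidate
    )


def ieds(rows: List[str], cols: List[str], payoffs) -> Tuple[List[str], List[str]]:
    rows, cols = list(rows), list(cols)
    col_payoff = lambda c, r: payoffs[(r, c)][1]
    row_payoff = lambda r, c: payoffs[(r, c)][0]
    changed = True
    while changed:  # repeat until no strategy can be eliminated
        changed = False
        for c in list(cols):
            if strictly_dominated(c, cols, rows, col_payoff):
                cols.remove(c)
                changed = True
        for r in list(rows):
            if strictly_dominated(r, rows, cols, row_payoff):
                rows.remove(r)
                changed = True
    return rows, cols


surviving_rows, surviving_cols = ieds(["Up", "Down"], ["Left", "Center", "Right"], PAYOFFS)
print(surviving_rows, surviving_cols)
```

With strict dominance the surviving set does not depend on the order in which dominated strategies are removed, so alternating between the two players as done here is only one convenient schedule.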