Abstract

This paper demonstrates an attempt to incorporate a simple and generic constraint handling technique into the Probability Collectives (PC) approach for solving constrained optimization problems. The approach of PC optimizes any complex system by decomposing it into smaller subsystems and further treats them in a distributed and decentralized way. These subsystems can be viewed as a Multi-Agent System with rational and self-interested agents optimizing their local goals. However, as there is no inherent constraint handling capability in the PC approach, a real challenge is to take constraints into account while at the same time making the agents work collectively, avoiding the tragedy of commons, to optimize the global/system objective. At the core of the PC optimization methodology are the concepts of Deterministic Annealing in Statistical Physics, Game Theory, and Nash Equilibrium. Moreover, a rule-based procedure is incorporated to handle solutions based on the number of constraints violated and drive the convergence towards feasibility. Two specially developed cases of the Circle Packing Problem with known solutions are solved, and the true optimum results are obtained at reasonable computational cost. The proposed algorithm is shown to be sufficiently robust, and strengths and weaknesses of the methodology are also discussed.

1. Introduction

The growing complexity and uncertainty of problem domains has motivated some researchers to resort to a distributed and decentralized optimization approach [15]. Such an approach decomposes an entire system into smaller subsystems for individual optimization to reach the system-level optimum. These subsystems together can be viewed as a collective, which in other words is a group of learning agents or a multiagent system (MAS). In a distributed MAS, the rational and self-interested behavior of the agents is very important for achieving the best possible local goal/reward/payoff, but it is not trivial to make such agents work collectively to achieve the best possible global or system objective. Moreover, in the context of a distributed MAS, the concept of the "tragedy of commons" certainly becomes important and requires special attention. It refers to the situation in which rational and self-interested independent individuals deplete a shared limited resource in a greedy way, even if it is well understood that this may not be beneficial for the long-term collective interest of all; that is, an individual may receive a benefit while the loss is shared among all [6]. In a distributed MAS, if every rational and self-interested agent tries to achieve its individual goal in a greedy way, it may lead to poor system performance [7]. On the other hand, in order to achieve the true global optimum, not every individual agent can receive the best it could have. To achieve the best system objective in a distributed MAS, the tragedy of commons should be avoided. In addition, similar to conventional (centralized) optimization approaches, the problem becomes harder when constraints are involved, and thus constraint handling remains a key issue to be addressed [8–10].

An emerging artificial intelligence tool in the framework of collective intelligence (COIN) for modeling and controlling a distributed MAS, referred to as probability collectives (PC), was first proposed by Dr. David Wolpert in 1999 in a technical report presented to NASA [11]. It is inspired by a sociophysics viewpoint with deep connections to game theory, statistical physics, and optimization [2, 12]. From another viewpoint, the method of PC theory is an efficient way of sampling the joint probability space, converting the problem into the convex space of probability distributions. PC considers the variables in the system as individual agents/players in a game being played iteratively [3, 13, 14]. Unlike stochastic approaches such as the genetic algorithm (GA), swarm optimization, or simulated annealing (SA), rather than deciding on the agent's moves/set of actions, PC allocates probability values for selecting each of the agent's moves. At each iteration, every agent independently updates its own probability distribution over a strategy set, which is the set of moves/actions affecting its local goal, which in turn also affects the global or system objective [2]. The process continues and reaches equilibrium when no further increase in reward is possible for the individual agent by changing its actions. This equilibrium concept is referred to as Nash equilibrium [15]. The concept is successfully formalized and implemented through the PC methodology. The approach works on probability distributions, directly incorporating uncertainty, and is based on prior knowledge of the actions/strategies of all the other agents. The approach of PC has been implemented for solving both unconstrained [4, 5, 7, 16–24] as well as constrained [1, 13, 14, 25–28] optimization problems. The associated literature is discussed in the following few paragraphs.

It was demonstrated in [29], optimizing Schaffer's function, that the search process in PC is more robust/reproducible than that of GA. In addition, PC also outperformed GA in the rate of descent, susceptibility to trapping in false minima, and long-term optimization when tested and compared for multimodality, nonlinearity, and nonseparability on other benchmark problems such as Schaffer's function, the Rosenbrock function, the Ackley path function, and the Michalewicz epistatic function. Some of the fundamental differences between GA and PC were also discussed in [16]. At the core of the GA optimization algorithm is a population of solutions. In every iteration, each individual solution from the population is tested for its fitness to the problem at hand [16] and the population is updated accordingly. GA plots the best-so-far curve showing the fitness of the best individual over the preceding generations. In PC, on the other hand, the probability distribution over the possible solutions is updated iteratively. After a predefined number of iterations, the probability distribution of the available strategies across the variable space is plotted in PC, optimizing an associated homotopy function. PC also directly incorporates uncertainty due to both imperfect sampling and the stochastic independence of the agents' actions [16]. The above comparison with GA indicated that PC can potentially be applied to wide application areas.

The superiority of the decentralized PC architecture over a centralized one was underlined in [7] by solving the 8-queens problem. The two architectures differ in that the sample generation and the updating of the probabilities are distributed in the former. In addition, PC was also compared with the backtracking algorithm referred to as asynchronous distributed optimization (ADOPT) [30]. Although the ADOPT algorithm is a distributed approach, its communication and computational load was not equally distributed among the agents. It was also demonstrated that although ADOPT was guaranteed to find the solution in each run, the communication and computations required were greater than for the same problem solved using PC.

The approach of PC was successfully applied to the complex combinatorial optimization problem of airplane fleet assignment, with the goal of minimizing the number of flights, involving 129 variables and 184 constraints. Applying a centralized approach to this problem may increase the communication and computational load; furthermore, it may add latency to the system and increase the possibility of conflicts in schedules and continuity. Using PC, the goal was collectively achieved by exploiting the advantages of a distributed and decentralized approach, with the airplanes selecting their own schedules depending upon the individual payoffs for the possible routes [13]. The approach of PC was also successfully applied to combinatorial optimization problems such as the joint optimization of routing and resource allocation in wireless networks [17–23].

Two different PC approaches for avoiding airplane collisions were proposed in [25]. In the first approach, every airplane was assumed to be an autonomous agent. These agents selected their individual paths and avoided collision with other airplanes traveling in the neighborhood. In order to implement this approach, a complex negotiation mechanism was required for the airplanes to communicate and cooperate with one another. In the second, semicentralized approach, every airplane was given a chance to become a host airplane, which computed and distributed the solution to all the other airplanes. It is important to mention that the host airplane computed the solution based on the independent solution shared by the previous host airplane. This process continued in sequence until all the airplanes had selected their individual paths. Both approaches were validated by solving an interesting airplane conflict problem in which the airplanes were arranged equidistantly on the periphery of a circle. The targets of the individual airplanes were set as the diametrically opposite points on the periphery of the circle, making the center of the circle a point of conflict. In both approaches, the collision avoidance constraints were incorporated using a penalty function approach.

A variation of the original PC approach in [3, 12–14], referred to as sequentially updated PC (SPC), was proposed in [24]. The variation was achieved by changing the sampling criterion and the method for estimating the sampling space in every iteration. The SPC was tested by optimizing the unconstrained Hartman functions and a vehicle-target assignment type of game. The SPC performed better only on the higher-dimensional Hartman functions and failed to converge in the target assignment game.

The sampling method as well as the associated sampling space updating scheme of the original PC approach was modified by the authors of this paper. The modified PC approach was validated by successfully optimizing the Rosenbrock function [5]. It was also applied to two test cases of the NP-hard combinatorial multidepot multiple traveling salesmen problem (MDMTSP) [1] as well as to cases of the single-depot MTSP (SDMTSP) [26]. In solving the MDMTSP and SDMTSP, several heuristic techniques were successfully incorporated in order to handle the constraints. In addition, a constrained PC approach using the penalty function method, successfully solving three test problems, was proposed in [27].

The potential of PC in mechanical design was demonstrated by optimizing the cross-sections of the individual bars and segments of a 10-bar truss [14] and a segmented beam [4], respectively. The 10-bar truss problem in [14] was solved as a discrete constrained problem, while the segmented beam problem in [4] was solved as a continuous unconstrained problem. In [14], the solution was feasible but was worse than those obtained by other methods [31–33]. The approach of PC [13, 14] was also tested on the discrete constrained problem of university course scheduling [28], but the implementation failed to generate any feasible solution.

The above discussion shows that PC is versatile and applicable to variegated areas including constrained optimization problems such as fleet assignment [13], the 10-bar truss problem [14], the MDMTSP [1], the SDMTSP [26], university course scheduling [28], and so forth. It is important to note that in [13, 14, 28] the approach of incorporating constraints into the PC algorithm was not explicitly mentioned and demonstrated. Furthermore, in [1, 26], a repair approach pushing the solution into the feasible region was implemented. Such an approach may not be usable for handling generic constraints: if the complexity of the problem and the related constraints increase, the repair work may become more tedious and may add further computational load, limiting the use of the repair approach to smaller problems with fewer constraints. In addition, although a constrained PC approach using the penalty function method was implemented in [25] as well as by the authors of this paper in [27], it was noticed that the approach was sensitive to the choice of the penalty parameter and its updating scheme, which required several associated preliminary trials.

This paper demonstrates an attempt to develop a generic constraint handling technique for PC in order to make it an even more versatile optimization algorithm. A variation of the feasibility-based rule originally proposed in [34] and further implemented in [35–40] was employed in solving two cases of the circle packing problem (CPP). Furthermore, similar to [34–40], where additional techniques were implemented to avoid premature convergence, a perturbation approach was incorporated. In addition, attaining the true optimum solution to the CPP using PC clearly demonstrated its ability to avoid the tragedy of commons.

The remainder of this paper is organized as follows. Section 2 discusses various prominent characteristics of the PC method highlighting its competence over other algorithms optimizing collectives. The framework and detailed formulation of the constrained PC method is presented in Section 3. It includes the formulation of homotopy function, constraint handling technique using the feasibility-based rule, and the concept of Nash equilibrium. In Section 4, the validation of the constrained PC approach is shown by solving two test cases of the CPP. It also includes the associated problem specific heuristic technique. The evident features, advantages, and some limitations of the constrained PC approach are discussed in Section 5. Finally, the concluding remarks along with the future directions are presented in Section 6. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) scheme minimizing the homotopy function is discussed in the appendix provided at the end of the paper.

2. Characteristics of PC

The PC approach has the following key characteristics that make it a competitive choice over other algorithms for optimizing collectives.

(1) PC is a distributed solution approach in which each agent independently updates its probability distribution at any time instance, and it can be applied to continuous, discrete, or mixed variables [2, 11, 13]. Since the probability distribution of the strategy set is always a vector of real numbers regardless of the type of variable under consideration, conventional techniques of optimization for Euclidean vectors, such as gradient descent, can be exploited.
(2) It is robust in the sense that the cost function (global/system objective) can be irregular or noisy; that is, it can accommodate noisy and poorly modeled problems [2, 7].
(3) A failed agent can simply be treated as one that does not update its probability distribution, without affecting the other agents; in contrast, agent failure may severely hamper the performance of other techniques [7].
(4) It provides sensitivity information about the problem in the sense that a variable with a peaky distribution (having the highest probability value) is more important to the solution than a variable with a broad distribution; that is, a peaky distribution indicates the best choice of action for optimizing the global goal [2].
(5) The formation of the homotopy function for each agent (variable) helps the algorithm jump out of possible local minima and reach the global minimum [13, 26].
(6) It can successfully avoid the tragedy of commons, skipping local minima and reaching the true global minimum [7].
(7) The computational and communication load is marginally less and is equally distributed among all the agents [7].
(8) It can efficiently handle problems with a large number of variables [13].

With PC solving optimization problems as a MAS, it is worth discussing some of its characteristics to compare the similarities and differences with multiagent reinforcement learning (MARL) methods. Most MARL methods, whether fully cooperative, fully competitive, or mixed (neither cooperative nor competitive), are based on game theory, optimization, and evolutionary computation [41]. According to [41], most of these methods are less scalable and are sensitive to imperfect observations; any uncertainty or incomplete information may lead to unexpected behavior of the agents. However, the scalability of fully cooperative methods such as coordination-free methods can be enhanced by explicitly using communication and/or uncertainty techniques [41–44]. On the other hand, PC is scalable and can handle uncertainty in terms of probability. Moreover, the random strategies selected by any agent can be coordinated or negotiated with the other agents based on social conventions, the right to communication, and so forth. This social aspect makes PC a cooperative approach. Furthermore, indirect coordination-based methods work on the concept of biasing the selection towards the likelihood of good strategies. This concept is similar to the one used in the PC algorithm presented here, in which agents choose strategy sets only in the neighborhood of the best strategy identified in the previous iteration. In the case of mixed MARL algorithms, the agents have no constraints imposed on their rewards. This is similar to the PC algorithm, in which the agents select strategies and exhibit self-interested behavior. However, the mixed MARL algorithms may encounter multiple Nash equilibria, while in PC a unique Nash equilibrium can be achieved.

3. Conceptual Framework of Constrained PC

PC treats the variables in an optimization problem as individual self-interested learning agents/players of a game being played iteratively [13]. These agents select actions over a particular interval and receive local rewards on the basis of the system objective achieved because of those actions. In other words, these agents optimize their local rewards or payoffs, which also optimizes the system-level performance. The process iterates and reaches equilibrium (referred to as Nash equilibrium) when no further increase in the reward is possible for the individual agent through changing its actions. Moreover, the method of PC theory is an efficient way of sampling the joint probability space, converting the problem into the convex space of probability distributions. PC allocates probability values to each agent's moves and hence directly incorporates uncertainty. This is based on prior knowledge of the recent actions or behavior selected by all the other agents. In short, the agents in the PC framework need to have knowledge of the environment along with every other agent's recent action or behavior.

In every iteration, each agent randomly samples from within its own strategy set as well as from within the other agents' strategy sets and computes the corresponding system objectives. The other agents' strategy sets are modeled by each agent based on their recent actions or behavior only, that is, based on partial knowledge. By minimizing the collection of system objectives, every agent identifies the possible strategy which contributes the most towards its minimization. Such a collection of functions is computationally expensive to minimize and may also lead to local minima [3]. In order to avoid this difficulty, the collection of system objectives is deformed into another topological space, forming the homotopy function parameterized by the computational temperature 𝑇 [45–48]. Due to its analogy to the Helmholtz free energy [12, 45–49], the approach of deterministic annealing (DA), converting the discrete variable space into the continuous variable space of probability distributions, is applied in minimizing the homotopy function. At every successive temperature drop, the minimization of the homotopy function is carried out using a second-order optimization scheme such as the Nearest Newton Descent Scheme [2–5, 7, 11–14, 16–29] or the BFGS scheme.

At the end of every iteration, each agent 𝑖 converges to a probability distribution clearly distinguishing the contribution of each of its strategy values. For every agent, the strategy value with the maximum probability is referred to as the favorable strategy and is used to compute the system objective and the corresponding constraint functions. This system objective and the corresponding strategy values are accepted based on a variation of the feasibility-based rule defined in [34] and further successfully implemented in [35–40]. This rule allows the objective function and the constraint information to be considered separately. The rule can be described as follows:

(a) any feasible solution is preferred over any infeasible solution;
(b) between two feasible solutions, the one with the better objective is preferred;
(c) between two infeasible solutions, the one with fewer constraint violations is preferred.
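For illustration, this rule can be sketched in a few lines of Python; encoding each solution as an (objective value, number of violated constraints) pair is an assumption made here for compactness, not part of the original formulation:

```python
# A minimal sketch of the feasibility-based rule (a)-(c) for a
# minimization problem; data structures are illustrative assumptions.

def is_preferred(candidate, incumbent):
    """Return True if `candidate` is preferred over `incumbent`.

    Each solution is a tuple (objective, n_violated), where `objective`
    is the system objective G and `n_violated` is the number of violated
    constraints (0 means feasible).
    """
    g_cand, v_cand = candidate
    g_inc, v_inc = incumbent
    if v_cand == 0 and v_inc > 0:    # (a) feasible beats infeasible
        return True
    if v_cand == 0 and v_inc == 0:   # (b) better objective among feasible
        return g_cand <= g_inc
    if v_cand > 0 and v_inc > 0:     # (c) fewer violations among infeasible
        return v_cand < v_inc
    return False                     # infeasible never beats feasible
```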

In addition to the above, a perturbation approach is also incorporated to avoid premature convergence. It perturbs each individual agent's favorable strategy based on the reciprocal of that strategy and an associated predefined interval. The perturbed solution is accepted only if feasibility is maintained. In this way, the algorithm continues until convergence by selecting the samples from the neighborhood of the recent favorable strategies. The neighborhood space is reduced or expanded according to the improvement in the system objective over a predefined number of iterations.

In some applications, the agents also need to be provided with knowledge of the interagent relationship. This is one of the information/strategy sets which every other entitled agent is supposed to know. There is also global information that every agent is supposed to know. This gives agents the right to model other agents' actions or behavior. All decisions are taken autonomously by each agent considering the available information in order to optimize the local goals and hence achieve the optimum global goal or system objective. The following section discusses the constrained PC procedure in detail.

3.1. Constrained PC Algorithm

Consider a general constrained problem (in the minimization sense) as follows:

$$\begin{aligned}
&\text{Minimize } G \\
&\text{Subject to } g_j \le 0, \quad j = 1, 2, \ldots, s, \\
&\phantom{\text{Subject to }} h_j = 0, \quad j = 1, 2, \ldots, w.
\end{aligned} \tag{1}$$

According to [8–10], the equality constraint $h_j = 0$ can be transformed into a pair of inequality constraints using a tolerance value $\delta$ as follows:

$$h_j = 0 \;\Longrightarrow\;
\begin{cases}
g_{s+j} = h_j - \delta \le 0, \\
g_{s+w+j} = -h_j - \delta \le 0,
\end{cases}
\quad j = 1, 2, \ldots, w. \tag{2}$$

Thus, $w$ equality constraints are replaced by $2w$ inequality constraints, with the total number of constraints given by $t = s + 2w$. Then, a generalized representation of the problem in (1) can be stated as follows:

$$\begin{aligned}
&\text{Minimize } G \\
&\text{Subject to } g_j \le 0, \quad j = 1, 2, \ldots, t.
\end{aligned} \tag{3}$$
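As an illustration of this transformation, the following Python sketch builds the combined list of $t$ inequality constraint functions from user-supplied lists of inequality functions and equality functions; the names `g_list`, `h_list`, and the default tolerance value are illustrative assumptions:

```python
# Hedged sketch of the transformation in (2): each equality h_j = 0 becomes
# the pair h_j - delta <= 0 and -h_j - delta <= 0.

def to_inequalities(g_list, h_list, delta=1e-4):
    """Return the list of t = s + 2w inequality functions g_j(x) <= 0."""
    ineqs = list(g_list)                                    # s originals
    ineqs += [lambda x, h=h: h(x) - delta for h in h_list]  # g_{s+j}
    ineqs += [lambda x, h=h: -h(x) - delta for h in h_list] # g_{s+w+j}
    return ineqs
```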

In the context of PC, the variables of the problem are considered as computational agents/players of a social game being played iteratively [3, 11]. Each agent $i$ is given a predefined sampling interval referred to as $\Psi_i \in [\Psi_i^{\text{lower}}, \Psi_i^{\text{upper}}]$. As a general case, the interval can also be referred to as the sampling space. The lower limit $\Psi_i^{\text{lower}}$ and upper limit $\Psi_i^{\text{upper}}$ of the interval $\Psi_i$ may be updated iteratively as the algorithm progresses.

Each agent $i$ randomly samples $X_i^{[r]}$, $r = 1, 2, \ldots, m_i$ strategies from within the corresponding sampling interval $\Psi_i$, forming a strategy set $\mathbf{X}_i$ represented as

$$\mathbf{X}_i = \{X_i^{[1]}, X_i^{[2]}, X_i^{[3]}, \ldots, X_i^{[m_i]}\}, \quad i = 1, 2, \ldots, N. \tag{4}$$

Every agent is assumed to have an equal number of strategies; that is, $m_1 = m_2 = \cdots = m_i = \cdots = m_{N-1} = m_N$. The procedure of the modified PC theory is explained below in detail with the algorithm flowchart in Figure 1.

The procedure begins with the initialization of the sampling interval $\Psi_i$ for each agent $i$, the temperature $T = T_{\text{initial}}$ (set simply high enough), the temperature step size $\alpha_T$ ($0 < \alpha_T \le 1$), the convergence parameter $\varepsilon = 0.0001$, the algorithm iteration counter $n = 1$, and the number of test iterations $n_{\text{test}}$. The values of $\alpha_T$ and $n_{\text{test}}$ are chosen based on preliminary trials of the algorithm. Furthermore, the constraint violation tolerance $\mu$ is initialized to the number of constraints $|\mathbf{C}|$; that is, $\mu = |\mathbf{C}|$, where $|\mathbf{C}|$ refers to the cardinality of the constraint vector $\mathbf{C} = [g_1, g_2, \ldots, g_t]$.

Step 1. Agent $i$ selects its first strategy $X_i^{[1]}$ and samples randomly from the other agents' strategies as well. This is a random guess by agent $i$ about which strategies have been chosen by the other agents. This forms a "combined strategy set" $\mathbf{Y}_i^{[1]}$ given by

$$\mathbf{Y}_i^{[1]} = \{X_1^{[?]}, X_2^{[?]}, \ldots, X_i^{[1]}, \ldots, X_{N-1}^{[?]}, X_N^{[?]}\}. \tag{5}$$

The superscript $[?]$ indicates that it is a "random guess" and not known in advance. In addition, agent $i$ forms one combined strategy set for every strategy $r$ of its strategy set $\mathbf{X}_i$, as shown below:

$$\begin{aligned}
\mathbf{Y}_i^{[2]} &= \{X_1^{[?]}, X_2^{[?]}, \ldots, X_i^{[2]}, \ldots, X_{N-1}^{[?]}, X_N^{[?]}\}, \\
\mathbf{Y}_i^{[3]} &= \{X_1^{[?]}, X_2^{[?]}, \ldots, X_i^{[3]}, \ldots, X_{N-1}^{[?]}, X_N^{[?]}\}, \\
&\;\;\vdots \\
\mathbf{Y}_i^{[r]} &= \{X_1^{[?]}, X_2^{[?]}, \ldots, X_i^{[r]}, \ldots, X_{N-1}^{[?]}, X_N^{[?]}\}, \\
&\;\;\vdots \\
\mathbf{Y}_i^{[m_i]} &= \{X_1^{[?]}, X_2^{[?]}, \ldots, X_i^{[m_i]}, \ldots, X_{N-1}^{[?]}, X_N^{[?]}\}.
\end{aligned} \tag{6}$$

Similarly, all the remaining agents form their combined strategy sets.

Furthermore, every agent $i$ computes $m_i$ associated objective function values as follows:

$$G(\mathbf{Y}_i^{[1]}), G(\mathbf{Y}_i^{[2]}), \ldots, G(\mathbf{Y}_i^{[r]}), \ldots, G(\mathbf{Y}_i^{[m_i]}). \tag{7}$$

The ultimate goal of every agent $i$ is to identify its strategy value which contributes the most towards the minimization of the sum of these system objective values, that is, $\sum_{r=1}^{m_i} G(\mathbf{Y}_i^{[r]})$, hereafter referred to as the collection of system objectives.
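A minimal Python sketch of Step 1 is given below; the list-of-lists representation of the strategy sets and the callable objective `G` are assumptions made for illustration:

```python
import random

# Illustrative sketch of Step 1: `strategies` is a list of N strategy
# sets (one list of m_i values per agent) and `G` maps a combined
# strategy set to the system objective value.

def combined_strategy_sets(strategies, i):
    """Form the m_i combined strategy sets Y_i^[r] for agent i; the
    other agents' entries are random guesses from their own sets."""
    Y = []
    for r in range(len(strategies[i])):
        combo = [random.choice(s) for s in strategies]  # guesses X_k^[?]
        combo[i] = strategies[i][r]                     # agent i's strategy
        Y.append(combo)
    return Y

def collection_of_objectives(strategies, i, G):
    """Objective values G(Y_i^[r]), r = 1..m_i, whose sum agent i minimizes."""
    return [G(y) for y in combined_strategy_sets(strategies, i)]
```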

Step 2. The minimum of the function $\sum_{r=1}^{m_i} G(\mathbf{Y}_i^{[r]})$ is very hard to achieve as the function may have many possible local minima. Moreover, directly minimizing this function is quite cumbersome as it may need excessive computational effort [3]. One way to deal with this difficulty is to deform the function into another topological space by constructing a related and "easier" function $f(\mathbf{X}_i)$. Such a method is referred to as the homotopy method [45–48]. The function $f(\mathbf{X}_i)$ can be referred to as "easier" because it is easy to compute and its (global) minimum is known and easy to locate [45–47]. The deformed function, referred to as the homotopy function $J$ parameterized by the computational temperature $T$, is represented as follows:

$$J_i(\mathbf{X}_i, T) = \sum_{r=1}^{m_i} G(\mathbf{Y}_i^{[r]}) + T f(\mathbf{X}_i), \quad T \in [0, \infty). \tag{8}$$

For further simplicity and understanding of the above homotopy function, $J_i(\mathbf{X}_i, T)$ can be rewritten as

$$J_i(\mathbf{X}_i, T) = \sum_{r=1}^{m_i} G(\mathbf{Y}_i^{[r]}) - T f(\mathbf{X}_i), \quad T \in [0, \infty). \tag{9}$$

The approach of deterministic annealing (DA) is applied to minimize the homotopy function in (9). The motivation behind this is its analogy to the Helmholtz free energy [26, 27]. It suggests the conversion of discrete variables into random real-valued variables such as probabilities. This converts the original collection of system objectives $\sum_{r=1}^{m_i} G(\mathbf{Y}_i^{[r]})$ into the "expected collection of system objectives" $\sum_{r=1}^{m_i} E(G(\mathbf{Y}_i^{[r]}))$. Furthermore, a suitable function for $f(\mathbf{X}_i)$ is chosen. The general choice is to use the entropy function $S_i = -\sum_{r=1}^{m_i} q(X_i^{[r]}) \log_2 q(X_i^{[r]})$ [44–46]. The following steps of DA are formulated based on the analogy of the homotopy function to the Helmholtz free energy discussed in [12, 26, 27].
(a) Agent $i$ assigns uniform probabilities to its strategies. This is because, at the beginning, the least information is available (the largest uncertainty and highest entropy) about which strategy is favorable for the minimization of the collection of system objectives $\sum_{r=1}^{m_i} G(\mathbf{Y}_i^{[r]})$. Therefore, at the beginning of the "game," each agent's every strategy has probability $1/m_i$ of being most favorable; that is, the probability of strategy $r$ of agent $i$ is

$$q(X_i^{[r]}) = \frac{1}{m_i}, \quad r = 1, 2, \ldots, m_i. \tag{10}$$

Each agent $i$, from its every combined strategy set $\mathbf{Y}_i^{[r]}$ and corresponding system objective $G(\mathbf{Y}_i^{[r]})$ computed previously, further computes $m_i$ corresponding expected system objective values $E(G(\mathbf{Y}_i^{[r]}))$ as follows [2, 3, 11–14]:

$$E(G(\mathbf{Y}_i^{[r]})) = G(\mathbf{Y}_i^{[r]}) \, q(X_i^{[r]}) \prod_{(i)} q(X_{(i)}^{[?]}), \tag{11}$$

where $(i)$ represents every agent other than $i$. Every agent $i$ then computes the expected collection of system objectives denoted by $\sum_{r=1}^{m_i} E(G(\mathbf{Y}_i^{[r]}))$. This also means that the PC approach can convert any discrete variables into continuous variable values in the form of probabilities corresponding to these discrete variables. As mentioned earlier, the problem now becomes continuous but still not easier to solve.

(b) Thus, the homotopy function to be minimized by each agent $i$ in (9) is modified as follows:

$$\begin{aligned}
J_i(q(\mathbf{X}_i), T) &= \sum_{r=1}^{m_i} E(G(\mathbf{Y}_i^{[r]})) - T S_i \\
&= \sum_{r=1}^{m_i} \Bigl[ G(\mathbf{Y}_i^{[r]}) \, q(X_i^{[r]}) \prod_{(i)} q(X_{(i)}^{[?]}) \Bigr]
+ T \sum_{r=1}^{m_i} q(X_i^{[r]}) \log_2 q(X_i^{[r]}),
\end{aligned} \tag{12}$$

where $T \in [0, \infty)$. When the temperature $T$ is high enough, the entropy term dominates the expected collection of system objectives and the problem becomes very easy to solve.
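For illustration, the value of the modified homotopy function in (12) can be computed as in the following sketch; for brevity it assumes a single common product of the other agents' guessed strategy probabilities for all $r$, and all names are illustrative:

```python
import math

# Hedged sketch of (12) for one agent i: `q_i` holds agent i's strategy
# probabilities, `G_vals` the objectives G(Y_i^[r]), and `q_others` the
# guessed probabilities q(X_(i)^[?]) of the other agents' strategies.

def homotopy(q_i, G_vals, q_others, T):
    prod_others = math.prod(q_others)  # product over agents other than i
    expected = sum(G * q * prod_others for G, q in zip(G_vals, q_i))
    entropy = -sum(q * math.log2(q) for q in q_i if q > 0)  # S_i
    return expected - T * entropy      # J_i = sum E(G) - T * S_i
```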

Step 3. In the authors' previous work [1, 4, 5, 26], the Nearest Newton Descent Scheme [3] was implemented for minimizing the homotopy function $J_i(q(\mathbf{X}_i), T)$. Motivated by this scheme [3], the minimization of the homotopy function $J_i(q(\mathbf{X}_i), T)$ given in (12) is carried out using a suitable second-order optimization technique such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) scheme [50, 51]. It is important to mention that, similar to the Nearest Newton Descent Scheme, the BFGS scheme approximates a positive definite Hessian. The BFGS scheme minimizing (12) is discussed in the appendix.

Step 4. For each agent $i$, the optimization process converges to a probability variable vector $q(\mathbf{X}_i)$ which can be seen as the individual agent's probability distribution distinguishing every strategy's contribution towards the minimization of the expected collection of system objectives $\sum_{r=1}^{m_i} E(G(\mathbf{Y}_i^{[r]}))$. In other words, for every agent $i$, if strategy $r$ contributes the most towards the minimization of the objective compared to the other strategies, its corresponding probability increases by an amount greater than those of the other strategies, and so strategy $r$ is distinguished from the rest. Such a strategy is referred to as the favorable strategy $X_i^{[\text{fav}]}$. As an illustration, the converged probability distribution for agent $i$ may look like that shown in Figure 2 for a case where there are 10 strategies, that is, $m_i = 10$.

Compute the corresponding system objective $G(\mathbf{Y}^{[\text{fav}]})$ and constraint vector $\mathbf{C}(\mathbf{Y}^{[\text{fav}]})$, where $\mathbf{Y}^{[\text{fav}]}$ is given by

$$\mathbf{Y}^{[\text{fav}]} = \{X_1^{[\text{fav}]}, X_2^{[\text{fav}]}, \ldots, X_{N-1}^{[\text{fav}]}, X_N^{[\text{fav}]}\}. \tag{13}$$

Step 5. Accept the system objective $G(\mathbf{Y}^{[\text{fav}]})$ and corresponding $\mathbf{Y}^{[\text{fav}]}$ as the current solution if the number of constraints violated $C_{\text{violated}} \le \mu$. Update the constraint violation tolerance $\mu = C_{\text{violated}}$ and continue to Step 6.

If $C_{\text{violated}} > \mu$, then discard the current system objective $G(\mathbf{Y}^{[\text{fav}]})$ and corresponding $\mathbf{Y}^{[\text{fav}]}$, retain the previous iteration solution, and continue to Step 6.

If the current system objective $G(\mathbf{Y}^{[\text{fav}]})$ is feasible, that is, $\mu = C_{\text{violated}} = 0$, and is not worse than the previous feasible solution, accept the current system objective $G(\mathbf{Y}^{[\text{fav}]})$ and corresponding $\mathbf{Y}^{[\text{fav}]}$ as the current solution and continue to Step 6; else discard the current feasible system objective $G(\mathbf{Y}^{[\text{fav}]})$ and corresponding $\mathbf{Y}^{[\text{fav}]}$, retain the previous iteration feasible solution, and continue to Step 6.
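The acceptance logic of Step 5 with the iteratively tightened tolerance $\mu$ can be sketched as follows; the dictionary-based bookkeeping is an illustrative assumption:

```python
# Illustrative sketch of the Step 5 acceptance logic for a minimization
# problem; `best` carries the retained solution as {"G": value,
# "feasible": bool}, and an empty dict means nothing accepted yet.

def step5_accept(G_fav, n_violated, mu, best):
    """Return the updated (mu, best) pair."""
    if n_violated > mu:
        return mu, best                    # discard, retain previous solution
    if n_violated == 0 and best.get("feasible", False) and G_fav > best["G"]:
        return 0, best                     # feasible but worse: discard
    # accept and tighten the tolerance to the current violation count
    return n_violated, {"G": G_fav, "feasible": n_violated == 0}
```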

Step 6. On the completion of the prespecified $n_{\text{test}}$ iterations, the following conditions are checked for every further iteration.

(a) If $G(\mathbf{Y}^{[\text{fav}]}, n) \le G(\mathbf{Y}^{[\text{fav}]}, n - n_{\text{test}})$, then every agent shrinks its sampling interval as follows:

$$\Psi_i \in \Bigl[ X_i^{[\text{fav}]} - (\Psi_i^{\text{upper}} - \Psi_i^{\text{lower}}) \lambda_{\text{down}}, \; X_i^{[\text{fav}]} + (\Psi_i^{\text{upper}} - \Psi_i^{\text{lower}}) \lambda_{\text{down}} \Bigr], \quad 0 < \lambda_{\text{down}} \le 1, \tag{14}$$

where $\lambda_{\text{down}}$ is referred to as the interval factor corresponding to the shrinking of the sample space.

(b) If $G(\mathbf{Y}^{[\text{fav}]}, n)$ and $G(\mathbf{Y}^{[\text{fav}]}, n - n_{\text{test}})$ are feasible and $|G(\mathbf{Y}^{[\text{fav}]}, n) - G(\mathbf{Y}^{[\text{fav}]}, n - n_{\text{test}})| \le \varepsilon$, then the system objective $G(\mathbf{Y}^{[\text{fav}]}, n)$ can be referred to as a stable solution $G(\mathbf{Y}^{[\text{fav}]}, s)$ or possible local minimum. In order to jump out of this possible local minimum, a perturbation approach is incorporated. It is described below.

Every agent $i$ perturbs its current favorable strategy $X_i^{[\text{fav}]}$ by a perturbation factor $\text{fact}_i$ corresponding to the reciprocal of its favorable strategy $X_i^{[\text{fav}]}$ as follows:

$$X_i^{[\text{fav}]} = X_i^{[\text{fav}]} \pm X_i^{[\text{fav}]} \times \text{fact}_i, \tag{15}$$

where

$$\text{fact}_i =
\begin{cases}
\text{random value} \in (\sigma_1^{\text{lower}}, \sigma_1^{\text{upper}}) & \text{if } 1/X_i^{[\text{fav}]} \le \gamma, \\
\text{random value} \in (\sigma_2^{\text{lower}}, \sigma_2^{\text{upper}}) & \text{if } 1/X_i^{[\text{fav}]} > \gamma,
\end{cases} \tag{16}$$

and $\sigma_1^{\text{lower}}, \sigma_1^{\text{upper}}, \sigma_2^{\text{lower}}, \sigma_2^{\text{upper}}$ are randomly generated values between 0 and 1; that is, $0 < \sigma_1^{\text{lower}}, \sigma_1^{\text{upper}}, \sigma_2^{\text{lower}}, \sigma_2^{\text{upper}} < 1$, with $\sigma_1^{\text{lower}} < \sigma_1^{\text{upper}} \le \sigma_2^{\text{lower}} < \sigma_2^{\text{upper}}$. The value of $\gamma$ as well as the "+" or "−" sign in (15) is chosen based on preliminary trials of the algorithm.

This gives every agent $i$ a chance to jump out of local minima and may further help the search for a better solution. The perturbed solution is accepted if and only if feasibility is maintained. Furthermore, every agent expands its sampling interval as follows:

$$\Psi_i \in \Bigl[ \Psi_i^{\text{lower}} - (\Psi_i^{\text{upper}} - \Psi_i^{\text{lower}}) \lambda_{\text{up}}, \; \Psi_i^{\text{upper}} + (\Psi_i^{\text{upper}} - \Psi_i^{\text{lower}}) \lambda_{\text{up}} \Bigr], \quad 0 < \lambda_{\text{up}} \le 1, \tag{17}$$

where $\lambda_{\text{up}}$ is referred to as the interval factor corresponding to the expansion of the sample space.
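A compact sketch of the perturbation in (15)-(16) and the interval expansion in (17) is given below, using the parameter values reported later in Section 4.1 as defaults; the function names and the reading of (16) via the reciprocal of the favorable strategy are assumptions:

```python
import random

# Hedged sketch of the perturbation (15)-(16) and expansion (17);
# parameter names mirror the text, the rest is illustrative.

def perturb(x_fav, gamma=0.99, s1=(0.001, 0.01), s2=(0.5, 0.7), sign=-1.0):
    """Perturb a favorable strategy; small values get the larger factor."""
    if 1.0 / x_fav <= gamma:                  # large x_fav: small factor
        fact = random.uniform(*s1)
    else:                                     # small x_fav: large factor
        fact = random.uniform(*s2)
    return x_fav + sign * x_fav * fact        # equation (15)

def expand_interval(lower, upper, lam_up=0.1):
    width = upper - lower
    return lower - width * lam_up, upper + width * lam_up  # equation (17)
```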

Step 7. If either of the two criteria listed below is valid, accept the current stable system objective $G(\mathbf{Y}^{[\text{fav}]}, s)$ and corresponding $\mathbf{Y}^{[\text{fav}], s}$ as the final solution, referred to as $G(\mathbf{Y}^{[\text{fav}], \text{final}})$ and $\mathbf{Y}^{[\text{fav}], \text{final}} = \{X_1^{[\text{fav}], \text{final}}, X_2^{[\text{fav}], \text{final}}, \ldots, X_{N-1}^{[\text{fav}], \text{final}}, X_N^{[\text{fav}], \text{final}}\}$, respectively, and stop; else continue to Step 8.

(a) If the temperature $T = T_{\text{final}}$ or $T \to 0$.
(b) If there is no significant change in the successive stable system objectives (i.e., $|G(\mathbf{Y}^{[\text{fav}]}, s) - G(\mathbf{Y}^{[\text{fav}]}, s-1)| \le \varepsilon$) for two successive implementations of the perturbation approach.

Step 8. Each agent $i$ then samples $m_i$ strategies from within the updated sampling interval $\Psi_i$ and forms the corresponding updated strategy set $\mathbf{X}_i$ represented as follows:

$$\mathbf{X}_i = \{X_i^{[1]}, X_i^{[2]}, X_i^{[3]}, \ldots, X_i^{[m_i]}\}, \quad i = 1, 2, \ldots, N. \tag{18}$$

Reduce the temperature $T = T - \alpha_T T$, update the iteration counter $n = n + 1$, and return to Step 1.

3.2. Nash Equilibrium

To achieve a Nash equilibrium, every agent in a MAS should have the properties of rationality and convergence [41–44]. Rationality refers to the property by which every agent selects (or converges to) the best possible strategy given the strategies of the other agents. The convergence property refers to the stability condition, that is, a policy using which every agent selects (or converges to) the best possible strategy when all the other agents use their policies from a predefined class (preferably the same class). A Nash equilibrium is naturally achieved when all the agents in a MAS are convergent and rational. Moreover, a Nash equilibrium is guaranteed when all the agents use stationary policies, that is, policies that do not change over time. It is worth mentioning here that all the agents in the proposed PC-based MAS exhibit the above-mentioned properties, as elaborated in the detailed PC algorithm discussed in the previous section.

In any game, there may be a large but finite number of Nash equilibria present, depending on the number of strategies per agent as well as the number of agents. It is essential to choose the best possible combination of the individual strategies selected by each agent, but it is quite hard to go through every possible combination of the individual agents' strategies and choose the one producing the best possible Nash equilibrium and hence the best system objective.

As discussed in the detailed PC algorithm, in each iteration $n$, every agent $i$ selects the best possible strategy, referred to as the favorable strategy $X_i^{[\text{fav}], n}$, by guessing the possible strategies of the other agents. This information about its favorable strategy $X_i^{[\text{fav}], n}$ is made known to all the other agents as well. In addition, the corresponding global knowledge such as the system objective value $G(\mathbf{Y}^{[\text{fav}], n}) = G(X_1^{[\text{fav}], n}, X_2^{[\text{fav}], n}, X_3^{[\text{fav}], n}, \ldots, X_{N-1}^{[\text{fav}], n}, X_N^{[\text{fav}], n})$ is also available to each agent, which clearly helps all the agents take the best possible informed decision in every further iteration. This makes the entire system ignore a considerably large number of Nash equilibria but select the best possible one in each iteration and accept the corresponding system objective $G(\mathbf{Y}^{[\text{fav}], n})$. Mathematically, the Nash equilibrium solution in any iteration can be represented as follows:

$$\begin{aligned}
G(X_1^{[\text{fav}], n}, X_2^{[\text{fav}], n}, \ldots, X_N^{[\text{fav}], n})
&\le G(\overline{X}_1^{[\text{fav}], n}, X_2^{[\text{fav}], n}, \ldots, X_N^{[\text{fav}], n}), \\
G(X_1^{[\text{fav}], n}, X_2^{[\text{fav}], n}, \ldots, X_N^{[\text{fav}], n})
&\le G(X_1^{[\text{fav}], n}, \overline{X}_2^{[\text{fav}], n}, \ldots, X_N^{[\text{fav}], n}), \\
&\;\;\vdots \\
G(X_1^{[\text{fav}], n}, X_2^{[\text{fav}], n}, \ldots, X_N^{[\text{fav}], n})
&\le G(X_1^{[\text{fav}], n}, X_2^{[\text{fav}], n}, \ldots, \overline{X}_N^{[\text{fav}], n}),
\end{aligned} \tag{19}$$

where $\overline{X}_i^{[\text{fav}], n}$ represents any strategy other than the favorable strategy $X_i^{[\text{fav}], n}$ from the same sample space $\Psi_i^n$.
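For illustration, the unilateral-deviation condition in (19) can be checked directly by enumeration, as in the following sketch; the names and data structures are assumptions:

```python
# Illustrative check of condition (19): no agent can improve G by swapping
# its favorable strategy for another one from its own strategy set.

def is_nash(favorites, strategies, G):
    """`favorites` is the profile [X_1^fav, ..., X_N^fav]; `strategies`
    holds each agent's full strategy set; `G` is the system objective."""
    base = G(favorites)
    for i, options in enumerate(strategies):
        for x in options:
            trial = list(favorites)
            trial[i] = x
            if G(trial) < base:       # a profitable unilateral deviation
                return False
    return True
```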

Furthermore, from this current Nash equilibrium point with system objective $G(\mathbf{Y}^{[\text{fav}], n})$, the algorithm progresses to the next Nash equilibrium point with a better system objective $G(\mathbf{Y}^{[\text{fav}], n+1})$; that is, $G(\mathbf{Y}^{[\text{fav}], n}) \ge G(\mathbf{Y}^{[\text{fav}], n+1})$. As the algorithm progresses, the ignored Nash equilibria as well as the best Nash equilibria selected at previous iterations would be noticed as inferior solutions.

This process continues until there is no change in the current solution $G(\mathbf{Y}^{[\text{fav}], n})$, that is, until no new Nash equilibrium has been identified that proves the current Nash equilibrium to be inferior. Hence, the system exhibits stage-wise convergence to a unique Nash equilibrium, and the corresponding system objective is accepted as the final solution $G(\mathbf{Y}^{[\text{fav}], \text{final}})$. As a general case, this progress can be represented as $G(\mathbf{Y}^{[\text{fav}], 1}) \ge G(\mathbf{Y}^{[\text{fav}], 2}) \ge \cdots \ge G(\mathbf{Y}^{[\text{fav}], n}) \ge G(\mathbf{Y}^{[\text{fav}], n+1}) \ge \cdots \ge G(\mathbf{Y}^{[\text{fav}], \text{final}})$.

4. The Circle Packing Problem (CPP)

A generalized packing problem consists of determining how best to pack $z$ objects into a predefined bounded space so as to yield the best utilization of space with no overlap of object boundaries [52, 53]. The bounded space can also be referred to as a container. The packing objects and container can be circular, rectangular, or irregular. Although the problem appears rather simple, and in spite of its practical applications in production and packing for the textile, apparel, naval, automobile, aerospace, and food industries [54], the CPP received considerable attention in the "pure" mathematics literature but only limited attention in the operations research literature [55]. As it is proven to be an NP-hard problem [53, 56–58] and cannot be effectively solved by purely analytical approaches [59–69], a number of heuristic techniques were proposed for solving the CPP [52, 53, 70–82]. Most of these approaches address the CPP in limited ways, such as close packing of fixed and uniform-sized circles inside a square or circle container [53, 59–70], close packing of fixed and different-sized circles inside a square or circle container [52, 54, 75–82], and simultaneous increase in the size of the circles covering the maximum possible area inside a square [71–74].

To the best of the authors' knowledge, the CPP has never before been solved in a distributed way. In this paper, the problem is treated as a distributed MAS in which every individual circle changes its size and position autonomously. This allows for addressing the important issue of the avoidance of the tragedy of commons, which has also never been addressed before in the context of the CPP. The next few sections describe the mathematical formulation and the solution to two cases of the CPP.

4.1. Formulation of the CPP

The objective of the CPP solved here was to cover the maximum possible area within a square by $z$ circles without the circles overlapping one another or exceeding the boundaries of the square. In order to achieve this objective, all the circles were allowed to increase their sizes as well as change their locations. The problem is formulated as follows:

$$\text{Minimize } f = L^2 - \sum_{i=1}^{z} \pi r_i^2 \tag{20}$$

$$\text{Subject to } \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \ge r_i + r_j, \tag{21}$$
$$x_l \le x_i - r_i, \tag{22}$$
$$x_u \ge x_i + r_i, \tag{23}$$
$$y_l \le y_i - r_i, \tag{24}$$
$$y_u \ge y_i + r_i, \tag{25}$$
$$0.001 \le r_i \le \frac{L}{2}, \tag{26}$$
$$i, j = 1, 2, \ldots, z, \quad i \ne j, \tag{27}$$

where $L$ is the length of the side of the square, $r_i$ is the radius of circle $i$, $(x_i, y_i)$ are the coordinates of the center of circle $i$, $(x_l, y_l)$ are the coordinates of the lower left corner of the square, and $(x_u, y_u)$ are the coordinates of the upper right corner of the square.
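For illustration, the objective (20) and a count of the violated constraints among (21)-(26) can be written as follows; representing each circle as an (x, y, r) tuple and placing the lower left corner of the square at the origin are assumptions made here:

```python
import math

# Hedged sketch of the CPP formulation (20)-(27) for z circles
# in an L x L square with lower-left corner (xl, yl).

def objective(circles, L):
    """Uncovered area f = L^2 - sum(pi r_i^2); `circles` is [(x, y, r), ...]."""
    return L * L - sum(math.pi * r * r for (_, _, r) in circles)

def count_violations(circles, L, xl=0.0, yl=0.0):
    """Number of violated constraints among (21)-(26)."""
    v = 0
    for i, (xi, yi, ri) in enumerate(circles):
        # boundary constraints (22)-(25) and radius bounds (26)
        v += xi - ri < xl
        v += xi + ri > xl + L
        v += yi - ri < yl
        v += yi + ri > yl + L
        v += not (0.001 <= ri <= L / 2)
        # pairwise non-overlap constraints (21)
        for xj, yj, rj in circles[i + 1:]:
            v += math.hypot(xi - xj, yi - yj) < ri + rj
    return v
```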

In solving the proposed CPP using the constrained PC approach, the circles were considered as autonomous agents. Each circle was assigned strategy sets for the $X$-coordinate and $Y$-coordinate of its center and for its radius. Two cases of the CPP were solved. These cases differ from each other in the initial configuration (location) of the circles as well as in the constraint handling method(s) applied in solving each case. In Case 1, the circles were randomly initialized inside the square and were not allowed to cross the square boundaries. The constraints in (21) were satisfied using the feasibility-based approach described in Section 3, while the constraints in (22) to (25) were satisfied in every iteration of the algorithm using a repair approach. The repair approach refers to pushing the circles back inside the square if they crossed its boundaries; it is similar to the one implemented by the authors of this paper in their previous work [1, 26]. In Case 2, the circles were randomly located in and around the square, and all the constraints from (21) to (25) were satisfied using the feasibility-based approach described in Section 3. The initial configurations of Case 1 and Case 2 are shown in Figures 3(a) and 6(a), respectively.

The constrained PC algorithm solving both cases was coded in MATLAB 7.8.0 (R2009A), and the simulations were run on a Windows platform using an Intel Core 2 Duo processor with 3 GHz clock speed and 3.25 GB of memory. Furthermore, for both cases, the set of parameters chosen was as follows: (a) individual agent sample size $m_i = 5$, (b) number of test iterations $n_{\text{test}} = 20$, (c) shrinking interval factor $\lambda_{\text{down}} = 0.05$, (d) expansion interval factor $\lambda_{\text{up}} = 0.1$, and (e) perturbation parameters $\sigma_1^{\text{lower}} = 0.001$, $\sigma_1^{\text{upper}} = 0.01$, $\sigma_2^{\text{lower}} = 0.5$, $\sigma_2^{\text{upper}} = 0.7$, and $\gamma = 0.99$, with the sign in (15) chosen to be "−". In addition, a voting heuristic was also incorporated into the constrained PC algorithm. It is described in Section 4.4.

4.2. Case 1: CPP with Circles Randomly Initialized Inside the Square

In this case of the CPP, five circles ($z = 5$) were initialized randomly inside the square without exceeding its boundary edges. The length of the side of the square was five units (i.e., $L = 5$). More than 30 runs of the constrained PC algorithm described in Section 3 were conducted solving Case 1 of the CPP with different initial configurations of the circles inside the square. The true optimum solution was achieved in every run, with an average CPU time of 14.05 minutes and an average of 17515 function evaluations.

The randomly generated initial solution, the intermediate iteration solutions, and the converged true optimum solution from one of the instances are presented in Figure 3. The corresponding convergence plot of the system objective is presented in Figure 4. The convergence of the associated variables, namely, the radii of the circles and the $X$- and $Y$-coordinates of their centers, is presented in Figures 5(a), 5(b), and 5(c), respectively. The solution converged at iteration 1035 with 26910 function evaluations. The true optimum value of the objective function ($f$) achieved was 3.0807 units.

As mentioned before, the algorithm was assumed to have converged when successive implementations of the perturbation approach stabilized at equal objective function values. It is evident from Figures 3, 4, and 5 that the solution converged to the true optimum at iteration 1055, as the successive implementations of the perturbation approach produced stable and equal objective function values. Furthermore, it is also evident from Figures 4 and 5 that the solution was perturbed at iterations 778, 901, 1136, and 1300. It is clear that the implementation of the perturbation approach at iteration 901 helped the solution jump out of the local minimum and further achieve the true optimum solution at iteration 1106.

4.3. Case 2: CPP with Circles Randomly Initialized

In this case of the CPP, five circles ($z = 5$) were initialized randomly in the space with no restriction to the interior of the square, unlike Case 1 where the circles were randomly placed inside the square. The length of the side of the square was five units (i.e., $L = 5$). Similar to Case 1, more than 30 runs of the constrained PC algorithm described in Section 3 with different initial configurations of the circles were conducted solving Case 2. The true optimum solution was achieved in every run, with an average CPU time of 14.05 minutes and an average of 68406 function evaluations.

The randomly generated initial solution, the intermediate iteration solutions, and the converged true optimum solution from one of the instances of Case 2 are presented in Figure 6. The corresponding convergence plot of the system objective is presented in Figure 7. The convergence of the associated variables is presented in Figures 8(a), 8(b), and 8(c), respectively. The solution converged at iteration 955 with 24830 function evaluations. The true optimum value of the objective function ($f$) achieved was 3.0807 units.

In the instance of Case 2 presented here, the solution was perturbed at iterations 788, 988, 1170, and 1355. It is clear that the implementation of the perturbation approach at iteration 788 helped the solution jump out of the local minimum and further achieve the true optimum solution at iteration 955. It is important to mention that the instance illustrated here did not require the voting heuristic to be applied.

4.4. Voting Heuristic

In a few instances of the CPP cases solved here, a voting heuristic was required in order to jump out of a local minimum. It was implemented in conjunction with the perturbation approach. Once the solution was perturbed, every circle voted 1 for each quadrant to which it did not belong at all and voted 0 otherwise. The circle with the smallest size shifted itself to the extreme corner of the quadrant with the highest number of votes, that is, the winner quadrant. The new position of the smallest circle was confirmed only when the solution remained feasible, and the algorithm then continued. If all the quadrants acquired an equal number of votes, then no circle moved its position and the algorithm continued. The voting heuristic is demonstrated in Figure 9.

A voting grid corresponding to every quadrant of the square in Figure 9(a) is represented in Figure 9(b). The solid circles represent the solution before perturbation, while the corresponding perturbed circles are represented with dotted lines. The votes given by the perturbed circles (dotted circles) to the quadrants are presented in the grid. As the maximum number of votes is given to quadrant 1 (i.e., Q1), the circle with the smallest size (circle 3) shifts to the extreme corner of quadrant Q1 and confirms the new position as the solution remains feasible. Based on the trials conducted so far, it was noticed that the voting heuristic did not need to be implemented in every run of the constrained PC algorithm solving the CPP. Moreover, in the few cases in which the voting heuristic was required, it needed to be implemented only once in the entire execution of the algorithm. A variant of the voting heuristic was also implemented in conjunction with the energy landscape paving algorithm [54, 76, 77], where the smallest circle was picked and placed randomly at a vacant place, producing a new configuration. It was claimed that this heuristic helped the algorithm jump out of local minima; however, it had to be implemented in every iteration of the algorithm.
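A possible implementation of the quadrant voting is sketched below; the quadrant ordering, the bounding-box intersection test, and the strict-winner tie rule are simplifying assumptions, not details from the paper:

```python
# Illustrative sketch of the voting heuristic, splitting the square into
# four equal quadrants; data structures are assumptions.

def winner_quadrant(circles, L, xl=0.0, yl=0.0):
    """Each circle votes 1 for every quadrant it does not intersect at all;
    returns the winning quadrant index (Q1..Q4 as 0..3) or None on a tie."""
    cx, cy = xl + L / 2, yl + L / 2              # quadrant boundaries
    quads = [(cx, xl + L, cy, yl + L),           # Q1: top right
             (xl, cx, cy, yl + L),               # Q2: top left
             (xl, cx, yl, cy),                   # Q3: bottom left
             (cx, xl + L, yl, cy)]               # Q4: bottom right
    votes = [0, 0, 0, 0]
    for x, y, r in circles:
        for q, (qxl, qxu, qyl, qyu) in enumerate(quads):
            # bounding-box test as a simple approximation of
            # circle/rectangle non-intersection
            outside = (x + r <= qxl or x - r >= qxu or
                       y + r <= qyl or y - r >= qyu)
            votes[q] += outside
    best = max(votes)
    return votes.index(best) if votes.count(best) == 1 else None
```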

5. Discussion

The above solutions using constrained PC indicate that it can successfully be used to solve constrained optimization problems such as the CPP. It is evident from the results that the approach was sufficiently robust and produced true optimum results in every run of both cases. This implies that the rational behavior of the agents could be successfully formulated and demonstrated. It is important to highlight that the distributed nature of the PC approach allowed the total number of function evaluations to be equally divided among the agents of the system. This can be made practically evident by implementing the PC approach on a real distributed platform with separate workstations carrying out the computations independently. These advantages, along with the uncertainty directly incorporated through the real-valued probabilities treated as variables, suggest that PC can potentially be applied to real-world complex problems.

It is worth mentioning some of the key differences between the PC methodology presented here and the original PC approach [2, 3, 13, 14]. In the present approach, far fewer samples were drawn from the uniform distribution of the individual agent's sampling interval. In contrast, the original PC approach used a Monte Carlo sampling method which was computationally expensive and slow, as the number of samples needed was in the thousands or even millions. Most significantly, the sampling in further stages of the PC algorithm presented here was narrowed down in every iteration by selecting the sampling interval in the neighborhood of the most favorable value in the particular iteration. This ensures faster convergence and an improvement in efficiency over the original PC approach, in which regression was necessary to sample the strategy values in the close neighborhood of the favorable value. Moreover, the coordination among the agents representing the variables in the system was achieved based on only partial information. In other words, in order to optimize the global/system objective, every agent selects its best possible strategy by guessing the model of every other agent based merely on their most recently communicated favorable strategies. This gives the agents and the entire system the advantage of quickly searching for a better solution, reaching the Nash equilibrium, and avoiding the tragedy of commons.

In addition, the feasibility-based rule in [34–40] suffered from difficulty maintaining diversity and therefore required additional techniques such as niching [34], SA [35], a modified mutation approach [38, 39], and several associated trials in [36–39], which may require further computation and memory usage. On the other hand, the simple perturbation approach assisting the feasibility-based rule implemented here was computationally cheaper and required no additional memory.

In agreement with the no-free-lunch theorem [83], some limitations were also identified. The rate of convergence and the quality of the solution were dependent on parameters such as the number of strategies $m_i$ in every agent's strategy set $\mathbf{X}_i$, the number of test iterations $n_{\text{test}}$, the shrinking interval factor $\lambda_{\text{down}}$, the expansion interval factor $\lambda_{\text{up}}$, and the perturbation parameters $\sigma_1^{\text{lower}}$, $\sigma_1^{\text{upper}}$, $\sigma_2^{\text{lower}}$, $\sigma_2^{\text{upper}}$, and $\gamma$. This necessitated some preliminary trials for fine-tuning these parameters. Additionally, in order to confirm convergence, the algorithm had to be run beyond convergence for a considerable number of iterations.

6. Concluding Remarks and Future Work

This paper proposes a generalized constrained PC approach using a variation of the feasibility-based rule originally proposed in [34]. Similar to [27], the constraint violation tolerance was iteratively tightened in order to obtain fitter solutions. In addition, in order to jump out of possible local minima, a perturbation approach was successfully incorporated into the constrained PC algorithm. The concept of Nash equilibrium was also successfully formalized and demonstrated. Furthermore, the authors believe that the PC algorithm is made simpler and faster by the improved sampling method, the convergence criterion, and, most importantly, the neighboring approach narrowing the sampling options.

The constrained PC approach was successfully demonstrated solving two cases of the CPP. In both cases, the approach found the true optimum solution at reasonable computational cost. It is important to mention that the avoidance of the tragedy of commons was also successfully demonstrated in solving the two cases of the CPP. Although only inequality constraints were handled in both cases of the CPP solved here, the transformation of equality constraints into inequality constraints [8–10] can also be implemented.

In the future, to make the approach more generalized and to improve the diversification of sampling, the rate of convergence, the quality of results, and so forth, a self-adaptive scheme can be developed for parameters such as the number of strategies $m_i$ and the interval factors $\lambda_{\text{down}}$ and $\lambda_{\text{up}}$. Furthermore, the constraint handling technique may be further improved using a multicriteria optimization approach [8–10, 84]. The constrained PC approach can also be used for solving more realistic problems such as machine shop scheduling, urban traffic control, and so forth. The authors also see some potential in the field of healthcare systems management [85].

Appendix

Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method Minimizing the Homotopy Function

The minimization of the homotopy function given in (12) was carried out using a suitable second-order optimization technique such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [47, 48]. The approximated Hessian in this method is positive definite, and the updating of the Hessian also preserves positive definiteness. The BFGS method minimizing the homotopy function in (12) is discussed below.

(1) Set the BFGS iteration counter $k = 1$, the BFGS maximum number of iterations $\nu$, and the step size $\alpha_{\text{step}}$ ($0 < \alpha_{\text{step}} \le 1$). The value of $\alpha_{\text{step}}$ is held constant throughout the optimization and is chosen based on preliminary trials of the algorithm.

(a) Initialize the convergence criterion $\lVert q(\mathbf{X}_i)^k - q(\mathbf{X}_i)^{k-1} \rVert \le \varepsilon_2$. The convergence parameter $\varepsilon_2 = 0.0001$ is equal for all $N$ agents.

(b) Initialize the Hessian $\mathbf{H}_i^k$ to a positive definite matrix, preferably the identity matrix $\mathbf{I}$ of size $m_i \times m_i$.

(c) Initialize the probability variables as follows:

$$q(\mathbf{X}_i)^k = \Bigl\{ q(X_i^{[1]})^k = \frac{1}{m_i}, \; q(X_i^{[2]})^k = \frac{1}{m_i}, \; \ldots, \; q(X_i^{[m_i]})^k = \frac{1}{m_i} \Bigr\}. \tag{A.1}$$

That is, assign uniform probabilities to the strategies of agent $i$. This is because, at the beginning, the least information is available (largest uncertainty and highest entropy) about which strategy is favorable for the minimization of the collection of system objectives $\sum_{r=1}^{m_i} G(\mathbf{Y}_i^{[r]})$.

(d) Compute the gradient of the homotopy function in (12) as follows:

$$\mathbf{C}^k = \Bigl[ \frac{\partial J_i(q(\mathbf{X}_i), T)^k}{\partial q(X_i^{[1]})^k}, \; \frac{\partial J_i(q(\mathbf{X}_i), T)^k}{\partial q(X_i^{[2]})^k}, \; \ldots, \; \frac{\partial J_i(q(\mathbf{X}_i), T)^k}{\partial q(X_i^{[m_i]})^k} \Bigr]$$

$$= \Bigl[ G(\mathbf{Y}_i^{[1]}) \prod_{(i)} q(X_{(i)}^{[?]}) + \frac{T}{\ln(2)} \bigl(1 + \ln q(X_i^{[1]})^k\bigr), \; G(\mathbf{Y}_i^{[2]}) \prod_{(i)} q(X_{(i)}^{[?]}) + \frac{T}{\ln(2)} \bigl(1 + \ln q(X_i^{[2]})^k\bigr), \; \ldots, \; G(\mathbf{Y}_i^{[m_i]}) \prod_{(i)} q(X_{(i)}^{[?]}) + \frac{T}{\ln(2)} \bigl(1 + \ln q(X_i^{[m_i]})^k\bigr) \Bigr]. \tag{A.2}$$

(2) Compute the search direction as $\mathbf{d}_i^k = -\mathbf{C}_i^k (\mathbf{H}_i^k)^{-1}$.

(3) Compute $J_i\bigl((q(\mathbf{X}_i) + \alpha_{\text{step}} \mathbf{d}_i^k), T\bigr)$.

(4) Update the probability vector $q(\mathbf{X}_i)^{k+1} = q(\mathbf{X}_i)^k + \alpha_{\text{step}} \mathbf{d}_i^k$.

(5) Update the Hessian $\mathbf{H}_i^{k+1} = \mathbf{H}_i^k + \mathbf{D}_i^k + \mathbf{E}_i^k$, where

$$\mathbf{D}_i^k = \frac{\mathbf{y}_i^k (\mathbf{y}_i^k)^T}{(\mathbf{y}_i^k)^T \mathbf{s}_i^k}, \quad
\mathbf{E}_i^k = \frac{\mathbf{C}_i^k (\mathbf{C}_i^k)^T}{(\mathbf{C}_i^k)^T \mathbf{d}_i^k}, \quad
\mathbf{s}_i^k = \alpha_{\text{step}} \mathbf{d}_i^k, \quad
\mathbf{y}_i^k = \mathbf{C}_i^{k+1} - \mathbf{C}_i^k, \tag{A.3}$$

and $\mathbf{C}^{k+1}$ is the gradient in (A.2) evaluated at the updated probabilities $q(\mathbf{X}_i)^{k+1}$:

$$\mathbf{C}^{k+1} = \Bigl[ G(\mathbf{Y}_i^{[1]}) \prod_{(i)} q(X_{(i)}^{[?]}) + \frac{T}{\ln(2)} \bigl(1 + \ln q(X_i^{[1]})^{k+1}\bigr), \; \ldots, \; G(\mathbf{Y}_i^{[m_i]}) \prod_{(i)} q(X_{(i)}^{[?]}) + \frac{T}{\ln(2)} \bigl(1 + \ln q(X_i^{[m_i]})^{k+1}\bigr) \Bigr]. \tag{A.4}$$

(6) Accept the current probability distribution $q(\mathbf{X}_i)^k$ if $k \ge \nu$ or if the condition $\lVert q(\mathbf{X}_i)^k - q(\mathbf{X}_i)^{k-1} \rVert \le \varepsilon_2$ holds for a considerable number of successive iterations; then stop. Else, update $k = k + 1$ and go to (2).
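To make the above steps concrete, a minimal BFGS loop on one agent's probability vector is sketched below; it assumes a user-supplied callable `grad(q, T)` implementing (A.2), ignores the probability-simplex normalization for brevity, and is an illustrative sketch rather than the authors' MATLAB implementation:

```python
import numpy as np

# Hedged sketch of the BFGS iteration (1)-(6) on the probability vector q
# of one agent; `grad(q, T)` is assumed to return the gradient (A.2).

def bfgs_minimize(grad, m, T, alpha_step=0.1, nu=100, eps2=1e-4):
    q = np.full(m, 1.0 / m)          # (c) uniform initial probabilities
    H = np.eye(m)                    # (b) identity initial Hessian
    C = grad(q, T)                   # (d) gradient of the homotopy function
    for _ in range(nu):
        d = -np.linalg.solve(H, C)   # (2) search direction
        q_new = q + alpha_step * d   # (4) probability update
        C_new = grad(q_new, T)
        s, y = alpha_step * d, C_new - C
        # (5) Hessian update with the correction terms D and E of (A.3);
        # a safeguard against y @ s == 0 is omitted for brevity
        H = H + np.outer(y, y) / (y @ s) + np.outer(C, C) / (C @ d)
        if np.linalg.norm(q_new - q) <= eps2:   # (6) convergence check
            return q_new
        q, C = q_new, C_new
    return q
```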