Abstract

We provide a necessary condition that a constrained Nash-equilibrium (CNE) policy pair satisfies in two-person zero-sum constrained stochastic discounted-payoff games and discuss a general method of approximating CNE based on the condition.

1. Introduction

Altman and Shwartz [1] established a sufficient condition for the existence of a stationary Markovian constrained Nash equilibrium (CNE) policy pair in a general model of finite two-person zero-sum constrained stochastic games, and Alvarez-Mena and Hernández-Lerma [2] extended the result to infinite state and action spaces. Although a few computational studies exist for average-payoff models under additional simplifying assumptions (see, e.g., [3–5]), there seems to be no work that provides a meaningful necessary condition for CNE or a general approximation scheme for CNE within the general discounted-cost model.

This brief paper establishes a necessary condition that a CNE policy pair satisfies, based on a novel characterization of the set of all feasible policies of one player when the other player's policy is fixed. The characterization identifies the feasible mixed actions of one player at the current state when the expected total discounted constraint cost from each reachable next state is given by a value function defined over the state space. The necessary condition provides a general method of testing whether a given policy pair can be a CNE policy pair and suggests a general approximation scheme for CNE.

2. Preliminaries

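Consider a two-person zero-sum Markov game (MG) [6] $M = (X, A, B, P, R)$, where $X$ is a finite state set, and $A(x)$ and $B(x)$ are nonempty finite pure-action sets for the minimizer and the maximizer, respectively, at $x$ in $X$, with $A = \bigcup_{x \in X} A(x)$ and $B = \bigcup_{x \in X} B(x)$. We denote the mixed-action sets at $x$ in $X$ over $A(x)$ and $B(x)$ by $\Delta(A(x))$ and $\Delta(B(x))$, respectively, with $\Delta(A) = \bigcup_{x \in X} \Delta(A(x))$ and $\Delta(B) = \bigcup_{x \in X} \Delta(B(x))$.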

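Once $f$ in $\Delta(A(x))$ and $g$ in $\Delta(B(x))$ are simultaneously taken at $x$ in $X$ by the minimizer and the maximizer, respectively (with complete knowledge of the state but without knowing the other player's current action), $x$ makes a transition to a next state $y$ with probability
$$P(y \mid x, f, g) = \sum_{a \in A(x)} \sum_{b \in B(x)} f(a)\, g(b)\, P(y \mid x, a, b).$$
Here $f(a)$ denotes the probability of selecting $a$, $g(b)$ is defined similarly, and $P(y \mid x, a, b)$ denotes the probability of moving from $x$ to $y$ under $a$ and $b$. The minimizer then incurs an expected cost $R(x, f, g)$ given by
$$R(x, f, g) = \sum_{a \in A(x)} \sum_{b \in B(x)} f(a)\, g(b)\, R(x, a, b),$$
where $R(x, a, b)$ in $\mathbb{R}$ is a payoff to the minimizer (its negative is incurred by the maximizer).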
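To make the mixed-action formulas concrete, here is a minimal Python sketch. The nested-list encoding P[x][a][b][y] = P(y | x, a, b) and R[x][a][b] = R(x, a, b), the helper names `mixed_transition` and `mixed_cost`, and the toy two-state game are illustrative assumptions, not part of the model.

```python
from typing import List

# Hypothetical tabular encoding (not from the paper): for state x,
#   P[x][a][b][y] = P(y | x, a, b)   and   R[x][a][b] = R(x, a, b),
# and a mixed action is a list of probabilities indexed by pure actions.
Dist = List[float]


def mixed_transition(P, x: int, f: Dist, g: Dist) -> Dist:
    """P(y | x, f, g) = sum_a sum_b f(a) g(b) P(y | x, a, b), returned for every y."""
    n_states = len(P)
    probs = [0.0] * n_states
    for a, fa in enumerate(f):
        for b, gb in enumerate(g):
            for y in range(n_states):
                probs[y] += fa * gb * P[x][a][b][y]
    return probs


def mixed_cost(R, x: int, f: Dist, g: Dist) -> float:
    """R(x, f, g) = sum_a sum_b f(a) g(b) R(x, a, b): expected one-stage cost to the minimizer."""
    return sum(fa * gb * R[x][a][b] for a, fa in enumerate(f) for b, gb in enumerate(g))


# Toy two-state game: state 0 has 2 x 2 actions, state 1 has 1 x 1 actions (absorbing).
P = [
    [[[1.0, 0.0], [0.0, 1.0]],
     [[0.5, 0.5], [0.0, 1.0]]],
    [[[0.0, 1.0]]],
]
R = [
    [[1.0, -1.0], [-1.0, 1.0]],  # matching-pennies-like payoffs at state 0
    [[0.0]],
]
print(mixed_transition(P, 0, [0.5, 0.5], [0.5, 0.5]))  # [0.375, 0.625]
print(mixed_cost(R, 0, [0.5, 0.5], [0.5, 0.5]))        # 0.0
```

Since both quantities are bilinear in (f, g), they are simply the (f, g)-weighted averages of the pure-pair entries.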

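We define a stationary Markovian policy $\pi$ of the minimizer as a function $\pi: X \to \Delta(A)$ with $\pi(x) \in \Delta(A(x))$ for all $x \in X$, and denote by $\Pi$ the set of all such policies. A policy $\phi: X \to \Delta(B)$ of the maximizer is defined similarly with $\phi(x) \in \Delta(B(x))$, and we denote by $\Phi$ the set of all such policies. Define the objective value of $\pi$ in $\Pi$ and $\phi$ in $\Phi$ with an initial state $x$ in $X$ as
$$V^{\pi,\phi}(x) = E\left[\sum_{t=0}^{\infty} \gamma^{t} R(X_t, \pi(X_t), \phi(X_t)) \,\middle|\, X_0 = x\right],$$
where $X_t$ is a random variable denoting the state at time $t$ when following $\pi$ and $\phi$, and $\gamma \in (0, 1)$ is a fixed discount factor. We let $V^{\pi,\phi}(\delta) = \sum_{x \in X} \delta(x) V^{\pi,\phi}(x)$ for a given initial state distribution $\delta$ over $X$.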
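For a stationary pair, the objective value is the unique solution of the linear system (I - γ P_{π,φ}) V = r_{π,φ}, where P_{π,φ} and r_{π,φ} collect the mixed-action transitions and costs state by state. The sketch below assumes the same hypothetical list encoding as before and uses a small hand-written Gauss-Jordan solver to stay dependency-free; `evaluate` and `value_at` are illustrative names, not from the paper.

```python
from typing import List

# Hypothetical encoding: P[x][a][b][y] = P(y | x, a, b), R[x][a][b] = R(x, a, b),
# and pi[x], phi[x] are mixed actions (lists of probabilities) at state x.


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(P, R, pi, phi, gamma: float) -> List[float]:
    """Return V^{pi,phi}(x) for every x by solving (I - gamma * P_{pi,phi}) V = r_{pi,phi}."""
    n = len(P)
    A = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]
    r = [0.0] * n
    for x in range(n):
        for a, fa in enumerate(pi[x]):
            for b, gb in enumerate(phi[x]):
                w = fa * gb  # probability of the pure pair (a, b) at x
                r[x] += w * R[x][a][b]
                for y in range(n):
                    A[x][y] -= gamma * w * P[x][a][b][y]
    return solve_linear(A, r)


def value_at(delta: List[float], V: List[float]) -> float:
    """V^{pi,phi}(delta) = sum_x delta(x) V^{pi,phi}(x)."""
    return sum(d * v for d, v in zip(delta, V))


# Toy chain: state 0 costs 1 and moves to the absorbing state 1, which costs 0.
P = [[[[0.0, 1.0]]], [[[0.0, 1.0]]]]
R = [[[1.0]], [[0.0]]]
V = evaluate(P, R, pi=[[1.0], [1.0]], phi=[[1.0], [1.0]], gamma=0.9)
print(V, value_at([0.5, 0.5], V))
```

An exact linear solve is a reasonable choice for small state sets; for larger ones, successive approximation of the same fixed-point equation is the usual alternative.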

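The MG $M$ is associated with constraint functions $q_i: X \to \mathbb{R}$, $i = 1, 2$, and constraint-cost functions $C_i: X \times A \times B \to \mathbb{R}$, $i = 1, 2$, where $C_i(x, a, b)$, with $x \in X$, $a \in A(x)$, and $b \in B(x)$, is a constraint cost paid by the minimizer if $i = 1$ and by the maximizer if $i = 2$. (For simplicity, we consider the model in [1, 2] with only one side constraint for each player.)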

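A policy $\pi$ in $\Pi$ is called $\delta$-feasible with respect to $\phi$ in $\Phi$ if the pair of $\pi$ and $\phi$ satisfies the constraint inequality $J_1^{\pi,\phi}(\delta) \le q_1(\delta)$, where $J_1^{\pi,\phi}(\delta)$ and $q_1(\delta)$ are defined with $\delta$ such that $J_1^{\pi,\phi}(\delta) = \sum_{x \in X} \delta(x) J_1^{\pi,\phi}(x)$ and $q_1(\delta) = \sum_{x \in X} \delta(x) q_1(x)$. The expected constraint cost is given by
$$J_1^{\pi,\phi}(x) = E\left[\sum_{t=0}^{\infty} \gamma^{t} C_1(X_t, \pi(X_t), \phi(X_t)) \,\middle|\, X_0 = x\right], \quad x \in X,$$
where $C_1(x, f, g)$ is obtained from $C_1(x, a, b)$ in the same way as $R(x, f, g)$ is obtained from $R(x, a, b)$. Similarly, $\phi$ in $\Phi$ is $\delta$-feasible with respect to $\pi$ in $\Pi$ if $J_2^{\pi,\phi}(\delta) \le q_2(\delta)$, where $J_2^{\pi,\phi}$ is defined with $C_2$ and $q_2(\delta)$ with $q_2$. We say that $\pi$ is feasible with respect to $\phi$ if $J_1^{\pi,\phi}(x) \le q_1(x)$ for all $x$ in $X$, and $\phi$ is feasible with respect to $\pi$ if $J_2^{\pi,\phi}(x) \le q_2(x)$ for all $x$ in $X$. Note that if $\pi$ is feasible with respect to $\phi$, then $\pi$ is $\delta$-feasible with respect to $\phi$ for any $\delta$, because $\delta(x) \ge 0$ for all $x$.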
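The difference between feasibility (a state-wise bound) and δ-feasibility (a δ-averaged bound) is easy to see numerically. This sketch assumes the same hypothetical list encoding; `constraint_cost` approximates J_1^{π,φ} by successive approximation (the map is a γ-contraction), and `is_feasible` and `is_delta_feasible` are illustrative names. The toy data give a policy that is δ-feasible for δ = (0.5, 0.5) but not feasible.

```python
from typing import List

# Hypothetical encoding: P[x][a][b][y], constraint costs C[x][a][b], thresholds q[x],
# and mixed actions pi[x], phi[x].


def constraint_cost(P, C, pi, phi, gamma: float, tol: float = 1e-12) -> List[float]:
    """Approximate J^{pi,phi} by iterating J <- c_{pi,phi} + gamma * P_{pi,phi} J (a gamma-contraction)."""
    n = len(P)
    J = [0.0] * n
    while True:
        new = [
            sum(fa * gb * (C[x][a][b] + gamma * sum(P[x][a][b][y] * J[y] for y in range(n)))
                for a, fa in enumerate(pi[x]) for b, gb in enumerate(phi[x]))
            for x in range(n)
        ]
        if max(abs(s - t) for s, t in zip(new, J)) < tol:
            return new
        J = new


def is_feasible(J: List[float], q: List[float], eps: float = 1e-9) -> bool:
    """Feasible: J(x) <= q(x) at every state x."""
    return all(j <= qx + eps for j, qx in zip(J, q))


def is_delta_feasible(J: List[float], q: List[float], delta: List[float], eps: float = 1e-9) -> bool:
    """delta-feasible: sum_x delta(x) J(x) <= sum_x delta(x) q(x)."""
    lhs = sum(d * j for d, j in zip(delta, J))
    rhs = sum(d * qx for d, qx in zip(delta, q))
    return lhs <= rhs + eps


# Toy chain: constraint cost 1 at state 0, which moves to the absorbing state 1 (cost 0).
P = [[[[0.0, 1.0]]], [[[0.0, 1.0]]]]
C1 = [[[1.0]], [[0.0]]]
J1 = constraint_cost(P, C1, [[1.0], [1.0]], [[1.0], [1.0]], gamma=0.9)
q1 = [0.5, 1.0]
print(is_feasible(J1, q1), is_delta_feasible(J1, q1, [0.5, 0.5]))  # False True
```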

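Let
$$\Pi_\delta(\phi) = \{\pi \in \Pi : J_1^{\pi,\phi}(\delta) \le q_1(\delta)\}.$$
That is, $\Pi_\delta(\phi)$ is the set of all $\delta$-feasible policies in $\Pi$ with respect to $\phi$ (when the maximizer's policy is fixed by $\phi$); similarly, $\Phi_\delta(\pi) = \{\phi \in \Phi : J_2^{\pi,\phi}(\delta) \le q_2(\delta)\}$ is obtained when the minimizer's policy is fixed by $\pi$. Then $M$ has a constrained Nash equilibrium (CNE) if there exists a pair of $\pi^*$ in $\Pi$ and $\phi^*$ in $\Phi$ such that $\pi^*$ is $\delta$-feasible with respect to $\phi^*$, $\phi^*$ is $\delta$-feasible with respect to $\pi^*$, and
$$V^{\pi^*,\phi}(\delta) \le V^{\pi^*,\phi^*}(\delta) \le V^{\pi,\phi^*}(\delta) \quad \text{for all } \pi \in \Pi_\delta(\phi^*) \text{ and } \phi \in \Phi_\delta(\pi^*).$$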

3. A Necessary Condition for CNE

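Let $\mathcal{B}(X)$ be the set of all real-valued functions on $X$. Given $\phi$ in $\Phi$ and $u$ in $\mathcal{B}(X)$, define the $u$-constrained feasible mixed-action set with $\phi$ for the minimizer: for all $x$ in $X$,
$$F_u^{\phi}(x) = \Big\{ f \in \Delta(A(x)) : C_1(x, f, \phi(x)) + \gamma \sum_{y \in X} P(y \mid x, f, \phi(x))\, u(y) \le u(x) \Big\}.$$
We further define the $v$-constrained feasible mixed-action set with $\pi$ for the maximizer, for $\pi$ in $\Pi$ and $v$ in $\mathcal{B}(X)$: for all $x$ in $X$,
$$G_v^{\pi}(x) = \Big\{ g \in \Delta(B(x)) : C_2(x, \pi(x), g) + \gamma \sum_{y \in X} P(y \mid x, \pi(x), g)\, v(y) \le v(x) \Big\}.$$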
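Membership in these constrained sets is a one-step test at a single state. The sketch below assumes F_u^φ(x) collects the mixed actions f with C_1(x, f, φ(x)) + γ Σ_y P(y | x, f, φ(x)) u(y) ≤ u(x), and G_v^π(x) the analogous maximizer actions with C_2 and v; the encoding, the helper names `one_step`, `in_F`, `in_G`, and the toy data are hypothetical. Because the left-hand side is linear in f for fixed φ(x), each set is a convex polytope inside the simplex.

```python
from typing import List

# Hypothetical encoding: P[x][a][b][y] = P(y | x, a, b), C[x][a][b] = constraint cost.


def one_step(P, C, x: int, f: List[float], g: List[float], w: List[float], gamma: float) -> float:
    """C(x, f, g) + gamma * sum_y P(y | x, f, g) w(y) for mixed actions f, g at x."""
    n = len(P)
    return sum(
        fa * gb * (C[x][a][b] + gamma * sum(P[x][a][b][y] * w[y] for y in range(n)))
        for a, fa in enumerate(f)
        for b, gb in enumerate(g)
    )


def in_F(P, C1, x, f, phi_x, u, gamma, eps=1e-12) -> bool:
    """Is the minimizer's mixed action f in F_u^phi(x), given phi(x) = phi_x?"""
    return one_step(P, C1, x, f, phi_x, u, gamma) <= u[x] + eps


def in_G(P, C2, x, pi_x, g, v, gamma, eps=1e-12) -> bool:
    """Is the maximizer's mixed action g in G_v^pi(x), given pi(x) = pi_x?"""
    return one_step(P, C2, x, pi_x, g, v, gamma) <= v[x] + eps


# One absorbing state; the minimizer has two actions, the maximizer one.
P = [[[[1.0]], [[1.0]]]]
C1 = [[[1.0], [0.0]]]
u = [1.0]
for f in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    print(f, in_F(P, C1, 0, f, [1.0], u, gamma=0.5))
```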

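Lemma 1. Given $\phi$ in $\Phi$ and $u$ in $\mathcal{B}(X)$, suppose that $u(x) \le q_1(x)$ for all $x$ in $X$. Then any $\pi$ in $\Pi$ such that $\pi(x) \in F_u^{\phi}(x)$ for all $x$ in $X$ is feasible with respect to $\phi$. Similarly, given $\pi$ in $\Pi$ and $v$ in $\mathcal{B}(X)$, if $v(x) \le q_2(x)$ for all $x$ in $X$, then any $\phi$ in $\Phi$ such that $\phi(x) \in G_v^{\pi}(x)$ for all $x$ in $X$ is feasible with respect to $\pi$.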

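Proof. Consider the operator $T_{\pi,\phi}: \mathcal{B}(X) \to \mathcal{B}(X)$ given as
$$T_{\pi,\phi}(u)(x) = C_1(x, \pi(x), \phi(x)) + \gamma \sum_{y \in X} P(y \mid x, \pi(x), \phi(x))\, u(y), \quad x \in X.$$
For $\pi$ in $\Pi$ and $u$ in $\mathcal{B}(X)$, if $T_{\pi,\phi}(u)(x) \le u(x)$ for all $x$ in $X$, then, by the monotonicity of $T_{\pi,\phi}$, $T_{\pi,\phi}^{n}(u)(x) \le u(x)$ for all $n \ge 1$ and all $x \in X$, and $\lim_{n \to \infty} T_{\pi,\phi}^{n}(u)(x) = J_1^{\pi,\phi}(x)$ [6].
Because $\pi(x)$ is in $F_u^{\phi}(x)$ for all $x$ in $X$, $T_{\pi,\phi}(u)(x) \le u(x)$ for all $x$ in $X$. With $u \le q_1$, we then have that, for all $x$ in $X$,
$$J_1^{\pi,\phi}(x) = \lim_{n \to \infty} T_{\pi,\phi}^{n}(u)(x) \le u(x) \le q_1(x).$$
The second statement can be proven by symmetric reasoning.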
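The key step of the proof is that any u with T(u) ≤ u pointwise, where T(u)(x) = C_1(x, π(x), φ(x)) + γ Σ_y P(y | x, π(x), φ(x)) u(y), is an upper bound on J_1^{π,φ}, because the iterates T^n(u) decrease monotonically to J_1^{π,φ}. This sketch (hypothetical encoding and names) applies the operator repeatedly, raises an error if the monotone decrease ever fails, and returns the last iterate; for the toy chain it approaches J_1 = (1, 0) from u = (2, 0.5).

```python
from typing import List

# Hypothetical encoding as before: P[x][a][b][y], C[x][a][b], mixed actions pi[x], phi[x].


def apply_T(P, C, pi, phi, gamma: float, u: List[float]) -> List[float]:
    """T(u)(x) = C(x, pi(x), phi(x)) + gamma * sum_y P(y | x, pi(x), phi(x)) u(y)."""
    n = len(u)
    return [
        sum(fa * gb * (C[x][a][b] + gamma * sum(P[x][a][b][y] * u[y] for y in range(n)))
            for a, fa in enumerate(pi[x]) for b, gb in enumerate(phi[x]))
        for x in range(n)
    ]


def iterate_from_supersolution(P, C, pi, phi, gamma, u, n_iter=500, eps=1e-12) -> List[float]:
    """Apply T n_iter times starting from u, checking T^{k+1}(u) <= T^k(u) at every step."""
    current = list(u)
    for k in range(n_iter):
        nxt = apply_T(P, C, pi, phi, gamma, current)
        if any(s > c + eps for s, c in zip(nxt, current)):
            raise ValueError(f"monotone decrease fails at step {k}; T(u) <= u may not hold")
        current = nxt
    return current


# Toy chain (J_1 = (1, 0)) and a starting u with T(u) = (1.45, 0.45) <= u = (2, 0.5).
P = [[[[0.0, 1.0]]], [[[0.0, 1.0]]]]
C1 = [[[1.0]], [[0.0]]]
pi = phi = [[1.0], [1.0]]
limit = iterate_from_supersolution(P, C1, pi, phi, 0.9, [2.0, 0.5])
print(limit)  # close to [1.0, 0.0]
```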

Some notable examples of choosing in for and are , , , and , for some pair of and , and the zero function such that for all in . In particular, if for all , then .

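We now let
$$\Pi_u(\phi) = \{\pi \in \Pi : \pi(x) \in F_u^{\phi}(x) \text{ for all } x \in X\}, \qquad \Phi_v(\pi) = \{\phi \in \Phi : \phi(x) \in G_v^{\pi}(x) \text{ for all } x \in X\}.$$
If $F_u^{\phi}(x) = \emptyset$ for some $x$ in $X$, then $\Pi_u(\phi) = \emptyset$, and similarly $\Phi_v(\pi) = \emptyset$ if $G_v^{\pi}(x) = \emptyset$ for some $x$ in $X$. The following theorem characterizes the set of all feasible policies of one player with respect to a fixed policy of the other player.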

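Theorem 2. Given $\phi$ in $\Phi$, $\pi$ in $\Pi$ is feasible with respect to $\phi$ if and only if $\pi$ is in $\bigcup_{u \in \mathcal{B}(X),\, u \le q_1} \Pi_u(\phi)$. Furthermore, given $\pi$ in $\Pi$, $\phi$ in $\Phi$ is feasible with respect to $\pi$ if and only if $\phi$ is in $\bigcup_{v \in \mathcal{B}(X),\, v \le q_2} \Phi_v(\pi)$. (Inequalities between functions, such as $u \le q_1$, hold pointwise on $X$.)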

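Proof. We prove only the first part of the statement, for the minimizer. Suppose that $\pi$ is feasible with respect to $\phi$. Then, by setting $u = J_1^{\pi,\phi}$, we have $T_{\pi,\phi}(u) = u$, because $J_1^{\pi,\phi}$ is the fixed point of $T_{\pi,\phi}$. It follows that $\pi(x) \in F_u^{\phi}(x)$ for all $x$ in $X$, and, since $u = J_1^{\pi,\phi} \le q_1$ by feasibility, there exists $u$ in $\mathcal{B}(X)$ with $u \le q_1$ such that $\pi(x)$ is in $F_u^{\phi}(x)$ for all $x$ in $X$; that is, $\pi \in \Pi_u(\phi)$.
For the other direction, if $\pi$ is in $\bigcup_{u \in \mathcal{B}(X),\, u \le q_1} \Pi_u(\phi)$, then, for some $u \le q_1$, $\pi(x)$ is in $F_u^{\phi}(x)$ for all $x$ in $X$. By Lemma 1, $\pi$ is feasible with respect to $\phi$.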
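The first direction of this proof is constructive: for a feasible π, the function u = J_1^{π,φ} itself certifies membership, because it is the fixed point of the one-step operator and satisfies u ≤ q_1. The following sketch (hypothetical names and encoding) computes that candidate certificate by fixed-point iteration and returns it, or returns None when π is not feasible with respect to φ.

```python
from typing import List, Optional

# Hypothetical encoding: P[x][a][b][y], C1[x][a][b], thresholds q1[x], mixed actions pi[x], phi[x].


def apply_T(P, C, pi, phi, gamma: float, u: List[float]) -> List[float]:
    """One-step operator T(u)(x) = C(x, pi(x), phi(x)) + gamma * sum_y P(y | x, pi(x), phi(x)) u(y)."""
    n = len(u)
    return [
        sum(fa * gb * (C[x][a][b] + gamma * sum(P[x][a][b][y] * u[y] for y in range(n)))
            for a, fa in enumerate(pi[x]) for b, gb in enumerate(phi[x]))
        for x in range(n)
    ]


def feasibility_certificate(P, C1, q1, pi, phi, gamma, tol=1e-12, eps=1e-9) -> Optional[List[float]]:
    """Return u = J_1^{pi,phi} if u <= q1 (pi feasible w.r.t. phi), else None."""
    u = [0.0] * len(P)
    while True:  # fixed-point iteration; converges because T is a gamma-contraction
        nxt = apply_T(P, C1, pi, phi, gamma, u)
        converged = max(abs(s - t) for s, t in zip(nxt, u)) < tol
        u = nxt
        if converged:
            break
    if any(ux > qx + eps for ux, qx in zip(u, q1)):
        return None
    # u is (numerically) a fixed point of T, so pi(x) satisfies the F_u^phi(x) inequality at every x.
    assert all(t <= ux + eps for t, ux in zip(apply_T(P, C1, pi, phi, gamma, u), u))
    return u


P = [[[[0.0, 1.0]]], [[[0.0, 1.0]]]]
C1 = [[[1.0]], [[0.0]]]
pi = phi = [[1.0], [1.0]]
print(feasibility_certificate(P, C1, [1.0, 0.0], pi, phi, 0.9))  # certificate u = J_1
print(feasibility_certificate(P, C1, [0.5, 1.0], pi, phi, 0.9))  # None: not feasible
```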

Because a policy of one player that is feasible with respect to a policy of the other player is also $\delta$-feasible with respect to that policy, the following necessary condition satisfied by a CNE policy pair is immediate.

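Corollary 3. If a pair of $\pi^*$ in $\Pi$ and $\phi^*$ in $\Phi$ is a CNE pair for a given $\delta$, then the pair satisfies the following saddle-point inequality: for all $u, v$ in $\mathcal{B}(X)$ with $u \le q_1$ and $v \le q_2$,
$$V^{\pi^*,\phi}(\delta) \le V^{\pi^*,\phi^*}(\delta) \le V^{\pi,\phi^*}(\delta) \quad \text{for all } \pi \in \Pi_u(\phi^*) \text{ and } \phi \in \Phi_v(\pi^*).$$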

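Given a pair of $\pi$ in $\Pi$ and $\phi$ in $\Phi$, if we find some $u, v$ in $\mathcal{B}(X)$ with $u \le q_1$ and $v \le q_2$ such that the pair does not satisfy the saddle-point inequality over nonempty $\Pi_u(\phi)$ and $\Phi_v(\pi)$, then the pair is not a CNE pair.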
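This refutation test can be sketched for small games. The sketch is only a partial search under stated assumptions: it checks the minimizer side only (the maximizer side is symmetric), enumerates pure policies whose action at each x passes the F_u^{φ*}(x) test, and relies on the caller to choose u with u ≤ q_1. Returning a policy shows the candidate pair is not a CNE; returning None proves nothing, since mixed policies in the restricted set are not searched. All names and the encoding are hypothetical.

```python
import itertools
from typing import List, Optional


def evaluate(P, R, pi, phi, gamma, tol=1e-12) -> List[float]:
    """Successive approximation of V = r_{pi,phi} + gamma * P_{pi,phi} V."""
    n = len(P)
    V = [0.0] * n
    while True:
        new = [
            sum(fa * gb * (R[x][a][b] + gamma * sum(P[x][a][b][y] * V[y] for y in range(n)))
                for a, fa in enumerate(pi[x]) for b, gb in enumerate(phi[x]))
            for x in range(n)
        ]
        if max(abs(s - t) for s, t in zip(new, V)) < tol:
            return new
        V = new


def in_F(P, C1, x, f, g, u, gamma, eps=1e-12) -> bool:
    """Membership of f in F_u^phi(x), where g = phi(x)."""
    n = len(P)
    lhs = sum(fa * gb * (C1[x][a][b] + gamma * sum(P[x][a][b][y] * u[y] for y in range(n)))
              for a, fa in enumerate(f) for b, gb in enumerate(g))
    return lhs <= u[x] + eps


def find_minimizer_deviation(P, R, C1, u, pi_star, phi_star, delta, gamma, eps=1e-9) -> Optional[list]:
    """Search pure policies in Pi_u(phi*) with V^{pi,phi*}(delta) < V^{pi*,phi*}(delta)."""
    base = sum(d * v for d, v in zip(delta, evaluate(P, R, pi_star, phi_star, gamma)))
    choices = []
    for x in range(len(P)):
        n_a = len(P[x])
        pure = [[1.0 if i == a else 0.0 for i in range(n_a)] for a in range(n_a)]
        allowed = [f for f in pure if in_F(P, C1, x, f, phi_star[x], u, gamma)]
        if not allowed:
            return None  # no pure policy in Pi_u(phi*) to test against
        choices.append(allowed)
    for pi in itertools.product(*choices):
        val = sum(d * v for d, v in zip(delta, evaluate(P, R, list(pi), phi_star, gamma)))
        if val < base - eps:
            return list(pi)  # violates the right-hand saddle-point inequality
    return None


# One absorbing state; minimizer action 1 is cheaper in R but uses the constraint budget.
P = [[[[1.0]], [[1.0]]]]
R = [[[1.0], [0.0]]]
C1 = [[[0.0], [1.0]]]
print(find_minimizer_deviation(P, R, C1, [2.0], [[1.0, 0.0]], [[1.0]], [1.0], 0.5))  # [[0.0, 1.0]]
print(find_minimizer_deviation(P, R, C1, [1.5], [[1.0, 0.0]], [[1.0]], [1.0], 0.5))  # None
```

With u = (2.0,) (and, say, q_1 = (2.0,)), both pure actions lie in the restricted set and action 1 refutes the candidate; with the tighter u = (1.5,), the test is inconclusive.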

4. An Example of Approximation Scheme for CNE

We now give an example of a general approximation scheme for CNE based on the necessary condition. Basically, we fix some $u, v$ in $\mathcal{B}(X)$ and try to find an equilibrium policy pair that satisfies the saddle-point inequality over subsets of the feasible policy spaces induced by the selected $u$ and $v$; such a pair is an approximate CNE pair.

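We start by selecting arbitrary $u, v$ in $\mathcal{B}(X)$ with $u \le q_1$ and $v \le q_2$ (so that Lemma 1 applies) and define the feasible mixed joint-action sets induced by $u$ and $v$: for all $x \in X$,
$$H_{u,v}(x) = \Big\{ (f, g) \in \Delta(A(x)) \times \Delta(B(x)) : C_1(x, f, g) + \gamma \sum_{y \in X} P(y \mid x, f, g)\, u(y) \le u(x),\ C_2(x, f, g) + \gamma \sum_{y \in X} P(y \mid x, f, g)\, v(y) \le v(x) \Big\}.$$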

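Let
$$W_{u,v}(x) = \{ (a, b) \in A(x) \times B(x) : (e_a, e_b) \in H_{u,v}(x) \},$$
where $e_a$ denotes the mixed action with $e_a(a) = 1$ and $e_b$ the one with $e_b(b) = 1$.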

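Assume that $W_{u,v}(x) \neq \emptyset$ for all $x$ in $X$. We then obtain a pair of nonempty $A'(x) \subseteq A(x)$ and $B'(x) \subseteq B(x)$ such that $A'(x) \times B'(x) \subseteq W_{u,v}(x)$.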
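For small action sets, the set of pure pairs (a, b) at a state x whose point masses satisfy both one-step inequalities (with u and v), and a rectangle A' × B' of such pairs, can be computed directly. The sketch assumes the hypothetical list encoding used earlier and picks the rectangle by exhaustive search maximizing |A'|·|B'|. That objective is our own assumption (any nonempty rectangle would serve), and the search is exponential in the number of minimizer actions.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encoding: P[x][a][b][y], C1[x][a][b], C2[x][a][b]; u, v are lists over states.


def pure_pair_set(P, C1, C2, u, v, x: int, gamma: float, eps: float = 1e-12) -> Set[Tuple[int, int]]:
    """Pure pairs (a, b) at x whose point masses satisfy both one-step inequalities."""
    n = len(P)
    W = set()
    for a in range(len(P[x])):
        for b in range(len(P[x][a])):
            nxt = P[x][a][b]
            lhs1 = C1[x][a][b] + gamma * sum(nxt[y] * u[y] for y in range(n))
            lhs2 = C2[x][a][b] + gamma * sum(nxt[y] * v[y] for y in range(n))
            if lhs1 <= u[x] + eps and lhs2 <= v[x] + eps:
                W.add((a, b))
    return W


def max_biclique(W: Set[Tuple[int, int]], A: List[int], B: List[int]) -> Tuple[FrozenSet[int], FrozenSet[int]]:
    """Exhaustive search for A' x B' inside W maximizing |A'| * |B'| (exponential in |A|)."""
    best: Tuple[FrozenSet[int], FrozenSet[int]] = (frozenset(), frozenset())
    best_size = 0
    for k in range(1, len(A) + 1):
        for A_sub in combinations(A, k):
            # Largest B' compatible with this A': the common neighbors of A_sub in W.
            B_sub = frozenset(b for b in B if all((a, b) in W for a in A_sub))
            if len(A_sub) * len(B_sub) > best_size:
                best, best_size = (frozenset(A_sub), B_sub), len(A_sub) * len(B_sub)
    return best


# One absorbing state, 3 minimizer actions, 2 maximizer actions, gamma = 0.5, u = v = (2.0,).
P = [[[[1.0], [1.0]], [[1.0], [1.0]], [[1.0], [1.0]]]]
C1 = [[[0.0, 0.5], [1.0, 0.0], [1.5, 0.0]]]
C2 = [[[0.0, 0.0], [0.5, 1.0], [0.0, 0.0]]]
W0 = pure_pair_set(P, C1, C2, [2.0], [2.0], 0, 0.5)
print(sorted(W0), max_biclique(W0, [0, 1, 2], [0, 1]))
```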

Let and similarly That is, is a -constrained feasible pure-action set with at for the minimizer and similarly is for the maximizer. For any subset , with , we denote to be the set of all possible probability distributions over that have zero probabilities for the actions in . If , then . The notation of for is similarly denoted. (Note that for all in in general.) We further let If for some in , , and similarly if for some in .

By construction, the following result is immediate. For all , which further implies that, for any such that for all in and for all in , is feasible with respect to and is feasible with respect to .
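The construction works because both one-step inequalities are bilinear in (f, g). For instance, C_1(x, f, g) + γ Σ_y P(y | x, f, g) u(y) = Σ_a Σ_b f(a) g(b) L(a, b) with L(a, b) = C_1(x, a, b) + γ Σ_y P(y | x, a, b) u(y), so if L(a, b) ≤ u(x) for every pair in a rectangle A' × B', the same bound holds for every mixture supported on A' × B'. This hypothetical sketch checks that numerically on a made-up table of L-values; a mixture that puts weight on an action outside the rectangle can break the bound.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical pure-pair values L(a, b) = C_1(x, a, b) + gamma * sum_y P(y | x, a, b) u(y)
# at one state x, with threshold u(x) = 2.0. Every pair in {0, 1} x {0, 1} satisfies L <= 2.0;
# minimizer action 2 fails the bound when paired with b = 0.
L: Dict[Tuple[int, int], float] = {
    (0, 0): 1.0, (0, 1): 1.5,
    (1, 0): 2.0, (1, 1): 1.2,
    (2, 0): 3.0, (2, 1): 0.5,
}
U_X = 2.0


def bilinear(L: Dict[Tuple[int, int], float], f: List[float], g: List[float]) -> float:
    """sum_a sum_b f(a) g(b) L(a, b): the one-step left-hand side at the mixed pair (f, g)."""
    return sum(f[a] * g[b] * val for (a, b), val in L.items())


def random_mixture(n: int, support: List[int], rng: random.Random) -> List[float]:
    """A random probability vector of length n that is zero outside `support`."""
    w = [rng.uniform(0.1, 1.0) if i in support else 0.0 for i in range(n)]
    total = sum(w)
    return [wi / total for wi in w]


rng = random.Random(0)
worst = max(
    bilinear(L, random_mixture(3, [0, 1], rng), random_mixture(2, [0, 1], rng))
    for _ in range(1000)
)
print(worst <= U_X + 1e-12)                      # mixtures on the rectangle stay within u(x)
print(bilinear(L, [0.0, 0.0, 1.0], [1.0, 0.0]))  # 3.0: leaving the rectangle can violate it
```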

Consider now the unconstrained game , where and are evaluated only at in and in for all in , and denote the set of all NE policy pairs of to be . Together, the two results above imply that any  is a local CNE for . In other words, for any , is -feasible with respect to and is -feasible with respect to , and and , and That is, the local CNE pair satisfies all of the conditions of CNE except that the saddle-point inequality is satisfied only locally, over the subsets of and . In fact, related solution concepts for games that are resistant to local deviations, called “local NE,” have already been established in economics (see, e.g., [7]).

Projecting $W_{u,v}(x)$ into the two sets $A'(x)$ and $B'(x)$ turns out to be equivalent to finding a complete bipartite subgraph, or biclique, in the bipartite graph with vertex sets $A(x)$ and $B(x)$ and edge set $W_{u,v}(x)$. The problem of finding a biclique in a given bipartite graph is well studied in the graph theory literature (see, e.g., [8]). Another issue is how to set $u$ and $v$ such that $W_{u,v}(x)$ is nonempty for all $x$ in $X$. If there exists a pure-policy pair of $\pi$ in $\Pi$ and $\phi$ in $\Phi$ such that, for all $x$ in $X$, $\pi(x) = e_a$ for some $a \in A(x)$ and $\phi(x) = e_b$ for some $b \in B(x)$, and each policy is feasible with respect to the other, then by setting $u = J_1^{\pi,\phi}$ and $v = J_2^{\pi,\phi}$, we have $(\pi(x), \phi(x)) \in H_{u,v}(x)$ for all $x$ in $X$, making $W_{u,v}(x) \neq \emptyset$ for all $x$ in $X$. We can put the following feasibility assumption on $M$ to assure the existence of such a pure-policy pair: for all $x$ in $X$,

In other words, by this assumption there exists at least one pure-policy pair such that one policy is feasible with respect to the other policy. The existence comes from the fact that there exists a pure-policy pair that achieves for all in in solving this Markov decision process problem [9] and also for the case of for all in .
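A pure pair that minimizes the constraint cost J_1 at every state can be found by treating (a, b) as a single joint action and solving the resulting discounted Markov decision problem. The sketch below uses value iteration with a greedy pure choice; the encoding and names are hypothetical, and it handles J_1 only, so the resulting pair must still be checked against q_1, and the analogous problem must be solved for J_2.

```python
from typing import List, Tuple

# Hypothetical encoding: P[x][a][b][y], C[x][a][b]. The joint pair (a, b) is treated as a
# single action of one decision maker, so this is an ordinary discounted MDP.


def min_constraint_pure_pair(P, C, gamma: float, tol: float = 1e-12) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Value iteration for J*(x) = min_{a,b} [C(x,a,b) + gamma * sum_y P(y|x,a,b) J*(y)].

    Returns the approximate optimal values and a greedy pure pair (a, b) for each state.
    """
    n = len(P)
    J = [0.0] * n
    while True:
        new: List[float] = []
        choice: List[Tuple[int, int]] = []
        for x in range(n):
            best_val, best_pair = float("inf"), (0, 0)
            for a in range(len(P[x])):
                for b in range(len(P[x][a])):
                    val = C[x][a][b] + gamma * sum(P[x][a][b][y] * J[y] for y in range(n))
                    if val < best_val:
                        best_val, best_pair = val, (a, b)
            new.append(best_val)
            choice.append(best_pair)
        if max(abs(s - t) for s, t in zip(new, J)) < tol:
            return new, choice
        J = new


# State 0: staying costs 1 per step; moving to the absorbing state 1 costs 2 once.
P = [
    [[[1.0, 0.0]], [[0.0, 1.0]]],
    [[[0.0, 1.0]]],
]
C1 = [[[1.0], [2.0]], [[0.0]]]
J_star, pair = min_constraint_pure_pair(P, C1, gamma=0.9)
print(J_star, pair)  # [2.0, 0.0] [(1, 0), (0, 0)]
```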

To illustrate how the above method works, we consider the following simple MG , where , , , and , , , and are given in terms of a matrix form, respectively: , , , and ; , ; and , , , and . In this matrix form, for example, the th entries of and refer to and , respectively. The other parameters are given such that , , , and .

We first observe that, for the pure-policy pair , one policy is feasible with respect to the other policy. The notation of here refers to and . For simplicity, we write as for the distribution concentrated on the action . (Note that we can obtain another such pure-policy pair which achieves , .)

Now, by setting , , we first obtain ,  for  all , thereby having the sets of and ,  for  all  . We solve the unconstrained two-person zero-sum MG , obtaining a pure NE policy pair of for , which is then a local CNE for .

5. Concluding Remark

For simplicity, the model in the present note has only one side constraint for each player. Generalizing it to multiple side constraints per player would make the definitions more complex, but the ideas would remain the same as in the one-side-constraint case.

Acknowledgment

The author is with the Department of Computer Science and Engineering at Sogang University, Seoul, Republic of Korea, 121-742.