
Journal of Applied Mathematics

Volume 2014 (2014), Article ID 453168, 9 pages

http://dx.doi.org/10.1155/2014/453168

## Learning in General Games with Nature’s Moves

Kedge Business School, Domaine de Luminy, BP 921, 13 288 Marseille Cedex 9, France

Received 10 October 2013; Accepted 21 December 2013; Published 19 January 2014

Academic Editor: Takashi Matsuhisa

Copyright © 2014 Patrick L. Leoni. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper investigates simultaneous learning about both nature and others’ actions in repeated games and identifies a set of sufficient conditions under which Harsanyi’s doctrine holds. Players have utility functions over infinite histories that are continuous for the sup-norm topology. Nature’s draw after any history may depend on any past actions. Provided that (1) every player maximizes her expected payoff against her own beliefs, (2) every player updates her beliefs in a Bayesian manner, (3) prior beliefs about both nature and other players’ strategies have a grain of truth, and (4) beliefs about nature are independent of actions chosen during the game, we construct a Nash equilibrium that is realization-equivalent to the actual plays and in which Harsanyi’s doctrine holds. Those assumptions are shown to be tight.

#### 1. Introduction

Consider a finite number of agents interacting simultaneously. Every agent possibly plays an infinite number of times, and her payoff depends on the joint choice of actions as well as events beyond agents’ control (called *choices of nature*). To analyze such interactions, it is always assumed that the players share a common prior about the probability distribution of nature. Such an approach is known as *Harsanyi’s doctrine*, introduced in Harsanyi [1].

We provide a learning foundation for this doctrine. We consider a class of games where nature’s choices may or may not depend on past actions by players, and payoff functions are continuous for the sup-norm topology over the set of infinite histories. Provided that Bayesian players have a grain of truth, we show that resulting outcomes converge for the sup-norm topology to a Nash equilibrium that we construct, in which Harsanyi’s doctrine holds.

Kalai and Lehrer [2, 3] consider a similar type of learning model, without choices of nature and with a weaker notion of convergence that significantly restricts their class of games. More precisely, those references use a notion of closeness that is not a topology, in sharp contrast with the general sup-norm topology. The notion in Kalai and Lehrer states that, for any two probability measures $\mu$ and $\tilde{\mu}$ and for some small real $\varepsilon > 0$, the measure $\mu$ is $\varepsilon$-close to $\tilde{\mu}$ if two conditions hold. There must exist a measurable set $Q$ that is assigned a measure of at least $1-\varepsilon$ both by $\mu$ and by $\tilde{\mu}$, and for any measurable set $A \subseteq Q$ it must be true that
$$(1-\varepsilon)\,\tilde{\mu}(A) \le \mu(A) \le (1+\varepsilon)\,\tilde{\mu}(A).$$
A given strategy profile $f$ with associated probability measure $\mu_f$ is then said to play $\varepsilon$-like another strategy profile $g$ with associated probability measure $\mu_g$ if $\mu_f$ is $\varepsilon$-close to $\mu_g$. The main problem with this concept is the lack of symmetry; that is, if $\mu$ is $\varepsilon$-close to $\tilde{\mu}$, then $\tilde{\mu}$ is not $\varepsilon$-close to $\mu$ in general. This issue excludes any form of topology of convergence, and restricting the set $Q$ of infinite play paths on which the ratio bounds must hold severely restricts the class of games under consideration.
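As a minimal illustration, assume a finite outcome space and the Kalai-Lehrer notion in the following form: $\mu$ is $\varepsilon$-close to $\tilde{\mu}$ if some set $Q$ has mass at least $1-\varepsilon$ under both measures and $(1-\varepsilon)\tilde{\mu}(A) \le \mu(A) \le (1+\varepsilon)\tilde{\mu}(A)$ for every $A \subseteq Q$. On a finite space the subset condition reduces to singletons, so the largest admissible $Q$ can be computed directly. The Python sketch below (function names and the two-point example are hypothetical) shows that the relation can hold in one direction but not the other, while the sup-norm distance is symmetric.

```python
from typing import Dict

Measure = Dict[str, float]  # finite measure: outcome -> mass


def is_eps_close(mu: Measure, mu_tilde: Measure, eps: float) -> bool:
    """Kalai-Lehrer style closeness of mu to mu_tilde on a finite space.

    Checking singletons suffices: if every point of Q satisfies the two-sided
    ratio bound, so does every subset of Q (by additivity). The largest valid
    Q is therefore the set of points passing the check.
    """
    outcomes = set(mu) | set(mu_tilde)
    q = [
        x for x in outcomes
        if (1 - eps) * mu_tilde.get(x, 0.0) <= mu.get(x, 0.0) <= (1 + eps) * mu_tilde.get(x, 0.0)
    ]
    mass_mu = sum(mu.get(x, 0.0) for x in q)
    mass_tilde = sum(mu_tilde.get(x, 0.0) for x in q)
    return mass_mu >= 1 - eps and mass_tilde >= 1 - eps


def sup_norm(mu: Measure, mu_tilde: Measure) -> float:
    """sup over all events A of |mu(A) - mu_tilde(A)| on a finite space."""
    outcomes = set(mu) | set(mu_tilde)
    pos = sum(max(mu.get(x, 0.0) - mu_tilde.get(x, 0.0), 0.0) for x in outcomes)
    neg = sum(max(mu_tilde.get(x, 0.0) - mu.get(x, 0.0), 0.0) for x in outcomes)
    return max(pos, neg)


mu = {"a": 0.905, "b": 0.095}
mu_tilde = {"a": 0.998, "b": 0.002}

print(is_eps_close(mu, mu_tilde, 0.1))  # True
print(is_eps_close(mu_tilde, mu, 0.1))  # False: the relation is not symmetric
print(sup_norm(mu, mu_tilde) == sup_norm(mu_tilde, mu))  # True
```

The asymmetry comes from the ratio bounds: $\mu(a)/\tilde{\mu}(a)$ lies within $[1-\varepsilon, 1+\varepsilon]$, but the reverse ratio $\tilde{\mu}(a)/\mu(a) \approx 1.103$ exceeds $1+\varepsilon$.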

To derive a convergence result for the sup-norm topology with choices of nature, an approach significantly different from that in Kalai and Lehrer [2, 3] is needed. We explicitly construct the Nash equilibrium toward which convergence occurs and in which Harsanyi’s doctrine holds. This (almost) Nash equilibrium is as follows:

1. along every equilibrium play path, every player chooses the (possibly randomized) action that maximizes her subjective expected payoff against her beliefs on others’ strategies;
2. in case of unilateral deviation, every player plays the strategy described in the beliefs of the deviator;
3. in case of multilateral deviation, strategies are defined arbitrarily.

Continuity for the sup-norm topology allows for a form of control over future payoffs. We use this property to approximate resulting plays by a Nash equilibrium of a game with a finite number of histories derived from the original game (the analogue of truncated games in our setting). This approximation is a direct consequence of continuity for the sup-norm topology, and the global approximation by a Nash equilibrium of the whole game also follows from the control over subsequent payoffs that continuity provides. The resulting equilibrium is also proven to satisfy Harsanyi’s doctrine.

The proof requires the introduction of the concept of *stochastic subjective equilibrium*, which generalizes the concept of *subjective equilibrium* (see Battigalli et al. [4] for a history and a discussion of this concept), as well as the concept of *self-confirming equilibrium* (see [5]). We show that, in finite time and for almost every infinite history, players’ behaviors become identical to behaviors described in a stochastic subjective equilibrium. Given the properties of (stochastic) subjective equilibria discussed in the above references, this last result provides a second decision-theoretic foundation for the concept of Nash equilibrium. We then show that any stochastic subjective equilibrium is realization-equivalent to the strategy profile above, which is an almost-Nash equilibrium for accurate enough beliefs and in which Harsanyi’s doctrine holds. The proof of this last statement uses the continuity of the utility functions to ensure convergence of payoffs as beliefs become correct. Finally, eventual correctness of beliefs follows from the grain of truth assumption, together with the Blackwell-Dubins theorem (see [6]).

The literature typically deals with more restrictive settings than ours; for instance, most papers follow Kalai and Lehrer by assuming that the same game is repeated over time and that the payoff function is a discounted sum of one-shot payoffs. Continuity for the sup-norm topology, as considered here, goes far beyond this setting. Absolute continuity of beliefs, an issue of paramount importance in our work, is not a necessary condition for convergence in general: Sandroni [7] shows that dropping this assumption does not rule out plays that are asymptotically consistent with Nash equilibrium play. Convergence to Nash equilibrium is difficult to obtain in broader settings, even without choices of nature. Noguchi [8] considers the case of bounded rationality, where agents smoothly approximate optimal behavior, and gives a set of prior beliefs leading to convergence to an approximate Nash equilibrium. Foster and Young [9] show that rational agents may not always learn others’ strategies when payoffs are random; Nachbar [10] also shows that, under a weak learning condition, if players are potentially able to learn to predict opponents’ strategies for rich enough strategy sets, and if their learning process is symmetric, then one player cannot learn to predict others’ optimal strategies given their own beliefs.

The paper is organized as follows. In Section 2 we formally describe the class of games and the equilibrium concepts; in Section 3 we give the main result; Section 4 contains some intermediary results; Section 5 presents the counterexamples and concluding remarks; and finally all the technical proofs are in the Appendices.

#### 2. The Model

The model and some assumptions needed to obtain the main result of the paper are now described.

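Time is discrete and continues forever. A period is denoted by the letter $t$. There are $n$ players $i = 1, \dots, n$, who play forever. In every period $t$, every player $i$ has a finite set of actions $\Sigma_i$. Let $\Sigma$ be defined as $\Sigma = \Sigma_1 \times \cdots \times \Sigma_n$, the set of *action combinations*. There is a nonempty and finite (or possibly countable) set of *states of nature* $\Omega$ in every period.

For every finite history $h$ of length $t$, a cylinder $C(h)$ with base $h$ is defined to be the set of all infinite histories whose $t$ initial elements coincide with $h$. We define the set $\mathcal{F}_t$ to be the $\sigma$-algebra that consists of all finite unions of cylinders with base in $H_t$, and $\mathcal{F}_0$ is defined to be the trivial $\sigma$-algebra. The sequence $(\mathcal{F}_t)_{t \ge 0}$ forms a filtration, and we define $\mathcal{F}_\infty$ to be the $\sigma$-algebra generated by $\bigcup_{t \ge 0} \mathcal{F}_t$.

Let $H_t$ be the set of all histories of length $t$; that is, $H_t = (\Sigma \times \Omega)^t$ (the set $H_0$ is defined to be the singleton consisting of the null history $\emptyset$), and let $H$ be the set of all finite histories; that is, $H = \bigcup_{t \ge 0} H_t$. The set of infinite histories is denoted by $H_\infty$.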

A *(behavioral) strategy* for every player assigns to every possible finite history a (possibly randomized) action. We represent a (behavioral) strategy for player $i$ as a function
$$f_i : H \to \Delta(\Sigma_i),$$
where $\Delta(\Sigma_i)$ is the set of all probability distributions on $\Sigma_i$.

Nature draws a state in every period, after every possible history. We thus represent nature’s choices by a behavioral strategy $\nu : H \to \Delta(\Omega)$, where $\Delta(\Omega)$ is the set of all probability distributions on $\Omega$. The state of nature is not revealed to the players before they have simultaneously made their choices of actions.

The game is played with *perfect monitoring*; that is, the players know all realized past action combinations actually played, as well as all the states of nature drawn prior to the current period. The state of nature drawn in the current period is revealed to the players at the end of this period (without necessarily revealing the payoff of the other players).

##### 2.1. Realized Play Paths

The concept of *infinite play path* is now defined. (What follows is described in Kalai and Lehrer [3].) This notion of infinite (or realized) play path represents the actual actions chosen by the players over time, in the following sense.

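Consider an $n$-vector of behavioral strategies $f = (f_1, \dots, f_n)$. The null history leads to the realized action combination $a^1$ in the support of $f(\emptyset) = (f_1(\emptyset), \dots, f_n(\emptyset))$, while nature draws a state $\omega^1$ in the support of $\nu(\emptyset)$. Defined recursively, in period $t$, the players choose the randomizations $f(h^{t-1})$, where $h^{t-1} = ((a^1, \omega^1), \dots, (a^{t-1}, \omega^{t-1}))$, which results in the action combination $a^t$, and nature draws $\omega^t$ according to $\nu(h^{t-1})$. The vector $((a^1, \omega^1), (a^2, \omega^2), \dots)$ is called the *realized play path*. A realized play path, finite or infinite, will be denoted by the letter $z$.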

Denote by $Z(f)$ the support of $f$; that is, the set of infinite play paths every finite initial segment of which is assigned strictly positive probability by $f$.
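The recursive construction of a realized play path can be sketched in Python as follows. The sketch assumes finite action and state sets, histories stored as tuples of `(action_profile, state)` pairs, and strategies given as functions from histories to probability dictionaries; all names (`realized_path`, `copy_last`, and so on) are hypothetical and only illustrate the definition.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Step = Tuple[Tuple[str, ...], str]  # (action profile, state of nature)
History = Tuple[Step, ...]
Dist = Dict[str, float]
Strategy = Callable[[History], Dist]


def draw(dist: Dist, rng: random.Random) -> str:
    """Draw one outcome from a finite probability distribution."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def realized_path(strategies: Sequence[Strategy], nature: Strategy,
                  periods: int, rng: random.Random) -> History:
    """First `periods` elements of a realized play path, built recursively."""
    history: History = ()
    for _ in range(periods):
        # Players randomize simultaneously, given the history so far...
        profile = tuple(draw(f_i(history), rng) for f_i in strategies)
        # ...and nature draws a state after the same history (revealed at period end).
        state = draw(nature(history), rng)
        history = history + ((profile, state),)
    return history


def always_c(h: History) -> Dist:
    return {"C": 1.0}


def copy_last(h: History) -> Dist:
    # Repeat player 0's previous action; mix 50/50 in the first period.
    return {h[-1][0][0]: 1.0} if h else {"C": 0.5, "D": 0.5}


def fair_coin(h: History) -> Dist:
    return {"H": 0.5, "T": 0.5}


path = realized_path([always_c, copy_last], fair_coin, periods=5, rng=random.Random(0))
print(path)
```

Only the first period of `copy_last` is genuinely random here; from period 2 on, the path is pinned down by the history, which is exactly what the recursive definition describes.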

##### 2.2. Beliefs

The beliefs of the players about others’ strategies and the realizations of the states of nature are now formally described.

Every player is assumed to have subjective prior beliefs about both other players’ strategies and nature. Those prior beliefs are formed before the first period of the game, and they will be updated in every subsequent period in a Bayesian manner, according to available information (see Kalai and Lehrer [2, 3] for a formal definition, as we use the same approach here).

Formally, the beliefs of player $i$ regarding the strategies of the other players are represented by an $n$-vector of strategies $f^i = (f^i_1, \dots, f^i_n)$. The belief $f^i_j$ represents the belief of player $i$ about player $j$’s strategy. Moreover, player $i$ knows her own strategy (i.e., $f^i_i = f_i$ for every $i$).

The belief of player $i$ about nature is represented by a behavioral strategy $\nu^i : H \to \Delta(\Omega)$, for every $i$.

We consider the following probabilistic representation of beliefs. We associate an $n$-vector of strategies $f$ and a choice of nature $\nu$ to a unique probability measure $\mu_{f,\nu}$ on the set of infinite play paths, as follows.

First, the measure is defined inductively on the set of finite play paths and then uniquely extended to the set of infinite play paths.

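Define $\mu_{f,\nu}(\emptyset)$ to be $1$ for the null history. Consider now a finite history $h$ whose corresponding realized play path is given by the vector $((a^1, \omega^1), \dots, (a^t, \omega^t))$, a vector of actions $a = (a_1, \dots, a_n) \in \Sigma$, and a state of nature $\omega \in \Omega$. The value $\mu_{f,\nu}(h, (a, \omega))$ is inductively defined to be
$$\mu_{f,\nu}\big(h, (a, \omega)\big) = \mu_{f,\nu}(h)\, \nu(h)(\omega) \prod_{i=1}^{n} f_i(h)(a_i).$$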

So defined, we now uniquely extend this measure to $\mathcal{F}_\infty$. Any finite history $h$ is now interpreted as the cylinder $C(h)$, which is by definition the set of infinite paths whose initial segment is $h$.

Define the probability measure $\mu_{f,\nu}$ by $\mu_{f,\nu}(C(h)) = \mu_{f,\nu}(h)$ for every such cylinder, and consider its unique extension from $\bigcup_{t \ge 0} \mathcal{F}_t$ to $\mathcal{F}_\infty$ given by Carathéodory’s extension theorem. See for instance Kalai and Lehrer [2] for more details.
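For finite histories, the inductive definition of $\mu_{f,\nu}$ can be checked numerically. The Python sketch below assumes the product form $\mu_{f,\nu}(h,(a,\omega)) = \mu_{f,\nu}(h)\,\nu(h)(\omega)\prod_i f_i(h)(a_i)$, which encodes the independence assumptions discussed next, and verifies that the probabilities of all histories of a given length sum to one, even when nature’s draw depends on past actions. The strategies, state names, and helper names are illustrative assumptions.

```python
from itertools import product
from math import prod
from typing import Callable, Dict, Iterator, Sequence, Tuple

Step = Tuple[Tuple[str, ...], str]  # (action profile, state of nature)
History = Tuple[Step, ...]
Dist = Dict[str, float]
Strategy = Callable[[History], Dist]


def path_probability(h: History, strategies: Sequence[Strategy], nature: Strategy) -> float:
    """mu_{f,nu}(h), computed period by period with the assumed product formula."""
    prob = 1.0  # probability of the null history
    for t, (profile, state) in enumerate(h):
        prefix = h[:t]
        # Players randomize independently of each other and of nature's draw.
        players = prod(f_i(prefix).get(a_i, 0.0) for f_i, a_i in zip(strategies, profile))
        prob *= players * nature(prefix).get(state, 0.0)
    return prob


def all_histories(action_sets: Sequence[Sequence[str]], states: Sequence[str],
                  length: int) -> Iterator[History]:
    """Enumerate H_t = (Sigma x Omega)^t for finite action and state sets."""
    steps = [(profile, w) for profile in product(*action_sets) for w in states]
    return product(steps, repeat=length)


def mixer(h: History) -> Dist:
    return {"C": 0.6, "D": 0.4}


def reactive_nature(h: History) -> Dist:
    # Nature's draw may depend on past actions: "bad" is likelier after a defection.
    if h and "D" in h[-1][0]:
        return {"good": 0.3, "bad": 0.7}
    return {"good": 0.8, "bad": 0.2}


actions = [["C", "D"], ["C", "D"]]
total = sum(path_probability(h, [mixer, mixer], reactive_nature)
            for h in all_histories(actions, ["good", "bad"], 3))
print(round(total, 12))  # 1.0
```

Summing over $H_t$ is the finite-dimensional shadow of the consistency condition that Carathéodory’s theorem needs for the extension to $\mathcal{F}_\infty$.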

The above representation implicitly requires that the belief of every player about nature is independent (in a probabilistic sense) of the actions chosen by other players. Moreover, it also requires that every player believes that other players choose their actions independently of each other.

In Section 5.1, an example is given showing that none of the current results hold without those last two assumptions. The reader is also referred to Section 5.2 for a discussion of those assumptions on beliefs and their behavioral implications.

For the sake of notational convenience, we shall denote by the same symbol $(f^i, \nu^i)$ the prior beliefs of player $i$ about nature and others’ strategies and her updated beliefs obtained by iterated applications of Bayes’ formula.

##### 2.3. Payoffs

The intertemporal payoff functions of the players are now described. Every player $i$ has a utility function over the set of infinite histories
$$u_i : H_\infty \to \mathbb{R}.$$

We assume that, for every player $i$, (1) the function $u_i$ is continuous with respect to the sup-norm topology on $H_\infty$, and (2) $u_i$ is uniformly bounded above and below.

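For any vector of strategies $f$, player $i$, whose belief about the states of nature is $\nu^i$, receives the expected payoff
$$U_i(f, \nu^i) = \mathbb{E}_{\mu_{f,\nu^i}}\left[u_i\right],$$
where $\mathbb{E}_{\mu_{f,\nu^i}}$ is the expected value with respect to the probability measure $\mu_{f,\nu^i}$ induced by the strategies $f$ and nature’s choices $\nu^i$.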

Moreover, every player is assumed to maximize the above expression, namely, her (subjective) expected payoff given her subjective belief about nature’s draws and against her subjective belief about other players’ strategies.

The above specification of payoffs encompasses the case treated in Kalai and Lehrer [2, 3] and Sandroni [7], where the payoff over infinite histories takes the form of an expected discounted sum of one-period payoffs.

##### 2.4. Equilibrium Concepts

This section is devoted to defining the solution concepts that will be used throughout.

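First, the concept of best response against others’ strategies, given a belief about nature, is defined. Pick any player $i$ and consider her belief about nature $\nu^i$ and an $n$-vector of strategies $g = (g_1, \dots, g_n)$ (this last vector can represent either beliefs about others’ strategies or actual strategies). For every $\varepsilon \ge 0$, a strategy $f_i$ is an *$\varepsilon$-best response to* $(g_{-i}, \nu^i)$ if
$$U_i\big((f_i, g_{-i}), \nu^i\big) \ge U_i\big((f'_i, g_{-i}), \nu^i\big) - \varepsilon$$
for every other strategy $f'_i$ available to player $i$. A best response is a $0$-best response.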
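To make the definition concrete in the simplest possible case, consider a hypothetical one-period game in which $u_i$ depends only on player $i$’s action, one opponent’s action, and the state of nature. The Python sketch below computes subjective expected payoffs under independence of the beliefs about the opponent and about nature, and checks the $\varepsilon$-best-response inequality against pure deviations (which suffices, since a mixed deviation’s payoff is an average of pure-action payoffs). In the repeated game, deviations are whole strategies; this one-shot reduction and the payoff table are only illustrative.

```python
from typing import Dict, Sequence, Tuple

Dist = Dict[str, float]
Payoff = Dict[Tuple[str, str, str], float]  # (own action, opponent action, state) -> payoff


def expected_payoff(own: Dist, opponent: Dist, nature: Dist, payoff: Payoff) -> float:
    """Subjective expected payoff, treating the opponent's action and nature's
    state as independent (a product measure)."""
    return sum(p * q * r * payoff[(a, b, w)]
               for a, p in own.items()
               for b, q in opponent.items()
               for w, r in nature.items())


def is_eps_best_response(own: Dist, opponent: Dist, nature: Dist, payoff: Payoff,
                         actions: Sequence[str], eps: float) -> bool:
    """U(own) >= U(a') - eps for every pure deviation a' (enough to cover mixed ones)."""
    value = expected_payoff(own, opponent, nature, payoff)
    best = max(expected_payoff({a: 1.0}, opponent, nature, payoff) for a in actions)
    return value >= best - eps


payoff: Payoff = {
    ("X", "X", "sun"): 3.0, ("X", "X", "rain"): 1.0,
    ("X", "Y", "sun"): 0.0, ("X", "Y", "rain"): 0.0,
    ("Y", "X", "sun"): 0.0, ("Y", "X", "rain"): 0.0,
    ("Y", "Y", "sun"): 1.0, ("Y", "Y", "rain"): 2.0,
}
opponent = {"X": 0.5, "Y": 0.5}
belief_nature = {"sun": 0.5, "rain": 0.5}

print(is_eps_best_response({"X": 1.0}, opponent, belief_nature, payoff, ["X", "Y"], 0.0))  # True
print(is_eps_best_response({"Y": 1.0}, opponent, belief_nature, payoff, ["X", "Y"], 0.2))  # False
```

With a belief putting probability $0.9$ on `rain`, the same computation makes `Y` the best response instead, which is why best responses are always taken relative to a belief about nature.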

Fix now $\varepsilon \ge 0$. A *Nash $\varepsilon$-equilibrium* is an $n$-vector of strategies $f$ such that $f_i$ is an $\varepsilon$-best response to $(f_{-i}, \nu)$ for every $i$, where $\nu$ is the *true* probability of the states of nature. In particular, in any (almost) Nash equilibrium, beliefs about nature and others’ strategies are exact.

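The next notion allows us to specify a concept of closeness, in a probabilistic sense, between two vectors of strategies $f$ and $g$ for two particular choices of nature $\nu$ and $\nu'$. Define first, for $\mu$ and $\tilde{\mu}$ probability measures on the same measurable space $(X, \mathcal{A})$, the *sup-norm* to be
$$\|\mu - \tilde{\mu}\| = \sup_{A \in \mathcal{A}} \left|\mu(A) - \tilde{\mu}(A)\right|.$$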

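*Definition 1.* Fix $\varepsilon \ge 0$. The strategy profile $(f, \nu)$ plays $\varepsilon$-like the strategy profile $(g, \nu')$ if
$$\|\mu_{f,\nu} - \mu_{g,\nu'}\| \le \varepsilon.$$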
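Definition 1 takes the supremum over all events on infinite play paths, which cannot be enumerated directly. A computable lower bound is the same supremum restricted to $\mathcal{F}_t$, which is nondecreasing in $t$. The Python sketch below assumes, purely for illustration, that each profile draws every period’s outcome independently from a fixed distribution; it shows how a small per-period discrepancy accumulates over longer horizons, which is why playing $\varepsilon$-like on infinite paths is a demanding requirement. Function names and distributions are hypothetical.

```python
from itertools import product
from math import prod
from typing import Dict, List

Dist = Dict[str, float]


def sup_norm_on_horizon(p: Dist, q: Dist, t: int) -> float:
    """sup over events in F_t of |mu(A) - mu'(A)|, when mu and mu' draw each
    period's outcome i.i.d. from p and q respectively.

    For probability measures on a finite set, this supremum equals half of
    the L1 distance between the two probability vectors.
    """
    outcomes = sorted(set(p) | set(q))
    total = 0.0
    for path in product(outcomes, repeat=t):
        total += abs(prod(p.get(x, 0.0) for x in path) - prod(q.get(x, 0.0) for x in path))
    return 0.5 * total


def distances(p: Dist, q: Dist, horizon: int) -> List[float]:
    """Lower bounds on the full sup-norm distance for t = 1, ..., horizon."""
    return [sup_norm_on_horizon(p, q, t) for t in range(1, horizon + 1)]


true_play = {"C": 0.5, "D": 0.5}
believed_play = {"C": 0.55, "D": 0.45}
for t, d in enumerate(distances(true_play, believed_play, 8), start=1):
    print(t, round(d, 4))
```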

The concept of “playing $\varepsilon$-like” for two given strategy profiles measures how distant those profiles are from each other in a probabilistic sense. It is preferable to approach this issue from a probabilistic standpoint, as explained in detail in Kalai and Lehrer [2, 3], even though those authors use a different concept.

With the above definitions, it is now possible to introduce the concept of *stochastic subjective equilibrium* (up to some constants), which generalizes the concept of subjective equilibrium introduced in Kalai and Lehrer [2, 3] and the concept of self-confirming equilibrium introduced in Fudenberg and Levine [5, 11]. This notion will play an important role in the proof of the main result of this paper.

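*Definition 2.* Fix $\varepsilon \ge 0$. A stochastic $\varepsilon$-subjective equilibrium is a matrix of beliefs $\big((f^i, \nu^i)\big)_{i=1}^n$, with $f^i_i = f_i$, satisfying for every $i$ that (i) the strategy $f_i$ is a best response to $(f^i_{-i}, \nu^i)$, and (ii) the strategy profile $(f, \nu)$ plays $\varepsilon$-like $(f^i, \nu^i)$.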

In other words, in any stochastic subjective equilibrium, the following requirements hold: (i) every player maximizes her intertemporal utility function against her beliefs about others’ strategies and given her beliefs about nature, and (ii) the beliefs about others’ strategies and nature are realization-equivalent (up to $\varepsilon$) to actual plays. No particular condition on beliefs about nature is required in such an equilibrium; for instance, at this point there is no requirement ensuring that beliefs about nature are consistent with available information and learning processes.

The above definition extends the notion of subjective equilibrium, as introduced in Kalai and Lehrer [2, 3], in that uncertainty about nature is added. The point of the paper is to show that, even when facing this type of uncertainty, Bayesian players will choose (in finite time) a behavior described by a Nash equilibrium.

#### 3. The Main Result

In this section, the main result of the paper is stated and discussed; that is, we give a set of sufficient conditions leading to convergence toward Nash equilibrium in finite time.

We first introduce a definition, which captures the concept of *induced strategy* resulting from a given strategy after a particular finite history.

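*Definition 3.* Consider an $n$-vector of strategies $f$, a period $t$, and a finite history $h \in H_t$. The induced strategy $f_{i,h}$ is defined as
$$f_{i,h}(h') = f_i(h \cdot h') \quad \text{for every } h' \in H,$$
where, for $h' \in H_{t'}$, $h \cdot h'$ is the history of length $t + t'$ resulting from the concatenation of the history $h$ (first) and (followed by) the history $h'$.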

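For any $n$-vector of strategies $f = (f_1, \dots, f_n)$ (with $h \in H$), the induced $n$-vector of strategies is defined as $f_h = (f_{1,h}, \dots, f_{n,h})$; induced choices of nature $\nu_h$ and $\nu^i_h$ are defined in the same way.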
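In code, an induced strategy is simply a closure that prepends the fixed history $h$ to whatever continuation history it receives. The Python sketch below assumes, for brevity, that histories are tuples of action profiles (states of nature omitted); the grim-trigger strategy and function names are hypothetical examples.

```python
from typing import Callable, Dict, List, Sequence, Tuple

History = Tuple[Tuple[str, ...], ...]  # here: a tuple of action profiles
Dist = Dict[str, float]
Strategy = Callable[[History], Dist]


def induced(f_i: Strategy, h: History) -> Strategy:
    """f_{i,h}: after continuation h', behave as f_i does after h followed by h'."""
    return lambda h_prime: f_i(h + h_prime)


def induced_profile(f: Sequence[Strategy], h: History) -> List[Strategy]:
    """f_h = (f_{1,h}, ..., f_{n,h})."""
    return [induced(f_i, h) for f_i in f]


def grim(h: History) -> Dist:
    # Cooperate until any defection has been observed, then defect forever.
    return {"D": 1.0} if any("D" in profile for profile in h) else {"C": 1.0}


after_defection = induced(grim, (("C", "D"),))
print(grim(()))             # {'C': 1.0}
print(after_defection(()))  # {'D': 1.0}
```

The induced strategy starts the repeated game “afresh” after $h$ while remembering $h$, which is exactly what the statements about the repeated game starting after $z(t)$ rely on.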

Before stating the main result of this paper, a notion from measure theory is first defined. Consider two measures $\mu$ and $\tilde{\mu}$ on the same measurable space $(X, \mathcal{A})$. The measure $\mu$ is said to be *absolutely continuous* with respect to $\tilde{\mu}$, denoted by $\mu \ll \tilde{\mu}$, if for every $A \in \mathcal{A}$ such that $\tilde{\mu}(A) = 0$ it is true that $\mu(A) = 0$.

Finally, for any realized play path $z$ and time $t$, denote by $z(t)$ the truncation of $z$ to its first $t$ elements (thus $z(t) \in H_t$).

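Theorem 4. *Consider an $n$-vector of strategies $f = (f_1, \dots, f_n)$ representing actual plays and, for every player $i$, the belief $(f^i, \nu^i)$ with $f^i_i = f_i$, such that, for every $i$,*

- *(i) the strategy $f_i$ is a best response to $(f^i_{-i}, \nu^i)$,*
- *(ii) the beliefs are such that $\mu_{f,\nu} \ll \mu_{f^i,\nu^i}$,*
- *(iii) player $i$ updates her beliefs in a Bayesian manner, and*
- *(iv) the belief of player $i$ about nature is not correlated with observations of actions taken by other players.*

*Fix now any $\varepsilon > 0$. For $\mu_{f,\nu}$-almost every path $z$, there exists a time $T$ such that, for every $t \ge T$, there exists a strategy profile $\hat{f}$ such that*

- *(1) $\hat{f}$ is a Nash $\varepsilon$-equilibrium of the repeated game starting after $z(t)$, and*
- *(2) $f_{z(t)}$ plays 0-like $\hat{f}$.*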

The above theorem says that if (1) players maximize their intertemporal utility functions against their own beliefs and (2) beliefs are updated in a Bayesian manner, then, as long as the independence requirement is satisfied and the grain of truth assumption holds, actual plays are realization-equivalent to an almost Nash equilibrium in finite time.

One of the keys to proving the above result is that, when Assumptions (i)–(iv) are satisfied, actions along the realized play path satisfy the properties of a stochastic (almost-) subjective equilibrium in finite time. Since any (almost) Nash equilibrium is also an almost stochastic subjective equilibrium, and since any (almost) Nash equilibrium trivially satisfies Assumptions (i)–(iv) above, Theorem 4 implicitly establishes some form of equivalence between those three different concepts.

The assumptions used in Theorem 4 are discussed next.

Assumptions (ii) and (iv) above are tight; for instance the reader is referred to Kalai and Lehrer [2] for counterexamples violating any of those assumptions, in the case where beliefs about nature are correct.

When all assumptions in Theorem 4 but (iv) above are satisfied, Bayesian players may not be able to learn anything about others’ strategies and/or nature. In this case, resulting plays can become chaotic and convergence is not to be expected. Such an example is presented in Section 5.1. The example is taken from an early study by Jordan [12], and it is very similar to the framework under study; the only difference is that players now have private and partial information about the realizations of the states of nature. Using this example, Kalai and Lehrer [2] show that, when $n > 2$, convergence toward a Nash equilibrium is not obtained even when players have exact beliefs about nature. A counterexample matching exactly the current framework can easily be derived from Jordan’s example.

In terms of possible extensions to Theorem 4, it is conjectured that Bayesian learning is not the only (non-trivial) learning process for which the above result holds. The characterization of all learning processes for which convergence toward a Nash equilibrium obtains is an open problem.

The proof of Theorem 4 is given in the next section.

#### 4. Proof and Intermediary Results

This section is devoted to giving the main line of the proof of Theorem 4. Since the proof is technical, only the main intermediary results are presented here.

The strategy of the proof of Theorem 4 goes as follows. First, it is shown that beliefs and strategies satisfying Assumptions (i)–(iv) become, in finite time, identical to an almost stochastic subjective equilibrium. Second, almost stochastic subjective equilibria are shown to be realization-equivalent to an almost Nash equilibrium when beliefs are accurate enough. Finally, it is shown that beliefs become accurate enough in finite time. Overall, this leads to the approximation of initial strategies and beliefs by an almost Nash equilibrium, as in Theorem 4.

The first proposition makes the link between strategies and beliefs satisfying Assumptions (i)–(iv) in Theorem 4 and the concept of (almost-) stochastic subjective equilibrium. Its proof is given in the Appendix.

Proposition 5. *Consider an $n$-vector of strategies $f = (f_1, \dots, f_n)$ representing actual plays and, for every player $i$, the belief $(f^i, \nu^i)$ with $f^i_i = f_i$, such that, for every $i$,*

- *(i) the strategy $f_i$ is a best response to $(f^i_{-i}, \nu^i)$,*
- *(ii) the beliefs are such that $\mu_{f,\nu} \ll \mu_{f^i,\nu^i}$,*
- *(iii) player $i$ updates her beliefs in a Bayesian manner, and*
- *(iv) the belief of player $i$ about nature is not correlated with observations of actions taken by other players.*

*For every $\varepsilon > 0$ and for $\mu_{f,\nu}$-almost every play path $z$, there exists a time $T$ such that, for every $t \ge T$, the profile of induced beliefs $\big((f^i_{z(t)}, \nu^i_{z(t)})\big)_{i=1}^n$ is a stochastic $\varepsilon$-subjective equilibrium for the repeated game starting after $z(t)$.*

The above result implies that, as long as Assumptions (i)–(iv) hold, actual plays and beliefs about others’ strategies along almost every path will become, in finite time, an (almost-) stochastic subjective equilibrium.

The next proposition makes the link between (almost-) stochastic subjective equilibrium and (almost-) Nash equilibrium. Its proof is given in the Appendix.

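Proposition 6. *Fix any vector of beliefs $\big((f^i, \nu^i)\big)_{i=1}^n$. For every $\varepsilon > 0$, there exists a constant $\eta > 0$ such that, for every $\eta' \in (0, \eta]$, if $\big((f^i, \nu^i)\big)_{i=1}^n$ is a stochastic $\eta'$-subjective equilibrium, then there exists a strategy profile $\hat{f}$ satisfying*

- *(1) $\hat{f}$ is a Nash $\varepsilon$-equilibrium, and*
- *(2) $f$ plays 0-like $\hat{f}$.*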

The above result mainly states that, provided that beliefs are accurate enough, every stochastic subjective equilibrium is realization-equivalent to an (almost-) Nash equilibrium. Given the conclusion of Proposition 6, and in order to prove the main result, it is enough to ensure that beliefs become arbitrarily accurate.

Arbitrary accuracy of beliefs follows from the next result, the well-known theorem proved by Blackwell and Dubins [6]. It is a convergence result for conditional probabilities, stating that, as information increases, the conditional probabilities of two different measures converge, as long as the two measures satisfy a requirement of absolute continuity.

Before stating the result, let $(X, \mathcal{A})$ be a measurable space and let $(\mathcal{P}_t)_{t \ge 0}$ be an increasing sequence of countable partitions of $X$, also called a filtration. This sequence of partitions represents the information available to an agent in any given period, and the set $X$ represents the set of all the choices of nature. For any $x \in X$ and any period $t$, let $P_t(x)$ be the unique set in $\mathcal{P}_t$ such that $x \in P_t(x)$. The proof of the following result can be found in Blackwell and Dubins [6].

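Theorem 7 (Blackwell and Dubins [6]). *Consider two $\sigma$-additive probability measures $\mu$ and $\tilde{\mu}$ on $(X, \mathcal{A})$ such that $\mu \ll \tilde{\mu}$. For $\mu$-almost every $x \in X$ and for every $\varepsilon > 0$, there exists a time $T$ such that*
$$\left|\mu\big(A \mid P_t(x)\big) - \tilde{\mu}\big(A \mid P_t(x)\big)\right| \le \varepsilon$$
*for every $t \ge T$ and for every $A \in \mathcal{A}$.*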
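A standard way to see Theorem 7 at work, in the spirit of the grain of truth assumption, is a coin with unknown bias. Suppose (as a hypothetical example) that the true measure draws i.i.d. tosses with bias $0.7$ and the belief is a prior mixture over the biases $\{0.3, 0.5, 0.7\}$ with positive weight on the truth, so that the true measure is absolutely continuous with respect to the belief. Conditional on any history, the believed distribution of the future is $w^*P^* + (1-w^*)Q$ for some probability measure $Q$, where $w^*$ is the posterior weight on the true bias and $P^*$ is the true distribution of the future; hence every event’s probability differs by at most $1 - w^*$. The Python sketch below tracks this bound along one simulated path; the function names are hypothetical.

```python
import math
import random
from typing import Dict, List


def posterior(prior: Dict[float, float], heads: int, tails: int) -> Dict[float, float]:
    """Bayesian posterior over coin biases after the observed tosses (log space)."""
    logs = {p: math.log(w) + heads * math.log(p) + tails * math.log(1 - p)
            for p, w in prior.items() if w > 0}
    m = max(logs.values())
    norm = sum(math.exp(v - m) for v in logs.values())
    return {p: math.exp(v - m) / norm for p, v in logs.items()}


def merging_bounds(true_p: float, prior: Dict[float, float], periods: int,
                   seed: int = 0) -> List[float]:
    """Bounds 1 - w* on the sup-norm distance between the true and the believed
    conditional distributions of the future, along one simulated path.
    Requires prior[true_p] > 0 (grain of truth)."""
    rng = random.Random(seed)
    heads = tails = 0
    bounds = []
    for _ in range(periods):
        if rng.random() < true_p:
            heads += 1
        else:
            tails += 1
        bounds.append(1.0 - posterior(prior, heads, tails)[true_p])
    return bounds


prior = {0.3: 1 / 3, 0.5: 1 / 3, 0.7: 1 / 3}
bounds = merging_bounds(0.7, prior, periods=2000)
print(bounds[0], bounds[99], bounds[-1])
```

If the prior put zero weight on the true bias, absolute continuity would fail and the posterior could never concentrate on the truth; the sketch then raises a `KeyError` rather than returning a meaningless bound.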

With all the above intermediary results, we next move to the proof of Theorem 4.

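*Proof of Theorem 4.* Fix the strategies $f$ and beliefs $(f^i, \nu^i)$ such that $f^i_i = f_i$ for every $i$, satisfying Assumptions (i)–(iv) of Theorem 4. Fix any $\varepsilon > 0$, consider the constant $\eta$ associated with $\varepsilon$ and these beliefs by Proposition 6, and fix any $\eta' \in (0, \eta]$.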

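By Proposition 5, for $\mu_{f,\nu}$-almost every path $z$ there exists a time $T$ after which $\big((f^i_{z(t)}, \nu^i_{z(t)})\big)_{i=1}^n$ is a stochastic $\eta'$-subjective equilibrium. By Proposition 6, there also exists a Nash $\varepsilon$-equilibrium $\hat{f}$ for the repeated game starting after $z(t)$ that $f_{z(t)}$ plays 0-like.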

Thus, we have found a period $T$ such that, for every $t \ge T$, the original strategies play 0-like a Nash $\varepsilon$-equilibrium in the repeated game starting after $z(t)$.

The proof is now complete.

#### 5. Concluding Remarks

This section provides some extended discussions of the assumptions in Theorem 4.

##### 5.1. Independence of Beliefs

In this section, an example is given showing that Assumption (iv) in Theorem 4 cannot be relaxed. The game below is taken from Jordan [12], and it is discussed in detail in Kalai and Lehrer [2].

Consider two players engaged in an infinitely repeated game. The repeated game is similar to the one studied so far, with the difference that a randomly generated (fixed-size) pair of payoff matrices $(A, B)$ is drawn by nature before the first period, from sets $\mathcal{M}_1$ and $\mathcal{M}_2$ of payoff matrices that are assumed to be finite. The pair is drawn according to the probability distribution $\pi$ on $\mathcal{M}_1 \times \mathcal{M}_2$. The probability distribution $\pi$ is common knowledge among the players; that is, their beliefs about nature are exact.

Player 1 is told privately the realized value $A$, and player 2 is told privately the realized value $B$. Private information is never revealed to the other player during the game. In this game, the uncertainty about nature faced by any player concerns the private information held by the other player. Thus, the belief about nature of player 1 can be represented by the conditional probability $\pi(B \mid A)$, and similarly for player 2. All assumptions of Theorem 4 other than Assumption (iv) also hold. Assume also that the players maximize the expected sum of discounted one-period payoffs of the same game repeated over time (see Jordan [12] for more details).

Kalai and Lehrer [2] show that, when the number of players is two, convergence toward a Nash equilibrium obtains when players update their beliefs about nature according to realized actions. However, when the number of players is strictly greater than two, convergence fails.

The intuition for this result is that, when $n > 2$, players’ information sets no longer form a filtration as more information becomes available, mostly due to strategic attempts to hide private information and to mislead other players through their own actions. Such behaviors immediately lead to an impossibility of learning, and convergence as in Theorem 4 does not obtain in general.

##### 5.2. Equivalent Representation of Beliefs and Kuhn’s Theorem

Instead of representing the belief of player $i$ about player $j$ by the behavioral strategy $f^i_j$, we could have represented player $i$’s belief by a probability distribution over the set of behavioral strategies available to player $j$. Kuhn’s theorem (see Aumann [13] and Kuhn [14]) applies to our setting and ensures that the two approaches are equivalent.

Of importance is the assumption that every player believes that others’ actions are uncorrelated with each other. Informally, this assumption ensures that any player’s beliefs are represented by a product measure over beliefs about others’ strategies. When this assumption is not present, it is easy to find examples where convergence toward a Nash equilibrium does not obtain (see for instance Kalai and Lehrer [2] for a discussion of this issue).

#### Appendices

The Appendices are devoted to proving technical results left aside earlier in the paper. In what follows, we consider nature as an additional player maximizing a constant utility function. This does not yield any loss of generality.

#### A. Proof of Proposition 5

Proposition 5 is proved as follows.

Fix $\varepsilon > 0$. For every $t$, let $\mathcal{P}_t$ be the set of all cylinders $C(h)$ with $h \in H_t$, that is, the finite histories up to time $t$ extended to the infinitely repeated game; each element of $\mathcal{P}_t$ is the set of infinite play paths with the same initial segment up to time $t$. Clearly, each $\mathcal{P}_t$ is a partition of the set of infinite play paths, and the family $(\mathcal{P}_t)_{t \ge 0}$ is a filtration of this set.

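Therefore, Theorem 7 applies to the probability measures $\mu_{f,\nu}$ and $\mu_{f^i,\nu^i}$, for every $i$, and to the filtration $(\mathcal{P}_t)_{t \ge 0}$. Since there are finitely many players, it follows that, for $\mu_{f,\nu}$-almost every infinite play path $z$, there exists a time $T$ such that, for every $t \ge T$,
$$\big\|\mu_{f_{z(t)},\nu_{z(t)}} - \mu_{f^i_{z(t)},\nu^i_{z(t)}}\big\| \le \varepsilon \quad \text{for every } i.$$
It follows that $(f_{z(t)}, \nu_{z(t)})$ plays $\varepsilon$-like $(f^i_{z(t)}, \nu^i_{z(t)})$, for every $i$.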

Consider any realized play path $z$ as above and the corresponding time $T$, and fix $t \ge T$. By the law of iterated expectations, the actions chosen by player $i$ after the history $z(t)$ with the belief $(f^i, \nu^i)$, taking as given behaviors outside of $C(z(t))$, are identical to the actions chosen in the first period of the game starting after $z(t)$ with the belief $(f^i_{z(t)}, \nu^i_{z(t)})$. Since $f_i$ is a best response to $(f^i_{-i}, \nu^i)$ for every $i$, this implies that $f_{i,z(t)}$ is a best response to $(f^i_{-i,z(t)}, \nu^i_{z(t)})$ for every $i$.

The proof of Proposition 5 is now complete.

#### B. Proof of Proposition 6

We first start with a technical lemma, stating that when two measures become close for the sup-norm, the expectations of any bounded continuous function according to those measures also become close. Consider a complete metric space $(Y, d)$, denote by $\mathcal{B}$ the $\sigma$-algebra generated by the open balls of the metric, and denote by $\tau$ the topology generated by the same open balls.

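Lemma B.1. *Consider two positive and finite measures $\mu$ and $\tilde{\mu}$, defined on the measurable space $(Y, \mathcal{B})$. Let $u : Y \to \mathbb{R}$ be a continuous function for the topology $\tau$, uniformly bounded above and below. It is true that*
$$\left|\int_Y u \, d\mu - \int_Y u \, d\tilde{\mu}\right| \le 2 \sup_{y \in Y} |u(y)| \, \|\mu - \tilde{\mu}\|.$$

*Proof.* Consider any such function $u$ and any such measures $\mu$ and $\tilde{\mu}$. We have that
$$\left|\int_Y u \, d\mu - \int_Y u \, d\tilde{\mu}\right| = \left|\int_Y u \, d(\mu - \tilde{\mu})\right| \le \sup_{y \in Y} |u(y)| \; |\mu - \tilde{\mu}|(Y).$$
Since $u$ is bounded and since the difference $\mu - \tilde{\mu}$ is a finite signed measure, whose total variation satisfies $|\mu - \tilde{\mu}|(Y) \le 2\|\mu - \tilde{\mu}\|$ by the Hahn decomposition, it follows from the previous equation that
$$\left|\int_Y u \, d\mu - \int_Y u \, d\tilde{\mu}\right| \le 2 \sup_{y \in Y} |u(y)| \, \|\mu - \tilde{\mu}\|.$$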

The proof is complete.
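The inequality in Lemma B.1 can be sanity-checked on finite spaces, where the sup-norm of a difference of measures is the larger of the total positive and total negative parts of $\mu - \tilde{\mu}$. The Python sketch below assumes the bound $|\int u\,d\mu - \int u\,d\tilde{\mu}| \le 2\sup|u|\,\|\mu - \tilde{\mu}\|$ and tests it on randomly generated finite measures and bounded functions; it is a numerical illustration with hypothetical helper names, not a proof.

```python
import random
from typing import List


def sup_norm(mu: List[float], nu: List[float]) -> float:
    """sup over subsets A of |mu(A) - nu(A)| for finite measures on {0, ..., k-1}."""
    pos = sum(max(a - b, 0.0) for a, b in zip(mu, nu))
    neg = sum(max(b - a, 0.0) for a, b in zip(mu, nu))
    return max(pos, neg)


def integral(u: List[float], mu: List[float]) -> float:
    """Integral of u with respect to a finite measure mu on {0, ..., k-1}."""
    return sum(x * m for x, m in zip(u, mu))


def check_lemma(trials: int = 1000, size: int = 6, seed: int = 1) -> bool:
    """Randomized check of |int u dmu - int u dnu| <= 2 sup|u| * ||mu - nu||."""
    rng = random.Random(seed)
    for _ in range(trials):
        mu = [rng.random() for _ in range(size)]   # positive, finite, not normalized
        nu = [rng.random() for _ in range(size)]
        u = [rng.uniform(-5.0, 5.0) for _ in range(size)]
        lhs = abs(integral(u, mu) - integral(u, nu))
        rhs = 2 * max(abs(x) for x in u) * sup_norm(mu, nu)
        if lhs > rhs + 1e-12:
            return False
    return True


print(check_lemma())  # True
mu, nu, u = [0.7, 0.3], [0.4, 0.6], [1.0, -1.0]
# The two numbers coincide up to rounding: the factor 2 cannot be improved.
print(abs(integral(u, mu) - integral(u, nu)), 2 * 1.0 * sup_norm(mu, nu))
```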

We next state another technical lemma, related to the notion of stochastic subjective equilibrium.

For every , define first and then .

Lemma B.2. *Fix any vector of beliefs . For every , there exists such that, if is a stochastic -subjective equilibrium, then for every and for every behavioral strategy such that , the following holds:
*

*Proof.* To prove the result, we first truncate the infinitely repeated game to a finitely repeated game, show that the result holds within this truncated game, and then extend the result to the original framework.

Fix . Consider any strategy vector , any , and any behavioral strategy such that .

First, we have that, for every , the function is continuous for the product topology. This implies that there exists a period such that the contribution of any strategy profile to the overall payoff of every player after period is no greater than .

We restrict our attention to the truncated game of length in the following way: we consider the original strategy profile up to period , and leave payoff constant thereafter by extending the original strategy profile to a constant arbitrary strategy profile after . By our previous remark, the difference in payoff between the original strategy profile and the newly formed one is no greater than for every player.
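For the special case of discounted sums mentioned in Section 2.3, the truncation period can be computed explicitly. Assuming (for illustration) $u_i = \sum_{t \ge 0} \delta^t r_t$ with one-period payoffs $|r_t| \le M$ and periods counted from $0$, everything from period $T$ onward contributes at most $M\delta^T/(1-\delta)$ in absolute value, so it suffices to take the smallest $T$ for which this tail bound falls below the target tolerance. The Python sketch below uses hypothetical function names and a long random reward stream as a stand-in for an infinite one.

```python
import random


def truncation_period(delta: float, bound: float, eps: float) -> int:
    """Smallest T with bound * delta**T / (1 - delta) <= eps.

    If u = sum_{t>=0} delta**t * r_t with |r_t| <= bound, the periods t >= T
    contribute at most bound * delta**T / (1 - delta) in absolute value.
    """
    if not (0.0 < delta < 1.0 and bound >= 0.0 and eps > 0.0):
        raise ValueError("need 0 < delta < 1, bound >= 0 and eps > 0")
    T = 0
    while bound * delta ** T / (1.0 - delta) > eps:
        T += 1
    return T


def discounted(rewards, delta: float) -> float:
    """Discounted sum of a finite reward sequence, periods counted from 0."""
    return sum(delta ** t * r for t, r in enumerate(rewards))


delta, bound, eps = 0.9, 1.0, 0.01
T = truncation_period(delta, bound, eps)
rng = random.Random(0)
rewards = [rng.uniform(-bound, bound) for _ in range(1000)]  # long stand-in for an infinite stream
gap = abs(discounted(rewards, delta) - discounted(rewards[:T], delta))
print(T, gap <= eps)  # 66 True
```

The bound is uniform over strategy profiles because it only uses $|r_t| \le M$, which is the kind of control the truncation argument needs.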

Formally, for any behavioral strategy , we denote by the restriction of to and by the restriction of to the set of infinite histories starting at . To truncate strategies, fix also any (dummy) strategy profile such that for every and . For any strategy profile , define now for every the truncated strategy and let .

Consider also the function

Under this function, only changes in individual strategies within the truncated game of length can affect the value of .

In a first step, we show that for any unilateral deviation from player in the support of her initial strategy.

Applying Lemma B.1 to , we get, for every and such that , that
where is the sup-norm restricted to the σ-algebra generated by .
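Suppose, as the truncation suggests, that this σ-algebra is generated by finitely many histories of a fixed length. That is our assumption. Then every event is a union of such histories, and the supremum of the absolute difference of the two measures over events is attained in one of two ways. Either the event collects every history where the first measure exceeds the second, or it collects every history where the second exceeds the first. The sketch computes the distance that way and cross-checks it by brute force on a small example. It assumes both measures are given as dictionaries over histories.

```python
from itertools import chain, combinations

def sup_norm_distance(mu, nu):
    """sup over events A of |mu(A) - nu(A)|, on a finite sigma-algebra of atoms."""
    atoms = set(mu) | set(nu)
    diff = {a: mu.get(a, 0.0) - nu.get(a, 0.0) for a in atoms}
    pos = sum(d for d in diff.values() if d > 0)     # best event for mu - nu
    neg = -sum(d for d in diff.values() if d < 0)    # best event for nu - mu
    return max(pos, neg)

def brute_force_distance(mu, nu):
    """Same quantity by enumerating every event (only for tiny examples)."""
    atoms = sorted(set(mu) | set(nu))
    events = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return max(abs(sum(mu.get(a, 0.0) - nu.get(a, 0.0) for a in A)) for A in events)

# Two distributions over histories of length 2 (illustrative numbers).
mu = {("H", "H"): 0.4, ("H", "T"): 0.1, ("T", "H"): 0.3, ("T", "T"): 0.2}
nu = {("H", "H"): 0.25, ("H", "T"): 0.25, ("T", "H"): 0.25, ("T", "T"): 0.25}
print(sup_norm_distance(mu, nu))  # approximately 0.2
```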

We next analyze the right-hand side of (B.6). To simplify notation, we define for every the function

For every history , we have by construction of the beliefs that

Further, for any given , if is a stochastic -subjective equilibrium, we have for every history that

Consider now the set of such histories assigned strictly positive probability by and denote it by . The last inequality implies for every that

Moreover, since , for every history , it must be true that is assigned strictly positive probability.

We next use the above remark to find a uniform upper bound on the right-hand side of (B.6). Define
which is strictly positive, and let denote the (finite) cardinality of .
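This step relies on two facts. With a finite horizon and finitely many actions and draws of nature, the set of histories of that length with strictly positive probability is finite. Its minimum probability is therefore strictly positive and its cardinality finite. The sketch enumerates this set under an assumed encoding of our own: a hypothetical `step` function that returns, for each history, the joint distribution of the next period's outcome (players' actions together with nature's draw).

```python
from typing import Callable, Dict, Tuple

History = Tuple[object, ...]
# Hypothetical encoding: next-period outcome -> probability, given the history.
StepDist = Callable[[History], Dict[object, float]]

def positive_histories(step: StepDist, T: int) -> Dict[History, float]:
    """All length-T histories with strictly positive probability, with their probabilities."""
    layer: Dict[History, float] = {(): 1.0}
    for _ in range(T):
        nxt: Dict[History, float] = {}
        for h, p in layer.items():
            for outcome, q in step(h).items():
                if q > 0:                      # drop zero-probability continuations
                    nxt[h + (outcome,)] = p * q
        layer = nxt
    return layer

def min_prob_and_cardinality(step: StepDist, T: int) -> Tuple[float, int]:
    """Smallest positive probability and number of positive-probability histories."""
    hs = positive_histories(step, T)
    return min(hs.values()), len(hs)

# Example: a fair draw, except that "H" is repeated forever once it occurs.
step = lambda h: {"H": 1.0} if h and h[-1] == "H" else {"H": 0.5, "T": 0.5}
print(min_prob_and_cardinality(step, 2))  # (0.25, 3)
```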

For every set of finite histories of length , from the above, we have that

Taking the maximum over such sets, we have that

Setting , and together with (B.6), the previous analysis implies that if is a stochastic -subjective equilibrium then

Moreover, since the contribution of any strategy profile after period is no greater than , if is a stochastic -subjective equilibrium then

We have thus derived the desired inequality, and the proof is now complete.

With the two previous lemmas, we can now prove Proposition 6.

The proof goes as follows. Fix and consider any vector of beliefs .

We associate with the following strategy profile :

1. for every , define for every ;
2. for every , consider the shortest prefix of , say , such that , and consider two cases:
    - (i) if corresponds to a unilateral deviation by player from the support of her strategy, define for every ;
    - (ii) if does not correspond to a unilateral deviation, define arbitrarily.
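The case split in this construction hinges on one test. It locates the shortest prefix of a history at which some player chooses an action outside the support of her strategy, then checks whether exactly one player does so. The sketch below implements only that test. Its encoding and names are hypothetical. It assumes histories record players' actions only, since nature's draws cannot be deviations. It leaves out what players do after a deviation, because those formulas are not recoverable here.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical encoding: one action per player in each period.
Profile = Tuple[str, ...]
History = Tuple[Profile, ...]
Behavioral = Callable[[History], Dict[str, float]]

def first_deviation(h: History, strategies: Sequence[Behavioral]
                    ) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Return (length of the shortest deviating prefix, deviating players), or None."""
    for t, profile in enumerate(h):
        prefix = h[:t]
        deviators = tuple(
            i for i, (a, s) in enumerate(zip(profile, strategies))
            if s(prefix).get(a, 0.0) == 0.0       # action outside the support
        )
        if deviators:
            return t + 1, deviators
    return None

def classify(h: History, strategies: Sequence[Behavioral]) -> str:
    dev = first_deviation(h, strategies)
    if dev is None:
        return "on path"                          # presumably case 1
    _, deviators = dev
    if len(deviators) == 1:
        return f"unilateral deviation by player {deviators[0]}"   # case (i)
    return "multilateral deviation"               # case (ii): arbitrary continuation
```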

To prove Proposition 6, it is enough to show that there exists such that, if is a stochastic -subjective equilibrium, then its associated strategy profile is a Nash -equilibrium.

We first claim that there exists such that, if is a stochastic -subjective equilibrium, then for every ,

Indeed, by Lemma B.1, we have for every that

Define . Then for every stochastic -subjective equilibrium and for every , the inequality (B.16) holds.

We next use the previous claim to get our result. We first prove the property for every individual deviation in the support of , and then extend this result to any arbitrary individual deviation.

By Lemma B.2, there exists such that, if is a stochastic -subjective equilibrium, then for every and for every behavioral strategy such that , the following holds:

Define . Clearly, for every stochastic -subjective equilibrium , for every , and every such that , the following holds:

Substituting (B.16) and (B.18) into this last relation gives

Moreover, since is a best response to for player , the above implies that

Equivalently, in terms of the strategy profile , for any and such that , we have just shown that

We now extend this result to any arbitrary behavioral strategy . Fix any player and any strategy . Assume that differs from . This implies that there exists a history such that is not in the support of . By construction of , in the subgames starting at any such unilateral deviation by player , all the other players play according to . Since is a best response to in those subgames, player can improve upon by playing according to in any such subgame, leaving her behavior on unchanged. We are therefore back in the previous case, and the result follows.

Altogether, we have shown that is a Nash -equilibrium. Moreover, since, as shown above, there is no incentive to deviate from the original play paths, the strategy profile plays 0-like . The proof is now complete.

#### Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

#### References

- J. C. Harsanyi, “Games with incomplete information played by ‘Bayesian’ players. I. The basic model,” *Management Science*, vol. 14, pp. 159–182, 1967.
- E. Kalai and E. Lehrer, “Rational learning leads to Nash equilibrium,” *Econometrica*, vol. 61, no. 5, pp. 1019–1045, 1993.
- E. Kalai and E. Lehrer, “Subjective equilibrium in repeated games,” *Econometrica*, vol. 61, no. 5, pp. 1231–1240, 1993.
- P. Battigalli, M. Gilli, and M. C. Molinari, “Learning and convergence to equilibrium in repeated strategic interactions: an introductory survey,” *Ricerche Economiche*, vol. 46, pp. 335–377, 1992.
- D. Fudenberg and D. K. Levine, “Self-confirming equilibrium,” *Econometrica*, vol. 61, no. 3, pp. 523–545, 1993.
- D. Blackwell and L. Dubins, “Merging of opinions with increasing information,” *Annals of Mathematical Statistics*, vol. 33, pp. 882–886, 1962.
- A. Sandroni, “Does rational learning lead to Nash equilibrium in finitely repeated games?” *Journal of Economic Theory*, vol. 78, no. 1, pp. 195–218, 1998.
- Y. Noguchi, *Bayesian Learning with Bounded Rationality: Convergence to ε-Nash Equilibrium*, Kanto Gakuin University, 2007.
- D. P. Foster and H. P. Young, “On the impossibility of predicting the behavior of rational agents,” *Proceedings of the National Academy of Sciences of the United States of America*, vol. 98, no. 22, pp. 12848–12853, 2001.
- J. H. Nachbar, “Beliefs in repeated games,” *Econometrica*, vol. 73, no. 2, pp. 459–480, 2005.
- D. Fudenberg and D. K. Levine, “Steady-state learning and self-confirming equilibrium,” *Econometrica*, vol. 61, pp. 547–573, 1993.
- J. S. Jordan, “Bayesian learning in normal form games,” *Games and Economic Behavior*, vol. 3, no. 1, pp. 60–81, 1991.
- R. J. Aumann, “Mixed and behavior strategies in infinite extensive games,” in *Advances in Game Theory*, M. Dresher, L. Shapley, and A. Tucker, Eds., vol. 52 of *Annals of Mathematics Studies*, pp. 627–650, Princeton University Press, Princeton, NJ, USA, 1964.
- H. W. Kuhn, “Extensive games and the problem of information,” in *Contributions to the Theory of Games*, H. Kuhn and A. Tucker, Eds., vol. 28 of *Annals of Mathematics Studies*, pp. 193–216, Princeton University Press, Princeton, NJ, USA, 1953.