Abstract

We present a simple adaptive learning model of a poker-like game, by means of which we show how a bluffing strategy emerges very naturally and can also be rational and evolutionarily stable. Despite their very simple learning algorithms, agents learn to bluff, and the most bluffing player is usually the winner.

1. Introduction

Among the concepts formulated and elaborated by cognitive psychology in the last century, the study of how problem-solving strategies develop has meaningfully improved our understanding of mental activity and of its relations with the cerebral circuits.

At this level of description, the concept of cognitive process is defined as the interconnected performance of some elementary cognitive activities that operate on and affect mental contents; it is the fundamental notion for bridging, at a theoretical level, the superior cognitive functions and human behaviour [1, 2]. Such a concept is used in a wider sense to mean the act of knowing and may be interpreted in a social or cultural sense to describe the emergent development of knowledge, concepts, or strategies.

Cognitive psychologists argue that the mind can be understood in terms of information processing, especially when processes such as abstraction, categorization, knowledge, expertise, or learning are involved [3–5].

The concept of cognitive process is defined both as the result of the parallel elaboration of several well-defined and functionally independent neural modules and as a sort of “software” able to optimize the integration among different cognitive functions by adapting to different environmental/informational circumstances [6–8].

The aim of the present paper is to formulate a cognitive model of a poker-like game in which the players are able to develop strategies by learning from experience. Such a task, better known as problem solving, allows one to investigate effectively the relation and coupling between the dynamics of a cognitive process and the environment.

Moreover, the study of poker is of great and general interest in complex systems, because it is closely related to sociophysics, decision theory, and the evolution of behaviour. Indeed, the study of human behaviour and, more generally, of social phenomena has been addressed in recent years using the tools of complex systems physics [9]. Poker-like games provide a very good instance of a strategic dilemma in which agents must optimize their income under conditions of imperfect information (see Section 2) or in which it is not clear what the true optimal strategy is, a case which is very common in real human interactions [10, 11]. Understanding the mechanisms that underlie poker is therefore very useful for understanding human psychology and for getting hints on how to approach more complicated models of human society.

2. Poker-Like Games

Poker is an interesting testbed for artificial intelligence research [12–15]. It is a game of imperfect information, where multiple competing agents must deal with probabilistic knowledge, risk assessment, and possible deception, not unlike decisions made in the real world. The so-called “opponent modelling” is another difficult problem in decision-making applications, and it is essential to achieve high performance in poker. Moreover, poker has a rich history of study in several academic fields. Economists and mathematicians have applied a variety of analytical techniques to poker-related problems [16–18]. For example, the earliest investigations in game theory, by luminaries such as John von Neumann and John Nash, used simplified poker to illustrate the fundamental principles.

There is an important difference between board games and popular card games like bridge and poker. In board games, players have complete knowledge of the entire game state since everything is visible to all participants. In contrast, bridge and poker involve imperfect information since the other players’ cards are not known.

From a computational point of view, it is important to distinguish the lack of information from the possibility of chance moves. The former involves uncertainty about the current state of the world, in particular situations where different players have access to different information. The latter involves only uncertainty about the future, which is resolved as soon as the future materializes. Both perfect and imperfect information games may involve an element of chance; examples of games from all four categories are shown in Table 1.

The presence of chance elements does not require major changes to the computational techniques used to solve a game. In fact, the cost of solving a perfect information game with chance moves is not substantially greater than that of solving a game with no chance moves. By contrast, the introduction of imperfect information increases the complexity of the problem.

Due to the complexity (both conceptual and algorithmic) of dealing with imperfect information games, this problem was largely ignored at the computational level until the concept of randomized strategies was introduced.

Once randomized strategies are allowed, the existence of “optimal strategies” in imperfect information games can be proved. In particular, this means that there exists an optimal randomized strategy for poker in the same way as there exists an optimal deterministic strategy for chess. Indeed, Kuhn showed for a simplified poker game that the optimal strategy does use randomization [19].

The optimal strategy has several advantages: the player cannot do better than this strategy if playing against a good opponent; furthermore, the player does not do worse even if his strategy is revealed to his opponent; that is, the opponent gains no advantage from figuring out the first player’s strategy.

Another interesting result of such research is the existence of an optimal strategy for the gambler in the game of poker. As first observed in a simple poker-like game by Kuhn [19], behaviours such as bluffing, which seem to arise from the psychological makeup of human players, are actually game-theoretically optimal.

One of the earliest and most thorough investigations of poker appears in the classical treatise on game theory “Theory of Games and Economic Behavior” by von Neumann and Morgenstern [20], where a large section was devoted to the formal analysis of “bluffing” in several simplified variants of a two-person poker game with either symmetric or asymmetric information.

Indeed, the general considerations concerning poker and the mathematical discussions of the different versions of the game were carried out by von Neumann as early as 1926. Recognizing that “bluffing” in poker “is unquestionably practiced by all experienced players,” von Neumann and Morgenstern identified two reasons for bluffing. “The first is the desire to give a “false” impression of strength in “real” weakness; the second is the desire to give a “false” impression of weakness in “real” strength” [20].

Solutions to these simplified poker-like games as well as a large class of both zero-sum and nonzero-sum games were unified by the concept of mixed strategy, a probability distribution over the player’s set of actions. The importance of mixed strategies to the theory of games and its applications to the social and behavioral sciences stems from the fact that for many interactive decision processes there can be no Nash equilibria in pure strategies.
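The role of mixed strategies can be illustrated with a standard textbook game, matching pennies, which is not part of the model studied in this paper and is used here only as an assumed minimal example. The sketch below enumerates the pure-strategy profiles of this zero-sum game, finds that none of them is a Nash equilibrium, and then checks that the uniform mixed strategy yields the same expected payoff whatever the opponent does. The function names are hypothetical.

```python
from itertools import product

# Row player's payoffs in matching pennies (zero-sum: the column player
# receives the negative of each entry). Illustrative game, not from the paper.
PAYOFF = [[1, -1],
          [-1, 1]]


def pure_nash_equilibria(payoff):
    """Return all pure-strategy Nash equilibria (row, col) of a zero-sum matrix game."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # The row player cannot gain by switching rows ...
        row_best = all(payoff[r][c] >= payoff[r2][c] for r2 in range(n_rows))
        # ... and the column player (payoff = -entry) cannot gain by switching columns.
        col_best = all(-payoff[r][c] >= -payoff[r][c2] for c2 in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


def expected_payoff(payoff, p_row, q_col):
    """Expected row-player payoff when both players randomize independently."""
    return sum(p_row[r] * q_col[c] * payoff[r][c]
               for r in range(len(payoff))
               for c in range(len(payoff[0])))


no_pure = pure_nash_equilibria(PAYOFF)  # empty list: no pure equilibrium
value_vs_first = expected_payoff(PAYOFF, [0.5, 0.5], [1.0, 0.0])
value_vs_second = expected_payoff(PAYOFF, [0.5, 0.5], [0.0, 1.0])
```

With the 50/50 mixture the row player obtains the same expected payoff against either pure reply, so the opponent gains nothing from knowing the strategy, which is the property of optimal randomized strategies discussed above.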

Using randomization and adaptive learning as the key concepts for building a computational scaffolding of cognitive processes, we believe that this area of research is likely to produce insights about superior cognitive strategies because of their intrinsic structure. Finally, by comparing the resulting strategies with real human ones, it is possible both to investigate the role of environmental factors in the development of cognitive strategies and to validate some theoretical psychological assumptions.

2.1. Summary

The aim of this paper is to show how bluffing strategies can arise naturally as a mathematical property of a very simple model, without any reference to psychological assessments. For this reason, we present a simple model of a poker-like game with only two players and only two possible actions, folding and calling, which the agents choose simultaneously. Such an oversimplified game, as we will see, allows us to capture the fundamental mechanisms underlying the phenomenon of bluffing. The fact that bluffing emerges naturally even in a very simple version of the game suggests that such a strategy is perfectly rational and can be mathematically characterized.

3. The Model

The model we are going to define and analyse is probably one of the greatest simplifications possible of a game of chance with imperfect information.

Here we have two players: at the beginning of each hand of the game, each of them puts one coin into the entry pot. Then, each picks a “card” from a pack: every card has an integer value between 0 and $N-1$ (i.e., there are $N$ cards overall). At this point, according to the value of their card, the players decide whether to call or to fold. If both players call, each puts another coin into the pot, and whoever holds the highest card wins: the winner gets the entire pot (four coins). If only one of the players folds, the “caller” wins and gets the entry pot (two coins). Finally, if nobody calls, both players take back the coin they had put into the entry pot. Mathematically, when player $i$ holds the card of value $n$, he decides to call according to the probability distribution $P_i(n)$, with of course $i = 1, 2$ and $n = 0, 1, \ldots, N-1$. After every hand, both players update their strategy. More precisely, if a player folds, his calling probabilities do not change; if agent $i$, holding the card $n$, calls and wins (because he has the highest card or because the opponent folds), the probability $P_i(n)$ that he calls when holding the card $n$ changes as follows:

$$P_i(n) \to P_i(n) + \alpha_i \left[ 1 - P_i(n) \right]. \tag{1}$$

Analogously, if the agent loses (i.e., if he calls but the opponent has a higher card than him), the probability changes instead as follows:

$$P_i(n) \to P_i(n) - (1 - \alpha_i)\, P_i(n). \tag{2}$$

In (1) and (2), the coefficient $\alpha_i$ is the learning factor (LF), which can also be seen as a sort of risk propensity of player $i$. The LFs of the players are set at the beginning of the game and never change. Moreover, $\alpha_i$ can assume any value between 0 (no risk propensity at all) and 1 (maximum possible risk propensity): indeed, a player with $\alpha_i = 0$ and the card $n$ does not increase $P_i(n)$ even when he wins, and sets $P_i(n) = 0$ as soon as he loses; instead, with $\alpha_i = 1$, he sets $P_i(n) = 1$ when he wins but does not decrease $P_i(n)$ if he loses. Finally, it is easy to see that (1) and (2) ensure that $P_i(n)$ always stays in the interval $[0, 1]$.
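The two update rules can be sketched in a few lines of code. The sketch assumes the simplest linear form compatible with the limiting cases described above (a learning factor of 0 never raises the calling probability and resets it to 0 after a loss, a learning factor of 1 sets it to 1 after a win and never lowers it); the function and argument names are hypothetical.

```python
def update_after_win(p_call, alpha):
    """Assumed form of update (1): move the calling probability toward 1
    by a fraction alpha of the remaining distance."""
    return p_call + alpha * (1.0 - p_call)


def update_after_loss(p_call, alpha):
    """Assumed form of update (2): remove a fraction (1 - alpha) of the
    current calling probability."""
    return p_call - (1.0 - alpha) * p_call


# Limiting cases described in the text, for a calling probability of 0.3.
cautious_win = update_after_win(0.3, 0.0)    # unchanged
cautious_loss = update_after_loss(0.3, 0.0)  # reset to 0
bold_win = update_after_win(0.3, 1.0)        # raised to 1
bold_loss = update_after_loss(0.3, 1.0)      # unchanged
```

Both functions return a convex combination of the old probability and one of the endpoints 0 or 1, which is why the calling probability cannot leave the unit interval.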

3.1. Numerical Results

In this section, we will present the most remarkable results of the simulations of the simple model defined previously.

First of all, for simplicity, we set the LF of player 1 to $\alpha_1 = 0.5$ and then studied the dynamics by varying the LF of player 2, $\alpha_2$: indeed, as we saw in several simulations, it is the difference between the LFs that essentially determines the main features of the dynamics. In particular, we can distinguish three cases: $\alpha_1 > \alpha_2$, $\alpha_1 = \alpha_2$, and $\alpha_1 < \alpha_2$.

3.1.1. Case $\alpha_1 > \alpha_2$

In Figures 1 and 2, the money of both players is shown as a function of time, where the time unit is a single hand of the game: we set , , and , respectively, and the results are averaged over $10^4$ and $10^5$ iterations, respectively; in both cases, we let the agents play $10^4$ hands. For simplicity, we considered players with an infinite amount of money available, and we set the initial amount to zero. Additionally, the initial calling distributions are picked randomly for each card value.
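A single match of the kind described here can be simulated along the following lines. This is a minimal sketch under stated assumptions, not the code used for the figures: the two cards are assumed to be drawn without replacement from one pack (so ties cannot occur), the update rules are assumed to be linear in the learning factor, and the default pack size `n_cards=10` as well as all names are hypothetical.

```python
import random


def play_match(alpha_1, alpha_2, n_cards=10, n_hands=10_000, seed=0):
    """Simulate one match; return the net winnings and the final calling
    probabilities of both players."""
    rng = random.Random(seed)
    alphas = (alpha_1, alpha_2)
    # Random initial calling probability for every player and card value.
    p_call = [[rng.random() for _ in range(n_cards)] for _ in range(2)]
    money = [0, 0]  # winnings measured from a zero starting amount
    for _ in range(n_hands):
        # Assumption: two distinct cards from the same pack, hence no ties.
        cards = rng.sample(range(n_cards), 2)
        calls = [rng.random() < p_call[i][cards[i]] for i in range(2)]
        if calls[0] and calls[1]:
            winner = 0 if cards[0] > cards[1] else 1
            money[winner] += 2      # four-coin pot minus own two coins
            money[1 - winner] -= 2
            outcomes = {winner: True, 1 - winner: False}
        elif calls[0] or calls[1]:
            winner = 0 if calls[0] else 1
            money[winner] += 1      # two-coin entry pot minus own coin
            money[1 - winner] -= 1
            outcomes = {winner: True}  # the folding player does not update
        else:
            outcomes = {}           # both fold: coins returned, no update
        for i, won in outcomes.items():
            p = p_call[i][cards[i]]
            if won:
                p_call[i][cards[i]] = p + alphas[i] * (1.0 - p)
            else:
                p_call[i][cards[i]] = p - (1.0 - alphas[i]) * p
    return money, p_call


money, final_p_call = play_match(0.5, 0.1, n_hands=2000, seed=1)
```

Averages such as those shown in the figures would be obtained by repeating `play_match` with different seeds and averaging the returned winnings; the sketch itself makes no claim about which player comes out ahead.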

As can be seen, the first player, with the higher LF, wins over his opponent, with the smaller LF, and the money gained by player 1 increases with time. Moreover, the smaller $\alpha_2$, the faster and bigger the winnings of player 1. The previous figures show that, on average, the player with the bigger risk propensity eventually overwhelms the other one; this means that in a single match the more risk-inclined player has a bigger probability to win, and this probability increases as the difference $\alpha_1 - \alpha_2$ increases. The fact that taking risks pays off is confirmed by the next figures, where the final calling distributions of both players are depicted.

As we can see, in both cases, the winner is characterized by higher calling probabilities than his opponent’s for every card value $n$. Moreover, we have , which is the most explicit evidence of the emergence of bluffing.

3.1.2. Case $\alpha_1 = \alpha_2$

In this case, both players have the same winning probability, as shown in Figure 5: indeed, having the same LF, they also behave in the same way on average, so that in a single match neither is able to overwhelm the opponent for good.

Naturally, it is also easy to forecast the behaviour of the final calling probabilities of the agents: they will be equal, with for both and .

3.1.3. Case $\alpha_1 < \alpha_2$

In this case, the results are qualitatively the same as in the case $\alpha_1 > \alpha_2$; only now it is player 2 who defeats player 1, as shown in Figures 7 and 8.

3.2. Discussion

The first conclusion we can draw from the numerical results alone is that in this game bluffing necessarily emerges as a rational strategy. Moreover, the player who bluffs more eventually wins. This can be easily understood by looking at Figures 3, 4, 6, and 8. Indeed, while in general the calling probabilities of both players tend to the same value for large $n$, for small $n$ the player with the higher LF, that is, with the higher risk propensity, bluffs much more than his opponent: this means that even when holding a poor card, the “risk-lover” will call and, unless his opponent has a very good card, will get the entry pot.

It is possible to formalize such considerations by writing the equations of the dynamics for the model at stake. Neglecting fluctuations, the “mean-field” equation ruling the money won (or lost) by the first player is where is the probability that the card held by player 1 is higher than the card held by player 2; analogously it is . On the other hand, is an operator defined as follows and represents the probability that player 2 calls from the point of view of player 1. Equation (3) can be rewritten as a differential equation in time, which can assume the form with

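Since this is a zero-sum game and the initial amounts are set to zero, the second player’s money is simply the opposite of the first player’s, $M_2(t) = -M_1(t)$, where $M_i(t)$ denotes the money of player $i$ at time $t$.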
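To make the structure of such a mean-field description concrete, the expected gain of player 1 in a single hand can be written down directly from the rules of Section 3. The expression below is a sketch in notation chosen here, not a reproduction of the paper's own equations: $P_1$ and $P_2$ are the calling distributions held fixed during the hand, $\langle \Delta M_1 \rangle$ is the average change of the first player's money, and the two cards $n$ and $m$ are assumed to be distinct and uniformly distributed over the $N(N-1)$ ordered pairs.

```latex
% Sketch under the assumptions stated above (requires amsmath).
% Terms in braces: player 2 folds (+1), player 1 folds (-1), showdown (+2 or -2).
\begin{align*}
\langle \Delta M_1 \rangle
  &= \frac{1}{N(N-1)} \sum_{n \neq m} \Big\{ P_1(n)\,[1 - P_2(m)]
     - [1 - P_1(n)]\,P_2(m)
     + 2\,\operatorname{sgn}(n-m)\,P_1(n)\,P_2(m) \Big\} \\
  &= \frac{1}{N} \sum_{n} \big[ P_1(n) - P_2(n) \big]
     + \frac{2}{N(N-1)} \sum_{n > m} \big[ P_1(n)\,P_2(m) - P_1(m)\,P_2(n) \big].
\end{align*}
```

The first sum is the net flow of entry coins toward the player who calls more often, while the second collects the showdowns. Both vanish when $P_1 = P_2$, and the whole expression changes sign when the two players are exchanged, as expected for a zero-sum game.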

Finally, the relation giving the temporal behaviour of the calling distributions of player 1 (being the one of the opponent of analogous form) is

Now, (5), (6), and (7) are rather complicated, but some features of them can be determined without an explicit solution. Actually, it is straightforward to understand that we have

Now, since in our simulations we always started from the same initial for all , , and from , this implies that for $\alpha_1 = \alpha_2$ both players must have on average the same calling distributions, and then they should neither gain nor lose money, apart from fluctuations: this is exactly what we found in Figures 5 and 6. It can also be shown that if $\alpha_1 > \alpha_2$, then we will soon have $P_1(n) \geq P_2(n)$ for all $n$ (with the equality holding only for ), so that (6) allows us to get , that is, the victory of player 1, as shown in Figures 1 to 4. Obviously, the opposite situation takes place for $\alpha_1 < \alpha_2$ (as shown in Figures 7 and 8).
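The statement that identical calling distributions produce no average gain can be checked numerically. The following sketch computes the exact expected one-hand gain of player 1 for fixed calling distributions, again assuming that the two cards are distinct and equally likely; the function name and the example distributions are hypothetical.

```python
def expected_gain_player_1(p1, p2):
    """Exact expected one-hand gain of player 1 for fixed calling
    distributions p1 and p2 (lists indexed by card value)."""
    n_cards = len(p1)
    total = 0.0
    for n in range(n_cards):        # card held by player 1
        for m in range(n_cards):    # card held by player 2
            if n == m:
                continue            # assumption: cards drawn without replacement
            showdown = 2.0 if n > m else -2.0
            total += p1[n] * (1.0 - p2[m])      # only player 1 calls: +1
            total -= (1.0 - p1[n]) * p2[m]      # only player 2 calls: -1
            total += p1[n] * p2[m] * showdown   # both call: +2 or -2
    return total / (n_cards * (n_cards - 1))


p = [0.2, 0.5, 0.9]
q = [0.6, 0.1, 0.7]
symmetric_gain = expected_gain_player_1(p, p)  # identical players
antisymmetry = expected_gain_player_1(p, q) + expected_gain_player_1(q, p)
always_call_vs_always_fold = expected_gain_player_1([1.0, 1.0], [0.0, 0.0])  # 1.0
```

Here `symmetric_gain` and `antisymmetry` are both zero up to rounding, reflecting the zero-sum nature of the game, while a player who always calls against one who always folds collects exactly one coin per hand.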

4. Conclusions and Perspectives

In this work, we have developed a model that describes, very roughly, the dynamics of a cognitive process. The model is among the simplest possible, but it captures the main behaviours of the real dynamics and demonstrates how fundamental bluffing is as a rational strategy.

First of all, this model confirms the prominent role of the LF ($\alpha_i$). A direct connection between the LF and the tendency to bluff is detectable here: even though both agents tend to develop bluffing, the one with the greater LF bluffs more than the other and ends up as the winner. Finally, this last evidence suggests that bluffing must be an evolutionarily stable, rational strategy.

Other, more realistic models have been developed [21], and they can certainly give us deeper information about the inner mechanisms leading to bluffing behaviours in poker-like games, but the emergence of bluffing as a rational strategy in such a simple toy model is undoubtedly a remarkable result. Indeed, more in-depth analyses of the real game, which systematically survey every facet of poker (blind, flop, raise, etc.), can be found in the literature [13], but this is the first attempt to capture its main features through a very “reductionist” approach, using the simplest version of the game. In short, while it has been thought until now that capturing an apparently complex behaviour such as bluffing with effective outcomes requires taking into account (almost) all the rules of poker, we have shown here that this attitude is much more fundamental and emerges naturally when a few simple ingredients are present.

Moreover, for practical purposes, a simpler model that can be more easily understood is very useful, because it also makes experimental tests with real human agents easier to realize and to analyse, so that other kinds of experimental tools can be planned more accurately. In particular, it will be straightforward to use a tool such as the one in reference [22] for a test with real players and to exploit data from online poker websites for a statistical analysis of the outcomes of real poker games.