Abstract

In this paper, we employ convex optimization and the saddle point equation to find the optimal payoff for two players in the iterated rock-paper-scissors game. We also describe the payoff as a two-person non-zero-sum matrix in the hypothetical game system, which provides a possible way to make quantitative analyses. In addition, we use interior-point methods to simulate the rock-paper-scissors game. Our numerical results support the hypothesis that the saddle point equation of the payoff continues to hold when the winning payoff changes and that it is not affected by other factors.

1. Introduction

It is challenging for human beings to make optimal decisions in noncooperative strategic interactions [1]. The best approach in rock-paper-scissors is to play truly at random: each choice is played about a third of the time, and the opponent cannot predict what comes next. The rock-paper-scissors (RPS for short) game, widely used to study competitive phenomena in society and biology, especially species diversity and pattern formation [2–7], offers a new way to study such decisions. The concept of Nash equilibrium (NE), developed under the assumption that the players are sufficiently rational to learn the strategies of their competitors accurately and to optimize their own strategy accordingly, plays a fundamental role in both classic game theory and evolutionary game theory [1, 8–13]. Furthermore, much effort has been devoted to investigating the RPS game using a variety of models, such as the “rock-paper-scissors dynamics model” [14, 15], the scale-free memory model [16], and the cyclic dominance model [17–19]. Note that RPS has no dominant strategy: rock is the best choice only when the opponent plays scissors, and every action is beaten by exactly one other action.

Convex optimization is a reliable and efficient optimization method that can obtain the globally optimal solution precisely when an appropriate algorithm is selected. It is frequently used to solve optimization problems because of its strong scalability and wide applications, which include electronic technology [20–23], software engineering [24–27], and machine learning [28]. It can be fully applied to game theory and sheds light on the optimal strategy in the iterated RPS game. When we check the payoff table for rock, paper, and scissors, we see that no pair of pure choices is in equilibrium: there is no outcome in which each player’s selection is the best response to the other player’s selection. So, there is no Nash equilibrium in pure strategies.

In this paper, we employ convex optimization (CVX for short) and the saddle point equation to optimize the two players’ payoff in the iterated RPS game, which has a considerable amount to offer in both theoretical research and practical applications. In practice, a player tries to make a move that yields either a win or a draw, so as not to lose. For instance, if an opponent throws scissors twice, you can suppose that they will not play it a third time in a row and will play either rock or paper. To verify the hypothesis of our study, we assign different values to the convex optimization model and then plot the results with simulation software to find the pertinent rule, which shows the feasibility of the convex optimization method.

Rock paper scissors (also known as ro-sham-bo, and by other orderings of the three items) is a hand game usually played between two people, in which each participant forms one of three shapes with an outstretched hand. The shapes are “rock” (a closed fist), “paper” (a flat hand), and “scissors” (a fist with the index finger and middle finger extended, forming a V). “Scissors” is the same as the two-fingered V sign (also denoting “victory” or “peace”), except that it is pointed horizontally instead of being held vertically.

The marginal contribution of our study is that we introduce the two-player non-zero-sum matrix to describe the payoff of players quantitatively as and ; we simulate the system with Newton’s algorithm. We use the saddle point equation to calculate how players can obtain the maximal payoff and find that the expected payoffs per round (EPRs) of player and player increase with the payoff . A simple history-based method takes the opponent’s previous moves into account to decide whether the opponent prefers one move over another. In essence, rock, paper, and scissors each have a “score,” and the score of the opponent’s move is increased after every round.
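The history-based scoring idea mentioned above can be sketched in a few lines. This is only an illustration under our own assumptions: the class name `FrequencyCounter`, the tie-breaking order, and the rule "answer the opponent's highest-scoring move with the move that beats it" are hypothetical choices and are not taken from the model studied in this paper.

```python
from collections import Counter

# BEATS[m] is the move that defeats m (paper beats rock, and so on).
BEATS = {"R": "P", "P": "S", "S": "R"}


class FrequencyCounter:
    """Hypothetical helper: keeps one score per move from the opponent's history."""

    def __init__(self):
        self.scores = Counter({"R": 0, "P": 0, "S": 0})

    def observe(self, opponent_move):
        # Raise the score of the move the opponent just played.
        self.scores[opponent_move] += 1

    def next_move(self):
        # Predict the opponent's highest-scoring move (ties broken in R, P, S
        # order) and answer with the move that beats it.
        predicted = max("RPS", key=lambda m: self.scores[m])
        return BEATS[predicted]


player = FrequencyCounter()
for move in ["S", "S", "R", "S"]:
    player.observe(move)
print(player.next_move())
```

Such a predictor is exploitable by an opponent who knows the rule, which is one reason the analysis in this paper works with mixed strategies instead.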

2. Model

For the sake of simplicity, we take a two-person non-zero-sum RPS game model as an example to study the noncooperative game system. In this game, each player plays innumerable rounds of the game and can only choose one action among R, P, and S in each round, as shown in Figure 1. The payoff is defined as the only parameter of the winning action in this game [1] (see Figure 2), and rational players just make decisions according to the value of . Two players get a unit payoff when they choose the same action. Furthermore, player will win with payoff while player is going to get zero payoff when player beats player , and vice versa.
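The payoff rules of this model can be written out as a pair of payoff matrices. The following is a minimal sketch, assuming that the winning payoff is denoted `a`, that the rows belong to one player and the columns to the other, and that the expected payoff per round of a pair of mixed strategies is the bilinear form of the corresponding matrix; the function names are hypothetical.

```python
ACTIONS = ("R", "P", "S")


def payoff_matrices(a):
    """Return (A, B): payoffs of the row player and of the column player.

    Entry [i][j] is the payoff when the row player uses ACTIONS[i] and the
    column player uses ACTIONS[j]. A tie pays 1 to each player, a win pays
    `a` to the winner, and a loss pays 0.
    """
    A = [[0.0] * 3 for _ in range(3)]
    B = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            if i == j:                # same action: unit payoff for both
                A[i][j] = B[i][j] = 1.0
            elif (i - j) % 3 == 1:    # P beats R, S beats P, R beats S
                A[i][j] = a
            else:                     # the column player wins
                B[i][j] = a
    return A, B


def expected_payoff(M, x, y):
    """Expected payoff per round x^T M y for mixed strategies x and y."""
    return sum(x[i] * M[i][j] * y[j] for i in range(3) for j in range(3))


a = 2.0
A, B = payoff_matrices(a)
uniform = [1 / 3, 1 / 3, 1 / 3]
epr_row = expected_payoff(A, uniform, uniform)
epr_col = expected_payoff(B, uniform, uniform)
print(epr_row, epr_col)  # both equal (1 + a) / 3 under uniform play
```

Because a tie pays both players while a decisive round pays only the winner, the two matrices do not sum to a constant unless `a` equals 2, which is why the model is described as non-zero-sum in general.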

During tournaments, players often prepare a sequence of three gestures before the tournament begins. Some tournament players use tactics to confuse or trick the other player into making an illegal move, which results in a loss. One such tactic is to call out the name of one move while throwing another, in order to misdirect the opponent.

3. Results

3.1. Convex Optimization Equation

The expected payoff per round (EPR) of player and the EPR of player are as follows: where

, , and denote the payoff probabilities of player , and , , and denote the payoff probabilities of player .

Compared with the traditional payoff function, our payoff equation is strictly convex. The convex optimization problem is described by (3) and (4).

Let , , and . With the incidence matrices, we can rewrite problems (3) and (4) as (5) and (6).

Here, player selects a strategy , while player selects a strategy . As (1)–(4) above are convex, all their optima are global optima. When two adversaries throw random gestures in rock paper scissors, each wins, loses, or draws with equal probability. The game would then be one of sheer luck, not skill, and certainly, if everybody could play perfectly at random, nobody could gain an advantage over anybody else.

In this paper, the system we study is a two-person non-zero-sum game, so it can be written as . In order to study it quantitatively, we suppose that player makes his decision first and player acts according to player ’s decision later. For rational players and , player wants to minimize , while player wants to maximize . Similarly, player wants to maximize , while player wants to minimize . As for the rules themselves, paper beats rock because it covers the rock: the rock is not harmed, only hidden from view. It should also be noted that the highly problematic assumptions about “rationality,” equilibrium solutions, information, and knowledge limit the effectiveness of game theory as an instrument for analysing real-world events.

Player ’s best defense is to use to minimize , while player should choose to maximize one of the payoffs of the system.

We define , , and , where denote the values of the expected payoff (gain), respectively.

When and player wants to minimize , then he should maximize and its coefficient . That is to say, when and , is the minimal payoff. From the inner minimization in (8), we have . The other two cases () proceed similarly. Two-person games are the simplest form of competitive situation. Such a game has only two players, and it is called a zero-sum game when one player’s win is the other player’s loss.

For , inner optimization can be described as follows: where denotes the vector that is all zeros except for a one in the th position, that is, deterministic strategy . These optimization expressions are to be compared with the following standard form of a convex optimization function:

According to a study presented at the annual conference of the Academy of Management, the most difficult aspect of making decisions is not finding the proper answer but having the fortitude to act on that information.

In this paper, we introduce a scalar variable representing the value of the inner minimization:
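The role of such a scalar can be illustrated on a simplified version of the problem. The sketch below assumes a bilinear payoff of the form x^T A y, with A built from the rules of Section 2 and an assumed winning payoff `a = 2`; it does not reproduce the payoff equation of this paper. Under that assumption the inner minimization over the opponent's mixed strategies is attained at a deterministic strategy, so its value is the smallest entry of x^T A. A coarse grid search stands in for a convex solver, and the function names are hypothetical.

```python
def worst_case_payoff(A, x):
    """Inner minimization: min over the opponent's pure strategies of (x^T A)_j."""
    n_cols = len(A[0])
    return min(sum(x[i] * A[i][j] for i in range(len(x))) for j in range(n_cols))


def maximin_on_grid(A, steps=30):
    """Coarse grid search over the probability simplex for the outer maximization."""
    best_t, best_x = float("-inf"), None
    for p in range(steps + 1):
        for q in range(steps + 1 - p):
            x = [p / steps, q / steps, (steps - p - q) / steps]
            t = worst_case_payoff(A, x)  # scalar value of the inner problem
            if t > best_t:
                best_t, best_x = t, x
    return best_t, best_x


a = 2.0
# Rows and columns are ordered R, P, S; entries are the row player's payoffs.
A = [[1.0, 0.0, a],
     [a, 1.0, 0.0],
     [0.0, a, 1.0]]
t_star, x_star = maximin_on_grid(A)
print(t_star, x_star)
```

In this simplified setting the search settles on the uniform strategy, and the scalar equals (1 + a) / 3, the payoff that the uniform strategy guarantees against every reply.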

Writing in matrix notation where , , , and

The symbol ≥ denotes being greater than or equal to, the symbol ≤ denotes being less than or equal to, and the symbol denotes being equivalent to. Thus, we have obtained the standard convex optimization equation of our model.

3.2. The Saddle Point Equation

The primal problem (11) is convex with convex payoff mainly decided by player .

The Lagrangian is

Thus, the dual function is

For each pair with , the Lagrange dual function gives us a lower bound on the optimal value of the optimization problem (15). Thus, we have a lower bound that depends on some parameters and . Problem (15) is translated into the optimization problem:

This is called the Lagrange dual problem corresponding to problem (15). The Slater condition [28] says that strong duality between (17) and (18) holds if the quadratic inequality constraints are strictly feasible; i.e., if there exists an with , . Strong duality between (17) and (18) will be proven in the next section.

Now, suppose the order of play is reversed; player chooses first, and then, player chooses . Following a similar argument, if the players follow the optimal strategy, player should choose to minimize , which results in a payoff of which is equivalent to where Equations (13) and (19) are standard convex optimization payoff functions of players and . Note that player ’s problem is dual to player ’s in game theory. Many strong algorithms have emerged from rock paper scissors programming competitions. For instance, Iocaine Powder, which won the First International RoShamBo Programming Competition in 1999, uses a heuristically designed compilation of strategies. For each strategy it employs, it also has six metastrategies, which defeat second-guessing, triple-guessing, second-guessing of the opponent, and so on.

The previous hypothesis is based on the sequence of how player and player make decisions. Now, player and player are making decisions at the same time. In comparison with Equations (13) and (19), we accordingly have

We call Equation (20) the saddle point equation. It is a description of the saddle point at which players get the maximal payoff. Equation (13) gives the left part of Equation (20), whereas (19) provides the right part of Equation (20).
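The meaning of a saddle point can be checked numerically on a simplified bilinear version of the game. In the sketch below, which is an illustration and not the payoff equation of this paper, the matrix A, the winning payoff `a = 2`, and the function names are assumptions. Any strategy of the maximizing player gives a lower bound on the value of the game, and any strategy of the minimizing player gives an upper bound; at a saddle point the two bounds coincide.

```python
def lower_bound(A, x):
    """Payoff that x guarantees to the maximizer: min_j (x^T A)_j."""
    return min(sum(x[i] * A[i][j] for i in range(3)) for j in range(3))


def upper_bound(A, y):
    """Cap that y enforces for the minimizer: max_i (A y)_i."""
    return max(sum(A[i][j] * y[j] for j in range(3)) for i in range(3))


a = 2.0
# Rows and columns are ordered R, P, S; entries are the row player's payoffs.
A = [[1.0, 0.0, a],
     [a, 1.0, 0.0],
     [0.0, a, 1.0]]

skewed = [0.5, 0.3, 0.2]
uniform = [1 / 3, 1 / 3, 1 / 3]

# For arbitrary strategies the lower bound does not exceed the upper bound.
print(lower_bound(A, skewed), upper_bound(A, skewed))
# At the uniform strategies the two bounds meet: max-min equals min-max.
print(lower_bound(A, uniform), upper_bound(A, uniform))
```

The gap between the two printed bounds for the skewed strategy shows how much either player gives away by departing from the saddle point.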

Similarly, looking at the payoff of player , we have another saddle point equation

The above equation also describes the saddle point at which players can achieve the maximal payoff of the system we study. Equations (13) and (20) represent the left part and the right part of Equation (21), respectively. Based on this, we denote the left and right parts of Equation (21) as

We use the strong duality theorem to prove Equation (20). The same principle can also be used to prove Equation (21).

To prove it, first, it is noted that

This means that we can write the optimal value of the primal payoff as

We also have by the definition of the dual function. Thus, strong duality can be described as the equality . satisfies the strong max-min property or the saddle point property [29].

3.3. Optimization Algorithm

With linear equality and inequality constraints reduced to a sequence of linear constraint problems, the optimization problem in standard form is as follows:

We have which are called the Karush-Kuhn-Tucker (KKT) conditions [29].

Because problem (15) is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal. Thus, we have

Then, and are primal and dual optimal, with zero duality gap. Interior-point methods solve problem (15) by applying Newton’s method to a sequence of equality constrained problems.

First, we translate problem (15) into an optimization problem, which states that the inequality constraints are implicit in the objective where is the indicator function for the nonpositive reals and is the indicator function of.

Then, we define the function to approximate the indicator function by using the barrier method, where is a parameter that determines the accuracy of the approximation. The function is convex and nondecreasing and takes on the value for . is differentiable and closed and increases to as increases to 0.
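The effect of the accuracy parameter can be seen in a small numerical sketch. It assumes the standard logarithmic approximation -(1/t) log(-u) of the indicator function of the nonpositive reals, with the parameter written as `t`; the function names are hypothetical.

```python
import math


def indicator_nonpositive(u):
    """Indicator of the nonpositive reals: 0 if u <= 0, +infinity otherwise."""
    return 0.0 if u <= 0 else math.inf


def log_barrier_approx(u, t):
    """Smooth approximation -(1/t) * log(-u); returns +infinity for u >= 0."""
    if u >= 0:
        return math.inf
    return -math.log(-u) / t


u = -0.5
for t in (1.0, 10.0, 100.0):
    # The approximation moves toward the indicator value 0 as t grows.
    print(t, log_barrier_approx(u, t), indicator_nonpositive(u))
```

A larger `t` gives a closer approximation at interior points, at the cost of a function that changes more sharply near the boundary.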

Subsequently, we replace with in (30).

The objective function here is convex, since is convex, increasing in , and differentiable [29]. We obtain the function with which is the logarithmic barrier for problem (15). Finally, we obtain the gradient and Hessian of the logarithmic barrier function :
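For linear inequality constraints the gradient and Hessian of the logarithmic barrier have a simple closed form, which the following sketch evaluates and checks against a finite difference. It assumes constraints written as G x <= h, so that the barrier is the negative sum of the logarithms of the slacks h_i - g_i^T x; the names `G`, `h`, and the helper functions are hypothetical, and the constraint set of problem (15) is not reproduced here.

```python
import math


def slack(g, hi, x):
    """Slack h_i - g_i^T x of one linear constraint g_i^T x <= h_i."""
    return hi - sum(gk * xk for gk, xk in zip(g, x))


def barrier(G, h, x):
    """phi(x) = -sum_i log(h_i - g_i^T x); +infinity outside the strict interior."""
    total = 0.0
    for g, hi in zip(G, h):
        s = slack(g, hi, x)
        if s <= 0:
            return math.inf
        total -= math.log(s)
    return total


def barrier_gradient(G, h, x):
    """grad phi(x) = sum_i g_i / (h_i - g_i^T x); x must be strictly feasible."""
    grad = [0.0] * len(x)
    for g, hi in zip(G, h):
        s = slack(g, hi, x)
        for k in range(len(x)):
            grad[k] += g[k] / s
    return grad


def barrier_hessian(G, h, x):
    """hess phi(x) = sum_i g_i g_i^T / (h_i - g_i^T x)^2; x strictly feasible."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for g, hi in zip(G, h):
        s = slack(g, hi, x)
        for r in range(n):
            for c in range(n):
                hess[r][c] += g[r] * g[c] / s ** 2
    return hess


# Constraints x1 >= 0, x2 >= 0, x1 + x2 <= 1 written as G x <= h.
G = [[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
h = [0.0, 0.0, 1.0]
x = [0.2, 0.3]

grad = barrier_gradient(G, h, x)
hess = barrier_hessian(G, h, x)

# Central finite difference as a sanity check on the first gradient entry.
eps = 1e-6
fd = (barrier(G, h, [x[0] + eps, x[1]]) - barrier(G, h, [x[0] - eps, x[1]])) / (2 * eps)
print(grad, fd)
print(hess)
```

These two quantities are what a Newton step on the barrier-augmented objective needs at each iteration of an interior-point method.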

3.4. Verifying Hypothesis

The hypothesis in our study is that the saddle point equation can still work very well even if changes with , but it is never affected by other factors. Assigning a specific value to in simulation software, such as , we can calculate the left part of the saddle point equation easily using the CVX software package. With our specified values, the calculation result of the left part of the saddle point equation is . The calculation result of the right part is . Assigning another specific value to in simulation software, such as , the calculation result of the left part of the saddle point equation is . The calculation result of the right part is . This means that, whether or , the left part of the saddle point equation equals the right part. As indicated in Figure 3, suppose , the optimal value of the EPR of player equals that of player , which increases with and approaches 0.95. Thus, we believe the numerical results are consistent with our theoretical hypothesis.

4. Conclusion

We have demonstrated in this paper how players can obtain the optimal payoff in the two-player iterated RPS game with the convex optimization method and the saddle point equation, and we have verified the hypothesis of the payoff equation with interior-point methods by using simulation software. Hence, it can be concluded that convex optimization is a feasible method to maximize the payoff in the two-player iterated RPS game regardless of other factors. Furthermore, the research method is operational, and the results of our study can be extended to game systems such as social cycling, species competition, elections, and economic issues and can provide insight into further related quantitative research.

Rock paper scissors (RPS) is not only a game popular with children but also a basic and classic model system for studying the mechanism of decision-making in noncooperative strategic interactions in depth. RPS is a topic of increasing interest and significance, for it helps improve our understanding of many complex competition issues (species divergence, price cycling, human decision-making, rationality and cooperation, and so on). It is also a starting point for entering the interdisciplinary field between statistical physics and game theory [9].

As stated in Bi and Zhou’s paper [8], cooperation in a finite-population RPS game system with more than two players may be much more difficult and complex to achieve than in the case of only two players. In this paper, we provide only the simplest model to probe the optimal strategy in the iterated RPS game; much more complicated related research needs to be carried out in the future.

Data Availability

The figures used to support the findings of this study are included in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

W.J.L. was responsible for conceptualization. Y.F.L. was responsible for the software. Q.Q. and Y.F.L. were responsible for writing the original draft. Q.Q. was responsible for writing, review, and editing. All authors read and agreed to the published version of the manuscript.

Acknowledgments

The work was supported by the Medical-engineering Interdisciplinary Project funded by University of Shanghai for Science and Technology (No. 1020308412).