Abstract

A long-standing problem in biology, economics, and social sciences is to understand the conditions required for the emergence and maintenance of cooperation in evolving populations. This paper investigates how to promote the evolution of cooperation in the Prisoner’s Dilemma game (PDG). Differing from previous approaches, we not only propose a tag-based control (TBC) mechanism but also examine how TBC can successfully promote the evolution of cooperation. The effect of TBC on the evolutionary process of cooperation shows that it can both reduce the payoff of defectors and inhibit defection. However, when the cooperation rate is high, TBC also reduces the payoff of cooperators unless the identified rate of the TBC is large enough. An optimal timing control (OTC) of switched replicator dynamics is designed that accounts for the control costs, the cooperation rate at terminal time, and the cooperator’s payoff. The results show that the switching control (SC) between an optimal identified rate control of the TBC and no TBC can not only maintain a high cooperation rate but also greatly enhance the payoff of the cooperators. Our results provide valuable insights for clusters, for example, logistics parks and governments, when deciding how to promote cooperation.

1. Introduction

Animal Dispersion in Relation to Social Behavior, published by Wynne-Edwards, looked specifically at the “evolution of cooperation” [1]. Why should an individual help another individual who is a potential competitor in the struggle for survival? This question is listed by Science magazine as one of the 25 core problems among the 125 scientific challenges proposed by scientists from all over the world.

The evolution of cooperation is an enduring conundrum in biology, mathematics, and social sciences. Natural selection opposes cooperation unless some mechanisms are at work to promote the evolution of cooperation. The Prisoner’s Dilemma game (PDG), which represents an extreme case, has emerged as one of the most promising mathematical areas in the study of cooperation.

In the PDG, two players simultaneously decide whether to cooperate ($C$) or to defect ($D$). If one player cooperates, the other player can choose between cooperation, which yields $R$ (the reward for mutual cooperation), or defection, which yields $T$ (the temptation to defect). On the other hand, if one player defects, the other player can choose between cooperation, which yields $S$ (the sucker’s payoff), or defection, which yields $P$ (the punishment for mutual defection), where $T > R > P > S$ and $2R > T + S$. The donor-recipient game (DRG) is a special case of the PDG. In the DRG, a cooperator is someone who pays a cost, $c$, for another individual to receive a benefit, $b$. A defector has no cost and does not deal out benefits. By comparing the PDG and DRG, we obtain $R = b - c$, $S = -c$, $T = b$, and $P = 0$. If the game is only played once, then each player gets a higher payoff from $D$ than from $C$, regardless of what the other player does. So, natural selection implies competition and, therefore, opposes cooperation unless a specific mechanism is at work.
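The mapping from the DRG to the PDG can be checked numerically. The following is a minimal sketch, assuming the standard notation $R$, $S$, $T$, $P$ for the four PDG payoffs and $b$, $c$ for the DRG benefit and cost; the function names and the sample values are illustrative only.

```python
def drg_payoffs(b, c):
    """Map a donor-recipient game (benefit b, cost c) to PDG payoffs (R, S, T, P)."""
    reward = b - c        # both pay the cost and both receive the benefit
    sucker = -c           # pays the cost, receives nothing
    temptation = b        # receives the benefit, pays nothing
    punishment = 0.0      # nobody pays, nobody receives
    return reward, sucker, temptation, punishment


def is_prisoners_dilemma(R, S, T, P):
    """Check the two usual PDG conditions: T > R > P > S and 2R > T + S."""
    return T > R > P > S and 2 * R > T + S


if __name__ == "__main__":
    payoffs = drg_payoffs(b=3.0, c=1.0)
    print(payoffs, is_prisoners_dilemma(*payoffs))
```

Under these assumptions, any DRG with $b > c > 0$ satisfies both conditions, while a DRG with $c > b$ does not.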

Based on Hamilton’s rule [2], natural selection can favor cooperation if the donor and the recipient of an altruistic act are genetic relatives [3–7]. The donor may preferentially donate a benefit to the recipient through “kin recognition”. Explanations of cooperation between nonkin include “direct reciprocity” [8–13], “indirect reciprocity” [14–16], “network reciprocity” [17], “group selection” [18], “optional participation mechanism” [19, 20], and “punishment mechanism” [21–23].

Direct reciprocity was proposed by Trivers [8] and developed as a mechanism for the evolution of cooperation [9, 24]. In the repeated Prisoner’s Dilemma, the same two individuals repeatedly encounter each other for some rounds. If one cooperates now, the other may cooperate later. Hence, he or she might cooperate [10]. Axelrod and Hamilton proposed a model of the evolution of cooperation based on the iterated Prisoner’s Dilemma [11]. In two computer tournaments, Axelrod discovered that the “winning strategy” was the simplest of all, tit-for-tat (TFT). TFT is a program in which a player cooperates on the first move of the game and, then, plays whatever the other player chose in the previous move. This simple concept captured the fascination of many enthusiasts of the repeated Prisoner’s Dilemma, and a number of empirical and theoretical studies were inspired by Axelrod’s groundbreaking work [12, 13, 25]. The presence of the decoy increased willingness of volunteers to cooperate in the first step of each game, leading to subsequent propagation of such willingness by (noisy) tit-for-tat [25]. Reference [26] pointed out that resilient cooperators could sustain cooperation indefinitely in the finitely repeated Prisoner’s Dilemma. The Prisoner’s Dilemma experiment in [27] showed that the termination rules and the (expected) length of the game significantly increased cooperation.
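The TFT rule described above can be sketched in a few lines. This is an illustrative toy, not a reproduction of Axelrod's tournaments: the payoff values (5, 3, 1, 0) and all function names are assumptions chosen for the example.

```python
# Illustrative payoffs: (row player's payoff, column player's payoff).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, opponent_history):
    """Cooperate on the first move, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(own_history, opponent_history):
    """Defect unconditionally."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play an iterated game and return the total payoffs of both players."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat, 10))
    print(play(tit_for_tat, always_defect, 10))
```

Against an unconditional defector, TFT is exploited only in the first round and defects afterwards, which is why the repeated setting can protect cooperators.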

Indirect reciprocity does not rely on repeated encounters between the same two individuals. An individual can establish a good reputation by helping someone, which can increase the chance of receiving help from others. Indirect reciprocity has substantial cognitive demands. For example, language is needed to gain the information and spread the gossip associated with indirect reciprocity, so that only humans seem to engage in the full complexity of the game. Differing from indirect reciprocity based on reputation, kin can be recognized through familiarity based on environmental or learned cues or through pattern matching based on some inherited trait. If individuals display a heritable marker or a tag, they preferentially cooperate with partners who share their own marker. So, tag-based donation does not have the substantial cognitive demands that a reputation mechanism based on indirect reciprocity has. Cooperation can also arise when individuals donate to others who are sufficiently similar to themselves in some arbitrary characteristic. Such a characteristic, or “tag,” for example, a green beard [28, 29], can be observed by others. Hamilton illustrated how this mechanism worked even when individuals were not genealogical kin [2]. This is known as the green beard effect. For tag-based donations, it is not necessarily required to remember past encounters. In recent years, the green beard effect has become a favorite topic among both evolutionary biologists and sociologists [30–36]. For example, Tian et al. [30, 31] proposed Vcash as a reputation framework for identifying denial of traffic service to resolve the trustworthiness problem, where the result of verified traffic event notification acted as a “tag”. Taylor and Nowak proposed nonuniform interaction rates (interaction rates dependent on the strategies) that allowed the coexistence of cooperators and defectors in the Prisoner’s Dilemma [37].
The analytical models of evolution of cooperation were given when nonuniform interaction rates were introduced [38]. Both “kin recognition” and “tag-based donation” lead to nonuniform interaction rates.

By combining evolutionary processes with differential equations, the links between stability and optimization have been researched and a mathematical framework has been built [39, 40]. In multiagent systems, an individual is regarded as an agent who autonomously regulates his behavior according to his benefit. Agents play games with their neighbors through local interaction. The evolution of cooperation is an adaptive coordinated control process which is a controllable, intelligent, and autonomous decision process. From the perspective of evolutionary games, the following objectives, inter alia, have been achieved through the designing of the control law: optimizing individual cost function [39], designing consensus control of stochastic multiagent systems [41], and promoting the evolution of cooperation in social dilemmas [23, 38].

Motivated by the abovementioned discussion, we propose a tag-based control (TBC) that combines direct reciprocity, indirect reciprocity, tag-based donation, and optimal control theory. The goal of this paper is to study the feasibility of promoting cooperation by TBC and to design an appropriate identified rate for operators. As part of this, we compare the effectiveness and drawbacks of identifying cooperation and of identifying defection for promoting the evolution of cooperation. Using optimal control theory, we examine whether an operator should apply TBC mechanisms for the evolution of cooperation or temporarily give up TBC to increase the group’s payoff. We show that an operator can design an optimal timing control (OTC) to optimize the control costs, the cooperation rate, and the cooperator’s payoff.

The main contributions of this paper are as follows. (1) For the first time, a TBC that promotes the evolution of cooperation in the Prisoner’s Dilemma is proposed. (2) Advantages and disadvantages of identifying cooperation and identifying defection in TBC are investigated. (3) An OTC is designed to better promote cooperation rates and greatly enhance the group’s payoff. (4) From a research standpoint, this work contributes to evolutionary game theory and optimal control theory.

The paper is organized as follows. In Section 2, we obtain the payoff matrix and the replicator dynamics of the Prisoner’s Dilemma with TBC. By applying the results of Section 2, we discuss the problem of evolution of cooperation in the Prisoner’s Dilemma with TBC in Section 3. In the next two sections, the optimal TBC is designed. We design the optimal identified rate of promoting cooperation in Section 4. The effectiveness of TBC is illustrated and an OTC is given in Section 5. We provide some concluding remarks and discussion in the last section.

2. The Replicator Dynamics of the Prisoner’s Dilemma with TBC

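In the Prisoner’s Dilemma game, two individuals can each either cooperate ($C$) or defect ($D$). The payoff matrix of the Prisoner’s Dilemma is as follows:
$$\begin{array}{c|cc} & C & D \\ \hline C & R & S \\ D & T & P \end{array} \tag{1}$$
where
$$T > R > P > S, \tag{2}$$
$$2R > T + S. \tag{3}$$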

No matter what the other does, the selfish choice of defection yields a higher payoff than cooperation, so a player will choose defection regardless of whether his opponent cooperates or defects. However, if both defect, both get $P$ rather than the larger value $R$ that they both could have gotten if they had both cooperated. In other words, if both defect, both do worse than if both had cooperated. Hence, the game poses a dilemma. In repeated Prisoner’s Dilemma games, if constraint (3) is not satisfied, the social efficiency of all players always choosing cooperation is lower than that of players agreeing to alternate between cooperation and defection.

Let be a group of game participants, where is a sufficiently large natural number. Consider the symmetric static games of complete information between two individuals. The pure strategy set for each individual in group is denoted by . For notational convenience, we will label every individual’s pure strategies by positive integers. Hence, the pure-strategy set of each individual is written as . A vector of pure-strategy profile is denoted as , where is a pure strategy for individual . The pure strategy space is , where is the Cartesian product of the pure strategy set . A probability distribution over the pure-strategy set of an individual is defined as a hybrid strategy for the individual , where and are the probabilities assigned to the individual’s pure strategy and , respectively. We denote . The hybrid strategy space of each individual is the simplex , where is a subset of the two-dimensional vectors with all elements being positive. At any point of time , let and be the rates at which individuals choose strategy and in group , respectively. Then, the corresponding group state is , where and . Therefore, the group state is formally equivalent to the hybrid strategy. Because the state is completely determined by its component or , this paper discusses the variable only.

In the group, the individuals who choose defection have a higher average fitness than the others. Therefore, selection acts to increase the relative abundance of defectors. After some time, cooperation will vanish from the group. So, defection is the only evolutionarily stable strategy (ESS) unless a specific mechanism is at work.
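The decline of cooperation without any supporting mechanism can be illustrated with the standard two-strategy replicator equation, $\dot{x} = x(1 - x)(f_C - f_D)$. The sketch below is an assumption-laden illustration: $x$ denotes the cooperation rate, $f_C$ and $f_D$ the expected payoffs of cooperators and defectors under uniform random matching, and the forward-Euler step size and payoff values are arbitrary choices.

```python
def replicator_step(x, R, S, T, P, dt):
    """One forward-Euler step of dx/dt = x (1 - x) (f_C - f_D)."""
    f_c = x * R + (1 - x) * S   # expected payoff of a cooperator
    f_d = x * T + (1 - x) * P   # expected payoff of a defector
    return x + dt * x * (1 - x) * (f_c - f_d)


def simulate(x0, R, S, T, P, dt=0.01, steps=5000):
    """Return the trajectory of the cooperation rate starting from x0."""
    x = x0
    path = [x]
    for _ in range(steps):
        x = replicator_step(x, R, S, T, P, dt)
        path.append(x)
    return path


if __name__ == "__main__":
    # Donor-recipient payoffs with b = 3, c = 1 (illustrative values).
    path = simulate(0.9, R=2.0, S=-1.0, T=3.0, P=0.0)
    print(path[0], path[-1])
```

Even when 90% of the group cooperates initially, the cooperation rate decreases monotonically toward zero, because $f_D > f_C$ for every interior state of a Prisoner's Dilemma.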

2.1. The Game under Labeling

This paper studies the special mechanisms for the evolution of cooperation in some clusters in the economic society when faced with the Prisoner’s Dilemma, for example, in logistics parks. For such a large number of individuals from different sources, it is difficult to build mutual trust because of the high cost of fully understanding and accurately knowing each other’s willingness to cooperate. In this case, the conditions required by mechanisms for promoting the evolution of cooperation, such as direct reciprocity, indirect reciprocity, punishment mechanism, and nonuniform interaction, are difficult to meet. A third party in the cluster, though, can collect information and reward or punish enterprise behavior to promote mutual recognition efficiently. Examples of such third parties are park managers, platform promoters, park management committees, industry associations, and so on.

In view of this, we assume that there exists a controller (an operator) outside of the group who can identify defection and cooperation. The controller labels every individual as a “cooperator” or a “defector”. The tags assigned by the controller affect the strategy choice of individuals in the following games until the next identification.

Definition 1. A person or an organization is called a controller if he/she can affect the strategy choice of individuals by identifying the actions of individuals in past games and revealing the results of the identification to the group.
The identification implemented by the controller can be regarded as tagging individuals. Strategy choices of all individuals in group are decided by their tags in the following games.

Definition 2. One round game is defined as a game set which consists of a game identified by a controller and all games tagged by this identification.
Assume the number of rounds is infinite. In one round, every individual plays period games, where is a positive integer. The first period is identified by the controller in one round. All individuals are tagged before the second period game begins. In the following periods, all individuals know their tags and can distinguish tags of others.

Assumption 1. The individuals are boundedly rational, and any individual encounters other individuals randomly with equal probability.
This paper assumes that the individuals (enterprises or persons) are in certain clusters (such as logistics parks) or bilateral platforms in the economic society. The indirect reciprocity has substantial cognitive demands for players. The tag mechanism for the evolution of cooperation demands that individuals own and display a heritable marker or a tag. Due to the large number of individuals in the cluster, the interaction between them is random, and it is difficult to form long-term repeated games. Therefore, it is impossible to establish a direct reciprocity mechanism based on repeated games. In the early stage of the formation of clusters or bilateral platforms, due to the lack of mutual understanding and trust relationship between individuals, the cost of collecting information and displaying individual reputation is very high, so indirect reciprocity and tag mechanism cannot work unless a person or organization helps the players. We define this person or organization as the controller (Definition 1). The identification implemented by the controller can be regarded as tagging individuals. In the first period of a round game, the controller identifies individuals and displays the results.
Based on Assumption 1, for each individual, the probability of interacting with a cooperator or a defector is or , respectively.

Definition 3. Define an individual as a cooperator if he/she chooses cooperation without control.
At time , the rate of cooperators in the group is . So, the state is defined as the group cooperation rate. The cooperator defined by Definition 3 can be equated with the individual who always chooses cooperation without control based on Assumption 1. Obviously, an individual is a defector if he is not a cooperator under Definition 3.

Assumption 2. Assume that the identification service is not perfect. The probabilities that the cooperators and defectors are correctly identified by the controller are and , respectively, where .
Based on Assumption 2, is called the cooperation identified rate and is called the defection identified rate. We denote the tags of “cooperator” and “defector” as “” and ““. In one round, is divided into 4 parts noted by . and are sets of cooperators and defectors labeled , respectively. Analogously, and are sets of cooperators and defectors labeled as , respectively. Furthermore, , where is an empty set. By denoting individual labeled as , one can know that , , , and . According to Assumption 2, , , , and , where indicates the probability of event happening.

Assumption 3. In every round game, cooperators choose cooperation and defectors choose defection in the first period. In the following periods, cooperators labeled as defectors and all defectors always choose defection. A cooperator labeled as a cooperator chooses cooperation if he/she encounters an individual labeled as a cooperator and chooses defection if he/she encounters an individual labeled as a defector.

Definition 4. The control by which a controller dominates individuals’ strategy choice through tagging is called TBC. An individual identified as a cooperator is labeled as a cooperator, and an individual identified as a defector is labeled as a defector by the controller. The TBC rule is designed as follows: cooperators labeled as defectors and all defectors always choose defection. A cooperator labeled as a cooperator chooses cooperation if he/she encounters an individual labeled as a cooperator and chooses defection if he/she encounters an individual labeled as a defector.
Based on Assumption 3, the controller can dominate an individual’s strategy choice through tagging individuals. At time , the cooperation rate is in group . In the first period, the ratio of to total strategies is equal to the cooperation rate . In the following periods, an individual’s strategy is decided by tags according to Assumption 3. The ratio of to total strategies is not the cooperation rate but . One can know that .
Assumption 3 is reasonable by Assumption 1 and Definition 1. From the abovementioned discussion, we know that not only denotes cooperation but also denotes a cooperator and not only denotes defection but also denotes a defector. Numbers of individuals in , and are , , , and , respectively. Based on Assumption 2 and 3, in the first period, cooperators choose and defectors choose . In the following periods, two individuals choose , if and only if they are in . The strategy pair is contributed. When one player from encounters a player from , a cooperator chooses and a defector chooses . In addition to the abovementioned two cases, the strategy pair will be formed. According to Taylor and Nowak [31] and Dong et al. [32], the probabilities of , , and are , , and with TBC. The larger and means the higher probability of and the lower probability . So, the larger and are in accordance with the better effect of promoting the evolution of cooperation. But, under uniform interaction rates, the corresponding probabilities are and , respectively, so an individual’s encounter constrained by TBC no longer abides by uniform interaction rates.
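The partition of the group by tag and the resulting strategy-pair probabilities in the later periods can be sketched as follows. The symbols are assumptions introduced for this example, since they are not recoverable from the text: $x$ is the cooperation rate, $p$ the cooperation identified rate, and $q$ the defection identified rate. The pair probabilities follow from Assumptions 1 to 3 as read here: mutual cooperation requires two correctly tagged cooperators, and unilateral defection requires a correctly tagged cooperator meeting a defector mis-tagged as a cooperator.

```python
def tag_partition(x, p, q):
    """Population shares of the four (type, tag) classes under imperfect tagging."""
    return {
        "cooperator_tagged_cooperator": x * p,
        "cooperator_tagged_defector": x * (1 - p),
        "defector_tagged_cooperator": (1 - x) * (1 - q),
        "defector_tagged_defector": (1 - x) * q,
    }


def pair_probabilities(x, p, q):
    """Probabilities of the strategy pairs (C,C), (C,D), (D,D) for a random encounter."""
    shares = tag_partition(x, p, q)
    trusted_cooperators = shares["cooperator_tagged_cooperator"]
    disguised_defectors = shares["defector_tagged_cooperator"]
    prob_cc = trusted_cooperators ** 2
    # Factor 2: either of the two players can be the exploited cooperator.
    prob_cd = 2 * trusted_cooperators * disguised_defectors
    prob_dd = 1 - prob_cc - prob_cd
    return prob_cc, prob_cd, prob_dd


if __name__ == "__main__":
    print(pair_probabilities(0.5, 0.8, 0.6))
    # p = 1 and q = 0 tags everyone as a cooperator and recovers uniform matching.
    print(pair_probabilities(0.5, 1.0, 0.0))
```

With $p = 1$ and $q = 0$ the probabilities reduce to $x^2$, $2x(1 - x)$, and $(1 - x)^2$, the values expected under uniform interaction rates.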
Based on Assumptions 2 and 3 and Definition 4, cooperation identified rate and defection identified rate reflect the accuracy of TBC. We denote as the probability of another encounter between the same two individuals in the repeated Prisoner’s Dilemma and as the probability of knowing someone’s reputation in indirect reciprocity. According to [31], the condition that direct reciprocity (indirect reciprocity) can lead to the evolution of cooperation depends on the parameters of payoff matrix (1) and . But, by Assumption 1, is very small because the group size, , is large enough. On the other hand, the high cost of fully understanding and accurately knowing each other’s willingness to cooperate leads to that is very small also. So, direct reciprocity and indirect reciprocity cannot promote cooperation in the setting that this paper considers. In every round game, cooperation identified rate and defection identified rate can act as of direct reciprocity or of indirect reciprocity.

Assumption 4. The evolution of group state is carried out according to the replicator. An individual replication occurs after each round game ends and before the next round begins. An individual’s payoff refers to the average total payoff in this round game.
Assumption 4 means that the offspring of cooperators are still cooperators and the offspring of defectors are still defectors. So, the offspring of cooperators or defectors do not inherit their parents’ tags. By Assumption 3, defectors always choose defection, but a cooperator does not always choose cooperation.
“Tag” does a mapping between and , i.e., . By Assumption 2, , , , and .
By calculation, the payoff of cooperator (defector) in the period of the round game is independent of but determined by the cooperation rate . Let denote the payoff of a cooperator (defector) in the period. Let denote total payoff of the cooperator (defector) in one round game; we getLet denote the payoffs of the cooperator (defector) labeled in the period game; we getBy Assumptions 1 and 2, the probability of anyone encountering a cooperator labeled is and the probability of anyone encountering a cooperator labeled is ; the probability of anyone encountering a defector labeled or is or . One can getTogether, (4)–(7) determine and as follows:By Assumption 3, an individual’s average payoff in the round game is .
Summing up the abovementioned discussion, the total payoff matrix (1) of a round game under TBC isIn fact, in the latter periods, the hybrid strategy and the group state are not equivalent because a cooperator constrained by Assumption 3 does not always choose . So, the payoff in payoff matrix (9) does not indicate the payoff of individual encountering strategy , where . However, we still use payoff matrix (9) for the following two reasons: (1) the payoff of individual is still linear with the group state ; (2) a cooperator’s average total payoff under TBC is if he enters a group of cooperators or is if he enters a group of defectors. Similarly, a defector’s average total payoff under TBC is if he enters a group of cooperators or is if he enters a group of defectors.
If , i.e., all individuals are identified as cooperators, the game with payoff matrix (9) is equivalent to a Prisoner’s Dilemma repeated times with payoff matrix (1). By Assumption 3, cooperators always choose and defectors always choose .

2.2. The Replicator Dynamics with TBC

At time , we suppose the number of cooperators is and the number of defectors is . In the next , under Assumption 4, the cooperators reproduce offspring and the defectors reproduce offspring. The total number of death is . The number of cooperators death is , and the number of defectors death is . Then, . Then, the replicator dynamics under TBC is as follows:

Assumption 5. The Prisoner’s Dilemma tagged repeats enough periods, i.e., .
By Assumption 5, the total payoff matrix (9) in a round game approximately equals the following matrix:The replicator dynamics (10) approximates the following dynamics:

Definition 5. (See [42]). A state is an ESS if for all other states either (i) or (ii) and , where denotes the individual’s payoff with strategy under the strategy pair .

Definition 6. (See [43]). A fixed point in replicator dynamics is an evolutionary equilibrium (EE) if it is locally asymptotically stable, i.e., every open neighborhood of the point has the property that every path starting sufficiently close to remains in and converges asymptotically to .

Definition 7. (See [43]). The largest open set of points whose evolutionary paths converge to a given EE is called its basin of attraction.
In order to simplify the calculation, is assumed in the following discussion. Doing so would result in a simpler model, without significant qualitative or directional changes to our key results about evolution of cooperation. Obviously, the parameters , and satisfy . Based on (11), the payoff matrix for a period under TBC is transformed as follows:
Regardless of time scale, does not affect either ESS or EE and does not even affect the evolutionary path of a replicator’s dynamics, so we can rewrite the replicator dynamics (12) as follows:
Two specific instances are given in the following.
Case 1. Only the defectors are identified, and let . is divided into two parts. One is the set of defectors tagged . The other is the set of individuals without a tag. The corresponding part of Assumption 3 is modified as follows: a cooperator chooses if he encounters an individual without a tag but chooses if he encounters an individual tagged . Defectors always choose defection. In this case, the payoff matrix is as follows:
Case 2. Only the cooperators are identified, and let . is divided into two parts. One is the set of cooperators tagged . The other is the set of individuals without a tag. The corresponding part of Assumption 3 is modified as follows: the cooperator chooses if he encounters an individual with tag and chooses if he encounters an individual without a tag. Defectors always choose defection. In this case, we obtain the payoff matrix as follows:
By comparing matrices (13) and (15), we see that Case 1 is equivalent to cooperation identified rate in matrix (13), i.e., the controller accurately identifies cooperators. It follows that Case 1 is in accordance with control based on identifying defection. Correspondingly, Case 2 means the controller identifies defectors accurately.
From the operator’s perspective, the relevant decisions are (1) whether to design TBC; (2) identifying which individual is more effective, cooperator or defector; and (3) how to design cooperation identified rate and defection identified rate .
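A per-period payoff matrix under TBC and the corresponding replicator dynamics can be sketched as below. This is a reconstruction under stated assumptions, not the paper's matrix (13): $p$ and $q$ denote the cooperation and defection identified rates, $x$ the cooperation rate, and the entries are derived from Assumption 3 as read here (two cooperators cooperate mutually with probability $p^2$; a cooperator is exploited by a defector with probability $p(1 - q)$; every other encounter ends in mutual defection). The dynamics use the standard replicator form with a forward-Euler step.

```python
def tbc_payoff_matrix(R, S, T, P, p, q):
    """Expected per-period payoffs ((C vs C, C vs D), (D vs C, D vs D)) under tagging."""
    mutual = p * p            # both cooperators carry the cooperator tag
    exploited = p * (1 - q)   # tagged cooperator meets a defector mis-tagged as cooperator
    return (
        (mutual * R + (1 - mutual) * P, exploited * S + (1 - exploited) * P),
        (exploited * T + (1 - exploited) * P, P),
    )


def simulate_tbc(x0, R, S, T, P, p, q, dt=0.01, steps=5000):
    """Forward-Euler trajectory of dx/dt = x (1 - x) (f_C - f_D) under TBC."""
    (a_cc, a_cd), (a_dc, a_dd) = tbc_payoff_matrix(R, S, T, P, p, q)
    x = x0
    path = [x]
    for _ in range(steps):
        f_c = x * a_cc + (1 - x) * a_cd
        f_d = x * a_dc + (1 - x) * a_dd
        x = x + dt * x * (1 - x) * (f_c - f_d)
        path.append(x)
    return path


if __name__ == "__main__":
    params = dict(R=2.0, S=-1.0, T=3.0, P=0.0, p=0.9, q=0.9)
    print(simulate_tbc(0.5, **params)[-1])    # starts above the threshold
    print(simulate_tbc(0.01, **params)[-1])   # starts below the threshold
```

With these illustrative values the payoff difference is $f_C - f_D = 1.44x - 0.09$, so cooperation spreads from any initial rate above roughly 0.0625 and dies out below it. Setting $p = 1$ and $q = 0$ recovers the uncontrolled payoff matrix.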

3. Evolution of Cooperation for the Prisoner’s Dilemma with TBC

3.1. Short-Term Effects Caused by TBC

Based on Assumption 3, an individual chooses a strategy according to a tag. Can a tag improve the correctness of the choice?

Proposition 1. Let denote the individual labeled , where . We denote as the probability that the individual tagged is a cooperator and as the probability that the individual tagged is a defector. Then, and , if and only if .

Proof. According to Assumption 2, and . Denote the probability of an individual labeled as . Obviously, and . So, one can obtain posterior probability and as follows:Thus, and , if and only if .
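The posterior probabilities in this proof follow from Bayes' rule and can be checked numerically. The sketch below assumes the notation $x$ for the cooperation rate, $p$ for the cooperation identified rate, and $q$ for the defection identified rate; these symbols and the function name are introduced for illustration and require $0 < x < 1$ with both tags occurring with positive probability.

```python
def posterior_given_tag(x, p, q):
    """Return (P(cooperator | cooperator tag), P(defector | defector tag))."""
    prob_tag_cooperator = x * p + (1 - x) * (1 - q)
    prob_tag_defector = x * (1 - p) + (1 - x) * q
    post_cooperator = x * p / prob_tag_cooperator
    post_defector = (1 - x) * q / prob_tag_defector
    return post_cooperator, post_defector


if __name__ == "__main__":
    x = 0.5
    # p + q > 1: each tag is more informative than the prior.
    print(posterior_given_tag(x, p=0.8, q=0.6))
    # p + q < 1: each tag is less informative than the prior.
    print(posterior_given_tag(x, p=0.3, q=0.4))
```

In this notation, the tag raises the probability of a correct guess above the prior ($x$ for cooperators, $1 - x$ for defectors) exactly when $p + q > 1$, which is how the condition of Proposition 1 is read here.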
Now, we discuss an individual’s payoff with TBC. Let be the payoff of individual with TBC and be the payoff of individual without TBC, where .

Proposition 2. If , then .

Proof. From matrices (1) and (13), one can obtainIt follows from (18) and (19) that and are all linear to . Since and , by , it can be derived thatBy and , if , then . So, we getIn summary, if .
Based on Proposition 1, TBC can help cooperators not only identify cooperators but also reduce the risk of unilateral fraud by defectors if . According to Proposition 2, for any cooperation rate , TBC satisfying can inhibit defection. But, Propositions 1 and 2 do not mean TBC can promote a cooperator’s payoff even when . This may seem counterintuitive; Propositions 3 and 4 make the point precise.

Proposition 3. We denote , where ; . With increasing cooperation rate , some properties of the individual’s payoff are as follows:(i)(ii)

Proof. From (18) and (19), according to , we get and . Combining with and , we derive . Similarly, we can get . So, with increases in cooperation rate , the payoff of all individuals with TBC increases less than their payoff without TBC.
By Proposition 3, every individual’s payoff increases in the cooperation rate whether or not TBC is applied. But, an individual’s payoff increases more slowly in the cooperation rate with TBC than without TBC. In this sense, TBC reduces group efficiency when the group cooperation rate increases. Why do we design TBC in the Prisoner’s Dilemma? In fact, without special mechanisms for promoting the evolution of cooperation, the group cooperation rate will always gradually decrease until cooperators are completely expelled from the group. So, the effect of TBC on the payoff of individuals and the evolution of cooperation is complex. The effect of TBC on the cooperator’s payoff and cooperation evolution involves , parameters , and group cooperation rate . We denote , .

Proposition 4. The effects of TBC on group cooperation rate and an individual's payoff are as follows:
(i) For any , the defector’s payoff with TBC is always lower than its payoff without TBC.
(ii) A cooperator's payoff with TBC is higher than its payoff without TBC when the group cooperation rate is lower than but is lower than its payoff without TBC when is higher than , i.e., , where is a sign function of .
(iii) For any , the cooperator's payoff increases in and the defector's payoff increases in but decreases in , i.e., , , and ; where ,. However, the cooperator's payoff increases in when the group cooperation rate is higher than but decreases when is lower than , i.e., .
(iv) If , then .

Proof. From (18) and (19), for any and , and . when and when . Conclusions (I) and (II) are proved.
Furthermore, one can obtain , , and . From , when and when Based on , if , then . Conclusions (III) and (IV) are proved.
By applying Proposition 4, for any , the TBC increases or decreases the cooperator’s payoff depending on the group cooperation rate and . Fortunately, the critical point satisfies . Furthermore, we obtainSo, the TBC can always improve the cooperator’s payoff if cooperation identified rate is large enough. Furthermore, for any , we can design an appropriate , such that x1. Applying (I) and (II) of Proposition 4, an appropriate TBC can raise the cooperator’s payoff and reduce the defector’s payoff, thus increasing the group cooperation rate rapidly.
According to (III) of Proposition 4, for any , with increasing, the cooperator’s payoff increases and defector’s payoff reduces. An interesting phenomenon is for any , i.e., a defector’s payoff will always increase in . Proposition 4 shows that the effect of cooperation identified rate on an individual’s payoff is more complex than defection identified rate . The cooperator’s payoff increases in when the group cooperation rate is higher than the critical value but decreases in when . Based on (II), TBC raises the cooperator’s payoff only when . So, only when , TBC promotes the cooperator’s payoff, and larger results in larger cooperator’s payoff.
According to (IV), if , . Furthermore, , i.e., the cooperator’s payoff increases in cooperation identified rate if defection identified rate is large enough. At the same time, if , then the cooperator’s payoff increases in for any . On the other hand, increasing can also increase the defector’s payoff because the probability of the strategy pair is . Thus, increasing increases the probability that a defector gains the temptation payoff!
According to (III) of Proposition 4, for a given defection identified rate , if the group cooperation rate is very low, for example, , then increasing cooperation identified rate will not only harm cooperators but also help defectors to get higher payoffs. A larger will doubtlessly cause great damage to the group if the defection identified rate and the group cooperation rate are both small.
Denoting the group’s expected payoff without TBC as and with TBC as , we get the following proposition.

Proposition 5. If , then, for any given and any , the group’s expected payoff without TBC is higher than that with TBC, i.e.,

Proof. Applying the payoff matrix (13), we get and . If , it yieldsThis completes the proof.
Proposition 5 is simple but interesting. It shows that the TBC reduces the group’s average payoff when the cooperator’s loss , caused by defection, is less than the defector’s gain . Furthermore, this conclusion is not affected by , , and . Furthermore, we obtain
When , based on (3), . By letting , we can obtain . This is consistent with the result of Proposition 2. By the fact that , when is large enough, increasing can improve the group’s average payoff, but it is still lower than that without TBC.
We assume that . Denoting , we get . In other words, in the situation where two players’ total payoff with strategy pair is lower than the total payoff with , the TBC can increase the group’s average payoff for a lower cooperation rate.

Remark 1. Propositions 2–5 are all based on the payoff matrix (13) that is obtained by and . The assumption is realistic. For example, confirmation by the supervisory department of, amongst other things, an enterprise’s credit or product quality cannot be eliminated in the short term. The values of will be more complicated if , . But the hypothesis that does not result in significant qualitative or directional changes to the conclusions of Propositions 2–5. Furthermore, the ESS and EE of the game are not altered by the assumptions and .

Remark 2. Propositions 2–5 are only a static analysis of the impact of TBC on an individual’s payoff.
Propositions 2–5 show that TBC will bring down a group’s payoff unless the cooperation identified rate and the defection identified rate are both large enough. From the point of view of static analysis, it is hard to see how TBC can promote cooperation. However, from the evolutionary point of view, without TBC, cooperation will fail in evolution and all benefits based on cooperation will disappear. Now, we discuss the ESS and EE of the Prisoner’s Dilemma with TBC.

3.2. Evolution of Cooperation for the Replicator Dynamics

Proposition 6. Defection is always an ESS of the Prisoner’s Dilemma for any .

Proof. For any state and any , by payoff matrix (13), one can get and . So, defection is always an ESS.
According to Friedman [43], the Prisoner’s Dilemma was discussed as being a linear game; the point is an EE of replicator dynamics (14). Based on Proposition 6, a small group of cooperators cannot invade a group of defectors, no matter how high identified rates and are.
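As a point of reference for the statement that defection takes over without TBC, the following minimal sketch integrates the standard two-strategy replicator equation for an untagged Prisoner's Dilemma. The payoff values and the function names are illustrative assumptions; the sketch does not reproduce payoff matrix (13) or dynamic (14).

```python
def replicator_step(x, payoff, dt):
    """One explicit-Euler step of the two-strategy replicator equation.

    x is the cooperator share; payoff = ((R, S), (T, P)) is a hypothetical
    row-player payoff matrix with rows and columns ordered (cooperate, defect).
    """
    (R, S), (T, P) = payoff
    pi_c = x * R + (1.0 - x) * S  # expected payoff of a cooperator
    pi_d = x * T + (1.0 - x) * P  # expected payoff of a defector
    return x + dt * x * (1.0 - x) * (pi_c - pi_d)


def simulate(x0, payoff, t_end=50.0, dt=0.01):
    """Integrate the cooperator share from x0 over [0, t_end]."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x = replicator_step(x, payoff, dt)
    return x


# Illustrative Prisoner's Dilemma values (T > R > P > S); not the paper's matrix.
PD = ((3.0, 0.0), (5.0, 1.0))
final_share = simulate(0.99, PD)
print(f"cooperator share at t = 50, starting from 0.99: {final_share:.3e}")
```

Even when almost the whole population cooperates at the start, the cooperator share in this toy model decays toward zero, because the defector's expected payoff exceeds the cooperator's at every state.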

Proposition 7. Cooperation is an ESS of the Prisoner’s Dilemma with TBC, if and only if the following condition is satisfied:

By [42], Proposition 7 can be obtained.

Based on Propositions 6 and 7, if and only if (26) is satisfied, the Prisoner’s Dilemma with TBC is a symmetric coordination game. The pure strategies “cooperation” and “defection” are both ESS. The fixed points and are both EE. Furthermore, the basin of attraction of the point is , where . As , the basin of attraction of cooperation grows if we increase the cooperation identified rate or the defection identified rate . For any , . This shows that a large-enough defection identified rate can successfully promote the evolution of cooperation for any . Furthermore, the basin of attraction of cooperation can be made sufficiently large by designing an appropriate defection identified rate.
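The bistability described here can be illustrated with a generic symmetric 2x2 coordination game. The sketch below computes the unstable interior fixed point that separates the two basins of attraction and checks it by integrating the replicator equation from either side. The matrix entries a, b, c, d and both function names are hypothetical; the paper's own threshold depends on the identified rates through payoff matrix (13), which is not reproduced here.

```python
def interior_threshold(a, b, c, d):
    """Unstable interior fixed point of a symmetric 2x2 coordination game.

    Row-player payoffs: a = (C vs C), b = (C vs D), c = (D vs C), d = (D vs D).
    Requires a > c and d > b, so that both pure strategies are strict
    Nash equilibria. Shares above the threshold evolve toward cooperation.
    """
    if not (a > c and d > b):
        raise ValueError("not a coordination game: need a > c and d > b")
    return (d - b) / ((a - c) + (d - b))


def long_run_share(x0, a, b, c, d, t_end=200.0, dt=0.01):
    """Cooperator share after Euler integration of the replicator equation."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        gap = x * (a - c) + (1.0 - x) * (b - d)  # cooperator minus defector payoff
        x += dt * x * (1.0 - x) * gap
    return x


# Hypothetical coordination-game payoffs; the threshold is 2/3 for these values.
threshold = interior_threshold(3.0, 0.0, 2.5, 1.0)
print(f"threshold = {threshold:.4f}")
print(f"from 0.70: {long_run_share(0.70, 3.0, 0.0, 2.5, 1.0):.4f}")
print(f"from 0.60: {long_run_share(0.60, 3.0, 0.0, 2.5, 1.0):.4f}")
```

In this toy game, lowering d - b or raising a - c moves the threshold down and enlarges the basin of attraction of cooperation, which is the qualitative role the text assigns to the identified rates.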

If the defection identified rate satisfies , cooperation can evolve successfully. By denoting , the largest basin of attraction of cooperation is for given . Otherwise, if , defection is the only ESS. If we consider as the cost-to-benefit ratio of the cooperation, this conclusion is in good agreement with the work of Nowak [44]. In [44], it is shown that direct reciprocity can lead to the evolution of cooperation only if the probability of another encounter between the same two individuals exceeds the cost-to-benefit ratio of cooperation, i.e., , and indirect reciprocity can only promote cooperation if the probability of knowing someone’s reputation exceeds the cost-to-benefit ratio of the cooperation, i.e., .
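For reference, the two reciprocity conditions described in words above are commonly written as follows in Nowak's notation, assuming [44] refers to his rules for the evolution of cooperation. The symbols are assumptions about notation and may differ from those used in this paper: b is the benefit of cooperation, c is its cost, w is the probability of another encounter between the same two individuals, and q is the probability of knowing someone's reputation.

```latex
\begin{align*}
w &> \frac{c}{b} && \text{(direct reciprocity)} \\
q &> \frac{c}{b} && \text{(indirect reciprocity)}
\end{align*}
```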

By Proposition 7, when condition (26) is satisfied, cooperation is the ESS of the Prisoner's Dilemma and the average payoff of an individual in a group full of cooperators is . But, without TBC, defection is the only ESS and the average payoff of an individual will be 0! To sum up the abovementioned discussion, from a dynamic point of view, TBC can promote a cooperator’s payoff. Further observation shows that defection identified rate plays a more important role in promoting the evolution of cooperation than cooperation identified rate , but determines an individual’s payoff when the cooperation rate is high.

Now, we only discuss the cooperator’s payoff and evolution of cooperation.

Proposition 8. Designing TBC with cooperation identified rate and the defection identified rate , we obtain the following conclusions:
(i) When condition (26) is satisfied, but , this TBC can promote the evolution of cooperation, but for given cooperation rate in the basin of attraction of cooperation, the cooperator’s payoff with TBC is lower than its payoff without TBC.
(ii) When , this TBC can not only expel defectors but also increase the cooperator’s payoff for .

Proof. If condition (26) is satisfied, based on Proposition 7, the TBC can promote the evolution of cooperation. From the condition , we obtain that and . The basin of attraction of cooperation is , and the cooperator’s payoff with TBC is higher than its payoff without TBC only when the group cooperation rate is lower than . So, the TBC can expel defectors, but the cooperator’s payoff with TBC is lower than its payoff without TBC when .
If the condition is satisfied, we get and . So, the TBC can not only expel defectors but also increase the cooperator’s payoff for .
The condition of (ii) of Proposition 8 is stricter than (26). So, based on (ii) of Proposition 8, the TBC can not only expel defectors if but also increase the cooperator’s payoff for . But the TBC yields both advantages only when .
Cooperation identified rate and defection identified rate are the input of TBC. They are feedback of cooperation rate . At any time , cooperation rate is variable . So, and are functions of time . We denote and as the cooperation identified rate and the defection identified rate at time in the next part of this paper.

4. The Optimal Identified Rate of Promoting the Cooperation Rate

4.1. The Optimal Identified Rate Model

We define as

Based on , one can obtain and , . The following performance index is introduced:
where is the terminal time of the replicator dynamic (14), denotes the terminal cooperation rate, is the effective cost of promoting the cooperation rate, and is the control cost of the entire TBC process. A bigger corresponds to a better effect of promoting cooperation and a lower effective cost. The parameter is used to adjust the difference between the effective cost and the control cost in dimension and weight. Furthermore, reflects the controller’s balance of the TBC for cooperation and for defection.

We rewrite replicator dynamic (14) and performance index (28) under transformation (27) as follows:

We suppose the initial time of replicator dynamic (14) is , and the initial value of system is

The initial value (31) indicates that, at the beginning of the evolution, the population is in a state in which cooperation and defection are mixed. We denote . We consider as a control variable of the dynamic (29).

The feedback control law of the state is designed so that the total cost defined by the performance index (30) is minimized. The optimal identified rate problem can be expressed as follows: under dynamic constraint (29) and initial value constraint (31), the optimal control and the optimal state trajectory are designed to minimize (30).

4.2. The Optimal Identified Rate Designing Process

There is no analytic solution for the optimal control problem (29)–(31). Now, we give a numerical algorithm to design an optimal identified rate.

Theorem 1. Consider the optimal identified rate design described by replicator dynamic (14) with performance index (30). Under (27) and (31), the optimal identified rate exists, is unique, and is expressed in the following form:
where is the solution of the following adjoint variable differential equation:
where , and is the solution of the following state differential equation:
where .

Proof. By introducing the adjoint variable and letting , if the optimal control and the optimal state are and , then there exists the adjoint variable such that , , and
From (36), we obtain , , and satisfying the following equations:
The adjoint variable and its terminal condition with the optimal control are as follows:
Substituting (37) and (38) into (29) and (39) yields the following two-point boundary value (TPBV) problem:
Note that the TPBV problem (41) is nonlinear; for a nonlinear TPBV problem, there is in general no analytic solution. From (37), (38), and (27), one can obtain
where and is the solution to the TPBV problem (41). A sequence of state differential equations is constructed:
For certain , is known, so equation (43) is a sequence of differential equations. Furthermore, can be solved from (43). According to Tang and Dong [45], the solution sequence defined by (43) is uniformly convergent to the solution of the following equation:
We define and construct the following sequence:
In the iteration, and are known, so (45) is also a sequence of differential equations. Based on [45], sequence (45) is uniformly convergent to the solution of the following equation:
We denote . By constructing sequences (33) and (34) and repeating the previous iteration process, we can obtain the sequences and . The sequences and are uniformly convergent to the solution of the TPBV problem (41), i.e.,
This completes the proof.
The number of iterations and can be determined by the accuracy requirement for the performance index (30). We give a practical algorithm, Algorithm 1, for calculating a suboptimal identified rate.

Input: Given positive real numbers and .
Output: suboptimal .
Step 1: Initialize , .
Step 2: Initialize , .
Step 3: Calculate state variable according to equation (34).
Step 4: If , let ; otherwise, let and jump to Step 3.
Step 5: Initialize , .
Step 6: Calculate adjoint variable according to equation (33).
Step 7: If , let ; otherwise, let and jump to Step 6.
Step 8: Letting , calculate according to the following equation:
Step 9: If , stop the algorithm and calculate the suboptimal according to equation (32); otherwise, let and jump to Step 2.
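The iterate-until-converged structure of Algorithm 1 (solve the state equation forward, solve the adjoint equation backward, update the control, repeat) can be sketched with a plain forward-backward sweep. This is a swapped-in standard technique, not the successive approximation of [45], and the model is a toy: the dynamics, the cost, and every parameter name (a, b, phi, r, relax) are hypothetical stand-ins for equations (29)–(34), whose exact form is not reproduced here.

```python
def forward_backward_sweep(x0=0.4, t_end=5.0, n=1000, a=2.0, b=1.0,
                           phi=50.0, r=0.1, iters=60, relax=0.5):
    """Forward-backward sweep for a toy optimal-control problem.

    Hypothetical model (an assumption, not the paper's equations):
        state    x'   = x (1 - x) (a u - b),         x(0)   = x0
        cost     J    = -phi x(T) + integral of (r / 2) u^2 dt
        adjoint  lam' = -lam (1 - 2 x) (a u - b),    lam(T) = -phi
        control  u    = clip(-lam a x (1 - x) / r, 0, 1)
    Returns the state path and the control path on a uniform grid.
    """
    dt = t_end / n
    u = [0.0] * (n + 1)
    x = [x0] * (n + 1)
    lam = [-phi] * (n + 1)

    def forward():
        # Explicit Euler for the state equation, from t = 0 to t = T.
        for k in range(n):
            x[k + 1] = x[k] + dt * x[k] * (1.0 - x[k]) * (a * u[k] - b)

    for _ in range(iters):
        forward()
        # Explicit Euler for the adjoint equation, from t = T back to t = 0.
        lam[n] = -phi
        for k in range(n, 0, -1):
            lam[k - 1] = lam[k] + dt * lam[k] * (1.0 - 2.0 * x[k]) * (a * u[k] - b)
        # Stationarity of the Hamiltonian in u, clipped to [0, 1], with relaxation.
        for k in range(n + 1):
            candidate = -lam[k] * a * x[k] * (1.0 - x[k]) / r
            candidate = min(1.0, max(0.0, candidate))
            u[k] = relax * u[k] + (1.0 - relax) * candidate
    forward()  # make the returned state path consistent with the final control
    return x, u


x_path, u_path = forward_backward_sweep()
print(f"terminal cooperation rate: {x_path[-1]:.4f}")
print(f"control range: [{min(u_path):.4f}, {max(u_path):.4f}]")
```

With these illustrative weights the terminal term dominates and the control is pushed toward its upper bound. The relaxation factor is a common way to damp oscillations between sweeps; the stopping rule of Algorithm 1 (a tolerance on the change in the performance index) is replaced here by a fixed number of sweeps for brevity.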

5. Simulation Examples

In the Prisoner’s Dilemma with payoff matrix (1), let , . The replicator dynamic with TBC is

The replicator dynamic without TBC is

By simulation, we study the following three problems.

5.1. Effect Analysis of TBC

Let , and , and , respectively. In the previous discussion, we denoted as the payoff of the cooperator with TBC, as the payoff of the defector with TBC, as the payoff of the cooperator without TBC, and as the payoff of the defector without TBC, and we denoted the group’s expected payoff without TBC as and with TBC as . In Figure 1, to simplify the legends and discuss the effect of on promoting cooperation, we redefine the symbols as follows: at , let and be the payoffs of the group and the individual with TBC from to , respectively, and be obtained after TBC has just been withdrawn, i.e., TBC is used from to , where is positive and small enough, is the cooperation rate with TBC , is the payoff function curve of individual , is the payoff function curve of the group, and is the cooperation rate curve without TBC, respectively, where .

Based on Figures 1(a) and 1(b), the curves of and show that the payoffs of the cooperator and the defector both decrease and that cooperation fails to evolve without TBC. Comparing Figures 1(a) and 1(d), we can see that cooperators obtain a higher payoff than without TBC, but the cooperation rate decreases and cooperators are still expelled because the identified rate is low when . According to the parameters in this example, is in the basin of attraction of cooperation only when . So, as a result, the payoff of the cooperator decreases and cooperation fails to evolve. When , the payoff of cooperators increases and cooperation evolves successfully; the payoff of defectors increases but defection fails to evolve.

Here, we try to show what will happen if we withdraw TBC for a short time. First of all, we emphasize the symbols , , and of Figure 1(c). For a given time , and all correspond to the cooperation rate defined by the replicator dynamics (48) with TBC , but corresponds to the cooperation rate defined by the replicator dynamics without TBC (49). As shown in Figure 1(b), for any identified rate . When , condition (26) is not satisfied and , so the point is not in the basin of attraction of cooperation and . But, by designing a large identified rate , we obtain the basin of attraction of cooperation is and . When , one gets and so that TBC inhibits the growth rate of the cooperator’s payoff. If we withdraw TBC for a short time, cooperators will obtain more payoffs.

Combined with the evolution of cooperation, we come to the following conclusion: when the cooperation rate is low, TBC results in the payoff for all individuals being higher than the payoff without TBC; when the cooperation rate is large enough, the payoff for all individuals will be lower than the payoff without TBC under the same . On one hand, Figures 1(b) and 1(c) show that TBC decreases not only the defector’s payoff but also the group’s payoff. Furthermore, the higher the cooperation rate is, the more obvious the inhibition is. On the other hand, we observe the positive effects of TBC on promoting both the cooperation rate and the group’s payoff. Only under a bigger does the cooperation rate increase. For example, although the group payoff under TBC is always smaller than , it is always greater than .

In summary, Figure 1 shows that when the group cooperation rate defined by (48) reaches a satisfactory value, temporary withdrawal of TBC can increase the payoff of all individuals in the group for a short time. However, if TBC is withdrawn for a long time, the group cooperation rate defined by (49) will drop to a very low level and all individual benefits will decrease.

5.2. Analysis of Rationality of the Optimal Identified Rate

Letting , , we obtain the optimal identified rate under different , shown in Figure 2, and the optimal state path , shown in Figure 3.

Given a sufficiently large constant and a sufficiently small constant , if , we define , and if , we define , where . Figure 2 shows that approximates bang-bang control with switching time . If , then ; otherwise, . The smaller corresponds to the bigger switching time. If is large enough, then and .
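The thresholding described above can be sketched as a small helper that reads a switching time off a sampled control signal. The thresholds `upper` and `lower` play the role of the large and small constants in the text; their default values, the function name, and the assumption that the control switches from its upper level to its lower level (rather than the reverse) are all illustrative.

```python
def switching_time(times, controls, upper=0.95, lower=0.05):
    """Estimate the switching time of an approximately bang-bang control.

    Samples at or above `upper` are treated as the 'on' level and samples at
    or below `lower` as the 'off' level. Returns the first sample time at
    which the control is 'off' after having been 'on', or None if no such
    switch occurs in the data.
    """
    seen_on = False
    for t, u in zip(times, controls):
        if u >= upper:
            seen_on = True
        elif u <= lower and seen_on:
            return t
    return None


# Hypothetical samples of a control that is on until roughly t = 3.
sample_times = [0.0, 1.0, 2.0, 3.0, 4.0]
sample_controls = [1.0, 0.99, 0.60, 0.02, 0.0]
print(switching_time(sample_times, sample_controls))
```

Samples that fall between the two thresholds are ignored, so the estimate is only as fine as the sampling grid around the transition.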

Now, we explain . According to the definition of TBC, means all cooperators are labeled as “defectors” and all defectors are labeled as “cooperators”. This is impossible and inefficient. Here, can be understood as no effective TBC; all individuals choose “defection” strategy for self-protection. In this case, the control cost is 0. Under , the cooperation rate is a constant value. By designing the bang-bang control as Figure 2, we can achieve the goal of successfully promoting the evolution of cooperation. This meets the requirement of minimizing the performance index (30). This is, however, unreasonable for the group in terms of revenue. Probably, the more reasonable approach would be to withdraw TBC when the cooperation rate is satisfactory. Now, let us discuss the TBC design from the perspective of revenue.

Figure 3 shows that there is a suitable TBC to make the cooperation evolve successfully. Furthermore, the smaller results in the better effect of promoting cooperation.

By designing defined by Theorem 1, the corresponding expressions of the total payoffs of the cooperator, defector, and group are , , and , respectively. If we do not design the TBC, from to , the total payoffs of the cooperator, defector, and group can be expressed as , , and , respectively, where is described by (49). Let and , respectively. The calculation results are shown in Table 1.

Table 1 shows that improves the cooperator’s payoff. Furthermore, as decreases, the switching time moves backwards and the cooperator’s payoff increases.

5.3. Optimal Timing Control (OTC) of Switched Replicator Dynamics

System (48) and (49) is a switched nonlinear system. In the past decades, switched systems have received significant research attention, see, e.g., [46–48]. In this paper, we design the switching control (SC) as follows: for a given and , let , and we design TBC with the identified rate as

However, when , we give up the TBC. The replicator dynamics with SC is as follows:

Assuming , the switching time is decided by the optimal identified rate problem (29)–(31) according to different weight coefficients . Based on a switched system (51), the total payoffs of the cooperator, defector, and group can be expressed as , , and , respectively. The simulation results are shown in Table 2, and the state path with SC is shown in Figure 4.
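A switched replicator dynamic of this kind can be sketched as a single Euler integration that uses one payoff matrix while the control is active and another after it is withdrawn. The two matrices below are hypothetical stand-ins for the dynamics with TBC (48) and without TBC (49); they are not the paper's parameter values, and the function names are illustrative.

```python
def payoff_gap(x, matrix):
    """Cooperator payoff minus defector payoff at cooperator share x."""
    (a, b), (c, d) = matrix
    return x * (a - c) + (1.0 - x) * (b - d)


def simulate_switched(x0, switch_time, t_end, m_on, m_off, dt=0.001):
    """Replicator dynamic that uses m_on before switch_time and m_off after.

    Returns the list of cooperator shares on the grid 0, dt, 2 dt, ..., t_end.
    """
    x = x0
    path = [x]
    for k in range(int(round(t_end / dt))):
        matrix = m_on if k * dt < switch_time else m_off
        x += dt * x * (1.0 - x) * payoff_gap(x, matrix)
        path.append(x)
    return path


# Hypothetical payoffs: a coordination game while the control is active,
# an ordinary Prisoner's Dilemma once it is withdrawn.
M_ON = ((3.0, 0.0), (2.5, 1.0))
M_OFF = ((3.0, 0.0), (5.0, 1.0))

for tau in (0.0, 2.0, 4.0, 5.0):
    end = simulate_switched(0.75, tau, 5.0, M_ON, M_OFF)[-1]
    print(f"switch at t = {tau:.1f}: terminal cooperation rate = {end:.4f}")
```

In this toy setting a later switching time always leaves a higher terminal cooperation rate, because the cooperator share rises while the control is active and falls afterwards; the trade-off discussed in the text only appears once payoffs and control costs are also taken into account.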

By comparing Tables 1 and 2, we find that the impact of control on the average payoff of a group is complex. Table 1 shows that a smaller coefficient corresponds to a larger group payoff under optimal TBC, but Table 2 shows the opposite result. Combining with Figures 3 and 4, the group has a larger average payoff under SC, but the cooperation rate is lower at .

So, how to choose an appropriate control, TBC or SC, and how to choose the switching time depend on what we pay more attention to.

Now, we design the cost function with OTC as follows:
where is the terminal cooperation rate, is the control cost, is the cooperator’s average payoff, and is its weight coefficient. We search for the optimal switching time to maximize the function (52), i.e., we design an OTC for promoting the evolution of cooperation in the Prisoner’s Dilemma and increasing the cooperator’s average payoff.
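The search for a switching time that maximizes such an objective can be sketched as a brute-force grid search. Everything below is an assumption made for illustration: the two payoff matrices, the control cost taken as proportional to the time the control is active, and the weights w_terminal, w_cost, and w_payoff. It does not reproduce function (52), and it is not the analytic characterization derived in this section.

```python
def objective(switch_time, x0=0.85, t_end=5.0, dt=0.005,
              w_terminal=10.0, w_cost=0.2, w_payoff=1.0):
    """Toy objective: weighted terminal cooperation rate, minus a control
    cost proportional to the active time, plus the weighted time-averaged
    cooperator payoff. All payoffs and weights are hypothetical."""
    m_on = ((2.8, 0.0), (2.5, 1.0))   # control active: coordination game
    m_off = ((3.0, 0.0), (5.0, 1.0))  # control withdrawn: Prisoner's Dilemma
    x = x0
    payoff_sum = 0.0
    for k in range(int(round(t_end / dt))):
        (a, b), (c, d) = m_on if k * dt < switch_time else m_off
        pi_c = x * a + (1.0 - x) * b  # cooperator's expected payoff
        pi_d = x * c + (1.0 - x) * d  # defector's expected payoff
        payoff_sum += pi_c * dt
        x += dt * x * (1.0 - x) * (pi_c - pi_d)
    average_payoff = payoff_sum / t_end
    return w_terminal * x - w_cost * switch_time + w_payoff * average_payoff


def best_switch_time(t_end=5.0, steps=20, **weights):
    """Grid search over candidate switching times in [0, t_end]."""
    grid = [i * t_end / steps for i in range(steps + 1)]
    return max(grid, key=lambda tau: objective(tau, t_end=t_end, **weights))


print(best_switch_time())
print(best_switch_time(w_terminal=1.0, w_cost=0.0, w_payoff=0.0))
print(best_switch_time(w_terminal=0.0, w_cost=1.0, w_payoff=0.0))
```

The two extreme weightings act as sanity checks: if only the terminal cooperation rate matters, the control is kept on for the whole horizon, and if only the control cost matters, it is never switched on. A grid search is used for transparency; a root-finding approach on the first-order condition would be the closer analogue of solving for the switching time analytically.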

Theorem 2. Consider the OTC described by the switched replicator dynamic (51) with performance index (52). and are solutions to algebraic equation (53) and implicit function (54), respectively:
The optimal switching time is decided by

Proof. The following two implicit functions can be obtained from the switched replicator dynamics (51):
Based on the implicit function theorem, we get from (56) and (57). So, satisfies (53) and satisfies (54). The cost function can be rewritten as follows:
Because , , and , the optimal switching time is decided by (55).
Letting , and 1, respectively, we get the cooperation rate curve as in Figure 5 and the payoffs of the cooperator, defector, and group as in Table 3 by applying Theorem 2.
Figure 5 shows that the cooperation rate is very high whenever the weight coefficient of the cooperator’s payoff or although the switching time changes ( varies from 2.472 to 2.967). This is because we choose a large coefficient for the cooperation rate at terminal time . In fact, the main goal of this paper is to promote the evolution of cooperation by designing TBC. So, we design the cost function to promote the cooperation rate and to reconcile the control cost and the cooperator’s payoff. Observing Table 3, we note that the switching time increases as the cooperator’s payoff weight coefficient gets bigger.
The OTC has the following advantages when Table 3 is compared to Tables 1 and 2. Firstly, it has the same advantage as TBC: it restrains the payoffs of defectors. Secondly, it greatly enhances the payoffs of cooperators compared to TBC. Thirdly, it is the optimal way to promote the payoffs not only for cooperators but also for groups. Lastly, it can effectively promote the cooperation rate .

6. Conclusions

The analysis in this paper provides valuable insights into a number of issues facing clusters in the economy when they are confronted with the Prisoner’s Dilemma. These insights can help operators of clusters decide whether to adopt TBC, whether to identify cooperators or defectors, and how to set the identified rate.

First, operators should judge the harm of defection in the Prisoner’s Dilemma. We show that when the cooperation rate is high, a large identified rate is needed to increase the cooperation rate further. Thus, it is difficult to design a TBC to further promote cooperation and further improve the cooperator’s payoff. Furthermore, the cluster must pay for the TBC. So, it is not necessary for a cluster to use TBC when the cooperation rate is high. If the operator considers TBC necessary, we advise relying more on cooperation identification.

By contrast, if the cooperation rate is too low, i.e., the group is almost full of defectors, it is difficult to design a TBC that can help cooperators invade the group of defectors successfully. In this case, any TBC can increase the cooperator’s payoff. But a large defection identified rate is essential to promote the cooperation rate. It is important to note that, in this case, defection identification is more effective than cooperation identification.

We show the switch between adopting TBC and withdrawing TBC is necessary. TBC has an advantage in promoting evolution of cooperation. However, when the cooperation rate is high, TBC reduces the payoffs of the cooperators unless the identified rate is high enough. This paper gives OTC as a way to design switching law. The simulation results confirm OTC not only maintains a high cooperation rate but also increases the payoff of the cooperators.

While this paper is a significant step in the economic analysis of the Prisoner’s Dilemma and in the optimal control of promoting cooperation, there are a number of interesting directions for future work in this area. There are a number of potential interesting extensions of our model through relaxation of some of our assumptions. For example, we assume the payoff of mutual defection is in this paper, i.e., . In fact, we can assume an individual is exclusive [49] and every individual only accepts an encounter-tagged “cooperator”.

Last but not least, this paper does not really design an SC law for the switched replicator dynamics; the SC designed in this paper is only an OTC with one switching time. Further discussions for promoting cooperation and increasing the cooperator’s average payoff will aim to design an OTC with multiple switching times and a general SC unrestrained by bang-bang control. Furthermore, the optimal tag-based cooperation control proposed in this paper can be applied in management practice, such as in e-commerce platforms. Because of information asymmetry about product quality in e-commerce platforms, the PDG arises in seller groups and consumer groups. Platform managers can design tag-based cooperation control by offering an authentication service. A valuable extension of this research is to empirically examine the feasibility of the TBC proposed in this paper.

Data Availability

The simulation model and data are given in the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper was supported by the National Natural Science Foundation of China (71871171, 61773156, 71871173, and 71801175). This work was also partly supported by the Key Scientific Research Project in Universities of Henan Province (19A120006).