Abstract

One of the key advantages of unmanned swarm operation is its autonomous cooperation: when communication is interrupted or centralized control is lost, cooperative operation can still be carried out in an orderly manner. This work proposes a cooperative evolution mechanism within the framework of the multiplayer public goods game to solve the problem of autonomous collaboration of an unmanned swarm when centralized control fails. It starts with the requirement analysis of autonomous cooperation in an unmanned swarm; then, an evolutionary game model of multiplayer public goods based on aspiration-driven dynamics is established. On this basis, the average abundance function is constructed by theoretical derivation, and the influence of cost, multiplication factor, and aspiration level on the average abundance is simulated. Finally, the evolutionary mechanism of parameter adjustment in swarm cooperation is revealed via a case study, and concrete proposals are offered for the actual control of unmanned swarm cooperation.

1. Introduction

With the continuous advancing of the third wave of artificial intelligence, “group evolutionary intelligence” developed from “single-agent autonomous intelligence” has become one of the important characteristics of the new generation of artificial intelligence. Particularly in the military field, unmanned swarm (unmanned vehicle cluster [1], unmanned boat cluster [2], and unmanned aerial vehicle cluster [3]) operations have received unprecedented attention over the past two years. The US military has listed unmanned swarm operations as a “subversive technology” that can change the rules of war.

There are mainly two kinds of control modes of unmanned swarm: centralized control and autonomous collaboration. Under the premise of good communication, the Command and Control (C2) center can implement centralized control on the swarm. However, in the complex electromagnetic environment of the battlefield, there is a real risk of communication failure [4]. In such a predicament, the centralized control mode fails, and the unmanned swarm must make effective response on the spot according to the external situation and achieve self-management and self-coordination. An issue that has led to considerable interest is how unmanned swarms autonomously and cooperatively complete established military operations. The sketch of autonomous cooperation of unmanned swarm is shown in Figure 1.

Overall planning and reallocation of operation resources (communication, firepower, intelligence, etc.) within the unmanned swarm is required when autonomous collaboration occurs. However, in this process, there are often contradictions between individual interests and swarm needs, which are difficult to reconcile. For example, in a fire strike task, the “rational” unmanned units with intelligence and decision-making ability will choose to “contribute” as little ammunition to the swarm as possible in order to maintain their own combat effectiveness, whereas the more ammunition each unit contributes to the swarm, the higher the survival rate and the greater the combat effectiveness of the whole swarm will be. This contradiction leads to the “tragedy of the commons” [5]; therefore, how to increase the number of units willing to contribute ammunition to the swarm and avoid the tragedy has become a crucial and urgent problem in both the technology research and the practical application of unmanned swarms.

Evolutionary game theory [6–9] combines “equilibrium” in economics with “adaptability” in biology to depict the process by which individuals adapt to the external environment through learning, imitation, and trial and error under bounded rationality and asymmetric information. The evolutionary public goods game (PGG) [10] provides a basic theoretical framework for revealing the cooperative evolution mechanism and coping with the tragedy of the commons. In the PGG, investors (collaborators) and hitchhikers (betrayers) play strategic games over time based on cost, multiplication factor, selection intensity, etc., so that the proportion of collaborators and betrayers in the population changes dynamically and finally tends to an evolutionarily stable state (ESS). The research focus of PGG is to calculate the mathematical expectation of the proportion of collaborators in a population after a multiround game, that is, the average abundance, and then to analyze the relationship between the average abundance and the parameters (cost, multiplication factor, selection intensity, etc.) so that the level of cooperation can ultimately be regulated by adjusting these parameters.

At present, there are two main research directions to solve the problem of swarm cooperation with evolutionary game theory: one is to study the evolutionary dynamics process and cooperation mechanism of spatial structured population such as complex network based on graph theory [11, 12], and the other is to study the evolutionary stability state of well-mixed population and the dominant condition of cooperation based on Markov stochastic process [13, 14].

For the former, the team of Professor Nowak from Harvard University theoretically deduced the evolution of population in spatial structure such as circle, random graph, and scale-free network and creatively proposed the relationship between the ratio b/c and the network average degree k. They pointed out that the smaller the network connectivity is, the more conducive the cooperation in natural selection is [15]. Then, they used the pair approximation theory to theoretically deduce the cooperation phenomenon on the regular lattice and obtained the boundary conditions for the generation and expansion of cooperation [16]. On the basis of the above achievements, further comparative analysis was made on the differences between homogeneous and heterogeneous networks in promoting cooperative behavior, and simulation results show that weak connection can better promote cooperative behavior on heterogeneous networks [17]. At the same time, other researchers studied the dynamic process of multiparty game on the graph, and simulation results show that spatial structured population can promote the occurrence of cooperation better compared with unstructured population. In the recent two years, the team of Nowak has applied the evolution dynamics of cooperation in spatial structure to social network, analyzed the critical conditions of cooperation behavior in human society [18], initially explored the trade-off between the evolution convergence probability and the evolution convergence time [19], and extended the cooperative evolution on structural population to weighted graph [20]. 
Other representative studies include [21, 22], which, for the specific model of the multiplayer snowdrift game, obtained the relationship curves between the ratio b/c and the cooperation level on well-mixed and structured populations, respectively, and compared the significant differences between homogeneous/heterogeneous networks and unstructured populations in promoting cooperation.

For the latter, the representative researches are as follows: Wang at Wuhan University obtained the average abundance function of the snowdrift evolutionary game using the stable distribution of Markov chain and simulated the effect of parameters on the average abundance [23]. Based on the research work of Tarnita et al. [24], Du at Peking University obtained the inequality of strategic dominance conditions in two-party evolutionary game through strict mathematical derivation; besides, the simulation results show that the average abundance under weak selection intensity is independent of aspiration level [25].

The aforementioned work on swarm cooperation is of great theoretical and engineering value. We have also explored the cooperation mechanism of unmanned swarms, and the relevant results can be found in [26–30]. Nevertheless, the above achievements still have two shortcomings in addressing the cooperative evolution of unmanned swarms. First, they do not focus on the public goods game: although there are similarities between the snowdrift game and the public goods game [31], the game mechanisms differ in essence. Second, the cooperative evolution of an unmanned swarm involves multiple interacting combat units; that is, the evolution result depends not only on the strategy selection of a single unit but also on the strategies of the other units in the swarm, which is characteristic of multiplayer games [32, 33]. So far, the academic community has established the payoff matrix [34] of the multiplayer public goods game and has simulated the influence of different selection intensities [35–37] and threshold values [31] on the cooperation level. In particular, in [34], the authors derived a general average abundance formula for multiplayer games in a finite population under aspiration-driven dynamics, which applies to any such game. However, the average abundance formula for the specific public goods game is not given, so an important task of this study is to obtain the analytic expression of the average abundance of the public goods game based on the existing payoff matrix.

Furthermore, the generalized evolutionary game model can be simplified as a Markov chain plus strategy update rules in a finite population, so the average abundance function is also closely related to the strategy update rules. Within the framework of evolutionary game theory, strategy update rules can be divided into imitation processes and aspiration-driven dynamics [38]. In the former case, individuals imitate the strategy of a more successful peer [39]. In the latter case, individuals adjust their strategies based on a comparison between their own payoff and the value they aspire to, called the level of aspiration [40]. Unlike the imitation processes of pairwise comparison, aspiration-driven updates do not require additional information about the strategic environment and can thus be interpreted as being more spontaneous [41, 42]. In the complex battlefield environment, information acquisition is incomplete and asymmetric, which requires the swarm to achieve self-management and self-coordination, and this requirement coincides with aspiration-driven dynamics. Moreover, existing results show that, in both the prisoner’s dilemma game and the public goods game, aspiration-driven dynamics can raise the average abundance and promote cooperation more than traditional imitation dynamics [43, 44].

Aiming at the cooperative evolution mechanism of the unmanned swarm, we modelled the evolution process based on multiplayer public goods game framework and aspiration-driven update rule and then deduced the average abundance function of the model by analyzing the stable distribution of the Markov chain; on this foundation, we studied the influence of relevant parameters on the average abundance through theoretical analysis and numerical calculation; finally, we studied the effect of parameter adjusting on swarm cooperation via case study and discussed the corresponding solutions and advice to avoid “the tragedy of the commons.”

3. Model Hypothesis

In essence, the autonomous collaboration of unmanned swarms is a game process of multiparty and multiround, which focuses on the autonomous allocation of public resources. Therefore, we use multiplayer public goods evolutionary game to model the cooperative evolution of unmanned swarms. The mapping between the concepts of cooperative evolution in unmanned swarms and multiplayer public goods evolutionary game is listed in Table 1.

3.1. Framework of Multiplayer Public Goods Evolutionary Game

It is assumed that the autonomous cooperation of unmanned units takes place in a well-mixed swarm of size N, and every unit has two alternative strategies, A and B. Every d units interact simultaneously to obtain their payoffs; i.e., they play a two-strategy, d-player game. The strategy update procedure is as follows:
(i) Select any focal individual X in the population of N and select d−1 individuals from the remaining N−1 individuals to form a group of size d. The focal individual can be of type A or B and encounters a group containing k other players of type A and d−1−k players of type B.
(ii) The focal individual X plays the game with the other d−1 individuals in the strategy space {A, B}. Denote by $a_k$ and $b_k$ the payoffs that a player with strategy A and B obtains, respectively, when facing k other A individuals within the group of size d, where k ranges from 0 to d−1.
(iii) At the end of each round of the game, the focal individual X evaluates the benefits under the different strategy choices and then updates its strategy according to aspiration-driven dynamics.
The above process is repeated until the proportion of each strategy tends to be stable in the whole population. The value of k determines the payoffs $a_k$ and $b_k$: all possible payoffs of a focal individual are uniquely defined by the number of A players in the group, and the payoff matrix is listed in Table 2.
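Steps (i)–(iii) can be illustrated with a small Monte Carlo loop. The following is only a sketch under stated assumptions: the function name `simulate_swarm` and the parameter names (`r` for the multiplication factor, `c` for the cost, `alpha` for the aspiration level, `omega` for the selection intensity) are illustrative; the payoff is the public goods payoff described in this section; the update uses a logistic (Fermi-type) switching probability, which is one common choice for aspiration-driven dynamics; and the focal unit uses its realized payoff in the sampled group rather than an expected payoff.

```python
import math
import random

def simulate_swarm(N, d, r, c, alpha, omega, steps, seed=0):
    """Time-averaged fraction of A units under the sampled-group update loop."""
    rng = random.Random(seed)
    # True = strategy A (contribute), False = strategy B (free ride)
    strategies = [rng.random() < 0.5 for _ in range(N)]
    n_A = sum(strategies)
    total = 0.0
    for _ in range(steps):
        # (i) pick a focal unit and d-1 co-players from the remaining N-1 units
        focal = rng.randrange(N)
        others = rng.sample([j for j in range(N) if j != focal], d - 1)
        k = sum(strategies[j] for j in others)  # cooperating co-players
        # (ii) one-shot public goods payoff of the focal unit
        if strategies[focal]:
            payoff = r * c * (k + 1) / d - c
        else:
            payoff = r * c * k / d
        # (iii) aspiration-driven update (assumed logistic switching rule)
        p_switch = 1.0 / (1.0 + math.exp(-omega * (alpha - payoff)))
        if rng.random() < p_switch:
            strategies[focal] = not strategies[focal]
            n_A += 1 if strategies[focal] else -1
        total += n_A / N
    return total / steps

# Hypothetical parameter values, chosen only for illustration.
print(simulate_swarm(N=20, d=5, r=3.0, c=1.0, alpha=1.0, omega=1.0, steps=20000))
```

With `omega=0` every selected unit switches with probability 1/2 regardless of payoff, so the time average should settle near 0.5, which is a convenient sanity check for the loop.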

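For any group engaging in a one-shot game, we can obtain each member’s payoff according to Table 2. Let $c$ denote the cost contributed by each cooperator and $r$ the multiplication factor (profit coefficient). When X chooses strategy A, the total contribution by individuals to the swarm is $(k+1)c$, the total gain of the swarm is this contribution multiplied by the profit coefficient, $r(k+1)c$, and the gain of each individual is $r(k+1)c/d$. As the cost of X is $c$, the net gain of X is $r(k+1)c/d - c$. When X chooses strategy B, the total contribution by individuals to the swarm is $kc$, the total gain of the swarm is $rkc$, and the gain of each individual is $rkc/d$. As there is no cost in such a case, the net gain of X is $rkc/d$. Thus, the payoffs for A and B are

$$a_k = \frac{r(k+1)c}{d} - c, \qquad b_k = \frac{rkc}{d}, \qquad k = 0, 1, \ldots, d-1.$$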
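The payoff bookkeeping in this paragraph can be written as a short function. This is a minimal sketch; the name `pgg_payoffs` and the argument names (`r` for the multiplication factor, `c` for the cost) are illustrative assumptions.

```python
def pgg_payoffs(k, d, r, c):
    """Return (a_k, b_k) for a focal unit facing k other cooperators in a group of size d."""
    if not 0 <= k <= d - 1:
        raise ValueError("k must lie in 0..d-1")
    a_k = r * c * (k + 1) / d - c  # cooperator: equal share of the pot minus its own cost
    b_k = r * c * k / d            # free rider: equal share of the pot, no cost
    return a_k, b_k

# Hypothetical values: group of 5, multiplication factor 3, unit cost.
for k in range(5):
    print(k, pgg_payoffs(k, d=5, r=3.0, c=1.0))
```

For any k, the difference b_k − a_k equals c(1 − r/d), so free riding pays more within a group whenever r < d; this is the source of the dilemma discussed in the Introduction.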

3.2. Expected Payoff for Strategies A and B

In a finite well-mixed population of size N, groups of size d are assembled randomly, so the probability of choosing a group that consists of another k players of type A and d−1−k players of type B is given by a hypergeometric distribution [45]. For example, the probability that an A player is in a group of k other A’s is given by $\binom{i-1}{k}\binom{N-i}{d-1-k}\big/\binom{N-1}{d-1}$, where i is the number of A players in the population. The symbol $\binom{n}{k}$ denotes the binomial coefficient, which is the number of ways to choose a k-element subset from an n-element set.

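The expected payoffs for any A or B in a population of size N, with i players of type A and N−i players of type B, are given by

$$\pi_A(i) = \sum_{k=0}^{d-1} \frac{\binom{i-1}{k}\binom{N-i}{d-1-k}}{\binom{N-1}{d-1}}\, a_k, \qquad \pi_B(i) = \sum_{k=0}^{d-1} \frac{\binom{i}{k}\binom{N-i-1}{d-1-k}}{\binom{N-1}{d-1}}\, b_k,$$

where $a_k$ and $b_k$ denote the payoffs of an A player and a B player, respectively, facing k other A players in the group.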
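The hypergeometric averaging can be checked numerically. The sketch below assumes the public goods payoffs of Section 3.1; the function name `expected_payoffs` and the argument names are illustrative.

```python
from math import comb

def expected_payoffs(i, N, d, r, c):
    """Expected payoffs (pi_A, pi_B) when i of the N units play strategy A.

    pi_A is reported as 0.0 for i = 0 and pi_B as 0.0 for i = N,
    because no unit of that type exists in those states.
    """
    denom = comb(N - 1, d - 1)
    pi_A = pi_B = 0.0
    for k in range(d):
        a_k = r * c * (k + 1) / d - c
        b_k = r * c * k / d
        if i >= 1:
            # focal A: k co-players drawn from the other i-1 A units
            pi_A += comb(i - 1, k) * comb(N - i, d - 1 - k) / denom * a_k
        if i <= N - 1:
            # focal B: k co-players drawn from the i A units
            pi_B += comb(i, k) * comb(N - i - 1, d - 1 - k) / denom * b_k
    return pi_A, pi_B

print(expected_payoffs(i=4, N=10, d=5, r=3.0, c=1.0))
```

Because the payoffs are linear in k, the sums reduce to the payoffs evaluated at the hypergeometric mean, (d−1)(i−1)/(N−1) for a focal A unit and (d−1)i/(N−1) for a focal B unit, which gives a quick cross-check of the implementation.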

3.3. Aspiration-Driven Dynamics

There are several typical strategy update rules in evolutionary games, such as unconditional imitation [35] and the Fermi rule [46, 47]. Aspiration-driven dynamics focuses on comparing the payoff with an aspiration level to make new decisions. Players need not see any payoffs but their own, which they compare with an aspiration value. Aspiration-driven dynamics thus coincides with the requirement of self-management and self-coordination of an unmanned swarm under incomplete information acquisition in a complex battlefield. The level of aspiration, $\alpha$, is a variable that influences the stochastic strategy updating. The probability of switching strategy is random when individuals’ payoffs are close to the level of $\alpha$, reflecting the basic degree of uncertainty in the population. When the payoff exceeds $\alpha$, strategy switching is unlikely. At high values of $\alpha$ compared with the payoff, switching probabilities are high.

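To model stochastic aspiration-driven switching (from strategy A to B), we can use the following probability function:

$$p_{A \to B} = \frac{1}{1 + e^{-\omega(\alpha - \pi_A)}},$$

where $\pi_A$ is the payoff of the focal individual with strategy A, $\alpha$ is the aspiration level, and $\omega$ is the intensity of selection.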

The aspiration level, $\alpha$, provides the benchmark used to evaluate how “greedy” an individual is. Higher aspiration levels mean that individuals aspire to higher payoffs. The intensity of selection, $\omega$, provides a measure of how important individuals deem the impact of the actual game on their update. Let $\omega > 0$, and let $p_{A \to B}$ denote the probability that an individual with payoff $\pi_A$ switches from strategy A to B; if $\pi_A = \alpha$, then $p_{A \to B} = 1/2$, which means that individuals have the same preference for strategies A and B; if $\pi_A > \alpha$ (i.e., the individual payoff is higher than the aspiration level $\alpha$), then $p_{A \to B} < 1/2$, which means individuals prefer strategy A; if $\pi_A < \alpha$ (i.e., the individual payoff is lower than the aspiration level $\alpha$), then $p_{A \to B} > 1/2$, which means individuals prefer strategy B. Whether an individual actually updates its strategy or keeps it unchanged in a given round of the game can be further determined by other algorithms, such as the roulette algorithm.
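The three cases above can be verified with a few lines of code. This sketch assumes a logistic (Fermi-type) switching function, a common choice for aspiration-driven dynamics; the function name `switch_probability` and the numeric values are illustrative.

```python
import math

def switch_probability(payoff, alpha, omega):
    """Probability that a unit abandons its current strategy (assumed logistic form)."""
    return 1.0 / (1.0 + math.exp(-omega * (alpha - payoff)))

alpha, omega = 1.0, 2.0  # hypothetical aspiration level and selection intensity
print(switch_probability(1.0, alpha, omega))  # payoff equals aspiration: no preference
print(switch_probability(1.5, alpha, omega))  # payoff above aspiration: tends to stay
print(switch_probability(0.5, alpha, omega))  # payoff below aspiration: tends to switch
```

Setting `omega` to zero makes the result 1/2 for every payoff, i.e., the game no longer influences the update.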

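In the same way, the probability of the focal individual updating from strategy B to A is

$$p_{B \to A} = \frac{1}{1 + e^{-\omega(\alpha - \pi_B)}},$$

where $\pi_B$ is the payoff of the focal individual with strategy B, $\alpha$ is the aspiration level, and $\omega$ is the intensity of selection.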

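In the aspiration-driven dynamics, at each time step, the number of units with strategy A, i.e., i, can only increase by one, decrease by one, or stay the same. For the number of A units to increase by one, two events must happen in succession: first, a B-strategy individual is selected from the population; then, it is not satisfied with the payoff it obtains and switches to strategy A. A similar process holds for a decrease by one. Therefore, the probabilities that the number of A individuals increases or decreases by one in one time step are

$$T_i^{+} = \frac{N-i}{N}\cdot\frac{1}{1+e^{-\omega(\alpha-\pi_B(i))}}, \qquad T_i^{-} = \frac{i}{N}\cdot\frac{1}{1+e^{-\omega(\alpha-\pi_A(i))}},$$

where $\pi_A(i)$ and $\pi_B(i)$ are the expected payoffs of strategies A and B, $\alpha$ is the aspiration level, and $\omega$ is the selection intensity; the number stays the same with probability $1 - T_i^{+} - T_i^{-}$.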

Because this Markov chain has no absorbing state, it has a stationary distribution, and the average abundance function of the multiplayer evolutionary game can be derived from the above state transition equations.
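The one-step transition probabilities described in this section can be sketched as follows, assuming the logistic switching rule and taking the expected payoffs as inputs. The function name `transition_probabilities` and the argument names are illustrative.

```python
import math

def transition_probabilities(i, N, pi_A, pi_B, alpha, omega):
    """Return (T_plus, T_minus, T_stay) for the state with i units of strategy A."""
    leave_B = 1.0 / (1.0 + math.exp(-omega * (alpha - pi_B)))  # a B unit switches to A
    leave_A = 1.0 / (1.0 + math.exp(-omega * (alpha - pi_A)))  # an A unit switches to B
    t_plus = (N - i) / N * leave_B   # a B unit is selected and switches
    t_minus = i / N * leave_A        # an A unit is selected and switches
    return t_plus, t_minus, 1.0 - t_plus - t_minus

# Hypothetical state: 5 of 20 units cooperate.
print(transition_probabilities(5, 20, pi_A=0.4, pi_B=1.0, alpha=1.0, omega=1.0))
```

Since T_plus is positive at i = 0 and T_minus is positive at i = N for any finite `omega`, neither boundary state traps the chain, which is why no absorbing state exists.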

4. Average Abundance Function

At present, most of the research on average abundance is based on numerical simulation, and no strict mathematical expression is given. In this part, we first give the definition of the average abundance of an unmanned swarm and then derive its mathematical expression by analyzing the stationary distribution of the Markov chain without absorbing states, to support the subsequent simulation analysis in Section 5.

4.1. Average Abundance

Definition 1 (average abundance of unmanned swarm). Let the proportion of unmanned units with strategy A in a swarm be a random variable $X_A$ taking the values $j/N$, $j = 0, 1, \ldots, N$. Let $\psi_j$ be the probability distribution of $X_A$, i.e., the probability that $j$ units play strategy A; then the expected value of $X_A$ is defined as the average abundance of unmanned units with strategy A.
Therefore, the definition of average abundance can be expressed as

$$\langle X_A \rangle = \sum_{j=0}^{N} \frac{j}{N}\,\psi_j. \qquad (10)$$

The key to calculating the average abundance is to determine the probability distribution $\psi_j$. For Markov chains without absorbing states, $\psi_j$ is just the stationary distribution, and it satisfies the detailed balance condition [48]: $\psi_j T_j^{+} = \psi_{j+1} T_{j+1}^{-}$, where $T_j^{+}$ and $T_j^{-}$ are the probabilities that the number of A units increases or decreases by one from state $j$.
Equation (10) is just a definition formula, which cannot be directly applied to actual calculation and analysis. Next, we theoretically deduce the average abundance formula based on the detailed balance condition so as to reveal the quantitative relationship between the average abundance and the related parameters (cost, multiplication factor, selection intensity, etc.) and to provide a theoretical calculation basis for the subsequent characteristic analysis.

4.2. Function Deduction

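It can be derived from the detailed balance condition $\psi_j T_j^{+} = \psi_{j+1} T_{j+1}^{-}$, where $\psi_j$ is the stationary probability of the state with $j$ units of strategy A and $T_j^{\pm}$ are the probabilities that this number increases or decreases by one, that

$$\psi_1 = \frac{T_0^{+}}{T_1^{-}}\,\psi_0, \qquad \psi_2 = \frac{T_1^{+}}{T_2^{-}}\,\psi_1, \qquad \ldots, \qquad \psi_j = \frac{T_{j-1}^{+}}{T_j^{-}}\,\psi_{j-1}.$$

Further, we induce and summarize the above formulas; then we get

$$\psi_j = \psi_0 \prod_{i=0}^{j-1} h(i), \qquad h(i) = \frac{T_i^{+}}{T_{i+1}^{-}}, \qquad (14)$$

where $h(i)$ is the strategy dominant function. If $h(i) > 1$, that is, the increasing probability of strategy A is greater than the decreasing probability, it means that strategy A is dominant in the swarm; otherwise, strategy B is dominant.

Since $\sum_{j=0}^{N} \psi_j = 1$,

$$\psi_0 \left( 1 + \sum_{j=1}^{N} \prod_{i=0}^{j-1} h(i) \right) = 1.$$

Then, we have $\psi_0 = 1 \big/ \bigl( 1 + \sum_{j=1}^{N} \prod_{i=0}^{j-1} h(i) \bigr)$. Inserting $\psi_0$ into formula (14), we have

$$\psi_j = \frac{\prod_{i=0}^{j-1} h(i)}{1 + \sum_{l=1}^{N} \prod_{i=0}^{l-1} h(i)}. \qquad (16)$$

Inserting (16) into (10), the average abundance can be written as

$$\langle X_A \rangle = \frac{\sum_{j=1}^{N} \frac{j}{N} \prod_{i=0}^{j-1} h(i)}{1 + \sum_{j=1}^{N} \prod_{i=0}^{j-1} h(i)}, \qquad (17)$$

$$h(i) = \frac{N-i}{i+1} \cdot \frac{1 + e^{-\omega(\alpha - \pi_A(i+1))}}{1 + e^{-\omega(\alpha - \pi_B(i))}}. \qquad (18)$$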

Equations (17) and (18) are just a general expression for the average abundance of multiplayer evolutionary games under aspiration-driven dynamics, and the specific application depends on the game-specific payoffs $a_k$ and $b_k$. Therefore, the combination of equations (1)–(4), (17), and (18) forms the average abundance function of the unmanned swarm under the framework of the multiplayer public goods evolutionary game.
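The whole derivation can be turned into a short numerical routine. The following is a sketch under stated assumptions, not the authors' implementation: it uses the public goods payoffs of Section 3.1, hypergeometric group sampling, a logistic switching rule, and the detailed balance recursion; the function name `average_abundance` and all parameter names and values are illustrative.

```python
import math
from math import comb

def average_abundance(N, d, r, c, alpha, omega):
    """Average abundance of strategy A from the stationary distribution (requires N >= d)."""
    denom = comb(N - 1, d - 1)

    def pi_A(i):  # expected payoff of an A unit when i units play A (1 <= i <= N)
        return sum(comb(i - 1, k) * comb(N - i, d - 1 - k) / denom
                   * (r * c * (k + 1) / d - c) for k in range(d))

    def pi_B(i):  # expected payoff of a B unit when i units play A (0 <= i <= N-1)
        return sum(comb(i, k) * comb(N - i - 1, d - 1 - k) / denom
                   * (r * c * k / d) for k in range(d))

    def leave(payoff):  # probability of abandoning the current strategy
        return 1.0 / (1.0 + math.exp(-omega * (alpha - payoff)))

    weights = [1.0]  # unnormalised stationary weights psi_j / psi_0
    for j in range(N):
        t_plus = (N - j) / N * leave(pi_B(j))            # j -> j+1
        t_minus = (j + 1) / N * leave(pi_A(j + 1))       # j+1 -> j
        weights.append(weights[-1] * t_plus / t_minus)   # detailed balance
    total = sum(weights)
    return sum(j / N * w for j, w in enumerate(weights)) / total

# Hypothetical parameter values.
print(average_abundance(N=20, d=5, r=3.0, c=1.0, alpha=1.0, omega=1.0))
```

Two properties follow directly from the recursion and serve as checks: with `omega=0` the weights become binomial coefficients and the result is exactly 0.5, and with `r < d` and `omega > 0` every ratio falls below its neutral value, so the result is below 0.5.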

5. Evolutionary Game Analysis

On the basis of the average abundance of unmanned swarm obtained above, we will analyze the impact of cost , multiplication factor , and aspiration level on it. Set the basic parameters , , , , and , and when calculating the impact of one parameter, others remain unchanged. In addition, in order to highlight the different influence degree of parameters on average abundance under different selection intensities, is selected in each simulation scenario.

5.1. Average Abundance with Respect to Cost

It can be easily proved through mathematical induction from equations (1) and (2) that increasing will increase and and then increase and , resulting in the decrease of both and . Since , in the case of increasing , the change of and is difficult to determine directly. Next, we give a set of numerical solutions to observe the interaction within a certain range, so as to reveal the influence of on through simulation.

Select the interval and draw the average abundance curve of the strategy A as follows.

As shown in Figure 2, as increases, will monotonically decrease; when , (i.e., the proportion of collaborators and betrayers in the swarm is balanced), while , increases with ; moreover, with the decrease of , the influence degree of on increases: , while .

5.1.1. Conclusion 1

Increasing the cost decreases the average abundance, especially when the selection intensity is small.

5.2. Average Abundance with Respect to Multiplication Factor

Similarly, in the case of increasing, the change of and cannot be determined only by deduction. Select the interval and draw the average abundance curve of the strategy A under different selection intensities as shown in Figure 3.

As the multiplication factor increases, will monotonically decrease, which means the phenomenon of “free riding” appears, resulting in the weakening of cooperation and the decline of ; moreover, with the decrease of , the influence of on increases: , while .

5.2.1. Conclusion 2

Increasing the multiplication factor decreases the average abundance, especially when the selection intensity is small.

5.3. Average Abundance with Respect to Aspiration Level

Select the interval and draw the average abundance curve of the strategy A under different selection intensities as shown in Figure 4.

As the aspiration level increases, will monotonically increase, which means the rising of aspiration level makes it more difficult for betrayers to satisfy their expectations, and thus more betrayers transfer to cooperators; moreover, with the decrease of , the influence degree of on increases: , while .

5.3.1. Conclusion 3

Increasing the aspiration level increases the average abundance, especially when the selection intensity is small.

According to the simulation results, the cost, the multiplication factor, and the aspiration level all affect the trend of the average abundance curve. When the cost or the multiplication factor increases, the average abundance decreases monotonically, whereas when the aspiration level increases, the average abundance increases monotonically. These conclusions provide a theoretical basis for the regulation of swarms in practical application. Based on them, the following section provides a case study to further reveal the cooperative evolution mechanism of the unmanned swarm.
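A one-parameter-at-a-time sweep of the kind used in this section can be sketched as below. The parameter values in `BASE`, the sweep ranges, and the selection intensities are hypothetical and are not the values used for Figures 2–4; the logistic switching rule is an assumption, and the function names are illustrative. Because the public goods payoffs are linear in the number of cooperating co-players, the expected payoffs are computed from the hypergeometric mean instead of the full sum.

```python
import math

def average_abundance(N, d, r, c, alpha, omega):
    """Average abundance of A, using the linear-payoff shortcut for the public goods game."""
    def leave(payoff):
        return 1.0 / (1.0 + math.exp(-omega * (alpha - payoff)))
    weights = [1.0]
    for j in range(N):
        # expected number of cooperating co-players, identical for both cases below
        mean_k = (d - 1) * j / (N - 1)
        pi_B = r * c * mean_k / d            # B unit, j cooperators in the swarm
        pi_A = r * c * (mean_k + 1) / d - c  # A unit, j+1 cooperators in the swarm
        ratio = (N - j) / (j + 1) * leave(pi_B) / leave(pi_A)
        weights.append(weights[-1] * ratio)
    return sum(j * w for j, w in enumerate(weights)) / (N * sum(weights))

def sweep(name, values, base, omegas=(0.01, 0.1, 1.0)):
    """Vary one parameter while the others stay at their base values."""
    return {omega: [average_abundance(**{**base, name: v}, omega=omega) for v in values]
            for omega in omegas}

BASE = dict(N=50, d=5, r=3.0, c=1.0, alpha=1.0)  # hypothetical values, not the paper's
for name, values in [("c", [0.5, 1.0, 1.5, 2.0]),
                     ("r", [2.0, 3.0, 4.0]),
                     ("alpha", [0.5, 1.0, 2.0])]:
    for omega, row in sweep(name, values, BASE).items():
        print(name, omega, [round(x, 4) for x in row])
```

The printed rows can be compared against Conclusions 1–3; whether the same trends appear depends on the chosen parameter region, so the sketch is a tool for exploration rather than a reproduction of the figures.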

6. Case Study

Fire strike is a typical task in unmanned swarm operation. Limited by the ammunition loading/mounting capacity, when the unmanned swarm carries out the fire strike task in case of failure of centralized control mode, the “rational” unmanned units with intelligence and decision-making ability will strictly control the ammunition launching/delivery quantity with “free riding” mentality, while from the perspective of the whole swarm combat effectiveness, we hope that each unit can provide as much ammunition as possible to ensure the overall strike effectiveness on enemy. The key to coping with this contradiction is how to raise the proportion of cooperators in the swarm through self-regulation and self-coordination.

Consistent with the above section, set, , , , and and draw the basic curve (see Figure 5). Since , this case is a nondominant case; that is, most units choose strategy B. Therefore, we try to regulate relevant parameters to increase the average abundance of unmanned swarm and promote cooperation.

As in Figure 5, reducing the cost or increasing the aspiration level can raise the proportion of cooperators. However, increasing the multiplication factor will cause the average abundance curve to deviate downward from the basic curve, which is because increasing payoff of cooperators and betrayers by the same margin will make the “free riding” situation more serious. Consequently, we try to separate the multiplication factor of cooperators from that of betrayers, only increase the multiplication factor of cooperators (the multiplication factor of betrayers remains unchanged), and find that the average abundance curve deviates upward from the basic curve.
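The idea of separating the two multiplication factors can be explored numerically. The sketch below assumes one particular reading of that separation, namely that a cooperator's share of the pot is scaled by `r_A` and a betrayer's share by `r_B`; this payoff form, the logistic switching rule, the function name `average_abundance_split`, and the parameter values are all assumptions made for illustration.

```python
import math

def average_abundance_split(N, d, r_A, r_B, c, alpha, omega):
    """Average abundance of A when cooperators and betrayers use different factors."""
    def leave(payoff):
        return 1.0 / (1.0 + math.exp(-omega * (alpha - payoff)))
    weights = [1.0]
    for j in range(N):
        mean_k = (d - 1) * j / (N - 1)         # expected cooperating co-players
        pi_B = r_B * c * mean_k / d            # betrayer keeps the original factor
        pi_A = r_A * c * (mean_k + 1) / d - c  # cooperator gets the raised factor
        ratio = (N - j) / (j + 1) * leave(pi_B) / leave(pi_A)
        weights.append(weights[-1] * ratio)
    return sum(j * w for j, w in enumerate(weights)) / (N * sum(weights))

# Hypothetical values: only the cooperators' factor is raised.
for r_A in (3.0, 4.0, 5.0, 6.0):
    print(r_A, round(average_abundance_split(20, 5, r_A, 3.0, 1.0, 1.0, 1.0), 4))
```

Under this assumed payoff form, raising `r_A` alone increases a cooperator's payoff in every state, lowers its probability of abandoning cooperation, and therefore raises every ratio in the recursion, which is consistent with the upward shift of the curve described above.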

Furthermore, we simulate the average abundance under different (see Figure 6). When , the average abundance is approximately equal to 0.5, which indicates that the proportion of cooperators and betrayers in the swarm is basically balanced. With the further increase of , when , the average abundance will be greater than 0.5 at , while when , the average abundance will be greater than 0.5 at . Thus, we can reach the following conclusions.
(1) The adjustment on can switch the dominant strategy, making the average abundance of strategy A greater than 0.5.
(2) The lower the is, the more stringent requirement for will be, and the higher the is, the looser requirement for will be: , while .

In order to investigate the regulation sensitivity of different parameters, we simulate the affecting degree of unit variation of , , and on the average abundance. We select the simulation results with to be discussed, as shown in Figures 7(a)–7(c), respectively.
(1) When , the average abundance is identically equal to 0.5, and thus the parameter regulation loses its effect (see Figure 7(a)).
(2) When and unit variation of parameters (i.e., ) is small (note that the threshold of is related to : , ), the change in value of average abundance caused by adjusting and is much greater than adjusting (see Figures 7(b) and 7(c)). The regulation of and is more sensitive than that of .
(3) When and is large, the regulation effect of is much better than that of and . And the larger is, the more sensitive is; i.e., a small leads to a large increase in average abundance: , while (see Figures 7(b) and 7(c)).

To improve the average abundance, the ideal measure is to increase the multiplication factor, reduce the cost of cooperators, or both. However, in order to ensure the effectiveness of the operation in the actual battlefield, the cost is difficult to reduce and may even have to increase. Therefore, it is necessary to consider increasing both the cost and the multiplication factor. Figure 8 shows the change of average abundance when the cost and the multiplication factor increase at the same time (the cost increases by 50%, and the multiplication factor increases by 69% and 73%, respectively). Accordingly, as long as the multiplication factor increases by more than 73%, not only can the adverse effect of the cost increase on the average abundance be offset, but the cooperation in the swarm can also be promoted.

Unfortunately, the above regulation can only achieve a limited increase in the average abundance; that is, it cannot make the average abundance greater than 0.5. According to the conclusions from Figures 5 and 6, the conversion of dominant strategies (a large increase in average abundance) depends on a large selection intensity and a large unit variation , and thus we further increase under the premise of increasing by 50% (see Figure 9). According to the results in Figure 9, when and , will be greater than 0.5; when and , will be greater than 0.5.

The increase of means that the hitchhiker will no longer get as much payoff as the cooperator, and the decrease of payoff will directly increase the strategy update probability , so then more units tend to cooperate (more betrayers transfer to cooperators).

According to the above simulation results and conclusions, we can consider the following measures from the two dimensions of management and technology in the actual control of unmanned swarm cooperation:
(1) Increase the multiplication factor value of cooperators as much as possible. For example, with the help of advanced management means, for each combat unit in the swarm, its investment (i.e., its cost) in previous operations can be accumulated, and those with higher cumulative investment will be given more supplies (e.g., ammunition) or higher supply priority in the follow-up operations.
(2) Minimize the cost for each operation. For example, with the help of advanced technology means, improve the reliability and survivability of combat units or the strike accuracy and damage power of ammunition.

In addition, since the cost, the multiplication factor, and the aspiration level are closely related to specific operation tasks, it is also necessary to discuss specific control measures in combination with the operation tasks under the limitation of the parameter value ranges.

7. Conclusion

The advantage of unmanned swarm operation lies in its autonomy: it can continue to conduct cooperative operations efficiently in case of combat unit damage or communication failure. This work addresses the autonomous collaboration of the unmanned swarm under the failure of the centralized control mode and proposes a cooperative evolution mechanism within the framework of the multiplayer public goods evolutionary game. We obtain the average abundance function by theoretical derivation and then simulate the influence of different parameters (i.e., cost, multiplication factor, and aspiration level) on the abundance. The simulation results of the unmanned swarm fire attack show that increasing the multiplication factor of cooperators and reducing the cost can improve the average abundance of cooperators; furthermore, when the unit variation is large, the multiplication factor not only has a high regulation sensitivity but can also realize the switching of the dominant strategy. Finally, we offer some proposals as an exploration of the transformation from theory to application.

The evolution of cooperation is a fascinating topic that has been studied from different perspectives and theoretical approaches. Our approach by means of the multiplayer public goods evolutionary game sheds new light on how to study and analyze the evolution of cooperation in the unmanned swarm. In our work, we assume that the units in the swarm are homogeneous, which implies a globally consistent aspiration level in the process of strategy updating. However, in reality, different units (firepower units, intelligence units, etc.) probably have different requirements for the aspiration level. Thus, how to obtain the average abundance and explore the cooperative evolution mechanism when multiple aspiration levels coexist will be our future work.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 71901217); the National Key R&D Program of China (no. 2018YFC0806900); China Postdoctoral Science Foundation funded Project (no. 2018M633757); and the Primary Research and Development Plan of Jiangsu Province under Grant no. BE2019762.