Abstract

Network defenders constantly face the problem of how to use limited resources to make the most reasonable decisions. The network attack-defense game model is an effective means of solving this problem. However, existing network attack-defense game models usually assume that defenders will no longer change their defense strategies after deployment, whereas in advanced network attack-defense confrontations defenders usually redeploy defense strategies in response to different attack situations. Existing network attack-defense game models therefore struggle to accurately describe the advanced network attack-defense process. To address this challenge, this paper proposes a defense strategy selection method based on a network attack-defense wargame model. We model the advanced network attack-defense confrontation process as a turn-based wargame in which both attackers and defenders can continuously adjust their strategies in response to the attack-defense posture, and we use the Monte Carlo tree search method to solve for the optimal defense strategy. Finally, a network example is used to illustrate the effectiveness of the model and method in selecting the optimal defense strategy.

1. Introduction

With increasingly complex network environments and diverse attack methods, network defenders with limited defense resources can hardly fix all network flaws and defend against all attacks. The key to network defense is therefore to make the best use of limited resources when making defense decisions. Network defense strategy selection seeks the equilibrium point of the network attack-defense game, with attack-defense revenue thoroughly considered [1]. Current network defense strategy selection technologies mainly include attack-defense models [2, 3], strategy quantification and selection [4], and game theory [5-7]. The network attack-defense game model is an effective means of solving this problem [8]. To quantitatively analyze game-process elements such as attack-defense scenarios, processes, and cost-revenue, the attack-defense game model is indispensable.

Nevertheless, existing network attack-defense game models usually assume that the defender will no longer change its strategy after the defense has been deployed. In advanced network attack-defense confrontations, however, the defender usually redeploys defense strategies in response to different attack situations. Therefore, how to accurately reflect the dynamic interactions and deduction processes between attackers and defenders and how to select the optimal defense strategy have drawn extensive attention from domestic and foreign scholars.

In view of the abovementioned problems, this paper studies the defense strategy selection method in a dynamic network game. We model the advanced network attack-defense confrontation process as a turn-based wargame in which both attackers and defenders can continuously adjust their strategies in response to the attack-defense posture and use the Monte Carlo tree search method to solve for the optimal defense strategy. Our contributions are as follows:
(i) We propose a formal description method for optimal defense strategy selection, which formally defines the selection of optimal defense strategies for network security.
(ii) We propose a network attack-defense wargame model, a turn-based wargame in which both attackers and defenders can continuously adjust their strategies in response to the attack-defense posture.
(iii) We propose a defense strategy selection method based on Monte Carlo tree search, using artificial intelligence methods to analyze attack-defense strategies.
(iv) We design a simulation instance to illustrate the effectiveness of the model and algorithm in selecting the optimal defense strategy.

The rest of this paper is structured as follows. The second section discusses related work. The third section details the formal description of optimal defense strategy selection. The fourth section presents the network attack-defense wargame model. Building on this model, the fifth section uses Monte Carlo tree search to select the defense strategy. The sixth section presents an example to illustrate the effectiveness of the model and algorithm. The seventh section compares this work with related work. Finally, the eighth section summarizes the paper and proposes future work.

2. Related Work

Although some research results have been achieved on attack-defense models [2, 3], strategy quantification and selection [4], and game theory [5-7], research in this area is still in its infancy and no systematic theoretical methods have been formed.

Lee et al. [9] first proposed a cost-sensitive model as the basis of response decisions in 2002, which determined whether to respond according to the attack cost and revenue. The decision-making idea was relatively simple and the quantification of cost-revenue relatively rough, but the ideas and methods of cost-revenue quantification, classification, and attack classification remain a useful reference. Li et al. [10] established a noncooperative game model between the attacker and sensor trust nodes and gave the optimal attack strategy by calculating the Nash equilibrium. Because of the complexity of the Nash equilibrium's restriction conditions, Serra et al. [11] used the Pareto optimization method to calculate the Nash equilibrium solution of the game. Esmalifalak et al. [12] took the attack and defense times as the basic strategies of both sides, established a complete-information two-person zero-sum game model, and verified it in the system. Wu et al. [13] used a reinforcement learning algorithm to solve the Nash equilibrium and realize security situation analysis and prediction for an intelligent system. Liu et al. [14] investigated how to achieve an optimal cost-revenue trade-off by proposing a two-player strategic game model between the attacker and the defender; a graph-based simulated annealing algorithm was then proposed to derive the utility-maximising strategy. Wang et al. [15] analyzed the influence of cooperation strategy selection on the cooperation effect of sensor network nodes with the help of an evolutionary game. Na et al. [16] and Abass et al. [17] calculated the optimal evolutionary stability strategies against DoS attacks and APT attacks, respectively, by using the replicator dynamic equation. Hayel and Zhu [18] established an evolutionary Poisson game model between malware and antivirus programs and analyzed the program opening strategy through the replicator dynamic equation. Note, however, that the randomness of attack and defense means inevitably leads to state jumps of the game system.

3. Formal Description of Optimal Defense Strategy Selection

The complex network topology, coupled with the various network node states, is difficult to describe in real time. For example, in a network of n nodes, the number of possible situation combinations grows exponentially with n, since each node has its own authority status and the network topology has its own state type. Therefore, a formal description of the network attack-defense environment can greatly reduce the computational complexity.

Definition 1 (network topology matrix). The network topology can be represented by a two-tuple G = (V, E), where V = {v_1, v_2, ..., v_n} is the set of nodes in the network and E is the set of edges between them. If an n-order square matrix A = (a_ij) is used to represent the network topology, the square matrix satisfies the following formula: a_ij = 1 if (v_i, v_j) ∈ E, and a_ij = 0 otherwise.

Definition 2 (vulnerabilities of network nodes). Vul(v_i) = {vul_i1, vul_i2, ...} represents the set of vulnerabilities of node v_i in the network.

Definition 3 (network attack reachable node vector). The network attack reachable node vector at time t is represented by the vector R_t = (r_1^t, r_2^t, ..., r_n^t), where n is the number of nodes in the network. Each component is calculated as follows: r_i^t = 1 if node v_i can be reached by the attacker at time t, and r_i^t = 0 otherwise.

Definition 4 (attack-defense posture vector). The attack-defense posture vector reflects the permission attribution of each node in the network. In this paper, we assume that if the permission of a node does not belong to the attacker, it must belong to the defender. The posture at time t is represented by the vector S_t = (s_1^t, s_2^t, ..., s_n^t), where n is the number of nodes in the network, s_i^t = 1 if the permission of node v_i belongs to the attacker, and s_i^t = 0 if it belongs to the defender.

Definition 5 (attack strategy node vector). The attack strategy node vector P_t represents the node to be attacked next under state S_t. It is expressed as P_t = (p_1^t, p_2^t, ..., p_n^t), where each component is calculated as follows: p_i^t = 1 if node v_i is the node attacked in the current round of the confrontation game, and p_i^t = 0 otherwise; there is one and only one p_i^t = 1 in P_t.
From the definition, we know that p_i^t ≤ r_i^t for every i: at time t, the node involved in the next attack strategy must belong to the reachable nodes of the network attack.

Definition 6 (target vector). After several strategy combinations the target is reached; the target vector is expressed as T = (t_1, t_2, ..., t_n), where t_target = 1 represents that the goal of the attacker is to obtain the permission of the node target.
According to the above description, under given attack-defense resources and the attack-defense game rules, the attacker's seizure of the target node's permission can be described as follows: given the network topology G, an initial attack reachable node vector R_0, and an initial attack-defense posture vector S_0, after several strategies the target is reached. The defense strategy selection problem faced by the defender can then be described as how the defender allocates defense resources against each attack strategy of the attacker.
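To make the formal description concrete, the following Python sketch (illustrative only; the variable names A, posture, reachable, and target are our own and not part of the model) shows how the vectors of Definitions 1-6 could be represented for a small four-node network.

```python
import numpy as np

# Definition 1: network topology matrix A for a toy 4-node network
# (a_ij = 1 if there is an edge between node i and node j)
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
])

# Definition 4: attack-defense posture vector S_t
# (1 = permission held by the attacker, 0 = held by the defender)
posture = np.zeros(4, dtype=int)
posture[0] = 1  # the attacker starts with the entry node

# Definition 3: attack reachable node vector R_t --
# a node is reachable if it is adjacent to an attacker-held node
# and is not yet held by the attacker
reachable = ((A[posture == 1].sum(axis=0) > 0) & (posture == 0)).astype(int)

# Definition 6: target vector T -- the attacker's goal is node 3
target = np.array([0, 0, 0, 1])

print(reachable)  # [0 1 1 0]: nodes 1 and 2 can be attacked next
```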

4. Network Attack-Defense Wargame Model

Wargame [19] is a kind of turn-based, role-playing, strategy game. This paper models the high-level network attack-defense confrontation process as a turn-based wargame in which both attackers and defenders can continuously adjust their strategies in response to the attack-defense posture.

4.1. Model Hypothesis

The hypotheses of the network attack-defense wargame model are as follows.

Hypothesis 1 (rational hypothesis). Assuming that the attacker and the defender are completely rational, the attacker will not launch unprofitable attacks and the defender will not defend at all costs.

Hypothesis 2 (cost hypothesis). The goals of the attacker and the defender are, respectively, to seize and to protect the network equipment. The offensive and defensive costs of both parties can be quantified and measured during the game.

Hypothesis 3 (game hypothesis). Assume that in each round of the attack-defense game the winner takes all permissions of the contested node from the loser. If the attacker succeeds, it means that he is not discovered by the defender and will proceed to the next round of the attack-defense game. If the attacker fails, the defender can redeploy its defense measures, which is very important and effective: existing network attack-defense game models usually assume that the defender will not change after deploying the defense strategy, whereas in advanced network attack-defense confrontations the defender usually redeploys defense strategies in response to different attack situations. Finally, if the round is a draw, the attacker is not discovered by the defender, and the contested node can be used as a springboard node for the next round of the game.

Hypothesis 4 (attack hypothesis). Assume that at the beginning of the attack, the permissions of all nodes in the network belong to the defender and no less than two network devices are exposed to the external network. If there is only one entry node, the defender only needs to protect the node, and there is no game process.

4.2. Formal Description

In the network attack-defense game, the attacker finally obtains the target node permissions by obtaining the node permissions of the defender on the attack path. The following shows the network attack-defense wargame model in the multigame state.

Definition 7 (network attack-defense wargame model). The network attack-defense wargame model (NADWM) can be represented by the eight-tuple
NADWM = (Participant, Time, State, Strategy, Resource, Vector, Cost, Revenue), where:
(i) Participant = {p1, p2, p3, ..., pn} is the set of players in the game. Participants are individuals or organizations that take part in the game, independently make decisions, and bear the results. In this paper, the participants are the attacker pA and the defender pD.
(ii) Time = {t1, t2, t3, ..., tn} is the set of attack-defense time series. Network attack-defense is a dynamic and continuous confrontation process, which needs to be modeled and analyzed from the time dimension.
(iii) State = {state1, state2, state3, ..., staten} is the set of game states. statet is the state of the players at time t in the current attack-defense process.
(iv) Strategy = {SA, SD} represents the strategy space, where SA is the strategy space of the attacker and SD is the strategy space of the defender.
(v) Resource = {resourceA_t, resourceD_t} represents the resources of the attacker and the defender, where resourceA_t is the resource value of the attacker at time t and resourceD_t is the resource value of the defender at time t. When resourceA_t ≤ 0, the attacker no longer has enough resources to launch an attack, and the game is judged a failure for the attacker.
(vi) Vector represents the resource allocation vectors. The attacker resource allocation vector is VA_t = (a_1^t, a_2^t, ..., a_n^t), where n is the number of nodes in the network and a_i^t represents the attack force distributed by the attacker at node i at time t; there is one and only one nonzero component in VA_t, which represents that the attacker attacks node i at time t. The defender resource allocation vector is VD_t = (d_1^t, d_2^t, ..., d_n^t), where n is the number of nodes in the network and d_i^t represents the defense force distributed by the defender at node i at time t.
(vii) Cost = {costA_t, costD_t} represents the attack-defense cost functions at time t in the network attack-defense game process, including the attacker cost costA_t, which represents the cost at time t of the attacker attacking the defender at a node by exploiting a vulnerability, and the defender cost costD_t, which represents the defense force deployed by the defender on that node at time t.
(viii) Revenue = {revenueA_t, revenueD_t} represents the attack-defense revenue functions at time t in the network attack-defense game process, including the attacker revenue revenueA_t, which represents the revenue of the attacker attacking the defender at time t, and the defender revenue revenueD_t, which represents the revenue of the defender defending against the attacker at time t.
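As an illustration of Definition 7, the elements of the eight-tuple that change from round to round could be carried in a simple container such as the following Python dataclass; this is a sketch with hypothetical field names, not part of the model itself.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class NADWMState:
    """One game state of the network attack-defense wargame model (sketch)."""
    t: int                      # current time step (Time)
    posture: np.ndarray         # attack-defense posture vector (State)
    resource_attacker: int      # remaining attack resources (Resource)
    resource_defender: int      # remaining defense resources (Resource)
    attack_alloc: np.ndarray    # attacker resource allocation vector (Vector)
    defense_alloc: np.ndarray   # defender resource allocation vector (Vector)
    revenue_attacker: int = 0   # cumulative attack revenue (Revenue)

    @property
    def revenue_defender(self) -> int:
        # zero-sum assumption: defender revenue is the negative of attack revenue
        return -self.revenue_attacker
```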

4.3. Cost-Revenue Quantification and Attack-Defense Strategy
4.3.1. Cost-Revenue Quantification

In the process of the network attack-defense game, the cost-revenue function needs to be quantified.

The attack cost costA denotes the cost of the attacker attacking the defender at a node by exploiting a vulnerability at time t. The attack cost in this model represents the value of the vulnerabilities used by the attacker, assuming that the attacker purchases vulnerabilities and exploit tools through the vulnerability market. A higher vulnerability value therefore leads to a higher attack cost, which in turn means greater difficulty and a higher cost of defense. However, once a vulnerability is discovered by defenders, it loses its value, since the defenders can simply deploy firewall rules against it.

This paper uses the vulnerability prices provided by the VulDB library to evaluate the attack cost. VulDB is the number one vulnerability database worldwide, with more than 178,000 entries available. Its specialists have worked with the crowd-based community to document the latest vulnerabilities every day since 1970, providing technical details and additional threat intelligence such as current risk levels and exploit price forecasts. Their price estimations are calculated via mathematical algorithms developed by their specialists over the years through observing the exploit market and the exchange behavior of the involved actors. This allows the prediction of generic prices by considering multiple technical aspects of the affected vulnerability [20].

This paper quantifies the attack cost into 9 different levels according to the price ranges proposed by VulDB, costA ∈ {1, 2, ..., 9}. A higher level represents a higher attack cost as well as a higher vulnerability value. The price and level of the vulnerability are shown in Table 1.

(1) Attacker revenue. revenueA_t represents the revenue of the attacker at time t. In the attack-defense game, the attack revenue increases by k when the attacker gains the permission of a nontarget node; this is the path node revenue. When the attacker obtains the permission of the target node, the attack revenue increases by n × k, where n is the number of nodes in the network. In the attack process, the ultimate goal of the attacker is to obtain the permission of the target node, so the sum of the attack revenue obtained from the nodes on the attack path should be less than or equal to the attack revenue obtained from the target node; otherwise, the attacker could achieve the maximum attack revenue merely by capturing path nodes instead of pursuing the target. In this paper, for convenience of calculation, k = 1.

(2) Defender cost. costD_t(i) represents the defense cost deployed by the defender on node i. There are many kinds of defense costs, such as manpower, equipment, and resources. In this paper, we do not consider the specific methods of defense but only the allocation of limited defense resources over the network topology. Corresponding to the attack cost, the defense cost is quantified into 10 different levels, costD ∈ {0, 1, ..., 9}. The higher the level, the more defense resources are deployed on the node.

(3) Defender revenue. revenueD_t represents the defense revenue at time t. Since attack and defense form a zero-sum game [21], revenueD_t = -revenueA_t.

4.3.2. Attack-Defense Strategy

In the process of the network attack-defense game, the attack and defense strategies must be defined.

Definition 8 (attack-defense strategy set). The attack-defense strategy set Strategy_k = {SA_k, SD_k} represents the set of action strategies taken by the attacker and the defender in game state statek.
SA_k = (a_1, a_2, ..., a_n) represents the attack resource allocation strategy, where there is one and only one a_i > 0; it means that the attacker launches an attack on node i at this time, with attack force not exceeding the attacker's remaining resources.
SD_k = (d_1, d_2, ..., d_n) represents the defense resource allocation strategy at each node in this state, where every d_i is nonnegative and the total allocation does not exceed the defender's resources.

Definition 9 (Nash equilibrium [22]). The NADWM game is a zero-sum stochastic game. In game state statek, the attack-defense strategy set can be expressed as Strategy_k = {SA_k, SD_k}. A stable mixed strategy is a Nash equilibrium if and only if the mixed strategy is the optimal response for both offense and defense.
Because the number of candidate strategies is huge, this paper uses the Monte Carlo tree search method, described in the fifth section, to approximate the Nash equilibrium. The equilibrium is sought through repeated selection, expansion, and simulation, followed by backpropagation of whether the strategies on the explored path are appropriate: the utility of the winning party's actions is increased while that of the losing party's actions is decreased, until a balanced state is finally reached.

4.4. State and State Transition Rules

The construction of the network attack-defense game process consists of three steps: determining the initial state of the model, setting the game state transition rules, and defining the game termination state.

4.4.1. The Initial State of the Model

To start the game, the following conditions must be determined:
(i) In the game process, there is only one attacker pA and one defender pD in the target network.
(ii) In the initial stage, the permission of all nodes belongs to the defender, and the defender has no fewer than 2 network devices exposed to the outer network. If there is only one such device, the optimal strategy of the defender is to put all defense resources on this outer network node, and there is no intelligent game.
(iii) The defender distributes the defense costs over all nodes in the network.

4.4.2. Game State Transition Rules

The game state transition rules satisfy the following two conditions:
(i) After the game starts, the attacker can move in any direction within the reachable nodes of the network topology, and each move consumes attack cost costA.
(ii) In the process of network attack-defense, an attack is highly targeted, while defense is broadly applicable: an exploit may only apply to a certain environment on a certain node, whereas in most cases defensive tools apply to both computers and servers. Therefore, during each attack only the attacker deducts the attack cost from its total resources. The round is then decided by comparing the attack cost with the defense cost deployed on the node. When the attack cost exceeds the defense cost, the attacker wins: the node's permission passes to the attacker, the defender does not detect the attacker, and it is checked whether the node is the target node; if it is not, the attack revenue increases by k and the attacker proceeds to the next attack. When the attack cost equals the defense cost, the round is a tie: the node remains with the defender, but the attacker can reach the next node from it. When the attack cost is less than the defense cost, the defender wins, and the defender can redistribute its defense resources in the following rounds of the game.
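These transition rules can be summarized in a single round-resolution step. The sketch below is our own illustration, assuming (as the rules above suggest) that the attacker wins a node when the attack cost spent there exceeds the defense cost deployed there, ties when the two are equal, and loses otherwise; the path revenue k = 1 and the target bonus follow Section 4.3.1.

```python
def resolve_round(cost_attack: int, cost_defense: int,
                  is_target: bool, n_nodes: int, k: int = 1):
    """Resolve one attack on a node; return (winner, attack_revenue_gain, node_captured)."""
    if cost_attack > cost_defense:
        # attacker wins: node permission changes hands and the attack goes undetected
        gain = n_nodes * k if is_target else k   # target bonus of n*k is an assumption (Section 4.3.1)
        return "attacker", gain, True
    if cost_attack == cost_defense:
        # tie: the node stays with the defender but can serve as a springboard
        return "tie", 0, False
    # defender wins and may redistribute defense resources in the next rounds
    return "defender", 0, False
```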

4.4.3. Game Termination State

The game ends if any of the following conditions is met:
(i) The attack resource is less than or equal to 0.
(ii) The attacker has no new reachable nodes to choose from.
(iii) The attacker reaches the target node; the attack-defense posture vector is updated, and the attacker obtains the target node's permission.

In the attack-defense game, the goal of the defender is to ensure the security of the target node and, as much as possible, the security of the other nodes in the network. In other words, the defender aims to reduce the attack revenue under a given amount of attack-defense resources. The simulation process of NADWM is shown in Figure 1.

5. Defense Strategy Selection Based on Monte Carlo Tree Search

5.1. Algorithm Idea

In the NADWM model, the large number of game states leads to a huge amount of computation, so it is difficult to select the optimal defense strategy by enumeration. This paper adopts the Monte Carlo tree search method, a heuristic search algorithm based on a tree data structure that remains effective in huge search spaces [23]. For different attack states, this scheme provides a relatively suitable defense strategy for the defender.

5.2. Algorithm Description

Monte Carlo tree search (MCTS), a tree search strategy that balances historical and future returns, has recently boosted the performance of computer Go programs. Its basic principle is to randomly select a maneuver strategy and then update the value of the originally selected strategy through the expected return; the algorithm performs a large number of repeated random simulations until the best strategy emerges. Specifically, MCTS is divided into 4 parts: selection, expansion, simulation, and backpropagation. It has been empirically shown that the performance of MCTS scales well with the number of simulations used to select an optimal move in computer Go. In addition, developing efficient parallel MCTS (PMCTS) algorithms is important for improving performance, because single-processor performance can no longer be expected to increase as it used to [24]. The PMCTS principle is shown in Figure 2.

5.2.1. Monte Carlo Tree Search Steps

Monte Carlo tree search essentially maintains a tree in which each node corresponds to a specific situation state R. The edges of a node are composed of all game actions of both attack-defense parties in state R [25], as shown in Figure 3.

Step 1. Selection. Select the node to go next.

Definition 10. N(R, p) represents the number of times that action p has been performed in state R.

Definition 11. W(R, p) represents the cumulative attack revenue of performing action p in state R.

Definition 12. Q(R, p) represents the average attack revenue obtained by performing action p in state R and reflects the level of attack revenue that state R can provide. It is calculated as Q(R, p) = W(R, p) / N(R, p).

Definition 13. Q_D(R, p) represents the relative defense revenue. Because the attack-defense parties form a zero-sum game [19], the higher the attack revenue, the worse the defense strategy. Q_D(R, p) is calculated as Q_D(R, p) = M - Q(R, p), where M is a positive number large enough to ensure Q_D(R, p) ≥ 0.
To select a node at the current position, the defender needs to choose, from all legal strategies that satisfy the rules of the game, the strategy that maximizes Q_D(R, p) + U(R, p), where Q_D(R, p) is the relative defense revenue and U(R, p) is the upper confidence bound of Q_D(R, p), calculated as U(R, p_j) = c · sqrt(ln N / N(R, p_j)), where c is the priority coefficient stored in the strategy branch, N(R, p_j) is the number of times action p_j has been executed, and N is the total number of explorations of the policy so far. The parameter c can be selected using expert knowledge in the actual process; the larger c is, the more attention is paid to nodes with relatively few visits [26].
It can be seen that U(R, p) measures the degree of policy exploration, that is, the degree of uncertainty of Q_D(R, p). Adding U(R, p) mitigates the tendency of a purely greedy policy to fall into local minima [27].
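The selection rule can be sketched as follows in Python; the UCB1-style exploration term and the constant M follow the assumptions stated above, and the container node_stats is a hypothetical bookkeeping structure.

```python
import math


def select_strategy(node_stats, c: float, m_constant: float):
    """Pick the strategy maximizing relative defense revenue plus exploration bonus.

    node_stats: dict mapping strategy p -> (visits N(R, p), total attack revenue W(R, p))
    """
    total_visits = sum(n for n, _ in node_stats.values())
    best_p, best_score = None, float("-inf")
    for p, (n, w) in node_stats.items():
        q_attack = w / n if n > 0 else 0.0
        q_defense = m_constant - q_attack                              # Q_D(R, p), Definition 13
        u = c * math.sqrt(math.log(total_visits + 1) / (n + 1))       # UCB-style bonus (assumed form)
        score = q_defense + u
        if score > best_score:
            best_p, best_score = p, score
    return best_p
```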

Step 2. Expansion. In order to parallelize the algorithm, we modify the expansion stage in the way proposed in [28]. After selecting the attack strategy, we expand the node into m random children rather than a single child. In this paper, we assume that the defense resources of the defender can be quantified as a nonnegative integer. Since the defender has at most resourceD + 1 ways to deploy its defense force, m is no greater than resourceD + 1.

Step 3. Simulation. Selection and expansion are part of the simulation process, and the simulation must follow the rules of the game. There are two conditions under which the simulation ends: either it reaches a leaf node, or it reaches a terminal strategy state that cannot be expanded. In order to parallelize the algorithm, we also modify the simulation stage in the way proposed in [28]. Rather than simulating each child state only once, we simulate each child state k times. Here, k is a parameter determined using m and N, where N is the number of nodes in the cluster; k is at least N/m. This ensures that every node in the cluster is occupied during the simulation stage.

Step 4. Backpropagation. After the simulation process is completed, the parameters of all edges on the simulation path must be updated. This process reflects how Monte Carlo tree search samples stronger strategic actions. The update for a single simulation is as follows: N(R, p) ← N(R, p) + 1 and W(R, p) ← W(R, p) + Δw, where Δw represents the increase in cumulative attack revenue obtained during the simulation; the new Q(R, p) = W(R, p) / N(R, p) is then recalculated.
Through the above steps, as the number of samples increases, the Monte Carlo tree grows and covers more states. The shape of the final tree is usually unbalanced: some states are searched very deeply and some very shallowly. This reflects an advantage of Monte Carlo tree search: the most promising branches are searched to a very deep level.
At this point, the Monte Carlo tree search has finished sampling, and the information of all edges has been updated. Based on the final Monte Carlo tree, the defender can choose a defense strategy for the actual attack action in the current situation.
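Putting Steps 1-4 together, a compact sequential sketch of the search loop (a stand-in for the parallel PMCTS variant; the state methods is_terminal, random_successor, and rollout are hypothetical hooks that would encode the NADWM game rules) looks as follows.

```python
import math


class TreeNode:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.visits, self.attack_revenue = 0, 0.0


def mcts(root, iterations, c=1.4, m_children=4, k_sims=2, m_constant=100.0):
    """Monte Carlo tree search sketch following Steps 1-4 (sequential stand-in for PMCTS)."""
    for _ in range(iterations):
        # Step 1: Selection -- descend while the node has children, maximizing Q_D + U
        node = root
        while node.children:
            node = max(node.children,
                       key=lambda ch: (m_constant - ch.attack_revenue / (ch.visits + 1))
                                      + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1)))
        # Step 2: Expansion -- expand the leaf into m random children (defense deployments)
        if not node.state.is_terminal():
            node.children = [TreeNode(node.state.random_successor(), parent=node)
                             for _ in range(m_children)]
        # Step 3: Simulation -- play each child out k times under the game rules
        for child in (node.children or [node]):
            reward = sum(child.state.rollout() for _ in range(k_sims)) / k_sims
            # Step 4: Backpropagation -- update visit counts and accumulated attack revenue
            n = child
            while n is not None:
                n.visits += 1
                n.attack_revenue += reward
                n = n.parent
    # the defender finally picks the child with the lowest average attack revenue
    return min(root.children, key=lambda ch: ch.attack_revenue / ch.visits)
```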

5.3. Algorithm Analysis

The algorithm complexity can be simply expressed as O(mkI/C), where m and k are as defined in Section 5.2, I is the number of iterations, and C is the number of available cores [28].

6. Experiment and Analysis

According to the network game rules, a simulation experiment is designed to test and analyze the effectiveness and applicability of the model and algorithm proposed in this paper.

6.1. Network Attack-Defense Environment
6.1.1. Topological Structure Description

This paper assumes a typical network topology as shown in Figure 4. At the beginning, the attacker penetrates the internal network from an external host, and the defender protects the network information system V, which includes 3 desktops, 4 laptops, 2 printers, and 4 servers. The system is divided into 4 different subnets, and the attacker's target is a server in subnet D.

6.1.2. Formal Description of the Optimal Defense Strategy

A formal description of the attack-defense strategy is carried out according to the method in Section 3. Initially, the attacker has the authority of the external entry host and launches an attack on the network system V. The final attack target is the server in subnet D.

The network topology matrix A is expressed as follows:

The attacker's target is formally expressed as a target vector T, in which the component of the target node is 1, meaning that the attacker aims to obtain the permission of that node. The initial network attack reachable node vector is formally expressed as R_0, whose nonzero components indicate the nodes that the attacker can attack first.

In summary, based on the network shown in Figure 4, the optimal defense strategy selection problem can be described as follows: under given attack-defense resources and the rules of the wargame model, given the network structure G, an initial reachable node vector R_0, and an initial attack-defense posture vector S_0, find the optimal defense strategy against each attack strategy.

6.2. Description of Attack-Defense Resources and Attack-Defense Strategies

For convenience of calculation, resourceA_0 = resourceD_0 = 10 is set at the beginning of the game; that is, the resources of the attacker and the defender are both 10 at first.

In this simulation experiment, when the attacker obtains the authority of a node at a certain cost, it gains the highest (root) privilege on that node.

As an example of the description of an attack-defense strategy in the game: when the attacker has obtained the privilege of a node in subnet A by exploiting a vulnerability, the remaining attack resources and the reachable node vector determine the next move. At this time, the attacker has 21 candidate attack strategies: there are 3 reachable attack nodes to choose from, and the attack power placed on the chosen node can take one of 7 available levels (3 × 7 = 21).

In the above example, the attacker can adopt a set of action strategies in which exactly one reachable node receives a nonzero attack power drawn from the available levels.

At the same time, the defender adopts a set of defense actions that allocates a nonnegative defense force to each node, with the total not exceeding the defender's remaining resources.
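For this state the candidate attack strategies can be enumerated directly. The snippet below is purely illustrative and assumes 3 reachable nodes and 7 available attack-power levels, which reproduces the count of 21 strategies mentioned above; the node labels are hypothetical.

```python
from itertools import product

reachable_nodes = ["v2", "v3", "v4"]   # hypothetical labels for the 3 reachable nodes
power_levels = range(1, 8)             # assumed attack-power levels 1..7

strategies = [(node, power) for node, power in product(reachable_nodes, power_levels)]
print(len(strategies))                 # 21 candidate attack strategies
```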

6.3. Defense Strategy Selection

In this example, we consider two typical attack-defense states that defenders must pay particular attention to during the attack-defense game: the beginning of the game and the end of the game. The selection of the defense strategy in these states is discussed using the method proposed in this paper.

6.3.1. At the Beginning of the Game

The following discusses the defensive force distribution of the defender at the beginning of the attack.

The defender first allocates the defense resources for each node in the network. There are tens of thousands of possible defense resource deployment plans for the abovementioned network. Analyzing the topology of the network structure shown in Figure 4, it can be seen that the key node is a necessary node on the path from subnet C to subnet D, so its authority attribution plays a key role in the network attack-defense. Combined with the network topology, this paper selects the following four typical initial game states for analysis:
Strategy 1: concentrate the defense resources on the two network entrance nodes (5 on each).
Strategy 2: concentrate all the defense resources on the key node.
Strategy 3: defend both the entrance nodes and the key node, balancing the defense resources between them.
Strategy 4: randomly select 10 nodes in the network and distribute the defense resources equally among them.

Considering both the path revenue and the target node revenue of the attacker, the strategy is evaluated according to the relative defense revenue. The larger the relative defense revenue, the better the defense effect of the strategy.

The above four typical strategies are analyzed by relative defense revenue under different simulation times. As shown in Table 2, rows indicate the number of iterations and columns indicate different strategies.

As shown in Table 2, when the number of simulations increases, the relative defense revenue score of each defense strategy gradually stabilizes, allowing the strategies to be ranked; Strategy 3 obtains the highest score. According to the simulation results, we analyze the four strategies in detail.

Strategy 1. This strategy pays attention only to the network entrance nodes. In this strategy, we concentrate the defense resources on the two entrance nodes, deploying 5 resources on each. During the first attack, the attacker has an initial attack resource of 10, so the probability of the attacker entering the intranet is 50%. Suppose the attacker first attacks one of the entrance nodes; if the attack succeeds, the attacker can drive straight into the network and obtain the permissions of the subsequent undefended nodes directly. At the same time, the revenue gained along the attack path is relatively large, so the relative defense revenue of this strategy is at a medium level.

Strategy 2. This strategy pays attention only to the key node of the network. In this strategy, we concentrate all the defense resources on the key node. During the attack, the attacker can successfully obtain 9 undefended nodes with its attack resources, yet it cannot get the permission of the key node and thus cannot enter subnet D. This ensures the security of the target node but lets the attacker accumulate a large path revenue. It is not acceptable that this strategy protects the target yet loses many node permissions along the path: even if the security of the important data server is ensured, a company whose computers, printers, and other devices have all been implanted with Trojan horses or had their files encrypted by the attacker still cannot operate normally. So, in our opinion, this is not a good strategy. This conclusion is in line with the simulation result, which shows the effectiveness of the wargame model.

Strategy 3. This strategy pays attention to the network entrance nodes and the key node at the same time. It is a compromise between the above two strategies. During the attack-defense process, the defender watches both the entrance nodes and the key node; that is, on the way to the target node, the defender balances the defense resources according to the network structure characteristics and sets up two layers of safety protection. The data analysis in Table 2 shows that this is the most suitable defense deployment plan. This plan also conforms to common methods of network protection, which demonstrates the practicability of the wargame model.

Strategy 4. This strategy randomly selects 10 nodes in the network to defend and divides the defense resources equally among them. The strategy is random: since each defended node receives only 1 defense resource, the attacker can capture a defended node without being discovered whenever its attack cost is greater than 1. Because the defender's resources are so scattered, the defense is ineffective even against attacks with a high attack cost.

6.3.2. Near the End of the Game

The following discusses the attack-defense game state and the attack-defense force distribution near the end of the game. At this time, resourceA = 5 and resourceD = 10.

This means that the attacker has already obtained a node in subnet C and will launch an attack towards the target. Since the attacker has used 5 attack resources to reach this node, it can use no more than 5 attack resources in the remaining rounds. Importantly, the key node is the only entry node from subnet C to subnet D. To ensure the safety of the target node, the defense deployment of the defender at this stage is shown in Figure 5.

In this state, whichever vulnerability the attacker uses against the key node, the target node is effectively protected as long as the defense cost deployed there is no less than the corresponding attack cost. As a result, when the defender discovers that the attacker is approaching the target node, the defense should be concentrated on the key node at the entrance of the final subnet.

Through the abovementioned analysis, it can be concluded that the defender should pay more attention to the entry nodes and key nodes at the beginning of the game, and should focus on the key node at the entrance of the last subnet when the attacker approaches the target node. This strategy for the typical states also conforms to common defense practice in network protection, which demonstrates the practicability of the model and algorithm in selecting the optimal defense strategy.

7. Comparison with Related Work

This section compares our work with related work in terms of game models, defense strategies, game rules, and so on.

Liu et al. [29] used the state attack-defense graph in combination with a security vulnerability assessment system to calculate and model the utility matrix; the optimal attack-defense decision was made by computing the mixed-strategy Nash equilibrium. However, since each defense strategy corresponded one-to-one to an attack strategy, this method did not consider all possible defense strategies. Tosh et al. [30] proposed an evolutionary game model framework for cyber threat intelligence sharing, but the quantitative calculation of the attack-defense strategy cost involved excessive subjectivity. Lin et al. [31] proposed a full-information dynamic active defense game model by converting the "virtual node" into a game tree. Despite giving an algorithm for the attack-defense game that suits both complete- and incomplete-information scenarios, the algorithm did not fully consider the attacker's intentions. Zhang et al. [32] proposed a heterogeneous-population evolutionary game model and a corresponding network security defense decision method, improving the accuracy of network security defense decisions.

The comparison of the characteristics of this paper and related research is shown in Table 3.

Compared with the related work, the defense strategy selection method based on the network attack-defense wargame model has the following characteristics:
(i) We model high-level network attack-defense confrontation as a turn-based wargame in which both attackers and defenders can continuously adjust their strategies in response to the attack-defense posture.
(ii) We add a stronger defense strategy, by which the defender can redistribute defense resources when it discovers that the target node is under attack.
(iii) We formulate the wargame model as a multistate dynamic attack-defense game.
(iv) We use the Monte Carlo tree search method to solve for the optimal defense strategy.

8. Conclusions and Future Work

This paper proposed a defense strategy selection method based on the network attack-defense wargame model. We modeled the high-level network attack-defense confrontation process as a turn-based wargame in which both attackers and defenders can continuously adjust their strategies in response to the attack-defense posture. Based on artificial intelligence ideas, we used the Monte Carlo tree search method to solve for the optimal defense strategy in the game confrontation environment. Finally, a simulation example was designed to analyze the attack-defense process of the target network through rapid modeling and quantitative calculation.

Our future research will focus on the following three points. First, stronger network attack-defense strategies will be added to the attack-defense wargame model. Second, the impact of defense deployment at key nodes on the game will be analyzed. Third, the applicability of the attack-defense game model will be improved by considering distributed coordinated attack and defense.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China Nos. 2019QY1302 and 2019QY1305.