Research Article | Open Access
Thanh H. Nguyen, Mason Wright, Michael P. Wellman, Satinder Singh, "Multistage Attack Graph Security Games: Heuristic Strategies, with Empirical Game-Theoretic Analysis", Security and Communication Networks, vol. 2018, Article ID 2864873, 28 pages, 2018. https://doi.org/10.1155/2018/2864873
Multistage Attack Graph Security Games: Heuristic Strategies, with Empirical Game-Theoretic Analysis
We study the problem of allocating limited security countermeasures to protect network data from cyber-attacks, for scenarios modeled by Bayesian attack graphs. We consider multistage interactions between a network administrator and cybercriminals, formulated as a security game. This formulation is capable of representing security environments with significant dynamics and uncertainty and very large strategy spaces. We propose parameterized heuristic strategies for the attacker and defender and provide detailed analysis of their time complexity. Our heuristics exploit the topological structure of attack graphs and employ sampling methods to overcome the computational complexity in predicting opponent actions. Due to the complexity of the game, we employ a simulation-based approach and perform empirical game analysis over an enumerated set of heuristic strategies. Finally, we conduct experiments in various game settings to evaluate the performance of our heuristics in defending networks, in a manner that is robust to uncertainty about the security environment.
Attack graphs are graphical models used in cybersecurity research to decompose complex security scenarios into a hierarchy of simple and quantifiable actions [1–3]. Many research efforts have employed attack graphs or related graphical security models to analyze complex security scenarios [4, 5]. In particular, attack graph models have been used to evaluate network hardening strategies, where a network administrator (the defender) deploys security countermeasures to protect network data from cybercriminals (the attacker) [6–13].
Attack graphs are particularly suitable for modeling scenarios in moving target defense (MTD), where the defender employs proactive tactics to dynamically change network configurations, limiting the exposure of vulnerabilities. MTD techniques are most useful in thwarting progressive attacks [15–17], where system reconfiguration by the defender prevents the attacker from exploiting knowledge accumulated over time. Attack graphs naturally represent the progress of an attack, and the defense actions in our model can incorporate MTD methods.
We build on prior works that represent security problems with attack graphs, and we take a game-theoretic approach to reasoning about the strategic interaction between the defender and the attacker. Building on an existing Bayesian attack graph formalism [11, 18], we model the problem as a simultaneous multistage attack graph security game. Nodes in a Bayesian attack graph represent security conditions of the network system. For example, an SSH buffer overflow vulnerability in an FTP server can be considered a security condition, as can user privileges achieved as a result of exploiting that vulnerability. The defender attempts to protect a set of goal nodes (critical security conditions) in the attack graph. Conversely, the attacker, starting from some initial security conditions, follows paths through the graph to undermine these goal nodes. At every time step, the defender and the attacker simultaneously take actions. Given limited security resources, the defender has to decide for which nodes of the attack graph to deploy security countermeasures. Meanwhile, the attacker selects nodes to attack in order to progress toward the goals. The outcome of the players’ actions (whether the attacker succeeds) follows a stochastic process, which represents the success probability of the actions taken. These outcomes are only imperfectly observed by the defender, adding further uncertainty which must be considered in its strategic reasoning.
Based on our game model, we propose various parameterized strategies for both players. Our attack strategies assess the value of each possible attack action at a given time step by examining the future attack paths they enable. These paths are sequences of nodes which could feasibly be attacked in subsequent time steps (as a result of attack actions in the current time step) in order to reach goal nodes. Since there are exponentially many possible attack paths, it is impractical to evaluate all of them. Therefore, our attack heuristics incorporate two simplifying approximations. First, we estimate the attack value for each individual node locally, based on the attack values of neighboring nodes. Attack values of goal nodes, in particular, correspond to the importance of each goal node. Second, we consider only a small subset of attack paths, selected by random sampling, according to the likelihood that the attacker will successfully reach a goal node by following each path. We present a detailed analysis of the polynomial time complexity of our attack heuristics.
Likewise, our heuristic defense strategies employ simplifying assumptions. At each time step, the defense strategies (i) update the defender’s belief about the outcome of players’ actions in the previous time step and (ii) generate a new defense action based on the updated belief and the defender’s assumption about the attacker’s strategy. For stage (i), we apply particle filtering to deal with the exponential number of possible outcomes. For stage (ii), we evaluate defense candidate actions using concepts similar to those employed in the attack strategies. We show that the running time of our defense heuristics is also polynomial.
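The belief update of stage (i) can be sketched as a generic particle filter. This is a minimal illustration under our own naming (`observation_likelihood`, `particle_filter_step`, signal probabilities `p_true` and `p_false`), not the paper's implementation; the state transition is passed in as a black box.

```python
import random

def observation_likelihood(state, obs, p_true, p_false):
    """Probability of the defender's signal vector `obs` given graph state
    `state`. Signals are independent across nodes: an active node (1) emits
    the "active" signal with probability p_true, and an inactive node emits
    it with the false-positive probability p_false."""
    lik = 1.0
    for s, o in zip(state, obs):
        p_active_signal = p_true if s == 1 else p_false
        lik *= p_active_signal if o == 1 else 1.0 - p_active_signal
    return lik

def particle_filter_step(particles, transition, obs, p_true, p_false, rng):
    """One belief update: propagate each particle through the stochastic
    transition, weight it by the likelihood of the observed signals, and
    resample a full particle set from the weighted distribution."""
    propagated = [transition(p, rng) for p in particles]
    weights = [observation_likelihood(p, obs, p_true, p_false) for p in propagated]
    total = sum(weights)
    if total == 0.0:  # every particle is inconsistent with the observation
        weights = [1.0] * len(propagated)
        total = float(len(propagated))
    return rng.choices(propagated, weights=[w / total for w in weights],
                       k=len(propagated))
```

With a sharp observation model, resampling concentrates the belief on states consistent with the received signals, which is what makes the exponential outcome space tractable.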
Finally, we employ a simulation-based methodology called empirical game-theoretic analysis (EGTA) to construct and analyze game models over the heuristic strategies. We present a detailed evaluation of the proposed strategies based on this game analysis. Our experiments are conducted over a variety of game settings with different attack graph topologies. We show that our defense strategies provide high solution quality compared to multiple baselines. Furthermore, we examine the robustness of defense strategies to uncertainty about game states and about the attacker’s strategy.
2. Related Work
Attack graphs are commonly used to provide a convenient representation for analysis of network vulnerabilities [1–3]. In an attack graph, nodes represent attack conditions of a system, and edges represent relationships among these conditions: specifically, how the achievement of specific conditions through an attacker’s actions can enable other conditions. Various graph-based security models related to attack graphs have been proposed to represent and analyze complex security scenarios. For example, Wang et al. introduced such a model for correlating, hypothesizing, and predicting intrusion alerts. Temporal attack graphs, introduced by Albanese and Jajodia, extend the attack graph model of Wang et al. with additional temporal constraints on the unfolding of attacks. Augmented attack trees were introduced by Ray and Poolsapassit to probabilistically measure the progression of an attacker’s actions toward compromising the system. This concept of augmented attack trees was later used to perform forensic analysis of log files. Vidalis et al. presented vulnerability trees to capture the interdependency between different vulnerabilities of a system. Bayesian attack graphs combine attack graphs with quantified uncertainty on attack states and relations. Revised versions of Bayesian attack graphs incorporate other security aspects such as dynamic behavior and mitigation strategies [18, 26]. Our work is based on the specific formulation of Bayesian attack graphs by Poolsappasit et al. The basic ideas of our game model and heuristic strategies would also apply to variant forms of graph-based security models with reasonable modifications. In particular, our work might be extended to handle zero-day attacks captured by zero-day attack graphs [27–30] by incorporating payoff uncertainty as a result of unknown locations and impacts of zero-day vulnerabilities. In the scope of this work, we consider known payoffs of players.
Though many attack graph models attempt to analyze different progressive attack scenarios without considering countermeasures of a defender, there is an important line of work on graph-based models covering both attacks and defenses. For example, Wu et al. introduced intrusion DAGs to represent the underlying structure of attacker goals in an adaptive, intrusion-tolerant system. Each node of an intrusion DAG is associated with an alert from the intrusion-detection framework, allowing the system to automatically trigger a response. Attack-Response Trees (ARTs), introduced by Zonouz et al., extend attack trees to incorporate possible response actions against attacks.
Given the prominence of graph-based security models, previous work has proposed different game-theoretic solutions for finding an optimal defense policy based on those models. Durkota et al. [8, 9] study the problem of hardening the security of a network by deploying honeypots to the network to deceive the attacker. They model the problem as a Stackelberg security game in which the attacker’s plans are compactly represented using attack graphs. Zonouz et al. present an automated intrusion-response system which models the security problem on an attack-response tree as a two-player, zero-sum Stackelberg stochastic game. Besides Stackelberg games, single-stage simultaneous games are also applied to model the security problem on attack-defense trees, and Nash equilibrium is used to find an optimal defense policy [6, 7, 10].
Game theory has been applied to a variety of cybersecurity problems [31–38]. Previous work has modeled these security problems as dynamic (complete/incomplete/imperfect-information) games and analyzed equilibrium solutions of those games. Our game model (representing security problems with attack graphs) belongs to the class of partially observable stochastic games. Since the game is too complex for analytic solution, we focus on developing heuristic strategies for players and employing the simulation-based methodology EGTA to evaluate these strategies.
Similar to our work, the non-game-theoretic solution proposed by Miehling et al. is also built on the attack graph formalism of Poolsappasit et al. In their work, the attacker’s behavior is modeled by a probabilistic spreading process, which is known by the defender. In our model, on the other hand, both the defender and the attacker dynamically decide on which actions to take at every time step, depending on their knowledge with respect to the game.
3. Game Model
3.1. Game Definition
Definition 1 (Bayesian attack graph). A Bayesian attack graph is a directed acyclic graph, denoted by G = (V, E).

(i) V is a nonempty set of nodes, representing security-related attributes of the network system, including (a) system vulnerabilities; (b) insecure system properties such as corrupted files or memory access permissions; (c) insecure network properties such as unsafe network conditions or unsafe firewall properties; and (d) access privilege conditions such as user or root accounts.

(ii) At time step t, each node u has a state s^t(u), where 0 means u is inactive (i.e., a security condition not compromised by the attacker) and 1 means it is active (compromised). The initial state s^0(u) represents the initial setting of node u. For example, if a node representing the root access privilege on a machine is active, the attacker holds the root access privilege on that machine.

(iii) E is a set of directed edges between nodes in V, with each edge representing an atomic attack action, or exploit. For an edge e = (u, v), u is called a precondition and v a postcondition. For example, suppose node u represents the SSHD BOF vulnerability on a machine and node v represents the root access privilege on that machine; then the exploit e = (u, v) indicates that the attacker can exploit the SSHD BOF vulnerability to obtain the root access privilege. We denote by pre(v) the set of preconditions and by post(v) the set of postconditions associated with node v. An exploit is feasible when its precondition is active.

(iv) Nodes without preconditions are called root nodes. Nodes without postconditions are called leaf nodes.

(v) Each node is assigned a node type. An OR-type node can be activated via any one of the feasible exploits into it. Activating an AND-type node requires all exploits into it to be feasible and taken. Root nodes (AND-type nodes without preconditions) have no prerequisite exploits and so can be activated directly. We denote by V_or the set of all OR-type nodes and by V_and the set of all AND-type nodes. The sets of edges into OR-type nodes and AND-type nodes are denoted by E_or and E_and, respectively.

(vi) The activation probability p(e) of an edge e = (u, v) ∈ E_or represents the probability that the OR-node v becomes active when the exploit e is taken (assuming v is not defended, u is active, and no other exploit is attacked). AND-type nodes are also associated with activation probabilities; p(v) is the probability that an AND-node v becomes active when all exploits into v are taken (assuming v is not defended and all parent nodes of v are active).
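As an illustration, Definition 1 can be captured in a small data structure. This is a sketch under our own naming (`AttackGraph`, `node_type`, and so on), not code from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class AttackGraph:
    """Minimal Bayesian attack graph: nodes are security conditions, edges
    are exploits (precondition -> postcondition), and every node is OR-type
    (activated via any single feasible exploit) or AND-type (requires all
    exploits into it)."""
    nodes: set
    edges: set                     # (u, v) pairs: u precondition, v postcondition
    node_type: dict                # node -> "OR" | "AND"
    edge_prob: dict = field(default_factory=dict)  # (u, v) -> activation probability
    node_prob: dict = field(default_factory=dict)  # AND-node -> activation probability

    def preconditions(self, v):
        return {u for (u, w) in self.edges if w == v}

    def postconditions(self, u):
        return {w for (x, w) in self.edges if x == u}

    def roots(self):
        """Nodes with no preconditions; they can be activated directly."""
        return {v for v in self.nodes if not self.preconditions(v)}

    def leaves(self):
        """Nodes with no postconditions."""
        return {v for v in self.nodes if not self.postconditions(v)}
```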
An example of a Bayesian attack graph based on this definition is shown in Figure 1. Our multistage Bayesian attack graph security game model is defined as follows.
Definition 2 (attack graph game). A Bayesian attack graph security game is defined on a Bayesian attack graph G = (V, E) by the following elements.

(i) Time step: t = 1, …, T, where T is the time horizon.

(ii) Player goal: a nonempty subset V_g ⊆ V of goal nodes are distinguished as critical security conditions. The attacker aims to activate these goal nodes while the defender attempts to keep them inactive.

(iii) Graph state: s^t ⊆ V represents the set of active nodes at time step t.

(iv) Defender observation: o^t associates each node with one of two signals, "active" or "inactive". If a node is active at t, the "active" signal is generated with some true-positive probability; if the node is inactive, the "active" signal is generated with some false-positive probability. Otherwise, the "inactive" signal is generated. The signals are independently distributed, over time and nodes.

(v) Player action: defender action d^t comprises a set of nodes which the defender disables at time step t. Attacker action a^t consists of (i) AND-type nodes, meaning that the attacker takes all exploits into such a node to activate it at time step t, and (ii) exploits into OR-type nodes, meaning that the attacker takes the exploit (u, v) to activate the OR-type node v at time step t.

(vi) Goal reward: each goal node v ∈ V_g is assigned a reward for each player when it is active: r^a(v) is the attacker reward and r^d(v) is the defender reward (i.e., a penalty). For inactive goal nodes, both receive zero.

(vii) Action cost: each action the players take incurs a cost. In particular, c^a(e) is the attacker’s cost to attempt exploit e and c^a(v) is the attacker’s cost to attempt all exploits into an AND-type node v. The defender incurs cost c^d(v) to disable node v.

(viii) Discount factor: rewards and costs accrued at later time steps are discounted by a factor γ per step.
Initially, only the nodes active in the initial setting are active, and no actions have been taken. We assume the defender knows only the initial graph state s^0, whereas the attacker is fully aware of graph states at every time step; the attacker’s observation can thus be taken to be the true graph state. At each time step t, the attacker decides which feasible exploits to attempt. At time step 1, in particular, the attacker can choose any root nodes to activate directly, each succeeding with its activation probability. Simultaneously, the defender decides which nodes to disable to prevent the attacker from intruding further. An example attack graph is shown in Figure 2.
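The defender's noisy observation in Definition 2 can be sketched as follows; the per-node signal probabilities and function name are illustrative, not from the paper.

```python
import random

def defender_observation(state, p_true, p_false, rng):
    """Sample the defender's signal vector: independently per node, an active
    node (1) yields the "active" signal with probability p_true[v], and an
    inactive node yields it (a false positive) with probability p_false[v]."""
    return {v: 1 if rng.random() < (p_true[v] if active else p_false[v]) else 0
            for v, active in state.items()}
```

With perfect signal probabilities (p_true = 1, p_false = 0), the observation reduces to the true graph state; intermediate values produce the missed detections and false alarms the defender must reason about.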
3.2. Network Example
We first briefly present the testbed network introduced by Poolsappasit et al. We then describe a portion of the corresponding Bayesian attack graph, along with the network’s security controls. Our security game model and proposed heuristic strategies for both the defender and the attacker are built on their model.
Overall, the testbed network consists of eight hosts located within two subnets: a DMZ zone and a Trusted zone. The DMZ zone includes a mail server, a DNS server, and a web server. The Trusted zone has two local desktops, an administrative server, a gateway server, and an SQL server. A trihomed DMZ firewall with a set of policies separates the servers in the DMZ from the local network; the attacker has to pass the DMZ firewall to attack the network. The web server in the DMZ zone can send SQL queries to the SQL server in the Trusted zone on a designated channel. In the Trusted zone, a NAT firewall forces local machines (located behind the NAT) to communicate with external parties through the gateway server. Finally, the gateway server monitors remote connections through SSHD.
Each machine in the testbed network has several vulnerabilities which can be exploited by the attacker. For example, the gateway server is subject to heap corruption and an improper cookie handler in OpenSSH. In addition, the SQL and DNS servers have to deal with SQL injection and DNS cache poisoning vulnerabilities, respectively. The defender can deploy security controls, such as limiting access to the DNS server to tackle the DNS cache poisoning vulnerability.
A portion of the Bayesian attack graph of the testbed network is shown in Figure 3; the complete graph can be found in Poolsappasit et al. Each grey rectangle represents a security attribute of the network. For example, the attacker can exploit the stack BOF at the local desktops to obtain the root access privilege on those desktops. By exploiting that root access privilege, the attacker can then achieve error message leakage at the mail server and DNS cache poisoning at the DNS server, and in turn obtain identity theft and information leakage by exploiting the error message leakage at the mail server. To prevent such attack progression, the defender can deploy the MS workaround to deal with the stack BOF vulnerability. The defender can also deploy POP3 and limit access to the DNS server to address the error message leakage and DNS cache poisoning vulnerabilities, respectively. Finally, encryption and digital signatures could be used to resolve the identity theft issues at the DNS and mail servers.
3.3. Timing of Game Events
The game proceeds in discrete time steps, t = 1, …, T, with both players aware of the current time. At each time step t, the following sequence of events occurs.

(1) Observations: the attacker observes the true graph state; the defender observes its signal vector.

(2) The attacker and defender simultaneously select actions according to their respective strategies.

(3) The environment transitions to its next state according to the transition function (Algorithm 1).

(4) The attacker and defender assess rewards (and/or costs) for the time step.
When an active node is disabled by the defender, that node becomes inactive. If a node is activated by the attacker at the same step it is being disabled by the defender, the node remains inactive.
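The tie-breaking rule above (defense overrides attack within a step) can be sketched as a state-transition function. This is our own reading of the transition rules; Algorithm 1 itself is not reproduced here, and the names and the treatment of root nodes (AND-nodes with empty precondition sets) are assumptions.

```python
import random

def transition(state, preconds, edge_prob, node_prob,
               or_exploits, and_attacks, defense, rng):
    """One step of the graph-state transition. Each attempted exploit (u, v)
    into an OR-node succeeds independently with edge_prob[(u, v)] if u is
    active; an attacked AND-node v succeeds with node_prob[v] if all its
    preconditions are active (vacuously true for root nodes). Any node in
    `defense` ends the step inactive, even if it was just activated."""
    nxt = dict(state)
    for (u, v) in or_exploits:
        if state[u] == 1 and rng.random() < edge_prob[(u, v)]:
            nxt[v] = 1
    for v in and_attacks:
        if all(state[u] == 1 for u in preconds[v]) and rng.random() < node_prob[v]:
            nxt[v] = 1
    for v in defense:  # defense wins ties: disabled nodes are inactive
        nxt[v] = 0
    return nxt
```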
3.4. Payoff Function
We denote by h the game history, which consists of all actions and resulting graph states at each time step. At time t, the graph state s^t results when the attacker plays a^t, the defender plays d^t, and the previous graph state is s^{t-1}. The defender’s and attacker’s payoffs with respect to h comprise goal rewards and action costs: each player’s payoff is the discounted sum, over all time steps, of its rewards for the goal nodes active at that step and the costs of the actions it took. Both players aim to maximize expected utility with respect to the distribution of h. Since the game is too complex for analytic solution, we propose heuristic strategies for both players and employ the simulation-based methodology EGTA to evaluate these strategies. Our heuristic strategies for each player are categorized based on (i) the assumptions regarding the opponent’s strategy and (ii) the heuristic methods used to generate actions at each time step. The resulting strategies can be organized in a hierarchical view, described in the following sections.
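The defender-side payoff computation can be sketched as below; the history encoding and names are ours, and we assume discounting starts at the first time step (the paper's exact discount indexing is not reproduced here).

```python
def defender_payoff(history, goal_reward, disable_cost, gamma):
    """Discounted defender payoff over a game history, given as a list of
    (graph_state, defense_action) pairs, one per time step. goal_reward maps
    each goal node to the defender's (negative) reward when it is active;
    disable_cost maps each node to the defender's (negative) cost to disable
    it."""
    total = 0.0
    for t, (state, defense) in enumerate(history, start=1):
        step = sum(r for v, r in goal_reward.items() if state.get(v) == 1)
        step += sum(disable_cost[v] for v in defense)
        total += gamma ** t * step
    return total
```

The attacker's payoff has the same shape, with positive goal rewards and attack costs in place of disabling costs.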
4. Level-0 Heuristic Strategies
We define a level-0 strategy as one that does not explicitly invoke an assumption about its opponent’s strategy. Higher-level strategies do invoke such assumptions, generally that the others play one level below, in the spirit of cognitive hierarchy models. In our context, level-0 heuristic strategies directly operate on the topological structure of the Bayesian attack graph to decide on actions to take at each time step.
4.1. Level-0 Defense Strategies
We introduce four level-0 defense strategies. Each targets a specific group of nodes to disable at each time step. These groups of nodes are chosen solely based on the topological structure of the graph.
4.1.1. Level-0 Uniform Defense Strategy
The defender chooses nodes in the graph to disable uniformly at random. The number of nodes chosen is a certain fraction of the total number of nodes in the graph.
4.1.2. Level-0 Min-Cut Uniform Defense Strategy
The defender chooses nodes in the min-cut set to disable uniformly at random. The min-cut set is the minimum set of edges such that removing them disconnects the root nodes from the goal nodes. The number of chosen nodes is a certain fraction of the cardinality of the min-cut set.
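The min-cut between the roots and the goals can be computed with a single max-flow pass. The text defines the cut over edges but has the defender disable nodes; how cut edges map to nodes is left open (one might, for instance, disable the postcondition of each cut edge), so this sketch, with unit edge capacities, a super-source/super-sink construction, and Edmonds-Karp, simply returns the cut-edge set. All names are ours.

```python
from collections import deque, defaultdict

def min_cut_edges(edges, roots, goals):
    """Minimum set of (unit-capacity) edges whose removal disconnects every
    root node from every goal node: run Edmonds-Karp max-flow from a
    super-source to a super-sink, then return the saturated edges crossing
    from the source side of the residual graph to the sink side."""
    SRC, SNK = "__src__", "__snk__"
    INF = float("inf")
    cap = defaultdict(float)
    adj = defaultdict(set)

    def add_edge(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # residual direction

    for (u, v) in edges:
        add_edge(u, v, 1.0)
    for r in roots:
        add_edge(SRC, r, INF)
    for g in goals:
        add_edge(g, SNK, INF)

    def augmenting_path():
        parent = {SRC: None}
        queue = deque([SRC])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    if v == SNK:
                        return parent
                    queue.append(v)
        return None

    while (parent := augmenting_path()) is not None:
        path, v = [], SNK
        while v != SRC:  # walk back from sink to source
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[e] for e in path)
        for (u, v) in path:
            cap[(u, v)] -= flow
            cap[(v, u)] += flow

    # Nodes still reachable from the source in the residual graph.
    side, queue = {SRC}, deque([SRC])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in side and cap[(u, v)] > 0:
                side.add(v)
                queue.append(v)
    return {(u, v) for (u, v) in edges if u in side and v not in side}
```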
4.1.3. Level-0 Root-Node Uniform Defense Strategy
The defender chooses root nodes to disable uniformly at random. The number of nodes chosen is a certain fraction of the total number of root nodes in the graph.
4.1.4. Level-0 Goal-Node Defense Strategy
The defender randomly chooses goal nodes to disable, with probabilities depending on the rewards and costs associated with these goal nodes. The probability of disabling each goal node is based on the conditional logistic function: the probability of choosing a goal node is proportional to the exponential of the potential value the defender receives for disabling it, scaled by a predetermined parameter of the logistic function. This parameter governs how strictly the choice follows assessed defense values. If the parameter is zero, the goal-node defense strategy chooses each goal node to disable uniformly at random; in the limit of a large parameter, the strategy disables only the nodes with the highest assessed values. The number of nodes chosen is a certain fraction of the number of goal nodes; we draw that many nodes from the distribution.
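The conditional-logistic draw can be sketched as a softmax selection without replacement; `eta` plays the role of the predetermined parameter, and the function name is ours.

```python
import math, random

def logistic_choice(values, eta, k, rng):
    """Draw k distinct items where, at each draw, item v is selected with
    probability proportional to exp(eta * values[v]) among the remaining
    items: eta = 0 reduces to uniform random choice, and large eta
    concentrates the draws on the highest-valued items."""
    remaining = list(values)
    chosen = []
    for _ in range(min(k, len(remaining))):
        weights = [math.exp(eta * values[v]) for v in remaining]
        r, acc = rng.random() * sum(weights), 0.0
        for v, w in zip(remaining, weights):
            acc += w
            if r <= acc:
                chosen.append(v)
                remaining.remove(v)
                break
    return chosen
```

The same selection rule reappears later in the attack strategies, applied to assessed attack values instead of defense values.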
4.2. Level-0 Attack Strategies
4.2.1. Attack Candidate Set
At time step t, based on the graph state, the attacker needs to consider only exploits into OR-type nodes and AND-type nodes that can change the graph state at the next step. We call this set of OR-exploits and AND-nodes the attack candidate set at time t. Essentially, the candidate set consists of (i) exploits from active preconditions to inactive OR-type postconditions and (ii) inactive AND-type nodes for which all preconditions are active. Each such AND-node or OR-exploit is considered a candidate attack at t; an attack action at t can be any subset of this candidate set. For example, in Figure 2, the attack candidate set for the current graph state consists of (i) all green edges and (ii) all green nodes. In particular, the attacker can perform an exploit to activate the currently inactive node 9, and can also attempt to activate the root nodes 0 and 3.
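The candidate-set construction can be sketched directly from this description; the dictionary-based encoding and names are ours.

```python
def attack_candidates(state, preconds, node_type):
    """Attack candidate set at the current graph state: (i) exploits (u, v)
    from an active precondition u into an inactive OR-type node v, and
    (ii) inactive AND-type nodes whose preconditions are all active
    (vacuously true for root nodes, which have no preconditions)."""
    or_exploits, and_nodes = set(), set()
    for v, t in node_type.items():
        if state[v] == 1:
            continue  # already active: attacking it changes nothing
        if t == "OR":
            or_exploits |= {(u, v) for u in preconds[v] if state[u] == 1}
        elif all(state[u] == 1 for u in preconds[v]):
            and_nodes.add(v)
    return or_exploits, and_nodes
```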
To find an optimal attack action to take at time step t, we need to determine the attack value of each possible attack action, which represents the action’s impact on activating the goal nodes by the final time step T. However, exactly computing the attack value of each attack action requires taking into account all possible future outcomes of that action, which is computationally expensive. In the following, we propose a series of heuristic attack strategies of increasing complexity.
4.2.2. Level-0 Uniform Attack Strategy
Under this strategy, the attacker chooses a fixed fraction of the candidate set, uniformly at random.
4.2.3. Level-0 Value-Propagation Attack Strategy
Attack Value Propagation. The value-propagation strategy chooses attack actions based on a quantitative assessment of each attack in the candidate set. The main idea of this strategy is to approximate the attack value of each individual inactive node locally, based on the attack values of its inactive postconditions. Attack values of the inactive goal nodes, in particular, correspond to the attacker’s rewards at these nodes. Essentially, the attacker rewards at inactive goal nodes are propagated backward to other nodes, with the cost of attacking and the activation probabilities incorporated accordingly. Since there are multiple propagation paths from goal nodes to each node, the attack value of a node is computed as the maximum value among the propagation paths reaching that node. This propagation process is illustrated in Algorithm 2, which approximates the attack values of every inactive node in polynomial time.
Algorithm 2 leverages the directed acyclic topological structure of the Bayesian attack graph to perform the goal-value propagation faster. We sort nodes according to the graph’s topological order and start the propagation from leaf nodes following the inverse direction of the topological order. By doing so, we ensure that when a node is examined in the propagation process, all postconditions of that node have already been examined. As a result, we need to examine each node only once during the whole propagation process.
In Algorithm 2, line 1 specifies the input of the algorithm, which includes the current time step t, the graph state in the previous time step, and the inverse topological order of the graph. Line 2 initializes the attack values of inactive nodes with respect to each inactive goal node and each path length. Intuitively, this value captures the attack value of a node with respect to propagation paths of a given length from an inactive goal node to that node; given the time horizon T, we consider only paths of length up to the number of remaining time steps. At each iteration of evaluating a particular inactive node, Algorithm 2 examines all inactive postconditions of that node and estimates the attack value propagated from each postcondition. If the postcondition is of AND-type, its propagated attack value is distributed across all of its inactive preconditions, including the node under evaluation (line 7); the propagation parameter regulates the amount of distributed value, and at its default setting the value is divided equally among these inactive preconditions. If the postcondition is of OR-type, the node receives its full propagated attack value (line 9). Since there are multiple propagation paths reaching each node, Algorithm 2 keeps the maximum propagated value (line 11). Finally, the attack value of each inactive node is computed as the maximum over inactive goal nodes and path lengths (line 12). An example of Algorithm 2 is illustrated in Figure 4.
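A simplified version of the propagation can be sketched as follows. This collapses the paper's per-goal bookkeeping into a single table indexed by path length, applies the discount and activation probability at each hop, and splits AND-node value equally among inactive preconditions (i.e., the propagation parameter is fixed at its equal-split setting); the names are our assumptions.

```python
def propagate_attack_values(rev_topo, preconds, node_type, state,
                            goal_reward, edge_prob, node_prob, gamma, horizon):
    """Backward goal-value propagation in the spirit of Algorithm 2.
    value[v][tau] is the best attacker value reachable from inactive node v
    via a propagation path of length tau; rewards start at inactive goal
    nodes (tau = 0) and flow backward along the inverse topological order,
    so every node is examined after all of its postconditions. Each hop
    multiplies in the activation probability and one discount factor;
    AND-nodes split their value equally among inactive preconditions."""
    value = {v: [0.0] * (horizon + 1) for v in rev_topo}
    for w, r in goal_reward.items():
        if state[w] == 0:
            value[w][0] = r
    for v in rev_topo:
        if state[v] == 1:
            continue
        for tau in range(horizon):
            if value[v][tau] <= 0.0:
                continue
            inactive = [u for u in preconds[v] if state[u] == 0]
            if not inactive:
                continue
            if node_type[v] == "AND":
                prop = node_prob[v] * gamma * value[v][tau] / len(inactive)
                for u in inactive:
                    value[u][tau + 1] = max(value[u][tau + 1], prop)
            else:  # OR: each exploit carries the full propagated value
                for u in inactive:
                    value[u][tau + 1] = max(value[u][tau + 1],
                                            edge_prob[(u, v)] * gamma * value[v][tau])
    return {v: max(value[v]) for v in rev_topo}
```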
Proposition 3. The time complexity of Algorithm 2 is O(|V_g| · T · (|V| + |E|)), where |V| is the number of nodes, |E| the number of edges, |V_g| the number of goal nodes, and T the time horizon.

Proof. In Algorithm 2, line 2 initializes attack values of inactive nodes of the attack graph with respect to each inactive goal node and time step; the time complexity of this step is O(|V_g| · T · |V|). Algorithm 2 then iteratively examines each inactive node once, following the inverse direction of the topological order. The attack value of each node with respect to each inactive goal node and time step is estimated locally from its neighboring nodes; in other words, for each goal node and time step, Algorithm 2 iterates over each edge of the graph once, so the time complexity of this step is O(|V_g| · T · |E|). Finally, line 12 computes the maximum propagated attack value for each node, which takes O(|V_g| · T · |V|) time. Therefore, the total time complexity of Algorithm 2 is O(|V_g| · T · (|V| + |E|)).
Probabilistic Selection of Attack Action. Based on the attack values of inactive nodes, we approximate the value of each candidate attack in the candidate set, taking into account the cost of the attack and the corresponding activation probability. Finally, the attack strategy selects attacks to execute probabilistically, based on the assessed attack values. It first determines the number of attacks to execute, then selects that number of attacks using a conditional logistic function: the probability that each exploit is selected is proportional to the exponential of its assessed attack value, scaled by a model parameter. The probability for an AND-node in the candidate set is defined similarly. The model parameter governs how strictly the choice follows assessed attack values.
4.2.4. Level-0 Sampled-Activation Attack Strategy
Like the value-propagation strategy, the sampled-activation attack strategy selects actions based on a quantitative assessment of relative value. Rather than propagating backward from goal nodes, this strategy constructs estimates by forward sampling from the current candidate set.
Random Activation Process. The sampled-activation attack strategy aims to sample paths of activation from the current graph state to activate each inactive goal node. For example, in Figure 4, one possible sequence of nodes to activate the inactive goal node 10 is as follows: (i) activate node 0; (ii) if 0 becomes active, perform the exploits into node 6 to activate it; (iii) if 6 becomes active, perform the exploits into nodes 8 and 9 to activate them; and (iv) if nodes 8 and 9 become active, perform the exploits into goal node 10 to activate it. In fact, many possible paths can be selected to activate each inactive goal node of the graph. Therefore, the sampled-activation attack strategy selects each path probabilistically, based on the probability that the inactive goal node will become active if the attacker follows that path.
In particular, for each node, the sampled-activation attack strategy keeps track of the set of preconditions selected in the sampled-activation process to activate that node. For an AND-node, this set consists of all of its preconditions. For an OR-node, we randomly select a single precondition to use to activate it; selecting only one precondition per inactive OR-node simplifies the computation of the probability that a node becomes active in the random activation. This randomized selection is explained below. Each inactive node is assigned an activation probability and an activation time step according to the random action the attacker takes, representing the probability that, and the time step at which, the node becomes active if the attacker follows the sampled action sequence. These values are computed under the assumptions that the defender takes no action and the attacker attempts to activate each node on the activation paths once.
The random activation process is illustrated in Algorithm 3. In this process, we use the topological order of the attack graph to perform the random activation. Following this order ensures that all preconditions are visited before any corresponding postconditions, so we need to examine each node only once. When visiting an inactive OR-node, the attacker randomly chooses a precondition (from which to activate that node) with a probability computed from the activation probability of the corresponding exploit and the activation probability of the associated precondition (line 6). Intuitively, this product is the probability that the OR-node becomes active if the attacker chooses that exploit to activate it in the random activation; the higher the product, the higher the chance that this precondition is selected. We then update the OR-node’s activation probability and activation time step with respect to the selected precondition (lines 7–9).
When visiting an inactive AND-node, all preconditions of that node are required to activate it (line 13). Thus, the activation time step of the AND-node is computed based on the maximum activation time step of its preconditions (line 12). Furthermore, an AND-node can become active only when all of its preconditions are active, so its activation probability involves the activation probabilities of all of its preconditions. These probabilities depend on the sequences of nodes (which may not be disjoint) chosen in the random activation process to activate the preconditions. Therefore, we backtrack over all nodes in the activation process of the AND-node, collecting the sequence of distinct nodes involved in activating it. For example, in Figure 4, suppose the sequence of nodes chosen to activate the inactive goal node 10 is as follows: (i) activate node 0; (ii) if 0 becomes active, perform the exploits into node 6 to activate it; (iii) if 6 becomes active, perform the exploits into nodes 8 and 9 to activate them; and (iv) if nodes 8 and 9 become active, perform the exploits into goal node 10 to activate it. The collected sequence for node 10 then consists of nodes 0, 6, 8, and 9. Essentially, following the random activation process, the AND-node can be activated only when all nodes in this sequence are active. Therefore, its activation probability is the product of the activation probabilities of all edges and nodes involved in activating it: the chosen exploit probabilities for the OR-nodes in the sequence and the node activation probabilities for the AND-nodes.
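The forward sampling pass can be sketched as below. For brevity, this version computes each node's activation probability recursively from its chosen preconditions, which double-counts shared ancestors; the paper's Algorithm 3 avoids this by backtracking over the exact sequence of distinct nodes. Names and encodings are ours.

```python
import random

def random_activation(topo, preconds, node_type, state, edge_prob, node_prob, rng):
    """Walk the graph in topological order (preconditions before
    postconditions) and sample one way to activate every inactive node.
    An OR-node picks a single precondition with probability proportional to
    that precondition's sampled activation probability times the exploit's
    success probability; an AND-node uses all preconditions. Returns per-node
    activation probabilities, activation time steps, and chosen preconditions."""
    p_act, t_act, chosen = {}, {}, {}
    for v in topo:
        if state[v] == 1:
            p_act[v], t_act[v], chosen[v] = 1.0, 0, set()
        elif node_type[v] == "AND":
            chosen[v] = set(preconds[v])
            p_act[v] = node_prob[v]
            for u in chosen[v]:
                p_act[v] *= p_act[u]
            t_act[v] = 1 + max((t_act[u] for u in chosen[v]), default=0)
        else:  # OR: sample one precondition to come in through
            pres = sorted(preconds[v])
            weights = [p_act[u] * edge_prob[(u, v)] for u in pres]
            u = rng.choices(pres, weights=weights, k=1)[0]
            chosen[v] = {u}
            p_act[v] = p_act[u] * edge_prob[(u, v)]
            t_act[v] = t_act[u] + 1
    return p_act, t_act, chosen
```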
Proposition 4. The time complexity of Algorithm 3 is O(|V|²), where V is the set of nodes of the attack graph.
Proof. In the random activation process, for each visited node, Algorithm 3 updates the activation probability and activation time based on the node's preconditions; over all nodes, these updates take O(|E|) time, where E is the set of edges. On the other hand, for each visited node, Algorithm 3 backtracks over all nodes in the sequence of activating that node, which takes O(|V|) time; over all nodes, the backtracking thus takes O(|V|²) time. Therefore, the time complexity of Algorithm 3 is O(|E| + |V|²) = O(|V|²).
Expected Utility of the Attacker. At the end of a random activation, we obtain a sequence of nodes chosen to activate each inactive goal node. We can thus estimate the attack value of each subset of inactive goal nodes under the random activation, based on Proposition 5.
Proposition 5. At time step t, given the current graph state, suppose the attacker follows a random activation process to activate a subset of inactive goal nodes. If the defender takes no further action, the attacker obtains an expected utility computed as follows: the sum, over the goal nodes in the subset, of each goal's attack reward weighted by its activation probability under the random activation, minus the sum, over all ∧-nodes in the chosen activation sequences, of each node's activation cost weighted by the probability that all of its preconditions become active, minus the sum, over all ∨-nodes in the chosen sequences, of each node's activation cost weighted by the probability that its chosen precondition becomes active.
Proof. In this computation, the first term accounts for the expected rewards of the goal nodes in the subset. In particular, for each goal node in the subset, its activation probability is the probability that the node becomes active at the corresponding time step if the attacker follows the random activation process; otherwise the node remains inactive. Therefore, the attacker receives, for each goal node in the subset, an expected reward equal to the goal's reward weighted by this activation probability. Furthermore, the second and third terms account for the costs of activating inactive nodes in the corresponding sampled-activation sequences. For each ∧-node in the sequences, the probability that all of its preconditions become active is the probability that the attacker actually attempts, and thus pays the cost of, the corresponding exploit. Similarly, for each ∨-node, the probability that its chosen precondition becomes active is the probability that the attacker pays the cost of activating that ∨-node.
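Under the same assumed encoding (per-node activation probabilities `p_act` from a random-activation pass, hypothetical `reward` and `cost` tables), the utility of Proposition 5 might be computed as follows; time discounting is omitted for brevity.

```python
def attack_value(goal_subset, node_type, preds, chosen, p_act,
                 reward, cost, active):
    """Expected attacker utility of trying to activate goal_subset:
    expected goal rewards minus expected activation costs over the union
    of the sampled activation sequences (each node counted once)."""
    # union of the backtracked sequences of all goals in the subset
    seq = set()
    for g in goal_subset:
        stack = [g]
        while stack:
            v = stack.pop()
            if v in active or v in seq:
                continue
            seq.add(v)
            stack.extend(preds[v] if node_type[v] == 'AND' else [chosen[v]])
    value = sum(p_act[g] * reward[g] for g in goal_subset)
    for v in seq:
        if node_type[v] == 'AND':
            # pay cost[v] only when all preconditions of v are active
            p_ready = 1.0
            for w in preds[v]:
                p_ready *= p_act[w]
        else:
            # pay cost[v] only when the sampled precondition is active
            p_ready = p_act[chosen[v]]
        value -= p_ready * cost[v]
    return value
```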
Greedy Attack. For each random activation, the sampled-activation attack strategy aims to find a subset of inactive goal nodes to activate which maximizes the attack value. However, finding an optimal subset of inactive goal nodes is computationally expensive, because there is an exponential number of subsets of inactive goal nodes to consider. Therefore, we use a greedy approach to find a reasonable subset of inactive goal nodes to attempt to activate. Given the current subset of selected inactive goal nodes (initially empty), we iteratively find the next best inactive goal node, namely, the one whose addition maximizes the attack value of the resulting subset, and add it to the selected subset. This greedy process continues until the attack value stops increasing. Based on the chosen goal subset, we obtain a corresponding candidate subset of nodes which need to be activated in the current time step, according to the sampled-activation process, in order to activate the goal subset subsequently. We assign the value of the goal subset to this candidate subset.
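The greedy loop can be sketched generically; here `value_fn` stands for the attack-value estimate of a goal subset under one random activation (a hypothetical callback, not the paper's notation).

```python
def greedy_goal_subset(inactive_goals, value_fn):
    """Greedily add the goal whose inclusion most increases the attack
    value; stop when no single addition improves it."""
    chosen, best = set(), value_fn(set())
    while True:
        candidates = [(value_fn(chosen | {g}), g)
                      for g in inactive_goals - chosen]
        if not candidates:
            break
        v, g = max(candidates)
        if v <= best:            # attack value stopped increasing
            break
        chosen.add(g)
        best = v
    return chosen, best
```

As usual with greedy selection, the result is a reasonable rather than optimal subset: a goal that is unprofitable alone is never reconsidered even if it would pay off in combination with later additions.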
Finally, by running random activation multiple times, we obtain a set of candidate subsets, each associated with an estimated attack value. The attacker action at the current time step is then randomly chosen among these candidate subsets following a conditional logistic distribution with respect to their attack values.
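The conditional logistic (softmax) choice can be sketched as follows; the precision parameter `lam` is our placeholder for whatever logit parameter the strategy uses.

```python
import math
import random

def logit_choice(candidates, values, lam=1.0, rng=None):
    """Pick one candidate subset with probability proportional to
    exp(lam * value): higher-valued subsets are chosen more often,
    but not deterministically."""
    rng = rng or random.Random(0)
    m = max(values)   # subtract the max for numerical stability
    weights = [math.exp(lam * (v - m)) for v in values]
    return rng.choices(candidates, weights=weights, k=1)[0]
```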
Proposition 6. Suppose that the sampled-activation attack strategy runs random activation N times; the time complexity of this attack strategy is O(N(|V|² + |G|²|V|)), where G is the set of goal nodes.
Proof. For each random activation, it takes O(|V|²) time to sample activation paths for the attacker to activate each goal node (Proposition 4). Furthermore, at each iteration of the greedy attack heuristic, given the current chosen goal subset, the strategy examines all inactive goal nodes to find the next best goal node. Each such examination of an inactive goal node requires computing the expected utility of the attacker following the random activation to activate the augmented goal subset, which takes O(|V|) time (Proposition 5). Thus, the time complexity of each iteration of the greedy attack heuristic is O(|G||V|). Since the greedy heuristic runs at most |G| iterations, its time complexity is O(|G|²|V|). The sampled-activation attack strategy comprises N paired executions of random activation and the greedy attack heuristic. Therefore, the complexity of this attack strategy is O(N(|V|² + |G|²|V|)).
While level-0 heuristic strategies do not take the opponent's strategy into account, level-1 heuristic strategies assume that the opponent plays a level-0 strategy. In the scope of this work, we focus on studying level-1 heuristic defense strategies.
5. Level-1 Defense Strategies
5.1. Defender Strategic Reasoning
Level-1 defense strategies assume the attacker plays level-0 attack strategies. At each time step, because the defender does not know the true graph state, it is important for her to reason about possible graph states before choosing defense actions. An overview of the defender belief update is shown in Figure 5.
As mentioned before, in our game, the defender knows the initial graph state, in which all nodes are inactive. As the game evolves, at the end of each time step, the defender updates her belief on possible graph states, taking into account (i) her belief on graph states at the end of the previous time step; (ii) her action at the current time step; (iii) her observations after the players' actions at the current time step are taken; and (iv) the assumed attacker strategy. Finally, based on this updated belief and the assumption of which heuristic strategy the attacker plays, the defender decides which defense action to take at the next time step. In the following, we first study the defender's belief update on graph states at each time step and then propose different defense heuristic strategies.
5.2. Defender Belief Update
5.2.1. Exact Inference
We denote by b_t the defender's belief at the end of time step t, where b_t(s) is the probability that the graph state at time step t is s, and the probabilities sum to one. At time step 0, the belief assigns probability 1 to the initial all-inactive state. Based on the defender's belief at time step t−1, her action at time step t, and her observation at time step t, we can update the defender's belief based on Bayes' rule: the updated belief in each state is proportional to the probability that the defender receives her observation given that state, multiplied by the total probability of transitioning into that state from the states in the previous belief. Because the alerts with respect to each node are independent of other nodes, we can compute this observation probability as the product of the observation probabilities of the individual nodes. In addition, the belief state set associated with the belief at time step t−1 consists of the states assigned positive probability by that belief. The probability of a state transition is computed as the product of the transition probabilities of every node, where each node's transition probability is computed in Algorithm 1. Because the transition depends on the players' actions, we sum over all possible attack actions with respect to the previous graph state, weighting each action by the probability that the attacker takes it at time step t, as determined by the assumed level-0 attack strategy.
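A minimal sketch of this exact update, assuming a precomputed transition model `trans_prob[s][s2]` that already marginalizes over the assumed attacker's actions (the sum over attack actions in the text):

```python
def belief_update(belief, trans_prob, obs_prob, obs):
    """One exact Bayesian filtering step over graph states:
    b'(s2) is proportional to Pr(obs | s2) * sum_s b(s) * Pr(s2 | s)."""
    new = {}
    for s2 in {s2 for s in belief for s2 in trans_prob[s]}:
        p = obs_prob(obs, s2) * sum(b * trans_prob[s].get(s2, 0.0)
                                    for s, b in belief.items())
        if p > 0:
            new[s2] = p
    z = sum(new.values())
    # if the observation is impossible under every state, keep the old belief
    return {s2: p / z for s2, p in new.items()} if z else dict(belief)
```

The cost of this update grows with the number of reachable states, which motivates the approximate inference discussed next in the text.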
5.2.2. Approximate Inference
Exactly computing the defender's belief (10) over all possible graph states at each time step is impractical. Indeed, there are exponentially many graph states to explore, as well as an exponential number of possible attack actions. To overcome this computational challenge, we apply particle filtering, a Monte Carlo sampling method for performing state inference given noisy observations at each time step. This approach allows us to limit the number of graph states and attack actions considered. An overview of the particle filtering method is shown in Figure 6.
Essentially, the method samples a fixed number of particles, each of which is a sample of the graph state at the current time step. Given the belief at the end of the previous time step, which provides a probability distribution over the corresponding belief state set, we randomly sample a graph state based on this distribution. Given the sampled graph state, we sample an attack action according to the defender's assumption about which level-0 heuristic strategy the attacker plays. We then sample a new graph state at the current time step based on (i) the sampled previous graph state; (ii) the sampled attack action; and (iii) the defender's action. Finally, we update the defender's belief over the sampled graph states based on the defender's observation at the current time step.
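One step of this particle filter might look as follows; the callbacks `sample_attack`, `sample_next`, and `obs_lik` are hypothetical stand-ins for the assumed level-0 attack strategy, the per-node transition model of Algorithm 1, and the observation model.

```python
import random

def particle_filter_step(particles, sample_attack, sample_next, obs_lik,
                         defender_action, obs, rng=None):
    """One particle-filtering step of the defender belief update:
    propagate each particle through a sampled attacker action and the
    state transition, then reweight and resample on the observation."""
    rng = rng or random.Random(0)
    propagated, weights = [], []
    for s in particles:
        a = sample_attack(s, rng)                  # assumed level-0 attacker
        s2 = sample_next(s, a, defender_action, rng)
        propagated.append(s2)
        weights.append(obs_lik(obs, s2))
    if sum(weights) == 0:          # degenerate case: keep particles uniform
        return propagated
    return rng.choices(propagated, weights=weights, k=len(particles))
```

The resampled particle set serves as the defender's approximate belief: each believed state is represented by its share of particles rather than an explicit probability.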
Finding an optimal defense action given the defender's updated belief requires taking into account an assumption regarding the attacker's strategy, as well as all possible future outcomes of the game resulting from the defender's action, which is computationally expensive. Therefore, in the following sections, we propose two new attack-modeling-based defense strategies that take the defender's belief into account: the value-propagation and sampled-activation defense strategies. These two defense strategies use concepts similar to the value-propagation and sampled-activation attack strategies, respectively.
5.3. Defense Candidate Set
At each time step, the defender has a belief over possible graph states at the end of the previous time step. For each state in the belief set, we define the defense candidate set, which consists of the active goal nodes together with the ∧-nodes and the ∨-postconditions of the exploits in the corresponding attack candidate set. We aim to disable not only active goal nodes but also these candidate nodes, to prevent the attacker from intruding further.
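A sketch of building the defense candidate set for one believed state, assuming ∧-node exploits are represented by their node and ∨-node exploits by (precondition, postcondition) edges; the encoding is our assumption.

```python
def defense_candidates(active, goals, attack_exploits):
    """Candidate nodes to disable: active goal nodes, plus the target node
    of each exploit in the attack candidate set (the ∧-node itself, or the
    ∨-postcondition of an edge exploit)."""
    cand = {g for g in goals if g in active}
    for e in attack_exploits:
        cand.add(e[1] if isinstance(e, tuple) else e)
    return cand
```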
5.4. Level-1 Value-Propagation Defense Strategy
The overview of the level-1 value-propagation defense strategy is illustrated in Figure 7. For each graph state in the belief set, based on the assumption of which heuristic strategy is played by the attacker, the defense strategy first samples a set of attack actions. The strategy then estimates the defense value of each node in the defense candidate set based on the graph state and the set of sampled attack actions, following the value-propagation approach explained below. This process is repeated for all graph states in the belief set. Finally, the value-propagation defense strategy computes the expected defense value of each candidate node based on the defender's belief and then probabilistically selects nodes to disable according to this expected defense value.
For each possible graph state, we first estimate the propagated defense reward of each node by propagating the defender's rewards at inactive goal nodes back to that node. Intuitively, the propagated defense reward associated with each node accounts for the potential loss the defender can prevent by blocking that node. The complete procedure is presented as Algorithm 4. The idea of computing propagated defense rewards is similar to that of computing propagated attack values. In lines 3–12 of Algorithm 4, we compute the propagated defense reward for all nodes; in particular, each postcondition propagates a portion of its defense reward to each of its preconditions with respect to each inactive goal node and time step.
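A rough sketch of such backward reward propagation; the rule for splitting an ∧-node's reward evenly among its preconditions is our simplifying assumption, not necessarily the exact rule of Algorithm 4, and the identifiers follow the encoding used above.

```python
def propagated_defense_reward(rev_topo, node_type, succs, preds, p_node,
                              p_edge, reward, goals, active):
    """Propagate the (negative) defender reward at each inactive goal node
    backward to its ancestors: r(u) accumulates, over each successor v, the
    reward at v scaled by the probability that the exploit into v succeeds
    (split evenly across preconditions for ∧-nodes)."""
    r = {}
    for u in rev_topo:  # reverse topological order: successors first
        r[u] = reward[u] if u in goals and u not in active else 0.0
        for v in succs[u]:
            if node_type[v] == 'AND':
                r[u] += p_node[v] * r[v] / len(preds[v])
            else:
                r[u] += p_edge[(u, v)] * r[v]
    return r
```

A node with a large (in magnitude) propagated reward is one whose blocking would cut off a correspondingly large expected loss at the goal nodes downstream of it.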