Abstract

We study the problem of allocating limited security countermeasures to protect network data from cyber-attacks, for scenarios modeled by Bayesian attack graphs. We consider multistage interactions between a network administrator and cybercriminals, formulated as a security game. This formulation is capable of representing security environments with significant dynamics and uncertainty and very large strategy spaces. We propose parameterized heuristic strategies for the attacker and defender and provide detailed analysis of their time complexity. Our heuristics exploit the topological structure of attack graphs and employ sampling methods to overcome the computational complexity in predicting opponent actions. Due to the complexity of the game, we employ a simulation-based approach and perform empirical game analysis over an enumerated set of heuristic strategies. Finally, we conduct experiments in various game settings to evaluate the performance of our heuristics in defending networks, in a manner that is robust to uncertainty about the security environment.

1. Introduction

Attack graphs are graphical models used in cybersecurity research to decompose complex security scenarios into a hierarchy of simple and quantifiable actions [1–3]. Many research efforts have employed attack graphs or related graphical security models to analyze complex security scenarios [4, 5]. In particular, attack graph models have been used to evaluate network hardening strategies, where a network administrator (the defender) deploys security countermeasures to protect network data from cybercriminals (the attacker) [6–13].

Attack graphs are particularly suitable for modeling scenarios in moving target defense (MTD) [14], where the defender employs proactive tactics to dynamically change network configurations, limiting the exposure of vulnerabilities. MTD techniques are most useful in thwarting progressive attacks [15–17], where system reconfiguration by the defender prevents the attacker from exploiting knowledge accumulated over time. Attack graphs naturally represent the progress of an attack, and the defense actions in our model can incorporate MTD methods.

We build on prior works that represent security problems with attack graphs, and we take a game-theoretic approach to reasoning about the strategic interaction between the defender and the attacker. Building on an existing Bayesian attack graph formalism [11, 18], we model the problem as a simultaneous multistage attack graph security game. Nodes in a Bayesian attack graph represent security conditions of the network system. For example, an SSH buffer overflow vulnerability in an FTP server can be considered a security condition, as can user privileges achieved as a result of exploiting that vulnerability. The defender attempts to protect a set of goal nodes (critical security conditions) in the attack graph. Conversely, the attacker, starting from some initial security conditions, follows paths through the graph to undermine these goal nodes. At every time step, the defender and the attacker simultaneously take actions. Given limited security resources, the defender has to decide for which nodes of the attack graph to deploy security countermeasures. Meanwhile, the attacker selects nodes to attack in order to progress toward the goals. The outcome of the players’ actions (whether the attacker succeeds) follows a stochastic process, which represents the success probability of the actions taken. These outcomes are only imperfectly observed by the defender, adding further uncertainty which must be considered in its strategic reasoning.

Based on our game model, we propose various parameterized strategies for both players. Our attack strategies assess the value of each possible attack action at a given time step by examining the future attack paths they enable. These paths are sequences of nodes which could feasibly be attacked in subsequent time steps (as a result of attack actions in the current time step) in order to reach goal nodes. Since there are exponentially many possible attack paths, it is impractical to evaluate all of them. Therefore, our attack heuristics incorporate two simplifying approximations. First, we estimate the attack value for each individual node locally, based on the attack values of neighboring nodes. Attack values of goal nodes, in particular, correspond to the importance of each goal node. Second, we consider only a small subset of attack paths, selected by random sampling, according to the likelihood that the attacker will successfully reach a goal node by following each path. We present a detailed analysis of the polynomial time complexity of our attack heuristics.

Likewise, our heuristic defense strategies employ simplifying assumptions. At each time step, the defense strategies: (i) update the defender’s belief about the outcome of players’ actions in the previous time step and (ii) generate a new defense action based on the updated belief and the defender’s assumption about the attacker’s strategy. For stage (i), we apply particle filtering [19] to deal with the exponential number of possible outcomes. For stage (ii), we evaluate defense candidate actions using concepts similar to those employed in the attack strategies. We show that the running time of our defense heuristics is also polynomial.

Finally, we employ a simulation-based methodology called empirical game-theoretic analysis (EGTA) [20], to construct and analyze game models over the heuristic strategies. We present a detailed evaluation of the proposed strategies based on this game analysis. Our experiments are conducted over a variety of game settings with different attack graph topologies. We show that our defense strategies provide high solution quality compared to multiple baselines. Furthermore, we examine the robustness of defense strategies to uncertainty about game states and about the attacker’s strategy.

2. Related Work

Attack graphs are commonly used to provide a convenient representation for analysis of network vulnerabilities [1–3]. In an attack graph, nodes represent attack conditions of a system, and edges represent relationships among these conditions: specifically, how the achievement of specific conditions through an attacker’s actions can enable other conditions. Various graph-based security models related to attack graphs have been proposed to represent and analyze complex security scenarios [5]. For example, Wang et al. [21] introduced such a model for correlating, hypothesizing, and predicting intrusion alerts. Temporal attack graphs, introduced by Albanese and Jajodia [4], extend the attack graph model of Wang et al. [21] with additional temporal constraints on the unfolding of attacks. Augmented attack trees were introduced by Ray and Poolsapassit [22] to probabilistically measure the progression of an attacker’s actions toward compromising the system. This concept of augmented attack trees was later used to perform a forensic analysis of log files [23]. Vidalis et al. [24] presented vulnerability trees to capture the interdependency between different vulnerabilities of a system. Bayesian attack graphs [25] combine attack graphs with quantified uncertainty on attack states and relations. Revised versions of Bayesian attack graphs incorporate other security aspects such as dynamic behavior and mitigation strategies [18, 26]. Our work is based on the specific formulation of Bayesian attack graphs by Poolsappasit et al. [18]. The basic ideas of our game model and heuristic strategies would also apply to variant forms of graph-based security models with reasonable modifications. In particular, our work might be extended to handle zero-day attacks captured by zero-day attack graphs [27–30] by incorporating payoff uncertainty as a result of unknown locations and impacts of zero-day vulnerabilities. In the scope of this work, we consider known payoffs of players.

While many attack graph models analyze progressive attack scenarios without considering a defender’s countermeasures, there is an important line of work on graph-based models covering both attacks and defenses [5]. For example, Wu et al. introduced intrusion DAGs to represent the underlying structure of attacker goals in an adaptive, intrusion-tolerant system. Each node of an intrusion DAG is associated with an alert from the intrusion-detection framework, allowing the system to automatically trigger a response. Attack-Response Trees (ARTs), introduced by Zonouz et al. [13], extend attack trees to incorporate possible response actions against attacks.

Given the prominence of graph-based security models, previous work has proposed different game-theoretic solutions for finding an optimal defense policy based on those models. Durkota et al. [8, 9] study the problem of hardening the security of a network by deploying honeypots to the network to deceive the attacker. They model the problem as a Stackelberg security game in which the attacker’s plans are compactly represented using attack graphs. Zonouz et al. [13] present an automated intrusion-response system which models the security problem on an attack-response tree as a two-player, zero-sum Stackelberg stochastic game. Besides Stackelberg games, single-stage simultaneous games are also applied to model the security problem on attack-defense trees, and Nash equilibrium is used to find an optimal defense policy [6, 7, 10].

Game theory has been applied for solving various cybersecurity problems [31–38]. Previous work has modeled these security problems as a dynamic (complete/incomplete/imperfect) game and analyzed equilibrium solutions of those games. Our game model (to represent security problems with attack graphs) belongs to the class of partially observable stochastic games. Since the game is too complex for analytic solution, we focus on developing heuristic strategies for players and employing the simulation-based methodology EGTA to evaluate these strategies.

Similar to our work, the non-game-theoretic solution proposed by Miehling et al. [11] is also built on the attack graph formalism of Poolsappasit et al. [18]. In their work, the attacker’s behavior is modeled by a probabilistic spreading process, which is known by the defender. In our model, on the other hand, both the defender and the attacker dynamically decide on which actions to take at every time step, depending on their knowledge with respect to the game.

3. Game Model

3.1. Game Definition

Definition 1 (Bayesian attack graph [18]). A Bayesian attack graph is a directed acyclic graph, denoted by G = (V, E).

(i) V is a nonempty set of nodes, representing security-related attributes of the network system, including (a) system vulnerabilities; (b) insecure system properties such as corrupted files or memory access permissions; (c) insecure network properties such as unsafe network conditions or unsafe firewall properties; and (d) access privilege conditions such as user account or root account.

(ii) At time t, each node u ∈ V has a state s_u^t ∈ {0, 1}, where 0 means u is inactive (i.e., a security condition not compromised by the attacker) and 1 means it is active (compromised). The initial state s_u^0 represents the initial setting of node u. For example, suppose a node u (representing the root access privilege on a machine) is active; this means the attacker has the root access privilege on that machine.

(iii) E is a set of directed edges between nodes in V, with each edge representing an atomic attack action or exploit. For an edge e = (u, v), u is called a precondition and v is called a postcondition. For example, suppose a node u represents the SSHD BOF vulnerability on a machine and node v represents the root access privilege on that machine. Then the exploit e = (u, v) indicates that the attacker can exploit the SSHD BOF vulnerability to obtain the root access privilege on the machine. We denote by pre(u) the set of preconditions and by post(u) the set of postconditions associated with node u. An exploit is feasible when its precondition is active.

(iv) Nodes without preconditions are called root nodes. Nodes without postconditions are called leaf nodes.

(v) Each node u is assigned a node type: OR or AND. An OR-type node can be activated via any of the feasible exploits into u. Activating an AND-type node requires all exploits into u to be feasible and taken. Root nodes (AND-type nodes without preconditions) have no prerequisite exploits and so can be activated directly. We denote by V_OR the set of all OR-type nodes and by V_AND the set of all AND-type nodes. The sets of edges into OR-type nodes and AND-type nodes, respectively, are denoted by E_OR and E_AND.

(vi) The activation probability p_e of an edge e = (u, v) ∈ E_OR represents the probability that the OR-node v becomes active when the exploit e is taken (assuming v is not defended, u is active, and no other exploit into v is attacked). AND-type nodes are also associated with activation probabilities: p_u is the probability that an AND-node u becomes active when all exploits into u are taken (assuming u is not defended and all parent nodes of u are active).

An example of a Bayesian attack graph based on this definition is shown in Figure 1. Our multistage Bayesian attack graph security game model is defined as follows.

Definition 2 (attack graph game). A Bayesian attack graph security game is defined on a Bayesian attack graph G by the following elements.

(i) Time step: t = 1, …, T, where T is the time horizon.

(ii) Player goal: A nonempty subset of nodes V_g ⊆ V are distinguished as critical security conditions (goal nodes). The attacker aims to activate these goal nodes while the defender attempts to keep them inactive.

(iii) Graph state: s^t ⊆ V represents the set of active nodes at time step t.

(iv) Defender observation: O^t associates each node u ∈ V with one of the signals {0, 1}. Signal 1 (0) indicates u is active (inactive). If u is active at t, signal 1 is generated with a node-specific true-positive probability, and if u is inactive, signal 1 is generated with a (lower) false-positive probability. Otherwise, signal 0 is generated. The signals are independently distributed, over time and nodes.

(v) Player action: The defender action d^t ⊆ V comprises a set of nodes which the defender disables at time step t. The attacker action a^t consists of (a) AND-type nodes u ∈ V_AND, meaning that the attacker takes all exploits into u to activate that node at time step t, and (b) exploits e = (u, v) ∈ E_OR, meaning that the attacker takes exploit e to activate the OR-type node v at time step t.

(vi) Goal reward: Each goal node u ∈ V_g is assigned a reward for each player: r_u^a is the attacker reward and r_u^d is the defender reward (i.e., a penalty) if u is active. For inactive goal nodes, both receive zero.

(vii) Action cost: Each action the players take incurs a cost. In particular, c_e^a is the attacker’s cost to attempt exploit e ∈ E_OR and c_u^a is the attacker’s cost to attempt all exploits into an AND-type node u. The defender incurs cost c_u^d to disable node u.

(viii) Discount factor: γ ∈ (0, 1].

Initially, at t = 0, the graph state is given by the initial setting s^0 and no actions have yet been taken. We assume the defender knows only the initial graph state s^0, whereas the attacker is fully aware of the graph state at every time step; the attacker’s observation thus coincides with the true state s^t. At each time step t, the attacker decides which feasible exploits to attempt. At time step 1, in particular, the attacker can choose any root nodes to activate directly, each with its success probability p_u. Simultaneously, the defender decides which nodes to disable to prevent the attacker from intruding further. An example attack graph is shown in Figure 2.
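To make the preceding definitions concrete, the following is a minimal Python sketch of data structures for a Bayesian attack graph and a graph state. It is illustrative only: the field and method names (and the omission of action costs) are our own choices rather than notation from the model.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # (precondition, postcondition), i.e., an exploit


@dataclass
class AttackGraph:
    nodes: Set[int]
    edges: Set[Edge]
    and_nodes: Set[int]                 # AND-type nodes; all other nodes are OR-type
    goal_nodes: Set[int]                # critical security conditions
    edge_prob: Dict[Edge, float]        # activation probability of each exploit into an OR-node
    node_prob: Dict[int, float]         # activation probability of each AND-node
    attacker_reward: Dict[int, float]   # attacker reward for each active goal node
    defender_penalty: Dict[int, float]  # defender penalty for each active goal node

    def preconditions(self, v: int) -> Set[int]:
        return {u for (u, w) in self.edges if w == v}

    def postconditions(self, u: int) -> Set[int]:
        return {w for (x, w) in self.edges if x == u}

    def root_nodes(self) -> Set[int]:
        return {v for v in self.nodes if not self.preconditions(v)}


# The graph state at time t is simply the set of currently active nodes.
GraphState = Set[int]
```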

3.2. Network Example

We first briefly present the testbed network introduced by Poolsappasit et al. [18]. We then describe a portion of the corresponding Bayesian attack graph with security controls of the network. Our security game model and proposed heuristic strategies for both the defender and attacker are built based on their model.

Overall, the testbed network consists of eight hosts located within two different subnets: a DMZ zone and a Trusted zone. The DMZ zone includes a mail server, a DNS server, and a web server. The Trusted zone has two local desktops, an administrative server, a gateway server, and an SQL server. A trihomed DMZ firewall is installed with a set of policies to separate servers in the DMZ network from the local network. The attacker has to pass the DMZ firewall to attack the network. The web server in the DMZ zone can send SQL queries to the SQL server in the Trusted zone on a designated channel. In the Trusted zone, there is a NAT firewall such that local machines (located behind the NAT) have to communicate with external parties through the gateway server. Finally, the gateway server monitors remote connections through SSHD.

Each machine in the testbed network has several vulnerabilities which can be exploited by the attacker. For example, the gateway server is exposed to heap corruption and an improper cookie handler in OpenSSH. In addition, the SQL and DNS servers have to deal with SQL injection and DNS cache poisoning vulnerabilities, respectively. The defender can deploy security controls such as limiting access to the DNS server to tackle the DNS cache poisoning vulnerability [18].

A portion of the Bayesian attack graph of the testbed network is shown in Figure 3. A complete Bayesian attack graph can be found in Poolsappasit et al. [18]. Each grey rectangle represents a security attribute of the network. For example, the attacker can exploit the stack BOF at the local desktops to obtain the root access privilege at those desktops. By exploiting the root access privilege, the attacker can then achieve the error message leakage at the mail server and the DNS cache poisoning at the DNS server. Finally, the attacker can achieve identity theft and information leakage by exploiting the error message leakage at the mail server, and so on. To prevent such attack progression, the defender can deploy the MS workaround to deal with the stack BOF vulnerability. The defender can also deploy POP3 and limit access to the DNS server to address the error message leakage and DNS cache poisoning vulnerabilities, respectively. Finally, encryption and digital signatures could be used to resolve the identity theft issues at the DNS and mail servers.

3.3. Timing of Game Events

The game proceeds in discrete time steps, t = 1, …, T, with both players aware of the current time. At each time step t, the following sequence of events occurs.

(1) Observations: (i) the attacker observes the current graph state s^{t-1}; (ii) the defender observes the signals O^{t-1}.

(2) The attacker and defender simultaneously select actions a^t and d^t according to their respective strategies.

(3) The environment transitions to its next state s^t according to the transition function (Algorithm 1).

(4) The attacker and defender assess rewards (and/or costs) for the time step.

1 Initialize s_v^t ← s_v^{t-1};
2 if v ∈ d^t then
3     s_v^t ← 0; // defender overrules attacker
4 else
5     if v ∈ V_AND and v ∈ a^t then
6         with probability p_v, set s_v^t ← 1;
7     else
8         for each exploit e = (u, v) ∈ a^t do
9             with probability p_e, set s_v^t ← 1;

When an active node is disabled by the defender, that node becomes inactive. If a node is activated by the attacker at the same step it is being disabled by the defender, the node remains inactive.
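The dynamics just described can be summarized in a short sketch. The function below is a hypothetical rendering of one step of the transition (defense overrides attack, AND-nodes require all preconditions, each attempted OR-exploit succeeds independently); it assumes the AttackGraph sketch from Section 3.1 and is not a reference implementation.

```python
import random
from typing import Set, Tuple


def transition(graph, state: Set[int], defense: Set[int],
               attack_and: Set[int], attack_exploits: Set[Tuple[int, int]],
               rng: random.Random) -> Set[int]:
    """Sample the next graph state from the current state and both players' actions."""
    next_state = set(state)
    # An attacked AND-node activates with its node probability if all its preconditions are active.
    for v in attack_and:
        if graph.preconditions(v) <= state and rng.random() < graph.node_prob[v]:
            next_state.add(v)
    # Each attempted exploit into an OR-node succeeds independently with its edge probability.
    for (u, v) in attack_exploits:
        if u in state and rng.random() < graph.edge_prob[(u, v)]:
            next_state.add(v)
    # The defender overrules the attacker: disabled nodes end the step inactive.
    next_state -= defense
    return next_state
```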

3.4. Payoff Function

We denote by H the game history, which consists of all actions and resulting graph states at each time step. At time t, s^t is the graph state that results when the attacker plays a^t, the defender plays d^t, and the previous graph state is s^{t-1}. The defender’s and attacker’s payoffs with respect to H comprise the discounted goal rewards accrued at active goal nodes and the discounted costs of the actions taken over the time horizon; a candidate formalization is given at the end of this subsection. Both players aim to maximize expected utility with respect to the distribution of H. Since the game is too complex for analytic solution, we propose heuristic strategies for both players and employ the simulation-based methodology EGTA to evaluate these strategies. Our heuristic strategies for each player are categorized based on (i) the assumptions regarding their opponent’s strategies and (ii) the heuristic methods used to generate actions for the players to take at each time step. In particular, our proposed heuristic strategies can be organized in a hierarchical view, described in the following sections.
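As a concrete illustration, one plausible way to write these discounted payoffs is the following; this is our own reconstruction from the definitions of goal rewards, action costs, and the discount factor (action costs are taken to be nonpositive here), not necessarily the exact form used in the original model:

U^d(H) = \sum_{t=1}^{T} \gamma^{t} \Big( \sum_{u \in V_g \cap s^t} r^d_u + \sum_{u \in d^t} c^d_u \Big), \qquad U^a(H) = \sum_{t=1}^{T} \gamma^{t} \Big( \sum_{u \in V_g \cap s^t} r^a_u + \sum_{o \in a^t} c^a_o \Big),

where the inner sums range over the goal nodes active at time step t and over the nodes or exploits in the respective player’s action at t.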

4. Level-0 Heuristic Strategies

We define a level-0 strategy as one that does not explicitly invoke an assumption about its opponent’s strategy. Higher-level strategies do invoke such assumptions, generally that the others play one level below, in the spirit of cognitive hierarchy models [39]. In our context, level-0 heuristic strategies directly operate on the topological structure of the Bayesian attack graph to decide on actions to take at each time step.

4.1. Level-0 Defense Strategies

We introduce four level-0 defense strategies. Each targets a specific group of nodes to disable at each time step t. These groups of nodes are chosen solely based on the topological structure of the graph.

4.1.1. Level-0 Uniform Defense Strategy

The defender chooses nodes in the graph to disable uniformly at random. The number of nodes chosen is a certain fraction of the total number of nodes in the graph.

4.1.2. Level-0 Min-Cut Uniform Defense Strategy

The defender chooses nodes in the min-cut set to disable uniformly at random. The min-cut set is the minimum set of edges such that removing them disconnects the root nodes from the goal nodes. The number of chosen nodes is a certain fraction of the cardinality of the min-cut set.

4.1.3. Level-0 Root-Node Uniform Defense Strategy

The defender chooses root nodes to disable uniformly at random. The number of nodes chosen is a certain fraction of the total number of root nodes in the graph.

4.1.4. Level-0 Goal-Node Defense Strategy

The defender randomly chooses goal nodes to disable with probabilities depending on the rewards and costs associated with these goal nodes. The probability of disabling each goal node is given by a conditional logistic function: it is proportional to the exponential of η times the potential value the defender receives for disabling that goal node, where η ≥ 0 is a predetermined parameter of the logistic function. This parameter governs how strictly the choice follows assessed defense values. If η = 0, the goal-node defense strategy chooses each goal node to disable uniformly at random; as η grows large, the strategy disables only the nodes with the highest assessed value. The number of nodes chosen is a certain fraction of the number of goal nodes, and we draw that many nodes from the distribution.
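The conditional logistic (softmax) choice used by this strategy can be sketched as follows. The function and parameter names are illustrative; the values passed in are whatever assessment the strategy uses (here, defense values of goal nodes).

```python
import math
import random
from typing import Dict, List


def logistic_choice(values: Dict[int, float], num_choices: int, eta: float,
                    rng: random.Random) -> List[int]:
    """Draw distinct items, each round with probability proportional to exp(eta * value).

    eta = 0 reduces to a uniform random choice; a large eta concentrates the
    choice on the highest-valued items.
    """
    chosen: List[int] = []
    remaining = dict(values)
    for _ in range(min(num_choices, len(remaining))):
        items = list(remaining)
        vmax = max(remaining[i] for i in items)            # shift for numerical stability
        weights = [math.exp(eta * (remaining[i] - vmax)) for i in items]
        total = sum(weights)
        r, acc = rng.random() * total, 0.0
        pick = items[-1]
        for item, w in zip(items, weights):
            acc += w
            if acc >= r:
                pick = item
                break
        chosen.append(pick)
        del remaining[pick]
    return chosen
```

The same mechanism, applied to attack values instead of defense values, reappears in the value-propagation attack strategy of Section 4.2.3.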

4.2. Level-0 Attack Strategies
4.2.1. Attack Candidate Set

At time step t, based on the graph state s^{t-1}, the attacker needs to consider only the OR-exploits and AND-nodes that can change the graph state at t. We call this set of OR-exploits and AND-nodes the attack candidate set at time t. Essentially, it consists of (i) OR-exploits from active preconditions to inactive OR-postconditions and (ii) inactive AND-nodes for which all preconditions are active. Each AND-node or OR-exploit in this set is considered as a candidate attack at t. An attack action at t can be any subset of this candidate set. For example, in Figure 2, the attack candidate set with respect to the current graph state consists of (i) all green edges and (ii) all green nodes. In particular, the attacker can perform an exploit to activate the currently inactive node 9. The attacker can also attempt to activate the root nodes 0 and 3.
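Assuming the AttackGraph sketch introduced earlier, the candidate set described above could be computed roughly as follows (a simplified illustration, not the paper's code):

```python
from typing import Set


def attack_candidates(graph, state: Set[int]):
    """Return (candidate OR-exploits, candidate AND-nodes) for the current graph state."""
    or_exploits = {(u, v) for (u, v) in graph.edges
                   if u in state                        # precondition is active
                   and v not in state                   # postcondition still inactive
                   and v not in graph.and_nodes}        # postcondition is an OR-node
    and_nodes = {v for v in graph.and_nodes
                 if v not in state
                 and graph.preconditions(v) <= state}   # all preconditions active
    return or_exploits, and_nodes
```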

To find an optimal attack action to take at time step t, we need to determine the attack value of each possible attack action, which represents the attack action’s impact on activating the goal nodes by the final time step T. However, exactly computing the attack value of each attack action requires taking into account all possible future outcomes regarding this attack action, which is computationally expensive. In the following, we propose a series of heuristic attack strategies of increasing complexity.

4.2.2. Level-0 Uniform Attack Strategy

Under this strategy, the attacker chooses a fixed fraction of the candidate set, uniformly at random.

4.2.3. Level-0 Value-Propagation Attack Strategy

Attack Value Propagation. The value-propagation strategy chooses attack actions based on a quantitative assessment of each attack in the candidate set . The main idea of this strategy is to approximate the attack value of each individual inactive node locally based on attack values of its inactive postconditions. Attack values of the inactive goal nodes, in particular, correspond to the attacker’s rewards at these nodes. So essentially, the attacker rewards at inactive goal nodes are propagated backward to other nodes. The cost of attacking and the activation probabilities are incorporated accordingly. In the propagation process, there are multiple paths from goal nodes to each node. The attack value of a node is computed as the maximum value among propagation paths reaching that node. This propagation process is illustrated in Algorithm 2, which approximates attack values of every inactive node in polynomial time.

1  Input: time step t, graph state s^{t-1}, and inverse topological order of G;
2  Initialize node values: for each inactive goal node w, set its own value to the attacker reward r_w^a; set the value of every other inactive node u, with respect to each inactive goal node w and each path length l, to zero;
3  for each path length l do
4      for each inactive node u, in inverse topological order, do
5          for each inactive postcondition w′ of u do
6              if w′ is an AND-type node then
7                  the value of w′ (for goal w and length l − 1) is distributed among the inactive preconditions of w′, including u, with the propagation parameter regulating how much each precondition receives;
8              else
9                  u receives the value of w′ (for goal w and length l − 1) directly;
10             if the propagated value exceeds the current value of u for goal w and length l then
11                 Update the value of u;
12  Return, for each inactive node u, its attack value: the maximum of its propagated values over inactive goal nodes and path lengths;

Algorithm 2 leverages the directed acyclic topological structure of the Bayesian attack graph to perform the goal-value propagation faster. We sort nodes according to the graph’s topological order and start the propagation from leaf nodes following the inverse direction of the topological order. By doing so, we ensure that when a node is examined in the propagation process, all postconditions of that node have already been examined. As a result, we need to examine each node only once during the whole propagation process.

In Algorithm 2, line 1 specifies the input of the algorithm, which includes the current time step t, the graph state at the previous time step s^{t-1}, and the inverse topological order of the graph G. Line 2 initializes the attack values of inactive nodes with respect to each inactive goal node and each path length. Intuitively, the value of a node u with respect to a goal node w and a path length l indicates the attack value of u over propagation paths of length l from the inactive goal node w to u. Given the time horizon T, we consider only paths of length up to the number of remaining time steps. At each iteration of evaluating a particular inactive node u, Algorithm 2 examines all inactive postconditions of u and estimates the attack value propagated from each postcondition w′ to u. If node w′ is of AND-type, the propagated attack value with respect to w′ is distributed among all of its inactive preconditions, including u (line 7). The propagation parameter regulates the amount of distributed value; for one setting of this parameter, in particular, the value is equally divided among these inactive preconditions. If node w′ is of OR-type, u receives the propagated attack value of w′ directly (line 9). Since there are multiple propagation paths reaching node u, Algorithm 2 keeps the maximum propagated value (line 11). Finally, the attack value of each inactive node is computed as the maximum over inactive goal nodes and path lengths (line 12). An example of Algorithm 2 is illustrated in Figure 4.
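The following is a deliberately simplified sketch of this backward propagation over the inverse topological order. It collapses the per-goal-node and per-path-length bookkeeping of Algorithm 2 into a single value per node and uses an assumed propagation factor prop_factor, so it illustrates the idea rather than reproducing the algorithm.

```python
from typing import Dict, List, Set


def propagate_attack_values(graph, state: Set[int],
                            inverse_topo_order: List[int],
                            prop_factor: float = 0.9) -> Dict[int, float]:
    """Approximate attack values of inactive nodes by propagating goal rewards backward."""
    value = {u: 0.0 for u in graph.nodes if u not in state}
    for w in graph.goal_nodes:
        if w not in state:
            value[w] = graph.attacker_reward[w]          # goal rewards seed the propagation
    for u in inverse_topo_order:
        if u in state:
            continue
        for v in graph.postconditions(u):
            if v in state:
                continue
            if v in graph.and_nodes:
                # an AND-postcondition splits its value among its inactive preconditions
                inactive_pre = [x for x in graph.preconditions(v) if x not in state]
                propagated = prop_factor * value[v] / max(1, len(inactive_pre))
            else:
                # an OR-postcondition passes its value along directly
                propagated = prop_factor * value[v]
            value[u] = max(value[u], propagated)         # keep the best propagation path
    return value
```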

Proposition 3. The time complexity of Algorithm 2 is O(T · |V_g| · (|V| + |E|)).

Proof. In Algorithm 2, line 2 initializes attack values of inactive nodes of the attack graph with respect to each inactive goal node and path length. The time complexity of this step is O(T · |V_g| · |V|). Algorithm 2 then iteratively examines each inactive node once, following the inverse direction of the topological order. The attack value of each node with respect to each inactive goal node and path length is estimated locally based on its neighboring nodes; in other words, Algorithm 2 iterates over each edge of the graph once per goal node and path length to compute these values. The time complexity of this step is thus O(T · |V_g| · |E|). Finally, line 12 computes the maximum propagated attack value for each node, which takes O(T · |V_g| · |V|) time. Therefore, the total time complexity of Algorithm 2 is O(T · |V_g| · (|V| + |E|)).

Probabilistic Selection of Attack Action. Based on the attack values of inactive nodes, we approximate the value of each candidate attack in the candidate set, taking into account the cost of this attack and the corresponding activation probability: the value of a candidate OR-exploit combines the propagated attack value of its postcondition, its activation probability, and its cost, and the value of a candidate AND-node is defined analogously. Finally, the attack strategy selects attacks to execute probabilistically, based on the assessed attack values. It first determines the number of attacks to execute and then selects that number of attacks using a conditional logistic function: the probability that each candidate exploit is selected is proportional to the exponential of η times its assessed value, and the probability for a candidate AND-node is defined similarly. The model parameter η governs how strictly the choice follows assessed attack values.

4.2.4. Level-0 Sampled-Activation Attack Strategy

Like the value-propagation strategy, the sampled-activation attack strategy selects actions based on a quantitative assessment of relative value. Rather than propagating backward from goal nodes, this strategy constructs estimates by forward sampling from the current candidate set.

Random Activation Process. The sampled-activation attack strategy aims to sample paths of activation from the current graph state to activate each inactive goal node. For example, given the current graph state in Figure 4, a possible path, or sequence of nodes, to activate the inactive goal node 10 is as follows: (i) activate node 0; (ii) if node 0 becomes active, perform the corresponding exploits to activate node 6; (iii) if node 6 becomes active, perform exploits to activate nodes 8 and 9; and (iv) if nodes 8 and 9 become active, perform exploits to activate goal node 10. In fact, there are many possible paths which can be selected to activate each inactive goal node of the graph. Therefore, the sampled-activation attack strategy selects each path probabilistically, based on the probability that each inactive goal node will become active if the attacker follows that path to activate the goal node.

In particular, for each node u, the sampled-activation attack strategy keeps track of the set of preconditions which are selected in the sampled-activation process to activate u. If u is an AND-node, this set consists of all preconditions of u. If u is an OR-node, we randomly select a single precondition to use to activate u. We select only one precondition to activate each inactive OR-node to simplify the computation of the probability that a node becomes active in the random activation. This randomized selection is explained below. Each inactive node u is assigned an activation probability and an activation time step according to the random action the attacker takes. The activation probability and the activation time step represent the probability with which, and the time step at which, node u becomes active if the attacker follows the sampled action sequence to activate u. These values are computed under the assumptions that the defender takes no action and the attacker attempts to activate each node on the activation paths once.

The random activation process is illustrated in Algorithm 3. In this process, we use the topological order of the attack graph to perform random activation. Following this order ensures that all preconditions are visited before any corresponding postconditions, and thus we only need to examine each node once. When visiting an inactive OR-node u, the attacker randomly chooses a precondition v from which to activate that OR-node. The selection probability is computed based on the activation probability of the exploit (v, u) and the activation probability of the associated precondition v (line 6). Intuitively, their product is the probability that u becomes active if the attacker chooses the exploit (v, u) to activate u in the random activation. Accordingly, the higher this product, the higher the chance that v is the selected precondition for u. We then update the chosen precondition set, activation time step, and activation probability of u with respect to the selected v (lines 7–9).

1 Input: time step t, graph state s^{t-1}, and topological order of G;
2 Initialize activation probability 1, activation time step 0, and an empty chosen precondition set for all active nodes;
3 Initialize activation probability p_u, activation time step 1, and an empty chosen precondition set for all inactive root nodes u;
4 for each remaining inactive node u, in topological order, do
5     if u is an OR-type node then
6         Randomly choose a precondition v from which to activate u, with probability proportional to the activation probability of the exploit (v, u) times the activation probability of v;
7         Update the chosen precondition set of u to {v};
8         Update the activation time step of u to the activation time step of v plus one;
9         Update the activation probability of u to the product of the activation probability of exploit (v, u) and the activation probability of v;
10    else
11        Update the chosen precondition set of u to contain all preconditions of u;
12        Update the activation time step of u to the maximum activation time step over its preconditions, plus one;
13        Update the activation probability of u by backtracking over all nodes and exploits in the activation sequence of u;
14 Return the activation probabilities, activation time steps, and chosen precondition sets;

When visiting an inactive AND-node u, all preconditions of u are required to activate u (line 13). Thus, the activation time step of u must be computed based on the maximum activation time step of u’s preconditions (line 12). Furthermore, u can become active only when all of its preconditions are active, so the activation probability of the inactive AND-node u involves the activation probabilities of all of its preconditions. These activation probabilities depend on the sequences of nodes (which may not be disjoint) chosen in the random activation process to activate all the preconditions of u. Therefore, we need to backtrack over all nodes in the activation process of u to compute its activation probability. We refer to this sequence of nodes as the activation sequence of u. For example, given the current graph state in Figure 4, suppose that the sequence of nodes chosen to activate the inactive goal node 10 is as follows: (i) activate node 0; (ii) if node 0 becomes active, perform exploits to activate node 6; (iii) if node 6 becomes active, perform exploits to activate nodes 8 and 9; and (iv) if nodes 8 and 9 become active, perform exploits to activate goal node 10. The activation sequence of node 10 then consists of the nodes activated along the way, here nodes 0, 6, 8, and 9. Essentially, following the random activation process, u can be activated only when all nodes in its activation sequence are active. Therefore, the activation probability of u is computed as the product of the activation probabilities of all edges and nodes involved in activating its activation sequence: each AND-node in the sequence contributes its node activation probability, and each chosen exploit into an OR-node in the sequence contributes its edge activation probability.
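A simplified sketch of the random activation process follows. It assumes the AttackGraph sketch from earlier, treats inactive root nodes as directly activatable with their node probability, and approximates the activation probability of an AND-node from its preconditions instead of backtracking the full activation sequence as Algorithm 3 does.

```python
import random
from typing import Dict, List, Set


def random_activation(graph, state: Set[int], topo_order: List[int],
                      rng: random.Random):
    """Sample, for every inactive node, one way of becoming active, with a rough
    activation probability and activation time step (the defender is assumed idle)."""
    prob: Dict[int, float] = {u: 1.0 for u in state}
    time: Dict[int, int] = {u: 0 for u in state}
    chosen_pre: Dict[int, Set[int]] = {}
    for v in topo_order:
        if v in state:
            continue
        pres = graph.preconditions(v)
        if not pres:                                  # inactive root node
            prob[v], time[v], chosen_pre[v] = graph.node_prob[v], 1, set()
        elif v in graph.and_nodes:                    # AND-node: requires all preconditions
            chosen_pre[v] = set(pres)
            time[v] = max(time[u] for u in pres) + 1
            # simplified; Algorithm 3 instead backtracks the full activation sequence
            prob[v] = graph.node_prob[v] * min(prob[u] for u in pres)
        else:                                         # OR-node: pick a single precondition
            pres = list(pres)
            weights = [graph.edge_prob[(u, v)] * prob[u] for u in pres]
            u = rng.choices(pres, weights=weights, k=1)[0]
            chosen_pre[v] = {u}
            time[v] = time[u] + 1
            prob[v] = graph.edge_prob[(u, v)] * prob[u]
    return prob, time, chosen_pre
```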

Proposition 4. The time complexity of Algorithm 3 is O(|V| · (|V| + |E|)).

Proof. In the random activation process, for each visited node u, Algorithm 3 updates the activation probability and activation time based on the preconditions of u. The complexity of these updates over all nodes is O(|V| + |E|). In addition, for each visited AND-node u, Algorithm 3 backtracks over all nodes in the sequence of activating u, which takes O(|V| + |E|) time; performing this backtracking for all nodes thus takes O(|V| · (|V| + |E|)) time. Therefore, the time complexity of Algorithm 3 is O(|V| · (|V| + |E|)).

Expected Utility of the Attacker. At the end of a random activation, we obtain a sequence of nodes chosen to activate each inactive goal node. Thus, we estimate the attack value of each subset of inactive goal nodes according to the random activation based on Proposition 5.

Proposition 5. At time step t, given the graph state s^{t-1}, suppose the attacker follows a random activation process to activate a subset of goal nodes. If the defender takes no further action, the attacker obtains an expected utility that combines (i) the discounted expected rewards of the goal nodes in the subset and (ii) the expected costs of attempting the AND-nodes and OR-exploits in the sequences chosen by the random activation process to activate those goal nodes.

Proof. The first term accounts for the expected rewards of the goal nodes in the subset. In particular, for each goal node w in the subset, the probability that w becomes active at its activation time step, if the attacker follows the random activation process, is its activation probability; otherwise, node w remains inactive. Therefore, the attacker receives an expected (discounted) reward proportional to this activation probability for each such goal node. Furthermore, the second and third terms account for the costs of activating inactive nodes in the corresponding sampled-activation sequences. For each AND-node in the sequences, the probability that all of its preconditions become active is the probability with which the attacker actually attempts, and pays the cost of, activating that node. Similarly, for each OR-node in the sequences, the activation probability of its chosen precondition is the probability with which the attacker attempts the corresponding exploit and incurs its cost. A candidate formalization of this expected utility is sketched below.
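Using the activation probabilities and activation time steps computed by the random activation process, one candidate formalization of Proposition 5 is the following; the symbols p^{act}, t^{act}, V_g', A', E', and q_u are our own illustrative notation, and action costs are taken to be nonpositive:

U^a = \sum_{w \in V_g'} \gamma^{t^{act}_w} p^{act}_w r^a_w + \sum_{u \in A'} \gamma^{t^{act}_u} q_u c^a_u + \sum_{(u,v) \in E'} \gamma^{t^{act}_v} p^{act}_u c^a_{(u,v)},

where V_g' is the chosen goal subset, A' and E' are the AND-nodes and chosen OR-exploits in the corresponding activation sequences, p^{act} and t^{act} denote activation probabilities and time steps, and q_u is the probability that all preconditions of the AND-node u become active.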

Greedy Attack. For each random activation, the sampled-activation attack strategy aims to find a subset of inactive goal nodes to activate which maximizes the attack value. However, finding an optimal subset of inactive goal nodes is computationally expensive, because there is an exponential number of subsets of inactive goal nodes to consider. Therefore, we use a greedy approach to find a reasonable subset of inactive goal nodes to attempt to activate. Given the current subset of selected inactive goal nodes (which is initially empty), we iteratively find the next best inactive goal node whose addition maximizes the attack value, and add it to the subset. This greedy process continues until the attack value stops increasing. Based on the chosen goal subset, we obtain a corresponding subset of attack candidates (AND-nodes and OR-exploits) which need to be attempted in the current time step, according to the sampled-activation process, in order to activate the goal subset in subsequent time steps. We assign the value of the goal subset to this candidate subset.
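The control flow of this greedy selection can be sketched as follows; the expected-utility evaluation of Proposition 5 is abstracted into a callable attack_value, so only the subset-growing logic is shown.

```python
from typing import Callable, FrozenSet, Set


def greedy_goal_subset(inactive_goals: Set[int],
                       attack_value: Callable[[FrozenSet[int]], float]) -> Set[int]:
    """Greedily grow a goal subset while the assessed attack value keeps increasing."""
    chosen: Set[int] = set()
    current_value = attack_value(frozenset(chosen))
    while True:
        best_node, best_value = None, current_value
        for w in inactive_goals - chosen:
            v = attack_value(frozenset(chosen | {w}))
            if v > best_value:
                best_node, best_value = w, v
        if best_node is None:          # no remaining goal node improves the value: stop
            return chosen
        chosen.add(best_node)
        current_value = best_value
```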

Finally, by running random activation multiple times, we obtain a set of candidate subsets, each associated with an estimated attack value. The attacker action at time step t is randomly chosen among these subsets of candidates, following a conditional logistic distribution with respect to the attack values of these subsets.

Proposition 6. Suppose that the sampled-activation attack strategy runs random activation N times; the time complexity of this attack strategy is O(N · (|V| + |V_g|²) · (|V| + |E|)).

Proof. For each random activation, it takes O(|V| · (|V| + |E|)) time to sample activation paths for the attacker to activate each goal node (Proposition 4). Furthermore, at each iteration of the greedy attack heuristic, given the currently chosen goal subset, the strategy examines all inactive goal nodes to find the next best goal node. Each such examination requires computing the expected utility of the attacker for following the random activation to activate the augmented goal subset, which takes O(|V| + |E|) time (Proposition 5). Thus, the time complexity of each iteration of the greedy attack heuristic is O(|V_g| · (|V| + |E|)), and since there are at most |V_g| iterations, the time complexity of the greedy attack heuristic is O(|V_g|² · (|V| + |E|)). The sampled-activation attack strategy comprises N pairs of executions of random activation and the greedy attack heuristic. Therefore, the complexity of this attack strategy is O(N · (|V| + |V_g|²) · (|V| + |E|)).

Level-0 heuristic strategies do not take the opponent’s strategy into account. We therefore also introduce level-1 heuristic strategies, which assume the opponent plays a level-0 strategy. In the scope of this work, we focus on studying level-1 heuristic defense strategies.

5. Level-1 Defense Strategies

5.1. Defender Strategic Reasoning

Level-1 defense strategies assume the attacker plays level-0 attack strategies. At each time step, because the defender does not know the true graph state, it is important for it to reason about possible graph states before choosing defense actions. An overview of the defender belief update is shown in Figure 5.

As mentioned before, in our game the defender knows the initial graph state, in which all nodes are inactive. As the game evolves, at the end of each time step t, the defender updates her belief on possible graph states, taking into account (i) her belief on graph states at the end of time step t − 1; (ii) her action at time step t; (iii) her observations at time step t, after the players’ actions at t are taken; and (iv) the assumed attacker strategy. Finally, based on the defender’s updated belief on graph states at the end of time step t and the assumption of which heuristic strategy the attacker plays, the defender decides which defense action to take at time step t + 1. In the following, we first study the defender’s belief update on graph states at each time step and then propose different defense heuristic strategies.

5.2. Defender Belief Update
5.2.1. Exact Inference

We denote by b^t the defender’s belief at the end of time step t, where b^t(s) is the probability that the graph state at time step t is s, and these probabilities sum to one. At time step 0, the belief places probability one on the known initial state. Based on the defender’s belief at time step t − 1, her action d^t at t, and her observation O^t at t, we can update the defender’s belief using Bayes’ rule: the updated probability of each graph state s^t is proportional to the probability of receiving observation O^t given that the graph state is s^t, multiplied by the total probability of reaching s^t from the states in the previous belief under the players’ actions (a candidate formalization is given at the end of this subsection). Because the alerts with respect to each node are independent of other nodes, the observation probability can be computed as the product of the observation probabilities of individual nodes. The probability of a state transition is likewise computed from the per-node transition probabilities given by Algorithm 1, summed over all possible attack actions with respect to the previous graph state. Finally, the probability that the attacker takes a particular action at time step t, given the graph state at the end of time step t − 1, is determined by the attack strategy the defender assumes.
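One standard way to write this Bayesian update, reconstructed from the description above (b^t denotes the belief, and the sums run over the graph states s^{t-1} in the previous belief and the attack actions a^t compatible with them), is:

b^t(s^t) \propto \Pr(O^t \mid s^t) \sum_{s^{t-1}} b^{t-1}(s^{t-1}) \sum_{a^t} \Pr(a^t \mid s^{t-1}) \, \Pr(s^t \mid s^{t-1}, a^t, d^t),

where \Pr(O^t \mid s^t) factors into a product of per-node signal probabilities and \Pr(s^t \mid s^{t-1}, a^t, d^t) factors into the per-node transition probabilities given by Algorithm 1.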

5.2.2. Approximate Inference

Exactly computing the defender’s belief (10) over all possible graph states at each time step is impractical. Indeed, there are exponentially many graph states to explore, as well as an exponential number of possible attack actions. To overcome this computational challenge, we apply particle filtering [19], a Monte Carlo sampling method for performing state inference, given noisy observations at each time step. This approach allows us to limit the number of graph states and attack actions considered. The overview of the particle filtering method is shown in Figure 6.

Essentially, the method maintains a set of particles, each of which is a sample of the graph state at time step t − 1. Given the belief at t − 1, which provides a probability distribution over the corresponding graph state set, we randomly sample a graph state from this distribution. Given the sampled graph state, we sample an attack action according to that state and the defender’s assumption about which level-0 heuristic strategy the attacker plays. We then sample a new graph state at time step t based on (i) the sampled graph state; (ii) the sampled attack action; and (iii) the defender’s action at t. Finally, we update the defender’s belief over the sampled graph states based on the defender’s observation at time step t.
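A minimal sketch of one particle-filtering step follows, with the assumed attacker strategy, the stochastic transition, and the observation likelihood abstracted into callables (all names are illustrative):

```python
import random
from typing import Callable, List, Set


def particle_filter_step(particles: List[Set[int]],
                         defense: Set[int],
                         observation,
                         sample_attack: Callable[[Set[int]], object],
                         sample_transition: Callable[[Set[int], object, Set[int]], Set[int]],
                         obs_likelihood: Callable[[object, Set[int]], float],
                         rng: random.Random) -> List[Set[int]]:
    """Propagate each particle through the assumed attacker strategy and the stochastic
    transition, weight by the observation likelihood, and resample."""
    propagated, weights = [], []
    for state in particles:
        attack = sample_attack(state)                      # assumed level-0 attacker strategy
        next_state = sample_transition(state, attack, defense)
        propagated.append(next_state)
        weights.append(obs_likelihood(observation, next_state))
    if sum(weights) == 0.0:                                # degenerate case: keep particles
        weights = [1.0] * len(propagated)
    # Resample particles proportionally to their weights.
    return rng.choices(propagated, weights=weights, k=len(propagated))
```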

Finding an optimal defense action at time step t given the defender’s updated belief requires taking into account an assumption regarding the attacker’s strategy and all possible future outcomes of the game as a result of the defender’s action, which is computationally expensive. Therefore, in the following sections we propose two attack-modeling-based defense strategies that take into account the defender’s belief, called the value-propagation and sampled-activation defense strategies. These two defense strategies use concepts similar to the value-propagation and sampled-activation attack strategies.

5.3. Defense Candidate Set

At each time step t, the defender has a belief on possible graph states at the end of time step t − 1. For each state in the belief set, we define the defense candidate set, which consists of the active goal nodes together with the candidate AND-nodes and the OR-postconditions of exploits in the attack candidate set for that state. We aim to disable not only active goal nodes but also the nodes the attacker could activate next, to prevent the attacker from intruding further.

5.4. Level-1 Value-Propagation Defense Strategy

An overview of the level-1 value-propagation defense strategy is illustrated in Figure 7. For each graph state in the belief set, based on the assumption of which heuristic strategy is played by the attacker, the defense strategy first samples a set of attack actions. The strategy then estimates the defense value of each node in the defense candidate set based on the graph state and the set of sampled attack actions, following the value-propagation approach explained below. This process is repeated for all graph states in the belief set. Finally, the value-propagation defense strategy computes the expected defense value of each candidate node under the defender’s belief and then probabilistically selects nodes to disable based on this expected defense value.

For each possible graph state, we first estimate the propagated defense reward of each node by propagating the defender’s rewards at inactive goal nodes backward to that node. Intuitively, the propagated defense reward associated with each node accounts for the potential loss the defender can prevent by blocking that node. The complete procedure is presented as Algorithm 4. The idea of computing propagated defense rewards is similar to that of computing propagated attack values. In lines 3–12 of Algorithm 4, we compute the propagated defense reward for all nodes; in particular, each such value is the defense reward a postcondition propagates to a precondition with respect to a given inactive goal node and path length.

1 Input: time step t, graph state s, and inverse topological order of G;
2 Initialize defense rewards: for each inactive goal node w, set its own defense reward to the defender penalty at w; set the defense reward of every other node u, with respect to each inactive goal node w and each path length l, to zero;
3  for each path length l do
4      for each node u, in inverse topological order, do
5          for each postcondition w′ of u do
6              if w′ is an AND-type node then
7                  the defense reward of w′ (for goal w and length l − 1) is distributed among the preconditions of w′, including u;
8              else
9                  u receives the defense reward of w′ (for goal w and length l − 1) directly;
10             if the propagated defense reward exceeds the current value of u for goal w and length l then
11                 Update the defense reward of u;
12  Return the propagated defense rewards;

Based on the computed propagated defense rewards and the set of sampled attack actions, we can then estimate the defense value of each defense candidate node as follows. For an active goal node, the defense value comprises not only the cost of disabling it but also its propagated defense reward and the defender’s reward (penalty) at that node. For every other defense candidate node, the defense value takes into account the assumed attack strategy to estimate the probability that the node becomes active as a result of the attacker’s action at t. Essentially, the higher the probability that a candidate node becomes active, the higher the defense value of disabling that node. This probability is estimated from the sampled attack actions as an average of binary indicators over the samples. Finally, the probability that the defender chooses each node to disable is computed according to the conditional logistic function of the expected defense values, and the number of nodes chosen to disable is a certain fraction of the cardinality of the defense candidate set.

5.5. Level-1 Sampled-Activation Defense Strategy

In this defense strategy, we leverage the random activation process as described in Section 4.2.4 to reason about potential attack paths the attacker may follow to attack goal nodes. Based on this reasoning, we can estimate the defense value for each possible defense action. The defense value of each defense action indicates the impact of the defender’s action on preventing the attacker from attacking goal nodes. We then select the action which leads to the highest defense value. An overview of the sampled-activation defense strategy is illustrated in Figure 8. The details of the strategy are explained in the following Sections 5.5.1, 5.5.2, and 5.5.3.

5.5.1. Sample Attack Plans

We first sample attack plans of the attacker at time step t. Each plan is a pair of an attack action at t and a corresponding attack path the attacker may take to reach inactive goal nodes in the future. At time step t, for each possible graph state in the belief set, we run the random activation process several times, each run resulting in sampled-activation sequences toward the inactive goal nodes. We also sample a set of attack actions according to the defender’s assumption about the attacker’s strategy. For each sampled attack action, we select the best random activation among the sampled runs, namely the one under which performing that action at t can lead to the activation of the subset of inactive goal nodes with the highest attack value (Section 4.2.4). We call the sequence of nodes (sorted according to the topological order of the graph) which can be activated in future time steps, based on the attack action and the corresponding selected random activation, the attack sequence. Each pair of an attack action and its attack sequence constitutes an attack plan for the attacker.

5.5.2. Estimate Defense Values

For each possible graph state, based on the sampled attack plans, we can estimate the defense value of any defense action with respect to that state as the average, over the sampled plans, of the defender’s value for playing that action against each plan. This value is determined based on which goal nodes can potentially become active given the players’ actions. To determine these goal nodes, we iteratively examine the nodes in each plan’s attack sequence to find which goal nodes of the sequence are not blocked by the defender’s action. Recall that the attack sequence consists of the nodes to activate in the chosen random activation.

This search process is shown in Algorithm 5, where a blocked status indicates whether the attack sequence to a node is blocked by the defender or not. Initially, the nodes in the attack sequence that are disabled by the defender’s action are marked as blocked, while all other nodes are marked as unblocked. While examining non-root nodes in the sequence, an OR-node is marked as blocked if all of its preconditions in the sequence are blocked, whereas an AND-node becomes blocked when any of its preconditions is blocked. Given the blocked statuses, we estimate the defense value at time step t as follows: the first term is the cost of performing the defense action, and the second term accounts for the potential loss of the defender, which combines the blocked status, the activation probability, the activation time step, and the rewards of all the goal nodes which either are already active or will be activated based on the attack plan. Finally, based on the defense values with respect to the individual graph states, the expected defense value of each defense action over the defender’s belief is computed as the belief-weighted average of these state-specific values.

1 Input: defense action d^t, attack plan (attack action and attack sequence), and graph G;
2 Initialize isBlocked(u) = 1 for all nodes u in the attack sequence that are disabled by d^t, and isBlocked(u) = 0 for all other nodes in the sequence;
3  for each non-root node u in the attack sequence with isBlocked(u) = 0 do
4      if u is an OR-type node then
5          if all preconditions of u in the sequence are blocked then
6              isBlocked(u) = 1;
7      else
8          if any precondition of u is blocked then
9              isBlocked(u) = 1;
10 Return the blocked statuses isBlocked.

Proposition 7. Given a graph state, a defense action, and a set of N_a sampled attack plans, the time complexity of estimating the defense value of the defense action is O(N_a · (|V| + |E|)).

Proof. Given a graph state, a defense action, and a sampled attack plan, Algorithm 5 iterates over each node in the attack sequence to update its blocked status based on its preconditions. Because of the directed acyclic topological structure of the graph, Algorithm 5 examines each node in the sequence and its incoming edges once. Therefore, the time complexity of Algorithm 5 is O(|V| + |E|). In addition, given the blocked statuses computed by Algorithm 5, we estimate the defense value based on the cost of disabling the nodes in the defense action and the potential loss at goal nodes whose attack paths are not blocked; this estimation takes O(|V|) time, since the defense action contains at most |V| nodes. Since there are N_a sampled attack plans, the time complexity of computing the defense value is O(N_a · (|V| + |E|)).

5.5.3. Greedy Defense Strategy

Finding an optimal defense action according to the expected defense value is computationally expensive, because there is an exponential number of possible defense actions. Therefore, we propose two different greedy heuristics to overcome this computational challenge.

Static Greedy Heuristic. This heuristic greedily finds a reasonable set of nodes to disable over the defender’s belief. These nodes are chosen from the defense candidate set with respect to the defender’s belief, which is defined as the union of the defense candidate sets of all graph states in the belief. Given the current set of selected nodes (which is initially empty), the heuristic finds the next best candidate node such that the expected defense value of the augmented set, taken over the defender’s belief, is maximized. The iteration process stops when disabling any additional candidate node does not increase the defender’s expected value.
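A sketch of the static greedy heuristic's control flow, with the per-state defense value of Proposition 7 abstracted into a callable and the belief represented as a mapping from sampled states to probabilities (names are illustrative):

```python
from typing import Callable, Dict, FrozenSet, Set


def static_greedy_defense(candidates: Set[int],
                          belief: Dict[FrozenSet[int], float],
                          value_given_state: Callable[[FrozenSet[int], FrozenSet[int]], float]
                          ) -> Set[int]:
    """Repeatedly add the candidate node that most increases the belief-weighted
    defense value, until no node improves it."""
    def expected_value(action: FrozenSet[int]) -> float:
        return sum(p * value_given_state(action, s) for s, p in belief.items())

    chosen: Set[int] = set()
    current = expected_value(frozenset(chosen))
    while True:
        best_node, best_value = None, current
        for u in candidates - chosen:
            v = expected_value(frozenset(chosen | {u}))
            if v > best_value:
                best_node, best_value = u, v
        if best_node is None:          # no candidate improves the expected value: stop
            return chosen
        chosen.add(best_node)
        current = best_value
```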

Proposition 8. Suppose that the number of graph states sampled in particle filtering is N_b, so that the belief set contains at most N_b states. The time complexity of the static greedy heuristic is then polynomial in N_b, the number of sampled attack plans, and the size of the graph.

Proof. The size of the defense candidate set is at most |V|. Therefore, there are at most |V| iterations of finding the next best node in the static greedy heuristic. In each iteration, given the current set of selected nodes, we compute for each candidate node the expected defense value of the augmented set over the at most N_b sampled graph states; by Proposition 7, each such per-state evaluation takes time linear in the number of sampled attack plans and the size of the graph. Since there are at most |V| nodes in the candidate set, the overall running time of the static greedy heuristic is polynomial in N_b, the number of sampled attack plans, and the size of the graph.

In addition, we propose a randomized greedy heuristic which selects a defense action probabilistically from a set of greedy defense actions. These greedy defense actions are generated based on each graph state in the belief set.

Randomized Greedy Heuristic. This heuristic greedily finds a reasonable set of nodes to disable with respect to each graph state in the belief set. For each such state, given the current set of selected nodes (which is initially empty), the heuristic finds the next best node such that the defense value with respect to that state is maximized. As a result, we obtain multiple greedy defense actions, one for each possible game state in the belief set. The defender then randomizes its choice over these actions according to a conditional logistic distribution with respect to the defense values computed under the defender’s belief. The following proposition provides the time complexity of the randomized greedy heuristic; it can be proved in a similar way to the complexity of the static greedy heuristic.

Proposition 9. The time complexity of the randomized greedy heuristic is likewise polynomial in the number of sampled graph states, the number of sampled attack plans, and the size of the graph.

6. Incorporating Uncertainty about the Attack Graph

In the previous sections, we presented our security game model and the players’ heuristic strategies on a single attack graph: both the defender and the attacker know the true underlying attack graph of the network system. In this section, we discuss an extension of our model and algorithms to incorporate uncertainty about the attack graph from the players’ perspectives. In other words, the defender and the attacker each have their own attack graphs, which may differ from the true one.

In particular, we denote by G = (V, E) the true attack graph of the network system, and by G^a = (V^a, E^a) and G^d = (V^d, E^d) the attack graphs of the attacker and the defender, respectively. Here, V^a, E^a and V^d, E^d are the sets of nodes and exploits as viewed by the attacker and the defender. Given this extension, it is straightforward to redefine our heuristic strategies for the attacker and the defender based on G^a and G^d separately. For example, the uniform attack heuristic chooses an attack action from the attacker’s exploit set E^a uniformly at random. Similarly, the value-propagation attack heuristic performs the attack value propagation as well as the probabilistic selection of an attack action based on the attacker’s graph G^a instead of the true graph G.

In evaluating the effectiveness of our proposed heuristic strategies, our empirical game-theoretic analysis simulates the attack progression on all three attack graphs: (i) the true attack graph; (ii) the defender’s attack graph; and (iii) the attacker’s attack graph. The whole simulation process is illustrated in Figure 9. The states of the defender’s and the attacker’s attack graphs are determined based on the state of the true attack graph. The transitions of the true graph state depend on the actions of the players chosen according to our heuristic strategies. For example, if the attacker performs an exploit that does not exist in the true attack graph, the true graph state does not change. Finally, players’ payoffs are defined based on the true attack graph states, which are used to evaluate the effectiveness of the heuristic strategies.

7. Experiments

We evaluate the effectiveness of the proposed strategies in various settings with different graph topologies, node type ratios, and levels of defender observation noise. As in prior EGTA treatments of cybersecurity scenarios [40], we employ the simulation data to estimate a two-player normal-form game model of the strategic interaction.

7.1. Player Strategies

We tune the strategies for the players by adjusting their parameter values. In our experiments, the fraction of candidates chosen to attack in the attacker’s strategies is a fixed proportion of the total number of attack candidates, and the logistic parameter takes values from a small predetermined set. As a result, our experiments consist of nine different attack strategy instances: (i) a No-op (aNoop) instance in which the attacker does not perform any attack action; (ii) two uniform (aUniform) instances; (iii) four value-propagation (aVP) instances; and (iv) two sampled-activation (aSA) instances, where the instances of each heuristic differ in their parameter values.

The fraction of nodes chosen to protect in the defender strategies is likewise a fixed proportion of the total number of defense candidate nodes, and the defender’s logistic parameter takes values from a small predetermined set. In addition, the defender’s assumption about the attacker strategy considers the same set of aforementioned attack parameter values. Thus, we evaluate 43 different defense strategy instances: (i) a No-op (dNoop) instance in which the defender does not perform any defense action; (ii) two uniform (dUniform), two min-cut uniform (dMincut), and two root-only uniform (dRoot-only) instances; (iii) four goal-only (dGoal-only) instances; (iv) 16 aVP-dVP instances, in which the defender follows the value-propagation defense strategy while assuming the attacker follows the value-propagation attack strategy, corresponding to different combinations of the players’ parameter values; (v) eight aSA-dSA instances, corresponding to different parameter values and to whether the defender uses the randomized or static greedy heuristic; and finally (vi) eight aVP-dSA instances, again corresponding to different parameter combinations.

7.2. Simulation Settings

We consider two types of graph topology: (i) layered directed acyclic graphs (layered DAGs) and (ii) random directed acyclic graphs (random DAGs). Graphs of the former type consist of multiple separate layers, with edges connecting only nodes in consecutive layers. We generate 5-layered DAGs in which each layer has a fixed number of nodes, and all nodes in the last layer are goal nodes. In addition, edges are generated to connect every node at each layer to a fraction of the nodes at the next layer (chosen uniformly at random). In the latter case, random DAGs are generated with fixed numbers of nodes and edges. In addition to leaf nodes, other nodes in random DAGs are selected as goal nodes uniformly at random, given a fixed number of goal nodes (which is 15 in our experiments). The proportion of AND-nodes is either zero or one half.

The defender’s cost to disable each node , , is generated within uniformly at random where is the shortest distance from root nodes to . The attacker’s costs are generated similarly. The attacker reward and the defender’s penalty at each goal node are generated within uniformly at random. Finally, the activation probability associated with each edge with an -postcondition and with each -node is randomly generated within and respectively.
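Continuing the sketch, node costs, rewards, and penalties can be drawn uniformly at random, with the cost range depending on a node’s shortest distance from the root nodes. The specific ranges in the code below are illustrative placeholders only and are not meant to reproduce the experimental values.

```python
import random
from collections import defaultdict, deque

def sample_node_parameters(nodes, edges, roots, goals, seed=None):
    """Illustrative sampling of per-node costs and goal rewards/penalties.
    The numeric ranges below are placeholders, not the experimental values."""
    rng = random.Random(seed)
    # Shortest distance from the root nodes to every node (BFS).
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    dist = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)

    params = {}
    for v in nodes:
        d = dist.get(v, 0)
        # Placeholder: the cost range depends on the node's distance from roots.
        params[v] = {"defender_cost": -rng.uniform(0.5, 1.0 + d),
                     "attacker_cost": -rng.uniform(0.5, 1.0 + d)}
        if v in goals:
            params[v]["attacker_reward"] = rng.uniform(5.0, 10.0)
            params[v]["defender_penalty"] = -rng.uniform(5.0, 10.0)
    return params
```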

We consider three cases of observation noise levels: (i) high noise: the signal probabilities and are generated within and uniformly at random, respectively; (ii) low noise: and are generated within and ; and (iii) no noise: and . The number of time steps is . The discount factor is .

7.3. Strategy Comparison

Based on the aforementioned settings, we generated 10 different games in each of our experiments. For each game, we ran 500 simulations to estimate the payoff of each pair of players’ strategy instances. As a result, we obtain a payoff matrix for each game, from which we compute Nash equilibria using Gambit [41]. We compute the utility each player obtains by playing each proposed strategy instance (instead of the equilibrium strategy) against the opponent’s equilibrium strategy, and we compare it with the players’ equilibrium utilities to evaluate the solution quality of the proposed strategies. Each data point in our results is averaged over the 10 games. In addition, instead of reporting results for every individual defense strategy instance (43 in total), we present results for each defense strategy averaged over all of its instances.
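The comparison described above can be expressed as a deviation-payoff computation on the estimated payoff matrices. The following sketch assumes the estimated payoff matrices and the equilibrium mixtures (e.g., as computed by Gambit) are already available as NumPy arrays; the function and argument names are illustrative.

```python
import numpy as np

def deviation_payoffs(def_payoff, att_payoff, def_eq, att_eq):
    """def_payoff[i, j] / att_payoff[i, j]: estimated payoffs when the defender
    plays strategy instance i and the attacker plays strategy instance j.
    def_eq / att_eq: equilibrium mixtures over the strategy instances.
    Returns each instance's expected utility against the opponent's equilibrium
    mixture, plus both players' equilibrium utilities."""
    def_utils = def_payoff @ att_eq    # defender instance i vs. attacker equilibrium
    att_utils = def_eq @ att_payoff    # attacker instance j vs. defender equilibrium
    def_eq_util = def_eq @ def_payoff @ att_eq
    att_eq_util = def_eq @ att_payoff @ att_eq
    return def_utils, att_utils, def_eq_util, att_eq_util
```

Comparing def_utils (or att_utils) with the corresponding equilibrium utility gives the utility gap for each strategy instance that we report in the following figures.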

7.3.1. Results on Layered DAGs

Our first set of experiments is based on layered DAGs, as shown in Figure 10 (-nodes) and Figure 11 (-nodes). In these figures, the x-axis represents the defender’s or the attacker’s expected utility, and the y-axis represents the corresponding strategy instances played by the players. For the purpose of analysis, we group these strategy instances based on heuristic strategies played, which are represented via the shading on bars. For example, Figures 10(a), 10(b), and 10(c) show the attacker’s utilities for playing the strategies indicated on the y-axis against the defender’s equilibrium strategy, when the percentage of -nodes in graphs is .

In Figures 10(d), 10(e), 10(f), 11(d), 11(e), and 11(f), the sampled-activation defense strategies (aVP-dSA and aSA-dSA) obtain the defense utility closest to the equilibrium utility (Eq) in all game settings, regardless of the assumed attack strategy (i.e., whether the attacker is assumed to follow the value-propagation or the sampled-activation attack strategy with particular parameter values). In fact, when the percentage of -nodes is , the Nash equilibria obtained for all games based on EGTA comprise only the defender’s sampled-activation strategy instances. This result shows that the sampled-activation defense strategy is robust to the defender’s uncertainty about the attacker’s strategy. Furthermore, when the observation noise level increases from no noise to high noise, the defender’s sampled-activation strategies do not suffer a significant loss in utility, implying that they are also robust to the defender’s uncertainty about the true graph state. Among the no-belief-update strategies, the goal-only strategy outperforms the root-only, min-cut, and uniform strategies in all game settings, making goal-only a good candidate when the defender’s belief update is not taken into account. In addition, the goal-only strategy even obtains a higher utility than aVP-dVP in the cases of low and high observation noise.

Figures 10(a), 10(b), 10(c), 11(a), 11(b), and 11(c) show that in all game settings, the attacker’s sampled-activation strategy (i.e., aSA- and aSA-) consistently obtains a high attack utility relative to the attacker’s equilibrium strategy (Eq). Even though the defender’s equilibrium strategies focus on competing against the sampled-activation attack strategy, this attack strategy still performs well. The utility obtained by the attacker’s value-propagation strategy (aVP--, aVP--, aVP--, and aVP--), on the other hand, varies with the value of the attack logistic parameter (). In particular, both aVP-- and aVP-- with obtain a considerably higher utility for the attacker than aVP-- and aVP-- with . Compared to all other strategies, the attacker’s uniform strategy (aUniform- and aUniform-) obtains the lowest attack utility.

When the percentage of -nodes is , aNoop receives high probability in the attacker’s equilibrium strategies in all game settings. In Figures 11(a), 11(b), and 11(c), aNoop obtains an attacker utility (which is zero) approximately equal to that of the equilibrium attack strategy. In fact, when the number of -nodes is large, it is difficult for the attacker to intrude deeply into the attack graph, because compromising -nodes takes more effort. Consequently, the defender obtains a significantly higher utility when the percentage of -nodes is (Figures 11(d), 11(e), and 11(f)) than when it is (Figures 10(d), 10(e), and 10(f)). The dNoop strategy is also included in the defender’s equilibrium strategy when the -node percentage is . Finally, in Figure 10(c), the attacker’s equilibrium utility is approximately zero even when all the nodes are of -type. This result shows that when the defender knows the graph state, our sampled-activation defense strategy is so effective that the attacker cannot gain any benefit from attacking.

In addition to the averaged results in Figures 10 and 11, we provide a detailed equilibrium analysis of each individual game for layered directed acyclic graphs with the percentage of -nodes being . In particular, Figures 14 and 15 show the equilibria of 10 games in the case of high noise, Figures 16 and 17 in the case of low noise, and Figures 18 and 19 in the case of no noise. In addition to the attacker’s and the defender’s expected utilities, we present the equilibrium of each game, which consists of the probability with which each player plays each strategy instance. For example, Figure 14(a) shows the equilibrium analysis of Game 1. In Game 1, the defender’s equilibrium strategy is to play (i) the strategy instance (aVP-0.5-3.0)-(dSA-3.0-ranG) (the defender plays the sampled-activation defense strategy following the randomized greedy heuristic with , assuming the attacker plays the value-propagation strategy with and ) with a probability of ; and (ii) the strategy instance (aVP-0.3-3.0)-(dSA-3.0-ranG) (the defender plays the sampled-activation defense strategy following the randomized greedy heuristic with , assuming the attacker plays the value-propagation strategy with and ) with a probability of . Averaged over all strategy instances with different parameter values, the defender’s aVP-dSA strategy obtains an expected utility closest to the equilibrium utility.

Overall, the strategy instances involved in each game’s equilibrium vary across games. Yet the players’ expected utility for playing each heuristic strategy (averaged over all corresponding strategy instances) is consistent across all games. For example, the sampled-activation defense strategy outperforms the other heuristic defense strategies in every game, obtaining the defender’s expected utility closest to the equilibrium utility.

7.3.2. Results on Random DAGs

Our second set of experiments is based on random DAGs (Figure 12). This figure shows that the solution quality of the defender’s strategies is similar to the case of layered DAGs. The defender’s sampled-activation strategy obtains a defense utility approximately the same as the equilibrium defense strategy (Eq) (Figures 12(d), 12(e), and 12(f)). For the attacker’s strategies, the sampled-activation attack strategy does not always obtain a high utility for the attacker. In fact, this attack strategy obtains the lowest attack utility in the case of low observation noise (Figure 12(b)). The solution quality of the value-propagation attack strategy, on the other hand, varies depending on the values of both the logistic parameter and the percentage of candidates chosen to attack . For example, when there is no observation noise (Figure 12(c)), aVP-- and aVP-- with obtain a considerably higher attack utility compared to aVP-- and aVP-- with . On the other hand, in the case of low observation noise (Figure 12(b)), aVP-- and aVP-- with obtain the highest attack utility compared with other attack strategies.

7.4. Strategy Robustness

In our third set of experiments, we evaluate the robustness of the defender’s strategies to the uncertainty about the attacker’s strategy. In particular, in addition to the nine aforementioned attack strategy instances, we consider seven more strategy instances corresponding to new parameter values: and . The seven new attack strategy instances include (i) one uniform instance with ; (ii) four value-propagation instances with and ; and (iii) two sampled-activation instances with . Meanwhile, we use the same 43 defense strategy instances as in the previous sets of experiments. None of these defense strategy instances is based on a model of the attacker using any of the seven new attack strategy instances. We use Nash equilibria of the resulting new games to analyze the solution quality of the proposed strategies. We aim to examine both the attacker’s gain and the defender’s loss in utility with respect to the new attack strategy instances. The experimental results are shown in Figure 13 for layered DAGs with -nodes.

Figures 13(a), 13(b), and 13(c) show that the attacker obtains a significantly higher utility on average compared to the corresponding results in Figures 10(a), 10(b), and 10(c). The defender, on the other hand, suffers a moderate loss in utility (Figures 10(d), 10(e), and 10(f) versus Figures 13(d), 13(e), and 13(f)). For example, in the case of high noise in the defender’s observations, the attacker’s equilibrium utility over the original nine attack strategy instances is (Figure 10(a)), whereas it is ( times higher) when the seven new attack strategy instances are also allowed (Figure 13(a)). Conversely, the defender’s equilibrium utility is approximately with the seven new attacker strategies allowed (Figure 13(d)), which is times lower than its equilibrium utility of for the original, smaller attacker strategy set (Figure 10(d)).

The new attacker strategy instances yield significantly higher utility for the attacker than the original set. For example, the sampled-activation instance aSA- with obtains a utility of (Figure 13(a)), while aSA- and aSA- with and only get a utility of and , respectively. Figure 13(a) shows that the attacker’s utility achieved by the value-propagation strategy instances with new , including aVP--, aVP--, and aVP--, is closer to the equilibrium utility than the utility of the earlier value-propagation instances.

Although the defender suffers reduced payoffs when the new attack strategy instances are allowed, the relative performance of the defender’s heuristic strategies is consistent with the previous experimental results (Figures 10(d), 10(e), and 10(f) versus Figures 13(d), 13(e), and 13(f)). In particular, the sampled-activation defense strategy still obtains the highest utility (which is close to the equilibrium utility) for the defender, regardless of which assumption the defender makes about the attacker’s strategy. Among the graph-based defense strategies, the dGoal-only strategy performs the best in all three cases of the defender’s observation noise level.

8. Summary

We study the problem of deploying security countermeasures on Bayesian attack graphs to protect networks from cyber-attacks. We propose a new simultaneous multistage attack graph security game model that encapsulates security environments with significant dynamics and uncertainty as a stochastic process over multiple time steps. Because obtaining an analytical solution would be impractical, we employ EGTA to analyze this complex security game. We propose parameterized heuristic strategies for both players that leverage the topological structure of attack graphs and employ sampling to overcome the computational complexity, and we provide a detailed analysis of their time complexity, which is polynomial. Our EGTA results across various game settings show that our defense heuristics not only outperform several baselines but are also robust to defender uncertainty about the security environment, including the graph state and the attacker’s strategy.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

This manuscript is an extension of a paper in the proceedings of the 2017 Workshop on Moving Target Defense [42].

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by MURI grant W911NF-13-1-0421 from the US Army Research Office. We benefited greatly from discussions with colleagues on this project, especially Erik Miehling and Massimiliano Albanese.