Abstract

Security measurement matters to every stakeholder in network security because it gives security practitioners accurate security awareness. However, most existing work does not apply to unknown threats. Moreover, existing security metrics mainly consider how easy a given attack is from a theoretical point of view and ignore the “likelihood of exploitation.” To help administrators better understand these threats, we analyze the behavior of attackers who exploit zero-day vulnerabilities and predict their attack timing. Based on this prediction, we propose a security measurement method. Specifically, we compute the optimal attack timing from the attacker’s perspective: a long-term game is used to estimate the risk of being discovered, and the optimal timing is then chosen based on that risk and the expected profit. We design a learning strategy to model the information sharing mechanism among multiple attackers and use a spatial structure to model the long-term process. After calculating the Nash equilibrium of each subgame, we take the likelihood of each node being attacked as the security metric. The experimental results show the efficiency of our approach.

1. Introduction

Security measurement matters to every stakeholder in network security and involves every stage and aspect of the entire life cycle. Without accurate security measurement, there can be no effective security awareness or action. Existing security measurements mainly focus on the relationship between exploits and system vulnerabilities, and their ability to measure unknown threats such as zero-day vulnerabilities is very limited. Meanwhile, zero-day attacks targeting governments and corporations are growing over time. An increasing number of hackers, motivated by a persistent love of technology or tempted by profit, are attempting to discover and propagate zero-day exploits. In 2016, 10822 vulnerabilities were found in China, and 2203 of them were zero-day vulnerabilities (http://www.cert.org.cn/publish/main/upload/File/2016CNVDannual1.pdf), which may cause serious consequences. According to The Hacker News, hackers exploited a zero-day vulnerability to attack Bangladesh’s central bank in 2016 and stole over $80 million from its account at the Federal Reserve Bank of New York (http://thehackernews.com/2016/03/bank-hacking-malware.html). Effectively measuring the threats posed by zero-day vulnerabilities, so that system administrators can better understand and guard against them, is therefore a great challenge.

Current security measurements mostly center on known vulnerabilities. They obtain their results by analyzing attacks and deriving corresponding rules. Such measurements can distort the real-life situation. For example, some vulnerabilities rated as highly threatening by CVSS are in fact rarely exploited by attackers. The work of Wang et al. [1], one of the few measurement studies targeting hidden vulnerabilities, has a similar problem. It uses an attack graph to determine the minimal number of vulnerabilities needed to reach a given goal, assuming that a hidden vulnerability may exist on any node, and uses this number as a reference for security measurement. This study measures only the dimension of complexity and does not account for differences in attackers’ preferences, so its result is purely theoretical and has limited reference value. For example, if a network is weakly protected and only one vulnerability is needed to break into it, the method above rates it as highly risky. However, attackers will not attack a network that they deem to be of no value; a honeynet is a case in point. Therefore, when measuring unknown threats, we need to include the attacker’s decision as a significant factor and take complexity, possible risks, benefits, and so on into consideration. In this way, we can analyze the decision the attacker is most likely to make and obtain a more realistic measurement result. That is, if attackers are eager to attack a node during a certain period, we consider the node to be facing a serious threat at that time; otherwise, the node should not be labeled as highly threatened.

Measuring unknown threats while taking attackers’ behavioral preferences into account faces three major challenges:
(i) Modeling zero-day attacks along the time dimension: current studies represent unknown vulnerabilities with attack graphs, which mainly capture the spatial dimension. A zero-day attack, however, is a long-term process. The zero-day exploit (also called a cyber resource) remains available until the defender fixes the vulnerability. As the process goes on, the attacker must trade off risk against profit at each time point: if the vulnerability is exploited, the attacker may gain some profit, but the risk of being noticed also increases. Conversely, if the zero-day vulnerability is exploited too late or not at all, the defender may fix it first, leaving the attacker no opportunity. The first challenge is therefore how to model this process properly along the time dimension.
(ii) Identifying and calculating the factors that change over time: in the course of zero-day attack and defense, the factors that influence the attacker’s decision-making include the potential profit, the attacker’s knowledge of the defender, and the risks related to the attack. A comprehensive and reasonable expression and calculation of these factors is the basis for predicting the attack decision.
(iii) Predicting the attack decision and measuring the unknown threat: once the relevant parameters are set, how accurately they can be used to predict the attacker’s likely decision at each time point determines the accuracy and reliability of the entire security measurement.

To overcome these challenges, this paper, which builds on our previous work [2], uses long-term game theory to predict the behavior of the attacker (the attack-defense scenario is shown in Figure 1) and then proposes a new security metric based on the prediction. The main contributions are as follows:
(i) We present a multiple game model: we treat a zero-day attack as a long-term process. The attacker’s decision (attack or wait) at any point in time affects the later course of the game. Therefore, we study the continuous game process over a certain period of time.
(ii) We discuss the factors that affect attack and defense, particularly how the attacker’s understanding of the defender’s capability changes. At the beginning of the game, attackers know little about the defender. As the game proceeds, their understanding becomes more and more accurate through observation and information sharing. We design a learning mechanism to simulate this correction process.
(iii) We propose a new security metric based on attack prediction: from the attacker’s point of view, we calculate the Nash equilibrium of each subgame. Using the result as an important reference, we propose a new security metric for unknown threats.

The rest of the paper is organized as follows. Section 2 surveys related work. Section 3 introduces the preliminaries. Section 4 presents the long-term game formulation. Section 5 details the security measurement method. Section 6 reports the experimental results, and Section 7 concludes the paper.

2. Related Work

There has been plenty of research on network security metrics. These studies focus on different aspects, such as metrics of system vulnerabilities, metrics of defense power, metrics of situations, and metrics of attack or threat severity [3]. Specifically, to assess the risk of malware threats, Hardy et al. [4] proposed the targeted threat index, which combines social engineering and technical sophistication. Thakore [5] provided a set of metrics such as coverage, redundancy, confidence, and cost to quantitatively evaluate monitor deployments. Kührer et al. [6] studied the effectiveness of malware blacklists and showed that current blacklists are insufficient to protect against the variety of malware threats. Other studies evaluate the strength of IDS or other security products [7, 8], the strength of user passwords [9, 10], and so on. However, none of the metrics mentioned above performs well when faced with unknown threats.

To better understand zero-day attacks, some studies analyze the attack itself, for example by detecting and identifying it. To identify unknown files, Avasarala et al. [11] introduced a class-matching approach. Mishra and Gupta [12] proposed a hybrid solution that uses CSS matching and URI matching to defend against zero-day phishing attacks. Wang et al. produced several representative works on measuring zero-day attacks [1, 13, 14]. To evaluate the robustness of networks, [13, 14] modeled network diversity as a security metric and proposed two complementary diversity metrics. The work in [1] based its evaluation on how many zero-day vulnerabilities are required to compromise a network asset. However, all these works rely on attack graphs and do not consider attacker behavior.

Analyzing attacker behavior is of great importance when measuring network security. Ekelhart et al. [15] developed a simulation-driven approach that takes attack strategies and attacker behavior into consideration. Al-Jarrah and Arafat [16] used a time delay neural network that embeds the temporal behavior of attacks to maximize the recognition rate. Mitchell and Chen [17] proposed a specification-based IDS that can adapt to different attacker types, such as reckless, random, and opportunistic attackers, and thereby achieves higher detection accuracy. Allodi and Massacci first pointed out that attackers do not exploit all vulnerabilities equally [18] and then focused on the choices attackers make [19]. By validating the actual “traces” that attacks left on real systems, they claimed that real attackers are less powerful than commonly assumed and do not exploit every vulnerability. Attackers strategically choose busy periods and certain vulnerabilities, while the efforts of security professionals are spread across many vulnerabilities [20, 21]. Based on this observation, Dumitraş [22] proposed novel metrics that enable a more accurate assessment of the risk of cyberattacks. Bozorgi et al. [23] used machine learning with high-dimensional feature vectors as input to predict the vulnerabilities most likely to be exploited. All these analyses, however, take the defender’s perspective and ignore the information sharing mechanism among attackers, which is the most important part of the attack and can guide attackers to change their strategies dynamically.

3. Preliminaries

Before introducing the long-term game, we begin with the simple attack-defense game. The players of the game are the attacker and the defender. Liang and Xiao [24] divide game applications into two subclasses: general analysis and specialized analysis. In general analysis, the networks are often not specific but abstract, and the strategy set of the attacker is SA = {attack, not attack}, while that of the defender is SD = {protect, not protect}. Let represent the attack cost, represent the defense cost, and represent the profit of a successful attack. In this section, is a fixed value. The payoffs are shown in Table 1. In [25], a remainder cost was defined to indicate the damage that the attack causes to the system after the defender has implemented the defense strategy, and where . For simplicity, we assume that in this section; that is, if the defender implements the defense strategy, there is no damage to the system and no reward for the attacker.

According to Table 1, when the attack cost is fixed and both the attacker and the defender are completely rational, each of them decides by calculating the Nash equilibrium, which depends on the parameters as follows:
(1) If , the attacker will not attack and the defender will not protect. In this case, we can say the network is fairly safe.
(2) If , the attacker will attack, but the defender will still not protect. In this case, we can say the network is in great danger.
(3) If and , no pure Nash equilibrium exists, but there is a mixed Nash equilibrium; that is, the attacker launches an attack with probability , and the defender protects with probability . In this case, we can say the network is somewhat safer than in case two but more dangerous than in case one.
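The three cases above can be illustrated with a minimal sketch. Table 1 is not reproduced in this text, so the payoffs below are an assumption made only for illustration: a protected attack costs the attacker `attack_cost` and yields nothing, an unprotected attack yields `gain - attack_cost` and costs the defender `gain`, and protecting always costs the defender `defense_cost`. The function and parameter names are hypothetical.

```python
def one_shot_equilibrium(gain, attack_cost, defense_cost):
    """Return (attack_probability, protect_probability) at equilibrium.

    Assumed payoffs (illustrative, not taken from Table 1):
      attacker: attack vs protect      -> -attack_cost
                attack vs not protect  -> gain - attack_cost
                not attack             -> 0
      defender: protect                -> -defense_cost
                not protect, attacked  -> -gain
                not protect, no attack -> 0
    """
    if gain <= attack_cost:
        # Attacking never pays off, so protecting is wasted cost.
        return 0.0, 0.0
    if defense_cost >= gain:
        # Protecting costs more than the loss it prevents.
        return 1.0, 0.0
    # Mixed equilibrium: each side makes the other indifferent.
    attack_probability = defense_cost / gain
    protect_probability = 1.0 - attack_cost / gain
    return attack_probability, protect_probability


if __name__ == "__main__":
    print(one_shot_equilibrium(gain=10, attack_cost=4, defense_cost=5))
```

Under these assumed payoffs, the mixed probabilities follow from the usual indifference argument: the defender is indifferent when the expected loss from not protecting equals the defense cost, and the attacker is indifferent when the expected gain equals the attack cost.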

The model above shows that if the defender decides to protect the target, the attacker cannot complete the attack and therefore gets no reward. As time goes on, this one-shot game is repeated again and again, with no necessary correlation between rounds. This is a simple case for security metrics, but it does not apply to an attacker who holds zero-day exploits for a certain target, mainly for two reasons. First, the payoff is quite different. Because zero-day exploits are stealthy [26], most software and security products cannot detect them and thus can hardly resist zero-day attacks effectively. So the attacker obtains the corresponding profit as long as he/she launches an attack. Second, the strategy is quite different. Compared with the one-shot game, an attacker with zero-day exploits is more concerned about the persistence of the resource: he/she has to consider whether the resource will still be useful after this attack and then trade off risk against profit. Therefore, the key points of the entire game process are the attacker’s decision and the defense capability. In the next section, we discuss the long-term game in detail.

4. Long-Term Game Formulation

We analyze the attack-defense scenario between multiple attackers and a single target. The target could be a company or an organization, and it contains many nodes. We assume that each node has at least one cyber resource (nodes without any resource are outside the scope of our discussion). All of these vulnerable nodes are protected by the same administrator, so we assume that all the nodes share the same defense capability. We define an attack-defense game as a combination of one resource and one node. An attacker who owns more than one resource is involved in more than one attack-defense game, so the relationship is a many-to-many mapping. As mentioned above, our discussion covers a certain period of time because a zero-day attack is a long-term process. We assume that an attack takes one time tick to complete and that, if the administrator does not discover the vulnerability this time, it remains useful to the attacker next time. For each time tick, we compute the probability of being attacked for every node and take these results as the security metric. In this section, we first introduce the game formulation and some key parameters and then focus on the attackers’ learning strategy.

4.1. Attack-Defense Game

Cyber Resource. We call a zero-day exploit or a set of zero-day exploits a cyber resource of the attacker [27]. A cyber resource helps the attacker complete a certain attack. If one or more of its exploits are fixed or expire, causing the attack to fail, we say the resource has expired.

The Number of Long-Term Game. In this attack-defense scenario, there are total attackers, cyber resource, and nodes. During the game, is a constant, but and will change from time to time. Compared with the number of attackers, we are more concerned about the number of resources, because resource means attack-defense games. We use to represent the total number at time tick . There are three main aspects that will influence the number. First, at the beginning of time tick , is used to denote the newly joined resource. Second, at the end of time point , is used to denote the number of expired resources. Third, according to the defense capability, some of the resources are randomly eliminated at the end of time point , and the number is notated as . Then the total number of attack-defense games at time point is

The Duration of Each Long-Term Game. Different from the one-shot game, zero-day attack-defense game is a long-term game. It consists of multiple subgames over a period of time. The game will go on as long as the resource is still useful. There are mainly three possibilities to terminate the game:(1)The resource is expired. We use notation to represent the lifecycle of certain resource. Different vulnerabilities have different lifecycles. For example, compared with buffer overflow and executable code, other vulnerabilities such as PHP vulnerability or SQL injection often have longer lifecycles [28]. If one of these vulnerabilities expired, we say the resource is expired.(2)The resource was found and patched due to the attack action. We call this the passive-defense capability of the administrator, denoted as , indicating the probability of discovering the vulnerability after being attacked.(3)The resource was found and patched when nothing happens. We call this the initiative-defense capability of the administrator, denoted as , indicating the probability of discovering the vulnerability before being attacked.

and are determined by the capability of the administrator, and both are fixed values that are known to the administrator but unknown to the attackers during the game. Each attacker has its own assessment of these two values, denoted as and . After each subgame, the attacker revises the assessment by observing the state of its neighbors and exchanging information with them. This is consistent with the values of most hackers, who believe that all information should be free and that everyone has the right to access it. Many hacker groups therefore form a unique “black ecosystem” to share information more effectively. According to the director of the Baidu security laboratory (Baidu is the predominant search engine in China), information sharing within this black ecosystem works much better than within the white ecosystem. The update rules are discussed in detail in the next subsection.

The Strategy Set for Each Subgame. As in the one-shot game above, the strategy set of the attacker is SA = {attack, not attack}, and that of the defender is SD = {protect, not protect}.

The Payoff for Each Subgame. For any resource , at any time , the payoff of the subgame consists of three parts: the attack cost denoted as , the one-time attack income , and the long-term profit expectancy from time to the end of the long-term game, denoted as . is dependent on four factors including , , , and the attacker’s action last subgame. When making a decision, what the attacker really cares about is not only the one-time attack income for the attack, but also the long-term profit expectancy during the whole lifecycle. The specific parameter settings are discussed in the next subsection. For the administrator, the loss also includes two parts, the defense cost denoted as and the loss caused by the attack. In order to reduce the complexity of this model, it is assumed that the attacking loss equals the negative of the attacker’s attack revenue.

Game Rules. In each subgame, given the payoffs, both the attacker and the administrator make their decisions by calculating the Nash equilibrium. At the end of each subgame, if the resource is still useful, the attacker revises the assessment by observing the state of its neighbors and exchanging information with them, in order to recalculate the payoffs for the next subgame.

4.2. Some Key Parameters

The Main Parameters and Descriptions section lists the key parameters used in our long-term game model. In this paper, we assume that , , , , and are fixed values, and we mainly discuss the following three parameters.

Gain Function of Time . For a certain resource , the gain function of time represents the one-time attack income that the attacker obtains by attacking the node at a certain time . This can be a fixed value or a function of time, depending on the target type. For example, an attacker who targets e-commerce platforms can obtain better rewards on specific days such as Black Friday. For the sake of discussion, Figure 2 shows the instantaneous yield curves of attacks against a live video platform and a bank over different time periods. For a bank attack, the attack revenue is higher on weekdays than on weekends and higher during working hours than during off-hours. In contrast, when the target is the video platform, the attack revenue is relatively higher on weekends and late at night. In general, we assume that is affected by the target’s visiting traffic, business process, and customers’ work schedule. The specific values are beyond the scope of this article. This article assumes that both the attackers and the administrator have a good understanding of the target of the attack, so the value of is known.
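As an illustration of a time-dependent gain, the sketch below encodes the qualitative pattern described above for a bank and a video platform. The multipliers, the working-hour window, and the function names are hypothetical placeholders chosen only to reproduce the ordering in the text; they are not values from Figure 2.

```python
HOURS_PER_WEEK = 7 * 24


def _split(hour_of_week):
    """Map an hour index to (is_weekday, hour_of_day); hour 0 is Monday 00:00."""
    day, hour = divmod(hour_of_week % HOURS_PER_WEEK, 24)
    return day < 5, hour


def bank_gain(hour_of_week, base=1.0):
    """Illustrative gain for a bank: higher on weekdays and in working hours."""
    is_weekday, hour = _split(hour_of_week)
    day_factor = 1.5 if is_weekday else 0.5        # assumed multipliers
    hour_factor = 1.5 if 9 <= hour < 17 else 0.75  # assumed working hours
    return base * day_factor * hour_factor


def video_gain(hour_of_week, base=1.0):
    """Illustrative gain for a video platform: higher on weekends and late nights."""
    is_weekday, hour = _split(hour_of_week)
    day_factor = 0.75 if is_weekday else 1.5                 # assumed multipliers
    hour_factor = 1.5 if hour >= 21 or hour < 2 else 0.75    # assumed late-night window
    return base * day_factor * hour_factor


if __name__ == "__main__":
    monday_10am = 10
    saturday_10pm = 5 * 24 + 22
    print(bank_gain(monday_10am), bank_gain(saturday_10pm))
    print(video_gain(monday_10am), video_gain(saturday_10pm))
```

A fixed gain is the special case in which both factors are constant; any periodic or event-driven schedule could be substituted without changing the rest of the model.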

Profit Expectancy . For the certain resource , we use to represent the profit expectancy from time to the end of the lifecycle, so , where and . So, we need to compute all the for , . If notation is used to indicate that the resource is still available at time point and notation is used to indicate that the attacker will choose to attack, then . So we have

means that the attacker has attacked last time but has not been discovered, and means that the attacker has not attacked last time and has not been discovered. To simplify the computation, it is assumed that the defender will always be protected when calculating . And used in (4) is an approximate value. Because the exact value of which is determined by calculating the Nash equilibrium of the subgame at time point could not be known, the probability is used instead. According to the above recursive formula, we can get

The Estimation of Administrator’s Defense Capability and . During the game, the attacker does not know the real and , but each attacker has its own assessment of these two values, denoted as and . After each subgame, the attacker updates the assessment by observing the state of its neighbors and exchanging information with them. The update rule includes the following main steps.
(i) Initializing the assessment randomly at the beginning of the game: the initial assessments are random because the attacker knows little about the defender.
(ii) Calculating the observed result of and : at the end of the time point , the attacker observes his neighbors, counts those (1) who attacked this time and were discovered and (2) who did not attack this time and were discovered, and then calculates the observed result of and .
(iii) Combining the observed result and the neighbors’ assessments with his previous assessment to form his new assessment: when combining these three results, their different reference values should be considered. The assessment of a neighbor who has survived longer has a higher reference value. Moreover, the observed result plays an important role at the beginning of the game, but its importance diminishes as the game goes on, because the observed samples, that is, the attacker’s neighbors, are limited and the observed result is therefore highly random.
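The update steps above can be sketched as a weighted average. This is a minimal illustration under stated assumptions, not the paper's exact formulas: the observed value is taken to be the fraction of neighbors with a given action who were discovered, the attacker's own previous assessment and the observed value each get weight 1, and each neighbor is weighted by how long it has survived. All function and parameter names are hypothetical.

```python
def observed_rate(neighbors, attacked):
    """Fraction of neighbors with the given action that were discovered.

    `neighbors` is a list of dicts with boolean keys "attacked" and
    "discovered" (an assumed representation). Returns None when no
    neighbor took that action, since nothing was observed.
    """
    group = [n for n in neighbors if n["attacked"] == attacked]
    if not group:
        return None
    return sum(1 for n in group if n["discovered"]) / len(group)


def update_assessment(own, neighbor_estimates, neighbor_ages,
                      observed=None, use_observed=True):
    """Combine own, neighbors', and observed values into a new assessment.

    With all ages equal to 1 this is a plain average; setting
    use_observed=False drops the noisy observation, as one might do
    late in the resource lifecycle.
    """
    values = [own] + list(neighbor_estimates)
    weights = [1.0] + [float(age) for age in neighbor_ages]
    if use_observed and observed is not None:
        values.append(observed)
        weights.append(1.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    neighbors = [
        {"attacked": True, "discovered": True},
        {"attacked": True, "discovered": False},
        {"attacked": False, "discovered": False},
    ]
    obs = observed_rate(neighbors, attacked=True)
    print(update_assessment(0.2, [0.4, 0.6], [1, 3], observed=obs))
```

The same function can be applied separately to the passive-defense and initiative-defense assessments, with `attacked=True` and `attacked=False` observations respectively.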

Some new parameters and notations are introduced as follows. Let denote the probability of the attacker being eliminated after attack, so . Similarly, is used to denote the probability of being eliminated without attack, so . For any attacker at the end of time point , and represent the attacker’s assessment of and , , and are the initial estimates. We use to denote the neighbors’ strategy and to denote whether these neighbors are found by the defender or not, where is the total number of neighbors of , , . is used to denote the neighbors who had attacked this time and had been found, so . is used to denote the neighbors who were not attacked and had been found, so . is used to denote the calculated by neighbor . Let denote the observed value of , so ; let denote the observed value of so . After introducing above notations, this paper presents three plans for revising the parameters and ; we will make a brief introduction.

Plan 1: Average Summation. Record all the and for each neighbor and the resource itself, and then take the average with to calculate , denoted as and then calculate so

Plan 2: Staged Average Summation. In the first half of the resource life cycle, revise and according to Plan 1, and remove and when calculating and in the second half. That is because when this long-term game goes to a certain stage, the estimated value and are already close to the actual value, but there is a large uncertainty in the observations, so we removed the and in the second half.

Plan 3: Staged and Weighted Average Summation. Based on plan two, we introduce the concept of neighbor weight; that is, the longer the neighbor exists, the greater reference value it can provide. It should be noted that the notation in and is the existing time of resource but not the existing time of certain neighbor. So we use to represent the existing time of neighbors. Then the formula of plan three is as follows:

5. Measure the Network Security

5.1. Calculate the Nash Equilibrium for Each Subgame

After calculating the above parameters, we can fill the payoffs matrix for each subgame (see Table 2) where

Here is a brief discussion of the game’s Nash equilibrium, also the optimal timing selection guidelines:(1)At some time , when and , attacker will not attack but the defender will protect. When and , attacker will not attack and the defender will not protect.Because and , so ; the best choice for attackers is not attacking whatever the defending choice is. But in terms of the defenders, if , that means there is some probability of discovering the vulnerability by expending a little defending cost, so the defender will defend. But if the cost is high, that is, , the defender will not defend.(2)At some time , when and , attacker will attack and the defender will protect. When and , attacker will attack and the defender will not protect.Because , that means the profit of this attack is higher than the loss caused by the discovery of the attack, so the best choice for attackers is attacking whatever the defending choice is. But in terms of the defender, under this circumstance, if , the defender will defend, but if , that means the defending cost is higher than the loss caused by being attacked, so defender will not protect.(3)At some time , when and , the defender will not protect and the attacker will attack. Because and , so and ; due to the high cost of protection, the defender will not defend whatever the attacking choice is. Under this circumstance, the attacker will choose attack.(4)At some time , when and , the defender will protect and the attacker will not attack. Because and , so and ; due to the low cost of protection, the defender will defend whatever the attacking choice is. Under this circumstance, the attacker will not choose attack.(5)At some time , when and , there is no pure Nash equilibrium, but only mixed Nash equilibrium; that is, the attacker will attack with the probability of , and the defender will defend with the probability of . We use to denote the probability of attack for the attacker and to denote the probability of defending. 
So the expected utility function of the attacker isDifferentiate the above-mentioned function:Let ; we getSimilarly,Let ; we get
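The case analysis and the indifference conditions above can be checked numerically for any 2×2 payoff matrix. The sketch below is a generic solver, not the paper's derivation: it lists pure equilibria by best-response checks and computes the interior mixed equilibrium, if one exists, by making each player indifferent. The payoff values in the usage example are illustrative only and are not taken from Table 2; all names are hypothetical.

```python
def solve_2x2(attacker, defender):
    """Find Nash equilibria of a 2x2 attack-defense subgame.

    attacker[i][j] and defender[i][j] are payoffs where
    i = 0 (attack) or 1 (wait), j = 0 (protect) or 1 (not protect).
    Returns (pure, mixed): a list of pure equilibria (i, j) and either
    None or (p_attack, q_protect) for an interior mixed equilibrium.
    """
    pure = []
    for i in (0, 1):
        for j in (0, 1):
            attacker_ok = attacker[i][j] >= attacker[1 - i][j]
            defender_ok = defender[i][j] >= defender[i][1 - j]
            if attacker_ok and defender_ok:
                pure.append((i, j))

    mixed = None
    # p makes the defender indifferent between protecting and not protecting.
    d_den = defender[0][0] - defender[1][0] - defender[0][1] + defender[1][1]
    # q makes the attacker indifferent between attacking and waiting.
    a_den = attacker[0][0] - attacker[0][1] - attacker[1][0] + attacker[1][1]
    if d_den != 0 and a_den != 0:
        p = (defender[1][1] - defender[1][0]) / d_den
        q = (attacker[1][1] - attacker[0][1]) / a_den
        if 0 < p < 1 and 0 < q < 1:
            mixed = (p, q)
    return pure, mixed


if __name__ == "__main__":
    # Illustrative payoffs: gain 10, attack cost 4, defense cost 5.
    attacker = [[-4, 6], [0, 0]]
    defender = [[-5, -10], [-5, 0]]
    print(solve_2x2(attacker, defender))
```

When a pure equilibrium exists, the attack probability used for the metric is simply 0 or 1; otherwise the mixed probability `p_attack` plays that role.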

5.2. Measure the Network Security Using the Nash Equilibrium

After calculating the Nash equilibrium for each subgame, we use this result as a reference for the security measurement of unknown threats.

At any time , for the th node in the target, let nonnegative vector denote the probability distribution of unknown vulnerabilities, where , , and represents the probability of th resource being used by the attacker. So we can compute the probability that the node is being attacked at time :

Assume which models the attack probability distribution of all nodes in the target; represent the weight of each node in the target. We use to denote the final result of the security measurement:

From the above equation we can see that the higher the value of , the greater the threat level. In the next section, we show how to calculate this score in detail.

6. Experiment

We evaluate the effectiveness of the proposed evolutionary games using a synthetic data set. All experiments are conducted on a Windows 7 system with an Intel Core i7-6700 3.4 GHz CPU and 8 GB of memory.

Exp1: Numbers of Subgames. We first run our experiment and observe the total number of subgames. We provide ten types of gain functions with varying monotonicity and codomains. We set the lifecycle of each resource to 20 (for each subgame, it equals 1), and the total number of subgames at the beginning is 12000 (different attackers may hold the same resource), distributed on a matrix. The number of new joins can be regarded as a random process following a Gaussian distribution. We pick different and to see whether the number stabilizes under different numbers of new joins. Figure 3 shows the total number of subgames where the average number of new joins equals 2000 and 4000; we also set different attack costs.

The number of subgames stabilizes in all six experiments. Many factors affect the number of subgames during the game. For example, different attack cost values lead to different experimental results. The average number in Figure 3(a) is higher than that in Figures 3(b) and 3(c). This is because when the cost of the attack is too high, the attacker’s probability of attacking drops drastically, so the probability of being discovered is relatively lower. We can also see that when equals 0.1 (the black and blue lines), the number is greater than when equals 0.2 (the red and pink lines). This is because the greater is, the more vulnerabilities are discovered. However, compared with , a difference in does not have a major impact on the total number. This is reasonable because, over the whole game, the probability of attacking is much lower than the probability of waiting, so the impact of is much smaller.
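The bookkeeping behind this experiment (new joins, expiry at the end of the lifecycle, and elimination by discovery) can be sketched with a small simulation. This is a simplified illustration, not the experimental code: the attack decision is drawn from a fixed probability `attack_prob` instead of the per-subgame Nash equilibrium, `alpha` and `beta` stand for the assumed probabilities of discovery after an attack and without an attack, and all names and values are hypothetical.

```python
import random


def simulate_game_counts(initial, ticks, lifecycle, mean_joins, sd_joins,
                         attack_prob, alpha, beta, seed=0):
    """Return the number of surviving attack-defense games after each tick."""
    rng = random.Random(seed)
    ages = [0] * initial  # one entry per active resource
    counts = []
    for _ in range(ticks):
        # New resources join at the beginning of the tick (Gaussian, clipped at 0).
        joins = max(0, round(rng.gauss(mean_joins, sd_joins)))
        ages.extend([0] * joins)

        survivors = []
        for age in ages:
            attacked = rng.random() < attack_prob
            discovered = rng.random() < (alpha if attacked else beta)
            # A resource survives if it is not discovered and has not expired.
            if not discovered and age + 1 < lifecycle:
                survivors.append(age + 1)
        ages = survivors
        counts.append(len(ages))
    return counts


if __name__ == "__main__":
    counts = simulate_game_counts(initial=12000, ticks=60, lifecycle=20,
                                  mean_joins=2000, sd_joins=200,
                                  attack_prob=0.1, alpha=0.3, beta=0.1)
    print(counts[-5:])
```

With a steady inflow and a finite lifecycle, the population in such a model tends to settle around a level determined by the join rate and the per-tick survival probability, which is the kind of stabilization discussed above.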

Exp2: Measure the Network Security Using the Attack Probabilities. After calculating the attack probability for each vulnerability at each time, we use these results as important references to measure the network security. This experiment shows how to do so. Because of space limitations, it is impossible to list all the nodes and attack probabilities of the game, so we use a reduced version to explain our method. We assume that the target has seven nodes and that each node has at least one vulnerability. Table 3 lists the probabilities of these nodes over a certain time period. The attack probability for each vulnerability is computed from the Nash equilibrium of the game, and the attack probability for each node, recorded as “tot” in the table, is computed through (15). The table shows that, for some vulnerabilities, the attack probability decreased to zero and never increased again, such as vulnerability 2 in node 1, which means this vulnerability was fixed at a certain time. For other vulnerabilities, the attack probability stayed at zero for a period and then increased, such as vulnerability 2 in node 2, which means the vulnerability was newly discovered at time . Take node 1 as an example: it has 2 vulnerabilities in total, and at time , we compute the probability of each vulnerability being exploited, that is, 0.25 and 0.06. We then compute the probability of node 1 being attacked, that is, 0.3. Similarly, we can obtain the probability of being attacked for each node at time . Finally, we measure the network security through (16); the final score of the system at time to is shown in Table 4.
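The node 1 example can be reproduced with a short sketch. Equations (15) and (16) are not reproduced in this text, so the aggregation below is an assumption: the node-level probability is taken as the probability that at least one of its resources is used (treating resources as independent), which gives 0.295 ≈ 0.3 for the values 0.25 and 0.06, and the network score is taken as a weighted average of node probabilities. The function names and the weights in the usage example are hypothetical.

```python
def node_attack_probability(resource_probs):
    """Probability that at least one resource on the node is used.

    Assumes the resources are exploited independently of each other.
    """
    not_attacked = 1.0
    for p in resource_probs:
        not_attacked *= 1.0 - p
    return 1.0 - not_attacked


def network_score(node_probs, weights):
    """Weighted average of node attack probabilities (weights are normalized)."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, node_probs)) / total


if __name__ == "__main__":
    node1 = node_attack_probability([0.25, 0.06])
    print(round(node1, 3))
    # Hypothetical second node and weights, for illustration only.
    print(network_score([node1, 0.1], weights=[2.0, 1.0]))
```

A higher score then corresponds to a greater threat level, and a node with a large weight and a high attack probability dominates the result.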

7. Conclusion

This paper focuses on measuring network security in the presence of unknown threats. Although there is a large body of research on network security metrics, most of it does not perform well when faced with unknown threats. To help administrators better understand potential zero-day attacks, we analyzed the behavior of attackers and predicted their attack timing. Because zero-day attacks are stealthy and persistent, we modeled them as a long-term game. We specified and computed all the key parameters of the game and then obtained the Nash equilibrium of each subgame. We used these results as important references to measure the network security. The experiments showed the efficiency of our approach.

Main Parameters and Descriptions

:Attacking cost
:Defending cost
:Real passive-defending capability of the administrator
:Real initiative-defending capability of the administrator
:The estimation of administrator’s passive-defense capability at time
:The estimation of administrator’s initiative-defense capability at time
:Resource lifecycle
:The gain function of time for certain attacker
:Profit expectation from time to the end of lifecycle.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key R&D Program of China (no. 2016YFB0800702) and DongGuan Innovative Research Team Program (no. 201636000100038).