Abstract
A network security situation assessment system based on an extended hidden Markov model is designed in this paper. First, the standard hidden Markov model is extended from a five-tuple to a seven-tuple by adding two parameters, network defense efficiency and risk loss vector, so that the model can describe the network security situation more completely. Then, an initialization algorithm for the state transition matrix is defined; observation vectors are extracted by fusing various system security detection data; the network state transition matrix is created and modified using the observation vectors; and a procedure for solving the hidden state probability distribution sequence based on the extended hidden Markov model is derived. Finally, a method for calculating the risk loss vector according to the international definition is designed, the current network risk value is calculated from the hidden state probability distribution, and the global security situation is assessed. The experiment shows that the model meets practical application requirements and that the assessment results are accurate and effective.
1. Introduction
With the widespread use of Internet technology, network security has gradually attracted public attention. Attacks on networks are becoming increasingly complex. Although defense measures based on intrusion detection [1], firewalls [2], virus prevention, and other techniques have been established, the volume of alarm information is so large that it is increasingly difficult to extract useful information and take effective emergency measures. For example, IDS alarm data are enormous, and false alarms and missed detections occur frequently, which makes it difficult to grasp the network security situation. Moreover, these traditional security means focus on solving individual security problems in isolation. How to grasp the current network situation accurately has therefore become a hot topic in the field of Internet security [3]. Network security situation assessment technology considers security elements comprehensively, reflects network states continuously, predicts potential threats accurately, and helps network administrators take effective measures [4].
Situation awareness technology was first used in the military field [5] and is now widely used in aviation, transportation, networking, medical emergency response, and many other areas [6]. In 1988, Endsley first proposed the concept of situation awareness [7]. Bass then proposed the concept of network situation awareness, which covers element extraction, situation understanding, situation assessment, and other aspects, and gave a network situation awareness model [8]. Lakkaraju et al. identified data mining as one of the key technologies for network situation awareness [9]. Elshoush fused the elements extracted by data mining and applied the fused results to intrusion detection, but the huge volume of data made false alarms difficult to avoid [10, 11].
Research on network security situation assessment in China started relatively late. Wei et al. proposed a network security situation assessment model based on information fusion, which used an improved Dempster-Shafer (DS) evidence theory to fuse multisource information [12], but this method is prone to evidence conflicts. Chen et al. proposed a hierarchical quantitative threat evaluation model for network security, which computes threats bottom-up [13], but the method is highly subjective and its accuracy is insufficient. Xi proposed a situation assessment method based on attack graphs, which uses the network topology and attack targets to construct attack paths, but this method is prone to state space explosion [14]. Zhu et al. proposed an evaluation method based on honeynets, which uses honeynets to collect intrusion behavior and plot the situation curve, but this method targets only intrusion behavior and relies on a single data source [15].
An analysis of existing network security situation assessment models in China and abroad shows that several problems remain. First, the state transition matrix is generally obtained from administrators' experience, so it is highly subjective and depends on the administrator's own ability. Second, because two parameters, network defense capability and risk loss, are missing, the hidden state vector sequence that the evaluation model computes from the observation vector sequence is prone to deviation. In this paper, we propose an improved hidden Markov model that extends the five-tuple of the traditional hidden Markov model to a seven-tuple; the new model is called HMMPlus, or HMMP for short. The system fuses a variety of security detection data, extracts the main attack logs from network security equipment to form the observation vector sequence, corrects the state transition matrix according to the real-time state, and forms the hidden state probability distribution sequence with an improved Viterbi algorithm. Finally, it combines the network topology and network asset information with the hidden state probability distribution to calculate the current network risk value and assess the global security situation of the current network, substantially improving the analysis and processing capability of network security products across multiple indicators.
2. Network Security Situation Assessment Technology
A network security situation assessment model [16] describes the factors that affect the network security situation and the relationships between them. Threat sources include hostile network or physical attacks, negligent or intentional human errors, and natural or man-made disasters. Once a threat event occurs, it can lead to unauthorized disclosure, modification, or destruction of information, or to loss of the confidentiality, integrity, and availability of information systems. Risk is therefore a function of the probability that a threat occurs and the harm it causes.
Information system assessment aims to understand the current and future risks of the system, assess potential threats and the extent of harm they may cause, and provide a basis for security decision-making, information system construction, and safe operation. The information system assessment process [17] consists of four main steps: preparing the assessment, executing the assessment, feeding back the evaluation results, and maintaining the assessment, as shown in Figure 1.
3. Network Security Situation Assessment Model Based on HMMP
Before introducing the HMMP model, we first introduce the hidden Markov model (HMM). Let S be the set of all possible hidden states and V the set of all possible observed states:
$$S = \{s_1, s_2, \ldots, s_N\}, \quad V = \{v_1, v_2, \ldots, v_M\},$$
where N is the number of possible hidden states and M is the number of possible observed states.
For a sequence of length T, let I be the state sequence and O the corresponding observation sequence:
$$I = \{i_1, i_2, \ldots, i_T\}, \quad O = \{o_1, o_2, \ldots, o_T\},$$
where i_{t} ∈ S and o_{t} ∈ V.
The HMM has two important assumptions:
(1) Homogeneous Markov chain assumption: the hidden state at any time depends only on the hidden state at the previous time. This keeps the model simple and easy to solve. If the hidden state at time t is i_{t} = s_{i} and the hidden state at time t + 1 is i_{t+1} = s_{j}, then the state transition probability p_{ij} from time t to t + 1 can be expressed as
$$p_{ij} = P(i_{t+1} = s_j \mid i_t = s_i), \quad 1 \le i, j \le N.$$
The probabilities p_{ij} form the state transition matrix P of the Markov chain:
$$P = [p_{ij}]_{N \times N}, \quad \sum_{j=1}^{N} p_{ij} = 1.$$
(2) Observation independence assumption: the observed state at any time depends only on the hidden state at that time. If the hidden state at time t is i_{t} = s_{j} and the corresponding observed state is o_{t} = v_{k}, then the probability that v_{k} is generated under hidden state s_{j} at time t satisfies
$$q_j(k) = P(o_t = v_k \mid i_t = s_j), \quad 1 \le j \le N, \ 1 \le k \le M.$$
The probabilities q_{j}(k) constitute the observation probability matrix Q:
$$Q = [q_j(k)]_{N \times M}.$$
In addition, an initial hidden state probability distribution Π at time t = 1 is needed:
$$\Pi = (\pi_1, \pi_2, \ldots, \pi_N), \quad \pi_i = P(i_1 = s_i), \quad \sum_{i=1}^{N} \pi_i = 1.$$
An HMM is determined by the initial hidden state probability distribution Π, the state transition probability matrix P, and the observation probability matrix Q: Π and P determine the state sequence, and Q determines the observation sequence. The HMM can therefore be represented by the five-tuple
$$\lambda = (S, V, P, Q, \Pi).$$
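The two assumptions can be made concrete with a small generative sketch. The function below is illustrative only: the name `sample_hmm` and the 0-based integer encoding of S and V are assumptions, not part of the model definition. It draws i_1 from Π, then at each step emits o_t from row i_t of Q and moves to i_{t+1} using row i_t of P.

```python
import random
from typing import List, Sequence, Tuple


def sample_hmm(pi: Sequence[float], P: Sequence[Sequence[float]],
               Q: Sequence[Sequence[float]], T: int,
               seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw a hidden state sequence I and an observation sequence O of length T.

    States and observations are 0-based indices into S and V.
    """
    rng = random.Random(seed)
    n_states = len(pi)
    n_obs = len(Q[0])
    hidden: List[int] = []
    observed: List[int] = []
    # i_1 is drawn from the initial distribution Pi
    state = rng.choices(range(n_states), weights=pi, k=1)[0]
    for _ in range(T):
        hidden.append(state)
        # Observation independence: o_t depends only on i_t (row i_t of Q)
        observed.append(rng.choices(range(n_obs), weights=Q[state], k=1)[0])
        # Homogeneous Markov chain: i_{t+1} depends only on i_t (row i_t of P)
        state = rng.choices(range(n_states), weights=P[state], k=1)[0]
    return hidden, observed
```

With deterministic matrices, for example Π = (1, 0), P swapping the two states, and Q the identity, the sampler alternates between states 0 and 1 and emits the matching observation each time.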
3.1. HMMP Model
Extending the traditional HMM to carry out research in related fields is currently an active topic [18, 19]. The standard HMM consists of the five-tuple λ = (S, V, P, Q, Π). In this paper, we expand it to the seven-tuple λ = (S, V, P, Q, Π, F, C), called the HMMPlus model, or HMMP for short, in which two parameters, network defense efficiency and risk loss vector, are added so that the network security situation can be described more completely.
(1) S, the hidden state set, S = {s_{1}, s_{2}, …, s_{N}}, contains all hidden states the system may be in; there are N hidden states. In this paper, according to practical demand, the network hidden states are Safe State G, Probe State P, Attack State A, and Compromise State C, and we set s_{1} = G, s_{2} = P, s_{3} = A, and s_{4} = C.
(i) Safe State G (good) indicates that the host or network is not under attack.
(ii) Probe State P (probed) indicates that the host or network is being probed or scanned.
(iii) Attack State A (attacked) indicates that the host or network is being attacked by one or more attackers.
(iv) Compromise State C (compromised) indicates that the host or network has been compromised.
(2) V, the observation vector set, V = {v_{1}, v_{2}, …, v_{M}}, contains all possible observation vectors; there are M observation vector values.
According to practical demand, the network security equipment logs are divided into the following categories:
(i) Compromise log: an attacker has succeeded and gained administrator privileges.
(ii) Scan log: the system has been scanned.
(iii) Attack log: the system has been attacked.
(iv) No log: no network security equipment log was generated on the network.
(v) Suspicious log: the log could not be classified correctly.
(3) P, the hidden state transition matrix, gives the transition probabilities between the hidden states of the system: P = [p_{ij}]_{N×N}, where p_{ij} = P(i_{t+1} = s_{j} | i_{t} = s_{i}), 1 ≤ i, j ≤ N; i_{t} = s_{i} means the network is in hidden state s_{i} at time t, and i_{t+1} = s_{j} means it is in hidden state s_{j} at time t + 1.
(4) Q, the observation vector probability distribution matrix, Q = [q_{j}(k)]_{N×M}, where q_{j}(k) = P(o_{t} = v_{k} | i_{t} = s_{j}) is the probability of observing network security equipment log v_{k} in hidden state s_{j}.
(5) Π, the initial hidden state probability distribution, Π = (π_{1}, π_{2}, …, π_{N}), where π_{i} = P(i_{1} = s_{i}), 1 ≤ i ≤ N, is the probability that the network is in state s_{i} at the initial moment.
(6) F, the current network defense efficiency, indicates the defense efficiency of the current network state. The higher F is, the better the network defense capability.
(7) C, the risk loss vector, C = (c_{1}, c_{2}, …, c_{N}), where c_{i} is the risk value the system faces when the network is in state s_{i}.
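As a sketch of how the seven-tuple might be held in code, the hypothetical `HMMP` dataclass below stores the seven components and checks the constraints implied by the definitions: P and Q are row-stochastic, Π sums to 1, and C has one risk value per hidden state. The log-type labels and the assumption that F lies in [0, 1] are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List

HIDDEN_STATES = ["G", "P", "A", "C"]  # Safe, Probe, Attack, Compromise
# Hypothetical labels for the five log categories
LOG_TYPES = ["compromise", "scan", "attack", "none", "suspicious"]


def _is_distribution(row: List[float], tol: float = 1e-9) -> bool:
    """True if the entries are non-negative and sum to 1 (within tol)."""
    return all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol


@dataclass
class HMMP:
    """Seven-tuple lambda = (S, V, P, Q, Pi, F, C)."""
    S: List[str]
    V: List[str]
    P: List[List[float]]  # N x N hidden state transition matrix
    Q: List[List[float]]  # N x M observation probability matrix
    Pi: List[float]       # initial hidden state distribution
    F: float              # current defense efficiency (assumed in [0, 1])
    C: List[float]        # risk loss c_i for each hidden state s_i

    def __post_init__(self) -> None:
        n, m = len(self.S), len(self.V)
        if len(self.P) != n or any(len(r) != n or not _is_distribution(r) for r in self.P):
            raise ValueError("P must be an N x N row-stochastic matrix")
        if len(self.Q) != n or any(len(r) != m or not _is_distribution(r) for r in self.Q):
            raise ValueError("Q must be an N x M row-stochastic matrix")
        if len(self.Pi) != n or not _is_distribution(self.Pi):
            raise ValueError("Pi must be a probability distribution over S")
        if not 0.0 <= self.F <= 1.0:
            raise ValueError("F is assumed to lie in [0, 1]")
        if len(self.C) != n:
            raise ValueError("C needs one risk value per hidden state")
```

Validating at construction time means that the later steps (transition matrix modification, state estimation, risk calculation) can assume well-formed inputs.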
3.2. Primary Generation of State Transition Matrix
This section introduces the algorithm for generating the initial hidden state transition matrix. Unlike the traditional approach, which relies on expert experience, it uses game theory to assess the transition relations between hidden states and thereby determine an initial state transition matrix.
The hidden state transition model is shown in Figure 2.
The circles represent system states; according to the definition of this model, there are four hidden states: Safe State G, Probe State P, Attack State A, and Compromise State C. E represents the security events that may occur in the network, and D represents the defense measures in the network. Assume that the current network has defense measure D_{k} and is in hidden state s_{i}. If security event E_{l} occurs in the network at this moment, the network enters hidden state s_{j} at the next time. The process can be expressed as
$$s_i \xrightarrow{\;(E_l,\ D_k)\;} s_j.$$
When the current state is s_{i}, the hidden state transitions can be described by the outcome matrix M_{i} shown in Table 1.
E_{1} to E_{m} denote security events and D_{1} to D_{n} denote network defense measures. An entry s_{j} means that, when the network is in state s_{i} with defense measure D_{1}, the occurrence of security event E_{1} transfers the network to state s_{j}. Similarly, an entry s_{o} means that, when the network is in state s_{i} with defense measure D_{n}, the occurrence of security event E_{1} transfers the network to state s_{o}.
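The distribution of state transitions out of s_{i} can be read directly from the matrix M_{i} in Table 1: the probability p_{ij} of a transition from s_{i} to s_{j} is determined by how often s_{j} appears among the m × n outcomes of M_{i}.
Repeating this for every target state gives the transition vector P_{i} = (p_{i1}, p_{i2}, …, p_{iN}) for the current state s_{i}.
Establishing the transition vector separately for every state of the current network and stacking the vectors gives the initial transition matrix:
$$P^{(0)} = \begin{pmatrix} P_1 \\ P_2 \\ \vdots \\ P_N \end{pmatrix} = [p_{ij}]_{N \times N}.$$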
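A minimal sketch of this construction, assuming every (security event, defense measure) pair in M_i is equally likely, so that p_ij is simply the fraction of cells of M_i equal to s_j; the paper's exact weighting may differ. The function names and the list-of-lists table layout are hypothetical, and with uniform weighting it does not matter whether rows hold defenses or events.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical layout: tables[s_i] is the outcome matrix M_i as a list of rows,
# each cell holding the next hidden state for one (defense, event) pair.
OutcomeTable = Sequence[Sequence[str]]


def transition_row(table: OutcomeTable, states: Sequence[str]) -> List[float]:
    """Transition vector P_i: share of cells in M_i that lead to each state.

    Assumes every (security event, defense measure) pair is equally likely.
    """
    cells = [next_state for row in table for next_state in row]
    counts = Counter(cells)  # missing states count as 0
    return [counts[s] / len(cells) for s in states]


def initial_matrix(tables: Dict[str, OutcomeTable],
                   states: Sequence[str]) -> List[List[float]]:
    """Stack the transition vectors of all hidden states into P^(0)."""
    return [transition_row(tables[s], states) for s in states]
```

For example, a 2 × 2 outcome table for G with cells G, P, G, G gives the row (0.75, 0.25, 0, 0) over (G, P, A, C).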
3.3. Network State Transfer Matrix Modification Based on Defense Efficiency
The defense efficiency of network security equipment reflects the extent to which defense equipment and basic network equipment remain available: when a device is heavily loaded, is attacked, or fails for other reasons, its availability degrades and it can no longer provide sufficient defense or external service. To assess the defense efficiency of network security equipment, the following factors are selected: number of connections, bandwidth utilization, CPU utilization, and memory utilization.
The situation assessment factors above are quantified in the following steps:
(1) Establish the hierarchy structure: by analyzing the relationship between network defense efficiency and CPU utilization, memory utilization, network bandwidth utilization, and the number of connections, the two-layer structure shown in Figure 3 is established.
(2) Construct the judgment matrix and assign values: if network bandwidth utilization is judged more important than the number of connections with a score of 3, then the reciprocal score of the number of connections relative to bandwidth utilization is 1/3 ≈ 0.3333. Memory utilization and CPU utilization are each scored 2 relative to the number of connections. The judgment matrix built this way is shown in Table 2; its data are for illustration only.
(3) Calculate the weights and test consistency: the results obtained with the SPSSAU online analysis tool are shown in Table 3. The weight vector can also be obtained with the arithmetic mean method:
$$w_i = \frac{1}{n} \sum_{j=1}^{n} \frac{a_{ij}}{\sum_{k=1}^{n} a_{kj}}, \quad i = 1, 2, \ldots, n,$$
where a_{ij} is the comparison score in the judgment matrix and n is the order of the judgment matrix (the number of indicators).
With the weights calculated and the judgment matrix passing the consistency test, the overall defense efficiency F of the current network is calculated as
$$F = \sum_{i=1}^{4} w_i f_i(t),$$
where f_{i}(t) is the normalized standard value of indicator i at the current time t.
(4) Within the current period, the average value of each indicator is used to measure the defense efficiency of that period.
(5) Because the probability of a successful attack varies with defense efficiency, the state transition matrix differs under different conditions. A modification vector, whose component j corresponds to hidden state s_{j}, is therefore introduced.
(6) The probability transition matrix is modified according to this modification vector; the modified matrix is also an N × N matrix.
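The weighting and aggregation steps can be sketched as follows. The code computes weights with the arithmetic mean method, checks consistency with the usual consistency ratio CR = CI/RI, where CI = (λ_max − n)/(n − 1) and λ_max is the estimated largest eigenvalue, using Saaty's random index values for orders up to 9, and forms F as a weighted sum of normalized indicators. This is an illustrative alternative to the SPSSAU tool used in the paper; the function names are hypothetical.

```python
from typing import List, Sequence

# Saaty's random consistency index RI, indexed by matrix order n (index 0 unused)
RANDOM_INDEX = [0.0, 0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]


def ahp_weights(A: Sequence[Sequence[float]]) -> List[float]:
    """Arithmetic mean method: normalize each column, then average each row."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [sum(A[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


def consistency_ratio(A: Sequence[Sequence[float]], w: Sequence[float]) -> float:
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(A)
    if n <= 2:
        return 0.0  # reciprocal matrices of order 1 or 2 are always consistent
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def defense_efficiency(weights: Sequence[float], normalized: Sequence[float]) -> float:
    """F = sum_i w_i * f_i(t), with each f_i(t) already normalized."""
    return sum(w * f for w, f in zip(weights, normalized))
```

A judgment matrix is usually accepted when CR < 0.1; otherwise the pairwise scores are revised before the weights are used.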
3.4. Solution of Network Hidden State Sequence Based on Improved Viterbi Algorithm
The Viterbi algorithm is a dynamic programming algorithm usually used to find the hidden state sequence of a hidden Markov model that is most likely to have produced an observed event sequence [20]. In this paper, following the idea of the Viterbi algorithm, the probability distribution of the hidden states of the system at every time step in HMMP is obtained. The solution procedure is as follows.
For the network security situation assessment model λ = (S, V, P, Q, Π, F, C), suppose the observation vector sequence is Y = {y_{1}, y_{2}, …, y_{T}}. Taking the first observation y_{1}, the probability x_{1}(i) of each hidden state at the initial time is calculated as
$$x_1(i) = P(i_1 = s_i \mid y_1, \lambda), \quad 1 \le i \le N,$$
where x_{1}(i) represents the probability that the system is in state s_{i} at time t = 1 (i.e., the initial time), and P(i_{1} = s_{i} | y_{1}, λ) represents the probability that the system is in state s_{i} given that y_{1} is observed from the sequence of observed vectors and the model parameter is λ.
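By the conditional probability (Bayes) formula,
$$P(i_1 = s_i \mid y_1, \lambda) = \frac{P(y_1 \mid i_1 = s_i, \lambda)\, P(i_1 = s_i \mid \lambda)}{P(y_1 \mid \lambda)}. \tag{14}$$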
When the model parameter is λ, the probability of observing y_{1} equals the sum, over all states, of the probability of observing y_{1} in that state multiplied by the probability that the system is in that state (the law of total probability):
$$P(y_1 \mid \lambda) = \sum_{j=1}^{N} P(y_1 \mid i_1 = s_j, \lambda)\, P(i_1 = s_j \mid \lambda). \tag{15}$$
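By formulas (14) and (15) we get
$$x_1(i) = \frac{P(y_1 \mid i_1 = s_i, \lambda)\, P(i_1 = s_i \mid \lambda)}{\sum_{j=1}^{N} P(y_1 \mid i_1 = s_j, \lambda)\, P(i_1 = s_j \mid \lambda)}. \tag{16}$$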
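Substituting the parameters of model λ, namely P(y_{1} | i_{1} = s_{i}, λ) = q_{i}(y_{1}) and P(i_{1} = s_{i} | λ) = π_{i}, where q_{i}(y_{1}) denotes q_{i}(k) for y_{1} = v_{k}, gives
$$x_1(i) = \frac{\pi_i\, q_i(y_1)}{\sum_{j=1}^{N} \pi_j\, q_j(y_1)}. \tag{17}$$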
Formula (17) is then used to calculate the probability of each state at the initial time, giving the initial system probability matrix X_{1} = (x_{1}(1), x_{1}(2), …, x_{1}(N)).
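As a numerical illustration of formula (17), with made-up values that do not come from the paper: let Π = (0.7, 0.1, 0.1, 0.1) over (s_1, s_2, s_3, s_4) = (G, P, A, C), and let the first observation y_1 be a scan log with q_1(y_1) = 0.1, q_2(y_1) = 0.6, q_3(y_1) = 0.2, and q_4(y_1) = 0.1.

```latex
\begin{aligned}
\sum_{j=1}^{4} \pi_j\, q_j(y_1) &= 0.7(0.1) + 0.1(0.6) + 0.1(0.2) + 0.1(0.1) = 0.16,\\
X_1 &= \left(\tfrac{0.07}{0.16},\ \tfrac{0.06}{0.16},\ \tfrac{0.02}{0.16},\ \tfrac{0.01}{0.16}\right)
     = (0.4375,\ 0.375,\ 0.125,\ 0.0625).
\end{aligned}
```

Observing a scan log moves probability mass from G toward P (from 0.1 to 0.375), and the components of X_1 sum to 1 as required.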
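Because exhaustive computation is too expensive, a recursion is used to compute the subsequent state probability vectors. Assume that at time t the system probability matrix is X_{t} = (x_{t}(1), x_{t}(2), …, x_{t}(N)), where x_{t}(i) = P(i_{t} = s_{i} | y_{1}, …, y_{t}, λ) is the probability that the system is in state s_{i} given model λ and the observation vector sequence y_{1}, …, y_{t}. To solve for the system state probability matrix at time t + 1, write
$$x_{t+1}(j) = P(i_{t+1} = s_j \mid y_1, \ldots, y_{t+1}, \lambda),$$
where x_{t+1}(j) is the probability that the system is in state s_{j} at time t + 1 given model λ and the observations y_{1}, …, y_{t+1}. In the same way as for the initial time,
$$x_{t+1}(j) = \frac{P(y_{t+1} \mid i_{t+1} = s_j, \lambda)\, P(i_{t+1} = s_j \mid y_1, \ldots, y_t, \lambda)}{\sum_{k=1}^{N} P(y_{t+1} \mid i_{t+1} = s_k, \lambda)\, P(i_{t+1} = s_k \mid y_1, \ldots, y_t, \lambda)},$$
where P(y_{t+1} | i_{t+1} = s_{j}, λ) is the probability that y_{t+1} is observed when the model is λ and the system is in state s_{j}; by the observation independence assumption this equals q_{j}(y_{t+1}), taken from the observation vector probability distribution matrix Q of model λ. P(i_{t+1} = s_{j} | y_{1}, …, y_{t}, λ) is the probability that the system is in state s_{j} at time t + 1 given the observations up to time t.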
It remains to find the probability that the system is in state s_{j} at time t + 1. By the definition of the state transition matrix, this equals the sum, over all states s_{i}, of the probability that the system is in s_{i} at time t multiplied by the probability of a transition from s_{i} to s_{j}:
$$P(i_{t+1} = s_j \mid y_1, \ldots, y_t, \lambda) = \sum_{i=1}^{N} P(i_t = s_i \mid y_1, \ldots, y_t, \lambda)\, P(i_{t+1} = s_j \mid i_t = s_i, \lambda),$$
where P(i_{t} = s_{i} | y_{1}, …, y_{t}, λ) is the probability that the system is in state s_{i} given model λ and the observation vector sequence y_{1}, …, y_{t}, which is the previously assumed x_{t}(i), and P(i_{t+1} = s_{j} | i_{t} = s_{i}, λ) = p_{ij} is the probability that the system moves to state s_{j} at the next time given model λ and current state s_{i}. Hence
$$P(i_{t+1} = s_j \mid y_1, \ldots, y_t, \lambda) = \sum_{i=1}^{N} x_t(i)\, p_{ij}.$$
For notational convenience, let
$$\hat{x}_{t+1}(j) = \sum_{i=1}^{N} x_t(i)\, p_{ij}.$$
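Substituting this into the expression for x_{t+1}(j) gives
$$x_{t+1}(j) = \frac{q_j(y_{t+1})\, \hat{x}_{t+1}(j)}{\sum_{k=1}^{N} q_k(y_{t+1})\, \hat{x}_{t+1}(k)}.$$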
The system state probability distribution at time t + 1 is therefore
$$X_{t+1} = (x_{t+1}(1), x_{t+1}(2), \ldots, x_{t+1}(N)).$$
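Collecting the results above gives
$$x_1(i) = \frac{\pi_i\, q_i(y_1)}{\sum_{j=1}^{N} \pi_j\, q_j(y_1)}, \qquad x_{t+1}(j) = \frac{q_j(y_{t+1}) \sum_{i=1}^{N} x_t(i)\, p_{ij}}{\sum_{k=1}^{N} q_k(y_{t+1}) \sum_{i=1}^{N} x_t(i)\, p_{ik}}, \quad 1 \le t \le T - 1,$$
and the hidden state probability distribution sequence is X = {X_{1}, X_{2}, …, X_{T}}.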
3.5. Algorithm Pseudocode Description
From the previous section, the steps for solving the hidden state probability distribution of the system in HMMP are as follows:
(1) Check whether all observation vectors have been read; if so, jump to step 8; otherwise, go to step 2.
(2) Obtain the current observation vector y_{t} and check whether the current time is the initial time; if so, go to step 3; otherwise, go to step 5.
(3) Calculate the conditional probability of each hidden state given y_{1} by formula (17), $x_1(i) = \pi_i q_i(y_1) / \sum_{j} \pi_j q_j(y_1)$.
(4) Arrange the conditional probabilities of the hidden states in order into the vector X_{1} and append it to the hidden state probability distribution sequence X; then return to step 1.
(5) Calculate the hidden state probability distribution without considering the current observation vector, $\hat{x}_t(j) = \sum_{i} x_{t-1}(i)\, p_{ij}$.
(6) Calculate the conditional probability of each hidden state given y_{t}, $x_t(j) = q_j(y_t)\, \hat{x}_t(j) / \sum_{k} q_k(y_t)\, \hat{x}_t(k)$.
(7) Arrange the conditional probabilities of the hidden states in order into the vector X_{t} and append it to the hidden state probability distribution sequence X; then return to step 1.
(8) Output the hidden state probability distribution sequence X and end the program.
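Steps (1) to (8) can be sketched in Python as below. The recursion has the same form as the normalized forward (filtering) recursion of a standard HMM. The function name, the 0-based encoding of states and log types, and the error raised for an observation with zero probability are assumptions of this sketch; P is whatever transition matrix the caller supplies.

```python
from typing import List, Sequence


def hidden_state_sequence(pi: Sequence[float],
                          P: Sequence[Sequence[float]],
                          Q: Sequence[Sequence[float]],
                          ys: Sequence[int]) -> List[List[float]]:
    """Return X = [X_1, ..., X_T] for observation indices ys (0-based into V)."""
    n = len(pi)
    X: List[List[float]] = []
    for t, y in enumerate(ys):  # steps (1)-(2): read the next observation y_t
        if t == 0:
            # step (3): unnormalized pi_i * q_i(y_1), numerator of formula (17)
            scores = [pi[i] * Q[i][y] for i in range(n)]
        else:
            prev = X[-1]
            # step (5): predicted distribution x_hat_t(j) = sum_i x_{t-1}(i) p_ij
            pred = [sum(prev[i] * P[i][j] for i in range(n)) for j in range(n)]
            # step (6): weight each state by q_j(y_t)
            scores = [Q[j][y] * pred[j] for j in range(n)]
        total = sum(scores)
        if total == 0.0:
            raise ValueError(f"observation {y} has zero probability at t={t + 1}")
        # steps (4)/(7): normalize and append X_t to the sequence X
        X.append([s / total for s in scores])
    return X  # step (8)
```

Normalizing at every step keeps each X_t a proper distribution and avoids numerical underflow on long sequences, such as a full day of 5-minute periods.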
The pseudocode of Algorithm 1 is as follows:

3.6. Calculation of Risk Loss Vector
According to the national standard definition, risk is a function of the possibility that the system is attacked and the degree of loss when the system is attacked. The previous section calculated the hidden state probability distribution vector sequence, which gives the probability that the system is in each state, that is, the likelihood of attack. The following sections calculate how much loss the system incurs in each state, which is called the risk loss vector.
3.6.1. Classification of Severity Levels
The risk loss vector measures the degree of loss the system suffers in each state.
The first step is asset evaluation.
The three security attributes considered in asset evaluation are confidentiality, integrity, and availability. According to the national standard GB/T 20984, assets are classified into five levels; the more important an asset is, the higher its severity level.
The second step is severity level classification.
Most network equipment now uses syslog-format logs, and syslog divides severity into eight levels, as shown in Table 4.
3.6.2. The Current Network State Assessment
The risk that the current system faces is calculated from the hidden state probability distribution sequence X and the loss the system suffers in each state. The risk value of the current network state is
$$R_t = \sum_{i=1}^{N} x_t(i)\, c_i,$$
where x_{t}(i) is the probability that the system is in state s_{i} at time t and c_{i} is the risk the system faces in state s_{i}.
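The risk value is a dot product between each X_t and the risk loss vector C. A minimal sketch, with a hypothetical function name; the values of C used in the test are illustrative only, since the paper derives C from asset importance and security event losses.

```python
from typing import List, Sequence


def risk_series(X: Sequence[Sequence[float]], C: Sequence[float]) -> List[float]:
    """R_t = sum_i x_t(i) * c_i for every time step t."""
    if any(len(x) != len(C) for x in X):
        raise ValueError("each X_t needs one probability per hidden state")
    return [sum(p * c for p, c in zip(x, C)) for x in X]
```

Plotting the resulting series over time gives a network situation curve whose peaks mark the periods of highest expected loss.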
4. Case Study
To verify the rationality of the proposed evaluation method, a quantitative evaluation of the network security situation was carried out using IDS data collected from a department in a real environment on November 3, 2015. The network topology is shown in Figure 4. The experimental network is connected to the Internet through a router, and a firewall, an intrusion prevention system, and other security defense systems are deployed; the local area network includes a business office area and a server area. The experimental data include security event alarm records from the firewall, the intrusion detection system, and the server hosts, as well as records of the efficiency indicators.
First, the data from devices such as the IPS are imported into the database; Table 5 shows part of the original IPS data.
The data are then organized into a standard weblog format, and the required fields (time, attack name, source IP, source port, destination port, and attack level) are extracted, as shown in Table 6.
The test uses 5-minute time intervals. The main alarm in each interval is determined with the alarm importance calculation method, as follows.
There are 43 alarm logs from 0:00 to 0:05 on 2015/11/3. Forty-two of them are SMTP email attachment vulnerability alarms, whose attack level is prompt and which occur for the first time; one is a Web application: SQL injection attack alarm, whose attack level is prompt and which also occurs for the first time. The importance of the SMTP email attachment vulnerability is therefore 42 × 1 + 42 × 25 + 1 × 25 = 1117, and the importance of the SQL injection attack is 1 × 1 + 1 × 25 + 1 × 25 = 51. The main alarm in this period is thus the SMTP email attachment vulnerability.
There are 56 alarm logs from 0:05 to 0:10 on 2015/11/3. Fifty-one of them are SMTP email attachment vulnerability alarms, whose attack level is prompt and which also occurred in the previous period; five are Web application: SQL injection attack alarms, whose attack level is prompt and which occur for the first time. The importance of the SMTP email attachment vulnerability in this period is therefore 51 × 1 + 51 × 1 + 1 × 1 = 103, and the importance of the SQL injection attack is 5 × 1 + 5 × 25 + 1 × 25 = 155. The main alarm in this period is thus Web application: SQL injection attack.
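The arithmetic of the two worked periods can be checked mechanically. The sketch below only reproduces the stated sums of count × weight products and selects the alarm with the largest importance; the meaning of each weight (attack level, first occurrence) is defined by the alarm importance calculation method and is not reconstructed here. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Terms = List[Tuple[int, int]]  # (count, weight) pairs, as written in the text


def importance(terms: Terms) -> int:
    """Sum of count x weight products."""
    return sum(count * weight for count, weight in terms)


def main_alarm(period: Dict[str, Terms]) -> Tuple[str, int]:
    """Return (alarm name, importance) of the most important alarm in a period."""
    scores = {name: importance(terms) for name, terms in period.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


period_1 = {  # 0:00 to 0:05
    "SMTP email attachment vulnerability": [(42, 1), (42, 25), (1, 25)],
    "Web application: SQL injection attack": [(1, 1), (1, 25), (1, 25)],
}
period_2 = {  # 0:05 to 0:10
    "SMTP email attachment vulnerability": [(51, 1), (51, 1), (1, 1)],
    "Web application: SQL injection attack": [(5, 1), (5, 25), (1, 25)],
}
```

`main_alarm(period_1)` selects the SMTP alarm with importance 1117, and `main_alarm(period_2)` selects the SQL injection alarm with importance 155, matching the text.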
The main alarm of each of the 288 five-minute periods in the day (24 × 60 / 5 = 288) is obtained in turn, forming the alarm vector sequence shown in Table 7.
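A sketch of the bucketing that produces the 288 periods, assuming half-open windows [0:00, 0:05), [0:05, 0:10), and so on; the text does not say which window a boundary timestamp belongs to. The names are hypothetical, and each bucket's counts would then feed the importance calculation above.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

WINDOW = timedelta(minutes=5)
PERIODS_PER_DAY = timedelta(days=1) // WINDOW  # 288


def window_index(ts: datetime, day_start: datetime) -> int:
    """0-based index of the 5-minute window containing ts."""
    offset = ts - day_start
    if not timedelta(0) <= offset < timedelta(days=1):
        raise ValueError("timestamp outside the analysed day")
    return offset // WINDOW


def alarms_per_window(records: Iterable[Tuple[datetime, str]],
                      day_start: datetime) -> Dict[int, Counter]:
    """Count alarms by name within each window."""
    buckets: Dict[int, Counter] = defaultdict(Counter)
    for ts, name in records:
        buckets[window_index(ts, day_start)][name] += 1
    return buckets
```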
The next step is to build the Markov model. First, the initial state transition matrix is constructed from the security events and the defense strategies.
The main efficiency indicators for this period, namely CPU utilization, memory utilization, bandwidth utilization, and connection utilization, are extracted from the firewall, the IPS, and the hosts, as shown in Figures 5–8, respectively.
The weight of each indicator is then determined with the Analytic Hierarchy Process (AHP) as follows.
CPU utilization is as important as memory utilization. Because the main task of the network is to provide external services, network bandwidth utilization and the number of connections are more important than CPU utilization and memory utilization. The department prohibits high-traffic operations but allows many users to access services at the same time, so the number of connections is slightly more important than bandwidth utilization. The resulting judgment matrix is shown in Table 8.
Processing this matrix yields CR = 0.0571 < 0.1, which satisfies the consistency criterion. The weights are shown in Table 9.
Combining the weighted indicators yields the overall defense efficiency curve shown in Figure 9.
The resulting hidden state probability distribution curve is shown in Figure 10.
The network risk loss vector is then obtained in a similar way.
The asset importance attributes, rated for confidentiality, integrity, and availability according to the national standard, are shown in Table 10.
Combining these attributes with the losses caused by security events yields the risk loss vectors.
Finally, the network situation curve is obtained, as shown in Figure 11.
Figure 11 shows that in the 18 min to 20 min period, the number of host connections is almost saturated, the host can no longer provide services, and it is effectively in the compromised state. At this point the network risk value also reaches its highest level, which is consistent with the actual situation. From the security event information and the network situation diagram above, network administrators can clearly understand the global network security events occurring at that time and control the network situation in real time.
5. Conclusions
This paper studies network situation assessment technology based on HMMP, with the aim of enabling network administrators to grasp the global network state in real time when facing multisource logs. To achieve this, a method for generating the state transition matrix in HMMP is designed; with this method, the system automatically modifies the state transition matrix in real time according to the network state and obtains the hidden state probability distribution sequence of the current system with the improved Viterbi algorithm. Finally, the network risk value is obtained with the risk loss vector calculation method. The experiment shows that the HMMP-based security assessment proposed in this paper can describe the current network state precisely and comprehensively.
Data Availability
The data set can be obtained free of charge from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (nos. 61202458 and 61403109), the Natural Science Foundation of Heilongjiang Province of China (no. F2017021), and the Harbin Science and Technology Innovation Research Funds (no. 2016RAQXJ036).