Abstract

The biggest difficulty in applying the hidden Markov model to multistep attacks is the determination of observations. Research on determining observations is still scarce and shows a certain degree of subjectivity. In this regard, we integrate attack intentions with the hidden Markov model (HMM) and propose a method for forecasting multistep attacks based on the HMM. Firstly, we train the given hidden Markov model(s) with the Baum-Welch algorithm of the HMM. Then we recognize the alerts belonging to attack scenarios with the Forward algorithm of the HMM. Finally, we forecast the next possible attack sequence with the Viterbi algorithm of the HMM. The results of simulation experiments show that the trained hidden Markov models outperform the untrained ones in recognition and prediction.

1. Introduction

Currently, the network security situation is increasingly sophisticated, and the multistep network attack has become the mainstream form of network attack. The 2012 Chinese Internet network security report released by the National Computer Network Emergency Response Technical Team Coordination Center of China (CNCERT/CC) shows that two typical multistep attacks, worms and distributed denial of service (DDoS) [1], account for 60% of all network attacks. A multistep attack [2] applies multiple attack steps to exploit the security holes of the target and deliver a devastating blow to it. The attack steps of a multistep attack have three features. (1) There is a causal relationship between the attack steps. (2) The attack steps follow a time sequence [3]. (3) The attack steps are uncertain [4].

The multistep attack is one of the main forms of network attack behavior; recognizing and predicting multistep attacks lays the foundation for active defense and remains a research hot spot. The literature on applying hidden Markov models to detect multistep network attacks proposed a method for recognizing multistep attacks based on the hidden Markov model.

The literature on improving the quality of alerts and predicting the intruder's next goal with a hidden colored Petri-net introduced the concept of the attack "observation," but both works remained at the level of specific attack behaviors, which has certain limitations. Current research on forecasting multistep attack behaviors falls mainly into four types. (1) Forecasting based on the antecedents and consequences of the attack [5]: this approach applies the precursor-successor relationships between events to forecast the attacks the attacker intends to launch in the near future. Because of the complexity and diversity of attack behaviors, this approach is difficult to realize. (2) Forecasting based on hierarchical colored Petri-nets (HCPN): this approach processes the raw alerts with Petri-nets and infers the attack intention from them [4], but it focuses on intrusion detection of multistep attack behaviors. (3) Forecasting based on Bayesian game theory: this approach can rationally forecast the probability that the attacker chooses to attack and the probability that the defender chooses to defend in the next stage [6, 7]. However, current studies establish only a two-person game model, so this approach has some limitations. (4) Forecasting based on attack intention [3, 8]: this approach uses an extended directed graph to describe the logical relationships between attack behaviors and forecasts the next stage from these relationships. Its shortcoming is that the matching degree of the multistep attack is difficult to determine. Moreover, recognizing and forecasting multistep attacks still involves a certain degree of subjectivity. In this regard, we integrate attack intentions with the hidden Markov model and propose a method for forecasting multistep attacks based on the HMM. Firstly, we train the given hidden Markov model(s) with the Baum-Welch algorithm of the HMM. Then we recognize the alerts belonging to attack scenarios with the Forward algorithm of the HMM. Finally, we forecast the next possible attack sequence with the Viterbi algorithm of the HMM. Simulation results show that the trained hidden Markov models outperform the untrained ones in recognition and prediction.

2. Hidden Markov Model

The hidden Markov model was first proposed by Baum and Petrie in 1966. It is a statistical model used to describe a Markov process with hidden parameters [9]. The research object of this model is a data sequence; each value in this sequence is called an observation. The hidden Markov model assumes that another sequence hides behind this data sequence and consists of a series of states. Each observation is emitted in some state; the state cannot be observed directly, and its features can only be inferred from the observations.

A complete hidden Markov model (HMM) is usually represented by a triple $\lambda = (A, B, \pi)$, which involves the following five elements:
(1) a finite set of states, represented by $S = \{s_1, s_2, \ldots, s_N\}$, where $N$ is the number of states and, at time $t$, the state is denoted by $q_t$;
(2) the set of observations, represented by $V = \{v_1, v_2, \ldots, v_M\}$, where $M$ is the number of distinct observations;
(3) the state transition matrix $A = [a_{ij}]_{N \times N}$, where $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$ and $1 \le i, j \le N$;
(4) the observation probability matrix $B = [b_j(k)]_{N \times M}$, where $b_j(k) = P(o_t = v_k \mid q_t = s_j)$, $1 \le j \le N$, and $1 \le k \le M$;
(5) the initial state probability distribution $\pi = \{\pi_i\}$, where $\pi_i = P(q_1 = s_i)$ and $1 \le i \le N$.
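To make the notation concrete, the following minimal sketch shows one way to hold the triple $\lambda = (A, B, \pi)$ in memory with NumPy; the uniform values, dimensions, and dictionary layout are illustrative assumptions of ours, not the paper's DDoS or FTP Bounce models.

import numpy as np

# A minimal sketch of the triple lambda = (A, B, pi); the dimensions below are
# illustrative placeholders, not the paper's models.
N = 3                          # number of hidden states (attack intentions)
M = 4                          # number of distinct observations (alert types)

A = np.full((N, N), 1.0 / N)   # A[i, j] = P(q_{t+1} = s_j | q_t = s_i)
B = np.full((N, M), 1.0 / M)   # B[j, k] = P(o_t = v_k | q_t = s_j)
pi = np.full(N, 1.0 / N)       # pi[i]  = P(q_1 = s_i)

hmm = {"A": A, "B": B, "pi": pi}   # layout reused by the later sketches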

The model for recognizing and forecasting multistep attacks based on the hidden Markov model is shown in Figure 1.

There are three classical problems that the hidden Markov model solves well.

(1) Probability Calculation Problems. Calculate the probability $P(O \mid \lambda)$ of the observation sequence $O = (o_1, o_2, \ldots, o_T)$ under a given hidden Markov model $\lambda = (A, B, \pi)$.

(2) Learning Problems. Estimate the parameters of $\lambda = (A, B, \pi)$ when the observation sequence $O$ is known, so as to maximize the probability $P(O \mid \lambda)$.

(3) Prediction Problems. Find the state sequence $Q = (q_1, q_2, \ldots, q_T)$ with the maximum probability, given the hidden Markov model $\lambda$ and the observation sequence $O$.

The correspondence between the problems and the algorithms of the hidden Markov model is shown in Figure 2.

The hidden Markov model is usually used to deal with problems related to time sequences and has been widely used in speech recognition, signal processing, bioinformatics, and other fields. Based on the characteristics of the attack steps of a multistep attack and the problems that the hidden Markov model can solve, we apply the hidden Markov model to the field of recognizing and forecasting multistep attacks. Firstly, the improved Baum-Welch algorithm is used to train the hidden Markov model $\lambda$, and we obtain a new hidden Markov model $\lambda'$. Then we recognize the alerts belonging to attack scenarios with the Forward algorithm of the hidden Markov model. Finally, we forecast the next possible attack sequence with the Viterbi algorithm of the hidden Markov model.

3. The Approach to Recognizing and Forecasting Multistep Attack

The steps of the approach to recognizing and forecasting multistep attack are as follows.

Step 1. Obtain the initial state matrix $\pi$ (old), the state transition matrix $A$ (old), and the observation matrix $B$ (old) of the HMM $\lambda = (A, B, \pi)$.

Step 2. Use the improved Baum-Welch algorithm to train the initial state matrix $\pi$ (old) and the observation matrix $B$ (old); we obtain an initial state matrix $\pi'$ (new), an observation matrix $B'$ (new), and a new HMM $\lambda' = (A, B', \pi')$.

Step 3. Recognize the alert belonging to attack scenarios with the Forward algorithm.

Step 4. Forecast the next possible attack sequence with the Viterbi algorithm.

The flow chart is shown in Figure 3.
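Before detailing the individual algorithms, the following sketch shows how Steps 1 to 4 might be wired together in code; all helper functions are hedged sketches given later (Sections 3.1 to 3.3 and 4.3), the dictionary-based model layout is the assumption introduced in Section 2, and the training and alert inputs are assumed to be integer-coded sequences.

import numpy as np

# Illustrative driver for Steps 1-4; baum_welch, forward, viterbi, and
# predict_next_intent are the sketches given in later sections.
def recognize_and_forecast(models, training_alerts, alert_sequence):
    # Steps 1-2: train each given HMM (pi, A, B) with Baum-Welch
    trained = [baum_welch(m, training_alerts) for m in models]

    # Step 3: score the incoming alert sequence against every trained HMM (Forward)
    scores = [forward(m, alert_sequence) for m in trained]
    best = int(np.argmax(scores))          # most likely multistep attack

    # Step 4: decode the completed intent sequence and forecast the next intent (Viterbi)
    intents = viterbi(trained[best], alert_sequence)
    return best, intents, predict_next_intent(trained[best], intents)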

3.1. Introduction to the Baum-Welch Algorithm

When applying the hidden Markov model to multistep attacks, the biggest problem is determining the observations of the HMM. Better parameters improve the efficiency of the calculation, whereas an improper selection of observations may lead to a longer training time or even prevent the training from completing. In this regard, we apply the Baum-Welch algorithm to train the given hidden Markov model. From the literature on an accurate Baum-Welch algorithm free from overflow, we learn that the Baum-Welch algorithm is the most reliable algorithm for training an HMM. The Baum-Welch algorithm trains the given hidden Markov model ($\lambda$) on an observation sequence and generates a new hidden Markov model ($\lambda'$) for detection.

The steps of the Baum-Welch algorithm are given in Algorithm 1.

Input: alert sequence
     $O = (o_1, o_2, \ldots, o_T)$;
Output: the parameters of the hidden Markov model
    $\lambda = (A, B, \pi)$.
Step 1. Initialization.
    For $n = 0$, select $a_{ij}^{(0)}$, $b_j(k)^{(0)}$, $\pi_i^{(0)}$; we obtain the initial model $\lambda^{(0)} = (A^{(0)}, B^{(0)}, \pi^{(0)})$.
Step 2. Iterative calculation.
for $n = 1, 2, \ldots$,
$a_{ij}^{(n+1)} = \sum_{t=1}^{T-1} \xi_t(i, j) \big/ \sum_{t=1}^{T-1} \gamma_t(i)$;
$b_j(k)^{(n+1)} = \sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j) \big/ \sum_{t=1}^{T} \gamma_t(j)$;
$\pi_i^{(n+1)} = \gamma_1(i)$.
where $\gamma_t(i) = P(q_t = s_i \mid O, \lambda^{(n)})$;
     $\xi_t(i, j) = P(q_t = s_i, q_{t+1} = s_j \mid O, \lambda^{(n)})$.
Step 3. Termination. We obtain the parameters of the hidden Markov model
$\lambda^{(n+1)} = (A^{(n+1)}, B^{(n+1)}, \pi^{(n+1)})$.
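As a concrete illustration of Algorithm 1, the following NumPy sketch implements the standard textbook Baum-Welch re-estimation (the paper's "improved" variant is not specified beyond the listing above, so it is not reproduced here). The dictionary layout, the function name, and the integer coding of alerts are assumptions of ours; numerical scaling against underflow is omitted for brevity.

import numpy as np

def baum_welch(hmm, obs, n_iter=20):
    # Sketch of the standard Baum-Welch re-estimation (Algorithm 1); "hmm" is a
    # dict {"A", "B", "pi"} as in the earlier sketch and "obs" is a sequence of
    # integer-coded alerts. This is a sketch, not the paper's exact code.
    A, B, pi = hmm["A"].copy(), hmm["B"].copy(), hmm["pi"].copy()
    obs = np.asarray(obs)
    N, T = A.shape[0], len(obs)

    for _ in range(n_iter):
        # Forward variables: alpha[t, i] = P(o_1 ... o_t, q_t = s_i | lambda)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

        # Backward variables: beta[t, i] = P(o_{t+1} ... o_T | q_t = s_i, lambda)
        beta = np.zeros((T, N))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

        # gamma[t, i] = P(q_t = s_i | O, lambda)
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)

        # xi[t, i, j] = P(q_t = s_i, q_{t+1} = s_j | O, lambda)
        xi = np.zeros((T - 1, N, N))
        for t in range(T - 1):
            num = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
            xi[t] = num / num.sum()

        # Re-estimation step of Algorithm 1
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(B.shape[1]):
            B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)

    return {"A": A, "B": B, "pi": pi}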

3.2. Forward Algorithm

The pseudocode of the Forward algorithm is given in Algorithm 2.

Forward_Algorithm($\lambda$, $O$):
Input: (1) alert sequence $O = (o_1, o_2, \ldots, o_T)$;
    (2) hidden Markov model (HMM) $\lambda = (A, B, \pi)$.
Output: the probability $P(O \mid \lambda)$ that the alert sequence is generated by the hidden Markov model.
Begin:
  (1) for $i = 1$ to $N$.
   // $N$ is the number of attack intentions.
   calculate the probability of $o_1$ generated by state $s_i$:
   $\alpha_1(i) = \pi_i b_i(o_1)$.
  (2) calculate the probability of the partial alert sequence $o_1 o_2 \cdots o_{t+1}$ and state $s_j$.
   (a) at time $t$, the probability of the partial alert sequence $o_1 o_2 \cdots o_t$ and state $s_i$ is $\alpha_t(i)$.
   (b) at time $t + 1$, calculate the probability of the partial alert sequence generated by the hidden Markov
   model (HMM):
    $\alpha_{t+1}(j) = \bigl[\sum_{i=1}^{N} \alpha_t(i) a_{ij}\bigr] b_j(o_{t+1})$, where $t = 1, 2, \ldots, T - 1$ and $1 \le j \le N$.
  (3) calculate the probability of the alert sequence generated by the hidden Markov
   model (HMM):
    $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$.
  (4) Return $P(O \mid \lambda)$.
End;
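A compact NumPy sketch of Algorithm 2 follows, under the same assumed model layout as the Baum-Welch sketch above (a dict of A, B, pi and integer-coded alerts); the function name is ours.

import numpy as np

def forward(hmm, obs):
    # Sketch of Algorithm 2: returns P(O | lambda) for the integer-coded alert
    # sequence "obs" under hmm = {"A", "B", "pi"}.
    A, B, pi = hmm["A"], hmm["B"], hmm["pi"]
    alpha = pi * B[:, obs[0]]           # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # alpha_{t+1}(j) = (sum_i alpha_t(i) a_ij) * b_j(o_{t+1})
    return alpha.sum()                  # P(O | lambda) = sum_i alpha_T(i)

Recognition then reduces to evaluating this likelihood for every candidate HMM and selecting the model with the largest value, as described next.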

Recognizing a multistep attack is mainly based on the alert sequence. First, we calculate the probability $P(O \mid \lambda)$ of the alert sequence being generated by each given HMM. Then we decide that the attack whose model yields the maximum $P(O \mid \lambda)$ is likely to be the ongoing attack. The structure of recognizing multistep attacks with the Forward algorithm is shown in Figure 4.

3.3. Viterbi Algorithm

The pseudocode of the Viterbi algorithm is given in Algorithm 3.

   Viterbi_Algorithm($O$):
   Input: alert sequence $O = (o_1, o_2, \ldots, o_T)$;
   Output: (1) intent sequence $Q^{*} = (q_1^{*}, q_2^{*}, \ldots, q_T^{*})$;
      (2) the completed intent sequence and the next likely intent.
Begin:
  for $i = 1$ to HMM_num
  // HMM_num is the number of hidden Markov model(s)
  {
    Prob[$i$] = Forward_Algorithm(hmm_$i$, $O$);
    // calculate the probability of the alert sequence generated by each hidden Markov
    // model
  }
  Most_likely_multistep_attack = hmm_$k$, where $k = \arg\max_i$ Prob[$i$];
  $Q^{*}$ = Viterbi_Algorithm(hmm_$k$, $O$);
  // $Q^{*}$ is the completed intent sequence decoded under hmm_$k$
  // hmm_$k$ is the model with the maximum Prob among the hmm_$i$
  predict the next likely intent $q_{T+1}^{*}$ from the transition probabilities of hmm_$k$ out of the last completed intent $q_T^{*}$;
  // the next likely intent
End;
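The inner call Viterbi_Algorithm(hmm_$k$, $O$) is the standard Viterbi decoding; a NumPy sketch under the same assumed model layout is given below (the function name and back-pointer bookkeeping are ours).

import numpy as np

def viterbi(hmm, obs):
    # Sketch of the standard Viterbi decoding invoked in Algorithm 3: returns
    # the most likely intent (state) sequence for the alert sequence "obs".
    A, B, pi = hmm["A"], hmm["B"], hmm["pi"]
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))               # best path probability ending in each state
    psi = np.zeros((T, N), dtype=int)      # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A  # delta_{t-1}(i) * a_ij
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    # backtrack the optimal intent sequence
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]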

Predicting multistep attack behavior mainly means determining the intentions that the attacker has already completed and forecasting the next possible attack intention. The structure of forecasting multistep attacks with the Viterbi algorithm is shown in Figure 5.

4. The Simulation Experiment and Analysis

4.1. Baum-Welch Algorithm: Train the Given HMM(s)

Based on the literature on forecasting multistep attacks with a fuzzy hidden Markov model, we obtain the initial state matrix, the state transition matrix, and the observation matrix of DDoS_HMM, as shown in Tables 1, 2, and 3.

The data set used in the simulation experiment is the attack scenario testing data set LLDOS1.0 (inside) provided by DARPA (Defense Advanced Research Projects Agency) in 2000. We extract two kinds of multistep attack from it: the DDoS multistep attack and the FTP Bounce multistep attack. Since the state transition matrix is obtained entirely by statistical calculation on the data, we only train the initial state matrix and the observation matrix of the HMM. The observation matrix clearly contains a large number of zeros and is therefore sparse, so we train the matrices by blocks. We suppose that the number of observation sequences is $K$ and the length of each sequence is 32, where $K$ multiplied by 32 equals the number of training data, and there is no corresponding state sequence. In this way, we obtain the initial state matrix (new) and the observation matrix (new) of DDoS_HMM' ($\lambda' = (A, B', \pi')$), as shown in Tables 4 and 5.
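The following sketch illustrates this training set-up under the assumptions of the earlier Baum-Welch sketch: the alert stream is cut into observation sequences of length 32, and only $\pi$ and $B$ are re-estimated while the statistically computed $A$ is kept fixed. The sequential re-estimation over the sequences is a simple approximation of ours; the paper's exact block-wise update of the sparse observation matrix is not reproduced here.

SEQ_LEN = 32   # length of each observation sequence, as stated above

def train_ddos_hmm(hmm, alert_stream, n_rounds=5):
    # Cut the training alerts into K sequences of length 32 (K * 32 training
    # alerts) and re-estimate only pi and B, keeping A fixed; baum_welch is
    # the sketch from Section 3.1.
    A_fixed = hmm["A"].copy()
    sequences = [alert_stream[i:i + SEQ_LEN]
                 for i in range(0, len(alert_stream) - SEQ_LEN + 1, SEQ_LEN)]
    model = hmm
    for _ in range(n_rounds):
        for seq in sequences:
            model = baum_welch(model, seq, n_iter=1)
            model["A"] = A_fixed            # A is not trained
    return model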

4.2. Forward Algorithm: Recognize the Alert Belonging to Attack Scenarios

The attack intentions and alerts of DDoS_HMM and FTP Bounce_HMM are shown in Tables 6 and 7, respectively.

When the first two alerts of the sequence were received, according to the Forward algorithm of the hidden Markov model, we obtain the probability of the alert sequence under DDoS_HMM' and FTP Bounce_HMM', respectively: $P(O \mid \text{DDoS\_HMM}') = 0.2989$ and $P(O \mid \text{FTP Bounce\_HMM}') = 0.0036$.

We can see from the above results that $P(O \mid \text{DDoS\_HMM}') > P(O \mid \text{FTP Bounce\_HMM}')$. That is to say, the ongoing multistep attack behavior is likely to be the DDoS attack.
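For illustration, the decision rule applied to the two likelihoods reported above can be written as follows; the variable names are ours, and the numbers are the values given in the text.

# Choosing the ongoing attack from the Forward-algorithm scores reported above.
scores = {"DDoS_HMM'": 0.2989, "FTP Bounce_HMM'": 0.0036}   # P(O | lambda')
ongoing = max(scores, key=scores.get)                        # model with the largest likelihood
print(ongoing)                                               # DDoS_HMM', i.e. the DDoS attack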

4.3. Viterbi Algorithm: Forecast the Next Possible Attack Sequence

When the alert sequence was received by the console, we obtain the completed intent sequence with the Viterbi algorithm. That is to say, the completed intentions are the first four attack intentions of the DDoS scenario, and the next intention is forecast to be the following intention in the scenario.
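The paper does not spell out the selection rule for the next intention beyond Figure 5; one plausible realization, consistent with the HMM formulation, is to take the most probable successor of the last completed intention under the trained transition matrix. A hedged sketch (function name ours, reusing the viterbi sketch above) is given below.

import numpy as np

def predict_next_intent(hmm, completed_intents):
    # completed_intents: the intent sequence decoded by the viterbi sketch above
    last = completed_intents[-1]
    # most probable successor of the last completed intention under A
    return int(np.argmax(hmm["A"][last]))

# e.g. completed = viterbi(ddos_hmm, alerts); nxt = predict_next_intent(ddos_hmm, completed)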

4.4. Comparison of Results

We compare the results of the untrained HMM(s) with those of the HMM(s) trained by the Baum-Welch algorithm; the comparison is shown in Table 8.

5. Conclusion

The biggest difficulty in applying the hidden Markov model to multistep attacks is the determination of observations. Research on determining observations is still scarce and shows a certain degree of subjectivity. In this regard, we train the given hidden Markov model(s) with the Baum-Welch algorithm of the HMM on several groups of observation sequences and obtain a new hidden Markov model that is more objective. Simulation results show that the trained hidden Markov models outperform the untrained ones in recognition and prediction.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the reviewers for their detailed reviews and constructive comments, which have helped in improving the quality of this paper. This work was supported by the National Natural Science Foundation of China no. 60573036, Hebei Science Fund under Grant no. F2013205193, and Hebei Science Supported Planning Projects no. 12213514D.