A Novel Algorithm for Intrusion Detection Based on RASL Model Checking
The interval temporal logic (ITL) model checking (MC) technique enhances the power of intrusion detection systems (IDSs) to detect concurrent attacks, owing to the strong expressive power of ITL. However, ITL formulas cannot easily describe the time constraints between different actions in the same attack. To address this problem, we formalize a novel real-time interval temporal logic, real-time attack signature logic (RASL). Based on this new logic, we put forward a RASL model checking algorithm. Furthermore, we use RASL formulas to describe attack signatures and employ discrete timed automata to model the audit log. As a result, the RASL model checking algorithm can automatically verify whether the automata satisfy the formulas, that is, whether the audit log matches the attack signatures. Simulation experiments show that the new approach effectively enhances the detection power of MC-based intrusion detection methods for a number of telnet attacks, p-trace attacks, and sixteen other types of attacks. The experiments also indicate that the new algorithm can find several types of real-time attacks that the existing MC-based intrusion detection approaches cannot.
1. Introduction
Intrusion detection (ID) is an important network security technique. According to its underlying principle, ID can be divided into anomaly intrusion detection and misuse intrusion detection. The former can find unknown types of attacks, but its false-positive rate is often very high. In contrast, a misuse intrusion detection system has a comparatively low false-positive rate for known types of attacks. This follows from the principle of misuse intrusion detection: IDS developers predefine the known types of attacks, use an appropriate language to describe them, and establish libraries of attack patterns (called misuse signatures). The system then monitors the audit log. Once a data stream in the log is found to match a certain attack type, an attack is reported.
However, detection methods of this class, which are based on pattern matching (PM), suffer from inherent problems. First, owing to intruders’ subjective choices or other random factors, attacks of the same pattern launched by different intruders may exhibit different logical relationships among their atomic actions [1, 2], where an atomic action means a minimum operation step in an attack. It is hard to describe such widely varying attacks precisely with a relatively small attack pattern library. Second, a large-scale coordinated attack requires an intrusion detection algorithm to handle a large volume of network data in a short period of time. To address these issues, a series of intrusion detection methods based on model checking have been developed.
A relatively comprehensive algorithm based on linear temporal logic (LTL) model checking has been presented. Its basic principle is as follows: use an LTL formula to describe an attack pattern and an automaton to record what happened in the audit log, and then use a model checking algorithm to check whether the automaton satisfies the formula (i.e., whether the records in the log match the attack pattern). Since current model checking algorithms are able to check up to $10^{120}$ states, they are particularly suitable for large-scale attack detection, and the operators in LTL formulas can flexibly describe various logical relationships between atomic attack actions.
Compared with the PM-based approaches, the MC-based ones can effectively portray ever-changing attack patterns [1, 3]. The MC-based approaches have a further important advantage for intrusion detection. Pattern matching is usually suited to detecting inconsistencies between data, whereas automata, temporal logic formulas, and model checking techniques are suited to detecting inconsistencies between behaviors. Since intrusion attacks involve complex behaviors in addition to comparatively simple data, the MC-based methods can detect more than the PM-based ones.
However, the LTL-based algorithm can automatically detect neither concurrent attacks nor real-time attacks (i.e., attacks with time constraint relations), because LTL formulas cannot describe multiprocess activities or time constraint relationships between attack actions or attack action sequences. As a first attempt to address these issues, a method based on ITL model checking was presented; it can describe and detect concurrent attacks, since ITL is more expressive than LTL. However, ITL-model-checking-based methods still cannot describe and detect real-time attacks. For example, a large number of attacks in real network intrusions have the following characteristic: no more than n seconds after one action (sequence/process) occurs, another action (sequence/process) occurs. Here, the condition “no more than” can be replaced by “more than,” “less than,” “no less than,” or “equal to.” The existing MC-based algorithms cannot find these attacks.
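The real-time characteristic described above can be illustrated with a minimal Python sketch. It is only an illustration of the time constraint itself, not the detection algorithm of this paper; the `Event` type, the action labels, and the function `matches_timed_pair` are hypothetical names introduced here, and the log is assumed to be sorted by time.

```python
from dataclasses import dataclass
import operator


@dataclass(frozen=True)
class Event:
    action: str   # name of an atomic action (hypothetical label)
    time: float   # absolute timestamp in seconds


# The five comparison conditions that the text allows in a time constraint.
COMPARATORS = {
    "<=": operator.le,  # no more than
    ">": operator.gt,   # more than
    "<": operator.lt,   # less than
    ">=": operator.ge,  # no less than
    "==": operator.eq,  # equal to
}


def matches_timed_pair(log, first, second, cmp, n):
    """Return True if some `second` event follows some `first` event
    with a delay d (in seconds) such that `d cmp n` holds."""
    compare = COMPARATORS[cmp]
    for i, a in enumerate(log):
        if a.action != first:
            continue
        for b in log[i + 1:]:
            if b.action == second and compare(b.time - a.time, n):
                return True
    return False


log = [Event("login_fail", 0.0), Event("scan", 2.0), Event("login_ok", 4.5)]
print(matches_timed_pair(log, "login_fail", "login_ok", "<=", 5))  # True
print(matches_timed_pair(log, "login_fail", "login_ok", "<", 3))   # False
```

A check of this kind needs the timestamps of both actions, which is exactly the information that an untimed formula cannot refer to.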
Therefore, to address concurrent attacks and real-time attacks simultaneously, this paper presents a new interval temporal logic that conveniently describes real-time attack signatures, and puts forward a new MC-based approach that automatically detects the various changing modes of real-time attacks.
We conducted simulation experiments and a benchmark test (see Section 7). The detection of several groups of attacks, such as telnet attacks and p-trace attacks, is simulated in MATLAB. The experimental results verify that the new algorithm finds more attacks than the existing MC-based algorithms and, in particular, that it finds real-time attacks. This is the main contribution of this paper.
The remainder of this paper is organized as follows. Section 2 reviews related work and compares it with the new approach. Section 3 defines a new logic, RASL, and gives its formal syntax and semantics. Section 4 uses RASL formulas to establish models for attack patterns. Several examples of models are given in Section 5. Section 6 formalizes a RASL model checking algorithm based on a new data structure called the timed normal form graph (TNFG) and presents a misuse intrusion detection algorithm. Section 7 presents several groups of experiments and compares the new algorithm with the existing ones with regard to their description and detection capabilities for intrusion attacks. Section 8 draws the conclusions of this paper.
2. Related Works
2.1. Detect Various Attack Types Using Model Checking Linear Temporal Logics
A tool called ORCHIDS was developed, which put the LTL-model-checking-based method for intrusion detection into practice. In one experiment, ORCHIDS found some p-trace attacks, which usually exploit flaws in process calls to inject malicious code. It is difficult for traditional intrusion detection systems to find this type of attack because they only match individual events. ORCHIDS was later improved. In a real environment, it successfully detected a series of wireless network attacks, including deauthentication flooding, rogue access points, and Chop-Chop. This was the first IDS to successfully detect Chop-Chop attacks. Furthermore, to avoid the repeated verifications needed by the earlier algorithm, an improved algorithm was put forward, which is able to compute the number of guesses in password attacks.
Compared with the methods mentioned above, the new algorithm can be used to detect complex concurrent attacks and real-time attacks (see Section 7).
2.2. Detect Various Attack Types Using Model Checking Interval Temporal Logics
ITL was originally put forward by Moszkowski. With its successful and increasingly broad adoption and adaptation [8–11], ITL has become a class of logics, including some non-real-time interval logics and some real-time interval logics [12–14]. Figure 1(a) illustrates the relationships between some temporal logics.
Some studies use interval temporal logics to describe attack patterns so that more intrusion behaviors can be expressed [15–17]. However, these papers do not mention how to detect these attacks automatically. The ITL-model-checking-based method can do so automatically, but it can only find concurrent attacks, not real-time attacks. In contrast, as a real-time interval logic, RASL has more expressive power (see Figure 1(b)) and can be used to describe the time relationships among attack activities, and our model checking algorithm can find real-time intrusion attacks in a fully automatic manner (see Section 7).
Definitions 1 and 2 give the formal description of the syntax of RASL, whereas the other definitions present its semantics. Compared with ITL [11, 18, 19], the additional operator denoted as “” in RASL is appended for the description of time constraints between intervals.
Definition 1. RASL formulas have the following syntax given in the Backus-Naur form: terms ,constraint formulas ,interval formulas ,timed formulas
Definition 2. The derived formulas are defined as follows:
, , ,
Definition 3. A state is a tuple , where and denotes the absolute time of the current state.
Definition 4. A timed sequence of states is defined and also denoted as , where is a state.
Definition 5. An interpretation is a quadruple , where is a timed sequence of states over , is the current state. We use the notation for the number of states in interval and for the time distance between the endpoints in interval, where is the absolute time of state .
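The notions of a timed sequence of states, the number of states in an interval, and the time distance between its endpoints can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `State` encoding (a set of atomic propositions plus an absolute time), the inclusive index convention, and the function names are all hypothetical and are not taken from the definitions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class State:
    props: FrozenSet[str]  # atomic propositions assumed true in this state
    time: float            # absolute time of the state


def state_count(seq: Sequence[State], i: int, j: int) -> int:
    """Number of states in the interval from index i to index j (inclusive)."""
    if not 0 <= i <= j < len(seq):
        raise ValueError("interval endpoints out of range")
    return j - i + 1


def time_distance(seq: Sequence[State], i: int, j: int) -> float:
    """Time distance between the endpoints of the interval [i, j]."""
    if not 0 <= i <= j < len(seq):
        raise ValueError("interval endpoints out of range")
    return seq[j].time - seq[i].time


# A timed sequence of three states (illustrative values only).
sigma = [
    State(frozenset({"connect"}), 0.0),
    State(frozenset(), 1.5),
    State(frozenset({"fail"}), 4.0),
]
print(state_count(sigma, 0, 2))    # 3
print(time_distance(sigma, 0, 2))  # 4.0
```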
Definition 6. Let an interval, be integers, and . We use notation to denote a projection from to , where is obtained by deleting the duplicate numbers from .
Definition 7. Let and be the true value of in state . The satisfaction relation is inductively defined as follows:(1), (2),(3) if and only if , (4) if and only if ,(5) if and only if , (6) if and only if ,(7) if and only if and ,(8) if and only if = true, (9) if and only if and , (10) if and only if or , (11) if and only if , such that and ,(12) if and only if , (13) if and only if ,(14) if and only if (i) there exist finite many , such that , and for every (ii) or ,(15), if and only if there exist integers and , where , such that for in the two cases mentioned below, we have —(i) and , (ii) and ,(16) if and only if there exists , such that ,(17) if and only if and , (18) if and only if or ,(19) if and only if , such that and ,
4. Construct Signatures with RASL Formulas
We can use RASL formulas to construct signatures, that is, specifications of attack patterns. Compared with linear temporal logic, RASL is additionally equipped with interval semantics. So, a phase in an attack, that is, a sequence of atomic actions, can be described with an interval in a RASL formula, while the various steps in the phase can be described with various points in the interval. Temporal relationships between steps in an attack can be described with temporal operators. Logical relationships between various phases can be described with the operator “;”. A concurrent attack can be described with a formula with the operator “”. Compared with ITL, RASL can express more. In particular, repeated attacks can be described with the operator “” or “”, and a time constraint between phases or steps in an attack can be described with the operator “”.
Table 1 presents how to construct formal models for intrusion attacks with RASL formulas. And Figure 2 illustrates sequential relationships, concurrent relationships, and time relationships between behaviors in an attack.
Figure 2: (a) Sequential relation; (b) concurrent relation; (c) time relation.
Definition 8 (from prior work). A record in a log library is modeled by a finite state automaton.
Theorem 9. A record of a log can be modeled by a timed automaton.
Proof. According to Definition 8, a record of a log can be modeled by a finite state automaton. For every transition of this automaton, we add the time constraint “true”. We extend every state of the automaton with a component that denotes absolute time. Thus, the finite state automaton is turned into a timed automaton, and the theorem holds.
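The construction in the proof can be sketched in Python. This is a minimal illustration, assuming that a log record is a simple chain of states and that each state carries exactly one timestamp; the class names, the `times` mapping, and the string guard `"true"` are hypothetical encodings chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Transition:
    source: str
    label: str
    target: str


@dataclass(frozen=True)
class TimedTransition:
    source: Tuple[str, float]  # (state, absolute time)
    label: str
    guard: str                 # time constraint attached to the transition
    target: Tuple[str, float]


def to_timed(transitions: List[Transition],
             times: Dict[str, float]) -> List[TimedTransition]:
    """Attach the trivial guard "true" to every transition and extend
    every state s to the pair (s, t), where t is its absolute time."""
    return [
        TimedTransition(
            source=(tr.source, times[tr.source]),
            label=tr.label,
            guard="true",
            target=(tr.target, times[tr.target]),
        )
        for tr in transitions
    ]


# A log record with three states and two recorded actions (illustrative).
record = [Transition("s0", "connect", "s1"), Transition("s1", "fail", "s2")]
timestamps = {"s0": 0.0, "s1": 2.0, "s2": 2.5}
timed_record = to_timed(record, timestamps)
for tr in timed_record:
    print(tr)
```

Because every guard is "true", the timed automaton accepts the same action sequences as the original automaton; only the timestamps are new information.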
5. A Case Study
As a case study, we discuss several examples to show the expressive capability of the above proposed models.
Example 10. Password cracking inconsecutive attack: failure. The RASL formula is
where connect means that an intruder is trying to connect. The intruder could launch another concurrent process before the end of current connection process. Thus, the subinterval that describes current execution of the concurrent connection process is over, and it can be described with operators before . The sub-interval that describes the result is over while this connection process fails, and it can be described with the operator after . The intruder repeatedly tries connection, and it can be described with “”. Inconsecutive phenomenon between connections can be described with “”.
Example 11. Password cracking inconsecutive attack: success after connection failed times.
At first, one time failure in connection can be described as . And, then, a successful trial can be described as .
The formula that describes times failures in connections can be defined as .
The formula that describes the attack can be defined as
As shown in Figure 3, the definition of is illustrated, where , that is, denotes three times failures in connections. As shown in (a), (b), and (c) of Figure 3, there are three cases on the length of interval in RASL formula. In each of the failures in connections, there exists a one-to-one map between attack actions and their results. That is to say, the number of which describe attack actions is equal to the number of (fail)s that describe their results. This number is three, so only (c) of Figure 3 is correct. To this end, we can append atomic proposition to the formula, and let follow . Furthermore, we can append atomic proposition to the formula when subinterval is over. The number of is equal to the number of (fail)s if holds, as shown in (c) of Figure 3.
Subinterval is executed repeatedly times to guarantee times cycles of , as shown in Figure 4. We need two states in current subinterval to make sure that the first state of the next subinterval is the next state of the final state of the current subinterval. So, we replace with .
Example 12. The phases of a telnet attack are observed as follows.
Phase 1: the telnet service is started. This is described as a RASL atomic formula.
Phase 2: the intruder closes the firewall. There are three steps in this phase. First, the intruder accesses C:\windows in order to find a specific program. Then, the intruder executes a command and monitors all processes in order to find the firewall process. Last, the intruder executes a command to close the firewall. Each of the three steps is described as a RASL atomic formula. The intruder performs the three steps of this phase in sequence with a gap between consecutive steps. Each of the two delays is less than a given number of seconds, which is expressed as a time constraint.
Phase 3: in order to log in to the system again in the future, the intruder makes a backdoor. There are two steps in this phase. The first step is to access the directory in which the file instsrv.exe exists, and the second step is to execute a command to set up a service that serves as a backdoor. Both the former and the latter are denoted as RASL atomic formulas. The intruder performs the two steps of this phase in sequence with a gap between them. The delay is less than m seconds, which is likewise expressed as a time constraint.
In summary, the timed formula for the telnet attack is formulated as follows:
In Formula (3), “;” is used to express a piecewise action, “” is used to express a concurrent action, and “” is used to express a time constraint relationship.
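The bounded-delay step sequences inside a phase of Example 12 can be illustrated with a small Python sketch. It checks only one phase at a time (steps in order, each following the previous one within a bound) and does not model the sequential or concurrent composition of phases. The step labels, the function `phase_occurs`, and the parameter `max_gap` are hypothetical; the log is assumed to be a list of `(action, time)` pairs sorted by time.

```python
def phase_occurs(log, steps, max_gap):
    """Return True if the actions in `steps` occur in order in `log`, each
    strictly less than `max_gap` seconds after the previously matched step."""

    def search(step_idx, start, prev_time):
        # All steps matched: the phase has been found.
        if step_idx == len(steps):
            return True
        for pos in range(start, len(log)):
            action, t = log[pos]
            if action != steps[step_idx]:
                continue
            # The first step has no predecessor, so no delay is checked.
            if prev_time is not None and not (t - prev_time < max_gap):
                continue
            # Backtrack over later candidates if this choice fails.
            if search(step_idx + 1, pos + 1, t):
                return True
        return False

    return search(0, 0, None)


log = [
    ("step_a", 0), ("step_b", 10),
    ("step_a", 20), ("step_b", 22), ("step_c", 25),
]
steps = ["step_a", "step_b", "step_c"]
print(phase_occurs(log, steps, max_gap=5))  # True
print(phase_occurs(log, steps, max_gap=3))  # False
```

The backtracking matters: the first `step_a` at time 0 cannot start a valid match, but the second one at time 20 can, so a purely greedy scan would miss the attack.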
6. RASL Model Checking Algorithm and Intrusion Detection Algorithm
We can define a subset of RASL called ASL, which is obtained by deleting all of the time constraints in RASL. Earlier work gives a data structure called the normal form graph (NFG) as well as a procedure called PRO(P) to construct the NFG model of an ASL formula, from which an ASL model checking algorithm was obtained. Based on this work, we can obtain a RASL model checking algorithm and its intrusion detection algorithm.
First, Definition 13 presents a data structure called TNFG, which is a timed version of NFG.
Definition 13. For a formula , the TNFG of is defined as a tuple , where is a finite clock set, and the set of nodes and the set of edges are inductively defined as follows:, where is an ASL formula in which all in are replaced by ,for every , if , , , and for every , , we have , , where set gives the clocks to be reset and is a clock constraint, and are produced by and(or) only.
Second, Algorithm 1 constructs TNFG models for RASL formulas. Some notations presented in the algorithm are explained in Table 2.
Third, if we append acceptance conditions to TNFGs, we obtain discrete timed automata models of RASL formulas. This is illustrated by Algorithm 2.
Algorithm 2 gives a procedure to compute a discrete timed automaton that serves as the model of a RASL formula. The model of the formula is the formal language accepted by the automaton.
Last, we can use one discrete timed automaton to describe an attack signature and another discrete timed automaton to describe a record of the audit log. The model checking algorithm then decides whether the record satisfies the signature, and the IDS reports an attack if it does. Thus, the intrusion detection algorithm is obtained, as shown in Figure 5.
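The idea of checking a log record against a timed signature can be illustrated with a much simplified Python sketch. It is not Algorithm 2 or the algorithm of Figure 5: here the log record is assumed to be a single timed word rather than an automaton, the signature automaton has a single integer clock, and acceptance is tested by running the word directly. The `Edge` type, the state names, and the function `accepts` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Edge:
    source: str
    action: str
    guard: Callable[[int], bool]  # constraint on the current clock value
    reset: bool                   # whether the clock is reset on this edge
    target: str


def accepts(edges, initial, accepting, word):
    """Run a timed word [(action, absolute_time), ...] through a
    one-clock discrete timed automaton and report acceptance."""
    # A configuration is (state, absolute time of the last clock reset).
    configs = {(initial, 0)}
    for action, now in word:
        successors = set()
        for state, reset_time in configs:
            for e in edges:
                clock = now - reset_time
                if e.source == state and e.action == action and e.guard(clock):
                    successors.add((e.target, now if e.reset else reset_time))
        configs = successors
    return any(state in accepting for state, _ in configs)


# Hypothetical signature: action "a", then action "b" less than 5 s later.
signature = [
    Edge("q0", "a", lambda x: True, True, "q1"),
    Edge("q1", "b", lambda x: x < 5, False, "q2"),
]
print(accepts(signature, "q0", {"q2"}, [("a", 10), ("b", 13)]))  # True
print(accepts(signature, "q0", {"q2"}, [("a", 10), ("b", 17)]))  # False
```

Tracking a set of configurations keeps the sketch correct for nondeterministic signatures, at the cost of exploring every enabled edge at each step.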
The inherent complexity of the interval temporal logic model checking problem is nonelementary: the number of nested exponentials is proportional to the number of nested negation (not) operators. The approach based on an NFG or TNFG reaches the lower bound of this problem [14, 19]. The new model checking algorithm contains only one occurrence of the operator not. So, in the worst case, both the inherent complexity of the intrusion detection problem based on RASL model checking and the complexity of our algorithm are exponential.
7. Simulation Experiments
In order to compare the existing approaches with our new algorithm, we conducted experiments by simulating and detecting the telnet attacks and password attacks mentioned above as well as other types of attacks. The platform is a PC with a dual-core 3.2 GHz CPU and 8 GB of memory, running Windows XP SP3 and MATLAB 2010. The results on detection ability are shown in Tables 3, 4, and 5. The differences in the results are due to the different expressive powers of the logics.
In order to compare the LTL-model-checking-based approaches in [1, 3, 5] with our RASL-model-checking-based algorithm, we simulate and detect some telnet attacks in MATLAB. We randomly produce 25 kinds of telnet attacks and repeat each of them 80 times. On average, fewer than 5 kinds of attacks are reported by the LTL-based simulator, whereas almost 100 percent of these attacks are found by the RASL-based simulator, as shown in Figure 6(a). The simulation results indicate that the model checking technique by itself does not make an IDS stronger; it does so when it employs a stronger temporal logic, such as RASL, to describe attacks.
We also simulate and detect some p-trace attacks in MATLAB. We randomly produce 30 kinds of p-trace attacks and repeat each of them 100 times. On average, fewer than 10 kinds of attacks are reported by the LTL-based simulator, whereas almost 100 percent of the kinds of attacks are found by the RASL-based simulator, as shown in Figure 6(b). The results indicate that the RASL-based algorithm enhances the detection power for p-trace attacks compared with the LTL-based algorithm. This is due to the stronger expressive power of RASL.
Suppose that the standard time unit is a second. Figure 7 compares the ITL-model-checking-based approach with our RASL-model-checking-based algorithm. We randomly produce some attacks, including real-time attacks and non-real-time attacks. Compared with the ITL-based simulator, the RASL-based simulator raises the average number of detected attacks by as much as 400% when the average time distance (or time constraint) between two atomic actions in the same real-time attack is only five seconds. The average number is still raised by 15% even in the worst case, that is, when the time distance is more than three thousand seconds. These results indicate that the RASL-based algorithm further raises the detection power for p-trace attacks compared with the ITL-based algorithm, again owing to the stronger expressive power of RASL.
In order to compare the detection ability of the ITL-model-checking-based approach and the RASL-model-checking-based one on more types of attacks, we conducted a benchmark test on KDD CUP 99. We used a behavior version of a sample subset of this standard benchmark set to evaluate our research in intrusion detection. The attacks fall into four main categories, that is, DOS, R2L, U2R, and Probe, comprising twenty-two types of attacks in total, as shown in Figures 8, 9, 10, and 11. In each of these four figures, one axis shows the ratio between the number of attacks found by the ITL-based simulator and the number of attacks found by the RASL-based simulator, whereas the other axis shows the different types of attacks.
As shown in the figures, all of the ratios lie between 0 and 1. For some types of attacks, such as perl and ftp write, the ITL-based simulator finds as many attacks as the new simulator does. For other types of attacks, such as back, Neptune, and smurf, the ITL-based simulator finds almost nothing, whereas the RASL-based one finds more. This is again due to the strong expressive power of RASL.
8. Conclusions
This paper defined a new real-time interval temporal logic, RASL. Based on it, we presented a RASL model checking algorithm and its intrusion detection algorithm. This enables us to employ MC-based approaches for detecting real-time attacks. P-trace attacks in particular are hard for existing IDSs to detect, except for the LTL-based algorithm [1, 3], the ITL-based algorithm, and the new RASL algorithm. The new algorithm has detected some real-time p-trace attacks in our simulation experiments. To the best of our knowledge, this is the only method to report this type of attack. This is the benefit of using the new approach.
Conflict of Interests
The authors certify that they have no conflict of interests with any trademark included in this paper.
Acknowledgments
The first author of this paper would like to thank Dr. Kevin Lu at Brunel University, UK, for his constructive suggestions on this paper. This work has been partially supported by the National Natural Science Foundation of China (no. 61250007, no. U1204608, no. 61003079, and no. 61202099), the China Postdoctoral Science Foundation (no. 2012M511588), the SRFDP (no. 20100203120012), and the Fundamental Research Funds for the Central Universities in China (no. K5051203019).
References
M. Roger and J. Goubault-Larrecq, “Log auditing through model-checking,” in Proceedings of the 14th IEEE Workshop on Computer Security Foundations (CSFW '01), pp. 220–234, IEEE Computer Society, Washington, DC, USA, June 2001.
W. Zhu, Z. Wang, and H. Zhang, “A novel algorithm for intrusion detection based on model checking interval temporal logic,” China Communications, vol. 8, no. 3, pp. 66–72, 2011.
J. Olivain and J. Goubault-Larrecq, “The ORCHIDS intrusion detection tool,” in Proceedings of the 17th International Conference on Computer Aided Verification (CAV '05), Lecture Notes in Computer Science, pp. 286–290, Springer, Edinburgh, UK, July 2005.
Y. Zhang, Y. Fu, and X. Sun, “A method of intrusion detection based on model-checking,” Wuhan University Journal of Natural Sciences, vol. 51, no. 3, pp. 319–322, 2005 (Chinese).
B. Moszkowski, Reasoning about digital circuits [Ph.D. thesis], Department of Computer Science, Stanford University, Stanford, Calif, USA, 1983.
M. Hira, “Verification of tempura specification of sequential circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 16, no. 4, pp. 362–375, 1997.
M. Solanki, A. Cau, and H. Zedan, “Semantically annotating reactive web services with temporal specifications,” in Proceedings of the IEEE 2nd International Workshop on Semantic and Dynamic Web Processes (ICWS '05), 2005.
Z. Duan, Modeling of Hybrid Systems, Science Press, Beijing, China, 2004.
M. G. Ouyang, F. Pan, and Y. T. Zhang, “ISITL: intrusion signatures in augmented interval temporal logic,” in Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 3, pp. 1630–1635, November 2003.
M. G. Ouyang and Y. B. Zhou, “ISDTM: an intrusion signatures description temporal model,” Wuhan University Journal of Natural Sciences A, vol. 8, no. 2, pp. 373–378, 2003.
“KDD Cup 1999 Data,” 2007, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.