Abstract

Cloud computing platforms are usually built on virtual machines as the underlying architecture, so the security of virtual machine systems is at the core of cloud computing security. This paper presents an immune-based intrusion detection model for the virtual machines of cloud computing environments, denoted IB-IDS, to ensure the safety of user-level applications in client virtual machines. The model uses the system call sequences of processes and their parameters, and it extracts environment information from the client virtual machines. It then simulates immune responses to determine the state of user-level programs; it can detect attacks on applications at run time and has high real-time performance. The model consists of five modules, namely the antigen presenting module, the signal acquisition module, the immune response module, the signal measurement module, and the information monitoring module, which are distributed across different levels of the virtual machine environment. Performance analysis and experimental results show that the model imposes only a small performance overhead on the virtual machine system and achieves good detection performance. It is suitable for judging the state of user-level applications in guest virtual machines and can feasibly be used to increase user-level security in the software services of cloud computing platforms.

1. Introduction

Cloud computing has become the mainstream of next-generation information technology; it provides a new and economical way to allocate and use computing resources. Because of their huge scale, complex software and hardware structures, third-party data storage, and unprecedented openness and complexity, cloud computing systems face stricter security requirements than traditional information systems. If these security issues are not solved well, they will seriously restrict the rapid development of cloud computing and the adoption of cloud computing applications.

Cloud computing platforms are usually built on virtual machines as the underlying architecture, so the security of virtual machine systems is at the core of cloud computing security. At present, there is little security research on virtual machine systems in cloud computing environments; the existing work is briefly reviewed below.

Haeberlen et al. proposed the concept of accountable virtual machines (AVMs) [1], in which programs are executed and related information is recorded to determine whether the programs are normal. This is a static assessment method and cannot detect the real-time safety of programs.

Payne et al. [2] presented the Lares system, which inserts a hook function into the client virtual machine to proactively monitor events in the client virtual machine (VM). The hook function triggers a security program in the security virtual machine (privileged VM), which makes decisions about events in the client VM. Because the monitoring program is located in the secure VM, outside the client VM, Lares is an out-of-VM monitoring method. It offers high security but requires frequent context switching between virtual machines, which incurs a large performance cost and makes it especially unsuitable for fine-grained monitoring.

Sharif et al. [3] proposed a general in-VM monitoring framework, in which the monitoring and decision processes run in the untrusted guest VM. To achieve the same security as out-of-VM monitoring, the framework uses hardware memory protection and hardware virtualization. In the guest VM, a memory region protected by the VM monitor is set aside for the security monitoring program and accessed only under controlled conditions. The framework therefore requires hardware virtualization support.

Wang et al. [4] proposed HookSafe, a lightweight VM monitor-based system mainly used to defend kernel space against rootkit attacks. Rootkit attacks modify control data or hook function addresses. Hooks are often dynamically allocated alongside other data and scattered across noncontiguous memory areas, so they require byte-level protection, whereas current hardware protection mechanisms provide only page-level granularity. To solve this problem, HookSafe introduces a hook indirection (jump) layer that relocates hooks to a contiguous, page-aligned memory region and then uses hardware protection to control access to that region.

The work in [5, 6] also detects kernel rootkits. The work in [5] monitors invariants in control-flow transfers and invariant relationships in non-control data. The work in [6] uses the Daikon tool to infer invariants of data structures extracted from memory pages and monitors these invariants to determine the state of the kernel.

Bharadwaja et al. [7] analyzed the security issues raised by hypercalls in virtualized environments and proposed a Xen-based distributed intrusion detection system, which implemented filtering operations on hypercalls in the privileged domain to achieve security.

Srivastava et al. [8] studied how rootkits can use fuzzed system calls to attack the virtual machine monitor (VMM) and proposed a Xen-based monitoring system named Sherlock. The system monitors call flows by adding observation points along the kernel execution path and automatically adjusts its sensitivity according to security needs.

Szefer et al. [9] proposed the NoHype system, which minimizes the involvement of the VMM: VMs run directly on the underlying hardware while multiple virtual machines are still supported, reducing the possibility of attacks between virtual machines and the threats posed by VMM vulnerabilities. The main ideas are preallocating processor and memory resources, using virtualized I/O devices, making small modifications to the guest OS so that it performs system discovery during boot, and avoiding indirection by bringing guest VMs into more direct contact with the hardware.

Benzina and Goubault-Larrecq [10] pointed out that Domain 0 is a major weak point of virtualization systems and proposed a role-based access control model. The model describes undesirable information flows with simple temporal-logic formulas, which reduces the threat of attacks on Domain 0, such as Trojan horses.

Wang et al. [11] proposed a VMM-based method for detecting hidden processes. The method runs the detection tool outside the monitored VM and therefore has high security. It obtains the low-level state of the monitored VMs through VM introspection and reconstructs process queues to identify malicious processes.

The above works studied the security of user programs in VMs and vulnerabilities of the VMM and proposed corresponding defenses. However, on careful analysis, current methods cannot accurately determine the real-time status of client VM applications or the security vulnerabilities of the VMM. Most of the proposed methods target particular attacks and vulnerabilities and cannot effectively deal with other threats.

Inspired by the immune response mechanism and the danger theory of the biological immune system, this paper presents an immune-based intrusion detection model for the virtual machines of cloud computing environments, named IB-IDS. The main contributions of this model are as follows: (1) the model introduces danger theory into VM intrusion detection and defines how danger signals are implemented; (2) the model monitors the state of applications and detects attacks on applications at run time, with high real-time performance; (3) the model monitors the whole intrusion detection process and ensures that every module of the model runs safely; (4) the immune evolution mechanism and a performance analysis of the model are described, which show that the model is theoretically effective. The remainder of this paper is organized as follows. Section 2 describes the theory of the model, including its architecture, its definitions, the implementation mechanisms of danger signals and information monitoring, and the immune evolution model. Section 3 presents the performance analysis of the model. Section 4 verifies the effectiveness of IB-IDS experimentally. Finally, the last section concludes the paper.

2. Theories of the Model

Virtualization technology is the foundation of cloud computing and has received increasing attention as cloud computing has become popular. Virtualization allows many virtual machines to run on one physical machine; each virtual machine runs its own operating system and applications and is well isolated from the others. This is implemented by adding a software layer, called the virtual machine monitor (VMM), on top of the hardware. There is usually one virtual machine with relatively high authority, called the privileged virtual machine (privileged VM), which can manage and control the other client virtual machines (guest VMs) to a certain extent. Xen [12, 13] was developed by the University of Cambridge Computer Laboratory. It is an open-source project and is therefore widely used in academic research; it also underlies a number of cloud computing platforms, such as Amazon EC2 and Eucalyptus. In Xen, the VMM is called the hypervisor and a VM is called a domain: the first domain, which starts together with the hypervisor, is called dom0, and the other domains are called domU, as shown in Figure 1.

Most common attacks on a virtual machine system exploit certain vulnerabilities of the system. These attacks are carried out by programs or software called malware (malicious software). Common malware includes viruses, worms, Trojan horses, and rootkits. Some are user-mode malicious processes that do not affect the operating system kernel; others lurk in the kernel or in processes and modify memory. A system without defenses is vulnerable to such attacks. For example, when a program runs, we cannot be sure whether changes to dynamic data structures in the kernel region are legitimate or caused by an intrusion. The proposed model is designed to detect these kinds of malware.

2.1. Description of the Architecture

Because of the high privilege levels and relatively streamlined structure of the privileged VM and the hypervisor, both are assumed to be safe. The main goal of this model is to ensure the safety of user-level applications in the guest VM. The architecture of IB-IDS is shown in Figure 2. It is divided into four levels: the underlying hardware layer, the VMM layer, the privileged VM layer, and the guest VM layer, and the modules of the model are distributed across these levels. To reduce context switching between dom0 and domU and to enable fine-grained monitoring, the antigen presenting module and the signal acquisition module are deployed in every guest VM. The immune response module and the signal measurement module are deployed in the privileged VM. These two modules do not need to communicate with domU; they only fetch data periodically during execution. Deploying them separately in dom0 reduces the performance cost and improves the security of dom0. The information monitoring module is deployed in the VMM. Because the guest VM is untrusted, the model uses the information monitoring module to supervise the antigen presenting module and the signal acquisition module, which ensures the safety of the detection process.

The detection process is as follows. First, the antigen presenting module monitors the execution of user-level applications in client VMs, extracts critical data as antigens, and delivers them to the immune response module in the privileged VM through the inter-VM communication mechanism. Meanwhile, the signal acquisition module collects environmental information while the program executes and transmits it to the signal measurement module in the privileged VM. These operations are performed periodically. Next, the immune response module evaluates, based on the set of memory antibodies, whether to trigger a secondary response; if a secondary response is triggered, an intrusion has occurred. If not, the signal measurement module evaluates the risk rating of the current environment through the cloud model, produces danger signals of different degrees, and then determines whether an intrusion has occurred. If it has, the model starts an initial response to eliminate the foreign antigens. After the system starts, the information monitoring module runs periodically, accessing the memory spaces of the antigen presenting module and the signal acquisition module to ensure that these two modules have not been attacked.
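The decision order described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `detect`, `Verdict`, the `matches` callback, and the scalar `danger_level` compared against a fixed `danger_threshold` are hypothetical stand-ins, since the paper derives the danger decision from cloud-model rules rather than from a single threshold.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Verdict:
    intrusion: bool
    response: str  # "secondary", "initial", or "none"


def detect(antigen, memory_antibodies: Iterable, matches: Callable,
           danger_level: float, danger_threshold: float = 0.5) -> Verdict:
    """Hypothetical IB-IDS decision order for one antigen.

    matches(ab, ag) -> bool stands in for the affinity test; danger_level is
    assumed to be the signal measurement module's risk rating in [0, 1].
    """
    # Step 1: secondary response if any memory antibody recognizes the antigen.
    if any(matches(ab, antigen) for ab in memory_antibodies):
        return Verdict(True, "secondary")
    # Step 2: otherwise fall back to the danger signals from the environment.
    if danger_level >= danger_threshold:
        return Verdict(True, "initial")
    return Verdict(False, "none")
```

Checking memory antibodies first mirrors the text's ordering: a known pattern is answered immediately, and only unknown antigens pay the cost of evaluating the environment.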

2.2. Model Definition

In the software system of virtual machines, all information can ultimately be reduced to binary strings, and virtual machine intrusion detection is the classification of these binary strings according to certain rules and a priori knowledge. Let $\Omega$ denote the problem state space. Based on biological immune principles, we define the virtual system platform as the organism, the client virtual machines as immunologic tissues, and the user programs in the virtual machines as antigens. Let $Ag \subseteq \Omega$ denote the collection of antigens. The aim of virtual machine intrusion detection is pattern discrimination: given an input pattern $x \in Ag$, the system determines whether the pattern is self or nonself. Two kinds of error can occur during detection: false negatives, which classify nonselves as selves, and false positives, which classify selves as nonselves.

Forrest et al. [14] found that the execution of critical programs can be described by their sequences of system calls, also called execution traces. System calls reflect the behavioral characteristics of a program to some extent, and the execution trace is locally stable while the program runs. Taking into account system calls and their parameters (at most six per call under the Linux system call convention), we define the process ID, the short sequence of system calls, and their parameters as the gene fragments of antigens.
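As a rough sketch of how such gene fragments could be gathered, the following Python code slides a window of length k over a recorded trace, in the style of Forrest et al.'s short-sequence method, and wraps each window in the (gid, pid, sequence) triple of Definition 1. The trace format, the helper names, and the system call numbers in the test are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

# (system call ID, parameters); Linux passes at most six parameters per call.
SyscallEvent = Tuple[int, Tuple[int, ...]]
Antigen = Tuple[int, int, Tuple[SyscallEvent, ...]]  # (gid, pid, gene fragment)

MAX_SYSCALL_PARAMS = 6


def short_sequences(trace: Sequence[SyscallEvent], k: int) -> List[Tuple[SyscallEvent, ...]]:
    """Return every window of k consecutive system calls in the trace."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def make_antigens(gid: int, pid: int, trace: Sequence[SyscallEvent], k: int) -> List[Antigen]:
    """Hypothetical helper: tag each short sequence with the VM ID and process ID."""
    for sid, params in trace:
        if len(params) > MAX_SYSCALL_PARAMS:
            raise ValueError(f"syscall {sid} has more than {MAX_SYSCALL_PARAMS} parameters")
    return [(gid, pid, seq) for seq in short_sequences(trace, k)]
```

A trace shorter than k yields no antigens, which keeps the encoded length of every gene fragment fixed at k.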

Definition 1. An antigen is defined as a triple $ag = \langle gid, pid, d \rangle$, which represents a feature vector in the solution space of the problem domain.
Here gid is the unique ID of the client VM and pid is the process ID. $d = \langle s_1, s_2, \ldots, s_k \rangle$ is the gene fragment of the antigen, where each element $s_i = \langle sid, p_1, \ldots, p_n \rangle$ consists of a system call ID sid and the parameters of that call. k is the length of the short system call sequence, that is, the encoded length of immune cells, which reflects the order of system calls during execution. $p_j$ is the j-th parameter of a system call, $1 \le j \le n$, and n is the number of parameters ($n \le 6$). All the antigens in the space compose the collection $Ag$.
It is assumed that the normal short sequences that the model can recognize form the self set, all unknown short sequences form the unknown set, abnormal short sequences that produce danger signals form the danger set, and short sequences judged to be intrusions form the intrusion set.
Danger theory does not distinguish between self and nonself; it recognizes only the intrusion set, which triggers immune responses, and does not respond to the harmless set.

Definition 2. Antibodies recognize antigens and trigger specific immune responses. Antibodies have the same structure as antigens (a triple of VM ID, process ID, and gene fragment) and are used to detect and match antigens. The set of antibodies is denoted $Ab$.

Definition 3. The matching rule gives the affinity between an antibody and an antigen, that is, the binding strength between them. In this paper, we propose an improved r-continuous bit matching method, defined in terms of a matching threshold and the r-continuous bit matching between the gene fragment of the antibody and that of the antigen.
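For reference, the classic r-contiguous bit matching rule that this definition builds on (the text's "r-continuous bit matching") can be written in a few lines of Python. This sketch shows only the standard rule on equal-length bit strings; the paper's improved variant and its threshold formula are not reproduced here, and the function names are hypothetical.

```python
def longest_agreeing_run(a: str, b: str) -> int:
    """Length of the longest run of consecutive positions where a and b agree."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """Classic rule: a and b match if they agree in at least r contiguous positions."""
    return longest_agreeing_run(a, b) >= r
```

The longest agreeing run also serves as a natural integer affinity score: larger values mean stronger binding.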

Definition 4. The detector set is defined as $D = \{\langle ab, age, age_{\max} \rangle\}$, where ab is the antibody of the detector, age is its age, and $age_{\max}$ is its maximum age. The detector set consists of immature detectors, mature detectors, and memory detectors. An immature detector, which has not yet undergone self-tolerance, evolves into a mature detector when it passes self-tolerance. A mature detector becomes a memory detector after it is activated.

The immature detector set, the mature detector set, and the memory detector set are defined as subsets of the detector set; the definition of the immature detector set includes a parameter that simulates the tolerance period.

In the detector generation process, if a detector matches an element of the self set, the detector describes self and triggers an immune self-reaction, so it must be removed. At the end of the process, the remaining detectors can describe only elements of the nonself set. In the detection process, if antigen ag matches a detector, ag is described by that detector, which triggers the immune response.
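The generation step above is the negative selection algorithm. The following Python sketch generates random candidates and keeps only those that match no self string under classic r-contiguous matching. It is an illustrative assumption, not the paper's gene-coding generator: candidates here are uniformly random bit strings, and `generate_detectors` and its parameters are hypothetical names.

```python
import random
from typing import Iterable, List


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if a and b agree in at least r consecutive positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(selves: Iterable[str], n: int, length: int, r: int,
                       rng: random.Random, max_tries: int = 100_000) -> List[str]:
    """Negative selection: keep random candidates that match no self string."""
    selves = list(selves)
    detectors: List[str] = []
    tries = 0
    while len(detectors) < n and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        # Self-tolerance check: a candidate that matches any self is removed.
        if not any(r_contiguous_match(candidate, s, r) for s in selves):
            detectors.append(candidate)
    return detectors
```

The `max_tries` cap guards against a self set so dense that few tolerant candidates exist; in that case fewer than n detectors are returned.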

Figure 3 shows the immune mechanism of the model. A new immature detector is generated by gene coding and evolves into a mature detector through negative selection (self-tolerance); if it matches any self, it dies. A mature detector has a fixed life cycle. If it is activated by danger signals within its life cycle, it evolves into a memory detector and generates the first (initial) response; otherwise it dies, which deletes detectors that are useless against antigens. A memory detector has a long life cycle; once it matches an antigen, it is activated immediately and produces the second (secondary) response.
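The life cycle in Figure 3 can be read as a small state machine. The Python sketch below is one possible encoding under stated assumptions: a detector stays immature for `tolerance_period` steps, a mature detector dies after `mature_lifetime` steps without activation, and memory detectors are left unchanged. The names `Stage`, `Detector`, and `step` and both period parameters are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    IMMATURE = "immature"
    MATURE = "mature"
    MEMORY = "memory"
    DEAD = "dead"


@dataclass
class Detector:
    antibody: str
    stage: Stage = Stage.IMMATURE
    age: int = 0


def step(det: Detector, matches_self: bool, activated_by_danger: bool,
         tolerance_period: int, mature_lifetime: int) -> Stage:
    """Advance one detector by one time step (hypothetical transition rules)."""
    if det.stage is Stage.IMMATURE:
        if matches_self:
            det.stage = Stage.DEAD                    # fails negative selection
        else:
            det.age += 1
            if det.age >= tolerance_period:
                det.stage, det.age = Stage.MATURE, 0  # passes self-tolerance
    elif det.stage is Stage.MATURE:
        if activated_by_danger:
            det.stage, det.age = Stage.MEMORY, 0      # first (initial) response
        else:
            det.age += 1
            if det.age >= mature_lifetime:
                det.stage = Stage.DEAD                # useless against antigens
    return det.stage
```

Memory and dead detectors fall through unchanged, matching the text's description of memory detectors as long-lived.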

2.3. Implementation Mechanism of Danger Signals

Danger theory holds that danger signals generated by environmental changes lead to immune responses of different degrees; the area around the signals is called the danger zone. The most important issue in introducing danger theory into intrusion detection systems is the definition of danger signals, that is, how to determine danger. In a virtual machine environment, we select three environmental values to assess danger signals: the number of regular files of the system variable, the memory ratio used by a process (Rss), and the number of files reported by the lsof command. These values are normalized to real values in the interval [0, 1].

For antigen ag, we define a danger signal function that takes the three environmental values (the two file counts and Rss) as inputs and generates the signal value at the location of the antigen.

As can be seen, and Rss will have a negative influence on the environment, and the increase of and Rss shows that the environment is damaged or the possibility of being damaged is larger. will have a positive influence on the environment, and the increase of shows that the possibility of the environment being normal is larger.
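The paper's danger signal formula is not reproduced in this text, so the following Python sketch only illustrates the stated directions of influence: two inputs raise danger as they grow and one lowers it. The equal-weight linear combination and the names `normalize`, `danger_signal`, `neg_files`, and `pos_files` are assumptions; which of the two file counts plays which role is not recoverable here, so both are kept as neutral parameters.

```python
from typing import Tuple


def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max normalize into [0, 1], clamping values outside [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def danger_signal(neg_files: float, rss: float, pos_files: float,
                  weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical linear danger signal over normalized inputs.

    neg_files and rss raise danger as they grow; pos_files lowers it.
    """
    for v in (neg_files, rss, pos_files):
        if not 0.0 <= v <= 1.0:
            raise ValueError("inputs must be normalized to [0, 1]")
    w_neg, w_rss, w_pos = weights
    return w_neg * neg_files + w_rss * rss + w_pos * (1.0 - pos_files)
```

With weights summing to 1, the signal stays in [0, 1], which suits the cloud-model evaluation that follows.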

The size of the danger zone limits the scope of the immune response: immune cells within the zone are activated to participate in the immune response. For antigen ag, the danger zone function returns the collection of detectors whose distance from ag is less than r_danger, the radius of the danger zone:

$$\mathrm{Zone}(ag) = \{\, x \mid x \text{ is a detector},\ \mathrm{dist}(x, ag) < r_{danger} \,\}$$
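A direct reading of the danger zone as a radius query is sketched below in Python. The text does not name the distance measure, so Hamming distance between equal-length bit strings is an assumption here, as are the names `hamming` and `danger_zone`.

```python
from typing import Iterable, Set


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def danger_zone(antigen: str, detectors: Iterable[str], r_danger: int) -> Set[str]:
    """Detectors whose (assumed Hamming) distance to the antigen is below r_danger."""
    return {d for d in detectors if hamming(d, antigen) < r_danger}
```

Only detectors in the returned set would take part in the immune response triggered around this antigen.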

How can we determine from danger signals whether the environment is damaged? We use the cloud model. The cloud model [15] is an uncertainty reasoning tool: a mathematical transformation between a qualitative concept expressed by linguistic values and quantitative data, characterized by three numbers: the expectation Ex, the entropy En, and the hyperentropy He. Based on the danger signal model, we use a cloud rule generator and a backward (reverse) cloud generator to analyze the environment of guest virtual machines qualitatively. The rule generator consists of a front cloud and a rear cloud. The IF part is the condition of the rule and is implemented by the front cloud; the THEN part is the result of the rule and is implemented by the rear cloud. The input of the front cloud is the value to be evaluated, and its output is the degree to which the sample activates the rule; this membership is also the input of the rear cloud, whose output is the conclusion of the rule.
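For readers unfamiliar with the cloud model, the following Python sketch shows the standard normal forward cloud generator and the X-condition generator that is commonly used for the front (IF) cloud of a rule. It is a textbook formulation under that assumption, not code from the paper, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def _membership(x: float, ex: float, en_prime: float) -> float:
    """Membership of x under a normal curve centered at Ex with spread En'."""
    if en_prime == 0.0:
        return 1.0 if x == ex else 0.0
    return math.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))


def forward_cloud(ex: float, en: float, he: float, n: int,
                  rng: random.Random) -> List[Tuple[float, float]]:
    """Normal forward cloud generator: n cloud drops as (x, membership) pairs."""
    drops = []
    for _ in range(n):
        en_prime = rng.gauss(en, he)          # per-drop entropy, spread by He
        x = rng.gauss(ex, abs(en_prime))      # position of the drop
        drops.append((x, _membership(x, ex, en_prime)))
    return drops


def x_condition_membership(x: float, ex: float, en: float, he: float,
                           rng: random.Random) -> float:
    """X-condition generator: membership of a given input x (front cloud)."""
    return _membership(x, ex, rng.gauss(en, he))
```

With He = 0 the X-condition generator reduces to a deterministic Gaussian membership curve; a positive He makes repeated evaluations of the same input vary slightly, which is how the model expresses uncertainty.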

First, danger signals were sampled repeatedly in a safe state and in an attacked state. From the resulting cloud drops, we obtained through the backward cloud generator the numerical characteristics of the front clouds for the secure state, $(Ex_s, En_s, He_s)$, and the dangerous state, $(Ex_d, En_d, He_d)$. If the secure state cloud and the dangerous state cloud cover the entire state space, these two clouds alone can determine the status of the system; this is the ideal situation. If they cannot cover the whole state space, the uncovered part must be divided into a weak secure state cloud and a weak dangerous state cloud. In general, the closer a cloud is to the center of the domain of discourse, the smaller its entropy and hyperentropy; the farther it is from the center, the larger they are. For two adjacent clouds, the entropy and hyperentropy of the smaller one are 0.618 times those of the larger one; this is an empirical value. Hence $En_{ws} = 0.618\,En_s$, $He_{ws} = 0.618\,He_s$, $En_{wd} = 0.618\,En_d$, and $He_{wd} = 0.618\,He_d$, where the subscripts ws and wd denote the weak secure and weak dangerous state clouds. According to the “3En rule” of the cloud model, we can then estimate the expectations of the weak secure state cloud and the weak dangerous state cloud.
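The two numerical steps in this paragraph, estimating (Ex, En, He) from samples and shrinking the spread of an adjacent weak-state cloud by 0.618, can be sketched in Python as follows. The backward generator shown is the common variant that does not need certainty degrees; the paper does not say which variant it uses, so treat this as an assumption, along with the function names.

```python
import math
from statistics import fmean, variance
from typing import Sequence, Tuple

GOLDEN = 0.618  # empirical ratio quoted in the text


def backward_cloud(samples: Sequence[float]) -> Tuple[float, float, float]:
    """Backward cloud generator without certainty degrees: estimate (Ex, En, He)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    ex = fmean(samples)
    en = math.sqrt(math.pi / 2) * fmean(abs(x - ex) for x in samples)
    s2 = variance(samples, xbar=ex)  # sample variance with n - 1 denominator
    # Small samples can give s2 < En^2; clamp so He stays real.
    he = math.sqrt(max(s2 - en ** 2, 0.0))
    return ex, en, he


def weak_cloud_spread(en: float, he: float) -> Tuple[float, float]:
    """Entropy and hyperentropy of the adjacent weak-state cloud (0.618 rule)."""
    return GOLDEN * en, GOLDEN * he
```

For example, the weak secure cloud's (En, He) would be `weak_cloud_spread(En_s, He_s)` computed from the secure-state samples.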

We design the following rules to build the rule generator. With these rules, the state of the environment and its membership level can be obtained from the actual value of the danger signal.

Rule 1. IF danger signal indicator is low, THEN the system is safe and does not elicit the immune response, and the corresponding antibody can be deleted.

Rule 2. IF danger signal indicator is comparatively low, THEN the system is relatively safe and does not elicit an immune response.

Rule 3. IF danger signal indicator is comparatively high, THEN the system is relatively in danger and elicits an immune response.

Rule 4. IF danger signal indicator is high, THEN the system is in danger, elicits an immune response, and adds corresponding mature antibody into the memory antibody collection.
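Rules 1 to 4 can be illustrated by picking the state cloud with the highest membership for a danger value and returning that rule's action. In the Python sketch below, the four cloud parameters are invented for illustration, He is ignored so that the result is deterministic (only each cloud's expectation curve is used), and `apply_rules` and the state names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical (Ex, En) pairs on a danger scale normalized to [0, 1].
EXAMPLE_CLOUDS: Dict[str, Tuple[float, float]] = {
    "secure": (0.10, 0.050),
    "weak_secure": (0.35, 0.031),
    "weak_dangerous": (0.65, 0.031),
    "dangerous": (0.90, 0.050),
}

# (elicit immune response?, extra action) for Rules 1-4.
RULE_ACTIONS: Dict[str, Tuple[bool, Optional[str]]] = {
    "secure": (False, "delete corresponding antibody"),        # Rule 1
    "weak_secure": (False, None),                              # Rule 2
    "weak_dangerous": (True, None),                            # Rule 3
    "dangerous": (True, "add mature antibody to memory set"),  # Rule 4
}


def expectation_membership(x: float, ex: float, en: float) -> float:
    """Expectation curve of a normal cloud (He ignored in this sketch)."""
    return math.exp(-((x - ex) ** 2) / (2 * en ** 2))


def apply_rules(x: float, clouds: Dict[str, Tuple[float, float]] = EXAMPLE_CLOUDS):
    """Return (state, membership, elicit_response, action) for danger value x."""
    state = max(clouds, key=lambda name: expectation_membership(x, *clouds[name]))
    respond, action = RULE_ACTIONS[state]
    return state, expectation_membership(x, *clouds[state]), respond, action
```

Choosing the maximum-membership cloud is one simple activation policy; a fuller rule generator would also pass the membership through the rear (THEN) cloud.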

When the system triggers a secondary response, or when danger signals trigger an initial response, antibodies mutate according to the immune response mechanism. This produces new antibodies with higher affinity for the original antigens, so that danger is identified more quickly, and also antibodies with lower affinity, which are added to the immature antibody collection to preserve the diversity of the immune system.
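The split described above, with higher-affinity mutants kept for faster recognition and lower-affinity mutants sent back to the immature pool, can be sketched as follows. The bit-flip mutation, the use of the longest agreeing run as affinity, and the names `clonal_mutation`, `mutate`, and `affinity` are assumptions for illustration.

```python
import random
from typing import List, Tuple


def affinity(a: str, b: str) -> int:
    """Longest run of agreeing positions (r-contiguous style affinity)."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def mutate(bits: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability rate."""
    return "".join(("1" if c == "0" else "0") if rng.random() < rate else c for c in bits)


def clonal_mutation(antibody: str, antigen: str, n_clones: int, rate: float,
                    rng: random.Random) -> Tuple[List[str], List[str]]:
    """Return (higher-affinity clones, lower-affinity clones) relative to the parent."""
    base = affinity(antibody, antigen)
    clones = [mutate(antibody, rate, rng) for _ in range(n_clones)]
    higher = [c for c in clones if affinity(c, antigen) > base]  # faster recognition
    lower = [c for c in clones if affinity(c, antigen) < base]   # diversity pool
    return higher, lower
```

Clones with equal affinity are simply discarded in this sketch, since they neither speed up recognition nor add diversity.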

2.4. Implementation Mechanism of Information Monitoring

The antigen presenting module and the signal acquisition module are deployed in domU. Because Linux is open source, these two modules can be added to domU's kernel. The information monitoring module is deployed in the VMM. To ensure the safety of the antigen presenting module and the signal acquisition module, the model accesses their memory spaces and computes hashes of the memory data. The implementation must solve two problems: first, how to find the memory space of the antigen presenting module and the signal acquisition module; second, how to use hashing to verify that the two modules have not been attacked.

The VMM manages and allocates hardware resources and provides virtual hardware resources to the operating system kernels above it; domU accesses physical memory through the VMM. In Linux, the System.map file is the kernel symbol table, which lists all kernel symbol names and their corresponding virtual addresses. A kernel symbol may be a variable name or a function name. Since the antigen presenting module and the signal acquisition module reside in domU's kernel space, all the variables and functions they contain can be found in System.map; that is, we can obtain the virtual memory addresses of these variables and functions in domU. Xen has three kinds of memory: virtual memory, pseudophysical memory, and machine memory. Each process has a separate virtual address space. Pseudophysical memory lies between virtual memory and machine memory; the operating system of each domU treats pseudophysical memory as its “physical memory,” whereas machine memory is the real physical memory. The VMM maintains a global M2P (machine-to-physical) table, and each domU maintains its own P2M (physical-to-machine) table. Therefore, we can find the pseudophysical address corresponding to a virtual address through domU's page table, and then the machine address corresponding to that pseudophysical address through domU's P2M table.
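The two-step lookup can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions: 4 KiB pages, a flat dictionary standing in for domU's multilevel page table, and another dictionary standing in for the P2M table. The function names are hypothetical and do not correspond to any Xen API.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumes 4 KiB pages


def guest_virtual_to_machine(vaddr: int, guest_page_table: Dict[int, int],
                             p2m: Dict[int, int]) -> int:
    """Translate a domU virtual address to a machine address (toy model)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pfn = guest_page_table[vpn]  # virtual page -> pseudophysical frame (guest page table)
    mfn = p2m[pfn]               # pseudophysical frame -> machine frame (domU's P2M)
    return mfn * PAGE_SIZE + offset


def build_m2p(p2m: Dict[int, int]) -> Dict[int, int]:
    """Inverse mapping, analogous to the hypervisor's M2P entries for this domain."""
    return {mfn: pfn for pfn, mfn in p2m.items()}
```

The page offset survives both steps unchanged; only the frame numbers are remapped, which is why the monitor can walk module memory page by page.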

With the above method, we can locate the memory space of the antigen presenting module and the signal acquisition module. The information monitoring module reads the contents of all initialized data, read-only data, and function code belonging to the two modules, in the order given by the System.map file, as hash input. A hash function maps a binary value of arbitrary length to a shorter fixed-length binary value, and it is computationally infeasible to find two different inputs that map to the same value. Therefore, we use hashing to ensure the integrity of the memory spaces of the antigen presenting module and the signal acquisition module. In the hypervisor, we define two variables $H_A$ and $H_S$, which store the cumulative hash values of the antigen presenting module and the signal acquisition module, respectively. They are calculated as follows:

$$H_A^{(i)} = h\left(H_A^{(i-1)} \,\&\, M_A^{(i)}\right) \quad (5)$$

$$H_S^{(i)} = h\left(H_S^{(i-1)} \,\&\, M_S^{(i)}\right) \quad (6)$$

In (5), h is the hash function, & is the binary string concatenation operator, $M_A^{(i)}$ is the content of the i-th memory segment of the antigen presenting module, and $H_A^{(i)}$ is the cumulative value after i rounds of hash computation for the antigen presenting module; (6) is defined analogously for the signal acquisition module. The final cumulative hash values of the two modules, stored by the hypervisor in a safe state, are recorded as the standard values $H_A^{*}$ and $H_S^{*}$. The information monitoring module executes periodically; by comparing the values $H_A$ and $H_S$ obtained at run time with the standard values, we can determine whether the antigen presenting module and the signal acquisition module are secure.
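The cumulative hashing in (5) and (6) is a simple hash chain. The Python sketch below assumes SHA-256 for h, an empty initial value, and segments supplied as byte strings in System.map order; none of these choices is specified in the text, and `cumulative_hash` and `module_intact` are hypothetical names.

```python
import hashlib
import hmac
from typing import Iterable


def cumulative_hash(segments: Iterable[bytes], initial: bytes = b"") -> bytes:
    """Chain each memory segment into the running digest: H_i = h(H_{i-1} & M_i)."""
    digest = initial
    for segment in segments:
        digest = hashlib.sha256(digest + segment).digest()
    return digest


def module_intact(segments: Iterable[bytes], standard: bytes) -> bool:
    """Compare the run-time cumulative hash with the standard value from a safe state."""
    return hmac.compare_digest(cumulative_hash(segments), standard)
```

Because each step feeds the previous digest into the next, any modified byte or reordered segment changes the final value.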

2.5. The Immune Evolution Model
2.5.1. Self-Evolution Model

Let $Self(t)$ and $Self(t-1)$ denote the self set at moments t and t-1, respectively, let $Self_0$ be the initial self set, and let δ be the evolution cycle of selves. Within a cycle, the self set remains unchanged. At the end of each cycle, new elements are added (for example, newly loaded programs), programs that have been uninstalled are deleted, and part of the selves are eliminated so that the self set does not grow without limit.

A computer software system is a huge collection. The complete self set of a software system is too large for current computing capacity, and it is very difficult to find an absolutely reliable self set in a dynamic software system. Evolving the self set lets the model maintain only a smaller self set, which keeps time efficiency high with existing computing capacity. In addition, because the self set evolves continuously, nonself elements mixed into it are eventually removed, which reduces the false negatives they would otherwise cause.
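One evolution step of the self set can be sketched in Python as follows. The cap `max_size` and the random choice of which selves to eliminate are assumptions, since the text says only that "part of selves will be eliminated"; `evolve_self` is a hypothetical name.

```python
import random
from typing import Iterable, Set


def evolve_self(self_set: Set[str], t: int, delta: int, new_selves: Iterable[str],
                removed: Iterable[str], max_size: int, rng: random.Random) -> Set[str]:
    """Return Self(t) given Self(t-1); changes happen only at cycle boundaries."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    if t % delta != 0:
        return set(self_set)  # within a cycle the self set is unchanged
    # End of cycle: drop uninstalled programs, add newly loaded ones.
    updated = (set(self_set) - set(removed)) | set(new_selves)
    if len(updated) > max_size:
        # Eliminate part of the selves; random eviction is an assumption.
        updated = set(rng.sample(sorted(updated), max_size))
    return updated
```

Sorting before sampling keeps the eviction reproducible for a seeded generator.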

2.5.2. Antibody Gene Lib Evolution Model

Let $G(t)$ and $G(t-1)$ denote the antibody gene library at moments t and t-1, respectively. $G_0$ is the initial antibody gene collection, which consists of gene fragments of typical kinds of malware. At each moment t, mutated genes that should be removed and the genes of memory detectors with false positives are deleted from the library. When a mature detector is cloned, its gene joins the antibody gene library as a dominant gene, so the genes of activated mature detectors are added.

The antibody gene library is mainly used to improve the efficiency of generating immature detectors. Because the antibodies of new immature detectors are produced by gene encoding, they can detect variants of known malware, which reduces the tolerance time. However, gene encoding produces the “Baldwin effect”: evolution and learning cause new individuals to acquire some of the same characteristics, which reduces the diversity of the system. To counter this, a certain proportion of randomly generated immature detectors is added to preserve diversity.

2.5.3. Immature Detectors Evolution Model

Let $I(t)$ and $I(t-1)$ denote the set of immature detectors at moments t and t-1, respectively. An aging operator adds 1 to the age of every detector in $I(t-1)$. Immature detectors that fail self-tolerance are removed, and those that pass self-tolerance become mature detectors. The immature detectors newly created at time t come from two sources: completely randomly generated detectors (to ensure diversity) and detectors generated by encoding genes from the antibody gene library (to ensure availability).
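One step of this immature-detector evolution could look like the Python sketch below. The detectors are stored as a dictionary from antibody to age; copying a gene verbatim stands in for gene encoding; and `random_ratio`, `tolerance_period`, and the function name are hypothetical parameters, not values from the paper.

```python
import random
from typing import Callable, Dict, Iterable, List, Set, Tuple

Match = Callable[[str, str], bool]


def evolve_immature(immature: Dict[str, int], self_set: Iterable[str], gene_lib: List[str],
                    n_new: int, random_ratio: float, length: int, tolerance_period: int,
                    matches: Match, rng: random.Random) -> Tuple[Dict[str, int], Set[str]]:
    """Return (I(t) as antibody -> age, detectors promoted to mature at time t)."""
    selves = list(self_set)
    aged = {ab: age + 1 for ab, age in immature.items()}            # aging operator
    tolerant = {ab: age for ab, age in aged.items()
                if not any(matches(ab, s) for s in selves)}          # drop self-reactive
    new_mature = {ab for ab, age in tolerant.items() if age >= tolerance_period}
    remaining = {ab: age for ab, age in tolerant.items() if age < tolerance_period}
    # New detectors: a random share for diversity, the rest from the gene library.
    n_random = n_new if not gene_lib else round(n_new * random_ratio)
    for i in range(n_new):
        if i < n_random:
            ab = "".join(rng.choice("01") for _ in range(length))
        else:
            ab = rng.choice(gene_lib)  # verbatim copy stands in for gene encoding
        remaining.setdefault(ab, 0)
    return remaining, new_mature
```

Raising `random_ratio` trades the speed of gene-based detectors for the diversity that counters the Baldwin effect.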

2.5.4. Mature Detectors Evolution Model

Let $T(t)$ and $T(t-1)$ denote the set of mature detectors at moments t and t-1, respectively. Mature detectors that have not been activated by the end of their life cycle are removed, and mature detectors activated by danger signals leave the set. New mature detectors, together with the mature detectors produced by clonal mutation of the activated ones, are added. The clonal variation function performs the clone and mutation operations on each element of its input set X.
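A minimal Python sketch of this update, assuming mature detectors are stored as a dictionary from antibody to age and that activated detectors are handed on to the memory set: `evolve_mature`, `lifetime`, and the `clone` callback (standing in for the clonal variation function) are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def evolve_mature(mature: Dict[str, int], activated: Set[str], lifetime: int,
                  new_mature: Iterable[str], clone: Callable[[Set[str]], Set[str]]
                  ) -> Tuple[Dict[str, int], Set[str], Set[str]]:
    """Return (T(t) as antibody -> age, detectors promoted to memory, expired detectors)."""
    promoted = set(activated) & set(mature)   # activated by danger signals
    aged = {ab: age + 1 for ab, age in mature.items() if ab not in promoted}
    expired = {ab for ab, age in aged.items() if age >= lifetime}
    kept = {ab: age for ab, age in aged.items() if age < lifetime}
    # Add newly matured detectors and clonal mutants of the activated ones.
    for ab in list(new_mature) + sorted(clone(promoted)):
        kept.setdefault(ab, 0)
    return kept, promoted, expired
```

Returning `promoted` separately lets the caller feed those detectors into the memory-detector update.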

2.5.5. Memory Detectors Evolution Model

Let $M(t)$ and $M(t-1)$ denote the set of memory detectors at moments t and t-1, respectively. $M_0$ is the set of initial memory detectors, which can be obtained from common malware. Memory detectors that produce false positives at moment t are removed, and newly created memory detectors are added. The age of each memory detector activated at time t is reset.
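The memory-detector update can be sketched as follows. Storing detectors as a dictionary from antibody to age and resetting ages to 0 are assumptions (the reset value is not given in the text), and `evolve_memory` is a hypothetical name.

```python
from typing import Dict, Iterable


def evolve_memory(memory: Dict[str, int], false_positive: Iterable[str],
                  new_memory: Iterable[str], activated: Iterable[str],
                  reset_age: int = 0) -> Dict[str, int]:
    """Return M(t) as antibody -> age, given M(t-1)."""
    fp = set(false_positive)
    # Age surviving detectors; drop those that caused false positives.
    kept = {ab: age + 1 for ab, age in memory.items() if ab not in fp}
    for ab in activated:
        if ab in kept:
            kept[ab] = reset_age  # reactivation renews the detector
    for ab in new_memory:
        kept[ab] = reset_age      # newly promoted memory detectors
    return kept
```

Resetting the age on activation keeps frequently useful detectors alive, which matches the long life cycle the text gives memory detectors.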

2.5.6. Antigen Detection

Let $Ag(t)$ and $Ag(t-1)$ denote the set of antigens at moments t and t-1, respectively. $Ag_0$ is the set of initial antigens, and a separate set holds the antigens to be checked at moment t.

3. Performance Analysis of the Model

Set the number of programs in a computer as , and usually the proportion of nonselves is . The size of the self set is , the size of the mature detector set is , and the size of the memory detector set is . The matching probability between any given detector and any given antigen is (which is related to the specific matching rule). is the probability of occurrence of event .

Theorem 5. For any detector which passes the self-tolerance, the probability of this detector matching those selves which are not described is .

Proof. Let A be the event “the given detector does not match any self in the self set,” and let B be the event “the given detector matches at least one self in the undescribed self set.” Clearly, a detector in A is self-tolerant, whereas a detector in B may not be. . In event A, the number of times the detector matches selves follows a binomial distribution, that is, , where , . Then, . Similarly, in event B, the number of times the detector matches selves follows a binomial distribution, that is, , where . Then, . .

Theorem 6. For any given nonself antigen ag, the probability of this antigen identified correctly is .

Proof. Let A be the event “ag matches some memory detector or some mature detector triggered by danger signals.” . In event A, the number of times the antigen matches detectors follows a binomial distribution, , where , . Memory detectors and mature detectors that recognize selves cannot identify nonselves and are therefore not counted. Then, . According to Poisson's theorem, when is small and is large, .
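The Poisson step used in these proofs replaces a binomial "at least one match" probability with an exponential. The Python sketch below compares the two for generic n trials with per-trial match probability p; it illustrates the approximation only and does not reproduce the paper's exact expressions, whose symbols are not recoverable here. The function names are hypothetical.

```python
import math


def prob_at_least_one_match(n: int, p: float) -> float:
    """Exact value 1 - P(X = 0) for X ~ Binomial(n, p)."""
    return 1.0 - (1.0 - p) ** n


def poisson_approximation(n: int, p: float) -> float:
    """Poisson approximation with lambda = n * p, valid for small p and large n."""
    return 1.0 - math.exp(-n * p)
```

For n = 200 and p = 0.01 the exact value is about 0.866 and the approximation about 0.865, so the error is already small at moderate sizes.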

Theorem 7. For any given nonself antigen ag, the probability of false negative with this antigen is ; for any given self antigen ag, the probability of false positive with this antigen is .

Proof. By Theorem 6, . Let A be the event “the given self matches a memory detector or a mature detector.” Then, . In event A, the number of times selves match detectors follows a binomial distribution, , where . So, . According to Poisson's theorem, when is small and is large, .

Theorem 8. Selves of the model are completely described at the macrolevel. The space complexity of the dynamic tolerance model for producing a fixed number of mature detectors is constant, and the time complexity is linear in the number of detectors (excluding immature detectors).

Proof. According to (8), the self set evolves over time slices of fixed length. Over time, will cover the entire self space; that is, the description of selves at the macrolevel is complete. Moreover, the size of the self set is limited to . Without loss of generality, consider the extreme case in which the number of selves is . D’haeseleer et al. [16] pointed out that, for an arbitrary matching rule, the space complexity of producing a fixed number of mature detectors is , and the time complexity is . For a specific matching algorithm, is constant. By Theorem 7, . By Theorem 5, . Therefore, the time complexity of producing a fixed number of mature detectors is , that is, linear in the number of memory detectors and mature detectors.
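For a concrete sense of the cost behind this argument, the sketch below evaluates the classic estimates for exhaustive negative selection found in the negative-selection literature, where random candidates are checked against every self. It computes the number of candidates needed to reach a target failure probability, the expected number of surviving mature detectors, and the number of match operations. This generic generate-and-test analysis is assumed here for illustration only; it is not the paper's dynamic tolerance model, and the parameter names are hypothetical.

```python
import math


def candidates_needed(p_match: float, n_self: int, p_fail: float) -> float:
    """N_R0 = -ln(P_f) / (P_M * (1 - P_M)^N_S): random candidates to generate."""
    return -math.log(p_fail) / (p_match * (1.0 - p_match) ** n_self)


def surviving_detectors(n_candidates: float, p_match: float, n_self: int) -> float:
    """Expected candidates that match no self: N_R = N_R0 * (1 - P_M)^N_S."""
    return n_candidates * (1.0 - p_match) ** n_self


def match_operations(p_match: float, n_self: int, p_fail: float) -> float:
    """Each candidate is compared with every self, so the cost is about N_R0 * N_S."""
    return candidates_needed(p_match, n_self, p_fail) * n_self


if __name__ == "__main__":
    # Hypothetical settings: P_M = 0.005, 200 selves, 1% target failure probability.
    n_r0 = candidates_needed(0.005, 200, 0.01)
    n_r = surviving_detectors(n_r0, 0.005, 200)
    print(f"candidates={n_r0:.0f} mature={n_r:.0f} "
          f"matches={match_operations(0.005, 200, 0.01):.0f}")
```

The factor (1 - P_M)^(-N_S) grows exponentially with the self-set size, which is why bounding the self set, as the proof does, keeps generation cost manageable in this kind of analysis.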

For a specific matching rule, is constant [17]. For the -contiguous bits matching rule, . Figures 4 and 5 show MATLAB simulations of Theorem 5. As the figures show, when is large enough, and have little effect on . When = 200, = 500, and , < 1%, which reaches the ideal value.
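The matching rule named here is usually stated as r-contiguous bits matching: two equal-length strings match if they agree in at least r consecutive positions. The sketch below implements that rule together with the standard approximation P_M ≈ m^(-r)((l - r)(m - 1)/m + 1) for random strings of length l over an m-letter alphabet, which is the kind of rule-dependent constant the text refers to, plus a Monte Carlo sanity check. The names `r`, `l`, and `m` follow common usage in the negative-selection literature and are assumptions, since the paper's own symbols are not shown in this section.

```python
import random


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if equal-length strings a and b agree in at least r consecutive positions."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0  # length of the current agreeing run
        if run >= r:
            return True
    return False


def match_prob_approx(l: int, r: int, m: int = 2) -> float:
    """P_M ≈ m^-r * ((l - r)(m - 1)/m + 1) for random strings of length l over m symbols."""
    return m ** (-r) * ((l - r) * (m - 1) / m + 1)


def estimate_match_prob(l: int, r: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P_M for random binary strings (sanity check)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = "".join(rng.choice("01") for _ in range(l))
        b = "".join(rng.choice("01") for _ in range(l))
        hits += r_contiguous_match(a, b, r)
    return hits / trials


if __name__ == "__main__":
    print(r_contiguous_match("10110", "10010", 2))  # True: first two positions agree
    print(match_prob_approx(32, 8), estimate_match_prob(32, 8, 20000))
```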

Figure 6 shows a MATLAB simulation of Theorem 6. As the figure shows, increases as and grow.

Figures 7 and 8 show MATLAB simulations of Theorem 7. As the figures show, as and increase, decreases and increases.

Combining the simulations of Theorems 5, 6, and 7, when = 200, = 500, , = 100, and = 100, all four quantities reach their ideal values: < 1%, > 95%, < 1%, and < 5%.

4. Experimental Results and Analysis

In this section, we verify the validity of IB-IDS through experiments, including a security analysis, the effect of adding IB-IDS to the Xen virtual machine system on program performance, and the intrusion detection efficiency of IB-IDS. The experimental environment is as follows. All tests were performed on a ThinkPad T540p notebook with an Intel Core i5-4300M 2.60 GHz CPU (two cores, four hardware threads) and 8 GB of physical memory. The Xen version is 4.4.1, managing two domains: the privileged VM dom0 and the guest VM dom1. Both virtual machines run Ubuntu 14.04 with Linux kernel 3.13.0.19. Dom0 is allocated four VCPUs and 4 GB of physical memory, and dom1 is allocated four VCPUs and 1 GB of physical memory; the CPU scheduling weight of both domains is set to 256.

In IB-IDS, the parameters are set as follows: danger signal parameters , , , and danger zone radius . Each experiment was run 10 times, and the results were averaged.

4.1. Security Analysis

In the model architecture, the modules are distributed across different virtual machines. Data is collected in domU and then passed to dom0 through the interdomain communication mechanism. Xen's grant table (authorization list) ensures that a domain's memory can be accessed only by the domains it has authorized. In the model, domU owns a shared ring buffer and grants access only to dom0; no other domain can access it. Therefore, data cannot leak to unauthorized domains, and the data transfer process is secure.
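The access rule described above can be modeled in a few lines. The sketch below is a toy simulation assuming a simple owner-plus-grantees policy; the class and method names are hypothetical, and it does not use Xen's actual grant-table interface.

```python
class GrantedBuffer:
    """Toy shared buffer: only the owner and explicitly granted domains may access it."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self._grants: set = set()
        self._items: list = []

    def grant(self, domain: str) -> None:
        """Owner authorizes another domain (as domU does for dom0)."""
        self._grants.add(domain)

    def _check(self, domain: str) -> None:
        if domain != self.owner and domain not in self._grants:
            raise PermissionError(f"{domain} is not authorized for this buffer")

    def write(self, domain: str, item: bytes) -> None:
        self._check(domain)
        self._items.append(item)

    def read(self, domain: str) -> list:
        self._check(domain)
        return list(self._items)


if __name__ == "__main__":
    ring = GrantedBuffer(owner="domU")
    ring.grant("dom0")
    ring.write("domU", b"antigen-record")
    print(ring.read("dom0"))  # authorized reader
    try:
        ring.read("dom2")  # any other domain is refused
    except PermissionError as err:
        print(err)
```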

In paravirtualized Xen, domU accesses the hardware indirectly through dom0. To ensure the safety of the immune calculation, the model passes data to dom0 for computation. In this model, we assume that the privileged virtual machine is a trusted node.

Some traditional intrusion detection tools must be deployed inside the client virtual machine. Because the client virtual machine is not a trusted node and is exposed to various attacks, such detection tools are also vulnerable. In this model, we assume that the virtual machine monitor is also a trusted node, and it monitors the memory space of the two modules deployed in domU.

Therefore, the monitoring process and results of the model are reliable.

4.2. Performance Evaluations of the Model

Introducing IB-IDS into a virtual machine system inevitably incurs some performance cost. In cloud computing, many applications run concurrently, so this section first uses an appropriate benchmark to assess the impact of IB-IDS on parallel programs. In our tests, we used the classic SPLASH-2 suite [18, 19]. Its programs are written in C, comprise 12 benchmarks, and use the Pthreads parallel model. We randomly selected five programs for testing; Table 1 gives a brief introduction.

Figure 9 compares the five benchmarks with and without IB-IDS loaded. As the figure shows, computation time in dom1 is longer than on the original system: the average increase is 7.33%, with a maximum of 10.86% on the LU program. This indicates that the additional cost of integrating IB-IDS into the virtual machine system is small and acceptable, so applying IB-IDS to cloud computing platforms should not significantly affect parallel applications.
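The section reports an average increase of 7.33% without stating whether it is the mean of per-benchmark increases or the increase in total run time, and the two can differ. The sketch below computes both from paired timings. The benchmark names and timings in the example are hypothetical placeholders, not the measurements behind Figure 9.

```python
from statistics import mean


def overhead_pct(base: float, with_ids: float) -> float:
    """Relative increase in run time, in percent."""
    return (with_ids - base) / base * 100.0


def summarize(timings: dict) -> dict:
    """timings maps benchmark -> (seconds without IB-IDS, seconds with IB-IDS)."""
    per_bench = {name: overhead_pct(b, w) for name, (b, w) in timings.items()}
    total_base = sum(b for b, _ in timings.values())
    total_with = sum(w for _, w in timings.values())
    return {
        "per_benchmark": per_bench,
        "mean_of_ratios": mean(per_bench.values()),
        "ratio_of_totals": overhead_pct(total_base, total_with),
        "worst": max(per_bench, key=per_bench.get),
    }


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    print(summarize({"bench_a": (10.0, 11.0), "bench_b": (100.0, 105.0)}))
```

When run times differ widely across benchmarks, the ratio of totals is dominated by the longest benchmark, whereas the mean of ratios weights each benchmark equally.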

In IB-IDS, the main performance overhead in domU comes from the antigen presenting module and the signal acquisition module, as well as from passing data to dom0 through the inter-VM communication mechanism. These operations are performed periodically, so their cost is bounded. For example, the antigen presenting module proactively samples system call sequences rather than being triggered by every system call, and the signal acquisition module works the same way. When domU puts antigens and environmental status into the ring buffer, it notifies dom0 through the event channel only if the ring buffer is empty; each notification causes a context switch between domU and dom0. While the ring buffer contains data, dom0 keeps reading it, so no notification from domU is required. The context-switch overhead is therefore limited. In addition, the immune response module, signal measurement module, and information monitoring module add overhead to dom0, but their impact on domU is negligible.
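The notification rule described above (domU signals dom0 only when it writes into an empty ring, while dom0 keeps draining a non-empty ring) can be sketched as a bounded queue that counts notifications. This is a simplified single-threaded model with hypothetical names; real Xen split drivers implement the idea with shared producer/consumer indices and macros such as `RING_PUSH_REQUESTS_AND_CHECK_NOTIFY` in `io/ring.h`, which this code does not reproduce.

```python
from collections import deque


class NotifyOnEmptyRing:
    """Bounded FIFO whose producer raises a notification only on empty -> non-empty."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: deque = deque()
        self.notifications = 0  # each one stands for an event-channel kick + context switch

    def push(self, item) -> bool:
        """Producer side (domU). Returns False if the ring is full."""
        if len(self.items) >= self.capacity:
            return False
        was_empty = not self.items
        self.items.append(item)
        if was_empty:
            self.notifications += 1  # consumer may be idle, so wake it
        return True

    def drain(self) -> list:
        """Consumer side (dom0): read everything currently queued."""
        batch = list(self.items)
        self.items.clear()
        return batch


if __name__ == "__main__":
    ring = NotifyOnEmptyRing(capacity=64)
    for i in range(10):
        ring.push(f"antigen-{i}")
    print(len(ring.drain()), ring.notifications)  # 10 records, 1 notification
```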

Next, we test the impact of IB-IDS on computation-intensive applications using the SPEC (Standard Performance Evaluation Corporation) CPU2000 benchmark suite [20]. The suite has two parts: CINT2000 for integer computation-intensive applications and CFP2000 for floating-point applications. We chose CINT2000, which contains 12 applications, and randomly selected five of them for testing; Table 2 gives a brief introduction.

Figure 10 compares the five benchmarks with and without IB-IDS loaded. As the figure shows, computation time in dom1 is longer than on the original system: the average increase is 9.12%, with a maximum of 11.48% on the 254.gap program. The impact of IB-IDS on the virtual machine is larger than for parallel programs but still acceptable, so IB-IDS can be deployed in computation-intensive cloud computing scenarios.

Finally, we test the impact of IB-IDS on a web server. In our tests, domU runs a web server composed of the Apache HTTP Server and PHP. We use httperf [21] to generate a continuous stream of requests that can overload the server, and autobench [22] to run httperf repeatedly with an increasing number of requests per second and to extract the httperf results. Figure 11 compares server response times with and without IB-IDS loaded. As the request rate increases, the response time of the server with IB-IDS rises. At a rate of 100 requests per second, the additional response time is less than 0.5 s, which is acceptable. Therefore, IB-IDS can also be applied to cloud computing platforms that host web servers.
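autobench automates a rate sweep over httperf. The sketch below only builds the httperf command lines for such a sweep, which makes the procedure explicit; it assumes httperf's long options `--server`, `--port`, `--uri`, `--rate`, `--num-conns`, `--num-calls`, and `--timeout`, and the host name, URI, and numeric settings are hypothetical, not the paper's configuration.

```python
import shlex
import subprocess


def httperf_sweep(server: str, uri: str, low: int, high: int, step: int,
                  num_conns: int, port: int = 80, timeout: int = 5) -> list:
    """One httperf invocation per request rate, from low to high inclusive."""
    commands = []
    for rate in range(low, high + 1, step):
        commands.append([
            "httperf",
            f"--server={server}", f"--port={port}", f"--uri={uri}",
            f"--rate={rate}",            # new connections per second
            f"--num-conns={num_conns}",  # total connections at this rate
            "--num-calls=1",             # one request per connection
            f"--timeout={timeout}",      # seconds before a call counts as failed
        ])
    return commands


def run_sweep(commands: list) -> list:
    """Execute each command and keep its raw report (requires httperf installed)."""
    return [subprocess.run(cmd, capture_output=True, text=True).stdout
            for cmd in commands]


if __name__ == "__main__":
    for cmd in httperf_sweep("guest-vm.example", "/index.php", 10, 100, 10, 1000):
        print(shlex.join(cmd))
```

With one call per connection, the `--rate` value equals the number of requests per second, which is the quantity varied along the horizontal axis of a response-time plot such as Figure 11.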

4.3. Comparisons of Detection Rates and False Alarm Rates

This section tests the ability of IB-IDS to detect attacks. The experiments use the detection rate (DR) and false alarm rate (FAR) to measure effectiveness and compare IB-IDS with the ARTIS model proposed by Glickman et al. [17]. As a general computer immune system, ARTIS has the characteristics of diversity, distribution, dynamic learning, adaptability, and self-monitoring. It consists of a series of lymph nodes, each of which independently performs the immune function. Each node contains multiple detectors (a detector combines properties of B cells, T cells, and antibodies). ARTIS draws on a variety of biological immune mechanisms; costimulation and the dynamic evolution of detectors (immature, mature, and memory) allow it to learn continuously. The model has been successfully applied to intrusion detection, virus identification, pattern recognition, and so forth [17, 23]. Figure 12 shows the life cycle of detectors.
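The detector life cycle summarized above (and shown in Figure 12) can be expressed as a small state machine. The sketch below is a simplified, hypothetical version in the spirit of ARTIS: an immature detector that matches anything during its tolerization period dies; a mature detector that collects enough matches within its lifetime becomes a memory detector only if it also receives costimulation, and otherwise dies. The thresholds and names are assumptions, and details such as match decay and memory-detector aging are omitted.

```python
from enum import Enum, auto


class State(Enum):
    IMMATURE = auto()
    MATURE = auto()
    MEMORY = auto()
    DEAD = auto()


class Detector:
    """Simplified detector life cycle (hypothetical thresholds)."""

    def __init__(self, tolerization_period: int, lifetime: int,
                 activation_threshold: int) -> None:
        self.tolerization_period = tolerization_period
        self.lifetime = lifetime
        self.activation_threshold = activation_threshold
        self.state = State.IMMATURE
        self.age = 0
        self.matches = 0

    def step(self, matched: bool, costimulated: bool = False) -> State:
        """Advance one time step given whether the detector matched an antigen."""
        if self.state in (State.MEMORY, State.DEAD):
            return self.state  # absorbing states in this simplified model
        self.age += 1
        if self.state is State.IMMATURE:
            if matched:
                self.state = State.DEAD  # matched during tolerization: assumed self-reactive
            elif self.age >= self.tolerization_period:
                self.state, self.age = State.MATURE, 0  # survived tolerization
        else:  # mature
            self.matches += int(matched)
            if self.matches >= self.activation_threshold:
                # activated: kept only if a second signal confirms the threat
                self.state = State.MEMORY if costimulated else State.DEAD
            elif self.age >= self.lifetime:
                self.state = State.DEAD  # never activated within its lifetime
        return self.state


if __name__ == "__main__":
    d = Detector(tolerization_period=2, lifetime=5, activation_threshold=2)
    for matched, costim in [(False, False), (False, False), (True, False), (True, True)]:
        print(d.step(matched, costim))
```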

Figures 13 and 14 compare the DR and FAR of IB-IDS and ARTIS in the simulation environment. In Figure 13, the experimental data contain 60 nonselves in every 100 antigens, 30 of which have just been confirmed as nonself: antigens of this type were previously considered self (normal programs) and are now considered nonself (abnormal programs), for example, when certain processes are unloaded and their related services are no longer provided. In Figure 14, the data contain 40 selves in every 100 antigens, 20 of which have just been defined as self, for example, when new processes are loaded to provide new services. The experimental results show that IB-IDS achieves a higher DR and a lower FAR than ARTIS.
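This section does not spell out how DR and FAR are computed. The sketch below assumes the usual definitions, DR = TP / (TP + FN) over nonself antigens and FAR = FP / (FP + TN) over self antigens, and derives them from per-antigen labels. The label encoding (True for nonself) and the example data are hypothetical.

```python
def dr_far(is_nonself: list, flagged: list) -> tuple:
    """Return (detection rate, false alarm rate) from ground truth and detector output."""
    if len(is_nonself) != len(flagged):
        raise ValueError("label and prediction lists must have equal length")
    tp = sum(1 for y, p in zip(is_nonself, flagged) if y and p)
    fn = sum(1 for y, p in zip(is_nonself, flagged) if y and not p)
    fp = sum(1 for y, p in zip(is_nonself, flagged) if not y and p)
    tn = sum(1 for y, p in zip(is_nonself, flagged) if not y and not p)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one nonself and one self antigen")
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    # Hypothetical batch shaped like the Figure 13 setup: 60 nonselves, 40 selves.
    truth = [True] * 60 + [False] * 40
    # Hypothetical detector output: misses 3 nonselves, raises 1 false alarm.
    output = [True] * 57 + [False] * 3 + [True] * 1 + [False] * 39
    print(dr_far(truth, output))  # (0.95, 0.025)
```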

Next, we use wu-ftpd 2.6.0, sendmail 8.12.0, and several typical Linux rootkits, which are widely used as test cases for anomaly detection. Attacks against wu-ftpd include a scripted attack on the file-name matching (globbing) vulnerability, an attack that bypasses access restrictions, a scripted attack on the site exec vulnerability, and so on. Attacks against sendmail include the sccp attack, the decode attack, a remote buffer overflow attack, and so on. Representative rootkits include the simple hook rootkit, the inline hook rootkit, and the inline hook complex rootkit. A simple hook rootkit modifies a system call's entry address in the system call table so that it points to a malicious function; when the system call is invoked, the malicious function runs instead of the original one. An inline hook rootkit leaves the system call table entry unchanged but overwrites the first few bytes of the system call function with a jump instruction, which makes it subtler than the simple hook rootkit. An inline hook complex rootkit does not overwrite the first bytes of the system call function; instead, it overwrites other bytes, for example, bytes in the middle of the function. Table 3 lists the DRs and FARs of IB-IDS and ARTIS, with variances in parentheses. As the table shows, IB-IDS achieves high detection rates and low false alarm rates under these attacks and is feasible for judging the state of applications in client virtual machines.
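The three rootkit types differ only in where they tamper, which determines what a naive integrity check can see. The toy simulation below uses hypothetical addresses and byte contents, and it is not how IB-IDS detects attacks (IB-IDS relies on system call behavior). It applies each hook to a mock system call table and compares three checks: system call table entries, the first few bytes of each function, and a hash of each whole function. It illustrates why the inline hook is subtler than the simple hook and why the complex variant evades prologue-only checks.

```python
import copy
import hashlib

JMP_REL32 = b"\xe9"  # x86 opcode byte for a near relative jump (jmp rel32)


def toy_kernel():
    """Hypothetical system call table (name -> address) and code memory (address -> bytes)."""
    table = {"sys_open": 0x1000, "sys_read": 0x2000}
    code = {0x1000: bytearray(range(32)), 0x2000: bytearray(range(32, 64))}
    return table, code


def simple_hook(table, code, name, evil_addr):
    """Install a malicious function elsewhere and redirect the table entry to it."""
    code[evil_addr] = bytearray(b"\xcc" * 32)
    table[name] = evil_addr


def inline_hook(table, code, name, offset=0):
    """Overwrite 5 bytes of the function body with a jmp rel32 (target left as zeros)."""
    body = code[table[name]]
    body[offset:offset + 5] = JMP_REL32 + bytes(4)


def digest(data) -> str:
    return hashlib.sha256(bytes(data)).hexdigest()


def run_checks(attack, prologue_len=5):
    """Apply an attack to a fresh toy kernel and report which checks notice it."""
    table, code = toy_kernel()
    base_table, base_code = dict(table), copy.deepcopy(code)
    attack(table, code)
    table_hits = {n for n, a in base_table.items() if table[n] != a}
    prologue_hits = {n for n, a in base_table.items()
                     if code[a][:prologue_len] != base_code[a][:prologue_len]}
    body_hits = {n for n, a in base_table.items() if digest(code[a]) != digest(base_code[a])}
    return table_hits, prologue_hits, body_hits


if __name__ == "__main__":
    attacks = {
        "simple hook": lambda t, c: simple_hook(t, c, "sys_open", 0x9000),
        "inline hook": lambda t, c: inline_hook(t, c, "sys_read"),
        "inline hook complex": lambda t, c: inline_hook(t, c, "sys_read", offset=16),
    }
    for label, attack in attacks.items():
        print(label, run_checks(attack))
```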

5. Conclusions

Cloud computing platforms are usually built on virtual machines, so the security of virtual machine systems is the core of cloud computing security. Current studies on the security of user programs and on vulnerabilities of virtual machine monitors cannot accurately determine the real state of client applications in virtual machines. Moreover, existing defense methods target only specific attacks and vulnerabilities and cannot effectively handle other threats. This paper presents an immune-based intrusion detection model in virtual machines of the cloud computing environment to ensure the security of user-level applications in client virtual machines. The model extracts the system call sequences of programs and their parameters, abstracts them into antigens, and fuses environmental information of the client VMs into danger signals. Immune responses are then performed in the privileged VM, while the information monitoring mechanism runs in the VMM throughout detection. Experimental results show that the model imposes a small performance overhead on the virtual machine system (an average run-time increase of 7.33% on the SPLASH-2 benchmarks and 9.12% on the CINT2000 benchmarks) and achieves good detection performance. It can be used to determine the state of user-level applications in guest virtual machines and is a feasible way to improve user-level security in the software services of cloud computing platforms.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

The authors would like to acknowledge Sichuan Agricultural University Double Support Project for providing financial aid.