Special Issue: Intrusion Detection and Prevention in Cloud, Fog, and Internet of Things
Research Article | Open Access
Scheduling Parallel Intrusion Detecting Applications on Hybrid Clouds
Abstract
Recently, Parallel Intrusion Detection (PID) has become very popular, and its parallel processing procedure is called a PID application (PIDA). A PIDA can be regarded as a Bag-of-Tasks (BoT) application, consisting of multiple tasks that can be processed in parallel. Given multiple PIDAs (i.e., BoT applications) to be handled, when the private cloud has insufficient available resources to accommodate all tasks, some tasks have to be outsourced to public clouds at a resource usage cost. The key challenge is how to schedule tasks on hybrid clouds to minimize the makespan under a limited budget. This problem can be formulated as an Integer Programming model, which is generally NP-Hard. Accordingly, in this paper, we construct an Iterated Local Search (ILS) algorithm, which employs an effective heuristic to obtain the initial task sequence and utilizes an insertion-neighbourhood-based local search method to explore better task sequences with lower makespans. A swap-based perturbation operator is adopted to escape local optima. To improve the proposal’s efficiency without any loss of effectiveness, we construct a Fast Task Assignment (FTA) method for calculating task sequences’ objectives by integrating an existing Task Assignment (TA) method with an acceleration mechanism designed through theoretical analysis. Accordingly, the proposed ILS is named FILS. Experimental results show that FILS outperforms the existing best algorithm for the considered problem by a considerable and statistically significant margin. More importantly, compared with TA, FTA achieves a 2.42x speedup, which verifies that the acceleration mechanism employed by FTA remarkably improves efficiency. Finally, the impacts of key factors are also evaluated and analyzed.
1. Introduction
Cloud computing is a novel service-based paradigm that delivers large-scale computational resources in the form of a pay-as-you-go model. Recently, some innovative providers (e.g., VMware partnered with IBM) deliver hybrid cloud construction solutions (e.g., VMware Cloud Foundation (http://www.vmware.com/products/cloud-foundation.html)), which enable creating an extension of a private cloud on public clouds (as seen in Figure 1). As a result, administrators/programs (e.g., application/task schedulers) of the private cloud are able to use resources of public clouds seamlessly and transparently through unified tools/interfaces, since both the private cloud and its extension use the same virtualization technique provided by hybrid cloud construction solutions. In other words, these administrators/programs can perform actions on public clouds just like on their own private cloud. For example, an administrator wants to create an instance of a small VM type for executing a task. When the private cloud has insufficient resources, the administrator can create an instance of the same small VM type on a public cloud to handle the task.
Intrusion Detection (ID) has been widely used to protect computer/network systems from diverse attacks. Recently, by taking advantage of distributed computing technologies (e.g., cloud computing), Parallel Intrusion Detection (PID) has become very popular because of its high efficiency [1, 2]. PID is an ID whose critical part can be processed in parallel. For instance, in an ID using data mining methods, the data can be divided into multiple partitions. As a result, the entire mining job on all the data is divided into multiple subjobs (also called tasks), each of which performs mining work only on its own partition and can accordingly be executed in parallel. Similarly, an ID applying deep learning methods whose time-consuming training procedures can be performed in parallel is also a typical PID . In this paper, the parallel processing procedure in PID is called a PID application (PIDA). These PIDAs can be considered Bag-of-Tasks (BoT) applications, consisting of many independent tasks processed in parallel without synchronization or communication . In principle, a cloud computing environment is an ideal platform for executing BoT applications, since it delivers cloud resources in a pay-as-you-go manner . This is why customers are willing to execute BoT applications on clouds.
Customers may have private clouds, whose resources are free to use. Given multiple PIDAs to be processed for protecting different types of computer/network systems, these customers have to outsource some tasks to public clouds at additional cost when their private clouds cannot accommodate all the applications’ tasks. Technically, outsourcing tasks to public clouds is easily achieved with the aforementioned hybrid cloud construction solutions. The key issue is how to schedule tasks on hybrid clouds, given a limited budget, to minimize the total execution time (a.k.a. makespan).
This paper aims to schedule PIDAs on hybrid clouds, which is in fact a BoT Scheduling Problem (BTSP) with resource demands and budget constraints on hybrid clouds, with the objective of minimizing the makespan. In our previous work , this problem was formulated as an Integer Programming (IP) model, which is generally NP-Hard . Accordingly, we also proposed an Effective Heuristic (EH) to solve the problem. EH starts from a task sequence generated by the Longest Task First (LTF) method and uses a Task Assignment (TA) method to schedule all tasks in the obtained sequence and calculate the makespan. Although EH was verified to outperform the well-known RoundRobin method, we observe that the quality of the task schedule output by TA depends significantly on its input task sequence. In order to further improve the schedule’s quality, in this paper, we construct an Iterated Local Search (ILS) algorithm, which employs LTF to obtain the initial task sequence and utilizes an insertion-neighbourhood-based local search method to explore better task sequences with lower makespans. A swap-based perturbation operator is adopted to escape local optima. To improve the proposal’s efficiency without any loss of effectiveness, instead of using TA to calculate task sequences’ objectives, we construct a Fast TA (FTA) method by integrating TA with an acceleration mechanism designed through theoretical analysis. Accordingly, the proposed ILS is named FILS. Experimental results show that FILS outperforms the existing best algorithm, EH, by a considerable and statistically significant margin. More importantly, compared with TA, FTA achieves a 2.42x speedup with identical effectiveness, which verifies that the acceleration mechanism employed by FTA remarkably improves efficiency without losing effectiveness.
The contributions of this paper are summarized below.
(i) We regard PIDA scheduling as BTSP, which can be formulated as an IP model.
(ii) We establish an effective algorithm, FILS, to solve the problem.
(iii) We propose an efficient heuristic, FTA, which includes an acceleration mechanism designed through theoretical analysis, to improve the efficiency.
(iv) We perform exhaustive experiments to verify the proposed algorithms’ effectiveness and efficiency.
The rest of this paper is organized as follows. Section 2 discusses related works. Section 3 presents the problem description. The proposed FILS is described in Section 4. A full performance evaluation is given in Section 5. Finally, conclusions are given in Section 6.
2. Related Works
In the literature, many efforts have been made to study BTSP in cloud environments, such as [6, 8–30]. Reference  presented a comprehensive review of the state of the art. As this paper considers budget-constrained BTSP on hybrid clouds, we detail two contributions related to budget constraints and eight works related to hybrid clouds.
The following two contributions tackled BTSP with budget constraints in an environment of multiple public clouds. Reference  presented an algorithm for solving BTSP with either budget or deadline constraints. In their algorithm, all VM instances are initialized with the same type and are iteratively replaced by instances of other VM types. As a result, the objective (the total cost if a deadline constraint is considered, or the makespan if a budget constraint is given) can be reduced without violating the constraint. Reference  proposed an approach to scaling cloud resources for solving BTSP with both deadline and budget constraints while minimizing the total cost. They formulated the considered problem as an Integer Programming problem and developed a policy to determine the number of instances of each VM type that can meet both constraints. Compared with the problem tackled in these two papers, our considered problem has a different environment (hybrid clouds rather than public clouds only).
The following eight works tackled BTSP on hybrid clouds. Van den Bossche et al.  considered and formulated deadline-constrained BTSP as an IP and used IBM CPLEX to obtain solutions. Later, the same authors  proposed two cost-efficient heuristics considering both computational and data transfer costs. Similar heuristics are presented in , where a comprehensive analysis is performed through simulation experiments to show their effectiveness. Reference  tackled a similar deadline-constrained problem, in which physical machines on the private cloud are taken into account. The authors proposed a greedy heuristic that dispatches tasks to available physical machines on the private cloud and assigns them to public clouds when none are available. Reference  solved deadline-constrained BTSP with variation in tasks’ runtimes. To this end, the authors constructed a method to estimate tasks’ runtimes so that the scheduling plan can be updated accordingly. Reference  considered deadline-constrained BTSP on cloud federations (a term of hybrid clouds) and formulated it as an IP. CPLEX was used to solve the problem, with the results showing that cloud federations benefit customers compared with a single cloud provider. Different from the three aforementioned works assuming that each task can be executed in an instance of any VM type, [29, 30] considered computation-intensive BTSP with resource demands and deadline constraints on hybrid clouds from the perspective of cloud providers, with the objective of maximizing profit. Both papers employed Particle Swarm Optimization algorithms to solve the considered problems.
Compared with the problems handled in the aforementioned eight papers (i.e., deadline-constrained BTSP with cost minimization), our considered problem (i.e., budget-constrained BTSP with makespan minimization) differs in both its constraints and its objective.
Our previous work  formulated the considered problem as an IP and proposed EH to solve it. EH uses LTF to generate the initial task sequence and employs TA to schedule all tasks in the obtained sequence and calculate the makespan. In this paper, we establish FILS, which is shown by experiments to outperform EH. As FTA is used rather than TA to calculate task sequences’ objectives, we achieve a 2.42x speedup without any loss of effectiveness.
3. Problem Description
The formulation of the considered problem was given in our previous work . For the completeness, we also introduce the formulation in this paper, briefly. We use to denote the cloud providers. represents the private cloud and the others are public clouds. The private cloud provides VM types . Each VM type has two performance parameters and that denote the number of CPUs and the amount of memory, respectively. When a task is outsourced to a public cloud, an instance of the VM type demanded by the task’s application should be created to tackle the task on the public cloud. As aforementioned, technically, this procedure can be achieved easily by the above mentioned hybrid cloud construction solutions. Therefore, we can equivalently regard that the public clouds also provide the VM types. Additionally, represents the price (per time unit) for using an instance of provided by a public cloud . The private cloud’s resources are free to use.
There are applications. Each application requires a user-specified VM type. We use a binary variable to denote this relationship. means demands ; otherwise. Meanwhile, each application consists of tasks . Like most of existing contributions (such as [22–24, 29]), in our considered problem, one task is executed in one VM instance exclusively at a time, and each task is executed consecutively (i.e., no preemption is allowed). Those problems where one VM instance can run multiple tasks simultaneously or tasks can be executed preemptively are beyond the scope of this paper. A task has a runtime . In other words, is the execution duration when is executed in an instance of the VM type required by its application.
Like [22–24, 29], setup times for VM instances (such as VM image loading, software installation, and network configuration) are regarded as zero. There are cloud providers that are able to deliver VM instances in minutes (e.g., Amazon EC2) and even in seconds (e.g., qingcloud (https://www.qingcloud.com/)). In our considered problem, however, tasks’ runtimes are at least one hour. Setup times for VM instances are therefore negligible compared with tasks’ runtimes and are assumed to be zero. Additionally, though some traditional cloud providers (e.g., Amazon EC2) charge for VM instances by the hour, there are some innovative ones that charge by the minute (e.g., Microsoft Azure) or even by the second (e.g., qingcloud and TencentCloud (https://www.qcloud.com/)). Users prefer the resources of these providers, since they do not need to pay for an entire hour when only a fraction of that hour is used. Accordingly, in this paper, we assume that resources are charged by the second; i.e., the time unit is set as one second. As a result, it is not necessary to consider how to make use of an entire hour when we formulate the problem. The time axis is divided into slots with a granularity of one second.
The private cloud has limited number of available resources. The capacities of CPU and memory are denoted as and , respectively. In other words, for any time slot , the amount of consumed resources cannot exceed and . All the public clouds are regarded to have infinite resources. Let and be the completion time of a task and the application’s completion time, respectively. We have
Accordingly, we can define the maximum of time slots satisfying . Let be the start time of the task . We can calculate by
Let and be two decision variables. means is dispatched to and otherwise. indicates is in execution at time slot on and otherwise. Obviously, if a task is dispatched to the private cloud (i.e., ), its start time ; otherwise, (like , we also focus on computation-intensive BoT applications which require tiny amounts of data and the short duration of transferring these tiny-amount data can be negligible compared with tasks’ runtimes. As setup times for VM instances have been reasonably assumed to be zero, we can regard that tasks can be started at time slot 0 on public clouds). With the consideration of the variables defined above, the total cost can be calculated by
Let be the budget and be the objective makespan. All the notations for problem description are listed in Table 1. The problem can be formulated as an Integer Programming (IP) model given below.
Minimize the makespan :s.t.
Equation (4) is the objective. Equation (5) guarantees that the total cost is not beyond the budget. Equation (6) ensures that a task is assigned to a unique cloud. Equations (7) and (8) make sure that the consumption of CPU and memory of the private cloud at any time slot cannot exceed and , respectively.
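The constraints described above can be illustrated with a small feasibility checker. This is only a sketch under stated assumptions, not the authors' implementation: the names `ScheduledTask`, `total_cost`, `private_capacity_ok`, `makespan`, and `is_feasible` are hypothetical, prices are assumed to be per second and indexed by (public cloud, VM type), and a task's cost on a public cloud is assumed to be its runtime multiplied by that price.

```python
from dataclasses import dataclass


@dataclass
class ScheduledTask:
    runtime: int    # execution time in seconds on the demanded VM type
    vm_type: str    # VM type demanded by the task's application
    cloud: int      # 0 = private cloud, k >= 1 = public cloud k (one cloud per task)
    start: int = 0  # start slot; tasks on public clouds are assumed to start at slot 0


def total_cost(tasks, price):
    """Sum runtime * per-second price over tasks outsourced to public clouds."""
    return sum(t.runtime * price[(t.cloud, t.vm_type)] for t in tasks if t.cloud != 0)


def private_capacity_ok(tasks, vm_specs, cpu_cap, mem_cap):
    """Check that private-cloud CPU and memory usage never exceed the capacities."""
    events = []
    for t in tasks:
        if t.cloud == 0:
            cpus, mem = vm_specs[t.vm_type]
            events.append((t.start, 1, cpus, mem))                # acquire at start
            events.append((t.start + t.runtime, 0, -cpus, -mem))  # release at finish
    events.sort()  # at equal times, releases (flag 0) come before acquisitions (flag 1)
    used_cpu = used_mem = 0
    for _, _, d_cpu, d_mem in events:
        used_cpu += d_cpu
        used_mem += d_mem
        if used_cpu > cpu_cap or used_mem > mem_cap:
            return False
    return True


def makespan(tasks):
    """Latest completion time over all tasks."""
    return max(t.start + t.runtime for t in tasks)


def is_feasible(tasks, price, vm_specs, cpu_cap, mem_cap, budget):
    """Budget constraint plus private-cloud CPU and memory constraints."""
    return (total_cost(tasks, price) <= budget
            and private_capacity_ok(tasks, vm_specs, cpu_cap, mem_cap))
```

The single `cloud` field per task makes the unique-assignment constraint hold by construction, so only the budget and capacity constraints need explicit checks.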
4. Fast Iterated Local Search Algorithm (FILS)
The framework of a general ILS is given in Algorithm 1. An ILS starts with an initial solution. Until the termination criterion is met, a local search method is performed on the current solution to explore better solutions, and a perturbation operator is used to escape local optima. In this paper, we propose FILS, in which task sequences are considered solutions. LTF is used to generate the initial solution. An Insertion-Based Local Search Method (ILSM) is employed to explore better task sequences with lower makespans. A Swap-Based Perturbation Operator (SPO) is used to perturb the current solution. FTA is constructed to calculate the makespans of task sequences. Details are given in this section.
4.1. Longest Task First (LTF)
In our previous work , four heuristics including Highest Lowest Public Cost (LPC) First (HLPCF), Lowest LPC First (LLPCF), Longest Task First (LTF), and Shortest Task First (STF) were examined by experiments with the results showing that LTF is the best and helps the proposed EH to achieve good effectiveness. Accordingly, we also use LTF to generate the initial task sequence of FILS. LTF arranges all tasks by their runtimes in the nonascending order.
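As a minimal sketch of the LTF ordering described above, the function below sorts task indices by nonascending runtime. The function name and the list-of-runtimes representation are assumptions made for illustration; ties keep their original relative order because Python's `sorted` is stable.

```python
def longest_task_first(runtimes):
    """Return task indices ordered by nonascending runtime (ties keep input order)."""
    return sorted(range(len(runtimes)), key=lambda i: -runtimes[i])
```

For example, with runtimes of 3600, 86400, 7200, and 7200 seconds, the task with the one-day runtime comes first and the one-hour task comes last.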
Meanwhile, it is worth introducing LPC, which will be used to describe FILS in Section 4.4. Like , the costs of executing tasks on public clouds are defined as public costs. A task’s LPC can be defined as the minimum of all its public costs. Given a task , assume that the index of the VM type demanded by its application is ; i.e., . The task’s LPC can be calculated by
Accordingly, the corresponding public cloud is called the task’s “Ideal” Public Cloud (IPC). Obviously, the index of a task’s IPC should meet
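The LPC and IPC definitions can be sketched as follows. This assumes that a task's public cost on a cloud equals its runtime multiplied by that cloud's per-second price for the demanded VM type, which is consistent with the pricing described in Section 3 but is not spelled out in the surviving text. The function name and the zero-based cloud index are hypothetical.

```python
def lowest_public_cost(runtime, prices):
    """Return (LPC, IPC index) for one task.

    prices[k] is the assumed per-second price of the demanded VM type on
    public cloud k; the first cheapest cloud is chosen on ties.
    """
    ipc = min(range(len(prices)), key=lambda k: prices[k])
    return runtime * prices[ipc], ipc
```

Because the runtime is the same on every public cloud in this model, the cheapest cloud for the demanded VM type is also the cloud with the lowest public cost.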
4.2. Insertion-Based Local Search Method
Given a task sequence with the length , ILSM first regards it as a temp task sequence . Then, ILSM removes the -th task from and reinserts this task to the left at each position except the task’s original one. As a result, new task sequences are generated and evaluated by TA/FTA (corresponding to CILS/FILS, where CILS is the TA-based variant introduced in Section 5.3) to calculate their makespans. If one generated task sequence has a lower makespan than , both and are set as . Afterwards, ILSM processes the -th task in in the same way. After the -th task has been processed, ILSM terminates. Obviously, the complexity of ILSM is , in which represents the complexity of TA/FTA.
We use an example to clarify the procedure of ILSM. In this example, we have one application with three tasks. Given a task sequence with makespan 10, ILSM first sets . Then, ILSM removes the first task and reinserts it to at each position except the first one. As a result, two new task sequences and are generated. Assume their makespans are 12 and 9, respectively. In other words, gets a lower makespan than and we set . Afterwards, the second task in (i.e., ) is removed and reinserted. The two obtained task sequences are and with the makespans 8 and 12, respectively. Accordingly, we set . Finally, the third task in (i.e., ) is removed and reinserted. The two generated task sequences are and with the makespans 6 and 14, respectively. In other words, obtains a lower makespan than does. Consequently, we set and is the final result of ILSM.
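The ILSM procedure and the example above can be sketched in code. This is one plausible reading, not the authors' exact algorithm: for each position, the best of the generated sequences is accepted only if it improves on the current sequence. The `evaluate` callable stands in for TA/FTA and is an assumption, as is the function name.

```python
def insertion_local_search(seq, evaluate):
    """One ILSM pass: remove each task in turn and try every other position.

    `evaluate` maps a task sequence to its makespan (a stand-in for TA/FTA).
    Returns the best sequence found and its makespan.
    """
    best = list(seq)
    best_val = evaluate(best)
    n = len(best)
    for i in range(n):
        task = best[i]
        rest = best[:i] + best[i + 1:]      # sequence without the i-th task
        cand_best, cand_val = None, best_val
        for pos in range(n):
            if pos == i:
                continue                    # skip the task's original position
            cand = rest[:pos] + [task] + rest[pos:]
            val = evaluate(cand)
            if val < cand_val:
                cand_best, cand_val = cand, val
        if cand_best is not None:           # accept only a strict improvement
            best, best_val = cand_best, cand_val
    return best, best_val
```

Each position produces one fewer candidate than the sequence length, so a three-task sequence yields two new sequences per step, as in the example.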
4.3. Swap-Based Perturbation Operator
SPO is used to help the two ILSs (FILS and CILS) escape local optima. It iterates the following procedure rounds: randomly select a pair of tasks in a given task sequence and swap them. This swap operator partially adjusts the relative order of tasks and thereby helps the two ILSs escape a local optimum in which they may be trapped. is a very important parameter and will be determined by an experiment in Section 5.2. It is obvious that the complexity of SPO is .
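A minimal sketch of SPO follows, assuming that each round picks two distinct positions uniformly at random. The function name, the `rounds` parameter name, and the optional `rng` argument (added so runs can be reproduced) are hypothetical.

```python
import random


def swap_perturbation(seq, rounds, rng=None):
    """Return a copy of seq after `rounds` random pairwise swaps."""
    rng = rng or random.Random()
    out = list(seq)
    if len(out) < 2:
        return out                      # nothing to swap
    for _ in range(rounds):
        i, j = rng.sample(range(len(out)), 2)   # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out
```

The input sequence is left untouched, so the caller can keep the best-found sequence while perturbing a copy.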
4.4. Fast Task Assignment Method
FTA is developed by integrating an acceleration mechanism with TA without loss of any effectiveness. In TA (details of TA can be seen in our previous work ), we can find that the makespan corresponding to the case that the task is assigned to the private cloud (i.e., ) is first calculated and then compared with the one corresponding to the case that the task is dispatched to its IPC (). The calculation of is time-consuming because we need to determine the task’s start time first. However, actually, we do not need to calculate for some special cases, since the following theorem is true.
Theorem 1. In TA, given a task to be scheduled, if and , the task will be assigned to its IPC.
Proof. For the case that , according to Theorem 1, we have . Additionally, due to , is true. And, according to Theorem 1 given in our previous work , we have . So, . As a result, the task will be assigned to its IPC.
For the case that , according to Theorem 1, we have . Additionally, due to , is true. Because of and , we have . Therefore, the task will be assigned to its IPC as well.
Theorem 1 shows that if and , the task should be assigned to its IPC without respect to . In other words, we do not need to calculate for these special cases and the efficiency can thus be improved. Accordingly, we construct an acceleration mechanism that uses Theorem 1 to discover these special cases. As a result, FTA is established by integrating TA with this acceleration mechanism and is described in Algorithm 2.
FTA uses the aforementioned acceleration mechanism (Lines 17-19) to improve the efficiency without loss of effectiveness, which is ensured by Theorem 1. If the two conditions in Line 17 are true, is set as a sufficiently large value (so that ) and the “while” loop in Line 5 is terminated. As a result, the two conditions in Line 23 are met; i.e., the task is assigned to its IPC. Obviously, the complexity of FTA is identical to that of TA. Nevertheless, compared with TA, FTA obtains much better efficiency (details can be seen in Section 5.3).
4.5. Description of FILS
Let and be the current best found task sequence and its makespan, respectively. Based on the aforementioned LTF, ILSM, SPO, and FTA, we can describe the proposed FILS in Algorithm 3. FILS starts from a task sequence generated by LTF (Line 2). ILSM (Lines 9-19) is iterated until no improvement is obtained (see the condition in Line 7). If a new best task sequence is found, and are accordingly updated (Lines 21-23). Afterwards, SPO is invoked to perturb so that FILS can jump out from local optimum (Lines 24-27). Task sequences’ makespans are calculated by FTA (Lines 3, 13, and 28).
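The overall loop can be sketched generically, with the components passed in as functions. This is an illustrative skeleton rather than Algorithm 3 itself: the names are hypothetical, `local_search` is assumed to return a (sequence, makespan) pair, and perturbing the best-found sequence (rather than the current one) is an assumption, since the surviving text does not say which sequence SPO is applied to.

```python
def iterated_local_search(initial, evaluate, local_search, perturb, max_iters=100):
    """Generic ILS loop over task sequences.

    evaluate(seq) -> makespan; local_search(seq, evaluate) -> (seq, makespan);
    perturb(seq) -> new seq. All three are supplied by the caller.
    """
    best = list(initial)
    best_val = evaluate(best)
    current, cur_val = list(best), best_val
    for _ in range(max_iters):
        # Repeat the local search until it yields no further improvement.
        while True:
            cand, cand_val = local_search(current, evaluate)
            if cand_val < cur_val:
                current, cur_val = cand, cand_val
            else:
                break
        # Record a new best sequence if one was found.
        if cur_val < best_val:
            best, best_val = list(current), cur_val
        # Perturb (assumption: starting from the best-found sequence).
        current = perturb(best)
        cur_val = evaluate(current)
    return best, best_val
```

In FILS terms, `initial` would come from LTF, `local_search` would be ILSM, `perturb` would be SPO, and `evaluate` would be FTA.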
5. Experimental Results
Following most existing contributions, we evaluate the algorithms’ performance through simulation experiments.
5.1. Testing Instances
We use the testing instances given in our previous work . For the completeness of this paper, we describe these testing instances as follows. Three different Regions (us-east, us-west, and eu-east) of Amazon EC2 and GoGrid are regarded as four public clouds, and 7 different VM types are considered. The configurations and prices are described in Table 2. Note that the prices in Table 2 are shown per hour and we will convert them to values per second when we implement algorithms. The private cloud also provides the same seven VM types.
In order to explore the compared algorithms’ performance on problems of different sizes, we consider the total number of tasks (i.e., ) as the problem size factor and construct a Testing Instance Set (TIS), which contains 3 groups with . Each group contains 3 same-sized subgroups corresponding to 3 different problem types: Small Application Type (SAT), Medium Application Type (MAT), and Large Application Type (LAT). The SAT problem has many small applications that have only a few tasks, while the LAT problem has a few large applications that include lots of tasks. The MAT problem is in between them. So, multiple types of problems are taken into account in this experiment. Each subgroup has 10 different instances and there are 90 instances in total. In order to generate instances of SAT, MAT, and LAT, the application number is set as a random integer uniformly distributed within intervals , , and , respectively. Each task is attributed to the applications with the same probability , separately. TIS is summarized in Table 3. The runtime of each task is an integer uniformly distributed in (i.e., from one hour to one day). The VM type required by each application is randomly selected from the 7 considered VM types.
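The generation rules above can be sketched as follows. This is a hedged illustration only: the function and variable names are hypothetical, the number of applications is taken as an input because the intervals for SAT, MAT, and LAT did not survive extraction, and runtimes are drawn in seconds from one hour (3600) to one day (86400), following the description in the text.

```python
import random


def generate_instance(n_tasks, n_apps, n_vm_types=7, rng=None):
    """Generate one random testing instance.

    Returns (app_vm_type, task_app, runtimes):
      app_vm_type[a] - index of the VM type demanded by application a
      task_app[t]    - application that task t belongs to (equal probability)
      runtimes[t]    - runtime of task t in seconds, uniform in [3600, 86400]
    """
    rng = rng or random.Random()
    app_vm_type = [rng.randrange(n_vm_types) for _ in range(n_apps)]
    task_app = [rng.randrange(n_apps) for _ in range(n_tasks)]
    runtimes = [rng.randint(3600, 86400) for _ in range(n_tasks)]
    return app_vm_type, task_app, runtimes
```

Under this equal-probability rule an application can end up with no tasks; whether the original generator prevents that is not stated in the text.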
Let be the best one among all VM types. The budget is set by (11), where is a “Budget Factor” used to adjust the budget so that algorithms’ performance with different budgets can be explored. According to (11), higher budgets are for larger testing instances.
The private cloud’s available CPU and memory capacities (i.e., and ) are set by (12) and (13), respectively, in which is a “Capacity Factor” used to adjust the two types of resources’ capacities so that algorithms’ performance with different capacities can be investigated. According to (12) and (13), the private cloud has more available resources for larger testing instances.
5.2. Parameter Determination
The proposed FILS has one parameter, namely, the number of pairs of swapped tasks (i.e., ) in SPO. This parameter is determined by experiments in this section. In order to use statistical methods (e.g., the well-known multifactor analysis of variance (ANOVA)) to evaluate algorithms’ performance, we set and , respectively. Thus, there are 9 factor combinations. Each algorithm is tested on TIS with all possible factor combinations. All algorithms are implemented in Java and run on the same PC with a Dual Core Pentium (R) 3.10 GHz CPU and 4 GB of memory. FILS terminates after a maximum of 100 iterations. The Relative Error (RE) defined by (14) is adopted to evaluate the performance.
For each instance, denotes the obtained makespan returned by an algorithm in the -th replication. is the total number of replications. represents the makespan’s lower-bound, which can be obtained while both the private cloud’s resource capacity and budget constraints are assumed to be relaxed. In other words, the private cloud’s resource capacity and budget are assumed to be infinite. In this situation, all tasks can be executed in parallel. Accordingly, can be calculated by
Smaller REs indicate better effectiveness since the same is used. Based on the RE for each instance, we further use ANOVA to check whether the differences in the observed average REs are statistically significant. Nonoverlapping confidence intervals between any two plotted averages mean that the observed difference between those averages is statistically significant at the indicated confidence level. FILS is executed for 5 replications (i.e., ) since it is a metaheuristic that involves randomness. In order to determine the value of , we set . In other words, has 5 candidates. FILS is tested on TIS with the 5 candidates and the 9 combinations of the aforementioned two factors ( and ). The plot of mean REs and 95% confidence LSD intervals for the compared algorithms is given in Figure 2, where FILS1-FILS5 represent FILS with , respectively. Figure 2 shows that the mean REs of FILS with are 1.083, 0.995, 1.120, 1.157, and 1.189, respectively. Since FILS2 obtains the lowest mean RE (0.995), the parameter is thus set as 2 in FILS.
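Since the RE and lower-bound formulas did not survive in the text, the sketch below shows one plausible reading only. It assumes that the lower bound equals the longest task runtime (all tasks run in parallel when capacity and budget are unlimited) and that RE is the mean, over replications, of the relative gap between the obtained makespan and that lower bound. Both function names are hypothetical.

```python
def lower_bound(runtimes):
    """Makespan lower bound when every task can run in parallel (assumed form)."""
    return max(runtimes)


def relative_error(makespans, lb):
    """Mean relative gap to the lower bound over all replications (assumed form)."""
    return sum((c - lb) / lb for c in makespans) / len(makespans)
```

Under this reading, an RE of 1.0 means that the obtained makespan is on average twice the lower bound.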
5.3. Performance Evaluation
In order to evaluate FILS’s performance, we generate a New TIS (NTIS) by using the same rules described in Section 5.1. Although the same rules are used, NTIS is different from the TIS used in Section 5.2. EH  is the existing best algorithm for the considered problem and is therefore used as the baseline. Meanwhile, the well-known RoundRobin (RR) is also adopted. In RR, the initial task sequence is generated randomly and each task in the obtained task sequence is assigned to a randomly selected cloud without violating the private cloud’s resource capacity and the budget constraints. As in Section 5.2, the two factors are set as and , respectively, and FILS terminates after a maximum of 100 iterations. Additionally, RR and FILS are each executed for 5 replications (i.e., ). The plot of mean REs and LSD intervals (95% confidence level) for the compared algorithms is given in Figure 3.
In Figure 3, we can see that the mean REs of EH, FILS, and RR are 1.460, 0.966, and 1.727, respectively. This result shows that FILS outperforms EH, which in turn is better than RR, by remarkable and statistically significant margins. In terms of efficiency, EH and RR consume similar computation times for each testing instance. Their average computation times across all the testing instances are on the order of tens of milliseconds, whereas that of FILS is on the order of tens of seconds. In other words, compared with the computation times of FILS, those of EH and RR are negligible. Accordingly, we do not evaluate the efficiencies of all three compared algorithms together. Instead, with the objective of evaluating the acceleration mechanism employed by FTA, we compare FILS with a Common ILS (CILS) that is the same as FILS except that it uses TA to calculate task sequences’ makespans. For this purpose, we define the Normalized Efficiency (NE) as follows:
For each instance, denotes an algorithm’s computation time, and represents the computation time of the baseline, which is selected from the compared algorithms. Obviously, a lower NE indicates better efficiency. Without loss of generality, we select CILS as the baseline in this experiment. As both algorithms are executed replications on each instance, and are the means of the obtained computation times when we calculate NE for each instance. The mean NEs of the two ILSs are presented in Figure 4, which shows that the mean NEs of FILS and CILS are 0.46 and 1.0, respectively (CILS being the baseline). This result indicates that FILS is much more efficient than CILS and that the acceleration mechanism employed by FTA improves the efficiency considerably. Moreover, we calculate the speedup achieved by FILS as the mean of all instances’ speedups, each of which is defined as the reciprocal of (i.e., ). Accordingly, the speedup obtained by FILS is 2.42. In other words, FILS is 2.42x faster than CILS.
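The NE and speedup calculations can be sketched as below, assuming NE is an algorithm's mean computation time on an instance divided by the baseline's, as the description suggests. The function name and argument layout are hypothetical.

```python
def mean_ne_and_speedup(alg_times, base_times):
    """Return (mean NE, mean speedup) over instances.

    alg_times[i] and base_times[i] are the mean computation times of the
    algorithm and the baseline on instance i. NE_i = alg / base, and the
    per-instance speedup is 1 / NE_i.
    """
    nes = [a / b for a, b in zip(alg_times, base_times)]
    mean_ne = sum(nes) / len(nes)
    mean_speedup = sum(1.0 / ne for ne in nes) / len(nes)
    return mean_ne, mean_speedup
```

The mean of per-instance reciprocals is never smaller than the reciprocal of the mean NE, which is why a mean NE of 0.46 (reciprocal about 2.17) is compatible with a reported mean speedup of 2.42.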
In order to investigate the impacts of the two key factors (i.e., the Budget Factor and the Capacity Factor ), we present the plots of mean REs and LSD intervals (95% confidence level) for the interactions between the types of algorithms and the two factors in Figures 5 and 6, respectively. Figures 5/6 illustrate that the mean REs of the three compared algorithms decrease as / increases. This result is reasonable. A larger indicates a higher budget, so more tasks can be executed on public clouds in parallel. A larger denotes a bigger resource capacity of the private cloud. Accordingly, more tasks can be executed on the private cloud in parallel when the private cloud has more resources. Therefore, we can conclude that the parallelism of task execution can be improved when the two factors are set as large values.
6. Conclusions
This paper schedules Parallel Intrusion Detection Applications (PIDAs) on hybrid clouds to minimize the makespan under resource demand and budget constraints. As this problem is NP-Hard, we construct a Fast Iterated Local Search (FILS) algorithm, which employs an effective heuristic to obtain the initial task sequence and utilizes an insertion-neighbourhood-based local search method to explore better task sequences with lower makespans. A swap-based perturbation operator is adopted to escape local optima. A Fast Task Assignment (FTA) method is developed by integrating an existing Task Assignment (TA) method with an acceleration mechanism designed through theoretical analysis and is used to calculate task sequences’ objectives. Experimental results show that FILS outperforms the existing best algorithm for the considered problem by a considerable and statistically significant margin. More importantly, compared with TA, FTA achieves a 2.42x speedup, which verifies that the acceleration mechanism employed by FTA remarkably improves efficiency. The impacts of the two key factors (the Budget Factor and the Capacity Factor) are also investigated, with the results showing that the parallelism of task execution can be improved when the two factors are set as large values.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work is supported by the National Natural Science Foundation of China [Grant nos. 71501096 and 61502234], by the Natural Science Foundation of Jiangsu Province [Grant no. BK20150785], by the China Postdoctoral Science Foundation [Grant no. 2015M581801], and by the Fundamental Research Funds for the Central Universities [Grant no. 30916011325].
Copyright © 2018 Yi Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.