Abstract

This paper addresses the problem of task allocation in real-time distributed systems with the goal of maximizing system reliability, which has been shown to be NP-hard. We take the deadline constraint into account when formulating this problem and then propose an algorithm called chaotic adaptive simulated annealing (XASA) to solve it. Firstly, XASA begins with chaotic optimization, which takes a chaotic walk through the solution space and generates several local minima; secondly, XASA improves the SA algorithm via several adaptive schemes and continues the search for the optimum starting from the results of the chaotic optimization. The effectiveness of XASA is evaluated by comparing it with the traditional SA algorithm and an improved SA algorithm. The results show that XASA achieves a satisfactory speedup without loss of solution quality.

1. Introduction

In many application domains (e.g., astronomy, genetic engineering, and military systems), increased complexity and scale have led to the need for more powerful computational resources; distributed systems (DS) have emerged as a powerful platform for addressing this need, as an alternative to traditional high-performance computing systems. A DS consists of a set of cooperating nodes (either homogeneous or heterogeneous) communicating over communication links. An application running on a DS can be divided into a number of tasks executed concurrently on different nodes of the system; deciding this assignment is referred to as the task allocation problem (TAP). To improve the performance of DS, several studies have been devoted to the TAP, mainly concerning performance measures such as minimizing the execution and communication costs [1–3], minimizing the application turnaround time [4, 5], and achieving better fault tolerance [6, 7].

On the other hand, the real-time property is required in many DS (e.g., military systems). In such systems, the application should complete its work before its deadline, not merely guarantee logical correctness. Meanwhile, the complexity of a DS increases the potential for system failure, because in such large and complex systems, failures of nodes and communication links are inevitable. Hence, reliability is a crucial requirement for DS, and especially for real-time DS (RTDS).

Distributed system reliability (DSR) has been defined by Kumar et al. [8] as the probability of successful completion of distributed programs, which requires that all the allocated processors and involved communication links remain operational during the execution lifetime. Redundancy and diversity are the traditional techniques for attaining better reliability [6, 7, 9–14]; they employ hardware and/or software redundancy and hence impose extra cost. Moreover, in many situations the system configuration is fixed, and we have no freedom to introduce redundancy. Task allocation is an alternative way to improve DS reliability, and this method requires no additional resources, neither hardware nor software.

The TAP with the goal of maximizing the DSR is a typical combinatorial optimization problem; unfortunately, it has been shown to be NP-hard in the strong sense, and the computational complexity of exact algorithms (e.g., the branch-and-bound technique) is exponential in nature, so optimal results cannot be obtained in reasonable time for large-scale problems. Hence, several heuristic and metaheuristic algorithms have been proposed, such as the genetic algorithm (GA) [7, 14, 15], simulated annealing (SA) [16], particle swarm optimization (PSO) [17], honeybee mating optimization (HMO) [18], cat swarm optimization (CSO) [19], and the iterated greedy algorithm (IG) [20]. These algorithms may return suboptimal results, but they sharply reduce the calculation time.

A common thread among these algorithms is that they all start from a randomly chosen initial solution (or a set of solutions) in the solution space and then repeat an exploration-decision procedure until convergence to a good enough (possibly suboptimal) solution [21]. Here, exploration means obtaining new solutions based on the current solution; in the SA algorithm, for instance, new solutions are chosen from the neighbors of the current solution. A decision is made after each exploration: the new solution is either accepted or rejected according to some rule; if accepted, it becomes the new current solution, otherwise it is dropped and a new exploration starts. Through a series of exploration-decision procedures, the quality of the obtained solutions improves until a termination condition, often defined in terms of convergence, is met. Hence, the convergence speed of such algorithms is affected by the choice of the initial solution and by the rules applied in the exploration-decision procedure.
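For concreteness, the exploration-decision pattern can be sketched in a few lines of MATLAB (the language used for the experiments in Section 5). This is an illustrative skeleton of the general pattern rather than any specific algorithm; the function handles explore, accept, and done stand in for the algorithm-specific rules:

% Generic exploration-decision loop shared by SA, COA, and similar
% metaheuristics; explore/accept/done are algorithm-specific handles.
function [best, bestCost] = explore_decide(x0, cost, explore, accept, done)
    x = x0; fx = cost(x);            % current solution and its cost
    best = x; bestCost = fx;         % best solution found so far
    while ~done(bestCost)
        y = explore(x);              % exploration: candidate from current solution
        fy = cost(y);
        if accept(fy, fx)            % decision: accept or reject the candidate
            x = y; fx = fy;
            if fy < bestCost, best = y; bestCost = fy; end
        end
    end
end

For the SA algorithm of Section 4.1, for example, accept would implement the Metropolis rule and done a stagnation test.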

Simulated annealing is one of the earliest and most widely used optimization approaches; it introduces a random factor into the search process and models the annealing of solids as a Metropolis process [22]. The SA algorithm accepts "worse" solutions with some probability related to the current temperature, so it can escape from local optima and find the globally optimal solution. The convergence speed of the SA algorithm depends on its initial solution and cooling schedule.

Chaos is a bounded, unstable dynamic behavior that exhibits sensitive dependence on initial conditions and includes infinitely many unstable periodic motions [23]. Although it appears stochastic, it occurs in deterministic nonlinear systems under deterministic conditions. In recent years, the chaotic optimization algorithm (COA) has aroused intense interest due to its ergodicity, easy implementation, and ability to escape local optima [24]. However, COA lacks heuristic guidance and usually needs a large number of iterations to reach the global optimum, which means its convergence speed is slow.

In this paper, we propose a combinational algorithm called XASA (chaotic adaptive simulated annealing), where X alludes to the Greek spelling of chaos (χάος), to solve the TAP in RTDS with the goal of maximizing the DSR. We take into account several kinds of constraints, including deadlines. XASA starts with COA and obtains several optima via its ergodicity; the SA algorithm then operates from these optima within relatively smaller ranges to find the best solution. This method overcomes the slow convergence of the SA algorithm and COA without loss of solution quality.

The rest of this paper is organized as follows. Section 2 presents related work on the application of the SA algorithm and chaotic optimization to the TAP and states the contributions of this paper. Section 3 describes the formulation of the TAP with the goal of maximizing reliability, and the solution approach is presented in Section 4 with some implementation details. Section 5 discusses the performance evaluation of the proposed algorithm through several experiments and analyses, and Section 6 concludes this work.

2. Related Work

The idea of the simulated annealing algorithm was proposed by Metropolis et al. [22] in 1953 and applied to optimization problems by Kirkpatrick et al. [25] in 1983. To our knowledge, the first application of the SA algorithm to the TAP was made by van Laarhoven et al. [26] in 1992, who applied the SA algorithm to a job shop scheduling problem. Since then, several works [27–29] have compared the SA algorithm with other optimization algorithms on problems related to the TAP. Attiya and Hamam applied the SA algorithm to the TAP in terms of maximizing the reliability of a DS and compared it with the branch-and-bound technique in 2006 [16]; extending this work, Faragardi et al. proposed improved SA algorithms for this problem [30, 31], using a hybrid of SA and tabu search with a nonmonotonic cooling schedule in 2012 and adding a systematic neighborhood search and memory to the SA algorithm in 2013.

COA is often combined with other optimization algorithms to overcome its drawbacks and exploit its beneficial properties, such as ergodicity; examples include chaotic simulated annealing (CSA) [32], chaotic particle swarm optimization (CPSO) [23], and the chaotic improved imperialist competitive algorithm (CICA) [33]. CSA was proposed by Chen and Aihara [32] in 1995 to solve combinatorial optimization problems using Hopfield neural networks, and several other papers have expanded this work [34–37]. Mingjun and Huanwen proposed another version of CSA to solve optimization problems over continuous functions [38]. Most of these works focus on continuous function optimization, and the method proposed by Chen and Aihara is actually based on artificial neural networks, not the SA algorithm. Ferens and Cook [39] adapted the CSA developed by Mingjun and Huanwen to the TAP in 2013, where chaos was infused into a solution by setting the number of perturbations according to the value of a chaotic variable. However, this method does not make full use of the beneficial properties of COA and does not improve the convergence speed either.

The present paper differs from the aforementioned research in that it combines the advantages of both algorithms. Firstly, owing to its ergodicity, COA can capture the skeleton of the solution space by walking chaotically through it, thereby preventing the result from falling into local optima. Secondly, based on the results of COA, we can easily determine the cooling schedule of the SA algorithm, which is very important to its performance but hard to set. Lastly, several adaptive schemes are used in the SA algorithm; all these schemes, together with the preliminary COA results, increase the convergence speed significantly.

3. Problem Statement

We consider a heterogeneous DS that runs a real-time application. Each node of the DS may have a different processing speed, memory size, and failure rate. Moreover, the communication links may also have different bandwidths and failure rates. The state of each node and communication link is either operational or failed, and failure events are statistically independent. We also assume that the failure rates of nodes and communication links are constant.

There are $N$ tasks to be executed on a DS with $M$ nodes, with $N > M$ in most cases. Tasks executed on a node require resources, including processing load and memory space. Additionally, two tasks executed on different nodes require communication bandwidth to communicate with each other. Figure 1 illustrates a simple case consisting of 5 tasks and 3 nodes. An application running on a DS can be represented by a task interaction graph, as shown in Figure 1(b), whose vertices represent the tasks and whose edges represent the interactions between tasks. Each task $t_i$ is associated with two kinds of properties: its execution time $e_{ik}$ on each node $n_k$, and its processing load and memory requirements ($p_i$ and $m_i$). The label on each edge represents the communication requirement $c_{ij}$ between the corresponding tasks.

The purpose of this work is to find a task assignment such that all tasks are assigned to nodes (note that each task is assigned to exactly one node, while one node can execute multiple tasks or none), the overall system reliability is maximized, the deadline and other requirements of the tasks are satisfied, and the capacities of the system resources are not violated.

3.1. Notations

The notations used to formulate the problem are listed in the Notations section at the end of this paper.

3.2. System Reliability

The reliability of a distributed system may be defined as the probability that the system can run the entire application successfully [7, 14, 40]. Due to the independence of node and link failures, the system reliability is the product of the component reliabilities, that is, the product of the reliabilities of all nodes and communication links.

We assume that all components of a node except the processor are perfect, which means the reliability of the node equals the reliability of its processor. The reliability of the processor of node $n_k$ over a time interval of length $t$ is $e^{-\lambda_k t}$ [16]. Under a task assignment $X$, the total execution time of the tasks assigned to $n_k$ is $\sum_{i=1}^{N} x_{ik} e_{ik}$, so the reliability of the node $n_k$ is

$$R_k(X) = \exp\left(-\lambda_k \sum_{i=1}^{N} x_{ik} e_{ik}\right).$$

Similarly, the reliability of the communication link $l_{kl}$ over a time interval of length $t$ is $e^{-\mu_{kl} t}$. Under a task assignment $X$, the total communication time via $l_{kl}$ is $\sum_{i=1}^{N} \sum_{j=1}^{N} (c_{ij}/w_{kl})\, x_{ik} x_{jl}$, so the reliability of the communication link $l_{kl}$ is

$$R_{kl}(X) = \exp\left(-\mu_{kl} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{c_{ij}}{w_{kl}}\, x_{ik} x_{jl}\right).$$

Hence, we obtain the reliability of the system:

$$R(X) = \prod_{k=1}^{M} R_k(X) \prod_{k=1}^{M-1} \prod_{l=k+1}^{M} R_{kl}(X) = e^{-Z(X)},$$

where

$$Z(X) = \sum_{k=1}^{M} \lambda_k \sum_{i=1}^{N} x_{ik} e_{ik} + \sum_{k=1}^{M-1} \sum_{l=k+1}^{M} \mu_{kl} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{c_{ij}}{w_{kl}}\, x_{ik} x_{jl}.$$
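As an illustration, $Z(X)$ can be computed directly from an assignment vector. The following MATLAB sketch assumes our own (hypothetical) data layout, not one prescribed by the paper: a(i) is the node assigned to task $t_i$, e is the $N \times M$ execution-time matrix, c the $N \times N$ communication matrix, lambda the node failure rates, and mu and w the link failure rates and bandwidths:

function Z = system_cost(a, e, c, lambda, mu, w)
% Z(X) = -log R(X); minimizing Z maximizes the system reliability R.
    [~, M] = size(e);
    Z = 0;
    for k = 1:M                                 % node contributions
        Z = Z + lambda(k) * sum(e(a == k, k));
    end
    for k = 1:M-1                               % link contributions
        for l = k+1:M
            commTime = sum(sum(c(a == k, a == l))) / w(k, l);
            Z = Z + mu(k, l) * commTime;
        end
    end
end
% The system reliability is then R = exp(-Z).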

3.3. Constraints

In order to achieve a satisfactory allocation, several basic constraints of the TAP in RTDS should be met. Traditionally, allocation constraints are devoted to not violating the availability of system resources, such as memory capacity; for the real-time property, the deadline requirement should be met as well.

(i) Memory Constraint. The total amount of memory required by the tasks assigned to a node should not exceed the memory capacity of the node. That is,

$$\sum_{i=1}^{N} m_i x_{ik} \le M_k, \quad k = 1, \ldots, M.$$

(ii) Computation Resource Constraint. The total processing load of the tasks assigned to a node should not exceed the processing capacity of the node. That is,

$$\sum_{i=1}^{N} p_i x_{ik} \le P_k, \quad k = 1, \ldots, M.$$

(iii) Communication Resource Constraint. The total amount of communication routed via a communication link should not exceed the capacity of the link. That is,

$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij} x_{ik} x_{jl} \le L_{kl}, \quad 1 \le k < l \le M.$$

(iv) Deadline Constraint. All tasks should complete execution before their deadlines. Since tasks have no priorities, the tasks assigned to a node can execute in any order, so we must account for the worst case, that is, the case in which task $t_i$ is always executed last. Hence, the deadline constraint is

$$\sum_{k=1}^{M} x_{ik} \sum_{j=1}^{N} x_{jk} e_{jk} \le d_i, \quad i = 1, \ldots, N.$$
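A straightforward feasibility check over these four constraints could look as follows; this is a sketch under the same assumed data layout as the earlier snippet, with m, p, and dl the task memory, load, and deadline vectors, and Mcap, Pcap, and Lcap the node and link capacities (all names are ours):

function ok = is_feasible(a, e, c, m, p, dl, Mcap, Pcap, Lcap)
    [~, M] = size(e);
    ok = false;
    for k = 1:M
        onk = (a == k);                                   % tasks on node k
        if sum(m(onk)) > Mcap(k), return; end             % (i) memory
        if sum(p(onk)) > Pcap(k), return; end             % (ii) processing load
        if any(dl(onk) < sum(e(onk, k))), return; end     % (iv) worst-case deadline
    end
    for k = 1:M-1
        for l = k+1:M
            if sum(sum(c(a == k, a == l))) > Lcap(k, l)   % (iii) link load
                return;
            end
        end
    end
    ok = true;
end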

3.4. Problem Formulation

According to the above discussion, maximizing the reliability of the RTDS is equivalent to minimizing the objective function $Z(X)$ under all of the constraints mentioned before. Hence, we can formulate the TAP as the following combinatorial optimization problem:

$$\begin{aligned}
\min_{X} \quad & Z(X) \\
\text{s.t.} \quad & \sum_{i=1}^{N} m_i x_{ik} \le M_k, \quad k = 1, \ldots, M, \\
& \sum_{i=1}^{N} p_i x_{ik} \le P_k, \quad k = 1, \ldots, M, \\
& \sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij} x_{ik} x_{jl} \le L_{kl}, \quad 1 \le k < l \le M, \\
& \sum_{k=1}^{M} x_{ik} \sum_{j=1}^{N} x_{jk} e_{jk} \le d_i, \quad i = 1, \ldots, N, \\
& \sum_{k=1}^{M} x_{ik} = 1, \quad x_{ik} \in \{0, 1\}, \quad i = 1, \ldots, N.
\end{aligned}$$

4. Task Allocation Solution

This section first briefly describes the basic SA algorithm, with a discussion of the cooling schedule, which has a significant effect on its convergence speed; it then presents XASA and explains in detail how it is applied to the TAP as formulated in this paper.

4.1. Basic Simulated Annealing Algorithm

The SA algorithm starts from a randomly chosen initial solution and generates a series of Markov chains as the control parameter (i.e., the temperature) descends. In these Markov chains, a new solution is chosen by making a small random perturbation of the current solution; if the new solution is better, it is kept, and if it is worse, it is kept with some probability related to the current temperature and the cost difference between the new and current solutions. Through a series of such iterations, an optimal solution is found. The SA algorithm applied to the TAP is listed as follows.

Step 1. Choose an initial task assignment $X$ at random.

Step 2. Calculate the cost $E(X)$ of $X$.

Step 3. Set the initial solution as the optimal one: $X_{\text{best}} = X$, $E_{\text{best}} = E(X)$.

Step 4. Initialize the temperature: $T = T_0$.

Step 5. Select a neighbor $X'$ of $X$.

Step 6. Calculate the cost $E(X')$ of $X'$.

Step 7. If $E(X') \le E(X)$, then set $X = X'$ and $E(X) = E(X')$; otherwise go to Step 9.

Step 8. If $E(X) < E_{\text{best}}$, then set $X_{\text{best}} = X$ and $E_{\text{best}} = E(X)$; go to Step 10.

Step 9. If $\text{random}(0, 1) < e^{-(E(X') - E(X))/T}$, then set $X = X'$ and $E(X) = E(X')$.

Step 10. Repeat Step 5 to Step 9 for a given number of iterations (the length of the Markov chain, $L$).

Step 11. Reduce the temperature via some cooling function: $T = f(T)$.

Step 12. If the termination condition is satisfied (e.g., the solution has remained unchanged for a given number of iterations), then go to Step 13; otherwise go to Step 5.

Step 13. Output the solution $X_{\text{best}}$.

Note that the neighborhood defines the procedure for moving from one solution point to another [16]. In this paper, a neighbor is obtained by randomly choosing one of the $N$ tasks and replacing its currently assigned node with another randomly selected node.
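A minimal MATLAB sketch of this move (assuming at least two nodes):

function b = random_neighbor(a, M)
% Pick a random task and reassign it to a different, randomly chosen node.
    b = a;
    i = randi(numel(a));             % random task
    k = randi(M - 1);                % random node index excluding the current one
    if k >= b(i), k = k + 1; end     % skip over the currently assigned node
    b(i) = k;
end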

4.2. Cooling Schedule

It has been shown that the SA algorithm converges to the global optimum with probability 1 [41], provided the cooling schedule is sufficiently slow (i.e., a sufficiently hot initial temperature, a sufficiently low final temperature, and a sufficiently slow cooling speed). However, such a slow cooling schedule may lead to an unacceptably long convergence time, which can be exponential.

The cooling schedule is a set of parameters that controls the procedure of the SA algorithm so that it converges asymptotically to a suboptimal solution in reasonable time. It is made up of the following parameters.

(i) The Initial Value of the Control Parameter (i.e., Temperature) $T_0$. The initial temperature is one of the most important parameters of the SA algorithm. If it is very high, the algorithm takes a very long time to converge; on the other hand, poor solutions are obtained if it is low. A basic principle for choosing the initial temperature is that the acceptance probability of worse solutions should be close to 1 at the start, which means that neighboring solutions should be exchanged almost freely at first. Hence, we can determine the initial temperature via an initial acceptance probability $P_0$ of worse solutions, which is set close to 1 in this paper. A worse solution is accepted with probability $e^{-\Delta E / T}$, where $\Delta E$ is the cost difference between neighboring solutions, so $T_0 = -\Delta E / \ln P_0$. We can use $E_{\max} - E_{\min}$ as an estimate of $\Delta E$, where $E_{\max}$ and $E_{\min}$ are the maximum and minimum of the energy function among a number of randomly chosen solutions. So the initial temperature is determined by the following formula:

$$T_0 = \frac{E_{\min} - E_{\max}}{\ln P_0}.$$
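A sketch of this estimate in MATLAB, where energy is the energy function of Section 4.4, randomSolution a generator of random assignments, and P0 the chosen initial acceptance probability (all names are ours):

function T0 = initial_temperature(energy, randomSolution, nSamples, P0)
% Choose T0 so that a worse move of size (Emax - Emin) is accepted
% with probability P0: exp(-(Emax - Emin)/T0) = P0.
    E = zeros(nSamples, 1);
    for s = 1:nSamples
        E(s) = energy(randomSolution());
    end
    T0 = (min(E) - max(E)) / log(P0);   % log(P0) < 0 for P0 < 1, hence T0 > 0
end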

(ii) The Cooling Function $f(T)$. The cooling function defines how the temperature $T$ decreases. A commonly used type is the exponential descent function $f(T) = \alpha T$, where $\alpha$ is a constant and $0 < \alpha < 1$; the cooling speed thus depends on the parameter $\alpha$, which we call the cooling factor. A large value of $\alpha$ represents slow cooling, which yields good solutions but is expensive in time. In most of the reported literature, $\alpha$ is set slightly less than 1, and it is chosen likewise in this paper.

(iii) The Final Value of the Control Parameter $T_f$, in Other Words, the Termination Condition. The criterion for termination can be either a final temperature or a steady state of the system. The former controls the total calculation time but not the solution quality: a given final temperature may introduce redundant calculation for a small-scale problem yet yield a poor solution for a large-scale one. The latter can take both time and quality into account. In this paper, the SA algorithm terminates if the solution remains unchanged (neither improving nor worsening) for a given number of iterations. Furthermore, the validity of the final solution must be satisfied as well, which will be discussed later.

(iv) The Length of Markov Chains $L$. This is the number of inner-loop repetitions. It is chosen to be $N(M-1)$, the size of the solution neighborhood, because each of the $N$ tasks can be reassigned to any of the other $M-1$ nodes.

The cooling schedule has a significant effect on the results of the algorithm, especially on the convergence speed. Besides, the initial solution of the SA algorithm affects the convergence speed as well.

4.3. Chaotic Optimization Algorithm

The chaotic variables are produced by the following well-known one-dimensional logistic map:

$$z_{k+1} = \mu z_k (1 - z_k),$$

where $z_k \in (0, 1)$ and the control parameter $\mu = 4$, for which the map is fully chaotic. The logistic map has special characteristics such as ergodicity, stochastic-like behavior, and sensitive dependence on initial conditions. The chaotic optimization algorithm applied in this paper is listed as follows.

Step 1. Initialize the chaotic vector $(z_1, \ldots, z_N)$ at random; note that the value of a chaotic variable cannot be 0, 0.25, 0.5, 0.75, or 1.0 (these lead to fixed points of the logistic map), and all chaotic variables should differ from each other.

Step 2. Generate the solution vector $X$ from the chaotic vector via the mapping of Section 4.4, thereby obtaining a task assignment, and calculate the cost function $E(X)$.

Step 3. Set the initial solution as the optimal one: $X_{\text{best}} = X$, $E_{\text{best}} = E(X)$.

Step 4. Calculate a new chaotic vector via the logistic map given above.

Step 5. Generate $X$ as in Step 2 and calculate the cost function $E(X)$.

Step 6. If $E(X) < E_{\text{best}}$, then set $X_{\text{best}} = X$ and $E_{\text{best}} = E(X)$.

Step 7. Repeat Steps 4, 5, and 6 until $E_{\text{best}}$ remains unchanged for a given number of iterations.

Note that the iteration number in Step 7 is chosen to be the same as in the SA algorithm, as discussed before, while the validity of the solution is not required in COA: COA has no heuristic guidance, and it would take a very long time to converge if valid solutions were scarce, as in large-scale cases.
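The chaotic phase can be sketched as follows. The mapping from chaotic variables to node indices anticipates the solution representation of Section 4.4, energy is the same cost function used by the SA phase, and for brevity this sketch keeps only the single best solution, whereas XASA records several good solutions to seed the subsequent SA search:

function [best, bestE] = chaotic_search(N, M, energy, stallLimit)
% Chaotic walk over the solution space driven by the logistic map.
    z = rand(N, 1);                              % distinct values almost surely
    z(ismember(z, [0 0.25 0.5 0.75 1])) = 0.1;   % guard against excluded points
    best = ceil(M * z); bestE = energy(best);
    stall = 0;
    while stall < stallLimit
        z = 4 .* z .* (1 - z);                   % logistic map with mu = 4
        a = min(M, max(1, ceil(M * z)));         % map chaotic variables to nodes
        Ea = energy(a);
        if Ea < bestE
            best = a; bestE = Ea; stall = 0;
        else
            stall = stall + 1;                   % iterations without improvement
        end
    end
end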

4.4. Simulated Annealing Algorithm Combined with Chaos

The basic idea of our proposed algorithm is simulated annealing combined with chaotic search, with some adaptive schemes added to the cooling schedule, so that the convergence speed is improved without loss of solution quality. The flowchart of XASA is shown in Figure 2.

There are four schemes to speed up the convergence of the SA algorithm.

First, we apply COA to find optimized solutions; this preliminary search reveals the distribution of solutions in the solution space thanks to the ergodicity of the chaotic system. Hence, the SA algorithm can search for the optimal solution within a relatively smaller range, starting from an optimized initial solution.

Second, the initial temperature can also be smaller based on the results of COA, because we can replace $E_{\max}$ and $E_{\min}$ with the corresponding values over the optimized solutions.

Third, the length of Markov chains is constant in the SA algorithm, while it is adaptive in XASA. The algorithm jumps out of the inner loop if the number of rejections of new solutions exceeds a threshold $\theta$, which is updated at each temperature. Because the initial acceptance probability $P_0$ is set close to 1, the initial value of $\theta$ can be set accordingly small; $\theta$ then grows at each temperature step until it reaches a maximum threshold $\theta_{\max}$, with $\theta_{\max} \le L$. The factor that controls the growth speed of $\theta$ plays a role analogous to that of the cooling factor $\alpha$ and is chosen accordingly in this paper.

Fourth, the cooling factor is adaptive in XASA as well: the more solutions are accepted (both better and worse ones) at the current temperature, the smaller the cooling factor, and vice versa. The rationale is that the high temperature at the beginning of the algorithm generates numerous solution acceptances, so the temperature can be reduced rapidly, whereas fewer solutions are accepted as the temperature cools down, so a slower cooling speed should then be applied, since a careful search is needed. Hence, the cooling factor of XASA is a decreasing function of the acceptance rate $A/\hat{L}$, where $A$ is the number of acceptances in the last inner loop and $\hat{L}$ is the actual length of that Markov chain. Note that $A \le \hat{L} \le L$, so the acceptance rate lies in $[0, 1]$ and $\alpha$ stays within fixed bounds.
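The two adaptive schemes can be sketched together in MATLAB. This is one way to realize the rules stated above, not necessarily the paper's exact formulas; the parameter struct p (fields alphaMin, alphaMax, thetaMax, thetaGrowth) is our own naming:

function [T, theta] = update_schedule(T, theta, nAccept, nMoves, p)
% nAccept acceptances out of nMoves tried in the last inner loop.
    rate = nAccept / nMoves;                         % acceptance rate in [0, 1]
    alpha = p.alphaMax - (p.alphaMax - p.alphaMin) * rate;
    T = alpha * T;                                   % many acceptances => fast cooling
    theta = min(p.thetaMax, p.thetaGrowth * theta);  % rejection threshold grows
end

The inner loop then terminates early once its consecutive rejections exceed theta, which realizes the adaptive Markov chain length.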

To implement the algorithm, some details are presented as follows.

(1) Solution Representation. In this paper, a solution is represented by a vector $X = (s_1, s_2, \ldots, s_N)$; each element corresponds to a task, and its value, between 1 and $M$, denotes the node to which that task is assigned. In order to apply COA, we use a chaotic vector $(z_1, z_2, \ldots, z_N)$ related to $X$, where $z_i \in (0, 1)$. A task assignment of the TAP is generated from the chaotic vector as follows:

$$s_i = \lceil M z_i \rceil, \quad i = 1, \ldots, N.$$

(2) Energy Function. We integrate the objective function and all constraints into a cost function to fit the SA algorithm framework, and this cost function is used as the energy function.

All constraints are formulated as penalty functions as follows:

$$f_1(X) = \sum_{k=1}^{M} \max\left(0, \sum_{i=1}^{N} m_i x_{ik} - M_k\right),$$
$$f_2(X) = \sum_{k=1}^{M} \max\left(0, \sum_{i=1}^{N} p_i x_{ik} - P_k\right),$$
$$f_3(X) = \sum_{k=1}^{M-1} \sum_{l=k+1}^{M} \max\left(0, \sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij} x_{ik} x_{jl} - L_{kl}\right),$$
$$f_4(X) = \sum_{i=1}^{N} \max\left(0, \sum_{k=1}^{M} x_{ik} \sum_{j=1}^{N} x_{jk} e_{jk} - d_i\right).$$

As all constraints are of the same importance, we use a common coefficient $\omega$ for all penalty functions. Hence, the energy function is

$$E(X) = Z(X) + \omega \sum_{q=1}^{4} f_q(X).$$

The criterion for choosing the penalty coefficient is that it should scale the values of the penalty functions to values comparable to that of the objective function, so that the algorithm is driven toward penalty avoidance and a valid solution can be found with high probability. Besides, the validity of a solution can be expressed as $E(X) = Z(X)$, that is, all penalty terms vanish.
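Putting the pieces together, the energy function adds the scaled constraint violations to the objective. A sketch reusing the hypothetical data layout of the earlier snippets (bundled here in a struct D) and the system_cost function sketched in Section 3.2:

function E = energy_fn(a, D, omega)
% D holds e, c, lambda, mu, w, m, p, dl, Mcap, Pcap, Lcap; omega is the
% common penalty coefficient.
    E = system_cost(a, D.e, D.c, D.lambda, D.mu, D.w);
    [~, M] = size(D.e);
    for k = 1:M
        onk = (a == k);
        E = E + omega * max(0, sum(D.m(onk)) - D.Mcap(k));          % memory
        E = E + omega * max(0, sum(D.p(onk)) - D.Pcap(k));          % load
        E = E + omega * sum(max(0, sum(D.e(onk, k)) - D.dl(onk)));  % deadlines
    end
    for k = 1:M-1
        for l = k+1:M
            over = sum(sum(D.c(a == k, a == l))) - D.Lcap(k, l);
            E = E + omega * max(0, over);                           % link load
        end
    end
end
% A solution a is valid iff all penalty terms vanish, i.e., E equals Z alone.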

5. Performance Evaluation

To evaluate the performance of the proposed algorithm, the SA variants and XASA are coded in MATLAB and tested on numerous randomly generated task sets allocated onto an RTDS. Two variations of the SA algorithm are implemented in this paper: the traditional one (SA1) and an improved one (SA2). SA2 applies the last two adaptive schemes of XASA, that is, the adaptive length of Markov chains and the adaptive cooling factor. All other components of SA2 are the same as in SA1, including the initial solution, initial temperature, and termination condition. The computation platform is MATLAB 7.11.0 on an Intel Core i7-2600 @ 3.40 GHz with 16 GB main memory under Windows 7.

5.1. Experiment Parameters Settings

All DS parameters follow the former researches [16, 17, 40]. The failure rates of processors and communication links are given in the ranges [0.00005–0.00010] and [0.00015–0.00030], respectively. The time to process a task on different processors is given in the range [15–25]. The memory requirement of each task and the node memory capacity are given in the ranges [1–10] and [100–200], respectively. The task processing load and node processing capacity are given in the ranges [1–50] and [100–300], respectively. The volume of data to be communicated between tasks is given in the range [5–10]. The bandwidth and load capacity of communication links are given in the ranges [1–4] and [100–200], respectively. The range of task deadline values is [10–200].

The network topology is a star, and several combinations of the task number $N$ and node number $M$ are used as test cases of different scales. The coefficient $\omega$ of the penalty function and the number of solutions randomly chosen at the beginning of the algorithm are fixed across all cases, and the initial solution of the two SA algorithms is one of these randomly chosen solutions, picked at random. Because SA is a stochastic algorithm, each independent run on the same application may yield a different result; we therefore run all three algorithms on each application 10 times and report the average values.

5.2. Experiment Results

Table 1 summarizes the reliability and calculation time of all TAPs solved by XASA and the other two SA algorithms. Columns with the suffix "avg" report averages over the 10 independent runs: reliability in the R columns and calculation time (in seconds) in the T columns. The acceleration ratio of XASA versus SA1 is the ratio of their average calculation times, and the acceleration ratios of XASA versus SA2 and of SA2 versus SA1 are defined analogously; the table also reports the average deviation in percentage between XASA and the SA algorithms in terms of reliability.

The comparative results from Table 1 show that XASA sharply reduces the convergence time compared with the other two SA algorithms, while the solution quality (i.e., reliability) is only slightly worse. Note that the acceleration ratio of XASA versus SA2 is steady, with small variation, which means that the third and fourth adaptive schemes have a consistent effect on the SA algorithm. Furthermore, the acceleration ratio of XASA versus SA1 is steady in all cases except the largest one. In our preliminary experiments, the value of the function $Z(X)$ is always below 2 during the whole algorithm, while the value of the penalty function can be in the hundreds. Besides, according to the experiment parameters set in this paper, there is no significant difference between nodes, nor between tasks. Hence, there are lots of local minima in the solution space when the validity of solutions is not considered. Constraints are easy to satisfy when the problem scale is small, so COA can obtain several valid solutions, and the initial temperature for the SA phase of XASA can be relatively small; therefore, convergence is fast. On the other hand, COA can hardly obtain valid solutions in the large-scale case; if there are insufficient valid solutions, a large initial temperature will be set because of the giant value of the penalty function, which affects the convergence speed significantly. Additionally, constraints are not so easy to satisfy when the problem scale is large, so there are far fewer local minima in the solution space, which slows down convergence as well. All these factors weaken the speedup effect based on COA, so the performance in the last case is not as good.

Table 2 shows some overall statistical characteristics of all three algorithms: the standard deviations of reliability and calculation time, and the percentages of valid solutions found by the COA phase of XASA and by inSA (inSA denotes the SA phase within XASA), with analogous columns for SA1 and SA2. As we can see, XASA is the best in terms of the mean standard deviation of calculation time, and there is no case in which SA1 excels XASA in this criterion; XASA also obtains the best result for the standard deviation of reliability compared to the other two algorithms. Note that the COA phase yields a poor percentage of valid solutions in the large cases, which supports our earlier analysis of the large-scale issue.

5.3. Time Series Analysis

Figure 3 shows the values of the energy function calculated at each iteration of the algorithms for one of the test cases. Note that the plotted values of invalid solutions are clamped to a fixed level, because their real values are too large and would conceal the details of the valid ones.

As we can see, COA cannot guarantee the validity of its solutions; it is completely stochastic, without heuristic guidance. However, it generates a good start for the next step of XASA, as shown by the results of inSA, where all solutions are valid and the convergence speed is quite fast: inSA begins to reach good enough solutions before 2000 iterations. The other two SA algorithms start from a worse condition and spend many more iterations to reach good enough solutions. They both need thousands of iterations to explore the solution space, which generates lots of invalid solutions as COA does, but at a much higher cost. Because of the adaptive schemes, SA2 can quickly pass through invalid solutions, as can be seen in Figure 3. Hence, these schemes are effective.

Figure 4 shows the details of the adaptive cooling schedule schemes, calculated at each cooling step of the algorithms for the same case. The left three plots are the results of inSA, and the right ones are the results of SA2. Note that the temperature in Figures 4(c) and 4(f) is normalized. At the beginning of the cooling steps, the acceptance rates of new solutions are high in both algorithms; hence, the cooling factor is small and the temperature reduces rapidly, while the actual length of the Markov chains is much smaller in inSA than in SA2, and the cooling factor increases faster as well. This is caused by the differences in the initial temperature and initial solution, since inSA and SA2 are otherwise identical. Hence, COA truly improves the convergence speed.

Figure 5 shows the details of the largest case, where the values of invalid solutions are again clamped for display. The result of COA is quite bad: only three valid solutions are found. Therefore, inSA cannot get a good start, and it has to explore a wide solution space at the beginning, which generates lots of invalid solutions.

Figure 6 shows the results of the largest case in the same manner as Figure 4. The cooling factor and temperature curves do not differ much between inSA and SA2, so the COA-based start cannot bring as much benefit as before. This causes the inefficiency observed in the largest case.

6. Conclusions

In this paper, we consider a heterogeneous DS that runs a real-time application and aim to maximize system reliability via task allocation. By formulating the reliability and the constraints, we model this problem as a combinatorial optimization problem. To solve it with fast convergence, we improve the well-known simulated annealing algorithm based on an analysis of its cooling schedule, which has a significant effect on the convergence speed. We then propose the algorithm XASA, which combines the SA algorithm with the chaotic optimization algorithm and several adaptive schemes. The experimental results show that the proposed algorithm achieves a satisfactory speedup, while the solution quality is only slightly worse.

Notations

$N$: Number of tasks
$M$: Number of nodes
$n_k$: Node $k$
$t_i$: Task $i$
$l_{kl}$: Communication link between $n_k$ and $n_l$
$x_{ik}$: Whether or not $t_i$ is assigned to $n_k$
$e_{ik}$: Execution time of $t_i$ on $n_k$
$c_{ij}$: Communication cost between $t_i$ and $t_j$
$L_{kl}$: Communication capacity of $l_{kl}$
$w_{kl}$: Bandwidth of $l_{kl}$
$\lambda_k$: Failure rate of $n_k$
$\mu_{kl}$: Failure rate of $l_{kl}$
$m_i$: Memory required by $t_i$
$M_k$: Memory capacity of $n_k$
$p_i$: Processing load of $t_i$
$P_k$: Processing capacity of $n_k$
$d_i$: Deadline of $t_i$.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors thank the anonymous referees and the editor for their valuable comments and suggestions. This work is supported by the National Natural Science Foundation of China (Grant no. 61374185).