Da-Ren Chen, Mu-Yen Chen, You-Shyang Chen, Lin-Chih Chen, "A Work-Demand Analysis Compatible with Preemption-Aware Scheduling for Power-Aware Real-Time Tasks", Mathematical Problems in Engineering, vol. 2013, Article ID 581570, 16 pages, 2013. https://doi.org/10.1155/2013/581570
A Work-Demand Analysis Compatible with Preemption-Aware Scheduling for Power-Aware Real-Time Tasks
Due to the importance of slack time utilization for power-aware scheduling algorithms, we propose a work-demand analysis method called the parareclamation algorithm (PRA) to increase the slack time utilization of existing real-time DVS algorithms. PRA is an online scheduling method for power-aware real-time tasks under the rate-monotonic (RM) policy. It can be implemented in, and is fully compatible with, preemption-aware or transition-aware scheduling algorithms without increasing their computational complexity. The key technique of this heuristic doubles the analysis interval and turns the deferrable workload into potential slack time. Theoretical proofs show that PRA guarantees the task deadlines in a feasible RM schedule and has linear time and space complexity. Experimental results indicate that the proposed method, combined seamlessly with preemption-aware methods, reduces energy consumption by 14% on average compared with the original algorithms.
Power management is increasingly becoming a design factor in portable and hand-held computing/communication systems. Energy minimization is critically important for devices such as laptop computers, smartphones, PDAs, wireless sensor networks (WSNs), and other mobile or embedded computing systems simply because it leads to extended battery lifetime. The power consumption problem has been addressed in the last decade with a multidimensional effort by the introduction of engineering components and devices that consume less power, low power techniques involving the designs of VLSI/IC, computer architecture, algorithm, and compiler developments.
Recently, dynamic power management (DPM) and dynamic voltage scaling (DVS) have been employed as techniques to reduce the energy consumption of CMOS microprocessor systems. DPM changes the power state of the cores on a chip to lower the energy consumption according to the performance constraints. DVS dynamically adjusts the voltage and frequency (hence, the CPU speed). By reducing the frequency at which a component operates, a specific operation consumes less power but may take longer to complete. Although reducing the frequency alone reduces the average power used by a processor over that period of time, it may not always reduce the overall energy consumption, because energy consumption depends linearly on the execution time and quadratically on the supply voltage. In the context of dynamic voltage scaled processors, DVS in real-time systems is the problem of assigning appropriate clock speeds to a set of periodic tasks and adjusting the voltage accordingly, such that no task misses its predefined deadline while the total energy saving in the system is maximized.
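The trade-off described above can be illustrated with the commonly used first-order CMOS model, in which dynamic power is proportional to the effective capacitance, the square of the supply voltage, and the clock frequency. The following is a minimal sketch under that assumption; the function names and the normalized constant `c_eff` are illustrative and are not taken from the paper.

```python
import math

def dynamic_power(voltage, frequency, c_eff=1.0):
    """First-order CMOS dynamic power: P = C_eff * V^2 * f (assumed model)."""
    return c_eff * voltage ** 2 * frequency

def energy_for_cycles(cycles, voltage, frequency, c_eff=1.0):
    """Energy = power * time, where time = cycles / frequency."""
    execution_time = cycles / frequency
    return dynamic_power(voltage, frequency, c_eff) * execution_time

# Lowering only the frequency leaves the energy per cycle unchanged.
e_full = energy_for_cycles(1_000_000, voltage=1.0, frequency=1.0)
e_slow_clock_only = energy_for_cycles(1_000_000, voltage=1.0, frequency=0.5)

# Lowering the voltage together with the frequency saves energy quadratically.
e_scaled = energy_for_cycles(1_000_000, voltage=0.5, frequency=0.5)
```

Under this model the frequency cancels out of the energy expression, so the saving comes from the lower voltage that a lower frequency permits, while the longer execution time is what consumes slack.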
Many studies have proposed real-time scheduling schemes based on different system models [1–9], such as online and offline scheduling, handling discrete/continuous voltage levels, assuming the average-case execution time (ACET), best-case execution time (BCET), or worst-case execution time (WCET) of each task, allowing intratask/intertask voltage transitions, and assuming fixed/dynamic priority assignment. These approaches have a common objective and encounter the same difficulties. Because reducing the supply voltage decreases the clock speed of processors, most DVS algorithms for real-time systems reduce the supply voltage dynamically to the lowest possible level while satisfying the soft/hard timing constraints of each task. To satisfy the timing constraints of real-time tasks, a DVS technique must utilize the available slack time when adjusting voltage/speed levels. Consequently, the energy efficiency of a DVS algorithm depends markedly on how accurately it computes the available slack time.
Work-demand analysis for embedded real-time scheduling has been investigated in previous studies [3, 5–7, 11]. Pillai and Shin proposed a cycle-conserving rate-monotonic (ccRM) scheduling scheme that contains offline and online algorithms. The offline algorithm computes the WCET of each task and derives the maximum speed needed to meet all task deadlines. The online algorithm recomputes the utilization by comparing the actual time for completed tasks with the WCET schedule. In other words, when a task completes early, the processor cycles actually used are compared with a precomputed worst-case execution time schedule. This WCET schedule is also called the canonical schedule, whose length could be the least common multiple of the task periods. ccRM is a conservative method, as it only considers possible slack time before the next task arrival (NTA) of a current job. Gruian proposed a DVS method for offline task stretching and online slack distribution. The offline part of this method consists of two separate techniques. The first focuses on intratask stochastic voltage scheduling that employs a task-execution length probability function. The second computes stretching factors by using a response time analysis. It is similar to Pillai and Shin's offline technique, but instead of adopting one stretching factor for all tasks before the NTA, Gruian assigns a different stretching factor to each individual task within the longest task period. Kim et al. proposed a greedy online algorithm called the low-power work-demand analysis (lpWDA) that derives slack from low-priority tasks, as opposed to the methods in [3, 7] that gain slack time from high-priority tasks. This algorithm also balances the gap in voltage levels between high-priority and low-priority tasks. Its analysis interval, which is bounded by the longest task period, is longer than the interval up to the NTA. Thus, lpWDA saves more energy than the previous rate-monotonic (RM) DVS schemes that apply the NTA.
Many slack time analysis methods consider additional assumptions [4, 11, 12]. Kim et al. proposed a preemption-aware DVS algorithm based on lpWDA, which is composed of accelerated-completion (lpWDA-AC) and delayed-preemption (lpWDA-DP) techniques to decrease the number of preemptions in DVS schedules. lpWDA-AC attempts to avoid preemption by adjusting the voltage/clock speed such that it is higher than the lowest possible value computed using lpWDA. lpWDA-DP postpones preemption points by delaying an activated high-priority task as late as possible while guaranteeing a feasible task schedule. Under the assumption of context-switching overhead, both techniques reduce energy consumption more than the original ccRM and lpWDA techniques. Mochocki et al. also proposed a transition-aware DVS algorithm for decreasing the number of voltage/speed adjustments, called the low-power limited-demand analysis with transition overhead (lpLDAT) scheme, which accounts for both time and energy transition overhead. The algorithm computes an efficient speed level based on the average-case workload; notably, this speed can be used as a limiter. If the limiter is higher than the speed predicted by lpWDA, lpLDAT concludes that lpWDA is being too aggressive and applies the limiter to the present schedule. Under the assumption of transition overhead, this technique with slack time analysis also saves considerable energy compared with the previous methods. He and Jia developed a fixed-priority scheduling with threshold (FPPT) scheme that eliminates unnecessary context switches, thereby saving energy. FPPT assigns each task a pair consisting of a predefined priority and a corresponding preemption threshold. He et al. applied a novel algorithm to compute static slowdown factors by formulating the problem as a linear optimization problem. In addition, they considered the energy consumption of a task set under different preemption threshold assignments.
Recently, experimental results obtained by Kim et al. indicated that recent DVS algorithms for fixed-priority real-time tasks are less efficient than those for dynamic-priority tasks, which leaves room for a better DVS method. The main reason for the energy inefficiency of RM DVS scheduling is that, in RM schedules, priority-based slack-stealing methods do not work as efficiently as they do in earliest-deadline-first (EDF) scheduling. In EDF schedules, high-priority tasks act as efficient slack distributors because their slack can be utilized fully by tasks starting before the NTA. Therefore, the energy saving achieved by EDF scheduling algorithms, such as ccEDF, DRA, and AGR, is close to the theoretical lower bound.
So far, a large number of studies have addressed DVS-based RM scheduling for energy saving [1–4, 6–8, 11–14]; most of them aim to compute and predict the length and occurrence of slack time, because the more precisely the slack time is estimated, the more energy can be saved. These methods for computing available slack time either construct a canonical schedule and compare it with the current schedule or propose best-effort algorithms based on empirical rules and heuristics. Methods that adopt different strategies and assumptions, such as task preemption or voltage transition time, on similar models gain considerable energy savings, but few of them can be combined without difficulty to further enhance their performance. Additionally, a modern processor with a DVS or DPM feature must be equipped with a dc-to-dc converter that varies the processor speed among the appropriate levels and requires additional switching time and power. Many fragments of short slack time are therefore harmful to power saving in a system. Many of these methods also propose postponing and advancing task execution to increase the length of slack time. Their performance in accumulating continuous slack time is not impressive because of the short analysis interval adopted in the schedules. Therefore, it is necessary to study a transplantable method that can cooperate with different existing methods without modification. This idea originates from the layered architecture used in designing computer software, hardware, and communications, in which system or network components are isolated in layers so that changes can be made in one layer without affecting the others. Following this notion, the proposed method must also be able to compute and accumulate slack time on its own. By applying the layered architecture, it can also pass the slack time to lower-layer methods and produce a synergy effect that enhances the overall energy saving.
In this paper, we propose an online work-demand analysis called the parareclamation algorithm (PRA) for RM scheduling, which computes the length of potential slack in an interval twice as long as the longest task period. PRA does not rely on simulation with stochastic data, which usually vary across applications, and can be applied to many RM scheduling algorithms with various criteria. Moreover, the proposed algorithm has a time complexity of O(n), where n is the number of tasks. In other words, it does not increase the computational complexity of the existing online RM scheduling algorithms. Experimental results indicate that existing RM DVS algorithms combined with the proposed method can reduce energy consumption by 5%–21% compared with the original algorithms such as lpWDA and lpLDAT.
The remainder of this paper is organized as follows. Section 3 introduces the preliminaries of power-aware real-time scheduling. Section 4 introduces our technique and algorithm. Section 5 provides theorems to prove the schedulability of PRA as well as lpWDA. We present the performance evaluation in Section 6. Section 7 gives conclusions and the directions for future work.
This paper focuses on how to obtain additional slack for existing RM DVS scheduling methods. Many slack time analysis techniques with different purposes (e.g., transition-aware and preemption-aware schemes) can utilize PRA easily; throughout this paper, these techniques are called host algorithms of PRA. This section also outlines the ideas underlying the lpWDA algorithm. Other techniques, such as lpLDA, lpWDA-AC, lpWDA-DP, and lpLDAT, are described only briefly.
3.1. System Model
This paper considers preemptive hard real-time systems in which periodic real-time tasks are scheduled under an RM scheduling policy. The DVS processor used in the model operates at a finite set of supply voltage levels , each with an associated speed. Processor speed is normalized by corresponding to , yielding a set of speed levels. A set of periodic tasks is denoted by , where the tasks are assumed mutually independent. Each task is described by its worst-case execution cycles and average-case execution cycles (). Throughout this paper, the execution cycles of each task are called work for short. Additionally, each task has a shorter period length (i.e., a higher priority) than that of when , and is the longest of task periods. The relative deadline of is assumed equal to its period length . Each task is invoked periodically by a job, and the th job of task is . The first job of each task is assumed activated at time . Each job is described by a release time, , deadline, , and number of cycles that have been executed . The utilization of a task set is denoted by . During run time, we refer to the earliest job of each task not completed as the current job for that task, and that job is indexed with cur. The deadline of the current job for task is , and denotes the number of cycles that the current job of has executed.
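The task model above can be written down as a small data structure. This is a minimal sketch, assuming integer periods and cycle counts and a normalized full speed of one cycle per time unit; the names `Task`, `rm_order`, and `utilization`, and the sample task set, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Task:
    period: int  # period length; the relative deadline equals the period
    wcec: int    # worst-case execution cycles at normalized full speed
    acec: int    # average-case execution cycles (assumed acec <= wcec)

def rm_order(tasks):
    """Sort tasks by period: a shorter period means a higher RM priority."""
    return sorted(tasks, key=lambda task: task.period)

def utilization(tasks):
    """Worst-case utilization: the sum of wcec / period over all tasks."""
    return sum(Fraction(task.wcec, task.period) for task in tasks)

# Hypothetical task set used only for illustration.
TASKS = [Task(period=6, wcec=1, acec=1),
         Task(period=3, wcec=1, acec=1),
         Task(period=4, wcec=1, acec=1)]
```

Exact fractions are used for the utilization so that comparisons against bounds are not affected by floating-point rounding.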
Without loss of generality, when is the first scheduled task after time , where , the bottleneck (shortened to bn) is the next release time of (i.e., the ). In the work-demand analysis method, available slack in the interval [bn, ) is estimated.
3.2. Low-Power Work-Demand Analysis (lpWDA)
This section briefly introduces an online DVS scheme called lpWDA . Notations , , and belong to PRA algorithm and are presented in Section 4. In line 2 of Algorithm 1, is an infinitesimal, and readyQ contains the currently activated tasks, and its subset, , containing the active tasks is In the lpWDA, the tasks in are scheduled according to RM priority policy. When a task is activated (released), its job is moved to , and the remaining WCET of this job is set to , which is . When is executed at time , is the amount of work required to be processed in [).
In Algorithms 1, 2, and 3 and Procedure 1, lpWDA performs in the following steps. First, the system is initialized by setting the initial upcoming deadlines () and remaining worst-case execution () of each task. When is active at time , notation of each task is defined as follows : where is the infinitesimal. The jobs which are active during will be examined for slack estimation. denotes the estimation of higher-priority work that must be executed before (lines 1-2). Whenever a job is completed or preempted at time , the remaining work , upcoming deadline , and high-priority work are updated in line 4. In lines 5–8, when a job is scheduled for execution at time , Algorithm 2 computes the available slack for according to and (see lines 13 and 14), where is the earliest upcoming deadline with respect to . Notably, function computing the amount of low-priority work is performed recursively until it finds with the longest of task periods and lowest priority with respect to . As defined in Section 3.1, the length of interval [0, bn) is . Then, lpWDA computes the length of slack-time stealing from low-priority tasks in the interval [,bn) and applies the slack to the current job. Therefore, Algorithms 2 and 3 play crucial roles in slack-time analysis and dominate the run time complexity of lpWDA algorithm. Formally, to describe the slack analysis method using lpWDA, the following notations are defined: : the amount of work required to be processed in interval [, ); : the available slack for scheduled at time can be computed as follows: In (3), consists of three types of work: (1) , (2) from the higher-priority tasks, and (3) from the lower-priority tasks. The work required by higher-priority tasks is derived as follows: where denotes the work required by uncompleted tasks released before , and denotes the work released during . We compute and as follows: where is the infinitesimal. 
According to the above statements, the amount of work required by the scheduled task can be formulated as where notation stands for . Equations (6), (7), and (8) are repeated iteratively until is the lowest priority task in (i.e., ). Conceptually, lpWDA uses this linear-time heuristics to estimate available slack in an interval up to the upcoming deadline of lower-priority tasks.
3.3. Motivational Example
The proposed method is to provide lpWDA-based algorithms (e.g., lpWDA, lpLDAT, lpWDA-DP, and lpWDA-AC) with a subroutine to improve their work-demand analysis. The main advantage is that PRA can be independent of each function-specific slack analysis method. For instance, the main purpose of lpWDA-AC and lpWDA-DP techniques is to decrease context-switch overhead while that of lpLDAT is to reduce transition time and energy overhead. PRA can work together with these lpWDA-based algorithms to enhance their slack computation capability.
Example 1. Consider a periodic task set in Table 1, which presents the period length, WCET, and ACET of each task. Figure 1(a) presents the execution schedule under the worst-case workload in the first hyperperiod. Figure 1(b) shows the speed schedule using the lpWDA algorithm for task set and assumes that the actual work of each task equals its ACET. Before assigning at time , lpWDA computes the available slack time in an interval up to by calling Algorithm 3 recursively. However, interval has no slack time under the WCET schedule. If the length of the analysis interval is extended to , one unit of slack time is derived from . The slack in can be moved backward to the current scheduling point by a deferred execution of earlier work. For instance, in Figure 1(a), the slack in interval can be exchanged with the work in interval , and then the slack in interval can be exchanged with the work in interval , and it can be exchanged once again with the work in interval . Finally, the slack in interval can be exchanged with the work in interval . Therefore, is scheduled with speed (Figure 1(c)). Additional slack can be reclaimed without missing deadlines from an interval that is two times longer than the longest task period. Notably, this idea actually neither moves all of the jobs of a schedule to (e.g., ) nor exchanges the slack with work for using this slack time. However, this primitive idea does not work in some situations. For example, in Figure 1(d), when is increased to 6, the slack in the interval cannot be transferred before . In fact, jobs , and are released simultaneously at time 6. The slack in interval cannot follow this idea, because a deadline is likely to be missed by one of those three jobs. Our goal is to devise an efficient work-demand analysis method that obtains additional slack while satisfying the tasks' deadlines.
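To see where slack sits in a WCET schedule, one can simulate preemptive RM scheduling in unit time slots and list the idle slots. The sketch below is not PRA itself and does not use the values of Table 1, which are not reproduced here; the task set and the function name `rm_idle_slots` are hypothetical, and all tasks are assumed to be released first at time 0.

```python
def rm_idle_slots(tasks, horizon):
    """Simulate preemptive RM in unit slots and return the idle slots.

    tasks: list of (period, wcet) pairs with integer values.
    horizon: the simulation covers the slots 0 .. horizon - 1.
    """
    tasks = sorted(tasks)           # shorter period first = higher priority
    remaining = [0] * len(tasks)    # unfinished work of each current job
    idle = []
    for t in range(horizon):
        for i, (period, wcet) in enumerate(tasks):
            if t % period == 0:     # a new job of task i is released
                if remaining[i] > 0:
                    raise ValueError(f"deadline miss at t={t}")
                remaining[i] = wcet
        for i in range(len(tasks)):
            if remaining[i] > 0:    # run the highest-priority ready job
                remaining[i] -= 1
                break
        else:
            idle.append(t)          # no job is ready: this slot is slack
    return idle

EXAMPLE = [(3, 1), (4, 1), (6, 1)]         # hypothetical (period, wcet) pairs
slack_one_period = rm_idle_slots(EXAMPLE, 6)
slack_two_periods = rm_idle_slots(EXAMPLE, 12)
```

For this hypothetical task set, a window of one longest period contains a single idle slot, whereas a window twice as long contains three. Whether such later slack can actually be moved back to the current scheduling point depends on the release pattern, which is the question the example above raises.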
4. Work-Demand Computation
Let be the bn of , which is the first scheduled job at time where . PRA computes the length of additional slack in the interval [bn, ). As long as the slack time can be reclaimed at a time earlier than bn, lpWDA can utilize it by postponing lower-priority tasks and thereby improve the energy efficiency of schedules. Why does PRA focus on slack computation in the interval [bn, ) rather than in a longer or shorter interval? Even if all job (except ) periods are within [bn, ) and cannot make a target slack available to the task to the right of bn, job can still postpone its work to move the slack forward toward bn. For example, in Figure 1(a), when the period length of is increased from 3 to 4, the slack in interval cannot be reclaimed by postponing the work of or because it is hampered at time 8. Therefore, can defer its work and the slack time in will be available. On the contrary, if one extends the additional analysis interval such that it is longer than or even several times , job cannot move the slack after toward the bn, and the slack may be blocked in this interval. An analytical interval whose length is equal to has the following advantages. After deriving the amount of slack time that will be available to the tasks near bn, those jobs whose periods span the bn can be deferred to reclaim additional slack before bn. That is, the current job can utilize the additional slack by performing an lpWDA-based method. Notably, in an actual scheduling process, PRA does not exchange any work with slack. Instead, it only passes the length of additional slack time for the current job to lpWDA and does not affect the schedulability of subsequent jobs.
To present the proposed method, we define the following notations: where denotes the number of tasks in the set of , , and is the task with the longest period in . A set of tasks is called synchronous at time if their jobs are released at time . In an extended analysis interval [bn, ), the number of synchronization points of the tasks in can be derived as follows: where denotes the least common multiple of the task periods in . As shown in Figure 2, the first synchronization point of within the interval [bn, ) is derived as When , slack time is likely to be blocked or shrunk at time . In Figure 3, when all tasks except are synchronized at time , a slack may not be moved backward from the right to the left side of . In this case, slack can still be moved to the current time by postponing the execution of the work of . When tasks are synchronized in interval , we can derive , and their earliest synchronization point is derived by . The worst-case execution time in interval is Therefore, the available slack for in this interval is at least Similarly, when and , there are tasks that synchronize at time . The available slack time for in interval is at least Therefore, if of tasks are synchronized in interval , the minimal available slack time for in interval is denoted as where denotes the estimated slack in interval . For example, does not synchronize with other tasks in (Figure 4). Therefore, one can compute the value of for each , where and . Suppose , the earliest synchronization point of tasks in is derived using (11).
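The notion of synchronization points can be sketched generically. Assuming that every task is first released at time 0, the tasks of a subset are released simultaneously exactly at the multiples of the least common multiple of their periods. The helper names below are hypothetical, and the code is a plain restatement of that fact rather than the paper's own equations, which are not reproduced here. It requires Python 3.9 or later for `math.lcm`.

```python
import math

def first_sync_point(periods, start):
    """Earliest time >= start at which all given tasks release together.

    Assumes every task is first released at time 0, so common releases
    occur at multiples of the least common multiple of the periods.
    """
    hyper = math.lcm(*periods)
    return -(-start // hyper) * hyper   # integer ceiling of start / hyper

def count_sync_points(periods, start, end):
    """Number of common release times in the half-open interval [start, end)."""
    hyper = math.lcm(*periods)
    return -(-end // hyper) - -(-start // hyper)

# Hypothetical periods 3 and 4, analysis window [6, 18).
sync_time = first_sync_point([3, 4], 6)
sync_count = count_sync_points([3, 4], 6, 18)
```

A synchronization point inside the analysis window marks a time at which several jobs demand the processor at once, which is where slack is most likely to be blocked.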
After deriving available slack time within interval where , we compute the length of the slack time which is available for the task in interval . We assume denotes a set of tasks in which task periods go astride the bn. Let ; the lengths of left and right parts of split by bn are defined as and , respectively, and the longest and are defined as and , respectively. Additionally, we define as the total amount of work in . As shown in Figure 3, the lengths of , , and limit the maximum length of slack that can be moved in interval [, bn). Consequently, the restriction on the length of slack time is as follows: According to the work demand in a WCET schedule, the slack time in interval [) is computed as follows: PRA computes the length of additional slack time within interval [bn, ) by (17). It then computes the length of this slack time that can be available for the jobs in interval [, bn) according to (17) and (18). Finally, it changes the priority of a job that goes astride the bn when this job is moved to readyQ according to RM scheduling. In line 1 of Procedure PRA, denotes an infinitesimal value.
Example 2. Consider the WCET schedule shown in Figure 1(a), is scheduled at time , we set and because the period of goes astride bn. Procedure PRA computes the length of available slack time from interval as follows. When task set , Procedure PRA computes . Therefore, the bottleneck caused by and is and , respectively. Line 6 derives and . Equations (14)–(17) derive . In line 10, the value of , , , and is 1, 1, 1, and 2, respectively. The value of is 1 by line 12. Therefore, Procedure PRA returns to the lpWDA algorithm and passes additional slack to CalcLowerPriorityWork() in Algorithm 3. Notably, the tasks using PRA still execute under RM priority policy except one of the jobs whose periods span astride the bn. At time , when jobs , , and enter at time , has the highest priority and utilizes additional slack estimated by PRA. Therefore, job obtains one unit of time of slack and changes its voltage level from 1 to 0.5. On the contrary, if primitive lpWDA performs at time , cannot obtain any slack. When lpWDA executes iteratively, the value of does not change until is completed. Figure 1(c) presents the scheduling result obtained using Procedure PRA. After completing , unit of slack has been run out, primitive lpWDA continuously performs voltage scaling on the subsequent jobs of . In the case of , it begins after () and obtains one unit of slack time from primitive lpWDA. Therefore, its WCET under voltage is changed to , and actual execution time is . At time , job is released and moved to . Its priority is changed to and lower than the remaining execution time of by executing line 14 in Procedure PRA. Therefore, job begins its work after completing the remaining work of . Notably, PRA only changes job’s priority in and does not affect the feasibility of lpWDA schedule. The correctness proof is discussed later in the next section. Table 2 shows the values of scheduling parameters. The rightmost job in is being executed at that time. 
In Algorithm 1, job is a global variable. Whenever job executes and , Procedure PRA lowers its priority to guarantee the timing constraint of jobs and .
5. Correctness Proof
In this section, we prove the correctness of the schedules produced by lpWDA and PRA based on worst-case response time (WCRT) analysis, assuming that the given task sets are feasible under preemptive RM scheduling. For fixed-priority preemptive scheduling, a critical instant for a task is given by a moment in which the release time of coincides with the release times of all higher-priority tasks. Let denote the WCRT of ; without loss of generality, the higher-priority tasks have simultaneous release time with the job of .
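For reference, the classical WCRT of a task under fixed-priority preemptive scheduling at the critical instant can be computed with the standard response-time iteration, in which the response time is the task's own WCET plus the interference from all higher-priority releases within it. The following is a minimal sketch of that textbook iteration, not of the lpWDA or PRA proofs; the function name `wcrt` and the task sets are hypothetical.

```python
def wcrt(tasks):
    """Worst-case response times under preemptive RM scheduling.

    tasks: list of (period, wcet) pairs with integer values; the deadline
    of each task is assumed equal to its period.
    Returns one entry per task in RM priority order; the entry is None
    when the iteration exceeds the deadline.
    """
    tasks = sorted(tasks)  # shorter period first = higher priority
    results = []
    for i, (period, wcet) in enumerate(tasks):
        response = wcet
        while True:
            # Interference: every higher-priority task releases
            # ceil(response / period_j) jobs within the response time.
            interference = sum(-(-response // p) * c for p, c in tasks[:i])
            updated = wcet + interference
            if updated > period:
                results.append(None)      # deadline would be missed
                break
            if updated == response:
                results.append(response)  # fixed point reached
                break
            response = updated
    return results

feasible = wcrt([(3, 1), (4, 1), (6, 1)])
infeasible = wcrt([(2, 1), (3, 2)])
```

A task set is feasible under RM in this model exactly when every entry is a number no greater than the corresponding period; any slack handed to a high-priority job lengthens the response times of lower-priority jobs, which is why the proofs below bound the slack against these values.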
Lemma 3. When a task set contains only one task , the available slack produced by lpWDA for is
In lpWDA, the slack derived from lower-priority task is given to the highest priority job in the readyQ. In the WCET case, after applying the slack to the highest priority job, the execution cycles of lower-priority jobs are postponed and their WCRTs will be increased in the length equal to that slack.
Lemma 4. When a task set contains tasks where , the amount of work required to be processed in () for the highest priority job is
Proof. Assuming that contains tasks where , we prove this lemma from the lowest priority task (i.e., ) to the highest priority task using mathematical induction. The case of is proved separately because the third term of is different from those with in (22). From (8), when is the lowest priority task (i.e., ), the workload of the tasks whose priorities are lower than is zero (i.e., ). Therefore, the amount of work required to be processed at time is
Because before completing , we add () to (23) and derive
and this completes the proof of .
Basic Step. First, we discuss the value of which can be estimated under two cases: (Case 1) and (Case 2).
Case 1 . If is later than , is greater than , that is, the work in is the subset of the work in . Let , if , amount of work can be processed in [, ], and only is required to be processed before . Otherwise, amount of work should be processed before . That is, where the notation stands for . Substituting (23) in the above equation, we get Substituting this result in defined in (8), we obtain
Case 2 . When is earlier than , must be processed before . However, the value of does not change, and the value of is obtained as follows:
Inductive Step. When task is considered, the value of can be obtained as follows: When we consider the task with the highest priority (i.e., ), the amount of work of its lower-priority task is Substituting in (30), we get When all tasks release at time , we have . Therefore, we get By (8), substituting (32) in , we get The proof of Case 2 is similar to that of Case 1 and this completes the proof.
Lemma 5. The length of slack, that is, provided by lpWDA for the highest priority task in readyQ is at most
The following theorem proves the schedulability of lpWDA by using worst-case response time analysis. We consider each active job in readyQ has a simultaneous release with all higher priority tasks.
Theorem 6. Given that a set of tasks is feasible in an RM schedule, the maximum response time of task under lpWDA is less than or equal to its deadline.
Proof. Assume that job has the highest priority in readyQ. By (8), we get at time . Due to (3) proposed by lpWDA, the deadline of can be guaranteed by when . Therefore, we get When runs out , all subsequent jobs of have to postpone their response times by at most units of time compared with those in their WCET RM schedule. Assuming that has lower priority than , we prove that the length of the new WCRT of including is less than . In a feasible RM schedule, the WCRT of is The new WCRT of considering the length of is denoted as Based on (22) in Lemma 4 and setting , we derive From the definitions of function in (4) and (5), we derive and is the infinitesimal. Because we derive from (40) and complete the proof.
Corollary 7. For some tasks , and , and is not the multiple of these ; the difference between and is formulated as Notably, presents the length of WCRT proposed by lpWDA, and therefore the slack between and