Mathematical Problems in Engineering

Volume 2013, Article ID 581570, 16 pages

http://dx.doi.org/10.1155/2013/581570

## A Work-Demand Analysis Compatible with Preemption-Aware Scheduling for Power-Aware Real-Time Tasks

^{1}Department of Information Management, National Taichung University of Science and Technology, No. 129, Section 3, Sanmin Road, North District, Taichung City 404, Taiwan

^{2}Department of Information Management, Hwa Hsia Institute of Technology, No. 111, Gongzhuan Road, Zhonghe District, New Taipei City 235, Taiwan

^{3}Department of Information Management, National Dong Hwa University, No. 1, Section 2, Da Hsueh Road, Shoufeng, Hualien 97401, Taiwan

Received 4 January 2013; Accepted 13 April 2013

Academic Editor: Yang Tang

Copyright © 2013 Da-Ren Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Due to the importance of slack time utilization for power-aware scheduling algorithms, we propose a work-demand analysis method called the parareclamation algorithm (PRA) to increase the slack time utilization of existing real-time DVS algorithms. PRA is an online scheduling method for power-aware real-time tasks under the rate-monotonic (RM) policy. It is fully compatible with preemption-aware and transition-aware scheduling algorithms and can be combined with them without increasing their computational complexity. The key technique of this heuristic doubles the analysis interval and converts deferrable workload into potential slack time. Theoretical proofs show that PRA guarantees the task deadlines in a feasible RM schedule and has linear time and space complexity. Experimental results indicate that the proposed method, combined seamlessly with preemption-aware methods, reduces energy consumption by 14% on average compared with the original algorithms.

#### 1. Introduction

Power management is increasingly becoming a design factor in portable and hand-held computing/communication systems. Energy minimization is critically important for devices such as laptop computers, smartphones, PDAs, wireless sensor networks (WSNs), and other mobile or embedded computing systems simply because it leads to extended battery lifetime. The power consumption problem has been addressed in the last decade with a multidimensional effort by the introduction of engineering components and devices that consume less power, low power techniques involving the designs of VLSI/IC, computer architecture, algorithm, and compiler developments.

Recently, dynamic power management (DPM) and dynamic voltage scaling (DVS) have been employed as techniques to reduce the energy consumption of CMOS microprocessor systems. DPM changes the power state of on-chip cores to lower the energy consumption according to the performance constraints. DVS involves dynamically adjusting the voltage and frequency (hence, the CPU speed). By reducing the frequency at which a component operates, a specific operation will draw less power but may take longer to complete. Although reducing the frequency alone will reduce the average power drawn by a processor over that period of time, it may not always deliver a reduction in energy consumption overall, because the energy consumption is linearly dependent on the increased execution time and quadratically dependent on the supply voltage. In the context of dynamic voltage scaled processors, DVS in real-time systems is the problem of assigning appropriate clock speeds to a set of periodic tasks and adjusting the voltage accordingly such that no task misses its predefined deadline while the total energy saving in the system is maximized.
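The trade-off described above can be illustrated with a minimal sketch. It assumes the classic CMOS dynamic power model P = C·V²·f and ignores leakage and transition overhead; the capacitance, cycle count, and voltage/frequency pairs are hypothetical values chosen only for illustration and do not come from this paper.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Classic CMOS dynamic power model (assumption): P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def energy_for_cycles(capacitance, voltage, frequency, cycles):
    """Energy = power * execution time, where time = cycles / frequency."""
    exec_time = cycles / frequency
    return dynamic_power(capacitance, voltage, frequency) * exec_time


C = 1.0e-9      # hypothetical switched capacitance (farads)
CYCLES = 1.0e6  # hypothetical workload (cycles)

full = energy_for_cycles(C, 1.0, 1.0e9, CYCLES)                # full speed
half_freq_only = energy_for_cycles(C, 1.0, 0.5e9, CYCLES)      # frequency halved
half_freq_and_volt = energy_for_cycles(C, 0.5, 0.5e9, CYCLES)  # both halved

print(full, half_freq_only, half_freq_and_volt)
```

Under this simplified model, halving only the frequency halves the power but doubles the execution time, so the energy for the same workload is unchanged; the saving comes from lowering the voltage together with the frequency.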

Many studies have proposed real-time scheduling methods based on different system models [1–9], such as online and offline scheduling, handling discrete/continuous voltage levels, assuming average-case execution time (ACET), best-case execution time (BCET), or worst-case execution time (WCET) of each task, allowing intratask/intertask voltage transitions, and assuming fixed/dynamic priority assignment. These approaches have a common objective and encounter the same difficulties. Because reducing the supply voltage decreases the clock speed of processors [10], most DVS algorithms for real-time systems reduce the supply voltage dynamically to the lowest possible level while satisfying the soft/hard timing constraints of each task. To satisfy the timing constraints of real-time tasks, a DVS technique must utilize available slack time when adjusting voltage/speed levels. Consequently, the energy efficiency of a DVS algorithm depends markedly on the accuracy of computing the available slack time.

Work-demand analysis on embedded real-time scheduling has been investigated by previous studies [3, 5–7, 11]. Pillai and Shin [7] proposed a cycle-conserving rate-monotonic (ccRM) scheduling scheme that contains offline and online algorithms. The offline algorithm computes the WCET of each task and derives the maximum speed needed to meet all task deadlines. The online algorithm recomputes the utilization by comparing the actual time for completed tasks with the WCET schedule. In other words, when a task completes early, the processor cycles actually used are compared with a precomputed worst-case execution time schedule. This WCET schedule is also called the *canonical schedule* [1], whose length could be the least common multiple of the task periods. ccRM is a conservative method, as it only considers possible slack time before the next task arrival (NTA) of a current job. Gruian proposed a DVS method for offline task stretching and online slack distribution [3]. The offline part of this method consists of two separate techniques. The first focuses on intratask stochastic voltage scheduling that employs a task-execution length probability function. The second computes stretching factors by using a response time analysis. It is similar to Pillai and Shin’s offline technique, but instead of adopting one stretching factor for all tasks before NTA, Gruian assigns a different stretching factor to each individual task within the longest task period. Kim et al. [6] proposed a greedy online algorithm called the low-power work-demand analysis (lpWDA) that derives slack from low-priority tasks, as opposed to the methods in [3, 7] that gain slack time from high-priority tasks. This algorithm also balances the gap in voltage levels between high-priority and low-priority tasks. Its analysis interval, which is limited by the longest of the task periods, is longer than NTA. Thus, lpWDA achieves more energy saving than the previous rate-monotonic (RM) DVS schemes applying NTA.
Many slack time analysis methods considered additional assumptions [4, 11, 12]. Kim et al. proposed a preemption-aware DVS algorithm based on lpWDA, which is composed of *accelerated-completion* (lpWDA-AC) and *delayed-preemption* (lpWDA-DP) techniques to decrease the number of preemptions in DVS schedules [11]. lpWDA-AC attempts to avoid preemption by adjusting the voltage/clock speed such that it is higher than the lowest possible values computed using lpWDA. lpWDA-DP postpones preemption points by delaying an activated high-priority task as late as possible while guaranteeing a feasible task schedule. Both techniques reduce energy consumption more than the initial ccRM and lpWDA techniques on the assumption of context-switching overhead. Mochocki et al. [12] also proposed a transition-aware DVS algorithm for decreasing the number of voltage/speed adjustments, called the low-power limited-demand analysis with transition overhead (lpLDAT) scheme, which accounts for both time and energy transition overhead. Its algorithm computes an efficient speed level based on the average-case workload; notably, this speed can be used as a limiter. If the limiter is higher than the speed predicted by lpWDA, lpLDAT knows that lpWDA is being too aggressive and applies the limiter to the present schedule. On the assumption of transition overhead, this technique with slack time analysis also saves considerable energy compared with the previous methods. He and Jia [4] developed a fixed-priority scheduling with threshold (FPPT) scheme that eliminates unnecessary context switches, thereby saving energy. FPPT assigns each task a pair consisting of a predefined priority and a corresponding preemption threshold. He et al. applied a novel algorithm to compute static slowdown factors by formulating the problem as a linear optimization problem. In addition, they considered the energy consumption of a task set under different preemption threshold assignments.

Recently, experimental results obtained by Kim et al. [6] indicated that recent DVS algorithms for fixed-priority real-time tasks are less efficient than those for dynamic-priority tasks, which motivates further improvement of fixed-priority DVS methods. The main reason for the energy inefficiency of RM DVS scheduling is that, in RM schedules, priority-based slack-stealing methods do not work as efficiently as they do in earliest-deadline-first (EDF) scheduling [6]. In EDF schedules, high-priority tasks act as efficient slack distributors because their slack can be utilized fully by tasks starting before NTA. Therefore, the energy saving achieved by EDF scheduling algorithms, such as ccEDF [7], DRA, and AGR [1], is close to the theoretical lower bound [13].

#### 2. Motivations

So far, there have been a large number of studies on DVS-based RM scheduling for energy saving [1–4, 6–8, 11–14]; most existing studies focus on computing and predicting the length and occurrence of slack time. The reason is that the more precise the estimation of the slack time, the more energy efficiency is obtained. Those methods for computing available slack time either construct a canonical schedule and compare it to the current schedule or propose best-effort algorithms under empirical rules and heuristics. Those methods, which adopt different strategies and assumptions such as task preemption or voltage transition time on similar models, gain considerable energy saving, but few of them can be combined without difficulty to further enhance their performance. Additionally, a modern processor with a DVS or DPM feature must be equipped with a dc-to-dc converter that varies the processor speed in appropriate levels and requires additional switching time and power [15]. Power saving in a system therefore suffers when many fragments of short slack time appear. Many of those methods also propose the notions of postponing and advancing task execution to increase the length of slack time. Their performance in accumulating continuous slack time is not impressive because of the short analysis interval adopted in the schedules. Therefore, it is necessary to study a transplantable method that can cooperate with different existing methods without modification. This idea originates from the layered architecture used in designing computer software, hardware, and communications, in which system or network components are isolated in layers so that changes can be made in one layer without affecting the others. A method following this notion must also be able to compute and accumulate the slack time on its own. By applying the layered architecture, it can also pass the slack time to lower-layered methods and produce a synergy effect that enhances the overall energy saving.

In this paper, we propose an online work-demand analysis called the parareclamation algorithm (PRA) for RM scheduling, which computes the length of potential slack in an interval that is twice as long as the longest task period. PRA does not rely on simulation of stochastic data, which usually varies according to different applications, and can be applied to many RM scheduling algorithms with various criteria. Moreover, the proposed algorithm has a time complexity of $O(n)$, where $n$ is the number of tasks. In other words, it does not increase the computational complexity of the existing online RM scheduling algorithms. Experimental results indicate that existing RM DVS algorithms combined with the proposed method can reduce energy consumption by 5%–21% compared with the initial algorithms such as lpWDA and lpLDAT.

The remainder of this paper is organized as follows. Section 3 introduces the preliminaries of power-aware real-time scheduling. Section 4 introduces our technique and algorithm. Section 5 provides theorems to prove the schedulability of PRA as well as lpWDA. We present the performance evaluation in Section 6. Section 7 gives conclusions and the directions for future work.

#### 3. Preliminaries

This paper focuses on how to obtain additional slack for existing RM DVS scheduling methods. Many slack time analysis techniques with different purposes (e.g., transition-aware and preemption-aware schemes) can utilize PRA easily; throughout this paper, these techniques are called *host* algorithms of PRA. This section also outlines the ideas underlying the lpWDA algorithm. Other techniques, such as the lpLDA, lpWDA-AC, lpWDA-DP [11], and lpLDAT [12] techniques, are abridged.

##### 3.1. System Model

This paper considers preemptive hard real-time systems in which periodic real-time tasks are scheduled under an RM scheduling policy. The DVS processor used in the model operates at a finite set of supply voltage levels , each with an associated speed. Processor speed is normalized by corresponding to , yielding a set of speed levels. A set of periodic tasks is denoted by , where the tasks are assumed mutually independent. Each task is described by its worst-case execution cycles and average-case execution cycles (). Throughout this paper, the execution cycles of each task are called *work* for short. Additionally, each task has a shorter period length (i.e., a higher priority) than that of when , and is the longest of task periods. The relative deadline of is assumed equal to its period length . Each task is invoked periodically by a *job*, and the th job of task is . The first job of each task is assumed activated at time . Each job is described by a release time, , deadline, , and number of cycles that have been executed . The utilization of a task set is denoted by . During run time, we refer to the earliest job of each task not completed as the *current* job for that task, and that job is indexed with cur. The deadline of the current job for task is , and denotes the number of cycles that the current job of has executed.
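The system model above can be summarized in a small data-structure sketch. All class, field, and function names are hypothetical and not the paper's notation; the sketch assumes that job indices start at 0, that the first job of every task is released at time 0, and that execution cycles are normalized to the maximum processor speed.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    period: int   # period length; the relative deadline is assumed equal to it
    wcec: float   # worst-case execution cycles (the task's worst-case "work")
    acec: float   # average-case execution cycles


@dataclass
class Job:
    task: Task
    index: int            # 0-based job index (assumption)
    executed: float = 0.0  # cycles executed so far

    @property
    def release(self):
        # First job assumed released at time 0.
        return self.index * self.task.period

    @property
    def deadline(self):
        return self.release + self.task.period


def rm_order(tasks):
    """RM priority: a shorter period means a higher priority."""
    return sorted(tasks, key=lambda t: t.period)


def utilization(tasks):
    """Worst-case utilization of the task set at maximum speed."""
    return sum(t.wcec / t.period for t in tasks)


# Hypothetical task set, for illustration only.
tasks = [Task("a", 6, 2, 1), Task("b", 3, 1, 0.5)]
ordered = rm_order(tasks)
third_job_of_b = Job(ordered[0], 2)
print([t.name for t in ordered], utilization(tasks), third_job_of_b.deadline)
```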

Without loss of generality, when is the first scheduled task after time , where , the *bottleneck* (shortened to bn) is the next release time of (i.e., the ). In the work-demand analysis method, available slack in the interval [bn, ) is estimated.

##### 3.2. Low-Power Work-Demand Analysis (lpWDA)

This section briefly introduces an online DVS scheme called lpWDA [6]. Notations , , and belong to PRA algorithm and are presented in Section 4. In line 2 of Algorithm 1, is an infinitesimal, and *readyQ* contains the currently activated tasks, and its subset, , containing the active tasks is
In the lpWDA, the tasks in are scheduled according to RM priority policy. When a task is activated (released), its job is moved to , and the remaining WCET of this job is set to , which is . When is executed at time , is the amount of *work* required to be processed in [).

In Algorithms 1, 2, and 3 and Procedure 1, lpWDA performs in the following steps. First, the system is initialized by setting the initial upcoming deadlines () and remaining worst-case execution () of each task. When is active at time , notation of each task is defined as follows [6]:
where is the infinitesimal. The jobs which are active during will be examined for slack estimation. denotes the estimation of higher-priority *work* that must be executed before (lines 1-2). Whenever a job is completed or preempted at time , the remaining *work *, upcoming deadline , and high-priority *work * are updated in line 4. In lines 5–8, when a job is scheduled for execution at time , Algorithm 2 computes the available slack for according to and (see lines 13 and 14), where is the earliest upcoming deadline with respect to . Notably, function computing the amount of low-priority *work* is performed recursively until it finds with the longest of task periods and lowest priority with respect to . As defined in Section 3.1, the length of interval [0, bn) is . Then, lpWDA computes the length of slack-time stealing from low-priority tasks in the interval [,bn) and applies the slack to the current job. Therefore, Algorithms 2 and 3 play crucial roles in slack-time analysis and dominate the run time complexity of lpWDA algorithm. Formally, to describe the slack analysis method using lpWDA, the following notations are defined: : the amount of *work* required to be processed in interval [, ); : the available slack for scheduled at time can be computed as follows:
In (3), consists of three types of *work*: (1) , (2) from the higher-priority tasks, and (3) from the lower-priority tasks. The *work *required by higher-priority tasks is derived as follows:
where denotes the *work *required by uncompleted tasks released before , and denotes the *work* released during []. We compute and as follows:
where is the infinitesimal. According to the above statements, the amount of *work* required by the scheduled task can be formulated as
where notation stands for . Equations (6), (7), and (8) are repeated iteratively until is the lowest priority task in (i.e., ). Conceptually, lpWDA uses this linear-time heuristic to estimate available slack in an interval up to the upcoming deadline of lower-priority tasks.

##### 3.3. Motivational Example

The proposed method is to provide lpWDA-based algorithms (e.g., lpWDA, lpLDAT, lpWDA-DP, and lpWDA-AC) with a subroutine to improve their work-demand analysis. The main advantage is that PRA can be independent of each function-specific slack analysis method. For instance, the main purpose of lpWDA-AC and lpWDA-DP techniques is to decrease context-switch overhead while that of lpLDAT is to reduce transition time and energy overhead. PRA can work together with these lpWDA-based algorithms to enhance their slack computation capability.

*Example 1. * Consider a periodic task set in Table 1, which presents the period length, WCET, and ACET of each task. Figure 1(a) presents the execution schedule under the worst-case workload in the first hyperperiod. Figure 1(b) shows the speed schedule using the lpWDA algorithm for task set and assumes that the actual *work* of each task equals its ACET. Before assigning at time , lpWDA computes available slack time in an interval up to by calling Algorithm 3 recursively. However, interval has no slack time under the WCET schedule. If the length of the analysis interval is extended to , one unit of slack time is derived from . The slack in can be moved backward to the current scheduling point by a deferred execution of earlier *work*. For instance, in Figure 1(a), the slack in interval can be exchanged with the *work* in interval , and then slack in interval can be exchanged with the *work* in interval , and it can be exchanged once again with the *work* in interval . Finally, the slack in interval can be exchanged with the *work* in interval . Therefore, is scheduled with speed (Figure 1(c)). Additional slack can be reclaimed without missing deadlines from the interval that is two times longer than the longest task period. Notably, this idea actually neither moves all of the jobs of a schedule to (e.g., ) nor *exchanges* the slack with *work* for using this slack time. However, this primitive idea does not work in some situations. For example, in Figure 1(d), when is increased to 6, slack in the interval cannot be transferred before . In fact, jobs , and are released simultaneously at time 6. The slack in interval cannot follow this idea, because a deadline is likely to be missed by one of those three jobs. Our goal is to devise an efficient work-demand analysis method that obtains additional slack while satisfying the tasks’ deadlines.
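The observation that a longer analysis interval can expose slack that a shorter one misses can be reproduced with a small simulation. The sketch below is not the paper's algorithm and does not use the task set of Table 1; it simulates a preemptive RM schedule in unit time steps under worst-case workload for a hypothetical task set (integer WCETs and periods, all first jobs released at time 0) and lists the idle time units.

```python
def rm_idle_slots(tasks, horizon):
    """Simulate preemptive RM under WCET in unit steps over [0, horizon).

    tasks: list of (wcet, period) pairs with integer values (hypothetical
    format); every task is assumed to release its first job at time 0.
    Returns the list of time units in which the processor is idle.
    """
    # RM priority order: shortest period first.
    order = sorted(range(len(tasks)), key=lambda k: tasks[k][1])
    remaining = [0] * len(tasks)
    idle = []
    for t in range(horizon):
        # Release a new job for every task whose period boundary is at t.
        for k, (wcet, period) in enumerate(tasks):
            if t % period == 0:
                remaining[k] += wcet
        # Run the highest-priority task with pending work for one time unit.
        for k in order:
            if remaining[k] > 0:
                remaining[k] -= 1
                break
        else:
            idle.append(t)  # no pending work: this unit is slack
    return idle


# Hypothetical task set: (wcet, period); the longest period is 6.
demo = [(1, 3), (1, 4), (2, 6)]
print(rm_idle_slots(demo, 6))   # analysis limited to the longest period
print(rm_idle_slots(demo, 12))  # analysis interval doubled
```

For this hypothetical set, no idle unit appears within the longest period, while the doubled interval contains one idle unit; whether and how such slack can be moved back to the current scheduling point is what the remainder of the paper addresses.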

#### 4. Work-Demand Computation

Let be the bn of , which is the first scheduled job at time where . PRA computes the length of additional slack in the interval [bn, ). As long as the slack time can be reclaimed at a time earlier than bn, lpWDA can utilize it by postponing a lower-priority task and thereby improve the energy efficiency of schedules. Why does PRA focus on slack computation in the interval [bn, ) rather than in a longer or shorter interval? Even if all job (except ) periods are within [bn, ) and cannot make a target slack available for the task on the right side of bn, job can still postpone its *work* to move the slack forward toward bn. For example, in Figure 1(a), when the period length of is increased from 3 to 4, the slack in interval cannot be reclaimed by postponing the *work* of or because it is hampered at time 8. Therefore, can defer its *work* and the slack time in will be available. On the contrary, if one extends the additional analysis interval such that it is longer than or even several times of , job cannot move the slack after toward the bn, and the slack may be blocked in this interval. An analysis interval whose length is equal to has the following advantages. After deriving the amount of slack time that will be available to the tasks near bn, those jobs whose periods span the bn can be deferred to reclaim additional slack before bn. That is, the current job can utilize the additional slack by performing an lpWDA-based method. Notably, in an actual scheduling process, PRA does not *exchange* any *work* with slack. Instead, it only passes the length of additional slack time for the current job to lpWDA and does not affect the schedulability of subsequent jobs.

To present the proposed method, we define the following notations:
where denotes the number of tasks in the set of , , and is the task with the longest period in . A set of tasks are called synchronous at time if their jobs are released at time . In an extended analysis interval [bn, ), the number of synchronization points of the tasks in can be derived as follows:
where denotes the least common multiple of task periods in . As shown in Figure 2, the first *synchronization* point of within the interval [bn, ) is derived as
When , slack time is likely to be blocked or shrunk at time . In Figure 3, when all tasks except are synchronized at time , a slack may not be moved backward from the right to the left side of . In this case, slack can still be moved to the current time by postponing the execution of the *work* of . When tasks are synchronized in interval , we can derive , and their earliest synchronization point is derived by . The worst-case execution time in interval is
Therefore, the available slack for in this interval is at least
Similarly, when and , there are tasks that synchronize at time . The available slack time for in interval is at least
Therefore, if of tasks are synchronized in interval , the minimal available slack time for in interval is denoted as
where denotes the estimated slack in interval . For example, does not synchronize with other tasks in (Figure 4). Therefore, one can compute the value of for each , where and . Suppose , the earliest synchronization point of tasks in is derived using (11).
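The notion of synchronization points used above can be illustrated by direct enumeration. The sketch below does not reproduce the paper's closed-form expressions; it assumes integer periods and that all tasks release their first job at time 0, so that a subset of tasks is synchronous exactly at multiples of the least common multiple of their periods. The function names and the example values are hypothetical.

```python
from functools import reduce
from math import gcd


def lcm_of(periods):
    """Least common multiple of a list of positive integer periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


def synchronization_points(periods, start, end):
    """Times t in [start, end) at which all given tasks release a job.

    Assumes every task releases its first job at time 0.
    """
    step = lcm_of(periods)
    first = -(-start // step) * step  # smallest multiple of step >= start
    return list(range(first, end, step))


# Hypothetical example: two tasks with periods 2 and 3, interval [6, 12).
points = synchronization_points([2, 3], 6, 12)
print(len(points), points[0] if points else None)
```

The length of the returned list corresponds to the number of synchronization points in the interval, and its first element to the earliest one.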

After deriving available slack time within interval where , we compute the length of the slack time which is available for the task in interval . We assume denotes a set of tasks in which task periods go *astride* the bn. Let ; the lengths of left and right parts of split by bn are defined as and , respectively, and the longest and are defined as and , respectively. Additionally, we define
as the total amount of *work* in . As shown in Figure 3, the lengths of , , and limit the maximum length of slack that can be moved in interval [, bn). Consequently, the restriction on the length of slack time is as follows:
According to the work demand in a WCET schedule, the slack time in interval [) is computed as follows:
PRA computes the length of additional slack time within interval [bn, ) by (17). It then computes the length of this slack time that can be available for the jobs in interval [, bn) according to (17) and (18). Finally, it changes the priority of a job that goes astride the bn when this job is moved to *readyQ* according to RM scheduling. In line 1 of Procedure PRA, denotes an infinitesimal value.

*Example 2. *Consider the WCET schedule shown in Figure 1(a), is scheduled at time , we set and because the period of goes astride bn. Procedure PRA computes the length of available slack time from interval as follows. When task set , Procedure PRA computes . Therefore, the bottleneck caused by and is and , respectively. Line 6 derives and . Equations (14)–(17) derive . In line 10, the value of , , , and is 1, 1, 1, and 2, respectively. The value of is 1 by line 12. Therefore, Procedure PRA returns to the lpWDA algorithm and passes additional slack to **CalcLowerPriorityWork()** in Algorithm 3. Notably, the tasks using PRA still execute under RM priority policy except one of the jobs whose periods span astride the bn. At time , when jobs , , and enter at time , has the highest priority and utilizes additional slack estimated by PRA. Therefore, job obtains one unit of time of slack and changes its voltage level from 1 to 0.5. On the contrary, if primitive lpWDA performs at time , cannot obtain any slack. When lpWDA executes iteratively, the value of does not change until is completed. Figure 1(c) presents the scheduling result obtained using Procedure PRA. After completing , unit of slack has been run out, primitive lpWDA continuously performs voltage scaling on the subsequent jobs of . In the case of , it begins after () and obtains one unit of slack time from primitive lpWDA. Therefore, its WCET under voltage is changed to , and actual execution time is . At time , job is released and moved to . Its priority is changed to and lower than the remaining execution time of by executing line 14 in Procedure PRA. Therefore, job begins its *work* after completing the remaining *work* of . Notably, PRA only changes job’s priority in and does not affect the feasibility of lpWDA schedule. The correctness proof is discussed later in the next section. Table 2 shows the values of scheduling parameters. The rightmost job in is being executed at that time. In Algorithm 1, job is a global variable. Whenever job executes and , Procedure PRA lowers its priority to guarantee the timing constraint of jobs and .

#### 5. Correctness Proof

In this section, we prove the correctness of the schedules produced by lpWDA and PRA based on worst-case response time (WCRT) analysis and assume that the given task sets are feasible under preemptive RM scheduling. For the fixed-priority preemptive scheduling, a critical instant for a task is given by a moment in which the release time of coincides with all higher-priority tasks. Let denote the WCRT of , without loss of generality, the higher-priority tasks have simultaneous release time with the job of .
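For readers unfamiliar with WCRT analysis, the following sketch shows the classical response-time recurrence for fixed-priority preemptive scheduling at the critical instant, R = C_i + Σ_j ⌈R / T_j⌉ · C_j over all higher-priority tasks j, iterated to a fixed point. It is the standard textbook analysis under maximum speed, not the lpWDA- or PRA-specific expressions derived in this section; the function name, the input format, and the example task set are assumptions made for illustration.

```python
def wcrt(tasks, i):
    """Worst-case response time of tasks[i] under preemptive RM.

    tasks: list of (wcet, period) integer pairs sorted by RM priority
    (shortest period first); deadlines are assumed equal to periods.
    Returns the WCRT, or None if it would exceed the period.
    """
    c_i, p_i = tasks[i]
    r = c_i
    while True:
        # Interference from higher-priority jobs released within [0, r),
        # using integer ceiling division -(-r // p_j).
        interference = sum(-(-r // p_j) * c_j for c_j, p_j in tasks[:i])
        nxt = c_i + interference
        if nxt > p_i:
            return None  # deadline would be missed
        if nxt == r:
            return r     # fixed point reached
        r = nxt


# Hypothetical task set (wcet, period), already in RM priority order.
demo_tasks = [(1, 3), (1, 4), (2, 6)]
print([wcrt(demo_tasks, k) for k in range(len(demo_tasks))])
```

A task set is feasible under this test when no entry is None, that is, when every WCRT is no greater than the corresponding period.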

Lemma 3. * When a task set contains only one task , the available slack produced by lpWDA for is
*

*Proof. * By (8), the amount of *work* required to be processed in interval is
According to (3), the available slack is derived as
which completes the proof.

In lpWDA, the slack derived from a lower-priority task is given to the highest-priority job in the *readyQ*. In the WCET case, after the slack is applied to the highest-priority job, the execution cycles of lower-priority jobs are postponed and their WCRTs are increased by a length equal to that slack.

Lemma 4. *When a task set contains tasks where , the amount of work required to be processed in () for the highest priority job is
*

*Proof. * Assuming that contains tasks where , we prove this lemma from the lowest-priority task (i.e., ) to the highest-priority task using mathematical induction. The case of is proved separately because the third term of is different from those with in (22). From (8), when is the lowest-priority task (i.e., ), the workload of the tasks whose priorities are lower than is zero (i.e., ). Therefore, the amount of *work* required to be processed at time is
Because before completing , we add () to (23) and derive
and this completes the proof of .*Basic Step*. First, we discuss the value of which can be estimated under two cases: (Case 1) and (Case 2).*Case* *1 *. If is later than , is greater than , that is, the *work* in is the subset of the *work* in . Let , if , amount of *work* can be processed in [, ], and only is required to be processed before . Otherwise, amount of *work* should be processed before . That is,
where the notation stands for . Substituting (23) in the above equation, we get
Substituting this result in defined in (8), we obtain
*Case 2 *. When is earlier than , must be processed before . However, the value of does not change, and the value of is obtained as follows:
*Inductive Step*. When task is considered, the value of can be obtained as follows:
When we consider the task with the highest priority (i.e., ), the amount of *work* of its lower-priority task is
Substituting in (30), we get
When all tasks release at time , we have . Therefore, we get
By (8), substituting (32) in , we get
The proof of Case 2 is similar to that of Case 1 and this completes the proof.

Lemma 5. * The length of slack, that is, provided by lpWDA for the highest priority task in readyQ is at most
*

*Proof. * Assuming that has the highest priority in *readyQ*, this proof can be derived directly from (3), (19), and (22).

The following theorem proves the schedulability of lpWDA by using worst-case response time analysis. We assume that each active job in *readyQ* is released simultaneously with all higher-priority tasks.

Theorem 6. * Given a set of tasks is feasible in RM schedule, the maximum response time of task under lpWDA is less than or equal to its deadline. *

*Proof. * Assuming job has the highest priority in *readyQ*. By (8), we get at time . Due to (3) proposed by lpWDA, the deadline of can be guaranteed by
when . Therefore, we get
When runs out , all of subsequent jobs of have to postpone their response times separately at most unit of times comparing to those in their WCET RM schedule. Assuming has lower priority than that of , we prove that the length of new WCRT of including is less than . In a feasible RM schedule, the WCRT of is
The new WCRT of considering the length of is denoted as
Based on (22) in Lemma 4 and set , we derive
From the definitions of function in (4) and (5), we derive
and is the infinitesimal. Because
we derive
from (40) and complete the proof.

From (41) in Theorem 6, the difference between and is derived from the following corollary.

Corollary 7. * For some tasks , and , and is not the multiple of these ; the difference between and is formulated as
**
Notably, presents the length of WCRT proposed by lpWDA, and therefore the slack between and could be utilized by PRA.*

Consider the example shown in Figure 4(a). The value of (0) is set to the sum of and (0) which is shown in the gray box of Figure 4(b). There are 6 time units that are required to be processed before . In order to guarantee the feasible schedule of higher-priority jobs whose periods span *astride * (i.e., and ), lpWDA estimates how much time should be reserved for the higher-priority jobs. In this case, is derived from (43). We investigate the difference between and to keep the deadlines of PRA jobs.

Lemma 8. * Algorithm lpWDA selects an effective feasible speed for the active job in the analysis scope generated by upcoming deadlines.*

*Proof. * Lemma 8 is derived directly from Theorem 6 and Corollary 7.

Lemma 9. *Let , and . When job is feasible under PRA, also keeps its deadline.*

*Proof. * According to Theorem 6, we have
The WCRT of job under PRA is changed to
By (45), the deadline of job is still guaranteed by PRA. Additionally, the priority of job is temporarily lowered below that of , and executes immediately after . Therefore, the WCRT of job is denoted as
Due to the definition of , we get
and complete the proof.

Lemma 9 proves that the additional slack produced by PRA is shorter than the right part of split by bn. Therefore, the deadline of job is kept after the priority of is changed to the lowest priority in the *readyQ*. After completing , the schedule continues under lpWDA. The schedulability proof in interval ) is similar to that of Theorem 6 except for the additional *work* in the WCET schedule.

Lemma 10. * An lpWDA schedule remains feasible when unit of work is postponed to interval using PRA. *

*Proof. * In the interval ), the critical instant appears in a situation that job released before bn may remain incomplete at bn, while other tasks are released at bn. In accordance with (47), the maximum uncompleted *work* for job at bn is . Let denote the WCRT of job in the interval ) where . According to (40) in Theorem 6, we have

According to line 13 in algorithm PRA, the length of available slack is . From (48), the following statement still holds:
Therefore, we derive
which completes the proof.

In fact, Procedure PRA focuses on providing the potential and available slack time to the lpWDA-based algorithms.

Theorem 11. *Procedure PRA provides additional slack that guarantees all task deadlines in the lpWDA.*

*Proof.* In the interval [, suppose job is being executed in [. By executing line 9 in Algorithm 1 while passing the additional slack to the function **CalcSlackTime**() in the lpWDA algorithm, Algorithm 2 computes the length of the slack that can use by calling the function **CalcLowerPriorityWork**() recursively. When job uses complete and all of its subsequent jobs execute in their WCET, job is likely to miss its deadline. However, line 14 in Procedure PRA solves this problem by changing the priority of job to the lowest priority in *readyQ*. Because , according to Lemma 9, the deadlines of jobs and are guaranteed in the interval [.

In the interval ), Procedure PRA computes the length of additional slack in ). Given that has been assigned the lowest priority in *readyQ*, all deadlines before the completion time of are met. Based on the completion time of , we divide the analysis into two cases.

*Case 1*. Job completes at time and . In this case, the slack computed by Procedure PRA is not used by the jobs started before . Because produced by Procedure PRA does not shift to the time before the bn, it has no influence on the job execution cycles after bn. Therefore, the initial lpWDA algorithm guarantees the deadlines of jobs that started after the bn.

*Case 2*. Job completes at time and . When a scheduling point is at , the analysis scope defined in (2) is extended up to by calling the Procedure **CalcLowerPriorityWork**() in Algorithm 3 recursively. In a feasible schedule, when slack has been exchanged on the left side of the bn and jobs before have been performed according to their WCET, the length of delayed *work* is not longer than . Therefore, the length of the additional *work* moved in interval ) is at most . By Lemma 10, the additional *work* does not affect the feasibility of the lpWDA schedule, which completes the proof.

#### 6. Performance Evaluation

In this section, we evaluate the time and energy efficiency of the PRA scheme on randomly generated task sets and compare them with those of the ccRM, lpWDA, and lpLDAT schemes. Both ccRM and lpWDA are modified to account for transition time overhead. In the simulations, lpWDA and lpLDAT are called the *host* algorithms of PRA; each cooperates with PRA, and the combined performance is compared with that of the original ccRM, lpWDA, and lpLDAT methods.

##### 6.1. Complexity and Execution Time of Algorithms

Theorem 12. * The PRA algorithm has a computational complexity of O(n) per scheduling point, where n denotes the number of tasks in the systems.*

*Proof. * Lines 4 and 5 are completed in constant time for each iterative step according to (19) and (23). In line 8, the value of is derived from (24)–(27), where the value of in (27) needs time to compute the length of slack time. In line 10, the computation of and needs time. Therefore, the overall time complexity is .

In the simulation results, for any given pair of , and ratio in , 10000 task sets are generated randomly. Each experimental result is the average over these 10000 task sets. In a task set, every task period (as well as deadline ) is uniformly distributed in the range ms. The length of each schedule is at least ten times the hyperperiod of its tasks, except in the fourth experiment in Section 6.2. The execution time of each task is assigned a real number in the range of [1, min1, 90}]ms. After execution times have been assigned to all tasks in a task set, we give the task set a utilization and rescale the of each task such that the sum of the task weights (i.e., ) equals the given .
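The generate-then-rescale procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' generator: the function name `generate_task_set`, the default `period_range` of 10 to 100 ms, and the execution-time upper bound `min(period, 90.0)` are placeholders chosen for the example, because the exact bounds are not legible in the text.

```python
import random


def generate_task_set(n, target_u, period_range=(10.0, 100.0), rng=None):
    """Return n (wcet_ms, period_ms) pairs whose total utilization equals target_u.

    period_range and the execution-time bound are assumed placeholder values.
    """
    rng = rng or random.Random()
    periods = [rng.uniform(*period_range) for _ in range(n)]
    # Draw a raw execution time for every task first (assumed upper bound).
    raw = [rng.uniform(1.0, min(p, 90.0)) for p in periods]
    # Rescale all execution times by one factor so the weights sum to target_u.
    raw_u = sum(c / p for c, p in zip(raw, periods))
    scale = target_u / raw_u
    return [(c * scale, p) for c, p in zip(raw, periods)]


if __name__ == "__main__":
    tasks = generate_task_set(5, 0.8, rng=random.Random(1))
    print(round(sum(c / p for c, p in tasks), 6))
```

Scaling every execution time by the same factor preserves the relative task weights drawn in the first step while fixing the total utilization exactly.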

Due to the limited execution speed of embedded processors, an function is implemented by offline nonrecursive programs. This function is composed of and which compute the least common multiple (LCM) and greatest common divisor (GCD), respectively. In these programs, each integer is represented by a 32-bit word. Each experiment (schedule) has a maximum of 20 tasks, and each task has an integer variable for storing the period lengths of accumulated values of an LCM. Additionally, two integers are needed to record the GCD and LCM of all task periods. Therefore, each schedule requires at most 100 bytes for storing data, which include local variables.
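A nonrecursive GCD and LCM computation of the kind described above can be sketched as follows. The function names `gcd`, `lcm`, and `hyperperiod` are hypothetical, and Python integers are unbounded, so the 32-bit word size mentioned in the text is not modeled here.

```python
def gcd(a, b):
    """Iterative Euclidean algorithm (no recursion)."""
    while b:
        a, b = b, a % b
    return a


def lcm(a, b):
    """Divide before multiplying to keep intermediate values small."""
    return a // gcd(a, b) * b


def hyperperiod(periods):
    """Accumulate the LCM over all task periods, one task at a time."""
    result = 1
    for p in periods:
        result = lcm(result, p)
    return result


if __name__ == "__main__":
    print(hyperperiod([4, 6, 10]))
```

On a fixed-width target, the accumulated LCM can overflow for unfavorable period combinations, which is one reason to compute it offline as the text suggests.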

In Figure 5, we examine the execution time required by each online algorithm, including the following algorithms:

- ccRM: the ccRM algorithm from [7], modified to account for transition overhead;
- lpWDA: the lpWDA algorithm from [6], modified to account for transition overhead;
- lpLDAT: the algorithm from [12];
- lpWDA-PRA: lpWDA is the *host* algorithm of PRA and cooperates with Procedure PRA;
- lpLDAT-PRA: lpLDAT is the *host* algorithm of PRA and cooperates with Procedure PRA;
- lpWDA-DP-PRA: lpWDA-DP is a *host* algorithm and cooperates with Procedure PRA;
- lpWDA-AC-PRA: lpWDA-AC is a *host* algorithm and cooperates with Procedure PRA.

Figure 5 presents the maximum execution time of each algorithm on a processor versus the number of tasks in the system. Notably, the simulation results produced by lpWDA-DP-PRA and lpWDA-AC-PRA are very close to those of lpWDA-PRA and lpLDAT-PRA; they are omitted from the figure for clarity. All algorithms are executed on a simulated processor based on the ARM8 core at the highest speed (100 MHz) and voltage level. The measurements were obtained by inserting a system timer function and executing each algorithm individually. ccRM has a significant advantage in execution time over the other online algorithms. Because the algorithms are invoked upon each release and completion, the execution time of each task must be increased by two times the maximum execution time of the algorithm to account for scheduling overhead. The maximum execution times of these algorithms are obtained as follows. First, a set of experiments runs Procedure PRA with its *host* algorithms (lpWDA and lpLDAT) as well as the other original algorithms. The system timer functions record the duration of each algorithm, select the longest execution time in each schedule, and accumulate these execution times separately for each method. At the end of the experiments, the accumulated execution times are divided by the number of schedules generated. PRA is an efficient online algorithm: on average, it adds less than 12% to the execution time of its *host* algorithms.

##### 6.2. Simulation Results

The following four parameters are varied in simulations: (1) the number of tasks in is varied from 2 to 18 in increments of two tasks; (2) the utilization for the task set is varied at 0.1–0.9; (3) the ratio of BCET to WCET is varied at 0.1–0.9; and (4) the analytical interval in bound, being a multiple of , is denoted as . Before these experiments are performed, 10000 task sets are generated randomly, including the number of tasks in each set, task period lengths, and their worst-case execution requirements, in accordance with a uniform distribution. The early completion time of each job in simulations (1), (2), and (4) was randomly drawn from a Gaussian distribution in the range of [BCET, WCET], where BCET/WCET = 0.1. In simulation (3), each experiment was performed by varying BCET at 10%–90% of WCET.
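Drawing an actual execution time from a Gaussian distribution restricted to [BCET, WCET] can be sketched with simple rejection sampling. The text does not state the mean or standard deviation, so the midpoint mean and the one-sixth-of-range standard deviation below are assumptions, as is the function name `draw_execution_time`.

```python
import random


def draw_execution_time(wcet, bcet_ratio, rng):
    """Sample an execution time in [BCET, WCET] from a truncated Gaussian.

    Mean and standard deviation are assumed values, not taken from the paper.
    """
    bcet = bcet_ratio * wcet
    if bcet >= wcet:
        return wcet  # degenerate range: every job runs for its WCET
    mean = (bcet + wcet) / 2.0
    sigma = (wcet - bcet) / 6.0
    while True:
        sample = rng.gauss(mean, sigma)
        if bcet <= sample <= wcet:
            return sample  # reject draws that fall outside the range


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(draw_execution_time(10.0, 0.1, rng), 2) for _ in range(3)])
```

With this choice of standard deviation almost every draw already lies inside the range, so the rejection loop rarely repeats.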

For all experiments, we assume that 10 frequency levels are available in the range of 10–100 MHz, with corresponding voltage levels of 1–3.3 V. The energy consumption caused by memory access and cache misses is ignored, and all experimental results are normalized against the same processor running at maximum speed without a DVS technique (non-DVS for short). Table 3 presents the power specification of the ARM8 processor [15].
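A discrete speed table of this kind, and the normalization against a non-DVS processor, can be sketched as follows. The evenly spaced frequencies, the linear voltage interpolation, and the quadratic energy-per-cycle model are assumptions made for the example; the actual operating points are those of Table 3. The names `build_levels`, `pick_level`, and `normalized_energy` are hypothetical.

```python
F_MIN_MHZ, F_MAX_MHZ = 10, 100
V_MIN, V_MAX = 1.0, 3.3
NUM_LEVELS = 10


def build_levels():
    """Return (frequency_mhz, voltage) pairs, evenly spaced (assumed spacing)."""
    step = (F_MAX_MHZ - F_MIN_MHZ) // (NUM_LEVELS - 1)
    levels = []
    for k in range(NUM_LEVELS):
        voltage = V_MIN + (V_MAX - V_MIN) * k / (NUM_LEVELS - 1)
        levels.append((F_MIN_MHZ + k * step, voltage))
    return levels


def pick_level(levels, required_mhz):
    """Choose the slowest level that still meets the required speed."""
    for freq, voltage in levels:
        if freq >= required_mhz:
            return freq, voltage
    return levels[-1]  # saturate at the maximum speed


def normalized_energy(voltage):
    """Energy per cycle relative to the maximum voltage (dynamic CMOS model)."""
    return (voltage / V_MAX) ** 2


if __name__ == "__main__":
    freq, volt = pick_level(build_levels(), 35)
    print(freq, round(normalized_energy(volt), 3))
```

Rounding the required speed up to the next available level is what keeps deadlines safe when the ideal speed falls between two operating points.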

The overhead considered in simulations is as follows.

(1) *Algorithm Execution Time and Energy*. The execution time overhead is taken from the simulation results in Figure 5. The energy overhead is obtained under the assumption of maximum speed .

(2) *Voltage Transition Time and Energy*. The voltage scaling overhead is assumed to be the same as that in [17]. For the voltage scaling from to , the transition time is
where and are the charge to the capacitor and the maximum output current of the converter, respectively. The transition time is at most 70 for the maximum transition [15]. The energy consumed during each transition is
where denotes the efficiency of a DC-DC converter.

(3) *Context-Switch Time and Energy*. The context-switch time is assumed to be 50 at the highest speed , as in [18].
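The voltage transition overhead in item (2) can be illustrated with the model commonly quoted in the DVS literature following [17]: a transition time of 2C/I_max times the voltage difference, and a transition energy of (1 − η)·C times the difference of the squared voltages. The sketch below assumes that form; it is not a reproduction of the exact equations used in this paper, and the function and parameter names are hypothetical.

```python
def transition_time(v_from, v_to, capacitance, i_max):
    """Time to move between two supply voltages (assumed model).

    capacitance: capacitance charged by the converter, in farads
    i_max: maximum output current of the converter, in amperes
    """
    return 2.0 * capacitance / i_max * abs(v_to - v_from)


def transition_energy(v_from, v_to, capacitance, efficiency):
    """Energy lost in one transition for a DC-DC converter of given efficiency."""
    return (1.0 - efficiency) * capacitance * abs(v_to ** 2 - v_from ** 2)


if __name__ == "__main__":
    # Illustrative component values only, not taken from the paper.
    print(transition_time(1.0, 3.0, 1e-5, 1.0))
    print(transition_energy(1.0, 3.0, 1e-5, 0.9))
```

Both quantities grow with the size of the voltage step, which is why transition-aware schedulers avoid frequent large speed changes.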

Figures 6, 7, 8, and 9 show the energy consumption of each method. Energy consumption includes both the execution duration of PRA and its *host* algorithms (i.e., the lpWDA and lpLDA) and the context-switch time required to switch to and from other real-time tasks. Since the range of task periods has been shortened to a scale of ms, the difference between task periods and context-switch times or transition times is smaller than that assumed in [6, 12]. Additionally, the energy overheads arising from PRA and its *host* algorithms are also taken into account, such that the experimental results are close to actual situations. In these simulations, the *host* algorithms cooperating with PRA outperform the corresponding original algorithms.

Figures 6, 7, 8, and 9 also present the results for a clairvoyant algorithm, named bound, which knows the actual execution cycles of each task beforehand and adopts the optimal speed accordingly. The length of the analytical interval utilized by algorithm bound is set to at least four times the length of , except in the fourth experiment in Figure 9. This setting ensures that bound’s analytical length is longer than those utilized by the other methods and equal to the length of the schedules. Every scheduling point in the entire schedule is examined when looking for the best start and finish times. The context-switch and transition overheads assumed in bound are the same as those of the other methods, while its execution time is assumed to be zero. In fact, bound is not a practical algorithm because it is extremely time consuming when finding suitable start, preemption, and completion times, and no algorithm can predict the exact amount of job execution cycles beforehand. Thus, bound functions as a yardstick in simulations because no real DVS algorithm can achieve better performance than that of bound.

As shown in Figure 6, the lpWDA-PRA and lpLDAT-PRA methods reduce energy consumption by at least 12% and 4% relative to the original lpWDA and lpLDAT, respectively. The energy efficiency of lpWDA-DP-PRA lies between those of lpWDA-PRA and lpLDAT-PRA, while the energy efficiency of lpWDA-AC-PRA is 3%–6% worse than those of lpWDA-PRA and lpLDAT-PRA. The difference between lpWDA-AC-PRA and lpWDA-PRA is that lpWDA-AC, in the lower layer, tries to schedule tasks at a higher speed and leaves additional slack that PRA is unaware of. This slack time therefore becomes fragmented, which is harmful to power saving. Most of the following experiments show the same pattern: lpWDA-AC-PRA performs worse than lpWDA-PRA and lpWDA-DP-PRA. The value of of each task set is assigned randomly at 10%–70% by a uniform probability distribution function. The value of and the ratio for each task set are 0.8 and 0.5, respectively. With a large *U*, PRA outperforms its *host* algorithms in a small , consuming up to 25% and 24% less energy than lpWDA and lpLDAT, respectively. Although the execution overhead of PRA is included, PRA still outperforms its *host* algorithms for a large . A likely reason is that, as the number of tasks increases, the number of task synchronization points in the analytical interval decreases, which benefits slack time computation.

Experimental results in Figure 7 indicate that lpWDA-PRA and lpLDAT-PRA save up to 25% and 4% more energy than their *host* algorithms, respectively. Among lpWDA-based methods, lpWDA-DP-PRA has the best performance, but it outperforms lpWDA-PRA only slightly. In other words, when PRA is applied to lpWDA, the DP technique contributes little additional energy saving to the host methods. In the experiment, the and the *bc/wc* ratio of each task set are 10 and 0.5, respectively. Increasing the value of in Figure 7 increases the energy consumption of PRA and its *host* algorithms. With a small , the gain from PRA is modest, with 1% and 4% savings compared with the original lpLDAT and lpWDA algorithms, respectively. Additionally, with these methods, is an important factor when computing the slack for deciding processor speeds. With a moderate value, lpWDA-PRA and lpLDAT-PRA consume at most 16% and 4% less energy than the original lpWDA and lpLDAT algorithms, respectively. Therefore, PRA exploits not only the advantages of its *host* algorithms but also as much of the slack belonging to the value of as possible, and shifts that slack to the current job.

In Figure 8, the set of experiments varies the ratio at 0.1–0.9, and the values of and are 0.8 and 10, respectively. The energy consumed by PRA is positively correlated with the ratio, while its *hosts* are not sensitive to the ratio. In this experiment, lpWDA-DP-PRA still outperforms the other lpWDA-based methods, but only marginally. With a low ratio, PRA performs best, consuming up to 26% and 16% less energy than lpWDA and lpLDAT, respectively. With a ratio of 0.9, PRA combined with lpLDAT consumes slightly more energy than the original lpLDAT algorithm. A likely reason is that the additional saving gained by PRA is offset by its execution overhead.

In Figure 9, the analytical interval in bound is exactly times the length of . The values of are controlled by the simulation from 2 to 18, and the *totaltask* of each task set is assigned randomly at 2 to 20 tasks. For simplicity, the length of each schedule is also controlled in . As the value of increases, the additional energy saving of bound is not obvious, and the energy consumption of the schemes with PRA is not sensitive to . Notably, when , PRA and bound have analytical intervals of equal length, and bound consumes at most 33% less energy than the proposed schemes. Therefore, extending the analysis interval so that it is several times longer than does not add to an already substantial energy saving but rather increases the computing overhead of slack time analysis.

There are limits and difficulties whenever the analysis interval is extended beyond twice the length of . First, the length of the potential slack time is affected by the length of the analytical interval, the values of and , and the locations of the actual execution cycles of each job. In Figure 3, the locations of the job workload affect the available length of slack time. Additionally, the utilization of a task set influences the length of available slack time. For example, for a task set with high utilization, the actual workload of many tasks may appear after where or and cannot be exchanged with the slack. Therefore, the longer the analytical interval is, the harder it becomes to predict the length of additional slack. For an interval longer than , we may need considerably more time and memory space than those required by PRA. Additionally, the length of a hyperperiod can vary from one to many times of when even one task period is changed. When an analytical interval extends up to a hyperperiod, the execution time required by the algorithms may change severely and adversely affect the predictability of a real-time schedule.

#### 7. Conclusions

In this paper, we proposed the parareclamation algorithm (PRA) based on the concept of work-demand computation. This method can serve many existing RM scheduling methods as a *guest* algorithm. PRA cooperating with *host* algorithms such as lpWDA, lpWDA-DP, lpWDA-AC, lpLDA, and lpLDAT can further decrease energy consumption without increasing their time complexities. It is fully compatible not only with transition-aware methods (e.g., lpLDAT) but also with preemption-aware methods (e.g., lpWDA-AC and lpWDA-DP). Experimental results indicate that PRA can utilize the additional slack produced by lpWDA-AC and lpWDA-DP and reduce average energy consumption by 14% compared with that of the original schemes.

#### Acknowledgment

The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this paper under NSC 102-2221-E-025-003.

#### References

- H. Aydin, R. Melhem, D. Mossé, and P. Mejia-Alvarez, “Power-aware scheduling for periodic real-time tasks,” *IEEE Transactions on Computers*, vol. 53, no. 5, pp. 584–600, 2004.
- J.-J. Chen and C.-F. Kuo, “Energy-efficient scheduling for real-time systems on dynamic voltage scaling (DVS) platforms,” in *Proceedings of the 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA '07)*, pp. 28–35, Taipei, Taiwan, August 2007.
- F. Gruian, “Hard real-time scheduling for low-energy using stochastic data and DVS processors,” in *Proceedings of the 2001 International Symposium on Low Power Electronics and Design (ISLPED '01)*, pp. 46–51, ACM Press, Huntington Beach, Calif, USA, August 2001.
- X. C. He and Y. Jia, *Energy-Efficient Scheduling Fixed-Priority Tasks with Preemption Thresholds on Variable Voltage Processors*, vol. 4672 of *Lecture Notes in Computer Science*, Springer, Berlin, Germany, 2008.
- W. Kim, J. Kim, and S. L. Min, “A dynamic voltage scaling algorithm for dynamic-priority hard real-time systems using slack time analysis,” in *Proceedings of the 2002 Design Automation and Test in Europe (DATE '02)*, pp. 788–797, Paris, France, March 2002.
- W. Kim, J. Kim, and S. L. Min, “Dynamic voltage scaling algorithm for fixed-priority real-time systems using work-demand analysis,” in *Proceedings of the 2003 International Symposium on Low Power Electronics and Design (ISLPED '03)*, pp. 396–401, ACM Press, New York, NY, USA, August 2003.
- P. Pillai and K. G. Shin, “Real-time dynamic voltage scaling for low-power embedded operating systems,” in *Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01)*, pp. 89–102, ACM Press, New York, NY, USA, October 2001.
- G. Quan and X. S. Hu, “Energy efficient fixed-priority scheduling for real-time systems on variable voltage processors,” in *Proceedings of the 2001 Design Automation Conference (DAC '01)*, pp. 828–833, Las Vegas, Nev, USA, June 2001.
- D. Shin, J. Kim, and S. Lee, “Intra-task voltage scheduling for low-energy hard real-time applications,” *IEEE Design and Test of Computers*, vol. 18, no. 2, pp. 20–29, 2001.
- J. W. S. Liu, *Real-Time Systems*, Prentice Hall, Upper Saddle River, NJ, USA, 2000.
- W. Kim, J. Kim, and S. L. Min, “Preemption-aware dynamic voltage scaling in hard real-time systems,” in *Proceedings of the 2004 International Symposium on Low Power Electronics and Design (ISLPED '04)*, pp. 393–398, ACM Press, New York, NY, USA, August 2004.
- B. Mochocki, X. S. Hu, and G. Quan, “Transition-overhead-aware voltage scheduling for fixed-priority real-time systems,” *ACM Transactions on Design Automation of Electronic Systems*, vol. 12, no. 2, article 11, Article ID 1230803, 2007.
- W. Kim, D. Shin, H.-S. Yun, J. Kim, and S. L. Min, “Performance evaluation of dynamic voltage scaling algorithms for hard real-time systems,” *Journal of Low Power Electronics*, vol. 1, no. 3, pp. 207–216, 2005.
- M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for reduced CPU energy,” in *Proceedings of the 1st USENIX Conference on Operating Systems Design and Implementation*, vol. 1, pp. 13–23, Berkeley, Calif, USA, 1994.
- T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, “Dynamic voltage scaled microprocessor system,” *IEEE Journal of Solid-State Circuits*, vol. 35, no. 11, pp. 1571–1580, 2000.
- Advanced Micro Devices, “Mobile AMD Athlon 4 processor model 6 CPGA data sheet,” *Technique Report* 24332, Advanced Micro Devices, Sunnyvale, Calif, USA, 2003.
- T. D. Burd and R. W. Brodersen, “Design issues for dynamic voltage scaling,” in *Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '00)*, pp. 9–14, Redondo Beach, Calif, USA, July 2000.
- F. M. David, J. C. Carlyle, and R. H. Campbell, “Context-switch overheads for Linux on ARM platforms,” in *Proceedings of the 2007 Workshop on Experimental Computer Science (ExpCS '07)*, vol. 3, San Diego, Calif, USA, June 2007.