Mobile Information Systems

Volume 2016, Article ID 2010837, 10 pages

http://dx.doi.org/10.1155/2016/2010837

## Power Optimization of Multimode Mobile Embedded Systems with Workload-Delay Dependency

Hoeseok Yang^1 and Soonhoi Ha^2

^1 Department of ECE, Ajou University, Suwon 16499, Republic of Korea
^2 Department of CSE, Seoul National University, Seoul 08826, Republic of Korea

Received 24 March 2016; Accepted 14 June 2016

Academic Editor: Yuh-Shyan Chen

Copyright © 2016 Hoeseok Yang and Soonhoi Ha. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper proposes to take the relationship between delay and workload into account in the power optimization of microprocessors in mobile embedded systems. Since the physical components outside a device continuously change their values or properties, the workload to be handled by the system becomes dynamic and variable. In this paper, this variable workload is formulated as a staircase function of the delay taken at the previous iteration and applied to power optimization with dynamic voltage-frequency scaling (DVFS). In doing so, a graph representation of all possible workload/mode changes during the lifetime of a device, the Workload Transition Graph (WTG), is proposed. The power optimization problem is then transformed into finding a cycle (closed walk) in the WTG which minimizes the average power consumption over it. From the obtained optimal cycle of the WTG, one can derive the optimal power management policy of the target device. It is shown that the proposed policy is valid for both continuous and discrete DVFS models. The effectiveness of the proposed power optimization policy is demonstrated with the simulation results of synthetic and real-life examples.

#### 1. Introduction

Today’s mobile embedded systems often interact with physical processes or external environments; such systems are referred to as Cyber-Physical Systems (CPSs) and are usually modeled with interactions between the physical world and the devices [1]. For instance, in a smart building [2], handheld or stationary embedded systems need to continuously interact with their environments. The system performs a computational task and responds through an actuator to the physical side, while the resulting change at the physical side, in turn, causes a variation in the input (sensor reading) of the device. In order not to make this control loop unstable, the embedded system commonly has a real-time constraint within which all the computation should be completed.

In a class of applications, the computational workload of the embedded system depends on the variation of the sampled input value, while the computation delay, in turn, affects the input variation of the next iteration. Usually, the more time the system invests in processing information at one iteration, the more work it has to do at the next iteration. One example of such delay-workload dependency can be found in object tracking, which is frequently used in drones, surveillance cameras, and augmented reality [3–5]. The image obtained from the camera is processed by the object tracker to follow an object. As the object may continuously change its position in the meantime, the object tracker should reactively take an image from the adjusted position/angle to make the next decision. The more time the tracker spends on one image, the farther the object will have moved by the next one.

Such workload-delay relations are commonly found in modern mobile embedded systems, which rely on computer vision algorithms to capture what happens in the external world. In those applications, it is typical that the *current internal state* is maintained to figure out the *difference* caused by what happened in the external world. Examples of such internal states range from a simple snapshot of a sensor reading to a complicated model of the scene obtained from a camera. No matter what the model is, it is generally true that a longer execution delay between two consecutive invocations of the algorithm results in a larger workload in the successive iteration, as the difference to be reconciled gets bigger.

The workload-delay dependency can also be found in many different types of applications. Real-time pattern matching over event streams [6], for instance, exhibits similar behavior: the queries can be handled either in small batches (shorter delay, less workload) or in an aggregated manner (longer delay, more workload). Similarly, haptic rendering in Human-Computer Interaction (HCI) uses adaptive sampling techniques to deal with the stringent real-time constraint [7], and the rendering algorithm can be warm-started to exploit temporal coherence [8]. In essence, applications that exploit temporal coherence are candidates for workload-delay dependency; any iterative algorithm that can be warm-started can exhibit it.

Nowadays, most microprocessors used in mobile embedded systems support dynamic voltage-frequency scaling (DVFS) [9] for power-efficient operation. Generally, delay and energy in systems with DVFS are in a tradeoff relationship for a given workload: given a certain amount of work to be handled, a faster solution (with a higher frequency) is less energy efficient. Considering this control knob together with the aforementioned delay-workload dependency, the power optimization problem becomes very challenging. Conventionally, “working as slow as possible” within the real-time constraint has been understood as the best discipline for minimizing power dissipation. However, in the presence of the workload-delay dependency, this is no longer valid, since a slower execution may cause a bigger workload at the next iteration. On the other hand, “as fast as possible” is not optimal either, as the power consumption is a strong function of the operating speed [10].

The workload-delay dependency was first modeled and applied to DVFS optimization in [11]. It is assumed there that the workload is a continuous and monotonically increasing function of the delay, under which a simple yet effective power management technique was proposed. Specifically, it was shown that staying in a certain DVFS mode is better than dynamically alternating between different DVFS modes. Later, the optimization was generalized to various power models and formally proven to be optimal [12].

This work differs from our previous work [12] in that we take a different optimization approach tailored for discrete workload levels. We observed that the continuity assumption does not always hold in reality. Rather, a number of applications have discrete levels of workload. For instance, many image or signal processing algorithms handle input data in units of macroblocks or frames; in such application domains, the workload tends to grow in a discrete manner. In this paper, the workload is modeled as a staircase function of the delay taken in the previous iteration. Since the solution obtained by the previous work [11, 12] is no longer optimal, or may not even exist, under the staircase model, a new power management technique is proposed. The contributions of this paper can be summarized as follows:

(i) The workload-delay dependency is modeled as a staircase function, generalizing the previous model, and validated with a real-life example.

(ii) A novel data structure, the Workload Transition Graph (WTG), is proposed to represent all possible workload/mode changes of a device.

(iii) Based on the WTG, a power management policy is derived and shown to be optimal.

#### 2. Related Work

Bogdan and Marculescu [13] observed that workloads from physical processes tend to be nonstationary but exhibit some systematic relationship in space and time. They proposed a workload characterization approach based on statistical physics and showed how workload-awareness can improve the design of electronic systems. Zhang et al. [14] studied the relationship between control stability and workload in inverted pendulum control. While enlarged invocation periods may lower the degree of stability, more inverted pendulums can be controlled by a system, as the lengthened invocation periods lower the utilization of the algorithm. In other words, they proposed to trade off control stability to accommodate more workload in a system. The proposed technique also deals with variable workload in electronic systems but differs from the above-mentioned works in that the effect of execution delay on workload is systematically considered.

Recently, Pant et al. [15] proposed a codesign of computation delay and control stability based on anytime algorithms. An anytime algorithm is an algorithm that can be stopped at any point in time while still providing a decent solution; typically, the quality of the solution is an increasing function of the computation delay. In their work, it is the duty of the control algorithm to adaptively change the real-time deadline constraint and error bound (quality of control). In contrast, in the proposed technique the relationship between execution delay and workload is formally described in the form of a workload-delay function; thus no explicit runtime monitoring/control is required.

A design guideline for flexible delay constraints in distributed embedded systems was proposed by Goswami et al. [16], where some of the samples are allowed to violate the given delay deadline. They presented the applicability of their approach using the FlexRay dynamic segment as a communication medium. This work is similar to the proposed approach in the sense that it does not stick to a given fixed real-time deadline. While they could avoid resource overprovisioning by trading off the hard real-time constraints, the dependency of the workload on the delay was not considered. Moreover, from a real-time standpoint, the proposed work is more rigorous, as it allows no real-time constraint violations.

#### 3. Problem Definition

This section presents the system model assumed in this paper, which is followed by the formulation of the power optimization problem.

##### 3.1. System Model

###### 3.1.1. Dynamic Voltage-Frequency Scaling

In this paper, we assume that a system has multiple operation modes due to its DVFS feature, whereby the operating frequency and voltage can be modulated. For simplicity, we first assume that there are infinitely many operation modes available, among which one is chosen at each iteration. It will be shown in Section 5 that the proposed technique can be applied to a discrete DVFS as well. The operation mode at the $i$-th iteration is represented by the speed scaling factor $s_i$, ranging from $s_{\min}$ to $1$ ($0 < s_{\min} \le s_i \le 1$). Then, the operating frequency of the $i$-th iteration, $f_i$, is

$$f_i = s_i \cdot f_{\max}, \tag{1}$$

where $f_{\max}$ is the maximum frequency of the microprocessor.

###### 3.1.2. Workload

The workload is defined to be the number of clock cycles elapsed to complete the given computation. We denote the number of cycles elapsed to handle the workload of the $i$-th iteration at the full speed of the microprocessor ($s_i = 1$) as $\omega_i$. That is, the delay of the $i$-th iteration is

$$d_i = \frac{\omega_i}{s_i \cdot f_{\max}}. \tag{2}$$

Note that the elapsed time increases as the speed is scaled down ($s_i < 1$). Thus, the delay $d_i$ is automatically determined when a speed scaling factor $s_i$ is chosen for the given workload $\omega_i$.

###### 3.1.3. Real-Time Constraint

The delay cannot be unboundedly long, as the system is associated with a real-time constraint $D$. For all iterations, the elapsed time should be no more than $D$:

$$d_i \le D, \quad \forall i. \tag{3}$$

###### 3.1.4. Delay-Workload Dependency

As stated earlier, the workload is dependent upon the previous execution delay. Usually, the workload is not a continuous function of the delay variation; rather, the changes happen in a discrete manner. Therefore, the workload at the $(i+1)$-th iteration is a monotonically increasing staircase function $W$ of the delay of the previous iteration, $d_i$: $\omega_{i+1} = W(d_i)$. If the given system has $m$ workload levels, the workload function can be formulated as follows:

$$W(d) = \begin{cases} \Omega_1, & d < \delta_1, \\ \Omega_j, & \delta_{j-1} \le d < \delta_j \ (1 < j < m), \\ \Omega_m, & d \ge \delta_{m-1}, \end{cases} \tag{4}$$

in which the workload levels are $\Omega_1 < \Omega_2 < \cdots < \Omega_m$ and the delay thresholds (workload changing moments) are $\delta_1 < \delta_2 < \cdots < \delta_{m-1}$.
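As an illustration, the staircase workload function of (4) can be coded directly. The workload levels and delay thresholds below are hypothetical example values for a four-level system, not taken from this paper.

```python
import bisect

# Hypothetical thresholds delta_1 < delta_2 < delta_3 (seconds) and
# workload levels Omega_1 < ... < Omega_4 (cycles at full speed).
DELAY_THRESHOLDS = [2e-3, 4e-3, 8e-3]
WORKLOAD_LEVELS = [1e6, 2e6, 3.5e6, 5e6]

def workload(delay: float) -> float:
    """Return the next iteration's workload W(d) for the previous delay d."""
    # bisect_right counts how many thresholds the delay has reached or
    # crossed, which is exactly the index of the workload level to use
    # (closed on the left: delta_{j-1} <= d < delta_j selects Omega_j).
    level = bisect.bisect_right(DELAY_THRESHOLDS, delay)
    return WORKLOAD_LEVELS[level]
```

For example, a previous delay below the first threshold maps to the lowest level, while a delay at or beyond the last threshold maps to the highest.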

###### 3.1.5. Execution Trace

At the $i$-th iteration, the speed scaling factor $s_i$ uniquely defines an execution *mode*, as the delay is fixed accordingly by (2). The initial workload is assumed to be given as $\omega_1$. Then, an execution trace of length $n$ is defined to be a sequence of the speed scaling factors of $n$ iterations:

$$T = (s_1, s_2, \ldots, s_n). \tag{5}$$

###### 3.1.6. Average Power Consumption

The dynamic power consumption of CMOS circuits is $P = C \cdot V^2 \cdot f$, where $C$, $V$, and $f$ are the capacitance, operating voltage, and frequency, respectively. As the operating frequency is proportional to the operating voltage [10], the power consumption is an increasing function of $s_i$. It is worth noting that the proposed model is not dependent upon any specific DVFS model. We denote the energy consumption of a unit workload at the full speed ($s_i = 1$) as $E_{\mathrm{ref}}$ and assume that the energy dissipation grows linearly with the size of the workload. Then, the reference energy of a workload $\omega_i$ at the full speed is $E_{\mathrm{ref}} \cdot \omega_i$. Given a DVFS energy model $\eta(s_i)$ as a function of the speed scaling factor $s_i$, the energy consumption at the $i$-th iteration is formulated as follows:

$$E_i = E_{\mathrm{ref}} \cdot \omega_i \cdot \eta(s_i), \tag{6}$$

in which $\eta(1) = 1$ and $\eta$ is monotonically increasing. Then, the average power consumption of a trace $T$ can be formulated as follows:

$$\bar{P}(T) = \frac{\sum_{i=1}^{n} E_i}{\sum_{i=1}^{n} d_i}. \tag{7}$$

It is worthwhile to mention that the proposed technique is not specific to a certain workload-energy model. While we adopt a linear model for the workload-energy relation for ease of presentation, any, possibly nonlinear, model can be used in (6).
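A minimal sketch of how (2), (6), and (7) combine for a concrete trace is shown below. The constants and the quadratic energy model $\eta(s) = s^2$ (one common choice when voltage tracks frequency) are assumptions for illustration only.

```python
F_MAX = 1e9      # maximum frequency in Hz (hypothetical)
E_REF = 1e-9     # energy per cycle at full speed in J (hypothetical)

def eta(s: float) -> float:
    # Illustrative DVFS energy model: eta(1) = 1, monotonically increasing.
    return s * s

def average_power(workloads, speeds):
    """Average power of a trace: total energy over total elapsed time."""
    # Per-iteration delay d_i = w_i / (s_i * f_max), Eq. (2).
    delays = [w / (s * F_MAX) for w, s in zip(workloads, speeds)]
    # Per-iteration energy E_i = E_ref * w_i * eta(s_i), Eq. (6).
    energies = [E_REF * w * eta(s) for w, s in zip(workloads, speeds)]
    # Average power over the trace, Eq. (7).
    return sum(energies) / sum(delays)
```

Running a single iteration of one million cycles at full speed with these constants dissipates 1 mJ over 1 ms, i.e., an average power of 1 W; scaling the speed down lowers the average power but lengthens the delay.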

##### 3.2. Problem Formulation

Our objective is to minimize the average power consumption of a given system. The problem is formulated as follows: given the modeling constant $E_{\mathrm{ref}}$, the DVFS energy modeling function $\eta$, the workload function $W$, and the real-time constraint $D$, determine an execution trace $T$ such that the average power consumption formulated in (7) is minimized.
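To make the problem statement concrete, here is a brute-force sketch that enumerates short traces over a few discrete speed settings and keeps the one with the lowest average power, discarding traces that miss the deadline. All constants, the two-level staircase, and the quadratic energy model are hypothetical; this exhaustive search is only a reference point, not the technique proposed in this paper.

```python
from itertools import product

F_MAX = 1e9        # Hz (hypothetical)
E_REF = 1e-9       # J per cycle at full speed (hypothetical)
DEADLINE = 5e-3    # real-time constraint D in seconds (hypothetical)
SPEEDS = [0.5, 0.75, 1.0]   # discrete speed scaling factors (hypothetical)

def next_workload(delay):
    # Hypothetical two-level staircase workload function, Eq. (4).
    return 2e6 if delay < 3e-3 else 4e6

def best_trace(initial_workload, n=3):
    """Exhaustively search traces of length n for minimum average power."""
    best = (float("inf"), None)
    for trace in product(SPEEDS, repeat=n):
        w, energy, time = initial_workload, 0.0, 0.0
        feasible = True
        for s in trace:
            d = w / (s * F_MAX)             # delay, Eq. (2)
            if d > DEADLINE:                # real-time constraint, Eq. (3)
                feasible = False
                break
            energy += E_REF * w * s * s     # energy with eta(s) = s^2, Eq. (6)
            time += d
            w = next_workload(d)            # workload of the next iteration
        if feasible:
            best = min(best, (energy / time, trace))  # average power, Eq. (7)
    return best
```

Even this tiny instance shows the tension discussed earlier: slow modes save energy per cycle but may push the delay over a threshold and inflate the next workload.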

#### 4. Proposed Technique

In this section, we describe the proposed operation management policy as an answer to the problem defined in the previous section. In doing so, we first derive the condition for feasible and schedulable systems. Then, we study when the workload changes and how it affects the power dissipation. Based on that, we propose a novel graph representation that captures all possible workload transitions in the power-optimal operation. Finally, we derive the power-optimal operation policy with the given workload function .

##### 4.1. Feasibility

In this subsection, we examine under which condition a given system is feasible. First, the system should be schedulable within the given real-time constraint at every iteration.

Theorem 1 (schedulability). *Given the workload function $W$ and the real-time constraint $D$, the system is not schedulable if $W(d) > d \cdot f_{\max}$, $\forall d \le D$.*

*Proof.* Suppose that the delay at the $i$-th iteration is $d_i$. Then, $\omega_{i+1} = W(d_i)$ and $d_{i+1} \ge W(d_i)/f_{\max}$. Since $W(d_i) > d_i \cdot f_{\max}$, we have $d_{i+1} > d_i$. That is, the delay increases as the iterations go by and eventually exceeds the bound imposed by the real-time constraint: $W(d_k) > D \cdot f_{\max}$ for some iteration $k$. At the next iteration, the system becomes unschedulable even with the full speed, as $d_{k+1} \ge W(d_k)/f_{\max} > D$.

Once the workload gets bigger than $D \cdot f_{\max}$, the system is trivially not schedulable afterwards, even with the full speed $s_i = 1$. Thus, the workload must not be bigger than $D \cdot f_{\max}$ at any time. Moreover, once the workload reaches $D \cdot f_{\max}$, the speed should remain the full speed afterwards. We can make the upper bound of the workload even tighter if there exists a workload level $\Omega_u < D \cdot f_{\max}$ such that $W(d) > d \cdot f_{\max}$ for all $d \ge \Omega_u / f_{\max}$. In this case, a workload larger than $\Omega_u$ is not allowable, as it makes the execution delay longer and longer, eventually violating the deadline.
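The divergence argued in Theorem 1 can be observed numerically. The sketch below assumes a hypothetical staircase with $W(d) > d \cdot f_{\max}$ for every $d \le D$ and runs the system at full speed, reporting the iteration at which the deadline is first missed.

```python
F_MAX = 1e9        # Hz (hypothetical)
DEADLINE = 5e-3    # real-time constraint D in seconds (hypothetical)

def next_workload(delay):
    # Hypothetical staircase with W(d) > d * F_MAX for all d <= DEADLINE:
    # every level demands more cycles than the previous delay allowed.
    if delay < 2e-3:
        return 3e6   # 3e6 cycles -> 3 ms at full speed
    return 6e6       # 6e6 cycles -> 6 ms at full speed (> DEADLINE)

def first_violation(initial_workload, max_iters=10):
    """Run at full speed (s = 1) and return the first deadline-missing
    iteration index, or None if none occurs within max_iters."""
    w = initial_workload
    for i in range(max_iters):
        delay = w / F_MAX        # full-speed delay, Eq. (2)
        if delay > DEADLINE:
            return i             # deadline violated at iteration i
        w = next_workload(delay) # workload of the next iteration, Eq. (4)
    return None
```

Starting from a small workload, the delay climbs through the levels (1 ms, 3 ms, 6 ms) and violates the 5 ms deadline on the third iteration, exactly as the proof predicts.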

Given the workload function $W$ and the initial workload $\omega_1$, one can calculate the lower bound of the workload as well. If a value $\Omega_l$ exists which satisfies $\Omega_l \le \omega_1$ and $W(d) \ge \Omega_l$ for all $d \ge \Omega_l / f_{\max}$, the workload will never become smaller than $\Omega_l$. In other words, even with the full speed, the execution delay never goes below $\Omega_l / f_{\max}$.

Then, the valid workload levels and the execution delay range during the lifetime of a given system can be formulated as below.

*Definition 2 (valid ranges).* Given the workload function $W$ and the initial workload $\omega_1$, the minimum and maximum workload levels of a system, $\Omega_{\min}$ and $\Omega_{\max}$, are defined to be the tightest lower and upper bounds derived above. Then, the valid range of the execution delay is formulated as $d_{\min} \le d_i \le D$ according to (2) and (3), with $d_{\min} = \Omega_{\min} / f_{\max}$.
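Under the assumption that the workload function is a monotone staircase, the lower workload bound discussed above can be found by simply iterating the full-speed dynamics until a fixed point: at $s = 1$ the sequence $\omega \mapsto W(\omega / f_{\max})$ is monotone and settles on a level in at most as many steps as there are levels. The helper below is an illustrative sketch with hypothetical constants.

```python
F_MAX = 1e9   # Hz (hypothetical)

def min_workload(workload_fn, initial_workload, max_levels=100):
    """Smallest workload reachable when always running at full speed.

    At s = 1 the next workload is workload_fn(w / F_MAX); for a monotone
    staircase this sequence reaches a fixed point in at most as many
    steps as there are workload levels.
    """
    w = initial_workload
    for _ in range(max_levels):
        nxt = workload_fn(w / F_MAX)
        if nxt == w:    # fixed point: the delay cannot shrink any further
            return w
        w = nxt
    return w
```

When the full-speed sequence is decreasing, the returned fixed point is the level below which the workload can never fall, matching the lower-bound argument above.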

##### 4.2. Workload Transitions

In this subsection, we examine when a workload transition between valid workload levels possibly occurs and how it affects the system.

As presented in (4), the workload is a function of the delay taken at the previous iteration. Suppose that the delay taken at the $i$-th iteration is $d_i$ and $\omega_{i+1} = W(d_i)$. Then, if the system works fast enough at the next iteration to result in a shorter delay, $d_{i+1} < d_i$, the next workload $W(d_{i+1})$ may get smaller than $\omega_{i+1}$. Similarly, in case the delay gets longer ($d_{i+1} > d_i$), the system may need to handle a larger workload than $\omega_{i+1}$ at the successive iteration.

However, such workload transitions can occur only within limited ranges. Figure 1 depicts valid and invalid transitions from one workload level. Figure 1(a) shows two transitions from a workload level $\Omega_j$ to lower ones, $\Omega_k$ and $\Omega_l$ ($\Omega_l < \Omega_k < \Omega_j$). To make the next workload level $\Omega_k$, the delay should be in the range of $[\delta_{k-1}, \delta_k)$. Given the current workload $\Omega_j$, the speed scaling factor should be larger than $\Omega_j / (\delta_k \cdot f_{\max})$ from (2). If $\Omega_j / (\delta_k \cdot f_{\max}) < 1$, this workload transition can possibly occur. In contrast, if $\Omega_j / (\delta_l \cdot f_{\max}) \ge 1$ for another workload level $\Omega_l$, the transition from $\Omega_j$ to $\Omega_l$ never happens, because the delay never goes below $\Omega_j / f_{\max}$ even with the full processing speed.
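The reachability condition above reduces to a one-line predicate: a transition into a lower workload band capped by threshold $\delta_k$ is possible only if the full-speed delay of the current workload lies below $\delta_k$. The sketch below is illustrative, with hypothetical names and constants.

```python
F_MAX = 1e9  # Hz (hypothetical)

def can_transition_down(w_current: float, delta_k: float) -> bool:
    """True if a transition into a workload band capped by threshold
    delta_k is possible, i.e., some speed s <= 1 yields a delay below
    delta_k. The shortest achievable delay is w_current / F_MAX."""
    return w_current / F_MAX < delta_k
```

For a current workload of 3e6 cycles, a band capped at 4 ms is reachable (full-speed delay 3 ms), while a band capped at 2 ms is not.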