Research Article | Open Access
Power Optimization of Multimode Mobile Embedded Systems with Workload-Delay Dependency
This paper proposes to take the relationship between delay and workload into account in the power optimization of microprocessors in mobile embedded systems. Since the components outside a device continuously change their values or properties, the workload to be handled by the systems becomes dynamic and variable. This variable workload is formulated as a staircase function of the delay taken at the previous iteration in this paper and applied to the power optimization of DVFS (dynamic voltage-frequency scaling). In doing so, a graph representation of all possible workload/mode changes during the lifetime of a device, Workload Transition Graph (WTG), is proposed. Then, the power optimization problem is transformed into finding a cycle (closed walk) in WTG which minimizes the average power consumption over it. Out of the obtained optimal cycle of WTG, one can derive the optimal power management policy of the target device. It is shown that the proposed policy is valid for both continuous and discrete DVFS models. The effectiveness of the proposed power optimization policy is demonstrated with the simulation results of synthetic and real-life examples.
Today’s mobile embedded systems often interact with physical processes or external environments; such systems are referred to as Cyber-Physical Systems (CPSs). They are usually modeled in terms of the interactions between the physical world and the devices. For instance, in a smart building, handheld or stationary embedded systems need to interact continuously with their environment. The system performs a computational task and responds to the physical side through an actuator, while the resulting change on the physical side, in turn, alters the input (sensor) of the device. To keep this control loop stable, the embedded system commonly has a real-time constraint within which all computation must be completed.
In a class of applications, the computational workload of the embedded system depends on the variation of the sampled input value, while the computation delay, in turn, affects the input variation of the next iteration. Usually, if the system spends more time processing information at one iteration, it has more work to do at the next. One example of such delay-workload dependency is object tracking, which is frequently used in drones, surveillance cameras, and augmented reality [3–5]. The image obtained from the camera is processed by the object tracker to follow an object. As the object may keep changing its position in the meantime, the object tracker must reactively take an image from the adjusted position/angle to make the next decision. The more time the tracker spends on an iteration, the farther the object will have moved.
Such workload-delay relations are common in modern mobile embedded systems that rely on computer vision algorithms to capture what happens in the external world. These applications typically maintain an internal state in order to detect the difference caused by what happened in the external world. Examples of such internal states range from a simple snapshot of a sensor reading to a complicated model of the scene obtained from a camera. Whatever the model is, a longer execution delay between two consecutive invocations of the algorithm generally results in a larger workload in the next iteration, because the discrepancy between the internal state and the external world grows.
The workload-delay dependency can also be found in many other types of applications. Real-time pattern matching over event streams, for instance, exhibits similar behavior: the queries can be handled either in small batches (shorter delay, less workload) or in an aggregated manner (longer delay, more workload). Similarly, haptic rendering in Human-Computer Interfaces (HCI) uses adaptive sampling techniques to deal with the stringent real-time constraint, and the rendering algorithm can be warm-started to exploit temporal coherence. In essence, applications that exploit temporal coherence can exhibit workload-delay dependencies; that is, any iterative algorithm that can be warm-started can lead to one.
Most modern microprocessors used in mobile embedded systems support dynamic voltage-frequency scaling (DVFS) for power-efficient operation. In systems with DVFS, delay and energy are generally in a tradeoff relationship for a given workload: for a certain amount of work, a faster solution (with a higher frequency) is less energy efficient. Combining this control knob with the aforementioned delay-workload dependency makes the power optimization problem very challenging. Conventionally, “working as slow as possible” within the real-time constraint has been regarded as the best discipline for minimizing the power dissipation. With the workload-delay dependency, however, this is no longer valid, since a slower execution may cause a bigger workload at the next iteration. On the other hand, “as fast as possible” is not optimal either, as the power consumption is a strong function of the operating speed.
The workload-delay dependency was first modeled and applied to DVFS optimization in prior work. There, the workload is assumed to be a continuous and monotonically increasing function of the delay, under which a simple yet effective power management technique was proposed. Specifically, it was shown that staying in a certain DVFS mode is better than alternating between different DVFS modes dynamically. Later, the optimization was generalized to various power models and formally proven to be optimal.
This work differs from our previous work in that we take a different optimization approach tailored for discrete workload levels. We observed that the continuity assumption does not always hold in reality; rather, a number of applications have discrete levels of workload. For instance, many image or signal processing algorithms handle input data in units of macroblocks or frames. In such application domains, the workload tends to grow in a discrete manner. In this paper, the workload is modeled as a staircase function of the delay taken in the previous iteration. Since the solution obtained by the previous work [11, 12] is no longer optimal, or does not exist at all, in the staircase model, a new power management technique is proposed. The contributions of this paper can be summarized as follows:
(i) The workload-delay dependency is modeled as a staircase function, generalizing the previous model, and validated with a real-life example.
(ii) A novel data structure, Workload Transition Graph (WTG), is proposed to represent all possible workload/mode changes of a device.
(iii) Based on WTG, a power management policy is derived and shown to be optimal.
2. Related Work
Bogdan and Marculescu observed that workloads from physical processes tend to be nonstationary but exhibit some systematic relationship in space and time. They proposed a workload characterization approach based on statistical physics and showed how workload-awareness can improve the design of electronic systems. Zhang et al. studied the relationship between control stability and workload in inverted pendulum control. While enlarged invocation periods may lower the degree of stability, more inverted pendulums can be controlled by a system, as the lengthened invocation periods lower the utilization of the algorithm. In other words, they proposed to trade control stability for resource efficiency in order to accommodate more workload in a system. The proposed technique also deals with variable workload in electronic systems but differs from the above-mentioned works in that the effect of execution delay on workload is systematically considered.
Recently, Pant et al. proposed a codesign of computation delay and control stability based on anytime algorithms. An anytime algorithm can be stopped at any point in time and still provide a decent solution; typically, the quality of the solution is an increasing function of the computation delay. In their work, it is the duty of the control algorithm to adaptively change the real-time deadline constraint and error bound (quality of control). In contrast, in the proposed technique the relationship between execution delay and workload is formally described in the form of a workload-delay function; thus, no explicit runtime monitoring/control is required.
A design guideline for flexible delay constraints in distributed embedded systems was proposed by Goswami et al., where some of the samples are allowed to violate the given delay deadline. They demonstrated the applicability of their approach using the FlexRay dynamic segment as a communication medium. This work is similar to the proposed approach in the sense that it does not stick to a given fixed real-time deadline. While they could avoid resource overprovisioning by relaxing the hard real-time constraints, the dependency of the workload on the delay was not considered. Moreover, from a real-time standpoint, the proposed work is more rigorous, as it allows no real-time constraint violations.
3. Problem Definition
This section presents the system model assumed in this paper, which is followed by the formulation of the power optimization problem.
3.1. System Model
3.1.1. Dynamic Voltage-Frequency Scaling
In this paper, we assume that a system has multiple operation modes due to its DVFS feature, through which the operating frequency and voltage can be modulated. For simplicity, we first assume that there are infinitely many operation modes available, among which one is chosen at each iteration. Section 5 shows that the proposed technique can be applied to a discrete DVFS as well. The operation mode at each iteration is represented by a speed scaling factor that ranges from a lower bound up to 1 (full speed). The operating frequency of an iteration is then the product of its speed scaling factor and the maximum frequency of the microprocessor (1).
The workload is defined as the number of clock cycles needed to complete the given computation, measured at the full speed of the microprocessor (a scaling factor of 1). The delay of an iteration is this number of cycles divided by the operating frequency of that iteration (2). Note that the elapsed time increases as the speed is scaled down. The delay is thus determined automatically once a speed scaling factor is chosen for the given workload.
3.1.3. Real-Time Constraint
The delay cannot be unboundedly long because the system is subject to a real-time constraint. At every iteration, the elapsed time must not exceed this constraint (3).
3.1.4. Delay-Workload Dependency
As stated earlier, the workload depends on the previous execution delay. Usually, the workload is not a continuous function of the delay; rather, it changes in a discrete manner. Therefore, the workload at each iteration is modeled as a monotonically increasing staircase function of the delay of the previous iteration (4). If the given system has a finite number of workload levels, the function is specified by those levels, sorted in increasing order, together with the delay thresholds (workload changing moments) at which the workload moves from one level to the next.
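The model of Sections 3.1.1 to 3.1.4 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the names `F_MAX_HZ`, `THRESHOLDS_S`, and `LEVELS_CYCLES`, all numeric values, and the convention that a delay exactly equal to a threshold still maps to the lower level are assumptions made for the example.

```python
from bisect import bisect_left

# All constants below are illustrative assumptions, not values from the paper.
F_MAX_HZ = 2.0e9                      # maximum clock frequency
THRESHOLDS_S = [5e-3, 10e-3, 15e-3]   # delay thresholds; the last one doubles as the deadline
LEVELS_CYCLES = [4e6, 16e6, 25e6]     # workload levels in clock cycles


def frequency(scale):
    """Operating frequency for a speed scaling factor in (0, 1]."""
    return scale * F_MAX_HZ


def delay(workload_cycles, scale):
    """Time needed to process the workload at the scaled frequency."""
    return workload_cycles / frequency(scale)


def next_workload(prev_delay_s):
    """Staircase workload function of the previous iteration's delay."""
    if prev_delay_s > THRESHOLDS_S[-1]:
        raise ValueError("real-time constraint violated")
    # Index of the first threshold that is >= the delay.
    return LEVELS_CYCLES[bisect_left(THRESHOLDS_S, prev_delay_s)]
```

With these assumed numbers, running the lowest workload at full speed keeps the system at the lowest level, whereas running it at a quarter of the speed stretches the delay to 8 ms and raises the next workload to the second level.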
3.1.5. Execution Trace
At each iteration, the speed scaling factor uniquely defines an execution mode, since the delay is fixed accordingly by (2). The initial workload is assumed to be given. An execution trace of a given length is then defined as the sequence of the speed scaling factors chosen over that many iterations (5).
3.1.6. Average Power Consumption
The dynamic power consumption of CMOS circuits is proportional to the product of the capacitance, the square of the operating voltage, and the operating frequency. As the operating frequency scales with the operating voltage, the power consumption is an increasing function of the speed scaling factor. It is worth noting that the proposed model does not depend on any specific DVFS model. We denote the energy consumption of a unit workload at the full speed by a modeling constant and assume that the energy dissipation grows linearly with the size of the workload. The reference energy of a workload at the full speed is then this constant multiplied by the workload. Given a DVFS energy model expressed as a function of the speed scaling factor, the energy consumption at an iteration is the reference energy of its workload scaled by that function (6). The average power consumption of a trace is then the total energy consumed over the trace divided by the total delay of the trace (7). The proposed technique is likewise not specific to a certain workload-energy model: while we adopt a linear model for the workload-energy relation for ease of presentation, any model, possibly nonlinear, can be used in (6).
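The energy and average-power definitions can be made concrete with a short simulation. The sketch below is only an illustration under stated assumptions: the quadratic `energy_scale` model, the constants `F_MAX_HZ` and `E_UNIT_J`, and the function names are hypothetical, since the paper deliberately leaves the DVFS energy model generic.

```python
F_MAX_HZ = 2.0e9    # assumed maximum frequency
E_UNIT_J = 1.0e-9   # assumed energy per cycle at full speed


def energy_scale(scale):
    """Assumed DVFS energy model: energy per cycle grows with the square of the speed."""
    return scale ** 2


def average_power(initial_workload, factors, workload_fn):
    """Average power of a trace: total energy divided by total delay.

    factors     -- speed scaling factors, one per iteration
    workload_fn -- maps the delay of one iteration to the next workload
    """
    total_energy = 0.0
    total_delay = 0.0
    workload = initial_workload
    for scale in factors:
        iteration_delay = workload / (scale * F_MAX_HZ)
        total_energy += energy_scale(scale) * E_UNIT_J * workload
        total_delay += iteration_delay
        workload = workload_fn(iteration_delay)
    return total_energy / total_delay
```

Passing a constant `workload_fn` reproduces the conventional setting in which slowing down always helps; passing a staircase function shows how a slow iteration can be penalized by the larger workload that follows it.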
3.2. Problem Formulation
Our objective is to minimize the average power consumption of a given system: given the modeling constant, the DVFS energy modeling function, the workload function, and the real-time constraint, determine an execution trace such that the average power consumption formulated in (7) is minimized.
4. Proposed Technique
In this section, we describe the proposed operation management policy as an answer to the problem defined in the previous section. We first derive the condition for feasible and schedulable systems. Then, we study when the workload changes and how this affects the power dissipation. Based on that, we propose a novel graph representation that captures all possible workload transitions in the power-optimal operation. Finally, we derive the power-optimal operation policy for the given workload function.
In this subsection, we examine under which condition a given system is feasible. First, the system should be schedulable within the given real-time constraint at every iteration.
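Theorem 1 (schedulability). Given the workload function and the real-time constraint, the system is not schedulable if, for every delay within the real-time constraint, the workload induced by that delay cannot be processed within the same delay even at the full speed.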
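Proof. Suppose that the delay at some iteration takes an arbitrary value within the real-time constraint. The next workload is then given by the workload function at that delay, and the next delay is no shorter than the time needed to process that workload at the full speed. By the assumption of the theorem, this time exceeds the current delay. That is, the delay keeps increasing as the iterations go by and will eventually reach the real-time constraint. At the next iteration, the system becomes unschedulable even at the full speed, as the workload then requires a delay longer than the constraint.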
Once the workload exceeds the number of cycles that can be processed at the full speed within the real-time constraint, the system is trivially not schedulable afterwards. Thus, the workload must never exceed this bound. Moreover, once the workload reaches the bound, the system must remain at the full speed from then on. The upper bound of the workload can be made even tighter if there exists a delay beyond which every longer delay induces a workload that cannot be processed within that delay even at the full speed. In this case, a workload larger than the one induced at that delay is not allowable, as it makes the execution delay longer and longer, eventually violating the deadline.
Given the workload function and the initial workload , one can calculate the lower bound of the workload as well. If a value exists which satisfies and , the workload will never become smaller than . In other words, even with the full speed, the execution delay never goes below .
Then, the valid workload levels and the execution delay range during the lifetime of a given system can be formulated as below.
Definition 2 (valid ranges). Given the workload function and the initial workload , the minimum and maximum workload levels of a system are defined to beThen, the valid range of the execution delay is formulated as according to and with
4.2. Workload Transitions
In this subsection, we examine when a workload transition between valid workload levels possibly occurs and how it affects the system.
As presented in (4), the workload is a function of the delay taken at the previous iteration. Suppose that the delay taken at an iteration falls within the delay range of a certain workload level. If the system then works fast enough to bring the delay below the lower threshold of that range, the next workload will be smaller than that level. Similarly, if the delay grows beyond the upper threshold of the range, the system will need to handle a larger workload at the successive iteration.
However, such workload transitions can occur only within limited ranges. Figure 1 depicts valid and invalid transitions from one workload level. Figure 1(a) shows two transitions from a workload level to lower ones and (). To make the next workload level , the delay should be in the range of . Given the current workload , the speed scaling factor should be larger than or equal to from (2). If , this workload transition can possibly occur. In contrast, if for another workload level , the transition from to never happens because the delay never goes below even with the full processing speed.
The same principle is also applied to the transition from a workload level to higher ones. If the delay can be lengthened properly with a speed scaling factor within the range , the transitions are valid. Figure 1(b) illustrates that the transition from to is valid, while the one to is not. One can tell if a transition can happen or not with the following definition.
Definition 3 (valid transition). A workload transition from to is said to be valid if and .
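The validity check described in this subsection can be sketched as an interval test: a transition is possible when some achievable delay lands inside the delay range of the destination level. The following Python function is an interpretation of the surrounding text, not a transcription of Definition 3; the parameter names, the half-open range convention `(lower, upper]`, and the explicit lower bound `s_min` on the scaling factor are assumptions.

```python
def is_valid_transition(w_from, lower, upper, f_max, s_min, deadline):
    """Check whether a workload of w_from cycles can move to a target level.

    lower, upper -- delay range (lower, upper] that leads to the target level
    f_max        -- maximum frequency in Hz
    s_min        -- smallest allowed speed scaling factor (assumed parameter)
    deadline     -- real-time constraint in seconds
    """
    d_fast = w_from / f_max                            # shortest achievable delay
    d_slow = min(w_from / (s_min * f_max), deadline)   # longest allowed delay
    longest_useful = min(upper, d_slow)
    # Some delay d with lower < d <= upper and d_fast <= d <= d_slow must exist.
    return longest_useful > lower and longest_useful >= d_fast
```

For instance, with a 2 GHz maximum frequency, a workload of 16 million cycles cannot fall into a target range of (0, 5 ms], because even at full speed it takes 8 ms.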
4.3. Workload Transition Graph
The essential difficulty of the presented power optimization problem is that two conflicting forces must be handled at the same time. On one hand, to minimize the power, one wants to scale down the speed (and thus lengthen the delay) as much as possible, as described in (2) and (6). On the other hand, the lengthened delay is undesirable because it imposes a bigger workload on the successive iteration, as shown in (4).
Therefore, no single simple intuition can be exploited to solve the problem. Rather, different modes need to be compared comprehensively. To explore all possible execution modes and quantify their effects, we need a data structure that captures how workload transitions change the system status and power dissipation. To that end, we propose a graph representation of the workload evolution, Workload Transition Graph (WTG), which captures all possible workload transition scenarios during the lifetime of a system.
A valid workload transition from one workload level to another can be caused by any delay within the corresponding range. A transition from to in Figure 1(a), for instance, can be caused by any delay within the range of . In other words, when handling a workload of , any scaling factor within the range of can cause the transition. In the power-optimal execution trace, however, only one specific scaling factor is always chosen for a certain transition even though it happens many times. We show this in the next theorem.
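Theorem 4 (optimal scaling factor). In the power-optimal trace, if the workload level handled at an iteration and the workload level of the next iteration are given, the scaling factor chosen at that iteration is the minimum one whose delay still results in that transition.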
Proof. We prove this by contradiction. Suppose that there exists a power-optimal trace in which the scaling factor that causes the transition is not equal to the one stated in the theorem. We construct a new execution trace by replacing only that scaling factor with the stated one; all other execution modes are unchanged. By definition, the stated scaling factor is not larger than the replaced one, since both result in the same transition. Thus, the replaced iteration consumes less energy and takes a longer delay, while all subsequent workloads are unaffected. Then, from (6) and (7), the average power of the new trace is lower than that of the original one, which contradicts the assumption.
From Theorem 4, we know that only one speed scaling factor is associated with all transitions of a certain type in the power-optimal operation scenario. So, we define a scaling factor of a valid transition as follows.
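Definition 5 (scaling factor of a transition). The scaling factor of a valid workload transition from one workload level to another is the minimum scaling factor whose delay still results in that transition.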
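Reading the scaling factor of a transition as the slowest speed that still lands in the destination delay range (the reading suggested by Theorem 4 and by the discrete counterpart in Section 5), it can be computed in closed form. The sketch below is hedged accordingly: the function name, the parameter names, and the explicit lower bound `s_min` are assumptions made for illustration.

```python
def transition_scale(w_from, lower, upper, f_max, s_min, deadline):
    """Slowest scaling factor that moves w_from into the range (lower, upper].

    Returns None when no achievable delay falls inside the range.
    """
    d_fast = w_from / f_max                                    # delay at full speed
    d_target = min(upper, deadline, w_from / (s_min * f_max))  # longest useful delay
    if d_target <= lower or d_target < d_fast:
        return None
    # Slowing down until the delay equals d_target minimizes the energy.
    return w_from / (f_max * d_target)
```

With a 2 GHz maximum frequency, a workload of 16 million cycles aimed at the range (5 ms, 10 ms] yields a factor of about 0.8, the speed at which the delay is exactly 10 ms.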
Now, we define WTG.
Definition 6 (Workload Transition Graph). WTG is defined to be a directed graph consisting of a set of vertices and a set of edges. Each valid workload level forms a vertex, while each valid transition from one workload level to another forms an edge between the corresponding vertices. That is, there exists an edge from one vertex to another if and only if the transition between the corresponding workload levels is valid. Each edge thus has a source vertex and a destination vertex.
With these definitions, a power-optimal execution trace can be represented as a walk (a sequence of vertices in which every pair of consecutive vertices is connected by an edge) in WTG whose length equals that of the trace. Equivalently, the execution trace is a sequence of edges in WTG, where each workload transition is caused by the execution mode chosen at the corresponding iteration.
Algorithm 1 shows how WTG is generated from the given workload function and the initial workload. After the initialization, all valid workload levels are added as vertices. Then, for each ordered pair of workload levels, it is checked whether the transition between them is valid; if so, it is added as an edge.
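Since the listing of Algorithm 1 is not reproduced here, the following Python sketch shows one way the construction could look. It is an approximation under assumptions, not the authors' algorithm: vertices are found by breadth-first reachability from the initial level instead of the valid-range formulas of Definition 2, levels are identified by their index, and `s_min` is an assumed lower bound on the scaling factor.

```python
from collections import deque


def build_wtg(levels, thresholds, f_max, s_min, deadline, start):
    """Return {source: {destination: scaling factor}} over level indices.

    levels[k] is reached when the previous delay lies in
    (thresholds[k-1], thresholds[k]], with an implicit lower bound of 0.
    """

    def edge_scale(j, k):
        if levels[k] > f_max * deadline:      # destination could never meet the deadline
            return None
        lower = thresholds[k - 1] if k > 0 else 0.0
        d_fast = levels[j] / f_max
        d_target = min(thresholds[k], deadline, levels[j] / (s_min * f_max))
        if d_target <= lower or d_target < d_fast:
            return None
        return levels[j] / (f_max * d_target)

    graph = {}
    pending = deque([start])
    while pending:
        j = pending.popleft()
        if j in graph:
            continue
        graph[j] = {}
        for k in range(len(levels)):          # ordered pairs, self-loops included
            scale = edge_scale(j, k)
            if scale is not None:
                graph[j][k] = scale
                pending.append(k)
    return graph
```

Starting the construction from different initial levels can yield different graphs, because only the levels reachable from the starting vertex are kept.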
Let us take Figure 2 as an example of workload function . Once given the initial workload, one can easily get the valid workload levels according to (8) and (9). When , for instance, and . Figure 3(b) illustrates the corresponding WTG of the workload function shown in Figure 2 with the initial workload of . It is not a complete graph as some pairs of vertices cannot be connected directly since it is not valid according to Definition 3.
The feasible delay range to handle workload is highlighted in Figure 2, justifying that vertex has three outgoing edges to , , and itself. To be more specific, when the workload is given as , the shortest possible delay is the case when the speed is chosen as . Then, the delay ( coordinate of the cross point of and ) is between and as shown in Figure 2. This means that the lowest possible workload in the next iteration is . Similarly, the biggest possible workload can also be calculated as as the biggest possible delay is between and . Note also that some of the vertices may not have an edge directed to itself (self-loop) such as vertex . It has outgoing edges only to higher workload levels. This means that the computation burden of that state is so big that it only results in higher workload levels at the next iteration even with the full speed.
Different initial workload levels may result in different WTGs as shown in Figures 3(a)–3(c). The WTG derived from the initial workload level of is illustrated in Figure 3(a). In contrast to Figure 3(b), vertex is included in the graph. The WTG derived from higher initial workload levels, , , and , is shown in Figure 3(c). Note that the WTGs in Figures 3(a) and 3(c) are not strongly connected. Vertex in Figure 3(b), for example, is not reachable from . However, from the definition of valid workload levels, all vertices are reachable from the initial workload level. This property is important for deriving the optimal operation policy that will be presented in the next subsection.
4.4. Proposed Operation Policy
In this subsection, we present the proposed operation policy, which balances the energy-delay tradeoff caused by the delay-workload dependency.
As stated earlier, a power-optimal execution trace can be represented as a walk of WTG. Then, we have the following definition.
Definition 7 (corresponding walk). Given the power-optimal execution trace and its initial workload, the corresponding walk of the trace is the sequence of WTG edges traversed by the successive workload transitions of the trace. The average power consumption of the walk is the total energy consumption for the walk divided by the total delay elapsed for traversing it (20), where both totals are summed over the edges of the walk.
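The average power of a walk is a ratio of two sums, which a few lines of Python make explicit. The representation of an edge as an `(energy, delay)` pair and the function name are assumptions made for this sketch.

```python
def walk_average_power(edges):
    """Average power of a walk given (energy_joules, delay_seconds) per edge."""
    edges = list(edges)
    total_energy = sum(energy for energy, _ in edges)
    total_delay = sum(delay for _, delay in edges)
    return total_energy / total_delay
```

Note that this is generally not the mean of the per-edge powers: an edge that consumes 4 mJ in 2 ms (2 W) followed by one that consumes 1 mJ in 4 ms (0.25 W) averages to 5/6 W over the walk, not 1.125 W, because the slower edge occupies more time.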
A cycle (closed walk) of WTG is a walk whose starting and ending vertices are the same. Hence, if the corresponding walk of a trace in WTG is a cycle, the trace can be repeated forever. We argue that, when the trace is sufficiently long, the average power consumption is minimized when the cycle that minimizes (20) repeats over and over again in the trace.
Theorem 8 (optimal cycle of WTG). Suppose that a cycle of WTG has the minimum value of (20) among all cycles of the WTG. Then, if the trace is long enough, the average power consumption of the optimal trace converges to the average power consumption of that cycle.
Proof. An arbitrary walk of a WTG, , can be decomposed into a path (a walk with distinct vertices) from to and a set of cycles (see, e.g., Section 10.3 of ). Figure 4 depicts a walk example of length 8, where the initial workload level is and the last vertex that it traverses is . If a path from to , , highlighted in the dashed arrows, is removed from the walk, the remaining part is a set of cycles.
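Now, consider a power-optimal trace of a given length that starts from the initial workload level. Its corresponding walk can be decomposed into a path starting at the initial vertex and a set of cycles. The average power consumption of the trace is then the total energy of the path and the cycles divided by their total delay. Since a path visits each vertex at most once, its number of edges is bounded by the number of vertices, and its energy and delay are therefore bounded by constants. Hence, if the trace is sufficiently long, the contribution of the path becomes negligible, and the average power approaches the total energy of the cycles divided by their total delay. Note also that the average power consumption of every cycle in the set is not smaller than that of the optimal cycle by definition. Therefore, the average power consumption of the power-optimal trace gets arbitrarily close to that of the optimal cycle.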
From Theorem 8, it follows that changing the DVFS mode of the given system along the optimal cycle presented above results in an asymptotically optimal average power consumption. Thus, we propose the following rules as the DVFS operation policy:
(i) If the current workload level vertex is in the optimal cycle, follow the cycle repeatedly; that is, at each iteration, choose the speed scaling factor of the edge of the optimal cycle that leaves the current vertex.
(ii) Otherwise, take a path that ends at a vertex in the optimal cycle and has the minimum cost; in other words, try to enter the optimal cycle as cheaply as possible.
The optimal cycle of WTG can be found by using an existing cycle enumeration algorithm. In this paper, we use the one proposed by Tarjan. The minimum path to the optimal cycle can also be found by simple enumeration. It is worth mentioning that a minimum weight cycle search algorithm cannot be used directly, as the objective is not a simple summation of the edge weights but a more complex function of them, as presented in (20).
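The cycle search can be illustrated with a brute-force enumeration. The sketch below deliberately uses a plain depth-first enumeration of simple cycles rather than Tarjan's algorithm, which the paper uses; it assumes integer vertex labels, a graph stored as `{source: {destination: (energy, delay)}}`, and hypothetical function names. Restricting the search to simple cycles loses nothing here, because a closed walk decomposes into simple cycles and its energy-to-delay ratio cannot be lower than the best ratio among them.

```python
def simple_cycles(graph):
    """Yield every simple cycle once, as a vertex list that starts and ends at its smallest vertex."""
    for start in sorted(graph):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in graph[node]:
                if nxt == start:
                    yield path + [start]
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))


def cycle_power(graph, cycle):
    """Total energy over total delay along the edges of the cycle."""
    pairs = list(zip(cycle, cycle[1:]))
    energy = sum(graph[a][b][0] for a, b in pairs)
    delay = sum(graph[a][b][1] for a, b in pairs)
    return energy / delay


def optimal_cycle(graph):
    """Cycle with the minimum average power among all simple cycles."""
    return min(simple_cycles(graph), key=lambda c: cycle_power(graph, c))
```

In a two-vertex graph where both self-loops are expensive but the edges between the vertices are cheap, the search returns the alternating cycle, which corresponds to an operation policy that keeps switching between two DVFS modes.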
Figure 5 shows the proposed operation policy as a flowchart. Given the initial workload and the workload function, we generate the WTG, from which the optimal cycle and the minimum path to it are derived in the next steps. It is worth mentioning that this can be done in a tractable time and is a one-time effort taken offline. As long as the initial workload vertex is not included in the optimal cycle, the system follows the trace represented by the minimum path until it reaches the optimal cycle of WTG. Then, it simply repeats the trace implied by the optimal cycle from that iteration on.
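At run time, the policy reduces to replaying a precomputed sequence of scaling factors: first those along the path into the optimal cycle, then those of the cycle itself, repeated indefinitely. A minimal sketch, assuming the two sequences have already been derived offline (the function name and the example values are hypothetical):

```python
from itertools import chain, cycle, islice


def dvfs_schedule(path_scales, cycle_scales):
    """Scaling factors to apply: the entry path once, then the optimal cycle forever."""
    return chain(path_scales, cycle(cycle_scales))


# Example: one transition to enter the cycle, then alternate between two modes.
first_five = list(islice(dvfs_schedule([1.0], [0.4, 0.8]), 5))
```

When the initial workload level already lies on the optimal cycle, `path_scales` is simply empty.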
5. Extension to Discrete DVFS
Whilst we assume a continuous DVFS model for ease of presentation and generality, modern microprocessors in reality have finite DVFS modes with a set of predefined operation voltages and frequencies. In this section, we show that the continuity of the model presented in (1) can be relaxed by modifying Definitions 3 and 5 without harming the effectiveness of the proposed technique.
Let us suppose that we now have a system with a discrete and finite DVFS model, where only a scaling factor from a predefined finite set can be chosen at each iteration. Valid transitions, in Definition 3, are redefined accordingly: a transition is valid if there is a scaling factor in the set that meets the same requirement.
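Definition 9 (valid transition in discrete DVFS). Given a set of feasible scaling factors, a workload transition from one workload level to another is said to be valid if the set contains a scaling factor whose resulting delay falls within the delay range that leads to the destination workload level.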
Likewise, the scaling factor of a valid transition, in Definition 5, is redefined to be the minimum feasible scaling factor that keeps the elapsed delay within the range resulting in the same transition. One has the following definition.
Definition 10 (scaling factor in discrete DVFS). Given a set of feasible scaling factors, the scaling factor of a valid workload transition from one workload level to another is the minimum scaling factor in the set that keeps the elapsed delay within the range resulting in that transition.
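Both discrete definitions can be covered by one small function: filter the feasible scaling factors down to those whose delay lands in the destination range, then take the minimum. The sketch below is an illustration under assumptions; the function name, the parameters, the half-open range convention, and the explicit deadline check are not taken from the paper.

```python
def discrete_transition_scale(w_from, lower, upper, factors, f_max, deadline):
    """Minimum feasible scaling factor that moves w_from into (lower, upper].

    Returns None when the transition is not valid for the given factor set.
    """
    limit = min(upper, deadline)
    feasible = [s for s in factors if lower < w_from / (s * f_max) <= limit]
    return min(feasible) if feasible else None
```

With five evenly spaced factors from 0.2 to 1.0 and a 2 GHz maximum frequency, a workload of 15 million cycles can reach the range (5 ms, 10 ms] with either 0.8 or 1.0, and the function returns 0.8; the range (0, 5 ms] is unreachable, so the function returns `None`.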
In this section, we validate the proposed model and operation policy with experimental analysis and simulations.
6.1. A Case Study: Object Tracking
We first show that the proposed workload-delay dependency is clearly observable in an object tracking application. The performance of a commonly used object tracking method is profiled using a publicly available implementation. We choose Exynos5422 as the target mobile embedded computing platform, which has 2 GB of main memory and runs the Linux operating system. The actual power dissipation of the processor is measured individually for five different DVFS modes; at the maximum speed, the core operates at 2 GHz. Leveraging a priori knowledge of the maximum speed of the object, the maximum distance that the object could have moved between two iterations is calculated. In this experiment, it is assumed that the object’s speed never exceeds 10 pixels/ms. The search area for the next iteration is a square whose side length is determined by the time taken by the current iteration, and it grows in a discrete manner every 5 ms. The real-time constraint is set to 25 ms. The staircase-shaped workload function is illustrated in Figure 6 together with five guiding lines.
We compare the proposed power management policy with two others. The first one is ALAP, where the speed is chosen to be the slowest one with respect to the real-time constraint. The other comparison is made against a stable trace with the maximum speed as another extreme (ASAP). When the initial workload is , that is, , ASAP and ALAP result in the average power consumption of W and W, respectively. The average power consumption of the proposed power management policy outperforms the others as W. The optimal cycle of its WTG is the self-loop of node , which implies a stable operation mode, .
6.2. Stable versus Alternating Operation Modes
There are two kinds of cycles in WTG: the first one is a self-loop which implies a stable operation mode, where no mode changes happen over the edge. Other than these self-loops, the WTG has one non-self-loop cycle as well. From the perspective of the operation policy, this non-self-loop cycle implies a predefined sequence of mode changes that can repeat over and over again. We call this alternating operation mode. Many examples including a real-life application shown in the previous subsection, as well as one presented in , tend to be optimal in a stable operation mode. However, in principle, the optimal solution cannot be achieved in a stable mode in some configurations. In order to illustrate this, we show a counter example shown in Figure 7 and apply the proposed technique to the example, with a DVFS power modeling function of and . The delay threshold and the real-time constraint are , , , and , respectively. Figure 7(b) shows the derived WTG for the synthetic example.
The average power consumption of all self-loops in the synthetic example is tied to W. On the other hand, an alternating operation mode implied in shows the least average power consumption in the discrete model. This is due to the fact that a computer system cannot simply operate on an ideal design point. In case that the theoretically optimal design point cannot be captured by a commodity hardware, the proposed technique is particularly useful. It can effectively explore design space and find out the best one in alternating modes.
In principle, a stable operation mode is the case that the staircase workload function has a crossing point with . If this is sufficiently small, it is likely to be a near-optimal operation mode. However, it does not always result in a near-optimal power consumption. Particularly, in case that only a limited number of DVFS modes are available in a microprocessor, this which crosses the current workload level may not exist in .
This paper formulates the delay-workload dependency in the power optimization problem of embedded systems as a staircase function of the delay taken at the previous iteration. To apply it to the power optimization of DVFS-enabled electronic devices, a novel graph representation, called WTG, is proposed for exploring all possible workload/mode changes. It is then shown that the power optimization problem is equivalent to finding a cycle of the graph that has the minimum average power consumption. The effectiveness of the proposed operation policy is demonstrated through power simulations of synthetic and real-life examples. It has been observed that staying at a low speed scaling factor in a stable operation mode (a self-loop in WTG) is often the best discipline. However, alternating modes, where the DVFS modes change according to a predefined pattern, sometimes outperform the stable ones.
A preliminary version of this paper appeared in July 2015 at the International Symposium on Low Power Electronics and Design (ISLPED), under the title of “Modeling and Power Optimization of Cyber-Physical Systems with Energy-Workload Tradeoff” .
The authors declare that they have no competing interests.
This work was supported by ICT R&D program of MSIP/IITP (B0101-15-0661, the research and development of the self-adaptive software framework for various IoT devices), Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2013R1A2A2A01067907), and the new faculty research fund of Ajou University.
- E. A. Lee, “Cyber physical systems: design challenges,” Tech. Rep. UCB/EECS-2008-8, EECS Department, University of California, Berkeley, Calif, USA, 2008.
- J. Kleissl and Y. Agarwal, “Cyber-physical energy systems: focus on smart buildings,” in Proceedings of the 47th Design Automation Conference (DAC '10), pp. 749–754, ACM, Austin, Tex, USA, June 2010.
- J. S. Kim, D. H. Yeom, and Y. H. Joo, “Fast and robust algorithm of tracking multiple moving objects for intelligent video surveillance systems,” IEEE Transactions on Consumer Electronics, vol. 57, no. 3, pp. 1165–1170, 2011.
- D. K. Park, H. S. Yoon, and C. Sun Won, “Fast object tracking in digital video,” IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 785–790, 2000.
- J. Pestana, J. L. Sanchez-Lopez, P. Campoy, and S. Saripalli, “Vision based GPS-denied object tracking and following for unmanned aerial vehicles,” in Proceedings of the 2013 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR '13), pp. 1–6, IEEE, Linkoping, Sweden, October 2013.
- J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman, “Efficient pattern matching over event streams,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '08), pp. 147–160, ACM, Vancouver, Canada, June 2008.
- J. Barbič and D. James, “Time-critical distributed contact for 6-dof haptic rendering of adaptively sampled reduced deformable models,” in Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA '07), pp. 171–180, Eurographics Association, San Diego, Calif, USA, August 2007.
- J. Barbič and D. L. James, “Six-DoF haptic rendering of contact between geometrically complex reduced deformable models,” IEEE Transactions on Haptics, vol. 1, no. 1, pp. 39–52, 2008.
- P. Pillai and K. G. Shin, “Real-time dynamic voltage scaling for low-power embedded operating systems,” SIGOPS—Operating Systems Review, vol. 35, no. 5, pp. 89–102, 2001.
- T. Mudge, “Power: a first class design constraint for future architectures,” in High Performance Computing—HiPC 2000, pp. 215–224, Springer, 2000.
- H. Yang and S. Ha, “Modeling and power optimization of cyber-physical systems with energy-workload tradeoff,” in Proceedings of the 20th IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED '15), pp. 315–320, Rome, Italy, July 2015.
- H.-C. An, H. Yang, and S. Ha, “A formal approach to power optimization in CPSs with delay-workload dependence awareness,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 35, no. 5, pp. 750–763, 2016.
- P. Bogdan and R. Marculescu, “Cyberphysical systems: workload modeling and design optimization,” IEEE Design & Test of Computers, vol. 28, no. 4, pp. 78–87, 2011.
- F. Zhang, K. Szwaykowska, W. Wolf, and V. Mooney, “Task scheduling for control oriented requirements for cyber-physical systems,” in Proceedings of the Real-Time Systems Symposium (RTSS '08), pp. 47–56, Barcelona, Spain, December 2008.
- Y. V. Pant, H. Abbas, K. Mohta, T. X. Nghiem, J. Devietti, and R. Mangharam, “Co-design of anytime computation and robust control,” in Proceedings of the 2015 IEEE Real-Time Systems Symposium (RTSS '15), pp. 43–52, IEEE, San Antonio, Tex, USA, December 2015.
- D. Goswami, R. Schneider, and S. Chakraborty, “Co-design of cyber-physical systems via controllers with flexible delay constraints,” in Proceedings of the 16th Asia and South Pacific Design Automation Conference, pp. 225–230, IEEE Press, Yokohama, Japan, January 2011.
- A. Schrijver, “Combinatorial optimization: polyhedra and efficiency,” Discrete Applied Mathematics, vol. 146, pp. 120–122, 2005.
- R. Tarjan, “Enumeration of the elementary circuits of a directed graph,” SIAM Journal on Computing, vol. 2, no. 3, pp. 211–216, 1973.
- J.-Y. Bouguet, “Pyramidal implementation of the affine Lucas Kanade feature tracker description of the algorithm,” vol. 5, pp. 1–10, 2001.
- G. Bradski, “The OpenCV library,” Dr. Dobb's Journal, vol. 25, no. 11, pp. 120–126, 2000.
- Samsung Exynos, Octa 5422, 2015, http://www.samsung.com.
Copyright © 2016 Hoeseok Yang and Soonhoi Ha. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.