Abstract

Due to the development of mobile technology and the wide availability of smartphones, the Internet of Things (IoT) is beginning to handle high volumes of video data to facilitate multimedia-based services, which requires energy-efficient video playback. In video playback, frames have to be decoded and rendered at a high playback rate, which increases the computation cost on the CPU. To save CPU power, dynamic voltage and frequency scaling (DVFS) dynamically adjusts the operating voltage of the processor along with its frequency, so that an appropriate choice of frequency can balance performance against power. We present a decoding model that buffers frames so that the CPU can run at a low frequency, and then propose an algorithm that uses a dynamic programming technique to determine the CPU frequency needed to decode each frame in a video, with the aim of minimizing power consumption while meeting buffer size and deadline constraints. We finally extend this algorithm to optimize CPU frequencies over a short sequence of frames, producing a practical method of reducing the energy required for video decoding. Experimental results show a system-wide reduction in energy of 27%, compared with a processor running at full speed.

1. Introduction

The Internet of Things (IoT) allows physical objects to interact and cooperate with one another by exchanging data, and multimedia-related services based on the IoT are now gaining popularity in various application areas [1]. For example, users of home security systems can now view camera images on a smartphone, and telemedicine systems allow doctors to monitor a patient’s health using video communication.

To support multimedia applications within the IoT, the characteristics of video need to be considered carefully. For example, the amount of data involved requires the use of compression codecs, but the encoding and decoding processes are computationally intensive. Video transmission is a real-time process, which requires continuous, periodic decoding to avoid distorted playback. Most importantly, mobile IoT devices have a limited energy budget, making the energy requirements of video transmission an important issue.

An effective way of reducing CPU power consumption is to use a dynamic voltage and frequency scaling (DVFS) technique, which adjusts the operating voltage and frequency of the processor [2–4]. Because the energy dissipated by the CPU scales quadratically with the supply voltage, reducing the voltage saves considerable energy but also slows program execution, so that an appropriate compromise is always required.
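The quadratic dependence on voltage can be illustrated with the textbook switching-power approximation P = C·V²·f. The following is a minimal sketch under that approximation only; the capacitance, voltages, and frequencies are hypothetical values chosen for illustration and are not measurements from this paper.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """Switching-power approximation P = C_eff * V^2 * f (watts)."""
    return c_eff * voltage ** 2 * freq_hz


def task_energy(c_eff, voltage, freq_hz, cycles):
    """Energy (joules) to execute a fixed number of cycles at one operating point."""
    exec_time = cycles / freq_hz  # seconds; a lower frequency means a longer run
    return dynamic_power(c_eff, voltage, freq_hz) * exec_time


# Hypothetical operating points (volts, hertz) and workload (cycles).
CYCLES = 30e6
fast = task_energy(1e-9, 1.2, 1.0e9, CYCLES)  # runs for 30 ms
slow = task_energy(1e-9, 0.9, 0.5e9, CYCLES)  # runs for 60 ms
ratio = slow / fast  # equals (0.9 / 1.2) ** 2: energy follows V^2, not f
```

Under this approximation the frequency cancels out of the energy for a fixed workload, so the saving comes from the lower voltage that the lower frequency permits, at the price of a doubled execution time in this example.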

In video playback, frames have to be decoded and rendered at the playback rate to avoid a loss of quality. For example, to play a video at 25 frames per second, a frame must be decoded every 40 ms. This decoding process needs to finish within this period, but the workload imposed by each frame varies significantly with video content [5–10].

In most previous work on the application of DVFS to videos, the lowest frequency that satisfies the decoding deadline is chosen to reduce power consumption [11]. More energy can be saved by introducing flexibility in timing by means of buffering: if several frames are decoded in advance, the CPU can operate at lower frequencies on average. However, buffering comes with its own costs [12], so power saving is only effective with a frequency selection method that respects buffer constraints, and previous work took no account of this issue.

We propose a new scheme that determines the CPU frequency needed to decode each frame, which minimizes energy consumption while avoiding buffer overrun. We start by developing a video playback and energy model, formulate the energy optimization problem, and go on to use a dynamic programming technique to determine a sequence of frequencies. We finally present experimental results based on measurement of smartphone energy consumption and decoding times.

The rest of this paper is organized as follows. We present related work in Section 2 and the system model in Section 3. We formulate an optimization problem in Section 4, propose a new frequency selection algorithm in Section 5, and extend it in Section 6. We assess our scheme in Section 7 and finally conclude the paper in Section 8.

2. Related Work

CPU power management has been the subject of a lot of research, and most of the resulting techniques involve either dynamic power management (DPM) or DVFS. DPM puts an idle CPU into sleep mode [13], whereas DVFS reduces the voltage and frequency of an active CPU [2, 4]. DPM is not generally suitable for real-time applications that run continuously, because the idle intervals are too short to allow the CPU to enter sleep mode [8]. Therefore, we review only previous work on DVFS in this section.

DVFS techniques can be classified into interval-based and task-based algorithms [7, 14]. Interval-based schemes monitor the CPU load at intervals and respond by changing the CPU frequency and voltage. A representative scheme is the Linux Ondemand governor, which adjusts frequency periodically based on CPU utilization in the preceding interval [15]. Another scheme is LongRun [16], which varies the frequency to suit the measured utilization. These methods are typically easy to implement but can make inaccurate predictions, because they assume that upcoming loads will be similar to recent loads [14].
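To make the interval-based idea concrete, the sketch below picks the next frequency from the utilization measured in the preceding interval. It is only loosely inspired by threshold-style governors and is not the Linux Ondemand implementation; the function name `next_frequency`, the 0.8 threshold, and the frequency table are all assumptions made for illustration.

```python
def next_frequency(utilization, current, levels, up_threshold=0.8):
    """Choose the next frequency from last interval's CPU utilization (0..1).

    levels must be sorted in ascending order and use the same unit as current.
    """
    if utilization > up_threshold:
        return levels[-1]  # heavily loaded: jump straight to the maximum
    # Frequency at which the same work would load the CPU to the threshold.
    needed = current * utilization / up_threshold
    for freq in levels:
        if freq >= needed:
            return freq  # lowest level that still covers the recent load
    return levels[-1]


LEVELS_MHZ = [200, 400, 800, 1000]  # hypothetical frequency table
busy = next_frequency(0.9, 400, LEVELS_MHZ)
light = next_frequency(0.3, 1000, LEVELS_MHZ)
```

Because the decision uses only the previous interval, a sudden heavy frame after a run of light ones is served at too low a frequency, which is the prediction error discussed above.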

Task-based schemes can overcome this problem to some extent by classifying tasks into several types to which different frequency selection policies are applied. Ayoub et al. [17] manage frequency and voltage to meet a performance target, expressed as a fraction of maximum system performance. Flautner and Mudge [18] propose a method that chooses a CPU frequency for each task based on its recent computational requirements. Seo et al. [14] present a frequency allocation method to reduce the average response time of tasks. However, all of these methods have been developed for general workloads and therefore may not be suitable for multimedia applications with real-time constraints.

DVFS techniques for real-time systems are generally integrated with real-time scheduling [2–4]. Based on the analysis of worst-case execution times, they select CPU frequencies that satisfy the real-time constraints; but tasks often complete before their worst-case execution times, so several algorithms incorporate methods of reclaiming the unused time [2, 4]. The CPU starts each period running at a frequency which will meet the worst-case demands, and the frequency is then reduced in response to the actual computation requirement.

Several groups have investigated DVFS techniques for video applications [6, 7], in which the key issue is to estimate the computational requirements of successive frames. Most of these techniques predict the workload required to decode a frame from the workloads incurred in decoding previous frames and adjust the CPU frequency. The accuracy of these schemes has been improved by feedback mechanisms, which take previous prediction errors into account [19].
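A history-based predictor with error feedback might look like the sketch below. It does not reproduce any of the cited schemes: the class name `FeedbackPredictor`, the moving-average window, and the feedback gain are assumptions chosen only to show how a previous prediction error can be folded into the next estimate.

```python
from collections import deque


class FeedbackPredictor:
    """Moving average of recent decoding times plus a share of the last error."""

    def __init__(self, window=3, gain=0.5):
        self.history = deque(maxlen=window)  # most recent measured times
        self.gain = gain                     # how strongly errors are fed back
        self.last_error = 0.0
        self.last_prediction = None

    def predict(self):
        """Return the estimated decoding time of the next frame."""
        if not self.history:
            return 0.0  # nothing measured yet
        base = sum(self.history) / len(self.history)
        self.last_prediction = base + self.gain * self.last_error
        return self.last_prediction

    def observe(self, actual):
        """Record a measured decoding time and update the error term."""
        if self.last_prediction is not None:
            self.last_error = actual - self.last_prediction
        self.history.append(actual)


predictor = FeedbackPredictor(window=2, gain=0.5)
predictor.observe(10.0)
first = predictor.predict()
predictor.observe(14.0)
second = predictor.predict()
```

In this example the second estimate is pushed above the plain moving average because the first estimate was too low.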

It has been widely observed [5–10] that frame decoding times vary significantly. For example, some of the frames in an MPEG video can take ten times as long to decode as an average frame [20]. That makes it difficult to estimate the computational requirements of successive frames to meet their deadlines [5–7]. Several workload estimation techniques have been proposed for video applications [5–7, 11, 19], and they can be categorized [5] into methods which make use of the relationship between the amount of data in a frame and decoding time and methods which predict decoding times based on recent times and aim to correct prediction errors using a feedback mechanism.

A close relationship between frame size and decoding time has been widely observed [5, 6, 11], especially in videos encoded with MPEG-style compression, and this relationship allows decoding times to be predicted with reasonable confidence. For example, Liu et al. [6] established a linear relationship between frame size and decoding time and used it to predict decoding times, while Yang and Song [11] improved the accuracy of this approach by introducing a logarithmic relationship, and Bavier et al. [21] used it to predict decoding times. Lee et al. [5] introduced particle-filter techniques to further improve the accuracy of this approach for H.264 codecs.
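The linear relationship between frame size and decoding time can be fitted by ordinary least squares, as in the sketch below. The sample sizes and times are made-up numbers, and the helper `fit_line` is a hypothetical name; this illustrates the general idea and is not the procedure used in the cited work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data: frame sizes (KB) and measured decoding times (ms).
sizes_kb = [10, 20, 30, 40]
times_ms = [12.0, 20.0, 28.0, 36.0]

slope, intercept = fit_line(sizes_kb, times_ms)
predicted_ms = slope * 25 + intercept  # estimate for an unseen 25 KB frame
```

A logarithmic variant would fit the same line against the logarithm of the frame size instead of the raw size.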

Yuan et al. [8–10] proposed several DVFS techniques in which the CPU speed is adjusted on the basis of a statistical analysis of past workloads. Urunuela et al. [7] developed a history-based DVFS technique, but it is tailored to video kiosks rather than general video players. Choi et al. [22] adopted a hybrid approach in which different DVFS policies are applied depending on the characteristics of each frame. Im and Ha [12] presented DVFS techniques in which buffers were used to reclaim unused CPU time, and Huang et al. [20] introduced a method of predicting decoding times from offline analysis of frame characteristics.

Most of these techniques do not consider the characteristics of video playback, in which some deadline misses and frame skipping are acceptable. Kim et al. [23] presented a DVFS scheme specifically for scalable video coding (SVC) codecs, which makes use of temporal scalability. The scheme put forward by Yang and Song [11] acknowledges the effect of the deadline miss ratio on energy consumption, but that work does not provide a satisfactory method of selecting the appropriate frequency while minimizing energy consumption, nor does it examine how buffering affects power consumption.

3. Model

3.1. System Model

To support the periodic nature of video playback, a video player decodes $r$ frames per second, and so the decoding period of a frame, $T$, is $1/r$. The Notations section explains the important symbols used in this paper. Suppose that a CPU supports $N_f$ frequency levels and that level $j$ corresponds to the frequency $f_j$ ($j = 1, \ldots, N_f$). If $j < k$, then $f_j < f_k$, so that $f_{N_f}$ is the highest possible frequency. Let $N$ be the number of frames decoded in a video. Let $P^{\mathrm{act}}_j$ and $P^{\mathrm{idle}}_j$, respectively, be the active and idle power consumption of the system at frequency level $j$.

We will assume that the decoding time of each frame is known in advance: decoding times can be predicted by an offline analysis of the bitstream of a video [20] or by formulating a relationship between frame size and decoding time [5, 6, 11]. This decoding time information can be inserted into the header of a video [20], and we assume that these times are available to our frequency selection algorithm. Specifically, $d_{i,j}$ is the decoding time of frame $i$ at frequency level $j$.

3.2. Video Playback Model

Frame-level DVFS is appropriate for a media player [5–7, 11], which then selects the frequency which best matches the CPU workload imposed by the current frame, before that frame is decoded. The CPU does not change its frequency until the frame has been decoded.

Figure 1 shows our video playback model. The decoding task produces frames and passes them to a buffer, which stores them for consumption by a display task that fetches frames at the playback rate. If there were no buffer, only one frame could be handled by the display task, so the decoder would enter the sleep state until that frame had been consumed by the display task. However, if a number of frames can be stored in the buffer, then decoding can run late, allowing lower frequencies to be selected, but this flexibility is limited by the size of the buffer. For example, suppose that the buffer can accommodate $B$ frames. If there are already $B$ frames in the buffer, then decoding of a new frame must be delayed until the next decoded frame has gone to the display task. For example, consider Figure 1, where $B = 4$. If the buffer already contains 4 frames, then the decoder enters the sleep state until the next frame has been consumed by the display task.

To explain how this buffering technique can decrease CPU power consumption, consider a CPU with 4 frequency levels of 0.8 GHz, 1.2 GHz, 1.6 GHz, and 1.8 GHz. We assume that the decoding period is 40 ms and that decoding a frame requires 36 ms at level 4, 40.5 ms at level 3, 54 ms at level 2, and 81 ms at level 1. If there were no buffer, then frequency level 4 would have to be chosen for every frame to keep the decoding time within 40 ms, as shown in Figure 2(a). However, if there is a buffer that already contains decoded frames when playback starts, then frequency level 1 can be selected for the first three frames, level 2 for the next two frames, and level 3 for the final frame, as shown in Figure 2(b), without violating deadlines.
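The arithmetic behind this example can be checked with a short script. The timing convention used here is an assumption, since Figure 2 is not reproduced in this text: the frames are decoded back to back starting at time 0, the pre-decoded frames are displayed first at one frame per period, and the k-th newly decoded frame is therefore due at (prebuffered + k) × 40 ms. Buffer capacity is ignored in this check.

```python
PERIOD_MS = 40.0
# Decoding time (ms) of one frame at frequency levels 1..4, as given in the text.
DECODE_MS = {1: 81.0, 2: 54.0, 3: 40.5, 4: 36.0}


def meets_deadlines(levels, prebuffered):
    """True if back-to-back decoding at the given levels misses no deadline.

    Assumption: the k-th frame (1-based) is due at (prebuffered + k) * PERIOD_MS.
    """
    finish = 0.0
    for k, level in enumerate(levels, start=1):
        finish += DECODE_MS[level]
        if finish > (prebuffered + k) * PERIOD_MS:
            return False
    return True


schedule = [1, 1, 1, 2, 2, 3]  # levels from the Figure 2(b) description
# Smallest number of pre-decoded frames for which this schedule is feasible.
smallest = next(n for n in range(10) if meets_deadlines(schedule, n))
all_level_3_unbuffered = meets_deadlines([3] * 6, 0)
all_level_4_unbuffered = meets_deadlines([4] * 6, 0)
```

Under this convention the third frame is the binding one: it finishes at 243 ms, so the slow schedule only becomes feasible once enough frames have been decoded ahead of playback.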

4. Problem Formulation

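We formulate an optimization problem whose solution minimizes energy consumption subject to the constraints of buffer size and decoding deadlines. Frame $i$ must be decoded before its deadline $D_i$, which is $iT$, where $T$ is the decoding period. At $D_i$, frame $i$ leaves the buffer to be displayed on the screen. Let $F_i$ be the frequency level selected for decoding frame $i$, and let $S_i$ be the earliest possible time at which the decoding of frame $i$ can start. Because the buffer can contain $B$ frames, the decoding of frame $i$ can start at $D_{i-B}$ (i.e., $(i-B)T$), at which frame $i-B$ can be removed from the buffer and displayed, allowing a new frame to be decoded and stored in the buffer. Thus, $S_i$ can be expressed as follows:

$$S_i = \begin{cases} 0 & \text{if } i \le B, \\ D_{i-B} & \text{otherwise.} \end{cases} \qquad (1)$$

Let $O_{i,j}$ be the length of time by which the decoding of frame $i$ would overrun if frequency level $j$ is chosen, relative to the start time of the next frame, $S_{i+1}$, and let $O_{i-1}$ denote the overrun carried over from frame $i-1$. The value of $O_0$ is initialized to 0. With $d_{i,j}$ denoting the decoding time of frame $i$ at frequency level $j$, this overrun can be expressed as follows:

$$O_{i,j} = \max\left(0,\; S_i + O_{i-1} + d_{i,j} - S_{i+1}\right). \qquad (2)$$

If the decoding of frame $i$ indeed finishes after $S_{i+1}$, then $O_{i,j}$ is the time difference between the actual time at which the decoding of frame $i$ finishes (i.e., $S_i + O_{i-1} + d_{i,j}$) and $S_{i+1}$, when frequency level $j$ is chosen for frame $i$. Conversely, if time remains after the decoding of frame $i$, the CPU enters its idle state and remains in this state until $S_{i+1}$, when $O_{i,j}$ is set to 0.

If $S_i + O_{i-1} + d_{i,j} < S_{i+1}$, then the CPU stays in its idle state for the length of $I_{i,j}$, which can be expressed as $S_{i+1} - (S_i + O_{i-1} + d_{i,j})$; otherwise, $I_{i,j}$ is set to 0. The determination of $I_{i,j}$ can thus be summarized as follows:

$$I_{i,j} = \begin{cases} S_{i+1} - (S_i + O_{i-1} + d_{i,j}) & \text{if } S_i + O_{i-1} + d_{i,j} < S_{i+1}, \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$

The energy consumed from the start of decoding frame $i$ until the start of decoding frame $i+1$, while frame $i$ is decoded at frequency level $j$, is written as $E_{i,j}$. With $P^{\mathrm{act}}_j$ and $P^{\mathrm{idle}}_j$ denoting the active and idle power at level $j$, it can be expressed as follows:

$$E_{i,j} = P^{\mathrm{act}}_j \, d_{i,j} + P^{\mathrm{idle}}_j \, I_{i,j}. \qquad (4)$$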
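The quantities described in this section (earliest start time, overrun, idle time, and per-frame energy) can be computed as in the following sketch. Because the equations are not legible in this text, the formulas below are reconstructed from the prose under two assumptions: the deadline of frame i (1-based) is i × period, and frame i may start once frame i − B has left the buffer. All function and parameter names are hypothetical.

```python
def earliest_start(i, buffer_size, period):
    """Earliest start of frame i (1-based): when frame i - buffer_size is displayed."""
    return max(0, (i - buffer_size) * period)


def decode_step(i, prev_overrun, decode_time, p_active, p_idle, buffer_size, period):
    """Return (overrun, idle, energy) for decoding frame i at one frequency level.

    prev_overrun is the overrun left by frame i - 1 (0 for the first frame).
    Power in watts and time in ms give energy in millijoules.
    """
    finish = earliest_start(i, buffer_size, period) + prev_overrun + decode_time
    next_start = earliest_start(i + 1, buffer_size, period)
    overrun = max(0, finish - next_start)  # decoding runs into the next slot
    idle = max(0, next_start - finish)     # CPU waits for the next slot
    energy = p_active * decode_time + p_idle * idle
    return overrun, idle, energy


# Hypothetical powers: 1.0 W active, 0.2 W idle; 40 ms period.
unbuffered = decode_step(1, 0, 36, 1.0, 0.2, buffer_size=1, period=40)
buffered = decode_step(1, 0, 81, 1.0, 0.2, buffer_size=4, period=40)
```

With a buffer of one frame, the fast decode leaves 4 ms of idle time; with a buffer of four frames, the slow decode produces overrun but no idle time, because the next frame may start immediately.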

At this point, we must introduce a further variable , which is the difference between the actual time at which the decoding of frame finishes () and , and this difference can be expressed as follows:Figure 3 shows the relationship between , , and in a short sequence of frames.

The decoding of frame must start after and finish before . We can express this period for each frame    as , so that . Each frame must be decoded before its deadline , so . Our frequency selection policy has to minimize the total energy consumption . We can now formulate this frequency selection problem that determines    as follows:

5. Frequency Allocation Algorithm

5.1. Algorithm Concept

We now propose an algorithm to solve the problem using a dynamic programming technique. We will use a resolution of 1 ms for time values such as . Let be the minimum amount of energy when is milliseconds and frame is decoded ( and ). Let be the frequency level required to achieve the energy consumption of ; further, let be the corresponding value of and the value of .

The main idea of the dynamic programming is to construct a table of the optimal energy for each frame when (), as described in Table 1, where the minimum value in the final row, , represents the amount of energy consumed by the optimal frequency allocation. For this purpose, we first initialize the values of and then develop the recurrence relationship between consecutive frames so as to find all the values of in the table.

We also maintain a two-dimensional array of tuples () which leads to the minimum energy of as illustrated in Figure 4. Using this array, a backtracking phase starts from frame to frame 1 to select frequency of every frame. For example, Figure 4 shows an array of these tuples when and . Suppose that the third column in the last row has the minimum energy value. Because points to the column index of the previous frame, a sequence of frequencies can be selected as follows: . Likewise, our dynamic programming algorithm has three phases: initialization, establishment of recurrence relation, and backtracking.

5.2. Initialization

For initialization, consider the following:(1), , , and are all initialized to (, and ).(2), where , the values of , , and are calculated from (2), (3), and (4), respectively. Next, is replaced with ; then, is updated to , and is replaced with frequency level .

5.3. Establishment of Recurrence Relation

During the recurrence establishment phase, , , , and and are updated as follows:(1)For each value of and and , we maintain a two-dimensional array, ( and ). The following steps are repeated to find the value of , if :(a)Calculate the value of using (2), after replacing with .(b)Using the resulting value of , calculate from (3).(c)Using the resulting value of , calculate from (4) and use this value of to update .(2), , , and are updated as follows:

5.4. Backtracking

We find values of using a backtracking technique as follows:(1) is initialized to , and is set to , so that represents the amount of minimum energy consumption.(2)While , the following procedures are repeated: is set to , is substituted for , and is decremented by .Pseudocode for this frequency selection algorithm (FSA) is presented as Algorithm 1. If is the maximum round length so that , we can easily see from Algorithm 1 that the complexity of FSA is .

()Temporary variables: , , and ;
()for   to   do
()for   to   do
()  , , and ;
()end for
()end for
()for   to   do
()for   to   do
()  if   and   then
()    Calculate , and using (2), (3), and (4), respectively;
()    ;
()    ;
()    ;
()   end if
()  end for
() end for
() for   to   do
()  for   to   do
()   for   to   do
()    if    then
()     Calculate the value of from (2) by replacing with ;
()      is calculated from (3) by replacing with ;
()      is calculated from (4), and is updated using this value of ;
()    end if
()   end for
()  end for
()  ;
()  ;
()  ;
()  ;
() end for
() ;
() ;
() while    do
()  ;
()  ;
()  ;
() end while
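A compact dynamic programming sketch of the same idea is given below. It is not a transcription of Algorithm 1, whose symbols are not legible in this text: here the table is indexed by the overrun carried into the next frame, which is one possible choice of state, and the deadline of frame k is assumed to be k × period. The function name `select_frequencies` and the sample data are hypothetical.

```python
def select_frequencies(decode_ms, p_active, p_idle, buffer_size, period):
    """Return (minimum energy, chosen levels) for a whole video.

    decode_ms[i][j] is the decoding time (ms) of frame i at level j (0-based).
    """
    def start(k):  # earliest start of 1-based frame k
        return max(0, (k - buffer_size) * period)

    layers = []                      # one table row per frame
    prev = {0: (0.0, None, None)}    # overrun -> (energy, level, previous overrun)
    for i, times in enumerate(decode_ms):
        k, cur = i + 1, {}
        for o_prev, (e_prev, _, _) in prev.items():
            for j, d in enumerate(times):
                finish = start(k) + o_prev + d
                if finish > k * period:          # assumed deadline of frame k
                    continue
                nxt = start(k + 1)
                overrun = max(0, finish - nxt)
                idle = max(0, nxt - finish)
                energy = e_prev + p_active[j] * d + p_idle[j] * idle
                if overrun not in cur or energy < cur[overrun][0]:
                    cur[overrun] = (energy, j, o_prev)
        if not cur:
            raise ValueError(f"frame {k} cannot meet its deadline")
        layers.append(cur)
        prev = cur

    # Backtracking: start from the cheapest final state and follow the pointers.
    overrun = min(prev, key=lambda o: prev[o][0])
    total, levels = prev[overrun][0], []
    for cur in reversed(layers):
        _, j, overrun = cur[overrun]
        levels.append(j)
    return total, levels[::-1]


# Hypothetical data: two levels taking 50 ms and 30 ms per frame.
frames = [[50, 30]] * 3
with_buffer = select_frequencies(frames, [1.0, 2.0], [0.5, 0.5], 2, 40)
no_buffer = select_frequencies(frames, [1.0, 2.0], [0.5, 0.5], 1, 40)
```

In this small instance the buffered run can afford the slow level for one of the three frames, whereas the unbuffered run must use the fast level throughout and also pays for idle time. The number of states per frame is bounded by the largest possible overrun in milliseconds, which is what keeps the table size manageable.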

6. Algorithm Execution

If the frame decoding time is known in advance, then FSA can run without modification. For example, before playback, frequency allocation table during the entire playback can be obtained as a result of algorithm execution. However, since the algorithm complexity depends on the number of frames to be decoded, we divide the algorithm into iterations and limit the number of frames taken by the algorithm to   . Therefore, at the beginning of the th iteration, the algorithm chooses the frequency for frames between and , which we call FSA-split, as shown in Algorithm 2.

()Temporary variables: , , and ;
()Input parameter from the previous iteration ():
();
()for   to   do
()for   to   do
()  , , and ;
()end for
()end for
()for   to   do
()  for   to   do
()   if   and   then
()   if    then
()    Calculate , and using (2), (3), and (4), respectively, by replacing
     with ;
()   else
()    Calculate , and using (2), (3), and (4), respectively, by replacing with 0;
()   end if
()   ;
()   ;
()   ;
()   end if
()  end for
() end for
() for   to   do
()  for   to   do
()   for   to   do
()    if  then
()     Calculate the value of from (2) by replacing with ;
()      is calculated from (3) by replacing with ;
()      is calculated from (4), and is updated using this value of ;
()    end if
()   end for
()  end for
()  ;
()  ;
()  ;
()  ;
() end for
() ;
() ;
() while    do
()  ;
()  ;
()  ;
() end while

FSA-split has the following characteristics in comparison with FSA:(i)FSA-split determines the frequencies of frames between and .(ii)An initialization part (lines between and in Algorithm 2) takes the length of overrun in the previous iteration () for the calculation of the parameter values.
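The windowed variant can be sketched by running the dynamic program on consecutive groups of frames and passing the final overrun of one group into the next. As with the previous sketch, this is an illustration under assumptions (state indexed by overrun, deadline of frame k equal to k × period) and not a transcription of Algorithm 2; `solve_window` and `fsa_split` are hypothetical names.

```python
def solve_window(decode_ms, first, p_active, p_idle, buffer_size, period, overrun_in):
    """Exact DP over one window; first is the 1-based index of its first frame."""
    def start(k):
        return max(0, (k - buffer_size) * period)

    layers, prev = [], {overrun_in: (0.0, None, None)}
    for offset, times in enumerate(decode_ms):
        k, cur = first + offset, {}
        for o_prev, (e_prev, _, _) in prev.items():
            for j, d in enumerate(times):
                finish = start(k) + o_prev + d
                if finish > k * period:  # assumed deadline of frame k
                    continue
                nxt = start(k + 1)
                energy = e_prev + p_active[j] * d + p_idle[j] * max(0, nxt - finish)
                overrun = max(0, finish - nxt)
                if overrun not in cur or energy < cur[overrun][0]:
                    cur[overrun] = (energy, j, o_prev)
        if not cur:
            raise ValueError(f"frame {k} cannot meet its deadline")
        layers.append(cur)
        prev = cur
    overrun_out = min(prev, key=lambda o: prev[o][0])
    energy, levels, o = prev[overrun_out][0], [], overrun_out
    for cur in reversed(layers):  # backtrack within the window
        _, j, o = cur[o]
        levels.append(j)
    return energy, levels[::-1], overrun_out


def fsa_split(decode_ms, window, p_active, p_idle, buffer_size, period):
    """Optimize window frames at a time, carrying the overrun across iterations."""
    total, levels, overrun = 0.0, [], 0
    for lo in range(0, len(decode_ms), window):
        energy, chosen, overrun = solve_window(
            decode_ms[lo:lo + window], lo + 1,
            p_active, p_idle, buffer_size, period, overrun)
        total += energy
        levels += chosen
    return total, levels


frames = [[50, 30]] * 3  # hypothetical decoding times (ms) at two levels
per_frame = fsa_split(frames, 1, [1.0, 2.0], [0.5, 0.5], 2, 40)
whole = fsa_split(frames, 3, [1.0, 2.0], [0.5, 0.5], 2, 40)
```

One limitation of this sketch is that each window is optimized greedily: a window may end with so much overrun that the next window has no feasible level, in which case `ValueError` is raised. A more careful version would keep several candidate end states per window.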

Several methods have been developed for decoding time estimation, most of which predict future decoding times based on recently measured times [5–7, 11, 19]. The decoding times of the frames in a given GOP do not differ greatly from those of its neighboring GOPs [11]. We can therefore predict the decoding times of the next GOP on the basis of those of the current GOP. For example, if the number of frames handled in each iteration of FSA-split is set to the number of frames in a GOP, then the frequency allocation table can be established for the next GOP by passing the predicted decoding times of the next GOP to FSA-split as input parameters.

7. Experimental Results

7.1. Setup

We performed simulations to evaluate our schemes using power data and timings obtained experimentally. The power consumption of a Samsung Nexus S smartphone (not just the CPU) was measured, and Table 2 shows its active and idle power values. The time taken to decode video frames was also measured for the two videos in Table 3. We compared our scheme with two other algorithms as follows:
(1) HF always selects the highest frequency, which is equivalent to no DVFS.
(2) LF selects the lowest frequency level which will get each frame decoded in time. This method is a good heuristic, because CPU frequency can be expected to have a monotonic relationship with energy consumption [4–7, 11].
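The two baselines can be sketched as follows, assuming no buffering (each frame is decoded within its own period), frequency levels listed in ascending order, and the same per-frame energy accounting of active time plus idle time. The function names and the sample numbers are hypothetical and do not correspond to the measured values in Tables 2 and 3.

```python
def hf_levels(decode_ms):
    """HF: always the highest level (last index) for every frame."""
    return [len(times) - 1 for times in decode_ms]


def lf_levels(decode_ms, period):
    """LF: lowest level whose decoding time fits in one period.

    Falls back to the highest level if no level meets the deadline.
    """
    chosen = []
    for times in decode_ms:
        fitting = [j for j, d in enumerate(times) if d <= period]
        chosen.append(fitting[0] if fitting else len(times) - 1)
    return chosen


def total_energy(decode_ms, levels, p_active, p_idle, period):
    """Active energy plus idle energy for the remainder of each period."""
    total = 0.0
    for times, j in zip(decode_ms, levels):
        d = times[j]
        total += p_active[j] * d + p_idle[j] * max(0, period - d)
    return total


# Hypothetical decoding times (ms) at two levels, and powers in watts.
frames = [[50, 30], [35, 20]]
P_ACT, P_IDLE, PERIOD = [1.0, 2.0], [0.5, 0.5], 40
hf = hf_levels(frames)
lf = lf_levels(frames, PERIOD)
hf_energy = total_energy(frames, hf, P_ACT, P_IDLE, PERIOD)
lf_energy = total_energy(frames, lf, P_ACT, P_IDLE, PERIOD)
```

LF saves energy only on frames that are light enough to fit in one period at a lower level; it cannot borrow time from neighboring frames, which is the slack that a buffer-aware scheme exploits.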

7.2. Efficacy of FSA

Table 4 shows how energy consumption depends on the number of frames that are buffered. We see that FSA always shows the best performance, using 13% less energy than LF and 27% less energy than HF on average. Increasing the size of the buffer saves more energy, but the additional saving gradually tails off. In particular, even when only one additional buffer is used, FSA uses 11% less energy than LF on average, suggesting that the buffer overhead of FSA is not high.

The results in Table 4 can be attributed to FSA’s effective use of the slack times generated by storing decoded frames in the buffer, allowing the CPU to operate at lower frequencies. For example, Table 5 shows the average percentage of the frames in both video clips that are decoded at each frequency; FSA chooses lower frequencies than LF, decreasing energy consumption. FSA also chooses the highest frequency (1000 MHz) more often than LF; this increases the idle time, which in turn allows lower frequencies to be chosen for other frames than under LF. These results suggest that frequency selection has a great effect on energy consumption.

7.3. Efficacy of FSA-Split

To evaluate the efficacy of FSA-split, we examined how the number of frames handled in each iteration affects energy consumption for different buffer sizes, as tabulated in Table 6. We see that the energy difference between FSA and FSA-split is marginal, at most 1.47%, even when the number of frames per iteration is set to 12, which is the GOP size. Increasing the number of frames per iteration decreases the energy gap, and increasing the buffer size increases the energy gap, although the difference remains negligible. Although FSA exhibits slightly better performance than FSA-split, it takes the parameters of all frames as input, requiring a lot of computation. These results suggest that FSA-split is a practical method of reducing the energy required for video decoding.

8. Conclusions

We have proposed a new frequency allocation scheme which minimizes energy consumption while avoiding buffer overrun, using a dynamic programming technique. This scheme establishes a recurrence relationship between consecutive frames to construct a table of the minimum energy values required to decode each frame and determines a sequence of frequencies required to decode every frame using a backtracking technique. It was extended to optimize CPU frequencies over a short sequence of frames, which gives a basis for energy-saving video decoding in practice.

Experimental results show that it uses 27% less energy on average than a processor running at the highest frequency. In particular, it uses 13% less energy than the widely used heuristic which chooses the lowest frequency that gets each frame decoded in time. We believe that these results give a useful guideline for low-power video service by providing a lower bound on the power consumption required for video playback.

Notations

$r$: Playback rate of a video (fps)
$N_f$: Number of frequency levels supported by a CPU
$T$: Decoding period of a video
$f_j$: Frequency corresponding to frequency level $j$
$P^{\mathrm{act}}_j$: Active power at frequency level $j$
$P^{\mathrm{idle}}_j$: Idle power at frequency level $j$
$B$: Number of frames that a display buffer can accommodate
$N$: Number of frames decoded in a video
$d_{i,j}$: Decoding time of frame $i$ at frequency level $j$
$D_i$: Decoding deadline for frame $i$
$S_i$: Earliest possible start time for decoding frame $i$
:
:
:CPU sleep time when frequency level is chosen for frame
:Energy consumed during the decoding period for frame at frequency
:Two-dimensional array for when , , and is the frequency level chosen for decoding frame
:Time between completion of decoding frame and
:Frequency level selected for decoding frame
:Minimum energy consumption in decoding frames 1 to , when  ms
:Frequency level selected for decoding frame to achieve an energy of
:Value of to achieve an energy of
:Value of to achieve an energy of
:Number of frames for which frequencies are determined by FSA-split.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This research is supported by Inha University Research Grant.