Abstract

Recently, HTTP streaming has become very popular for delivering videos over the Internet. For adaptivity, a provider should generate multiple versions of a video as well as the related metadata. Various adaptation methods have been proposed to support a streaming client in coping with strong bandwidth variations. However, most existing methods target constant bitrate (CBR) videos only. In this paper, we present a new method for quality adaptation in on-demand streaming of variable bitrate (VBR) videos. To cope with strong variations of VBR bitrate, we use a local average bitrate as the representative bitrate of a version. A buffer-based algorithm is then proposed to conservatively adapt video quality. Through experiments in the mobile streaming context, we show that our method can provide quality stability as well as buffer stability even under very strong variations of bandwidth and video bitrates.

1. Introduction

Recently, video streaming has rapidly gained popularity over the mobile Internet. It is predicted that global video traffic will reach 80 percent of total consumer Internet traffic in 2019 [1]. In addition, HTTP has become a cost-effective delivery solution thanks to the abundance of Web platforms and broadband connections [2, 3]. Furthermore, for interoperability of HTTP streaming in the industry, ISO/IEC MPEG has developed “Dynamic Adaptive Streaming over HTTP” (DASH) [4] as the first standard for video streaming over HTTP.

Due to the heterogeneity of today's communication networks, adaptivity is a principal requirement for any streaming client. In DASH, multiple versions of an original video as well as related metadata (e.g., describing bitrates and resolutions) are generated and stored at servers [4, 5]. Based on the metadata as well as the status of the terminal and networks, a client can adaptively decide which data parts should be downloaded and when. Currently, the question of video adaptation for HTTP streaming is still an open issue [6].

Various adaptation methods for HTTP streaming have been proposed over the past few years [7–17]. These methods can be roughly classified into two groups, throughput-based and buffer-based; each has its own strengths and weaknesses [18]. Throughput-based methods decide the version based on the estimated throughput only, while buffer-based methods mainly use buffer characteristics as references for making decisions. Throughput-based methods are usually able to react quickly to throughput variations; however, the streaming quality may be unstable [7]. Meanwhile, buffer-based methods try to maintain a smooth video stream, but may cause sudden changes in video quality when the buffer level drastically drops [8–11]. It should be noted that as the data size of each segment varies according to the requested version, the buffer size and buffer level should be measured in seconds of media.

So far, existing adaptation methods have mostly focused on CBR (constant bitrate) videos. The research on HTTP streaming for VBR (variable bitrate) videos is still limited. The problem with VBR videos is that even though the throughput is stable, strong fluctuations of video bitrate may result in buffer underflows [12]. Our previous work in [12] is the first study on HTTP streaming that supports VBR videos by estimating both the instant bitrate and the instant throughput. In the context of managed IPTV networks, where the bandwidth is allocated in advance, the delay-quality tradeoff of a VBR video is optimally achieved by replacing some high-bitrate segments with low-bitrate segments [13]. In [9], a buffer-based adaptation method is proposed for VBR video streaming by using a partial-linear buffer prediction model along with a strategy to select versions in different buffer ranges. However, this method does not consider bitrate values, and so it cannot avoid sudden changes of quality when video bitrate and throughput are drastically varying.
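The underflow risk described above can be illustrated with a simple buffer model. The following is a minimal sketch, not part of the original method: it assumes that the buffer is measured in seconds of media, that the download time of a segment equals its size divided by a constant throughput, and that playback drains the buffer while a segment is being downloaded. The function name and the numbers are hypothetical, and rebuffering stalls are ignored.

```python
def simulate_buffer(segment_bitrates_kbps, throughput_kbps,
                    segment_duration_s=2.0, start_buffer_s=10.0):
    """Return the buffer level (seconds of media) after each segment download."""
    levels = []
    buffer_s = start_buffer_s
    for bitrate in segment_bitrates_kbps:
        # Time needed to fetch one segment at the given (constant) throughput.
        download_time_s = bitrate * segment_duration_s / throughput_kbps
        # Playback drains the buffer during the download (floored at zero).
        buffer_s = max(buffer_s - download_time_s, 0.0)
        # The completed segment adds its media duration to the buffer.
        buffer_s += segment_duration_s
        levels.append(buffer_s)
    return levels


# Two hypothetical segment sequences with the same average bitrate (1000 kbps).
cbr = simulate_buffer([1000] * 5, throughput_kbps=1000)
vbr = simulate_buffer([500, 500, 3000, 500, 500], throughput_kbps=1000)
print(cbr)
print(vbr)
```

With a stable throughput, the constant-bitrate sequence keeps the buffer level flat, whereas a single high-bitrate segment in the variable-bitrate sequence removes several seconds of buffered media at once.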

In this paper, we present a novel adaptation method which can effectively support VBR videos. By extending our preliminary work in [14], the proposed method can cope with variations of throughput as well as video bitrate. Our method especially takes into account the moving average of bandwidth and video bitrate to provide stable streaming quality without sudden changes. The experimental results in the mobile streaming context show that our approach can provide consistent VBR video streaming with smooth video quality and stable buffer level. To the best of our knowledge, this is the first method that can provide smooth version transitions for VBR video streaming over HTTP. It should be noted that our method does not require the exact values of video segment bitrates to make decisions.

The organization of this paper is as follows. Section 2 presents an overview of HTTP streaming and the related work. The principles of our method as well as the algorithm description are presented in detail in Section 3. Section 4 provides our experimental results and discussions. Finally, conclusion and direction for future work are given in Section 5.

The general architecture of an HTTP streaming system consists of servers, delivery networks, and clients [3, 4]. In MPEG DASH terminology, to support adaptivity, a video is encoded in multiple versions (also called alternatives or representations), each of which is further divided into short segments. Video segments together with metadata are hosted at a server and will be requested by the client. In most cases, for each request from the client, the server will send one segment. Therefore, a video will be delivered by a sequence of HTTP request-response transactions. The version (low or high) of a requested segment is decided based on the metadata and status of terminals/networks. Figure 1 depicts an illustration of media delivery in DASH. More information about the HTTP streaming structure as well as DASH concepts could be found in [2, 4].

In general, an adaptation method needs to answer two key questions: should the current version be maintained? and if not, which version should be switched to? As already mentioned, existing methods can be divided into a throughput-based group and a buffer-based group. Throughput-based methods are different in the ways they estimate or use the throughput. In terms of throughput estimation, the simplest way is to use the measured throughput right after having fully received a segment (called instant throughput) as the throughput estimate of the next segment. Another approach is to use a smoothed throughput measure [3, 10] to avoid short-term fluctuations, which are a drawback of using instant throughput. However, this may cause late reaction of the client to large throughput drops. In [3] we propose a throughput estimation method that has the advantages of both instant throughput and smoothed throughput. Furthermore, other studies also present ways to obtain the estimated throughput based on sampled throughput values and RTT [12], probing or stored data (lookup table) [15]. Once the client obtains the estimated throughput, the version can be decided in many ways. A simple solution is using a safety margin to compute an appropriate version for the next segment [3]. In [7], the version is controlled by a TCP-like mechanism where a measure proportional to the instant throughput is used as the key input.
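As an illustration of the safety-margin approach mentioned above, the following minimal sketch selects the highest version whose bitrate fits within a reduced share of the throughput estimate. The function name, the margin value, and the bitrate ladder are assumptions made for this example; they are not taken from [3].

```python
def select_version_by_throughput(version_bitrates_kbps, throughput_estimate_kbps,
                                 safety_margin=0.1):
    """Pick the highest version (1-based index) that fits the usable throughput.

    version_bitrates_kbps must be sorted from the lowest to the highest version.
    The lowest version is returned when no version fits.
    """
    usable_kbps = (1.0 - safety_margin) * throughput_estimate_kbps
    chosen = 1
    for index, bitrate in enumerate(version_bitrates_kbps, start=1):
        if bitrate <= usable_kbps:
            chosen = index
    return chosen


ladder = [300, 700, 1500, 2500]  # hypothetical version bitrates in kbps
print(select_version_by_throughput(ladder, 2000))
print(select_version_by_throughput(ladder, 200))
```

Because the decision depends on the throughput estimate alone, the selected version follows every change of that estimate, which is the source of the quality instability noted for throughput-based methods.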

Buffer-based methods, which mainly use buffer characteristics to decide the video versions, may take into account a throughput estimate as well. A popular strategy of these methods is dividing the buffer into multiple ranges by means of a set of buffer thresholds [8, 10, 11]. When the buffer level stays in different ranges, different actions are applied. For instance, the methods of [8, 10] try to maintain the current version in a specific buffer range and dynamically switch the quality up or down in other buffer ranges. Meanwhile, the method of [11] chooses video versions following the variations of instant throughput (also with the use of upscaling and downscaling factors) based on different buffer ranges. In [16] we introduce a trellis-based method that represents all possible changes of the versions and corresponding buffer levels in the near future. Thus, this approach can make good decisions on the bitrates of some future segments. In [17], the buffer level deviation and instant throughput are employed as inputs of a proportional-integral controller for adaptation.

Yet, most of the existing methods have been developed only for the context of CBR videos, where the bitrate of a version is constant. Compared to CBR videos, videos encoded in VBR mode have important advantages in terms of quality and network resource usage [20]. However, the variations of video bitrate over time, together with throughput fluctuations, result in a big challenge for HTTP adaptive streaming [12]. Two examples of how bitrates of different versions vary, especially in some scene changes, are illustrated in Figure 2. Detailed information of these videos will be described in Section 4.

Our previous work in [12] is the first study on VBR video streaming over HTTP. Besides throughput estimation, this method also considers the estimation of the instant video bitrate, which can be divided into (1) intrastream estimation and (2) interstream estimation. The former estimates the bitrates of segments within a version, while the latter estimates the bitrates of segments across different versions. This method can provide a very stable buffer and, moreover, can support a CBR-like streaming service from VBR videos. In [9], a buffer-based adaptation method for VBR videos is proposed, where the buffer is divided into multiple ranges. To avoid dealing with varying video bitrates directly, this method uses a partial-linear trend prediction of the buffer level for choosing versions in different buffer ranges. If no significant change of buffer level is estimated, the client will maintain the current version for the next segment. However, this method still causes sudden version changes if the actual buffer level declines drastically.

In this paper, we propose a novel adaptation method that can effectively support VBR videos in on-demand streaming. The distinguishing features of our method include the following:
(i) To cope with strong variations of video bitrates, we propose using a local average bitrate as the (moving) representative bitrate of a version. Since this representative bitrate is stable in the short term, it helps the client clearly differentiate between the available versions and make a good version selection at each time instance.
(ii) Because the client does not have enough information about the segment bitrates of all versions, we provide an algorithm to obtain an estimated representative bitrate of each version.
(iii) Regarding the first key question, quality stability is effectively maintained by (1) using a smoothed throughput estimate when the buffer is not in danger, (2) using the representative bitrate, and (3) being conservative in quality switching.
(iv) Regarding the second key question, smooth transitions of quality are supported by avoiding jumping simply to the lowest version in the panic case and by switching down early when there is a throughput-bitrate mismatch.

3. The Proposed Adaptation Method

In this section, we present a new buffer-based quality adaptation method for VBR video streaming. Some notations along with their definitions used in the paper are provided in Notations.

3.1. Handling Throughput and Bitrate Fluctuations

Generally, based on the measured throughput and the video bitrate, the client should choose an appropriate version for the next segment. Suppose that after receiving the current (or last) segment $i$ of version $v_i$, the client measures the bitrate $R_{i,v_i}$ and throughput $T_i$ of this segment. Now the client will decide the version $v_{i+1}$ for the next segment $i+1$. Our proposed method will also leverage the client buffer to cope with the fluctuations of both the throughput and the video bitrate. Depending on the current buffer level $B_{cur}$, the client will decide whether the version should be increased, decreased, or maintained.

For the version selection of the next segment, it is necessary to estimate the throughput based on the throughput history of received segments. To avoid the effects of short-term fluctuations of instant throughput, we use a smoothed throughput measure [3, 10] as the throughput estimate $T^{e}_{i+1}$ for the next segment $i+1$:

$$T^{e}_{i+1} = (1 - \delta)\, T^{e}_{i} + \delta\, T_{i}, \qquad (1)$$

where $T_i$ is the measured throughput of segment $i$ and $\delta$ is a weighting value, which is set to 0.1 in this paper.
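A minimal sketch of such a smoothed throughput estimate is given below. It assumes the common exponentially weighted form in which the weighting value (0.1) is applied to the newest throughput sample and the remaining weight to the previous estimate; the function name, the initialization with the first sample, and the sample values are hypothetical.

```python
def smoothed_throughput(previous_estimate_kbps, measured_kbps, delta=0.1):
    """Blend the newest throughput sample into the running estimate."""
    return (1.0 - delta) * previous_estimate_kbps + delta * measured_kbps


# Hypothetical throughput samples (kbps) with a sudden drop.
samples = [2000, 2000, 500, 500]
estimate = samples[0]  # assumption: start from the first measurement
history = []
for sample in samples[1:]:
    estimate = smoothed_throughput(estimate, sample)
    history.append(estimate)
print(history)
```

With a small weighting value, the estimate reacts slowly to the drop, which suppresses short-term fluctuations but delays the reaction to large throughput changes.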

As video bitrate is highly fluctuating, we propose using a representative bitrate for each version, which can differentiate the available versions and is also appropriate for maintaining quality stability. Denote by $R^{rep}_{i,v}$ the representative bitrate for version $v$ at segment index $i$. In our method, $R^{rep}_{i,v}$ is calculated as the average bitrate of the $N$ most recent segments of version $v$.

The problem is that the client only knows the bitrates of the received segments, which may belong to different versions. So, after receiving each segment $i$, we estimate the segment bitrates of the other versions with the same index $i$. This is enabled by the bitrate estimation method proposed in our previous study [12], where the segment bitrates of other versions are estimated from the bitrate of the received segment using the interstream bitrate prediction. Specifically, the interstream model of [12], referred to as (2), calculates the estimated bitrate $R^{e}_{i,v}$ of version $v$ from the (actual) bitrate $R_{i,v_i}$ of the received segment with the selected version $v_i$, the quantization parameter (QP) values of versions $v$ and $v_i$, and an empirical factor used as the compensation for the approximation error of the model [12]. In our notation (see Notations), bitrate $R_{i,v}$ can be either the actual bitrate or the estimated one $R^{e}_{i,v}$. Once the bitrates of all versions for segment $i$ are obtained, the representative bitrate of each version at segment index $i$ is calculated as the average of the bitrates of segment $i$ and the preceding segments in that version, over a window of $N$ segments. The algorithm to calculate the representative bitrates is provided in Algorithm 1. In this algorithm, the current representative bitrate is actually computed using the previous representative bitrate. Also, it should be noted that, when $i < N$, $N$ will be set to $i$. In Section 4, we will investigate how different values of $N$ affect the performance of our method.

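Algorithm 1: Calculation of the representative bitrates.
Input: $v_i$, $R_{i,v_i}$, $N$
Output: $R^{rep}_{i,v}$ for $v = 1, \ldots, V$
for $v = 1$ to $V$ do
  // Check if bitrate is the original bitrate
  if $v = v_i$ then
    $R_{i,v} \leftarrow$ actual bitrate of the received segment;
  // Otherwise the bitrate is the estimated bitrate
  else
    Estimate $R^{e}_{i,v}$ by (2);
    $R_{i,v} \leftarrow R^{e}_{i,v}$;
  end
  // Compute $R^{rep}_{i,v}$
  $R^{rep}_{i,v} \leftarrow$ average of $R_{j,v}$ over the $N$ most recent segments $j \le i$, updated from $R^{rep}_{i-1,v}$;
end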
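The calculation of the representative bitrates can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the exact procedure: the class name is hypothetical, the interstream estimator of [12] is not reproduced and is passed in as a caller-supplied function (here a placeholder with fixed ratios), and the average is recomputed over a sliding window instead of being updated recursively from the previous representative bitrate.

```python
from collections import deque


class RepresentativeBitrate:
    """Track a moving-average bitrate per version over the last `window` segments."""

    def __init__(self, num_versions, window):
        # One bounded history per version; old entries drop out automatically.
        self.history = {v: deque(maxlen=window) for v in range(1, num_versions + 1)}

    def update(self, received_version, actual_bitrate, estimate_bitrate):
        """Add the newest segment and return {version: representative bitrate}."""
        representative = {}
        for version, past in self.history.items():
            if version == received_version:
                bitrate = actual_bitrate  # measured on the received segment
            else:
                # Estimated for the versions that were not downloaded.
                bitrate = estimate_bitrate(actual_bitrate, received_version, version)
            past.append(bitrate)
            # Fewer than `window` segments so far: average over what is available.
            representative[version] = sum(past) / len(past)
        return representative


# Placeholder estimator with hypothetical fixed ratios between two versions.
SCALE = {1: 1.0, 2: 2.0}


def placeholder_estimator(actual, source_version, target_version):
    return actual * SCALE[target_version] / SCALE[source_version]


tracker = RepresentativeBitrate(num_versions=2, window=2)
first = tracker.update(2, 1000.0, placeholder_estimator)
second = tracker.update(1, 700.0, placeholder_estimator)
third = tracker.update(1, 300.0, placeholder_estimator)
print(first, second, third)
```

A larger window gives a more stable (more global) representative bitrate, while a smaller window follows local bitrate changes more closely.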
3.2. Adaptation Algorithm

Our algorithm will address the two key questions above, so as to avoid rebuffering and to reduce the number and the degree of (version) switches. Based on the current buffer level $B_{cur}$ of the client, we define four possible cases in a streaming session, which are the uptrend, stable, downtrend, and panic cases. In these four cases, the client is likely to switch up, maintain, switch down, or aggressively decrease the version, respectively. For this purpose, our method divides the buffer into three ranges with thresholds $B_{min}$ and $B_{flex}$ ($B_{min} < B_{flex} < B_{max}$). Here $B_{max}$ is the buffer size and also the target buffer level of the adaptation method.

When the current buffer level $B_{cur}$ exceeds the buffer size $B_{max}$, the uptrend case is activated (for the reason why the buffer level could be higher than $B_{max}$, please refer to our previous work [18]). However, it is not good if the client frequently goes back and forth between the uptrend case and the downtrend case. To avoid this fluctuation, our method switches up to the next higher version ($v_{i+1} = v_i + 1$) only if the representative bitrate of that version is smaller than the throughput estimate of the next segment (i.e., $R^{rep}_{i,v_i+1} < T^{e}_{i+1}$); otherwise, the client will maintain the current version.

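The stable case is determined by the condition $B_{flex} \le B_{cur} \le B_{max}$, where $B_{cur}$ is the current buffer level, $B_{flex}$ is the flexible buffer threshold, and $B_{max}$ is the buffer size. In this case, the buffer level is judged to be in a very safe condition, so no change in quality is needed. The client just maintains the current version to avoid unnecessary switches.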

The downtrend case is activated when $B_{min} \le B_{cur} < B_{flex}$, that is, when the current buffer level lies between the minimum buffer threshold and the flexible buffer threshold. In this case, the client needs to carefully decide the requested version in order to avoid buffer underflows as well as sudden quality changes when the throughput and/or the video bitrate change drastically. In general, the version should be decreased; however, it is unnecessary to always switch down when the buffer level is in this range. In this process, we define the target bitrate $R^{tar}_{i+1}$ for the next segment as the highest representative bitrate that is lower than the throughput estimate: $R^{tar}_{i+1} = \max\{R^{rep}_{i,v} \mid R^{rep}_{i,v} < T^{e}_{i+1}\}$. If the instant bitrate and the representative bitrate of the current version do not exceed the target bitrate ($R_{i,v_i} \le R^{tar}_{i+1}$ and $R^{rep}_{i,v_i} \le R^{tar}_{i+1}$), the current version will be maintained. Otherwise, the client will switch down to the next lower version.

We can see that sometimes the instant throughput $T_i$ might be much smaller than the instant bitrate $R_{i,v_i}$. Even though the buffer is not in danger yet, the downtrend case should be activated earlier to avoid having to quickly reduce the version in the future. This is enabled by increasing the value of the flexible buffer threshold $B_{flex}$. In our method, $B_{flex}$ is controlled using a logistic function, referred to as (3), of the mismatch between $T_i$ and $R_{i,v_i}$. In this way, the larger the mismatch between $T_i$ and $R_{i,v_i}$ becomes, the higher the value of $B_{flex}$ will be, while still satisfying the condition $B_{min} < B_{flex} < B_{max}$.
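To illustrate how a logistic function can raise the flexible threshold as the throughput-bitrate mismatch grows, a hypothetical sketch is given below. The exact function used in our method is not reproduced here; the use of the bitrate-to-throughput ratio as the mismatch measure, the steepness, and the midpoint are all assumptions chosen only for illustration, and the threshold bounds of 10 s and 50 s follow the settings of Section 4.

```python
import math


def flexible_threshold(instant_bitrate_kbps, instant_throughput_kbps,
                       b_min_s=10.0, b_max_s=50.0, steepness=4.0, midpoint=1.5):
    """Map the bitrate/throughput mismatch to a threshold between b_min_s and b_max_s."""
    # Assumed mismatch measure: values above 1 mean the segment bitrate
    # exceeded the throughput that was available while downloading it.
    mismatch = instant_bitrate_kbps / instant_throughput_kbps
    logistic = 1.0 / (1.0 + math.exp(-steepness * (mismatch - midpoint)))
    return b_min_s + (b_max_s - b_min_s) * logistic


print(flexible_threshold(1000, 2000))  # small mismatch: threshold near the minimum
print(flexible_threshold(2000, 1000))  # large mismatch: threshold near the maximum
```

The logistic shape keeps the threshold bounded on both sides, so a larger mismatch widens the downtrend range without ever reaching the minimum threshold or the buffer size.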

The final case, called the panic case, is when the buffer level is in danger; that is, $B_{cur} < B_{min}$. To ensure that the buffer will not be empty, the client will aggressively switch down the version. Yet, instead of jumping directly to the lowest version, the client employs the method of our previous work [12] in this case. Specifically, the client will choose the version whose instant bitrate is the highest but still lower than the instant throughput:

$$v_{i+1} = \max\{\, v \mid R_{i,v} < T_{i} \,\}. \qquad (4)$$

The algorithm of our method is summarized by pseudocode in Algorithm 2.

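Algorithm 2: The proposed adaptation algorithm.
   Input: $B_{cur}$, $B_{min}$, $B_{flex}$, $B_{max}$, $T^{e}_{i+1}$, $v_i$
   Output: $v_{i+1}$
   // Uptrend case
(1) if $B_{cur} > B_{max}$ then
   if $R^{rep}_{i,v_i+1} < T^{e}_{i+1}$ then
     $v_{i+1} \leftarrow v_i + 1$;
   else
     $v_{i+1} \leftarrow v_i$;
   end
   // Stable case
(2) else if $B_{flex} \le B_{cur} \le B_{max}$ then
   $v_{i+1} \leftarrow v_i$;
   // Downtrend case
(3) else if $B_{min} \le B_{cur} < B_{flex}$ then
   $R^{tar}_{i+1} \leftarrow \max\{R^{rep}_{i,v} \mid R^{rep}_{i,v} < T^{e}_{i+1}\}$;
   if $R_{i,v_i} \le R^{tar}_{i+1}$ and $R^{rep}_{i,v_i} \le R^{tar}_{i+1}$ then
     $v_{i+1} \leftarrow v_i$;
   else
     $v_{i+1} \leftarrow v_i - 1$;
   end
   // Panic case
(4) else
   Select $v_{i+1}$ based on (4);
   end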

Generally, it can be seen that our method tries to provide a smooth video stream by two techniques. First, it is somewhat conservative in increasing the quality because that is allowed only when the buffer is full. In addition, the selected version is constrained by the long-term values of throughput and bitrate. Second, the downtrend case is also conservative by avoiding continuously switching down the quality. Meanwhile, in the panic case, we take into account the instant bitrate and instant throughput. Thus, the client can switch down gradually when possible; and there is no need to switch to the lowest version if the instant bitrate of a higher version still meets the throughput constraint.
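The four cases can be sketched in code as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function and parameter names are hypothetical, the handling of exact threshold boundaries is assumed, and the fallbacks used when no version satisfies a constraint (staying at the highest version, using the lowest representative bitrate as the target, or selecting the lowest version) are our own additions for completeness.

```python
def choose_next_version(current_version, buffer_s, b_min_s, b_flex_s, b_max_s,
                        rep_bitrates, instant_bitrates,
                        throughput_estimate, instant_throughput):
    """Return the version for the next segment.

    rep_bitrates and instant_bitrates map a version index (1 = lowest quality)
    to a bitrate in kbps for the segment that was just received.
    """
    highest = max(rep_bitrates)
    if buffer_s > b_max_s:
        # Uptrend: go one step up only if the next version fits the estimate.
        candidate = current_version + 1
        if candidate <= highest and rep_bitrates[candidate] < throughput_estimate:
            return candidate
        return current_version
    if buffer_s >= b_flex_s:
        # Stable: keep the current version.
        return current_version
    if buffer_s >= b_min_s:
        # Downtrend: keep the version only if it stays within the target bitrate.
        fitting = [r for r in rep_bitrates.values() if r < throughput_estimate]
        target = max(fitting) if fitting else min(rep_bitrates.values())
        if (instant_bitrates[current_version] <= target
                and rep_bitrates[current_version] <= target):
            return current_version
        return max(current_version - 1, 1)
    # Panic: highest version whose instant bitrate is below the instant throughput.
    feasible = [v for v, r in instant_bitrates.items() if r < instant_throughput]
    return max(feasible) if feasible else 1


# Hypothetical bitrates (kbps) of four versions for the last received segment.
rep = {1: 300, 2: 700, 3: 1500, 4: 2500}
inst = {1: 350, 2: 900, 3: 1400, 4: 3000}
print(choose_next_version(2, 52, 10, 25, 50, rep, inst, 1600, 1600))  # uptrend
print(choose_next_version(4, 15, 10, 25, 50, rep, inst, 1600, 1600))  # downtrend
print(choose_next_version(4, 5, 10, 25, 50, rep, inst, 1600, 1000))   # panic
```

Note that in the uptrend and downtrend cases the version changes by at most one step per segment, whereas the panic case may skip several versions but does not necessarily fall to the lowest one.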

4. Experimental Results and Discussions

In this section, we will evaluate our proposed method and two reference methods in the context of on-demand VBR streaming over mobile networks, focusing on the behaviors of version switching and buffer level after the initial buffering stage. Two bandwidth traces, a simple one and a complex one, are employed in our experiments.

4.1. Experiment Setup

Our test-bed organization used for the experiments is similar to that of [18], which consists of an HTTP Web server, a streaming client, and an IP network as in Figure 3. The IP network includes a router and wired connections connecting the server and the client. The server is an Apache HTTP server of version 2.2.21 running on Ubuntu 12.04. Our test-bed uses the DummyNet tool [21], installed at the client side, to emulate network characteristics. The packet loss rate is set to 0%, assuming that the bandwidth trace used in the experiments already takes into account the fluctuations caused by packet loss. The RTT value of DummyNet is set to 40 ms. The client is implemented in Java and runs on a Windows 7 notebook with a Core i5 2.6 GHz CPU and 4 GB RAM.

The test videos are “Sony Demo” and “Terminator 2” [19] with a frame rate of 30fps and a resolution of 1280 × 720. The duration of each video is 600 seconds. We encode 6 VBR versions by the high profile of H.264/AVC [22], corresponding to 6 different values of QP, namely, 22, 28, 34, 38, 42, and 48. Each version is divided into small video segments of 2 seconds. The version index, QP, and the average bitrate of each version are listed in Table 1. The bitrate traces of the video versions are shown in Figure 2.

As we focus on on-demand streaming, the buffer size of the client is set to 50 s (i.e., 25 segment durations), which is similar to previous work (e.g., [9, 14]). For comparison, two reference methods are employed: the instant throughput-instant bitrate based method [12] (called ITB) and the buffer-based method with trend prediction [9] (called TBB). The TBB method is implemented with the same buffer thresholds as in [9]. In our method, the minimum buffer threshold $B_{min}$ and the buffer size $B_{max}$ are 10 s and 50 s, respectively. Also, we investigate whether the number of segments $N$ used for representative bitrates affects our method’s performance. The values of $N$ considered are 10, 30, and 50, and these options of our method are referred to as AVG-10, AVG-30, and AVG-50, respectively.

4.2. Simple Bandwidth Scenario

First, we investigate the performance of the methods in a simple bandwidth scenario, when the available bandwidth drops suddenly. The bandwidth has a rectangular shape with two bandwidth levels, 2500 kbps and 500 kbps, as shown in Figure 4(a). This case is important in evaluating adaptation methods because we need to know how they perform when the bandwidth drops drastically. In this case, the Sony Demo video is used.

Figure 4 shows the comparisons of requested versions, bitrates, and buffer levels of the three methods. It is clear that the TBB method tries to maintain the high quality (version 5) for too long even when the available bandwidth drops and stays at the low level for a long interval, resulting in the worst buffer level curve (Figure 4(c)) as well as a drastic drop of quality (from version 5 to version 1 in Figure 4(b)). As for the ITB method, because it requests versions following the variations of instant throughput and instant video bitrate, the quality changes aggressively over time while the buffer level variations are very small.

As for our method, all three options have similar behaviors in terms of bitrate, version switch, and buffer level. Our method reduces the video quality gradually with no switches larger than 1 while the buffer level is higher than 25 seconds. Also, the minimum version provided by our method is 2, while the other two methods sometimes jump to version 1.

4.3. Complex Bandwidth Scenario

In this part, we evaluate the adaptation methods with a complex bandwidth trace (Figure 5), which was obtained from a mobile network [11]. Both test videos, “Sony Demo” and “Terminator 2,” are employed in this scenario.

Figure 6 shows the experimental results for the “Sony Demo” video. As seen in Figure 6(a), it is very difficult to use the curves of segment bitrates to compare the adaptation methods. From Figure 6(b), it is obvious that the TBB method’s behavior is similar to that in the simple bandwidth case. Usually, this method tries to maintain a high version as long as the buffer level permits. This behavior results in sudden drops of quality (e.g., from version 6 to version 3 at 100 s and from version 5 to version 1 at 240 s). Also, the buffer level curve of this method is the worst (Figure 6(c)), as in the previous scenario.

As for the ITB method, its bitrate curve closely follows the throughput curve, resulting in a highly fluctuating version curve with many switches, including switches of large degrees. Nevertheless, this method has the most stable buffer level curve, as seen in Figure 6(c).

Meanwhile, our method provides both quality stability and buffer stability. The version curves of our method (Figure 6(b)) have no sudden switches. Moreover, when the throughput decreases, the selected version of our method is always higher than or equal to those of the other methods. The buffer level curve of our method is not as stable as that of the ITB method; however, it is always higher than that of the TBB method. Compared to the TBB method, our method is more conservative in switching-up and less conservative in switching-down.

Some statistics of the adaptation results are provided in Table 2. The statistics are related to bitrate, requested version, version switch, and buffer level. In terms of bitrate, the ITB method has the lowest average bitrate. Its quality also fluctuates strongly, with many switches and a large switch degree/deviation. The TBB method has the highest average bitrate since this method tends to select and maintain the best possible quality. However, its average version is, interestingly, lower than that of our AVG-10 option (4.19 versus 4.30). Even the ITB method, despite its very low average bitrate, has a higher average version than the TBB method. In fact, the average version values of all methods are very similar (about 4.2 or 4.3). Among the three options of our method, the AVG-10 option is the most aggressive as it always tries to request higher versions; however, this causes slightly more switches than the other options. These points will be explained and discussed further in the next part.

Moreover, our method provides the smallest values in terms of the number of switches and the degree of switches. In addition, the minimum version of our method is higher than the other methods. The minimum value and standard deviation (STD) of buffer level of our method are also much better than those of the TBB method. For more information about the buffer, the cumulative distribution functions (CDFs) of the buffer levels are provided in Figure 7(a). Again, it is evident that the buffer of our method is much more stable than that of the TBB method.
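For reference, statistics of this kind can be computed from a sequence of requested versions as sketched below. The exact definitions behind the reported statistics are not restated here, so the definitions in this sketch (a switch is any change of version between consecutive segments, and its degree is the absolute difference of version indices) are assumptions, as are the function name and the sample sequence.

```python
def switch_statistics(versions):
    """Summarize a sequence of requested version indices."""
    # Absolute version change between each pair of consecutive segments.
    steps = [abs(b - a) for a, b in zip(versions, versions[1:])]
    switches = [s for s in steps if s > 0]
    return {
        "average_version": sum(versions) / len(versions),
        "min_version": min(versions),
        "num_switches": len(switches),
        "max_switch_degree": max(switches, default=0),
        "average_switch_degree": sum(switches) / len(switches) if switches else 0.0,
    }


stats = switch_statistics([5, 5, 4, 4, 2, 3])  # hypothetical version sequence
print(stats)
```

Reporting the average version together with the switch statistics makes it possible to compare methods whose average bitrates are dominated by a few high-bitrate segments.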

Figure 8 shows the experimental results for the Terminator 2 video, where similar behaviors of the methods can be found. Our method again provides a smoother version curve and a more stable buffer level curve than the TBB method. The aggressiveness of AVG-10 can also be seen in Figure 8(b). For example, between 330 s and 370 s, the AVG-10 option briefly increases the quality to version 6 and then reduces the quality to version 5 and version 4. Meanwhile, the other two options still request version 5 during the same interval.

The statistics of the adaptation results for the Terminator 2 video are provided in Table 3. With this video, the average version values of our method are all higher than that of the TBB method. In particular, the average bitrate of the AVG-10 option is higher than that of the TBB method. This is because the version bitrates of Terminator 2 are smaller than those of Sony Demo, so the AVG-10 option can easily reach the highest possible quality. In addition, the other parameters also show that our method provides a smoother quality and a more stable buffer than the TBB method.

Besides, the CDFs of buffer levels (Figure 7(b)) confirm the stability of buffer level of our method. Figure 7(b) also shows that the buffer levels of the AVG-30 and AVG-50 options are a little better than that of the AVG-10 option. This is because the AVG-10 option is more aggressive than the other options. Especially compared to the case of Sony Demo video (Figure 7(a)), the buffer of the TBB method is worse while the buffer of our method is better. This is actually due to the characteristics of the Terminator 2 video as discussed in the next part.

4.4. Discussions

From the above experimental results, we can see some interesting points. First, in some cases the average bitrate of the TBB method is higher than that of our proposed method, while its average version is lower than that of our method. This can be explained by the fact that sometimes the instant bitrate is very high. So, having some more segments of high bitrates, especially those belonging to the top version, will significantly increase the average bitrate. In case of Terminator 2 video, which has lower version bitrates than Sony Demo video, the average bitrate of TBB method is not better than that of our method.

Meanwhile, in low-throughput periods, the selected versions of the TBB method are lower than those of our method, resulting in a lower value of average version. So, as the bitrate of VBR video is highly fluctuating, the parameter of average bitrate should not be the main indicator to evaluate the performance of VBR video streaming (as in [9]); rather, the parameter of average version should be considered. This is different from CBR video streaming, where the average bitrate is always of high interest.

The results of our adaptation method show that a normal buffer size (i.e., similar to that in CBR video streaming) is already enough to cope with strong bitrate variations of VBR videos. Also, a representative bitrate for each version is important for VBR videos. Even the ITB method benefits a lot from learning the instant bitrate. Though having a highly fluctuating version curve, this method can be used for live streaming thanks to its very stable buffer. On the other hand, the TBB method does not consider bitrate values and thus poorly handles the strong variations of VBR video. For example, although Terminator 2 video has lower version bitrates than Sony Demo video, the buffer stability of the TBB method in the case of Terminator 2 video is worse than in the case of Sony Demo video. This is just because the version bitrates of Terminator 2 video are extremely varying (with repeated and quick changes from a high value to a low value), whereas the performance of our method is not degraded in the case of Terminator 2 video.

The three options of our method are similar in terms of average version values. Yet, among the three options, the AVG-10 option seems to be more aggressive in switching up and maintaining a high quality. This is because the representative bitrate of this option is more “local” and can quickly detect low-bitrate segment intervals, where the client can switch to a higher version when the throughput permits. Such property can increase the average version; however, that also causes more switches as seen in the statistics of the AVG-10 option. The above statistics show that AVG-30 and AVG-50 options have nearly the same performance with good quality stability and good buffer stability.

In general, it is shown that our method is more effective than the reference methods in providing stable streaming quality, with smooth version transitions even when the available bandwidth dramatically drops. If a user prefers streaming with the highest possible quality, more “local” representative bitrates should be used. On the other hand, if the user prefers a stable streaming session with fewer version switches, more “global” representative bitrates should be used.

It should be noted that, actually, the quality of a CBR video is “variable” and the quality of a VBR video is “constant.” Meanwhile, as shown in [12], it is totally feasible to provide a quasi-CBR streaming service from VBR video data sets. The above results also show that, with the simple method of representative bitrate estimation, we do not need to know exactly the bitrates of all VBR video segments even though the instant bitrate is extremely varying. This means that VBR videos and the client-based adaptation method proposed in this paper can be easily deployed in DASH-compliant streaming systems without the need to modify the current standard specification.

5. Conclusion

In this paper, we have presented an adaptation method for VBR videos and evaluated the method in the mobile streaming context. To cope with strong variations of video bitrate, we employed a local average bitrate as the representative bitrate of a version. A buffer-based algorithm was then proposed to conservatively adapt video quality by taking into account a smoothed throughput estimate, the representative bitrate, and even the instant bitrate. The experimental results showed that our method can provide smooth video quality and a stable buffer level, even under very strong variations of bandwidth and video bitrates. For future work, we will focus on live streaming scenarios such as surveillance with VBR video sources.

Notations

$T_i$: The throughput of segment $i$
$T^{e}_{i}$: The throughput estimate of segment $i$
$B_{cur}$: The current buffer level
$B_{min}$: The minimum buffer threshold
$B_{flex}$: The flexible buffer threshold
$B_{max}$: The buffer size and also the target buffer level
$R_{i,v}$: The bitrate of segment $i$ in version $v$. It could be the actual value or the estimated value
$R^{rep}_{i,v}$: The representative bitrate of version $v$ at segment $i$
$V$: The number of available video versions
$v_i$: The index of the version which is chosen for segment $i$ (a version of higher quality has a higher index value).

Competing Interests

The authors declare that they have no competing interests.