Abstract

We propose methods for selecting the modelling parameters of H.263-quantized video traffic under two different encoding scenarios. For videos encoded with a constant quantization step (unconstrained), we conclude that a two-parameter power relation holds between the exhibited video bit rate and the quantizer value and that the autocorrelation decay rate remains constant for all cases. On the basis of these results, we propose a generic method for estimating the modelling parameters of unconstrained traffic by measuring the statistics of a single “raw” video trace. For rate-controlled video (constrained), we propose an approximate method based on adjusting the “shape” parameter of the counterpart (with respect to rate) unconstrained video trace. The fit of the constructed models is assessed via q-q plots and queueing simulations. On the assumption that popular MPEG-4 encoders such as XviD and DivX usually employ identical H.263 quantization and rate control schemes, the results of this paper are expected to hold for the MPEG-4 Part 2 family as well.

1. Introduction

With the rapid spread of multimedia applications and the great progress of video streaming technologies such as the MPEG-4 and H.26x standards, network-based multimedia applications, for example, IPTV, VoD, and videoconference, have become increasingly popular services. Video traffic, which is going to be streamed by these services, is expected to account for large portions of the multimedia traffic in future heterogeneous networks (wireline, wireless, and satellite). Despite the high data rates of the contemporary network settings, there is still a need for quality assurance for the above services especially when a real-time session has to be established (e.g., videoconference or video streaming without buffering options, e-collaboration, remote control, etc.). Since such services rely on the exchange of bandwidth demanding video information, with the MPEG-4 and H.263 encoders being the most commonly used standards for the moment, extensive deployment of these services calls for careful modelling of the associated network traffic, so that the appropriate amount of resources may be anticipated by the network. The video traffic models for these networks must cover a wide range of traffic types and characteristics because the type of the terminals will range from a single home or mobile user (low video bit rate), where rate-constrained (or rate-controlled) video traffic is mainly produced, to a terminal connected to a backbone network (high video bit rate), where the traffic is presented to be out of the loop, that is, the encoder is not forced to conform to a certain video bit rate. Furthermore, successful video traffic modelling can lead to a more economical network usage (improved traffic policing schemes), leading to lower communication costs and a more affordable and higher quality service to the end-users.

Partly for the above reasons, the modelling and performance evaluation of video traffic have been extensively studied in the literature, and a wide range of modelling methods exists. The results of relevant early studies [1–10] concerning the statistical analysis of variable bit rate videoconference streams multiplexed in ATM and IP networks indicate that the histogram of the videoconference frame-size sequence exhibits an asymmetric bell shape and that the autocorrelation function decays approximately exponentially to zero. An important contribution to videoconference traffic modelling is the approach in [5], where the DAR [11] model was proposed. More explicitly, the authors of that study noted that AR models of at least order two are required for a satisfactory modelling of the examined H.261-encoded traffic patterns. However, in the same study, the authors observed that a simple DAR model, based on a discrete-time, discrete-state Markov chain, performs better with respect to queueing than a simple AR model. These results are further verified by similar studies of videoconference traffic modelling and VBR video performance and simulation [6, 12]. The above studies certainly constitute a valuable body of knowledge. However, most of them examine video traffic traces compressed by encoders such as MPEG-2, MPEG-4, H.261, and H.263 operating in an unconstrained mode, which as a result produced traffic with similar characteristics. As noted in [13], for active sequences, that is, movies, which are the subject of the current study, the use of a single model, for example, DAR [5], based on a few meaningful parameters and applicable to a large number of sequences does not appear to be possible. On this basis, complicated scene-based models have been proposed. Furthermore, most of the above studies examined MPEG encoding schemes implemented with B-frame encoding. However, for real-time streaming applications, which are the interest of this paper, only I- and P-frames usually appear in the generated traffic patterns.

Our modelling approach in this study is based on the recommendations for a good traffic model that were proposed in [14]. According to them, a model must be realistic, reusable, and computationally efficient. In order not to depart from these requirements, we used realistic experimental data (movies and concerts) and worked on the modelling parameters of well-established and simple models proposed in the literature. More explicitly, we provide methods for calculating the parameters of the simple DAR model. Taking into account the reasonable assumption that the statistical characteristics of the same video compressed with different encoding schemes are similar, we use the modelling parameters of a raw unconstrained offline video trace as a basis and adjust them to the traffic traces obtained under different encoding settings, that is, different quantization levels or mean video bit rates. The q-q plots of the sample versus the model data show that the adjusted models provide accurate fits.

The rest of the paper is structured as follows: Section 2 presents the state of the art in H.263-quantized video compression and traffic modelling. Section 3 discusses the measurement procedure and the statistical analysis of H.263-quantized video traffic, unconstrained and constrained. Section 4 analyses the appropriate methods for selecting the parameters of the autoregressive video traffic models. Finally, Section 5 culminates with conclusions and pointers to further research.

2. State of the Art: Video Traffic Modelling

Today, a large number of video systems use practical implementations of the MPEG-4 (ISO/IEC) open standard for video encoding developed by MPEG (Moving Picture Experts Group) [15]. The MPEG-4 standard is characterized by a small output video file size and quite good picture quality even when a relatively low bit rate is used. It is implemented by XviD, DivX, 3ivx, Nero Digital, and other video codecs. Moreover, the H.263 (H.263+ included) codec [16] is a widely adopted standard for videoconference communication as well as for video streaming via mp4 encapsulation. Both of the above standards use the H.263 quantization scheme and employ the same rate control algorithms. In addition, they are capable of working in both unconstrained and constrained modes of operation. In unconstrained VBR mode, the video system operates independently of the network (i.e., using a constant quantization scale throughout transmission). This type of quantization/compression is usually applied in high capacity networks. In the constrained mode, the encoder has knowledge of the networking constraints (for example, imposed offline by the user) and modulates its output in order to achieve the maximum video quality for the given content (by changing the quantization step). This is the typical encoding scenario in low capacity networks where a QoS algorithm has to be implemented.

MPEG-4-H.263-quantized videoconference traffic, thanks to its widely used compression algorithms which result in lower bandwidth requirements, accounts for large portions of the multimedia traffic in today’s heterogeneous networks (wireline, wireless, and satellite), with the ADSL network being the most notable one. Given this, it is evident that a statistical model for this type of traffic would be very useful to predict network usage and estimate resources. For this reason, many traffic models exist, mainly autoregressive ones (see [17] for a review of such models). Newer studies of video traffic modelling, for example, [18–20], reinforce the general conclusions obtained by the earlier studies by evaluating and extending the existing models and also proposing new methods for successful and accurate modelling. An extensive publicly available library of frame-size traces of unconstrained and constrained MPEG-4, H.263, and H.263+ off-line encoded video was presented in [21] along with a detailed statistical analysis of the generated traces. In the same study, the use of movies as visual content led to frame generation with a Gamma-like frame-size sequence histogram (more complex when a target rate was imposed) and an autocorrelation function that quickly decayed to zero (a traffic model was not proposed in that study, though). Of particular relevance to our work is the approach in [22], where an extensive study on multipoint videoconference traffic (H.261-encoded) modelling techniques was presented. In this study, the authors discussed methods for correctly matching the parameters of the modelling components to the measured H.261-encoded data derived from realistic multipoint conferences (in “continuous presence” mode). In [23, 24], the authors propose an accurate DAR model based on the Pearson V distribution, which on the basis of their statistical tests provides the best fit. Moreover, in [25], the authors use wavelets to model the distribution of I-frames and a simple time-domain model for P/B frames and present a novel method to capture the correlation properties of VBR traffic using group of pictures analysis. Finally, in [26], traffic modelling of M2M mobile video services is studied via several heavy-tailed distributions. According to the authors’ results, the Lognormal distribution represented the video traffic most accurately.

Aiming at a realistic, reusable, and simple video traffic model, accurate enough for queueing analysis and network estimation, this study discusses methods for calculating the appropriate model parameters from the observed traffic data and proposes methods for correctly estimating the parameters of the DAR model on different compression and network scenarios. This is addressed by improving the models presented in [22] by means of importing the compression parameters, that is, the quantizer value for the case of unconstrained traffic and the mean video bit rate for constrained traffic.

3. Video Traces: Measurement and Processing of Video Data

The data we are modelling were gathered off-line using the ffmpeg libavcodec suite [27]. The off-line mode ensured that no packet losses occurred during the trace collection process and that the traffic model always represents the best quality of the encoded video. On this basis, there was no point in investigating an online environment in the current study. The proposed model is applicable in any network environment, as it represents source-faithful video traffic encoded during UDP communication of video terminals. Movie scenes of at least 20 minutes were selected from popular commercial DVDs, for example, Aviator, Jethro Tull concert, and Lord of War, and were encoded (in fact transcoded from the common MPEG-2 DVD format) using the libavcodec H.263 codec. As raw video, we used scenes from the DVD-Video movies Aviator (VTS, 22 min), Jethro Tull concert (VTS, 30,55 min), Lord of War (VTS, 15,20 min), and Million Dollar Hotel (VTS, 27,05 min). All video files were stored in the common DVD-Video MPEG-2 format at a high resolution, with an average rate of approximately 5500 Kbps for all video files. There were two encoding scenarios. The first was designed to give results for traffic that is quantized with a constant quantization scale (step) and is therefore out of the loop, or unconstrained, in the sense that no rate limitations are imposed. The second gave results for the counterpart in-the-loop cases; that is, no quantization scale was selected, and instead rate control was imposed at a certain target rate. An encoder’s conformance to the rate control of the system is commonly achieved by reducing the video quality (and consequently the frame sizes) through the dynamic modulation of the quantization step. These operation modes were presented in [13], where Variable Bit Rate (VBR) video is thoroughly examined and categorized according to encoding and networking parameters. From now on, U-VBR will stand for unconstrained video and C-VBR for constrained video.

In both scenarios, the following parameters of the ffmpeg command were set: -vcodec: h263, -r: 25, -g: 250, and -s: qcif, where -vcodec is the encoder used, -r is the video frame rate, -g is the number of P-frames before an I-frame appears, and -s is the video size. B-frame encoding was not employed, as it is not recommended for real-time streaming. As a consequence, the resulting video sequence consists of I- and P-frames. An .mp4 encapsulation enables this type of traffic to be streamed at the RTP level via a common streaming server, for example, the Darwin Streaming Server [28]. However, in this paper, we examined the video source in an off-line mode, and no network feedback was included, as would be the case, for example, for the F-VBR traffic explained in [13]. To implement the above encoding scenarios, we added the appropriate parameter in each case, that is, -qscale set from 2 to 15 for U-VBR traffic and -b set to a certain rate (100, 200, and 400 kbps) for C-VBR traffic.

In all cases, the video statistics at the frame level were collected using the -vstatsfile parameter and were processed for further analysis. We note here that I-frames are excluded from the analysis to follow, as it was found that they have a minimal impact with respect to queueing performance. However, a uniform I-frame generator could also be integrated so as to ensure the conservativeness of the proposed model.
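
The two encoding scenarios above can be sketched as a small command builder. This is only an illustration under stated assumptions: the input and output file names are hypothetical, nothing is executed, and the statistics option is written as `-vstats_file`, which is how current ffmpeg builds spell it. Newer builds also prefer the stream-specific forms `-b:v` and `-qscale:v` over the `-b` and `-qscale` options quoted in the text.

```python
def build_ffmpeg_command(src, dst, stats_path, qscale=None, bitrate_kbps=None):
    """Return the argument list for one encoding run (nothing is executed)."""
    if (qscale is None) == (bitrate_kbps is None):
        raise ValueError("set exactly one of qscale (U-VBR) or bitrate_kbps (C-VBR)")
    cmd = [
        "ffmpeg", "-i", src,
        "-vcodec", "h263",  # encoder
        "-r", "25",         # frame rate
        "-g", "250",        # distance between I-frames
        "-s", "qcif",       # 176x144 frame size
    ]
    if qscale is not None:
        cmd += ["-qscale", str(qscale)]      # constant quantizer: U-VBR
    else:
        cmd += ["-b", f"{bitrate_kbps}k"]    # target bit rate: C-VBR
    cmd += ["-vstats_file", stats_path, dst]  # per-frame statistics log
    return cmd


# Hypothetical file names; one run per quantizer value and per target rate.
u_vbr_runs = [build_ffmpeg_command("movie.vob", f"u_q{q}.mp4", f"u_q{q}.log", qscale=q)
              for q in range(2, 16)]
c_vbr_runs = [build_ffmpeg_command("movie.vob", f"c_{r}k.mp4", f"c_{r}k.log", bitrate_kbps=r)
              for r in (100, 200, 400)]
```

Each list entry can be passed to `subprocess.run` once the file names point at real material.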

4. Estimation of Modelling Parameters

4.1. The Discrete Autoregressive Model

The DAR model that was proposed and used analytically in [5] can be directly applied for full modelling and analytical treatment of the video traffic presented in this context. This model is defined as a discrete-state Markov chain with a transition matrix of the form

$$P = \rho I + (1 - \rho) Q, \quad (1)$$

where $\rho$ is the autocorrelation decay rate of the frame-size sequence $X$ of length $N$ (consisting of P-frames only, since I-frames are excluded from the analysis), $I$ is the identity matrix, and $Q$ is a rank-one stochastic matrix with all rows equal to the probabilities resulting from the fit of the frequency histogram of $X$. The DAR model demands the representation of $X$ with a constant number of states $M$, whose probability values fill the rows of the stochastic matrix $Q$. These states can be easily chosen by dividing the interval between the maximum and the minimum frame sizes of the sequence into $M$ frame-size states. So, if $X_{\min}$ is the minimum and $X_{\max}$ the maximum frame-size value, then a reasonable state step is $s = (X_{\max} - X_{\min})/M$, with $s$ rounded to the nearest integer. The rate of each state can be easily calculated as the relative mean rate of a histogram window: the probability masses of the frame sizes (derived from the corresponding density) that fall within the window of a state give the mean frame size of that state, which is then multiplied by the frame rate of the traffic.
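
The construction described above can be sketched as follows. This is a minimal sketch and not the authors' implementation: the function names are hypothetical, the state probabilities are taken from the empirical histogram rather than from a fitted density, and the states are kept as frame sizes (multiply by the frame rate to obtain rates). The generator uses the standard reading of a DAR(1) chain: with probability equal to the decay rate the previous state is repeated, otherwise a fresh state is drawn from the histogram distribution.

```python
import random


def build_states(frame_sizes, n_states):
    """Split [min, max] into n_states windows; return window means and probabilities."""
    lo, hi = min(frame_sizes), max(frame_sizes)
    step = (hi - lo) / n_states
    if step == 0:                       # degenerate trace: a single state
        return [float(lo)], [1.0]
    counts = [0] * n_states
    sums = [0.0] * n_states
    for x in frame_sizes:
        i = min(int((x - lo) / step), n_states - 1)
        counts[i] += 1
        sums[i] += x
    levels, probs = [], []
    for c, s in zip(counts, sums):
        if c:                           # empty windows are dropped
            levels.append(s / c)        # mean frame size inside the window
            probs.append(c / len(frame_sizes))
    return levels, probs


def dar1(levels, probs, rho, n, rng):
    """Generate n frame sizes from a DAR(1) chain with decay rate rho."""
    x = rng.choices(levels, weights=probs)[0]
    out = [x]
    for _ in range(n - 1):
        if rng.random() >= rho:         # with probability 1 - rho, redraw the state
            x = rng.choices(levels, weights=probs)[0]
        out.append(x)
    return out
```

For example, `dar1(*build_states(trace, 10), rho, len(trace), random.Random(1))` yields a synthetic sequence with the same marginal histogram (at a ten-state resolution) and a geometrically decaying autocorrelation.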

The DAR model has an exponentially decaying autocorrelation and so matches the autocorrelation of the data over approximately a hundred frame lags. This match is more than enough for the traffic engineering of real-time video streaming. When using the DAR model, it is sufficient to know the mean, variance, and autocorrelation decay rate of the frame-size sequence. These parameters can be calculated using an approach that is common in this area and has proven efficient in a variety of studies, for example, [5, 21]: the Gamma density (2) is used for modelling the frequency histogram, and an exponential model is used for fitting the autocorrelation function, for instance, the compound exponential model (3) proposed in [22].

4.2. Estimation of the DAR Parameters for U-VBR Traffic

In the context of encoding video with a constant quantizer, that is, with no rate control scheme employed, we statistically analysed the resulting U-VBR video data for a quantizer range between 2 and 15 (above this range, video quality was suppressed for the selected video size). We present here results for the video data of Aviator and Jethro Tull concert in Table 1. With reference to this table, it is easily observed that, in all cases, the mean bit rate of the video streams decreases as the quantizer value increases. This is expected, since for higher quantization values the compensation criteria are adjusted in order to achieve smaller frame sizes and, as a consequence, lower quality video. Although this appears to be a trivial result, to the best of our knowledge no analytical function has been proposed in the literature to express the relation between these two parameters. Given the above and the results of Table 1, the following questions arose naturally (and their answers were pursued) during data analysis. (i) What is the form of the frequency histogram and of the autocorrelation function of the frame-size sequence for all quantization values? Could a common model be applied for all cases? (ii) Is the DAR model applicable to the measured data? If yes, are the modelling parameters (mean, variance, and decay rate) related somehow to the quantization value? (iii) What is the type of the function that relates bit rate to quantization value?

In brief, the answers to these questions, as supported by consistent evidence from the experimental results, are as follows: the sequence of frame sizes for all quantizer values between 2 and 15 exhibits an autocorrelation function that decays exponentially to zero with approximately the same autocorrelation decay rate, as calculated via (3), and a frequency histogram that can be fitted successfully by a Gamma density from (2) using the method of moments: if $m$ is the mean and $v$ the variance of the sample sequence, then the shape parameter is $p = m^2/v$ and the scale parameter is $\mu = v/m$. In Figure 3, the autocorrelation graphs are plotted for indicative quantization values for Aviator, where it is noticed that all graphs present similar decay rates, a claim that was further verified by applying a least squares fit to the model of (3), where the critical parameter was found to be approximately equal for all cases. Concerning the frame-size frequency histograms, the moments matching method gave satisfactory fitting results (we show indicative q-q plot results for Aviator and Jethro Tull in Figure 2). The corresponding shape and scale parameters of the Gamma density (2) are presented in Table 1.
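
The moment-matching step and the estimation of a decay rate can be sketched as below. The sketch rests on assumptions: the function names are hypothetical, and the decay rate is estimated with a single geometric model (a least squares fit of the log-autocorrelation through the origin) rather than the compound exponential model of [22]. The demonstration sequence is synthetic and serves only to check that a known decay rate is recovered.

```python
import math
import random


def gamma_moments(frame_sizes):
    """Method of moments: shape = mean^2 / variance, scale = variance / mean."""
    n = len(frame_sizes)
    mean = sum(frame_sizes) / n
    var = sum((x - mean) ** 2 for x in frame_sizes) / n
    return mean * mean / var, var / mean


def autocorrelation(xs, lag):
    """Sample autocorrelation coefficient at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


def fit_decay_rate(xs, max_lag):
    """Least squares fit of ln r_k = k ln(rho), using lags with positive r_k."""
    num = den = 0.0
    for k in range(1, max_lag + 1):
        r = autocorrelation(xs, k)
        if r > 0:
            num += k * math.log(r)
            den += k * k
    if den == 0:
        raise ValueError("no positive autocorrelation values to fit")
    return math.exp(num / den)


# Synthetic first-order sequence with a known decay rate of 0.9 (not trace data).
rng = random.Random(0)
x, demo = 0.0, []
for _ in range(20000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    demo.append(x)
rho_hat = fit_decay_rate(demo, 10)
```

On a real trace, `gamma_moments` would be applied to the P-frame sizes and `fit_decay_rate` to the same sequence over the lag range of interest.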

What is of great importance, at this moment, is to find a simple rule that relates the estimation of the modelling parameters to the quantization value. A first step towards this direction is to try to model the empirical data of Table 1.

Consider, where was found to be approximately equal to for all cases and is the rate of the raw video, that is, the video with the higher quality under the given encoding scenario. The above equation also holds for the mean frame size of the sequence , that is, . Equation (4) is a two-parameter power function that provides a satisfactory fit as shown in Figure 1 for both samples (Aviator and Jethro Tull). However, towards a more accurate representation of the sample data a three-parameter power equation was also tested, with a least square fit, as follows: where values were found to be in the area and for all cases. With , a fit via (5) is presented in Figure 1 for Aviator and Jethro Tull. However, for simplicity reasons, in the analysis to follow, we will use the two-parameter model since it is simpler (a more careful analysis upon the has to be conducted before adoption of the “pow3” model).
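
A two-parameter power fit of this kind can be sketched with a least squares regression in log-log coordinates. The sketch is an illustration under an assumed generic form, rate = a * q^(-b), which may be parameterized differently from (4). The function name is hypothetical, and the data in the usage note are synthetic, not values from Table 1.

```python
import math


def fit_power_law(quantizers, rates):
    """Fit rate = a * q**(-b) by linear regression of ln(rate) on ln(q)."""
    lx = [math.log(q) for q in quantizers]
    ly = [math.log(r) for r in rates]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    slope = sxy / sxx
    a = math.exp(my - slope * mx)   # rate extrapolated to q = 1
    b = -slope                      # positive when the rate falls with q
    return a, b
```

For instance, with `qs = list(range(2, 16))` and synthetic rates `[1000.0 * q ** -1.2 for q in qs]`, `fit_power_law(qs, rates)` recovers `a` close to 1000 and `b` close to 1.2. Fitting in log-log space weights relative rather than absolute errors, so the low-rate points count as much as the high-rate ones.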

On the basis of the model “pow2” (4), as an analytical function that relates the mean frame size of the sequence to the quantization value, and taking into account the property of the Gamma density , similarly we have where ,  ,  , and . Since , it is expected to hold .

On the basis of (4) and (6), a relation between the Gamma modelling parameters and the quantizer has been established, given a priori knowledge of the maximum rate, that is, the rate of the raw video. Hence, a single off-line measurement of the raw video is adequate in order to estimate a few meaningful parameters for all traces encoded with a constant quantizer within the examined range. These are the mean bit rate (or mean frame size) given by (4) and the autocorrelation decay rate, which has equal values for all quantizer values.

Towards a validation of the above method, we examine two simple scenarios. Given the statistics of the trace , acquired by an off-line measurement, we estimate the DAR modelling parameters of the video trace with , that is, , , and . From (6), we have , , and as calculated for all cases via (3). As can be concluded from Table 1, the values of and are approximately close to the actual ones. With the same analysis for movie Jethro, we have, for trace, , , and . The above parameters fed the DAR model (1) for each trace separately. The resulting model is called “Approx." Another model was created, too, where the and parameters were calculated via the moment matching method by means of the actual trace (see Table 1, , for Aviator, and , for Jethro, ), to be referenced as the “Moments” model. Then, following the fluid-flow approach via the Continuous Markov Chain model C-DAR model [29, 30], we calculated the buffer size distribution in a single-server queueing scenario with a common traffic intensity equal to . This method is analytically described in a previous study [31] (see Section ) where the important literature references are also presented. More explicitly, we consider a single-server queueing system fed by video traffic as a Markov modulated rate process with a finite number of states and transition rate matrix from the C-DAR model using the infinite buffer assumption. A trace-driven simulation—under an identical queuing testbed—was also conducted for both traces (sample). The complementary buffer size distributions are plotted in Figure 4 ((a): Aviator, (b): Jethro). With reference to these figures, it is seen that the “Approx” and “Moments” graphs deviate at a small percentage in both cases. This was expected since their mean rates differ as a property of the difference of the and parameters (). 
With respect to queueing, both models are conservative—in terms of convergence to the sample—as a consequence of the choice of based on the slow decay of the autocorrelation function of the sample. However, if an upper target bound is a priori given, that is, a worst case scenario (usually the case of the section to follow) for single source video traffic, for example, , the “Approx" model could be adjusted to the certain mean rate by means of multiplying the DAR rate states with the factor .
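
The trace-driven part of this comparison can be sketched with a discrete-time infinite-buffer queue. This is a simplified stand-in and not the fluid-flow C-DAR analysis of [29, 30]: the link drains a constant number of bits per frame interval, chosen so that the mean load equals the requested traffic intensity. The function names and the numbers in the usage note are illustrative assumptions.

```python
def buffer_occupancy(frame_bits, frame_rate, utilization):
    """Buffer content (bits) after each frame, for an infinite buffer."""
    mean_rate = sum(frame_bits) / len(frame_bits) * frame_rate  # bits per second
    capacity = mean_rate / utilization                          # link rate, bits per second
    drain = capacity / frame_rate                               # bits served per frame interval
    q, out = 0.0, []
    for b in frame_bits:
        q = max(0.0, q + b - drain)    # Lindley recursion
        out.append(q)
    return out


def complementary_distribution(samples, thresholds):
    """Fraction of samples strictly above each threshold."""
    n = len(samples)
    return [sum(1 for s in samples if s > t) / n for t in thresholds]
```

Feeding the same function with a measured trace and with a sequence generated by a fitted model, at the same utilization, gives two complementary buffer size curves that can be compared directly. For the toy input `[100, 300, 100, 300]` at 25 frames/sec and utilization 0.8, the occupancy alternates between 0 and about 50 bits.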

4.3. Estimation of the DAR Parameters for C-VBR Traffic

We present here the results for the constrained counterpart traffic; see Table 2 for the Aviator and Jethro Tull concert movies encoded under a rate control scheme (variable quantization) with rate constraints at 100, 200, and 400 kbps. An analysis similar to that of the previous subsection leads to some first conclusions concerning the statistical characteristics of constrained VBR traffic. Briefly, the autocorrelation function appeared to decay to zero faster (compared to the U-VBR case) over the first 200 lags. In low bit rate cases, for example, encoding at a target rate of 100 or 200 kbps, this property was even stronger, giving rough indications of uncorrelated traffic (calling for a correspondingly uncorrelated model to be applied). However, in the 400 kbps encoding case, frame correlation values were higher, and a least squares fit to (3) could be applied to estimate the autocorrelation decay rate. To find a reasonable explanation for the above, the sequence of the quantization values was also examined for each case, and the corresponding quantization histograms were plotted (see Figure 5). At a low target rate, the quantization values followed a Gaussian (bell-shaped) histogram, whereas at the higher target rate an asymmetrical (narrower) histogram is exhibited; that is, the majority of q-values are between 2 and 5. Hence, the variance of the quantization sequence is a measure of the correlation of the frame sequence. However, since a priori knowledge of the quantization sequence is not available, and taking into account that what matters most with respect to queueing is the long-term trend of the autocorrelation function, it is a conservative approach to consider correlated models for all cases by selecting the autocorrelation decay rate value from the analysis of the unconstrained counterpart trace.

The frequency histograms for C-VBR traffic have a Gamma form that is narrower and taller than in the counterpart cases of U-VBR traffic; that is, the traffic is mostly concentrated around the target mean rate (see Figure 6 for the case of Aviator with a target rate of 100 kbps). With reference to this figure, it is observed that the moments matching method fails to meet the characteristics of the sample’s histogram (similarly divergent results hold for the other movies in C-VBR mode). This phenomenon was also presented in [22, 31], where H.261 and H.263 videoconference traffic in multipoint sessions was analysed. In both studies, due to rate constraints imposed by a video server (multipoint control units/gatekeepers), the video traffic exhibited similar characteristics. In order to overcome the fitting problems of the moments method, the authors estimated different parameters of the Gamma family. The C-LVMAX model, presented in both studies, appeared to have a stable behaviour in cases of C-VBR traffic. This method relates the peak of the histogram’s convolution to the location at which the Gamma density achieves its maximum and to the value of this maximum. The values of the shape and scale parameters of the Gamma density are derived from the location and the value of the unique maximum of the histogram’s convolution density. Numerical values for this fit also appear in Table 2, and the corresponding q-q plots, demonstrating the dominance of the C-LVMAX model, are shown in Figure 6. From the values of the C-LVMAX Gamma parameters, it is also observed that the shape parameter has larger values compared to the ones of the “moments” model. This explains analytically the narrowness of the frame-size histogram that we have already noted. This model, though, demands the full histogram information (i.e., the actual trace) and hence separate measurements for each rate-constraint scenario. In the paragraph to follow, we propose a simple method for an approximate estimation of the Gamma parameters of C-VBR video traffic.
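
The idea of deriving Gamma parameters from a peak can be sketched as below. This is a simplified illustration and not the C-LVMAX procedure of [22, 31]: the peak location and peak height are passed in directly instead of being extracted from the histogram's convolution, and the function names are hypothetical. It uses two facts about a Gamma density with shape above one: the mode lies at (shape - 1) * scale, and the product of mode and peak height depends on the shape alone, so the shape can be found by bisection.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density with the given shape and scale, evaluated at x > 0."""
    log_f = ((shape - 1) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_f)


def gamma_from_peak(x_peak, f_peak):
    """Shape and scale of the Gamma density whose maximum is f_peak at x_peak."""
    target = math.log(x_peak * f_peak)

    def log_g(a):
        # ln(mode * peak height) as a function of a = shape - 1 (increasing in a)
        return (a + 1) * math.log(a) - a - math.lgamma(a + 1)

    lo, hi = 1e-9, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)        # bisection on a logarithmic scale
        if log_g(mid) < target:
            lo = mid
        else:
            hi = mid
    a = math.sqrt(lo * hi)
    return a + 1, x_peak / a
```

As a self-check, taking the peak of a known density, `gamma_from_peak(400.0, gamma_pdf(400.0, 5.0, 100.0))`, returns a shape close to 5 and a scale close to 100.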

Given the dominance of the C-LVMAX model for C-VBR traffic, we adopt in this paper, for this type of traffic, the idea in [13] (p. 40). According to that paper, an approach towards modelling constrained video traces is to “assume worst case sources, operating close to the maximum capacity and then characterize these sources.” Based on this idea, we considered as a “worst case” source at a given constraint rate the counterpart U-VBR trace with mean rate close to that rate. From (4) and (6), it is simple to calculate the Gamma shape and scale parameters for a trace of a given rate.

Consider,

However, the values of the above parameters correspond to unconstrained video traces which, as described in the corresponding section, exhibit frequency histograms with small shape values (see Table 1), while for the constrained traces the shape values were found to be larger (as given by C-LVMAX; see Table 2). Based on that, the fit of the sample C-VBR histogram via the parameters of (7) would be divergent, as shown in Figure 7. To overcome this problem, we adjust the above parameters so as to increase the shape and reduce the scale by a common factor; that is, the adjusted shape is the original shape multiplied by this factor, and the adjusted scale is the original scale divided by it, which leaves the mean unchanged. The improved fitting results appear in Figure 7, which compares the Worst Case model with the adjusted one.
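
The effect of this adjustment can be illustrated in a few lines. The sketch below uses hypothetical function names and illustrative numbers that are not taken from Tables 1 or 2. It relies only on the Gamma properties that the mean equals shape times scale and the variance equals shape times scale squared, so multiplying the shape and dividing the scale by the same factor keeps the mean and divides the variance by that factor.

```python
def adjust_gamma(shape, scale, factor):
    """Increase the shape and reduce the scale by the same factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return shape * factor, scale / factor


def gamma_mean_var(shape, scale):
    """Mean and variance of a Gamma density with the given shape and scale."""
    return shape * scale, shape * scale * scale


# Illustrative values only: a wide "worst case" density made four times narrower.
worst_case = (2.0, 500.0)
adjusted = adjust_gamma(*worst_case, 4.0)
```

Here `gamma_mean_var(*worst_case)` gives a mean of 1000 and a variance of 500000, while `gamma_mean_var(*adjusted)` gives the same mean with a variance of 125000, which is the taller, narrower histogram shape observed for constrained traffic.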

5. Conclusions

In this paper, we proposed methods for selecting the modelling parameters of H.263-quantized video traffic under two different encoding scenarios. For videos encoded with a constant quantization step (unconstrained), we conclude that a two-parameter power relation holds between the exhibited video bit rate and the quantizer value and that the autocorrelation decay rate remains constant for all cases. On the basis of these results, we proposed a generic method for estimating the modelling parameters of unconstrained traffic by measuring the statistics of a single “raw” video trace. For rate-controlled video (constrained), we proposed an approximate method based on adjusting the “shape” parameter of the counterpart (with respect to rate) unconstrained video trace. The fit of the constructed models was assessed via queueing simulations. On the assumption that popular MPEG-4 encoders such as XviD and DivX usually employ identical H.263 quantization and rate control schemes, the results of this paper are expected to hold for the MPEG-4 Part 2 family as well.