Abstract

This paper presents modelling results for H.26x video traffic generated by popular videoconference software applications. The analysis of videoconference data measured during realistic point-to-point videoconference sessions led us to the general conclusion that the traffic can be divided into two categories: unconstrained and constrained. In unconstrained traffic, there is a direct relation between the encoder and the form of the frequency histogram of the frame-size sequence. Moreover, for this type of traffic, strong correlations between successive video frames can be found. On the other hand, where bandwidth constraints are imposed during the encoding process, the generated traffic appears to exhibit similar characteristics for all the examined encoders, with the very low autocorrelation values being the most notable one. On the basis of these results, this study proposes methods to calculate the parameters of a widely adopted autoregressive model for both types of traffic.

1. Introduction

H.26x videoconference traffic is expected to account for large portions of the multimedia traffic in future heterogeneous networks (wired, wireless, and satellite). The videoconference traffic models for these networks must cover a wide range of traffic types and characteristics, because the terminals will range from a single home or mobile user (low video bit rate), where mainly constrained video traffic is produced, to a terminal connected to a backbone network (high video bit rate), where the traffic is mostly unconstrained.

Partly due to the above reasons, the modelling and performance evaluation of videoconference traffic has been extensively studied in literature, and a wide range of modelling methods exists. The results of relevant early studies [1–8] concerning the statistical analysis of variable bit rate videoconference streams multiplexed in ATM networks indicate that the histogram of the videoconference frame-size sequence exhibits an asymmetric bell shape and that the autocorrelation function decays approximately exponentially to zero. An important body of knowledge in videoconference traffic modelling is the approach in [5], where the DAR [9] model was proposed. More explicitly, in this study, the authors noted that AR models of at least order two are required for a satisfactory modelling of the examined H.261-encoded traffic patterns. However, in the same study, the authors observed that a simple DAR model, based on a discrete-time, discrete-state Markov chain, performs better, with respect to queueing, than a simple AR model. The results of this study are further verified by similar studies of videoconference traffic modelling [7] and VBR video performance and simulation [6, 10]. In [11], Heyman proposed and evaluated the GBAR process as an accurate and well-performing single-source videoconference traffic model.

The DAR and GBAR models provide a basis for videoconference traffic modelling through the matching of basic statistical features of the sample traffic. On this basis, and towards the modelling of videoconference traffic encoded by the Intra-H.261 encoder of the ViC tool, the author in [12] proposed a DAR(p) model using the Weibull instead of the Gamma density for the fit of the sample histogram. In [13], the authors concluded that Long Range Dependence (LRD) has minimal impact on videoconference traffic modelling.

Relevant newer studies of videoconference traffic modelling reinforce the general conclusions of the above earlier studies by evaluating and extending the existing models and also proposing new methods for successful and accurate modelling [14–18]. An extensive, publicly available library of frame-size traces of unconstrained and constrained MPEG-4, H.263, and H.263+ offline-encoded video was presented in [19], along with a detailed statistical analysis of the generated traces. In the same study, the use of movies as visual content led to the generation of frames with a Gamma-like frame-size sequence histogram (more complex when a target rate was imposed) and an autocorrelation function that quickly decayed to zero (no traffic model was proposed in that study, though).

Of particular relevance to our work is the approach in [20], where an extensive study on multipoint videoconference traffic (H.261-encoded) modelling techniques was presented. In this study, the authors discussed methods for correctly matching the parameters of the modelling components to the measured H.261-encoded data derived from realistic multipoint conferences (in “continuous presence” mode).

The above studies certainly constitute a valuable body of knowledge. However, most of them examine videoconference traffic traces compressed by encoders (mainly H.261) that operated in an unconstrained mode and, as a result, produced traffic with similar characteristics (a frame-size histogram of Gamma form and strong short-term correlations). Today, a large number of videoconference platforms exist, the majority operating over IP-based networking infrastructures and using practical implementations of the H.261 [21], H.263 [22, 23], H.263+ [22, 23], and H.264 [24] encoders. These encoders ship in sophisticated commercial software packages that are capable of working in both unconstrained and constrained modes of operation. In the unconstrained VBR mode, the video system operates independently of the network (i.e., using a constant quantization scale throughout transmission). In the constrained mode, the encoder has knowledge of the networking constraints (either imposed offline by the user or online by an adaptive bandwidth adjustment mechanism of the encoder) and modulates its output in order to achieve the maximum video quality for the given content (by changing the quantization scale, skipping frames, or combining multiple frames into one). Furthermore, most of the previous studies have dealt with the H.261 encoding of movies (like Star Wars) that exhibit abrupt scene changes. However, the traffic patterns generated by differential coding algorithms depend strongly on the variation of the visual information. For videoconference, the use of a single model based on a few physically meaningful parameters and applicable to a large number of sequences seems possible, as the visual information is typical head-and-shoulders content that does not contain abrupt scene changes and is consequently more amenable to modelling.

Moreover, an understanding of the statistical nature of constrained VBR sources is useful for designing call admission procedures. Modelling constrained VBR sources is, to the best of the authors' knowledge, an open area for study. Our approach in this direction was to gather video data generated by constrained VBR encoders that used a particular rate control algorithm to meet a predefined channel constraint, and then to model the resulting trace using techniques similar to those used for unconstrained VBR. The difficulty with this approach is that the resulting model cannot be used to understand the behaviour of a constrained VBR source operating with a different rate control algorithm or a different channel constraint. However, given that in constrained VBR the encoder is in the loop, it is more likely that network constraints are not violated and that the source operates closer to its maximum allowable traffic. This may make constrained VBR traffic more amenable to modelling than unconstrained VBR traffic. The basic idea is that we can assume worst-case sources (i.e., high-motion content) operating close to the maximum capacity and then characterize these sources.

Taking into account the above, it is important to examine whether the models established in literature are appropriate for handling this contemporary setting in general. It is an open question whether all coding strategies result in significantly different statistics for a fixed or different sequence. Along these lines, this study undertook measurements of videoconference traffic encoded, during realistic low- and high-motion head-and-shoulders experiments, by a variety of encoders of popular commercial software modules operating in both unconstrained and constrained modes. Moreover, the modelling proposal was validated against various traces available in literature [19] (referred to as the "TKN traces" from now on).

The rest of the paper is structured as follows. Section 2 describes the experiment characteristics and presents the first-order statistical quantities of the measured data. Section 3 discusses appropriate methods for parameter assessment of the encoded traffic. Finally, Section 4 culminates with conclusions and pointers to further research.

2. The Experimental and Measurement Work

The study reported in this paper employed measurements of the IP traffic generated by different videoconference encoders operating in both unconstrained and constrained modes. More explicitly, we measured the traffic generated by the H.26x encoders included in the following videoconference software tools: ViC (version v2.8ucl1.1.6) [27], VCON Vpoint HD [28], Polycom PVX [29], France Telecom eConf 3.5 [30], and Sorenson EnVision [31]. (The NV, NVDCT, BVC, and CellB encoders [25] were examined in [26], where it was found that they resulted in traffic patterns similar to those of the H.261 encoder. Hence, the modelling proposal for H.261 in the current study is expected to be applicable to these encoders, too.) The examined encoders are H.261, H.263, H.263+, and H.264. In particular, H.264 was examined in [32], whose results are also presented under the generic context of the current study. All examined traces are representative of the H.26x family of video systems. Notably, the ViC video system uses encoders implemented by the open H.323 community [33]. These encoders are based on stable and open standards, and as a consequence their examination is more likely to give reusable modelling results. At this point, we must note that VCON Vpoint HD could not establish an H.264 connection with Polycom PVX and vice versa. This is due to the fact that the RTP payload format for H.264 still has some open issues (media-unaware fragmentation); more explicitly, the clients use different RTP payload types to communicate.

For all the examined encoders, compression is achieved by removing the spatial (intraframe) and the temporal (interframe) redundancy. In intraframe coding, a transform coding technique is applied to the image blocks, while in interframe coding, a temporal prediction is performed using motion compensation or another technique, and the difference (residual) quantity is then transform coded. Here, we must note that the ViC H.261 encoder [34, 35] performs only intraframe coding, in contrast to the H.261 encoders of Vpoint, eConf, and EnVision, where blocks are inter- or intracoded. The above encoding variations influence the video bit rate performance of the encoders and, as a consequence, the statistical characteristics of the generated traffic traces.

At this point, we may discuss the basic functionality of the examined video systems, which is a fundamental factor in the derived statistical features of the encoded traffic and a basic reason for the philosophy of the experiments we followed. The rate control parameters (bandwidth and frame rate) set a traffic policy, that is, an upper bound on the encoded traffic according to the user's preference (obviously depending on his/her physical link). An encoder commonly conforms to the rate control of the system by reducing the video quality (and consequently the frame-size quantity) through dynamic modulation of the quantization level. In the case of ViC, a simpler method is applied: the video quality remains invariant, and a frame rate reduction is performed when the exhibited video bit rate tends to exceed the bandwidth bound. In fact, in ViC, the video quality of a specific encoder is a parameter determined a priori by the user. In the case of Vpoint, Polycom, eConf, and EnVision, the frame rate remains invariant, and a video quality reduction is performed when the exhibited video bit rate tends to exceed the bandwidth bound. This threshold can be set through the network settings of each client. Moreover, Vpoint utilizes adaptive bandwidth adjustment (ABA). ABA works primarily by monitoring packet loss: if the endpoint detects that packet loss exceeds a predefined threshold, it automatically drops to a lower conference data rate while instructing the other conference participant's endpoint to do the same.

Two experimental cases were examined, as presented in Table 1 (TKN traces are also included). Case 1 included experiments where the terminal clients were operating in unconstrained mode, while Case 2 covered constrained-mode trials. In both cases, two "talking-heads" raw-format video contents were imported into the video systems through a Virtual Camera tool [36], and peer-to-peer sessions of at least half an hour were then employed in order to ensure a satisfactory trace length for statistical analysis. These contents were produced offline by a typical webcam in uncompressed RGB-24 format: one with mild movement and no abrupt scene changes, "listener" (referred to as VC-L), and one with higher motion activity and occasional zoom/pan, "talker" (VC-H). The video size was QCIF (176 × 144) in both cases and all scenarios (VC-H and VC-L). In Case 1, no constraint was imposed, either by a gatekeeper or by the software itself. The target video bit rates imposed in Case 2 are shown in Table 1. In each case, the UDP packets were captured by a network sniffer, and the collected data were further postprocessed at the frame level by tracing a common packet timestamp. (It is important to note here that analysis at the MacroBlock (MB) level, as in [14], was examined and found to provide only a typical smoothing of the sample data. We believe that analysis at the frame level is simpler and offers a realistic view of the traffic.) The produced frame-size sequences were used for further statistical analysis.

The specific parameters shown in Table 1 for the VC-H and VC-L traces depend on the particular coding scheme, the nature of the moving scene, and the confidence of the measured statistics. Moreover, traffic traces available in literature were used for further validation. Specifically, the traces used were "office cam" and "lecture room cam" (from the TKN library). These traces were encoded offline with H.263 in both constrained and unconstrained modes.

Some primary conclusions, as supported by the experiments' results (see Table 1), arise concerning the statistical trends of the encoders' traffic patterns. Specifically, H.263+ produces a lower video bit rate than H.263 and H.261 do. This was expected, since the later encoder versions feature improved compression algorithms compared with the earlier ones (always with respect to the rate produced). Finally, for all the encoders, the use of the VC-H content led to higher rates, as expected. Similar results were observed for the mean frame size and variance quantities. In all cases, the variance quantities of the VC-H content were higher than those of VC-L, with the exception of the ViC H.263+ encoder (Case 1—Traces 5, 6), where the opposite phenomenon appeared.

H.264 and the encoders used for the production of the TKN traces tend to adjust their quality in a "greedy" manner so as to use up as much of the allowed bandwidth as possible. At this point, we must note that Trace 4 of Case 2 is semiconstrained (i.e., the client did not always need the available network bandwidth). However, this particular case can be covered by the "worst-case" Case 2—Trace 3, where the target rate is reached (fully constrained traffic).

Taking into account the above context, the following questions naturally arise.

(i) What is the impact of the encoders' differences on the generated videoconference traffic trends?
(ii) Can a common model capture both types of traffic, unconstrained and constrained?
(iii) Are the traffic trends invariant of the constraint rate selected?
(iv) How does the motion of the content influence the generated traffic, for each encoder, and the parameters of the proposed traffic model?
(v) Can a common traffic model be applied for all the above cases?

The above questions pose the research subject that is thoroughly examined in what follows. Their answers will be given along with the respective analysis.

3. Traffic Analysis and Modelling Assessment

The measured traffic analysis for all experimental sets confirms the general body of knowledge that literature has formed concerning videoconference traffic. Traffic analysis was employed for all experimental cases. More explicitly, in all cases, the frame-size sequence can be represented as a stationary stochastic process with a frequency histogram of an approximately bell-shaped Probability Distribution Function (PDF) form (narrower in the case of H.263, H.263+, and H.264 encoding); see Figures 1(a)–1(c) and 2(a)–2(e). The histograms are more complex in the TKN traces, as their content (office and lecture cam) probably contained more scene changes than our VC-L and VC-H contents. Examining the sample histograms more thoroughly, we noted that the smoothed frame-size frequency histograms of the H.261 encoder have an almost similar bell shape (see Figures 1(a), 1(b), 2(a), and 2(b)), while a narrower shape appears in the H.263, H.263+, and H.264 histograms (Figures 1(c) and 2(b)–2(e)). The VC-H frame-size frequency histograms appeared more symmetrically shaped than the corresponding VC-L histograms. This is reasonable, as the rate of the H.26x encoders depends on the activity of the scene, increasing during active motion (VC-H) and decreasing during inactive periods (VC-L).

Furthermore, the AutoCorrelation Function (ACF) of the unconstrained traffic appeared to be strongly correlated over the first 100 lags (short-term) and slowly decaying to values near zero (see the indicative Figures 3(a)–3(c) for the traces of Case 1). On the contrary, the ACFs of the constrained traffic decayed very quickly to zero, denoting the lack of short-term correlation (see Figures 3(d) and 3(e)). This conclusion is critical for queueing, as the short-term correlation parameter has been found to strongly affect buffer occupancy and overflow probabilities for videoconference traffic. In fact, to verify this assumption, we measured the buffer occupancy of the constrained traces in queueing experiments of different traffic intensities. Buffering was found to be very small, at a level not affecting queueing. On this basis, it is evident that a common model cannot be applied for the modelling of the two types of traffic. More explicitly, a correlated model is needed for the case of unconstrained traffic, while a simpler uncorrelated model suffices for constrained traffic.
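To make the buffer-occupancy check concrete, the following is a minimal sketch of a frame-level queueing experiment of the kind described above, assuming a deterministic drain rate chosen to match a target traffic intensity; the trace parameters are hypothetical, not our measured data.

```python
import numpy as np

def buffer_occupancy(frame_sizes, intensity):
    """Frame-level Lindley recursion: the queue drains at a constant rate
    chosen so that the load equals `intensity` (mean arrival rate / service rate)."""
    service = np.mean(frame_sizes) / intensity  # bytes drained per frame interval
    q = np.zeros(len(frame_sizes))
    for n in range(1, len(frame_sizes)):
        q[n] = max(0.0, q[n - 1] + frame_sizes[n] - service)
    return q

# Example: an uncorrelated (constrained-like) trace yields small occupancies.
rng = np.random.default_rng(1)
trace = rng.gamma(shape=4.0, scale=250.0, size=30000)  # hypothetical frame sizes (bytes)
for rho in (0.5, 0.7, 0.9):
    q = buffer_occupancy(trace, rho)
    print(f"rho={rho}: mean queue {q.mean():.0f} bytes, max {q.max():.0f}")
```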

The DAR model, proposed in [5], has an exponentially matching autocorrelation and so matches the autocorrelation of the data over approximately one hundred frame lags. This match is more than enough for videoconference traffic engineering. Consequently, this model is a proper solution for the treatment of unconstrained traffic. When using the DAR model, it is sufficient to know the mean, the variance, and the autocorrelation decay rate of the source for admission control and traffic forecasts. For the constrained traffic traces, a simple random number generator based on the fit of the sample frame-size histogram can be applied directly; the DAR model with an autocorrelation decay rate equal to zero can also serve. This feature makes constrained videoconference traffic more amenable to traffic modelling than its unconstrained counterpart, as only two parameters are needed: the mean and the variance of the sample.
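As a rough illustration of how few parameters such a model needs, the sketch below generates a DAR(1) frame-size sequence with a Gamma marginal fitted by moments. The numerical values are hypothetical, not taken from Table 2, and the Gamma marginal is one choice among the densities examined in Section 3.1.

```python
import numpy as np

def dar1_trace(n_frames, mean, var, rho, seed=0):
    """DAR(1) frame-size generator: keep the previous frame size with
    probability rho, otherwise draw a fresh size from the Gamma marginal.
    Moments matching: shape = mean^2/var, scale = var/mean."""
    rng = np.random.default_rng(seed)
    shape, scale = mean**2 / var, var / mean
    x = np.empty(n_frames)
    x[0] = rng.gamma(shape, scale)
    for n in range(1, n_frames):
        x[n] = x[n - 1] if rng.random() < rho else rng.gamma(shape, scale)
    return x

# rho is the autocorrelation decay rate; rho = 0 gives the uncorrelated
# generator suggested above for constrained traffic.
trace = dar1_trace(10000, mean=1000.0, var=9e4, rho=0.98)
```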

Specifically, as shown in [32], H.264 traffic can be characterized as constrained: the generated traffic is uncorrelated, and its queueing behaviour can be represented accurately.

The rest of the paper discusses methods for correctly matching the parameters of the modelling components to the data and for combining these components into the DAR model.

3.1. Fitting of the Frame-Size Frequency Histograms of the Traces

A variety of distributions was tested for fitting the sample frame-size frequency histograms: Gamma, Inverse Gamma (or Pearson V), Loglogistic, Extreme Value, Inverse Gauss, Weibull, Exponential, and Lognormal. The first three were found to be the most dominant. Even though the Inverse Gauss density performed similarly to the Gamma distribution, it is not included in the analysis to follow, as the Gamma distribution is widely adopted in literature. Finally, the Extreme Value distribution performed worse, overall, than the other ones.

For the purpose of fitting the selected distributions' densities to the sample frame-size sequence histogram, we followed the simple moments matching method, although various full histogram-based methods (e.g., [20]) as well as maximum likelihood estimations (MLE) have been tried in literature. This method has the advantage of requiring only the sample mean frame size and variance quantities, not full histogram information. Thus, taking into account that the sequence is stationary (and as a result the mean and variance values are almost the same over all sample windows), it is evident that only a part of the sequence is needed to calculate the corresponding density parameters. Furthermore, this method captures the sample mean video bit rate accurately, a property that is not ensured in the case of MLE or histogram-based models. However, in cases where none of the examined distributions provides a satisfactory fit (as with the TKN traces), a histogram-based method can be applied.

If $m$ is the mean and $v$ the variance of the sample sequence, and $m_l$ and $v_l$ are the mean and the variance of the logarithm of the sample, then the distribution functions and the corresponding parameters derived from the moments matching method are given by the following equations, for each distribution correspondingly, (1): Gamma, (2): Inverse Gamma, and (3): Loglogistic:

$$f(x) = \frac{x^{a-1} e^{-x/b}}{b^{a}\,\Gamma(a)}, \quad \text{where } a = \frac{m^{2}}{v},\ b = \frac{v}{m}, \tag{1}$$

$$f(x) = \frac{b^{a}\, x^{-a-1} e^{-b/x}}{\Gamma(a)}, \quad \text{where } a = \frac{m^{2}}{v} + 2,\ b = m\left(\frac{m^{2}}{v} + 1\right), \tag{2}$$

$$f(x) = \frac{(\beta/\alpha)\,(x/\alpha)^{\beta-1}}{\left[1 + (x/\alpha)^{\beta}\right]^{2}}, \quad \text{where } \alpha = e^{m_{l}},\ \beta = \frac{\pi}{\sqrt{3 v_{l}}}. \tag{3}$$
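A minimal sketch of the moments matching computations (1)–(3) follows, assuming the frame-size sequence is available as a NumPy array; note that only first- and second-order sample statistics enter.

```python
import numpy as np

def match_moments(frames):
    """Moment-matching parameter estimates for the three candidate densities,
    following (1)-(3): only sample moments are needed, not the full histogram."""
    m, v = frames.mean(), frames.var()
    log_m, log_v = np.log(frames).mean(), np.log(frames).var()
    return {
        "gamma":       {"a": m**2 / v, "b": v / m},                    # (1)
        "inv_gamma":   {"a": m**2 / v + 2, "b": m * (m**2 / v + 1)},   # (2)
        "loglogistic": {"alpha": np.exp(log_m),                        # (3)
                        "beta": np.pi / np.sqrt(3 * log_v)},
    }
```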

Given the dominance of the above distributions, modelling analysis and evaluation will be presented for these three densities. The numerical results (density parameters) from the application of the above parameter-matching methods appear in Table 2. The modelling evaluation of the above methods was performed from the queueing point of view. As a consequence, we thoroughly examined fits of cumulative distributions. This was done as follows: we plotted the sample quantiles from the sample cumulative frequency histogram against the model quantiles from the cumulative density of the corresponding distribution. The Q-Q plot of this method refers to cumulative distributions (probabilities of not exceeding a threshold).
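The quantile comparison can be sketched as follows; SciPy's standard parameterizations are assumed, and `frames` stands for a measured frame-size sequence.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def qq_plot(frames, dist, probs=np.linspace(0.01, 0.99, 99)):
    """Compare sample quantiles (from the empirical CDF) against model
    quantiles (from the fitted distribution's inverse CDF)."""
    sample_q = np.quantile(frames, probs)
    model_q = dist.ppf(probs)
    plt.plot(model_q, sample_q, ".")
    plt.plot(model_q, model_q, "--")  # reference line y = x
    plt.xlabel("model quantiles")
    plt.ylabel("sample quantiles")

# e.g., a Gamma density fitted by moments: shape a = m^2/v, scale b = v/m
# qq_plot(frames, stats.gamma(a, scale=b))
```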

Figures 1(a)–1(c) and 2(a)–2(c) present Q-Q plots for all traces of Cases 1 and 2, respectively. The results suggest that, for fitting videoconference data, the coding algorithm used should be taken into consideration. There seems to be a relationship between the coding algorithms and the characteristics of the generated traffic. For instance, for H.261, in most cases, the dominant distribution is Gamma (1), as can be verified from the Q-Q plots depicted in Figures 1(a) and 1(b), while for H.263 and H.263+, the Loglogistic density (3) has a more "stable" performance than the other two (Q-Q plots shown in Figures 1(c) and 2(c)). The Inverse Gamma density (2) seems suitable for H.263 traffic (see Figure 1(c)), although it was outperformed by the Loglogistic density in some cases. However, as will be commented upon later, it did not provide a solution in all cases of constrained traffic.

We must note that in Case 2, where a constraint was imposed, the moments matching method for calculating the distribution's parameters did not always provide a good fit, performing as shown in Figures 2(a)–2(c) (Inverse Gamma and Loglogistic are depicted; the Gamma density provided a similar fit). To provide an acceptable fit, a histogram-based method proposed for H.261-encoded traffic in [20], known as C-LVMAX, was used. This method relates the peak of the histogram's convolution to the location at which the Gamma density achieves its maximum and to the value of this maximum: the shape and scale parameters of the Gamma density are derived from the location $x_{\max}$ of the unique maximum of the histogram's convolution density and the value $f_{\max}$ of that maximum. Numerical values for this fit also appear in Table 2 (for Case 2 only). Figures 2(a)–2(c) show how the three distributions fit the empirical data using the method of moments (Inverse Gamma, Loglogistic) and the C-LVMAX method (Gamma C-LVMAX). The Inverse Gamma density could not be calculated for all the constrained traces of Case 2, due to processing limitations: for large $a$, $b$ parameter values, one factor in (2) becomes vanishingly small, and consequently its inverse quantity could not be computed (the values of the parameters of the Inverse Gamma density for Case 2—Traces 5, 6, 8, 9 are, however, given in Table 2).
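A sketch of the C-LVMAX idea follows. Two caveats: the convolution (smoothing) step is approximated here with a Gaussian kernel density estimate, and the closed-form shape/scale expressions are those obtainable from the Gamma mode location and peak value under a Stirling-type approximation. Both are our assumptions for illustration, not necessarily the exact procedure of [20].

```python
import numpy as np
from scipy.stats import gaussian_kde

def c_lvmax(frames):
    """C-LVMAX-style Gamma fit: locate the peak of the smoothed (convolved)
    frame-size histogram and derive shape/scale from the peak location
    x_max and peak value f_max."""
    kde = gaussian_kde(frames)          # KDE stands in for the convolution step
    grid = np.linspace(frames.min(), frames.max(), 2000)
    dens = kde(grid)
    x_max, f_max = grid[dens.argmax()], dens.max()
    # Stirling-based inversion of the Gamma mode/peak relations (assumption):
    a = 2 * np.pi * (x_max * f_max) ** 2 + 1   # shape
    b = 1 / (2 * np.pi * x_max * f_max ** 2)   # scale
    return a, b
```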

Summing up the above analysis, it is evident that the Gamma density is better for H.261 unconstrained traffic, the Loglogistic for unconstrained H.263 and H.263+ traffic, and the C-LVMAX method for all cases of constrained traffic (including H.264 traffic, as proposed in [32]). However, if a generic and simple model were needed for all cases, the most dominant choice would be the Loglogistic density.

3.2. Calculation of the Autocorrelation Decay Rate of the Frame-Size Sequences

At this point, we may discuss the calculation of the autocorrelation decay rate of the frame-size sequence of the unconstrained traces (as noted in the previous sections, constrained traffic appeared to be uncorrelated, and as a result the decay rate of its autocorrelation function can be set to zero). From Figures 3(a)–3(d), it is observed that the ACF graphs of unconstrained traffic exhibit a reduced decay rate beyond the initial lags. It is evident that unconstrained video sources have very high short-term correlation, a feature which cannot be ignored for traffic engineering purposes. This behaviour was also noted in earlier studies [4].

To fit the sample ACF, we applied the model proposed in [20], which is based on a compound exponential fit. This model fits the autocorrelation function with a weighted sum of two geometric terms:

$$\rho_{k} = p\,\lambda_{1}^{k} + (1-p)\,\lambda_{2}^{k},$$

where $\lambda_{1}$, $\lambda_{2}$ are the decay rates, with the property $1 > \lambda_{1} > \lambda_{2}$. This method was tested with a least squares fit to the autocorrelation samples for the first 100 lags, since the autocorrelation decays exponentially up to a lag of around 100 frames (short-term behaviour) and then decays more slowly (long-term behaviour). This match is more than enough for traffic engineering, as also noted in [37]. Notably, with this model, the autocorrelation parameter is not fixed by the lag-1 value alone, as in the DAR model. For each encoder (in Case 1), the numerical values of the parameters of the above fit appear in Table 2.
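A least-squares fit of this compound exponential to the sample ACF can be sketched as follows; `frames` denotes a measured frame-size sequence, and the ordering $\lambda_{1} > \lambda_{2}$ can be enforced by relabelling after the fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def sample_acf(frames, max_lag=100):
    """Biased sample autocorrelation at lags 1..max_lag."""
    x = frames - frames.mean()
    c0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:-k], x[k:]) / (len(x) * c0)
                     for k in range(1, max_lag + 1)])

def compound_exp(k, p, lam1, lam2):
    """rho(k) = p*lam1^k + (1-p)*lam2^k: weighted sum of two geometric terms."""
    return p * lam1**k + (1 - p) * lam2**k

acf = sample_acf(frames, 100)
lags = np.arange(1, 101)
(p, lam1, lam2), _ = curve_fit(compound_exp, lags, acf,
                               p0=(0.5, 0.98, 0.90), bounds=(0, 1))
```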

4. Conclusions

This paper proposed modelling methods for two types of videoconference traffic: unconstrained and constrained. The analysis of extensive data gathered during experiments with popular videoconference terminals and different encoding schemes, as well as of traffic traces available in literature, suggested that while the unconstrained traffic traces exhibited high short-term correlations, the constrained counterpart patterns appeared to be mostly uncorrelated, to a degree not affecting queueing.

We used the measured data to develop statistical traffic models for unconstrained and constrained traffic. These models were further validated with different videoconference contents (low motion and high motion, TKN library). Different statistical methods for fitting the empirical distribution (the method of moments and the C-LVMAX method) were examined.

For fitting the videoconference data, the coding algorithm used should be taken into consideration. There seems to be a relationship between the coding algorithms and the characteristics of the generated traffic. For instance, for H.261, in most cases, the dominant distribution is Gamma, while for H.263 and H.263+, the Loglogistic has a more "stable" performance. Moreover, the Inverse Gamma density could not be calculated for all constrained traces, due to processing limitations. This fact renders the Inverse Gamma density impractical as a generic model for H.263 traffic.

Regarding the unconstrained traces, a careful but simple generalization of the DAR model can simulate the measured videoconference data conservatively and steadily. For the constrained traces, the traffic can be captured by the C-LVMAX method via a random number generator producing frames at a time interval equal to that of the sample. On the other hand, if a moments matching method needed to be applied, the Loglogistic density is a direct solution. Another interesting observation is that the traffic trends remain invariant when a different network constraint is selected, as evident from the TKN traces. Hence, the proposed model for the constrained traffic can be applied without taking into account the specific network constraint.

It is evident that if a generic and simple model were needed for all cases of videoconference traffic, the most dominant choice would be the DAR model based on the fit of the Loglogistic density, with the decay rate assigned from the fit of the sample ACF over the first 100 lags (although a 500-lag fit would lead to a more conservative queueing performance) for the case of unconstrained traffic, and set to zero for the constrained traffic.

Future work includes the integration of the proposed models in dynamic traffic policy schemes in real diffserv IP environments. The study of semiconstrained traffic cases is of particular interest, too, although their "worst-case" fully constrained counterparts cover their traffic trends.