Abstract

Video service has become a killer application for mobile terminals. For providing such services, most of the traffic is carried by the Dynamic Adaptive Streaming over HTTP (DASH) technique. The key to improve video quality perceived by users, i.e., Quality of Experience (QoE), is to effectively characterize it by using measured data. There have been many literatures that studied this issue. Some existing solutions use probe mechanism at client/server, which, however, are not applicable to network operator. Some other solutions, which aimed to predict QoE by deep packet parsing, cannot work properly as more and more video traffic is encrypted. In this paper, we propose a fog-assisted real-time QoE prediction scheme, which can predict the QoE of DASH-supported video streaming using fog nodes. Neither client/server participations nor deep packet parsing at network equipment is needed, which makes this scheme easy to deploy. Experimental results show that this scheme can accurately detect QoE with high accuracy even when the video traffic is encrypted.

1. Introduction

Video service has become a killer application for mobile terminals, and most video traffic is carried by the Dynamic Adaptive Streaming over HTTP (DASH) technique. Mobile video traffic, which accounted for 60% of the total mobile traffic in 2016, is expected to rise to 78% by 2021 [1]. This significant growth is accompanied by the wide adoption of DASH standards [2, 3]. DASH has enabled large-scale video distribution owing to its reuse of the existing HTTP infrastructure and its ability to penetrate firewalls. It allows viewers to avoid play-out interruptions under varying network conditions by adaptively changing the video bit rate.

In order to provide better Quality of Experience (QoE) for video users, network operators have to understand and monitor the video quality perceived by users, which has become a hot topic in recent years [4]. Some existing solutions use measuring mechanisms at the client/server side to probe the QoE [5, 6]. However, these solutions are infeasible for network providers because they cannot access the measurement results at the client/server side. Other solutions investigated how to measure the QoE inside the network. These network-based solutions rely on deep packet inspection (DPI) or deep packet parsing (DPP) to evaluate the QoE [7–9]. However, more and more video services are being encrypted in order to protect user privacy [10], which means these solutions will soon stop working well.

Mobile Fog Computing (MFC), which brings computing capabilities close to mobile users, provides a potential way to probe users' QoE at the network edge without the weaknesses of the above approaches [11]. In this paper, we propose a Fog-assisted Real-time QoE Prediction (FRQP) approach that enables the network provider to predict users' QoE with only slightly increased computing power. Specifically, we deploy a probe mechanism at fog nodes to observe the bidirectional video traffic, which enables us to infer users' QoE from the temporal features of that traffic. FRQP is based on network-measured traffic, which means that it works without client/server participation. FRQP also does not need deep packet parsing since it only observes packet header information.

Our contributions in this paper are as follows: we design an MFC-assisted architecture that uses fog computing capability to predict QoE from two-way traffic; we are the first to divide the normal playing duration of a video into two subphases so as to effectively predict user QoE; we introduce the concept of request distance to characterize the density of request packets so as to avoid false detection of rebuffering events; and we conduct experiments whose results validate the effectiveness of the proposed approach.

The remainder of this paper is organized as follows. Section 2 presents the scenario under study and how DASH works. Section 3 presents the challenge in QoE prediction. Section 4 describes our QoE prediction method. Section 5 presents our experiments and evaluation results. Section 6 introduces related work, and conclusions and future work are given in Section 7.

2. Application Scenario and Working Mechanism of DASH

2.1. Application Scenario

In this paper, we propose a fog-assisted real-time QoE prediction approach that enables the network provider to predict users' QoE with only slightly increased computing power. Its application scenario is shown in Figure 1. In the figure, the mobile user device hosts a DASH client and connects to an Access Node (AN), which can be a wireless access point such as Wi-Fi or LTE. The AN, acting as a fog node, connects to the DASH server via a backhaul network, e.g., the Internet. Both the fog node and the backhaul network are managed by network operators.

We deploy a probe mechanism at fog nodes to observe the bidirectional video traffic, which enables us to infer users' QoE from the temporal features of that traffic. Specifically, the probe periodically collects the bidirectional video traffic and predicts users' QoE from it. The details of the prediction algorithm are described in Section 4. The fog node can report the predicted QoE results to the cloud, which can allocate network resources accordingly in order to improve user QoE. In this paper, we do not address how the fog node communicates with the cloud but focus on how to predict user QoE.

The benefit of deploying the probe mechanism at the fog node is as follows. The probe mechanism needs accurate observation of the temporal features of the bidirectional video traffic to infer user QoE. The closer the probe mechanism is to the client, the less chance cross traffic has to interfere with the video traffic.

2.2. Working Mechanism of DASH

In order to better understand the idea behind our proposed approach, we will first describe the working mechanism of DASH in this subsection.

Figure 2 shows an abstracted model of a DASH delivery system. The DASH server encodes the video file into multiple versions with different qualities and slices each version into video chunks with the same playtime. The DASH server uses HTTP to provide video services. When a client presses the start button, it sends the server an HTTP GET message to fetch the corresponding video chunks. The fetched chunks are kept in a local buffer at the client. When the buffer has received enough chunks, the local player starts playback by continuously withdrawing chunks from the buffer.

After initialization, the video client enters a steady state with normal video playing, which is usually divided into two states [12]: ON and OFF, as shown in the lower part of Figure 3. As the client fetches chunks from the server continuously, the number of chunks in the buffer, which we call the buffer size, also increases. Once the buffer size reaches the maximum threshold, the client stops downloading; this state is called the OFF state. When the buffer size drops to the minimum threshold, because the player continuously withdraws chunks from the local buffer, the client begins to send request messages to the server to resume downloading. The state in which the client fetches data from the server continuously is the ON state. From Figure 3 it can be seen that, corresponding to these working states, the traffic between the client and server exhibits a regular ON-OFF pattern. Accordingly, the buffer size at the client oscillates regularly (see the top part of Figure 3).
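The threshold-driven ON-OFF cycle described above can be sketched as a minimal simulation. All parameter values (thresholds, download and playback rates) here are illustrative assumptions, not values from the paper:

```python
# Minimal sketch of the DASH client's threshold-driven ON/OFF buffer cycle.
# All parameters (thresholds, rates) are illustrative, not from the paper.

def simulate_buffer(steps, download_rate=2.0, play_rate=1.0,
                    min_thresh=10, max_thresh=30):
    """Return the buffer-size trace (in chunks) and the ON/OFF state trace."""
    buffer_size, state = min_thresh, "ON"
    sizes, states = [], []
    for _ in range(steps):
        if state == "ON":
            buffer_size += download_rate               # client fetches chunks
        buffer_size = max(buffer_size - play_rate, 0)  # player drains chunks
        if buffer_size >= max_thresh:
            state = "OFF"          # stop downloading at the maximum threshold
        elif buffer_size <= min_thresh:
            state = "ON"           # resume downloading at the minimum threshold
        sizes.append(buffer_size)
        states.append(state)
    return sizes, states

sizes, states = simulate_buffer(100)
```

With these toy parameters the buffer oscillates between the two thresholds, mirroring the regular oscillation in the top part of Figure 3.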

In a video viewing session, the video playing is interrupted when the buffer is drained out. If this happens, the playback freezes and the client reenters initialization to accumulate enough video chunks locally again. This interruption is referred to as a rebuffering or frozen event. Rebuffering events are an important factor affecting end-user perceived QoE [13, 14].

3. Design Challenge

Because of the importance of rebuffering events in video streaming services, in this paper we use the rebuffering event as the key metric for characterizing user QoE. This work aims to use a probe mechanism at the fog node to predict whether a rebuffering event occurs at the client. According to the DASH working mechanism introduced above, it seems that we can easily predict a user's QoE by monitoring the ON-OFF pattern of the traffic between the user and the DASH server. For example, in Figure 3, the video traffic presents an apparent ON-OFF pattern. In the ON state, a client continuously issues HTTP request messages to the video server and the server continuously sends data chunks to the client. In the OFF state, there is no traffic between the client and server. From these observations, we might reasonably assume that the video plays smoothly whenever the probe mechanism observes ON-OFF traffic between client and server.

Unfortunately, in reality, inferring the occurrence of a rebuffering event is not easy. Next, we set up experiments in a controlled lab environment to illustrate the challenge.

The experiment settings are as follows. We ran a DASH client on a terminal whose interface had sufficient bandwidth to play video smoothly; we then limited the bandwidth of the terminal interface during a certain time period to produce rebuffering events. In the experiments, we captured the packets transferred between the client and the DASH server as well as the buffer size at the client. Specifically, we split time into fixed sampling intervals, short enough that at most one request falls into a sampling interval, and counted the amount of bytes fetched from the server in each sampling interval as the download volume.
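The binning of captured packets into sampling intervals can be sketched as follows. The packet-record layout (timestamp, size, direction) is an assumed simplification of what a capture tool actually provides:

```python
# Sketch: bin captured packet records into fixed sampling intervals and
# count the bytes fetched from the server in each interval ("download volume").
# Packet records are (timestamp_seconds, size_bytes, direction) tuples;
# this layout is illustrative, not a real capture-tool format.

def download_volumes(packets, interval, duration):
    """packets: iterable of (ts, size, direction); 'down' = server -> client."""
    n_bins = int(duration / interval)
    volumes = [0] * n_bins
    for ts, size, direction in packets:
        if direction != "down":
            continue                        # count only server-to-client bytes
        i = int(ts / interval)              # index of the sampling interval
        if 0 <= i < n_bins:
            volumes[i] += size
    return volumes

packets = [(0.1, 1500, "down"), (0.3, 1500, "down"),
           (0.7, 40, "up"), (1.2, 1500, "down")]
vols = download_volumes(packets, interval=0.5, duration=2.0)
# → [3000, 0, 1500, 0]
```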

Figure 4(a) shows the variation of the bandwidth observed at the user terminal interface during video playback. The horizontal axis is time, and the vertical axis is the bandwidth. We limit the bandwidth to 20 Mbps in duration 2 (i.e., from about 120 to 184 seconds). Figure 4(b) shows the buffer-size variation over time. In this figure, according to the buffer size, we divide the playback into three phases: Steady State (SS), Closing Frozen (CF), and frozen. In the SS phase, the buffer size is between the minimum and maximum thresholds. In the CF phase, the buffer size is smaller than the minimum threshold but larger than zero, while it is zero in the frozen phase. Figure 4(c) shows the download-volume variation over time. We also use circle points to mark the time instants at which the client issues request packets; to ease visualization, we offset them vertically as shown in Figure 4(d). It is worth noting that it takes time for the client to respond to bandwidth variation, so there is a delay between the variation of the bandwidth and those of the buffer size and bidirectional traffic.

There is an apparent ON-OFF traffic pattern during smooth video playing. In Figure 4, it can be seen that the player works smoothly in durations 1 and 3, since bandwidth is sufficient (see Figure 4(a)). Accordingly, the buffer size oscillates regularly. Moreover, the download and request traffic also show an apparent ON-OFF pattern, as shown in Figures 4(c) and 4(d), respectively, which is consistent with what we discussed in Section 2.

However, an apparent ON-OFF traffic pattern also exists when a rebuffering event occurs. In time duration 2 (see Figure 4(a)), the client encounters frozen events because the bandwidth has been limited to 20 Mbps, and the buffer is drained out (see Figure 4(b)); this is why we call this duration the frozen phase. From Figures 4(c) and 4(d), it can be seen that the traffic in both directions is still ON-OFF in the frozen phase. Consequently, we cannot simply use the ON-OFF traffic pattern to determine whether a rebuffering event occurs, because an apparent ON-OFF pattern appears in both the SS and rebuffering cases.

One simple way to tackle this problem is to use the request density to infer the possible occurrence of frozen events, as described in [15]. For example, the requests during the frozen phase in Figure 4(d) are denser than those in the SS phase. However, this method causes false detection of rebuffering events in the following situation: in the duration labelled CF in Figure 4(b), the video plays smoothly, yet the request sequence is dense (see Figure 4(d)). In this case, the method falsely reports the occurrence of a rebuffering event.

To address the above challenge, we propose a method that identifies rebuffering events by combining features of the two-way traffic in an online video viewing session. The details are discussed in the next section.

4. Prediction Method

In this section, we first redefine the client working phases and establish the relationship between the working phases at a user and the observed traffic pattern. Then we present a detailed method to characterize the traffic pattern. Finally, we describe how the proposed scheme works.

4.1. Redefinition of Client Working State

According to the DASH working mechanism, we divide the client operation into three phases, each corresponding to a different level of buffer state. These phases and their definitions are shown in the first column of Table 1. In this table, we also give the traffic pattern and the corresponding QoE observed from the experiment results in Figure 4. For example, during the CF phase, we can see the following: first, the download traffic is in the OFF pattern (see Figure 4(c)); second, the client issues requests densely (see Figure 4(d)); and, finally, the video plays smoothly because the buffer size is above zero (see Figure 4(b)). Thus, the corresponding entries for the CF phase are "OFF", "dense", and "no rebuffering", respectively.

Existing work classifies the operation of a client into only two states according to the buffer size: buffer size above zero and buffer size equal to zero. However, to predict the occurrence of rebuffering events, we find it crucial to divide the client operation when the buffer size is above zero into two substates, i.e., SS and CF (see Table 1), for the following reasons: the traffic pattern of CF differs from those of both SS and frozen, and CF signals imminent freezing even though the client is not yet in the frozen state.

The results in Table 1 motivate us to predict QoE by observing the pattern of the two-way traffic between client and video server during playback. However, the descriptive terms for traffic patterns used in Table 1 are still qualitative, which is insufficient for an actual implementation of QoE prediction. During the playback of online videos, what we can observe is the time sequence of the two-way traffic between client and server. Thus, we need to establish the relationship between the time sequences of two-way traffic and the corresponding traffic patterns, which we explore using machine-learning classification in the following.

4.2. Metrics of Traffic Pattern
4.2.1. Download Throughput

We need to quantify the download traffic; we use "download throughput" to describe the speed of downloading. Assume an in-network observer can count the traffic volumes in both directions at discrete time instants. Specifically, t_1, t_2, ... are the sampling time instants, spaced a sampling interval Δ apart. Denote the current time instant, i.e., the time instant at which we predict QoE, as t_c. At time instant t_i, we define d_i as the amount of bytes transferred from the server to the client in the time duration [t_{i-1}, t_i] and define the vector D = [d_{c-W+1}, ..., d_c], where W is the length of the observation time duration.

Denote the download throughput during interval [t_{i-1}, t_i] of length Δ as r_i, which is calculated as r_i = d_i / Δ. The value of r_i may be zero since it is possible that no video content is downloaded during the interval [t_{i-1}, t_i].

We use a moving average to smooth the download throughput. For the download throughput r_i at time t_i, the corresponding moving-average download throughput is calculated as r̄_i = α · r̄_{i-1} + (1 − α) · r_i, where α is a parameter of the moving average and is fixed to 0.98 in this paper. Thus, for the time duration [t_{c-W+1}, t_c], we obtain the vectors R = [r_{c-W+1}, ..., r_c] and R̄ = [r̄_{c-W+1}, ..., r̄_c].
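A minimal sketch of the throughput and moving-average computations, assuming per-interval download volumes d_i, an interval length Δ, and the smoothing parameter α = 0.98 (the seeding of the moving average with the first sample is an assumed implementation choice):

```python
# Sketch of the download-throughput metric and its moving average:
# r_i = d_i / delta and rbar_i = alpha * rbar_{i-1} + (1 - alpha) * r_i.

def throughput_vectors(volumes, delta, alpha=0.98):
    """volumes: bytes downloaded per sampling interval of length delta (s)."""
    rates = [d / delta for d in volumes]      # instantaneous throughput r_i
    smoothed, prev = [], rates[0]             # seed the average with r_1
    for r in rates:
        prev = alpha * prev + (1 - alpha) * r # exponential moving average
        smoothed.append(prev)
    return rates, smoothed

rates, smoothed = throughput_vectors([5000, 0, 10000, 0], delta=0.5)
```

Because α is close to 1, the smoothed vector reacts slowly to a single empty interval, which is what makes it a stable feature during the OFF periods of normal playback.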

4.2.2. Request Distance

A simple metric to describe the request density is the request interval. For example, in Figure 5, assume we need to predict QoE at the time instant t_c, also known as the current time instant, and s_1, s_2, ... are the time instants at which the previous request packets were sent within the observation time duration of length W. The request interval is the time difference between two adjacent request-sending instants.

However, we find that different requests may contribute differently to predicting the QoE at a specific time instant. For example, as shown in Figure 5, given the request-instant history, we find that the time elapsed between the most recent request instant and the current time instant t_c contributes more to predicting QoE. We will show this with the following experiment results.

Instead of the request interval, we introduce a new metric, which we call the request distance. The concept of request distance is also illustrated in Figure 5. Specifically, the request distance δ_i at sampling instant t_i refers to the time difference between t_i and the time instant at which the most recent previous request was sent, i.e., δ_i = t_i − max{s_j : s_j ≤ t_i}. Note that it is possible that no request arrives at a given sampling instant, such as the current time instant; in that case the distance simply keeps growing.

Figure 6 shows the CDF curves of the request distance and the request interval for the SS and frozen phases. The horizontal axis is the discrete request distance, defined as the request distance divided by the sample interval. The vertical axis is the CDF. The two thin lines are the CDFs of the request interval, and the two thick lines are the CDFs of the request distance. In Figure 6, the gap between the SS curve and the frozen curve for the request distance is apparent, which means that we can discern the SS state from the frozen state by observing the request distance. By contrast, according to the two thin lines, we cannot discern the SS state from the frozen state by observing the request interval, because those two curves are very close to each other. Thus, the request distance is more effective than the request interval in figuring out whether a rebuffering event occurs. In the next section we will further support this conclusion with experiments.
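The request-distance definition above can be sketched as a minimal implementation; the timestamps in the usage example are illustrative:

```python
# Sketch of the request-distance metric: at each sampling instant t_i, the
# distance is the time elapsed since the most recent request sent at or
# before t_i (whereas a request interval is the gap between two requests).

def request_distances(sample_times, request_times):
    """Both arguments are sorted lists of timestamps (seconds)."""
    distances, j = [], -1
    for t in sample_times:
        while j + 1 < len(request_times) and request_times[j + 1] <= t:
            j += 1                              # advance to the last request <= t
        distances.append(t - request_times[j] if j >= 0 else None)
    return distances

# Requests at 0.0 and 2.0 s, sampled every 0.5 s:
d = request_distances([0.0, 0.5, 1.0, 1.5, 2.0, 2.5], [0.0, 2.0])
# → [0.0, 0.5, 1.0, 1.5, 0.0, 0.5]
```

Unlike the request interval, which is only defined at request instants, this quantity is available at every sampling instant and grows monotonically between requests, which is what makes its CDF separate the SS and frozen phases.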

4.3. Features Extraction and Selection

Table 2 summarizes the features extracted from the traffic patterns. We use information gain to evaluate the importance of each feature and select the most important features as the classifier inputs. For this purpose, we take the following steps.

Firstly, we construct the training set. To ease the presentation, we denote its items as pairs (x_i, y_i), in which x_i is the i-th feature vector and y_i is the class label of x_i, i.e., one of the QoE labels in Table 1. In order to obtain the QoE label for each feature vector, we select some users to work as trainers who report their buffer-size information to the probe mechanism. This information is then translated into QoE labels.

Secondly, we calculate the information gain to evaluate the importance of a feature. The information gain for a feature F is defined as IG(F) = H(Y) − H(Y|F), where the entropy is H(Y) = −Σ_y p(y) log2 p(y), and p(y) denotes, over the whole training set, the probability that the class label y equals "1" or "0", which represent the video being frozen or not, respectively.

The conditional entropy is H(Y|F) = Σ_f p(f) H(Y|F=f), where f is a specific value of F and p(f) denotes the probability that F = f. And H(Y|F=f) = −Σ_y p(y|f) log2 p(y|f), where p(y|f) denotes the probability of the class label y (y = "1" or "0") given that F = f.

In this way, we obtain the information gain of each feature in Table 2.
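The information-gain computation can be sketched as follows for discrete feature values and binary labels (1 = frozen, 0 = not frozen); the discretization of continuous features into bins is assumed and not specified here:

```python
# Sketch of the information-gain computation used for feature ranking:
# IG(F) = H(Y) - H(Y|F), with binary class labels.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(Y) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG of one discrete feature with respect to the class labels."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        cond += (len(subset) / n) * entropy(subset)  # sum_f p(f) H(Y|F=f)
    return entropy(labels) - cond

# A perfectly predictive feature yields IG equal to H(Y):
ig = information_gain(["a", "a", "b", "b"], [1, 1, 0, 0])
# → 1.0
```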

4.4. QoE Prediction Algorithm

Based on the features extracted and selected from the bidirectional traffic, we propose a Fog-assisted Real-time QoE Prediction (FRQP) approach that works as the probe mechanism implemented at the fog node, as shown in Figure 1. FRQP has two working states: the training state and the predicting state.

In the training state (offline phase), some selected users use special devices, each of which periodically reports its buffer size to FRQP; FRQP translates these reports into QoE labels. FRQP then trains a classifier using the QoE labels derived from the reported buffer sizes and the features extracted from the observed bidirectional video traffic.

In the predicting state (online phase), FRQP predicts a specific user's QoE by feeding the features extracted from this user's bidirectional video traffic into the trained classifier.

We select a subset of features from Table 2 to reduce the complexity of the training state. Denote the number of features fed into the classifier as K, which is less than or equal to the total number of features. We sort the features in decreasing order of information gain and select the top K features as input to the classifier. The value of K will be tuned experimentally.
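The top-K selection step can be sketched as follows; the feature names and gain values here are purely illustrative:

```python
# Sketch of top-K feature selection: rank features by information gain
# and keep the K best as classifier inputs. Names/values are illustrative.

def select_top_k(info_gains, k):
    """info_gains: dict of feature name -> information gain; returns top-k names."""
    ranked = sorted(info_gains, key=info_gains.get, reverse=True)
    return ranked[:k]

gains = {"throughput": 0.62, "smoothed_throughput": 0.58,
         "request_distance_1": 0.41, "request_distance_2": 0.33,
         "request_interval": 0.12}
top3 = select_top_k(gains, 3)
# → ['throughput', 'smoothed_throughput', 'request_distance_1']
```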

5. Experiments and Evaluation

5.1. Experiment Setup

In order to evaluate the performance of FRQP, we set up a testbed in a controlled lab environment. Figure 7 shows the experiment setup. In Figure 7, an HP server running CentOS Linux 7 and Apache HTTP Server acts as the DASH server. A mobile computer running Windows 7 is used as the client. The mobile computer also acts as the capture device, on which a capture tool is installed to monitor traffic. FRQP runs as an application installed on the mobile computer as well. Another PC hosts a bandwidth limitation module (BLM) to limit the bandwidth between the video server and the client. The BLM is implemented with the Iperf software, which sends background traffic at a rate of up to 80 Mbps between the server and the client, so that the remaining bandwidth ranges between 20 Mbps and 100 Mbps.

At the server side, a video clip, Big Buck Bunny, is hosted and available for retrieval by the client. This video file, lasting about 10 minutes, has twenty different representations, with encoding bitrates ranging from 50 Kbps to 500 Kbps. Each representation is sliced into chunks of 6 seconds. At the client side, a Google Chrome browser runs, which records the status of the playback, such as the requested bitrate and the buffer filling level. The buffer-size information is translated into QoE labels for training and evaluation.

5.2. Feature Evaluation

In order to select appropriate features for classification, we calculate the information gains of all the features listed in Table 2. Table 3 lists the eleven features with the highest information gain. We gain some insights from the result. First, the download-throughput features are dominant. Second, the request-distance features are also prominent. Based on this ranking, we select the features with the top information gains for classification. The remaining decision is how many of the features listed in Table 3 to select; we make this decision through the experiments described in the next subsection.

5.3. Classification Results

We study the impact of the number of features on the classification result. The results of the decision-tree classifier are shown in Figure 8. It can be seen that both precision and recall are sufficiently high when the top eight features (see Table 3) are used as input, and the performance improvement is insignificant when more features are used. We therefore use the top eight features in the following tests.

Using the selected features, we evaluate the performance and accuracy of different classification algorithms implemented with the machine-learning tool scikit-learn. Specifically, we compare five machine-learning algorithms: binary decision trees, random forest, support vector machine, naive Bayes, and classification based on linear regression. The results are based on 5-fold cross-validation. In general, we observe that random forest and decision trees perform better than the other three, with satisfactory classification rates and lower CPU-time consumption. Considering accuracy, simplicity, explainability, and execution speed, we finally adopt the decision tree as our classifier.

We also study the impact of the depth of the decision tree on the classification performance. The result is depicted in Figure 9, which shows that a depth of 11 is sufficient.

The prediction result with 8 features and tree depth 11 is shown in Table 4. The results show that rebuffering events can be recognized with a precision of 98%, while 3% of rebuffering events are missed and just 1% of no-rebuffering events are identified wrongly. For comparison, the results using the request interval as the feature are also shown in Table 4, where the number of request intervals is the same as that of request distances. It can be seen that, using the request interval as the metric, there would be 21% false detections of rebuffering events, although the detection of no-rebuffering events remains accurate.

6. Related Work

In the literature, there have been many approaches for estimating or measuring the QoE of online video streaming services. According to how the data collection works and where the approaches are implemented, existing work can be classified into three categories: approaches assessing QoE at the client/server side, approaches assessing QoE in the network, and hybrid approaches.

The first type of approach estimates QoE based on measurement tools that run at the client/server to collect QoE statistics [16]. However, the information collected by these tools is not accessible to network providers, which makes it difficult for them to guide network resource allocation according to up-to-date user QoE.

The second type of approach can be further divided into two subcategories. The first subcategory relies on deep packet inspection [8] or logs obtained from a network node [15, 17, 18] to infer the QoE. In [8], the manifest files are parsed to obtain information for traffic prediction. Reference [19] obtains the video information using packet traces. These existing works assess QoE based on abundant information about video packets, e.g., the complete manifest file and the timestamps of the requests (e.g., HTTP requests, redirected HTTP requests, and HTTP responses) for each video chunk; the cost is the effort required to collect and extract such information. A comprehensive overview of this topic was presented in [4, 20]. Meanwhile, as more and more video traffic is encrypted, the ability of operators to assess user QoE via this type of approach is impaired. In the second subcategory, QoE is estimated by measuring network-related QoS parameters such as throughput, loss rate, and delay, and building a model that maps these parameters to user QoE [21]. Most of these works leverage machine learning (ML) techniques to estimate QoE [21–25]. The study in [26] used network performance metrics, such as delay and packet losses, while [24] mapped application QoS (such as video bitrate and frame rate) to QoE. The authors in [13] proposed a model to predict user engagement, in terms of viewing time and number of visits, using video-application QoS as input; this QoS includes average bitrate, join time, buffering ratio, and rate of buffering, which are fed back from the client software at video viewers. An overview of QoE prediction based on QoS using machine-learning techniques, with more in-depth discussion, was presented in [27, 28].

The third category combines measurements at both the client/server and the network to estimate the QoE. The client/server reports certain video playback settings to the network, and the network accordingly infers the QoE from this information [7, 29]. However, this type of approach needs to modify video delivery protocols, which makes the measurement hard to deploy.

Unlike the above work, our method is simpler and more practical since it can predict QoE from network traffic even when the traffic is encrypted. Moreover, all the above approaches work offline, whereas in practice network providers need to detect QoE in real time and then allocate network resources to provide better services. Our proposed method meets this demand.

The authors in [10] measured QoE from encrypted traffic. The main difference between our work and that in [10] is that we use less information, i.e., just the bidirectional traffic volumes, to make the prediction.

Recently, several studies on integrating edge computing into multimedia applications have appeared. The work in [30] highlights the potential of using edge computing in multimedia services, interactive media applications, and video streaming. In [31], a Mobile Edge Computing (MEC) server is used as a controlling component to implement the video caching strategy and to flexibly adjust video bitrates. The work in [32] designs and implements a video streaming service exploiting MEC functionalities. The study in [33] proposes an architecture for adaptive HTTP video streaming tailored to an MEC environment, in which the adaptation algorithm runs as an MEC service aiming to relax network congestion while improving the quality of user experience. In [34], the authors discussed migrating network services from the cloud to fog nodes for video distribution with QoE support. These works present a generic discussion of system architecture, caching, adaptive bitrate, and service migration for edge technologies; however, none of them addresses video quality assessment.

7. Conclusions and Future Work

In this paper, we presented a novel method to predict rebuffering events in real time at the network edge, i.e., at fog nodes. Our solution is based on monitoring network traffic, which means that it works without client or server participation. In addition, it does not need deep packet parsing. Experimental results show that our solution can detect rebuffering events with about 98% accuracy.

In the future, we will explore how to extend our solution to multihop network environments and how to accelerate the classification computation. We will also study the communication between the fog node and the cloud.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by NSFC (61271199) and the Fundamental Research Funds in Beijing Jiaotong University (2011JBZ003).