Abstract

We investigate the video assignment problem of a hierarchical Video-on-Demand (VoD) system in heterogeneous environments where different quality levels of videos can be encoded using either replication or layering. In such systems, videos are delivered to clients either through a proxy server or over video broadcast/unicast channels. The objective of our work is to determine the appropriate coding strategy as well as the suitable delivery mechanism for a specific quality level of a video such that the overall system blocking probability is minimized. In order to find a near-optimal solution for such a complex video assignment problem, an evolutionary approach based on a genetic algorithm (GA) is proposed. The results show that the system performance can be significantly enhanced by efficiently coupling the various techniques.

1. Introduction

With the explosive growth of the Internet, the demand for various multimedia applications has increased rapidly in recent years. Among these applications, Video-on-Demand (VoD) plays a very important role. With VoD, customers can choose their desired video at any time they wish via public communication networks. Nevertheless, a VoD system is required to store several hundred videos as well as serve thousands of customers simultaneously. In order to build a cost-effective and scalable system, various designs have been proposed in terms of system architecture [1], bandwidth allocation [2], and transmission schemes [3]. Among these techniques, data broadcasting and proxy caching are two commonly used approaches.

To improve the scalability of a VoD system using data broadcasting, the broadcast capability of a network is exploited such that video contents are distributed over a number of video channels shared among clients. Staggered broadcasting [4] was the simplest way to support broadcast services in the early days. Since then, a number of more efficient broadcasting protocols [5–8] have been proposed. Apart from data broadcasting, hierarchical architectures [3] have also been explored to reduce the resource requirements. To relieve the workload of the central server and reduce the service latency, an intermediate device called a proxy sits between the central server and the clients. In such an architecture, a portion of a video is cached in the proxy. The request generated by a client is served by the proxy if it caches the requested portion of the video. Meanwhile, the central server delivers the remaining portion of the video to the client directly. Existing caching mechanisms can be mainly classified into four categories [9]: sliding-interval caching [10], prefix caching [11], segment caching [12], and rate-split caching [13]. A content distribution network (CDN) is an extension of proxy caching in which a number of CDN servers are deployed at the edge of the network core. Unlike a proxy, which only stores a portion of the video, each CDN server holds a full copy of the video. The clients then request the video from their closest CDN server directly. This architecture significantly reduces the workload of the central server and provides a better quality of service (QoS) to the clients. Nevertheless, most of the previous works focused on providing VoD services in a homogeneous environment. In a practical situation, the clients can connect to the network, say the Internet, with different communication technologies such as modem, ADSL, and wireless link. Their downstream rates vary from 56 kbps to 100 Mbps or even higher.
To meet the different bandwidth requirements of the clients, the videos are encoded into different quality levels by the replication or layering approach. Replication [14] provides multiple versions of the video at different data rates, and one of them is retrieved according to the video quality requested by the client. On the other hand, layering [15, 16] encodes the video into a number of layers, and the client needs to retrieve several video layers concurrently to meet his/her requirements. To accommodate such a coding scheme, Kangasharju et al. [16] considered delivering layered video through a proxy cache and developed a model for the layered video caching problem to determine which videos and which layers should be cached in order to maximize the revenue from the streaming services. The effectiveness of replication and layering for video transmission in a heterogeneous environment has been investigated in [17–19]. Kim and Ammar [17] compared the replication and layering approaches, and the results showed that replication is better. However, they only focused on time-dependent streaming of a single video from the central server to the clients. Later, Hartanto et al. [18] studied the system performance with a proxy cache and compared replication with layering in a hierarchical framework. It was found that layering is more appropriate when a proxy server is used. In [19], the authors extended this work by exploring the proxy cache coupled with video broadcast technology. It was observed that layering can achieve further improvement in such a framework. In addition, it was found that the proxy size, the efficiency of the broadcasting scheme, the bandwidth reserved for broadcasting, and the layering overhead have significant impacts on the system performance. In general, the performance of layering is superior to that of replication. However, according to the results in [19], replication performs better in some situations.
For instance, replication should be used when the proxy size is zero. Thus, in this paper, we not only use both coding schemes to support video streams of different quality but also explore a hierarchical VoD system using proxy caching coupled with video broadcasting to further improve the system performance in a heterogeneous environment. Different from [19], in the proposed framework, the video streams with different quality levels can be encoded by replication or layering. Each of the video streams is then either cached in the proxy server or delivered over the broadcast/unicast channels. The objective of this work is to determine the appropriate coding strategy as well as the efficient transmission mechanism for a specific quality level of a video such that the overall system blocking probability is minimized. In order to find a near-optimal solution for such a complex video assignment problem, an evolutionary approach based on a genetic algorithm (GA) is proposed. GA has been successfully demonstrated as a powerful optimization tool for solving various real-world complex problems [20] and has been deployed in some VoD applications, such as those mentioned in [21, 22]. The main contribution of this paper is that we explore the benefits of complementary coding schemes for a hierarchical VoD system. To determine the appropriate encoding schemes and the efficient transmission strategies, a mathematical model is formally stated to represent this complex video assignment problem. Then, we present an evolutionary approach based on GA to solve the proposed system model.

This paper is organized as follows. The proposed system architecture and the system model are described in Section 2. In Section 3, the problem is formulated and the conditions for minimizing the system blocking probability are discussed. The video assignment strategy using GA, including the chromosome representation and the fitness function for the problem, is explained in Section 4. In Section 5, the experimental results are presented. Finally, some concluding remarks are given in Section 6.

2. System Model

In this section, we describe the system architecture for video streaming services. Before we go into the details, the notations used in this paper are defined and listed in Table 1.

Figure 1 shows a two-tier VoD system which consists of one central server and several proxy servers. The central server, which has a large storage space to store M videos for clients, is connected to the proxy servers that are physically located closer to the clients. The clients can connect to the network with different communication technologies such as modem, ADSL, or wireless link, and their downstream rates vary from 56 kbps to 100 Mbps. To cater for these heterogeneous requirements, video m is encoded into video streams of different quality levels, which are delivered to the clients according to their capacity constraints. Clients with a low-bandwidth connection such as 56 kbps receive the videos encoded at a low bit rate. On the other hand, the high-quality video is streamed to the customers with broadband access. In the proposed architecture, each quality level of video m can be encoded by either the replication or the layering approach. Note that a layered-encoded video incurs around 20% to 30% overhead compared with a replicated video of the same quality level [17, 18, 23] and thus requires more transmission bandwidth. Consequently, the streaming rate of a layered-encoded stream equals that of the replicated stream of the same quality level scaled up by the layering overhead.

It is assumed that the proxy servers are independent and that a large group of heterogeneous clients is served by a single proxy server. The proxy server has a limited storage space of K bits to cache some of the popular videos for users' repeated requests in order to minimize the transmission cost. The content of the proxy is described by a proxy cache map matrix, in which the entry for a given quality level of a video is set to 1 if a copy of that stream is stored in the proxy server and to 0 otherwise. As mentioned, the videos can be layer-encoded or replicated with different quality levels and stored in the proxy server. For layering, the base layer can be decoded independently while the enhancement layers must be decoded cumulatively. That is, layer k must be decoded together with layers 1 to k - 1. To find a feasible cache assignment solution, we define a coding approach instance as the vector e, where e_m indicates the highest quality level of video m that is encoded by the layering approach and can be reconstructed correctly. In addition, to satisfy the storage space constraint in the proxy server, the total size of the cached streams must not exceed K. The storage requirement of video m consists of two terms: the first accounts for its cached layered streams and the second for its cached replicated streams.

Upon receiving a user's request, the proxy server serves the request if the requested item has already been cached. Otherwise, it forwards the request to the higher level. Because the storage capacity of the proxy server is limited, some videos cannot be cached and must eventually be delivered from the central server. Such a system is clearly not scalable, as the bandwidth requirement increases linearly with the arrival rate. Because of the recent deployment of IP multicast delivery [24], the broadcasting capability of such a hierarchical architecture is also exploited to further enhance the system performance. Apart from storing the popular videos in the proxy server, some videos are also broadcast to the clients over the backbone network. Thus, it is assumed that a generic network infrastructure that supports broadcasting operations is used to implement the broadcasting protocols. Since our focus is on the performance of the whole architecture, the broadcasting techniques are not our major concern. In general, any efficient protocol, such as those mentioned in [4–8], can be applied to the system framework. Let H_x be the number of channels required for protocol x to broadcast a video such that the clients are insensitive to the start-up delay. Given the bandwidth reserved for broadcasting, we define a broadcast map matrix to indicate which quality levels of a video should be sent over the broadcast channels. An entry of this matrix is set to 1 if a copy of the corresponding stream is broadcast over the broadcast channels. Otherwise, it is set to 0. The bandwidth required for broadcasting is therefore determined by H_x and the rates of the broadcast streams, and it must not exceed the bandwidth reserved for broadcasting. We can then construct a cache-broadcast map matrix, whose entries indicate whether the corresponding stream is cached in the proxy server or delivered over the broadcast channels. An entry is equal to 0 if the stream is simply transmitted over a unicast channel.

3. Problem Formulation

In this section, the optimization problem of the proposed system is formally defined. It is reported in [25] that the interarrival time of client requests in multimedia streaming applications is exponentially distributed. Thus, the client requests follow a Poisson process with a given overall arrival rate. Each video has a popularity and each quality level has a probability of being requested, and both the popularities and the quality-level probabilities sum to one. As the request arrival processes for different videos with different quality levels are mutually independent, the request rate for a given quality level of a video is the product of the overall arrival rate, the popularity of the video, and the probability of requesting that quality level. It is assumed that the video popularity follows Zipf's distribution [26] with a given skew parameter. Without loss of generality, it is further assumed that the service time of each unicast channel handled by the central server is exponentially distributed with a given mean, which accounts for the varying lengths of different videos.
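The request model above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the numeric values are not taken from the paper, and the Zipf exponent is passed in directly because the exact parameterization of the skew parameter is not recoverable from the text.

```python
def zipf_popularity(num_videos, exponent):
    """Return normalized popularities with p_m proportional to 1 / m**exponent.

    Assumption: the exponent is supplied directly; how it maps to the
    paper's skew parameter is not specified here.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def request_rates(arrival_rate, popularity, quality_probs):
    """Per-stream Poisson rates: overall rate x video popularity x quality probability."""
    return [[arrival_rate * p * q for q in quality_probs] for p in popularity]


# Illustrative values only (not the paper's experimental settings).
popularity = zipf_popularity(num_videos=50, exponent=0.8)
quality_probs = [1.0 / 7] * 7   # seven equally likely quality levels
rates = request_rates(0.5, popularity, quality_probs)
print(sum(map(sum, rates)))     # the per-stream rates add up to the overall arrival rate
```

Because the per-stream processes are independent Poisson processes, their rates simply add, which is why the final sum recovers the overall arrival rate.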

As mentioned in Section 2, some of the requests can be satisfied by the proxy server and the broadcast channels, but the central server must still open dedicated channels to serve the remaining clients because of the small proxy storage capacity and the limited broadcasting bandwidth. Equation (1) gives the rate of requests that go up to the central server for dedicated streams. Since multiple qualities of video streams are delivered at different data rates from the central server to the clients, the average streaming rate of the dedicated channels is given by (2), which is expressed in terms of the complement of the map entries defined in Section 2. The first term of (2) calculates the average bandwidth of the dedicated channels required for the layered-encoded videos, while the second term computes that for the replicated videos.

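To evaluate the performance of the central server, denote B as the available bandwidth between the central and proxy servers. On average, the central server can therefore support N virtual channels concurrently for the clients, where N is determined by B and the average streaming rate of the dedicated channels. The system can thus be modeled as an M/G/N/N queueing system, and the blocking probability of the central server (3) follows from Erlang's loss formula [27], evaluated for N channels and an offered load equal to the rate of requests reaching the central server multiplied by the mean service time. If the bandwidth from the proxy server to the clients is large enough that no requests are blocked there, the overall blocking probability of the system (4) is determined solely by the requests blocked at the central server.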
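As a minimal sketch of the loss model described above, the following Python code evaluates the Erlang B formula with the standard numerically stable recursion. The helper `server_channels` and all numeric values are assumptions introduced for illustration; in particular, taking N as the floor of B divided by the average stream rate is one plausible reading rather than the paper's stated definition.

```python
def erlang_b(offered_load, channels):
    """Blocking probability of an M/G/N/N loss system (Erlang B), via the stable recursion."""
    blocking = 1.0  # with zero channels every request is blocked
    for n in range(1, channels + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def server_channels(bandwidth, avg_stream_rate):
    """Number of concurrent unicast streams the server link can carry (assumed floor)."""
    return int(bandwidth // avg_stream_rate)


# Illustrative numbers only.
n = server_channels(bandwidth=100e6, avg_stream_rate=1.5e6)  # 66 channels
load = 0.02 * 90 * 60  # 0.02 req/s times a 5400 s mean service time, in Erlangs
print(n, erlang_b(load, n))
```

The recursion avoids the large factorials and powers of the closed-form expression, so it remains accurate when N is in the hundreds.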

Considering the coding approach (replication and layering) and the transmission strategy (caching and broadcasting), the optimization problem (OPT) can thus be formally stated as minimizing the overall system blocking probability subject to constraints (5) and (6). Equation (5) expresses the constraint that the total size of the cached videos is less than or equal to the proxy size, and (6) requires that the broadcasting bandwidth is not larger than the bandwidth reserved for broadcasting.

4. Evolution Optimization

In this section, we exploit a GA-based approach to obtain a near-optimal solution for the OPT problem in Section 3. We first briefly review the terminology and operations of GA. Then, the chromosome representation, the population size, and the fitness function for the OPT problem are discussed.

4.1. Genetic Algorithm

Genetic Algorithm (GA) is a population-based generic search method inspired by the survival-of-the-fittest principle [28–30], which is derived from the mechanism of natural evolution, where the stronger individual is likely to be the winner in a competing world. A potential solution to the problem, known as a chromosome, is constructed from a finite number of genes and is represented by a finite-length string over some finite alphabet (e.g., in binary form). A pool of chromosomes forms a population, which is randomly generated at the beginning of the process. In each iteration, GA performs a multidirectional stochastic search through a genetic evolution process by applying a number of genetic operators to the individuals of the current population in order to produce individuals for the next generation. In general, a genetic operator known as crossover is used to combine two or more individuals from the pool to produce new individuals in the next generation. To introduce genetic variation into an individual, a mutation operator is applied to alter the value of each gene (i.e., allele) in the individual randomly with a small probability. Based on the fitness of the individuals in the current population, the individuals with a higher degree of fitness are selected as members of the population in the next generation through the selection process of GA. After a certain number of generations, it is expected that the best chromosome obtained is reasonably close to the optimal solution. Figure 2 shows the general procedure of GA. The detailed working principle and implementation of GA can be found in [28–30]. GA has been successfully demonstrated as a powerful optimization tool for solving various real-world complex problems [20] and has been deployed in some applications, such as those mentioned in [21, 22].
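The general procedure described above can be sketched in Python as follows. This is a generic illustration and not the GAlib configuration used in the experiments: the function names, the tournament selection, the single-point crossover, the elitism step, and all parameter values are assumptions chosen for brevity. The sketch maximizes the fitness, so a minimization objective such as a blocking probability would have to be negated or inverted first.

```python
import random


def tournament(population, fitness, rng):
    """Pick the fitter of two randomly chosen individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def run_ga(fitness, n_bits, pop_size=30, generations=50,
           crossover_rate=0.9, mutation_rate=0.01, seed=0):
    """Evolve binary chromosomes of length n_bits (n_bits >= 2) and return the best one."""
    rng = random.Random(seed)
    # Random initial population of binary strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best individual over
        while len(next_population) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation applied independently to every gene.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


# Toy usage: maximize the number of ones in a 20-bit string.
print(run_ga(sum, n_bits=20))
```

With elitism the best fitness never decreases from one generation to the next, which makes the toy run easy to check.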

4.2. Chromosome Representation

To represent the coding strategy and the caching mechanism of each video stream, three vectors are defined. The first is a binary vector whose entries are set to 1 if the corresponding stream is delivered over the broadcast channels, as mentioned in Section 2. The second vector is derived from the cache-broadcast map defined in Section 2. The third is the binary form of e_i for each video i, written from the most significant bit (MSB) to the least significant bit (LSB). Since the highest value of e_i is bounded by the number of quality levels, the number of bits required to represent e_i is the same for all videos. Therefore, the chromosome can be represented in the form of a binary string as depicted in Figure 3, and the allele space of each gene is {0, 1}. The total number of bits required for the chromosome, G, is the sum of the lengths of the three vectors, and thus the search space includes 2^G possible solutions.

4.3. Population Size

Population size is a critical factor affecting the performance of GA. Basically, a large population size requires a high computational cost while a small population size increases the chance of premature convergence. Rather than choosing the initial population size arbitrarily, Reeves [31] proposed the principle of minimum population sizes for q-ary alphabets to decide an appropriate value. The author suggested a preferable property of an initial population such that “every possible point in the search space should be reachable from the initial population by crossover only.” This property can be satisfied only if there is at least one instance of every allele at each locus in the whole population of chromosomes [31]. Given the population size Z, the length of the chromosome G, and the cardinality q of the gene at each locus, the probability $P^{*}$ that every allele is present at least once at each locus in the initial population can be computed by equation (7), $P^{*} = \left( q!\, S(Z, q) / q^{Z} \right)^{G}$, where $S(Z, q)$ is the Stirling number of the second kind. Equation (7) provides a guideline for choosing a Z that is large enough to ensure a high probability in the initial population. For example, Z = 21 is the minimum population size that achieves the target probability for the chromosome length and cardinality considered.
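As a hedged illustration of the population-size guideline, the Python sketch below computes the probability that every allele appears at least once at every locus of a random initial population, using the standard counting argument: the number of ways Z individuals can cover all q alleles at one locus is q! times the Stirling number of the second kind S(Z, q), and the loci are independent. The function names are hypothetical and the numbers in the usage line are illustrative, not the paper's settings.

```python
from math import comb


def coverage_probability(pop_size, chrom_len, cardinality):
    """Probability that each allele occurs at least once at each locus."""
    # q! * S(Z, q) equals the number of surjections from Z individuals
    # onto q alleles, computed here by inclusion-exclusion.
    surjections = sum((-1) ** j * comb(cardinality, j)
                      * (cardinality - j) ** pop_size
                      for j in range(cardinality + 1))
    per_locus = surjections / cardinality ** pop_size
    return per_locus ** chrom_len


def min_population(target, chrom_len, cardinality):
    """Smallest population size whose coverage probability reaches the target."""
    size = cardinality  # fewer than q individuals can never cover q alleles
    while coverage_probability(size, chrom_len, cardinality) < target:
        size += 1
    return size


# Illustrative call: binary genes, 100-bit chromosome, 99.9% target.
print(min_population(0.999, chrom_len=100, cardinality=2))
```

For binary genes the per-locus term reduces to 1 - (1/2)^(Z-1), which gives a quick sanity check of the general expression.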

4.4. Fitness Function

In GA, the fitness function is used to evaluate the goodness of a chromosome for the problem. The fitness function F of a chromosome is closely related to the value of the objective function (i.e., OPT) produced by this chromosome. Note that a video stream can be either cached in the proxy server or delivered over the broadcast channels if its entry in the cache-broadcast map is set. However, the proxy capacity required for caching and the bandwidth required for broadcasting may then exceed their limits, in which case the constraints in OPT are violated. A penalty scheme is thus applied to those chromosomes violating these constraints. Hence, we transform OPT into an unconstrained form to produce the fitness function, in which a penalty function is added to the objective of OPT. To penalize the low performers more heavily, we square the violation of the constraints [29].
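A penalty-based fitness of the kind described above might look like the following Python sketch. It is an assumption-laden illustration: the function name, the use of absolute (unnormalized) violations, and the `penalty_weight` parameter are hypothetical, and the paper's exact penalty function is not reproduced here. Lower values are better, matching a minimization of the blocking probability.

```python
def penalized_fitness(blocking_prob, cache_used, cache_capacity,
                      broadcast_used, broadcast_capacity, penalty_weight=1.0):
    """Objective value plus squared violations of the two OPT constraints."""
    # A constraint contributes nothing unless its limit is exceeded.
    cache_violation = max(0.0, cache_used - cache_capacity)
    broadcast_violation = max(0.0, broadcast_used - broadcast_capacity)
    penalty = penalty_weight * (cache_violation ** 2 + broadcast_violation ** 2)
    return blocking_prob + penalty


# Feasible chromosome: fitness equals the blocking probability.
print(penalized_fitness(0.1, cache_used=5, cache_capacity=10,
                        broadcast_used=3, broadcast_capacity=4))
# Infeasible chromosome: the cache limit is exceeded by 2 units.
print(penalized_fitness(0.1, cache_used=12, cache_capacity=10,
                        broadcast_used=3, broadcast_capacity=4))
```

Squaring the violation makes the penalty grow faster than the violation itself, so chromosomes far outside the feasible region are ranked well below those that only slightly exceed a limit.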

5. Experimental Results

In our experiment, GAlib [32], which is a set of C++ genetic algorithm objects for optimization, is used to solve the OPT problem. It is assumed that there are 50 videos in the system, each of which is 90 minutes long and is encoded into seven quality levels. The client requests are modeled as a Poisson arrival process, and the video popularity follows Zipf's distribution with a fixed skew parameter. It is assumed that the base layer of all videos has the same streaming rate and that all layers have the same rate as the base layer [33]. As the backbone bandwidth is fixed, a proportion of this bandwidth is reserved for video broadcasting. The results in [8] showed that fewer than 10 broadcasting channels are sufficient to provide delay-insensitive VoD services. Hence, H_x is set to 10 for the following experiments. As reported in [23], the amount of overhead incurred by the layered-encoded videos is varied from 0 to 30%. To model the heterogeneity of network environments, two requesting patterns, namely, “SCENARIO(A)” and “SCENARIO(B)”, are defined in our experiment [19]. “SCENARIO(A)” models the less heterogeneous environment where the system only serves two types of clients (e.g., modem and Ethernet), that is, only two of the quality levels are requested. “SCENARIO(B)” focuses on the highly heterogeneous environment in which all the quality levels of a video are requested uniformly. Table 2 summarizes the parameters used in the experiment.

We first evaluate the impact of the arrival rate on the blocking probability and compare the proposed system with systems that use only layering (S_L) or only replication (S_R) [19]. As expected, Figure 4 shows that the blocking probability increases with the arrival rate under all configurations. The system with both layering and replication (S_MIX) performs better than S_L and S_R in both scenarios. Figure 4(a) shows that S_MIX achieves a significant improvement in the less heterogeneous environment. When the arrival rate is 0.1 req/s, the blocking probability of S_MIX is reduced to 0.018, compared with 0.048 for S_R and 0.143 for S_L. S_MIX still obtains up to about 20% reduction in blocking probability when the arrival rate is increased to 1 req/s. In SCENARIO(B), S_MIX achieves an improvement of up to 8%, as shown in Figure 4(b).

To investigate how the system is improved by the S_MIX approach, we first look at how the coding and cache strategy for different quality levels of videos in S_MIX is organized by GA as compared with that in S_L and S_R. Table 3 depicts the coding scheme and proxy-broadcast map for the different system configurations. In the table, the entry for a specific quality level of a video gives its cache-broadcast assignment together with its coding scheme (R = replication, L = layering). A dash indicates that the corresponding quality level is not required. We only show the configuration of the first 25 videos, as the configuration of the remaining videos is the same as that of the 25th one. In Table 3(a), it can be observed that all quality levels of the videos are encoded by layering in S_L and that only two quality levels are needed if replication is used in S_R. In S_MIX, the quality levels are encoded by the layering approach only if the upper quality levels of the corresponding video are cached in the proxy or delivered over the broadcast channels. On the other hand, replication is used when the video is neither cached nor broadcast. As layering is suitable for caching and replication favors end-to-end transmission, S_MIX takes the benefits of both approaches. Unlike S_R and S_L, in which the cache is allocated on the basis of the videos alone, S_MIX takes both the coding strategy and the bandwidth usage into account. It is found that S_MIX allocates the cache space mostly to the 2nd quality level of layered-encoded videos. Although the 1st quality level of the corresponding videos has to be transmitted over the dedicated channel when the users request the 2nd quality level of the videos, the server bandwidth requirement of S_MIX is still less than that of S_R because part of the video data can be obtained from the proxy server or the broadcast channels directly.
Similar observations can be made in “SCENARIO(B).” Only cached or broadcast videos are layered-encoded and the others use replication, so that more videos can be served by the proxy server or the broadcast channels as compared to S_R and less server bandwidth is required as compared to S_L.

To take a closer look at the effectiveness of S_MIX, Figures 5 and 6 show the blocking probability of the systems when the system parameters are varied. We first investigate the impact of the proxy size. Figure 5 illustrates the system blocking probability as the proxy size is changed. Increasing the proxy size results in fewer video requests to the central server, and thus the blocking probabilities decrease. It can be seen that S_MIX performs better than S_L and S_R in both requesting patterns, especially at a low arrival rate (i.e., 0.3 req/s) and a large proxy size. In Figure 5(a), S_MIX achieves a significant improvement, whereas S_L and S_R only improve linearly as the proxy size is changed. When K is set to 10%, S_MIX obtains up to about 65% reduction in blocking probability. When the arrival rate is increased to 0.8 req/s, the system can still achieve a 50% improvement compared to S_L. When the proxy size is increased, more layered-encoded videos with lower quality levels are assigned to the proxy server in S_MIX. Thus, more of the less popular videos can also be served by the proxy server directly. A similar trend can be observed in “SCENARIO(B)”, which is shown in Figure 5(b). The results show that the blocking probability of S_MIX can be up to 10% lower than that of S_L.

Figure 6 shows the blocking probability when the proportion of bandwidth reserved for broadcasting is changed. It can be seen that the system performance of S_MIX is greatly improved compared to S_L and S_R when this proportion is increased, especially in the less heterogeneous network environment. Although the system blocking probability can be further reduced when the proportion is increased, the system then suffers from the problem that the remaining bandwidth is not sufficient for the less popular videos.

The blocking probability is plotted against the skew parameter in Figure 7. As expected, the blocking probability increases with the skew parameter. The performance of S_MIX is superior to that of S_L and S_R even if the popularity of all quality levels of all the videos is uniformly distributed. In “SCENARIO(A)”, the blocking probability of S_MIX is reduced to 0.5, compared with 0.696 for S_R and 0.686 for S_L. For a high arrival rate, S_MIX can still achieve up to about 18% reduction in the blocking probability.

6. Conclusion

In this paper, we investigate a feasible enhancement to a hierarchical VoD system using proxy caching coupled with video broadcasting and appropriate coding schemes in a heterogeneous environment. In the proposed framework, different quality levels of a video can be encoded by either the replication or the layering approach. Each of them is then either cached in the proxy server or delivered over video broadcast or unicast channels. The objective of this work is to determine the appropriate coding strategy as well as the suitable delivery mechanism for a specific quality level of a video such that the overall system blocking probability is minimized. To solve this complex video assignment problem, an evolutionary approach based on a genetic algorithm (GA) is used to find a near-optimal solution. The results show that the system performance can be significantly enhanced by efficiently coupling the various techniques. In this paper, we focus on videos coded with MPEG-2 with different coding layers. Recently, the scalable video coding (SVC) extension of the H.264/AVC standard [34], which provides network-friendly scalability at the bit stream level, has been proposed. We plan to investigate the performance of the system with this coding technique in our framework in the future.