Abstract

This paper proposes a service-aware cross-layer approach between the application/transport layers on the mobile terminal and the link layer on the wireless base station to enable dynamic control of the level of per-packet error protection for multimedia data streams. Specifically, in the context of cellular networks, the proposed scheme enables the mobile terminal to specify to the base station the desired level of Hybrid ARQ (HARQ) protection by using an in-band control feedback channel. Such protection is dynamically adapted on a per-packet basis and depends on the perceptual importance of different packets as well as on the reception history of the flow. Experimental results demonstrate the potential benefits of the proposed strategy both for real-time audio and video streams and for TCP-based data transfers.

1. Introduction

Nowadays, IP networks and the Internet in particular are used as transport facilities for a whole plethora of novel applications that go far beyond the data transfer for which IP was originally designed. Those applications introduce specific requirements in terms of delivery performance of the underlying transport infrastructure. Indeed, as services are evolving to a “triple play” vision, implying delivery of data, voice, and video to the end user using the same IP transport facility, strong emphasis is put on providing a satisfactory user experience and as a consequence on identifying techniques able to control packet losses and delay.

In addition, more than 30% of the current Internet users are mobile, that is, use wireless networks to access the Internet and its services. The usage of a wireless access technology increases the complexity in the management of delivery of data and multimedia flows, due to the time-varying performance of the wireless medium, handover management, and so forth.

Although no solution for end-to-end quality of service (QoS) assurance over heterogeneous networks is available, several approaches exist for improving data transfer performance on the wireless access trunk [1]. However, most of the available solutions are flow-based (i.e., intended to differentiate services based on flows) and furthermore require relevant modifications to the protocol stacks on the mobile node and the wireless base stations, which reduces the possibility of deploying such schemes.

In the specific framework of multimedia (e.g., voice and video), several works are available based on the Unequal Error Protection (UEP) paradigm [2–6]. The goal of UEP is to provide higher protection to the most perceptually relevant data, where protection can be achieved by means of adaptive power levels, forward error correction codes, retransmission control, and so forth. Nevertheless, UEP is usually performed or managed at the source, and thus without specific knowledge of the actual operating scenario. Such solutions, while increasing the complexity of multimedia codecs, can therefore lead to nonoptimal performance, either by wasting available capacity when network/channel conditions are good (and no packet drops are experienced) or because of the time-varying performance of the transport infrastructure (particularly true in the case of wireless networks).

The proposed scheme represents a novel paradigm of dynamic and “link-level” UEP, focused on the access network and the actual “reception history” at the receiver. The scenario is “triple-play” service delivery over 3G cellular networks, with specific focus on the wireless link between the Base Station and the User Terminal. The core idea is to adaptively tune the level of Hybrid ARQ (HARQ) protection based on the relative importance, for the overall user experience, of the packet being transmitted by the base station. Introducing such a metric makes it possible to differentiate protection on the basis of the actual content of the packet; for voice and video flows, the impact of losing the current packet (and the corresponding required level of protection) is estimated in terms of the potential decrease in audio or visual quality as measurable by Mean Opinion Score (MOS) or Peak Signal-to-Noise Ratio (PSNR), respectively. The authors acknowledge that other automatic means for finer and more accurate evaluation of multimedia quality are available in the literature (e.g., E-model for audio, V-model for video). However, the method is presented using PSNR for the sake of simplicity (as the focus is on the overall approach and not on the specific building blocks), while more sophisticated algorithms based on finer models or Rate-Distortion characteristics can be introduced thanks to the modular architecture of the proposed solution.

An important aspect to be underlined is that the above concept is applicable also in the case of data transfer. In this case, assuming data flows are transported by TCP (which is true for more than 80% of the Internet traffic), packet losses can have a different impact on the overall performance (in terms of time required to complete the delivery) due to the corresponding modifications of the congestion window evolution.

The proposed scheme (Service-Aware Retransmission Control, SARC) is inspired, first, by basic ideas of adjusting ARQ/HARQ over lossy links for multimedia traffic [7–9], second, by multiple studies adapting TCP data transfer performance to the wireless environment [10], and, finally, by cross-layer optimization in wireless and mobile networks [11, 12]. However, one of the main design features that differentiates the proposed approach is the availability of a feedback channel between the MN, where the decision on ARQ tuning is taken, and the BS, where it is actually implemented.

The structure of the paper is as follows: Section 2 describes the proposed framework in detail, while performance evaluation is presented in Section 3. Finally, Section 4 concludes the paper with final remarks and outlines future work on the topic.

2. Proposed Approach

The main idea of the proposed approach, called Service-Aware Retransmission Control (SARC), is to allow the mobile terminal receiver to control the level of HARQ protection applied by the base station for every frame transmitted on the radio link. The decision of the mobile terminal is based on the potential benefit in correctly receiving the next packet given the current reception history and the actual perceptual relevance of the packet itself.

Automatic Repeat Request (ARQ) is an error control mechanism used in UMTS, where the transmitter uses a stop-and-wait procedure, transmitting a data block and waiting for a response from the receiver before sending a new data block or retransmitting an incorrectly received one. As an evolution of this approach, the Hybrid ARQ (HARQ) scheme is used in High Speed Downlink Packet Access (HSDPA), an evolution of UMTS, where incorrectly received data blocks are not discarded but stored and soft-combined with successive retransmissions of the same information bits.

3GPP specifications have defined two HARQ processes for HSDPA: Incremental Redundancy and Chase Combining [13]. In the former scheme, successive retransmissions of an incorrectly received data block are sent with additional redundancy—that is, increased with each consecutive retransmission. The retransmissions consist of redundant information in order to increase the chances of successful delivery. Since each transmitted block is not the same as the previous transmission, it is demodulated and stored at the receiver and subsequently soft-combined to reproduce the original data block [14]. In the chase combining strategy, an erroneously received data packet is stored and soft-combined with later retransmissions that are an exact copy of the original transmission.

HSDPA uses the HARQ retransmission mechanism with a Stop and Wait (SAW) protocol. The HARQ mechanism allows the User Equipment (UE) to rapidly request retransmission of erroneous transport blocks until they are successfully received. HARQ functionality is implemented at the MAC-hs (Medium Access Control, high speed) layer, a new sublayer introduced in HSDPA. MAC-hs is terminated at the Node B, unlike RLC (Radio Link Control), which is terminated at the RNC (Radio Network Controller). This enables a smaller retransmission delay for HSDPA (approximately 10 ms) than for UMTS Rel. 99 (up to 100 ms).

In this paper, the level of HARQ protection (also indicated as “HARQ Strength” in the following) is considered in terms of the maximum number of retransmission attempts taken for a packet delivery in case of failure.
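To give a feeling for what the HARQ strength buys, the following minimal sketch computes the residual packet loss after all allowed attempts. It rests on assumptions that are not part of the paper: one initial transmission plus up to `harq_strength` retransmissions, and each attempt failing independently with the same probability. Real HARQ soft-combines successive attempts, so failures are not independent and the residual loss is likely lower than this estimate. The function name is hypothetical.

```python
def residual_loss_probability(per_attempt_error: float, harq_strength: int) -> float:
    """Probability that a packet is still lost after all allowed attempts.

    Assumption (not from the paper): one initial transmission plus up to
    `harq_strength` retransmissions, each failing independently with
    probability `per_attempt_error`. Soft combining is ignored.
    """
    if not 0.0 <= per_attempt_error <= 1.0:
        raise ValueError("per_attempt_error must lie in [0, 1]")
    if harq_strength < 0:
        raise ValueError("harq_strength must be non-negative")
    total_attempts = 1 + harq_strength
    return per_attempt_error ** total_attempts


if __name__ == "__main__":
    # Compare a few strengths at an illustrative 10% per-attempt error rate.
    for strength in (2, 4, 8):
        print(strength, residual_loss_probability(0.1, strength))
```

Under these assumptions, each additional allowed retransmission multiplies the residual loss by the per-attempt error probability, which is why a small change in HARQ strength can matter a great deal for the most important packets.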

Figure 1 illustrates architectural principles of the proposed SARC approach. As outlined in the previous sections, SARC operates on the wireless 3G link. At the mobile terminal side, whenever a packet is received by the application, the latter can specify packet importance for subsequent incoming packets for a given flow.

The information about the importance of the next packet is then transferred into corresponding values of HARQ protection by the SARC module (implemented within the protocol stack of the mobile terminal) and delivered to the HARQ entity at the link layer of the Base Station using cross-layer signaling. At the link layer, the specified HARQ protection parameter is sent along with HARQ acknowledgement, which is generated for every frame received according to stop-and-wait HARQ type.

The SARC module implemented at the BS analyses incoming traffic (assuming it has access to TCP and IP protocol headers) and instructs the HARQ entity to use the requested HARQ protection on a per-packet basis.

2.1. Packet Importance Metric

The adaptation of the level of HARQ protection based on the content carried in the packet payload represents a solution belonging to the framework of cross-layer service-aware networking solutions, which optimize “pure” networking techniques based on services and their traffic demand.

In this paper, we address three different classes of service—voice, video, and data transfer—improving delivery performance by adapting network response to the relevance of packet being delivered over the radio link.

The level of HARQ protection in the proposed approach varies on the basis of a packet importance metric, which consists of two components.

(i) Initial packet importance corresponds to the level of quality reduction for a given flow in case the packet is lost during transmission or corrupted at the receiver [15]. The quality of the flow is determined by end-to-end application requirements and user demands. For example, a commonly used metric for VoIP is the Mean Opinion Score (MOS), for video the Peak Signal-to-Noise Ratio (PSNR), and for TCP-based data transfer the throughput level.

(ii) Dynamic packet importance accounts for the “reception history” of the flow and adjusts the initial packet importance. For example, the importance of a frame in a video sequence can be dynamically adjusted in case its decoding depends on neighboring frames and such a frame is not correctly received.

2.2. Packet Importance Metric in Video Streams

For the sake of a clear explanation, we consider a scenario with a mobile node receiving MPEG-4 video flows from a streaming server located in the wired Wide-Area Network (WAN). However, similar reasoning applies to H.263 and H.264 encoded video streams, as well as to embedded video streams. The Base Station (BS) serves as a gateway between the fixed and wireless network segments.

An MPEG-4 video is composed of Groups of Pictures (GOPs), consisting of video frames of three types. I-Frames (Intra coded frames) are encoded without reference to any other frame in the sequence, and are usually inserted every 12 to 15 frames as well as at the beginning of a sequence. Video decoding can start at an I-frame only. P-Frames (Predicted frames) are encoded as differences from the last I- or P-frame. The new P-frame is first predicted on the basis of the reference I- or P-frame through motion compensation and encoding of the prediction error. B-Frames (Bidirectional frames) are encoded as the difference from the previous or following I- or P-frames. B-frames use prediction as for P-frames but for each block either the previous or the following I- or P-frame is used.

Due to the correlation property of P- and B-frames, the effective impact deriving from the loss of an I-frame can be clearly considered much higher than that of P- or B-frame. In addition, the loss of one I- or P-packet may generate error propagation: while the loss of a B-frame does not affect the quality of the consecutive frames, the loss of an I-frame may disable correct decoding of subsequent P- and B- frames. This leads to the conclusion that I-frames are more important than P-frames, which are more important than B-frames.

To validate the above considerations, Figure 2 shows the quality reduction of a real video flow transmitted using VideoLan software [16] in terms of PSNR measured at the receiver versus the loss of different types of packets within a GOP. The horizontal scale indicates which frame within the GOP was lost, while the first value (obtained with no losses) serves as a reference point.

The highest loss in PSNR quality corresponds to the case when the I-frame is lost, making decoding of the entire GOP either impossible or prone to error propagation. On the other hand, the loss of any B-frame does not degrade the quality by more than a minor fraction. However, the loss of a single P-frame has a high influence on the video quality, and the level of degradation depends on the relative position of the lost P-frame within the transmitted GOP sequence: higher quality degradation is measured for P-frame losses located closer to the beginning of the GOP.

Following this observation, the importance of P-frames is defined as ranging linearly from the importance level of I-frames down to the importance level of B-frames, the former being higher than the latter. Indeed, the loss of P1 (which follows immediately after the reference I-frame) leads to almost the same drop in PSNR as the loss of the I-frame, while the loss of P9, which is transmitted right before the last pair of B-frames, leads to a PSNR loss comparable with those caused by B-frame losses. The bold line presented in Figure 2 is obtained by curve fitting with the first order polynomial RSM model for the PSNR values achieved for different lost P-frames. The obtained R-square of 0.97 shows a good match between the experimental data and the proposed linear model and, as a result, supports the chosen P-frame packet importance.
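The linear P-frame importance described above can be written compactly as follows. The symbols are introduced here for illustration only and are assumptions rather than the paper's notation: $w_I$ and $w_B$ denote the importance levels of I- and B-frames, $N_P$ the number of P-frames in the GOP, and $k$ the position of the P-frame counted from the reference I-frame.

```latex
w_{P_k} \;=\; w_I - \frac{k-1}{N_P-1}\,\bigl(w_I - w_B\bigr),
\qquad k = 1,\dots,N_P, \quad N_P > 1, \quad w_I > w_B .

% Worked check for a GOP with N_P = 9 P-frames:
%   k = 1:  w_{P_1} = w_I                    (same importance as the I-frame)
%   k = 5:  w_{P_5} = (w_I + w_B)/2          (midpoint)
%   k = 9:  w_{P_9} = w_B                    (same importance as a B-frame)
```

The two end points reproduce the behaviour reported for P1 and P9, while intermediate P-frames receive proportionally decreasing importance.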

2.3. Packet Importance Metric in VoIP Flows

A large variety of Voice-over-IP (VoIP) encoders are available, representing different tradeoffs between quality and bandwidth consumption. Encoders can be either sample based (e.g., G.711) or frame based (e.g., G.729), periodically coding individual speech samples or grouping a certain number of samples within a time window, respectively. VoIP speech payload is typically encapsulated into RTP/UDP/IP packets.

At the receiver side, speech frames are demultiplexed and inserted into a playout buffer. The playout buffer plays an important role in perceived speech quality, since it enforces speech frames delivery at the same interval at which they are generated by the encoder. This is done through reordering, delaying or even dropping the frames which arrive later than their expected playback time. However, whenever the frame is dropped, it causes a relevant decrease of the quality of the voice stream.

Based on the above, initially, equal packet importance (i.e., “initial packet importance”) is associated to all transmitted speech frames. However, in case the receiver detects frame losses after out-of-order frame reception, it increases importance (and error redundancy) for the subsequent packets of the stream (i.e., increases the “dynamic packet importance”). Summarizing, SARC aims at avoiding bulk frame losses, which are critical for the quality of the speech stream, while single frame losses can be easily compensated or concealed by the decoder.

2.4. Packet Importance Metric in File Transfer

TCP is the most widely used protocol in the Internet, and it provides a flow of equally important packets from the user viewpoint. However, depending on the context (e.g., the evolution of the TCP congestion window), packet losses can severely decrease the data transfer performance.

The proposed SARC scheme dynamically adapts the level of HARQ protection used on the radio link based on the value of the TCP congestion window computed at the receiver node.

The core idea is to provide higher protection on the radio link (and more retransmission attempts) when the congestion window is small and lower protection for high window values. Indeed, when the congestion window is small, any link error will trigger a reduction of the window to half its size, unnecessarily reducing the throughput of the TCP flow. In the opposite case, the impact of link errors becomes less significant, since the window will possibly be reduced anyway due to congestion-related losses.

Figure 3 presents congestion window evolution in TCP New Reno and the corresponding proposed variation of the packet importance metric. Specifically, the proposed approach assigns the highest importance (“High Imp”) to TCP segments produced right after each window reduction and then decreases it down to the “Low Imp” threshold following a linear or any other monotonically decreasing function.
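As an illustration, one possible linear instance of such a decreasing function is sketched below. It is not taken from the paper: the choice of driving the importance by the growth of the congestion window since the last reduction, the parameter names (`cwnd_after_loss`, `ramp_segments`) and the default numeric levels for “High Imp” and “Low Imp” are all hypothetical placeholders.

```python
def tcp_packet_importance(cwnd: float,
                          cwnd_after_loss: float,
                          ramp_segments: float,
                          high_imp: float = 8.0,
                          low_imp: float = 2.0) -> float:
    """Linearly decreasing importance after a window reduction.

    Hypothetical model: importance starts at `high_imp` when the window
    equals its value right after the reduction (`cwnd_after_loss`) and
    reaches `low_imp` once the window has grown by `ramp_segments`.
    """
    if ramp_segments <= 0:
        raise ValueError("ramp_segments must be positive")
    if high_imp < low_imp:
        raise ValueError("high_imp must not be below low_imp")
    progress = (cwnd - cwnd_after_loss) / ramp_segments
    progress = min(1.0, max(0.0, progress))  # clamp to [0, 1]
    return high_imp - progress * (high_imp - low_imp)


if __name__ == "__main__":
    # Window halved to 10 segments, importance ramps down over 20 segments.
    for cwnd in (10, 15, 20, 30, 40):
        print(cwnd, tcp_packet_importance(cwnd, 10, 20))
```

Any other monotonically decreasing shape could replace the linear ramp by changing only the last line of the function.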

In summary, SARC provides higher protection for low congestion window values or flow sending rates. This reduces the probability of packet losses due to link errors on the wireless channel, which is a well-known reason for TCP performance degradation [17].

3. Performance Evaluation

3.1. Simulation Scenario

The proposed scheme is evaluated in the context of a UMTS/HSDPA cellular network. Network Simulator 2 (NS-2) [18] is used to perform experiments, with the additional Enhanced UMTS Radio Access Network Extensions (EURANE) module [19] for HSDPA implementation.

Figure 4 illustrates the reference scenario and the main parameters employed in the experiments. All considered flows originate from a server (the Fixed Host—FH) on the Internet and are delivered to the User Equipment (UE) located in a 3G cellular network. SARC approach is implemented between the Node-B and the UE.

3.2. Video Transfer Performance

In the first scenario, the FH is a video server which transmits video streams to the video receiver located at the UE. Results are presented for the “Foreman” video sequence, using MPEG-4 (open-source ffmpeg [20]) video coding. The video format is Quarter Common Intermediate Format (QCIF, 176 × 144). The GOP structure is IBBPBBPBBPBBPB. Stored in YUV format, the video clip is processed by the MPEG-4 encoder, which generates the encoded video stream.

The Video Sender (VS) reads the encoded video stream and generates the trace file containing information related to frame type, size, and so forth, for each video frame. Based on this trace file, the NS-2 streaming server application generates the data—which is encapsulated at all the protocol layers and sent over the network.

The effect of streaming video over the network is captured in the streaming client log file generated by NS-2. It contains timestamp, size, and ID for each transmitted and received packet. The trace file and the log files are used by the Evaluate Trace (ET) program to generate an output corresponding to the result of video file transmission over the error prone network. In order to examine the video quality obtained at the receiver, the original video file and the one obtained after transmission over the network are compared using PSNR calculation module. The employed simulation methodology was proposed and developed within the framework of EvalVid [21], enhanced as in [22] for including NS-2.
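For reference, the PSNR comparison between the original and the received frames follows the standard definition, 10·log10(MAX² / MSE). The minimal sketch below applies it to two flat lists of 8-bit sample values; the function name and the flat-list representation are illustrative assumptions and do not reflect how EvalVid implements its PSNR module.

```python
import math


def psnr(reference, received, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equally sized frames.

    Frames are given as flat sequences of sample values (e.g. the luma
    plane of a YUV frame). Returns infinity for identical frames.
    """
    if len(reference) != len(received) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    squared_error = sum((a - b) ** 2 for a, b in zip(reference, received))
    mse = squared_error / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [52, 55, 61, 66]
    degraded = [50, 55, 63, 66]
    print(round(psnr(original, degraded), 2), "dB")
```

In practice the per-frame values are averaged over the sequence, so a lost frame that the decoder replaces or conceals shows up as a dip in the PSNR trace.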

The portion of the multimedia stream which is crucial to the overall quality is retransmitted by SARC with a higher HARQ strength; that is, packets belonging to an I-frame are retransmitted with HARQ Strength = 8, while packets belonging to B-frames are retransmitted with a HARQ strength = 2. P-frames are retransmitted with a variable HARQ strength ranging from 8 to 3 depending on the position of the frame in the GOP. Default value of HARQ strength is set to 4 for all packets in the legacy scenario (i.e., without SARC).
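The mapping above can be sketched for the GOP structure used in the experiments. The I- and B-frame strengths and the 8 to 3 range for P-frames come from the text; the linear interpolation over the P-frame positions and the rounding to the nearest integer are assumptions, as is the function name.

```python
def harq_strengths_for_gop(gop: str,
                           i_strength: int = 8,
                           b_strength: int = 2,
                           p_high: int = 8,
                           p_low: int = 3) -> list:
    """Return one HARQ strength per frame of a GOP pattern such as 'IBBP...'.

    P-frames are interpolated linearly from `p_high` (first P-frame) to
    `p_low` (last P-frame); rounding to an integer is an assumption.
    """
    num_p = gop.count("P")
    strengths = []
    p_seen = 0
    for frame_type in gop:
        if frame_type == "I":
            strengths.append(i_strength)
        elif frame_type == "B":
            strengths.append(b_strength)
        elif frame_type == "P":
            fraction = p_seen / (num_p - 1) if num_p > 1 else 0.0
            strengths.append(round(p_high - fraction * (p_high - p_low)))
            p_seen += 1
        else:
            raise ValueError(f"unknown frame type: {frame_type!r}")
    return strengths


if __name__ == "__main__":
    pattern = "IBBPBBPBBPBBPB"
    for frame_type, strength in zip(pattern, harq_strengths_for_gop(pattern)):
        print(frame_type, strength)
```

With four P-frames in the pattern, the interpolation yields strengths that step down from the I-frame level toward the B-frame level as the P-frame moves later in the GOP.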

Achieved results are illustrated in Figure 5, where SARC extends packet error tolerance to the range from $10^{-2}$ to $10^{-1}$. The detailed behavior of the proposed scheme with respect to the legacy approach is described in Figures 6 and 7, where it is possible to clearly identify the unequal and dynamic protection implemented by SARC on I- and P-frames.

3.3. VoIP Transfer Performance

Experiments on VoIP flows are performed using the simulation model presented in [23]. The sender and the receiver side are modeled separately. The sender includes a customizable codec, which generates generic speech frames (the latter being either voice samples or voice frames, depending on the codec), and a multiplexer, which aggregates several speech frames into one payload. The most common codecs employed in network simulation (e.g., G.711, GSM.AMR) are supported by the VoIPSender, while others can be easily added.

Initially, an HARQ strength equal to 3 is associated to all transmitted speech frames. However, in case the receiver detects frame losses after out-of-order frame reception, it increases HARQ strength linearly for the subsequent packets of the stream (with HARQ strength max equal to 8) in order to avoid bulk frame losses. Once no loss is detected, SARC decreases the HARQ strength to the initial value.
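The adaptation rule for speech flows can be sketched as a small state machine. The initial value of 3, the maximum of 8, and the reset to the initial value once no loss is detected come from the text; the increment of one per detected loss and the class and method names are assumptions.

```python
class VoipHarqController:
    """Tracks the HARQ strength requested for the next speech frame."""

    def __init__(self, initial: int = 3, maximum: int = 8, step: int = 1):
        self.initial = initial
        self.maximum = maximum
        self.step = step  # assumed increment per detected loss
        self.strength = initial

    def on_frame_event(self, loss_detected: bool) -> int:
        """Update and return the strength to request for the next frame."""
        if loss_detected:
            # Linear increase, capped at the maximum strength.
            self.strength = min(self.maximum, self.strength + self.step)
        else:
            # No loss detected: fall back to the initial strength.
            self.strength = self.initial
        return self.strength


if __name__ == "__main__":
    controller = VoipHarqController()
    history = [True, True, True, False, True]
    print([controller.on_frame_event(lost) for lost in history])
```

The controller raises protection only while losses keep being detected, which matches the stated goal of preventing bursts of lost frames rather than isolated ones.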

Achieved results (Figures 7 and 8) demonstrate that SARC is able to provide a relevant improvement in terms of MOS for both G.711 and GSM AMR speech flows. On average, applying the SARC scheme enables the codec to deliver the same speech quality at an error rate 5% higher than when SARC is not enabled.

3.4. File Transfer Performance

In this scenario, a TCP sender located at FH connects to the receiver implemented at UE. For the entire duration of the flow, the receiver maintains up-to-date value of the congestion window (cwnd) computed by counting the number of packets received during the current RTT. Whenever the loss detection signal (three duplicate acknowledgements) is sent to the sender, packet importance is increased according to the function presented in Section 2—introducing higher HARQ protection and, as a result, producing higher resistance to the link errors.
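The receiver-side estimate of the congestion window can be sketched as a per-RTT packet counter. This is a simplified illustration under assumptions not stated in the paper: the RTT is taken as a known constant, counting windows are aligned to the arrival of the first packet, and the class and method names are hypothetical.

```python
class ReceiverCwndEstimator:
    """Estimates the sender's cwnd as the packets received in the last full RTT."""

    def __init__(self, rtt_seconds: float):
        if rtt_seconds <= 0:
            raise ValueError("rtt_seconds must be positive")
        self.rtt = rtt_seconds      # assumed constant and known
        self.window_start = None    # start time of the current counting window
        self.count = 0              # packets seen in the current window
        self.estimate = 0           # packets counted in the last closed window

    def on_packet(self, arrival_time: float) -> int:
        """Register a packet arrival and return the current cwnd estimate."""
        if self.window_start is None:
            self.window_start = arrival_time
        # Close every counting window that ended before this arrival.
        while arrival_time - self.window_start >= self.rtt:
            self.estimate = self.count
            self.count = 0
            self.window_start += self.rtt
        self.count += 1
        return self.estimate


if __name__ == "__main__":
    estimator = ReceiverCwndEstimator(rtt_seconds=1.0)
    for t in (0.0, 0.2, 0.4, 1.1, 1.5, 2.3):
        print(t, estimator.on_packet(t))
```

The resulting estimate would then be fed to the importance function of Section 2 to select the HARQ strength requested for the following segments.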

Figure 9 presents the TCP throughput achieved by the flows for different PERs of the wireless link. As expected, higher protection against link errors for low values of the congestion window brings an evident performance improvement and underlines the advantages of dynamic error protection techniques based on the application awareness introduced by SARC.

Figure 10 analyzes a scenario where both video and data flows are delivered on the wireless link. In this scenario, two UEs are considered: one is receiving a video stream, while the other is receiving data via FTP. The same parameters are used as in the previous scenarios. It is possible to observe that while video performance remains as in Figure 5, data transfer is affected by relatively lower protection—while still achieving better results than in the legacy scenario (without SARC).

4. Conclusions and Future Work

This paper proposes a cross-layer approach between the application/transport layers on a mobile terminal and the link layer on the wireless base station to enable dynamic control of the level of per-packet HARQ protection. The level of protection is dynamically adapted on a per-packet basis and depends on the perceptual importance of different packets as well as on the reception history of the flow. Experimental results demonstrate the potential benefits of the proposed strategy, underlining relevant improvements for audio and video flows as well as for TCP-based data transfers.

Clearly, further improvement is possible on the building blocks of the proposed scheme, for example by introducing a more precise user quality assessment. However, such aspects are left out of the scope of the paper, as they are well reported and analyzed in the scientific literature.

Future work will be aimed at validating and optimizing the proposed scheme in the framework of embedded multimedia streams.