Abstract

As intelligent content management of IPTV moves popular material nearer to the end-user, application-layer channel coding schemes, involving the retransmission of extra redundant data, become attractive as a result of the reduced latency. Application-layer, adaptive rateless channel coding is exploited in this paper's scheme to reconstruct streamed video across an IEEE 802.16e (mobile WiMAX) channel. The paper concentrates on the trade-offs in implementing the scheme, showing that exact calculation of the redundant data has the potential to reduce the forward error correction bit-rate overhead. To reduce delay, an appropriate compression rate should also be selected.

1. Introduction

The BBC’s iPlayer in the UK [1] has demonstrated the demand for Internet Protocol TV (IPTV) value-added video streaming in the form of content-on-demand and time-shifted TV. However, this service is primarily aimed at asymmetric digital subscriber line (ADSL) receivers and may be ill adapted to mobile wireless broadband delivery. In such broadband wireless networks, including IEEE 802.16e (mobile WiMAX) [2], adverse channel conditions are a concern for video streaming and will become more so as the transition to higher data-rate IEEE 802.16m (WiMAX 2) [3] occurs. The IEEE 802.16e standard provides Turbo coding and hybrid ARQ at the Physical layer with scalable transmission bursts depending on radio frequency (RF) conditions. However, application-layer forward error correction (AL-FEC) [4] is still recommended for IPTV during severe error conditions. This paper demonstrates packet-by-packet adaptive rateless channel coding to guard against burst errors possibly caused by slow and fast fading on a WiMAX channel.

The prior IPTV content delivery network (CDN) scheme discussed in [5] was end to end, providing adaptation through a form of FEC simulcast. For severe conditions, it relies on the lower overhead and linear decoding complexity that one form of rateless coding, Raptor codes [6], provides. However, it now seems likely [7] that intelligent content management will result in local caching of frequently requested content. This development enables packet-by-packet adaptive rateless coding, depending on local measurements of channel conditions. Given the reduced latency, it may also be possible to include limited retransmission of extra redundant data, made feasible through rateless channel coding.

The iPlayer, as mentioned above, is a simple simulcast service with H.264/AVC (Advanced Video Coding) codec compression [8] and streaming rates of 500 kbps, 800 kbps, and 1500 kbps, which, once selected to match the capacity of the access network, remain fixed. Other related schemes (though not IPTV), such as Google’s Video Talk, similarly keep the compression ratio fixed but alter the screen resolution on request by the user. As the iPlayer depends on Adobe Flash Player technology, files are delivered by TCP transport, as this protocol underlies HTTP. TCP is unsuitable for real-time services over wireless because channel packet losses are misinterpreted as packet drops caused by congestion. The delays introduced may well be compounded by the progressive download employed by Adobe Flash Player, which, according to [9], when used for YouTube clips may result in the cancellation of up to 10% of downloads. The need to reduce start-up delay may also lead to reductions in quality, as the initial download block must be compressed to a suitable size.

Rateless coding has been utilized [10] for 3GPP’s Release 6 Multimedia Broadcast/Multicast Service (MBMS). Unfortunately, there is no direct feedback control channel [11] in a UMTS wireless access network, and, hence, temporal scalability is employed in [10] in conjunction with Raptor coding, without packet-by-packet adaptation. Notice that in the current paper data-partitioned source coding [12] is employed as a means of providing graceful degradation of video quality according to RF channel conditions; detailed discussion of data partitioning is postponed to Section 2. Another approach [13] is to use a scalable variety of rateless coding, growth codes, to provide unequal protection of data-partitioned video data by superimposing additional redundant data for the more important partition-A and -B data. However, that rateless coding scheme [13] is not adaptive either. The work in [14] explored the possibility of multiple sources generating Raptor codes independently of each other in order to protect layers within scalable video coding, and investigated the coordination of the scheme to achieve rate-distortion optimization. In fact, perhaps the nearest scheme to ours is that provided for Internet video streaming in [15]. However, it is for single-layer delivery and accepts long latencies.

In a mobile WiMAX channel, the packet size critically affects the risk of packet corruption. If it is possible to estimate the channel conditions, then the amount of redundant data can be set accordingly, thus controlling the packet size. If H.264/AVC data partitioning [16] is employed and the quantization parameter (QP) is selected appropriately, then the packet size decreases as the priority of the compressed data increases. Higher-priority packets are sufficient to partially reconstruct a video frame. Reducing the size of the redundant component of a packet, rather than employing a fixed ratio of redundant data, becomes particularly appropriate in this type of streaming. The WiMAX standard already specifies that a station should provide channel measurements that can form a basis for channel quality estimates. These are either received signal strength indicators or carrier-to-noise-and-interference ratio measurements made over modulated carrier preambles.

In a realistic adaptive scheme, perfect channel knowledge cannot be assumed, as a channel estimate will be affected by measurement noise. If it is not possible to reconstruct a packet with the amount of redundant data available, then in the proposed scheme, a single automatic repeat request (ARQ) is permitted (to avoid excessive delay to the video-rate application by more ARQs), again allowing the rateless features of the channel code to be exploited. There is, however, a danger that if the number of packets corrupted during transmission increases, then the overall delay will increase significantly. Two variants of an adaptive rateless scheme are introduced in this paper. In the first, additional redundant data is adjusted up to the amount needed to prevent the retransmission of redundant data (assuming perfect channel knowledge). This comes at a cost in increased overhead but reduces overall delay. The second variant dynamically calculates the amount of adaptive redundant data required to match the probability of error. Though it reduces the FEC overhead, it may introduce extra delay. However, delay can also be adjusted by varying the video quality and, hence, the packet sizes.

If buffer overflow occurs due to traffic congestion, packets will be dropped outright. Should this occur, rateless coding alone offers no protection. However, packet duplication may be applied. Without data partitioning, it is necessary to duplicate complete slices. With data partitioning, it is possible to duplicate only part of the bit-stream, namely, those packets carrying the most important data (header and motion information). Though the main theme of the paper is the role of the adaptive scheme in protecting against packet corruption, the protection of packets through duplication is also assessed.

Section 2 introduces background material necessary for a fuller understanding of this paper. Section 3 then describes the adaptive rateless scheme. Section 4 describes the experimental methodology employed to derive the results evaluated in Section 5. Section 6 concludes this paper.

2. Background

This section principally introduces rateless channel coding and data partitioning. Some results are also given to explain the interest in data partitioning.

2.1. Rateless Codes

Rateless or fountain coding [17], of which Raptor coding [6] is a subset, is ideally suited to a binary erasure channel, in which either the error-correcting code works or the channel decoder fails and reports that it has failed. In erasure coding, all is not lost, as flawed data symbols may be reconstructed from a set of successfully received symbols (if sufficiently many of these symbols are received). A fixed-rate $(n, k)$ Reed-Solomon (RS) erasure code over an alphabet of size $q = 2^L$ has the property that if any $k$ out of the $n$ symbols transmitted are received successfully, then the original $k$ symbols can be decoded. However, in practice, not only must $n$, $k$, and $q$ be small, but the computational complexity of the decoder is also of order $n(n-k)\log_2 n$. The erasure rate must also be estimated in advance.

The class of fountain codes [17] allows a continual stream of additional symbols to be generated in the event that the original symbols could not be decoded. It is the ability to easily generate new symbols that makes fountain codes rateless. Decoding will succeed with small probability of failure if any $k(1+\varepsilon)$ of the symbols are successfully received. In its simplest form, the symbols are combined in an exclusive OR (XOR) operation according to the order specified by a random, low-density generator matrix, and, in this case, the probability of decoder failure is $\delta = 2^{-k\varepsilon}$, which for large $k$ approaches the Shannon limit. The random sequence must be known to the receiver, but this is easily achieved through knowledge of the sequence seed.

Luby transform (LT) codes [18] reduce the complexity of decoding a simple fountain code (which is of order $k^3$) by means of an iterative decoding procedure. The “belief propagation” decoding relies on the column entries of the generator matrix being selected from a robust Soliton distribution. In the LT generator matrix case, the expected number of degree-one combinations (no XORing of symbols) is $S = c\,\ln(k/\delta)\sqrt{k}$, for small constant $c$ (see later discussion). Setting $\varepsilon = 2\ln(S/\delta)\,S/k$ ensures that by sending $k(1+\varepsilon)$ symbols these symbols are decoded with probability $(1-\delta)$ and a decoding complexity of order $k\ln k$.

Encoding of the LT code in the form used in this paper is accomplished as follows. For coded symbol $i$, choose degree $d_i$ randomly from some distribution of degrees, that is, a probability distribution $\rho(d_i) = \Pr[\text{degree} = d_i]$, where $\Pr$ is the probability of a given event. Then randomly select from amongst the $k$ information symbols a set $R_i$ of $d_i$ symbols. The symbols of set $R_i$ are then XORed together to produce a new composite symbol, which forms one symbol of a transmitted packet. Thus, if the symbols are bytes, all of a byte’s bits are XORed in turn with the corresponding bits of the other randomly selected bytes. It is not necessary to specify the random degree or the random symbols chosen if it is assumed that the (pseudo-)random number generators of sender and receiver are synchronized, as mentioned above.
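As an illustration of this encoding step, the following Python sketch generates one LT-coded symbol from $k$ source bytes. It is a minimal sketch: the toy degree distribution, the byte-sized symbols, and the function names are illustrative assumptions, not the exact choices used in the implementation evaluated here.

```python
import random

def lt_encode_symbol(source_symbols, degree_dist, rng):
    """Produce one LT-coded symbol (a byte) by XORing a random subset.

    source_symbols : list of ints in 0..255 (the k information bytes)
    degree_dist    : list of (degree, probability) pairs
    rng            : random.Random seeded identically at sender and receiver
    """
    degrees, probs = zip(*degree_dist)
    d = rng.choices(degrees, weights=probs, k=1)[0]      # pick degree d_i
    chosen = rng.sample(range(len(source_symbols)), d)   # the set R_i
    symbol = 0
    for idx in chosen:
        symbol ^= source_symbols[idx]                    # XOR the selected bytes
    return symbol, chosen

# Toy usage: k = 8 source bytes and a placeholder degree distribution.
source = [0x3C, 0xA5, 0x0F, 0x77, 0x12, 0xE0, 0x9B, 0x44]
toy_dist = [(1, 0.2), (2, 0.5), (3, 0.3)]
rng = random.Random(42)   # a shared seed stands in for synchronized PRNGs
print(lt_encode_symbol(source, toy_dist, rng))
```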

Symbols are processed at the decoder as follows. If a symbol arrives with degree greater than one, it is buffered. If a clean symbol arrives with degree one, then it is XORed with all symbols in which it was used in the encoding process. This reduces the degree of each of the symbols to which the degree one symbol is applied. When a degree-two symbol is eventually reduced to degree-one, it too can be used in the decoding process. Notice that a degree-one symbol is a symbol for which no XORing has taken place. Notice also that for packet erasure channels, a clean degree-one symbol (a packet) is easily established as such.
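The decoding procedure just described can be sketched as a peeling decoder over received (value, neighbour-set) pairs, such as those produced by the encoder sketch above. Again this is only an illustrative sketch of the belief-propagation idea, not the decoder used in the evaluated system.

```python
def lt_decode(received, k):
    """Peeling ('belief propagation') decoder for LT symbols.

    received : list of (value, indices) pairs, where indices lists the
               source symbols XORed into that coded symbol
    k        : number of source symbols
    Returns the recovered source symbols, or None if decoding stalls.
    """
    pending = [[val, set(idx)] for val, idx in received]
    recovered = [None] * k
    progress = True
    while progress:
        progress = False
        # Release any degree-one symbols: their value is a source symbol.
        for entry in pending:
            val, idxs = entry
            if len(idxs) == 1:
                (i,) = idxs
                if recovered[i] is None:
                    recovered[i] = val
                    progress = True
                idxs.clear()
        # XOR newly recovered symbols out of the remaining coded symbols,
        # reducing their degree (a degree-two symbol may become degree-one).
        for entry in pending:
            for i in list(entry[1]):
                if recovered[i] is not None:
                    entry[0] ^= recovered[i]
                    entry[1].discard(i)
                    progress = True
    return recovered if all(r is not None for r in recovered) else None
```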

The ideal Soliton distribution [17] is a first stage in finding an appropriate degree distribution. It is defined as
\[
\rho(1) = \frac{1}{k}, \qquad \rho(d) = \frac{1}{d(d-1)}, \quad d = 2, 3, \ldots, k, \tag{1}
\]
where $k$ is the number of source symbols.

As already mentioned, in practice the robust Soliton distribution [18] is employed, as this produces degree-one symbols at a more convenient rate for decoding. It also avoids isolated symbols that are not used elsewhere. Two tuneable parameters $c$ and $\delta$ are used to form the expected number of useable degree-one symbols, already given in the prior discussion:
\[
S = c \ln\!\left(\frac{k}{\delta}\right)\sqrt{k}, \tag{2}
\]
where $c$ is a constant close to 1, and $\delta$ is a bound on the probability that decoding fails to complete. Now define
\[
\tau(d) =
\begin{cases}
\dfrac{S}{kd} & \text{for } d = 1, 2, \ldots, (k/S) - 1, \\[6pt]
\dfrac{S}{k}\ln\!\left(\dfrac{S}{\delta}\right) & \text{for } d = k/S, \\[6pt]
0 & \text{for } d > k/S
\end{cases} \tag{3}
\]
as an auxiliary positive-valued function to give the robust Soliton distribution:
\[
\mu(d) = \frac{\rho(d) + \tau(d)}{z}, \tag{4}
\]
where $z$ normalizes the probability distribution to unity and is given by
\[
z = \sum_{d}\bigl(\rho(d) + \tau(d)\bigr). \tag{5}
\]
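Equations (1)-(5) translate directly into a small routine for generating the robust Soliton distribution. The sketch below assumes particular illustrative values of the tuning parameters $c$ and $\delta$; the values used in the actual experiments are not specified here.

```python
import math

def robust_soliton(k, c=0.1, delta=0.5):
    """Robust Soliton distribution mu(d), d = 1..k, built from (1)-(5).

    k     : number of source symbols
    c     : small tuning constant (illustrative default)
    delta : bound on the probability that decoding fails (illustrative default)
    """
    S = c * math.log(k / delta) * math.sqrt(k)       # expected degree-one symbols, (2)
    # Ideal Soliton rho(d), (1)
    rho = [0.0] * (k + 1)
    rho[1] = 1.0 / k
    for d in range(2, k + 1):
        rho[d] = 1.0 / (d * (d - 1))
    # Auxiliary function tau(d), (3)
    tau = [0.0] * (k + 1)
    pivot = int(round(k / S))
    for d in range(1, min(pivot, k + 1)):
        tau[d] = S / (k * d)
    if 1 <= pivot <= k:
        tau[pivot] = (S / k) * math.log(S / delta)
    # Normalization z and robust Soliton mu(d), (4)-(5)
    z = sum(rho[d] + tau[d] for d in range(1, k + 1))
    return [0.0] + [(rho[d] + tau[d]) / z for d in range(1, k + 1)]

mu = robust_soliton(k=1000)
print(sum(mu))   # ~1.0: the distribution is normalized by z
```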

The essential differences between fountain erasure codes and RS erasure codes are that fountain codes in general (though not Raptor codes [6]) are not systematic and that, even if there were no channel errors, there is a very small probability that decoding will fail. In compensation, they are completely flexible, have linear decoding computational complexity, and generally incur considerably less overhead than fixed-rate erasure codes. Apart from the startling reduction in computational complexity, a Raptor code has the maximum distance separable property, that is, the source packets can be reconstructed with high probability from any set of $k$, or just slightly more than $k$, received symbols. A further advantage of Raptor coding is that it does not share the high error floors on a binary erasure channel [19] of prior rateless codes.

2.2. Data Partitioning

In an H.264/AVC codec [8], the network abstraction layer (NAL) facilitates the delivery of the video coding layer (VCL) data to the underlying protocols such as RTP/UDP/IP, H.32X, and MPEG-2 transportation systems. Compressed data are grouped by the codec into self-contained subvideo frame units called slices. Normally, a slice is encapsulated by the codec into a NAL unit (NALU). Each NALU can be considered as a packet that contains an integer number of bytes, including a header and a payload. The header specifies the NALU type, and the payload contains the related data. When data partitioning is enabled, every slice is further divided into three separate partitions, each carried in its own NALU of type 2, 3, or 4, as listed in Table 1. A NALU of type 2, also known as partition A, comprises the most important information of the compressed video bit-stream of P- and B-pictures, that is, the macroblock (MB) addresses, motion vectors (MVs), and essential headers. If any MBs in these pictures are intra-coded, their transform coefficients are packed into a type-3 NALU, also known as partition B. A NALU of type 4, also known as partition C, carries the transform coefficients of the motion-compensated inter-picture coded MBs. Partitions A and B of data-partitioned P- and B-slices are small for broadcast-quality video, but their C-type partitions can be very long.
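As a compact summary of the partition-to-NALU mapping described above (and listed in Table 1), the following fragment is purely illustrative; the constants mirror the type numbers quoted in the text.

```python
# NALU types used by H.264/AVC data partitioning, as described above.
PARTITION_OF_NALU_TYPE = {
    2: ("Partition A", "MB addresses, motion vectors, essential headers"),
    3: ("Partition B", "transform coefficients of intra-coded MBs"),
    4: ("Partition C", "transform coefficients of inter-coded MBs"),
}

def classify_nalu(nalu_type):
    """Return (partition name, contents) for a data-partitioning NALU type."""
    return PARTITION_OF_NALU_TYPE.get(nalu_type, ("Other NALU", "not a data partition"))

print(classify_nalu(2))
```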

Data partitioning is a form of source-coded error resilience [20]. Combining error resilience with error control involves additional data overhead. However, Figure 1 shows that, of four common error resilience tools in H.264/AVC, data partitioning has the least overhead. The illustration is for the well-known Foreman clip, characterized by the jerky motion of a hand-held camera with a rapid pan towards the end of the sequence. In Figure 1, the horizontal axis represents the mean bitstream rate arrived at by setting the QP to the given value, while the vertical axis represents the mean overhead rate at that QP. As the quality decreases (higher QP), the advantage of data partitioning increases, as the relative overhead of all schemes increases. Tests of the Akiyo, Coastguard, and Mobile sequences showed that the overhead is not strongly dependent on coding complexity, with the ordering of overhead size between the error resilience schemes preserved.

The relative mean sizes (across all frames in the sequence) of the data partitions for a sequence with higher spatial coding complexity, Paris, and one with high temporal coding complexity, Stefan, were examined. The results for these sequences are reported in Table 2 according to video quality given by the QP setting. Both sequences with 4:2:0 sampling were variable bitrate (VBR) encoded at common intermediate format (CIF) (352×288 pixel/frame), with a group of picture (GoP) structure of IPPP… at 30 frame/s. Experiments not shown indicate that including B-pictures, with a GoP structure of IPBP... (sending order) and an intrarefresh rate of 15, did not noticeably change the distribution of partition sizes.

2.3. Intrarefresh Macroblocks

The insertion of intra-refresh (IR) MBs into pictures [21] normally encoded through motion-compensated prediction allows temporal error propagation to be arrested if matching MBs in a previous picture are lost. In the H.264/AVC JM implementation, various IR schemes exist, such as random, which sets a maximum percentage of MBs, or cyclic, which replaces each line of the picture in turn in cyclic order. Notice that naturally encoded IR MBs are also inserted into predictively coded P-pictures when inter-coding brings limited or no advantage (e.g., during rapid motion or when a new object that is not present in a previous picture is uncovered). The inclusion of IR MBs does lead to some increase in the size of partition-B-bearing packets, as shown in Table 3 for different QPs and percentages of IR MBs. The sequence is Football, with high temporal coding complexity, encoded with the same configuration as in Section 2.2. It is also possible to vary the IR rate according to scene type or channel conditions [22].

2.4. Slice Duplication

In the extended protection scheme, one of the data partitions is duplicated so that dropped packets can be replaced. Rather than duplicate or copy, it is possible to send reduced-quality versions of a slice, which in H.264/AVC is called a redundant slice [23]. However, employing a coarser quantization than the main stream can lead to drift between the encoder and decoder, as the encoder never knows which version of the slice has been decoded. Besides, replacing one partition by a redundant slice with a QP different from that of the other partitions would not even permit reconstruction.

A possibility [23] is to use correctly received reference pictures for the reconstruction of redundant pictures rather than the reference pictures used by primary pictures. The decoder is able to select from a set of potential replacement redundant pictures according to the possibility of correct reconstruction. Alternatively, in [24], MBs were selected for their relative impact on reconstruction and placed within FMO slices, at some increase in computational complexity.

3. Adaptive Scheme

In the proposed adaptive scheme, the probability of channel loss (PL) serves to predict the amount of redundant data to be added to the payload. Assume that “bursty” error conditions are generated by the widely used Gilbert-Elliott model [25, 26], which is a form of hidden Markov model with a good and a bad state. In the Gilbert-Elliott model, Figure 2, $p_{gb}$ is the probability of the transition from the good state to the bad state, and $p_{bg}$ is the probability of the transition from the bad state to the good state. Then $p_{gg}$ and $p_{bb}$ are the probabilities of staying in the good state and the bad state, respectively:
\[
p_{gg} = 1 - p_{gb}, \qquad p_{bb} = 1 - p_{bg}. \tag{6}
\]
The steady-state probabilities of being in the good and bad states are defined as $\pi_G$ and $\pi_B$ and are found to be
\[
\pi_G = \frac{p_{bg}}{p_{bg} + p_{gb}}, \qquad \pi_B = \frac{p_{gb}}{p_{bg} + p_{gb}}. \tag{7}
\]
Consequently, the mean probability of loss is given by
\[
PL_{\mathrm{mean}} = P_G \pi_G + P_B \pi_B, \tag{8}
\]
where $P_G$ and $P_B$ are the probabilities of loss in the good and bad states, respectively. The instantaneous PL (taken from a distribution with mean $PL_{\mathrm{mean}}$) is used to calculate the amount of redundant data adaptively added to the payload.
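For concreteness, the following sketch evaluates (6)-(8). The parameter values in the example call are those later quoted in Section 4.2 (with $P_B = 0.05$); they are used here purely as an illustrative check.

```python
def gilbert_elliott_mean_loss(p_gg, p_bb, P_G, P_B):
    """Mean packet-loss probability of a Gilbert-Elliott channel, per (6)-(8)."""
    p_gb = 1.0 - p_gg               # good -> bad transition probability, (6)
    p_bg = 1.0 - p_bb               # bad -> good transition probability, (6)
    pi_G = p_bg / (p_bg + p_gb)     # steady-state probability of the good state, (7)
    pi_B = p_gb / (p_bg + p_gb)     # steady-state probability of the bad state, (7)
    return P_G * pi_G + P_B * pi_B  # PL_mean, (8)

# Section 4.2 settings with P_B = 0.05 give PL_mean of roughly 0.037.
print(gilbert_elliott_mean_loss(p_gg=0.95, p_bb=0.96, P_G=0.02, P_B=0.05))
```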

If the original packet length is $L$, then the redundant data is given simply by
\[
R = L \times PL \times A. \tag{9}
\]
However, the factor $A$ must accommodate all values of PL for a particular value of $P_B$. Subsequent tests reported in Section 4 showed that factor $A$ can be dispensed with in favor of a dynamically determined value for the redundant data,
\[
R = L \times PL + L \times PL^2 + L \times PL^3 + \cdots = \frac{L}{1 - PL} - L, \tag{10}
\]
with successively smaller additions of redundant data, each based on taking the previous amount and multiplying by PL.
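The two redundancy estimates of (9) and (10) can be contrasted as below. This is a sketch only: the packet length, loss probability, and factor $A$ in the example are arbitrary illustrative values (the measured values of $A$ appear in Table 5).

```python
def redundancy_fixed(L, PL, A):
    """Redundant bytes using the fixed factor of (9): R = L * PL * A."""
    return L * PL * A

def redundancy_dynamic(L, PL, terms=None):
    """Redundant bytes using (10): L*PL + L*PL^2 + L*PL^3 + ...

    With terms=None, the closed form of the geometric series, L/(1-PL) - L, is used.
    """
    if terms is None:
        return L / (1.0 - PL) - L
    return sum(L * PL ** i for i in range(1, terms + 1))

L, PL = 1000, 0.10
print(redundancy_fixed(L, PL, A=1.1))      # fixed-factor estimate (A is illustrative)
print(redundancy_dynamic(L, PL, terms=3))  # first three terms of (10): 111 bytes
print(redundancy_dynamic(L, PL))           # closed form: ~111.1 bytes
```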

Assuming perfect knowledge of the channel conditions at the time the original packet was transmitted establishes an upper bound, beyond which the performance of the adaptive scheme cannot improve. However, we have included measurement noise to test the robustness of the scheme. Measurement noise was modelled as a zero-mean (truncated) Gaussian (normal) distribution and added to the packet-loss probability estimate.
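A possible model of that measurement noise, as a minimal sketch, is simply a zero-mean Gaussian sample added to PL and truncated to the valid range; the standard deviation here is an arbitrary placeholder.

```python
import random

def noisy_loss_estimate(PL, sigma, rng=random):
    """Add zero-mean Gaussian measurement noise to PL, truncated to [0, 1]."""
    return min(1.0, max(0.0, PL + rng.gauss(0.0, sigma)))

print(noisy_loss_estimate(0.10, sigma=0.02))
```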

If, despite the redundant data, the packet’s payload still cannot be reconstructed, then extra redundant data are piggybacked onto the next packet. In order to model Raptor coding, the following statistical model [27] was employed:
\[
P_f(m, k) =
\begin{cases}
1 & \text{if } m < k, \\
0.85 \times 0.567^{\,m-k} & \text{if } m \geq k,
\end{cases} \tag{11}
\]
where $P_f(m, k)$ is the failure probability of the code with $k$ source symbols if $m$ symbols have been received. Notice that the authors of [27] remark and show that, for $k > 200$, the model almost perfectly models the performance of the code.
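Model (11) is easily coded and shows how quickly the failure risk decays once more than $k$ symbols arrive; the function name below is illustrative.

```python
def raptor_failure_probability(m, k):
    """Failure probability of Raptor decoding, per the model of (11).

    m : number of symbols received; k : number of source symbols.
    The model is reported in [27] to be accurate for k > 200.
    """
    if m < k:
        return 1.0
    return 0.85 * 0.567 ** (m - k)

# Each extra symbol beyond k cuts the failure risk by a factor of 0.567.
for extra in range(6):
    print(extra, raptor_failure_probability(1000 + extra, 1000))
```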

As previously remarked, the rateless channel code symbol size was set to a byte. Clearly, if the symbol size is a packet and instead 200 packets are accumulated before the rateless decoder can be applied (or at least (11) is relevant), there is a penalty in start-up delay for the video stream and a cost in providing sufficient buffering at the mobile stations.

4. Methodology

This section presents technical details of the evaluation methodology.

4.1. Detecting Errors

Rateless code decoding is reliant upon the identification of clean symbols. This latter function is performed by PHY-layer FEC, which passes up correctly received blocks of data (verified through a cyclic redundancy check) but suppresses erroneous data. For example, in IEEE 802.16e [2], a binary, nonrecursive convolutional encoder with a constraint length of 7 and a native rate of 1/2 operates at the physical layer. Upon receipt of the correctly received data, decoding of the information symbols is attempted, which will fail with a probability given by (11) for $k > 200$. This implies from (11) that, if fewer than $k$ clean symbols (bytes) in the payload are successfully received, then $k - m + e$, $e > 0$, redundant bytes can be sent to reduce the risk of failure. In tests, $e = 4$, resulting in an 8.7% risk of failure because of the exponential decay of the risk evident from (11). The extra data are additional data over and above the adaptively estimated redundant data originally added to the packet.
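Under the model of (11), the margin $e$ can also be chosen to meet a target residual failure risk, as in this sketch; the 1% target in the example is illustrative only.

```python
import math

def extra_symbols_for_target(target, base=0.85, ratio=0.567):
    """Smallest margin e such that base * ratio**e <= target, following (11)."""
    return max(0, math.ceil(math.log(target / base, ratio)))

print(0.85 * 0.567 ** 4)               # ~0.088, close to the 8.7% risk quoted above
print(extra_symbols_for_target(0.01))  # margin e needed for a 1% residual risk
```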

To reduce the network path latency, the number of retransmissions, after an ARQ over the uplink, was limited to one. Recall that there are strict display deadlines for video decoding. Figure 3 shows how ARQ-triggered retransmissions work. In this figure, the payload of packet $x$ is corrupted to such an extent that it cannot be reconstructed. Therefore, in packet $x+1$, some extra redundant data are included, up to a level at which failure is no longer certain. If the extra redundant data are insufficient to reconstruct the original packet’s payload, the packet is simply dropped to avoid further delay. Otherwise, of course, the payload is passed to the H.264/AVC decoder.
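The single-ARQ policy just outlined might be summarized by the following sketch, in which the callables and their names are hypothetical stand-ins for the receiver's rateless decoder and the piggybacked retransmission.

```python
def deliver_with_single_arq(try_decode, fetch_piggybacked_redundancy):
    """Sketch of the one-retransmission policy described above.

    try_decode(extra_bytes)        -> decoded payload, or None on failure
    fetch_piggybacked_redundancy() -> extra redundant bytes carried by the next packet
    """
    payload = try_decode(b"")
    if payload is not None:
        return payload                       # reconstructed without retransmission
    extra = fetch_piggybacked_redundancy()   # the single permitted ARQ
    payload = try_decode(extra)
    return payload                           # None means the packet is simply dropped
```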

In the extended protection scheme with duplication, both original and duplicate packets were protected by rateless channel coding. However, if both packets are found to be corrupted, the receiver decides to request the retransmission of piggybacked redundant data for the least corrupted packet. Piggybacking only takes place for the original stream packets.

4.2. WiMAX Simulation Configuration

To establish the behavior of rateless coding under WiMAX, the ns-2 simulator was augmented with a module [28] that has proved an effective way of modeling IEEE 802.16e’s behavior. For the Gilbert-Elliott model parameters, $p_{gg}$ (the probability of remaining in the good state) was set to 0.95, $p_{bb} = 0.96$, $P_G = 0.02$, and $P_B$ was made variable, taking values 0.05, 0.10, 0.15, and 0.165. These values were not chosen because they represent the underlying physical characteristics of a particular channel but because they represent error statistics [26] seen by an application. Burst errors can be particularly damaging to compressed video streams, because of the predictive nature of source coding. Therefore, the impact of “bursty” errors [29] should be assessed in video-streaming applications. In this case, we were interested in settings that implied high levels of packet corruption and, hence, the risk of significant delay to a video-streaming application.
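To illustrate how such parameter settings translate into per-packet losses, the sketch below generates a loss trace from the two-state Gilbert-Elliott model; it is a stand-in for the channel module used in the simulator, not that module itself.

```python
import random

def gilbert_elliott_trace(n, p_gg=0.95, p_bb=0.96, P_G=0.02, P_B=0.05, seed=1):
    """Generate n per-packet loss decisions from the Gilbert-Elliott model."""
    rng = random.Random(seed)
    state = "good"
    losses = []
    for _ in range(n):
        loss_prob = P_G if state == "good" else P_B
        losses.append(rng.random() < loss_prob)
        stay = p_gg if state == "good" else p_bb
        if rng.random() >= stay:                 # leave the current state
            state = "bad" if state == "good" else "good"
    return losses

trace = gilbert_elliott_trace(10000)
print(sum(trace) / len(trace))   # empirical loss rate, close to PL_mean of (8)
```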

The PHY-layer settings selected for WiMAX simulation are given in Table 4. The antenna heights and transmit power levels are typical ones taken from the standard [30]. The antenna is modeled for comparison purposes as a half-wavelength dipole, whereas a sectored set of antennas on a mast might be used in practice to achieve directivity and, hence, better performance. Similarly, multiple-input multiple-output (MIMO) antennas were not modeled. The IEEE 802.16e Time Division Duplex (TDD) frame length was set to 5 ms, as only this value is supported [31] in the WiMAX Forum adaptation of the standard. The data rate results from the use of one of the mandatory coding modes [30] for a TDD downlink/uplink subframe ratio of 3:1. The downlink was assigned more bandwidth capacity than the uplink to allow the WiMAX base station (BS) to respond, if necessary, to multiple mobile devices. Thus, the parameter settings in Table 4, such as the modulation type and PHY coding rate, are required to achieve a data rate of 10.67 Mbps over the downlink.

4.3. Congestion Sources and Packet Losses

Video was transmitted over the downlink with UDP transport. In order to introduce sources of traffic congestion, an always-available FTP source was introduced with TCP transport to a second mobile station (MS). Likewise, a constant bit-rate (CBR) source with a packet size of 1000 B and an inter-packet gap of 0.03 s was also downloaded to a third MS. WiMAX supports a set of quality-of-service queues. While the CBR and FTP traffic occupy the WiMAX non-rtPS (non-real-time polling service) queue, rather than the rtPS (real-time polling service) queue, they still contribute to packet drops in the rtPS queue for the video if the rtPS packet buffer is already full or nearly full while the nrtPS queue is being serviced.

However, in the first set of tests, in which VBR video was used, buffers were set to such a value that no drops through buffer overflow occurred. In the second set of tests, in which CBR video was used, PB was increased to 0.165. In addition, packet loss through buffer overflow was now introduced as buffer sizes were set to 50 packets (a single Medium Access Control (MAC) Service Data Unit within a WiMAX MAC Protocol Data Unit). This buffer size was selected as appropriate to mobile, real-time applications for which larger buffer sizes might lead both to increased delay and larger memory energy consumption in mobile devices. As a point of comparison, capacity studies [31] suggest up to 16 mobile TV users per mobile WiMAX cell in a “lossy” channel, depending on factors such as the form of scheduling and whether MIMO is activated.

4.4. Video Configuration

The JM 14.2 version of the H.264/AVC codec software was employed, with the EvalVid environment [32] used to reconstruct sequences according to reported packet loss from the simulator and to assess the objective video quality (PSNR) relative to the input YUV raw video. The reference Football video sequence was employed for the WiMAX downlink tests. Football was VBR encoded with 4:2:0 chroma subsampling in CIF at 30 frame/s. Football’s rapid motion is a cause of its coding complexity, making it a difficult test of the system. A frame structure of IPPP… was employed to avoid the data-rate increases and delay associated with periodic I-frames. With all frames except the initial one being predictive P-frames, it was necessary to protect against temporal error propagation in the event of P-frame slices being lost. To ensure higher-quality video, up to 5% intra-refresh macroblocks (MBs), randomly placed, were included in each frame (apart from the first I-frame) to act as anchor points in the event of slice loss.

5. Evaluation

This section first considers the adaptive rateless coding scheme and then the extended version of the protection scheme with duplicated packets.

5.1. Adaptive Scheme Recovery from Corrupted Packets

Initial empirical investigation showed that the provision of redundant data was insufficient unless approximately 10% extra data were added over and above the amount implied by a direct use of the instantaneous value of PL. Moreover, this adjustment varies according to channel conditions, though the change increases monotonically with PL. For example, Table 5 shows some sample values of factor A from (9) found during these investigations.

There are two potential gains from applying the adaptive scheme of formula (10) rather than a fixed factor A according to Table 5. The first is that Table 5’s values can only be arrived at through a considerable number of tests for a particular scenario. The second is that Table 5’s values may overcompensate with redundant data and consequently require extra bandwidth.

Corrupted packets are those that are received but affected by Gilbert-Elliott channel noise to such an extent that they cannot be reconstructed without additional piggybacked redundant data. In general, because extra redundant data is retransmitted, it is likely that most packets will be repaired. However, a rising percentage of corrupted packets will result in increased delay. This delay will affect interactive applications. In Figure 4, the percentage of corrupted packets is recorded according to a variation in the value of QP and increasing probability of data loss in the bad state, PB. Recall from Section 2.2 that varying QP changes the ratio between the data-partitioning packet sizes as well as changing the overall size of the compressed data for any one picture. The conditions for transmission (refer to Section 4.3) for this set of tests included large enough buffers to avoid buffer overflow and reduced risk of outright packet loss from channel conditions. These conditions are unlikely to be met in practice, and, thus, Figure 4’s results represent an upper bound to performance. For the response when there is outright packet loss refer to Section 5.2.

In Figure 4(a), when PB = 0.05, assuming no measurement noise results in zero packet loss, whether a fixed or variable factor is used in the estimation of additional data. (Zero loss is represented by a flat bar in Figure 4(a).) This is because the exact fixed factor for this PB has been selected. For the same reason, when various percentages of Gaussian measurement noise (GN) are added, the percentage of corrupted packets is lower when the fixed factor is used. For both schemes, when the video quality is reduced, the percentage of corrupted packets also falls, as packet lengths shrink. However, when the Gilbert-Elliott PB value is increased but the fixed factor is not changed, then the percentage of corrupted packets increases, as shown for PB = 0.10 and 0.15 in Figures 4(b) and 4(c). Thus, in Figures 4(b) and 4(c), the adaptive scheme is now superior.

An interesting feature of these results is that adding measurement noise to the estimate of packet loss, PL, may actually cause fewer corrupted packets to occur (as more redundant data may be added). Keeping the fixed factor constant in Figures 4(b) and 4(c) represents the situation in which a misestimate of this factor has occurred. From Table 5, it can be seen that the misestimate may only need to be by a small percentage before performance becomes weaker than with the variable-factor scheme.

Table 6 shows the resulting objective video quality and the packet delay, both for the packets that did not receive retransmitted data before they were passed to the decoder and for those that did (corrupted packets). Though Table 6 is for PB = 0.05 and the variable-factor adaptive scheme with 2% additive Gaussian measurement noise, other results were very similar. Because no packets are dropped outright, the video quality is high, as it reflects only compression distortion. The real impact arises from the end-to-end delay introduced by the need to retransmit extra redundant data, because the proportion of such packets can considerably prolong the streaming period. The time taken to send packets is also dependent on the packet sizes, which reflect the QP setting for the VBR video.

5.2. Adaptive Scheme Recovery from Corrupted Packets and Packet Drops

Though research studies often assume VBR transmission, as this represents a way to maintain consistent video quality, in practice a decision is often made by broadcasters to use CBR transmission. CBR streaming has the advantage that storage and/or bandwidth requirements can be reserved in advance. This section tests CBR transmission at two different rates, 500 kbps and 1 Mbps. As mentioned in Section 4.3, the buffer sizes were set to a lower but realistic size introducing a risk of outright packet loss from buffer overflow.

To compensate for the risk of losing vital information in partition A, a duplicate packet carrying partition A was introduced in some of the tests. This is compared in Table 7 for Football with complete duplication of a slice. For example, at 1 Mbps, duplication of partition A adds an extra 37% to the data rate, while duplicating a complete slice obviously adds an extra 100% to the data rate. Because Football is relatively complex to code, its partition-A contribution is smaller, but conversely the impact of packet corruption is higher, because of predictive dependency in the video-encoding process.

Figure 5 shows the number of packets dropped outright when streaming Football with and without data partitioning. In the case of duplication, a packet is lost when no duplicate is available. For data-partitioning streaming, obviously more packets are sent. Duplication in all cases reduces the packet drop rate (because missing packets can be replaced with their duplicates) and in the case of streaming at 1 Mbps without data partitioning, the reduction is comparatively large. The gain for the scheme without data partitioning is relatively larger, though for generally fewer and larger packets. From Figure 6, the luminance PSNR is better with duplication than without, and greater gain results from using duplication at the higher data rate. Video quality at 25–31 dB is approximately equivalent to an ITU-T recommendation P.910 “fair” rating, whereas above 31 dB, it is “good.” Thus, duplication is needed to pass these quality thresholds. However, the main point is that data partitioning results in equivalent objective video quality to slice duplication at a higher data rate.

However, this gain is only achieved with adaptive Raptor channel coding protection, as is apparent from the high levels of packet corruption in Figure 7, as previously reported in Section 5.1. In all but one case, the percentage of packets corrupted is larger for the non-data-partitioned scheme, though the number of packets sent is larger when employing data partitioning. As also previously observed, the main impact of packet corruption is delay from retransmission. The non-data-partitioned scheme incurred greater delay (a mean of 4 to 5 ms for 500 kbps and 1 Mbps with duplication, compared to 2 ms for the equivalent data-partitioned packets), as well as a greater risk of corruption. These delays are generally lower than when streaming VBR video, principally because of the more regular sizes of CBR packets but also because of the reduced number of packets, after packet losses, contributing to the mean delay. Though the delays are generally small, there is the possibility of accumulated delay leading to missed display deadlines for long video streams.

6. Conclusion

Two adaptive rateless channel coding schemes were presented. To reduce the number of corrupted packets, it is possible to empirically estimate the quantity of redundant data that will minimize the number of corrupted packets (assuming some measurement noise in estimating the packet loss rate). However, this estimate must be made before transmission begins and must be made for each possible channel condition. The paper shows that, in practice, a dynamically calculated redundant data overhead can be effective. This scheme will also reduce the FEC overhead. Whichever scheme is used, reducing the number of corrupted packets reduces the overall delay introduced into the video stream. If packets are simply not received, however, then the adaptive rateless coding scheme cannot help. In that case, by duplicating the more important partition-A packets, it is shown that video quality remains acceptable with a moderate increase in the data rate. As CDNs have brought data closer to the end user, latency is likely to be reduced. Consequently, the protection scheme presented represents a way forward for CDN video streaming.