Abstract

Systematic lossy error protection (SLEP) is a robust error resilience mechanism based on the principles of Wyner-Ziv (WZ) coding for video transmission over error-prone networks. In an SLEP scheme, the video bitstream is separated into two parts: a systematic part, consisting of a video sequence transmitted without channel coding, and additional information, consisting of a WZ supplementary stream. This paper presents an adaptive SLEP scheme in which the WZ stream is obtained by frequency filtering in the transform domain. Additionally, the error resilience varies adaptively depending on the characteristics of the compressed video. We show that the proposed SLEP architecture achieves graceful degradation of reconstructed video quality in the presence of increasing transmission errors. Moreover, it provides good performance in terms of both error protection and reconstructed video quality compared with solutions based on coarser quantization, while offering an interesting embedded scheme for digital video format conversion.

1. Introduction

Over the last few years, the Wyner-Ziv coding theorem [1] has found several applications in digital video coding and transmission [2–5]. Among these applications, we consider here the error resilience properties of Wyner-Ziv (WZ) coding, which can strengthen the robustness of transmitted video bitstreams against channel distortions. Many application scenarios are concerned, including broadcast TV and video transmission over mobile networks. In [4], Rane et al. proposed a systematic lossy error protection (SLEP) scheme, in which a supplementary bitstream is generated using WZ coding and transmitted jointly with the unprotected MPEG-2 compressed bitstream. This so-called WZ stream is obtained by coarse quantization and entropy coding of the main MPEG video stream. After transmission over error-prone packet networks, the WZ stream is used to replace lost data from the main stream, leading to a graceful degradation of reconstructed video quality as error conditions worsen.

In this paper, we present a new SLEP scheme based on the solution proposed in [4]. MPEG-2 video compression is considered in this work because of its widespread use in digital video broadcasting [5]. Compared with the scheme described in [4], however, several techniques have been added or improved to enhance the performance of the SLEP architecture. First, the supplementary WZ bitstream is generated using frequency filtering [6] instead of coarser quantization. This modification gives good performance in terms of both error protection and reconstructed video quality compared with coarse quantization. Moreover, frequency filtering can be conveniently combined with decimation to perform video format conversion, which constitutes a significant advantage. Finally, the proposed SLEP scheme is adaptive, so that error resilience varies according to the picture coding mode as well as the motion properties of the video scene.

The remainder of the paper is organized as follows. First, we recall the systematic lossy source-channel coding framework proposed in [4] for error-resilient MPEG-2 broadcasting. Then, we detail our modified SLEP scheme based on combined frequency filtering and unequal picture protection, and demonstrate its advantages over the scheme based on coarser quantization. In Section 3, we give experimental results that illustrate the performance of the proposed algorithm, and then propose a hybrid SLEP scheme that switches adaptively between error concealment and WZ decoding based on motion detection. Finally, concluding remarks are given in Section 4.

2. The Proposed SLEP Scheme

In this section, we describe the proposed SLEP scheme which is based on frequency filtering and unequal picture protection. The block diagram of the systematic lossy source-channel coding framework proposed in [4] for error resilient MPEG-2 broadcasting is shown in Figure 1.

The input video signal is first encoded with MPEG-2, and the resulting bitstream is transmitted over the error-prone packet network without error protection. In addition, a supplementary bitstream is generated using WZ encoding. First, a coarsely quantized version of the main MPEG bitstream is generated and entropy coded. As the entropy-encoded slices are of variable length, shorter slices are padded with zero bytes so that all slices have the same size. Then, systematic Reed-Solomon (RS) codes are applied across the slices of the resulting zero-padded data stream, as illustrated in Figure 2. Only the generated parity slices, which constitute the so-called WZ stream, are transmitted to the decoder. If packet losses occur, the WZ decoder combines the parity packets with the error-prone decoded MPEG video sequence, used as side information, to obtain the error-free WZ description. Since the locations of the lost slices are known, the RS decoder can perform erasure decoding across the error-prone slices. The erroneous slices can therefore be substituted with the corresponding correct but coarser versions, leading to a reconstructed video sequence of better visual quality. If the RS error correction capacity is exceeded, error concealment is applied using the previously decoded frame.
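The zero-padding and cross-slice parity steps can be illustrated with a small Python sketch. For brevity, it swaps the RS code for the simplest systematic erasure code, a single byte-wise XOR parity slice (n − k = 1), which recovers exactly one lost slice per group; an actual SLEP encoder would use an RS(n, k) code over GF(2^8). The function names are hypothetical, and the slices are assumed to already hold the entropy-coded WZ description. At the decoder, the correctly received positions would in practice be regenerated from the decoded MPEG-2 stream (the side information) rather than transmitted.

```python
from typing import List, Optional


def pad_slices(slices: List[bytes]) -> List[bytes]:
    """Zero-fill each WZ slice up to the longest slice length (L_max)."""
    l_max = max(len(s) for s in slices)
    return [s + bytes(l_max - len(s)) for s in slices]


def xor_parity(padded: List[bytes]) -> bytes:
    """Single parity slice, computed byte by byte across the padded slices."""
    parity = bytearray(len(padded[0]))
    for s in padded:
        for i, b in enumerate(s):
            parity[i] ^= b
    return bytes(parity)


def recover_one_erasure(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild the lost slice (marked None); loss positions are known (erasures)."""
    lost = [i for i, s in enumerate(received) if s is None]
    if not lost:
        return list(received)
    if len(lost) > 1:
        # Capacity exceeded: a decoder would fall back to error concealment here.
        raise ValueError("single-parity code can correct only one erasure")
    rebuilt = bytearray(parity)
    for s in received:
        if s is not None:
            for i, b in enumerate(s):
                rebuilt[i] ^= b
    out = list(received)
    out[lost[0]] = bytes(rebuilt)
    return out


# Usage: three WZ slices of unequal length, the second one lost in transit.
wz = [b"\x10\x20\x30", b"\x05", b"\x07\x08"]
padded = pad_slices(wz)
parity = xor_parity(padded)          # only this slice would be transmitted
recovered = recover_one_erasure([padded[0], None, padded[2]], parity)
print(recovered[1])                  # b'\x05\x00\x00' (lost slice plus its padding)
```

The recovered slice still carries its zero padding, which is exactly the overhead that makes slice-length variability costly.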

This transmission scheme is fully compatible with current digital video coding standards. It is more resilient to channel losses [4, 5], while adding negligible complexity with respect to conventional FEC systems. Indeed, it only requires requantization of the DCT coefficients, since the coding parameters (motion vectors, mode decisions) of the main compressed version are reused, and such a requantization process is simple to implement. Note that FEC constitutes a special case of the SLEP scheme, in which the quantization step is the same for both the MPEG and the Wyner-Ziv video encoders.
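A minimal sketch of requantization follows, assuming a plain uniform scalar quantizer with round-half-up on the magnitude; the actual MPEG-2 quantizer (weighting matrices, intra/non-intra rules, quantizer scale) is deliberately not modeled, and `requantize` is a hypothetical helper. It also reproduces the FEC special case: when the WZ step equals the main-stream step, the levels come back unchanged.

```python
import math
from typing import List


def requantize(levels: List[int], q_main: int, q_wz: int) -> List[int]:
    """Re-express quantization levels of step q_main with a coarser step q_wz."""
    out = []
    for level in levels:
        value = level * q_main                        # dequantize (main stream)
        mag = math.floor(abs(value) / q_wz + 0.5)     # round magnitude half up
        out.append(mag if value >= 0 else -mag)       # restore the sign
    return out


print(requantize([10, -3, 1], q_main=4, q_wz=16))   # [3, -1, 0]
print(requantize([10, -3, 1], q_main=4, q_wz=4))    # [10, -3, 1]: the FEC case
```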

In previous work [6], we proposed to replace coarse quantization in the SLEP scheme described in Figure 1 with frequency filtering (hereafter also called frequency scalability). With this approach, only the transform coefficients within a specified zone of a block are processed further, the remaining ones being set to zero. This process is also simple to implement, and corresponds to low-pass filtering if only low-frequency transform coefficients are selected. By directly computing the 2D IDCT of the blocks, we reconstruct a WZ description that is an image of the original picture size but with reduced detail. The corresponding low-pass filtering process, which retains the $M \times M$ low-frequency DCT coefficients of the original $N \times N$ block (in our case, $N = 8$), is hereafter denoted $N{:}M$ (i.e., 8:4 means halving the number of coefficients in both directions: only the $4 \times 4$ lowest DCT coefficients of the original $8 \times 8$ block are retained).
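The zonal low-pass filter can be sketched in pure Python with an orthonormal 8×8 DCT-II written as matrix products ($X = C x C^{T}$, $x = C^{T} X C$). This is only an illustration under that assumption: the helper names are hypothetical, and a real encoder would filter the DCT coefficients already available from MPEG-2 encoding rather than recompute them from pixels.

```python
import math
from typing import List

Matrix = List[List[float]]
N = 8  # block size


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n)) for x in range(n)]
            for k in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


C = dct_matrix(N)
CT = transpose(C)


def zonal_lowpass(block: Matrix, m: int) -> Matrix:
    """8:m filter: keep the m x m lowest DCT coefficients, zero the rest, invert."""
    coef = matmul(matmul(C, block), CT)                    # forward 2D DCT
    kept = [[coef[u][v] if u < m and v < m else 0.0 for v in range(N)]
            for u in range(N)]
    return matmul(matmul(CT, kept), C)                     # full-size 2D IDCT


ramp = [[float(8 * i + j) for j in range(N)] for i in range(N)]
dc_only = zonal_lowpass(ramp, 1)      # 8:1: every pixel becomes the block mean
print(round(dc_only[0][0], 6))        # 31.5
```

The output keeps the original 8×8 size, matching the description above: detail is removed but the picture geometry is unchanged.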

Moreover, low-pass filtering can also be combined with decimation to provide format conversion capabilities. Indeed, it is well known that downsampling and upsampling can also be performed in the transform domain [7–10]. To convert a high-resolution video signal into a low-resolution version of half size in both directions, frequency filtering is first applied to the 2D DCT of the blocks of the original signal to retain only the low-frequency coefficients (Figure 3). Then, a low-resolution video signal is obtained from the 2D IDCT of the blocks after zonal filtering [7]. Such a downscaling method in the transform domain has low computational complexity compared with methods operating in the spatial domain: it avoids full decoding, spatial filtering and decimation, followed by full re-encoding of the video signal. The authors of [10] show that savings of more than 40% can be obtained compared with spatial methods. This characteristic of our scheme should be of great interest given the recent deployment of technologies such as HDTV and mobile video, associated with a wide variety of receiving devices. Professional video manipulation (including video browsing, compositing, editing, and previewing), for which a low-spatial-resolution version of the video content is generally used, is also concerned.
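Transform-domain downscaling by two can be sketched the same way: keep the 4×4 low-frequency coefficients of an 8×8 orthonormal DCT and invert them with a 4-point DCT. The factor $M/N$ compensates for the different normalizations of the two transform sizes so that the block mean is preserved. This is a simplified illustration of the general idea behind the cited transform-domain methods, with hypothetical helper names, not a reproduction of any specific published algorithm.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis of size n (rows are basis vectors)."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n)) for x in range(n)]
            for k in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def downscale_block(block: Matrix, m: int = 4) -> Matrix:
    """Map an n x n pixel block to an m x m block through the DCT domain."""
    n = len(block)
    c_n, c_m = dct_matrix(n), dct_matrix(m)
    coef = matmul(matmul(c_n, block), transpose(c_n))   # n-point 2D DCT
    low = [row[:m] for row in coef[:m]]                  # zonal filtering
    small = matmul(matmul(transpose(c_m), low), c_m)     # m-point 2D IDCT
    return [[v * m / n for v in row] for row in small]   # keep the mean level


ramp = [[float(8 * i + j) for j in range(8)] for i in range(8)]
small = downscale_block(ramp)
print(len(small), len(small[0]))               # 4 4
print(round(sum(map(sum, small)) / 16, 6))     # 31.5, same mean as the 8x8 ramp
```

Unlike the full-size filter, the inverse transform here is only 4×4, which is where the saving in computation relative to spatial filtering plus decimation comes from.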

In the following subsections, we compare the proposed SLEP scheme with the one based on coarse quantization in terms of error protection and reconstructed video quality. The performance of our system has been evaluated over a wide range of symbol error rates and test video contents, as well as different bit rates for the main MPEG-2 bitstream. For comparison, we consider in what follows that the main MPEG-2 bitstream is encoded at the same bit rate as reported in [5], namely, 2 Mbps. For clarity, the SLEP scheme based on coarser quantization is called the SNR scalability-based SLEP scheme, and the one using frequency filtering is called the frequency scalability-based SLEP scheme.

2.1. Error Protection

In order to ensure a fair comparison between the two SLEP schemes, we hereafter consider a fixed bit rate for error protection. This means that both the number of parity slices and their length are set identical. Indeed, the length of a parity slice in the WZ bitstream corresponds to the maximal size of a picture slice generated from the low-bit-rate (coarsely quantized or low-pass filtered) version. Hence, in the present case, we consider the maximal image slice length after entropy coding as a significant parameter. The WZ stream size, denoted $R_{\mathrm{WZ}}$, is given as a function of $L_{\max}$ by

$$R_{\mathrm{WZ}} = (n - k)\, L_{\max},$$

where $n$ and $k$ are the RS encoding parameters, and $L_{\max}$ is the maximum length of an entropy-encoded slice (expressed in bits). Frequency scalability is applied with an 8:4 ratio, that is, only the $4 \times 4$ lowest-frequency DCT coefficients are retained before applying the IDCT. This provides efficient robustness to transmission errors, with a resulting WZ image of good visual quality. Obviously, a higher downscaling ratio could be applied to increase error robustness, but we verified that the reconstructed video quality rapidly decreases as the downscaling ratio increases. The corresponding coarsely quantized version has been chosen so as to match the error-protection bit rate as closely as possible. As mentioned above, only the parity slices are sent to the decoder, once shorter slices have been padded with zeros up to the maximal slice length. Hence, the more variable the slice size, the larger the share of the protection wasted on padding zeros, and the less useful the protection. We define the ratio $L_{\max}/\bar{L}$ as a parameter representative of slice length variability, where $\bar{L}$ is the mean slice length expressed in bits. The higher this ratio, the more variable the encoded slice lengths.
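The quantities used in this comparison can be computed as below. The sketch assumes that the WZ size consists of $n - k$ parity slices of $L_{\max}$ bits each, as in the RS construction across zero-padded slices; the function names, the RS(12, 10) parameters, and the example slice lengths are arbitrary illustrations, not values from Table 1. Note that the padding fraction equals $1 - \bar{L}/L_{\max}$, so it grows directly with the variability ratio.

```python
from typing import List, Tuple


def wz_stream_bits(n: int, k: int, l_max: int) -> int:
    """WZ stream size: (n - k) parity slices, each l_max bits long."""
    return (n - k) * l_max


def slice_variability(lengths_bits: List[int]) -> Tuple[float, float]:
    """Return (L_max / L_mean, share of protected bits that are zero padding)."""
    l_max = max(lengths_bits)
    l_mean = sum(lengths_bits) / len(lengths_bits)
    padding = sum(l_max - length for length in lengths_bits)
    return l_max / l_mean, padding / (l_max * len(lengths_bits))


ratio, wasted = slice_variability([100, 200, 300])
print(ratio, round(wasted, 3))        # 1.5 0.333
print(wz_stream_bits(12, 10, 300))    # 600
```

In this toy case, one third of the protected bits are padding; more homogeneous slice lengths would push the ratio toward 1 and the waste toward 0.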

Table 1 gives the slice length characteristics of WZ images for different CIF test sequences. Only the results obtained for intracoded frames are reported, because it is well known that the distortion associated with intracoded pictures has the greatest impact on overall video quality due to error propagation. We can see that, for a given maximal slice length $L_{\max}$, the variability of slice lengths is higher in the SNR scalability case than in the frequency scalability case. Indeed, by limiting each block to the same number of low-frequency DCT coefficients, frequency scalability clearly makes slice lengths more homogeneous. Likewise, experiments have shown that the variability in the SNR scalability case depends more strongly on video content (amount of motion, detailed areas corresponding to high-frequency content). As zero padding is applied before RS encoding across slices, the increased variability in the SNR scalability case causes more zeros to be added to the useful data and unnecessarily protected. Hence, for the same SLEP parity overhead, the rate of protected image data is higher with a frequency-filtered WZ video bitstream. Consequently, the proposed scheme is more efficient in terms of error resilience than the one based on coarse quantization.

2.2. Reconstructed Video Quality

The reconstructed WZ images exhibit different kinds of artefacts depending on whether SNR or frequency scalability is used. In the case of SNR scalability, both high-frequency and low-frequency DCT coefficients are strongly quantized. Consequently, distortions inside blocks (ringing) as well as the well-known blocking effect appear in the reconstructed image [11]. In the case of frequency scalability, the higher-frequency coefficients are discarded, leading to a smoothing effect in areas of high spatial activity. But as the low-frequency DCT coefficients are left untouched, areas of low and moderate activity are not affected, preserving the overall image quality. Thus, the drawback of frequency filtering is generally less salient than the coding artefacts due to coarse quantization, which add to those already introduced by the original MPEG-2 compression process.

We use the PSNR metric to evaluate the quality of the reconstructed video for the two SLEP schemes at the same protection rate. The PSNR increases by 1.7 dB on average when frequency scalability is used (Table 2), which corresponds to a significant improvement in image quality. We also notice that the variations in PSNR values are strongly related to the spatial activity of the processed video sequences. But even for highly detailed videos (the worst case being Old boat), the results favor the frequency scalability case. We can conclude that, at equal protection bit rate, the proposed SLEP scheme offers better performance in terms of reconstructed video quality of intracoded pictures.
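For reference, a minimal PSNR computation for 8-bit samples (peak value 255) is sketched below; the helper is hypothetical, and frames are flattened into plain sequences. The second usage line converts the reported 1.7 dB average gain into an MSE ratio: $10^{-1.7/10} \approx 0.68$, i.e., roughly a 32% reduction in MSE at equal peak value.

```python
import math
from typing import Sequence


def psnr(ref: Sequence[float], test: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1]), 2))   # 48.13 (MSE = 1)
print(round(10 ** (-1.7 / 10), 3))                  # 0.676: MSE ratio for +1.7 dB
```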

2.3. Unequal Protection Based on Picture Type

We now propose an adaptation of our SLEP scheme that accounts for the I, P, or B picture coding mode during the WZ encoding step. It relies on changing the resolution protected by the WZ stream rather than the RS correction capacity, as proposed in [12]. Typically, an MPEG-2 compressed video sequence is made of a series of groups of pictures (GOPs), each GOP being composed of one intracoded (I) picture followed by intercoded predicted (P) and/or bidirectional (B) pictures. The transmitted data for the latter include motion information (i.e., motion vectors) and intercoded residual error data.

The study described below was conducted for different MPEG-2 bit rates on a set of well-known CIF video sequences published by the Video Quality Experts Group [13]. Table 3 gives, as an example, the results for a target bit rate of 2 Mbps at 30 frames/s; the GOP structure is characterized by $(G, D)$, where $G$ is the distance between I frames and $D$ is the distance between consecutive I or P frames. As a distortion measure, we use the mean squared error (MSE) normalized by the overall picture variance.
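The normalized distortion can be sketched as the MSE divided by the variance of the original picture. Reading "normalized with respect to overall picture variance" this way is our assumption, as is the helper name. One useful reference point follows directly: replacing every sample by the picture mean gives a normalized MSE of exactly 1.

```python
from typing import Sequence


def normalized_mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """MSE between the pictures divided by the variance of the original picture."""
    n = len(original)
    if n == 0 or n != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(original) / n
    variance = sum((x - mean) ** 2 for x in original) / n
    if variance == 0:
        raise ValueError("flat picture: normalized MSE is undefined")
    mse = sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / n
    return mse / variance


print(normalized_mse([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5]))   # 1.0 (mean-only picture)
print(normalized_mse([1, 2, 3, 4], [1, 2, 3, 5]))           # 0.2
```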

It is clear from the results in Table 3 that frequency filtering mostly affects intracoded pictures. In addition, since (I) pictures serve as references for (P)/(B) picture reconstruction, their high distortion propagates to the subsequent pictures throughout the GOP. In the case of intercoded pictures, motion information should be recoverable without loss, since the coding parameters are reused in the WZ stream. Hence, the loss of intercoded residual data results in hardly noticeable blurring around edges, with little or no propagation. Moreover, the benefit of lowering the resolution becomes questionable in picture areas with little or no motion, where the blur remains visible, whereas in moving scenes it is masked by motion.

For these reasons, we propose to adapt the WZ protection to the picture coding type. This protection consists in varying the strength of the 8:$M$ low-pass filtering across the GOP structure, so that the spatial resolution associated with (I) pictures is higher than that of (P) and (B) pictures. This prevents blur from affecting the visual quality of static scenes containing details. In addition, we propose to adapt the decoding strategy to the motion characteristics of the video scene. These adaptations are described in the following section.
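The picture-type-dependent protection can be expressed as a small lookup from coding type to the $M$ of the 8:$M$ filter. The sketch below uses profile labels of the form I4-P2-B1 (8:4 for I, 8:2 for P, 8:1 for B pictures); the parser, the GOP string, and the function names are illustrative assumptions. It also reports how many DCT coefficients per 8×8 block each picture keeps in its WZ description.

```python
import re
from typing import Dict, List


def parse_profile(name: str) -> Dict[str, int]:
    """'I4-P2-B1' -> {'I': 4, 'P': 2, 'B': 1}: the M of the 8:M filter per type."""
    profile = {}
    for part in name.split("-"):
        match = re.fullmatch(r"([IPB])([1-8])", part)
        if match is None:
            raise ValueError(f"bad profile component: {part!r}")
        profile[match.group(1)] = int(match.group(2))
    return profile


def coefficients_per_block(gop: str, profile: Dict[str, int]) -> List[int]:
    """Number of DCT coefficients (M x M) kept per 8x8 block for each picture."""
    return [profile[picture_type] ** 2 for picture_type in gop]


profile = parse_profile("I4-P2-B1")
print(profile)                                        # {'I': 4, 'P': 2, 'B': 1}
print(coefficients_per_block("IBBPBBPBB", profile))   # [16, 1, 1, 4, 1, 1, 4, 1, 1]
```

Most pictures in such a GOP are B pictures, which is why lowering their resolution frees a large share of the fixed protection budget.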

3. Performance Evaluation

In this section, we analyze the performance of the proposed frequency scalability-based SLEP scheme by considering different configurations. Unequal picture protection is applied using the following WZ protection streams:

I(8:4), P(8:2), B(8:1), denoted hereafter I4-P2-B1;

I(8:2), P(8:1), B(8:1), denoted hereafter I2-P1-B1.

We also consider equal picture protection using 8:4, 8:2, and 8:1 (DC only) WZ protection. Finally, the conventional FEC case is presented as a point of comparison. Error concealment is performed by copying the co-located slice from the previous reference frame. We chose previous-frame-copy error concealment because this simple method is widely used in current video decoders, although more sophisticated concealment strategies are available in the literature [14]. It is well known that this method copes well with data losses when motion is slow. The main MPEG-2 bitstream is encoded at 2 Mbps with the IBBP frame structure, and 10% of RS parity information is added. These values are consistent with the experimental conditions described in [4]. The Reed-Solomon encoding parameters were determined experimentally based on the analytical model proposed in [12]. The WZ stream size depends on both the size of the parity slices (the maximum size of a picture slice) and the RS parameters, which determine the number of parity slices. For the same final bit rate overhead, the lighter the reduced stream, the stronger the applied RS protection.
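The trade-off stated in the last sentence can be made concrete with simple arithmetic. Assuming, for illustration only, that the 10% parity budget of a 2 Mbps, 30 frames/s stream is spread evenly over pictures and that each parity slice is $L_{\max}$ bits long, the number of parity slices per picture is the per-picture budget divided by $L_{\max}$. The slice lengths below are hypothetical, and the actual RS parameters in this work come from the model of [12].

```python
def parity_slices_per_picture(rate_bps: float, fps: float, overhead: float,
                              l_max_bits: int) -> int:
    """How many l_max-bit parity slices fit in the per-picture parity budget."""
    budget_bits = overhead * rate_bps / fps      # parity bits available per picture
    return int(budget_bits // l_max_bits)


for l_max in (2000, 1000, 500):                  # hypothetical maximal slice lengths
    print(l_max, parity_slices_per_picture(2e6, 30, 0.10, l_max))
# 2000 3
# 1000 6
# 500 13
```

Under this assumption, halving $L_{\max}$ roughly doubles the number of parity slices, and hence the number of lost slices that erasure decoding can recover.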

In the experiments, we simulate the transmission of video streams over a heterogeneous wired/wireless packet network with unpredictable error bursts. The slice loss process for this scenario can be modelled using a two-state Gilbert-Elliott model [15]. Figures 4 and 5 present experimental results in terms of error resilience and reconstructed video quality, respectively, with an average burst length of 1.2 slices. The 8:1 (DC only) case clearly represents a border case: while it provides the highest error resilience, it also gives an error-free reconstructed sequence of low visual quality.
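A two-state burst-loss channel of this kind can be simulated as follows. For simplicity, the sketch assumes the common Gilbert special case in which every slice is lost in the Bad state and none in the Good state (the full Gilbert-Elliott model allows a loss probability in each state). With a mean burst length $\bar{b}$ and a target loss rate $\varepsilon$, the transition probabilities are $p_{BG} = 1/\bar{b}$ and $p_{GB} = \varepsilon\, p_{BG}/(1-\varepsilon)$, so that the stationary probability of the Bad state equals $\varepsilon$. The function names are hypothetical, and the 5% loss rate in the usage line is an arbitrary example; 1.2 slices is the average burst length used in the experiments.

```python
import random
from typing import List, Tuple


def gilbert_params(loss_rate: float, mean_burst: float) -> Tuple[float, float]:
    """Return (p_GB, p_BG) for a two-state model whose Bad state always loses."""
    if not 0 < loss_rate < 1 or mean_burst < 1:
        raise ValueError("need 0 < loss_rate < 1 and mean_burst >= 1")
    p_bg = 1.0 / mean_burst                        # geometric burst lengths
    p_gb = loss_rate * p_bg / (1.0 - loss_rate)    # stationary P(Bad) = loss_rate
    if p_gb > 1:
        raise ValueError("loss rate too high for this burst length")
    return p_gb, p_bg


def simulate_losses(n_slices: int, loss_rate: float, mean_burst: float,
                    seed: int = 0) -> List[bool]:
    """Slice loss trace; True marks a lost slice."""
    p_gb, p_bg = gilbert_params(loss_rate, mean_burst)
    rng = random.Random(seed)
    bad = rng.random() < loss_rate                 # start in the stationary regime
    lost = []
    for _ in range(n_slices):
        lost.append(bad)
        bad = (rng.random() >= p_bg) if bad else (rng.random() < p_gb)
    return lost


def burst_lengths(lost: List[bool]) -> List[int]:
    """Lengths of consecutive runs of lost slices."""
    runs, current = [], 0
    for is_lost in lost:
        if is_lost:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


trace = simulate_losses(200_000, loss_rate=0.05, mean_burst=1.2, seed=1)
bursts = burst_lengths(trace)
print(sum(trace) / len(trace), sum(bursts) / len(bursts))   # close to 0.05 and 1.2
```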

Figure 4 gives the rate of erroneous displayed slices after WZ correction. According to these results, the lower the resolution used for the WZ stream, the stronger the error resilience of the SLEP scheme: the RS error correction capability practically doubles each time the spatial resolution of the WZ stream is lowered. Unequal picture protection permits rate savings on the protection of intercoded pictures, thus improving the error resilience of (I) pictures.

Figure 5 gives the distortion after transmission over error-prone networks as a function of the slice error rate. The overall distortion, expressed in terms of MSE, results from both uncorrected (concealed) and corrected slices. At the decoding stage, with conventional FEC the corrected slices are identical to the transmitted slices, whereas with WZ protection the corrupted slices are replaced by their frequency-filtered versions. When WZ streams are used, correction therefore induces a distortion due to the lowered resolution, which may cause a loss of texture information, especially for the 8:1 description. However, motion information is preserved, so that error concealment, and its most severe artefacts, can be avoided.

According to the results, unequal picture protection significantly improves the performance of the proposed SLEP scheme. For example, the I4-P2-B1 scheme has an error correction capacity and an associated error concealment distortion equivalent to those of the fixed 8:2 scheme (5 lost slices per picture). An important issue is the distortion associated with slice substitution. For (P) pictures, the protected resolution is 8:2 in both schemes, so the distortion due to downscaling remains unchanged. For (B) pictures, the reconstructed resolution of the coded difference is lowered, causing a small distortion that does not propagate and whose overall effect is negligible. For (I) pictures, in contrast, the distortion due to lowering the resolution is comparable to that of the 8:4 scheme. Hence, ensuring a higher resolution for reconstructed I-pictures greatly improves the quality of the displayed sequence by limiting the distortion due to WZ correction.

In order to clearly analyze the properties of the proposed scheme, several tests have been conducted with different video sequences. Simulation results are given here for two specific extreme cases.

Case one: the Football sequence, which contains high motion and moderate texturing, and is representative of most standard TV programs.

Case two: the Map sequence, a more atypical one, which contains very little motion and is highly textured.

Intuitively, the distortion due to replacing slices with their reduced spatial resolution versions is generally much lower than the distortion caused by error concealment. However, for the Map sequence, error concealment gives more satisfying results than slice substitution, given the absence of motion and the highly detailed content (Table 4). The only suitable SLEP protection schemes for this kind of sequence are those based on 8:4 downscaling of I-pictures (i.e., WZ 8:4 or WZ I4-P2-B1).

Figure 6 shows one displayed I-picture frame from each sequence. The first one corresponds to a standard FEC scheme, the second to our SLEP 8:2 scheme, and the third to the new I4-P2-B1 scheme.

As seen in the Football sequence, with FEC the lost slices are concealed, but the motion information cannot be recovered and the resulting misalignment is visually very annoying. In the WZ case, correction is performed by replacing the slices with their lower-resolution versions, with exact motion compensation. The recovery of the lost slices is very efficient owing to the stronger error resilience of the proposed SLEP method: the improved WZ scheme avoids the large distortions caused by unrecovered motion information. Subjective comparison with standard methods shows the clear benefit of our SLEP scheme when dealing with motion.

For the Map sequence, no motion information is needed since the video content is quasi-static, and thus error concealment is able to estimate the lost content. The failure of FEC correction induces only a slight distortion. On the other hand, the distortion due to the reduced WZ 8:2 resolution becomes visually annoying. Applying unequal picture protection improves the displayed resolution without sacrificing error resilience. The visual distortion introduced by replacing corrupted slices with their low-resolution versions remains slight, and thus the visual quality of the displayed video is higher (see bottom-right image, Figure 6).

Finally, in order to account for the spatiotemporal properties of video scenes, we investigate a hybrid WZ decoder that switches adaptively between previous-frame-copy error concealment and WZ substitution, based on a slice-level motion detection algorithm. Different motion detection algorithms have been described in the literature [16]. We suggest that a motion activity parameter $\upsilon$ be computed for each slice from the coding parameters available in the WZ bitstream (motion vector coordinates, mode decisions), and compared to a predefined motion threshold $T$. As a first approach, the threshold value could be fixed experimentally based on statistics of a compressed video database. For video scenes with little or no motion, like the Map sequence ($\upsilon < T$), error concealment (see top-right image, Figure 6) is preferred to WZ substitution. Otherwise, WZ substitution is applied. In this way, the overall visual quality should be significantly improved for most video sequences, whatever the amount of motion. Such an extension of our SLEP scheme based on motion adaptation is currently under consideration.
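The decision rule of the hybrid decoder can be sketched as follows. The activity measure used here, the mean motion-vector magnitude over a slice, is only one hypothetical choice of $\upsilon$ (the text leaves the measure open), and the threshold value in the example is arbitrary rather than derived from video statistics. Slices with no motion vectors, such as intra-coded ones, are treated as static here, which is also an assumption.

```python
import math
from typing import List, Tuple

MotionVector = Tuple[float, float]


def motion_activity(motion_vectors: List[MotionVector]) -> float:
    """Hypothetical slice activity: mean motion-vector magnitude (in pixels)."""
    if not motion_vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in motion_vectors) / len(motion_vectors)


def choose_repair(motion_vectors: List[MotionVector], threshold: float) -> str:
    """'conceal' (previous-frame copy) if activity < T, else 'wz' substitution."""
    return "conceal" if motion_activity(motion_vectors) < threshold else "wz"


print(choose_repair([(0, 0), (0.5, 0)], threshold=1.0))   # conceal (quasi-static)
print(choose_repair([(3, 4), (6, 8)], threshold=1.0))     # wz (moving slice)
```

Because the motion vectors are carried in the WZ bitstream, this decision can be taken per lost slice before any substitution or concealment is performed.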

4. Conclusion

We have proposed a new systematic lossy error protection scheme based on frequency filtering, which ensures unequal picture protection depending on the picture coding type. This scheme performs better than the classic FEC method as well as previous SLEP implementations, while adding little complexity. It also offers interesting properties for digital video format conversion. Because the SLEP scheme is independent of the broadcast network, other protection tools linked to the network layer could be combined with it independently to improve performance (e.g., if a feedback channel were available, the SLEP protection could be made adaptive). The scheme is also independent of the main stream codec and could be applied to compression techniques other than MPEG-2, including the new H.264/AVC coding standard.

Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable comments.