Département d'Opto-Acousto-Electronique, Institut d'Electronique de Microélectronique et de Nanotechnologie, UMR 8520, Université de Valenciennes, Le Mont Houy, 59313 Valenciennes, Cedex 9, France
Systematic lossy error protection (SLEP) is a robust error-resilience mechanism, based on the principles of Wyner-Ziv (WZ) coding, for video transmission over error-prone networks. In an SLEP scheme, the video bitstream is separated into two parts: a systematic part consisting of a video sequence transmitted without channel coding, and additional information consisting of a supplementary WZ stream. This paper presents an adaptive SLEP scheme in which the WZ stream is obtained by frequency filtering in the transform domain. Additionally, the level of error resilience adapts to the characteristics of the compressed video. We show that the proposed SLEP architecture achieves graceful degradation of reconstructed video quality as transmission errors increase. Moreover, compared with solutions based on coarser quantization, it performs well in terms of both error protection and reconstructed video quality, while also providing an embedded mechanism for digital video format conversion.
1. Introduction
Over the last few years, the Wyner-Ziv coding theorem [1] has found several applications in digital video coding and transmission [2–5]. Among these applications, we consider here the error resilience properties of Wyner-Ziv (WZ) coding, which can strengthen the robustness of transmitted video bitstreams against channel distortions. Many application scenarios are concerned, including broadcast TV and video transmission over mobile networks. In [4], Rane et al. proposed a systematic lossy error protection (SLEP) scheme, in which a supplementary bitstream is generated using WZ coding and transmitted jointly with the unprotected MPEG-2 compressed bitstream. This so-called WZ stream is obtained by coarse quantization and entropy coding of the main MPEG video stream. After transmission over error-prone packet networks, the WZ stream is used to replace the lost data of the main stream, leading to a graceful degradation of reconstructed video quality as error conditions worsen.
In this paper, we present a new SLEP scheme based on the solution proposed in [4]. MPEG-2 video compression is considered in this work because of its widespread use in digital video broadcasting [5]. Compared with the scheme described in [4], however, several techniques have been added or improved to enhance the performance of the SLEP architecture. First, the supplementary WZ bitstream is generated using frequency filtering [6] instead of coarser quantization. This modification improves both error protection and reconstructed video quality compared to coarse quantization. Moreover, frequency filtering can be conveniently combined with decimation to perform video format conversion, which is a significant advantage. Finally, the proposed SLEP scheme is adaptive, so that error resilience varies according to the picture encoding mode as well as the motion properties of the video scene.
The remainder of the paper is organized as follows. In Section 2, we first review the systematic lossy source-channel coding framework proposed in [4] for error-resilient MPEG-2 broadcasting. Then, we detail our modified SLEP scheme based on combined frequency filtering and unequal picture protection, and demonstrate its advantages over the scheme based on coarser quantization. In Section 3, we give experimental results that illustrate the performance of the proposed algorithm, and then propose a hybrid SLEP scheme, which switches adaptively between error concealment and WZ decoding based on motion detection. Finally, concluding remarks are given in Section 4.
2. The Proposed SLEP Scheme
In this section, we
describe the proposed SLEP scheme which is based on frequency filtering and
unequal picture protection. The block diagram of the systematic lossy source-channel
coding framework proposed in [4] for error resilient MPEG-2 broadcasting is
shown in Figure 1.
Figure 1: Block diagram of the SLEP scheme proposed in [4].
The input video signal is first encoded by means of MPEG-2, and the resulting bit stream is transmitted over the error-prone packet network without error protection. In addition, a supplementary bit stream is generated using WZ encoding. First, a coarsely quantized version of the main MPEG bit stream is generated and entropy coded. As the entropy-encoded slices are of variable length, shorter slices are padded with zero bytes so that all slices have the same size. Then, systematic Reed-Solomon (RS) codes are applied across the slices of the resulting zero-padded data stream, as illustrated in Figure 2. Only the generated parity slices, which constitute the so-called WZ stream, are transmitted to the decoder. If packet losses occur, the WZ decoder combines the parity packets with the error-prone decoded MPEG video sequence, which serves as side information, in order to obtain the error-free WZ description. Since the locations of the lost slices are known, the RS decoder can perform erasure decoding across the error-prone slices. The erroneous slices can therefore be substituted with the corresponding correct but coarser versions, leading to a reconstructed video sequence of better visual quality. If the RS error correction capacity is exceeded, error concealment is applied using the previously decoded frame.
Figure 2: Reed-Solomon encoding across slices.
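The zero-padding and RS erasure cycle described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the coder used in [4]: slices are lists of byte values, the code is a systematic Reed-Solomon code in evaluation form over the prime field GF(257) (chosen only because its arithmetic is plain modular integer arithmetic; practical systems usually work over GF(2^8), partly because GF(257) parity symbols can take the value 256 and do not fit in a byte), and the names `rs_encode_slices` and `rs_erasure_decode` are hypothetical. Column by column, data slice i is the value at x = i of a polynomial of degree below k, and the n − k parity slices are its values at x = k, ..., n − 1, so any k surviving slices determine the lost ones.

```python
# Minimal sketch: systematic Reed-Solomon erasure coding across zero-padded slices.
# Assumption: evaluation-form RS over the prime field GF(257) instead of GF(2^8).
P = 257


def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def pad(slice_bytes, length):
    """Zero-fill a slice up to the common (maximal) slice length."""
    return list(slice_bytes) + [0] * (length - len(slice_bytes))


def rs_encode_slices(slices, n):
    """Return the n - k parity slices (the 'WZ stream') for k variable-length slices."""
    k = len(slices)
    length = max(len(s) for s in slices)
    data = [pad(s, length) for s in slices]
    xs = list(range(k))
    return [
        [lagrange_eval(xs, [row[c] for row in data], x) for c in range(length)]
        for x in range(k, n)
    ]


def rs_erasure_decode(received, parity):
    """Recover lost slices (None entries) from any k surviving slices, zero-padded."""
    k = len(received)
    length = len(parity[0])
    # Row index == evaluation point: data slices at 0..k-1, parity slices at k..n-1.
    rows = [pad(r, length) if r is not None else None for r in received] + parity
    known = [i for i, r in enumerate(rows) if r is not None][:k]
    if len(known) < k:
        raise ValueError("more erasures than parity slices: RS capacity exceeded")
    recovered = []
    for i in range(k):
        if rows[i] is not None:
            recovered.append(rows[i])
        else:
            recovered.append(
                [lagrange_eval(known, [rows[j][c] for j in known], i) for c in range(length)]
            )
    return recovered


slices = [b"abc", b"de", b"fghij"]      # k = 3 entropy-coded slices of unequal length
parity = rs_encode_slices(slices, n=5)  # 2 parity slices, each 5 symbols long
lost = [None, b"de", None]              # two slices erased in transit
print(rs_erasure_decode(lost, parity))  # -> [[97, 98, 99, 0, 0], [100, 101, 0, 0, 0], [102, 103, 104, 105, 106]]
```

Recovered slices come back zero-padded to the maximal length, which is exactly the padding overhead discussed in Section 2.1.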
This transmission scheme is fully compatible with current digital video coding standards. It is more resilient to channel losses [4, 5] while adding negligible complexity with respect to conventional FEC systems. Indeed, it only requires requantization of the DCT coefficients, since the coding parameters of the main compressed version (motion vectors, mode decisions) are reused. Such a requantization process is simple to implement in the SLEP scheme. We note that FEC is a special case of the SLEP scheme in which the quantization step is the same for both the MPEG and Wyner-Ziv video encoders.
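To make the requantization step concrete, the following Python sketch maps the quantization levels of the main stream to coarser WZ levels with a plain uniform scalar quantizer. This is an assumption for illustration only: MPEG-2 quantization also involves weighting matrices and intra/inter-specific rounding, which are ignored here, and the function name `requantize` and the step sizes are hypothetical. The second call illustrates the remark above: with equal step sizes, the WZ description coincides with the main stream, which is the FEC special case.

```python
import math


def requantize(levels, q_main, q_wz):
    """Requantize DCT levels coded with step q_main to a coarser step q_wz.

    Uniform reconstruction (level * step) followed by rounding half away from zero;
    motion vectors and mode decisions are not touched, they are simply reused.
    """
    out = []
    for level in levels:
        value = level * q_main                      # dequantized coefficient
        mag = math.floor(abs(value) / q_wz + 0.5)   # round |value| / q_wz to nearest
        out.append(mag if value >= 0 else -mag)
    return out


block_levels = [12, -5, 3, 0, 1]                     # hypothetical levels of one block
print(requantize(block_levels, q_main=4, q_wz=16))   # -> [3, -1, 1, 0, 0]
print(requantize(block_levels, q_main=4, q_wz=4))    # -> [12, -5, 3, 0, 1] (FEC case)
```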
We proposed in a previous work [6] to replace coarse quantization in the SLEP scheme of Figure 1 with frequency filtering (hereafter also called frequency scalability). In this case, only the transform coefficients within a specified zone of a block are processed further, and the remaining ones are set to zero. This process is also simple to implement, and corresponds to low-pass filtering if only low-frequency transform coefficients are selected. By directly computing the 2D IDCT of the filtered blocks, we reconstruct a WZ description that has the original picture size but reduced detail. The corresponding low-pass filtering process, which retains the $M \times M$ lowest-frequency DCT coefficients of the original $N \times N$ block (in our case, $N = 8$), is denoted hereafter $N{:}M$ (e.g., 8:4 means halving the number of coefficients in both directions: only the $4 \times 4$ lowest-frequency DCT coefficients of the original $8 \times 8$ block are retained).
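The $N{:}M$ filtering defined above can be sketched in pure Python. The sketch assumes an orthonormal floating-point DCT-II, and the helper names `dct_matrix` and `frequency_filter` are hypothetical. It computes the 2D DCT of an $N \times N$ block, zeroes every coefficient outside the top-left $M \times M$ zone, and applies the full-size 2D IDCT, so the result keeps the original block size but loses detail.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that the 2D DCT of X is C X C^T."""
    c = []
    for k in range(n):
        alpha = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([alpha * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return c


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frequency_filter(block, m):
    """N:M zonal low-pass filter: keep the M x M lowest DCT coefficients, same block size."""
    n = len(block)
    c = dct_matrix(n)
    coeffs = matmul(matmul(c, block), transpose(c))          # forward 2D DCT
    kept = [[coeffs[u][v] if u < m and v < m else 0.0 for v in range(n)] for u in range(n)]
    return matmul(matmul(transpose(c), kept), c)             # 2D IDCT at full size


# Hypothetical 8x8 block: a horizontal ramp plus a fine checkerboard texture.
block = [[16 * j + (8 if (i + j) % 2 else 0) for j in range(8)] for i in range(8)]
smoothed = frequency_filter(block, 4)   # 8:4 filtering removes the checkerboard detail
print([round(v) for v in smoothed[0]])
```

A block containing only frequencies inside the retained zone passes through unchanged, whereas finer texture is smoothed away, which is the behaviour discussed in Section 2.2.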
Moreover, low-pass filtering can be combined with decimation to provide format conversion capabilities. Indeed, it is well known that downsampling and upsampling can also be performed in the transform domain [7–10]. To convert a high-resolution video signal into a low-resolution version of half size in both directions, frequency filtering is first applied to the 2D DCT of the $8 \times 8$ blocks of the original signal to retain only the $4 \times 4$ low-frequency coefficients (Figure 3). Then, a low-resolution video signal is obtained from the $4 \times 4$ 2D IDCT of the blocks after zonal filtering [7]. Such transform-domain downscaling has low computational complexity compared with spatial-domain methods, since it avoids full decoding, spatial filtering and decimation, and then full re-encoding of the video signal. The authors of [10] show that computational savings of more than 40% can be obtained compared with spatial methods. This characteristic of our scheme should be of great interest given the recent deployment of technologies such as HDTV and mobile video, associated with a wide variety of receiving devices. Professional video manipulation (including video browsing, compositing, editing, and previewing) is also concerned, since a low-spatial-resolution version of the video content is generally used for these tasks.
Figure 3: Spatial resolution downsizing by frequency filtering.
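A half-size conversion of this kind can be sketched as follows in pure Python. Assumptions: orthonormal floating-point DCTs, picture dimensions that are multiples of 8, and hypothetical helper names (`downscale_block`, `downscale_picture`). The factor $M/N$ applied to the retained coefficients is needed with orthonormal transforms so that a flat block keeps its original intensity after the smaller $M \times M$ IDCT.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def downscale_block(block, m):
    """Turn an N x N pixel block into an M x M block via zonal filtering in the DCT domain."""
    n = len(block)
    cn, cm = dct_matrix(n), dct_matrix(m)
    coeffs = matmul(matmul(cn, block), transpose(cn))                   # N x N 2D DCT
    low = [[coeffs[u][v] * m / n for v in range(m)] for u in range(m)]  # keep M x M, rescale
    return matmul(matmul(transpose(cm), low), cm)                       # M x M 2D IDCT


def downscale_picture(picture, n=8, m=4):
    """Apply downscale_block to every N x N block (picture dimensions multiple of N)."""
    h, w = len(picture), len(picture[0])
    out = [[0.0] * (w * m // n) for _ in range(h * m // n)]
    for by in range(0, h, n):
        for bx in range(0, w, n):
            block = [row[bx:bx + n] for row in picture[by:by + n]]
            small = downscale_block(block, m)
            for i in range(m):
                for j in range(m):
                    out[by * m // n + i][bx * m // n + j] = small[i][j]
    return out


# Hypothetical 16x16 picture made of four flat 8x8 blocks -> 8x8 half-size picture.
picture = [[10.0 * (1 + 2 * (y // 8) + x // 8) for x in range(16)] for y in range(16)]
half = downscale_picture(picture)
print(len(half), len(half[0]))  # -> 8 8
```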
In the following subsections, we compare the proposed SLEP scheme with the one based on coarse quantization in terms of error protection and reconstructed video quality. The performance of our system has been evaluated over a wide range of symbol error rates and test video contents, as well as different bit rates for the main MPEG-2 bit stream. For comparison, we assume in what follows that the main MPEG-2 bit stream is encoded at the same bit rate as reported in [5], namely 2 Mbps. For clarity, the SLEP scheme based on coarser quantization is called the SNR scalability-based SLEP scheme, and the one using frequency filtering is called the frequency scalability-based SLEP scheme.
2.1. Error Protection
To ensure a fair comparison between the two SLEP schemes, we consider hereafter a fixed bit rate for error protection: both the number of parity slices and their length are set identical. The length of a parity slice in the WZ bit stream corresponds to the maximal size of a picture slice generated from the low-bit-rate (coarsely quantized or low-pass filtered) version. Hence, in the present case, the maximal image slice length after entropy coding is a significant parameter. The WZ stream size, denoted $S_{WZ}$, is given as a function of this maximal length by $S_{WZ} = (n - k)\,L_{\max}$, where $n$ and $k$ are the RS encoding parameters, and $L_{\max}$ is the maximum length of an entropy-encoded slice (expressed in bits). Frequency scalability is applied with an 8:4 ratio, that is, only the $4 \times 4$ lowest-frequency DCT coefficients are retained before applying the IDCT. This provides efficient robustness to transmission errors, with a resulting WZ image of good visual quality. A higher downscaling ratio could obviously be applied to increase error robustness, but we verified that the reconstructed video quality rapidly decreases as the downscaling ratio increases. The corresponding coarsely quantized version has been chosen so as to match the error protection bit rate as closely as possible. As mentioned above, only the parity slices are sent to the decoder, after shorter slices have been padded with zeros up to the maximal slice length. Hence, the more variable the image slice size, the more protection is wasted on padding zeros, and the less useful the protection is. We define the ratio $L_{\max}/L_{\text{mean}}$ as a parameter representative of slice length variability, where $L_{\text{mean}}$ is the mean slice length expressed in bits. The higher this ratio, the more variable the encoded slice lengths.
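The link between slice-length variability and wasted protection can be made concrete with a small Python sketch. It assumes that one RS codeword of parameters $(n, k)$ spans $k$ entropy-coded slices, uses the WZ size $S_{WZ} = (n-k)\,L_{\max}$ and the ratio $L_{\max}/L_{\text{mean}}$, and the slice lengths are made-up values, not data from Table 1; `slice_statistics` is a hypothetical name. The share of protected bits that carry actual slice data is $\sum_i L_i/(k\,L_{\max}) = L_{\text{mean}}/L_{\max}$, i.e. simply the inverse of the variability ratio.

```python
def slice_statistics(slice_lengths, n, k):
    """Slice-length statistics for one RS codeword spanning k slices.

    Returns (S_WZ in bits, variability ratio L_max / L_mean, useful fraction).
    """
    if len(slice_lengths) != k:
        raise ValueError("expected one length per data slice")
    l_max = max(slice_lengths)
    l_mean = sum(slice_lengths) / k
    s_wz = (n - k) * l_max                     # parity slices are L_max bits long
    ratio = l_max / l_mean                     # 1.0 means perfectly homogeneous slices
    useful = sum(slice_lengths) / (k * l_max)  # share of protected bits that are not padding
    return s_wz, ratio, useful


# Hypothetical slice lengths (bits) for k = 4 slices protected by 2 parity slices.
print(slice_statistics([800, 1000, 1200, 1000], n=6, k=4))
```

Here one sixth of the protected bits are padding; with equal slice lengths, the ratio drops to 1 and no protection is wasted.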
Table 1 gives the slice length characteristics of WZ images for different CIF test sequences. Only the results obtained for intracoded frames are reported, because it is well known that the distortion of intracoded pictures has the largest impact on overall video quality due to error propagation. We can see that, for a given maximal slice length $L_{\max}$, slice length variability is higher in the SNR scalability case than in the frequency scalability case. Indeed, by limiting each block to the same number of low-frequency DCT coefficients, frequency scalability clearly makes slice lengths more homogeneous. Likewise, experiments have shown that the variability in the SNR scalability case depends more strongly on video content (amount of motion, detailed areas corresponding to high-frequency content). As zero padding is applied before RS encoding across slices, the increased variability in the SNR scalability case causes more zeros to be added to the useful data and unnecessarily protected. Hence, for the same SLEP parity overhead, the protected image data rate is higher in the case of a frequency-filtered WZ video bit stream. Consequently, the proposed scheme provides better error resilience efficiency than the one based on coarse quantization.
Table 1: Comparison of Wyner-Ziv image characteristics for SNR and frequency scalability for different test images ($L_{\max}$ = maximum slice length, in bits; $L_{\text{mean}}$ = mean slice length, in bits).
2.2. Reconstructed Video Quality
The reconstructed WZ images exhibit different kinds of artefacts depending on whether SNR or frequency scalability is used. In the case of SNR scalability, both high-frequency and low-frequency DCT coefficients are strongly quantized. Consequently, distortions inside blocks (ringing) as well as the well-known blocking effect appear in the reconstructed image [11]. In the case of frequency scalability, the higher-frequency coefficients are discarded, leading to a smoothing effect in areas of high spatial activity. But as the low-frequency DCT coefficients are left untouched, areas of low and moderate activity are not affected, which preserves the overall image quality. Thus, the drawback of frequency filtering is generally less salient than the coding artefacts due to coarse quantization, which reinforce those already introduced by the original MPEG-2 compression process.
We use the PSNR metric to evaluate the quality of the reconstructed video for the two SLEP schemes at the same protection rate. PSNR increases by 1.7 dB on average when frequency scalability is used (Table 2), which corresponds to a significant improvement in image quality. We also notice that the variations in PSNR values are strongly related to the spatial activity of the processed video sequences. But even for highly detailed videos (the worst case being Old boat), the results favor frequency scalability. We can conclude that, for the same protection bit rate, the proposed SLEP scheme offers better reconstructed video quality for intracoded pictures.
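For reference, the PSNR used in this comparison can be computed as below for 8-bit pictures. This is the standard definition $10\log_{10}(255^2/\text{MSE})$; representing pictures as flat pixel sequences and the name `psnr` are assumptions of the sketch.

```python
import math


def psnr(reference, reconstructed, peak=255):
    """PSNR in dB between two equally sized 8-bit pictures given as flat sequences."""
    if len(reference) != len(reconstructed):
        raise ValueError("pictures must have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(peak ** 2 / mse)


ref = [0, 0, 0, 0]
rec = [1, 1, 1, 1]                # MSE = 1
print(round(psnr(ref, rec), 2))   # -> 48.13
```

As a worked reading of the result above, an average PSNR gain of 1.7 dB corresponds to an MSE lower by a factor of $10^{0.17} \approx 1.48$, i.e. a reduction of roughly one third.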
Table 2: Comparison of PSNR results for different intracoded images and
same protection rate.
2.3. Unequal Protection Based on Picture Type
We now propose an adaptation of our SLEP scheme that accounts for the I, P, or B picture coding mode during the WZ encoding step. It relies on changing the resolution protected by the WZ stream rather than the RS capacity, as proposed in [12]. Typically, an MPEG-2 compressed video sequence is made of a series of groups of pictures (GOPs), each GOP being composed of one intracoded (I) picture and the subsequent intercoded predicted (P) and/or bidirectional (B) pictures. The transmitted data of the latter include motion information (i.e., motion vectors) and intercoded residual error data.
The study described below was conducted for different MPEG-2 bit rates on a set of well-known CIF video sequences published by the Video Quality Experts Group (VQEG) [13]. As an example, Table 3 gives the results for the targeted bit rate of 2 Mbps at 30 frames/s; the GOP characteristics are given by $(N_{\mathrm{GOP}}, M_{\mathrm{GOP}})$, where $N_{\mathrm{GOP}}$ defines the distance between I frames and $M_{\mathrm{GOP}}$ is the distance between consecutive I or P frames. As a distortion measure, we use the mean squared error (MSE) normalized by the overall picture variance.
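The normalized MSE can be sketched as follows, assuming pixel values given as flat sequences and interpreting "overall picture variance" as the population variance of the original picture's pixels (an interpretation, not stated explicitly in the text); `normalized_mse` is a hypothetical name. A value of 1 corresponds to the distortion obtained by replacing every pixel with the picture mean.

```python
from statistics import pvariance


def normalized_mse(original, decoded):
    """MSE between two pictures (flat pixel sequences) divided by the original's variance."""
    if len(original) != len(decoded):
        raise ValueError("pictures must have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    variance = pvariance(original)  # population variance of the original picture
    if variance == 0:
        raise ValueError("flat picture: normalization undefined")
    return mse / variance


print(normalized_mse([1, 2, 3, 4], [1, 2, 3, 5]))  # MSE 0.25 divided by variance 1.25
```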
Table 3: Average distortion as a function of picture type.
It is clear from the results in Table 3 that frequency filtering mostly affects intracoded pictures. In addition, since (I) pictures serve as the reference for (P)/(B) picture reconstruction, their high distortion propagates to the subsequent pictures of the entire GOP. In the case of intercoded pictures, motion information should be recovered without loss. Hence, the loss of intercoded residual data results in hardly noticeable blurring around edges, with little or no propagation. Moreover, lowering the resolution becomes questionable for picture areas with little or no motion, whereas in moving scenes the resulting blur is masked by motion.
For these reasons, we propose to adapt the WZ protection to the picture coding type. This protection consists of varying the strength of the 8:$M$ low-pass filtering over the GOP structure, so that the spatial resolution associated with (I) pictures is higher than that of (P) and (B) pictures. This prevents blur from degrading the visual quality of static scenes containing details. In addition, we propose to adapt the decoding strategy to the motion characteristics of the video scene. Both adaptations are described in the following section.
3. Performance Evaluation
In this section, we analyze the performance of the proposed frequency scalability-based SLEP scheme by considering different configurations. Unequal picture protection is applied using the following WZ protection streams:
I(8:4), P(8:2), B(8:1), noted hereafter I4-P2-B1;
I(8:2), P(8:1), B(8:1), noted hereafter I2-P1-B1.
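The protection profiles above amount to a lookup from picture type to the retained $M \times M$ zone. The Python sketch below encodes them and, as a rough illustration only, reports the average share of the 64 DCT coefficients kept per block over a GOP. Assumptions: the 12-picture GOP pattern is hypothetical (the text only states the IBBP structure), the profile table and function names are hypothetical, and coefficient counts are merely a crude proxy for WZ bit rate, since actual slice sizes depend on entropy coding and content.

```python
# Retained M x M zone (out of 8 x 8) per picture type for each protection profile.
PROFILES = {
    "I4-P2-B1": {"I": 4, "P": 2, "B": 1},
    "I2-P1-B1": {"I": 2, "P": 1, "B": 1},
    "8:4": {"I": 4, "P": 4, "B": 4},
    "8:2": {"I": 2, "P": 2, "B": 2},
    "8:1": {"I": 1, "P": 1, "B": 1},
}


def filter_for_gop(gop_pattern, profile):
    """Return the 8:M label applied to each picture of a GOP pattern such as 'IBBP...'."""
    zones = PROFILES[profile]
    return [f"8:{zones[t]}" for t in gop_pattern]


def mean_retained_fraction(gop_pattern, profile):
    """Average share of the 64 DCT coefficients kept per block over the GOP."""
    zones = PROFILES[profile]
    return sum(zones[t] ** 2 for t in gop_pattern) / (64 * len(gop_pattern))


gop = "IBBPBBPBBPBB"  # hypothetical 12-picture GOP with the IBBP structure
print(filter_for_gop(gop, "I4-P2-B1")[:4])  # -> ['8:4', '8:1', '8:1', '8:2']
for name in ("I4-P2-B1", "I2-P1-B1", "8:2"):
    print(name, mean_retained_fraction(gop, name))
```

Under this proxy, I4-P2-B1 keeps fewer coefficients on average than the fixed 8:2 profile while giving I pictures the 8:4 resolution, which is consistent with the rate savings on intercoded pictures discussed below.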
We also consider equal picture protection using 8:4, 8:2, and 8:1 (DC only) WZ protection. Finally, the conventional FEC case is presented as a point of comparison. Error concealment is achieved by copying the co-located slice from the previous reference frame. We chose previous-frame-copy error concealment because this simple method is widely used in current video decoders, although more sophisticated concealment strategies are available in the literature [14]. It is well known that this method copes well with data losses when motion is slow. The main MPEG-2 bit stream is encoded at 2 Mbps with the IBBP frame structure, and a 10% RS parity overhead is added. These values are consistent with the experimental conditions described in [4]. The Reed-Solomon encoding parameters were determined experimentally based on the analytical model proposed in [12]. The WZ stream size depends on both the size of the parity slice (maximum size of a picture slice) and the RS parameters, which determine the number of parity slices. Hence, for the same final bit rate overhead, the lighter the reduced stream, the larger the number of parity slices and the stronger the applied RS protection.
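The trade-off in the last sentence can be checked with simple arithmetic. The sketch below uses the 2 Mbps main stream, the 10% parity overhead, and the 30 frames/s rate stated in the text, but assumes for simplicity that the parity budget is spread evenly over all pictures (the actual allocation may differ) and uses hypothetical $L_{\max}$ values; the function name is also hypothetical.

```python
def parity_slices_per_picture(bitrate_bps, overhead_percent, fps, l_max_bits):
    """Number of L_max-bit parity slices that fit in an evenly shared per-picture budget."""
    budget_bits = bitrate_bps * overhead_percent // (100 * fps)  # integer bits per picture
    return budget_bits // l_max_bits


# 2 Mbps main stream, 10% parity overhead, 30 frames/s -> 6666 bits of parity per picture.
for l_max in (2400, 1200, 600):  # hypothetical maximal WZ slice lengths (bits)
    print(l_max, parity_slices_per_picture(2_000_000, 10, 30, l_max))
```

Halving $L_{\max}$ from 2400 to 1200 and then 600 bits raises the number of parity slices from 2 to 5 and then 11, i.e. roughly doubling the erasure correction capacity each time.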
In the experiments, we simulate the transmission of video streams over a heterogeneous wired/wireless packet network with unpredictable error bursts. The slice loss process for this scenario can be modelled using a two-state Gilbert-Elliott model [15].
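A loss trace of this kind can be generated with the sketch below. It implements the simplest two-state variant (often called the Gilbert model), in which every slice sent in the bad state is lost and every slice sent in the good state arrives; the general Gilbert-Elliott model allows a loss probability in each state. The transition probabilities are derived from a target loss rate and mean burst length ($1/p_{BG}$); the 1.2-slice burst length matches the experiments below, while the 10% loss rate and the function names are hypothetical.

```python
import random


def gilbert_elliott_losses(num_slices, loss_rate, mean_burst, seed=0):
    """Simulate slice losses with a two-state (good/bad) Markov chain.

    Simplified Gilbert variant: slices in the bad state are lost, others arrive.
    """
    if mean_burst < 1.0 or not 0.0 < loss_rate < 1.0:
        raise ValueError("need mean_burst >= 1 and 0 < loss_rate < 1")
    p_bg = 1.0 / mean_burst                      # bad -> good; mean burst = 1 / p_bg
    p_gb = loss_rate * p_bg / (1.0 - loss_rate)  # good -> bad; stationary P(bad) = loss_rate
    if p_gb > 1.0:
        raise ValueError("loss_rate too high for this mean burst length")
    rng = random.Random(seed)
    bad = rng.random() < loss_rate               # start in the stationary distribution
    losses = []
    for _ in range(num_slices):
        losses.append(bad)
        bad = rng.random() >= p_bg if bad else rng.random() < p_gb
    return losses


def loss_statistics(losses):
    """Return (slice loss rate, mean burst length) of a boolean loss trace."""
    bursts, run = [], 0
    for lost in losses:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    rate = sum(losses) / len(losses)
    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    return rate, mean_burst


trace = gilbert_elliott_losses(200_000, loss_rate=0.10, mean_burst=1.2, seed=1)
print(loss_statistics(trace))  # close to (0.10, 1.2) for long traces
```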
Figures 4 and 5 present experimental results in terms of error resilience and reconstructed video quality, respectively, with an average burst length of 1.2 slices. The 8:1 (DC only) case clearly represents a borderline case: it provides the highest error resilience, but the error-free sequence it reconstructs has low visual quality.
Figure 4: Evolution of the displayed erroneous slice rate after RS correction as a function of the incoming slice error rate.
Figure 5: Evolution of the normalized MSE after transmission and correction as a function of the incoming slice error rate.
Figure 4 gives the rate of erroneous slices displayed after WZ correction. According to these results, the lower the resolution used for the WZ stream, the stronger the error resilience of the SLEP scheme: the RS error correction capability roughly doubles each time the spatial resolution of the WZ stream is lowered. Unequal picture protection yields rate savings on the protection of intercoded pictures, thus improving the error resilience of (I) pictures.
Figure 5 gives the distortion after transmission over error-prone networks as a function of the slice error rate. The overall distortion, expressed in terms of MSE, results from both uncorrected (concealed) and corrected slices. At the decoding stage, slices corrected by conventional FEC are identical to the transmitted slices, whereas with WZ protection the corrupted slices are replaced by their frequency-filtered versions. When WZ streams are used, correction therefore introduces a distortion due to the lowered resolution, which can cause the loss of texture information, especially for the 8:1 description. However, motion information is preserved, so that error concealment and its most severe artefacts can be avoided.
According to the results, unequal error protection significantly improves the performance of the proposed SLEP scheme. For example, the I4-P2-B1 scheme has an error correction capacity, and an associated error concealment distortion, equivalent to those of the fixed 8:2 scheme (5 lost slices per picture). An important issue is the distortion associated with slice substitution. For (P) pictures, the protected resolution is 8:2 in both schemes, so the distortion due to downscaling remains unchanged. For (B) pictures, the reconstructed resolution of the coded difference is lowered, causing a small distortion that does not propagate and whose overall effect is negligible. By contrast, the distortion due to lowering the I-picture resolution is only comparable to that of the 8:4 scheme. Hence, ensuring a higher resolution for reconstructed I-picture frames greatly improves the quality of the displayed sequence by limiting the distortion due to WZ correction.
To analyze the properties of the proposed scheme in detail, several tests have been conducted with different video sequences. Simulation results are given here for two extreme cases.
Case one: the Football sequence, which contains high motion and moderate texture, and is representative of most standard TV programs.
Case two: the Map sequence, which contains very little motion, is highly textured, and is more atypical.
Intuitively, the distortion due to replacing slices with their reduced-resolution versions is generally much lower than the distortion caused by error concealment. However, for the Map sequence, error concealment gives more satisfying results than slice substitution, given the absence of motion and the highly detailed content (Table 4). The only suitable SLEP protection schemes for this kind of sequence are those based on 8:4 downscaling of I-pictures (i.e., WZ 8:4 or WZ I4-P2-B1).
Table 4: Average distortions associated with each frequency-filtered version and with error concealment for the Football and Map sequences.
Figure 6 shows one
displayed I-picture frame from each sequence. The first one corresponds to a
standard FEC scheme, the second to our SLEP 8:2 scheme, and the third to the
new I4-P2-B1 scheme.
Figure 6: Displayed frame from FEC, WZ 8:2, and WZ I4-P2-B1 for the Football (left) and Map (right) sequences.
As seen in the Football sequence, the FEC lost slices are concealed, but the
motion information cannot be recovered and the resulting misalignment is
visually very annoying. In the WZ case, correction is performed by replacing the
slices with their lower resolution versions, with exact motion compensation.
The recovery of the lost slices is very efficient due to the stronger error
resilience of the proposed SLEP method: indeed, the improved WZ scheme avoids
the large distortions due to unrecovered motion information. Subjective
comparison with standard methods shows the clear benefit of our SLEP scheme
when dealing with motion.
For the Map sequence, no motion information is needed since the video content is quasi-static, and error concealment is thus able to estimate the lost content. The failure of FEC correction induces only a slight distortion. On the other hand, the distortion due to the reduced resolution of WZ 8:2 becomes visually annoying. Applying unequal picture protection improves the displayed resolution without sacrificing error resilience. The visual distortion introduced by replacing corrupted slices with their low-resolution versions remains slight, so the visual quality of the displayed video is higher (see bottom-right image, Figure 6).
Finally, to account for the spatiotemporal properties of video scenes, we investigate a hybrid WZ decoder, which switches adaptively between previous-frame-copy error concealment and WZ substitution based on a slice-based motion detection algorithm. Different motion detection algorithms have been described in the literature [16]. We suggest computing a motion activity parameter $\upsilon$ for each slice from the coding parameters available in the WZ bit stream (motion vector coordinates, mode decisions), and comparing it to a predefined motion threshold $T$. As a first approach, the threshold value should be fixed experimentally based on statistics of the compressed video database. For video scenes with little or no motion, like the Map sequence ($\upsilon < T$), error concealment (see top-right image, Figure 6) is preferred to WZ substitution. Otherwise, WZ substitution is applied. With this strategy, the overall visual quality should be significantly improved for most video sequences, whatever the amount of motion. This motion-adaptive extension of our SLEP scheme is currently under consideration.
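A possible form of the per-slice decision is sketched below in Python. The definition of $\upsilon$ is not fixed above; here it is assumed, purely for illustration, to be the mean motion-vector magnitude (in pixels) over the macroblocks of a slice, with mode decisions ignored. The threshold value, the example motion vectors, and the function names are hypothetical.

```python
import math


def motion_activity(motion_vectors):
    """Hypothetical upsilon: mean motion-vector magnitude over the macroblocks of a slice."""
    if not motion_vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in motion_vectors) / len(motion_vectors)


def choose_recovery(motion_vectors, threshold):
    """Return 'conceal' for quasi-static slices (upsilon < T), else 'wz_substitution'."""
    upsilon = motion_activity(motion_vectors)
    return "conceal" if upsilon < threshold else "wz_substitution"


T = 1.0  # hypothetical threshold; the text proposes fixing it experimentally
static_slice = [(0, 0), (0.5, 0), (0, 0), (0, 0.5)]  # Map-like, almost no motion
moving_slice = [(6, -2), (5, -1), (7, -3), (6, -2)]  # Football-like, strong motion
print(choose_recovery(static_slice, T))  # -> conceal
print(choose_recovery(moving_slice, T))  # -> wz_substitution
```

Because the motion vectors are carried by the WZ bit stream, this decision can be taken per lost slice at the decoder without any extra side channel.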
4. Conclusion
We have proposed a new systematic lossy error protection scheme based on frequency filtering, which provides unequal picture protection depending on the picture coding type. This scheme performs better than the classic FEC method as well as previous SLEP implementations, with only a limited increase in complexity. It also offers interesting properties for digital video format conversion. Because the SLEP scheme is independent of the broadcast network, other protection tools linked to the network layer could be combined with it to further improve performance (e.g., if a feedback channel were available, the SLEP protection could be made adaptive). The scheme is also independent of the main stream codec and could be applied to compression techniques other than MPEG-2, including the new H.264/AVC coding standard.
Acknowledgment
The authors would like
to thank the anonymous reviewers for their valuable comments.