Abstract

For some 3D applications, one may want to focus on a specific depth zone representing a region of interest in the scene. In this context, we introduce a new functionality called “autofocus” for 3D image coding, exploiting the depth map as additional semantic information provided by the 3D sequence. The method is based on a joint “Depth of Interest” (DoI) extraction and coding scheme. First, the DoI extraction scheme consists of a precise extraction of objects located within a DoI zone, given by the viewer or deduced from an analysis process. Then, the DoI coding scheme provides a higher quality for the objects in the DoI at the expense of other depth zones. The local quality enhancement supports both higher SNR and finer resolution. The proposed scheme embeds the Locally Adaptive Resolution (LAR) codec, initially designed for 2D images. The proposed DoI scheme is developed without modifying the global coder framework, and the DoI mask is not transmitted but is deduced at the decoder. Results show that the proposed joint DoI extraction and coding scheme provides a high correlation between texture objects and depth. This consistency avoids distortion along object contours in depth maps, texture images, and synthesized views.

1. Introduction

Recent studies in 3D technology have led to a growing development of 3D applications. Nowadays, next generations of highly advanced multimedia video systems, such as 3D television (3DTV), 3D cinemas, and free viewpoint television (FTV), provide depth perception for the viewers [1, 2] on both large TV screens [3] and mobile phones [4] and allow free navigation within a real-world scene [5, 6]. Such autostereoscopic systems, providing a 3D viewing experience without special glasses or other head gear [7, 8], rely on depth maps to give the 3D impression and to synthesize intermediate views at an arbitrary view point [9, 10]. These depth maps can be considered as additional semantic information on the scene, which can be exploited in a region of interest (RoI) context. Indeed, among advanced functionalities for image coding, such as scalability, lossy/lossless coding, and security, RoI is useful for many applications [11]. In RoI-based image coding, regions that are of interest to the viewer are encoded with a higher fidelity than the rest of the image. There are several 2D applications using RoI coding, such as compression of infrared or digital medical images [12, 13], segmentation [14], and accurate object location [15]. However, for some 3D applications, the areas of interest can be partially or fully dependent on the depth information [16]. In this context, new research works have been devoted to the application of the RoI feature in the 3D domain [16-20]. In [17], Fan et al. proposed a digital watermarking algorithm for copyright protection of 3D images. The proposed watermarking scheme was based on the Depth Perceptual Region of Interest (DP-RoI) [18], from the idea that the Human Visual System (HVS) is more sensitive to distortion in foreground than in background regions of a 2D image. The DP-RoI extraction and tracking algorithms were proposed in [18], where Zhang et al. exploited the depth information, in addition to the large contrast of illumination in the texture image, to extract the RoI. In [19], Karlsson and Sjöström determined the foreground by using the information in the depth map. The foreground was defined as the RoI. It was used to reallocate bits from the background to the RoI by applying a spatiotemporal filter [21] on the texture image. This method achieved an improvement in the perceived quality of the foreground at the expense of a quality reduction in the background. In the method proposed by Karlsson and Sjöström, RoI coding was only applied to the ordinary monoscopic texture video, part of the 2D plus depth representation, and not to the depth map. Zheng et al. [20] proposed a scheme for RoI coding of 3D meshes inspired by JPEG2000 [22]. In [16], Chamaret et al. adaptively adjusted the shift between the two views of a stereoscopic system in order to have a null disparity on the RoI area. This enhanced the visual quality of a given area in the rendered view scene. However, Chamaret et al.’s method was applied on stereoscopic and not on autostereoscopic systems.

Many different contributions have been proposed for RoI extraction and coding schemes in the 3D domain, but only a few combine both extraction and coding. From a representation and coding point of view, three important criteria must be considered for a RoI coding system: (1) the RoI spatial resolution, (2) the RoI-mask cost, and (3) the choice of quality levels between background and foreground. Most state-of-the-art codecs, such as JPEGXR [23], H.264 [24], and HEVC [25], apply a block-based RoI coding scheme, where the coding unit is a block with a minimum size in pixels. When the RoI spatial resolution is block-based, the RoI-mask cost is reduced, but the accuracy along the RoI contours decreases. On the other hand, other state-of-the-art codecs, such as JPEG2000 [26, 27], apply a pixel-based RoI scheme but impose the difference of quality between background and foreground. Table 1 gives a functional comparison of these different codecs. For 3D applications, the resolution of object contours in the depth map has to be very fine, in particular when rendering intermediate views, because 3D rendering software, such as VSRS [28], relies on depth maps to synthesize intermediate views. Several studies [29-32] highlight the influence of depth image compression and its implications on the quality of video plus depth virtual view rendering.

In this paper, we propose a so-called “Depth of Interest” (DoI) scheme for joint representation and coding of the region of interest in the 2D + Z autostereoscopic domain. The related application consists of an automatic focus on a given depth range for a better rendering in this area of interest. It offers a fine extraction of objects with well-defined contours in the depth map, inducing a low distortion in the area of interest in the synthesized views. Contrary to the state-of-the-art methods, such a scheme better fits the three criteria of a RoI coding scheme: (1) it relies on a pixel-based resolution, (2) the RoI-mask transmission is almost free, as only the two values limiting the Depth of Interest range are transmitted, and (3) it allows freely choosing different qualities for the different regions in the scene (cf. Table 1). The proposed DoI coding scheme was tested on a large set of MPEG 3D reference sequences (Balloons, Kendo, BookArrival, Newspaper, UndoDancer, and GTFly).

This paper is organised as follows. In Section 2, the global coding scheme is presented. In Section 3, the 2D LAR coder framework and the 2D + Z coding scheme are described. In Section 4, the proposed DoI representation and coding scheme is explained. Section 5 provides the experimental results for DoI extraction, coding performance, and subjective quality tests for texture images and synthesized intermediate views. Finally, we conclude this paper in Section 6.

2. Global Coding Scheme

The proposed scheme embeds the Locally Adaptive Resolution (LAR) codec [33]. LAR is a global coding framework based on a content-based QuadTree partitioning. It provides many functionalities, such as a single codec for lossy and lossless compression, resolution and quality scalability, partial/full encryption, 2D region of interest coding, rate control, and rate-distortion optimization [34]. Extensions to scalable 2D + Z coding have been presented in [35].

For 2D + Z applications, it can be assumed that the object of interest is located in a specific depth zone. This depth range, which is provided as an input of our global coding scheme, defines the Depth of Interest (DoI). First, the DoI representation scheme consists of defining, from the depth map, a binary mask covering the DoI zone. This leads to a fine extraction of objects located within the DoI zone. Then, the DoI coding scheme aims at ensuring a higher quality in the DoI zone at the expense of other depth zones. Higher visual quality in the DoI is ensured for both depth and texture images (cf. Figure 1).

For this purpose, a DoI refinement is first applied on the depth map (Section 4.1). The QuadTree partition of the depth map is adapted to get a higher resolution at the DoI layer, and thus well-defined contours along the objects of interest are obtained. Secondly, a DoI coding is applied on the texture image (Section 4.2). The RoI-based scheme using the defined depth-based binary mask allows setting different qualities inside and outside the DoI. The benefits of the proposed DoI representation scheme are as follows: (1) it is content-based; (2) the DoI coding scheme is inserted as a preprocess, so the coder itself is not modified and its complexity does not increase; (3) the DoI-Mask is not transmitted to the decoder, and thus there is no additional coding cost.

3. LAR Coder Framework

3.1. 2D LAR Coder

Locally Adaptive Resolution (LAR) [33] is an efficient multiresolution content-based 2D image coder for both lossless and lossy image compression (cf. Figure 2).

3.1.1. QuadTree Partitioning

The LAR coder relies on a local analysis of image activity. It starts with the estimation of a QuadTree representation, conditioning a variable-size block partitioning. The block size estimation is performed by comparing the difference between the local maximum and minimum values within the block with a given threshold. Therefore, the local resolution defined by the pixel size depends on the local activity: the smallest blocks are located upon edges, whereas large blocks map homogeneous areas (cf. Figure 3). The main feature of the LAR coder thus consists in preserving contours while smoothing homogeneous parts of the image. This QuadTree partition is the key structure of the LAR coder.
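The partitioning rule described above can be illustrated with a minimal Python sketch. It is not the LAR implementation: the function name, the `max_size` and `min_size` bounds, and the assumption that the image dimensions are multiples of `max_size` are all hypothetical choices made for illustration. A block is split into four children while its max-min range exceeds the threshold.

```python
def quadtree_blocks(img, threshold, max_size=16, min_size=2):
    """Return the leaf blocks of a max-min QuadTree as (row, col, size) tuples.

    `img` is a list of rows of grey levels. The block size bounds are
    illustrative assumptions, not values taken from the LAR codec.
    """
    height, width = len(img), len(img[0])
    leaves = []

    def activity(r, c, size):
        # Local activity: difference between local maximum and minimum.
        vals = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
        return max(vals) - min(vals)

    def split(r, c, size):
        if size > min_size and activity(r, c, size) > threshold:
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    split(r + dr, c + dc, half)
        else:
            leaves.append((r, c, size))

    # Assumes height and width are multiples of max_size.
    for r in range(0, height, max_size):
        for c in range(0, width, max_size):
            split(r, c, max_size)
    return leaves


# A 4 x 4 image that is flat except for one bright pixel.
image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 100],
         [0, 0, 0, 0]]
blocks = quadtree_blocks(image, threshold=10, max_size=4, min_size=1)
```

In this toy example, only the quadrant containing the bright pixel is split down to single pixels, while the three homogeneous quadrants remain 2 × 2 blocks.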

3.1.2. Transform and Prediction

The coder uses a pyramidal representation of the image. Starting from the upper level of the pyramid (lower resolution/larger blocks), a dyadic decomposition conditioned by the QuadTree is performed. Each level of the decomposition integrates a 2 × 2 block transform (interleaved S + P transform) stage and a prediction one. The resulting prediction error is quantized using an input quantization factor and coded using an adapted entropy coder. The quantization value for each level is then given by , with being the pyramidal level (full resolution for ). It has been found in [34] that the near optimal choice for the pair () is to set as . Thus is the only parameter of the codec, and induces a lossless compression.

3.2. Global 2D + Z LAR Coding Scheme

The proposed DoI coding scheme integrates a global 2D + Z coding scheme. This global scalable 2D + Z coding scheme was proposed in [35] (cf. global scheme in Figure 4). In a first step, a QuadTree partition is built considering only the Z image. Based on this partition, the 2D image, represented in a () colour space, is encoded first at low bit rate and low resolution. Then, the Z image is encoded with an improved prediction stage using the component of the previously encoded 2D image. In a second step, the QuadTree partition is refined considering both 2D and Z images. Finally, based on the refined partition, the quality of the 2D image is improved by a new coding process at a finer resolution. This joint coding method preserves the spatial coherence between texture and depth images and reduces visual artefacts for synthesized views, especially upon edges. In terms of complexity, this coding scheme is less complex than JPEG2K and about as complex as JPEGXR. In terms of objective compression efficiency, the codec is close to JPEG2K and significantly outperforms JPEGXR (cf. Figure 5).

4. Proposed Depth of Interest (DoI) Representation and Coding Scheme

In this section, we present the proposed joint Depth of Interest (DoI) representation and coding scheme. Firstly, a DoI refinement of depth is presented in Section 4.1. Secondly, a DoI coding of texture is presented in Section 4.2.

4.1. DoI Refinement of Depth

As previously mentioned, the quality of a compressed depth map is strongly linked to the distortion introduced on contours. Thus, a local quality enhancement of depth inside a DoI should firstly aim at refining the spatial resolution. As the LAR initially considers a fixed threshold to build the QuadTree, a possible solution would be to design a new QuadTree algorithm with an adaptive threshold. We opted for another solution, preserving the QuadTree estimation stage but introducing a depth map preprocessing. This stage performs a dynamic range adjustment on the depth map so that the Depth of Interest (DoI) gets the largest dynamic range at the expense of the other depth zones (cf. Figure 1).

We introduce the following notations:
(i) In_Depth: input depth map.
(ii) Out_Depth: output depth map after dynamic range adjustment.
(iii) Input low and high limits of the Depth of Interest range.
(iv) Input Depth of Interest window size.
(v) Adjusted Depth of Interest window size.

The input depth range and the coefficient can be considered as approximations of the focal distance and the f-number (or relative aperture) of an optical system, respectively. In particular, the coefficient controls the degree of image sharpness in the DoI. These parameters are set by the user.

The proposed piecewise adjustment function is given in Figure 6 and the corresponding algorithm in Procedure 1. We set the constraint that the value of the middle point of the input depth range has to be unchanged.

Input:
Output:
for all do
  if then
   
   (I)
  end if
  if then
   
   
   (II)
  end if
  if then
   
   
   (III)
  end if
end for
return Out_Depth {the adjusted depth map}

The Out_Depth image is then used for the QuadTree estimation, whereas the input depth map (In_Depth) is encoded with the same quantization parameter for the whole image.
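As an illustration of such a preprocessing, the following Python sketch implements one possible piecewise-linear adjustment. It is only a sketch under stated assumptions, not a transcription of Procedure 1: the names `d_low`, `d_high`, `alpha`, and `d_max` are hypothetical, the DoI window is assumed to be stretched by the factor `alpha` around its unchanged middle point, and the two remaining zones are assumed to be compressed linearly so that the output still spans [0, `d_max`].

```python
def adjust_depth(in_depth, d_low, d_high, alpha, d_max=255):
    """Piecewise-linear dynamic range adjustment of a depth map (sketch).

    The DoI [d_low, d_high] is stretched by `alpha` (>= 1) around its middle
    point, which keeps its value; the zones below and above the DoI are
    compressed linearly. All parameter names are illustrative assumptions.
    """
    if not 0 <= d_low < d_high <= d_max:
        raise ValueError("invalid depth range")
    mid = (d_low + d_high) / 2.0
    out_low = mid - alpha * (d_high - d_low) / 2.0
    out_high = mid + alpha * (d_high - d_low) / 2.0
    if alpha < 1 or out_low < 0 or out_high > d_max:
        raise ValueError("adjusted window must be wider and fit in [0, d_max]")

    def remap(d):
        if d < d_low:
            # Zone below the DoI: [0, d_low] is compressed to [0, out_low].
            return d * out_low / d_low
        if d <= d_high:
            # DoI zone: stretched around the unchanged middle point.
            return mid + (d - mid) * alpha
        # Zone above the DoI: [d_high, d_max] is compressed to [out_high, d_max].
        return out_high + (d - d_high) * (d_max - out_high) / (d_max - d_high)

    return [[int(round(remap(d))) for d in row] for row in in_depth]


out_depth = adjust_depth([[0, 50, 100, 120, 140, 255]], d_low=100, d_high=140, alpha=2)
```

With these example values, the 40-level DoI window is mapped onto an 80-level window, so depth variations inside the DoI are doubled before the QuadTree estimation while the middle value 120 is preserved.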

4.1.1. QuadTree Results after Depth Map Preprocessing

Figures 7 and 10 represent the DoI zone, delimited by the input low and high depth limits, in the original input depth map (original input binary depth mask). Figures 8(a) and 8(b) present the original and the adjusted depth maps, respectively. After the dynamic range adjustment of the depth map, the DoI is mapped by smaller blocks, which increases the local resolution inside the DoI area in the QuadTree (cf. Figures 9 and 11).

4.1.2. Objective and Visual Results of Decoded Depth Maps in DoI Context

In this section, we compare the depth map coding between the original 2D + Z LAR solution and the proposed one including the adjustment stage. For the same bit rate, the preprocessing step decreases the global objective quality of the depth image in comparison with the classic coding. However, it increases the local quality within the DoI, with a gain of up to 11 dB for the given configuration, as the local resolution in the DoI is increased (cf. Figures 12 and 14). Furthermore, we can notice a significant visual quality improvement in the DoI (cf. Figures 13 and 15). More precisely, the edges of objects within the DoI are clearly encoded more accurately than with the classic coding scheme.
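The distinction between global and local objective quality can be made concrete with a short Python sketch of a PSNR restricted to a binary mask. The function name and signature are hypothetical, and an 8-bit peak value is assumed; the sketch only shows how a local measure inside the DoI can differ from the global one.

```python
import math


def psnr(original, decoded, mask=None, peak=255):
    """PSNR in dB over the pixels where `mask` is truthy (all pixels if None)."""
    squared_error, count = 0.0, 0
    for i, row in enumerate(original):
        for j, ref in enumerate(row):
            if mask is None or mask[i][j]:
                squared_error += (ref - decoded[i][j]) ** 2
                count += 1
    if count == 0:
        raise ValueError("empty evaluation region")
    if squared_error == 0:
        return math.inf  # identical pixels in the evaluated region
    return 10 * math.log10(peak ** 2 / (squared_error / count))


original = [[10, 20], [30, 40]]
decoded = [[10, 20], [30, 50]]
doi = [[1, 1], [0, 0]]          # toy DoI mask covering the first row
global_psnr = psnr(original, decoded)
local_psnr = psnr(original, decoded, mask=doi)
```

Here the only coding error lies outside the toy mask, so the local PSNR inside the mask is infinite while the global PSNR is finite.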

4.2. DoI Coding of Texture

The “DoI” coding of texture means that the texture image has to be compressed with unequal quality according to the given DoI (cf. Figure 1).

The first step is the extraction of a binary mask image (DoI-Mask) that will be further used to define the region of interest, considering that only the depth range (its low and high limits) is transmitted to the decoder. This extraction is simply performed by a binarization step of the coded depth image, with the depth range as input parameter. The coded depth image is used for two main reasons. The first one is that the process has to be duplicated at the decoder side. The second one is that the binary mask has to be seen as a subset of the QuadTree partition.
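The binarization step can be sketched in a few lines of Python. The function and parameter names are hypothetical; the point is that the same function, applied to the decoded depth image with the two transmitted limits, yields an identical mask at the encoder and at the decoder.

```python
def doi_mask(decoded_depth, d_low, d_high):
    """Binary DoI-Mask: 1 where the decoded depth lies within [d_low, d_high]."""
    return [[1 if d_low <= d <= d_high else 0 for d in row]
            for row in decoded_depth]


decoded_depth = [[30, 110, 125],
                 [140, 141, 200]]
mask = doi_mask(decoded_depth, d_low=100, d_high=140)
```

Whether the limits are inclusive is an assumption of this sketch.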

The second step is the quality enhancement of the texture in the DoI. As mentioned in Section 3.2, the texture image is encoded in two passes: first, the 2D + Z images are compressed at low bit rate based on the depth-only QuadTree partition, and then the 2D image is enhanced considering the refined 2D + Z partition. In this coding scheme, the same quantization factor is applied at both passes.

We introduce two different ways of texture quality enhancement. The first one consists in a global SNR quality enhancement using the concept of region-level coding introduced in [33] for RoI coding: the image is represented by regions mapping the QuadTree, and each region is independently encoded at a given SNR quality level. The original solution allowed only two regions (RoI and non-RoI); we extended it to up to eight regions, each with its own quantization parameter, starting from label 0 for the RoI. Thus, the DoI coding system now defines the DoI-Mask as the RoI-mask and implements at least three quantization parameters: one for the depth and low-resolution texture images, one for the refined texture image inside the DoI, and one for the refined texture image outside it (cf. Figure 16). In this paper, we present results for this configuration only, but other scenarios are possible, as a total of seven quantization levels are available for the non-DoI zones. For instance, a simple one consists of first dividing the non-DoI zones into several depth ranges, each defining a region, and then applying a progressive quantization in such a way that the quantization factor is weighted by the distance of the region to the DoI.
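The progressive quantization scenario mentioned above can be sketched as follows in Python. This is one possible reading, not the configuration evaluated in this paper: the equal-width depth ranges (`step`), the linear increase of the quantization factor (`q_step`), and all function names are assumptions, and depth values are assumed to be integers.

```python
def region_label(depth, d_low, d_high, step, max_regions=8):
    """Label 0 for the DoI; labels 1..max_regions-1 grow with depth distance."""
    if d_low <= depth <= d_high:
        return 0
    # Distance (in depth levels) from the nearest DoI limit, at least 1.
    gap = d_low - depth if depth < d_low else depth - d_high
    return min(max_regions - 1, 1 + (gap - 1) // step)


def region_quantizers(q_doi, q_step, max_regions=8):
    """One quantization factor per label, coarser as the label increases."""
    return [q_doi + label * q_step for label in range(max_regions)]


quantizers = region_quantizers(q_doi=10, q_step=5)
label_far = region_label(255, d_low=100, d_high=140, step=20)
q_far = quantizers[label_far]
```

In this example, a pixel at depth 255 falls in region 6 and is coded with a quantization factor of 40, whereas pixels inside the DoI keep the finest factor of 10.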

The second way is to obtain a local resolution refinement of the texture image inside the DoI only. It simply consists of refining the QuadTree partition only inside the DoI by masking the input texture image for the partition estimation. For this mode, no information has to be transmitted to the decoder, and a unique quantization factor is applied (cf. Figure 17).

A joint SNR enhancement and local resolution refinement solution is also feasible. Examples of the different solutions are presented in the next section.

The two proposed texture quality enhancement methods are both efficient, yet simple. Both methods are based on the LAR coder framework, which has a low complexity, as mentioned in Section 3.2. Thus, only a relatively small additional computational cost is associated with both methods.

5. Experimental Results

5.1. Evaluation Methodology

After illustrating the preprocessing and encoding of depth maps in Section 4.1, we focus in this section on the texture coding aspects, with SNR quality and local resolution enhancement. Finally, we explore the results of the proposed coding scheme in the context of synthesized views. To the best of our knowledge, the proposed global representation and coding scheme is unique in terms of combined functionalities, so comparisons with the state of the art are not feasible. However, we provide in the following comparative results for RoI with block instead of pixel accuracy. More details about the compression efficiency of the 2D + Z scheme compared to the state of the art can be found in [35].

5.2. Objective and Visual Results of Decoded Texture Images in DoI Context

Some examples of DoI coding for texture images are provided in the following. More particularly, Figures 18 and 21 show the original texture, the original input depth mask (defined from the original depth map as the DoI zone within the input depth limits), and the DoI-Mask (defined from the decoded refined depth map) at different resolutions: the full resolution available for the proposed scheme and subsampled versions at two block resolutions, as generally provided by state-of-the-art coders.

First, examples of SNR quality enhancement are provided. Figures 19 and 22 show a zoom on the visual quality of decoded texture image coded with classic LAR and with LAR-RoI using the full resolution DoI-Mask and a subsampled one. Results show that pixel accuracy on RoI contours gives a better visual quality (cf. Figures 20 and 23).
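To illustrate why a subsampled mask loses accuracy along RoI contours, the following Python sketch derives a block-resolution mask from a pixel-accurate one. The rule used here (a block belongs to the RoI as soon as one of its pixels does) and the function name are assumptions made for illustration; block-based coders may use other rules.

```python
def block_mask(mask, block):
    """Subsample a pixel-accurate binary mask to `block` x `block` resolution.

    A block is marked as RoI if any of its pixels is in the RoI
    (an illustrative rule, not one prescribed by a specific codec).
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for r in range(0, height, block):
        for c in range(0, width, block):
            rows = range(r, min(r + block, height))
            cols = range(c, min(c + block, width))
            flag = 1 if any(mask[i][j] for i in rows for j in cols) else 0
            for i in rows:
                for j in cols:
                    out[i][j] = flag
    return out


pixel_mask = [[0, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]]
coarse_mask = block_mask(pixel_mask, block=2)
```

In this toy case a single RoI pixel expands to a full 2 × 2 block, so three background pixels would be coded at RoI quality and the mask contour no longer follows the object boundary.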

Then, Figure 24 shows an example of local resolution enhancement for the DoI in the texture image. A higher resolution is assigned to the texture image in the DoI only, which leads to an increase in both the objective and the visual quality in the DoI.

5.3. Visual Results of Synthesized Views in DoI Context

The final and most important issue in 2D + Z image coding is the visual quality of the resulting synthesized views. With the depth and texture information, intermediate views at an arbitrary view point can be synthesized with the View Synthesis Reference Software (VSRS 3.0) [28]. In this set of experiments, we consider the texture images and the depth maps coded at low bit rate with and without the proposed DoI scheme, in order to evaluate the compression effect of the proposed technique on the synthesized views (cf. Figure 25). In order to compare the proposed DoI scheme with the block-based RoI approach (as in H.264 and HEVC), we study the effect of the RoI resolution on synthesized views by decoding texture images using the DoI-Mask at different resolutions: full resolution and two block resolutions. The quality within the DoI in the intermediate views synthesized from depth images decoded by the proposed DoI scheme is clearly much better than in the views synthesized from depth and texture images decoded by the classic LAR at the same bit rate (cf. Figures 26(c) and 26(d)). Moreover, the texture images decoded with the block-based DoI scheme lead to low-quality synthesized views, especially upon the DoI contour, while the texture images decoded with our proposed pixel-based DoI scheme lead to a fine and accurate quality upon the DoI contour in the synthesized views (cf. Figure 26).

6. Conclusion

In this paper, a joint content-based scheme, called the “Depth of Interest” (DoI) scheme, for representation and coding of the region of interest in the 3D autostereoscopic domain is presented. It ensures a higher quality in depth zones of the sequence that are of interest to the viewer. The proposed scheme embeds the LAR (Locally Adaptive Resolution) codec. The DoI representation scheme consists of defining, from the depth map, a binary mask covering the DoI zone. Then, the DoI coding scheme is applied on both depth and texture images. For this purpose, a preprocess, consisting of a dynamic range adjustment, is applied on the depth map in order to increase the resolution of the QuadTree partition in the DoI zone. Then, for the texture image, we use a RoI-based scheme using the defined depth-based binary mask. The main strength of this scheme is that it provides a high correlation between texture objects and depth and allows high quality along object contours in depth and texture images as well as in synthesized views. In future work, we will focus on a nonlinear adjustment of the dynamic range, and we will also study multiple DoI zones.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.