Abstract
A new video format, multiview video plus depth, was recently designed as the most efficient 3D video representation. 3D high efficiency video coding (3D-HEVC), finalized in 2016, is the international standard for 3D video coding. In 3D-HEVC intra coding, the depth map is an essential component, and depth map intra prediction occupies more than 85% of the overall intra encoding time. The 3D-HEVC design adopts a highly flexible quadtree coding unit partitioning, in which one coding tree unit is recursively partitioned into prediction units (PUs) from 64 × 64 down to 4 × 4. This flexible partitioning provides more accurate prediction signals and thereby better intra depth map compression efficiency. However, evaluating all depth map intra prediction modes at every PU level, while achieving high depth map intra coding efficiency, results in a substantial increase in computational complexity. This paper proposes an improvement of our previously proposed depth map PU size decision using an efficient homogeneity determination. The experimental results show that the proposed method significantly reduces the computational complexity with a negligible loss of intra coding efficiency.
1. Introduction
In recent years, digital 3D video and high-resolution electronic devices have received considerable attention in consumer and electronic applications [1]. To support the explosive increase in 3D video services, transmission, and storage requirements, the joint collaborative team on 3D video coding (JCT-3V) [2] developed the emerging international standard for 3D video coding, namely, 3D high efficiency video coding (3D-HEVC) [3]. To display 3D video on autostereoscopic displays while significantly reducing the data rate, 3D-HEVC adopts a new alternative 3D video format for 3D scene representation, commonly known as multiview video plus depth (MVD) [4, 5]. In the MVD format, only a few 2D texture videos and their corresponding depth maps are multiplexed into the 3D video bitstream. At the decoder side, the additional intermediate views needed for autostereoscopic viewing can easily be created by a synthesis process using depth image-based rendering (DIBR) [6].
In the 3D video coding standard, 3D-HEVC uses a highly flexible recursive partition called a coding tree unit (CTU) [7], where each CTU is further split into a quadtree structure based on new unit types: the coding unit and the prediction unit [8]. This allows all depth map characteristics to be encoded more efficiently and thus reduces the bandwidth. However, this highly flexible coding unit structure leads to a vast increase in coding complexity caused by the exhaustive search for the best size using the rate-distortion optimization cost [9]. Hence, it is imperative to design efficient and fast 3D video coding algorithms to decrease this coding complexity.
Recently, several studies have been developed to accelerate the size decision process in 3D-HEVC intra depth map coding [10–17] and in HEVC/VVC [18, 19]. In [10], the proposed method uses a specific feature called a corner point to accelerate the quadtree intra decision. In [11], the authors propose a strategy that speeds up the depth map intra prediction by skipping small CU sizes using a variance threshold. Chen et al. [12] develop an efficient method based on the sum of gradients to decide whether a CU can skip the unnecessary rate-distortion optimization cost of the smaller partition sizes. The algorithm in [13] exploits the correlation between hierarchical CUs/PUs to reduce the number of recursive rate-distortion optimization cost evaluations in the depth map intra prediction. Chiang et al. [14] develop a low-complexity CU size prediction, in which smaller CU sizes are not evaluated if certain conditions are satisfied, including the over-splitting of the collocated texture CU. The authors in [15] exploit statistical data analysis to derive fast CU/PU size decisions that reduce the complexity of depth map intra coding. In our previously proposed algorithm [16], a depth map intra size decision is established, with thresholds calculated based on an efficient clustering algorithm and tensor features. In [17], a low-complexity intra mode selection algorithm is proposed to reduce the complexity of depth intra prediction in both intra frames and inter frames. In [18], a fast QTMT partition decision framework is developed that can determine the partition decision on both the QT and the multi-type tree with a novel cascade decision structure. In [19], the authors propose a fast algorithm for VVC that reduces coding complexity from two aspects: mode selection and early prediction termination.
This paper proposes an efficient intra size decision model that improves our previous work [16] by using an efficient clustering algorithm, namely, the automatic merging possibilistic clustering method (AMPCM) [20], together with new features. The proposed method decreases the synthesized view efficiency losses while conserving the same coding time savings. To this end, we preserve the structure of the previously proposed algorithm but use different features.
The remainder of this paper is organized as follows: Section 2 presents an overview of the size coding tools in 3D-HEVC depth map intra coding. Section 3 gives a brief introduction to the AMPCM and the feature extraction. The statistical analysis and the proposed algorithm are discussed in Section 4. The experimental results are presented and discussed in Section 5. Section 6 concludes the paper.
2. Overview of 3D-HEVC Depth Map Intra Prediction
In this section, we describe the highly flexible coding structure followed by the intra prediction modes in 3D-HEVC intra coding.
2.1. Highly Flexible Coding Units
To achieve the highest 3D-HEVC depth map intra prediction efficiency, the 3D-HEVC video encoder exploits the spatial correlation inside the depth map views based on a highly flexible coding structure, described as follows: each depth map view is partitioned into coding tree units (CTUs) [21], and each CTU is processed using three basic units, namely, the coding unit (CU), the prediction unit (PU), and the transform unit (TU). Each of these units has a distinct role. The CU is the basic coding unit with a square size from 64 × 64 recursively subdivided down to 8 × 8. It consists of one luma block and two chroma blocks [22], and it is further split into PUs (see Figure 1) and TUs.
The prediction unit is defined as a subdivision of the coding unit in which the same intra prediction mode is applied [23]. The PU carries the information associated with the intra prediction process, and each PU supports two different splits: 2N × 2N, corresponding to 64 × 64, 32 × 32, 16 × 16, and 8 × 8, and N × N, corresponding to 4 × 4 [24], as depicted in Figure 1.
The transform unit is formed by the pixel-wise difference between the original block and the predicted block. It is evaluated with the same rate-distortion optimization cost procedure. Each transform unit allows sizes from a maximum of 32 × 32 down to a minimum of 4 × 4. The multiple TU splits form another quadtree called the residual quadtree (RQT). Table 1 presents the supported sizes for the CU, PU, and TU, respectively.
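As an illustration, the unit sizes supported by the quadtree (Table 1) follow from recursive halving of the largest size; the helper below is a small sketch, not part of the reference software.

```python
# Sketch of the 3D-HEVC quadtree unit sizes described above.
# Helper name is illustrative, not part of the HTM reference software.

def quadtree_sizes(max_size, min_size):
    """Return the square sizes produced by recursively halving max_size."""
    sizes = []
    s = max_size
    while s >= min_size:
        sizes.append(s)
        s //= 2
    return sizes

cu_sizes = quadtree_sizes(64, 8)         # CU: 64x64 down to 8x8
pu_sizes = quadtree_sizes(64, 8) + [4]   # PU: 2Nx2N splits plus the 4x4 NxN case
tu_sizes = quadtree_sizes(32, 4)         # TU (RQT): 32x32 down to 4x4
```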
2.2. 3D-HEVC Depth Map Intra Prediction Tools
The 3D-HEVC depth map intra prediction exploits the previously coded blocks to remove spatial redundancies [25, 26], where each depth map PU is evaluated by the rough mode decision [27]. In the rough mode decision, 35 modes, including DC/Planar and 33 angular directions (see Figure 2 (top)), are evaluated based on the sum of absolute transformed differences (SATD) [28]. Only the subset of modes having the lowest SATD costs is selected to be added to the full rate-distortion list. Next, the most probable modes are evaluated, and three intra modes derived from the already coded left and top neighboring PUs are added to the full rate-distortion list. Finally, the depth modeling modes are applied by approximating a model pattern that partitions the depth map PU into two nonrectangular regions P_1 and P_2 according to two partition types: the wedgelet partition (DMM_1) and the contour partition (DMM_4), as shown in Figure 2 (bottom). The results of the depth modeling modes (DMM_1 and DMM_4) are added to the full rate-distortion list only if certain conditions are satisfied [29]. Figure 3 illustrates the depth map intra prediction process in 3D-HEVC.
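The rough mode decision described above can be sketched as follows: each candidate mode's prediction is scored by SATD and only the lowest-cost subset is kept for the full rate-distortion list. The 4 × 4 Hadamard-based SATD is standard, but the mode names and the `keep` parameter below are illustrative, and no actual angular prediction is implemented here.

```python
import numpy as np

# 4x4 Hadamard matrix used for the simplified SATD computation.
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])

def satd4x4(orig, pred):
    """Sum of absolute Hadamard-transformed differences of a 4x4 block."""
    d = orig.astype(int) - pred.astype(int)
    t = H4 @ d @ H4.T
    return int(np.abs(t).sum())

def rough_mode_decision(orig, predictions, keep=8):
    """Rank candidate modes by SATD cost and keep the cheapest subset.

    predictions: dict mapping a mode name to its predicted 4x4 block
    (the mode names here are placeholders, not HEVC mode indices).
    """
    costs = {m: satd4x4(orig, p) for m, p in predictions.items()}
    return sorted(costs, key=costs.get)[:keep]
```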
In 3D-HEVC, a single coding tree unit comprises 341 PUs, with sizes varying from 64 × 64 down to 4 × 4. For each PU, the rough mode decision, the most probable modes, and the depth modeling modes are evaluated to select the best depth map intra prediction mode. Table 2 presents the detailed depth map intra prediction modes for a single coding tree unit.
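The figure of 341 PUs per CTU quoted above follows directly from the quadtree: a 64 × 64 CTU contains 1, 4, 16, 64, and 256 square blocks at sizes 64, 32, 16, 8, and 4, respectively.

```python
# Number of square PU positions inside one 64x64 CTU:
# 1 + 4 + 16 + 64 + 256 = 341
pu_count = sum((64 // s) ** 2 for s in (64, 32, 16, 8, 4))  # -> 341
```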
3. Clustering Algorithm and Feature Extraction
Clustering is a method that groups data with similar characteristics into the same cluster. Among fuzzy clustering algorithms, an efficient one is proposed in [20], namely, the automatic merging possibilistic clustering method (AMPCM). The AMPCM overcomes the weaknesses of earlier fuzzy clustering methods such as the fuzzy C-means [30] and the possibilistic C-means [31]; it is very robust in noisy environments and resolves the parameter and initialization problems.
The purpose of this work is to decrease the high complexity of selecting the best size in the depth map intra prediction while preserving the rate-distortion performance. Within a CTU, the selection of the best size is mostly related to the depth map content. This relationship is well illustrated in Figure 4, in which homogeneous CTUs are often coded with large sizes and complex CTUs with small sizes. Therefore, features that describe the homogeneity of a depth map CTU can be used to predict the suitable sizes to evaluate. In this paper, we use features different from those of our previous contribution [16]: we select the amplitude of the simplified mass center (ASMCV) and the variance [32] as the proposed homogeneity features. Let L(i, j) be the luminance value of the pixel at position (i, j) in the CTU; the ASMCV and the variance are then defined by equations (1) and (2), respectively.
Equation (3) presents the amplitude of the feature vector (ASMCV and Var). Theoretically, the amplitude of the computed features is correlated with the CTU size prediction in the 3D-HEVC depth map intra prediction: if the amplitude is high, the small sizes are likely to be selected; conversely, if the amplitude is low, the larger sizes are likely to be selected. Figure 5 presents the histogram of the depth map CTU sizes selected by the 3D-HEVC anchor according to the amplitude of the homogeneity features. In Figure 5, when the amplitude lies between 0 and 1000 (low amplitude), more than 97% of the CTUs are coded with 64 × 64 and the probability of selecting the other sizes is almost negligible. Conversely, as the amplitude increases, the probability that the CTU will be coded with small sizes increases and the probability of large sizes decreases, which supports our hypothesis.
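Since equations (1)–(3) are not reproduced here, the sketch below only illustrates one plausible reading of the features: the ASMCV is taken as the distance between the intensity-weighted mass center and the geometric center of the CTU, and the amplitude as the Euclidean norm of the (ASMCV, variance) vector. Both interpretations are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

def asmcv(block):
    """Assumed ASMCV: distance from intensity-weighted mass center
    to the geometric center of the block (illustrative definition)."""
    L = block.astype(float)
    total = L.sum()
    if total == 0:
        return 0.0
    i, j = np.indices(L.shape)
    mc_i = (i * L).sum() / total          # intensity-weighted row center
    mc_j = (j * L).sum() / total          # intensity-weighted column center
    ci = (L.shape[0] - 1) / 2.0           # geometric center (rows)
    cj = (L.shape[1] - 1) / 2.0           # geometric center (columns)
    return float(np.hypot(mc_i - ci, mc_j - cj))

def feature_amplitude(block):
    """Assumed amplitude (equation (3)): norm of (ASMCV, variance)."""
    return float(np.hypot(asmcv(block), block.astype(float).var()))
```

A perfectly flat (homogeneous) CTU yields an amplitude of 0, consistent with the low-amplitude CTUs being coded with the 64 × 64 size.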
4. Clustering Data Analysis and Proposed Size Prediction in Depth Map Intra Coding
For an efficient data clustering result, the clustering input data must cover all possible contents of the depth maps. The input data for the AMPCM were gathered from a set of depth map sequences: “LoveBird1” with resolution 1024 × 768, “Akko&Kayo” and “Rena” with resolution 640 × 480, and “Champagne_tower,” “Dog,” and “Pantomime” with resolution 1280 × 960. The primary dataset contains the CTUs of one frame out of every 24 frames of all depth map sequences, that is, a total of 53,521 CTUs. To account for all possible homogeneous and complex depth map contents, we compute the ASMCV (equation (1)) and the variance (equation (2)) for all 53,521 CTUs and sort them according to equation (3). Next, we select the AMPCM input data by sampling the feature vectors at a frequency of 13, giving a total of 4,117 vectors, which represent only 7.7% of the primary dataset. At the output of the AMPCM, we obtain exactly 1,117 cluster centers sorted according to their amplitude.
The purpose of this article is to create a size decision model that estimates the best sizes for a given depth map CTU and thereby decreases the 3D-HEVC depth map intra prediction complexity. Using the CTUs of the six sequences, a total of 1,254,240 vectors, each composed of the ASMCV, the variance, and the size flag, we combine all 1,254,240 vectors with the output of the AMPCM (1,117 cluster centers) by assigning each vector to the cluster center that minimizes the Euclidean distance. According to the percentage distribution of the CTU sizes in each cluster, we determine which sizes are dominant: if the percentage is greater than 10%, the size is considered; otherwise, it is ignored. By regrouping the clusters that share the same dominant sizes, we obtain five groups. This process is executed for all depth map quantization parameters. Table 3 presents the results of this distribution process.
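The distribution step above (nearest-center assignment followed by the 10% dominance rule) can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def assign_to_centers(vectors, centers):
    """Assign each feature vector to the nearest cluster center
    (minimum Euclidean distance)."""
    d = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def dominant_sizes(size_flags, assignments, n_clusters, thresh=0.10):
    """For each cluster, keep the CTU sizes covering more than
    `thresh` (10%) of the cluster's vectors."""
    result = {}
    for c in range(n_clusters):
        flags = size_flags[assignments == c]
        if len(flags) == 0:
            result[c] = []
            continue
        sizes, counts = np.unique(flags, return_counts=True)
        result[c] = [int(s) for s, n in zip(sizes, counts)
                     if n / len(flags) > thresh]
    return result
```

Clusters sharing the same dominant size list are then regrouped, yielding the five groups reported in Table 3.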
Using the results presented in Table 3, a size decision model for depth map intra prediction in 3D-HEVC is defined based on the dominant size distributions. Table 4 presents the size prediction model according to four thresholds Th_1, …, Th_4. The thresholds are determined using the amplitude of the regrouped cluster centers and are computed as follows: as mentioned previously, the cluster centers are sorted according to their amplitude (equation (3)), so let G_i, i = 1, …, 5, be the regrouped cluster centers; we set Th_i = (MinAmp(G_{i+1}) + MaxAmp(G_i))/2, where MinAmp(G) and MaxAmp(G) return, respectively, the minimum and the maximum amplitude of the cluster centers regrouped in a specific group G. Table 5 presents the respective threshold values used in the proposed size decision model for depth map intra coding according to the depth map quantization parameter. The proposed size decision model is presented in Algorithm 1.
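A minimal sketch of the resulting threshold-based decision: the CTU feature amplitude is compared against the four thresholds to select the candidate sizes to evaluate. The group-to-size mapping and the threshold values below are placeholders, since Tables 4 and 5 are not reproduced here.

```python
def candidate_sizes(amplitude, th, groups):
    """th: four ascending thresholds [Th1..Th4];
    groups: five candidate size lists, one per amplitude interval."""
    for t, g in zip(th, groups[:-1]):
        if amplitude <= t:
            return g
    return groups[-1]

# Placeholder mapping and thresholds (illustrative only; the actual
# values are QP-dependent and given in Tables 4 and 5 of the paper).
groups = [[64], [64, 32], [32, 16], [16, 8], [8, 4]]
th = [1000, 4000, 9000, 16000]
```

A low-amplitude (homogeneous) CTU thus evaluates only the large sizes, skipping the rate-distortion cost of the smaller partitions.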

5. Experimental Results
This section presents the experimental results of the size decision model developed for the depth map intra prediction in 3D-HEVC, compared with the state-of-the-art size prediction methods [10–16]. The anchor HTM-16.2 is used as the reference 3D video coding software [33]. The experimental configurations follow the JCT3V-G1100 document [34]: the experiments are conducted on eight sequences, “Shark,” “UndoDancer,” “PoznanStreet,” “PoznanHall2,” “GTFly,” “Newspaper1,” “Kendo,” and “Balloons,” in the all-intra configuration using three texture views and their associated depth maps. The quantization parameter pairs are (25, 34), (30, 39), (35, 42), and (40, 45) for the texture and depth map, respectively. The depth map coding efficiency is measured using the Bjontegaard metrics [35] and the time saving.
Table 6 presents the experimental results of the proposed size decision model and the state of the art [10–16], where BDBR shows the Bjontegaard bitrate performance using the Y-PSNR of the six synthesized views over the total bitrate. The time reduction (TR) reports the time saving achieved by the state of the art and by the proposed size decision model. From Table 6, the proposed method decreases the computational complexity for all eight sequences compared to the reference software. The time saving is typically high for sequences with low motion, such as “PoznanHall2” (51.6%), and lower for complex sequences, such as “Newspaper1” (32.3%). In conclusion, the proposed size decision model skips unnecessary size partition evaluations for each CTU.
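For reference, the time reduction (TR) figures quoted throughout this section follow the usual definition of relative encoding-time saving against the anchor encoder.

```python
def time_reduction(t_anchor, t_proposed):
    """Encoding-time saving, in percent, relative to the anchor:
    TR = 100 * (T_anchor - T_proposed) / T_anchor."""
    return 100.0 * (t_anchor - t_proposed) / t_anchor
```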
Figure 6 illustrates the time saving and the rate-distortion curves of the proposed size decision model in the all-intra configuration, in comparison with the reference software HTM-16.2, for the three sequences “GTFly,” “Balloons,” and “Newspaper1.” As shown in Figure 6, the rate-distortion curve of the proposed size decision model is analogous to that of the anchor HTM-16.2 from high to low bitrates, while the proposed method achieves significant time savings for all quantization parameters.
To further evaluate our proposed method against the state-of-the-art size prediction methods, we visualize the BDBR-TR trade-off in two dimensions: Figure 7 presents the TR against the corresponding BDBR cost for the proposed method and the state-of-the-art size prediction methods on four sequences, “Kendo,” “GTFly,” “PoznanStreet,” and “UndoDancer.” As illustrated in Figure 7, our proposed method offers a very good balance between time saving and BDBR compared to all state-of-the-art methods, which makes it more efficient.
To evaluate the subjective visual quality, Figure 8 presents the rendered views for the “UndoDancer” and “PoznanStreet” sequences using the proposed algorithm, in comparison with the rendered views using the anchor HTM-16.2. From Figure 8, we observe that the proposed size decision model preserves almost the same synthesized view visual quality as the 3D video coding reference software, especially around object boundaries, where occlusion zones frequently arise in the synthesized views. The remaining sequences show the same subjective visual quality.
Compared with the state-of-the-art size prediction methods that also aim to accelerate the size decision in the depth map intra prediction [10–16], the proposed size decision model reaches a significant encoding time reduction with a negligible BDBR increase.
In detail, our algorithm achieves almost the same time saving as the technique in [10], which yields a 41.0% time saving and a 0.44% BDBR increase relative to HTM-12.1 as the reference software. In contrast, our method is implemented in HTM-16.2, which includes all the 3D coding tools of HTM-12.1.
In [11], the proposed algorithm reaches a 30.2% encoding time saving with a 0.29% BDBR increase compared with HTM-13.0. However, the algorithm in [11] is evaluated on only the first 100 frames of each sequence, whereas ours is evaluated on the full sequences, so a direct comparison is not meaningful. To overcome this issue, we conducted additional experiments on the first 100 frames with our proposed algorithm: it achieves a 38.4% encoding time saving with a 0.19% BDBR increase compared with HTM-16.2. Hence, our proposed algorithm surpasses the method in [11] by factors of 1.3 and 0.6 for the encoding time saving and the BDBR, respectively.
Concerning the algorithm developed in [12], the method proposed in this paper reduces the computational complexity significantly more, by 21.9%, than that reported in [12]. Regarding the coding performance, the technique in [12] preserves almost the same reconstructed synthesized view efficiency compared with HTM-16.1.
Concerning the method developed by Kim et al. in [13], that algorithm reaches a lower encoding time reduction for all eight sequences in comparison with HTM-12.0: our size decision model achieves a time reduction higher by a factor of 1.6. On the other hand, the method in [13] surpasses our algorithm in BDBR, which it increases by only 0.18%.
Regarding the method in [14], the algorithm in this paper surpasses the results presented in [14] in terms of time reduction by a factor of 1.3, where the method in [14] achieves a 31.7% time reduction and a 0.16% BDBR increase compared to HTM-15.0.
Compared to the method developed by Gang et al. in [15], our proposed size decision model reaches almost the same time reduction as [15], which achieves a 47.8% time reduction in comparison with HTM-13.0. Regarding the coding efficiency, the work in [15] presents a 0.84% BDBR increase compared to HTM-13.0. Since [15] does not report the performance for all eight sequences, a fully fair comparison with the present method is not feasible.
Regarding our previous size decision model developed in [16], the size decision model proposed in this paper reaches almost the same time reduction as [16], by a factor of 0.9. In terms of coding efficiency, the size decision model presented in this paper outperforms our previous one in BDBR by a factor of 0.5: our previous size decision model results in a 0.44% increase, which is high compared to the current size decision model.
6. Conclusion
In this paper, a computational complexity reduction algorithm for the size decision in the depth map intra prediction in 3D-HEVC has been developed. In brief, the proposed approach applies an efficient clustering algorithm, namely, the automatic merging possibilistic clustering method, to a set of selective data. Using the clustering algorithm and the data analysis output, we create the proposed size decision model for the depth map intra prediction. Finally, this model predicts the size flags and thereby accelerates the 3D-HEVC depth map intra coding.
In terms of experimental results, the proposed size decision model achieves a 40.2% time reduction with a negligible BDBR increase of 0.21% compared with the anchor HTM-16.2. Compared to the state-of-the-art size prediction methods, the results presented in this paper confirm that the proposed size decision model can significantly reduce the depth map coding time while preserving almost the same rate-distortion performance as the original 3D video coding standard.
Data Availability
The data used in this study are described in the following reference: “K. Muller, A. Vetro, Joint Collaborative Team on 3D Video Coding Extension Development Document JCT3VG1100, 7th Meeting, San Jose, US (2014).”
Conflicts of Interest
The authors declare that they have no conflicts of interest.