The Joint Collaborative Team on Video Coding (JCT-VC) is developing the next-generation video coding standard, called High Efficiency Video Coding (HEVC). HEVC has three units in its block structure: the coding unit (CU), the prediction unit (PU), and the transform unit (TU). The CU is the basic unit of region splitting, analogous to the macroblock (MB). Starting from the tree block, each CU is recursively split into four blocks of equal size. In this paper, we propose a fast CU depth decision algorithm for HEVC to reduce its computational complexity. The proposed method compares the rate-distortion (RD) costs obtained with the 2N×2N PU and determines the depth from this comparison. Moreover, to further reduce the encoding time, an efficient merge SKIP detection method is developed based on the contextual mode information of neighboring CUs. Experimental results show that the proposed algorithm achieves an average time-saving factor of 44.84% in the random access (RA) Main profile configuration with the HEVC test model (HM) 10.0 reference software. Compared with the HM 10.0 encoder, a small BD-bitrate loss of 0.17% is observed, without significant loss of image quality.

1. Introduction

Recently, high efficiency video coding (HEVC) has been developed as a new video coding standard, focusing mainly on the coding of ultrahigh definition (UHD) videos as high resolution and high quality videos become more popular. HEVC is a draft video compression standard, a successor to H.264/MPEG-4 AVC [1]. Market and user demand for advanced video services with high quality standards is increasing. UHD has a resolution of 4 K to 8 K (such as 3840×2160 and 7680×4320), while HD is 1920×1080. The data rate for UHD is therefore 4 to 16 times that of HD video. This increase in data rate will put additional pressure on all types of networks and services. HEVC has emerged to meet the new demands of UHD TV, which combines large resolution and high frame rates with limited bandwidth. In particular, the variable block size prediction and compensation in H.264/AVC is a key factor that contributes to significant bit reduction at the same image quality. To improve video coding efficiency, the ITU-T video coding experts group (VCEG) and the ISO/IEC moving picture experts group (MPEG) recently formed the joint collaborative team on video coding (JCT-VC) [2]. The JCT-VC is finalizing the next-generation video coding standard, called high efficiency video coding.
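As a rough check on the 4 to 16 times figure, the following sketch compares pixel counts per frame. It assumes the commonly quoted resolutions of 1920×1080 for HD, 3840×2160 for 4 K UHD, and 7680×4320 for 8 K UHD, and it assumes equal frame rate and bit depth, so it only approximates the raw data rate ratio. The names `RESOLUTIONS` and `pixel_ratio` are illustrative.

```python
# Pixel counts per frame; the resolutions are commonly quoted values
# (an assumption of this sketch, not figures taken from the experiments).
RESOLUTIONS = {
    "HD": (1920, 1080),
    "UHD-4K": (3840, 2160),
    "UHD-8K": (7680, 4320),
}


def pixel_ratio(name: str, reference: str = "HD") -> float:
    """Return how many times more pixels `name` has than `reference`."""
    width, height = RESOLUTIONS[name]
    ref_width, ref_height = RESOLUTIONS[reference]
    return (width * height) / (ref_width * ref_height)


for name in ("UHD-4K", "UHD-8K"):
    print(name, pixel_ratio(name))  # prints 4.0 for UHD-4K and 16.0 for UHD-8K
```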

Usually, the adoption of large block types in video compression makes it possible to compress high resolution 3D video sequences effectively and helps to deliver video sequences economically over mobile and network services [3–5].

The video encoding and decoding processes in HEVC are built on three units: (a) a coding unit (CU), which is the root of the transform quadtree and carries the prediction mode for INTER/SKIP/INTRA prediction, (b) a prediction unit (PU) for the coding mode decision, including motion estimation (ME) and rate-distortion optimization, and (c) a transform unit (TU) for transform coding and entropy coding. Initially, a frame is divided into a sequence of nonoverlapping largest coding units, each called a coding tree unit (CTU). A CTU can be recursively divided into smaller coding units (CUs) using flexible quadtree partitioning; the resulting structure is called a coding tree block (CTB). Compared with the H.264/AVC video standard, the new CTB structure with its larger coding block size greatly increases the computational complexity required to achieve the high coding gain of the HEVC standard.

Figure 1 shows how a CTU is divided into CUs of smaller dimensions. A CTU has the dimension of 64×64, which can be decomposed into four 32×32 CUs, and each 32×32 CU can be further divided into four CUs of 16×16 dimension. This decomposition process can continue down to the 8×8 dimension, which means that 8×8 is the smallest dimension a CU can have. Moreover, different combinations of CU structures generate different CTBs for a single CTU. For each CTB, the rate-distortion (RD) cost value is calculated.
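To make the size of this search space concrete, the sketch below lists the CU size at each depth and counts the distinct quadtree structures of a single CTU. It assumes a 64×64 CTU and an 8×8 minimum CU; the function names `cu_sizes` and `count_partitions` are illustrative and do not come from the HM software.

```python
def cu_sizes(ctu_size: int = 64, min_cu_size: int = 8):
    """List (depth, cu_size, cus_per_ctu) for a fully split quadtree."""
    levels = []
    depth, size = 0, ctu_size
    while size >= min_cu_size:
        levels.append((depth, size, 4 ** depth))
        size //= 2
        depth += 1
    return levels


def count_partitions(size: int = 64, min_cu_size: int = 8) -> int:
    """Count quadtree structures: a CU is either kept whole (1 way) or
    split into four sub-CUs that are each partitioned independently."""
    if size <= min_cu_size:
        return 1
    return 1 + count_partitions(size // 2, min_cu_size) ** 4


print(cu_sizes())          # [(0, 64, 1), (1, 32, 4), (2, 16, 16), (3, 8, 64)]
print(count_partitions())  # 83522
```

Under these assumptions a single CTU admits 83522 CU structures, before any PU or TU choice is considered, which illustrates why an exhaustive RD search is expensive.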

Several algorithms have been proposed to reduce the computational complexity of HEVC, offering a high speed-up factor with negligible BD-bitrate loss. In [6], an early termination scheme called the coded block flag (CBF) fast mode (CFM) decision uses the CBFs of luminance and chrominance to reduce the complexity of the intermode decision. If the CBFs of luminance and chrominance are both zero, the search process for the next PU modes in the current depth level is not performed.

Kiho and Jang [7] proposed a tree-pruning algorithm that makes an early determination of the CU. To reduce computational complexity, it uses the mode information of the current CU. When the SKIP mode is selected as the best PU mode of the current CU, the current CU is not divided into sub-CUs at the next depth level. This process was adopted in the HEVC test model 4.0 reference software [8].
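The two early-termination rules described in [6] and [7] can be sketched as simple predicates. This is a minimal illustration under assumed data structures: the `PuResult` record and the function names are hypothetical and are not taken from the HM reference software.

```python
from dataclasses import dataclass


@dataclass
class PuResult:
    """Hypothetical summary of one evaluated PU mode."""
    mode: str         # e.g. "SKIP", "2Nx2N", "2NxN"
    cbf_luma: bool    # True if any nonzero luminance coefficient is coded
    cbf_chroma: bool  # True if any nonzero chrominance coefficient is coded


def stop_pu_search(result: PuResult) -> bool:
    """CFM-style rule: skip the remaining PU modes at this depth when
    the luminance and chrominance CBFs are both zero."""
    return not result.cbf_luma and not result.cbf_chroma


def stop_cu_split(best: PuResult) -> bool:
    """Tree-pruning-style rule: do not split the CU into sub-CUs when
    SKIP is the best PU mode of the current CU."""
    return best.mode == "SKIP"


print(stop_pu_search(PuResult("2Nx2N", False, False)))  # True
print(stop_cu_split(PuResult("2NxN", True, False)))     # False
```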

Zhang et al. [9] proposed an algorithm that reduces the depth search range. Based on the correlation of depth information between spatiotemporally adjacent CTUs and the current CTU, some depths can be adaptively excluded from the depth search process in advance. Similar to [6, 7], Yang et al. [10] proposed an early detection algorithm for the SKIP mode. Their motivation was the early determination of skip conditions used in fast mode decision schemes for H.264/AVC [11–20]. They utilized differential motion vectors (DMVs) and coded block flags (CBFs) of the inter mode as skip conditions. In [21], an adaptive coding unit selection method was proposed based on an early SKIP detection technique.

Yang et al. [22] proposed an advanced motion vector prediction (AMVP) mode skip decision method that exploits the correlated tendency of PU modes. This method checks whether or not the best inter prediction mode is the skip mode. If it is the skip mode, the AMVP process is omitted for the rest of the subprediction units (i.e., 2N×N, N×2N, N×N, and asymmetric subpartitions).

Vanne et al. [23] proposed a method based on three techniques: symmetric motion partition (SMP) mode decision, asymmetric motion partition (AMP) mode decision, and quantization parameter (QP) specific mode decision. This algorithm examined the rate-distortion-complexity (RDC) characteristics of the HEVC interprediction and used the obtained RDC results to optimize the mode decision and associated block partition.

To reduce the computational complexity of the HEVC encoding system, we propose an effective CU selection algorithm for HEVC based on the RD cost of the 2N×2N PU. The proposed algorithm also uses an early merge SKIP mode detection technique based on the correlation of neighboring CUs, including the depth level.

The rest of the paper is organized as follows. In Section 2, we present the suggested algorithm including merge SKIP detection. Simulation results and some discussion will be given in Section 3. Concluding comments are given in Section 4.

2. Proposed Fast Intermode Prediction Algorithm

2.1. Adaptive CU Depth Decision Method

To enhance the encoding speed, the depth information of the CTU is found by using 2N×2N PU information. Table 1 gives the probability that the 2N×2N PU is determined as the best mode. This probability is very high: almost 90% for Class A and 79% for Class B. The 2N×2N PU decision therefore accounts for a large portion of the mode decisions. Consequently, if a fast scheme can make this decision early, the overall encoding time may be decreased effectively.

In this study, the 2N×2N PU selection probability was measured on the sequences from Class A to Class D, using 50 frames of each sequence. The selection probability was computed by averaging over several QP values (22, 27, 32, 37).

As shown in Table 1, the 2N×2N PU portion ranges from a minimum of 73.7% to a maximum of 91.2%. By using the depth information selected with the 2N×2N PU, we are able to predict the proper CU at the relevant depth. When the 2N×2N PU is given as input in the current frame, the depth information of each CU in the CTU is calculated [2]. The depth is defined as the level in the quadtree structure, rooted at the CTU, that represents the split of the CU.

After the residual is found with the best motion vector obtained through the PU, the RD cost is finally calculated through the full residual quadtree (RQT). By using this depth information, the depth of the CTU is determined through RD cost competition. The depth with the best RD cost is determined as the current depth level. If the RD cost of the next depth is smaller than that of the current depth, the calculation to find the minimum RD cost is repeated recursively for the upper depth. After that, the best subpartitioned mode is determined by calculating and comparing the RD costs from the three candidate depths in the detailed partition search.

Figure 2 illustrates the overall procedure of the proposed CU depth decision. The proposed algorithm is performed as follows. First, given the 2N×2N PU size as input at the current depth, our algorithm calculates the RD cost for that PU size. Then, the best RD cost of the current depth is compared with the accumulated RD cost of the next depth. The accumulated RD cost can be calculated from the previously encoded modes (PUs). If the given condition is satisfied, the current CU depth is selected as the best depth level. Otherwise, the algorithm moves to the next CU depth and proceeds in the same manner.

In the flowchart, after the best CU depth is selected, the detailed partition mode is determined by calculating and comparing the RD costs from the three candidate depths in the detailed partition search.
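The depth decision procedure can be sketched as a short recursion. This is only an illustration under stated assumptions, not the HM implementation: the callback `rd_cost` stands in for the RD cost of the 2N×2N PU, the accumulated RD cost of the next depth is assumed to be the sum over the four sub-CUs, the stopping condition is assumed to be "current cost not larger than accumulated cost", and `toy_cost` is a made-up cost model used only to exercise the code.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]            # (x, y, size) of a square CU in the CTU
RdCost = Callable[[Block, int], float]  # hypothetical RD cost of the 2Nx2N PU


def split4(block: Block) -> List[Block]:
    """Split a square block into its four equal quadrants."""
    x, y, size = block
    half = size // 2
    return [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]


def decide_depth(block: Block, depth: int, rd_cost: RdCost,
                 max_depth: int = 3) -> List[Tuple[Block, int]]:
    """Return the selected CUs as (block, depth) pairs."""
    if depth == max_depth:
        return [(block, depth)]  # smallest CU, nothing left to compare
    current = rd_cost(block, depth)
    children = split4(block)
    # Assumption: accumulated cost is the sum over the four sub-CUs.
    accumulated = sum(rd_cost(child, depth + 1) for child in children)
    if current <= accumulated:
        return [(block, depth)]  # current depth wins, stop descending
    selected = []
    for child in children:
        selected.extend(decide_depth(child, depth + 1, rd_cost, max_depth))
    return selected


def toy_cost(block: Block, depth: int) -> float:
    """Made-up cost: blocks above 8x8 that cover pixel (0, 0) cost double."""
    x, y, size = block
    covers_origin = x == 0 and y == 0
    penalty = 2.0 if (covers_origin and size > 8) else 1.0
    return penalty * size * size


cus = decide_depth((0, 0, 64), 0, toy_cost)
print(len(cus))  # 10
```

With this toy cost, only the quadrant chain that covers the top-left corner is split down to 8×8, while the other regions stop at the first depth where splitting brings no gain.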

2.2. Early Merge SKIP Decision Method

We also develop an early merge SKIP decision to increase the encoding speed. In the HEVC standard, merge SKIP has been adopted as the SKIP mode [2] to provide higher coding efficiency. The proposed merge SKIP detection method utilizes the information of neighboring blocks.

Figure 3 shows the positions of the adjacent CUs relative to the current CU. Neighboring CUs such as CU1, CU2, and CU3 (the above-left, above, and left CUs of the current CU) have a high degree of spatial correlation. CU4 and CU5 are used for temporal correlation, and CU6 and CU7 are used for depth correlation. Each coding unit carries its merge flag information. After the merge flags are checked, the merge SKIP mode is decided.

When the current block is encoded, the spatial and temporal neighboring blocks can provide useful information, because they are very similar to it in terms of texture and motion. In HEVC, the depth concept has been introduced, as described in Section 2. The optimal mode of the current block can therefore also be deduced from neighboring blocks that are related to it spatially, temporally, and in depth.

Figure 4 shows the proposed early CU SKIP decision method. In usual HEVC coding, the merge process of motion information is performed for the current block to achieve more coding gain. In our previous work [24], an early merge SKIP decision method was proposed that relied on a single block type. Here we develop an extended version based on the RD cost and on a comparison with the CBFs of the candidate partition types.

In the merge process, the proposed method first checks the mode types of the spatial, temporal, and depth neighboring blocks (as shown in Figure 3). If the modes of all neighboring blocks are SKIP, then SKIP is selected as the best mode for the merge process, and the remaining mode search is omitted. Otherwise, a partition mode is determined by calculating the RD cost and comparing it with the CBF (coded block flag) of the candidate partition types. With the merge SKIP detection technique, the complexity of motion estimation can be further reduced while similar image quality is maintained.
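The neighbor check at the start of the merge process can be sketched as follows. The labels CU1 to CU7 follow the text; the dictionary layout, the function names, and the treatment of an unavailable neighbor (it blocks the early decision) are assumptions of this sketch, and `full_search` is a hypothetical stand-in for the remaining RD and CBF based mode search.

```python
from typing import Callable, Dict, Optional

# CU1-CU3: spatial, CU4-CU5: temporal, CU6-CU7: depth neighbours.
NEIGHBOURS = ("CU1", "CU2", "CU3", "CU4", "CU5", "CU6", "CU7")


def early_merge_skip(modes: Dict[str, Optional[str]]) -> bool:
    """True only if every neighbouring CU was coded in SKIP mode.
    Assumption: a missing neighbour prevents the early decision."""
    return all(modes.get(name) == "SKIP" for name in NEIGHBOURS)


def choose_mode(modes: Dict[str, Optional[str]],
                full_search: Callable[[], str]) -> str:
    """Select SKIP early, otherwise fall back to the full mode search."""
    if early_merge_skip(modes):
        return "SKIP"        # remaining mode search is omitted
    return full_search()     # hypothetical RD/CBF based search


all_skip = {name: "SKIP" for name in NEIGHBOURS}
mixed = dict(all_skip, CU4="2Nx2N")
print(choose_mode(all_skip, lambda: "2NxN"))  # SKIP
print(choose_mode(mixed, lambda: "2NxN"))     # 2NxN
```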

3. Experimental Results and Discussion

To verify the performance of the proposed algorithm, experiments were performed with various standard video sequences. The proposed algorithm was implemented on HM 10.0 (the HEVC reference software) [25]. The test condition was random access with the RA-Main configuration. Standard sequences of 100 frames from Classes A to D [26] were used with various QP values (22, 27, 32, 37). As in other works, the frames of each sequence were taken from the 0th frame to the 99th frame. The sequences of each class (from A to D) are listed in Table 2.

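To evaluate performance, we defined the measures $\Delta\mathrm{Bit}$, $\Delta\mathrm{PSNR}$, and $\Delta T$. $\Delta\mathrm{Bit}$ is the total bit difference between the compared methods (see (1)):

$$\Delta\mathrm{Bit} = \frac{\mathrm{Bit}_{\mathrm{proposed}} - \mathrm{Bit}_{\mathrm{Anchor}}}{\mathrm{Bit}_{\mathrm{Anchor}}} \times 100\ (\%). \qquad (1)$$

$\Delta\mathrm{PSNR}$ represents the difference between the average PSNR of the proposed method and the corresponding value of the anchor (HEVC reference software) (see (2)):

$$\Delta\mathrm{PSNR} = \mathrm{PSNR}_{\mathrm{proposed}} - \mathrm{PSNR}_{\mathrm{Anchor}}\ (\mathrm{dB}). \qquad (2)$$

$\Delta T$ is a complexity comparison factor that indicates the amount of total encoding time saved, as follows:

$$\Delta T = \frac{\mathrm{Time}_{\mathrm{Anchor}} - \mathrm{Time}_{\mathrm{proposed}}}{\mathrm{Time}_{\mathrm{Anchor}}} \times 100\ (\%), \qquad (3)$$

where $\mathrm{Time}_{\mathrm{proposed}}$ and $\mathrm{Time}_{\mathrm{Anchor}}$ represent the encoding time of the proposed method and of the original method, respectively.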
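The three measures can be computed as in the sketch below. The sign conventions are assumptions chosen so that a positive time value means time saved and a positive bit value means a bit increase, relative to the anchor; the function names and the sample numbers are illustrative only and are not measured results.

```python
def delta_bit(bits_proposed: float, bits_anchor: float) -> float:
    """Relative bit difference in percent (positive = more bits)."""
    return 100.0 * (bits_proposed - bits_anchor) / bits_anchor


def delta_psnr(psnr_proposed: float, psnr_anchor: float) -> float:
    """PSNR difference in dB (negative = quality loss)."""
    return psnr_proposed - psnr_anchor


def time_saving(time_proposed: float, time_anchor: float) -> float:
    """Encoding time saved in percent, relative to the anchor."""
    return 100.0 * (time_anchor - time_proposed) / time_anchor


# Made-up numbers, only to show the calls.
print(time_saving(time_proposed=110.0, time_anchor=200.0))  # 45.0
print(delta_bit(bits_proposed=1010.0, bits_anchor=1000.0))  # 1.0
```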

Table 3 shows the performance of our algorithm in comparison with the original HM 10.0 encoder. The proposed algorithm achieves a time-saving factor of 44.84% on average with only a 0.04 dB loss in PSNR and a 0.17% increment in total bits. For Class A, a time-saving factor of about 46% was observed with a very small loss of quality. For smaller images (Class D), the speed-up gain decreases slightly, but the quality loss is still negligible.

The proposed algorithm increases the speed of the HEVC encoding system by up to 48.54% for the Traffic sequence, compared with the full mode search. Compared with Li’s method, the proposed algorithm achieved a speed-up gain of up to 13% with a smaller bit increment for the ICE sequence. With the proposed algorithm, an average speed-up gain of over 44% was obtained compared with the full mode search, at the cost of a small quality loss and a negligible bit rate increment.

In terms of bit rate, the proposed method showed a small BDBR increment (almost 0.17%). For the SteamLocomotive sequence, a bit increment of up to 0.23% was observed. These results indicate that the proposed algorithm is suitable for building a real-time encoding system.

Regarding image quality, the proposed method caused only a very small loss. No large loss of image quality was observed relative to the original HM encoder (0.04 dB on average). For the SteamLocomotive and Cactus sequences, the maximum quality loss with the proposed method was 0.07 dB. This means that the proposed algorithm can provide credible quality with a large speed-up factor.

Table 4 compares the performance of Yang’s algorithm [22] with that of the proposed algorithm. Both algorithms were implemented on HM 10.0 under the same conditions. Yang’s algorithm achieves a time-saving factor of 20.92% on average with only a 0.02 dB loss in PSNR and a 0.15% decrement in total bits. The proposed algorithm gave a time-saving factor of about 45% while keeping the quality loss small. Compared with Yang’s algorithm, the bit loss of the proposed algorithm is slightly higher, but it is negligible. The speed-up factor was higher than that of Yang’s method by over 24%.

For all classes, the proposed method achieves a speed-up factor more than 20% higher than that of Yang’s method. In terms of quality, a loss of just 0.04 dB was observed on average. These results mean that the proposed algorithm is credible in terms of both speed-up factor and quality loss.

Figure 5 shows the rate-distortion (RD) performance [27]. The performance of the proposed method is very similar to that of the HM 10.0 software, with negligible loss of quality. For the Traffic and BasketballDrive sequences, the proposed method gives almost the same performance as the original HM encoder. This means that the suggested algorithm can maintain reliable video quality while speeding up the HM encoder by about 44.84%. Up to 15 Mbps, the performance of the proposed algorithm is credible in terms of bitrate and image quality.

For a subjective evaluation, decoded pictures from the original encoder and from the proposed algorithm are shown in Figures 6 and 7. In terms of visual quality, there is almost no difference between the two images. From this result, we can deduce that the proposed algorithm is able to maintain very high quality for very large scale video services (Full or Ultra HD video).

4. Conclusions

In this paper, we have proposed a fast CU depth decision algorithm based on RD cost comparison for high efficiency video coding (HEVC) to reduce its computational complexity. In addition, a merge SKIP detection method was developed and integrated with the CU depth decision algorithm. Experimental results show that the proposed algorithm achieves an average time-saving factor of 44.84% in the random access (RA) Main profile configuration with the HM 10.0 reference software, while keeping the loss of quality small. From the experimental results, we conclude that the suggested algorithm can be a useful way to build real-time video encoding systems for large scale multimedia services.

Conflict of Interests

The authors declare that there is no conflict of interests.


This work was supported by the National Research Foundation of Korea Grant funded by the Korean Government (MEST) (NRF-2010-0024786).