Abstract

Three-dimensional extension of high efficiency video coding (3D-HEVC) is an emerging international video compression standard for multiview video system applications. As in HEVC, a computationally expensive mode decision is performed over all depth levels and prediction modes to select the one with the least rate-distortion (RD) cost for each coding unit (CU). In addition, new tools and intercomponent prediction techniques have been introduced in 3D-HEVC to improve the compression efficiency of multiview texture videos. These techniques achieve the highest texture video coding efficiency but involve extremely complex procedures, which limits the use of 3D-HEVC encoders in practical applications. In this paper, a fast texture video coding method based on motion homogeneity is proposed to reduce the computational complexity of 3D-HEVC. Because the multiview texture videos represent the same scene at the same instant, and the optimal CU depth level and prediction modes are highly dependent on the multiview content, it is inefficient to test all depth levels and prediction modes in 3D-HEVC. A motion homogeneity model of a CU is first built from the motion vectors and prediction modes of the corresponding CUs. Based on this model, we present three efficient texture video coding approaches: fast depth level range determination, early SKIP/Merge mode decision, and adaptive motion search range adjustment. Experimental results demonstrate that the proposed overall method saves 56.6% of the encoding time with only a trivial degradation in coding efficiency.

1. Introduction

With the advancement of three-dimensional television (3DTV) and free viewpoint TV (FTV) technology, 3D video has received much attention in the consumer electronics market. Multiview video (MVV) has been developed for 3D video applications. MVV captures the same scene with multiple synchronized cameras and offers a better experience than conventional 2D video because users can interactively choose different viewing angles. However, an MVV system must handle a huge amount of multiview data and requires a very high transmission rate, which places a heavy burden on coding and transmission. To address this issue, multiview video coding (MVC) was developed by VCEG and MPEG [1]. In addition to ordinary spatial and temporal prediction, MVC enables inter-view prediction to improve the compression capability of MVV systems. In this way, MVC reduces the MVV bitrate without loss of quality.

The recent high efficiency video coding (HEVC) standard achieves higher rate-distortion (RD) efficiency than H.264/AVC (50% bitrate savings for equivalent perceptual video quality) by using new interprediction and intraprediction modes [2]. Building on it, the 3D video coding extension of HEVC (3D-HEVC) provides more advanced compression capability and supports the synthesis of additional viewpoints [3]. In 3D-HEVC, the dependent texture views and the depth maps are coded with modified HEVC coders that include new tools and techniques exploiting already coded data in the same access unit [4]. For texture coding, disparity-compensated prediction, inter-view motion prediction, advanced residual prediction, illumination compensation, and view synthesis prediction (VSP) are used to code the dependent texture views [5]. As in HEVC, a computationally expensive mode decision is executed in 3D-HEVC to select, from all potential prediction modes, the one with the least RD cost computed with a Lagrange multiplier. It is therefore highly desirable to develop a fast texture video coding algorithm that reduces the complexity of 3D-HEVC with only a tiny loss of 3D video quality.
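The Lagrangian mode decision mentioned above can be illustrated with a minimal Python sketch. It assumes the usual cost J = D + λR, where D is the distortion and R the rate of a candidate mode; the class and function names and the numbers in the usage example are hypothetical and do not come from the 3D-HEVC reference software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModeResult:
    """Outcome of trial-encoding one candidate prediction mode (hypothetical)."""
    name: str
    distortion: float  # e.g., sum of squared errors after reconstruction
    rate: float        # estimated bits for mode, motion data, and residual


def rd_cost(result: ModeResult, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return result.distortion + lam * result.rate


def best_mode(results: List[ModeResult], lam: float) -> ModeResult:
    """Exhaustive decision: every candidate is evaluated, the smallest J wins."""
    if not results:
        raise ValueError("at least one candidate mode is required")
    return min(results, key=lambda r: rd_cost(r, lam))


candidates = [
    ModeResult("SKIP", distortion=1200.0, rate=2.0),
    ModeResult("Merge", distortion=1000.0, rate=20.0),
    ModeResult("Inter2Nx2N", distortion=700.0, rate=90.0),
]
print(best_mode(candidates, lam=10.0).name)  # Merge: J = 1220, 1200, 1600
print(best_mode(candidates, lam=1.0).name)   # Inter2Nx2N: J = 1202, 1020, 790
```

The cost of this exhaustive approach grows with the number of candidates, since every mode must be trial-encoded before the minimum is known; this is the cost the fast methods discussed below try to avoid.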

The rest of the paper is structured as follows: Section 2 reviews the related work and Section 3 presents the motion homogeneity analysis. The proposed low-complexity texture video coding method is detailed in Section 4. Simulation results and conclusions are provided in Sections 5 and 6, respectively.

2. Related Work

This section reviews prior efforts on accelerating MVC encoders. These methods can be divided into mode reduction methods and fast disparity estimation (DE)/motion estimation (ME) methods.

Category 1: Mode reduction algorithms are proposed in [6–9]. A content-aware prediction algorithm is designed in [6] to reduce the coding burden of MVC; because video contents are highly correlated, unnecessary coding computation can be avoided. A fast intermode decision is presented in [7] that reduces the encoding burden by employing textural segmentation and correlations. However, the characteristics of intermode selection precedence in 3D-HEVC differ from those in MVC. An early termination model for fast MVC mode decision is proposed in [8], which adaptively adjusts the RD cost threshold based on video content and motion properties; it reduces the computational burden by 79.57% to 89.21% with tiny coding degradation. An algorithm based on optimal stopping theory is studied in [9] to balance time savings against decision quality in MVC; considering both the predicted mode probability and the estimated coding time of all coding modes, it employs a hybrid model that relies on the characteristics of multiview coding modes. To reduce the computational complexity of multiview depth map coding, a fast mode decision algorithm based on intercomponent and inter-view correlations is introduced in [10]. However, these mode reduction methods cannot be applied directly to the 3D-HEVC mode decision, in which the number of prediction modes for each prediction unit (PU) is increased in texture video coding (for example, the number of intramodes has risen from 9 in MVC to 35). Moreover, these MVC methods do not consider the key factors of 3D-HEVC and do not work well with its increased number of PU prediction modes.

Category 2: Fast DE and ME methods are proposed in [11–14]. A view-adaptive algorithm for DE and ME is introduced in [11] to reduce the computational burden of MVC. In [12], a joint motion and parallax estimation algorithm obtains motion and parallax vectors simultaneously by exploiting the stereo-motion consistency constraint. The algorithm in [13] reduces the number of candidates in MVC by combining fast ME/DE with fast mode decision. The adaptive DE and mode size decision in [14] employs a depth map to reduce the coding burden of multiview texture video. However, these algorithms cannot efficiently accelerate the coding of the new HEVC-based 3D video.

Recently, many algorithms [15–21] have been introduced to reduce the computational complexity of the HEVC encoder mode decision by quickly selecting the dominant candidate prediction modes. The fast intramode decision method in [15] uses the edge information of the current PU to select candidate prediction directions: it analyzes the pixels of the current PU to determine the dominant orientation and, based on this orientation, defines a reduced set of intramodes to be tested. A fast CU size and mode decision method investigated in [16] reduces the computational burden by fully exploiting the relationships of RD costs and prediction modes across depth levels and spatially neighboring CUs, which allows some modes to be skipped. A fast mode selection algorithm based on linear programming is proposed in [17] to allocate computational complexity among frames; it determines complexity factors for all frames and CUs. A zero block (ZB) detection algorithm presented in [18] explores pseudo-ZBs to reduce the computational burden of HEVC. A fast transform unit (TU) size decision method proposed in [19] reduces the number of candidate transform sizes in HEVC, reporting a 30%–46% computational complexity reduction with tiny degradation. An adaptive fractional-pixel ME skipping scheme for low-complexity HEVC is proposed in [20], based on the characteristics of variable-size PUs and the video-content partition relationship among variable-size PUs. A fast CU size decision algorithm based on temporal and spatial correlation is proposed in our previous work [21] to reduce the computational complexity of the HEVC encoder. These fast mode decision methods play an important role in accelerating HEVC encoders with little quality degradation.
However, there is still room for further improvement in the 3D-HEVC mode decision, because its prediction structures, which involve inter-view motion prediction and disparity-compensated prediction, differ from those of HEVC. These structures can be further exploited to reduce the complexity of the mode decision.

To this end, a few approaches have been published for 3D-HEVC encoders to accelerate texture video coding. Adaptive search range adjustment and early termination of mode search are presented in [22] to reduce 3D-HEVC encoder complexity by using the correlations between the dependent views and the base view. A fast mode decision method based on a Bayesian classifier is proposed in [23] to predict the current CU mode from the information of already encoded neighboring CUs. The algorithm designed in [24] aims to reduce memory access without hampering coding efficiency; it reports up to 21% texture video encoding time reduction with 0.1% bitrate loss. In [25], a fast texture video coding scheme is presented to reduce the encoding time of 3D-HEVC. The fast algorithm developed in [26] combines early SKIP mode detection with a prediction-size correlation-based mode decision to reduce 3D-HEVC encoding time for real-time applications, jointly exploiting the correlations between multiview texture video and depth maps and the interlevel correlation in the HEVC quadtree structure. An early merge mode decision method is presented in [27] that uses inter-view correlations to accelerate dependent texture view coding; it reports an average 47.1% complexity reduction with 0.1% bitrate loss for texture video coding. An efficient online learning-based complexity reduction scheme is employed in [28] for coding the dependent texture views in 3D-HEVC encoders; it also adaptively adjusts the motion search range and reduces the burden of prediction mode search. However, this method must determine its thresholds by training on texture video data online, so its performance depends strongly on the accuracy of these thresholds. A fast mode decision is proposed in [29] to reduce the candidate modes of 3D-HEVC by jointly using inter-view and grayscale similarity correlations.
An early determination algorithm for motion and disparity vectors is proposed in [30], which exploits the spatial and temporal motion vectors of neighboring treeblocks to reduce 3D-HEVC computational complexity; it reports up to 33.0% complexity reduction with insignificant RD performance loss for texture video coding. Fast coding algorithms have also been developed in our previous work [31, 32] that use the coding information of depth maps and texture video and inter-view correlations to reduce the computational burden of 3D-HEVC encoders. These methods perform well on multiview texture video and achieve significant encoding time reduction. However, they only exploit texture video-depth map, inter-view, and spatial-temporal correlations; the motion homogeneity of texture video is not efficiently utilized. As a result, the coding time can still be shortened: the time spent in each step of texture video coding can be reduced further by jointly exploiting the motion homogeneity of texture video.

To overcome this limitation, we propose a fast texture video coding method based on motion homogeneity classification to reduce the computational complexity of 3D-HEVC. Because the motion characteristics of multiview texture videos are highly content dependent, it is inefficient to use a fixed motion search range and depth level range for the whole encoding process. Motion search ranges and depth levels that are rarely used in 3D-HEVC mode decision can therefore be skipped. Extensive experimental results demonstrate that the proposed low-complexity texture video coding algorithm significantly reduces the encoding time of 3D-HEVC with a tiny loss of RD performance.

3. Motion Homogeneity Analysis

In natural 2D video sequences, neighboring treeblocks are highly correlated in the spatial and temporal domains [26]. Furthermore, unlike single-view HEVC coding [33, 34], 3D-HEVC exploits inter-view correlation to reduce the redundancy of multiview texture video. 3D-HEVC uses the variable-size prediction techniques of HEVC to exploit the spatial, temporal, and inter-view correlations within successive frames. Thus, there is strong motion correlation among spatially, temporally, and inter-view neighboring treeblocks in 3D-HEVC multiview texture videos. If the current treeblock and its corresponding treeblocks (spatial, temporal, and inter-view) belong to the same video object, they have similar motion activities and can therefore be defined as statistically homogeneous treeblocks.

On the basis of these observations, we propose to analyze the motion vector of the current treeblock using the motion information of its spatial-temporal neighbors and of the corresponding treeblocks in the previously coded view. The set of motion predictors defined in equation (1) includes three types of motion predictors: spatial motion predictors, which are neighboring treeblocks in the current texture view; temporal motion predictors, which are treeblocks in the previously coded frame around the position of the current texture treeblock; and inter-view motion predictors, which are treeblocks in the previously coded view around the position colocated with the current treeblock. All of these predictors are shown in Figure 1.

To verify the correlation of motion information between the current treeblock and its spatial-temporal and previously coded view corresponding treeblocks, extensive simulations were conducted on a set of test sequences with various motion activities and spatial resolutions. Among these test sequences, “Kendo,” “Balloons,” and “Newspaper” have 1024 × 768 resolution, while “Undo_Dancer,” “GT_Fly,” “Poznan_Street,” “Poznan_Hall2,” and “Shark” have 1920 × 1088 resolution. The experimental conditions are as follows: I-B-P view structure; QPs of 25, 30, 35, and 40; treeblock size of 64; depth level range from 0 to 3; ME search range of 64; and 100 test frames. Tables 1–3 show the correlations between the current treeblock and its spatial-temporal and inter-view neighboring treeblocks. The correlation degree of the coding information (motion vectors, MVs) between the current treeblock and these neighboring treeblocks is defined in equation (2). The correlation degree ranges from 0 to 1, and the equation involves the maximum depth value of the current CU, the depth level of the current treeblock, the current treeblock at each depth level, its temporally adjacent treeblocks in the same view, its spatially neighboring treeblocks, and its colocated treeblocks in the previously coded base-view frames. The previously coded view adjacent treeblocks, the spatially adjacent treeblocks, and the temporally adjacent treeblocks are those of the inter-view, spatial, and temporal predictor sets in Figure 1, and their information is independent. The results show strong correlations between the current texture video treeblock and its spatial-temporal and previously coded view corresponding treeblocks.

Based on the motion predictors in equation (1), the motion information of the spatial-temporal and inter-view adjacent predictors is extracted to analyze the ME characteristics of the current texture treeblock in 3D-HEVC. We build a novel criterion to measure the degree of motion homogeneity between the current texture treeblock and its adjacent treeblocks, as follows. The block-level motion vectors covered by the current texture treeblock and its corresponding treeblocks (all spatial-temporal and inter-view adjacent treeblocks M1, M2, …, M14 in Figure 1) are used to represent motion homogeneity. A texture treeblock is indexed by its row i and column j, and the motion vectors of the blocks it covers are denoted block by block. The motion homogeneities of the current texture treeblock in the x and y directions are described separately in equations (3) and (4); they are computed over all blocks covered by the current texture treeblock and its adjacent treeblocks (M1, M2, …, M14 in Figure 1), with a fixed total number of blocks in our experiments. Each adjacent treeblock contributes through a motion-weight factor, which is set according to the influence of that adjacent treeblock on the current texture treeblock. According to the motion correlations between the current texture treeblock and its adjacent treeblocks in Tables 1–3, the values of the motion-weight factor are summarized in Table 4. These values were set based on a wide range of experiments and achieve good coding results on a variety of 3D sequences with different motion properties. Note that equations (3) and (4) do not measure correlation in one direction only: the motion of the reference treeblocks relative to the current treeblock is also reflected in the formulas. The motion homogeneity category parameter is then obtained from the two directional homogeneities by equation (5).

Based on the motion homogeneity parameter, each treeblock is classified into one of two types: statistically homogeneous treeblocks and complex motion treeblocks. For the current texture treeblock, the classification criterion is given in equation (6), where a threshold factor decides whether the treeblock belongs to a statistically homogeneous region or a complex motion region. When the motion homogeneity parameter is less than the threshold, the current texture treeblock is considered to lie in a statistically homogeneous region; otherwise, it is considered to lie in a complex motion region. In real-world 3D video, static and slow motion occur with higher probability than complex motion, so we treat static and slow-motion treeblocks separately to save as much encoding time as possible. On this basis, we use the average motion homogeneity of the treeblocks to determine the threshold. With $MH_{i,j}$ denoting the motion homogeneity parameter of the treeblock in row i and column j, the average is calculated as

$\overline{MH} = \frac{1}{M \times N} \sum_{i=1}^{N} \sum_{j=1}^{M} MH_{i,j}$,

where M and N represent the total number of treeblocks in a row and in a column, respectively. Based on extensive experiments, the threshold is set from this average; this setting achieves good and consistent performance on a variety of test sequences with different texture characteristics and motion activities, and it is kept fixed for all QP levels in 3D-HEVC encoders.
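Because the exact form of equations (3)–(5) is not reproduced here, the following Python sketch only illustrates the homogeneity measurement and the threshold classification under explicit assumptions. It takes the homogeneity of each MV component to be a weighted mean absolute deviation from the weighted mean MV, sums the two components, and sets the threshold to a hypothetical factor `alpha` times the average motion homogeneity. These choices and all names are hypothetical, not the authors' formulas; the weights stand in for the motion-weight factors of Table 4.

```python
from typing import List, Sequence, Tuple

MotionVector = Tuple[float, float]  # (mv_x, mv_y) of one covered block


def motion_homogeneity(mvs: Sequence[MotionVector],
                       weights: Sequence[float]) -> float:
    """ASSUMED stand-in for equations (3)-(5); smaller means more uniform motion.

    mvs[k] is the MV of the k-th block covered by the current treeblock or one
    of its spatial, temporal, or inter-view neighbors; weights[k] is the
    motion-weight factor of the treeblock that block belongs to (cf. Table 4).
    """
    if not mvs or len(mvs) != len(weights):
        raise ValueError("need one weight per motion vector")
    total_w = sum(weights)
    mean_x = sum(w * mv[0] for mv, w in zip(mvs, weights)) / total_w
    mean_y = sum(w * mv[1] for mv, w in zip(mvs, weights)) / total_w
    # Weighted mean absolute deviation of each component (hypothetical choice).
    h_x = sum(w * abs(mv[0] - mean_x) for mv, w in zip(mvs, weights)) / total_w
    h_y = sum(w * abs(mv[1] - mean_y) for mv, w in zip(mvs, weights)) / total_w
    return h_x + h_y  # hypothetical combination of the x and y homogeneities


def average_homogeneity(h_map: List[List[float]]) -> float:
    """Average motion homogeneity over an N-row by M-column grid of treeblocks."""
    rows, cols = len(h_map), len(h_map[0])
    return sum(sum(row) for row in h_map) / (rows * cols)


def classify(h: float, threshold: float) -> str:
    """Criterion (6): homogeneous if h < threshold, otherwise complex."""
    return "homogeneous" if h < threshold else "complex"


def classify_frame(h_map: List[List[float]], alpha: float = 1.0) -> List[List[str]]:
    # ASSUMPTION: threshold = alpha * average; alpha is a hypothetical factor.
    threshold = alpha * average_homogeneity(h_map)
    return [[classify(h, threshold) for h in row] for row in h_map]


print(motion_homogeneity([(2, 1), (2, 1), (2, 1)], [1.0, 1.0, 1.0]))  # 0.0
print(motion_homogeneity([(0, 0), (4, 0)], [3.0, 1.0]))              # 1.5
print(classify_frame([[0.0, 1.0], [3.0, 4.0]]))
# [['homogeneous', 'homogeneous'], ['complex', 'complex']]
```

Identical motion vectors give a homogeneity of zero, and a heavily weighted neighbor pulls the weighted mean toward its own motion, which is the intended effect of the motion-weight factors.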

4. Proposed Low-Complexity Texture Video Coding Algorithm

4.1. Fast Depth Level Range Determination

3D-HEVC usually allows a maximum CU size of 64 × 64, with depth levels from 0 to 3. In the 3D-HEVC reference software, the CU depth range is fixed for the whole video sequence. However, depth level 0 occurs very frequently when coding static or homogeneous regions, whereas it is rarely chosen for treeblocks with complex motion [26]. These results show that the depth level range should be determined adaptively based on the motion homogeneity of texture video treeblocks.

To achieve large savings in 3D-HEVC encoding time while minimizing the loss in coding efficiency, depth levels that contribute little to the coding efficiency of a texture video treeblock should be skipped. Hence, we use the motion homogeneity property to filter out unsuitable depth levels and speed up the 3D-HEVC encoding process. According to the motion homogeneity of the current texture video treeblock (based on equation (6)), each treeblock is classified as either a statistically homogeneous treeblock or a complex motion treeblock.

Table 5 shows the depth level distribution for the two types of treeblocks, where “Level 0,” “Level 1,” “Level 2,” and “Level 3” are the depth levels of the texture video treeblock. For statistically homogeneous treeblocks, about 63.6% choose depth level 0 as the optimal level and about 33.3% choose depth level 1. In other words, setting the maximum depth level to 1 covers about 96.9% of these texture video treeblocks, so mode prediction at depth levels 2 and 3 (CU sizes 16 × 16 and 8 × 8) can be skipped. For complex motion treeblocks, about 98.4% choose depth level 1, 2, or 3 (CU sizes 32 × 32, 16 × 16, and 8 × 8). Setting the minimum depth level to 1 and the maximum to 3 therefore covers about 98.4% of these treeblocks, while the probability of choosing depth level 0 is very low (less than 1.6%), so mode prediction at depth level 0 (CU size 64 × 64) can be skipped. Based on this analysis, the candidate depth levels tested with RD optimization (RDO) for each texture video treeblock are summarized in Table 6. With the proposed depth level range determination algorithm, most texture video treeblocks skip one or two depth levels. As a result, the candidate depth levels are limited to a small subset, and the computational complexity of 3D-HEVC is considerably reduced.
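The depth level ranges derived above (levels 0–1 for statistically homogeneous treeblocks, levels 1–3 for complex motion treeblocks) amount to a small lookup, sketched here in Python with hypothetical function names. The coverage figures follow directly from Table 5: 63.6% + 33.3% = 96.9% for homogeneous treeblocks, and 100% − 1.6% = 98.4% for complex motion treeblocks.

```python
def cu_size(depth: int, max_cu: int = 64) -> int:
    """CU width (= height) at a quadtree depth: 64, 32, 16, 8 for depths 0-3."""
    return max_cu >> depth


def candidate_depths(region: str) -> range:
    """Depth levels tested with RDO for each treeblock type (Section 4.1)."""
    if region == "homogeneous":
        return range(0, 2)  # depths 0-1, i.e., 64x64 and 32x32 CUs
    if region == "complex":
        return range(1, 4)  # depths 1-3, i.e., 32x32, 16x16, and 8x8 CUs
    raise ValueError(f"unknown treeblock type: {region}")


for region in ("homogeneous", "complex"):
    print(region, [cu_size(d) for d in candidate_depths(region)])
# homogeneous [64, 32]
# complex [32, 16, 8]
```

Compared with the four levels of the exhaustive search, a homogeneous treeblock skips two levels and a complex motion treeblock skips one.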

4.2. Early SKIP/Merge Mode Decision

SKIP/Merge mode provides good coding performance at low computational complexity in the 3D-HEVC encoder. Thus, if SKIP/Merge mode can be decided early, the variable-size ME and DE computation for a treeblock can be saved entirely in the 3D-HEVC mode decision procedure. In the reference encoder, however, SKIP/Merge mode is selected only after the RD costs of all other prediction modes have been computed and found to be higher. After the RD costs of all prediction modes are computed, many texture video treeblocks end up being coded in SKIP mode in 3D-HEVC because they belong to a homogeneous region or a motionless object [35, 36]. Based on this analysis, we propose an early SKIP/Merge mode decision algorithm for statistically homogeneous treeblocks that avoids the whole variable-size ME and DE procedure.

To verify the rationale of our early SKIP/Merge mode decision algorithm, we conducted simulation experiments, whose results are shown in Table 7. Using the exhaustive mode decision, we collected the mode distribution for texture video treeblocks in statistically homogeneous regions. SKIP and Merge modes are chosen for 93.5% and 3.5% of these treeblocks, respectively, while each of the other modes is chosen for less than 0.9%. Our early SKIP/Merge mode decision method can therefore remove a large amount of unnecessary ME and DE on small CU sizes.
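A minimal Python sketch of the early SKIP/Merge decision follows. The mode names, cost values, and cost callback are hypothetical stand-ins for the encoder's RD evaluation; the sketch only shows that statistically homogeneous treeblocks evaluate SKIP and Merge and never enter the variable-size ME/DE modes.

```python
from typing import Callable, Iterable, List


def decide_mode(region: str, modes: Iterable[str],
                rd_cost: Callable[[str], float]) -> str:
    """Pick the mode with the lowest RD cost among the allowed candidates."""
    mode_list = list(modes)
    if region == "homogeneous":
        # Early SKIP/Merge decision: variable-size ME/DE modes are not tried.
        candidates = [m for m in mode_list if m in ("SKIP", "Merge")]
    else:
        candidates = mode_list
    if not candidates:
        raise ValueError("no candidate modes left to evaluate")
    return min(candidates, key=rd_cost)


evaluated: List[str] = []
costs = {"SKIP": 10.0, "Merge": 12.0, "Inter2Nx2N": 9.0, "Intra": 20.0}


def cost(mode: str) -> float:
    evaluated.append(mode)  # record which modes were actually trial-encoded
    return costs[mode]


print(decide_mode("homogeneous", costs, cost), evaluated)
# SKIP ['SKIP', 'Merge']
```

In this example Inter2Nx2N has a lower cost than SKIP, so the early decision trades a small RD loss for skipping two trial encodings; Table 7 suggests such cases occur for only a small fraction of statistically homogeneous treeblocks.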

4.3. Adaptive Motion Search Range Adjustment

Motion search, that is, the search for the best-matching block in temporally preceding frames, is a major contributor to the computational complexity of the 3D-HEVC encoder. A suitable motion search window can reduce the computational complexity of 3D-HEVC while maintaining good RD performance [37, 38]. In the 3D-HEVC joint model, motion search uses a large fixed range for the whole texture video. This large fixed range achieves the highest texture video coding efficiency but requires significant computation. Because the optimal motion search range depends strongly on the motion homogeneity of the treeblock, using a large fixed search window in 3D-HEVC is not advisable. When the predicted vector is inaccurate, or sometimes completely wrong, a large search window is necessary for reliable motion estimation; when the predicted vector is accurate, the search window can be reduced without loss of coding efficiency. Because motion fields vary widely across the frames of a 3D video sequence, the search window should be adjusted on a frame-by-frame basis for 3D-HEVC motion estimation. Furthermore, a 3D video frame may contain regions with different levels of motion homogeneity, so the ME search range should also be adjusted according to the local motion complexity. Based on this analysis, we determine the motion search range adaptively and skip the search range levels that are rarely used in statistically homogeneous regions.

As Table 8 shows, for texture video treeblocks in statistically homogeneous regions, more than 97.0% of all texture motion vectors lie within a small search window; in other words, setting the maximum search range to this window covers about 97.0% of the motion vectors. For texture video treeblocks in complex motion regions, the percentages of motion vectors lying in the three windows listed in Table 8 are about 48.5%, 73.7%, and 82.7%, respectively, so the 3D-HEVC motion search range cannot be reduced for them. Based on Table 8, the adaptive search range used in 3D-HEVC texture coding is defined as follows: the reduced window is used for treeblocks in statistically homogeneous regions, and the original search range is kept for treeblocks in complex motion regions.

With the proposed motion search range adjustment method, most texture video treeblocks in statistically homogeneous regions can omit unnecessary search ranges. With the three methods above, the depth level range is selectively enabled, the number of candidate modes in texture video coding is reduced, and the search range of ME and DE is adaptively determined. The proposed low-complexity coding algorithm then selects the best mode among the remaining texture video candidate modes, and the encoding time of 3D-HEVC encoders is reduced dramatically.
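The adaptive search range rule and the coverage statistic behind Table 8 can be sketched in Python. The sketch assumes integer-pel MV offsets measured from the search center and uses a hypothetical reduced range of 16 in place of the window from Table 8; only the full range of 64 comes from the experimental settings in Section 3. The function names are illustrative.

```python
from typing import Iterable, Tuple

FULL_RANGE = 64  # ME search range used in the experiments of Section 3


def search_range(region: str, reduced_range: int = 16) -> int:
    """Reduced window for homogeneous treeblocks, full range otherwise.

    reduced_range is a HYPOTHETICAL placeholder for the window of Table 8.
    """
    return reduced_range if region == "homogeneous" else FULL_RANGE


def window_coverage(mv_offsets: Iterable[Tuple[int, int]], r: int) -> float:
    """Percentage of MVs whose offsets from the search center lie in [-r, r]^2."""
    offsets = list(mv_offsets)
    inside = sum(1 for dx, dy in offsets if abs(dx) <= r and abs(dy) <= r)
    return 100.0 * inside / len(offsets)


print(window_coverage([(0, 0), (3, -2), (20, 1), (-5, 40)], 8))  # 50.0
print(search_range("homogeneous"), search_range("complex"))      # 16 64
```

A statistic like `window_coverage`, collected per treeblock type over an exhaustive encoding, is the kind of measurement that supports choosing one reduced window for homogeneous treeblocks while keeping the full range for complex ones.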

4.4. Overall Algorithm

According to the aforementioned analysis, the basic idea of the proposed low-complexity texture video coding algorithm for 3D-HEVC is to adjust the various steps of ME and DE based on the motion homogeneity of the texture motion field. The proposed overall algorithm incorporates fast depth level range determination, early SKIP/Merge mode decision, and adaptive motion search range adjustment based on motion homogeneity. The flowchart of the proposed overall algorithm is given in Figure 2. The overall algorithm is as follows:

Step 1: Start the mode decision for a texture video treeblock.
Step 2: Obtain the coding information from the spatial-temporal predictors and the predictors in the previously coded view (shown in Figure 1).
Step 3: Compute the motion homogeneity parameter with (5), and classify the current texture video treeblock with (6) as a statistically homogeneous or complex motion treeblock.
Step 4: Perform fast depth level range determination. For a statistically homogeneous treeblock, the candidate depth levels are restricted to 0–1; for a complex motion treeblock, they are restricted to 1–3.
Step 5: Perform adaptive motion search range adjustment. For a statistically homogeneous treeblock, the search window is reduced to the window derived in Section 4.3.
Step 6: Perform early SKIP/Merge mode decision. For a statistically homogeneous treeblock, only SKIP and Merge modes are evaluated for the best mode, variable-size ME and DE are skipped, and the process goes to Step 7.
Step 7: Determine the best mode, then return to Step 1 for the next treeblock.
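The per-treeblock flow of Steps 3–6 can be summarized as a Python sketch that only produces an encoding plan; the RDO of Step 7 is left to the encoder. The mode list, the reduced search range of 16, and all names are hypothetical placeholders rather than HTM identifiers.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical, abbreviated list of texture prediction modes.
ALL_MODES = ("SKIP", "Merge", "Inter2Nx2N", "Inter2NxN", "InterNx2N", "Intra2Nx2N")


@dataclass(frozen=True)
class TreeblockPlan:
    region: str
    depths: Tuple[int, ...]  # Step 4: depth levels tested with RDO
    search_range: int        # Step 5: search range for any ME/DE still run
    modes: Tuple[str, ...]   # Step 6: prediction modes evaluated


def plan_treeblock(h: float, threshold: float,
                   reduced_range: int = 16, full_range: int = 64) -> TreeblockPlan:
    """Steps 3-6 for one treeblock whose motion homogeneity h was computed in Step 3."""
    if h < threshold:  # statistically homogeneous treeblock
        return TreeblockPlan("homogeneous", (0, 1), reduced_range, ("SKIP", "Merge"))
    return TreeblockPlan("complex", (1, 2, 3), full_range, ALL_MODES)


def count_depth_tests(h_values: Sequence[float], threshold: float) -> Tuple[int, int]:
    """Depth levels tested by this sketch vs. the exhaustive search over levels 0-3."""
    tested = sum(len(plan_treeblock(h, threshold).depths) for h in h_values)
    return tested, 4 * len(h_values)


print(plan_treeblock(0.5, 1.0))
print(count_depth_tests([0.2, 0.4, 3.0], threshold=1.0))  # (7, 12)
```

Keeping the plan separate from the encoding itself makes it easy to count how much of the exhaustive search each treeblock avoids, as `count_depth_tests` does for the depth levels.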

5. Experimental Results

To evaluate the low-complexity texture video coding algorithm, we implemented it in the 3D-HEVC test model (HTM 16.0 [4, 39]). We tested eight sequences in two resolutions (1024 × 768 and 1920 × 1088) recommended by the JCT-3V group [40]. Among these sequences, “Shark” and “Undo_Dancer” contain large global motion or rich texture; “Kendo,” “Balloons,” “Newspaper,” “GT_Fly,” and “Poznan_Street” contain medium local motion or smooth texture; and “Poznan_Hall2” contains small global motion or homogeneous texture. The coding conditions are as follows: 3-view case, center-left-right (in coding order); P-I-P inter-view prediction; texture video QP values for the independent view of 40, 35, 30, and 25; depth map QP values of 45, 42, 39, and 34; and 150 coded frames per sequence. The “VSRS-1D-Fast” software is used for view synthesis. In this section, Tables 9–15 compare the proposed algorithm with the 3D-HEVC encoder using exhaustive mode decision and with the state-of-the-art fast methods [25, 28]. Coding efficiency is measured by the peak signal-to-noise ratio (PSNR) of the texture video and the rendered views, and computational complexity is measured by the run time. The Bjontegaard delta PSNR (BDPSNR) [41] represents the PSNR gain, the Bjontegaard delta bitrate (BDBR) represents the change in total bitrate, and “Dtime (%)” represents the percentage change in encoding time of the proposed method relative to the original 3D-HEVC encoder, computed from the run times of the two encoders. Two kinds of results (“Texture video” and “Rendered view”) are reported. The “Rendered view” results are obtained by comparing views rendered from the coded texture video and depth maps with images rendered from the uncompressed texture video and depth maps. Simulations were run on two Intel Xeon E5-2640 v2 2.0 GHz processors with 32 GB of DDR3 memory under Windows 7 SP1.
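The evaluation metrics can be sketched in Python. `dtime` assumes that Dtime is the relative change (T_proposed − T_original) / T_original × 100%, so savings appear as negative values; the paper's exact sign convention is an assumption here. `bd_rate` follows the usual Bjontegaard procedure (a cubic fit of log-bitrate against PSNR, averaged over the overlapping PSNR interval). This sketch assumes exactly four RD points per curve, as with the four QPs used here, so the cubic interpolates the points exactly and Simpson's rule integrates it without error. It is an illustration, not the reference VCEG tool, and the rates in the example are made up.

```python
import math
from typing import Callable, List, Tuple

RDPoint = Tuple[float, float]  # (bitrate in kbps, PSNR in dB)


def dtime(t_proposed: float, t_original: float) -> float:
    """Encoding time change in percent; negative values are savings (assumed sign)."""
    return (t_proposed - t_original) / t_original * 100.0


def _cubic_through(points: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Lagrange polynomial through four (x, y) points, i.e., an exact cubic fit."""
    def f(x: float) -> float:
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f


def _integrate_cubic(f: Callable[[float], float], lo: float, hi: float) -> float:
    """Simpson's rule, which is exact for polynomials of degree <= 3."""
    return (hi - lo) / 6.0 * (f(lo) + 4.0 * f((lo + hi) / 2.0) + f(hi))


def bd_rate(anchor: List[RDPoint], test: List[RDPoint]) -> float:
    """Average bitrate difference (%) of `test` vs. `anchor` at equal PSNR."""
    if len(anchor) != 4 or len(test) != 4:
        raise ValueError("this sketch expects exactly four RD points per curve")
    fa = _cubic_through([(psnr, math.log10(rate)) for rate, psnr in anchor])
    ft = _cubic_through([(psnr, math.log10(rate)) for rate, psnr in test])
    lo = max(min(p for _, p in anchor), min(p for _, p in test))
    hi = min(max(p for _, p in anchor), max(p for _, p in test))
    if hi <= lo:
        raise ValueError("the PSNR ranges of the two curves do not overlap")
    diff = _integrate_cubic(ft, lo, hi) - _integrate_cubic(fa, lo, hi)
    return (10.0 ** (diff / (hi - lo)) - 1.0) * 100.0


anchor = [(1000.0, 34.0), (1800.0, 36.5), (3200.0, 39.0), (6000.0, 41.5)]
test = [(rate * 1.1, psnr) for rate, psnr in anchor]  # 10% more bits everywhere
print(round(bd_rate(anchor, test), 6))  # 10.0
print(round(dtime(43.4, 100.0), 6))     # -56.6
```

With more than four RD points, a least-squares cubic fit would replace the exact interpolation and Simpson's rule would still integrate it exactly.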

5.1. Performance Evaluation of the Individual Algorithms

Tables 9–11 compare each individual algorithm with the original 3D-HEVC encoder: fast depth level range determination (FDLRD), early SKIP/Merge mode decision (ESMMD), and adaptive motion search range adjustment (AMSRA). All three algorithms reduce the encoding time with a tiny coding loss. FDLRD reduces the encoding time by about 30.9%. The reduction is largest for “Shark” (37.2%) and “Undo_Dancer” (36.5%), and still evident for small-motion sequences such as “Poznan_Hall2” (23.7%). As Table 9 shows, the average bitrate increase is 0.72% (or 0.02 dB PSNR drop) for texture video and 0.29% (or 0.01 dB PSNR drop) for rendered views. ESMMD reduces the encoding time by about 17.2%, with the highest gain of 25.3% for “Poznan_Hall2” and the lowest gain of 12.4% for “Undo_Dancer.” Its RD loss is negligible: a slight bitrate increase (or 0.01 dB PSNR drop) for texture video and a 0.14% bitrate increase (or 0.01 dB PSNR drop) for rendered views. This result indicates that ESMMD accurately removes unnecessary ME and DE on small CU sizes. AMSRA also performs well on sequences with various motion activities and texture characteristics, reducing the encoding time by about 31.3%, with the best gain of 39.6% for “Shark” and the lowest gain of 24.5% for “Poznan_Hall2.” Meanwhile, the coding efficiency loss is small, with bitrate increases of 0.24–1.58% for texture videos and 0.06–0.92% for rendered views. Therefore, AMSRA reduces the computational burden with a tiny loss of coding quality.

5.2. Performance Evaluation of the Overall Method

The combined coding performance of the FDLRD, ESMMD, and AMSRA approaches is given in Table 12. As Table 12 shows, the overall algorithm considerably shortens the encoding time of all sequences compared with the original 3D-HEVC encoder. It saves 56.6% of the encoding time, with a minimum of 51.6% for “Poznan_Hall2” and a maximum of 60.8% for “Shark.” For “Shark” and “Undo_Dancer,” it saves more than 59.9% of the computational complexity. The reduction is large because the exhaustive depth level and mode decision process is reasonably skipped for many texture video treeblocks. Table 12 also shows that the efficiency loss is tiny: the bitrate increase is 1.27% (or 0.04 dB PSNR drop) for texture video and 0.52% (or 0.02 dB PSNR drop) for rendered views. Since the proposed overall algorithm is only applicable to texture video, Table 13 gives the encoding time ratio over the original 3D-HEVC encoder for encoding texture video only. “Dtime (tex),” “Dtime (base),” and “Dtime (dep)” represent the runtime ratios for all texture views (including base and dependent views), the base texture views, and the dependent texture views, respectively. As Table 13 shows, the proposed overall algorithm reduces the encoding time to 78.6% of the original for all texture video coding, while the encoding time ratios over HTM 16.0 for the base and dependent texture views are 62.2% and 83.7%, respectively. The proposed overall algorithm therefore achieves consistent runtime savings in both base and dependent view coding, keeping nearly the same RD performance as the original 3D-HEVC encoder while considerably reducing the computational complexity of the encoding process.

Figure 3 gives a detailed comparison of our algorithm and the original 3D-HEVC encoder on the rendered views of two sequences, “Newspaper” (1024 × 768) and “Shark” (1920 × 1088). As Figure 3 shows, our algorithm reduces the encoding time over a wide bitrate range with only a small PSNR loss and bitrate increase. Moreover, the runtime saving grows as the bitrate decreases. This is because, as QP increases, more texture video treeblocks are restricted to depth levels 0–1 by FDLRD, and more are limited to the SKIP/Merge mode and a small motion search range by ESMMD and AMSRA.

5.3. Performance Comparison with the State-of-the-Art Methods

Besides the original 3D-HEVC encoder, we also compare our algorithm with state-of-the-art fast 3D-HEVC methods. Tables 14 and 15 compare the proposed overall algorithm with two such methods, fast encoder decision for texture coding (FEDTC) [25] and OLCRS [28], under the common test conditions (CTC). The QP pairs used for texture and depth are (25, 34), (30, 39), (35, 42), and (40, 45). All algorithms are executed on the same computer. Our algorithm outperforms FEDTC, saving an average of 6.7% additional coding time, from 11.3% for “Poznan_Street” down to 2.4% for “GT_Fly.” Because our algorithm fully exploits the motion homogeneity of texture video to predict the current treeblock, the unproductive prediction processes of many texture video treeblocks are skipped, which greatly reduces computation. As Table 14 shows, the average RD loss is small: only a 0.69% bitrate increase (or 0.02 dB PSNR drop) for texture video and a 0.31% bitrate increase (or 0.01 dB PSNR drop) for rendered views. Thus, our algorithm substantially reduces the computational burden while keeping nearly the same RD performance as FEDTC. Compared with OLCRS, Table 15 shows an additional 27.8%–39.7% reduction in encoding time, while the average bitrate increase is 0.87% (or 0.03 dB PSNR drop) for texture video and 0.29% (or 0.01 dB PSNR drop) for rendered views. Tables 14 and 15 show that our algorithm achieves a higher encoding time saving than both FEDTC and OLCRS for every test sequence, with no noticeable RD efficiency loss in either comparison. The saving is particularly high because the full mode decision process is reasonably omitted for a significant number of texture video CUs, which demonstrates that the proposed motion-homogeneity-based texture video coding is better suited to selecting candidate modes in 3D-HEVC coding.

Figure 4 compares the coding speed of our algorithm with that of the 3D-HEVC encoder and the FEDTC and OLCRS methods. Our algorithm achieves the largest runtime savings for sequences such as “Shark” and “Undo_Dancer.” Among the compared methods, our algorithm achieves the highest coding speed, reducing the encoding time by 56.6% relative to the original 3D-HEVC encoder. These simulation results demonstrate that the proposed low-complexity texture video coding based on motion homogeneity is effective for all test sequences and outperforms the state-of-the-art methods in coding speed.

6. Conclusion

In this paper, we have proposed a fast texture video coding method based on motion homogeneity to accelerate the 3D-HEVC encoder. It includes three approaches, i.e., fast depth level range determination, early SKIP/Merge mode decision, and adaptive motion search range adjustment. The experimental results show that our algorithm reduces the encoding time by about 56.6% on average compared with the HTM 16.0 encoder, with only a small loss of RD performance. Furthermore, it consistently outperforms two state-of-the-art fast 3D-HEVC methods, with an additional 2.4–39.7% coding time saving.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 61771432, 61773018, 61302118, 61702464, 61374014, and 61803346), Scientific Project of Henan Province (Nos. 182102210156 and 182102210610), Innovation Talents of Henan Province (No. 17HASTIT022), Young Key Teacher of Henan Province (No. 2016GGJS-087), and Education Department Project of Henan Province (Nos. 18B510019 and 17B510011).