Abstract

To compress stereo video effectively, this paper proposes a novel macroblock (MB) level rate control method based on binocular perception. A binocular just-noticeable difference (BJND) model based on parallax matching is first used to describe binocular perception. Then, the proposed rate control method is performed in stereo video coding at four levels, namely, view level, group-of-pictures (GOP) level, frame level, and MB level. At the view level, different proportions of the bitrate are allocated to the left and right views of the stereo video according to a prestatistical rate allocation proportion. At the GOP level, the total number of bits allocated to each GOP is computed and the initial quantization parameter of each GOP is set. At the frame level, the target bits allocated to each frame are computed. At the MB level, a visual perception factor, which is measured by the BJND value of the MB, is used to adjust the MB level bit allocation, so that the rate control results are in line with human visual characteristics. Experimental results show that, compared with other methods, the proposed method controls the bitrate more accurately and achieves better subjective quality of stereo video.

1. Introduction

Three-dimensional (3D) videos, including stereo video and multiview video, are now reaching the home through various channels, such as Blu-ray Disc, cable and satellite transmission, terrestrial broadcast, and streaming and download over the Internet [1]. Moreover, 3D video technologies have matured enough to move onto mobile platforms, for example, 3D mobile phones and mobile 3D television [2]. However, many crucial technologies for stereo video coding still need further research. Rate control is a key issue in stereo/multiview video coding and transmission.

Lu et al. [3] proposed a rate control scheme for multiview video coding (MVC) on the basis of a new quantitative measure for stereo video quality; the macroblock (MB) quantization parameters (QPs) are modified according to the QP of the neighboring MB in order to eliminate blocking artifacts in stereo video. Shao et al. [4] proposed a rate control method for asymmetric stereo video coding. They used a fixed threshold to quantize binocular psychovisual redundancy and established the relationship between distortion and quantization for asymmetric stereo video coding. Zheng et al. [5] presented a rate control algorithm for MVC by analyzing the rate allocation proportion among different types of views, and the frame complexity was used to regulate the target bits for each frame. Liu et al. [6] presented a rate control technique for multiview video plus depth (MVD) based 3D video coding, in which an image-stitching method was utilized to encode video and depth simultaneously. Although these rate control algorithms made useful explorations in stereo/multiview video coding, they were designed only for frame layer bit allocation. To control the rate more finely, it is necessary to develop rate control techniques at the MB layer.

Moreover, the above algorithms did not take human visual characteristics into account well. Owing to the physiological and psychological mechanisms of the human visual system (HVS), human eyes cannot detect all distortion in an image or video [7]. Thus, the concept of just-noticeable difference (JND) has played an important role in understanding the HVS [8, 9]. JND denotes the visibility threshold above which human eyes can perceive changes in image values, and it depends on the luminance and contrast of the local image region. Liu et al. [10] utilized a JND model to separate edge and textured regions of the image. Recently, several studies have concentrated on constructing visibility thresholds for 3D image/video. Based on modeling and incorporating the binocular combination of injected noises, luminance masking, and contrast masking, the binocular JND (BJND) was reported as the first model to measure the perceptible distortion of stereo images [11]. It was empirically demonstrated that humans cannot perceive distortion in a stereo image if the distortion in one view image is less than the BJND value.

In this paper, we propose a novel MB level rate control method for stereo video coding based on visual perception. The proposed rate control method is performed in stereo video coding at four levels, namely, view level, group-of-pictures (GOP) level, frame level, and MB level. The BJND model based on parallax matching is first exploited. Then, a visual perception factor, designed on the basis of BJND, is used to adjust the MB level bit allocation, so that the rate control results are in line with human visual characteristics. The rest of this paper is organized as follows. Section 2 describes the BJND model and Section 3 describes the proposed method. Section 4 shows experimental results, and finally, a conclusion is given in Section 5.

2. The BJND Model

To account for the mutual effect between the two views on the minimum distortion in one view that evokes a perceptual difference in a stereo image, the BJND model was proposed by Zhao et al. and measured with a set of psychophysical stimuli [11]. It has been demonstrated that the BJND reflects the binocular combination of injected noises, luminance masking, and contrast masking in binocular viewing.

In [11], Zhao et al. did not take into account the disparity between the left and right views of a stereo image in their experiments, and they obtained BJND values under the assumption that the disparities between the left and right views were zero. This assumption is not reasonable, since in most cases the disparity of a stereo image is not zero. In practice, binocular disparity is attributed to the primary visual cortical areas, and it is one of the most important stereo cues. Stereo matching itself has been a difficult task in computer vision, and a comprehensive review of stereo matching is given in [12]. Here, a state-of-the-art stereo matching technique from [13] is adopted.

To improve Zhao’s model, a modified BJND model is presented that takes binocular disparity into account. Given the left and right views of a stereo image and the disparity image corresponding to the left view, and incorporating the models of the binocular combination of injected noises, luminance masking, and contrast masking effects, the BJND at a pixel of the right view is defined through a contrast masking threshold evaluated at the corresponding position in the left view, which is found by shifting the pixel coordinates by the disparity at that pixel. The BJND therefore depends on the background luminance level, the edge height, and the noise amplitude at the corresponding pixel position in the left view. Here, the original sequence is used, so there is no noise in the left view; that is, the noise amplitude is zero.

The contrast masking threshold is defined in terms of two functions of the background luminance: a luminance masking function and a fitting function. The edge height is obtained with the Sobel operator.
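
As a rough illustration, the zero-noise case described above can be sketched in Python. The sketch assumes the commonly quoted form of the model in [11]: a contrast masking threshold equal to a luminance masking term plus a fitting function multiplied by the edge height, combined with the noise amplitude through an exponent of 1.25. The function names, the numeric constants, and the direction of the disparity shift are all assumptions rather than values stated in this section, and they should be checked against [11] before use.

```python
def luminance_masking(bg):
    """Luminance masking threshold for background luminance bg in [0, 255].

    The constants are assumed from the commonly quoted form of the model in [11].
    """
    if bg < 48:
        return 0.0027 * (bg * bg - 96.0 * bg) + 8.0
    return 0.0001 * (bg * bg - 32.0 * bg) + 1.7


def fitting_function(bg):
    """Fitting function of the background luminance (assumed constants)."""
    return -1e-6 * (0.7 * bg * bg + 32.0 * bg) + 0.07


def contrast_masking_threshold(bg, eh):
    """Threshold from background luminance bg and edge height eh (eh >= 0)."""
    return luminance_masking(bg) + fitting_function(bg) * eh


def bjnd(bg, eh, noise=0.0, lam=1.25):
    """BJND for one pixel; with noise == 0 it equals the contrast masking threshold."""
    threshold = contrast_masking_threshold(bg, eh)
    ratio = min(abs(noise) / threshold, 1.0)  # clip so the result stays real
    return threshold * (1.0 - ratio ** lam) ** (1.0 / lam)


def bjnd_right_view(bg_left, eh_left, disparity, x, y):
    """BJND at pixel (x, y) of the right view, read from the left view.

    bg_left, eh_left and disparity are 2D lists indexed [y][x].
    The shift x + d is an assumed sign convention.
    """
    width = len(bg_left[y])
    x_left = min(max(x + int(disparity[y][x]), 0), width - 1)
    return bjnd(bg_left[y][x_left], eh_left[y][x_left])
```

Because the noise amplitude is zero here, `bjnd_right_view` reduces to a lookup of the contrast masking threshold at the disparity-shifted position.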

3. The Proposed BJND-Based Rate Control Method

Here, we present a new rate control method for stereo video based on the modified BJND. The framework of the proposed method is illustrated in Figure 1, with rate control at four levels: view level, group-of-pictures (GOP) level, frame level, and MB level.

3.1. View Level Rate Control

The hierarchical B pictures (HBP) coding structure for stereo video is shown in Figure 2. There are two views to be encoded. The left view is encoded without inter-view prediction, and the right view is encoded with unidirectional inter-view prediction from the reconstructed left view.

In [14], we obtained the proportion between the left view and the right view by preencoding the first GOP. The target bit rate for each view is computed from the total bit rate of the left and right views and the bit allocation proportion for that view. The views are indexed in coding order, with index 0 for the left view and index 1 for the right view.
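
A minimal sketch of this view level split is given below. It assumes that the two allocation proportions sum to one and that each view receives the total bit rate multiplied by its proportion; the function name and the example share of 0.6 are hypothetical and are not taken from [14].

```python
def allocate_view_bitrates(total_bitrate, left_share):
    """Split a total bit rate between the left view (index 0) and right view (index 1).

    left_share is the assumed proportion of the left view, strictly between 0 and 1;
    the right view receives the remainder.
    """
    if not 0.0 < left_share < 1.0:
        raise ValueError("left_share must be strictly between 0 and 1")
    return [total_bitrate * left_share, total_bitrate * (1.0 - left_share)]


# Hypothetical example: 2000 kbps in total, 60% of it for the left view.
left_kbps, right_kbps = allocate_view_bitrates(2000.0, 0.6)
```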

3.2. GOP Level Rate Control

Within the left view or the right view, GOP level rate control computes the total number of bits allocated to each GOP and sets the initial QP of each GOP. The total number of bits allocated to the ith GOP is computed from the frame rate, the total number of frames in the current GOP, and the actual buffer occupancy after encoding the previous GOP. The buffer occupancy should be kept at a fixed target level after encoding each GOP. In the case of constant bit rate (CBR), the number of bits available to the GOP is updated frame by frame using the actual encoding bits of each coded frame.
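
The GOP level bookkeeping can be sketched as follows. This is an assumption-laden illustration rather than the exact formula of the method: the nominal budget is taken as the bit rate divided by the frame rate, multiplied by the number of frames in the GOP, and the buffer correction is passed in by the caller as a hypothetical `buffer_excess_bits` parameter (the number of bits by which the buffer sits above its target level).

```python
def gop_bit_budget(view_bitrate, frame_rate, frames_in_gop, buffer_excess_bits=0.0):
    """Bits available to one GOP under a constant bit rate.

    buffer_excess_bits is a hypothetical correction: bits by which the buffer
    occupancy exceeds its target level after the previous GOP.
    """
    return view_bitrate / frame_rate * frames_in_gop - buffer_excess_bits


def update_remaining_bits(remaining_bits, actual_frame_bits):
    """Frame-by-frame update: subtract the bits actually spent on the coded frame."""
    return remaining_bits - actual_frame_bits


# Hypothetical example: 1 Mbps, 25 frames per second, 8 frames per GOP.
budget = gop_bit_budget(1_000_000, 25, 8)
budget = update_remaining_bits(budget, 90_000)
```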

The initial QP of the first GOP is a predefined QP. As shown in Figure 2, the I0 and B1 frames in the first GOP of the left view and the P0 and B1 frames in the first GOP of the right view are encoded with this predefined QP. The QP of the first frame in every other GOP is computed from the total number of bits used in encoding the B frames in a GOP and from SumBQP, the sum of the QPs of all B frames in the previous GOP.

3.3. Frame Level Rate Control

Within each GOP, the target bits for each B frame are calculated according to the target buffer fullness level and the number of remaining bits in frame level rate control.

The actual buffer occupancy, (), is computed as where is a constant and its typical value is 0.75 [15]. is the target buffer level of the -1th frame in the th GOP. is the initial value of target buffer level. is the actual buffer occupancy after coding the th B frame in the th GOP.

Meanwhile, the number of remaining bits should also be considered when the target bits are computed, together with the number of remaining B frames in the ith GOP.

The target bits are a weighted combination of the two quantities above, with a weighting constant whose typical value is 0.5 [15].
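
A sketch of this frame level computation is shown below. It assumes the form used by JVT-G012-style rate control: one estimate driven by the buffer status (with the constant 0.75), one estimate obtained by spreading the remaining bits over the remaining B frames, and a weighted combination of the two (with the constant 0.5). All parameter names are hypothetical, which estimate the weight applies to is an assumption, and the clamp at zero is an added safeguard rather than part of the described method.

```python
def frame_target_bits(view_bitrate, frame_rate, target_buffer_level,
                      buffer_occupancy, remaining_bits, remaining_b_frames,
                      gamma=0.75, beta=0.5):
    """Target bits for the next B frame from buffer status and remaining bits."""
    if remaining_b_frames <= 0:
        raise ValueError("no B frames left in this GOP")
    # Estimate 1: average bits per frame, corrected by the buffer deviation.
    buffer_based = view_bitrate / frame_rate + gamma * (target_buffer_level - buffer_occupancy)
    # Estimate 2: remaining GOP bits shared equally among the remaining B frames.
    remaining_based = remaining_bits / remaining_b_frames
    target = beta * remaining_based + (1.0 - beta) * buffer_based
    return max(target, 0.0)
```

When the buffer sits exactly at its target level and the remaining bits match the nominal rate, both estimates agree and the target equals the average number of bits per frame.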

3.4. MB Level Rate Control

At the MB level, the target bits of the kth MB are first derived from the number of remaining bits for all noncoded MBs in the current frame, the number of MBs, and the mean absolute difference (MAD) value of the kth MB.
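
One common way to realize such a MAD-driven split, used here purely as an assumption, is to give each noncoded MB a share of the remaining bits proportional to its squared MAD, as in JVT-G012-style MB level rate control. The function and parameter names below are hypothetical.

```python
def mb_target_bits(remaining_bits, mads, k):
    """Target bits for the kth MB, given predicted MADs of all MBs in the frame.

    MBs with index >= k are treated as not yet coded. The squared-MAD weighting
    is an assumed choice, not a formula confirmed by the text.
    """
    weights = [m * m for m in mads[k:]]
    if not weights:
        raise ValueError("k is out of range")
    total = sum(weights)
    if total == 0:
        # All remaining MBs look identical: fall back to an equal split.
        return remaining_bits / len(weights)
    return remaining_bits * weights[0] / total
```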

The tolerable distortion degree of the current MB within the whole frame is measured as the ratio between the BJND value of the current MB and the average BJND value over all MBs except the left, right, upper, and lower boundary MBs; the number of MBs entering this average follows from the number of MBs in a row and in a column of each frame. Because the tolerable distortion degree fluctuates strongly, it is normalized using its minimum and maximum values and then increased by 0.5 to obtain the perception factor weighting.
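
The computation of the perception factor can be sketched as follows. The sketch assumes min-max normalization of the ratio over the interior MBs, which places the weights in the range 0.5 to 1.5. Assigning a neutral weight of 1.0 to the boundary MBs is an assumption made only so that the function returns a full grid, and all names are hypothetical.

```python
def perception_weights(mb_bjnd):
    """Perception factor per MB from a 2D grid [row][col] of positive MB BJND values."""
    rows, cols = len(mb_bjnd), len(mb_bjnd[0])
    if rows < 3 or cols < 3:
        raise ValueError("need at least one interior MB")
    interior = [(r, c) for r in range(1, rows - 1) for c in range(1, cols - 1)]
    mean_bjnd = sum(mb_bjnd[r][c] for r, c in interior) / len(interior)
    # Tolerable distortion degree: MB BJND relative to the interior average.
    ratios = {(r, c): mb_bjnd[r][c] / mean_bjnd for r, c in interior}
    lo, hi = min(ratios.values()), max(ratios.values())
    weights = [[1.0] * cols for _ in range(rows)]  # boundary MBs stay neutral (assumption)
    if hi > lo:
        for (r, c), ratio in ratios.items():
            weights[r][c] = (ratio - lo) / (hi - lo) + 0.5
    return weights
```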

The perception factor is used to adjust the target bit allocation at the MB level. A larger BJND value is taken to indicate that the human eye is more sensitive to distortion in the MB; that is, the perception factor is larger, so the MB should be allocated more bits. Conversely, an MB with a smaller BJND value should be allocated fewer bits. Based on the adjusted target bits, the quantization step size of the current MB can be computed with the quadratic R-Q model, whose two model parameters are updated using a linear regression technique.
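
As an illustration, the adjustment and the inversion of the quadratic R-Q model can be sketched as below. The sketch assumes that the perception factor simply multiplies the MAD-based target bits and that the model has the usual form, target bits = MAD × (a1 / Qstep + a2 / Qstep²), with header bits ignored. The names `a1` and `a2` are hypothetical stand-ins for the two model parameters.

```python
import math


def adjust_target_bits(target_bits, perception_weight):
    """Scale the MAD-based target bits by the perception factor (assumed to be a product)."""
    return target_bits * perception_weight


def quantization_step(target_bits, mad, a1, a2):
    """Solve target_bits = mad * (a1 / q + a2 / q**2) for the step size q > 0."""
    if target_bits <= 0 or mad <= 0:
        raise ValueError("target_bits and mad must be positive")
    r = target_bits / mad
    if a2 == 0:
        if a1 <= 0:
            raise ValueError("model parameters give no positive step size")
        return a1 / r  # the model degenerates to a first-order one
    disc = a1 * a1 + 4.0 * a2 * r
    if disc < 0:
        raise ValueError("no real solution for these parameters")
    x = (-a1 + math.sqrt(disc)) / (2.0 * a2)  # x = 1 / q
    if x <= 0:
        raise ValueError("model parameters give no positive step size")
    return 1.0 / x
```

A larger perception factor raises the target bits, which in turn yields a smaller quantization step size for that MB.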

Furthermore, the QP of the current MB is then computed from the quantization step size.
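
A sketch of this last step is given below. It assumes the standard H.264/AVC relation, in which the quantization step size doubles for every increase of 6 in QP and QP 0 corresponds to a step size of 0.625; the exact rounding used in the proposed method is not specified here, so rounding to the nearest integer is an assumption.

```python
import math


def qp_from_qstep(qstep):
    """Map a quantization step size to an H.264/AVC QP in the range 0 to 51."""
    if qstep <= 0:
        raise ValueError("qstep must be positive")
    qp = round(6.0 * math.log2(qstep / 0.625))
    return max(0, min(51, qp))
```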

4. Experimental Results and Discussions

To evaluate the performance of the proposed MB level rate control method for stereo video coding based on visual perception, six representative stereo video sequences with a spatial resolution of 1024 × 768, namely, BookArrival, AltMoabit, DoorFlowers, LeavingLaptop, Newspaper, and Kendo, are used in the experiments. BookArrival, AltMoabit, DoorFlowers, and LeavingLaptop are provided by Fraunhofer HHI [16]. Newspaper is provided by Gwangju Institute of Science and Technology [17]. The Kendo sequence was captured by Nagoya University with a moving camera array [18].

In the experiments, we use the revised MVC software JMVC7.0 to implement the rate control methods. The test conditions of the six sequences are shown in Table 1. Two middle views (view 9 and view 10) for BookArrival, DoorFlowers, and LeavingLaptop, two middle views (view 5 and view 6) for AltMoabit and Newspaper, and two middle views (view 4 and view 5) for Kendo are used as the left and right views of stereo video in Figure 2.

To keep occlusion and exposure in the left and right views of stereo video from affecting the BJND results, the left, right, upper, and lower boundary MBs in each frame of the right view sequence are not processed. The BJND results of BookArrival and AltMoabit are shown in Figures 3 and 4, respectively. Figures 3(a) and 3(b) show the original left and right view images. Figure 3(c) is the BJND map of the right view in the pixel domain, and Figure 3(d) is the BJND mask obtained from Figure 3(c) by MB processing. Figure 4 shows the corresponding results for AltMoabit.

4.1. Rate Control Accuracy in Stereo Video Coding

Under the test conditions shown in Table 1, three methods, Zheng’s in [5], SMBRC, and the proposed method, are compared. SMBRC denotes the stereo video MB level rate control method; that is, the MB level monoscopic video JVT-G012 algorithm is extended to stereo video coding and implemented in JMVC.

The detailed average control accuracies of the three methods are shown in Table 2. In Table 2, the target bitrate and the actually controlled bitrate are the average values of the two views. The rate control error ratio is used to measure the accuracy of the bitrate estimation and is defined in terms of the target bitrate and the actual coding bitrate.
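
A minimal sketch of such an error ratio is given below. It assumes the usual definition, the deviation of the actual bitrate from the target bitrate expressed as a percentage of the target; keeping the sign of the deviation is an assumption, and the example numbers are hypothetical rather than taken from Table 2.

```python
def rate_control_error(target_bitrate, actual_bitrate):
    """Signed rate control error in percent of the target bitrate."""
    if target_bitrate <= 0:
        raise ValueError("target_bitrate must be positive")
    return (actual_bitrate - target_bitrate) / target_bitrate * 100.0


# Hypothetical example: a 1000 kbps target and 1001.5 kbps actually produced.
error_percent = rate_control_error(1000.0, 1001.5)
```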

Zheng’s method is a frame level rate control method, while the proposed method and SMBRC are MB level rate control schemes, so the latter two have better control accuracy. This can be seen clearly in Table 2. Table 2 also indicates that the absolute inaccuracy of the proposed stereo video rate control method is within 0.192%. The proposed method can therefore control the bitrate more precisely in stereo video coding.

4.2. Rate Distortion Performance Comparison

Figure 5 shows the rate-distortion (R-D) performance comparison of the three methods. In Figure 5, the average bitrate is the average value of the two views, and the peak signal-to-noise ratio (PSNR) value is also the average value of the two views. Compared with Zheng’s method and SMBRC, the proposed method achieves better R-D performance under different target bitrates, since more accurate binocular perception information is used in the proposed method.

4.3. Subjective Quality Comparison

Different sequences possess different motion and scene characteristics; therefore, the video frames that best reflect the differences in subjective quality are selected in accordance with the characteristics of each sequence. For the outdoor scene AltMoabit sequence, we select the 52nd frame in view 6 as an example, shown in Figure 6. The bus and the words on the bus are the areas of human visual attention in Figure 6, and more bits are allocated to these areas under the visual perception factor based on BJND. Therefore, the subjective quality of the proposed method is better than that of the other methods.

To avoid the limitation of judging from only one video frame, we select the 43rd to 47th frames of the indoor scene LeavingLaptop sequence, shown in Figure 7. The details of the human body and head and of the book are allocated more bits, so the subjective quality of the proposed method is better than that of the other methods. Moreover, regarding quantization parameter selection, if a small QP is selected, the subjective quality of all three methods is too good to reveal the difference. Hence, QP is set to 37.

5. Conclusion

A novel macroblock (MB) level rate control method for stereo video is proposed in this paper. The proposed method is performed at four levels, namely, view level, group-of-pictures (GOP) level, frame level, and MB level. At the view level, different proportions of the bitrate are allocated to the left view and the right view of stereo video according to a prestatistical rate allocation proportion. At the GOP level, the total number of bits allocated to each GOP is computed and the initial quantization parameter of each GOP is set. At the frame level, the target bits allocated to each frame are computed. At the MB level, the binocular just-noticeable difference (BJND) model based on parallax matching is first given. Then, a visual perception factor, which is measured by the ratio between the BJND value of the current MB and the average BJND value over all MBs except the left, right, upper, and lower boundary MBs, is used to adjust the MB level bit allocation. Experimental results show that the proposed method controls the bitrate more accurately, with an average rate control error of only about 0.19%. Compared with other schemes, the proposed method achieves better rate-distortion performance and subjective quality of stereo images.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Natural Science Foundation of China under Grant nos. U1301257, 61271270, 61311140262, 61171163, and 61271021. It was also sponsored by K. C. Wong Magna Fund in Ningbo University.