Abstract

In H.264 video coding, special attention should be paid to the subjective quality of the image. This paper applies structural similarity (SSIM) based subjective evaluation to rate control in H.264 coding and proposes combining the SSIM and the mean absolute difference (MAD) to perform macroblock layer bit allocation, instead of using the MAD alone. Experimental results show that the proposed method correlates better with the human visual system and thus achieves better subjective image quality.

1. Introduction

The JVT-G012 algorithm for H.264 video coding uses a mean absolute difference (MAD) linear prediction model to solve the “chicken and egg dilemma” [1]. The bits of the basic unit layer (including the macroblock layer) are allocated evenly. JM10.1 adopts this method, and its allocation scheme distributes bits mainly according to the content complexity of the natural image. This has a disadvantage: it does not allocate bits according to the subjective characteristics of human vision, so the obtained image is not consistent with the subjective characteristics of the human eye.
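The dilemma arises because the quantization parameter is needed before rate-distortion optimization can produce the MAD of the current basic unit, while the MAD is needed to choose the quantization parameter. A minimal Python sketch of the linear prediction idea follows. The function name `predict_mad` and the default coefficients (1 and 0, taken here as typical starting values before the coefficients are refitted from coded data) are assumptions for illustration, not details given in this paper.

```python
def predict_mad(prev_mad: float, a1: float = 1.0, a2: float = 0.0) -> float:
    """Predict the MAD of the current basic unit from the actual MAD of the
    co-located basic unit in the previous frame with a linear model.

    a1 and a2 are the model coefficients; the defaults are assumed
    starting values, not values reported in the paper.
    """
    return a1 * prev_mad + a2
```

With the default coefficients the prediction simply reuses the previous MAD; an encoder would refit `a1` and `a2` as more basic units are coded.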

Several researchers have proposed improvements to overcome these shortcomings. For example, Lu put forward the motion extrapolated coding complexity to allocate bits between the region of interest (ROI) and other regions, which reduced the distortion in the ROI by using rate-distortion optimized macroblock level bit allocation [2]. Han et al. used a content complexity factor to allocate bits more accurately and a quantization parameter adaptation factor to adjust the current quantization parameter according to the history information of the encoded frames [3]. Chen and Lu put forward a frame complexity factor to optimize the frame layer target bit allocation [4]. Liu et al. combined inter-frame subtraction with background subtraction to detect the target region and proposed an optimized rate control scheme to keep the target region at high definition under a low bit rate [5]. However, none of these methods allocates the bits of the macroblock according to a human visual system (HVS) model.

Wang et al. proposed the SSIM (structural similarity) as a new indicator to measure the similarity of two images and thereby evaluate image quality [6]. Because the human visual system readily extracts structural information from an image, the structural similarity of two images can be used as a standard to evaluate image quality. The larger the value is, the better the quality is. The maximum SSIM value is 1. The SSIM is better than the MSE (mean squared error) and the PSNR (peak signal-to-noise ratio) for evaluating image similarity.
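For comparison, the MSE and the PSNR are purely pixel-wise measures. The following Python sketch shows how they are commonly computed for 8-bit samples. The function names and the flat-sequence input format are assumptions made for illustration.

```python
import math
from typing import Sequence


def mse(ref: Sequence[float], dist: Sequence[float]) -> float:
    """Mean squared error between two equally sized, non-empty sample lists."""
    if len(ref) != len(dist) or len(ref) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)


def psnr(ref: Sequence[float], dist: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = mse(ref, dist)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)
```

Both measures treat every pixel error independently, which is why they can disagree with perceived quality when the errors disturb image structure.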

Cui et al. put forward an empirical linear SSIM distortion model and combined it with an improved quadratic rate-quantization model, using the Lagrange multiplier method to obtain a closed-form solution for the SSIM-optimal MB layer quantization [7]. Cui et al. also used the SSIM to guide the RDO frame mode selection [8]. Wang et al. put forward a video quality evaluation method based on regions of interest, which extracts the region of interest of each frame and uses it to weight the SSIM calculated for the blocks in the frame [9]. Yang et al. introduced the SSIM as the distortion measure for H.264 interframe prediction [10]. Yang et al. found an approximate relationship between the SSIM and the rate and integrated it with the SSE as the distortion measure of the RDO function, in combination with human visual characteristics, so that the RDO distortion measure combines the SSE with the SSIM [11]. Cui and Zhu applied SSIM based subjective distortion to the RDO-based intramode decision in H.264 I frame coding and further proposed a frame layer adaptive Lagrange multiplier adjustment scheme to get a better tradeoff between the rate and the SSIM distortion [12]. Wu et al. used the relationship between the reconstructed macroblock and the best prediction macroblock from mode selection [13]. Cui and Zhu put forward an adaptive frame skipping scheme in which the skipped frame is recovered at the decoder side from the reference frame [14].

The above methods did not use the SSIM to allocate the bits of the macroblock layer. This paper focuses on how to use the SSIM theory to allocate the bits of the macroblock layer in order to improve the subjective video quality.

2. Improved Rate Control in MB

The pixels of an image are correlated, and this correlation carries the structural information of the image. When observing an image, the human eye is more sensitive to changes in the structural information than to changes in individual pixel values. In the SSIM theory, the SSIM index is defined from the view of image composition and reflects the structural properties of the objects in the scene. It is independent of the brightness and contrast of the image. The distortion is modeled as the combination of three different factors: brightness, contrast, and structure. The SSIM between the original image $x$ and the distorted image $y$ is defined as [6]

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$

where $\mu_x$ and $\mu_y$ are the mean brightness of the images $x$ and $y$, $\sigma_x$ and $\sigma_y$ are the standard deviations of the images $x$ and $y$, $\sigma_{xy}$ is the covariance between them, and $C_1$ and $C_2$ are small constants that keep the denominator from being zero. To calculate the SSIM of a frame, the general approach is to divide the image into fixed-size MBs and then calculate the SSIM of each MB. The SSIM is bounded above by 1. The closer the value is to 1, the higher the similarity between the distorted image and the original image and the better the subjective quality of the distorted image. Therefore, the distortion of an MB in the P frame can be defined from its SSIM value.
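As an illustration of the SSIM definition from [6], the following Python sketch computes the index for one pair of blocks from whole-block statistics. This is a simplification: [6] evaluates the index over local sliding windows, and the constants below use the commonly quoted settings $K_1 = 0.01$, $K_2 = 0.03$, and a dynamic range of 255 for 8-bit samples. The function name `block_ssim` and the flat-list input format are assumptions.

```python
from typing import Sequence

# Stabilizing constants for 8-bit samples (assumed: K1 = 0.01, K2 = 0.03, L = 255).
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2


def block_ssim(x: Sequence[float], y: Sequence[float]) -> float:
    """SSIM of two equally sized blocks given as flat sample lists,
    computed from whole-block mean, variance, and covariance."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("blocks must be non-empty and of equal size")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    denominator = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return numerator / denominator
```

Identical blocks give a value of 1 (up to rounding), and any mismatch in mean, contrast, or structure lowers the value.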

The MBs at the same position in adjacent frames have strong temporal correlation and thus have similar SSIM distortion characteristics. Therefore, the SSIM of the current MB can be estimated from the actual SSIM of the co-located MB in the previous frame.

In the JVT-G012 algorithm for H.264 video coding, the bits of the MBs are allocated evenly, so the obtained image is not consistent with the subjective characteristics of the human eye. This paper applies SSIM based subjective evaluation to rate control in H.264 coding and proposes combining the SSIM and the MAD to perform MB layer bit allocation, instead of using the MAD alone.

The implementation of rate control for the MB layer mainly includes bit allocation, calculation of the quantization parameter, and buffer control. First, we allocate the target bits according to the remaining bits, the SSIM, and the MAD. Then, we calculate the quantization parameter and adjust it. Last, we perform rate-distortion optimization (RDO) and update the model parameters.

First, the target bits of the current MB are allocated from the remaining bits of the frame. The allocation depends on the serial number of the MB in the frame, the total number of MBs in the frame, and a loop variable. It combines the SSIM and the MAD of each MB through two weighting coefficients. The two weighting coefficients are set to 0.4 and 0.6 on the basis of extensive experiments.
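The following Python sketch shows one hypothetical way such an allocation could be written. It is not the paper's formula. It assumes that the SSIM distortion of an MB is taken as 1 minus its SSIM, that each MB receives a share of the remaining bits proportional to its distortion and to its MAD among the MBs not yet coded, and that the weight 0.4 applies to the SSIM term and 0.6 to the MAD term. The function name and argument layout are also assumptions.

```python
from typing import Sequence


def allocate_mb_bits(
    i: int,
    remaining_bits: float,
    ssim: Sequence[float],
    mad: Sequence[float],
    w_ssim: float = 0.4,  # assumed to weight the SSIM term
    w_mad: float = 0.6,   # assumed to weight the MAD term
) -> float:
    """Hypothetical target bits for MB i (0-based) out of the bits that
    remain for MBs i..N-1 of the frame."""
    if len(ssim) != len(mad) or not 0 <= i < len(ssim):
        raise ValueError("index out of range or mismatched inputs")
    # Assumed distortion measure: 1 - SSIM for each MB not yet coded.
    dist = [1.0 - s for s in ssim[i:]]
    mads = list(mad[i:])
    n_left = len(dist)
    dist_sum = sum(dist)
    mad_sum = sum(mads)
    # Fall back to an equal share when a term carries no information.
    ssim_share = dist[0] / dist_sum if dist_sum > 0 else 1.0 / n_left
    mad_share = mads[0] / mad_sum if mad_sum > 0 else 1.0 / n_left
    return remaining_bits * (w_ssim * ssim_share + w_mad * mad_share)
```

Because the two weights sum to 1, the last MB of the frame receives all of the remaining bits under this form. In practice the per-MB SSIM and MAD values would be predictions taken from the previous frame.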

Then, the quantization parameter of the current MB is computed by using the quadratic R-D model. However, the inaccuracy of the model can produce an unexpected number of coded bits. Therefore, we adjust the current quantization parameter by using the coding information of the previous MB, which achieves more accurate rate control and keeps the image quality of adjacent MBs from jumping too much. Three cases are distinguished. If the current macroblock is the first basic unit in the current frame, its quantization parameter is set to the average quantization parameter of all the basic units in the previous frame. If the number of remaining bits of the current frame is less than zero, the quantization parameter should be greater than that of the previous basic unit so that the sum of generated bits stays close to the target bits of the current frame; the quantization parameter is therefore increased relative to that of the previous macroblock. Otherwise, the quadratic model is used to compute the quantization parameter.
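The three cases can be sketched in Python as follows. The step size `delta`, the limiting of the model output to within `delta` of the previous quantization parameter, and the function and argument names are assumptions for illustration; the paper's exact expressions are not reproduced here. The final clamp uses the H.264 quantization parameter range of 0 to 51.

```python
def select_mb_qp(
    is_first_unit: bool,
    remaining_bits: float,
    prev_frame_avg_qp: float,
    prev_qp: int,
    model_qp: float,
    delta: int = 1,  # assumed maximum QP change between adjacent MBs
) -> int:
    """Hypothetical QP decision for the current MB.

    model_qp is the value suggested by the quadratic R-D model,
    computed elsewhere.
    """
    if is_first_unit:
        # First basic unit: reuse the average QP of the previous frame.
        qp = prev_frame_avg_qp
    elif remaining_bits < 0:
        # Bit budget exhausted: quantize more coarsely than the previous MB.
        qp = prev_qp + delta
    else:
        # Use the model output, limited to a small change from the previous MB.
        qp = max(prev_qp - delta, min(model_qp, prev_qp + delta))
    # Keep the result inside the valid H.264 QP range.
    return max(0, min(51, int(round(qp))))
```

Limiting the change between adjacent MBs is one simple way to avoid visible quality jumps inside a frame.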

After this adjustment, the algorithm fully takes the information of the coded frames into account. Finally, we perform RDO and update the model parameters. In this paper, the SSIM is incorporated into the RDO framework as a quality metric [15] for all MBs in the current basic unit. For convenience, the method in [15] is referred to as the PRDO method.

3. Experimental Results and Discussions

The proposed algorithm is implemented by modifying the JM10.1 test model. The main experimental parameters are shown in Table 1. Because the purpose of the proposed algorithm is to improve the subjective video quality, and because the bit allocation method in JM10.1 is better than that of the original JVT-G012 algorithm, the original JM10.1 rate control algorithm is selected as the reference for comparison. The proposed algorithm is also compared with the PRDO algorithm.

3.1. Subjective Visual Quality

Figures 1 and 2 show subjective results for the carphone and foreman sequences coded with the proposed, JM10.1, and PRDO algorithms at a bit rate of 64 kbps. Judged visually, the proposed algorithm preserves the structural information of the image better than the JM10.1 algorithm, as shown in the figures (top: original image; middle: proposed algorithm; bottom: JM10.1).

3.2. SSIM

Figures 3 and 4 show the SSIM comparison from the first to the one hundredth frame of the carphone and foreman sequences for the proposed, JM10.1, and PRDO algorithms. As can be seen from the figures, the proposed algorithm improves the SSIM of the video sequence. The SSIM obtained with the proposed algorithm is higher than that of the other two algorithms, which indicates better subjective video quality.

Tables 2, 3, and 4 show the average SSIM of the proposed, JM10.1, and PRDO algorithms for the carphone, crew, highway, claire, and foreman sequences at bit rates of 64, 128, 256, and 512 kbps when the initial quantization parameter is set to 23, 28, and 33, respectively. According to the tables, the proposed algorithm obtains a higher average SSIM than the other algorithms for all video sequences at all bit rates. The proposed algorithm can therefore achieve better subjective video quality than the other algorithms.

3.3. PSNR

Figures 5 and 6 show the PSNR comparison from the first to the one hundredth frame of the carphone and foreman sequences for the proposed, JM10.1, and PRDO algorithms. As can be seen from the figures, the PSNR differences among the three algorithms are very small. The proposed algorithm achieves a slightly higher PSNR than the other two algorithms.

3.4. Bit Rate

Tables 5, 6, and 7 show the bit rates of the proposed, JM10.1, and PRDO algorithms for the carphone, crew, highway, claire, and foreman sequences at the target bit rates of 64, 128, 256, and 512 kbps when the initial quantization parameter is set to 23, 28, and 33, respectively. As summarized in the tables, the proposed algorithm can control the bit rates more accurately than the other two algorithms.

The experimental results show that the proposed algorithm makes the macroblock layer bit allocation more reasonable and improves the subjective image quality, because it allocates the macroblock layer bits from two perspectives: the structural similarity of the image and the closeness of the content. It also improves the continuity of the video sequences. Compared with JM10.1 and PRDO, the visual quality of the video produced by the proposed algorithm is also better.

4. Conclusions

This paper presents a rate control algorithm for H.264 video coding that uses the SSIM and the MAD to allocate the bits of the macroblock and then takes measures to adjust the quantization parameter. The experimental results show that the algorithm improves the SSIM of the video sequence while maintaining the PSNR, and it can therefore improve the subjective quality of the video sequence.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Qing Lan Project and the Priority Academic Program Development of Jiangsu Higher Education Institutions.