Adaptive Rate Control Algorithm for H.264/AVC Considering Scene Change
Scene changes in H.264 video sequences have a significant impact on video communication quality. This paper presents a novel adaptive rate control algorithm for H.264/AVC that is based on a scene change expression and requires little additional computation. From the frame complexity quotiety, we define a scene change factor, which is used to allocate bits to each frame adaptively. Experimental results show that the algorithm handles scene changes effectively. Compared with the JVT-G012 algorithm, it reduces the rate error and improves the average peak signal-to-noise ratio with a smaller deviation. It not only controls the bit rate accurately but also achieves better video quality with lower encoder buffer fullness, which improves the quality of service.
1. Introduction
Multimedia applications are becoming popular on the Internet. Quality of service (QoS), in terms of end-to-end delay guarantees for real-time applications, is especially important for the new generation of Internet applications such as video on demand and other consumer services [1–7]. Elements of network performance within the scope of QoS often include availability, bandwidth, delay, and error rate.
QoS involves the prioritization of network traffic and can be targeted at the network interface or at specific applications. To adapt the video stream to the delay and to network resources such as bandwidth and buffer space, especially over low-bandwidth or time-varying wireless channels, two technologies have been developed: traffic shaping and rate control. Traffic shaping is a transport-layer method for improving QoS. There are two categories of approaches for guaranteeing end-to-end performance: bounded modeling of frames and stochastic modeling of frames [8–11]. Bounded modeling approaches are founded on network calculus [11–17], whereas for stochastic modeling the fractal time series is essential [17–22]. Li et al. provide a Hölder exponent to describe the fractal time series and use it to investigate the scaling phenomena of traffic data. Rate control, by contrast, is a compression-layer method: it compresses the original video sequences according to the needs of the application and the available bandwidth. Emerging applications such as Internet streaming video, wireless channel transmission, and MPEG-4 object-based coding require highly efficient rate control algorithms to meet the needs of real-time video transmission.
As a new international video compression standard for IP and wireless communication, H.264 not only absorbs the advantages of previous coding schemes but also incorporates current advanced coding techniques. Since its formal promulgation in 2003, H.264 has attracted wide interest in both industry and academia.
Rate control is an essential part of H.264. Its main task is to allocate a certain number of bits to each frame so that the output rate adapts to the current bandwidth while image distortion is minimized. It can effectively avoid the bit errors and packet loss caused by excessive congestion in real-time data transmission. In the rate control algorithms for H.264/AVC, however, the quantization parameter is used in both rate control and rate distortion optimization (RDO), which leads to a chicken-and-egg dilemma. Many researchers have studied this dilemma [25–28]. One approach solves it by enhancing the ρ-domain model, and another proposes a relational model between rate and quantization. The JVT-G012 algorithm uses a linear model for mean absolute difference (MAD) prediction to solve the dilemma. This method solves the dilemma well and yields good coding results, and it performs well when the video sequence contains no scene change. When a scene change occurs, however, the video quality declines seriously, for two main reasons. First, the algorithm uses a group-of-pictures (GOP) structure of fixed length, which cannot effectively detect scene changes in the video sequence. Second, it relies mainly on the linear model to determine the allocation of coding bits and the quantization parameter. When a scene change happens, the predicted MAD deviates substantially from the actual value, which causes a serious decline in coding quality after the scene change.
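The failure of a linear MAD prediction at a scene change can be illustrated with a minimal sketch. It assumes the commonly cited form MAD_pred = a1 * MAD_prev + a2 with initial coefficients a1 = 1 and a2 = 0; the function name and the sample MAD values are hypothetical and chosen only for illustration.

```python
def predict_mad(prev_mad, a1=1.0, a2=0.0):
    """Linear MAD prediction: estimate the current frame's MAD from the
    actual MAD of the previous frame (assumed form; a1 and a2 are the
    model coefficients)."""
    return a1 * prev_mad + a2


# Hypothetical per-frame MAD values; the jump at index 4 mimics a scene change.
actual_mad = [2.0, 2.1, 2.0, 2.2, 9.5, 9.3]

# Absolute prediction error for every frame that has a predecessor.
errors = [
    abs(predict_mad(actual_mad[k - 1]) - actual_mad[k])
    for k in range(1, len(actual_mad))
]

# Index (into actual_mad) of the frame with the largest prediction error.
worst_frame = errors.index(max(errors)) + 1
print(worst_frame, round(max(errors), 2))
```

In this toy sequence the largest error falls on the frame where the MAD jumps, because the model only sees the MAD of the previous frame, which belongs to the old scene.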
A variety of rate control algorithms address the problem of scene change, including the gray value detection algorithm, the intramode macroblock statistics algorithm, the motion-search detection-based algorithm, and the edge detection algorithm. These methods differ in two main respects: how a scene change is detected and how it is handled. The edge detection algorithm performs well, but it relies on computer image recognition technology, so it is very complicated, which greatly limits its application. The intramode macroblock statistics algorithm and the motion-search detection-based algorithm need to encode the current frame a second time. The gray value detection algorithm is based on the absolute difference of gray values and reflects the degree of scene change better. However, the absolute gray difference becomes very large under global motion or when the correlation within the image content is weak, so in these cases it does not reflect the true complexity.
Therefore, this paper proposes a novel adaptive rate control algorithm for H.264/AVC that is based on a scene change expression and requires little additional computation. We use a scene change factor to adjust the bit allocation adaptively for every frame in the video sequence. Experimental results show that our method effectively improves video quality when scene changes occur.
2. Effect of Scene Change on Coding Performance
When a scene change occurs in a video sequence, the temporal correlation between images disappears or diminishes, which has a great impact on interframe prediction. The encoding quality is affected, and the impact depends on the position of the scene change frame in the GOP. There are three types of frames: I, P, and B frames. An I frame uses intraframe coding, a P frame uses one-way predictive interframe coding, and a B frame uses bidirectional predictive interframe coding. The following analyzes the impact of a scene change on coding performance when it occurs at each of the three frame types.
When a scene change occurs at an I frame, the subsequent P or B frames can be encoded normally because the I frame uses intraframe coding without reference to other frames. A scene change at an I frame therefore has essentially no impact on coding performance. When a scene change occurs at a B frame, this B frame and the first subsequent P or I frame belong to the same scene, because the B frame is bidirectionally predicted. An I frame is intracoded and always has a good coding result, and the first P frame after the scene change is also coded well, so the current B frame obtains a good prediction and no special processing is needed. When a scene change occurs at a P frame, the situation is different. A P frame is forward predicted from the previous P or I frame, and it may also serve as the prediction reference for the next P or B frame in the current GOP. A scene change at a P frame therefore has a great impact on the image quality of the current P frame and of subsequent frames. Since the current P frame and its reference frame are in different scenes, interprediction is completely ineffective. Each macroblock must still perform RDO before the intracoding mode is selected, and this optimization process consumes a large amount of coding time. In addition, most macroblocks use the intracoding mode, which consumes a large number of bits, so the buffer fullness increases dramatically. As a result, the bits allocated to the following frames and their image quality decrease significantly. This impact is likely to persist across all the following frames in the GOP.
From the above analysis, a scene change at an I frame or a B frame has little effect on coding performance, whereas a scene change at a P frame has a great impact. Therefore, only scene changes at P frames need to be considered.
3. Proposed Adaptive Rate Control Algorithm
Scene adaptive methods do not need to detect scene changes in the video sequence explicitly. Instead, they consider the relative change between adjacent frames. It is not necessary to change the GOP structure or to decide whether a scene change has occurred, which avoids both missed detections and false detections.
The JVT-G012 rate control algorithm allocates the remaining bits evenly among the unencoded frames. When a scene change happens, the predicted MAD deviates substantially, which leads to a serious decline in coding quality after the scene change. We therefore propose a scene adaptive method to overcome these shortcomings. The implementation of rate control mainly includes bit allocation, calculation of the quantization parameter, and buffer control. Our adaptive rate control algorithm consists of three steps. First, we allocate the target bits at the frame layer according to the remaining bits and the scene change factor. Then we calculate the quantization parameter. Finally, we perform RDO and update the model parameters.
3.1. Scene Change Factor
Most researchers now use the MAD ratio as the frame complexity measure. However, the MAD ratio is the ratio of the predicted MAD of the current frame to the average MAD of all previously encoded P frames in the GOP, so when the current frame contains a scene change, the predicted MAD of the current frame fails. Our frame complexity measure is therefore the frame complexity quotiety [34, 35], which combines two terms through a weighting coefficient: the ratio of the predicted MAD of the jth frame in the ith GOP to the average MAD of all encoded P frames in the ith GOP, where j is the frame number, and the average difference of the gray histogram of the jth frame in the ith GOP. From the frame complexity quotiety, we define a scene change factor, which has its own weighting coefficient.
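The two ingredients of the frame complexity measure can be sketched as follows. This is a minimal illustration under stated assumptions, not the formula of the algorithm: the weighted sum `mu * ratio + (1 - mu) * hist_diff`, the value `mu = 0.5`, the normalization of the histogram difference by the pixel count, and all function names are hypothetical.

```python
def mad_ratio(predicted_mad, encoded_p_mads):
    """Ratio of the current frame's predicted MAD to the average MAD of
    the P frames already encoded in the GOP."""
    average = sum(encoded_p_mads) / len(encoded_p_mads)
    return predicted_mad / average


def gray_histogram(frame, levels=256):
    """Count how many pixels of a frame (a flat list of gray values)
    fall on each gray level."""
    hist = [0] * levels
    for pixel in frame:
        hist[pixel] += 1
    return hist


def average_histogram_difference(frame, previous_frame, levels=256):
    """Sum of absolute gray-histogram differences between two frames,
    divided by the pixel count (this normalization is an assumption)."""
    h_cur = gray_histogram(frame, levels)
    h_prev = gray_histogram(previous_frame, levels)
    return sum(abs(a - b) for a, b in zip(h_cur, h_prev)) / len(frame)


def complexity_measure(ratio, hist_diff, mu=0.5):
    """Weighted combination of the two terms (assumed form and weight)."""
    return mu * ratio + (1.0 - mu) * hist_diff


# Tiny 4-pixel "frames" with hypothetical gray values.
previous = [10, 10, 20, 20]
same_scene = [10, 10, 20, 21]
new_scene = [200, 210, 220, 230]

print(average_histogram_difference(same_scene, previous))
print(average_histogram_difference(new_scene, previous))
```

A histogram difference is less sensitive to global motion than a pixel-wise absolute difference, because content that merely moves keeps roughly the same gray-level distribution.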
As Figure 1 shows, the scene change factor reflects the complexity of the image. When a scene change occurs, the factor increases sharply and exceeds 2. In the figure, the values for the football sequence are significantly greater than those for the suzie sequence, which indicates that the combined frame complexity measure we propose is reasonable. A factor greater than 2 therefore indicates a scene change, and the target bits can be allocated more effectively according to the factor.
3.2. Bit Allocation
The target bits allocated to the jth frame in the ith GOP are determined by the residual bits, the predefined frame rate, the target buffer level, the occupancy of the virtual buffer, and the available channel bandwidth for the sequence. Two constants weight the terms: the first is 0.5 when there is no B frame and 0.9 otherwise, and the second is 0.75 when there is no B frame and 0.25 otherwise. One term of the allocation is given by a separate formula that involves the residual bits of all unencoded frames in the ith GOP and the number of unencoded P frames.
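For orientation, the frame-layer target of the JVT-G012 baseline, whose constants match the values quoted above, can be sketched as below. The parameter names are assumptions, the sketch covers only the case without B frames, and it omits the scene change adjustment that the proposed algorithm applies.

```python
def baseline_target_bits(remaining_bits, unencoded_p_frames, bandwidth,
                         frame_rate, target_buffer_level, buffer_occupancy,
                         beta=0.5, gamma=0.75):
    """JVT-G012-style frame-layer target for a P frame without B frames.

    beta = 0.5 and gamma = 0.75 are the no-B-frame constants quoted in
    the text; every parameter name here is hypothetical.
    """
    # Even share of the remaining GOP budget for this frame.
    from_remaining = remaining_bits / unencoded_p_frames
    # Channel bits per frame interval, corrected toward the target buffer level.
    from_buffer = bandwidth / frame_rate + gamma * (
        target_buffer_level - buffer_occupancy)
    # Weighted combination of the two estimates.
    return beta * from_remaining + (1.0 - beta) * from_buffer


# Hypothetical numbers: 64 kbit/s channel at 15 f/s with 50 P frames left.
bits = baseline_target_bits(
    remaining_bits=200_000, unencoded_p_frames=50, bandwidth=64_000,
    frame_rate=15, target_buffer_level=8_000, buffer_occupancy=10_000)
print(round(bits))
```

When the buffer occupancy is above the target level, as in this example, the buffer term pulls the target below the even share of the remaining bits.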
When the scene change factor is much larger than 2, the scene has changed greatly: macroblocks use the intraframe coding mode, and a large number of bits are allocated to this frame. When the factor is between 0.9 and 2, the scene changes little. To compensate for the bits consumed by the scene change frame, the bits allocated to these frames are reduced slightly. When the factor is between 0.5 and 0.9, the scene changes very little, and the bits allocated to these frames are reduced significantly. When the factor is very small, the frame usually follows a large scene change frame. Because that preceding frame used many bits, the algorithm already assigns few bits to this frame, so no adjustment is made.
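The four regimes described above can be summarized in a small sketch. It returns only a qualitative label, because the scaling applied in each regime is not stated here; the function name and the handling of the exact boundary values 2, 0.9, and 0.5 are assumptions.

```python
def allocation_regime(scene_change_factor):
    """Map a scene change factor to the qualitative bit adjustment
    described in the text (boundary handling is an assumption)."""
    if scene_change_factor > 2.0:
        # Scene change: mostly intra-coded macroblocks, many bits needed.
        return "increase"
    if scene_change_factor >= 0.9:
        # Little scene variation: give back some bits.
        return "reduce slightly"
    if scene_change_factor >= 0.5:
        # Very little scene variation: give back more bits.
        return "reduce significantly"
    # Typically the frame right after a large scene change frame.
    return "no adjustment"


for factor in (3.1, 1.2, 0.7, 0.3):
    print(factor, allocation_regime(factor))
```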
4. Experimental Results
We implemented the proposed rate control algorithm by modifying the JM8.6 test model software. The JVT-G012 rate control method, as implemented in the JM8.6 reference software, is used as the reference for comparison on different test sequences under various target bit rates. The combined test sequences are in QCIF 4:2:0 format: suzie-football, foreman-mobile, bus-coastguard-news, foreman-coastguard-news, and football-foreman-mobile-suzie. In the experiments, the frame rate is set to 15 frames per second (f/s), the target bit rate varies, the total number of frames is set to 100 or 150, the initial quantization parameter is set to 28, and the GOP length is set to 100.
The experimental results are shown in Table 1. The proposed rate control algorithm outperforms JVT-G012 under various target bit rates for the different video sequences and controls the bit rate accurately in each case. The average error of the actual bit rate is reduced compared with that of the JVT-G012 algorithm. The more accurate bit rate helps avoid the bit errors and packet loss (skipped frames) caused by excessive congestion in real-time video transmission.
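A rate error of this kind is commonly expressed as the relative deviation of the achieved bit rate from the target. The sketch below assumes that definition; the function name and the sample rates are hypothetical and are not taken from Table 1.

```python
def rate_error_percent(actual_kbps, target_kbps):
    """Relative deviation of the achieved bit rate from the target, in percent
    (assumed definition of the rate error)."""
    return abs(actual_kbps - target_kbps) / target_kbps * 100.0


# Hypothetical example: a 64 kbit/s target that is overshot by 0.8 kbit/s.
print(round(rate_error_percent(64.8, 64.0), 2))
```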
The proposed algorithm also improves the average peak signal-to-noise ratio (PSNR) and the PSNR deviation for the sequences. Table 1 shows that our method achieves an average PSNR gain of about 0.22 dB with similar or lower PSNR deviation compared with the JVT-G012 algorithm. The largest decrease in PSNR deviation is 26.44% relative to the original algorithm, and on average the PSNR deviation is 2.84% lower than that of the JVT-G012 algorithm. This shows that the proposed algorithm can smooth the PSNR fluctuation between frames to some extent.
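The two quality figures can be computed as in the following sketch. The PSNR formula for 8-bit samples is standard; treating the PSNR deviation as the population standard deviation of the per-frame PSNR is an assumption, and the function names and sample values are hypothetical.

```python
import math
import statistics


def psnr(original, decoded, peak=255):
    """PSNR in dB between two frames given as flat lists of 8-bit samples."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_summary(per_frame_psnr):
    """Average PSNR and its standard deviation across frames
    (standard deviation as the deviation measure is an assumption)."""
    return statistics.mean(per_frame_psnr), statistics.pstdev(per_frame_psnr)


# Hypothetical frame pair with a mean squared error of exactly 1.
frame = [100, 110, 120, 130]
coded = [101, 109, 121, 129]
print(round(psnr(frame, coded), 2))

# Hypothetical per-frame PSNR values.
print(psnr_summary([30.0, 32.0, 34.0]))
```

A lower deviation at a similar average means the quality varies less from frame to frame, which is the smoothing effect the results describe.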
Table 2 compares the average PSNR after the scene change for some test sequences at different bit rates. Although the PSNR declines after a scene change, the decline is smaller with the proposed algorithm than with the JVT-G012 algorithm.
Although a drop in video quality caused by a scene change is inevitable, a slight decrease over most frames can replace a serious decline in a few frames, which smooths the PSNR and thereby improves the video quality. Figures 2–4 show the frame-by-frame PSNR of the proposed algorithm and the JVT-G012 algorithm for the sequences suzie-football, foreman-mobile, and football-foreman-mobile-suzie. For example, in Figure 3, which compares the PSNR for the foreman-mobile sequence, the JVT-G012 algorithm shows a dramatic decline in PSNR from the 91st to the 100th frame. With the proposed algorithm this decline is reduced, at the cost of a small decrease in PSNR from the 53rd to the 89th frame, which compensates for the decline from the 91st to the 100th frame.
We also compare the buffer fullness for the sequence in Figure 5. The proposed algorithm shows less fluctuation in buffer fullness than the JVT-G012 algorithm, especially for frames with rich details. The steadier buffer fullness avoids potential overflow, which implies an improved quality of service.
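The link between per-frame bit allocation and buffer fullness can be illustrated with a simple fluid-flow sketch of a virtual buffer. The update rule (add the coded bits, drain bandwidth / frame rate bits per frame interval, clamp to the buffer size) is a common model and is an assumption here, as are the function name and all bit counts.

```python
def simulate_buffer(frame_bits, bandwidth, frame_rate, buffer_size):
    """Track virtual buffer occupancy after each encoded frame.

    Each frame adds its coded bits and the channel drains
    bandwidth / frame_rate bits per frame interval; the occupancy is
    clamped to the range [0, buffer_size].
    """
    drain = bandwidth / frame_rate
    occupancy = 0.0
    history = []
    for bits in frame_bits:
        occupancy = min(max(0.0, occupancy + bits - drain), buffer_size)
        history.append(occupancy)
    return history


# Hypothetical allocations with the same total; the third frame is a scene change.
spiky = [4000, 4000, 16000, 1000, 1000, 1000]
smooth = [4000, 4000, 9000, 3000, 3500, 3500]

print(max(simulate_buffer(spiky, 60_000, 15, 20_000)))
print(max(simulate_buffer(smooth, 60_000, 15, 20_000)))
```

With the same total bit budget, the smoother allocation reaches a lower peak occupancy, which leaves more headroom before overflow.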
5. Conclusion
In this paper, we propose an adaptive frame-layer rate control algorithm for H.264/AVC that uses a scene change factor to allocate bits at the frame layer. The experimental results show that the algorithm achieves accurate rate control and better visual quality. The average PSNR is improved by about 0.22 dB, and the image quality is smoother from frame to frame. In addition, the much steadier encoder buffer fullness avoids potential overflow. The algorithm can therefore improve the QoS of real-time Internet video transmission under the H.264 standard.
This work was supported by the Qing Lan Project and the National Natural Science Foundation of China (10904073).
H. Michiel and K. Laevens, “Teletraffic engineering in a broad-band era,” Proceedings of the IEEE, vol. 85, no. 12, pp. 2007–2032, 1997.
Y.-M. Jiang and Y. Liu, Stochastic Network Calculus, Springer, 2008.
M. Li and W. Zhao, Analysis of Min-Plus Algebra, Nova Science, 2011.
S. Ma, W. Gao, F. Wu, and Y. Lu, “Rate control for JVT video coding scheme with HRD considerations,” in Proceedings of the International Conference on Image Processing (ICIP '03), vol. 3, pp. 793–796, September 2003.
M. Jiang, X. Yi, and N. Ling, “Improved frame-layer rate control for H.264 using MAD ratio,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '04), vol. 3, pp. III813–III816, May 2004.
Z. Li, F. Pan, and K. Lim, “Adaptive basic unit layer rate control for JVT,” in Proceedings of the 7th JVT Meeting, Pattaya II, Thailand, March 2003.
W. A. C. Fernando, C. N. Canagarajah, and D. R. Bull, “Fade-in and fade-out detection in video sequences using histograms,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '00), vol. 4, pp. 709–712, May 2000.
R. Lienhart, “Reliable transition detection in videos: a survey and practitioner’s guide,” International Journal of Image and Graphics, vol. 1, pp. 469–486, 2001.
M. Sharifi, M. Fathy, and M. T. Mahmoudi, “A classified and comparative study of edge detection algorithms,” in Proceedings of the IEEE International Conference on Information Technology: Coding and Computing (ITCC '02), pp. 117–120, 2002.