Abstract

In the JCTVC-K0103 rate control algorithm of the ITU-T H.265/High Efficiency Video Coding (HEVC) standard, the target bit allocation strategy at the CTU layer deviates from the human subjective observation mechanism, and the update phase of the parametric model has high computational complexity. To address these problems, an optimized rate control (ORC) algorithm for ITU-T H.265/HEVC based on the region of interest (ROI) is proposed. First, the algorithm extracts the region of interest of video frames in the spatial and temporal domains using an improved Itti model. Then, the target bit weights are recalculated in the space-time domain to improve the rate control accuracy, and the target bits are redistributed over the ROI by an adaptive weight algorithm so that the output videos are more consistent with the human visual attention mechanism. Finally, the quasi-Newton method is used to update the rate distortion model, which reduces the computational complexity of the parametric model update phase. The experimental results show that, compared with the other two algorithms, the ORC algorithm obtains better subjective quality in the compressed results with a smaller bit rate error, and its rate distortion performance is also better while the rate control performance is guaranteed.

1. Introduction

With the improvement of videos in clarity and quality, traditional video coding standards can no longer meet the coding requirements of high-resolution videos. To keep up with the development trend of high-resolution videos and the corresponding requirements on coding technology, the Joint Collaborative Team on Video Coding (JCT-VC) was formed in 2010 to develop a new video coding standard as the successor of ITU-T H.264/AVC, named the ITU-T H.265/High Efficiency Video Coding (HEVC) standard [1]. Compared with ITU-T H.264/AVC, ITU-T H.265/HEVC can reduce the bit rate by more than 50%. In the transmission of high-resolution videos, higher requirements are placed on the network bandwidth: the compressed stream must not overflow the buffer of a bandwidth-limited channel, and a reasonable bit allocation strategy must generate videos with the minimum loss of rate distortion performance. JCT-VC has proposed two representative solutions to control the rate of ITU-T H.265/HEVC: the JCTVC-H0213 rate control algorithm based on the URQ model and the JCTVC-K0103 rate control algorithm based on the R-λ model [2]. The JCTVC-K0103 rate control algorithm has a better rate control effect and less bit fluctuation than the JCTVC-H0213 rate control algorithm. However, the JCTVC-K0103 rate control algorithm does not consider the characteristics of the video content in the process of bit allocation at the CTU layer, which leads to inaccurate bit allocation and affects both the quality of the output videos and the accuracy of the rate control algorithm. At the same time, the gradient descent algorithm used in the update phase of the rate distortion model has high computational complexity, which increases the complexity of the rate control algorithm. Therefore, optimizing the JCTVC-K0103 rate control algorithm to improve the compression performance while guaranteeing the rate control performance has become a research hotspot in video coding.

To address the shortcomings of the K0103 rate control algorithm, many scholars at home and abroad have done a great deal of research, mainly on the complexity measurement of the CTU and on the computational load of the parametric model. Regarding the complexity measurement of the CTU, Guo et al. [3] propose a method that calculates the complexity of the CTU from pixel statistics, but the allocated bits have large errors for video sequences with severe local motion. In [4], the SATD is taken as the complexity measure, which ignores the relevant characteristics of the video content. In [5], difference histograms are adopted to measure the complexity of the CTU, but the selection of the threshold is uncertain across different video sequences. The above algorithms improve the accuracy of the CTU complexity measure to a certain extent, but the overall performance of the rate control algorithm is not improved obviously. Regarding the computational load of the parametric model, the rate distortion parametric model is improved in [6], but its adaptability is poor when video scenes change, and its computational load is also large. In [7], the gradient descent method of the original parametric model is improved, but the computational load grows gradually as the number of iterations increases. The above algorithms improve the performance of the parametric model to a certain extent, but the reduction of the computational load is not obvious.

The above algorithms make unilateral improvements, either to the complexity measurement of the CTU or to the computational load of the parametric model, and the overall performance gain of the optimized rate control algorithm is not obvious. In this paper, the rate control algorithm is optimized jointly in terms of the complexity measurement of the CTU, the bit weight assignment at the CTU layer, and the computational load of the parametric model. First, the improved Itti model is used to extract the ROI of the video frames. Then, the complexity at the CTU layer is calculated based on the ROI, and the adaptive weight algorithm is used to redistribute the target bits in combination with the CTU complexity, which makes the output videos more consistent with the human visual attention mechanism. Finally, the quasi-Newton method is used to update the parametric model, which reduces the computational load of the update phase of the parametric model. Therefore, the overall performance of the rate control algorithm is improved.

2. Analysis of Algorithms

2.1. JCTVC-K0103 Rate Control Algorithm

The JCTVC-K0103 rate control algorithm allocates a reasonable number of bits to each coding layer for a given target rate based on the R-λ model in order to optimize the coding performance. It consists of the following two steps:
(1) Target bits are allocated hierarchically to the GOP layer, the image layer, and the CTU layer according to the target bit rate.
(2) The quantization parameter QP is determined by the R-λ model from the target bits allocated to the corresponding layer.

2.1.1. Target Bit Allocation at the CTU Layer

The bit allocation of each layer in the K0103 rate control algorithm is a dynamic process. When allocating target bits in the next layer, it is necessary to refer to the actual number of bits used in the current coding layer and to ensure that the number of target bits allocated to the current coding layer does not exceed the total number of target bits allocated. The implementation process of target bit allocation at the CTU layer is as follows.

After the number of bits has been allocated at the image layer, the target bit allocation at the CTU layer depends on the parameter $T_{\mathrm{pic}}$, the number of target bits of the current frame. Equation (1) [2] of target bit allocation at the CTU layer is as follows:

$$T_{\mathrm{CTU}}(c)=\frac{T_{\mathrm{pic}}-\mathrm{Bit}_{H}-\sum_{j=1}^{c-1} b_{j}}{\sum_{k=c}^{N}\omega_{k}}\times\omega_{c},\qquad (1)$$

where $\mathrm{Bit}_{H}$ represents the bits of the encoded data header, which include the GOP flag bits, the frame flag bits, and so on; $b_{j}$ represents the actual number of bits of the $j$th CTU already encoded in the current frame; $N$ represents the total number of CTUs in the current frame, $1\le c\le N$; $\omega_{k}$ represents the bit allocation weight of the $k$th CTU; and $\omega_{c}$ represents the bit allocation weight of the $c$th CTU.
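To make the allocation rule concrete, the following Python sketch implements a CTU-level allocation in the spirit of equation (1); the function and variable names (allocate_ctu_bits, target_pic_bits, and so on) are illustrative placeholders rather than identifiers from the HM reference software.

```python
# Illustrative sketch of CTU-level target bit allocation in the spirit of
# JCTVC-K0103.  All names are placeholders, not reference-software identifiers.

def allocate_ctu_bits(target_pic_bits, header_bits, coded_bits, weights, c):
    """Return the target bits for the c-th CTU of the current frame.

    target_pic_bits : target bits of the whole frame
    header_bits     : bits estimated for header information (GOP/frame flags)
    coded_bits      : actual bits spent on already-coded CTUs (indices 0..c-1)
    weights         : bit-allocation weight of every CTU in the frame
    c               : index of the CTU about to be coded
    """
    remaining = target_pic_bits - header_bits - sum(coded_bits[:c])
    remaining_weight = sum(weights[c:])          # weights of not-yet-coded CTUs
    return remaining * weights[c] / remaining_weight


# Toy usage: 4 CTUs, uniform weights, 1000 target bits, 40 header bits.
print(allocate_ctu_bits(1000, 40, [260, 0, 0, 0], [1.0, 1.0, 1.0, 1.0], 1))
```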

In the process of bit allocation at the CTU layer, the bit allocation weight of each CTU directly affects the compressed quality of the videos. In the K0103 rate control algorithm, the allocated weight should reflect the texture complexity of the current CTU. Generally, a CTU with high texture complexity needs to be allocated more bits, so its weight should be larger; a CTU with low texture complexity needs fewer bits, so its weight should be smaller. Based on this strategy, the K0103 rate control algorithm uses the mean absolute difference (MAD) to characterize the texture complexity of the current CTU [8]. Equation (2) is as follows:

$$\mathrm{MAD}=\frac{1}{N_{w}\times N_{h}}\sum_{i=1}^{N_{w}}\sum_{j=1}^{N_{h}}\left|P_{\mathrm{cur}}(i,j)-P_{\mathrm{rec}}(i,j)\right|,\qquad (2)$$

where $N_{w}$ and $N_{h}$ represent the width and height of the current CTU, respectively; $P_{\mathrm{cur}}(i,j)$ represents the pixel value of the current CTU; and $P_{\mathrm{rec}}(i,j)$ represents the reconstructed pixel value of the current CTU.
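A minimal sketch of the MAD computation in equation (2), assuming the original and reconstructed CTUs are available as NumPy arrays:

```python
import numpy as np

def mad(cur_ctu: np.ndarray, rec_ctu: np.ndarray) -> float:
    """Mean absolute difference between the original and reconstructed CTU
    (equation (2)); both arrays have shape (height, width)."""
    return float(np.abs(cur_ctu.astype(np.float64) -
                        rec_ctu.astype(np.float64)).mean())
```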

According to this definition, only the predicted value of the MAD is taken as the weight for target bit allocation at the CTU layer, and its accuracy is low, which ultimately affects the compressed quality of the videos. Therefore, the weight of bit allocation at the CTU layer needs to be adjusted.

2.1.2. Implementation of Target Bits

The purpose of the implementation of target bits is to convert the allocated target bits into the value of the quantization parameter QP, which determines the compression ratio and the final encoding rate. After the quantization parameter is determined, the relevant model parameters need to be updated and adjusted. The implementation of target bits is performed at both the image layer and the CTU layer, and the implementation ideas are basically the same. The implementation steps of target bits at the CTU layer are as follows:

The K0103 rate control algorithm uses a hyperbolic model to simulate the relationship between rate and distortion:

$$D=C\cdot R^{-K},\qquad (3)$$

where $D$ represents the distortion, $C$ and $K$ represent parameters related to the characteristics of the video content, and $R$ represents the value of the target bit allocation. In rate-distortion optimization theory, the Lagrangian multiplier $\lambda$ is the absolute value of the slope of the tangent to the rate-distortion curve, and taking the derivative of equation (3) gives

$$\lambda=-\frac{\partial D}{\partial R}=C\cdot K\cdot R^{-K-1}=\alpha\cdot R^{\beta},\qquad (4)$$

where $\alpha=C\cdot K$, $\beta=-K-1$, and $\alpha$ and $\beta$ represent parameters related to the characteristics of the video content. A large number of experimental studies have shown [9] that there is a linear relationship between the quantization parameter QP and the natural logarithm of the Lagrangian multiplier $\lambda$:

$$\mathrm{QP}=4.2005\cdot\ln\lambda+13.7122.\qquad (5)$$

Under the premise that the target rate is known, QP can be determined by adjusting $\alpha$ and $\beta$. In the implementation of target bits at the image layer or the CTU layer, the relationship between $\lambda$ and the target bits satisfies

$$\lambda=\alpha\cdot\mathrm{bpp}^{\beta},\qquad (6)$$

where bpp represents the average number of bits per pixel of the image or CTU, determined by the target bit allocation at the image layer or the CTU layer. In the process of determining QP, $\lambda$ is first computed from $\alpha$ and $\beta$ according to equation (6), and then the relationship between $\lambda$ and QP in equation (5) determines the quantization parameter QP.
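The following sketch shows how a target bpp is turned into λ and then into QP under the R-λ formulation above. The QP clipping range and the initial values of α and β used in the example are the ones commonly quoted for the reference implementation and should be treated as assumptions here.

```python
import math

# Minimal sketch of the lambda-domain QP decision, assuming the widely used
# R-lambda formulation: lambda = alpha * bpp^beta, QP linear in ln(lambda).

def lambda_from_bpp(bpp: float, alpha: float, beta: float) -> float:
    return alpha * (bpp ** beta)

def qp_from_lambda(lmbda: float) -> int:
    qp = 4.2005 * math.log(lmbda) + 13.7122
    return int(round(min(max(qp, 0), 51)))      # clip to the valid HEVC QP range

# Example: 0.05 bits per pixel with typical initial parameters (assumed values).
lam = lambda_from_bpp(0.05, alpha=3.2003, beta=-1.367)
print(lam, qp_from_lambda(lam))
```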

After the current image or CTU has been encoded, the parameters $\alpha$ and $\beta$ are adjusted according to the actual number of bits used by the current image or CTU. The updating of the parameters $\alpha$ and $\beta$ is specifically as follows:

$$\alpha_{\mathrm{new}}=\alpha_{\mathrm{old}}+\delta_{\alpha}\cdot\left(\ln\lambda_{\mathrm{real}}-\ln\lambda_{\mathrm{comp}}\right)\cdot\alpha_{\mathrm{old}},$$
$$\beta_{\mathrm{new}}=\beta_{\mathrm{old}}+\delta_{\beta}\cdot\left(\ln\lambda_{\mathrm{real}}-\ln\lambda_{\mathrm{comp}}\right)\cdot\ln\mathrm{bpp},$$

where $\lambda_{\mathrm{real}}$ represents the Lagrangian multiplier of the image or CTU that has been encoded, $\lambda_{\mathrm{comp}}$ represents the Lagrangian multiplier computed by the model for the current image or CTU, $\alpha_{\mathrm{old}}$ and $\beta_{\mathrm{old}}$ are the parameter values of the current image or CTU, bpp represents the average number of bits per pixel in the image or CTU determined by the target bit allocation at the current image layer or CTU layer, $\delta_{\alpha}$ and $\delta_{\beta}$ are step sizes, and $\alpha_{\mathrm{new}}$ and $\beta_{\mathrm{new}}$ represent the updated parameter values.
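A small sketch of this parameter refresh, assuming the gradient-style update written above; the step sizes δ_α = 0.1 and δ_β = 0.05 are values commonly reported for the reference algorithm, not taken from this paper.

```python
import math

# Sketch of the alpha/beta refresh after a picture or CTU has been coded,
# following the gradient-style rule usually associated with JCTVC-K0103.
# Step sizes (0.1 and 0.05) are assumed defaults, not values from this paper.

def update_alpha_beta(alpha, beta, bpp, lambda_real,
                      delta_alpha=0.1, delta_beta=0.05):
    lambda_comp = alpha * (bpp ** beta)           # lambda predicted by the model
    err = math.log(lambda_real) - math.log(lambda_comp)
    alpha_new = alpha + delta_alpha * err * alpha
    beta_new = beta + delta_beta * err * math.log(bpp)
    return alpha_new, beta_new
```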

The K0103 rate control algorithm uses the gradient descent method to update and adjust the parameters α and β. However, the gradient descent method has a slow convergence rate, which leads to high computational complexity in the K0103 algorithm and is not conducive to practical application.

3. Extraction of ROI from Videos Based on Improved Itti Algorithm

To address the deficiencies of the Itti algorithm, such as incomplete ROI extraction and blurred edges, this paper improves the Itti algorithm to extract the ROI of video sequences from the picture content in both the spatial and temporal domains. First, in addition to the brightness, color, and orientation features, the algorithm extracts texture and shape features in the spatial domain and the motion feature in the temporal domain. Then, the normalization step and the cross-scale step of the Itti model are adopted to generate six single-feature salient maps: the brightness feature maps, the color feature maps, the orientation feature maps, the texture feature maps, the shape feature maps, and the motion feature maps. Finally, information entropy theory is used to acquire the weights of the six single-feature salient maps adaptively, and cross-scale fusion is carried out to extract the final ROI of the video sequences.

3.1. Texture Feature

The characteristics of the human visual system (HVS) indicate [10] that human eyes pay different amounts of attention to different regions of an image, and texture-rich regions or moving objects are more likely to attract attention. The texture feature is therefore extracted as a subfeature of the Itti model in this paper. Chiranjeevi and Sengupta [11] propose that the structural tensor matrix can represent the texture feature of an image well, but it suffers from inaccurate localization. Because the isotropic Sobel operator has more accurate position weighting coefficients, it is used to improve the structural tensor matrix for extracting the texture feature of video frames; the directional templates of the isotropic Sobel operator are shown in Figure 1.

For an image $I$, let $I_{x}$ and $I_{y}$ represent the horizontal and vertical gradients of the image, respectively. Then, the structural tensor matrix of the image is

$$M=\begin{pmatrix} J_{xx} & J_{xy} \\ J_{xy} & J_{yy} \end{pmatrix},$$

where $J_{xx}=I_{x}^{2}$, $J_{yy}=I_{y}^{2}$, and $J_{xy}=I_{x}I_{y}$.

The horizontal gradient $I_{x}$ and the vertical gradient $I_{y}$ are calculated with the isotropic Sobel operator templates shown in Figure 1 and are substituted into the entries $J_{xx}$, $J_{yy}$, and $J_{xy}$ of the structural tensor. Since the eigenvalues $\lambda_{1}$ and $\lambda_{2}$ of the structural tensor represent the overall trend of the grayscale in the window and the contrast in the direction of the corresponding eigenvectors, the consistency of a video frame can be calculated from $\lambda_{1}$ and $\lambda_{2}$.
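A hedged sketch of the texture-consistency computation: isotropic Sobel gradients feed a structure tensor, whose eigenvalues give a per-pixel coherence map. The paper's exact consistency formula is not reproduced; the ((λ1 − λ2)/(λ1 + λ2))² form used below is a common choice and is only an assumption.

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

SQRT2 = np.sqrt(2.0)
KX = np.array([[-1, 0, 1], [-SQRT2, 0, SQRT2], [-1, 0, 1]])   # isotropic Sobel, x
KY = KX.T                                                      # isotropic Sobel, y

def texture_coherence(gray: np.ndarray, win: int = 5, eps: float = 1e-9) -> np.ndarray:
    gx = convolve(gray.astype(np.float64), KX)
    gy = convolve(gray.astype(np.float64), KY)
    # Locally averaged structure-tensor entries.
    jxx = uniform_filter(gx * gx, win)
    jyy = uniform_filter(gy * gy, win)
    jxy = uniform_filter(gx * gy, win)
    # Eigenvalues of the 2x2 tensor at every pixel.
    tmp = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2)
    l1 = 0.5 * (jxx + jyy + tmp)
    l2 = 0.5 * (jxx + jyy - tmp)
    # Assumed coherence measure; close to 1 in strongly oriented texture.
    return ((l1 - l2) / (l1 + l2 + eps)) ** 2
```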

According to the central-peripheral difference algorithm (C-S algorithm) of the Itti model [12], the texture feature maps are obtained from the consistency of the video frames as

$$T(c,s)=\left|T(c)\ominus T(s)\right|,$$

where $c$ represents the center scale of the receptive field, $s$ represents the peripheral scale of the receptive field, and $\ominus$ represents the cross-scale difference operator between feature maps at different scales.

The normalization function $N(\cdot)$ is used to perform cross-scale fusion of the six texture feature maps to form the single-feature texture salient map.

In summary, the texture feature salient map can be obtained.

3.2. Shape Feature

To address the problem that the edges of the salient maps produced by the Itti algorithm are blurred, many scholars have done a great deal of research. Long and Wu [13] adopt an improved Canny operator to extract the shape feature and locate the edges of the salient maps accurately, but the computational load is large. In this paper, the boundary function is used to analyze the consistency of the shape, and the SUSAN corner detection algorithm is used to extract the shape feature of the video frames based on the idea of corner points; this corner detection algorithm has the advantages of simple calculation, accurate positioning, and strong noise resistance compared with traditional edge detection algorithms and with the Harris, KLT, and Kitchen–Rosenfeld corner algorithms, among others [14].

The steps by which the SUSAN corner detection algorithm extracts the shape feature are as follows:
(1) A circular template containing 37 pixels is slid over the video frame, and each pixel covered by the template is tested to determine whether it belongs to the USAN region by comparing its brightness with that of the central point (the nucleus) against a similarity threshold. In order to obtain more stable results, a smooth similarity function of the brightness difference is used instead of the hard threshold.
(2) The total similarity is calculated by summing the similarities over the area of the template centered on the nucleus.
(3) According to the total similarity obtained, the initial corner points are determined by the corner response function, in which a geometric threshold is introduced to eliminate the effects of noise.
(4) The final set of corner points is obtained by nonmaximum suppression among the initial corner points, where the corner coordinates lie within the width and height of the video frame. (A simplified sketch of steps (1)–(4) is given at the end of this subsection.)
(5) Since the boundary function can describe the border of objects, the elements of the corner point set are substituted into the coordinate function, which is convolved linearly with a Gaussian kernel at scale σ to obtain the boundary function corresponding to the set of corner points; the boundary function, which involves the length of the border, the center of the region, and the Gaussian kernel at scale σ, is taken as the measure of the shape feature in this paper.
(6) According to the C-S operation, the shape feature maps are obtained from the boundary function.
(7) The normalization function is used to perform cross-scale fusion of the six shape feature maps to form the single-feature shape salient map.

In summary, the shape feature salient map can be obtained.
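The simplified SUSAN corner detector below follows the textbook description of the algorithm (37-pixel circular template, smooth similarity function, geometric threshold, nonmaximum suppression); the thresholds t and g are standard default values and are not taken from the paper's implementation.

```python
import numpy as np

def susan_mask(radius: int = 3):
    """Circular template of 37 pixels (radius ~3.4) around the nucleus."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (x ** 2 + y ** 2) <= 3.4 ** 2

def susan_corners(gray: np.ndarray, t: float = 27.0, g: float = 18.5):
    gray = gray.astype(np.float64)
    mask = susan_mask()
    r = mask.shape[0] // 2
    h, w = gray.shape
    response = np.zeros_like(gray)
    for yy in range(r, h - r):
        for xx in range(r, w - r):
            patch = gray[yy - r:yy + r + 1, xx - r:xx + r + 1][mask]
            # Smooth similarity between each template pixel and the nucleus.
            c = np.exp(-((patch - gray[yy, xx]) / t) ** 6)
            n = c.sum()
            # Corner response: USAN area must stay below the geometric threshold g.
            response[yy, xx] = g - n if n < g else 0.0
    # Non-maximum suppression in a 3x3 neighbourhood.
    corners = []
    for yy in range(1, h - 1):
        for xx in range(1, w - 1):
            v = response[yy, xx]
            if v > 0 and v == response[yy - 1:yy + 2, xx - 1:xx + 2].max():
                corners.append((xx, yy))
    return corners
```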

3.3. Motion Feature

To address the problem that the salient maps extracted by the Itti model are incomplete, many scholars have studied the issue. Jalink [15] proposes to improve the integrity of the salient maps by using the MoSIFT algorithm to calculate the motion feature, but the MoSIFT algorithm has high computational complexity, which affects the real-time performance of salient region extraction. The human visual mechanism indicates [10] that the description of moving objects by human eyes is localized, and the description of feature points by the SURF algorithm is also localized; therefore, describing the motion feature with the SURF algorithm is more consistent with the human visual attention mechanism. Based on this idea, the SURF algorithm is used in this paper to improve the MoSIFT algorithm for extracting the motion feature of video frames, which reduces the computational complexity of the MoSIFT algorithm and yields a more stable motion feature. The steps of the MoSIFT algorithm are shown in Figure 2.

The steps of extracting the motion feature by the improved MoSIFT algorithm are as follows:
(1) In the construction of the Hessian matrix, the SURF algorithm uses the Hessian matrix to extract feature points. Since the elements of the Hessian matrix are calculated by a second-order Taylor expansion, the computational complexity is high. Based on linear scale space (LoG) theory, the derivative of a function equals the convolution of the function with the corresponding derivative of the Gaussian function, so the elements of the Hessian matrix are calculated from the second-order Gaussian derivatives of the image.
(2) The SURF algorithm is used for downsampling. Compared with the SIFT algorithm, the SURF algorithm keeps the size of the frames unchanged and changes the size of the filter for downsampling, which reduces the computational complexity of this step. Each pixel processed by the Hessian matrix is compared with its 26 neighbors in the two-dimensional image space and the neighboring scale spaces, which locates the preliminary key points; key points with weak energy or wrong locations are filtered out to determine the final stable SURF feature points. In the orientation assignment stage, the horizontal and vertical Haar wavelet responses within a 60-degree fan are summed in the circular neighborhood of each feature point, and the direction of the longest vector is selected as the direction of the feature point. The optical flow corresponding to the SURF scale is calculated according to Figure 2.
(3) The SURF feature points are combined with the optical flow to extract the feature point descriptors, which are regarded as the motion feature in this paper. The motion feature is obtained by weighting the 64-dimensional SURF vector and the corresponding 64-dimensional optical flow vector with two constant weights; that is, the values calculated from the 64-dimensional Haar wavelet responses in SURF and the values calculated from the corresponding 64-dimensional Haar wavelet responses of the optical flow are weighted and combined into 64-dimensional feature descriptors.
(4) According to the C-S operation, the motion feature maps are obtained from the feature descriptors.
(5) The normalization function is used to perform cross-scale fusion of the six motion feature maps to form the single-feature motion salient map.

In summary, the motion feature salient map can be obtained.
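The sketch below illustrates the idea of combining SURF descriptors with optical flow into a motion descriptor. SURF is available in opencv-contrib builds (cv2.xfeatures2d); the Farneback parameters, the weights w1 and w2, and the way the flow patch is reduced to 64 values are illustrative assumptions rather than the paper's exact procedure.

```python
import cv2
import numpy as np

def motion_descriptors(frame0: np.ndarray, frame1: np.ndarray,
                       w1: float = 0.5, w2: float = 0.5):
    g0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)   # 64-dim descriptors
    keypoints, desc = surf.detectAndCompute(g0, None)

    # Dense optical flow between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    descriptors = []
    for kp, d in zip(keypoints, desc):
        x, y = int(kp.pt[0]), int(kp.pt[1])
        # Flow magnitudes around the keypoint, resampled to 64 values so they
        # can be combined with the SURF descriptor (an assumed simplification).
        patch = flow[max(y - 4, 0):y + 4, max(x - 4, 0):x + 4]
        mag = np.linalg.norm(patch.reshape(-1, 2), axis=1)
        mag = np.resize(mag, 64)
        descriptors.append(w1 * d + w2 * mag)
    return keypoints, np.array(descriptors)
```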

3.4. Feature Fusion Based on Adaptive Weight

The Itti algorithm uses the mean method to assign weights to the single-feature salient maps, which ignores the contribution of each feature salient map to the final salient map and thus affects the overall performance of the merged salient map. Shannon's information entropy theory [16, 17] can describe the overall statistical characteristics of a source objectively and can therefore describe the contribution rate of each impact factor to the whole. This paper determines the adaptive weight coefficient of each single-feature salient map based on information entropy theory, which improves the accuracy of the merged salient maps of the video frames.

Let $X$ be a random variable associated with a single-feature salient map. Its information entropy can be represented by

$$H(X)=-\sum_{i}p_{i}\log_{2}p_{i},\qquad (25)$$

where $p_{i}$ represents the probability of the $i$th signal and $H(X)$ represents the information entropy of the single-feature salient map.

The information entropy values of the brightness feature salient maps, the color feature salient maps, the orientation feature salient maps, the texture feature salient maps, the shape feature salient maps, and the motion feature salient maps are calculated according to equation (25) and are denoted $H_{I}$, $H_{C}$, $H_{O}$, $H_{T}$, $H_{S}$, and $H_{M}$, respectively. The adaptive weight coefficient of each map is calculated as

$$w_{k}=\frac{H_{k}}{H_{\mathrm{sum}}},\quad k\in\{I,C,O,T,S,M\},$$

where $H_{\mathrm{sum}}$ represents the sum of $H_{I}$, $H_{C}$, $H_{O}$, $H_{T}$, $H_{S}$, and $H_{M}$.

The final salient map is then calculated as the weighted sum of the six single-feature salient maps with the adaptive weight coefficients, $S=\sum_{k}w_{k}S_{k}$, where $S_{k}$ denotes the $k$th single-feature salient map.
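A minimal sketch of the entropy-weighted fusion: each single-feature salient map is weighted by its entropy's share of the total entropy, as described above, and the maps are summed. The histogram binning and the assumption that the maps are normalized to [0, 1] are implementation choices, not the paper's specification.

```python
import numpy as np

def entropy(sal: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy of a salient map normalised to [0, 1] (equation (25))."""
    hist, _ = np.histogram(sal, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def fuse_maps(maps):
    """maps: list of the six single-feature salient maps."""
    ents = np.array([entropy(m) for m in maps])
    weights = ents / ents.sum()                  # assumed adaptive weight rule
    fused = sum(w * m for w, m in zip(weights, maps))
    return fused, weights
```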

In summary, the extraction of the ROI of video frames can be completed, which solves the problems that the Itti model extracts an incomplete ROI and that the located edges are blurred. In order to verify the effectiveness of the above algorithm, three pictures from MSRA, the 208th frame of the Tennis sequence, and the 25th frame of the Basketball sequence provided by JCT-VC are used in comparative experiments of the ORC algorithm, the Itti algorithm [18], the GBVS algorithm [19], the SR algorithm [20], the FT algorithm [21], the CAS algorithm [22], and the LC algorithm [23].

As shown in Figure 3, the ORC algorithm can extract the ROI in the image or video frame accurately. And, the ORC algorithm performs better in terms of the integrity of ROI extracted and the positioning accuracy of edge in the ROI compared with the other six algorithms, which proves the effectiveness of the ORC algorithm.

4. Target Bit Allocation of CTU Layer Based on Space-Time Domain

To address the problem that the MAD used as the CTU weight index cannot accurately measure the complexity of the current CTU in the process of target bit allocation, many scholars have studied the issue. Khoshnevisan and Salmasi [24] propose to take the gradient as the CTU weight index but ignore the contribution of influencing factors in the temporal domain. Studies have shown [25] that the gradient of each pixel in a frame has a linear relationship with the allocated bits. In this paper, the bits allocated to a CTU are weighted by a space-time domain complexity based on the ROI: the gradient is used to measure the complexity of the CTU, and the bit allocation weights at the CTU layer are redistributed, which improves the accuracy of the rate control algorithm. Then, the bit allocation weights of the CTU layer are distributed once again by the adaptive weight algorithm, which makes the output videos more consistent with the human visual attention mechanism.

Because human eyes are sensitive to gradient information in an image, we take the gradient as the measure of the complexity of the current CTU:

$$\mathrm{Grad}=\sum_{i=1}^{N_{h}}\sum_{j=1}^{N_{w}}\left(\left|P(i,j)-P(i-1,j)\right|+\left|P(i,j)-P(i,j-1)\right|\right),\qquad (28)$$

where $N_{h}$ and $N_{w}$ represent the height and width of the current CTU, respectively, and $P(i,j)$ represents the pixel value of the luma component at position $(i,j)$ of the CTU.

In equation (6), bpp represents the average number of bits per pixel of the image or CTU determined by the target bit allocation at the current image layer or CTU layer; here, we take bpp as the average number of bits per pixel of the CTU determined by the target bit allocation at the CTU layer. The average number of bits of the current CTU is

$$T_{\mathrm{avg}}=\frac{T_{f}}{N},$$

where $T_{f}$ represents the total number of target bits of the current frame and $N$ represents the total number of CTUs in the current frame.

Because the CTU of the current frame and the co-located CTU of the adjacent frame have similar texture features, the complexity of the current CTU is measured jointly by the complexity of the current CTU and the complexity of the co-located CTU in the adjacent frame. The bit allocation weight of the current CTU is then computed from the average number of bits allocated to a CTU in the current frame, the average number of bits allocated to the co-located CTU in the adjacent frame, the texture complexity of the current CTU obtained according to equation (28), and the texture complexity of the co-located CTU in the adjacent frame obtained according to equation (28).
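A hedged sketch of the gradient-based spatiotemporal weight: the complexities of the current CTU and of its co-located CTU in the neighbouring frame are combined here by simple averaging and then normalized into bit weights. The exact combination rule of the paper is not reproduced, so this is only an illustration of the idea.

```python
import numpy as np

def ctu_gradient(luma: np.ndarray) -> float:
    """Average absolute horizontal and vertical luma difference inside one CTU."""
    gx = np.abs(np.diff(luma.astype(np.float64), axis=1)).sum()
    gy = np.abs(np.diff(luma.astype(np.float64), axis=0)).sum()
    return (gx + gy) / luma.size

def ctu_weights(cur_ctus, prev_ctus, frame_target_bits):
    """cur_ctus / prev_ctus: lists of co-located CTU luma blocks."""
    grads = [0.5 * (ctu_gradient(c) + ctu_gradient(p))    # space-time complexity
             for c, p in zip(cur_ctus, prev_ctus)]
    total = sum(grads) + 1e-9
    # Each CTU's share of the frame budget is proportional to its complexity.
    return [frame_target_bits * g / total for g in grads]
```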

The redistribution of weight of bits at the CTU layer can be achieved by following the above steps.

The human visual attention mechanism indicates that human attention is concentrated on the central regions of video frames, while attention to the peripheral regions is small [10]. Based on this idea, we adjust the weights allocated to the target bits at the CTU layer to redistribute the target bits once again.

According to Figure 4, the central coordinates of the current frame and the central coordinates of the CTU being coded in the current frame are determined first, and the Manhattan distance between them is used to determine the weight of a CTU in the ROI and the weight of a CTU in the RONI of the current frame.

After the ROI and RONI weights are obtained, the designed equation (32) is used to adjust the CTU weight once again.

In summary, the calculation of the bit weights at the CTU layer can be completed. More bits are allocated to the ROI with high texture complexity in the compressed frames, and fewer bits are allocated to the RONI with low texture complexity, which makes the output videos more consistent with the human visual attention mechanism.
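The sketch below illustrates the centre-weighted adjustment in a hedged form: CTUs whose centres are close to the frame centre, and those inside the ROI, keep a larger share of the bits, and the weights are renormalized so the frame budget is preserved. The linear fall-off with Manhattan distance and the ROI boost factor are assumptions, not the exact form of equation (32).

```python
def center_weight(ctu_cx, ctu_cy, frame_cx, frame_cy):
    """1 at the frame centre, decreasing linearly with Manhattan distance."""
    manhattan = abs(ctu_cx - frame_cx) + abs(ctu_cy - frame_cy)
    max_dist = frame_cx + frame_cy            # largest possible Manhattan distance
    return 1.0 - manhattan / max_dist

def adjust_weights(weights, centers, roi_flags, frame_w, frame_h, boost=1.5):
    fx, fy = frame_w / 2.0, frame_h / 2.0
    adjusted = []
    for w, (cx, cy), in_roi in zip(weights, centers, roi_flags):
        cw = center_weight(cx, cy, fx, fy)
        adjusted.append(w * cw * (boost if in_roi else 1.0))
    # Renormalise so the frame-level bit budget is preserved.
    scale = sum(weights) / sum(adjusted)
    return [a * scale for a in adjusted]
```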

In order to verify the effectiveness of the ORC algorithm, the 25th frame of the BasketballPass sequence and the 208th frame of the Tennis sequence provided by JCT-VC are used for the experiment. Figure 5 shows the experimental results of the ORC algorithm and the K0103 algorithm of the HM10.0 model: (a) the original 208th frame of the Tennis sequence; (b) the result of the K0103 algorithm; (c) the result of the ORC algorithm; (d) the original 25th frame of the BasketballPass sequence; (e) the result of the K0103 algorithm; (f) the result of the ORC algorithm.

From the perspective of vision, the subjective quality of video frames compressed by the ORC algorithm is better than the subjective quality of the K0103 algorithm in the HM10.0 model in Figure 5. The quality of videos compressed is improved by the ORC algorithm. To further illustrate the effectiveness of the ORC algorithm, the enlarged details of the 208th frame of Tennis sequences are displayed for comparison. Figures 6 and 7 show the results of ROI and RONI in enlarged details.

From Figure 6, we can see that the subjective quality of the ROI produced by the ORC algorithm is better than that of the K0103 rate control algorithm. According to the code stream analysis software Elecard HEVC Analyzer, the QP of the CTU in Figure 6(a) is 29 and the QP of the CTU in Figure 6(b) is 28, which indicates that the ORC algorithm allocates more rate to the ROI. From Figure 7, we can see that the subjective quality of the RONI produced by the K0103 rate control algorithm is better than that of the ORC algorithm. The QP of the CTU in Figure 7(a) is 30 and the QP of the CTU in Figure 7(b) is 31, which indicates that the ORC algorithm reduces the rate allocated to the RONI. On the basis of the above analysis, it is verified that the ORC algorithm can redistribute the rates of the RONI and ROI in the video frames, which makes the output videos more consistent with the human visual attention mechanism.

5. Update of Parametric Model Based on Quasi-Newton Method

The JCTVC-K0103 rate control algorithm adopts the gradient descent method to update the parametric model, which converges slowly and has a high computational load. Li et al. [26] propose to use the Newton method to update the parametric model, which reduces the complexity of the rate control algorithm to some extent, but the computational load is still large and the overall improvement in the update phase of the parametric model is limited. This paper introduces the quasi-Newton method to update the parametric model and uses the BFGS algorithm to update a positive definite matrix approximating the inverse of the Hessian matrix, which reduces the computational load of the parametric model.

The quasi-Newton method is usually used to solve optimization problems. Its basic idea is to take the optimal solution of a quadratic model as the search direction, obtain a new iterative point, and update the approximation matrix in each iteration. The iterative equation of the quasi-Newton method is

$$x_{n+1}=x_{n}-t_{n}H_{n}\nabla f(x_{n}),\qquad (33)$$

where $x_{n}$ represents the $n$th iterative point, $x_{n+1}$ represents the $(n+1)$th iterative point, $t_{n}$ represents a constant step size, and $H_{n}$ represents a positive definite matrix approximating the inverse of the Hessian matrix. The specific implementation steps of the ORC algorithm are as follows:
(1) In the implementation process of target bits, the relationship between rate and distortion is given by equation (3). According to equation (3), the distortion value estimated from the target bit rate and the actual coding distortion value are written as equations (34) and (35), where $C_{1}$ and $K_{1}$ represent the parameters related to the characteristics of the video content before the parametric model is updated, $C_{2}$ and $K_{2}$ represent the parameters after the parametric model is updated, and $R_{\mathrm{tar}}$ and $R_{\mathrm{real}}$ represent the target bit rate and the actual bit rate, respectively.
(2) Taking the logarithm of equations (34) and (35), respectively, gives equations (36) and (37). The errors between the actual coding distortion and the distortion estimated from the target bit rate then follow from equations (36) and (37), and their derivatives with respect to the model parameters are computed.
(3) The iteration of the quasi-Newton method is performed according to equation (33), which gives equations (40) and (41). As can be seen from equation (4), $\alpha=C\cdot K$ and $\beta=-K-1$, which gives equations (42) and (43), where $\alpha_{1}$ and $\beta_{1}$ represent the parameters used when determining the quantization parameter QP and $\alpha_{2}$ and $\beta_{2}$ represent the updated parameters. The updated parameters $\alpha$ and $\beta$ are obtained by solving equations (40)–(43) simultaneously.
(4) The matrix $H_{n}$ is selected as the approximation of the inverse Hessian according to the quasi-Newton condition, where $s_{n}=x_{n+1}-x_{n}$, $y_{n}=\nabla f(x_{n+1})-\nabla f(x_{n})$, and $\nabla f$ represents the first derivative of the target function. In this paper, the BFGS algorithm is used to calculate $H_{n}$, since the performance and accuracy of the BFGS algorithm are higher than those of the DFP algorithm and the Broyden algorithm; the BFGS update produces a sequence of positive definite matrices that satisfy the quasi-Newton condition.
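As an illustration of the quasi-Newton idea, the sketch below refreshes the hyperbolic model parameters by minimizing a log-domain fitting error with SciPy's BFGS implementation. The objective (squared log error plus a small pull toward the previous parameters) is an assumption introduced for the sketch; it stands in for the hand-derived update of equations (40)–(43) rather than reproducing it.

```python
import numpy as np
from scipy.optimize import minimize

def update_model(c_old, k_old, r_actual, d_actual):
    """Refresh (C, K) of D = C * R^(-K) after observing one rate/distortion pair."""
    def objective(theta):
        log_c, k = theta
        # Log-domain model error for the hyperbolic R-D model.
        err = np.log(d_actual) - (log_c - k * np.log(r_actual))
        # Small regularisation keeps the (under-determined) update close to the
        # previous parameters -- an assumed stabilisation, not the paper's rule.
        reg = 0.01 * ((log_c - np.log(c_old)) ** 2 + (k - k_old) ** 2)
        return err ** 2 + reg

    res = minimize(objective, x0=np.array([np.log(c_old), k_old]),
                   method="BFGS", options={"maxiter": 5})
    log_c, k = res.x
    return float(np.exp(log_c)), float(k)

# Toy usage with made-up values.
print(update_model(c_old=10.0, k_old=1.2, r_actual=0.08, d_actual=60.0))
```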

In summary, the update process of $\alpha$ and $\beta$ can be achieved. In order to verify the performance of the improved algorithm against the K0103 rate control algorithm of the HM10.0 model, five different test sequences provided by JCT-VC, with 200 frames each, are encoded under the two configuration files of the low delay type (LDMmain) and the random access type (RAMmain). The coding efficiency is measured by the encoding time saved, $\Delta T$ [27, 28], calculated as

$$\Delta T=\frac{T_{\mathrm{K0103}}-T_{\mathrm{ORC}}}{T_{\mathrm{K0103}}}\times 100\%,$$

where $T_{\mathrm{ORC}}$ represents the encoding time of the algorithm in this paper and $T_{\mathrm{K0103}}$ represents the encoding time of the K0103 rate control algorithm.

From the comparison of the experimental data in Table 1, we can see that the coding time of the algorithm in this paper is lower than that of the K0103 rate control algorithm regardless of whether the configuration file is LDMmain or RAMmain, which shows that the computational load in the update phase of the parametric model is reduced and that the algorithm in this paper is more effective.

6. Experiment

6.1. Experimental Parameters and Evaluation Indicators

In order to verify the validity of the algorithm in this paper, the hardware configuration is as follows: an Intel(R) Core(TM) i5-3470 CPU with a main frequency of 3.2 GHz and 4 GB of memory; the software configuration is as follows: Microsoft Visual Studio 2010 and OpenCV 2.4.10 are used as the experimental platform. Simulations were performed to verify the compression performance of the ORC algorithm in the HM10.0 model. The experimental data sets come from five different classes of video sequences provided by JCT-VC. The configuration files are the LD configuration files of the IPPP coding structure, and 100 frames are encoded per sequence. In order to evaluate the compression performance of the ORC algorithm on different sequences, the luma peak signal-to-noise ratio (Y-PSNR), the increment of the luma peak signal-to-noise ratio (ΔY-PSNR), the rate error (ΔR), the peak signal-to-noise ratio (PSNR), the percentage increase of the bit rate (BDBR), and the reduction of the peak signal-to-noise ratio (BDPSNR) [29, 30] are used as indicators to measure the compression performance of the algorithm in this paper. The equations of ΔY-PSNR and ΔR are as follows:

$$\Delta\text{Y-PSNR}=\text{Y-PSNR}_{\mathrm{ORC}}-\text{Y-PSNR}_{\mathrm{K0103}},$$
$$\Delta R=\frac{\left|R_{\mathrm{real}}-R_{\mathrm{tar}}\right|}{R_{\mathrm{tar}}}\times 100\%,$$

where $\text{Y-PSNR}_{\mathrm{ORC}}$ represents the luma peak signal-to-noise ratio of the coded image obtained by the ORC algorithm, $\text{Y-PSNR}_{\mathrm{K0103}}$ represents the luma peak signal-to-noise ratio of the coded image obtained by the K0103 rate control algorithm, $R_{\mathrm{real}}$ represents the actual output rate of the ORC algorithm, and $R_{\mathrm{tar}}$ represents the target bit rate.
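For completeness, the two indicators defined above can be computed as follows; the function names are illustrative.

```python
def bitrate_error_pct(actual_rate_kbps: float, target_rate_kbps: float) -> float:
    """Rate-control error: relative deviation of the output rate from the target."""
    return abs(actual_rate_kbps - target_rate_kbps) / target_rate_kbps * 100.0

def y_psnr_gain(y_psnr_orc: float, y_psnr_ref: float) -> float:
    """Delta Y-PSNR of the proposed algorithm against a reference encoder."""
    return y_psnr_orc - y_psnr_ref

# Example: 1020 kbps delivered against a 1000 kbps target gives a 2% rate error.
print(bitrate_error_pct(1020.0, 1000.0), y_psnr_gain(37.55, 37.07))
```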

6.2. Analysis of Experimental Results

The performance of the ORC algorithm in this paper is evaluated from two aspects: compression performance and rate control accuracy. The experimental comparison results of the ORC algorithm, the K0103 algorithm, and the algorithm in [31] are shown in Tables 2 and 3, respectively, under four sets of target bit rate, which are 500 kbps, 1 Mbps, 2 Mbps, and 4 Mbps.

From the experimental data in Tables 2 and 3, we can see that the rate error ΔR of the ORC algorithm has both a smaller range and a smaller mean value than those of the K0103 algorithm and the algorithm in [31], so the output bit rates of the ORC algorithm are more in line with the target bit rates. From the perspective of rate control accuracy, the ORC algorithm therefore controls the rate more accurately than the K0103 algorithm and the algorithm in [31]. At the same time, the Y-PSNR of the ORC algorithm is improved by 0.065 dB and 0.045 dB compared with the K0103 rate control algorithm and the algorithm in [31], respectively. From the analysis of compression performance, the ORC algorithm improves the overall quality of the videos while maintaining the rate control accuracy.

In order to analyze the compression performance of the ORC algorithm in more detail, the BasketballPass sequence with complex texture and the Kimono sequence with flat texture are selected for the experiment. Figure 8 shows the per-frame Y-PSNR comparison among the ORC algorithm, the K0103 algorithm, and the algorithm in [31] when QP = 27.

From Figure 8, we can see that the average Y-PSNR of the ORC algorithm is higher than that of the K0103 algorithm and the algorithm in [31] for both sequences of different complexities. The ORC algorithm measures the texture complexity of the video frames more accurately, so the compression performance is improved.

Table 4 shows the experimental data of the PSNR in the ROI and RONI of four different sets of video sequences, with 100 frames per set, under different target bit rates. As we can see from Table 4, the average PSNR of the K0103 algorithm is 36.77 dB in the ROI and 36.72 dB in the RONI; the average PSNR of the algorithm in [31] is 37.07 dB in the ROI and 36.48 dB in the RONI; and the average PSNR of the ORC algorithm is 37.55 dB in the ROI and 36.17 dB in the RONI. Compared with the K0103 algorithm and the algorithm in [31], the average PSNR in the ROI is increased by 0.78 dB and 0.48 dB, respectively, by the ORC algorithm, which means that the ORC algorithm has a definite advantage in improving the quality of the videos compared with the other two algorithms. At the same time, according to the data in Table 4, the PSNR difference between the ROI and RONI is 0∼0.08 dB for the K0103 algorithm, 0.23∼1.24 dB for the algorithm in [31], and 0.52∼2.32 dB for the ORC algorithm. Compared with the K0103 algorithm and the algorithm in [31], the ORC algorithm has a larger PSNR difference between the ROI and RONI, which indicates that the ORC algorithm makes the output videos more consistent with the human visual attention mechanism.

In terms of rate distortion performance, the experimental results of the ORC algorithm compared with the K0103 algorithm and the algorithm in [31] under five categories of test sequences are shown in Tables 5 and 6.

From the experimental comparison results in Tables 5 and 6, we can see that the increment of BDBR by the ORC algorithm is reduced by 37.80% and 35.34% on average and the loss of BDPSNR by the ORC algorithm is reduced by 0.0356 dB and 0.0329 dB on average compared with the K0103 algorithm and the algorithm in [31], respectively. The ORC algorithm maintains a better rate distortion performance compared with the K0103 algorithm and the algorithm in [31].

In order to show the rate distortion performance of the ORC algorithm more intuitively, Figure 9 shows the comparison results of the rate distortion (RD) curves among the ORC algorithm, the K0103 algorithm, and the algorithm in [31] under the Johnny sequences and the ParkScene sequences, which have different resolutions.

From Figure 9, we can see that the rate-distortion curves of the ORC algorithm lie above those of the K0103 algorithm and the algorithm in [31], which indicates that, at the same rate, the Y-PSNR of the ORC algorithm is higher than that of the K0103 algorithm and the algorithm in [31] and its RD performance is better. The overall performance of the ORC algorithm is better.

7. Conclusions

This paper proposes an optimized rate control algorithm for ITU-T H.265/High Efficiency Video Coding based on the region of interest. The algorithm first improves the Itti model in the space-time domain to extract the ROI of video frames. Then, the target bits of the CTU layer are redistributed based on the ROI so that the output videos are more consistent with the human visual attention mechanism. Finally, the quasi-Newton method is used to update the parametric model, which reduces the computational complexity of the update phase of the parametric model. The experimental results show that the ORC algorithm has better compression performance and rate control accuracy than the K0103 algorithm and the algorithm in [31], and can obtain better compressed videos.

Data Availability

The datasets available for research articles come from http://trace.eas.asu.edu/yuv/index.html and http://www.iaprtc11.org/mediawiki/index.php/MSRA_Text_Detection_500_Database_(MSRA-TD500).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.