Abstract

Compressive-Sensing Video Coding (CSVC) is a new video coding framework based on compressive-sensing (CS) theory. This paper evaluates the rate-distortion and rate-energy-distortion performance of CSVC by comparing it with the popular hybrid video coding standard H.264 and the distributed video coding (DVC) system DISCOVER. Experimental results show that CSVC achieves poorer rate-distortion performance than H.264 and DISCOVER but has a distinct advantage in rate-energy-distortion performance; moreover, its coding energy consumption is approximately invariant regardless of reconstruction quality. We conclude that, under a limited energy budget, CSVC outperforms H.264 and DISCOVER, but its rate-distortion performance still needs improvement.

1. Introduction

Video communication is an important type of data communication. A high-dimensional video signal must be compressed before it is transmitted over a channel with limited bandwidth, so video compression coding has become an active research topic in digital video communication. The international video coding standard H.264 [1], jointly developed by ISO/IEC and ITU-T, is widely used in video technologies and has achieved great commercial success. H.264 uses motion estimation and the discrete cosine transform to remove the temporal and spatial redundancy of video sequences, and its encoding complexity is much greater than its decoding complexity. For instance, when the test sequence Foreman in CIF format is processed by the H.264 codec, the encoding time is about 50 to 90 times the decoding time, depending on the quantization step. H.264 is therefore well suited to applications in which video is encoded once and decoded many times, such as video broadcasting and video on demand.

For wireless communication equipment, long encoding times reduce economy and practicality, so a video coding method with low encoding complexity is needed as an alternative. For this reason DVC [2], whose theoretical basis was first proposed by Wyner and Ziv in information coding theory [3], has received widespread attention. Early DVC codecs include Wyner-Ziv video coding [4], PRISM video coding [5], hierarchical Wyner-Ziv video coding [6], and a DVC scheme based on wavelet coding [7]. To improve coding performance, research institutions in the European Union set up a dedicated research project and, building on the existing work, developed a DVC reference scheme called DISCOVER (DIStributed COding for Video sERvices) [8], which further improved the performance of low-complexity video coding.

However, the feedback channel [9] and the virtual channel [10] in the DISCOVER scheme are highly controversial, and they are an important engineering problem hindering its adoption. Combining CS theory [11–13] with video coding has led to a new low-complexity video coding scheme called CSVC [14]. The scheme retains the distributed characteristics of DVC but does not depend on a feedback channel or a virtual channel. It therefore has great potential for engineering applications and has attracted the attention of many researchers [15–17].

At present, the rate-distortion performance of CSVC has not been compared with that of H.264 and DISCOVER. Quantifying the performance difference between the three video coding schemes helps to clarify the upper limit of performance improvement for CSVC. Rate-distortion performance alone, however, cannot show the relationship between coding energy consumption and video reconstruction quality, so the rate-energy-distortion performance [18] of CSVC, H.264, and DISCOVER also needs an objective evaluation. This paper first summarizes the basic framework and technical details of H.264 and DISCOVER. Next, a typical CSVC algorithm is described in detail. Finally, on this basis, the rate-distortion and rate-energy-distortion performance of the three video coding schemes is evaluated, and the differences between them are discussed. The experimental results on the CIF test videos Bus, Football, Foreman, and Mobile show that, in terms of rate-distortion performance, H.264 is the best, DISCOVER follows, and CSVC is the worst, with a large gap to the other two; in terms of rate-energy-distortion performance, CSVC is the best, DISCOVER follows, and H.264 is the worst. The results also reveal that the energy consumption of CSVC is approximately the same regardless of reconstruction quality, whereas for H.264 and DISCOVER reconstruction quality is closely correlated with coding energy consumption. We therefore conclude that CSVC can give full play to its advantages when low energy consumption is demanded, but its rate-distortion performance still needs improvement.

2. Typical Video Coding Schemes

2.1. H.264

H.264 is functionally divided into two layers: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). Its codec framework is shown in Figure 1. In the coding process, there are two options for predicting the current image block: inter prediction and intra prediction. When inter prediction is adopted, the motion vector of the current block is obtained by motion estimation with respect to the reference block, and the predicted block is then obtained by motion compensation; when intra prediction is used, the predicted block is the weighted average of selected adjacent decoded blocks in the current frame. After the prediction is determined, the main steps of the codec process are as follows.

Step 1. Calculate the residual between the current block and its predicted value.

Step 2. Obtain the quantized coefficients by transforming and quantizing the residual.

Step 3. Form the bit stream by reordering and entropy coding the quantized coefficients.

Step 4. Transmit the bit stream to the decoder side through the NAL; at the same time, decode part of the bit stream on the encoder side to serve as the reference frame.

The core technology of H.264 is mainly reflected in its improvements to interframe and intraframe predictive coding, for example, the use of a 4 × 4 integer discrete cosine transform instead of the earlier 8 × 8 discrete cosine transform to avoid mismatches in the inverse transform.
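
The residual and transform steps above can be sketched as follows. This is a minimal illustration, assuming the well-known 4 × 4 forward core transform matrix of H.264 and omitting scaling, quantization, reordering, and entropy coding; the function names are hypothetical and are not taken from the JM reference software.

```python
import numpy as np

# Forward 4x4 integer core transform matrix of H.264. The scaling factors
# are folded into quantization in the standard and are omitted here.
CF = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int64)


def residual_block(current, predicted):
    """Step 1: residual between the current block and its prediction."""
    return current.astype(np.int64) - predicted.astype(np.int64)


def forward_core_transform(residual):
    """Integer-only part of Step 2: W = CF @ R @ CF^T (no rounding error)."""
    return CF @ residual @ CF.T


if __name__ == "__main__":
    current = np.full((4, 4), 130, dtype=np.uint8)
    predicted = np.full((4, 4), 127, dtype=np.uint8)
    coeffs = forward_core_transform(residual_block(current, predicted))
    print(coeffs)  # a flat residual puts all energy in the DC coefficient
```

Because the matrix contains only small integers, the forward transform is exact in integer arithmetic, which is the property that lets encoder and decoder avoid inverse-transform mismatch.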

2.2. DISCOVER

The codec framework of the DISCOVER system is shown in Figure 2. The encoder processes the video in Groups of Pictures (GOPs) whose length is adapted to the content. The GOP length is increased to reduce temporal redundancy when the video contains little motion and is shortened when it contains a large amount of motion. Within each GOP, video frames are divided into WZ frames and key frames. Key frames are encoded and decoded with H.264 intra coding, while WZ frames are encoded by the Wyner-Ziv encoder. Because the decoding process is complex, only its core steps are given as follows.

Step 1. Extract side information from key frames.

Step 2. Simulate the extracted side information by virtual channel to generate the correlated noise.

Step 3. Perform the soft input calculation on the correlated noise and the transformed side information.

Step 4. Verify the soft input calculation result by the information transmitted from encoder side to judge whether the decoding is successful or not.

For channel coding, DISCOVER uses a Low-Density Parity-Check Accumulate (LDPCA) code, which is rate-compatible and comes closer to channel capacity than a Turbo code for all types of channels. Each request for accumulated syndromes sent from the decoder side to the encoder side increases the complexity of the overall system. Therefore, the DISCOVER decoder side sets the minimum number of syndromes to be transmitted for each bit plane based on the Wyner-Ziv rate-distortion constraint, which reduces the number of requests and hence yields higher compression efficiency.

3. Compressive-Sensing Video Coding

The CSVC system follows the Wyner-Ziv (WZ) video coding system [4] in that it also divides the video stream into key frames and nonkey frames and uses different methods to encode and decode the two types of frames. For key frames, either a traditional intraframe video coding framework or a high-rate CS codec is used to ensure high-quality reconstruction. For nonkey frames, low-rate CS measurement is adopted: side information is first extracted from the high-quality key frames and then combined with the measurement vectors for joint reconstruction of the nonkey frames. Distributed Compressed Sensing (DCS) theory was first proposed in [19], which also demonstrated that distributed coding and CS can be combined. Since then, researchers have devoted much attention to DCS video. Typical examples are DIStributed video Coding Using Compressed Sampling (DISCUCS), proposed by Prades-Nebot et al. [20]; DIStributed COmpressed video Sensing (DISCOS), proposed by Do et al. [21]; and an improved DISCOS, proposed by Tramel and Fowler [22]. Building on this research, we proposed the CSVC system, with superior performance, in [14]. In this paper, we evaluate H.264, DISCOVER, and CSVC in terms of rate-distortion and rate-energy-distortion performance. The codec process of the CSVC system is described in detail below.

3.1. Encoder Framework

The encoder framework of the CSVC system is shown in Figure 3. First, the input video sequence is divided into several GOPs, and the key frames and nonkey frames in each group are separated. Then, the key frames and the nonkey frames are divided into nonoverlapping subblocks of $B \times B$ pixels, and each block is arranged in raster order as a column vector of length $N$ ($N = B^2$). Finally, the measurement matrix is constructed to calculate the measurement vector of each block as follows:

$$\mathbf{y}_{K,i} = \mathbf{\Phi}_{K}\,\mathbf{x}_{K,i}, \qquad \mathbf{y}_{NK,i} = \mathbf{\Phi}_{NK}\,\mathbf{x}_{NK,i},$$

where $\mathbf{x}_{K,i}$ and $\mathbf{x}_{NK,i}$ denote the $i$th subblock of the key frame and the nonkey frame, respectively, and $\mathbf{\Phi}_{K}$ and $\mathbf{\Phi}_{NK}$ are the measurement matrices constructed from the random Hadamard matrix. For key frames, the size of the measurement matrix is $M_{K} \times N$, so the measurement rate is $S_{K} = M_{K}/N$. For nonkey frames, the size of the measurement matrix is $M_{NK} \times N$, so the measurement rate is $S_{NK} = M_{NK}/N$.
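
The block-based measurement step described above might be sketched as follows. This is an illustration under stated assumptions, not the implementation of [14]: the "random Hadamard matrix" is assumed here to mean a Sylvester Hadamard matrix with randomly permuted columns and randomly selected rows, the block size of 16 in the usage example is arbitrary, and all function names are hypothetical.

```python
import numpy as np


def sylvester_hadamard(n):
    """Hadamard matrix of order n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), h)
    return h


def random_hadamard_matrix(m, n, rng):
    """m x n measurement matrix: permute the columns of a Hadamard matrix,
    keep m randomly chosen rows, and scale the rows to unit norm."""
    h = sylvester_hadamard(n)[:, rng.permutation(n)]
    rows = rng.choice(n, size=m, replace=False)
    return h[rows] / np.sqrt(n)


def frame_to_block_vectors(frame, b):
    """Split a frame into non-overlapping b x b blocks, each flattened in
    raster order; returns an array of shape (b*b, number_of_blocks)."""
    rows, cols = frame.shape
    assert rows % b == 0 and cols % b == 0, "frame must tile evenly"
    blocks = frame.reshape(rows // b, b, cols // b, b).swapaxes(1, 2)
    return blocks.reshape(-1, b * b).T.astype(np.float64)


def measure_frame(frame, b, rate, rng):
    """Measure every block with the same matrix; returns (phi, y)."""
    n = b * b
    m = max(1, int(round(rate * n)))  # number of measurements per block
    phi = random_hadamard_matrix(m, n, rng)
    return phi, phi @ frame_to_block_vectors(frame, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(288, 352))  # CIF-sized luma frame
    phi, y = measure_frame(frame, b=16, rate=0.1, rng=rng)
    print(phi.shape, y.shape)
```

The encoder work is a single matrix product per frame, and the measurement rate only changes the number of rows kept from the Hadamard matrix.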

The measurement vectors $\mathbf{y}_{K,i}$ and $\mathbf{y}_{NK,i}$ of the blocks are then passed to the quantizer to form the bit stream. The CSVC system uses a nonuniform quantizer based on Differential Pulse-Code Modulation (DPCM), DPCM-NQ for short, which first computes the residual of the measurement values between adjacent subblocks to reduce coding redundancy and then quantizes the residual. Because small residual values occur with high frequency, nonuniform quantization is applied to the residual of each subblock to reduce the quantization error. Supposing that the $j$th measurement residual of the $i$th subblock is $d_{i,j}$, it is compressed according to the $\mu$-law as follows:

$$c_{i,j} = d_{\max}\,\frac{\ln\left(1 + \mu\,|d_{i,j}|/d_{\max}\right)}{\ln(1 + \mu)}\,\operatorname{sgn}(d_{i,j}),$$

where $d_{\max}$ is the maximum measurement residual of the current frame, $\operatorname{sgn}(\cdot)$ represents the sign function, and $\mu$ is 10. In the inverse quantization, the estimated value $\hat{d}_{i,j}$ of $d_{i,j}$ is calculated from the quantized value $\hat{c}_{i,j}$ using the following decompression formula:

$$\hat{d}_{i,j} = \frac{d_{\max}}{\mu}\left[(1 + \mu)^{|\hat{c}_{i,j}|/d_{\max}} - 1\right]\operatorname{sgn}(\hat{c}_{i,j}).$$

After the residual of each block is quantized, the quantized data of all the blocks are Huffman coded and encapsulated into data packets to be sent to the decoder side.
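
A minimal sketch of the DPCM-NQ idea is given below, assuming the standard μ-law companding curve with μ = 10 and a uniform quantizer applied to the companded residual. The open-loop prediction (each block predicted from the unquantized measurements of the previous block), the bit depth, and the function names are assumptions made for illustration; Huffman coding is omitted.

```python
import numpy as np

MU = 10.0  # compression parameter stated in the text


def mu_law_compress(d, d_max, mu=MU):
    """Compand residuals so that small values get finer resolution."""
    return d_max * np.log1p(mu * np.abs(d) / d_max) / np.log1p(mu) * np.sign(d)


def mu_law_expand(c, d_max, mu=MU):
    """Inverse of mu_law_compress."""
    return d_max / mu * np.expm1(np.abs(c) / d_max * np.log1p(mu)) * np.sign(c)


def dpcm_nq_encode(y, bits):
    """y has shape (measurements, blocks). Returns integer indices and d_max.

    Open-loop DPCM is a simplification for this sketch: quantization errors
    accumulate along the blocks when the decoder sums the residuals.
    """
    d = np.diff(y, axis=1, prepend=np.zeros((y.shape[0], 1)))
    d_max = float(np.max(np.abs(d))) or 1.0      # avoid division by zero
    step = d_max / (2 ** (bits - 1) - 1)         # symmetric mid-tread grid
    indices = np.round(mu_law_compress(d, d_max) / step).astype(np.int64)
    return indices, d_max


def dpcm_nq_decode(indices, d_max, bits):
    """Expand the quantized residuals and undo the block-to-block difference."""
    step = d_max / (2 ** (bits - 1) - 1)
    d_hat = mu_law_expand(indices * step, d_max)
    return np.cumsum(d_hat, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.cumsum(rng.standard_normal((8, 20)), axis=1)  # correlated blocks
    idx, d_max = dpcm_nq_encode(y, bits=6)
    y_hat = dpcm_nq_decode(idx, d_max, bits=6)
    print(float(np.max(np.abs(y - y_hat))))
```

A closed-loop variant, in which the encoder predicts from the reconstructed previous block, would avoid the error accumulation at the cost of a sequential loop over blocks.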

3.2. Decoder Framework

The decoder framework of the CSVC system is shown in Figure 4. After the data packets are received on the decoder side, the measurement vectors $\mathbf{y}_{K}$ of key frames and $\mathbf{y}_{NK}$ of nonkey frames are obtained by Huffman decoding and inverse quantization. For key frames, the intraframe reconstruction model is used as follows:

$$\hat{\mathbf{x}}_{K} = \arg\min_{\mathbf{x}} \left\|\mathbf{y}_{K} - \mathbf{\Phi}_{K}\mathbf{x}\right\|_2^2 + \lambda\left\|\mathbf{\Psi}\mathbf{x}\right\|_1, \tag{4}$$

where $\mathbf{\Phi}_{K}$ is the measurement matrix of the key frames, $\mathbf{\Psi}$ is the sparse transform matrix of the video frame $\mathbf{x}$, and $\lambda$ represents the regularization factor. The reconstruction model (4) can be solved by a variety of still-image CS reconstruction algorithms. To ensure high-quality recovery of key frames, the CSVC system uses the multihypothesis smoothed Landweber iterative algorithm used in [22] to solve model (4).
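
To make the structure of model (4) concrete, the sketch below solves an l1-regularized least-squares problem with a plain thresholded Landweber iteration (iterative soft thresholding). It is not the multihypothesis algorithm of [22]: it assumes an orthonormal DCT as the sparse transform, writes the data term with a factor 1/2 (which only rescales the regularization factor), and uses hypothetical function names.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    psi = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    psi[0] /= np.sqrt(2.0)
    return psi


def soft_threshold(v, t):
    """Shrink every entry of v towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(phi, psi, y, x, lam):
    """0.5 * ||y - phi x||_2^2 + lam * ||psi x||_1."""
    return 0.5 * np.sum((y - phi @ x) ** 2) + lam * np.sum(np.abs(psi @ x))


def thresholded_landweber(phi, psi, y, lam, iters=200):
    """Approximately minimise the objective above for orthonormal psi."""
    a = phi @ psi.T                            # sensing matrix in sparse domain
    step = 1.0 / np.linalg.norm(a, 2) ** 2     # 1 / Lipschitz constant
    theta = np.zeros(a.shape[1])
    for _ in range(iters):
        theta = theta + step * (a.T @ (y - a @ theta))  # Landweber step
        theta = soft_threshold(theta, lam * step)       # shrinkage
    return psi.T @ theta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m = 64, 32
    psi = dct_matrix(n)
    theta = np.zeros(n)
    theta[[2, 7, 20]] = [5.0, -3.0, 2.0]       # sparse DCT coefficients
    x = psi.T @ theta
    phi = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian, for the demo only
    x_hat = thresholded_landweber(phi, psi, phi @ x, lam=0.01)
    print(float(np.linalg.norm(x - x_hat) / np.linalg.norm(x)))
```

With the step size set to the reciprocal of the squared spectral norm, each iteration cannot increase the objective, which is why the loop needs no line search.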

For a nonkey frame, we first obtain the side information $\mathbf{x}_{SI}$ of the current nonkey frame by side information prediction from the adjacent reconstructed key frames and then calculate the residual measurement vector between the measurement vector of each block and its side information as follows:

$$\mathbf{y}_{r,i} = \mathbf{y}_{NK,i} - \mathbf{\Phi}_{NK}\,\mathbf{x}_{SI,i}, \tag{6}$$

where

$$\mathbf{y}_{NK,i} = \mathbf{\Phi}_{NK}\,\mathbf{x}_{NK,i}. \tag{7}$$

So (6) can be transformed into

$$\mathbf{y}_{r,i} = \mathbf{\Phi}_{NK}\left(\mathbf{x}_{NK,i} - \mathbf{x}_{SI,i}\right) = \mathbf{\Phi}_{NK}\,\mathbf{r}_{i}, \tag{8}$$

where $\mathbf{r}_{i} = \mathbf{x}_{NK,i} - \mathbf{x}_{SI,i}$ is the residual between the nonkey frame and the side information. According to (8), the residual reconstruction model of the nonkey frame can be established as follows:

$$\hat{\mathbf{r}} = \arg\min_{\mathbf{r}} \left\|\mathbf{y}_{r} - \mathbf{\Phi}_{NK}\mathbf{r}\right\|_2^2 + \lambda_{r}\left\|\mathbf{\Psi}_{r}\mathbf{r}\right\|_1, \tag{9}$$

where $\mathbf{\Psi}_{r}$ is the sparse transform matrix of the residual and $\lambda_{r}$ denotes the regularization factor. The residual reconstruction model (9) is also solved using the Landweber iterative algorithm. Finally, the reconstruction of the nonkey frame is calculated as follows:

$$\hat{\mathbf{x}}_{NK} = \mathbf{x}_{SI} + \hat{\mathbf{r}}. \tag{10}$$

The features of the CSVC system constructed according to the above codec process are as follows: (1) compared with DISCOVER, the CSVC system eliminates the virtual channel and the feedback channel, which reduces the engineering difficulty; (2) since there is no correlation between the CS measurement and the image content, the code rate is determined only by the measurement rate, which makes it easier for the CSVC system to control the code rate; (3) each measurement value contains information about the whole image, so scalable coding is easy to implement; (4) data security can be enhanced by the random generation of the measurement matrix. These features give the CSVC system more engineering value and make it a potential new DVC scheme. We are more concerned with how the coding energy consumption of the CSVC system compares with that of H.264 and DISCOVER, so the coding energy consumption of the three systems is evaluated in detail in the experimental part.
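
The residual reconstruction of a nonkey block described above can be sketched as follows: subtract the measured side information, recover the sparse residual, and add the side information back. The solver is a basic iterative soft-thresholding loop used here as a stand-in for the Landweber algorithm of the paper, the identity default for the sparse transform is an assumption, and the function names are hypothetical.

```python
import numpy as np


def ista(a, y, lam, iters=200):
    """Approximately minimise 0.5*||y - a t||_2^2 + lam*||t||_1."""
    step = 1.0 / np.linalg.norm(a, 2) ** 2
    t = np.zeros(a.shape[1])
    for _ in range(iters):
        t = t + step * (a.T @ (y - a @ t))                        # gradient step
        t = np.sign(t) * np.maximum(np.abs(t) - lam * step, 0.0)  # shrinkage
    return t


def reconstruct_nonkey_block(phi, y, x_si, lam, psi=None):
    """Recover a nonkey block from measurements y and side information x_si."""
    n = phi.shape[1]
    psi = np.eye(n) if psi is None else psi  # orthonormal sparsifying basis
    y_r = y - phi @ x_si                     # residual measurement vector
    theta = ista(phi @ psi.T, y_r, lam)      # sparse coefficients of residual
    return x_si + psi.T @ theta              # side information plus residual


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, m = 64, 16
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    x = rng.standard_normal(n)
    r = np.zeros(n)
    r[[3, 40]] = [1.5, -2.0]                 # side information is off in 2 pixels
    x_si = x - r
    x_hat = reconstruct_nonkey_block(phi, phi @ x, x_si, lam=0.01)
    print(float(np.linalg.norm(x - x_si)), float(np.linalg.norm(x - x_hat)))
```

When the side information is exact, the residual measurements vanish and the function returns the side information unchanged, which illustrates why better side information lowers the number of measurements needed.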

4. Experimental Results and Analysis

The performance of H.264, DISCOVER, and CSVC is evaluated using four standard video sequences in CIF format: Foreman, Bus, Mobile, and Football. H.264 adopts the standard coding configuration of the JM 19.0 reference model in intra mode; DISCOVER uses its default encoding configuration; and CSVC adopts the experimental parameter configuration in [14]. The rate-distortion and rate-energy-distortion performance of the three encoders is compared, where rate-distortion reflects the relationship between the code rate and the Peak Signal-to-Noise Ratio (PSNR), while rate-energy-distortion reflects the relationship between the coding time and the PSNR. On the same experimental platform, the coding time is taken to be proportional to the energy consumption and can therefore represent the level of coding energy consumption. The experimental platform is MATLAB R2012b, running on a 64-bit Windows 7 operating system with 8.00 GB of installed memory and an Intel Core i7-4900 processor with a frequency of 3.60 GHz.
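
The two quantities measured in the experiments, PSNR and coding time, can be computed as in the sketch below. It assumes 8-bit frames (peak value 255) and wall-clock time as the energy proxy, and the helper names are hypothetical; the experiments in this paper were run in MATLAB, so this is only an equivalent illustration.

```python
import time

import numpy as np


def psnr(reference, reconstructed, peak=255.0):
    """PSNR in dB between two frames of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def sequence_psnr(ref_frames, rec_frames):
    """Average of the per-frame PSNR values over a sequence."""
    return float(np.mean([psnr(a, b) for a, b in zip(ref_frames, rec_frames)]))


def timed(encode, *args):
    """Run encode(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = encode(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(288, 352)).astype(np.float64)
    noisy = np.clip(frame + rng.normal(0.0, 5.0, frame.shape), 0, 255)
    print(psnr(frame, noisy))
    _, seconds = timed(np.fft.fft2, frame)  # stand-in for an encoder call
    print(seconds)
```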

4.1. Evaluation on Rate-Distortion Performance

Figure 5 shows the rate-distortion curves of the H.264, DISCOVER, and CSVC encoders for the different test video sequences. It can be seen from Figure 5 that the PSNR of the reconstructed video increases with the code rate for every encoder. On the whole, H.264 and DISCOVER always perform better than the CSVC encoder. For the test videos Bus and Football, H.264 gives the best reconstruction at the same code rate; for Foreman and Mobile, DISCOVER is even better than H.264 at the same code rate within a specific code rate range. Overall, the rate-distortion performance of H.264 is the best, DISCOVER follows, and CSVC is the worst, with a large gap to the other two. For CSVC, the measurement rate determines the bitrate. When the measurement rate is 0.05, the bitrate is about 6000 kbits/s; decreasing the measurement rate further lowers the bitrate below 6000 kbits/s. The average PSNR of the reconstructed video decreases gradually as the measurement rate decreases linearly. The PSNR curve varies smoothly, and the PSNR does not drop abruptly when the bitrate falls below 6000 kbits/s.

4.2. Evaluation on Rate-Energy-Distortion Performance

Figure 6 shows the rate-energy-distortion curves of the H.264, DISCOVER, and CSVC encoders for the different test video sequences. It can be seen from Figure 6 that, for any video at the same PSNR, the encoding time of CSVC is the shortest, DISCOVER follows, and H.264 is the longest. In particular, the average encoding time of CSVC is only about 3 seconds, which means that the energy consumption of CSVC is much lower than that of DISCOVER and H.264 at the same recovery level. As the PSNR of the reconstructed video increases, the encoding time of H.264 and DISCOVER gradually increases. The DISCOVER curve is steeper and the H.264 curve is gentler, which shows that H.264 depends heavily on energy consumption to improve its performance, while DISCOVER also needs a certain amount of energy input. The computational complexity of CS measurement determines the energy consumption at the encoder. Let $M$ denote the number of CS measurements for a video frame and $N$ the total number of pixels in a video frame. The computational complexity of CS measurement is $O(MN)$. Because $M$ is far below $N$, the variation of energy consumption is very small when $M$ changes. However, $M$ is an important factor for the reconstruction quality of the video frame, which can be improved effectively with small increments of $M$. Therefore, the rate-energy-distortion curve is almost vertical, indicating that a small investment of energy yields a significant improvement in reconstruction quality. Overall, the rate-energy-distortion performance of CSVC is the best, DISCOVER follows, and H.264 is the worst. Among the three, the energy consumption of CSVC is approximately invariant regardless of reconstruction quality, while the reconstruction quality of H.264 and DISCOVER is strongly correlated with the coding energy consumption.

5. Conclusions

This paper presents an experiment-driven analysis of the rate-distortion and rate-energy-distortion performance of the CSVC algorithm and compares it with that of H.264 and DISCOVER. The three systems are evaluated under the same experimental environment. The experimental results show that the rate-distortion performance of CSVC lags far behind that of H.264 and DISCOVER, but its rate-energy-distortion performance has a clear advantage; that is, rapid improvement of its reconstruction quality does not depend on additional coding energy. Therefore, provided that communication bandwidth is effectively improved, CSVC is a candidate for future wireless video communication, offering more possibilities to wireless video terminals that are limited in energy and computing power.

At present, the rate-distortion performance of CSVC is still not ideal, and there is still some way to go before we put CSVC into practical use. Efforts should be made in the following areas to improve the rate-distortion performance of CSVC.

(1) Side Information Estimation. The rate-distortion performance of CSVC depends strongly on the accuracy of side information estimation: high-quality side information greatly reduces the number of bits that the encoder side must supply. Therefore, finding an appropriate motion estimation algorithm to obtain more accurate side information and to achieve optimal reconstruction at the decoder is the key to improving the rate-distortion performance of CSVC.

(2) A Priori Structural Feature Modeling of Video Frames. Images of the same type often have similar structural information. Therefore, the reconstruction model can be built by analyzing the structural information of the decoded video frames and extracting prior knowledge, which can reduce the code rate and improve the reconstruction quality. For example, the statistical correlation structure of image transform coefficients can be exploited, and a tree structure can be adopted for wavelet coefficients. However, because natural images are complex and uncertain, how to use a priori knowledge to construct a suitable model requires further study.

(3) Quantization Measurement. Uniform quantization is currently the main method used to quantize CS measurements. However, traditional entropy coding is not ideal for compressing uniformly quantized values because they are statistically independent of each other. How to represent the CS measurements with the fewest bits under the constraint of the information-theoretic rate-distortion coding theorem is therefore one of the key topics of future research. It is necessary to propose a new nonuniform quantization method that establishes statistical correlation between the quantized values and, accordingly, to design a new entropy coding method that matches this correlation.

The further work outlined above is applied at the decoder to improve the rate-distortion performance of CSVC, whereas the advantage of CSVC in rate-energy-distortion performance is guaranteed by the CS measurement at the encoder. Therefore, improving the rate-distortion performance of CSVC does not affect its rate-energy-distortion performance.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China, under Grant no. 61501393, in part by the Key Scientific Research Project of Colleges and Universities in Henan Province of China, under Grant 16A520069, in part by Youth Sustentation Fund of Xinyang Normal University, under Grant no. 2015-QN-043, and in part by Scientific Research Foundation of Graduate School of Xinyang Normal University, under Grant no. 2016KYJJ10.