International Journal of Distributed Sensor Networks
Volume 2012 (2012), Article ID 582403, 7 pages
http://dx.doi.org/10.1155/2012/582403
Research Article

Novel Side Information Generation Algorithm of Multiview Distributed Video Coding for Multimedia Sensor Networks

Fu Xiao,1,2,3 Jinkai Liu,1 Jian Guo,1,3,4 and Linfeng Liu1,3,4

1School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, China
3Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing 210003, China
4Key Laboratory of Broadband Wireless Communication and Sensor Network Technology, Ministry of Education, Nanjing 210003, China

Received 24 August 2012; Accepted 15 October 2012

Academic Editor: Ruchuan Wang

Copyright © 2012 Fu Xiao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Traditional multiview distributed video coding schemes, which code all regions uniformly, can distort the decoder-side estimation of regions with intense motion. This paper presents a novel multiview distributed video coding algorithm. In the main view, the algorithm identifies the intense-motion regions of each Wyner-Ziv frame according to a region-of-interest (ROI) criterion and entropy-codes their low-frequency DCT coefficients so that the decoder can generate the best temporal side information. For the nonintense-motion regions, the algorithm uses motion-compensated interpolation (MCI) to generate side information. Finally, the temporal and spatial side information are fused. Experimental results show that the proposed algorithm achieves more accurate motion estimation in the intense-motion regions and improves the quality of the decoded image at the same transmission rate; the energy consumption of sensor nodes is thereby ultimately reduced.

1. Introduction

In recent years, with the rapid development of wireless multimedia communication technology [1], the demand for digital video has grown steadily. Viewers expect natural scenes to be rendered more clearly and realistically, but a traditional single-view video network provides only a two-dimensional view and cannot convey a three-dimensional sense of the scene; hence the multiview video network has emerged. Because its nodes are limited in power, storage capacity, and computational and communication ability, a multiview video network requires not only low-complexity encoding but also real-time video encoding and transmission. Traditional video coding standards such as MPEG-x and H.26x rely on a hybrid architecture in which the encoder uses motion estimation to fully exploit the temporal and spatial correlation of the video sequence. Because of the heavy computational burden of the motion estimation and compensation task, the encoder in these standards is 5 to 10 times more complex than the decoder [2, 3]. Traditional video coding systems are therefore unsuitable, and novel coding methods are required. A new video codec framework, distributed video coding (DVC), has attracted wide attention from scholars. DVC uses intraframe encoding and interframe decoding: the decoder exploits the correlation of video signals for interframe prediction, which removes the complexity of interframe prediction coding from the encoder. With its low-complexity encoding and good robustness, DVC meets the needs of these new video applications very well.

Several DVC frameworks have been proposed, such as Girod and Aaron's Wyner-Ziv video coding [4, 5], PRISM (power-efficient robust high-compression syndrome-based multimedia) [6], Zixiang Xiong's layered DVC [7], Sehgal's state-free DVC [8], wavelet-based DVC [9], and multiview distributed video coding [10–12]. The turbo- and LDPC-based DVC algorithms proposed in [10–12] encode all regions of a Wyner-Ziv frame without distinction [13, 14], so motion estimation cannot accurately predict the regions with intense motion, and the decoder cannot accurately generate the temporal side information (temporal SI). Since the final side information is obtained by fusing the temporal SI with the spatial side information (spatial SI), the decoder must request more parity bits through the feedback channel, which not only increases the rate but still leaves part of the decoded image insufficiently accurate. To address this problem, an improved multiview distributed video coding algorithm is proposed in this paper. In the main view, the intense-motion and nonintense-motion regions are identified according to an ROI criterion. For the intense-motion regions, the algorithm entropy-codes their low-frequency DCT coefficients; the decoder uses the decoded coefficients for bidirectional hash motion estimation, which lets it choose between past and future reference frames during frame interpolation and thus obtain the best temporal side information. For the nonintense-motion regions, the algorithm uses motion-compensated interpolation (MCI) to generate temporal side information. Finally, the temporal and spatial side information are fused. Simulation results show that this algorithm improves coding efficiency in the intense-motion regions, reducing the bit rate while improving the quality of the decoded images, so that the energy consumption of sensor nodes is ultimately reduced.

The rest of the paper is organized as follows. Section 2 introduces the basic principles of multiview DVC. Section 3 describes the proposed multiview DVC framework based on the DCT hash in detail. Section 4 presents the experimental results. Finally, Section 5 concludes the paper.

2. Multiview Distributed Video Coding System

The goal of distributed source coding (DSC) is to transfer the complexity of the encoder to the decoder in order to achieve efficient compression. In multiview distributed video coding (MDVC), side information is generated both within a camera and by interpolation across cameras. Multiview video systems produce large amounts of strongly correlated data, so eliminating this redundancy, that is, improving the compression ratio, is one of the key problems of MDVC. At the same time, the power consumption of the camera sensor nodes is constrained by this problem, so a low-complexity encoder is needed that avoids complex communication between nodes.

Figure 1 depicts a multiview DVC system architecture [10–12] with two views, both assumed to be static. The camera of the first view (the intracamera) works in the traditional way, and its video stream is encoded independently of the other camera. The camera of the second view (the Wyner-Ziv camera) also encodes independently but uses the other video stream at the decoder. The frames of the first view are encoded with traditional intracoding, while the second view uses DVC. The key frames (K) of the second view are encoded and decoded with a conventional intraframe codec. Between the key frames are Wyner-Ziv frames (W), which are intraframe-encoded but interframe-decoded. The encoder applies an 8 × 8 DCT to each W frame, and the coefficients are quantized with a uniform scalar quantizer. The Slepian-Wolf coder is implemented with a low-density parity-check (LDPC) code. The parity bits produced by the LDPC encoder are stored in a buffer, which transmits a subset of them to the decoder upon request. If the decoder cannot reliably decode the bits, additional parity bits are requested from the encoder buffer through the feedback channel, and the request-and-decode process is repeated until an acceptable bit error probability is reached. The decoded bits are reconstructed as DCT coefficients, and the frame is finally generated by inverse quantization and inverse DCT of the reconstructed coefficients. The side information of the multiview system is a fusion of temporal and spatial side information and is therefore more accurate. In past research, the Slepian-Wolf codec usually adopted turbo codes for error correction, but LDPC codes, with their excellent performance, simple structure, and good prospects, have been gaining increasing interest.

Figure 1: Multiview distributed video coding framework.
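To make the request-and-decode loop concrete, the following Python sketch simulates it for a single 8 × 8 block. The LDPC belief-propagation decoder is replaced by a toy parity-budget stand-in (decode_ok), and the quantizer settings and helper names are illustrative assumptions, not the authors' implementation.

import numpy as np
from scipy.fft import dctn

def quantize(coeffs, levels=16):
    """Uniform scalar quantizer; the paper varies `levels` to change the rate."""
    lo, hi = float(coeffs.min()), float(coeffs.max())
    step = (hi - lo) / levels if hi > lo else 1.0
    return np.clip(((coeffs - lo) / step).astype(int), 0, levels - 1)

def decode_ok(q_idx, side_info, n_parity):
    """Toy stand-in for LDPC decoding: a real decoder runs belief propagation
    and succeeds depending on side-information quality; here `side_info` is
    unused and success is a simple parity budget."""
    return n_parity >= q_idx.size // 4

rng = np.random.default_rng(0)
w_block = rng.integers(0, 256, (8, 8)).astype(float)       # one W-frame block
side_info = w_block + rng.normal(0.0, 5.0, w_block.shape)  # correlated SI

q_idx = quantize(dctn(w_block, norm="ortho"))              # 8 x 8 DCT, then quantize
n_parity = 0
while not decode_ok(q_idx, side_info, n_parity):
    n_parity += 8          # feedback request: buffer releases more parity bits
print(f"decoded after requesting {n_parity} parity bits")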

3. Multiview Distributed Video Coding Scheme

3.1. Multiview DVC Framework Based on DCT Hash

In the schemes of [10–12], the Wyner-Ziv frames (W frames) of multiview distributed video coding are encoded without distinguishing regions, so motion estimation cannot accurately predict the regions with intense motion, and the decoder cannot accurately generate the temporal side information (temporal SI). To address this problem, this paper presents an improved multiview distributed coding algorithm. In the main view, we select intense-motion macroblocks as ROI macroblocks; the low-frequency DCT coefficients of the ROI macroblocks are transmitted to assist side information creation at the decoder, improving both the coding efficiency and the decoded image quality. Figure 2 illustrates the multiview distributed video coding (MDVC) framework based on the DCT hash. The first view uses a conventional intracoding scheme, and the decoded video stream is transformed by homography into spatial side information for the W frames of the second view. The K frames of the second view are encoded and decoded with a conventional intraframe scheme, while the W frames of the second view use a combination of LDPC coding and entropy coding. At the encoder, the ROI discrimination algorithm divides each W frame into ROI macroblocks and non-ROI macroblocks (both 8 × 8). For each ROI macroblock, the low-frequency DCT coefficients are selected as a DCT hash, which is entropy-coded; the residuals of ROI and non-ROI macroblocks are encoded and decoded with LDPC coding (see the sketch after Figure 2). At the decoder, if a DCT hash is available to guide the motion estimation process, bidirectional hash-based interpolation is performed; otherwise, MCI is used. The temporal and spatial side information are then fused to generate the best side information. The side information is used in reconstruction to obtain the decoded DCT coefficients; finally, inverse quantization (IQ) and inverse DCT (IDCT) are applied to generate the decoded W frame W′.

Figure 2: Multiview distributed video coding framework based on the DCT hash.
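As a sketch of the encoder-side split, the following Python fragment extracts the DCT hash (the DC coefficient plus 8 low-frequency AC coefficients in zigzag order, matching the choice reported in Section 4) for each ROI macroblock. The ZIGZAG9 table and the dictionary-based output are illustrative assumptions; the actual entropy coder is not shown.

import numpy as np
from scipy.fft import dctn

# first nine positions of the standard 8 x 8 zigzag scan: DC + 8 ACs
ZIGZAG9 = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1)]

def dct_hash(block):
    """DC + 8 low-frequency AC coefficients of an 8 x 8 block."""
    c = dctn(block.astype(float), norm="ortho")
    return np.array([c[i, j] for i, j in ZIGZAG9])

def encode_w_frame(blocks, roi_mask):
    """ROI blocks contribute an (entropy-coded) hash; every block still goes
    through the Wyner-Ziv/LDPC path, so the hash is extra helper data only."""
    return {k: dct_hash(b) for k, b in enumerate(blocks) if roi_mask[k]}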
3.2. ROI Macroblock Selection and Temporal Side Information Generation [13]

In the hash-based motion estimation described above, similar to our previous work [13], the hash bits of a W frame could be sent for all macroblocks to assist the decoder in generating the side information. However, in most video sequences the motion vectors of many macroblocks are zero or very small, and only a small portion of the blocks has large displacement. For the many macroblocks with little motion, MCI already gives a good estimate, so it is not necessary to send a DCT hash for them. The encoder therefore uses an SAD criterion to distinguish ROI from non-ROI macroblocks. Let $X_c$ be the current frame and $X_r$ the previous reference frame; the SAD criterion for macroblock $k$ is

$$\mathrm{SAD}(k) = \sum_{(i,j) \in B_k} \left| X_c(i,j) - X_r(i,j) \right|,$$

where $B_k$ denotes macroblock $k$ and $(i,j)$ are the pixel coordinates inside the 8 × 8 macroblock. If $\mathrm{SAD}(k) > T$, the macroblock is an ROI macroblock; otherwise, it is a non-ROI macroblock. An adequate threshold $T$ has been found experimentally.
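A minimal Python sketch of this test follows, assuming grayscale frames as NumPy arrays; the default threshold T = 64 follows the value reported in Section 4.

import numpy as np

def classify_roi(cur, ref, T=64, B=8):
    """Boolean mask over 8 x 8 blocks: True marks an ROI (intense-motion) block."""
    H, W = cur.shape
    mask = np.zeros((H // B, W // B), dtype=bool)
    for bi in range(H // B):
        for bj in range(W // B):
            r, c = bi * B, bj * B
            sad = np.abs(cur[r:r + B, c:c + B].astype(int)
                         - ref[r:r + B, c:c + B].astype(int)).sum()
            mask[bi, bj] = sad > T   # SAD(k) > T selects the ROI macroblock
    return mask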

As in our previous work [13], the temporal side information $\hat{B}$ of an ROI macroblock is generated by bidirectional hash motion estimation:

$$\hat{B}(i,j) = \frac{1}{2}\left[ X_p\!\left(i + v_{p,x},\, j + v_{p,y}\right) + X_f\!\left(i + v_{f,x},\, j + v_{f,y}\right) \right],$$

where $X_p(i + v_{p,x}, j + v_{p,y})$ and $X_f(i + v_{f,x}, j + v_{f,y})$ are the best matching macroblocks of the ROI macroblock in the past and future reference frames $X_p$ and $X_f$; $(v_{p,x}, v_{p,y})$ is the motion vector relative to $X_p$ and $(v_{f,x}, v_{f,y})$ the motion vector relative to $X_f$, with the $x$ and $y$ components giving the macroblock motion in the horizontal and vertical directions.
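A possible decoder-side realization of this search is sketched below: the transmitted hash is matched against candidate blocks in the past and future key frames, and the two best matches are averaged as in the equation above. The search range, the L1 hash distance, and the helper names are assumptions for illustration.

import numpy as np
from scipy.fft import dctn

ZIGZAG9 = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1)]

def dct_hash(block):
    c = dctn(block.astype(float), norm="ortho")
    return np.array([c[i, j] for i, j in ZIGZAG9])

def best_vector(h, ref, cx, cy, search=8, B=8):
    """Motion vector minimizing the L1 distance between DCT hashes."""
    H, W = ref.shape
    best_d, best_v = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = cx + dx, cy + dy
            if 0 <= x <= W - B and 0 <= y <= H - B:
                d = np.abs(dct_hash(ref[y:y + B, x:x + B]) - h).sum()
                if d < best_d:
                    best_d, best_v = d, (dx, dy)
    return best_v

def temporal_si_block(h, past, future, cx, cy, B=8):
    """Average of the best past and future matches (the equation above)."""
    (px, py) = best_vector(h, past, cx, cy)
    (fx, fy) = best_vector(h, future, cx, cy)
    bp = past[cy + py:cy + py + B, cx + px:cx + px + B].astype(float)
    bf = future[cy + fy:cy + fy + B, cx + fx:cx + fx + B].astype(float)
    return (bp + bf) / 2.0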

3.3. Spatial Side Information Generation

Adjacent video sensor nodes monitor the same target scene from different locations and angles. Because of this multi-angle correlation, we can use the video sequence of an adjacent video sensor node at the same moment to generate spatial side information by homography; we assume that all nodes in the wireless multimedia sensor network are time-synchronized. Since the position and perspective of each video sensor node are fixed, only a single homographic matrix transformation is needed. The homography is a matrix that relates one video sensor node to another in the homogeneous coordinate system. As in [3, 15], each point $(x, y)$ of one view is mapped to a point $(x', y')$ of the other, up to a scale, such that

$$x' = \frac{a_1 x + a_2 y + a_3}{a_7 x + a_8 y + 1}, \qquad y' = \frac{a_4 x + a_5 y + a_6}{a_7 x + a_8 y + 1},$$

where $a_1, \ldots, a_8$ are the eight motion parameters. When $a_7 = a_8 = 0$, the model is an affine geometric transformation; when, in addition, $a_1 = a_5 = 1$ and $a_2 = a_4 = 0$, it is a pure translation; and when $a_1 = a_5$ and $a_2 = -a_4$, it is a translation-zoom-rotation model. To compute the model parameters, the gradient descent method proposed in [3, 15] can be used.
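The warp itself can be sketched as follows: the pixel loop performs an inverse mapping through the 3 × 3 homography matrix with nearest-neighbor sampling. The example matrix H_example is made up for illustration (it is not from the paper; in the system the parameters would be estimated offline by gradient descent as in [3, 15]).

import numpy as np

def warp_view(src, Hm):
    """Inverse-warp the neighboring view `src` through homography `Hm`."""
    Hh, Wh = src.shape
    out = np.zeros_like(src)
    Hinv = np.linalg.inv(Hm)
    for y in range(Hh):
        for x in range(Wh):
            u, v, w = Hinv @ np.array([x, y, 1.0])   # homogeneous coordinates
            ui, vi = int(round(u / w)), int(round(v / w))
            if 0 <= ui < Wh and 0 <= vi < Hh:
                out[y, x] = src[vi, ui]
    return out

# hypothetical example: mild zoom plus translation (a7 = a8 = 0, i.e., affine)
H_example = np.array([[1.05, 0.00,  2.0],
                      [0.00, 1.05, -1.0],
                      [0.00, 0.00,  1.0]])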

3.4. Fusion Side Information Generation

Combining the temporal and spatial side information yields the fused side information. We construct a binary fusion mask in which 0 indicates that a pixel is taken from the spatial side information and 1 that it is taken from the temporal side information. The fusion process can be described simply as follows: the temporal and spatial side information are each compared with the previous key frame; if the spatial side information is closer to the key frame's pixel value, the binary mask is set to 0, and if the temporal side information is closer, it is set to 1. The same comparison is performed with the future key frame, yielding a second binary mask. Finally, an OR operation between the two binary masks produces the binary fusion mask. The fusion process is shown in Figure 3.

Figure 3: Fusion-based side information.
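The mask construction can be written directly from this description; the array names and the tie-breaking with <= are assumptions of this sketch.

import numpy as np

def fusion_mask(t_si, s_si, key_prev, key_next):
    """1 keeps the temporal SI pixel, 0 keeps the spatial SI pixel."""
    m_prev = np.abs(t_si - key_prev) <= np.abs(s_si - key_prev)
    m_next = np.abs(t_si - key_next) <= np.abs(s_si - key_next)
    return m_prev | m_next                  # OR of the two binary masks

def fuse(t_si, s_si, key_prev, key_next):
    mask = fusion_mask(t_si, s_si, key_prev, key_next)
    return np.where(mask, t_si, s_si)       # the fused side information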

4. Experiments and Analysis

As in our previous work [13], the LDPC codes in our simulation experiments are generated by the PEG algorithm [16] with a rate of 7/8, and the decoder uses belief propagation (BP) over cycle-free Tanner graphs in the iterative decoding process. To change the rate, we vary the number of quantization levels: the LDPC encoder then produces a different output bit rate, and different compression ratios are obtained. After several experimental analyses and comparisons, 64 was chosen as the threshold of the ROI criterion, and the DC coefficient plus 8 low-frequency AC coefficients are selected as the DCT hash. The proposed scheme is tested on the QCIF (176 × 144) video sequences exit and ballroom [17], at 25 fps with a total of 100 frames each. The video stream of Camera 1 uses the H.263 coding scheme; Camera 0 uses distributed video coding, with the key frames K and Wyner-Ziv frames W coded alternately in the sequence "K-W-K-W". We compare the rate-distortion performance of multiview distributed video coding using only temporal side information, only spatial side information, and fused side information against H.263 intraframe coding (I-I-I-I), H.263 interframe coding (I-P-P-P), and JPEG coding; the H.263+ codec uses TMN8. The experimental results are shown in Figure 4. Multiview distributed video coding performs significantly better (by 2 to 3 dB) than H.263 intraframe coding, while its overall complexity is lower; a gap to H.263 interframe coding remains. Compared with multiview distributed video coding that uses only temporal side information, the proposed approach brings improvements of up to 0.2–0.5 dB, because it achieves more accurate motion estimation in the intense-motion regions and thus obtains the best side information.

fig4
Figure 4: The simulation results of two sequences.

Figure 5 shows the decoded images (the 17th frame) of the "exit" and "ballroom" sequences using the fused side information, and Figure 6 shows the same frames decoded using only temporal side information. The subjective quality of the decoded images is visibly improved.

fig5
Figure 5: Decoded frame of “exit” and “ballroom” using our proposed algorithm.
fig6
Figure 6: Decoded frame of "exit" and "ballroom" using only temporal side information.

5. Conclusion

In this paper, a novel multiview distributed video coding algorithm has been presented. In the main view, ROI macroblocks are selected by an SAD criterion; bidirectional hash-based interpolation generates the side information macroblocks for intense-motion areas, while nonintense-motion areas use motion-compensated interpolation (MCI). Finally, the side information is obtained by fusing the temporal and spatial side information. Experimental results demonstrate the algorithm's efficiency.

Acknowledgments

This work is sponsored by the National Natural Science Foundation of China (61003236, 61170065), the Natural Science Foundation of Jiangsu (BK2011755), the Scientific Technological Support Project of Jiangsu (BE2012755, BE2012183), the Jiangsu Provincial Research Scheme of Natural Science for Higher Education Institutions (11KJB520016), the Scientific Research and Industry Promotion Project for Higher Education Institutions (JHB2012-7), the Doctoral Fund of the Ministry of Education of China (20103223120007), the Nature Science Foundation of NUPT (KJS1022), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (Information and Communication Engineering, yx002001).

References

  1. J. Huang, L. Sun, R. Wang, and H. Huang, “Improved virtual potential field algorithm based on probability model in three-dimensional directional sensor networks,” International Journal of Distributed Sensor Networks, vol. 2012, Article ID 942080, 9 pages, 2012.
  2. G. Jiang, F. Shao, M. Yu, K. Chen, and T. Y. Choi, “Efficient block matching for Ray-Space predictive coding in Free-Viewpoint television systems,” in Proceedings of the 6th International Conference on Computational Science and Its Applications, pp. 307–316, 2006.
  3. W. Yufei, W. Ruchuan, H. Haiping, and S. Lijuan, “Multi-model sensors information based distributed video processing for wireless multimedia sensor networks,” Journal of Image and Graphics, vol. 15, no. 1, pp. 161–166, 2010.
  4. A. Aaron, D. Varodayan, and B. Girod, “Wyner-Ziv residual coding of video,” 2006, http://ivms.stanford.edu/~dsc/wzcodingvideo.
  5. D. Varodayan, A. Aaron, and B. Girod, “Exploiting spatial correlation in pixel-domain distributed image compression,” in Proceedings of the 25th Picture Coding Symposium (PCS '06), pp. 1–4, April 2006.
  6. R. Puri, A. Majumdar, and K. Ramchandran, “PRISM: a video coding paradigm with motion estimation at the decoder,” IEEE Transactions on Image Processing, vol. 16, no. 10, pp. 2436–2448, 2007.
  7. Q. Xu and Z. Xiong, “Layered Wyner-Ziv video coding,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3791–3803, 2006.
  8. A. Sehgal, A. Jagmohan, and N. Ahuja, “A state-free causal video encoding paradigm,” in Proceedings of the International Conference on Image Processing (ICIP '03), pp. 65–72, September 2003.
  9. B. Wu, X. Ji, D. Zhao, and W. Gao, “Wavelet based distributed video coding with spatial scalability,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '08), pp. 3458–3461, Seattle, Wash, USA, May 2008.
  10. X. Artigas, E. Angeli, and L. Torres, “Side information generation for multiview distributed video coding using a fusion approach,” in Proceedings of the 7th Nordic Signal Processing Symposium (NORSIG '06), pp. 250–253, Reykjavik, Iceland, June 2006.
  11. J. Zhong, X. Hu, W. B. Kleijn, and E. Kozica, “Constructive camera pose control for optimizing multiview distributed video coding,” in Proceedings of the 47th IEEE Conference on Decision and Control (CDC '08), pp. 3372–3379, Cancún, Mexico, December 2008.
  12. M. Ouaret, F. Dufaux, and T. Ebrahimi, “Fusion-based multiview distributed video coding,” in Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks (VSSN '06), pp. 139–144, New York, NY, USA, October 2006.
  13. F. Xiao, J. Liu, L. Sun, and R. Wang, “A new side information estimate based distributed video coding algorithm for wireless multimedia sensor networks,” International Journal of Advancements in Computing Technology, vol. 4, no. 1, pp. 320–326, 2012.
  14. S. Lijuan, M. Ziping, X. Fu, and W. Ruchuan, “A new video compress algorithm for wireless multimedia sensor network,” Journal of Image and Graphics, vol. 16, no. 7, pp. 1276–1282, 2011.
  15. F. Dufaux and J. Konrad, “Efficient, robust, and fast global motion estimation for video coding,” IEEE Transactions on Image Processing, vol. 9, no. 3, pp. 497–501, 2000.
  16. X.-Y. Hu, E. Eleftheriou, and D. M. Arnold, “Regular and irregular progressive edge-growth Tanner graphs,” IEEE Transactions on Information Theory, vol. 51, no. 1, pp. 386–398, 2005.
  17. Mitsubishi Electric Research Laboratories, “MERL multi-view video sequences,” 2012, ftp://ftp.merl.com/pub/avetro/mvc-testseq.