FPGA Implementation of Optimal 3D-Integer DCT Structure for Video Compression

Jacob, J. Augustin; Kumar, N. Senthil

doi: https://doi.org/10.1155/2015/204378

The Scientific World Journal


Research Article | Open Access

Volume 2015 | Article ID 204378 | https://doi.org/10.1155/2015/204378


Academic Editor: Marco Listanti

Received 01 Jun 2015

Revised 15 Sept 2015

Accepted 17 Sept 2015

Published 27 Oct 2015

Abstract

A novel optimal structure for implementing the 3D-integer discrete cosine transform (DCT) is presented by analyzing various integer approximation methods. Integer sets with reduced mean squared error (MSE) and high coding efficiency are considered for implementation in FPGA. The proposed method shows that the fewest resources are used by the integer set with the shortest bit lengths. The optimal 3D-integer DCT structure is determined by analyzing the MSE, power dissipation, coding efficiency, and hardware complexity of different integer sets. The experimental results reveal that the direct method of computing the 3D-integer DCT using the integer set [10, 9, 6, 2, 3, 1, 1] performs better than other integer sets in terms of resource utilization and power dissipation.

1. Introduction

Most current video compression algorithms reduce spatial and temporal redundancy through motion compensation and prediction. These algorithms are complex, and the encoding and decoding blocks are not symmetric, which makes the algorithms harder to implement. 3D-DCT based video coding [1] has been considered as an alternative to the existing standard video compression algorithms. It eliminates some of the problems caused by the motion estimation algorithm, such as the blocking effect; motion estimation is also lossy and time-consuming [2]. For video sequences containing fast-moving objects, motion estimation may not yield correct motion vectors, since a full search cannot be performed over the video stream.

A few research efforts have been made to enhance 3D-DCT based video codecs [3–5] and make them comparable to the standard video compression algorithms. An implementable structure for the 3D-integer DCT would further accelerate the encoding process. Zaharia et al. [6] developed a lossy compression scheme that applies the 3D-DCT to 3D integral images and showed that it outperforms the JPEG standard. Even though recent compression standards based on the discrete wavelet transform outperform JPEG, the DCT remains preferred because fast computation structures exist for it. This reflects the need for new hardware for the 3D-integer DCT. However, no attempt has been made to implement the 3D-integer DCT algorithm in hardware, and analyzing its hardware complexity is essential to assess the suitability of 3D-DCT based video coders for real-time applications.

Most standard video compression algorithms, such as MPEG and H.26X, adopt the DCT as part of the standard. This has led to the development of many fast 1D- and 2D-DCT algorithms, whose fundamental aim is to reduce the number of multiplications and additions. Computing the DCT of an input sequence of length $N$ directly requires $N^2$ multiplications and $N(N-1)$ additions; the fast DCT algorithm of [7] reduces the computational complexity to the order of $N \log_2 N$ operations. Several algorithms and implementation structures exist for computing the real-valued 1D-DCT and 2D-DCT [8–23]. Among them, the algorithm of Prado and Duhamel [16] is of particular importance, because that study shows that if an optimal algorithm is obtained for the 1D-DCT, then the extension to the corresponding 2D-DCT and 3D-DCT algorithms will also be optimal. However, implementing the real-valued transform is complex, since a floating-point multiplier, which consumes considerable resources, cannot be avoided. Cham et al. [24] presented a simplified algorithm that first converts floating-point values to fixed point and then performs the DCT. In this case the energy is not transformed exactly because of the floating- to fixed-point conversion, and in a 3D-DCT the errors introduced while computing each 1D-DCT propagate into the third dimension.

DCTs with integer coefficients are currently of great interest, because their design is simpler and they can be implemented more efficiently. Edirisuriya et al. [25] proposed an improvement over traditional real-valued and fixed-point implementations in which the DCT is computed with integer values, so there is no need to design a floating-point multiplier, which consumes more resources and time. The survey shows the use of the integer DCT in 3D-DCT based video and image compression algorithms; however, efforts to design hardware for the 3D-integer DCT are rare in the literature. A few approximation methods are available for deriving an equivalent integer DCT from the real-valued DCT. They are classified as the indirect or C-matrix transform method proposed by Kwak et al. [26] and the direct method proposed by Pei and Ding [27]. In this paper both approximation methods (direct and indirect) are analyzed, and the optimal integer set for computing the 3D-integer DCT is determined based on MSE and coding efficiency.

Finally, the optimal structure for the 3D-integer DCT is determined based on power dissipation and resource utilization.

2. 3D-Discrete Cosine Transforms

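The discrete cosine transform (DCT) is a member of the family of sinusoidal unitary transforms. It has found applications in digital signal processing, particularly in image and video compression. The family of discrete trigonometric transforms comprises 8 versions of the DCT; each is classified as even or odd and as type I, II, III, or IV. Present image and video processing applications use only the even types of the DCT. In particular, DCT-II has received much attention in video compression because of its high energy-packing ability and because fast computation structures exist for it. Throughout the text, DCT-II is therefore referred to simply as the DCT. Equation (1) defines the one-dimensional DCT and inverse DCT for a finite-duration signal $x(n)$ of length $N$:

$$X(k) = \sqrt{\frac{2}{N}}\, C(k) \sum_{n=0}^{N-1} x(n) \cos\frac{(2n+1)k\pi}{2N}, \qquad x(n) = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} C(k)\, X(k) \cos\frac{(2n+1)k\pi}{2N}, \tag{1}$$

where $k, n = 0, 1, \dots, N-1$ and

$$C(k) = \begin{cases} 1/\sqrt{2}, & k = 0, \\ 1, & \text{otherwise.} \end{cases} \tag{2}$$

Image and video frames are usually two-dimensional. Because of the orthogonality and separability properties, the DCT can be extended to two dimensions. The 2D-DCT of an $N \times N$ block of pixels $x(m,n)$ whose intensity values range between 0 and 255 is defined as

$$X(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} x(m,n) \cos\frac{(2m+1)u\pi}{2N} \cos\frac{(2n+1)v\pi}{2N}, \tag{3}$$

where $u, v = 0, 1, \dots, N-1$ and

$$C(u) = \begin{cases} 1/\sqrt{2}, & u = 0, \\ 1, & u \neq 0, \end{cases} \qquad C(v) = \begin{cases} 1/\sqrt{2}, & v = 0, \\ 1, & v \neq 0. \end{cases} \tag{4}$$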
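As a quick numerical check of the 1D-DCT and its inverse, the following Python sketch evaluates both directly from the defining sums, with the normalization factor $C(k) = 1/\sqrt{2}$ for $k = 0$ and 1 otherwise. The function names and the sample row of intensities are illustrative assumptions, not part of the original design.

```python
import math

def c(k):
    """Normalization factor C(k): 1/sqrt(2) for k = 0, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_1d(x):
    """Forward 1D DCT-II of x, evaluated directly from the defining sum."""
    N = len(x)
    return [math.sqrt(2 / N) * c(k)
            * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N))
            for k in range(N)]

def idct_1d(X):
    """Inverse 1D DCT-II, evaluated directly from the defining sum."""
    N = len(X)
    return [math.sqrt(2 / N)
            * sum(c(k) * X[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for k in range(N))
            for n in range(N)]

# Hypothetical 8-sample row of pixel intensities in the range 0..255.
x = [52, 55, 61, 66, 70, 61, 64, 73]
X = dct_1d(x)
x_back = idct_1d(X)

# The DC term equals sum(x) / sqrt(N), and the inverse recovers x up to rounding.
print(abs(X[0] - sum(x) / math.sqrt(len(x))) < 1e-9)      # True
print(max(abs(a - b) for a, b in zip(x, x_back)) < 1e-9)  # True
```

Because the transform is orthonormal, the energy of `X` also equals the energy of `x`, which is what allows the 2D and 3D versions to be built from repeated 1D passes.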

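The equation for computing the 2D-DCT is extended along the temporal direction to obtain the 3D-DCT. For an $N \times N \times N$ cube formed by $N$ consecutive frames, it is defined in (5) and (6):

$$X(u,v,w) = \left(\frac{2}{N}\right)^{3/2} C(u)\, C(v)\, C(w) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} \sum_{t=0}^{N-1} x(m,n,t) \cos\frac{(2m+1)u\pi}{2N} \cos\frac{(2n+1)v\pi}{2N} \cos\frac{(2t+1)w\pi}{2N}, \tag{5}$$

where $u, v, w = 0, 1, \dots, N-1$,

$$C(\xi) = \begin{cases} 1/\sqrt{2}, & \xi = 0, \\ 1, & \text{otherwise,} \end{cases} \qquad \xi \in \{u, v, w\}, \tag{6}$$

and $X(u,v,w)$ and $x(m,n,t)$ represent the frequency-domain and time-domain intensity values, respectively. Correspondingly, the inverse 3D-DCT is given by

$$x(m,n,t) = \left(\frac{2}{N}\right)^{3/2} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \sum_{w=0}^{N-1} C(u)\, C(v)\, C(w)\, X(u,v,w) \cos\frac{(2m+1)u\pi}{2N} \cos\frac{(2n+1)v\pi}{2N} \cos\frac{(2t+1)w\pi}{2N}. \tag{7}$$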
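The separability behind the 3D-DCT can be checked with a small Python sketch that compares the literal triple sum of (5) with three passes of 1D-DCTs along rows, columns, and frames. A random 4 × 4 × 4 cube is used only to keep the literal $O(N^6)$ sum fast; the helper names are illustrative assumptions.

```python
import math
import random

def c(k):
    """Normalization factor: 1/sqrt(2) for index 0, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(k, i, N):
    """Cosine kernel cos((2i+1) k pi / 2N) shared by every dimension."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct3_direct(x, N):
    """3D-DCT evaluated literally from the triple sum; x is indexed x[m][n][t]."""
    scale = (2 / N) ** 1.5
    X = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    for u in range(N):
        for v in range(N):
            for w in range(N):
                s = 0.0
                for m in range(N):
                    for n in range(N):
                        for t in range(N):
                            s += x[m][n][t] * basis(u, m, N) * basis(v, n, N) * basis(w, t, N)
                X[u][v][w] = scale * c(u) * c(v) * c(w) * s
    return X

def dct_1d(seq):
    """Orthonormal 1D DCT-II."""
    N = len(seq)
    return [math.sqrt(2 / N) * c(k) * sum(seq[i] * basis(k, i, N) for i in range(N))
            for k in range(N)]

def dct3_separable(x, N):
    """Same transform as three 1D passes: along n (rows), m (columns), then t (frames)."""
    y = [[[x[m][n][t] for t in range(N)] for n in range(N)] for m in range(N)]
    for m in range(N):  # pass 1: along n
        for t in range(N):
            out = dct_1d([y[m][n][t] for n in range(N)])
            for n in range(N):
                y[m][n][t] = out[n]
    for n in range(N):  # pass 2: along m
        for t in range(N):
            out = dct_1d([y[m][n][t] for m in range(N)])
            for m in range(N):
                y[m][n][t] = out[m]
    for m in range(N):  # pass 3: along t (temporal direction)
        for n in range(N):
            out = dct_1d([y[m][n][t] for t in range(N)])
            for t in range(N):
                y[m][n][t] = out[t]
    return y

N = 4
random.seed(0)
cube = [[[random.randint(0, 255) for _ in range(N)] for _ in range(N)] for _ in range(N)]
A = dct3_direct(cube, N)
B = dct3_separable(cube, N)
err = max(abs(A[u][v][w] - B[u][v][w]) for u in range(N) for v in range(N) for w in range(N))
print(err < 1e-9)  # True: both routes give the same coefficients
```

The separable form is what makes a hardware structure practical: only a 1D-DCT unit is needed, reused along each of the three directions.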

3. Integer Approximation of 3D-DCT Using Indirect Method

In the indirect method, integer values are obtained through another orthogonal transform such as the Walsh-Hadamard transform (WHT). The DCT can be implemented using the WHT through a conversion matrix, as shown in

$$\mathbf{X}_C = \mathbf{C}\,\mathbf{x} = \mathbf{T}\,\mathbf{W}\,\mathbf{x} = \mathbf{T}\,\mathbf{X}_W, \tag{8}$$

where $\mathbf{C}$ represents the discrete cosine transform, $\mathbf{W}$ the Walsh-Hadamard transform, and $\mathbf{T}$ the conversion matrix that converts the Walsh-domain vector $\mathbf{X}_W$ into the DCT domain. In the indirect method there are 11 different elements in the conversion matrix. Substituting a variable for each nonzero element of the matrix results in 11 variables; the resulting approximated conversion matrix $\hat{\mathbf{T}}$ is represented in (9). Preserving the signs of the elements of $\mathbf{T}$, a search was made for suitable integer values that also satisfy the algebraic equations (10) to (12). Equations (10) and (11) are orthogonality conditions, which ensure that the rows of $\hat{\mathbf{T}}$ are orthogonal to each other, and (12) is the normality condition. To make $\hat{\mathbf{T}}$ resemble the real-valued transform, constraints are set on the variables: the magnitudes of the elements of $\mathbf{T}$ are compared, and the inequalities (13) to (16) are obtained. All integer solutions satisfying (10) to (12) under the constraints (13) to (16) guarantee that the approximated conversion matrix $\hat{\mathbf{T}}$ is orthonormal and close to the original conversion matrix $\mathbf{T}$. The generalized signal flow graph of integer approximation using the indirect method is given in Figure 1, in which lines drawn in blue represent addition and dotted red lines represent subtraction. Additional information on integer approximation can be found in the work of Britanak et al. [28].
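The relation between the DCT and the WHT through a conversion matrix can be illustrated numerically. The following Python sketch forms T = C Wᵀ from the orthonormal 8-point DCT matrix C and an orthonormal Walsh-Hadamard matrix W, then checks that T W reproduces C. It assumes W in natural (Sylvester) order, which is not necessarily the ordering used by Kwak et al. [26], so the rows and columns of T may appear permuted relative to the paper's form and the 11 distinct element values are not read off here. Because even-indexed DCT rows and even-parity Hadamard rows are symmetric while the others are antisymmetric, at least half of the entries of T must vanish.

```python
import math

N = 8

def dct_matrix(N):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    return [[math.sqrt(2 / N) * (1 / math.sqrt(2) if k == 0 else 1.0)
             * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
            for k in range(N)]

def hadamard_matrix(N):
    """Orthonormal Walsh-Hadamard matrix in natural (Sylvester) order; N a power of 2."""
    return [[(-1) ** bin(i & n).count("1") / math.sqrt(N) for n in range(N)]
            for i in range(N)]

C = dct_matrix(N)
W = hadamard_matrix(N)

# W is orthonormal, so C = T W gives T = C W^T.
T = [[sum(C[k][n] * W[i][n] for n in range(N)) for i in range(N)] for k in range(N)]

# Check that T W reproduces the DCT matrix.
TW = [[sum(T[k][i] * W[i][n] for i in range(N)) for n in range(N)] for k in range(N)]
recon_err = max(abs(TW[k][n] - C[k][n]) for k in range(N) for n in range(N))

# Count numerically zero entries: the conversion matrix is sparse.
zeros = sum(1 for row in T for v in row if abs(v) < 1e-9)
print(recon_err < 1e-9, zeros)
```

The sparsity of T is what makes the indirect route attractive: a fast WHT needs only additions and subtractions, and only the few nonzero entries of T need integer approximations.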


4. Integer Approximation Using Direct Method

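In the direct method, equivalent integer values are obtained directly and replace the real numbers in the DCT matrix. The approximated integer cosine transform matrix is given by

$$\hat{\mathbf{C}} = \mathbf{K}\,\mathbf{J},$$

where $\mathbf{K}$ is a diagonal matrix with normalization factors on its main diagonal, each chosen so that the corresponding row of $\hat{\mathbf{C}}$ has unit norm, and $\mathbf{J}$ is an integer matrix. The 8-point DCT matrix contains 7 different element magnitudes, and elements of the same magnitude are represented by the same variable. Substituting a variable for each nonzero element results in the 7 variables $a, b, c, d, e, f, g$, as shown in (19):

$$\mathbf{J} = \begin{bmatrix}
g & g & g & g & g & g & g & g \\
a & b & c & d & -d & -c & -b & -a \\
e & f & -f & -e & -e & -f & f & e \\
b & -d & -a & -c & c & a & d & -b \\
g & -g & -g & g & g & -g & -g & g \\
c & -a & d & b & -b & -d & a & -c \\
f & -e & e & -f & -f & e & -e & f \\
d & -c & b & -a & a & -b & c & -d
\end{bmatrix}. \tag{19}$$

A set of equations and inequalities is formed so that the orthogonality and normality properties of the DCT matrix are preserved in the integer domain. The even rows of $\mathbf{J}$ are orthogonal for any values of $e$, $f$, and $g$, while orthogonality of the odd rows requires

$$ab = ac + bd + cd. \tag{20}$$

By solving (20) under the set of constraints described in (21) to (23), different integer solution sets are obtained [22]. Integer sets with low mean squared error (MSE) and high transform coding efficiency are preferred to obtain the optimal solution for the 3D-integer DCT. Fast computation structures are obtained by the recursive sparse matrix factorization method. The generalized signal flow graph of integer approximation using the direct integer DCT is given in Figure 2, where the parameters are integers or dyadic rationals.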
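The orthogonality condition of the direct method can be checked with exact integer arithmetic. This Python sketch builds the integer matrix J for a given (a, b, c, d, e, f, g) using the standard 8-point integer cosine transform row layout (assumed here to match the paper's matrix), verifies that J Jᵀ is diagonal for the set [10, 9, 6, 2, 3, 1, 1], and runs a brute-force search for solutions of ab = ac + bd + cd. The search assumes the ordering a ≥ b ≥ c ≥ d ≥ 1 as a stand-in for the paper's constraints, which are not reproduced here; `odd_row_solutions` is a hypothetical helper.

```python
def ict_matrix(a, b, c, d, e, f, g):
    """Integer matrix J of the 8-point integer cosine transform (standard row layout)."""
    return [
        [g, g, g, g, g, g, g, g],
        [a, b, c, d, -d, -c, -b, -a],
        [e, f, -f, -e, -e, -f, f, e],
        [b, -d, -a, -c, c, a, d, -b],
        [g, -g, -g, g, g, -g, -g, g],
        [c, -a, d, b, -b, -d, a, -c],
        [f, -e, e, -f, -f, e, -e, f],
        [d, -c, b, -a, a, -b, c, -d],
    ]

def gram(J):
    """J J^T computed exactly in integer arithmetic."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in J] for r1 in J]

def odd_row_solutions(max_val):
    """All (a, b, c, d) with max_val >= a >= b >= c >= d >= 1 and ab = ac + bd + cd."""
    sols = []
    for a in range(1, max_val + 1):
        for b in range(1, a + 1):
            for c in range(1, b + 1):
                for d in range(1, c + 1):
                    if a * b == a * c + b * d + c * d:
                        sols.append((a, b, c, d))
    return sols

J = ict_matrix(10, 9, 6, 2, 3, 1, 1)
G = gram(J)
off_diag = [G[i][j] for i in range(8) for j in range(8) if i != j]
print(all(v == 0 for v in off_diag))  # True: rows of J are mutually orthogonal
print([G[i][i] for i in range(8)])    # [8, 442, 40, 442, 8, 442, 40, 442]
print((10, 9, 6, 2) in odd_row_solutions(10))  # True
```

The diagonal of J Jᵀ gives the squared row norms, so the normalization matrix is K = diag(1/√8, 1/√442, 1/√40, ...); in hardware these factors can be folded into quantization, leaving only the small integers in the datapath.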

5. Criteria for Evaluation of Approximated Integer DCT

To evaluate the approximation error between the integer DCT and the original transform matrix, and to measure the difference in data-compression performance, theoretical criteria are needed. For this purpose the input signal is frequently modeled as a first-order stationary Markov process (Markov-1) with zero mean, unit variance, and adjacent inter-element correlation coefficient $\rho$ chosen between zero and one. The input signal $\mathbf{x}$ is then characterized by a covariance matrix $\mathbf{R}_x$ whose elements are given by

$$[\mathbf{R}_x]_{ij} = \rho^{|i-j|}, \qquad i, j = 0, 1, \dots, N-1. \tag{24}$$

The matrix $\mathbf{R}_x$ is symmetric and Toeplitz. The covariance matrix of the transformed vector $\mathbf{y} = \mathbf{C}\,\mathbf{x}$ is obtained from (25):

$$\mathbf{R}_y = \mathbf{C}\,\mathbf{R}_x\,\mathbf{C}^{T}. \tag{25}$$
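A minimal Python sketch of (24) and (25) follows, assuming N = 8 and ρ = 0.95 (a commonly used test value, not one stated in this section). It checks the Toeplitz structure of R_x, confirms that an orthonormal transform preserves total variance, trace(R_y) = trace(R_x) = N, and shows how much of that variance the DCT packs into the DC coefficient.

```python
import math

def dct_matrix(N):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    return [[math.sqrt(2 / N) * (1 / math.sqrt(2) if k == 0 else 1.0)
             * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
            for k in range(N)]

def markov1_covariance(N, rho):
    """R_x of (24): zero-mean, unit-variance Markov-1 source."""
    return [[rho ** abs(i - j) for j in range(N)] for i in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

N, rho = 8, 0.95  # assumed illustrative values
Rx = markov1_covariance(N, rho)
C = dct_matrix(N)
Ry = matmul(matmul(C, Rx), transpose(C))  # (25)

# Toeplitz: every diagonal of R_x is constant.
is_toeplitz = all(Rx[i][j] == Rx[i + 1][j + 1] for i in range(N - 1) for j in range(N - 1))
# An orthonormal transform preserves total variance.
trace_y = sum(Ry[i][i] for i in range(N))
print(is_toeplitz, round(trace_y, 6))
# Fraction of the total variance packed into the DC coefficient (about 0.878).
print(round(Ry[0][0] / trace_y, 3))
```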

6. Mean Squared Error

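The mean squared error (MSE) is used to evaluate the approximation error between the approximated and original transform matrices. It is defined as follows. Let $\mathbf{C}$ be the original transform matrix and $\hat{\mathbf{C}}$ its approximation. Then, for a given input vector $\mathbf{x}$ of length $N$, the error vector is

$$\mathbf{e} = \mathbf{C}\,\mathbf{x} - \hat{\mathbf{C}}\,\mathbf{x} = (\mathbf{C} - \hat{\mathbf{C}})\,\mathbf{x} = \mathbf{D}\,\mathbf{x}. \tag{26}$$

From (26), the MSE between the original and approximated transforms is defined by

$$\text{MSE} = \frac{1}{N}\, E\left[\mathbf{e}^{T}\mathbf{e}\right] = \frac{1}{N} \operatorname{tr}\left(\mathbf{D}\,\mathbf{R}_x\,\mathbf{D}^{T}\right), \tag{27}$$

where $\mathbf{R}_x$ is the covariance matrix of the input signal $\mathbf{x}$. To maintain compatibility between the original and approximated transforms, the MSE should therefore be minimized.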
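As a sketch of (27), the code below computes the MSE between the orthonormal 8-point DCT and the integer set [10, 9, 6, 2, 3, 1, 1] after each integer row is normalized to unit length (the role of the diagonal normalization matrix). It assumes the standard 8-point integer cosine transform row layout in `ict_matrix` and ρ = 0.95; the printed value is computed here, not taken from Tables 1 and 2.

```python
import math

def dct_matrix(N):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    return [[math.sqrt(2 / N) * (1 / math.sqrt(2) if k == 0 else 1.0)
             * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
            for k in range(N)]

def ict_matrix(a, b, c, d, e, f, g):
    """Integer matrix of the 8-point integer cosine transform (assumed row layout)."""
    return [
        [g, g, g, g, g, g, g, g],
        [a, b, c, d, -d, -c, -b, -a],
        [e, f, -f, -e, -e, -f, f, e],
        [b, -d, -a, -c, c, a, d, -b],
        [g, -g, -g, g, g, -g, -g, g],
        [c, -a, d, b, -b, -d, a, -c],
        [f, -e, e, -f, -f, e, -e, f],
        [d, -c, b, -a, a, -b, c, -d],
    ]

def normalize_rows(J):
    """Apply the diagonal normalization: scale each integer row to unit length."""
    return [[v / math.sqrt(sum(x * x for x in row)) for v in row] for row in J]

def mse(C, C_hat, rho):
    """MSE of (27): trace(D R_x D^T) / N with D = C - C_hat and R_x from (24)."""
    N = len(C)
    Rx = [[rho ** abs(i - j) for j in range(N)] for i in range(N)]
    total = 0.0
    for i in range(N):
        d = [C[i][j] - C_hat[i][j] for j in range(N)]  # row i of D
        # Diagonal entry i of D R_x D^T equals d R_x d^T.
        total += sum(d[p] * Rx[p][q] * d[q] for p in range(N) for q in range(N))
    return total / N

C = dct_matrix(8)
C_hat = normalize_rows(ict_matrix(10, 9, 6, 2, 3, 1, 1))
print(f"MSE for [10, 9, 6, 2, 3, 1, 1] at rho = 0.95: {mse(C, C_hat, 0.95):.6f}")
```

Evaluating the trace row by row avoids forming the full product D R_x Dᵀ, since only its diagonal contributes to the MSE.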

7. Transform Efficiency

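Equation (28) defines the transform efficiency:

$$\eta = \frac{\sum_{i=0}^{N-1} \left|[\mathbf{R}_y]_{ii}\right|}{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left|[\mathbf{R}_y]_{ij}\right|} \times 100\%, \tag{28}$$

where $[\mathbf{R}_y]_{ij}$ are the elements of $\mathbf{R}_y$. The transform efficiency measures the decorrelation ability of the transform. The optimal Karhunen-Loève transform (KLT) converts the signal into completely uncorrelated coefficients and has transform efficiency $\eta = 100\%$ for all values of $\rho$, while the DCT has a transform efficiency slightly below 100% that approaches 100% as $\rho$ approaches one.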
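A minimal Python sketch of (28) follows. It compares the orthonormal 8-point DCT with the identity matrix (no transform) for an assumed Markov-1 source with ρ = 0.95, a commonly used test value; `transform_efficiency` is a hypothetical helper name.

```python
import math

def dct_matrix(N):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    return [[math.sqrt(2 / N) * (1 / math.sqrt(2) if k == 0 else 1.0)
             * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
            for k in range(N)]

def transform_efficiency(A, rho):
    """Transform efficiency (28), in percent, of matrix A for a Markov-1 source."""
    N = len(A)
    Rx = [[rho ** abs(i - j) for j in range(N)] for i in range(N)]
    ARx = [[sum(A[i][k] * Rx[k][j] for k in range(N)) for j in range(N)] for i in range(N)]
    Ry = [[sum(ARx[i][k] * A[j][k] for k in range(N)) for j in range(N)] for i in range(N)]
    diag = sum(abs(Ry[i][i]) for i in range(N))
    total = sum(abs(v) for row in Ry for v in row)
    return 100.0 * diag / total

N, rho = 8, 0.95  # assumed illustrative values
identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
eta_dct = transform_efficiency(dct_matrix(N), rho)
eta_none = transform_efficiency(identity, rho)  # raw, untransformed samples
print(f"DCT: {eta_dct:.2f}%  no transform: {eta_none:.2f}%")
```

The same function can be applied to a normalized integer matrix, so the coding efficiency of each candidate integer set can be compared against the real-valued DCT under identical source assumptions.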

8. Structure for Computing 3D-Integer Discrete Cosine Transform

To reduce the hardware complexity, the optimal integer sets from the direct and indirect integer approximations are chosen based on the number of multiplications and additions. The structure with the minimum complexity is used to compute the 3D-integer DCT by applying the 1D-integer DCT along the rows, the columns, and the temporal direction. The block diagram of the proposed 3D-integer DCT is shown in Figure 3. To compute the 3D-integer DCT of an 8 × 8 × 8 cube, the 1D-integer DCT is first performed row-wise, and the computed values are stored column-wise in the first intermediate buffer. The process is repeated for all rows of the cube, from frame 1 to frame 8; for clarity, corresponding rows are marked with the same color in Figure 3. The buffer size is identical to the cube size. The 1D-integer DCT structure may come from either the direct or the indirect method. Next, the 1D-integer DCT is computed row-wise on the values stored in the first buffer and the results are stored column-wise in the second buffer, which yields the 2D-integer DCT. Finally, one more 1D-integer DCT is performed on the values stored in the second buffer along the temporal direction, giving the 3D-integer DCT values, as shown in Figure 3.
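The three-stage dataflow described above can be mimicked in software. In this Python sketch (function and buffer names are hypothetical), each stage applies the unnormalized integer matrix of the direct method with the set [10, 9, 6, 2, 3, 1, 1] (standard 8-point integer cosine transform row layout, assumed) and writes its results transposed into the next buffer, mirroring the row-wise read and column-wise write of the block diagram. Normalization is left out, matching the "normalization factors neglected" assumption used for the word-length estimate in Section 10.

```python
def ict_matrix(a, b, c, d, e, f, g):
    """Integer matrix of the 8-point integer cosine transform (assumed row layout)."""
    return [
        [g, g, g, g, g, g, g, g],
        [a, b, c, d, -d, -c, -b, -a],
        [e, f, -f, -e, -e, -f, f, e],
        [b, -d, -a, -c, c, a, d, -b],
        [g, -g, -g, g, g, -g, -g, g],
        [c, -a, d, b, -b, -d, a, -c],
        [f, -e, e, -f, -f, e, -e, f],
        [d, -c, b, -a, a, -b, c, -d],
    ]

def int_dct_1d(J, v):
    """Unnormalized 1D integer DCT J v: integer multiply-accumulate only."""
    return [sum(j * x for j, x in zip(row, v)) for row in J]

def int_dct_3d(J, cube):
    """3D integer DCT of cube[frame][row][col] using two intermediate buffers."""
    N = len(J)
    buf1 = [[[0] * N for _ in range(N)] for _ in range(N)]
    buf2 = [[[0] * N for _ in range(N)] for _ in range(N)]
    out3 = [[[0] * N for _ in range(N)] for _ in range(N)]
    # Stage 1: row-wise 1D DCT of every frame, stored column-wise (transposed) in buffer 1.
    for f in range(N):
        for r in range(N):
            y = int_dct_1d(J, cube[f][r])
            for k in range(N):
                buf1[f][k][r] = y[k]
    # Stage 2: row-wise 1D DCT of buffer 1, stored column-wise in buffer 2 (2D DCT per frame).
    for f in range(N):
        for r in range(N):
            y = int_dct_1d(J, buf1[f][r])
            for k in range(N):
                buf2[f][k][r] = y[k]
    # Stage 3: 1D DCT along the temporal direction (across frames).
    for r in range(N):
        for col in range(N):
            y = int_dct_1d(J, [buf2[f][r][col] for f in range(N)])
            for k in range(N):
                out3[k][r][col] = y[k]
    return out3

J = ict_matrix(10, 9, 6, 2, 3, 1, 1)
# Hypothetical 8-frame test cube with a simple ramp pattern.
cube = [[[f + r + col for col in range(8)] for r in range(8)] for f in range(8)]
Y = int_dct_3d(J, cube)
print(Y[0][0][0] == sum(v for frame in cube for row in frame for v in row))  # True, since g = 1
```

Writing each stage's output transposed means every stage reads its input row by row, so one 1D integer DCT unit can be reused for all three directions.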

9. Experimental Results

9.1. Determination of Optimal Integer Set for Computing 3D-Integer Discrete Cosine Transform

To determine the optimal integer set, the performance of the proposed 3D-integer DCT (3D-IDCT) is compared against the existing real-valued transform with respect to MSE and transform coding efficiency. Several integer solutions exist for both the direct and the indirect methods of computing the 1D-IDCT, and each is used to compute the 3D-IDCT. The MSE and transform coding efficiency of the corresponding integer sets, along with their computational complexity, are listed in Tables 1 and 2.

Integer solutions whose MSE and coding efficiency are very close to those of the real-valued transform are considered for FPGA implementation. Although integer sets with higher bit lengths (5, 6, 7, and 8 bits) yield lower MSE and higher coding efficiency, they are not preferred for implementation: when computing the 3D-integer DCT, the registers (the intermediate buffers) holding intermediate values become larger for these solutions, which directly increases the computational complexity, that is, a multiplier with a longer bit length is required. Further, the number of multiplications and additions varies for integer sets that contain zero or one as an element, because multiplication by zero or one requires no multiplier.

The number of multiplications and additions is estimated based on the structure shown in Figure 3. From the results in Tables 1 and 2, the optimal integer set is determined to be [10, 9, 6, 2, 3, 1, 1], because this integer set yields relatively low MSE and high coding efficiency compared with the real-valued transform.

Further, when the optimal integer set is used instead of the real-valued 3D-DCT to encode a video sequence, the PSNR deviates only slightly. The deviation is proportional to the MSE of the corresponding integer set; for the optimal integer set, the maximum PSNR degradation was found to be 0.01 dB.

10. FPGA Implementation of 3D-IDCT

The hardware design for computing the 3D-integer DCT of a block of data using the integer set [10, 9, 6, 2, 3, 1, 1] was coded in the Verilog hardware description language. The functional behavior of the design was tested in the Xilinx ISE simulator with a sample data set, and MATLAB simulations on the same data set were used to verify correctness. The design was mapped onto an Artix-7 FPGA board. The Artix-7 family is built on a 28-nanometer (nm) process technology and is designed for low-power products used in portable communication devices. The maximum DC value of the 3D-DCT was found to be 4000. If the normalization factors are neglected, a maximum of 17 bits is required in the integer domain to hold a 3D-integer DCT value.
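The 17-bit figure can be reproduced for the DC coefficient with simple arithmetic, assuming 8-bit input pixels, an 8 × 8 × 8 cube, and the DC weight g = 1 of the optimal set, with normalization factors ignored. This sketch covers the DC term only; it is not a full word-length analysis of the AC coefficients.

```python
# Worst case for the DC coefficient: every pixel of an 8 x 8 x 8 cube at 255.
N = 8            # cube dimension (8 frames of 8 x 8 blocks)
max_pixel = 255  # 8-bit input samples
g = 1            # DC weight of the integer set [10, 9, 6, 2, 3, 1, 1]

# Without normalization, each of the three 1D passes multiplies the DC path by g * N.
max_dc = max_pixel * (g * N) ** 3
bits = max_dc.bit_length()  # unsigned width; the DC term is non-negative for pixel data
print(max_dc, bits)  # 130560 17
```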

As the values of the elements in the integer set increase, the bit length of the processing elements also increases. To show that the fewest resources are used by the integer set with the shortest bit lengths, synthesis was also performed for the integer set [13, 12, 5, 12, 0, 0, 12, 4, 3, 3, 4] and compared with the optimal integer set. The device utilization summary in Table 3 shows that the integer set [13, 12, 5, 12, 0, 0, 12, 4, 3, 3, 4] uses more resources. This is because computing the 3D-integer DCT with this integer set requires 25 bits, whereas the optimal integer set requires only 17 bits. When the bit length of the integer set increases, the bit length of the computational units (multipliers and adders) also increases, which leads to higher resource utilization. The power consumption of the design was estimated with the Xilinx Power Estimator (XPE) tool. The distribution of on-chip power and the total power of the design are shown in Figure 4.

The total on-chip power reflects the heat dissipated by the chip. When the device operates with a 100 MHz clock, the total on-chip power is 0.201 W and the junction temperature is 25.4°C, well below the thermal margin of the target FPGA device. The proposed 3D-integer DCT algorithm was also compared in terms of device utilization with an existing fixed-point 2D-DCT algorithm based on the Loeffler method [22]. Twelve instances of the fixed-point 2D-DCT Loeffler structure are needed to compute a fixed-point 3D-DCT; the resource utilization was calculated accordingly and is given in Table 4.

Table 4 shows that the proposed 3D-integer DCT algorithm with the optimal integer set [10, 9, 6, 2, 3, 1, 1] outperforms the fixed-point 3D-DCT algorithm based on the Loeffler method [22].

11. Conclusion

In this paper, integer sets from different approximation methods for converting real-valued transforms into integer transforms are analyzed in terms of MSE and coding efficiency, and on that basis the optimal integer set for computing the 3D-integer DCT is chosen. When the optimal integer set is adopted to encode a video sequence, the PSNR deviates from that of the real-valued DCT by at most 0.01 dB. A new hardware structure for computing the 3D-integer DCT is also proposed and implemented on an FPGA board. The synthesis results reveal that the fewest resources are used by the integer set with the shortest bit lengths, and that resource utilization also varies with the number of additions and multiplications. The experimental results reveal that the direct method of computing the 3D-integer DCT using the integer set [10, 9, 6, 2, 3, 1, 1] performs better than other integer sets in terms of resource utilization and power dissipation.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] R. Westwater and B. Furht, “Three-dimensional DCT video compression technique based on adaptive quantizers,” in Proceedings of the 2nd IEEE International Conference on Engineering of Complex Computer Systems, pp. 189–198, IEEE, Montreal, Canada, October 1996.
[2] D. Le Gall, “MPEG: a video compression standard for multimedia applications,” Communications of the ACM, vol. 34, no. 4, pp. 46–58, 1991.
[3] N. Božinović and J. Konrad, “Motion analysis in 3D DCT domain and its application to video coding,” Signal Processing: Image Communication, vol. 20, no. 6, pp. 510–528, 2005.
[4] B. Furht, K. Gustafson, H. Huang, and O. Marques, “An adaptive three-dimensional DCT compression based on motion analysis,” in Proceedings of the ACM Symposium on Applied Computing, pp. 765–768, ACM, Melbourne, Fla, USA, March 2003.
[5] J. Augustin Jacob and N. Senthil Kumar, “An approach to adaptive 3D-DCT based motion level prediction algorithm for improved 3D-DCT video coding,” Przegląd Elektrotechniczny, vol. R. 90, no. 12, pp. 95–99, 2014.
[6] R. Zaharia, A. Aggoun, and M. McCormick, “Adaptive 3D-DCT compression algorithm for continuous parallax 3D integral imaging,” Signal Processing: Image Communication, vol. 17, no. 3, pp. 231–242, 2002.
[7] C. W. Kok, “Fast algorithm for computing discrete cosine transform,” IEEE Transactions on Signal Processing, vol. 45, no. 3, pp. 757–760, 1997.
[8] G. Plonka and M. Tasche, “Fast and numerically stable algorithms for discrete cosine transforms,” Linear Algebra and Its Applications, vol. 394, pp. 309–345, 2005.
[9] C. Loeffler, A. Ligtenberg, and G. S. Moschytz, “Practical fast 1-D DCT algorithms with 11 multiplications,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 988–991, Glasgow, Scotland, May 1989.
[10] D. Hein and N. Ahmed, “On a real-time Walsh-Hadamard/cosine transform image processor,” IEEE Transactions on Electromagnetic Compatibility, vol. 20, no. 3, pp. 453–457, 1978.
[11] S. C. Chan and K. L. Ho, “A new two-dimensional fast cosine transform algorithm,” IEEE Transactions on Signal Processing, vol. 39, no. 2, pp. 481–485, 1991.
[12] H. R. Wu and F. J. Paoloni, “A two-dimensional fast cosine transform algorithm based on Hou's approach,” IEEE Transactions on Signal Processing, vol. 39, no. 2, pp. 544–546, 1991.
[13] V. Britanak and K. R. Rao, “Two-dimensional DCT/DST universal computational structure for $2^{m} \times 2^{n}$ block sizes,” IEEE Transactions on Signal Processing, vol. 48, no. 11, pp. 3250–3255, 2000.
[14] E. Feig and S. Winograd, “Fast algorithms for the discrete cosine transform,” IEEE Transactions on Signal Processing, vol. 40, no. 9, pp. 2174–2193, 1992.
[15] P. Duhamel and C. Guillemot, “Polynomial transform computation of the 2-D DCT,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '90), vol. 3, pp. 1515–1518, IEEE, Albuquerque, NM, USA, April 1990.
[16] J. Prado and P. Duhamel, “A polynomial-transform based computation of the 2-D DCT with minimum multiplicative complexity,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96), vol. 3, pp. 1347–1350, IEEE, Atlanta, Ga, USA, May 1996.
[17] N. I. Cho, I. D. Yun, and S. U. Lee, “A fast algorithm for 2-D DCT,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '91), pp. 2197–2200, Toronto, Canada, May 1991.
[18] N. I. Cho and S. U. Lee, “Fast algorithm and implementation of 2-D discrete cosine transform,” IEEE Transactions on Circuits and Systems, vol. 38, no. 3, pp. 297–305, 1991.
[19] N. I. Cho and S. U. Lee, “A fast 4×4 DCT algorithm for the recursive 2-D DCT,” IEEE Transactions on Signal Processing, vol. 40, no. 9, pp. 2166–2173, 1992.
[20] N. I. Cho, I. D. Yun, and S. U. Lee, “On the regular structure for the fast 2-D DCT algorithm,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 40, no. 4, pp. 259–266, 1993.
[21] A. Mohsen, O. Sharifi-Tehrani, and M. Peyman, “Optimizing hardware simulation and realization of discrete cosine transform using VHDL hardware description language,” Australian Journal of Basic and Applied Sciences, vol. 5, pp. 2040–2045, 2011.
[22] I. Martisius, D. Birvinskas, V. Jusas, and Z. Tamosevicius, “A 2-D DCT hardware codec based on Loeffler algorithm,” Elektronika ir Elektrotechnika, vol. 113, pp. 47–50, 2011.
[23] C.-T. Lin, Y.-C. Yu, and L.-D. Van, “Cost-effective triple-mode reconfigurable pipeline FFT/IFFT/2-D DCT processor,” IEEE Transactions on Very Large Scale Integration Systems, vol. 16, no. 8, pp. 1058–1071, 2008.
[24] W.-K. Cham, C.-S. Choy, and W.-K. Lam, “A 2-D integer cosine transform chip set and its application,” IEEE Transactions on Consumer Electronics, vol. 38, no. 2, pp. 43–47, 1992.
[25] A. Edirisuriya, A. Madanayake, R. J. Cintra, V. S. Dimitrov, and N. Rajapaksha, “A single-channel architecture for algebraic integer-based 8×8 2-D DCT computation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 12, pp. 2083–2089, 2013.
[26] H. S. Kwak, R. Srinivasan, and K. R. Rao, “C-matrix transform,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 31, no. 5, pp. 1304–1307, 1983.
[27] S.-C. Pei and J.-J. Ding, “The integer transforms analogous to discrete trigonometric transforms,” IEEE Transactions on Signal Processing, vol. 48, no. 12, pp. 3345–3364, 2000.
[28] V. Britanak, P. C. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations, chapters 3–5, Academic Press, Oxford, UK, 2006.

Copyright

Copyright © 2015 J. Augustin Jacob and N. Senthil Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
