Advances in Computer Engineering

Volume 2015, Article ID 405856, 10 pages

http://dx.doi.org/10.1155/2015/405856

## High Performance Discrete Cosine Transform Operator Using Multimedia Oriented Subword Parallelism

IRISA Lab-CAIRN, CNRS UMR 6074, Lannion, France

Received 27 August 2014; Revised 18 December 2014; Accepted 9 January 2015

Academic Editor: Jenhui Chen

Copyright © 2015 Shafqat Khan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In this paper, an efficient two-dimensional discrete cosine transform (DCT) operator is proposed for multimedia applications. It is based on the DCT operator proposed by Kovac and Ranganathan in 1995. Speed-up is obtained by using multimedia oriented subword parallelism (SWP). Rather than operating on a single pixel, the SWP-based DCT operator performs parallel computations on multiple pixels packed in word-size input registers, which increases the performance of the operator. Special emphasis is placed on matching subword sizes to pixel sizes in order to maximize the resource utilization rate. Rather than using classical subword sizes (8, 16, and 32 bits), multimedia oriented subword sizes (8, 10, 12, and 16 bits) are used in the proposed DCT operator. The proposed SWP DCT operator unit can be used as a coprocessor for multimedia applications.

#### 1. Introduction

In most multimedia applications, image or video data needs to be transmitted. To use the channel bandwidth efficiently, the image is compressed before transmission. Image compression is the process of reducing the size of an image without degrading its quality beyond acceptable limits. Compressed images and videos require less memory to store and less time to transmit. The discrete cosine transform (DCT) is the most widely used operation in video/image compression. Owing to the computational efficiency of the DCT, compression standards such as JPEG, MPEG, and H263.1 use the cosine transform to convert the input pixel values to the frequency domain. In the frequency domain, the transformed image is quantized to obtain the compression. The quantization is chosen so that the image can be reconstructed from the compressed data with minimum distortion. The compression ratio usually depends on the required quality of the reconstructed image.

Different algorithms and hardware architectures have been proposed to speed up the DCT computation process. Exploiting the symmetrical nature of the computations, these algorithms use different matrix properties to reduce the computational complexity. In [1–3], architectures for the two-dimensional DCT for JPEG image compression have been presented. They are based on an algorithm that computes the DCT in one dimension (1D-DCT); the 2D-DCT is then obtained through the separability property. This algorithm aims to minimize the number of arithmetic operations needed to obtain the DCT values. In [4], an implementation of the DCT and inverse DCT is proposed using the Weiping Li algorithm. The resultant design has a regular structure and requires seven constant parallel multipliers. In [5], the modified Loeffler algorithm (11 MUL and 29 ADD) is presented for DCT computations. In [6], fast parallel algorithms for the DCT-kernel-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented based on multirate signal processing. Similarly, in [7], hardware implementations of the DCT and IDCT on different FPGA technologies are presented.
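The separability property mentioned above can be illustrated with a short sketch. It is not the architecture of [1–3]; it only shows, under the assumption of an orthonormal DCT-II and with hypothetical helper names (`dct_1d`, `transpose`, `dct_2d`), how a 2D-DCT can be computed by applying a 1D-DCT to every row and then to every column.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence x (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def dct_2d(block):
    """2D-DCT by separability: 1D-DCT on every row, then on every column."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)

# A flat 8x8 block has all of its energy in the DC coefficient.
flat_block = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat_block)
print(round(coeffs[0][0], 6))
```

For an N×N block, the row-column approach needs 2N one-dimensional transforms instead of evaluating the full double sum for each of the N² coefficients, which is why hardware designs commonly reuse a single 1D-DCT unit.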

The performance of DCT implementations can be further improved by introducing parallelism in the arithmetic operations at the pixel level. For this purpose, multimedia oriented SWP capability can be used in the implementation of the DCT operator. In SWP [8, 9], subwords are packed in word-size registers and the same computations are performed on all packed subwords. In this paper, an SWP-based operator is proposed for DCT computations. Along with algorithmic efficiency, the performance of the DCT operator is increased through parallel computations on packed pixel data. In the proposed DCT operator, the match between pixel sizes and subword sizes is improved by using multimedia oriented subword sizes (8, 10, 12, and 16 bits) rather than classical subword sizes (8, 16, 32 bits, etc.). This match further enhances efficiency through better resource utilization.
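The idea of performing the same computation on packed subwords can be sketched in software. The following is an illustration only, not the proposed hardware operator: the helper names (`pack`, `unpack`, `swp_add`) are hypothetical, Python integers stand in for a word-size register, and the lane count of four 10-bit subwords is an assumption chosen for the example. Carries are prevented from crossing subword boundaries by adding the lower bits of each lane separately from the most significant bit.

```python
def pack(values, width):
    """Pack unsigned subwords into one integer, lane 0 in the least significant bits."""
    mask = (1 << width) - 1
    word = 0
    for lane, v in enumerate(values):
        word |= (v & mask) << (lane * width)
    return word

def unpack(word, width, lanes):
    """Split a packed word back into its subwords."""
    mask = (1 << width) - 1
    return [(word >> (lane * width)) & mask for lane in range(lanes)]

def swp_add(a, b, width, lanes):
    """Lane-wise modular addition; carries never cross a subword boundary."""
    msb = 0
    for lane in range(lanes):
        msb |= 1 << (lane * width + width - 1)   # top bit of every lane
    low = ((1 << (lanes * width)) - 1) ^ msb      # all remaining bits
    partial = (a & low) + (b & low)               # add without the top bits
    return partial ^ ((a ^ b) & msb)              # fold the top bits back in

# Four 10-bit pixels per word (assumed configuration for this example).
a = pack([1023, 5, 6, 7], 10)
b = pack([1, 1, 1, 1], 10)
print(unpack(swp_add(a, b, 10, 4), 10, 4))  # [0, 6, 7, 8]
```

Lane 0 wraps around from 1023 to 0 without disturbing lane 1, which is the behavior an SWP adder needs. The example also hints at why the subword width matters: a 10-bit pixel stored in a classical 16-bit subword would leave 6 of every 16 bits unused.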

The rest of the paper is organized as follows. Section 2 gives a brief overview of the DCT and the 8-point DCT operator used for computations. Section 3 presents the multimedia oriented SWP technique which is used in the SWP DCT operator. Section 4 explains the architecture of the two-dimensional SWP DCT operator. The functions of the different units used in the SWP DCT operator are also explained in detail. Section 5 presents the control unit for two-dimensional DCT computations. Section 6 describes the synthesis results of the SWP DCT operator. The performance analysis of the SWP DCT operator is also presented in this section. Finally the conclusion is presented in Section 7.

#### 2. Discrete Cosine Transform

In an image, most of the useful information is concentrated at low frequencies, whereas details such as sharp edges and abrupt transitions are carried by higher frequencies. Discarding some information at higher frequencies has little effect on the overall quality of the image. Therefore, in the compression process, high-frequency coefficients are discarded to a certain extent without causing substantial degradation in image quality. The pixel values of the image are stored in the time or spatial domain. In the first step, the image is converted from the time or spatial domain to the frequency domain using the DCT. Compression is then performed in the frequency domain: depending on the required compression ratio, certain information at higher frequencies is discarded. This loss of information reduces the quality of the image to some extent. This type of compression is called lossy compression because the lost information cannot be recovered. The compressed image can be stored in less memory or transmitted in less time than the uncompressed image. It can be converted back to the time or spatial domain using the inverse DCT. This process is shown in Figure 1.
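The transform, discard, and reconstruct steps described above can be sketched on a single row of eight pixels. This is a minimal illustration under stated assumptions: it uses an orthonormal 1D DCT-II, replaces real quantization with simply zeroing the highest-frequency coefficients, and the function names (`dct`, `idct`, `compress`, `rms_error`) and sample pixel values are hypothetical.

```python
import math

def _scale(k, n):
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(x):
    """Forward transform: spatial samples to frequency coefficients."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse transform: frequency coefficients back to spatial samples."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]

def compress(x, keep):
    """Keep the `keep` lowest-frequency coefficients, zero the rest, reconstruct."""
    c = dct(x)
    kept = c[:keep] + [0.0] * (len(c) - keep)
    return idct(kept)

def rms_error(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

pixels = [52, 55, 61, 66, 70, 61, 64, 73]  # illustrative sample values
for keep in (8, 4, 2):
    print(keep, round(rms_error(pixels, compress(pixels, keep)), 3))
```

Keeping all eight coefficients reproduces the input up to rounding error, while keeping fewer coefficients trades reconstruction accuracy for a smaller representation, which is the lossy behavior the paragraph describes.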