Abstract

A psychovisual experiment prescribes the quantization values in image compression. The quantization process acts as a threshold on the tolerance of the human visual system and reduces the number of encoded transform coefficients. It is challenging to generate an optimal quantization value based on the contribution of the transform coefficient at each frequency order. The psychovisual threshold represents the sensitivity of human visual perception to the image reconstruction at each frequency order. An ideal contribution of the transform at each frequency order is the primitive of the psychovisual threshold in image compression. This study proposes a psychovisual threshold on the large discrete cosine transform (DCT) image block, which is used to generate quantization tables automatically. The proposed psychovisual threshold prescribes the quantization values at each frequency order. The psychovisual threshold on the large image block provides a significant improvement in the quality of output images. The experimental results show that large quantization tables derived from the psychovisual threshold produce visual output images that are largely free of artifacts. In addition, the experimental results show that the psychovisual threshold produces better image quality at a higher compression rate than JPEG image compression.

1. Introduction

Most digital cameras implement a popular block transform in image coding [1]. Sequential block-based coding is a popular technique since it is compact and easy to implement. In standard JPEG image compression, an image is compressed one block of 8 × 8 pixels at a time. Block-based DCT coding has prevailed at reducing interpixel statistical redundancy [2]. However, at high compression ratios, the 8 × 8 image block size with default JPEG quantization tables produces discontinuities of intensity between adjacent image blocks. In image reconstruction, these discontinuities appear as visual artifacts because the correlation between adjacent blocks [3] is ignored. Block transform coding typically results in blocking artifacts at low bit rates. The blocking artifact is one of the most annoying problems [4].

The blocking effect becomes visible within smooth regions, where adjacent blocks of the input image are highly correlated. Since the blocks of image pixels are coded separately, the correlation among spatially adjacent blocks is lost and block boundaries become visible when the image is reconstructed [5]. The artifacts in the compressed output are introduced by two factors. First, the transform coefficients coming out of the quantization process are rounded and then inadequately dequantized. Second, the blocking artifacts arise from pixel intensity discontinuities which occur along block boundaries [6]. These blocking artifacts are often visually observable.

This research pays serious attention to the role of block size in transform image coding. In JPEG image compression, coding small blocks of image pixels line by line keeps the computational complexity low. Previously, a compression scheme on larger blocks was investigated by Pennebaker and Mitchell [7] for image compression. That scheme did not improve image compression because of the limited computing power of the central processing units available at the time. A larger block requires extra image buffering and higher precision in internal calculations. Nowadays, the computing power of central processing units grows rapidly. Therefore, a two-dimensional image transform on larger blocks is now practical for image compression.

In previous research, the psychovisual threshold has been investigated on small image block sizes [8–15]. This paper proposes a psychovisual threshold on the large image block of DCT in order to significantly reduce the blocking effect along the boundaries of small image blocks. This paper also discusses the process and apparatus to generate quantization tables via a psychovisual threshold.

The organization of this paper is as follows. The next section provides a brief overview on a psychoacoustic model. Section 3 discusses a brief description of the discrete cosine transform. Section 4 explains the development of psychovisual threshold on the large discrete cosine transform in image compression. Section 5 discusses a quality measurement on compressed output images. Section 6 shows the experimental results of quantization tables from psychovisual threshold in image compression. Lastly, Section 7 concludes this paper.

2. The Principle of Psychoacoustic Model

Psychoacoustics is the study of how humans perceive sound. Psychoacoustic studies show that a sound can only be heard at or above a certain sound pressure level (SPL), which varies across frequency orders [16]. Psychoacoustics indicates that human hearing has a selective sensitivity to different frequencies [17]. In a noise-free environment, the human ear requires different loudness levels at different frequency orders for a sound to be audible. The minimum loudness that a human can hear is called the absolute hearing threshold [18], as depicted in Figure 1.

The psychoacoustic model detects the human hearing threshold by incrementing the sound pressure level one bark at a time. The same principle has been used to measure the psychovisual threshold in image compression by incrementing the image frequency signal one unit at a time. A block of image signals, as represented by a transform coefficient at a given frequency, is incremented one unit at a time on each frequency order. A psychovisual threshold for image compression has been determined by calculating the just noticeable difference (JND) of the compressed image from the original image. This research investigates the contribution of the transform coefficients on each frequency order to the compressed output image. The average reconstruction error from the contribution of the DCT coefficients in image reconstruction is the primitive of the psychovisual threshold in image compression. This quantitative experiment has been conducted on image blocks.

3. Discrete Cosine Transform

The two-dimensional discrete cosine transform [19] has been widely used in image processing applications. The DCT is used to transform the pixel values to the spatial frequencies. These spatial frequencies represent the detailed level of image information. The standard JPEG compression uses DCT as shown in Figure 2 in image compression.

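This paper proposes a large image block of DCT set $c_n(x)$ of size $N = 256$, which can be generated iteratively as follows:

$$c_0(x) = \sqrt{\frac{1}{N}}, \qquad c_n(x) = \sqrt{\frac{2}{N}} \cos\frac{(2x+1)\,n\pi}{2N}, \quad n = 1, 2, \ldots, N-1,$$

for $x = 0, 1, \ldots, N-1$. The first four one-dimensional DCT above are depicted in Figure 3 for visual purposes.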

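The kernel for the DCT is derived from the following definition [20]:

$$c_n(x) = \lambda(n) \cos\frac{(2x+1)\,n\pi}{2N},$$

where

$$\lambda(n) = \begin{cases} \sqrt{1/N}, & n = 0, \\ \sqrt{2/N}, & n > 0, \end{cases}$$

for $x = 0, 1, \ldots, N-1$ and $n = 0, 1, \ldots, N-1$. The definition of the two-dimensional DCT of an input image $A$ and output image $B$ is given as follows [19]:

$$B_{pq} = \alpha_p \alpha_q \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} A_{mn} \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N},$$

for $p = 0, 1, \ldots, M-1$ and $q = 0, 1, \ldots, N-1$, where

$$\alpha_p = \begin{cases} \sqrt{1/M}, & p = 0, \\ \sqrt{2/M}, & p > 0, \end{cases} \qquad \alpha_q = \begin{cases} \sqrt{1/N}, & q = 0, \\ \sqrt{2/N}, & q > 0. \end{cases}$$

The inverse of the two-dimensional DCT is given as follows:

$$A_{mn} = \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} \alpha_p \alpha_q B_{pq} \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N},$$

for $m = 0, 1, \ldots, M-1$ and $n = 0, 1, \ldots, N-1$. The input image is subdivided into blocks of $256 \times 256$ image pixels. The DCT transforms the pixels in each image block into the frequency domain. Transforming one image block yields 65536 DCT coefficients. The first coefficient, in the upper left corner of the coefficient array, is called the direct current (DC) coefficient, and the rest of the coefficients are called the alternating current (AC) coefficients. The DC coefficient is proportional to the average value over the block.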
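The forward and inverse transforms described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes the orthonormal DCT-II normalization, and the helper names `dct_matrix`, `dct2`, and `idct2` are hypothetical. The block size of 256 is inferred from the 65536 coefficients mentioned in the text.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the n x n orthonormal DCT-II matrix C, where
    C[k, x] = alpha(k) * cos((2x + 1) * k * pi / (2n))."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    x = np.arange(n).reshape(1, -1)  # spatial index (columns)
    c = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)       # alpha(0) = sqrt(1/n)
    return c


def dct2(block: np.ndarray) -> np.ndarray:
    """Two-dimensional DCT of one image block (separable: rows, then columns)."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T


def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse two-dimensional DCT; the matrices are orthonormal, so the
    inverse is the transpose."""
    rows, cols = coeffs.shape
    return dct_matrix(rows).T @ coeffs @ dct_matrix(cols)


# Synthetic 256 x 256 block standing in for real image data (assumption).
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(256, 256)).astype(np.float64)
coeffs = dct2(block)
```

Under this normalization the DC coefficient `coeffs[0, 0]` equals 256 times the block mean, which is the sense in which it represents the average value over the block.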

4. Psychovisual Threshold on Large Discrete Cosine Transform

In this quantitative experiment, the DCT coefficients on each frequency order are incremented concurrently, one unit at a time, from 1 to 255. The impact of each increment is measured by the average absolute reconstruction error (ARE). The contribution of the DCT coefficients to the quality of image reconstruction and to the compression rate is analyzed on each frequency order. In order to develop the psychovisual threshold on DCT, the ARE on each frequency order is fitted to a smooth reconstruction error curve. The ideal average reconstruction error of incrementing DCT coefficients on each frequency order, for luminance and chrominance over 40 real images, is shown in Figure 4.
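The measurement described above can be illustrated with a small sketch. It is a simplified reading of the experiment, not the authors' code: it works on a single-channel block, treats the frequency order of coefficient (i, j) as i + j, adds the same increment to every coefficient on that order, and does not round the reconstruction. The function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)
    return c


def are_for_increment(block: np.ndarray, order: int, increment: float) -> float:
    """Average absolute reconstruction error after adding `increment` to
    every DCT coefficient whose frequency order i + j equals `order`."""
    n = block.shape[0]
    c = dct_matrix(n)
    coeffs = c @ block @ c.T                  # forward 2-D DCT
    i, j = np.indices(coeffs.shape)
    modified = coeffs.copy()
    modified[(i + j) == order] += increment   # perturb one frequency order
    reconstructed = c.T @ modified @ c        # inverse 2-D DCT
    return float(np.mean(np.abs(reconstructed - block)))


# Small synthetic block (assumption) to keep the example fast.
rng = np.random.default_rng(1)
small_block = rng.integers(0, 256, size=(8, 8)).astype(np.float64)
are_dc = are_for_increment(small_block, order=0, increment=16.0)
```

For order 0 the perturbation shifts every pixel by increment / N, so `are_dc` is 2.0 for an 8 × 8 block and an increment of 16. Averaging such values over many images and all increments would give one point per frequency order of a curve like the one the text describes.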

An ideal finer curve of ARE on a given order, from zero to the maximum frequency order 510, is presented by the red curve for the luminance channel and the blue curve for the chrominance channel. These finer curves of absolute reconstruction error are set as the psychovisual threshold on DCT. The ideal reconstruction error for each frequency order is determined by two factors: its contribution to the quality of image reconstruction and its contribution to the bit rate of image compression. The smooth curve of ARE is interpolated by a polynomial that represents the psychovisual error threshold on DCT in image compression. With reference to Figure 4, this paper proposes new psychovisual thresholds on the DCT basis function for luminance and chrominance, which are simplified as polynomial functions of the frequency order $x$, for $x = 0, 1, \ldots, 510$, where $x$ is a frequency order on the $256 \times 256$ image block. Further, these thresholds are used to generate smoother quantization values for image compression. The quantization table has 511 frequency orders, from order 0 to order 510. Order 0 resides in the top leftmost corner of the quantization matrix, at index (0,0). The first order covers the quantization values at (1,0) and (0,1). The second order covers (2,0), (1,1), and (0,2). All entries on the same frequency order are assigned the same quantization value.
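The mapping from one value per frequency order to a full quantization matrix can be sketched as follows. The per-order values used here are a hypothetical linear ramp chosen only to make the example runnable; they are not the thresholds proposed in the paper, and `quantization_table` is an illustrative name.

```python
import numpy as np


def quantization_table(order_values: np.ndarray, n: int) -> np.ndarray:
    """Build an n x n quantization matrix in which entry (i, j) takes the
    value assigned to its frequency order i + j."""
    if order_values.shape != (2 * n - 1,):
        raise ValueError("expected one value per frequency order 0 .. 2n - 2")
    i, j = np.indices((n, n))
    return order_values[i + j]


n = 256
# Placeholder per-order values (assumption): NOT the paper's polynomial.
placeholder_orders = np.linspace(16.0, 255.0, 2 * n - 1)
table = quantization_table(placeholder_orders, n)
```

With n = 256 this yields 511 frequency orders (0 to 510), and every anti-diagonal of `table` holds a single repeated value, as the text describes.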

Due to the large size of these new quantization values, it is not possible to present the whole quantization matrix within a limited space in this paper. Therefore, the quantization values are presented by traversing the quantization table on each frequency order in zigzag pattern as shown in Table 1. Table 1 presents one-dimensional index of quantization table on each frequency order. Each index represents the quantization value at those frequency orders. The new finer quantization tables from psychovisual threshold on DCT for luminance and chrominance are shown in Tables 2 and 3, respectively. The visualization of the whole quantization values on each frequency order from the psychovisual threshold for luminance and chrominance channels is depicted in Figure 5.

These new finer quantization tables have been generated from the psychovisual threshold functions in (7). Each entry of the traversing array in zigzag pattern represents the quantization value on one frequency order, from order 0 to order 510. The new smoother quantization table for luminance is designed to take smaller values than the quantization table for chrominance. A slight change on a given frequency order in the luminance channel generates a significantly greater reconstruction error than the same change in the chrominance channel. Slight changes in image intensity on the luminance channel produce visible textures that can be perceived by the human visual system.

The contribution of the frequency signals to the reconstruction error is mainly concentrated in the low frequency orders. Referring to Figure 1, the SPL values at frequencies from 2 kHz to 5 kHz are significantly lower. These quantization tables follow the same pattern in order to capture the concept of the psychoacoustic absolute hearing threshold. The quantization values from the psychovisual threshold on the chrominance channel are designed to be larger than the quantization values on the luminance channel. The human visual system is less sensitive to the chrominance channels, which carry a larger share of perceptually irrelevant image information. The smoother quantization tables will be tested and verified in image compression.

5. Quality Measurement

Two statistical evaluations have been conducted in this research project to verify the performance of the psychovisual threshold in image compression. In order to gain significantly better performance, the image compression algorithm needs to achieve a trade-off between average bit rate and quality of image reconstruction. The conventional quality assessments are employed in this paper. They are average absolute reconstruction error (ARE), mean square error (MSE), and peak signal to noise ratio (PSNR). The average reconstruction error can be defined as follows:

$$\mathrm{ARE} = \frac{1}{3MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \sum_{k=0}^{2} \left| g(i,j,k) - f(i,j,k) \right|,$$

where $f$ is the original image, $g$ is the reconstructed image, the original image size is $M \times N$, and the third index $k$ refers to the three RGB colors. The MSE calculates the average of the square of the error, defined as follows [21]:

$$\mathrm{MSE} = \frac{1}{3MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \sum_{k=0}^{2} \bigl( g(i,j,k) - f(i,j,k) \bigr)^2.$$

The standard PSNR is used to measure the quality of the image reconstruction. A higher PSNR means that the image reconstruction is more similar to the original image [22]. The PSNR is defined as follows:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}^2}{\mathrm{MSE}} \right),$$

where $\mathrm{MAX}$ is the maximum possible pixel value of the image. The structural similarity index (SSIM), another measurement of image quality, captures the similarity between the original image and the compressed image [23]. The SSIM is defined as follows:

$$\mathrm{SSIM}(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma},$$

where $l$, $c$, and $s$ are the luminance, contrast, and structure comparison functions, and $\alpha > 0$, $\beta > 0$, $\gamma > 0$ are parameters to adjust the relative importance of the three components. The detailed description is given in [23].
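The three conventional measures can be computed as in the following sketch. It assumes the common per-sample normalization (division by the total number of samples) and 8-bit images with a maximum value of 255; the function names are illustrative rather than taken from the paper.

```python
import numpy as np


def are(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Average absolute reconstruction error over all pixels and channels."""
    diff = reconstructed.astype(np.float64) - original.astype(np.float64)
    return float(np.mean(np.abs(diff)))


def mse(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Mean square error over all pixels and channels."""
    diff = reconstructed.astype(np.float64) - original.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(original: np.ndarray, reconstructed: np.ndarray,
         max_value: float = 255.0) -> float:
    """Peak signal to noise ratio in decibels; infinite for identical images."""
    error = mse(original, reconstructed)
    if error == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / error))


# Tiny synthetic RGB images (assumption): every sample differs by exactly 2.
original = np.zeros((2, 2, 3), dtype=np.uint8)
reconstructed = np.full((2, 2, 3), 2, dtype=np.uint8)
```

The inputs are cast to floating point before subtraction so that unsigned 8-bit arithmetic cannot wrap around.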

6. Experimental Results

This quantitative experiment has been conducted to investigate the performance of a psychovisual threshold on the large image block. The new finer quantization tables have been generated from the psychovisual threshold on DCT. They are tested on 40 real and 40 graphical high fidelity images. Each input image is a colour image whose RGB components are converted to the YCbCr color space. In this experiment, each image is divided into blocks of $256 \times 256$ pixels, and each block is transformed by the DCT. The DCT coefficients are quantized by the new finer quantization tables from the psychovisual threshold. The frequency distribution of the transformed coefficients after the quantization process is summarised by histograms in Figures 6 and 7, which compare the default JPEG quantization tables with the quantization tables from the psychovisual threshold for luminance and chrominance, respectively.
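The colour conversion and quantization steps in this pipeline can be sketched as below. The text does not state which YCbCr variant is used, so the full-range JFIF conversion is assumed here; uniform rounding of coefficient divided by table entry is likewise the usual JPEG-style quantizer rather than a detail confirmed by the paper. All function names are hypothetical.

```python
import numpy as np


def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an (..., 3) RGB array to YCbCr using the full-range JFIF
    coefficients (assumed variant)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def quantize(coeffs: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Divide each DCT coefficient by its table entry and round."""
    return np.round(coeffs / table).astype(np.int64)


def dequantize(levels: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Approximate the original coefficients from the quantized levels."""
    return levels * table


def zero_fraction(levels: np.ndarray) -> float:
    """Share of quantized coefficients that are exactly zero."""
    return float(np.mean(levels == 0))


# Toy coefficients and a flat table (assumptions) to show the rounding.
toy_coeffs = np.array([[11.0, -7.0], [3.0, 0.4]])
toy_table = np.full((2, 2), 4.0)
toy_levels = quantize(toy_coeffs, toy_table)
```

Larger table entries push more coefficients to zero, which is what the histograms of quantized coefficients are meant to reveal.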

The compression rate depends mainly on the contribution of the AC coefficients. The frequency coefficients after the quantization process consist of many zeros. The frequency distribution of the AC coefficients, as given by its histogram, can be used to predict the compression rate. A larger number of zeros in the histogram means that the compressed output needs a lower bit rate to represent an image.

According to Figures 6 and 7, quantization by the smoother quantization tables from the psychovisual thresholds produces significantly more zeros for both the luminance and chrominance channels. These finer quantization tables produce a smaller standard deviation of the AC coefficients on the large image block than on the small image block. Hence, it is possible to code the large transformed block using a smaller number of bits for the same image.

The transform coding consists of 65535 AC coefficients for each regular block. Most AC coefficients are naturally small after the quantization process. The psychovisual threshold on large DCT determines an optimal contribution of the AC coefficients of large transform coding. The new quantization tables from the psychovisual threshold are able to discard the irrelevant AC coefficients. The distribution of the transform coefficients indicates how many transform coefficients will be encoded by lossless Huffman coding.

An average Huffman code is calculated from the quantized transform coefficients. After the transformation and quantization of image block are over, the direct current (DC) coefficient is separated from the AC coefficients. The AC coefficients are listed as a traversing array in zigzag pattern as shown in Figure 8.
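Separating the DC coefficient and listing the AC coefficients in zigzag order can be sketched as follows. The traversal direction follows the familiar JPEG convention, which is an assumption here since the pattern of Figure 8 is not reproduced in the text; the function names are illustrative.

```python
import numpy as np


def zigzag_indices(n: int) -> list[tuple[int, int]]:
    """Return the (row, column) pairs of an n x n block in zigzag order.
    Entries are grouped by anti-diagonal i + j; odd diagonals run downward
    (row increasing) and even diagonals run upward (column increasing)."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def split_dc_ac(block: np.ndarray) -> tuple[int, list[int]]:
    """Separate the DC coefficient from the zigzag-ordered AC coefficients."""
    order = zigzag_indices(block.shape[0])
    values = [int(block[i, j]) for i, j in order]
    return values[0], values[1:]


# A 3 x 3 block numbered row by row makes the traversal easy to follow.
demo = np.arange(9).reshape(3, 3)
dc, ac = split_dc_ac(demo)
```

For the 3 × 3 demonstration block, `dc` is 0 and `ac` is `[1, 3, 6, 4, 2, 5, 7, 8]`. A 256 × 256 block would give one DC value and 65535 AC values in the same manner.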

Next, run length encoding is used to shorten runs of repeated coefficients in the sequence of AC coefficients. A run can be represented compactly by indicating the coefficient value and the length of its run wherever it appears. The output of run length coding is therefore a sequence of symbols, each pairing a value with its run length. The probability of occurrence of each symbol is then used in Huffman coding to assign code words and their lengths. Using these probability values, a set of Huffman codes for the symbols can be generated by a Huffman tree. Finally, the average bit length of the AC coefficients is calculated from the code word lengths.
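The run length and Huffman steps can be sketched as below. This is a generic textbook construction under stated assumptions, not the JPEG entropy coder or the authors' implementation: each (value, run length) pair is treated as one symbol, and the average bit length is the frequency-weighted mean of the Huffman code word lengths. The function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import groupby


def run_length_encode(sequence):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(sequence)]


def huffman_code_lengths(symbols):
    """Return the Huffman code word length of each distinct symbol."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # a lone symbol still needs 1 bit
        return {next(iter(freq)): 1}
    # Heap items: (count, tie_breaker, symbols in this subtree).
    heap = [(count, idx, [sym]) for idx, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:                 # merged symbols move one level deeper
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, next_id, s1 + s2))
        next_id += 1
    return lengths


def average_bit_length(symbols):
    """Frequency-weighted mean Huffman code word length, in bits per symbol."""
    freq = Counter(symbols)
    lengths = huffman_code_lengths(symbols)
    return sum(freq[s] * lengths[s] for s in freq) / sum(freq.values())


# Toy AC sequence (assumption) with long zero runs, as after quantization.
toy_ac = [5, 5, 0, 0, 0, 0, 3, 0, 0]
toy_runs = run_length_encode(toy_ac)
```

Here `toy_runs` is `[(5, 2), (0, 4), (3, 1), (0, 2)]`; passing such pairs to `average_bit_length` gives the kind of per-symbol bit cost that the comparison of quantization tables relies on.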

There are only four DC coefficients under regular DCT. The maximum code length of the DC coefficient is 16 bits. The quantization process reduces the DC coefficients by 4 bits, so their average bit length is 12 bits after quantization. The average bit length of image compression based on the default JPEG quantization tables and the finer quantization tables from the psychovisual threshold is shown in Table 4. The experimental results show that the new large quantization tables from the psychovisual threshold produce a lower average bit length of Huffman code than the default JPEG quantization tables.

The DCT coefficients from a large image block have been greatly discounted by quantization tables in image compression. An optimal amount of DCT coefficients is investigated by reconstruction error and average bit length of Huffman code. The effect of incrementing DCT coefficient has been explored from this experiment. The average reconstruction error from incrementing DCT coefficients is mainly concentrated in the low frequency order of the image signals.

The new quantization table from the psychovisual threshold produces a lower average bit length of Huffman code in image compression as shown in Table 4. At the same time, the compressed output images produce a better quality image reconstruction than the regular default JPEG quantization tables as listed in Table 5. The new design on quantization tables from psychovisual threshold performs better by producing higher quality in image reconstruction at lower average bit length of Huffman code.

The average bit size of image compression as presented in Table 6 shows that the finer quantization tables from the psychovisual threshold use fewer bits. Therefore, the ratio of the original image size to the size of the image compressed with the new large quantization table is higher than the compression ratio of standard JPEG image compression, as shown in Table 7. In order to observe the visual quality of the output image, a sample of the original baboon right eye is zoomed in to 400% as depicted on the right of Figure 9.

The image compression output of the quantization table from the psychovisual threshold is shown on the right of Figure 10. Visual inspection shows that the output image using quantization tables from the psychovisual threshold retains richer texture in the baboon image. The psychovisual threshold on the image block gives a balance between the fidelity of image reconstruction and the compression rate. The experimental results show that the psychovisual threshold on large DCT provides lower image reconstruction error at lower bit rates. The JPEG image compression output as depicted on the left of Figure 10 contains blocking artifacts under regular block transform coding. In contrast, the smoother quantization tables from the psychovisual threshold manage to overcome the blocking effects along the block boundaries. These finer quantization tables from the psychovisual threshold provide high quality images with fewer artifacts. The psychovisual threshold is a practical measure of the optimal amount of transform coefficients for image coding.

7. Conclusion

This research project has been designed to support large block sizes in practical image compression in the near future. A step-by-step procedure has been developed to produce a psychovisual threshold on the block transform. The use of the psychovisual threshold on a large image block is able to overcome the blocking artifacts which often occur in standard JPEG compression. The psychovisual threshold on the discrete transform has been used to determine an optimal amount of frequency transform coefficients to code by generating quantization tables. Naturally, a frequency transform on a large image block is capable of reducing redundancy and better exploiting pixel correlation within the image block. The new set of quantization tables from the psychovisual threshold produces better performance than JPEG image compression. These quantization tables provide higher quality images at lower bit rates for image compression applications.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors would like to express special thanks to the Ministry of Education, Malaysia, for providing financial support to this research project through the Fundamental Research Grant Scheme (FRGS/2012/FTMK/SG05/03/1/F00141).