Abstract

Images with subband coding and threshold wavelet compression are transmitted over a Rayleigh communication channel with additive white Gaussian noise (AWGN), after quantization and 16-QAM modulation. A comparison is made between these two types of compression using both mean square error (MSE) and structural similarity (SSIM) image quality assessment (IQA) criteria applied to the reconstructed image at the receiver. The two methods yielded comparable SSIM but different MSE measures. In this work, we justify our results which support previous findings in the literature that the MSE between two images is not indicative of structural similarity or the visibility of errors. It is found that it is difficult to reduce the pointwise errors in subband-compressed images (higher MSE). However, the compressed images provide comparable SSIM or perceived quality for both types of compression provided that the retained energy after compression is the same.

1. Introduction

Images require considerable storage space and transmission bandwidth. It is therefore desirable to store only the essential information in an image, which gives rise to the need for image compression techniques. Naturally, the amount of this essential information is dictated by how well the image must be reconstructed. An image is treated as a matrix of pixel (picture element) values that are intensity dependent. Compression exploits image redundancy, which is manifest in areas where adjacent pixels have nearly the same values, and is achieved by removing this redundancy.

Effective compression of image data is based on the discrete cosine transform (DCT) and the discrete wavelet transform (DWT). This transform-based compression transforms the two-dimensional data or image from the spatial domain to the frequency domain and then keeps only the useful information by removing redundancies. The human visual system (HVS) is more sensitive to energy with low spatial frequencies than to that with high spatial frequencies. Although DCT compression is less expensive, the DWT offers adaptive spatial-frequency resolution that is well suited to the properties of the HVS [1].

Digital images are prone to distortion during compression and transmission resulting in quality degradation. The most widely used IQA metric is the MSE which is computed by averaging the squared intensity differences of the distorted and reference images. MSE is simple to compute but does not always relate to perceived visual quality [2, 3].

In [2], a measure of structural similarity (SSIM) has been developed. The SSIM index compares local patterns (structures) of pixel intensities that have been normalized for luminance and contrast. SSIM is based on the fact that perceptual quality is best estimated by quantifying the visibility of errors; SSIM therefore is designed to compare the structures of the reference and distorted images. This IQA is based on the hypothesis that the human visual system (HVS) is highly adapted for extracting structural information [2].

A more recent improvement on SSIM is the complex wavelet SSIM (CW-SSIM) [4]. It compares wavelet coefficients extracted at the same spatial locations in the same wavelet subbands of the two images being compared. The CW-SSIM is insensitive to small geometric distortions such as small rotations, translations, and small differences in scale. The reason for this desirable insensitivity is that such distortions lead to consistent phase changes in the local wavelet coefficients which do not change the structural content of the image [3, 4]. Since we are not concerned in this paper with these particular types of distortion, the CW-SSIM will not be discussed further. It should be stressed, however, that CW-SSIM is a natural extension and improvement on the SSIM index. In [5], a method is proposed to further improve IQA by adding a color comparison to the criteria of the SSIM index. Note that the above-mentioned similarity measures are all based on statistical moments, which is the focus of this paper, while there are other moments that can also be used to test similarity [6].

In this paper, the IQA problem is treated within an information communication framework. Still images are the signals under consideration. The transmitter performs wavelet compression, quantization of the wavelet coefficients, conversion of the quantized coefficients to binary, Gray encoding, and 16-QAM modulation. The signal is then transmitted over a Rayleigh fading channel [7, 8] and is contaminated by AWGN. The receiver performs demodulation, Gray decoding back to binary, and conversion back to decimal to yield the received wavelet coefficients. The images are then reconstructed by the inverse discrete wavelet transform (IDWT). Naturally, they are distorted due to lossy compression and the addition of AWGN.
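As a rough illustration of the channel stage in this framework, the following Python sketch passes complex symbols through flat Rayleigh fading with AWGN. The function name `rayleigh_awgn_channel`, the independent fading gain per symbol, and the ideal equalization (division by a perfectly known channel gain) are assumptions made for illustration only; the text does not specify these details.

```python
import math
import random

def rayleigh_awgn_channel(symbols, snr_db, rng):
    """Flat Rayleigh fading plus AWGN, followed by ideal equalization.

    Assumptions (not taken from the paper): one independent fading gain
    per symbol, and a receiver that knows each gain exactly.
    """
    avg_power = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    noise_power = avg_power / (10.0 ** (snr_db / 10.0))
    sigma_n = math.sqrt(noise_power / 2.0)   # per real dimension
    sigma_h = math.sqrt(0.5)                 # gives E[|h|^2] = 1
    received = []
    for s in symbols:
        h = complex(rng.gauss(0.0, sigma_h), rng.gauss(0.0, sigma_h))
        n = complex(rng.gauss(0.0, sigma_n), rng.gauss(0.0, sigma_n))
        r = h * s + n                        # faded, noisy observation
        received.append(r / h)               # undo the (known) fading gain
    return received

rng = random.Random(1)
tx = [complex(3, 1), complex(-1, -3), complex(1, 1), complex(-3, 3)]
rx_noisy = rayleigh_awgn_channel(tx, 10.0, rng)   # visible distortion
rx_clean = rayleigh_awgn_channel(tx, 300.0, rng)  # essentially noise free
```

Because the equalizer divides by the fading gain, deep fades amplify the noise, which is one reason errors grow quickly at low SNR in such a model.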

The two above-mentioned IQA metrics (MSE and SSIM) are computed and compared for the reconstructed image when two different types of lossy wavelet compression are employed at the transmitter, namely, subband coding and threshold compression, and for different values of signal-to-noise ratio (SNR). The images used have strong edge structures. It is found that it is difficult to reduce the pointwise errors in the subband-compressed image (higher MSE). However, the compressed images provide comparable SSIMs (i.e., perceived quality) for both types of compression.

The paper is organized as follows. Section 2 introduces subband and threshold wavelet image compression. Section 3 reviews different IQA measures that will be used in this work. Simulation results are presented and discussed in Section 4 in a visual communication framework. Finally, Section 5 concludes the paper.

2. Subband Coding and Threshold Wavelet Image Compression

The importance of wavelet analysis stems from its adoption of the concept of multiresolution. Different parts of the data are viewed through windows of different sizes. High frequency parts of the data use a small window to give good time resolution, whereas low frequency parts use a big window to get good frequency resolution [9]. Wavelet analysis takes a “mother wavelet,” and the signal is then decomposed into shifted and scaled versions of this mother wavelet.

The continuous wavelet transform (CWT) of a function of time $f(t)$ is the sum over all time of the function multiplied by scaled, shifted versions of a wavelet function $\psi$, as follows:

$$C(\text{scale}, \text{position}) = \int_{-\infty}^{\infty} f(t)\,\psi(\text{scale}, \text{position}, t)\,dt.$$

The results of the CWT are many wavelet coefficients $C$, which are a function only of scale and position.

In the discrete wavelet transform (DWT), we choose scale and position based on powers of two, that is, dyadic scale and position. Thus, wavelet analysis consists of decomposing a signal or image into a hierarchical set of approximations and details. The levels in the hierarchy often correspond to those in a dyadic scale. Wavelet analysis is a decomposition of the signal or image on a family of analyzing signals, which is usually an “orthogonal function” method [10].

The two-dimensional DWT analyses the image into approximation and detail subsignals. The approximation subsignal shows the general trends of the pixel values, whereas the detail subsignals show the vertical, horizontal, and the diagonal details or changes in the image.
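The split into one approximation and three detail subsignals can be sketched with a single-level 2D transform. The example below uses the Haar wavelet rather than the “db3” wavelet employed in this work, purely because its filters are short enough to write out by hand; the function name `haar_dwt2` is hypothetical.

```python
def haar_dwt2(image):
    """One-level 2D Haar DWT of a list-of-lists image with even dimensions.

    Returns (approx, horizontal, vertical, diagonal); each subsignal has
    half the rows and half the columns of the input.
    """
    rows, cols = len(image), len(image[0])
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            p, q = image[r][c], image[r][c + 1]
            s, t = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2.0)  # local trend
            h_row.append((p + q - s - t) / 2.0)  # change between rows (horizontal edges)
            v_row.append((p - q + s - t) / 2.0)  # change between columns (vertical edges)
            d_row.append((p - q - s + t) / 2.0)  # diagonal change
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag

approx, horiz, vert, diag = haar_dwt2([[1, 2], [3, 4]])
flat = [[10] * 4 for _ in range(4)]
flat_approx, flat_h, flat_v, flat_d = haar_dwt2(flat)
```

With this orthonormal scaling the sum of squared coefficients equals the sum of squared pixel values, and a constant image produces detail subsignals that are identically zero.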

The following is a discussion of subband coding and threshold wavelet image compression schemes.

2.1. Image Compression by Wavelet Subband Coding

Wavelet subband coding considerably impacts the practice of signal and image compression. The advantageous feature of this technique is the ability to decode only part of the bit stream to get a first approximation of the data. This is useful in applications involving both compression and communication where energy is compacted into low frequency wavelet coefficients. If transmission issues are taken into account, successive approximation coding together with appropriate communication schemes can be the best both in theory and practice [11]. This is why, in this work, we study wavelet compression and IQA in a visual communication framework.

Subband coding involves segmentation of the data into an important part, which is a coarse first approximation of the signal, and a part that provides the details, which are typically of high frequencies. During transmission, the coarse approximation is sent with a high probability of arriving successfully.

The quality of the compressed image depends on the number of decompositions used. The larger the number of decompositions, the better the distinction of important DWT coefficients from less important coefficients. The HVS is less sensitive to the removal of smaller details, a fact by which the success of subband coding wavelet image compression is justified.
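A minimal one-dimensional sketch of this idea follows: the signal is decomposed a chosen number of times, every detail band is discarded, and the signal is rebuilt from the approximation alone. The Haar wavelet stands in for “db3” to keep the code short, and the helper names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_step(signal):
    """Split an even-length signal into approximation and detail halves."""
    approx = [(signal[i] + signal[i + 1]) * S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * S for i in range(0, len(signal), 2)]
    return approx, detail

def inverse_haar_step(approx, detail):
    """Rebuild the signal from one level of approximation and detail."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * S, (a - d) * S])
    return out

def subband_compress(signal, levels):
    """Decompose `levels` times, drop every detail band, and reconstruct."""
    approx = list(signal)
    for _ in range(levels):
        approx, _discarded = haar_step(approx)
    for _ in range(levels):
        approx = inverse_haar_step(approx, [0.0] * len(approx))
    return approx

one_level = subband_compress([1.0, 3.0, 5.0, 7.0], 1)
two_levels = subband_compress([1.0, 3.0, 5.0, 7.0], 2)
```

For the Haar case, each extra level replaces the signal by averages over blocks twice as long, so the general trend survives while finer detail is lost.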

2.2. Image Compression by Wavelet Thresholding

After decomposing the image and representing it by wavelet coefficients, compression may be realized by ignoring all coefficients below some threshold value. A global positive threshold value is used in this work. The amount of information retained by an image after decomposition and compression is known as the “retained energy” and is proportional to the sum of the squares of the pixel values. If the retained energy is 100%, the compression is termed “lossless” and the image may be perfectly reconstructed. This implies no change in detail values. Otherwise, some energy is lost, and the compression is termed “lossy.” Ideally, after compression, both the number of zeros and the retained energy should be as high as possible. Since zeroing more coefficients lowers the retained energy, a balance between the two is needed. The higher the decomposition level, the higher the number of zeros even without thresholding.

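Consider the percentage retained energy

$$\text{RE} = 100 \times \frac{\|\mathbf{c}_c\|_2^2}{\|\mathbf{c}\|_2^2},$$

where $\|\mathbf{c}_c\|_2$ is the norm of the compressed wavelet coefficients vector and $\mathbf{c}$ is the original image wavelet coefficient vector.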
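The trade-off between zeros and retained energy can be made concrete with a short Python sketch. It assumes hard thresholding on coefficient magnitude and takes the retained energy as the ratio of squared 2-norms after and before thresholding, expressed as a percentage; the function names and the sample coefficients are hypothetical.

```python
def hard_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the global threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]

def retained_energy_percent(original, compressed):
    """Percentage of the squared 2-norm that survives compression."""
    energy_original = sum(c * c for c in original)
    energy_compressed = sum(c * c for c in compressed)
    return 100.0 * energy_compressed / energy_original

def zeros_percent(coeffs):
    """Percentage of coefficients that are exactly zero."""
    return 100.0 * sum(1 for c in coeffs if c == 0.0) / len(coeffs)

coeffs = [10.0, -0.5, 3.0, 0.2, -4.0]     # made-up wavelet coefficients
kept = hard_threshold(coeffs, 1.0)
energy = retained_energy_percent(coeffs, kept)
zeros = zeros_percent(kept)
```

Here two of the five coefficients are zeroed while the retained energy stays above 99%, because the discarded coefficients were small. Raising the threshold past 3.0 would zero a third coefficient at a much larger cost in energy.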

More reliable compression is achieved by using local thresholds, but this is not dealt with in this work. Wavelets that compact most of the energy in the approximation subsignal provide the best compression because a large number of coefficients in the detailed subsignal are set to zero. Different images will also have different compressibility [9].

3. Image Quality Assessment (IQA) Techniques

IQA methods are considerably important where digital images are subject to distortion during compression, transmission, and reproduction, all of which may degrade the visual quality. Therefore, IQA measures are developed to predict the perceived image quality. The simplest and most widely used IQA metric is the mean square error (MSE). MSE is computed by averaging the squared intensity differences between the reference or original and distorted image pixels. But two distorted images with the same MSE may have very different types of errors, some of which are more visible than others. Thus, there arises the need to weight different aspects of the error signal according to their visibility. As a result, IQA is rather based on the hypothesis that the HVS is highly adapted for extracting structural information [2]. A measure of structural similarity (SSIM) compares local patterns of pixel values that are normalized for luminance and contrast.

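The SSIM attempts to compare the structures of the reference and distorted image. An image is said to be highly structured if its pixels exhibit strong dependencies, especially when they are spatially close. Besides, structural information in an image is an attribute that represents the structure of objects in the image, independent of the average luminance and contrast. First, the luminance of each image is compared. This luminance is the estimated mean intensity:

$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i.$$

The luminance comparison function is then denoted by $l(x, y)$, which is a function of $\mu_x$ and $\mu_y$. The original and distorted images are denoted by $x$ and $y$, respectively.

Second, we use the standard deviation, or the square root of the variance, as an estimate of the signal contrast. The contrast comparison $c(x, y)$ is then a comparison between $\sigma_x$ and $\sigma_y$.

Third, the signals are normalized or divided by their own standard deviations, so the two signals being compared have unit standard deviation. The structure comparison $s(x, y)$ is conducted on the normalized signals $(x - \mu_x)/\sigma_x$ and $(y - \mu_y)/\sigma_y$.

Now, the three components are combined to yield the overall similarity measure as a function of the three of them:

$$S(x, y) = f\big(l(x, y),\, c(x, y),\, s(x, y)\big).$$

In [1], the luminance comparison is defined as

$$l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}.$$

The constant $C_1$ is included to avoid instability when the denominator is close to zero. Specifically, $C_1 = (K_1 L)^2$, where $K_1$ is taken to be much smaller than unity and $L$ is the dynamic range of the pixel values.

Similarly, the contrast comparison is defined as

$$c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$$

and $C_2 = (K_2 L)^2$. Again, $K_2$ is much smaller than unity.

Structure comparison is conducted after luminance subtraction and variance normalization. Therefore,

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3},$$

where

$$\sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y).$$

Combining the three comparison functions results in the SSIM index:

$$\text{SSIM}(x, y) = [l(x, y)]^{\alpha}\,[c(x, y)]^{\beta}\,[s(x, y)]^{\gamma}.$$

In order to simplify the above expression, we set $\alpha = \beta = \gamma = 1$ and $C_3 = C_2/2$ to obtain

$$\text{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}.$$

In practice, all means, variances, and standard deviations are computed within a local square window which moves pixel by pixel over the entire image. Thus, if $M$ is the number of windows, a single overall mean SSIM (MSSIM) measure is given by

$$\text{MSSIM}(X, Y) = \frac{1}{M}\sum_{j=1}^{M}\text{SSIM}(x_j, y_j),$$

where $x_j$ and $y_j$ are the image contents in the $j$th local window.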
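The simplified SSIM index and its windowed mean can be sketched in a few lines of Python. The sketch assumes a uniform (unweighted) square window, an 8-bit dynamic range of 255, and the constants 0.01 and 0.03 for the two small stabilizing factors, which are the defaults commonly used with this index; none of these values is taken from the present experiments, and the function names are hypothetical.

```python
def ssim_window(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Simplified SSIM of two equally long lists of pixel values."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / (n - 1)
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator

def mssim(image_x, image_y, window=8):
    """Mean SSIM over a square window that slides one pixel at a time."""
    rows, cols = len(image_x), len(image_x[0])
    scores = []
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            wx = [image_x[i][j] for i in range(r, r + window) for j in range(c, c + window)]
            wy = [image_y[i][j] for i in range(r, r + window) for j in range(c, c + window)]
            scores.append(ssim_window(wx, wy))
    return sum(scores) / len(scores)

reference = [[(7 * i + 13 * j) % 256 for j in range(10)] for i in range(10)]
inverted = [[255 - v for v in row] for row in reference]
same_score = mssim(reference, reference)
inverted_score = mssim(reference, inverted)
```

An image compared with itself scores one, whereas the intensity-inverted copy scores far lower because its covariance with the reference is negative in every window.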

4. Results and Discussion

4.1. Materials and Methods

The Wavelet Toolbox is a collection of functions built on the MATLAB Technical Computing Environment. It provides tools for the analysis and synthesis of signals and images using wavelets and wavelet packets within the framework of MATLAB.

In this work, the image to be transmitted is MATLAB’s “earth” [10]. The size of this image is pixels. To achieve image compression, wavelet decomposition is performed at level using the Daubechies wavelet “db3.”

In wavelet analysis, we speak of approximations and details. The approximations are the high-scale, low-frequency components of the signal. The details are the low-scale, high-frequency components. Thus, the signal passes through two complementary filters (low pass and high pass, respectively) and emerges as two signals. These are downsampled by two so that, together, they contain the same number of samples as the original signal. The resulting signals are the DWT coefficients. This decomposition can be iterated on the approximation, in principle until it consists of a single sample. In practice, however, we select a suitable number of levels based on the nature of the signal.
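The filter-and-downsample step can be sketched directly as a convolution followed by discarding every other output sample. The two-tap Haar filters are used here as an assumption for brevity (the “db3” filters have six taps each), and the function names are hypothetical.

```python
import math

R = 1.0 / math.sqrt(2.0)
LOW_PASS = [R, R]      # Haar low-pass analysis filter
HIGH_PASS = [R, -R]    # Haar high-pass analysis filter

def convolve(signal, taps):
    """Full linear convolution of a signal with a short FIR filter."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n in range(len(out)):
        for k, h in enumerate(taps):
            if 0 <= n - k < len(signal):
                out[n] += h * signal[n - k]
    return out

def analyze(signal):
    """Filter with both filters, then keep every second output sample."""
    approx = convolve(signal, LOW_PASS)[1::2]
    detail = convolve(signal, HIGH_PASS)[1::2]
    return approx, detail

def synthesize(approx, detail):
    """Invert `analyze` for the Haar filters above."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a - d) * R, (a + d) * R])
    return out

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = analyze(signal)
rebuilt = synthesize(approx, detail)
```

Each output has half the length of the input, so the coefficient count is unchanged, and the synthesis step recovers the signal, which is the perfect reconstruction property that the choice of filters has to guarantee.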

The choice of the aforementioned filters not only determines whether perfect reconstruction is possible but also determines the shape of the wavelet we use to perform the analysis. Decomposition and reconstruction filter coefficients are computed in MATLAB for a variety of wavelet functions.

Crucial properties of wavelets in image compression are compact support, symmetry, orthogonality, regularity, and degree of smoothness. Daubechies and Coiflet wavelets are orthogonal and compactly supported. Compactly supported wavelets correspond to finite-impulse-response (FIR) filters and therefore lead to efficient implementation. The names of the Daubechies family wavelets are written dbN, where N is the order and db is the “surname” of the wavelet. The Daubechies wavelet “db3” will be used in this work [1, 12].

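The Daubechies wavelets have no explicit expression except for db1. However, the square modulus of the transfer function of $h$ is explicit and fairly simple. Here, $h$ is the sequence containing the constituent samples of the wavelet function. If we define a function $P(y)$ as

$$P(y) = \sum_{k=0}^{N-1} C_k^{N-1+k}\, y^k,$$

where $C_k^{N-1+k}$ are the binomial coefficients, then

$$|m_0(\omega)|^2 = \left(\cos^2\frac{\omega}{2}\right)^{N} P\!\left(\sin^2\frac{\omega}{2}\right),$$

where

$$m_0(\omega) = \frac{1}{\sqrt{2}}\sum_{k=0}^{2N-1} h_k\, e^{-ik\omega}.$$

Clearly, the support of $h$ spans the $2N$ indices $k = 0, \ldots, 2N-1$, so the wavelet has a support of length $2N-1$.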
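A quick numerical check of this expression is sketched below in Python. It assumes the standard form in which the squared modulus equals the Nth power of cos²(ω/2) multiplied by the polynomial P evaluated at sin²(ω/2), with P built from binomial coefficients, and verifies the power-complementary property that orthogonal filters must satisfy. The function names are hypothetical.

```python
import math

def daubechies_p(order, y):
    """P(y) = sum over k < order of binom(order - 1 + k, k) * y**k."""
    return sum(math.comb(order - 1 + k, k) * y ** k for k in range(order))

def squared_modulus(order, omega):
    """|m0(omega)|^2 for the Daubechies filter of the given order."""
    cos_sq = math.cos(omega / 2.0) ** 2
    sin_sq = 1.0 - cos_sq
    return cos_sq ** order * daubechies_p(order, sin_sq)

# Orthogonality requires |m0(w)|^2 + |m0(w + pi)|^2 = 1 at every frequency.
frequencies = [0.0, 0.3, 1.0, 2.5]
sums_db3 = [squared_modulus(3, w) + squared_modulus(3, w + math.pi) for w in frequencies]
```

For order 3 the polynomial is 1 + 3y + 6y², and the two terms sum to one at every tested frequency. For order 1 the expression reduces to cos²(ω/2), which matches the two-tap Haar (db1) filter.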

The Wavelet Toolbox supports indexed images with linear monotonic color maps. These images can be thought of as scaled intensity images, with matrix elements containing only integers from 1 to $n$, where $n$ is the number of discrete shades in the image. The elements of the image matrix are integer-valued, but MATLAB stores them as double-precision floating-point numbers.

In the subband coding method of wavelet image compression, we discard the detail coefficients while retaining the approximation coefficients. In the thresholding method of wavelet image compression, we set a suitable threshold and force all values under it to zero while, at the same time, retaining most of the energy of the original image. In general, the threshold sets the trade-off: a more aggressive threshold zeroes more coefficients but retains less energy.

Prior to transmission, the wavelet coefficients are quantized. MATLAB quantization is achieved by a codebook that tells the quantizer which common value to assign to inputs that fall into each range of the partition; this codebook is a vector whose length is the same as the number of partition intervals. Thereafter, the values are converted to binary and are then PCM-Gray-encoded and 16-QAM-modulated [13]. At the receiver, all these operations are reversed to reconstruct the image.
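The transmitter chain described above can be sketched as follows. The partition-and-codebook quantizer mirrors the idea in the text but is a hypothetical helper, not a toolbox call. The bit-to-symbol layout (two bits per axis, levels −3, −1, 1, 3, Gray-labelled so that neighbouring levels differ in one bit) is likewise an assumption, since the exact constellation labelling is not given here.

```python
def quantize(value, partition, codebook):
    """Return (index, quantized value); len(codebook) == len(partition) + 1."""
    index = sum(1 for edge in partition if value > edge)
    return index, codebook[index]

def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

# Gray-labelled amplitude levels: 00 -> -3, 01 -> -1, 11 -> +1, 10 -> +3.
LEVELS = {to_gray(i): 2 * i - 3 for i in range(4)}
LABELS = {level: bits for bits, level in LEVELS.items()}

def qam16_modulate(nibble):
    """Map 4 bits to a 16-QAM symbol (upper 2 bits -> I, lower 2 bits -> Q)."""
    return complex(LEVELS[nibble >> 2], LEVELS[nibble & 0b11])

def qam16_demodulate(symbol):
    """Nearest-level decision on each axis, then back to 4 bits."""
    def nearest(x):
        return min(LABELS, key=lambda level: abs(level - x))
    return (LABELS[nearest(symbol.real)] << 2) | LABELS[nearest(symbol.imag)]

index, level = quantize(1.3, [0.0, 1.0, 2.0], [-0.5, 0.5, 1.5, 2.5])
symbol = qam16_modulate(0b1011)
decoded = qam16_demodulate(symbol + complex(0.4, -0.3))  # mild noise
```

With Gray labelling, a decision error to an adjacent level flips only one of the four bits, which limits the damage that a small amount of noise does to the recovered quantizer index.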

4.2. Results Discussion

We have performed wavelet analysis for both types of compression with a decomposition level of 2 for both. The retained energy values are almost equal, as shown in Table 1. For threshold compression, a fixed threshold of 52 is used and only the nonzero wavelet coefficients are transmitted. This threshold was adjusted until the retained energies for both compression methods became equal.

The image quality measures used in this work are the MSE [3] and the structural similarity SSIM [2]. Suppose an image element block is $x_i$, $i = 1, \ldots, N$, and a reconstructed or reproduced image block is $y_i$, $i = 1, \ldots, N$. The error in the $i$th pixel is $e_i$:

$$e_i = x_i - y_i.$$

The mean square error is

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N} e_i^2.$$

The percentage MSE used in this work is defined as As for the SSIM measure, we apply (12). We choose and .
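A short Python sketch of the plain MSE follows, assuming the usual definition as the average of squared pixel differences (the percentage variant is not reproduced). The pixel values are made up, and they are chosen so that two very different distortions produce exactly the same MSE.

```python
def mean_square_error(reference, distorted):
    """Average of the squared pixel-by-pixel differences."""
    errors = [x - y for x, y in zip(reference, distorted)]
    return sum(e * e for e in errors) / len(errors)

reference = [10.0, 20.0, 30.0, 40.0]
shifted = [12.0, 22.0, 32.0, 42.0]   # uniform brightness shift of 2
spiked = [10.0, 20.0, 34.0, 40.0]    # one pixel off by 4, the rest intact

mse_shifted = mean_square_error(reference, shifted)
mse_spiked = mean_square_error(reference, spiked)
```

Both distortions give an MSE of 4, even though the first preserves every local pattern and the second alters one, which illustrates why the MSE alone says little about structural similarity.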

The percentage MSE and MSSIM values are tabulated in Table 1 for different SNRs and for the two wavelet compression methods discussed.

The tabulated results are also plotted in Figures 1 and 2. It is clear that the difference in MSE is greater for subband compression. This is due to wavelet coefficients being highly correlated in the approximation band and therefore a smaller quantization step is needed. The quantization step is 0.5591. Many wavelet coefficients are clustered within a quantization step size. Threshold compression involving more than one band apparently results in smaller MSE due to the presence of detail coefficients. The difference in MSE is especially clear for small SNRs.

Figure 3 illustrates the original and reconstructed received images with their wavelet coefficients using the subband compression method. These are also displayed in Figure 4 for threshold wavelet compression. We observe comparable structural similarity between the original and reconstructed images for both kinds of compression and for all values of SNR. This also becomes clear upon inspection of Figure 2. We define the structural information in an image as those attributes that represent the structure of objects in the scene or image, independently of luminance and contrast. Structural comparison is therefore computed after luminance subtraction and variance normalization.

How well a type of wavelet works in compression is strongly dependent on the image because the ability of a wavelet to compact energy depends on the energy spread within the image. We have used the “db3” wavelet for the reasons stated in Section 4.1.

Comparison between image compression techniques using IQA indices can be found elsewhere in the literature such as in [14]. However, the significance of the present work is that the comparison is made in a visual communication framework, highlighting the fact that subband coding, which is one of the wavelet image compression methods in question, is practically and theoretically suitable when combined with appropriate communication schemes. Although our results have shown that subband coding is not optimized to reduce the MSE as is the case with threshold wavelet compression, the other IQA index, namely, the MSSIM, was found to be comparable in the two compression techniques discussed.

It is worth noting, upon comparing Figures 3 and 4, that the closeness of the MSSIM indices of the reconstructed images using the two compression techniques is not fully consistent with human subjective perception or judgment. It would therefore prove useful to consider other improved versions of the structural similarity index [15, 16] in conducting the tests of the present work. These improved SSIM metrics are more accurate because they consider the spatial distribution of the image structures [15] and are more effective with blurred or noisy images [16], rendering them more consistent with subjective image perception. The deployment of these metrics is left for future work.

5. Conclusion

Images with subband coding and threshold wavelet compression are analyzed in a visual communication framework. The image is transmitted over a Rayleigh communication channel with AWGN after quantization and 16-QAM modulation. A comparison is made between the two types of compression using both MSE and SSIM indices of IQA applied to the reconstructed image at the receiver, with the original image as the reference image. The two methods of compression yielded comparable SSIM but different MSE measures. Quantization and AWGN resulted in increased MSE for images compressed by subband coding. This is due to the wavelet coefficients being highly correlated in the approximation band for subband coding and therefore less discernible from each other with the quantization step size used. This correlation of wavelet coefficients is less manifest for threshold compression due to the presence of some detail coefficients.