Abstract

This paper proposes a statistically matched wavelet based textured image coding scheme for efficient representation of texture data in a compressive sensing (CS) frame work. Statistically matched wavelet based data representation causes most of the captured energy to be concentrated in the approximation subspace, while very little information remains in the detail subspace. We encode not the full-resolution statistically matched wavelet subband coefficients but only the approximation subband coefficients (LL) using standard image compression scheme like JPEG2000. The detail subband coefficients, that is, HL, LH, and HH, are jointly encoded in a compressive sensing framework. Compressive sensing technique has proved that it is possible to achieve a sampling rate lower than the Nyquist rate with acceptable reconstruction quality. The experimental results demonstrate that the proposed scheme can provide better PSNR and MOS with a similar compression ratio than the conventional DWT-based image compression schemes in a CS framework and other wavelet based texture synthesis schemes like HMT-3S.

1. Introduction

Texture data contain spatial, temporal, statistical, and perceptual redundancies. Representing texture data using standard compression schemes like MPEG-2 [1] and H.264 [2] is not efficient, as they are based on Shannon-Nyquist sampling [3] and do not account for perceptual redundancies. They are often resource consuming (as they acquire too many samples) due to its fine details in textured image and high frequency content. Variety of applications in computer vision, graphics, and image processing (such as robotics, defence, medicine, and geosciences) demands better compression with good perceptual reconstruction quality, instead of bit accurate (high PSNR) reconstruction. This is because the human brain is able to decipher important variations in data at scales smaller than those of the viewed objects. Ndjiki-Nya et al. [48], Bosch et al. [9, 10], Byrne et al. [11, 12], and Zhang et al. [13, 14] have proposed techniques to reconstruct visually similar texture from sample data. Statistically matched wavelet [15] is aimed at designing a filter bank that matches a given pattern in the image and can better represent the corresponding image as compared to other wavelet families.

Compressive sensing (CS) technique [16] has proved that it is possible to achieve a sampling rate lower than the Nyquist rate [3] with acceptable reconstruction quality. Leveraging the concept of transform coding, compressive sensing enables a potentially large reduction in sampling and computation costs for sensing signals that have sparse or compressible representation (by a sparse representation, we mean that for a signal of length , we can represent it with nonzero coefficients). Compressive sensing framework opens a new research dimension in which most of the sparse signals can be reconstructed from a small number of measurements (), using algorithms like convex optimization, greedy methods, and iterative thresholding [17, 18]. Significant theoretical contributions have been published on the compressive sensing in recent years [16, 17, 19] for image processing applications [2024]. Compressive sensing framework mainly consists of three stages, that is, sparsification by transformation, measurement (projection), and optimization (reconstruction). Designing a good measurement matrix with large compression effects and designing a good signal recovery algorithm are the two major challenges for applying CS technique in image compression.

1.1. The Prior Works and Motivation

Existing methods of textured image compression scheme can be broadly classified as parametric or nonparametric. Nonparametric approaches can be applied to a wide variety of textures (with irregular texture patterns) and provide better perceptual results. However, these schemes are often computationally more complex. Parametric approaches can achieve very high compression at low computational cost. However, these techniques are not effective for structured textures such as those with primarily nonstationary data content.

Nonparametric approaches are pixel-based or patch-based. Efros and Leung [25] proposed pixel-based nonparametric sampling to synthesize texture. Wei and Levoy [26] further improve the above using a multiresolution image pyramid based on a hierarchical statistical method. A limitation of the above pixel-based methods is an incorrect synthesis owing to incorrect matches in searching for similar statistics. Patch-based methods overcome this limitation by considering features matching patch boundaries with multiple pixel statistics. People generally use Markov Random Field (MRF) models for texture analysis [7, 26]. The popular choice for texture synthesis is a patch-based graph cut [27]. Region-based texture representation and synthesis algorithms [2830] have been explored recently to address the limitations of block-based representation in handling the homogenous texture and blocking artifacts. Byrne et al. [11, 12] have demonstrated region-based synthesis structure using a morphological and spectral image representation technique [31]. Region-based representation has also been explored using multiresolution wavelet based decomposition as reported in [3234]. However that work is limited to document class of images.

Recent work in the parametric approach is typically based on Auto Regressive (AR) [38] or Auto Regressive Moving Average- (ARMA-) based modelling [3941]. AR- and ARMA-based models in texture synthesis enable blocks to be selectively removed at the encoding stage and reconstructed at the decoder with acceptable perceptual quality. AR- and ARMA-based approaches are suitable for the textures with stationary data, like steady water, grass, and sky; however, they are not suitable for structured texture with nonstationary data as blocks with nonstationary which data are not amenable to AR modelling. Further, they are block-based approaches, and blocking artifacts can appear in the synthesised image. Portilla and Simoncelli [37] propose a statistical model for texture images based on joint statistical constraint on the wavelet coefficients. In the above work, authorsproposed an algorithm for synthesizing textures using sequential projections on to the constraints surfaces; however, the model is the choice of the statistical constraints and not suitable for texture with structural pattern. Wavelet domain hidden Markov tree (HMT-3S) [42] was recently used for texture analysis and synthesis, where it was assumed that three subbands of the two-dimensional DWT, that is, HL, LH, and HH, are independent. The HMT-3S adds dependencies across subbands by treating jointly the three hidden elements across the three orientations; however, HMT-3S is established in the nonredundant DWT and is inferior to redundant DWT for statistical modelling. Also, for structural textures with regular patterns, both HMT and HMT-3S fail to reproduce the regular patterns.

In this paper, we propose a statistically matched wavelet based texture data representation and a compressive sensing based texture synthesis scheme (henceforth we refer the proposed scheme in this paper as SMWT-CS). Statistically matched wavelet based representation [43] causes most of the captured energy to be concentrated in the approximation subspace, while very little information is retained in the detail subspace. We encode not the full-resolution statistically matched wavelet subband coefficients (as normally done in a standard wavelet based image compression) but only the approximation subband coefficients (LL) using standard image compression scheme like JPEG2000 [44] (which accounts for 1/4th of the total coefficients and can be represented using fewer bits). The detail subband coefficients, that is, HL, LH, and HH (which account for 3/4th of the total coefficients), are jointly encoded in a compressive sensing framework and can therefore be represented with fewer measurements. Quality matrix is an essential component for assessing the performance of an image compression system. We have computed Mean Opinion Score (MOS) for subjective assessment as it captures the visual perception of human subjects. We have also computed PSNR for objective assessment of the quality for comparative study.

Table 1 provides a comparison of existing texture analysis and synthesis scheme. We have included our proposed scheme to put the work in perspective. As can be seen the proposed scheme is more efficient than the existing schemes. The main difference between a conventional wavelet based compression scheme in a CS framework (DWT-CS) [21, 35, 45] and our approach is the use of Statistically Matched Wavelet based sparsification [15, 43], as opposed to generic DWT used in previous works. The existing schemes use regular DWT that is not able to fully exploit the signal properties (orthogonal/nonredundant DWT cannot well preserve the regularity and/or periodicity [46]) and hence do not fully exploit the sparsity. CS optimization that is done over detail subspace allows for scalability in choosing measurement vector size providing a tradeoff between compression and perceptual reconstruction quality. The other novelty is that we apply standard coding on the low resolution part (LL subband) of the information, because of the nature of wavelet decomposition (i.e., true especially for multilevel wavelet decomposition); the low frequency information is not sparse and hence not amenable for CS based encoding. Our proposed scheme can be easily integrated into an existing encoding framework [38] with texture analysis and synthesis being performed by the proposed scheme. So far, multiresolution based representation has been used in the domain of documentation class of image; however, to the best of our knowledge no work has reported using statistically matched wavelet and compressive sensing based analysis and synthesis technique for a generic texture class images for compression with subjective reconstruction quality.

1.2. Overview of the Scheme

Figure 1 gives an overall view of the proposed scheme. In this figure, the texture analyzer block decomposes the input textured image into approximation (LL) and detail (HL, LH, and HH) subbands using statistically matched wavelet based image representation [15]. The basic idea is to design a statistically matched wavelet filter bank using source data and decompose the input textured image into approximation and detail subspace [43]. Standard image encoder like JPEG2000 [44] is used to encode approximation subband coefficients (LL). The detail subband coefficients, that is, HL, LH, and HH are jointly encoded in a compressive sensing framework. The proposed SMWT-CS encoder block performs compressive measurement using Noiselet transform [47] over detail subspace coefficients (joint representation for HL, LH, and HH subband coefficients) along with quantization and entropy coding. For a -sparse signal , we compute the measurement vector , which is much smaller in length than the length of signal (), and therefore the compression is guaranteed. Texture synthesis block does the compressive sensing optimization to synthesize the samples from detail subspace measurements, using convex optimization [16, 19], that is, -norm minimization with equality constraint. Combining the decoded samples from the standard decoder and the texture synthesizer and doing the inverse of statistically matched wavelet transform, a synthesized image is reconstructed.

The rest of the paper is organized as follows. Section 2 provides an overview of statistically matched wavelet based texture representation scheme. Section 3 presents the design and implementation of the compressive measurements and encoding framework, while Section 4 provides a detailed view of synthesis framework. The experimental results are discussed in Section 5 followed by conclusion in Section 6.

2. Statistically Matched Wavelet Based Texture Representation

Gupta et al. [15] propose statistically matched wavelet for the estimation of wavelets that is matched to a given signal in the statistical sense. This concept is further extended for image data and Figure 2 shows a 2D two-band separable kernel wavelet system (with an example of estimated matched wavelet filters for brick wall image). In this figure, and represent the horizontal and vertical directions, respectively. The scaling filter is represented by and while the wavelet filter is represented by and corresponding to the horizontal and vertical direction. The dual of them is represented by , , , and . The input to this system is a 2D signal (an image in our framework). The output of Channel 1 in Figure 2 is called approximation subspace or scaling subspace while the outputs of the other three channels (Channels 2, 3, and 4 in Figure 2) are called detail subspace. This system is designed as biorthogonal wavelet system so that it satisfies the condition (1) for perfect reconstruction of the two-band filter bank: where can be or , respective to horizontal or vertical direction, and is any odd delay. is the scaling filter while is the wavelet filter. and are the dual of scaling and wavelet filters, respectively.

The optimization criterion used for matched wavelet is the minimization of the energy in the detail subspace [43]. If (,) is a 2D image signal and (,) represents the 2D image reconstructed using only the output of Channels 2, 3, and 4 (detail subspace) in Figure 2, then the error function (,) can be defined as

To ensure that the maximum input signal energy moves to approximation subspace, the energy captured in the difference signal (,) should be maximized with respect to both and direction filters. This leads to the following set of equations [43]: Here the filter weight is kept constant leading to a close form expression. These is a set of linear equations in filter weight that can be solved simultaneously. All rows of image are placed adjacent to each other to form a 1D signal having variations in horizontal direction only and represented as in (3). Similarly all the columns of image are placed together to form 1D signal having variations in vertical direction only and represented as in (4). The solution of (3) and (4) gives the analysis high pass filters (wavelet filters) and . From these, other filters (analysis low pass, synthesis high pass, and synthesis low pass) are computed using finite impulse response perfect reconstruction biorthogonal filter bank design as presented with (1).

The estimation of scaling and wavelet function is done separately for the variations in the horizontal and vertical directions of the input image. Passing the input image through the 2D statistically matched wavelet filter bank, we get the output as subsampled image corresponding to approximation and detail subspace. Figure 3 demonstrates the decomposition of one of the input tests sequences in approximation and detail subspace and Figure 4 gives the distribution of the matched wavelet coefficients in approximation and detail subspace (horizontal, vertical, and diagonal subspace). As one can observe from Figures 3 and 4, most of the information is represented by approximation subspace alone and very little information is present in the detail subspace. In the rest of the paper we refer to the approximation or scaling subspace (output of Channel 1 in Figure 2) as the candidate data for host encoding and the detail subspace (sum of output of Channels 2, 3, and 4 in Figure 2) as the candidate data for CS measurements and synthesis.

3. Compressive Measurement and Encoding

Compressive sensing exploits the fact that most natural or artificial signals are sparse in some domain and hence compressible. A real valued signal , can be represented as a function of basis vectors as follows [16, 17]: where x and s are column vectors and is an × sparsifying basis matrix. The signal is called -sparse if it can be represented as a linear combination of only -basis vectors, that is, only elements of the vector s are nonzero. The signal can be treated as compressible if it can be well approximated by a signal with only () nonzero coefficients. Figure 5 shows sparse representation of subbands coefficients corresponding to brick wall and escalator test sequences. Significant wavelet coefficients in the figure are represented by blue pixels while all other nonsignificant coefficients are shown in white. One can observe that most of the detail subbands coefficients are close to zero, and therefore CS can be best utilised to exploit this sparsity for encoding detailed subbands coefficients. Compressive measurement is computed through linear projection as shown in (6): Here y is an measurement vector where . is an measurement matrix. The matrix represents a dimensionality reduction matrix; that is, it maps to , where is typically much smaller than . The main challenge with CS theory is that how should we design the sensing matrix so that it preserves the information in the signal . Several theories have been proposed in the literature for reconstructing from , if satisfies a Restricted Isometry Property (RIP) [16]. We have used statistically matched wavelet for the sparsifying matrix () and Noiselet [47] for sensing matrix () for compressive measurements according to (6). Measurements from Noiselet have been chosen because they are highly incoherent with the considered sparse domain and RIP tends to hold for reasonable values of . In addition, noiselet comes with very fast algorithms and just like the Fourier transform, the noiselet matrix does not need to be stored to be applied to a vector. This is of crucial importance for efficient numerical computations without which applying CS can be very complex. The CS measurement matrix created is orthogonal and self-adjoint, thus being easy to manipulate. Scalar quantization over the measurement vector is proposed to achieve better compression ratio.

It is important to note that (6) is ill-conditioned since there are more unknowns than the number of equations as ; however, it has been shown that if the signal is -sparse and the locations of the nonzero elements are known then the problem can be solved provided through a simplified linear equation by deleting all those columns and elements corresponding to zero or nonsignificant elements. Figure 6 gives an overall block diagram of the proposed encoder. Bit mapper block is responsible for generating a final bit stream encompassing encoded data from standard codecs for approximation subspace and CS measurements for detail subspace. Algorithm 1 provides the implementation summary of the proposed encoder framework: where is the quantization factor.

(1)  Design a 2-D separable kernel filter bank using Statistically Matched Wavelet Detailed in Section 2
 (a)  Decompose input image into approximation and detail subbands coefficients
   (i)  Use Wavelet function from MATLAB “wavecut” to represent approximation and detail
     coefficients
(2)  Design a CS measurement matrix using Noiselet Transform [47] and detail subbands coefficients
(3)  Do quantization of the CS measurements using (7)
(4)  Do Entropy coding of the quantized CS measurements
(5)  Combine standard and CS encoded bit streams

4. Texture Synthesis Framework

In this section, we present the overall texture synthesis framework. Figure 7 gives an overall block diagram of the proposed decoder.

At the decoder, the detail subbands coefficients are synthesized from the CS measurements using the compressive optimization, while the approximation coefficients are decoded using standard JPEG2000. In all our experiments, we have used convex optimization (-norm minimization with equality constraint) for good recovery in a CS framework. This matches the norm as RIP is ensured due to noiselet measurements. Because of nondifferentiability of the -norm, this optimization principle leads to sparser decompositions [17] and ensures fast and stable resolution better than other methods from the class of greedy algorithms. The norm also provides a computationally viable approach to sparse signal recovery. In all our experiments we have used standard basis pursuit using a primal-dual algorithm [48] which finds the vector with smallest -norm (8). Algorithm 2 provides the implementation summary of the convex optimization and proposed decoder framework. Combining the decoded samples from the standard decoder and the texture synthesizer (CS optimisation) and doing the inverse of statistically matched wavelet transform, a synthesized image is reconstructed:

Inputs: Initial Reference Point, Observation Vector, Optimization Parameter
Initialize:   vector(guess point), measurement vector,
(tolerance for primal-dual algorithm) = 5, (Maximum primal dual iterations) = 20,
(Tolerance for conjugate gradients) = , (Maximum conjugate gradients) = 300
If (Valid Starting Reference, )
   Minimize subject to (Where, -convex, rank )
    (Use Primal Dual Interior Point Methods with Equality Constraint)
   While (Surrogate duality gap < OR Iterations > ) do
 (1)  Optimality Condition: ,
 (2)  Compute the Newton step and decrement ,
 (3)  Solve Positive definite system of equations from Newton step
    (3a)  Use Conjugate Gradients Method, MATLAB command, cgsolve
    (3b)  Stop Conjugate Gradient algorithm, If ( )
 (4)  Do line Search. Choose step size by backtracking line search
 (5)  Update.
 (6)  Update Central and Dual residuals
   end while
end If
Output: Sparse representation Wavelet Coefficients

5. Results and Discussion

In this section, we present the experimental results. For our experiments, we have used texture database from Brodatz album [36] and Portilla and Simoncelli [37] website to select different class of structural textured images (periodic, pseudoperiodic, and aperiodic) and complex structured photographic textures. All the test sequences are 128 × 128, 8-bit, and gray scale texture. PSNR and MOS have been used as the quality metrics. Mean Opinion Score (MOS) computation was done by collecting responses of various students and staff working in the lab and averaging them. All the experiments are carried out in MATLAB, running on windows XP PC with P4 CPU and 1 GB RAM.

5.1. Statistically Matched Wavelet Filter Estimation

In this section we present the simulation results of statistically matched wavelet based 2-D separable kernel wavelet filters used to input data decomposition and subbands representation in the proposed framework. Table 2 gives the result of statistically matched analysis wavelet filters estimated using input data with the filter length set to . Using the analysis filters we have computed all other wavelet filters as described in Section 2, to construct the filter bank. Figure 8 shows the reconstruction results of one of our test sequences (escalator), using different set of subbands coefficients for approximation and detail subbands. As one can see that by increasing the number of approximation subband coefficients (a = 1000 to 4000) while keeping the number of detail subbands coefficients constant (d = 1000) improves the reconstruction quality and PSNR significantly. As opposed to this if we keep the approximation subband coefficients at constant (a = 1000) and increase the number of detail subbands coefficients (d = 1000 to 4000), the reconstruction quality and PSNR remain almost unchanged. This experiment demonstrates the correctness of our theoretical claim that statistically matched wavelet based texture representation ensures that maximum energy is captured in the approximation subspace with very little information left in detail subspace. One can also observe from Figure 8 that we need smaller number of coefficients from detail subbands (we select 1000 coefficients out of total 13872 coefficients (3*68*68)) as compared to those from approximation subband. This experiment was performed to illustrate the significance of approximation and detail subbands coefficients for good quality reconstruction.

5.2. Texture Synthesis Results

In this section we present the texture synthesis results of our proposed scheme (SMWT-CS) and conventional DWT-based texture synthesis scheme in a CS framework (DWT-CS) [21, 35]. We have compared our simulation results with the conventional DWT based image synthesis scheme in a CS framework as presented in [21, 35]. In addition, we have done a comparative study of our synthesis results with joint statistics based statistical model for texture synthesis schemes as presented in [37, 42] and standard JPEG2000. The texture synthesis performance is measured using varying CS measurement samples such as = 2000 and = 4000.

(i) Figures 9 and 10 present the synthesis results for structural texture such as brickwall, escalator, and statistical texture such as floorbox and balckhole, using conventional DWT-CS and our proposed scheme. Table 3 gives the PSNR values and MOS scores of the synthesized texture. As one can observe from Table 3, the proposed scheme outperforms the conventional DWT-CS scheme and can provide significantly better PSNR (5 to 10 dB gain) for the same compressive measurements ( = 2000 or = 4000) or better compression at the same PSNR. It is important to note that SMWT-CS is able to reconstruct the textural pattern smoother and sharper as compared to conventional DWT-CS based scheme for the same CS measurements (Figure 9). In addition, we can observe that the texture synthesis quality can be improved by increasing the CS measurement. This can be observed both subjectively (MOS assessment) as well as objectively (PSNR data), hinting at the scalability in the proposed framework.

(ii) Figures 11, 12, and 13 show the synthesis results for structural textures with regular patterns such as , , and and structural textures with irregular patterns such as , , and D87 from Brodatz album [36]. As one can observe, the proposed scheme can synthesize the regular and irregular structural patterns with better PSNR and perceptual quality as compared to conventional DWT-CS schemes for the same CS measurements. When compared with the synthesis results for the same textures with other wavelet based texture synthesis schemes such as HMT-3S [42], the proposed scheme outperforms them in terms of perceptual synthesis quality. In fact, such joint statistics based statistical texture model cannot handle the regular and periodic structures as reported by the authors in [42].

(iii) Figure 14 shows the synthesis results for complex structured photographic textures such as stone wall and fish fabric from Portilla website [37]. As one can observe, the proposed scheme results in better PSNR and perceptual quality as compared to conventional DWT-CS schemes for the same CS measurements. Also, when compared with the synthesis results based on joint-statistical models for texture synthesis as suggested by Portilla and Simoncelli [37] for the same textures, the proposed scheme provides much better synthesis quality and the capability to synthesise all the patterns types.

When compared with JPEG2000, we obtain approx 4 dB PSNR gain for statistical textures such as blackhole and floorbox (Table 3). For structural textures such as brickwall and escalator (Figure 9), JPEG2000 performs better at the same bit rate. The reason for this is the presence of significant energy in the high frequency region for structured textures. The coding efficiency loss in a scene with very high frequency transition is due to the fact that compressive sensing recovery of sparse signals typically requires a number of measurements to be larger than the number of nonzero samples.

6. Conclusion

In this paper, we propose statistically matched wavelet based texture data representation and synthesis in a compressive sensing framework (SMWT-CS). Statistically matched wavelet based representation causes most of the captured energy to be concentrated in the approximation subspace, while very little information is retained in the detail subspace. We encode not the full-resolution statistically matched wavelet subband coefficients but only the approximation subband coefficients () using standard image compression scheme like JPEG2000. The detail subband coefficients, that is, , , and , are jointly encoded in a compressive sensing framework using compressive projection and measurements. The experimental results demonstrate that the proposed scheme can provide significantly better PSNR for the same compressive measurements or better compression at the same PSNR as compared to conventional DWT-based image compression scheme in a CS framework. It can also be observed that performing linear compression over the approximation subspace provides better reconstruction quality for the same number of samples as compared to CS measurements over approximation subspace. This indicates that performing linear compression over approximation sub-space and CS measurements over detail subspace provides optimal compression and reconstruction quality as against using only standard linear compression or using only CS measurements over both approximation and detail subspace.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.