Abstract

Many desirable properties make fractals a powerful mathematical model applied in several image processing and pattern recognition tasks: image coding, segmentation, feature extraction, and indexing, to cite just some of them. Unfortunately, fractal coders are based on a strongly asymmetric scheme and consequently suffer from very high coding times. On the other hand, linear transforms are quite well balanced in time, which allows them to be usefully exploited in real-time applications, but they do not provide comparable image quality at high bit rates. In this paper, we investigate different levels of embedding orthogonal linear transforms in the fractal coding scheme. Experimental results show a clear quality improvement for compression ratios up to 15:1.

1. Originality and Contribution

The literature about both linear transform-based image compression (discrete cosine transform, DCT; discrete wavelet transform, DWT) and fractal image coding is sizeable. Nevertheless, there are still few contributions exploring possible fusions of the two approaches. From a theoretical point of view, the novelty of this paper lies in investigating several ways in which the former can embed, or even be embedded in, the latter. As pointed out by the experimental results, fusing these methodologies makes it possible to significantly speed up the fractal coding process while retaining most of the objective quality of the decoded image, even at high compression rates. This represents a highly desirable feature for image coders in all those applications managing high-resolution images (e.g., GIS, satellite image databases, and cultural heritage).

2. Introduction

Fractal image compression is based on the self-similarity property of an image and performs compression by applying a series of transformations to the image. To decompress, these transformations are applied iteratively until the system converges, a condition which may be assessed by using the Hutchinson metric. The main disadvantage of previous techniques for fractal image compression is the large amount of computation they require. Several methods have been proposed to speed up fractal image coding [1, 2]. Speedup methods based on nearest neighbour search over feature vectors outperform all the others in terms of decoded image quality at a comparable compression rate [3], but they often suffer from the high dimensionality of the feature vector [4]; Saupe's operator is a representative example. To cope with this drawback, dimension reduction techniques are introduced: Saupe reduced the dimension of the feature vector by averaging pixels, while in [5] the discrete cosine transform (DCT) is used to cut out redundant information.

In the same way, linear transforms (LTs) have been widely exploited to extract representative features or to encode groups of pixels in image indexing and compression applications. Indeed, linear transforms form the basis of many compression systems, as they decorrelate the image data and provide good energy compaction. For example, the discrete Fourier transform (DFT) [6] is used in many image processing systems, while the DCT [6] is used in standards like JPEG, MPEG, and H.261. Still others are the Walsh-Hadamard transform (WHT) [6] and the Haar transform (HT) [6]. There are many attempts in the literature at combining fractal coding with the discrete cosine transform [7–9] or with the discrete wavelet transform [10, 11], all of them facing the speedup problem of the coding phase. In more detail, in [7], image blocks are classified as smooth, diagonal/subdiagonal edge, and horizontal/vertical edge. The classification is performed using only the lowest horizontal and vertical DCT coefficients of the given block. Since the scheme is simple, its overhead is minor compared to the complexity of the encoder, but the speedup it can achieve is no larger than 3, which is considerably smaller than that achievable by nearest neighbour search based methods. Zhao and Yuan previously proposed a similar approach [9], in which most attention was paid to reducing the bit rate; indeed, the original image is first compressed by a DCT-domain fractal transform instead of a spatial-domain fractal transform, and then the difference image between the original image and its fractal approximation is quantized and encoded by a Huffman code. In this case, image ranges and domains are only partitioned into low- and high-activity blocks, and full search is performed only for the latter. A similar approach was developed by Zhang and Po in [11] using the HT instead of the DCT. Range blocks and corresponding domain blocks are classified into edge-selective categories by the energy-compacted wavelet coefficients. The search is carried out between matching classes for range/domain comparisons, which significantly reduces the computational complexity, so that the speedup achieved is up to 12 times that of the conventional full search method. All these methods make use of the lowest horizontal and vertical DCT/HT coefficients of a given block, without exploiting the neighbouring coefficients, which also contribute substantially to the total energy. Fractal coding in the wavelet domain has also been investigated by using the wavelet zerotree, as in [8, 10], where a fractal method is adaptively applied to the parts of an image that can be encoded more efficiently relative to an EZW (embedded zerotree wavelet) coder at a given rate. However, in these cases wavelet coefficients are not used to speed up the coding phase but to improve the decoded image quality at a given rate. Hence, this paper sets as its main goal the investigation of ways to embed an orthogonal LT $T$ into the standard PIFS- (partitioned iterated function systems-) based coding scheme. In more detail, the linear properties of $T$ are first exploited to dramatically reduce the computational cost of the encoding phase by arranging its coefficients in a suitable way; Figure 1 depicts the architecture of such a scheme. Range and domain blocks are extracted from the input image, and an orthogonal linear transform (LT) is applied to extract feature vectors.
Domain feature vectors are organized in a tree data structure, while range feature vectors are used to search the tree for the best approximating domain during the coding phase, speeding up the range/domain matching process. The residual information is computed as the difference between the LT-transformed versions of the original range and the approximated range; residual coefficients are quantized and stored in the final bit stream. In Figure 2, the decoding process is shown. The fractal decoded image is obtained iteratively by applying affine transformations to an all-black image. The transform $T$ is applied to the fractal decoded ranges while the residual coefficients are dequantized separately; transformed range coefficients are summed with the residual information and then inverse transformed by $T^{-1}$ to obtain the output range. The hybrid approach significantly outperforms Saupe's method for relatively small compression ratios, becoming comparable as the compression ratio increases.

3. Theoretical Concepts

Partitioned iterated function systems (PIFS) consist of a set of local affine contractive transformations, which exploit the image self-similarities to cut out redundancies while extracting salient features. In more detail, a given input image $I$ is partitioned into a set $R = \{r_1, r_2, \ldots, r_{|R|}\}$ of disjoint square regions of size $|r| \times |r|$, named ranges. Another set $D = \{d_1, d_2, \ldots, d_{|D|}\}$ of larger regions is extracted from the same image $I$. These regions are called domains and can overlap. Their size is $|d| \times |d|$, where usually $|d| = 2|r|$. Since a domain is four times the size of a range, it must be shrunk by a $2 \times 2$ averaging operation on its pixels. Due to the large number of domains, downsampling would introduce an overhead if performed for each range/domain comparison; the same result can be obtained more efficiently by downsampling the whole image once, $I_C = c(I)$. The downsampled image $I_C$ is a quarter of the input image $I$, so that downsampled domains of size $|r| \times |r|$ are directly extracted from $I_C$.
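As an illustration, the following minimal sketch (Python with NumPy, neither of which is prescribed by the paper) builds the range set and the downsampled domain pool; the block size of 8 and the domain step of 4 are illustrative choices, not values fixed by the text.

```python
import numpy as np

def extract_ranges(image, r=8):
    """Partition the image into disjoint r x r range blocks."""
    h, w = image.shape
    return [image[i:i + r, j:j + r]
            for i in range(0, h - r + 1, r)
            for j in range(0, w - r + 1, r)]

def extract_domains(image, r=8, step=4):
    """Downsample the whole image once by 2 x 2 averaging (I_C = c(I)),
    then slide an r x r window over I_C: every window is a domain that
    has already been shrunk to range size."""
    # assumes even image sides (e.g., 512 x 512 as in Section 6)
    ic = 0.25 * (image[0::2, 0::2] + image[1::2, 0::2] +
                 image[0::2, 1::2] + image[1::2, 1::2])
    hc, wc = ic.shape
    return [ic[i:i + r, j:j + r]
            for i in range(0, hc - r + 1, step)
            for j in range(0, wc - r + 1, step)]
```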

The image $I$ is encoded range by range: for each range $r$, it is necessary to find a domain $d$ and two real numbers $\alpha$ and $\beta$ such that the quadratic error $\|r - (\alpha d + \beta)\|^2$ is minimized with respect to the Euclidean norm. It is customary to impose $|\alpha| < 1$ in order to ensure convergence in the decoding phase. Finding the best approximating domain $d$, however, requires an exhaustive search over the whole set $D$, which is an impractical operation. Therefore, ranges and domains are classified by means of a feature vector in order to reduce the cost of searching the domain pool: if the range $r$ is being encoded, only the domains having a feature vector close to that of $r$ are considered [12].
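For a given range/domain pair, the optimal $\alpha$ and $\beta$ of this quadratic problem have the standard least-squares closed form used by fractal coders; a minimal sketch follows, assuming both blocks already have the same (downsampled) size.

```python
import numpy as np

def fit_alpha_beta(r_blk, d_blk):
    """Closed-form least-squares fit of min ||r - (alpha*d + beta)||^2."""
    x = d_blk.ravel().astype(np.float64)
    y = r_blk.ravel().astype(np.float64)
    n = x.size
    denom = n * np.dot(x, x) - x.sum() ** 2
    if denom == 0:                       # flat domain: only beta is useful
        alpha = 0.0
    else:
        alpha = (n * np.dot(x, y) - x.sum() * y.sum()) / denom
    alpha = float(np.clip(alpha, -1.0, 1.0))   # enforce the contractivity bound
    beta = (y.sum() - alpha * x.sum()) / n
    err = np.sum((y - (alpha * x + beta)) ** 2)
    return alpha, beta, err
```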

On the other hand, a transformation $T$ is called linear if it has two mathematical properties: (a) additivity, $T(x + y) = T(x) + T(y)$, and (b) homogeneity, $T(\alpha x) = \alpha T(x)$. Linear transform domain features are very effective when the patterns are characterized by their spectral properties; therefore, we investigate here the feature extraction capability of the discrete Fourier transform (DFT), the discrete cosine transform (DCT), and the Haar transform (HT) [6].

4. Speeding up the Coding Phase

In order to reduce the computational cost of exhaustive search while still preserving good image quality, we scan the domains in the pool to compute feature vectors that are organized in a k-dimensional tree (k-d tree) structure [13], which helps us choose the most promising candidate domains for encoding a given range.
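A minimal sketch of this indexing step follows; SciPy's cKDTree stands in for the k-d tree of [13], the `feature` callback is assumed to be an extractor like the one derived below in this section, and the candidate count of 50 echoes the figure quoted in Section 6.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_domain_index(domains, feature):
    """Map every domain of the pool to its feature vector and index the
    vectors in a k-d tree."""
    return cKDTree(np.array([feature(d) for d in domains]))

def candidate_domains(tree, range_block, feature, k=50):
    """Return the indices of the k domains whose feature vectors lie
    closest to the feature vector of the range block."""
    _, idx = tree.query(feature(range_block), k=k)
    return np.atleast_1d(idx)
```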

Let $r$ and $d$ be a range and a domain block, respectively, and let $T$ be a two-dimensional orthogonal linear transform (FFT, DCT, or HT). The range block $r$ can be rewritten in terms of $d$ by applying an affine transformation $r = \alpha d + \beta o^{(1)} + e$, where $\alpha$ and $\beta$ are two scalar values, $o^{(1)}$ is an all-ones matrix of size $|r| \times |r|$, and $e$ is the $|r| \times |r|$ matrix of the residual error. As the PIFS decoder has no knowledge of the residual error $e$, it must be minimized during the coding process, so that $\hat{r} = \alpha d + \beta o^{(1)}$ well approximates $r = \alpha d + \beta o^{(1)} + e$. A feature vector $u$ can be extracted from $\hat{r}$ and $d$ by reorganizing the coefficients of the transformation $T$:

$$\hat{r} = \alpha d + \beta o^{(1)};$$

applying $T$,

$$T(\hat{r}) = T(\alpha d + \beta o^{(1)});$$

by linearity of $T$,

$$T(\hat{r}) = \alpha T(d) + \beta T(o^{(1)});$$

and, transforming $o^{(1)}$,

$$T(\hat{r}) = \alpha T(d) + \beta O, \quad (1)$$

where

$$O = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}. \quad (2)$$

With $\Gamma$ being the transformed domain $T(d)$, the transformed range can be rewritten as

$$T(\hat{r}) = \begin{pmatrix} \alpha\Gamma_{0,0} + \beta & \alpha\Gamma_{0,1} & \cdots & \alpha\Gamma_{0,n-1} \\ \alpha\Gamma_{1,0} & \alpha\Gamma_{1,1} & \cdots & \alpha\Gamma_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha\Gamma_{n-1,0} & \alpha\Gamma_{n-1,1} & \cdots & \alpha\Gamma_{n-1,n-1} \end{pmatrix}. \quad (3)$$

Notice that only the first coefficient of $T(\hat{r})$ is affected by $\beta$, and it represents the mean of the block. As the main desired property of the feature vector is independence from $\alpha$ and $\beta$, the first element of the $T(\hat{r})$ matrix is discarded, while the remaining ones are rearranged in a linear vector $u = \{\Gamma_{0,1}, \Gamma_{1,0}, \ldots, \Gamma_{n-1,n-1}\}$ of dimension $n^2 - 1$ by means of a zig-zag scan that starts at position $(0,1)$. In order to also cancel out the effect of $\alpha$ on $u$, its elements are divided by the mean $\bar{u}$; indeed,

$$\bar{u} = \frac{1}{n^2 - 1}\sum_{i=0}^{n^2 - 2} \alpha\Gamma_i = \alpha\,\frac{1}{n^2 - 1}\sum_{i=0}^{n^2 - 2} \Gamma_i = \alpha\bar{\Gamma}, \quad (4)$$

where $\Gamma_i$ denotes the $i$-th element of the zig-zag scan.

Finally, the real feature vector $u$ is given by

$$u = \left( \frac{\alpha\Gamma_{0,1}}{\bar{u}}, \frac{\alpha\Gamma_{1,0}}{\bar{u}}, \ldots, \frac{\alpha\Gamma_{n-1,n-1}}{\bar{u}} \right) = \left( \frac{\alpha\Gamma_{0,1}}{\alpha\bar{\Gamma}}, \frac{\alpha\Gamma_{1,0}}{\alpha\bar{\Gamma}}, \ldots, \frac{\alpha\Gamma_{n-1,n-1}}{\alpha\bar{\Gamma}} \right) = \left( \frac{\Gamma_{0,1}}{\bar{\Gamma}}, \frac{\Gamma_{1,0}}{\bar{\Gamma}}, \ldots, \frac{\Gamma_{n-1,n-1}}{\bar{\Gamma}} \right). \quad (5)$$
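A minimal sketch of this feature extraction, using the 2-D DCT as $T$ (the PIFS-LTD variant), is given below; the zig-zag order is approximated by sorting on the index sum $i + j$, which is adequate for indexing purposes even though it does not reproduce the JPEG zig-zag exactly.

```python
import numpy as np
from scipy.fft import dctn

def lt_feature(block):
    """Feature vector of equation (5): 2-D DCT, drop the DC term,
    zig-zag-like scan, then normalize by the mean to cancel alpha."""
    n = block.shape[0]
    coeffs = dctn(block.astype(np.float64), norm='ortho')
    # every position except (0, 0); (0, 0) alone depends on beta
    ij = [(i, j) for i in range(n) for j in range(n) if (i, j) != (0, 0)]
    ij.sort(key=lambda p: (p[0] + p[1], p[0]))   # approximate zig-zag order
    u = np.array([coeffs[i, j] for i, j in ij])  # n^2 - 1 coefficients
    mean = u.mean()
    return u / mean if mean != 0.0 else u        # division cancels alpha
```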

5. Residual Information Coding

Linear transformations can be exploited within the fractal coding approach to a further extent: the encoding of residual information. Indeed, in the proposed hybrid approach, the transform $T$ is applied to each domain of the pool during the indexing stage, while it is applied to every range during encoding. Consequently, during encoding, $T(r)$ and $T(d)$ are both known, and the transform $T(\hat{r})$ of the approximate range $\hat{r}$ can be obtained simply as $T(\hat{r}) = T(\alpha d + \beta o^{(1)})$ by exploiting the linearity of $T$ (as discussed in the previous sections) without any further computational cost.

The coefficients of the $T$ transformation can be further exploited to recover part of the original energy lost through fractal encoding. More precisely, as $\hat{r}$ is calculated by minimizing the approximation error $e$ for the range $r$, we can assume that $\delta_r = T(r) - T(\hat{r})$ consists on average of small values, which can be quantized with a reduced number of bits. To this aim, each $\delta_r(i,j)$ is rounded to the nearest integer. Moreover, during decoding, the approximate range $\hat{r}$ is obtained by means of fractal decoding, so, knowing $\delta_r$ and $T(\hat{r})$, we can recover $T(r)$ as $T(\hat{r}) + \delta_r$. The core of the following discussion is therefore the proposal of a highly efficient strategy to store, if not all, at least a fraction of the information contained in $\delta_r$, while still preserving a useful quality/bit-rate ratio.

The compact representation of the values $\delta_r(i,j)$ within the file is the most crucial aspect of the proposed method; hereafter, this topic is presented in detail. The greater the modulus of a coefficient $\delta_r(i,j)$, the more important its encoding accuracy. The range of the $\delta_r$ coefficients is not formally bounded, but we need to restrict it to a limited interval in order to maximize the effectiveness of the proposed quantization strategy. Experimentally, we found that most of the energy of the $\delta_r$ coefficients can be isolated by selecting only those falling in a limited range $[\epsilon, 2\epsilon]$. The constant $\epsilon$ is an integer value, computed so as to maximize the energy of the $\delta_r(i,j)$ coefficients whose moduli are close to $\frac{3}{2}\epsilon$, over all $r \in I$. The $\delta_r(i,j)$ coefficients are approximated by $\frac{3}{2}\epsilon$ if their modulus falls in the range $[\epsilon, 2\epsilon]$ and are set to zero otherwise. In this regard, it is possible to define a criterion to set the value of $\epsilon$ automatically. For every range block $r$ in $I$, the $\delta_r(i,j)$ coefficients are computed and a global histogram $H$ is generated, where $H(t)$ is the number of coefficients whose modulus $|\delta_r(i,j)|$ equals $t$. The value of $\epsilon$ is then given by $\epsilon = \arg\max_k \sum_{t=k}^{2k} t\,H(t)$. A graphical example of the histogram for Lena, together with the corresponding value of $\epsilon$ at a compression ratio of 16.5, is reported in Figure 3.
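A minimal sketch of this selection criterion follows, assuming the rounded residual coefficients of all ranges have been collected into one flat integer array; the search bound on $k$ is an illustrative choice.

```python
import numpy as np

def choose_epsilon(deltas):
    """Pick epsilon = argmax_k sum_{t=k}^{2k} t * H(t), where H is the
    histogram of the nonzero residual magnitudes."""
    mags = np.abs(np.asarray(deltas, dtype=np.int64))
    mags = mags[mags > 0]
    if mags.size == 0:
        return 1
    hist = np.bincount(mags)              # H(t): how many |delta| equal t
    t = np.arange(len(hist))
    k_max = max(len(hist) // 2, 2)        # illustrative search bound on k
    scores = [np.sum(t[k:2 * k + 1] * hist[k:2 * k + 1])
              for k in range(1, k_max)]
    return 1 + int(np.argmax(scores))
```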

Particular attention must be paid to the quantization and encoding of the values $\delta_r(i,j)$. After $\epsilon$ has been computed, coefficients whose modulus falls outside $[\epsilon, 2\epsilon]$ are set to zero, so that only values in that band remain within each block. If a block does not contain any $|\delta_r(i,j)| > 0$, it carries no useful residual information and needs no further processing than a flag set to 0 in the bitstream; otherwise, the flag is set to 1. Because many rows $i$ of the block $\delta_r$ can be empty, a further bit-rate reduction can be achieved by separating the rows containing some $\delta_r(i,j) \neq 0$ from those containing only zeros. More precisely, each row $i$ of block $\delta_r$ is associated with a bit $b$, where $b = 1$ if there exists a $j$ such that $\delta_r(i,j) \neq 0$ and $b = 0$ otherwise, for a total of $|r|$ bits (where $|r|$ is the side length of the range block $r$).

In the proposed approach, the $\delta_r(i,j)$ coefficients are approximated by the constant value $\frac{3}{2}\epsilon$ (the center of the range $[\epsilon, 2\epsilon]$) rather than being quantized individually, so for each coefficient it is sufficient to store in the file just its sign and its column index $j$. Since the block $\delta_r$ is thus a sparse matrix, each element can be regarded as an ordered pair $(i,j)$; thanks to the aforementioned representation (which distinguishes between empty and non-empty rows), it is possible to reduce the bit rate further. Considering that the values $\delta_r(i,j)$ are read consecutively from the file, one row after the other, it is possible to store only the column $j$ of each coefficient, since the column indices of coefficients sharing the same row are stored in increasing order. Consequently, while reading a column sequence, if the index just read is smaller than the previous one, the current column index belongs to a following row. The correct row for the new column sequence is retrieved as the next non-empty row (previously flagged with a 1), which is not necessarily the very next row $i+1$, because empty rows are frequent. In the end, for each block containing residual information, only $1 + |r| + 2K\log_2(|r|)$ bits are saved, where $K$ is the number of coefficients with $\delta_r(i,j) \neq 0$; otherwise, just a single bit set to 0 is saved. A graphical representation of the residual encoding process is provided in Figure 4.

Overall, the residual encoding framework can be summarized as follows.

Coding Process
(a) Delta computation: $\delta_r(i,j) = T(r) - T(\hat{r})$, $0 \le i, j < |r|$;
(b) if $|\delta_r(i,j)| < \epsilon$ or $|\delta_r(i,j)| > 2\epsilon$, set $\delta_r(i,j) = 0$;
(c) for all $i, j$ with $|\delta_r(i,j)| > 0$, write $\delta_r(i,j)$ in compact form (a sketch of this packing step is given below).
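The packing of step (c) can be sketched as follows, assuming $\delta_r$ has already been thresholded to the band $[\epsilon, 2\epsilon]$ (zeros elsewhere); bits are emitted into a plain Python list for readability rather than into a packed byte stream, and the sign-bit convention is an illustrative choice.

```python
import numpy as np

def encode_residual_block(delta):
    """Compact representation: 1 flag bit, |r| row-occupancy bits, then a
    sign bit and a column index per nonzero coefficient. Magnitudes are
    not stored: every coefficient decodes to +/- (3/2) * epsilon."""
    bits = []
    nz_rows = [i for i in range(delta.shape[0]) if np.any(delta[i] != 0)]
    if not nz_rows:
        bits.append(0)                 # flag: block carries no residual
        return bits
    bits.append(1)                     # flag: residual present
    for i in range(delta.shape[0]):    # |r| row-occupancy bits
        bits.append(1 if np.any(delta[i] != 0) else 0)
    col_bits = int(np.ceil(np.log2(delta.shape[1])))
    for i in nz_rows:                  # per coefficient: sign + column
        for j in np.flatnonzero(delta[i]):   # columns in increasing order
            bits.append(0 if delta[i, j] > 0 else 1)        # sign bit
            bits += [int(b) for b in format(j, f'0{col_bits}b')]
    return bits
```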

Decoding Process
(a) Decode the image $I$ through PIFS;
(b) read $\delta_r$ from the file;
(c) for each range $\hat{r}$ with $\delta_r \neq 0$,
(d) compute $T(\hat{r})$ and replace $\hat{r}$ with $T^{-1}(T(\hat{r}) + \delta_r)$ in image $I$ (a sketch of this step follows).
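The refinement of steps (c)-(d) can be sketched as follows for the DCT variant; the coefficient positions and signs are assumed to come from a reader of the compact representation above.

```python
import numpy as np
from scipy.fft import dctn, idctn

def refine_range(r_hat, positions, signs, eps):
    """Add the dequantized residual to the fractal-decoded range in the
    transform domain, then inverse transform: T^-1(T(r_hat) + delta)."""
    delta = np.zeros_like(r_hat, dtype=np.float64)
    for (i, j), s in zip(positions, signs):      # signs are +1 or -1
        delta[i, j] = s * 1.5 * eps              # every coefficient is 3/2*eps
    coeffs = dctn(r_hat.astype(np.float64), norm='ortho') + delta
    return idctn(coeffs, norm='ortho')
```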

6. Experimental Results

Tests have been conducted on a dataset of twenty images: twelve from the Waterloo Bragzone standard database [14] and the remaining eight from the web. Large variability in testing conditions has been ensured by selecting test images containing patterns, smooth regions, and details. They are all 8-bit grayscale images at a resolution of 512 by 512 pixels. The main aim of our tests is to assess the efficiency of the LT-based feature vector and the improvements provided by LT coding of the residual information. The bit allocation for the range/domain approximation parameters is set as follows: 4 bits for $\alpha$, 7 bits for $\beta$, $\log_2(|D|)$ bits for the domain coordinates (where $|D|$ is the size of the domain pool), and 3 bits for the isometry. The compression ratio has been calculated as the ratio between the original image size and the encoded image size, while the objective image quality has been measured in terms of peak signal-to-noise ratio (PSNR). In order to further assess the performance of the hybrid scheme, we also compared it with Saupe's algorithm [3].

A comparison with Saupe's algorithm, as shown in Figure 5, underlines the particular behavior of the three variants of the hybrid scheme (PIFS-LTD, PIFS-LTH, and PIFS-LTF). The FFT clearly provides very poor performance, a further confirmation that LTs yielding complex (real and imaginary) coefficients are not effective when applied to PIFS coding. Figure 5 also points out that the DCT- and Haar-based feature vectors (in PIFS-LTD and PIFS-LTH, resp.) have closely comparable performance. Furthermore, they perform better than the FFT (PIFS-LTF) and Saupe's vector. The main reason for the superiority of PIFS-LTD and PIFS-LTH is that the DCT and Haar transforms retain most of the image information in their first coefficients, so when a shorter vector is obtained by truncating the original one to a small number of coefficients, more representative features are retained. On the contrary, this does not happen for Saupe's vector, which is usually reduced by averaging its components.

In Figure 6, the results obtained by integrating both the domain classification method (PIFS-LTD and PIFS-LTH) and the linear transform-based residual information encoding (in this case, the DCT) are shown. The advantage provided by residual information encoding is more relevant for small compression ratios. The reason is that, for high-quality fractal decoded images, the residual consists of relatively small values, which are more easily encoded via the proposed technique. Figure 7 also confirms this thesis. The first image in its top row is the original image, while the following two show the non-residual-coded blocks in blue at two different compression ratios, 4.8 and 11.5. Notice that the residual-coded pixels in the last column outnumber those in the second one, because of the larger distortion introduced by the fractal encoder. The second and third rows show a detail of the decoded image with (third row) and without (second row) residual coding, underlining that the details added by the residual information can reduce blocking artifacts.

In Table 1, encoding times and corresponding PSNR values with and without residual coding are reported for both the Saupe- and LT-based fractal encoders. Results refer to the 512×512 grayscale Lena image; the system runs on an Intel Core Duo 2.0 GHz with 1 GB RAM. The results in Table 1 clearly underline the superiority of the LT-based classification scheme with respect to Saupe's approach. This is due to the ability of linear transformations (such as the DCT) to compact most of the signal energy into their first coefficients, resulting in more accurate indexing of the range/domain pool. The last column also points out that residual coding contributes about 0.5–0.8 dB to the final quality of the decoded image.

The hybrid scheme proposed in this paper has also been compared with a state-of-the-art classification approach [7]. The outcomes of this comparison are reported in Table 2. The first row shows the time and PSNR of the full range/domain search for the 256×256 grayscale Lena image, while the second row reports the analogous figures when Duh's speedup method is adopted. As can be seen, Duh's classification scheme produces a speedup factor of about 3, confirming what the authors claim in [7]. By contrast, the proposed technique provides a speedup factor of about 150 without residual coding, and about 90 when the residual information is encoded too. This is attributable to the number of range/domain comparisons performed during the encoding process. Indeed, Duh's approach divides the total number of range/domain comparisons by 3, while in the proposed scheme each range is compared with no more than 50 domains, namely those with a feature vector similar to that of the range.

A further comparison with state-of-the-art techniques is given by considering the hybrid fractal/wavelet approach proposed by Iano et al. [15], as shown in Table 3. This technique presents a certain affinity with the proposed method, as both code the base information through fractals and then exploit additional processing to code the residual (detail) information.

7. Conclusions

In this paper, we presented two different ways to exploit linear transformations within a PIFS-based fractal encoding strategy. First, linear transformations have been used to define a range/domain classification vector which drastically reduces the computational weight of the encoding stage by avoiding most of the least significant range/domain comparisons. Furthermore, linear transform properties have been exploited for residual information encoding, resulting in a decoded image quality far superior to that of classic fractal approaches. A comparison with Saupe's fractal approach clearly shows the improvements achieved by the proposed method, while a comparison among three of the best-known linear transforms (FFT in PIFS-LTF, DCT in PIFS-LTD, and Haar in PIFS-LTH) shows how differently each contributes to enhancing the basic scheme. The proposed hybrid approach has also been compared with a state-of-the-art classification scheme, showing that it obtains much higher speedup factors at comparable decoded image quality.