Abstract

This paper proposes a fast normalized-covariance-based similarity measure for fractal video compression with quadtree partitioning. To increase the speed of fractal encoding, a simplified expression for the covariance between a range block and all overlapped domain blocks within a search window is evaluated in the frequency domain. All covariance coefficients are normalized by the standard deviations of the overlapped domain blocks. These standard deviations are computed efficiently in a single pass using either of two approaches, one FFT-based and one sum-table-based. The two approaches give nearly identical results in all respects except memory requirement. Based on the proposed simplified similarity measure, the gray level transformation parameters are computed with reduced effort, and the isometry transformations are performed using the rotation/reflection properties of the IFFT. Range blocks of the largest size, 16 × 16, are split by quadtree decomposition according to a target level of motion-compensated prediction error. Experimental results show that, compared with the NHEXS method, the proposed method reduces the encoding time by 66.49% and increases the compression ratio by 9.58%, while improving PSNR by 0.41 dB. Compared with H.264, the proposed method saves 20% of compression time with marginal variation in PSNR and compression ratio.

1. Introduction

Emerging multimedia applications such as video conferencing, video on mobile phones, video e-mail, and wireless communications require an effective video coding standard that achieves a low bit rate with good quality. The performance of a video coding standard depends on parameters such as the quality of the reconstructed video, the compression ratio, and the encoding time. Fractal-based video compression [1] is an alternative to the existing video standards (MPEG, H.263, H.264) for achieving a high compression ratio with good reconstructed quality. Recently, various researchers have proposed algorithms to improve fractal encoding speed.

Jacquin [2] proposed an innovative image compression technique based on the fractal theory of iterated function systems. It reduces the affine redundancy of an image by exploiting its self-similarity. Video sequences contain temporal redundancy between consecutive frames that can be removed easily with fractal-based techniques. Fractal coding has attracted researchers' attention because of its resolution independence, high decoding speed, and high compression ratio [3, 4]. However, its high encoding time is the main drawback and makes it unsuitable for real-time applications. The motivation for this method is that it provides a high compression ratio with good output quality, which is useful for storing and transmitting large videos. To increase the encoding speed while retaining these advantages, a fast fractal video coding system is proposed.

Cube-based [5, 6] and frame-based [7] fractal compression methods are commonly used for video compression. In cube-based compression, the video is divided into groups of frames, each of which is partitioned into three-dimensional (3D) domain and range blocks; however, this approach has high computational complexity and a low compression ratio. In frame-based compression, each frame is encoded using the previous frame as the domain pool; this achieves a high compression ratio but introduces errors that propagate from frame to frame. Wang proposed a fixed-block-size hybrid compression algorithm [8] and an adaptive partition in place of fixed-size partitions [9], which combine the advantages of cube-based and frame-based fractal compression. Another hybrid coder, proposed by Yao and Wilson [10], combines neighborhood vector quantization with fractal coding to compress video as a 3D volume. A 3D searchless fractal approach [11], prediction of error frames for low-bit-rate video [12], and wavelet-transform-based video coding [13, 14] have also been considered for video compression.

Kim et al. [15] proposed circular prediction mapping (CPM) and noncontractive interframe mapping (NCIM) to combine the fractal sequence coder with the well-known motion estimation/motion compensation algorithm, which exploits the high temporal correlation between frames. Fractal video coding with a new cross hexagon search (NHEXS) algorithm was proposed [16] to speed up motion estimation when searching stationary and quasi-stationary blocks. In [17, 18], regions are defined by a previously computed segmentation map and are encoded independently using NHEXS-based search. A new object-based method [19] was introduced in the transform domain using a shape-adaptive DCT for stereo video compression. Zhu et al. proposed an automatic region-based video coder [20] with an asymmetrical hexagon search algorithm and a deblocking loop filter to improve the quality of decompressed video. A high-efficiency fractal multiview codec is presented in [21] that encodes the anchor-viewpoint video using intraprediction modes and a fractal coder with motion compensation. In [22], the three-step search algorithm is modified with two cross search and two cross hexagon search patterns to implement a fractal video coder.

Block-based motion estimation and motion compensation algorithms exploit the high temporal correlation between adjacent frames. In frame-based fractal video coding, range and domain blocks must be matched with a proper selection of geometrical transformation, scaling, and luminance factors. Normalized covariance, that is, zero-mean normalized cross-correlation (ZNCC), is a method for determining the structural similarity between two image blocks [23]. The best-matched domain block, having a high normalized cross-correlation value [24, 25], may still have a large difference in average gray level. This difference is reduced to zero or a very small value by selecting proper fractal encoding parameters. However, computing ZNCC directly for every range block is computationally very expensive. The sum-table-based method significantly reduces the computational complexity of ZNCC: it is a precalculated running-sum structure over the entire image that acts as a look-up table for computing the sum over a block of any given size.

In this paper, a fast fractal-based video coder is proposed that uses normalized covariance as the similarity measure. It uses three levels of quadtree partitioning for motion estimation, which adapts well to variations in picture content and helps to improve the compression ratio. The covariance between a range block and all domain blocks is simplified and computed in one pass using the FFT. The computational complexity of the scaling and brightness factors is also reduced by new, simple expressions based on normalized covariance instead of the traditional mean square error (MSE). The speed of fractal encoding is further increased by three steps: normalization of the covariance with either an FFT-based or a sum-table-based method, computation of the eight isometry transformations using the properties of the 2D IFFT, and early search termination. The video compression performance of the FFT-based and sum-table-based methods is verified separately, and the two are nearly equal. These techniques can be used to improve the subjective quality of the video and the coding efficiency.

The rest of the paper is organized as follows. The basic fractal block coding for the image is described in Section 2. Normalized covariance based motion estimation and quadtree partition are explained in Section 3. Fast fractal video coding using FFT is presented in Section 4. The experimental results and comparative study of the proposed algorithm with existing algorithms are presented in Section 5. The conclusion is outlined in Section 6.

2. Fractal Image Coding Theory

Fractal image coding is based on the theory of partitioned iterated function systems (PIFS) [2]. A PIFS consists of a set of contractive transformations; when these transformations are applied iteratively to an arbitrary image, the result converges to an approximation of the original image. Images are stored as a collection of transformations, which results in image compression.

An original image of size $N \times N$ is first partitioned into nonoverlapping range blocks ($R_i$), each of size $B \times B$. Similarly, the same image is partitioned into overlapping domain blocks ($D_j$), each of size $2B \times 2B$, with a one-pixel shift in the horizontal and vertical directions, to form the domain pool. For each range block, the best-matching domain block is located in the domain pool, and the contractive mapping that minimizes the MSE between the range block and the contracted domain block is applied. A range-domain mapping applies three operations [3] in sequence to each domain block of size $2B \times 2B$: (1) spatial contraction of the domain block ($D_j$), by downsampling or by averaging each disjoint group of four neighboring pixels, to form a block ($\tilde{D}_j$) of size $B \times B$; (2) the 8 geometrical transformations of each contracted block, namely 4 rotations in steps of 90 degrees and 4 mirror reflections; (3) for each geometrically transformed block, a contractive affine transformation of the gray levels, keeping the parameters that give the lowest MSE. The error between a range block ($R$) and a contracted domain block ($\tilde{D}$) is measured by (1), and the scaling factor "$s$" and brightness factor "$o$" of the affine transformation are calculated by (2) and (3), respectively:

$$E(R,\tilde{D}) = \frac{1}{n}\sum_{i=1}^{n}\left(s\,d_i + o - r_i\right)^2, \qquad (1)$$

$$s = \frac{n\sum_{i=1}^{n} d_i r_i - \sum_{i=1}^{n} d_i \sum_{i=1}^{n} r_i}{n\sum_{i=1}^{n} d_i^2 - \left(\sum_{i=1}^{n} d_i\right)^2}, \qquad (2)$$

$$o = \frac{1}{n}\left(\sum_{i=1}^{n} r_i - s\sum_{i=1}^{n} d_i\right), \qquad (3)$$

where $n$ is the number of pixels in the block and $r_i$, $d_i$ are the pixel values of the range block and the contracted domain block at position $i$. For each range block, the fractal encoded data to be stored are the coordinates of the domain block, $s$, $o$, and the geometric transformation index. The gray level transformation parameters $s$ and $o$ should lie in the ranges −1.2 to 1.2 and −255 to 255, respectively [3], to ensure that the transformation is contractive. At the decoder, these fractal parameters are applied iteratively, according to the encoding block size, to an arbitrary initial image, which converges to a reconstruction of the original image after a certain number of iterations.
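The three operations and the least-squares fit of (1)-(3) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`contract`, `isometries`, `affine_fit`, `best_isometry`), the ordering of the eight isometries, and the use of 2 × 2 averaging (rather than downsampling) for contraction are choices made here for illustration.

```python
def contract(domain):
    """Operation (1): average each disjoint 2x2 pixel group, so a 2B x 2B block becomes B x B."""
    b = len(domain) // 2
    return [[(domain[2 * i][2 * j] + domain[2 * i][2 * j + 1]
              + domain[2 * i + 1][2 * j] + domain[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(b)] for i in range(b)]


def isometries(block):
    """Operation (2): the 8 isometries (4 rotations by 90 degrees, each with and without
    a mirror). The ordering here is arbitrary, not the paper's index convention."""
    out = []
    cur = [list(row) for row in block]
    for _ in range(4):
        out.append(cur)
        out.append([row[::-1] for row in cur])        # left-right mirror
        cur = [list(r) for r in zip(*cur[::-1])]      # rotate 90 degrees clockwise
    return out


def affine_fit(r_block, d_block):
    """Operation (3): least-squares s and o of eqs. (2)-(3) and the MSE of eq. (1)."""
    r = [p for row in r_block for p in row]
    d = [p for row in d_block for p in row]
    n = len(r)
    sum_d, sum_r = sum(d), sum(r)
    sum_dd = sum(x * x for x in d)
    sum_dr = sum(x * y for x, y in zip(d, r))
    denom = n * sum_dd - sum_d * sum_d
    s = 0.0 if denom == 0 else (n * sum_dr - sum_d * sum_r) / denom   # flat block: s = 0
    o = (sum_r - s * sum_d) / n
    mse = sum((s * x + o - y) ** 2 for x, y in zip(d, r)) / n
    return s, o, mse


def best_isometry(r_block, domain):
    """Try all 8 isometries of the contracted domain block; return (mse, index, s, o).
    On ties the first isometry found is kept."""
    best = None
    for k, t in enumerate(isometries(contract(domain))):
        s, o, mse = affine_fit(r_block, t)
        if best is None or mse < best[0]:
            best = (mse, k, s, o)
    return best


domain = [[0, 2, 4, 6], [2, 4, 6, 8], [8, 10, 12, 14], [10, 12, 14, 16]]
range_block = [[11, 13], [15, 17]]          # equals 0.5 * contract(domain) + 10
print(contract(domain))                     # [[2.0, 6.0], [10.0, 14.0]]
print(best_isometry(range_block, domain))   # (0.0, 0, 0.5, 10.0)
```

In this toy case the 180-degree rotation with $s = -0.5$ also fits exactly; the strict comparison keeps the first exact match.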

3. Normalized Covariance Based Motion Estimation

ZNCC is a well-established similarity criterion and is considered one of the most accurate motion estimators in video compression. In fractal video coding, the best-matched domain block is the one that gives the least mean square error after the proper affine transformation is applied. ZNCC is more robust to uniform illumination changes and less sensitive to noise; hence it can help to increase coding efficiency and improve the subjective visual quality of the output video. For structurally similar regions, ZNCC may give a high value for the best match even when the average gray levels differ considerably. This difference is minimized by selecting accurate luminance and geometrical transformation parameters. Because no error (residual) frame is coded, these fractal parameters act as a kind of motion compensation. In the discrete domain, the range block ($R$) of the current frame is shifted pixel by pixel across the search window ($W$) of the reference frame, and motion is estimated by detecting the maximum of the ZNCC function between $R$ and $W$. The ZNCC is defined as

$$\gamma(u,v) = \frac{\sum_{x,y}\left[W(x+u,y+v) - \bar{W}_{u,v}\right]\left[R(x,y) - \bar{R}\right]}{\sqrt{\sum_{x,y}\left[W(x+u,y+v) - \bar{W}_{u,v}\right]^2 \sum_{x,y}\left[R(x,y) - \bar{R}\right]^2}}. \qquad (4)$$

In (4), $\bar{W}_{u,v}$ denotes the mean value of $W$ within the area of the range block shifted to position $(u,v)$, and $\bar{R}$ denotes the mean value of the range block $R$; the sums run over the $B \times B$ pixels of the block. $\gamma$ is the ZNCC surface matrix between the current macroblock and the reference search window. If the search range is $p$ pixels in both directions from the location of $R$, the size of $\gamma$ is $(2p+1) \times (2p+1)$. Fractal encoding is itself a complex method, and using the ZNCC function to estimate motion vectors can make the combined technique computationally expensive and time consuming. To overcome this complexity, an efficient method of ZNCC calculation is proposed, based on new optimal $s$ and $o$ transformation parameters.
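For reference, a direct (unoptimized) evaluation of the ZNCC surface over a search window is sketched below in plain Python. It is the baseline whose cost motivates the simplifications in Section 4. The function name `zncc_surface` and the convention that a flat block (zero variance) scores 0 are assumptions made here, not details from the paper.

```python
import math
import random


def zncc_surface(window, r_block, p):
    """Direct ZNCC of a B x B range block against every B x B block of a
    (B + 2p) x (B + 2p) search window. Returns a (2p+1) x (2p+1) matrix."""
    b = len(r_block)
    n = b * b
    r_mean = sum(map(sum, r_block)) / n
    r0 = [[r_block[x][y] - r_mean for y in range(b)] for x in range(b)]
    r_norm = math.sqrt(sum(v * v for row in r0 for v in row))
    surface = []
    for u in range(2 * p + 1):
        row_out = []
        for v in range(2 * p + 1):
            d = [[window[x + u][y + v] for y in range(b)] for x in range(b)]
            d_mean = sum(map(sum, d)) / n
            num = sum((d[x][y] - d_mean) * r0[x][y] for x in range(b) for y in range(b))
            d_norm = math.sqrt(sum((d[x][y] - d_mean) ** 2
                                   for x in range(b) for y in range(b)))
            den = d_norm * r_norm
            row_out.append(num / den if den > 0 else 0.0)   # flat block: score 0
        surface.append(row_out)
    return surface


rng = random.Random(1)
B, P = 3, 2
window = [[rng.randint(0, 255) for _ in range(B + 2 * P)] for _ in range(B + 2 * P)]
# Range block = contrast-scaled, brightened copy of the window block at offset (1, 3).
r_block = [[2 * window[x + 1][y + 3] + 5 for y in range(B)] for x in range(B)]
gamma = zncc_surface(window, r_block, P)
print(round(gamma[1][3], 6))   # 1.0
```

Each of the $(2p+1)^2$ positions costs $O(B^2)$ operations here, which is the cost that the FFT and sum-table formulations amortize.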

3.1. Quadtree Based Partition

The quadtree decomposition first partitions the current frame into a set of large (16 × 16 pixel) range blocks at level 1. The motion vector of a matched block is decided after checking the three highest peak locations of the ZNCC surface matrix. This check is required because the error between the blocks may change after the gray level transformation parameters are quantized. If the lowest quantized error of a 16 × 16 block is above a prespecified threshold, the block is partitioned into four 8 × 8 quadrants. The partitioning [26] continues recursively until the error becomes smaller than the threshold or the 4 × 4 pixel block size (level 3) is reached.

Here, three levels of quadtree partitioning are employed, with block sizes from 16 × 16 down to 4 × 4 pixels. All subsequent partitions of a 16 × 16 block are represented by one code word, as shown in Figure 1. The code word is either 1 bit or 5 bits long, depending only on the level-1 and level-2 nodes: if the level-1 block is not partitioned, the code word is 1 bit; otherwise it is 5 bits (one bit for the level-1 node and one for each of the four level-2 nodes). In each case, a partitioned block is represented by bit 1 and an unpartitioned block by bit 0, as shown in Figure 1.
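The recursive split and the 1-bit/5-bit code word described in Section 3.1 can be sketched as follows. The error callback, the per-size threshold dictionary, and the raster order of the quadrants (top-left, top-right, bottom-left, bottom-right) are assumptions for illustration; Figure 1 defines the paper's actual bit layout.

```python
QUADRANTS = ((0, 0), (0, 1), (1, 0), (1, 1))   # assumed raster order of the four quadrants


def partition(top, left, size, block_error, thresholds, min_size=4):
    """Split a block recursively until its error is within the threshold for its size
    or the minimum (4 x 4) size is reached. Returns a list of (top, left, size) leaves."""
    if size <= min_size or block_error(top, left, size) <= thresholds[size]:
        return [(top, left, size)]
    half = size // 2
    leaves = []
    for dr, dc in QUADRANTS:
        leaves += partition(top + dr * half, left + dc * half, half,
                            block_error, thresholds, min_size)
    return leaves


def codeword(leaves, size=16):
    """'0' for an unsplit 16 x 16 block, else '1' plus one bit per 8 x 8 quadrant
    (1 = that quadrant is split into four 4 x 4 blocks). Assumes the block origin is (0, 0)."""
    if leaves == [(0, 0, size)]:
        return "0"
    half = size // 2
    return "1" + "".join("0" if (dr * half, dc * half, half) in leaves else "1"
                         for dr, dc in QUADRANTS)


def toy_error(top, left, size):
    """Hypothetical error map: only blocks anchored at (0, 0) match poorly."""
    return 10.0 if (top, left) == (0, 0) else 0.0


leaves = partition(0, 0, 16, toy_error, thresholds={16: 5.0, 8: 5.0})
print(leaves)             # four 4 x 4 leaves in the top-left quadrant, three 8 x 8 leaves
print(codeword(leaves))   # 11000
```

Level-3 (4 × 4) nodes need no flag of their own because they can never be split, which is why the code word never exceeds 5 bits.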

4. Fast Fractal Video Coding Algorithm

The most important factor affecting the speed of fractal encoding is the search for the matched domain block. In fractal video compression, the search is limited to a search window, but it is still expensive compared with standard video coders (H.264/MPEG-4). We propose an FFT-based ZNCC method with a new matching criterion that reduces the block search time. The current frame ($F_c$) is first partitioned into nonoverlapping range blocks $R$ of $B \times B$ pixels. The search window $W$ of size $(B+2p) \times (B+2p)$ is defined on the previously decoded frame ($F_r$), that is, the reference frame. The flow diagram of the proposed fast fractal video coder is shown in Figure 2(a). The ZNCC matching process may find a wrong domain block if the range block is homogeneous. If the variance of a range block is below the smallest predefined threshold, only the mean value of that homogeneous block is encoded; otherwise, the block belongs to the nonhomogeneous group. The root mean square (RMS) measure is used to compute the prediction error ($e$) between the range block ($R$) and the matched domain block ($D$). The fractal parameters, together with the corresponding error ($e$), are the output of the fast fractal search algorithm, as shown in Figure 2(b). Block $R$ is partitioned into four quadrants when $e$ is above the specified threshold ($T_l$) for its level $l$; otherwise, its fractal parameters are encoded and saved. In this paper, the quadtree depth is three levels, $l = 1, 2, 3$, so two error thresholds, $T_1$ and $T_2$, are specified.
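The per-block decision described above (homogeneity test, RMS error, and level-dependent split) might look like the sketch below. The function names, the use of the population variance, and the threshold values in the usage lines are hypothetical placeholders, not values from the paper.

```python
import math


def block_stats(block):
    """Mean and (population) variance of a block."""
    vals = [v for row in block for v in row]
    mean = sum(vals) / len(vals)
    return mean, sum((v - mean) ** 2 for v in vals) / len(vals)


def rms_error(r_block, approx):
    """Root mean square error between a range block and its approximation."""
    sq = [(a - b) ** 2 for ra, rb in zip(r_block, approx) for a, b in zip(ra, rb)]
    return math.sqrt(sum(sq) / len(sq))


def block_decision(r_block, approx, level, var_threshold, err_thresholds):
    """Return ('homogeneous', mean), ('split', e) or ('encode', e).
    err_thresholds = (T1, T2) for levels 1 and 2; level-3 (4 x 4) blocks are never split."""
    mean, var = block_stats(r_block)
    if var < var_threshold:
        return ("homogeneous", mean)          # only the block mean is coded
    e = rms_error(r_block, approx)
    if level < 3 and e > err_thresholds[level - 1]:
        return ("split", e)
    return ("encode", e)


r = [[0, 10], [10, 0]]
print(block_decision([[7, 7], [7, 7]], None, 1, 1.0, (2.0, 3.0)))   # ('homogeneous', 7.0)
print(block_decision(r, [[5, 5], [5, 5]], 1, 1.0, (2.0, 3.0)))      # ('split', 5.0)
print(block_decision(r, [[5, 5], [5, 5]], 3, 1.0, (2.0, 3.0)))      # ('encode', 5.0)
```

In a full coder, `approx` would be the transformed matched domain block returned by the search; here it is supplied directly to keep the example self-contained.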

Because of the high temporal correlation in video sequences, range-domain mapping is more effective if the range and domain blocks have the same size [15]. The quality of the reference frame plays an important role when mapping two blocks of the same size. If interframe motion vector prediction is based on a good-quality reference frame (the previous frame), the prediction error will be low. However, this error grows monotonically over subsequent frames because it accumulates. To obtain a good-quality reference frame at the beginning, the intra frame is therefore compressed using DCT and quantization [27, 28].

4.1. Efficient Searching with Simplified Similarity Measure

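The main drawback of the ZNCC similarity measure in the spatial domain is its high computational complexity. Therefore, (4) is simplified to reduce the complexity. The numerator of (4) is denoted by $N(u,v)$ and is rewritten as

$$N(u,v) = \sum_{x,y} W(x+u,y+v)\,R'(x,y) - \bar{W}_{u,v}\sum_{x,y} R'(x,y), \qquad (5)$$

where $R'(x,y) = R(x,y) - \bar{R}$. Since $R'$ has zero mean, $\sum_{x,y} R'(x,y) = 0$, the second term vanishes, and (5) can be written as

$$N(u,v) = \sum_{x,y} W(x+u,y+v)\,R'(x,y) = (W \otimes R')(u,v). \qquad (6)$$

Equation (6) is independent of the mean value ($\bar{W}_{u,v}$) of the domain block. All the domain blocks overlap, and $\bar{W}_{u,v}$ differs from block to block, so removing it saves computation. $N(u,v)$ is the cross-correlation of $W$ and $R'$, where $(u,v)$ is the location of the domain block $D_{u,v}(x,y) = W(x+u,y+v)$ within $W$; it is equivalent to an inner product of the two blocks, denoted by the $\otimes$ operator. In the frequency domain, this cross-correlation equals the product of the FFT of one input and the complex conjugate of the FFT of the other. The complex conjugation can be avoided by taking the inverse FFT (IFFT) of that input instead of its FFT, because for a real $M \times M$ array $x$, $\overline{\mathcal{F}\{x\}} = M^2\,\mathcal{F}^{-1}\{x\}$, where $M^2$ is a constant. In the frequency domain both inputs must have the same size, so $R'$ is zero-padded to the size of $W$ ($M \times M$, with $M = B + 2p$); the padded block is denoted $R'_p$. Then (6) is computed for all domain blocks in one pass:

$$N = M^2\,\mathcal{F}^{-1}\!\left\{\mathcal{F}\{W\}\cdot\mathcal{F}^{-1}\{R'_p\}\right\}, \qquad 0 \le u, v \le 2p. \qquad (7)$$

Similarly, the denominator of (4) is the product of the standard deviations of $D_{u,v}$ and $R$. Because the $1/n$ factors of the standard deviations cancel with the corresponding factor of the numerator, the denominator reduces to the product of the norms

$$\left\|D'_{u,v}\right\| = \sqrt{\sum_{x,y}\left[W(x+u,y+v) - \bar{W}_{u,v}\right]^2}, \qquad (8)$$

$$\left\|R'\right\| = \sqrt{\sum_{x,y} R'(x,y)^2}. \qquad (9)$$

The norm in (8) is expensive because it is repeated for each overlapped domain block, whereas the norm in (9) is the same for all domain blocks. The squared norm of (8) is expanded as

$$\left\|D'_{u,v}\right\|^2 = \sum_{x,y} W(x+u,y+v)^2 - 2\bar{W}_{u,v}\sum_{x,y} W(x+u,y+v) + n\,\bar{W}_{u,v}^2. \qquad (10)$$

Using

$$\sum_{x,y} W(x+u,y+v) = n\,\bar{W}_{u,v}, \qquad (11)$$

equation (10) simplifies to

$$\left\|D'_{u,v}\right\|^2 = \sum_{x,y} W(x+u,y+v)^2 - \frac{1}{n}\left(\sum_{x,y} W(x+u,y+v)\right)^2, \qquad (12)$$

which can be written in terms of the mean value of the domain block as

$$\left\|D'_{u,v}\right\|^2 = \sum_{x,y} W(x+u,y+v)^2 - n\,\bar{W}_{u,v}^2. \qquad (13)$$

All the overlapped domain blocks, shifted by one pixel, require similar computations, most of which repeat those of the neighboring blocks.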
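A small numerical check of (6) and (7) is sketched below in plain Python, with a naive DFT standing in for the FFT (in practice an FFT library would replace `dft2`). It shows that the IFFT trick reproduces the direct cross-correlation at the $(2p+1)^2$ valid lags. The names `dft2`, `zero_mean_padded`, `numerator_fft`, and `numerator_direct` are illustrative, and placing the padded block in the top-left corner (zeros to the right and bottom) is an assumption of this sketch.

```python
import cmath
import random


def dft2(a, inverse=False):
    """Naive 2-D DFT of an M x M array; the inverse includes the 1/M^2 factor."""
    m = len(a)
    sign = 1 if inverse else -1
    out = [[0j] * m for _ in range(m)]
    for k in range(m):
        for l in range(m):
            acc = 0j
            for x in range(m):
                for y in range(m):
                    acc += a[x][y] * cmath.exp(sign * 2j * cmath.pi * (k * x + l * y) / m)
            out[k][l] = acc / (m * m) if inverse else acc
    return out


def zero_mean_padded(r_block, m):
    """R' = R - mean(R), placed top-left in an M x M zero array (assumed convention)."""
    b = len(r_block)
    mean = sum(map(sum, r_block)) / (b * b)
    return [[r_block[x][y] - mean if x < b and y < b else 0.0 for y in range(m)]
            for x in range(m)]


def numerator_fft(window, r_block):
    """Eq. (7): N = M^2 * IDFT(DFT(W) * IDFT(R'_p)), keeping only non-wrapping lags."""
    m, b = len(window), len(r_block)
    fw = dft2(window)
    ir = dft2(zero_mean_padded(r_block, m), inverse=True)   # M^2 * IDFT(x) = conj(DFT(x))
    c = dft2([[fw[k][l] * ir[k][l] * m * m for l in range(m)] for k in range(m)],
             inverse=True)
    span = m - b + 1                                        # = 2p + 1
    return [[c[u][v].real for v in range(span)] for u in range(span)]


def numerator_direct(window, r_block):
    """Eq. (6): N(u, v) = sum over the block of W(x+u, y+v) * R'(x, y)."""
    m, b = len(window), len(r_block)
    rp = zero_mean_padded(r_block, m)
    return [[sum(window[x + u][y + v] * rp[x][y] for x in range(b) for y in range(b))
             for v in range(m - b + 1)] for u in range(m - b + 1)]


rng = random.Random(0)
window = [[rng.randint(0, 255) for _ in range(7)] for _ in range(7)]   # B = 3, p = 2
r_block = [[rng.randint(0, 255) for _ in range(3)] for _ in range(3)]
n_fft = numerator_fft(window, r_block)
n_dir = numerator_direct(window, r_block)
print(max(abs(a - b) for ra, rb in zip(n_fft, n_dir) for a, b in zip(ra, rb)))
```

The printed difference is floating-point round-off. Adding a constant to every window pixel leaves `numerator_direct` unchanged (up to round-off), which is the independence from $\bar{W}_{u,v}$ stated for (6).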
Two methods, one FFT-based and one sum-table-based, are proposed to avoid unnecessary computation of the mean and norm of all overlapped blocks. Using (7), (8), (9), and (13), the ZNCC (4) can be rewritten as

$$\gamma(u,v) = \frac{N(u,v)}{\left\|R'\right\|\sqrt{\sum_{x,y} W(x+u,y+v)^2 - n\,\bar{W}_{u,v}^2}}. \qquad (14)$$

The flow diagram of the fast search algorithm with fast computation of the ZNCC (14) is shown in Figure 2(b). In video sequences, many blocks are stationary, with zero or nonzero motion. Such blocks can be identified when the highest value of the ZNCC surface is nearly equal to one ($\gamma_{\max} \approx 1$) and the corresponding quantized values of $s$ and $o$ are one and zero, respectively. After the motion vector $(u^*, v^*)$ of a stationary domain block is found, the search can be terminated, keeping the current isometry index, if the RMS error ($e$) is below the threshold. This early termination increases coding efficiency without changing quality. The norm $\|R'\|$ can be omitted from (14) during the search because it is a constant divisor for all domain blocks and does not change the location of the maximum of the ZNCC surface.
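Once $N(u,v)$, the block sums of $W$ and $W^2$, and $\|R'\|$ are available, each ZNCC value needs only a few arithmetic operations: $\gamma = N / (\|R'\|\sqrt{\sum W^2 - n\bar{W}^2})$. The sketch below assembles that expression from those pieces and adds the stationary-block early-termination test. The function names, the tolerance used for $\gamma_{\max} \approx 1$, and the threshold argument are assumptions made for illustration.

```python
import math


def zncc_from_parts(num, sum_w, sum_w2, r_norm, n):
    """gamma = N / (||R'|| * sqrt(sum W^2 - n * mean^2)); returns 0 for flat blocks."""
    mean = sum_w / n
    d_norm_sq = sum_w2 - n * mean * mean          # squared domain-block norm
    if d_norm_sq <= 0 or r_norm == 0:
        return 0.0
    return num / (r_norm * math.sqrt(d_norm_sq))


def stationary_stop(gamma_max, s_q, o_q, rms, threshold, tol=1e-3):
    """Hypothetical early-termination test: peak close to 1, quantized s = 1 and o = 0,
    and RMS error below the threshold."""
    return abs(gamma_max - 1.0) < tol and s_q == 1 and o_q == 0 and rms < threshold


d = [1, 2, 3, 5]                     # one domain block, flattened
r = [2 * v + 1 for v in d]           # range block = 2 * domain + 1
r_mean = sum(r) / len(r)
r_zm = [v - r_mean for v in r]       # R'
num = sum(a * b for a, b in zip(d, r_zm))   # N for this block (no domain mean needed)
gamma = zncc_from_parts(num, sum(d), sum(v * v for v in d),
                        math.sqrt(sum(v * v for v in r_zm)), len(d))
print(round(gamma, 12))   # 1.0
```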

4.1.1. FFT Based Method for Mean and Norm Calculation of Overlapped Blocks

The sum of pixels and the sum of squared pixels over each block of range-block size are equivalent to a convolution with an all-ones (unit) matrix of the same size, $\mathbf{1}_{B \times B}$. Combinations of FFTs and IFFTs are used to compute them quickly. The flow diagram of the calculation of the mean and norm of all domain blocks in one pass, as matrices, using the FFT is shown in Figure 3. The FFT of the input $W$ is already available from (7), and the IFFT of the zero-padded unit matrix is constant, so it is computed only once. Thus, only the FFT of the squared pixels and the two final IFFTs are required in addition to the other blocks. The mean ($\bar{W}_{u,v}$) and norm ($\|D'_{u,v}\|$) of all blocks are required during the gray level transformation and normalized covariance calculations.
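The FFT route to the block sums can be sketched as follows, again with a naive DFT (`dft2`) standing in for an FFT. Correlating $W$ and $W^2$ with a zero-padded all-ones $B \times B$ matrix yields every block sum at once; the mean follows from $\sum W = n\bar{W}$ and the squared norm from $\sum W^2 - n\bar{W}^2$. The spectrum of the all-ones matrix is computed once and reused. Function names are illustrative assumptions.

```python
import cmath


def dft2(a, inverse=False):
    """Naive 2-D DFT of an M x M array; the inverse includes the 1/M^2 factor."""
    m = len(a)
    sign = 1 if inverse else -1
    return [[sum(a[x][y] * cmath.exp(sign * 2j * cmath.pi * (k * x + l * y) / m)
                 for x in range(m) for y in range(m)) / ((m * m) if inverse else 1)
             for l in range(m)] for k in range(m)]


def block_sums_fft(window, b):
    """All (M-b+1)^2 sums of b x b blocks of `window` and of its squared pixels."""
    m = len(window)
    ones = [[1.0 if x < b and y < b else 0.0 for y in range(m)] for x in range(m)]
    ones_hat = dft2(ones, inverse=True)          # constant: computed only once
    span = m - b + 1

    def correlate(img):
        fi = dft2(img)
        c = dft2([[fi[k][l] * ones_hat[k][l] * m * m for l in range(m)]
                  for k in range(m)], inverse=True)
        return [[c[u][v].real for v in range(span)] for u in range(span)]

    return correlate(window), correlate([[v * v for v in row] for row in window])


def mean_and_norm_sq(window, b):
    """Mean and squared norm (sum W^2 - n * mean^2) of every overlapped b x b block."""
    n = b * b
    s1, s2 = block_sums_fft(window, b)
    means = [[v / n for v in row] for row in s1]
    norms = [[q - n * mu * mu for q, mu in zip(rq, rm)] for rq, rm in zip(s2, means)]
    return means, norms


window = [[(3 * x + 5 * y) % 17 for y in range(6)] for x in range(6)]
means, norms = mean_and_norm_sq(window, 2)
brute = sum(window[x][y] for x in range(1, 3) for y in range(2, 4)) / 4
print(abs(means[1][2] - brute) < 1e-9)   # True
```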

4.1.2. Sum Table Based Method for Mean and Norm Calculation of Overlapped Blocks

The sum-table-based method can also be used to reduce the number of computations needed for the mean and norm of all domain blocks. Two sum tables, $S$ and $Q$, are precomputed over the previous frame and over the squared-pixel frame, respectively, and act as look-up tables. For each frame, these tables are constructed recursively before encoding begins:

$$S(i,j) = f(i,j) + S(i-1,j) + S(i,j-1) - S(i-1,j-1), \qquad Q(i,j) = f(i,j)^2 + Q(i-1,j) + Q(i,j-1) - Q(i-1,j-1), \qquad (15)$$

where $f$ is the previous frame of $H \times V$ pixels and $i$ and $j$ are the pixel coordinates, with $1 \le i \le H$ and $1 \le j \le V$. The initial condition is $S(i,j) = Q(i,j) = 0$ when either $i = 0$ or $j = 0$. The two tables are partitioned according to the search window into subtables $S_W$ and $Q_W$ of size $(M+1) \times (M+1)$, where $M = B + 2p$ is the search window size; that is, each subtable has one additional row and column at the start compared with the search window. The sums over $W$ and $W^2$ in (12) can then be calculated efficiently from the subtables:

$$\sum_{x,y} W(x+u,y+v) = S_W(u+B,v+B) - S_W(u,v+B) - S_W(u+B,v) + S_W(u,v), \qquad (16)$$

$$\sum_{x,y} W(x+u,y+v)^2 = Q_W(u+B,v+B) - Q_W(u,v+B) - Q_W(u+B,v) + Q_W(u,v). \qquad (17)$$

Using (16) and (17), the norm (12) and the mean

$$\bar{W}_{u,v} = \frac{1}{n}\sum_{x,y} W(x+u,y+v) \qquad (18)$$

of all domain blocks can be calculated as matrices of size 17 × 17, that is, $(2p+1) \times (2p+1)$ with $p = 8$. For the three-level quadtree partition, a total of six tables might be stored. Instead, only two memory spaces are allocated, sized according to the level-3 blocks, and the remaining subtables are calculated from them.
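A plain-Python sketch of the sum tables and the four-lookup block sums follows. Indices are 0-based in the code, with a leading zero row and column as in the initial condition above; the block sums are read directly from the frame-level tables instead of extracting subtables, which gives the same result. The function names and the toy frame are assumptions.

```python
def sum_table(frame):
    """(H+1) x (V+1) running-sum table with a zero first row and column (recursion (15))."""
    h, v = len(frame), len(frame[0])
    s = [[0] * (v + 1) for _ in range(h + 1)]
    for i in range(1, h + 1):
        for j in range(1, v + 1):
            s[i][j] = frame[i - 1][j - 1] + s[i - 1][j] + s[i][j - 1] - s[i - 1][j - 1]
    return s


def block_sum(s, top, left, b):
    """Four-lookup sum of the b x b block whose top-left pixel is frame[top][left]."""
    return s[top + b][left + b] - s[top][left + b] - s[top + b][left] + s[top][left]


def window_stats(s, q, top, left, b, p):
    """Means and squared norms (sum W^2 - (sum W)^2 / n) of every block in a search window
    whose top-left pixel is frame[top][left]; returns two (2p+1) x (2p+1) matrices."""
    n = b * b
    means, norms = [], []
    for u in range(2 * p + 1):
        srow = [block_sum(s, top + u, left + v, b) for v in range(2 * p + 1)]
        qrow = [block_sum(q, top + u, left + v, b) for v in range(2 * p + 1)]
        means.append([x / n for x in srow])
        norms.append([y - x * x / n for x, y in zip(srow, qrow)])
    return means, norms


frame = [[(7 * i + 3 * j) % 13 for j in range(8)] for i in range(8)]
S = sum_table(frame)
Q = sum_table([[x * x for x in row] for row in frame])
means, norms = window_stats(S, Q, 1, 1, 3, 1)          # B = 3, p = 1: 5 x 5 window at (1, 1)
print(block_sum(S, 0, 0, 8) == sum(map(sum, frame)))   # True
```

Each block sum costs four look-ups regardless of block size, at the price of storing the two tables; this is the memory trade-off against the FFT-based method noted in the results.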

4.2. New Gray Level and Isometry Transformation

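The gray level transformation parameters, the scaling factor $s$ and the brightness factor $o$, of each range block are based on a domain block that matches it exactly when the MSE is zero:

$$R(x,y) = s\,W(x+u,y+v) + o. \qquad (19)$$

Subtracting the block means from both sides gives $R' = s\,D'_{u,v}$; taking norms with (8) and (9) (for $s > 0$), (19) can be represented as

$$\left\|R'\right\| = s\left\|D'_{u,v}\right\|, \qquad (20)$$

so $s = \|R'\|/\|D'_{u,v}\|$ and $o = \bar{R} - s\,\bar{W}_{u,v}$.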

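When the highest peak of the ZNCC is close to one, that is, $\gamma(u^*,v^*) \approx 1$ in (14), $(u^*, v^*)$ denotes the coordinates of the matched domain block. The gray level parameters $s$ and $o$ are then estimated by substituting $\|R'\| \approx N(u^*,v^*)/\|D'_{u^*,v^*}\|$ from (14) into (20), which gives the computationally more efficient forms

$$s = \frac{N(u^*,v^*)}{\left\|D'_{u^*,v^*}\right\|^2}, \qquad (21)$$

$$o = \bar{R} - s\,\bar{W}_{u^*,v^*}. \qquad (22)$$

In (21), both the numerator and the denominator are already available from (14), apart from a squaring operation, and in (22), $\bar{R}$ and $\bar{W}_{u^*,v^*}$ are also available from the computation of $N$ and $\|D'\|$. The computational cost of both equations is therefore minimal.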
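The sketch below compares $s = N/\|D'\|^2$, $o = \bar{R} - s\bar{W}$ with the least-squares formulas (2)-(3) for one pair of blocks. Expanding (2) shows that its numerator equals $n\,N$ and its denominator equals $n\,\|D'\|^2$, so the two forms coincide for any block, not only when $\gamma \approx 1$; the code checks this numerically. The function names and block values are illustrative.

```python
def params_least_squares(d, r):
    """Eqs. (2)-(3) on flattened blocks."""
    n = len(d)
    sd, sr = sum(d), sum(r)
    s = ((n * sum(a * b for a, b in zip(d, r)) - sd * sr)
         / (n * sum(a * a for a in d) - sd * sd))
    return s, (sr - s * sd) / n


def params_from_zncc_terms(d, r):
    """s = N / ||D'||^2 and o = mean(R) - s * mean(D), reusing the terms of the ZNCC."""
    n = len(d)
    r_mean, d_mean = sum(r) / n, sum(d) / n
    num = sum(a * (b - r_mean) for a, b in zip(d, r))          # N = sum W * R'
    d_norm_sq = sum(a * a for a in d) - n * d_mean * d_mean    # ||D'||^2
    s = num / d_norm_sq
    return s, r_mean - s * d_mean


d = [12, 40, 33, 90, 51, 7, 64, 28, 19]
r = [30, 52, 47, 95, 60, 22, 75, 41, 36]
print(params_least_squares(d, r))
print(params_from_zncc_terms(d, r))   # agrees with the line above up to round-off
```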

Because the search algorithm operates on the entire search window using the 2D FFT, it cannot easily apply 8 different isometry transformations to each individual domain block. Hence, the isometry transformations are applied to the range block instead of the domain block. The initial IFFT of the zero-padded range block serves as the IFFT of the zero-degree (untransformed) block. The remaining seven isometry transformations and their IFFTs are derived from this IFFT of the range block by applying the rotation and reflection properties of the 2D IFFT [26]. This avoids repeated IFFT calculations and thus increases the search speed. Performing fractal-based fast motion estimation with a simplified similarity measure and quadtree partitioning in the frequency domain is one of the features of this paper.
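The paper does not spell out which IFFT properties it uses, so the following is only one consistent way to do it, assuming the range block sits in the top-left corner of the zero-padded $M \times M$ array. Transposing the padded block transposes its IFFT. Flipping the block upside down in place corresponds to reflecting the IFFT's row index modulo $M$ and multiplying by the phase ramp $e^{\,j2\pi k(B-1)/M}$. Every one of the eight isometries is a composition of these two operations, so all eight IFFTs follow from the first without new transforms. The naive `idft2` and the other names are illustrative.

```python
import cmath
import random


def idft2(a):
    """Naive 2-D inverse DFT (with the 1/M^2 factor) of an M x M array."""
    m = len(a)
    return [[sum(a[x][y] * cmath.exp(2j * cmath.pi * (k * x + l * y) / m)
                 for x in range(m) for y in range(m)) / (m * m)
             for l in range(m)] for k in range(m)]


def pad(block, m):
    """Place a B x B block in the top-left corner of an M x M zero array."""
    b = len(block)
    return [[block[x][y] if x < b and y < b else 0 for y in range(m)] for x in range(m)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def flip_rows_spectrum(spec, b):
    """IDFT of the padded block flipped upside down in place, derived from the IDFT of
    the unflipped block: reflect the row index mod M and apply a phase ramp."""
    m = len(spec)
    return [[cmath.exp(2j * cmath.pi * k * (b - 1) / m) * spec[(-k) % m][l]
             for l in range(m)] for k in range(m)]


def rot90_spectrum(spec, b):
    """Clockwise 90-degree rotation = transpose of the upside-down flip."""
    return transpose(flip_rows_spectrum(spec, b))


rng = random.Random(3)
B, M = 3, 5
block = [[rng.randint(0, 255) for _ in range(B)] for _ in range(B)]
spec = idft2(pad(block, M))
rot_block = [list(r) for r in zip(*block[::-1])]    # rotate the block itself
direct = idft2(pad(rot_block, M))
derived = rot90_spectrum(spec, B)
print(max(abs(a - b) for ra, rb in zip(direct, derived) for a, b in zip(ra, rb)))
```

The printed value is floating-point round-off. Note that the phase ramp keeps the flipped block in the top-left corner, so the correlation peaks of the transformed range block stay aligned with the domain-block coordinates.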

5. Experimental Results

The performance of the proposed fast normalized-covariance-based fractal video coder with simplified similarity measure is evaluated. The denominator of (14) is implemented with two different methods: the FFT-based method, denoted proposed-FFT, and the sum-table-based method, denoted proposed-ST. The popular video sequences foreman, carphone, Tennis, news, and coastguard [29] (352 × 288 pixels each) are used to evaluate the proposed methods. Range blocks are formed according to the three-level quadtree partition criterion, with block sizes of 16 × 16 pixels at level 1, 8 × 8 pixels at level 2, and 4 × 4 pixels at level 3. A small predefined threshold is used at each level to detect homogeneous blocks, and the two error thresholds ($T_1$ and $T_2$) are chosen to obtain good-quality output video. The search area on the reference frame is ±8 pixels in both the vertical and horizontal directions from the position of the range block in the target frame. In addition to the proposed methods, the CPM/NCIM and NHEXS algorithms are implemented. In the CPM/NCIM method, the first 3 frames are coded with CPM and the remaining frames with NCIM. The video sequences are also compressed with H.264 using the JM 18.6 reference software [30] for comparison. The H.264 coder parameters are: high profile; quantization parameter (QP) between 28 and 36, selected to ensure good quality; search range 16; macroblock partitions 4 × 4, 8 × 8, and 16 × 16; group of pictures (GOP) 12 or 15; and universal variable length coding (UVLC) as the entropy coding method. All methods, including the proposed ones, are implemented in MATLAB 7.14 and simulated on a PC (Intel Core i5-2400 CPU, 3.10 GHz, 3.16 GB RAM).

The fractal parameters of each range block are quantized separately: the gray level factors $s$ and $o$ are quantized with 5 bits and 7 bits, respectively; the coordinates of the matched domain block $(u^*, v^*)$ are encoded with 4-bit code words; and 3 bits index the isometry transformation. Among all the methods presented, only the sum-table-based method requires two additional memory buffers for its tables.

To evaluate performance, the results of the proposed algorithms are compared with the traditional CPM/NCIM and NHEXS algorithms, and all of these algorithms are also compared with H.264. With the PSNR of each video sequence kept fixed, the encoding time and compression ratio (CR) are calculated for all methods. Table 1 compares the average coding results for the five video sequences. In each sequence, the PSNR and CR of the proposed methods serve as the references for analyzing the other methods. The average coding time of the proposed method is 98.17% and 66.49% lower than that of the CPM/NCIM and NHEXS methods, respectively. Relative to the proposed method, the average compression ratio of CPM/NCIM and NHEXS is lower by 55.75% and 9.58%, with PSNR lower by 4.12 dB and 0.41 dB, respectively. Compared with H.264, the proposed methods reduce coding time by 20% on average, with a 2% decrease in compression ratio and a marginal PSNR degradation of 0.15 dB.

Table 2 compares the performance of the proposed methods with existing fractal video coders [17, 18, 20]. Each existing method was analyzed with different video sequences and GOPs. For the highway sequence (15 frames, 352 × 288 pixels), the proposed methods achieve a compression ratio 3.55 times that of the average of [17, 18], with a PSNR 1.72 dB higher. For the silent and mother-daughter sequences (20 frames each, 352 × 288 pixels), the average PSNR increases by 4.11 dB and the bit rate decreases by 33% compared with the algorithm in [20]. News, highway, mother-daughter, and silent are low-bit-rate videos, and the others are high-bit-rate videos. For low-bit-rate videos, the proposed methods perform much better than H.264, giving a high compression ratio in much less time; in Table 2, the proposed method saves 55% of the compression time with a marginal reduction in PSNR compared with H.264. The performance of the proposed-ST and proposed-FFT methods is almost equal in terms of PSNR, encoding time, and compression ratio, as shown in Tables 1 and 2.

Statistical measures were used to characterize the distribution of the result parameters. A confidence interval (CI) gives upper and lower limits for the mean and thus indicates how much uncertainty there is about the true mean. The $t$-distribution for each test parameter, with sample size $K$, sample mean $\bar{x}$, sample standard deviation sd, and desired significance level $\alpha$, defines the confidence limits as

$$\bar{x} \pm t_{\alpha/2,\,K-1}\,\frac{\mathrm{sd}}{\sqrt{K}}. \qquad (23)$$

The confidence interval can be computed for different values of $\alpha$ and $(K-1)$ degrees of freedom. In the analysis, the 90% confidence interval is calculated for each test parameter, with the value of $t_{\alpha/2,\,K-1}$ in (23) equal to 1.667. Table 3 shows the 90% CI width for the mean PSNR, encoding time, and compression ratio of the various video sequences. Because the PSNR is fixed, the average half-width of the PSNR CI is almost equal for both methods. The 90% CI widths of the encoding time and compression ratio are narrow, which indicates that the proposed method also gives higher accuracy than the standard video coder, with less encoding time.
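The confidence limits of (23) are simple to compute once the critical value is known. The sketch below takes the critical value as an argument (1.667, as quoted in the text) rather than deriving it; a statistics library such as SciPy (`scipy.stats.t.ppf`) could supply it for other $\alpha$ and $K$. The function name and the sample PSNR values are hypothetical, not data from the paper.

```python
import math
import statistics


def t_confidence_interval(samples, t_crit):
    """mean +/- t * sd / sqrt(K), using the sample standard deviation (K - 1 denominator)."""
    k = len(samples)
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(k)
    return mean - half_width, mean + half_width


# Hypothetical per-frame PSNR values (dB), not results from the paper.
psnr = [32.1, 32.4, 31.9, 32.0, 32.3]
low, high = t_confidence_interval(psnr, t_crit=1.667)   # t value quoted in the text
print(round(low, 3), round(high, 3))
```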

Figure 4 shows a frame-by-frame performance comparison between the proposed method and H.264 for 204 frames of the foreman sequence. Because of cumulative error, the PSNR of the decoded frames of the proposed method decreases slightly as the frame number increases after each intra frame. This error is minimized by proper selection of the gray level transformation, as shown in Figure 4(a). The results show that the compression ratio and PSNR of the proposed method for each frame differ only marginally from the H.264 results. The compression time of the proposed method for each frame, shown in Figure 4(c), is on average 0.6 s (27%) less than that of H.264. The proposed fast fractal video coder thus overcomes the high-encoding-time drawback. In addition, it gives good output quality and a high compression ratio, approximately equal to those of the standard (JM v18.6) video coder, as shown in Table 1. The human visual system (HVS) does not perceive small PSNR differences (≤0.8 dB) between H.264 and the proposed method. Figure 5 shows the 62nd original frame of the "Tennis" sequence and the frames decoded by H.264 (32.26 dB) and by the proposed method (32.12 dB).

6. Conclusion

In this paper, a quadtree-partition-based fast normalized covariance method for fractal video compression is presented. A simplified normalized covariance similarity measure, eight isometry transformations computed using IFFT properties, and new modified gray level transformation parameters are proposed and evaluated using the FFT to improve encoding speed and output quality. The method can use either an FFT-based or a sum-table-based approach to normalize the covariance matrix, which further increases the encoding speed significantly; both approaches compute the mean and standard deviation of all overlapped blocks in one pass. The results of the two approaches are almost equal in all respects. The main drawback of the sum-table-based method is that it requires more memory to store the tables than the FFT-based method. Quadtree partitioning helps to achieve high compression ratios with good output quality. Compared with the CPM/NCIM and NHEXS methods, respectively, the proposed methods reduce encoding time by 98.17% and 66.49%, increase the compression ratio by 129% and 9.58%, and improve output quality by 4.12 dB and 0.41 dB. Compared with H.264, the method saves 20% of compression time with marginal degradation in frame quality and compression ratio.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.