Abstract

We propose a novel fractal video coding method that uses fast block-matching motion estimation to overcome the main drawback of fractal coding, its time-consuming encoding. Because fractal encoding spends most of its time searching for the best-matching block in a large domain pool, the search pattern and the center-biased characteristics of the motion vector distribution have a large impact on both the search speed and the quality of block motion estimation. In this paper, we first propose a new hexagon search algorithm (NHEXS) and then use it to improve the traditional CPM/NCIM, which is based on Fisher's quadtree partition. NHEXS uses two cross-shaped search patterns as the first two steps and large/small hexagon-shaped patterns as the subsequent steps for fast block motion estimation (BME). NHEXS employs a halfway stop technique to achieve significant speedup on sequences with stationary and quasistationary blocks. To further reduce the computational complexity, NHEXS employs the modified partial distortion criterion (MPDC). Experimental results indicate that the proposed algorithm spends less encoding time and achieves a higher compression ratio and better compression quality than the traditional CPM/NCIM method.

1. Introduction

Fractal image compression reduces the redundancy of images by exploiting their self-similarity and is an attractive method for image compression because of its high compression ratio, fast decompression, and resolution independence [1]. It is particularly suitable for applications that encode once and decode many times [2]. Fractal coding is widely regarded as one of the three most promising codec methods [3]. However, its major drawback is that encoding is complex and time consuming, because the best-matching block must be searched for in a large pool of domain blocks, and this seriously hinders the practical application of fractal image coding. Fisher classified the image blocks (range blocks and domain blocks) [1]. Each image block was divided into four quadrants, and the average and the variance were computed for each quadrant; from these values, 72 classes were constructed. A range block searches for matches only among the domain blocks of the same class, which reduces the search space efficiently [3]. However, the classification process requires a large amount of computation, and some range blocks may not find a matching block within the same class of the domain pool [4]. With the development of fractal image compression, the fractal coding method has been applied to video sequence compression [2, 5], for instance, in the well-known hybrid circular prediction mapping (CPM) and noncontractive interframe mapping (NCIM) [6]. CPM/NCIM combines the fractal coding algorithm with the well-known motion estimation and compensation (ME/MC) algorithm, which exploits the high temporal correlations between adjacent frames. In CPM and NCIM, each range block is motion compensated by a domain block in the previous frame that is of the same size as the range block, even though the domain block is always larger than the range block in conventional fractal image codecs.
The main difference between CPM and NCIM is that CPM should be contractive for the iterative decoding process to converge, while NCIM need not be contractive since the decoding depends on the already decoded frames and is noniterative [7].

Recently, Wang et al. [8] proposed a hybrid fractal video compression algorithm that merges the advantages of a cube-based fractal compression method and a frame-based fractal compression method; in addition, an adaptive partition is discussed in place of a fixed-size partition. The adaptive partition [3] and the hybrid compression algorithm exhibit relatively high compression ratios for images [3] and video conference sequences [8]. In summary, a fractal image codec offers a very fast decoding process as well as the promise of potentially good compression [4, 9–12]. At present, however, no fractal codec has been standardized because of its huge computational load and slow coding speed. To alleviate these difficulties, this paper proposes a novel fractal video coding algorithm that uses fast block-matching motion estimation [13, 14] to improve the encoding speed and the compression quality.

Block-matching motion estimation is a vital process in many motion-compensated video coding standards. Motion estimation can be very computationally intensive and can consume up to 60–80% of the computational power of the encoding process [15], so research on efficient and fast motion estimation algorithms is significant. Block-matching algorithms (BMAs) are widely used because they are simple and easy to apply. Over the last two decades, many block-matching algorithms have been proposed to alleviate the heavy computation of the brute-force full search algorithm (FS), which has the best prediction accuracy; examples include the new three-step search (NTSS) [16], the four-step search (4SS) [17], the block-based gradient descent search (BBGDS) [18], the diamond search (DS) [19], and the cross-diamond search (CDS) [20].

In real-world video sequences, more than 80% of the blocks can be regarded as stationary (MV = (0,0)) or quasistationary (MV = (±1,0) or (0,±1)), and most of the motion vectors fall within the central 5 × 5 pixel area of the search window. The three-step search (TSS), NTSS, and BBGDS employ rectangular search patterns of different sizes to fit the center-biased motion vector distribution characteristics [21, 22]. Hexagon-based search employs a hexagon-shaped pattern and results in fewer search points with similar distortion [23, 24]. In this paper, a novel fast block-matching algorithm called the New Cross-Hexagon Search (NHEXS) algorithm is proposed [14]. It uses small cross-shaped search patterns in the first two steps, before the hexagon-based search, together with the proposed halfway stop technique [13], which yields a higher motion estimation speed for stationary and quasistationary blocks. Traditional algorithms use all the pixels of the block to calculate the distortion, which results in heavy computation. We propose to use the modified partial distortion criterion (MPDC) [25], which uses only selected pixels of the block and thereby reduces the computation while giving similar distortion.

The rest of the paper is organized as follows. The New Cross-Hexagon Search (NHEXS) algorithm is described in Section 2. The proposed improvements to fractal video sequence coding are presented in Section 3. The experimental results are presented in Section 4. Finally, the conclusions are outlined in Section 5.

2. New Cross-Hexagon Search Algorithm

2.1. Cross-Center-Biased Motion Vector Distribution

By observing the motion vector probabilities of different video sequences, we find that most real-world video sequences have center-biased MV distribution characteristics. The motion-vector probability (MVP) can be summarized as follows. (1) The global optimum distribution is square-center-biased (SCB) within ±2 pixels, concentrating especially at the zero motion vector (ZMV) (0,0); (2) the MVP usually decreases with distance from the ZMV; (3) optimal MVs along the vertical and horizontal directions are often more probable than those at other locations with the same radius, which is regarded as a cross-center-biased (CCB) MVP distribution. For example, about 15.71% and 7.94% of MVs are found in the vertical and horizontal directions at a radius of 1 pixel from the ZMV. This probability is much higher than that of the diagonal positions, which together contribute about 2.76% at the same radius.

The results also show that the cross-center MV distribution is dominant within this radius. For instance, 71.85% of MVs are located in the central 2 × 2 pixel area, and about 68.98% of MVs are located in the cross-center area. In the 4 × 4 pixel area, the total MVP is 81.75%, and the cross-center probability within this area is 74.71%. Because the distribution is so highly cross-biased, the search pattern of a BMA should match the cross-center shape to minimize the number of search points while maintaining a similar distortion error.

2.2. New Cross-Hexagon Search (NHEXS) Algorithm

The new cross-hexagon search consists of two kinds of patterns: cross-based and hexagon-based. Because the motion vector distribution is cross-center-biased (74.76%) in the central 4 × 4 pixel area, two cross-shaped patterns, the small-cross-shaped pattern (SCSP) and the large-cross-shaped pattern (LCSP), shown in Figure 1(a), are proposed as the first two steps preceding the hexagon-based search.

There are two sizes of hexagon search pattern: large and small. The large hexagon pattern used in this paper consists of not only the 7 check points of the classic large hexagon pattern but also the two edge points (up and down), as shown in Figure 1(b). Therefore, the new large hexagon pattern consists of 9 search points, which yields a distinct search speed gain without increasing the computational complexity of the large hexagon search algorithm, as shown in Figure 1(c).

From the simulation results on video sequences, we found that nearly 70% of the blocks can be regarded as stationary or quasistationary. Because most real-world sequences have this highly cross-biased property, we take small cross-shaped patterns as the first two steps of NHEXS, which reduces the number of search points for stationary or quasistationary blocks. Then, we search the remaining points of the LCSP and the SCB, which gives a much more precise direction for the subsequent HEXS.

The NHEXS algorithm is summarized as follows.

Step 1 (SCSP). A minimum block difference (MBD) is found from the 5 search points of the SCSP located at the center of the search window. If the MBD point occurs at the center of the SCSP, the search stops. Otherwise go to Step 2.

Step 2 (SCSP). A new SCSP is formed by using the MBD vertex of the first SCSP as the center. If the MBD point occurs at the center of this SCSP, the search stops. Otherwise go to Step 3.

Step 3 (LCSP). The three unchecked outermost search points of the LCSP and the two unchecked points of the SCB (radius = ±2) are checked. The step tries to guide the possible correct direction for the HEXS.

Step 4 (LHEXSP). A new large hexagon search pattern (LHEXSP) is formed by taking the MBD point found in the previous step as its center. If the new MBD point is still at the center of the newly formed LHEXSP, go to Step 5; otherwise, repeat this step.

Step 5 (SHEXSP). Switch the search pattern from the large hexagon to the small hexagon (SHEXSP). The four points covered by the small hexagon are evaluated and compared with the current MBD point. The new MBD point is the final motion vector.

A typical example is shown in Figure 2.
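
Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Figures 1 and 2 are not reproduced in the text, so the pattern coordinates below (a horizontal large hexagon with added up and down points, and Step 3 checking the large-cross tips plus the diagonal neighbours of the window center) are inferred shapes. The names `sad` and `nhexs`, the use of NumPy, the full-block SAD as the distortion measure, and the tie-breaking rule (the earliest checked point wins) are likewise assumptions.

```python
import numpy as np

# Assumed pattern offsets (dx, dy); the exact layouts of Figure 1 are not in the text.
SCSP = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]        # small cross
LCSP_OUTER = [(2, 0), (-2, 0), (0, 2), (0, -2)]          # tips of the large cross
SCB_CORNERS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]       # diagonal neighbours of the centre
LHEXSP = [(0, 0), (2, 0), (-2, 0), (1, 2), (-1, 2), (1, -2), (-1, -2),
          (0, 2), (0, -2)]                               # 7-point hexagon + up/down points
SHEXSP = [(1, 0), (-1, 0), (0, 1), (0, -1)]              # small hexagon


def sad(cur, ref, bx, by, dx, dy, n=16):
    """Sum of absolute differences between a block and a displaced reference block."""
    a = cur[by:by + n, bx:bx + n].astype(np.int64)
    b = ref[by + dy:by + dy + n, bx + dx:bx + dx + n].astype(np.int64)
    return int(np.abs(a - b).sum())


def nhexs(cur, ref, bx, by, n=16, max_disp=7):
    """Return (motion_vector, number_of_search_points) for the block at (bx, by)."""
    h, w = ref.shape
    cost = {}  # visited displacement -> SAD, so no point is evaluated twice

    def check(points):
        for dx, dy in points:
            if (dx, dy) in cost or max(abs(dx), abs(dy)) > max_disp:
                continue
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost[(dx, dy)] = sad(cur, ref, bx, by, dx, dy, n)

    def best():
        # Ties go to the earliest checked point (dicts keep insertion order).
        return min(cost, key=cost.get)

    def around(center, pattern):
        return [(center[0] + dx, center[1] + dy) for dx, dy in pattern]

    # Step 1: small cross at the window centre; halfway stop if the centre wins.
    check(SCSP)
    c = best()
    if c == (0, 0):
        return c, len(cost)
    # Step 2: small cross around the Step 1 winner; halfway stop if it still wins.
    check(around(c, SCSP))
    if best() == c:
        return c, len(cost)
    # Step 3: unchecked large-cross tips and unchecked diagonal points (assumed layout).
    check(LCSP_OUTER + SCB_CORNERS)
    c = best()
    # Step 4: move the large hexagon until its centre is the minimum.
    while True:
        check(around(c, LHEXSP))
        nxt = best()
        if nxt == c:
            break
        c = nxt
    # Step 5: refine with the small hexagon.
    check(around(c, SHEXSP))
    return best(), len(cost)
```

With this layout a stationary block terminates after the 5 points of the first SCSP, which agrees with the search-point count quoted for stationary blocks in Section 2.3.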

2.3. Analysis of NHEXS

Compared with the current HEXS [23] and the cross-diamond algorithm [20], NHEXS reduces the number of search points and increases the search speed, especially for (quasi)stationary blocks (|MV| ≤ 1). For stationary blocks, HEXS and the current cross-diamond algorithm take 13 and 9 search points, respectively, while NHEXS takes 5 search points. For quasistationary blocks, HEXS and the current cross-diamond algorithm take 13 and 11 search points, respectively, while NHEXS takes 7 search points.

2.4. Modified Partial Distortion Criterion (MPDC)

BMAs usually use all of the pixels in the block to calculate the distortion, which causes heavy computation [25]. In practice, using only a subset of the pixels in the block leads to similar results.

The block size is 16 × 16 pixels, and the top left corner coordinates of the blocks in frame $n$ and frame $n-1$ are $(m,n)$ and $(m+p,n+q)$, respectively. The sum of absolute differences (SAD) between the blocks in frame $n$ and frame $n-1$ is

$$\mathrm{SAD}(m,n;p,q)=\sum_{i=0}^{15}\sum_{j=0}^{15}\left|f_n(m+i,n+j)-f_{n-1}(m+p+i,n+q+j)\right|, \tag{1}$$

where $f_n(m+i,n+j)$ is the grayscale of point $(m+i,n+j)$ in frame $n$. $\mathrm{SAD}(m,n;p,q)$ is divided into 16 partial distortion criteria $\mathrm{sad}_k(m,n;p,q)$. The $k$th partial distortion criterion is defined as

$$\mathrm{sad}_k(m,n;p,q)=\sum_{i=0}^{3}\sum_{j=0}^{3}\left|f_n(m+4i+s_k,n+4j+t_k)-f_{n-1}(m+p+4i+s_k,n+q+4j+t_k)\right|, \tag{2}$$

where $s_k$ and $t_k$ are, respectively, the horizontal and vertical distances between the top left pixel used by $\mathrm{sad}_k(m,n;p,q)$ and the top left corner of the block. The accumulated partial distortion $\mathrm{SAD}_k(m,n;p,q)$ is defined as follows:

$$\mathrm{SAD}_k(m,n;p,q)=\sum_{i=1}^{k}\mathrm{sad}_i(m,n;p,q). \tag{3}$$

The calculation order of $\mathrm{sad}_k(m,n;p,q)$ ($k=1,2,\ldots,16$) follows the numbers in Figure 3, so that the pixels used are distributed evenly over the block. If $\mathrm{SAD}_k(m,n;p,q)$ uses too few pixels, it cannot replace the SAD correctly. Extensive simulations show that the misjudgment rate is below 5% if $k\geq 3$.
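
The partial distortion of Eqs. (2) and (3) can be illustrated with the short Python sketch below. It is only a sketch: the visiting order `OFFSETS` of the 16 offsets $(s_k,t_k)$ is an assumed interleaved order, since the actual order is given by Figure 3, which is not reproduced in the text. The function names and the convention that the first coordinate indexes the first array axis are also assumptions.

```python
import numpy as np

# Assumed order of the 16 sub-sampling offsets (s_k, t_k); the paper takes the
# real order from Figure 3. Each offset selects 16 pixels on a 4-strided grid.
OFFSETS = [(0, 0), (2, 2), (2, 0), (0, 2), (1, 1), (3, 3), (3, 1), (1, 3),
           (1, 0), (3, 2), (3, 0), (1, 2), (0, 1), (2, 3), (2, 1), (0, 3)]


def partial_sad(cur, ref, m, n, p, q, k):
    """sad_k of Eq. (2): absolute differences over 16 pixels with offset (s_k, t_k)."""
    s, t = OFFSETS[k - 1]
    a = cur[m + s:m + 16:4, n + t:n + 16:4].astype(np.int64)
    b = ref[m + p + s:m + p + 16:4, n + q + t:n + q + 16:4].astype(np.int64)
    return int(np.abs(a - b).sum())


def accumulated_sad(cur, ref, m, n, p, q, k):
    """SAD_k of Eq. (3): sum of the first k partial distortions."""
    return sum(partial_sad(cur, ref, m, n, p, q, i) for i in range(1, k + 1))
```

Because the 16 offsets tile the block exactly once, `accumulated_sad(..., k=16)` equals the full SAD of Eq. (1), while `k=3` touches only 48 of the 256 pixels.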

3. A New Fractal Video Coding Method

3.1. Video Sequence Coding by CPM/NCIM

CPM and NCIM combine fractal video sequence coding with the well-known motion estimation/motion compensation (ME/MC) technique, which exploits the high temporal correlations between adjacent frames. In both CPM and NCIM [6], each range block is motion compensated by a domain block in the previous frame that is of the same size as the range block, even though the domain block is always larger than the range block in conventional fractal coders [2, 5]. The main difference between CPM and NCIM is that CPM should be contractive for the iterative decoding process to converge, while NCIM need not be contractive, since its decoding depends on the already decoded frame and is noniterative. The first $n$ frames of the video sequence (two or more) are treated as a coding group and are encoded by applying CPM: each frame is predicted blockwise from the $n$-circularly previous frame. In other words, the $k$th frame $F_k$ is partitioned into range blocks, and each range block $R$ in $F_k$ is approximated by a domain block $D$ in $F_{[k-1]_n}$, where $[k-1]_n$ denotes $k-1$ modulo $n$. The remaining frames are encoded by employing NCIM. The structure of NCIM is the same as that of an interframe mapping, which forms CPM, except that there is no constraint on the contrast scaling coefficients $s$. Since a moving image sequence exhibits high temporal correlations, this domain-range mapping becomes more effective if the size of the domain block is the same as that of the range block. In such a case, the domain-range mapping can be interpreted as a kind of motion compensation. In this context, the main advantage of the proposed domain-range mapping is that, in real moving image sequences, small motion vectors are more probable than larger ones. Therefore, the search region for the motion vectors can be localized in the area near the location of the range block. In the decoder, the first $n$ frames are reconstructed by applying CPM iteratively.
Then, the remaining frames can be reconstructed by applying NCIM to the previous reconstructed frame without requiring iteration as shown in Figure 4, because the NCIM is not a contractive mapping. The first 𝑛 frames encoded by CPM are the minimal decodable set of all the frames [7], in that they can be decoded without reference to other frames. Therefore, only CPM affects the convergence of the total fractal mapping and that is the reason why NCIM need not be contractive.
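
The prediction structure described above can be made concrete with a small Python sketch. It assumes 0-based frame indices and a hypothetical helper name `reference_frame`; neither comes from the paper.

```python
def reference_frame(k, n):
    """Index of the frame that predicts frame k (0-based) when the first n frames use CPM."""
    if k < n:
        # CPM: circular prediction inside the coding group, [k-1] modulo n.
        return (k - 1) % n
    # NCIM: ordinary prediction from the previously reconstructed frame.
    return k - 1
```

For n = 4, frame 0 is predicted from frame 3, which closes the loop and is why the CPM frames must be decoded iteratively, whereas frame 4 and all later frames depend only on frames that are already reconstructed.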

3.2. Using Homo-I-Frame in H.264

The original reference frame (homo-I-frame in H.264 [26]) has a great impact on the compression ratio and the decoded image quality. In CPM/NCIM, the original reference frames are coded by using CPM, as shown in Figure 5, and there may be several original reference frames; their number is set to four in this paper.

However, in CPM the coding process involves complex block classification, block flipping, and iteration in order to make the decoded frames converge to the original frames, so the compression performance falls short of the requirements. Therefore, a DCT-based method, which has worked effectively in the JPEG image compression standard, is used to code the original reference frame in this paper [8].

3.3. Macroblock Partition

Macroblock partition has a large impact on the calculation speed and complexity of a video compression algorithm. In CPM/NCIM, a frame is partitioned by a quadtree-based partition, and iteration is used in the matching process, which results in a high computational complexity [6]. In this paper, a macroblock partition scheme similar to that of H.264 is used. A frame is partitioned into fixed-size macroblocks (generally 16 × 16 pixels), and then each macroblock is partitioned as shown in Figure 6.

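The matching rule in fractal image coding is the RMS:

$$\mathrm{RMS}=\frac{1}{N}\left[\sum_{i=1}^{N}r_i^2+s\left(s\sum_{i=1}^{N}d_i^2-2\sum_{i=1}^{N}r_id_i+2o\sum_{i=1}^{N}d_i\right)+o\left(No-2\sum_{i=1}^{N}r_i\right)\right], \tag{4}$$

$$s=\frac{N\sum_{i=1}^{N}r_id_i-\sum_{i=1}^{N}r_i\sum_{i=1}^{N}d_i}{N\sum_{i=1}^{N}d_i^2-\left(\sum_{i=1}^{N}d_i\right)^2},\qquad o=\frac{1}{N}\left[\sum_{i=1}^{N}r_i-s\sum_{i=1}^{N}d_i\right], \tag{5}$$

where $r_i$ is the pixel value of the range block ($R$), $d_i$ is the pixel value of the domain block ($D$), $N$ is the number of pixels in the block, $s$ is the scale factor, and $o$ is the offset factor.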
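
As a numerical check of Eqs. (4) and (5), the following Python sketch computes the least-squares scale and offset and evaluates the expanded matching error. The function names are hypothetical, and the fallback $s=0$ for a flat domain block (zero denominator) is a common convention assumed here rather than something stated in the paper.

```python
import numpy as np


def fit_scale_offset(r, d):
    """Least-squares scale s and offset o of Eq. (5) for range r and domain d."""
    r = np.asarray(r, dtype=np.float64).ravel()
    d = np.asarray(d, dtype=np.float64).ravel()
    n = r.size
    denom = n * np.sum(d * d) - np.sum(d) ** 2
    if denom == 0:
        s = 0.0  # flat domain block: assumed fallback, o becomes the mean of r
    else:
        s = (n * np.sum(r * d) - np.sum(r) * np.sum(d)) / denom
    o = (np.sum(r) - s * np.sum(d)) / n
    return s, o


def rms_eq4(r, d, s, o):
    """Matching error of Eq. (4), written in the same expanded form."""
    r = np.asarray(r, dtype=np.float64).ravel()
    d = np.asarray(d, dtype=np.float64).ravel()
    n = r.size
    return (np.sum(r * r)
            + s * (s * np.sum(d * d) - 2 * np.sum(r * d) + 2 * o * np.sum(d))
            + o * (n * o - 2 * np.sum(r))) / n
```

The expanded form equals the mean of $(s d_i + o - r_i)^2$; its advantage is that the sums over $d_i$ and $d_i^2$ can be computed once per domain block and reused for every range block.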

The steps of macroblock partition are given as follows.

(1) Match each candidate block (whose size is equal to that of the current encoding block) with the current macroblock and calculate the RMS in mode 1. If the minimum RMS is less than the defined threshold 𝛾, save the IFS parameters (𝑠, 𝑜, and the domain block position, that is, the motion vector) and encode the next macroblock. Otherwise go to the next step.

(2) Calculate the RMS in mode 2 or 3. If the RMS is less than 𝛾 in mode 2 or 3, save the IFS parameters. Otherwise go to the next step.

(3) Calculate the RMS in mode 4. If the RMS is less than 𝛾, save the IFS parameters. Otherwise, each subblock in mode 4 is partitioned further until a matching block is found, as shown in Figure 7.
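
The mode decision above can be sketched in Python as follows. Several points are assumptions rather than details from the paper: the mode layout (mode 1 is the whole block, modes 2 and 3 are two horizontal or two vertical halves, mode 4 is four quadrants, by analogy with H.264, since Figure 6 is not reproduced here), the hypothetical callback `match(x, y, w, h)` that returns the best `(rms, params)` for one block, and the rule that a block at the minimum size is accepted even if its RMS exceeds the threshold.

```python
def partition_macroblock(match, gamma, x=0, y=0, size=16, min_size=4):
    """Return a list of (x, y, w, h, params) blocks that cover the macroblock.

    `match` is a hypothetical callback returning (rms, params) for one block.
    """
    # Mode 1: try the whole block.
    rms, params = match(x, y, size, size)
    if rms < gamma:
        return [(x, y, size, size, params)]
    half = size // 2
    # Modes 2 and 3 (assumed layout): two horizontal halves, then two vertical halves.
    for parts in ([(x, y, size, half), (x, y + half, size, half)],
                  [(x, y, half, size), (x + half, y, half, size)]):
        results = [match(*p) for p in parts]
        if all(r < gamma for r, _ in results):
            return [p + (prm,) for p, (_, prm) in zip(parts, results)]
    # Mode 4: four quadrants; each one is partitioned further if it can be.
    out = []
    for qx, qy in ((x, y), (x + half, y), (x, y + half), (x + half, y + half)):
        if half > min_size:
            out.extend(partition_macroblock(match, gamma, qx, qy, half, min_size))
        else:
            # Smallest allowed block: accept the best match found (assumption).
            _, params = match(qx, qy, half, half)
            out.append((qx, qy, half, half, params))
    return out
```

A smaller threshold pushes more macroblocks into the finer modes, which improves the match quality at the cost of more IFS parameters per macroblock.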

The percentages of the 4 modes in two video sequences are shown in Table 1.

3.4. Encoding of IFS Parameters

The following parameters of each range block need to be encoded for subsequent transmission or storage: the scale factor 𝑠 and offset intensity 𝑜; translation vector (expressed in terms of the relative position of 𝐷 with respect to 𝑅) and affine transformation reduced to four rotations and four mirror reflections. Since 𝑠 and 𝑜 have, in general, nonuniform distributions, entropy coding usually proves to be beneficial.

We chose 5-bit quantization of 𝑠 and 7-bit quantization of 𝑜, reported in the literature to give good performance [1, 9], followed by Huffman coding. To ensure that both the encoder and the decoder use the same 𝑠 and 𝑜 values, we quantize both during the minimization of the dissimilarity measure (4), that is, prior to each evaluation of the RMS. We encode the motion vector as a relative position of 𝐷 with respect to 𝑅 using fixed-length codewords (determined by the search area ±7 pixels in both horizontal and vertical directions). We use three-bit codewords for the eight possible rotations/reflections.
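
The bit budget implied by these choices can be illustrated with a short Python sketch of a uniform quantizer and the fixed-length motion vector codeword. The quantizer form, the example value ranges for 𝑠 and 𝑜 used in the usage note, and the function names are assumptions; the paper does not specify them.

```python
import math


def quantize(value, lo, hi, bits):
    """Uniform quantizer on [lo, hi]: return (index, reconstructed value)."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    index = round((clipped - lo) / (hi - lo) * levels)
    return index, lo + index * (hi - lo) / levels


def mv_bits(max_disp):
    """Fixed-length bits per motion vector component for displacements in [-max_disp, max_disp]."""
    return math.ceil(math.log2(2 * max_disp + 1))


# Raw bits per range block before Huffman coding of s and o:
# 5 (scale) + 7 (offset) + two motion vector components + 3 (rotation/reflection).
RAW_BITS = 5 + 7 + 2 * mv_bits(7) + 3
```

For example, assuming 𝑠 is limited to [−1, 1], `quantize(0.5, -1.0, 1.0, 5)` returns index 23, and the encoder would evaluate the RMS with the reconstructed value rather than 0.5, so that the encoder and the decoder stay in step. With a search area of ±7 pixels each motion vector component needs 4 bits, which gives 23 raw bits per range block.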

4. Experimental Results

4.1. The New Cross-Hexagon Search Algorithm

The proposed NHEXS algorithm is simulated on popular video sequences (the first 70 frames of each sequence) with a size of 352 × 288 pixels. The macroblock is 16 × 16 pixels, and the maximum displacement in the search area is ±7 pixels in both the horizontal and vertical directions. The experiments were run on a PC (CPU: Intel Core 2 Duo E6300, 1.86 GHz; RAM: 2 GB DDR2). The algorithm is implemented in Visual C++ 6.0.

The average search times per frame under the total distortion criterion and under the modified partial distortion criterion (MPDC) are summarized in Table 2 for each sequence. The numbers of search points and the average PSNR values are summarized in Tables 3 and 4, respectively, for different algorithms including NTSS, DS, CDS, HEXS, and NHEXS.

Table 3 shows that the proposed NHEXS algorithm always uses the smallest number of search points among the fast BMAs compared. The average numbers of search points per block rank as NHEXS < CDS < HEXS < DS < NTSS. In particular, for sequences whose motion vectors are confined to a small region around (0,0) and whose background changes little, such as Claire and Hall, NHEXS saves 45% and 28% of the search points relative to HEXS and CDS, respectively.

Table 4 shows that the average PSNR values of NHEXS decline only slightly compared with CDS and HEXS (by about 0–1.4%); for sequences whose background changes little, NHEXS has almost the same average PSNR value as CDS and HEXS.
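
For reference, the PSNR figures discussed here follow the usual definition, which can be sketched in Python as below. The paper does not spell out the formula, so the standard form with a peak value of 255 for 8-bit grayscale frames is assumed, and the function name is hypothetical.

```python
import numpy as np


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal size."""
    err = original.astype(np.float64) - decoded.astype(np.float64)
    mse = np.mean(err ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```

On this scale a mean squared error of 1 corresponds to about 48.13 dB.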

4.2. The New Fractal Video Coding

From the above, we can conclude that NHEXS is robust across all kinds of sequences: it saves search time effectively and produces almost the same PSNR values as the popular fast BMAs.

To evaluate the coding performance of the proposed algorithm, a simulation has been conducted to compare the performance of different encoding algorithms. We use 15 frames of the sequence in the simulation. The maximum and minimum partition block sizes are 16 × 16 pixels and 4 × 4 pixels, respectively. Search time and the PSNR value per frame for a sequence with medium motion, such as “foreman” (352 × 288 pixels, the first 15 frames), are plotted in Figure 8.

Figure 8 shows that the NHEXS method proposed in this paper makes encoding 5.3 times faster than the full search method, while the quality of the decoded video images is almost the same. The hexagon method, however, shows a substantial drop in PSNR.

To demonstrate the benefit of our method, the traditional CPM/NCIM method, with the number of CPM frames set to 4, is also implemented.

The sequence was also compressed using the H.264/AVC reference software, version JM14.2, available for download at [26]. The sequence was encoded using the baseline profile to enable P-pictures and context adaptive variable length coding (CAVLC) for coding efficiency. Each frame was divided into a fixed number of slices, where each slice consisted of a full row of macroblocks. A fixed quantization parameter (QP=28) was carefully selected for the sequence so as to ensure high visual quality. The sequence was visually inspected in order to check whether the chosen QPs minimized the blocking artifacts induced by lossy coding. Table 5 illustrates the parameters used to generate the compressed bitstreams.

The comparison of the average coding results for five video sequences is shown in Table 6. The results indicate that, in comparison with CPM/NCIM, the proposed scheme raises the compression ratio 4 times, speeds up compression 10 times, and improves the image quality by 3 to 5 dB. In addition, its PSNR and compression ratio are lower than, but close to, those of H.264, and its compression speed is better than that of H.264.

The comparison is shown in Figure 9 for fifteen frames of “foreman.cif”. As the frame number increases, the PSNR, that is, the quality of the decoded image, decreases because of accumulated error. This could be resolved by inserting I-frames, as in H.264.

5. Conclusions

In this paper, we first exploit the CCB characteristics of the MV distribution. Based on the CCB behavior of most real-world video sequences, we develop a novel fast algorithm that uses a cross-hexagon search pattern in block motion estimation and demonstrates a significant speedup over the diamond-based search and other fast search methods while maintaining similar distortion performance. In addition, the distortion criterion is optimized, which reduces the computational complexity without greatly degrading the block distortion measure.

Secondly, a fast fractal video coding method using this new cross-hexagon block-matching motion estimation is presented to reduce the encoding time and improve the encoding quality. In comparison with the CPM/NCIM method, the proposed algorithm spends less encoding time and achieves a higher compression ratio and better compression quality.