Abstract

Content-based image retrieval is a branch of computer vision and is important for the efficient management of visual databases. In most cases, image retrieval is based on image compression. In this paper, we use a fractal dictionary to encode images and, based on this technique, propose a set of statistical indices for efficient image retrieval. Experimental results on a database of 416 texture images indicate that the proposed method provides a competitive retrieval rate compared with existing methods.

1. Introduction

With the popularity of computers and the rapid development of multimedia technology, the volume of image information is growing rapidly. How to manage these resources effectively has become a focus of many researchers. Early image retrieval technology was based on text and relied on manual work, so it could hardly meet users’ needs. Subsequently, the concept of content-based image retrieval (CBIR) was proposed, in which an image management system analyzes images and extracts features automatically. Nowadays, CBIR is realized with many techniques, including fractal-based image retrieval.

Fractal image coding is based on approximating an image by an attractor of a set of affine transformations [3]. To a certain extent, fractal codes reflect spatial relationships between regions and can therefore describe image content. Sloan [4] first proposed using fractal codes for CBIR. Zhang et al. [5] presented an approach to texture-based image retrieval that determines image similarity on the basis of matching fractal codes. Pi et al. [3] proposed four statistical indices utilizing histograms of range block mean and contrast scaling parameters. Huang et al. [2] used a new statistical method based on kernel density estimation.

All of the aforementioned retrieval indices are based on techniques similar to traditional fractal coding and are generated from image self-similarity. In this paper, we propose a retrieval method that treats a shared dictionary as a medium and uses the similarity between images and the dictionary as retrieval indices. All the data obtained in the experiments reflect the differences between the query image and the dictionary. The remainder of the paper is organized as follows. Section 2 introduces fractal image coding based on the fractal dictionary. The proposed indices and retrieval method are described in Section 3. Experimental results are reported in Section 4, which is followed by the conclusion.

2. Fractal Image Coding Based On Set

Images can be viewed as vectors and can be encoded by a set of transforms. Usually, the transform is generated using the collage theorem [6]. In this theorem, a suitable transform is constructed as a “collage,” and the “collage error” is the distance between the collage and the image. In traditional fractal encoding, devised by Jacquin, the image is tiled by “range blocks,” each of which is mapped from one of the “domain blocks” as depicted in Figure 1; the combined mappings constitute a transform on the image as a whole [6].

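Encoding an image requires, for each range block, a suitable transform that minimizes the collage error, and the block mapping parameters, such as the contrast scaling $s$ and the luminance offset $o$, are recorded. These parameters are applied in (1) for image reconstruction. In (1), $f^{(k)}$ is defined as the image at the $k$th iteration. The initial image $f^{(0)}$ can be an arbitrary image of the same size as the encoded image. Consider

$$R^{(k)} = s \cdot D^{(k-1)} + o \cdot U, \tag{1}$$

where $R^{(k)}$ is a range block of $f^{(k)}$, $D^{(k-1)}$ is its matched domain block taken from $f^{(k-1)}$ after affine transformation, and $U$ is a matrix whose elements are all ones.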

2.1. Block Truncation Coding

Delp and Mitchell [7] presented a block truncation coding (BTC) scheme for image compression. It uses a two-level nonparametric quantizer that adapts to local properties of the image. In this scheme, an original image is first divided into nonoverlapping square blocks of size $n \times n$. For each pixel $p_{ij}$ in a block, if its value is greater than the block mean ($\bar{p}$), the pixel is marked as 1, otherwise as 0, forming a two-value matrix consisting of 1 and 0. The equation is defined as follows:

$$b_{ij} = \begin{cases} 1, & p_{ij} > \bar{p}, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$

Then the matrix is reshaped into a row vector to generate a binary sequence. Finally, the corresponding decimal number, defined as the BTC value, is recorded.

Figure 2 shows a block after an original image is divided. Its mean is 241.875, and nine pixels are greater than the mean.

The two-value matrix is shown in Figure 3. In the matrix, these nine pixels are marked as 1 and the others as 0. The binary sequence is 1010 1100 1111 1000, so the BTC value is 44280.
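The BTC value computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `btc_value` is hypothetical, and the 4 × 4 block below is not the pixel data of Figure 2 (which is not reproduced in the text) but a made-up block chosen to have mean 241.875 and the bit pattern 1010 1100 1111 1000.

```python
def btc_value(block):
    """Return the BTC value of a square block given as a list of rows."""
    pixels = [p for row in block for p in row]   # reshape to a row vector
    mean = sum(pixels) / len(pixels)             # block mean
    # Mark each pixel as 1 if it is greater than the mean, otherwise 0.
    bits = "".join("1" if p > mean else "0" for p in pixels)
    return int(bits, 2)                          # binary sequence -> decimal


# Hypothetical block (assumption, see lead-in): nine pixels of 248, seven of 234.
block = [
    [248, 234, 248, 234],
    [248, 248, 234, 234],
    [248, 248, 248, 248],
    [248, 234, 234, 234],
]
print(btc_value(block))
```

Reading the two-value matrix row by row and interpreting it as an unsigned binary number is the only step that matters here; any block with the same above-mean pattern yields the same BTC value.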

2.2. Fractal Dictionary Based on M-J Set

In fractal decoding, the initial image and the final image have no direct relationship. Therefore, we compress a sufficient number of domain blocks into a file that serves as a dictionary. An image is encoded by finding the best-matching domain blocks in the dictionary and is decoded, as in the traditional decoding process, by applying affine transforms to these best-matching domain blocks.

The Mandelbrot set (abbreviated as M set) and the Julia set (abbreviated as J set) are classical sets in fractal study, and both contain abundant information. Each point in the M set corresponds to a different parameter for constructing a J set. J sets are self-similar and have infinitely fine structure, which is rich enough to represent an image. We therefore use them to build the dictionary [8]. The process is as follows.

Step 1. Choose parameters for a J set. We use a standard M set’s boundary points as generation parameters for J set.

Step 2. Generate the J sets. We use the above parameters to generate J sets. According to the escape-time algorithm [9], each point in a J set is represented by its escape time, which is bounded above by Max_Iterative, the maximum escape time.

Step 3. Quantize the image of the J set. The pixel values, assigned as escape times, are relatively small. It is better to multiply them by an expansion number H so that the pixel values are spread evenly between 0 and 255.

Step 4. Classify the domain blocks. The J set image from Step 3 is divided into nonoverlapping blocks. We calculate the BTC value of each block and use it as a classifier, keeping a queue of blocks for each BTC value. If a block is identical to one already in its BTC queue, or if the queue already has v blocks, we ignore the block. Otherwise, we compute the collage error between the current block and each block in the queue. If a collage error is less than a threshold, the block is ignored; otherwise, it is added to the queue.

Step 5. Output the dictionary. All the blocks are written into a file in ascending order of BTC value. We call this file the dictionary.
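The five steps can be sketched as follows. This is a rough illustration under several assumptions that the text does not fix: the iteration z → z² + c with escape radius 2 (the usual quadratic Julia set), the viewing window [−1.5, 1.5]², capping scaled pixels at 255, a root-mean-square block difference standing in for the collage error, and the values of `c`, `h`, `v`, and `threshold`. All function names are hypothetical.

```python
def escape_time(z, c, max_iterative, radius=2.0):
    """Iterations of z -> z*z + c before |z| exceeds radius (capped)."""
    for n in range(max_iterative):
        if abs(z) > radius:
            return n
        z = z * z + c
    return max_iterative


def julia_image(c, size, max_iterative, h):
    """Steps 2-3: escape-time image of a J set, scaled by the expansion number h."""
    img = []
    for row in range(size):
        y = -1.5 + 3.0 * row / (size - 1)
        line = []
        for col in range(size):
            x = -1.5 + 3.0 * col / (size - 1)
            t = escape_time(complex(x, y), c, max_iterative)
            line.append(min(255, t * h))   # assumption: cap at 255
        img.append(line)
    return img


def blocks(img, n):
    """Yield nonoverlapping n x n blocks as flat tuples."""
    for r in range(0, len(img) - n + 1, n):
        for c in range(0, len(img[0]) - n + 1, n):
            yield tuple(img[r + i][c + j] for i in range(n) for j in range(n))


def btc_value(flat):
    mean = sum(flat) / len(flat)
    return int("".join("1" if p > mean else "0" for p in flat), 2)


def block_distance(a, b):
    """Root-mean-square difference, a stand-in for the collage error."""
    return (sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)) ** 0.5


def build_dictionary(images, n=4, v=8, threshold=5.0):
    """Steps 4-5: classify blocks by BTC value and return them in ascending order."""
    queues = {}  # BTC value -> list of blocks
    for img in images:
        for blk in blocks(img, n):
            queue = queues.setdefault(btc_value(blk), [])
            if len(queue) >= v or blk in queue:
                continue                    # queue full or duplicate block
            if all(block_distance(blk, q) >= threshold for q in queue):
                queue.append(blk)           # sufficiently different: keep it
    return sorted(queues.items())           # ascending BTC value


c = complex(-0.75, 0.1)  # hypothetical parameter near the M set boundary
dictionary = build_dictionary([julia_image(c, 64, 50, 5)])
```

Each entry of `dictionary` pairs a BTC value with its queue of blocks, so a block is later addressed by its BTC value together with its position in that queue.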

As in traditional fractal encoding, an original image is first divided into nonoverlapping range blocks. For each range block, we first calculate its BTC value and then search the corresponding BTC queue for the best-matching block, that is, the one with the smallest norm. Finally, we get the parameters of each range block: the BTC value, the matching block number, the contrast scaling parameter ($s$), the luminance offset ($o$), and the affine transformation.

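In the decoding process, we use the pair (BTC value, matching block number) as an index to locate the best-matching block. Each range block of the original image can then be decoded as follows:

$$R = s \cdot D + o \cdot U, \tag{5}$$

where $R$ is the decoded range block, $D$ is the domain block in the dictionary after affine transformation, and $U$ is a matrix whose elements are all ones.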
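A minimal sketch of matching one range block against a BTC queue and decoding it again is given below. The text does not say how the contrast scaling and luminance offset are obtained; this sketch assumes the least-squares fit that is standard in fractal coding, uses the Euclidean norm for the error, and omits the affine transformation (only the identity is tried). The names `fit_s_o`, `encode_block`, and `decode_block` are hypothetical, and `queue` stands for the list of dictionary blocks that share the range block's BTC value.

```python
def fit_s_o(domain, rng):
    """Least-squares contrast scaling s and luminance offset o (assumed method)."""
    n = len(domain)
    sd, sr = sum(domain), sum(rng)
    sdd = sum(d * d for d in domain)
    sdr = sum(d * r for d, r in zip(domain, rng))
    denom = n * sdd - sd * sd
    s = 0.0 if denom == 0 else (n * sdr - sd * sr) / denom
    o = (sr - s * sd) / n
    return s, o


def collage_error(domain, rng, s, o):
    """Euclidean norm of the difference between rng and s*domain + o."""
    return sum((r - (s * d + o)) ** 2 for d, r in zip(domain, rng)) ** 0.5


def encode_block(rng, queue):
    """Return (block number, s, o, error) of the best match, or None if queue is empty."""
    best = None
    for number, domain in enumerate(queue):
        s, o = fit_s_o(domain, rng)
        err = collage_error(domain, rng, s, o)
        if best is None or err < best[3]:
            best = (number, s, o, err)
    return best


def decode_block(queue, number, s, o):
    """Rebuild a range block from its matched dictionary block."""
    return [s * d + o for d in queue[number]]


# Hypothetical 2x2 blocks stored as flat tuples.
queue = [(0, 0, 255, 255), (10, 20, 30, 40)]
range_block = [25, 45, 65, 85]
number, s, o, err = encode_block(range_block, queue)
decoded = decode_block(queue, number, s, o)
```

Because the dictionary is shared, decoding a block needs no iteration: the stored parameters are applied once to the dictionary block.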

3. The Proposed Indices

It has been demonstrated in the literature that a grayscale histogram (or color histogram) provides good indexing and retrieval performance while being computationally inexpensive [10]. However, it is still a coarse feature when applied in image retrieval systems. Existing work has shown that histograms of fractal coding parameters are effective for image retrieval [2]. In this paper, we use the following indices in the retrieval system.

3.1. Dictionary of Collage Error (DE)

Collage error is the real-valued distance between the range block and its best-matching domain block. The smaller it is, the closer the decoded image is to the original image. Consider

$$E = \lVert R - (s \cdot D + o \cdot U) \rVert. \tag{6}$$

In (6), $R$ is the range block, $D$ is its best-matching domain block after affine transformation, $s$ and $o$ are the contrast scaling and the luminance offset, and $U$ is a matrix whose elements are all ones. As (6) shows, the collage error also measures the distance between the original image and the dictionary, so its distribution can be used as a parameter to classify texture images.

We quantize the collage error to an integer bounded by K: it is rounded to the nearest integer when it is smaller than K and clipped to K when it is larger than K. In this paper, K is 13.
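The quantization and the resulting distribution can be sketched as follows. Two points are assumptions: the histogram is normalized so that images with different numbers of range blocks are comparable, and a rounded value of 0 is given its own bin, so the vector has K + 1 entries and may differ by one bin from the layout actually used in the experiments. The name `de_index` is hypothetical.

```python
def de_index(collage_errors, k=13):
    """Normalized histogram of collage errors quantized to the integers 0..k."""
    hist = [0] * (k + 1)
    for e in collage_errors:
        # Round to the nearest integer below k, clip everything else to k.
        q = k if e >= k else int(e + 0.5)
        hist[q] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical collage errors of four range blocks.
print(de_index([0.2, 0.6, 12.7, 40.0]))
```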

Figure 5 shows that similar texture images have almost the same distributions, whereas different texture images have different distributions. Hence, the distance between images of the same texture is smaller than the distance between images of different textures.

3.2. Dictionary of BTC (DB)

The domain blocks in the dictionary are classified into categories by BTC value, so the BTC value also serves as an index when we search for a best-matching block. The distribution of BTC values (DB) is therefore a feature of an image and can be used as a measure in the image retrieval system.

In this paper, DB is a quantized value ranging from 0 to 15. We calculate the DB of images in Figure 4.
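As an illustration, the DB of an image could be computed as below. The text does not state how a BTC value is mapped to the range 0 to 15; this sketch assumes 16-bit BTC values and uniform quantization (in effect, the four most significant bits), and it normalizes the histogram. The name `db_index` and its parameters are hypothetical.

```python
def db_index(btc_values, levels=16, bits=16):
    """Normalized histogram of BTC values uniformly quantized to `levels` bins."""
    hist = [0] * levels
    for b in btc_values:
        hist[b * levels // (1 << bits)] += 1   # assumed uniform quantization
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical BTC values of four range blocks.
print(db_index([0, 44280, 65535, 44280]))
```

Raising `levels` gives a finer but longer index vector.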

Figure 6 shows that the DBs of the four similar images, Figures 6(a)–6(d), are distributed similarly, while those of Figures 6(e)–6(h) are not. Based on this observation, we choose the DB as an image index. However, experimental results show that it is only a coarse parameter, as discussed in Section 4.

3.3. Joint of Dictionary of BTC and S (JDBS)

Schouten and De Zeeuw [11] have shown that the contrast scaling parameters (s) in fractal coding can be used for image retrieval. However, s alone is still a rough feature, just as DE and DB are. Combining BTC with s, we present a 2D joint histogram, whose characteristics are shown in Figure 7.

Note that each entry of the joint histogram is the number of range blocks whose BTC value and s fall into the corresponding pair of bins. The resulting JDBS is shown in Figure 8.

Figure 8 shows that the peaks of images with the same texture lie at roughly the same coordinates, whereas the peaks of images with different textures lie visibly far apart. We therefore expect JDBS to be more precise than the above indices for image retrieval.
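A 2D joint histogram of this kind can be sketched as follows. The bin layout is an assumption: 16 uniform bins for 16-bit BTC values and 4 uniform bins for s over a hypothetical range [−1, 1], with out-of-range s values clipped. The histogram is normalized and flattened row by row into a single index vector. The name `jdbs_index` and all its parameters are hypothetical.

```python
def jdbs_index(btc_values, s_values, btc_levels=16, s_levels=4,
               s_min=-1.0, s_max=1.0, bits=16):
    """Flattened, normalized joint histogram over (BTC bin, s bin)."""
    hist = [[0] * s_levels for _ in range(btc_levels)]
    for b, s in zip(btc_values, s_values):
        i = b * btc_levels // (1 << bits)            # BTC bin (assumed uniform)
        s = min(max(s, s_min), s_max)                # clip s to the assumed range
        j = min(int((s - s_min) / (s_max - s_min) * s_levels), s_levels - 1)
        hist[i][j] += 1
    total = sum(sum(row) for row in hist)
    flat = [h for row in hist for h in row]
    return [h / total for h in flat] if total else flat


# Hypothetical parameters of two range blocks.
index = jdbs_index([0, 65535], [-1.0, 1.0])
```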

3.4. Similarity Measurement

To measure the similarity between two images, we use the distance metric in (7), computed between the proposed indices of the query image and those of each candidate image. The distances corresponding to DE, DB, and JDBS for the images are listed in Table 1. The query image is image (a) in Figure 4.
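Comparing two index vectors and ranking candidates can be sketched as below. The specific metric of (7) is not recoverable from the text here, so this sketch uses a Minkowski distance with a parameter `p` (`p=1` gives the city-block distance, `p=2` the Euclidean distance) purely as a stand-in. The names `index_distance` and `rank_candidates` are hypothetical.

```python
def index_distance(h1, h2, p=2):
    """Minkowski distance between two index vectors (assumed stand-in metric)."""
    if len(h1) != len(h2):
        raise ValueError("index vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(h1, h2)) ** (1.0 / p)


def rank_candidates(query, candidates, p=2):
    """Return candidate names sorted from most to least similar to the query."""
    return sorted(candidates,
                  key=lambda name: index_distance(query, candidates[name], p))


# Hypothetical two-bin indices.
candidates = {"a": [0.5, 0.5], "b": [1.0, 0.0], "c": [0.6, 0.4]}
print(rank_candidates([0.5, 0.5], candidates))
```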

It can be observed that the corresponding indices are roughly close for similar texture images and different for different texture images. However, on DE, d((a), (c)) is bigger than both d((a), (f)) and d((a), (h)). In fact, (a) and (c) share a similar texture, while (f) and (h) have textures different from that of (a). This causes the unexpected result that images (f) and (h) are retrieved while image (c) is missed when the query image is (a). Hence, DE is only a coarse index.

Comparing DB and JDBS, the DB distance between similar texture images varies on the order of 0.1, whereas the JDBS distance varies on the order of 0.001; the distances between dissimilar texture images vary irregularly. JDBS is therefore the more accurate index.

3.5. The Operation Process

The whole operation process includes three parts: encoding the image, extracting statistical features, and comparing feature distances as shown in (7). The pseudocode is listed in Figure 9.

4. Performance Evaluation

In this section, we present the performance of the proposed indices and compare our method with methods from the literature. The image retrieval system is shown in Figure 10. The test database is composed of 26 grayscale Brodatz texture images [12], each of which is separated into 16 subimages, giving 416 subimages in total. Each subimage is encoded based on the M-J set fractal dictionary. A query image is randomly selected from the test database, and the sixteen subimages with the smallest distances are retrieved. In this paper, we use the length of the index vector to evaluate the computational complexity.

Unfortunately, all the proposed indices have some inherent flaws. Although JDBS does better than DB, its vector is longer than DB’s, so it is computationally more expensive than DB. In order to reduce the complexity, we divide a retrieving index into several parts. For instance, we can replace the JDBS with the method of . A list of candidate images is selected by matching . Also, we can combine two indices. For example, combining DE and DS into a 2D joint statistic, JDSE, enhances the performance. On the other hand, the computational complexity must be taken into consideration.

4.1. Average Retrieval Rate

Usually, the average retrieval rate (ARR), defined as follows, is used to evaluate a technique’s performance:

$$\mathrm{ARR} = \frac{1}{Z} \sum_{z=1}^{Z} \frac{m_z}{F}.$$

Note that $F$ denotes the number of retrieved images, $m_z$ denotes the number of correctly retrieved images at the $z$th test, and $Z$ is the number of subimages in the test database. In this case, $F = 16$ and $Z = 416$. All experiments shown in Table 2 were conducted on a Core(TM) i5 (2.40 GHz) PC, and all data shown in Table 2 were acquired from these experiments.
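The ARR is simply the mean, over all tests, of the fraction of retrieved images that are correct. A minimal sketch follows; the function name and the sample counts are hypothetical.

```python
def average_retrieval_rate(correct_counts, f):
    """Mean fraction of correctly retrieved images over all tests.

    correct_counts[z] is the number of correct images among the f images
    retrieved at the z-th test.
    """
    z = len(correct_counts)
    return sum(m / f for m in correct_counts) / z


# Hypothetical results of four queries, 16 images retrieved per query.
print(average_retrieval_rate([16, 8, 12, 4], 16))
```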

Compared to HM and KM, DB technique has a better performance. The average retrieval rate of DB (for 16 vector length) is 60.86%, while the HM and the KM provide 42.01% and 32.84% average retrieval rate at similar vector length. If the DB is quantized to 32, its performance goes up by 6.19%. At the same time, the performance of goes up to 70.12% average retrieval rate, which is 9.18% higher than that of and 31.97% higher than that of, while their vector lengths are all around 21.

Compared with , technique works well in average retrieval rate. When the DB is at 32 length, the total length of is at 84, which is not much longer than 68 vector length of and . What is more, because the operations are simple, the vector length does not affect the computational complexity much as long as it is not too long. The retrieval time is less than 0.3 s in the experiments, so the computational complexity is tolerable. When is at 68 and 84, the average retrieval rates reach 78.67% and 79.18%, respectively.

4.2. Precision against Recall Curve

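The precision against recall curve is another method of evaluating retrieval performance. The higher both precision and recall are, the better the technique performs. The precision and recall are defined as follows:

$$\text{precision} = \frac{|\text{retrieved} \cap \text{relevant}|}{|\text{retrieved}|}, \qquad \text{recall} = \frac{|\text{retrieved} \cap \text{relevant}|}{|\text{relevant}|}.$$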

Note that retrieved is a set of retrieved images for a query and relevant is a set of relevant images for the query images [13] (in this case, ). Based on top 2, 4, 6, 8, 10, 12, 14, and 16, we calculate the average precision and average recall of both and at 33 vector length.
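Precision and recall at a list of cutoffs can be computed as in the sketch below, which uses the standard set-based definitions. The function names and the sample data are hypothetical; the default cutoffs follow the top 2, 4, ..., 16 used above.

```python
def precision_recall(retrieved, relevant):
    """Standard precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def pr_curve(ranked, relevant, cutoffs=(2, 4, 6, 8, 10, 12, 14, 16)):
    """(precision, recall) pairs for the top-n results at each cutoff n."""
    return [precision_recall(ranked[:n], relevant) for n in cutoffs]


# Hypothetical ranking of four candidates, two of which are relevant.
ranked = ["a", "x", "b", "y"]
print(pr_curve(ranked, {"a", "b"}, cutoffs=(2, 4)))
```

Averaging these pairs over all queries gives the points of the averaged curve.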

The average precision (or recall) of is obviously higher than the average precision (or recall) of , when vector lengths are the same (see Figure 11). Besides, ’s slope varies slightly, and this implies that has a better performance.

Figure 12 shows precision against recall curve of at 16, 32, 64, and 128 DB vector lengths.

To some extent, the curve varies greatly with the quantization levels of DB (see Figure 12). However, when the quantization level exceeds 16, the curve changes only slightly, while the computation becomes more complex. Figure 13 shows that the average retrieval rates change significantly as the quantization levels of DB increase. When the level reaches 16, the curve reaches its peak. Beyond 16, the distribution of DB becomes too detailed, which makes the lose statistical characteristics when DE is at thirteen vector length and DS is at four vector length, so the curve falls and the average retrieval rates decrease. That is to say, a vector length of 16 balances performance and complexity.

5. Conclusions

In this paper, we have proposed a set of indices based on M-J fractal dictionary encoding. The M-J fractal dictionary is a shared file composed of Julia set blocks sorted in ascending order of BTC value. We showed that the DE, DB, and JDBS indices are close for similar texture images and different for different texture images. Subsequently, we calculated the average retrieval rate, average precision, and average recall and compared them with those of previous methods. We discussed further minimizing of the vector length of without a big loss of retrieval rate and gave the optimal length of the feature vector.

Experimental results on a database of 416 texture images showed that the proposed indices provided better performance than the previous methods. Also, provided a 79.18% average retrieval rate at the maximum, and its computational complexity was tolerable. In addition, and not only had low computational complexity, but also provided a competitive retrieval rate compared with existing methods.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (nos. 61103147, 61075018, 61070098, and 61272523), the National Key Project of Science and Technology of China (no. 2011ZX05039-003-4), and the Fundamental Research Funds for the Central Universities (no. DUT12JB06).