Abstract

The K-means algorithm and the Gaussian Markov random field model are integrated to provide a Gaussian Markov random field model (GMRFM) feature which can describe the texture information of different pixel colors in an image. Based on this feature, an image retrieval method is provided to seek the database images most similar to a given query image. A genetic-based parameter detector is also presented to decide the fittest parameters used by the proposed image retrieval method. The experimental results show that the image retrieval method is insensitive to rotation, translation, distortion, noise, scale, hue, light, and contrast variations, especially distortion, hue, and contrast variations.

1. Introduction

Much attention has been devoted to the design of image databases over the past few years [18]. Image retrieval is an important task in many image database applications, such as office automation, medical image archiving, digital library, multimedia publishing, computer-aided design, and geographic information systems.

Most traditional and common methods of image retrieval add metadata such as captions, keywords, or descriptions to the images so that retrieval can be performed over the annotation words [4]. When these systems are built upon a large database, the textual-feature-based method becomes not only cumbersome but also inadequate to represent image contents. Hence, many content-based image retrieval (CBIR) systems have been proposed in the related literature [18]. CBIR aims at avoiding the use of textual descriptions and instead retrieves images based on their visual similarity to a user-supplied query image or user-specified image features.

Low-level features, such as colors [9, 10], shapes [7, 8], and textures [11, 12], are extensively used to index the image features of CBIR systems. A shape-based image retrieval system searches for images containing objects similar to the objects specified by a query image. However, locating and recognizing objects in images is a real challenge. One of the difficulties is separating the objects from the background. These difficulties come from discretization, occlusions, poor contrast, viewing conditions, noise, complicated objects, complicated background, and so forth. Moreover, the shape-based image retrieval method can deal efficiently only with images that have simple object shapes. For complex object shapes, the region-based method has to build a binary sequence by using smaller grid cells so that more accurate results can be obtained; nevertheless, the storage of indices and the retrieval time may increase tremendously.

The color and texture attributes have been very successfully used in retrieving images with similar feature distributions. The color-based image retrieval method, such as the conventional color histogram (CCH) [13] and the fuzzy color histogram (FCH) [14], measures the similarity of two images by their distance in color space. The extraction of the color features follows a similar progression in each method: (1) selection of the color space, (2) quantization of the color space, (3) extraction of the color feature, and (4) derivation of an appropriate distance function [13]. The color attribute may avoid object identification and extraction in image retrieval. Color may provide multiple measurements at a single pixel of the image, and often enables the classification to be done without complex object segmentation. Vu et al. [7] proposed a color-based image retrieval method, called the SamMatch method, which processes the query by a sampling-based matching approach. The SamMatch method has the benefits of region-based techniques without reliance on the highly variable accuracy of segmentation methods. However, this method requires the user to point out the object areas at the time of the query.

Unfortunately, color-based image retrieval systems often fail to retrieve images taken of the same scene at different times or under different conditions, for example, images of a countryside taken at dawn, dusk, or noon under a clear or a cloudy sky. In another scenario, where the same scene is imaged by different devices, using an image taken by one device as the query example may fail to find the same scene taken by other devices.

The texture attribute depicts the “surface” of an object. Intuitively, this term refers to properties such as smoothness, coarseness, and regularity of an object. Generally, the structural homogeneity does not come from the presence of a single color or intensity, but requires the interaction of various intensities within a region. Texture similarity is often useful in distinguishing objects with similar colors, such as sky and sea as well as leaves and grass. Texture analysis is a real challenge. One of the ways to perform content-based image retrieval using texture as the cue is to segment an image into a number of different texture regions and perform a texture analysis on each region. However, segmentation can sometimes be problematic for image retrieval. In addition, texture is quite difficult to describe and is subject to differences in human perception. No satisfactory quantitative definition of texture is available at this time.

The notion of texture appears to depend upon three ingredients [15]. First, some local “order” is repeated over a region which is large in comparison to the order's size. Second, the order consists in the nonrandom arrangement of elementary parts. Third, there are roughly uniform entities having approximately the same dimension everywhere within the textured region. Liu and Picard [16] and Niblack et al. [17, 18] used contrast, coarseness, and directionality models to achieve texture classification and recognition. Huang and Dai [19] proposed a texture-based image retrieval method. The method associates one coarse and one fine feature descriptor with each image. Both descriptors are derived from the coefficients of the wavelet transform of the original image. The coarse feature descriptor is used at the first stage to quickly screen out nonpromising images; the fine feature descriptor is subsequently employed to find the truly matched images. However, Huang’s method cannot give a high accuracy rate in precisely seeking the desired image.

To solve the problems mentioned above, this paper proposes a color-texture-based image retrieval system (CTBIR method). The CTBIR method integrates the color and texture attributes of an image. It employs the K-means algorithm [20] to classify the colors in all database images into K clusters, then calculates the average of the pixel colors in each cluster, and uses this average color to represent the colors of the pixels in the cluster. Next, the CTBIR method utilizes the Gaussian Markov random field model (GMRFM) to analyze the texture of each representative color in an image. A genetic algorithm is also provided to decide the most suitable parameters used in the CTBIR method. The performance of the CTBIR method is investigated by experiments. The effects of various image variations, such as rotation, translation, distortion, noise, scale, light, hue, and contrast, on the performance of the CTBIR method are explored as well.

This section will give a brief introduction of Gaussian Markov Random Field Model, genetic algorithm, and other techniques to be used in the CTBIR method.

2.1. Gaussian Markov Random Field Model

Texture plays a very significant role in analyzing image characteristics. Chellappa and Chatterjee [11] adopted GMRFM to describe the relationship between each pixel and its surrounding pixels. Generally, there exist some similarities and correlations in gray-level variance between adjacent pixels. In GMRFM, the varying degree of texture in images is statistically calculated mainly on the basis of this property. While Chellappa and Chatterjee used GMRFM to describe the textures of objects in a gray-scale image and to segment the objects from the image, in this paper it is used to characterize the textures of images in order to retrieve from a database the images which are most similar to the given query image.

In GMRFM [12, 21–23], all the pixels in each block are classified according to their relationship with the central pixel. Figure 1 illustrates an example of GMRFM in which each neighboring pixel of the central pixel is labeled by the jth category in the ith rank to which it belongs relative to the central pixel. A set of eigenvalues is employed to describe the relationship between each neighboring pixel and the central pixel, and is calculated as in [11], where the noise follows a Gaussian distribution and the neighbors of the central pixel are those shown in Figure 1. The eigenvalues are computed from column vectors composed of these pixel values, and the resulting vector can be used to depict the texture of the block. The larger most of the values in the vector are, the rougher the texture is; otherwise, it is smoother. Yue [8] also uses this vector to describe the texture of a telemeter image of a glacier and divides the image into regions according to their textures.
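For orientation, the GMRF model is commonly written in the literature in the form sketched below. The symbols used here ($y$, $\theta_r$, $e$, $N_s$, $q$, $\Omega$) are assumptions of this sketch and are not necessarily the notation of Figure 1 or of [11].

```latex
% Assumed notation: y(s) is the (zero-mean) gray level at site s,
% N_s is a symmetric-half neighbor set, e(s) is zero-mean Gaussian noise.
\begin{align}
  y(s) &= \sum_{r \in N_s} \theta_r \,\bigl[\, y(s+r) + y(s-r) \,\bigr] + e(s), \\
  q(s) &= \operatorname{col}\bigl[\, y(s+r) + y(s-r) \;:\; r \in N_s \,\bigr], \\
  \hat{\theta} &= \Bigl[\, \sum_{s \in \Omega} q(s)\, q(s)^{T} \Bigr]^{-1}
                  \Bigl[\, \sum_{s \in \Omega} q(s)\, y(s) \Bigr].
\end{align}
```

In this form, $\hat{\theta}$ is the least-squares estimate of the interaction parameters over the set of sites $\Omega$ of a block, and it plays the role of the texture vector described above.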

2.2. Genetic Algorithm

Seo [24] applied a genetic algorithm for selection and feature combination in pattern recognition applications. Ballerini et al. [25] used a genetic algorithm to select a subset of texture features from all the features extracted. Lai and Chen [26] used the interactive genetic algorithm to tune human judgment results on similarity of images. Joshi and Tapaswi [2] applied a genetic algorithm to decide the most plausible matching between two images in image retrieval. Stejić et al. [6] proposed an image retrieval system using local similarity patterns. The system used a genetic algorithm to find an optimal assignment of similarity criteria to image regions. Chan and King [1] provided a trademark image retrieval system which employed a genetic algorithm to find the weighting factors in the dissimilarity function by integrating five shape features. This paper will adopt a genetic algorithm to decide the fittest parameters which will be used by the CTBIR method.

Genetic algorithm (GA) [27] is a heuristic optimization method that operates through a determined and randomized search. The set of possible solutions for an optimization problem is considered as a population of individuals, and the degree of adaptation of an individual to its environment is specified by its fitness. A chromosome, essentially a set of character strings, represents the coordinate of an individual in the search space. A gene is a subsection of a chromosome that encodes the value of a single parameter that is being optimized. Typical encoding for a gene can be binary or integer.

Derived from the biological model of evolution, genetic algorithms operate on the Darwinian principle of natural selection, which holds that, given a certain population, only the individuals that adapt well to their environment can survive and transmit their characteristics to their descendants. A genetic algorithm consists of three major operations: selection, crossover, and mutation. Selection evaluates all individuals, and only those best fitted to their environment survive. Crossover recombines the genetic material of two individuals to form new combinations with the potential for better performance. Mutation induces changes in a small number of chromosomal units with the goal of maintaining sufficient population diversity during the optimization process.

2.3. System Evaluation

Whenever a certain image is selected as a query image, the image retrieval system delivers to the user the database images whose matching distances to the query image are shortest. If the desired database image is one of these delivered images, we say the system correctly finds the desired image [9, 10]. Otherwise, the system fails to find the desired image. In the following experiments, the accuracy rate of querying for a system is denoted by ACC.

The other evaluation, the normalized modified retrieval rank (NMRR) proposed by MPEG-7 [28], is usually used as a benchmark for system evaluation. NMRR indicates not only how many of the correct items are retrieved, but also how highly they are ranked among the retrieved items. NMRR is defined by
$$\mathrm{NMRR}(q) = \frac{\dfrac{1}{NG(q)} \sum_{k=1}^{NG(q)} \mathrm{Rank}(k) - 0.5 - \dfrac{NG(q)}{2}}{K(q) + 0.5 - 0.5\,NG(q)},$$
where $NG(q)$ is the size of the ground truth set for a query image $q$, $\mathrm{Rank}(k)$ is the ranking of the ground truth images by the retrieval algorithm, and $K(q)$ specifies the “relevance rank” for each query. As the size of the ground truth set is normally unequal, a suitable $K(q)$ is determined by
$$K(q) = \min\bigl(4\,NG(q),\; 2\,GTM\bigr),$$
where $GTM$ is the maximum of $NG(q)$ for all queries. If $\mathrm{Rank}(k) > K(q)$, then $\mathrm{Rank}(k)$ is changed into $K(q) + 1$. The NMRR is in the range of $[0, 1]$, and the smaller the NMRR is, the better the retrieval performance will be. ANMRR is defined as the average NMRR over a range of queries, and is given by
$$\mathrm{ANMRR} = \frac{1}{NQ} \sum_{q=1}^{NQ} \mathrm{NMRR}(q),$$
where NQ is the number of query images.
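The NMRR and ANMRR measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it uses the commonly cited MPEG-7 form in which a ground-truth image ranked beyond K(q) is penalized with rank K(q) + 1, and the function and argument names (`nmrr`, `anmrr`, `ranks`, `gtm`) are hypothetical.

```python
def nmrr(ranks, gtm):
    """NMRR for one query.

    ranks: 1-based retrieval ranks of the ground-truth images of the query.
    gtm:   largest ground-truth set size over all queries (assumed name).
    """
    ng = len(ranks)                    # NG(q), size of the ground-truth set
    k = min(4 * ng, 2 * gtm)           # K(q), the relevance rank
    # Assumption: ranks beyond K(q) are replaced by K(q) + 1.
    clipped = [r if r <= k else k + 1 for r in ranks]
    average_rank = sum(clipped) / ng
    return (average_rank - 0.5 - ng / 2) / (k + 0.5 - 0.5 * ng)


def anmrr(all_ranks):
    """Average NMRR over a list of per-query rank lists."""
    gtm = max(len(ranks) for ranks in all_ranks)
    return sum(nmrr(ranks, gtm) for ranks in all_ranks) / len(all_ranks)
```

A query whose ground-truth images occupy the top ranks scores 0, and a query whose ground-truth images all fall beyond K(q) scores 1, which matches the stated range of the measure.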

3. Color-Texture-Based Image Retrieval System

This paper proposes a color-texture-based image retrieval system (CTBIR method) on the basis of the color and texture attributes of images. The system contains two phases: feature extraction and image matching. The feature extraction phase extracts the GMRFM feature of an image. The image matching phase compares the GMRFM features of the given query image and each database image, and then delivers to the user the database images most similar to the query image.

3.1. Feature Extraction

For large blocks, the GMRFM feature needs high feature dimensions to precisely describe the texture of an image, and the CTBIR method would require much memory space to hold the feature of the image and much time to compare the feature with that of each database image. In this paper, every gray-scale image is hence divided into many overlapping blocks of 4 × 4 pixels, as shown in Figure 2(a), and the relationship of the positions for the 16 pixels in a block is depicted in Figure 2(b). Afterwards, each block B of 4 × 4 pixels is further partitioned into four overlapping subblocks of 3 × 3 pixels, each of which includes nine pixels. The GMRFM texture estimates of B are then computed over these subblocks. As a result, each block B comprises four feature values.

Each pixel in a full color image I consists of three color components: R, G, and B. The CTBIR method uses the three color components to construct three gray-scale images of the same size as I, which are entirely composed of the R, G, and B color components of I, respectively. Next, these images are all divided into overlapping blocks of 4 × 4 pixels, and the four feature values of each block are calculated with regard to each of the three color components, which gives twelve attributes for each block.

In a full color image, a pixel is generally described by a 24-bit memory space; there are $2^{24}$ possible colors in an image. However, most images use only a small fraction of the possible colors. In order to reduce the memory space required for storing image features and the querying time, the CTBIR method employs the K-means algorithm to group the pixels in all database images into K groups according to their colors. The average color of all the pixels in one group is considered to be a representative color of all images (including database images and the query image). The K representative colors comprise a common palette (CP).
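The construction of the common palette can be sketched as a plain K-means loop over pixel colors. This is a minimal sketch, not the authors' implementation: the function names, the random seeding of the initial centers, the fixed iteration cap, and the squared Euclidean color distance are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(color, palette):
    """Index of the palette color closest to the given color."""
    return min(range(len(palette)),
               key=lambda i: squared_distance(color, palette[i]))


def build_common_palette(pixels, k, iterations=20, seed=0):
    """Group pixel colors into k clusters and return the k average colors.

    pixels: list of (R, G, B) tuples gathered from all database images.
    """
    rng = random.Random(seed)
    palette = rng.sample(pixels, k)          # assumed initialization
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[nearest(p, palette)].append(p)
        new_palette = []
        for i, group in enumerate(groups):
            if group:
                # Representative color = average color of the group.
                new_palette.append(
                    tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                new_palette.append(palette[i])  # keep an empty cluster's center
        if new_palette == palette:
            break                               # assignments have stabilized
        palette = new_palette
    return palette
```

In practice the pixel list would be subsampled from the database images, since running K-means over every pixel of every image is costly.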

In every color image I, the ith color Ci in CP is given a set of thirteen variables: twelve texture variables, one for each of the twelve attributes of a block, and a counter A. For each block B of 4 × 4 pixels in I, the average Ca of all the pixel colors in B is calculated. If Ca is similar to Ci in CP, the twelve attributes of B are, respectively, added to the twelve texture variables of Ci, and A is increased by 1. Then, the CTBIR method adopts the texture variables and A's of all colors in CP as the feature of I, which we call the GMRFM feature of I. The GMRFM feature of I hence contains twelve texture values and one A value for each color in CP, where A is the number of blocks in I whose average pixel color is similar to that color of CP. Hence, the A's can be used to describe the distribution of pixel colors in I, while the textures can be described by means of the texture variables. On the whole, the distribution of the textures for different colors in I can be explained by the A's and the texture variables.
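The accumulation step described above can be sketched as follows. This is an illustrative sketch only: it assumes that "similar" means "nearest palette color", that the twelve per-block attributes have already been computed, and that the names `gmrfm_feature`, `blocks`, `sums`, and `counts` are hypothetical.

```python
def closest_color_index(color, palette):
    """Index of the palette color nearest to the given color (assumed rule)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])),
    )


def gmrfm_feature(blocks, palette):
    """Accumulate per-color texture sums and block counts for one image.

    blocks:  iterable of (average_color, attributes) pairs, one per 4 x 4
             block, where attributes holds the twelve texture values
             (four per R, G, and B component).
    palette: list of representative colors (the common palette).
    Returns (sums, counts): twelve accumulated values and one block count
    for every palette color.
    """
    sums = [[0.0] * 12 for _ in palette]
    counts = [0] * len(palette)
    for average_color, attributes in blocks:
        i = closest_color_index(average_color, palette)
        for t in range(12):
            sums[i][t] += attributes[t]
        counts[i] += 1            # the counter A of the matched color
    return sums, counts
```

With a palette of K colors this yields 13 values per color, so the size of the feature depends only on K and not on the image size.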

The GMRF model assumes that the texture field is stochastic and stationary and satisfies a conditional independence assumption. It is able to capture the local (spatial) contextual variations in an image and is effective for the analysis and synthesis of random-looking textures. For each color image I and each color in CP, the CTBIR method generates a set of texture values for the R, G, and B color components together with a value of A. Here, the texture values illustrate the orientation of the GMRFM feature of I for the three color components of the ith color in CP, and A describes the color distribution of I. Hence, the GMRFM feature can also describe the texture orientation and color information of I.

3.2. Image Matching

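The Minkowski distance is widely used to calculate the distance between two multidimensional vectors [5]. Suppose that $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$ are two $n$-dimensional vectors. The Minkowski distance of the two vectors can be defined by
$$d(X, Y) = \Bigl( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \Bigr)^{1/p},$$
where $p$ is a user-defined constant. $d$ is a Manhattan distance if $p = 1$, while it is a Euclidean distance in case of $p = 2$.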
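As a small worked illustration, the Minkowski distance can be computed as below. The function name `minkowski` and the parameter name `p` are choices made for this sketch.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```

For the vectors (0, 0) and (3, 4), order 1 gives the Manhattan distance 7 and order 2 gives the Euclidean distance 5.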

For the ith color in CP, consider the texture values of the query image and of a database image with regard to each of the R, G, and B color components, as well as the A values of the two images with regard to that color. The image matching distance D between the two images is defined from the Minkowski distances, of order 1 or 2, between these corresponding values, together with two user-defined constants, r1 and r2.

4. Genetic Algorithm-Based Parameter Detector

The performance of the CTBIR method is significantly affected by r1 and r2. In this paper, a genetic algorithm-based parameter detector (GBPD) is used to determine the most suitable r1 and r2. The GBPD makes use of a binary string to represent a chromosome, in which the first group of bits and the remaining bits are used to describe r1 and r2, respectively. For each chromosome Ch, r1 is encoded as gap1 multiplied by the number of 1-bits in the first group of bits of Ch, and r2 as gap2 multiplied by the number of 1-bits in the remaining bits of Ch; gap1 and gap2 are the maximum estimated errors of r1 and r2.

For a certain application, one can accumulate some of its historic images (including database images and query images) and then use them to train the most appropriate values for r1 and r2 via the genetic algorithm. In this genetic algorithm, the fitness of a chromosome is defined as the accuracy rate (ACC) obtained by the CTBIR method on the accumulated historic images with the values of r1 and r2 encoded by the chromosome.

GBPD first randomly generates N chromosomes, each of which is a binary string. To evolve the best solution, the genetic algorithm repeatedly executes mutation, crossover, and selection operations until the relative fitness of the reserved chromosomes is very similar.

In the mutation operation, for each of the reserved chromosomes, the GBPD uses a random number generator to specify one bit from the first group of bits and another bit from the remaining bits in the chromosome. These two bits are then replaced by their complements, obtained with the operator “NOT,” to generate a new chromosome.

In the crossover operation, GBPD similarly uses a random number generator to designate pairs of chromosomes from the reserved chromosomes. For each chromosome pair, the genetic algorithm concatenates a substring of one chromosome with a substring of the other into a new chromosome, and the remaining two substrings into another new chromosome.

In the selection operation, the optimal chromosomes are selected, according to their fitness, from the N chromosomes reserved in the previous iteration along with the chromosomes created in the mutation and crossover operations. GBPD continuously performs the mutation, crossover, and selection operations until the relative fitness of the reserved chromosomes is very close or the number of iterations is equal to the specified maximum number of generations (in this paper, the maximum number of generations is set to 100). Finally, GBPD multiplies gap1 (resp., gap2) by the number of 1-bits in the first group of bits (resp., the remaining bits) of the chromosome having the best fitness among the reserved chromosomes to get r1 (resp., r2).

Figure 3 gives an example. Figure 3(a) shows a chromosome Ch, from which r1 and r2 are derived. Figure 3(b) demonstrates a new chromosome created from Ch by the mutation operator, where the underlined bits are the two randomly selected bits. Figure 3(c) displays two new chromosomes generated from two chromosomes through the crossover operator.
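The GBPD loop described in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `n1`, `n2`, `pairs`, and `pop_size` are hypothetical, the crossover is assumed to swap the two groups of bits between the paired chromosomes, and the `fitness` callback stands in for the ACC obtained by running the CTBIR method on the historic images.

```python
import random


def decode(ch, n1, gap1, gap2):
    """Map a chromosome to (r1, r2) by counting 1-bits in each group."""
    return gap1 * sum(ch[:n1]), gap2 * sum(ch[n1:])


def mutate(ch, n1, rng):
    """Flip one random bit in the r1 group and one in the r2 group."""
    child = list(ch)
    child[rng.randrange(n1)] ^= 1
    child[rng.randrange(n1, len(ch))] ^= 1
    return child


def crossover(a, b, n1):
    """Assumed crossover: exchange the r2 groups of the two parents."""
    return a[:n1] + b[n1:], b[:n1] + a[n1:]


def gbpd(fitness, n1=40, n2=40, pop_size=20, pairs=10,
         gap1=0.05, gap2=0.05, generations=100, seed=0):
    """Search for the (r1, r2) pair with the highest fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n1 + n2)]
                  for _ in range(pop_size)]

    def score(ch):
        return fitness(*decode(ch, n1, gap1, gap2))

    for _ in range(generations):
        mutated = [mutate(ch, n1, rng) for ch in population]
        crossed = []
        for _ in range(pairs):
            a, b = rng.sample(population, 2)
            crossed.extend(crossover(a, b, n1))
        # Selection: keep the fittest pop_size chromosomes.
        candidates = sorted(population + mutated + crossed,
                            key=score, reverse=True)
        population = candidates[:pop_size]
        if score(population[0]) == score(population[-1]):
            break      # reserved chromosomes have equal fitness
    return decode(population[0], n1, gap1, gap2)
```

Because the previous generation always competes in the selection step, the best fitness never decreases from one iteration to the next. In a real run the fitness evaluations would be cached, since each one requires a full retrieval experiment.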

5. Experiments

The purpose of this section is to evaluate the performance of the CTBIR method by experiments. Let SetD and SetQ be two image sets, each of which contains 1087 full color images. The images in SetD are employed as the database images and those in SetQ are used as the query images. Some of them are captured from animations, where each image pair (one image in SetD and the corresponding image in SetQ) is randomly picked from the same animation. Most of the animations were downloaded from http://www.mcsh.kh.edu.tw. The rest of the images were scanned from natural images and trademark pictures.

In SetD and SetQ, the first 500 images are mostly cartoon images and the remainder are natural ones. Figure 4(a) shows some of the query and database images among the first 500 images of SetD and SetQ, and Figure 4(b) shows some of those among the latter 587 images of SetD and SetQ. In the following experiments, the GBPD parameters, including the population size N, are given to be 40, 40, 20, 10, 0.05, and 0.05 to determine the most suitable r1 and r2 via the genetic algorithm. Moreover, the size of CP is set to 16.

The first experiment investigates the performance of the CTBIR method in retrieving cartoon images. First, cartoon images from SetD and SetQ are, respectively, specified to be the database images and the query images, and the GBPD is applied to determine the most suitable r1 and r2. The experimental results obtained by the GBPD indicate that the most suitable r1 and r2 are 1.60 and 0.40. Then, cartoon images from SetD and SetQ are used as the database images and the query images based on the most suitable r1 and r2.

The second experiment examines the performance of the CTBIR method in retrieving natural images. In this experiment, natural images from SetD and SetQ are first taken as the database images and the query images, respectively, and the GBPD is employed to find the most suitable r1 and r2. Afterwards, natural images from SetD and SetQ are used as the database images and the query images, and the CTBIR method is assigned to retrieve the expected database images with up to 5 delivered images. Figure 5 shows the r1 and r2 values obtained by the GBPD over 100 iterations in experiments 1 and 2. This experimental result indicates that a bigger r1 is required to describe the cartoon images, since the color distributions of most natural images are more varied than those of most cartoon images, even though the colors of both kinds of images are reduced to only a few principal colors.

In the third experiment, images from SetD and SetQ are considered as the database images and the query images for determining the most suitable r1 and r2 by using the GBPD. Then, the images in SetD and SetQ are used as the database images and the query images based on the most suitable r1 and r2. Table 1 displays the results of the first to third experiments.

The fourth experiment compares the performance of the CTBIR, Huang’s, and SamMatch methods by using the images in SetD and SetQ. Table 2 shows the experimental results, where r1 and r2 are given to be 1.65 and 0.50. The experimental results indicate that the CTBIR method gives a much better ANMRR than Huang’s and SamMatch methods, especially for cartoon images. The CTBIR method spends a total of 110.4 seconds to run the 1087 queries in this experiment. In this experiment, CP consists of 16 colors, each of which corresponds to twelve texture values and one A; each value is held in a 4-byte memory space. Hence, the CTBIR method takes 904,384 (16 × 13 × 4 × 1087) bytes to hold the whole GMRFM features of the 1087 database images.

The SamMatch method resizes each image to a fixed size, reduces all pixel colors in the image to only 256 colors, divides the image into blocks, and uses the average colors of the blocks as the feature of the image. Hence, 113 samples are evenly spread out in each image, and each dimension is held in a 4-byte memory space. It consumes 491,324 (113 × 4 × 1087) bytes to hold the features of all database images. In addition, the SamMatch method takes 245.02 seconds to execute the 1087 queries. Huang’s method consumes 620,928 bytes of memory space to save the features of the 1087 database images, and spends 125.6 seconds in running the 1087 queries.
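The storage figures above follow from simple arithmetic, which can be checked with the short sketch below. The helper name `feature_bytes` is hypothetical, and the CTBIR count assumes a common palette of 16 colors, as set for these experiments, with twelve texture values and one A per color.

```python
def feature_bytes(values_per_image, images=1087, bytes_per_value=4):
    """Total bytes needed to store one feature vector per database image."""
    return values_per_image * bytes_per_value * images


# CTBIR: 16 palette colors x (12 texture values + 1 count), assumed palette size.
ctbir_bytes = feature_bytes(16 * 13)
# SamMatch: 113 samples per image, as stated in the text.
sammatch_bytes = feature_bytes(113)
```

Under these assumptions the CTBIR feature needs roughly 1.8 times the storage of the SamMatch feature, while its reported query time is less than half as long.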

When one takes pictures, the lens may be adjusted to different positions or different directions. Moving the position of certain objects or pixels in an image is called the shift variation of the image, rotating the objects by a certain angle is called the rotation variation of the image, and adding noise to the image is called the noise variation of the image. An image may be enlarged or reduced because of different camera resolution setups. We call this phenomenon a scale variation of the image. In the real world, the distortion variation is a very common phenomenon too. For example, putting a picture on an uneven plane may deform the objects in the picture. Besides that, hue, contrast, and light variations, such as those of the image pairs in Figure 6, may often be generated. These variations may cause an image retrieval system to regard the image as different from the original one. An excellent image retrieval method should be insensitive to these variations.

The next experiment is designed to explore the capacity of the CTBIR method for resisting variations of rotation, distortion, noise, scale, hue, luminance, and contrast in images. One hundred full color images Iq,1, Iq,2, …, Iq,100 are used as the query images. These 100 full color images comprise an image set Sq. Besides, this paper employs the rotation, distortion, noise, scale, hue, luminance, and contrast functions in ADOBE PHOTOSHOP 7.0 to process each Iq,i and, respectively, generate the variant images Ir,i, Id,i, In,i, Is,i, Ih,i, Il,i, and Ic,i. The group of images Iα,1, Iα,2, …, Iα,100 forms an image set Sα, which is produced by the same function, where α = r, d, n, s, h, l, and c. Figure 7 shows some images in Sq, Sr, Sd, Sn, Ss, Sh, Sl, and Sc, respectively, where i is the image number in Sq, Sr, Sd, Sn, Ss, Sh, Sl, and Sc.

In this experiment, all the images in Sr, Sd, Sn, Ss, Sh, Sl, and Sc are employed as the database images, and the images in Sq as the query images. For the ith query, Iq,i is used as the query image and the CTBIR, Huang’s, and SamMatch methods are applied to retrieve the similar database images for the query image. Table 3 shows the experimental results. The in Table 3 is the accuracy (ANMRR) which the experiment obtains. Furthermore, the ANMRRr, ANMRRd, ANMRRn, ANMRRs, ANMRRh, ANMRRl, and ANMRRc in Table 3 are the accuracies (ANMRR) obtained by the experiments which take the images, respectively, in Sr, Sd, Sn, Ss, Sh, Sl, and Sc as the database images.

The experimental results show that the color feature is more significant than the texture feature in distinguishing different cartoon images; hence, a bigger r1 and a smaller r2 should be provided. However, a smaller r1 and a bigger r2 should be assigned for recognizing natural images. The experimental results also indicate that the CTBIR method offers impressive performance in resisting rotation, translation, distortion, noise, scale, hue, light, and contrast variations, especially distortion, hue, and contrast variations, because it is indifferent to shape, color, and light variations when describing the texture features of an image.

6. Conclusions

In this paper, a CTBIR method is proposed for image retrieval. The GMRFM feature is used to describe the texture and color information of an image. Based on this feature, the CTBIR method delivers to the user the database images which are most similar to the given query image. The GBPD is also offered to decide the fittest values of the parameters r1 and r2. The experimental results show that the CTBIR method is indifferent to rotation, translation, distortion, noise, scale, hue, light, and contrast variations, especially distortion, hue, and contrast variations. Moreover, the CTBIR method gives much better performance than Huang’s and SamMatch methods. In the future, a subimage retrieval system based on the GMRFM feature will be developed. With a given query image IQ, the subimage retrieval system is to return the images ID from a database such that IQ is very similar to a region of ID.