Abstract

In a world of multimedia information, where users expect accurate results for their search queries and relevant multimedia content, developing an accurate content-based image retrieval (CBIR) system is difficult because of the noise present in images. This noise impairs the performance of CBIR systems. CBIR systems use image feature representations to estimate the distance between the query and database images, and noise or artifacts within the visual data can confuse the system when it retrieves relevant results. Therefore, we propose the Noise Resilient Local Gradient Orientation (NRLGO) feature representation, which overcomes the noise within the visual information and enables the CBIR system to retrieve accurate and relevant results. NRLGO consists of three steps: estimation and removal of noise to protect the local visual structure; extraction of color, texture, and local contrast features; and, finally, generation of microstructures for visual representation. The Manhattan distance between the query image and each database image is used to measure their similarity. The proposed technique was tested on the Corel dataset, which contains 10000 images from 100 different categories. The experimental results show that NRLGO achieves higher retrieval performance than state-of-the-art techniques.

1. Introduction

The advent of multimedia tools has made it easier to access a wider range of images with a variety of information. CBIR (content-based image retrieval) uses low-level image properties to search a huge database for images that suit the user’s demands. Color, edges, and orientation are considered low-level image attributes [1]. CBIR is widely used in different applications like medical imaging [2], E-commerce [3], and digital libraries [4].

Text-based image retrieval (TBIR) and sketch-based image retrieval are two alternative image retrieval methodologies. TBIR searches images on the basis of labels and keywords and is widely used, for example, in Google Images. However, it is difficult to describe the whole content of an image in words, so TBIR may return irrelevant content, and manual annotation becomes impractical for a large database [5]. A sketch-based image retrieval system finds images on the basis of a sketch drawn by the user [6]. The Soft Histogram of Edge Local Orientation (S-HELO) uses local orientation computation to improve the performance of sketch-based image retrieval [7]. The histogram of line relationship (HLR) addresses the appearance gap between sketches and images; it removes noisy edges by choosing the edge shapes that best correspond to object boundaries [8]. Manual text annotations and user-drawn sketches can therefore lead to mismatched retrieval results. Xie et al. proposed an analogy relevance feedback method based on user intentions that helps maximize retrieval results [9]. The Case Based Long Term Learning (CB-LTL) relevance feedback method [10] incorporates user intentions and increases retrieval performance. Mandal et al. proposed a signature-based bag of visual words (S-BoVW) that combines visual words with the image's texture and color to retrieve similar images [11]. BoVW-based methods improve retrieval performance, but they are computationally expensive.

In this paper, we propose Noise Resilient Local Gradient Orientation (NRLGO), which overcomes the noise within the visual information and enables the CBIR system to retrieve accurate and relevant results. In NRLGO, color, texture, and local contrast information are used for the feature representation of visual data, while semantic attributes are estimated by calculating the correlation between visual features and regular structure data. NRLGO is noise resilient: it preserves the local structures of the visual information from noise by finding uncertain states. The major contributions of NRLGO are summarized as follows:
(1) NRLGO is noise resilient because it identifies the uncertain bits within the visual information, which significantly increases the CBIR system's performance.
(2) NRLGO divides pixel differences rather than using them directly, so when every pixel value is multiplied by the same constant, the differences are multiplied by that constant too and the change cancels out. In this way, brightness changes are canceled and the discriminative feature representation is improved.
(3) The multiresolution gradient orientations of NRLGO improve the gradient magnitude by removing the noise factor and detecting edges.
(4) NRLGO estimates the gradient orientation by quantizing it into T dominant orientations, which strengthens the visual feature representation of the image.

The remainder of the paper is structured as follows: The related work is given in Section 2. The suggested NRLGO is presented in Section 3 which discusses the attributes of NRLGO against noise, illumination variation, and rotation variation. Section 4 summarizes the experimental findings and our contribution against state-of-the-art techniques. In Section 5, discussion is given, and Section 6 concludes our contribution.

2. Related Work

CBIR systems play a vital role in retrieving, from large datasets, images that are semantically and contextually similar to the query image. They extract visual feature representations and estimate similarity distances to score the comparable images in the visual database. Visual feature descriptors in CBIR systems are of two types: local and global. Local descriptors extract an image's local interest points, while global descriptors use the complete image for feature extraction [12]. Different CBIR methods have been used to achieve efficient retrieval performance. The scale invariant feature transform (SIFT) [13] was introduced as a local descriptor that is invariant to scale and rotation. PCA-SIFT [14] was used to deal with the difficulty of image retrieval with high-dimensional feature vectors. Speeded up robust features (SURF) [15] use the Hessian matrix and a distance ratio to reduce the computational cost. The local binary pattern (LBP) descriptor was proposed in [16]; however, LBP is applicable to gray-scale images only, and it does not predict accurately in the presence of noise. The first limitation was overcome by multichannel decoded LBP [17], which performs computation on both color and gray-scale images. To further improve LBP, a color histogram was fused with LBP to achieve better retrieval performance for color images [18]. The local ternary pattern (LTP) [19] was developed to reduce the effect of noise on regular patterns: LTP encodes each neighbor with three values (-1, 0, and 1), whereas LBP uses two (0 and 1). The extended local ternary pattern (ELTP) [20] is robust to the noise within images. The scale invariant local ternary pattern (SILTP) [21] was used to extract local features from gray-scale images and to estimate pixel differences in complex background scenes.
The Sobel edge detection filter calculates the orientation of edges but has a low signal-to-noise ratio [22]. Edges obtained with the Sobel filter are thick, which leads to mismatched images. The Canny filter [23] solves the edge-thickness issue and has a good signal-to-noise ratio.
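To make the binary versus ternary distinction above concrete, the following minimal Python sketch computes the LBP code and the LTP code of a single 3 × 3 patch. The clockwise neighbor ordering, the threshold t = 5, and the common practice of splitting an LTP code into an "upper" and a "lower" binary code are assumptions for illustration, not details taken from [16] or [19].

```python
from typing import List, Tuple

# Assumed neighbor ordering: clockwise, starting at the top-left pixel.
OFFSETS: List[Tuple[int, int]] = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                                  (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(patch: List[List[int]]) -> int:
    """Binary LBP: neighbor p sets bit p when it is >= the center pixel."""
    center = patch[1][1]
    code = 0
    for p, (dy, dx) in enumerate(OFFSETS):
        if patch[1 + dy][1 + dx] >= center:
            code |= 1 << p
    return code

def ltp_codes(patch: List[List[int]], t: int) -> Tuple[int, int]:
    """Ternary LTP with threshold t, split into (upper, lower) binary codes.

    diff >= t  -> ternary +1 (bit set in the upper code)
    diff <= -t -> ternary -1 (bit set in the lower code)
    otherwise  -> ternary 0  (no bit set), which absorbs small noisy changes
    """
    center = patch[1][1]
    upper = lower = 0
    for p, (dy, dx) in enumerate(OFFSETS):
        diff = patch[1 + dy][1 + dx] - center
        if diff >= t:
            upper |= 1 << p
        elif diff <= -t:
            lower |= 1 << p
    return upper, lower

patch = [[52, 60, 49],
         [50, 50, 51],
         [30, 47, 70]]
print(lbp_code(patch), ltp_codes(patch, t=5))  # -> 155 (18, 64)
```

Note how the neighbors 52, 51, and 50, which differ from the center by at most 2 gray levels, set bits in the LBP code but fall into the ternary 0 state of LTP.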

The human visual system is sensitive to color and image edges. In the HSV color space, the color difference histogram (CDH) is used to reflect human visual perception; entropy is used for feature selection, and correlation is then computed between the selected features [24]. An approach based on a weight learner [25] uses visual feature discrimination to measure the similarity between the user's query image and the database images. Spatial pyramid matching (SPM) captures the spatial distribution of image features and increases retrieval accuracy, but it is sensitive to rotation and translation when images are not properly aligned. Geometric relations among groups of similar visual words are established relative to the image center [26]. Because a single attribute is not robust enough for efficient retrieval, color features are fused with texture features: color moments are used for color extraction, a Gabor descriptor is used to extract texture, and the features are then represented using the Color and Edge Directivity Descriptor (CEDD) [27]. In another approach, images are first transformed to the RGB color space, texture information is obtained from local binary patterns evaluated with a predefined prepattern unit, and this texture information is fused with the color channels to increase retrieval performance [28]. Features can also be extracted by fusing top-hat transforms and local binary patterns to analyze color images in other image processing tasks [29]: the top-hat transform extracts shape, and the color local binary pattern is used for texture classification.

In [30], color features are extracted with a color difference histogram, and a Gabor descriptor is used for texture classification. The joint use of color and texture increases retrieval accuracy, and combining the texture feature with the color feature achieves rotation-invariant and gray-scale-invariant properties. Another image retrieval technique also combines color and texture features: HSV and Lab color spaces are used to extract the color attribute, and texture classification uses Haralick features in RGB and gray-scale images [31]. In [32], plant diseases are identified using color features, with texture classification based on the gray-level co-occurrence matrix (GLCM).

Several CBIR systems represent visual information in a global context using deep learning techniques. A bilinear convolutional neural network, consisting of a CNN architecture with bilinear root pooling, was proposed for efficient retrieval [33]. Similarly, another deep learning method for CBIR that extracts global features through two parallel CNN models was presented. In [34], the SPoC global descriptor was developed by aggregating deep local features for CBIR. Cross-dimensional CNN weights were aggregated to represent global information for CBIR [35]. A triplet network [36] was used to optimize regional maximum activations of convolutions (R-MAC), which pools image regions to cover the points of interest. Moreover, a deep CNN model [37] was used to compress image descriptors and activate the CNN layers. A multilayered CNN was used for feature extraction, and the features were encoded with the VLAD encoding scheme [38]. CNN-based CBIR systems require large amounts of data for model learning and powerful machines to compute losses and tune hyperparameters, whereas local-feature and distance-based CBIR approaches require no clustering and are computationally less expensive than deep learning methods.

CBIR systems have been studied for decades. Still, occlusion, cluttered backgrounds, viewpoint variation, and noise make retrieval a challenging task. The literature reveals that existing CBIR methods were mostly developed using clean visual information, and little effort has been made to remove the noise within it. However, noise in visual information degrades the performance of both local and global (deep learning) descriptors, because the visual information might be corrupted by noise, or maliciously designed data might be introduced into it to deceive CBIR models. Therefore, a noise-resilient local descriptor is required to overcome the noise or artifacts within images for CBIR.

3. Noise Resilient Local Gradient Orientation

CBIR systems compute discriminative low-level image attributes, including color, edges, and patterns. The distribution of pixels within the visual information is uniform and represents its discriminative attributes, such as color, texture, and patterns. This uniform distribution is sensitive to small uncertain changes, which disturb it and cause nonuniform patterns within the image. Such uncertain noise confuses CBIR models when they retrieve relevant information. Therefore, we propose Noise Resilient Local Gradient Orientation (NRLGO) to overcome the noise patterns and sustain the uniform pattern of the pixels. NRLGO consists of three steps: estimation of noise and protection of the local visual structure; low-level feature extraction (color, texture, and contrast); and generation of microstructures, as illustrated in Figure 1. The nonuniform noise signal is estimated, and a resilient model is established through an error-correction method. Then, discriminative color and texture features are estimated by quantizing the HSV color space and computing the local gradient orientation (LGO). Moreover, NRLGO extracts contrast information by quantizing the V values of HSV. The Manhattan distance between the query image and each database image is used to determine their similarity. The complete NRLGO method is described in Figure 2.

3.1. Noise Resilient Local Binary Pattern

The proposed NRLBP establishes a noise-resilient attribute by computing small pixel differences and estimating the uncertain bits and the uncertain state. The uncertain bits and uncertain state are estimated on the basis of the corrected, noise-free bits within the image. The intensity difference between the center pixel x_c and each neighboring pixel x_p is encoded with a three-state fuzzy logic representation, where s is a threshold chosen as 2 for NRLBP [39]:

$$b_p = \begin{cases} 1, & x_p - x_c \ge s \\ X, & |x_p - x_c| < s \\ 0, & x_p - x_c \le -s \end{cases}$$
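The three-state coding can be sketched in Python as follows, assuming 8 neighbors read clockwise from the top-left pixel and the threshold s = 2 given above. Representing the code as a string with 'X' for an uncertain bit, and the function name `three_state_code`, are illustrative choices rather than part of the original method.

```python
from typing import List

# Assumed neighbor order: clockwise, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def three_state_code(patch: List[List[int]], s: int = 2) -> str:
    """Return the code of a 3x3 patch over {'0', '1', 'X'}.

    Neighbor p gives '1' if x_p - x_c >= s, '0' if x_p - x_c <= -s,
    and 'X' (uncertain: noise may flip it) if |x_p - x_c| < s.
    The string is written most significant bit first (neighbor 7 ... neighbor 0).
    """
    center = patch[1][1]
    bits = []
    for dy, dx in OFFSETS:
        diff = patch[1 + dy][1 + dx] - center
        if diff >= s:
            bits.append("1")
        elif diff <= -s:
            bits.append("0")
        else:
            bits.append("X")
    return "".join(reversed(bits))

patch = [[52, 60, 49],
         [50, 50, 51],
         [30, 47, 70]]
print(three_state_code(patch))  # X001XX11
```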

Noise is treated as an uncertain bit X, which may flip either from 0 to 1 or from 1 to 0. The certain and uncertain bits of the P neighbors together form a code, so the uncertain state can be represented by the following equation:

$$U = b_{P-1} b_{P-2} \cdots b_0, \qquad b_p \in \{0, 1, X\}$$

Regular visual patterns include edges, edge-ends, corners, etc., and they occur more frequently within images than irregular patterns. Because normal visual patterns follow such regular forms, it is possible to estimate the regular pattern corresponding to an uncertain state U from the regular visual patterns present within the image.

NRLBP corrects irregular patterns to regular patterns by removing noise through an error-correction mechanism. For example, the image shown in Figure 1 holds an uncertain code. To remove the noise, the uncertain bits must be predicted and then set so that uniform patterns are formed. Let Φ_u denote the set of all regular (uniform) LBP patterns; for 8 neighbors, this set contains 58 patterns. On the basis of the uncertain state U, the list of NRLBP codes consists of the uniform patterns that agree with U at every certain bit:

$$S(U) = \{C \in \Phi_u : C_p = b_p \ \text{for every } p \text{ with } b_p \neq X\}$$

After error correction, the uniform codes in S(U) are obtained and the noise is removed. Then the NRLBP histogram of a local image region is formed, where n denotes the number of elements in S(U). When n > 0, the histogram bin of each element of S(U) is incremented by 1/n; when n = 0, the irregular (nonuniform) bin is incremented by 1. The same process is repeated for each pixel in the region. In this way, NRLBP reduces the number of histogram bins from 2^P to P(P - 1) + 3 (from 256 to 59 for P = 8). The feature vector of the local image region is generated by summing the feature vectors collected from each pixel in the region.
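The error-correction and histogram steps can be sketched as follows. A pattern is treated as uniform when its circular bit string has at most two 0/1 transitions (the usual LBP definition, which yields 58 uniform codes for 8 neighbors); the ordering of the 59 bins and all function names are assumptions for illustration. Uncertain states are strings over {'0', '1', 'X'}, most significant bit first.

```python
from itertools import product
from typing import Dict, List

def is_uniform(code: int, p: int = 8) -> bool:
    """Uniform pattern: at most two 0/1 transitions around the circle."""
    transitions = sum(((code >> i) & 1) != ((code >> ((i + 1) % p)) & 1) for i in range(p))
    return transitions <= 2

UNIFORM_8 = [c for c in range(256) if is_uniform(c)]            # the 58 regular patterns
BIN_OF: Dict[int, int] = {c: i for i, c in enumerate(UNIFORM_8)}  # assumed bin ordering
NONUNIFORM_BIN = len(UNIFORM_8)                                  # index 58, the 59th bin

def candidate_codes(u: str) -> List[int]:
    """Uniform codes obtained by setting each 'X' in u to 0 or 1 (the candidate list)."""
    xs = [i for i, ch in enumerate(u) if ch == "X"]
    out = []
    for fill in product("01", repeat=len(xs)):
        chars = list(u)
        for pos, bit in zip(xs, fill):
            chars[pos] = bit
        code = int("".join(chars), 2)
        if is_uniform(code, len(u)):
            out.append(code)
    return sorted(out)

def nrlbp_histogram(states: List[str]) -> List[float]:
    """59-bin NRLBP histogram of a region, given each pixel's uncertain state."""
    hist = [0.0] * (len(UNIFORM_8) + 1)
    for u in states:
        cands = candidate_codes(u)
        if cands:                        # n > 0: split one vote evenly over the candidates
            for c in cands:
                hist[BIN_OF[c]] += 1.0 / len(cands)
        else:                            # n = 0: irregular pattern
            hist[NONUNIFORM_BIN] += 1.0
    return hist

print(len(UNIFORM_8), candidate_codes("X001XX11"))  # 58 [31, 159]
h = nrlbp_histogram(["X001XX11", "10101010", "00001111"])
print(len(h), sum(h))  # 59 3.0
```

Here the uncertain state "X001XX11" can only be completed to two uniform codes, so each receives half a vote, while "10101010" has eight transitions and falls into the nonuniform bin.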

3.2. Feature Extraction

Feature extraction includes the extraction of color, texture, and local contrast information. The color characteristic is extracted by converting RGB to the HSV color space. The HSV (hue, saturation, value) color space is quantized into 72 colors with 8, 3, and 3 bins for H, S, and V, respectively. Texture is described using multiresolution gradient orientation, which improves the gradient magnitude by removing the noise factor and detecting edges. The V value of the HSV color space is used to acquire local contrast information.

3.2.1. Color Feature

The chrominance signals of visual information hold significant discriminative information for representing images in CBIR models. Different color spaces, such as YCbCr, HSV, HSI, and Lab, are used to reflect the chrominance attribute of visual information. The RGB color model is made up of three components: red, green, and blue. In the proposed method, the HSV color space was selected to separate the hue, saturation, and intensity of image chrominance.

The HSV color model represents the chrominance attribute of an image with a cylindrical representation of hue (H), saturation (S), and value (V). The hue component describes the color as an angle ranging from 0° to 360°, beginning with red at 0°, yellow at 60°, green at 120°, cyan at 180°, blue at 240°, and magenta at 300°. The saturation component describes the chrominance saturation level between 0 and 1, where 0 denotes gray and 1 denotes a pure (primary) color. The value (V) component describes intensity between 0 and 1, where 0 reflects black and 1 reflects white. The difference between HSV and RGB is that HSV separates the intensity values from the color elements, whereas RGB mixes them in its three color channels.

The HSV color space is color invariant because its V component represents the image's brightness independently of color [24]. With 8, 3, and 3 bins, the HSV color space is quantized into 72 colors. Quantization makes the representation more robust to lighting and intensity changes and also reduces time complexity. The quantized color index is

$$cm = Q_S Q_V H + Q_V S + V = 9H + 3S + V,$$

where H, S, and V represent the quantized bins of the hue, saturation, and value components of HSV [12], and Q_S = 3 and Q_V = 3 are the numbers of saturation and value bins.

In the quantized color map cm(x, y), the quantized hue, saturation, and value satisfy H ∈ {0, ..., 7}, S ∈ {0, 1, 2}, and V ∈ {0, 1, 2}, so cm(x, y) ∈ {0, 1, ..., N_c - 1}, where N_c = 72 is the number of quantized colors and x and y are the spatial coordinates in the quantized color map.
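A minimal sketch of the 72-color HSV quantization using Python's standard `colorsys` module. Equal-width bins and combining the bins as 9H + 3S + V (the usual index for an 8 × 3 × 3 layout) are assumptions, and `quantize_hsv` and `color_map` are hypothetical helper names.

```python
import colorsys
from typing import List, Tuple

def quantize_hsv(r: int, g: int, b: int, hb: int = 8, sb: int = 3, vb: int = 3) -> int:
    """Map an 8-bit RGB pixel to one of hb*sb*vb quantized HSV colors.

    colorsys returns h, s, v in [0, 1]; each is split into equal-width bins
    (an assumption). The combined index is h_q*sb*vb + s_q*vb + v_q.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # min(...) keeps the value 1.0 inside the last bin.
    h_q = min(int(h * hb), hb - 1)
    s_q = min(int(s * sb), sb - 1)
    v_q = min(int(v * vb), vb - 1)
    return h_q * sb * vb + s_q * vb + v_q

def color_map(image: List[List[Tuple[int, int, int]]]) -> List[List[int]]:
    """Quantized color map cm(x, y) for an image given as rows of (r, g, b) tuples."""
    return [[quantize_hsv(*px) for px in row] for row in image]

print(color_map([[(255, 0, 0), (0, 0, 255)], [(0, 0, 0), (255, 255, 255)]]))
# [[8, 53], [0, 2]]
```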

3.2.2. Local Gradient Orientation (LGO)

For gradient computation, the orientation component of the Weber local descriptor (WLD) is applied to the HSV color model, because converting to gray-scale loses most of the chrominance information. The local gradient orientations extracted in this way keep a rotation-invariant attribute from the chrominance part and help obtain discriminative features. The gradient orientation θ [40] is the angle between the reference axis and the gradient vector at location x, whose components v^{10} and v^{11} are the outputs of filters f10 and f11 applied to the input image:

$$\theta(x) = \arctan\left(\frac{v^{11}}{v^{10}}\right)$$

θ is then quantized into T gradient orientations. Before quantization, θ ∈ [-π/2, π/2] is mapped to θ′ ∈ [0, 2π]; the mapping takes the signs of v^{10} and v^{11} into account:

$$\theta' = \operatorname{arctan2}(v^{11}, v^{10}) + \pi$$

The quantization function is

$$\Phi_t = f_q(\theta') = \frac{2t}{T}\pi, \qquad t = \operatorname{mod}\left(\left\lfloor \frac{\theta'}{2\pi/T} + \frac{1}{2} \right\rfloor, T\right)$$

For example, if T = 8, then Φ_t = tπ/4 for t = 0, 1, ..., 7.
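A minimal sketch of the orientation step on a single-channel image (for example, one HSV channel) in plain Python. As assumptions, v10 is taken as the difference between the lower and upper neighbors and v11 as the difference between the right and left neighbors, standing in for the filters f10 and f11; `math.atan2` plus π performs the mapping to (0, 2π], and the index uses the rounding rule t = mod(⌊θ′/(2π/T) + 1/2⌋, T). Border pixels are skipped, and `orientation_map` is a hypothetical name.

```python
import math
from typing import List

def orientation_map(img: List[List[float]], T: int = 8) -> List[List[int]]:
    """Quantized gradient orientation index t in {0, ..., T-1} for interior pixels.

    Assumed filters: v10 = bottom - top neighbor, v11 = right - left neighbor.
    Border pixels are skipped, so the output is (rows-2) x (cols-2).
    """
    rows, cols = len(img), len(img[0])
    out = []
    for y in range(1, rows - 1):
        row = []
        for x in range(1, cols - 1):
            v10 = img[y + 1][x] - img[y - 1][x]
            v11 = img[y][x + 1] - img[y][x - 1]
            theta = math.atan2(v11, v10) + math.pi          # mapped to (0, 2*pi]
            t = int(math.floor(theta / (2 * math.pi / T) + 0.5)) % T
            row.append(t)                                   # Phi_t = 2*pi*t / T
        out.append(row)
    return out

ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]   # intensity increases to the right
print(orientation_map(ramp))  # [[6]]
```

With T = 8, the index t = 6 corresponds to Φ_t = 6π/4. The modulo keeps θ′ = 2π in the same bin as 0.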

Multiscale gradient orientation describes different granular aspects of the visual representation and improves the discriminative ability of a single resolution. It is obtained from square neighborhoods of pixels with side length 2R + 1. As illustrated in Figure 3, P denotes the number of neighbors in the neighbor set, and R denotes the spatial resolution.

The multiscale gradient orientation is obtained by combining the histograms of operators at varying P and R, so that the image is analyzed at several scales (P, R). Although the operator is based on a square, symmetric neighbor set of P members on a square with side length 2R + 1, it can also be used with circular neighborhoods. In general, this improves on the discrimination of a single resolution (P, R).
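A sketch of the square neighbor set used for multiscale analysis: for spatial resolution R, the pixels on the border of the (2R + 1) × (2R + 1) square around the center. Under this construction P = 8R, so R = 1 gives the usual 8-neighborhood; taking all border pixels (rather than a subsample) and the helper name `square_neighbors` are assumptions.

```python
from typing import List, Tuple

def square_neighbors(R: int) -> List[Tuple[int, int]]:
    """Offsets (dy, dx) of the pixels on the border of a (2R+1)x(2R+1) square,
    listed clockwise starting from the top-left corner (assumed ordering)."""
    ring = []
    for dx in range(-R, R + 1):            # top edge, left to right
        ring.append((-R, dx))
    for dy in range(-R + 1, R + 1):        # right edge, top to bottom
        ring.append((dy, R))
    for dx in range(R - 1, -R - 1, -1):    # bottom edge, right to left
        ring.append((R, dx))
    for dy in range(R - 1, -R, -1):        # left edge, bottom to top
        ring.append((dy, -R))
    return ring

for R in (1, 2, 3):
    print(R, len(square_neighbors(R)))  # P = 8R: 8, 16, 24
```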

The WLD orientation is used for texture estimation and also suppresses the effect of noise: it computes the differences between a pixel and its neighbors, which reduces sensitivity to noise and illumination variation in a way analogous to smoothing in image processing. Furthermore, the sum of the p-neighbor differences is divided by the current pixel's intensity, which further reduces the influence of noise in the image.

WLD orientation also mitigates the impact of changing brightness. Because it uses the differences between the center pixel and its neighbors, a brightness change that adds a constant to every pixel does not alter those differences. In addition, WLD divides differences, so when every pixel value is multiplied by a constant, the numerator and denominator are multiplied by the same constant and the contrast change cancels out. As a result, the descriptor is not affected by changes in brightness.
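The invariance argument can be checked numerically. The sketch below uses the same assumed filters as a stand-in for f10 and f11 (differences of opposite 4-neighbors) and shows that the orientation angle is unchanged when every pixel is transformed as a·I + b with a > 0: the offset b cancels in the differences, and the gain a cancels in the ratio inside the arctangent.

```python
import math
from typing import List

def orientation(patch: List[List[float]]) -> float:
    """Mapped orientation angle of the center of a 3x3 patch (assumed v10/v11 filters)."""
    v10 = patch[2][1] - patch[0][1]   # bottom - top
    v11 = patch[1][2] - patch[1][0]   # right - left
    return math.atan2(v11, v10) + math.pi

patch = [[12, 40, 33],
         [25, 30, 61],
         [18, 57, 20]]
a, b = 1.7, 35.0   # contrast gain and brightness offset (a > 0)
changed = [[a * p + b for p in row] for row in patch]

print(math.isclose(orientation(patch), orientation(changed)))  # True
```

A negative gain would flip both differences and rotate the angle by π, which is why the argument holds only for a > 0.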

Finally, the quantized orientation map incorporating multiscale granularity is denoted O_orientation(s, t) = j, where s and t are the spatial coordinates of the orientation structure and j ∈ {0, 1, ..., T - 1} is the resulting quantized orientation value at that position.

3.2.3. Local Contrast Information

Pixel intensity is used to describe local contrast information [41]: visual information depends on intensity, and a suitable contrast range improves the visibility of visual information and increases the performance of CBIR systems. The V component of the HSV color model is used to extract the intensity feature, because it represents brightness separately from the chrominance components of the image. The local contrast information map of the input image is calculated following [42] (expression (11)).

In (11), s and t are the coordinates of the input image, whose width and height bound their ranges. The local contrast information is then quantized into N_ci levels to retain the salient information, giving the final quantized intensity map O_intensity(s, t) with values in {0, 1, ..., N_ci - 1}.
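Because the exact contrast map of [42] is not reproduced here, the sketch below only illustrates the final quantization step: uniformly quantizing the HSV value channel into N_ci levels. Uniform quantization, computing V directly as max(R, G, B)/255, and the function name `intensity_map` are assumptions.

```python
from typing import List, Tuple

def intensity_map(image: List[List[Tuple[int, int, int]]], n_ci: int = 5) -> List[List[int]]:
    """Quantize the HSV value channel V = max(R, G, B) / 255 into n_ci levels 0..n_ci-1."""
    out = []
    for row in image:
        out_row = []
        for r, g, b in row:
            v = max(r, g, b) / 255.0
            out_row.append(min(int(v * n_ci), n_ci - 1))   # V = 1.0 goes to the top level
        out.append(out_row)
    return out

print(intensity_map([[(0, 0, 0), (255, 255, 255), (0, 128, 0)]], n_ci=5))  # [[0, 4, 2]]
```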

3.3. Interlinked Microstructure Identification

Natural images are rich in color, edge, and shape attributes, which are considered low-level image attributes, and more relevant images can be retrieved from large datasets on the basis of these attributes. The human eye is sensitive to color and orientation features. Orientation reflects informative content in an image: a strong orientation gives a uniform pattern, but as most natural images do not have a strong orientation, they lack a single uniform pattern. Instead, natural images contain both uniform and nonuniform patterns that contribute spatial information and form microstructures. Orientation alone is not enough to describe an image's spatial characteristics, so color, edge, orientation, and local contrast information are fused to describe richer image content. In this paper, microstructures are built from color, orientation, and local contrast information attributes. The color feature is joined with the microstructure obtained by associating texture and local contrast information to obtain the micro-color structure. The orientation attribute is joined with the microstructure derived by linking color and intensity features to obtain the micro-orientation structure. The micro-intensity map is derived from the intensity features and the micro-map obtained by linking orientation and intensity features. Because the microstructures combine color, orientation, and intensity features, they increase retrieval performance.

The quantized orientation and quantized intensity maps are used to derive the micro-color map. The orientation map O_orientation is divided into small 3 × 3 grids [43], each 3 pixels long in the horizontal and vertical directions; the intensity map O_intensity is divided in the same way. Because orientation is scale and rotation invariant, it is used to derive the color map. As orientation is quantized into 6 levels, its values range from 0 to 5. Within each grid, the central pixel is compared with its surrounding pixels, and these similarity relationships help to find regular patterns. While a grid moves over the maps, each neighbor whose quantized orientation and intensity both equal those of the central pixel is retained; if such neighbors exist, a uniform (regular) pattern and a microstructure basic block are obtained. If no neighboring pixel has the same values as the central pixel, an irregular pattern is formed and no microstructure basic block is obtained. Five steps are followed to obtain the final microstructure of a single image:
(1) From the first starting position, a 3 × 3 grid is moved over both the quantized orientation and intensity maps from left to right and from top to bottom. The first micro-map is established on the basis of regular patterns in both maps.
(2) From the second starting position, the grid is moved in the same way, and the second micro-map is established on the basis of regular patterns in both maps.
(3) From the third starting position, the grid is moved in the same way, and the third micro-map is established on the basis of regular patterns in both maps.
(4) From the fourth starting position, the grid is moved in the same way, and the fourth micro-map is established on the basis of regular patterns in both maps.
(5) The final micro-map of the image is obtained by combining all four maps.
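The five steps above can be sketched as follows. The starting positions are not spelled out in the text, so this sketch assumes, in the style of common micro-structure descriptor (MSD) implementations, that the 3 × 3 grid moves with a step of 3 pixels and that the four passes start at offsets (0, 0), (0, 1), (1, 0), and (1, 1); combining the four maps with a logical OR is also an assumption. Function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]

def micro_map(orient: Grid, inten: Grid, oy: int, ox: int) -> Grid:
    """One pass: slide a 3x3 grid with an assumed step of 3 from offset (oy, ox).

    Inside each grid, neighbors whose quantized orientation AND intensity both
    equal the center's are marked 1 together with the center (a basic block).
    If no neighbor matches, the pattern is irregular and nothing is marked.
    """
    rows, cols = len(orient), len(orient[0])
    mask = [[0] * cols for _ in range(rows)]
    for y in range(oy, rows - 2, 3):
        for x in range(ox, cols - 2, 3):
            cy, cx = y + 1, x + 1
            matches = [(y + i, x + j) for i in range(3) for j in range(3)
                       if (i, j) != (1, 1)
                       and orient[y + i][x + j] == orient[cy][cx]
                       and inten[y + i][x + j] == inten[cy][cx]]
            if matches:                      # regular pattern -> basic block
                mask[cy][cx] = 1
                for py, px in matches:
                    mask[py][px] = 1
    return mask

def final_micro_map(orient: Grid, inten: Grid) -> Grid:
    """Combine the four passes (assumed offsets) with a logical OR."""
    offsets = ((0, 0), (0, 1), (1, 0), (1, 1))
    passes = [micro_map(orient, inten, oy, ox) for oy, ox in offsets]
    rows, cols = len(orient), len(orient[0])
    return [[int(any(m[y][x] for m in passes)) for x in range(cols)] for y in range(rows)]

orient = [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
inten = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(final_micro_map(orient, inten))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```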

4. Feature Quantization

Extracting a discriminative set of attributes for retrieving images is a difficult task. Once a discriminative set of features is obtained, the second task is representing the image's attributes, that is, deciding how to represent the extracted features. In this work, microstructures are used for feature representation. The main steps involved in forming correlated microstructures are shown in Figure 4.

4.1. Color Feature

Let the microstructure value of the input image be M_c(x, y) = a, where a ∈ {0, 1, ..., N_c - 1} and N_c reflects the dimensionality of the micro-color structure, which is obtained from the relationship between the microstructures of quantized orientation and intensity. The micro-color structure features are derived as follows:

$$H_c(a) = \frac{\bar{N}\{M_c(P_{ao}) = a \wedge M_c(P_{ai}) = a\}}{N_1\{M_c(P_{ao}) = a\}}$$

Each block of the input image is 3 × 3, where P_ao is the central pixel and P_ai (i = 1, ..., 8) are its neighboring pixels. N̄ denotes the number of co-occurring values of P_ao and P_ai, and N_1 represents the number of occurrences of a at P_ao. The micro-color structure is derived by relating the microstructure image to the quantized color map: the final micro-color structure is obtained from the quantized color map's pixel values that lie in the regular regions of the microstructure image. The micro-color structure features have N_c dimensions.
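A sketch of the co-occurrence ratio described above, applicable to the quantized color, orientation, or contrast map: within each 3 × 3 block whose central pixel lies in the micro-map, count the neighbors (also in the micro-map) that share the central value, and divide by the number of such central pixels with that value. Non-overlapping blocks (a step of 3) and restricting the counts to micro-map pixels are assumptions; `micro_feature` is a hypothetical name.

```python
from typing import List

Grid = List[List[int]]

def micro_feature(values: Grid, micro: Grid, n_bins: int) -> List[float]:
    """H(a) = (# center/neighbor pairs both equal to a) / (# centers equal to a).

    values: quantized map (color, orientation, or contrast) with entries 0..n_bins-1.
    micro:  final micro-map (1 = pixel belongs to a regular microstructure).
    Blocks are 3x3 and non-overlapping (assumed step of 3).
    """
    co = [0] * n_bins    # co-occurrence counts of the central value
    occ = [0] * n_bins   # occurrences of the central value
    rows, cols = len(values), len(values[0])
    for y in range(1, rows - 1, 3):
        for x in range(1, cols - 1, 3):
            if not micro[y][x]:
                continue
            a = values[y][x]
            occ[a] += 1
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if (dy, dx) != (0, 0) and micro[y + dy][x + dx] and values[y + dy][x + dx] == a:
                        co[a] += 1
    return [co[i] / occ[i] if occ[i] else 0.0 for i in range(n_bins)]

values = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
micro = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(micro_feature(values, micro, n_bins=5))  # [0.0, 0.0, 0.0, 8.0, 0.0]
```

Calling the same function on the color, orientation, and contrast maps gives feature vectors of N_c, N_t, and N_ci dimensions, respectively.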

4.2. Orientation Feature

Let the microstructure value of the input image be M_t(x, y) = b, where b ∈ {0, 1, ..., N_t - 1} and N_t reflects the dimensionality of the micro-orientation structure, which is obtained from the relationship between the microstructures of quantized orientation and intensity. The micro-orientation structure features are derived as follows:

$$H_t(b) = \frac{\bar{N}\{M_t(P_{bo}) = b \wedge M_t(P_{bi}) = b\}}{N_2\{M_t(P_{bo}) = b\}}$$

Each block of the input image is 3 × 3, where P_bo is the central pixel and P_bi (i = 1, ..., 8) are its neighboring pixels. N̄ denotes the number of co-occurring values of P_bo and P_bi, and N_2 represents the number of occurrences of b at P_bo. The micro-orientation structure is derived by relating the microstructure image to the quantized orientation structure: the final micro-orientation structure is obtained from the quantized orientation map's pixel values that lie in the regular regions of the microstructure image. The micro-orientation structure features have N_t dimensions.

4.3. Local Contrast Information

Let the microstructure value of the input image be M_ci(x, y) = c, where c ∈ {0, 1, ..., N_ci - 1} and N_ci reflects the dimensionality of the micro local contrast information structure, which is obtained from the relationship between the microstructures of quantized orientation and intensity. The micro local contrast information features are derived as follows:

$$H_{ci}(c) = \frac{\bar{N}\{M_{ci}(P_{co}) = c \wedge M_{ci}(P_{ci}) = c\}}{N_3\{M_{ci}(P_{co}) = c\}}$$

Each block of the input image is 3 × 3, where P_co is the central pixel and P_ci (i = 1, ..., 8) are its neighboring pixels. N̄ denotes the number of co-occurring values of P_co and P_ci, and N_3 represents the number of occurrences of c at P_co. The micro local contrast information structure is derived by relating the microstructure image to the quantized local contrast structure: the final structure is obtained from the quantized local contrast map's pixel values that lie in the regular regions of the microstructure image. The micro local contrast information structure features have N_ci dimensions.

5. Experimental Results and Findings

In this section, experiments are performed on the Corel dataset to evaluate the retrieval performance of the NRLGO method. The results are presented as follows.

5.1. Dataset

Different databases are used to evaluate image retrieval accuracy. In this work, the Corel database is used to measure the efficiency of NRLGO. It consists of Corel 1k, Corel 5k, and Corel 10k. Corel 1k contains 1000 images of different contents, such as Africans, mountains, elephants, buildings, beaches, and horses, in 10 categories of 100 images each, with a size of 256 × 384. Corel 5k is derived from Corel 10k and contains 5000 images of different contents, such as waves, trees, lions, ducks, and food, in 50 categories of 100 images each. Corel 10k is a larger dataset of 10000 images of different contents, such as cars, flowers, trains, furniture, butterflies, and tractors, in 100 categories of 100 images each, with a size of 192 × 128.

5.2. Evaluation Parameters

Precision and recall are the most important parameters for evaluating retrieval performance. The precision of a query image is the ratio of the number of similar images retrieved to the total number of images retrieved, and its recall is the ratio of the number of similar images retrieved to the total number of similar images in the database:

$$P = \frac{I_s}{I_r}, \qquad R = \frac{I_s}{I_c},$$

where I_r is the total number of images retrieved, I_s is the number of similar images retrieved, and I_c = 100 is the number of images in each category of the Corel dataset. Average precision and recall are obtained by averaging over all query images, where the Corel 1k, Corel 5k, and Corel 10k datasets contain 10, 50, and 100 categories, respectively. Average retrieval precision and recall are calculated by (18) and (20).

The F-score combines precision and recall into a single similarity measure:

$$F = \frac{2PR}{P + R}$$
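A minimal sketch of the three measures for a single query, assuming the retrieved results are given as a ranked list of category labels and that each category holds 100 images, as in the Corel subsets. Averaging these values over all queries gives the average precision and recall; `precision_recall_f` is a hypothetical name.

```python
from typing import List, Tuple

def precision_recall_f(retrieved_labels: List[int], query_label: int,
                       n_relevant: int = 100) -> Tuple[float, float, float]:
    """Precision, recall, and F-score of one query.

    retrieved_labels: category labels of the top-N retrieved images.
    n_relevant: number of images in the query's category (100 for Corel).
    """
    hits = sum(1 for lbl in retrieved_labels if lbl == query_label)
    precision = hits / len(retrieved_labels)
    recall = hits / n_relevant
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f

p, r, f = precision_recall_f([3] * 9 + [5], query_label=3)
print(p, r, round(f, 4))  # 0.9 0.09 0.1636
```

With only 10 images retrieved out of a category of 100, recall is bounded by 0.1, which is why precision and recall are usually reported together or combined in the F-score.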

5.3. Distance Metric

The Manhattan (L1) distance is used to measure the similarity between the query image and each image in the dataset:

$$D(T, Q) = \sum_{i=1}^{M} |T_i - Q_i|,$$

where T = (T_1, T_2, ..., T_M) is the feature vector extracted from a database image, Q = (Q_1, Q_2, ..., Q_M) is the feature vector extracted from the query image, and M is the dimension of the feature vector. The Manhattan distance is well suited to large image databases because it is inexpensive to compute.
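A sketch of L1 ranking, assuming each descriptor is a plain list of M floats; the top-k cut-off and the names `manhattan` and `rank` are hypothetical.

```python
from typing import Dict, List, Tuple

def manhattan(t: List[float], q: List[float]) -> float:
    """L1 distance: sum of absolute differences, with no squares or square roots."""
    if len(t) != len(q):
        raise ValueError("feature vectors must have the same dimension M")
    return sum(abs(ti - qi) for ti, qi in zip(t, q))

def rank(database: Dict[str, List[float]], query: List[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k database images closest to the query under L1."""
    scored = [(name, manhattan(feat, query)) for name, feat in database.items()]
    return sorted(scored, key=lambda item: item[1])[:k]

db = {"img_a": [0.2, 0.5, 0.3], "img_b": [0.9, 0.0, 0.1], "img_c": [0.25, 0.45, 0.3]}
print([name for name, _ in rank(db, [0.2, 0.5, 0.3], k=2)])  # ['img_a', 'img_c']
```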

5.4. Performance of NRLGO

Color, texture, and local contrast information are used to retrieve relevant images from a large dataset. In NRLGO, color, edge, and local contrast information are extracted from the color image. The effectiveness of NRLGO is evaluated using 192, 128, 108, and 72 color quantization levels; 6, 12, 18, 24, 30, and 36 orientation quantization levels; and 10, 15, and 20 local contrast quantization levels.
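
One way to obtain 72 color levels is sketched below. It assumes the split 72 = 8 × 3 × 3 (hue × saturation × value) with uniform bins; the text does not state how the levels are divided among the channels or whether the bins are uniform, so both choices are assumptions.

```python
import colorsys


def hsv_color_index(r, g, b, n_h=8, n_s=3, n_v=3):
    """Map an 8-bit RGB pixel to one of n_h * n_s * n_v HSV color bins."""
    # colorsys expects channels in [0, 1] and returns h, s, v in [0, 1]
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # uniform bins; min() keeps the value 1.0 inside the last bin
    hi = min(int(h * n_h), n_h - 1)
    si = min(int(s * n_s), n_s - 1)
    vi = min(int(v * n_v), n_v - 1)
    return (hi * n_s + si) * n_v + vi


def color_map(rgb_image, n_h=8, n_s=3, n_v=3):
    """Quantize every pixel of an image given as rows of (r, g, b) tuples."""
    return [[hsv_color_index(r, g, b, n_h, n_s, n_v) for (r, g, b) in row]
            for row in rgb_image]
```

The other level counts listed above would correspond to other `(n_h, n_s, n_v)` choices, which the text does not specify.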

Tables 1–3 show the performance on Corel 5k as the color and texture quantization levels are varied while the local contrast quantization level is fixed at 5, 10, and 15, respectively. The highest precision is 65.82% in Table 1, achieved with $N_c = 72$ and $N_t = 6$; 65.32% in Table 2, achieved with $N_c = 128$ and $N_t = 6$; and 65.58% in Table 3, achieved with $N_c = 128$ and $N_t = 24$. In NRLGO, we therefore set color $N_c = 72$, texture $N_t = 6$, and local contrast information $N_{ci} = 5$, which gives a descriptor of dimension 72 + 6 + 5 = 83. Tables 1–3 also show that, with $N_c = 128$ and $N_{ci}$ set to 5, 10, or 15, ARP decreases as $N_t$ increases from 6 to 36. This decrease is mainly due to the sensitivity of the human visual system to continuously changing texture orientations. In some cases, ARP follows a nonuniform pattern; for example, in Table 3, with $N_c = 72$ and $N_{ci} = 15$, ARP first increases as $N_t$ goes from 6 to 18, decreases at $N_t = 24$, and increases again for $N_t$ from 30 to 36. This irregular pattern arises from quantization noise, which increases within-class variability and reduces the performance of NRLGO.
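
To make the dimension count concrete, the sketch below concatenates per-map histograms into a 72 + 6 + 5 = 83-dimensional vector. It is a simplification under stated assumptions: the three inputs are already-quantized index maps of equal size, the optional `mask` stands in for the microstructure selection (whose exact rule is not reproduced here), no normalization is applied, and `nrlgo_like_descriptor` is a hypothetical name.

```python
def histogram(index_map, n_bins, mask=None):
    """Count each quantized index, optionally only where mask is True."""
    hist = [0] * n_bins
    for y, row in enumerate(index_map):
        for x, value in enumerate(row):
            if mask is None or mask[y][x]:
                hist[value] += 1
    return hist


def nrlgo_like_descriptor(color_map, orient_map, contrast_map,
                          n_c=72, n_t=6, n_ci=5, mask=None):
    """Concatenate color, orientation, and contrast histograms (n_c + n_t + n_ci bins)."""
    return (histogram(color_map, n_c, mask)
            + histogram(orient_map, n_t, mask)
            + histogram(contrast_map, n_ci, mask))
```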

Table 4 shows the ARP obtained on Corel 5k in RGB color space when $N_t$ is varied from 6 to 36 and $N_c$ takes the values 16, 32, 64, and 128. The highest precision is achieved with a color quantization of 128 and a texture level of 6 at intensity level 5. Comparison with Tables 1–3 shows that HSV color space gives higher retrieval performance. In RGB space, ARP follows an irregular pattern because the RGB channels ignore perceptual color characteristics, whereas HSV separates hue, saturation, and intensity in a way that is closer to how users perceive color. HSV color space is therefore used in NRLGO, as it increases average retrieval precision and recall.

Table 5 shows the retrieval accuracy obtained on Corel 1k, Corel 5k, and Corel 10k with different similarity measures. NRLGO uses bin-by-bin similarity measures, and the similarity between the query image and each database image is measured with Manhattan distance. With L1, NRLGO achieves a retrieval performance of 83.5% on Corel 1k, 65.82% on Corel 5k, and 53.07% on Corel 10k. L1 requires no square or square-root operations, so it performs well on large datasets, whereas L2 performs poorly when distance values are large and is more expensive to compute. Table 5 confirms that average retrieval accuracy on Corel 5k and Corel 10k is lower with Euclidean distance.

Because Euclidean distance squares each difference, large differences dominate the result, and the extra operations make it computationally more expensive. Furthermore, on Corel 1k, retrieval accuracy is lower with the square-chord distance. Although NRLGO uses color, texture, and local contrast information together, its performance is also evaluated for different combinations of these features; ARP and ARR for seven combinations are reported in Table 6.
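
For comparison with Table 5, the three measures mentioned here can be written side by side. This sketch assumes nonnegative feature vectors (for example, histogram counts), which the square-chord distance requires because it takes square roots of the feature values.

```python
import math


def l1(t, q):
    # Manhattan: absolute differences only
    return sum(abs(a - b) for a, b in zip(t, q))


def l2(t, q):
    # Euclidean: squares every difference, then takes one square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, q)))


def square_chord(t, q):
    # Square chord: compares square roots of the nonnegative feature values
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(t, q))
```

For the vectors (1, 4) and (4, 1), L1 gives 6, L2 gives √18 ≈ 4.24, and the square-chord distance gives 2.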

Using color, texture, and local contrast information together yields the best average retrieval precision (ARP) and average retrieval recall (ARR), whereas performance is poor when color, texture, or local contrast information is used alone. Local contrast information is an essential feature because it distinguishes images by differences in illumination, yet it has received little emphasis in existing image retrieval methods. Standard images are rich in color and intensity features, so contrast information is useful for distinguishing images with the same contrast from images with different contrast. NRLGO gives better retrieval performance than other descriptors for all combinations of color, texture, and local contrast information on Corel 1k, Corel 5k, and Corel 10k. On Corel 1k, the retrieval accuracy of NRLGO is compared with the MTH [44], MSD [43], CMSD [2], CDH [45], SED [46], and ENN [47] descriptors, and Table 7 reports the category-wise retrieval accuracy of NRLGO against these state-of-the-art techniques. Texture is the dominant feature in beach and mountain images, and the beach category shows considerable textural variation, to which human perception is particularly sensitive.

Table 7 shows that the dinosaur category reaches 100% retrieval precision. The category-wise evaluation shows that ARP is lowest for the beach category (50.75%) in comparison with MTH, MSD, SED, ENN, CDH, LeNET-F6, and CMSD. For beach queries, many mountain images are retrieved, which lowers the ARP of both the mountain and beach categories. In Table 8, the number of retrieved images is varied over 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 (at most 100 images per category), and R denotes the number of relevant images retrieved. The highest precision is achieved for the top 10 retrieved images, and the lowest for 100 retrieved images. To exploit the multiresolution property, which increases retrieval accuracy, we varied P and R as shown in Table 9; retrieval accuracy is highest for P = 24 and R = 3.0. Similarly, among the threshold values tested, the highest precision and recall are obtained with a threshold of 2. NRLGO extracts noise-robust features that remain invariant to variations in scale, illumination, and orientation. To evaluate texture recognition under noise, Gaussian noise of 5%, 10%, and 15% is added to a dinosaur color image from the Corel dataset.
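
The Table 8 protocol, in which precision and recall are read off at 10, 20, …, 100 retrieved images, can be sketched as below; the resulting (recall, precision) pairs are also the points of a precision-recall curve. The sketch assumes the ranked list is given as relevance flags (True when the retrieved image belongs to the query's category) and that each category holds 100 images.

```python
def pr_at_cutoffs(relevance, cutoffs=range(10, 101, 10), total_relevant=100):
    """Return (k, precision %, recall %) for each cutoff k of a ranked list."""
    rows = []
    for k in cutoffs:
        hits = sum(1 for flag in relevance[:k] if flag)  # relevant images in top k
        rows.append((k, 100.0 * hits / k, 100.0 * hits / total_relevant))
    return rows


# Ten relevant images at the top, then none: precision falls as k grows
# while recall stays at 10%.
ranked = [True] * 10 + [False] * 90
print(pr_at_cutoffs(ranked)[:2])  # [(10, 100.0, 10.0), (20, 50.0, 10.0)]
```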

In Table 9, the multiresolution property is analyzed using different values of P and R. The highest accuracy is observed for P = 24 and R = 3.
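
Assuming P and R follow the usual LBP convention (P neighbors sampled on a circle of radius R around each pixel), the sampling step can be sketched with bilinear interpolation as below. The sketch assumes a gray-scale image stored as a list of rows and a center at least R pixels from every border; how NRLGO combines several (P, R) settings is not modeled.

```python
import math


def bilinear(img, y, x):
    """Bilinearly interpolate img (list of rows) at real-valued (y, x)."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, len(img) - 1), min(x0 + 1, len(img[0]) - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def circular_neighbors(img, yc, xc, P=24, R=3.0):
    """Sample P gray values on a circle of radius R around pixel (yc, xc)."""
    h, w = len(img), len(img[0])
    if yc - R < 0 or yc + R > h - 1 or xc - R < 0 or xc + R > w - 1:
        raise ValueError("center must lie at least R pixels inside the image")
    values = []
    for p in range(P):
        angle = 2.0 * math.pi * p / P
        # image rows grow downwards, hence the minus sign on the y offset
        values.append(bilinear(img, yc - R * math.sin(angle), xc + R * math.cos(angle)))
    return values
```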

In Table 10, different values of the threshold s are used. The highest precision and recall are observed for s = 2.
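
The role of the threshold s can be illustrated with a sketch of the uncertain-state idea behind NRLBP: a neighbor whose difference from the center lies within ±s is treated as uncertain, and the uncertain bits are filled in so that the resulting code is uniform (at most two circular 0/1 transitions). The strict versus non-strict comparison at the boundary, how several candidate codes are weighted in the histogram, and how pixels with no uniform completion are handled are all assumptions here. The brute-force enumeration is meant for small P such as 8; with P = 24 and many uncertain bits it becomes expensive.

```python
from itertools import product


def ternary_states(center, neighbors, s=2):
    """1 if clearly brighter, 0 if clearly darker, None if |g_p - g_c| <= s."""
    states = []
    for g in neighbors:
        d = g - center
        if d > s:
            states.append(1)
        elif d < -s:
            states.append(0)
        else:
            states.append(None)  # uncertain state
    return states


def transitions(bits):
    """Number of circular 0/1 transitions in a bit pattern."""
    return sum(bits[i] != bits[(i + 1) % len(bits)] for i in range(len(bits)))


def uniform_candidates(states):
    """All uniform LBP codes consistent with the certain bits.

    An empty list means no uniform completion exists (such pixels might go
    to a single nonuniform bin; that handling is an assumption).
    """
    unknown = [i for i, b in enumerate(states) if b is None]
    codes = []
    for assignment in product((0, 1), repeat=len(unknown)):
        bits = list(states)
        for i, v in zip(unknown, assignment):
            bits[i] = v
        if transitions(bits) <= 2:
            codes.append(sum(b << i for i, b in enumerate(bits)))
    return codes
```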

NRLGO first converts the noisy color images to gray-scale images and then extracts the edges, which it does reliably even in the presence of noise, as shown for different noise levels in Figure 5. Accordingly, NRLGO achieves higher accuracy than state-of-the-art descriptors on Corel 1k.
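
The noise experiment and the gray-scale conversion can be sketched as follows. Two assumptions are made that the text does not specify: the noise percentage is read as the standard deviation relative to the 8-bit range (so 5% means σ = 0.05 × 255), and the gray-scale conversion uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). Noisy values are clipped to [0, 255].

```python
import random


def add_gaussian_noise(rgb_image, percent, seed=0):
    """Add zero-mean Gaussian noise to every channel; sigma = percent% of 255 (assumption)."""
    rng = random.Random(seed)  # seeded so experiments are reproducible
    sigma = percent / 100.0 * 255.0

    def clip(v):
        return min(255.0, max(0.0, v))

    return [[tuple(clip(c + rng.gauss(0.0, sigma)) for c in pixel) for pixel in row]
            for row in rgb_image]


def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to gray levels with BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]
```

A noisy gray-scale input for the edge experiment would then be `to_gray(add_gaussian_noise(image, 10))`.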

6. Discussion

We proposed an image retrieval approach based on Noise Resilient Local Gradient Orientation (NRLGO) that reduces the effect of noise. Experiments on the Corel dataset, namely Corel 1k, Corel 5k, and Corel 10k, show that NRLGO achieves higher retrieval accuracy than state-of-the-art image retrieval techniques based on feature extraction.

6.1. Comparison with Other Techniques

Natural images contain both regular and irregular textural patterns. Textural descriptors such as GLCM [25], Gabor features [28], SQ-Spatiogram [49], CMSD [2], and GMM-mSpatiogram [49] have been proposed to describe these patterns. These descriptors perform well on regular patterns but poorly on irregular ones.

Table 11 compares the retrieval accuracy of the proposed NRLGO descriptor with uniform codes against state-of-the-art methods, namely, GLCM [50], SQ-Spatiogram [49], GMM-mSpatiogram [49], SED [25], CMSD [2], and CPV-THF [42]. On Corel 5k, the experimental results show increases in retrieval accuracy of 27.8%, 24.13%, 14%, 9.5%, 2.66%, 1.9%, and 5.5% compared with GLCM, SQ-Spatiogram [49], GMM-mSpatiogram [49], SED, CMSD, and CPV-THF. On Corel 10k, increases in retrieval accuracy of 21.17%, 15.43%, 5.82%, 6.67%, 2.82%, and 79% are observed for GLCM, SED, CMSD, CPV-THF, and LeNET-F6 [48]. Table 12 compares the retrieval performance of the proposed descriptor with state-of-the-art texture, color, and shape descriptors on Corel 5k and Corel 10k. On Corel 5k, the retrieval accuracy of NRLGO is 29.6%, 26.4%, 29.98%, 17.36%, 27.32%, 5.52%, 2.82%, and 12.06% higher than that of Gabor [51], EHD [8], color moment [18], LBP [52], STH [53], and MTSD [54]. On Corel 10k, it is 23.97%, 20.77%, 26.69%, 18.06%, 5.04%, and 1.12% higher than that of Gabor, EHD, color moment, LBP, STH, and MTSD, respectively.

The increases in retrieval precision and recall show that NRLGO outperforms shape-, orientation-, and color-based techniques. Figures 6–8 compare NRLGO with state-of-the-art techniques using precision-recall curves, with recall (%) on the x-axis and precision (%) on the y-axis. Because precision and recall are plotted against each other, a descriptor with high average retrieval precision yields a curve that lies far from the origin, whereas a descriptor with low ARP yields a curve that lies close to it. We also compared NRLGO in terms of F-score (%) with GA [55], GMM-mSpatiogram [49], ODBTC [56], Tetrolet Transform [57], and BiCBIR [58], as shown in Table 13; NRLGO increases the F-score by 0.087%, 0.062%, 0.053%, 0.03%, and 0.015%, respectively.

When a descriptor performs inconsistently, its curve shows several turning points; when two descriptors perform equally on an image category, their curves overlap at some point. As shown in Figures 6–8, NRLGO outperforms the other state-of-the-art descriptors on Corel 1k, Corel 5k, and Corel 10k: its ARP is high, so its precision-recall curve lies far from the origin, and its curve has few turning points across the image categories of each dataset. MTH, SED, and STH are textural descriptors for image retrieval that correlate texture with color but do not include local contrast information; because this correlation is derived from the intensity feature, their performance is poor. GLCM, EHD, and Gabor features are textural descriptors that perform poorly on natural images because texture alone gives only a limited characterization of an image. The Gabor filter is nonetheless widely used for texture classification because of the strong relation between its responses and texture. EOAC does not consider the textural characteristics of an image. MSD correlates only color and orientation and does not take local contrast information into account. The color moment (CM) approach considers only the spatial data of pixels near the image boundary. LBP captures only the textural characteristics of an image.

We also built a GUI application in which the user selects a query image from the African, dinosaur, bus, or flower category, and the application retrieves the top 12 images. The performance of NRLGO is observed with respect to image features such as color, texture, and shape (uniform and nonuniform).

Figure 9 shows the average F-score on Corel 1k: NRLGO achieves the highest F-score, 0.322, among the compared state-of-the-art descriptors. Figure 10 shows that the highest precision, 0.835, is obtained for the top 10 retrieved images and the lowest, 0.60, for 100 retrieved images, so retrieval performance decreases as the number of retrieved images increases from 10 to 100.

Figures 11–14 show the top 12 images retrieved for query images selected from the African, dinosaur, bus, and flower categories.

In NRLGO, NRLBP addresses the problem of small pixel differences. It protects local structures from noise by introducing an uncertain state for such differences, and it determines the value of each uncertain bit from the corrected bits of the LBP code. Uniform codes describe local image structure, whereas nonuniform codes describe noise patterns; the uncertain bits are therefore resolved with an error-correction method that recovers uniform codes from nonuniform patterns. The color feature is obtained in HSV color space by quantizing the input image into 72 levels. Texture is classified using local gradient orientation, and multiscale gradient orientation is used to extract features at multiple granularities, which reduces the effect of noise and illumination variations. The V component of HSV is used to extract the intensity attribute. Similarity measurement and indexing are performed with Manhattan distance. Color, texture, and local contrast micro-maps represent the extracted features; they retain the important content within uniform regions and suppress irrelevant content.
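
Local gradient orientation with $N_t = 6$ bins can be sketched as below. The gradient operator (a central difference at spacing `d`, with border pixels clamped), the use of signed orientation over 0° to 360° so that each bin spans 60°, and applying it to a single gray or V channel are all assumptions; the text states only that gradient orientation is quantized and computed at multiple scales. Varying `d` is one simple way to obtain several scales.

```python
import math


def orientation_map(gray, n_t=6, d=1):
    """Quantize the gradient orientation of every pixel into n_t bins.

    gray is a list of rows; rows grow downwards, so a positive gy means
    intensity increases towards the bottom of the image. Flat regions
    (gx = gy = 0) fall into bin 0 because atan2(0, 0) is 0.
    """
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # central differences at spacing d, clamping indices at the border
            gx = gray[y][min(x + d, w - 1)] - gray[y][max(x - d, 0)]
            gy = gray[min(y + d, h - 1)][x] - gray[max(y - d, 0)][x]
            theta = math.degrees(math.atan2(gy, gx)) % 360.0
            row.append(min(int(theta / (360.0 / n_t)), n_t - 1))
        out.append(row)
    return out


# Orientation maps at several scales, e.g. d = 1, 2, 3 (fusion rule not modeled):
# maps = [orientation_map(gray, n_t=6, d=d) for d in (1, 2, 3)]
```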

7. Conclusion

This paper proposed a feature descriptor, Noise Resilient Local Gradient Orientation (NRLGO), to improve image retrieval performance. NRLGO relies on noise removal together with color, texture, and local contrast information. In color images, small pixel differences cause LBP codes to change abruptly under noise, resulting in poor feature extraction; NRLBP protects the local structure from noise by first identifying the uncertain state. The HSV color space is quantized into 72 levels to obtain the color attribute. Texture is classified with the LGO approach at 6 quantization levels, which further quantizes orientations into dominant orientations. The V component of HSV is quantized into 10 levels to obtain the local contrast information. Incorporating multiresolution gradient orientation gives better texture detection and strengthens the relationship between color, texture, and local contrast information. Manhattan distance is used to measure the similarity between the query image and the database images. For image representation, correlation is established between color, orientation, and local contrast information, and microstructures are built from this correlation to capture finer details. Multiresolution orientation increases the retrieval performance of NRLGO, and NRLGO extracts noise-robust features that remain invariant to variations in scale, illumination, and orientation. Experimental results on Corel 1k, Corel 5k, and Corel 10k show that NRLGO improves retrieval performance over state-of-the-art texture, local contrast, and color descriptors.

Data Availability

The Corel 5k dataset used in this study is publicly available at corel5k.20091111.tar.bz2 and https://sites.google.com/site/dctresearch/Home/content-based-image-retrieval.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.

Acknowledgments

This work was supported by Education and Research Promotion Program of KOREATECH (2021).