Abstract

Content-based image retrieval has recently become an important research topic and has been widely used for managing images in repositories. In this article, we propose an efficient technique, called MNGS, which integrates multiview constrained nonnegative matrix factorization (NMF) and Gaussian mixture model- (GMM-) based spectral clustering for image retrieval. In the proposed methodology, the multiview NMF scheme provides competitive sparse representations of the underlying images through decomposition of a similarity-preserving matrix that is formed by fusing multiple features from different visual aspects. In particular, the proposed method merges manifold constraints into the standard NMF objective function to impose an orthogonality constraint on the basis matrix and to satisfy the structure preservation requirement of the coefficient matrix. To cluster the sparse representations, this paper develops a GMM-based spectral clustering method in which the Gaussian components are regrouped in spectral space, which significantly improves the retrieval effectiveness. In this way, image retrieval over the whole database reduces to a nearest-neighbour search in the cluster containing the query image. In addition, this study provides a proof of convergence of the objective function and an analysis of the computational complexity. Experimental results on three standard image datasets reveal the advantages that can be achieved with the proposed retrieval scheme.

1. Introduction

With the increasing abundance of digital images available from a variety of sources, content-based image retrieval (CBIR) from huge databases has attracted a lot of attention in the past decade [1-3]. An effective CBIR system should search images by computing the similarity of the extracted features (views) between the user-defined query pattern and the images in large-scale collections. Existing visual features include, but are not limited to, intensity, shape, colour, scene, texture, and local invariant features. Early CBIR systems calculated the similarity with only one feature [4, 5], which would normally lead to undesirable retrieval results due to insufficient representation. That is, it is quite difficult to effectively distinguish all types of images using a single feature. Generally, to achieve proper results in the CBIR framework, appropriate features that can well capture the meaningful contents of the underlying images are usually integrated [6-8]. However, these visual features are often high dimensional and nonsparse, and direct manipulation of the feature descriptors is the most time-consuming operation. To address this limitation, dimensionality reduction is widely used. Some popular dimensionality reduction techniques include linear discriminant analysis (LDA) [9], principal component analysis (PCA) [10], independent component analysis (ICA) [11], and singular value decomposition (SVD) [12].

Nonnegative matrix factorization (NMF), as a novel tool for source separation [13-15], can be an alternative way to reduce the dimensionality. It decomposes a nonnegative matrix into two small nonnegative matrices: a basis matrix and a coefficient matrix. The basic characteristic of NMF is that all elements are nonnegative, which distinguishes it from other conventional dimensionality reduction techniques. The coefficient matrix models the features of images as an additive combination of a set of basis vectors. However, as is well known, the original NMF does not always yield a structure constraint for sparse representations of features during decomposition [16]. In recent years, various studies have extended the standard NMF by enforcing a structure preservation constraint on the objective function [17, 18]. Among them, a graph-embedding objective function of NMF encodes the graph information of the images into the sparse representation [19]. Liu et al. [20] introduced constrained nonnegative matrix factorization (CNMF), in which the label information, considered as an additional hard constraint of semisupervised retrieval, is directly incorporated into the original NMF. Another popular NMF algorithm, topographic NMF (TNMF), was proposed by Xiao et al. [21], in which a topographic constraint is imposed on the objective function to pool together structure-correlated features. The regularization strategies proposed in all above-mentioned NMF-based techniques pertain to the coefficient matrix of the NMF decomposition. In fact, studies on constraints for the basis matrix are still limited.

More recently, some works based on statistical model frameworks have been reported for image retrieval [22, 23]. For example, Zeng et al. [24] introduced an image description algorithm that characterizes a colour image by combining the spatiogram with Gaussian mixture model- (GMM-) based colour quantization. Similarly, another colour image indexing method based on a spatiochromatic multichannel GMM was introduced by Piatek and Smolka [25]. Marakakis et al. [26] proposed a relevance feedback method for CBIR that uses GMMs as image representations and employs the Kullback-Leibler (KL) divergence. The retrieval capability of these approaches mainly relies on the fact that mixture model-based techniques provide powerful methodologies for data clustering [27, 28]. This type of technique has the capability to model uncertainty in a statistical manner. Specifically, a GMM fits different shapes of observed data using multivariable Gaussian distributions. A special virtue of the GMM is that it requires the estimation of only a small number of parameters.

Motivated by the aforementioned discussions, in this paper, we propose a novel technique combining multiview constrained NMF and GMM-based spectral clustering (MNGS) for image retrieval. It is worth highlighting the following attractive characteristics of the proposed MNGS. First, multiple features are extracted from the underlying images, and MNGS integrates these features to obtain a similarity-preserving matrix. Second, we incorporate two constraint terms into the original NMF objective function to represent latent feature information in a low-dimensional space. The first constraint term helps keep the basis matrix as orthogonal as possible to reduce redundancy; this constraint therefore tends to produce competitive sparse representations of the visual features. The remaining constraint allows us to consider the latent graph information of the images and satisfy the structure preservation requirement. More importantly, this study provides a detailed proof of convergence of the objective function to ensure that the algorithm converges to a local minimum during decomposition. Third, a multivariable GMM is embedded into the proposed MNGS to model the distribution of the sparse features given by the coefficient matrix of NMF. Consequently, images whose sparse features belong to the same Gaussian component are similar, so these images can be labelled with the same subcluster. Considering the complexity of images in repositories, it becomes natural to assign more components of the GMM to label images. In general, the larger the number of components, the more accurate the indexing results using the GMM; however, the computational cost grows owing to the learning of the GMM parameters. To match the optimal number of components, inspired by the work of [29], spectral clustering based on the KL divergence is finally utilized to merge the GMM components and achieve the desired retrieval results. Specifically, by eigendecomposition of the similarity matrix measured by the KL divergence, similar GMM components can be grouped into several spectral components in a lower-dimensional spectral space. Thus, each sparse feature can be labelled by both a GMM component and a spectral component, which leads to more accurate clustering results. With the label information of the query image obtained from this statistical model, similar-image retrieval can be effectively performed on the clustering results.

The rest of this paper is organized as follows. Section 2 introduces the related work, including the classical NMF and GMM. In Section 3, we describe the details of the proposed framework, followed by the proof of the convergence of this approach and the complexity analysis in Section 4. Retrieval experiments conducted on real-world image datasets are discussed in Section 5. Section 6 reports the concluding remarks and some suggestions for further research.

2. Preliminaries

This section briefly reviews the classical NMF and GMM. The former is reasonably attractive owing to its intrinsic advantage of providing a low-dimensional description for nonnegative data. The latter shows its accuracy and effectiveness in most clustering tasks.

2.1. Nonnegative Matrix Factorization

The NMF is commonly used to decompose a matrix into two nonnegative matrices under the condition that all its elements are nonnegative. Mathematically, given a nonnegative data matrix $X \in \mathbb{R}_{+}^{m \times n}$, NMF aims at finding two nonnegative matrices $W \in \mathbb{R}_{+}^{m \times r}$ and $H \in \mathbb{R}_{+}^{r \times n}$ such that the original data matrix can be well approximated by

$X \approx WH,$ (1)

where $r$ is a new reduced dimension (inner dimension) satisfying $r \ll \min(m, n)$. In this way, each column of $H$ can be regarded as a sparse representation of the associated column vector in $X$. There are many criteria to solve the factoring problem and evaluate the quality of the decomposition. Generally, the Euclidean distance is utilized to construct the objective function; for example,

$O = \|X - WH\|_F^2,$ (2)

where $\|\cdot\|_F$ denotes the Frobenius norm. This objective function is nonconvex, but it is nonincreasing under suitable multiplicative updates [16]. To minimize the above objective function, Lee and Seung [30] derived the multiplicative updates of the basis matrix $W$ and coefficient matrix $H$, respectively, as follows:

$W_{ik} \leftarrow W_{ik} \frac{(XH^T)_{ik}}{(WHH^T)_{ik}}, \quad H_{kj} \leftarrow H_{kj} \frac{(W^TX)_{kj}}{(W^TWH)_{kj}}.$ (3)

The above multiplicative updates ensure convergence to a local optimum. Thus, the iteration stops when the objective function converges or the maximum number of iterations is reached.
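To make the updates in (3) concrete, the following minimal NumPy sketch (an illustration, not a reference implementation; a small constant `eps` is added to the denominators to avoid division by zero, an implementation detail not discussed above) implements classical NMF:

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-9):
    """Classical NMF via Lee-Seung multiplicative updates, cf. (1)-(3).

    X : (m, n) nonnegative data matrix
    r : inner dimension, r << min(m, n)
    Returns W (m, r) and H (r, n) with X ~= W @ H.
    """
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # H update: H <- H * (W^T X) / (W^T W H)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # W update: W <- W * (X H^T) / (W H H^T)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```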

2.2. Gaussian Mixture Model

The classical Gaussian mixture model assumes that each observed data point $x_i$ is dependent on a label $z_i \in \{1, 2, \ldots, K\}$. The density function of the GMM at each observation $x_i$ can be described by

$p(x_i \mid \Theta) = \sum_{k=1}^{K} \pi_k \Phi(x_i \mid \mu_k, \Sigma_k),$ (4)

where $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$, and the prior distribution $\pi_k$ indicates the probability of each component belonging to the GMM, which satisfies the constraints

$0 \le \pi_k \le 1, \quad \sum_{k=1}^{K} \pi_k = 1.$ (5)

Expression (4) can be regarded as a linear combination of several Gaussian components. Each component is a Gaussian distribution that has its own covariance $\Sigma_k$ and mean $\mu_k$, defined as

$\Phi(x_i \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k)\right).$ (6)

To optimize the parameter set $\Theta$, the log-likelihood function of (4) must be maximized by the expectation maximization (EM) algorithm [31], which is expressed as

$L(\Theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \Phi(x_i \mid \mu_k, \Sigma_k).$ (7)
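As a small illustration of (4)-(7), the following sketch evaluates the GMM log-likelihood for a given parameter set using SciPy's multivariate normal density (the EM updates themselves appear in Section 3.3):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, covs):
    """Log-likelihood of data X under a GMM, cf. (4) and (7).

    X    : (n, d) observations
    pis  : (K,) mixing weights, nonnegative and summing to 1
    mus  : (K, d) component means
    covs : (K, d, d) component covariances
    """
    n, _ = X.shape
    K = len(pis)
    dens = np.zeros((n, K))
    for k in range(K):
        # weighted density of the k-th Gaussian component at every sample
        dens[:, k] = pis[k] * multivariate_normal.pdf(X, mus[k], covs[k])
    return np.sum(np.log(dens.sum(axis=1)))
```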

3. The Proposed Method

In this section, we introduce an image retrieval approach, called MNGS, which consists of four major parts: feature extraction, multiview constrained NMF, GMM-based spectral clustering, and similarity ranking. Figure 1 shows the overall framework of the proposed MNGS method.

3.1. Feature Extraction

Images may be represented in terms of different features, each of which implies different information, such as orientation, texture, intensity, scene, colour, and spatial relation. To construct the image feature space, this paper utilizes several visual descriptors for the detection of objects.

Scene characteristics play an important role in the description of an object. To obtain the scene information, in this study, the energy spectrum of the grey-scaled image is first filtered by 32 Gabor filters in 4 frequency bands with 8 orientations. Each filtered spectrum image is then decomposed into 4 x 4 grid subregions, leading to a 512-dimensional Gist feature [32].

Orientation information is a common feature of each object. Here, the grey-scaled image with Gamma correction is divided into blocks, each of which consists of several cells. By calculating an 8-bin histogram of gradient orientations for each cell, a 1152-dimensional HOG feature [33] is obtained.

The texture feature is one of the most important characteristics for an image retrieval strategy, because each object has its own texture in nature. According to [34], LBP is highly suitable for characterizing the texture of an object; therefore, this paper adopts a 512-dimensional LBP feature. More specifically, LBP redefines the grey value of each pixel by thresholding its neighbourhood, and the histogram of this new grey-scaled image is taken as the LBP feature.

Generally, features extracted from a natural image include colour information. Therefore, it is very important for an image retrieval system to select a colour descriptor that is robust to variations in illumination, colour, and hue. This paper computes a 64-bin histogram for each RGB channel and concatenates them, leading to a 192-dimensional ColorHist feature [35].

Thus, each image can be simultaneously described by four visual features: Gist, HOG, LBP, and ColorHist.
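The sketch below illustrates how such a four-view description could be computed with scikit-image; the filter frequencies, block sizes, and histogram settings are illustrative assumptions and do not reproduce the exact dimensionalities quoted above:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog, local_binary_pattern
from skimage.filters import gabor

def gist_like(gray, n_freq=4, n_orient=8, grid=4):
    """Gist-style descriptor: 4 x 8 = 32 Gabor energy maps, each averaged
    over a 4 x 4 grid of subregions -> 32 * 16 = 512 dimensions."""
    feats = []
    for f in range(n_freq):
        for o in range(n_orient):
            # filter frequencies here are illustrative, not the paper's values
            real, imag = gabor(gray, frequency=0.1 * (f + 1),
                               theta=np.pi * o / n_orient)
            energy = np.hypot(real, imag)
            h, w = energy.shape
            for i in range(grid):
                for j in range(grid):
                    feats.append(energy[i * h // grid:(i + 1) * h // grid,
                                        j * w // grid:(j + 1) * w // grid].mean())
    return np.asarray(feats)

def extract_views(rgb):
    """Return the four views of one image; descriptor settings are assumed."""
    gray = rgb2gray(rgb)
    hog_feat = hog(gray, orientations=8, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))
    # uniform LBP with P=8 has only 10 distinct codes; the paper's
    # 512-dimensional LBP uses a different configuration
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist = np.histogram(lbp, bins=10, range=(0, 10))[0]
    color_hist = np.concatenate([np.histogram(rgb[..., c], bins=64,
                                              range=(0, 256))[0]
                                 for c in range(3)])
    return gist_like(gray), hog_feat, lbp_hist, color_hist
```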

3.2. Multiview Constrained NMF

This paper seeks a novel constrained objective function that converges to a local minimum. We thus first consider incorporating a structure constraint into the conventional NMF framework for similarity preservation.

Let $x_i^{(v)}$, $i = 1, \ldots, n$, be the $d_v$-dimensional feature vectors of the $n$ given images in the $v$th view. We then adopt a Gaussian-like heat kernel weighting as follows:

$X_{ij}^{(v)} = \exp\left(-\frac{\|x_i^{(v)} - x_j^{(v)}\|^2}{\sigma_v}\right),$ (8)

where the scalable parameter $\sigma_v$ stands for the median of all paired distances measured by the Frobenius norm. We can then construct the following feature matrix to fuse the different views:

$X = \left[(X^{(1)})^T, (X^{(2)})^T, \ldots, (X^{(V)})^T\right]^T,$ (9)

where $V$ is the view number; in the current study, $V = 4$. In classical NMF, the coefficient matrix cannot reflect the latent similar contents shared by different views, which is meaningful to the image retrieval system. To solve this problem, a symmetric similarity measurement matrix $S$ is introduced to describe the closeness of the considered images, which is expressed as

$S = \frac{1}{V} \sum_{v=1}^{V} X^{(v)}.$ (10)

With the similarity matrix defined above, we can enforce the structure constraint in the matrix factorization process by introducing the following regularization:

$R_1 = \frac{1}{2} \sum_{i,j=1}^{n} \|h_i - h_j\|^2 S_{ij} = \operatorname{tr}(HLH^T),$ (11)

where $h_i$ denotes the $i$th column of $H$, $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, $L = D - S$ is a Laplacian matrix [36], and $D$ represents a diagonal matrix whose element $D_{ii}$ corresponds to $\sum_{j} S_{ij}$.
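A small sketch of this construction, assuming the median-based bandwidth of (8) and reading (10) as an average of the per-view kernels, might look as follows:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def heat_kernel(F):
    """Gaussian-like heat kernel for one view, cf. (8).
    F : (n, d_v) feature vectors of the n images in view v."""
    d2 = squareform(pdist(F, "sqeuclidean"))
    # bandwidth = median of all pairwise squared distances
    sigma = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-d2 / sigma)

def fuse_views(views):
    """Stack the per-view kernels into the fused matrix X (Vn x n), cf. (9),
    average them into the similarity matrix S (n x n), cf. (10), and build
    the graph Laplacian L = D - S used in (11)."""
    kernels = [heat_kernel(F) for F in views]
    X = np.vstack(kernels)
    S = np.mean(kernels, axis=0)
    L = np.diag(S.sum(axis=1)) - S
    return X, S, L
```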

In addition, considering the redundancy of the basis matrix, the current paper imposes an orthogonality constraint on the basis vectors and expects this orthogonal regularization to sharply reduce the redundancy and make the basis matrix near-orthogonal. Here, the regularization for the basis matrix is defined as

$R_2 = \|W^TW - I\|_F^2.$ (12)

Based on the discussions above, by introducing the structure and orthogonality constraints into the objective function of the classical NMF framework, we obtain the proposed objective function

$O = \|X - WH\|_F^2 + \lambda \operatorname{tr}(HLH^T) + \mu \|W^TW - I\|_F^2,$ (13)

where $\lambda$ and $\mu$ are two nonnegative parameters balancing the factorization error and the regularized constraints.

Because the objective function of the proposed regularized NMF is not convex, it is unrealistic to search for a global solution with respect to both $W$ and $H$. To solve the optimization problem, a novel multiplicative update scheme for $W$ and $H$ is instead introduced in this paper to reach a local minimum.

Considering the nonnegativity of the two variables $W$ and $H$, the Lagrange multipliers $\Psi = [\psi_{ik}]$ and $\Phi = [\phi_{kj}]$ are utilized to optimize the objective function (13). This yields the corresponding Lagrange function

$\mathcal{L} = \|X - WH\|_F^2 + \lambda \operatorname{tr}(HLH^T) + \mu \|W^TW - I\|_F^2 + \operatorname{tr}(\Psi W^T) + \operatorname{tr}(\Phi H^T).$ (14)

To find $W$ and $H$ that minimize $\mathcal{L}$, we take the partial derivatives of $\mathcal{L}$ with respect to $W$ and $H$:

$\frac{\partial \mathcal{L}}{\partial W} = -2XH^T + 2WHH^T + 4\mu (WW^TW - W) + \Psi, \quad \frac{\partial \mathcal{L}}{\partial H} = -2W^TX + 2W^TWH + 2\lambda HL + \Phi.$ (15)

Applying the Karush-Kuhn-Tucker conditions [37] $\psi_{ik} W_{ik} = 0$ and $\phi_{kj} H_{kj} = 0$ leads to the following equations for $W$ and $H$:

$(-XH^T + WHH^T + 2\mu WW^TW - 2\mu W)_{ik} W_{ik} = 0,$ (16)
$(-W^TX + W^TWH + \lambda HL)_{kj} H_{kj} = 0.$ (17)

Based on these conditions, it is easy to derive the ultimate update rules as follows:

$W_{ik} \leftarrow W_{ik} \frac{(XH^T + 2\mu W)_{ik}}{(WHH^T + 2\mu WW^TW)_{ik}},$ (18)
$H_{kj} \leftarrow H_{kj} \frac{(W^TX + \lambda HS)_{kj}}{(W^TWH + \lambda HD)_{kj}}.$ (19)

Regarding the convergence of these update rules, we have the following theorem.
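A sketch of the resulting iteration, implementing the multiplicative forms of (18) and (19) as derived above together with the per-iteration column normalization of $W$ required by Algorithm 1, is given below; it is an illustration under these assumptions, not the authors' reference implementation:

```python
import numpy as np

def constrained_nmf(X, S, r, lam, mu, n_iter=300, eps=1e-9):
    """Multiview constrained NMF: a sketch of updates (18)-(19).

    X   : (m, n) fused nonnegative feature matrix, cf. (9)
    S   : (n, n) similarity matrix, cf. (10); L = D - S is its Laplacian
    lam : structure-preservation weight (lambda in (13))
    mu  : orthogonality weight (mu in (13))
    """
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r))
    H = rng.random((r, n))
    D = np.diag(S.sum(axis=1))
    for _ in range(n_iter):
        # basis update (18) with the orthogonality term ...
        W *= (X @ H.T + 2 * mu * W) / (W @ H @ H.T + 2 * mu * W @ (W.T @ W) + eps)
        # ... followed by the column normalization of step (3) in Algorithm 1
        W /= np.linalg.norm(W, axis=0, keepdims=True) + eps
        # coefficient update (19) with the Laplacian split as L = D - S
        H *= (W.T @ X + lam * H @ S) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H
```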

Theorem 1. The objective function in (13) is nonincreasing under the updating rules in (18) and (19).

This theorem guarantees that the proposed objective function can converge to a local optimum, and its proof is given via an auxiliary function in Section 4.

3.3. GMM-Based Spectral Clustering

The coefficient matrix of the proposed multiview NMF can be regarded as a low-rank representation of the latent sparse features from various sources of data. Specifically, each row of the coefficient matrix represents the feature of the associated training sample. Thus, once we obtain the coefficient matrix, clustering of the training data can be realized by a common clustering method, such as $k$-means or fuzzy $c$-means (FCM), on the sparse features. In our method, to achieve this, we apply a GMM to analyse the data $\{h_1, \ldots, h_n\}$ of the coefficient matrix and cluster them into $K$ labels, where $h_i$ is an $r$-dimensional feature vector. The labels are denoted by $c_i \in \{1, \ldots, K\}$. The proposed GMM-based spectral clustering strategy assumes that the distribution of the different features of the coefficient matrix can be fitted by a GMM with parameter set $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$. In light of (7), it is clear that the EM algorithm cannot be directly adopted to optimize these parameters. To solve this problem, Jensen's inequality in the form $\log \sum_k a_k b_k \ge \sum_k a_k \log b_k$ (for nonnegative weights $a_k$ summing to one) is introduced. Thus, we can rewrite the log-likelihood function (7) as follows:

$L(\Theta) \ge \sum_{i=1}^{n} \sum_{k=1}^{K} z_{ik} \log \frac{\pi_k \Phi(h_i \mid \mu_k, \Sigma_k)}{z_{ik}}.$ (20)

With Bayesian theory, the posterior probability $z_{ik}$ indicates the possibility of $h_i$ belonging to the $k$th component:

$z_{ik} = \frac{\pi_k \Phi(h_i \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} \pi_l \Phi(h_i \mid \mu_l, \Sigma_l)}.$ (21)

Considering the constraints $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$, the prior distribution can be estimated by setting the partial derivative of the objective function over it to zero:

$\frac{\partial}{\partial \pi_k}\left[\sum_{i=1}^{n}\sum_{l=1}^{K} z_{il}\log \pi_l + \eta\left(\sum_{l=1}^{K}\pi_l - 1\right)\right] = 0,$ (22)

where $\eta$ is the Lagrange multiplier. Equation (22) yields the following form at step $t+1$:

$\pi_k^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} z_{ik}^{(t)}.$ (23)

Similarly, setting the derivatives of the objective function with respect to the mean and covariance to zero, we obtain their estimations as follows:

$\mu_k^{(t+1)} = \frac{\sum_{i=1}^{n} z_{ik}^{(t)} h_i}{\sum_{i=1}^{n} z_{ik}^{(t)}},$ (24)
$\Sigma_k^{(t+1)} = \frac{\sum_{i=1}^{n} z_{ik}^{(t)} (h_i - \mu_k^{(t+1)})(h_i - \mu_k^{(t+1)})^T}{\sum_{i=1}^{n} z_{ik}^{(t)}}.$ (25)

According to the posterior probability, the final labelling result is determined by

$c_i = \arg\max_{k} z_{ik}.$ (26)
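The EM updates (21) and (23)-(25), together with the hard labelling of (26), can be sketched as follows (the small diagonal term added to the covariances is a numerical-stability detail of this sketch, not discussed above):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(H, K, n_iter=100, seed=0):
    """EM updates (21), (23)-(25) for a GMM on the sparse features.

    H : (n, r) matrix whose rows are the per-sample sparse features
    Returns mixing weights, means, covariances, and hard labels via (26).
    """
    n, r = H.shape
    rng = np.random.default_rng(seed)
    pis = np.full(K, 1.0 / K)
    mus = H[rng.choice(n, K, replace=False)]
    covs = np.array([np.cov(H.T) + 1e-6 * np.eye(r) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: posterior z_ik of sample i under component k, cf. (21)
        dens = np.stack([pis[k] * multivariate_normal.pdf(H, mus[k], covs[k])
                         for k in range(K)], axis=1)
        z = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update priors, means, and covariances, cf. (23)-(25)
        nk = z.sum(axis=0)
        pis = nk / n
        mus = (z.T @ H) / nk[:, None]
        for k in range(K):
            diff = H - mus[k]
            covs[k] = (z[:, k, None] * diff).T @ diff / nk[k] + 1e-6 * np.eye(r)
    return pis, mus, covs, z.argmax(axis=1)  # hard labels via (26)
```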

A crucial problem in GMM-based clustering is how to select a proper number of Gaussian components (subclusters). Owing to the complexity of the training samples, the number of components in the GMM is generally expected to be larger than the number of artificial labels. Subclusters consisting of sparse features labelled by similar GMM components are then regarded as similar. Finally, we merge the subclusters using spectral clustering to match the number of artificial labels. In our method, two main successive steps are considered. First, the similarity measurement of different Gaussian components is implemented using the KL divergence [38], defined as

$D_{\mathrm{KL}}(p_i \| p_j) = \int p_i(x) \log \frac{p_i(x)}{p_j(x)} \, dx.$ (27)

In particular, according to (27), the explicit expression of the KL divergence between two Gaussian distributions can be obtained by

$D_{\mathrm{KL}}(\Phi_i \| \Phi_j) = \frac{1}{2}\left[\log \frac{|\Sigma_j|}{|\Sigma_i|} + \operatorname{tr}(\Sigma_j^{-1}\Sigma_i) + (\mu_i - \mu_j)^T \Sigma_j^{-1} (\mu_i - \mu_j) - r\right].$ (28)

After the similarity measurement has been completed, second, the generation of a symmetric similarity matrix is critical to the success of spectral clustering. We then transform the subclusters obtained by the GMM into a two-dimensional eigenspace by utilizing eigenvalues and eigenvectors of the symmetric similarity matrix defined by

$A_{ij} = \exp\left(-\frac{D_{\mathrm{KL}}(\Phi_i \| \Phi_j) + D_{\mathrm{KL}}(\Phi_j \| \Phi_i)}{2}\right).$ (29)

The eigenvalues and eigenvectors can be obtained by eigendecomposition of the Laplacian matrix $L_A = D_A - A$, where $D_A$ is a diagonal matrix with $(D_A)_{ii} = \sum_j A_{ij}$. According to graph Laplacian theory [39], the crucial feature information of the subclusters is contained in the eigenvectors corresponding to the second and third minimal eigenvalues. Assume that the two chosen eigenvectors are $u_2$ and $u_3$; the $k$th subcluster is then uniquely characterized by the point $(u_2(k), u_3(k))$. Conventional clustering approaches, such as $k$-means and FCM, are then conducted on these points to obtain the clustering results. Finally, if some subclusters belong to the same label in the spectral space, where the number of spectral labels equals the number of artificial labels, the samples belonging to these subclusters are treated as similar and merged into a new spectral component. The framework of the MNGS method is outlined in Algorithm 1.
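A compact sketch of this merging step, using the closed-form KL divergence of (28) and treating the symmetrized-exponential form of (29) as an assumption, could read:

```python
import numpy as np
from sklearn.cluster import KMeans

def kl_gauss(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence between two Gaussians, cf. (28)."""
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
                  + np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d)

def merge_components(mus, covs, n_labels):
    """Regroup GMM components in the 2-D spectral space spanned by the
    eigenvectors of the 2nd and 3rd smallest Laplacian eigenvalues."""
    K = len(mus)
    A = np.zeros((K, K))
    for i in range(K):
        for j in range(i + 1, K):
            # symmetrized KL turned into a similarity; this exact form of
            # (29) is an assumption of the sketch
            sym = 0.5 * (kl_gauss(mus[i], covs[i], mus[j], covs[j])
                         + kl_gauss(mus[j], covs[j], mus[i], covs[i]))
            A[i, j] = A[j, i] = np.exp(-sym)
    lap = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    pts = vecs[:, 1:3]              # 2nd and 3rd smallest eigenvalues
    return KMeans(n_clusters=n_labels, n_init=10).fit_predict(pts)
```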

3.4. Similarity Ranking

The related kernel vector of the similarity measurement between a given query image $y$ and the training samples can be expressed by

$k(y)_i^{(v)} = \exp\left(-\frac{\|y^{(v)} - x_i^{(v)}\|^2}{\sigma_v}\right), \quad k(y) = \left[k(y)^{(1)}; \ldots; k(y)^{(V)}\right],$ (30)

where $y^{(v)}$ represents the $v$th feature of the query image and $x_i^{(v)}$ corresponds to the same type of feature of the $i$th training sample. Meanwhile, using the linear projection matrix

$P = (W^TW)^{-1} W^T,$ (31)

we can directly project the kernel vector (30) into the low-dimensional space and finally obtain the sparse representation of the test sample as follows:

$h_y = P \, k(y).$ (32)

To rank the training samples based on their similarity to the query image in descending order, the probabilities that the query image belongs to the different GMM components are computed first. After all probabilities have been obtained, the next objective is to retrieve the most similar samples for the given query image. The whole process follows three steps: (1) for the samples belonging to each GMM component, the similarity is compared by ranking the posterior probabilities of its members in descending order; (2) within each spectral component, the probabilities of the constituent subclusters are compared to distinguish their similarity to the query image, again in descending order; (3) the last step sorts the probabilities to compare the similarity between the different spectral components and the query. With the rules and similarity measurements mentioned above, all training samples can be sorted by applying these steps in reverse order (i.e., the third step, then the second step, and then the first step). Finally, for each query, we can retrieve the best matches using this queue.
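Under the reconstruction of (30)-(32) above, with the projection matrix taken as the least-squares mapping $P = (W^TW)^{-1}W^T$ (an assumption of this sketch), projecting a query into the sparse space could look like:

```python
import numpy as np

def project_query(W, k_y):
    """Map the query kernel vector k_y (built like a column of X, cf. (30))
    into the low-dimensional space: h_y = (W^T W)^{-1} W^T k_y, cf. (31)-(32).
    The least-squares form of P is an assumption, not the paper's stated one."""
    P = np.linalg.solve(W.T @ W, W.T)  # linear projection matrix (31)
    return P @ k_y
```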

Algorithm 1 (multiview constrained NMF followed by GMM spectral clustering (MNGS)).
Input. The multiview feature fusion matrix $X$ computed using (9), the inner dimension $r$, the regularization parameters $\lambda$ and $\mu$, the number $K$ of GMM components, and the image dataset.
Output. The trained GMM $\Theta$; the GMM and spectral component labels for each training sample.
(1) Calculate the similarity measurement matrix $S$ using (10); initialize the basis matrix $W$ and the coefficient matrix $H$;
(2) Repeat
(3) Update the basis matrix $W$ using (18) and normalize each column of $W$;
(4) Update the coefficient matrix $H$ using (19);
(5) Until the objective function (13) converges
(6) Initialize the GMM parameter set $\Theta$;
(7) Calculate the posterior distribution with (21); then update the prior distribution, the mean vector, and the covariance matrix using (23), (24), and (25), respectively;
(8) Repeat step 7 until reaching the maximum number of iterations; the subclusters can then be obtained by (26);
(9) Calculate the symmetric similarity matrix containing the KL divergences between different GMM components using (29), and perform spectral clustering to further merge the GMM components.
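Tying the earlier sketches together, an end-to-end run of Algorithm 1 could be composed as follows; all helper names are the hypothetical functions defined in the snippets above, random data stands in for the extracted features, and the parameter values are placeholders rather than the tuned settings of Section 5:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# one (n, d_v) array per view: Gist, HOG, LBP, ColorHist
views = [rng.random((n, d)) for d in (512, 1152, 512, 192)]

X, S, L = fuse_views(views)                          # (9)-(10) and the Laplacian
W, H = constrained_nmf(X, S, r=10, lam=0.1, mu=0.5)  # updates (18)-(19)
pis, mus, covs, labels = gmm_em(H.T, K=8)            # EM on rows of H^T, cf. (21)-(26)
spectral_labels = merge_components(mus, covs, n_labels=4)  # merge subclusters, cf. (29)
```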

4. Convergence and Computational Complexity Analysis

4.1. Proof of Convergence

To prove the convergence of the proposed update rules for $W$ and $H$, we must demonstrate that the following objective function is nonincreasing under the update rules:

$O = \|X - WH\|_F^2 + \lambda \operatorname{tr}(HLH^T) + \mu \|W^TW - I\|_F^2.$ (33)

The convergence proof of the proposed method relies on an auxiliary function, which is characterized by the following lemma.

Lemma 2. If $G(h, h')$ is an auxiliary function of $F(h)$ and the conditions $G(h, h') \ge F(h)$ and $G(h, h) = F(h)$ are satisfied, then $F(h)$ is nonincreasing under the update

$h^{(t+1)} = \arg\min_{h} G(h, h^{(t)}).$ (34)

Proof. Obviously, according to the conditions, we have

$F(h^{(t+1)}) \le G(h^{(t+1)}, h^{(t)}) \le G(h^{(t)}, h^{(t)}) = F(h^{(t)}).$ (35)

The equality holds only if $h^{(t)}$ is a local minimum of $G(h, h^{(t)})$.
Because the update operations defined by (18) and (19) are element-wise in nature, if we let $H$ be a constant, it is sufficient to verify that the element-based objective function $F_{ik}$ is nonincreasing for any element $W_{ik}$ of $W$. To accomplish this, this paper defines an auxiliary function regarding $W_{ik}$ as follows:

$G(w, W_{ik}^{(t)}) = F_{ik}(W_{ik}^{(t)}) + F'_{ik}(W_{ik}^{(t)})(w - W_{ik}^{(t)}) + \frac{(WHH^T + 2\mu WW^TW)_{ik}}{W_{ik}^{(t)}}(w - W_{ik}^{(t)})^2.$ (36)

From (36), it is easy to find $G(w, w) = F_{ik}(w)$; thus, the problem is reduced to proving $G(w, W_{ik}^{(t)}) \ge F_{ik}(w)$. To achieve this, the auxiliary function in (36) is compared with the Taylor series expansion of $F_{ik}(w)$, given by

$F_{ik}(w) = F_{ik}(W_{ik}^{(t)}) + F'_{ik}(W_{ik}^{(t)})(w - W_{ik}^{(t)}) + \frac{1}{2} F''_{ik}(W_{ik}^{(t)})(w - W_{ik}^{(t)})^2,$ (37)

where $F'_{ik}$ and $F''_{ik}$ represent the first and second derivatives, respectively. Using (33), we have

$F'_{ik} = (-2XH^T + 2WHH^T + 4\mu WW^TW - 4\mu W)_{ik}, \quad F''_{ik} = 2(HH^T)_{kk} + 4\mu\left((W^TW)_{kk} + (WW^T)_{ii} + W_{ik}^2 - 1\right).$ (38)

Consequently, comparing (36) with (37) by employing (38), the problem of $G(w, W_{ik}^{(t)}) \ge F_{ik}(w)$ can be equivalently transformed to proving

$\frac{(WHH^T + 2\mu WW^TW)_{ik}}{W_{ik}^{(t)}} \ge \frac{1}{2} F''_{ik}.$ (39)

With the definition of matrix multiplication, the first part of the inequality above is certified by

$(WHH^T)_{ik} = \sum_{l} W_{il}(HH^T)_{lk} \ge W_{ik}(HH^T)_{kk}.$ (40)

Additionally, as mentioned in Algorithm 1, each column of $W$ is normalized during each iteration. This means that $(W^TW)_{kk} = 1$. With this conclusion, the remaining part of (39) can be equivalently proved by

$\frac{(WW^TW)_{ik}}{W_{ik}^{(t)}} \ge (WW^T)_{ii} + W_{ik}^2.$ (41)

Substituting (36) into (34), we arrive at the local minimum of the auxiliary function, denoted by

$W_{ik}^{(t+1)} = W_{ik}^{(t)} - W_{ik}^{(t)} \frac{F'_{ik}(W_{ik}^{(t)})}{2(WHH^T + 2\mu WW^TW)_{ik}}.$ (42)

We rewrite (42) as

$W_{ik}^{(t+1)} = W_{ik}^{(t)} \frac{(XH^T + 2\mu W)_{ik}}{(WHH^T + 2\mu WW^TW)_{ik}}.$ (43)

It is obvious that (43) is equivalent to the expression of (16); therefore, the minimum solution (42) is equivalent to the update rule for the basis matrix in (18). Now, combining (36), (37), (39), and (42), we can obtain

$F_{ik}(W_{ik}^{(t+1)}) \le G(W_{ik}^{(t+1)}, W_{ik}^{(t)}) \le G(W_{ik}^{(t)}, W_{ik}^{(t)}) = F_{ik}(W_{ik}^{(t)}).$ (44)

Next, following a procedure similar to that described above and fixing $W$ as a constant, the update rule for $H$ is proved to be nonincreasing. Let $F_{kj}$ represent the element-based objective function. Similarly, the proof begins with the introduction of an auxiliary function with respect to $H_{kj}$, defined by

$G(h, H_{kj}^{(t)}) = F_{kj}(H_{kj}^{(t)}) + F'_{kj}(H_{kj}^{(t)})(h - H_{kj}^{(t)}) + \frac{(W^TWH + \lambda HD)_{kj}}{H_{kj}^{(t)}}(h - H_{kj}^{(t)})^2.$ (45)

Because $G(h, h) = F_{kj}(h)$ is obvious, the Taylor series expansion of $F_{kj}(h)$ is then utilized to prove the inequality $G(h, H_{kj}^{(t)}) \ge F_{kj}(h)$, expressed as

$F_{kj}(h) = F_{kj}(H_{kj}^{(t)}) + F'_{kj}(H_{kj}^{(t)})(h - H_{kj}^{(t)}) + \frac{1}{2} F''_{kj}(H_{kj}^{(t)})(h - H_{kj}^{(t)})^2.$ (46)

According to (33), the corresponding first- and second-order derivatives of $F_{kj}$ regarding $H_{kj}$ can be formulated by

$F'_{kj} = (-2W^TX + 2W^TWH + 2\lambda HL)_{kj}, \quad F''_{kj} = 2(W^TW)_{kk} + 2\lambda L_{jj}.$ (47)

We can find without difficulty from (45) and (46) that $G(h, H_{kj}^{(t)}) \ge F_{kj}(h)$ yields

$(W^TWH)_{kj} = \sum_{l} (W^TW)_{kl} H_{lj} \ge (W^TW)_{kk} H_{kj}.$ (48)

Obviously, we have the following inequality:

$\lambda (HD)_{kj} = \lambda H_{kj} D_{jj} \ge \lambda H_{kj} (D - S)_{jj} = \lambda H_{kj} L_{jj}.$ (49)

Substituting (45) into (34), the update rule for $H$ in (19) can be obtained as a local optimum of the auxiliary function (45):

$H_{kj}^{(t+1)} = H_{kj}^{(t)} \frac{(W^TX + \lambda HS)_{kj}}{(W^TWH + \lambda HD)_{kj}}.$ (50)

Again combining (45), (46), (48), and (50), the following inequality is inferred:

$F_{kj}(H_{kj}^{(t+1)}) \le G(H_{kj}^{(t+1)}, H_{kj}^{(t)}) \le G(H_{kj}^{(t)}, H_{kj}^{(t)}) = F_{kj}(H_{kj}^{(t)}).$ (51)

With (44) and (51), the objective function can be proved to be nonincreasing under (18) and (19):

$O(W^{(t+1)}, H^{(t+1)}) \le O(W^{(t+1)}, H^{(t)}) \le O(W^{(t)}, H^{(t)}).$ (52)

Thus, we have completed the proof of Theorem 1.

4.2. Complexity Analysis

This subsection analyses the computational complexity of the proposed MNGS algorithm using big-$O$ notation. The computational cost of MNGS primarily comprises three parts. First, we need $O(n^2 d)$ floating-point operations, with $d = \sum_{v=1}^{V} d_v$, to obtain the multiview feature fusion matrix and the similarity measurement matrix in terms of (9) and (10), respectively. Based on the updating rules given by (18) and (19), the cost required for updating the matrices $W$ and $H$ is then $O(mnr + n^2 r)$ per iteration. Additionally, the proposed MNGS requires $O(r^3)$ operations for the matrix operations of each GMM component and $O(Kr^2)$ operations to label the membership of each sample over the subclusters. Thus, assuming that $K$ subclusters, $n$ samples, and $t'$ iterations are considered, the total time complexity of the EM algorithm is $O(t'K(r^3 + nr^2))$. Finally, concentrating on the spectral clustering represented by (28) and (29), the cost spent on the similarity matrix of the GMM components is $O(K^2 r^3)$. Beyond this cost, the computational complexity of clustering the similar components is $O(K^3)$. Considering the relationships $K \ll n$ and $r \ll n$, the time cost of the spectral clustering can be ignored. Thus, from the foregoing complexity analysis, if the update rules terminate after $t$ iterations, the maximum overall cost of the proposed MNGS is $O(n^2 d + t(mnr + n^2 r) + t'K(r^3 + nr^2))$.

5. Experimental Results

This section presents a series of experiments to verify the accuracy and effectiveness of the proposed MNGS algorithm. In the training process, the searching performance evaluates the potential clustering capability of the multiview constrained NMF. Once the clustering result is obtained using GMM-based spectral clustering, all training images in the dataset are ranked in the retrieval phase according to the probabilities that the images and the query belong to the clustering components. A returned image set whose retrieval length is specified as a percentage of the dataset is then viewed as the set of images nearest the query. The three publicly available datasets used for the present study are Calth-256 (http://www.vision.caltech.edu/Image_Datasets/Caltech256/) [40], CIFAR-10 (http://www.cs.toronto.edu/~kriz/cifar.html) [41], and CIFAR-100 (http://www.cs.toronto.edu/~kriz/cifar.html) [41]. The evaluations are carried out on a general-purpose computer with an Intel Core 2 Duo 2.1 GHz CPU (T6570) and 3 GB of RAM under the Windows 7 environment. All algorithms have been implemented in MATLAB 2010b.

5.1. Preparation for Datasets

The Calth-256 dataset includes 30,607 images and consists of 257 categories; each category contains more than 80 colour images. These images cover a wide variety of nature scenes and artificial objects. The experiment randomly selects four categories, named fried-egg, touring-bike, tweezer, and watermelon, with a total of 415 images as training samples, and forty images are randomly chosen from these four categories as query images.

CIFAR-10 and CIFAR-100 are databases of 60,000 images with 10 groups and 100 groups, respectively. For the former dataset, we randomly consider 541 images selected from the 10 groups as the training set, and 10 images from each group are randomly selected as query samples. We also perform the experiment on the CIFAR-100 dataset, in which we select 400 images from four categories: beetle, bicycle, chair, and mountain. Similarly, the testing set consists of forty images randomly selected from these four categories. In this work, the images are resized to a common resolution for convenience of feature extraction.

5.2. Parameter Selection

It can be seen from Section 3 that there are four parameters in the proposed MNGS: the inner dimension $r$ of NMF, the component number $K$ of GMM, and the regularization parameters $\lambda$ and $\mu$. It is natural to think that the retrieval accuracy might be affected by these parameters; therefore, this subsection first discusses how the parameters are chosen for the proposed method. The evaluation is performed based on the two most commonly used metrics: precision and recall rate [42]. The precision is defined as the ratio of the relevant images among all retrieved images:

$\text{precision} = \frac{\text{number of relevant images retrieved}}{\text{total number of images retrieved}}.$ (53)

The recall represents the ratio of the retrieved relevant images to all relevant images in the dataset, defined as

$\text{recall} = \frac{\text{number of relevant images retrieved}}{\text{total number of relevant images in the dataset}}.$ (54)

NMF-based methods have shown a great advantage in working with high-dimensional data because NMF can reduce the dimensionality of the considered data while preserving the characteristics of the underlying data. It is clear that NMF with a small inner dimension $r$ can efficiently reduce the complexity of the decomposition. However, an excessively small $r$ could lead to unfavourable retrieval results because of lost information. Conversely, a large inner dimension may incur extra computational cost. Thus, one key issue is which $r$ can yield a "meaningful" retrieval result. Following a consideration similar to [43], this work first discusses the influence of $r$ on the retrieval results when the parameters $\lambda$ and $\mu$ are fixed. Table 1 summarizes the corresponding results under different choices of the inner dimension $r$. As we can observe from this table, for each dataset there is an intermediate value of $r$ (one value for Calth-256 and CIFAR-100 and another for CIFAR-10) at which the proposed framework performs better than at the other values.
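A per-query evaluation of these two metrics can be sketched as:

```python
import numpy as np

def precision_recall(retrieved_labels, query_label, n_relevant):
    """Precision (53) and recall (54) for one query.

    retrieved_labels : category labels of the returned images, ranked
    query_label      : ground-truth category of the query image
    n_relevant       : total number of relevant images in the dataset
    """
    hits = np.sum(np.asarray(retrieved_labels) == query_label)
    precision = hits / len(retrieved_labels)
    recall = hits / n_relevant
    return precision, recall
```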

Next, we test the performance of the proposed method with a varying number of GMM components. As mentioned before, each artificially selected category can be further subdivided into several subclusters owing to the abundance of images in the category. Considering that the number of subclusters labelled by the GMM is always larger than the number of artificial categories, the experiment assigns the number of subclusters (components) $K$ of the GMM to values of 6, 8, 10, 12, 14, and 16 for the Calth-256 and CIFAR-100 datasets and 12, 14, 18, 22, 26, and 30 for CIFAR-10. Table 2 records the corresponding retrieval results for the three datasets with different component numbers. Based on the precision and recall rate analysis, it is obvious that an intermediate value of $K$ for Calth-256 and CIFAR-100 and a correspondingly larger value for CIFAR-10 provide higher retrieval precision. Thus, this experiment indicates that the resulting subclusters should stay near the artificial class labels and that an excessively large component number might reduce the retrieval precision.

In the proposed MNGS, there are two regularization coefficients, $\lambda$ and $\mu$, which represent the trade-off between the factorization error and the regularized constraints. The following values of the regularization coefficient $\lambda$ are considered first: 0.05, 0.15, 0.35, 0.55, 0.75, and 0.95. Table 3 illustrates the comparison on the different datasets with a fixed coefficient $\mu$, and the retrieval results indicate which value of $\lambda$ yields the best performance. Similarly, we investigate the effect of the parameter $\mu$ on the retrieval performance, and the experimental results using different values of $\mu$ are reported in Table 4. In this case, we can see from the table that one setting of $\mu$ clearly leads to both the highest precision and recall rate for all three datasets. From these results, we fix $\lambda$ and $\mu$ at the best-performing values, with which our approach provides the best retrieval results.

To investigate the influence of the feature dimensions on the retrieval results and execution time, the proposed method is tested with two different feature dimensions. The first is the setting mentioned above, which we refer to as the higher-dimensional case. The other, which we call the lower-dimensional case, uses Gist, HOG, LBP, and ColorHist features of 256, 576, 256, and 96 dimensions, respectively. We can observe from Table 5 that the reduction of the feature dimensions helps to reduce the time consumption, but it is accompanied by a moderate decrease in precision and recall rate. Unless otherwise specified, the higher feature dimensions are used in our method.

5.3. Performance and Comparison

This subsection evaluates the retrieval performance of the proposed MNGS algorithm. Because the proposed method involves some random initialization, repeated retrieval runs on the same dataset should be considered to obtain a relatively stable result. More precisely, taking the Calth-256 and CIFAR-100 datasets as examples, for each retrieval task, the training process is conducted once, yet 40 rounds of testing for the 40 query images selected from the corresponding dataset are carried out to assess the retrieval accuracy. As mentioned before, the datasets used in this experiment are subsets of the complete datasets but are still named Calth-256, CIFAR-10, and CIFAR-100 for convenience in the following description. The experiments assign the parameters according to the above discussion. We perform a comparison with three state-of-the-art approaches on these datasets. The first method measures the similarity by modelling the RGB image with Gaussian distributions and calculating the KL divergence between two GMMs, which we call KL-GMM [44] in our experiment. The second approach detects the resembling images by labelling the dataset via multiview joint nonnegative matrix factorization (JNMF) [45]. The third algorithm searches for the nearest neighbours of the query image in a low-dimensional feature space via multiview alignment hashing (MAH) [43]. The precision and recall rate versus different retrieval lengths are adopted to compare the behaviour of the different approaches, and the results are exhibited in Figure 2. For the precision curves, we can observe that all approaches achieve higher precision values on Calth-256 and CIFAR-100 than on CIFAR-10. We believe that this is because the samples of CIFAR-10 present more complex details. Another interesting finding is that the precision plots tend to a constant as the retrieval length grows beyond the total number of images in each category. This is because most images that heavily match the query assemble at the front of the queue owing to the similarity ranking of the training set. Additionally, with increasing retrieval length, as a common phenomenon, the recall rates of all algorithms exhibit an ascending trend. It is noteworthy that the algorithms with feature extraction, such as JNMF, MAH, and MNGS, always achieve higher retrieval accuracy. In addition, in the proposed method, the GMM-based spectral clustering scheme achieves better performance than the regression-based hashing of MAH. Therefore, as shown in Figure 2, it is obvious that MNGS outperforms its competitors, JNMF, MAH, and KL-GMM, in terms of the assessment criteria. Table 6 presents the top six images retrieved from the respective categories corresponding to the query images. The advantage of the algorithm can be seen from the number of correctly matched images for each query. It can be observed that the retrieval accuracy of KL-GMM is slightly poorer than that of the other methods, which is consistent with the results depicted in Figure 2. The proposed method is thus shown again to moderately improve the retrieval performance.

5.4. Time Consumed

This subsection compares the computational time of the aforementioned approaches. The runtimes, including the training time and testing time of the different algorithms, are illustrated in Figure 3. Owing to the feature extraction steps of the JNMF, MAH, and MNGS algorithms, their computational times in the training phase are increased. In contrast, KL-GMM requires remarkably less training time. However, for the average CPU time in the testing phase, as we can see from the figure, no significant difference is found among the approaches; only KL-GMM and MNGS spend slightly more time than the other methods on Calth-256 and CIFAR-10, as does MNGS on CIFAR-100.

6. Conclusions

In this paper, a novel retrieval algorithm called MNGS was presented, in which the sparse representations of the training data were precisely learnt via constrained NMF with pooling of multiview features. The main contribution of this work includes the following four aspects: (1) the proposed method imposed structure constraints and orthogonal regularization on the standard NMF framework and demonstrated its convergence; (2) by incorporating multiple structure-correlated features together, the coefficient matrix of MNGS preserved the features of all images in the low-dimensional space; (3) the EM algorithm was incorporated to estimate the GMM components for the sparse features; and (4) spectral clustering based on the KL divergence strategy was designed to obtain the final retrieval results. Experiments on three standard datasets indicated that, when appropriate parameters were used, the desired image retrieval could be achieved in terms of searching accuracy and effectiveness.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61371150.