Mathematical Problems in Engineering

Volume 2016 (2016), Article ID 3271924, 15 pages

http://dx.doi.org/10.1155/2016/3271924

## Image Retrieval Based on Multiview Constrained Nonnegative Matrix Factorization and Gaussian Mixture Model Spectral Clustering Method

School of Information Science & Engineering, East China University of Science and Technology, Shanghai 200237, China

Received 15 June 2016; Accepted 3 November 2016

Academic Editor: Wanquan Liu

Copyright © 2016 Qunyi Xie and Hongqing Zhu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Content-based image retrieval has recently become an important research topic and has been widely used for managing images in repositories. In this article, we present an efficient technique, called MNGS, which integrates multiview constrained nonnegative matrix factorization (NMF) and Gaussian mixture model- (GMM-) based spectral clustering for image retrieval. In the proposed methodology, the multiview NMF scheme provides competitive sparse representations of the underlying images through decomposition of a similarity-preserving matrix that is formed by fusing multiple features from different visual aspects. In particular, the proposed method merges manifold constraints into the standard NMF objective function to impose an orthogonality constraint on the basis matrix and satisfy the structure preservation requirement of the coefficient matrix. To cluster the sparse representations, we develop a GMM-based spectral clustering method in which the Gaussian components are regrouped in spectral space, which significantly improves the retrieval effectiveness. In this way, image retrieval over the whole database reduces to a nearest-neighbour search in the cluster containing the query image. In addition, this study provides a proof of convergence of the objective function and an analysis of the computational complexity. Experimental results on three standard image datasets demonstrate the advantages of the proposed retrieval scheme.

#### 1. Introduction

With the increasing abundance of digital images available from a variety of sources, content-based image retrieval (CBIR) from huge databases has attracted a lot of attention in the past decade [1–3]. An effective CBIR system should search images by computing the similarity of the extracted features (views) between the user-defined query pattern and the images in large-scale collections. Existing visual features include, but are not limited to, intensity, shape, colour, scene, texture, and local invariants. Early CBIR systems calculated the similarity with only one feature [4, 5]. This normally leads to undesirable retrieval results due to insufficient representation; that is, it is quite difficult to effectively distinguish all types of images using a single feature. Generally, to achieve proper results in the CBIR framework, several features that capture the meaningful contents of the underlying images are integrated [6–8]. However, these visual features are often high dimensional and nonsparse, and direct manipulation of the feature descriptors is the most time-consuming operation. To address this limitation, dimensionality reduction is widely used. Popular dimensionality reduction techniques include linear discriminant analysis (LDA) [9], principal component analysis (PCA) [10], independent component analysis (ICA) [11], and singular value decomposition (SVD) [12]. Nonnegative matrix factorization (NMF), as a novel tool for source separation [13–15], is an alternative way to reduce the dimensionality. It decomposes a nonnegative matrix into two smaller nonnegative matrices: a basis matrix and a coefficient matrix. The defining characteristic of NMF is that all elements are nonnegative, which distinguishes it from other conventional dimensionality reduction techniques. The coefficient matrix models the features of images as an additive combination of a set of basis vectors.

However, as is well known, the original NMF does not always yield a structure constraint for sparse representations of features during decomposition [16]. In recent years, various studies have extended the standard NMF by enforcing a structure preservation constraint on the objective function [17, 18]. Among them, a graph-embedding objective function of NMF encodes the graph information of the images into the sparse representation [19]. Liu et al. [20] introduced constrained nonnegative matrix factorization (CNMF), in which label information is incorporated directly into the original NMF as an additional hard constraint for semisupervised retrieval. Another popular NMF algorithm, topographic NMF (TNMF), was proposed by Xiao et al. [21], who imposed a topographic constraint on the objective function to pool together structure-corrected features. The normalization strategies proposed for all the above-mentioned NMF-based techniques pertain to the coefficient matrix of the NMF decomposition. In fact, studies on constraints for the basis matrix are still limited.

More recently, some works based on statistical model frameworks have been reported for image retrieval [22, 23]. For example, Zeng et al. [24] introduced an image description algorithm that characterizes a colour image by combining the spatiogram with Gaussian mixture model- (GMM-) based colour quantization. Similarly, another colour image indexing method based on a spatiochromatic multichannel GMM was introduced by Piatek and Smolka [25]. Marakakis et al. [26] proposed a relevance feedback method for CBIR that uses GMMs as image representations and employs the Kullback-Leibler (KL) divergence. The retrieval capability of such methods mainly relies on the fact that mixture model-based techniques provide powerful methodologies for data clustering [27, 28]. This type of technique can model uncertainty in a statistical manner. Specifically, a GMM fits different shapes of observed data using multivariate Gaussian distributions. A special virtue of the GMM is that it requires the estimation of only a small number of parameters.

Motivated by the above discussion, in this paper we propose a novel technique combining multiview constrained NMF and GMM-based spectral clustering (MNGS) for image retrieval. The proposed MNGS has the following attractive characteristics. First, multiple features are extracted from the underlying images, and MNGS then integrates these features to obtain a similarity-preserving matrix. Second, we incorporate two constraint terms into the original NMF objective function to represent latent feature information in a low-dimensional space. The first constraint term encourages the basis matrix to be as orthogonal as possible, which reduces redundancy; this constraint therefore tends to yield competitive sparse representations of the visual features. The second constraint allows us to consider the latent graph information of the images and satisfy the structure preservation requirement. More importantly, this study provides a detailed proof of convergence of the objective function to ensure that the algorithm converges to a local minimum during decomposition. Third, a multivariate GMM is embedded into the proposed MNGS to model the distribution of the sparse features given by the coefficient matrix of NMF. Consequently, images whose sparse features belong to the same Gaussian component are similar, so these images can be labelled with the same subcluster. Considering the complexity of the images in repositories, it is natural to assign more GMM components to label the images. In general, the more components are used, the more accurate the indexing results obtained with the GMM; however, the computational cost also grows because more GMM parameters must be learned. Finally, to match the optimal number of components, and inspired by the work of [29], spectral clustering based on KL divergence is used to merge the GMM components and achieve the desired retrieval results.

Specifically, through the eigendecomposition of the similarity matrix measured by KL divergence, similar GMM components can be grouped into several spectral components in a lower-dimensional spectral space. Thus, each sparse feature can be labelled by both a GMM component and a spectral component, which might lead to more accurate clustering results. Once the query image has been labelled by this statistical model, similar images can be retrieved effectively from the clustering results.
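
The idea of regrouping Gaussian components in spectral space can be illustrated with a minimal sketch. It is not the authors' implementation: the symmetrized KL divergence, the exponential affinity `exp(-KL / scale)`, the normalized affinity matrix, and the names `kl_gauss`, `spectral_embedding`, and `scale` are all assumptions made here for illustration. Only the closed-form KL divergence between two Gaussians is standard.

```python
import numpy as np

def kl_gauss(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
    d = mu0.shape[0]
    diff = mu1 - mu0
    inv1 = np.linalg.inv(cov1)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d + logdet1 - logdet0)

def spectral_embedding(mus, covs, n_groups, scale=1.0):
    """Embed Gaussian components in a spectral space (illustrative assumptions)."""
    J = len(mus)
    A = np.zeros((J, J))
    for a in range(J):
        for b in range(J):
            # Assumption: symmetrize KL so that the affinity matrix is symmetric.
            sym_kl = (kl_gauss(mus[a], covs[a], mus[b], covs[b])
                      + kl_gauss(mus[b], covs[b], mus[a], covs[a]))
            A[a, b] = np.exp(-sym_kl / scale)
    # Normalized affinity D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the leading eigenvectors.
    _, eigvecs = np.linalg.eigh(L)
    Y = eigvecs[:, -n_groups:]
    # Row-normalize so that components in the same group point the same way.
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return A, Y

# Four toy components: two near the origin, two near (10, 10).
mus = [np.array([0.0, 0.0]), np.array([0.5, 0.0]),
       np.array([10.0, 10.0]), np.array([10.5, 10.0])]
covs = [np.eye(2) for _ in mus]
A, Y = spectral_embedding(mus, covs, n_groups=2)
print(np.round(Y @ Y.T, 3))  # pairwise cosine similarity of the embedded components
```

In a full pipeline, the rows of `Y` would then be grouped by a clustering step such as k-means, so that each GMM component receives a spectral label; that step is omitted here.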

The rest of this paper is organized as follows. Section 2 introduces the related work, including the classical NMF and GMM. In Section 3, we describe the details of the proposed framework, followed by the proof of the convergence of this approach and the complexity analysis in Section 4. Retrieval experiments conducted on real-world image datasets are discussed in Section 5. Section 6 reports the concluding remarks and some suggestions for further research.

#### 2. Preliminaries

This section briefly reviews the classical NMF and GMM. The former is reasonably attractive owing to its intrinsic advantage of providing a low-dimensional description for nonnegative data. The latter shows its accuracy and effectiveness in most clustering tasks.

##### 2.1. Nonnegative Matrix Factorization

NMF is commonly used to decompose a matrix whose elements are all nonnegative into two nonnegative matrices. Mathematically, given a nonnegative data matrix $X \in \mathbb{R}_{+}^{M \times N}$, NMF aims at finding two nonnegative matrices $W \in \mathbb{R}_{+}^{M \times K}$ and $H \in \mathbb{R}_{+}^{K \times N}$ such that the original data matrix can be well approximated by

$$X \approx WH, \tag{1}$$

where $K$ is a new reduced dimension (inner dimension) and satisfies $K \ll \min(M, N)$. In this way, each column of $H$ can be regarded as a sparse representation of the associated column vector in $X$. There are many criteria to solve the factoring problem and evaluate the quality of the decomposition. Generally, the Euclidean distance is utilized to construct the objective function; for example,

$$\min_{W \ge 0,\, H \ge 0} \; \| X - WH \|_F^2, \tag{2}$$

where $\| \cdot \|_F$ denotes the Frobenius norm. This objective function is nonconvex in $W$ and $H$ jointly and is nonincreasing under the update rules given below [16]. To minimize the above objective function, Lee and Seung [30] derived the multiplicative updates of the basis matrix $W$ and coefficient matrix $H$, respectively, as follows:

$$W_{ik} \leftarrow W_{ik} \frac{(X H^{T})_{ik}}{(W H H^{T})_{ik}}, \qquad H_{kj} \leftarrow H_{kj} \frac{(W^{T} X)_{kj}}{(W^{T} W H)_{kj}}. \tag{3}$$

The above multiplicative updates can ensure convergence to a local optimal solution. Thus, the iteration stops when the objective function converges or the maximum number of iterations is reached.
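
The Lee and Seung multiplicative updates for the Euclidean objective can be sketched in a few lines of NumPy. This is a minimal illustration and not the constrained variant proposed in this paper; the names `X`, `W`, `H`, and `nmf_multiplicative`, the random initialization, and the small constant `eps` that guards against division by zero are assumptions made here.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Minimize ||X - W H||_F^2 with W, H >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization (an assumption of this sketch).
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    losses = []
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), elementwise.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # W <- W * (X H^T) / (W H H^T), elementwise.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        losses.append(np.linalg.norm(X - W @ H, "fro") ** 2)
    return W, H, losses

# Toy nonnegative data of rank at most 4.
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 30))
W, H, losses = nmf_multiplicative(X, k=4)
print(losses[0], losses[-1])  # objective after the first and the last iteration
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, the factors stay nonnegative without any explicit projection, which is the main practical appeal of this scheme.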

##### 2.2. Gaussian Mixture Model

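The classical Gaussian mixture model assumes that each observed data point $x_i$, $i = 1, \ldots, N$, is dependent on the label $j \in \{1, \ldots, J\}$. The density function of GMM at each observation $x_i$ can be described by

$$f(x_i \mid \Pi, \Theta) = \sum_{j=1}^{J} \pi_j \, \Phi(x_i \mid \Theta_j), \tag{4}$$

where $\Theta_j = \{\mu_j, \Sigma_j\}$, and the prior distribution $\pi_j$ indicates the prior probability of the $j$th component of the GMM, which satisfies the constraints

$$0 \le \pi_j \le 1, \qquad \sum_{j=1}^{J} \pi_j = 1. \tag{5}$$

Expression (4) can be regarded as a linear combination of several Gaussian components. Each component $\Phi(x_i \mid \Theta_j)$ is a Gaussian distribution that has its own covariance $\Sigma_j$ and mean $\mu_j$, defined as

$$\Phi(x_i \mid \Theta_j) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_j|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu_j)^{T} \Sigma_j^{-1} (x_i - \mu_j) \right), \tag{6}$$

where $D$ is the dimension of $x_i$. To optimize the parameter set $\{\pi_j, \mu_j, \Sigma_j\}_{j=1}^{J}$, the log-likelihood function of (4) is maximized by the expectation maximization (EM) algorithm [31]; it is expressed as

$$L(\Pi, \Theta) = \sum_{i=1}^{N} \log \left( \sum_{j=1}^{J} \pi_j \, \Phi(x_i \mid \Theta_j) \right). \tag{7}$$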
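
As a rough illustration of how EM fits such a mixture, the sketch below alternates between computing responsibilities (E-step) and re-estimating the priors, means, and covariances (M-step). It is a plain textbook EM, not the authors' code; the function names, the initialization from randomly chosen data points, and the small ridge `reg` added to each covariance for numerical stability are assumptions.

```python
import numpy as np

def gaussian_pdf(X, mu, cov):
    """Multivariate Gaussian density evaluated at every row of X."""
    d = X.shape[1]
    diff = X - mu
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (maha + logdet + d * np.log(2 * np.pi)))

def gmm_em(X, J, n_iter=50, reg=1e-6, seed=0):
    """Fit a J-component GMM with full covariances by EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(J, 1.0 / J)                       # uniform priors
    mu = X[rng.choice(n, size=J, replace=False)]   # means from random points
    cov = np.stack([np.atleast_2d(np.cov(X.T)) + reg * np.eye(d)] * J)
    log_liks = []
    for _ in range(n_iter):
        # E-step: weighted component densities and responsibilities.
        dens = np.stack([pi[j] * gaussian_pdf(X, mu[j], cov[j])
                         for j in range(J)], axis=1)
        total = dens.sum(axis=1, keepdims=True)
        log_liks.append(np.log(total).sum())       # log-likelihood before the update
        resp = dens / total
        # M-step: re-estimate priors, means, and covariances.
        Nj = resp.sum(axis=0)
        pi = Nj / n
        mu = (resp.T @ X) / Nj[:, None]
        for j in range(J):
            diff = X - mu[j]
            cov[j] = (resp[:, j, None] * diff).T @ diff / Nj[j] + reg * np.eye(d)
    return pi, mu, cov, log_liks

# Toy data: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(5.0, 1.0, size=(200, 2))])
pi, mu, cov, log_liks = gmm_em(X, J=2)
print(pi, mu, log_liks[-1])
```

After fitting, each observation can be labelled with the component that has the largest responsibility, which corresponds to the subcluster labelling described in the Introduction.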

#### 3. The Proposed Method

In this section, we introduce an image retrieval approach, called MNGS, which consists of four major parts: feature extraction, multiview constrained NMF, GMM-based spectral clustering, and similarity ranking. Figure 1 shows the overall framework of the proposed MNGS method.