Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 239589, 11 pages

http://dx.doi.org/10.1155/2015/239589

## Graph Regularized Nonnegative Matrix Factorization with Sparse Coding

School of Software, Dalian University of Technology, Dalian 116620, China

Received 13 January 2015; Revised 19 February 2015; Accepted 20 February 2015

Academic Editor: Nazrul Islam

Copyright © 2015 Chuang Lin and Meng Pang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In this paper, we propose a sparseness-constrained NMF method, named graph regularized nonnegative matrix factorization with sparse coding (GRNMF_SC). By combining manifold learning and sparse coding techniques, GRNMF_SC can efficiently extract basis vectors from the data space that preserve both the intrinsic manifold structure and the local features of the original data. While the objective function of our method is easy to state, solving it is nontrivial; in this paper we give a detailed derivation of the solution together with a strict proof of its convergence, which is a key contribution of the paper. Compared with sparseness-constrained NMF and GNMF algorithms, GRNMF_SC learns a much sparser representation of the data while also preserving its geometrical structure, which endows it with powerful discriminating ability. Furthermore, GRNMF_SC is formulated in both supervised and unsupervised versions to meet different demands. Experimental results on image recognition and clustering demonstrate that GRNMF_SC compares favorably with other state-of-the-art NMF methods.

#### 1. Introduction

Previous studies have shown that there is psychological and physiological evidence for parts-based representation in the human brain [1–5]. NMF is one such parts-based matrix factorization method, which can find local features of the original data in a nonnegative sense. Indeed, the nonnegativity constraint leads to a parts-based representation because it allows only additive, not subtractive, combinations of components. In general, NMF seeks two nonnegative matrices whose product provides a good approximation to the original matrix while also learning the parts of objects, which makes it very useful in real applications such as face recognition [6–9] and document clustering [10, 11].

However, the standard NMF algorithm [5, 6] has several limitations, which have been discussed extensively. One notable limitation is that it does not always result in completely parts-based representations. Researchers have tried to solve this problem by incorporating sparseness constraints [12–14]. These approaches extend the NMF framework with an adjustable sparseness parameter in order to learn a more localized representation. However, previous sparseness-constrained NMF approaches focused mainly on the sparseness property while ignoring the intrinsic geometric structure of the original data, which is vital for classification and clustering.

Recent research shows that when the data are sampled from a probability distribution supported on or near a submanifold of the ambient space, *manifold learning* [15–19] can be used to preserve the intrinsic (geometrical) structure. To preserve the intrinsic structure of the original data, He and Cai proposed the Graph Regularized NMF (GNMF) method [20, 21], which incorporates the locality preserving projections (LPP) technique [22] into the NMF framework.

Experimental results showed that GNMF achieved higher recognition rates and better clustering performance on some popular face databases (e.g., the ORL and YALE databases) than previous sparseness-constrained NMF methods [20, 21]; that is, GNMF really works for datasets with an apparent geometrical structure. However, GNMF still has a disadvantage: it cannot ensure the sparseness of the factorization results, which limits its discriminative ability and increases computational cost and memory usage. Hence, we are motivated to combine the advantages of manifold learning and sparseness constraints and propose the GRNMF_SC algorithm, which not only preserves the geometrical structure but also learns much sparser representations of the input data. *It should be emphasized that solving an objective function that simultaneously incorporates the Laplacian regularization and a sparseness constraint into the NMF framework is nontrivial, because sophisticated L1-norm solvers cannot be adopted directly.* We start from the initial idea behind GNMF and incorporate the sparseness constraint smoothly. Concretely, we first construct a convex objective function by imposing the above constraints, then develop an optimization algorithm with multiplicative update rules to minimize this objective function, and finally prove that the algorithm converges to a local minimum. Furthermore, we extend GRNMF_SC to a supervised version (S-GRNMF_SC) for image recognition and keep the unsupervised version (GRNMF_SC) for clustering, where class labels are not available. Experimental results demonstrate that supervised GRNMF_SC achieves higher recognition rates, especially for occluded face recognition, than typical sparseness-based and manifold-based NMF methods, and that unsupervised GRNMF_SC obtains better clustering performance than popular clustering algorithms.

The rest of the paper is organized as follows. In Section 2, a brief review of standard NMF and its typical sparse variants is given. In Section 3, the proposed GRNMF_SC method and a proof of its convergence are given. Experimental results on image recognition and clustering are presented in Section 4. We conclude the paper and plan the future work in Section 5.

#### 2. Reviews of Standard NMF and Its Sparse Variants

In this section, we briefly describe the standard NMF algorithm [5] and two typical sparseness-constrained NMF algorithms [8, 13]; we introduce the latter two because our method is inspired by them. The introduction of GNMF is merged into the presentation of GRNMF_SC in Section 3.

##### 2.1. Nonnegative Matrix Factorization (NMF)

First, the standard NMF [5, 6] is introduced. Given a data matrix $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N] \in \mathbb{R}^{M \times N}$, each column of $\mathbf{X}$ is an $M$-dimensional sample vector with nonnegative entries. NMF aims to find two nonnegative matrices $\mathbf{U} \in \mathbb{R}^{M \times K}$ and $\mathbf{V} \in \mathbb{R}^{K \times N}$ whose product approximates the original matrix well; that is, it minimizes the following cost function:
$$O_F = \|\mathbf{X} - \mathbf{U}\mathbf{V}\|_F^2, \tag{1}$$
where $\|\cdot\|_F$ represents the Frobenius norm. Since the objective function is not convex in $\mathbf{U}$ and $\mathbf{V}$ jointly, we cannot expect to find the global minimum of $O_F$. Lee and Seung [5] presented the following iterative update rules:
$$u_{ik} \leftarrow u_{ik} \frac{(\mathbf{X}\mathbf{V}^T)_{ik}}{(\mathbf{U}\mathbf{V}\mathbf{V}^T)_{ik}}, \qquad v_{kj} \leftarrow v_{kj} \frac{(\mathbf{U}^T\mathbf{X})_{kj}}{(\mathbf{U}^T\mathbf{U}\mathbf{V})_{kj}}. \tag{2}$$
It was proven that these two update rules find a local minimum of the objective function $O_F$.
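As a concrete illustration, the multiplicative update rules above can be transcribed into a few lines of NumPy. This is a toy sketch for exposition, not the authors' implementation; the small constant `eps` (our addition) guards against division by zero:

```python
import numpy as np

def nmf(X, K, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF via Lee-Seung multiplicative updates for ||X - UV||_F^2.

    X : (M, N) nonnegative array; returns U (M, K) and V (K, N).
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    U = rng.random((M, K)) + eps
    V = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # v_kj <- v_kj (U^T X)_kj / (U^T U V)_kj
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        # u_ik <- u_ik (X V^T)_ik / (U V V^T)_ik
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V

X = np.abs(np.random.default_rng(1).random((20, 30)))
U, V = nmf(X, K=5)
err = np.linalg.norm(X - U @ V)
```

Because the updates are purely multiplicative, entries of `U` and `V` initialized nonnegative stay nonnegative throughout.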

##### 2.2. Local Nonnegative Matrix Factorization (LNMF)

Inspired by the original NMF algorithm [5], Li et al. [8] introduced the local nonnegative matrix factorization (LNMF) algorithm which is intended for learning spatially localized, parts-based representation of visual patterns. The aim of their work was to obtain truly localized, parts-based components by imposing three additional constraints on classical NMF.

Taking the factorization problem defined in (1), let $\mathbf{B} = [b_{kl}] = \mathbf{U}^T\mathbf{U}$ and $\mathbf{C} = [c_{kl}] = \mathbf{V}\mathbf{V}^T$, where $\mathbf{B}, \mathbf{C} \in \mathbb{R}^{K \times K}$. LNMF aims to learn local features by imposing the following three additional constraints on the NMF basis:
(1) Maximum orthogonality of $\mathbf{U}$. Different bases should be as orthogonal as possible so as to minimize redundancy between them. This is imposed by minimizing $\sum_{k \neq l} b_{kl}$.
(2) Maximum expressiveness of $\mathbf{V}$. Only the components carrying the most important information should be retained. This is imposed by maximizing $\sum_{k} c_{kk}$.
(3) Maximum sparseness of the encoding matrix $\mathbf{V}$. The coefficient matrix $\mathbf{V}$ should contain as many zero elements as possible; in other words, the number of basis components required to represent the data matrix $\mathbf{X}$ is minimized. This is imposed by minimizing $\sum_{k} \mathbf{u}_k^T \mathbf{u}_k = \sum_{k} b_{kk}$, where $\mathbf{u}_k$ is a basis vector.

Incorporating the above constraints leads to the following constrained divergence as the objective function for LNMF:
$$D(\mathbf{X} \,\|\, \mathbf{U}\mathbf{V}) = \sum_{i,j} \left( x_{ij} \log \frac{x_{ij}}{(\mathbf{U}\mathbf{V})_{ij}} - x_{ij} + (\mathbf{U}\mathbf{V})_{ij} \right) + \alpha \sum_{k,l} b_{kl} - \beta \sum_{k} c_{kk}, \tag{3}$$
where $\alpha, \beta > 0$ are constants. A solution to the above constrained minimization can be obtained with the following multiplicative update rules:
$$v_{kj} \leftarrow \sqrt{v_{kj} \sum_i u_{ik} \frac{x_{ij}}{(\mathbf{U}\mathbf{V})_{ij}}}, \qquad u_{ik} \leftarrow \frac{u_{ik} \sum_j v_{kj}\, x_{ij} / (\mathbf{U}\mathbf{V})_{ij}}{\sum_j v_{kj}}, \qquad u_{ik} \leftarrow \frac{u_{ik}}{\sum_i u_{ik}}. \tag{4}$$
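The update rules above can be sketched in NumPy as follows. This is an illustrative toy transcription only, not the reference LNMF code; the square-root form of the $\mathbf{V}$ update and the column normalization of $\mathbf{U}$ follow the description above, and `eps` (our addition) guards against division by zero:

```python
import numpy as np

def lnmf(X, K, n_iter=200, eps=1e-9, seed=0):
    """Toy sketch of LNMF-style multiplicative updates."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    U = rng.random((M, K)) + eps
    V = rng.random((K, N)) + eps
    for _ in range(n_iter):
        R = X / (U @ V + eps)               # elementwise x_ij / (UV)_ij
        V = np.sqrt(V * (U.T @ R))          # square-root update sparsifies V
        R = X / (U @ V + eps)
        U *= (R @ V.T) / (V.sum(axis=1)[None, :] + eps)
        U /= U.sum(axis=0, keepdims=True) + eps  # normalize basis columns
    return U, V

X = np.abs(np.random.default_rng(1).random((15, 25)))
U, V = lnmf(X, K=4)
```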

##### 2.3. Sparse Nonnegative Matrix Factorization (SNMF)

Similar to the LNMF algorithm, Liu et al. [13] incorporated linear sparse coding into NMF. The core idea of SNMF is to add a sparseness constraint on the encoding matrix $\mathbf{V}$ when conducting the matrix factorization. SNMF can learn a parts-based representation via fully multiplicative updates because it adopts a generalized Kullback-Leibler divergence instead of the mean squared error as the approximation error. Thus, the sparse NMF functional is
$$D(\mathbf{X} \,\|\, \mathbf{U}\mathbf{V}) = \sum_{i,j} \left( x_{ij} \log \frac{x_{ij}}{(\mathbf{U}\mathbf{V})_{ij}} - x_{ij} + (\mathbf{U}\mathbf{V})_{ij} \right) + \lambda \sum_{k,j} v_{kj}, \tag{5}$$
where $\lambda > 0$.

SNMF ensures sparseness by minimizing the sum of all entries $v_{kj}$. The multiplicative update rules for the matrices $\mathbf{U}$ and $\mathbf{V}$ are
$$v_{kj} \leftarrow v_{kj} \frac{\sum_i u_{ik}\, x_{ij} / (\mathbf{U}\mathbf{V})_{ij}}{1 + \lambda}, \qquad u_{ik} \leftarrow u_{ik} \sum_j v_{kj} \frac{x_{ij}}{(\mathbf{U}\mathbf{V})_{ij}}, \qquad u_{ik} \leftarrow \frac{u_{ik}}{\sum_i u_{ik}}. \tag{6}$$
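A minimal NumPy sketch of this scheme follows. It is illustrative only, not the authors' code: the $1 + \lambda$ divisor in the $\mathbf{V}$ update is what shrinks the encodings, and column normalization of $\mathbf{U}$ absorbs the remaining scale (`eps` is our addition, guarding against division by zero):

```python
import numpy as np

def snmf(X, K, lam=0.2, n_iter=300, eps=1e-9, seed=0):
    """Toy sketch of SNMF-style updates: KL-divergence fit plus an L1
    penalty lam * sum(V) that shrinks the encoding matrix V."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    U = rng.random((M, K)) + eps
    V = rng.random((K, N)) + eps
    for _ in range(n_iter):
        R = X / (U @ V + eps)              # elementwise x_ij / (UV)_ij
        V *= (U.T @ R) / (1.0 + lam)       # the 1 + lam divisor enforces sparsity
        R = X / (U @ V + eps)
        U *= R @ V.T                       # KL-divergence update for the basis
        U /= U.sum(axis=0, keepdims=True) + eps  # normalize basis columns
    return U, V

X = np.abs(np.random.default_rng(1).random((15, 25)))
U, V = snmf(X, K=4)
```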

#### 3. Graph Regularized Nonnegative Matrix Factorization with Sparse Coding (GRNMF_SC)

As mentioned earlier, He and Cai [20, 21] proposed the GNMF algorithm. By adding Laplacian regularization to NMF, GNMF can preserve the intrinsic structure of the data efficiently and can have stronger discriminative power than classic NMF when the dataset has an apparent geometrical structure. However, since GNMF imposes no sparseness constraint on the basis matrix $\mathbf{U}$ or the encoding matrix $\mathbf{V}$, it cannot learn a sufficiently sparse representation. In this section, we incorporate sparse coding into GNMF and propose the GRNMF_SC algorithm.

##### 3.1. Graph Regularized Nonnegative Matrix Factorization (GNMF)

GNMF first constructs an affinity graph to encode the geometrical information and then seeks a nonnegative matrix factorization which respects the graph structure. The procedure can be stated as follows.

*Step 1. *Consider a graph with $N$ vertices, where each vertex corresponds to a data point $\mathbf{x}_j$. The edge weight matrix $\mathbf{W}$ is defined as follows:
$$W_{jl} = \begin{cases} 1, & \text{if } \mathbf{x}_j \in N_p(\mathbf{x}_l) \text{ or } \mathbf{x}_l \in N_p(\mathbf{x}_j), \\ 0, & \text{otherwise,} \end{cases} \tag{7}$$
where $N_p(\mathbf{x}_j)$ denotes the set of $p$ nearest neighbors of $\mathbf{x}_j$. Define the graph Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{W}$, where $\mathbf{D}$ is a diagonal matrix whose entries are the column sums of $\mathbf{W}$, $D_{jj} = \sum_l W_{jl}$.
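To make this step concrete, here is a minimal NumPy sketch (illustrative only) that builds the 0/1 $p$-nearest-neighbor weight matrix over the columns of a data matrix and the Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{W}$:

```python
import numpy as np

def knn_graph_laplacian(X, p=3):
    """Build the 0/1 p-nearest-neighbor weight matrix W over the columns
    of X (one column per data point) and the graph Laplacian L = D - W."""
    N = X.shape[1]
    # pairwise squared Euclidean distances between columns
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((N, N))
    for j in range(N):
        nbrs = np.argsort(d2[j])[1:p + 1]  # skip the point itself
        W[j, nbrs] = 1.0
    W = np.maximum(W, W.T)                 # edge if either point is a neighbor of the other
    D = np.diag(W.sum(axis=1))
    return W, D - W

X = np.random.default_rng(0).random((5, 12))
W, L = knn_graph_laplacian(X, p=3)
```

The symmetrization via `np.maximum` mirrors the "or" in the weight definition above.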

*Step 2. *Let $f_k$ be the function that maps the data point $\mathbf{x}_j$ to the $k$th coordinate $v_{kj}$ of its encoding. GNMF uses $\int_{\mathcal{M}} \|\nabla f_k\|^2$ to measure the smoothness of $f_k$ along the geodesics in the intrinsic geometry of the data. When the data lie on a compact submanifold $\mathcal{M}$, a discrete approximation of this smoothness measure, summed over all coordinates, is computed as follows:
$$R = \frac{1}{2} \sum_{j,l=1}^{N} \|\mathbf{v}_j - \mathbf{v}_l\|^2 W_{jl} = \operatorname{Tr}(\mathbf{V}\mathbf{L}\mathbf{V}^T), \tag{8}$$
where $\mathbf{v}_j$ denotes the $j$th column of $\mathbf{V}$. By minimizing $R$, we obtain a mapping that is sufficiently smooth on the data manifold. An intuitive explanation of minimizing $R$ is that if two data points $\mathbf{x}_j$ and $\mathbf{x}_l$ are close, then their encodings $\mathbf{v}_j$ and $\mathbf{v}_l$ are similar to each other.
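The equality between the weighted pairwise form and the trace form of the smoothness measure above is easy to verify numerically; the following is a small self-check (not part of the algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10, 4
W = rng.integers(0, 2, size=(N, N)).astype(float)
W = np.maximum(W, W.T)            # symmetric 0/1 weights
np.fill_diagonal(W, 0)
D = np.diag(W.sum(axis=1))
L = D - W
V = rng.random((K, N))            # columns v_j play the role of the encodings

lhs = np.trace(V @ L @ V.T)
rhs = 0.5 * sum(W[j, l] * np.sum((V[:, j] - V[:, l]) ** 2)
                for j in range(N) for l in range(N))
```

Expanding the squared norm and using the symmetry of `W` turns the double sum into exactly the trace expression, which is what the assertion below confirms.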

*Step 3. *Finally, GNMF incorporates this constraint and minimizes the new objective function
$$O = \|\mathbf{X} - \mathbf{U}\mathbf{V}\|_F^2 + \lambda \operatorname{Tr}(\mathbf{V}\mathbf{L}\mathbf{V}^T) \tag{9}$$
subject to $\mathbf{U}$ and $\mathbf{V}$ being nonnegative, where $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix and $\lambda \geq 0$ is a regularization parameter.

##### 3.2. GRNMF_SC

In order to improve the sparseness of the coefficient matrix $\mathbf{V}$ while preserving the intrinsic structure of the high-dimensional data, we add an L1-norm regularization on the coefficient matrix. In this way, each sample in $\mathbf{X}$ is expected to be represented by a linear combination of only a few basis vectors in $\mathbf{U}$, so that sparseness is guaranteed. The new objective function is as follows:
$$O = \|\mathbf{X} - \mathbf{U}\mathbf{V}\|_F^2 + \lambda \operatorname{Tr}(\mathbf{V}\mathbf{L}\mathbf{V}^T) + \eta \sum_{k,j} v_{kj}, \tag{10}$$
where $\lambda \geq 0$ and $\eta \geq 0$ (since $\mathbf{V}$ is nonnegative, $\sum_{k,j} v_{kj}$ equals the L1 norm of $\mathbf{V}$). For S-GRNMF_SC, we set $W_{jl} = 1$ if $l_j = l_l$ and $W_{jl} = 0$ otherwise, where $l_j$ and $l_l$ denote the class labels of $\mathbf{x}_j$ and $\mathbf{x}_l$. For unsupervised GRNMF_SC, the definition of $\mathbf{W}$ is the same as GNMF's in Section 3.1. Finally, the multiplicative update rules for the above objective function can be represented as
$$v_{kj} \leftarrow v_{kj} \frac{(\mathbf{U}^T\mathbf{X} + \lambda \mathbf{V}\mathbf{W})_{kj}}{(\mathbf{U}^T\mathbf{U}\mathbf{V} + \lambda \mathbf{V}\mathbf{D} + \eta/2)_{kj}}, \qquad u_{ik} \leftarrow u_{ik} \frac{(\mathbf{X}\mathbf{V}^T)_{ik}}{(\mathbf{U}\mathbf{V}\mathbf{V}^T)_{ik}}. \tag{11}$$

GRNMF_SC has fully multiplicative update rules with two parameters. When $\lambda = 0$, GRNMF_SC reduces to NNSC [12]; when $\eta = 0$, GRNMF_SC reduces to GNMF. We also find that the update rule for the encoding matrix $\mathbf{V}$ can be rewritten in the following gradient descent format:
$$v_{kj} \leftarrow v_{kj} + \delta_{kj} \left[ \left( \mathbf{U}^T\mathbf{X} - \mathbf{U}^T\mathbf{U}\mathbf{V} + \lambda \mathbf{V}\mathbf{W} - \lambda \mathbf{V}\mathbf{D} \right)_{kj} - \frac{\eta}{2} \right], \qquad \delta_{kj} = \frac{v_{kj}}{(\mathbf{U}^T\mathbf{U}\mathbf{V} + \lambda \mathbf{V}\mathbf{D} + \eta/2)_{kj}}. \tag{12}$$
To preserve the nonnegativity of the coefficient matrix $\mathbf{V}$, the parameters $\lambda$ and $\eta$ should be controlled so that the step size $\delta_{kj}$ remains positive and small. The proof of our optimization scheme for GRNMF_SC is given next.

The core idea of the proof is to use the auxiliary function technique, as in the EM algorithm, and to update the basis matrix $\mathbf{U}$ and the coefficient matrix $\mathbf{V}$ in turn. We begin with the definition of the auxiliary function $G(v, v')$.

*Definition 1. *$G(v, v')$ is an auxiliary function for $F(v)$ if the following conditions are satisfied:
$$G(v, v') \geq F(v), \qquad G(v, v) = F(v). \tag{13}$$

The reason the auxiliary function is vital for the proof is the following theorem.

Theorem 2. *If $G$ is an auxiliary function of $F$, then $F$ is nonincreasing under the update*
$$v^{(t+1)} = \arg\min_{v} G(v, v^{(t)}). \tag{14}$$

*Proof. *$F(v^{(t+1)}) \leq G(v^{(t+1)}, v^{(t)}) \leq G(v^{(t)}, v^{(t)}) = F(v^{(t)})$.

We first derive the multiplicative update steps for the encoding matrix $\mathbf{V}$. The objective function of GRNMF_SC can be rewritten as
$$O = \operatorname{Tr}(\mathbf{X}\mathbf{X}^T) - 2\operatorname{Tr}(\mathbf{X}\mathbf{V}^T\mathbf{U}^T) + \operatorname{Tr}(\mathbf{U}\mathbf{V}\mathbf{V}^T\mathbf{U}^T) + \lambda \operatorname{Tr}(\mathbf{V}\mathbf{L}\mathbf{V}^T) + \eta \sum_{k,j} v_{kj}. \tag{15}$$

Considering any element $v_{kj}$ of $\mathbf{V}$, let $F_{kj}$ denote the part of the objective function relevant to $v_{kj}$, and use $F'_{kj}$ and $F''_{kj}$ to denote its first-order and second-order derivatives:
$$F'_{kj} = \left( -2\mathbf{U}^T\mathbf{X} + 2\mathbf{U}^T\mathbf{U}\mathbf{V} + 2\lambda \mathbf{V}\mathbf{L} \right)_{kj} + \eta, \qquad F''_{kj} = 2(\mathbf{U}^T\mathbf{U})_{kk} + 2\lambda L_{jj}. \tag{16}$$

We then define the auxiliary function as
$$G(v, v^{(t)}_{kj}) = F_{kj}(v^{(t)}_{kj}) + F'_{kj}(v^{(t)}_{kj})\,(v - v^{(t)}_{kj}) + \frac{(\mathbf{U}^T\mathbf{U}\mathbf{V} + \lambda \mathbf{V}\mathbf{D} + \eta/2)_{kj}}{v^{(t)}_{kj}} (v - v^{(t)}_{kj})^2. \tag{17}$$

We need to prove $G(v, v) = F_{kj}(v)$ and $G(v, v^{(t)}_{kj}) \geq F_{kj}(v)$.

It is easy to see that $G(v, v) = F_{kj}(v)$. Next, to prove $G(v, v^{(t)}_{kj}) \geq F_{kj}(v)$, we compare with the Taylor series expansion of $F_{kj}(v)$, which is exact because $F_{kj}$ is quadratic in $v$:
$$F_{kj}(v) = F_{kj}(v^{(t)}_{kj}) + F'_{kj}(v^{(t)}_{kj})\,(v - v^{(t)}_{kj}) + \left[ (\mathbf{U}^T\mathbf{U})_{kk} + \lambda L_{jj} \right] (v - v^{(t)}_{kj})^2. \tag{18}$$
The inequality holds because
$$(\mathbf{U}^T\mathbf{U}\mathbf{V})_{kj} = \sum_{l} (\mathbf{U}^T\mathbf{U})_{kl}\, v^{(t)}_{lj} \geq (\mathbf{U}^T\mathbf{U})_{kk}\, v^{(t)}_{kj}, \qquad \lambda(\mathbf{V}\mathbf{D})_{kj} = \lambda v^{(t)}_{kj} D_{jj} \geq \lambda v^{(t)}_{kj} L_{jj}, \tag{19}$$
together with $\eta/2 \geq 0$. Finally we obtain
$$\frac{(\mathbf{U}^T\mathbf{U}\mathbf{V} + \lambda \mathbf{V}\mathbf{D} + \eta/2)_{kj}}{v^{(t)}_{kj}} \geq (\mathbf{U}^T\mathbf{U})_{kk} + \lambda L_{jj}; \tag{20}$$
in other words, $G(v, v^{(t)}_{kj}) \geq F_{kj}(v)$.

According to Theorem 2, by taking the derivative of (17) with respect to $v$ and setting the result to zero, the update rule for $v_{kj}$ can be expressed as
$$v^{(t+1)}_{kj} = v^{(t)}_{kj} \frac{(\mathbf{U}^T\mathbf{X} + \lambda \mathbf{V}\mathbf{W})_{kj}}{(\mathbf{U}^T\mathbf{U}\mathbf{V} + \lambda \mathbf{V}\mathbf{D} + \eta/2)_{kj}}. \tag{21}$$
Similarly, we can obtain the update rule for $\mathbf{U}$, which involves no regularization terms:
$$u_{ik} \leftarrow u_{ik} \frac{(\mathbf{X}\mathbf{V}^T)_{ik}}{(\mathbf{U}\mathbf{V}\mathbf{V}^T)_{ik}}. \tag{22}$$
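Putting the derived update rules for $\mathbf{V}$ and $\mathbf{U}$ together, the full iteration can be sketched in NumPy as follows. This is a toy transcription for illustration, not the authors' implementation; `eps` (our addition) guards against division by zero, and the objective value is tracked so the monotone decrease guaranteed by the auxiliary-function argument can be checked empirically:

```python
import numpy as np

def grnmf_sc(X, K, W, lam=0.1, eta=0.1, n_iter=100, eps=1e-9, seed=0):
    """Sketch of GRNMF_SC: minimize ||X - UV||_F^2 + lam*Tr(V L V^T) + eta*sum(V)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    D = np.diag(W.sum(axis=1))
    L = D - W
    U = rng.random((M, K)) + eps
    V = rng.random((K, N)) + eps
    obj = []
    for _ in range(n_iter):
        # derived multiplicative update for the encoding matrix V
        V *= (U.T @ X + lam * (V @ W)) / (U.T @ U @ V + lam * (V @ D) + eta / 2 + eps)
        # standard multiplicative update for the basis matrix U
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        obj.append(np.linalg.norm(X - U @ V) ** 2
                   + lam * np.trace(V @ L @ V.T) + eta * V.sum())
    return U, V, obj

# ring-graph affinity over 12 samples, purely for demonstration
N = 12
W = np.zeros((N, N))
for j in range(N):
    W[j, (j + 1) % N] = W[(j + 1) % N, j] = 1.0
X = np.abs(np.random.default_rng(2).random((8, N)))
U, V, obj = grnmf_sc(X, K=3, W=W)
```

The $\mathbf{V}$ step decreases the full objective by the auxiliary-function argument, and the $\mathbf{U}$ step decreases the Frobenius term while leaving the other two terms untouched, so the tracked objective should be nonincreasing.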

#### 4. Experimental Results

The face recognition experiments were performed on two benchmarks, the ORL (48 × 48) database and the YALE (32 × 32) database, to test the recognition rates of the proposed S-GRNMF_SC algorithm. ORL contains 400 images, 10 different images per person for 40 individuals. For some individuals, the images were taken at different times. There are variations in facial expression (open or closed eyes, smiling or nonsmiling) and facial details (glasses or no glasses). The YALE database is more challenging than ORL; it contains 165 gray-scale images of 15 individuals. The images exhibit variations in lighting condition (left-light, center-light, and right-light), facial expression (normal, happy, sad, sleepy, and surprised), and facial details (glasses or no glasses). The algorithms NMF [5], LNMF [8], SNMF [13], GNMF [21], and FMD-NMF [23] are used for comparison: LNMF and SNMF are two classical sparseness-based NMF methods, while GNMF and FMD-NMF are two manifold-based NMF methods, making them ideal comparison targets for S-GRNMF_SC, which merges both kinds of merit. For all face recognition experiments, the nearest neighbor (NN) classifier was used, with distance measured by the Euclidean metric.

The clustering experiment was performed on the COIL20 image database, which contains gray-scale images of 20 objects viewed from varying angles, 72 images per object. Four popular clustering algorithms, $k$-means, NMF [5], SNMF [13], and GNMF [21], were used for comparison. After the matrix factorization, we have a lower-dimensional representation of each image, and clustering is then performed in this lower-dimensional space. $k$-means serves as a baseline method that simply performs clustering in the original feature space.
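The factorize-then-cluster pipeline described above can be sketched as follows. This toy version (illustrative only) uses plain NMF and a hard argmax assignment over the encoding columns in place of $k$-means:

```python
import numpy as np

def nmf_cluster(X, K, n_iter=200, eps=1e-9, seed=0):
    """Toy NMF-based clustering: factorize X ~ UV, then assign each sample
    (column j) to the basis component with the largest encoding v_kj."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    U = rng.random((M, K)) + eps
    V = rng.random((K, N)) + eps
    for _ in range(n_iter):
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return V.argmax(axis=0)

# two synthetic blobs in 5-D, concentrated on different axes
rng = np.random.default_rng(0)
A = rng.random((5, 15)) * 0.1 + np.array([1, 0, 0, 0, 0])[:, None]
B = rng.random((5, 15)) * 0.1 + np.array([0, 0, 0, 0, 1])[:, None]
labels = nmf_cluster(np.hstack([A, B]), K=2)
```

In the actual experiments a proper $k$-means step on the lower-dimensional representation would replace the argmax assignment.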

##### 4.1. Recognition Results on the ORL and YALE Databases

In this part, the face recognition experiment is carried out on the ORL and YALE databases. First, to evaluate each method's ability to deal with different training set sizes, for the ORL database a random labeled subset is taken to form the training set and the corresponding remaining labeled part forms the testing set; the same protocol is applied to the YALE database. For each given training set size, we averaged the results over 10 random splits. Tables 1 and 2 show the optimal average recognition rates obtained by NMF, LNMF, SNMF, GNMF, FMD-NMF, and S-GRNMF_SC with the same feature dimension over the 10 random splits.