Abstract

In this paper, we propose a sparseness constrained NMF method, named graph regularized nonnegative matrix factorization with sparse coding (GRNMF_SC). By combining manifold learning and sparse coding techniques, GRNMF_SC can efficiently extract basis vectors from the data space that preserve both the intrinsic manifold structure and the local features of the original data. While the objective function of our method is easy to state, solving it is nontrivial; we give a detailed derivation of the update rules and a rigorous proof of their convergence, which is a key contribution of this paper. Compared with sparseness constrained NMF and GNMF algorithms, GRNMF_SC learns a much sparser representation of the data while preserving its geometrical structure, which endows it with powerful discriminating ability. Furthermore, GRNMF_SC is generalized into supervised and unsupervised models to meet different demands. Experimental results on image recognition and clustering demonstrate the encouraging performance of GRNMF_SC compared with other state-of-the-art NMF methods.

1. Introduction

Previous studies have shown that there is psychological and physiological evidence for parts-based representation in the human brain [1–5]. NMF is one such parts-based matrix factorization method, which can find local features of the original data in a nonnegative sense. Indeed, the nonnegativity constraint leads to a parts-based representation because it allows only additive, not subtractive, combinations of components. Generally, the NMF method finds two nonnegative matrices whose product provides a good approximation to the original matrix and, at the same time, learns the parts of objects, which makes it very important in real applications such as face recognition [6–9] and document clustering [10, 11].

However, the standard NMF [5, 6] algorithm has several limitations, which have been discussed extensively. One notable limitation is that it does not always result in completely parts-based representations. Researchers have tried to solve this problem by incorporating sparseness constraints [12–14]. These approaches extend the NMF framework with an adjustable sparseness parameter to learn more localized representations. However, previous sparseness constrained NMF approaches paid main attention to the sparseness property while ignoring the preservation of the intrinsic geometric structure of the original data, which is vital for classification and clustering.

Recent research shows that when the data is sampled from a probability distribution that resides on or near a submanifold of the ambient space, manifold learning [15–19] can be used to preserve the intrinsic (geometrical) structure. In order to preserve the intrinsic structure of the original data, He and Cai proposed the graph regularized NMF (GNMF) method [20, 21], which incorporates the locality preserving projection (LPP) technique [22] into the NMF framework.

The experimental results showed that GNMF achieved higher recognition rates and better clustering performance than previous sparseness constrained NMF methods on some popular facial databases (e.g., the ORL and YALE databases) [20, 21]. This means that GNMF really works for datasets that have an apparent geometrical structure. However, GNMF still has a disadvantage: it cannot ensure the sparseness of the factorization results, which limits its discriminative ability and also increases the computational expense and memory usage. Hence, we are motivated to combine the advantages of the manifold and sparseness constraints and propose the GRNMF_SC algorithm, which can not only preserve the geometrical structure but also learn a much sparser representation of the input data. We emphasize that it is nontrivial to solve an objective function that simultaneously incorporates Laplacian regularization and a sparseness constraint into the NMF framework, because sophisticated L1-norm solvers cannot be adopted directly. We start from the initial idea behind GNMF and incorporate the sparseness constraint smoothly. The concrete steps are as follows: first, construct an objective function by imposing the above constraints; then, develop an optimization algorithm with multiplicative update rules to minimize this objective function; finally, prove that the algorithm converges to a local minimum. Furthermore, we extend GRNMF_SC to both supervised (S-GRNMF_SC) and unsupervised (GRNMF_SC) versions for image recognition and clustering, respectively; in clustering, class labels are not available. Experimental results demonstrate that supervised GRNMF_SC achieves higher recognition rates, especially in occluded face recognition, compared with typical sparseness-based and manifold-based NMF methods, and that unsupervised GRNMF_SC obtains better clustering performance than popular clustering algorithms.

The rest of the paper is organized as follows. In Section 2, a brief review of standard NMF and its typical sparse variants is given. In Section 3, the proposed GRNMF_SC method and a proof of its convergence are given. Experimental results on image recognition and clustering are presented in Section 4. We conclude the paper and plan the future work in Section 5.

2. Reviews of Standard NMF and Its Sparse Variants

In this section, we briefly describe the standard NMF algorithm [5] and two typical sparseness constrained NMF algorithms [8, 13]; we introduce these two algorithms because our method is inspired by them. The introduction of GNMF is deferred to Section 3, where it is presented together with GRNMF_SC.

2.1. Nonnegative Matrix Factorization (NMF)

First, standard NMF [5, 6] is introduced. Given a data matrix $X = [x_1, \ldots, x_N] \in \mathbb{R}^{m \times N}$, each column of $X$ is an $m$-dimensional sample vector with nonnegative values. NMF aims to find two nonnegative matrices $U \in \mathbb{R}^{m \times k}$ and $V \in \mathbb{R}^{k \times N}$ whose product can well approximate the original matrix, $X \approx UV$. That is, it minimizes the following cost function:
$$O_F = \|X - UV\|_F^2, \quad \text{s.t. } U \ge 0,\ V \ge 0, \qquad (1)$$
where $\|\cdot\|_F$ represents the Frobenius norm. Since the objective function is not convex in $U$ and $V$ jointly, we cannot expect to find the global minimum of $O_F$. Lee and Seung [5] presented the following iterative update rules:
$$u_{ik} \leftarrow u_{ik} \frac{(XV^{\mathsf T})_{ik}}{(UVV^{\mathsf T})_{ik}}, \qquad v_{kj} \leftarrow v_{kj} \frac{(U^{\mathsf T}X)_{kj}}{(U^{\mathsf T}UV)_{kj}}.$$
It was proven that these update rules find a local minimum of the objective function $O_F$.
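To make these update rules concrete, the following NumPy sketch implements them under the notation above. It is a minimal illustration, not the reference implementation of [5]: the random initialization, fixed iteration count, and the small constant `eps` added to denominators are our own simplifications.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9):
    """Standard NMF via Lee-Seung multiplicative updates for the
    Frobenius objective ||X - UV||_F^2, with X an (m, N) nonnegative array."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    U = rng.random((m, k))   # basis matrix, kept nonnegative throughout
    V = rng.random((k, n))   # encoding (coefficient) matrix
    for _ in range(n_iter):
        U *= (X @ V.T) / (U @ V @ V.T + eps)   # u_ik <- u_ik (XV^T)_ik / (UVV^T)_ik
        V *= (U.T @ X) / (U.T @ U @ V + eps)   # v_kj <- v_kj (U^TX)_kj / (U^TUV)_kj
    return U, V
```

Because both updates multiply by nonnegative factors, $U$ and $V$ stay nonnegative as long as they are initialized nonnegative.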

2.2. Local Nonnegative Matrix Factorization (LNMF)

Inspired by the original NMF algorithm [5], Li et al. [8] introduced the local nonnegative matrix factorization (LNMF) algorithm which is intended for learning spatially localized, parts-based representation of visual patterns. The aim of their work was to obtain truly localized, parts-based components by imposing three additional constraints on classical NMF.

Taking the factorization problem defined in (1), let $B = [b_{ij}] = U^{\mathsf T}U$ and $C = [c_{ij}] = VV^{\mathsf T}$, where $B, C \in \mathbb{R}^{k \times k}$. LNMF aims at learning local features by imposing the following three additional constraints on the NMF basis:

(1) Maximum orthogonality of $U$. Different bases should be as orthogonal as possible, so as to minimize redundancy between them. This can be imposed by minimizing $\sum_{ij} b_{ij}$.

(2) Maximum expressiveness of $V$. Only the components carrying the most important information should be retained. This is imposed by maximizing $\sum_i c_{ii}$.

(3) Maximum sparseness in the encoding matrix $V$. The coefficient matrix $V$ should contain as many zero elements as possible; in other words, the number of basis components required to represent the data matrix $X$ is minimized. This can be imposed by minimizing $\sum_i u_i^{\mathsf T} u_i = \sum_i b_{ii}$, where $u_i$ is a basis vector.

The incorporation of the above constraints leads to the following constrained divergence as the objective function for LNMF:
$$O_{\mathrm{LNMF}} = \sum_{ij}\Big( x_{ij}\log\frac{x_{ij}}{(UV)_{ij}} - x_{ij} + (UV)_{ij} \Big) + \alpha \sum_{ij} b_{ij} - \beta \sum_i c_{ii},$$
where $\alpha, \beta > 0$ are some constants. The solution to the above constrained minimization can be obtained with the following multiplicative update rules:
$$v_{kj} \leftarrow \sqrt{v_{kj} \sum_i u_{ik} \frac{x_{ij}}{(UV)_{ij}}}, \qquad u_{ik} \leftarrow \frac{u_{ik} \sum_j v_{kj}\, x_{ij} / (UV)_{ij}}{\sum_j v_{kj}}, \qquad u_{ik} \leftarrow \frac{u_{ik}}{\sum_i u_{ik}}.$$
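As a rough illustration of these rules, here is an LNMF-style loop in the same NumPy style. Since the update rules above are our reconstruction of Li et al. [8], this sketch should be read under that assumption rather than as a reference implementation.

```python
import numpy as np

def lnmf(X, k, n_iter=200, eps=1e-9):
    """LNMF-style multiplicative updates; the alpha/beta penalty weights
    drop out of the closed-form rules reconstructed above."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    U, V = rng.random((m, k)), rng.random((k, n))
    for _ in range(n_iter):
        R = X / (U @ V + eps)                        # elementwise x_ij / (UV)_ij
        V = np.sqrt(V * (U.T @ R))                   # square-root update sparsifies V
        R = X / (U @ V + eps)
        U = U * (R @ V.T) / (V.sum(axis=1) + eps)    # weighted basis update
        U /= U.sum(axis=0, keepdims=True) + eps      # normalize each basis column
    return U, V
```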

2.3. Sparse Nonnegative Matrix Factorization (SNMF)

Similar to the LNMF algorithm, Liu et al. [13] incorporated linear sparse coding into NMF. The core idea of SNMF is to add a sparseness constraint on the encoding matrix $V$ when conducting the matrix factorization. SNMF can learn a parts-based representation via fully multiplicative updates because it adopts the generalized Kullback-Leibler divergence instead of the mean squared error as the approximation error measure. Thus, the sparse NMF functional is
$$O_{\mathrm{SNMF}} = \sum_{ij}\Big( x_{ij}\log\frac{x_{ij}}{(UV)_{ij}} - x_{ij} + (UV)_{ij} \Big) + \lambda \sum_{ij} v_{ij},$$
where $\lambda > 0$ is a constant controlling the trade-off between approximation accuracy and sparseness.

SNMF ensures sparseness by minimizing the sum of all $v_{ij}$. The multiplicative update rules for the matrices $U$ and $V$ are
$$v_{kj} \leftarrow \frac{v_{kj}}{1+\lambda} \sum_i u_{ik}\frac{x_{ij}}{(UV)_{ij}}, \qquad u_{ik} \leftarrow u_{ik} \sum_j v_{kj}\frac{x_{ij}}{(UV)_{ij}}, \qquad u_{ik} \leftarrow \frac{u_{ik}}{\sum_i u_{ik}}.$$
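A matching sketch for SNMF follows; `lam` plays the role of $\lambda$, and its value here is an arbitrary placeholder.

```python
import numpy as np

def snmf(X, k, lam=0.1, n_iter=200, eps=1e-9):
    """SNMF-style updates (after Liu et al. [13]): generalized KL divergence
    plus the L1 penalty lam * sum_ij v_ij on the encoding matrix."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    U, V = rng.random((m, k)), rng.random((k, n))
    U /= U.sum(axis=0, keepdims=True)       # columns of U sum to one
    for _ in range(n_iter):
        R = X / (U @ V + eps)
        V *= (U.T @ R) / (1.0 + lam)        # the 1 + lam denominator shrinks V toward zero
        R = X / (U @ V + eps)
        U *= R @ V.T                        # basis update before renormalization
        U /= U.sum(axis=0, keepdims=True)   # restore the column-sum constraint
    return U, V
```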

3. Graph Regularized Nonnegative Matrix Factorization with Sparse Coding (GRNMF_SC)

As mentioned earlier, He and Cai [20, 21] proposed the GNMF algorithm. By applying Laplacian regularization to NMF, GNMF can efficiently preserve the intrinsic structure of the data and can have stronger discriminative power than classic NMF when the dataset has an apparent geometrical structure. However, since GNMF does not impose any sparseness constraint on the basis matrix $U$ or the encoding matrix $V$, it cannot learn a sufficiently sparse representation. In this section, we incorporate sparse coding into GNMF and propose the GRNMF_SC algorithm.

3.1. Graph Regularized Nonnegative Matrix Factorization (GNMF)

GNMF first constructs an affinity graph to encode the geometrical information and then seeks a nonnegative matrix factorization which respects the graph structure. The procedure can be stated as follows.

Step 1. Consider a graph with $N$ vertices, where each vertex corresponds to a data point. The edge weight matrix $W$ is defined as follows:
$$W_{jl} = \begin{cases} 1, & \text{if } x_j \in N_p(x_l) \text{ or } x_l \in N_p(x_j), \\ 0, & \text{otherwise,} \end{cases}$$
where $N_p(x_j)$ denotes the set of $p$ nearest neighbors of $x_j$. Define $L = D - W$, where $D$ is a diagonal matrix whose diagonal entries are the column sums of $W$, $D_{jj} = \sum_l W_{jl}$.
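Constructing $W$, $D$, and $L$ is straightforward; the sketch below builds the 0-1 $p$-nearest-neighbor graph with brute-force Euclidean distances, both of which are illustrative choices.

```python
import numpy as np

def knn_graph(X, p=5):
    """Build the 0-1 p-nearest-neighbor weight matrix W, the diagonal degree
    matrix D, and the graph Laplacian L = D - W. Columns of X are samples."""
    n = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # a point is not its own neighbor
    W = np.zeros((n, n))
    for j in range(n):
        W[j, np.argsort(dist[j])[:p]] = 1.0             # mark the p nearest neighbors
    W = np.maximum(W, W.T)        # symmetrize: edge if either point is a neighbor of the other
    D = np.diag(W.sum(axis=0))    # degrees (column sums of W) on the diagonal
    return W, D, D - W
```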

Step 2. Let $v_j$ (the $j$th column of $V$) be the mapping of the original data point $x_j$ onto the new basis. GNMF then uses
$$R = \frac{1}{2}\sum_{j,l=1}^{N} \|v_j - v_l\|^2\, W_{jl}$$
to measure the smoothness of this mapping along the geodesics in the intrinsic geometry of the data. When we consider the case that the data lies on a compact submanifold $\mathcal{M}$, $R$ is the discrete approximation of $\int_{\mathcal{M}} \|\nabla f\|^2$. By minimizing $R$, we get a mapping that is sufficiently smooth on the data manifold. An intuitive explanation of minimizing $R$ is that if two data points $x_j$ and $x_l$ are close, then their mappings $v_j$ and $v_l$ are also close to each other.
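For completeness, the standard expansion that turns this pairwise penalty into the trace form used in Step 3 (following the GNMF derivation [20, 21]) is
$$R = \frac{1}{2}\sum_{j,l=1}^{N}\|v_j - v_l\|^2 W_{jl} = \sum_{j=1}^{N} v_j^{\mathsf T} v_j D_{jj} - \sum_{j,l=1}^{N} v_j^{\mathsf T} v_l W_{jl} = \operatorname{Tr}(VDV^{\mathsf T}) - \operatorname{Tr}(VWV^{\mathsf T}) = \operatorname{Tr}(VLV^{\mathsf T}).$$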

Step 3. Finally, GNMF incorporates the constraint $R$ and minimizes the new objective function
$$O_{\mathrm{GNMF}} = \|X - UV\|_F^2 + \lambda \operatorname{Tr}(VLV^{\mathsf T})$$
with the constraint that $U$ and $V$ are nonnegative. $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix, and $\lambda \ge 0$ is a regularization parameter.

3.2. GRNMF_SC

In order to improve the degree of sparseness of the coefficient matrix $V$ and preserve the intrinsic structure of the high-dimensional data, we add an L1-norm regularization on the coefficient matrix. In this way, we expect that each sample in $X$ can be represented by a linear combination of only a few basis vectors in $U$, and thus the sparseness can be guaranteed. The new objective function is as follows:
$$O = \|X - UV\|_F^2 + \lambda \operatorname{Tr}(VLV^{\mathsf T}) + \eta \sum_{k,j} v_{kj},$$
where $\lambda \ge 0$ and $\eta \ge 0$. For S-GRNMF_SC, we set $W_{jl} = 1$ if $x_j \in N_p(x_l)$ or $x_l \in N_p(x_j)$ and $c_j = c_l$, and $W_{jl} = 0$ otherwise; $c_j$ and $c_l$ denote the class labels of $x_j$ and $x_l$. For unsupervised GRNMF_SC, the definition of $W$ is the same as GNMF's in Section 3.1. Finally, the multiplicative update rules for the above objective function can be represented as
$$u_{ik} \leftarrow u_{ik} \frac{(XV^{\mathsf T})_{ik}}{(UVV^{\mathsf T})_{ik}}, \qquad v_{kj} \leftarrow v_{kj} \frac{2(U^{\mathsf T}X)_{kj} + 2\lambda (VW)_{kj}}{2(U^{\mathsf T}UV)_{kj} + 2\lambda (VD)_{kj} + \eta}.$$
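A minimal NumPy sketch of these update rules, assuming the objective and graph matrices defined above, is given below; the values of `lam` ($\lambda$) and `eta` ($\eta$) are placeholders, and initialization and stopping criteria are simplified.

```python
import numpy as np

def grnmf_sc(X, k, W, D, lam=1.0, eta=0.1, n_iter=300, eps=1e-9):
    """Sketch of the GRNMF_SC multiplicative updates for
    ||X - UV||_F^2 + lam * Tr(V L V^T) + eta * sum_kj v_kj, with L = D - W."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    U, V = rng.random((m, k)), rng.random((k, n))
    for _ in range(n_iter):
        U *= (X @ V.T) / (U @ V @ V.T + eps)        # same U-step as standard NMF
        num = 2.0 * (U.T @ X) + 2.0 * lam * (V @ W)             # graph term pulls neighbors together
        den = 2.0 * (U.T @ U @ V) + 2.0 * lam * (V @ D) + eta   # eta shrinks V toward sparsity
        V *= num / (den + eps)
    return U, V
```

With $W$ and $D$ built from the unsupervised graph of Section 3.1, this routine gives GRNMF_SC; with the label-restricted graph defined above, it gives S-GRNMF_SC.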

GRNMF_SC has fully multiplicative update rules with two parameters. When setting $\lambda = 0$, GRNMF_SC reduces to NNSC [12]; when setting $\eta = 0$, GRNMF_SC reduces to GNMF. Also, we find that the update rule of the encoding matrix $V$ can be rewritten in the following gradient descent format:
$$v_{kj} \leftarrow v_{kj} - \delta_{kj} \frac{\partial O}{\partial v_{kj}}, \qquad \delta_{kj} = \frac{v_{kj}}{2(U^{\mathsf T}UV)_{kj} + 2\lambda(VD)_{kj} + \eta}.$$
In order to preserve the nonnegativity of the coefficient matrix $V$, we should control the parameters $\lambda$ and $\eta$ to keep the step size $\delta_{kj}$ positive as well as small. The proof of our optimization scheme for GRNMF_SC is given next.
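Substituting the gradient $\partial O/\partial v_{kj} = (-2U^{\mathsf T}X + 2U^{\mathsf T}UV + 2\lambda VL)_{kj} + \eta$ together with $L = D - W$ into this gradient step recovers the multiplicative rule, confirming that the two forms are equivalent:
$$v_{kj} - \delta_{kj}\frac{\partial O}{\partial v_{kj}} = v_{kj}\left(1 - \frac{2(U^{\mathsf T}UV)_{kj} + 2\lambda(VD)_{kj} + \eta - 2(U^{\mathsf T}X)_{kj} - 2\lambda(VW)_{kj}}{2(U^{\mathsf T}UV)_{kj} + 2\lambda(VD)_{kj} + \eta}\right) = v_{kj}\,\frac{2(U^{\mathsf T}X)_{kj} + 2\lambda(VW)_{kj}}{2(U^{\mathsf T}UV)_{kj} + 2\lambda(VD)_{kj} + \eta}.$$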

The core idea of the proof is to use the auxiliary function technique, similar to the EM algorithm, and then alternately update the basis matrix $U$ and the coefficient matrix $V$. We begin with the definition of an auxiliary function $G(v, v')$.

Definition 1. $G(v, v')$ is an auxiliary function for $F(v)$ if the following conditions are satisfied: $G(v, v') \ge F(v)$ and $G(v, v) = F(v)$.
The auxiliary function is vital for the proof owing to the following theorem.

Theorem 2. If $G$ is an auxiliary function of $F$, then $F$ is nonincreasing under the update
$$v^{(t+1)} = \arg\min_{v} G\big(v, v^{(t)}\big).$$

Proof. $F(v^{(t+1)}) \le G(v^{(t+1)}, v^{(t)}) \le G(v^{(t)}, v^{(t)}) = F(v^{(t)})$.
We first derive the multiplicative update steps for the encoding matrix $V$. The objective function of GRNMF_SC can be rewritten as
$$O = \operatorname{Tr}(XX^{\mathsf T}) - 2\operatorname{Tr}(XV^{\mathsf T}U^{\mathsf T}) + \operatorname{Tr}(UVV^{\mathsf T}U^{\mathsf T}) + \lambda \operatorname{Tr}(VLV^{\mathsf T}) + \eta \sum_{k,j} v_{kj}.$$
Considering any element $v_{kj}$ in $V$, let $F_{kj}$ denote the part of $O$ that depends on $v_{kj}$, and use $F'_{kj}$ and $F''_{kj}$ to denote the first-order and second-order derivatives of the objective function with respect to $v_{kj}$:
$$F'_{kj} = \big(-2U^{\mathsf T}X + 2U^{\mathsf T}UV + 2\lambda VL\big)_{kj} + \eta, \qquad F''_{kj} = 2(U^{\mathsf T}U)_{kk} + 2\lambda L_{jj}.$$
And then, we define the auxiliary function as
$$G\big(v, v_{kj}^{(t)}\big) = F_{kj}\big(v_{kj}^{(t)}\big) + F'_{kj}\big(v_{kj}^{(t)}\big)\big(v - v_{kj}^{(t)}\big) + \frac{(U^{\mathsf T}UV)_{kj} + \lambda (VD)_{kj} + \eta/2}{v_{kj}^{(t)}}\big(v - v_{kj}^{(t)}\big)^2.$$
We need to prove that $G(v, v_{kj}^{(t)}) \ge F_{kj}(v)$ and $G(v, v) = F_{kj}(v)$.
It is easy to verify that $G(v, v) = F_{kj}(v)$. Next, to prove $G(v, v_{kj}^{(t)}) \ge F_{kj}(v)$, we compare $G$ with the Taylor series expansion of $F_{kj}(v)$, which is exact because $F_{kj}$ is quadratic in $v$:
$$F_{kj}(v) = F_{kj}\big(v_{kj}^{(t)}\big) + F'_{kj}\big(v_{kj}^{(t)}\big)\big(v - v_{kj}^{(t)}\big) + \big[(U^{\mathsf T}U)_{kk} + \lambda L_{jj}\big]\big(v - v_{kj}^{(t)}\big)^2.$$
The inequality $G(v, v_{kj}^{(t)}) \ge F_{kj}(v)$ holds because
$$(U^{\mathsf T}UV)_{kj} = \sum_{l} (U^{\mathsf T}U)_{kl}\, v_{lj}^{(t)} \ge (U^{\mathsf T}U)_{kk}\, v_{kj}^{(t)}, \qquad \lambda (VD)_{kj} = \lambda\, v_{kj}^{(t)} D_{jj} \ge \lambda\, v_{kj}^{(t)} L_{jj},$$
and $\eta/2 \ge 0$. In other words, $G(v, v_{kj}^{(t)}) \ge F_{kj}(v)$.
According to Theorem 2, by taking the derivative of $G(v, v_{kj}^{(t)})$ with respect to $v$ and setting the result to zero, the update rule for $V$ can be expressed as
$$v_{kj} \leftarrow v_{kj} \frac{2(U^{\mathsf T}X)_{kj} + 2\lambda (VW)_{kj}}{2(U^{\mathsf T}UV)_{kj} + 2\lambda (VD)_{kj} + \eta}.$$
Similarly, using the same auxiliary function technique, we can obtain the update rule for $U$, on which no regularization is imposed:
$$u_{ik} \leftarrow u_{ik} \frac{(XV^{\mathsf T})_{ik}}{(UVV^{\mathsf T})_{ik}}.$$

4. Experimental Results

The face recognition experiments were performed on two benchmarks, the ORL 48 × 48 database and the YALE 32 × 32 database, to test the recognition rates of the proposed S-GRNMF_SC algorithm. ORL contains 400 images, 10 different images per person for 40 individuals. For some individuals, the images were taken at different times. There are variations in facial expressions (open or closed eyes and smiling or nonsmiling) and facial details (glasses or no glasses). The YALE database is more challenging than ORL; it contains 165 gray-scale images of 15 individuals. The images demonstrate variations in lighting condition (left-light, center-light, and right-light), facial expression (normal, happy, sad, sleepy, and surprised), and facial details (glasses or no glasses). The algorithms NMF [5], LNMF [8], SNMF [13], GNMF [21], and FMD-NMF [23] are used for comparison. LNMF and SNMF are two classic sparseness-based NMF methods; GNMF and FMD-NMF are two manifold-based NMF methods; they are the ideal comparison targets for S-GRNMF_SC, which merges both kinds of merit. For all face recognition experiments, the nearest neighbor (NN) classifier was used, and distance is measured by the Euclidean metric.
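The paper's protocol does not spell out how unseen images are encoded before NN classification; a common choice, assumed in the sketch below, is to fix the learned basis $U$ and run only the $V$-update on new images, then apply the 1-NN rule in the encoding space.

```python
import numpy as np

def nn_recognize(U, X_train, y_train, X_test, n_iter=100, eps=1e-9):
    """Hypothetical recognition pipeline: encode images against a fixed
    nonnegative basis U, then classify by 1-nearest-neighbor with the
    Euclidean metric. The encoding scheme (V-update with U fixed) is
    our assumption."""
    def encode(X):
        V = np.random.default_rng(0).random((U.shape[1], X.shape[1]))
        for _ in range(n_iter):                  # NMF V-step only; U stays fixed
            V *= (U.T @ X) / (U.T @ U @ V + eps)
        return V
    V_tr, V_te = encode(X_train), encode(X_test)
    preds = []
    for j in range(V_te.shape[1]):
        d = np.linalg.norm(V_tr - V_te[:, [j]], axis=0)  # distances to all train encodings
        preds.append(y_train[int(np.argmin(d))])
    return np.array(preds)
```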

The clustering experiment was performed on the COIL20 image library, which contains gray-scale images of 20 objects viewed from varying angles; each object has 72 images. Four popular clustering algorithms, k-means, NMF [5], SNMF [13], and GNMF [21], were used for comparison. After the matrix factorization, we have a lower-dimensional representation of each image, and the clustering is then performed in this lower-dimensional space. k-means is considered a baseline method, which simply performs clustering in the original feature space.

4.1. Recognition Results on ORL and YALE Database

In this part, the face recognition experiment is carried out on the ORL and YALE databases. First, to evaluate all methods' ability to deal with different training set sizes, for each database a random subset of Gm labeled images per individual is taken to form the training set, and the corresponding remaining images form the testing set. For each given Gm, we average the results over 10 random splits. Tables 1 and 2 show the optimal average recognition rates obtained by NMF, LNMF, SNMF, GNMF, FMD-NMF, and S-GRNMF_SC with the same feature dimension under different Gms over the 10 random splits.

Figure 1 shows the average recognition rates versus feature dimensions for all the competing algorithms. Note that the optimal average recognition rates are obtained over the whole range of feature dimensions, which are chosen from 0 to 100 with an interval of 10, under different Gms for the ORL and YALE databases.

From Tables 1 and 2 and Figure 1, it can be seen that the S-GRNMF_SC algorithm gains about 2% in average recognition rate over the FMD-NMF algorithm and delivers nearly 4% improvement over GNMF and SNMF. FMD-NMF obtains the next best results, because it also utilizes class label information and preserves the manifold structure like S-GRNMF_SC. The GNMF and SNMF methods perform comparatively close to FMD-NMF. NMF is slightly worse than SNMF, while the LNMF algorithm performs the worst; the reason is that LNMF pays main attention to keeping the bases orthogonal while ignoring the quality of the coefficients, which leads to poor classification performance.

4.2. Learning Basis Images from the ORL and YALE Database

In this subsection, we use NMF, LNMF, GNMF, FMD-NMF, and the proposed S-GRNMF_SC algorithm to learn 25 and 49 basis images from the ORL and YALE databases, respectively. Then we use the sparseness metric (SP) [24] to measure the sparseness of the basis matrix as well as the coefficient matrix. The sparseness measure, which is based on the relationship between the L1 norm and the L2 norm, is formulated as
$$\mathrm{SP}(u) = \frac{\sqrt{n} - \big(\sum_i |u_i|\big) \big/ \sqrt{\sum_i u_i^2}}{\sqrt{n} - 1},$$
where $u$ is an $n$-dimensional column vector of the matrix being measured. If all elements of $u$ are equal, $\mathrm{SP}(u)$ equals zero; if $u$ contains only a single nonzero element, $\mathrm{SP}(u)$ equals one. The basis images are shown in Figure 2. The sparseness of the basis matrix and the coefficient matrix is shown in Tables 3 and 4.
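The SP measure is easy to compute per column; the sketch below implements the formula above, and the column-averaging convention for scoring a whole matrix is our assumption about how such tables are produced.

```python
import numpy as np

def hoyer_sparseness(u, eps=1e-12):
    """Hoyer's sparseness measure [24]:
    SP(u) = (sqrt(n) - ||u||_1 / ||u||_2) / (sqrt(n) - 1).
    Returns 0 for a constant vector and 1 for a 1-sparse vector."""
    n = u.size
    l1 = np.abs(u).sum()
    l2 = np.sqrt((u ** 2).sum())
    return (np.sqrt(n) - l1 / (l2 + eps)) / (np.sqrt(n) - 1)

def matrix_sparseness(M):
    # average SP over the columns of a basis or coefficient matrix
    return float(np.mean([hoyer_sparseness(M[:, j]) for j in range(M.shape[1])]))
```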

From the results, it is clear that the bases obtained by NMF and GNMF are additive but not spatially localized for facial representation. In contrast, S-GRNMF_SC and FMD-NMF have clear advantages over them: the bases learned by S-GRNMF_SC and FMD-NMF not only retain the additive property but also capture discriminant information and preserve the manifold structure. In addition, though LNMF can obtain spatially localized bases, it does not take the intrinsic structure of the data into account, which is also important for the classification task.

4.3. Face Reconstruction on the ORL Database

In this subsection, reconstruction experiments with the NMF, LNMF, GNMF, FMD-NMF, and S-GRNMF_SC algorithms are performed on the ORL database. We selected 4 different image types from the ORL database: men's frontal view with no facial expression (MFV_NFE), men's frontal view wearing glasses (MFV_WG), men's lateral view with a smiling facial expression (MLV_SFE), and women's frontal view with no facial expression (WFV_NFE). The reconstructed images produced by these algorithms are compared with the original ones in Figure 3, where the leftmost images are the originals and the images to their right are the reconstructions obtained by NMF, LNMF, GNMF, GRNMF_SC, and FMD-NMF, respectively. It is evident that the reconstruction quality of GRNMF_SC is better than that of the others. This observation is validated by computing each method's reconstruction residual error from the pixel gray-value differences between the original and reconstructed images; the residual error is represented by the Frobenius norm of the residual matrix. The results are shown in Table 5. The reconstruction residual error of GRNMF_SC is the smallest in all cases except the first type. From these results, it is clear that the reconstruction quality of GRNMF_SC is robust to gender and different facial expressions. GNMF and FMD-NMF are next to GRNMF_SC. The LNMF scheme performs the worst, though it provides smoother reconstructions than the other methods.

4.4. Occluded Face Recognition on ORL Database

NMF and its variants are parts-based representation algorithms; they can achieve better recognition results than traditional appearance-based methods (e.g., LPP and LSDA) on occluded face databases, and S-GRNMF_SC performs better than other traditional NMF algorithms. In order to highlight the advantage of S-GRNMF_SC over other NMF algorithms on occluded faces, we give the following experimental results. The size of the occluding patch is 10 × 10, 15 × 15, 20 × 20, or 25 × 25 pixels, and its position is randomly selected. Figure 4 shows the occluded face images.
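The occlusion protocol can be simulated as below; the paper specifies only the patch sizes and the random position, so the fill value (a black patch) is an assumption.

```python
import numpy as np

def occlude(img, patch, rng=None):
    """Paste a patch x patch occluding block at a random position of a
    2-D face image; filling with 0 (black) is our assumption."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape
    r = rng.integers(0, h - patch + 1)   # random top-left corner
    c = rng.integers(0, w - patch + 1)
    out = img.copy()
    out[r:r + patch, c:c + patch] = 0.0
    return out

# e.g., one occluded copy of a face per tested patch size:
# occluded = [occlude(face, s) for s in (10, 15, 20, 25)]
```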

Figure 5 shows the recognition accuracies versus feature dimensions with occluding patch sizes of 10 × 10, 15 × 15, 20 × 20, and 25 × 25 pixels on the ORL database. Table 6 presents the optimal average recognition rates on the occluded ORL database with different sizes of occluding patch. From Figure 5 and Table 6, we can see that S-GRNMF_SC performs much better than the other algorithms, with about 5% to 20% improvement. The significant improvements are owing to the sparseness and manifold constraints in S-GRNMF_SC. Specifically, S-GRNMF_SC obtains a sparser representation than FMD-NMF, maintains more local information than GNMF, and retains more discriminant and geometrical information than standard NMF, LNMF, and SNMF. The occlusion experiment strongly supports the necessity of imposing the sparseness and manifold constraints simultaneously.

4.5. Clustering Experiment on COIL20 Database

In this subsection, we conducted a clustering experiment on the COIL20 image library. In order to randomize the experiments, we evaluate the clustering performance with different numbers of clusters k. For each given cluster number (except 20), 10 tests were conducted on different randomly chosen classes, and the average performance as well as the standard deviation was computed over these 10 tests. Regarding the parameter configuration, we report the four matrix factorization based methods (NMF, SNMF, GNMF, and GRNMF_SC) with the number of basis vectors equal to the number of clusters. There are three parameters in the GRNMF_SC algorithm: the number of nearest neighbors p in the graph and the regularization parameters λ and η; all three were set empirically.

The clustering result is evaluated by comparing the obtained label of each sample with the label provided by the dataset. Two metrics, the accuracy (AC) and the normalized mutual information (NMI), are used to measure the clustering performance; see [10, 21] for their detailed definitions. Table 7 shows the clustering results on COIL20; the mean and standard error of the performance are reported in the table.
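A sketch of this evaluation pipeline is shown below: k-means on the columns of the learned encoding $V$, AC computed via the best one-to-one matching of cluster labels to true labels (the Hungarian method), and NMI from scikit-learn. The use of scipy/scikit-learn here is our choice for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def cluster_and_score(V, y_true, n_clusters):
    """Cluster the columns of V with k-means, then report AC and NMI.
    Assumes y_true is coded as integers 0..n_clusters-1."""
    y_pred = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(V.T)
    C = np.zeros((n_clusters, n_clusters), dtype=int)   # contingency table
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    rows, cols = linear_sum_assignment(-C)              # best label permutation
    ac = C[rows, cols].sum() / len(y_true)
    nmi = normalized_mutual_info_score(y_true, y_pred)
    return ac, nmi
```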

As shown in Table 7, our GRNMF_SC achieves the best performance in all cases. GRNMF_SC aims to make the learned basis preserve the intrinsic manifold structure of the original data while ensuring the sparseness of the new representations under that basis. These properties guarantee that each image can be represented by a linear combination of only a few key basis vectors, which makes GRNMF_SC particularly suitable for image clustering. Consequently, by simply running k-means on the low-dimensional sparse representation, GRNMF_SC achieves very impressive clustering performance. In addition, GNMF gets better clustering results than SNMF and NMF, while k-means performs the worst among the compared methods.

5. Conclusions and Future Work

In this paper, GRNMF_SC is proposed, which combines the advantages of manifold learning and sparse coding. GRNMF_SC explicitly adds a sparseness constraint on the coefficient matrix $V$, which naturally leads to a sparse representation. While the objective function of our method is easy to state, solving it is nontrivial; we gave a detailed derivation of the update rules and a rigorous proof of their convergence, which is a key contribution of this paper. We implemented GRNMF_SC in both supervised (S-GRNMF_SC) and unsupervised versions. S-GRNMF_SC increases the recognition rates compared with standard NMF, SNMF, LNMF, GNMF, and FMD-NMF, especially in occluded face recognition. Unsupervised GRNMF_SC also improves the clustering performance compared with recent popular clustering algorithms.

It should be noted that, because the proposed GRNMF_SC method has fully multiplicative update rules with two parameters, one should carefully select these parameters to strike a balance between the weights of sparseness and discrimination according to the demands of real applications.

In future work, we will continue to investigate different NMF variants to find general rules, and apply advanced NMF methods [25] to problems in other real applications. For example, in [26], researchers successfully used an NMF algorithm to solve a problem in neurorehabilitation engineering. Although the NMF algorithm emerged several years ago, it will remain a very important research direction in the next few years.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

Chuang Lin acknowledges the financial support of Natural Science Foundation of China (no. 61272371) and Fundamental Research Funds for the Central Universities, Dalian University of Technology, China (no. DUT14QY16).