Mathematical Problems in Engineering

Volume 2008, Article ID 410674, 17 pages

http://dx.doi.org/10.1155/2008/410674

## Incremental Nonnegative Matrix Factorization for Face Recognition

¹College of Mathematics and Computational Science, Shenzhen University, Shenzhen 518060, China

²College of Computer Science, Chongqing University, Chongqing 400044, China

³School of Information Science & Technology, East China Normal University, Shanghai 200241, China

Received 25 May 2008; Accepted 5 June 2008

Academic Editor: Cristian Toma

Copyright © 2008 Wen-Sheng Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Nonnegative matrix factorization (NMF) is a promising approach for local feature extraction in face recognition tasks. However, almost all existing NMF-based methods have two major drawbacks. One is that the decomposition of large matrices is computationally expensive. The other is that learning must be repeated whenever the training samples or classes are updated. To overcome these two limitations, this paper proposes a novel incremental nonnegative matrix factorization (INMF) for face representation and recognition. The proposed INMF approach is based on a novel constraint criterion and our previous block strategy. It thus has several good properties, such as low computational complexity and a sparse coefficient matrix; moreover, the coefficient column vectors of different classes are orthogonal. In particular, it can be applied to incremental learning. Two face databases, namely the FERET and CMU PIE face databases, are selected for evaluation. Compared with PCA and some state-of-the-art NMF-based methods, our INMF approach gives the best performance.

#### 1. Introduction

Face recognition has been one of the most challenging problems in
computer science and information technology since 1990 [1, 2]. The approaches of
face recognition can be mainly categorized into two groups, namely geometric
feature-based and appearance-based [3]. The geometric features
are based on the short range phenomena of face images such as eyes, eyebrows,
nose, and mouth. The facial local features are learnt to form a face geometric
feature vector for face recognition. The appearance-based approach relies on
the global facial features, which generate an entire facial feature vector for
face classification. Nonnegative matrix factorization (NMF) [4, 5] belongs to the geometric
feature-based category, while principal component analysis (PCA) [6] is based on
the whole facial features. Both NMF and PCA are unsupervised learning methods
for face recognition. The basic idea of both approaches is to find
basis images using different criteria, so that all face images can be reconstructed from
the basis images. The basis images of PCA are called eigenfaces, which are the eigenvectors
corresponding to the largest eigenvalues of the total scatter matrix. NMF aims to perform
a nonnegative matrix decomposition of the training image matrix *V* such that $V \approx WH$, where *W* and *H* are the basis
image matrix and the coefficient matrix, respectively. The local image features
are learnt and stored in *W* as
column vectors. Following the success of applying NMF to learning the parts of
objects [4], many researchers have investigated NMF in depth, and
different NMF-based approaches have been developed [7–19]. Li et al. proposed a local NMF method [7]
by adding some spatial constraints. Wild et al. [8] utilized spherical *K*-means clustering to produce a structured
initialization for NMF. Buciu and Pitas [9] presented a DNMF method for learning facial expressions in a supervised
manner. However, DNMF does not guarantee convergence to a stationary limit
point. Kotsia et al. [15] thus
presented a modified DNMF method using projected gradients. Other supervised
variants of NMF were developed to enhance its classification power
[11–13, 19]. Hoyer [10] added sparseness constraints to NMF to find
solutions with desired degrees of sparseness. Lin [16, 17] modified the traditional NMF
updates using the projected gradient method and discussed their convergence. Recently,
Zhang et al. [18] proposed a topology
structure preservation constraint in NMF to improve the NMF performance.

However, to the best of
our knowledge, almost all existing NMF-based approaches encounter two major problems,
namely the time-consuming problem and the incremental learning problem. In most cases,
the training image matrix *V* is very
large, which leads to expensive computational cost for NMF-based schemes. Also,
when the training samples or classes are updated, NMF must repeat the whole
learning process. These drawbacks greatly restrict the practical application of NMF-based
methods to face recognition. To avoid the above two problems, this paper, motivated
by our previous work on incremental learning [19], proposes a supervised
incremental NMF (INMF) approach under a novel constraint NMF criterion, which aims
to cluster within-class samples tightly and enlarge the between-class distance simultaneously.
Our incremental strategy uses supervised local features, which are
regarded as the short-range phenomena of face images, for face
classification. Two publicly available face databases, namely the FERET and CMU PIE face
databases, are selected for evaluation. Experimental results show that our INMF
method outperforms the PCA [6], NMF [4], and BNMF [19] approaches in both
nonincremental and incremental learning for face recognition.

The rest of this paper is organized as follows: Section 2 briefly reviews the related works. Theoretical analysis and INMF algorithm design are given in Section 3. Experimental results are reported in Section 4. Finally, Section 5 draws the conclusions.

#### 2. Related Work

This section briefly introduces PCA [6], NMF [4], and BNMF [19] methods. Details are as follows.

##### 2.1. PCA

Principal component analysis (PCA), also called the eigenface method,
is a popular statistical appearance-based linear method for dimensionality
reduction in face recognition. PCA is based on the Karhunen-Loève
transform. It performs the eigenvalue decomposition of the total scatter matrix *S _{t}* and then selects the principal
components (eigenfaces) associated with the largest eigenvalues, which account for most of the variance of the data. All face
images can be expressed as linear combinations of these basis images
(eigenfaces). However, PCA does not exploit the class label information
that is useful for classification, and how to choose the principal components is still an open
problem. Therefore, PCA often does not give satisfactory performance in pattern
recognition tasks.
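As an illustration of the eigenface computation described above, the following sketch obtains the eigenvectors of the total scatter matrix $S_t$ from a singular value decomposition of the centred data matrix, a standard numerical shortcut assumed here rather than taken from the paper. The helper name `eigenfaces` and the random data are hypothetical.

```python
import numpy as np

def eigenfaces(X, k):
    """Return the mean face, the top-k eigenfaces, and their eigenvalues.

    X holds one vectorized face image per column (m pixels x n images).
    """
    mean = X.mean(axis=1, keepdims=True)
    A = X - mean                                  # centred data; S_t = A @ A.T / n
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    # Left singular vectors of A are eigenvectors of S_t, with eigenvalues s**2 / n,
    # already sorted from largest to smallest.
    return mean, U[:, :k], s[:k] ** 2 / X.shape[1]

rng = np.random.default_rng(0)
X = rng.random((64, 20))          # 20 synthetic "faces" with 64 pixels each
mean, U, lam = eigenfaces(X, k=5)
Y = U.T @ (X - mean)              # PCA coordinates of the training faces
X_hat = mean + U @ Y              # approximation of the faces from 5 eigenfaces
```

Computing the SVD of the $m \times n$ data matrix avoids forming the $m \times m$ scatter matrix explicitly, which matters when the number of pixels is much larger than the number of images.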

##### 2.2. NMF

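NMF aims to find
nonnegative matrices $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ such that

$$V \approx WH, \qquad (2.1)$$

where the nonnegative matrix $V \in \mathbb{R}^{m \times n}$ holds the $n$ training images, each vectorized into $m$ pixels, as its columns. Each
column of *W* is called a basis image,
while *H* is the coefficient matrix.
The basis number *r* is usually chosen
smaller than *n* for dimensionality
reduction. The divergence between *V* and *WH* is defined as

$$D(V \,\|\, WH) = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right). \qquad (2.2)$$

NMF (2.1) is equivalent to the following optimization problem:

$$\min_{W, H} D(V \,\|\, WH) \quad \text{subject to } W \ge 0,\ H \ge 0. \qquad (2.3)$$

The minimization problem (2.3) can be solved using the following iterative formulae, which converge to a local minimum:

$$W_{ia} \leftarrow W_{ia} \sum_{j} \frac{V_{ij}}{(WH)_{ij}} H_{aj}, \qquad W_{ia} \leftarrow \frac{W_{ia}}{\sum_{k} W_{ka}}, \qquad H_{aj} \leftarrow H_{aj} \sum_{i} W_{ia} \frac{V_{ij}}{(WH)_{ij}}. \qquad (2.4)$$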
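To make the multiplicative updates concrete, here is a minimal NumPy sketch of the Lee-Seung KL updates (W update, column normalization of W, H update) applied to a synthetic nonnegative matrix. The function names `kl_div` and `nmf_kl`, the random initialization, the small constant `eps` that guards against division by zero, and the fixed iteration count are illustrative choices of ours, not details taken from the paper.

```python
import numpy as np

def kl_div(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH); eps avoids log(0) and division by zero."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def nmf_kl(V, r, n_iter=300, seed=0, eps=1e-12):
    """Multiplicative KL updates for min D(V || WH) with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        W *= (V / (W @ H + eps)) @ H.T        # W_ia <- W_ia * sum_j V_ij/(WH)_ij * H_aj
        W /= W.sum(axis=0, keepdims=True)     # W_ia <- W_ia / sum_k W_ka
        H *= W.T @ (V / (W @ H + eps))        # H_aj <- H_aj * sum_i W_ia * V_ij/(WH)_ij
    return W, H

rng = np.random.default_rng(1)
V = rng.random((40, 4)) @ rng.random((4, 30))   # synthetic nonnegative "image" matrix
W, H = nmf_kl(V, r=4)
```

Because every factor in each update is nonnegative, W and H stay nonnegative without any projection step, which is the main appeal of the multiplicative form.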

##### 2.3. BNMF

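The
basic idea of BNMF is to perform NMF on $c$ small matrices $V_i \in \mathbb{R}^{m \times n_i}$ ($i = 1, \dots, c$), namely

$$V_i \approx W_i H_i, \qquad W_i \in \mathbb{R}^{m \times r_i},\ H_i \in \mathbb{R}^{r_i \times n_i},\ i = 1, \dots, c, \qquad (2.5)$$

where $V_i$ contains the training images of the $i$th class, and $c$ is the
number of classes. BNMF is obtained from (2.5) as follows:

$$V = [V_1, V_2, \dots, V_c] \approx WH, \qquad W = [W_1, W_2, \dots, W_c], \qquad H = \mathrm{diag}(H_1, H_2, \dots, H_c), \qquad (2.6)$$

where $W \in \mathbb{R}^{m \times r}$, $H \in \mathbb{R}^{r \times n}$, $r = \sum_{i=1}^{c} r_i$,
and $n = \sum_{i=1}^{c} n_i$ is the total number of training images.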
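The block construction can be sketched in a few lines: each class matrix $V_i$ is factorized on its own, the $W_i$ are stacked side by side, and the $H_i$ are placed on the diagonal of $H$. In this sketch plain KL NMF is used for each block; the helper names `block_diag` and `bnmf` and the synthetic data are hypothetical.

```python
import numpy as np

def nmf_kl(V, r, n_iter=300, seed=0, eps=1e-12):
    """Plain Lee-Seung KL multiplicative updates, as in (2.4)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        W *= (V / (W @ H + eps)) @ H.T
        W /= W.sum(axis=0, keepdims=True)
        H *= W.T @ (V / (W @ H + eps))
    return W, H

def block_diag(blocks):
    """Place H_1, ..., H_c on the diagonal of an otherwise zero matrix."""
    H = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    row = col = 0
    for b in blocks:
        H[row:row + b.shape[0], col:col + b.shape[1]] = b
        row, col = row + b.shape[0], col + b.shape[1]
    return H

def bnmf(class_blocks, r_i):
    """Factorize each V_i separately, then assemble W = [W_1..W_c] and H = diag(H_1..H_c)."""
    factors = [nmf_kl(V_i, r_i, seed=k) for k, V_i in enumerate(class_blocks)]
    W = np.hstack([W_i for W_i, _ in factors])
    H = block_diag([H_i for _, H_i in factors])
    return W, H, factors

rng = np.random.default_rng(1)
blocks = [rng.random((40, 5)) for _ in range(3)]   # c = 3 classes, n_i = 5 images each
W, H, factors = bnmf(blocks, r_i=2)
```

Since each $H_i$ occupies its own rows of $H$, the product $WH$ reproduces $[W_1H_1, \dots, W_cH_c]$ block by block, and coefficient columns of different classes have disjoint supports.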

#### 3. Proposed INMF

To overcome the drawbacks of existing NMF-based methods, this section proposes a novel incremental NMF (INMF) approach, which is based on a new constraint NMF criterion and our previous block technique [19]. Details are discussed below.

##### 3.1. Constraint NMF Criterion

The objective of our INMF
is to impose supervised class information on NMF such that between-class
distances increase while within-class distances simultaneously decrease.
To this end, we define the within-class scatter matrix of the *i*th
coefficient matrix $H_i$ as where is the mean column vector of the *i*th
class. The within-class samples of the *k*th
class will cluster tightly as becomes small.
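The exact scatter formula is not reproduced in this text. Assuming the usual definition $S_i = \frac{1}{n_i}\sum_{j}(h_j - \bar{h})(h_j - \bar{h})^{T}$ over the columns $h_j$ of $H_i$ with class mean $\bar{h}$, the sketch below shows why a small scatter means tight clustering: the trace of $S_i$ equals the average squared Euclidean distance of the class coefficients to their mean. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def within_class_scatter(H_i):
    """Scatter matrix of one class; H_i is (r_i, n_i), one coefficient vector per column."""
    mu = H_i.mean(axis=1, keepdims=True)   # mean column vector of the class
    D = H_i - mu                           # deviations from the class mean
    return D @ D.T / H_i.shape[1], mu

H_i = np.array([[1.0, 2.0, 3.0],
                [2.0, 2.0, 2.0]])
S, mu = within_class_scatter(H_i)
# trace(S) = mean over columns of ||h_j - mu||^2, here (1 + 0 + 1) / 3
spread = np.trace(S)
```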

Assume is an enlarging vector of ,
that is, with .
Then we have Inequality (3.2) implies that between-class distances are increased
as the mean vectors of classes in *H* are enlarged.
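One way to see why enlarging the class means can only increase between-class distances is sketched below. It assumes (notation ours, and not necessarily the exact form of (3.2)) that the class mean vectors $\mu_i, \mu_j$ in $H$ are nonnegative with disjoint supports, as the block structure of Section 2.3 gives, and that enlarging means increasing entries componentwise on the same supports, $\tilde{\mu}_k \ge \mu_k \ge 0$:

```latex
\begin{aligned}
\|\mu_i - \mu_j\|_2^2 &= \|\mu_i\|_2^2 + \|\mu_j\|_2^2
  && \text{since } \mu_i^{T}\mu_j = 0 \text{ (disjoint supports)},\\
\|\tilde{\mu}_i - \tilde{\mu}_j\|_2^2 &= \|\tilde{\mu}_i\|_2^2 + \|\tilde{\mu}_j\|_2^2
  \;\ge\; \|\mu_i\|_2^2 + \|\mu_j\|_2^2 = \|\mu_i - \mu_j\|_2^2
  && \text{since } \tilde{\mu}_k \ge \mu_k \ge 0 .
\end{aligned}
```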

Based on the above analysis, we define a constraint divergence
criterion function for the *k*th class
as follows: where parameters and .

Our entire INMF criterion function is then designed as below:

Based on criterion (3.4), we obtain the following constraint NMF (CNMF) update
rules (3.5)–(3.7), which will be
derived in the next subsection. Like the NMF updates (2.4), the iterative formulae (3.5)–(3.7) can be shown to converge to a
local minimum: where and is the *i*th
entry of vector , .

So, our entire INMF is performed as follows: where

##### 3.2. Convergence of Proposed Constraint NMF

This subsection derives the iterative formulae (3.5)–(3.7) and discusses their convergence under the constraint NMF criterion (3.3).

*Definition 3.1 (see [5]).* $G(H, H')$ is called an auxiliary function for $F(H)$ if $G$ satisfies $G(H, H') \ge F(H)$ and $G(H, H) = F(H)$, where $H$ and $H'$ are matrices of the same size.

Lemma 3.2 (see [5]). *If $G$ is an auxiliary function for $F$, then $F$ is nonincreasing under the update rule $H^{t+1} = \arg\min_{H} G(H, H^{t})$.*

To obtain iterative rule (3.7) and prove its convergence, one first constructs an auxiliary function for $F$ with fixed $W$.

Theorem 3.3. *If $F(H)$ is the value of criterion function (3.3) with fixed $W$, then $G(H, H^{t})$ is an auxiliary function for $F(H)$, where*

*Proof.* It can be directly verified
that $G(H, H) = F(H)$. So we only need to show
the inequality $G(H, H^{t}) \ge F(H)$. To this end, we
use the convex function .
For all and , it holds that Substituting into the above inequality, we have Therefore, $G(H, H^{t}) \ge F(H)$. This immediately proves the theorem.

Obviously, the corresponding function is also an auxiliary function for the entire constraint
NMF criterion (3.4).
Lemma 3.2 indicates that the criterion
is nonincreasing under the update rule (3.11). Setting $\partial G(H, H^{t}) / \partial H = 0$, we have From the above equation, the iterative formula (3.7) follows directly, and
Lemma 3.2 shows that (3.7) converges to a local minimum. For update rules (3.5)-(3.6),
the proof is similar to that of update rule (3.7), using the following auxiliary
function with fixed *H*:
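The auxiliary-function argument can be checked numerically for the plain KL divergence (2.2), the data-fitting part of the criterion. The sketch below keeps W fixed, applies the H update $H_{aj} \leftarrow H_{aj} \sum_i W_{ia} V_{ij}/(WH)_{ij} \big/ \sum_i W_{ia}$ from [5], and records the divergence after every step. It deliberately omits the class-constraint terms of (3.3), whose exact form is not reproduced here, so it illustrates Lemma 3.2 rather than the CNMF rule (3.7) itself; all names are ours.

```python
import numpy as np

def kl_div(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH) from (2.2)."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def update_H(V, W, H, eps=1e-12):
    """One H step with W fixed: H_aj <- H_aj * sum_i W_ia V_ij/(WH)_ij / sum_i W_ia."""
    return H * (W.T @ (V / (W @ H + eps))) / W.sum(axis=0)[:, None]

rng = np.random.default_rng(2)
V = rng.random((30, 20))
W = rng.random((30, 5))            # held fixed, as in the proof for the H update
H = rng.random((5, 20))
history = [kl_div(V, W @ H)]
for _ in range(50):
    H = update_H(V, W, H)
    history.append(kl_div(V, W @ H))
```

Each step minimizes the auxiliary function at the current point, so by Lemma 3.2 the recorded divergence never increases.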

##### 3.3. Incremental Learning

From the above analysis, our incremental learning algorithm is designed as follows:

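(i) *Sample incremental learning*. When a new training sample $x_0$ of the *i*th class is added to the training set, we denote $\tilde{V}_i = [V_i, x_0]$. Thus the training image matrix becomes

$$\tilde{V} = [V_1, \dots, V_{i-1}, \tilde{V}_i, V_{i+1}, \dots, V_c].$$

In this case, we only need to perform CNMF on the matrix $\tilde{V}_i$, that is, $\tilde{V}_i \approx \tilde{W}_i \tilde{H}_i$. The remaining decompositions $V_j \approx W_j H_j$ ($j \neq i$) need not be recomputed. So, sample incremental learning can be performed as follows:

$$\tilde{V} \approx [W_1, \dots, \tilde{W}_i, \dots, W_c]\, \mathrm{diag}(H_1, \dots, \tilde{H}_i, \dots, H_c).$$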

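(ii) *Class incremental learning*. When a new class, denoted by the matrix $V_{c+1}$, is added to the current training set, it forms a new training image matrix

$$\tilde{V} = [V_1, \dots, V_c, V_{c+1}].$$

As in item (i), none of the existing decompositions $V_j \approx W_j H_j$ ($j = 1, \dots, c$) needs to be recomputed. We only need to perform CNMF on the matrix $V_{c+1}$, that is, $V_{c+1} \approx W_{c+1} H_{c+1}$. Hence, class incremental learning can be implemented as below:

$$\tilde{V} \approx [W_1, \dots, W_c, W_{c+1}]\, \mathrm{diag}(H_1, \dots, H_c, H_{c+1}).$$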
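Both incremental cases reduce to bookkeeping over per-class factors, which the following sketch illustrates. Plain KL NMF stands in for CNMF, since the constrained rules (3.5)–(3.7) are not reproduced in this text; the class `IncrementalBlockNMF` and its method names are hypothetical.

```python
import numpy as np

def nmf_kl(V, r, n_iter=200, seed=0, eps=1e-12):
    """Plain Lee-Seung KL updates; a stand-in for CNMF, not the paper's rules."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        W *= (V / (W @ H + eps)) @ H.T
        W /= W.sum(axis=0, keepdims=True)
        H *= W.T @ (V / (W @ H + eps))
    return W, H

class IncrementalBlockNMF:
    """Keeps one factorization V_i ~ W_i H_i per class."""

    def __init__(self, r_per_class):
        self.r = r_per_class
        self.V, self.W, self.H = [], [], []

    def add_class(self, V_new):
        # (ii) class incremental learning: factorize only the new class matrix
        W_new, H_new = nmf_kl(V_new, self.r, seed=len(self.V))
        self.V.append(V_new)
        self.W.append(W_new)
        self.H.append(H_new)

    def add_sample(self, i, x0):
        # (i) sample incremental learning: refactorize only the enlarged block [V_i, x0]
        self.V[i] = np.column_stack([self.V[i], x0])
        self.W[i], self.H[i] = nmf_kl(self.V[i], self.r, seed=i)

    def assemble(self):
        # W = [W_1, ..., W_c] and H = diag(H_1, ..., H_c)
        W = np.hstack(self.W)
        H = np.zeros((sum(h.shape[0] for h in self.H), sum(h.shape[1] for h in self.H)))
        row = col = 0
        for h in self.H:
            H[row:row + h.shape[0], col:col + h.shape[1]] = h
            row, col = row + h.shape[0], col + h.shape[1]
        return W, H

rng = np.random.default_rng(3)
model = IncrementalBlockNMF(r_per_class=2)
for _ in range(3):                        # three initial classes, four images each
    model.add_class(rng.random((50, 4)))
W0_before = model.W[0].copy()
model.add_sample(1, rng.random(50))       # a new image for class index 1
model.add_class(rng.random((50, 4)))      # a fourth class arrives
W, H = model.assemble()
```

Only the affected block is ever refactorized, so the cost of an update depends on the size of one class rather than on the whole training set.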

##### 3.4. INMF Algorithm Design

Based on the above discussions, this subsection gives a detailed design of our INMF algorithm for face recognition. The algorithm involves two stages, namely a training stage and a recognition (testing) stage. Details are as follows.

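*Training stage*

*Step 1.* Perform CNMF (3.9) on the matrices $V_i$ ($i = 1, \dots, c$), namely, $V_i \approx W_i H_i$.

*Step 2.* INMF is obtained as $V \approx WH$, where $W = [W_1, \dots, W_c]$ and $H = \mathrm{diag}(H_1, \dots, H_c)$.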

If a new
training sample or class is added to the current training set, then the incremental
learning algorithm presented in Section 3.3 is applied in this stage.

*Recognition stage*

*Step 3.* Calculate the coordinates $h$ of a testing sample $y$ in the feature space by $h = W^{+} y$, where $W^{+}$ is the Moore-Penrose inverse of $W$.

*Step 4.* Compute the mean column vector of class $i$ and its coordinate vector $h_i$. The testing image is classified to class $k$ if $d(h_k, h) = \min_{i} d(h_i, h)$, where $d(h_i, h)$ denotes the Euclidean distance between vectors $h_i$ and $h$.
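Steps 3 and 4 can be sketched as follows. Two points are our assumptions: the class coordinate vector $h_i$ is taken to be $W^{+}$ applied to the class-mean image, and the data are synthetic, built so that the expected answer is known. `np.linalg.pinv` computes the Moore-Penrose inverse; the function name `classify` is hypothetical.

```python
import numpy as np

def classify(y, W_pinv, class_coords):
    """Step 3: h = W^+ y. Step 4: return the class whose coordinate vector is nearest to h."""
    h = W_pinv @ y
    dists = [np.linalg.norm(h_i - h) for h_i in class_coords]   # Euclidean d(h_i, h)
    return int(np.argmin(dists)), h

rng = np.random.default_rng(5)
W = rng.random((20, 4))                    # two classes with two basis images each
H = np.array([[1.0, 2.0, 0.0, 0.0],        # block-diagonal coefficient matrix
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0, 3.0]])
labels = np.array([0, 0, 1, 1])
V = W @ H                                  # synthetic training images
W_pinv = np.linalg.pinv(W)                 # Moore-Penrose inverse W^+
class_coords = [W_pinv @ V[:, labels == k].mean(axis=1) for k in range(2)]
y = W @ np.array([0.0, 0.0, 2.5, 2.5])     # test image generated from the class-1 bases
k, h = classify(y, W_pinv, class_coords)
```

Computing $W^{+}$ once during training keeps the per-query cost of recognition to one matrix-vector product plus $c$ distance evaluations.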

##### 3.5. Sparseness of Coefficient Matrix

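Let $x \in \mathbb{R}^{n}$ with $n > 1$ and $x \neq 0$. We define the sparseness function with the $L^1$ and $L^2$ norms [7] by

$$\mathrm{sparseness}(x) = \frac{\sqrt{n} - \|x\|_1 / \|x\|_2}{\sqrt{n} - 1}.$$

It can be seen that the sparseness function takes values in the range [0, 1].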
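The sparseness function translates directly into code. The short sketch below (function name ours) covers the two boundary cases: a vector with a single nonzero entry has sparseness 1, and a vector whose entries all have equal magnitude has sparseness 0.

```python
import numpy as np

def sparseness(x):
    """Sparseness from the L1/L2 ratio; assumes x has n > 1 entries and is not all zero."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ratio = np.abs(x).sum() / np.sqrt((x ** 2).sum())   # ||x||_1 / ||x||_2, in [1, sqrt(n)]
    return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1)

one_hot = sparseness([0.0, 0.0, 3.0, 0.0])   # single nonzero entry
flat = sparseness([1.0, 1.0, 1.0, 1.0])      # all entries equal
half = sparseness([1.0, 1.0, 0.0, 0.0])      # two equal nonzeros out of four
```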

For the INMF method, we have the following theorem for each column $h_i$ of $H$.

Theorem 3.4. *The sparseness of each column $h_i$ of $H$ in INMF satisfies the following estimate:*

*Proof.* Let where $h_i$ belongs to class *i* in $H$.

Obviously, Moreover, So, we have It concludes for that

In the experimental section, the parameters are selected as and for INMF on the FERET database. It can be calculated that

On the CMU PIE database, we select and and calculate that

These values show that each column of *H* in INMF is highly sparse. Moreover, since *H* is block diagonal, coefficient column
vectors of different classes have disjoint supports and are therefore automatically orthogonal.
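A bound of the kind stated in Theorem 3.4 follows from the block structure alone. Assuming that column $h_i$ has nonzero entries only in the $r_k$ rows of its class block, out of $r$ rows in total (notation ours), the Cauchy-Schwarz inequality gives the estimate below. We present it as an illustration, since the paper's exact expression is not reproduced in this text:

```latex
\begin{aligned}
\|h_i\|_1 &\le \sqrt{r_k}\,\|h_i\|_2
  \quad\Longrightarrow\quad
  \operatorname{sparseness}(h_i)
  = \frac{\sqrt{r} - \|h_i\|_1 / \|h_i\|_2}{\sqrt{r} - 1}
  \;\ge\; \frac{\sqrt{r} - \sqrt{r_k}}{\sqrt{r} - 1},\\
h_i^{T} h_j &= 0 \quad \text{whenever } h_i \text{ and } h_j \text{ belong to different classes.}
\end{aligned}
```

For a fixed block size $r_k$, the lower bound tends to 1 as the number of classes, and hence $r$, grows.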

##### 3.6. Computational Complexity

This section discusses
the computational complexity of our proposed INMF approach. The *i*th iterative procedure of the proposed INMF
includes two parts, namely and . For each matrix the iteration for needs multiplications, while for it needs multiplications. Therefore, the total number of multiplications
of our INMF is Similarly to
INMF, the number of multiplications of the NMF approach is . It can be seen that the
computational complexity of our INMF method is much lower than that of NMF.
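For a rough sense of the gap, consider only the dominant cost of the KL multiplicative updates, namely forming $WH$ and its products with $V$: about $mnr$ multiplications per iteration for an $m \times n$ matrix with $r$ bases. Assuming, for illustration only, $c$ classes of equal size with $n_i = n/c$ and $r_i = r/c$, the per-iteration counts compare as follows. These are our estimates, not the paper's exact expressions:

```latex
\underbrace{\sum_{i=1}^{c} m\, n_i\, r_i}_{\text{block scheme}}
  = c \cdot m \cdot \frac{n}{c} \cdot \frac{r}{c}
  = \frac{m n r}{c}
  \qquad \text{versus} \qquad
  \underbrace{m\, n\, r}_{\text{NMF}} ,
```

so under these assumptions each iteration of the block scheme needs about $c$ times fewer multiplications.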

#### 4. Experimental Results

In this section, the FERET and CMU PIE databases are selected to
evaluate the performance of our INMF method along with the BNMF, NMF, and PCA
methods. All images in both databases are aligned by the centers of the eyes and mouth
and then normalized with resolution . The original images with resolution
are reduced to wavelet feature faces with resolution after
a two-level D4 wavelet decomposition.
If there are negative pixels in the wavelet faces, we transform them into
nonnegative faces by simple translations. The nearest neighbor classifier with Euclidean
distance is used here. In the following experiments, the parameters are set
to for NMF, for BNMF and INMF, and for INMF. The stopping condition of the iterative
update is where is the value of the criterion function defined in (3.3) at the *n*th update, and the threshold is set to . We stop the iteration if
stopping condition (4.1) is met or after 1000 iterations.
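Two of these experimental details are sketched below. The exact form of stopping condition (4.1) and its threshold are not reproduced in this text, so the relative-change test and the default `tol` are assumptions; only the 1000-iteration cap follows the text. The per-image shift in `to_nonnegative` is likewise one plausible reading of "simple translations", and plain KL updates stand in for CNMF. All names are hypothetical.

```python
import numpy as np

def kl_div(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH)."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def to_nonnegative(x):
    """Shift a wavelet face up by its minimum if it has negative entries (assumed translation)."""
    x_min = x.min()
    return x - x_min if x_min < 0 else x

def factorize_with_stop(V, r, tol=1e-5, max_iter=1000, seed=0, eps=1e-12):
    """KL multiplicative updates, stopped by a relative-change test or after max_iter iterations."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    prev = kl_div(V, W @ H)
    for it in range(1, max_iter + 1):
        W *= (V / (W @ H + eps)) @ H.T
        W /= W.sum(axis=0, keepdims=True)
        H *= W.T @ (V / (W @ H + eps))
        cur = kl_div(V, W @ H)
        if abs(prev - cur) <= tol * max(prev, eps):   # assumed form of condition (4.1)
            break
        prev = cur
    return W, H, it

rng = np.random.default_rng(6)
faces = [to_nonnegative(rng.standard_normal(30)) for _ in range(12)]   # synthetic "wavelet faces"
V = np.column_stack(faces)
W, H, n_iter = factorize_with_stop(V, r=3)
```

A relative rather than absolute test keeps the threshold meaningful regardless of the overall scale of the images.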

##### 4.1. Face Databases

In the FERET database, we select 120 people, with 6 images for each individual. The six images are extracted from 4 different sets, namely Fa, Fb, Fc, and duplicate. Fa and Fb are sets of images taken with the same camera on the same day but with different facial expressions. Fc is a set of images taken with a different camera on the same day. Duplicate is a set of images taken around 6–12 months after the Fa and Fb photos. Details of the characteristics of each set can be found in [3]. Images from one individual are shown in Figure 1.

The CMU PIE database contains 68 people in total. There are 13 pose variations, ranging from full right-profile to full left-profile images, and 43 different lighting conditions (21 flashes with the ambient light on or off). In our experiment, for each person, we select 56 images, including 13 poses with neutral expression and 43 different lighting conditions in frontal view. Some images of one person are shown in Figure 2.

##### 4.2. Basis Face Images

This section shows the basis images of the training set learnt by the PCA, NMF, BNMF, and INMF approaches. Figure 3 shows 25 basis images of each approach on the CMU PIE database. It can be seen that the bases of all methods except PCA are additive. PCA extracts holistic facial features. INMF learns more local features than NMF and BNMF. Moreover, in all NMF-based approaches, the larger the number of basis images, the more localized the learned features.

##### 4.3. Results on FERET Database

This section reports the experimental results of nonincremental and incremental learning on the FERET database. All methods use the same training and testing face images. The experiments are repeated 10 times, and the average accuracies for different numbers of training images, along with the mean running times, are recorded.

###### 4.3.1. Nonincremental Learning

We randomly select $k$ images from each person for training, while the remaining $(6-k)$ images of each individual are used for testing. The average accuracies for $k$ ranging from 2 to 5 are recorded in Table 1 and plotted in Figure 4(a). With 2 training images, the recognition accuracies of INMF, BNMF, NMF, and PCA are 66.73%, 66.07%, 64.44%, and 34.33%, respectively. The performance of each method improves as the number of training images increases. When the number of training images is 5, the recognition accuracies of INMF, BNMF, NMF, and PCA are 83.08%, 81.67%, 80.25%, and 37.58%, respectively. In addition, Table 2 compares the average running times of the three NMF-based approaches. It can be seen that our INMF method gives the best performance in all cases of nonincremental learning on the FERET database.
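The evaluation protocol (a random per-person split, repeated 10 times) can be sketched as below. Only the label bookkeeping is shown, not image loading or classification, and the function name is ours. With 120 people, 6 images each, and $k = 2$, every run yields 240 training and 480 test images.

```python
import numpy as np

def per_class_split(labels, n_train, rng):
    """Pick n_train random images per class for training; the rest form the test set."""
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)

labels = np.repeat(np.arange(120), 6)      # 120 people x 6 images (FERET subset)
rng = np.random.default_rng(0)
splits = [per_class_split(labels, 2, rng) for _ in range(10)]   # 10 random repetitions
```

The reported accuracy for a given $k$ is then the mean over the 10 splits.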

###### 4.3.2. Class Incremental Learning

For 119 people, we randomly select 3 images from each individual for training and then add a new class to the training set. NMF must repeat the learning on the whole training set, while BNMF and INMF only need to perform incremental training on the newly added class. The average accuracies and the mean running times are recorded in Table 3 (plotted in Figure 6(a)) and Table 4, respectively.

Compared with the NMF and BNMF approaches, the proposed method gives accuracy improvements of around 5% and 1.5%, respectively. INMF runs around 2 times faster than NMF when training with 119 individuals and around 219 times faster in class-incremental learning with 120 individuals. Overall, our INMF gives the best performance on the FERET database.

##### 4.4. Results on CMU PIE Database

The experimental setting on the CMU PIE database is similar to that on the FERET database. It also includes two parts, namely nonincremental learning and incremental learning. The experiments are repeated 10 times, and the average accuracies for different numbers of training images, along with the mean running times, are recorded for comparison. Details are as follows.

###### 4.4.1. Nonincremental Learning

For each individual, $k$ images are randomly selected for training, while the remaining $(56-k)$ images are used for testing. The average recognition rates and mean running times are tabulated in Table 5 (plotted in Figure 4(b)) and Table 6, respectively. It can be seen that, with 7 training images, the recognition accuracies of INMF, BNMF, NMF, and PCA are 68.91%, 68.58%, 66.21%, and 23.94%, respectively. When the number of training images is 28, the recognition accuracies of INMF, BNMF, NMF, and PCA are 77.18%, 76.64%, 71.77%, and 27.51%, respectively; in this case, the proposed method improves the accuracy by around 49% over PCA and 5% over NMF. The performance of INMF is slightly better than that of BNMF, while the computational efficiency of INMF is much higher than that of BNMF.

###### 4.4.2. Sample Incremental Learning

We randomly select 7 images from each person for training and use the remaining 49 images for testing. In the sample-incremental learning stage, 7, 14, and 21 images of the first individual are added to the training set, respectively, while the training images of the other individuals are kept unchanged. Table 7 (Figure 5) and Table 8 show the average recognition accuracies and the mean running times, respectively. Experimental results show that our INMF method gives the best performance in all cases.

###### 4.4.3. Class Incremental Learning

For 67 people, we randomly select 7 images from each individual for training and then add a new class to the training set. NMF must repeat the learning on the whole training set, whereas BNMF and INMF only need to perform incremental learning on the newly added class. The average recognition rates and the mean running times are recorded in Table 9 (plotted in Figure 6(b)) and Table 10, respectively. Experimental results show that INMF outperforms BNMF and NMF in both recognition rate and computational efficiency.

#### 5. Conclusions

This paper proposed a novel constraint INMF method to address
the time-consuming problem and the incremental learning problem of existing NMF-based
approaches for face recognition. INMF has several good properties: low
computational complexity, a sparse coefficient matrix, orthogonal coefficient
column vectors for different classes in the coefficient matrix *H*, and, in particular, support for incremental learning.
Experimental results on the FERET and CMU PIE face databases show that INMF
outperforms the PCA, NMF, and BNMF approaches in both nonincremental and
incremental learning.

#### Acknowledgments

This work is supported by NSF of China (60573125, 60603028) and in part by the Program for New Century Excellent Talents of Educational Ministry of China (NCET-06-0762), NSF of Chongqing CSTC (CSTC2007BA2003). The authors would like to thank the US Army Research Laboratory for contribution of the FERET database and CMU for the CMU PIE database.

#### References

- R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," *Proceedings of the IEEE*, vol. 83, no. 5, pp. 705–741, 1995.
- W. Zhao, R. Chellappa, A. Rosenfeld, and J. Phillips, "Face recognition: a literature survey," Tech. Rep. CFAR-TR00-948, University of Maryland, College Park, Md, USA, 2000.
- R. Brunelli and T. Poggio, "Face recognition: features versus templates," *IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. 15, no. 10, pp. 1042–1052, 1993.
- D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," *Nature*, vol. 401, no. 6755, pp. 788–791, 1999.
- D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in *Proceedings of the Advances in Neural Information Processing Systems (NIPS '01)*, vol. 13, pp. 556–562, Vancouver, Canada, December 2001.
- M. Turk and A. Pentland, "Eigenfaces for recognition," *Journal of Cognitive Neuroscience*, vol. 3, no. 1, pp. 71–86, 1991.
- S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in *Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01)*, vol. 1, pp. 207–212, Kauai, Hawaii, USA, December 2001.
- S. Wild, J. Curry, and A. Dougherty, "Improving non-negative matrix factorizations through structured initialization," *Pattern Recognition*, vol. 37, no. 11, pp. 2217–2232, 2004.
- I. Buciu and I. Pitas, "A new sparse image representation algorithm applied to facial expression recognition," in *Proceedings of the 14th IEEE Workshop on Machine Learning for Signal Processing (MLSP '04)*, pp. 539–548, Sao Luis, Brazil, September-October 2004.
- P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," *Journal of Machine Learning Research*, vol. 5, pp. 1457–1469, 2004.
- Y. Xue, C. S. Tong, W.-S. Chen, and W. Zhang, "A modified non-negative matrix factorization algorithm for face recognition," in *Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06)*, vol. 3, pp. 495–498, Hong Kong, August 2006.
- S. Zafeiriou, A. Tefas, I. Buciu, and I. Pitas, "Exploiting discriminant information in nonnegative matrix factorization with application to frontal face verification," *IEEE Transactions on Neural Networks*, vol. 17, no. 3, pp. 683–695, 2006.
- I. Buciu and I. Pitas, "NMF, LNMF, and DNMF modeling of neural receptive fields involved in human facial expression perception," *Journal of Visual Communication and Image Representation*, vol. 17, no. 5, pp. 958–969, 2006.
- A. Pascual-Montano, J. M. Carazo, K. Kochi, D. Lehmann, and R. D. Pascual-Marqui, "Nonsmooth nonnegative matrix factorization (nsNMF)," *IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. 28, no. 3, pp. 403–415, 2006.
- I. Kotsia, S. Zafeiriou, and I. Pitas, "A novel discriminant non-negative matrix factorization algorithm with applications to facial image characterization problems," *IEEE Transactions on Information Forensics and Security*, vol. 2, no. 3, pp. 588–595, 2007.
- C.-J. Lin, "Projected gradient methods for nonnegative matrix factorization," *Neural Computation*, vol. 19, no. 10, pp. 2756–2779, 2007.
- C.-J. Lin, "On the convergence of multiplicative update algorithms for nonnegative matrix factorization," *IEEE Transactions on Neural Networks*, vol. 18, no. 6, pp. 1589–1596, 2007.
- T. Zhang, B. Fang, Y. Y. Tang, G. He, and J. Wen, "Topology preserving non-negative matrix factorization for face recognition," *IEEE Transactions on Image Processing*, vol. 17, no. 4, pp. 574–584, 2008.
- B. B. Pan, W. S. Chen, and C. Xu, "Incremental learning of face recognition based on block non-negative matrix factorization," to appear in *Computer Application Research*, 2008 (in Chinese).