Direct Neighborhood Discriminant Analysis for Face Recognition

Cheng, Miao; Fang, Bin; Tang, Yuan Yan; Wen, Jing

doi:https://doi.org/10.1155/2008/825215

Mathematical Problems in Engineering


Special Issue

Short Range Phenomena: Modeling, Computational Aspects, and Applications


Research Article | Open Access

Volume 2008 | Article ID 825215 | https://doi.org/10.1155/2008/825215


Academic Editor: Ming Li

Received 21 Mar 2008

Accepted 16 Apr 2008

Published 01 Sept 2008

Abstract

Face recognition is a challenging problem in computer vision and pattern recognition. Recently, many techniques based on local geometrical structure have been presented to obtain low-dimensional representations of face images with enhanced discriminatory power. However, these methods suffer from the small sample size (SSS) problem or from the high computational complexity of high-dimensional data. To overcome these problems, we propose a novel local manifold structure learning method for face recognition, named direct neighborhood discriminant analysis (DNDA), which separates nearby samples of different classes and preserves the local within-class geometry in two successive steps. In addition, DNDA does not need PCA preprocessing to reduce the dimension to a large extent, which avoids the loss of discriminative information. Experiments conducted on the ORL, Yale, and UMIST face databases show the effectiveness of the proposed method.

1. Introduction

Many pattern recognition and data mining problems involve data in very high-dimensional spaces. In the past few decades, face recognition (FR) has become one of the most active topics in machine vision and pattern recognition, where the feature dimension of the data is usually very large and can hardly be handled directly. To achieve a high recognition rate in FR, numerous feature extraction and dimension reduction methods have been proposed to find a low-dimensional feature representation with enhanced discriminatory power. Among these methods, two state-of-the-art FR methods, principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2], have proven to be useful tools for dimensionality reduction and feature extraction.

LDA is a popular supervised feature extraction technique for pattern recognition, which seeks a set of projective directions that simultaneously maximize the between-class scatter S_b and minimize the within-class scatter S_w. Although successful in many cases, many LDA-based algorithms suffer from the so-called “small sample size” (SSS) problem, which arises when the number of available samples is much smaller than the dimensionality of the samples and is particularly problematic in FR applications. To solve this problem, many extensions of LDA have been developed. Generally, the approaches that address the SSS problem can be divided into three categories, namely, Fisherface methods, regularization methods, and subspace methods. Fisherface methods incorporate PCA into the LDA framework as a preprocessing step; LDA is then performed in the lower dimensional PCA subspace [2], where the within-class scatter matrix is no longer singular. Regularization methods [3, 4] add a scaled identity matrix to the scatter matrix so that the perturbed scatter matrix becomes nonsingular. However, Chen et al. [5] proved that the null space of S_w contains the most discriminative information when the SSS problem takes place, and proposed the null space LDA (NLDA) method, which extracts only the discriminant features present in the null space of S_w. Later, Yu and Yang [6] utilized the discriminatory information of both S_b and S_w and proposed a direct-LDA (DLDA) method to solve the SSS problem.

Recently, the interest in finding the manifold structure of high-dimensional data has led to the wide application of manifold learning in data mining and machine learning. Among these methods, Isomap [7], LLE [8], and Laplacian eigenmaps [9, 10] are representative techniques. Based on the locality preserving concept, some excellent local embedding analysis techniques have been proposed to find the manifold structure from local nearby data [11, 12]. However, these methods are designed to preserve the local geometrical structure of the original high-dimensional data in the lower dimensional space rather than to provide good discrimination ability. In order to obtain better classification, some supervised learning techniques have been proposed that incorporate discriminant information into the locality preserving learning techniques [13–15]. Moreover, Yan et al. [15] explain the manifold learning techniques and the traditional dimensionality reduction methods within a unified framework that is defined in a graph embedding way instead of a kernel view [16]. However, the SSS problem still exists in the graph embedding-based discriminant techniques. To deal with this problem, PCA is usually performed as a preprocessing step to reduce the dimension [11, 15].

In this paper, we present a two-stage feature extraction technique named direct neighborhood discriminant analysis (DNDA). In contrast to other geometrical structure learning work, our method does not require a PCA step. Thus, more discriminant information can be kept for FR, and as a result improved performance is expected. The rest of the paper is structured as follows: we give a brief review of LDA and DLDA in Section 2. We then introduce in Section 3 the proposed method for dimensionality reduction and feature extraction in FR. The effectiveness of our method is evaluated in a set of FR experiments in Section 4. Finally, we give concluding remarks in Section 5.

2. Review of LDA and DLDA

2.1. LDA

LDA is a very popular technique for linear feature extraction and dimensionality reduction [2]. It chooses the basis vectors of the transformed space as those directions of the original space that maximize the ratio of the between-class scatter to the within-class scatter. Formally, the goal of LDA is to seek the optimal projection matrix $W$ that maximizes the following quotient, the Fisher criterion:

$$W_{\mathrm{opt}} = \arg\max_{W} \frac{\left|W^T S_b W\right|}{\left|W^T S_w W\right|},$$

where $S_b$ is the between-class scatter matrix and $S_w$ is the within-class scatter matrix. $W$ can be formed by the set of generalized eigenvectors $w_i$ of the following eigenanalysis problem:

$$S_b w_i = \lambda_i S_w w_i.$$

When the inverse of $S_w$ exists, the generalized eigenvectors can be obtained by eigenvalue decomposition of $S_w^{-1} S_b$. However, in FR problems one usually confronts the difficulty that the within-class scatter matrix $S_w$ is singular (the SSS problem). The so-called PCA plus LDA approach [2] is a very popular technique intended to overcome this difficulty.
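The eigenanalysis described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the paper: the function name `fisher_lda`, the column-wise data layout, and the synthetic two-class data are all assumptions made for the example, and the code only works when the within-class scatter matrix is nonsingular.

```python
import numpy as np

def fisher_lda(X, y, n_components):
    """Leading Fisher directions for X (d x N, samples as columns) and labels y."""
    d = X.shape[0]
    m = X.mean(axis=1, keepdims=True)                 # total mean
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[:, y == c]
        m_c = Xc.mean(axis=1, keepdims=True)          # class mean
        S_b += Xc.shape[1] * (m_c - m) @ (m_c - m).T  # between-class scatter
        S_w += (Xc - m_c) @ (Xc - m_c).T              # within-class scatter
    # Eigen-decomposition of S_w^{-1} S_b; breaks down when S_w is singular (SSS).
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real, evals[order].real

# Synthetic, illustrative data: two 2-D classes separated along the first axis.
rng = np.random.default_rng(0)
X = np.hstack([rng.standard_normal((2, 200)),
               rng.standard_normal((2, 200)) + [[5.0], [0.0]]])
y = np.repeat([0, 1], 200)
W, lam = fisher_lda(X, y, n_components=1)
```

With face images the dimension d is far larger than the number of samples, so `np.linalg.solve` would fail or be unreliable here, which is exactly the SSS difficulty discussed in the text.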

2.2. DLDA

To take discriminant information of both S_b and S_w into account without conducting PCA, a direct LDA (DLDA) technique has been presented by Yu and Yang [6]. The basic idea behind the approach is that no significant information will be lost if the null space of S_b is discarded. Based on the assumption, it can be concluded that the optimal discriminant features exist in the range space of S_b.

Consider a multiclass classification problem with a data matrix $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$, where each column $x_i$ represents a sample. Suppose $X$ is composed of $c$ classes, the $i$th class consists of $N_i$ samples, and the total number of samples is $N = \sum_{i=1}^{c} N_i$. Then, the between-class scatter matrix is defined as

$$S_b = \sum_{i=1}^{c} N_i (m_i - m)(m_i - m)^T,$$

where $m_i$ is the mean sample of the $i$th class and $m$ denotes the total mean sample. Similarly, the within-class scatter matrix is defined as

$$S_w = \sum_{i=1}^{c} \sum_{j=1}^{N_i} \left(x_j^{(i)} - m_i\right)\left(x_j^{(i)} - m_i\right)^T,$$

where $x_j^{(i)}$ denotes the $j$th sample of the $i$th class.

In DLDA, eigenvalue decomposition is first performed on the between-class scatter matrix $S_b$. Suppose the rank of $S_b$ is $t$, let $\Lambda_b$ be the diagonal matrix with the $t$ largest eigenvalues on the main diagonal in descending order, and let $V_b$ be the eigenvector matrix that consists of the $t$ corresponding eigenvectors. Then, the dimensionality of the data $x$ is reduced from $d$ to $t$ by the projection matrix

$$Z = V_b \Lambda_b^{-1/2}, \qquad Z^T S_b Z = I,$$

and eigenvalue decomposition is performed on the within-class scatter matrix of the projected samples,

$$\tilde{S}_w = Z^T S_w Z.$$

Let $\Lambda_w$ be the eigenvalue matrix of $\tilde{S}_w$ in ascending order and $U_w$ be the corresponding eigenvector matrix. Therefore, the final transformation matrix is given by

$$W = Z U_w \Lambda_w^{-1/2}.$$
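The two-stage DLDA procedure can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the function name `dlda`, the factorizations `G_b` and `G_w` of the scatter matrices, the relative tolerance `tol`, and the floor `eps` that guards against zero within-class eigenvalues are all choices made for this example.

```python
import numpy as np

def dlda(X, y, tol=1e-10, eps=1e-8):
    """Two-stage DLDA sketch. X is d x N (samples as columns); returns d x t."""
    classes, counts = np.unique(y, return_counts=True)
    m = X.mean(axis=1, keepdims=True)
    means = np.stack([X[:, y == c].mean(axis=1) for c in classes], axis=1)
    # S_b = G_b G_b^T with one column per class.
    G_b = (means - m) * np.sqrt(counts)
    # S_w = G_w G_w^T with one column per sample.
    G_w = X - means[:, np.searchsorted(classes, y)]
    # Stage 1: range space of S_b, computed from the small c x c matrix G_b^T G_b.
    lam, V = np.linalg.eigh(G_b.T @ G_b)
    keep = lam > tol * lam.max()            # discard the null space of S_b
    lam, V = lam[keep], V[:, keep]
    Z = (G_b @ V) / lam                     # unit eigenvectors scaled by lam^{-1/2}
    # Stage 2: eigenanalysis of the projected within-class scatter Z^T S_w Z.
    C = Z.T @ G_w
    mu, U = np.linalg.eigh(C @ C.T)         # ascending eigenvalues
    # eps is an assumption of this sketch: it avoids division by zero eigenvalues.
    return Z @ (U / np.sqrt(np.maximum(mu, eps)))

# Synthetic small-sample example: d = 30 features, N = 12 samples, c = 3 classes.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((30, 12))
y_demo = np.repeat([0, 1, 2], 4)
W_demo = dlda(X_demo, y_demo)
```

Because the rank of the between-class scatter is at most c - 1, the sketch returns at most c - 1 projection directions, and no d x d matrix is ever decomposed.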

To address the computational complexity of high-dimensional data, the eigenanalysis method presented by Turk and Pentland [1] is applied in DLDA, which allows the eigenanalysis of the scatter matrices to be carried out efficiently. For the eigenvalue decomposition of any symmetric matrix $A$ of the form $A = BB^T$, we can consider the eigenvectors $v_i$ of $B^T B$ such that

$$B^T B v_i = \lambda_i v_i.$$

Premultiplying both sides by $B$, we have

$$BB^T (B v_i) = A (B v_i) = \lambda_i (B v_i),$$

from which it can be concluded that the eigenvectors of $A$ are $Bv_i$ with the corresponding eigenvalues $\lambda_i$. When $B$ has far fewer columns than rows, the matrix $B^T B$ is much smaller than $A$, so its eigenanalysis is much cheaper.
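A small numerical check makes this eigenanalysis shortcut concrete. The sketch below is illustrative only; the function name `eig_via_small_gram`, the tolerance, and the random test matrix are assumptions of the example.

```python
import numpy as np

def eig_via_small_gram(B, tol=1e-10):
    """Nonzero eigenpairs of A = B B^T (d x d) from the small matrix B^T B (n x n)."""
    evals, V = np.linalg.eigh(B.T @ B)      # ascending eigenvalues
    order = np.argsort(-evals)              # reorder to descending
    evals, V = evals[order], V[:, order]
    keep = evals > tol * evals[0]           # drop numerically zero eigenvalues
    evals, V = evals[keep], V[:, keep]
    U = B @ V                               # eigenvectors of A, not yet unit length
    U /= np.sqrt(evals)                     # ||B v||^2 = v^T B^T B v = lambda
    return evals, U

# d = 500 "pixels", n = 6 samples: decompose a 6 x 6 matrix instead of 500 x 500.
rng = np.random.default_rng(0)
B = rng.standard_normal((500, 6))
evals, U = eig_via_small_gram(B)
A = B @ B.T
```

The large matrix `A` is formed here only to verify the result; in practice it never needs to be built.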

3. Direct Neighborhood Discriminant Analysis

Instead of mining the statistical discriminant information, manifold learning techniques try to find the local manifold structure of the data. Derived from the locality preserving idea [10, 11], graph embedding framework-based techniques extract local discriminant features for classification. For a general pattern classification problem, one seeks a linear transformation such that, in the transformed space, the compactness of samples that belong to the same class and the separation of samples from different classes are both enhanced. As an example, a simple multiclass classification problem is illustrated in Figure 1. Suppose the two nearest inter- and intraclass neighbors are searched for classification. The inter- and intraclass neighbors of the five data points A–E are shown in Figures 1(b) and 1(c), respectively. For data point A, the distance to its interclass neighbors should ideally be maximized to reduce their adverse influence on classification. On the other hand, the distance between data point A and its intraclass neighbors should be minimized so that A is classified correctly.

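Figure 1: Illustration of a simple multiclass classification problem, panels (a) to (c); panels (b) and (c) show the interclass and intraclass neighbors of data points A–E, respectively.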

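Based on this consideration, two graphs, that is, the between-class graph $G$ and the within-class graph $G'$, are constructed to discover the local discriminant structure [13, 15]. For each data point $x_i$, its sets of inter- and intraclass neighbors are indicated by $\mathcal{N}^b(x_i)$ and $\mathcal{N}^w(x_i)$, respectively. Then, the weight $W^b_{ij}$ of the edge between $x_i$ and $x_j$ in the between-class graph $G$ is defined as

$$W^b_{ij} = \begin{cases} 1, & x_i \in \mathcal{N}^b(x_j) \text{ or } x_j \in \mathcal{N}^b(x_i), \\ 0, & \text{otherwise}, \end{cases}$$

and the within-class affinity weight is defined similarly as

$$W^w_{ij} = \begin{cases} 1, & x_i \in \mathcal{N}^w(x_j) \text{ or } x_j \in \mathcal{N}^w(x_i), \\ 0, & \text{otherwise}. \end{cases}$$

Let the transformation matrix be denoted by $P \in \mathbb{R}^{d \times r}$ $(r \ll d)$, which transforms the original data $x$ from the high-dimensional space $\mathbb{R}^d$ into a low-dimensional space $\mathbb{R}^r$ by $y = P^T x$. The separability of interclass samples in the transformed low-dimensional space can be defined as

$$\sum_{i,j} \left\| P^T x_i - P^T x_j \right\|^2 W^b_{ij} = 2\,\mathrm{tr}\left(P^T X (D^b - W^b) X^T P\right) = 2\,\mathrm{tr}\left(P^T S_s P\right),$$

where $\mathrm{tr}(\cdot)$ is the trace of a matrix, $X = [x_1, x_2, \ldots, x_N]$ is the data matrix, $S_s = X (D^b - W^b) X^T$, and $D^b$ is a diagonal matrix whose entries are the column (or row, since $W^b$ is symmetric) sums of $W^b$, $D^b_{ii} = \sum_j W^b_{ij}$. Similarly, the compactness of intraclass samples can be characterized as

$$\sum_{i,j} \left\| P^T x_i - P^T x_j \right\|^2 W^w_{ij} = 2\,\mathrm{tr}\left(P^T X (D^w - W^w) X^T P\right) = 2\,\mathrm{tr}\left(P^T S_c P\right).$$

Here, $S_c = X (D^w - W^w) X^T$, and $D^w$ is a diagonal matrix whose main-diagonal entries are the column (or row) sums of $W^w$, $D^w_{ii} = \sum_j W^w_{ij}$. Then, the optimal transformation matrix $P$ can be obtained by solving the following problem:

$$P = \arg\max_{P} \frac{\mathrm{tr}\left(P^T S_s P\right)}{\mathrm{tr}\left(P^T S_c P\right)}.$$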

Here, $S_c$ is always singular when the training sample set is small, which makes it difficult to obtain the projective matrix $P$ directly; thus previous local discriminant techniques still suffer from the curse of high dimensionality. PCA is usually performed as a preprocessing step to reduce the dimension in this situation [15]; however, possible discriminant information may be ignored. Inspired by DLDA, we can perform eigenanalysis on $S_s$ and $S_c$ successively to extract the complete local geometrical structure directly, without PCA preprocessing. To alleviate the computational burden, we reformulate $S_s$ and $S_c$ so that Turk’s eigenanalysis method can be employed. For each nonzero element $W^b_{ij}$ of $W^b$, we build an $N$-dimensional interclass index vector $h$ of all zeros except that the $i$th and $j$th elements are set to $1$ and $-1$, respectively:

$$h = (0, \ldots, 0, \underset{i}{1}, 0, \ldots, 0, \underset{j}{-1}, 0, \ldots, 0)^T.$$

Suppose there are $N_b$ nonzero elements in $W^b$, and let $H_s = [h_1, h_2, \ldots, h_{N_b}] \in \mathbb{R}^{N \times N_b}$ be the interclass index matrix made up of the $N_b$ interclass index vectors. It can be easily obtained that

$$D^b - W^b = \tfrac{1}{2} H_s H_s^T,$$

which we prove in Appendix A. Therefore, $S_s$ can be reformulated as

$$S_s = X (D^b - W^b) X^T = \tfrac{1}{2} X H_s H_s^T X^T.$$

As each column in $H_s$ has only two nonzero elements, $1$ and $-1$, we can make the first row of $H_s$ a null row by adding all rows but the first to the first row. On the other hand, for each column in $H_s$, there is another column with the opposite sign. Then, it is clear that

$$\mathrm{rank}(H_s) \le \min\left(N - 1, \tfrac{N_b}{2}\right),$$

where $N_b$ is the number of nonzero elements in $W^b$. Due to the properties of matrix rank [17], we can get

$$\mathrm{rank}(S_s) \le \min\left(\mathrm{rank}(X), \mathrm{rank}(H_s)\right).$$

In many FR cases, the number of pixels in a facial image is much larger than the number of available samples, that is, $d \gg N$. It follows that the rank of $S_s$ is at most $\min(N - 1, N_b/2)$. Similarly, $S_c$ can also be reformulated as

$$S_c = X (D^w - W^w) X^T = \tfrac{1}{2} X H_c H_c^T X^T.$$

Here, $H_c \in \mathbb{R}^{N \times N_w}$ is the intraclass index matrix consisting of all the $N_w$ intraclass index vectors as columns, which is constructed according to the $N_w$ nonzero elements in $W^w$. Similar to $S_s$, the rank of $S_c$ is at most $\min(N - 1, N_w/2)$. Based on the modified formulation, the optimal transformation matrix $P$ can be obtained as

$$P = \arg\max_{P} \frac{\mathrm{tr}\left(P^T X H_s H_s^T X^T P\right)}{\mathrm{tr}\left(P^T X H_c H_c^T X^T P\right)}.$$
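The graph construction and the index-matrix reformulation can be checked numerically on a toy data set. The sketch below assumes binary weights and a symmetric "either point is a neighbor of the other" rule; the function names `knn_graph` and `index_matrix`, the neighbor count `k`, and the random data are illustrative choices rather than details confirmed by the paper.

```python
import numpy as np

def knn_graph(X, y, k, same_class):
    """Binary symmetric affinity over the k nearest intra- or interclass neighbors.
    X is d x N with samples as columns."""
    N = X.shape[1]
    sq = np.sum(X**2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X    # squared Euclidean distances
    W = np.zeros((N, N))
    for i in range(N):
        mask = (y == y[i]) if same_class else (y != y[i])
        mask[i] = False                               # a point is not its own neighbor
        cand = np.flatnonzero(mask)
        W[i, cand[np.argsort(dist[i, cand])[:k]]] = 1
    return np.maximum(W, W.T)                         # i in kNN(j) or j in kNN(i)

def index_matrix(W):
    """One column e_i - e_j for every nonzero element W_ij."""
    i, j = np.nonzero(W)
    H = np.zeros((W.shape[0], len(i)))
    H[i, np.arange(len(i))] = 1
    H[j, np.arange(len(i))] = -1
    return H

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 12))                      # d = 5, N = 12 (toy sizes)
y = np.repeat([0, 1, 2], 4)
W_b = knn_graph(X, y, k=2, same_class=False)          # between-class graph
W_w = knn_graph(X, y, k=2, same_class=True)           # within-class graph
H_s, H_c = index_matrix(W_b), index_matrix(W_w)
L_b = np.diag(W_b.sum(axis=1)) - W_b                  # D^b - W^b
L_w = np.diag(W_w.sum(axis=1)) - W_w                  # D^w - W^w
```

Since every undirected edge contributes two columns of opposite sign, `H_s @ H_s.T` equals twice the graph Laplacian, which is where the factor 1/2 comes from.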

As the null space of S_s contributes little to classification, it is feasible to remove this subspace by projecting S_s into its range space. We apply the eigenvalue decomposition to S_s and unitize it through Turk’s eigenanalysis method, while discarding those eigenvectors whose corresponding eigenvalues are zero, since they carry little power for discriminant analysis. Then, the discriminant information in S_c can be obtained by performing eigenanalysis on $\tilde{S}_c = P_s^T S_c P_s$, which is obtained by transforming S_c into the range subspace of S_s. This algorithm can be implemented by the pseudocode shown in Algorithm 1.

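Algorithm 1: DNDA.
Input: Data matrix $X \in \mathbb{R}^{d \times N}$, class label $L$
Output: Transformed matrix $P$
1. Construct the between-class and the within-class affinity weight matrices $W^b$, $W^w$.
2. Construct the interclass and the intraclass index matrices $H_s$, $H_c$ according to the nonzero elements of $W^b$, $W^w$.
   For the $k$th nonzero element $W^b_{ij}$ of $W^b$, the corresponding $k$th column in $H_s$ is constructed as
   $h_k = (0, \ldots, 0, \underset{i}{1}, 0, \ldots, 0, \underset{j}{-1}, 0, \ldots, 0)^T$; the columns of $H_c$ are constructed from $W^w$ in the same way.
3. Apply eigenvalue decomposition to $S_s$ and keep the largest $t$ nonzero eigenvalues $\lambda_1, \ldots, \lambda_t$ and the corresponding eigenvectors $V_s = [v_1, \ldots, v_t]$ after sorting in decreasing order, where $t = \mathrm{rank}(S_s)$.
4. Compute $P_s$ as $P_s = V_s \Lambda_s^{-1/2}$, where $\Lambda_s$ is the diagonal matrix with $\lambda_1, \ldots, \lambda_t$ on the main diagonal.
5. Perform eigenvalue decomposition on $\tilde{S}_c = P_s^T S_c P_s$. Let $\Lambda_c$ be the eigenvalue matrix of $\tilde{S}_c$ in ascending order and $V_c$ be the corresponding eigenvector matrix. Calculate $P_c$ as $P_c = V_c \Lambda_c^{-1/2}$.
6. $P = P_s P_c$.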
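The steps of Algorithm 1 can be sketched in NumPy as below. This is a hedged illustration, not the authors' implementation: the helper names (`knn_graph`, `index_matrix`, `whiten_range`, `dnda`), the neighbor counts `k_b` and `k_w`, the relative tolerance `tol`, and in particular the floor `eps` applied to the eigenvalues in Step 5 are assumptions introduced for the example.

```python
import numpy as np

def knn_graph(X, y, k, same_class):
    """Step 1: binary symmetric kNN affinity; X is d x N with samples as columns."""
    sq = np.sum(X**2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X
    W = np.zeros((X.shape[1], X.shape[1]))
    for i in range(X.shape[1]):
        mask = (y == y[i]) if same_class else (y != y[i])
        mask[i] = False
        cand = np.flatnonzero(mask)
        W[i, cand[np.argsort(dist[i, cand])[:k]]] = 1
    return np.maximum(W, W.T)

def index_matrix(W):
    """Step 2: one column e_i - e_j per nonzero element W_ij."""
    i, j = np.nonzero(W)
    H = np.zeros((W.shape[0], len(i)))
    H[i, np.arange(len(i))] = 1
    H[j, np.arange(len(i))] = -1
    return H

def whiten_range(B, tol=1e-10):
    """Steps 3-4: P_s with P_s^T (B B^T) P_s = I, using the small matrix B^T B."""
    lam, V = np.linalg.eigh(B.T @ B)
    lam, V = lam[::-1], V[:, ::-1]             # decreasing order
    keep = lam > tol * lam[0]                  # keep only nonzero eigenvalues
    return (B @ V[:, keep]) / lam[keep]        # unit eigenvectors times lam^{-1/2}

def dnda(X, y, k_b=2, k_w=2, eps=1e-8):
    B_s = X @ index_matrix(knn_graph(X, y, k_b, False)) / np.sqrt(2)  # S_s = B_s B_s^T
    B_c = X @ index_matrix(knn_graph(X, y, k_w, True)) / np.sqrt(2)   # S_c = B_c B_c^T
    P_s = whiten_range(B_s)
    C = P_s.T @ B_c                            # so that P_s^T S_c P_s = C C^T
    mu, V_c = np.linalg.eigh(C @ C.T)          # Step 5: ascending eigenvalues
    # eps is an assumption of this sketch: it keeps the scaling finite when an
    # eigenvalue of the projected S_c is zero.
    P_c = V_c / np.sqrt(np.maximum(mu, eps))
    return P_s @ P_c                           # Step 6

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 12))              # d = 50 > N = 12, as in the SSS setting
y = np.repeat([0, 1, 2], 4)
P = dnda(X, y)
```

A new sample `x` would then be mapped to `P.T @ x`. Keeping only the leading columns of `P` (those with the smallest within-class eigenvalues) gives a lower-dimensional feature vector.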

DNDA has a computational complexity of (N_b is the number of nonzero elements in W^b), as it preserves a similar procedure to DLDA (). Compared with Eigenface () and Fisherface (), DNDA is still more efficient for feature extraction in high dimensionality if

4. Experiments

In this section, we investigate the performance of the proposed DNDA method for face recognition. Three popular face databases, ORL, Yale, and UMIST, are used in the experiments. To verify the performance of DNDA, it is compared in each experiment with the classical approaches Eigenface [1], Fisherface [2], DLDA [6], LPP [11], and MFA [15]. A three-nearest-neighbor classifier with the Euclidean distance metric is employed to find the image in the database with the best match.

4.1. ORL Database

In the ORL database [18], there are 10 different images for each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The original images have a size of 92 × 112 pixels with 256 gray levels; one such subject is illustrated in Figure 2(a).

Figure 2: Sample face images of one subject from the (a) ORL, (b) Yale, and (c) UMIST databases.

The experiments are performed with different numbers of training samples. As there are 10 images for each subject, n of them are randomly selected for training and the remaining 10 − n are used for testing. For each n, the training set is chosen randomly 20 times and the average recognition rate is calculated. Figure 3 plots the recognition rate versus the number of features used in the matching for Eigenface, Fisherface, DLDA, LPP, MFA, and DNDA. The best performance obtained by each method, with the corresponding dimension of the reduced space in brackets, is shown in Table 1.
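The evaluation protocol described above (random per-subject splits, a three-nearest-neighbor classifier with Euclidean distance, and averaging over 20 repetitions) can be sketched as follows. The function names, the majority-vote tie rule, and the synthetic features are assumptions of this illustration; in addition, the sketch takes already extracted features as input, whereas in an actual experiment the projection would be learned again from each training split.

```python
import numpy as np

def split_per_subject(y, n_train, rng):
    """Pick n_train random images of every subject for training, the rest for testing."""
    train = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        train.extend(rng.permutation(idx)[:n_train])
    train = np.array(train)
    test = np.setdiff1d(np.arange(len(y)), train)
    return train, test

def knn_predict(F_train, y_train, F_test, k=3):
    """k-NN with Euclidean distance; F_* are r x N feature matrices (columns are
    samples) and labels are non-negative integers. Ties go to the smallest label."""
    d2 = (np.sum(F_test**2, axis=0)[:, None] + np.sum(F_train**2, axis=0)[None, :]
          - 2 * F_test.T @ F_train)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]                     # shape (n_test, k)
    return np.array([np.bincount(v).argmax() for v in votes])

def mean_recognition_rate(F, y, n_train, trials=20, seed=0):
    """Average recognition rate over repeated random splits."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(trials):
        tr, te = split_per_subject(y, n_train, rng)
        pred = knn_predict(F[:, tr], y[tr], F[:, te])
        rates.append(np.mean(pred == y[te]))
    return float(np.mean(rates))

# Synthetic stand-in for extracted features: 4 subjects with 10 images each,
# placed in well-separated clusters (not real face data).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 10)
F = rng.standard_normal((5, 40))
F[0] += 100.0 * y
rate = mean_recognition_rate(F, y, n_train=5)
```

Sweeping `n_train` and the number of retained feature dimensions with such a routine would produce curves of the kind plotted in the recognition-rate figures.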

Figure 3: Recognition rate versus number of features on the ORL database, panels (a) to (c).

4.2. Yale Database

The Yale Face Database [19] contains 165 grayscale images of 15 individuals. There are 11 images per subject, one per different lighting condition (left-light, center-light, right-light), facial expression (normal, happy, sad, sleepy, surprised, wink), and with/without glasses. Each image used in the experiments is pixels with 256 gray levels. The facial images of one individual are illustrated in Figure 2(b).

The experimental implementation is the same as before. For each individual, n images are randomly selected for training and the rest are used for testing. For each given n, we average the results over 20 independently repeated experiments. Figure 4 plots the recognition rate versus the number of features used in the matching for Eigenface, Fisherface, DLDA, LPP, MFA, and DNDA. The best results obtained in the experiments and the corresponding reduced dimension for each method are shown in Table 2.

Figure 4: Recognition rate versus number of features on the Yale database, panels (a) to (c).

4.3. UMIST Database

The UMIST face database [20] consists of 564 images of 20 people. For simplicity, the pre-cropped version of the UMIST database is used in this experiment, where each subject covers a range of poses from profile to frontal views, and the subjects cover a range of race/sex/appearance. The size of cropped image is pixels with 256 gray levels. The facial images of one subject with different views are illustrated in Figure 2(c).

For each individual, we chose 8 images of different views distributed uniformly in the range 0–90° for training, and the rest are used for testing. Figure 5 plots the recognition rate versus the number of features used in the matching for Eigenface, Fisherface, DLDA, LPP, MFA, and DNDA. The best performance and the corresponding dimensionalities of the projected spaces for each method are shown in Table 3.

The experimental results show that DNDA achieves higher accuracy than the other methods on these databases. This is probably due to the fact that DNDA is a two-stage local discriminant technique, different from LPP and MFA. Moreover, the PCA step is removed in DNDA, which preserves more discriminant information than the other methods.

5. Conclusions

Inspired by DLDA, we propose in this paper a novel local discriminant feature extraction method called direct neighborhood discriminant analysis (DNDA). In order to avoid the SSS problem, DNDA performs a two-stage eigenanalysis, which can be implemented efficiently by using Turk’s method. Compared with other methods, DNDA leaves out PCA preprocessing while remaining immune to the SSS problem. Experiments on the ORL, Yale, and UMIST face databases show the effectiveness and robustness of the proposed method for face recognition. Improvements and extensions of DNDA aimed at better classification results will be considered in our future work.

Appendix

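A. Proof of $D - W = \frac{1}{2} H H^T$

Given the graph weight matrix $W$ with $l$ nonzero elements, consider two matrices $M$ and $N$, each with one row per sample and $l$ columns. For each nonzero element in $W$, there is a corresponding column in $M$ and in $N$ at a common location. Let $\Omega = \{(i, j) : W_{ij} \neq 0\}$ be the index set of the nonzero elements in $W$. For the $k$th nonzero element $W_{ij}$ in $W$, the $k$th columns of $M$ and $N$ are represented as

$$m_k = e_i, \qquad n_k = e_j,$$

where $e_i$ denotes the vector of all zeros except for a $1$ in the $i$th position. Since the nonzero weights are equal to 1, it is easy to get $M_i M_j^T = 0$ for $i \neq j$, and $M_i M_i^T = \sum_j W_{ij} = D_{ii}$, where $M_i$ and $N_i$ denote the $i$th row of $M$ and $N$, respectively. Therefore, we can get

$$M_i M_j^T = \delta_{ij} D_{ii}, \qquad N_i N_j^T = \delta_{ij} D_{ii}, \qquad M_i N_j^T = W_{ij},$$

where $\delta_{ij}$ is the Kronecker delta. Note that both $D$ and $W$ are symmetric matrices; based on the above equations, it is easy to find that

$$M M^T = N N^T = D, \qquad M N^T = N M^T = W.$$

It is easy to check that $H = M - N$, so that

$$H H^T = M M^T - M N^T - N M^T + N N^T = 2 (D - W),$$

which completes the proof.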

Acknowledgments

This work is supported in part by the Program for New Century Excellent Talents of Educational Ministry of China (NCET-06-0762) and Specialized Research Fund for the Doctoral Program of Higher Education (20060611009), and in part by Natural Science Foundations of Chongqing CSTC (CSTC2007BA2003 and CSTC2006BB2003) and the National Natural Science Foundation of China under the Project Grant no. 60573125.

References

[1] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. fisherfaces: recognition using class specific linear projection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.
[3] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “Regularized discriminant analysis for the small sample size problem in face recognition,” Pattern Recognition Letters, vol. 24, no. 16, pp. 3079–3087, 2003.
[4] W.-S. Chen, P. C. Yuen, and J. Huang, “A new regularized linear discriminant analysis method to solve small sample size problems,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 7, pp. 917–935, 2005.
[5] L.-F. Chen, H.-Y. M. Liao, M.-T. Ko, J.-C. Lin, and G.-J. Yu, “New LDA-based face recognition system which can solve the small sample size problem,” Pattern Recognition, vol. 33, no. 10, pp. 1713–1726, 2000.
[6] H. Yu and J. Yang, “A direct LDA algorithm for high dimensional data-with application to face recognition,” Pattern Recognition, vol. 34, no. 10, pp. 2067–2070, 2001.
[7] J. B. Tenenbaum, V. de Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[8] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[9] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,” in Advances in Neural Information Processing Systems 14 (NIPS '01), pp. 585–591, MIT Press, Cambridge, Mass, USA, 2002.
[10] X. He and P. Niyogi, “Locality preserving projections,” in Advances in Neural Information Processing Systems 16 (NIPS '03), pp. 153–160, MIT Press, Cambridge, Mass, USA, 2004.
[11] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang, “Face recognition using Laplacian faces,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 328–340, 2005.
[12] X. He, D. Cai, S. Yan, and H.-J. Zhang, “Neighborhood preserving embedding,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), vol. 2, pp. 1208–1213, Beijing, China, October 2005.
[13] H.-T. Chen, H.-W. Chang, and T.-L. Liu, “Local discriminant embedding and its variants,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 2, pp. 846–853, San Diego, Calif, USA, June 2005.
[14] W. Zhang, X. Xue, H. Lu, and Y.-F. Guo, “Discriminant neighborhood embedding for classification,” Pattern Recognition, vol. 39, no. 11, pp. 2240–2243, 2006.
[15] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: a general framework for dimensionality reduction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40–51, 2007.
[16] J. Ham, D. D. Lee, S. Mika, and B. Schölkopf, “A kernel view of the dimensionality reduction of manifolds,” in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 369–376, Banff, Alberta, Canada, July 2004.
[17] G. H. Golub and C. F. van Loan, Matrix Computations, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, Md, USA, 3rd edition, 1996.
[18] Olivetti & Oracle Research Laboratory, The Olivetti & Oracle Research Laboratory Face Database of Faces, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
[19] Yale Face Database, 2002, http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
[20] D. B. Graham and N. M. Allinson, “Characterizing virtual eigensignatures for general purpose face recognition,” in Face Recognition: From Theory to Applications, H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman-Soulie, and T. S. Huang, Eds., vol. 163 of NATO ASI Series F, Computer and Systems Sciences, pp. 446–456, Springer, Berlin, Germany, 1998.

Copyright

Copyright © 2008 Miao Cheng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
