Abstract

Face recognition is a challenging problem in computer vision and pattern recognition. Recently, many techniques based on local geometrical structure have been presented to obtain low-dimensional representations of face images with enhanced discriminatory power. However, these methods suffer from the small sample size (SSS) problem or from the high computational complexity of high-dimensional data. To overcome these problems, we propose a novel local manifold structure learning method for face recognition, named direct neighborhood discriminant analysis (DNDA), which separates nearby interclass samples and preserves the local within-class geometry in two successive steps. In addition, DNDA does not require a PCA preprocessing step to drastically reduce the dimensionality, thereby avoiding the loss of discriminative information. Experiments conducted on the ORL, Yale, and UMIST face databases show the effectiveness of the proposed method.

1. Introduction

Many pattern recognition and data mining problems involve data in very high-dimensional spaces. In the past few decades, face recognition (FR) has become one of the most active topics in machine vision and pattern recognition, where the feature dimension of the data is usually very large and can hardly be handled directly. To achieve a high recognition rate for FR, numerous feature extraction and dimensionality reduction methods have been proposed to find low-dimensional feature representations with enhanced discriminatory power. Among these methods, two state-of-the-art FR methods, principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2], have proved to be useful tools for dimensionality reduction and feature extraction.

LDA is a popular supervised feature extraction technique for pattern recognition, which seeks a set of projection directions that simultaneously maximize the between-class scatter matrix Sb and minimize the within-class scatter matrix Sw. Although successful in many cases, many LDA-based algorithms suffer from the so-called "small sample size" (SSS) problem, which arises when the number of available samples is much smaller than the dimensionality of the samples and is particularly acute in FR applications. To solve this problem, many extensions of LDA have been developed. Generally, these approaches to the SSS problem can be divided into three categories, namely, Fisherface methods, regularization methods, and subspace methods. Fisherface methods incorporate a PCA step into the LDA framework as a preprocessing step; LDA is then performed in the lower-dimensional PCA subspace [2], where the within-class scatter matrix is no longer singular. Regularization methods [3, 4] add a scaled identity matrix to the scatter matrix so that the perturbed scatter matrix becomes nonsingular. Chen et al. [5] proved that the null space of Sw contains the most discriminant information when the SSS problem occurs, and proposed the null space LDA (NLDA) method, which extracts only the discriminant features present in the null space of Sw. Later, Yu and Yang [6] utilized the discriminatory information of both Sb and Sw and proposed the direct LDA (DLDA) method to solve the SSS problem.

Recently, the motivation for finding the manifold structure of high-dimensional data has led to the wide application of manifold learning in data mining and machine learning. Among these methods, Isomap [7], LLE [8], and Laplacian eigenmaps [9, 10] are representative techniques. Based on the locality preserving concept, several local embedding analysis techniques have been proposed to discover the manifold structure from local nearby data [11, 12]. However, these methods are designed to preserve the local geometrical structure of the original high-dimensional data in the lower-dimensional space rather than to provide good discrimination ability. To obtain better classification performance, some supervised learning techniques have been proposed by incorporating discriminant information into the locality preserving learning techniques [13–15]. Moreover, Yan et al. [15] explain the manifold learning techniques and the traditional dimensionality reduction methods within a unified framework that can be defined in a graph embedding way instead of a kernel view [16]. However, the SSS problem still exists in graph embedding-based discriminant techniques. To deal with this problem, PCA is usually performed as a preprocessing step to reduce the dimensionality [11, 15].

In this paper, we present a two-stage feature extraction technique named direct neighborhood discriminant analysis (DNDA). Compared with other geometrical structure learning methods, no PCA step is needed in our method. Thus, more discriminant information can be kept for FR purposes, and improved performance is therefore expected. The rest of the paper is structured as follows: we give a brief review of LDA and DLDA in Section 2. We then introduce in Section 3 the proposed method for dimensionality reduction and feature extraction in FR. The effectiveness of our method is evaluated in a set of FR experiments in Section 4. Finally, we give concluding remarks in Section 5.

2. Review of LDA and DLDA

2.1. LDA

LDA is a very popular technique for linear feature extraction and dimensionality reduction [2]. It chooses the basis vectors of the transformed space as those directions of the original space that maximize the ratio of the between-class scatter to the within-class scatter. Formally, the goal of LDA is to seek the optimal projection matrix $w$ that maximizes the following quotient, the Fisher criterion:

$$J(w) = \frac{|w^{T} S_b w|}{|w^{T} S_w w|},$$

where $S_b$ is the between-class scatter matrix and $S_w$ is the within-class scatter matrix. The columns of $w$ can be formed by the generalized eigenvectors corresponding to the largest eigenvalues of the following eigenanalysis problem:

$$S_b w = \lambda S_w w.$$

When the inverse of Sw exists, the generalized eigenvectors can be obtained by the eigenvalue decomposition of $S_w^{-1} S_b$. However, in FR problems one usually confronts the difficulty that the within-class scatter matrix Sw is singular (the SSS problem). The so-called PCA plus LDA approach [2] is a very popular technique intended to overcome this circumstance.
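For concreteness, the following is a minimal NumPy sketch of the LDA projection reviewed above; the function name and the pseudo-inverse fallback for a singular Sw are illustrative assumptions, not part of the original formulation.

import numpy as np

def lda(X, y, m):
    """Classical LDA: X is a d x N data matrix, y holds class labels, m is the target dimension."""
    d, N = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T   # between-class scatter
        Sw += (Xc - mu_c) @ (Xc - mu_c).T                 # within-class scatter
    # Generalized eigenproblem Sb w = lambda Sw w, solved via inv(Sw) Sb;
    # a pseudo-inverse is used here as a simple fallback when Sw is singular.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[:m]
    return evecs[:, order].real                           # d x m projection matrix w

# Usage: W = lda(X_train, labels, c - 1); features = W.T @ X_train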

2.2. DLDA

To take the discriminant information of both Sb and Sw into account without conducting PCA, a direct LDA (DLDA) technique was presented by Yu and Yang [6]. The basic idea behind the approach is that no significant discriminant information is lost if the null space of Sb is discarded. Based on this assumption, it can be concluded that the optimal discriminant features exist in the range space of Sb.

Consider a multiclass classification problem with a data matrix $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$, where each column $x_i$ represents a sample. Suppose X is composed of c classes, the ith class consists of $N_i$ samples, and the total number of samples is $N = \sum_{i=1}^{c} N_i$. Then, the between-class scatter matrix is defined as

$$S_b = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^{T},$$

where $\mu_i$ is the mean sample of the ith class and $\mu$ denotes the total mean sample. Similarly, the within-class scatter matrix is defined as

$$S_w = \sum_{i=1}^{c} \sum_{x_j \in C_i} (x_j - \mu_i)(x_j - \mu_i)^{T},$$

where $C_i$ denotes the set of samples of the ith class. In DLDA, eigenvalue decomposition is first performed on the between-class scatter matrix Sb. Suppose the rank of Sb is t, let $\Lambda_b$ be the diagonal matrix with the t largest eigenvalues on the main diagonal in descending order, and let $V_b$ be the eigenvector matrix consisting of the t corresponding eigenvectors. The dimensionality of a sample x is then reduced from d to t by the projection $\tilde{x} = \Lambda_b^{-1/2} V_b^{T} x$, and eigenvalue decomposition is performed on the within-class scatter matrix of the projected samples,

$$\tilde{S}_w = \Lambda_b^{-1/2} V_b^{T} S_w V_b \Lambda_b^{-1/2}.$$

Let $\Lambda_w$ be the eigenvalue matrix of $\tilde{S}_w$ in ascending order and $V_w$ be the corresponding eigenvector matrix. The final transformation matrix is then given by

$$W = \Lambda_w^{-1/2} V_w^{T} \Lambda_b^{-1/2} V_b^{T}.$$
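A compact NumPy sketch of the DLDA procedure, following the formulation reconstructed above; the scatter matrices Sb and Sw are assumed to be computed as in the preceding equations, and the variable names are illustrative assumptions.

import numpy as np

def dlda(Sb, Sw, eps=1e-10):
    """Direct LDA: diagonalize Sb first, then Sw in the range space of Sb."""
    evals_b, V = np.linalg.eigh(Sb)              # ascending eigenvalues of Sb
    keep = evals_b > eps                         # keep the range space of Sb (rank t)
    Vb = V[:, keep][:, ::-1]                     # t eigenvectors, descending eigenvalues
    Lb = evals_b[keep][::-1]
    Z = Vb / np.sqrt(Lb)                         # d x t, whitens Sb: Z.T @ Sb @ Z = I
    Sw_t = Z.T @ Sw @ Z                          # within-class scatter of the projected samples
    evals_w, Vw = np.linalg.eigh(Sw_t)           # ascending order, as required by DLDA
    scale = 1.0 / np.sqrt(np.maximum(evals_w, eps))
    W = (Vw * scale).T @ Z.T                     # final t x d transformation, y = W @ x
    return W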

To address the computational complexity of high-dimensional data, the eigenanalysis method presented by Turk and Pentland [1] is applied in DLDA, which allows the eigenanalysis of the scatter matrices to proceed in an efficient way. For the eigenvalue decomposition of any symmetric matrix A of the form $A = B B^{T}$, with $B \in \mathbb{R}^{d \times n}$ and $n \ll d$, we can consider the eigenvectors $v_i$ of $B^{T} B$, such that

$$B^{T} B v_i = \lambda_i v_i.$$

Premultiplying both sides by B, we have

$$B B^{T} (B v_i) = \lambda_i (B v_i),$$

from which it can be concluded that the eigenvectors of A are $B v_i$ with the corresponding eigenvalues $\lambda_i$.
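The following small NumPy snippet is only a numerical check of this identity on a random tall matrix, not code from DLDA itself.

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((2000, 40))          # d x n with d >> n
A = B @ B.T                                  # d x d; a direct eigendecomposition would be costly

lam, V = np.linalg.eigh(B.T @ B)             # eigenpairs of the small n x n matrix
U = B @ V                                    # columns B v_i are eigenvectors of A = B B^T
U = U / np.linalg.norm(U, axis=0)            # normalize to unit length

# Verify A (B v_i) = lambda_i (B v_i) for the largest eigenpair
i = np.argmax(lam)
print(np.allclose(A @ U[:, i], lam[i] * U[:, i]))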

3. Direct Neighborhood Discriminant Analysis

Instead of mining statistical discriminant information, manifold learning techniques try to discover the local manifold structure of the data. Derived from the locality preserving idea [10, 11], techniques based on the graph embedding framework extract local discriminant features for classification. For a general pattern classification problem, the aim is to find a linear transformation such that, in the transformed space, the compactness of samples belonging to the same class and the separation of samples from different classes are both enhanced. As an example, a simple multiclass classification problem is illustrated in Figure 1. Suppose two nearest interclass and intraclass neighbors are searched for classification. The interclass and intraclass nearby data points of the five data points A–E are shown in Figures 1(b) and 1(c), respectively. For data point A, the distance to its interclass neighbors should be maximized to alleviate their bad influence on classification. On the other hand, the distance between data point A and its intraclass neighbors should be minimized so that A can be classified correctly.

Based on this consideration, two graphs, namely the between-class graph $G_b$ and the within-class graph $G_w$, are constructed to discover the local discriminant structure [13, 15]. For each data point $x_i$, its sets of interclass and intraclass neighbors are denoted by $N_b(x_i)$ and $N_w(x_i)$, respectively. Then, the weight $W^b_{ij}$ of the edge between $x_i$ and $x_j$ in the between-class graph $G_b$ is defined as

$$W^b_{ij} = \begin{cases} 1, & x_j \in N_b(x_i) \ \text{or} \ x_i \in N_b(x_j), \\ 0, & \text{otherwise}, \end{cases}$$

and the within-class affinity weight is similarly defined as

$$W^w_{ij} = \begin{cases} 1, & x_j \in N_w(x_i) \ \text{or} \ x_i \in N_w(x_j), \\ 0, & \text{otherwise}. \end{cases}$$

Let the transformation matrix be denoted by $P \in \mathbb{R}^{d \times m}$, which transforms the original data x from the high-dimensional space into a low-dimensional space by $y = P^{T} x$. The separability of interclass samples in the transformed low-dimensional space can be defined as

$$J_s(P) = \frac{1}{2} \sum_{i,j} \| P^{T} x_i - P^{T} x_j \|^2 W^b_{ij} = \mathrm{tr}\big(P^{T} X (D_b - W_b) X^{T} P\big) = \mathrm{tr}(P^{T} S_s P),$$

where $\mathrm{tr}(\cdot)$ is the trace of a matrix, $X = [x_1, x_2, \ldots, x_N]$ is the data matrix, and $D_b$ is a diagonal matrix whose entries are the column (or row, since $W_b$ is symmetric) sums of $W_b$, $(D_b)_{ii} = \sum_j W^b_{ij}$. Similarly, the compactness of intraclass samples can be characterized as

$$J_c(P) = \frac{1}{2} \sum_{i,j} \| P^{T} x_i - P^{T} x_j \|^2 W^w_{ij} = \mathrm{tr}\big(P^{T} X (D_w - W_w) X^{T} P\big) = \mathrm{tr}(P^{T} S_c P).$$

Here, $D_w$ is a diagonal matrix whose entries on the main diagonal are the column (or row) sums of $W_w$, $(D_w)_{ii} = \sum_j W^w_{ij}$. Then, the optimal transformation matrix P can be obtained by solving the following problem:

$$P^{*} = \arg\max_{P} \frac{\mathrm{tr}(P^{T} S_s P)}{\mathrm{tr}(P^{T} S_c P)}.$$
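One plausible NumPy construction of the two affinity matrices under these definitions is sketched below; the neighborhood sizes kb and kw and the helper name build_graphs are illustrative assumptions.

import numpy as np

def build_graphs(X, y, kb=2, kw=2):
    """Return between-class (Wb) and within-class (Ww) 0/1 affinity matrices for X (d x N)."""
    N = X.shape[1]
    D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)   # pairwise Euclidean distances
    Wb = np.zeros((N, N)); Ww = np.zeros((N, N))
    for i in range(N):
        inter = np.where(y != y[i])[0]
        intra = np.where((y == y[i]) & (np.arange(N) != i))[0]
        for j in inter[np.argsort(D[i, inter])[:kb]]:
            Wb[i, j] = Wb[j, i] = 1              # symmetric: neighbor in either direction
        for j in intra[np.argsort(D[i, intra])[:kw]]:
            Ww[i, j] = Ww[j, i] = 1
    return Wb, Ww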

Here, Sc is always singular for small training sets, which makes it problematic to obtain the projection matrix P directly; thus previous local discriminant techniques still suffer from the curse of high dimensionality. Generally, PCA is performed as a preprocessing step to reduce the dimensionality in such cases [15]; however, possibly useful discriminant information may then be discarded. Inspired by DLDA, we can instead perform eigenanalysis on Ss and Sc successively to extract the complete local geometrical structure directly, without PCA preprocessing. To alleviate the computational burden, we reformulate Ss and Sc so that Turk's eigenanalysis method can be employed. For each nonzero element $W^b_{ij}$ of $W_b$, we build an N-dimensional interclass index vector of all zeros except that the ith and jth elements are set to 1 and $-1$, respectively:

$$h = (0, \ldots, 0, \underbrace{1}_{i\text{th}}, 0, \ldots, 0, \underbrace{-1}_{j\text{th}}, 0, \ldots, 0)^{T}.$$

Suppose there are $N_b$ nonzero elements in $W_b$, and let $H_s \in \mathbb{R}^{N \times N_b}$ be the interclass index matrix made up of the $N_b$ interclass index vectors. It can easily be shown that

$$H_s H_s^{T} = 2 (D_b - W_b),$$

which we prove in Appendix A. Therefore, Ss can be reformulated as

$$S_s = X (D_b - W_b) X^{T} = \tfrac{1}{2} (X H_s)(X H_s)^{T}.$$

Since each column of $H_s$ has only two nonzero elements, 1 and $-1$, we can make the first row of $H_s$ a null row by adding all the other rows to the first row. On the other hand, for each column of $H_s$ there is another column with the opposite sign, because $W_b$ is symmetric. It is then clear that

$$\mathrm{rank}(H_s) \le \min(N - 1, N_b / 2),$$

where $N_b$ is the number of nonzero elements in $W_b$. Due to the properties of matrix rank [17], we get

$$\mathrm{rank}(S_s) = \mathrm{rank}(X H_s) \le \min(\mathrm{rank}(X), \mathrm{rank}(H_s)).$$

In many FR cases, the number of pixels in a facial image is much larger than the number of available samples, that is, $d \gg N$. This tells us that the rank of Ss is at most $\min(N - 1, N_b / 2)$. Similarly, Sc can be reformulated as

$$S_c = X (D_w - W_w) X^{T} = \tfrac{1}{2} (X H_c)(X H_c)^{T},$$

where $H_c \in \mathbb{R}^{N \times N_w}$ is the intraclass index matrix consisting of all the $N_w$ intraclass index vectors as columns, constructed according to the $N_w$ nonzero elements of $W_w$. Similar to Ss, the rank of Sc is at most $\min(N - 1, N_w / 2)$. Based on the modified formulation, the optimal transformation matrix P can be obtained as

$$P^{*} = \arg\max_{P} \frac{\mathrm{tr}\big(P^{T} (X H_s)(X H_s)^{T} P\big)}{\mathrm{tr}\big(P^{T} (X H_c)(X H_c)^{T} P\big)}.$$
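The reformulation can be checked numerically with the short NumPy sketch below; the index_matrix helper is an illustrative construction of Hs from a weight matrix, not code from the paper.

import numpy as np

def index_matrix(W):
    """Build the N x (#nonzeros) index matrix H from a symmetric 0/1 weight matrix W."""
    N = W.shape[0]
    cols = []
    for i, j in zip(*np.nonzero(W)):            # one column per nonzero element W_ij
        h = np.zeros(N); h[i] = 1.0; h[j] = -1.0
        cols.append(h)
    return np.stack(cols, axis=1)

# Tiny random example: verify S_s = X (D_b - W_b) X^T = 0.5 * (X H_s)(X H_s)^T
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 5))
Wb = np.zeros((5, 5)); Wb[0, 1] = Wb[1, 0] = Wb[2, 4] = Wb[4, 2] = 1
Db = np.diag(Wb.sum(axis=1))
Hs = index_matrix(Wb)
print(np.allclose(X @ (Db - Wb) @ X.T, 0.5 * (X @ Hs) @ (X @ Hs).T))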

As the null space of Ss contributes little to classification, it is feasible to remove this subspace by projecting the data into the range space of Ss. We apply eigenvalue decomposition to Ss via Turk's eigenanalysis method and whiten it, while discarding those eigenvectors whose corresponding eigenvalues are zero, since they carry little power for discriminant analysis. Then, the discriminant information in Sc can be obtained by performing eigenanalysis on the matrix obtained by transforming Sc into the range space of Ss. The algorithm can be implemented by the pseudocode shown in Algorithm 1.

Input: Data matrix $X = [x_1, \ldots, x_N] \in \mathbb{R}^{d \times N}$, class labels L
Output: Transformation matrix $P$
1. Construct the between-class and the within-class affinity weight matrices $W_b$, $W_w$.
2. Construct the interclass and the intraclass index matrices $H_s$, $H_c$ according to the nonzero elements
of $W_b$, $W_w$.
For the kth nonzero element $W^b_{ij}$ of $W_b$, the corresponding kth column $h_k$ of $H_s$ is
constructed as $h_k(i) = 1$, $h_k(j) = -1$, and zero elsewhere ($H_c$ is built from $W_w$ in the same way).
3. Apply eigenvalue decomposition to $S_s = \tfrac{1}{2}(X H_s)(X H_s)^{T}$ and keep the largest t nonzero eigenvalues
$\lambda_1 \ge \cdots \ge \lambda_t$ and corresponding eigenvectors $V_s = [v_1, \ldots, v_t]$ after sorting in decreasing
order, where $t = \mathrm{rank}(S_s)$.
4. Compute Ps as $P_s = V_s \Lambda_s^{-1/2}$, where $\Lambda_s$ is the diagonal matrix with $\lambda_1, \ldots, \lambda_t$ on the
main diagonal.
5. Perform eigenvalue decomposition on $\tilde{S}_c = P_s^{T} S_c P_s$. Let $\Lambda_c$ be the
eigenvalue matrix of $\tilde{S}_c$ in ascending order and $V_c$ be the corresponding
eigenvector matrix. Calculate Pc as $P_c = V_c \Lambda_c^{-1/2}$.
6. $P = P_s P_c$.
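Putting the steps together, the following is a minimal NumPy sketch of DNDA as reconstructed in Algorithm 1, reusing the build_graphs and index_matrix helpers sketched earlier; the variable names and the handling of near-zero eigenvalues are illustrative assumptions rather than the authors' implementation.

import numpy as np

def dnda(X, y, kb=2, kw=2, eps=1e-10):
    """Two-stage DNDA: separate interclass neighbors, then compact intraclass neighbors."""
    Wb, Ww = build_graphs(X, y, kb, kw)              # step 1: affinity graphs
    Hs, Hc = index_matrix(Wb), index_matrix(Ww)      # step 2: index matrices
    A, B = X @ Hs, X @ Hc                            # S_s = 0.5*A A^T, S_c = 0.5*B B^T

    # Step 3: eigenanalysis of S_s via Turk's trick on the small matrix 0.5*A^T A.
    lam, V = np.linalg.eigh(0.5 * (A.T @ A))
    keep = lam > eps
    Vs = (A @ V[:, keep]) / np.sqrt(2 * lam[keep])   # unit eigenvectors spanning the range of S_s
    Ls = lam[keep]

    # Step 4: whiten the range space of S_s.
    Ps = Vs / np.sqrt(Ls)

    # Step 5: eigenanalysis of S_c projected into that subspace (ascending eigenvalues).
    Sc_t = 0.5 * (Ps.T @ B) @ (Ps.T @ B).T
    lc, Vc = np.linalg.eigh(Sc_t)
    Pc = Vc / np.sqrt(np.maximum(lc, eps))

    return Ps @ Pc                                   # step 6: final projection, features = P.T @ X

# Usage: P = dnda(X_train, labels); train_feats = P.T @ X_train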

DNDA has a computational complexity of the same order as DLDA, since the two share a similar two-stage procedure; with Turk's method, the cost of the eigenanalysis is governed by Nb, the number of nonzero elements in Wb, rather than by the data dimension d. Compared with Eigenface and Fisherface, DNDA therefore remains efficient for feature extraction in high-dimensional data when Nb is not large.

4. Experiments

In this section, we investigate the performance of the proposed DNDA method for face recognition. Three popular face databases, ORL, Yale, and UMIST, are used in the experiments. To assess DNDA, each experiment compares it with the classical approaches Eigenface [1], Fisherface [2], DLDA [6], LPP [11], and MFA [15]. A three-nearest-neighbor classifier with the Euclidean distance metric is employed to find the best match in the database.

4.1. ORL Database

The ORL database [18] contains 10 different images for each of 40 distinct subjects. For some subjects, the images were taken at different times, with varying lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The original images are of size 92 × 112 pixels with 256 gray levels; one subject is illustrated in Figure 2(a).

The experiments are performed with different numbers of training samples. As there are 10 images for each subject, n of them are randomly selected for training and the remaining 10 − n are used for testing. For each n, the random selection of the training set is repeated 20 times and the average recognition rate is calculated. Figure 3 plots the recognition rate versus the number of features used in matching for Eigenface, Fisherface, DLDA, LPP, MFA, and DNDA. The best performance obtained by each method and the corresponding dimension of the reduced space (in brackets) are shown in Table 1.
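The random-split protocol described above can be sketched as follows; the extract and classify arguments are placeholders for the feature extractor and nearest-neighbor classifier under test, and all names are illustrative.

import numpy as np

def random_split(y, n_train, rng):
    """Pick n_train indices per class for training; the rest are for testing."""
    train, test = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        train.extend(idx[:n_train]); test.extend(idx[n_train:])
    return np.array(train), np.array(test)

def average_accuracy(X, y, n_train, extract, classify, trials=20, seed=0):
    """Average recognition rate over independently repeated random splits."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(trials):
        tr, te = random_split(y, n_train, rng)
        P = extract(X[:, tr], y[tr])                 # e.g., the DNDA projection learned on the training set
        pred = classify(P.T @ X[:, tr], y[tr], P.T @ X[:, te])
        rates.append(np.mean(pred == y[te]))
    return np.mean(rates)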

4.2. Yale Database

The Yale Face Database [19] contains 165 grayscale images of 15 individuals. There are 11 images per subject, one for each different lighting condition (left-light, center-light, right-light), facial expression (normal, happy, sad, sleepy, surprised, wink), and with/without glasses. The images used in the experiments are cropped to a fixed size with 256 gray levels. The facial images of one individual are illustrated in Figure 2(b).

The experimental setup is the same as before. For each individual, n images are randomly selected for training and the rest are used for testing. For each given n, we average the results over 20 independently repeated experiments. Figure 4 plots the recognition rate versus the number of features used in matching for Eigenface, Fisherface, DLDA, LPP, MFA, and DNDA. The best results obtained in the experiments and the corresponding reduced dimension for each method are shown in Table 2.

4.3. UMIST Database

The UMIST face database [20] consists of 564 images of 20 people. For simplicity, the precropped version of the UMIST database is used in this experiment; each subject covers a range of poses from profile to frontal views, and the subjects cover a range of race, sex, and appearance. The cropped images have 256 gray levels. The facial images of one subject with different views are illustrated in Figure 2(c).

For each individual, we choose 8 images of different views, distributed uniformly in the range 0–90°, for training, and the rest are used for testing. Figure 5 plots the recognition rate versus the number of features used in matching for Eigenface, Fisherface, DLDA, LPP, MFA, and DNDA. The best performance and the corresponding dimensionality of the projected space for each method are shown in Table 3.

From the experimental results, it is clear that DNDA achieves higher accuracy than the other methods. This is probably because DNDA is a two-stage local discriminant technique, different from LPP and MFA. Moreover, the PCA step is removed in DNDA, preserving more discriminant information than the other methods.

5. Conclusions

Inspired by DLDA, we propose in this paper a novel local discriminant feature extraction method called direct neighborhood discriminant analysis (DNDA). To avoid the SSS problem, DNDA performs a two-stage eigenanalysis, which can be implemented efficiently using Turk's method. Compared with other methods, the PCA preprocessing step is left out in DNDA while remaining immune to the SSS problem. Experiments on the ORL, Yale, and UMIST face databases show the effectiveness and robustness of the proposed method for face recognition. To obtain better classification results, we will consider improvements and extensions of DNDA in future work.

Appendix

A. Proof of $H H^{T} = 2(D - W)$

Given a symmetric 0/1 graph weight matrix W with l nonzero elements, consider two matrices $M, N \in \mathbb{R}^{N \times l}$. For each nonzero element of W there is a corresponding column in M and in N at a common location. Let $\{(i_k, j_k)\}_{k=1}^{l}$ be the index set of the nonzero elements of W. For the kth nonzero element $W_{i_k j_k}$ of W, the kth columns of M and N are

$$m_k = e_{i_k}, \qquad n_k = e_{j_k},$$

where $e_i$ denotes the ith standard basis vector, so that $H = M - N$. Then it is easy to get

$$(M M^{T})_{pq} = \sum_{k} \delta_{p i_k} \delta_{q i_k}, \qquad (M N^{T})_{pq} = \sum_{k} \delta_{p i_k} \delta_{q j_k},$$

where $\delta$ is the Kronecker delta. Therefore,

$$M M^{T} = N N^{T} = D, \qquad M N^{T} = N M^{T} = W.$$

Note that both D and W are symmetric matrices. Based on the above equations, it is easy to find that

$$H H^{T} = (M - N)(M - N)^{T} = M M^{T} - M N^{T} - N M^{T} + N N^{T} = 2(D - W),$$

which completes the proof.
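A quick numerical check of these identities on a small random symmetric 0/1 weight matrix (purely illustrative, not from the paper):

import numpy as np

rng = np.random.default_rng(2)
W = (rng.random((6, 6)) < 0.3).astype(float)
W = np.triu(W, 1); W = W + W.T                  # symmetric 0/1 weight matrix, zero diagonal
D = np.diag(W.sum(axis=1))

I, J = np.nonzero(W)
l = len(I)
M = np.zeros((6, l)); N = np.zeros((6, l))
M[I, np.arange(l)] = 1                          # kth column of M is e_{i_k}
N[J, np.arange(l)] = 1                          # kth column of N is e_{j_k}
H = M - N

print(np.allclose(M @ M.T, D), np.allclose(M @ N.T, W), np.allclose(H @ H.T, 2 * (D - W)))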

Acknowledgments

This work is supported in part by the Program for New Century Excellent Talents of Educational Ministry of China (NCET-06-0762) and Specialized Research Fund for the Doctoral Program of Higher Education (20060611009), and in part by Natural Science Foundations of Chongqing CSTC (CSTC2007BA2003 and CSTC2006BB2003), the National Natural Science Foundation of China under the Project Grant no. 60573125.