Abstract

Local Fisher discriminant analysis (LFDA) was proposed to deal with the multimodal problem. It combines the idea of locality preserving projections (LPP), which preserves the local structure of high-dimensional data, with the idea of Fisher discriminant analysis (FDA), which provides discriminant power. However, like many dimensionality reduction methods, LFDA suffers from the undersampled problem; moreover, its projection matrix is not sparse. In this paper, we propose double sparse local Fisher discriminant analysis (DSLFDA) for face recognition. The proposed method first constructs a sparse, data-adaptive graph under a nonnegativity constraint. Then, DSLFDA reformulates the objective function as a regression-type optimization problem. The undersampled problem is thus avoided naturally, and a sparse solution can be obtained by adding an $\ell_1$ penalty to the regression-type problem. Experiments on the Yale, ORL, and CMU PIE face databases demonstrate the effectiveness of the proposed method.

1. Introduction

Dimensionality reduction transforms high-dimensional data into a lower-dimensional space while preserving as much useful information as possible. It has a wide range of applications in pattern recognition, machine learning, and computer vision. A well-known approach for supervised dimensionality reduction is linear discriminant analysis (LDA) [1]. It seeks a projection that maximizes the between-class distance and minimizes the within-class distance simultaneously. In practical applications, LDA suffers from several limitations. First, LDA usually suffers from the undersampled problem [2]; that is, the dimension of the data is larger than the number of training samples. Second, LDA can only uncover the global Euclidean structure. Third, the solution of LDA is not sparse, which makes a physical interpretation of the projections difficult.

To deal with the first problem, many methods have been proposed. Belhumeur et al. [3] proposed a two-stage principal component analysis (PCA) [4] + LDA method, which uses PCA to reduce the dimensionality so that the within-class scatter matrix becomes nonsingular, followed by LDA for recognition. However, some useful information may be lost in the PCA stage. Chen et al. [5] extracted the most discriminant information from the null space of the within-class scatter matrix. However, the discriminant information in the non-null space of the within-class scatter matrix is discarded. Huang et al. [6] proposed an efficient null-space approach, which first removes the null space of the total scatter matrix. This method is based on the observation that the null space of the total scatter matrix is the intersection of the null spaces of the between-class and within-class scatter matrices. Qin et al. [7] proposed a generalized null-space uncorrelated Fisher discriminant analysis technique that integrates uncorrelated discriminant analysis and the weighted pairwise Fisher criterion to solve the undersampled problem. Yu and Yang [8] proposed direct LDA (DLDA) to overcome the undersampled problem. It removes the null space of the between-class scatter matrix and extracts the discriminant information corresponding to the smallest eigenvalues of the within-class scatter matrix. Zhang et al. [9] proposed an exponential discriminant analysis (EDA) method to extract the most discriminant information contained in the null space of the within-class scatter matrix.

To deal with the second problem, many methods have been developed that focus on the local structure of the original data space. Locality preserving projections (LPP) [10] finds an embedding subspace that preserves local information. One limitation of LPP is that it is unsupervised. Because discriminant information is important for classification tasks, several locality preserving discriminant methods have been proposed. Discriminant locality preserving projection (DLPP) [11] was proposed to improve the performance of LPP. Laplacian linear discriminant analysis (LapLDA) [12] captures the global and local structure of the data simultaneously by integrating LDA with a locality preserving regularizer. Local Fisher discriminant analysis (LFDA) [13] was proposed to deal with the multimodal problem. It combines the ideas of Fisher discriminant analysis (FDA) [1] and LPP, maximizing between-class separability while preserving the within-class local structure. In LDA, the dimension of the embedding space must be less than the number of classes; LFDA removes this limitation.

To deal with the third problem, many dimensionality reduction methods that integrate sparse representation theory have been proposed. These methods fall into two categories. The first category focuses on finding a subspace spanned by sparse vectors. The sparse projection vectors reveal which elements or regions of the patterns are important for recognition tasks. Sparse PCA (SPCA) [14] uses least angle regression and the elastic net to produce sparse principal components. Sparse discriminant analysis (SDA) [15] and sparse linear discriminant analysis (SLDA) [16] learn a sparse discriminant subspace for feature extraction and classification in biological and medical data analysis. Both methods transform the original objective into a regression-type problem and add a lasso penalty to obtain sparse projection axes. One disadvantage of these methods is that the number of sparse vectors is at most $c-1$, where $c$ is the number of classes. The second category focuses on the sparse reconstructive weights among the training samples. The graph embedding framework views many dimensionality reduction methods as instances of graph construction [17]. The $k$-nearest neighbor and $\varepsilon$-ball methods are two popular ways to construct a graph. Instead of them, Cheng et al. built the $\ell_1$-graph based on sparse representation [18]. The $\ell_1$-graph has proved efficient and robust to data noise. $\ell_1$-graph based subspace learning methods include sparsity preserving projections (SPP) [19] and discriminant sparse neighborhood preserving embedding (DSNPE) [20].

Motivated by the $\ell_1$-graph and sparse subspace learning, in this paper we propose double sparse local Fisher discriminant analysis (DSLFDA) for the multimodal problem. It measures similarity on the graph by integrating sparse representation with a nonnegativity constraint. To obtain sparse projection vectors, the objective function is transformed into a regression-type problem. Furthermore, the space spanned by the solution of the regression-type problem is identical to that spanned by the solution of the original problem. The proposed DSLFDA has two advantages: (1) it retains the sparsity characteristic of the $\ell_1$-graph; (2) the label information is used in the definition of the local scatter matrices, which enhances the discriminant power of DSLFDA. Meanwhile, the projection vectors are sparse, which makes the physical meaning of the patterns clear. The proposed method is applied to face recognition and is evaluated on the Yale, ORL, and CMU PIE face databases. Experimental results show that it enhances the performance of LFDA effectively.

The rest of this paper is organized as follows. In Section 2, the LFDA algorithm is presented. The double sparse local Fisher discriminant analysis algorithm is proposed in Section 3. In Section 4, experiments are implemented to evaluate our proposed algorithm. The conclusions are given in Section 5.

2. Brief Review of LDA and LFDA

In this section, we give a brief review of LDA and LFDA. Given a data set $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ with each column corresponding to a data sample, the class label of $x_i$ is $y_i \in \{1, 2, \ldots, c\}$, where $c$ is the number of classes. We denote by $n_l$ the number of samples in the $l$th class. Dimensionality reduction tries to map the point $x_i \in \mathbb{R}^m$ into $z_i \in \mathbb{R}^d$ ($d < m$) by the linear transformation

$$z_i = W^T x_i.$$

The above transformation can be written in matrix form:

$$Z = W^T X,$$

where $Z = [z_1, z_2, \ldots, z_n] \in \mathbb{R}^{d \times n}$ and $W \in \mathbb{R}^{m \times d}$.

2.1. Linear Discriminant Analysis

Linear discriminant analysis tries to find the discriminant vectors by the Fisher criterion; that is, the within-class distance is minimized and the between-class distance is maximized simultaneously. The within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$ are, respectively, defined as follows:

$$S_w = \sum_{l=1}^{c} \sum_{x_i \in X_l} (x_i - \mu_l)(x_i - \mu_l)^T,$$

$$S_b = \sum_{l=1}^{c} n_l (\mu_l - \mu)(\mu_l - \mu)^T,$$

where $X_l$ is the data set of class $l$, $\mu_l$ is the mean of the samples in class $l$, and $\mu$ is the mean of the total data. LDA seeks the optimal projection matrix $W$ by maximizing the following Fisher criterion:

$$J(W) = \operatorname{tr}\left( (W^T S_w W)^{-1} W^T S_b W \right).$$

The above optimization is equivalent to solving the following generalized eigenvalue problem:

$$S_b w = \lambda S_w w.$$

$W$ consists of the eigenvectors of $S_w^{-1} S_b$ corresponding to the first $d$ largest eigenvalues.
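To make this concrete, the following minimal NumPy sketch (ours, not the paper's code) builds $S_w$ and $S_b$ from a column-sample data matrix and solves the generalized eigenvalue problem; the small ridge added to $S_w$ is our own safeguard for the undersampled case discussed in Section 1.

import numpy as np
from scipy.linalg import eigh

def lda(X, y, d, reg=1e-4):
    # X: (m, n) data matrix with samples as columns; y: (n,) integer labels.
    m, n = X.shape
    mu = X.mean(axis=1, keepdims=True)                 # total mean
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for l in np.unique(y):
        Xl = X[:, y == l]
        mul = Xl.mean(axis=1, keepdims=True)           # class mean
        Sw += (Xl - mul) @ (Xl - mul).T                # within-class scatter
        Sb += Xl.shape[1] * (mul - mu) @ (mul - mu).T  # between-class scatter
    # generalized eigenproblem S_b w = lambda S_w w; keep the d largest
    vals, vecs = eigh(Sb, Sw + reg * np.eye(m))
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:d]]                          # projection matrix W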

2.2. Local Fisher Discriminant Analysis

Local Fisher discriminant analysis (LFDA) is also a discriminant analysis method, designed to deal with the multimodal problem. The local within-class scatter matrix $\tilde{S}^{(w)}$ and the local between-class scatter matrix $\tilde{S}^{(b)}$ are defined as

$$\tilde{S}^{(w)} = \frac{1}{2} \sum_{i,j=1}^{n} \tilde{W}^{(w)}_{ij} (x_i - x_j)(x_i - x_j)^T,$$

$$\tilde{S}^{(b)} = \frac{1}{2} \sum_{i,j=1}^{n} \tilde{W}^{(b)}_{ij} (x_i - x_j)(x_i - x_j)^T,$$

where

$$\tilde{W}^{(w)}_{ij} = \begin{cases} A_{ij}/n_l & \text{if } y_i = y_j = l, \\ 0 & \text{if } y_i \neq y_j, \end{cases} \qquad \tilde{W}^{(b)}_{ij} = \begin{cases} A_{ij}(1/n - 1/n_l) & \text{if } y_i = y_j = l, \\ 1/n & \text{if } y_i \neq y_j, \end{cases}$$

and $A$ is the affinity matrix, $A_{ij} = \exp\left( -\|x_i - x_j\|^2 / (\sigma_i \sigma_j) \right)$, where $\sigma_i = \|x_i - x_i^{(k)}\|$ is the local scaling of $x_i$ and $x_i^{(k)}$ is the $k$th nearest neighbor of $x_i$.

The objective function of LFDA is formulated as

$$W_{\mathrm{LFDA}} = \arg\max_{W} \operatorname{tr}\left( (W^T \tilde{S}^{(w)} W)^{-1} W^T \tilde{S}^{(b)} W \right),$$

where $\operatorname{tr}(\cdot)$ is the trace of a matrix. The projection matrix can be obtained by calculating the eigenvectors of the following generalized eigenvalue problem:

$$\tilde{S}^{(b)} w = \lambda \tilde{S}^{(w)} w.$$

Because of the definition of the matrix $\tilde{W}^{(w)}$, LFDA can effectively preserve the local structure of the data.
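For illustration, a small NumPy sketch of these definitions is given below (our own code; the neighbor index k is an assumed parameter). It computes the local-scaling affinity and the two local scatter matrices by the weighted pairwise sums above.

import numpy as np

def lfda_scatters(X, y, k=7):
    # X: (m, n) samples as columns; y: (n,) labels; k: local-scaling neighbor.
    m, n = X.shape
    D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)   # (n, n) distances
    sigma = np.sort(D, axis=1)[:, k]                  # distance to k-th neighbor
    A = np.exp(-(D ** 2) / np.outer(sigma, sigma))    # local-scaling affinity
    Ww = np.zeros((n, n))
    Wb = np.full((n, n), 1.0 / n)                     # 1/n for different classes
    for l in np.unique(y):
        idx = np.where(y == l)[0]
        nl = idx.size
        blk = np.ix_(idx, idx)
        Ww[blk] = A[blk] / nl
        Wb[blk] = A[blk] * (1.0 / n - 1.0 / nl)
    # local scatters via the weighted pairwise-difference sums above
    Slw = np.zeros((m, m))
    Slb = np.zeros((m, m))
    for i in range(n):
        diff = X - X[:, [i]]                          # x_j - x_i for all j
        Slw += 0.5 * (diff * Ww[i]) @ diff.T
        Slb += 0.5 * (diff * Wb[i]) @ diff.T
    return Slw, Slb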

3. Double Sparse Local Fisher Discriminant Analysis

3.1. Graph Construction by Sparse Representation

In LFDA, the affinity matrices are defined by the local scaling method, which can be regarded as an extension of the $k$-nearest neighbor method. Recent research shows that the $\ell_1$-graph is robust to data noise and efficient at finding the underlying manifold structure. Therefore, we define a new affinity matrix by sparse representation theory. Let $S = [s_1, s_2, \ldots, s_n]$; each $s_i$ is a sparse vector obtained by the following $\ell_1$-minimization problem:

$$\min_{s_i} \|s_i\|_1 \quad \text{subject to} \quad x_i = X s_i, \quad \mathbf{1}^T s_i = 1, \quad s_i \geq 0, \tag{10}$$

where $s_i$ is an $n$-dimensional vector in which the $i$th element is equal to zero and $\mathbf{1}$ is a vector of all ones. The $\ell_1$-minimization problem (10) can be solved by many efficient numerical algorithms. In this paper, the LARS algorithm [21] is used for solving problem (10). The matrix $S$ can be seen as a similarity measurement by setting the affinity matrix $A = (S + S^T)/2$. Therefore, the new local scatter matrices can be defined as follows:

$$\tilde{S}^{(w)} = \frac{1}{2} \sum_{i,j=1}^{n} W^{(w)}_{ij} (x_i - x_j)(x_i - x_j)^T, \tag{11}$$

$$\tilde{S}^{(b)} = \frac{1}{2} \sum_{i,j=1}^{n} W^{(b)}_{ij} (x_i - x_j)(x_i - x_j)^T, \tag{12}$$

where $W^{(w)}$ and $W^{(b)}$ are the weight matrices defined as

$$W^{(w)}_{ij} = \begin{cases} A_{ij}/n_l & \text{if } y_i = y_j = l, \\ 0 & \text{if } y_i \neq y_j, \end{cases} \qquad W^{(b)}_{ij} = \begin{cases} A_{ij}(1/n - 1/n_l) & \text{if } y_i = y_j = l, \\ 1/n & \text{if } y_i \neq y_j. \end{cases} \tag{13}$$

The final objective function is described as follows:

$$W^* = \arg\max_{W} \operatorname{tr}\left( (W^T \tilde{S}^{(w)} W)^{-1} W^T \tilde{S}^{(b)} W \right). \tag{14}$$

The optimal projection can be obtained by solving the following generalized eigenvalue problem:

$$\tilde{S}^{(b)} w = \lambda \tilde{S}^{(w)} w. \tag{15}$$

When the matrix $\tilde{S}^{(w)}$ is nonsingular, the eigenvectors are obtained by the eigendecomposition of the matrix $(\tilde{S}^{(w)})^{-1} \tilde{S}^{(b)}$. However, the projection matrix is not sparse.
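As an illustrative sketch (not the authors' implementation), the graph construction can be approximated with a nonnegative lasso from scikit-learn in place of the equality-constrained LARS formulation of (10); the sum-to-one constraint is imposed here by a post hoc normalization, which is our own simplification.

import numpy as np
from sklearn.linear_model import Lasso

def sparse_affinity(X, alpha=0.01):
    # X: (m, n) samples as columns. Returns a symmetric nonnegative (n, n) affinity.
    m, n = X.shape
    S = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)           # enforce the i-th entry = 0
        reg = Lasso(alpha=alpha, positive=True, fit_intercept=False, max_iter=5000)
        reg.fit(X[:, others], X[:, i])                # x_i ~ X s_i with s_i >= 0
        S[others, i] = reg.coef_
        total = S[:, i].sum()
        if total > 0:
            S[:, i] /= total                          # approximate 1^T s_i = 1
    return 0.5 * (S + S.T)                            # symmetric affinity matrix A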

3.2. Finding the Sparse Solution

We first reformulate formulas (11) and (12) in matrix form. Consider

$$\tilde{S}^{(w)} = X (D^{(w)} - W^{(w)}) X^T = X L^{(w)} X^T, \tag{16}$$

where $D^{(w)}$ is the diagonal matrix whose $i$th diagonal element is $D^{(w)}_{ii} = \sum_{j=1}^{n} W^{(w)}_{ij}$ and $L^{(w)} = D^{(w)} - W^{(w)}$. Similarly, formula (12) can be expressed as

$$\tilde{S}^{(b)} = X (D^{(b)} - W^{(b)}) X^T = X L^{(b)} X^T, \tag{17}$$

where $L^{(b)} = D^{(b)} - W^{(b)}$, $D^{(b)}$ is the diagonal matrix, and its $i$th diagonal element is $D^{(b)}_{ii} = \sum_{j=1}^{n} W^{(b)}_{ij}$.

Matrices $L^{(w)}$ and $L^{(b)}$ are always symmetric and positive semidefinite; therefore, the eigendecompositions of $L^{(w)}$ and $L^{(b)}$ can be expressed as follows:

$$L^{(w)} = U^{(w)} \Sigma^{(w)} (U^{(w)})^T, \qquad L^{(b)} = U^{(b)} \Sigma^{(b)} (U^{(b)})^T,$$

where $\Sigma^{(w)}$ and $\Sigma^{(b)}$ are diagonal matrices whose diagonal elements are the eigenvalues of $L^{(w)}$ and $L^{(b)}$, respectively. So $\tilde{S}^{(w)}$ and $\tilde{S}^{(b)}$ can be rewritten as

$$\tilde{S}^{(w)} = X_w X_w^T, \qquad \tilde{S}^{(b)} = X_b X_b^T, \tag{18}$$

where $X_w = X U^{(w)} (\Sigma^{(w)})^{1/2}$ and $X_b = X U^{(b)} (\Sigma^{(b)})^{1/2}$.
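A sketch of these factorizations (our code; variable names are ours) is given below. Clipping tiny negative eigenvalues of $L^{(b)}$ and adding a small ridge before the Cholesky factorization are numerical safeguards, the latter matching the regularization mentioned in Section 5.2.

import numpy as np

def factorize(X, A, y, reg=1e-6):
    # Build W^(w), W^(b) from affinity A as in (13), then X_b by (18) and
    # the Cholesky factor R of the (regularized) local within-class scatter.
    m, n = X.shape
    Ww = np.zeros((n, n))
    Wb = np.full((n, n), 1.0 / n)
    for l in np.unique(y):
        idx = np.where(y == l)[0]
        nl = idx.size
        blk = np.ix_(idx, idx)
        Ww[blk] = A[blk] / nl
        Wb[blk] = A[blk] * (1.0 / n - 1.0 / nl)
    Lw = np.diag(Ww.sum(axis=1)) - Ww                 # L^(w) = D^(w) - W^(w)
    Lb = np.diag(Wb.sum(axis=1)) - Wb                 # L^(b) = D^(b) - W^(b)
    vals, U = np.linalg.eigh(Lb)                      # L^(b) = U Sigma U^T
    Xb = X @ U @ np.diag(np.sqrt(np.clip(vals, 0, None)))   # S_lb = Xb Xb^T
    Slw = X @ Lw @ X.T
    R = np.linalg.cholesky(Slw + reg * np.eye(m))     # lower triangular, S_lw = R R^T
    return Xb, R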

The following result, which was inspired by [14, 16], gives the relationship between problem (15) and the regression-type problem.

Theorem 1. Suppose that $\tilde{S}^{(w)}$ is positive definite; its Cholesky decomposition can be expressed as

$$\tilde{S}^{(w)} = R R^T, \tag{19}$$

where $R$ is a lower triangular matrix. Let $V \in \mathbb{R}^{m \times d}$ consist of the eigenvectors of problem (15) associated with the first $d$ largest eigenvalues. Let $Q \in \mathbb{R}^{m \times d}$ and $B \in \mathbb{R}^{m \times d}$ be the optimal solution to the following problem:

$$\min_{Q, B} \left\| R^{-1} X_b - Q B^T X_b \right\|_F^2 \quad \text{subject to} \quad Q^T Q = I_d, \tag{20}$$

where $B = [\beta_1, \beta_2, \ldots, \beta_d]$ and $\beta_j$ is the $j$th column of $B$. Then the columns of $B$ span the same linear space as those of $V$.

To obtain sparse projection vectors, we add an $\ell_1$ penalty to the objective function (20):

$$\min_{Q, B} \left\| R^{-1} X_b - Q B^T X_b \right\|_F^2 + \sum_{j=1}^{d} \lambda_j \|\beta_j\|_1 \quad \text{subject to} \quad Q^T Q = I_d, \tag{21}$$

where the $\lambda_j > 0$ are regularization parameters.

Generally speaking, it is difficult to compute the optimal $Q$ and $B$ simultaneously. An iterative algorithm is therefore used for solving problem (21). For a fixed $Q$, there exists an orthogonal complement $Q^{\perp} \in \mathbb{R}^{m \times (m-d)}$ such that $[Q, Q^{\perp}]$ is an orthogonal matrix. Then the first term of (21) can be rewritten as

$$\left\| R^{-1} X_b - Q B^T X_b \right\|_F^2 = \left\| (Q^{\perp})^T R^{-1} X_b \right\|_F^2 + \left\| Q^T R^{-1} X_b - B^T X_b \right\|_F^2. \tag{22}$$

If $Q$ is fixed, then problem (21) is transformed into

$$\min_{B} \sum_{j=1}^{d} \left( \left\| X_b^T R^{-T} q_j - X_b^T \beta_j \right\|^2 + \lambda_j \|\beta_j\|_1 \right), \tag{23}$$

which is equivalent to $d$ independent LASSO problems.
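The $B$-step can be sketched as follows (our code, under the formulation of (23)): each column $\beta_j$ is fitted by an independent lasso with response $X_b^T R^{-T} q_j$ and design matrix $X_b^T$.

import numpy as np
from scipy.linalg import solve_triangular
from sklearn.linear_model import Lasso

def update_B(Xb, R, Q, lam=0.01):
    # B-step of (23): for fixed Q, each column of B solves an independent lasso.
    m, d = Q.shape
    T = solve_triangular(R, Q, lower=True, trans='T')   # T = R^{-T} Q
    B = np.zeros((m, d))
    for j in range(d):
        target = Xb.T @ T[:, j]                         # X_b^T R^{-T} q_j
        reg = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        reg.fit(Xb.T, target)                           # regress on X_b^T
        B[:, j] = reg.coef_
    return B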

For a fixed $B$, problem (21) is equivalent to the following problem (ignoring the constant terms):

$$\max_{Q} \operatorname{tr}\left( Q^T R^{-1} X_b X_b^T B \right) \quad \text{subject to} \quad Q^T Q = I_d. \tag{24}$$

The optimal solution can be obtained by computing the singular value decomposition $R^{-1} X_b X_b^T B = U D V^T$ and setting $Q = U V^T$.
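A short sketch of this Procrustes-type update (our code):

import numpy as np
from scipy.linalg import solve_triangular

def update_Q(Xb, R, B):
    # Q-step of (24): orthogonal Procrustes via the SVD of M = R^{-1} X_b X_b^T B.
    M = solve_triangular(R, Xb @ (Xb.T @ B), lower=True)
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt                                       # Q = U V^T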

The algorithm procedure of DSLFDA is summarized as follows.

Input: the data matrix $X$ and the class labels.

Output: the sparse projection matrix $B$.

(1) Calculate the affinity matrix $A$ by the $\ell_1$-minimization problem (10).
(2) Calculate the matrix $X_b$ by (18) and the matrix $R$ by the Cholesky decomposition of $\tilde{S}^{(w)}$.
(3) Initialize $Q$ as an arbitrary column orthogonal matrix.
(4) For the given $Q$, solve the $\ell_1$-minimization problem (23), which is equivalent to $d$ independent LASSO problems.
(5) Calculate the SVD $R^{-1} X_b X_b^T B = U D V^T$ and update $Q = U V^T$.
(6) Repeat steps 4 and 5 until $B$ converges.
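Putting the pieces together, a compact driver for steps 1-6 might look as follows (it assembles the sketches defined above; all function names are our assumptions, not the authors' code):

import numpy as np

def dslfda(X, y, d, lam=0.01, n_iter=50, tol=1e-5):
    # Requires sparse_affinity, factorize, update_B, update_Q defined above.
    A = sparse_affinity(X)                    # step 1: l1-graph affinity (10)
    Xb, R = factorize(X, A, y)                # step 2: X_b by (18), Cholesky R
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((X.shape[0], d)))   # step 3
    B = np.zeros((X.shape[0], d))
    for _ in range(n_iter):                   # steps 4-6: alternate to convergence
        B_new = update_B(Xb, R, Q, lam)       # step 4: d independent lassos (23)
        Q = update_Q(Xb, R, B_new)            # step 5: SVD / Procrustes update
        if np.linalg.norm(B_new - B) < tol:
            break
        B = B_new
    return B                                  # sparse projection matrix

New samples are then embedded by $Z = B^T X$, as in Section 2.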

4. Experimental Results

In this section, we apply the proposed DSLFDA method to face recognition. Three face image databases, namely, Yale [22], ORL [23], and CMU PIE [24], are used in the experiments. We compare our algorithm with PCA, LDA, LPP, LFDA, SPCA, SPP, DSNPE, and SLDA. For simplicity, the nearest neighbor classifier with the Euclidean distance is used for classification.

4.1. Experiment on the Yale Face Database

The Yale face database contains 165 grayscale images of 15 individuals, with 11 images per individual. The images were captured under different lighting conditions (left-light, center-light, and right-light), with various facial expressions (normal, happy, sad, sleepy, surprised, and wink), and with different facial details (with or without glasses). In our experiments, the face region of each image was cropped based on the locations of the eyes, and each cropped image was resized to a uniform resolution. Figure 1 shows the cropped sample images of two individuals from the Yale database.

In the first experiment, we randomly select $l$ images per subject for training and use the remaining images for testing. Each configuration was run 10 times, and the average recognition rates are reported as the final accuracies. For LFDA, the local scaling parameter $k$ was fixed for simplicity. LPP was implemented in supervised mode. For SPCA, we manually chose the number of nonzero loadings of each sparse principal component to obtain the best performance. Table 1 shows the recognition accuracies of the different methods with the corresponding dimensions.

In the second experiment, we examine the performance under different dimensionalities of the projected space. Five images per individual were randomly selected for training, and the remaining images were used for testing. Figure 2 shows the performance of the different methods.

4.2. Experiment on the ORL Face Database

The ORL database contains 400 images of 40 individuals, with 10 images per individual. The images were captured at different times, under various lighting conditions, and with different facial expressions. The images were manually cropped and resized to a uniform resolution. Figure 3 shows the cropped sample images of two individuals from the ORL database.

In the first experiment, we randomly select $l$ images per subject for training and use the remaining images for testing. Each configuration was run 10 times, and the average recognition rates are reported as the final accuracies. The experimental parameters were set as in Section 4.1. Table 2 shows the recognition accuracies of the different methods with the corresponding dimensions.

In the second experiment, we examine the performance under different dimensionalities of the projected space. Five images per individual were randomly selected for training, and the remaining images were used for testing. Figure 4 shows the performance of the different methods.

4.3. Experiment on the PIE Face Database

The CMU PIE face database contains 41,368 images of 68 individuals. The images were captured under 13 different poses, 43 different illumination conditions, and 4 different expressions. In our experiments, we chose a subset (C29) that contains 1632 images of the 68 individuals. These images were manually cropped and resized to a uniform resolution. Figure 5 shows the cropped sample images of two individuals from the CMU PIE database.

In the first experiment, we randomly select $l$ images per subject for training and use the remaining images for testing. Each configuration was run 10 times, and the average recognition rates are reported as the final accuracies. The experimental parameters were set as in Section 4.1. Table 3 shows the recognition accuracies of the different methods with the corresponding dimensions.

In the second experiment, we examine the performance under different dimensionalities of the projected space. Fifteen images per individual were randomly selected for training, and the remaining images were used for testing. Figure 6 shows the performance of the different methods.

5. Discussion and Conclusion

5.1. Discussion

Based on the above experimental results, we draw the following observations.

(1) For each method, the recognition rate increases as the training sample size increases. The supervised extension of LPP effectively improves its performance. PCA and SPCA achieve the worst results in all experiments; moreover, the performance of SPCA is inferior to that of PCA on all face databases. The reason may be that the number of nonzero variables is selected equally for each component.

(2) For LDA and SLDA, the dimensionality of the projected subspace is at most $c-1$, where $c$ is the number of classes. LFDA and DSLFDA overcome this limitation; hence, we may project the original high-dimensional data into a low-dimensional subspace whose dimensionality is larger than the number of classes.

(3) From Table 3, LPP and SLDA outperform LFDA on the CMU PIE database. However, DSLFDA achieves better performance than the other methods. This shows that DSLFDA improves not only on LFDA but also on sparsity-based methods such as SLDA. The proposed DSLFDA algorithm constructs the graph on the original data and obtains a nonnegative similarity measurement, which is different from SPP and DSNPE.

(4) From the experimental results, we observe that SPP achieves competitive performance on the CMU PIE database but not on the ORL and Yale databases. The reason may be that sparse representation needs abundant training samples. Conversely, the nonnegative similarity measurement in DSLFDA is data-adaptive and can overcome this drawback of sparse representation.

(5) DSNPE can be regarded as an extension of SPP. It extracts discriminant information and performs better than SPP. On the Yale database, DSNPE achieves the best recognition performance when the number of training samples per individual is four or five.

5.2. Conclusion

In this paper, we proposed a sparse projection method, called DSLFDA, for face recognition. It defines a novel affinity matrix that describes the relationships among points in the original high-dimensional space. The sparse projection vectors are obtained by solving an $\ell_1$-optimization problem. Experiments on the Yale, ORL, and CMU PIE face databases indicate that DSLFDA achieves competitive performance compared with other dimensionality reduction methods.

We focused only on supervised learning in this paper. Because a large amount of unlabeled data is available in practical applications, semisupervised learning has attracted much attention in recent years [25-27]. One of our future works is to extend our approach to the semisupervised learning framework. On the other hand, DSLFDA requires the local within-class scatter matrix to be positive definite; we add a scaled identity matrix to the local within-class scatter matrix for regularization. This motivates us to seek a regularization method that approximates the local within-class scatter matrix better.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported partly by the National Natural Science Foundation of China (61172128 and 61003114), National Key Basic Research Program of China (2012CB316304), the Fundamental Research Funds for the Central Universities (2013JBM020 and 2013JBZ003), Program for Innovative Research Team in University of Ministry of Education of China (IRT201206), and Doctoral Foundation of China Ministry of Education (20120009120009).