Mathematical Problems in Engineering

Volume 2015, Article ID 636928, 9 pages

http://dx.doi.org/10.1155/2015/636928

## Face Recognition Using Double Sparse Local Fisher Discriminant Analysis

^{1}Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China

^{2}Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China

Received 17 October 2014; Revised 4 March 2015; Accepted 9 March 2015

Academic Editor: Zhan Shu

Copyright © 2015 Zhan Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Local Fisher discriminant analysis (LFDA) was proposed for dealing with the multimodal problem. It combines the idea of locality preserving projections (LPP), which preserves the local structure of high-dimensional data, with the idea of Fisher discriminant analysis (FDA), which provides discriminant power. However, like many dimensionality reduction methods, LFDA suffers from the undersampled problem. Moreover, its projection matrix is not sparse. In this paper, we propose double sparse local Fisher discriminant analysis (DSLFDA) for face recognition. The proposed method first constructs a sparse, data-adaptive graph under a nonnegativity constraint. Then, DSLFDA reformulates the objective function as a regression-type optimization problem. The undersampled problem is thus avoided naturally, and a sparse solution can be obtained by adding an $\ell_1$ penalty to the regression-type problem. Experiments on the Yale, ORL, and CMU PIE face databases demonstrate the effectiveness of the proposed method.

#### 1. Introduction

Dimensionality reduction tries to transform high-dimensional data into a lower-dimensional space while preserving as much useful information as possible. It has a wide range of applications in pattern recognition, machine learning, and computer vision. A well-known approach for supervised dimensionality reduction is linear discriminant analysis (LDA) [1]. It finds a projection transformation by maximizing the between-class distance and minimizing the within-class distance simultaneously. In practical applications, LDA suffers from several limitations. First, LDA usually suffers from the undersampled problem [2]; that is, the dimension of the data is larger than the number of training samples. Second, LDA can only uncover the global Euclidean structure. Third, the solution of LDA is not sparse, which makes physical interpretation difficult.

To deal with the first problem, many methods have been proposed. Belhumeur et al. [3] proposed a two-stage principal component analysis (PCA) [4] + LDA method, which utilizes PCA to reduce dimensionality so as to make the within-class scatter matrix nonsingular, followed by LDA for recognition. However, some useful information may be compromised in the PCA stage. Chen et al. [5] extracted the most discriminant information from the null space of within-class scatter matrix. However, the discriminant information in the nonnull space of within-class scatter matrix would be discarded. Huang et al. [6] proposed an efficient null-space approach, which first removes the null space of total scatter matrix. This method is based on the observation that the null space of total scatter matrix is the intersection of the null space of between-class scatter matrix and the null space of within-class scatter matrix. Qin et al. [7] proposed a generalized null space uncorrelated Fisher discriminant analysis technique that integrates the uncorrelated discriminant analysis and weighted pairwise Fisher criterion for solving the undersampled problem. Yu and Yang [8] proposed direct LDA (DLDA) to overcome the undersampled problem. It removes the null space of between-class scatter matrix and extracts the discriminant information that corresponds to the smallest eigenvalues of the within-class scatter matrix. Zhang et al. [9] proposed an exponential discriminant analysis (EDA) method to extract the most discriminant information which is contained in the null space of the within-class scatter matrix.

To deal with the second problem, many methods have been developed for dimensionality reduction. These methods focus on finding the local structure of the original data space. Locality preserving projections (LPP) [10] was proposed to find an embedding subspace that preserves local information. One limitation of LPP is that it is an unsupervised method. Because the discriminant information is important to the classification tasks, some locality preserving discriminant methods have been proposed. Discriminant locality preserving projection (DLPP) [11] was proposed to improve the performance of LPP. Laplacian linear discriminant analysis (LapLDA) [12] tries to capture the global and local structure of the data simultaneously by integrating LDA with a locality preserving regularizer. Local Fisher discriminant analysis (LFDA) [13] was proposed to deal with the multimodal problem. It combines the ideas of Fisher discriminant analysis (FDA) [1] and LPP and maximizes between-class separability and preserves within-class local structure simultaneously. In LDA, the dimension of the embedding space should be less than the number of classes. This limitation can be solved by using the LFDA algorithm.

To deal with the third problem, many dimensionality reduction methods integrating sparse representation theory have been proposed. These methods can be classified into two categories. The first category focuses on finding a subspace spanned by sparse vectors. The sparse projection vectors reveal which element or region of the patterns is important for recognition tasks. Sparse PCA (SPCA) [14] was proposed by using the least angle regression and elastic net to produce sparse principal components. Sparse discriminant analysis (SDA) [15] and sparse linear discriminant analysis (SLDA) [16] were proposed to learn a sparse discriminant subspace for feature extraction and classification in biological and medical data analysis. Both methods transform the original objective into a regression-type problem and add a lasso penalty to obtain the sparse projection axes. One disadvantage of these methods is that the number of sparse vectors is at most $c-1$, where $c$ is the number of classes. The second category focuses on the sparse reconstructive weights among the training samples. The graph embedding framework views many dimensionality reduction methods as instances of graph construction [17]. The $k$-nearest neighbor and $\varepsilon$-ball based methods are two popular ways of constructing a graph. Instead, Cheng et al. built the $\ell_1$-graph based on sparse representation [18]. The $\ell_1$-graph has been shown to be efficient and robust to data noise. $\ell_1$-graph based subspace learning methods include sparsity preserving projections (SPP) [19] and discriminant sparse neighborhood preserving embedding (DSNPE) [20].

Motivated by the $\ell_1$-graph and sparse subspace learning, in this paper we propose double sparse local Fisher discriminant analysis (DSLFDA) for the multimodal problem. It measures similarity on the graph by integrating sparse representation with a nonnegativity constraint. To obtain sparse projection vectors, the objective function is transformed into a regression-type problem. Furthermore, the space spanned by the solution of the regression-type problem is identical to that spanned by the solution of the original problem. The proposed DSLFDA has two advantages: (1) it retains the sparsity of the $\ell_1$-graph; (2) the label information is used in the definition of the local scatter matrices, which enhances the discriminant power. Meanwhile, the projection vectors are sparse, which makes the physical meaning of the patterns clear. The proposed method is applied to face recognition and is examined on the Yale, ORL, and PIE face databases. Experimental results show that it enhances the performance of LFDA effectively.

The rest of this paper is organized as follows. In Section 2, the LFDA algorithm is presented. The double sparse local Fisher discriminant analysis algorithm is proposed in Section 3. In Section 4, experiments are implemented to evaluate our proposed algorithm. The conclusions are given in Section 5.

#### 2. Related Work

In this section, we give a brief review of LDA and LFDA. Given a data set $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ with each column corresponding to a data sample $x_i \in \mathbb{R}^m$, the class label of $x_i$ is $y_i \in \{1, 2, \ldots, c\}$, where $c$ is the number of classes. We denote by $n_l$ the number of samples in the $l$th class. Dimensionality reduction tries to map the point $x_i$ into $z_i \in \mathbb{R}^d$ ($d \ll m$) by the linear transformation

$$z_i = T^\top x_i.$$

The above transformation can be written in matrix form as

$$Z = T^\top X,$$

where $Z = [z_1, z_2, \ldots, z_n] \in \mathbb{R}^{d \times n}$ and $T \in \mathbb{R}^{m \times d}$ is the transformation matrix.

##### 2.1. Linear Discriminant Analysis

Linear discriminant analysis finds the discriminant vectors by the Fisher criterion; that is, the within-class distance is minimized and the between-class distance is maximized simultaneously. The within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$ are, respectively, defined as follows:

$$S_w = \sum_{l=1}^{c} \sum_{x_i \in X_l} (x_i - \mu_l)(x_i - \mu_l)^\top,$$

$$S_b = \sum_{l=1}^{c} n_l (\mu_l - \mu)(\mu_l - \mu)^\top,$$

where $X_l$ is the data set of class $l$, $\mu_l$ is the mean of the samples in class $l$, and $\mu$ is the mean of the total data. LDA seeks the optimal projection matrix $T$ by maximizing the following Fisher criterion:

$$T^* = \arg\max_{T} \operatorname{tr}\left( (T^\top S_w T)^{-1} T^\top S_b T \right).$$

The above optimization is equivalent to solving the following generalized eigenvalue problem:

$$S_b t = \lambda S_w t.$$

$T$ consists of the eigenvectors of $S_w^{-1} S_b$ corresponding to the $d$ largest eigenvalues.
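As a concrete illustration of the criterion above, the generalized eigenvalue problem can be solved with a symmetric eigensolver; a minimal NumPy/SciPy sketch (function and variable names are ours, not from the paper; a small ridge is added to keep $S_w$ invertible in the undersampled case):

```python
import numpy as np
from scipy.linalg import eigh

def lda(X, y, d):
    """Classical LDA. Columns of X are samples; y holds integer labels.

    Returns the m x d projection matrix whose columns are the
    generalized eigenvectors of S_b t = lambda S_w t with the
    d largest eigenvalues.
    """
    m, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for label in np.unique(y):
        Xl = X[:, y == label]
        mul = Xl.mean(axis=1, keepdims=True)
        Sw += (Xl - mul) @ (Xl - mul).T
        Sb += Xl.shape[1] * (mul - mu) @ (mul - mu).T
    Sw += 1e-6 * np.eye(m)          # ridge: keeps S_w positive definite
    evals, evecs = eigh(Sb, Sw)     # eigenvalues in ascending order
    return evecs[:, ::-1][:, :d]    # keep the d largest
```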

##### 2.2. Local Fisher Discriminant Analysis

Local Fisher discriminant analysis (LFDA) is also a discriminant analysis method; it aims to deal with the multimodal problem. The local within-class scatter matrix $S^{(lw)}$ and the local between-class scatter matrix $S^{(lb)}$ are defined as

$$S^{(lw)} = \frac{1}{2} \sum_{i,j=1}^{n} W_{ij}^{(lw)} (x_i - x_j)(x_i - x_j)^\top,$$

$$S^{(lb)} = \frac{1}{2} \sum_{i,j=1}^{n} W_{ij}^{(lb)} (x_i - x_j)(x_i - x_j)^\top,$$

where

$$W_{ij}^{(lw)} = \begin{cases} A_{ij}/n_l, & y_i = y_j = l, \\ 0, & y_i \neq y_j, \end{cases} \qquad W_{ij}^{(lb)} = \begin{cases} A_{ij}\left(1/n - 1/n_l\right), & y_i = y_j = l, \\ 1/n, & y_i \neq y_j, \end{cases}$$

and $A$ is the affinity matrix. Its entries are $A_{ij} = \exp\left(-\|x_i - x_j\|^2 / (\sigma_i \sigma_j)\right)$, where $\sigma_i$ is the local scaling of $x_i$ defined by $\sigma_i = \|x_i - x_i^{(k)}\|$, and $x_i^{(k)}$ is the $k$th nearest neighbor of $x_i$.

The objective function of LFDA is formulated as

$$T^* = \arg\max_{T} \operatorname{tr}\left( (T^\top S^{(lw)} T)^{-1} T^\top S^{(lb)} T \right),$$

where $\operatorname{tr}(\cdot)$ is the trace of a matrix. The projection matrix can be obtained by computing the eigenvectors of the following generalized eigenvalue problem:

$$S^{(lb)} t = \lambda S^{(lw)} t.$$

Because of the definition of the weight matrices, LFDA can effectively preserve the local structure of the data.
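The local-scaling affinity and the LFDA weight matrices described above can be sketched as follows (a simplified illustration; `k = 7` is a common heuristic from the local-scaling literature, and all names are ours):

```python
import numpy as np

def local_scaling_affinity(X, k=7):
    """A_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)),
    where sigma_i is the distance from x_i to its k-th nearest neighbor.
    Columns of X are samples."""
    D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)  # pairwise distances
    sigma = np.sort(D, axis=1)[:, min(k, D.shape[0] - 1)]      # k-th NN distance
    return np.exp(-D**2 / np.outer(sigma, sigma))

def lfda_weights(A, y):
    """Local within-/between-class weight matrices from affinity A and labels y."""
    n = len(y)
    counts = {l: int(np.sum(y == l)) for l in np.unique(y)}
    Ww = np.zeros((n, n))
    Wb = np.full((n, n), 1.0 / n)        # default: different-class pairs get 1/n
    for i in range(n):
        for j in range(n):
            if y[i] == y[j]:
                nl = counts[y[i]]
                Ww[i, j] = A[i, j] / nl
                Wb[i, j] = A[i, j] * (1.0 / n - 1.0 / nl)
    return Ww, Wb
```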

#### 3. Double Sparse Local Fisher Discriminant Analysis

##### 3.1. Graph Construction by Sparse Representation

In LFDA, the affinity matrix is defined by the local scaling method, which can be regarded as an extension of the $k$-nearest neighbor method. Recent research shows that the $\ell_1$-graph is robust to data noise and efficient for finding the underlying manifold structure. Therefore, we define a new affinity matrix by sparse representation theory. Let $S = [s_1, s_2, \ldots, s_n]$; each $s_i$ is a sparse vector obtained by the following $\ell_1$-minimization problem:

$$\min_{s_i} \|s_i\|_1 \quad \text{s.t.} \quad x_i = X s_i, \ \mathbf{1}^\top s_i = 1, \ s_i \geq 0, \tag{10}$$

where $s_i$ is an $n$-dimensional vector in which the $i$th element is equal to zero and $\mathbf{1}$ is a vector of all ones. The $\ell_1$-minimization problem (10) can be solved by many efficient numerical algorithms; in this paper, the LARS algorithm [21] is used for solving problem (10). The matrix $S$ can be seen as a similarity measurement by setting the affinity matrix $\tilde{A} = (S + S^\top)/2$. Therefore, the new local scatter matrices can be defined as follows:

$$\tilde{S}^{(w)} = \frac{1}{2} \sum_{i,j=1}^{n} \tilde{W}_{ij}^{(w)} (x_i - x_j)(x_i - x_j)^\top, \tag{11}$$

$$\tilde{S}^{(b)} = \frac{1}{2} \sum_{i,j=1}^{n} \tilde{W}_{ij}^{(b)} (x_i - x_j)(x_i - x_j)^\top, \tag{12}$$

where $\tilde{W}^{(w)}$ and $\tilde{W}^{(b)}$ are the weight matrices defined as

$$\tilde{W}_{ij}^{(w)} = \begin{cases} \tilde{A}_{ij}/n_l, & y_i = y_j = l, \\ 0, & y_i \neq y_j, \end{cases} \qquad \tilde{W}_{ij}^{(b)} = \begin{cases} \tilde{A}_{ij}\left(1/n - 1/n_l\right), & y_i = y_j = l, \\ 1/n, & y_i \neq y_j. \end{cases} \tag{13}$$

The final objective function is described as follows:

$$T^* = \arg\max_{T} \operatorname{tr}\left( (T^\top \tilde{S}^{(w)} T)^{-1} T^\top \tilde{S}^{(b)} T \right). \tag{14}$$

The optimal projection can be obtained by solving the following generalized eigenvalue problem:

$$\tilde{S}^{(b)} t = \lambda \tilde{S}^{(w)} t. \tag{15}$$

When the matrix $\tilde{S}^{(w)}$ is nonsingular, the eigenvectors are obtained by the eigendecomposition of the matrix $(\tilde{S}^{(w)})^{-1} \tilde{S}^{(b)}$. However, the resulting projection matrix is not sparse.
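A hedged sketch of the graph construction above: we use scikit-learn's `Lasso` with `positive=True` as a stand-in for the paper's LARS-based constrained solver, so the sum-to-one constraint of (10) is only approximated; `lam` is a hypothetical regularization weight:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_affinity(X, lam=0.01):
    """l1-graph sketch: reconstruct each sample from the others with a
    sparse, nonnegative code, then symmetrize into an affinity matrix.
    Columns of X are samples."""
    m, n = X.shape
    S = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]   # exclude x_i from its own code
        model = Lasso(alpha=lam, positive=True, fit_intercept=False,
                      max_iter=5000)
        model.fit(X[:, idx], X[:, i])
        S[idx, i] = model.coef_
    return (S + S.T) / 2.0                      # symmetric, nonnegative affinity
```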

##### 3.2. Finding the Sparse Solution

We first reformulate formulas (11) and (12) in matrix form. Consider

$$\tilde{S}^{(w)} = X \left( D^{(w)} - \tilde{W}^{(w)} \right) X^\top = X L^{(w)} X^\top, \tag{16}$$

where $D^{(w)}$ is the diagonal matrix whose $i$th diagonal element is $D_{ii}^{(w)} = \sum_{j=1}^{n} \tilde{W}_{ij}^{(w)}$, and $L^{(w)} = D^{(w)} - \tilde{W}^{(w)}$. Similarly, formula (12) can be expressed as

$$\tilde{S}^{(b)} = X L^{(b)} X^\top, \tag{17}$$

where $L^{(b)} = D^{(b)} - \tilde{W}^{(b)}$, $D^{(b)}$ is the diagonal matrix, and its $i$th diagonal element is $D_{ii}^{(b)} = \sum_{j=1}^{n} \tilde{W}_{ij}^{(b)}$.

Matrices $L^{(w)}$ and $L^{(b)}$ are always symmetric and positive semidefinite; therefore, their eigendecompositions can be expressed as follows:

$$L^{(w)} = U^{(w)} \Sigma^{(w)} U^{(w)\top}, \qquad L^{(b)} = U^{(b)} \Sigma^{(b)} U^{(b)\top}, \tag{18}$$

where $\Sigma^{(w)}$ and $\Sigma^{(b)}$ are the diagonal matrices whose diagonal elements are the eigenvalues of matrices $L^{(w)}$ and $L^{(b)}$, respectively. So $\tilde{S}^{(w)}$ and $\tilde{S}^{(b)}$ can be rewritten as

$$\tilde{S}^{(w)} = X^{(w)} X^{(w)\top}, \qquad \tilde{S}^{(b)} = X^{(b)} X^{(b)\top}, \tag{19}$$

where $X^{(w)} = X U^{(w)} \left(\Sigma^{(w)}\right)^{1/2}$ and $X^{(b)} = X U^{(b)} \left(\Sigma^{(b)}\right)^{1/2}$.
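This half-power factorization can be computed directly from the eigendecomposition of a graph Laplacian; a small sketch under our own naming:

```python
import numpy as np

def half_power_factor(X, W):
    """Given a symmetric nonnegative weight matrix W, form the Laplacian
    L = D - W and return X_half such that X @ L @ X.T == X_half @ X_half.T."""
    L = np.diag(W.sum(axis=1)) - W
    evals, U = np.linalg.eigh(L)           # L is symmetric PSD
    evals = np.clip(evals, 0.0, None)      # guard against tiny negative roundoff
    return X @ U @ np.diag(np.sqrt(evals))
```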

The following result, which is inspired by [14, 16], gives the relationship between problem (15) and the regression-type problem.

Theorem 1. *Suppose that $\tilde{S}^{(w)}$ is positive definite; its Cholesky decomposition can be expressed as $\tilde{S}^{(w)} = R R^\top$, where $R$ is a lower triangular matrix. Let $T = [t_1, t_2, \ldots, t_d]$ consist of the eigenvectors of problem (15) associated with the $d$ largest eigenvalues. Let $\gamma > 0$ and let $\hat{A}$ and $\hat{B}$ be the optimal solution to the following problem:*

$$(\hat{A}, \hat{B}) = \arg\min_{A, B} \sum_{i=1}^{n} \left\| R^{-1} x_i^{(b)} - A B^\top x_i^{(b)} \right\|^2 + \gamma \sum_{j=1}^{d} \beta_j^\top \tilde{S}^{(w)} \beta_j \quad \text{s.t. } A^\top A = I_d, \tag{20}$$

*where $x_i^{(b)}$ is the $i$th column of $X^{(b)}$ and $\beta_j$ is the $j$th column of $B$. Then the columns of $\hat{B}$ span the same linear space as those of $T$.*

To obtain sparse projection vectors, we add an $\ell_1$ penalty to the objective function (20):

$$\min_{A, B} \sum_{i=1}^{n} \left\| R^{-1} x_i^{(b)} - A B^\top x_i^{(b)} \right\|^2 + \gamma \sum_{j=1}^{d} \beta_j^\top \tilde{S}^{(w)} \beta_j + \sum_{j=1}^{d} \lambda_j \|\beta_j\|_1 \quad \text{s.t. } A^\top A = I_d. \tag{21}$$

Generally speaking, it is difficult to compute the optimal $A$ and $B$ simultaneously, so an iterative algorithm is usually used for solving problem (21). For a fixed $A$ with orthonormal columns, there exists $A_\perp$ such that $[A, A_\perp]$ is an orthogonal matrix. Letting $Y = X^{(b)\top} R^{-\top}$, the first term of (21) can be rewritten as

$$\sum_{i=1}^{n} \left\| R^{-1} x_i^{(b)} - A B^\top x_i^{(b)} \right\|^2 = \|Y A_\perp\|_F^2 + \sum_{j=1}^{d} \left\| Y a_j - X^{(b)\top} \beta_j \right\|^2, \tag{22}$$

where $a_j$ is the $j$th column of $A$. If $A$ is fixed, then problem (21) is transformed into

$$\min_{\beta_j} \left\| Y a_j - X^{(b)\top} \beta_j \right\|^2 + \gamma \beta_j^\top \tilde{S}^{(w)} \beta_j + \lambda_j \|\beta_j\|_1, \quad j = 1, \ldots, d, \tag{23}$$

which is equivalent to $d$ independent LASSO-type problems.

For a fixed $B$, problem (21) is equivalent to minimizing the following problem after ignoring the constant terms:

$$\min_{A} \left\| Y - X^{(b)\top} B A^\top \right\|_F^2 \quad \text{s.t. } A^\top A = I_d. \tag{24}$$

This is a reduced-rank Procrustes problem. The optimal solution can be obtained by computing the singular value decomposition $Y^\top X^{(b)\top} B = U \Sigma V^\top$ and setting $\hat{A} = U V^\top$.
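The SVD-based update is the standard reduced-rank Procrustes solution; a minimal sketch (variable names are ours):

```python
import numpy as np

def procrustes_update(Y, Zb):
    """Solve min_A ||Y - Zb @ A.T||_F^2 s.t. A.T @ A = I
    via the SVD of Y.T @ Zb (reduced-rank Procrustes)."""
    U, _, Vt = np.linalg.svd(Y.T @ Zb, full_matrices=False)
    return U @ Vt
```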

The algorithm procedure of DSLFDA is summarized as follows.

*Input*: the data matrix $X$.

*Output*: the sparse projection matrix $B$.

(1) Calculate the affinity matrix $\tilde{A}$ by solving the $\ell_1$-minimization problem (10) for each sample.
(2) Calculate the matrix $X^{(b)}$ via the eigendecomposition (18) and the matrix $R$ by the Cholesky decomposition of $\tilde{S}^{(w)}$.
(3) Initialize the matrix $A$ as an arbitrary column-orthogonal matrix.
(4) For the given $A$, solve the $\ell_1$-minimization problem (23), which is equivalent to $d$ independent LASSO problems.
(5) Calculate the SVD $Y^\top X^{(b)\top} B = U \Sigma V^\top$ and update $A = U V^\top$.
(6) Repeat steps 4 and 5 until $B$ converges.
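Assuming $Y$ and $X^{(b)}$ have been precomputed, steps 3-6 can be sketched as an alternating loop. This is a simplified illustration under our own naming: plain `Lasso` stands in for the LARS solver, and the quadratic $\tilde{S}^{(w)}$ penalty of (21) is omitted for brevity:

```python
import numpy as np
from sklearn.linear_model import Lasso

def dslfda_iterate(Y, Zb, d, lam=0.01, n_iter=50, tol=1e-6):
    """Alternate d independent lasso fits (B-step) with the SVD-based
    Procrustes update (A-step) until B stops changing.
    Y plays the role of X_b.T @ inv(R).T and Zb of X_b.T."""
    m = Zb.shape[1]
    rng = np.random.default_rng(0)
    A, _ = np.linalg.qr(rng.normal(size=(m, d)))   # arbitrary column-orthogonal init
    B = np.zeros((m, d))
    for _ in range(n_iter):
        B_old = B.copy()
        # B-step: d independent lasso problems with targets Y @ a_j
        B = np.column_stack([
            Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
            .fit(Zb, Y @ A[:, j]).coef_
            for j in range(d)
        ])
        # A-step: reduced-rank Procrustes update via SVD
        U, _, Vt = np.linalg.svd(Y.T @ (Zb @ B), full_matrices=False)
        A = U @ Vt
        if np.linalg.norm(B - B_old) < tol:
            break
    # normalize nonzero columns for use as projection directions
    norms = np.linalg.norm(B, axis=0)
    B[:, norms > 0] /= norms[norms > 0]
    return B
```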

#### 4. Experimental Results

In this section, we apply the proposed DSLFDA method to face recognition. Three face image databases, namely, Yale [22], ORL [23], and PIE [24], are used in the experiments. We compare our proposed algorithm with PCA, LDA, LPP, LFDA, SPCA, SPP, DSNPE, and SLDA. For simplicity, we use the nearest neighbor classifier with the Euclidean distance for the classification task.

##### 4.1. Experiment on the Yale Face Database

The Yale face database contains 165 grayscale images of 15 individuals, with 11 images per individual. These images were captured under different lighting conditions (left-light, center-light, and right-light), with various facial expressions (normal, happy, sad, sleepy, surprised, and wink), and with different facial details (with or without glasses). In our experiments, the face region of each original image was cropped based on the location of the eyes and then resized. Figure 1 shows the cropped sample images of two individuals from the Yale database.