Abstract

Tensor subspace analysis (TSA) and discriminant TSA (DTSA) are two effective two-sided projection methods for dimensionality reduction and feature extraction of face image matrices. However, they suffer from two serious drawbacks. Firstly, TSA and DTSA compute the left and right projection matrices iteratively, and two generalized eigenvalue problems must be solved at each iteration, which makes them inapplicable to high-dimensional image data. Secondly, the metric structure of the facial image space is not preserved, since the left and right projection matrices are not, in general, orthonormal. In this paper, we propose the orthogonal TSA (OTSA) and orthogonal DTSA (ODTSA). In contrast to TSA and DTSA, these methods solve two trace ratio optimization problems at each iteration instead of two generalized eigenvalue problems. Since the trace ratio optimization problem can be solved by the inexpensive Newton-Lanczos method, OTSA and ODTSA have much lower computational cost than their nonorthogonal counterparts. Experimental results show that the proposed methods achieve much higher recognition accuracy and have much lower training cost.

1. Introduction

Many applications in the field of information processing, such as data mining, information retrieval, machine learning, and pattern recognition, require dealing with high-dimensional data. Dimensionality reduction has been a key technique for achieving high efficiency in manipulating such high-dimensional data. In dimensionality reduction, the high-dimensional data are transformed into a low-dimensional subspace with limited loss of information.

Principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] are two of the most well-known and widely used dimension reduction methods. PCA is an unsupervised method that finds the projection directions by maximizing the variance of the features in the low-dimensional subspace. It is also regarded as the optimal data representation method in the sense that it minimizes the mean squared error between the original data and the data reconstructed from the PCA transform. LDA is a supervised method and is based on the following idea: the transformed data points of different classes should be as far from each other as possible, while the transformed data points of the same class should be as close to each other as possible. To achieve this goal, LDA seeks an optimal linear transformation by simultaneously minimizing the within-class distance and maximizing the between-class distance. The optimal transformation of LDA can be computed by solving a generalized eigenvalue problem involving scatter matrices. LDA has been applied successfully for decades in many important applications including pattern recognition [2-4], information retrieval [5], face recognition [6, 7], microarray data analysis [8, 9], and text classification [10]. One of the main drawbacks of LDA is that the scatter matrices are required to be nonsingular, which is not the case when the data dimension is larger than the number of data samples. This is known as the undersampled problem, also called the small sample size problem [2]. To make LDA applicable to undersampled problems, researchers have proposed several variants of LDA including PCA + LDA [7, 11], LDA/GSVD [12, 13], two-stage LDA [14], regularized LDA [15-18], orthogonal LDA [19, 20], null space LDA [21, 22], and uncorrelated LDA [20, 23].
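
As a concrete reminder of the computation involved, the sketch below implements the classical scatter-matrix formulation of LDA; the function name, the ridge regularization of the within-class scatter, and the use of SciPy's symmetric generalized eigensolver are our own illustrative choices, not details taken from the methods compared later.

import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y, k):
    # X: n_samples-by-n_features array of vectorized data; y: class labels;
    # k: number of discriminant directions to keep.
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))               # within-class scatter
    S_b = np.zeros((d, d))               # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
    # Generalized symmetric eigenproblem S_b w = lambda * S_w w; a small ridge
    # keeps S_w nonsingular in the undersampled case discussed above.
    vals, vecs = eigh(S_b, S_w + 1e-8 * np.eye(d))
    return vecs[:, ::-1][:, :k]          # directions for the k largest eigenvalues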

As is well known, both PCA and LDA take into account only the global Euclidean structure of the original data. However, real-world high-dimensional data often lie on or near a smooth low-dimensional manifold, so it is important to preserve the local structure. Locality preserving projection (LPP) [24] is a locality-preserving method that aims to preserve the intrinsic geometry of the original data. For recognition problems, LPP usually performs better than methods such as PCA and LDA that preserve only the global structure. Moreover, LPP is not sensitive to noise and outliers. In its original form, LPP is an unsupervised dimension reduction method. The supervised version of LPP (SLPP) [25] exploits the class label information of the training samples and thus achieves higher classification accuracy than the unsupervised LPP. Other improvements to LPP include the discriminant locality preserving projection (DLPP) [26] and the orthogonal discriminant locality preserving projection (ODLPP) [27].

When dealing with two-dimensional data such as images, the traditional approach first transforms the image matrices into one-dimensional vectors and then applies the dimension reduction methods mentioned above to the vectorized image data. Vectorizing image matrices, however, incurs high computational cost and loses the underlying spatial structure information of the images. To overcome the disadvantages of the vectorization approach, researchers have proposed 2D-PCA [28], 2D-LDA [29], 2D-LPP [30], and 2D-DLPP [31]. These methods operate directly on the matrix representation of the image data. However, they employ only single-sided projection and thus still cannot fully preserve the intrinsic spatial structure information of the images.

In the last decade, researchers have developed several second-order tensor methods for dimension reduction of image data. These methods aim to find two subspaces for two-sided projection. Ye [32] proposed the generalized low-rank approximation method (GLRAM), which seeks the left and right projections by minimizing the reconstruction error, and presented an iterative procedure for computing them. One of the main drawbacks of GLRAM is that an eigenvalue decomposition is required at each iteration step, so the computational cost is high. To overcome this disadvantage, Ren and Dai [33] proposed replacing the projection vectors obtained from the eigenvalue decomposition with bilinear Lanczos vectors at each iteration step of GLRAM. Experimental results show that the approach based on bilinear Lanczos vectors is competitive with the conventional GLRAM in classification accuracy while having a much lower computational cost. We note that GLRAM is an unsupervised method and preserves only the global Euclidean structure of the image data. Tensor subspace analysis (TSA) [34] is another two-sided projection method for dimension reduction and feature extraction of image data. TSA preserves the local structure information of the original data but does not employ discriminant information. Wang et al. [35] proposed a discriminant TSA (DTSA) by combining TSA with the discriminant information. Like GLRAM, both TSA and DTSA use an iterative procedure to compute the optimal pair of projection matrices. At each iteration of TSA and DTSA, two generalized eigenvalue problems must be solved, which makes these methods inapplicable for dimension reduction and feature extraction of high-dimensional image data.

In this paper, we propose the orthogonal TSA (OTSA) and orthogonal DTSA (ODTSA) by constraining the left and right projection matrices to be orthogonal. Like TSA and DTSA, OTSA and ODTSA compute the left and right projection matrices iteratively. However, instead of solving two generalized eigenvalue problems as in TSA and DTSA, OTSA and ODTSA solve two trace ratio optimization problems at each iteration. Since the trace ratio optimization problem can be solved by the inexpensive Newton-Lanczos method, OTSA and ODTSA have much lower computational cost than their nonorthogonal counterparts. Two face recognition experiments are conducted to evaluate the efficiency and effectiveness of the proposed OTSA and ODTSA. Experimental results show that the proposed methods achieve much higher recognition accuracy and have much lower training cost than TSA and DTSA.

The remainder of the paper is organized as follows. In Section 2, we briefly review TSA and DTSA. In Section 3, we first propose OTSA and ODTSA. Then, we give a brief review of the trace ratio optimization problem and outline Newton's method and the Newton-Lanczos method for solving it. Finally, we present the algorithms for computing the left and right projection matrices of OTSA and ODTSA. Section 4 is devoted to numerical experiments. Some concluding remarks are provided in Section 5.

2. A Brief Review of TSA and DTSA

In this section, we give a brief review of TSA and DTSA, which are two recently proposed linear methods for dimension reduction and feature extraction in face recognition.

Given a set of image data, where . For simplicity of discussion, we assume that the given data set is partitioned into classes as where is the number of samples in the th class and .

Let denote the total within-class similarity matrix. Its entry is defined by where is a positive parameter which can be determined empirically and denotes the Frobenius norm for a matrix, that is, . Note that the total within-class similarity matrix has a block diagonal form, where the th block is the within-class similarity matrix of the th class and the size of the th block is equal to the number of samples in the th class; that is,
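
As an illustration, a within-class similarity matrix of this type can be assembled as in the following sketch, which assumes the common heat-kernel weight exp(-||A_i - A_j||_F^2 / t) between samples of the same class and zero weight otherwise; the function name and the parameter name t are ours.

import numpy as np

def within_class_similarity(images, labels, t=10.0):
    # images: list of 2-D arrays (face image matrices); labels: class label of each image
    # t: the positive heat-kernel parameter chosen empirically
    n = len(images)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                d2 = np.linalg.norm(images[i] - images[j], 'fro') ** 2
                W[i, j] = np.exp(-d2 / t)
    return W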

The between-class similarity matrix is defined as follows: where is the mean of samples in the th class.

Define the diagonal matrix with Then, is called the within-class Laplacian matrix and is symmetric positive semidefinite. Similarly, the between-class Laplacian matrix is defined as , where is a diagonal matrix with its th entry being the row sum of the th row of .
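
A minimal sketch of this construction, taking any symmetric nonnegative similarity matrix W (our notation) and returning the corresponding Laplacian, is the following.

import numpy as np

def laplacian(W):
    # D is diagonal with the row sums of W; the Laplacian D - W is symmetric
    # positive semidefinite whenever W is symmetric with nonnegative entries.
    D = np.diag(W.sum(axis=1))
    return D - W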

In two-sided projection methods such as TSA and DTSA for dimension reduction and feature extraction of matrix data, we aim to find two projection matrices , with and such that the low-dimensional data are easier to distinguish.

2.1. Tensor Subspace Analysis

In TSA, we seek the left and right transformation matrices , by solving the following optimization problem: The numerator of the objective function in (9) represents the global variance on the manifold in the low-dimensional subspace, while the denominator is a measure of the nearness of samples from the same class. Therefore, by maximizing the objective function, samples from the same class are transformed into data points close to each other, and samples from different classes are transformed into data points far from each other.

Define These two matrices are called the total left and right transformation matrices in [35], respectively.

The optimization problem (9) can be equivalently rewritten as the following optimization problem: or Here and in the following, denotes an identity matrix of order and represents the Kronecker product of the matrices.

Clearly, from the equivalence between the maximization problem (9) and the optimization problem (11) or (12), we have the following result, which yields an iterative algorithm for computing the transformation matrices and .

Theorem 1. Let and be the solution of the maximization problem (9). Then the following hold.
(1) For a given , consists of the eigenvectors of the generalized eigenvalue problem corresponding to the largest eigenvalues.
(2) For a given , consists of the eigenvectors of the generalized eigenvalue problem corresponding to the largest eigenvalues.
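
Computationally, each item of Theorem 1 amounts to extracting the eigenvectors of a symmetric pencil associated with its largest eigenvalues. A minimal sketch of this building block is given below; the ridge term, added in case the second matrix of the pencil is only positive semidefinite, is our own safeguard.

import numpy as np
from scipy.linalg import eigh

def top_pencil_eigenvectors(A, B, k, ridge=1e-10):
    # Eigenvectors of the symmetric pencil (A, B) for the k largest eigenvalues.
    vals, vecs = eigh(A, B + ridge * np.eye(B.shape[0]))   # ascending eigenvalues
    return vecs[:, ::-1][:, :k]                            # keep the k largest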

Based on Theorem 1, an iterative implementation of TSA is given in Algorithm 1; see also [34].

Input: A set of sample matrices with class label information, , .
Output: left and right transformation matrices and .
(1) Initialize with an identity matrix;
(2) Until convergence Do:
   (2.1) Form the matrix ;
   (2.2) Form the matrix ;
   (2.3) Compute the eigenvectors of the pencil
     corresponding to the largest eigenvalues.
   (2.4) Set ;
   (2.5) Form the matrix ;
   (2.6) Form the matrix ;
   (2.7) Compute the eigenvectors of the pencil
     corresponding to the largest eigenvalues.
   (2.8) Set ;
   End Do
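
The sketch below illustrates only the alternating structure of Algorithm 1, not the specific matrices it assembles: form_left_pencil and form_right_pencil are hypothetical placeholders for steps (2.1)-(2.2) and (2.5)-(2.6), which build the pencils from the TSA objective, and here both factors are initialized with identity columns.

import numpy as np
from scipy.linalg import eigh

def alternating_tsa(images, labels, l1, l2, form_left_pencil, form_right_pencil,
                    n_iter=5):
    r, c = images[0].shape
    U, V = np.eye(r, l1), np.eye(c, l2)               # step (1): identity initialization
    for _ in range(n_iter):                           # step (2): until convergence
        A_U, B_U = form_left_pencil(images, labels, V)    # steps (2.1)-(2.2)
        _, W = eigh(A_U, B_U)                         # ascending eigenvalues; B_U assumed definite
        U = W[:, ::-1][:, :l1]                        # steps (2.3)-(2.4)
        A_V, B_V = form_right_pencil(images, labels, U)   # steps (2.5)-(2.6)
        _, W = eigh(A_V, B_V)
        V = W[:, ::-1][:, :l2]                        # steps (2.7)-(2.8)
    return U, V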

2.2. Discriminant Tensor Subspace Analysis

In this subsection, we briefly review the second-order DTSA, which was proposed in [35] for face recognition. DTSA combines the advantages of tensor methods and manifold methods and thus preserves both the spatial structure information of the original image data and the local structure of the sample distribution. Moreover, by integrating the class label information into TSA, DTSA obtains higher recognition accuracy for face recognition.

In DTSA, the optimization problem is described as follows: where is the mean of samples in the th class.

We note that the objective function in (15) has the same denominator as that of the objective function in (9) but a different numerator. Since the numerator of the objective function in (15) is built from the class label information, DTSA performs better than TSA at transforming samples from different classes into data points far from each other.

Define the mean left and right transformation matrices , by Then, similarly, the optimization problem (15) can be equivalently formulated as the optimization problem or the optimization problem where is the within-class Laplacian matrix and is the between-class Laplacian matrix.

Similarly, for the optimization problem (15), we have the following result.

Theorem 2. Let and be the solution of the maximization problem (15). Then the following hold.
(1) For a given , consists of the eigenvectors of the generalized eigenvalue problem corresponding to the largest eigenvalues.
(2) For a given , consists of the eigenvectors of the generalized eigenvalue problem corresponding to the largest eigenvalues.

The algorithm proposed in [35] for implementing DTSA is described in Algorithm 2.

Input: A set of sample matrices with class label information, , .
Output: left and right transformation matrices and .
(1) Initialize with an identity matrix;
(2) Until convergence Do:
   (2.1) Form the matrix ;
   (2.2) Form the matrix ;
   (2.3) Compute the eigenvectors of the pencil
     corresponding to the largest eigenvalues.
   (2.4) Set ;
   (2.5) Form the matrix ;
   (2.6) Form the matrix ;
   (2.7) Compute the eigenvectors of the pencil
     corresponding to the largest eigenvalues.
   (2.8) Set ;
   End Do

3. Orthogonal TSA and DTSA

Although TSA and DTSA are two effective methods for dimension reduction and feature extraction of facial images, they still have two serious defects. Firstly, as shown in the section above, the column vectors of the left and right transformation matrices and are eigenvectors of symmetric positive semidefinite pencils, so they are not, in general, orthonormal. Orthogonality of the columns of the projection matrices is a common requirement because orthogonal projection matrices preserve the metric structure of the facial image space; consequently, orthogonal methods have better locality preserving power and higher discriminating power than nonorthogonal methods. Secondly, at each iteration step of the TSA or DTSA algorithm, two generalized eigenvalue problems must be solved to compute the left and right projection matrices. As a result, the relatively high computational complexity of TSA and DTSA makes them impractical for real applications in which computational efficiency is critical.

In this section, we propose the orthogonal TSA (OTSA) and the orthogonal DTSA (ODTSA) for dimension reduction and feature extraction of facial images.

In OTSA, we seek to obtain the orthogonal projection matrices and by solving the optimization problem while in ODTSA, the optimization problem to be solved is

Clearly, for OTSA and ODTSA, we have the following theorems.

Theorem 3. Let and be the solution of the maximization problem (21). Then the following hold.
(1) For a given , is the solution of the trace ratio optimization problem
(2) For a given , is the solution of the trace ratio optimization problem

Theorem 4. Let and be the solution of the maximization problem (22). Then the following hold.
(1) For a given , is the solution of the trace ratio optimization problem
(2) For a given , is the solution of the trace ratio optimization problem

The only difference between OTSA and TSA, or between ODTSA and DTSA, is that and are constrained to be orthogonal in OTSA and ODTSA. However, the projection matrices and of the orthogonal methods are quite different from those of the nonorthogonal methods: in the nonorthogonal methods, and consist of eigenvectors of generalized eigenvalue problems, whereas in the orthogonal methods they are the solutions of trace ratio optimization problems.

3.1. Trace Ratio Optimization

In this subsection, we consider the following trace ratio optimization problem: where are symmetric matrices.

For the trace ratio optimization problem (27), we have the following result, which is given in [36].

Theorem 5. Let , be two symmetric matrices and assume that is positive semidefinite with rank greater than . Then the ratio (27) admits a finite maximum value .

Define the function as follows:

In the following theorem, we collect some important properties, presented in [36], of the function . Some of them indicate the relation between the trace ratio optimization problem (27) and the function .

Theorem 6. Let , be two symmetric matrices and assume that is positive semidefinite with rank greater than . Then the following hold.
(1) is a nonincreasing function of ;
(2) if and only if , where is the finite maximum value of the ratio (27);
(3) the derivative of is given by where
(4) the columns of the solution matrix of the trace ratio optimization problem (27) consist of the eigenvectors of the matrix corresponding to the largest eigenvalues; that is,
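
For reference, writing the two symmetric matrices in (27) as A (numerator) and B (denominator) and the subspace dimension as k (our notation), the function and the quantities appearing in Theorem 6 take the following form in [36]:

    f(\rho) = \max_{V^{T}V = I_{k}} \operatorname{trace}\left( V^{T} (A - \rho B) V \right),
    \qquad
    f'(\rho) = -\operatorname{trace}\left( V(\rho)^{T} B \, V(\rho) \right),

where V(\rho) consists of the eigenvectors of A - \rho B associated with its k largest eigenvalues; the maximizer of (27) is then V(\rho^{*}), and f(\rho^{*}) = 0 at the optimal ratio \rho^{*}.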

Theorem 6 shows that, instead of solving the trace ratio optimization problem (27) directly, its solution can be obtained in two steps:
(1) compute the solution of the nonlinear equation ;
(2) compute the eigenvectors of the matrix corresponding to the largest eigenvalues.

Newton's method [37] is the most well-known and widely used method for solving a nonlinear equation. The iterative scheme of Newton's method for solving takes the form where consists of the eigenvectors of the matrix corresponding to the largest eigenvalues.

We now outline the procedure of Newton's method for solving the trace ratio optimization problem (27) in Algorithm 3.

Input: and a dimension
Output: which solves the trace ratio optimization problem (27)
(1) Select an initial unitary matrix ;
(2) Compute
       ;
(3) Until convergence Do:
   (3.1) Compute the eigenvectors of the matrix
     corresponding to the largest eigenvalues.
   (3.2) Set ;
   (3.3) Compute
         .
   End Do

We remark that since Newton's method typically converges quadratically, only a few iterations of Algorithm 3 are needed to obtain a good approximation of . The main cost of each iteration of Algorithm 3 is the computation of the eigenvectors of a symmetric matrix corresponding to the largest eigenvalues.
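
As an illustration of Algorithm 3, the following sketch, written with the symmetric matrices denoted A and B and the target dimension k as in the notation assumed above, performs the Newton iteration; each Newton step reduces to setting the new ratio to trace(V^T A V) / trace(V^T B V).

import numpy as np

def trace_ratio_newton(A, B, k, tol=1e-8, max_iter=20):
    # Newton's method for the trace ratio problem (27): find rho with f(rho) = 0.
    n = A.shape[0]
    V = np.eye(n, k)                                       # step (1): initial orthonormal columns
    rho = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)    # step (2)
    for _ in range(max_iter):                              # step (3)
        _, vecs = np.linalg.eigh(A - rho * B)              # ascending eigenvalues
        V = vecs[:, ::-1][:, :k]                           # steps (3.1)-(3.2): dominant eigenvectors
        rho_new = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)   # step (3.3): Newton update
        if abs(rho_new - rho) <= tol * max(1.0, abs(rho)):
            return V, rho_new
        rho = rho_new
    return V, rho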

3.2. Lanczos Vectors

In this subsection, we review the Lanczos procedure for generating the Lanczos vectors of a symmetric matrix and the Newton-Lanczos method for solving the trace ratio optimization problem (27).

Given a symmetric matrix and an initial unit vector , let denote the Krylov subspace associated with and , which is defined as

The Lanczos vectors , which form an orthonormal basis of the Krylov subspace , can be generated by the three-term recurrence with . The coefficients and are computed so as to ensure that and . The pseudocode of the Lanczos procedure for constructing the Lanczos vectors is outlined in Algorithm 4.

Input: and a dimension
Output: Lanczos vectors
(1) Set , and ;
(2) For
   (2.1) ;
   (2.2) ;
   (2.3) ;
   (2.4) ;
   (2.5) ;
   End For
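
A plain implementation of the three-term recurrence in Algorithm 4 might read as follows; no reorthogonalization is performed, matching the recurrence as stated, although practical codes often add it to combat loss of orthogonality in finite precision.

import numpy as np

def lanczos_vectors(A, k, q1=None):
    # Generate k Lanczos vectors of the symmetric matrix A, i.e. an orthonormal
    # basis of the Krylov subspace spanned by q1, A q1, ..., A^{k-1} q1.
    n = A.shape[0]
    q = np.random.randn(n) if q1 is None else q1.astype(float).copy()
    q /= np.linalg.norm(q)
    Q = np.zeros((n, k))
    q_prev = np.zeros(n)
    beta = 0.0
    for j in range(k):
        Q[:, j] = q
        w = A @ q - beta * q_prev          # three-term recurrence
        alpha = q @ w
        w -= alpha * q
        beta = np.linalg.norm(w)
        if beta < 1e-12:                   # an invariant subspace has been found
            return Q[:, :j + 1]
        q_prev, q = q, w / beta
    return Q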

It is known [38] that Lanczos vectors are commonly good approximations of the eigenvectors of a symmetric matrix corresponding to the largest eigenvalues. It is therefore reasonable to replace the eigenvectors of the matrix corresponding to the largest eigenvalues in Algorithm 3 with the Lanczos vectors of the matrix , thereby avoiding the expensive computation of the eigenvectors. This substitution yields the Newton-Lanczos method for solving the trace ratio optimization problem (27), which is outlined in Algorithm 5; see also [36].

Input: and a dimension
Output: which solves the trace ratio optimization problem (27)
(1) Select an initial unitary matrix ;
(2) Compute
       ;
(3) Until convergence Do:
   (3.1) Compute the Lanczos vectors of by Algorithm 4.
   (3.2) Set ;
   (3.3) Compute
         .
    End Do
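
A sketch of Algorithm 5 is obtained from the Newton iteration above by replacing the eigenvector computation with a Lanczos basis; it reuses the lanczos_vectors routine from the sketch after Algorithm 4, and the notation A, B, k is again ours.

import numpy as np

def trace_ratio_newton_lanczos(A, B, k, tol=1e-8, max_iter=20):
    # Newton-Lanczos method: the dominant eigenvectors of A - rho*B in each
    # Newton step are replaced by k Lanczos vectors of the same matrix.
    n = A.shape[0]
    V = np.eye(n, k)
    rho = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)
    for _ in range(max_iter):
        V = lanczos_vectors(A - rho * B, k)                       # steps (3.1)-(3.2)
        rho_new = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)   # step (3.3)
        if abs(rho_new - rho) <= tol * max(1.0, abs(rho)):
            return V, rho_new
        rho = rho_new
    return V, rho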

3.3. OTSA and ODTSA

Similarly, from Theorems 3 and 4, we can obtain two iterative procedures for computing the left and right transformation matrices and of OTSA and ODTSA. Algorithms 6 and 7 summarize the steps for computing and in OTSA and ODTSA, respectively.

Input: A set of sample matrices with class label information, , .
Output: left and right transformation matrices and .
(1) Initialize with an identity matrix;
(2) Until convergence Do:
   (2.1) Form the matrix ;
   (2.2) Form the matrix ;
   (2.3) Compute by solving the trace ratio optimization problem (27) with
      and ;
   (2.4) Form the matrix ;
   (2.5) Form the matrix ;
   (2.6) Compute by solving the trace ratio optimization problem (27) with
      and .
   End Do

Input: A set of sample matrices with class label information, , .
Output: left and right transformation matrices and .
(1) Initialize with an identity matrix;
(2) Until convergence Do:
   (2.1) Form the matrix ;
   (2.2) Form the matrix ;
   (2.3) Compute by solving the trace ratio optimization problem (27) with
      and ;
   (2.4) Form the matrix ;
   (2.5) Form the matrix ;
   (2.6) Compute by solving the trace ratio optimization problem (27) with
      and .
   End Do
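
Structurally, Algorithms 6 and 7 differ from Algorithms 1 and 2 only in the inner solver: each half-step now calls a trace ratio solver rather than a generalized eigensolver. The skeleton below illustrates this loop; form_left_pair and form_right_pair are hypothetical placeholders for steps (2.1)-(2.2) and (2.4)-(2.5), which assemble the numerator and denominator matrices of the trace ratio subproblems from the OTSA or ODTSA objective, and trace_ratio_solver is, for example, trace_ratio_newton or trace_ratio_newton_lanczos from the sketches above.

import numpy as np

def orthogonal_tsa(images, labels, l1, l2, form_left_pair, form_right_pair,
                   trace_ratio_solver, n_iter=5):
    # Alternating computation of orthonormal left and right factors.
    r, c = images[0].shape
    U, V = np.eye(r, l1), np.eye(c, l2)            # step (1): identity initialization
    for _ in range(n_iter):                        # step (2): until convergence
        A_U, B_U = form_left_pair(images, labels, V)
        U, _ = trace_ratio_solver(A_U, B_U, l1)    # step (2.3)
        A_V, B_V = form_right_pair(images, labels, U)
        V, _ = trace_ratio_solver(A_V, B_V, l2)    # step (2.6)
    return U, V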

The trace ratio optimization problems in Algorithms 6 and 7 can be solved by Newton's method or the Newton-Lanczos method. To distinguish these two cases, we use OTSA-N and ODTSA-N to denote the OTSA and ODTSA algorithms in which the trace ratio optimization problems are solved by Newton's method, and OTSA-NL and ODTSA-NL to denote those in which they are solved by the Newton-Lanczos method.

3.4. Computational Complexity Analysis

We now discuss the computational complexity of TSA, DTSA, OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL.

In each iteration of TSA, it costs about , , , , , and flops (floating-point operations) for computing , , , , , and , respectively. Moreover, it takes flops for computing the eigenvectors of the pencil and for . So, the total cost for each iteration of TSA is about flops.

The main difference between DTSA and TSA is that the matrices , in TSA are replaced by , in DTSA, respectively. Computing , , , and in each iteration of DTSA costs about , , , and flops. Thus, DTSA costs flops per iteration. In the case , the computational cost of DTSA is less than that of TSA.

It is known that solving the trace ratio optimization problem (27) costs flops with Newton's method (Algorithm 3) and with the Newton-Lanczos method (Algorithm 5), where is the number of Newton iteration steps. So, the total cost of each iteration of OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL is about , , , and flops, respectively. In general, 2 or 3 iteration steps are enough for the convergence of the Newton iteration. Therefore, OTSA and ODTSA require less computation than TSA and DTSA.

The time complexity for TSA, DTSA, OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL is presented in Table 1. We note that the space complexity of all the methods is .

It is well known that Newton's method for a nonlinear equation typically converges quadratically. So, it converges very fast for the nonlinear equation , where is defined in (28). We have observed in our numerical experiments that 5 Newton iteration steps are enough for convergence. Therefore, the total computational cost of OTSA and ODTSA is much less than that of TSA and DTSA for obtaining the left and right transformation matrices and .

4. Experimental Results

In order to evaluate the performance of the proposed OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL algorithms, two well-known face image databases, namely, ORL (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html) and Yale (http://www.cad.zju.edu.cn/home/dengcai/Data/data.html), are used in the experiments. We compare the recognition performance of the OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL algorithms with that of TSA [34] and DTSA [35]. In the experiments, the nearest neighbor classifier is used to classify the low-dimensional representations of the samples obtained by the different methods.

4.1. Experiment on the ORL Database of Face Images

The ORL database contains 400 images of 40 individuals. Each individual has 10 images, which were taken at different times, under different lighting conditions, with different facial expressions, and with different accessories (glasses/no glasses). The sample images of one individual from the ORL database are shown in Figure 1.

We randomly select samples of each individual for training, and the remaining ones are used for testing. Based on the training set, the projection matrices are obtained by TSA, DTSA, OTSA-N, ODTSA-N, OTSA-NL, and ODTSA-NL. Then all the testing samples are projected to generate their low-dimensional representations, which are recognized using the nearest neighbor classifier. We repeat this process 10 times and report the mean and standard deviation of the recognition rates.
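
The evaluation protocol can be summarized by the following sketch, in which the two-sided projection and the nearest neighbor matching in Frobenius-norm distance are written out explicitly; the helper names and the random-split routine are our own illustrative choices.

import numpy as np

def random_split(labels, n_train_per_class, seed=0):
    # Randomly pick n_train_per_class samples of each individual for training;
    # the remaining samples are used for testing.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        train_idx.extend(idx[:n_train_per_class])
        test_idx.extend(idx[n_train_per_class:])
    return train_idx, test_idx

def project(images, U, V):
    # Two-sided projection of each image matrix A to the low-dimensional U^T A V.
    return [U.T @ A @ V for A in images]

def nearest_neighbor_accuracy(train_feats, train_labels, test_feats, test_labels):
    # 1-NN classification of the projected test samples.
    correct = 0
    for F, y in zip(test_feats, test_labels):
        dists = [np.linalg.norm(F - G, 'fro') for G in train_feats]
        correct += int(train_labels[int(np.argmin(dists))] == y)
    return correct / len(test_feats)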

In our experiments, the parameters and in all the methods are set to 10, and the parameter is set to 1. The mean and standard deviation of the recognition accuracy over 10 runs of the six algorithms are presented in Table 2, and the training time for each method is presented in Table 3. The results show that, for all methods, the recognition accuracy increases with the training sample size. Moreover, the orthogonal methods achieve higher recognition accuracy than their nonorthogonal counterparts, and the orthogonal methods based on the Newton-Lanczos approach require the least computational time.

4.2. Experiment on the Yale Database

The Yale face database contains 165 gray-scale images of 15 individuals, with 11 images per individual. These facial images vary in lighting condition (left-light, center-light, right-light), facial expression (normal, happy, sad, sleepy, surprised, and wink), and the presence or absence of glasses. The 11 sample images of one individual from the Yale database are shown in Figure 2.

As in the previous experiments, the parameters and are set to 10, and is set to 1. The mean and standard deviation of the recognition accuracy over 10 runs for the Yale database are presented in Table 4, and the training time of each method is presented in Table 5. Clearly, ODTSA-N and ODTSA-NL achieve higher recognition accuracy than TSA, DTSA, OTSA-N, and OTSA-NL on this database, and ODTSA-NL and OTSA-NL outperform TSA, DTSA, OTSA-N, and ODTSA-N in terms of computational time.

5. Conclusion

In this paper, we have proposed the orthogonal TSA (OTSA) and orthogonal DTSA (ODTSA) for face recognition by constraining the left and right projection matrices to be orthogonal. Like TSA and DTSA, OTSA and ODTSA compute the left and right projection matrices iteratively. However, instead of solving two generalized eigenvalue problems as in TSA and DTSA, they solve two trace ratio optimization problems at each iteration. Since the trace ratio optimization problem can be solved by the inexpensive Newton-Lanczos method, OTSA and ODTSA have much lower computational cost than their nonorthogonal counterparts. Experimental results show that the proposed methods achieve much higher recognition accuracy and have much lower training cost than TSA and DTSA.

Conflict of Interests

The authors declare that there is no conflict of interests.

Acknowledgments

Yiqin Lin is supported by the National Natural Science Foundation of China under Grant 10801048, the Natural Science Foundation of Hunan Province under Grant 11JJ4009, the Scientific Research Foundation of Education Bureau of Hunan Province for Outstanding Young Scholars in University under Grant 10B038, the Science and Technology Planning Project of Hunan Province under Grant 2010JT4042, and the Chinese Postdoctoral Science Foundation under Grant 2012M511386. Liang Bao is supported by the National Natural Science Foundation of China under Grants 10926150 and 11101149 and the Fundamental Research Funds for the Central Universities.