Journal of Electrical and Computer Engineering

Volume 2016 (2016), Article ID 3919472, 10 pages

http://dx.doi.org/10.1155/2016/3919472

## A Complete Subspace Analysis of Linear Discriminant Analysis and Its Robust Implementation

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China

Received 21 September 2016; Revised 27 October 2016; Accepted 7 November 2016

Academic Editor: Ping Feng Pai

Copyright © 2016 Zhicheng Lu and Zhizheng Liang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Linear discriminant analysis has been widely studied in data mining and pattern recognition. However, when performing the eigen-decomposition on the matrix pair (within-class scatter matrix and between-class scatter matrix), one finds that in some cases there exist degenerated eigenvalues, so that the eigenvectors in the eigen-subspace corresponding to a degenerated eigenvalue become indistinguishable. To address this problem, we revisit linear discriminant analysis in this paper and propose a stable and effective algorithm for linear discriminant analysis in terms of an optimization criterion. By discussing the properties of this criterion, we find that the eigenvectors in some eigen-subspaces may be indistinguishable if a degenerated eigenvalue occurs. Inspired by the idea of the maximum margin criterion (MMC), we embed MMC into the eigen-subspace corresponding to the degenerated eigenvalue to exploit the discriminability of the eigenvectors in that eigen-subspace. Since the proposed algorithm can deal with the degenerated case of eigenvalues, it not only handles the small-sample-size problem but also enables us to select projection vectors from the null space of the between-class scatter matrix. Extensive experiments on several face-image and microarray data sets are conducted to evaluate the proposed algorithm in terms of classification performance, and experimental results show that our method has smaller standard deviations than other methods in most cases.

#### 1. Introduction

Linear discriminant analysis (LDA) [1–4] plays an important role in data analysis and has been widely used in many fields such as data mining and pattern recognition [5]. The main aim of LDA is to find optimal projection vectors by simultaneously minimizing the within-class distance and maximizing the between-class distance in the projected space; the optimal projection vectors can be obtained by solving a generalized eigenvalue problem. In solving classical LDA, the within-class scatter matrix is generally required to be nonsingular. However, in many applications such as text classification and face recognition [6], the within-class scatter matrix is often singular since the dimension of the data is much larger than the number of data points. This is known as the small-sample-size (SSS) problem.

In the past several decades, various variants of LDA [7–10] have been proposed to address the problems of high-dimensional data and the SSS problem. It is noted that most LDA-based methods can be divided into four categories in terms of the combination of spaces of the within-class scatter and between-class scatter matrices [11].

The first category of these methods considers the range space of the within-class scatter matrix and the range space of the between-class scatter matrix. A typical algorithm of this category is the Fisherface [1] method, where PCA is first employed to reduce the dimension of the features so that the within-class scatter matrix becomes full-rank, and then standard LDA is performed. In the direct LDA method [12], the null space of the between-class scatter matrix is first removed and then the projection vectors are obtained by minimizing the within-class scatter distance in the range space of the between-class scatter matrix. Li et al. [13] proposed an efficient and stable algorithm to extract the discriminant vectors by defining the maximum margin criterion (MMC). The main difference between Fisher’s criterion and MMC is that the former maximizes the Fisher quotient while the latter maximizes the average distance.
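To make the contrast between the two criteria concrete, MMC replaces the Fisher quotient by the trace difference $\operatorname{tr}(W^T(S_b - S_w)W)$, which requires no matrix inversion, so a singular within-class scatter matrix causes no trouble. A minimal sketch in Python/NumPy (the synthetic data and the choice of two projection directions are ours, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(15, 40))          # d > n, so S_w is singular
y = rng.integers(0, 3, size=15)

m = X.mean(0)
S_b = sum((y == k).sum() * np.outer(X[y == k].mean(0) - m,
                                    X[y == k].mean(0) - m)
          for k in np.unique(y))
S_w = sum((X[y == k] - X[y == k].mean(0)).T @ (X[y == k] - X[y == k].mean(0))
          for k in np.unique(y))

# MMC maximizes tr(W^T (S_b - S_w) W) over orthonormal W: the solution is
# the leading eigenvectors of S_b - S_w -- no inversion of S_w is needed
evals, evecs = np.linalg.eigh(S_b - S_w)
W = evecs[:, ::-1][:, :2]              # two most discriminative directions
```

Because only an ordinary symmetric eigen-decomposition is involved, the procedure is well defined even in the small-sample-size setting.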

The second category mainly depends on exploiting the null space of the within-class scatter matrix and the range space of the between-class scatter matrix. In terms of null space-based LDA, Chen et al. [14] proposed to maximize the between-class scatter in the null space of the within-class scatter matrix, and their method is referred to as the NLDA method. In order to reduce the computational cost of calculating the null space of the within-class scatter matrix, several effective methods have been proposed. Instead of directly obtaining the null space of the within-class scatter matrix, Çevikalp et al. [15] first obtained the range space of the within-class scatter matrix and then defined the scatter matrix of common vectors. Based on this, the projection vectors were obtained from the scatter matrix they defined. They also adopted difference subspaces and the Gram-Schmidt orthogonalization procedure to obtain discriminative common vectors. Chu and Thye [16] adopted the QR factorization on several matrices to derive a new algorithm for the null space-based LDA method. Sharma and Paliwal [17] proposed an alternative null LDA method and discussed its fast implementation. Paliwal and Sharma [18] also developed a variant of pseudoinverse linear discriminant analysis and showed that it yields better classification performance.

The third category consists of those methods that make use of the null space of within-class scatter matrix, the range space of between-class scatter matrix, and the range space of within-class scatter matrix. Sharma et al. [19] applied improved RLDA to devise a feature selection method to extract important genes. In order to address the problem of the regularization parameter in RLDA, Sharma and Paliwal [20] applied a deterministic method to estimate the parameter by maximizing modified Fisher’s criterion.

The fourth category is made up of those methods that explore all the spaces of the within-class scatter matrix and the between-class scatter matrix. Sharma and Paliwal [11] applied a two-stage technique to regularize both the between-class scatter and within-class scatter matrices to achieve the discriminant information.

In addition, there are other variants of LDA that do not belong to the four categories mentioned above. Uncorrelated local Fisher discriminant analysis in terms of manifold learning was devised for ear recognition [21]. An exponential locality preserving projection (ELPP) was presented by introducing the matrix exponential to address the SSS problem. A double shrinking model [22] was constructed for manifold learning and feature selection. Li et al. [23] analyzed linear discriminant analysis in the worst case and reduced this problem to a scalable semidefinite feasibility problem. Zollanvari and Dougherty [24] discussed the asymptotic generalization bound of linear discriminant analysis. Lu and Renals [25] used probabilistic linear discriminant analysis to model acoustic data.

In this paper, we revisit the optimization criterion for linear discriminant analysis. We find that a degenerated case can occur for some generalized eigenvalues. In order to deal with the degeneration of eigenvalues, we develop a robust implementation of this criterion. To be specific, the null space of the total scatter matrix is first removed to remedy the singularity problem. Then the eigen-subspace corresponding to each specific eigenvalue is obtained. Finally, in each eigen-subspace, the discriminability of the eigenvectors is measured by the maximum margin criterion, and the projection vectors can be achieved by optimizing this criterion. We also conduct extensive experiments to evaluate the proposed method on various well-known data sets such as face images and microarray data sets. Experimental results show that our method is more stable than other methods in most cases.

#### 2. Related Works

Assume that there are a set of $d$-dimensional data points, denoted by $X = \{x_1, \ldots, x_n\}$, where $x_i \in \mathbb{R}^d$. When the labels of data points are available, each data point belongs to exactly one of $c$ object classes and the number of samples in class $i$ is $n_i$. Thus, $n = \sum_{i=1}^{c} n_i$ is the number of all data points. In classical linear discriminant analysis, the between-class scatter matrix $S_b$, the within-class scatter matrix $S_w$, and the total scatter matrix $S_t$ are defined as follows:

$$S_b = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T, \qquad S_w = \sum_{i=1}^{c} \sum_{x \in X_i} (x - m_i)(x - m_i)^T, \qquad S_t = S_b + S_w, \tag{1}$$

where $m_i$ is the centroid of the $i$th class and $m$ is the global centroid of the data set. The precursor matrices are defined as

$$H_b = \left[\sqrt{n_1}(m_1 - m), \ldots, \sqrt{n_c}(m_c - m)\right], \qquad H_w = \left[X_1 - m_1 e_{n_1}^T, \ldots, X_c - m_c e_{n_c}^T\right], \qquad H_t = \left[x_1 - m, \ldots, x_n - m\right], \tag{2}$$

where $S_b = H_b H_b^T$, $S_w = H_w H_w^T$, $S_t = H_t H_t^T$, $e_{n_i}$ is the $n_i$-dimensional vector of all ones, and $X_i$ is the data matrix that consists of the data points from class $i$.
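The scatter matrices above can be sketched directly in Python/NumPy; the function name `scatter_matrices` and the toy data are ours, for illustration only:

```python
import numpy as np

def scatter_matrices(X, y):
    """Compute the scatter matrices S_b, S_w, S_t of (1) for data X (n x d)."""
    m = X.mean(axis=0)                     # global centroid
    d = X.shape[1]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)               # centroid of class c
        diff = (mc - m)[:, None]
        S_b += len(Xc) * (diff @ diff.T)   # between-class scatter
        S_w += (Xc - mc).T @ (Xc - mc)     # within-class scatter
    return S_b, S_w, S_b + S_w             # S_t = S_b + S_w

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 6))
y = np.repeat([0, 1, 2], 4)
S_b, S_w, S_t = scatter_matrices(X, y)
```

A quick sanity check on any data set is that $S_t$ equals the scatter of the globally centered data, which is exactly the decomposition $S_t = S_b + S_w$.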

Classical LDA finds projection directions that make data points from different classes far from each other and data points from the same class close to each other. To be specific, LDA obtains the optimal projection vector $w$ by optimizing the following objective function:

$$\max_{w} \; J(w) = \frac{w^T S_b w}{w^T S_w w}. \tag{3}$$

The optimal projection direction can be achieved by solving the generalized eigenproblem $S_b w = \lambda S_w w$. In general, there are at most $c-1$ eigenvectors corresponding to nonzero generalized eigenvalues since the rank of the matrix $S_b$ is not bigger than $c-1$. When $S_w$ is singular, some methods including PCA plus LDA [1], LDA/GSVD [7], and LDA/QR [26] can be used to deal with this problem.
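When $S_w$ is nonsingular (many more samples than dimensions), the generalized eigenproblem of (3) can be solved directly; a sketch on synthetic data (the three Gaussian classes and their separations are made up for illustration):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
# three well-separated Gaussian classes in 5 dimensions (n >> d,
# so S_w is nonsingular and classical LDA applies directly)
means = np.array([[0, 0, 0, 0, 0], [4, 0, 0, 0, 0], [0, 4, 0, 0, 0]], float)
X = np.vstack([rng.normal(mu, 1.0, size=(20, 5)) for mu in means])
y = np.repeat([0, 1, 2], 20)

m = X.mean(0)
S_b = sum(20 * np.outer(X[y == k].mean(0) - m, X[y == k].mean(0) - m)
          for k in range(3))
S_w = sum((X[y == k] - X[y == k].mean(0)).T @ (X[y == k] - X[y == k].mean(0))
          for k in range(3))

# generalized eigenproblem S_b w = lambda S_w w (eigh returns ascending order)
evals, evecs = eigh(S_b, S_w)
W = evecs[:, ::-1][:, :2]          # at most c - 1 = 2 useful directions
```

Since $\operatorname{rank}(S_b) \le c - 1$, only two generalized eigenvalues are nonzero here, matching the statement above.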

#### 3. Optimization Criterion and Its Robust Implementation

In this section, we revisit an optimization criterion for linear discriminant analysis and analyze its properties in detail. Finally, we discuss its robust implementation.

Note that if the matrix $S_w$ is singular, the optimal function value of (3) will take positive infinity. Several variants of the model in (3) can be found in [27]. In fact, when the matrix $S_w$ is nonsingular, it is not difficult to verify that these variants of (3) are equivalent [27]. For convenience, we adopt the following optimization criterion to derive a stable and efficient algorithm for linear discriminant analysis:

$$\min_{w} \; J(w) = \frac{w^T S_w w}{w^T S_t w}. \tag{4}$$

The main aim of adopting (4) is based on the following reasons. First, the objective function is bounded in the general case, which avoids the case where the objective function takes infinity. Second, since the null space of $S_w$ plays an important role in some cases, especially in the small-sample-size problem, the optimization criterion (4) also provides convenience for analyzing the null space of $S_w$. In fact, it is straightforward to verify that (4) and (3) are equivalent under some conditions. Most importantly, (4) can produce more generalized eigenvalues than (3) since the rank of $S_t$ is not smaller than the rank of $S_w$. In addition, from the viewpoint of optimization, the objective function we optimize is usually bounded. Thus, (4) is preferred over (3) in some cases.

It is obvious that the optimal projection of (4) can be achieved by solving the generalized eigenproblem $S_w w = \lambda S_t w$ when the matrix $S_t$ is nonsingular. Later we will note that the generalized eigenvalue $\lambda$ takes values in the interval $[0, 1]$. Different from classical LDA, we extract the discriminant vectors composed of the first eigenvectors of the matrix pair $(S_w, S_t)$ corresponding to the smallest eigenvalues if $S_t$ is nonsingular. In such a case, we can avoid the singularity problem of the matrix $S_w$. Before giving an explicit implementation of the optimization criterion (4), we start by giving the definitions of some subspaces [28].
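The claim that the generalized eigenvalues of (4) lie in $[0, 1]$ follows from $S_t - S_w = S_b \succeq 0$, and is easy to confirm numerically; a small sketch (the data sizes are arbitrary, chosen so that $n > d$ and $S_t$ is nonsingular):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = np.repeat([0, 1, 2], 10)

m = X.mean(0)
S_w = sum((X[y == k] - X[y == k].mean(0)).T @ (X[y == k] - X[y == k].mean(0))
          for k in range(3))
S_t = (X - m).T @ (X - m)          # S_t = S_b + S_w, nonsingular since n > d

# criterion (4): minimize the quotient  ->  S_w w = lambda S_t w
evals, evecs = eigh(S_w, S_t)      # ascending: most discriminative first
```

Because eigh returns eigenvalues in ascending order, the leading columns of `evecs` are exactly the discriminant vectors with the smallest $\lambda$.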

*Definition 1. *Let $A$ be an $n \times n$ positive semidefinite matrix and $\lambda$ be an eigenvalue of $A$. The set of all eigenvectors of $A$ corresponding to the eigenvalue $\lambda$, together with the zero vector, forms a subspace. This subspace is referred to as the eigen-subspace of $A$ with $\lambda$.

*Definition 2. *The null space of the matrix $A$ is the eigen-subspace of $A$ with $\lambda = 0$.

*Definition 3. *The range space of the matrix $A$ is the subspace spanned by the eigenvectors of $A$ corresponding to nonzero eigenvalues.

In the case of a positive semidefinite matrix, the number of repeated roots of the characteristic equation determines the dimension of the eigen-subspace of $A$ with $\lambda$. If the dimension of the eigen-subspace of $A$ with $\lambda$ is bigger than 1, the eigenvalue $\lambda$ is degenerated since the number of repeated roots of the characteristic equation is bigger than 1. It is observed from (1) that the matrices $S_b$, $S_w$, and $S_t$ are positive semidefinite. According to the above definitions, we can obtain the following four subspaces from $S_b$ and $S_w$ [20]:

(a) The null space of $S_b$, denoted by null($S_b$).

(b) The null space of $S_w$, denoted by null($S_w$).

(c) The range space of $S_b$, denoted by span($S_b$).

(d) The range space of $S_w$, denoted by span($S_w$).

Based on these four subspaces, we can construct another four subspaces:

(e) Subspace 1, defined as the intersection of span($S_b$) and null($S_w$).

(f) Subspace 2, defined as the intersection of span($S_b$) and span($S_w$).

(g) Subspace 3, defined as the intersection of null($S_b$) and span($S_w$).

(h) Subspace 4, defined as the intersection of null($S_b$) and null($S_w$).

From Subspaces 1, 2, 3, and 4, we find that the objective function of (4) satisfies the following equation:

$$J(w) = \frac{w^T S_w w}{w^T S_t w} = \begin{cases} 0, & w \in \text{Subspace 1}, \\ \text{a value in } (0, 1), & w \in \text{Subspace 2}, \\ 1, & w \in \text{Subspace 3}, \\ 0/0 \ (\text{indefinite}), & w \in \text{Subspace 4}. \end{cases} \tag{5}$$

From (5), one can see that if $w$ is taken from Subspace 1, Subspace 2, or Subspace 3, the objective function is bounded. If $w$ belongs to Subspace 4, the objective function takes an indefinite value. It is of interest to note that the null space of $S_t$ is the intersection of the null space of $S_b$ and the null space of $S_w$. It has been proved that the null space of $S_t$ does not contain any discriminant information [29]. Thus, Subspace 4 does not contain any discriminant information, and this also shows that part of the null space of $S_w$ does not contain discriminant information. Therefore, Subspace 4 can be removed without losing any information, and this can be done by removing the null space of $S_t$. An effective method to remove the null space of $S_t$ is to perform the singular value decomposition (SVD) [28] on $H_t$, denoted by $H_t = U \Sigma V^T$, where $U_1$ consists of the left singular vectors corresponding to the nonzero singular values of $H_t$. In such a case, we do not lose any information of the data. By doing so, we also remove the part of the null space of $S_w$ that does not contain discriminant information. Since we focus on (4), the range space of $S_t$ must be considered. If the null space of $S_t$ is removed, it is necessary to consider three subspaces in the case of (4): the null space of $S_w$, the range space of $S_w$, and the range space of $S_b$. For these three subspaces, we also give their relations with Subspace 1, Subspace 2, and Subspace 3.
It is not difficult to verify that the intersection of the null space of $S_w$ and the range space of $S_t$ is equivalent to Subspace 1, and the intersection of the range space of $S_w$ and the range space of $S_t$ contains Subspaces 2 and 3. This shows that we do not lose any discriminant information from Subspace 1, Subspace 2, and Subspace 3 if we solve (4). In such a case, we first remove the null space of $S_t$. That is, we consider the following optimization function in the range space of $S_t$:

$$\min_{v} \; \frac{v^T \tilde{S}_w v}{v^T \tilde{S}_t v}, \tag{6}$$

where $\tilde{S}_w = U_1^T S_w U_1$ and $\tilde{S}_t = U_1^T S_t U_1$.
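The null-space removal step can be sketched numerically in a small-sample-size setting; the sizes and tolerance below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 10, 50                        # small-sample-size case: d >> n
X = rng.normal(size=(n, d))
H_t = (X - X.mean(0)).T              # precursor matrix: S_t = H_t H_t^T

# SVD of H_t; U1 spans the range space of S_t (dimension rank(S_t) = n - 1)
U, s, Vt = np.linalg.svd(H_t, full_matrices=False)
U1 = U[:, s > 1e-10]

S_t = H_t @ H_t.T
St_reduced = U1.T @ S_t @ U1         # as in (6): nonsingular after reduction
```

Note that the SVD is taken on the $d \times n$ precursor $H_t$ rather than on the $d \times d$ matrix $S_t$ itself, which is the key to efficiency when $d \gg n$.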

It is evident that $\tilde{S}_t$ in (6) is nonsingular when the null space of $S_t$ is removed. In such a case, we obtain the projection vectors composed of the eigenvectors of the matrix pair $(\tilde{S}_w, \tilde{S}_t)$ corresponding to its eigenvalues. From (6), we can see that the eigenvalue $\lambda$ takes values in the interval $[0, 1]$. In fact, the value of $\lambda$ gives an indicator for choosing the effective subspace. According to the definition of the optimization criterion, we have the following conclusions: the subspace corresponding to $\lambda = 0$ is the most important; the subspace corresponding to $0 < \lambda < 1$ is the second most important; the subspace corresponding to $\lambda = 1$ is the least important.

By solving the generalized eigenproblem $\tilde{S}_w v = \lambda \tilde{S}_t v$, we can obtain $r$ eigenvalues, where $r$ is the rank of $S_t$, which produces $r$ eigenvectors. In some cases some of these eigenvalues may be equal. In other words, some eigenvalues degenerate into the same eigenvalue, which may affect the performance of some algorithms. Assume that these eigenvalues consist of $s$ different values $\lambda_1 < \lambda_2 < \cdots < \lambda_s$ in an increasing order with multiplicities $r_1, \ldots, r_s$, where $r_k$ denotes the algebraic multiplicity of the eigenvalue $\lambda_k$ and $r_1 + \cdots + r_s = r$. In some situations, it is useful to work with the set of all eigenvectors associated with a specific value $\lambda_k$. Let us define the following set:

$$V_{\lambda_k} = \left\{ v \mid \tilde{S}_w v = \lambda_k \tilde{S}_t v \right\}. \tag{7}$$

The dimension of $V_{\lambda_k}$ is in general equal to the algebraic multiplicity of $\lambda_k$ since $\tilde{S}_w$ and $\tilde{S}_t$ are symmetric real matrices. The set $V_{\lambda_k}$ forms the eigen-subspace of the matrix pair $(\tilde{S}_w, \tilde{S}_t)$ corresponding to the generalized eigenvalue $\lambda_k$. When the dimension of $V_{\lambda_k}$ is equal to 1, it is not necessary to deal with this subspace since it only contains one eigenvector. However, when the dimension of $V_{\lambda_k}$ is bigger than 1, it is impossible to determine which eigenvector in this eigen-subspace is the most important since all the eigenvectors correspond to the same eigenvalue. This case often occurs in the small-sample-size problem, where the dimension of the eigen-subspace of $\lambda = 0$ is relatively high. In such a case, it is infeasible to determine which projection vector in the eigen-subspace of $\lambda = 0$ is the most important if we only use (7). For some nonzero generalized eigenvalues from the matrix pair $(\tilde{S}_w, \tilde{S}_t)$, the dimension of $V_{\lambda_k}$ may also be bigger than 1. For example, $\lambda_k = 1$ shows that the eigenvector is taken from the null space of $\tilde{S}_b$. Generally speaking, the dimension of this null space is bigger than 1, and this makes the dimension of $V_{\lambda_k}$ bigger than 1. So it is necessary to use an additional strategy to determine the importance of eigenvectors if the dimension of $V_{\lambda_k}$ is bigger than 1. For the subspace $V_{\lambda_k}$, we can obtain a matrix whose columns consist of the eigenvectors of the generalized eigenvalue $\lambda_k$, denoted by $B_k$. Obviously the dimension of $V_{\lambda_k}$ is equal to the number of columns of $B_k$. Given this matrix, it is straightforward to obtain an orthogonal basis by performing the QR decomposition on $B_k$, and the orthogonal basis can be expressed in the matrix form $Q_k$. Note that the space spanned by the column vectors of $Q_k$ is equivalent to the space spanned by the column vectors of $B_k$. Thus, in the space spanned by the column vectors of $Q_k$, we formulate the following objective function based on the maximum margin criterion:

$$\max_{z^T z = 1} \; z^T M_k z, \tag{8}$$

where $M_k = Q_k^T (\tilde{S}_b - \tilde{S}_w) Q_k$ and $\tilde{S}_b = U_1^T S_b U_1$.

When the dimension of the set $V_{\lambda_k}$ is 1, it is easy to prove that the discriminant vector is simply the single basis vector $Q_k$. When the dimension of the set $V_{\lambda_k}$ is bigger than 1, it is necessary to obtain the eigenvectors of $M_k$ corresponding to its eigenvalues in a decreasing order. These eigenvectors form the matrix $Z_k$. Thus, the discriminability of the eigenvectors in the eigen-subspace $V_{\lambda_k}$ can be measured by the eigenvalues of $M_k$. This gives suggestions on how to choose effective discriminant vectors in the eigen-subspace $V_{\lambda_k}$, which solves the degenerated case of eigenvalues. In classical LDA, the discriminability of eigenvectors in the eigen-subspace is sometimes neglected.
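The whole procedure — reduce by the range space of $S_t$, solve (6), group eigenvectors by repeated eigenvalues, and rank each degenerated group by the MMC of (8) — can be sketched compactly on synthetic small-sample-size data. Variable names mirror the notation above, but the data, tolerances, and grouping threshold are illustrative choices of ours:

```python
import numpy as np
from scipy.linalg import eigh, qr

rng = np.random.default_rng(4)
n, d, c = 12, 30, 2                          # small-sample-size, few classes
X = rng.normal(size=(n, d))
y = np.repeat([0, 1], 6)

m = X.mean(0)
S_b = sum((y == k).sum() * np.outer(X[y == k].mean(0) - m,
                                    X[y == k].mean(0) - m) for k in range(c))
S_w = sum((X[y == k] - X[y == k].mean(0)).T @ (X[y == k] - X[y == k].mean(0))
          for k in range(c))
S_t = S_b + S_w

# remove the null space of S_t
U, s, _ = np.linalg.svd(S_t)
U1 = U[:, s > 1e-8]
Sb_r, Sw_r, St_r = (U1.T @ M @ U1 for M in (S_b, S_w, S_t))

evals, evecs = eigh(Sw_r, St_r)              # criterion (6), ascending order
blocks = []
k = 0
while k < len(evals):
    idx = np.where(np.abs(evals - evals[k]) < 1e-6)[0]
    idx = idx[idx >= k]                      # eigenvectors sharing this value
    Q, _ = qr(evecs[:, idx], mode='economic')
    if len(idx) > 1:                         # degenerated eigenvalue:
        # rank the subspace by MMC, i.e. eigen-decompose Q^T(S_b - S_w)Q
        mmc_vals, Z = eigh(Q.T @ (Sb_r - Sw_r) @ Q)
        blocks.append(Q @ Z[:, ::-1])
    else:
        blocks.append(Q)
    k += len(idx)

G = U1 @ np.hstack(blocks)                   # final transformation matrix
```

With only $c = 2$ classes, the eigenvalue $\lambda = 1$ has high multiplicity, so the MMC branch is exercised; classical LDA would leave the order of those eigenvectors arbitrary.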

Note that, in the small-sample-size problem, the dimension of the eigen-subspace of $\lambda = 0$ is relatively high. In such a case, we need to obtain this eigen-subspace. In fact, the eigen-subspace of $\lambda = 0$ is the null space of $\tilde{S}_w$, and obtaining the null space may be time-consuming when its dimension is high. Fortunately, several effective methods have been proposed to obtain the null space of $S_w$. Çevikalp et al. [15] proposed an effective algorithm that avoids computing the null space of $S_w$ by finding the range space of $S_w$. Note that the dimension of the range space of $S_w$ is equal to the rank of the matrix $S_w$. Based on the range space of $S_w$, we can obtain common vectors for each class and construct the scatter matrix of the common vectors as done in [15]. Finally, the projection vectors can be obtained by performing the eigen-decomposition on the scatter matrix of the common vectors.
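The key observation behind the common-vector construction can be sketched as follows: projecting any class sample onto the orthogonal complement of span($S_w$) yields the same "common vector" for every sample of that class, so the null space of $S_w$ never has to be formed explicitly. A toy illustration (data random; the basis name `P` is ours):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, c = 9, 20, 3                  # small-sample-size: S_w is singular
X = rng.normal(size=(n, d))
y = np.repeat([0, 1, 2], 3)

# range space of S_w from the precursor H_w (no null-space computation)
H_w = np.hstack([(X[y == k] - X[y == k].mean(0)).T for k in range(c)])
U, s, _ = np.linalg.svd(H_w, full_matrices=False)
P = U[:, s > 1e-10]                 # orthonormal basis of span(S_w)

# common vector of a class: any sample minus its projection onto span(S_w)
common = X - (X @ P) @ P.T
```

Within-class differences lie entirely in span($S_w$), so the projection removes them and each class collapses to a single common vector, whose scatter matrix can then be eigen-decomposed as in [15].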

As a summary of the above discussion, we list the detailed steps for solving linear discriminant analysis in Algorithm 4.

*Algorithm 4. *A stable and efficient algorithm for solving linear discriminant analysis.

*Step 1*. Construct $H_b$, $H_w$, and $H_t$, and compute the left singular matrix of $H_t$ by performing the SVD $H_t = U \Sigma V^T$, where $U_1$ consists of the singular vectors corresponding to the nonzero singular values $\Sigma_1$ of $H_t$; obtain $\hat{H}_w = \Sigma_1^{-1} U_1^T H_w$.

*Step 2.* Obtain the range space of $\hat{H}_w$, denoted by $P$, whose column vectors are orthogonal; that is, perform the SVD $\hat{H}_w = P \Sigma_w Q^T$ and assign the singular values $\sigma_1 \le \sigma_2 \le \cdots$ in an increasing order from the diagonal elements of $\Sigma_w$.

*Step 3.* Let $C = I - P P^T$. If $C$ is not a zero matrix, perform Step 4; otherwise, go to Step 5.

*Step 4.* Based on $P$, obtain the common vectors of each class, compute the scatter matrix of the common vectors, and perform the eigen-decomposition on this scatter matrix to obtain the projection vectors corresponding to $\lambda = 0$, denoted by $W_0$.

*Step 5.* For each nonzero eigenvalue $\lambda_k = \sigma_k^2$, do the following.

*Step 5(a).* Obtain the singular submatrix $P_k$ by searching the column vectors of $P$ corresponding to the singular value $\sigma_k$; let $B_k = \Sigma_1^{-1} P_k$; apply the QR decomposition on $B_k$ to obtain the matrix $Q_k$ whose column vectors are orthogonal.

*Step 5(b).* Let $\tilde{S}_b = U_1^T S_b U_1$ and $\tilde{S}_w = U_1^T S_w U_1$; compute all discriminant vectors, which are the eigenvectors of $M_k = Q_k^T (\tilde{S}_b - \tilde{S}_w) Q_k$; sort the eigenvectors according to the eigenvalues of $M_k$ in a decreasing order to form $Z_k$, and let $W_k = Q_k Z_k$.

*Step 6.* Obtain the transformation matrix $G = U_1 [W_0, W_1, \ldots, W_s]$, where $W_0$ is empty when the null space of $\tilde{S}_w$ does not exist.

Note that, in Step 2 of Algorithm 4, we only need to obtain the range space of $\hat{H}_w$, that is, an orthonormal basis of span($\hat{H}_w$). There are some effective methods for obtaining this range space. For example, it can be achieved by finding the left singular vectors of $\hat{H}_w$ corresponding to the nonzero singular values. It is pointed out in [28] that computing the left singular vectors corresponding to the nonzero singular values is more efficient than finding the left singular vectors corresponding to all singular values including zeros. In addition, one may resort to difference subspaces and the Gram-Schmidt orthogonalization procedure [15] to obtain the range space. Note that, in Step 3 of Algorithm 4, we use a criterion to judge whether the null space of $\tilde{S}_w$ exists: if $I - P P^T$ is not a zero matrix, this shows that the null space of $\tilde{S}_w$ exists. In such a case, one may use the method (Step 4 of Algorithm 4) proposed in [15] to further deal with the null space of $\tilde{S}_w$. It is observed from Algorithm 4 that we need to perform Step 5 regardless of the existence of the null space of $\tilde{S}_w$. In such a case, we can see that the eigenvectors can be ordered in terms of their importance. By performing Algorithm 4, we can evaluate the projection vectors from Subspace 3, which is often neglected in the previous literature. It is obvious that the above method can provide $r$ discriminant vectors, since the rank of $S_t$ is $r$, which is much bigger than $c - 1$. As a result, this method may be helpful when the number of classes is relatively small. Note that we use the eigenvalue $\lambda_k$ in (7), and it is not difficult to verify that $\lambda_k = \sigma_k^2$. If the singular value $\sigma_k$ occurs only once in the diagonal elements of $\Sigma_w$, we do not need to perform Step 5(b) in real applications.

#### 4. Experimental Results

In our experiments, we use the ORL face database, the Yale face database, and microarray data sets to evaluate the performance of Algorithm 4. The ORL face database consists of 40 distinct persons, each with 10 different images exhibiting variations in pose, illumination, facial expression, and facial details. The original face images are resized to a uniform resolution with 256-level gray scales. The Yale face database contains 165 gray-scale images of 15 individuals. The images demonstrate variations in lighting conditions and facial expressions. All of these face images are aligned based on eye coordinates and resized to a uniform resolution. Six microarray data sets, including ALLMLL [30], Duke-Breast [31], Colon [32], Prostate [32], Leukemia [32], and MLL [32], are used to test the proposed method. Table 1 lists the statistics of the data sets we use. It is observed that the feature dimensions of the samples on these data sets are much higher than the numbers of samples. The experiments are performed on a PC with the Windows 8.1 operating system, an i3 CPU (3.30 GHz), and 8 GB of memory. The programming environment is MATLAB 2014a.