#### Abstract

Linear discriminant analysis has been widely studied in data mining and pattern recognition. However, when performing the eigen-decomposition on the matrix pair (within-class scatter matrix and between-class scatter matrix), one can find in some cases that there exist degenerated eigenvalues, which make the information from the eigen-subspace corresponding to a degenerated eigenvalue indistinguishable. In order to address this problem, we revisit linear discriminant analysis in this paper and propose a stable and effective algorithm for linear discriminant analysis in terms of an optimization criterion. By discussing the properties of the optimization criterion, we find that the eigenvectors in some eigen-subspaces may be indistinguishable if a degenerated eigenvalue occurs. Inspired by the idea of the maximum margin criterion (MMC), we embed MMC into the eigen-subspace corresponding to the degenerated eigenvalue to exploit the discriminability of the eigenvectors in that eigen-subspace. Since the proposed algorithm can deal with the degenerated case of eigenvalues, it not only handles the small-sample-size problem but also enables us to select projection vectors from the null space of the between-class scatter matrix. Extensive experiments on several face image and microarray data sets are conducted to evaluate the proposed algorithm in terms of classification performance, and the experimental results show that our method has smaller standard deviations than other methods in most cases.

#### 1. Introduction

Linear discriminant analysis (LDA) [1–4] plays an important role in data analysis and has been widely used in many fields such as data mining and pattern recognition [5]. The main aim of LDA is to find optimal projection vectors by simultaneously minimizing the within-class distance and maximizing the between-class distance in the projection space; the optimal projection vectors can be obtained by solving a generalized eigenvalue problem. In classical LDA, the within-class scatter matrix is generally required to be nonsingular. However, in many applications such as text classification and face recognition [6], the within-class scatter matrix is often singular since the dimension of the data is much larger than the number of data points. This is known as the small-sample-size (SSS) problem.

In the past several decades, various variants of LDA [7–10] have been proposed to handle high-dimensional data and the SSS problem. It is noted that most LDA-based methods can be divided into four categories according to which combination of the spaces of the within-class scatter and between-class scatter matrices they exploit [11].

The first category of these methods considers the range space of the within-class scatter matrix and the range space of the between-class scatter matrix. A typical algorithm of this category is the Fisherface [1] method, where PCA is first employed to reduce the dimension of features so that the within-class scatter matrix becomes full rank, and then standard LDA is performed. In the direct LDA method [12], the null space of the between-class scatter matrix is first removed and then the projection vectors are obtained by minimizing the within-class scatter distance in the range space of the between-class scatter matrix. Li et al. [13] proposed an efficient and stable algorithm to extract the discriminant vectors by defining the maximum margin criterion (MMC). The main difference between Fisher’s criterion and MMC is that the former maximizes the Fisher quotient while the latter maximizes the average margin.

The second category mainly depends on exploiting the null space of the within-class scatter matrix and the range space of the between-class scatter matrix. In terms of the null space-based LDA, Chen et al. [14] proposed to maximize the between-class scatter in the null space of the within-class scatter matrix; their method is referred to as the NLDA method. In order to reduce the computational cost of calculating the null space of the within-class scatter matrix, several effective methods have been proposed. Instead of directly obtaining the null space of the within-class scatter matrix, Çevikalp et al. [15] first obtained its range space and then defined the scatter matrix of common vectors. Based on this, the projection vectors were obtained from the scatter matrix they defined. They also adopted difference subspaces and the Gram-Schmidt orthogonalization procedure to obtain discriminative common vectors. Chu and Thye [16] adopted the QR factorization on several matrices to develop a new algorithm for the null space-based LDA method. Sharma and Paliwal [17] proposed an alternative null LDA method and discussed its fast implementation. Paliwal and Sharma [18] also developed a variant of pseudoinverse linear discriminant analysis that yields better classification performance.

The third category consists of those methods that make use of the null space of the within-class scatter matrix, the range space of the between-class scatter matrix, and the range space of the within-class scatter matrix. Sharma et al. [19] applied improved RLDA to devise a feature selection method to extract important genes. In order to address the problem of the regularization parameter in RLDA, Sharma and Paliwal [20] applied a deterministic method to estimate the parameter by maximizing a modified Fisher criterion.

The fourth category is made up of those methods that explore all the spaces of the within-class scatter matrix and the between-class scatter matrix. Sharma and Paliwal [11] applied a two-stage technique that regularizes both the between-class scatter and within-class scatter matrices to extract the discriminant information.

In addition, there are other variants of LDA that do not belong to the four categories mentioned above. Uncorrelated local Fisher discriminant analysis based on manifold learning has been devised for ear recognition [21]. An exponential locality preserving projection (ELPP) addresses the SSS problem by introducing the matrix exponential. A double shrinking model [22] has been constructed for manifold learning and feature selection. Li et al. [23] analyzed linear discriminant analysis in the worst case and reduced this problem to a scalable semidefinite feasibility problem. Zollanvari and Dougherty [24] discussed the asymptotic generalization bound of linear discriminant analysis. Lu and Renals [25] used probabilistic linear discriminant analysis to model acoustic data.

In this paper, we revisit the optimization criterion for linear discriminant analysis. We find that some generalized eigenvalues may be degenerated. In order to deal with the degeneration of eigenvalues, we develop a robust implementation of this criterion. To be specific, the null space of the total scatter matrix is first removed to remedy the singularity problem. Then the eigen-subspace corresponding to each specific eigenvalue is obtained. Finally, in each eigen-subspace, the discriminability of the eigenvectors is measured by the maximum margin criterion, and the projection vectors are obtained by optimizing this criterion. We also conduct extensive experiments to evaluate the proposed method on various well-known data sets, including face images and microarray data sets. Experimental results show that our method is more stable than other methods in most cases.

#### 2. Related Works

Assume that there is a set of $d$-dimensional data points, denoted by $X = \{x_1, x_2, \dots, x_n\}$, where $x_i \in \mathbb{R}^d$. When the labels of data points are available, each data point belongs to exactly one of $c$ object classes and the number of samples in class $i$ is $n_i$. Thus, $n = \sum_{i=1}^{c} n_i$ is the number of all data points. In classical linear discriminant analysis, the between-class scatter matrix, the within-class scatter matrix, and the total scatter matrix are defined as follows:

$$S_b = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T, \qquad S_w = \sum_{i=1}^{c} \sum_{x \in X_i} (x - m_i)(x - m_i)^T, \qquad S_t = S_b + S_w, \tag{1}$$

where $m_i$ is the centroid of the $i$th class and $m$ is the global centroid of the data set. The precursor matrices are defined as

$$S_b = H_b H_b^T, \qquad S_w = H_w H_w^T, \qquad S_t = H_t H_t^T, \tag{2}$$

where $H_b = [\sqrt{n_1}(m_1 - m), \dots, \sqrt{n_c}(m_c - m)]$, $H_w = [X_1 - m_1 e_{n_1}^T, \dots, X_c - m_c e_{n_c}^T]$, $H_t = [x_1 - m, \dots, x_n - m]$, $e_{n_i}$ is the all-ones vector of length $n_i$, and $X_i$ is the data matrix that consists of the data points from class $i$.

Classical LDA finds the projection direction by making data points from different classes far from each other and data points from the same class close to each other. To be specific, LDA obtains the optimal projection vector $w$ by optimizing the following objective function:

$$\max_{w} \; \frac{w^T S_b w}{w^T S_w w}. \tag{3}$$

The optimal projection direction can be achieved by solving the generalized eigenproblem $S_b w = \lambda S_w w$. In general, there are at most $c - 1$ eigenvectors corresponding to nonzero generalized eigenvalues since the rank of the matrix $S_b$ is not bigger than $c - 1$. When $S_w$ is singular, some methods including PCA plus LDA [1], LDA/GSVD [7], and LDA/QR [26] can be used to deal with this problem.
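To make the notation concrete, the following Python sketch builds the scatter matrices of (1) on synthetic data and solves the generalized eigenproblem of (3). The variable names are our own, and the sketch assumes $S_w$ is nonsingular (more samples than dimensions here), as required by classical LDA:

```python
import numpy as np
from scipy.linalg import eigh

def scatter_matrices(X, y):
    """Between-class, within-class, and total scatter of the rows of X."""
    m = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - m, mc - m)   # n_i (m_i - m)(m_i - m)^T
        Sw += (Xc - mc).T @ (Xc - mc)              # sum of (x - m_i)(x - m_i)^T
    return Sb, Sw, Sb + Sw

# Three synthetic classes; Sw is nonsingular (60 samples in 4 dimensions).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.repeat([0, 1, 2], 20)
X[y == 1] += 2.0
Sb, Sw, St = scatter_matrices(X, y)
evals, evecs = eigh(Sb, Sw)                 # solves Sb w = lambda Sw w
W = evecs[:, np.argsort(evals)[::-1][:2]]   # keep the c - 1 = 2 leading directions
```

In the SSS regime this direct route fails because $S_w$ is singular, which is exactly what the criterion studied below avoids.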

#### 3. Optimization Criterion and Its Robust Implementation

In this section, we revisit an optimization criterion for linear discriminant analysis and its properties are analyzed in detail. Finally, we discuss its robust implementation.

Note that if the matrix $S_w$ is singular, the optimal function value of (3) will take positive infinity. Several variants of the model in (3) can be found in [27]. In fact, when the matrix $S_w$ is nonsingular, it is not difficult to verify that these variants of (3) are equivalent [27]. For convenience, we adopt the following optimization criterion to give a stable and efficient algorithm for linear discriminant analysis:

$$\min_{w} \; \frac{w^T S_w w}{w^T S_t w}. \tag{4}$$

The main aim of adopting (4) is based on the following reasons. First, the objective function is bounded in the general case, which avoids the case where the objective function takes the infinity. Second, since the null space of $S_w$ plays an important role in some cases, especially in the small-sample-size problem, the optimization criterion of (4) also provides convenience for analyzing the null space of $S_w$. In fact, it is straightforward to verify that (4) and (3) are equivalent under some conditions. Most importantly, (4) can produce more generalized eigenvalues than (3) since the rank of $S_t$ is not smaller than the rank of $S_b$. In addition, from the viewpoint of optimization, the objective function we optimize is usually bounded. Thus, (4) is preferable to (3) in some cases.

It is obvious that the optimal projection of (4) can be achieved by solving the generalized eigenproblem $S_w w = \lambda S_t w$ when the matrix $S_t$ is nonsingular. Later we will note that the generalized eigenvalue $\lambda$ takes values in the interval $[0, 1]$. Different from classical LDA, we extract the discriminant vectors composed of the eigenvectors of the matrix pair $(S_w, S_t)$ corresponding to the smallest eigenvalues if $S_t$ is nonsingular. In such a case, we can avoid the singularity problem of the matrix $S_w$. Before giving an explicit implementation of the optimization criterion of (4), we start by giving the definitions of some subspaces [28].
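The bound on the eigenvalues is easy to check numerically. This hedged sketch (synthetic data, our own variable names) solves $S_w w = \lambda S_t w$ for a case where $S_t$ is nonsingular and confirms that every eigenvalue falls in $[0, 1]$:

```python
import numpy as np
from scipy.linalg import eigh

# Two toy classes; with more samples than dimensions, St is nonsingular.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = np.repeat([0, 1], 25)
X[y == 1] += 1.5
m = X.mean(axis=0)
Sb = np.zeros((3, 3))
Sw = np.zeros((3, 3))
for c in (0, 1):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sb += len(Xc) * np.outer(mc - m, mc - m)
    Sw += (Xc - mc).T @ (Xc - mc)
St = Sb + Sw
lams, V = eigh(Sw, St)  # solves Sw v = lam St v; lams returned in increasing order
# Under criterion (4), small eigenvalues mark the discriminative directions.
```

The bound holds because $0 \preceq S_w \preceq S_t$, so the generalized Rayleigh quotient of the pair $(S_w, S_t)$ cannot leave $[0, 1]$.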

*Definition 1. *Let $A$ be an $n \times n$ positive semidefinite matrix and $\lambda$ be an eigenvalue of $A$. The set of all eigenvectors of $A$ corresponding to the eigenvalue $\lambda$, together with the zero vector, forms a subspace. This subspace is referred to as the eigen-subspace of $A$ with $\lambda$.

*Definition 2. *The null space of the matrix $A$ is the set of all eigenvectors of $A$ with $\lambda = 0$, together with the zero vector.

*Definition 3. *The range space of the matrix $A$ is the subspace spanned by all eigenvectors of $A$ corresponding to nonzero eigenvalues.

In the case of a positive semidefinite matrix $A$, the number of repeated roots of the characteristic equation determines the dimension of the eigen-subspace of $A$ with $\lambda$. If the dimension of the eigen-subspace of $A$ with $\lambda$ is bigger than 1, the eigenvalue $\lambda$ is degenerated since the number of repeated roots of the characteristic equation is bigger than 1. It is observed from (1) that the matrices $S_b$, $S_w$, and $S_t$ are positive semidefinite. According to the above definitions, we can obtain the following four subspaces from $S_b$ and $S_w$ [20]:

(a) The null space of $S_b$, denoted by null($S_b$).
(b) The null space of $S_w$, denoted by null($S_w$).
(c) The range space of $S_b$, denoted by span($S_b$).
(d) The range space of $S_w$, denoted by span($S_w$).

Based on these four subspaces, we can construct another four subspaces:

(e) Subspace 1 is defined as the intersection of span($S_b$) and null($S_w$).
(f) Subspace 2 is defined as the intersection of span($S_b$) and span($S_w$).
(g) Subspace 3 is defined as the intersection of null($S_b$) and span($S_w$).
(h) Subspace 4 is defined as the intersection of null($S_b$) and null($S_w$).

From Subspaces 1, 2, 3, and 4, we find that the objective function of (4) satisfies the following equation:

$$\frac{w^T S_w w}{w^T S_t w} = \begin{cases} 0, & w \in \text{Subspace 1}, \\ \in (0, 1), & w \in \text{Subspace 2}, \\ 1, & w \in \text{Subspace 3}, \\ 0/0, & w \in \text{Subspace 4}. \end{cases} \tag{5}$$

From (5), one can see that if $w$ is taken from Subspace 1, Subspace 2, or Subspace 3, the objective function is bounded. If $w$ belongs to Subspace 4, the objective function takes an indefinite value. It is of interest to note that the null space of $S_t$ is the intersection of the null space of $S_b$ and the null space of $S_w$. It has been proved that the null space of $S_t$ does not contain any discriminant information [29]. Thus, Subspace 4 does not contain any discriminant information, and this also shows that part of the null space of $S_w$ does not contain discriminant information. Therefore, Subspace 4 can be removed without losing any information, and this can be done by removing the null space of $S_t$. An effective method to remove the null space of $S_t$ is to perform the singular value decomposition (SVD) [28] on $H_t$, denoted by $H_t = U \Sigma V^T$, where $U_1$ consists of the left singular vectors corresponding to the nonzero singular values of $H_t$. In such a case, we do not lose any information of the data. By doing so, we also remove the part of the null space of $S_w$ that does not contain discriminant information. Since we focus on (4), the range space of $S_t$ must be considered. If the null space of $S_t$ is removed, it is necessary to consider three subspaces in the case of (4): the null space of $S_w$, the range space of $S_w$, and the range space of $S_t$. For these three subspaces, we also give their relations with Subspace 1, Subspace 2, and Subspace 3.
It is not difficult to verify that the intersection of the null space of $S_w$ and the range space of $S_t$ is equivalent to Subspace 1, and the intersection of the range space of $S_w$ and the range space of $S_t$ contains Subspaces 2 and 3. This shows that we do not lose any discriminant information from Subspace 1, Subspace 2, or Subspace 3 if we solve (4). In such a case, we first remove the null space of $S_t$. That is, we consider the following optimization function in the range space of $S_t$:

$$\min_{v} \; \frac{v^T \tilde{S}_w v}{v^T \tilde{S}_t v}, \tag{6}$$

where $\tilde{S}_w = U_1^T S_w U_1$ and $\tilde{S}_t = U_1^T S_t U_1$.
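A minimal sketch of this null-space removal, assuming $H_t$ is built from the centered data points (variable names are our own):

```python
import numpy as np

# Toy small-sample-size setting: dimension (20) exceeds sample count (8).
rng = np.random.default_rng(2)
X = rng.normal(size=(8, 20))
m = X.mean(axis=0)
Ht = (X - m).T                      # d x n precursor matrix, St = Ht Ht^T
U, s, _ = np.linalg.svd(Ht, full_matrices=False)
U1 = U[:, s > 1e-10]                # left singular vectors of nonzero singular values
St_r = U1.T @ (Ht @ Ht.T) @ U1      # reduced total scatter: now nonsingular
```

After this projection the reduced total scatter $\tilde{S}_t$ is full rank, so the generalized eigenproblem of (6) is well posed.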

It is evident that $\tilde{S}_t$ in (6) is nonsingular when the null space of $S_t$ is removed. In such a case, we obtain the projection vectors composed of the eigenvectors of the matrix pair $(\tilde{S}_w, \tilde{S}_t)$ corresponding to the eigenvalues $\lambda$. From (6), we can see that $\lambda$ takes values in the interval $[0, 1]$. In fact, the value of $\lambda$ gives an indicator for choosing the effective subspace. According to the definition of the optimization criterion, we have the following conclusions: the subspace corresponding to $\lambda = 0$ is the most important; the subspace corresponding to $0 < \lambda < 1$ is the second most important; the subspace corresponding to $\lambda = 1$ is the least important.

By solving the generalized eigenproblem $\tilde{S}_w v = \lambda \tilde{S}_t v$, we can obtain $r$ eigenvalues, where $r$ is the rank of $S_t$, which produce $r$ eigenvectors. In some cases some of these eigenvalues may be equal. In other words, some eigenvalues degenerate into the same eigenvalue, which may affect the performance of some algorithms. Assume that these eigenvalues consist of $s$ different values $\lambda_1 < \lambda_2 < \dots < \lambda_s$ in an increasing order and have multiplicities $m_1, \dots, m_s$, where $m_i$ denotes the algebraic multiplicity of the eigenvalue $\lambda_i$ and $\sum_{i=1}^{s} m_i = r$. In some situations, it is useful to work with the set of all eigenvectors associated with a specific value $\lambda_i$. Let us define the following set:

$$V_i = \{\, v \mid \tilde{S}_w v = \lambda_i \tilde{S}_t v \,\}. \tag{7}$$

The dimension of $V_i$ is in general equal to the algebraic multiplicity of $\lambda_i$ since $\tilde{S}_w$ and $\tilde{S}_t$ are symmetric real matrices. The set $V_i$ forms the eigen-subspace of the matrix pair $(\tilde{S}_w, \tilde{S}_t)$ corresponding to the generalized eigenvalue $\lambda_i$. When the dimension of $V_i$ is equal to 1, it is not necessary to deal with this subspace since it only contains one eigenvector. However, when the dimension of $V_i$ is bigger than 1, it is impossible to determine which eigenvector in this eigen-subspace is the most important since all the eigenvectors correspond to the same eigenvalue. This case often occurs in the small-sample-size problem, where the dimension of the eigen-subspace of $\lambda = 0$ is relatively high. In such a case, it is infeasible to determine which projection vector in the eigen-subspace of $\lambda = 0$ is the most important if we only use (7). For some nonzero generalized eigenvalues from the matrix pair $(\tilde{S}_w, \tilde{S}_t)$, the dimension of $V_i$ may also be bigger than 1. For example, $\lambda_i = 0$ shows that the eigenvector is taken from the null space of $\tilde{S}_w$. Generally speaking, the dimension of the null space is bigger than 1, and this makes the dimension of $V_i$ bigger than 1. So it is necessary to use an additional strategy to determine the importance of eigenvectors if the dimension of $V_i$ is bigger than 1. For the subspace $V_i$, we can obtain a matrix whose columns consist of the eigenvectors of the generalized eigenvalue $\lambda_i$, denoted by $G_i$. Obviously the dimension of $V_i$ is equal to the number of the columns of $G_i$. If this matrix is provided, it is straightforward to obtain an orthogonal basis by performing the QR decomposition on $G_i$, and the orthogonal basis can be expressed in the matrix form $G_i = Q_i R_i$. Note that the space spanned by the column vectors of $Q_i$ is equivalent to the space spanned by the column vectors of $G_i$. Thus, in the space spanned by the column vectors of $Q_i$, we formulate the following objective function based on the maximum margin criterion:

$$\max_{v^T v = 1} \; v^T Q_i^T (\tilde{S}_b - \tilde{S}_w) Q_i v, \tag{8}$$

where $\tilde{S}_b = U_1^T S_b U_1$.

When the dimension of the set $V_i$ is 1, it is easy to prove that the solution of (8) is simply the normalized eigenvector itself. When the dimension of the set $V_i$ is bigger than 1, it is necessary to obtain the eigenvectors of $Q_i^T (\tilde{S}_b - \tilde{S}_w) Q_i$ corresponding to its eigenvalues in a decreasing order. These eigenvectors form the matrix $P_i$. Thus, the discriminability of eigenvectors in the eigen-subspace of $\lambda_i$ can be measured by the eigenvalues of $Q_i^T (\tilde{S}_b - \tilde{S}_w) Q_i$. This gives suggestions on how to choose effective discriminant vectors in the eigen-subspace $V_i$, which solves the degenerated case of eigenvalues. In classical LDA, the discriminability of eigenvectors in the eigen-subspace is sometimes neglected.
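The tie-breaking step can be sketched as follows; `rank_within_subspace` is our own illustrative name, and the scatter matrices are random positive semidefinite stand-ins for $\tilde{S}_b$ and $\tilde{S}_w$:

```python
import numpy as np

def rank_within_subspace(G, Sb, Sw):
    """Order the columns of G (eigenvectors sharing one generalized
    eigenvalue) by the maximum margin criterion v^T (Sb - Sw) v."""
    Q, _ = np.linalg.qr(G)            # orthonormal basis of span(G)
    M = Q.T @ (Sb - Sw) @ Q           # MMC restricted to the subspace
    mu, P = np.linalg.eigh(M)
    order = np.argsort(mu)[::-1]      # decreasing margin
    return Q @ P[:, order], mu[order]

# Hypothetical 3-dimensional degenerated subspace inside a 5-dim space.
rng = np.random.default_rng(3)
Sb = rng.normal(size=(5, 5)); Sb = Sb @ Sb.T
Sw = rng.normal(size=(5, 5)); Sw = Sw @ Sw.T
G = rng.normal(size=(5, 3))
V, margins = rank_within_subspace(G, Sb, Sw)
```

The returned columns span exactly the same subspace as `G`, only reordered and orthonormalized, so the generalized eigenvalue they share is unchanged.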

Note that, in the small-sample-size problem, the dimension of the eigen-subspace of $\lambda = 0$ is relatively high. In such a case, we need to obtain this eigen-subspace. In fact, this eigen-subspace is the null space of $\tilde{S}_w$, and obtaining the null space may be time consuming when its dimension is high. Fortunately, several effective methods have been proposed to obtain the null space of $\tilde{S}_w$. Çevikalp et al. [15] proposed an effective algorithm that avoids computing the null space of $S_w$ by finding the range space of $S_w$. Note that the dimension of the range space of $S_w$ is equal to the rank of the matrix $S_w$. Based on the range space of $S_w$, we can obtain common vectors for each class and construct the scatter matrix of the common vectors as done in [15]. Finally, the projection vectors can be obtained by performing the eigen-decomposition on the scatter matrix of the common vectors.

As a summary of the above discussion, we list the detailed steps for solving linear discriminant analysis in Algorithm 4.

*Algorithm 4. *A stable and efficient algorithm for solving linear discriminant analysis.

*Step 1.* Construct $H_b$, $H_w$, and $H_t$, and compute the left singular matrix $U_1$ of $H_t$ by performing the SVD on $H_t$, where $U_1$ consists of the singular vectors corresponding to the nonzero singular values of $H_t$; obtain $\tilde{H}_w = U_1^T H_w$.

*Step 2.* Obtain the range space of $\tilde{S}_w$, denoted by $Q_w$, whose column vectors are orthogonal; solve the generalized eigenproblem of the pair $(\tilde{S}_w, \tilde{S}_t)$ via the SVD and assign the eigenvalues $\lambda_i$ in an increasing order from the diagonal elements of the singular value matrix.

*Step 3.* Let $Z = I - Q_w Q_w^T$. If $Z$ is not a zero matrix, perform Step 4; otherwise, go to Step 5.

*Step 4.* Based on the range space of $\tilde{S}_w$, obtain the common vectors of each class, compute the scatter matrix of the common vectors, and perform the eigen-decomposition on the scatter matrix of the common vectors to obtain the projection vectors, denoted by $W_0$.

*Step 5.* For each nonzero $\lambda_i$, do the following.

*Step 5(a).* Obtain the singular submatrix $G_i$ by searching the column vectors corresponding to the singular value $\lambda_i$; apply the QR decomposition on $G_i$ to obtain the matrix $Q_i$ whose column vectors are orthogonal.

*Step 5(b).* Let $\tilde{S}_b = U_1^T S_b U_1$ and $A_i = Q_i^T (\tilde{S}_b - \tilde{S}_w) Q_i$; compute all the discriminant vectors, which are the eigenvectors of $A_i$; sort the eigenvectors according to the eigenvalues of $A_i$ in a decreasing order and form the matrix $W_i$.

*Step 6.* Obtain the transformation matrix $W$ by collecting the projection vectors obtained in Steps 4 and 5.

Note that, in Step 2 of Algorithm 4, we only need to obtain the range space of $\tilde{S}_w$, that is, an orthonormal basis of span($\tilde{S}_w$). There are some effective methods for obtaining this range space. For example, it can be achieved by finding the left singular vectors of $\tilde{H}_w$ corresponding to the nonzero singular values. It is pointed out in [28] that computing the left singular vectors corresponding to nonzero singular values is more efficient than finding the left singular vectors corresponding to all singular values including zeros. In addition, one may resort to difference subspaces and the Gram-Schmidt orthogonalization procedure [15] to obtain the range space of $\tilde{S}_w$. Note that, in Step 3 of Algorithm 4, we use a criterion to judge whether the null space of $\tilde{S}_w$ exists. If the tested matrix is not a zero matrix, this shows that the null space of $\tilde{S}_w$ exists. In such a case, one may use the method (Step 4 of Algorithm 4) proposed in [15] to further deal with the null space of $\tilde{S}_w$. It is observed from Algorithm 4 that we need to perform Step 5 regardless of the existence of the null space of $\tilde{S}_w$. In such a case, we can see that the eigenvectors can be ordered in terms of their importance. By performing Algorithm 4, we can evaluate the projection vectors from Subspace 3, which is often neglected in the previous literature. It is obvious that the above method can provide $r$ discriminant vectors, where $r$ is the rank of $S_t$, which is much bigger than $c - 1$. As a result, this method may be helpful when the number of classes is relatively small. Note that we use the eigenvalue $\lambda_i$ in (7) and it is not difficult to verify that $0 \le \lambda_i \le 1$. If the singular value $\lambda_i$ occurs only once in the diagonal elements of the singular value matrix, we do not need to perform Step 5(b) in real applications.
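Putting the pieces together, the overall procedure might be sketched as follows. This is a simplified reading of Algorithm 4 with all names our own: it omits the common-vector treatment of the null space and instead handles the $\lambda = 0$ block with the same MMC tie-breaking used for the other repeated eigenvalues:

```python
import numpy as np
from scipy.linalg import eigh

def robust_lda(X, y, tol=1e-10):
    """Sketch: drop null(St), solve Sw v = lam St v in the reduced space,
    and order the vectors inside each repeated-eigenvalue subspace by the
    maximum margin criterion."""
    m = X.mean(axis=0)
    Ht = (X - m).T
    U, s, _ = np.linalg.svd(Ht, full_matrices=False)
    U1 = U[:, s > tol]                         # Step 1: remove null(St)
    Z = X @ U1                                 # data in the range space of St
    k = U1.shape[1]
    Sb = np.zeros((k, k))
    Sw = np.zeros((k, k))
    mz = Z.mean(axis=0)
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sb += len(Zc) * np.outer(mc - mz, mc - mz)
        Sw += (Zc - mc).T @ (Zc - mc)
    lams, V = eigh(Sw, Sb + Sw)                # Step 2: lam in [0, 1], ascending
    cols, i = [], 0
    while i < len(lams):                       # group equal eigenvalues
        j = i
        while j < len(lams) and abs(lams[j] - lams[i]) < 1e-8:
            j += 1
        G = V[:, i:j]
        if j - i > 1:                          # Step 5: MMC inside the subspace
            Q, _ = np.linalg.qr(G)
            mu, P = np.linalg.eigh(Q.T @ (Sb - Sw) @ Q)
            G = Q @ P[:, np.argsort(mu)[::-1]]
        cols.append(G)
        i = j
    return U1 @ np.hstack(cols)                # Step 6: map back to input space

# Small-sample-size toy data: 12 samples in 20 dimensions, 3 classes.
rng = np.random.default_rng(7)
X = rng.normal(size=(12, 20))
y = np.repeat([0, 1, 2], 4)
W = robust_lda(X, y)
```

Note that the returned matrix has rank($S_t$) columns, more than the $c - 1$ directions classical LDA can provide.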

#### 4. Experimental Results

In our experiments, we use the ORL face database, the Yale face database, and microarray data sets to evaluate the performance of Algorithm 4. The ORL face database consists of 40 distinct persons, each with 10 different images exhibiting variations in pose, illumination, facial expression, and facial details. The original face images are resized, with 256-level gray scales. The Yale face database contains 165 gray-scale images of 15 individuals. The images demonstrate variations in lighting conditions and facial expressions. All of these face images are aligned based on eye coordinates and are resized. Six microarray data sets, including ALLMLL [30], Duke-Breast [31], Colon [32], Prostate [32], Leukemia [32], and MLL [32], are used to test the proposed method. Table 1 lists the statistics of the data sets we use. It is observed that the feature dimensions of the samples on these data sets are much higher than the numbers of samples. The experiments are performed on a PC with the Windows 8.1 operating system, an i3 CPU (3.30 GHz), and 8 GB of memory. The programming environment is MATLAB 2014a.

##### 4.1. Face Recognition

In this set of experiments, the number of training images of each individual is varied, starting from 2, and the remaining images in the data set are used to form the testing set. To reduce the variation of the accuracies from randomness, the classification performance we report is averaged over twenty runs. That is, twenty different training and testing sets are used for evaluating the classification performance. We compare the proposed method with some previous methods, including LDA/GSVD [7], LDA/QR [26], DLDA [12], PCA+LDA [1], MMC [13], and the discriminant common vector (DCV) approach [15], which is an effective approach for solving NLDA [14]. Note that these methods are designed to solve the small-sample-size problem when linear discriminant analysis is used. Only some of the subspaces in (5) are considered in LDA/QR, DLDA, PCA+LDA, and DCV. Although LDA/GSVD makes use of three subspaces (Subspaces 1, 2, and 3), the importance of the projection vectors in some eigen-subspaces is not effectively measured in some cases. In this paper we do not compare with other discriminant methods since the main objective of this paper is to provide a stable and efficient algorithm for handling the degenerated eigenvalues of LDA. Note that we do not report the running time of the algorithms we test since some methods only make use of part of the subspaces in (5). Generally speaking, the performance of each algorithm varies with the dimension of features. For fair comparison, we search over all feature dimensions and report the best performance.

Figure 1 shows the error rate of each method we test with different numbers of training images per class on the ORL and Yale face databases. For clarity, we also report in Table 2 the mean and standard deviation (in parentheses) of the error rate of each method. Note that the best performance in each line is highlighted in bold and we show the results for 2, 4, 6, and 8 training images per class.

**(a) ORL**

**(b) Yale**

From Figure 1 and Table 2, one can see that the error rate of each algorithm decreases as the number of the training samples in each class increases in most cases. It is observed from Table 2 that the standard deviation of our method is smaller than that of the other methods in most cases. On the ORL face database, the error rate of our method decreases from 16.32% with 2 training samples per class to 1% with 9 training samples per class, while the error rates of DLDA, PCA+LDA, MMC, DCV, LDA/QR, and LDA/GSVD decrease from 36.73%, 29.82%, 18.04%, 16.43%, 21.62%, and 19.92% with 2 training samples per class to 1.75%, 1.25%, 1.625%, 1.125%, 2.75%, and 3% with 9 training samples per class, respectively. The results show that our method outperforms other methods in most cases. On the Yale face database, although the DCV method gives the best result in the case of 2 training samples per class, it obtains the biggest standard deviation. It is also observed that our method is superior to other methods in terms of the classification performance with the increase of training samples.

Since the number of features extracted by the proposed method is not limited by the number of classes but only by the rank of $S_t$, we can project the samples onto a space whose dimension is greater than the number of classes. Figure 2 shows a plot of the error rate versus dimensionality. The numbers in the parentheses denote the optimal dimension corresponding to the best classification performance. As can be seen from Figure 2, the error rate of the proposed method decreases with the increase of training samples per class. It is also found from Figure 2 that the classification performance may be improved when the dimension of the reduced space is bigger than the number of classes. On the Yale face database, it is observed that the error rate of the proposed method first decreases and then rises with the increase of dimensions, which shows that choosing too many features yields the overfitting phenomenon in the classification task. On the ORL face database, the error rate of the proposed method first decreases drastically and then becomes flat when the number of training samples is bigger than 2. It is found that the best performance of our method is achieved when the number of extracted features is much bigger than the number of classes. In short, these experimental results show that Subspace 3 in (5), which is often neglected in classical LDA, may play a role in face recognition in some cases.

**(a) ORL**

**(b) Yale**

Now let us explain why our method can achieve good classification performance. The DLDA and LDA/QR methods first remove the null space of $S_b$. However, removing the null space of $S_b$ also discards part of the null space of $S_w$ and may result in the loss of important information in the null space of $S_w$. The PCA+LDA method does not consider the null space of $S_w$. It has been proved that the null space of $S_w$ plays an important role in the SSS problem [14]. The DCV method does not make use of Subspace 2 in (5), and this subspace may be helpful in obtaining discriminant vectors in the SSS problem. Although the LDA/GSVD method considers three subspaces, the discriminability within each eigen-subspace is not analyzed. In the MMC method, discriminant vectors lying in different subspaces of (5) may attain the same objective function value. This results in the difficulty of determining which discriminant vector is the most important. In fact, Subspace 3 in (5) is often neglected in LDA-based methods in the previous literature. We give a strategy to measure the importance of each discriminant vector in all subspaces, including Subspace 3, for the first time. As can be seen from Figure 2, Subspace 3 also plays a role in face recognition. As a result, the proposed method can achieve better classification performance than other methods in the general case.

In the following experiments, we study the effect of image sizes on the classification performance on the two face databases. Since the number of face images in these two face databases is relatively small, the leave-one-out protocol is adopted, which takes one image for testing and the remaining images for training. By reducing the image resolution, we obtain downsampled images in which each pixel value is the average value of a subimage of the original image. Similarly, we can obtain images at a still lower resolution. In such cases, the null space of the within-class scatter matrix exists. Table 3 shows the experimental results of each method at three resolutions on the two face databases.

As can be seen from Table 3, the error rate of each method does not always increase with the reduction of image resolution. On the ORL face database, the DCV method obtains the best classification result at one of the resolutions. With the reduction of image resolution, the performance of NLDA becomes worse since the dimension of the null space of $S_w$ becomes smaller. On the ORL face database, the proposed method is better than LDA/GSVD and has a smaller standard deviation than the other methods in most cases. The main reason is that we consider the degenerated case of the eigenvalue. It is noted that our method achieves its best classification result at a reduced resolution. On the Yale face database, the proposed method outperforms the other methods in terms of the classification performance. It is also observed that the best recognition rate among all methods is 92.13%, achieved by the proposed method at a reduced resolution on the Yale face database. From these experiments, we can also notice that it is not necessary to use large-size images to obtain good classification performance.

##### 4.2. Applications to Microarray Data Sets

In this set of experiments, we further validate the proposed method on microarray data sets. In order to evaluate the classification performance of the various LDA methods, we adopt tenfold cross validation on these data sets. In other words, we divide each data set into ten subsets of approximately equal size. Then we perform training and testing ten times, each time using nine of the subsets for training and the held-out subset for testing. The classification performance is averaged over the ten runs. Table 4 shows the mean and the standard deviation of the error rate of each method.
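The tenfold split described above can be sketched as follows (a minimal illustration with our own function name; fold sizes differ by at most one):

```python
import numpy as np

def tenfold_indices(n, seed=0):
    """Shuffle n sample indices and split them into 10 roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, 10)

# Each round trains on nine folds and tests on the remaining fold.
folds = tenfold_indices(103)
```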

As can be seen from Table 4, the classification performance of the proposed method is consistently superior to that of the other methods on all the data sets we tested. It is found that our method is more stable than the other methods since its standard deviation is smaller on all of the data sets we tested. It is noted that PCA+LDA performs poorly on the Leukemia and MLL data sets. This may come from the fact that the null space of the within-class scatter matrix is removed, although it plays an important role in obtaining discriminant feature vectors. It is also found that DLDA does not give satisfactory results on the Duke-Breast and Colon data sets since DLDA may remove part of the null space of the within-class scatter matrix. One can see from Table 4 that the NLDA method achieves good classification accuracies on these data sets since they are small-sample-size sets. One can also observe that the LDA/QR method does not perform well on some data sets. This may be explained by the fact that the LDA/QR method may remove part of the range space of $S_w$ and part of the null space of $S_w$. It is found that LDA/GSVD is not better than our method although LDA/GSVD considers three subspaces. This is possibly because LDA/GSVD does not measure the discriminability within each eigen-subspace. Because discriminant vectors in different subspaces of (5) may correspond to the same objective function value in the MMC method, this may lead to degradation in MMC. Overall, the proposed method is very stable on these data sets due to the fact that we consider the degenerated eigenvalues of the scatter matrices, especially for Subspace 3, which is neglected in the previous literature.

#### 5. Conclusions

In this paper, we revisit linear discriminant analysis based on an optimization criterion. Different from the existing LDA-based algorithms, the new algorithm adopts the spirit of the maximum margin criterion (MMC) and applies MMC to an eigen-subspace when its eigenvalue is degenerated. The new implementation avoids the singularity problem in the SSS problem and provides more than $c - 1$ discriminant vectors. We also conduct a series of comparative studies on face images and microarray data sets to evaluate the proposed method. Our experiments demonstrate that the classification performance achieved by our method is better than that of other LDA-based algorithms in most cases and that the proposed method is an effective and stable linear discriminant method for dealing with high-dimensional data.

#### Competing Interests

The authors declare that they have no competing interests.

#### Acknowledgments

This work was partially supported by the Fundamental Research Funds for the Central Universities (2015XKMS084).