Abstract

Local graph based discriminant analysis (DA) algorithms have recently attracted increasing attention as a way to mitigate the limitations of global (graph) DA algorithms. However, several important issues have received little particular attention: is the local construction better than the global one for the intraclass and interclass graphs; which (intraclass or interclass) graph should be constructed locally or globally; and, further, how should they be effectively combined for good discriminant performance? In this paper, continuing our previous studies on graph construction and DA, we first address these issues and then, by jointly exploiting both globality and locality, develop a Globally marginal and Locally compact Discriminant Analysis (GmLcDA) algorithm based on the so-introduced global interclass and local intraclass graphs and a Locally marginal and Globally compact Discriminant Analysis (LmGcDA) based on the so-introduced local interclass and global intraclass graphs; the purpose is not to show how novel the algorithms are but to illustrate the theoretical analyses. Further, by comprehensively comparing Locally marginal and Locally compact DA (LmLcDA), which is based on locality alone, Globally marginal and Globally compact Discriminant Analysis (GmGcDA), which is based on globality alone, GmLcDA, and LmGcDA, we suggest that combining a locally constructed intraclass graph with a globally constructed interclass graph is more discriminant.

1. Introduction

Discriminant analysis (DA) techniques [1] are indispensable in many fields, including machine learning, pattern recognition, data compression, scientific visualization, and neural computation. Multiple discriminant analysis (MDA) [2-4] is one of the most popular global DA methods. However, because it constructs both the intraclass and interclass graphs globally, it generally fails to capture underlying local structures in data, for example, the many low-dimensional local manifolds on which samples reside in the original input space. To mitigate such limitations, plenty of local graph based DA algorithms have been proposed as powerful tools, typically including marginal Fisher analysis (MFA) [5] and its variants [6], locality sensitive discriminant analysis (LSDA) [7], LDE [8], and ANMM [9-15]. These algorithms construct both the intraclass and interclass graphs locally. However, is the local construction better than the global one for the intraclass and interclass graphs? Subsequently, some globally maximizing and locally minimizing DR algorithms were proposed [16-18]. By contrast, no locally marginal and globally compact DA algorithm has been studied. Several issues thus need to be addressed: which (intraclass or interclass) graph should be constructed locally or globally? Further, how should they be effectively combined for good discriminant performance? To date, to our knowledge, these issues have received little particular attention. So, continuing our previous studies on graph construction and DA [19-21], in this paper we elaborately address the issues above. Concretely, we first illustrate the meanings of local compactness, global compactness, local margin, and global margin in DA, as shown in Figure 1; second, we formulate the globally constructed intraclass and interclass graphs; third, resorting to the relation between the scatter and the structure preservation property of graph-construction-based DA, by Propositions 1 and 3 and Corollaries 2 and 4 we demonstrate that the interclass graph should be constructed globally and the intraclass graph locally; finally, by jointly exploiting both globality and locality, we develop two DA algorithms: one is the Globally marginal and Locally compact Discriminant Analysis (GmLcDA) algorithm based on the so-introduced global interclass and local intraclass graphs, and the other is the Locally marginal and Globally compact Discriminant Analysis (LmGcDA) based on the so-introduced local interclass and global intraclass graphs. It is worth pointing out that the purpose of developing both DA algorithms is not to show how novel they are but to illustrate the theoretical analyses. Further, we perform experiments to compare GmLcDA, LmGcDA, LmLcDA, and GmGcDA; concretely, the comparative experiments among GmLcDA, LmGcDA, LmLcDA (MFA and LSDA), and GmGcDA (MDA) are performed on toy and real-world datasets. From these comparisons, we suggest that combining a locally constructed intraclass graph with a globally constructed interclass graph is more discriminant. (It should be pointed out that the two concepts of adjacency matrix and graph are used interchangeably throughout the paper, since a graph corresponds to an adjacency matrix.)

The rest of this paper is organized as follows. In Section 2, graph construction and two typical DA algorithms, MDA and MFA, are briefly reviewed. In Section 3, we first indicate the meanings of compactness and margin in DA and introduce the global intraclass and interclass graphs, then heuristically demonstrate the issues raised above, develop GmLcDA and LmGcDA, and finally compare GmLcDA, LmGcDA, LmLcDA, and GmGcDA. In Section 4, the comparative experiments are performed. Finally, conclusions and remarks for future work are given in Section 5.

2. Brief Review of Graph Construction and Typical DA Algorithms

2.1. Graph Construction

Let $X = \{x_1, x_2, \ldots, x_N\}$ denote a set of samples. Current graph constructions mainly include two types: the $k$-nearest-neighbor construction and the $\epsilon$-neighborhood construction [22]. The construction of the adjacency matrix then weights the edges of the graph by a similarity function, mainly in one of two ways: the heat kernel or the 0-1 weighting [22]. The graph construction in this work focuses on the latter due to its simplicity and generality.
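Since the 0-1 weighting formula referred to above is not reproduced here, the following minimal sketch illustrates the standard 0-1 $k$-nearest-neighbor construction it describes; the function name, the use of Euclidean distance, and the symmetrization rule are our own assumptions rather than details taken from the paper.

```python
import numpy as np

def knn_adjacency_01(X, k):
    """0-1 adjacency matrix of a k-nearest-neighbor graph (a sketch).

    X : (N, D) array of N samples as rows; k : number of neighbors.
    W[i, j] = 1 if x_j is among the k nearest neighbors of x_i or vice
    versa, and 0 otherwise (symmetric 0-1 weighting).
    """
    N = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                         # exclude self-links
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(d2[i])[:k]                     # k nearest neighbors of x_i
        W[i, nbrs] = 1
    return np.maximum(W, W.T)                            # link if either is a neighbor
```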

2.2. Typical DA Algorithms

MDA is deemed an example of GmGcDA here from the viewpoint of graph embedding [5]. Given a dataset of $N$ samples belonging to $C$ classes with labels $y_i \in \{1, \ldots, C\}$, MDA seeks the projection directions that maximize the interclass margin and simultaneously minimize the intraclass compactness; it thus preserves the global structure in the data but fails to discover the local geometric structure of manifold data embedded in the ambient space.

In order to mitigate the limitations of global algorithms, there is increasing interest in graph embedding based DA algorithms. MFA is a typical one, induced from the graph embedding framework for dimensionality reduction [5]. According to this framework, MFA constructs a local intraclass graph, whose adjacency matrix links each sample to its nearest neighbors within the same class (2a), to characterize the intraclass compactness, and a local interclass graph, whose adjacency matrix links the nearest sample pairs from different classes (2b), to characterize the interclass separability.
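Since (2a) and (2b) are not reproduced above, the following is the commonly used MFA formulation they describe, written in notation of our own choosing: $N^{+}_{k_1}(i)$ denotes the index set of the $k_1$ same-class nearest neighbors of $x_i$, and $P_{k_2}$ denotes the set of the $k_2$ nearest sample pairs from different classes.

```latex
% (2a) local intraclass adjacency: link k1 same-class nearest neighbors
W^{c}_{ij} =
\begin{cases}
1, & i \in N^{+}_{k_1}(j)\ \text{or}\ j \in N^{+}_{k_1}(i),\\
0, & \text{otherwise},
\end{cases}
\qquad
% (2b) local interclass adjacency: link the k2 nearest different-class pairs
W^{p}_{ij} =
\begin{cases}
1, & (i,j) \in P_{k_2},\\
0, & \text{otherwise}.
\end{cases}
```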

Here, we refer to such DA algorithms, which are based on local intraclass and interclass graphs, as Locally marginal and Locally compact DA (LmLcDA).

3. Analyzing and Addressing Issues

From the reviews above, we find that the motivation of most local graph based DA algorithms, despite their different formulations, is to mitigate the limitations of global algorithms. However, to date, there have been few particular analyses of whether the local construction is always better than the global one for the intraclass and interclass graphs, which (intraclass or interclass) graph should be constructed locally or globally, and, further, how they should be effectively combined for good discriminant performance. In this section, in order to further analyze and address these issues, we first illustrate the meanings of local compactness, global compactness, local margin, and global margin in DA, then formally introduce the globally constructed intraclass and interclass graphs, elaborately analyze and address the issues above, and finally develop the Globally marginal and Locally compact Discriminant Analysis (GmLcDA) and the Locally marginal and Globally compact Discriminant Analysis (LmGcDA).

3.1. Meanings of Compactness and Margin in DA

Now we first illustrate the meanings of local compactness, global compactness, local margin, and global margin in DA, all of which are shown in Figure 1. Figure 1(a) shows the local compactness structures of two samples: one is locally linked with the five dots within the same class and the other with the pentacles; for clearer display, these local compactness structures are enclosed by the two pink dash-line ellipses. Meanwhile, Figure 1(b) shows the local margin structures of the two samples: one is locally linked with the four pentacles from the other class and the other with the dots; these local margin structures are enclosed, respectively, by the blue and cyan dot-line ellipses. By contrast, Figure 1(c) shows the global compactness structures of classes 1 and 2, while Figure 1(d) shows the global margin structure of both classes. It should be noted that the gray dash-line and dot-line ellipses do not denote a cluster but enclose all the points linked within them.

3.2. Globally Constructed Intraclass and Interclass Graphs

The global intraclass graph is formulated in (3) and the global interclass graph in (4).
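Equations (3) and (4) are not reproduced above. Based on the parameter-free description in the next paragraph, a plausible reconstruction is that the global graphs connect all same-class pairs and all different-class pairs, respectively; the symbols $W^{gc}$ and $W^{gp}$ are our own.

```latex
% (3) global intraclass adjacency: connect every pair of same-class samples
W^{gc}_{ij} =
\begin{cases}
1, & y_i = y_j,\ i \neq j,\\
0, & \text{otherwise},
\end{cases}
\qquad
% (4) global interclass adjacency: connect every pair of different-class samples
W^{gp}_{ij} =
\begin{cases}
1, & y_i \neq y_j,\\
0, & \text{otherwise}.
\end{cases}
```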

From the formulations in (3) and (4), it can be seen that the globally constructed intraclass and interclass graphs are parameter-free. In order to compare them with locally constructed graphs, their equivalent neighbor parameters may be viewed, respectively, as the maximum numbers of intraclass and interclass neighbor sample pairs, determined by the number of classes and the number of samples per class in the dataset.

3.3. Heuristic Demonstration of Issues

In this subsection, resorting to the relation between the scatter and the structure preservation property of graph-construction-based DA, by Propositions 1 and 3 and Corollaries 2 and 4 we demonstrate that the interclass graph should be constructed globally and the intraclass graph locally. The inequalities in the two propositions and corollaries quantify the scatter discrepancies between the locally and globally constructed graphs in the input space. On the other hand, there is a geometry structure preservation hypothesis; that is, the intraclass graph in DA preserves the compact structures of the input space in the embedded space, while the interclass graph preserves the margin structures of the input space in the embedded space. Under this hypothesis, those scatter inequalities heuristically demonstrate, to some extent, that the intraclass graph should be constructed locally and the interclass graph globally.

Proposition 1. For a locally constructed intraclass graph corresponding to the adjacency matrix in (2a) with neighbor parameter $k$, the intraclass scatters corresponding to two parameter values $k_1$ and $k_2$ satisfy $S^c(k_1) \le S^c(k_2)$ for $k_1 \le k_2$, where the parameter $k$ stands for the number of nearest neighbors of a sample within the same class, as defined in (2a), the parameters $k_1$ and $k_2$ denote that $k$ takes the values $k_1$ and $k_2$, respectively, and $S^c(k)$ denotes the intraclass scatter in the input space induced by the adjacency matrix with parameter $k$.

Proof. According to the definition of the adjacency matrix in (2a), each of its elements takes the value 1 or 0. The number of elements taking the value 1 under parameter $k_1$ is less than that under $k_2$ if $k_1 < k_2$, and the two are equal when $k_1 = k_2$. Hence, for $k_1 \le k_2$, the intraclass scatters satisfy $S^c(k_1) \le S^c(k_2)$. The proof is completed.
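As a concrete restatement, assuming the standard graph-embedding definition of the intraclass scatter (the formula below is our assumption, not reproduced from the paper), the inequality follows from the entrywise monotonicity of the adjacency matrix in $k$:

```latex
S^{c}(k) = \sum_{i,j} W^{c}_{ij}(k)\,\lVert x_i - x_j \rVert^{2},
\qquad
W^{c}_{ij}(k_1) \le W^{c}_{ij}(k_2)\ \ \forall i,j
\;\Longrightarrow\;
S^{c}(k_1) \le S^{c}(k_2) \quad (k_1 \le k_2).
```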

Corollary 2. $S^c(k) \le S^{gc}$, where $S^{gc}$ denotes the intraclass scatter corresponding to the globally constructed intraclass graph in (3).

From Proposition 1 it can be seen that Corollary 2 is clear, since the neighbor parameter $k$ of the local intraclass graph cannot exceed the equivalent (maximum) neighbor parameter of the global intraclass graph.

From Proposition 1 and Corollary 2, it can be seen that the intraclass scatter corresponding to the locally constructed graph in the input space is not larger than that corresponding to the globally constructed graph. It is well known that the intraclass graph in DA algorithms aims to preserve the local compactness structures of intraclass samples from the input space in the embedded space, as shown in Figure 1(a) rather than Figure 1(c). So, according to the geometry structure preservation property of graph embedding DA algorithms, a small intraclass scatter in the input space is often also small in the embedded space (in other words, a large intraclass scatter in the input space is often also large in the embedded space); hence the compactness of intraclass samples corresponding to the locally constructed graph can often be preserved in the low-dimensional space. Thus the intraclass graph should often be constructed locally rather than globally, which is consistent with the statement in [23] that, empirically, a small neighbor parameter tends to perform better.

Proposition 3. For a locally constructed interclass graph corresponding to the adjacency matrix in (2b) with parameter $k$, the interclass scatters corresponding to two parameter values $k_1$ and $k_2$ satisfy $S^p(k_1) \le S^p(k_2)$ for $k_1 \le k_2$, where the parameter $k$ stands for the number of nearest sample pairs from different classes, as defined in (2b), the parameters $k_1$ and $k_2$ denote that $k$ takes the values $k_1$ and $k_2$, respectively, and $S^p(k)$ denotes the interclass scatter in the input space induced by the adjacency matrix with parameter $k$.

The proof is similar to that of Proposition 1 and is thus omitted.

Corollary 4. $S^p(k) \le S^{gp}$, where $S^{gp}$ denotes the interclass scatter corresponding to the globally constructed interclass graph in (4).

From Proposition 3 it can be seen that Corollary 4 is clear, since the neighbor parameter $k$ of the local interclass graph cannot exceed the equivalent (maximum) parameter of the global interclass graph.

The interclass graph in DA algorithms aims to effectively separate samples from different classes; thus the local construction of the interclass graph is not quite reasonable on several counts. It is expected that samples from different classes are separated as effectively as possible, as shown in Figure 1(d) rather than Figure 1(b); that is, the interclass scatter should be as large as possible, whereas the interclass scatter corresponding to the local graph in the input space is not larger than that corresponding to the global graph, as demonstrated by Proposition 3 and Corollary 4. Considering the structure preservation property of graph embedding DA algorithms, the local construction of the interclass graph therefore seems undesirable. Moreover, an exhaustive search for the neighbor parameter (such as in MFA) in a high-dimensional input space is unavoidable and very expensive for good performance. If the parameter is set unsuitably, the supervised information cannot be utilized sufficiently and effectively, and it is not easy to determine which parameter value is most suitable for a given task. By contrast, the global construction of the interclass graph needs no exhaustive search for the neighbor parameter, because its equivalent neighbor parameter adapts to different datasets automatically. Furthermore, all available supervised information, as prior knowledge, can be fully utilized in maximizing the interclass margin. So, for good discriminant performance, the interclass graph should often be constructed globally rather than locally.

From the discussion above, it can be expected that combining a local construction for the intraclass graph with a global construction for the interclass graph is more discriminant. Next, in order to illustrate the analyses above, we develop the two DA algorithms GmLcDA and LmGcDA.

3.4. GmLcDA and LmGcDA

In this subsection, two DA algorithms are developed: one is the Globally marginal and Locally compact Discriminant Analysis (GmLcDA), which jointly utilizes the global interclass graph in (4) and the local intraclass graph in (2a); the other is the Locally marginal and Globally compact Discriminant Analysis (LmGcDA), which jointly utilizes the local interclass graph in (2b) and the global intraclass graph in (3).

In order to develop GmLcDA, the adjacency matrix in (2a) is rewritten as $\tilde{W}^c$ in (7), corresponding to the local intraclass graph, whose entry is 1 if one sample belongs to the set of the $k$ nearest neighbors of the other within the same class and 0 otherwise. In fact, (7) is consistent with (2a). However, since GmLcDA involves only one neighbor parameter, $k$, for the local intraclass graph (the global interclass graph is parameter-free), the formulation with parameter $k$ in (7) is used rather than the one in (2a), for clearer understanding.

Now compute the interclass scatter $S_p$ and the intraclass scatter $S_c$ as in (8), where the degree matrix is a diagonal matrix whose entries are the column (or row, since the adjacency matrix is symmetric) sums of the global interclass adjacency matrix, and the corresponding Laplacian matrix (the degree matrix minus the adjacency matrix) is symmetric and positive semidefinite as well. Likewise, $S_c$ is obtained by replacing the global interclass adjacency matrix in (8) with $\tilde{W}^c$, and correspondingly we obtain the intraclass degree and Laplacian matrices.
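Since (8) is not reproduced above, the sketch below shows the standard Laplacian-based scatter computation that the description implies; the function and variable names are our own.

```python
import numpy as np

def laplacian_scatter(X, W):
    """Graph-embedding scatter S = X^T L X with L = Deg - W (a sketch).

    X : (N, D) data matrix with samples as rows.
    W : (N, N) symmetric 0-1 adjacency matrix.
    """
    Deg = np.diag(W.sum(axis=1))  # diagonal degree matrix (row sums of W)
    L = Deg - W                   # graph Laplacian: symmetric, positive semidefinite
    return X.T @ L @ X            # (D, D) scatter matrix

# For GmLcDA:
# S_p = laplacian_scatter(X, W_gp)  # interclass scatter from the global interclass graph
# S_c = laplacian_scatter(X, W_c)   # intraclass scatter from the local intraclass graph
```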

Further, the objective criterion of GmLcDA is formulated in (9) by jointly maximizing the interclass scatter $S_p$ and minimizing the intraclass scatter $S_c$, so as to simultaneously preserve the local compactness structures of intraclass data and the margins between different-class data; it can be solved by the generalized eigen-decomposition [24].
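Criterion (9) itself is not shown above. Assuming the common generalized Rayleigh quotient form, maximizing $w^{\top} S_p w$ against $w^{\top} S_c w$, a minimal sketch of the solution via the generalized eigen-decomposition is given below; the function name and the small ridge term are our own choices.

```python
import numpy as np
from scipy.linalg import eigh

def da_projection(S_p, S_c, n_dims, reg=1e-6):
    """Projection maximizing interclass scatter S_p against intraclass scatter S_c.

    Solves the generalized eigenproblem S_p w = lambda * (S_c + reg*I) w and keeps
    the eigenvectors associated with the largest eigenvalues.
    """
    S_c_reg = S_c + reg * np.eye(S_c.shape[0])  # keep the denominator well conditioned
    evals, evecs = eigh(S_p, S_c_reg)           # eigenvalues in ascending order
    return evecs[:, ::-1][:, :n_dims]           # top n_dims projection directions

# Usage (samples as rows of X): P = da_projection(S_p, S_c, n_dims=10); Z = X @ P
```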

Likewise, for LmGcDA, the local interclass adjacency matrix in (2b) is rewritten as (10), corresponding to the local interclass graph, whose nonzero entries indicate the nearest sample pairs from different classes. Further, the local interclass graph in (10) and the global intraclass graph in (3) are combined to form the objective criterion of LmGcDA in (11), which can also be solved by the generalized eigen-decomposition [24].

3.5. Comparison among GmLcDA, LmGcDA, LmLcDA, and GmGcDA

GmLcDA jointly constructs a global interclass graph, as shown in Figure 1(d), and a local intraclass graph, as shown in Figure 1(a). Such a graph construction leads to a larger margin between classes and stronger compactness within each class, so it can preserve the geometry structure and is consistent with Propositions 1 and 3 and Corollaries 2 and 4. As a result, it is more likely to yield good discriminant performance, as shown in the experiments in Section 4.

LmGcDA shows exactly the opposite effect. That is, it is difficult for a locally constructed interclass graph (Figure 1(b)) and a globally constructed intraclass graph (Figure 1(c)) to effectively preserve the geometry structure of the input samples, which in fact leads to worse performance on real-world tasks, as shown by the experiments in Section 4.

By contrast, LmLcDA constructs both the intraclass and interclass graphs locally, as shown in Figures 1(a) and 1(b), respectively. It is well known that the local parameter settings of graph construction are relatively intractable, especially for the interclass graph. Accordingly, for good discriminant performance, such a graph construction requires more domain knowledge and experience.

Different from the three algorithms above, the graph construction of GmGcDA adopts only the global way for both the intraclass and interclass graphs. Such a construction is easy to implement and stable across different domains.

4. Experiments

In this section, to further illustrate and support the above theoretical analyses, we compare LmLcDA, GmLcDA, LmGcDA, and GmGcDA by performing experiments on toy and real-world datasets; the latter include UCI datasets [25], face recognition, and object categorization. The nearest neighbor classifier (1NN) [26] is applied after these DA algorithms to evaluate their classification performance. Ridge regularization [27] is adopted for all compared DA algorithms; that is, all the compared DA algorithms are derived from a regularized objective of the same form with a ridge parameter, in which the two scatter matrices correspond, respectively, to the interclass and intraclass scatters.
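The regularized objective referred to above is not reproduced; a plausible form, consistent with the generalized eigen-decomposition solution of Section 3.4, is the following, where $S_p$, $S_c$, and the ridge parameter $\alpha$ are our assumed symbols:

```latex
\max_{w}\ \frac{w^{\top} S_p\, w}{\,w^{\top}\!\left(S_c + \alpha I\right) w\,},
\qquad \alpha > 0 .
```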

4.1. A Toy Example

Here, the toy example illustrates that constructing both the intraclass and interclass graphs locally yields less discriminant projections, implying that the global construction appears more necessary and thus suggesting that combining a globally constructed interclass graph with a locally constructed intraclass graph is desirable for more discriminant performance.

For LmLcDA algorithms such as MFA, due to the injection of locality into both the intraclass and interclass graphs, they may fail to obtain discriminant projections for tasks in which the global structure needs to be considered. Moreover, there is no guidance on the selection of the neighbor parameter for the interclass graph, and when the neighbor parameters for constructing the graphs are set inappropriately, the projection results may be even more unfavorable. For example, for the linearly separable problem shown in Figure 2(a), the DA algorithms that adopt at least one global graph, namely, GmGcDA (MDA), GmLcDA, and LmGcDA (with appropriately set neighbor parameters), can effectively separate the two-class samples, as shown in Figures 2(c)–2(e). On the contrary, from Figure 2(b) we can observe that, in the reduced one-dimensional subspace of MFA (the intraclass neighbor parameter is empirically set to 3, with different values of the interclass parameter), the two-class samples overlap to different extents. Moreover, no rule can be found for the selection of the interclass parameter; we can only observe that, for one particular value, the separation of the two-class samples is relatively satisfactory.

Further, when additional input points are added, as shown in Figure 2(f), both the margin between the two classes and the compactness within class 2 change. It then becomes more difficult to partition the two-class samples in the one-dimensional projection space of LmGcDA than in that of GmLcDA, as shown in Figures 2(g) and 2(h).

4.2. Real-World Datasets

On these real-world datasets, we compare GmLcDA, LmGcDA, two LmLcDAs (MFA and LSDA), and GmGcDA (MDA). In order to evaluate the various algorithms effectively, their model parameters are searched over a large candidate range and the corresponding best results are reported. To address the singularity problem of MDA, the matrix inverse is replaced here by the pseudoinverse [24].
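A minimal illustration of this substitution is sketched below; the function name and the scatter symbols S_b and S_w are our own, and NumPy's pseudoinverse stands in for the matrix inverse.

```python
import numpy as np

def mda_directions(S_b, S_w, n_dims):
    """MDA-style projection using the pseudoinverse when S_w is singular (a sketch)."""
    evals, evecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-evals.real)        # largest eigenvalues first
    return evecs[:, order[:n_dims]].real   # top n_dims projection directions
```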

4.2.1. UCI Datasets

We select 5 two-class UCI datasets, whose descriptions are shown in column 1 of Table 1; the classification results of 1NN on the original (unreduced) data are also reported.

For each dataset, the samples are randomly divided into a training set and a testing set, each containing half of the samples. The random division is performed 30 times, and the average accuracies for each algorithm are tabulated in Table 1. The neighbor parameters of GmLcDA, of LmGcDA, of MFA (intraclass), and of LSDA are searched from 2 to half of the minimum class size with an increment of 5. The interclass neighbor parameter of MFA is searched from 20 to the number of training samples with an increment of 20. (Notice that the maximum value of this parameter for MFA, i.e., the number of training samples, is smaller than the equivalent parameter of the global interclass graph used by GmLcDA.)

From the results shown in Table 1, we can observe the following.

(i) GmLcDA produces the best accuracies on the 4 datasets other than Sonar, clearly outperforming LmGcDA, MDA, MFA, LSDA, and 1NN on the unreduced data. On Sonar, the accuracy of GmLcDA is only 0.0048 lower than that of LSDA. Moreover, the optimal reduced dimensions of GmLcDA (except for Sonar) are lower than those of MFA and LSDA, which is clearly helpful for efficient testing.

(ii) LmGcDA is the worst on all datasets except for Crx, on which it is better only than MDA. Such results clearly testify that locality for the interclass graph combined with globality for the intraclass graph is not a viable alternative.

(iii) MFA and LSDA achieve similar results, since both adopt local intraclass and interclass graphs to preserve the local geometry of the data. However, the optimal reduced dimensions of MFA are overall higher than those of LSDA.

(iv) The performance of MDA is relatively inferior, especially on Crx and Spectf, where it is even worse than 1NN on the unreduced data. Moreover, the standard deviations of MDA on all datasets appear larger than those of the other algorithms, which can be attributed to the fact that its reduced dimension is limited to one for two-class datasets.

(v) GmLcDA, LmLcDA, and GmGcDA almost all outperform 1NN on all datasets (except for GmGcDA on Crx, Sonar, and Spectf) with lower reduced dimensions. By contrast, LmGcDA is worse than 1NN on all datasets.

4.2.2. Face Recognition

It is well known that face recognition is a very important task in pattern recognition and machine learning. In this subsection, two benchmark datasets, Yale and ORL, are used to evaluate the use of local and global graphs in DA algorithms for face image recognition.

Data Description. The Yale dataset contains 165 face images of 15 individuals, 11 images per individual; these 11 images were captured under different facial expressions or configurations. Figure 3(a) shows the 11 sample face images of one person in the Yale dataset. The ORL dataset consists of 400 face images of 40 distinct subjects, 10 images per subject. These images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). Figure 3(b) shows 10 sample face images of one person in the ORL dataset. The images in Yale and ORL are of fixed size, with 256 grey levels per pixel.

Experimental Settings. Considering the very high dimension of the face images, PCA [28, 29] is first adopted in our experiments to identify a low-dimensional subspace; 99% of the energy is kept. The various DA algorithms are then performed in the obtained PCA subspace. In order to show the effects of the various algorithms on training sets with different numbers of samples, each face dataset is partitioned into different gallery and probe sets, where a given gallery size indicates that that number of images per person is randomly selected for training and the remaining images are used for testing. For each dataset and gallery size, 30 random splits are generated and the average of the 30 classification accuracies is reported. The neighbor parameters of GmLcDA and MFA, those of LmGcDA and MFA (with different candidate ranges on Yale and ORL), and that of LSDA are each searched over candidate ranges. The optimal results of the several algorithms on the two datasets are listed in Table 2.
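The preprocessing pipeline described above (PCA keeping 99% of the energy, followed by a DA projection and 1NN classification) can be sketched as follows; the scikit-learn usage and the optional `da_transform` argument are our own choices, not part of the original experimental code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def pca_da_1nn_accuracy(X_train, y_train, X_test, y_test, da_transform=None):
    """PCA (99% energy) -> optional DA projection -> 1NN classification accuracy."""
    pca = PCA(n_components=0.99)               # keep components covering 99% of the variance
    Z_train = pca.fit_transform(X_train)
    Z_test = pca.transform(X_test)
    if da_transform is not None:               # e.g., a GmLcDA projection matrix
        Z_train, Z_test = Z_train @ da_transform, Z_test @ da_transform
    clf = KNeighborsClassifier(n_neighbors=1)  # 1NN classifier, as in the paper
    clf.fit(Z_train, y_train)
    return clf.score(Z_test, y_test)
```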

From the results of Table 2, we can observe the following.

(i) With the increase in the number of gallery samples, the performances of all the algorithms improve to different degrees.

(ii) For the two datasets with different divisions, GmLcDA always obtains better accuracies than the other algorithms, which shows its effectiveness and feasibility for the face recognition task. These relatively strong results of GmLcDA may be ascribed to the joint injection of the global interclass margin and the local intraclass compactness.

(iii) MFA and LSDA produce some results worse than MDA for certain gallery sizes on Yale and ORL; moreover, MFA is relatively worse. A possible reason is that relying only on the local geometry of the data for the intraclass and interclass graphs limits their performance, especially for MFA, whose marginal points possibly limit its generalization ability to a certain degree.

(iv) On Yale, the unsupervised PCA is worse than the other algorithms. However, it is worth noting that PCA produces better accuracies than MDA, LSDA, and MFA on ORL. This may be ascribed to the fact that, for ORL, the margin between different-class samples is larger in the PCA subspace retaining 99% of the energy. Besides, on Yale, the standard deviations of MDA and PCA are larger than those of the other algorithms.

4.2.3. Object Categorization

A classical problem in computer vision and pattern recognition is to classify a set of objects into a group of known categories. Here, we use the popular benchmark dataset Coil20. The dataset consists of gray-scale images of 20 objects, 72 images per object taken at pose intervals of 5°, with all images of the same fixed size, as shown in Figure 4.

Experimental Settings. Similar to the face recognition experiment, the dataset is partitioned into different gallery and probe sets, forming 4 groups with increasing gallery sizes. For each group, 20 random splits are generated and the average of the 20 classification accuracies is reported. For the first three groups, with relatively small gallery sets, the neighbor parameters of GmLcDA and MFA, those of LmGcDA and MFA, and that of LSDA are searched over relatively small candidate ranges; for the group with relatively large gallery sets, correspondingly larger candidate ranges are used. The accuracies with respect to different reduced dimensions are displayed in Figure 5.

From Figure 5 we can clearly see the following.

(i) Figures 5(a)–5(d) show that, with the increase in the number of gallery samples, the performances of these algorithms improve to different extents. Likewise, with the gradual increase of the reduced dimension, their accuracies correspondingly increase. However, with further increases of the reduced dimension, the performances of all algorithms except MDA (whose reduced dimension is bounded) decrease gradually. This is especially evident for small gallery sets and is undoubtedly an instance of the "curse of dimensionality" [2].

(ii) For all four groups, from small to relatively large gallery sizes, GmLcDA outperforms MFA, LSDA, and MDA across the different reduced dimensions. In particular, when the reduced dimension is less than 10, its superiority is more obvious; for example, when the reduced dimension is 2, it is over 30% better than the worst performer, LSDA. Moreover, the reduced dimension at which GmLcDA attains its best accuracy is lower than that of the other algorithms: the accuracies of GmLcDA reach their best value at reduced dimensions of at most 20 and afterwards tend to remain stable.

(iii) The performances of LSDA and MFA do not differ remarkably except in some settings. Specifically, for the first two groups MFA is overall better than LSDA, as shown in Figures 5(a) and 5(b); in contrast, with the increase of the gallery sample number, LSDA exceeds MFA in performance, as shown in Figures 5(c) and 5(d). We thus infer that LSDA is not well suited to small training sets. Moreover, we can very clearly observe that the performance of LSDA is always the worst when the reduced dimension is relatively low.

(iv) When the reduced dimension lies within the range available to MDA, the accuracies of MDA are almost parallel to those of MFA and LSDA. However, since its reduced dimension is limited to at most one less than the number of classes, its performance is overall inferior to the other three algorithms.

5. Conclusion and Future Work

In this paper, we elaborately address some important issues in graph-construction-based DA. In order to illustrate and support the theoretical analyses, by jointly utilizing both globality and locality, we develop the GmLcDA algorithm based on the global interclass and local intraclass graphs and the LmGcDA algorithm based on the local interclass and global intraclass graphs. Further, by comprehensively comparing LmLcDA (MFA, LSDA), GmLcDA, LmGcDA, and GmGcDA (MDA) on toy and real-world datasets, we suggest that combining a locally constructed intraclass graph with a globally constructed interclass graph is more discriminant.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by NSFC (61170151, 61363051), the JS-QingLan Project, the Program of Higher-Level Talents of Inner Mongolia University (115118), the Postdoctoral Science Foundation Funded Project of Inner Mongolia University (30105-135113), and the General Financial Grant from the China Postdoctoral Science Foundation (2013M540217).