Abstract
Discriminant graph embedding-based dimensionality reduction methods have attracted more and more attention over the past few decades. These methods construct an intrinsic graph and a penalty graph to preserve the intrinsic geometric structure of intra-class samples and to separate inter-class samples. However, penalty graphs alone cannot accurately characterize marginal samples because they treat every sample equally. In practice, these marginal samples often influence the classification performance and need to be specially treated. In this study, the near neighbors' hypothesis margin of marginal samples is further maximized, in addition to integrating an intrinsic graph and a penalty graph, to separate inter-class samples and improve discriminant ability. A novel discriminant dimensionality reduction method named LMGEDDR is proposed. Experiments on several public datasets, including ORL, Yale, UMIST, FERET, CMU-PIE09, and AR, verify the effectiveness of the proposed LMGEDDR: it performs better than the compared methods, and its standard deviation is smaller. These results demonstrate the effectiveness of the introduced method.
1. Introduction
Dimensionality reduction (DR) plays an important role in many fields, such as machine learning and pattern recognition [1–4]. It aims to resolve the curse of dimensionality by obtaining relevant low-dimensional representations of high-dimensional datasets. Linear discriminant analysis (LDA) and principal component analysis (PCA) are the most representative methods [5, 6]. PCA obtains a low-dimensional space by maximizing variance. LDA uses label information to project the feature space so that categories are distinguished, maximizing the inter-class distance while minimizing the intra-class distance. However, LDA cannot capture the local structure of data. As is known, the local structures of high-dimensional data are very important for data representation.
A k-nearest-neighbor graph can better characterize the local structure of data [7]. Thus, over the past years, graph embedding-based dimensionality reduction methods have sprung up [7, 8], such as LLE [9], Isomap [10, 11], and Laplacian eigenmaps [12]. However, these manifold learning methods cannot directly process new samples because they do not yield an explicit mapping function, which is known as the 'out-of-sample' problem [13]. Locality preserving projections (LPP), a well-known method, addresses this problem by learning an explicit projection that preserves the local structure of data in the low-dimensional space [2]. Owing to its simplicity and effectiveness, many variants of LPP have been proposed [14, 15]. However, as an unsupervised method, LPP does not use label information and therefore performs worse in classification [16]. Neighborhood preserving projection (NPP) preserves the local neighborhood information on the data manifold [17].
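Since the k-nearest-neighbor graph underlies all of the methods discussed in this paper, a minimal sketch of its construction may be helpful. This uses plain NumPy; the function name and toy data are illustrative, not from the paper:

```python
import numpy as np

def knn_graph(X, k):
    """Build a symmetric k-nearest-neighbor adjacency matrix.

    X: (n, d) array of n samples; k: number of neighbors.
    Returns an (n, n) 0/1 adjacency matrix W with W[i, j] = 1
    when x_j is among the k nearest neighbors of x_i (or vice versa).
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D, np.inf)          # exclude self-matches
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])[:k]       # k closest samples to x_i
        W[i, idx] = 1
    return np.maximum(W, W.T)            # symmetrize

# toy example: two clusters on a line
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
W = knn_graph(X, 2)
```

Symmetrizing with the maximum corresponds to linking two samples whenever either one lists the other as a neighbor, the convention commonly used in graph embedding.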
To further improve classification performance, discriminant graph embedding-based methods, which use label information, have gradually become a popular research topic; they aim to preserve the within-class geometric structure while maximizing the between-class distances of different manifolds [18]. Thus, more and more discriminant graph embedding-based methods have recently been studied. Marginal Fisher analysis (MFA) constructs two adjacency graphs to maximize the separability between pairwise marginal data points [19]. Local discriminant embedding (LDE) [20] utilizes label information and proposes a nearest-neighbor-based embedding. However, it suffers from the so-called small-sample-size (SSS) problem and cannot be applied directly to high-dimensional data [20]. Considering local intra-class attraction and inter-class repulsion, discriminant neighborhood embedding (DNE) was proposed to make data points of the same class compact while widening the gaps between classes in a low-dimensional subspace [21]. However, DNE does not always set edges between a sample and its neighbors of different classes, which can reduce the inter-class distance in the new space and deteriorate classification [22]. Thus, Ding et al. constructed double adjacency graphs to link each sample with its homogeneous and heterogeneous neighbors and introduced a more effective version of DNE termed DAG-DNE [22]. Inspired by DAG-DNE, several discriminant analysis-based methods have been proposed over the past few years [23–33].
Most dimensionality reduction methods can be unified in the graph-embedding framework [19]; these methods differ in how they construct the similarity graph and the penalty graph [34]. Graph embedding-based methods are therefore sensitive to the weight matrix, yet they assign weights to every sample (including marginal samples) in the same way. However, as stated in [35], the marginal samples located on the class margin in the high-dimensional space are crucial to classification performance and should be treated so as to achieve a maximum between-class hypothesis margin. Therefore, enlarging the hypothesis margins between the near neighbors of these marginal samples can improve the discriminating power of the embedded features, and such samples should be treated separately. In this study, in addition to constructing double adjacency graphs, the nearest neighbors' hypothesis margin of each marginal sample is maximized to improve the discriminant power. A novel large-margin graph embedding-based discriminant dimensionality reduction method named LMGEDDR is introduced. Experimental results on several public datasets confirm the effectiveness of the proposed LMGEDDR.
2. Methods
Firstly, the common notations used in this study are presented. The high-dimensional data are denoted as $X=[x_1,x_2,\dots,x_n]\in\mathbb{R}^{d\times n}$, with $n$ samples in $d$ dimensions belonging to $c$ classes, where $l_i$ denotes the class label of sample $x_i$. $z_i=P^{T}x_i$ denotes the sample transformed by the projection matrix $P\in\mathbb{R}^{d\times d'}$, where $p$ is any one column vector of $P$. $N^{w}_{k}(x_i)$ and $N^{b}_{k}(x_i)$, respectively, denote the $k$ nearest neighbors of sample $x_i$ with the same class and with a different class.
2.1. DNE
Discriminant neighborhood embedding (DNE) considers local intra-class attraction and inter-class repulsion and learns the intrinsic graph and penalty graph through one signed adjacency matrix:

$$F_{ij}=\begin{cases}+1,& x_i\in N_{k}(x_j)\ \text{or}\ x_j\in N_{k}(x_i),\ \text{and}\ l_i=l_j,\\-1,& x_i\in N_{k}(x_j)\ \text{or}\ x_j\in N_{k}(x_i),\ \text{and}\ l_i\neq l_j,\\0,&\text{otherwise.}\end{cases}\tag{1}$$

The objective function can be denoted as follows:

$$\min_{P}\ \Phi(P)=\sum_{i,j}\big\|P^{T}x_i-P^{T}x_j\big\|^{2}F_{ij},\quad\text{s.t.}\ P^{T}P=I.\tag{2}$$
The constraint $P^{T}P=I$ preserves the local structure and reinforces the discriminant ability [36].
The objective in (2) can be rewritten in the form of a trace as follows:

$$\Phi(P)=2\,\mathrm{tr}\big(P^{T}X(S-F)X^{T}P\big),\tag{3}$$

where $S$ is a diagonal matrix with $S_{ii}=\sum_{j}F_{ij}$. Therefore, the objective function (2) can be rewritten as follows:

$$\min_{P}\ \mathrm{tr}\big(P^{T}X(S-F)X^{T}P\big),\quad\text{s.t.}\ P^{T}P=I.\tag{4}$$
The projection matrix can be found by resolving the following eigenvector problem:

$$X(S-F)X^{T}p_i=\lambda_i p_i,\tag{5}$$

where $\lambda_i$ ($i=1,\dots,d$) are the eigenvalues and $p_i$ ($i=1,\dots,d$) are the corresponding eigenvectors. Assume $\lambda_1\le\lambda_2\le\dots\le\lambda_d$; then $P=[p_1,\dots,p_{d'}]$ consists of the eigenvectors associated with the $d'$ smallest eigenvalues. The details are presented in [21].
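The DNE pipeline, from the signed adjacency matrix to the eigenvector problem, can be sketched as follows. This is a minimal NumPy illustration assuming the minimization form of the objective; the helper name and toy data are not from the paper:

```python
import numpy as np

def dne_projection(X, y, k=2, dim=1):
    """Sketch of DNE (hypothetical helper, not the authors' code).

    X: (d, n) data matrix, y: (n,) labels.  Builds the signed
    adjacency F (+1 for same-class neighbors, -1 for different-class
    neighbors, 0 otherwise), forms M = X (S - F) X^T with
    S_ii = sum_j F_ij, and returns the eigenvectors of the `dim`
    smallest eigenvalues, which minimize the DNE objective.
    """
    d, n = X.shape
    D2 = ((X.T[:, None, :] - X.T[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D2, np.inf)
    F = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(D2[i])[:k]:
            F[i, j] = 1.0 if y[i] == y[j] else -1.0
    F = np.sign(F + F.T)                 # symmetrize: edge if either end links
    S = np.diag(F.sum(axis=1))
    M = X @ (S - F) @ X.T
    vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    return vecs[:, :dim]                 # minimizing directions

# two 2-D classes separable along the first axis
X = np.array([[0.0, 0.2, 3.0, 3.2],
              [0.0, 1.0, 0.1, 1.1]])
y = np.array([0, 0, 1, 1])
P = dne_projection(X, y, k=2, dim=1)
```

On this toy data the learned direction is dominated by the first axis, along which the two classes are separated.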
2.2. DAGDNE
Double adjacency graph-based discriminant neighborhood embedding, termed DAG-DNE, constructs double adjacency graphs as a more effective version of DNE. In DAG-DNE, the within-class adjacency matrix $F^{w}$ and the between-class adjacency matrix $F^{b}$ can be defined as follows:

$$F^{w}_{ij}=\begin{cases}1,& x_i\in N^{w}_{k_1}(x_j)\ \text{or}\ x_j\in N^{w}_{k_1}(x_i),\\0,&\text{otherwise,}\end{cases}\tag{6}$$

$$F^{b}_{ij}=\begin{cases}1,& x_i\in N^{b}_{k_2}(x_j)\ \text{or}\ x_j\in N^{b}_{k_2}(x_i),\\0,&\text{otherwise.}\end{cases}\tag{7}$$
The projection matrix can be solved as in DNE as follows:

$$\max_{P}\ \mathrm{tr}\big(P^{T}X(L^{b}-L^{w})X^{T}P\big),\quad\text{s.t.}\ P^{T}P=I,\tag{8}$$

where $L^{w}=D^{w}-F^{w}$ and $L^{b}=D^{b}-F^{b}$ are the graph Laplacians with diagonal degree matrices $D^{w}_{ii}=\sum_{j}F^{w}_{ij}$ and $D^{b}_{ii}=\sum_{j}F^{b}_{ij}$.
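The double-adjacency-graph construction and the resulting eigenproblem can be sketched as below. Again a minimal NumPy illustration with hypothetical helper names, assuming the Laplacian-difference form of the objective:

```python
import numpy as np

def dagdne_projection(X, y, k1=1, k2=1, dim=1):
    """Sketch of DAG-DNE (hypothetical helper, not the authors' code).

    Builds two 0/1 adjacency graphs, Fw linking each sample with its
    k1 homogeneous neighbors and Fb with its k2 heterogeneous
    neighbors, then maximizes tr(P^T X (Lb - Lw) X^T P) under
    P^T P = I via the top `dim` eigenvectors.
    """
    d, n = X.shape
    D2 = ((X.T[:, None, :] - X.T[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D2, np.inf)
    Fw = np.zeros((n, n))
    Fb = np.zeros((n, n))
    for i in range(n):
        same = [j for j in np.argsort(D2[i]) if y[j] == y[i]][:k1]
        diff = [j for j in np.argsort(D2[i]) if y[j] != y[i]][:k2]
        Fw[i, same] = 1
        Fb[i, diff] = 1
    Fw = np.maximum(Fw, Fw.T)            # edge if either end links
    Fb = np.maximum(Fb, Fb.T)
    Lw = np.diag(Fw.sum(1)) - Fw         # graph Laplacians
    Lb = np.diag(Fb.sum(1)) - Fb
    vals, vecs = np.linalg.eigh(X @ (Lb - Lw) @ X.T)
    return vecs[:, -dim:]                # maximizing directions

X = np.array([[0.0, 0.2, 3.0, 3.2],
              [0.0, 1.0, 0.1, 1.1]])
y = np.array([0, 0, 1, 1])
P = dagdne_projection(X, y)
```

Keeping the two graphs separate guarantees that every sample is linked to heterogeneous neighbors, which is exactly the property DNE's single signed graph can lose.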
3. Proposed Method
As revealed above, the weights in the adjacency matrix are assigned in the same way to every sample, including the marginal samples, which cannot further enlarge the between-class hypothesis margin and deteriorates classification performance. In this study, the marginal sample is defined in Definition 1; the hypothesis margin follows [37–39].
Definition 1. (marginal sample). $x_i$ is regarded as a marginal sample if its $k_0$ nearest neighbors contain at least one heterogeneous sample, i.e., $N^{b}_{k_0}(x_i)\neq\varnothing$.
The marginal samples in this study are the ones located in class margin. Figure 1 is the near neighbors’ graph and shows the marginal samples (i.e., {5, 6, 7, 8}).
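Under the reading of Definition 1 used here (a sample is marginal when its $k_0$-neighborhood contains a heterogeneous sample), marginal samples can be located as follows; the function name and toy data are illustrative only:

```python
import numpy as np

def marginal_samples(X, y, k0=2):
    """Identify marginal samples per Definition 1 (as reconstructed
    here): x_i is marginal if at least one of its k0 nearest
    neighbors carries a different label.
    """
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D2, np.inf)
    ms = []
    for i in range(n):
        nbrs = np.argsort(D2[i])[:k0]
        if np.any(y[nbrs] != y[i]):      # a heterogeneous neighbor exists
            ms.append(i)
    return ms

# 1-D toy data: samples 2 and 3 sit on the class boundary
X = np.array([[0.0], [0.2], [1.0], [1.3], [2.1], [2.3]])
y = np.array([0, 0, 0, 1, 1, 1])
```

On this toy data, only the two boundary samples have a different-class sample among their two nearest neighbors, mirroring the boundary samples highlighted in Figure 1.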
Definition 2. (hypothesis margin). As is shown in [37], the hypothesis margin can be defined as follows:

$$\theta(x)=\frac{1}{2}\big(\|x-\mathrm{nearmiss}(x)\|-\|x-\mathrm{nearhit}(x)\|\big),\tag{10}$$

where $\mathrm{nearhit}(x)$ and $\mathrm{nearmiss}(x)$ denote the nearest neighbors of sample $x$ with the same class and a different class, respectively, and $\|\cdot\|$ represents the $L_{2}$ norm. The sample $x$ can be accurately recognized by the 1-NN classifier (the nearest neighbor) when $\theta(x)>0$, as illustrated in Figure 2.
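The hypothesis margin can be computed directly; this sketch assumes the common one-half scaling of the nearmiss/nearhit difference, and the helper name and toy data are illustrative:

```python
import numpy as np

def hypothesis_margin(x, X, y, label):
    """Hypothesis margin of a query point x with the given label:
    half the distance to the nearest different-class sample
    (nearmiss) minus the distance to the nearest same-class
    sample (nearhit).  X: (n, d) reference set, y: (n,) labels."""
    d = np.linalg.norm(X - x, axis=1)
    nearhit = d[y == label].min()
    nearmiss = d[y != label].min()
    return 0.5 * (nearmiss - nearhit)

X = np.array([[0.0], [1.0], [4.0]])
y = np.array([0, 0, 1])
# query point x = [0.5], label 0: nearhit at distance 0.5,
# nearmiss at distance 3.5, so the margin is positive
m = hypothesis_margin(np.array([0.5]), X, y, 0)
```

A positive margin means the query's nearhit is closer than its nearmiss, so the 1-NN classifier labels it correctly, as stated in Definition 2.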
Definition 3. (heterogeneous near neighbors' hypothesis margin). A marginal sample $x_i$ is shown in Figure 3 to illustrate the heterogeneous near neighbors' hypothesis margin of $x_i$, which is defined as follows:

$$H(x_i)=\sum_{x_j\in N^{b}_{k_0}(x_i)}\big(\|x_i-x_j\|-\|x_i-\mathrm{nearhit}(x_i)\|\big).\tag{11}$$

Herein, $N^{b}_{k_0}(x_i)$ denotes the heterogeneous samples among the $k_0$ nearest neighbors of $x_i$, and $\mathrm{nearhit}(x_i)$ is the nearest homogeneous neighbor of $x_i$.
As shown in (11), the heterogeneous samples stay separated and a large margin is achieved between heterogeneous near neighbors when all the expressions in brackets are larger than zero, which means the sample can be correctly classified by the 1-NN classifier.
4. LMGEDDR
Building on DAG-DNE, the marginal samples in the high-dimensional space are additionally treated separately by maximizing their heterogeneous near neighbors' margin, which improves the discriminant power. LMGEDDR is formulated as follows:
The adjacency matrices $F^{w}$ and $F^{b}$ are the same as in DAG-DNE, and the objective function of LMGEDDR can be denoted as

$$\max_{P}\ \mathrm{tr}\big(P^{T}X(L^{b}-L^{w})X^{T}P\big)+a\sum_{x_i\in MS}H\big(P^{T}x_i\big),\quad\text{s.t.}\ P^{T}P=I.\tag{12}$$

Here, $L^{w}$ and $L^{b}$ are the same as in DAG-DNE, $MS$ denotes the set of marginal samples in the high-dimensional space, and $a>0$ is a trade-off parameter.
This objective function is transformed into two parts, the double-adjacency-graph term and the heterogeneous near neighbors' margin term, based on (5).
The solution of (12) is easily obtained by solving the maximum eigenvalue problem.
Here, $\lambda_i$ ($i=1,\dots,d$) are the eigenvalues of the resulting eigenvalue problem and $p_i$ ($i=1,\dots,d$) are the corresponding eigenvectors. Assume $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_d$; then $P=[p_1,\dots,p_{d'}]$ consists of the eigenvectors associated with the $d'$ largest eigenvalues.
The details of LMGEDDR can be seen in Algorithm 1.

5. Analysis of LMGEDDR
In this section, LMGEDDR will be analyzed to illustrate the effectiveness in preserving the geometrical and discriminant structures.
Although LMGEDDR is similar to DAG-DNE in how it constructs adjacency graphs, for the marginal samples in the high-dimensional space LMGEDDR additionally maximizes the heterogeneous near neighbors' hypothesis margin. This achieves a large between-class margin in the low-dimensional subspace and discriminates the local structure of neighbors, improving the discriminant power compared to DAG-DNE.
The performance of LMGEDDR on a toy dataset is illustrated in Figure 4.
As shown in Figure 4(a), for the sample , is and is . Thus, based on (12), the hypothesis margin of is denoted as follows:
Based on Definition 2, the sample will be recognized by mistake because its hypothesis margin is less than zero.
The embedded results and hypothesis margins in the one-dimensional space are illustrated in Figures 4(b)–4(e). It can be seen that the hypothesis margins of the sample in the low-dimensional space are less than zero for MFA and DAG-DNE, whereas the opposite holds for MNMDP, DNE, and LMGEDDR. In LMGEDDR, the hypothesis margin of the sample (H = 0.39) is larger than that in DAG-DNE, which is useful for classification.
Overall, maximizing the heterogeneous near neighbors' hypothesis margin of marginal samples can further improve the discriminant power in the low-dimensional space.
6. Experiments
In this section, several experiments are conducted systematically to verify the effectiveness of LMGEDDR in comparison with several popular methods such as DAG-DNE, DNE, MNMDP, and MFA. Specifically, the performance of LMGEDDR is illustrated on face recognition and two-dimensional visualization experiments. For each person, l randomly selected images constitute the training data, and the remaining images are the testing data. The nearest-neighbor parameters k, k_{1}, and k_{2} used in constructing the adjacency graphs are set to l − 1 for all methods, as in [40]. PCA is applied first to reduce the dimensionality while retaining nearly 98% of the image energy. The 1-NN classifier is applied to perform the classification. The average result over 20 runs is regarded as the classification result.
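The evaluation protocol described above (PCA retaining roughly 98% of the energy, followed by 1-NN classification) can be sketched as below; the helper names and synthetic data are illustrative only, not the authors' code:

```python
import numpy as np

def pca_98(X_train, energy=0.98):
    """PCA basis retaining ~98% of the energy (sketch with
    assumed details).  X_train: (n, d).  Returns the mean and
    a (d, r) basis whose components cover `energy` of the
    total variance."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    # singular values give the per-component energy
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(ratio, energy)) + 1
    return mu, Vt[:r].T

def nn_accuracy(Ztr, ytr, Zte, yte):
    """1-NN classification accuracy in the reduced space."""
    correct = 0
    for z, t in zip(Zte, yte):
        pred = ytr[np.argmin(((Ztr - z) ** 2).sum(1))]
        correct += int(pred == t)
    return correct / len(yte)

# synthetic two-class data standing in for face features
rng = np.random.default_rng(0)
Xtr = np.vstack([rng.normal(0, 0.1, (10, 5)),
                 rng.normal(1, 0.1, (10, 5))])
ytr = np.array([0] * 10 + [1] * 10)
mu, B = pca_98(Xtr)
acc = nn_accuracy((Xtr - mu) @ B, ytr, (Xtr - mu) @ B, ytr)
```

In the actual protocol the dimensionality reduction method under test (e.g., LMGEDDR) would be applied after the PCA step, and the accuracy averaged over 20 random train/test splits.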
6.1. 2D Visualization
The Wine dataset [41] is used for 2D visualization, as shown in Figure 5. From Figure 5, it can be seen clearly that the sample points in the low-dimensional space learned by LMGEDDR are better separated than those learned by DAG-DNE.
6.2. Face Recognition
LMGEDDR is evaluated on the ORL (http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html), FERET [42], AR [43], Yale, UMIST (https://www.sheffield.ac.uk/eee/research/iel/research/face), and CMU-PIE09 face datasets to evaluate the classification performance, and it is systematically compared with several popular methods such as MFA, MNMDP, DNE, and DAG-DNE.
6.2.1. Parameter Analysis
The sensitivity of the parameters k_{0} and a in LMGEDDR is analyzed on several face datasets, with the parameters k, k_{1}, and k_{2} set to l − 1. Figure 6 presents the best recognition rates of LMGEDDR for different values of k_{0} and a. The results in Figure 6 reveal that the recognition accuracy of LMGEDDR fluctuates with these parameters. Overall, the best recognition accuracy is achieved when a and k_{0} are larger. The reason is that a large a makes marginal samples cluster tightly toward the class center, and the larger k_{0} is, the more marginal samples there are. That is to say, the heterogeneous near neighbors' margin of more marginal samples can be maximized, achieving a large between-class margin, which is favorable for classification. Thus, the values of k_{0} and a in LMGEDDR on different datasets are chosen by cross-validation in the face recognition experiments.
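The cross-validated selection of k_{0} and a can be sketched as a simple grid search. Here `score` stands in for the cross-validation accuracy of LMGEDDR at given hyper-parameters and is purely illustrative:

```python
import numpy as np
from itertools import product

def select_k0_a(candidates_k0, candidates_a, score):
    """Pick (k0, a) maximizing a cross-validation score.
    `score(k0, a)` is a placeholder for the cross-validated
    accuracy of LMGEDDR with those hyper-parameters."""
    best = max(product(candidates_k0, candidates_a),
               key=lambda p: score(*p))
    return best

# toy stand-in score peaking at k0=5, a=1.0 (purely illustrative)
score = lambda k0, a: -((k0 - 5) ** 2 + (a - 1.0) ** 2)
best = select_k0_a([1, 3, 5, 7], [0.1, 0.5, 1.0], score)
```

In practice `score` would run the full train/evaluate loop on held-out folds for each candidate pair.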
6.2.2. Experiments Results
In this section, several experiments are conducted on public datasets, namely ORL, Yale, UMIST, FERET, CMU-PIE09, and AR, to verify the effectiveness of the proposed LMGEDDR; example images are shown in Figure 7. Each image in ORL and Yale is first aligned and cropped to 32 × 32, each image in UMIST to 40 × 50, each image in FERET to 80 × 80, each image in CMU-PIE09 to 64 × 64, and each image in AR to 50 × 40. Tables 1–6 report the best recognition results on the different datasets, and Figure 8 shows the recognition results over different dimensions.
As shown in Figure 8 and Tables 1–6, in most experiments LMGEDDR performs better than the compared methods, and the corresponding standard deviation of LMGEDDR is smaller than the others'.
6.2.3. Time Cost Analysis
In this section, the time cost of the different methods is evaluated on several datasets, including ORL, Yale, UMIST, and FERET. Table 7 reports the time of a single run with l = 5 and d = 20.
It can be concluded that LMGEDDR is comparable with the other methods in time cost, although some methods run faster than others.
7. Conclusions and Future Works
In this study, we propose a novel graph embedding-based dimensionality reduction approach named LMGEDDR, which is based on the heterogeneous near neighbors' hypothesis margin. Different from other discriminant learning methods, for marginal samples in the high-dimensional space we additionally maximize the heterogeneous near neighbors' hypothesis margin to achieve a large between-class margin, in addition to learning two kinds of adjacency graphs in which every sample is treated equally. This is crucial for classification. Experimental results on several public datasets, including ORL, Yale, UMIST, FERET, CMU-PIE09, and AR, illustrate the effectiveness of LMGEDDR and show that it outperforms the other benchmark models. However, the construction of the adjacency graphs and the selection of marginal samples can be influenced by noise, which is not completely avoided. In future work, the reliability of the neighborhood will be evaluated by introducing an adaptive adjacency factor as in [44].
Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work.