Abstract

Discriminant graph embedding-based dimensionality reduction methods have attracted increasing attention over the past few decades. These methods construct an intrinsic graph and a penalty graph to preserve the intrinsic geometric structure of intraclass samples and to separate interclass samples. However, marginal samples cannot be accurately characterized by the penalty graph alone, since it treats every sample equally. In practice, these marginal samples often dominate the classification performance and need to be treated specially. In this study, in addition to integrating the intrinsic graph and the penalty graph, the near neighbors' hypothesis margin of the marginal samples is further maximized to separate interclass samples and improve the discriminant ability. A novel discriminant dimensionality reduction method, termed large margin graph embedding-based discriminant dimensionality reduction (LMGE-DDR), is proposed. Experiments on several public datasets, namely ORL, Yale, UMIST, FERET, CMU-PIE09, and AR, verify the effectiveness of the proposed LMGE-DDR: it performs better than the compared methods, and its standard deviation is smaller, demonstrating both the accuracy and the stability of the proposed method.

1. Introduction

Dimensionality reduction (DR) plays an important role in fields such as machine learning and pattern recognition [1–4]. It aims to alleviate the curse of dimensionality by learning relevant low-dimensional representations of high-dimensional datasets. Principal component analysis (PCA) and linear discriminant analysis (LDA) are the most representative methods [5, 6]. PCA obtains a low-dimensional space by maximizing the variance of the projected data. LDA uses label information to learn a projection that separates categories by maximizing the interclass distance while minimizing the intraclass distance. However, LDA cannot capture the local structure of the data, and the local structures of high-dimensional data are known to be very important for data representation.

A k-nearest-neighbor graph can better characterize the local structure of data [7]. Thus, over the past years, graph embedding-based dimensionality reduction methods have flourished [7, 8], such as LLE [9], Isomap [10, 11], and Laplacian eigenmaps [12]. However, these manifold learning methods cannot directly handle new samples because they do not learn an explicit mapping function, which is known as the 'out-of-sample' problem [13]. Locality preserving projections (LPP), a well-known method, addresses this problem by learning an explicit projection that preserves the local structure of the data in the low-dimensional space [2]. Owing to its simplicity and effectiveness, many variants of LPP have been proposed [14, 15]. However, LPP is unsupervised and performs worse in classification since it does not exploit label information [16]. Neighborhood preserving projection (NPP) preserves the local neighborhood information of the data manifold [17].

To further improve classification performance, discriminant graph embedding-based methods, which exploit label information, have gradually become a popular research topic; they aim to preserve the within-class geometric structure while maximizing the between-class distances of different manifolds [18]. Thus, more and more discriminant graph embedding-based methods have recently been studied. Marginal Fisher analysis (MFA) constructs two adjacency graphs to maximize the separability between pairwise marginal data points [19]. Local discriminant embedding (LDE) [20] utilizes label information in a nearest neighbor-based embedding; however, it suffers from the so-called small-sample-size (SSS) problem and cannot be applied directly to high-dimensional data [20]. Considering local intraclass attraction and interclass repulsion, discriminant neighborhood embedding (DNE) was proposed to compact data points of the same class while widening the gaps between classes in the low-dimensional subspace [21]. However, DNE does not always create edges between a sample and its neighbors from different classes, which can reduce the interclass distance in the new space and deteriorate classification [22]. Thus, Ding et al. constructed double adjacency graphs that link each sample to both its homogeneous and heterogeneous neighbors and introduced a more effective version of DNE termed DAG-DNE [22]. Inspired by DAG-DNE, many discriminant analysis-based methods have been proposed over the past few years [23–33].

Most dimensionality reduction methods can be unified under the graph-embedding framework [19]; they differ in how the similarity graph and the penalty graph are constructed [34]. Graph embedding-based methods are therefore sensitive to the weight matrix, yet they assign weights to every sample, including the marginal samples, in the same way. However, as stated in [35], the samples located at the class margins in the high-dimensional space are the most crucial for classification performance and should be treated specially to achieve a maximal between-class hypothesis margin. Enforcing a large hypothesis margin between the near neighbors of these marginal samples can improve the discriminating power of the embedded features. In this study, in addition to constructing double adjacency graphs, the nearest neighbors' hypothesis margin of each marginal sample is maximized to improve the discriminant power, yielding a novel large margin graph embedding-based discriminant dimensionality reduction method named LMGE-DDR. Experimental results on several public datasets confirm the effectiveness of the proposed LMGE-DDR.

2. Methods

Firstly, the common notations used in this study are presented. The high-dimensional data are denoted as $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$, with $N$ samples in $d$ dimensions drawn from $c$ classes, where $l(x_i) \in \{1, \ldots, c\}$ denotes the class label of $x_i$. $y_i = P^{T} x_i$ denotes the sample $x_i$ transformed by the projection matrix $P = [p_1, p_2, \ldots, p_r] \in \mathbb{R}^{d \times r}$, where $p_i$ is any one column vector. $N_w(x_i)$, $N_b(x_i)$, and $N^{k}(x_i)$, respectively, denote the neighbors of sample $x_i$ with the same class, the neighbors with a different class, and the $k$ nearest neighbors of $x_i$.

2.1. DNE

Discriminant neighborhood embedding (DNE) considers local intraclass attraction and interclass repulsion and learns the intrinsic graph and penalty graph as follows:

$$F^{w}_{ij} = \begin{cases} 1, & \big(x_i \in N^{k}(x_j) \ \text{or} \ x_j \in N^{k}(x_i)\big) \ \text{and} \ l(x_i) = l(x_j) \\ 0, & \text{otherwise} \end{cases} \qquad F^{b}_{ij} = \begin{cases} 1, & \big(x_i \in N^{k}(x_j) \ \text{or} \ x_j \in N^{k}(x_i)\big) \ \text{and} \ l(x_i) \neq l(x_j) \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

The objective function can be denoted as follows:

$$\max_{P}\ \Phi(P) = \sum_{i,j} \|P^{T}x_i - P^{T}x_j\|^2 F^{b}_{ij} - \sum_{i,j} \|P^{T}x_i - P^{T}x_j\|^2 F^{w}_{ij}, \quad \text{s.t.}\ P^{T}P = I \tag{2}$$

Herein,

$$\sum_{i,j} \|P^{T}x_i - P^{T}x_j\|^2 F^{w}_{ij} = 2\,\mathrm{tr}\left(P^{T} X L^{w} X^{T} P\right), \quad L^{w} = D^{w} - F^{w} \tag{3}$$

where $D^{w}$ is diagonal with $D^{w}_{ii} = \sum_j F^{w}_{ij}$;

$$\sum_{i,j} \|P^{T}x_i - P^{T}x_j\|^2 F^{b}_{ij} = 2\,\mathrm{tr}\left(P^{T} X L^{b} X^{T} P\right), \quad L^{b} = D^{b} - F^{b} \tag{4}$$

where $D^{b}$ is diagonal with $D^{b}_{ii} = \sum_j F^{b}_{ij}$.

The constraint $P^{T}P = I$ can preserve the local structure and reinforce the discriminant ability [36].

The objective in (2) can be rewritten in trace form as follows:

$$\Phi(P) = 2\,\mathrm{tr}\left(P^{T} X \left(L^{b} - L^{w}\right) X^{T} P\right) = 2\,\mathrm{tr}\left(P^{T} X L X^{T} P\right)$$

where $L = L^{b} - L^{w}$. Therefore, the objective function (2) can be rewritten as follows:

$$\max_{P}\ \mathrm{tr}\left(P^{T} X L X^{T} P\right), \quad \text{s.t.}\ P^{T}P = I \tag{5}$$

The projection matrix can be found by solving the following eigenvector problem:

$$X L X^{T} p_i = \lambda_i p_i$$

where $\lambda_i$ ($i = 1, \ldots, d$) are the eigenvalues and $p_i$ ($i = 1, \ldots, d$) are the corresponding eigenvectors. Assume $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$ and $P = [p_1, p_2, \ldots, p_r]$. The details are presented in [21].
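For concreteness, the following minimal NumPy sketch (all function and variable names are ours, not from [21]) builds the DNE adjacency graphs and solves the eigenproblem above; it assumes a data matrix X of shape (d, N) and integer labels y:

```python
import numpy as np

def dne_fit(X, y, k=5, r=2):
    """Minimal DNE sketch: build F^w / F^b over k nearest neighbors,
    form L = L^b - L^w, and take the top-r eigenvectors of X L X^T."""
    d, N = X.shape
    sq = np.sum(X**2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2 * X.T @ X   # pairwise squared distances
    np.fill_diagonal(D2, np.inf)                   # exclude self-neighbors
    knn = np.argsort(D2, axis=1)[:, :k]            # k nearest neighbors per sample

    Fw = np.zeros((N, N)); Fb = np.zeros((N, N))
    for i in range(N):
        for j in knn[i]:
            if y[i] == y[j]:
                Fw[i, j] = Fw[j, i] = 1            # homogeneous neighbor edge
            else:
                Fb[i, j] = Fb[j, i] = 1            # heterogeneous neighbor edge

    Lw = np.diag(Fw.sum(1)) - Fw                   # graph Laplacians
    Lb = np.diag(Fb.sum(1)) - Fb
    M = X @ (Lb - Lw) @ X.T                        # X L X^T with L = L^b - L^w
    M = (M + M.T) / 2                              # symmetrize for numerical safety
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argsort(vals)[::-1][:r]]     # eigenvectors of the r largest eigenvalues
```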

2.2. DAG-DNE

Double adjacency graph-based discriminant neighborhood embedding, termed DAG-DNE, constructs double adjacency graphs to obtain a more effective version of DNE. In DAG-DNE, $F^{w}$ and $F^{b}$ can be defined as follows:

$$F^{w}_{ij} = \begin{cases} 1, & x_i \in N_w^{k_1}(x_j) \ \text{or} \ x_j \in N_w^{k_1}(x_i) \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

$$F^{b}_{ij} = \begin{cases} 1, & x_i \in N_b^{k_2}(x_j) \ \text{or} \ x_j \in N_b^{k_2}(x_i) \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

where $N_w^{k_1}(x_i)$ and $N_b^{k_2}(x_i)$ denote the $k_1$ nearest homogeneous and $k_2$ nearest heterogeneous neighbors of $x_i$, respectively.

The projection matrix can be solved as in DNE:

$$X\left(L^{b} - L^{w}\right)X^{T} p_i = \lambda_i p_i, \quad \text{s.t.}\ P^{T}P = I \tag{8}$$
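Continuing the sketch above, the double adjacency graphs differ from DNE only in that homogeneous and heterogeneous neighbors are searched separately, so every sample is guaranteed edges of both kinds. A hedged illustration (names are ours, not from [22]):

```python
import numpy as np

def dag_dne_graphs(X, y, k1=5, k2=5):
    """DAG-DNE-style graphs: k1 nearest same-class and k2 nearest
    different-class neighbors are linked for every sample."""
    d, N = X.shape
    sq = np.sum(X**2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2 * X.T @ X
    np.fill_diagonal(D2, np.inf)
    Fw = np.zeros((N, N)); Fb = np.zeros((N, N))
    for i in range(N):
        same = np.where(y == y[i])[0]; same = same[same != i]
        diff = np.where(y != y[i])[0]
        for j in same[np.argsort(D2[i, same])[:k1]]:
            Fw[i, j] = Fw[j, i] = 1                # homogeneous edges
        for j in diff[np.argsort(D2[i, diff])[:k2]]:
            Fb[i, j] = Fb[j, i] = 1                # heterogeneous edges
    return Fw, Fb
```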

3. Proposed Method

As discussed above, the weights in the adjacency matrices are assigned in the same way for every sample, including the marginal samples, which limits the between-class hypothesis margin and can deteriorate the classification performance. In this study, the marginal sample is defined in Definition 1, and the hypothesis margin is adopted as studied in [37–39].

Definition 1. (marginal sample). $x_i$ is regarded as a marginal sample if $x_j \in N^{k_0}(x_i)$ and $l(x_j) \neq l(x_i)$ for some $x_j$; that is, at least one of the $k_0$ nearest neighbors of $x_i$ belongs to a different class.
The marginal samples in this study are those located at the class margins. Figure 1 shows the near neighbors' graph and the marginal samples (i.e., {5, 6, 7, 8}).
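Under this reading of Definition 1, marginal samples can be located by checking whether any of the $k_0$ nearest neighbors carries a different label. A small sketch (names are ours):

```python
import numpy as np

def marginal_samples(X, y, k0=5):
    """Indices of marginal samples: points whose k0 nearest neighbors
    contain at least one differently labeled sample. X is (d, N), y is (N,)."""
    sq = np.sum(X**2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2 * X.T @ X   # pairwise squared distances
    np.fill_diagonal(D2, np.inf)
    knn = np.argsort(D2, axis=1)[:, :k0]
    return [i for i in range(X.shape[1]) if np.any(y[knn[i]] != y[i])]
```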

Definition 2. (hypothesis margin). As shown in [37], the hypothesis margin can be defined as follows:

$$H(x) = \frac{1}{2}\left( \|x - \mathrm{NM}(x)\|_2 - \|x - \mathrm{NH}(x)\|_2 \right)$$

where $\mathrm{NH}(x)$ and $\mathrm{NM}(x)$ denote the nearest neighbors of sample $x$ with the same class and with a different class, respectively, and $\|\cdot\|_2$ represents the L2 norm. The sample $x$ can be accurately recognized by the 1NN classifier (the nearest neighbor) when $H(x) > 0$, as illustrated in Figure 2.
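A direct transcription of this margin (helper names are ours; the sample index excludes the point from its own hit search):

```python
import numpy as np

def hypothesis_margin(i, X, y):
    """H(x_i) per Definition 2: half the gap between the nearest miss and
    the nearest hit of sample i. X is (d, N), y is (N,)."""
    dists = np.linalg.norm(X - X[:, i:i+1], axis=0)
    dists[i] = np.inf                 # exclude the sample itself
    hit = np.min(dists[y == y[i]])    # nearest same-class neighbor, NH(x_i)
    miss = np.min(dists[y != y[i]])   # nearest different-class neighbor, NM(x_i)
    return 0.5 * (miss - hit)
```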

Definition 3. (heterogeneous near neighbors' hypothesis margin). A marginal sample $x_i$ is shown in Figure 3 to illustrate its heterogeneous near neighbors' hypothesis margin, which is defined as follows:

$$H_b(x_i) = \sum_{x_j \in N_b^{k_0}(x_i)} \left( \|x_i - x_j\|_2 - \|x_i - \mathrm{NH}(x_i)\|_2 \right) \tag{11}$$

Herein, $x_j \in N_b^{k_0}(x_i)$ is a heterogeneous near neighbor of $x_i$, and $\mathrm{NH}(x_i)$ is its nearest hit.
As shown in (11), the heterogeneous samples are kept separated and a large margin between heterogeneous near neighbors is achieved when all the bracketed terms are larger than zero, which means $x_i$ can be correctly classified by the 1NN classifier.
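A sketch of this quantity under our reconstruction of (11); the $k_0$ nearest misses are summed against the distance to the nearest hit:

```python
import numpy as np

def heterogeneous_margin(i, X, y, k0=5):
    """Definition 3 under our reconstruction: sum over the k0 nearest misses
    of (distance to that miss minus distance to the nearest hit). All terms
    positive implies correct 1NN classification of x_i."""
    dists = np.linalg.norm(X - X[:, i:i+1], axis=0)
    dists[i] = np.inf
    hit = np.min(dists[y == y[i]])              # nearest hit NH(x_i)
    diff = np.where(y != y[i])[0]
    nm = diff[np.argsort(dists[diff])[:k0]]     # k0 nearest misses
    return float(np.sum(dists[nm] - hit))
```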

4. LMGE-DDR

On the basis of DAG-DNE, the marginal samples in the high-dimensional space are additionally treated by maximizing their heterogeneous near neighbors' margin, which improves the discriminant power. The intraclass weight $F^{w}$ and interclass weight $F^{b}$ are the same as in DAG-DNE, and the objective function of LMGE-DDR can be denoted as

$$\max_{P}\ J(P) = \sum_{i,j} \|P^{T}x_i - P^{T}x_j\|^2 F^{b}_{ij} - \sum_{i,j} \|P^{T}x_i - P^{T}x_j\|^2 F^{w}_{ij} + \alpha \sum_{x_i \in MS} \sum_{x_j \in N_b^{k_0}(x_i)} \left( \|P^{T}x_i - P^{T}x_j\|^2 - \|P^{T}x_i - P^{T}\mathrm{NH}(x_i)\|^2 \right), \quad \text{s.t.}\ P^{T}P = I \tag{12}$$

Here, MS denotes the set of marginal samples in the high-dimensional space, and $\alpha$ is a trade-off parameter.

Based on (5), this objective function is transformed into two parts as follows:

$$J_b(P) = 2\,\mathrm{tr}\left(P^{T} X \left(L^{b} + \alpha L^{mb}\right) X^{T} P\right) \tag{13}$$

$$J_w(P) = 2\,\mathrm{tr}\left(P^{T} X \left(L^{w} + \alpha L^{mw}\right) X^{T} P\right) \tag{14}$$

where $L^{mb}$ and $L^{mw}$ are the Laplacians of the graphs formed, respectively, by the heterogeneous near-neighbor edges and the nearest-hit edges of the marginal samples, so that $J(P) = J_b(P) - J_w(P) = 2\,\mathrm{tr}\left(P^{T} X L X^{T} P\right)$ with

$$L = L^{b} - L^{w} + \alpha\left(L^{mb} - L^{mw}\right) \tag{15}$$

The solution of (12) is then easily obtained by solving a maximum eigenvalue problem.

Here, $X L X^{T} p_i = \lambda_i p_i$, where $\lambda_i$ ($i = 1, \ldots, d$) are the eigenvalues and $p_i$ ($i = 1, \ldots, d$) are the corresponding eigenvectors. Assume $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$ and $P = [p_1, p_2, \ldots, p_r]$.

The details of LMGE-DDR can be seen in Algorithm 1.

Input: a training set $X = [x_1, \ldots, x_N]$ with labels $l(x_i)$, and the dimensionality of the discriminant subspace $r$.
Output: projection matrix $P$.
(1) Construct the intraclass adjacency graph $F^{w}$ by (6) and the interclass adjacency graph $F^{b}$ by (7).
(2) Compute the matrix $L$ based on (15).
(3) Eigendecompose the matrix $X L X^{T}$, where $X L X^{T} p_i = \lambda_i p_i$.
(4) Choose the eigenvectors corresponding to the $r$ largest eigenvalues: $P = [p_1, p_2, \ldots, p_r]$.
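Putting the pieces together, the following sketch follows Algorithm 1 under the reconstruction above; it reuses dag_dne_graphs and marginal_samples from the earlier sketches, and the default parameter values are illustrative only:

```python
import numpy as np

def lmge_ddr(X, y, k1=5, k2=5, k0=5, alpha=0.5, r=2):
    """End-to-end sketch of Algorithm 1 (our reading, not verbatim code):
    DAG-DNE graphs plus extra margin edges on the marginal samples."""
    Fw, Fb = dag_dne_graphs(X, y, k1, k2)          # step (1)
    d, N = X.shape
    sq = np.sum(X**2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2 * X.T @ X
    np.fill_diagonal(D2, np.inf)
    Fmb = np.zeros((N, N)); Fmw = np.zeros((N, N))
    for i in marginal_samples(X, y, k0):
        diff = np.where(y != y[i])[0]
        same = np.where(y == y[i])[0]; same = same[same != i]
        for j in diff[np.argsort(D2[i, diff])[:k0]]:
            Fmb[i, j] = Fmb[j, i] = 1              # heterogeneous near-neighbor edges
        nh = same[np.argmin(D2[i, same])]          # nearest hit of the marginal sample
        Fmw[i, nh] = Fmw[nh, i] = 1
    lap = lambda F: np.diag(F.sum(1)) - F
    # Step (2): L = L^b - L^w + alpha (L^mb - L^mw), as in (15).
    M = X @ (lap(Fb) - lap(Fw) + alpha * (lap(Fmb) - lap(Fmw))) @ X.T
    M = (M + M.T) / 2
    vals, vecs = np.linalg.eigh(M)                 # step (3)
    return vecs[:, np.argsort(vals)[::-1][:r]]     # step (4): top-r eigenvectors -> P
```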

5. Analysis of LMGE-DDR

In this section, LMGE-DDR is analyzed to illustrate its effectiveness in preserving the geometric and discriminant structures.

Although LMGE-DDR is similar to DAG-DNE in constructing the adjacency graphs, for the marginal samples in the high-dimensional space LMGE-DDR additionally maximizes the heterogeneous near neighbors' hypothesis margin. This achieves a large between-class margin in the low-dimensional subspace and preserves the local discriminant structure of the neighbors, thereby improving the discriminant power over DAG-DNE.

The performance of LMGE-DDR on a toy dataset is illustrated in Figure 4.

As shown in Figure 4(a), the nearest hit and nearest miss of the considered sample can be identified from its near neighbors' graph, and its hypothesis margin in the original space can then be computed from Definition 2.

Based on Definition 2, this sample will be misclassified because its hypothesis margin is less than zero.

The embedded results and hypothesis margins in the one-dimensional space are illustrated in Figures 4(b)–4(e). It can be seen that the hypothesis margin of this sample in the low-dimensional space is less than zero for MFA and DAG-DNE, whereas it is positive for MNMDP, DNE, and LMGE-DDR. In LMGE-DDR, the hypothesis margin of the sample (H = 0.39) is larger than that in DAG-DNE, which is useful for classification.

Overall, maximizing the heterogeneous near neighbors’ hypothesis margin of marginal samples can further improve the discriminant power in low-dimensional space.

6. Experiments

In this section, LMGE-DDR is systematically compared with several popular methods, namely DAG-DNE, DNE, MNMDP, and MFA, to verify its effectiveness. Specifically, the performance of LMGE-DDR is illustrated in face recognition and 2-dimensional visualization experiments. For each dataset, l randomly selected images from each person constitute the training data, and the remaining images are used for testing. The nearest neighbor parameters k, k1, and k2 used in constructing the adjacency graphs are set to l−1 for all methods, as in [40]. PCA is first applied to reduce the dimensionality while retaining nearly 98% of the image energy, and the 1NN classifier is then applied to perform the classification. The average result over 20 runs is reported as the classification result.
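The protocol can be sketched as follows (a hedged illustration with scikit-learn; evaluate, project_fn, and the split logic are our names and assumptions, not the authors' code):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def evaluate(X, y, project_fn, l=5, runs=20, seed=0):
    """Protocol sketch: l random training images per class, PCA retaining
    ~98% of the energy, a learned projection, then 1NN; averaged over runs.
    X is (N, d) row-major here; project_fn maps a (d', Ntr) matrix and its
    labels to a (d', r) projection (e.g., a wrapper around lmge_ddr)."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(runs):
        tr, te = [], []
        for c in np.unique(y):                      # per-class random split
            idx = rng.permutation(np.where(y == c)[0])
            tr += list(idx[:l]); te += list(idx[l:])
        pca = PCA(n_components=0.98).fit(X[tr])     # keep ~98% of the energy
        Xtr, Xte = pca.transform(X[tr]), pca.transform(X[te])
        P = project_fn(Xtr.T, y[tr])
        clf = KNeighborsClassifier(n_neighbors=1).fit(Xtr @ P, y[tr])
        accs.append(clf.score(Xte @ P, y[te]))
    return np.mean(accs), np.std(accs)
```

For example, evaluate(X, y, lambda Xtr, ytr: lmge_ddr(Xtr, ytr, r=30)) would reproduce the averaged-accuracy protocol for the LMGE-DDR sketch above.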

6.1. 2D Visualization

The Wine dataset is used for the 2D visualization shown in Figure 5 [41]. It can be seen clearly that the sample points in the low-dimensional space learned by LMGE-DDR are better separated than those learned by DAG-DNE.
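A comparable visualization can be sketched with scikit-learn's copy of the Wine data and the lmge_ddr sketch above (parameter values are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X = StandardScaler().fit_transform(wine.data).T      # (d, N), as the sketches expect
P = lmge_ddr(X, wine.target, k1=5, k2=5, k0=5, alpha=0.5, r=2)  # illustrative values
Y = P.T @ X                                          # 2D embedding
plt.scatter(Y[0], Y[1], c=wine.target)
plt.title("Wine embedded by the LMGE-DDR sketch")
plt.show()
```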

6.2. Face Recognition

LMGE-DDR is evaluated on the ORL (http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html), FERET [42], AR [43], Yale, UMIST (https://www.sheffield.ac.uk/eee/research/iel/research/face), and CMU-PIE09 face datasets, and it is systematically compared with several popular methods, namely MFA, MNMDP, DNE, and DAG-DNE.

6.2.1. Parameter Analysis

The sensitivity of the parameters k0 and α in LMGE-DDR is analyzed on several face datasets, with k, k1, and k2 set to l−1. Figure 6 presents the best recognition rates of LMGE-DDR for different values of k0 and α. The results in Figure 6 reveal that the recognition accuracy of LMGE-DDR fluctuates with these parameters. Overall, the best recognition accuracy is achieved when α and k0 are larger. The reason is that a large α makes the marginal samples cluster tightly toward the class center, and the larger k0 is, the more marginal samples there are. That is to say, the heterogeneous near neighbors' margin of more marginal samples can be maximized, achieving a large between-class margin, which is favorable for classification. Thus, the values of k0 and α for the different datasets are selected by cross-validation in the face recognition experiments.

6.2.2. Experimental Results

In this section, several experiments on public datasets, namely ORL, Yale, UMIST, FERET, CMU-PIE09, and AR, are conducted to verify the effectiveness of the proposed LMGE-DDR; example images are shown in Figure 7. Each image in ORL and Yale is first aligned and cropped to 32 × 32, each image in UMIST to 40 × 50, each image in FERET to 80 × 80, each image in CMU-PIE09 to 64 × 64, and each image in AR to 50 × 40. Tables 1–6 report the best recognition results on the different datasets, and Figure 8 shows the recognition results across different dimensions.

As shown in Figure 8 and Tables 1–6, in most experiments LMGE-DDR performs better than the compared methods, and the corresponding standard deviation of LMGE-DDR is smaller than that of the others.

6.2.3. Time Cost Analysis

In this section, the time cost of the different methods is evaluated on several datasets, including ORL, Yale, UMIST, and FERET. Table 7 reports the time of a single run with l = 5 and d = 20.

It can be concluded that LMGE-DDR is comparable with the other methods in time cost, although some methods are faster than others.

7. Conclusions and Future Works

In this study, we propose a novel graph embedding-based dimensionality reduction approach named LMGE-DDR, which is based on the heterogeneous near neighbors' hypothesis margin. Different from other discriminant learning methods, which learn two kinds of adjacency graphs while treating every sample equally, we additionally maximize the heterogeneous near neighbors' hypothesis margin of the marginal samples in the high-dimensional space to achieve a large between-class margin, which is crucial for classification. Experimental results on several public datasets, namely ORL, Yale, UMIST, FERET, CMU-PIE09, and AR, show that the proposed model outperforms the benchmark models. However, the construction of the adjacency graphs and the identification of the marginal samples can be influenced by noise, which is not completely avoided. In future work, the reliability of the neighborhood will be evaluated by introducing an adaptive adjacency factor, as in [44].

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.