Dimension Reduction Using Samples’ Inner Structure Based Graph for Face Recognition
Graph construction plays a vital role in improving the performance of graph-based dimension reduction (DR) algorithms. In this paper, we propose a novel graph construction method, and we call the resulting graph the samples’ inner structure based graph (SISG). Instead of determining the $k$-nearest neighbors of each sample by calculating the Euclidean distance between vectorized sample pairs, our method determines the neighbors of each sample from newly defined sample similarities, which are based on the samples’ inner structure information. The SISG not only reveals the inner structure information of the original sample matrix, but also avoids predefining the parameter $k$ used in the $k$-nearest neighbor method. In order to demonstrate the effectiveness of SISG, we apply it to an unsupervised DR algorithm, locality preserving projection (LPP). Experimental results on several benchmark face databases verify the feasibility and effectiveness of SISG.
1. Introduction
Dimensionality reduction (DR) [1–4] has been intensively used as an effective approach to analyze high-dimensional data, especially face images. In particular, graph-based DR has recently received increasing attention in the fields of pattern recognition and machine learning. It has been stated that most existing DR methods [5–11] actually fall into the graph embedding framework. In graph embedding algorithms, graph construction plays a vital role, because a graph is an effective tool for revealing the structure information hidden in the original data. It is therefore worthwhile to study graph construction [13–17] and to develop novel approaches that construct more reasonable graphs for graph-based DR methods. Jebara et al. presented the so-called $b$-matching graph, which is an alternative to the traditional $k$-nearest neighbor graph. The authors in [15, 17] focused on combining different graphs so that a better graph is given a heavier weight. However, we point out that the traditional graph construction method suffers from the following two issues.
(1) The $k$-nearest neighbors of each sample are based on the Euclidean distance between every two vectorized samples. The samples’ inner structure information is not taken into consideration by the traditional graph construction method, although such information can be utilized to construct better graphs for dimension reduction algorithms.
(2) The same neighbor parameter $k$ (or $\varepsilon$) [18, 19] has to be predefined for all samples before constructing graphs. This makes parameter selection difficult, and it is not reasonable to set the same parameter value for all samples.
To mitigate the shortcomings of the traditional graph construction method, in this paper we present a samples’ inner structure based graph construction method, and we call the resulting graph the samples’ inner structure based graph (SISG). In this new method we first use the column similarity to determine the nearest neighbors of each column of the sample matrices. Then we use the sample similarity, measured by the number of nearest neighbor columns between sample pairs, to determine the nearest neighbors of each sample. This strategy not only avoids the stiff criterion (predefining the same parameter for all samples) used in the traditional graphs but also utilizes every sample’s inner structure information. We summarize the favorable characteristics of SISG as follows.
(1) SISG preserves samples’ intrinsic features by using the inner structure information of sample matrices to construct the graph.
(2) SISG uses the newly defined sample similarities to calculate the neighbors of each sample. This strategy avoids predefining the neighbor parameter $k$ (or $\varepsilon$) required by traditional graph construction methods.
(3) The edge weights of SISG are determined by the sample similarities between sample pairs. If the sample similarity between two samples is high, the edge weight between these two samples will be large. This means the greater the sample similarity between two samples is, the more important the corresponding edge is in the graph.
(4) Both the weighted adjacency matrix and the adjacency matrix of SISG are generally asymmetric. This characteristic may be more reasonable for capturing the relationship among samples.
(5) The construction method of SISG is very general. It can be applied to many graph-based DR algorithms.
The rest of this paper is organized as follows. Section 2 briefly reviews traditional graph construction and locality preserving projection (LPP). Section 3 first presents SISG’s construction method and then applies SISG to LPP. In Section 4 we perform a series of experiments to evaluate the feasibility and effectiveness of SISG. This is followed by the conclusions in Section 5.
2. Related Work
2.1. Traditional Graph Construction
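Let $\{A_1, A_2, \dots, A_N\}$ be a set of $N$ sample matrices, which are taken from an $m \times n$-dimensional image space. The original samples are then transformed into their vectorial forms, and we denote these vectors by $x_i \in \mathbb{R}^{mn}$, $i = 1, 2, \dots, N$. The weighted graph can be denoted by $G = \{X, E, W\}$, where $X$ corresponds to the vectors in the set $\{x_1, \dots, x_N\}$, $E$ is the set of edges, each of which is between one sample pair, and the matrix $W$ contains the weight values of the edges among sample pairs. The construction process of the graph normally consists of two steps.
The first step is the determination of the edges, for which one of the following two criteria is used.
$\varepsilon$-Neighborhood: $x_i$ and $x_j$ are connected if $\|x_i - x_j\|^2 < \varepsilon$, where $\|\cdot\|$ is the Euclidean distance in $\mathbb{R}^{mn}$ and $\varepsilon$ is a local threshold parameter.
$k$-Nearest neighbor: $x_i$ and $x_j$ are connected if $x_i$ is one of the $k$-nearest neighbors of $x_j$ or $x_j$ is one of the $k$-nearest neighbors of $x_i$.
The second step is the calculation of the weight value for each edge. The weight value $W_{ij}$ of the edge between $x_i$ and $x_j$ can be calculated in the following two ways:
(1) heat kernel:
$$W_{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^2 / t\right), & \text{if } x_i \text{ and } x_j \text{ are connected}, \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$
where $t$ is the width parameter in the heat kernel;
(2) simple minded:
$$W_{ij} = \begin{cases} 1, & \text{if } x_i \text{ and } x_j \text{ are connected}, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$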
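The two steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `sq_dist` and `knn_heat_graph` are hypothetical, samples are assumed to be given as lists of floats, and the "or" rule of the $k$-nearest neighbor criterion is combined with heat kernel weights.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def knn_heat_graph(X, k, t):
    """Build a k-nearest-neighbor graph with heat-kernel weights.

    X is a list of vectors, k the neighbor parameter, t the kernel width.
    Returns the (symmetric) weight matrix W as a list of lists.
    """
    n = len(X)
    # Step 1: the k nearest neighbors of every sample (excluding itself).
    nbrs = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sq_dist(X[i], X[j]))
        nbrs.append(set(order[:k]))
    # Step 2: heat-kernel weight on every edge; an edge exists if either
    # sample is among the k nearest neighbors of the other.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in nbrs[i] or i in nbrs[j]):
                W[i][j] = math.exp(-sq_dist(X[i], X[j]) / t)
    return W
```

Because of the "or" rule the resulting matrix is always symmetric, and the same value of `k` is imposed on every sample, which is the stiff criterion discussed in Section 1.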
2.2. Locality Preserving Projection
The aim of locality preserving projection (LPP) is to map the sample set in the high-dimensional space into a low-dimensional one in which the local manifold structures of the high-dimensional space are preserved. This means that if the original points $x_i$ and $x_j$ in the high-dimensional space are neighbors, the corresponding points $y_i$ and $y_j$ in the projected low-dimensional space are also neighbors. Suppose the projection from the high-dimensional space to the low-dimensional one is $y_i = w^T x_i$, where $w$ is the projection vector; the objective function of LPP is given in
$$\min_{w} \sum_{i,j} (y_i - y_j)^2 W_{ij}. \tag{3}$$
In (3) the weighted adjacency matrix $W$ can be computed by either (1) or (2). The matrix $D$ is a diagonal one with $D_{ii} = \sum_j W_{ji}$, and $L = D - W$ is the Laplacian matrix. With $X = [x_1, \dots, x_N]$, the constraint $w^T X D X^T w = 1$ was added so that the arbitrary scaling in the embedding can be removed. The minimization problem in (3) thus becomes
$$\min_{w} \; w^T X L X^T w \quad \text{s.t.} \quad w^T X D X^T w = 1. \tag{4}$$
LPP can be solved by the generalized eigenvector approach:
$$X L X^T w = \lambda X D X^T w. \tag{5}$$
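The step from the pairwise objective to the matrix form used in the generalized eigenvector problem can be written out explicitly. The derivation below is a standard one and is given only as a reading aid; it assumes a symmetric weight matrix $W$, the data matrix $X = [x_1, \dots, x_N]$, $y_i = w^T x_i$, $D_{ii} = \sum_j W_{ij}$, and $L = D - W$.

```latex
\begin{align*}
\frac{1}{2}\sum_{i,j} (y_i - y_j)^2 W_{ij}
  &= \frac{1}{2}\sum_{i,j} \left( y_i^2 + y_j^2 - 2 y_i y_j \right) W_{ij} \\
  &= \sum_{i} y_i^2 D_{ii} - \sum_{i,j} y_i W_{ij} y_j \\
  &= w^{T} X (D - W) X^{T} w \;=\; w^{T} X L X^{T} w .
\end{align*}
```

The factor $\tfrac{1}{2}$ does not change the minimizer, so minimizing the pairwise sum is equivalent to minimizing $w^T X L X^T w$.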
3. Samples’ Inner Structure Based Graph (SISG)
As discussed in Section 1, the traditional graph construction method suffers from two main issues: the stiff criterion problem and the loss of inner structure information of samples. To overcome these limitations to some extent, in this section we first present a new approach to graph construction, and graphs constructed by this new approach are called samples’ inner structure based graph (SISG); then we incorporate SISG into LPP, which forms a new algorithm called SISG-LPP.
3.1. The Construction of SISG
Given a set of $N$ sample matrices $\{A_1, A_2, \dots, A_N\}$, which is taken from an $m \times n$-dimensional image space, we let $x_i \in \mathbb{R}^{mn}$, $i = 1, 2, \dots, N$, denote the vector pattern of the image set, and let $c$ ($c = 1, 2, \dots, n$, where $n$ is the maximum column number of the sample matrix) be the column number of each sample (Algorithm 1). $a_i^c$ denotes the $c$th column of sample matrix $A_i$. SISG can be denoted by $G = \{X, F, S, W\}$, where $X$ corresponds to the vectors in the set $\{x_1, \dots, x_N\}$, $F$ is the adjacency matrix of SISG, which denotes the edge set between the sample pairs, $S$ is the sample similarity matrix of SISG, which denotes the sample similarities between sample pairs, and $W$ is the weighted adjacency matrix, which denotes the weight values of the edges between sample pairs. There are two steps to build a SISG, as detailed below.
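Step 1. For the $c$th columns of all the samples, we calculate the nearest neighbors of each column.
Definition 1 (column similarity). Column similarity is calculated by the column similarity function $S(a_i^c, a_j^c) = \exp\left(-\|a_i^c - a_j^c\|^2 / t\right)$, where $a_i^c$ denotes the $c$th column of sample matrix $A_i$, $i, j = 1, 2, \dots, N$, and $t$ is the width parameter in the heat kernel. Let $H^c$ denote the adjacency matrix of all samples’ $c$th columns:
$$H^c_{ij} = \begin{cases} 1, & \text{if } S(a_i^c, a_j^c) > \dfrac{1}{N-1} \sum_{l \neq i} S(a_i^c, a_l^c), \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$
The meaning of (6) is as follows: for the $c$th column $a_i^c$ of the sample matrix $A_i$ ($i = 1, 2, \dots, N$), if the column similarity between this column and $a_j^c$ is greater than the mean of the column similarities between $a_i^c$ and all other samples’ $c$th columns, $a_j^c$ will become a neighbor of $a_i^c$, and we place an edge between $a_i^c$ and $a_j^c$; that is, $H^c_{ij} = 1$.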
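The mean-threshold rule for one column position can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: `column_similarity` and `column_adjacency` are hypothetical names, each column is a plain list of floats, and the mean is taken over the other $N - 1$ samples as described above.

```python
import math

def column_similarity(u, v, t):
    """Heat-kernel similarity exp(-||u - v||^2 / t) between two columns."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / t)

def column_adjacency(cols, t):
    """cols[i] is the column of sample i at one fixed column position.

    Returns H with H[i][j] = 1 when the column of sample j is a column
    neighbor of the column of sample i, that is, when their similarity
    exceeds the mean similarity of column i to all other columns.
    """
    n = len(cols)
    sim = [[column_similarity(cols[i], cols[j], t) for j in range(n)]
           for i in range(n)]
    H = [[0] * n for _ in range(n)]
    for i in range(n):
        # Mean over the other samples only; the self-similarity is excluded.
        mean_i = sum(sim[i][l] for l in range(n) if l != i) / (n - 1)
        for j in range(n):
            if j != i and sim[i][j] > mean_i:
                H[i][j] = 1
    return H
```

Note that no neighbor count is fixed in advance: each column gets as many neighbors as exceed its own mean similarity.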
Figure 1 shows the 3 nearest column neighbors of the column $a_i^c$, and the black rectangular boxes in the sample matrices represent the $c$th column of each sample.
Step 2. Determine every sample’s neighbors by the sample similarity between sample pairs, and calculate the weight value for each neighbor pair.
In this step, the original samples are transformed into their vector-form representation, and we denote these vectors by $x_i$, $i = 1, 2, \dots, N$.
Definition 2 (sample similarity). Sample similarity is determined by the number of column neighbors between sample pairs.
We let $S_{ij}$ denote the number of column neighbors of sample $x_j$ for $x_i$. So, $S_{ij} = \sum_{c=1}^{n} H^c_{ij}$ ($n$ is the maximum column number of a sample matrix, and $H^c_{ij} = 1$ if the $c$th column of $A_j$ is a column neighbor of the $c$th column of $A_i$ and $0$ otherwise) means the number of column neighbors of sample matrix $A_j$ for sample matrix $A_i$. It is noted that $S_{ij}$ is normally not equal to $S_{ji}$, and this characteristic is shown in Section 3.2. Consequently, the sample similarity of sample $x_j$ for $x_i$ is described by $S_{ij}$.
When deciding which samples can be the nearest neighbors of sample $x_i$, we only consider those samples whose sample similarity with sample $x_i$ is nonzero. This is because if there are no nearest neighbor columns between two sample matrices, these two sample matrices will not be similar at all, and thus they cannot become neighbors. The weighted adjacency matrix $W$ of the SISG is constructed according to the following equation:
$$W_{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^2 / t\right) \cdot S_{ij}, & \text{if } S_{ij} > \dfrac{\|s_i\|_1}{\|s_i\|_0}, \\ 0, & \text{otherwise}, \end{cases} \tag{7}$$
where $s_i$ is a vector and $s_i = (S_{i1}, S_{i2}, \dots, S_{iN})$, $i = 1, 2, \dots, N$. $s_i$ denotes the sample similarity of sample $x_i$ for all other samples. $\|\cdot\|_0$ represents the $\ell_0$-norm, which is the number of nonzero entries in a vector. So $\|s_i\|_0$ denotes the number of nonzero elements in vector $s_i$. $\|\cdot\|_1$ represents the $\ell_1$-norm, which is the sum of the absolute values of all entries in a vector. So $\|s_i\|_1$ is the sum of all entries in vector $s_i$, since these entries are nonnegative.
The meaning of (7) is as follows: if the sample similarity between samples $x_i$ and $x_j$ is greater than the mean of the nonzero sample similarities between $x_i$ and all samples, then $x_j$ becomes a neighbor of $x_i$, and we put an edge between them; that is, $F_{ij} = 1$.
The weight value of the edge between $x_i$ and $x_j$ is the heat kernel multiplied by the sample similarity between samples $x_i$ and $x_j$. By doing this, the greater the sample similarity between samples $x_i$ and $x_j$, the more important the corresponding edge in $G$. The meaning of $W_{ij}$ is as follows: the weight value between $x_i$ and $x_j$ is proportional to their sample similarity and decreases as their Euclidean distance grows.
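Putting Step 1 and Step 2 together, the whole construction can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: `heat` and `build_sisg` are hypothetical names, each sample is a list of rows of floats, all samples share the same size, and the same width parameter `t` is used for both the column similarity and the edge weights (the text does not say whether they differ).

```python
import math

def heat(u, v, t):
    """Heat-kernel similarity exp(-||u - v||^2 / t) between two vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / t)

def build_sisg(samples, t):
    """samples[i] is an m x n matrix given as a list of m rows.

    Returns (S, F, W): sample similarity counts, adjacency matrix and
    weighted adjacency matrix of the graph.
    """
    N = len(samples)
    n_cols = len(samples[0][0])
    # Step 1: S[i][j] counts the column positions at which the column of
    # sample j is a column neighbor of the column of sample i.
    S = [[0] * N for _ in range(N)]
    for c in range(n_cols):
        cols = [[row[c] for row in A] for A in samples]
        for i in range(N):
            sims = [heat(cols[i], cols[j], t) for j in range(N)]
            mean_i = sum(sims[l] for l in range(N) if l != i) / (N - 1)
            for j in range(N):
                if j != i and sims[j] > mean_i:
                    S[i][j] += 1
    # Step 2: keep sample j as a neighbor of sample i when S[i][j] exceeds
    # the mean of the nonzero entries of row i (l1-norm over l0-norm).
    vecs = [[v for row in A for v in row] for A in samples]
    F = [[0] * N for _ in range(N)]
    W = [[0.0] * N for _ in range(N)]
    for i in range(N):
        nonzero = sum(1 for s in S[i] if s != 0)
        if nonzero == 0:
            continue  # sample i has no column neighbors at all
        mean_i = sum(S[i]) / nonzero
        for j in range(N):
            if S[i][j] > mean_i:
                F[i][j] = 1
                W[i][j] = heat(vecs[i], vecs[j], t) * S[i][j]
    return S, F, W
```

With the strict inequality, a sample whose row of `S` has a single nonzero entry gets no neighbor, because that entry equals the row mean. On real face images rows usually have many nonzero entries, so this edge case mainly matters for toy inputs.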
3.2. Characteristic of SISG
Both the weighted adjacency matrix and adjacency matrix are asymmetric in most situations, and the symmetric ones are only special cases. This characteristic is more reasonable for effectively capturing and fitting the relationship among samples.
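For the $c$th column $a_i^c$ of sample matrix $A_i$, the mean of the column similarities between this column and all other samples’ $c$th columns is calculated as follows:
$$M_i^c = \frac{1}{N-1} \sum_{l \neq i} S(a_i^c, a_l^c). \tag{8}$$
For the $c$th column $a_j^c$ of sample matrix $A_j$, the mean of the column similarities between this column and all other samples’ $c$th columns is calculated as follows:
$$M_j^c = \frac{1}{N-1} \sum_{l \neq j} S(a_j^c, a_l^c). \tag{9}$$
(1) The following situations may arise when calculating the column similarity.
(1.1) If $S(a_i^c, a_j^c) > M_i^c$ and $S(a_i^c, a_j^c) > M_j^c$ (10), $a_i^c$ and $a_j^c$ become column neighbors of each other.
(1.2) If $S(a_i^c, a_j^c) \le M_i^c$ and $S(a_i^c, a_j^c) \le M_j^c$ (11), $a_i^c$ is not a column neighbor of $a_j^c$, and vice versa.
(1.3) If $M_i^c < S(a_i^c, a_j^c) \le M_j^c$ (12), $a_j^c$ is a column neighbor of $a_i^c$ while $a_i^c$ is not a column neighbor of $a_j^c$.
(2) The following situations may arise when calculating the sample similarity, where $\bar{S}_i = \|s_i\|_1 / \|s_i\|_0$ denotes the mean of the nonzero sample similarities of sample $x_i$.
(2.1) If $S_{ij} > \bar{S}_i$ and $S_{ji} > \bar{S}_j$ (13), sample $x_i$ and sample $x_j$ become neighbors of each other.
(2.2) If $S_{ij} \le \bar{S}_i$ and $S_{ji} \le \bar{S}_j$ (14), sample $x_i$ is not a neighbor of sample $x_j$, and vice versa.
(2.3) If $S_{ij} > \bar{S}_i$ and $S_{ji} \le \bar{S}_j$ (15), sample $x_j$ is a neighbor of sample $x_i$ while sample $x_i$ is not a neighbor of sample $x_j$.
If all the columns of $A_i$ and all the columns of $A_j$ meet (1.1) or (1.2), $S_{ij}$ will equal $S_{ji}$. Furthermore, if (2.1) or (2.2) can always be met for arbitrary $i$ and $j$, $W_{ij} = W_{ji}$. This is because the edge weights of SISG depend on the sample similarity of each sample. In this case, both the weighted adjacency matrix and the adjacency matrix are symmetric.
If the condition in (1.3) is met, $S_{ij}$ will generally not be equal to $S_{ji}$. Furthermore, if (2.1) or (2.2) can always be met for arbitrary $i$ and $j$, the adjacency matrix will be symmetric, but $W_{ij}$ will not be equal to $W_{ji}$ and the weighted adjacency matrix will be asymmetric.
Apart from the two cases discussed in the above two paragraphs, both the weighted adjacency matrix and the adjacency matrix are normally asymmetric.
We should notice that the symmetric case requires every pair of corresponding columns of $A_i$ and $A_j$ to meet (1.1) or (1.2), but (1.1) can never be met alone for all pairs. Because $M_i^c$ is the mean of the column similarities of $a_i^c$, it is impossible for all the column similarities between $a_i^c$ and the other columns to be greater than $M_i^c$. For similar reasons, (1.2), (2.1), and (2.2) can never be met alone either.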
From what has been discussed above, we can see that both the weighted adjacency matrix and adjacency matrix are generally asymmetric, and symmetric ones are just their special cases, which are very rare situations.
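A small worked example, constructed here only for illustration, shows how the mean-threshold rule produces all three column situations at once. Assume three samples whose columns at one position are the scalars $a_1 = 0$, $a_2 = 1$, and $a_3 = 3$, a heat kernel width of $t = 1$, and write $M_i$ for the mean similarity of $a_i$ to the other two columns (this notation is an assumption of the example).

```latex
\begin{align*}
S(a_1,a_2) &= e^{-1} \approx 0.3679, &
S(a_2,a_3) &= e^{-4} \approx 0.0183, &
S(a_1,a_3) &= e^{-9} \approx 0.0001, \\
M_1 &= \tfrac{1}{2}\left(e^{-1}+e^{-9}\right) \approx 0.1840, &
M_2 &= \tfrac{1}{2}\left(e^{-1}+e^{-4}\right) \approx 0.1931, &
M_3 &= \tfrac{1}{2}\left(e^{-9}+e^{-4}\right) \approx 0.0092 .
\end{align*}
```

Here $S(a_1, a_2)$ exceeds both $M_1$ and $M_2$, so $a_1$ and $a_2$ are column neighbors of each other. $S(a_1, a_3)$ is below both $M_1$ and $M_3$, so neither is a neighbor of the other. $S(a_2, a_3)$ exceeds $M_3$ but not $M_2$, so $a_2$ is a column neighbor of $a_3$ while $a_3$ is not a column neighbor of $a_2$, which is exactly the asymmetric situation.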
3.3. SISG-LPP
The construction method of SISG is very general, so SISG can be used in many graph-based dimensionality reduction algorithms. In this subsection, we use SISG in a state-of-the-art unsupervised DR algorithm, locality preserving projection (LPP), to develop a new DR algorithm called SISG-LPP.
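Similar to LPP, the goal of SISG-LPP is to preserve the local manifold structures of the high-dimensional space. Given a set of samples $x_i \in \mathbb{R}^{mn}$, $i = 1, 2, \dots, N$, in the high-dimensional space, we try to find a transformation matrix which can map these points to a set of points $y_i$ in the low-dimensional space. Assuming that the projection is $y_i = w^T x_i$, where $w$ is the projection vector, and writing $X = [x_1, \dots, x_N]$, the objective function of SISG-LPP is given in
$$\min_{w} \sum_{i,j} (y_i - y_j)^2 W_{ij} = \min_{w} \; w^T X \left( D - (W + W^T) \right) X^T w = \min_{w} \; w^T X L X^T w. \tag{16}$$
In the above, $W$ is an asymmetric matrix, but $W + W^T$ is a symmetric matrix. Let $D^{(1)}$ and $D^{(2)}$ denote the diagonal matrices; the entries of $D^{(1)}$ are column sums of $W$, and the entries of $D^{(2)}$ are column sums of $W^T$. $D = D^{(1)} + D^{(2)}$ is the diagonal matrix whose entries are column sums of $W + W^T$. So, $L = D - (W + W^T)$ is the Laplacian matrix. Because $D_{ii}$ provides a measure on the “importance” of the data points, we impose the following constraint:
$$w^T X D X^T w = 1. \tag{17}$$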
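The reason the symmetrized matrix $W + W^T$ appears even though $W$ itself is asymmetric can be seen by expanding the pairwise objective. The derivation below is a reading aid under the following assumptions: $y_i = w^T x_i$, $y = (y_1, \dots, y_N)^T = X^T w$ with $X = [x_1, \dots, x_N]$, and $D$ is the diagonal matrix with $D_{ii} = \sum_j (W_{ij} + W_{ji})$.

```latex
\begin{align*}
\sum_{i,j} (y_i - y_j)^2 W_{ij}
  &= \sum_{i} y_i^2 \sum_{j} W_{ij} + \sum_{j} y_j^2 \sum_{i} W_{ij}
     - 2 \sum_{i,j} y_i W_{ij} y_j \\
  &= \sum_{i} y_i^2 \sum_{j} \left( W_{ij} + W_{ji} \right)
     - y^{T} \left( W + W^{T} \right) y \\
  &= y^{T} \left( D - (W + W^{T}) \right) y
   \;=\; w^{T} X \left( D - (W + W^{T}) \right) X^{T} w .
\end{align*}
```

The row sums and column sums of $W$ differ when $W$ is asymmetric, which is why both contribute to $D$.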
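Thus, the optimization objective of SISG-LPP is
$$\min_{w} \; w^T X L X^T w \quad \text{s.t.} \quad w^T X D X^T w = 1, \tag{18}$$
where $X = [x_1, \dots, x_N]$, $L = D - (W + W^T)$, and $D$ is the diagonal matrix of column sums of $W + W^T$.
The solutions of (18) can be obtained by solving the generalized eigenvalue decomposition problem $X L X^T w = \lambda X D X^T w$. That is to say, the projection vectors of (18) are the eigenvectors which correspond to the first $d$ smallest eigenvalues of this problem, where $d$ is the number of reduced dimensions.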
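A generalized eigenvalue problem of this kind can be solved numerically as sketched below with NumPy. This is a hedged sketch, not the authors' implementation: the function name `sisg_lpp`, the samples-as-columns layout of `X`, and the small ridge term `reg` (added so that the Cholesky factorization of the constraint matrix exists) are all assumptions of this example. In face recognition the dimension usually exceeds the number of samples, in which case a preprocessing step such as PCA is commonly applied first.

```python
import numpy as np

def sisg_lpp(X, W, d, reg=1e-8):
    """Solve X L X^T w = lambda X D X^T w for the d smallest eigenvalues.

    X: (features, N) data matrix with one sample per column.
    W: (N, N) nonnegative weight matrix, possibly asymmetric.
    Returns (P, vals): projection vectors as columns of P, and eigenvalues.
    """
    Ws = W + W.T                         # symmetrized weights
    D = np.diag(Ws.sum(axis=0))          # column sums of W + W^T
    L = D - Ws                           # Laplacian matrix
    A = X @ L @ X.T
    # reg is an assumption of this sketch: it keeps B positive definite.
    B = X @ D @ X.T + reg * np.eye(X.shape[0])
    C = np.linalg.cholesky(B)            # B = C C^T
    # Reduce to a standard symmetric problem M v = lambda v with
    # M = C^{-1} A C^{-T}.
    M = np.linalg.solve(C, np.linalg.solve(C, A).T).T
    M = (M + M.T) / 2                    # remove rounding asymmetry
    vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    P = np.linalg.solve(C.T, vecs[:, :d])  # map back: w = C^{-T} v
    return P, vals[:d]
```

A low-dimensional embedding of the samples is then obtained as `P.T @ X`.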
4. Experiments
In order to illustrate the construction process and the properties of SISG intuitively, we designed an experiment to show how the structure of SISG changes during the construction process. The experiment was also designed to show the differences between SISG and the $k$-nearest neighbor graph.
To investigate the influence of parameters on the classification performance of learning algorithms, we designed an experiment to show the sensitivity of LPP to the neighbor parameter $k$.
In order to test and evaluate the effectiveness of SISG and SISG-LPP, we conducted a series of face recognition experiments on three well-known databases.
4.1. Experiment for the Structure of SISG
In this experiment, we demonstrate the structure changes of SISG during the construction process. By comparing the structure of the $k$-nearest neighbor graph with that of SISG, we illustrate the differences between them. This experiment was conducted on the well-known ORL database. The ORL database contains 400 images from 40 different persons (ten for each person). All images are gray scale and have the same size. Figure 6(a) shows 10 sample face images for one person in ORL.
First, we design a dataset containing ten images selected from the ORL database. Six of these images belong to the same person (in our dataset, images 2, 3, 4, 6, 7, and 9) and the rest were selected at random. Then, we visualize the sample similarity matrix ($S$) of SISG, the adjacency matrix ($F$) of SISG, and the traditional $k$-nearest neighbor graph for the dataset, as shown in Figures 2, 3(a), and 3(b). Finally, we compare the adjacency matrix of SISG and the $k$-nearest neighbor graph.
In Figure 2, the numbers without parentheses display the sample similarity between sample pairs. Sample similarity is described by the number of column neighbors between sample pairs. The value in Row 2 and Column 6 of Figure 2 is 16, and the value in Row 6 and Column 2 is 19, which, respectively, means that 16 columns of Image 6 become column neighbors of Image 2 and 19 columns of Image 2 become column neighbors of Image 6. After calculating the sample similarities, we determine each sample’s nearest neighbors according to sample similarity. When determining the nearest neighbors of each sample, we do not consider sample pairs which are not similar to each other at all. For example, the value in Row 1 and Column 3 of Figure 2 is zero, which means Image 3 cannot become a neighbor of Image 1.
Black squares in Figures 3(a) and 3(b) indicate that the two samples they connect are neighbors. From Figures 3(a) and 3(b) we can see the following.
(1) The $k$-nearest neighbor graph is symmetric while the SISG is asymmetric. For example, we can observe from (a) image pairs in which one image is a neighbor of the other but not vice versa.
(2) SISG can more accurately reflect the relationship between samples. From (a) we can see that images which became neighbors generally belong to the same person. For example, the second and third rows of (a) show the neighbors of Image 2 and Image 3, respectively, and in our dataset these neighbors belong to the same person as Images 2 and 3, so this example shows that the SISG makes similar samples become neighbors.
(3) The $k$-nearest neighbor graph reflects the relationship between samples less successfully. Taking Image 2 as an example again, the second row of (b) shows that some of the neighbors assigned to Image 2 do not belong to the same person as Image 2. The third row of (b) likewise shows neighbors of Image 3 that do not belong to the same person.
4.2. Face Manifold Visualization
In this experiment, we compare the visualization effect of SISG-LPP, LPP, PCA, and NPE. We randomly selected 4 people from the ORL database and 10 samples from each person. Then we mapped all these samples to the 2-dimensional subspace using these algorithms. From Figure 4(a) we can see that SISG-LPP more effectively separates the 4-class samples in its 2-dimensional reduction subspace. In contrast, in the subspaces of LPP (Figure 4(b)) and NPE (Figure 4(c)), the samples are not very well separated, and more than half of the samples overlapped. From Figure 4(d) we can see that in the subspace of PCA, the samples are basically entangled together.
4.3. Parameter Sensitivity of LPP
Since SISG does not have the neighbor parameter $k$, in this experiment we only investigate the sensitivity of LPP to the neighbor parameter $k$. During this experiment, the set of images selected from the face databases was partitioned into different sample collections. Each division is identified by the number of images per person that were selected at random for training; the remaining images were employed for testing. We conducted this experiment on one such division of the ORL database. From Figure 5 we can see that LPP is very sensitive to the neighbor parameter $k$. In contrast, SISG-LPP does not have the neighbor parameter $k$, so its performance does not depend on the choice of $k$.
4.4. Face Recognition
To evaluate the proposed SISG and the SISG-LPP algorithm, we compare the performance of SISG-LPP with that of LPP on three face databases. Here, we adopt the benchmark face databases ORL, YALE, and a subset of CMU PIE to conduct experiments on face recognition. There are faces of 15 people in the YALE database, and each person has 11 face images. The CMU PIE database contains 41,368 images from 68 people, and PIE stands for Pose, Illumination, and Expression. In this research, we use the Illumination subset of the CMU PIE database to conduct our experiment; we select 16 different images per person from the Illumination subset. Figures 6(a), 6(b), and 6(c) show part of the face images in ORL, YALE, and the Illumination subset of PIE, respectively.
As described above, each division fixes the number of images per person that are randomly selected as the training data, and the remaining images are used for testing. For each division, 50 random splits are generated and the final performance of the algorithm being tested is obtained by averaging the 50 classification accuracy values. The neighbor parameter $k$ for LPP is held fixed across all splits.
First, for each person we select $p$ ($p$ = 5, 6) images for training from the ORL database and the YALE database, respectively. Four divisions are considered: two on the ORL database and two on the YALE database; 50 random splits are generated and the final results of the four divisions are obtained by taking the mean of the 50 classification accuracy values. The accuracy values versus the numbers of reduced dimensions are shown in Figure 7.
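The evaluation protocol, a random per-person split repeated 50 times with the accuracies averaged, can be sketched as follows. This is only an illustration: `split_per_person` and `mean_accuracy` are hypothetical helper names, and `evaluate` is a hypothetical user-supplied callback that trains on the training indices and returns the classification accuracy on the test indices.

```python
import random

def split_per_person(labels, p, rng):
    """Pick p random training images per person; the rest are for testing.

    labels[i] is the person identity of image i. Returns two index lists.
    """
    by_person = {}
    for idx, person in enumerate(labels):
        by_person.setdefault(person, []).append(idx)
    train, test = [], []
    for idxs in by_person.values():
        chosen = set(rng.sample(idxs, p))
        train += [i for i in idxs if i in chosen]
        test += [i for i in idxs if i not in chosen]
    return train, test

def mean_accuracy(labels, p, evaluate, n_splits=50, seed=0):
    """Average evaluate(train, test) over n_splits random splits."""
    rng = random.Random(seed)
    accs = [evaluate(*split_per_person(labels, p, rng))
            for _ in range(n_splits)]
    return sum(accs) / len(accs)
```

Fixing the seed makes the 50 splits reproducible, so that two algorithms can be compared on exactly the same partitions.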
From Figures 7(a) and 7(b) we can see that SISG-LPP outperforms LPP for all divisions, and from Figure 7(b) we can observe that SISG-LPP significantly outperforms LPP when the number of the reduced dimensions is relatively low. From Figures 7(c) and 7(d) we can find that SISG-LPP outperforms LPP for most of the situations.
From the results shown in Tables 1 to 3, one can find the following.
(1) The overall accuracy values of all algorithms improve to varying degrees when the number of training samples is increased.
(2) From the results shown in Tables 1 and 2, one can see that the recognition accuracy values of SISG-LPP are much higher than those of LPP when the number of training samples is relatively small. For instance, on the ORL database the accuracy gain of SISG-LPP over LPP is larger on the division with fewer training images than on the division with more training images.
(3) SISG-LPP outperforms LPP in all divisions on the ORL and YALE databases. SISG-LPP outperforms LPP for most of the divisions on the Illumination subset of the PIE database.
5. Conclusions
In this paper, we present a new graph construction method, and we call the graph constructed by this method the samples’ inner structure based graph (SISG). Unlike the traditional graph construction method, SISG avoids predefining the neighbor parameter $k$ (or $\varepsilon$). Moreover, SISG preserves the intrinsic features of samples by using the samples’ inner structure information to construct the graph. Both the weighted adjacency matrix and the adjacency matrix of SISG are generally asymmetric, which may be more reasonable for capturing the relationships among samples. To illustrate that the construction method of SISG is very general, we incorporated it into a state-of-the-art DR algorithm, locality preserving projection (LPP), and thus developed a novel DR algorithm, SISG-LPP. Finally, several experiments were conducted on three well-known face databases. The experimental results verified the effectiveness and feasibility of the SISG and SISG-LPP algorithms.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research is supported by (1) the Ph.D. Programs Foundation of Ministry of Education of China under Grant (no. 20120061110045) and (2) the Natural Science Foundation of Jilin Province of China under Grant (no. 201115022).
References
D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” in Advances in Neural Information Processing Systems, vol. 16, pp. 321–328, The MIT Press, 2004.
M. Turk and A. P. Pentland, “Face recognition using eigenfaces,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 586–591, June 1991.
H. Yu and J. Yang, “A direct LDA algorithm for high-dimensional data with application to face recognition,” Pattern Recognition, vol. 34, pp. 2067–2070, 2001.
J. B. Tenenbaum, “Mapping a manifold of perceptual observations,” Advances in Neural Information Processing Systems, pp. 682–688, 1998.
M. Maier, U. von Luxburg, and M. Hein, “Influence of graph construction on graph-based clustering measures,” in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS '08), pp. 1025–1032, December 2008.
T. Jebara, J. Wang, and S. F. Chang, “Graph construction and b-matching for semi-supervised learning,” in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 441–448, ACM, Montreal, Canada, June 2009.
S. I. Daitch, J. A. Kelner, and D. A. Spielman, “Fitting a graph to vector data,” in Proceedings of the 26th International Conference on Machine Learning (ICML '09), pp. 201–208, Montreal, Canada, June 2009.
A. Argyriou, M. Herbster, and M. Pontil, “Combining graph Laplacians for semi-supervised learning,” in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS '05), pp. 67–74, December 2005.
M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,” in Proceedings of the Neural Information Processing Systems (NIPS '01), vol. 14, pp. 585–591, 2001.
X. F. He and P. Niyogi, “Locality preserving projections,” in Proceedings of the Neural Information Processing Systems (NIPS '04), vol. 6, pp. 1059–1071, 2004.
F. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, no. 92, American Mathematical Society, 1997.
G. H. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, Md, USA, 3rd edition, 1996.
ORL database: Cambridge University Computer Laboratory, 2002, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
D. Cai, X. He, and J. Han, “SRDA: an efficient algorithm for large scale discriminant analysis,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 1, pp. 1–12, 2007.