An Improved EMD-Based Dissimilarity Metric for Unsupervised Linear Subspace Learning

Yu, Xiangchun; Yu, Zhezhou; Pang, Wei; Li, Minghao; Wu, Lei

doi:https://doi.org/10.1155/2018/8917393

Complexity

Research Article | Open Access

Volume 2018 | Article ID 8917393 | https://doi.org/10.1155/2018/8917393

Xiangchun Yu,¹ Zhezhou Yu,¹ Wei Pang,² Minghao Li,³ and Lei Wu¹

Academic Editor: Danilo Comminiello

Received 04 Jul 2017

Revised 20 Nov 2017

Accepted 04 Dec 2017

Published 18 Feb 2018

Abstract

We investigate a novel way of robust face image feature extraction by adopting the methods based on Unsupervised Linear Subspace Learning to extract a small number of good features. Firstly, the face image is divided into blocks with the specified size, and then we propose and extract pooled Histogram of Oriented Gradient (pHOG) over each block. Secondly, an improved Earth Mover’s Distance (EMD) metric is adopted to measure the dissimilarity between blocks of one face image and the corresponding blocks from the rest of face images. Thirdly, considering the limitations of the original Locality Preserving Projections (LPP), we proposed the Block Structure LPP (BSLPP), which effectively preserves the structural information of face images. Finally, an adjacency graph is constructed and a small number of good features of a face image are obtained by methods based on Unsupervised Linear Subspace Learning. A series of experiments have been conducted on several well-known face databases to evaluate the effectiveness of the proposed algorithm. In addition, we construct the noise, geometric distortion, slight translation, slight rotation AR, and Extended Yale B face databases, and we verify the robustness of the proposed algorithm when faced with a certain degree of these disturbances.

1. Introduction

Although many sophisticated algorithms have been proposed, face recognition is still a challenging problem affected by many external factors such as occlusion, illumination, noise, geometric distortion, translation, and rotation of face images. Recently, face recognition algorithms based on deep learning have achieved good performance [1–6]. The stacked autoencoder (SAE) [7] is an unsupervised neural network approach in which the input and target values are the same. In an SAE, the deepest hidden layer carries the features we are interested in. The input layer and the deepest hidden layer are connected by multiple encoding layers, and the deepest hidden layer and the output layer are connected by multiple decoding layers. The activation values of the deepest hidden layer nodes are essentially the deep representation features, which are fed to a classifier such as Softmax to perform classification. To obtain more robust features, random noise can be added to the input layer of the SAE; this method is called the Stacked Denoising Autoencoder (SDAE) [8]. In practice, the values of the input layer nodes are set to 0 with a certain probability, which yields more robust features. However, SAE and SDAE both use full connections between the input layer (or a hidden layer) and the next hidden layer, so a large number of parameters must be learned during training. Taking SDAE as an example, with 100 hidden nodes the number of network weights and biases grows in proportion to the number of image pixels.

To overcome the limitations of the fully connected network, Convolutional Neural Networks (CNN) [9, 10] were proposed, in which local connections effectively reduce the computational complexity of model training. Another important advantage of a locally connected network is that it extracts local information in the input space, which is consistent with the mechanism of the visual center. The LeNet-5 [10] model is one of the most classic CNN models; its convolution kernels are learned during model training by back propagation. Convolution kernels can also be learned in an unsupervised way: stacked autoencoders [7, 8] are used to extract kernels of the corresponding size. In UFLDL_CNN [11], convolution kernels are learned by an unsupervised SAE [7, 8] and used to perform the convolution operations at the convolutional layer. However, in addition to learning many network weights and biases, these methods require choosing several hyperparameters, for example, the sparsity penalty coefficient (as in the autoencoder algorithm [12, 13]) and the weight penalty coefficient (as in the regularization of deep neural networks) [14]. Furthermore, these parameters need to be selected by cross validation, so these algorithms carry considerable complexity and a high computational cost [14].

On the other hand, algorithms based on keypoints, such as SIFT [15] and SURF [16], have demonstrated good performance in face recognition, and they are robust to scale changes, rotation, illumination, and other disturbances [15–17]. Their disadvantage is that they may require a large number of keypoints, and the number of queries between keypoints is even larger. This makes it difficult to perform face recognition or retrieval effectively on large-scale face image datasets.
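To make the parameter-count argument above concrete, the following minimal sketch compares a fully connected autoencoder with one hidden layer against a single convolutional layer. The image size (32 × 32), kernel size (5 × 5), and kernel count are hypothetical values chosen only for illustration; they are not the settings used in this paper.

```python
def autoencoder_param_count(num_inputs, num_hidden):
    """Weights and biases of a one-hidden-layer, fully connected autoencoder."""
    encoder = num_inputs * num_hidden + num_hidden   # encoding weights and biases
    decoder = num_hidden * num_inputs + num_inputs   # decoding weights and biases
    return encoder + decoder


def conv_layer_param_count(kernel_size, num_kernels, in_channels=1):
    """Weights and biases of one convolutional layer with square kernels."""
    return num_kernels * (kernel_size * kernel_size * in_channels + 1)


# Hypothetical sizes (assumptions, not taken from the paper).
fully_connected = autoencoder_param_count(num_inputs=32 * 32, num_hidden=100)
convolutional = conv_layer_param_count(kernel_size=5, num_kernels=100)
print(fully_connected, convolutional)
```

Under these assumed sizes the fully connected model needs roughly two orders of magnitude more parameters than the convolutional layer, which is the effect the local connections are meant to achieve.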

Linear Subspace Learning is a kind of linear projection method which assumes that the high-dimensional data are located in a low-dimensional manifold that is linearly or approximately linearly embedded in the ambient space [18–22]. Linear Subspace Learning is often used in pattern recognition and computer vision tasks. Iosifidis et al. [20] proposed the optimal class representation algorithm based on the linear discriminant analysis, which increased discrimination between classes. F. Liu and X. Liu [23] proposed the Locality Enhanced Spectral Embedding (LESE) and novel Spatially Smooth Spectral Regression (SSR) methods for face recognition. It not only constructed a good locality preserving mapping but also made full use of the spatial locality information of face image matrix. Tzimiropoulos et al. [21] proposed a subspace learning algorithm based on the image gradient orientations, which has shown good performance on appearance-based object recognition. Zhang et al. [24] proposed a Linear Subspace Learning method using the sparse coding to learn a dictionary, and the aim is to fully exploit different image components. Both unsupervised and supervised criteria are proposed in order to learn the corresponding subspace. Cai et al. [22] proposed a spatially smooth subspace approach for face recognition which took full account of spatial correlation of face image and used the Laplacian penalty to learn the corresponding spatially smooth subspace. In addition, some kernel based technologies [25–28] are also applied in face recognition, and they are mainly used to explore the nonlinear relationship between face images. When nonlinear information is contained in the dataset, the kernel based techniques will exhibit good properties. These Linear Subspace Learning methods mainly fall into two categories, supervised and unsupervised methods. For supervised methods, sample labels are used in model training, provided that these labels have been manually marked. 
Although sometimes some supervised methods can learn good subspaces, the disadvantage is that the samples need to be marked; so in reality, unsupervised methods are more commonly used. In this research, we focus on unsupervised methods and are committed to building a good subspace by unsupervised learning algorithms. For Linear Subspace Learning, the high-dimensional input data are mapped onto a low-dimensional space by linear projection to achieve dimensionality reduction; it is a common and critical processing module in pattern recognition. Preprocessing, feature selection, feature extraction, pooling operations, and so on are implicitly or explicitly attached to dimensionality reduction operations [29]. Furthermore, the discrimination process can be viewed as a dimensionality reduction operation where high-dimensional input data are mapped onto low-dimensional class data (binary vector consisting of 0 and 1) [29].

Raw data in reality are often high-dimensional. High-dimensional data on the one hand can increase the computational burden of recognition system; on the other hand, it brings in a negative impact (arising from noise or outlier) on robust recognition tasks with limited training sample sets [29]. Importantly, raw data are often unlabeled. Therefore, this research focuses on the algorithms based on Unsupervised Linear Subspace Learning. Different from deep learning algorithms, Unsupervised Linear Subspace Learning does not need the process of selecting complex hyperparameters. Different from the keypoint ones, grids of HOG [17] or grids of pHOG over each face image are extracted and all of them are collected to form the final descriptors.

However, the raw data and the “features” extracted from them such as grids of HOG or grids of pHOG are still high-dimensional, so it is necessary to further learn the linear subspace of them. More importantly, the data from subspace can be guaranteed to have the same dimension after completing Linear Subspace Learning, and it enables subsequent classifier training, such as Softmax or SVM. In this research, in order to best evaluate the performance of the subspace learning algorithm, we adopt the nearest neighbor (NN) to be the classifier.

One of the most important tasks of Linear Subspace Learning is to construct the adjacency graph, which describes the nearest neighbor relationship between samples. To calculate the dissimilarity, a sample is usually flattened into a row or column vector, which ignores the structural information of the sample. The $L_2$ metric (Euclidean distance) is then often used to measure the dissimilarity between any two samples.

Face images taken from cameras often suffer from noise, geometric distortions [30, 31] and sometimes complex geometric distortion can occur during shooting, storage, and transmission. The most common geometric distortion is radial distortion, which includes barrel distortion and pincushion distortion. Some examples of face images suffering from noise, geometric distortions, slight translation, and rotation changes are shown in Figure 1.

In this research, we consider the algorithms that are robust to a certain degree of noise, geometric distortions, slight translation, and slight rotation changes. We construct the noise, geometric distortion, slight translation, slight rotation AR, and Extended Yale B face databases, and we also verify the robustness of our proposed algorithms to a certain degree of these disturbances. See Section 4 for detailed information about these databases.

The main contributions of our proposed algorithm are as follows.

In order to reduce the computational complexity and enhance robustness, we propose and perform pooling operations at the “granularity” of a cell over each block. That is to say, we accumulate the histograms of all cells in the block to obtain one pHOG histogram for the block. Then, an improved EMD metric, instead of the $L_2$ metric, is adopted to compare any two pHOG histograms over corresponding blocks from two different face images. It can effectively deal with the quantization problem of rigid binning.

We attach great importance to the structural information of samples. In order to effectively preserve the structural information of the sample, each face image is divided into blocks with the specified size and we propose the Block Structure LPP (BSLPP) algorithm based on the improved EMD metric, which overcomes the limitation of the original LPP.

We construct the noise, geometric distortion, slight translation, slight rotation AR, and Extended Yale B face databases and verify the robustness of the algorithm against a certain degree of these disturbances.

The rest of this paper is organized as follows: we first review related work in Section 2. In Section 3, we present our improved EMD-based dissimilarity metric for Unsupervised Linear Subspace Learning. Experiments and results are reported in Section 4, followed by the conclusions in Section 5.

2. Related Work

Earth Mover’s Distance (EMD) [32] is a metric proposed for some vision problems, and it can measure the dissimilarity between two distributions. EMD has been successfully applied in image retrieval, and with EMD the quantitative measure of dissimilarity between any two samples is defined by the dissimilarity of two distributions, which correlates to human perception to some extent [32].

An intuitive explanation of EMD is as follows: given two distributions (normalized histograms), one is taken as “supply,” a mass of earth spread in space, and the other is regarded as “demand,” a collection of holes. The solution is the minimal work (cost) that must be done to fill the holes with earth [32]. The formula of EMD defined by Rubner et al. [32] is given as follows:

$$\mathrm{EMD}(P,Q) = \frac{\min_{F=\{f_{ij}\}} \sum_{i,j} f_{ij}\, d_{ij}}{\sum_{i,j} f_{ij}} \tag{1}$$

subject to

$$f_{ij} \ge 0, \qquad \sum_{j} f_{ij} \le P_i, \qquad \sum_{i} f_{ij} \le Q_j, \qquad \sum_{i,j} f_{ij} = \min\Big(\sum_{i} P_i, \sum_{j} Q_j\Big). \tag{2}$$

The variables involved in Formulas (1) and (2) are consistent with the ones in Formula (3). Compared with the $L_2$ metric histogram matching technique, EMD (the Cross-Bin Dissimilarity Measure, as shown in Figure 2(b)) can not only effectively deal with the quantization problem of rigid binning (the Bin-by-Bin Dissimilarity Measure, as shown in Figure 2(a)), but also demonstrate robustness to shape deformation.

Figure 2: (a) The Bin-by-Bin Dissimilarity Measure; (b) The Cross-Bin Dissimilarity Measure.

We explain the results of the dissimilarity measure in Figure 2 as follows. In Figure 2(a), a bin-by-bin metric such as $L_1$ or $L_2$ is adopted, so each bin of one histogram is compared only with the corresponding bin of the other histogram. In Figure 2(b), EMD is adopted (according to Formula (1)), so mass can be matched across neighboring bins. It is therefore not difficult to see that the “Cross-Bin Dissimilarity Measure” can effectively deal with the quantization problem of rigid binning and correlates with human perception.
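The contrast between the two measures can be illustrated with a small sketch. It assumes one-dimensional histograms of equal total mass and a ground distance of |i - j| between bins, in which case the minimal transport cost reduces to the sum of absolute differences of the cumulative histograms; for normalized histograms the total flow in Formula (1) is 1, so this cost equals the EMD. The function names and the toy histograms are illustrative assumptions.

```python
def l1_distance(p, q):
    """Bin-by-bin dissimilarity: only corresponding bins are compared."""
    return sum(abs(a - b) for a, b in zip(p, q))


def emd_1d(p, q):
    """Cross-bin dissimilarity for two 1-D histograms of equal total mass.

    With ground distance |i - j|, the minimal transport cost equals the
    sum of absolute differences of the cumulative histograms.
    """
    if len(p) != len(q) or abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms need equal length and equal total mass")
    cost, carried = 0.0, 0.0
    for a, b in zip(p, q):
        carried += a - b          # mass that still has to move past this bin
        cost += abs(carried)
    return cost


# Toy histograms: the same unit of mass, shifted by one or by three bins.
reference = [0.0, 1.0, 0.0, 0.0, 0.0]
shifted_by_one = [0.0, 0.0, 1.0, 0.0, 0.0]
shifted_by_three = [0.0, 0.0, 0.0, 0.0, 1.0]

print(l1_distance(reference, shifted_by_one), l1_distance(reference, shifted_by_three))
print(emd_1d(reference, shifted_by_one), emd_1d(reference, shifted_by_three))
```

The bin-by-bin measure cannot tell a small shift from a large one, whereas the cross-bin measure grows with the size of the shift.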

However, the EMD metric can only be used for normalized histograms. More importantly, it suffers from a high computational burden: the worst-case time complexity of the algorithm is exponential [33]. To avoid these limitations, Pele and Werman proposed an EMD variant [33]: an improved EMD-based dissimilarity measure with thresholded ground distance. It is a metric for nonnormalized histograms and shows robustness to quantization, shape deformation, and occlusion. Furthermore, it is a linear time algorithm with time complexity $O(N)$, where $N$ is the number of histogram bins [33]. Pele and Werman’s EMD variant is given as follows [33]:

$$\widehat{\mathrm{EMD}}_{\alpha}(P,Q) = \Big(\min_{F=\{f_{ij}\}} \sum_{i,j} f_{ij}\, d_{ij}\Big) + \Big|\sum_{i} P_i - \sum_{j} Q_j\Big| \, \alpha \max_{i,j} d_{ij}, \tag{3}$$

where $P$ and $Q$ are the two (nonnormalized) histograms and $F = \{f_{ij}\}$ is the flow, with each $f_{ij}$ denoting the amount of mass flowing from the $i$-th “supply” to the $j$-th “demand” under the constraints of Formula (2). $d_{ij}$ represents the thresholded ground distance, which is set to zero for corresponding bins, one for adjacent bins, and two for other bins including the extra mass in the histogram [33]. The thresholded ground distance is just the thresholded module metric; see [33] for detailed definitions. Parameter $\alpha$ in Formula (3) controls the value of the second term when the masses of $P$ and $Q$ are not equal.
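As a rough illustration of this variant, the sketch below evaluates the minimal transport cost with a generic min-cost-flow routine and then adds the mass-difference penalty. It is not the linear-time algorithm of [33]: it assumes integer-valued histograms, a non-circular thresholded ground distance min(|i - j|, 2), and a textbook successive-shortest-path solver, and all function names are hypothetical.

```python
def thresholded_distance(i, j, threshold=2):
    """Ground distance: 0 for the same bin, 1 for adjacent bins, else `threshold`."""
    return min(abs(i - j), threshold)


def min_cost_flow(num_nodes, edges, source, sink, amount):
    """Successive shortest paths (Bellman-Ford) on a small integer network."""
    graph = [[] for _ in range(num_nodes)]
    for u, v, capacity, cost in edges:
        graph[u].append([v, capacity, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    total_cost = 0
    while amount > 0:
        dist = [float("inf")] * num_nodes
        prev = [None] * num_nodes
        dist[source] = 0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == float("inf"):
                    continue
                for index, (v, capacity, cost, _rev) in enumerate(graph[u]):
                    if capacity > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, index)
                        changed = True
            if not changed:
                break
        if prev[sink] is None:
            raise ValueError("requested flow cannot be routed")
        push, v = amount, sink
        while v != source:                       # bottleneck along the path
            u, index = prev[v]
            push = min(push, graph[u][index][1])
            v = u
        v = sink
        while v != source:                       # update residual capacities
            u, index = prev[v]
            graph[u][index][1] -= push
            graph[v][graph[u][index][3]][1] += push
            v = u
        total_cost += push * dist[sink]
        amount -= push
    return total_cost


def emd_hat(p, q, alpha=1.0, threshold=2):
    """Transport cost plus mass penalty for integer, nonnormalized histograms."""
    n = len(p)
    source, sink = 0, 2 * n + 1
    edges = []
    for i in range(n):
        edges.append((source, 1 + i, p[i], 0))        # supply bins
        edges.append((1 + n + i, sink, q[i], 0))      # demand bins
        for j in range(n):
            edges.append((1 + i, 1 + n + j, p[i], thresholded_distance(i, j, threshold)))
    flow = min(sum(p), sum(q))
    transport = min_cost_flow(2 * n + 2, edges, source, sink, flow)
    max_distance = max(thresholded_distance(i, j, threshold)
                       for i in range(n) for j in range(n))
    return transport + abs(sum(p) - sum(q)) * alpha * max_distance
```

For example, `emd_hat([4, 0, 0, 0], [0, 2, 0, 0])` moves two units to an adjacent bin and then penalizes the two units of unmatched mass at the maximal ground distance.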

The improved EMD metric is a metric for nonnormalized histograms. So, in order to measure the dissimilarity between two images with the improved EMD algorithm, we first need histograms of the images; one option is the Histogram of Oriented Gradient (HOG). The HOG features [17] possess a certain degree of invariance to local geometric and photometric deformations, and the local shape of objects in an image can be characterized by capturing edge or gradient structure [17]. Dalal and Triggs [17] applied the HOG descriptors to human detection, where they performed much better than other feature sets. They also explored the influence of fine-scale gradients, orientation binning, spatial binning, local contrast normalization, and so on, and finally obtained HOG descriptors for robust visual object recognition. Zhu et al. [34] adopted a cascade of histograms of oriented gradients for fast human detection. They used the AdaBoost algorithm to select the best blocks and then built a rejector-based cascade, which is not only a near real-time human detection method but also performs well in terms of accuracy. Freeman and Roth [35] presented histograms of local orientation for hand gesture recognition. Newell and Griffin [36] extended HOG and proposed multiscale histogram of oriented gradient descriptors for robust character recognition. Monzo et al. [37] compared the face recognition algorithm HOG-EBGM with GABOR-EBGM; the experiments showed that HOG-EBGM was more robust to illumination and rotation of images. Déniz et al. [38] employed the HOG features for face recognition. They first normalized the face images and then acquired the HOG descriptors using a regular grid. They also implemented a fusion strategy to combine information from patches of different sizes.

The main process of extracting the HOG features is illustrated in Figure 3.
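A simplified sketch of the core HOG step, together with the pooling over cells described in the Introduction, is given below. It assumes unsigned orientations in [0°, 180°), magnitude-weighted voting without interpolation or contrast normalization, and centered differences with clamped borders; these simplifications and the function names are assumptions for illustration, not the exact pipeline of [17] or of this paper.

```python
import math


def cell_histograms(block, cell_size, num_bins=12):
    """Orientation histograms, one per cell, for a grayscale block (list of rows)."""
    rows, cols = len(block), len(block[0])
    hists = {}
    for r in range(rows):
        for c in range(cols):
            hist = hists.setdefault((r // cell_size, c // cell_size), [0.0] * num_bins)
            # Centered differences; indices are clamped at the block border.
            gx = block[r][min(c + 1, cols - 1)] - block[r][max(c - 1, 0)]
            gy = block[min(r + 1, rows - 1)][c] - block[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0   # unsigned orientation
            hist[int(angle / (180.0 / num_bins)) % num_bins] += magnitude
    return hists


def pooled_histogram(hists, num_bins=12):
    """Pooling: accumulate the histograms of all cells in the block."""
    pooled = [0.0] * num_bins
    for hist in hists.values():
        for k, value in enumerate(hist):
            pooled[k] += value
    return pooled


# Toy 4 x 4 block with a vertical edge, split into 2 x 2 cells.
block = [[0, 0, 10, 10] for _ in range(4)]
hists = cell_histograms(block, cell_size=2)
pooled = pooled_histogram(hists)
print(len(hists), pooled)
```

Pooling replaces the per-cell histograms of a block by a single histogram of the same length, so each block contributes one short vector to the later dissimilarity computation.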

The dimensionality of “features” is always high and contains redundant information (e.g., noise or outliers). Therefore, many features are not necessary and we aim to extract a small number of good features. Linear Subspace Learning [29, 39, 40] is one of the most powerful tools to perform dimensionality reduction. According to whether the labeled samples are used in training process, Linear Subspace Learning can be divided into three categories: the first category is Unsupervised Linear Subspace Learning [41], where no labeled samples are used; the second one is Semisupervised Linear Subspace Learning [42], where part of labeled samples are used; the last one is Supervised Linear Subspace Learning [41, 43] where all labeled samples are used.

The most typical unsupervised, semisupervised, and supervised algorithms in face recognition are Locality Preserving Projections (LPP) [18], Semisupervised Discriminant Analysis (SDA) [44], and Locality Sensitive Discriminant Analysis (LSDA) [45], respectively. However, the raw data are often unlabeled, so in this research we focus on the algorithms based on Unsupervised Linear Subspace Learning. Therefore, we adopt the typical Unsupervised Linear Subspace Learning methods LPP [18] to reduce the dimensionality of “features” of face images. More importantly, in order to make better use of the structural information of face images, we proposed a novel algorithm named Block Structure LPP (BSLPP). We also use BSLPP to reduce the dimensionality of “features” of face images.

The adjacency graph building method plays an important role in LPP and BSLPP. We adopt a dissimilarity metric based on the improved EMD metric rather than the $L_2$ metric (Euclidean metric) to conduct Unsupervised Linear Subspace Learning, where we expect to achieve a better recognition rate and better robustness to illumination, occlusion, noise, geometric distortion, and other disturbances.

3. Unsupervised Linear Subspace Learning Based on the Improved EMD

In this section, we describe our improved EMD-based dissimilarity metric for Unsupervised Linear Subspace Learning. First of all, we describe the Locality Preserving Projections (LPP) algorithm. Then, we elaborate our first algorithm (Algorithm 1): the improved EMD metric for LPP. Finally, we further introduce our second algorithm (Algorithm 2): the improved EMD metric for BSLPP.

Algorithm 1: The improved EMD metric for LPP (LPP_IEMD).
Input: the sample set $X$ with $n$ samples, parameter $\alpha$, block size parameter, number of pHOG bins, nearest neighbors parameter $k$
Output: adjacency graph $G$, weight matrix $W$, transformation matrix $A$, eigenvalues $\lambda$, and subspace $y$
While $i \le n$
  Extract the HOG histogram over each block of the face image $x_i$
  Carry out the pooling operation over each block to get the pHOG histogram
  Obtain the grids of pHOG vector for the face image $x_i$ and the grids of pHOG vectors for the rest of the face images $x_j$, $j \ne i$
  Compute the dissimilarity between $x_i$ and $x_j$ by Equations (8) and (9)
  Obtain the $k$ nearest neighbors of the face image $x_i$: $N_k(x_i)$
EndWhile
Build the adjacency graph $G$ and calculate the corresponding weight matrix $W$ by Equation (10)
Begin // compute the projection
  Get the diagonal matrix $D$
  Solve the generalized eigenvector problem of Equation (11) on the sample set $X$
  Get the eigenvectors $a_1, \ldots, a_d$ with respect to the eigenvalues $\lambda_1, \ldots, \lambda_d$
End // compute the projection
Obtain the transformation matrix $A$
Obtain the subspace $y$ for the sample set $X$ by Equation (13)
Perform face recognition by the classifier

Algorithm 2: The improved EMD metric for BSLPP (BSLPP_IEMD).
Input: the sample set $X$ with $n$ samples, parameter $\alpha$, sub-adjacency graph weight parameter, number of blocks $B$, number of pHOG bins, nearest neighbors parameter $k$
Output: adjacency graph $G$, weight matrix $W$, transformation matrix $A$, eigenvalues $\lambda$, and subspace $y$
While $b \le B$
  While $i \le n$
    Extract the HOG histogram over the $b$-th block of the face image $x_i$
    Carry out the pooling operation over the block to get the pHOG histogram
    Obtain the pHOG vector over the $b$-th block for the face image $x_i$ and the pHOG vectors over the $b$-th blocks for the rest of the face images $x_j$, $j \ne i$
    Compute the dissimilarity between the $b$-th blocks of $x_i$ and $x_j$ by Equations (3) and (2)
    Obtain the $k$ nearest neighbors for the $b$-th block of the face image $x_i$
  EndWhile
  Obtain the sub-adjacency graph $G^b$ and the corresponding weight matrix $W^b$
EndWhile
Merge these sub-adjacency graphs over blocks by Equation (15)
Build the adjacency graph $G$ and calculate the corresponding weight matrix $W$ by Equation (16)
Begin // compute the projection
  Get the diagonal matrix $D$
  Solve the generalized eigenvector problem of Equation (11) on the sample set $X$
  Get the eigenvectors $a_1, \ldots, a_d$ with respect to the eigenvalues $\lambda_1, \ldots, \lambda_d$
End // compute the projection
Obtain the transformation matrix $A$
Obtain the subspace $y$ for the sample set $X$ by Equation (13)
Perform face recognition by the classifier

3.1. Locality Preserving Projections (LPP)

Locality Preserving Projections are a linear dimensionality reduction method, which falls into the graph embedding framework [46–48]. The adjacency graph building method [18, 44–48] plays an important role in the performance of LPP. The detailed steps of LPP are as follows:

(a) Use the $k$-nearest neighborhoods to build the adjacency graph $G$: nodes $x_i$ and $x_j$ are connected if one of them is among the $k$ nearest neighbors of the other, in which case the edge value is set to 1; otherwise it is 0.

(b) Choose the weights. The two commonly used methods are the heat kernel and the simple-minded method [18]. We apply the K-nearest neighbor (KNN) rule to build the adjacency graph, which represents the local geometrical structure of the data manifold well. Let $N_k(x_i)$ be the set of the $k$ nearest neighbors of $x_i$. We choose the simple-minded weight, so the adjacency graph $G$ and the corresponding weight matrix $W$ are defined as

$$W_{ij} = \begin{cases} 1, & x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i), \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$

(c) Compute the projection. We solve the following generalized eigenvector problem to get the eigenvectors in accordance with the eigenvalues:

$$X L X^{T} a = \lambda X D X^{T} a, \tag{5}$$

where $X = [x_1, \ldots, x_n]$ and $n$ denotes the number of samples; $L = D - W$, and $D$ is a diagonal matrix whose entries are the row (or column) sums of the sparse symmetric weight matrix $W$ [18], that is,

$$D_{ii} = \sum_{j} W_{ij}. \tag{6}$$

The vectors $a_1, \ldots, a_d$ are the eigenvectors corresponding to the $d$ smallest eigenvalues $\lambda_1 \le \cdots \le \lambda_d$.

(d) LPP embedding: $A = [a_1, \ldots, a_d]$ is the transformation matrix, and the original samples can be embedded into the $d$-dimensional subspace through the following embedding:

$$x_i \rightarrow y_i = A^{T} x_i. \tag{7}$$
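Steps (a) to (c) above can be sketched in a few lines for the graph-construction part. The code builds the simple-minded 0/1 weight matrix from Euclidean $k$-nearest neighbors and derives the diagonal matrix and the Laplacian; the function names and toy samples are assumptions, and the generalized eigenvector problem itself is left to a numerical linear algebra library.

```python
import math


def knn_indices(samples, i, k):
    """Indices of the k nearest neighbors of samples[i] under the Euclidean metric."""
    others = [j for j in range(len(samples)) if j != i]
    others.sort(key=lambda j: math.dist(samples[i], samples[j]))
    return set(others[:k])


def lpp_graph_matrices(samples, k):
    """Return (W, D, L) for the k-nearest-neighbor graph with simple-minded weights."""
    n = len(samples)
    neighbors = [knn_indices(samples, i, k) for i in range(n)]
    # An edge exists if either sample is among the k nearest neighbors of the other.
    W = [[1 if (j in neighbors[i] or i in neighbors[j]) else 0 for j in range(n)]
         for i in range(n)]
    D = [[sum(W[i]) if i == j else 0 for j in range(n)] for i in range(n)]
    L = [[D[i][j] - W[i][j] for j in range(n)] for i in range(n)]
    return W, D, L


# Toy data: two well-separated pairs of points.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
W, D, L = lpp_graph_matrices(samples, k=1)
print(W)
```

Every row of the resulting Laplacian sums to zero and the weight matrix is symmetric, which is what the generalized eigenvector problem relies on.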

3.2. The Improved EMD Metric for LPP

The original Linear Subspace Learning method adopts the $L_2$ metric to calculate the dissimilarity between two samples. However, the dissimilarity between two nonnormalized histograms (such as HOG histograms) computed by the $L_2$ metric may suffer from the quantization problem of rigid binning, while the improved EMD metric yields a dissimilarity between two nonnormalized histograms that correlates with human perception and tolerates quantization, distortion, occlusion, and other disturbances. The improved EMD metric has been briefly introduced in Section 2. In order to preserve the structural information of samples, we divide each face image into blocks of the specified size and extract the Histogram of Oriented Gradient over each block. In order to reduce the computational complexity and improve robustness, we perform the pooling operations at the “granularity” of a cell over each block. The main process of extracting the pHOG features is illustrated in Figure 4. The detailed steps of our first algorithm (Algorithm 1) are given as follows:

(a) Calculate the dissimilarity [17, 33]. The improved EMD metric is a linear time histogram metric with a low computational cost. We use this metric, instead of the original EMD, to calculate the dissimilarity of the pHOG histograms (vectors) over blocks. The face image is divided into blocks and a pooled histogram of oriented gradients (pHOG) with 12 bins is obtained over each block. We compare any two pHOG histograms over corresponding blocks from two different face images by the improved EMD metric, and the sum of these dissimilarities is taken as the dissimilarity between the two face images. Then we use the dissimilarity measure to obtain the $k$-nearest neighbors of each face image to build the adjacency graph $G$. So the final dissimilarity measure is as follows:

$$\mathrm{Dis}(x_i, x_j) = \sum_{b=1}^{B} \Big[ \min_{F^b=\{f^b_{uv}\}} \sum_{u,v} f^b_{uv}\, d^b_{uv} + \Big|\sum_{u} P^b_u - \sum_{v} Q^b_v\Big| \, \alpha \max_{u,v} d^b_{uv} \Big] \tag{8}$$

subject to

$$f^b_{uv} \ge 0, \qquad \sum_{v} f^b_{uv} \le P^b_u, \qquad \sum_{u} f^b_{uv} \le Q^b_v, \qquad \sum_{u,v} f^b_{uv} = \min\Big(\sum_{u} P^b_u, \sum_{v} Q^b_v\Big). \tag{9}$$

In the above, $\mathrm{Dis}(x_i, x_j)$ denotes the dissimilarity of the face images $x_i$ and $x_j$, and $B$ is the number of blocks. The image $x_i$ is described by $P = \{P^1, \ldots, P^B\}$, where $P^b$ denotes the pHOG histogram over the $b$-th block; similarly, $x_j$ is described by $Q = \{Q^1, \ldots, Q^B\}$, where $Q^b$ denotes the pHOG histogram over the $b$-th block. $F^b$ denotes the flows of the pHOG histogram over the $b$-th block: the amount transported from the $u$-th bin (supply) to the $v$-th bin (demand) is represented by $f^b_{uv}$. $d^b_{uv}$ denotes the ground distance from the $u$-th bin to the $v$-th bin. According to Pele and Werman’s EMD variant [33], $\widehat{\mathrm{EMD}}_{\alpha}$ is a metric when $\alpha \ge 0.5$ and the ground distance is also a metric. $\alpha$ is usually set to 1, and we adopt the same value in this research. The detailed process of calculating the dissimilarity between any two face images of the AR face database is shown in Figure 5.

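(b) Choose the weights. We use the improved EMD metric (as described by (8) and (9)) to calculate the $k$-nearest neighbors. Let $N_k(x_i)$ be the set of the $k$ nearest neighbors of $x_i$ calculated by the improved EMD metric. The adjacency graph $G$ and the corresponding weight matrix $W$ are defined below:

$$W_{ij} = \begin{cases} 1, & x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i), \\ 0, & \text{otherwise}. \end{cases} \tag{10}$$

(c) Compute the projection. Solve the following generalized eigenvector problem to obtain the eigenvectors in accordance with the eigenvalues:

$$X L X^{T} a = \lambda X D X^{T} a, \tag{11}$$

where $X = [x_1, \ldots, x_n]$, $n$ denotes the number of training samples, $L = D - W$, and $D$ is a diagonal matrix whose entries are the row (or column) sums of $W$, as shown below:

$$D_{ii} = \sum_{j} W_{ij}. \tag{12}$$

The vectors $a_1, \ldots, a_d$ are the eigenvectors corresponding to the $d$ smallest eigenvalues $\lambda_1 \le \cdots \le \lambda_d$.

(d) LPP embedding: $A = [a_1, \ldots, a_d]$ is the transformation matrix, and the original samples can be embedded into a $d$-dimensional subspace through the following embedding:

$$x_i \rightarrow y_i = A^{T} x_i. \tag{13}$$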
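The image-level dissimilarity of step (a), a sum of block-wise distances between corresponding blocks, and the neighbor search that feeds step (b) can be sketched as follows. To keep the example short and self-contained, the block distance used here is a simple stand-in (the $L_1$ distance between cumulative histograms), not the improved EMD metric itself; any block-level metric can be passed in its place. All names and the toy data are illustrative assumptions.

```python
def cumulative_distance(p, q):
    """Stand-in cross-bin block distance: L1 distance between cumulative histograms."""
    total, carried = 0.0, 0.0
    for a, b in zip(p, q):
        carried += a - b
        total += abs(carried)
    return total


def image_dissimilarity(grid_a, grid_b, block_distance=cumulative_distance):
    """Sum of block-wise distances between corresponding blocks of two images."""
    if len(grid_a) != len(grid_b):
        raise ValueError("both images need the same number of blocks")
    return sum(block_distance(p, q) for p, q in zip(grid_a, grid_b))


def nearest_neighbors(grids, i, k, block_distance=cumulative_distance):
    """Indices of the k images closest to image i under the summed block distance."""
    others = [j for j in range(len(grids)) if j != i]
    others.sort(key=lambda j: image_dissimilarity(grids[i], grids[j], block_distance))
    return others[:k]


# Toy "grids of pHOG": three images, two blocks each, three bins per block.
grids = [
    [[4, 0, 0], [0, 4, 0]],
    [[0, 4, 0], [0, 4, 0]],
    [[0, 0, 4], [4, 0, 0]],
]
print(image_dissimilarity(grids[0], grids[1]), nearest_neighbors(grids, 0, k=1))
```

Because blocks are compared position by position, the spatial layout of the face enters the dissimilarity even though each block is summarized by a single histogram.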

3.3. The Improved EMD Metric for Block Structure LPP

In the original LPP algorithm, in order to calculate the dissimilarity, a sample is drawn into a row or column, and this ignores the structural information of the sample, which plays an important role in Linear Subspace Learning. In order to preserve the structural information of samples, we divided each face image into several blocks with the specified size. We proposed a novel algorithm named Block Structure LPP (BSLPP) based on the improved EMD metric. The main difference between Algorithms 1 and 2 is that the adjacency graph is constructed differently. So in this section, we only elaborate the detailed process of building the adjacency graph and other steps of Algorithm 2 are consistent with Algorithm 1.

The process of building affinity graph in our proposed algorithm includes three main steps.

We firstly calculate the dissimilarity between the block from one face image and corresponding blocks from the rest of face images with the improved EMD metric.

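Secondly, we get the $k$-nearest neighbors for the corresponding block and build the sub-adjacency graph over blocks, denoted by $G^b$ for the $b$-th block. Let $\{x_i^1, \ldots, x_i^B\}$ denote all the blocks of face image $x_i$, and let $N_k(x_i^b)$ be the set of the $k$-nearest neighbors of the block $x_i^b$ calculated by the improved EMD metric. The sub-adjacency graph $G^b$ and the corresponding weight matrix $W^b$ over the blocks of all face images are defined below.

$$W^b_{ij} = \begin{cases} 1, & x_i^b \in N_k(x_j^b) \text{ or } x_j^b \in N_k(x_i^b), \\ 0, & \text{otherwise}. \end{cases} \tag{14}$$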

Finally, we obtain the final adjacency graph by merging these sub-adjacency graphs over blocks. The merge function is as follows:

Among them, parameter denotes the weight of the sub-adjacency graph . In this research, we simply set this parameter to be . The final adjacency graph and the corresponding weight matrix are defined below.

When the final adjacency graph and the corresponding weight matrix are obtained, we can conduct the Block Structure LPP subspace learning. When features of face images are mapped onto a subspace, we will get the final “features” for each face image.
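The three steps of the graph construction can be sketched as follows. The per-block neighbor search follows the description above, but the merge rule is an assumption made for this example: the sub-adjacency weight matrices are combined by a weighted sum with equal weights, and any positive merged entry is treated as an edge of the final graph. The block distance is again a simple stand-in rather than the improved EMD metric, and all names are hypothetical.

```python
def cumulative_distance(p, q):
    """Stand-in cross-bin block distance: L1 distance between cumulative histograms."""
    total, carried = 0.0, 0.0
    for a, b in zip(p, q):
        carried += a - b
        total += abs(carried)
    return total


def sub_adjacency(grids, block, k, block_distance=cumulative_distance):
    """0/1 weight matrix of the k-NN graph built from one block position only."""
    n = len(grids)
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: block_distance(grids[i][block], grids[j][block]))
        neighbors.append(set(others[:k]))
    return [[1 if (j in neighbors[i] or i in neighbors[j]) else 0 for j in range(n)]
            for i in range(n)]


def merge_sub_graphs(sub_graphs, weights=None):
    """Assumed merge rule: weighted sum of sub-graphs; positive entries become edges."""
    n = len(sub_graphs[0])
    weights = weights or [1.0] * len(sub_graphs)
    merged = [[sum(w * g[i][j] for w, g in zip(weights, sub_graphs)) for j in range(n)]
              for i in range(n)]
    final = [[1 if merged[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    return merged, final


# Toy data: three images, two blocks each, three bins per block.
grids = [
    [[4, 0, 0], [0, 4, 0]],
    [[1, 3, 0], [0, 3, 1]],
    [[0, 0, 4], [4, 0, 0]],
]
sub_graphs = [sub_adjacency(grids, block, k=1) for block in range(2)]
merged, final = merge_sub_graphs(sub_graphs)
print(sub_graphs, merged, final)
```

In this toy case the two block positions disagree about the neighbors of the third image, and the merged graph keeps the edges found by either block.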

In this paper, we present a dissimilarity metric based on the improved EMD for Unsupervised Linear Subspace Learning. The dissimilarity between two samples is calculated by an improved EMD-based dissimilarity metric, which is a variant of the original EMD [33]. For simplicity, we refer to this dissimilarity metric as “the improved EMD metric” from now on. The whole process is described as follows.

Firstly, the $L_2$ metric suffers from the quantization problem of rigid binning. So, the improved EMD metric [33], instead of the $L_2$ metric, is adopted to compare any two pHOG histograms over the corresponding blocks from two different face images, and the sum of these dissimilarities is taken as the final dissimilarity between the two face images. The aim of the pooling operation is to reduce the computational complexity of calculating the improved EMD metric and to enhance robustness against occlusion, noise, and other disturbances.

Secondly, in order to preserve the structural information of samples, each face image is divided into blocks of the specified size, and then the pHOG histogram over each block is obtained. In one way (which we call Algorithm 1), an adjacency graph is constructed by comparing the $k$-nearest neighbors among face images. In another way (which we call Algorithm 2), we first obtain the sub-adjacency graphs over blocks, denoted by $G^b$, and then get the final adjacency graph by merging these sub-adjacency graphs over blocks.

Finally, a small number of good “features” of face images are obtained by Unsupervised Linear Subspace Learning which includes Algorithm 1 (LPP based on the improved EMD metric, named LPP_IEMD) and Algorithm 2 (BSLPP based on the improved EMD metric, named BSLPP_IEMD). When “features” of face images are mapped onto a subspace, we will get the final “features” for each face image. Among them, the “features” include the grayscale face image, grids of pHOG, and grids of HOG. See Section 4 for more detailed information about these “features.”

4. Experiments and Results

In this section, firstly, we introduce the face databases used in this research as well as detailed experimental settings on these face databases, including training set, test set, and the choice of parameters. Secondly, we describe the experimental setups and the corresponding results for Unsupervised Linear Subspace Learning.

4.1. Face Databases

4.1.1. The AR Face Database

The AR face database has a total of 4,000 frontal images, including 126 individuals (males and females), with 26 images for each person, of which the first 13 and the last 13 were taken in two sessions (14 days). Each image has pixels. Partial occlusions by sun glasses and scarves, illumination variation, and facial expressions occur in this database. In order to verify the effectiveness of the proposed algorithms to a certain degree of noise, geometric distortion, slight translation, and slight rotation, we construct the noise, geometric distortion, slight translation, and slight rotation AR face database.

In order to reduce the difficulty of introducing the noise, geometric distortion, slight translation, and slight rotation into the AR face database, we chose the first 15 males and the first 15 females with first 13 images (we do not consider the time factor) of each person to construct the subAR database, and this gives a total of 390 images for our experiments. We add salt and pepper noise with noise density of 0.02 to the AR face database. We use Adobe Photoshop CS6 to simulate the geometric distortions of the face images, including barrel distortion, pincushion distortion, and the complex geometric distortion. We also use Adobe Photoshop CS6 to simulate the slight translation and slight rotation of face images. The 2nd, 6th, and 9th images of each person on our subAR face database are modified and the aim is that we consider the fusion of the simulated interference factors (noise, geometric distortion, slight translation, and slight rotation) and the inherent interference factors (occlusions, illumination, and facial expressions).

For the noise AR face database, we add salt and pepper noise with noise density of 0.02 to the 2nd, 6th, and 9th images of each person on our subAR face database.
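The noise step can be illustrated with a minimal Python sketch. It assumes that a noise density of 0.02 means each pixel is independently replaced with probability 0.02, half of the time by black (pepper) and half of the time by white (salt). The function name `add_salt_and_pepper` and the flat gray test image are hypothetical and are not taken from the authors' implementation.

```python
import random

def add_salt_and_pepper(image, density=0.02, seed=None):
    """Return a noisy copy of a grayscale image (list of rows, values 0-255).

    Assumption: each pixel is corrupted independently with probability
    `density`, split evenly between pepper (0) and salt (255).
    """
    rng = random.Random(seed)
    noisy = [row[:] for row in image]          # leave the input untouched
    for r, row in enumerate(noisy):
        for c in range(len(row)):
            u = rng.random()
            if u < density / 2:
                noisy[r][c] = 0                # pepper
            elif u < density:
                noisy[r][c] = 255              # salt
    return noisy

# Hypothetical 100 x 100 flat gray image standing in for a face image.
image = [[128] * 100 for _ in range(100)]
noisy = add_salt_and_pepper(image, density=0.02, seed=0)
changed = sum(p != 128 for row in noisy for p in row)
print(changed / 10000)   # fraction of corrupted pixels, expected to be near 0.02
```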

For the geometric distortion AR face database, we add three variants, barrel distortion, pincushion distortion, and the complex geometric distortion, respectively, to the 2nd, 6th, and 9th images of each person on our subAR face database.
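The distortions here are produced with Photoshop and no formula is given. As a rough stand-in only, the sketch below applies a one-parameter radial model with nearest-neighbour sampling. The function `radial_distort`, the coefficient `k`, and the values 0.2 and -0.2 are illustrative assumptions, and the complex geometric distortion is not covered.

```python
def radial_distort(image, k, fill=0):
    """Warp a grayscale image (list of rows) with a simple radial model.

    Inverse mapping: the output pixel at normalised offset (x, y) from the
    centre samples the source at (x, y) * (1 + k * r^2), r^2 = x^2 + y^2.
    With this convention k > 0 pulls content toward the centre (barrel-like)
    and k < 0 pushes it outward (pincushion-like). Illustrative only.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    norm = max(cx, cy) or 1.0                  # avoid dividing by zero for 1x1 input
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            y, x = (r - cy) / norm, (c - cx) / norm
            scale = 1.0 + k * (x * x + y * y)
            sr = int(round(cy + y * scale * norm))
            sc = int(round(cx + x * scale * norm))
            if 0 <= sr < h and 0 <= sc < w:    # pixels mapped outside keep `fill`
                out[r][c] = image[sr][sc]
    return out

# Hypothetical 5 x 5 test pattern; each value encodes its row and column.
image = [[r * 10 + c for c in range(5)] for r in range(5)]
barrel = radial_distort(image, k=0.2, fill=-1)
pincushion = radial_distort(image, k=-0.2, fill=-1)
```

The centre pixel is unchanged in both cases, and the strength of the effect grows with the distance from the centre.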

For the slight translation AR face database, we add slight translation to the 2nd, 6th, and 9th images of each person on our subAR face database.

For the slight rotation AR face database, we add slight rotation to the 2nd, 6th, and 9th images of each person on our subAR face database.
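The amounts of translation and rotation are not stated, and the authors apply them in Photoshop. Purely as an illustration, the following sketch rotates an image about its centre and shifts it by whole pixels using inverse mapping with nearest-neighbour sampling. The function `rotate_translate` and the values of 3 degrees and 2 pixels are hypothetical.

```python
import math

def rotate_translate(image, angle_deg=0.0, dx=0, dy=0, fill=0):
    """Rotate a grayscale image about its centre, then shift it by (dx, dy).

    Each output pixel looks up its source pixel by undoing the shift and
    then undoing the rotation. Pixels with no source keep `fill`.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            x, y = c - dx - cx, r - dy - cy          # undo the translation
            sx = cos_a * x + sin_a * y + cx          # undo the rotation
            sy = -sin_a * x + cos_a * y + cy
            sr, sc = int(round(sy)), int(round(sx))
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = image[sr][sc]
    return out

# Hypothetical 5 x 5 test pattern and hypothetical "slight" disturbances.
image = [[r * 10 + c for c in range(5)] for r in range(5)]
translated = rotate_translate(image, dx=2, dy=0, fill=-1)
rotated = rotate_translate(image, angle_deg=3.0, fill=-1)
```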

The details of how we construct the subAR face database are shown in Figure 6.

The specific experimental settings for subAR and the noise, geometric distortion, slight translation, and slight rotation AR face databases, including training set, test set, and the choice of parameters, are as follows.

Five groups (G4/P9,…,G8/P5) of different training and testing sets are selected. We run each group 20 times and take the average over the 20 trials as the recognition rate. Gm/Pn denotes m images of each person for training and n images for testing, where m + n = 13. The parameters of this experiment are pixels for each block, blocks with a length of 12 bins for each face image in total.
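The evaluation protocol can be sketched as follows, assuming that the training images of each person are drawn at random in every trial. The helper names `split_per_person` and `mean_recognition_rate` and the `evaluate` callback are hypothetical; `evaluate` stands for any routine that trains on the given indices and returns a recognition rate.

```python
import random

def split_per_person(labels, n_train, rng):
    """Pick n_train random image indices per person for training; the rest test."""
    by_person = {}
    for idx, person in enumerate(labels):
        by_person.setdefault(person, []).append(idx)
    train, test = [], []
    for indices in by_person.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return sorted(train), sorted(test)

def mean_recognition_rate(labels, n_train, evaluate, trials=20, seed=0):
    """Average the rate returned by evaluate(train, test) over random splits."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        train, test = split_per_person(labels, n_train, rng)
        rates.append(evaluate(train, test))
    return sum(rates) / len(rates)

# subAR layout: 30 persons with 13 images each, group G4/P9.
labels = [person for person in range(30) for _ in range(13)]
train, test = split_per_person(labels, 4, random.Random(0))
print(len(train), len(test))   # 120 270
```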

4.1.2. The Extended Yale B Face Database

The second face database used in the experiment is the Extended Yale B. The Extended Yale B face database has 2,414 face images in total, containing 38 individuals with 64 images of each person under 64 illumination conditions. Each image is in pixels. In order to reduce the difficulty of introducing the noise, geometric distortion, slight translation, and slight rotation into the Extended Yale B face database, we chose the first 30 persons with 16 images (we choose the first one in every four of 64 face images of each person) of each person to construct the sub Extended Yale B database, and this gives a total of 480 images for our experiments. We add salt and pepper noise with noise density of 0.02 to the Extended Yale B face database. We use Photoshop to simulate the geometric distortions of the face images, including barrel distortion, pincushion distortion, and the complex geometric distortion. We also use Photoshop to simulate the slight translation and slight rotation of face images.
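The subset arithmetic can be checked with a few lines of Python. The sketch reads "the first one in every four" as keeping images 1, 5, 9, and so on of each person, which is an interpretation of the sentence above rather than a documented rule.

```python
# 1-based positions of the images kept for each person: 1, 5, 9, ..., 61.
kept = list(range(1, 65, 4))
persons = 30
total_images = persons * len(kept)
print(len(kept), total_images)   # 16 480
```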

For the noise Extended Yale B face database, we add the noise (salt and pepper noise with noise density of 0.02) to the 2nd, 6th, 10th, and 14th images of each person on our sub Extended Yale B database.

For the geometric distortion Extended Yale B face database, we add three variants, barrel distortion, pincushion distortion, and the complex geometric distortion, respectively, to the 2nd, 6th, 10th, and 14th images of each person on our sub Extended Yale B database.

For the slight translation Extended Yale B face database, we add slight translation to the 2nd, 6th, 10th, and 14th images of each person on our sub Extended Yale B database.

For the slight rotation Extended Yale B face database, we add slight rotation to the 2nd, 6th, 10th, and 14th images of each person on our sub Extended Yale B database.

The details of how we construct the sub Extended Yale B face database are shown in Figure 7.

The specific experimental settings for the sub Extended Yale B and the noise, geometric distortion, slight translation, and slight rotation Extended Yale B face database, including training set, test set, and the choice of parameters, are as follows.

Five groups (G6/P10,…,G10/P6) of different training and testing sets are selected. We run each group 20 times and take the average over the 20 trials as the recognition rate. Gm/Pn denotes m images of each person for training and n images for testing, where m + n = 16. The parameters of this experiment are pixels for each block, blocks with a length of 12 bins for each face image in total.

4.2. Comparison of Experiments with Other Approaches

Before conducting the Unsupervised Linear Subspace Learning experiments, we carried out several comparative experiments to assess the effectiveness of our algorithms. In addition to the algorithms proposed in this paper, the comparative experiments involve deep learning based approaches, keypoints based approaches, and kernel based approaches. The deep learning based approaches include Stacked Denoising Autoencoders (SDAE) [7, 8, 49], LeNet-5 [10, 49], and UFLDL_CNN [11]. For SDAE and LeNet-5, we use the same settings as in [49]. For UFLDL_CNN, at the convolutional layer, 400 convolution kernels are learned by the unsupervised Stacked Autoencoder (SAE) algorithm, and at the pooling layer we choose a pool size of 5 to conduct the pooling operation. The keypoints based approach we adopt is the SIFT [15] algorithm, and the specific parameters are the same as those in [15]. Kernel PCA [50] is the kernel based approach. Among them, LeNet-5 [10, 49] is a supervised algorithm, while UFLDL_CNN [11] is an unsupervised one because the 400 convolution kernels are learned by SAE.

Firstly, we conducted several comparative experiments to assess the effectiveness of our algorithms on the subAR face database. We randomly selected 7 images of each face for training and used the rest for testing. The experiments were run on a Windows 8 PC with an Intel(R) Core(TM) i7-4790 3.60 GHz CPU and 8 GB of memory. We recorded the corresponding “cputime” (including the training and testing time) for each approach. The final results of the experiments are shown in Table 1.
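The “cputime” here presumably refers to the MATLAB function of the same name. A comparable measurement in Python can be sketched with `time.process_time`, which reports the CPU time consumed by the current process. The wrapper `timed` is a hypothetical helper, and the `sum` call only stands in for a full training and testing run.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, CPU seconds used by this process)."""
    start = time.process_time()
    result = fn(*args, **kwargs)
    return result, time.process_time() - start

# Stand-in workload; a real run would wrap training plus testing.
result, cpu_seconds = timed(sum, range(1000))
print(result, cpu_seconds)
```

CPU time excludes time spent sleeping or waiting on I/O, so it can differ from wall-clock time.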

From Table 1, we can see that our proposed algorithm obtains higher accuracy while consuming relatively less cputime. Although the SIFT approach achieves high accuracy, it consumes almost the second-longest cputime. The original face images were resized to for UFLDL_CNN1 and resized to for UFLDL_CNN2. For UFLDL_CNN1, the cputime is longer than 437 seconds because we need to use SAE to learn the 400 convolution kernels; the same applies to UFLDL_CNN2. We also point out that our algorithms learn a subspace, which means that we obtain a relatively small number of good features, and therefore our algorithms spend less cputime when unseen samples need to be tested. This is essentially an advantage of subspace learning methods over deep learning based and keypoints based ones.

Secondly, we conducted several comparative experiments to verify the effectiveness of our algorithms on the sub Extended Yale B face database. We randomly selected 7 images of each face for training, and the rest for testing. Other configurations are similar to the comparison experiments on subAR face database. The final results of comparative experiments are shown in Table 2.

From Table 2, we can see that our proposed algorithm obtains higher accuracy while consuming relatively less cputime. The SIFT approach, in contrast, achieves a lower accuracy and consumes the longest cputime. LeNet-5 and Kernel PCA obtain low accuracy, which may indicate that the supervised LeNet-5 and the kernel based Kernel PCA approaches do not perform well when faced with heavy illumination variation.

4.3. Experiments and Results on Unsupervised Linear Subspace Learning

In this subsection, we will further demonstrate a certain degree of robustness of our proposed algorithms against partial occlusions, illumination variation, noise, geometric distortion, slight translation, and slight rotation on our constructed face databases compared with the original one.

4.3.1. Experiments and Results on the AR Face Database

First of all, we report the recognition rates on the subAR face database. In this experiment, we conduct the Unsupervised Linear Subspace Learning over three kinds of features, namely, the grayscale face image, grids of pHOG, and grids of HOG, denoted by F1, F2, and F3, respectively. We examine the effectiveness of the improved EMD metric, the pooling HOG operation, and the BSLPP. Then, for comparison with the experiments on the subAR face database, we report the experimental results on the noise, geometric distortion, slight translation, and slight rotation AR face databases.

The parameter “bins” plays an important role in the pooling operation on the “granularity” of cell over each block, so we explore the impact of the number of “bins” on Algorithms 1 and 2 on our subAR face database. The range of the number of “bins” is and the impact of the “bins” size on our sub AR face database is shown in Figure 8.

Figure 8: (a) The impact of the number of “bins” for our Algorithm 1 over “F2” features. (b) The impact of the number of “bins” for our Algorithm 2 over “F3” features.

From Figure 8, we can see that the number of “bins” has only a small impact on the recognition rates. We hypothesize that it is the good performance of the improved EMD metric that leads to this result. We selected a “bins” size of 12 in this experiment.
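To make the role of the “bins” parameter concrete, the sketch below pools per-cell orientation histograms of one block into a single block histogram. The exact pooling rule is not restated in this section, so the layout of 16 cells with 12 orientation bins each (16 × 12 = 192 values before pooling), the use of sum pooling, and the L1 normalisation are all assumptions, as is the function name `pool_hog`.

```python
def pool_hog(cell_histograms):
    """Sum-pool per-cell orientation histograms and L1-normalise the result.

    Assumption: every cell histogram has the same number of orientation bins.
    """
    n_bins = len(cell_histograms[0])
    pooled = [0.0] * n_bins
    for hist in cell_histograms:
        for b, value in enumerate(hist):
            pooled[b] += value
    total = sum(pooled)
    return [v / total for v in pooled] if total > 0 else pooled

# Hypothetical block: 16 cells, 12 orientation bins per cell.
cells = [[float(b + 1)] * 1 + [1.0] * 11 for b in range(16)]
nonpooled = [v for hist in cells for v in hist]   # concatenated descriptor
pooled = pool_hog(cells)
print(len(nonpooled), len(pooled))   # 192 12
```

A shorter histogram per block reduces the cost of every pairwise block comparison, which matters because the dissimilarity is evaluated between many blocks.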

The recognition rates on subAR face database are shown in Table 3 and Figure 9. In Table 3, we compare three different algorithms, namely, Baseline, LPP, and Algorithm 1 (LPP_IEMD), where Baseline represents the nearest neighbor algorithm over the original “features” space. In particular, Algorithm 1 ()_nonpooled means that the nonpooling HOG with 192 bins (12 bins for the pHOG) is adopted to measure the dissimilarity between the two blocks for our Algorithm 1 over the original “” features.
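The Baseline can be sketched as a plain nearest neighbor classifier over feature vectors. The distance used by the Baseline is not stated here, so the Euclidean distance below is an assumption, and the function names and toy data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(train_feats, train_labels, query, dist=euclidean):
    """Return the label of the training sample closest to the query."""
    best = min(range(len(train_feats)), key=lambda i: dist(train_feats[i], query))
    return train_labels[best]

def recognition_rate(train_feats, train_labels, test_feats, test_labels, dist=euclidean):
    """Fraction of test samples whose nearest neighbor has the correct label."""
    hits = sum(
        nearest_neighbor(train_feats, train_labels, q, dist) == y
        for q, y in zip(test_feats, test_labels)
    )
    return hits / len(test_labels)

# Toy data: two persons, one training sample each, three test samples.
train_feats = [[0.0, 0.0], [1.0, 1.0]]
train_labels = ["A", "B"]
test_feats = [[0.1, 0.2], [0.9, 0.8], [0.6, 0.6]]
test_labels = ["A", "B", "A"]
rate = recognition_rate(train_feats, train_labels, test_feats, test_labels)
print(rate)   # two of the three test samples are classified correctly
```

The same routine applies after subspace learning: the features are first projected to the learned low-dimensional space, so each distance is computed over far fewer dimensions.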

From Table 3 and Figure 9, we can see that our Algorithm 1 achieves the highest recognition rates over “F1” and “F2” features (except for the group of over “F2” features). The reported dimensionality for each group is the one at which the highest recognition rate was obtained in the 20 iterations. This comparison experiment verifies the effectiveness of Algorithm 1. For brevity, we compare only three algorithms, namely, Baseline, LPP, and our Algorithm 1, on the noise, geometric distortion, slight translation, and slight rotation AR face databases.

The experimental results on the noise AR face database are shown in Table 4 and Figure 10(a). As we can see from Table 4 and Figure 10(a), for the noise AR face database, our Algorithm 1 achieves the best results over “F1” features. As for “F2” features, our Algorithm 1 obtains the best results for some of the experiments. It is worth noting that although our Algorithm 1 over “F2” features does not always achieve the best results, its lower dimensionality (a small number of good features) speeds up the recognition of unseen samples at the cost of a slightly lower recognition rate.

Figure 10: (a) Noise. (b) Barrel distortion. (c) Complex geometric distortion. (d) Pincushion distortion. (e) Slight translation. (f) Slight rotation.

The experimental results on barrel distortion AR face database are shown in Table 5 and Figure 10(b). As we can see from Table 5 and Figure 10(b), our Algorithm 1 achieves the best results over both “F1” and “F2” features. So, it shows that our Algorithm 1 is robust to the barrel distortion (the most common geometric distortion) to a certain degree.

The experimental results on complex geometric distortion AR face database are shown in Table 6 and Figure 10(c). As we can see from Table 6 and Figure 10(c), our Algorithm 1 achieves the best results over “F1” features. As for “F2” features, our Algorithm 1 achieves a lower recognition rate than the Baseline over “F2” features. For the complex geometric distortion, our Algorithm 1 over “F2” features may lose some discriminative information which may affect recognition rates to some extent. However, the advantage of our Algorithm 1 is that it can speed up face recognition with lower dimensionality.

The experimental results on pincushion distortion AR face database are shown in Table 7 and Figure 10(d). As we can see from Table 7 and Figure 10(d), our Algorithm 1 achieves the best results over “F1” features. As for “F2” features, our Algorithm 1 achieves a lower recognition rate than the Baseline over “F2” features. The advantage of our Algorithm 1 is that it can speed up face recognition with lower dimensionality, while the disadvantage is that our Algorithm 1 loses some discriminative information which can improve the recognition performance.

The experimental results on the slight translation AR face database are shown in Table 8 and Figure 10(e). As we can see from Table 8 and Figure 10(e), our Algorithm 1 achieves the best results over both “F1” and “F2” features. This shows that our Algorithm 1 is robust to slight translation to a certain degree.

The experimental results on slight rotation AR face database are shown in Table 9 and Figure 10(f). As we can see from Table 9 and Figure 10(f), our Algorithm 1 achieves the best results over both “F1” and “F2” features. So, it shows that our Algorithm 1 is robust to the slight rotation to a certain degree.

As shown in Tables 4, 6, and 7, our Algorithm 1 does not have an obvious advantage over “F2” features. “F3” features are the more robust ones, so in order to better learn the linear subspace, we adopt Algorithm 2 (BSLPP_IEMD) to conduct the Unsupervised Linear Subspace Learning over “F3” features. The experimental results on subAR are shown in Table 10 and Figure 11. In Table 10, we compare three different algorithms, namely, Baseline, LPP, and our Algorithm 2, where Baseline represents the nearest neighbor algorithm over the original “F3” features space. In particular, Algorithm 2 (F3)_ means that metric is adopted to measure the dissimilarity between the two blocks for our Algorithm 2 over the original “F3” features. Algorithm 2 (F3)_nonpooled means that the nonpooling HOG with 192 bins (12 bins for the pHOG) is adopted to measure the dissimilarity between the two blocks for our Algorithm 2 over the original “F3” features.

As one can see from Table 10 and Figure 11, Algorithm 2 achieves the highest recognition rates over “F3” features. This comparison experiment verifies the effectiveness of Algorithm 2. For brevity, we compare only Baseline, LPP, Algorithm 2 (F3)_, and Algorithm 2 on the noise, geometric distortion, slight translation, and slight rotation AR face databases.

As we can see from Tables 11–15 and Figure 12, our Algorithm 2 achieves the best results over “F3” features. It shows that our Algorithm 2 is robust to the noise, geometric distortion, slight translation, and slight rotation to a certain degree. And it can well validate the effectiveness of our algorithm. More importantly, our Algorithm 2 with much lower dimensionality will provide an effective guarantee for face recognition in terms of speed and accuracy.

Figure 12: (a) Noise. (b) Barrel distortion. (c) Complex geometric distortion. (d) Pincushion distortion. (e) Slight translation. (f) Slight rotation.

The experimental results on noise AR face database over “F3” features are shown in Table 11 and Figure 12(a).

The experimental results on barrel distortion AR face database over “F3” features are shown in Table 12 and Figure 12(b).

The experimental results on complex geometric distortion AR face database over “F3” features are shown in Table 13 and Figure 12(c).

The experimental results on pincushion distortion AR face database over “F3” features are shown in Table 14 and Figure 12(d).

The experimental results on slight translation and slight rotation AR face databases over “F3” features are shown in Table 15 and Figures 12(e) and 12(f).

4.3.2. Experiments and Results on the Extended Yale B

Similar to the experiments on AR face database, we get the experimental results on sub Extended Yale B, noise, geometric distortion, slight translation, and slight rotation Extended Yale B face databases. The recognition rates are shown in Tables 16–21 and Figure 13. In Tables 16–21, we compare three different algorithms, namely, Baseline, LPP, and our Algorithm 1.

Figure 13: (a) Sub. (b) Noise. (c) Barrel distortion. (d) Complex geometric distortion. (e) Pincushion distortion. (f) Slight translation. (g) Slight rotation.

As we can see from Tables 16–21 and Figure 13, Algorithm 1 achieves the best results over “F1” features. It is worth noting that Algorithm 1 over “F1” features is even better than that over “F2” features. As for “F2” features, Algorithm 1 achieves the best results only in part of the experiments on the noise, complex geometric distortion, slight translation, and slight rotation Extended Yale B face databases. Our conclusion is therefore that Algorithm 1 over “F2” features is less effective than that over “F1” features under heavily varying illumination.

The recognition rates on sub Extended Yale B face database are shown in Table 16 and Figure 13(a). The recognition rates on noise Extended Yale B face database are shown in Table 17 and Figure 13(b). The recognition rates on barrel distortion Extended Yale B face database are shown in Table 18 and Figure 13(c). The recognition rates on complex geometric distortion Extended Yale B face database are shown in Table 19 and Figure 13(d). The recognition rates on pincushion distortion Extended Yale B face database are shown in Table 20 and Figure 13(e). The recognition rates on slight translation and rotation Extended Yale B face databases are shown in Table 21 and Figures 13(f) and 13(g).

In order to better deal with the problem of heavily varying illumination, we adopt Algorithm 2 to conduct the Unsupervised Linear Subspace Learning over “F3” features. The experimental results on the sub, noise, geometric distortion, slight translation, and slight rotation Extended Yale B face databases are shown in Tables 22–27 and Figure 14, and the experimental settings are the same as those for Tables 16–21.

Figure 14: (a) Sub. (b) Noise. (c) Barrel distortion. (d) Complex geometric distortion. (e) Pincushion distortion. (f) Slight translation. (g) Slight rotation.

As we can see from Tables 22–27 and Figures 14(a)–14(g), Algorithm 2 achieves the best results on slight translation, and slight rotation Extended Yale B over “F3” features, and the effectiveness of our algorithm against slight translation and rotation is well validated. Algorithm 2 (F3)_ achieves the partial best results on sub, noise, complex geometric distortion, and pincushion distortion Extended Yale B face databases. The Baseline method achieves the best results on sub (except for the group of over “F3” features) and barrel distortion Extended Yale B. In spite of slightly lower recognition results on sub and barrel distortion Extended Yale B, Algorithm 2 with much lower dimensionality will provide an effective guarantee for face recognition in terms of speed and accuracy.

5. Conclusions and Future Work

In this research, in order to reduce the computational complexity and improve robustness when calculating the improved EMD metric between numerous blocks and suffering from disturbances, we firstly carry out the pooling operation over each block to extract the pHOG features and then adopt the improved EMD metric instead of the metric as a dissimilarity measure to conduct Unsupervised Linear Subspace Learning, which has demonstrated a certain degree of robustness against partial occlusions, illumination variation, noise, geometric distortion, slight translation, slight rotation, and other disturbances. The experimental results on well-known databases confirm the effectiveness of our Unsupervised Linear Subspace Learning algorithms: Algorithm 1 (LPP_IEMD) and Algorithm 2 (BSLPP_IEMD).
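To illustrate why a cross-bin measure such as EMD tolerates small disturbances better than a bin-to-bin comparison, the sketch below computes the classical one-dimensional EMD between two histograms of equal total mass, using the absolute difference of bin indices as the ground distance. In this special case the EMD equals the sum of absolute differences between the two cumulative histograms. This is the textbook 1-D case and not the improved EMD metric used by Algorithms 1 and 2; the function name `emd_1d` and the example histograms are hypothetical, and the bins are treated as linear rather than circular.

```python
from itertools import accumulate

def emd_1d(p, q, tol=1e-9):
    """Classical 1-D EMD for equal-mass histograms with |i - j| ground distance."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > tol:
        raise ValueError("histograms must have the same total mass")
    # The mass that has to cross each bin boundary is the CDF difference there.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))

p = [1.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0]    # all mass moved by one bin
far = [0.0, 0.0, 1.0]     # all mass moved by two bins
print(emd_1d(p, near), emd_1d(p, far))   # 1.0 2.0
```

A bin-to-bin distance would rate `near` and `far` as equally different from `p`, whereas the EMD grows with the size of the shift.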

Although our proposed algorithms achieve higher performance and demonstrate good robustness against some disturbances, there are still some limitations:

Although the improved EMD metric is a linear time algorithm with time complexity, compared with the metric, the training time of the model is longer. However, the model training process is offline, which still makes it acceptable.

Heavy illumination variation is a challenging problem. Our algorithms do not show a distinct advantage under heavy illumination variation, and this is an issue for future investigation.

Our future work will focus on how to measure the dissimilarity between samples more effectively, for example by further refining the improved EMD metric, and on how to better represent the neighborhood relationship between samples beyond KNN. Determining the weights of the subadjacency graphs with an adaptive weight learning method is another concern. Finally, we will also work on extracting more robust features and on further improving the HOG algorithm.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Science and Technology Developing Project of Jilin Province, China (Grant no. 20150204007GX), and the Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education.

References

[1] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[2] S. Bu, P. Han, Z. Liu, J. Han, and H. Lin, “Local deep feature learning framework for 3D shape,” Computers and Graphics, vol. 46, pp. 117–129, 2015.
[3] G. Hinton, “A practical guide to training restricted Boltzmann machines,” Momentum, vol. 7700, pp. 599–619, 2012.
[4] Y. Bengio, A. C. Courville, and P. Vincent, “Unsupervised feature learning and deep learning: a review and new perspectives,” CoRR, 2012.
[5] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, pp. 91–99, 2015.
[6] L. Tóth and T. Grósz, “A comparison of deep neural network training methods for large vocabulary speech recognition,” in International Conference on Text, Speech and Dialogue, vol. 2013, pp. 36–43, Springer.
[7] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
[8] P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103, ACM, July 2008.
[9] J. Bouvrie, “Notes on Convolutional Neural Networks,” Neural Nets, 2006.
[10] Y. LeCun, B. E. Boser, J. S. Denker et al., “Handwritten digit recognition with a back-propagation network,” in Advances in Neural Information Processing Systems, pp. 396–404, 1990.
[11] matlab code: https://www.amolgmahurkar.com/ classifySTLusingCNN.
[12] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. A. Manzagol, “Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.
[13] P. Baldi, “Autoencoders, unsupervised learning, and deep architectures,” in Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp. 37–49, 2012.
[14] A. Coates, A. Ng, and H. Lee, “An analysis of single-layer networks in unsupervised feature learning,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 215–223, 2011.
[15] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[16] H. Bay, T. Tuytelaars, and L. van Gool, “Speeded up robust features,” in European Conference on Computer Vision, pp. 404–417, Springer, 2006.
[17] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[18] X. He and P. Niyogi, “Locality preserving projections,” Advances in Neural Information Processing Systems, pp. 153–160, 2004.
[19] X. He, D. Cai, S. Yan, and H. Zhang, “Neighborhood preserving embedding,” in Tenth IEEE International Conference on Computer Vision, pp. 1208–1213, Beijing, China, October 2005.
[20] A. Iosifidis, A. Tefas, and I. Pitas, “On the optimal class representation in linear discriminant analysis,” IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 9, pp. 1491–1497, 2013.
[21] G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, “Subspace learning from image gradient orientations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 12, pp. 2454–2466, 2012.
[22] D. Cai, X. He, Y. Hu, J. Han, and T. Huang, “Learning a spatially smooth subspace for face recognition,” in Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07, USA, June 2007.
[23] F. Liu and X. Liu, “Locality enhanced spectral embedding and spatially smooth spectral regression for face recognition,” in Proceedings of the 2012 IEEE International Conference on Information and Automation, ICIA 2012, pp. 299–303, China, June 2012.
[24] L. Zhang, P. Zhu, Q. Hu, and D. Zhang, “A linear subspace learning approach via sparse coding,” in Proceedings of the 2011 IEEE International Conference on Computer Vision, ICCV 2011, pp. 755–761, November 2011.
[25] A. Iosifidis, A. Tefas, and I. Pitas, “Class-specific reference discriminant analysis with application in human behavior analysis,” IEEE Transactions on Human-Machine Systems, vol. 45, no. 3, pp. 315–326, 2015.
[26] J. Li, W. Hao, and X. Zhang, “Learning kernel subspace for face recognition,” Neurocomputing, vol. 151, no. 3, pp. 1187–1197, 2015.
[27] A. Iosifidis, A. Tefas, and I. Pitas, “Kernel reference discriminant analysis,” Pattern Recognition Letters, vol. 49, pp. 85–91, 2014.
[28] S. Zafeiriou, G. Tzimiropoulos, M. Petrou, and T. Stathaki, “Regularized kernel discriminant analysis with a robust kernel for face recognition and verification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 3, pp. 526–534, 2012.
[29] X. Jiang, “Linear subspace learning-based dimensionality reduction,” IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 16–26, 2011.
[30] J. Weng, P. Cohen, and M. Herniou, “Camera calibration with distortion models and accuracy evaluation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 10, pp. 965–980, 1992.
[31] Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.
[32] Y. Rubner, C. Tomasi, and L. J. Guibas, “The earth movers distance as a metric for image retrieval,” International Journal of Computer Vision, vol. 40, pp. 99–121, 2000.
[33] O. Pele and M. Werman, “A Linear Time Histogram Metric for Improved SIFT Matching,” Computer Vision – ECCV, vol. 5304, pp. 495–508, 2008.
[34] Q. Zhu, S. Avidan, M.-C. Yeh, and K.-T. Cheng, “Fast human detection using a cascade of histograms of oriented gradients,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 1491–1498, IEEE, June 2006.
[35] W. T. Freeman and M. Roth, “Orientation histograms for hand gesture recognition,” in Proceedings of the International Workshop on Automatic Face And Gesture Recognition, vol. 12, pp. 296–301, 1995.
[36] A. J. Newell and L. D. Griffin, “Multiscale histogram of oriented gradient descriptors for robust character recognition,” in Proceedings of the 11th International Conference on Document Analysis and Recognition, ICDAR 2011, pp. 1085–1089, China, September 2011.
[37] D. Monzo, A. Albiol, J. Sastre, and A. Albiol, “Hog-EBGM vs. gabor-EBGM,” in Proceedings of the 2008 IEEE International Conference on Image Processing, ICIP 2008, pp. 1636–1639, USA, October 2008.
[38] O. Déniz, G. Bueno, J. Salido, and F. D. L. Torre, “Face recognition using histograms of oriented gradients,” Pattern Recognition Letters, vol. 32, pp. 1598–1603, 2011.
[39] M. Yin, Y. Guo, and J. Gao, “Linear Subspace Learning via sparse dimension reduction,” in Proceedings of the 2014 International Joint Conference on Neural Networks, IJCNN 2014, pp. 3540–3547, China, July 2014.
[40] R. M. Martins, D. B. Coimbra, R. Minghim, and A. C. Telea, “Visual analysis of dimensionality reduction quality for parameterized projections,” Computers and Graphics, vol. 41, no. 1, pp. 26–42, 2014.
[41] X.-Y. Jing, S. Li, D. Zhang, J. Yang, and J.-Y. Yang, “Supervised and unsupervised parallel subspace learning for large-scale image recognition,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 10, pp. 1497–1511, 2012.
[42] D. Zhou and C. Zhang, “Semi-supervised learning using random subspace based linear embedding repulsion graph,” in Proceedings of 31st Chinese Control Conference (CCC), pp. 3676–3680, 2012.
[43] D. Huang, M. Storer, F. De La Torre, and H. Bischof, “Supervised local subspace learning for continuous head pose estimation,” in Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, pp. 2921–2928, USA, June 2011.
[44] D. Cai, X. He, and J. Han, “Semi-supervised discriminant analysis,” in Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), pp. 1–7, Rio de Janeiro, Brazil, October 2007.
[45] D. Cai, X. He, K. Zhou, J. Han, and H. Bao, “Locality sensitive discriminant analysis,” Proceedings of IJCAI, vol. 2007, p. 1713, 2007.
[46] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: a general framework for dimensionality reduction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40–51, 2007.
[47] M. Wan, Z. Lai, J. Shao, and Z. Jin, “Two-dimensional local graph embedding discriminant analysis (2DLGEDA) with its application to face and palm biometrics,” Neurocomputing, vol. 73, no. 1-3, pp. 197–203, 2009.
[48] A. Iosifidis, A. Tefas, and I. Pitas, “Graph embedded extreme learning machine,” IEEE Transactions on Cybernetics, vol. 46, no. 1, pp. 311–324, 2016.
[49] J. He and Y. Zhu, “Hierarchical Multi-task Learning with Application to Wafer Quality Prediction,” in Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM), pp. 290–298, Brussels, Belgium, December 2012.
[50] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.

Copyright

Copyright © 2018 Xiangchun Yu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
