Abstract

The sparse representation based classification (SRC) method and collaborative representation based classification (CRC) method have attracted increasing attention in recent years due to their promising results and robustness. However, both SRC and CRC algorithms directly use the training samples as the dictionary, which leads to a large fitting error. In this paper, we propose the Laplace graph embedding class specific dictionary learning (LGECSDL) algorithm, which trains a weight matrix and embeds a Laplace graph to reconstruct the dictionary. Firstly, it can increase the dimension of the dictionary matrix, which makes it suitable for classifying small sample databases. Secondly, it assigns different weights to different dictionary atoms to improve classification accuracy. Additionally, in each class dictionary training process, the LGECSDL algorithm introduces a Laplace graph embedding term into the objective function in order to keep the local structure of each class; the class specific dictionary learning and the Laplace graph embedding regularizer together improve face recognition performance. Moreover, we also extend the proposed method to an arbitrary kernel space. Extensive experimental results on several face recognition benchmark databases demonstrate the superior performance of our proposed algorithm.

1. Introduction

Dictionary learning for sparse representation is attracting increasing attention in computer vision due to its impressive performance in many applications, such as image processing, image ranking [1], human activity recognition [2], and image classification [3, 4]. Unlike traditional subspace methods such as PCA, the sparse representation algorithm allows the number of bases in a dictionary to be much larger than the dimension of the sample features, so the sample can be fitted more effectively.

Deep learning-based methods are currently the mainstream methods in image classification. Taigman et al. [5] proposed the DeepFace neural network for face recognition, which has achieved human-level performance. Ding and Tao [6] proposed a comprehensive framework based on convolutional neural networks to overcome challenges in video-based face recognition. Schroff et al. [7] proposed FaceNet, which learns a mapping from face images to a compact Euclidean space. Sun et al. [8] proposed the DeepID2+ convolutional network, which increases the dimension of hidden representations and adds supervision to early convolutional layers. Liu et al. [9] proposed a multipatch deep CNN and deep metric learning method to extract discriminative features for face recognition. However, deep learning methods perform well when the sample size is large, and their results are unsatisfactory on small databases. Therefore, we propose a dictionary learning method based on Laplacian embedding and sparse representation, which can still achieve good results in the case of very small samples.

The sparse representation based classifier has been widely used in the field of face recognition. Normally, classifying the samples involves two stages: first, the sample features are extracted, and then the features are sent to the classifier for classification. For feature extraction, many subspace methods have been proposed. The principal component analysis method was proposed to reduce a complex data set to a lower dimension to reveal the hidden, simplified data structures [10]. The linear discriminant analysis (LDA) algorithm was proposed to find the projection hyperplane that minimizes the intraclass variance and maximizes the distance between the projected means of the classes [11]. Tao et al. [12] proposed a general tensor discriminant analysis method as a preprocessing step for the LDA algorithm to reduce the undersampling problem. The locality preserving projection algorithm was proposed to preserve the neighborhood structure of the data [13]. For the classification stage, Liu et al. [14] proposed a new belief-based k-nearest neighbor classifier to make the classification result more robust to misclassification errors. Noh et al. [15] proposed a nearest neighbor algorithm that enhances the performance of nearest neighbor classification by learning a local metric. Although the k-nearest neighbor classifier and the nearest neighbor classifier have achieved good results on some data sets, they do not select the most discriminative features of the sample for classification. So, subspace-based classifier design methods were proposed to improve the classification effect.

The sparse representation based classification algorithm uses training samples to construct an overcomplete dictionary, and the test samples can be well represented as a sparse linear combination of elements from the dictionary [16]. However, subsequent research shows that sparseness alone cannot extract the most discriminative features of the samples. Collaborative representation based classification (CRC) was then proposed, which uses the L2-norm constraint to reveal the internal structure of the testing sample [17]. Although the SRC and CRC methods have achieved superior performance in visual recognition, both algorithms directly use the training samples as the dictionary matrix. Building the dictionary directly from training samples has two drawbacks: first, there may be too few samples to build an overcomplete dictionary, which may result in low classification accuracy; second, the dictionary samples may be highly redundant, which prevents the original signals from being effectively expressed and results in poor classifier performance.

So, dictionary learning methods are proposed to improve the classification effect. Discriminative dictionary learning approaches can be divided into three types: shared dictionary learning, class specific dictionary learning, and hybrid dictionary learning. The shared dictionary learning method usually uses all training samples to obtain a classification dictionary. Lu et al. [18] proposed a locality weighted sparse representation based classification (WSRC) method which utilizes both data locality and linearity to train a classification dictionary. Yang et al. [19] proposed a novel dictionary learning method based on the Fisher discrimination criterion to improve the pattern classification performance. Yang et al. [20] proposed a latent dictionary learning method to learn a discriminative dictionary and build its relationship to class labels adaptively. Jiang et al. [21] proposed an algorithm to learn a single overcomplete dictionary and an optimal linear classifier for face recognition. Zhou et al. [22] presented a dictionary learning algorithm to exploit the visual correlation within a group of visually similar object categories for dictionary learning where a commonly shared dictionary and multiple category-specific dictionaries are accordingly modeled. The class specific dictionary learning method trained a dictionary for each class of samples. Sun et al. [23] learned a class specific subdictionary for each class and a common subdictionary shared by all classes to improve the classification performance. Wang and Kong [24] proposed a method to explicitly learn a class specific dictionary for each category, which captures the most discriminative features of this category, and simultaneously learn a common pattern pool, whose atoms are shared by all the categories and only contribute to representation of the data rather than discrimination.

The hybrid dictionary learning method is the combination of the above two methods. Rodriguez and Sapiro [25] proposed a new dictionary learning method which uses a class-dependent supervised constraint and orthogonal constraint; this method learns the intraclass structure while increasing the interclass discrimination and expands the difference between classes. Gao et al. [26] learned a category-specific dictionary for each category and a shared dictionary for all the categories, and this method improves conventional basic-level object categorization. Liu et al. [27] proposed a locality sensitive dictionary learning algorithm with global consistency and smoothness constraint to overcome the restriction of linearity at a relatively low cost. Although the hybrid dictionary learning method achieved good results, these methods usually operate in the original Euclidean space, which cannot capture nonlinear structures hidden in data. So, many kernel-based classifiers are designed to solve this problem. Nguyen et al. [28] presented a dictionary learning method for sparse representation based on the kernel method. Liu et al. [29] proposed a multiple-view self-explanatory sparse representation dictionary learning algorithm (MSSR) to capture and combine various salient regions and structures from different kernel spaces, and this method achieved superior performance in the field of face recognition.

Although the MSSR algorithm achieves good results, it neither takes into consideration the details of the training samples in the original sample space nor preserves this information, which is conducive to classification, in the dictionary space. Therefore, in our algorithm, a Laplace constraint is added to the objective function so that samples that are close in the low dimensional space remain close in the high dimensional dictionary space.

Motivated by this, we propose a Laplace graph embedding class specific dictionary learning algorithm and extend this method to arbitrary kernel space. The main contributions are fourfold. (1) We propose a Laplace embedding sparse representation algorithm. It combines the advantages of SRC's discriminant ability and maintains the intrinsic local geometric structure of the sample features by Laplace embedding. (2) We propose a Laplace embedding constraint dictionary learning algorithm to construct a superior subspace and reduce the residual error. (3) We extend this algorithm to arbitrary kernel space to find the nonlinear structure of face images. (4) Experimental results on several benchmark databases demonstrate the superior performance of our proposed algorithm.

The rest of the paper is organized as follows. Section 2 overviews two classical face recognition algorithms, SRC and CRC. Section 3 proposes our Laplace graph embedding class specific dictionary learning algorithm with kernels. The solution to the minimization of the objective function is elaborated in Section 4. Then, experimental results and analysis are shown in Section 5. Finally, discussions and conclusions are drawn in Section 6.

2. Overview of SRC and CRC

In this section, we will briefly overview two classical face recognition algorithms, SRC and CRC.

Suppose that there are classes in the training samples and each class has elements. , where represents the total number of training samples; represents all the training samples, , where ; represents the dimension of the sample features; represents the th class of the training samples. Supposing that is a test sample and , the sparse representation of sample can be expressed aswhere is the sparse coding of sample in the th dictionary and is the regularization parameter in formula (1), which is used to control the sparsity and accuracy of the expression.

The collaborative representation based classification algorithm applies an L2-norm constraint to the objective function; the objective of the CRC algorithm can be rewritten as follows: where is the regularization parameter that controls the expression accuracy of the objective function.
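For reference, the two objectives described in this section are commonly written as below. This is a sketch in standard textbook notation and not necessarily the notation of formulas (1) and (2): the symbols $y$ (test sample), $X$ (matrix of training samples), $s$ (coding vector), $X_c$ and $s_c$ (the columns and coefficients belonging to class $c$), and the parameter names $\lambda$ and $\beta$ are assumptions introduced here.

```latex
% SRC: L1-regularized coding of the test sample over the training samples
\hat{s}_{\mathrm{SRC}} = \arg\min_{s} \; \|y - X s\|_2^2 + \lambda \|s\|_1

% CRC: L2-regularized coding, which admits a closed-form solution
\hat{s}_{\mathrm{CRC}} = \arg\min_{s} \; \|y - X s\|_2^2 + \beta \|s\|_2^2
                       = \left(X^{\top} X + \beta I\right)^{-1} X^{\top} y

% Decision rule: assign y to the class with the smallest reconstruction residual
\mathrm{label}(y) = \arg\min_{c} \; \|y - X_c \hat{s}_c\|_2
```

The closed form in the second line follows from setting the gradient of the ridge objective to zero; some CRC variants additionally divide the residual by $\|\hat{s}_c\|_2$.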

Both SRC and CRC methods directly use the training samples as the dictionary, and each base in the dictionary makes the same contribution to the sample expression. The testing sample can be encoded as follows. Here, is the dictionary matrix composed of the th class training samples, and is the sparse coding of .
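To make the "training samples as dictionary" idea concrete, the following is a minimal NumPy sketch of CRC-style classification: the test sample is coded over all training samples with a ridge (L2) penalty, and the class with the smallest reconstruction residual wins. The function name `crc_classify`, the argument names, and the default `beta` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def crc_classify(X, labels, y, beta=0.01):
    """Classify y by collaborative representation (illustrative sketch).

    X      : (D, N) array, one training sample per column (used as the dictionary)
    labels : (N,) array of class labels for the columns of X
    y      : (D,) test sample
    beta   : ridge regularization weight (assumed value)
    """
    n_samples = X.shape[1]
    # Closed-form ridge solution: s = (X^T X + beta I)^{-1} X^T y
    s = np.linalg.solve(X.T @ X + beta * np.eye(n_samples), X.T @ y)

    classes = np.unique(labels)
    residuals = []
    for c in classes:
        mask = labels == c
        # Reconstruct y using only the atoms and coefficients of class c
        residuals.append(np.linalg.norm(y - X[:, mask] @ s[mask]))
    return classes[int(np.argmin(residuals))]
```

Because every training sample enters the dictionary with equal standing, the quality of the fit depends entirely on how well the raw samples span the test sample; this is the limitation that the learned dictionary weights discussed next are meant to address.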

Directly using the training samples as the dictionary leads to high residual error. Liu et al. [29] proposed a single-view self-explanatory sparse representation dictionary learning algorithm (SSSR). Supposing that represents the class number of the training samples and means the collection of sample characteristics of class , the objective function of the SSSR method can be formulated aswhere is the sparse codes of the th class and represents the th column of . The SSSR algorithm reconstructed the dictionary matrix, is the dictionary weight matrix, , and is the number of the th classes. expands the original dictionary space into a more complete dictionary space; when the identity matrix appears, the class specific dictionary learning algorithm evolves into the SRC method. The existence of matrix makes dictionary learning more flexible in the process of expression, and the reconstruction error may be reduced as well.

Meanwhile, Liu et al. [29] extended the SSSR algorithm into kernel spaces, which can map the original sample features into a high dimensional nonlinear space for better mining of nonlinear relationships between samples. The objective function of the multiple-view kernel-based class specific dictionary learning algorithm (KCSDL) is shown as follows:where means the kernel function; it maps the original feature space into a high dimensional kernel space.
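One practical consequence of moving to kernel space is that the reconstruction error never needs the mapped features explicitly. Assuming the dictionary is the mapped training set multiplied by a weight matrix, the squared error expands as $\kappa(y,y) - 2\,\kappa(y,X)Ws + s^{\top}W^{\top}\kappa(X,X)Ws$. The sketch below evaluates this expansion; the symbol names (`W` for the dictionary weight matrix, `s` for the code, and the kernel arguments) are assumptions chosen for illustration.

```python
import numpy as np

def kernel_residual(k_yy, k_Xy, K_XX, W, s):
    """Squared residual ||phi(y) - phi(X) W s||^2 from kernel values only.

    k_yy : scalar, kernel of the test sample with itself
    k_Xy : (N,) kernel values between each training sample and the test sample
    K_XX : (N, N) kernel (Gram) matrix of the training samples
    W    : (N, K) dictionary weight matrix (assumed shape)
    s    : (K,) coding vector
    """
    a = W @ s  # effective coefficients on the mapped training samples
    return float(k_yy - 2.0 * (k_Xy @ a) + a @ K_XX @ a)
```

With a linear kernel this reduces to the ordinary squared error $\|y - XWs\|_2^2$, which gives a convenient sanity check.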

3. Our Proposed Approach

Although the above methods have achieved good results in the field of face recognition, there are still some deficiencies. The SSSR algorithm uses a reconstructed dictionary matrix to compute a sparse representation of the samples; however, it does not take into account that a sparsity constraint alone is not sufficient to obtain good classification results.

Motivated by this, we propose a sparse representation algorithm based on Laplace graph embedding. While taking into account the sparse representation of the samples, this algorithm also mines the details implicit in the training samples, so that samples of the same class are more concentrated in the sparse expression space, which reduces the fitting error and improves the classification effect.

The objective function of our proposed sparse representation algorithm based on Laplace graph embedding now becomeswhere means class training samples, means dictionary weight matrix, means the dictionary representation of class samples, and represents the th column of .

4. Optimization of the Objective Function

In this section, we focus on solving the optimization problem for the proposed Laplace graph embedding class specific dictionary learning algorithm. The dictionary weight matrix and sparse representation matrix can be optimized by iterative approaches.

When each element in the matrix is updated, the remaining elements in matrix and matrix are fixed; at this time, the objective function is changed into an L2-norm constrained least-squares minimization subproblem. Similarly, when each element in matrix is updated, matrix and the remaining elements in matrix are fixed. The objective function can be seen as an L1-norm constrained least-squares minimization subproblem.

4.1. L1-Norm Regularized Minimization Subproblem

When updating the elements in matrix, the nonupdated elements in and matrix will be fixed. Here, the objective function can be formulated aswhere is the weight value which describes the neighboring degree of and and . and are training samples that belong to the th class, and is a constant which controls the range of . Formula (7) can be simplified aswhere , , and matrix is the weight matrix expressing the sample neighboring distance. means the kernel function of the sample, and is calculated prior to dictionary updating. .
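The neighborhood weights and the resulting graph regularizer can be sketched as follows. This is an illustration under stated assumptions: a heat-kernel weight $w_{ij} = \exp(-\|x_i - x_j\|^2 / t)$ is assumed for the "neighboring degree", with `t` standing in for the constant that controls the range of the weights, and the function names are hypothetical. The sketch also uses the standard identity $\sum_{i,j} w_{ij}\|s_i - s_j\|^2 = 2\,\mathrm{tr}(S L S^{\top})$ with $L = D - W$.

```python
import numpy as np

def heat_kernel_weights(X, t=1.0):
    """Pairwise weights w_ij = exp(-||x_i - x_j||^2 / t) for the columns of X.

    X : (D, N) array, one sample of a single class per column
    t : scale constant (assumed name and default)
    """
    sq_norms = np.sum(X ** 2, axis=0)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    W = np.exp(-sq_dists / t)
    np.fill_diagonal(W, 0.0)  # self-weights do not affect the penalty
    return W

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W

def laplacian_penalty(S, W):
    """Sum over i, j of w_ij * ||s_i - s_j||^2, computed as 2 * tr(S L S^T).

    S : (K, N) array, one code vector per column
    """
    return 2.0 * np.trace(S @ graph_laplacian(W) @ S.T)
```

The trace form is what makes the regularizer convenient inside an element-wise update: it is quadratic in the codes, so fixing all but one entry leaves a one-dimensional quadratic plus the L1 term.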

In this algorithm, each element in is updated sequentially; when is updated, the other elements in the matrix are regarded as constants. After ignoring the constant term of formula (8), formula (8) can be simplified as

According to the solving method in [29], it is easy to obtain the solution of the minimum value of under the current iteration condition:where and .

4.2. L2-Norm Constrained Minimization Subproblem

When updating the dictionary matrix , and the nonupdated elements in matrix will be fixed. The objective function can be transformed into the following form:

The Lagrange multiplier method is used to optimize the above problems; then, the objective function can be reduced to

Here, is a variable. Meanwhile, the algorithm applies the Karush-Kuhn-Tucker (KKT) conditions to optimize the objective function; these conditions comprise the following three criteria:

Hence, the solution to becomeswhere and .

5. Experimental Results

In this section, we present experimental results on five benchmark databases to illustrate the effectiveness of our method. We compare the Laplace graph embedding class specific dictionary learning algorithm (LGECSDL) with some state-of-the-art methods. In the following section, we introduce the experimental environment setting, database descriptions, and experimental results. In the end, we accordingly analyze the experimental results.

5.1. Experimental Settings

In this section, we evaluate our method on five benchmark databases. The proposed LGECSDL algorithm is compared with seven other algorithms: nearest neighbor (NN) classification, collaborative representation based classification (CRC) [30], sparse representation based classification (SRC) [31], kernel-based probabilistic collaborative representation based classifier (ProKCRC) [32], VGG19 [33], kernel-based class specific dictionary learning (KCSDL) algorithm [29], and SVM [34].

There are two parameters in the objective function of the LGECSDL algorithm that need to be specified. is an important parameter in the LGECSDL algorithm which is used to adjust the trade-off between the reconstruction error and the sparsity. We increase from to in each experiment and find the best in our experiments.

is another important factor in the LGECSDL algorithm. is used to control the trade-off between the reconstruction error and the collaborative information. We increase from to and find the best in all of our experiments.

We also evaluate the effect of different kernels for the LGECSDL algorithm. Three different kernel functions are adopted: the linear kernel , the Hellinger kernel , and the polynomial kernel . Here, in our experiments, and are set to be and , respectively.
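The three kernels can be sketched with their usual textbook definitions, computed for all pairs of columns at once. The forms below are assumptions rather than the paper's exact formulas: the linear kernel as $x^{\top}y$, the Hellinger kernel as $\sum_d \sqrt{x_d\,y_d}$ (which requires nonnegative features), and the polynomial kernel as $(p + x^{\top}y)^q$. The defaults for `p` and `q` are placeholders, not the settings used in the experiments.

```python
import numpy as np

def linear_kernel(X, Y):
    """Linear kernel between the columns of X (D, N) and Y (D, M)."""
    return X.T @ Y

def hellinger_kernel(X, Y):
    """Hellinger kernel: sum over d of sqrt(x_d * y_d); features must be nonnegative."""
    if np.any(X < 0) or np.any(Y < 0):
        raise ValueError("Hellinger kernel requires nonnegative features")
    return np.sqrt(X).T @ np.sqrt(Y)

def polynomial_kernel(X, Y, p=1.0, q=2):
    """Polynomial kernel (p + x^T y)^q; p and q are placeholder defaults."""
    return (p + X.T @ Y) ** q
```

Each function returns an N-by-M Gram matrix, so the same code serves both the training Gram matrix (pass the training set twice) and the train-versus-test kernel values.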

5.2. Database Descriptions

There are five image databases involved in our experiments. The first one is the Extended YaleB database, which contains 38 categories and 2414 frontal-face images. All the images are captured under varying illumination conditions. In our experiments, the image has been cropped and normalized to pixels. Figure 1 shows several example images in the Extended YaleB database.

The second one is the AR database. The AR database contains over 3000 images of 126 individuals; images are shot under different conditions of expression, illumination, and occlusion, and each person has 26 images. Figure 3 shows some examples in the AR database.

The third database is the CMU-PIE database. The CMU-PIE database consists of 41,368 images, which are captured under different lighting conditions, poses, and expressions. The database contains 68 individuals in total, and each person was captured under 43 different illumination conditions and 13 different poses. We selected two types of images to carry out our experiment: five near-frontal poses and all different illumination conditions. We chose 11,554 images in total for our evaluation. Each person has about 170 images. Figure 5 shows some example images in the CMU-PIE database.

We also selected the Caltech101 database to verify the LGECSDL algorithm. The Caltech101 database contains 9144 images belonging to 101 categories; each class has 31 to 800 images. We selected 5 images as training images in each class and the rest as test images. Figure 7 shows some examples in the Caltech101 database.

The fifth database is the Oxford-102 flower database, which contains 8,189 flower images belonging to 102 categories. Each category contains 40 to 250 images, and the minimum edge length of each image is greater than 500 pixels. The Oxford-102 flower database contains pictures of flowers taken in different illumination, angle, and occlusion environments, and each kind of flower image has a high degree of similarity. In our experiments, all the images are manually cropped and resized to pixels. Figure 9 shows several images in the Oxford-102 flower database.

5.3. Experiments on the Extended YaleB Database

We randomly selected 5 images as the training samples in each category and 10 images as the testing samples. In our experiments, we set the weight of the sparsity term as , , and for the linear kernel, Hellinger kernel, and polynomial kernel, respectively. The optimal is , , and for the linear kernel, Hellinger kernel, and polynomial kernel, respectively. We independently performed all the methods ten times and then reported the average recognition rates. Table 1 shows the recognition rates of all the algorithms using different kernel methods.
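The evaluation protocol described here (a fresh random per-class split on every run, with the recognition rate averaged over ten runs) can be sketched as below. This is an illustrative harness, not the authors' code: the function names, the seed handling, and the plain nearest neighbor classifier used as a stand-in are all assumptions.

```python
import numpy as np

def split_per_class(labels, n_train, n_test, rng):
    """Draw n_train training and n_test testing indices at random from each class."""
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:n_train + n_test])
    return np.array(train_idx), np.array(test_idx)

def nearest_neighbor(X_train, y_train, x):
    """Stand-in classifier: label of the closest training column."""
    dists = np.linalg.norm(X_train - x[:, None], axis=0)
    return y_train[int(np.argmin(dists))]

def mean_accuracy(X, labels, classify, n_train=5, n_test=10, n_runs=10, seed=0):
    """Average recognition rate over n_runs independent random splits.

    X        : (D, N) array, one sample per column
    classify : callable (X_train, y_train, x) -> predicted label
    """
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_runs):
        tr, te = split_per_class(labels, n_train, n_test, rng)
        preds = np.array([classify(X[:, tr], labels[tr], X[:, i]) for i in te])
        accuracies.append(np.mean(preds == labels[te]))
    return float(np.mean(accuracies))
```

Any classifier with the same call signature can be passed as `classify`, which keeps the split and averaging logic identical across the compared methods.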

From Table 1, we can clearly see that LGECSDL achieves the best recognition rates of , , and in the linear kernel, Hellinger kernel, and polynomial kernel space, respectively, while KCSDL, the second best method, arrives at , , and . Since illumination variations of images are relatively large in the Extended YaleB database, these experiment results validate the effectiveness and robustness of LGECSDL for image recognition with illumination variations. VGG19 neural network in this experiment can only achieve the highest recognition rate of . Using a small database to train neural networks does not take advantage of neural networks. We also verify the effect of and on the LGECSDL algorithm, and the experimental results are shown in Figure 2.

From Figure 2, we can easily know that the LGECSDL algorithm has achieved better recognition results in Hellinger kernel space. With the parameter varied from to , the recognition rate increased gradually and then decreased. The influence of parameter on the LGECSDL algorithm is similar to that of the parameter . The highest recognition rate was achieved when and in Hellinger kernel space. In the linear kernel space, the recognition rate achieves the maximum value at and , and in the polynomial kernel space, the maximum recognition rate was obtained at and .

5.4. AR Database

In this experiment, we randomly selected 5 images of each individual as training samples and used the rest for testing. Each image has been cropped to 32 × 32 and reshaped into a column vector, and the image vectors are then normalized. All the methods are independently run ten times, and the average recognition rates are reported. The recognition rates on the AR database are shown in Table 2.

From Table 2, we can clearly see that LGECSDL algorithm outperforms the other methods in all kernel spaces. The LGECSDL algorithm achieves the best recognition rate of in the polynomial kernel space; in the linear kernel space and Hellinger kernel space, the recognition rates are and , respectively.

Moreover, we can also know from Table 2 that the KCSDL algorithm achieves the best recognition rate of in the linear kernel space, which is the highest one among the other methods. From these experimental results, we further confirm the effectiveness and robustness of the LGECSDL algorithm for image recognition with illumination variations and expression changes. We also verify the effect of and on the AR database; Figure 4 shows the experiment results.

From Figure 4, we can clearly see that the recognition rate reached the maximum value when is and is in Hellinger kernel space and polynomial kernel space; in the linear kernel space, the recognition rate achieves the highest value when is equal to and is . With changed from to , the recognition rate increased first and then decreased. The parameter shows a similar trend, and when is greater than , the recognition rate decreases rapidly.

5.5. CMU-PIE Database

In this experiment, we chose the CMU-PIE database to evaluate the performance of the LGECSDL algorithm. Five images of each individual are randomly selected for training and the remainder for testing. We also cropped each image to and then pulled them into a column vector. Finally, we normalized all the vectors by normalization. We independently ran all the methods ten times and then reported the average recognition rates. Table 3 gives the recognition rates of all the methods under different kernel spaces.

From Table 3, we can see that the LGECSDL algorithm always achieves the highest recognition rates under all different kernel spaces. In the polynomial kernel space, the LGECSDL algorithm outperforms KCSDL, which achieves the second highest recognition rate, by more than improvement of recognition rate. In Hellinger kernel space, the LGECSDL achieves the best recognition rate of and points higher than the KCSDL algorithm. In the linear kernel space, the face recognition rate of LGECSDL and KCSDL is and , respectively. From these experimental results, we confirm the effectiveness and robustness of the LGECSDL algorithm. We also evaluate the effect of and on the CMU-PIE database; Figure 6 shows the experiment results.

From Figure 6, we can see that when is and is , the face recognition rate reaches the highest value of in Hellinger kernel space, and when is and is , the highest face recognition rate is obtained in the polynomial kernel space. We can also know from Figure 6 that when is greater or less than the maximum value, the recognition rate decreases rapidly, and parameter also has the same effect on the face recognition rate.

5.6. Caltech101 Database

In this experiment, we further evaluate the performance of the LGECSDL algorithm for image recognition on the Caltech101 database. Figure 7 shows some examples in the Caltech101 database. We randomly split the Caltech101 database into two parts. Part one, which contains about 5 images of each subject, is used as a training set, and the other part is used as a testing set. We use the VGG_ILSVRC_19_layers model to obtain the features of each image. Here, we employ the second fully connected layer outputs as the input features whose dimension size is 4096. We independently ran all the methods ten times and then gave the average recognition rates of all the methods in Figure 8.

LGECSDL-H represents the LGECSDL algorithm in Hellinger kernel space, and LGECSDL-P represents the LGECSDL algorithm in the polynomial kernel space; CSDL-H and CSDL-P are defined analogously for the CSDL algorithm. From Figure 8, we can see that the LGECSDL-H algorithm achieves the highest recognition rate, and LGECSDL-P the second highest. More concretely, LGECSDL-H and LGECSDL-P achieve the recognition rates of and , respectively, while CSDL-P, the third best method, arrives at . Experimental results show that training the VGG19 network with a small database did not give the desired results. The VGG19 network achieved the recognition rate of .

We also measured the computational time of each method on the Caltech101 database. The experimental environment is as follows: Core i7 CPU (2.4 GHz), 8 GB memory, Windows 7 operating system, and an NVIDIA Quadro K2100M graphics processor. From Table 4, we can see that the VGG19 method is the fastest. This is mainly due to the GPU-accelerated architecture of the neural network, which saves most of the computational time. The second fastest is SVM, followed by the CSDL-H algorithm. The LGECSDL-H algorithm needs 109.50 milliseconds to classify each picture, whereas the LGECSDL-P algorithm requires 112.25 milliseconds.

5.7. Oxford-102 Flower Database

In this experiment, we chose the Oxford-102 database to evaluate the performance of the LGECSDL algorithm on fine-grained image classification. Five images of each category are randomly selected for training, and the rest of the images are used for testing. The image features are obtained from the outputs of a pretrained VGG_ILSVRC_19_layers network, which contains sixteen convolutional and three fully connected layers. Here, we use the second fully connected layer outputs as the input features, whose dimension is 4096. We independently ran all the methods ten times and then reported the average recognition rates. The best recognition rates of all the methods are presented in Figure 10.

From Figure 10, we can clearly know that the LGECSDL achieves the best recognition rates of and in polynomial kernel space and Hellinger kernel space, respectively, while CRC arrives at , which is the highest one among those of the other methods. LGECSDL-H and LGECSDL-P outperform VGG19, SVM, SRC, CRC, ProKCRC, CSDL-H, and CSDL-P by at least improvement of recognition rate. The experimental results show that, in the small database experiment, other methods have a higher recognition rate than the VGG19 neural network. The classical method has more advantages in the small sample base experiment.

We also verify the performance of the LGECSDL algorithm with different values of or in different kernel spaces on the Oxford-102 database. The performance with different values of or is reported in Figures 11 and 12.

From Figure 11, we can see that the LGECSDL algorithm achieves the maximum value when and ; when is fixed, the recognition rate increases firstly and then decreases with the increase of . Similarly, the recognition rate also increases firstly and then decreases with the increase of .

From Figure 12, we can see that the LGECSDL algorithm in the polynomial kernel space achieves the maximum recognition rate when and . In the polynomial kernel space, the influence of and on the algorithm is similar to that in Hellinger kernel space.

6. Conclusion

We present a novel Laplace graph embedding class specific dictionary learning algorithm with kernels. The proposed LGECSDL algorithm improves on classical classification algorithms in three ways. First, it incorporates the discriminant ability of sparse representation to enhance the interpretability of face recognition. Second, it greatly reduces the residual error through Laplace constraint dictionary learning. Third, it finds the nonlinear structure hidden in face images by extending the LGECSDL algorithm to arbitrary kernel space. Experimental results on several publicly available databases have demonstrated that LGECSDL can provide superior performance to the traditional face recognition approaches.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This paper is supported partly by the National Natural Science Foundation of China (Grants nos. 61402535 and 61271407), the Natural Science Foundation for Youths of Shandong Province, China (Grant no. ZR2014FQ001), the Fundamental Research Funds for the Central Universities, China University of Petroleum (East China) (Grant no. 16CX02060A), and the International S&T Cooperation Program of China (Grant no. 2015DFG12050).