Journal of Electrical and Computer Engineering

Volume 2018 (2018), Article ID 2179049, 11 pages

https://doi.org/10.1155/2018/2179049

## Laplace Graph Embedding Class Specific Dictionary Learning for Face Recognition

College of Information and Control Engineering, China University of Petroleum (East China), Qingdao, China

Correspondence should be addressed to Yan-Jiang Wang

Received 23 September 2017; Revised 2 December 2017; Accepted 6 December 2017; Published 7 February 2018

Academic Editor: Tongliang Liu

Copyright © 2018 Li Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The sparse representation based classification (SRC) and collaborative representation based classification (CRC) methods have attracted increasing attention in recent years due to their promising results and robustness. However, both SRC and CRC directly use the training samples as the dictionary, which leads to a large fitting error. In this paper, we propose the Laplace graph embedding class specific dictionary learning (LGECSDL) algorithm, which trains a weight matrix and embeds a Laplace graph to reconstruct the dictionary. First, it can increase the dimension of the dictionary matrix, which makes it suitable for classifying small sample databases. Second, it assigns different weights to different dictionary atoms to improve classification accuracy. Additionally, when training each class dictionary, the LGECSDL algorithm introduces Laplace graph embedding into the objective function in order to preserve the local structure of each class. Through the class specific dictionary and the Laplace graph embedding regularizer, the proposed method improves face recognition performance. Moreover, we extend the proposed method to an arbitrary kernel space. Extensive experimental results on several face recognition benchmark databases demonstrate the superior performance of our proposed algorithm.

#### 1. Introduction

Sparse representation based on dictionary learning is attracting increasing attention in computer vision due to its impressive performance in many applications, such as image processing, image ranking [1], human activity recognition [2], and image classification [3, 4]. Unlike traditional subspace methods such as PCA, sparse representation allows the number of dictionary atoms to be much larger than the dimension of the sample features, so samples can be fitted more effectively.

Deep learning-based methods are currently the mainstream approach to image classification. Taigman et al. [5] proposed the DeepFace neural network for face recognition, which achieved human-level performance. Ding and Tao [6] proposed a comprehensive framework based on convolutional neural networks to overcome the challenges of video-based face recognition. Schroff et al. [7] proposed FaceNet, which learns a mapping from face images to a compact Euclidean space. Sun et al. [8] proposed the DeepID2+ convolutional network, which increases the dimension of hidden representations and adds supervision to early convolutional layers. Liu et al. [9] proposed a multipatch deep CNN and deep metric learning method to extract discriminative features for face recognition. However, deep learning methods perform well only when the sample size is large, and their performance is unsatisfactory on small databases. Therefore, we propose a dictionary learning method based on Laplacian embedding and sparse representation, which can still achieve good results when very few samples are available.

Sparse representation based classifiers have been widely used in face recognition. Classification normally involves two stages: first, sample features are extracted, and then they are fed to a classifier. Many subspace methods have been proposed for feature extraction. Principal component analysis reduces a complex data set to a lower dimension to reveal hidden, simplified data structures [10]. Linear discriminant analysis (LDA) finds the projection hyperplane that minimizes the intraclass variance and maximizes the distance between the projected class means [11]. Tao et al. [12] proposed general tensor discriminant analysis as a preprocessing step for LDA to reduce the undersampling problem. The locality preserving projection algorithm preserves the neighborhood structure of the data [13]. For classification, Liu et al. [14] proposed a belief-based k-nearest neighbor classifier that makes the classification result more robust to misclassification errors. Noh et al. [15] proposed a nearest neighbor algorithm that improves nearest neighbor classification by learning a local metric. Although the k-nearest neighbor and nearest neighbor classifiers have achieved good results on some data sets, they do not select the most discriminative features of the samples for classification. Therefore, subspace-based classifier design methods were proposed to improve classification performance.

The sparse representation based classification algorithm uses training samples to construct an overcomplete dictionary, and a test sample can be well represented as a sparse linear combination of the dictionary elements [16]. However, subsequent research showed that sparsity alone cannot extract the most discriminative features of the samples. Collaborative representation based classification (CRC) was then proposed, which uses an L2-norm constraint to reveal the internal structure of the test sample [17]. Although SRC and CRC have achieved superior performance in visual recognition, both use the training samples directly as the dictionary matrix. Building the dictionary directly from training samples has two drawbacks. First, when there are very few samples, an overcomplete dictionary cannot be built, which may result in low classification accuracy. Second, when the dictionary samples are highly redundant, the original signals cannot be expressed effectively, resulting in poor classifier performance.

Therefore, dictionary learning methods have been proposed to improve classification performance. Discriminative dictionary learning approaches can be divided into three types: shared dictionary learning, class specific dictionary learning, and hybrid dictionary learning. Shared dictionary learning usually uses all training samples to learn one classification dictionary. Lu et al. [18] proposed a locality weighted sparse representation based classification (WSRC) method, which uses both data locality and linearity to train a classification dictionary. Yang et al. [19] proposed a dictionary learning method based on the Fisher discrimination criterion to improve pattern classification performance. Yang et al. [20] proposed a latent dictionary learning method that learns a discriminative dictionary and adaptively builds its relationship to class labels. Jiang et al. [21] proposed an algorithm that learns a single overcomplete dictionary and an optimal linear classifier for face recognition. Zhou et al. [22] presented a dictionary learning algorithm that exploits the visual correlation within a group of visually similar object categories, modeling a commonly shared dictionary and multiple category-specific dictionaries. Class specific dictionary learning trains a dictionary for each class of samples. Sun et al. [23] learned a class specific subdictionary for each class and a common subdictionary shared by all classes to improve classification performance. Wang and Kong [24] proposed a method that explicitly learns a class specific dictionary for each category, capturing the most discriminative features of that category. It simultaneously learns a common pattern pool, whose atoms are shared by all categories and contribute only to representing the data, not to discrimination.

Hybrid dictionary learning combines the above two approaches. Rodriguez and Sapiro [25] proposed a dictionary learning method that uses a class-dependent supervised constraint and an orthogonal constraint. It learns the intraclass structure while increasing interclass discrimination. Gao et al. [26] learned a category-specific dictionary for each category and a dictionary shared by all categories, improving conventional basic-level object categorization. Liu et al. [27] proposed a locality sensitive dictionary learning algorithm with global consistency and smoothness constraints to overcome the restriction of linearity at a relatively low cost. Although hybrid dictionary learning methods achieve good results, they usually operate in the original Euclidean space and therefore cannot capture nonlinear structures hidden in the data. Many kernel-based classifiers have been designed to solve this problem. Nguyen et al. [28] presented a kernel-based dictionary learning method for sparse representation. Liu et al. [29] proposed a multiple-view self-explanatory sparse representation dictionary learning algorithm (MSSR) that captures and combines various salient regions and structures from different kernel spaces, and this method achieved superior performance in face recognition.

Although the MSSR algorithm achieves good results, it neither considers the local details of the training samples in the original sample space nor preserves this information, which is useful for classification, in the dictionary space. Therefore, in our algorithm, a Laplace constraint is added to the objective function. This ensures that samples that are close in the original low dimensional space also remain close in the high dimensional dictionary space.

Motivated by this, we propose a Laplace graph embedding class specific dictionary learning algorithm and extend it to an arbitrary kernel space. The main contributions are fourfold. (1) We propose a Laplace embedding sparse representation algorithm. It combines the discriminative ability of SRC with Laplace embedding, which preserves the intrinsic local geometric structure of the sample features. (2) We propose a Laplace embedding constrained dictionary learning algorithm to construct a better subspace and reduce the residual error. (3) We extend this algorithm to an arbitrary kernel space to capture the nonlinear structure of face images. (4) Experimental results on several benchmark databases demonstrate the superior performance of our proposed algorithm.

The rest of the paper is organized as follows. Section 2 reviews SRC, CRC, and the class specific dictionary learning methods on which our approach builds. Section 3 presents our Laplace graph embedding class specific dictionary learning algorithm with kernels. Section 4 describes how the objective function is minimized. Section 5 reports experimental results and analysis. Finally, Section 6 presents discussions and conclusions.

#### 2. Overview of SRC and CRC

In this section, we briefly review two classical face recognition algorithms, SRC and CRC, together with the class specific dictionary learning methods (SSSR and KCSDL) that extend them.

Suppose that there are $C$ classes in the training samples and the $c$th class has $N_c$ elements. $N = \sum_{c=1}^{C} N_c$ represents the total number of training samples, and $X = [X_1, X_2, \ldots, X_C] \in \mathbb{R}^{D \times N}$ represents all the training samples. Here $X_c \in \mathbb{R}^{D \times N_c}$ ($c = 1, \ldots, C$) represents the $c$th class of training samples, and $D$ represents the dimension of the sample features. Supposing that $y \in \mathbb{R}^{D}$ is a test sample, the sparse representation of $y$ can be expressed as

$$\min_{s}\ \|y - Xs\|_2^2 + 2\alpha\|s\|_1, \tag{1}$$

where $s = [s_1; s_2; \ldots; s_C]$ and $s_c$ is the sparse coding of $y$ over the $c$th subdictionary $X_c$. The regularization parameter $\alpha$ controls the trade-off between sparsity and representation accuracy.
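Formula (1) is a Lasso problem. The following is a minimal NumPy sketch that assumes samples are stored as columns of $X$. It solves formula (1) with ISTA (iterative soft thresholding), a generic solver substituted here rather than the solver used in [16]. It then applies the usual SRC decision rule: assign $y$ to the class with the smallest class-wise residual $\|y - X_c s_c\|_2$. The function names and toy data are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def src_code(X, y, alpha, n_iter=1000):
    """ISTA for min_s ||y - X s||_2^2 + 2 * alpha * ||s||_1 (formula (1))."""
    lip = 2.0 * np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient of ||y - Xs||^2
    s = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ s - y)
        s = soft_threshold(s - grad / lip, 2.0 * alpha / lip)
    return s


def src_classify(X, labels, y, alpha=0.01):
    """Assign y to the class whose coefficients s_c give the smallest residual."""
    s = src_code(X, y, alpha)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - X[:, labels == c] @ s[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))], s


# Toy usage: two classes, five unit-norm training samples each, D = 20.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
X /= np.linalg.norm(X, axis=0)
labels = np.repeat([0, 1], 5)
y = X[:, 5:] @ np.array([0.5, 0.3, 0.0, 0.0, 0.2])  # built from class-1 samples only
pred, s = src_classify(X, labels, y)
```

Because $y$ lies in the span of the class-1 samples, the class-1 residual is small and the predicted label is 1.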

The collaborative representation based classification algorithm applies an L2-norm constraint in the objective function. The objective of CRC can be written as

$$\min_{s}\ \|y - Xs\|_2^2 + \lambda\|s\|_2^2, \tag{2}$$

where $\lambda$ is the regularization parameter that controls the representation accuracy of the objective function.
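Unlike formula (1), formula (2) has a closed-form minimizer. Setting the gradient to zero gives $s = (X^TX + \lambda I)^{-1}X^Ty$. The NumPy sketch below, with hypothetical function names, computes this solution. For the decision step it uses the class-wise regularized residual $\|y - X_cs_c\|_2/\|s_c\|_2$ of the original CRC work [17]. That rule is stated in this example, not in formula (2).

```python
import numpy as np


def crc_code(X, y, lam):
    """Closed-form minimizer of ||y - X s||_2^2 + lam * ||s||_2^2 (formula (2))."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)


def crc_classify(X, labels, y, lam=0.01, eps=1e-12):
    """CRC decision rule: regularized class residual ||y - X_c s_c|| / ||s_c||."""
    s = crc_code(X, y, lam)
    classes = np.unique(labels)
    scores = []
    for c in classes:
        idx = labels == c
        residual = np.linalg.norm(y - X[:, idx] @ s[idx])
        scores.append(residual / (np.linalg.norm(s[idx]) + eps))
    return classes[int(np.argmin(scores))], s


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 10))
X /= np.linalg.norm(X, axis=0)
labels = np.repeat([0, 1], 5)
y = X[:, :5] @ np.array([0.4, 0.0, 0.3, 0.3, 0.0])  # built from class-0 samples only
pred, s = crc_classify(X, labels, y)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is the usual, numerically safer choice. The matrix $(X^TX + \lambda I)$ does not depend on $y$, so it can also be factorized once and reused for every test sample.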

Both SRC and CRC directly use the training samples as the dictionary, so each atom in the dictionary contributes equally to the representation of a sample. The test sample can be encoded as

$$y \approx X_c s_c, \tag{3}$$

where $X_c$ is the dictionary matrix composed of the $c$th class training samples and $s_c$ is the sparse coding of $y$.

Directly using the training samples as the dictionary leads to a high residual error. Liu et al. [29] proposed a single-view self-explanatory sparse representation dictionary learning algorithm (SSSR). Let $C$ denote the number of classes of the training samples and $X_c$ the collection of sample features of class $c$. The objective function of SSSR can be formulated as

$$\min_{W_c, S_c}\ \|X_c - X_cW_cS_c\|_F^2 + 2\alpha\sum_{n=1}^{N_c}\|s_c^n\|_1 \quad \text{s.t.}\ \|X_cw_c^k\|_2^2 \le 1,\ k = 1, \ldots, K, \tag{4}$$

where $S_c \in \mathbb{R}^{K \times N_c}$ holds the sparse codes of the $c$th class, $s_c^n$ represents the $n$th column of $S_c$, and $w_c^k$ is the $k$th column of $W_c$. SSSR reconstructs the dictionary matrix as $X_cW_c$. Here $W_c \in \mathbb{R}^{N_c \times K}$ is the dictionary weight matrix, $N_c$ is the number of samples in the $c$th class, and $K$ is the number of atoms in the learned dictionary. $W_c$ expands the original dictionary space into a more complete dictionary space. When $W_c$ is the identity matrix, class specific dictionary learning reduces to SRC. The weight matrix $W_c$ makes the dictionary more flexible in representing samples, and the reconstruction error may be reduced as well.
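The minimal NumPy sketch below makes the role of $W_c$ concrete; the helper names are hypothetical. It evaluates the SSSR objective and shows that $W_c = I$ reproduces the raw training set used by SRC. It also shows that a $W_c$ with $K > N_c$ columns produces a larger dictionary, whose atoms are rescaled to satisfy the unit-norm constraint. It only illustrates the model and does not learn $W_c$.

```python
import numpy as np


def sssr_objective(Xc, Wc, Sc, alpha):
    """||Xc - Xc Wc Sc||_F^2 + 2 * alpha * sum_n ||s_c^n||_1 (formula (4) without its constraint)."""
    residual = Xc - Xc @ Wc @ Sc
    return np.linalg.norm(residual, "fro") ** 2 + 2.0 * alpha * np.abs(Sc).sum()


def normalize_atoms(Xc, Wc):
    """Rescale each column w_c^k so that the atom Xc @ w_c^k has unit l2 norm."""
    norms = np.linalg.norm(Xc @ Wc, axis=0)
    return Wc / norms


rng = np.random.default_rng(2)
D, Nc, K = 30, 5, 8
Xc = rng.normal(size=(D, Nc))

# W_c = I: the learned dictionary Xc @ I is the raw training set used by SRC.
I = np.eye(Nc)
obj_identity = sssr_objective(Xc, I, I, alpha=0.1)  # zero residual, only the l1 term remains

# A W_c with K > N_c yields K atoms, each a weighted combination of the class-c samples.
Wc = normalize_atoms(Xc, rng.normal(size=(Nc, K)))
dictionary = Xc @ Wc
```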

Meanwhile, Liu et al. [29] extended SSSR into kernel spaces, mapping the original sample features into a high dimensional nonlinear space to better mine the nonlinear relationships between samples. The objective function of the multiple-view kernel-based class specific dictionary learning algorithm (KCSDL) is

$$\min_{W_c, S_c}\ \|\phi(X_c) - \phi(X_c)W_cS_c\|_F^2 + 2\alpha\sum_{n=1}^{N_c}\|s_c^n\|_1 \quad \text{s.t.}\ \|\phi(X_c)w_c^k\|_2^2 \le 1,\ k = 1, \ldots, K, \tag{5}$$

where $\phi(\cdot)$ is the feature mapping induced by the kernel function $\kappa(x, x') = \phi(x)^T\phi(x')$; it maps the original feature space into a high dimensional kernel space.
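Formula (5) never needs $\phi$ explicitly. Expanding the Frobenius norm gives $\operatorname{tr}\{G - 2GW_cS_c + S_c^TW_c^TGW_cS_c\}$ with $G = \phi(X_c)^T\phi(X_c)$. The sketch below checks this identity numerically for the homogeneous quadratic kernel $(x^Ty)^2$. This kernel is chosen only because its feature map $\phi(x) = \mathrm{vec}(xx^T)$ is easy to write out; the helper names are hypothetical.

```python
import numpy as np


def kernel_recon_error(G, W, S):
    """||phi(X) - phi(X) W S||_F^2 computed from the Gram matrix G = phi(X)^T phi(X) only."""
    WS = W @ S
    return np.trace(G) - 2.0 * np.trace(G @ WS) + np.trace(WS.T @ G @ WS)


def quad_feature_map(X):
    """Explicit feature map of the kernel (x^T y)^2: phi(x) = vec(x x^T), one column per sample."""
    return np.stack([np.outer(x, x).ravel() for x in X.T], axis=1)


rng = np.random.default_rng(3)
Xc = rng.normal(size=(4, 6))   # D = 4 features, N_c = 6 samples
W = rng.normal(size=(6, 8))    # K = 8 atoms
S = rng.normal(size=(8, 6))

G = (Xc.T @ Xc) ** 2           # kernel matrix, computed without phi
Phi = quad_feature_map(Xc)     # 16 x 6 explicit features
explicit = np.linalg.norm(Phi - Phi @ W @ S, "fro") ** 2
implicit = kernel_recon_error(G, W, S)
```

The implicit form costs $O(N_c^2K)$ regardless of the feature-space dimension. This property is what makes infinite-dimensional kernels usable in formula (5).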

#### 3. Our Proposed Approach

Although the above methods have achieved good results in face recognition, they still have some deficiencies. SSSR uses a reconstructed dictionary matrix to compute sparse representations of the samples. However, it imposes only a sparsity constraint on the codes, and sparsity alone is not sufficient to obtain good classification results.

Motivated by this, we propose a sparse representation algorithm based on Laplace graph embedding. While computing sparse representations of the samples, the algorithm also mines the local structure implicit in the training samples. As a result, samples of the same class are more concentrated in the sparse representation space, which reduces the fitting error and improves classification.

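The objective function of our proposed sparse representation algorithm based on Laplace graph embedding becomes

$$\min_{W_c, S_c}\ \|\phi(X_c) - \phi(X_c)W_cS_c\|_F^2 + 2\alpha\sum_{n=1}^{N_c}\|s_c^n\|_1 + \beta\sum_{i=1}^{N_c}\sum_{j=1}^{N_c}M_{ij}\|s_c^i - s_c^j\|_2^2 \quad \text{s.t.}\ \|\phi(X_c)w_c^k\|_2^2 \le 1,\ k = 1, \ldots, K, \tag{6}$$

where $X_c$ denotes the class-$c$ training samples and $W_c$ the dictionary weight matrix. $S_c$ is the sparse representation of the class-$c$ samples over the dictionary $\phi(X_c)W_c$, and $s_c^n$ is the $n$th column of $S_c$. $M_{ij}$ is the neighborhood weight between the $i$th and $j$th class-$c$ samples (defined in Section 4.1), and $\beta$ is the weight of the Laplace graph embedding term.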
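The Laplace graph embedding term in formula (6) can be evaluated through the graph Laplacian $L = D - M$, with $D_{ii} = \sum_j M_{ij}$. For symmetric $M$, the identity $\sum_{i,j}M_{ij}\|s_c^i - s_c^j\|_2^2 = 2\operatorname{tr}(S_cLS_c^T)$ holds. The sketch below assumes heat-kernel weights $M_{ij} = \exp(-\|x_i - x_j\|_2^2/t)$ over all pairs of class-$c$ samples, which is our assumption of a fully connected class graph. It evaluates formula (6) from a precomputed kernel matrix; the function names are hypothetical.

```python
import numpy as np


def heat_kernel_weights(Xc, t):
    """M_ij = exp(-||x_i - x_j||^2 / t) for all pairs of columns of Xc (assumed weighting)."""
    sq = np.sum(Xc ** 2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Xc.T @ Xc, 0.0)
    return np.exp(-d2 / t)


def graph_laplacian(M):
    """L = D - M with D_ii = sum_j M_ij."""
    return np.diag(M.sum(axis=1)) - M


def lgecsdl_objective(G, W, S, M, alpha, beta):
    """Objective of formula (6), using the kernel matrix G = phi(Xc)^T phi(Xc)."""
    WS = W @ S
    recon = np.trace(G) - 2.0 * np.trace(G @ WS) + np.trace(WS.T @ G @ WS)
    sparsity = 2.0 * alpha * np.abs(S).sum()
    laplace = 2.0 * beta * np.trace(S @ graph_laplacian(M) @ S.T)  # = beta * sum_ij M_ij ||s_i - s_j||^2
    return recon + sparsity + laplace


rng = np.random.default_rng(4)
Xc = rng.normal(size=(10, 6))
S = rng.normal(size=(8, 6))
M = heat_kernel_weights(Xc, t=10.0)
pairwise = sum(M[i, j] * np.sum((S[:, i] - S[:, j]) ** 2) for i in range(6) for j in range(6))
trace_form = 2.0 * np.trace(S @ graph_laplacian(M) @ S.T)
```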

#### 4. Optimization of the Objective Function

In this section, we solve the optimization problem of the proposed Laplace graph embedding class specific dictionary learning algorithm. The dictionary weight matrix $W_c$ and the sparse representation matrix $S_c$ are optimized alternately by an iterative approach.

When each column of $W_c$ is updated, the remaining columns of $W_c$ and the matrix $S_c$ are fixed. The objective function then becomes an L2-norm constrained least-squares minimization subproblem. Similarly, when each element of $S_c$ is updated, $W_c$ and the remaining elements of $S_c$ are fixed, and the objective function becomes an L1-norm regularized least-squares minimization subproblem.

##### 4.1. L1-Norm Regularized Minimization Subproblem

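When updating the elements of $S_c$, $W_c$ and the elements of $S_c$ not being updated are fixed. The objective function can then be formulated as

$$\min_{S_c}\ \|\phi(X_c) - \phi(X_c)W_cS_c\|_F^2 + 2\alpha\sum_{n=1}^{N_c}\|s_c^n\|_1 + \beta\sum_{i=1}^{N_c}\sum_{j=1}^{N_c}M_{ij}\|s_c^i - s_c^j\|_2^2, \tag{7}$$

where $M_{ij} = \exp\left(-\|x_i - x_j\|_2^2/t\right)$ is the weight describing the neighboring degree of $x_i$ and $x_j$. Here $x_i$ and $x_j$ are training samples belonging to the $c$th class, and $t$ is a constant that controls the range of $M_{ij}$. Formula (7) can be simplified as

$$f(S_c) = \operatorname{tr}\left\{G_c - 2G_cW_cS_c + S_c^TW_c^TG_cW_cS_c\right\} + 2\alpha\sum_{n=1}^{N_c}\|s_c^n\|_1 + 2\beta\operatorname{tr}\left(S_cLS_c^T\right), \tag{8}$$

where $L = D - M$ is the graph Laplacian and $D$ is the diagonal matrix with $D_{ii} = \sum_j M_{ij}$. The matrix $M$ is the weight matrix expressing the neighboring distances between samples. $G_c = \kappa(X_c, X_c) = \phi(X_c)^T\phi(X_c)$ is the kernel matrix of the samples, and it is calculated prior to dictionary updating. The last term follows from $\sum_{i,j}M_{ij}\|s_c^i - s_c^j\|_2^2 = 2\operatorname{tr}(S_cLS_c^T)$ for symmetric $M$.

In this algorithm, the elements of $S_c$ are updated sequentially. When $S_{kn}$, the element in the $k$th row and $n$th column of $S_c$, is updated, the other elements of $S_c$ are regarded as constants. Ignoring the constant terms, formula (8) can be simplified as

$$f(S_{kn}) = \left(E_{kk} + 2\beta L_{nn}\right)S_{kn}^2 - 2S_{kn}g_{kn} + 2\alpha|S_{kn}|, \tag{9}$$

where $E = W_c^TG_cW_c$ and $g_{kn} = [W_c^TG_c]_{kn} - [E\tilde{S}^{kn}]_{kn} - 2\beta[\tilde{S}^{kn}L]_{kn}$. Here $\tilde{S}^{kn}$ is $S_c$ with its $(k, n)$ element set to zero. Following the solution method in [29], the minimizer of $f(S_{kn})$ under the current iteration is

$$S_{kn} = \frac{\max\{g_{kn}, \alpha\} + \min\{g_{kn}, -\alpha\}}{E_{kk} + 2\beta L_{nn}}. \tag{10}$$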
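The element-wise rule of formulas (9) and (10) is a soft-thresholding step on $g_{kn}$, the part of the gradient contributed by all other entries of $S_c$. The NumPy sketch below implements one sweep of this rule; the function names are hypothetical. It uses a linear kernel and a heat-kernel graph chosen only for the toy data, and it checks that repeated sweeps never increase $f(S_c)$. This monotonicity holds because each update exactly minimizes a convex one-dimensional function.

```python
import numpy as np


def update_codes(G, W, S, L, alpha, beta):
    """One sweep of the element-wise update S_kn <- soft(g_kn, alpha) / a_kn over all entries."""
    E = W.T @ G @ W            # E = W_c^T G_c W_c
    B = W.T @ G                # entries [W_c^T G_c]_kn
    S = S.copy()
    n_atoms, n_samples = S.shape
    for k in range(n_atoms):
        for n in range(n_samples):
            # g_kn: contributions of all other entries, i.e., S with S_kn set to zero
            g = (B[k, n]
                 - (E[k, :] @ S[:, n] - E[k, k] * S[k, n])
                 - 2.0 * beta * (S[k, :] @ L[:, n] - S[k, n] * L[n, n]))
            a = E[k, k] + 2.0 * beta * L[n, n]
            S[k, n] = (max(g, alpha) + min(g, -alpha)) / a  # soft thresholding of g by alpha, scaled by 1/a
    return S


def objective(G, W, S, L, alpha, beta):
    """f(S_c): kernel reconstruction error + l1 penalty + Laplace term."""
    WS = W @ S
    return (np.trace(G) - 2.0 * np.trace(G @ WS) + np.trace(WS.T @ G @ WS)
            + 2.0 * alpha * np.abs(S).sum() + 2.0 * beta * np.trace(S @ L @ S.T))


rng = np.random.default_rng(5)
Xc = rng.normal(size=(12, 6))
G = Xc.T @ Xc                                       # linear kernel, for illustration
W = rng.normal(size=(6, 8))
W /= np.sqrt(np.einsum("ik,ij,jk->k", W, G, W))     # unit atoms: w_k^T G w_k = 1
sq = np.sum(Xc ** 2, axis=0)
M = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * G, 0.0) / 10.0)
L = np.diag(M.sum(axis=1)) - M

S = np.zeros((8, 6))
history = [objective(G, W, S, L, 0.05, 0.1)]
for _ in range(20):
    S = update_codes(G, W, S, L, 0.05, 0.1)
    history.append(objective(G, W, S, L, 0.05, 0.1))
```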

##### 4.2. L2-Norm Constrained Minimization Subproblem

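When updating the dictionary weight matrix $W_c$, $S_c$ and the columns of $W_c$ not being updated are fixed. Because the sparsity and Laplace terms do not depend on $W_c$, the objective function becomes

$$\min_{W_c}\ \|\phi(X_c) - \phi(X_c)W_cS_c\|_F^2 \quad \text{s.t.}\ \|\phi(X_c)w_c^k\|_2^2 \le 1,\ k = 1, \ldots, K. \tag{11}$$

The Lagrange multiplier method is used to handle the constraint. We write $G_c = \kappa(X_c, X_c)$ and $F = S_cS_c^T$, and we let $[S_c]_{k\cdot}$ denote the $k$th row of $S_c$ and $\tilde{W}^k$ denote $W_c$ with its $k$th column set to zero. The Lagrangian with respect to the $k$th column $w_c^k$ then reduces to

$$\mathcal{L}(w_c^k, \lambda_k) = F_{kk}(w_c^k)^TG_cw_c^k + 2(w_c^k)^TG_c[\tilde{W}^kF]_{\cdot k} - 2[S_c]_{k\cdot}G_cw_c^k + \lambda_k\left((w_c^k)^TG_cw_c^k - 1\right). \tag{12}$$

Here, $\lambda_k$ is the Lagrange multiplier. The algorithm then uses the Karush-Kuhn-Tucker (KKT) conditions, which require the following three criteria:

$$\frac{\partial\mathcal{L}}{\partial w_c^k} = 0, \qquad \lambda_k\left((w_c^k)^TG_cw_c^k - 1\right) = 0, \qquad \lambda_k \ge 0. \tag{13}$$

Assuming $G_c$ is nonsingular, the first condition gives $w_c^k = \left([S_c^T]_{\cdot k} - [\tilde{W}^kF]_{\cdot k}\right)/(F_{kk} + \lambda_k)$. With the norm constraint active, the solution for $w_c^k$ becomes

$$w_c^k = \frac{[S_c^T]_{\cdot k} - [\tilde{W}^kF]_{\cdot k}}{\sqrt{\left([S_c^T]_{\cdot k} - [\tilde{W}^kF]_{\cdot k}\right)^TG_c\left([S_c^T]_{\cdot k} - [\tilde{W}^kF]_{\cdot k}\right)}}. \tag{14}$$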
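The column-wise dictionary update can be sketched in NumPy as follows; the names are hypothetical. Let $F = S_cS_c^T$, let $\tilde{W}^k$ be $W_c$ with its $k$th column zeroed, and let $v_k = [S_c^T]_{\cdot k} - [\tilde{W}^kF]_{\cdot k}$. The normalized update applies when the norm constraint is active. The code also handles the $\lambda_k = 0$ case allowed by the KKT conditions, in which $w_c^k = v_k/F_{kk}$ is already feasible. With both branches, each column update exactly minimizes the reconstruction error over $w_c^k$, so the error cannot increase.

```python
import numpy as np


def update_dictionary(G, W, S, eps=1e-12):
    """Column-wise KKT update of W for min ||phi(X) - phi(X) W S||_F^2 s.t. w_k^T G w_k <= 1."""
    W = W.copy()
    F = S @ S.T
    for k in range(W.shape[1]):
        if F[k, k] <= eps:
            continue  # atom k is unused by the current codes, so the objective does not depend on it
        W_tilde = W.copy()
        W_tilde[:, k] = 0.0
        v = S.T[:, k] - W_tilde @ F[:, k]        # v_k = [S^T]_{.k} - [W~^k F]_{.k}
        g_norm = np.sqrt(max(v @ G @ v, 0.0))    # ||phi(X) v_k||_2
        if g_norm > F[k, k]:
            W[:, k] = v / g_norm                 # lambda_k = g_norm - F_kk > 0: normalized solution
        else:
            W[:, k] = v / F[k, k]                # lambda_k = 0: the unconstrained minimizer is feasible
    return W


def recon_error(G, W, S):
    WS = W @ S
    return np.trace(G) - 2.0 * np.trace(G @ WS) + np.trace(WS.T @ G @ WS)


rng = np.random.default_rng(6)
Xc = rng.normal(size=(12, 6))
G = Xc.T @ Xc
S = rng.normal(size=(8, 6))
W = rng.normal(size=(6, 8))
W /= np.sqrt(np.einsum("ik,ij,jk->k", W, G, W))  # start from a feasible W (unit atoms)

errors = [recon_error(G, W, S)]
for _ in range(10):
    W = update_dictionary(G, W, S)
    errors.append(recon_error(G, W, S))
atom_norms = np.einsum("ik,ij,jk->k", W, G, W)
```

Alternating this routine with the code update of Section 4.1 gives one plausible training loop for a single class.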

#### 5. Experimental Results

In this section, we present experimental results on five benchmark databases to illustrate the effectiveness of our method. We compare the Laplace graph embedding class specific dictionary learning algorithm (LGECSDL) with several state-of-the-art methods. We first describe the experimental settings and the databases, then present the experimental results, and finally analyze them.

##### 5.1. Experimental Settings

We evaluate our method on five benchmark databases. The proposed LGECSDL algorithm is compared with seven other face recognition algorithms: nearest neighbor (NN) classification, collaborative representation based classification (CRC) [30], sparse representation based classification (SRC) [31], kernel-based probabilistic collaborative representation based classifier (ProKCRC) [32], VGG19 [33], kernel-based class specific dictionary learning (KCSDL) [29], and SVM [34].

Two parameters in the objective function of the LGECSDL algorithm need to be specified. The first, $\alpha$, adjusts the trade-off between the reconstruction error and the sparsity. In each experiment, we increase $\alpha$ over a range of candidate values and select the best one.

The second parameter, $\beta$, controls the trade-off between the reconstruction error and the Laplace graph embedding term. We likewise increase $\beta$ over a range of candidate values and select the best one in all of our experiments.

We also evaluate the effect of different kernels on the LGECSDL algorithm. Three kernel functions are adopted: the linear kernel $\kappa(x, y) = x^Ty$, the Hellinger kernel $\kappa(x, y) = \sum_{d=1}^{D}\sqrt{x_dy_d}$, and the polynomial kernel $\kappa(x, y) = (p + x^Ty)^q$. The parameters $p$ and $q$ of the polynomial kernel are fixed across our experiments.
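The NumPy sketch below implements the three kernels, with samples as columns as in Section 2. The function names are hypothetical, and $p$ and $q$ are left as arguments. The Hellinger kernel is only defined for nonnegative features such as pixel intensities, so the sketch rejects negative inputs.

```python
import numpy as np


def linear_kernel(X, Y):
    """kappa(x, y) = x^T y for all column pairs of X (D x N) and Y (D x M)."""
    return X.T @ Y


def hellinger_kernel(X, Y):
    """kappa(x, y) = sum_d sqrt(x_d * y_d); requires nonnegative features."""
    if np.any(X < 0) or np.any(Y < 0):
        raise ValueError("Hellinger kernel needs nonnegative features")
    return np.sqrt(X).T @ np.sqrt(Y)


def polynomial_kernel(X, Y, p, q):
    """kappa(x, y) = (p + x^T y)^q."""
    return (p + X.T @ Y) ** q


x = np.array([[1.0], [2.0]])
y = np.array([[3.0], [4.0]])
h = np.array([[0.25], [0.75]])
```

For example, `linear_kernel(x, y)` holds $1\cdot3 + 2\cdot4 = 11$ and `polynomial_kernel(x, y, 1.0, 2)` holds $(1 + 11)^2 = 144$. Any of these Gram matrices, evaluated on the class-$c$ samples, can serve as $G_c$ in Section 4.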

##### 5.2. Database Descriptions

Five image databases are used in our experiments. The first is the Extended YaleB database, which contains 2414 frontal-face images of 38 subjects. All the images are captured under varying illumination conditions. In our experiments, the images are cropped and normalized to a fixed size. Figure 1 shows several example images from the Extended YaleB database.