Abstract

Recognizing human faces from frontal views under varying illumination, occlusion, and disguise is a major challenge in pattern recognition. It is generally accepted that face patterns from a single object class lie on a linear subspace. Building on this observation, several methods represent a test sample as a linear combination of training samples. In this paper, to extract more discriminant information from the reconstruction error, we constrain both the linear combination coefficients and the reconstruction error by $\ell_1$-minimization, which is not easily disturbed by outliers. Through an equivalent transformation of the model, the parameters can be computed conveniently in a new underdetermined linear system, and an optimization method is then used to obtain an approximate solution. The resulting minimum reconstruction error contains much valuable discriminating information, and its gradient is measured to make the final recognition decision. Experiments show that the recognition protocol based on the reconstruction error achieves high performance on the publicly available Extended Yale B and AR Face databases.

1. Introduction

Face recognition is an attractive field for researchers in computer vision and pattern recognition [1–4]. Unlike biometric techniques that depend on cooperative subjects, such as iris and fingerprint recognition, face images are usually obtained in unrestricted environments, which poses a serious challenge to face recognition methods. For example, illumination variations and occlusion change the appearance of a face and complicate many problems in face recognition [5, 6]. Generally speaking, accessories (such as sunglasses and scarves) and shadows can be regarded as different forms of occlusion, which may lead to a loss of facial information [7]. Over the past decades, researchers have made valuable contributions to face recognition under different illumination [8–12] and occlusion [13, 14], among which subspace learning is a broadly adopted class of approaches. The published subspace learning methods can be divided into two types: global subspace learning (GSL) and local subspace learning (LSL). (To the best of our knowledge, the GSL and LSL concepts are first proposed in this article.) A GSL method usually attains a subspace from all training samples, while an LSL method forms a subspace only from one class of samples or from a part of the samples in different classes. For example, linear discriminant analysis (LDA) [15], marginal Fisher analysis (MFA) [16], principal component analysis (PCA) [17], maximum margin criterion (MMC) [18, 19], and independent component analysis (ICA) [20] fall into the category of GSL methods, whereas nearest subspace (NS) classification and the common vector approach (CVA) [21], which use only class-specific samples as training samples, belong to the LSL methods. LDA, MFA, and MMC are known to perform better under ideal conditions, while PCA and ICA are robust to pixel corruption to a certain degree. Nevertheless, when faces are partially occluded, the performance of these approaches degrades seriously, since the extracted features become unreliable.

Recently, linear representation methods have been applied to face recognition and have achieved high performance [22–25]. Sparse representation-based classification (SRC) [23], one of the most representative approaches, seeks the best linear representation coefficients in a global sense, while another important method, linear regression-based classification (LRC) [22], seeks linear representation coefficients in a local sense. The difference is that SRC uses the subspace spanned by all samples, whereas LRC uses the sample subspace of each class to linearly represent a test sample. The subspaces in SRC and LRC also differ from the above-mentioned subspaces attained by subspace learning methods, because they use only the original samples. Viewed from a subspace perspective, a linear representation method also seeks the best representation features in either a global or a local scope. In this sense, SRC can be treated as a GSL method and LRC as an LSL one. Usually, subspace learning methods attain a subspace that benefits object recognition, such as an orthogonal subspace [15]. SRC only searches for sparse coefficients using all training samples without changing the original sample space, and it has demonstrated good potential in handling random pixel corruption and random block occlusion. However, the method is not robust to contiguous occlusion such as sunglasses and scarves [23]. Based on the working mechanism of SRC, Zhang et al. [25] recently proposed the collaborative representation-based classification (CRC) approach. Like SRC, LRC does not change the original sample space. It is based on the model that samples from a specific object class lie on a linear subspace [15, 26], and it uses class-specific training images to represent a test image with only an $\ell_2$-norm constraint on the fidelity term. Experiments on several databases have shown its efficacy and robustness to occlusion.
In recent years, various learning methods have been proposed, among which deep learning [27] shows good performance in pattern recognition. However, other learning methods [28, 29] still contribute insights to this field.

In this paper, a constrained model based on linear representation and the class-specific concept is presented. From a subspace perspective, the method can be treated as an LSL one. In the model, both the linear combination coefficients and the minimum reconstruction error are restricted by $\ell_1$-minimization. Usually, a reconstruction error generated from the true identity reflects only the intraclass difference. When it is attained from a wrong subject, the reconstruction error contains not only the intraclass difference but also much detailed information from the interclass difference. However, in a complicated environment, the reconstruction error may be affected by pixel corruption caused by random noise, facial expression, illumination variations, and occlusion. In our method, the constrained reconstruction error preserves the intraclass similarity and the interclass difference as much as possible under such conditions. We therefore believe that the reconstruction error in the model carries much discriminating information that benefits face recognition.

Our main contributions are as follows:
(1) We propose a constrained linear model that benefits face recognition under different degrees of lighting variation and occlusion.
(2) The objective function is transformed into a form that can be tackled by a convex optimization method.
(3) The approach assumes no explicit prior knowledge about the light sources or about the nature of the corrupted and occluded regions.
(4) Besides face recognition, the method can also be applied to other image-based object identification tasks.

The rest of the paper is organized as follows. In Section 2, we show the analysis and motivation of our approach. Then, the proposed algorithm is shown in Section 3. In Section 4, further experiments show the robustness to occlusion and illumination in face recognition by comparison with some classic recognition techniques. Finally, Section 5 concludes the study.

2. Analysis and Motivation

In face recognition, illumination variations, occlusion, and disguise are persistent difficulties that compromise recognition performance. It has been shown that, for raw images, the variations caused by illumination and partial occlusion are more significant than the inherent differences between individuals [30]. Shadows and partial occlusion can be considered spatial errors in face images. In recent decades, there have been numerous applications in which the data under study can be linearly represented by class-specific samples or by samples of all classes, so that they naturally fit a linear representation model. Hence, when a number of face images of an individual are available, it is reasonable to believe that linear representation methods have an advantage in removing these errors.

Figure 1 shows example images from the Extended Yale B database [31, 32]. The resolution of every image is 30 × 28. We randomly choose 8 samples under varying illumination but without occlusion and stack them as the columns of a matrix. Figure 2 clearly shows that, although LRC uses class-specific samples to linearly represent the probe face image, the linear representation coefficients obtained by least squares may overfit under different illumination and occlusion [23]. The reconstruction error is therefore driven far from its true value. Figures 2(a) and 2(b) show the reconstruction error in LRC based on samples from the same class (Figure 2(a), top row) and from another class (Figure 2(b), bottom row), respectively.

In SRC, the coefficients of all training samples are constrained to be sparse when representing a probe sample. However, many nonzero linear representation coefficients from samples of other classes unavoidably emerge. Figures 3(a) and 3(b) show the test face image and the training faces from the AR Face database, which is introduced later. In Figure 4(b), the left and middle images show the reconstruction images attained, respectively, by the product of the class-specific samples with their linear representation coefficients and by the product of the remaining samples with their coefficients. The right image in Figure 4(b) shows the difference between the two reconstruction images. When the reconstruction error is formed in a global sense by all training samples, it is of almost no benefit to face recognition under occlusion and illumination variations. Therefore, the recognition protocol in SRC uses only the best linear representation coefficients.

In Figure 2, although the reconstruction error in LRC differs considerably between representing a probe sample with training samples from the same class and from another class, the linear representation coefficients for different probe samples are not salient. A new insight is that the reconstruction error is vital to face recognition under different occlusion and illumination, which inspired us to design a new method that attains an accurate reconstruction error. The model is shown in formula (1). In Figure 4(c), the reconstruction error is clearly visible; the left image is from the same class and the right one from another class. The texture of the right image is more pronounced than that of the left, and an image with pronounced texture has a larger gradient value than one with weak texture. In Figure 5, candidates were sampled randomly 50 times from the 15th individual of Subset 1 in Yale B, and the average gradient value of the reconstruction error is highlighted in red. In the figure, the lowest gradient value among the 38 classes is attained by the proposed algorithm. The experimental results show that, if the test sample and the training samples have the same label, the gradient value of the reconstruction error is much lower than when they have different labels. Our method achieves a 100 percent recognition rate on the first subset of the Extended Yale B database. In Figure 6, the reconstruction error and the linear representation coefficients are attained by solving the proposed model. Figures 6(a), 6(b), 7(a), and 7(b) show that the two sets of linear representation coefficients differ: one represents the probe image with samples from the same class and the other with samples from another class. In particular, Figures 8 and 9 clearly show that the variation range of the representation coefficient values in LRC is much smaller than in the proposed approach. In Figure 9, we also observe that both variation ranges of the representation coefficient values in the proposed method are large, one from the same-class representation and the other from the other-class representation.

Next, why can we attain the relatively accurate reconstruction error? The analysis is as follows.

(A) Linear Representation-Based Method. We first suppose the linear representation coefficients to be , the training samples’ matrix (in which, denotes the number of training samples and the dimension of each training sample), and the linear representation error and is the reconstruction sample. The general function can be represented as follows:

(B) SRC Method. The reconstruction sample of is composed of two parts in SRC, one is the component of the th class; another is from other classes. and are the sparse representation coefficients separately from the th class and other classes. denotes the global reconstruction error vector of SRC. So, can be linearly represented as follows:

In SRC, the reconstruction error is in which denotes all elements in a vector. So, the reconstruction error component is not only from the same class but from other classes.
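As a point of reference, models (A) and (B) are commonly written in the following form. This is only a sketch under assumed notation: the symbols $y$ (test sample), $A$ (training matrix), $x$ (coefficients), $e$ (error), the dimensions $m$ and $n$, and the index $\bar{i}$ for "all classes other than $i$" are chosen here for illustration and may differ from the original formulas.

```latex
% (A) Generic linear representation of a test sample y
y = A x + e, \qquad A \in \mathbb{R}^{m \times n},\quad x \in \mathbb{R}^{n},\quad e \in \mathbb{R}^{m}

% (B) SRC: the sparse code is split into the part on class i and the part on all other classes
y = A_i x_i + A_{\bar{i}}\, x_{\bar{i}} + e_{\mathrm{SRC}}
```

Written this way, the global error $e_{\mathrm{SRC}}$ is what remains after both the same-class and the other-class components have been subtracted, which is why it mixes information from several classes.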

(C) LRC Method. In LRC, the linear representation formula is in which is the gallery samples from the th class, is the linear representation coefficients on , and is the reconstruction sample.

For the least squares method that may lead to overfitting, in which is the same with the above, represents as an overfitting variable from the th class, and denotes the -norm.
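The class-specific least-squares step of LRC can be sketched as follows. This is a minimal illustration with NumPy, not the authors' implementation: the function name `lrc_residuals`, the synthetic data, and the use of the $\ell_2$ residual as the decision score are assumptions made for the example.

```python
import numpy as np

def lrc_residuals(class_matrices, y):
    """Return the least-squares reconstruction residual of y for each class.

    class_matrices: list of (m, n_i) arrays, one per class, columns are samples.
    y: (m,) test sample.
    """
    residuals = []
    for X in class_matrices:
        # Least-squares coefficients of y on the samples of this class only.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals.append(np.linalg.norm(y - X @ beta))
    return np.array(residuals)

# Synthetic example (hypothetical data): three classes, four samples each.
rng = np.random.default_rng(0)
classes = [rng.standard_normal((50, 4)) for _ in range(3)]
y = classes[1] @ np.array([0.5, 0.2, 0.2, 0.1])  # lies in the span of class 1

residuals = lrc_residuals(classes, y)
predicted = int(np.argmin(residuals))
```

Because least squares minimizes only the $\ell_2$ fidelity term, a few badly corrupted pixels can pull the coefficients away from the clean solution, which is the overfitting effect discussed above.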

(D) The Proposed Method. As (B) and (C) show, both of the above methods incur some loss of information. In image-based recognition, under the $\ell_1$-norm a probe sample can be robustly represented by class-specific samples, and the representation is not easily affected by outliers. Thus, training samples from the probe's own class represent the test sample well. In Figure 6, the linear representation coefficients are sparse. This phenomenon is caused by the $\ell_1$-norm constraint, which forces the probe sample to be represented by the most similar samples. By observing the error variable restricted by the $\ell_1$-norm, we see that the reconstruction error is also sparse when the probe sample is assigned to its own class.

From formulas (3) and (5), the error components of SRC and LRC are not satisfactory for further recognition. We argue that a more accurate error component has much more discriminative power in image-based recognition than those of the above approaches. The main problem is therefore how to represent the error component as accurately as possible.

In the proposed method, the reconstruction error is represented by its own training samples as accurately as possible, and we also enforce the -norm on it. To represent the error component as accurately as possible, the objective function is as follows: in which is the logical intersection, is the linear representation coefficients from the th class, is the reconstruction sample, and denotes the -norm. If the vector is attained when the probe face image is not from its same class, the elements in the vector will be not sparse and when the vector is transformed into a matrix, its gradient value will become bigger. And, when some outliers (the variation of a sample is big, such as different expression or illumination) mix into the same class, under the constraints of -norm, the sparse representation coefficients on the outliers will also approach zeros, and the probe sample will still be represented as accurately as possible in our method. From Figure 6, we can see that the coefficients become sparse, it implies that the probe image will be linearly represented with the higher similar degree samples. The above analysis clearly illustrates that our method has the stable performance.
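One plausible way to write the class-specific objective described above is given below. It is a sketch under assumptions: the symbols $A_i$, $x_i$, $e_i$, and $y$ are illustrative notation, and the two $\ell_1$ terms are simply summed here, whereas the original formula couples them with a logical conjunction.

```latex
\min_{x_i,\, e_i} \ \|x_i\|_1 + \|e_i\|_1
\quad \text{subject to} \quad y = A_i x_i + e_i
```

Under this reading, a probe from the wrong class cannot be explained by a few coefficients, so the residual $e_i$ has to absorb the mismatch and becomes dense and textured.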

3. Proposed Method

3.1. Algorithmic Description

Assume training samples for all individuals in a face dataset, and samples are available for each individual (actually, the number of training images for each identity might be different) with a resolution of pixels. indicates the number of training images of the th individuals. In this paper, each image is converted into a long vector by stacking its columns one by one; is a matrix with dimension (Algorithm 1). Then, the images in the subjects are arranged in the following matrix: in which is a matrix consisting of the training sample vectors. A test sample is represented by . Considering the following, optimization problem can be described as follows:in which is the logical intersection. In order to get the optimal solution, we transform the objective function into the following: in which represents identity matrix. Then, the problem becomesin which , and , denotes the training sample matrix of th class, and is an identity matrix which used to express potential occlusion. A part of , entries from to , represents linearly occlusion coefficients. And , , . The training set is composed of complete nonoccluded face images. Figures 5(a) and 6(a) show that a partially occluded face can be sparsely represented by the training samples from class-specified samples, and face data actually is distributed in low-dimensional subspace and satisfies the linear equation constraints. The difference of the linear representation coefficients in Figure 7 further verifies the above claims. Whether a probe image may be partially occluded or not, the error caused by occluded pixels can only be squeezed into the identity matrix . The pixels, corresponding to the entries of with big values, have extremely high probability to be occluded.
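The transformation described above (stacking the class matrix with an identity matrix and minimizing the $\ell_1$-norm of the stacked unknown) can be sketched with an off-the-shelf linear programming solver. This is only an illustration: it uses `scipy.optimize.linprog` in place of the primal-dual method of Section 3.2, and the names `l1_representation`, `A_i`, `x_hat`, and `e_hat` as well as the synthetic data are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def l1_representation(A_i, y):
    """Solve min ||w||_1 s.t. [A_i, I] w = y and return (coefficients, error)."""
    m, n_i = A_i.shape
    B = np.hstack([A_i, np.eye(m)])      # stacked dictionary [A_i, I]
    k = n_i + m                          # length of w = [x; e]
    # LP variables z = [w, u]; minimise sum(u) subject to -u <= w <= u.
    c = np.concatenate([np.zeros(k), np.ones(k)])
    I_k = np.eye(k)
    A_ub = np.vstack([np.hstack([I_k, -I_k]),     #  w - u <= 0
                      np.hstack([-I_k, -I_k])])   # -w - u <= 0
    b_ub = np.zeros(2 * k)
    A_eq = np.hstack([B, np.zeros((m, k))])       # B w = y
    bounds = [(None, None)] * k + [(0, None)] * k
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:k]
    return w[:n_i], w[n_i:]

# Synthetic probe (hypothetical data): one training sample plus two corrupted pixels.
rng = np.random.default_rng(1)
A_i = rng.standard_normal((30, 4))
x_true = np.array([1.0, 0.0, 0.0, 0.0])
e_true = np.zeros(30)
e_true[[3, 17]] = 5.0
y = A_i @ x_true + e_true

x_hat, e_hat = l1_representation(A_i, y)
```

The entries of `e_hat` with large magnitude mark the pixels that the class-specific samples cannot explain, which matches the role the text assigns to the identity block.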

Input: The training samples for classes,
denotes the training sample matrix from th class, denotes the
number of training images of the th individuals. A test sample .
  .
  ,
  Let .
  Compute:
  for each subject   do
    (a) Use formula (13)–(29) to solve the objective function
          ,
          ,
    (b) the solution is
          ,
    (c) the unknown variables are
          
          
  end for
  Calculate the -norm of ,
  Calculate the Gradient of .
Output: the recognition protocols:
Reconstruction error based method,
          

Recent progress in the theory of sparse representation and compressed sensing [33–35] reveals that, if the solution sought is sufficiently sparse, the solutions of the $\ell_1$-minimization problem (11) and the $\ell_0$-minimization problem are equivalent. Conversely, we also expect the $\ell_1$-minimization solution to be sparse to some degree.

3.2. The Solution of the Proposed Algorithm

We will solve the following objective function to achieve the optimal results;When , , and are real number, (11) is looked upon as a linear program question [36]: in which and its rank is . This solution of above objective function is attained by a standard primal-dual method.
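For orientation, the standard way to recast an equality-constrained $\ell_1$-minimization as a linear program is shown below. The symbols $w$ (stacked unknown), $u$ (auxiliary bound variables), $B$ (stacked matrix), and $y$ (test sample) are assumed notation for this sketch and are not taken from the original formulas.

```latex
\min_{w,\, u} \ \sum_{j} u_j
\quad \text{subject to} \quad
w_j - u_j \le 0, \qquad -w_j - u_j \le 0, \qquad B w = y
```

At the optimum each $u_j$ equals $|w_j|$, so the linear objective coincides with $\|w\|_1$.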

Set

We define the associated with the above objective function: Suppose and are like that form. And in which denotes the standard basis vector of component . Let , , , , and be any points which are zero duality gap, and these points are primal and dual optimal points. As minimizes the objective function on , minimizes it on . Then, it shows that the gradient must disappear at , ; that is,

The KKT conditions are listed as follows: where . Thus, we get the central and dual residuals: Then, the original complementary slackness condition is . Now, it is relaxed towhere the parameter is judiciously increased when progressing through the Newton iterations; diagonal matrix is set to . KKT conditions can be denoted as , in which it can be expressed as
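The relaxation of complementary slackness used by primal-dual interior-point methods usually takes the following form. This is the textbook version, written with assumed symbols: $f_j$ for the inequality constraint functions, $\lambda_j$ for their dual variables, $z$ for the primal variables, and $\tau$ for the barrier parameter.

```latex
% Exact complementary slackness for inequality constraints f_j(z) <= 0
\lambda_j\, f_j(z) = 0

% Relaxed condition followed along the central path
\lambda_j\, f_j(z) = -\frac{1}{\tau}, \qquad \lambda_j \ge 0,\quad f_j(z) < 0
```

As $\tau$ grows, the relaxed condition approaches the exact one, which is why the parameter is increased over the Newton iterations.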

Use the following Newton step to solve at the current point: The following linear equation expresses Newton step: Then, . In terms of , and , we have

Let the primal-dual search direction be the above formula solution. In the above formula,

in which denotes diagonal matrices, , and .

SetThe following items will be removed: and solve

Equation (28) is equations. It is also a positive definite system. We can solve it by conjugate gradients.
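A conjugate gradient solver for a symmetric positive definite system can be sketched as follows. This is a generic textbook version for illustration; the function name, tolerance, and the random test matrix are assumptions and do not reproduce the specific system of the derivation.

```python
import numpy as np

def conjugate_gradient(M, b, tol=1e-10, max_iter=None):
    """Solve M x = b for a symmetric positive definite matrix M."""
    n = b.shape[0]
    max_iter = max_iter or 10 * n
    x = np.zeros(n)
    r = b - M @ x            # residual
    p = r.copy()             # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol:
            break
        Mp = M @ p
        alpha = rs_old / (p @ Mp)        # step length along p
        x += alpha * p
        r -= alpha * Mp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p    # next M-conjugate direction
        rs_old = rs_new
    return x

# Hypothetical well-conditioned SPD system.
rng = np.random.default_rng(2)
G = rng.standard_normal((8, 8))
M = G @ G.T + 8 * np.eye(8)
b = rng.standard_normal(8)
x_cg = conjugate_gradient(M, b)
```

Conjugate gradients needs only matrix-vector products with the system matrix, which is convenient when that matrix is available only implicitly.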

are attained; then compute the change:

3.3. Analysis of Classification

The reconstruction error is the component that cannot be represented linearly. It may come not only from ordinary noise but also from occlusion or illumination. If the error is attained accurately, it plays an important role in face recognition, because a test sample whose error has a lower gradient value tends to share the label of the training samples. We first look at the two unknowns of the model.

In the proposed objective function, and represent the linear representation coefficients and the reconstruction error separately. In the paper, the constraints can make them become sparse through the optimization solution. That is, is the linear representation coefficients with the sparse character and is the sparse reconstruction error. When the label of the training set and the test sample is the same, the -norm value of will be smaller than the value when the label is different, which is shown in Figures 6 and 7.

In this paper, the gradient of the reconstruction error is regarded as the recognition protocol. When the probe image is represented linearly by its true class, it is similar to its own gallery samples, so the gradient of the reconstruction error is very small and the error component, including noise and occlusion, is not pronounced. The major variance may come from occlusion, noise, and expression. Thus, the correlation error between the gallery samples and the probe sample is vital to the final classification.

3.4. Classification Rule

These above considerations lead us to summarize the overall formula as follows.

First, in vector , we take out and reshape the vector to be an image matrix as the probe image. And, the reconstruction error image will be denoted by .

Here, the recognition protocols based on the gradient of reconstruction error are defined as the following forms.

Reconstruction error based method isin which denotes the gradient of .
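The gradient-based decision rule can be sketched as follows. This is an illustration under assumptions: the gradient is computed with `np.gradient`, the score is the mean gradient magnitude of the reshaped error image, and the class with the smallest score is selected; the exact gradient operator and aggregation used in the paper may differ.

```python
import numpy as np

def mean_gradient_magnitude(error_vec, shape):
    """Reshape an error vector into an image and return its mean gradient magnitude."""
    # Images were vectorised by stacking columns, hence column-major reshape.
    E = np.asarray(error_vec, dtype=float).reshape(shape, order="F")
    gy, gx = np.gradient(E)
    return float(np.mean(np.hypot(gx, gy)))

def classify_by_error_gradient(error_vecs, shape):
    """Return the index of the class whose error image has the smallest score."""
    scores = [mean_gradient_magnitude(e, shape) for e in error_vecs]
    return int(np.argmin(scores)), scores

# Hypothetical error vectors for three classes at the 30 x 28 resolution.
shape = (30, 28)
rng = np.random.default_rng(3)
smooth = np.full(30 * 28, 0.1)             # flat error image, no texture
textured = rng.standard_normal(30 * 28)    # strongly textured error image
label, scores = classify_by_error_gradient([textured, smooth, 2 * textured], shape)
```

A flat error image has zero gradient everywhere, so it wins against textured error images regardless of its overall intensity level.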

4. Experimental Results

In this section, experiments are presented on publicly available databases with different illumination, occlusion, and disguise. The experiments are designed to show the efficacy of our approach and are implemented in Matlab R2011b on a desktop running Windows XP with an Intel Pentium Dual-Core 2.60 GHz CPU. The Extended Yale B database [31, 32] and the AR database [37] are commonly used as benchmarks to validate algorithms, so we use both of them. First, we compare the proposed method with several classical methods under different illumination. Then, we show the robustness of our method to random pixel corruption and random block occlusion. Finally, we observe the performance of our method on disguised samples from the AR Face database. In addition, since feature selection may affect classification results, our method works directly on the original images without feature extraction. We choose the -norm method in the paper to solve the $\ell_1$-regularized minimization in SRC. We then show the effectiveness of the reconstruction error based method.

4.1. Recognition under Varying Illumination

The Extended Yale B database consists of 2,414 frontal face images of 38 individuals under various laboratory-controlled lighting conditions. All test images used in the experiments are manually aligned, cropped, and resized to 30 × 28 pixels. The database is composed of five subsets (see Figure 1): Subset 1 consists of 266 images under normal illumination conditions, with seven images per subject; Subsets 2 and 3 are under slight-to-moderate illumination variations, with 12 images per subject in each subset; and Subsets 4 and 5, under severe light variations, contain 14 and 19 images per subject, respectively. In this experiment, Subset 1 is used for training and the other subsets are used for testing.

Table 1 shows the recognition performance for varying subsets. Eigenfaces and Fisherfaces are combined with Nearest Neighbors (NN) classifier for classification. We also compare the proposed method with CRC of regularized least square (CRC-RLS) [25] and Gradientfaces [9], in light of its capability of handling illumination changes. Note that Fisherfaces + NN, LRC, SRC, Gradientfaces, and the proposed methods show excellent performance for moderate light variations, yielding recognition accuracy for Subsets 2. CRC-RLS also achieves . In the case of severe light variations, recognition accuracy of SRC and LRC, however, falls to and for Subsets 4, respectively. Our method achieves on Subset 4, outperforming the best competitor CRC-RLS by a margin of . For Gradientfaces, it is derived from the image gradient domain such that it can discover underlying inherent structure of face images since the gradient domain explicitly considers the relationships between neighboring pixel points [9]. We can see that Gradientfaces achieves stable recognition accuracy for all subsets, which demonstrates its ability of extracting illumination insensitive features. In particular, Gradientfaces obtains the best recognition rate of on Subset 5. By contrast, recognition accuracy of the other approaches drops heavily in this subset, while the proposed method still outperforms LRC, SRC, CRC-RLS, and the benchmark approaches of Eigenfaces + NN and Fisherfaces + NN.

Notice that all rival methods attained poor performance on the most challenging subset, that is, Subset 5. The proposed method achieves recognition rate on the second most challenging subset, and drops to for the most challenging subset. For Subset 5, cast shadows and the presence of severe illuminations (e.g., overhead and lateral and even rear illumination) are some of the difficulties of note. The characteristic facial features, such as the eyes and the nose, are typically of great importance for recognition. To substantiate the explanation of the performance of our method, Subset 1 and Subset 2 are chosen for training and testing, respectively. Three face features (nose, eyes, and mouth) will be occluded by using a black rectangle to simulate the occlusion. See Figure 10 for an example of the occluded face parts. The results in Table 2 show that our method reaches accuracy for all cases. The results show that the comparable algorithms except Eigenfaces cope well with block occlusion.

4.2. Recognition with Random Pixel Corruption

In this experiment, we test the robustness of our method in the presence of pixel corruption. We use Subset 1 for training and Subsets 2 and 3 for testing. For each test image, we replace a certain percentage of its pixels with uniformly distributed random values within . The corrupted pixels are chosen randomly for each test image, and their locations are unknown to the proposed algorithm. For Subset 2, we vary the percentage of corrupted pixels from 10 percent to 90 percent to check the effectiveness of the algorithm. Figure 11(a) is an example of a test image under 30 percent corruption, and Figures 11(b)–11(d) show, respectively, the associated original face images, the reconstruction error images (the proposed algorithm with the probe's own class and with other classes), and the gradient magnitude images of six individuals. The first image from the left in Figure 11(c) is computed from the probe face image and training face images of the same class; the remaining images are computed from the probe face image and training face images of different classes.
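The corruption procedure can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `corrupt_pixels` is hypothetical, and the value range `[low, high]` defaults to 0 to 255 purely as an assumption, since the range is not stated in this text.

```python
import numpy as np

def corrupt_pixels(image, fraction, rng, low=0.0, high=255.0):
    """Replace a random fraction of pixels with uniform random values.

    low/high are assumed bounds for the random values (not given in the text).
    Returns the corrupted copy and the flat indices of the corrupted pixels.
    """
    out = np.array(image, dtype=float, copy=True)
    n_corrupt = int(round(fraction * out.size))
    # Locations are drawn without replacement and differ for every call.
    idx = rng.choice(out.size, size=n_corrupt, replace=False)
    out.flat[idx] = rng.uniform(low, high, size=n_corrupt)
    return out, idx

# Hypothetical 30 x 28 test image with 30 percent corruption.
rng = np.random.default_rng(4)
image = np.full((30, 28), 128.0)
corrupted, idx = corrupt_pixels(image, 0.30, rng)
```

The returned indices are only for inspection; a recognition method under test would receive the corrupted image alone.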

As can be seen that, due to the pixel corruption, the test image becomes blurred. Compared with the other error images, the error image extracted from its true individual still looks more distinguishable from the gradient perspective. It can be seen, from Figure 12(a), when the percentage of corrupted pixels is between 10 and 20, LRC almost correctly classifies all of the testing images. However, when the percentage of corrupted pixels is more than 40, the recognition rate of Eigenfaces, Fisherfaces, and LRC is below . The performance of SRC drops heavily beyond 50 percent corruption. We can also observe that, from 0 to 50 percent pixel corruption, our method correctly classifies all subjects. At 50 percent corruption, the proposed approach is , while the excellent Gradientfaces is only . Notice that none of the others achieves higher than recognition rate. Even at 80 and 90 percent corruption, the proposed recognition rate is still higher than others.

For Subset 3, the faces are under even worse illumination conditions. The recognition results for different methods can be seen in Figure 12(b). We observe that our method outperforms others in this situation. For example, at 20 percent corruption, the recognition accuracy of the proposed method and LRC is recognition rate. However, at 40 percent corruption, the proposed method achieves recognition rate which outperforms SRC and LRC by a margin of and , respectively.

4.3. Recognition in the Presence of Block Occlusion

In this part, we test the robustness of our method to block occlusion. We use Subset 1 for training and Subset 2 for testing. Different levels of contiguous occlusion, from 10 to 70 percent, are simulated by replacing a square block of each test image with a nonface image, namely, a baboon image. Figure 13 shows a test image under 10 to 70 percent occlusion. The position of the occlusion is selected randomly in each test image and is unknown to the algorithms. Contiguous occlusion is much more harmful to face recognition algorithms than random pixel corruption.
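The block occlusion can be simulated roughly as follows. This is a sketch under assumptions: the function name `occlude_block` is hypothetical, the square side is derived from the requested area fraction and capped at the image size, and a constant array stands in for the baboon image.

```python
import numpy as np

def occlude_block(image, fraction, occluder, rng):
    """Paste a square patch of `occluder` over a random location of `image`.

    fraction: target share of the image area covered by the square block.
    Returns the occluded copy and (top, left, side) of the block.
    """
    out = np.array(image, dtype=float, copy=True)
    h, w = out.shape
    side = int(round(np.sqrt(fraction * h * w)))
    side = min(side, h, w)                       # the square must fit in the image
    top = int(rng.integers(0, h - side + 1))     # random, unknown to the classifier
    left = int(rng.integers(0, w - side + 1))
    out[top:top + side, left:left + side] = occluder[:side, :side]
    return out, (top, left, side)

# Hypothetical 30 x 28 face with 30 percent occlusion.
rng = np.random.default_rng(5)
face = np.zeros((30, 28))
occluder = np.ones((30, 28))    # stand-in for the nonface (baboon) image
occluded, (top, left, side) = occlude_block(face, 0.30, occluder, rng)
```

Because the side length is rounded to whole pixels, the covered area only approximates the requested fraction (here 16 × 16 pixels out of 840).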

Table 3 shows that the recognition rate of SRC, LRC, and the reconstruction error based method all yield . The proposed algorithms do not suppose prior knowledge about the occlusion feature. Based on an important principle of coding theory, the redundancy in the measurement is essential to detect gross errors [23]. Therefore, if a part of the image is completely corrupted by varying occlusions, identification can still be performed depending on the residual image. In this sense, our method and SRC both benefit from this principle. Table 4 illuminates the correct recognition of all compared algorithms. The proposed method significantly outperforms the other methods for different degree of occlusion. When half of a face is occluded, our method recognized over 97.80% of test subjects for Subset 3 correctly. Even at 60% occlusion, over recognition rate can be achieved.

4.4. Recognition of Face in Disguise

In this experiment, 100 persons (50 males and 50 females) are selected from the AR database [37]. We use 799 images (about 8 samples per person), characterized by frontal views with various expressions and without occlusion, for training, and another two groups of 200 faces are used for testing. The first test group contains images of faces with sunglasses, which occlude approximately 20 percent of the image. The second test group contains images of faces wearing a scarf, which occludes approximately 40 percent of the image. All face images are converted to grayscale, cropped and aligned by the centers of the eyes and mouth, and then normalized to a resolution of 64 × 56, as shown in Figure 14. In addition, the robust CRC (R-CRC) [25], proposed for face recognition with occlusion, is selected for comparison.

Table 5 depicts the comparison of our method with other compared algorithms. For LRC, the recognition results, affected by the feature extraction and the number of training samples, is for sunglasses and for scarves, respectively. For the reconstruction error based method its recognition rate is 51.72% which is less than 87.00% of SRC when the occlusion is caused by scarves. However, under sunglasses occlusion, our method attains the excellent correct recognition of , more than of the nearest competitor LRC and better than of SRC.

5. Conclusion

In this paper, we present a new method to recognize face images under illumination variations, pixel corruption, and occlusion. The reconstruction error in the constrained objective function is used for classification. Experimental results show that the reconstruction error based approach is robust to illumination variations and to corruptions such as occlusion and disguise, without assuming prior knowledge about the light sources or the nature of the corrupted and occluded regions. Besides face recognition, the method can also be applied to other image-based object identification tasks. Although the performance of the recognition protocol based on the linear representation coefficients needs to be improved, it offers a new insight into image-based recognition: even in a small sample space, class-specific samples show unique features under suitable constraints. We also found that the other variable, the linear combination coefficients, yields discriminating information that benefits face identification. In future work, we will improve the performance of the linear representation coefficients based method.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is sponsored by the Major Program of National Natural Science Foundation of China (no. 91438104, no. 91420102, and no. 61472053); Project Supported by Scientific and Technological Research Program of Chongqing Municipal Education Commission (Grant no. KJ15012001); Chongqing Postdoctoral Special Funding Project (Xm2015063); and Fundamental Research Funds for the Central Universities in China (no. 106112015CDJRC161203).