A Novel Face Recognition Approach Based on Two-Step Test Sample Representation
The two-step test sample representation method is proposed for face recognition. It first identifies “representative” samples of the test sample from the training samples of each class, and then produces a weighted sum of all the “representative” samples that closely approximates the test sample. The method assigns the test sample to the class whose training samples have the smallest deviation from the test sample. Because the proposed method reduces the side effect that training samples very “far” from the test sample have on the recognition decision, it obtains high recognition rates.
Face recognition has attracted much attention in computer-vision-related fields and has a wide range of applications, including identity authentication, access control, and surveillance [1–3]. The K nearest neighbor (KNN) classifier is a simple and popular machine learning approach to classification and has been widely used for face recognition [4–6]. The basic idea of KNN is to classify the test sample using its K nearest neighbors in the training set. In particular, it assigns the test sample to the class that owns more of these K nearest neighbors than any other class. The principle of nearest neighbor classification was first proposed by Skellam, who used the ratio of the expected and observed mean nearest neighbor distances to determine whether a data set is clustered.
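The KNN decision rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming Euclidean distance and simple majority voting; the function name `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, labels, test, k):
    """Assign `test` to the class that owns most of its k nearest training samples."""
    # Euclidean distance from the test sample to every training sample
    dists = [math.dist(x, test) for x in train]
    # Indices of the k closest training samples
    nearest = sorted(range(len(train)), key=lambda j: dists[j])[:k]
    # Majority vote; Counter.most_common orders equal counts by first-seen order
    votes = Counter(labels[j] for j in nearest)
    return votes.most_common(1)[0][0]

train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
labels = ["a", "a", "a", "b", "b"]
print(knn_classify(train, labels, (0.15, 0.15), k=3))  # → a
```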
The following works can be viewed as recent improvements of KNN. In 1999, Li and Lu proposed the nearest feature line (NFL) classifier. NFL is in fact a method that exploits virtual training samples to improve KNN: it treats the line passing through two sample points from the same class as a so-called feature line and views all points on feature lines as virtual training samples. If a class has $n$ training samples, there are $n(n-1)/2$ feature lines for it. Chien and Wu proposed the nearest feature plane (NFP) and the nearest feature space (NFS) classifiers. With $n$ training samples from a class, there are $n(n-1)(n-2)/6$ feature planes and one feature space, respectively. In 2004, Zheng et al. proposed both the nearest neighbor line (NNL) and the nearest neighbor plane (NNP) classifiers. Orozco-Alzate and Castellanos-Domínguez proposed the k nearest feature line (KNFL) and the k nearest feature plane (KNFP) classifiers, based on the k nearest feature lines and the k nearest feature planes, respectively. Du and Chen proposed the rectified nearest feature line segment (RNFLS) classifier. Fu et al. presented the k-nearest-neighbor simplex (KNNS) classifier, where k is the number of vertices in a simplex. Moreover, Majumdar proposed the nearest subspace (NS) classifier.
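As a small worked count of the objects described above, assuming each feature line is spanned by a distinct pair and each feature plane by a distinct triple of training samples from one class, a class with $n$ training samples yields

$$ N_{\text{line}} = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad N_{\text{plane}} = \binom{n}{3} = \frac{n(n-1)(n-2)}{6}, $$

so, for example, $n = 6$ gives 15 feature lines and 20 feature planes, but only one feature space.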
There exist several differences among the nearest-neighbor-based classifiers. First, their coefficient vectors have different dimensionalities. Specifically, the dimensionalities of the coefficient vectors in NFL, NFP, and NFS are 2, 3, and $n$, respectively, where $n$ stands for the number of training samples of the given class. Second, the coefficients of some approaches are constrained to sum to one, whereas those of the other methods are not. In particular, NFL is subject to this constraint, but NFP and NFS are not. Third, RNFLS and KNNS require all the coefficients to be nonnegative, whereas NFL, NFP, and NFS do not.
Recently, sparse representation-based classification (SRC) has been proposed and can be considered an improvement of KNN. SRC treats a weighted sum of all the training samples from a class as the center of this class and classifies the test sample into the class whose center is nearest to the test sample. In SRC, the weights are obtained by solving an $\ell_1$-norm minimization problem (see Section 2.3). SRC has been reported to outperform most state-of-the-art face classifiers.
The common characteristic of the classifiers presented earlier is that they all classify the test sample using the deviation between the test sample and a linear combination of the training samples from a given class, and assign the test sample to the class with the smallest deviation. Unlike the other improvements of KNN, SRC uses a sparse linear combination of all the training samples to represent the test sample. “Sparse” means that most coefficients of the linear combination are equal or close to zero.
In this paper, we propose the two-step test sample representation method for face recognition. The method first identifies, from the training samples of every class, the “representative” samples that are the most similar to the test sample. It then uses a linear combination of all the “representative” samples from all classes to represent and classify the test sample. We apply the proposed method to face recognition and conduct experiments on different face databases.
The remainder of this paper is organized as follows. Section 2 overviews previous improvements of KNN. Section 3 presents the proposed method. Section 4 shows the experimental results. Finally, Section 5 summarizes the paper.
2. Improvements of KNN
2.1. Nearest Feature Line
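With the nearest feature line (NFL) classifier, any two feature points of the same class are generalized by the feature line (FL) passing through them. In this way, more of the variation of the available samples can be captured, and classification is based on the shortest distance from the query point to each FL. Let $X_i = [x_1^i, x_2^i, \dots, x_{n_i}^i]$ denote the matrix composed of all $n_i$ training samples from the $i$th class, $i = 1, 2, \dots, C$ ($C$ is the number of classes), and let $y$ be a given test sample. For each pair of training samples $x_j^i$ and $x_k^i$ ($j \neq k$) from the $i$th class, there exists a feature line defined as

$$ p = x_j^i + t\,(x_k^i - x_j^i), \qquad t \in \mathbb{R}. $$

The NFL classifier defines the distance between $y$ and this feature line as the distance from $y$ to its orthogonal projection onto the line:

$$ d(y, x_j^i x_k^i) = \|y - p_{jk}^i\|_2, \qquad p_{jk}^i = x_j^i + \frac{(y - x_j^i)^T (x_k^i - x_j^i)}{\|x_k^i - x_j^i\|_2^2}\,(x_k^i - x_j^i). $$

The distance between $y$ and the $i$th class is calculated using

$$ d_i = \min_{j \neq k} d(y, x_j^i x_k^i). $$

If $d_r = \min_i d_i$, then the NFL classifier assigns $y$ to the $r$th class, where $r$ is one element of the set of class labels, that is, $r \in \{1, 2, \dots, C\}$.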
In other words, the NFL classifier evaluates the deviation between the test sample and a linear combination of two training samples from the same class, and keeps the smallest such deviation for each class. If the smallest deviation of a class is smaller than those of all other classes, the NFL classifier assigns the test sample to this class.
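The NFL rule can be sketched as follows, assuming Euclidean distance and the orthogonal-projection distance to each feature line. The names `feature_line_distance` and `nfl_classify` and the toy data are hypothetical, used only for illustration.

```python
import math

def _sub(u, v):
    return [a - b for a, b in zip(u, v)]

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def feature_line_distance(y, x1, x2):
    """Distance from y to the feature line through x1 and x2 (assumes x1 != x2)."""
    direction = _sub(x2, x1)
    # Position t of the orthogonal projection of y on the line x1 + t * (x2 - x1)
    t = _dot(_sub(y, x1), direction) / _dot(direction, direction)
    foot = [a + t * b for a, b in zip(x1, direction)]
    return math.dist(y, foot)

def nfl_classify(classes, y):
    """classes maps a label to its training samples (at least two per class)."""
    def class_distance(label):
        samples = classes[label]
        # Minimum distance over all feature lines of this class
        return min(feature_line_distance(y, samples[j], samples[k])
                   for j in range(len(samples)) for k in range(j + 1, len(samples)))
    return min(classes, key=class_distance)

classes = {"a": [(0, 0), (1, 0)], "b": [(0, 2), (1, 3)]}
print(nfl_classify(classes, (3, 0.1)))  # → a
```

The query point (3, 0.1) lies beyond both samples of class a, yet it is only 0.1 away from their feature line; this extrapolation is what lets NFL capture variation not covered by the samples themselves.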
2.2. Nearest Feature Space
The NFS classifier first defines the feature space spanned by all training samples from the $i$th class as

$$ S_i = \Bigl\{ \textstyle\sum_{j=1}^{n_i} a_j x_j^i \;:\; a_j \in \mathbb{R} \Bigr\}. $$

This classifier takes the minimum of the distances between the test sample $y$ and the elements of $S_i$ as the distance between $y$ and the $i$th class. That is, the distance between $y$ and the $i$th class is calculated using

$$ d_i = \min_{p \in S_i} \|y - p\|_2. $$

If $d_r = \min_i d_i$, then the NFS classifier assigns $y$ to the $r$th class.
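A minimal sketch of NFS, assuming the training samples of each class are linearly independent so that the nearest point of the feature space is the least-squares projection obtained from the normal equations. The helper `solve` (plain Gaussian elimination), the function names, and the toy data are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def nfs_distance(samples, y):
    """Distance from y to the span of the class samples (assumed linearly independent)."""
    # Normal equations (X^T X) a = X^T y, with the samples as the columns of X
    gram = [[sum(p * q for p, q in zip(si, sj)) for sj in samples] for si in samples]
    rhs = [sum(p * q for p, q in zip(si, y)) for si in samples]
    coef = solve(gram, rhs)
    # Orthogonal projection of y onto the feature space S_i
    proj = [sum(c * s[d] for c, s in zip(coef, samples)) for d in range(len(y))]
    return math.dist(y, proj)

def nfs_classify(classes, y):
    return min(classes, key=lambda label: nfs_distance(classes[label], y))

classes = {"a": [(1, 0, 0), (0, 1, 0)], "b": [(0, 0, 1), (1, 1, 0)]}
print(nfs_classify(classes, (2, 3, 0.1)))  # → a
```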
2.3. Sparse Representation-Based Classification
Let $X = [x_1, x_2, \dots, x_N]$ denote the matrix consisting of all training samples, where $N$ is the number of training samples, and let $y$ be the test sample. SRC aims to represent $y$ with a sparse linear combination of all training samples. In other words, if $\alpha$ denotes the coefficient vector of the sparse linear combination, SRC has the following objective function:

$$ \min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad \|y - X\alpha\|_2 \le \varepsilon, $$

where $\varepsilon$ is a user-defined small positive constant. Let $\hat\alpha$ be the solution of this problem. The distance between the test sample and the $i$th class is calculated using

$$ d_i = \|y - X\,\delta_i(\hat\alpha)\|_2, $$

where $\delta_i(\hat\alpha)$ is a vector whose only nonzero entries are the entries of $\hat\alpha$ associated with the $i$th class, $i = 1, \dots, C$. If $d_r = \min_i d_i$, SRC assigns the test sample $y$ to the $r$th class.
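The following sketch illustrates SRC's classify-by-class-residual idea. It swaps in a different solver: instead of the constrained problem above, it solves the Lagrangian form $\min_\alpha \tfrac12\|y - X\alpha\|_2^2 + \lambda\|\alpha\|_1$ with the iterative shrinkage-thresholding algorithm (ISTA). The parameters `lam` and `iters`, the function names, and the toy data are assumptions, not part of the original SRC description.

```python
import math

def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink v toward zero by t
    return math.copysign(max(abs(v) - t, 0.0), v)

def src_classify(samples, labels, y, lam=0.01, iters=2000):
    """samples: training vectors (the columns of X); labels: class of each sample."""
    n, dim = len(samples), len(y)
    # 1 / ||X||_F^2 never exceeds 1 / lambda_max(X^T X), so it is a safe ISTA step size
    step = 1.0 / sum(v * v for s in samples for v in s)
    alpha = [0.0] * n
    for _ in range(iters):
        residual = [y[d] - sum(alpha[j] * samples[j][d] for j in range(n)) for d in range(dim)]
        # Gradient step on 0.5 * ||y - X alpha||^2, then soft-thresholding for the l1 term
        alpha = [soft_threshold(alpha[j] + step * sum(samples[j][d] * residual[d] for d in range(dim)),
                                step * lam)
                 for j in range(n)]

    def class_residual(label):
        # delta_i(alpha): keep only the coefficients that belong to this class
        rec = [sum(alpha[j] * samples[j][d] for j in range(n) if labels[j] == label)
               for d in range(dim)]
        return math.dist(y, rec)

    return min(sorted(set(labels)), key=class_residual)

samples = [(1, 0, 0), (0.9, 0.1, 0), (0, 1, 0), (0, 0.9, 0.1)]
labels = ["a", "a", "b", "b"]
print(src_classify(samples, labels, (0.95, 0.05, 0)))  # → a
```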
3. The Proposed Method
3.1. Description of the Proposed Method
The proposed method is composed of two steps. The first step determines $m$ “representative” training samples from each class for the test sample, so the $C$ classes have $Cm$ “representative” samples in total. The second step represents the test sample via a linear combination of these representative samples. Let $\tilde x_1, \dots, \tilde x_{Cm}$ be all the representative samples of the $C$ classes; the test sample is then represented as a linear combination of them. Finally, the second step calculates the deviation between the test sample and the part of this linear combination contributed by each class, and assigns the test sample to the class with the minimum deviation.
In the first step, the test sample is represented with a linear combination of all the training samples of the $i$th class ($i = 1, \dots, C$). The coefficients of the obtained linear combination are then used to compute the deviation between the test sample and the $i$th class. Specifically, the method computes the deviation between the test sample and the linear combination of an arbitrary set of $m$ ($m < n_i$) training samples of the $i$th class, and regards it as the deviation between the test sample and the $i$th class.
Let the $n_i$ training samples from the $i$th class be formulated as the matrix $X_i = [x_1^i, \dots, x_{n_i}^i]$; the test sample can then be approximated as

$$ y \approx X_i A_i, $$

where $A_i = [a_1^i, \dots, a_{n_i}^i]^T$ is the vector of combination coefficients of the $i$th class samples that represent the test sample. $A_i$ can be solved using $A_i = (X_i^T X_i + \mu I)^{-1} X_i^T y$, where $\mu$ and $I$ are a small positive constant and the identity matrix, respectively. Let $x_{s_1}^i, \dots, x_{s_m}^i$ denote $m$ arbitrary samples of $X_i$; the deviation between the test sample and these samples can be calculated using

$$ e_i^{s} = \Bigl\| y - \sum_{k=1}^{m} a_{s_k}^i x_{s_k}^i \Bigr\|_2, \tag{9} $$

where $a_{s_1}^i, \dots, a_{s_m}^i$ are the coefficients in $A_i$ related to $x_{s_1}^i, \dots, x_{s_m}^i$, respectively. $e_i^s$ is the so-called deviation between the test sample and the $i$th class representation. If $m$ training samples are selected from all $n_i$ training samples of the $i$th class, there are $\binom{n_i}{m}$ possible cases and hence $\binom{n_i}{m}$ different values of $e_i^s$. For each class, the $m$ training samples that produce the minimum $e_i^s$ are selected as the “representative” samples of the $i$th class for the test sample. In the experimental section, we also perform classification based on the first step of the proposed method alone: the minimum deviation of every class from the test sample is computed, and the test sample is assigned to the class whose minimum deviation is the smallest among all classes.
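A minimal pure-Python sketch of the first step for one class, assuming the deviation is the Euclidean norm of the residual and that the coefficients of each candidate subset are taken from the full-class solution $A_i$ without refitting, as described above. The names `solve`, `ridge_coefficients`, `first_step`, and the default `mu` are hypothetical.

```python
import itertools
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge_coefficients(samples, y, mu):
    """(X^T X + mu I)^{-1} X^T y, where the samples are the columns of X."""
    gram = [[sum(p * q for p, q in zip(si, sj)) + (mu if i == j else 0.0)
             for j, sj in enumerate(samples)]
            for i, si in enumerate(samples)]
    rhs = [sum(p * q for p, q in zip(si, y)) for si in samples]
    return solve(gram, rhs)

def first_step(samples, y, m, mu=0.01):
    """Return (indices of the m representative samples, their deviation) for one class."""
    coef = ridge_coefficients(samples, y, mu)  # A_i for the whole class
    best = None
    for subset in itertools.combinations(range(len(samples)), m):
        # Partial reconstruction from the chosen samples, reusing their coefficients in A_i
        rec = [sum(coef[k] * samples[k][d] for k in subset) for d in range(len(y))]
        e = math.dist(y, rec)
        if best is None or e < best[1]:
            best = (subset, e)
    return best

samples = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
subset, e = first_step(samples, (2, 3, 0.001), m=2)
print(subset)  # → (0, 1)
```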
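After the representative samples are determined, the test sample is approximated with them as

$$ y \approx \tilde X \beta, \tag{10} $$

where $\tilde X = [\tilde X_1, \dots, \tilde X_C]$ and $\beta = [\beta_1^T, \dots, \beta_C^T]^T$. The solution of (10) is obtained with $\beta = (\tilde X^T \tilde X + \gamma I)^{-1} \tilde X^T y$, where $I$ is the identity matrix and $\gamma$ stands for a small positive constant. The final deviation between the test sample and the $i$th class is calculated using

$$ D_i = \|y - \tilde X_i \beta_i\|_2, $$

where $\tilde X_i$ is composed of the representative samples of the $i$th class and $\beta_i$ contains their coefficients. If $D_r = \min_i D_i$, then the test sample is assigned to the $r$th class.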
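The second step can be sketched as follows: all representative samples are stacked into one matrix, a single regularized least-squares problem is solved, and each class is scored by the residual of its own part of the joint representation. This is a minimal illustration, assuming Euclidean deviations; the helper names, the default `gamma`, and the toy data are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge_coefficients(samples, y, mu):
    """(X^T X + mu I)^{-1} X^T y, where the samples are the columns of X."""
    gram = [[sum(p * q for p, q in zip(si, sj)) + (mu if i == j else 0.0)
             for j, sj in enumerate(samples)]
            for i, si in enumerate(samples)]
    rhs = [sum(p * q for p, q in zip(si, y)) for si in samples]
    return solve(gram, rhs)

def second_step(reps, y, gamma=0.01):
    """reps maps each class label to the list of its representative samples."""
    labels, cols = [], []
    for label, samples in reps.items():
        for s in samples:
            labels.append(label)
            cols.append(s)
    beta = ridge_coefficients(cols, y, gamma)  # joint coefficients of all representatives

    def deviation(label):
        # D_i: residual of y after keeping only this class's part of the joint representation
        rec = [sum(b * c[d] for b, c, l in zip(beta, cols, labels) if l == label)
               for d in range(len(y))]
        return math.dist(y, rec)

    return min(reps, key=deviation)

reps = {"a": [(1, 0, 0), (0.9, 0.1, 0)], "b": [(0, 1, 0), (0, 0.9, 0.1)]}
print(second_step(reps, (0.95, 0.05, 0)))  # → a
```

Solving one joint system, rather than one per class, is what makes the classes compete: a coefficient assigned to one class's samples is not available to another's.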
3.2. Insight into the Proposed Method
3.2.1. Why the Proposed Method Is Reasonable
This section explains why the proposed method is reasonable. The first step of the proposed method identifies, from each class, the “representative” samples that are the most similar to the test sample. Only these training samples are then used to represent and classify the test sample. It is known that face images of the same subject vary considerably owing to changes in pose, facial expression, and illumination [16, 17], so training samples that are dissimilar to the test sample may or may not come from the genuine subject of the test sample. As a result, the “dissimilar” training samples are not reliable for determining which class the test sample truly belongs to. In contrast, using only the training samples that are similar to the test sample is very helpful for classification.
The second step of the proposed method enables all the classes to represent the test sample in a competitive way: if one class contributes much to the representation, that is, has a small deviation from the test sample, the other classes contribute less. The experiments show that the genuine class of the test sample usually has the smallest deviation from the test sample; thus, the proposed method classifies the test sample with high accuracy.
The first step can also partly remove the “outliers” of a class, that is, the training samples that are very close to other classes, which helps to obtain high accuracy. Figures 1, 2, and 3 show a test sample from the AR face database and the classification result of our method on it (the occluded image in the first column is the test sample). In all three figures, the first image is the same test sample. In this example, the first six face images of each subject are used as training samples and the remaining images as test samples, and the number of representative samples per class is set to 3. Figure 1 shows the test sample and its 34 nearest neighbors in the training set, determined by the Euclidean distance between each training sample and the test sample. Among these 34 neighbors, only the last one is from the same subject as the test sample; the second to sixth images are the five nearest neighbors of the test sample.
Figure 2 shows that the second step of our method correctly assigns the test sample to its genuine class. In this figure, the second to fourth images are the three training samples of the class closest to the test sample, and the fifth to seventh and eighth to tenth images are the three training samples of the second and third closest classes, respectively. In contrast, Figure 3 shows that if the results of the first step are used directly for classification, the test sample is misclassified. Figure 3 also presents the three training samples from each of the three classes closest to the test sample; under the first step, the genuine class of the test sample ranks only seventeenth among the classes “close” to the test sample. In other words, if the test sample were assigned to the class with the smallest minimum deviation (defined in (9)), it would be misclassified.
Figures 4 and 5 show the deviations between the same test sample as in Figures 1, 2, and 3 and its representation results obtained using the first and second steps of our method, respectively. In these two figures, the horizontal axis shows the class label and the vertical axis shows the deviation. The genuine class label of the test sample is 3. Figure 5 shows that, with the representation result obtained in the second step, the test sample has the smallest deviation from the representation generated from the third class, so the second step correctly assigns the test sample to the third class. However, Figure 4 shows that, with the deviations from the first step alone, the test sample is assigned to the fifty-fourth class; the class with the minimum deviation is then not the genuine (third) class.
Unlike conventional $\ell_1$-norm-based sparse representation methods, our method is $\ell_2$-norm-based. The $\ell_2$-norm-based representation method proposed by Xu et al. is not only computationally much more efficient than conventional sparse representation methods but can also lead to higher face recognition accuracy.
3.2.2. Relationship between the Accuracy and the Representation Error
We now examine whether the accuracy is directly related to the representation error. As shown earlier, if the representation result generated from a class has the minimum deviation from the test sample, the test sample is assigned to this class. Let $\hat y$ denote the representation of the test sample $y$ obtained in the second step; $E = \|y - \hat y\|_2$ is the so-called representation error. If the number of available training samples is equal to or greater than the dimensionality of the sample vector (and the training samples span the sample space), the representation error of the test sample can be zero. Otherwise, the more training samples are available, the lower the representation error tends to be. In real-world face recognition applications, the number of available training samples is usually smaller than the dimensionality of the sample vector. However, the experiments show that a lower representation error usually does not lead to higher recognition accuracy. In fact, our method usually has a high representation error while obtaining high recognition accuracy. Figure 6 shows the representation errors of the first 300 test samples when the first four face images of each subject in the AR database are used as training samples and the remaining face images are used as test samples. In this figure, the method with the lower representation error is not the one with the higher classification accuracy. Thus, the representation error is not a good indicator of accuracy. Instead, the accuracy is directly related to the deviation between the test sample and the representation result of its genuine class: the smaller this deviation, the better the classification performance appears to be.
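A simple triangle-inequality bound, added here for illustration rather than taken from the original analysis, shows why the overall representation error and the class deviation can diverge. Write $\tilde X = [\tilde X_1, \dots, \tilde X_C]$ for the representative samples, $\beta = [\beta_1^T, \dots, \beta_C^T]^T$ for the second-step coefficients, $E = \|y - \tilde X\beta\|_2$ for the representation error, and $D_r = \|y - \tilde X_r\beta_r\|_2$ for the deviation of class $r$. Since $\tilde X\beta = \sum_i \tilde X_i\beta_i$,

$$ y - \tilde X_r \beta_r = \bigl(y - \tilde X \beta\bigr) + \sum_{i \neq r} \tilde X_i \beta_i, $$

and therefore

$$ \Bigl|\, \Bigl\| \sum_{i \neq r} \tilde X_i \beta_i \Bigr\|_2 - E \,\Bigr| \;\le\; D_r \;\le\; E + \Bigl\| \sum_{i \neq r} \tilde X_i \beta_i \Bigr\|_2 . $$

When $E$ is small, $D_r$ is governed mainly by how much the other classes contribute to the joint representation, so a small overall error by itself says little about which class attains the smallest deviation.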
4. Experimental Results
4.1. Experiment on the Lab2 Data
The first experiment is conducted on the Lab2 database. This database contains 1000 visible face images of 50 subjects, with twenty face images per subject. These images were acquired under four different illumination conditions: (a) environmental illumination (referred to as “normal illumination”), (b) environmental illumination plus the illumination of the left incandescent lamp (referred to as “left illumination”), (c) environmental illumination plus the illumination of the right incandescent lamp (referred to as “right illumination”), and (d) environmental illumination plus the illumination of both the left and right incandescent lamps (referred to as “both illumination”). The face images also vary in facial expression and pose. Figure 7 shows several visible face images of one subject in the Lab2 database. We used the ten visible face images of every subject captured under “both illumination” and “left illumination” as training samples and took the remaining images as test samples. Table 1 shows the experimental results of our method on the Lab2 database, and Table 2 shows those of NFL, NFP, and NFS. Our method produced far fewer classification errors than NFL, NFP, and NFS. Table 1 also shows that relying only on the first step of our method for classification leads to a higher error rate.
4.2. Experiments on the AR Face Database
The second experiment is conducted on the AR database. The face images of this database were obtained under varying pose, facial expression, or lighting, and occluded face images are also included. There are 120 subjects and 3120 gray face images captured in two sessions. We resized each image to 40 × 50 pixels. In a series of experiments, the first 4, 6, 8, 12, or 14 images of each subject were used as training samples and all the other images as test samples. The numbers of representative samples used, together with the experimental results, are shown in Tables 3 and 4. The tables also show the comparison with three other KNN-based algorithms (NFL, NFS, and NFP) on the AR database. “ER” is the abbreviation of “error rate”.
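The training/test protocol above can be sketched as a simple per-subject split. This assumes each subject's images are stored in a fixed order and that there are 26 images per subject (3120 / 120); the function name and the image identifiers are hypothetical stand-ins for the actual AR files.

```python
def split_first_k(images_by_subject, k):
    """Use the first k images of every subject for training and the rest for testing."""
    train, test = [], []
    for subject, images in images_by_subject.items():
        train.extend((img, subject) for img in images[:k])
        test.extend((img, subject) for img in images[k:])
    return train, test

# Hypothetical stand-in for the AR subset: 120 subjects x 26 image identifiers
images = {s: [f"s{s}_img{i}" for i in range(26)] for s in range(120)}
for k in (4, 6, 8, 12, 14):
    train, test = split_first_k(images, k)
    print(k, len(train), len(test))
```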
Tables 3, 4, and 5 show that our method achieves lower error rates than the other methods. These tables also show that the first step of our method alone yields a much higher error rate than the second step. Thus, the second step plays an important role in representing and classifying the test sample, mainly because it evaluates the contribution of every class to representing the test sample in a competitive way.
5. Conclusions
The two steps of the proposed method play different and positive roles in classification. The first step identifies the training samples that are the most similar to the test sample, which reduces the side effect of the other, dissimilar training samples on classification. The second step lets the “representative” samples of every class represent the test sample in a competitive way through a linear combination of them, which allows the method to evaluate the deviation between the test sample and each class better than NFL, NFP, and NFS. The proposed method is easy to implement and achieves better performance than the conventional nearest feature classifiers. However, evaluating all possible subsets of training samples within each class requires additional computation, which is a drawback of the method.
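To make that cost concrete, consider a worked count in the setting of the AR example in Section 3.2.1 (the first six images of each of the 120 subjects used for training, three representative samples per class, i.e., $m = 3$). For every test sample the first step evaluates

$$ \binom{6}{3} = \frac{6 \cdot 5 \cdot 4}{3!} = 20 \ \text{subsets per class}, \qquad 120 \times 20 = 2400 \ \text{subset deviations per test sample}, $$

in addition to one regularized $n_i \times n_i$ linear solve per class. With 14 training images per class and, hypothetically, the same $m = 3$, the count grows to $\binom{14}{3} = 364$ subsets per class, or $120 \times 364 = 43{,}680$ per test sample.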
This work was supported by National Natural Science Foundation of China (Grant no. 61065003) and Social Sciences Foundation of the State Education Ministry (Grant no. 10YJC630379).
W. Hizem, Y. Ni, and B. Dorizzi, “Near infrared sensing and associated landmark detection for face recognition,” Journal of Electronic Imaging, vol. 17, no. 1, Article ID 011005, 2008.
J. G. Skellam, “Studies in statistical ecology. I. Spatial pattern,” Biometrika, vol. 39, pp. 346–362, 1952.
S. Z. Li and J. Lu, “Face recognition using the nearest feature line method,” IEEE Transactions on Neural Networks, vol. 10, no. 2, pp. 439–443, 1999.
A. Majumdar, Compressive classification for face recognition [M.S. thesis], Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada, 2009.
Y. Xu, A. Zhong, J. Yang, and D. Zhang, “Bimodal biometrics based on a representation and recognition approach,” Optical Engineering, vol. 50, no. 3, 2011.