Abstract

In real face recognition applications, the sample sets are updated constantly. However, most learning-based face recognition models do not consider this fact and use a fixed training set to learn the model only once. In addition, the testing samples are discarded after the testing process is completed. Namely, the training and testing processes are separated, and the latter does not give feedback to the former for better recognition results. To attenuate these problems, this paper proposes an online sparse learning method for face recognition. It updates the salience evaluation vector in real time to construct a dynamic facial feature description model. A strategy for updating the gallery set is also proposed. Both the dynamic facial feature description model and the gallery set are employed to recognize faces. Experimental results show that the proposed method improves face recognition accuracy compared with classical learning models and other state-of-the-art face recognition methods.

1. Introduction

Face recognition is a remarkable ability that humans perform effortlessly and automatically in daily life. Owing to the wide applications of automatic face recognition, such as public security and human-robot interaction, it is an established field attracting great attention [1–3].

In 1973, Takeo Kanade presented the first face recognition system [4]. After that, there was a dormant period in automatic face recognition until the work on a low-dimensional face representation by Sirovich and Kirby, derived using the K-L transform or principal component analysis (PCA) [5]. The pioneering work of Turk and Pentland then proposed a new facial feature description model, Eigenface [6, 7], to recognize faces. Since then, a number of face recognition and modeling systems have been developed and deployed. Some major state-of-the-art face recognition methods are as follows. Ahonen et al. divide a face image into regions and extract local binary patterns (LBP) from these regions to recognize faces [8]. To enhance the robustness of LBP features to noise, Zhang et al. [9] introduce a combined approach that employs multiorientation and multiscale Gabor filtering to extend LBP to the Local Gabor Binary Pattern (LGBP); they also propose the Histogram of Gabor Phase Pattern (HGPP) [10], which combines the spatial histogram with a Gabor phase information encoding scheme. To reduce the dimension of LGBP and suppress intrapersonal variation, Nguyen et al. [11] apply whitened PCA to LGBP, which leads to a very high recognition rate. By applying the LBP-based structure to oriented magnitudes, Vu and Caplier [12] propose Patterns of Oriented Edge Magnitudes (POEM), followed by whitened PCA to increase the discriminative power and robustness for face recognition. To generate a more robust face descriptor, Mehta et al. [13] propose an approach that uses directional and texture information from face images, which improves the recognition results significantly. Besides these single-feature methods, some recent methods fuse features for higher face recognition accuracy: Zou et al. [14] adopt the Borda count method to combine different classifiers, each trained with one type of feature selected from LBP, Gabor, and Eigenface; Tan and Triggs [15] fuse LBP and Gabor features to recognize faces with the kernel discriminant common vector (DCV) method.

The classical face recognition methods mentioned above employ a training set to train the face recognition model and a gallery set as the prototype against which the testing samples are matched during recognition. These methods achieve good results; however, they have two drawbacks: (1) The fixed training and gallery sets cannot adapt to the variation of face images during testing, especially in real applications. For example, faces gradually change with age. (2) The testing sample is discarded after the recognition procedure, which loses the information contained in the testing set. Namely, the training and testing processes are separated, and the latter does not give feedback to the former for better recognition results. To attenuate these problems, an online sparse learning method is proposed in this paper, which adapts to the variation of face images in the testing set or in real applications. In addition, the proposed method updates the facial feature description model in real time and preserves the sparsity of the facial features.

2. Previous Works

In our previous work [16], we proposed a sparse learning method for salient facial feature description and obtained a new facial feature description model, given by equation (1). The model relates a sparse facial feature vector to an LBP feature vector through a salience evaluation vector $\beta$ for the LBP feature vector; the sparse facial feature vector is composed of local sparse feature vectors, one for each region of the facial image.

The method for obtaining the salience evaluation vector is described in detail in our previous work [16]. It has two limitations: (1) the training set is fixed; that is, the method does not consider dynamic changes of the sample set. (2) The salience evaluation vector is learned only once, with no further adjustment. Thus, its adaptability and self-adjustment ability need to be improved. An online sparse learning method for face recognition is presented in this paper to solve these problems.

3. The Proposed Method

3.1. Online Sparse Learning

With the method in our previous work [16], the initial salience evaluation vector $\beta_0$ and the initial facial feature description model can be obtained. Given a testing set and a gallery set, we now propose an online sparse learning method to update the salience evaluation vector and the facial feature description model.

Randomly select a sample from the testing set and denote it as $t_n$. Its label is computed with the nearest neighbor model and the gallery set. One nearest neighbor and one next-nearest neighbor of $t_n$ are then obtained from the gallery set.
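
The labeling step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the distance is taken to be Euclidean, and "next-nearest neighbor" is read literally as the gallery entry with the second-smallest distance. The function name `nearest_and_next_nearest` and the toy data are hypothetical and do not come from the paper.

```python
import numpy as np

def nearest_and_next_nearest(query, gallery_feats, gallery_labels):
    """Return (label, nn_index, next_nn_index) for one query feature vector.

    Assumptions (not from the paper): Euclidean distance, and the
    next-nearest neighbor is the gallery entry with the second-smallest
    distance to the query.
    """
    dists = np.linalg.norm(gallery_feats - query, axis=1)
    order = np.argsort(dists, kind="stable")
    nn, next_nn = int(order[0]), int(order[1])
    # The nearest neighbor model assigns the label of the closest entry.
    return gallery_labels[nn], nn, next_nn

# Toy gallery with three feature vectors (hypothetical data).
gallery_feats = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
gallery_labels = ["alice", "bob", "carol"]
label, nn, next_nn = nearest_and_next_nearest(
    np.array([0.2, 0.1]), gallery_feats, gallery_labels
)
```

If the next-nearest neighbor is instead meant to be the closest gallery entry of a different class, only the selection of `next_nn` would change.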

Region-based features (such as LBP and POEM) are extracted from $t_n$, its nearest neighbor, and its next-nearest neighbor, which gives one feature vector per image (equation (2)).
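
As an illustration of what a region-based feature looks like, the sketch below computes basic 3 × 3 LBP codes and concatenates one 256-bin histogram per region. It is a generic textbook variant, not necessarily the LBP configuration used in the experiments; the grid size, the neighbor ordering, and the function names `lbp_codes` and `region_histograms` are assumptions made for this example.

```python
import numpy as np

def lbp_codes(img):
    """Basic 3x3 LBP code for every interior pixel of a 2-D grayscale array."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Eight neighbors, visited clockwise from the top-left pixel (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set the bit when the neighbor is at least as bright as the center.
        codes |= (neigh >= center).astype(np.int64) << bit
    return codes

def region_histograms(img, grid=(2, 2)):
    """Split the LBP code map into grid regions and concatenate their histograms."""
    codes = lbp_codes(img)
    feats = []
    for rows in np.array_split(codes, grid[0], axis=0):
        for block in np.array_split(rows, grid[1], axis=1):
            feats.append(np.bincount(block.ravel(), minlength=256))
    return np.concatenate(feats)

# A constant image gives code 255 at every interior pixel.
feats = region_histograms(np.full((6, 6), 7))
```

Each region contributes its own block of the feature vector, which is what later allows the salience evaluation vector to be grouped by region.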

A feature transformation (equation (3)) is introduced to generate two classes of distance samples. Applied to the two feature vectors of a nearest neighbor couple, the transformation produces a positive sample.

A negative sample is generated by applying the same transformation to a next-nearest neighbor couple. Note that at least one nearest neighbor and at least one next-nearest neighbor are used. All the positive samples form the positive sample set, and all the negative samples form the negative sample set. The sample matrix E consists of these two sets. The label vector is obtained by assigning the label “1” to each positive sample and “0” to each negative sample. Let $\beta$ denote the updated value of the initial salience evaluation vector $\beta_0$; then we have

$$\beta = \beta_0 + \Delta\beta, \qquad (4)$$

where $\Delta\beta$ is the increment of $\beta_0$.
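
The construction of the sample matrix E and the label vector can be sketched as follows. The exact form of the feature transformation is not reproduced in this text, so the sketch assumes an element-wise absolute difference between the two feature vectors of a couple; that choice, the function names, and the toy vectors are all hypothetical.

```python
import numpy as np

def distance_sample(feat_a, feat_b):
    # Assumed transformation: element-wise absolute difference of a couple.
    return np.abs(feat_a - feat_b)

def build_sample_matrix(query_feat, nn_feats, next_nn_feats):
    """Stack positive samples (label 1) on top of negative samples (label 0)."""
    positives = [distance_sample(query_feat, f) for f in nn_feats]
    negatives = [distance_sample(query_feat, f) for f in next_nn_feats]
    E = np.vstack(positives + negatives)
    y = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])
    return E, y

# One nearest neighbor and one next-nearest neighbor (toy feature vectors).
query = np.array([1.0, 2.0, 3.0])
E, y = build_sample_matrix(
    query,
    nn_feats=[np.array([1.0, 2.0, 4.0])],
    next_nn_feats=[np.array([4.0, 0.0, 3.0])],
)
```

With one neighbor of each kind, E has only two rows, which is why the resulting linear system is heavily underdetermined.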

The initial salience evaluation vector $\beta_0$ was obtained by fitting the label vector with a sample matrix built from the training samples (which is different from E) and the salience evaluation vector. As $\beta$ is the updated value of $\beta_0$, it should have the same property: together with the sample matrix, it fits the label vector. If we take one testing sample $t_n$ and its nearest and next-nearest neighbors as the training samples, we can use E and $\beta$ to fit the label vector, which gives equation (5).

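Substituting equation (4) into equation (5) gives equation (6). Replacing the terms of equation (6) that do not involve the increment $\Delta\beta$ with a single known vector then yields a linear system in $\Delta\beta$, equation (7).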
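
For readers who want the algebra behind the linear system, one plausible reading of equations (5) to (7) is sketched here. It assumes that fitting the label vector means a plain linear fit with the sample matrix E. The symbols $y$ (label vector) and $z$ (known right-hand side) are names introduced for this sketch and may differ from the original notation.

```latex
\begin{aligned}
E\beta &= y && \text{(assumed form of equation (5))}\\
E(\beta_0 + \Delta\beta) &= y && \text{(after substituting equation (4))}\\
E\,\Delta\beta &= y - E\beta_0 \;=:\; z && \text{(linear system in } \Delta\beta\text{)}
\end{aligned}
```

Under this reading, $z$ is the part of the label vector that the current salience evaluation vector $\beta_0$ fails to explain, and $\Delta\beta$ is fitted to that residual.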

Figure 1 illustrates the facial feature transformation in equation (3) and the construction of the linear system.

As the number of samples in E is much smaller than the dimensionality of each sample, the linear system is underdetermined and does not have a unique solution for the increment $\Delta\beta$. Structured sparse representation [17–21] is introduced to overcome this problem. This strategy also leads to sparsity at both the group and individual levels. The sample matrix E and the salience evaluation vector are first divided into regions (equation (8)), so that each group consists of the columns of E that belong to one region together with the corresponding entries of the salience evaluation vector. An optimization problem (equation (9)) is then constructed to solve for $\Delta\beta$. Its penalty uses the Euclidean norm of each group and is weighted by a positive complexity parameter that controls the amount of shrinkage: the larger the value of this parameter, the greater the shrinkage.

Once the increment $\Delta\beta$ is obtained, $\beta$ is computed with equation (4) and used to construct the facial feature description model in equation (1). The initial salience evaluation vector and the initial model are then replaced by the updated ones. These steps are performed iteratively.
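
A minimal sketch of one update step is given here, assuming the optimization problem is a sparse group lasso: a least-squares fit of the residual $z = y - E\beta_0$ with an $\ell_1$ penalty on individual entries and a Euclidean-norm penalty on each region group. The objective, the proximal gradient solver, the two penalty weights `l1` and `l2`, and all function names are assumptions of this example, not the solver used in the paper.

```python
import numpy as np

def sparse_group_prox(v, groups, l1, l2):
    """Proximal operator of l1 * ||v||_1 + l2 * sum_g ||v_g||_2."""
    # Element-wise soft threshold gives sparsity at the individual level.
    out = np.sign(v) * np.maximum(np.abs(v) - l1, 0.0)
    # Group-wise shrinkage gives sparsity at the group (region) level.
    for idx in groups:
        norm = np.linalg.norm(out[idx])
        scale = max(0.0, 1.0 - l2 / norm) if norm > 0 else 0.0
        out[idx] = scale * out[idx]
    return out

def solve_increment(E, z, groups, l1=0.01, l2=0.01, n_iter=500):
    """Proximal gradient for 0.5*||z - E d||^2 + l1*||d||_1 + l2*sum_g ||d_g||_2."""
    # Step size 1/L, where L is the squared spectral norm of E.
    step = 1.0 / max(np.linalg.norm(E, 2) ** 2, 1e-12)
    d = np.zeros(E.shape[1])
    for _ in range(n_iter):
        grad = E.T @ (E @ d - z)
        d = sparse_group_prox(d - step * grad, groups, step * l1, step * l2)
    return d

def update_salience(beta0, E, y, groups, **kw):
    """Return beta = beta0 + delta_beta, following equation (4)."""
    z = y - E @ beta0
    return beta0 + solve_increment(E, z, groups, **kw)

# Toy problem: two samples, four feature dimensions, two region groups.
E = np.array([[0.0, 0.0, 1.0, 0.5], [3.0, 2.0, 0.0, 0.1]])
y = np.array([1.0, 0.0])
beta0 = np.zeros(4)
groups = [[0, 1], [2, 3]]
beta = update_salience(beta0, E, y, groups)
```

Larger penalty weights shrink the increment more strongly; with very large weights the increment is exactly zero and the salience evaluation vector is left unchanged.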

A summary of the online sparse learning algorithm is given in Algorithm 1.

Figure 2 presents a framework of the essential procedures of the online sparse learning algorithm, and Algorithm 1 lists its detailed steps. As new testing samples arrive, the online sparse learning procedures are performed continuously. More importantly, the time cost of the online sparse learning algorithm is low, so it can be performed in real time.

Algorithm 1: Online sparse learning.
Initialization:
(1) Obtain the gallery set, the testing set, the initial salience evaluation vector, and the initial facial feature description model.
(2) Set the initial value for the number of iterations: n = 1.
(3) Set the initial value for the sign vector SV (each element denotes whether the label of a testing sample is correct; “1” denotes “right” and “−1” denotes “wrong”): SV = 1.
(4) Randomly select a testing sample from the testing set.
Iteration:
(5) while {the number of consecutive testing samples labeled incorrectly: k < 5} do:
(6).
(7) Compute the facial feature vector for the testing sample with equation (1).
(8) Compute the label for the testing sample with its facial feature vector, the gallery set, and the nearest neighbor model.
(9) Judge the correctness of the label according to the criterion in Section 3.2: if the label is correct, then update the gallery set according to the gallery set update strategy in Section 3.3; if the label is incorrect, then repeat steps (4)∼(8).
(10) Construct the nearest neighbor set and the next-nearest neighbor set: find the nearest neighbor samples of the testing sample from the gallery set, and all the nearest neighbor samples form the nearest neighbor set; similarly, the next-nearest neighbor set is obtained.
(11) Construct the positive and negative sample sets according to equation (7).
(12) Solve the optimization problem in equations (9)∼(10) to obtain the increment of the salience evaluation vector.
(13) Compute the salience evaluation vector β with equation (4).
(14) Compute the facial feature description model with equation (1).
(15) Update the initial salience evaluation vector with β.
(16) Update the initial facial feature description model with the model from step (14).
(17) n = n + 1.
(18) k is equal to the number of elements “−1” in the vector [SV(n − 4), SV(n − 3), SV(n − 2), SV(n − 1), SV(n)].
(19) end while
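
The stopping rule of the loop, steps (5) and (18), can be sketched as follows. The sketch assumes the sign vector is kept as a Python list of 1 and −1 values in the order the samples were processed; the function names are hypothetical.

```python
def recent_failures(sv, window=5):
    """Count the entries equal to -1 among the last `window` sign values."""
    return sum(1 for s in sv[-window:] if s == -1)

def should_continue(sv, window=5):
    # The loop runs while fewer than `window` of the last `window` labels are wrong.
    return recent_failures(sv, window) < window

history_ok = [1, -1, 1, -1, -1]          # three failures among the last five
history_stop = [1, -1, -1, -1, -1, -1]   # the last five labels are all wrong
```

Because the count covers exactly the last five entries, the loop stops only when five consecutive testing samples have been labeled incorrectly.
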
3.2. Criterion of Label Correctness

This subsection presents a comprehensive criterion for judging the correctness of the testing sample label. As we employ the nearest neighbor model to assign a label to the testing sample, a threshold on the nearest neighbor distance is used to determine whether the label is correct. To further improve the accuracy of the judgement, the next-nearest neighbor distance is also employed. The nearest neighbor distance is the distance between the two elements of any nearest neighbor couple, and the next-nearest neighbor distance is the distance between the two elements of any next-nearest neighbor couple. With these two distances, a comprehensive criterion for judging the correctness of the testing sample label is proposed as follows.

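Rule 1 (criterion of label correctness). If the nearest neighbor distance and the next-nearest neighbor distance satisfy the threshold condition, the label of the testing sample is correct. The condition uses two thresholds, one for each distance, which are set manually according to experience.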
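
A possible reading of Rule 1 is sketched here. The inequality itself is not reproduced in this text, so the directions used are an assumption: the label is accepted when the nearest neighbor is close enough and the next-nearest neighbor is far enough. The function name and the threshold values are hypothetical.

```python
def label_is_correct(d_nn, d_next, thr_nn, thr_next):
    """Assumed form of Rule 1 with two manually chosen thresholds.

    d_nn:   distance to the nearest neighbor (should be small)
    d_next: distance to the next-nearest neighbor (should be large)
    """
    return d_nn < thr_nn and d_next > thr_next

accepted = label_is_correct(0.2, 0.9, thr_nn=0.5, thr_next=0.6)
rejected = label_is_correct(0.7, 0.9, thr_nn=0.5, thr_next=0.6)
```

Requiring both conditions makes the test stricter than a single distance threshold, which limits the number of wrongly labeled samples that enter the gallery set.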

3.3. Gallery Set Update Strategy

In this research, a dynamic gallery set is taken as the input of the online sparse learning algorithm. If the label of the testing sample is judged to be correct, the following strategy is used to update the gallery set (Algorithm 2).

Algorithm 2: Gallery set update strategy.
Initialization:
(1) Obtain the gallery set, the testing sample, and its label.
(2) Set the maximal number of pictures of each person in the gallery set.
Update gallery set:
(3) In the gallery set, find the class that the testing sample belongs to using its label.
(4) Obtain the number of pictures in this class.
(5) If this number has reached the maximal number, remove the picture in the second place from the class and put the testing sample in the last place. If this number is smaller than the maximal number, put the testing sample in the last place directly.
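
The update strategy can be sketched as a small function. It assumes the gallery is stored as a dictionary that maps each label to an ordered list of pictures, and that the removal branch applies once the class holds the maximal number of pictures; the data layout and the names are hypothetical.

```python
def update_gallery(gallery, label, sample, max_per_class):
    """Append a correctly labeled testing sample to its class in the gallery.

    gallery: dict mapping a label to an ordered list of pictures.
    When the class is full, the picture in the second place is removed
    first, so the picture in the first place is always kept.
    """
    if max_per_class < 2:
        raise ValueError("max_per_class must be at least 2")
    pictures = gallery[label]
    if len(pictures) >= max_per_class:
        del pictures[1]          # remove the picture in the second place
    pictures.append(sample)      # put the new sample in the last place
    return gallery

gallery = {"a": ["g0", "p1", "p2"]}
update_gallery(gallery, "a", "p3", max_per_class=3)   # class is full
full_case = list(gallery["a"])
update_gallery(gallery, "a", "p4", max_per_class=5)   # class has room
room_case = list(gallery["a"])
```

Keeping the first picture and rotating the rest lets the class follow gradual changes in appearance while retaining its original gallery image.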

4. Experiments

In this section, the performance of the proposed method is demonstrated on the FERET database [22]. We first compare the proposed method with the classical learning models PCA [23] and WPCA [24] using LBP and POEM features. Then, a number of state-of-the-art face recognition results obtained on this database are shown to evaluate the performance of the proposed method. The overall recognition rate (ORR) is defined as the ratio of the number of correctly recognized images to the total number of images in the whole face dataset; it is a comprehensive criterion for evaluating the performance of a face recognition method.
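
The ORR definition amounts to pooling all probe subsets before dividing, which is different from averaging the per-subset rates. A short sketch with made-up counts (the subset names and numbers are hypothetical, not experimental results):

```python
def overall_recognition_rate(correct_counts, total_counts):
    """Correctly recognized images over all images, pooled across subsets."""
    return sum(correct_counts.values()) / sum(total_counts.values())

# Hypothetical counts for two probe subsets.
total_counts = {"subset_a": 100, "subset_b": 50}
correct_counts = {"subset_a": 95, "subset_b": 40}
orr = overall_recognition_rate(correct_counts, total_counts)
```

Here the pooled rate is 135/150 = 0.9, whereas the unweighted mean of the two subset rates (0.95 and 0.80) would be 0.875; larger subsets therefore weigh more in the ORR.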

4.1. Data Description

The FERET database is a well-known database for face recognition. It has 14,051 grayscale images representing 1,199 individuals. These images contain variations in lighting, facial expression, and time. The gallery and probe sets are selected in the same way as in the FERET evaluation protocol [22]. Namely, fa (1,196 images) is used as the gallery set, while fb (1,195 images), fc (194 images), dupI (722 images), and dupII (234 images) are used as the probe sets. Note that the original training set combines two training sets, the standard FERET training set (736 images) and the subfc training set (194 images) [8]. All the images are cropped to 130 × 150 pixels for LBP feature extraction and to 96 × 96 pixels for POEM feature extraction.

The Extended Yale B face database consists of 38 subjects, and each subject has approximately 64 frontal-view images under various lighting conditions. All image data used in the experiments are manually aligned, cropped, and then resized to 168 × 192 pixels. Besides the gallery set (38 images), the database is divided into five subsets according to the angle between the light direction and the camera axis: S1 (225 images), S2 (456 images), S3 (455 images), S4 (526 images), and S5 (714 images). We randomly select 500 images for the training set.

4.2. Results

The comparison results between the proposed method and other learning methods for face recognition on the FERET database are shown in Tables 1 and 2. From these two tables, we can see that the proposed method achieves the highest recognition accuracy on most of the subsets (except subset fb). On the comprehensive criterion ORR, the proposed method (94.6% with LBP features and 95.4% with POEM features) is significantly better than the other two methods, which indicates that the online sparse learning strategy has clear advantages over the PCA and WPCA models.

Comparing the corresponding results of the same method in Tables 1 and 2, we find that the results obtained using POEM features (95.4%) are better than those obtained using LBP features (94.6%). This indicates that POEM features are more discriminative than LBP features. The proposed method obtains significantly better results than the other two learning methods, PCA and WPCA, with both LBP and POEM features, which shows that the proposed method generalizes well across different region-based features.

We also compare the performance of the proposed method with other state-of-the-art methods and summarize the results in Table 3. This table shows that the performance of the proposed method is much better than that of other state-of-the-art methods on the comprehensive criterion ORR. More importantly, the proposed method attains 100% accuracy on the subset dupII and a much higher recognition rate than other methods on the subset dupI. Because the dupI and dupII images were taken, respectively, within one year of the gallery image and at least one year apart from it, this result means that the proposed method is much more robust to variations in time and age.

5. Conclusions

Face recognition is the main task for person identification. This paper proposes an online sparse learning method for face recognition. The initial salience evaluation vector in a facial feature description model is obtained from our previous work. Then, an online sparse learning method is proposed to learn the increment of the salience evaluation vector with one testing sample and the current gallery set. The salience evaluation vector is updated by adding the initial salience evaluation vector to its increment. A gallery set update strategy is also presented, which achieves the dynamic update of the gallery set. The proposed method updates the facial feature description model in real time and preserves the sparsity of the facial features. In addition, it considers the dynamic changes of the sample set, which improves the generalization ability of the method and its face recognition results. Experimental results show that the proposed method improves face recognition accuracy compared with the PCA and WPCA methods.

Data Availability

Previously reported FERET database, PCA learning model, and WPCA learning model were used to support this study and are available at DOI: 10.1109/CVPR.1997.609311, DOI: 10.1016/S1077-3142(03)00077-8, and DOI: 10.1016/j.patcog.2009.12.004. These prior studies and datasets are cited at relevant places within the text as references [21–23].

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Jianbo Su contributed to conceptualization and data curation; Qiaoling Han was involved in methodology and wrote the original draft; and Yue Zhao reviewed and edited the manuscript.

Acknowledgments

This work was supported in part by the Fundamental Research Funds for the Central Universities (2019ZY12) and in part by the NSF of China under the grant nos. 61533012 and 91748120.