Abstract

Face recognition has become a very active branch of biometrics. Different pictures of the same face may include various changes of expression, pose, and illumination. However, a face recognition system usually suffers from the problem that insufficient training samples cannot convey these possible changes effectively. The main reason is that a system has only limited storage space and limited time to capture training samples. Much previous work has ignored the problem of insufficient training samples. In this paper, we overcome the insufficiency of the training sample size by fusing two kinds of virtual samples with the original samples to perform small-sample face recognition. The two kinds of virtual samples used are mirror faces and symmetrical faces. Firstly, we transform the original face image to obtain mirror faces and symmetrical faces. Secondly, we use these two kinds of virtual samples to obtain the matching scores between the test sample and each class. Finally, we integrate the matching scores to get the final classification results. We compare the proposed method with the single virtual sample augmentation methods and the original representation-based classification. Experiments on various face databases show that the proposed scheme achieves the best accuracy among the representation-based classification methods.

1. Introduction

Face recognition [1–5] has received more and more attention as a branch of biometrics [6–9]. Face images are usually unstable, and many factors such as changes in facial expression, pose, lighting conditions (day/night, indoor/outdoor), coverings (mask, sunglasses, hair, beard, etc.), and aging adversely affect the performance of face recognition. In recent years, many researchers have made efforts to address these challenges. For example, Beymer [10] proposed a view-based approach for recognizing faces under varying pose. The recognizer consists of two main stages: a geometrical alignment stage, where the input is registered with the model views, and a correlation stage for matching. Georghiades et al. [11] presented a generative appearance-based method for recognizing human faces under variation in lighting and viewpoint. Yang et al. [12] presented a novel robust sparse coding (RSC) model and an effective iteratively reweighted sparse coding (IRSC) algorithm for RSC to decrease the influence of various types of outliers. Wagner et al. [13] proposed a system for recognizing human faces from images taken under practical conditions that is conceptually simple, well motivated, and competitive with state-of-the-art recognition systems for access control scenarios. If we had sufficient training samples containing all the possible variations of illumination, facial expression, and pose, we would obtain a high accuracy. However, in a practical application, the training samples usually cannot convey sufficient variations of illumination, facial expression, and pose [14–16]. Therefore, the limited number of available training samples becomes a severe problem.

For face recognition, fewer samples per person mean less laborious effort in collecting them and lower cost in storing and processing them. Unfortunately, a small training set brings about challenges such as lower robustness. For the situation of insufficient training samples, previous studies have proposed approaches that generate new face images so that the size of the training set is enlarged [17–19]. These generated face images are called virtual samples. The virtual samples are derived from the pixels of the original samples: if the original image is $x$, a virtual sample can be represented as $\tilde{x} = T(x)$, where $T$ denotes an image transformation. For example, Tan et al. [20] attempted to provide a comprehensive survey of current research on the one-sample-per-person face recognition problem. They evaluated the advantages and disadvantages of previous relevant methods. Xu et al. [21] assumed that the facial structure is symmetrical, proposed an approach to generate "symmetrical" face images, and exploited both the original and "symmetrical" face images to recognize the subject. In [22] and [23], Xu et al. aimed at solving the problems of nonsymmetrical samples and misalignment. They proposed a method that exploits the mirror image of the face image to simulate possible variation. Su et al. [24] proposed to adapt the within-class and between-class scatter matrices computed from a generic training set to the persons to be identified by a coupled linear representation method.

If the training set does not have a considerable size, a face recognition method using only the original face images will hardly obtain satisfactory accuracy. In this paper, the proposed method integrates the original training samples and their virtual samples to perform face recognition. The virtual samples include two parts: mirror face images and symmetrical face images. Each virtual face image reflects some possible change in pose and illumination of the original face image. The main advantage of mirror virtual training samples is that they can effectively overcome the problems of nonsymmetrical samples and misalignment [23]. The advantage of symmetrical faces is that they are different from the original face image but can genuinely reflect some possible appearance of the face. Figure 1 shows the original samples, the corresponding virtual samples, and the test samples. The proposed scheme is not a simple combination of all the virtual samples and training samples but a weighted fusion of the original training samples, the mirror virtual faces, and the symmetrical virtual faces. It is very important to select a superior combination of weights; the details of the selection process are presented in Section 4.

To verify the effectiveness of our scheme, we apply it to several representation based classification (RBC) methods [25–28]. Some face recognition methods have been shown to be extremely brittle when faced with challenges such as alignment variation or minor occlusions, and many researchers have made efforts to address this brittleness. The recently proposed sparse representation classification (SRC) method [29–31] can obtain a high accuracy in face recognition. In SRC, the test image is represented as a sparse linear combination of the training samples, and then the deviation between the test sample and the representation result of every class is used to perform classification. SRC has demonstrated striking recognition performance in the presence of noise such as occlusion or corruption. The relevant literature reveals that SRC performs well even when face samples are disturbed by 80 percent random noise. In this paper, we conduct contrast experiments on some representative SRC methods, which include the collaborative representation classification (CRC) method [26], an improvement to the nearest neighbor classifier (INNC) method [27], a simple and fast representation based face recognition (SFRFR) method [28], and the two-phase test sample sparse representation (TPTSR) method [9].

The contributions of the paper are as follows.
(1) The used virtual samples are composed of mirror virtual faces and symmetrical virtual faces. The simultaneous use of these two kinds of virtual samples gives a good solution to the problem of insufficient training samples.
(2) The proposed method is not based on a simple combination of all the virtual samples and training samples but on a weighted fusion of the original training samples, the mirror virtual faces, and the symmetrical virtual faces.

The remainder of the paper is organized as follows. Section 2 presents related works. Section 3 describes our proposed scheme. Section 4 shows the experimental results. Section 5 offers the conclusion.

2. Related Work

To verify the validity of our scheme, we will apply it to several representation based classification (RBC) methods [32, 33]. In this section, we present a brief introduction to the RBC method.

Among RBC methods, the sparse representation classification (SRC) was among the earliest proposed. Later, sparse representation methods based on the $l_1$-norm regularization [34, 35], $l_{2,1}$-norm regularization [36, 37], and $l_2$-norm regularization [38, 39] have also been proposed for face recognition. Among them, the $l_2$-norm minimization methods are widely used because of their lower computational complexity and good classification results. For example, Xu et al. [25] proposed a simple yet efficient face classification scheme, namely, collaborative representation based classification (CRC). The $l_2$-regularized CRC method exploits linear equations to represent samples and divides the class-specific representation residual by the $l_2$-norm "sparsity" to get the final classification result. In this paper, we conduct contrast experiments on CRC, INNC, SFRFR, and TPTSR, which are all $l_2$-regularized RBC methods. We take CRC and TPTSR as examples to introduce the basic algorithms of RBC. The INNC method has the same equation and solution scheme as CRC but uses a simpler classifier. The SFRFR method has the same equation and solution scheme as TPTSR but only selects $c$ nearest neighbors, each being from one class, for the test sample. We assume that there are $c$ classes and each class contains $n$ training samples. Let $X = [X_1, X_2, \ldots, X_c]$ denote all training samples, where $X_i = [x_{i1}, \ldots, x_{in}]$ and $x_{ij}$ stands for the $j$th training sample of the $i$th subject. Let $y$ denote the test sample. CRC and TPTSR are summarized as Algorithms 1 and 2, respectively.

Algorithm 1 (CRC).
(1) Normalize the columns of $X$ and $y$ to have unit $l_2$-norm.
(2) Solve $y = X\beta$:
   If $X^T X$ is singular, $\hat{\beta} = (X^T X + \mu I)^{-1} X^T y$, where $\mu$ is a small positive constant and $I$ is the identity matrix.
   If $X^T X$ is not singular, $\hat{\beta} = (X^T X)^{-1} X^T y$.
(3) Compute the regularized residuals
   $r_i = \|y - X_i \hat{\beta}_i\|_2 / \|\hat{\beta}_i\|_2$, $i = 1, \ldots, c$.
(4) Output the identity of $y$ as
   Identity$(y) = \arg\min_i r_i$.
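The steps of Algorithm 1 can be sketched in NumPy as follows. The ridge constant `mu` (the small positive constant of step (2)) and the toy matrix layout are assumptions for illustration; the regularized solution is used unconditionally, which covers the singular case:

```python
import numpy as np

def crc_classify(X, y, labels, mu=0.01):
    """CRC sketch: X is d x N (one training sample per column),
    y is a d-vector, labels holds the class of each column."""
    # Step 1: normalize the columns of X and y to unit l2-norm.
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    y = y / np.linalg.norm(y)
    # Step 2: solve beta = (X^T X + mu I)^{-1} X^T y.
    G = X.T @ X
    beta = np.linalg.solve(G + mu * np.eye(G.shape[0]), X.T @ y)
    # Step 3: regularized residual of each class.
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        idx = labels == c
        r = np.linalg.norm(y - X[:, idx] @ beta[idx]) / np.linalg.norm(beta[idx])
        residuals.append(r)
    # Step 4: the identity is the class with the smallest residual.
    return classes[int(np.argmin(residuals))]
```

A test sample lying close to the span of one class's columns is assigned to that class.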

Algorithm 2 (TPTSR).
(1) Normalize the columns of $X$ and $y$ to have unit $l_2$-norm.
(2) Solve $y = X\beta$:
   If $X^T X$ is singular, $\hat{\beta} = (X^T X + \mu I)^{-1} X^T y$, where $\mu$ is a small positive constant and $I$ is the identity matrix.
   If $X^T X$ is not singular, $\hat{\beta} = (X^T X)^{-1} X^T y$.
(3) Compute the deviation
   $e_j = \|y - \hat{\beta}_j x_j\|_2$, $j = 1, \ldots, N$.
(4) Output the $M$ training samples that have the greatest contributions (the smallest deviations)
   $\tilde{X} = [\tilde{x}_1, \ldots, \tilde{x}_M]$.
(5) Solve $y = \tilde{X}\gamma$:
   If $\tilde{X}^T \tilde{X}$ is singular, $\hat{\gamma} = (\tilde{X}^T \tilde{X} + \mu I)^{-1} \tilde{X}^T y$, where $\mu$ is a small positive constant and $I$ is the identity matrix;
   If $\tilde{X}^T \tilde{X}$ is not singular, $\hat{\gamma} = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T y$.
(6) Compute the deviation
   $D_i = \|y - \sum_{\tilde{x}_j \in \text{class } i} \hat{\gamma}_j \tilde{x}_j\|_2$, $i = 1, \ldots, c$.
(7) Output the identity of $y$ as
   Identity$(y) = \arg\min_i D_i$.
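The two phases of Algorithm 2 can be sketched as follows. The number of retained samples `M` and the ridge constant `mu` are assumed values, and as in the CRC sketch the regularized solution covers the singular case:

```python
import numpy as np

def tptsr_classify(X, y, labels, M=2, mu=0.01):
    """TPTSR sketch: X is d x N, y is a d-vector, labels holds the
    class of each column; M training samples survive phase 1."""
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    y = y / np.linalg.norm(y)
    # Phase 1: represent y with all training samples.
    G = X.T @ X
    beta = np.linalg.solve(G + mu * np.eye(G.shape[0]), X.T @ y)
    # Deviation contributed by each single training sample.
    dev = np.array([np.linalg.norm(y - beta[j] * X[:, j])
                    for j in range(X.shape[1])])
    # Keep the M samples with the greatest contributions
    # (i.e., the M smallest deviations).
    keep = np.argsort(dev)[:M]
    Xs, ls = X[:, keep], labels[keep]
    # Phase 2: represent y with the retained samples only.
    Gs = Xs.T @ Xs
    gamma = np.linalg.solve(Gs + mu * np.eye(Gs.shape[0]), Xs.T @ y)
    classes = np.unique(ls)
    residuals = [np.linalg.norm(y - Xs[:, ls == c] @ gamma[ls == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

Only the classes that survive phase 1 can win in phase 2, which is what makes the second representation "sparse" over classes.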

3. The Proposed Scheme

In this section, we introduce the proposed fused virtual sample augmentation method in detail.

The proposed method uses a simple way to obtain more training samples and to improve the face recognition accuracy. The main steps are shown as follows. The first and second steps generate the mirror training samples and the "symmetrical face" training samples. These samples reflect possible variation of the face. The third step exploits the original training samples for classification. The fourth, fifth, and sixth steps, respectively, exploit the mirror virtual training samples, the left symmetrical virtual training samples, and the right symmetrical virtual training samples for classification. The seventh step uses score fusion for the ultimate face recognition. Suppose that there are $c$ classes and each class has $n$ training samples; we present these steps as follows.

Step 1. Use the original training samples to generate mirror training samples. Let $x_i$ be the $i$th training sample in the form of an image matrix. Let $\bar{x}_i$ stand for the mirror training sample. The mirror virtual sample of an arbitrary original training sample is defined as $\bar{x}_i(p, q) = x_i(p, \mathrm{col} - q + 1)$, $1 \le p \le \mathrm{row}$, $1 \le q \le \mathrm{col}$. row and col stand for the numbers of the rows and columns of $x_i$, respectively. $x_i(p, q)$ denotes the pixel located in the $p$th row and $q$th column of $x_i$. Figure 2 presents several original training samples in the ORL database and their corresponding mirror virtual training samples.
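The mirror definition above is simply a left-right flip of the image matrix, which can be sketched in NumPy as:

```python
import numpy as np

def mirror_face(x):
    """Mirror virtual sample: xb(p, q) = x(p, col - q + 1),
    i.e., the horizontal (left-right) flip of the 2-D image x."""
    return x[:, ::-1].copy()
```

For example, a row `[1, 2, 3]` becomes `[3, 2, 1]`, so the pixel in column $q$ moves to column $\mathrm{col} - q + 1$.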

Step 2. Use the original training samples to generate the left and right symmetrical virtual training samples. Let $x_i^L$ and $x_i^R$ stand for the left and right symmetrical training samples generated from the $i$th training sample, respectively. The left symmetrical virtual image of an arbitrary original training sample is defined as $x_i^L(p, q) = x_i(p, q)$ and $x_i^L(p, \mathrm{col} - q + 1) = x_i(p, q)$, $1 \le p \le \mathrm{row}$, $1 \le q \le \mathrm{col}/2$. The right symmetrical virtual image of an arbitrary original training sample is defined as $x_i^R(p, \mathrm{col} - q + 1) = x_i(p, \mathrm{col} - q + 1)$ and $x_i^R(p, q) = x_i(p, \mathrm{col} - q + 1)$, $1 \le p \le \mathrm{row}$, $1 \le q \le \mathrm{col}/2$. Figure 3 presents several original training samples in the ORL database and their corresponding symmetrical virtual training samples.
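In other words, the left symmetrical face keeps the left half of the image and mirrors it onto the right half, and the right symmetrical face does the opposite. A minimal sketch, assuming an even number of columns for simplicity:

```python
import numpy as np

def symmetrical_faces(x):
    """Return the left and right symmetrical virtual samples of
    the 2-D image x (even column count assumed)."""
    rows, cols = x.shape
    h = cols // 2
    left = x.copy()
    left[:, cols - h:] = x[:, :h][:, ::-1]   # mirror left half onto the right
    right = x.copy()
    right[:, :h] = x[:, cols - h:][:, ::-1]  # mirror right half onto the left
    return left, right
```

Each virtual face is perfectly bilaterally symmetric, so it differs from the original image while still resembling a plausible appearance of the same face.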

Step 3. Use the original training samples to perform the representation based method. Let $s_i^{(1)}$ denote the score of the test sample with respect to the $i$th class.

Step 4. Use the corresponding mirror virtual training samples to perform the representation based method. Let $s_i^{(2)}$ denote the score of the test sample with respect to the $i$th class.

Step 5. Use the corresponding left symmetrical virtual training samples to perform the representation based method. Let $s_i^{(3)}$ denote the score of the test sample with respect to the $i$th class.

Step 6. Use the corresponding right symmetrical virtual training samples to perform the representation based method. Let $s_i^{(4)}$ denote the score of the test sample with respect to the $i$th class.

Step 7. Combine all scores obtained in Steps 3, 4, 5, and 6 to conduct weighted score level fusion. For each test sample, we use $s_i = w_1 s_i^{(1)} + w_2 s_i^{(2)} + w_3 s_i^{(3)} + w_4 s_i^{(4)}$ to calculate the ultimate score of the test sample with respect to the $i$th class. $w_1$, $w_2$, $w_3$, and $w_4$ represent the weights. Let $w_1 + w_2 + w_3 + w_4 = 1$ and $w_3 = w_4$. Figure 4 shows the main steps of the fused virtual sample technique for face recognition.
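The fusion of Step 7 can be sketched as follows, assuming (as with the representation residuals above) that a smaller score means a better match, so the fused identity is the class with the minimum fused score:

```python
import numpy as np

def fused_classify(scores, weights):
    """Weighted score-level fusion sketch.
    scores: 4 x c array; row k holds the class-wise scores from the
    k-th training set (original, mirror, left/right symmetrical).
    weights: the four fusion weights w1..w4 (assumed to sum to 1)."""
    w = np.asarray(weights).reshape(-1, 1)
    fused = (w * np.asarray(scores)).sum(axis=0)  # s_i = sum_k w_k * s_i^(k)
    return int(np.argmin(fused))                  # identity = arg min_i s_i
```

Because the scores are fused rather than the training sets themselves, each of the four representations keeps its own influence, which the weights then regulate.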

4. Experimental Results

In this section, we evaluate the performance of our proposed scheme for face recognition and provide a comprehensive analysis of the experimental results. In addition, we explain the selection of weights in detail and show the experimental comparison of different combinations of weights.

4.1. Data Sets

We conduct a number of experiments on the ORL [40], Yale [41], FERET [42], and LFW [43] databases. Figure 5 shows some original training samples from the ORL face database. Figure 6 shows some original training samples from the Yale face database. Figure 7 shows some original samples from the FERET face database. Figure 8 shows some original samples from the LFW face database. The databases are introduced as follows.

ORL. This data set contains 400 face images from 40 subjects, each providing ten images. The ORL database includes variations in facial expression (smiling/not smiling, open/closed eyes) and facial details. Each of the face images contains 32 × 32 pixels. The first 1, 2, and 3 images per person were selected for training and the remaining for testing.

Yale. This data set contains 15 subjects, and each subject has 11 images. The Yale database includes variations in facial expression and facial details. Each of the face images contains 100 × 100 pixels. The first 1, 2, and 3 images per person were selected for training and the remaining for testing.

FERET. This data set contains 200 subjects, and each subject has 7 images. Each face sample contains 40 × 40 pixels. FERET mainly includes change of illumination and change of expression. The first 1, 2, and 3 images per person were selected for training and the remaining for testing.

LFW. This data set contains more than 13,000 images of faces collected from the web. Each face has been labeled with the name of the people pictured. 1680 of the people pictured have two or more distinct photos in the database. In the experiments, 860 images from 86 people were chosen. Each person has 10 images. The first 1, 2, and 3 images per person were selected for training and the remaining for testing.

4.2. Experimental Results and Analysis

Actually, at the beginning of studying this topic, we tried to use both the original training samples and their virtual training samples directly for recognition, but the results were not satisfactory. The basic steps of the method that combines the original training samples and the generated virtual images into a new training sample set for face recognition (COVNFR) are shown in Figure 9.

We use the collaborative representation based classification (CRC) method to conduct the experiments on the ORL, Yale, FERET, and LFW databases. The experimental results are shown in Table 1.

From Table 1, we find that the COVNFR method has higher classification error rates than the original face recognition methods. The reasons why the COVNFR method cannot achieve satisfactory results are the following: when the virtual training samples alone are used for face recognition, the classification error rates are high, which means that the virtual training samples cannot inherit all of the features of the original samples. However, with a limited number of training samples, the original training samples alone cannot cover comprehensive features, while the virtual training samples do inherit some features from the original training samples. Therefore, using a weighted combination of the original training samples and their virtual training samples not only adds feature information but also regulates the effect of each part of the training samples.

Our scheme is based on a weighted fusion of the original training samples, the mirror virtual faces, and the symmetrical virtual faces, and it can be applied to many face recognition methods. In this paper, we conduct contrast experiments on some representative RBC methods, which include the collaborative representation classification (CRC) method [24], an improvement to the nearest neighbor classifier (INNC) method [25], a simple and fast representation based face recognition (SFRFR) method [26], and the two-phase test sample sparse representation (TPTSR) method [27]. In Tables 2, 3, 4, and 5, we compare the RBC methods under different ways of processing the training samples. In these tables, "CRC-original" represents the original CRC method; "CRC-mirror face" represents integrating the original face images and their mirror samples to perform the CRC method; "CRC-symmetrical face" represents integrating the original face images and their symmetrical samples to perform the CRC method; "CRC-the proposed scheme" represents integrating the original face images and their fused virtual samples to perform the CRC method. As can be seen from these four tables, the proposed scheme performs well compared to all the other methods. The LFW database contains more complex variations in the face samples than the other databases. However, the proposed method still achieves a satisfactory performance.

4.3. The Weighted Fusion

In this paper, the proposed method uses the original training samples to generate the mirror training samples and the left and right symmetrical virtual training samples. Then the original training samples, the mirror training samples, and the left and right symmetrical virtual training samples are used to perform the representation based method, respectively, and the corresponding scores of the test sample with respect to the $i$th class are denoted by $s_i^{(1)}$, $s_i^{(2)}$, $s_i^{(3)}$, and $s_i^{(4)}$. All scores are combined to conduct weighted score level fusion. For each test sample, we use $s_i = w_1 s_i^{(1)} + w_2 s_i^{(2)} + w_3 s_i^{(3)} + w_4 s_i^{(4)}$ to calculate the ultimate score with respect to the $i$th class. $w_1$, $w_2$, $w_3$, and $w_4$ represent the weights. Let $w_1 + w_2 + w_3 + w_4 = 1$ and $w_3 = w_4$. Table 6 shows that different combinations of weights lead to different classification accuracies. However, different data sets may have very different optimal combinations of weights. Therefore, it is important to select a proper group of weights.

To obtain the optimal classification accuracy, we should select the optimal combination of weights for each database or even for each number of training samples. In this paper, we use cross validation to choose the best combination of weights, and some recognition rates thereby improve. Cross validation is suitable for the proposed method because different conditions call for different combinations of weights. The weights were selected from predefined candidate sets subject to $w_1 + w_2 + w_3 + w_4 = 1$ and $w_3 = w_4$. Finally, we select the best combination among all candidates to perform face recognition. The cross validation procedure was also used to select the optimal parameters for RBC-mirror face, RBC-symmetrical face, and so on. We compare our method, RBC-the proposed scheme, with RBC, RBC-mirror face, and RBC-symmetrical face under the same conditions. In this paper, the RBC methods include CRC, INNC, SFRFR, and TPTSR. In particular, all the methods used in the experimental comparison have the same training samples and test samples, as well as the same preprocessing.
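The weight selection described above can be sketched as a grid search over a validation split. The candidate set, the tolerance, and the equal-weight constraint on the two symmetrical faces are illustrative assumptions, not the paper's exact protocol:

```python
import itertools
import numpy as np

def select_weights(score_sets, val_labels, candidates=(0.1, 0.2, 0.3, 0.4)):
    """Grid-search fusion weights on a validation split (sketch).
    score_sets: 4 x n_val x c array of class-wise scores produced by
    the four training sets; val_labels: true labels (smaller score
    means a better match)."""
    score_sets = np.asarray(score_sets)
    val_labels = np.asarray(val_labels)
    best, best_acc = None, -1.0
    for w1, w2, w3 in itertools.product(candidates, repeat=3):
        w4 = w3                              # weight left/right symmetry equally
        if abs(w1 + w2 + w3 + w4 - 1.0) > 1e-9:
            continue                         # keep only weights summing to 1
        w = np.array([w1, w2, w3, w4]).reshape(4, 1, 1)
        fused = (w * score_sets).sum(axis=0)             # n_val x c
        acc = np.mean(np.argmin(fused, axis=1) == val_labels)
        if acc > best_acc:
            best, best_acc = (w1, w2, w3, w4), acc
    return best, best_acc
```

The winning weight tuple is then fixed and applied to the test samples, so the test set never influences the weight choice.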

5. Conclusion

In this paper, the proposed method integrates the original face images and their two kinds of virtual samples to perform face recognition. The scheme first exploits the original training samples to generate the mirror virtual training samples and the symmetrical virtual training samples. Then it uses the original, mirror, and symmetrical virtual training samples to represent the test sample by linear combinations of the training samples. The recognition rate can be improved by adjusting the weights of the original and virtual training samples according to the actual situation. Therefore, the proposed scheme can enhance robustness and improve recognition rates, and it performs well in overcoming the problem of insufficient training samples. The experimental results show that the proposed method outperforms the original RBC methods, the mirror virtual sample augmentation RBC methods, and the symmetrical virtual sample augmentation RBC methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.