Abstract

In recent years, sparse representation based classification (SRC) has emerged as a popular technique in face recognition. Traditional SRC focuses on the role of the $\ell_1$-norm but ignores the impact of collaborative representation (CR), which employs all the training examples over all the classes to represent a test sample. Owing to variations in expression, illumination, and pose, and to small sample sizes, face recognition remains a challenging problem. In this paper, we propose a patch based collaborative representation method for face recognition that uses the Gabor feature and a measurement matrix. Patch based collaborative representation addresses the inaccuracy of the linear representation when the sample size is small. Compared with holistic features, the multiscale and multidirection Gabor feature is more robust. The measurement matrix reduces the large data volume produced by the Gabor feature. Experimental results on several popular face databases, including Extended Yale B, CMU_PIE, and LFW, indicate that the proposed method is more robust and accurate than conventional SR and CR based methods.

1. Introduction

Face recognition (FR) is one of the most classical and challenging problems in pattern classification, computer vision, and machine learning [1]. Although face recognition technology has made a series of achievements, it still confronts many challenges caused by variations of illumination, pose, facial expression, and noise in the real world [2, 3]. In real applications, the small sample size problem makes FR even more difficult, because the availability of training samples is limited.

In terms of classification schemes, several widespread pattern classification methods are used in FR. Generally, there are two types of pattern classification methods [4, 5]: parametric methods and nonparametric methods. Parametric methods such as the support vector machine (SVM) [6, 7] center on how to learn the parameters of a hypothesis classification model from the training samples and then use them to identify the class labels of test samples. In contrast, nonparametric methods, such as nearest neighbor (NN) [8] and nearest subspace (NS) [9], use the training samples directly to identify the class labels of test samples. Recent works have revealed an advantage of the nonparametric methods over the parametric methods [4, 10, 11]. Distance based classifiers, such as the nearest subspace classifier (NSC) [12], are widely used among nonparametric methods for FR [11]. A key issue in distance based nonparametric classifiers is how to represent the test sample [4]. Wright et al. pioneered the use of SRC for robust FR [13]: the test sample is first sparsely coded over the training samples, and its class label is then identified as the class that yields the smallest coding error. Although sparse representation related methods [13, 14] have achieved great success in FR, they focus on the role of the $\ell_1$-norm while ignoring the role of CR [15], which uses all classes of training samples to represent the target test sample. In [16], Zhu et al. argued that both SRC and the collaborative representation based classifier (CRC) suffer serious performance degradation when the training sample size is very small, because the test sample cannot be well represented. To solve the small sample size problem, they proposed to conduct CRC on patches and named the method patch based CRC (PCRC).

The PCRC and some related works [17, 18] have demonstrated their effectiveness on the small sample size problem of FR; however, some key issues remain to be further optimized. On one hand, all the PCRC related works used the original face feature, which cannot effectively handle variations of illumination, pose, facial expression, and noise [19]. On the other hand, the data redundancy in these methods degrades classification accuracy and increases computational cost. The face feature problem has been addressed by some recent works that propose efficient and effective image representations using local and holistic features. Eigenface [20, 21], Randomface [22], and Fisherface [21] are all classical holistic features [23], but other works argued that holistic features are easily affected by variables such as illumination, pose, facial expression, and noise. Therefore, they introduced local features such as LBP [24] and the Gabor filter [25, 26]. The Gabor filter has been successfully and widely used in FR [26, 27]. The Gabor feature effectively extracts local face features at multiple scales and multiple directions; however, this may lead to a sharp rise in data volume [28]. The key to solving the large data volume problem is dimensionality reduction. Numerous dimensionality reduction methods have been put forward to find projections that better separate the classes in low-dimensional spaces. Among them, linear subspace analysis methods, including principal component analysis (PCA) [28], linear discriminant analysis (LDA) [29], and independent component analysis (ICA) [30], have received more and more attention owing to their good properties. Many works showed that PCA performs best in FR [31–33].

In this paper, we first attempt to alleviate the influence of unreliable environments on small sample size FR; to this end, we propose to extract the Gabor feature and apply it to PCRC, which yields the Gabor feature based PCRC (GPCRC). Then, to improve the computational efficiency of GPCRC, we propose to use PCA for dimension reduction and then to use a measurement matrix, including Random Gaussian matrices [34], Toeplitz and cyclic matrices [35], and Deterministic sparse Toeplitz matrices () [36], to reduce the dimension of the transformed signal, so that low-dimensional data accurately represent the face. The experimental results show that GPCRC and its improved methods are effective.

Section 2 briefly reviews SRC and CRC. Section 3 describes the proposed GPCRC and its improved methods. Section 4 presents the experiments and the results, and Section 5 concludes the paper.

2.1. Sparse Representation Based Classification

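SRC was first reported by Wright et al. [13] for robust FR. In SRC, let $X_i \in \mathbb{R}^{m \times n_i}$ denote the face dataset of the $i$-th class, where each column of $X_i$ is a face sample of the $i$-th individual. Assuming there are $K$ classes of face samples, let $X = [X_1, X_2, \ldots, X_K]$. When identifying a target face test sample $y$, $X$ is used for coding, $y \approx X\alpha$, where $\alpha = [\alpha_1; \alpha_2; \ldots; \alpha_K]$ and the coefficient vector $\alpha_i$ is the encoding associated with the $i$-th individual. If $y$ is from the $i$-th class, then $y \approx X_i\alpha_i$ is the best representation, which means that most of the coefficients in $\alpha_k$ ($k \neq i$) are close to zero; only $\alpha_i$ remains significant. Thus, the class (ID) of the target face test sample can be decoded from the sparse nonzero coefficients in $\alpha$.

The SRC method [13] is summarized as follows.

Step 1. Given the $K$-class face training samples $X$ and a test sample $y$.

Step 2 (dimension adjustment). $X$ and $y$ are projected onto the corresponding low-dimensional feature space, giving $\tilde{X}$ and $\tilde{y}$, using a traditional dimensionality reduction technique (PCA).

Step 3 ($\ell_2$-norm). Normalize the columns of $\tilde{X}$ and $\tilde{y}$ to unit $\ell_2$-norm.

Step 4. Solve the $\ell_1$-minimization problem:
$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad \tilde{X}\alpha = \tilde{y} \ \ \text{or} \ \ \|\tilde{X}\alpha - \tilde{y}\|_2 \le \varepsilon.$$

Step 5. Compute the residuals and identify the class:
$$r_i(\tilde{y}) = \|\tilde{y} - \tilde{X}_i\hat{\alpha}_i\|_2, \qquad i = 1, \ldots, K, \qquad \mathrm{identity}(y) = \arg\min_i r_i(\tilde{y}).$$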

2.2. Collaborative Representation Based Classification

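Zhang et al. [15] argued that it is the CR but not the $\ell_1$-norm sparsity that makes SRC powerful for face classification. Collaborative representation uses all classes (individuals) of training samples to represent the target test sample. To reduce the computational cost of $\ell_1$-norm coding of a test sample $y$ over the dictionary $X = [X_1, \ldots, X_K]$ of all $K$ classes, Zhang et al. proposed a regularized least-squares method:
$$\hat{\rho} = \arg\min_{\rho}\left\{\|y - X\rho\|_2^2 + \lambda\|\rho\|_2^2\right\}, \tag{1}$$
where $\lambda$ is a regularization parameter. The regularization term in (1) has dual roles: first, it stabilizes the least-squares solution; second, it introduces a “sparsity” on $\hat{\rho}$ that is much weaker than that of the $\ell_1$-norm. The CR with regularized least squares in (1) has the closed-form solution
$$\hat{\rho} = (X^{T}X + \lambda I)^{-1}X^{T}y. \tag{2}$$
Let $P = (X^{T}X + \lambda I)^{-1}X^{T}$. Since $P$ does not depend on $y$, it can be precalculated as a projection matrix. When a target test sample $y$ comes in to be identified, it is simply projected by $P$ via $Py$, which makes CR very fast. The classification by $\hat{\rho}$ is very similar to the classification by the sparse coefficient vector in the SRC method. In addition to the class-specific representation residual $\|y - X_i\hat{\rho}_i\|_2$, where $\hat{\rho}_i$ is the coefficient vector associated with class $i$, the $\ell_2$-norm “sparsity” $\|\hat{\rho}_i\|_2$ also contains abundant information for classification. The CRC method [15] is summarized as follows.

Step 1. Given $K$ classes of face training samples $X$ and a test sample $y$.

Step 2 (reduce the dimension). Use PCA to reduce $X$ and $y$ to a low-dimensional feature space and obtain $\tilde{X}$ and $\tilde{y}$.

Step 3 (normalization). Normalize the columns of $\tilde{X}$ and $\tilde{y}$ to unit $\ell_2$-norm.

Step 4. Encode $\tilde{y}$ on $\tilde{X}$:
$$\hat{\rho} = P\tilde{y}, \qquad P = (\tilde{X}^{T}\tilde{X} + \lambda I)^{-1}\tilde{X}^{T}.$$

Step 5. Compute the regularized residuals for identification:
$$r_i = \frac{\|\tilde{y} - \tilde{X}_i\hat{\rho}_i\|_2}{\|\hat{\rho}_i\|_2}; \qquad i = 1, \ldots, K, \qquad \mathrm{identity}(y) = \arg\min_i r_i.$$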
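The CRC steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' MATLAB implementation: the function names `crc_fit`, `crc_predict`, and `unit_columns` are hypothetical, the PCA step is omitted, and the toy data are synthetic. The default regularization value 0.005 is the CRC setting quoted in Section 4.

```python
import numpy as np

def unit_columns(A):
    """Scale every column of A to unit l2-norm."""
    return A / np.linalg.norm(A, axis=0, keepdims=True)

def crc_fit(X, lam=0.005):
    """Precompute the projection matrix (X^T X + lam*I)^{-1} X^T."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T)

def crc_predict(P, X, labels, y):
    """Code y with the precomputed projection and return the class
    with the smallest regularized residual."""
    rho = P @ y
    best_class, best_score = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        residual = np.linalg.norm(y - X[:, mask] @ rho[mask])
        score = residual / (np.linalg.norm(rho[mask]) + 1e-12)
        if score < best_score:
            best_class, best_score = int(c), score
    return best_class

# Synthetic toy problem: 3 classes, 4 noisy training samples per class.
rng = np.random.default_rng(0)
centers = rng.normal(size=(50, 3))
labels = np.repeat(np.arange(3), 4)
X = unit_columns(centers[:, labels] + 0.05 * rng.normal(size=(50, 12)))
y = centers[:, 1] + 0.05 * rng.normal(size=50)
y = y / np.linalg.norm(y)

P = crc_fit(X)
predicted = crc_predict(P, X, labels, y)
```

Because the projection matrix depends only on the training data, `crc_fit` runs once and each test sample then costs a single matrix-vector product plus the per-class residuals.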

2.3. Patch Based Collaborative Representation

From the formulations of sparse representation and collaborative representation, it can be seen that if the linear system determined by the training dictionary $X$ is underdetermined, the linear representation of the target test sample $y$ over $X$ can be very accurate. In reality, however, the available samples of each subject are limited, so sparse representation and collaborative representation may fail because the linear representation of the target test sample is not accurate enough. To alleviate this problem, Zhu et al. proposed the PCRC method for FR [16]. As shown in Figure 1, the target face test sample $y$ is divided into a set of overlapping face image patches according to the patch size. Each of these patches is collaboratively represented over the local dictionary extracted from $X$ at the corresponding patch position. In this case, since the linear system determined by the local dictionary is often underdetermined, the patch based representation is more accurate than the representation of the whole face image.
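The patch division described above can be illustrated as follows. This is a sketch under assumptions: the helper names `extract_patches` and `local_dictionary`, the square patch shape, the step of 8 pixels, and the toy 32×32 images are all hypothetical choices made for illustration.

```python
import numpy as np

def extract_patches(image, patch_size, step):
    """Return overlapping square patches (one per row, flattened)
    together with their top-left positions."""
    h, w = image.shape
    patches, positions = [], []
    for r in range(0, h - patch_size + 1, step):
        for c in range(0, w - patch_size + 1, step):
            patches.append(image[r:r + patch_size, c:c + patch_size].ravel())
            positions.append((r, c))
    return np.array(patches), positions

def local_dictionary(train_images, position, patch_size):
    """Stack the patch at one position from every training image
    as the columns of a local dictionary."""
    r, c = position
    cols = [img[r:r + patch_size, c:c + patch_size].ravel()
            for img in train_images]
    return np.stack(cols, axis=1)

# Toy data: one 32x32 "test image" and three "training images".
image = np.arange(32 * 32, dtype=float).reshape(32, 32)
train_images = [image, image + 1.0, image * 2.0]

patches, positions = extract_patches(image, patch_size=16, step=8)
D = local_dictionary(train_images, positions[0], patch_size=16)
```

Each local dictionary has as many rows as a patch has pixels (here 256) but only as many columns as there are training images, which is the situation the patch based representation is meant to exploit.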

3. Patch Based Collaborative Representation Using Gabor Feature and Measurement Matrix for Face Recognition

3.1. Gabor Feature

The Gabor feature has been widely used in FR because it is more robust to illumination, expression, and pose than holistic features. Yang and Zhang applied a multiscale and multidirectional Gabor dictionary to SRC for FR [19], which further improved the robustness of the algorithm. Inspired by these works, we integrate the Gabor feature into the PCRC framework to improve its robustness.

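A Gabor filter with multiple directions $\mu$ and multiple scales $\nu$ is defined as follows [25, 26]:
$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2}\exp\left(-\frac{\|k_{\mu,\nu}\|^2\|z\|^2}{2\sigma^2}\right)\left[\exp\left(i\,k_{\mu,\nu}z\right) - \exp\left(-\frac{\sigma^2}{2}\right)\right], \tag{3}$$
where $z = (x, y)$ denotes the coordinates of the pixel, the wave vector is $k_{\mu,\nu} = k_\nu e^{i\phi_\mu}$ with $k_\nu = k_{\max}/f^{\nu}$ and $\phi_\mu = \pi\mu/8$, $k_{\max}$ is the maximum frequency, and $f$ is the spacing factor between kernels in the frequency domain. The bandwidth of the filter is determined by $\sigma$. The convolution of the target image Img and the wavelet kernel $\psi_{\mu,\nu}$ is expressed as
$$G_{\mu,\nu}(z) = \mathrm{Img}(z) * \psi_{\mu,\nu}(z) = M_{\mu,\nu}(z)\exp\left(i\,\theta_{\mu,\nu}(z)\right), \tag{4}$$
where $M_{\mu,\nu}(z)$ is the amplitude of the Gabor response and $\theta_{\mu,\nu}(z)$ is its phase; the local energy change in the image is expressed by the amplitude information. Because the Gabor phase changes periodically with spatial position while the amplitude is relatively smooth and stable [19, 25, 26], only the Gabor magnitude is used in this paper, as illustrated in Figure 2.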
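A minimal NumPy sketch of Gabor magnitude extraction is given below, using the standard kernel form from the Gabor face recognition literature. Every numeric choice is an assumption made for illustration and is not taken from this paper: a maximum frequency of π/2, a spacing factor of √2, σ = π, 8 directions, 5 scales, a 31×31 kernel window, and a 16×16 input patch. The function names are hypothetical.

```python
import numpy as np

def gabor_kernel(mu, nu, size=31, k_max=np.pi / 2, f=np.sqrt(2), sigma=np.pi):
    """Complex Gabor kernel for direction index mu and scale index nu."""
    k = (k_max / f**nu) * np.exp(1j * np.pi * mu / 8)   # wave vector as a complex number
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    k_sq = np.abs(k) ** 2
    z_sq = xs**2 + ys**2
    envelope = (k_sq / sigma**2) * np.exp(-k_sq * z_sq / (2 * sigma**2))
    # Oscillating carrier minus the DC-compensation term.
    carrier = np.exp(1j * (k.real * xs + k.imag * ys)) - np.exp(-sigma**2 / 2)
    return envelope * carrier

def gabor_magnitude(image, mu, nu, **kwargs):
    """Magnitude of the 'same'-size convolution of image with one kernel (via FFT)."""
    kernel = gabor_kernel(mu, nu, **kwargs)
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)          # zero-pad for linear convolution
    full = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    top, left = kh // 2, kw // 2
    return np.abs(full[top:top + h, left:left + w])

def gabor_features(image, n_orient=8, n_scale=5):
    """Concatenate the magnitude responses over all scales and directions."""
    mags = [gabor_magnitude(image, mu, nu)
            for nu in range(n_scale) for mu in range(n_orient)]
    return np.concatenate([m.ravel() for m in mags])

patch = np.random.default_rng(0).random((16, 16))
features = gabor_features(patch)
```

With these assumed settings a 16×16 patch yields 40 × 256 = 10240 magnitude values, which illustrates how quickly the Gabor feature inflates the data volume.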

3.2. Measurement Matrix

Unfortunately, although the Gabor feature enhances the robustness of face image representation, it gives the training sets a higher dimension than the holistic feature does. In other words, the computation cost and computation time are increased. To solve the problem caused by the higher dimension of the training sets, a further dimension reduction is necessary. We proposed to use PCA [31–33] for our method. The steps of PCA are as follows. Assume $N$ sample images $x_i \in \mathbb{R}^{d}$, $i = 1, \ldots, N$. Firstly, normalize each sample (subtract the mean, and then divide by the standard deviation) to convert it into a vector $z_i$ that accords with the normal distribution $N(0, 1)$. Secondly, compute the eigenvectors of the covariance matrix $C = ZZ^{T}$, $Z = [z_1, \ldots, z_N]$: $Cu_j = \lambda_j u_j$, where $u_j$ is the eigenvector corresponding to the eigenvalue $\lambda_j$. Thirdly, the eigenvectors are sorted according to the size of the eigenvalues; the first $k$ eigenvectors are extracted to form a linear transformation matrix $W = [u_1, \ldots, u_k]$. We can use $W$ to reduce the dimension. However, if $d \gg N$, the dimension of $C$ would be very large. To solve this problem, singular value decomposition is usually used: the eigenvalues and eigenvectors of $ZZ^{T}$ are obtained by calculating the eigenvalues and eigenvectors of $Z^{T}Z$ (the first $k$): $Z^{T}Zv_j = \lambda_j v_j$, $u_j = Zv_j/\sqrt{\lambda_j}$.
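The small-sample PCA computation described above can be sketched as follows. This is an illustrative NumPy version under assumptions: the function name `pca_small_sample` is hypothetical, the data are synthetic, and for brevity the sketch only centers the data (no per-feature scaling). It obtains the leading eigenvectors of the large covariance matrix from the much smaller Gram matrix.

```python
import numpy as np

def pca_small_sample(samples, k):
    """PCA for d >> N data.

    samples: (d, N) array with one sample per column.
    Returns the mean (d, 1) and a projection matrix W (d, k).
    k should not exceed N - 1, because centering removes one degree of freedom.
    """
    mean = samples.mean(axis=1, keepdims=True)
    Z = samples - mean
    gram = Z.T @ Z                           # N x N instead of d x d
    eigvals, V = np.linalg.eigh(gram)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # keep the k largest
    eigvals, V = eigvals[order], V[:, order]
    W = Z @ V / np.sqrt(eigvals)             # map back to unit-norm eigenvectors of Z Z^T
    return mean, W

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 10))         # d = 500 features, N = 10 samples
mean, W = pca_small_sample(samples, k=5)
projected = W.T @ (samples - mean)           # reduced representation, shape (5, 10)
```

The sketch also makes the sample-size limitation visible: at most N − 1 components carry nonzero variance, so the reduced dimension cannot exceed the number of training samples.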

In summary, PCA and its related algorithms have two obvious shortcomings [37, 38]. Firstly, the leading eigenvectors encode mostly illumination and expression rather than discriminating information. Secondly, in practice the amount of computation is very large, and the method fails when only a small number of samples is available.

Inspired by the use of random faces for feature extraction in [38], we used the Random Gaussian matrices () as a measurement matrix to measure face images. The measurement matrix is used to measure the redundant dictionary to obtain , where . In Figure 3(a), for any test image , the measurements are obtained by . In essence, using the measurement matrix to reduce the dimension of the image differs from the sparse representation theory: the dimension of the measurements is determined by the measurement matrix and is not limited by the number of training samples.

However, some studies [35, 39] pointed out that Random Gaussian matrices are nondeterministic, which limits their practical application, and Toeplitz and cyclic matrices were proposed for signal reconstruction instead. Toeplitz and cyclic matrices generate all of their rows by shifting a single row vector. Usually, the entries of this vector take the values $\pm 1$, and each element independently follows a Bernoulli distribution. Therefore, these matrices are easy to implement in hardware in practical applications. Based on the above analysis, we further used Toeplitz and cyclic matrices and their improved variant (namely, the Deterministic sparse Toeplitz matrices [36]) in our method. The measurement matrices used in this paper are described in detail below. The operating mechanism of each measurement matrix is shown in Figure 3.

3.2.1. Random Gaussian Matrices

The format of Random Gaussian matrices is expressed as follows [34, 38]: each element independently follows a Gaussian distribution whose mean is 0 and variance is .
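A random Gaussian measurement can be sketched as below. The variance 1/M (with M the number of measurements) is a common normalization in compressed sensing and is an assumption here, as are the function name `gaussian_measurement_matrix` and the toy sizes.

```python
import numpy as np

def gaussian_measurement_matrix(m, n, rng):
    """M x N matrix with i.i.d. zero-mean Gaussian entries.
    The variance 1/M is an assumed normalization."""
    return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

rng = np.random.default_rng(0)
n, m = 2048, 256                              # original and reduced dimensions (toy values)
Phi = gaussian_measurement_matrix(m, n, rng)

dictionary = rng.normal(size=(n, 20))         # 20 synthetic training vectors
test_vec = rng.normal(size=n)

measured_dict = Phi @ dictionary              # measured dictionary, shape (m, 20)
measured_test = Phi @ test_vec                # measured test vector, shape (m,)

# With variance 1/M the projection approximately preserves vector norms.
ratio = np.linalg.norm(measured_test) / np.linalg.norm(test_vec)
```

Unlike PCA, the reduced dimension here is chosen freely through the number of rows of the matrix and does not depend on how many training samples are available.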

3.2.2. Toeplitz and Cyclic Matrices

The concrete form of Toeplitz and cyclic matrices [35, 39] is presented below: Equation (6) is the Toeplitz matrix, in which the elements along each diagonal are constant. If , the additional condition of (6) is satisfied; the matrix then becomes a cyclic matrix, and its elements follow a certain probability distribution .
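The two structures can be illustrated with a short NumPy sketch. The ±1 Bernoulli entries, the absence of any scaling factor, and the function names are assumptions for illustration only.

```python
import numpy as np

def toeplitz_measurement_matrix(m, n, rng):
    """M x N Toeplitz matrix generated from m + n - 1 independent +/-1 entries.
    Entry (i, j) depends only on i - j, so every diagonal is constant."""
    t = rng.choice([-1.0, 1.0], size=m + n - 1)
    rows = np.arange(m)[:, None]
    cols = np.arange(n)[None, :]
    return t[rows - cols + n - 1]

def cyclic_measurement_matrix(m, n, rng):
    """First M rows of an N x N cyclic (circulant) matrix:
    row i is row 0 cyclically shifted by i positions."""
    t = rng.choice([-1.0, 1.0], size=n)
    rows = np.arange(m)[:, None]
    cols = np.arange(n)[None, :]
    return t[(cols - rows) % n]

rng = np.random.default_rng(0)
T = toeplitz_measurement_matrix(8, 32, rng)
C = cyclic_measurement_matrix(8, 32, rng)
```

Only m + n − 1 (Toeplitz) or n (cyclic) independent values need to be generated and stored, instead of m × n for a fully random matrix, which is what makes these structures attractive for hardware.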

3.2.3. Deterministic Sparse Toeplitz Matrices

The construction of Deterministic sparse Toeplitz matrices [36] is based on the Toeplitz and cyclic matrices; it is illustrated in this paper by the example of the randomly spaced Toeplitz matrix with an interval of . The independent elements in the first row and the first column of (6) constitute the vector : Applying random sparse spacing to (7), one can see that contains all the independent elements in (8). Assignment operations are then performed on , where the elements (, is the indexes randomly chosen from the index sequence ) obey the independent and identically distributed (i.i.d.) Random Gaussian distribution, while the other elements are 0. Finally, we obtain the Deterministic sparse Toeplitz matrices according to the construction of the Toeplitz and cyclic matrices.
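One possible reading of this construction is sketched below; it is an interpretation, not the construction of [36] itself. The generating vector of the Toeplitz matrix is split into consecutive blocks whose length equals the interval, one randomly chosen position in each block receives an i.i.d. Gaussian value, and all other positions stay zero. The function name and the toy sizes are hypothetical.

```python
import numpy as np

def sparse_toeplitz_matrix(m, n, interval, rng):
    """Toeplitz matrix whose generating vector has one nonzero
    Gaussian entry per block of `interval` consecutive positions."""
    length = m + n - 1
    t = np.zeros(length)
    for start in range(0, length, interval):
        stop = min(start + interval, length)
        idx = rng.integers(start, stop)       # random position inside the block
        t[idx] = rng.normal()
    rows = np.arange(m)[:, None]
    cols = np.arange(n)[None, :]
    return t[rows - cols + n - 1], t

rng = np.random.default_rng(0)
Phi_sparse, generator = sparse_toeplitz_matrix(8, 32, interval=4, rng=rng)
```

A larger interval leaves fewer nonzero entries, so the matrix becomes cheaper to store and apply, at the price of fewer independent random values.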

3.3. The Proposed Face Recognition Approach

Although PCRC can indeed alleviate the small sample size problem, it is still based on the original feature of the patch, and its robustness and accuracy can be improved. Based on the above analysis, the Gabor feature and the measurement matrix are integrated into PCRC for FR, which not only addresses the small sample size problem but also enhances the robustness and efficiency of the method.

Our proposed method for FR is summarized as follows.

Step 1 (input). Face training sample and test sample .

Step 2 (patch). Divide the face training samples into patches and divide the test sample into patches, where and are the positions corresponding to the sample patches:

Step 3 (extract and measure features). (1) Extract the Gabor feature from the patches of the training samples and the test sample, respectively, to obtain
(2) Carry out the measurement and reduce the dimension of each patch () using the measurement matrix , to obtain the measured signal:

Step 4. Collaboratively represent each patch measurement signal of the test sample over the local dictionary of training sample measurement signals :
Equation (12) can be easily solved as follows:
where is the unit matrix.

Step 5. The recognition result of patch is

Step 6. Use voting to obtain the final recognition result.
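The patch-level coding and voting in Steps 3(2) to 6 can be sketched as follows. This is a simplified illustration, not the authors' implementation: the per-patch features are assumed to be precomputed (random vectors stand in for Gabor features), the measurement matrix is random Gaussian with an assumed variance of 1/M, the regularization default 0.001 is a placeholder, and the names `classify_patch` and `gpcrc_predict` are hypothetical.

```python
import numpy as np
from collections import Counter

def classify_patch(D, labels, y, lam=0.001):
    """Patch-level CRC: regularized least-squares coding, then the class
    with the smallest regularized residual."""
    D = D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)
    y = y / (np.linalg.norm(y) + 1e-12)
    rho = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)
    scores = {}
    for c in np.unique(labels):
        mask = labels == c
        residual = np.linalg.norm(y - D[:, mask] @ rho[mask])
        scores[int(c)] = residual / (np.linalg.norm(rho[mask]) + 1e-12)
    return min(scores, key=scores.get)

def gpcrc_predict(train_patch_feats, labels, test_patch_feats, Phi, lam=0.001):
    """train_patch_feats[j]: (d, n_train) features of patch j over all training samples.
    test_patch_feats[j]: (d,) features of patch j of the test sample.
    Each patch is measured with Phi, classified, and the votes are counted."""
    votes = [classify_patch(Phi @ D_j, labels, Phi @ y_j, lam)
             for D_j, y_j in zip(train_patch_feats, test_patch_feats)]
    return Counter(votes).most_common(1)[0][0]

# Synthetic toy problem: 3 subjects, 2 training samples each, 6 patches.
rng = np.random.default_rng(0)
n_patches, d, m = 6, 200, 64
labels = np.repeat(np.arange(3), 2)
protos = rng.normal(size=(n_patches, d, 3))   # one prototype per patch and subject
train = [protos[j][:, labels] + 0.1 * rng.normal(size=(d, 6)) for j in range(n_patches)]
test = [protos[j][:, 2] + 0.1 * rng.normal(size=d) for j in range(n_patches)]
Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, d))

predicted = gpcrc_predict(train, labels, test, Phi)
```

Majority voting lets patches that are corrupted by illumination or occlusion be outvoted by the remaining patches, which is one intuition for why a patch based scheme can be more robust than coding the whole face at once.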

4. Experimental Analysis

The proposed approaches, GPCRC and its improved methods, were evaluated on three publicly available databases: Extended Yale B [40–42], CMU_PIE [43, 44], and LFW [45, 46]. To show the effectiveness of GPCRC, we compared our methods with four classical methods and their improved methods, namely, SRC [13], CRC [15], SSRC [14], and PCRC [16]. Then, we tested GPCRC with several measurement matrices, namely, Random Gaussian matrices, Toeplitz and cyclic matrices, and Deterministic sparse Toeplitz matrices (), to reduce the dimension of the transformed signal, and compared their performance with that of the conventional PCA method. All the following experiments were run in MATLAB 2015b on a PC with an Intel(R) Core(TM) i5-3470 CPU at 3.20 GHz and Windows 7 x64. In our implementation of Gabor filters, the parameters were set as , , , , for all the experiments below. All the programs were run 20 times on each database, and the mean and variance of each method are reported. In order to report the best result of each method, the parameter used in SRC, SSRC, CRC, and PCRC was set to 0.001 [13], 0.001 [14], 0.005 [15], and 0.001 [16, 17], respectively. The patch sizes used in PCRC and in GPCRC and its improved methods are [16, 17] and , respectively.

4.1. Extended Yale B

To test the robustness of the proposed method to illumination, we used the classic Extended Yale B database [40–42], whose faces were acquired under different illumination conditions. The Extended Yale B database contains 38 human subjects under 9 poses and 64 illumination conditions. For a clear comparison, all frontal-face images marked with P00 were used in our experiment, and the face images were downsampled to . For each individual, 10 or 20 faces were selected randomly as training samples, and 30 others were selected as test samples. Figure 4 shows some P00 marked samples from the Extended Yale B database. The experimental results of each method are shown in Table 1. In Figure 5 we compared the recognition rates of various methods with the space dimensions 32, 64, 128, 256, 512, and 1024 for each patch in GPCRC; these numbers correspond to downsampling ratios of 1/320, 1/160, 1/80, 1/40, 1/20, and 1/10, respectively. Table 1 clearly shows that GPCRC achieves the highest recognition rate in all experiments. Figure 5 shows the performance of the various dimensionality reduction methods for GPCRC. In Figure 5(a), 10 samples of each individual were used as training samples. When the feature dimension is low (≤256), the best performance of PCA is 92.80% (256 dimension), which is significantly higher than that of the measurement matrices at 256 dimension: Random Gaussian matrices reach 88.77%, Toeplitz and cyclic matrices 85.17%, Deterministic sparse Toeplitz matrices () 86.67%, Deterministic sparse Toeplitz matrices () 85.69%, and Deterministic sparse Toeplitz matrices () 86.05%. However, none of these is as good as the performance at the original dimension (10240 dimension).
The reason why PCA performs significantly better than the measurement matrices at low dimension can be analyzed through their operating mechanisms. PCA applies a linear transformation that maps the original data to a set of linearly uncorrelated components, which extracts the principal components of the data. These principal components can represent image signals more accurately [31–33]. The theory of compressed sensing [34–36] states that when the signal is sparse or compressible in a transform domain, a measurement matrix that is incoherent with the transform matrix can be used to project the transform coefficients to a low-dimensional vector, and this projection maintains the information needed for signal reconstruction. Compressed sensing can thus achieve reconstruction with high accuracy or high probability from a small number of projections. At extremely low dimension, the retained information is insufficient, the original signal cannot be reconstructed accurately, and the recognition rate is therefore not high. As the dimension increases, the retained information allows the original signal to be reconstructed more accurately, and the recognition rate improves. When the feature dimension is higher (≥512), PCA cannot work because of the sample size: with 10 training samples for each of the 38 subjects there are only 380 training samples, which bounds the number of principal components. At 512 dimension, the measurement matrices (Deterministic sparse Toeplitz matrices (): 95.21%; Deterministic sparse Toeplitz matrices (): 94.91%) have reached the performance of the original dimension (10240 dimension), although the dimension is only 1/20 of the original. At 1/10 of the original dimension, the performance of the measurement matrices has basically reached that of the original dimension: Random Gaussian matrices reach 95.60%, Deterministic sparse Toeplitz matrices () 94.64%, Deterministic sparse Toeplitz matrices () 94.56%, and Deterministic sparse Toeplitz matrices () 94.98%.
In Figure 5(b), 20 samples of each individual were used as training samples. When the feature dimension is ≤ 512, PCA performs best, reaching 97.50% (512 dimension). At 1024 dimension, PCA cannot work, and the performance of the measurement matrices has basically reached that of the original dimension: Random Gaussian matrices reach 98.80%, Deterministic sparse Toeplitz matrices () 98.68%, Deterministic sparse Toeplitz matrices () 99.01%, and Deterministic sparse Toeplitz matrices () 98.77%. In Table 2, we fixed the dimension of each reduction method at 512 in the case of 20 training samples. In general, the complexity of PCA is [47], where is the number of rows of the covariance matrix. The example we provided consists of 38 individuals, and each individual contains 20 samples. Since the number of Gabor features (10240) for each patch far outweighs the total number of training samples (760), the number of rows of the covariance matrix is determined by the total number of training samples, and the complexity of PCA is approximately . However, in the proposed measurement matrix algorithm, the measurement matrix is formed from a certain deterministic distribution (e.g., Gaussian distribution) and structure, according to the required dimension and length of the signal, so that the measurement matrix has low complexity [36, 48, 49]. Table 2 lists the actual speed of each dimension reduction method for one patch of all samples (including all training samples and all testing samples). Based on the above analysis, PCA has the most outstanding performance, but it is limited by the sample size and is time-consuming. The measurement matrix does not perform well at low dimension but is not limited by the sample size; once the dimension reaches a certain value, its performance matches that of the original dimension. Thus, the dimension of the data can be reduced without loss of recognition rate.

4.2. CMU_PIE

To further test the robustness to illumination, pose, and expression, we used the popular CMU_PIE database [43, 44], which consists of 41,368 images of 68 people; for each person, there are 13 different poses, 43 different illumination conditions, and 4 different expressions. To verify the advantage of our method under the small sample size condition, we randomly selected 2, 3, 4, and 5 face samples from each individual as training samples and 40 other face samples from each individual as test samples. The face images were downsampled to . Figure 6 shows some marked samples of the CMU_PIE database. The experimental results of each method are shown in Table 3; GPCRC has the best results. In Figure 7 we compared the recognition rates of various dimension reduction methods for GPCRC. Similarly, PCA cannot achieve the best recognition rate because of the sample size limit, and at 1024 dimension the performance of the measurement matrices has basically reached that of the original dimension. The recognition rate of each measurement matrix is shown below:
(i) Random Gaussian matrices are (2 training samples), (3 training samples), (4 training samples), and (5 training samples).
(ii) Toeplitz and cyclic matrices are (2 training samples), 58.30% (3 training samples), 65.44% (4 training samples), and 73.12% (5 training samples).
(iii) Deterministic sparse Toeplitz matrices () are 47.63% (2 training samples), (3 training samples), 67.03% (4 training samples), and 71.78% (5 training samples).
(iv) Deterministic sparse Toeplitz matrices () are (2 training samples), (3 training samples), (4 training samples), and (5 training samples).
(v) Deterministic sparse Toeplitz matrices () are 46.94% (2 training samples), (3 training samples), 66.25% (4 training samples), and 71.69% (5 training samples).

Through the above analysis, we further validated that the measurement matrix is a feasible way to reduce the dimension while preserving the recognition rate.

4.3. LFW

In practice, the number of training samples is very limited, and often only one or two can be obtained from individual identity documents. To simulate this situation, we selected the LFW database. The LFW database includes frontal images of 5,749 different subjects in an unconstrained environment [45]. LFW-a is a version of LFW after alignment using commercial face alignment software [46]. We chose 158 subjects from LFW-a; each subject contains no fewer than ten samples. For each subject, we randomly chose 1 to 2 samples for training and another 5 samples for testing. The face images were downsampled to . Figure 8 shows some marked samples of the LFW database. The experimental results of each method are shown in Table 4; it can be clearly seen that GPCRC achieves the highest recognition rate in all experiments with the training sample size from 1 to 2. In Figure 9, we can also see the advantages of the measurement matrix at small sample size. At 1024 dimension, the performance of the measurement matrices with a single training sample is as follows: Random Gaussian matrices are , Toeplitz and cyclic matrices are 24.68%, Deterministic sparse Toeplitz matrices () are 24.43%, Deterministic sparse Toeplitz matrices () are , and Deterministic sparse Toeplitz matrices () are 25.28%. With two training samples, the performance is as follows: Random Gaussian matrices are , Toeplitz and cyclic matrices are 37.72%, Deterministic sparse Toeplitz matrices () are 37.56%, Deterministic sparse Toeplitz matrices () are , and Deterministic sparse Toeplitz matrices () are 38.25%.

5. Conclusion

To alleviate the influence of unreliable environments on small sample size FR, in this paper we applied an improved method based on Gabor local features to PCRC and proposed to use measurement matrices to reduce the dimension of the transformed signal. Several important observations can be summarized as follows. (1) The proposed GPCRC method can effectively deal with the influence of unreliable environments in small sample FR. (2) The measurement matrices proposed to deal with the high dimension in the GPCRC method effectively improve the computational efficiency and speed, and they overcome the limitations of the PCA method.

Conflicts of Interest

None of the authors have conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61672386), Humanities and Social Sciences Planning Project of Ministry of Education (no. 16YJAZH071), Anhui Provincial Natural Science Foundation of China (no. 1708085MF142), University Natural Science Research Project of Anhui Province (no. KJ2017A259), and Provincial Quality Project of Anhui Province (no. 2016zy131).