Special Issue: Mathematical Methods Applied to Digital Image Processing
A Fast Iterative Pursuit Algorithm in Robust Face Recognition Based on Sparse Representation
A pursuit algorithm for face recognition is proposed that is faster than existing pursuit algorithms. Additional stopping rules are introduced to address the slow response of orthogonal matching pursuit (OMP), so that the main advantage of pursuit algorithms, namely that they avoid processing useless information in the training dictionary, can be fully exploited. For test samples affected by partial occlusion, corruption, or facial disguise, the recognition rates of most algorithms fall rapidly. The robust version of this algorithm identifies such samples automatically and processes them accordingly. The recognition rates on the ORL, Yale, and FERET databases are 95.5%, 93.87%, and 92.29%, respectively. Experiments also show that recognition performance under various levels of occlusion and corruption is significantly enhanced.
1. Introduction
Face recognition is one of the most active and challenging subjects in computer vision and artificial intelligence, with a wide range of applications such as personnel sign-in systems, image search engines, and convict detection systems. Experiments have shown that various sparse representation methods perform well in face recognition. Given sufficient face images of $k$ subjects and a test image $y$ that belongs to one of these classes, face recognition can be cast as a classification problem [1, 2]. The basic idea of this kind of algorithm is to find the sparsest solution that represents $y$ for classification [3, 4].
However, current algorithms based on sparse representation also have drawbacks. First, these methods converge relatively slowly. Second, when the test image is subject to a large percentage of random corruption or contiguous occlusion, the sparse solution becomes denser, so that it is hard for the system to find the class to which the test image belongs [5–7].
In this paper, a fast pursuit algorithm is proposed to solve the problems mentioned above. A common problem of pursuit algorithms is that their computational speed is quite slow. Targeting face recognition, we improve the greedy algorithm, making it much faster than similar ones. By focusing on the information that is crucial to classification, namely the sparsity of errors, the algorithm becomes more robust to face misalignment and to a large percentage of occlusion. The paper is organized as follows. First, we briefly review existing pursuit techniques for face recognition, including their advantages and their brittleness to occlusion. Then, an improved algorithm is proposed and its feasibility and effectiveness are demonstrated. Finally, we present experiments on the ORL, Yale, and FERET face databases, as well as on a face database collected by ourselves, to verify the modified algorithm.
2. A Review of Pursuit Algorithms to Solve Face Recognition Issue
Pursuit algorithms solve the problem
$$\min_{x} \|x\|_0 \quad \text{subject to} \quad Ax = y,$$
where $y$ represents the test sample, $A$ is the training dictionary of the $k$ known classes, and $x$ is the sparse solution [8–10]. The core idea of these greedy algorithms is to update the support and the provisional solution iteratively in order to reduce the residual $r = y - Ax$ to a minimum. Suppose the matrix $A = [A_1, A_2, \ldots, A_k]$ is composed of training face images of the $k$ subjects, where each image is represented by a column $v_{i,j}$ of $A$; here $A_i$ represents the $i$th class of $A$, and $v_{i,j}$ represents the $j$th element in $A_i$. The given test sample $y$, assumed well aligned, can then be written as a linear combination of these $m$-dimensional column vectors:
$$y = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{i,j} v_{i,j},$$
where $n_i$ is the number of training images in class $i$ and $x_{i,j}$ is the corresponding coefficient of $v_{i,j}$. The task of these algorithms is to find the sparsest solution $x$, so that the class of the largest coefficient can be taken as the class of $y$.
Now, let us briefly review the procedure of OMP and matching pursuit (MP). First, find the $j$th column $a_j$ of $A$ that minimizes the error $\epsilon(j) = \min_{z_j} \|a_j z_j - r\|_2^2$ with respect to the current residual $r$, add this column to the support, and compute the solution $x$ on that support that minimizes the residual $\|y - Ax\|_2$. Then continue to update the support and the provisional solution iteratively until the residual falls below a preset threshold. MP is similar to OMP; the apparent difference is that the coefficients of previously selected entries remain unchanged, rather than all coefficients being re-evaluated by a least-squares solve at every support update.
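As a point of reference, the OMP procedure reviewed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `omp`, the tolerance `eps`, and the iteration cap `max_iter` are assumptions introduced here, and the columns of `A` are assumed to be nonzero.

```python
import numpy as np

def omp(A, y, eps=1e-6, max_iter=None):
    """Plain OMP sketch: grow the support greedily, then re-fit by least squares."""
    n = A.shape[1]
    max_iter = n if max_iter is None else min(max_iter, n)
    norms = np.linalg.norm(A, axis=0)        # column norms (assumed nonzero)
    support = []
    x = np.zeros(n)
    r = np.asarray(y, dtype=float).copy()    # the initial residual is the sample itself
    for _ in range(max_iter):
        if np.linalg.norm(r) < eps:          # basic stopping rule: small residual
            break
        # The column minimising min_z ||a_j z - r||^2 is the one with the
        # largest normalised correlation |a_j^T r| / ||a_j||.
        scores = np.abs(A.T @ r) / norms
        if support:
            scores[support] = -np.inf        # never select a column twice
        support.append(int(np.argmax(scores)))
        # Re-evaluate all coefficients on the current support (the "O" in OMP).
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros(n)
        x[support] = coef
        r = y - A @ x
    return x, support
```

For a sample that is an exact multiple of one dictionary column, this sketch stops after a single iteration with that column as the only support element. MP would differ only in skipping the least-squares re-fit and updating the newly selected coefficient alone.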
The greedy strategy expands the support set, initially empty, by one additional column at a time. The enumeration therefore takes $k_0$ steps if the solution of the optimization problem is known to have $k_0$ nonzero entries, which is quite slow in many situations. In face recognition, however, the test sample $y$ belongs to one class in $A$; in other words, it is correlated with only a few columns of $A$, so $k_0$ is a controllable quantity whose value is small. Many iterative-shrinkage algorithms, which “shrink” the entry of every column in $A$ to update the solution iteratively and make it “sparser,” additionally process $(k-1)$ classes of useless information at every stage. Compared with them, pursuit algorithms can find the most probable class of the test image within the first few iterations. Because of this property, we hold that the pursuit method is best suited to classification via sparse representation.
3. A Fast and Robust Pursuit Algorithm
One of the apparent drawbacks of pursuit algorithms is that they are slow in many situations, since they must keep updating the support and the solution until the result is accurate enough. In this section, we propose an improved pursuit algorithm for face recognition, which mainly involves two aspects: (1) general stopping rules to make the greedy algorithm faster in face recognition; (2) stopping rules for well-aligned and noiseless input samples.
Most state-of-the-art algorithms for sparse representation try to find a linear combination of the columns of the matrix $A$ that approximates the input vector $y$. The goal of many iterative-shrinkage algorithms is to “shrink” the sparse solution $x$, in other words, to make the solution “sparser”; the pursuit strategy, on the contrary, moves the solution from sparse to dense. The basic idea of a pursuit algorithm is to add columns to the support and update the provisional solution until the residual between the proposed representation and the input vector is small enough. It is easy to see that $y$ is hard to represent accurately by a linear combination of columns of $A$ when the test sample is occluded or corrupted, because the distance between $y$ and the corresponding column of $A$ increases and $y$ may become correlated with more columns of $A$.
3.1. General Stopping Rules
Since the test image $y$ is related to just a few columns of the matrix $A$, all belonging to the same class, only the coefficients of one class matter for the recognition result. It is therefore unnecessary and inefficient to recover $x$ precisely, which would unavoidably involve many irrelevant classes. Rather than recovering $x$ accurately, our goal in face recognition is to find the right class rapidly; therefore, more stopping rules are needed to make the algorithm faster.
3.1.1. Maximum Iterating Times
Let us consider the best situation first. If the test image $y$ is the same as one column $a_j$ of the matrix $A$, so that $y$ and $a_j$ are parallel, a single iteration, which needs on the order of $mn$ flops to find the maximum inner product for a dictionary of $n$ columns of dimension $m$, is enough to identify the class. When the test image is under random corruption or varying levels of contiguous occlusion, the errors between $y$ and every column of $A$ become larger, and $y$ may be correlated with more classes in $A$. But the final recognition result depends on the largest coefficients of the sparse solution, which indicate the most probable class of $y$. Hence, even under the worst condition, $k$ iterations are enough, because the iteration can be seen as a traversal of the classes in a classification problem where only one class is valid.
3.1.2. Results Resemblance of Two Successive Iterations
As discussed above, the final identification result depends on the largest entries of the sparse solution; therefore, we can neglect the details of the solution $x$. A minor change in the solution may affect the accuracy of the representation, but it does not influence the classification result. So if $\|x^{t} - x^{t-1}\|_2$, the change in the solution between two successive iterations $t-1$ and $t$, is smaller than some predetermined threshold, the iteration can be stopped.
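The two general rules of Sections 3.1.1 and 3.1.2 can be expressed as a small predicate. The sketch below is only illustrative: the function name `general_stop`, the iteration counter `t`, and the threshold `delta` are hypothetical names, and the Euclidean norm is an assumed choice for measuring the resemblance of successive solutions.

```python
import numpy as np

def general_stop(x_prev, x_curr, t, max_iter, delta):
    """Return True when either general stopping rule fires.

    t        -- number of iterations completed so far
    max_iter -- cap on the number of iterations (Section 3.1.1)
    delta    -- threshold on the change between successive solutions (Section 3.1.2)
    """
    if t >= max_iter:                        # maximum iterating times reached
        return True
    # Successive solutions resemble each other closely enough.
    return bool(np.linalg.norm(x_curr - x_prev) < delta)
```

Called once per iteration after the provisional solution has been updated, the predicate lets the pursuit loop exit as soon as further iterations could only refine details that do not affect the class decision.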
3.2. Stopping Rules and Processing Aiming at Different Image Samples
Under most conditions, we do not want the number of iterations to reach its upper limit: on the one hand, this may take quite a long time; on the other, since the solution becomes denser as the number of iterations increases, it becomes harder to identify the right class. Moreover, the basic stopping rule of OMP, $\|r\|_2 < \epsilon_0$, where $r = y - Ax$ is the residual and $\epsilon_0$ is the error threshold, is often hard to satisfy because the test image may contain various kinds of noise or be disguised. Hence, more stopping rules should be devised to reduce the number of iterations. We now discuss this issue in two contexts: well-aligned, noiseless input samples, and corrupted or occluded images.
3.2.1. Maximum Coefficient of the Sparse Solution
Suppose the input image contains little noise and is well aligned. Such a sample is easy for the system to classify, so what remains is to raise the identification speed. The coefficients of the sparse solution reflect the degree of resemblance between the test sample $y$ and a column $a_j$ of $A$. If the maximum coefficient is large enough, in other words, close to 1, we can consider that $y$ belongs to the class of $a_j$: $y$ and $a_j$ are already sufficiently similar, and there is no need to represent the minor errors by a linear combination of other columns, because the small coefficients carry no information for classification and computing them only increases the number of iterations.
3.2.2. Sparse Level of the Sparse Solution
For corrupted or disguised test samples, however, large coefficients of the sparse solution are hard to reach owing to the relatively larger range of noise. But the special character of recognition ensures that the system can find the class of $y$ before the representation error reaches the upper bound we have set. One of the key points is the sparse level of the sparse solution. We define the sparse level as the ratio of the two largest coefficients that lie in different classes: the largest coefficient, which belongs to some class $i$, divided by the second largest coefficient whose class is different from class $i$.
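The sparse level defined above can be computed directly from a provisional solution and the class label of each dictionary column. The following is a hedged sketch: the name `sparse_level`, the `labels` array, and the use of absolute values for comparing coefficients are assumptions made for illustration.

```python
import numpy as np

def sparse_level(x, labels):
    """Return (class of the largest coefficient, sparse level).

    x      -- provisional sparse solution, one entry per dictionary column
    labels -- class label of each dictionary column, same length as x
    """
    mags = np.abs(x)
    j = int(np.argmax(mags))                 # largest coefficient overall
    top_class = labels[j]
    rivals = mags[labels != top_class]       # coefficients from every other class
    second = rivals.max() if rivals.size else 0.0
    # With no nonzero rival coefficient the ratio is unbounded.
    level = np.inf if second == 0 else mags[j] / second
    return top_class, level
```

A large ratio means that one class dominates the solution, so the class of the largest entry can be reported without refining the remaining coefficients.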
3.2.3. The Noise Level
We define the noise level through the maximal inner product between the columns of $A$ and the test sample $y$, which reflects the maximum correlation between $y$ and the face images in the training dictionary $A$. If this maximal inner product is smaller than some predefined threshold, the similarity between $y$ and every column of $A$ is below that threshold, which indicates that the input sample is either not a valid one or carries much additional noise. Even if the test sample is a valid one, it is inefficient and laborious for the system to process $y$ directly, since our goal is to find the image in the dictionary that resembles $y$, but $y$ itself is not precise enough. Both random noise and partial occlusion can be regarded uniformly as noise in the test image. So the maximal inner product indicates the noise level of the input image.
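A minimal sketch of this quantity is given below. It assumes, as an illustration only, that the columns of the dictionary and the test sample are scaled to unit length before the inner products are taken, so that the result is bounded by 1; the function name `max_correlation` is hypothetical.

```python
import numpy as np

def max_correlation(A, y):
    """Maximal inner product between the unit-normalised columns of A and y."""
    A_unit = A / np.linalg.norm(A, axis=0)   # normalise every column (assumed nonzero)
    y_unit = y / np.linalg.norm(y)           # normalise the test sample
    return float(np.max(A_unit.T @ y_unit))
```

A value near 1 indicates a clean sample that closely matches some training image, while a small value suggests a noisy, occluded, or invalid input that should be preprocessed before pursuit.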
3.2.4. Stopping Rules for Well-Aligned and Input Samples with Little Noise
The entries of the sparse solution $x$ reflect their respective contributions to the representation, since $x$ includes all coefficients of the combination of columns of $A$. The matching pursuit algorithm updates the support and the solution by minimizing the errors $\epsilon(j) = \min_{z_j} \|a_j z_j - r\|_2^2$, where $a_j$ is the $j$th column of $A$ and $r$ is the current residual. For input samples that are noisy or misaligned, the coefficients of $x$ obviously decrease, so the threshold on the largest entry cannot be reached. But if the corrupted percentage of the input image is not too large, the correlation between $y$ and the column $a_j$ that is in the same class as $y$ remains relatively large compared with the others. This means that the solution is still quite sparse after a few iterations, and the class of the largest entry can be taken as the identification result. Using the sparse level defined above, the iteration can be stopped once a preset sparse level threshold is reached after some iterations.
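Putting the stopping rules of this section together, a pursuit loop with early exits might look like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function name, all threshold values (`eps`, `delta`, `coef_thresh`, `level_thresh`), and the order in which the rules are checked are hypothetical, and the columns of `A` and the sample `y` are assumed to have unit norm.

```python
import numpy as np

def fast_omp_classify(A, labels, y, max_iter, eps=1e-6, delta=1e-3,
                      coef_thresh=0.9, level_thresh=3.0):
    """OMP with early exits; returns the class of the largest coefficient.

    All threshold defaults are illustrative placeholders.
    """
    n = A.shape[1]
    support = []
    x = np.zeros(n)
    r = np.asarray(y, dtype=float).copy()
    for _ in range(min(max_iter, n)):        # rule: maximum iterating times
        scores = np.abs(A.T @ r)
        if support:
            scores[support] = -np.inf        # never select a column twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x_new = np.zeros(n)
        x_new[support] = coef
        r = y - A @ x_new
        change = np.linalg.norm(x_new - x)
        x = x_new
        mags = np.abs(x)
        j = int(np.argmax(mags))
        rivals = mags[labels != labels[j]]
        second = rivals.max() if rivals.size else 0.0
        if np.linalg.norm(r) < eps:          # basic OMP rule: small residual
            break
        if change < delta:                   # successive solutions resemble each other
            break
        if mags[j] >= coef_thresh:           # largest coefficient close to 1
            break
        # Sparse level, checked only once another class has a nonzero coefficient.
        if second > 0 and mags[j] / second >= level_thresh:
            break
    return labels[int(np.argmax(np.abs(x)))]
```

Guarding the sparse-level test with `second > 0` is a design choice of this sketch: after the first iteration only one coefficient is nonzero, so the ratio would otherwise be trivially infinite and the loop would always stop immediately.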
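Proof. It can be proved that, for a system of linear equations $Ax = y$, if a solution $x$ exists obeying $\|x\|_0 < \frac{1}{2}\left(1 + 1/\mu(A)\right)$, where $\mu(A)$ is the mutual coherence of $A$, then OMP run with threshold parameter $\epsilon_0 = 0$ is guaranteed to find it exactly. This theorem is valid only when the test sample $y$ can be represented exactly by a linear combination of the columns of $A$. Suppose that the largest coefficient is equal to or greater than some value close to 1 and that, after some iterations, the sparse level of the solution is still greater than its threshold, and we retain only the larger entries; hence It can be proved that if a sparse vector $x_0$ satisfies the sparsity constraint $\|x_0\|_0 < \frac{1}{2}\left(1 + 1/\mu(A)\right)$ and gives a representation of $y$ to within error tolerance $\epsilon$, then every solution $x_0^{\epsilon}$ of the error-tolerant problem must obey $\|x_0^{\epsilon} - x_0\|_2^2 \le \frac{4\epsilon^2}{1 - \mu(A)\left(2\|x_0\|_0 - 1\right)}$. Note that, as we reconstruct $y$ as one image in $A$, the smallest error between $y$ and the recovered image is , so is larger than . Therefore,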
4. Processing for Images under Random Corruption and Contiguous Occlusion
Almost all sparse representation algorithms rely on the errors of pixels in corresponding positions, so they cope poorly with samples under a large level of occlusion or corruption [7, 16].
4.1. Necessity of Preprocessing of Test Samples and Training Dictionary
When the test sample is seriously occluded, the error between $y$ and each column of $A$ increases, which means that each element of the corresponding error vector rises. Hence, both the largest coefficient and the sparse level after a fixed number of iterations can hardly reach their respective predetermined thresholds. If we employ $y$ and $A$ directly, as shown in Figure 1, the solution we get can be dense, which weakens the advantage of sparse representation. It is therefore necessary to preprocess the input sample and the training dictionary first.
It has been proposed that this kind of issue can be solved by block partitioning, that is, partitioning the image into blocks, processing each block independently, and then aggregating the results for the individual blocks. This method is valid only for images under contiguous occlusion, and the processing takes quite a long time because it turns the classification into several subproblems. Consider how the human brain handles face images disguised by a scarf or glasses. We make out the scarf or glasses in the image and then ignore these parts, which are unrelated to the face; our judgment of the person in the image depends on the remaining parts, which we regard as the face. Imitating this, we propose to preprocess the input image and the training matrix before applying them to the OMP algorithm. Since both random corruption and contiguous occlusion can be regarded as noise in the image, we can uniformly reject the affected part and pay attention only to the other parts.
4.2. How to Identify a Corrupted or Occluded Test Sample Automatically
First, the system should automatically identify whether the test image is partially occluded or corrupted. For an occluded or corrupted image that belongs to a class in the training dictionary and for an invalid image alike, the errors are quite large. But the error of an occluded image has distinguishing features: the error is concentrated on only a few pixels, while the error of the other pixels is quite small. So if the error vector contains some elements that are close to 0 and the variance of the error vector is large enough, we can regard the test sample as a partially occluded or corrupted one, as shown in Figure 2.
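The test described above can be sketched as follows. Several details are assumptions made only for illustration: the error vector is taken against the nearest training image in Euclidean distance, and the parameter names and default values (`zero_tol`, `min_clean_frac`, `var_thresh`) are hypothetical.

```python
import numpy as np

def looks_occluded(y, A, zero_tol=0.05, min_clean_frac=0.3, var_thresh=0.01):
    """Heuristic check for partial occlusion or corruption of the sample y."""
    # Error vector against the nearest training image (an assumed choice).
    j = int(np.argmin(np.linalg.norm(A - y[:, None], axis=0)))
    e = np.abs(y - A[:, j])
    clean_frac = np.mean(e < zero_tol)       # share of pixels with near-zero error
    # Occlusion: many pixels match well, yet the errors vary strongly overall.
    return bool(clean_frac >= min_clean_frac and np.var(e) > var_thresh)
```

A clean sample yields a near-zero error vector with negligible variance, and an invalid sample yields large errors almost everywhere, so neither satisfies both conditions at once.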
4.3. Image Preprocessing for Corrupted or Occluded Samples
Next, let us discuss how to extract the “clean” pixels and remove the invalid ones. To begin with, we define an error threshold between a particular pixel in the test sample $y$ and the respective pixels of the elements in $A$. If the minimal error between the pixel in $y$ and the respective pixel in one class of $A$ is larger than this threshold, we regard the pixel as invalid and remove it from $y$, together with the respective one from $A$. We then obtain a new training matrix $\tilde{A}$ and test vector $\tilde{y}$ whose “noise pixels” have been filtered out. The problem is thus transformed into finding a sparse solution $x$ subject to $\tilde{A}x = \tilde{y}$, and the improved OMP algorithm can be applied directly to this new equation. One example is given in Figure 3. This enhances the recognition rate of the system. Since the method does not impose any constraint on the noise distribution, random corruption and contiguous occlusion can be handled simultaneously. “Shrinking” $A$ and $y$ also makes the computation faster.
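The pixel filtering step can be sketched as a row selection on the dictionary and the test vector. The sketch assumes, for simplicity, that the minimal error of a pixel is taken over all columns of the dictionary; the names `filter_pixels` and `err_thresh` are hypothetical.

```python
import numpy as np

def filter_pixels(A, y, err_thresh):
    """Drop pixels (rows) whose best match in the dictionary is still poor.

    Returns the reduced dictionary, the reduced sample, and the boolean
    mask of pixels that were kept.
    """
    # Smallest absolute error of each pixel of y against the same pixel
    # in any training image.
    min_err = np.min(np.abs(A - y[:, None]), axis=1)
    keep = min_err <= err_thresh
    return A[keep, :], y[keep], keep
```

The reduced pair can then be passed to the pursuit algorithm unchanged, because removing the same rows from both sides keeps the linear system consistent on the remaining pixels.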
5. Experimental Results
In this section, we apply our algorithm to face recognition on the ORL database. We first test the recognition rate and elapsed time of the algorithm, compared with state-of-the-art algorithms for finding the sparse solution. We then examine the recognition performance under corruption and occlusion. Finally, we simulate a real situation and check the robustness under various disguises.
5.1. Performance on ORL Database and Yale Database
The ORL database consists of 400 frontal images of 40 individuals. Samples in this database include facial variations such as different expressions and postures, which make it challenging for the system to find the true class of the test images.
We test the face recognition rate and elapsed time of the algorithm by 10-fold cross-validation. In other words, for each test, the training dictionary consists of 360 images from 40 classes (9 samples per class), and the remaining 40 images are test samples. All images have simply been downsampled, without any particular feature extraction. We compared the results with the original OMP and with several state-of-the-art algorithms for finding sparse solutions: the primal augmented Lagrangian method (PALM), the dual augmented Lagrangian method (DALM), the fast iterative shrinkage-thresholding algorithm (FISTA), and the truncated Newton interior-point method (TNIPM). The results of each test and their averages are given in Table 1.
As shown in Table 1, the recognition rate of the improved OMP is the highest, and its elapsed time per sample is shorter than that of the other algorithms.
We also compared the algorithm with the others on the Yale database, which contains 2432 frontal images of 38 individuals captured under various laboratory-controlled lighting conditions. We used 8-fold cross-validation on this database, with 56 images of each class as training samples and the other 304 images as test samples. As in the tests on the ORL database, the images were only downsampled to construct the training dictionary, so that the problem becomes underdetermined. The results of the 8 tests and their average values are given in Table 2.
In Table 2, although the recognition rate of FISTA is higher than that of our algorithm, its run time is almost ten times that of the improved OMP.
Similar experiments were performed on the FERET database. Compared with the other face databases mentioned above, this database includes more variations in posture and facial expression. We ran 7-fold cross-validation with 150 classes; in each test, 6 samples per class form the training set and the other 150 samples are the test samples. Table 3 compares these algorithms.
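The split sizes quoted for the three databases follow from simple arithmetic, which the short check below reproduces. The helper name `fold_sizes` is hypothetical, and the FERET total of 1050 images is inferred here from 150 classes with 7 images each (6 for training plus 1 for testing); it is not stated in the text.

```python
def fold_sizes(n_images, n_classes, n_folds):
    """Return (training images per class, training images, test images) per fold."""
    per_class = n_images // n_classes
    test_per_class = per_class // n_folds    # images of each class held out per fold
    train_per_class = per_class - test_per_class
    return train_per_class, train_per_class * n_classes, test_per_class * n_classes

orl = fold_sizes(400, 40, 10)      # ORL: 10-fold
yale = fold_sizes(2432, 38, 8)     # Yale: 8-fold
feret = fold_sizes(1050, 150, 7)   # FERET: 7-fold, total inferred as 150 * 7
```

The results agree with the figures given above: 9 training samples per class and 40 test images for ORL, 56 per class and 304 test images for Yale, and 6 per class and 150 test images for FERET.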
Table 3 shows that the average recognition rate of the improved OMP is the highest in the tests on the FERET database and that its run time is much shorter.
These data were obtained in MATLAB on a typical 2.40 GHz PC with a quad-core processor. To be fair, the training dictionary and the test samples are the same for all algorithms, and all identification results depend on the class of the largest coefficient in the sparse solution. The improved OMP algorithm greatly reduces the run time, and its recognition rates are relatively good.
5.2. Recognition despite Occlusion and Corruption
5.2.1. Recognition under Random Corruption
We first test the robust version of our algorithm on samples under random corruption. We add salt-and-pepper noise of different intensities to samples in the database to generate the test samples. Figure 4 plots the recognition results of the robust version of OMP and of OMP applied directly to the test samples.
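Test samples of this kind can be generated as in the sketch below, which replaces a chosen fraction of pixels with the minimum or maximum intensity. The function name, the parameter names, and the intensity range of 0 to 1 are assumptions for illustration.

```python
import numpy as np

def salt_and_pepper(img, fraction, rng, low=0.0, high=1.0):
    """Return a copy of img with the given fraction of pixels set to low or high."""
    out = np.array(img, dtype=float)                     # work on a copy
    k = int(round(fraction * out.size))                  # number of corrupted pixels
    idx = rng.choice(out.size, size=k, replace=False)    # distinct flat pixel indices
    np.put(out, idx, rng.choice([low, high], size=k))    # salt or pepper at random
    return out
```

For example, `salt_and_pepper(img, 0.9, np.random.default_rng(0))` corrupts 90% of the pixels of `img`, the most severe corruption level discussed in this section.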
As Figure 4 shows, when the face image is 90% corrupted by noise, the algorithm still reconstructs the right image, although we can hardly identify it as a face image ourselves. The line graph on the right of Figure 4 indicates that this method performs quite well under a large percentage of random corruption.
5.2.2. Recognition under Contiguous Occlusion
We add an irrelevant image of different sizes to the samples in the database and treat the results as test images to test the robustness under contiguous occlusion. Figure 5 shows the performance.
The right part of Figure 5 shows that the improved OMP significantly outperforms the original one, which demonstrates its robustness to occlusion.
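An occluded test sample of this kind can be produced by pasting an unrelated patch onto the image, as in the minimal sketch below. The function name `occlude` and the position arguments `top` and `left` are hypothetical, and the patch is assumed to fit inside the image.

```python
import numpy as np

def occlude(img, patch, top, left):
    """Return a copy of img with patch pasted at row top and column left."""
    out = np.array(img, dtype=float)         # work on a copy
    h, w = patch.shape
    out[top:top + h, left:left + w] = patch  # overwrite one contiguous block
    return out
```

The occlusion level is then the ratio of the patch area to the image area, `patch.size / img.size`, which can be varied by resizing the patch.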
5.3. Recognition despite Disguise
Face photos taken in real-world scenarios often contain glasses and scarves, which makes it harder for the system to identify the right person. We now examine the performance of the algorithm in these situations. Our test images are again from the ORL database, to which we add pictures of glasses and scarves.
Figure 6 shows that the algorithm also performs quite well on disguised test samples, which represent the real situation. We constructed a disguised test database of 40 samples with sunglasses and scarves based on the ORL database, and the recognition rate reached 95%.
6. Conclusion
In this paper, we proposed a fast and robust algorithm based on the OMP algorithm. We first discussed the disadvantages of OMP for the face recognition problem. We then proposed an improved method that makes the time needed to identify a test image much shorter. We also sought to enhance robustness to occluded and corrupted test samples by extracting the “noiseless” pixels and reducing the elements in both the test image and the training dictionary. Finally, we verified the method by experiments on the ORL, Yale, and FERET databases.
Future work is to enhance robustness to various kinds of misalignment and posture. We may also relax the constraints further and apply this method to object recognition.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 61379010) and the Natural Science Basic Research Plan in Shaanxi Province of China (no. 2012JQ1012).
References
J. Wright, A. Yang, A. Ganesh et al., “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
Y. Su, S. G. Shan, X. L. Chen et al., “Hierarchical ensemble of global and local classifier for face recognition,” IEEE Transactions on Image Processing, vol. 9, no. 2, pp. 273–292, 2009.
A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Fast l1-minimization algorithms and an application in robust face recognition: a review,” in Proceedings of the International Conference on Image Processing, 2010.
J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.
A. Martinez, “Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 6, pp. 748–763, 2002.
D. Donoho and M. Elad, “Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization,” Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 5, pp. 2197–2202, 2003.
A. Yang, M. Gastpar, R. Bajcsy, and S. Sastry, “Distributed sensor perception via sparse representation,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1077–1088, 2010.
D. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proceedings of the National Academy of Sciences, vol. 106, no. 45, pp. 18914–18919, 2009.
F. Sanja, D. Skocaj, and A. Leonardis, “Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 3, 2006.