A Gabor-Block-Based Kernel Discriminative Common Vector Approach Using Cosine Kernels for Human Face Recognition
In this paper, a nonlinear Gabor Wavelet Transform (GWT) discriminant feature extraction approach for enhanced face recognition is proposed. First, the low-energized blocks are extracted from the Gabor wavelet transformed images. Second, the nonlinear discriminating features are analyzed and extracted from the selected low-energized blocks by the generalized Kernel Discriminative Common Vector (KDCV) method. The KDCV method is extended to include the cosine kernel function in the discriminating method. The KDCV with the cosine kernel is then applied to the extracted low-energized discriminating feature vectors to obtain the real component of a complex quantity for face recognition. In order to derive positive kernel discriminative vectors, we apply only those kernel discriminative eigenvectors that are associated with nonzero eigenvalues. The feasibility of the low-energized Gabor-block-based generalized KDCV method with cosine kernel function models has been successfully tested for classification using the $L_1$ and $L_2$ distance measures and the cosine similarity measure on both frontal and pose-angled face recognition. Experimental results on the FRAV2D and FERET databases demonstrate the effectiveness of this new approach.
Face authentication has gained considerable attention in the recent past through the increasing need for access verification systems using several modalities like voice, face image, fingerprints, PIN codes, and so forth. Such systems are used for the verification of a user's identity on the Internet, when using an automated banking system, when entering a secured building, and so on. The Gabor wavelet transformation (GWT) models well the receptive field profiles of cortical simple cells and also has the properties of multiscale and multidirectional filtering. These properties are in accordance with the characteristics of human vision [1–3]. Further, discriminant analysis is an effective image feature extraction and recognition technique, as it allows the extraction of discriminative features, reduces dimensionality, and consumes less computing time [4, 5]. In our previous work, we combined the GWT and Bayesian principal component analysis (PCA) techniques and presented a GWT-Bayesian PCA face recognition method which outperforms some conventional linear discriminating methods. As an extension of the linear discriminant technique, the kernel-based nonlinear discriminant analysis technique has now been widely applied in the field of pattern recognition. Baudat and Anouar developed a commonly used generalized discriminant analysis (GDA) method for nonlinear discrimination. Jing et al. put forward a Kernel Discriminative Common Vector (KDCV) method. In this paper we develop a block-based GWT KDCV and propose a block-based low-energized nonlinear GWT discriminant feature extraction for enhanced face recognition. The high-energized blocks of a GWT image generally have larger nonlinear discriminability values.
The nonlinear discriminant features are then extracted from the selected low-energized blocks of the GWT image by a new generalized KDCV method, which is extended to include the cosine kernel model in order to get the best recognition result. These features are finally used for classification using three different classifiers. The experimental results demonstrate the effectiveness of this new approach.
In this paper, a novel method is proposed based on selecting low-energized blocks of Gabor wavelet responses as feature points, which contain discriminative facial feature information, instead of using predefined graph nodes as in elastic graph matching (EGM), which reduces the representative capability of Gabor wavelets. This corresponds to an enhancement of the edges of the eyes, mouth, and nose, which are supposed to be the most important points of a face; hence the algorithm allows these facial features to keep overall face information along with local characteristics.
The remainder of this paper is organized as follows. Section 2 describes the derivation of low-energized blocks from the GWT images. Section 3 details the generalized KDCV method with the cosine kernel function for enhanced face recognition. Section 4 shows the performance of the proposed method on face recognition by applying it to datasets from the FERET and FRAV2D face databases and by comparing it with some of the previous KDCV methods. We conclude the paper in Section 5.
2. 2D Gabor Wavelets
Gabor wavelets are used in image analysis because of their biological relevance and computational properties [12, 13]. The Gabor transform is suitable for analyzing gradually changing data such as face, iris, and eyebrow images. The Gabor filter used here has the following general form:

$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2} \exp\left(-\frac{\|k_{\mu,\nu}\|^2 \|z\|^2}{2\sigma^2}\right) \left[\exp\left(i\,k_{\mu,\nu} \cdot z\right) - \exp\left(-\frac{\sigma^2}{2}\right)\right], \tag{1}$$

where $\mu$ and $\nu$ define the orientation and scale of the Gabor kernels, respectively, $z = (x, y)$ is the variable in the spatial domain, $\|\cdot\|$ denotes the norm operator, and $k_{\mu,\nu} = k_\nu e^{i\phi_\mu}$ is the frequency vector which determines the scale and orientation of the Gabor kernels, where $k_\nu = k_{\max}/f^{\nu}$ and $k_{\max} = \pi/2$, $f = \sqrt{2}$, $\phi_\mu = \pi\mu/8$, where $f$ is the spacing factor. Here Gabor wavelets at five different scales, $\nu \in \{0, \ldots, 4\}$, and eight orientations, $\mu \in \{0, \ldots, 7\}$, are chosen. The term $\exp(-\sigma^2/2)$ is subtracted in (1) in order to make the kernel DC-free, and thus insensitive to illumination. Only the magnitude of the convolution outputs is used in what follows. The kernels exhibit strong characteristics of spatial locality and orientation selectivity, making them a suitable choice for image feature extraction when one’s goal is to derive local and discriminating features for (face) classification.
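As a rough illustration of how such a filter bank can be generated, the following Python sketch builds 40 complex Gabor kernels (five scales, eight orientations) from the general form above. The function name `gabor_kernel`, the 31×31 spatial support, and the value $\sigma = 2\pi$ are assumptions made for this example; $k_{\max} = \pi/2$ and $f = \sqrt{2}$ are the values commonly used with this parameterization.

```python
import numpy as np

def gabor_kernel(mu, nu, size=31, sigma=2 * np.pi,
                 k_max=np.pi / 2, f=np.sqrt(2)):
    """Complex Gabor kernel for orientation index mu and scale index nu.

    size and sigma are illustrative assumptions, not values from the paper.
    """
    # Frequency vector written as a complex number: k_nu * exp(i * phi_mu)
    k = (k_max / f**nu) * np.exp(1j * np.pi * mu / 8)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k_sq = np.abs(k) ** 2
    z_sq = x**2 + y**2
    # Gaussian envelope scaled by ||k||^2 / sigma^2
    envelope = (k_sq / sigma**2) * np.exp(-k_sq * z_sq / (2 * sigma**2))
    # Oscillatory carrier minus the DC-compensation term
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma**2 / 2)
    return envelope * carrier

# Five scales (nu = 0..4) times eight orientations (mu = 0..7) = 40 kernels
bank = [gabor_kernel(mu, nu) for nu in range(5) for mu in range(8)]
```

With a finite support the DC-free property holds only approximately, and the approximation is better for the finer scales whose Gaussian envelope fits inside the window.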
2.1. Gabor-Based Feature Representation
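The Gabor wavelet representation of an image is the convolution of the image with a family of Gabor kernels as defined in (1). Let $I(x, y)$ be the gray level distribution of an image; the convolution output of the image $I$ and a Gabor kernel $\psi_{\mu,\nu}$ is defined as

$$O_{\mu,\nu}(z) = I(z) * \psi_{\mu,\nu}(z), \tag{2}$$

where $z = (x, y)$, and $*$ denotes the convolution operator.
Applying the convolution theorem, the convolution outputs are derived from (2) via the Fast Fourier Transform (FFT):

$$\mathcal{F}\{O_{\mu,\nu}(z)\} = \mathcal{F}\{I(z)\}\,\mathcal{F}\{\psi_{\mu,\nu}(z)\},$$

$$O_{\mu,\nu}(z) = \mathcal{F}^{-1}\left\{\mathcal{F}\{I(z)\}\,\mathcal{F}\{\psi_{\mu,\nu}(z)\}\right\}.$$

Here $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the forward and inverse discrete Fourier transforms, respectively. The outputs $O_{\mu,\nu}(z)$, where $\mu \in \{0, \ldots, 7\}$ and $\nu \in \{0, \ldots, 4\}$, consist of different local, scale, and orientation features in both real and imaginary parts in the specific locality, as described later in Section 2.2. The magnitude of $O_{\mu,\nu}(z)$ is defined as the modulus of $O_{\mu,\nu}(z)$, that is, $\|O_{\mu,\nu}(z)\|$.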
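The FFT route can be sketched in a few lines of NumPy. The helper name `fft_convolve_same` and the choice to zero-pad and crop the result back to the image size are assumptions of this example; the paper does not state how image borders are handled.

```python
import numpy as np

def fft_convolve_same(image, kernel):
    """Linear 2D convolution via the FFT, cropped to the image size."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    # Zero-pad both inputs so circular convolution equals linear convolution
    shape = (ih + kh - 1, iw + kw - 1)
    spectrum = np.fft.fft2(image, s=shape) * np.fft.fft2(kernel, s=shape)
    full = np.fft.ifft2(spectrum)
    # Keep the central part that is aligned with the input image
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + ih, left:left + iw]

# Toy usage: a random "image" and a small complex kernel
rng = np.random.default_rng(0)
image = rng.random((6, 7))
kernel = rng.random((3, 3)) + 1j * rng.random((3, 3))
response = fft_convolve_same(image, kernel)
magnitude = np.abs(response)  # modulus of the complex response
```

Taking `np.abs` of the complex response gives the magnitude image; the phase is discarded.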
2.2. Low-Energized Block Based GWT Feature Extraction
It is to be noted that we considered only the magnitude of the convolution outputs and did not use the phase, which is consistent with the application of Gabor representations [14, 15]. As the outputs consist of 40 different local, scale, and orientation features, the dimensionality of the Gabor transformed image space is very high. The following technique is therefore applied to extract a low-energized discriminative feature vector from the convolution outputs. The method for the extraction of low-energized block-based features from the GWT image is explained in Algorithm 1.
Algorithm 1. Consider the following steps.
Step 1. Find the convolution outputs of the original image with all the Gabor kernels. As the convolution outputs contain complex values, replace each pixel value of the convolution output by its modulus; the resultant image is termed $G_i$, where $i = 1, 2, \ldots, 40$.
Step 2. Obtain the final single Gabor transformed image $G$ by combining the images $G_i$, $i = 1, 2, \ldots, 40$.
Step 3. Compute the overall mean of the final Gabor transformed image as $\bar{G} = \frac{1}{m \times n} \sum_{x=1}^{m} \sum_{y=1}^{n} G(x, y)$, where $m \times n$ is the size of the image.
Step 4. Divide the final Gabor transformed image into windows of size $w \times w$. Thus the total number of windows is $N_w = (m \times n)/(w \times w)$.
Step 5. For each window $W_j$, $j = 1, \ldots, N_w$, if $\min(W_j) < \bar{G}$, then extract a block $B_j$ of size $b \times b$ from $G$, with the centre pixel at the minimum of $W_j$. The value of $b$ must be an odd integer and less than $w$.
Step 6. For each window $W_j$, if $\min(W_j) < \bar{G}$ and a complete block of size $b \times b$ centred at the minimum of $W_j$, as mentioned in Step 5, does not exist in $G$, then create a block $B_j$ of size $b \times b$ by filling the unavailable pixel positions with a constant value.
Step 7. For each window $W_j$, if $\min(W_j) \geq \bar{G}$, then create a pseudo block $B_j$ of size $b \times b$ with all elements set to the same constant value.
Step 8. Extract a feature vector $f_j$ from each block $B_j$ in a systematic order, where $f_j$ contains all elements of the block $B_j$.
Step 9. Concatenate all the feature vectors $f_j$ to obtain the final feature vector $F = [f_1, f_2, \ldots, f_{N_w}]$, which is the final extracted low-energized feature vector. This extracted feature vector encompasses the low-valued discriminable elements of the Gabor transformed image, and the size of this feature vector is $N_w \times b \times b$, which is much lower in dimension than the original image (dimension: $m \times n$) and the GWT image (dimension: $40 \times m \times n$).
Thus this augmented Gabor feature vector encompasses most of the discriminable feature elements of the Gabor wavelet representation set. The window size is one of the important parameters of the above algorithm: it must be chosen small enough to capture most of the important features and large enough to avoid redundancy. Since some windows have a minimum value that is not less than the overall mean, Step 7 is applied in order not to get stuck on a local minimum.
In the experiments we took a window and a block of fixed sizes to extract the low-energized feature vector. Thus the extracted facial features can be compared locally, instead of using a general structure, allowing us to make a decision from the parts of the face.
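The block extraction of Algorithm 1 can be sketched as follows. Several details are assumptions of this example rather than statements of the paper: the 40 magnitude images are combined by summation (`combine_magnitudes`), unavailable pixels and pseudo blocks are filled with a hypothetical `fill` value (zero by default), and any image border that does not fit a whole window is ignored.

```python
import numpy as np

def combine_magnitudes(outputs):
    """Steps 1-2 (assumed form): sum the moduli of all convolution outputs."""
    return np.sum([np.abs(o) for o in outputs], axis=0)

def low_energized_features(G, w, b, fill=0.0):
    """Steps 3-9 on a combined Gabor magnitude image G.

    w is the window size, b the (odd) block size, fill the assumed
    constant used for out-of-image pixels and for pseudo blocks.
    """
    if b % 2 == 0 or b >= w:
        raise ValueError("b must be odd and smaller than w")
    G = np.asarray(G, dtype=float)
    m, n = G.shape
    mean = G.mean()                                    # Step 3
    half = b // 2
    padded = np.pad(G, half, constant_values=fill)     # supports Step 6
    features = []
    for r0 in range(0, m - w + 1, w):                  # Step 4
        for c0 in range(0, n - w + 1, w):
            window = G[r0:r0 + w, c0:c0 + w]
            if window.min() < mean:                    # Steps 5 and 6
                dr, dc = np.unravel_index(window.argmin(), window.shape)
                r, c = r0 + dr, c0 + dc                # centre in image coordinates
                block = padded[r:r + b, c:c + b]       # b x b block around (r, c)
            else:                                      # Step 7
                block = np.full((b, b), fill)
            features.append(block.ravel())             # Step 8
    return np.concatenate(features)                    # Step 9

# Toy usage on an 8 x 8 ramp image with 4 x 4 windows and 3 x 3 blocks
G = np.arange(64, dtype=float).reshape(8, 8)
feature_vector = low_energized_features(G, w=4, b=3)
```

For this toy input there are four windows, so the feature vector has 4 × 3 × 3 = 36 elements instead of the 64 pixels of the image.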
3. Generalized Kernel Discriminative Common Vector (KDCV) Method
Sometimes the discriminative common vectors are not distinct in the original sample space. In such cases one can map the original sample space to a higher-dimensional space $\mathcal{F}$, where the new discriminative common vectors in the mapped space are distinct from one another. This is because a mapping, $\phi$, can map two vectors that are linearly dependent in the original sample space onto two vectors that are linearly independent in $\mathcal{F}$. As the mapped space can have arbitrarily large, possibly infinite, dimensionality, it is reasonable to use the DCV method.
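Let $\Phi = [\phi(x_1^1), \ldots, \phi(x_{N_1}^1), \phi(x_1^2), \ldots, \phi(x_{N_C}^C)]$ represent the matrix whose columns are the transformed training samples in $\mathcal{F}$. Here $C$ is the number of training classes; the $i$th class contains $N_i$ samples, and $M = \sum_{i=1}^{C} N_i$ is the total number of samples. The within-class scatter matrix $S_W^{\Phi}$, the between-class scatter matrix $S_B^{\Phi}$, and the total scatter matrix $S_T^{\Phi}$ in $\mathcal{F}$ are given by

$$S_W^{\Phi} = \sum_{i=1}^{C} \sum_{m=1}^{N_i} \left(\phi(x_m^i) - \mu_i^{\Phi}\right)\left(\phi(x_m^i) - \mu_i^{\Phi}\right)^T = (\Phi - \Phi G)(\Phi - \Phi G)^T,$$

$$S_B^{\Phi} = \sum_{i=1}^{C} N_i \left(\mu_i^{\Phi} - \mu^{\Phi}\right)\left(\mu_i^{\Phi} - \mu^{\Phi}\right)^T = (\Phi U - \Phi L)(\Phi U - \Phi L)^T,$$

$$S_T^{\Phi} = \sum_{i=1}^{C} \sum_{m=1}^{N_i} \left(\phi(x_m^i) - \mu^{\Phi}\right)\left(\phi(x_m^i) - \mu^{\Phi}\right)^T = (\Phi - \Phi 1_M)(\Phi - \Phi 1_M)^T = S_W^{\Phi} + S_B^{\Phi},$$

where $\mu^{\Phi}$ is the mean of all samples, and $\mu_i^{\Phi}$ is the mean of the samples of the $i$th class in $\mathcal{F}$. Here $G = \mathrm{diag}[G_1, \ldots, G_C]$ is an $M \times M$ block-diagonal matrix and each $G_i$ is an $N_i \times N_i$ matrix with all its elements equal to $1/N_i$; $U = \mathrm{diag}[u_1, \ldots, u_C]$ is an $M \times C$ block-diagonal matrix and each $u_i$ is an $N_i$-dimensional vector with all its elements equal to $1/\sqrt{N_i}$; $L = [l_1, \ldots, l_C]$ is an $M \times C$ matrix and each $l_i$ is an $M$-dimensional vector with the entries $\sqrt{N_i}/M$; $1_M$ is an $M \times M$ matrix with entries $1/M$. The aim of the DCV algorithm is to acquire the optimal projection transform $W$ in the null space of $S_W^{\Phi}$:

$$J(W_{\mathrm{opt}}) = \arg\max_{|W^T S_W^{\Phi} W| = 0} \left|W^T S_B^{\Phi} W\right| = \arg\max_{|W^T S_W^{\Phi} W| = 0} \left|W^T S_T^{\Phi} W\right|.$$

The approach for computing this optimal projection is as follows.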
Step 1. Project the training set samples onto the range of through the Kernel PCA.
Step 2. Find vectors that span the null space of .
Step 3. Remove the null space of if it exists.
Step 4. Obtain the final projection matrix , which will then be , where is the diagonal matrix with nonzero eigenvalues, , the associated matrix of normalized eigenvectors, and is the basis for the null space of , here there are at most projection vectors.
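The four steps above can be sketched in NumPy, working only with the Gram matrix of the training samples. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fit_kdcv`, the relative tolerances used to decide which eigenvalues count as zero, and the use of the scatter of the class common vectors in Step 3 are all choices made for this example.

```python
import numpy as np

def fit_kdcv(K, labels, tol=1e-8):
    """Kernel DCV sketch. K is the M x M training Gram matrix.

    Returns a function mapping kernel columns k(x_train, x), given as an
    M x n array, to discriminative features.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    M = K.shape[0]
    one = np.full((M, M), 1.0 / M)
    Kc = K - one @ K - K @ one + one @ K @ one     # centre in feature space
    Kc = (Kc + Kc.T) / 2                           # enforce symmetry numerically
    # Step 1: kernel PCA onto the range of the total scatter
    lam, U = np.linalg.eigh(Kc)
    keep = lam > tol * lam.max()
    A = U[:, keep] / np.sqrt(lam[keep])            # expansion coefficients
    Y = A.T @ Kc                                   # training coordinates (r x M)
    # Step 2: null space of the projected within-class scatter
    D = Y.copy()
    for c in classes:
        D[:, labels == c] -= Y[:, labels == c].mean(axis=1, keepdims=True)
    sw_val, sw_vec = np.linalg.eigh(D @ D.T)
    V = sw_vec[:, sw_val <= 1e-6 * sw_val.max()]
    # Step 3: drop directions along which the common vectors do not differ
    common = np.stack([(V.T @ Y[:, labels == c]).mean(axis=1) for c in classes],
                      axis=1)
    centred = common - common.mean(axis=1, keepdims=True)
    sb_val, sb_vec = np.linalg.eigh(centred @ centred.T)
    E = sb_vec[:, sb_val > 1e-6 * sb_val.max()]
    # Step 4: final projection expressed through the training samples
    W = A @ V @ E

    def transform(k_cols):
        kc = k_cols - one @ k_cols - K.mean(axis=1, keepdims=True) + K.mean()
        return W.T @ kc

    return transform

# Toy usage: 3 classes, 4 samples each, cosine Gram matrix of random data
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 30))
y = np.repeat([0, 1, 2], 4)
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
K_train = Xn @ Xn.T
features = fit_kdcv(K_train, y)(K_train)
```

In this toy run every training sample of a class maps to the same point, and three classes yield two projection vectors, which matches the bound of at most one fewer than the number of classes.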
Let the common vector be, then each of the feature vectors can be written as , where , and .
Here and represent the common and different parts of separately. It has been proved by Gülmezoğlu et al.  that for all samples of the th class, their common vector parts are same. The common vector can be written as .
Thus, a set of common vectors for is obtained as: Compute the optimal projection transform. Let denote the total scatter matrix of . is composed of the eigenvectors corresponding to the positive eigenvalues of . is designed to satisfy the criteria: is calculated from the different vectors. is composed of the eigenvectors corresponding to the positive eigenvalues of . The optimal projection transform is obtained as Thus for each sample in the kernel space using the generalized nonlinear KDCV method, we construct and then extract the kernel discriminative common and different vector and . Then, , and . Finally,
Thus we obtain a new sample set corresponding to . This sample set is used for image classification. All mathematical properties of the linear DCV are carried over to the kernel DCV method with the modifications that are applied to the mapped samples, , , where . After performing the feature extraction, all training set samples of each class typically give rise to a single distinct discriminative common vector.
3.1. KDCV Approach Using Cosine Kernel Function
Let $x_1, x_2, \ldots, x_M \in \mathbb{R}^d$ be the data in the input space, and let $\phi$ be a nonlinear mapping between the input space and the feature space, $\phi: \mathbb{R}^d \to \mathcal{F}$. Generally three classes of kernel functions are used for nonlinear mapping: (a) polynomial kernels, (b) Radial Basis Function (RBF) kernels, and (c) sigmoid kernels.
The RBF kernels, also known as isotropic stationary kernels, are defined by $k(x, y) = f(d(x, y))$, where $d(x, y) = \|x - y\|$ and $\|\cdot\|$ is the norm operator. Normally a Gaussian function is preferred as the RBF in most pattern classification applications. The Gaussian function for RBF kernels is given by $k(x, y) = \exp\left(-\|x - y\|^2 / (2\sigma^2)\right)$. However, the globally supported RBF kernels yield dense Gram matrices, which can be highly ill-conditioned for large datasets.
So in this work the cosine kernel function is considered as the kernel function $k(x, y)$, defined by

$$k(x, y) = \frac{x^T y}{\|x\|\,\|y\|}.$$

This result can be expressed in terms of the angle $\theta$ between the inputs: $k(x, y) = \cos\theta$. This shows that the kernel depends on the angle between the inputs.
As a practical matter, we note that cosine kernels do not have any continuous tuning parameters (such as kernel width in RBF kernels), which can be laborious to set by cross validation.
Large margin classifiers are known to be sensitive to the way features are scaled. Therefore it is essential to normalize either the data or the kernel itself. The recognition accuracy can severely degrade if the data is not normalized. Normalization can be performed at the level of the input features or at the level of the kernel. It is often beneficial to scale all features to a common range, for example, by standardizing the data. An alternative way to normalize is to convert each feature vector into a unit vector. If the data is explicitly represented as vectors, one can normalize the data by dividing each vector by its norm such that $\|x\| = 1$ after normalization. Here normalization is performed at the level of the kernel, that is, normalizing in feature space, leading to $\|\phi(x)\| = 1$ (or equivalently $k(x, x) = 1$). This is accomplished by using the cosine kernel, which normalizes a kernel $k(x, y)$ to $k_{\cos}(x, y) = k(x, y)/\sqrt{k(x, x)\,k(y, y)}$. Normalizing data to unit vectors reduces the dimensionality of the data by one since the data is projected to the unit sphere.
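A short NumPy sketch of both views of the cosine kernel follows: computed directly from the input vectors, and obtained by normalizing an existing Gram matrix. The function names `cosine_kernel` and `normalize_kernel` are hypothetical and chosen for this example only.

```python
import numpy as np

def cosine_kernel(X, Y=None):
    """Gram matrix of cosines between the rows of X and the rows of Y."""
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T

def normalize_kernel(K):
    """Normalize a square Gram matrix so that every diagonal entry is 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
K_cos = cosine_kernel(X)               # cosine kernel from the inputs
K_norm = normalize_kernel(X @ X.T)     # the same matrix from a linear kernel
```

Both routes give the same matrix here, and no kernel width has to be tuned.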
In order to derive positive kernel nonlinear discriminating features (9), we consider only those eigenvectors that are associated with positive eigenvalues.
4. Similarity Measures and Classification
Finally, the lower-dimensional, low-energized feature vector extracted from the GWT image is used as the input data, instead of the whole image, in the proposed method to derive the kernel discriminative feature vector, $Y$, using (8). Let $M_k^0$ be the mean of the training samples for class $\omega_k$, where $k = 1, 2, \ldots, C$ and $C$ is the number of classes. The classifier then applies the nearest neighbor (to the mean) rule for classification using the similarity (distance) measure $\delta$:

$$\delta(Y, M_k^0) = \min_j \delta(Y, M_j^0) \;\Longrightarrow\; Y \in \omega_k. \tag{11}$$

The low-energized KDCV vector $Y$ is classified to the class of the closest mean $M_k^0$ using the similarity measure $\delta$. The similarity measures used here are the $L_1$ distance measure, $\delta_{L_1}$, the $L_2$ distance measure, $\delta_{L_2}$, and the cosine similarity measure, $\delta_{\cos}$, which are defined as

$$\delta_{L_1}(X, Y) = \sum_i |X_i - Y_i|, \tag{12}$$

$$\delta_{L_2}(X, Y) = (X - Y)^T (X - Y), \tag{13}$$

$$\delta_{\cos}(X, Y) = \frac{-X^T Y}{\|X\|\,\|Y\|}, \tag{14}$$

where $T$ is the transpose operator and $\|\cdot\|$ denotes the norm operator. Note that the cosine similarity measure includes a minus sign in (14) because the nearest neighbour (to the mean) rule (11) applies the minimum (distance) measure rather than the maximum.
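The nearest-mean rule with the three measures can be sketched as follows. The function names and the toy class means are assumptions for illustration only and do not come from the experiments.

```python
import numpy as np

def l1_distance(x, y):
    return np.sum(np.abs(x - y))

def l2_distance(x, y):
    d = x - y
    return d @ d                      # squared Euclidean distance

def cosine_distance(x, y):
    # Negated cosine so that a smaller value means "more similar"
    return -(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def nearest_mean(sample, class_means, measure):
    """Index of the class mean that minimizes the given measure."""
    scores = [measure(sample, mean) for mean in class_means]
    return int(np.argmin(scores))

# Toy example: class 0 points in the same direction as the sample
# but is far away; class 1 is close but points elsewhere.
class_means = np.array([[10.0, 0.0], [1.0, 1.0]])
sample = np.array([2.0, 0.5])
by_l1 = nearest_mean(sample, class_means, l1_distance)
by_l2 = nearest_mean(sample, class_means, l2_distance)
by_cos = nearest_mean(sample, class_means, cosine_distance)
```

In this toy case the two distance measures pick class 1 while the cosine measure picks class 0, because the cosine measure compares directions only and ignores vector magnitude.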
4.1. Experiments of the Proposed Method on Frontal and Pose-Angled Images for Face Recognition
This section assesses the performance of the low-energized Gabor-block-based KDCV method for both frontal and pose-angled face recognition. The effectiveness of the low-energized block-based KDCV method is successfully tested on the FRAV2D and FERET databases. For frontal face recognition, the data set is taken from the FRAV2D database, which consists of 1100 frontal face images corresponding to 100 individuals. The images are acquired with partially occluded face features and different facial expressions. For pose-angled face recognition, the data set taken from the FERET database contains 2200 images of 200 individuals with different facial expressions and poses. Further studies have been made on the FERET dataset using the standard protocols, that is, the Fa, Fb, DupI, and DupII sets, to assess the performance of the proposed method.
4.1.1. FRAV2D Face Database
The FRAV2D face database employed in the experiment consists of 1100 colour face images of 100 individuals; 11 images of each individual are taken, including frontal views of faces with different facial expressions under different lighting conditions. All colour images are transformed into gray images and scaled to a common size. The details of the images are as follows: Figure 1(a) regular facial status; Figures 1(b) and 1(c) are images with a turn with respect to the camera axis; Figures 1(d) and 1(e) are images with a turn with respect to the camera axis; Figures 1(f) and 1(g) are images with gestures, such as smiles, open mouth, winks, and laughs; Figures 1(h) and 1(i) are images with occluded face features; Figures 1(j) and 1(k) are images with a change of illumination. Figure 1 shows all samples of one individual.
4.1.2. Specificity and Sensitivity Measures for the FRAV2D Dataset
To measure the sensitivity and specificity, the dataset from the FRAV2D database is prepared in the following manner. For each individual a single class is constituted with 18 images. Thus a total of 100 classes are obtained from the dataset of 1100 images of 100 individuals. Out of the 18 images in each class, 11 images are of a particular individual, and 7 images are of other individuals taken by permutation, as shown in Figure 2. Using this dataset, the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts are measured. From the 11 images of the particular individual, first the first 4 images (Figures 2(a)–2(d)) and then the first 3 images (Figures 2(a)–2(c)) are selected as training samples, and the remaining images of the particular individual are used as positive testing samples. The negative testing is done using the images of the other individuals. Figure 2 shows all sample images of one class of the dataset used from the FRAV2D database.
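For reference, sensitivity and specificity follow from these four counts in the usual way: sensitivity is TP/(TP + FN) and specificity is TN/(TN + FP). The sketch below uses hypothetical counts that only mirror the size of the protocol with 4 training images per class (7 positive and 7 negative test images in each of 100 classes); they are not results from the paper.

```python
def sensitivity(tp, fn):
    """Fraction of positive test samples that are correctly accepted."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of negative test samples that are correctly rejected."""
    return tn / (tn + fp)

# Hypothetical counts for 100 classes x 7 positive and 7 negative test images
tp, fn = 665, 35      # 700 positive test samples in total
tn, fp = 686, 14      # 700 negative test samples in total
sens = sensitivity(tp, fn)   # 0.95
spec = specificity(tn, fp)   # 0.98
```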
4.1.3. FERET Face Database
The FERET database employed in the experiment here contains 2,200 facial images corresponding to 200 individuals, with each individual contributing 11 images. The images in this database were captured under various illuminations and display a variety of facial expressions and poses. As the images include the background and the chest region, each image is cropped to exclude those, transformed into a gray image, and scaled to a common size. Figure 3 shows all samples of one subject. The details of the images are as follows: Figure 3(a) regular facial status; Figures 3(b)–3(i) different pose angles; Figure 3(j) alternative expression; Figure 3(k) different illumination.
In the first experiment, the first 4 images of each individual, that is, Figures 3(a)–3(d), are regarded as training samples and the remaining images are regarded as testing samples. In the second experiment, the first 3 images of each individual, that is, Figures 3(a)–3(c), are regarded as training samples.
4.1.4. Specificity and Sensitivity Measure for the FERET Dataset
To measure the sensitivity and specificity, the dataset from the FERET database is prepared in the following manner. For each individual a single class is constituted with 18 images. Thus a total of 200 classes are obtained from the dataset of 2200 images of 200 individuals. Out of these 18 images in each class, 11 images are of a particular individual, and 7 images are of other individuals taken using permutation, as shown in Figure 4. The specificity and sensitivity are then measured on this dataset in the same way as for FRAV2D. From the 11 images of the particular individual, first the first 4 images (Figures 4(a)–4(d)) and then the first 3 images (Figures 4(a)–4(c)) are selected as training samples, and the remaining images of the particular individual are used as positive testing samples. The negative testing is done using the images of the other individuals. Figure 4 shows all sample images of one class of the dataset used from the FERET database.
4.2. Further Evaluation on the FERET Face Dataset
We further use the FERET face database for testing our proposed method, as it is one of the most widely used databases to evaluate face recognition algorithms. FERET contains a gallery set, Fa, and four testing sets: Fb, Fc, DupI, and DupII. The Fa set has 1196 images, one image per individual. The Fb set has 1195 images of people with different gestures. The Fc set has 194 images taken under different lighting conditions. The DupI set has 722 images taken between 0 and 34 months after those in Fa. The DupII set contains 234 images acquired at least 18 months after the Fa set. DupII is a subset of DupI. Figure 4 shows samples from the FERET face database. All images are aligned and cropped to a common size.
The extracted low-energized Gabor feature vector is considered as input to a trained KDCV with the cosine kernel, and its output is compared with the gallery set using the $L_1$, $L_2$, and cosine similarity measures. The recognition rates of different methods on the FERET probe sets are shown in Table 5. The results are compared with the most recent state-of-the-art results on the FERET database. Our results on the FERET database are equivalent (with a difference of ±1%) to the most recent works on the FERET dataset. Note that the methods described in [21, 22] use the Gabor wavelets to generate their feature vectors. As the Gabor wavelets have a much higher algorithmic complexity, the overall computing cost of those methods is very high. On the other hand, our block-based low-energized Gabor feature is a very low-dimensional feature vector, which reduces the algorithmic complexity. The use of the cosine kernel as the kernel function in the KDCV also makes the proposed method quite fast and more suitable for real applications.
Furthermore, we compare the face recognition performance of our proposed low-energized GWT-KDCV method using cosine kernels with some other well-known methods: the generalized discriminant analysis (GDA) method, EGM, the Discrete Cosine Transform (DCT) combined with linear discriminant analysis (LDA), that is, the DCT-LDA method, DCT-GDA, GWT-LDA, the DCT-KDCV method, and the Gabor fusion KDCV method. Classification results obtained from the proposed method are comparable to, and in some cases better than, those of the above-mentioned methods. Also, the cosine similarity measure is more suitable for classifying the nonlinear real KDCV features, as shown in Tables 1–6.
As the proposed method performs best with the cosine similarity classifier, the specificity rate of the proposed method is evaluated for the FERET and FRAV2D datasets using the cosine similarity measure, as shown in Figure 7.
Experiments conducted using the low-energized block-based Gabor KDCV method with three different similarity measures on both the FERET and FRAV2D databases are shown in Figures 5 and 6. Considering only the low-dimensional, low-energized features of the GWT image greatly improves the computing speed of the nonlinear discriminant method. The results of recognition accuracy (in terms of sensitivity) versus dimensionality reduction (number of features) and the cumulative match curves using the three different similarity measures are shown in Figures 5 and 6. From the results on both the FERET and FRAV2D databases it is seen that the cosine similarity measure performs the best, followed by the two distance measures.
Figures 5 and 6 indicate that the proposed method performs well with a lower dimension as well. These results show that there is a certain robustness to age and illumination. Our results indicate that (i) the low-energized block-based Gabor features with the KDCV approach greatly enhance the face recognition performance as well as reduce the dimensionality of the feature space when compared with the Gabor features, as shown in Tables 1 and 3. For example, the similarity measure improves the face recognition accuracy by almost 10% using only the few low-energized Gabor features with improved discriminative power compared to the original Gabor features, as shown in Tables 1 and 3. (ii) The proposed method further enhances face recognition with the use of the cosine kernel along with the cosine similarity measure.
The experimental results indicate that the use of cosine kernels in the KDCV further increases the discriminative power of the feature vector extracted from the low-energized blocks of the GWT image; it is hence an effective feature extraction approach, which extracts more effective discriminating features than the GDA. The low-energized feature vector extracted by Algorithm 1 of Section 2.2 enhances the face recognition performance in the presence of occlusions. Experimentally it has been observed that this method is less time consuming than the EGM and other well-known algorithms [4, 21, 22, 24, 25]. Our results show that (a) the proposed method greatly enhances recognition performance, (b) it reduces the dimensionality of the feature space, and (c) the cosine similarity classifier further increases the face recognition accuracy, as it calculates the angle between two vectors and is not affected by their magnitude.
This paper introduces a novel block-based GWT generalized KDCV method using the cosine kernel for frontal and pose-angled face recognition. As the cosine kernel function is used here, there is no need for data normalization, and the parameter tuning required by Gaussian (RBF) kernels can be avoided. Also, the derived low-dimensional, low-energized features are characterized by spatial frequency, locality, and orientation selectivity, which, as a property of Gabor kernels, help to cope with the variations due to illumination and facial expression changes. Such characteristics produce salient local features, such as the eyes, nose, and mouth, that are suitable for face recognition. The KDCV method extended with the cosine kernel is then applied to these extracted feature vectors to finally obtain only the real nonlinear discriminating kernel feature vector with improved discriminative power. This vector contains salient facial features that are compared locally instead of through a general structure; it hence allows a decision to be made from the different parts of a face and thus maximizes the benefit of applying the idea of “recognition by parts.” The method therefore performs well in the presence of occlusions (e.g., sunglasses or a scarf): when there are sunglasses or any other obstacles, the algorithm compares faces in terms of the mouth, nose, and other features rather than the eyes.
The authors are thankful to a major project entitled “Design and Development of Facial Thermogram Technology for Biometric Security System,” funded by University Grants Commission (UGC), India and “DST-PURSE Programme” and CMATER and SRUVM project at Department of Computer Science and Engineering, Jadavpur University, India for providing necessary infrastructure to conduct experiments relating to this work.
A. Kar, D. Bhattacharjee, M. Nasipuri, D. Kumar Basu, and M. Kundu, “Classification of high-energized Gabor responses using Bayesian PCA for human face recognition,” International Journal of Recent Trends in Engineering, pp. 106–110, 2009.
G. Baudat and F. Anouar, “Generalized discriminant analysis using a kernel approach,” Neural Computation, vol. 12, no. 10, pp. 2385–2404, 2000.
L. Wiskott, J. M. Fellous, N. Krüger, and C. von der Malsburg, “Face recognition by elastic graph matching,” in Intelligent Biometric Techniques in Fingerprint and Face Recognition, chapter 11, pp. 355–396, CRC Press, New York, NY, USA, 1999.
T. S. Lee, “Image representation using 2D Gabor wavelets,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 10, pp. 959–971, 1996.
D. Gabor, “Theory of communication,” Journal of the Institution of Electrical Engineers, vol. 93, pp. 429–459, 1949.
G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, “Classifying facial actions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 974–989, 1999.
B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, MIT Press, Cambridge, Mass, USA, 2002.
P. J. Phillips, H. Wechsler, J. Huang, and P. J. Rauss, “The FERET database and evaluation procedure for face-recognition algorithms,” Image and Vision Computing, vol. 16, no. 5, pp. 295–306, 1998.
W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, “Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition,” in Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV'05), pp. 786–791, October 2005.
Y. M. Lui and J. R. Beveridge, “Grassmann registration manifolds for face recognition,” in Proceedings of the European Conference on Computer Vision, pp. 44–57, 2008.
S. Li, Y. F. Yao, X. Y. Jing, Z. L. Shao, D. Zhang, and J. Y. Yang, “Nonlinear DCT discriminant feature extraction with generalized KDCV for face recognition,” in Proceedings of the 2nd International Symposium on Intelligent Information Technology Application (IITA'08), pp. 338–341, December 2008.