Abstract

A multiresolution feature extraction algorithm for face recognition is proposed based on the two-dimensional discrete wavelet transform (2D-DWT), which efficiently exploits the local spatial variations in a face image. For feature extraction, instead of considering the entire face image, an entropy-based local band selection criterion is developed, which selects high-informative horizontal segments from the face image. In order to capture the local spatial variations within these bands precisely, each horizontal band is segmented into several small spatial modules. The effect of modularization in terms of the entropy content of the face images is investigated. Dominant wavelet coefficients corresponding to each module residing inside those bands are selected as features. A histogram-based threshold criterion is proposed to select the dominant coefficients, which drastically reduces the feature dimension and provides high within-class compactness and high between-class separability. The effect of using different mother wavelets for feature extraction is also investigated. Principal component analysis (PCA) is performed to further reduce the dimensionality of the feature space. Extensive experimentation is carried out on standard face databases, and the proposed method achieves a very high degree of recognition accuracy in comparison with some existing methods.

1. Introduction

Automatic face recognition has widespread applications in security, authentication, surveillance, and criminal identification. Conventional ID card and password-based identification methods, although very popular, are no longer as reliable as before because of advanced forgery and password-hacking techniques. As an alternative, biometrics, defined as intrinsic physical or behavioral traits of human beings, are being used for identity access management [1]. The main advantage of biometric features is that they are not prone to theft and loss and do not rely on the memory of their users.

Among physiological biometrics, the face is gaining popularity because of its nonintrusiveness and high degree of security. Moreover, unlike iris or fingerprint recognition, face recognition does not require high-precision equipment or user agreement during image acquisition, which makes it even more popular for video surveillance. Nevertheless, face recognition is a complicated visual task even for humans. The primary difficulty arises from the fact that different images of a particular person may vary largely, while images of different persons may not necessarily vary significantly. Moreover, variations in illumination, pose, position, scale, environment, accessories, and age make the recognition task more complicated. Despite many relatively successful attempts to implement face recognition systems, a single approach capable of addressing all these hurdles is yet to be developed.

Face recognition methods are based on extracting unique features from face images. In this regard, face recognition approaches can be classified into two main categories: holistic and texture-based [2–4]. Holistic or global approaches to face recognition involve encoding the entire facial image in a high-dimensional space [2]. These approaches assume that all faces are constrained to particular positions, orientations, and scales and, hence, are very sensitive to pose variations [5]. In contrast, texture-based approaches rely on the detection of individual facial characteristics and their geometric relationships prior to performing face recognition [3, 4]. Edge information of faces has also been used for face recognition. A line edge map approach was proposed in [6], which gives a distance measurement between two line edge maps of faces and performs face matching based on those measures.

Apart from these approaches, face recognition can also be performed by using different local regions of face images [7–9]. It is well known that, although face images are affected by variations such as nonuniform illumination, expressions, and partial occlusions, these facial variations are confined mostly to local regions. A local binary pattern was applied in [8] as a texture descriptor. The local pattern is extracted pixel-wise by binarizing the gradients from a center point to its eight neighboring points, and this binary pattern is used as the image feature for classification. It is expected that capturing the localized variations of images would result in better recognition accuracy [9]. In this regard, wavelet analysis, which possesses good spatial-frequency localization, is also employed to detect facial geometric structure [10–12]. Owing to its shift-invariance property, the wavelet-based approach is known to be one of the most robust feature extraction schemes, even under variable illumination [13]. Hence, it is motivating to utilize local variations of face geometry using the wavelet transform for feature extraction and thereby develop a face recognition scheme incorporating the advantageous properties of both holistic and texture-based approaches.

The objective of this paper is to develop a wavelet-based face recognition scheme, which, instead of the entire face image, considers only some high-informative local zones of the image for dominant feature extraction. An entropy-based horizontal band selection criterion is developed to exploit the high-informative areas of a face image. In order to precisely capture the local spatial variation within a high-informative horizontal band, such bands are further divided into smaller spatial modules. The effect of modularization in terms of the entropy content of the face images is investigated. We propose to extract dominant wavelet coefficients corresponding to the smaller segments residing within the band using a histogram-based threshold criterion. The DWT is used because, in comparison to the discrete Fourier transform, it possesses better space-frequency localization. It is shown that the discriminating capabilities of the proposed features are enhanced because of modularization of the face images. The variation of recognition performance with the module size is investigated. Moreover, the improvement in the quality of the extracted features as a result of illumination adjustment is also analyzed. In view of further reducing the computational complexity, principal component analysis is performed on the proposed feature space. Finally, the face recognition task is carried out using a distance-based classifier.

2. Brief Description of the Proposed Scheme

A typical face recognition system consists of several major steps, namely, input face image collection, preprocessing, feature extraction, classification, and template storage or database, as illustrated in Figure 1. The input image is generally collected from a video camera, still camera, or surveillance camera. In the process of capturing images, distortions including rotation, scaling, shift, and translation may be present in the face images, which make it difficult to locate the face at the correct position. Preprocessing removes any unwanted objects (such as the background) from the collected image. It may also segment the face image for feature extraction. For the purpose of classification, an image database needs to be prepared consisting of template face poses of different persons. The recognition task is based on comparing a test face image with the template data. Comparing the images themselves would require extensive computation. Thus, instead of utilizing the raw face images, some characteristic features are extracted for preparing the template. It is to be noted that the recognition accuracy strongly depends upon the quality of the extracted features. Therefore, the main focus of this research is to develop an efficient feature extraction algorithm.

The proposed feature extraction algorithm is based on extracting spatial variations precisely from high-informative local zones of the face image instead of utilizing the entire image. In view of this, an entropy-based selection criterion is developed to select high-informative facial zones. A modularization technique is then employed to segment the high-informative zones into several smaller segments. It should be noted that variation of illumination among different face images of the same person may affect their similarity. Therefore, prior to feature extraction, an illumination adjustment step is included in the proposed algorithm. After feature extraction, a classifier compares features extracted from face images of different persons, and a database is used to store registered templates and also for verification purposes.

3. Proposed Method

For any type of biometric recognition, the most important task is to extract distinguishing features from the training biometric traits, which directly dictates the recognition accuracy. In comparison to person recognition based on other biometric features, face-image-based recognition is very challenging even for a human being, as face images of different persons may seem similar whereas face images of a single person may seem different under different conditions. Thus, obtaining a significant feature space with respect to the spatial variation in a human face image is crucial. In what follows, we describe the proposed feature extraction algorithm for face recognition, in which spatial-domain local variation is extracted using a wavelet-domain transform.

3.1. Entropy-Based High-Informative Zone Selection

The information content of different regions of a human face image varies widely [14]. If a face image is divided into segments, not all segments contain the same amount of information. The close neighborhood of the eyes, nose, and lips is expected to contain more information than the other regions of the face. A region with high information content is naturally the region of interest for feature extraction; however, identifying such regions is not a trivial task. Estimating the amount of information in a given image area can be used to identify these significant zones. In this paper, in order to determine the information content in a given area of a face image, an entropy-based measure of intensity variation is defined as [15]

$$H = -\sum_{k=1}^{m} p_k \log_2 p_k, \tag{1}$$

where the probabilities $\{p_k\}_{k=1}^{m}$ are obtained from the intensity distribution of the pixels of a segment of an image. It is to be mentioned that the information in a face image varies more prominently in the vertical direction than in the horizontal direction [16]. Thus, the face image is divided into several horizontal bands and the entropy of each band is computed. It has been observed from our experiments that variation in entropy is closely related to variation in the face geometry. Figure 2(b) shows the entropy values obtained in different horizontal bands of a person for several sample face poses. One of the poses of the person is shown in Figure 2(a). As expected, it is observed from the figure that the neighborhood of the eyes, nose, and lips contains more information than the other regions. Moreover, it is found that the locus of entropies obtained from different horizontal bands can trace the spatial structure of a face image. Hence, for feature extraction in the proposed method, spatial horizontal bands of face images are chosen according to their entropy content.
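The band selection described above can be sketched as follows. This is a minimal illustration, assuming 8-bit grayscale images stored as NumPy arrays; the function names `band_entropy` and `select_bands`, as well as the parameters `band_height` and `n_bands`, are hypothetical and not taken from the paper.

```python
import numpy as np

def band_entropy(band, levels=256):
    """Shannon entropy (in bits) of the intensity histogram of a region, as in (1)."""
    counts = np.bincount(band.ravel(), minlength=levels)
    p = counts[counts > 0] / band.size  # empirical probabilities p_k of occupied levels
    return float(-np.sum(p * np.log2(p)))

def select_bands(image, band_height, n_bands):
    """Split the image into horizontal bands and pick the n_bands with highest entropy.

    Returns the selected band indices (top to bottom) and all band entropies.
    Rows that do not fill a complete band at the bottom are ignored (an assumption).
    """
    n = image.shape[0] // band_height
    entropies = np.array([
        band_entropy(image[i * band_height:(i + 1) * band_height])
        for i in range(n)
    ])
    top = np.argsort(entropies)[::-1][:n_bands]
    return sorted(top.tolist()), entropies
```

Plotting the returned `entropies` against the band index would give a curve comparable in spirit to Figure 2(b), although the exact band height used by the authors is not stated here.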

3.2. Proposed Feature

For biometric recognition, feature extraction can be carried out using mainly two approaches, namely, the spatial domain approach and the frequency domain approach [17]. The spatial domain approach utilizes the spatial data directly from the face image or employs some statistical measure of the spatial data. On the other hand, frequency domain approaches employ some kind of transform over the face images for feature extraction. In the case of frequency domain feature extraction, pixel-by-pixel comparison between face images in the spatial domain is not necessary. The effects of phenomena such as rotation, scale, and illumination are more severe in the spatial domain than in the frequency domain. Hence, in what follows, we intend to develop a feature extraction algorithm based on a multiresolution transformation.

Since it is shown in the previous section that certain zones of a face image contain considerably more information than other zones, our objective, unlike conventional approaches, is to extract features from the spatial data residing only in the high-informative facial bands. Such a method of feature extraction reduces the feature dimension, which results in significant computational savings. For feature extraction, we have employed the 2D-DWT, which, in comparison to the Fourier transform, possesses better space-frequency localization. This property of the DWT is helpful for analyzing images in which the information is localized in space. The wavelet transform is analogous to the Fourier transform with the exception that it uses scaled and shifted versions of wavelets, and the decomposition of a signal involves a sum of these wavelets. The DWT kernels exhibit properties of horizontal, vertical, and diagonal directionality.

The continuous wavelet transform (CWT) of a signal $s(t)$ using a wavelet $\psi(t)$ is mathematically defined as

$$C(a,b) = \frac{1}{\sqrt{a}} \int s(t)\, \psi\!\left(\frac{t-b}{a}\right) dt, \tag{2}$$

where $a$ is the scale and $b$ is the shift. The DWT coefficients are obtained by restricting the scale ($a$) to powers of 2 and the position ($b$) to integer multiples of the scales and are given by

$$c_{j,k} = 2^{j/2} \int_{-\infty}^{\infty} s(t)\, \psi\!\left(2^{j} t - k\right) dt, \tag{3}$$

where $j$ and $k$ are integers and $\psi_{j,k}$ are orthogonal baby wavelets defined as

$$\psi_{j,k}(t) = 2^{j/2}\, \psi\!\left(2^{j} t - k\right). \tag{4}$$

The approximate wavelet coefficients are the high-scale low-frequency components of the signal, whereas the detail wavelet coefficients are the low-scale high-frequency components. The 2D-DWT of two-dimensional data is obtained by computing the one-dimensional DWT, first along the rows and then along the columns of the data. Thus, for 2D data, the detail wavelet coefficients can be classified as vertical, horizontal, and diagonal detail.
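The row-then-column procedure for the 2D-DWT can be illustrated with a single-level decomposition. The sketch below assumes the Haar wavelet purely for brevity (the paper investigates several mother wavelets and does not single this one out); the function name `haar_dwt2`, the subband labels, and their sign convention are assumptions made for illustration.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT of an array with even height and width.

    Returns (cA, cH, cV, cD): approximate, horizontal-detail,
    vertical-detail, and diagonal-detail coefficients.
    """
    x = np.asarray(x, dtype=float)
    s = np.sqrt(2.0)
    # 1D transform along each row: combine pairs of adjacent columns.
    lo = (x[:, 0::2] + x[:, 1::2]) / s   # low-pass (averages)
    hi = (x[:, 0::2] - x[:, 1::2]) / s   # high-pass (differences)
    # 1D transform along each column: combine pairs of adjacent rows.
    cA = (lo[0::2, :] + lo[1::2, :]) / s  # low-pass in both directions
    cH = (lo[0::2, :] - lo[1::2, :]) / s  # responds to horizontal edges
    cV = (hi[0::2, :] + hi[1::2, :]) / s  # responds to vertical edges
    cD = (hi[0::2, :] - hi[1::2, :]) / s  # responds to diagonal structure
    return cA, cH, cV, cD
```

Because the Haar filters are orthonormal, the total energy of the four subbands equals that of the input, which is a convenient sanity check. In practice a wavelet library would typically be used so that other mother wavelets can be swapped in.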

We intend to demonstrate that the distinguishability of face images of separate persons is enhanced in the wavelet domain. In Figures 3(a) and 3(b), two sample face images of two different persons are shown. The Euclidean distance between the raw face images and that between their corresponding 2D-DWT approximate coefficients are shown in Figure 3(c). It is observed from the figure that the latter provides a comparatively higher Euclidean distance than the former, which indicates better discriminating capability.

In order to demonstrate the effect of rotation on the extracted features in the wavelet domain, two face images are shown in Figures 4(a) and 4(b). The two images are of the same person, except that, in the second image, the person's head is slightly rotated. The Euclidean distance between the raw face images and that between their corresponding 2D-DWT approximate coefficients are shown in Figure 4(c). It is evident from the figure that the latter provides an orders-of-magnitude lower Euclidean distance than the former, indicating a strong correlation and hence a better match.

3.3. Illumination Adjustment

It is intuitive that images of a particular person captured under different lighting conditions may vary significantly, which can affect the face recognition accuracy. In order to overcome the effect of lighting variation in the proposed method, illumination adjustment is performed prior to feature extraction. Given two images of a single person having different intensity distributions due to variation in illumination conditions, our objective is to obtain similar feature vectors for these two images irrespective of the illumination condition. Since feature extraction in the proposed method is performed in the DWT domain, it is of interest to analyze the effect of variation in illumination on DWT-based feature extraction.

In Figure 5, two face images of the same person are shown, where the second image (shown in Figure 5(b)) is made brighter than the first one by changing the average illumination level. The 2D-DWT is performed upon each image, first without any illumination adjustment and then after performing illumination adjustment. Considering all the 2D-DWT approximate coefficients to form the feature vectors for these two images, a measure of similarity can be obtained by using correlation. In Figures 6 and 7, the cross-correlation values of the 2D-DWT approximate coefficients obtained by using the two images without and with illumination adjustment are shown, respectively. It is evident from these two figures that the latter case exhibits more similarity between the DWT approximate coefficients, indicating that the features belong to the same person. The similarity in terms of the Euclidean distance between the 2D-DWT approximate coefficients of the two images is also calculated for these two cases. It is found that there is a large separation in terms of Euclidean distance when no illumination adjustment is performed, whereas the distance diminishes when illumination adjustment is performed, as expected, which clearly indicates a better similarity between the extracted feature vectors.
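The text does not specify which illumination adjustment is used, so the following is only a sketch under an assumed technique: shifting each image to zero mean and scaling it to unit standard deviation. The function name `adjust_illumination` and the synthetic test image are hypothetical. The example mimics the experiment above by brightening an image with a constant offset and comparing Euclidean distances before and after adjustment.

```python
import numpy as np

def adjust_illumination(image, eps=1e-12):
    """Assumed adjustment (not necessarily the authors'): zero mean, unit standard deviation."""
    x = np.asarray(image, dtype=float)
    return (x - x.mean()) / (x.std() + eps)

# Synthetic stand-in for a face image and a uniformly brighter copy of it.
rng = np.random.default_rng(0)
face = rng.integers(0, 200, size=(8, 8)).astype(float)
brighter = face + 40.0  # change in average illumination level

# Euclidean distance without and with illumination adjustment.
d_raw = np.linalg.norm(face - brighter)
d_adj = np.linalg.norm(adjust_illumination(face) - adjust_illumination(brighter))
```

Under this assumption a uniform brightness change is removed completely, so `d_adj` is essentially zero while `d_raw` is not. Real lighting variation is rarely a pure offset, and a different adjustment may be needed in that case.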

3.4. Modularization and Its Effect upon Information Content

As mentioned earlier, features should be extracted from portions of a face image where the information content is relatively high. One possible way is to segment the entire face image into several horizontal bands, compute the entropy content of each band, and select the bands with higher entropy contents. It is to be noted that, within a particular horizontal band of a face image, the change in information over the band may not be properly captured if the DWT features are selected considering the entire band as a whole. Even if features are extracted in this way, they may offer very low between-class separation. In order to obtain high within-class compactness as well as high between-class separability, we modularize the horizontal bands into smaller segments, which are capable of capturing variation in image geometry locally within a band.

In view of presenting more rationale towards modularizing the high-informative facial bands, three different face images are shown in Figures 8(a)–8(c) along with their corresponding histograms calculated from the intensity distribution of those images in Figures 8(d)–8(f). Based on these histograms, a general trend of the intensity distribution of human face images can be acquired. It is observed that the distribution follows an almost similar pattern for the three different persons. One can compute the information content in terms of entropy of a high-informative horizontal band using (1). For the purpose of comparison, from the entropy of the horizontal band, the average entropy per segment $H$ is computed by taking into account the total number of segments to be used for modularization. In Figure 9(a), the average entropy per segment $H$ computed for the image shown in Figure 8(a) considering 23 segments each having a size of $28 \times 4$ pixels is shown.

Next, we consider different small segments of the high-informative horizontal band of the face images and compute the entropy of each segment $H_i$ based on the histogram of the corresponding segment. In Figure 9(a), the entropy values computed from different segments of the high-informative horizontal band of the face image shown in Figure 8(a) are also plotted along with the average entropy per segment calculated using the segmental entropies. Figures 9(b) and 9(c) show similar entropy measures for the high-informative horizontal bands of the other two persons shown in Figures 8(b) and 8(c). It can be clearly observed that the entropy measures vary significantly among different segments. Moreover, the average value of the segmental entropies (shown in Figure 9 as a dashed line) is much higher than the average entropy per segment $H$ computed from the entire horizontal band of the face image. This clearly gives an indication that, for feature extraction, instead of considering the entire high-informative horizontal band as a whole, modularization would be a better choice.
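The comparison between the two entropy measures can be sketched as follows. The code assumes that the "average entropy per segment" of the whole band is the band entropy from (1) divided by the number of segments; this reading, together with the function names `entropy` and `modular_entropies` and the parameter `module_width`, is an assumption made for illustration.

```python
import numpy as np

def entropy(region, levels=256):
    """Shannon entropy (in bits) of the intensity histogram of a region, as in (1)."""
    counts = np.bincount(region.ravel(), minlength=levels)
    p = counts[counts > 0] / region.size
    return float(-np.sum(p * np.log2(p)))

def modular_entropies(band, module_width):
    """Entropy of each module of a horizontal band, plus the band-level average.

    Modules span the full band height; leftover columns at the right edge
    are ignored (an assumption).
    """
    n = band.shape[1] // module_width
    segment_entropies = np.array([
        entropy(band[:, i * module_width:(i + 1) * module_width])
        for i in range(n)
    ])
    # Assumed reading of "average entropy per segment": band entropy / number of segments.
    avg_per_segment_from_band = entropy(band) / n
    return segment_entropies, avg_per_segment_from_band
```

Comparing `segment_entropies.mean()` with `avg_per_segment_from_band` reproduces the kind of comparison drawn in Figure 9, and repeating the computation for several values of `module_width` would trace a curve like the one in Figure 10.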

However, the size of the module is also an important factor. In Figure 10, the variation of the average entropy of a sample face image with segment size is shown. It is clear that decreasing the size of the modules offers greater entropy values, that is, greater variation in information, which is desirable. However, if the modules are extremely small, the segments will not be capable of exhibiting significant differences between different images.

3.5. Proposed Histogram-Based Wavelet Domain Dominant Feature Extraction

Instead of considering the DWT coefficients of the entire image, the coefficients obtained from each module of the high-informative horizontal band of a face image are considered to form the feature vector of that image. However, if all of these coefficients were used, it would definitely result in a feature vector with a very large dimension. In view of reducing the feature dimension, we propose to utilize the dominant wavelet coefficients as desired features. In order to select the dominant wavelet coefficients, we propose to consider the frequency of occurrence of the wavelet coefficients as the determining characteristic. It is expected that coefficients with higher frequency of occurrence would definitely dominate over all the coefficients for image reconstruction and it would be sufficient to consider only those coefficients as desired features. One way to visualize the frequency of occurrence of wavelet coefficients is to compute the histogram of the coefficients of a segment of a high-informative horizontal band. In order to select the dominant features from a given histogram, the coefficients having frequency of occurrence greater than a certain threshold value are considered.

It is intuitive that, within a high-informative horizontal band of a face image, the image intensity distribution may change drastically at different localities. If the thresholding operation for selecting the dominant wavelet coefficients were performed over the wavelet coefficients of the entire band, it would be difficult to obtain a global threshold value that is suitable for every local zone. Use of a global threshold in a particular horizontal band of a face image may offer features with very low between-class separation. In order to obtain high within-class compactness as well as high between-class separability, we consider wavelet coefficients corresponding to smaller spatial modules residing within a horizontal band, which are capable of capturing variation in image geometry locally. In this case, for each module, a different threshold value may have to be chosen depending on the coefficient values of that segment. For a particular module of the face image, the coefficients (approximate and horizontal detail) whose frequency of occurrence is greater than $\theta\%$ of the maximum frequency of occurrence in that module are considered dominant wavelet coefficients and are selected as features for that segment of the image. This operation is repeated for all the modules of a face image within the selected high-informative horizontal band.
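The per-module thresholding rule can be sketched as follows. The number of histogram bins is not given in the text, so `bins` is an assumed parameter, and the function name `dominant_coefficients` is hypothetical. The function would be called once per module, on that module's approximate or horizontal-detail coefficients.

```python
import numpy as np

def dominant_coefficients(coeffs, theta=10.0, bins=16):
    """Keep coefficients whose histogram bin count exceeds theta% of the largest bin count.

    coeffs: wavelet coefficients of one module (any shape).
    theta:  threshold as a percentage of the maximum frequency of occurrence.
    bins:   number of histogram bins (an assumption; not specified in the text).
    """
    c = np.asarray(coeffs, dtype=float).ravel()
    counts, edges = np.histogram(c, bins=bins)
    threshold = (theta / 100.0) * counts.max()
    # Map each coefficient to its bin index; values equal to the right-most
    # edge fall in the last bin, matching np.histogram.
    idx = np.clip(np.digitize(c, edges) - 1, 0, bins - 1)
    return c[counts[idx] > threshold]
```

Because the threshold is relative to the maximum bin count of each module, every module effectively receives its own threshold, which is the local behavior argued for above. How the selected values from all modules are arranged into a fixed-length feature vector is not shown here.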

Next, in order to demonstrate the advantage of extracting dominant wavelet coefficients corresponding to smaller modules residing in a horizontal band, we conduct an experiment considering two different cases: (i) when the entire horizontal band is used as a whole and (ii) when all the modules of that horizontal band are used separately for feature extraction. For these two cases, centroids of the dominant approximate wavelet coefficients obtained from several poses of two different persons (shown in Figure 11) are computed and shown in Figures 12 and 13, respectively. It is observed from Figure 12 that the feature centroids of the two persons at different poses are not well separated and, for some poses, even overlap with each other, which clearly indicates poor between-class separability. In Figure 13, it is observed that, irrespective of the poses, the feature centroids of the two persons maintain a significant separation, indicating high between-class separability, which strongly supports the proposed local feature selection algorithm.

We have also considered the dominant feature values obtained for various poses of those two persons in order to demonstrate the within-class compactness of the features. The feature values, along with their centroids, obtained for the two different cases, that is, extracting the features from the horizontal band without and with modularization, are shown in Figures 14 and 15, respectively. It is observed from Figure 14 that the feature values of several poses of the two different persons are significantly scattered around the respective centroids, resulting in poor within-class compactness. On the other hand, it is evident from Figure 15 that the centroids of the dominant features of the two different persons are well separated, with a low degree of scattering among the features around their corresponding centroids. Thus, the proposed dominant features extracted locally within a band offer not only a high degree of between-class separability but also satisfactory within-class compactness.

3.6. Reduction of the Feature Dimension

For the cases where the acquired face images are of very high resolution, even after selection of dominant features from the small segments of the high-informative horizontal band of a face image, the feature vector length may still be very high. Further dimensionality reduction may be employed for reduction in computational burden.

Principal component analysis (PCA) is a very well-known and efficient orthogonal linear transformation [18]. It reduces the dimension of the feature space and the correlation among the feature vectors by projecting the original feature space into a smaller subspace through a transformation. The PCA transforms the original $p$-dimensional feature vector into the $L$-dimensional linear subspace that is spanned by the leading eigenvectors of the covariance matrix of the feature vectors in each cluster ($L < p$). PCA is theoretically the optimum transform for given data in the least-squares sense. For a data matrix, $X^T$, with zero empirical mean, where each row represents a different repetition of the experiment and each column gives the results from a particular probe, the PCA transformation is given by

$$Y^T = X^T W = V \Sigma^T, \tag{5}$$

where the matrix $\Sigma$ is an $m \times n$ diagonal matrix with nonnegative real numbers on the diagonal and $W \Sigma V^T$ is the singular value decomposition of $X$. If $q$ poses of each person are considered and a total of $M$ dominant DWT coefficients (approximate and horizontal detail) are selected per image, the feature space per person would have a dimension of $q \times M$. For the proposed dominant features, implementation of PCA on the derived feature space could efficiently reduce the feature dimension without losing much information. Hence, PCA is employed to reduce the dimension of the proposed feature space.
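A minimal sketch of PCA through the singular value decomposition, in the spirit of (5), is given below. It assumes the feature vectors are stacked as rows of a matrix (the role played by $X^T$); the function name `pca_reduce` and its return values are hypothetical.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto the leading principal directions.

    features:     array of shape (num_samples, p), one feature vector per row.
    n_components: target dimension L, with L < p.
    Returns the projected data (num_samples, L), the principal directions (L, p),
    and the mean that was subtracted.
    """
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                      # enforce zero empirical mean
    # Xc = U S Vt; the rows of Vt play the role of the columns of W in (5).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components, mean
```

A test feature vector would be reduced with the same `mean` and `components` obtained from the training data, that is, `(v - mean) @ components.T`, so that training and test features live in the same subspace.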

3.7. Distance-Based Face Recognition

In the proposed method, for the purpose of recognition using the extracted dominant features, a distance-based similarity measure is utilized. The recognition task is carried out based on the distances of the feature vectors of the training face images from the feature vector of the test image. Let the $m$-dimensional feature vector for the $k$th pose of the $j$th person be $\{\gamma_{jk}(1), \gamma_{jk}(2), \ldots, \gamma_{jk}(m)\}$, and let a test face image $f$ have the feature vector $\{v_f(1), v_f(2), \ldots, v_f(m)\}$. A similarity measure between the test image $f$ of the unknown person and the sample images of the $j$th person, namely, the average sum-squares distance, $\Delta$, is defined as

$$\Delta_j^f = \frac{1}{q} \sum_{k=1}^{q} \sum_{i=1}^{m} \left| \gamma_{jk}(i) - v_f(i) \right|^2, \tag{6}$$

where a particular class represents a person with $q$ poses. Therefore, according to (6), given the test face image $f$, the unknown person is classified as person $j$ among the $p$ classes when

$$\Delta_j^f \le \Delta_g^f, \quad \forall j \ne g,\ \forall g \in \{1, 2, \ldots, p\}. \tag{7}$$
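The decision rule in (6) and (7) can be sketched as follows. The function name `classify` and the dictionary layout of the templates are assumptions made for illustration.

```python
import numpy as np

def classify(test_vec, templates):
    """Assign a test feature vector to the class with the smallest average sum-squares distance.

    test_vec:  feature vector of the test image, shape (m,).
    templates: dict mapping a person label to an array of shape (q, m),
               one row per training pose of that person.
    Returns the predicted label and the distance to every class.
    """
    v = np.asarray(test_vec, dtype=float)
    distances = {
        # Equation (6): squared differences summed over features, averaged over poses.
        label: float(np.mean(np.sum((np.asarray(g, dtype=float) - v) ** 2, axis=1)))
        for label, g in templates.items()
    }
    # Equation (7): choose the class with the minimum distance.
    return min(distances, key=distances.get), distances
```

For example, with templates `{"A": [[0, 0], [0, 2]], "B": [[5, 5], [6, 5]]}` and a test vector `[0, 1]`, the distances are 1.0 for class A and 46.5 for class B, so the test vector is assigned to A.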

4. Experimental Results

Extensive simulations are carried out in order to demonstrate the performance of the proposed feature extraction algorithm for face recognition. In this regard, different well-known face databases have been considered, which comprise a range of face images varying in facial expressions, lighting effects, and the presence or absence of accessories. The performance of the proposed method in terms of recognition accuracy is obtained and compared with that of some recent methods [19, 20]. The variation of recognition accuracy with feature dimension and module size is investigated. Moreover, the effect of using different combinations of dominant coefficients as features is presented.

4.1. Face Databases

In this section, the performance of the proposed face recognition scheme is presented for two standard face databases, namely, the ORL database (available at http://www.cl.cam.ac.uk/Research/DTG/attarchive/pub/data/) and the Yale database (available at http://cvc.yale.edu/projects/yalefaces/yalefaces.html). In Figures 16 and 17, sample face images of different poses of two different persons taken from the ORL and the Yale databases, respectively, are shown. The ORL database contains a total of 400 images of 40 persons, each person having 10 different poses. Little variation of illumination, slightly different facial expressions, and details are present in the face images. The Yale database, on the other hand, consists of a total of 165 images of 15 persons, each person having 11 different poses. The poses exhibit large variations in illumination (such as central lighting, left lighting, right lighting, and dark conditions), facial expressions (such as wink, sad, happy, surprised, sleepy, and normal), and accessories (such as with and without glasses).

4.2. Performance Comparison

In the proposed method, dominant features (approximate and horizontal detail 2D-DWT coefficients) obtained from all the modules of high-informative horizontal bands of a face image are used to form the feature vector of that image and feature dimension reduction is performed using PCA. The recognition task is carried out using a simple Euclidean distance-based classifier as described in Section 3.7. The experiments were performed following the leave-one-out cross validation rule.
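The leave-one-out protocol combined with the distance-based classifier of Section 3.7 can be sketched as follows. The sketch assumes that a fixed-length feature vector has already been computed for every image; whether the authors recompute PCA inside each fold is not stated, so that step is omitted. The function name `leave_one_out_accuracy` is hypothetical.

```python
import numpy as np

def leave_one_out_accuracy(features, labels):
    """Hold out each sample in turn and classify it against all remaining samples.

    features: array of shape (num_images, m), one precomputed feature vector per row.
    labels:   person label of each image.
    Returns the fraction of held-out images that are classified correctly.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    correct = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i          # every sample except the held-out one
        X_train, y_train = X[keep], y[keep]
        sq = np.sum((X_train - X[i]) ** 2, axis=1)
        classes = np.unique(y_train)
        # Average sum-squares distance to each class, as in (6).
        avg = [sq[y_train == c].mean() for c in classes]
        predicted = classes[int(np.argmin(avg))]
        correct += int(predicted == y[i])
    return correct / len(X)
```

With this protocol every image serves as a test image exactly once, which makes use of all the images of small databases such as ORL and Yale.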

For simulation purposes, $N$ horizontal bands are selected based on the entropy measure described in Section 3.1 and divided further into small modules. Module height is the same as that of the horizontal band and module width is chosen based on the face image width. In our simulations, $N = 2$ for the ORL database and $N = 3$ for the Yale database are chosen and the module sizes are chosen as $16 \times 16$ pixels and $32 \times 32$ pixels, respectively. The dominant wavelet coefficients corresponding to all the local segments residing in the horizontal bands are then obtained using $\theta = 10$.

For the purpose of comparison, recognition accuracies obtained using the proposed method along with those obtained by the methods reported in [19] and [20] are listed in Table 1. Here, in case of the ORL database, the recognition accuracy for the method in [20] is denoted as not available (N/A). It is evident from the table that the recognition accuracy of the proposed method is comparatively higher than those obtained by the other methods for both the databases. It indicates the robustness of the proposed method against partial occlusions, expressions, and nonlinear lighting variations.

As mentioned earlier, dominant features are extracted from the small modules of the high-informative horizontal bands of the face images. Next, we intend to demonstrate the effect of variation of module width upon the recognition accuracy obtained by the proposed method. In Figure 18, the recognition accuracies obtained for different module widths for both the databases are shown. It is observed from the figure that similar recognition accuracies are achieved unless the module width is extremely large or small. Note that, in case of considering the entire horizontal band as a whole instead of any modularization, the recognition accuracy drastically falls to a value less than 85.00% for both the databases, as expected.

Dimension reduction of the feature space plays an important role in reducing computational complexity. In the proposed method, the task of feature dimension reduction is performed using PCA. In Figure 19, the effect of dimension reduction upon recognition accuracy is shown. It is found from this figure that, even for a very low feature dimension, the recognition accuracies remain very high for both databases. Note that, as the width of the images in the Yale database is almost three times that of the images in the ORL database, a higher feature dimension is required to obtain a similar level of recognition accuracy.
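As a rough illustration of the PCA step, the sketch below estimates only the first principal axis by power iteration on the sample covariance matrix and projects feature vectors onto it. This is a deliberate simplification with hypothetical function names: a full PCA retains several leading axes, and the authors' implementation is not specified in this section.

```python
import math

def first_principal_axis(samples, iterations=100):
    """Estimate the mean and the leading principal axis of the samples."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    centered = [[s[k] - mean[k] for k in range(dim)] for s in samples]
    # Sample covariance matrix of the centered data.
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    axis = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit-length starting vector
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # degenerate case: no variance along the current direction
        axis = [v / norm for v in nxt]
    return mean, axis

def project(sample, mean, axis):
    """Coordinate of a sample along the principal axis (a one-dimensional feature)."""
    return sum((s - m) * a for s, m, a in zip(sample, mean, axis))

# Toy data (hypothetical): points lying exactly on the line y = x.
toy_samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
toy_mean, toy_axis = first_principal_axis(toy_samples)
toy_scores = [project(s, toy_mean, toy_axis) for s in toy_samples]
```

Here the two-dimensional toy points are reduced to one coordinate each without losing any of their variance, which is the behaviour that dimension reduction relies on when the features are strongly correlated.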

Since the dominant spectral coefficients in the proposed method are chosen based on a thresholding criterion, the effect of changing the threshold value, that is, of retaining different numbers of top approximate and horizontal detail coefficients, has been investigated. In Figure 20, the variation of recognition accuracy with the threshold value is shown. It can be observed from the figure that, as the number of top coefficients decreases, the recognition accuracy also decreases, although it remains sufficiently high even when very few coefficients are used.

As stated earlier, thresholding selects certain wavelet coefficients as feature values and discards the others. In Figure 21, the percentage reduction in the number of coefficients is shown as a function of the threshold level. Since the recognition accuracy decreases with increasing threshold values (as depicted in Figure 20) while the reduction in the number of coefficients increases, the threshold level is chosen as a trade-off between these two effects.
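The relationship between the threshold level and the reduction in the number of coefficients can be sketched on a toy module. The example makes two assumptions that differ from, or go beyond, the text: it uses a one-level Haar transform as the mother wavelet, and it replaces the histogram-based criterion of the proposed method with a plain magnitude threshold. All names and values are hypothetical.

```python
def haar_level1(block):
    """One-level 2D Haar transform of an even-sized block.

    Returns (approx, detail), where detail averages horizontally adjacent
    pixels and differences vertically adjacent rows, so it responds to
    horizontal edges.
    """
    approx, detail = [], []
    for r in range(0, len(block), 2):
        a_row, d_row = [], []
        for c in range(0, len(block[0]), 2):
            p, q = block[r][c], block[r][c + 1]
            s, t = block[r + 1][c], block[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2.0)
            d_row.append((p + q - s - t) / 2.0)
        approx.append(a_row)
        detail.append(d_row)
    return approx, detail

def select_dominant(coeffs, threshold):
    """Keep coefficients whose magnitude reaches the threshold.

    Returns the kept values and the percentage of coefficients discarded.
    """
    flat = [v for row in coeffs for v in row]
    kept = [v for v in flat if abs(v) >= threshold]
    reduction = 100.0 * (len(flat) - len(kept)) / len(flat)
    return kept, reduction

# Toy 4x4 module (hypothetical) with one horizontal edge in the lower left.
toy_block = [
    [10, 10, 2, 2],
    [10, 10, 2, 2],
    [10, 10, 2, 2],
    [0, 0, 2, 2],
]
toy_approx, toy_detail = haar_level1(toy_block)
kept_low, reduction_low = select_dominant(toy_approx, threshold=5)
kept_high, reduction_high = select_dominant(toy_approx, threshold=15)
```

Raising the threshold from 5 to 15 discards more of the approximate coefficients, which mirrors the trend in Figure 21: a larger reduction in feature dimension is obtained at the cost of retaining less information.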

5. Conclusions

The proposed wavelet-based dominant feature extraction algorithm provides excellent space-frequency localization, which is clearly reflected in the high within-class compactness and high between-class separability of the extracted features. Instead of using the whole face image for feature extraction at once, certain high-informative horizontal bands within the image are first selected using the proposed entropy-based measure. The horizontal bands are then modularized, and the effect of modularization has been investigated. The dominant wavelet coefficient features are extracted from the local zones within those horizontal bands. The effect of varying the module size upon recognition performance has been investigated, and it is found that the recognition accuracy does not depend on the module size unless it is extremely large or small. The effect of using different mother wavelets for feature extraction has also been investigated. It has been found that the proposed feature extraction scheme precisely captures local variations in the face images, which plays an important role in discriminating between different faces. Moreover, it utilizes a very low-dimensional feature space, which ensures a lower computational burden. For the task of classification, a Euclidean distance-based classifier has been employed, and it is found that, because of the quality of the extracted features, such a simple classifier provides very satisfactory recognition performance without the need for a more complicated classifier. From our extensive simulations on different standard face databases, it has been found that the proposed method provides high recognition accuracy even for images affected by partial occlusions, expressions, and nonlinear lighting variations.

Acknowledgment

The authors would like to express their sincere gratitude towards the authorities of the Department of Electrical and Electronic Engineering and Bangladesh University of Engineering and Technology (BUET) for providing constant support throughout this research work.