Abstract

We present a partially occluded facial image retrieval method based on similarity measurement for forensic applications. The main novelty of this method, compared with other occluded face recognition algorithms, is that it measures similarity through Scale Invariant Feature Transform (SIFT) matching between normal gallery images and occluded probe images. The proposed method consists of four steps: (i) a Self-Quotient Image (SQI) is applied to the input images, (ii) Gabor-Local Binary Pattern (Gabor-LBP) histogram features are extracted from the SQI images, (iii) the similarity between two compared images is measured using the SIFT matching algorithm, and (iv) histogram intersection is performed on the basis of the SIFT-based similarity measurement. In experiments, we evaluated the performance of the proposed method on a commonly used benchmark database that includes occluded facial images. The results show that the correct retrieval rate was 94.07% for sunglasses occlusion and 93.33% for scarf occlusion. The proposed method thus achieved better performance than other Gabor-LBP histogram-based face recognition algorithms for eye-hiding occlusion of facial images.

1. Introduction

CCTV cameras have been widely deployed over the past decade for purposes such as surveillance, crime investigation, and disaster monitoring. In forensic applications, facial images captured by CCTV cameras or mobile devices are used as clues in the investigation of criminal suspects. State-of-the-art commercial face recognition systems have achieved near-perfect accuracy in matching facial images acquired in controlled scenarios, but face identification and retrieval are still carried out by human examiners. This manual process is very inefficient in terms of performance and time, because the investigator has to compare the captured facial images one by one with the criminal face database to find the suspect. Therefore automated facial image retrieval has recently become a research issue for forensic applications [1–3]. Facial images in the real world present many challenging factors, such as facial expression, illumination, pose, aging, and occlusion. These factors can decrease the true acceptance rate from 99% to below 60%, so face recognition algorithms that overcome these drawbacks are actively studied [4]. Furthermore, one of the most significant issues in surveillance environments, where people do not cooperate with the system, is occlusion, which includes both intentional occlusion such as a hat, sunglasses, or scarf and natural occlusion such as a beard, hand, or makeup. Figure 1 shows examples of partially occluded facial images from the benchmark database. As the figure shows, occlusion degrades face recognition performance because it distorts the appearance of the face and reduces the information available to represent the facial image.

Face recognition algorithms can be categorized into holistic and local feature-based approaches. Well-known conventional holistic approaches, including Principal Component Analysis (PCA) [5], Linear Discriminant Analysis (LDA) [6], and Independent Component Analysis (ICA) [7], have a major weakness: because they use information from the whole face, they are sensitive to the loss of information in distorted areas [8, 9]. Local feature-based approaches are an alternative that is more robust to occlusion than holistic approaches. Most face recognition algorithms dealing with occlusion are based on local features such as LBP [10], the Modified Census Transform (MCT) [11], and the Local Gradient Pattern (LGP) [12]. Local feature-based face recognition for occluded facial images has been studied in [13–16], in which the matching scores of local images are fused or voted on to improve recognition. Kim et al. [17] proposed effective part-based locally salient features for face recognition that are robust to local distortion and partial occlusion. Tan et al. [18] presented occluded face recognition with a single training image per person. Oh et al. [19] proposed occlusion-invariant face recognition using selective local nonnegative matrix factorization (S-LNMF), in which PCA is used to detect occlusion and S-LNMF is used for face recognition. PCA-based face recognition algorithms for handling occlusion are compared in [20]. A face recognition approach based on the Support Vector Machine (SVM) [21] was studied in [22], in which SVM is used to extract local features invariant to partial occlusion. Sharma et al. [23] proposed an efficient partially occluded face recognition system that uses Eigenfaces and Gabor wavelet filters. Min et al. [24] presented an occlusion detection algorithm using Gabor wavelets, PCA, and SVM, in which the nonoccluded facial part is recognized using LBP.
Most occluded face recognition algorithms employ statistical approaches to deal with the occluded region of the facial image. Such approaches require complicated mechanisms to detect the occluded region and to acquire information on the distorted region of the face. In addition, learning mechanisms are difficult to apply in practice because forensic face databases are large scale, and learning and updating are time-consuming processes.

In this paper, we propose a facial image retrieval method for partially occluded facial images that focuses on similarity measurement based on SIFT matching [25]. The goal of the proposed method is to provide a simple, nonstatistical approach that can detect the occluded region of probe images, find the same subregions in original gallery images and partially occluded probe images, and compensate for face misalignment. The similarity measurement is used both to detect the occluded part of a probe facial image and to select the same subregions in the two compared images. A preprocessing method is adopted to compensate for illumination changes and to extract more reliable SIFT key-points. The proposed method also employs a Gabor-LBP histogram to extract representative face features [26]. Because the Gabor-LBP histogram is a nonstatistical approach, it is independent of the characteristics of the training database, which makes it well suited to facial image retrieval in real applications. Histogram intersection is performed only in the selected subregions, which contain SIFT-matched key-points. To validate the performance of the proposed method, experiments are conducted on a benchmark database. The experimental results show that the proposed method improves the face retrieval rate, particularly for occlusion by sunglasses, and that it is feasible for forensic applications in terms of accuracy and simplicity. The rest of this paper is organized as follows. Section 2 presents the proposed facial image retrieval method. Section 3 describes experimental results that evaluate the proposed method on the benchmark database. Finally, Section 4 concludes this paper with remarks on future work.

2. Methodology

The proposed method exploits the fact that SIFT matching [25] can be used to detect the part of the face occluded by any kind of obstacle (sunglasses, scarf, hair, hand, etc.). It measures the nonmetric partial similarity between two images (a gallery image and a probe image) using the SIFT matching algorithm. This partial similarity helps extract intrapersonal features for face retrieval. The overall architecture is shown in Figure 2.

2.1. SQI Preprocessing

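Illumination influences gray-level intensity, and changes in lighting make it difficult to recognize the face precisely. Therefore a preprocessing step is used to rectify the illumination for face recognition. Here, SQI, a type of high-pass filter, is adopted to achieve robustness against illumination variation [27]. In SQI, the self-quotient image $Q$ is defined as

$$Q=\frac{I}{\hat{I}}=\frac{I}{F*I},$$

where $\hat{I}=F*I$ is the low-frequency image of an input image $I$, $F$ is the Gaussian kernel, and $*$ denotes the convolution operation. The Gaussian kernel is calculated by multiplying a weight $W$ and a Gaussian filter $G_{\sigma}$ as

$$F(x,y)=W(x,y)\,G_{\sigma}(x,y),\qquad W(x,y)=\begin{cases}1, & (x,y)\in M_{1}\\ 0, & (x,y)\in M_{2}\end{cases},\qquad \tau=\frac{1}{k^{2}}\sum_{(x,y)\in\Omega}I(x,y),$$

where $I(x,y)$ is the intensity of a pixel $(x,y)$, $\tau$ is the mean intensity of the $k\times k$ filtering region $\Omega$, and $k$ is the kernel size. The threshold $\tau$ splits $\Omega$ into the pixels with $I(x,y)\ge\tau$ and those with $I(x,y)<\tau$; following [27], $M_{1}$ denotes the larger of these two sets and $M_{2}$ the smaller, and $F$ is normalized so that its coefficients sum to one. $G_{\sigma}$ denotes a Gaussian function with standard deviation $\sigma$. Figure 3 shows illumination-rectified images obtained by applying the SQI operation to original images. As shown in Figure 3(b), SQI rectifies the illumination variation while retaining the face features.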
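
A minimal NumPy sketch of the SQI step may make the weighted smoothing concrete. It assumes the weight convention of the original SQI formulation in [27] (pixels on the side of the local mean $\tau$ that holds more pixels get weight 1), normalizes the weighted kernel to sum to one, and uses a hypothetical 7 × 7 kernel with $\sigma = 2$ and a small `eps` to avoid division by zero; none of these parameter values are taken from this paper, and the function names are illustrative.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Square Gaussian filter G_sigma with odd side length `size`."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))


def self_quotient_image(img, size=7, sigma=2.0, eps=1e-6):
    """Q = I / (F * I) with the weighted Gaussian kernel F = W * G_sigma."""
    img = np.asarray(img, dtype=np.float64)
    g = gaussian_kernel(size, sigma)
    r = size // 2
    padded = np.pad(img, r, mode="reflect")
    smooth = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + size, x:x + size]  # k x k filtering region
            tau = patch.mean()                      # threshold: mean intensity
            above = patch >= tau
            # W = 1 on the side of tau holding more pixels (assumed convention).
            weight = above if above.sum() >= patch.size / 2 else ~above
            f = weight * g
            smooth[y, x] = (f * patch).sum() / f.sum()  # normalized weighted blur
    return img / (smooth + eps)
```

Because $W$ depends only on comparisons with $\tau$, multiplying the input by a constant leaves $Q$ essentially unchanged, which is the multiplicative-illumination invariance that SQI is meant to provide.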

2.2. Face Feature Extraction

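Gabor filters are adopted to extract face features because they have shown the ability to decompose facial images into representative information for face recognition [28–31]. Gabor filters are defined as [32]

$$\psi_{\mu,\nu}(z)=\frac{\|k_{\mu,\nu}\|^{2}}{\sigma^{2}}\exp\!\left(-\frac{\|k_{\mu,\nu}\|^{2}\|z\|^{2}}{2\sigma^{2}}\right)\left[\exp\left(i\,k_{\mu,\nu}\cdot z\right)-\exp\!\left(-\frac{\sigma^{2}}{2}\right)\right],$$

where $z=(x,y)$ is the pixel and $\|\cdot\|$ denotes the norm operator. $\mu$ and $\nu$ are the orientation and scale of the Gabor filter, and the wave vector $k_{\mu,\nu}$ is

$$k_{\mu,\nu}=k_{\nu}e^{i\phi_{\mu}},\qquad k_{\nu}=\frac{k_{\max}}{f^{\nu}},\qquad \phi_{\mu}=\frac{\pi\mu}{8},$$

where $\phi_{\mu}$ and $k_{\nu}$ define the orientation and scale of the Gabor wavelets, $k_{\max}$ is the maximum frequency, and $f$ is the spacing factor between kernels in the frequency domain. Gabor filters with five scales and eight orientations, $\nu\in\{0,\dots,4\}$ and $\mu\in\{0,\dots,7\}$, are used in this study. $\sigma$ is the Gaussian distribution parameter and determines the kernel size of the filter, which is chosen so that the Gabor filters extract the distinctive contours of facial images [33].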
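
The Gabor kernels defined above can be sampled with NumPy as in the sketch below. The values $k_{\max}=\pi/2$, $f=\sqrt{2}$, $\sigma=2\pi$ and the 31 × 31 kernel size are assumptions (common choices in the Gabor face recognition literature), not values reported in this paper. `gabor_magnitude` is a hypothetical helper that filters an image by FFT convolution and keeps the magnitude, assuming, as in LGBPHS-style methods, that LBP is later applied to Gabor magnitude images.

```python
import numpy as np


def gabor_kernel(mu, nu, size=31, k_max=np.pi / 2, f=np.sqrt(2.0), sigma=2 * np.pi):
    """psi_{mu,nu}(z) sampled on a size x size grid centred at the origin."""
    k = (k_max / f**nu) * np.exp(1j * np.pi * mu / 8)  # wave vector k_{mu,nu}
    kx, ky = k.real, k.imag
    k2 = kx**2 + ky**2                                  # ||k_{mu,nu}||^2
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    z2 = x**2 + y**2                                    # ||z||^2
    envelope = (k2 / sigma**2) * np.exp(-k2 * z2 / (2 * sigma**2))
    # Complex carrier minus the DC-compensation term exp(-sigma^2 / 2).
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma**2 / 2)
    return envelope * carrier


def gabor_bank(size=31):
    """40 kernels: five scales (nu = 0..4) x eight orientations (mu = 0..7)."""
    return [gabor_kernel(mu, nu, size) for nu in range(5) for mu in range(8)]


def gabor_magnitude(img, kernel):
    """|I * psi| via zero-padded FFT convolution, cropped to the image size."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)  # full linear-convolution size
    full = np.fft.ifft2(np.fft.fft2(img, shape) * np.fft.fft2(kernel, shape))
    top, left = kh // 2, kw // 2
    return np.abs(full[top:top + h, left:left + w])
```

Applying `gabor_magnitude` with each of the 40 kernels yields the 40 Gabor-filtered images from which Gabor-LBP features would be computed.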

The Gabor-LBP histogram sequence is constructed to be robust against local variation. Each Gabor-LBP image is divided into multiple nonoverlapping subregions, and a Gabor-LBP histogram is extracted from each subregion. The histograms are then concatenated into a single feature histogram sequence that represents the facial image [34]. The histogram of a subregion $R$ of a gray image $f$ is formulated as

$$h_{i}=\sum_{(x,y)\in R} I\{f(x,y)=i\},\qquad i=0,1,\dots,255,$$

where $i$ is the gray value, $h_{i}$ is the number of pixels with gray value $i$, and $I\{A\}$ is defined as

$$I\{A\}=\begin{cases}1, & A\ \text{is true}\\ 0, & A\ \text{is false}\end{cases}.$$

The Gabor-LBP histogram sequence of a single Gabor-LBP image is calculated by concatenating the subregion histograms:

$$S=\left(H_{1},H_{2},\dots,H_{m}\right),$$

where $S$ is the Gabor-LBP histogram sequence of a single image, $H_{r}$ is the histogram of the $r$th subregion, and $m$ is the number of subregions. Finally, the face representation is obtained by concatenating the Gabor-LBP histogram sequences of all the Gabor-filtered facial images:

$$T=\left(S_{1},S_{2},\dots,S_{40}\right),$$

where $S_{j}$ is the $j$th Gabor-LBP histogram sequence and $j=1,\dots,40$ (five scales × eight orientations). Histogram intersection is used to measure the similarity between the gallery and probe images:

$$\mathrm{Sim}\left(H^{G},H^{P}\right)=\sum_{r=1}^{m}\sum_{i}\min\left(H^{G}_{r}(i),\,H^{P}_{r}(i)\right),$$

where $H^{G}$ and $H^{P}$ denote the Gabor-LBP histograms of a gallery image and a probe image, respectively, $r$ indexes the subregions, and $m$ is the number of subregions.
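
The histogram construction and intersection can be sketched in NumPy as follows. This assumes the basic radius-1, 8-neighbour LBP operator (256 codes, matching the 0 to 255 range in the histogram definition) applied to each Gabor-filtered image; the grid size `rows` × `cols` is left as a parameter, and all helper names are hypothetical.

```python
import numpy as np

# 8-neighbour offsets (dy, dx), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(img):
    """Basic 3x3 LBP codes (0..255) for all interior pixels."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    code = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code += (neighbour >= center).astype(np.int64) << bit
    return code


def subregion_histograms(code, rows, cols):
    """h_i counts per subregion; returns an array of shape (rows * cols, 256)."""
    h, w = code.shape
    hists = []
    for r in range(rows):
        for c in range(cols):
            block = code[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            hists.append(np.bincount(block.ravel(), minlength=256))
    return np.array(hists)


def gabor_lbp_sequence(gabor_images, rows, cols):
    """Concatenate S_1..S_J, one subregion-histogram sequence per Gabor image."""
    return np.concatenate(
        [subregion_histograms(lbp_image(g), rows, cols).ravel() for g in gabor_images]
    )


def histogram_intersection(hg, hp):
    """Sum over all bins of min(H^G, H^P)."""
    return int(np.minimum(hg, hp).sum())
```

Intersecting a sequence with itself returns the total pixel count over all subregions and Gabor images, which is the upper bound of the similarity.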

2.3. SIFT-Based Similarity Measurement

To handle occlusion in face retrieval, the occluded facial part must be detected and the similarity measured in the nonoccluded region of the two compared images. SIFT key-points are the maxima or minima of the difference-of-Gaussian function in scale space. Let $I(x,y)$ be an image, and let $G(x,y,\sigma)$ denote a variable-scale Gaussian function with standard deviation $\sigma$. The scale space of an image is defined as

$$L(x,y,\sigma)=G(x,y,\sigma)*I(x,y),$$

where $*$ denotes convolution and

$$G(x,y,\sigma)=\frac{1}{2\pi\sigma^{2}}\,e^{-(x^{2}+y^{2})/(2\sigma^{2})}.$$

The difference-of-Gaussian function is formulated as

$$D(x,y,\sigma)=\left(G(x,y,k\sigma)-G(x,y,\sigma)\right)*I(x,y)=L(x,y,k\sigma)-L(x,y,\sigma),$$

where $k$ is a constant factor separating two adjacent scales. Local maxima and minima of $D(x,y,\sigma)$ are found by comparing each sample point with its eight neighbors in the current scale and the nine neighbors in each of the scales above and below (26 neighbors in total). If the sample point is larger or smaller than all of these neighbors, it is selected as a key-point.
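
A simplified, single-octave NumPy sketch of the DoG extremum search described above follows. It assumes $\sigma_0 = 1.6$, $k=\sqrt{2}$, five blur levels, and a contrast threshold of 0.03 for intensities in [0, 1] (values borrowed from common SIFT practice, not from this paper). It omits octaves, subpixel refinement, edge-response rejection, orientation assignment, and descriptors, so it illustrates only the 26-neighbour comparison.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """L = G(sigma) * I via a separable, truncated Gaussian (edge-padded)."""
    radius = max(1, int(3 * sigma + 0.5))
    t = np.arange(-radius, radius + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, g, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="valid"), 0, rows)


def dog_keypoints(img, sigma0=1.6, k=np.sqrt(2), num_scales=5, threshold=0.03):
    """Single-octave DoG extrema; returns (x, y, scale_index) tuples."""
    img = np.asarray(img, dtype=np.float64)
    L = np.stack([gaussian_blur(img, sigma0 * k**s) for s in range(num_scales)])
    D = L[1:] - L[:-1]  # D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)
    keypoints = []
    for s in range(1, D.shape[0] - 1):
        for y in range(1, D.shape[1] - 1):
            for x in range(1, D.shape[2] - 1):
                v = D[s, y, x]
                if abs(v) < threshold:  # discard low-contrast responses
                    continue
                # 26 neighbours: 8 in this scale, 9 above, 9 below.
                nbrs = np.delete(D[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel(), 13)
                if (v > nbrs).all() or (v < nbrs).all():
                    keypoints.append((x, y, s))
    return keypoints
```

For a bright Gaussian blob, the detector reports a DoG minimum at the blob centre, at the scale whose blur best matches the blob size.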

For SIFT key-point matching, each key-point descriptor of a probe image is matched independently against all key-point descriptors of a gallery image. If the ratio of the distance to the nearest gallery descriptor to the distance to the second-nearest gallery descriptor is below a specific threshold, the two key-points are matched; otherwise, the match is rejected and the key-point is removed. Figure 4 illustrates the matching ability of SIFT for face recognition; it shows the matching results between a normal face and facial images with facial expression and with occlusion. As shown in Figures 4(b) and 4(c), SIFT matching performs well under occlusion (sunglasses and scarf), and it is well suited to detecting occluded regions of facial images. SIFT matching can therefore be employed to measure the similarity between a normal facial image and an occluded facial image in the proposed method. In addition, SQI preprocessing yields more trustworthy SIFT key-points than the original images do. Acquiring reliable key-points is the most important requirement for matching in the SIFT algorithm, particularly because an occluded facial image has only a limited region from which key-points can be extracted. SQI preprocessing is therefore performed for SIFT key-point extraction as well as for illumination compensation. Figure 5 compares SIFT key-point extraction between original images and SQI images: the SIFT matching of SQI images is more precise than that of the original images in imposter matching.
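
The matching rule can be sketched as the nearest/second-nearest distance-ratio test from Lowe's SIFT work [25], which is our reading of "distance ratio" here. The 0.8 threshold is an assumption (the value suggested by Lowe); this paper does not state its threshold. Descriptors are plain arrays, so the output of any SIFT implementation could be passed in, and the function name is hypothetical.

```python
import numpy as np


def ratio_test_matches(probe_desc, gallery_desc, ratio=0.8):
    """Match each probe descriptor independently against all gallery descriptors.

    Returns (probe_index, gallery_index) pairs that pass the nearest /
    second-nearest distance-ratio test; other probe key-points are discarded.
    """
    probe = np.asarray(probe_desc, dtype=np.float64)
    gallery = np.asarray(gallery_desc, dtype=np.float64)
    if len(gallery) < 2:
        return []  # the ratio test needs at least two candidates
    # Euclidean distance between every probe and every gallery descriptor.
    dists = np.linalg.norm(probe[:, None, :] - gallery[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        nearest, second = np.argsort(row)[:2]
        if row[nearest] < ratio * row[second]:
            matches.append((i, int(nearest)))
    return matches
```

A probe descriptor that is equally close to two gallery descriptors is ambiguous and is rejected, which is how the test suppresses unreliable matches in occluded regions.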

2.4. Facial Image Retrieval Using SIFT-Based Similarity

SIFT-based similarity measurement is needed not only to detect the occluded region but also to compensate for errors that occur during face alignment. If a facial image is misaligned, subregions at the same position in two images do not necessarily contain the same facial information. As shown in Figures 4 and 5, the similarity measured using SIFT indicates the nonoccluded regions of the face and the corresponding positions in a gallery image and a probe image. After SIFT-based similarity measurement, facial image retrieval performs a selected histogram intersection between a gallery image and a probe image. A selected histogram is the histogram of a subregion that contains matched SIFT key-points between the two compared images. Thus the retrieval method compares histograms not of the same numbered subregions but of the subregions that contain a matched key-point. Figure 6 illustrates the selected histogram intersection using SIFT-based similarity measurement in detail. In Figure 6(a), SIFT matching of two facial images (genuine face matching) yields 4 matched key-points. The subregions containing these matched key-points become the selected subregions. The histogram of each selected subregion is intersected with the histogram of the corresponding subregion to measure the distance, and this is repeated for the remaining selected subregions. Figure 6(b) shows imposter face matching. In imposter matching, fewer key-points are matched than in genuine matching. Moreover, the selected subregions are not the same numbered subregions in the two facial images. Therefore the final similarity for an imposter is lower than that for genuine matching.

The score matching function of the proposed facial image retrieval is

$$M(G,P)=\sum_{n=1}^{N}\sum_{i}\min\left(H^{G}_{a_{n}}(i),\,H^{P}_{b_{n}}(i)\right),$$

where $H^{G}_{a_{n}}$ and $H^{P}_{b_{n}}$ are the histograms of the $a_{n}$th and $b_{n}$th subregions, respectively, that is, of the selected subregions containing SIFT-matched key-points in a gallery image and a probe image, respectively. $n$ indexes the matched key-points, $N$ is the number of matched key-points, and

$$a_{n}=\rho\left(x^{G}_{n},y^{G}_{n}\right),\qquad b_{n}=\rho\left(x^{P}_{n},y^{P}_{n}\right),$$

where $\rho(x,y)$ gives the index of the subregion containing the point $(x,y)$, so that $a_{n}$ and $b_{n}$ are the subregions containing the $n$th matched key-point in the gallery image and the probe image, respectively; $(x^{G}_{n},y^{G}_{n})$ and $(x^{P}_{n},y^{P}_{n})$ are the coordinates of the $n$th matched key-point in the gallery image and the probe image. The score matching approach in this study thus uses histogram intersection of selected subregions. The face feature extraction and matching procedures, which involve Gabor-LBP histogram sequence extraction and SIFT-based similarity measurement, have no statistical or learning stages. Thus the method can be simpler and faster than the conventional Gabor-LBP histogram algorithm.
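
A sketch of the selected histogram intersection, following the score matching function described above. It assumes a uniform `rows` × `cols` grid with row-major subregion numbering, and that each matched key-point contributes one intersection term (so a subregion pair can be counted more than once). In practice each subregion's histogram would be the concatenation over all Gabor-LBP images; the function names are hypothetical.

```python
import numpy as np


def subregion_index(x, y, width, height, rows, cols):
    """rho(x, y): row-major index of the grid cell that contains point (x, y)."""
    col = min(int(x * cols / width), cols - 1)
    row = min(int(y * rows / height), rows - 1)
    return row * cols + col


def sift_selected_score(hist_g, hist_p, matched_points, width, height, rows, cols):
    """Sum over matched key-points n of the intersection of H^G_{a_n} and H^P_{b_n}.

    hist_g, hist_p: arrays of shape (rows * cols, bins), one histogram per subregion.
    matched_points: list of ((x_g, y_g), (x_p, y_p)) SIFT-matched coordinates.
    """
    score = 0.0
    for (xg, yg), (xp, yp) in matched_points:
        a = subregion_index(xg, yg, width, height, rows, cols)  # a_n in the gallery
        b = subregion_index(xp, yp, width, height, rows, cols)  # b_n in the probe
        score += float(np.minimum(hist_g[a], hist_p[b]).sum())
    return score
```

Because $a_n$ and $b_n$ are looked up independently, a key-point that shifts into a neighbouring cell after misalignment is still compared with the cell it actually falls in, rather than with the same-numbered cell.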

3. Experimental Results

3.1. Database

The performance of the proposed method was evaluated on the AR face database [37]. The AR face database is a commonly used benchmark for face recognition, especially occluded face recognition. It consists of 3,510 images of 135 subjects (76 males and 59 females) and can be divided into facial expression (neutral, smile, anger, and scream), illumination (right, left, and full), and occlusion (sunglasses and scarf) sections. There are no restrictions on glasses, hairstyle, moustache, or beard. The test dataset used in this experiment is summarized in Table 1. In our experiments, each facial image is normalized to a gray-scale image in which the distance between the eye centers is 32 pixels, and the SQI image is obtained from this normalized gray-scale image by preprocessing. The gallery images are 135 neutral images without illumination variation, and the probe images are 270 occluded images without illumination variation, consisting of 135 sunglasses images and 135 scarf images.

3.2. Experiment 1: Preprocessing and Number of Subregions

The first experiment evaluated the conventional Gabor-LBP histogram approach [26] with respect to preprocessing and the number of subregions. The x-axis of the graph represents the rank k, meaning that the subject is retrieved at the kth rank; Rank-1 means that the correct subject is found at the first rank, that is, it has the highest matching score in the test dataset. The y-axis is the retrieval rate, the cumulative proportion of probe images whose correct subject is retrieved within the top k ranks. Both original images without preprocessing and SQI images were used. Face features were extracted using Gabor-LBP histograms, and facial image retrieval was performed on the sunglasses and scarf datasets separately. The other parameter in this experiment was the number of subregions: Gabor-LBP histograms were extracted in each small region so as to be robust against local variation. An m × n grid of subregions means that a facial image is divided into m rows and n columns of subregions. Two grid sizes were tested in the experiment. Figure 7 shows that SQI images perform better than the original images and that subregions achieve better performance than subregions. This shows that local variation affects the performance of face retrieval and that preprocessing and the number of subregions are important factors in Gabor-LBP-based face recognition algorithms.
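
The rank-k retrieval rate plotted in these curves is a cumulative match characteristic. The sketch below assumes one gallery image per subject (as in the 135-image gallery of Section 3.1) and a similarity matrix in which higher scores mean better matches, such as histogram-intersection scores; breaking ties by gallery order is a choice of this sketch, and the function name is hypothetical.

```python
import numpy as np


def cmc_curve(similarity, probe_labels, gallery_labels):
    """Cumulative retrieval rate at ranks 1..n_gallery (higher similarity = better).

    similarity[i, j] is the matching score between probe i and gallery image j.
    """
    sim = np.asarray(similarity, dtype=np.float64)
    gallery_labels = np.asarray(gallery_labels)
    n_probe, n_gallery = sim.shape
    hits = np.zeros(n_gallery)
    for i in range(n_probe):
        ranked = gallery_labels[np.argsort(-sim[i], kind="stable")]
        positions = np.flatnonzero(ranked == probe_labels[i])
        if positions.size:              # correct subject present in the gallery
            hits[positions[0]:] += 1    # found at this rank and every later rank
    return hits / n_probe
```

The first entry of the returned curve is the Rank-1 retrieval rate reported in Tables 2 and 3.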

3.3. Experiment 2: Comparison with Other Face Recognition Algorithms

For sunglasses occlusion, the performance of the proposed method was compared with that of other algorithms, as described in Table 2. The Gabor-LBP histogram achieved 75.56% accuracy at Rank-1 in the and subregions, and the SIFT total weight and SIFT probe weight methods achieved lower accuracy than the original Gabor-LBP histogram. Our analysis indicates that the Gabor-LBP histogram and SIFT total weight methods use the occluded region of the face, which introduces noise into recognition. The SIFT probe weight method accounts for the occlusion using SIFT key-points, but it does not consider the similarity between the gallery and probe images.

The proposed method achieved Rank-1 retrieval rates of 77.78% and 92.60% in the and subregions, respectively. This performance was about 20% higher than that of the other algorithms [26, 35, 36]. When SQI images were used for SIFT matching, the proposed method achieved greater accuracy than with the original images: 94.07% and 92.59% at Rank-1 in the and subregions. The other algorithms showed the same trend, with SQI images for SIFT matching improving performance. This confirms that SQI images yield more reliable SIFT key-points than the original images. In addition, with the original images used for SIFT matching, subregions give more precise face-representative information than subregions; with SQI images used for SIFT matching, however, the result was the opposite. We conjecture that when more precise and reliable key-points can be extracted, a histogram that covers more neighboring pixels around each key-point is more robust against local variation. Figure 8 compares the sunglasses occlusion results for subregions and subregions when SQI images are used in SIFT matching.
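
The reported rates are easier to interpret as counts. Assuming each Rank-1 rate is the fraction of the 135 probe images in the relevant subset (Section 3.1), each probe image is worth $1/135 \approx 0.74$ percentage points, and the percentages above correspond to whole numbers of correctly retrieved probes:

$$\frac{127}{135}\approx 94.07\%,\qquad \frac{125}{135}\approx 92.59\%,\qquad \frac{105}{135}\approx 77.78\%,\qquad \frac{102}{135}\approx 75.56\%,$$

$$\frac{127-102}{135}=\frac{25}{135}\approx 18.5\ \text{percentage points}.$$

On this assumption, the best configuration retrieves 25 more of the 135 sunglasses probes at Rank-1 than the 75.56% Gabor-LBP histogram result.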

Table 3 describes the performance comparison for scarf occlusion. The Gabor-LBP histogram achieved 97.94% at Rank-1 when SQI images were used in SIFT matching with subregions, which is the best performance in this experiment. The extensions of the Gabor-LBP histogram achieved approximately 89% to 95%. The proposed method achieved 93.33% and 91.11% accuracy in the and subregions, which is similar to or lower than the performance of the other algorithms. This indicates that occlusion of the lower part of the face has less influence on retrieval accuracy than other types of occlusion. It suggests that the discriminative face features are concentrated in the region around the eyes and that high performance can be achieved by using more information from the eye area: the Gabor-LBP histogram-based face recognition algorithms used all the subregions of the upper area of the facial image, whereas the proposed method used only the selected subregions containing SIFT-matched key-points. The combination of SQI images for SIFT matching and subregions shows the best performance in the experiment for scarf occlusion. Figure 9 shows the performance comparison for scarf occlusion when SQI images are used in SIFT matching.

Consequently, the proposed method achieved its best performance with subregions and SIFT matching on SQI facial images: 94.07% for sunglasses occlusion and 93.33% for scarf occlusion. The proposed method performs better than [26, 35, 36] for sunglasses occlusion, whereas the Gabor-LBP histogram-based face recognition algorithms outperform it for scarf occlusion. This analysis shows that the eyes and their surroundings are a significant area for face recognition: the lower part of the face is less important than the upper part, and occlusions that conceal the upper region of the face (including the eyes) are difficult to deal with. In that respect, the proposed method can contribute to facial image retrieval under eye-hiding occlusion such as sunglasses and hats.

4. Conclusions

We have described a facial image retrieval method for partially occluded facial images that uses SIFT-based similarity measurement. The similarity measurement through SIFT matching was used to (i) detect the occluded region of the face, (ii) find corresponding positions of the face, and (iii) compensate for face misalignment. The SIFT-based similarity measurement can detect the occluded region of probe images and find similar positions between gallery and probe images without complex image processing. For facial image retrieval, the proposed method adopted Gabor-LBP histogram-based face features, and SIFT matching was employed to measure the similarity between a gallery image and a probe image. The proposed method achieved retrieval rates of 94.07% and 93.33% for sunglasses and scarf occlusion, respectively, whereas the retrieval accuracy of the conventional Gabor-LBP histogram was nearly 75% and 98%, respectively. This shows that occlusions that hide the eyes and their surroundings have a greater influence on retrieval accuracy than other types of occlusion.

The proposed method improved performance by about 20% compared with other Gabor-LBP histogram-based face recognition algorithms for sunglasses occlusion. The average retrieval performance for scarf occlusion was almost 93%. Our analysis shows that the most discriminative face features lie around the eyes, and high performance can be achieved by using more information from this region. For scarf occlusion, the proposed method performed similarly to or worse than the other algorithms because it uses only the selected subregions and therefore fewer face features. The proposed method was robust against eye-hiding occlusion such as sunglasses and hats and could be applied to a facial image retrieval system for alignment-free facial images. Furthermore, the proposed method is based on a nonstatistical approach that does not need a supervised learning process. This is another advantage for facial image retrieval systems, because it avoids the problems posed by large-scale face databases and their frequent updates. However, the proposed method cannot match facial images when matched key-points are not extracted or are insufficient. Our future work will therefore focus on a more reliable fiducial key-point detector that yields sufficient key-points, and on a matching approach to replace SIFT.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.