Applied Computational Intelligence and Soft Computing

Research Article | Open Access

Volume 2021 | Article ID 6686759
Francis Ayiah-Mensah, Louis Asiedu, Felix O. Mettle, Richard Minkah, "Recognition of Augmented Frontal Face Images Using FFT-PCA/SVD Algorithm", Applied Computational Intelligence and Soft Computing, vol. 2021, Article ID 6686759, 9 pages, 2021.

Recognition of Augmented Frontal Face Images Using FFT-PCA/SVD Algorithm

Academic Editor: Cheng-Jian Lin
Received: 29 Dec 2020
Revised: 30 Mar 2021
Accepted: 08 Apr 2021
Published: 17 Apr 2021


In spite of variations in visual stimuli such as ageing, changing conditions of a person, and occlusion, the human eye can recognise a face at a glance even many years after a previous encounter. It has been established that facial differences such as hairstyle changes, growing a beard, wearing glasses, and other forms of occlusion can hardly hinder the human brain from recognising a face. However, the same cannot easily be said about automated intelligent systems developed to mimic this skill of the human brain. There has been growing interest in developing a resilient and efficient recognition system, mainly because of its numerous application areas (access control, entertainment/leisure, security systems based on biometric data, and user-friendly human-machine interfaces). Although there have been numerous studies on face recognition under varying pose, illumination, expression, and image degradation, the problems caused by occlusion are mostly ignored. This study therefore focuses on facial occlusion and proposes an enhancement mechanism, face image augmentation, to improve the recognition of occluded face images. The study assessed the performance of Principal Component Analysis with Singular Value Decomposition using Fast Fourier Transform (FFT-PCA/SVD) as a preprocessing face recognition algorithm on face images with missingness and on an augmented face image database. It was found that the average recognition rates for the FFT-PCA/SVD algorithm were the same (90%) when face images with missingness and augmented face images were used as test images, respectively. The statistical evaluation revealed that there exists a significant difference in the average recognition distances for the face images with missingness and the augmented face images when FFT-PCA/SVD is used for recognition. Augmented face images tend to have a relatively lower average recognition distance when used as test images. This finding is contrary to the equal-performance assessment by the adopted numerical technique. The MICE algorithm is therefore recommended as a suitable imputation mechanism for enhancing the performance of the face recognition system.

1. Introduction

Face recognition technology in image processing is the process of identifying one or more people in images or videos by analysing and comparing patterns. In face recognition, an algorithm is used to extract facial features to be compared to a database with the aim of finding the best match. The computational model for face recognition developed by Turk and Pentland [1] created the present awareness in this field of research. According to Sharif et al. [2], viable technology has enhanced face recognition to gain a significant position in image processing, but constraints such as ageing, facial expression, and occlusion have not been adequately addressed. Occlusion refers to the situation whereby extraneous objects hinder face recognition, and it represents missing information or holes in the facial images [3]. Cao et al. [4] posited that facial hair such as a beard or moustache and the use of accessories like sunglasses or scarves can cause occlusion. Also, according to Thazheena and Aswathy Devi [5], a face can be occluded if part of its area is hidden by worn objects such as sunglasses, a mask, a hat, or a scarf covering the eye and mouth regions. They classified occlusion as natural or synthetic. Natural occlusion refers to unintentional obstruction of views between objects in an image. On the other hand, an artificial blockade or intentional covering of an image view is synthetic occlusion.

Some of the different reasons people use such items are outlined in the following examples. Firstly, for safety reasons, medical staff wear surgical masks for protection when treating patients, and people on construction sites wear helmets for protection against head injuries. Secondly, people wear veils because of their cultural practices or religious convictions. Thirdly, bank robbers, automated teller machine (ATM) fraudsters, shop thieves, and football hooligans use items to cover their faces when they want to commit illegal actions. Finally, sportsmen and women use helmets for rugby, swimming caps for swimming, etc. during sports competitions. The use of most of these items changes the appearance of the original face and poses challenges to the recognition system due to distortion of face representations.

According to Gonzalez-Sosa et al. [6], the effect of occlusion is data missingness in a face matrix. This degrades the facial image and reduces the performance of the algorithm for recognition, hence the need to recover such portions to obtain the entire face.

Min et al. [7] proposed an efficient approach which consists of first detecting the presence of a scarf/sunglasses and then processing the nonoccluded facial regions only. The occlusion detection problem was approached using Gabor wavelets, PCA, and support vector machines (SVM), while the recognition of the nonoccluded facial part is performed using block-based local binary patterns. Their experiments on the AR face database showed that the proposed method yields significant performance improvements over some existing works for recognizing partially occluded and also nonoccluded faces. Their proposed method gave an average recognition rate of 94.58%, 92.08%, and 75.83% when used to recognize nonoccluded faces, scarf occlusions, and sunglass occlusions, respectively.

Miyakoshi and Kato [8] proposed an imputation method using a Bayesian Network (BN) with weighted learning to impute missing values in face images. Their system successfully classified test samples with missing values. The missing values were imputed by the proposed method, with better success than some conventional imputation methods (Normal BN, Weighted KNN, Mean, and Support Vector Regression). The weighted Bayesian Network gave an accuracy rate of 83.6%, 65.6%, 76.5%, and 58.5% when the test image had no occlusion, brows occlusion, eye occlusion, and mouth occlusion, respectively.

Wright et al. [9] developed robust face recognition via sparse representation. The method required a proper harnessing of sparsity for a proper choice of features, and a sufficiently large number of features was needed to obtain correct computations. The drawback of this method was ascertaining the number of features that would be sufficient for correct computation. A Sparse Representation Classification (SRC) was proposed from the theory of sparse representation to predict how much occlusion can be handled by a recognition algorithm, in addition to the choice of training images, so as to maximize robustness to occlusion.

Further investigations proved that the Collaborative Representation (CR) that makes the sparse representation classification powerful was ignored in the earlier works. Hence, a Collaborative Representation Classification (CRC) was combined with Regularized Least Square (RLS) to obtain a new classification method (CRC_RLS) [10]. Although the CRC_RLS was an improvement to the SRC, the Laplacian or Gaussian distributions restrict the coding coefficient and the coding errors statistically.

Asiedu et al. [11] leveraged the bilateral symmetry of natural objects to reconstruct frontal face images from left and right segmented images. They further evaluated the performance of the FFT-PCA/SVD algorithm on the reconstructed face image database.

The use of symmetry has a limitation in situations where the same area in both the right and the left parts of the face is occluded. These constraints in the aforementioned method necessitate the need to explore other alternatives in the face recognition system to recognize face images with occlusions. The goal is to enhance the recognition module to yield high precision. To this end, this study seeks to enhance the face recognition system by imputing occluded parts of the face (face augmentation).

2. Materials and Methods

2.1. Source of Data

The study adopted the Massachusetts Institute of Technology (MIT) (2003–2005) database to benchmark the face recognition algorithm. It is a secondary database comprising ten subjects, each captured under several different angular poses. For this study, we concentrated on the face images captured under the straight (frontal) pose. The train-image database contains twenty frontal face images. Ten of these images were straight-pose images from the MIT (2003–2005) database. The images captured into the train-image database are denoted as train images and are used to train the algorithm. Figure 1 shows the images captured into the train image database.

Ten frontal images were acquired by creating random missingness in each image. These were captured into the test image database 1 labeled “occluded face images.” Figure 2 contains the images captured into the test image database 1.

2.1.1. Multivariate Imputation with Chain Equation (MICE)

One frequently used and widely applicable imputation method for dealing with missing data in statistics is MICE. This method is also known as sequential regression or Fully Conditional Specification (FCS) multiple imputation [12]. It is a very flexible method because it can handle different variable types, both discrete and continuous. Continuous variables are modeled through linear regression, whereas categorical variables are modeled through logistic regression.

It has three different phases which are similar to any other multiple imputation method: imputation, analysis, and pooling. It creates multiple imputations to overcome the limitation of single imputation. MICE uses Fully Conditional Specification (FCS) to preserve unique features such as bounds, skip patterns, interactions, and bracketed responses in the data [12].

The MICE operation is based on the assumption of missing at random (MAR) with the implication that missing value probability does not depend on the unobserved values but only on the observed values [12]. MICE can handle large datasets through the use of chain equations as compared to other imputation methods that use joint models [13]. This makes it a powerful and flexible multiple imputation method that uses a number of regression algorithms.

In this study, ten frontal face images were acquired through augmentation of the images with missingness using the Multivariate Imputation with Chain Equations (MICE) algorithm. These images were captured into the test image database 2 labeled “augmented images.” Figure 3 contains the images captured into the test image database 2.
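The chained-equations idea behind MICE can be sketched in NumPy: each column containing missing pixels is repeatedly regressed on the remaining columns, and the missing cells are replaced by the fitted predictions. This is a hypothetical single-imputation sketch for illustration only; it omits the stochastic draws and multiple imputed datasets of full MICE, for which a dedicated package would normally be used.

```python
import numpy as np

def mice_impute(X, n_iter=10):
    """Minimal chained-equations imputation sketch (not full MICE).

    Missing entries are NaN. Each column with holes is regressed on the
    other columns via least squares, and its missing cells are replaced
    by the fitted values; the sweep is repeated n_iter times.
    """
    X = X.astype(float).copy()
    mask = np.isnan(X)                               # remember where the holes are
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])  # start from a mean fill
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])       # design matrix
            beta, *_ = np.linalg.lstsq(A[~miss], X[~miss, j], rcond=None)
            X[miss, j] = A[miss] @ beta              # update only the imputed cells
    return X
```

On perfectly correlated toy data (second column equal to twice the first), the regression step recovers the missing value exactly, which is the behaviour that makes regression-based imputation preferable to a plain mean fill for face pixels with strong spatial correlation.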

To keep the data uniform, captured images were digitised into gray scale, resized to uniform dimensions, and converted to double precision for preprocessing. This made the images (matrices) conformable and facilitated computation [11].

2.2. Research Design

In the recognition system, the train images are preprocessed using the adopted mean centering and Fast Fourier Transform (FFT) mechanisms. The important features of the preprocessed face images are extracted using the PCA/SVD algorithm. The extracted features are stored in memory as a created knowledge for recognition.

This study considered two test image databases: frontal face images with random missingness (shown in Figure 2) and frontal face images acquired through augmentation of the images with missingness using MICE algorithm (shown in Figure 3). The test images are also preprocessed using the mean centering and Fast Fourier Transform (FFT) mechanisms and their unique features are extracted using PCA/SVD for recognition. These features are passed to the classifier where they are matched with the knowledge created from the train images for recognition. It is worthy of note that only one test image database is used in the recognition module along with the train image database at a time. Figure 4 shows a design of the recognition module.

2.3. Preprocessing

Preprocessing is an initial stage of face recognition where the quality of the images is enhanced. According to Asiedu et al. [11], preprocessing techniques are used to denoise the images, making them better conditioned for recognition. Preprocessing of an image basically removes acquired noise and suppresses unwanted distortion of image features. This improves the quality of the image for feature extraction. Mean centering and Fast Fourier Transform mechanisms were adopted for preprocessing. Details of the Fast Fourier Transform and mean centering mechanisms used for the preprocessing of the face images are presented in Section 2.3.1 and Section 2.3.2, respectively.

2.3.1. Fast Fourier Transform

Fast Fourier Transform (FFT) was adopted as a noise reduction mechanism. According to Glynn [14], the FFT algorithm reduces the computational burden of the discrete Fourier transform from $O(N^2)$ to $O(N \log N)$ arithmetic operations. Zhang et al. [15] and Asiedu et al. [16] demonstrated that the application of FFT in the image preprocessing stage improves the recognition system.

The DFT of a column vector, $\mathbf{x}_j$, is represented mathematically as

$$F_j(u) = \sum_{n=0}^{N-1} x_j(n)\, e^{-i 2\pi u n / N}, \quad u = 0, 1, \ldots, N-1,$$

where $i = \sqrt{-1}$ and $N$ is the length of the vector. Here, $\mathbf{x}_j$ is the $j$th column of the image matrix $X$ [11].

The Gaussian filter was adopted for filtering the face images after the Discrete Fourier transformation because of the Gaussian nature of illumination variations [11].

After filtering, the Inverse Discrete Fourier Transform (IDFT) was performed to reconstruct the images into their original forms. The IDFT is given by

$$x_j(n) = \frac{1}{N} \sum_{u=0}^{N-1} F_j(u)\, e^{i 2\pi u n / N}, \quad n = 0, 1, \ldots, N-1.$$

The output after the inverse transformation is usually complex. The real components are extracted to be used at the feature extraction stage whereas the imaginary component is discarded as noise. Figure 5 shows the stages in FFT preprocessing of an image.
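The FFT preprocessing pipeline described above (forward transform, Gaussian filtering, inverse transform, retention of the real part) can be sketched in NumPy as follows. The filter width `sigma` is an assumed tuning parameter for illustration, not a value taken from the study.

```python
import numpy as np

def fft_preprocess(img, sigma=30.0):
    """FFT-based denoising sketch: 2-D DFT -> Gaussian low-pass -> inverse DFT.

    Returns the real component of the inverse transform; the imaginary
    component is discarded as noise, as in the text above.
    """
    F = np.fft.fftshift(np.fft.fft2(img))       # 2-D DFT with DC term centred
    h, w = img.shape
    y, x = np.ogrid[:h, :w]
    d2 = (y - h / 2) ** 2 + (x - w / 2) ** 2    # squared distance from centre
    G = np.exp(-d2 / (2.0 * sigma ** 2))        # Gaussian low-pass mask
    filtered = np.fft.ifft2(np.fft.ifftshift(F * G))
    return filtered.real                        # keep real part only
```

As a sanity check, a very wide filter (large `sigma`) leaves the image essentially unchanged, while a narrow one suppresses high-frequency noise.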

2.3.2. Mean Centering

Given the image sample $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M]$, whose elements are the vectorised forms of the individual images in the study, mean centering is performed by subtracting the mean image from the individual images under study. The mean image is given by

$$\bar{\mathbf{x}} = \frac{1}{M} \sum_{i=1}^{M} \mathbf{x}_i,$$

the mean centering of the $i$th image is given by

$$\boldsymbol{\phi}_i = \mathbf{x}_i - \bar{\mathbf{x}},$$

and $\Phi = [\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_M]$ is the mean-centered matrix of the face space.
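Mean centering amounts to one subtraction per image. A minimal NumPy illustration, with toy pixel values in place of real face images:

```python
import numpy as np

# Vectorised images as columns of X (pixels x images); toy values for illustration.
X = np.array([[1., 3.],
              [2., 6.],
              [4., 8.]])
mean_face = X.mean(axis=1, keepdims=True)   # the mean image
Phi = X - mean_face                         # mean-centred face space
```

By construction, each row of `Phi` sums to zero, i.e. the centred images average out to the zero image.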

2.4. Feature Extraction

The storage of face images requires reducing the dimensions of the original images using feature extraction methods, owing to the large size of the face image space. We adopt Principal Component Analysis (PCA) as a dimensionality reduction algorithm to extract the most significant components, that is, those which are more informative and less redundant, from the original data. PCA can be used to find a lower dimensional subspace which identifies the axes with maximum variance [17].

As stated earlier, the FFT-PCA/SVD algorithm was used to train the image database to extract unique face features for recognition.

We now present the mathematical underpinnings of the face feature extraction mechanism, PCA/SVD, as described by Asiedu et al. [16].

A set of orthonormal vectors, $\mathbf{u}_k$, which best describes the distribution of the data is required for feature extraction. The vector $\mathbf{u}_k$ is chosen such that

$$\lambda_k = \frac{1}{M} \sum_{i=1}^{M} \left( \mathbf{u}_k^{T} \boldsymbol{\phi}_i \right)^2$$

is a maximum, subject to the orthonormality constraints

$$\mathbf{u}_l^{T} \mathbf{u}_k = \delta_{lk} = \begin{cases} 1, & l = k, \\ 0, & \text{otherwise.} \end{cases}$$

The vectors $\mathbf{u}_k$ and scalars $\lambda_k$ are the eigenvectors and eigenvalues, respectively, of the variance-covariance matrix given as

$$C = \frac{1}{M} \sum_{i=1}^{M} \boldsymbol{\phi}_i \boldsymbol{\phi}_i^{T} = \frac{1}{M} \Phi \Phi^{T}.$$

Through Singular Value Decomposition (SVD) of the covariance matrix $C$, the eigenvalues and their corresponding eigenvectors are extracted. The SVD gives two orthogonal matrices $U$ and $V$ and a diagonal matrix $\Sigma$ such that $C = U \Sigma V^{T}$.

The eigenfaces are then obtained from the decomposition as

$$U = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_M],$$

where $\mathbf{u}_k$ is the $k$th column vector of $U$.

The extracted principal components from the training set are given as

$$\omega_{ik} = \mathbf{u}_k^{T} \boldsymbol{\phi}_i, \quad \Omega_i = [\omega_{i1}, \omega_{i2}, \ldots, \omega_{iK}]^{T},$$

for $i = 1, 2, \ldots, M$ and $k = 1, 2, \ldots, K$, where $K$ is the number of retained components. These constitute the created knowledge which is stored in memory for recognition.
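The feature extraction steps above (mean centering, SVD, projection onto the leading eigenfaces) can be sketched compactly in NumPy. Working with the SVD of the centred face space $\Phi$ directly, rather than forming the covariance matrix explicitly, is an implementation convenience assumed here; the left singular vectors of $\Phi$ are the eigenvectors of $\Phi\Phi^{T}$.

```python
import numpy as np

def extract_features(train, k=5):
    """Eigenface extraction sketch via SVD of the mean-centred face space.

    `train` holds one vectorised image per column. Returns the mean image,
    the top-k eigenfaces, and the principal components of each train image.
    """
    mean_face = train.mean(axis=1, keepdims=True)
    Phi = train - mean_face                        # mean-centred images
    U, S, Vt = np.linalg.svd(Phi, full_matrices=False)
    eigenfaces = U[:, :k]                          # top-k eigenfaces
    weights = eigenfaces.T @ Phi                   # created knowledge (K x M)
    return mean_face, eigenfaces, weights
```

The returned `weights` matrix is the stored knowledge against which test images are later matched.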

2.5. Recognition Process

We may recall that there are two test image databases: frontal face images with random missingness (shown in Figure 2) and frontal face images acquired through augmentation of the images with missingness using MICE (shown in Figure 3). When a test image $\mathbf{x}^{\ast}$ is passed through the recognition module, its unique features are extracted as

$$\omega_{k}^{\ast} = \mathbf{u}_k^{T} \left( \mathbf{x}^{\ast} - \bar{\mathbf{x}} \right), \quad k = 1, 2, \ldots, K,$$

and $\Omega^{\ast} = [\omega_{1}^{\ast}, \omega_{2}^{\ast}, \ldots, \omega_{K}^{\ast}]^{T}$ contains the principal components (extracted features) of the test image.

The recognition distances (Euclidean distances) are computed as

$$d_i = \left\| \Omega^{\ast} - \Omega_i \right\|, \quad i = 1, 2, \ldots, M.$$

The minimum Euclidean distance,

$$d_{\min} = \min_{i} d_i, \quad i = 1, 2, \ldots, M,$$

is chosen as the recognition distance, and the test image is matched to the corresponding train image.
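The matching step amounts to a nearest-neighbour search in eigenface space, which can be sketched as follows (the argument names mirror the quantities defined above; this is an illustrative sketch, not the study's code):

```python
import numpy as np

def recognise(test_img, mean_face, eigenfaces, train_weights):
    """Nearest-neighbour matching in eigenface space (sketch).

    Projects the vectorised test image onto the stored eigenfaces and
    returns the index of the train image with the smallest Euclidean
    recognition distance, together with that distance.
    """
    omega = eigenfaces.T @ (test_img.reshape(-1, 1) - mean_face)  # test features
    dists = np.linalg.norm(train_weights - omega, axis=0)         # d_i for all i
    best = int(np.argmin(dists))                                  # minimum distance
    return best, float(dists[best])
```

A train image used as its own test image is matched to itself with a recognition distance of (numerically) zero, which is the sense in which a lower distance signifies a closer match.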

3. Results and Discussion

Figure 6 presents the recognition distances and matches for the face images with missingness and the augmented face images. It can be seen in Figure 6 that there was one mismatch when the face images with missingness were used as test images for recognition. Also, there was one mismatch when the augmented face images were used as test images for recognition.

3.1. Numerical Evaluations

The average recognition rate was adopted as the numerical assessment criterion for assessing the performance of the algorithm on both databases. The average recognition rate, $R$, of an algorithm is defined as

$$R = \frac{1}{Q} \sum_{q=1}^{Q} \frac{r_q}{n_q} \times 100\%,$$

where $Q$ is the number of experimental runs, $r_q$ is the number of correct recognitions in the $q$th run, and $n_q$ is the total number of faces being tested in each run [16]. The average error rate, $E$, is defined as $E = 100\% - R$.

The total number of correct recognitions for the study algorithm is 9.

The total number of experimental runs is $Q = 1$, and the total number of images in a single experimental run is $n = 10$.

Now, using the face images with missingness as test images, the average recognition rate of the study algorithm (FFT-PCA/SVD) is

$$R = \frac{1}{1} \times \frac{9}{10} \times 100\% = 90\%.$$

The average error rate is

$$E = 100\% - 90\% = 10\%.$$

Using the augmented face images as test images, the average recognition rate of the study algorithm (FFT-PCA/SVD) is likewise

$$R = \frac{1}{1} \times \frac{9}{10} \times 100\% = 90\%,$$

and the average error rate is

$$E = 10\%.$$
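The rate computation above is a one-line calculation; the snippet below reproduces it for the single experimental run of this study ($Q = 1$, $r = 9$, $n = 10$):

```python
# Average recognition rate following the definition above:
# R = (1/Q) * sum(r_q / n_q) * 100%, with one run of 9 correct out of 10.
runs = [(9, 10)]                       # (correct recognitions, images tested) per run
rate = 100.0 * sum(r / n for r, n in runs) / len(runs)
error = 100.0 - rate
print(rate, error)                     # 90.0 10.0
```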

3.2. Statistical Evaluation

The recognition distances were obtained from the same subjects in two different treatments (images with missingness and augmented images with MICE); hence, a paired sample test is suitable for statistical evaluation of the experiment.

Let $X_{1i}$ denote the recognition distance recorded using face images with missingness as test images and $X_{2i}$ denote the recognition distance using the augmented images as test images for the $i$th individual; then the paired differences

$$d_i = X_{1i} - X_{2i}, \quad i = 1, 2, \ldots, n,$$

should reflect the differential effects of the treatments.

Now, assuming the differences $d_i$, $i = 1, 2, \ldots, n$, are independent observations from a $N(\mu_d, \sigma_d^2)$ distribution, the statistic

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}},$$

where

$$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$$

and

$$s_d^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2,$$

has a t-distribution with $n - 1$ degrees of freedom. Consequently, an $\alpha$-level test of the hypothesis $H_0: \mu_d = 0$ against $H_1: \mu_d \neq 0$ is conducted by comparing $|t|$ with $t_{\alpha/2}(n-1)$, the upper $100(\alpha/2)$th percentile of the t-distribution with $n - 1$ degrees of freedom. Note that $\mu_d = 0$ implies that the mean difference of recognition distances from face images with missingness and augmented face images is zero.

To make a decision as to whether or not to reject $H_0$, the $p$ value corresponding to the computed statistic is compared to the level of significance. A $100(1-\alpha)\%$ confidence interval for the mean difference in recognition distance is constructed as

$$\bar{d} \pm t_{\alpha/2}(n-1) \frac{s_d}{\sqrt{n}}.$$
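The paired-sample statistic defined above is straightforward to compute with the standard library; the snippet below uses small illustrative values, not the study's actual recognition distances:

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-sample t statistic sketch for the evaluation above.

    x, y are measurements on the same subjects under two treatments.
    Returns (t, degrees of freedom) for testing H0: mu_d = 0.
    """
    d = [a - b for a, b in zip(x, y)]          # paired differences d_i
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))    # d-bar / (s_d / sqrt(n))
    return t, n - 1

# Illustrative data only (hypothetical distances for four subjects).
t_stat, df = paired_t([3, 5, 4, 6], [1, 2, 2, 3])
```

In practice the resulting $t$ would be compared against the t-distribution with `df` degrees of freedom to obtain the $p$ value.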

Some assumptions underpinning the paired sample t-test are that the observation must be paired and independent and the observed differences should be normally distributed.

Each observation of one treatment (recognition distances for face image with missingness) has a match (paired) with a corresponding observation in the other treatment (recognition distances for augmented face images). This satisfies the assumption of paired observations.

The assumption of independence of the observations is attained since different subjects were captured in the study database.

Table 1 shows the sample statistics of recognition distances for the subjects used in the study when face images with missingness and augmented faces images are used as test images.

                      Mean       N    Std. deviation   Std. error mean
Pair 1   Missing      202.5990   10   56.21612         17.77710
         Augmented    134.0239   10

It is evident from Table 1 that the average recognition distance after MICE augmentation (134.0239) is lower than that with missingness (202.5990). We now assess the normality of the observed differences through the Shapiro–Wilk test. The test was performed to check whether the distribution of the observed differences is the same as the expected (normal) distribution. The test statistic value is 0.908 with a corresponding p value of 0.267. This means the distribution of the observed differences is consistent with the expected (normal) distribution. Upon violation of the normality assumption, we would have resorted to the nonparametric counterpart (the related-samples Wilcoxon signed-rank test) of the paired sample test.

The paired samples correlation between the recognition distances for the face images with missingness and the augmented face images is 0.611 with a p value of 0.06. This indicates that there exists a moderate positive linear relationship between the recognition distances for the face images with missingness and the recognition distances of the augmented face images. The p value of 0.06 indicates that this relationship is not significant at the 5% level of significance. The results of the paired sample t-test are shown in Table 2.

                      Mean      Std. dev.   SE   95% CI   t       df   p value
Pair 1   MD - AD      68.5751                             3.979   9    0.003

It can be seen from Table 2 that the average observed difference between the recognition distances for face images with missingness (MD) and the recognition distances for augmented face images (AD) is 68.5751. The test statistic value from the paired sample test is 3.979 with a corresponding p value of 0.003. It is evident from the p value that there is a significant difference between the average recognition distance for face images with missingness and that for augmented face images. It can be inferred that the average recognition distance for face images with missingness is significantly higher (at the 5% level of significance) than the average recognition distance for augmented face images.

4. Conclusion and Recommendation

The FFT-PCA/SVD algorithm produced average recognition rates of 90% each when face images with missingness and augmented face images were used as test images, respectively. It is evident from the results of the numerical assessment that the recognition algorithm has equal performance when face images with missingness and augmented face images are used as test images.

The numerical results (average recognition rate of 90%) of this study were appreciable and consistent with those of Min et al. [7] and Miyakoshi and Kato [8] despite the differences in occlusion criteria (random occlusions; brows, eye, and mouth occlusions; scarf and sunglass occlusions) and the database used to benchmark the recognition/classification systems.

The statistical assessment revealed that there is a significant difference between the average recognition distance for face images with missingness and recognition distances for augmented face images. The average recognition distance for face images with missingness is significantly higher than the average recognition distances for augmented face images. This means the enhancement of the face recognition module through augmentation with MICE algorithm is a viable improvement as it provides relatively lower recognition distances. A relatively lower recognition distance is desired as it signifies close match.

The numerical assessment mechanism failed to reveal the hidden effect of the enhancement with the MICE algorithm, whereas the statistical evaluation procedure was able to uncover this effect. This can be attributed to the fact that the statistical evaluation mechanism is a more data-driven approach to assessing the performance of the recognition algorithm. It may also be a result of the low level of missingness created. The superiority of the MICE augmentation mechanism may be demonstrated better in subsequent studies by increasing the level of missingness created in the face images.

From the findings of the study, the MICE algorithm is therefore recommended as a suitable imputation mechanism for enhancing/improving the performance of recognition modules/systems when used to recognize occluded face images.

Data Availability

The PGM image data supporting this research are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

References


  1. M. Turk and A. Pentland, “Face recognition using eigenfaces,” in Proceedings of the 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 586–591, Maui, HI, USA, June 1991.
  2. M. Sharif, F. Naz, M. Yasmin, M. A. Shahid, and A. Rehman, “Face recognition: a survey,” Journal of Engineering Science & Technology Review, vol. 10, pp. 1–7, 2017.
  3. A. Colombo, C. Cusano, and R. Schettini, “Three-dimensional occlusion detection and restoration of partially occluded faces,” Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 105–119, 2011.
  4. L. Cao, H. Li, H. Guo, and B. Wang, “Robust PCA for face recognition with occlusion using symmetry information,” in Proceedings of the 2019 IEEE 16th International Conference on Networking, Sensing and Control (ICNSC), pp. 323–328, Banff, Alberta, Canada, May 2019.
  5. T. Thazheena and T. Aswathy Devi, “A review on face detection under occlusion by facial accessories,” International Research Journal of Engineering and Technology (IRJET), vol. 4, pp. 672–674, 2017.
  6. E. Gonzalez-Sosa, R. Vera-Rodriguez, J. Fierrez, and J. Ortega-Garcia, “Dealing with occlusions in face recognition by region-based fusion,” in Proceedings of the 2016 IEEE International Carnahan Conference on Security Technology (ICCST), pp. 1–6, Orlando, FL, USA, October 2016.
  7. R. Min, A. Hadid, and J.-L. Dugelay, “Improving the recognition of faces occluded by facial accessories,” in Proceedings of the 2011 IEEE International Conference on Automatic Face & Gesture Recognition, pp. 442–447, IEEE, Santa Barbara, CA, USA, March 2011.
  8. Y. Miyakoshi and S. Kato, “A missing value imputation method using a Bayesian network with weighted learning,” Electronics and Communications in Japan, vol. 95, no. 12, pp. 1–9, 2012.
  9. J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2008.
  10. L. Zhang, M. Yang, and X. Feng, “Sparse representation or collaborative representation: which helps face recognition?” in Proceedings of the 2011 International Conference on Computer Vision, pp. 471–478, Barcelona, Spain, November 2011.
  11. L. Asiedu, F. O. Mettle, and J. A. Mensah, “Recognition of reconstructed frontal face images using FFT-PCA/SVD algorithm,” Journal of Applied Mathematics, vol. 2020, Article ID 9127465, 8 pages, 2020.
  12. S. Van Buuren, Flexible Imputation of Missing Data, CRC Press, Boca Raton, FL, USA, 2018.
  13. Y. He, A. Zaslavsky, M. Landrum, D. Harrington, and P. Catalano, “Multiple imputation in a large-scale complex survey: a practical guide,” Statistical Methods in Medical Research, vol. 19, no. 6, pp. 653–670, 2010.
  14. E. Glynn, “Fourier analysis and image processing,” Scientific Programmer, Bioinformatics, 2007.
  15. D. Zhang, D. Ding, J. Li, and Q. Liu, “PCA based extracting feature using fast Fourier transform for facial expression recognition,” in Transactions on Engineering Technologies, pp. 413–424, Springer, New York, NY, USA, 2015.
  16. L. Asiedu, A. Adebanji, F. Oduro, and F. O. Mettle, “Statistical evaluation of face recognition techniques under variable environmental constraints,” International Journal of Statistics and Probability, vol. 4, pp. 93–111, 2015.
  17. P. Wagner, “Face recognition with Python,” 2012.

Copyright © 2021 Francis Ayiah-Mensah et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
