Assessing the Effect of Data Augmentation on Occluded Frontal Faces Using DWT-PCA/SVD Recognition Algorithm
The drift towards face-based recognition systems can be attributed to recent advances in supportive technology and emerging areas of application including voting systems, access control, human-computer interaction, entertainment, and crime control. Despite the obvious advantages of such systems being less intrusive and requiring minimal cooperation of subjects, the performance of their underlying recognition algorithms is challenged by the quality of face images, usually acquired from uncontrolled environments with poor illumination, varying head poses, ageing, facial expressions, and occlusions. Although several researchers have leveraged the property of bilateral symmetry to reconstruct half-occluded face images, their approach becomes deficient in the presence of random occlusions. In this paper, we harnessed the benefits of the multiple imputation by chained equations (MICE) technique and image denoising using Discrete Wavelet Transforms (DWTs) to reconstruct degraded face images with random missing pixels. Numerical evaluation of the study algorithm gave a perfect (100%) average recognition rate each for recognition of occluded and augmented face images. The study also revealed that the average recognition distance for the augmented face images (75.5811) was significantly lower than the average recognition distance (430.7153) of the occluded face images. MICE augmentation is recommended as a suitable data enhancement mechanism for imputing missing data/pixels of occluded face images.
1. Introduction
Face recognition systems suffer suboptimality due to the lack of effective image preprocessing approaches. Thus, the use of image enhancement techniques and their effects on the performance of face recognition algorithms have been studied by several researchers.
Rana et al. assessed the effect of image enhancement techniques on the recognition rate of facial recognition algorithms under varying illumination, face orientations, and expressions. Their results showed up to 75% improvement in recognition rate when image enhancement was applied. They also found that the highest recognition rate was achieved under low-light conditions when image noise was reduced using a median smoothing filter.
Abdul-Jabbar showed that preprocessing steps such as image adjustment, histogram equalization, and change of file format, when applied to enhance the contrast and quality of face images, improve recognition accuracy by up to 30% across different face recognition algorithms compared with using the original database of face images. In the case where half of the face is degraded due to occlusions, several researchers [4, 5] have leveraged the bilateral symmetry of face images to reconstruct the full-face images and have used different denoising techniques to enhance image quality. Asiedu et al. reconstructed frontal face images from left and right half-face images using principal component analysis and singular value decomposition with fast Fourier transforms in the preprocessing stage (FFT-PCA/SVD). They reported no statistically significant difference in the average recognition distances for the left and right reconstructed face images. However, numerical evaluation gave a higher average recognition rate for the left reconstructed face images (95%) than for the right reconstructed face images (90%). It is worth noting that the assumption of occluded half-face images is untenable in practice and becomes deficient in the presence of random occlusions. Also, the Fourier transform denoises an image only in the frequency domain of the original image. Wavelet-based denoising techniques, on the other hand, have the advantage of providing both spatial and frequency representations, which contribute more to noise reduction. In addition, the scope of their work was limited to image degradation due to half-face occlusions. The problem of image degradation due to random missing pixels or patches was not addressed.
Li et al.  evaluated the denoising performances of Sliding Window Average (SWA) and DWT in eliminating random fluctuations in sensor data for sensor fault detection and isolation in a nuclear power plant based on mean square error (MSE), signal-to-noise ratio (SNR), and correlation (COR) between the original data and denoised measurement. Their result showed superiority with respect to all test indexes when the DWT technique was used.
When image degradation is due to missing pixels or patches, imputation techniques provide a means of approximating such pixel values or patches by assuming that pixels in the known and unknown portions of degraded face images share the same statistical properties or geometric structures. Two such approaches are the diffusion-based inpainting methods and the exemplar-based inpainting techniques, which have been successfully employed in restoring missing pixels or patches. According to Criminisi et al., the main drawback of diffusion-based methods is that the diffusion process introduces some blurriness, which becomes noticeable when filling larger regions. Apart from that, such methods are well suited to filling holes or small patches.
Zhang et al. applied an exemplar-based inpainting technique that uses surface fitting as prior knowledge together with angle-aware patch matching, and introduced a Jaccard similarity coefficient to improve the matching precision between patches, in order to restore missing blocks and large holes and to perform object removal. They asserted that their results outperformed many of the state-of-the-art methods in this domain. However, exemplar-based inpainting techniques are best suited to filling large textured areas.
Multiple statistical imputation methods have emerged as a vital approach to estimating randomly missing values. Such methods can account for uncertainty in imputations. The chained equations approach, in particular, is flexible and can handle both binary and continuous variables as well as complexities such as bounds or survey skip patterns.
In this study, we harness the benefits of the multiple imputation by the chained equation technique and image denoising using discrete wavelet transforms to reconstruct degraded face images with random missing pixels for recognition.
This work is motivated by the ever-growing number of applications of efficient and resilient intelligent systems. For more details on other application areas of intelligent systems, please refer to the work of Iwendi et al., where they performed an empirical analysis to determine the effectiveness and performance of deep-learning algorithms in detecting insults in social commentary, and the work of Gadekallu et al., where a crow search-based convolutional neural network model was implemented for gesture recognition in the human-computer interaction (HCI) domain.
The rest of the paper is organized as follows: Section 2 discusses the data acquisition, the adopted statistical or mathematical methods, the research design, and implementation. Section 3 presents and discusses the results of the algorithmic runs and the numerical and statistical evaluations. Section 4 examines the findings of the study in comparison with existing works in the literature, concludes by summarizing the overall achievements of the study, and presents some recommendations and directions for future developments.
2. Materials and Methods
2.1. Source of Data
The Massachusetts Institute of Technology (MIT) (2003-2005) and Japanese Female Facial Expressions (JAFFE) databases were adopted to benchmark the face recognition algorithm.
The MIT database contains frontal face images of ten individuals captured under eight different angular poses. For the purpose of this study, we used only the face images with a straight pose.
The JAFFE database contains frontal face images of ten individuals captured along six universally accepted principal emotions (neutral, angry, disgust, sad, surprise, and happy). In this study, only the neutral expression was used.
Figure 1 shows the face images of subjects in the train image database. Overall, the acquired train image database contains ten frontal face images each with straight pose from the MIT database (shown in Figure 1(a)) and ten frontal face images each with neutral expression from the JAFFE database (shown in Figure 1(b)). The images captured into the train image database are denoted as train images and are used to train the algorithm.
Two test image databases were used in the study. Test image database 1 was acquired by creating random missingness (10%) in each of the twenty frontal face images. The images in the test image database 1 are shown in Figure 2.
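The creation of random missingness can be illustrated with a minimal sketch. It assumes that missing pixels are marked as NaN and that positions are drawn uniformly at random without replacement; the function name `add_random_missingness`, the seed, and the small stand-in array are hypothetical and are not taken from the study.

```python
import numpy as np

def add_random_missingness(image, fraction=0.10, seed=0):
    """Return a float copy of `image` with `fraction` of its pixels set to NaN."""
    rng = np.random.default_rng(seed)
    degraded = np.array(image, dtype=float)  # copy, so the input stays intact
    n_missing = int(round(fraction * degraded.size))
    # Draw distinct pixel positions uniformly at random (no repeats).
    idx = rng.choice(degraded.size, size=n_missing, replace=False)
    degraded.flat[idx] = np.nan
    return degraded

# Hypothetical 10 x 10 stand-in for a gray-scale face image.
face = np.arange(100, dtype=float).reshape(10, 10)
occluded = add_random_missingness(face, fraction=0.10)
missing_share = np.isnan(occluded).mean()
```

Marking missing pixels as NaN keeps the observed pixels untouched and makes the missing positions easy to locate for a later imputation step.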
2.1.1. Multiple Imputation with Chained Equation (MICE)
Multiple Imputation (MI) uses the distribution of the observed data to estimate a set of plausible values for missing data . Random components are incorporated into these estimated values to reflect their uncertainty.
MICE, also known as the sequential regression or fully conditional specification multiple imputation, is a very flexible method because it can handle different variable types such as discrete and continuous.
According to Van Buuren , the MICE operation is based on the assumption of missing at random (MAR) with the implication that missing value probability is independent of the unobserved values but only depends on the observed values.
MICE has three phases, which are common to any multiple imputation method: imputation, analysis, and pooling. It creates multiple imputations to overcome the limitation of a single imputation.
In this study, we adopted the MICE algorithm to augment the occluded images due to its ability to handle large datasets through the use of chain equations as compared to other imputation methods that rely on joint models.
Ten frontal face images were acquired through augmentation of the images with missingness using MICE algorithm. The images were captured into test image database 2, as shown in Figure 3.
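The chained-equations idea can be sketched as follows. This is a simplified illustration and not the implementation used in the study: it assumes that each image column is treated as a variable, that each incomplete column is regressed linearly on all the other columns, and that the several imputed copies are simply averaged pixel by pixel. The function names, the number of iterations, and the synthetic test image are all hypothetical.

```python
import numpy as np

def chained_equation_impute(data, n_iter=10, seed=0):
    """One imputation chain: regress each incomplete column on all the others."""
    rng = np.random.default_rng(seed)
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    # Initialise every missing entry with the mean of its column.
    col_means = np.nanmean(x, axis=0)
    x[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        for j in np.where(missing.any(axis=0))[0]:
            obs = ~missing[:, j]
            design = np.column_stack([np.ones(len(x)), np.delete(x, j, axis=1)])
            coef, *_ = np.linalg.lstsq(design[obs], x[obs, j], rcond=None)
            sigma = (x[obs, j] - design[obs] @ coef).std()
            # Prediction plus a random draw that reflects imputation uncertainty.
            noise = rng.normal(0.0, sigma, size=int((~obs).sum()))
            x[~obs, j] = design[~obs] @ coef + noise
    return x

def multiple_imputation(data, m=5):
    """Average m imputed copies (a simplified stand-in for the pooling phase)."""
    return np.mean([chained_equation_impute(data, seed=s) for s in range(m)], axis=0)

# Synthetic smooth image with roughly 10% of its pixels removed at random.
rng = np.random.default_rng(1)
image = np.outer(np.linspace(50, 200, 30), np.linspace(0.5, 1.0, 6))
occluded = image.copy()
occluded[rng.random(image.shape) < 0.10] = np.nan
restored = multiple_imputation(occluded, m=5)
```

Only the missing entries are changed; observed pixels pass through every chain unchanged. A full MICE implementation would also draw the regression parameters from their posterior distribution, which this sketch omits.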
For uniformity, captured images were digitized into gray-scale precision and resized to common dimensions. The data types were also changed into double precision for preprocessing. This makes the matrices conformable and eases computation.
2.2. Research Design
Face images sent to the recognition system/module are first preprocessed through mean centering and Discrete Wavelet Transformation (DWT) mechanisms. The images in the train image database are the first to be sent to the recognition module for preprocessing. The preprocessed images are then passed to the feature extraction unit where the important features are extracted using the PCA/SVD algorithm. The extracted unique features are stored in memory as a created knowledge for recognition.
As stated earlier, two test image databases shown in Figures 2 and 3 were used in this study. The test images are also preprocessed using the mean centering and Discrete Wavelet Transform (DWT) mechanisms, and their unique features are extracted using PCA/SVD for recognition.
The unique features are passed to the classifier/recognition unit where they are matched with the stored knowledge created from the train images. In the classifier, the minimum recognition distance indicates a close match. It should be noted that only one test image is passed to the recognition module along with the train images at a time. The design of the study recognition module/system is shown in Figure 4.
2.3. Preprocessing
In digital image processing, the preprocessing phase serves as a data preparation step for contrast enhancement, noise reduction, or filtering. The main objective of image preprocessing is to improve the quality of the images by removing acquired noise and suppressing unwanted distortion of the image feature .
Among the existing image enhancement procedures, filtering techniques have become very popular over the years for addressing the problem of noise removal and edge enhancement [15, 16]. According to Bhattacharyya , other approaches, which include neuro-fuzzy-genetic and wavelet-based approaches, operate on the underlying data regardless of the distributions and operating parameters.
In this study, as indicated earlier, the mean centering and Discrete Wavelet Transformation (DWT) mechanisms were adopted for preprocessing. Details of the DWT and mean centering mechanisms are presented in Section 2.3.1 and Section 2.3.2, respectively.
2.3.1. Discrete Wavelet Transform (DWT)
A wavelet transform is an efficient tool for data approximation, compression, and noise removal [17, 18]. Kociołek et al. define the DWT as a linear transformation that operates on a data vector whose length is an integer power of two and transforms it into a numerically different vector of the same length. The DWT has received considerable attention in various signal-processing applications, including image watermarking. The primary objective of the DWT, as seen in multiresolution analysis, involves the decomposition of an image into frequency channels of constant bandwidth on a logarithmic scale. It provides a principled way of downsizing the range images and also captures both frequency and location information.
In the DWT cycle, an image is decomposed into four subbands denoted LL, LH, HL, and HH at the first level in the DWT domain, where LH, HL, and HH represent the finest scale wavelet coefficients and LL stands for the coarse-level coefficients. LL is denoted as the low-frequency band and HH as the high-frequency band. Specifically, the LL subband represents the lower resolution estimate of the original image, while the midfrequency and high-frequency detail subbands LH, HL, and HH carry the horizontal edge, vertical edge, and diagonal edge details.
The LL subband can further be decomposed to obtain another level of decomposition. This is because most of the energy in the original image is concentrated in the low-frequency subband. This makes the LL subband relatively free from noise. The decomposition process continues on the LL subband until the desired number of levels determined by the application is reached. The two midfrequency subbands, LH and HL, contain the facial expression and face pose features. The HH subband can easily be perturbed by noise, expressions, and poses. This makes the HH subband the most unstable among the four subbands.
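A one-level decomposition into four subbands can be sketched with the Haar filters as below. The naming of the subbands here (first letter for the filter applied across columns, second letter for the filter applied across rows) is an assumption, since conventions differ between authors, and the function name `haar_dwt2` and the test image are hypothetical.

```python
import numpy as np

def haar_dwt2(image):
    """One-level 2-D Haar transform of an image with even height and width."""
    x = np.asarray(image, dtype=float)
    s = np.sqrt(2.0)
    # Low-pass and high-pass filtering over pairs of adjacent columns.
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # Repeat the filtering over pairs of adjacent rows.
    ll = (lo[0::2] + lo[1::2]) / s   # coarse approximation
    lh = (lo[0::2] - lo[1::2]) / s   # detail subband
    hl = (hi[0::2] + hi[1::2]) / s   # detail subband
    hh = (hi[0::2] - hi[1::2]) / s   # diagonal detail subband
    return ll, lh, hl, hh

# Smooth 8 x 8 test image: most of its energy should land in the LL subband.
image = np.add.outer(np.arange(8.0), np.arange(8.0))
ll, lh, hl, hh = haar_dwt2(image)
total_energy = sum((band ** 2).sum() for band in (ll, lh, hl, hh))
ll_share = (ll ** 2).sum() / total_energy
```

Because the transform is orthonormal, the total energy of the four subbands equals the energy of the input image, and applying `haar_dwt2` again to `ll` gives the next decomposition level.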
Some types of wavelet used in the literature for data approximation, compression, and noise removal are Haar, Daubechies sets, Morlet, Coiflets, Biorthogonal, Mexican Hat, and Symlets.
We adopted the Haar wavelet because it is the simplest wavelet and can efficiently support the interest of the study. In its operation, it applies a pair of low-pass and high-pass filters for image decomposition first in image columns and then in image rows independently.
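Mathematically, if we consider a vectorized image $f = (f_1, f_2, \ldots, f_N)$ of dimension $N$, where $N$ is even, then the single-level Haar transform decomposes $f$ into two signals of length $N/2$. These are the mean coefficient vector $a$ with components
$$a_m = \frac{f_{2m-1} + f_{2m}}{\sqrt{2}}, \quad m = 1, 2, \ldots, N/2,$$
and the detail coefficient vector $d$ with components
$$d_m = \frac{f_{2m-1} - f_{2m}}{\sqrt{2}}, \quad m = 1, 2, \ldots, N/2.$$
Now, we concatenate $a$ and $d$ into another $N$-vector, $(a \mid d)$, which can be regarded as a linear matrix transformation of $f$.
We then filter the transformed vector with the Gaussian filter. This is because the Gaussian mixture is isotropic and can represent data distributions by a mean vector and a covariance matrix. Most importantly, Gaussian noise is the default noise acquired due to illumination variations. Please refer to the work of Bhattacharyya for other types of mixtures for non-Gaussian and asymmetric distributions.
After filtering, the transformed vector, with filtered coefficients $\tilde{a}_m$ and $\tilde{d}_m$, is inverted to $\hat{f}$ with components
$$\hat{f}_{2m-1} = \frac{\tilde{a}_m + \tilde{d}_m}{\sqrt{2}}$$
and
$$\hat{f}_{2m} = \frac{\tilde{a}_m - \tilde{d}_m}{\sqrt{2}}, \quad m = 1, 2, \ldots, N/2.$$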
Figure 5 shows the DWT cycle using the Haar wavelet.
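The transform, filter, and inverse steps of this cycle can be sketched in a few lines. The forward and inverse functions follow the standard single-level Haar transform; how the Gaussian filter is applied is an assumption made for illustration (here it smooths only the detail coefficients, with a hypothetical width `sigma` and `radius`), and the function names are not taken from the study.

```python
import numpy as np

def haar_forward(f):
    """Single-level Haar transform of an even-length vector."""
    f = np.asarray(f, dtype=float)
    a = (f[0::2] + f[1::2]) / np.sqrt(2)  # mean (trend) coefficients
    d = (f[0::2] - f[1::2]) / np.sqrt(2)  # detail coefficients
    return a, d

def haar_inverse(a, d):
    """Rebuild the signal from mean and detail coefficients."""
    f = np.empty(2 * len(a))
    f[0::2] = (a + d) / np.sqrt(2)
    f[1::2] = (a - d) / np.sqrt(2)
    return f

def gaussian_smooth(v, sigma=1.0, radius=3):
    """Convolve v with a normalised Gaussian kernel, repeating the edge values."""
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    return np.convolve(np.pad(v, radius, mode="edge"), kernel, mode="valid")

def haar_denoise(f, sigma=1.0):
    """Transform, smooth the detail coefficients (assumed choice), and invert."""
    a, d = haar_forward(f)
    return haar_inverse(a, gaussian_smooth(d, sigma=sigma))

signal = np.array([4.0, 6.0, 10.0, 12.0])
a, d = haar_forward(signal)
round_trip = haar_inverse(a, d)
denoised = haar_denoise(signal)
```

Without the filtering step the inverse recovers the input exactly, which is a quick check that the forward and inverse formulas agree.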
2.3.2. Mean Centering
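Given an image space $X = (x_1, x_2, \ldots, x_n)$, whose elements are the vectorized form of the individual images in the study database, we define $C$ as an $n \times n$ centering matrix given by
$$C = I_n - \frac{1}{n} J_n,$$
where $I_n$ is the $n \times n$ identity matrix and $J_n$ is an $n \times n$ matrix with all entries equal to 1.
The mean centering of the $i$-th image is performed by subtracting the mean image from the individual images under study. Mathematically, the mean centered image is given by
$$w_i = x_i - \bar{x},$$
where $i = 1, 2, \ldots, n$, $\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$ is the mean image, and $W = XC = (w_1, w_2, \ldots, w_n)$ is the mean centered matrix of the face space.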
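As a small numerical check of the centering step, the sketch below assumes the image space is stored as a matrix with one vectorized image per column, so that multiplying on the right by the centering matrix subtracts the mean image from every column. The layout, the function name `mean_center`, and the toy values are assumptions for illustration.

```python
import numpy as np

def mean_center(X):
    """Subtract the mean image from every column of X via a centering matrix."""
    n = X.shape[1]                        # number of images
    C = np.eye(n) - np.ones((n, n)) / n   # n x n centering matrix
    return X @ C

# Toy image space: 2 pixels per image, 3 images (one image per column).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])
W = mean_center(X)
mean_image = X.mean(axis=1, keepdims=True)
```

For this toy input the mean image is (2, 6), so `W` has columns (-1, -2), (0, 0), and (1, 2), and every row of `W` sums to zero.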
2.4. Feature Extraction
Feature extraction is the second step in digital image processing next to preprocessing (image preparation step). It aids in retrieving nonredundant and significant information from an image. The feature extraction phase is targeted at achieving time efficiency at the cost of data reduction , followed by object detection, localization, and recognition, which determine the position, location, and orientation of images .
According to Iwendi et al. , the main focus of feature optimization is not only to decrease the computational cost but also find such feature subsets that can work with different classifiers to produce better results.
Principal Component Analysis (PCA), also known as Karhunen–Loeve expansion, is a classical feature extraction and data representation technique widely used in the areas of pattern recognition and computer vision . PCA can be used to find lower dimensional subspace which identifies the axes with maximum variance .
In a recent study, Reddy et al. investigated two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), on four popular Machine Learning (ML) algorithms (Decision Tree Induction, Support Vector Machine (SVM), Naive Bayes Classifier, and Random Forest Classifier) using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine Machine Learning Repository. Their experimental results showed that PCA outperformed LDA on all the measures. They also found that the performance of the classifiers was not much affected by using PCA and LDA.
In this study, we adopted Principal Component Analysis (PCA) as a dimensionality reduction algorithm to extract the most significant components or those components which are more informative and less redundant, from the original data.
As indicated earlier, the DWT-PCA/SVD algorithm was used to train the image database to extract unique face features for recognition.
The primary objective of the PCA/SVD feature extraction mechanism is to find a set of orthonormal vectors, $e_k$, which best describes the distribution of the image data. With $w_i$ denoting the $i$-th mean centered train image, the $k$-th vector $e_k$ is chosen such that
$$\lambda_k = \frac{1}{n} \sum_{i=1}^{n} \left(e_k^{T} w_i\right)^2$$
is a maximum subject to the orthonormality constraints,
$$e_l^{T} e_k = \begin{cases} 1, & l = k, \\ 0, & \text{otherwise}, \end{cases}$$
where $e_k$ and $\lambda_k$ are the eigenvectors and their corresponding eigenvalues, respectively, of the dispersion matrix, $S$, and are extracted through Singular Value Decomposition (SVD). The dispersion matrix is given by
$$S = \frac{1}{n} \sum_{i=1}^{n} w_i w_i^{T}.$$
The SVD gives two orthogonal matrices $U$ and $V$ and a diagonal matrix $D$. The eigenfaces $e_k$ are obtained from $u_k$, the $k$-th column vector of $U$.
From the train image database, the extracted features (principal components) for the $i$-th image are given as
$$\Omega_i = E^{T} w_i,$$
where $E = (e_1, e_2, \ldots)$ is the matrix whose columns are the eigenfaces.
Hence, the extracted features (principal components) of all the face images in the train image database are represented by $\Omega = (\Omega_1, \Omega_2, \ldots, \Omega_n)$. These are stored in the recognition system internal memory as created knowledge for recognition.
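The feature extraction step can be sketched with an SVD of the mean centered image matrix. This is an illustration under stated assumptions, not the study's implementation: images are stored one per column, the SVD is taken of the centered matrix itself (whose left singular vectors are the eigenvectors of the dispersion matrix), and the function name `train_eigenfaces`, the number of retained components, and the random stand-in data are hypothetical.

```python
import numpy as np

def train_eigenfaces(X, n_components):
    """Return the mean image, eigenfaces, eigenvalues, and train features."""
    n = X.shape[1]
    mean_image = X.mean(axis=1, keepdims=True)
    W = X - mean_image                                 # mean centered images
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    E = U[:, :n_components]                            # eigenfaces as columns
    eigenvalues = s[:n_components] ** 2 / n            # of the dispersion matrix
    features = E.T @ W                                 # one feature column per image
    return mean_image, E, eigenvalues, features

# Random stand-in for 6 vectorized train images of 50 pixels each.
rng = np.random.default_rng(0)
X = rng.random((50, 6))
mean_image, E, eigenvalues, features = train_eigenfaces(X, n_components=3)
W = X - mean_image
S = W @ W.T / X.shape[1]                               # dispersion matrix
```

The columns of `E` are orthonormal and satisfy `S @ E = E * eigenvalues`, which matches the maximisation under the orthonormality constraints described in the text.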
2.5. Recognition Process
This is the last stage in the recognition module/system. Here, an unknown face image from either of the two test image databases (occluded and augmented face image database shown in Figures 2 and 3, respectively) is passed through the system for recognition.
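Unique features of the unknown face image are extracted as
$$\Omega_r = E^{T} (x_r - \bar{x}),$$
where $x_r$ is the vectorized unknown face image, $\bar{x}$ is the mean train image, and $E$ is the matrix whose columns are the eigenfaces.
Let the extracted features (principal components) for an image in the $j$-th ($j = 1, 2$) test image database be $\Omega_r$, and let $\Omega_i$ be the stored features of the $i$-th train image. Then, the recognition distances (Euclidean distances) are computed as
$$d_i = \lVert \Omega_r - \Omega_i \rVert, \quad i = 1, 2, \ldots, n.$$
The train image that corresponds to the minimum Euclidean distance, $\min_i d_i$, is chosen as the closest match to the unknown test image.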
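The matching rule can be sketched as a nearest-neighbour search in feature space. The sketch assumes that train images are stored one per column, that eigenfaces come from an SVD of the mean centered train matrix, and that the test image is a slightly perturbed copy of a train image; the function name `recognise` and all data are hypothetical.

```python
import numpy as np

def recognise(test_image, mean_image, E, train_features):
    """Return the index of the closest train image and its recognition distance."""
    # Project the mean centered test image onto the eigenfaces.
    omega = E.T @ (test_image.reshape(-1, 1) - mean_image)
    # Euclidean distance to the stored feature vector of every train image.
    distances = np.linalg.norm(train_features - omega, axis=0)
    best = int(np.argmin(distances))
    return best, float(distances[best])

# Hypothetical train set: 5 vectorized images of 64 pixels each.
rng = np.random.default_rng(0)
X = rng.random((64, 5))
mean_image = X.mean(axis=1, keepdims=True)
W = X - mean_image
U, _, _ = np.linalg.svd(W, full_matrices=False)
E = U[:, :4]                      # keep 4 eigenfaces
train_features = E.T @ W

# Test image: train image 3 with a small perturbation added.
test_image = X[:, 3] + 0.01 * rng.standard_normal(64)
match, distance = recognise(test_image, mean_image, E, train_features)
```

A smaller returned distance indicates a closer match, which is why the recognition distance can be compared across test image databases even when every image is matched correctly.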
3. Results and Discussion
The results of matching the two sets of test images (occluded and augmented) for the MIT and JAFFE databases are shown in Figures 6 and 7, respectively. It can be seen from Figures 6 and 7 that there was no mismatch when the occluded face images were used as test images for recognition. Also, there was no mismatch when the augmented face images were used as test images for recognition.
3.1. Numerical Evaluations
It is evident from Figures 6 and 7 that the study algorithm (DWT-PCA/SVD) gave a perfect (100%) average recognition rate when used to recognize face images in both test image databases (occluded face image and augmented face image databases).
The average computational time for the recognition of all 20 images was 4 seconds.
3.2. Statistical Evaluations
We begin the statistical assessment with a discussion of some descriptive statistics followed by a test of significant difference between the average recognition distance of the occluded and augmented face images.
From Table 1, the average recognition distance for the recognition of occluded face images (430.715) with a corresponding standard error of 70.858 is greater than the average recognition distance for augmented face images (75.5811) with a corresponding standard error of 13.3511.
The median recognition distances for the occluded and augmented face image database are 345.8750 and 48.4765, respectively. The median recognition distance is used as the average recognition distance in the presence of outlier observations.
It is worthy to note that a relatively lower recognition distance is always preferred as it signifies a closer match. It can, therefore, be inferred from the abovementioned results that the MICE augmentation of the face images with missingness enhanced the recognition module to produce relatively lower average recognition distances.
Test of significant difference between the average recognition distance of occluded and augmented face images: Now, we assess whether there exists a statistically significant difference between the average recognition distances of occluded face images (from test image database 1) and augmented face images (from test image database 2) when they are used for recognition.
The paired sample t-test is suitable for this test only if its underlying assumption is satisfied. The test is very sensitive to the assumption that the observed differences are normally distributed. The Shapiro–Wilk test of normality gave a test statistic value of 0.874 with a corresponding $p$ value of 0.014. This indicates that the distribution of the observed difference between the recognition distance of occluded and augmented images is not normal. We therefore resort to the nonparametric counterpart of the paired sample t-test (the related-samples Wilcoxon signed-rank test), since the assumption of normality has been violated. The exact distribution of the Wilcoxon signed-rank test gives accurate and reliable results for small sample sizes. The test is distribution free and, hence, does not require the satisfaction of any parametric assumption.
Let $x_i$ denote the recognition distance recorded using occluded images as test images and $y_i$ denote the recognition distance using the augmented images as test images for the $i$-th individual. Then, the observed differences
$$d_i = x_i - y_i$$
should reflect the differential effects of the treatments. The test operates under the null hypothesis, $H_0$: the median difference between recognition distance of occluded and augmented faces is zero.
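The signed-rank computation on paired differences can be sketched as follows. The sketch uses the large-sample normal approximation without a tie correction to obtain a standardized statistic, which is an assumption made for brevity (the exact distribution is preferable for small samples); the function name and the five pairs of distances are hypothetical and are not the study's data.

```python
import numpy as np

def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, z) for paired samples x and y."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]                      # zero differences carry no sign
    n = len(d)
    abs_d = np.abs(d)
    ranks = np.empty(n)
    ranks[np.argsort(abs_d)] = np.arange(1, n + 1)
    # Tied absolute differences share the average of their ranks.
    for value in np.unique(abs_d):
        tie = abs_d == value
        ranks[tie] = ranks[tie].mean()
    w_plus = ranks[d > 0].sum()
    w_minus = ranks[d < 0].sum()
    mean_w = n * (n + 1) / 4
    sd_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w       # standardized test statistic
    return w_plus, w_minus, z

# Hypothetical recognition distances for five individuals.
occluded_dist = [10.0, 12.0, 9.0, 15.0, 20.0]
augmented_dist = [8.0, 13.0, 5.0, 9.0, 12.0]
w_plus, w_minus, z = wilcoxon_signed_rank(occluded_dist, augmented_dist)
```

For these hypothetical pairs the differences are 2, -1, 4, 6, and 8, so the positive ranks sum to 14 and the negative ranks to 1; a large positive `z` would indicate that occluded images tend to give larger recognition distances than augmented images.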
Table 2 contains the results of the Wilcoxon signed-rank test.
As shown in Table 2, the standardized Wilcoxon signed-rank statistic and its corresponding $p$ value indicate that the median observed difference of recognition distance between occluded and augmented face images is significantly different from zero. This means the average recognition distance when occluded face images are used as test images is significantly different from the average recognition distance when augmented face images are used as test images. As stated earlier, evidence from Table 1 suggests that the average recognition distance for the augmented face images is lower than the average recognition distance for the occluded face images. Since a relatively lower recognition distance signifies a closer match, it can be inferred from the statistical evaluation that the MICE augmentation of the occluded face images improves the performance of the recognition algorithm and the recognition process at large.
4. Conclusions and Recommendation
The study successfully assessed the performance of the DWT-PCA/SVD recognition algorithm on occluded and augmented face images. Numerical evaluation of the study algorithm gave a perfect (100%) average recognition rate each for recognition of occluded and augmented face images. This rate is slightly above the rates of Ayiah-Mensah et al., who used the FFT-PCA/SVD recognition algorithm on the same databases. This suggests that the adopted preprocessing mechanism (discrete wavelet transformation) has an edge over the Fast Fourier Transformation (FFT) mechanism used by Ayiah-Mensah et al. The perfect (100%) rate of recognition achieved cannot be guaranteed if the level of missingness in the face images increases.
The statistical evaluation revealed that there exists a significant difference between the average recognition distance of occluded face images and augmented face images. From the descriptive statistics shown in Table 1, the average recognition distance for the augmented face images (75.5811) is lower than the average recognition distance (430.7153) of the occluded face images. This points to the fact that the MICE augmentation improved the recognition performance of the study algorithm. This finding, although hidden from the numerical evaluation results, is evident from the statistical evaluation of the study algorithm.
According to Ayiah-Mensah et al. , the failure of the numerical evaluation exercise to uncover this finding can be attributed to the fact that the statistical evaluation mechanism is a more data-driven approach to assess the performance of the recognition algorithm.
The findings of the study are consistent with those of Min et al. , despite the differences in occlusion criteria (random occlusions; brow, eye, and mouth occlusions; and scarf and sunglass occlusions) and the database used to benchmark the recognition/classification systems.
The study, therefore, recommends the use of discrete wavelet transformation as a preprocessing mechanism in a recognition module. MICE augmentation is also recommended as a suitable data enhancement mechanism for imputing missing data/pixel of occluded face images. Future work will focus on assessing the MICE data enhancement mechanism on occluded face images when the percentage of missingness is increased.
The image data supporting this study are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
M. E. Rana, A. A. Zadeh, and A. M. M. Alqurneh, “Use of image enhancement techniques for improving real time face recognition efficiency on wearable gadgets,” Journal of Engineering Science and Technology, vol. 12, no. 1, pp. 155–167, 2017.
S. Van Buuren, Flexible Imputation of Missing Data, CRC Press, Boca Raton, FL, USA, 2018.
C. Iwendi, G. Srivastava, S. Khan, and P. K. R. Maddikunta, “Cyberbullying detection solutions based on deep learning architectures,” Multimedia Systems, pp. 1–14, 2020.
T. R. Gadekallu, M. Alazab, R. Kaluri et al., “Hand gesture classification using a novel CNN-crow search algorithm,” Complex & Intelligent Systems, pp. 1–14, 2021.
Y. Meyer, Wavelets: Algorithms & Applications, SIAM (Society for Industrial and Applied Mathematics), Philadelphia, PA, USA, 1993.
M. Kociołek, A. Materka, M. Strzelecki, and P. Szczypiński, “Discrete wavelet transform-derived features for digital image texture analysis,” in Proceedings of the International Conference on Signals and Electronic Systems, pp. 163–168, Lodz, Poland, September 2001.
L. Asiedu, F. Oduro, A. Adebanji, and F. O. Mettle, “A statistical assessment of whitened-PCA/SVD under variable environmental constraints,” International Journal of Ecological Economics and Statistics, vol. 37, pp. 63–74, 2016.
R. Min, A. Hadid, and J.-L. Dugelay, “Improving the recognition of faces occluded by facial accessories,” in Proceedings of the 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), pp. 442–447, IEEE, Santa Barbara, CA, USA, March 2011.