Computational Intelligence and Neuroscience

Volume 2017 (2017), Article ID 5317850, 9 pages

https://doi.org/10.1155/2017/5317850

## Patch-Based Principal Component Analysis for Face Recognition

School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China

Correspondence should be addressed to Ting-Zhu Huang

Received 1 March 2017; Revised 4 June 2017; Accepted 7 June 2017; Published 11 July 2017

Academic Editor: George A. Papakostas

Copyright © 2017 Tai-Xiang Jiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We propose a patch-based principal component analysis (PCA) method for face recognition. Many PCA-based face recognition methods exploit the correlation between pixels, columns, or rows, but they do not use local spatial information, or do not use it fully. We believe that patches are more meaningful basic units for face recognition than pixels, columns, or rows, since faces are discerned by patches containing the eyes and nose. To calculate the correlation between patches, we divide face images into patches and convert these patches into column vectors, which are combined into a new “image matrix.” By replacing the images with these new “image matrices” in the two-dimensional PCA framework, we directly calculate the correlation between the patches by computing the total scatter. By maximizing the total scatter of the projected samples, we obtain the projection matrix for feature extraction. Finally, we use the nearest neighbor classifier. Extensive experiments on the ORL and FERET face databases illustrate the performance of the patch-based PCA. Our method improves the accuracy compared with one-dimensional PCA, two-dimensional PCA, and two-directional two-dimensional PCA.

#### 1. Introduction

Principal component analysis (PCA), one of the most popular *multivariate statistical techniques* [1], has been widely used in pattern recognition and signal processing [2]. It is a statistical method that falls under the broad heading of *factor analysis* [3]. The modern instantiation of PCA was formalized by Hotelling [1, 4], who also coined the term *principal component*, but its origins can be traced back to [5] or even to Cauchy [6]. PCA analyzes observed data that are usually described by several dependent, intercorrelated variables. Its goal is to extract the important information from the data and to express this information as a set of new orthogonal variables called principal components.

There are numerous PCA-based methods for face recognition, ranging from one-dimensional PCA [7] to two-directional two-dimensional PCA, known as (2D)^{2}PCA [8]. All of these methods rely on two premises. First, PCA can represent the pattern of similarity among the observations and the variables as points in maps [2, 9, 10]. Second, the similarity of face images can, in some sense, be “calculated” by evaluating the distances between these points.

The main idea of the one-dimensional PCA method for face recognition is *eigenspace projection*. A projection matrix is obtained by maximizing the image covariance, which describes the correlation between pixels across the training data (i.e., the labeled face images). The next step is to project the 1D vectors (previously constructed from the 2D images) into the feature space [11]. The eigenvectors corresponding to large eigenvalues (i.e., the principal components) resemble human faces when reshaped into matrices of the same size as the original face images, and are therefore called *eigenfaces*. The nearest neighbor (NN) classifier then determines the identity of an unlabeled face image by computing distances in the eigenspace. For instance, if an unlabeled face image is nearest to one of the first individual's labeled face images in the eigenspace, it is assigned to the first individual. However, transforming 2D images into 1D vectors always leads to a very high-dimensional space, in which computing the covariance matrix of the pixels is difficult: for face images of size $m \times n$, the covariance matrix is of size $mn \times mn$. Hence, evaluating the eigenvectors of such a large covariance matrix is very time-consuming.
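The eigenface pipeline described above (flatten, covariance, leading eigenvectors, projection, NN lookup) can be sketched in Python with NumPy. This is an illustrative sketch under our own assumptions, not the code used in the experiments: the function names are hypothetical, and the full pixel-by-pixel covariance matrix is formed on purpose to mirror the cost discussed in the text, whereas practical eigenface implementations usually avoid it by working with the much smaller $M \times M$ Gram matrix of the $M$ training images.

```python
import numpy as np


def eigenface_fit(train_images, k):
    """Fit a 1D PCA (eigenface) model to images of shape (M, m, n).

    Returns the mean face vector and an (m*n, k) matrix whose columns are
    the k leading eigenvectors (eigenfaces) of the pixel covariance matrix.
    """
    M = train_images.shape[0]
    X = train_images.reshape(M, -1).astype(float)  # each image -> one m*n vector
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / M                            # (m*n) x (m*n) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    return mean, eigvecs[:, order]


def eigenface_project(images, mean, W):
    """Project images of shape (K, m, n) into the eigenspace: returns (K, k)."""
    X = images.reshape(images.shape[0], -1).astype(float)
    return (X - mean) @ W


def nearest_neighbor(train_feats, train_labels, query_feat):
    """Return the label of the training feature closest (Euclidean) to the query."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    return train_labels[int(np.argmin(dists))]
```

Querying with the projection of a training image returns that image's own label, since its distance to itself is zero.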

Two-dimensional principal component analysis (2DPCA) [12], in contrast to the eigenface method, projects face images into a feature subspace directly, without image-to-vector conversion. This direct projection not only preserves part of the spatial information of the image but also reduces the computational burden [13]. The so-called image covariance matrix of 2DPCA, which is constructed directly from the original face image matrices, is much smaller than the covariance matrix of the eigenface method: for images of size $m \times n$, it is only $n \times n$. In 2DPCA, the image covariance (scatter) matrix, which plays the same role as the covariance matrix in the eigenface method, captures the correlation between the columns of the images. Motivated by 2DPCA, (2D)^{2}PCA [8] calculates the correlation in two directions, over both the columns and the rows. 2DPCA and (2D)^{2}PCA have achieved good results in face recognition. However, these methods fail to fully exploit local spatial information.
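To make the two directions of (2D)^{2}PCA concrete, the sketch below builds a column-direction scatter matrix (as in 2DPCA) and a row-direction scatter matrix from centered training images, keeps the leading eigenvectors of each, and forms the feature matrix $Z^{T} A X$. It is a minimal NumPy sketch assuming images stacked in an array of shape (M, m, n); the function names and the numbers of kept axes are hypothetical choices, not taken from [8].

```python
import numpy as np


def top_eigvecs(G, d):
    """Eigenvectors of the symmetric matrix G for its d largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(G)
    return eigvecs[:, np.argsort(eigvals)[::-1][:d]]


def two_directional_2dpca(train_images, d_col, d_row):
    """Sketch of (2D)^2PCA: one projection for columns, one for rows."""
    A = np.asarray(train_images, dtype=float)      # shape (M, m, n)
    Ac = A - A.mean(axis=0)                        # subtract the mean image
    # Column-direction scatter, sum_k Ac_k^T Ac_k / M (n x n), as in 2DPCA.
    G_col = np.einsum("kij,kil->jl", Ac, Ac) / len(A)
    # Row-direction scatter, sum_k Ac_k Ac_k^T / M (m x m).
    G_row = np.einsum("kji,kli->jl", Ac, Ac) / len(A)
    X = top_eigvecs(G_col, d_col)                  # (n, d_col) right projection
    Z = top_eigvecs(G_row, d_row)                  # (m, d_row) left projection
    return Z, X


def features(image, Z, X):
    """Two-directional feature matrix Z^T A X of shape (d_row, d_col)."""
    return Z.T @ image @ X
```

Both scatter matrices are small ($n \times n$ and $m \times m$), which is where the savings over the eigenface covariance come from.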

To further exploit local spatial information, let us review how the existing methods have evolved. The eigenface method only calculates the correlation between pixels, 2DPCA only calculates the correlation between columns, and (2D)^{2}PCA calculates the correlation of both columns and rows at the same time. Accuracy improves from one-dimensional PCA to (2D)^{2}PCA as the basic unit changes from pixels to columns and rows. What, then, is the best basic unit if this evolution continues? We believe that the patch is the most meaningful basic unit for these linear classification methods (e.g., people are recognized by their eyes and nose). The local spatial information of the eyes and nose is contained in patches, so it is more intuitive to consider the correlation between different patches. Moreover, patches have recently been used successfully in image processing, not only in face recognition [14–16] but also in image denoising [17–19], image superresolution [20, 21], and image decomposition (cartoon-texture [22, 23] or illumination-reflectance [24], and further retinex image enhancement [25]). The patch has become a basic tool in these works. Motivated by our view that the patch is the most meaningful basic unit for these linear classification methods, and by the wide success of patch-based approaches, we calculate the correlation between patches in our PCA.

To calculate the correlation between patches, we simply add a patch preprocessing step before the 2DPCA framework. That is, we first divide the face images into patches and then convert these patches into columns. In the 2DPCA framework, the columns of the image are replaced by our patch-unfolded columns, so the correlation between columns in 2DPCA becomes the correlation between patches once the image covariance (scatter) matrix is calculated. The orthonormal eigenvectors of the image covariance (scatter) matrix corresponding to its largest eigenvalues are then the optimal projection axes used for feature extraction. The optimal projection axes form the projection matrix, and projecting an image onto it yields the image's feature matrix, or feature image [12]. The test images are projected onto this projection matrix and then classified by finding the nearest neighbors of their projections. We call this method *patch-based principal component analysis* (PPCA). The main contribution of the proposed method is that the patch, the most meaningful basic unit, is incorporated into the 2DPCA framework, so that the correlation between these basic units is used to improve the recognition accuracy. This is confirmed by our experiments. In addition, the proposed method is easy to implement.

In fact, we could choose a support vector machine (SVM) as the classifier, which might improve the accuracy. However, an SVM is not necessary for comparing our method with the eigenface method, 2DPCA, and (2D)^{2}PCA. On the other hand, PCA is a global technique [26], so it is difficult for it to exploit both the local spatial correlation between pixels within each patch and the nonlocal spatial correlation between patches, as in [17]. Nevertheless, we consider that the global computation can partly compensate for the limited use of the nonlocal spatial correlation between patches.

It is noteworthy that face recognition has progressed greatly in recent years. It is very hard for an improved version of an old method to challenge recent methods based on deep learning [27, 28]; please refer to [29] for a more extensive overview of face recognition. However, improving an old method is still meaningful, since many old methods remain widely employed, e.g., the alternating direction method of multipliers (ADMM) [30–35] and the block coordinate descent (BCD) algorithm [36]. Moreover, our focus is on improving PCA-based classification methods, and the experimental results in Section 3 confirm that our method outperforms the other PCA-based methods considered.

The outline of this paper is given as follows. In Section 2, we present our PPCA method for face recognition. In Section 3, experimental results are reported to demonstrate the performance of the proposed method. Finally, some conclusions are drawn in Section 4.

#### 2. Patch-Based Principal Component Analysis

In 2DPCA, an image matrix $A$ of size $m \times n$ is directly projected onto $n$-dimensional unitary column vectors $\mathbf{x}$: $\mathbf{y} = A\mathbf{x}$. By maximizing the total scatter $J(\mathbf{x}) = \operatorname{tr}(S_{\mathbf{x}})$ of the projected samples, we obtain the projection matrix. The subsequent steps are feature extraction and classification. Our PPCA simply adds a patch preprocessing step before this framework. Then, as in 2DPCA, we calculate the image covariance matrix and the optimal projection axes for feature extraction and classification.

##### 2.1. Patch Preprocessor

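Suppose that we have $M$ training facial images. For the $i$-th training sample $A_i$, we divide the image of size $m \times n$ into patches of size $p \times q$ ($p \le m$, $q \le n$). If $m$ (or $n$) is not divisible by $p$ (or $q$), we add an overlap $o_p$ (or $o_q$) between adjacent patches, so that $(m - o_p)/(p - o_p)$ (or $(n - o_q)/(q - o_q)$) is always an integer, whatever the choice of $p$ (or $q$). Generally, to reduce the computational burden, we choose the smallest such overlap for each selected $p$ (or $q$). The number of patches in every face image is then

$$N = \frac{m}{p} \cdot \frac{n}{q},$$

or, with the overlaps $o_p$ and $o_q$ ($0 \le o_p < p$, $0 \le o_q < q$),

$$N = \frac{m - o_p}{p - o_p} \cdot \frac{n - o_q}{q - o_q}.$$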
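The overlap rule can be checked with a few lines of Python. The sketch below assumes that the number of patches along one side with overlap $o$ is $(\text{size} - o)/(\text{patch} - o)$, i.e., patches are placed with stride $\text{patch} - o$ so that the last patch ends exactly at the image border; the function names are hypothetical.

```python
def smallest_overlap(size, patch):
    """Smallest overlap o (0 <= o < patch) for which (size - o) / (patch - o)
    is an integer, i.e. patches with stride patch - o tile the side exactly."""
    if not 0 < patch <= size:
        raise ValueError("patch size must satisfy 0 < patch <= size")
    for o in range(patch):
        if (size - o) % (patch - o) == 0:
            return o
    raise AssertionError("unreachable: o = patch - 1 gives stride 1")


def patch_count(m, n, p, q):
    """Number of p x q patches in an m x n image, using the smallest overlaps."""
    o_p, o_q = smallest_overlap(m, p), smallest_overlap(n, q)
    n_rows = (m - o_p) // (p - o_p)
    n_cols = (n - o_q) // (q - o_q)
    return n_rows * n_cols, (o_p, o_q)
```

For example, for a 112 × 92 image (the usual ORL resolution) with 8 × 8 patches, 112 is divisible by 8 but 92 is not; an overlap of one column gives (92 − 1)/(8 − 1) = 13 patches per row, so the image yields 14 × 13 = 182 patches.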

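Then we convert each patch $P_{ij}$ (the $j$-th patch of the $i$-th image) into a column vector of size $pq \times 1$:

$$\mathbf{b}_{ij} = \operatorname{vec}(P_{ij}), \quad j = 1, \ldots, N.$$

More details about the patch-to-vector conversion are given in Section 3. Let $B_i$ collect all of the reshaped vectors of the $i$-th training facial image:

$$B_i = [\mathbf{b}_{i1}, \mathbf{b}_{i2}, \ldots, \mathbf{b}_{iN}],$$

where the size of $B_i$ is $s \times N$ and $s = pq$.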
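A minimal sketch of the patch preprocessor follows. The scanning order (patches taken row by row across the image) and the within-patch vectorization (column-major, like MATLAB's `patch(:)`) are assumptions of this sketch, since the exact conversion is only detailed in Section 3; `patch_matrix` and `smallest_overlap` are hypothetical names.

```python
import numpy as np


def smallest_overlap(size, patch):
    """Smallest overlap o (0 <= o < patch) making (size - o) / (patch - o) an integer."""
    if not 0 < patch <= size:
        raise ValueError("patch size must satisfy 0 < patch <= size")
    for o in range(patch):
        if (size - o) % (patch - o) == 0:
            return o
    raise AssertionError("unreachable: o = patch - 1 gives stride 1")


def patch_matrix(image, p, q):
    """Unfold an m x n image into B of shape (p*q, N), one column per patch.

    Patches are scanned row by row across the image; each patch is vectorized
    in column-major order. Both orderings are assumptions of this sketch.
    """
    m, n = image.shape
    step_r = p - smallest_overlap(m, p)   # vertical stride
    step_c = q - smallest_overlap(n, q)   # horizontal stride
    columns = []
    for r in range(0, m - p + 1, step_r):
        for c in range(0, n - q + 1, step_c):
            columns.append(image[r:r + p, c:c + q].reshape(-1, order="F"))
    return np.stack(columns, axis=1)
```

Each column of the result plays the role that an image column plays in 2DPCA, which is what turns column correlations into patch correlations.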

It should be noted that we adopt the 2DPCA framework rather than (2D)^{2}PCA. As mentioned before, (2D)^{2}PCA takes the correlations of both columns and rows into consideration, while 2DPCA concentrates on the correlations between column vectors. Since our patch preprocessing converts patches into column vectors, it is reasonable to adopt the 2DPCA framework rather than (2D)^{2}PCA.

##### 2.2. Total Scatter

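Let $X \in \mathbb{R}^{N \times d}$ be a matrix with orthonormal columns, $N \ge d$. Then we project the matrix $B$ of size $s \times N$ onto $X$ by the following linear transformation [37, 38]:

$$Y = BX.$$

Each column $\mathbf{y} = B\mathbf{x}$ of $Y$, where $\mathbf{x}$ is a column of $X$, is an $s$-dimensional projected vector (i.e., the projected feature vector [12]) of the matrix $B$. As in 2DPCA, we use the total scatter of the projected samples to measure the discriminatory power of a projection axis $\mathbf{x}$:

$$J(\mathbf{x}) = \operatorname{tr}(S_{\mathbf{x}}), \qquad S_{\mathbf{x}} = E\big[(\mathbf{y} - E\mathbf{y})(\mathbf{y} - E\mathbf{y})^{T}\big],$$

so that

$$\operatorname{tr}(S_{\mathbf{x}}) = \mathbf{x}^{T}\, E\big[(B - EB)^{T}(B - EB)\big]\, \mathbf{x}.$$

Let us define

$$G_t = E\big[(B - EB)^{T}(B - EB)\big],$$

which is called the image covariance (scatter) matrix. The average matrix of all the preprocessed images is

$$\bar{B} = \frac{1}{M}\sum_{i=1}^{M} B_i.$$

Then $G_t$ can be evaluated by

$$G_t = \frac{1}{M}\sum_{i=1}^{M} (B_i - \bar{B})^{T}(B_i - \bar{B}).$$

It is easy to verify that $G_t$ is an $N \times N$ positive semidefinite matrix. We can evaluate $G_t$ directly from the training samples. The total scatter of the projected samples can be expressed by

$$J(\mathbf{x}) = \mathbf{x}^{T} G_t \mathbf{x},$$

where $\mathbf{x}$ is a unitary column vector. This is called the *generalized total scatter criterion* [12]. The unitary vector $\mathbf{x}$ that maximizes the criterion is called the optimal projection axis.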
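The identity $J(\mathbf{x}) = \operatorname{tr}(S_{\mathbf{x}}) = \mathbf{x}^{T} G_t \mathbf{x}$ can be checked numerically. The sketch below assumes the preprocessed matrices $B_i$ are stacked in an array of shape (M, s, N) and uses the sample mean in place of the expectation; the function names are hypothetical.

```python
import numpy as np


def image_scatter(B_list):
    """G_t = (1/M) sum_i (B_i - B_bar)^T (B_i - B_bar), an N x N matrix."""
    B = np.asarray(B_list, dtype=float)        # shape (M, s, N)
    Bc = B - B.mean(axis=0)                    # subtract the average matrix B_bar
    return np.einsum("kij,kil->jl", Bc, Bc) / B.shape[0]


def total_scatter_direct(B_list, x):
    """J(x) = tr(S_x), computed from the projected vectors y_i = B_i x."""
    B = np.asarray(B_list, dtype=float)
    Y = B @ x                                  # shape (M, s): one projected vector per sample
    Yc = Y - Y.mean(axis=0)
    S_x = Yc.T @ Yc / B.shape[0]               # covariance of the projected samples
    return np.trace(S_x)
```

For any unit vector `x`, `total_scatter_direct(Bs, x)` agrees with `x @ image_scatter(Bs) @ x` up to rounding, so the criterion can be evaluated from the small $N \times N$ matrix alone.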

##### 2.3. Optimization

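It has been proved that the optimal projection axis $\mathbf{x}_{\mathrm{opt}}$, which maximizes the total scatter of the projected samples, is the eigenvector of $G_t$ corresponding to the largest eigenvalue [38]. In general, we choose the orthonormal eigenvectors $\mathbf{x}_1, \ldots, \mathbf{x}_d$ of $G_t$ corresponding to the $d$ largest eigenvalues. They are equivalent to

$$\{\mathbf{x}_1, \ldots, \mathbf{x}_d\} = \arg\max J(\mathbf{x}), \quad \mathbf{x}_i^{T}\mathbf{x}_j = 0, \; i \ne j, \; i, j = 1, \ldots, d.$$

The first eigenvector is required to have the largest possible variance (i.e., this component “explains” or “extracts” the largest part of the pattern information in the preprocessed face images [1]). We can simply control the value of $d$ by a threshold $\theta$ as follows [8]:

$$\frac{\sum_{k=1}^{d} \lambda_k}{\sum_{k=1}^{N} \lambda_k} \ge \theta,$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$ are the eigenvalues of $G_t$ in descending order, and $d$ is taken as the smallest integer satisfying this inequality. We can determine $\theta$ by presetting it or by referring to results on different face databases.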
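The choice of $d$ by the energy threshold $\theta$ amounts to sorting the eigenvalues, accumulating them, and stopping at the first index where the cumulative ratio reaches $\theta$. This NumPy sketch (the name `projection_axes` is hypothetical) returns the projection matrix $X$ together with $d$.

```python
import numpy as np


def projection_axes(G_t, theta=0.90):
    """Leading eigenvectors of the symmetric scatter matrix G_t.

    d is the smallest number of eigenvalues (sorted in descending order)
    whose sum reaches the fraction theta of the total; returns (X, d),
    where X is N x d with orthonormal columns.
    """
    eigvals, eigvecs = np.linalg.eigh(G_t)       # ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()   # cumulative energy fraction
    reached = np.nonzero(ratio >= theta)[0]
    # Fall back to all N axes if rounding keeps the ratio just below theta.
    d = int(reached[0]) + 1 if reached.size else len(eigvals)
    return eigvecs[:, :d], d
```

With eigenvalues 5, 3, 1, 1, the cumulative ratios are 0.5, 0.8, 0.9, 1.0, so $\theta = 0.85$ keeps $d = 3$ axes.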

##### 2.4. Feature Extraction and Classification

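For each *patch-preprocessed* facial image $B_i$ in the training set ($i = 1, \ldots, M$), let

$$Y_i = B_i X = [\mathbf{y}_1^{(i)}, \ldots, \mathbf{y}_d^{(i)}],$$

where $X = [\mathbf{x}_1, \ldots, \mathbf{x}_d]$ of size $N \times d$ is the projection matrix. We call $Y_i$ of size $s \times d$ the *patch-based feature matrix* and $\mathbf{y}_k^{(i)} = B_i \mathbf{x}_k$ ($k = 1, \ldots, d$) the *patch-based principal components* of the $i$-th sample image.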
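Feature extraction is a single batched matrix product. In this sketch (the name `feature_matrices` is hypothetical), the preprocessed images are stacked in an array of shape (M, s, N); for example, 8 × 8 patches on a 112 × 92 image with a one-column horizontal overlap give $s = 64$ and $N = 182$, and $d = 10$ is an arbitrary illustrative choice.

```python
import numpy as np


def feature_matrices(B_list, X):
    """Patch-based feature matrices Y_i = B_i X, stacked to shape (M, s, d).

    B_list: M preprocessed images, each of shape (s, N) with s = p*q.
    X: (N, d) projection matrix with orthonormal columns.
    Y[i][:, k] is the k-th patch-based principal component B_i x_k.
    """
    B = np.asarray(B_list, dtype=float)
    return B @ X  # matmul broadcasts over the leading (sample) axis
```

Each feature matrix keeps all $s$ pixel positions of a patch but only $d$ combinations of the $N$ patches.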

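After patch preprocessing and 2DPCA projection, the facial images in the training set have been transformed into *patch-based feature matrices*. We use the nearest neighbor (NN) classifier [39] for classification. We define the distance between two arbitrary patch-based feature matrices $Y_i$ and $Y_j$ by

$$\operatorname{dist}(Y_i, Y_j) = \sum_{k=1}^{d} \big\|\mathbf{y}_k^{(i)} - \mathbf{y}_k^{(j)}\big\|_2,$$

where $\mathbf{y}_k^{(i)}$ is the $k$-th column of $Y_i$ and $\|\cdot\|_2$ denotes the Euclidean distance.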
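The distance sums the Euclidean norms of the column differences, which is not the same as the Frobenius norm of $Y_i - Y_j$. A one-line NumPy sketch (hypothetical name `feature_distance`):

```python
import numpy as np


def feature_distance(Y_a, Y_b):
    """Sum over columns k of ||y_k^(a) - y_k^(b)||_2 for two s x d feature matrices."""
    return float(np.linalg.norm(Y_a - Y_b, axis=0).sum())
```

For column differences (3, 4) and (0, 1), the distance is 5 + 1 = 6, whereas the Frobenius norm would be $\sqrt{26} \approx 5.10$.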

We have $M$ training facial images, each of which is assigned a given identity. Given a test facial image, we first apply patch preprocessing and obtain a preprocessed matrix $B$. Then we project $B$ onto $X$ and obtain $Y = BX$. If

$$\operatorname{dist}(Y, Y_l) = \min_{1 \le j \le M} \operatorname{dist}(Y, Y_j) \le \varepsilon,$$

where $\varepsilon$ is a preset threshold, the test image is assigned to the same class as $Y_l$; that is, the test facial image and the $l$-th training image belong to the same person. Otherwise, if $\operatorname{dist}(Y, Y_l) > \varepsilon$, the test sample does not belong to any identity in the training data.
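The decision rule, nearest neighbor with a rejection threshold, can be sketched as follows. The function name and the use of `None` to signal "not in the training identities" are assumptions of this sketch; the value of $\varepsilon$ is application-specific, and setting it to infinity reduces the rule to plain NN classification.

```python
import numpy as np


def classify(Y_test, train_feats, train_labels, eps):
    """Nearest-neighbor rule with rejection.

    Returns the label of the training feature matrix closest to Y_test under
    the column-wise distance, or None if that smallest distance exceeds eps.
    """
    dists = [np.linalg.norm(Y_test - Y_i, axis=0).sum() for Y_i in train_feats]
    best = int(np.argmin(dists))
    return train_labels[best] if dists[best] <= eps else None
```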

#### 3. Experimental Results

In this section, the performance of our proposed PPCA is compared with that of the eigenface (1DPCA) method, the 2DPCA method, and the (2D)^{2}PCA method on two well-known face image databases (ORL and FERET). In our view, experiments on constrained face databases are sufficient to validate the superiority of the proposed method over these methods. Thus, unconstrained face databases, such as the LFW database, are not considered.

First, the recognition accuracies of these four methods are compared using half of the images in the database for training. After that, further experiments show the influence of reordering and of the patch size. All experiments are performed using Matlab (R2014a) on a desktop with a 3.40 GHz Intel Core i7-2600 CPU and 12 GB RAM running Windows 7. Unless otherwise specified, the preset threshold $\theta$, which controls the number of projection vectors, is set to 0.90 in the following experiments. That is, we retain 90 percent of the energy (the sum of the eigenvalues of the image covariance matrix) of the training images.

##### 3.1. Recognition Accuracy Results on the FERET Database

The FERET database [40, 41] is a standard dataset used for evaluating facial recognition systems. The Face Recognition Technology (FERET) program was managed by the Defense Advanced Research Projects Agency (DARPA) and the National Institute of Standards and Technology (NIST). As of 2003, the FERET database contained 2,413 facial images of 856 individuals. The four methods above are tested on a subset of the FERET face database, which contains 400 images (with the cropped size ) from 200 individuals, each providing 2 different images. The so-called **fa** subset, which contains 200 images (one per individual), is used as training data, while the so-called **fb** subset, containing the remaining 200 images, is used as testing data. Figure 1 shows 2 images of one individual in the FERET database.