Abstract

Low resolution (LR) in face recognition (FR) surveillance applications causes a dimensional mismatch between an LR image and its high-resolution (HR) template. In this paper, a novel method called kernel coupled cross-regression (KCCR) is proposed to deal with this problem. Instead of processing in the original observing space directly, KCCR projects LR and HR face images into a unified nonlinear embedding feature space using kernel coupled mappings and graph embedding. Spectral regression is further employed to improve the generalization performance and reduce the time complexity. Meanwhile, cross-regression is developed to fully utilize the HR embedding to increase the information of the LR space and thereby improve the recognition performance. Experiments on the FERET and CMU PIE face databases show that KCCR outperforms the existing structure-based methods in terms of recognition rate as well as time complexity.

1. Introduction

Face recognition (FR) has been widely studied for decades due to its great potential applications. Many techniques have focused on dealing with complex conditions such as aging, occlusion, disguise, and variations in pose, illumination, and expression. Although the accuracy of face recognition in controlled environments with cooperative subjects is satisfactory, the performance in real applications such as surveillance and mobile services is still an unsolved problem, partially due to low-resolution (LR) image quality [1]. With the growing installation of embedded cameras in many places, there are increasing demands for face recognition in interaction and surveillance applications, from interactive processing cameras in mobile communications and small-scale stand-alone cameras in banks and supermarkets to large-scale networked closed-circuit television systems in public streets. In such cases, subjects are far from the cameras and unrestricted, and face regions tend to be small or of poor quality. This issue is called low-resolution face recognition (LR FR).

Gallery images are generally assumed to be high resolution (HR) in practical face recognition applications. Therefore, LR causes a dimensional mismatch between HR gallery images and LR probe images in applications such as criminal monitoring and mobile communication, where the HR images are stored in network databases. Three general ways of dealing with this problem can be considered. The first way is to down-scale the HR gallery images, which seems to be a feasible solution to the mismatch problem. Unfortunately, it drastically reduces the amount of available information, especially the high-frequency information that matters most for recognition, when the resolution is very low such as or even . The second way is to upscale the LR probe image, for example by interpolation, as is conventionally done in most subspace-based face recognition methods. Some researchers have proposed resolution-robust feature representation methods, which extract discriminative information directly from LR images (or their upsampled versions). Representative works are the RQCr color feature [2], the local frequency texture descriptor [3], and the novel kernel correlation feature analysis [4]. However, they remain sensitive to resolution to some degree and do not achieve full resolution robustness. As a more refined solution, the widely used super resolution (SR) or face hallucination [5] technique can be employed to estimate HR faces from LR ones. However, such techniques usually require many precisely aligned images of the same scene and are time-consuming. In addition, most of these methods aim to improve face appearance rather than to optimize face images from the recognition perspective. Recently, a few attempts have been made to satisfy both criteria in the very LR case [6]. Some researchers have also studied simultaneous super resolution and recognition as a promising way to solve this problem [7, 8], though these methods also incur a high computational cost.

The third way is a unified feature space, also called an inter-resolution space, into which HR gallery images and LR probe images are both projected [9]. We call this the structure-based method, and it addresses the mismatch problem at its root. It aims to build a holistic structure that allows direct matching between the two from a classification perspective. For example, Li et al. [10] proposed a general framework called coupled mappings (CMs) to project LR and HR face images into a unified feature space. The problem with this method is that it offers poor classification ability. Thus, they further introduced locality weight relationships [11] into CMs and proposed coupled locality preserving mappings (CLPMs), which significantly improved the performance. However, how to efficiently solve the eigenvalue decomposition in the projection matrix computation remains unsolved in CLPMs, which is a common problem of structure-based methods. Furthermore, CLPMs is sensitive to the parameters of the penalty weighting matrix, such as the scale and the neighborhood size. Subsequently, other methods were developed to improve the classification performance of CMs. Deng et al. [12] considered two other relationships in LR classification (LR versus LR and HR versus HR) besides the LR versus HR relationship used in CMs. Zhou et al. [13] proposed a simultaneous discriminant analysis method by introducing linear discriminant relationships [14] between intraclass scattering and interclass scattering into CMs. Ren et al. [15] integrated the nonlinear kernel trick into CMs/CLPMs, which transferred LR/HR pairs from the input space to the kernel-embedding space. Although these methods have been shown to be discriminative and efficient, they still incur considerable computational cost for system implementation.

Inspired by the works in [15, 16], we propose a novel structure-based method for LR FR, called kernel coupled cross-regression (KCCR), which adopts two mapping matrices to project LR and HR face images, represented through a nonlinear kernel, into a unified feature space for the final feature matching, as shown in Figure 1. KCCR first transforms LR and HR images from the original observing space into the low-dimensional embedding space (the low-embedding space) obtained by the linear graph embedding technique [17]. Meanwhile, the image features are represented by nonlinear kernel tricks to increase the computational power of the linear embeddings. Spectral regression (SR) [18] is then introduced to generalize the model and reduce the time complexity. Furthermore, cross-regression is developed to fully utilize the HR embedding to increase the information of the LR space, by which two more discriminant coupled projection matrices are obtained. The transformations can be learned offline from the training images. During testing, the input images are transformed using the two coupled projection matrices and then matched. As a result, KCCR, which combines cross-regression with the kernel trick, is more efficient and effective than the existing structure-based methods as well as some subspace learning and kernel-based methods, as shown in the experiments on the FERET and CMU PIE face databases.

There are three main contributions in this paper.
(1) We build the transformations in the low-embedding space, unlike the existing structure-based methods, which operate in the original observing space, and spectral regression reduces the time complexity from cubic to linear.
(2) We propose a cross-regression technique to improve the discriminant ability, which helps to extract more high-frequency or local information among HR samples for LR matching.
(3) Since nonlinear kernel functions have been successfully used to build more efficient coupled mapping models in [15, 19], we also introduce the kernel trick into our cross-regression model to further increase its recognition performance.

The rest of this paper is organized as follows. Section 2 presents our proposed KCCR method. In Section 3, we analyze the generality of KCCR and its connections to linear graph embedding and coupled mappings. Experimental results on public databases are shown in Section 4. Section 5 concludes the paper.

2. Kernel Coupled Cross-Regression

LR results in a dimensional mismatch between HR gallery images and LR probe images, which directly loses a lot of information that is useful for optimal classification. Thus, it is feasible to project LR/HR pairs into a unified feature space. Moreover, such mappings should be as discriminative as possible for feature matching, and the computation should be efficient. Our KCCR method is designed to achieve these goals based on kernel coupled mappings and spectral regression.

2.1. Problem Description

The idea of coupled mappings aims to obtain two projection matrices and with sizes of and to realize the minimization problem as shown in (1), where and represent HR and LR face images, respectively and is the total number of samples.
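As a reading aid, the coupled-mappings objective of [10] can be written in the following form. The symbols $P_L$, $P_H$, $x_i^L$, $x_i^H$, $d$, $m$, $M$, and $N$ are assumed notation introduced here for illustration and may differ from the symbols used in (1).

```latex
% Assumed notation: x_i^L in R^m (LR image), x_i^H in R^M (HR image),
% P_L in R^{m x d}, P_H in R^{M x d}, N training pairs.
J(P_L, P_H) = \sum_{i=1}^{N} \left\| P_L^{\top} x_i^{L} - P_H^{\top} x_i^{H} \right\|^{2}
```

Both images of a pair are mapped into the same $d$-dimensional space, so the difference between $m$ and $M$ no longer prevents a direct distance comparison.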

CMs model [10] aims to obtain a unified feature space at the pixel or feature level. Therefore, the performance of CMs essentially depends on the contents or features of face images. In other words, CMs model itself does not bring any new information for LR FR. Here, we introduce a low-dimension embedding of , denoted as , to transform LR and HR into a low-embedding space based on spectral regression instead of an observing pixel space. Then it will obtain more information about LR samples by exploring the relationships among HR samples and improve the efficiency of CMs model drastically:

2.2. Graph Embedding and Regression Framework

Given a graph with vertex set and similarity matrix ; each vertex represents a face image and represents the weight of the edge joining vertices and . The graph can then be used for characterizing various statistical or geometric properties of the data set. Therefore, graph embedding [17] is to represent each vertex of the graph as a low-dimensional vector that preserves similarities between the data point pairs, and regression is used for solving the optimization problem of graph embedding, which reduces the time complexity. Based on these points, spectral regression [18] is introduced into CMs, which makes the CMs model more efficient for LR FR.
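For orientation, the graph-embedding formulation used in the spectral regression literature [18] is commonly written as below. This is a sketch in assumed notation ($y$, $W$, $D$, $L$, $X$, $a$, $\lambda$, with a symmetric similarity matrix), not a reproduction of the numbered equations of this paper.

```latex
% y = (y_1, ..., y_m)^T : one-dimensional embedding of the m vertices
% W : symmetric similarity matrix, D_{ii} = \sum_j W_{ji}, L = D - W
\sum_{i,j} (y_i - y_j)^2 W_{ij} = 2\, y^{\top} L y
\quad\Longrightarrow\quad
y^{*} = \arg\min_{y^{\top} D y = 1} y^{\top} L y
      = \arg\max_{y} \frac{y^{\top} W y}{y^{\top} D y}

% Maximum generalized eigenvalue problem for the embedding
W y = \lambda D y

% Linearization y = X^{\top} a (X holds one sample per column)
X W X^{\top} a = \lambda\, X D X^{\top} a
```

Spectral regression avoids the dense eigen-decomposition in the last line by first solving for $y$ and then fitting $a$ by regularized least squares.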

Let be the low-dimensional embeddings of the vertex set , which means if vertices and are close then and are also close. The optimal is given by minimizing where is the graph Laplacian matrix and is a diagonal matrix with . A constraint is then imposed for removing the arbitrary scaling factor in the embedding:

The optimal can be obtained by solving the maximum eigenvalue problem:

From the classification perspective, a relationship between and should be introduced, that is, via a linear function. Equation (4) can be written as

The optimal can be obtained by solving the maximum eigenvalue problem: This framework is called linearization of graph embedding (LGE). With different choices of similarity matrix , LGE will lead to many popular linear subspace learning methods such as linear discriminant analysis (LDA) [14] and locality preserving projection (LPP) [11]. For LDA and LPP, is defined as follows.

LDA. Suppose the th class has samples:

LPP. Let denote the set of nearest neighbors of .
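The two similarity matrices can be illustrated with a short, dependency-free sketch. The function names are hypothetical, and the heat-kernel weight with width `sigma` used for LPP is an assumption taken from common practice in the LPP literature; the exact weights used in this paper are not reproduced here.

```python
import math

def lda_weights(labels):
    """Supervised graph: W[i][j] = 1/m_k if samples i and j share class k, else 0."""
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    n = len(labels)
    return [[1.0 / counts[labels[i]] if labels[i] == labels[j] else 0.0
             for j in range(n)] for i in range(n)]

def lpp_weights(points, k, sigma):
    """k-NN graph with heat-kernel weights (the heat kernel is an assumption)."""
    n = len(points)

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Indices of the k nearest neighbours of each point (excluding itself).
    neighbours = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sqdist(points[i], points[j]))
        neighbours.append(set(order[:k]))

    # Connect i and j if either one is among the other's k nearest neighbours.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in neighbours[i] or i in neighbours[j]):
                W[i][j] = math.exp(-sqdist(points[i], points[j]) / (2.0 * sigma ** 2))
    return W
```

The LDA graph depends only on class labels, whereas the LPP graph depends only on local geometry, which is why the two embeddings capture global and local structure, respectively.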

2.3. The Proposed KCCR Method

Inspired by the work of Lei and Li [16], which adopts two embeddings to describe heterogeneous face data, we introduce and to represent the embeddings of LR and HR face data, respectively. In the embedding solution, LDA preserves the global relationships between data points, which suits the LR face space because it mainly contains low-frequency information, while LPP preserves the local relationships between data points, which suits the HR face space because it mostly depends on high-frequency features. Therefore, in contrast to CLPMs [10], which represents both the LR and HR spaces by LPP, and SDA [13], which represents both by LDA, and are obtained by LDA and LPP, respectively, in our CCR method, so as to make full use of the combination of global and local information in the data. Furthermore, in order to utilize the HR embedding obtained by LPP to increase the information of the LR space obtained by LDA, coupled cross-regression is designed to explore the relationships between the LR/HR data sets and their crossed low-embedding . Under a linear assumption, the problem reduces to finding the two projection matrices and for the LR/HR data sets as follows:

To solve for and , the classical least-squares method is used to minimize the cost. However, because the training samples are limited, the variance of the estimate may be large, which makes the estimate unreliable. To overcome this problem, penalties are imposed on the norms of and as in ridge regression. The regularization parameters , control the trade-off between the bias and variance of the estimate. Based on these considerations, the objective function is formulated as follows:

By requiring the derivatives of the objective function with respect to and vanish, we get
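The stationarity condition has the usual ridge-regression form. The following is a sketch in assumed generic notation (data matrix $X$ with one sample per column, target embedding matrix $Y$ with one sample per row, projection matrix $A$, regularization parameter $\alpha > 0$), without fixing which embedding is paired with which data set:

```latex
J(A) = \left\| Y - X^{\top} A \right\|_F^{2} + \alpha \left\| A \right\|_F^{2}

\frac{\partial J}{\partial A} = -2 X \left( Y - X^{\top} A \right) + 2 \alpha A = 0
\quad\Longrightarrow\quad
A = \left( X X^{\top} + \alpha I \right)^{-1} X Y
```

In this generic form, the same expression is applied to the LR and HR problems separately, each with its own data matrix, target embedding, and regularization parameter.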

Suppose that the Euclidean space is mapped to a Hilbert space through a nonlinear mapping function . Let denote the data matrix in the Hilbert space. Here, we replace LR and HR images with their nonlinear feature vectors; that is, and , which are induced by a kernel function ; that is, . Here is the kernel function such as the typical Gaussian kernel . Thus, the objective function can be replaced as follows:

Equation (12), therefore, can be changed into where and are the identity matrices with the size of and , respectively, and .

By using the matrix manipulation in kernel ridge regression [20], we have and, therefore, where and . Naturally, corresponding to eigenvectors and to (5), we can obtain coupled cross mapping matrices and to project LR/HR pairs into a unified discriminative space and thus for classification. It is worth noting that KCCR and CCR in the current version are specifically designed for single resolution case. For practical face recognition applications where the resolution of probe faces is likely to be changed, the two projection matrices for multiple resolutions case, for example, , and , are needed for handling this problem. However, how to determine variable in is a very difficult problem in this context. One possible solution is to select different pairs according to the nearest resolution. For example, when the probe image has the resolution of , or can be used to determine for constructing coupled mappings. Similarly, or can be used when the probe resolution is between and .
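The kernel step can be illustrated with a small, dependency-free kernel ridge regression sketch. It is not the authors' implementation: the function names, the toy data in the usage note, and the Gaussian elimination helper are assumptions for illustration, and the Gaussian kernel is written as exp(-||x - z||^2 / (2 sigma^2)). The coefficients solve (K + alpha I) C = Y, which follows from the identity (X X^T + alpha I)^{-1} X = X (X^T X + alpha I)^{-1}, and a new sample x is projected as k(x)^T C.

```python
import math

def gaussian_kernel(x, z, sigma):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))

def solve(A, B):
    """Solve A C = B by Gaussian elimination with partial pivoting."""
    n, d = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + d):
                M[r][c] -= f * M[col][c]
    C = [[0.0] * d for _ in range(n)]
    for i in reversed(range(n)):  # back substitution
        for j in range(d):
            s = sum(M[i][k] * C[k][j] for k in range(i + 1, n))
            C[i][j] = (M[i][n + j] - s) / M[i][i]
    return C

def fit_kernel_ridge(samples, targets, sigma, alpha):
    """Return the coefficient matrix C with (K + alpha I) C = targets."""
    n = len(samples)
    K = [[gaussian_kernel(samples[i], samples[j], sigma) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        K[i][i] += alpha  # ridge penalty on the diagonal
    return solve(K, targets)

def project_kernel(x, samples, C, sigma):
    """Map a new sample into the embedding space: k(x)^T C."""
    k = [gaussian_kernel(x, s, sigma) for s in samples]
    return [sum(k[i] * C[i][j] for i in range(len(C))) for j in range(len(C[0]))]
```

For example, fitting three 2-D samples to one-dimensional targets with a very small `alpha` reproduces the targets at the training points almost exactly, while a larger `alpha` shrinks the coefficients and smooths the mapping.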

2.4. Algorithm Summarization

Similar to CMs, the proposed KCCR also consists of two main phases, which are summarized in Table 1. The first is the offline training phase, which has three parts: learning the two low-embeddings and , computing the two coupled cross projection matrices and , and transforming the HR gallery images into the unified space. The second is the online testing phase, which has two parts: transforming the LR probe images into the unified space and performing feature matching.
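The two phases can be sketched as follows, assuming the coupled projection matrices have already been learned. All names (`project_linear`, `enroll_gallery`, `identify`) and the toy matrices are hypothetical, and nearest-neighbour matching with Euclidean distance is an assumption about the matching rule, not a detail confirmed by the text.

```python
def project_linear(P, x):
    """Compute P^T x for a projection matrix P given as a list of rows."""
    return [sum(P[i][j] * x[i] for i in range(len(x))) for j in range(len(P[0]))]

def enroll_gallery(P_high, gallery):
    """Offline: map each HR gallery image (label, vector) into the unified space."""
    return [(label, project_linear(P_high, x)) for label, x in gallery]

def identify(P_low, probe, enrolled):
    """Online: map one LR probe and return the label of the closest gallery feature."""
    f = project_linear(P_low, probe)

    def sqdist(g):
        return sum((a - b) ** 2 for a, b in zip(f, g))

    return min(enrolled, key=lambda item: sqdist(item[1]))[0]

# Toy data: 4-dimensional "HR" vectors, 2-dimensional "LR" vectors,
# and a 2-dimensional unified space. The matrices are made up for illustration.
P_high = [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]]
P_low = [[1.0, 0.0], [0.0, 1.0]]
gallery = [("alice", [1.0, 1.0, 0.0, 0.0]), ("bob", [0.0, 0.0, 1.0, 1.0])]
enrolled = enroll_gallery(P_high, gallery)
print(identify(P_low, [0.9, 0.2], enrolled))
```

Because the gallery is projected once offline, the online cost per probe is one projection plus a distance comparison against the stored gallery features.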

3. Generality Analysis about KCCR

In this section, we analyze the generality of KCCR and CCR and their connections to linear graph embedding (LGE) methods and coupled mappings (CMs) models. We divide the baseline methods into four groups: original, kernel, original CMs, and kernel CMs, which are shown in Table 2. Among the original methods, LDA [14] and LPP [11] are two typical LGE methods, and spectral regression (SR) [18] introduces regression into the LGE framework to solve the optimization problem. Introducing the kernel trick into the LGE and SR methods yields their kernel-based counterparts, namely kernel LDA [21], kernel LPP [22], and kernel SR [23], respectively.

The first two groups are used to describe the relationships between samples in CMs models, which leads to the representative structure-based LR FR methods of the other two groups, that is, original CMs and kernel CMs. In the original CMs group, coupled locality preserving mappings (CLPMs) [10] and simultaneous discriminant analysis (SDA) [13] improve CMs based on LPP and LDA, respectively. In this paper, we use the SR technique instead of LPP or LDA to describe the relationships and propose the coupled cross-regression (CCR) method. In the fourth group, the recently proposed coupled kernel embedding (CKE) method [15] and our kernel CCR method introduce the kernel trick into the CLPMs and CCR models, respectively, and extend the ideas of nonlinear dimensionality reduction to the LR FR problem. Naturally, the SDA method can also be extended with kernels, which is left for future work.

The above analysis shows that both the original and the kernel versions of LGE methods can be embedded into CMs models to discover the relationships between the LR and HR feature spaces. It is worth noting that our KCCR method, based on cross-regression and the kernel technique, is more efficient than the other baseline methods.

4. Experimental Results and Discussions

To validate the effectiveness and efficiency of the proposed method for LR FR, and probe images from the FERET [24] and CMU PIE [25] face databases are used in experiments that compare it with the original and kernel linear graph embedding (LGE) methods and the original and kernel coupled mappings (CMs) methods listed in Table 2.

4.1. Databases and Settings

The training set of FERET contains 1002 frontal face images from 429 subjects. The experiments are based on the standard gallery (1196 images) and the probe set “fafb” (1195 images). For CMU PIE, 1428 frontal-view face images of 68 subjects (21 images per subject) with neutral expression and illumination variations were selected for our experiment. For each subject, 3 images are randomly selected for training and the remaining 18 images are used for testing. Of these, only the one image with no flash illumination is taken as the gallery image, while the probe set contains the images with different illuminations (17 images per subject).

One sample in FERET and CMU PIE with the HR size and , respectively, and the corresponding LR size and are shown in Figure 2. In all experiments, HR face images are aligned by the positions of two eyes and normalized to zero mean and unit variance. Then the preprocessed HR face images are synthetically translated by random subpixel amounts, blurred with a Gaussian filter, and down-sampled to generate LR face images, as [5] does.
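The degradation pipeline can be approximated by the following dependency-free sketch. It covers only Gaussian blurring and down-sampling; the random subpixel translation is omitted, and the kernel radius, the value of `sigma`, the border handling (clamping), and the decimation factor are assumptions, not the settings used in the experiments.

```python
import math

def gaussian_taps(sigma, radius):
    """Normalized 1-D Gaussian filter taps."""
    taps = [math.exp(-(i * i) / (2.0 * sigma ** 2)) for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def blur_rows(img, taps):
    """Convolve every row with the taps, clamping indices at the borders."""
    r = len(taps) // 2
    w = len(img[0])
    return [[sum(taps[k + r] * row[min(max(x + k, 0), w - 1)]
                 for k in range(-r, r + 1))
             for x in range(w)] for row in img]

def transpose(img):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*img)]

def degrade(img, factor, sigma=1.0, radius=2):
    """Blur with a separable Gaussian, then keep every `factor`-th pixel."""
    taps = gaussian_taps(sigma, radius)
    blurred = transpose(blur_rows(transpose(blur_rows(img, taps)), taps))
    return [row[::factor] for row in blurred[::factor]]
```

Blurring before decimation suppresses the high frequencies that would otherwise alias, which is the reason the LR probe images carry mainly low-frequency information.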

4.2. Results and Analyses

We compare the performance of our KCCR and CCR with some representative LGE methods, specifically, NN-LDA/LPP/SRLDA (LDA [14]/LPP [11]/SRLDA [18] applied after first restoring the LR probe images by nearest neighbour (NN) interpolation) and HR-LDA (LDA with the HR probe images, denoted as HR in the following sections). Furthermore, the kernel versions of these LGE methods are used for comparison, namely KDA [21], KLPP [22], and SRKDA [23]. Other representative state-of-the-art structure-based methods for LR FR are also evaluated, that is, CLPMs [10], SDA [13], and CKE [15].

From the recognition performances in FERET with the probe size and shown in Figures 3 and 4, respectively, the recognition results of different methods are obviously classified into four levels. That is, HR method and the kernel versions of CMs methods such as KCCR and CKE rank the first level, and the original CMs methods such as CLPMs, SDA, and CCR follow subsequently. Then the kernel versions of LGE methods with LR probe images rank the third. Finally, LGE methods with LR probe images obtain the worst results. This phenomenon is more obvious for .

For , although all CMs methods obtain relative close performance, our KCCR method achieves the Rank-1 recognition rate of 90.8%, which still outperforms the other kernel CMs method, that is, CKE (90.6%), and CCR (90.5%) is better than other original CMs methods such as CLPMs (89.7%) and SDA (90.2%). In particular, the result of KCCR is even close to HR method (90.9%). On the other hand, the recognition rates of kernel and original LGE methods are shown as follows: KDA (78.6%), KLPP (74.1%), SRKDA (80.3%), LDA (71.7%), LPP (65.2%), and SRLDA (74.6%), respectively. In general, the kernel LGE methods are relatively better than the original ones but inferior to the proposed KCCR and CCR.

For , although the performances of all methods more or less decline, our KCCR and CCR methods still achieve 80.2% and 75.9%, which outperform CKE (78.8%), CLPMs (68.7%), and SDA (73.5%), respectively, and the performance gap between them is larger than that in . It implies that KCCR and CCR have a great potential in solving very LR problem. Compared with CMs methods, the performances of LGE methods, that is, KDA (57.7%), KLPP (44.1%), SRKDA (60.3%), LDA (44.9%), and SRLDA (50.4%), are degraded as a result of LR, especially LPP which only achieves the accuracy of 27.8%. Therefore, our KCCR and CCR methods are competitive for LR FR task.

Furthermore, Figures 5 and 6 show the recognition performances in CMU PIE with the probe size and , respectively. The results of different methods are similar to those in FERET. That is, our KCCR & CCR method still outperforms the baseline methods, though the performances of all the methods are slightly degraded by different illumination variations involved in CMU PIE. However, the baseline methods are affected much more as the resolution declines.

4.3. Model Selections

The , are two essential parameters in our CCR method which control the smoothness of the estimator under LR and HR cases, respectively. When the sample vectors are linearly independent, CCR provides the same solution as the coupled cross version of LGE as , decrease to zero. Also the relationships between and determine the performance of the model to some extent. In this subsection, we try to examine the impact of the two parameters on the performance of CCR.

In experiments, we find that the smoothness parameters mainly depend on different resolutions. For example, CCR obtains the best performance as , are set to 0.001 and 0.01, respectively, at the resolution of , while and at . Obviously, is also a possible condition to guarantee that HR samples can provide more information for LR classification in our CCR model. Therefore, we take the performance of CCR as a function of under different resolutions, which is shown in Figure 7. For convenience, the -axis is plotted as ( denotes ) which is strictly in the interval . It is easy to see that CCR can achieve significantly better performance than CLPMs over a large range of especially at the resolution of , which further shows the potential of our CCR method in solving very LR problem. Thus, the parameter selection is not a very crucial problem in our model. The selection of , in KCCR model shows similar property to CCR, although they are slightly different. Here, we empirically set , as 0.0001, 0.001 at the resolution of , and 0.0005, 0.001 at .

In addition, how to select appropriate kernels for LR face data is a more important problem in KCCR, and it remains unsolved, since no theoretical selection criterion exists for kernel-based methods. In this paper, the Gaussian kernel is used to build the kernel representation, for fair comparison, in the preceding experiments. Empirical analysis shows that kernel-based methods are usually sensitive to the selection of the hyperparameters, that is, the width of Gaussian kernel . Thus, finding hyperparameters with good generalization performance is crucial for a successful application. In our experiments, we adopt the same strategy used in [17] to select the optimal kernel parameter. The parameter is set as , , where is the standard deviation of the training samples. The best result is selected from the 21 configurations by cross-validation. Details can be found in [17]. However, it is unclear how to choose the optimal , which remains future work for applying kernel-based models effectively in LR FR applications.

4.4. Time Complexity Analyses

The average time each method takes to recognize one face image on a 2.33 GHz CPU is shown in Table 3. The properties of time complexity with and are similar. Here, we just give some analyses with . Although the time requirement of the original CMs methods (with time complexity) is generally higher than that of the original LGE methods (with time complexity), the speed of our CCR method (with time complexity) is, unlike that of other CMs methods such as CLPMs and SDA, relatively close to SRLDA and even faster than LDA and LPP. With the introduction of the kernel trick, the time complexity of the kernel LGE methods increases slightly compared with the original ones. The same holds for CKE and KCCR with the time complexity of . However, our KCCR, combined with the regression framework for the LR FR task, is more efficient than CKE and the other baseline methods except for SRLDA and SRKDA.

5. Conclusions

In this paper, a novel structure-based method called kernel coupled cross-regression (KCCR) is proposed for LR FR. Our method combines the coupled mappings model and graph embedding analysis to describe the relationships between LR and HR in the low-embedding space. The kernel trick is further adopted to describe these relationships in an infinite-dimensional, nonlinear space. Moreover, cross-regression is developed to fully utilize the HR embedding to increase the information of the LR space. As a result, the proposed KCCR method is more discriminative and faster. It outperforms LGE methods with LR probe images, the kernel versions of LGE methods, and other existing structure-based methods such as CLPMs, SDA, and CKE, especially in time complexity.

Acknowledgments

This work is supported by the National Key Technology R&D Program of China (2012BAH01F03), the National Natural Science Foundation of China (60973061), the National Basic Research (973) Program of China (2011CB302203), Ph.D. Programs Foundation of Ministry of Education of China (20100009110004), and China Postdoctoral Science Foundation (2013M530020).