Abstract

Face recognition based on multiple imaging modalities has become an active research topic. A great number of multispectral face recognition algorithms and systems have been designed in the last decade. How to extract features from different spectral bands remains an important issue for face recognition. To address this problem, we propose a robust tensor preserving projection (RTPP) algorithm, which represents a multispectral image as a third-order tensor. RTPP constructs sparse neighborhoods and then computes the weights of the tensor samples. RTPP iteratively obtains the transformation matrices by preserving these sparse neighborhoods. Due to sparse representation, RTPP can not only keep the underlying spatial structure of multispectral images but also enhance robustness. Experiments on both the Equinox and DHUFO face databases show that the performance of the proposed method is better than those of related algorithms.

1. Introduction

Multibiometrics can be considered the fusion of different sensor modalities in a single recognition system. The reason for using two or more sensor modalities is to improve recognition accuracy. Face recognition based on multiple imaging modalities has become an active research topic [17]. Recent studies have shown that multispectral face recognition offers several advantages, such as invariance to illumination changes [8, 9]. Multispectral images also reveal anatomical information about a subject [1]. Socolinsky and Selinger [3, 10] developed different recognition algorithms on visible and thermal infrared face image databases and obtained good performance. Chen et al. [11] tested the effects of illumination, facial expression, and the passage of time between the training and testing images. Wang et al. [12] showed that color space combination represents a viable approach for improving face recognition performance. Image-based fusion in the wavelet domain and feature-based fusion in the eigenspace domain were presented in [13]. Heo et al. [14] proposed fusing visual and thermal images for robust face recognition. Multisensory biometric fusion algorithms were investigated for personal identification [15]. Pan et al. [4, 5] analyzed facial tissue spectral measurements in the near-infrared range (0.7 μm–1.0 μm) for face recognition. Denes et al. [6] tested spectral asymmetry with three visible bands (0.6 μm, 0.7 μm, and 0.8 μm). Chang et al. [16] fused multispectral images in the visible spectrum (0.4 μm–0.72 μm) into a single image to enhance face recognition accuracy. Chou and Bajcsy [7] preprocessed multispectral images (visible: 0.4 μm–0.72 μm and near-infrared: 0.65 μm–1.1 μm) by principal component analysis (PCA) to perform face detection. Based on visible images, Wong and Zhao [17] adopted kernel PCA to remove eyeglasses from thermal face images.

The above algorithms are mainly developed to preserve the global structure information of the multispectral data; they do not explicitly exploit the manifold structure of the data. However, research results on manifold learning presented in the past decade demonstrate that the local geometric structure is more important than the global structure, since high-dimensional data often lie on a low-dimensional manifold [18]. Due to the low-dimensional manifold structure of face images, manifold-learning-based linear dimension reduction algorithms [4–9, 12–14, 19–24] have become popular.

Since these linear feature extraction algorithms cannot deal with high-order tensor data, some of them were further extended to multilinear cases, and a number of tensor-based manifold learning algorithms were proposed by using higher order tensor decomposition [25–27]. Within the past ten years, there has been great interest in high-order tensor feature extraction, and tensor-based methods have become popular in computer vision and pattern recognition [28–31]. For example, Igarashi et al. proposed tensor subspace analysis (TSA) [32] for second-order learning, and Dai and Yeung proposed tensor NPE (TNPE) [19]. Recently, orthogonal tensor neighborhood preserving embedding (OTNPE) was proposed for facial expression recognition. Variations were also proposed for gait recognition, action recognition, and so forth. For more details, please see the latest survey of multilinear subspace learning [13].

Recent research demonstrates that high-order tensor based manifold learning algorithms, such as tensor neighborhood preserving embedding (tensor NPE) [20], can obtain better performance than classical feature extraction algorithms on tensor data sets. Unfortunately, tensor data contain large amounts of redundant information, and thus not all features/variables are important for feature extraction and classification [21–23]. It was shown that integrating sparse representation and manifold learning for feature extraction may yield better performance [24]. It has also been shown that sparse representation methods can obtain better performance than their nonsparse counterparts on real data, and these sparse methods can give an intuitive or semantic interpretation of the transformed features [25].

To date, high-order tensor data embedding in a sparse manner has not been widely investigated, and how to extend manifold learning algorithms that integrate sparseness and manifold structure to multispectral face recognition remains unsolved. In this paper, motivated by tensor data embedding and sparse representation, we propose a novel method called robust tensor preserving projection (RTPP) for multispectral image feature extraction. A multispectral image is treated as a third-order tensor. The aim of RTPP is to obtain transformation matrices that preserve the sparse representation information of the third-order tensors.

The rest of the paper is organized as follows. In Section 2, we give the related tensor definitions. In Section 3, tensor neighborhood preserving embedding and tensor locality preserving projection are reviewed. In Section 4, the novel sparse tensor embedding method is presented. Experiments evaluating the proposed tensor learning method are reported in Section 5, and conclusions are given in Section 6.

2. Tensor Fundamentals

A tensor is a multidimensional array. It is the higher order generalization of a scalar (zero-order tensor), a vector (first-order tensor), and a matrix (second-order tensor). In this paper, lowercase letters (e.g., $a$, $b$, $c$) denote scalars, bold lowercase letters (e.g., $\mathbf{a}$) denote vectors, uppercase letters (e.g., $A$, $B$, $C$) denote matrices, and bold uppercase letters (e.g., $\mathbf{A}$) denote tensors. It is assumed that the training samples are represented as $N$th-order tensors $\mathbf{X}_i \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, $i = 1, 2, \ldots, M$, where $M$ denotes the total number of training samples.

Definition 1. The inner product of two tensors $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is defined as $\langle \mathbf{A}, \mathbf{B}\rangle = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N} a_{i_1 i_2 \cdots i_N}\, b_{i_1 i_2 \cdots i_N}$. The Frobenius norm of a tensor $\mathbf{A}$ is then defined as $\|\mathbf{A}\|_F = \sqrt{\langle \mathbf{A}, \mathbf{A}\rangle}$, and the distance between two tensors $\mathbf{A}$ and $\mathbf{B}$ is defined as $d(\mathbf{A}, \mathbf{B}) = \|\mathbf{A} - \mathbf{B}\|_F$.
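As a quick numerical illustration of Definition 1 (a minimal NumPy sketch; the variable names are ours, not the paper's):

```python
import numpy as np

# Two small third-order tensors of the same size.
A = np.random.rand(4, 3, 2)
B = np.random.rand(4, 3, 2)

inner = np.sum(A * B)                       # <A, B>: elementwise product, summed over all entries
fro_norm = np.sqrt(np.sum(A * A))           # ||A||_F = sqrt(<A, A>)
distance = np.sqrt(np.sum((A - B) ** 2))    # d(A, B) = ||A - B||_F

# The Frobenius norm agrees with NumPy's default norm of the flattened tensor.
assert np.isclose(fro_norm, np.linalg.norm(A))
```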

Definition 2. The $n$-mode flattening of the $N$th-order tensor $\mathbf{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ into the matrix $A_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}$ is defined so that the columns of $A_{(n)}$ are the mode-$n$ fibers of $\mathbf{A}$; that is, the tensor element $a_{i_1 i_2 \cdots i_N}$ is mapped to the matrix element $(i_n, j)$, where $j$ runs over all combinations of the remaining indices.

Definition 3. The $n$-mode product of a tensor $\mathbf{A} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ by a matrix $U \in \mathbb{R}^{J \times I_n}$, denoted by $\mathbf{A} \times_n U$, is an $I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N$ tensor whose entries are given by $(\mathbf{A} \times_n U)_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} a_{i_1 i_2 \cdots i_N}\, u_{j i_n}$.
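Definitions 2 and 3 can be realized in a few lines of NumPy. The following sketch uses one common ordering convention for the columns of the flattened matrix; the helper names `unfold`, `fold`, and `mode_product` are ours:

```python
import numpy as np

def unfold(T, n):
    """Mode-n flattening: mode n becomes the rows, the remaining modes the columns."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def fold(mat, n, shape):
    """Inverse of `unfold` for a tensor with the given full shape."""
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(mat.reshape([shape[n]] + rest), 0, n)

def mode_product(T, U, n):
    """n-mode product T x_n U: every mode-n fiber of T is multiplied by U."""
    new_shape = list(T.shape)
    new_shape[n] = U.shape[0]
    return fold(U @ unfold(T, n), n, new_shape)

# Example: project the second mode of a 4 x 3 x 2 tensor down to size 2.
T = np.random.rand(4, 3, 2)
U = np.random.rand(2, 3)
print(mode_product(T, U, 1).shape)   # (4, 2, 2)
```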
The aim of a tensor learning algorithm is to obtain a set of projection matrices $U_1, U_2, \ldots, U_N$ and map each original tensor into a new tensor: $\mathbf{Y}_i = \mathbf{X}_i \times_1 U_1^T \times_2 U_2^T \times \cdots \times_N U_N^T$.
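For a third-order sample, this multilinear projection can be written as a single einsum contraction. The sizes below are hypothetical (a 28 × 24 two-band image projected to a smaller feature tensor):

```python
import numpy as np

# Hypothetical sizes: a 28 x 24 x 2 sample projected to a 10 x 10 x 2 feature tensor.
X = np.random.rand(28, 24, 2)
U1 = np.random.rand(28, 10)
U2 = np.random.rand(24, 10)
U3 = np.random.rand(2, 2)

# Y = X x_1 U1^T x_2 U2^T x_3 U3^T
Y = np.einsum('abc,ai,bj,ck->ijk', X, U1, U2, U3)
print(Y.shape)   # (10, 10, 2)
```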

3. Tensor Neighborhood Preserving Embedding

Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_M$ be the multispectral face images in high-order tensor form, $i = 1, \ldots, M$, and let $c$ be the number of individuals. Assume that the $\mathbf{X}_i$ are drawn from an unknown manifold embedded in a tensor space. The aim of tensor NPE is to find optimal transformation matrices $U_1, \ldots, U_N$ such that the local topological structure of the data is preserved and the intrinsic geometric property is effectively captured. The optimal transformation matrices project each high-dimensional $\mathbf{X}_i$ into a low-dimensional representation $\mathbf{Y}_i = \mathbf{X}_i \times_1 U_1^T \times_2 U_2^T \cdots \times_N U_N^T \in \mathbb{R}^{I_1' \times I_2' \times \cdots \times I_N'}$, where $I_n' \le I_n$.

We construct a neighborhood graph to represent the intrinsic geometric structure of $\{\mathbf{X}_i\}$ and apply the heat kernel to define the affinity matrix $S$ as
$S_{ij} = \exp\big(-\|\mathbf{X}_i - \mathbf{X}_j\|_F^2 / t\big)$ if $\mathbf{X}_j \in N_k(\mathbf{X}_i)$ and $S_{ij} = 0$ otherwise,
where $N_k(\mathbf{X}_i)$ denotes the set of $k$ nearest neighbors of $\mathbf{X}_i$ and $t$ is a positive constant. The affinity matrix is then normalized such that each row sums to one. In order to preserve the geometric structure explicitly, we define the following objective function based on the Frobenius norm of a tensor:
$\min \sum_i \big\| \mathbf{Y}_i - \sum_j S_{ij} \mathbf{Y}_j \big\|_F^2$.
To eliminate an arbitrary scaling factor in the projection matrices, we impose the constraint $\sum_i \|\mathbf{Y}_i\|_F^2 = 1$. Then the optimization problem for tensor NPE can be expressed as (4). Note that this optimization problem is a high-order nonlinear programming problem with a high-order nonlinear constraint, which makes direct computation of the projection matrices infeasible. In general, this type of problem can be solved approximately by an iterative scheme of the kind proposed for low-rank approximation, and the optimization problem in (4) can be solved in this way. Assuming that $U_1, \ldots, U_{n-1}, U_{n+1}, \ldots, U_N$ are known, let $\mathbf{Z}_i = \mathbf{X}_i \times_1 U_1^T \cdots \times_{n-1} U_{n-1}^T \times_{n+1} U_{n+1}^T \cdots \times_N U_N^T$, and let $Z_{i(n)}$ be its $n$-mode flattening. In addition, since $\|\mathbf{A}\|_F^2 = \operatorname{tr}(A_{(n)} A_{(n)}^T)$ and based on the properties of the tensor mode product and the trace, we can rewrite the optimization function and the constraint in (4) in terms of $U_n$ and the $Z_{i(n)}$. Thus, the optimization problem in (4) can be reformulated as a trace minimization in $U_n$. The unknown transformation matrix $U_n$ can then be obtained from the eigenvectors corresponding to the $I_n'$ smallest eigenvalues of the generalized eigenvalue equation
$\Big[\sum_i \big(Z_{i(n)} - \sum_j S_{ij} Z_{j(n)}\big)\big(Z_{i(n)} - \sum_j S_{ij} Z_{j(n)}\big)^T\Big] u = \lambda \Big[\sum_i Z_{i(n)} Z_{i(n)}^T\Big] u.$
The other transformation matrices can be obtained in a similar manner.
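A minimal sketch of the graph construction step described above, assuming the standard k-nearest-neighbor heat-kernel weighting followed by row normalization (the function name and default parameters are ours):

```python
import numpy as np

def heat_kernel_affinity(X, k=5, t=1.0):
    """Build the k-NN heat-kernel affinity used by tensor NPE.

    X : array whose first axis indexes the M training tensors (any trailing shape).
    Returns an (M, M) matrix whose rows sum to one.
    """
    M = X.shape[0]
    flat = X.reshape(M, -1)
    # Pairwise squared Frobenius distances between tensors.
    sq = np.sum(flat ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T, 0.0)
    S = np.zeros((M, M))
    for i in range(M):
        order = np.argsort(d2[i])
        neighbors = [j for j in order if j != i][:k]   # k nearest neighbors of sample i
        S[i, neighbors] = np.exp(-d2[i, neighbors] / t)
    return S / S.sum(axis=1, keepdims=True)            # row-normalize
```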

3.1. Tensor Locality Preserving Projection

Different from tensor NPE, the optimization problem for tensor LPP can be expressed as $\min \sum_{i,j} S_{ij} \|\mathbf{Y}_i - \mathbf{Y}_j\|_F^2$. In general, the larger the value of $S_{ij}$ is, the more important the pair of tensors is for representing the local structure in the embedded tensor space. It is easy to see that the objective function incurs a heavy penalty if neighboring tensors $\mathbf{X}_i$ and $\mathbf{X}_j$ are mapped far apart. Thus, if two tensors $\mathbf{X}_i$ and $\mathbf{X}_j$ are close to each other, then the corresponding tensors $\mathbf{Y}_i$ and $\mathbf{Y}_j$ in the embedded tensor space are also expected to be close to each other.

The mode-$n$ optimization function of tensor LPP can be formulated analogously to that of tensor NPE, and the transformation matrix $U_n$ can be computed from the eigenvectors corresponding to the $I_n'$ smallest eigenvalues of the corresponding generalized eigenvalue equation.
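For reference, the mode-$n$ subproblem of tensor LPP in its usual form (a standard reconstruction under the notation of Section 3; the exact normalization used in the original paper may differ) is

$$
\min_{U_n}\ \sum_{i,j} S_{ij}\,\bigl\|\,U_n^{T}\bigl(Z_{i(n)}-Z_{j(n)}\bigr)\bigr\|_F^{2}
\;\;\Longrightarrow\;\;
\Bigl[\sum_{i,j} S_{ij}\bigl(Z_{i(n)}-Z_{j(n)}\bigr)\bigl(Z_{i(n)}-Z_{j(n)}\bigr)^{T}\Bigr]u
=\lambda\Bigl[\sum_{i} D_{ii}\,Z_{i(n)}Z_{i(n)}^{T}\Bigr]u,
\qquad D_{ii}=\sum_{j}S_{ij}.
$$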

4. Robust Tensor Preserving Projection

Sparse representation algorithms have been widely studied in signal processing, computer vision, and pattern recognition. Wright et al. [25] used sparse representation for robust face reconstruction and recognition, Qiao et al. [26] proposed sparsity preserving projections, and Cheng et al. [27] used the $\ell_1$-graph for image clustering. As demonstrated in [26, 27], graphs constructed by the $\ell_1$ norm have the advantages of greater robustness to noise and to information redundancy. In the following, we fuse sparse representation with tensor feature extraction.

4.1. Sparse Tensor Representation

In this part, we present the sparse representation for the tensor data $\{\mathbf{X}_i\}_{i=1}^M$. Let $w_i$ denote the optimal sparse representation coefficients of $\mathbf{X}_i$. Sparse representation assumes that each training sample can be sparsely represented as a linear combination of the other data. Based on this assumption, the following sparse optimization problem was proposed in [25]:
$\min_{w_i} \|w_i\|_0 \quad \text{s.t.} \quad \big\|\mathbf{X}_i - \sum_{j} w_{ij}\mathbf{X}_j\big\|_F \le \varepsilon,$
where the matrix $W = [w_{ij}]$ is the representation coefficient matrix satisfying $w_{ii} = 0$ and $w_i$ denotes the $i$th row vector of $W$. The parameter $\varepsilon$ is a small constant set by the user. However, the above optimization problem is NP-hard. One can apply convex relaxation to the NP-hard problem and solve the following optimization problem instead:
$\min_{w_i} \|w_i\|_1 \quad \text{s.t.} \quad \big\|\mathbf{X}_i - \sum_{j} w_{ij}\mathbf{X}_j\big\|_F \le \varepsilon.$
Due to the sparseness of $w_i$, only a few coefficients $w_{ij}$ are nonzero. This means that, for tensor $\mathbf{X}_i$, not all the other tensors are used in the representation. With the nonzero coefficients, the representation error is $\|\mathbf{X}_i - \sum_j w_{ij}\mathbf{X}_j\|_F$. In this paper, we also use the sparse representation classifier (SRC) for RTPP. SRC classifies the test sample to the class with the least within-class reconstruction error. For more details on SRC, please refer to [25].
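The coefficients can be approximated with any $\ell_1$ solver. The sketch below uses the penalized (Lagrangian) form via scikit-learn's Lasso rather than the constrained form above, and the regularization value is a hypothetical choice; the paper itself reports using the solver of Friedman et al. [30]:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coefficients(X, alpha=0.01):
    """Approximate the L1 representation coefficients (a sketch only).

    X : array of shape (M, ...) holding the M training tensors.
    Returns an (M, M) matrix W with W[i, i] = 0, where row i represents
    tensor i as a sparse combination of the other tensors.
    """
    M = X.shape[0]
    D = X.reshape(M, -1).T              # dictionary: one column per vectorized sample
    W = np.zeros((M, M))
    for i in range(M):
        idx = [j for j in range(M) if j != i]
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(D[:, idx], D[:, i])   # represent sample i by all the others
        W[i, idx] = model.coef_
    return W
```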

Define the affinity matrix $S = [s_{ij}]$ and let $s_{ij} = w_{ij}$ if $w_{ij} \neq 0$, with $s_{ij} = 0$ otherwise. We compute the $s_{ij}$ in a similar way as the weights in LLE, except that the constraints $s_{ij} \ge 0$ and $\sum_j s_{ij} = 1$ are imposed. The nonnegativity constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. Previous studies have shown that there is psychological and physiological evidence for parts-based representation in the human brain [11, 13, 17]. The sum-to-one constraint makes the weights invariant to translation.
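Given a coefficient matrix $W$ (for example, from the previous sketch), the nonnegativity and sum-to-one constraints can be imposed as a simple post-processing step; this is only one possible realization of the construction described above, not necessarily the paper's exact procedure:

```python
import numpy as np

def sparse_affinity(W, eps=1e-12):
    """Build the affinity matrix S from sparse coefficients W.

    Negative coefficients are discarded (only additive contributions are kept),
    the diagonal is zeroed, and each row is renormalized to sum to one.
    """
    S = np.maximum(W, 0.0)
    np.fill_diagonal(S, 0.0)
    row_sums = S.sum(axis=1, keepdims=True)
    return S / np.maximum(row_sums, eps)
```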

Discriminant information can be naturally preserved in the weights, even if no class information is available. In face recognition, one particularly simple but reasonable assumption is that the samples from the same class lie on a linear subspace. In other words, the nonzero weights mostly correspond to the samples from the same class, which implies that the nonzero weights may help distinguish that class from the others. Therefore, the weights tend to include potential discriminant information.

In the design of the proposed RTPP, we use the sparse representation weights $s_{ij}$ instead of the heat-kernel similarities used in tensor NPE. One advantage of the proposed technique is that the difficulty of selecting the size of the local neighborhood in tensor NPE is avoided. Moreover, the sparse similarities give an intuitive or semantic interpretation of the represented tensor data. Another advantage is that sparse representation has potential discriminative ability, since most nonzero sparse representation coefficients are located on samples from the same class as the represented sample.

4.2. Algorithm of Robust Tensor Preserving Projection

For convenience, in this section, we use the notation of Section 3 to derive RTPP. Assuming that $U_1, \ldots, U_{n-1}, U_{n+1}, \ldots, U_N$ are known, we now want to compute the projection matrix $U_n$. Using the similarities $s_{ij}$ in (13), we have the following optimization function:
$\min_{U_n} \sum_i \big\|\mathbf{Y}_i - \sum_j s_{ij}\mathbf{Y}_j\big\|_F^2 \quad \text{s.t.} \quad \sum_i \|\mathbf{Y}_i\|_F^2 = 1,$
where $\mathbf{Y}_i = \mathbf{Z}_i \times_n U_n^T$ and $\mathbf{Z}_i$ denotes $\mathbf{X}_i$ projected along all modes except the $n$th. Let $Z_{i(n)}$ be the $n$-mode flattening of $\mathbf{Z}_i$, let $e_i$ be the $M$-dimensional unit vector whose $i$th element is 1 and whose other elements are 0, and let $s_i$ denote the $i$th row vector of $S$. With simple manipulation using $\|\mathbf{A}\|_F^2 = \operatorname{tr}(A_{(n)}A_{(n)}^T)$, where $I$ denotes the identity matrix, both the objective and the constraint can be written as traces of quadratic forms in $U_n$ involving the vectors $(e_i - s_i)$. Then the optimization problem in (14) can be rewritten in this trace form, and the transformation matrix $U_n$ can be obtained by solving for the eigenvectors corresponding to the $I_n'$ smallest eigenvalues of the generalized eigenvalue equation
$\Big[\sum_i \big(Z_{i(n)} - \sum_j s_{ij}Z_{j(n)}\big)\big(Z_{i(n)} - \sum_j s_{ij}Z_{j(n)}\big)^T\Big] u = \lambda \Big[\sum_i Z_{i(n)}Z_{i(n)}^T\Big] u.$
The other transformation matrices can be obtained in a similar manner.
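A minimal computational sketch of this mode update, assuming the mode-$n$ flattened, partially projected samples are already stacked in a NumPy array (function and variable names are ours):

```python
import numpy as np
from scipy.linalg import eigh

def mode_update(Z_flat, S, d, ridge=1e-8):
    """One RTPP mode update, following the derivation above.

    Z_flat : array (M, In, K) of mode-n flattened, partially projected samples.
    S      : (M, M) sparse affinity matrix (rows sum to one).
    d      : number of features to keep for this mode.
    Returns the (In, d) transformation matrix formed by the eigenvectors of
    the d smallest generalized eigenvalues; `ridge` keeps B positive definite.
    """
    R = Z_flat - np.einsum('ij,jab->iab', S, Z_flat)   # residuals Z_i(n) - sum_j s_ij Z_j(n)
    A = np.einsum('iab,icb->ac', R, R)                 # left-hand scatter matrix
    B = np.einsum('iab,icb->ac', Z_flat, Z_flat)       # right-hand scatter matrix
    _, vecs = eigh(A, B + ridge * np.eye(B.shape[0]))  # ascending eigenvalues
    return vecs[:, :d]
```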

5. Experimental Results

In this section, experiments on the Equinox data set and the DHUFO data set are presented to evaluate RTPP, summarized in Algorithm 1, on recognition tasks. In the experiments, we compare the RTPP algorithm with the tensor NPE method. Besides tensor NPE, we also perform NPE directly on the serial combined data. The serial combined data is a super vector formed by concatenating the vectorized spectral images of a sample; for visible and thermal infrared data, the serial combined vector stacks the visible feature vector and the thermal infrared feature vector.
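For clarity, the two data arrangements compared in the experiments can be illustrated as follows (the image size matches the 28 × 24 resolution used below; the variable names are ours):

```python
import numpy as np

# Hypothetical 28 x 24 visible and thermal images of the same face.
visible = np.random.rand(28, 24)
thermal = np.random.rand(28, 24)

# Serial combined feature: concatenate the two vectorized images into a super vector.
serial = np.concatenate([visible.ravel(), thermal.ravel()])   # shape (1344,)

# Tensor sample: keep the spatial layout and stack the bands along the third mode.
tensor = np.stack([visible, thermal], axis=2)                 # shape (28, 24, 2)
```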

Input: training tensors $\mathbf{X}_1, \ldots, \mathbf{X}_M \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and the reduced dimensions $I_1', \ldots, I_N'$;
(1) Construct the similarity matrix $S$ by (12) and (14);
(2) Compute the embedding as follows:
Initialize $U_n^{(0)} = I_{I_n}$, $n = 1, \ldots, N$;
for  $t = 1, 2, \ldots, T_{\max}$  do
     for  $n = 1, 2, \ldots, N$  do
   $\mathbf{Z}_i = \mathbf{X}_i \times_1 (U_1^{(t)})^T \cdots \times_{n-1} (U_{n-1}^{(t)})^T \times_{n+1} (U_{n+1}^{(t-1)})^T \cdots \times_N (U_N^{(t-1)})^T$, $i = 1, \ldots, M$;
   $Z_{i(n)} =$ the $n$-mode flattening of $\mathbf{Z}_i$, $i = 1, \ldots, M$;
   $A_n = \sum_i \big(Z_{i(n)} - \sum_j s_{ij} Z_{j(n)}\big)\big(Z_{i(n)} - \sum_j s_{ij} Z_{j(n)}\big)^T$;
   $B_n = \sum_i Z_{i(n)} Z_{i(n)}^T$;
   Compute $U_n^{(t)}$ by solving the eigen function: $A_n u = \lambda B_n u$ (eigenvectors of the $I_n'$ smallest eigenvalues);
   if  $\|U_n^{(t)} - U_n^{(t-1)}\|_F < \varepsilon$ for each $n$  then
    break;
   end if
   end for
end for
Output:  $U_n$, $n = 1, \ldots, N$.
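Putting the pieces together, a compact NumPy/SciPy sketch of this procedure for third-order samples might look as follows; it assumes the sparse affinity matrix S has already been computed, uses a fixed number of sweeps instead of the convergence test, and adds a small ridge term to keep the right-hand matrix positive definite (all names and defaults are ours):

```python
import numpy as np
from scipy.linalg import eigh

def rtpp_fit(X, S, dims, n_iter=5, ridge=1e-8):
    """Minimal RTPP sketch for third-order samples (rows x columns x bands).

    X    : array (M, I1, I2, I3) of training tensors.
    S    : (M, M) sparse affinity matrix whose rows sum to one.
    dims : target dimensions (d1, d2, d3).
    """
    M = X.shape[0]
    U = [np.eye(I) for I in X.shape[1:]]

    def project(Z, V, mode):
        # Batched mode product Z x_mode V^T (mode is 0-based within each sample).
        out = np.tensordot(Z, V, axes=([mode + 1], [0]))
        return np.moveaxis(out, -1, mode + 1)

    for _ in range(n_iter):
        for n in range(3):
            # Partially project with the current matrices of the other two modes.
            Z = X
            for m in range(3):
                if m != n:
                    Z = project(Z, U[m], m)
            # Mode-n flattening of every partially projected sample.
            Zn = np.moveaxis(Z, n + 1, 1).reshape(M, Z.shape[n + 1], -1)
            # Residuals under the sparse weights and the two scatter matrices.
            R = Zn - np.einsum('ij,jab->iab', S, Zn)
            A = np.einsum('iab,icb->ac', R, R)       # sum_i R_i R_i^T
            B = np.einsum('iab,icb->ac', Zn, Zn)     # sum_i Z_i(n) Z_i(n)^T
            _, vecs = eigh(A, B + ridge * np.eye(B.shape[0]))
            U[n] = vecs[:, :dims[n]]                 # d_n smallest eigenvalues
    return U
```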

For the purpose of evaluating the performance of RTPP, we used the face verification rate as the criterion. The FERET verification testing protocol [28] recommends using receiver operating characteristic (ROC) curves to depict the relation between the face verification rate (FVR) and the false accept rate (FAR). The ROC curves were plotted with the Statistical Learning Toolbox from the obtained score matrices. For tensor operations, we used the MATLAB tensor toolbox developed by Bader and Kolda [29]. The sparse representations were obtained with the method of Friedman et al. [30]. In the following experiments, we used the same parameter settings for both data sets.
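For completeness, the verification rate at a fixed false accept rate can be computed from genuine and impostor score lists as in the following generic sketch (this is not the toolbox routine used in the paper):

```python
import numpy as np

def verification_rate_at_far(genuine, impostor, target_far=0.01):
    """Face verification rate (FVR) at a given false accept rate (FAR).

    `genuine` and `impostor` are match scores where larger means more similar;
    the acceptance threshold is set so that the impostor acceptance rate
    equals target_far, and the FVR is the fraction of genuine scores accepted.
    """
    threshold = np.quantile(impostor, 1.0 - target_far)
    return np.mean(np.asarray(genuine) >= threshold)
```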

5.1. Experiments on Equinox Data Set

The National Institute of Standards and Technology and Equinox Corporation have developed a database (http://www.equinoxsensors.com/products/HID.html) of face images using registered broadband-visible/IR camera sensors for experimentation and performance evaluations [10]. Since the registration of the thermal images and the corresponding visible images is performed by the camera sensors, we did not need to carry out this step in our experiments.

We used the long-wave infrared (LWIR) (i.e., 8 μm–12 μm) images and the corresponding visible spectrum images from this database. The data were collected during a two-day period. Each pair of LWIR and visible light images was taken simultaneously and coregistered with 1/3-pixel accuracy. The LWIR images were radiometrically calibrated and stored as grayscale images with 12 bits per pixel. The visible images were also grayscale images represented with 8 bits per pixel [10].

The database contains frontal faces under the following scenarios: (1) three different light directions: frontal and lateral (right and left); (2) three facial expressions: frown, surprise, and smile; (3) vowel pronunciation: subjects were asked to pronounce several vowels, from which three representative frames were chosen; and (4) presence of glasses: for subjects wearing glasses, all of the above scenarios were repeated with and without glasses.

In our experiments, 1320 images (660 thermal images and 660 corresponding registered visible images) were used. These images belonged to 33 individuals; for each individual, we had 20 thermal images and 20 corresponding visible images. The original 12-bit gray-level thermal images were converted to 8 bits. All images (including thermal and visible images) had the background removed, were aligned, and were then normalized to a resolution of 28 × 24. The goal of the preprocessing was to remove the background and scale the faces. Figure 1 shows sample images of one person in the Equinox data set.

For each thermal image and its corresponding visible image, the tensor sample was represented with a size of 28 × 24 × 2 pixels. In the experiments, 10 tensor samples (10 thermal images and 10 corresponding visible images) of each individual were randomly selected and used as the training set, and the remaining 10 tensor samples were used as the test set. The experiments were independently repeated 20 times and the average results were calculated.

For our proposed RTPP algorithm and the tensor NPE algorithm, the reduced dimensions of the extracted tensor features were fixed in advance; for the NPE algorithms performed on the IR feature, the visible feature, and the serial combined feature, the corresponding reduced vector dimensions were fixed accordingly. For the tensor NPE algorithm, we performed experiments to obtain the best parameter $k$ (the number of nearest neighbors) for the Equinox data set. Figures 2 and 3 show the ROC curves of the proposed RTPP algorithm and the tensor NPE algorithm using different values of $k$. From the ROC curves, we can see that the best tensor NPE performance was obtained for a particular choice of $k$, and that the performance of the proposed RTPP algorithm was much better than that of the tensor NPE algorithm, no matter which $k$ was selected.

The ROC curves of the different methods are shown in Figures 4 and 5. The NPE algorithm was also performed separately on the visible data and on the thermal infrared data, with the neighborhood size $k$ fixed. The results indicate that the performance of the proposed RTPP algorithm is better than those of the other algorithms.

5.2. Experiments on DHUFO Data Set

DHUFO is a database of face images acquired with registered visible/IR camera sensors for experimentation and performance evaluation. The data set was designed by the researchers. In our experiments, the long-wave infrared (LWIR) (i.e., 8 μm–12 μm) sensor was used. The registration of the thermal images and the corresponding visible images was fulfilled by the camera sensors. Face image variations in the DHUFO database include illumination, facial expression, and glasses. In our experiments, 1020 images, which involved variations in illumination and facial expression, were selected. We manually cropped the face portion of the images. These images belonged to 17 individuals; for each individual, there were 30 thermal images and 30 corresponding visible images. All images (including thermal and visible images) had the background removed, were aligned, and were then normalized to a fixed resolution.

For each thermal image and its corresponding visible image, the tensor sample was formed by stacking the two registered images into a third-order tensor. In the experiments, 15 tensor samples (15 thermal images and 15 corresponding visible images) of each individual were randomly selected and used as the training set, and the remaining 15 tensor samples were used as the test set. The experiments were independently repeated 20 times and the average results were calculated.

For our proposed RTPP algorithm, the tensor NPE algorithm, and the tensor LPP algorithm, the reduced dimensions of the extracted tensor features were fixed in advance; for both NPE and LPP performed on the serial combined feature, the corresponding reduced vector dimensions were fixed accordingly. The ROC curves of the different methods are shown in Figures 6 and 7. In both the NPE and the LPP algorithms, the neighborhood size $k$ was fixed. The results indicate that the performance of the proposed RTPP algorithm is better than those of the other algorithms.

5.3. Discussion

Based on the experimental results, the following observations can be made.
(1) From the ROC curves of the different methods on the Equinox data set and the DHUFO data set, the proposed RTPP algorithm obtained the best performance. The experimental results indicate that combining the $\ell_1$ norm for sparse tensor learning is a better approach than local-information reconstruction.
(2) RTPP does not introduce the local neighborhood parameter, and thus there is an essential difference from the neighborhood-based tensor methods. In RTPP, the $\ell_1$ norm is used to compute the reconstruction coefficients with sparse properties; thus the advantages of robustness to data distortion and the potential discriminative ability proven in [25–27] are encoded in the representation coefficients, which are preserved in the low-dimensional subspace. These are the essential reasons for RTPP to achieve good performance.
(3) Since RTPP preserves the spatial structure of the original multispectral face images well, RTPP outperforms the serial combined feature extraction algorithms.

6. Conclusion

We have proposed in this paper a novel tensor learning algorithm, called robust tensor preserving projection (RTPP), for multispectral face recognition. The RTPP algorithm incorporates a tensor manifold criterion to learn multiple subspaces in high-order tensor space by preserving the sparse representation information of the multispectral images. RTPP can not only keep the underlying spatial structure of multispectral images but also enhance robustness. Experimental results demonstrate the excellent performance of RTPP.

Since RTPP is an unsupervised learning algorithm, one direction of our future work will be supervised tensor learning algorithms. We also plan to enforce sparsity on the projection matrices/vectors and investigate sparse projection learning methods for tensor recognition.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 61375007) and the Shanghai Pujiang Program (Project no. 12PJ1402200).