Abstract

The kernel Fisher discriminant analysis (KFDA) method has demonstrated its success in extracting facial features for face recognition. Compared with linear techniques, it can better describe the complex and nonlinear variations of face images. However, a single kernel is not always suitable for face recognition applications that contain data from multiple, heterogeneous sources, such as face images under large variations of pose, illumination, and facial expression. To improve the performance of KFDA in face recognition, a novel algorithm named multiple data-dependent kernel Fisher discriminant analysis (MDKFDA) is proposed in this paper. The constructed multiple data-dependent kernel (MDK) is a combination of several base kernels with a data-dependent kernel constraint on their weights. By solving the optimization equation based on the Fisher criterion and maximizing the maximum margin criterion, the parameters of the data-dependent kernel and of the multiple base kernels are optimized. Experimental results on three face databases validate the effectiveness of the proposed algorithm.

1. Introduction

Face recognition has received extensive attention in many image processing applications. In these applications, the original face images commonly lie in a high-dimensional space, resulting in low recognition accuracy and high computational cost. Existing image feature extraction algorithms roughly fall into two categories: feature extraction based on signal processing and learning-based feature extraction [1-4]. With a learning-based approach, the original images can be mapped into a lower-dimensional feature space in which the essential structure of the original space becomes clear. Fisher discriminant analysis (FDA) [5], principal component analysis (PCA), and locality preserving projection (LPP) [6] are typical learning-based feature extraction techniques. Moreover, one of the most famous algorithms applied in face recognition is Fisherface, which is based on a two-phase framework: PCA plus LDA [3, 4]. It maximizes the between-class scatter and minimizes the within-class scatter to separate one class from the others. However, all of the above-mentioned algorithms are in essence linear subspace analysis methods, so they are inadequate to depict the complexity of face images. To overcome this limitation, many nonlinear algorithms, such as kernel-based PCA (KPCA) [7] and kernel-based FDA (KFDA) [8], have been devised and have attained good performance in face recognition. It has been demonstrated that KFDA is a feasible nonlinear feature extraction algorithm for face recognition. However, the performance of KFDA is sensitive to the choice of the kernel function and its parameters. Moreover, a single kernel has quite limited ability to depict the geometrical structure of some aspects of the input data. When the face images are captured under large variations of pose, illumination, facial expression, and so forth, single kernel-based FDA may not be suitable for face recognition. In summary, kernel functions play an important part in face recognition applications [9, 10].

As a consequence, various approaches have been developed to handle the above issues, and two main categories are identified as follows. (1) Devise multiple kernels by a convex combination of multiple basic kernels. In this way, different data descriptors can be used to depict the geometrical structures of the original data from multiple views, which complement each other to improve recognition performance [11-14]. (2) Develop a data-dependent kernel (DK) by a conformal transformation of the basic kernel. In this way, the designed kernel is adaptive to the input data, leading to a substantial improvement in the performance of the KFDA algorithm [15-17].

In this paper, to improve the performance of KFDA, we propose a novel feature extraction algorithm for face recognition called multiple data-dependent kernel Fisher discriminant analysis (MDKFDA), based on multiple kernel learning (MKL). The main contributions of this paper are as follows. (1) By introducing the MDK into KFDA, maximum discrimination performance can be achieved in the feature space. (2) Multiple image features extracted by different descriptors are fully utilized in the MDKFDA algorithm. (3) Nonlinear discriminant features are produced by the adoption of MKL.

The rest of this paper is organized as follows. Section 2 shows a brief overview of MKL and KFDA. In Section 3, we illustrate the proposed MDKFDA algorithm and introduce the parameter optimization scheme for data-dependent kernel and multiple base kernels. Extensive experimental results on face recognition are reported in Section 4. Finally, Section 5 concludes this paper.

2. Related Work

In this section, we briefly introduce some previous works related to this paper, including KFDA and MKL.

2.1. Kernel Fisher Discriminant Analysis

KFDA is a nonlinear feature extraction algorithm which combines the nonlinear kernel trick with FDA. Because of its ability to extract the discriminatory nonlinear features, KFDA and its variations are frequently used for face recognition. In this paper, a two-phased KFDA framework proposed by Yang et al. [18] is adopted to construct the MDKFDA. The two-phased KFDA framework mainly contains two parts: KPCA is applied to reduce the dimension of input space, and then LDA is used to further extract the features in the KPCA-transformed space.

Given the input training sample set including $n$ samples:

$$\{(x_i, l_i)\}_{i=1}^{n},$$

where $x_i \in \mathbb{R}^d$ is the training sample of $d$-dimension and $l_i$ represents the class label of $x_i$. Given a sample $x$, its nonlinear mapped image can be denoted as $\phi(x)$, and the discriminant feature vector $z$ can be obtained as follows:

$$z = G^T P^T \phi(x). \tag{2}$$

Equation (2) contains two transformations $P$ and $G$. The transformation $y = P^T \phi(x)$ represents KPCA, which transforms the input space $\mathbb{R}^d$ into the feature space $\mathbb{R}^m$, while the transformation $z = G^T y$ is the Fisher discriminant transformation in the KPCA-transformed space $\mathbb{R}^m$.

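Firstly, the issues in the process of KPCA are described as follows.

For a given nonlinear mapping $\phi$, the input space $\mathbb{R}^d$ can be projected into the feature space $\mathcal{H}$, which is considered as a Hilbert space. The covariance operator on the feature space $\mathcal{H}$ can be represented as

$$S_t = \frac{1}{n}\sum_{j=1}^{n}\left(\phi(x_j) - m_0\right)\left(\phi(x_j) - m_0\right)^T,$$

where $m_0 = \frac{1}{n}\sum_{j=1}^{n}\phi(x_j)$. The way to find the nonzero eigenpairs of $S_t$ is illustrated as follows. First, to simplify the derivation, the mapped samples are treated as centered, so that the covariance operator is reformulated as

$$S_t = \frac{1}{n}\sum_{j=1}^{n}\phi(x_j)\phi(x_j)^T.$$

Let us denote $\Phi = [\phi(x_1), \dots, \phi(x_n)]$ and construct a Gram matrix $R = \Phi^T\Phi$, whose elements can be calculated through the use of the kernel trick:

$$R_{ij} = \phi(x_i)^T\phi(x_j) = k(x_i, x_j).$$

Centralize $R$ by $\tilde{R} = R - E_n R - R E_n + E_n R E_n$, where $E_n$ is the $n \times n$ matrix whose entries all equal $1/n$.

We adopt the $m$ largest positive eigenvalues $\lambda_1 \ge \dots \ge \lambda_m$ of $\tilde{R}$ and their corresponding orthonormal eigenvectors $u_1, \dots, u_m$ to calculate the eigenvectors of $S_t$ as follows:

$$\beta_j = \frac{1}{\sqrt{\lambda_j}}\,\Phi u_j, \quad j = 1, \dots, m.$$

Hence, we can get the KPCA-transformed feature vector $y = (y_1, \dots, y_m)^T$, and the $j$th KPCA feature is obtained by

$$y_j = \beta_j^T\phi(x) = \frac{1}{\sqrt{\lambda_j}}\,u_j^T\left[k(x_1, x), \dots, k(x_n, x)\right]^T.$$

Above all, we can describe the transformation $P$ as follows:

$$y = P^T\phi(x), \quad P = (\beta_1, \dots, \beta_m).$$

Secondly, the issues in the process of the Fisher discriminant transformation are illustrated as follows.

FDA is used for further feature extraction in the KPCA-transformed space $\mathbb{R}^m$. In order to maximize the Fisher criterion, we first define the between-class scatter operator $S_b$ and the within-class scatter operator $S_w$ in the feature space $\mathcal{H}$. Consider

$$S_b = \frac{1}{n}\sum_{i=1}^{c} n_i\left(m_i - m_0\right)\left(m_i - m_0\right)^T, \tag{9}$$

$$S_w = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n_i}\left(\phi(x_{ij}) - m_i\right)\left(\phi(x_{ij}) - m_i\right)^T, \tag{10}$$

where $c$ is the number of classes, $n_i$ is the number of training samples in class $i$, $x_{ij}$ represents the $j$th sample in class $i$, $m_i$ is the mean of the mapped samples in class $i$, and $m_0$ is the mean across all mapped samples. Thus, we can obtain the Fisher criterion by

$$J(w) = \frac{w^T S_b w}{w^T S_w w},$$

where $w$ is the discriminant vector. According to the theory of Mercer kernels, each $w$ can be described by the elements of the feature space $\mathcal{H}$, and there always exist coefficients $a_1, \dots, a_m$ such that

$$w = \sum_{j=1}^{m} a_j\beta_j = P\xi,$$

where $P = (\beta_1, \dots, \beta_m)$ and $\xi = (a_1, \dots, a_m)^T$. Hence, the Fisher optimal discriminant vectors are the stationary points $\xi_1, \dots, \xi_r$ of $J(P\xi)$, and, correspondingly, the transformation $G$ in (2) can be denoted by

$$z = G^T y, \quad G = (\xi_1, \dots, \xi_r).$$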
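The KPCA phase of the framework above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names are hypothetical, only the first KPCA component is computed, and a simple power iteration stands in for a full eigen-decomposition. For a training sample $x_i$, the first KPCA feature reduces to $\sqrt{\lambda_1}$ times the $i$th entry of the leading eigenvector of the centralized Gram matrix.

```python
import math

def center_gram(R):
    """Return R - E R - R E + E R E, where E has all entries equal to 1/n."""
    n = len(R)
    row_mean = [sum(row) / n for row in R]
    col_mean = [sum(R[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[R[i][j] - col_mean[j] - row_mean[i] + total_mean
             for j in range(n)] for i in range(n)]

def leading_eigenpair(A, iterations=200):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix."""
    n = len(A)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space of A")
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Av[i] for i in range(n))
    return lam, v

def first_kpca_feature(R):
    """First KPCA feature of every training sample (defined up to a global sign)."""
    R_c = center_gram(R)
    lam, u = leading_eigenpair(R_c)
    # For training sample x_i: u^T R_c[:, i] / sqrt(lam) = sqrt(lam) * u[i].
    return [math.sqrt(lam) * ui for ui in u]

# Toy data: five 1-D samples with a linear kernel, so KPCA reduces to PCA.
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
R = [[a * b for b in samples] for a in samples]
features = first_kpca_feature(R)
```

With the linear kernel on 1-D data, the recovered feature equals the mean-centered sample up to sign, which gives a quick sanity check of the centralization and scaling steps.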

2.2. Multiple Kernel Learning

In general, MKL refers to the process of learning a kernel machine that is a combination of multiple base kernel functions or matrices. Recent research efforts have shown that MKL can not only find an optimal combination of weights for the base kernels but also improve the performance of the resulting classifiers.

As mentioned above, $X = \{x_i\}_{i=1}^{n}$ is the $d$-dimensional training sample set. For a given nonlinear mapping $\phi$, the original data is projected into the empirical feature space $\mathcal{F}$:

$$\phi: \mathbb{R}^d \to \mathcal{F}, \quad x \mapsto \phi(x).$$

Using Mercer's theorem [19], the inner product of two transformed vectors in the nonlinear space can be expressed as

$$k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle,$$

where the operator $\langle \cdot, \cdot \rangle$ means inner product. Such a kernel function is usually called a Mercer kernel, and some commonly used Mercer kernels are shown as follows [20]:

$$\begin{aligned}
\text{linear kernel:} \quad & k(x, y) = x^T y, \\
\text{polynomial kernel:} \quad & k(x, y) = \left(x^T y + 1\right)^p, \\
\text{Gaussian kernel:} \quad & k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right).
\end{aligned}$$

Among them, the Gaussian kernel is one of the most widespread kernels. However, the Gaussian kernel can only reflect the local nonlinear features of the data, while the linear kernel and the polynomial kernel are global kernel functions. It has been shown that kernel-based feature extraction algorithms are appropriate for solving the nonlinear problems in face recognition. Nevertheless, the disadvantage of single kernel-based algorithms is their lack of generalized representation capability for multidimensional and multiclass data. Recent applications have indicated that MKL provides a more flexible framework to fuse information from different data sources and to enhance the performance of classifiers [21-26].
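To make the local versus global behaviour of these kernels concrete, the following sketch evaluates the three kernels on a few toy points. The parameterization (offset 1 and degree 2 for the polynomial kernel, bandwidth `sigma` for the Gaussian kernel) is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math

def linear_kernel(x, y):
    """Linear kernel: the plain inner product of x and y."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, offset=1.0):
    """Polynomial kernel (x^T y + offset)^degree; offset and degree are assumed values."""
    return (linear_kernel(x, y) + offset) ** degree

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(kernel, samples):
    """Kernel (Gram) matrix of a list of samples."""
    return [[kernel(xi, xj) for xj in samples] for xi in samples]

# Two nearby points and one distant point.
samples = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
K_lin = gram_matrix(linear_kernel, samples)
K_gauss = gram_matrix(gaussian_kernel, samples)
```

In this toy example the Gaussian similarity between the first and the distant third point is practically zero, while the linear kernel keeps growing with the magnitude of the vectors, which is the local versus global distinction drawn above.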

In the MKL framework, given $M$ basic kernel functions $k_1, \dots, k_M$, the multiple kernel function can be generally represented as [27]

$$k(x, y) = \sum_{l=1}^{M} \alpha_l\, k_l(x, y), \quad \alpha_l \ge 0, \quad \sum_{l=1}^{M} \alpha_l = 1,$$

where the weight coefficient $\alpha_l$ is commonly obtained by solving the objective function of the kernel subspace learning algorithm. Optimizing the coefficients $\alpha_l$ is a critical problem for improving the performance of MKL.
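As a small illustration of the combination rule, the sketch below forms a weighted sum of precomputed kernel matrices. Renormalizing the weights so that they sum to one is an assumption that matches the convex-combination setting; the function name is hypothetical.

```python
def combine_kernels(kernel_matrices, weights):
    """Convex combination of same-sized kernel matrices.

    The weights must be nonnegative; they are renormalized to sum to one
    (an assumption made here to enforce a convex combination).
    """
    if len(kernel_matrices) != len(weights):
        raise ValueError("one weight per kernel matrix is required")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    normalized = [w / total for w in weights]
    n = len(kernel_matrices[0])
    return [[sum(w * K[i][j] for w, K in zip(normalized, kernel_matrices))
             for j in range(n)] for i in range(n)]

# Two toy 2 x 2 kernel matrices combined with weights 0.75 and 0.25.
K1 = [[1.0, 0.5], [0.5, 1.0]]
K2 = [[4.0, 2.0], [2.0, 9.0]]
K = combine_kernels([K1, K2], [3.0, 1.0])
```

Because each base matrix is positive semidefinite and the weights are nonnegative, the combined matrix remains a valid kernel matrix.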

3. The Proposed Multiple Data-Dependent Kernel (MDK)

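As mentioned above, given the training data $\{x_i\}_{i=1}^{n}$, the elements of the MDK can be formulated as

$$k(x_i, x_j) = q(x_i)\, q(x_j) \sum_{l=1}^{M} \alpha_l\, k_l(x_i, x_j),$$

where $i, j = 1, \dots, n$, $k_l$ is the $l$th basic kernel chosen from the commonly used ones such as the Gaussian kernel or the polynomial kernel, $M$ is the number of candidate basic kernels, $\alpha_l$ is the weight for the $l$th basic kernel, and $q(\cdot)$ is the factor function called the data-dependent kernel (DK), which takes the form of

$$q(x) = b_0 + \sum_{t=1}^{n_e} b_t\, k_e(x, a_t), \tag{19}$$

where $b_t$ is the combination coefficient and $k_e$ is the kernel used to construct the DK (a Gaussian kernel in this paper). The set $\{a_t\}_{t=1}^{n_e}$, called the "empirical cores," is chosen from the training data. It is notable that the MDK also satisfies the Mercer condition, since it is the product of the factor $q(x_i)\,q(x_j)$, which is itself a Mercer kernel, and a nonnegative linear combination of Mercer kernels.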
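The construction of the MDK matrix can be sketched as follows. This is an illustrative sketch rather than the authors' code: the helper names, the toy data, and the Gaussian form of the DK kernel with bandwidth `sigma` are assumptions. Setting all DK coefficients except the constant term to zero makes the factor function equal to one, so the MDK falls back to the plain weighted sum of base kernels.

```python
import math

def linear(x, y):
    return sum(a * b for a, b in zip(x, y))

def gaussian(x, y, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def dk_factor(x, cores, b, sigma=1.0):
    """Factor function q(x) = b[0] + sum_t b[t] * k_e(x, a_t) with a Gaussian k_e."""
    return b[0] + sum(bt * gaussian(x, a, sigma) for bt, a in zip(b[1:], cores))

def mdk_matrix(samples, base_kernels, alpha, cores, b, sigma=1.0):
    """MDK matrix with entries q(x_i) q(x_j) sum_l alpha_l k_l(x_i, x_j)."""
    q = [dk_factor(x, cores, b, sigma) for x in samples]
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            k0 = sum(a * k(samples[i], samples[j])
                     for a, k in zip(alpha, base_kernels))
            K[i][j] = q[i] * q[j] * k0
    return K

samples = [[0.0, 0.0], [1.0, 0.0]]
cores = samples                 # empirical cores taken from the training data
alpha = [0.5, 0.5]              # weights of the linear and Gaussian base kernels
b = [1.0, 0.5, 0.5]             # DK coefficients b_0, b_1, b_2
K_mdk = mdk_matrix(samples, [linear, gaussian], alpha, cores, b)
K_plain = mdk_matrix(samples, [linear, gaussian], alpha, cores, [1.0, 0.0, 0.0])
```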

As mentioned above, the main problems in the MDK are to choose the optimal weights $\alpha_l$ for the basic kernels and the coefficients $b_t$ of the data-dependent kernel $q(\cdot)$. In this paper, we adopt an iterative method based on the maximum margin criterion (MMC) and the Fisher scalar to optimize the weights $\alpha_l$ and the coefficients $b_t$, respectively. The schematic of the MDK is shown in Figure 1.

3.1. Weight Optimization for Multiple Kernels

To gain good performance of MDKFDA for face recognition, this section describes how proper weights of the candidate base kernels are learned. In KFDA, we measure the class separability in the kernel feature space, and the kernel Fisher criterion can be expressed as

$$J(w) = \frac{w^T S_b w}{w^T S_w w}.$$

In this section, the diagonalization strategy [28] is adopted to find the optimal $w$, based on which the maximum margin criterion (MMC) [29] is employed as the objective function to optimize the weights $\alpha = (\alpha_1, \dots, \alpha_M)^T$. Consider

$$J(\alpha) = w^T\left(S_b(\alpha) - S_w(\alpha)\right) w,$$

where $S_b(\alpha)$ and $S_w(\alpha)$ denote the scatter operators induced by the multiple kernel with weights $\alpha$. To maximize $J(\alpha)$, we introduce a Lagrangian $L(\alpha, \lambda)$ that augments $J(\alpha)$ with the constraint on the weights. A series of partial derivatives can be obtained by differentiating $L$ with respect to $\alpha_l$ and $\lambda$. By setting these derivatives to zero, we can use Newton's iteration method to solve the resulting equations, which yields the optimized weights for the multiple kernels.
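As a rough illustration of the quantity being maximized, the sketch below evaluates the MMC in its unprojected trace form, $\mathrm{tr}(S_b) - \mathrm{tr}(S_w)$, directly from a kernel matrix and the class labels. Using the trace form without the projection $w$, and the helper names, are assumptions made for brevity; the sketch does not reproduce the Lagrangian and Newton steps. Since both traces are linear in the kernel matrix, the score of a weighted kernel combination is the weighted sum of the individual scores.

```python
def scatter_traces(K, labels):
    """Traces of the between-class and within-class scatter in the feature
    space induced by the kernel matrix K."""
    n = len(K)
    members = {}
    for idx, c in enumerate(labels):
        members.setdefault(c, []).append(idx)
    # sum over classes of (1 / n_c) * (sum of the class block K_cc)
    class_term = sum(
        sum(K[i][j] for i in idxs for j in idxs) / len(idxs)
        for idxs in members.values()
    )
    total = sum(sum(row) for row in K)
    diag = sum(K[i][i] for i in range(n))
    tr_sb = (class_term - total / n) / n
    tr_sw = (diag - class_term) / n
    return tr_sb, tr_sw

def mmc_score(K, labels):
    """Maximum margin criterion tr(S_b) - tr(S_w) for one kernel matrix."""
    tr_sb, tr_sw = scatter_traces(K, labels)
    return tr_sb - tr_sw

def mmc_of_combination(kernel_matrices, weights, labels):
    """MMC of a weighted kernel sum; it is linear in the weights."""
    return sum(w * mmc_score(K, labels) for w, K in zip(weights, kernel_matrices))

# Toy 1-D data with a linear kernel: two well separated classes.
xs = [0.0, 1.0, 4.0, 5.0]
labels = [0, 0, 1, 1]
K_lin = [[a * b for b in xs] for a in xs]
tr_sb, tr_sw = scatter_traces(K_lin, labels)
score = mmc_score(K_lin, labels)
```

For the linear kernel the two traces can be checked by hand from the class means 0.5 and 4.5 and the global mean 2.5.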

3.2. Coefficients Optimization for Data-Dependent Kernel

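Since the optimized weights for the multiple kernels have been obtained, the choice of proper coefficients for the DK is described in this section. In the empirical feature space, let $J = \mathrm{tr}(S_b)/\mathrm{tr}(S_w)$ denote the Fisher scalar, where $S_b$ and $S_w$ have been defined in (9) and (10), respectively. Given the training dataset $\{x_i\}_{i=1}^{n}$ ordered by class, $K$ is the kernel matrix for all samples, whose element can be described as $k_{ij} = k(x_i, x_j)$, and $K_{pq}$ ($p, q = 1, \dots, c$) is the $n_p \times n_q$ submatrix of $K$ relating class $p$ to class $q$. Hence, $K$ can be written as

$$K = \begin{pmatrix} K_{11} & K_{12} & \cdots & K_{1c} \\ K_{21} & K_{22} & \cdots & K_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ K_{c1} & K_{c2} & \cdots & K_{cc} \end{pmatrix}.$$

Consequently, the between-class scatter matrix $B$ and the within-class scatter matrix $W$ can be expressed as follows:

$$B = \mathrm{diag}\left(\tfrac{1}{n_1}K_{11}, \dots, \tfrac{1}{n_c}K_{cc}\right) - \tfrac{1}{n}K, \qquad W = \mathrm{diag}\left(k_{11}, \dots, k_{nn}\right) - \mathrm{diag}\left(\tfrac{1}{n_1}K_{11}, \dots, \tfrac{1}{n_c}K_{cc}\right).$$

For the multiple kernel $k_0(x, y) = \sum_{l=1}^{M}\alpha_l k_l(x, y)$, $B$ and $W$ can be replaced by $B_0$ and $W_0$, respectively, and their corresponding kernel matrix $K_0$ is translated into $K = \left[q(x_i)\,q(x_j)\,k_0(x_i, x_j)\right]_{n \times n} = Q K_0 Q$, in which $q(x_i)$ and $q(x_j)$ have been defined in (19) and $Q = \mathrm{diag}\left(q(x_1), \dots, q(x_n)\right)$, and then we have

$$B = Q B_0 Q, \qquad W = Q W_0 Q.$$

Theorem 1. Let $1_n$ be the $n$-dimensional vector with unity elements; the Fisher scalar $J$ is equivalent to

$$J = \frac{1_n^T B\, 1_n}{1_n^T W\, 1_n}.$$

Proof. As shown in Section 2.1, the dimension of the empirical feature space is set as $m$. Then, $Y$ and $Y_i$, $i = 1, \dots, c$, are, respectively, defined as the $n \times m$ and $n_i \times m$ matrices whose rows are the vectors $y_j^T$, $j = 1, \dots, n$, and $y_{ij}^T$, $j = 1, \dots, n_i$. With the class means $m_i = \frac{1}{n_i} Y_i^T 1_{n_i}$ and the global mean $m_0 = \frac{1}{n} Y^T 1_n$, the traces of the $S_b$ and $S_w$ in Section 2.1 can be expressed as follows:

$$\mathrm{tr}(S_b) = \frac{1}{n}\left(\sum_{i=1}^{c}\frac{1}{n_i}\, 1_{n_i}^T Y_i Y_i^T 1_{n_i} - \frac{1}{n}\, 1_n^T Y Y^T 1_n\right), \qquad \mathrm{tr}(S_w) = \frac{1}{n}\left(\sum_{j=1}^{n} y_j^T y_j - \sum_{i=1}^{c}\frac{1}{n_i}\, 1_{n_i}^T Y_i Y_i^T 1_{n_i}\right). \tag{24}$$

Moreover, since the empirical feature space maintains the dot product, (24) is equivalent to

$$\mathrm{tr}(S_b) = \frac{1}{n}\left(\sum_{i=1}^{c}\frac{1}{n_i}\, 1_{n_i}^T K_{ii} 1_{n_i} - \frac{1}{n}\, 1_n^T K 1_n\right), \qquad \mathrm{tr}(S_w) = \frac{1}{n}\left(\sum_{j=1}^{n} k_{jj} - \sum_{i=1}^{c}\frac{1}{n_i}\, 1_{n_i}^T K_{ii} 1_{n_i}\right),$$

where $K = Y Y^T$ and $K_{ii} = Y_i Y_i^T$. As such, $\mathrm{tr}(S_b)$ and $\mathrm{tr}(S_w)$ can be described as quadratic forms in $1_n$ over the block matrices introduced above. From the above description, it is notable that $\mathrm{tr}(S_b) = \frac{1}{n}\, 1_n^T B\, 1_n$ and $\mathrm{tr}(S_w) = \frac{1}{n}\, 1_n^T W\, 1_n$, where $B$ and $W$ are the between-class and within-class scatter matrices. Finally, the relationship is proved:

$$J = \frac{\mathrm{tr}(S_b)}{\mathrm{tr}(S_w)} = \frac{1_n^T B\, 1_n}{1_n^T W\, 1_n}.$$

In order to obtain the optimal coefficients, $J$ should be maximized. Let $b = (b_0, b_1, \dots, b_{n_e})^T$ collect the DK coefficients, and let $K_e$ be the $n \times (n_e + 1)$ matrix whose $i$th row is $\left(1, k_e(x_i, a_1), \dots, k_e(x_i, a_{n_e})\right)$. Since $q = Q 1_n = K_e b$, $J$ can be reformulated as

$$J(b) = \frac{q^T B_0\, q}{q^T W_0\, q} = \frac{b^T M_0\, b}{b^T N_0\, b},$$

where $M_0 = K_e^T B_0 K_e$ and $N_0 = K_e^T W_0 K_e$. To maximize $J(b)$, the standard gradient approach is adopted. Given that

$$J_1(b) = b^T M_0\, b, \qquad J_2(b) = b^T N_0\, b,$$

respectively, the partial derivatives of $J_1$ and $J_2$ with respect to $b$ are

$$\frac{\partial J_1}{\partial b} = 2 M_0 b, \qquad \frac{\partial J_2}{\partial b} = 2 N_0 b,$$

and the partial derivative of $J$ is

$$\frac{\partial J}{\partial b} = \frac{2}{J_2^2}\left(J_2 M_0 - J_1 N_0\right) b.$$

Let $\partial J / \partial b = 0$; it is obtained that

$$J_1 N_0\, b = J_2 M_0\, b.$$

Hence, when $N_0$ is invertible, $J$ is equal to the largest eigenvalue of the matrix $N_0^{-1} M_0$, and the optimal $b$ is the eigenvector corresponding to the largest eigenvalue; thus, an iteration algorithm is employed to calculate the optimal $b$:

$$b^{(t+1)} = b^{(t)} + \eta\left(\frac{1}{J_2} M_0 - \frac{J}{J_2} N_0\right) b^{(t)},$$

where $\eta$ is the learning rate, and it is given by

$$\eta(t) = \eta_0\left(1 - \frac{t}{T}\right),$$

where $\eta_0$ is the initial learning rate, and $t$ and $T$ denote the current iteration number and the prespecified iteration number, respectively. In summary, the optimal coefficients can be obtained by choosing $\eta_0$ and $T$ properly.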
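The coefficient update can be illustrated with a minimal sketch in plain Python. The helper names, the toy 1-D data, the Gaussian DK kernel, the choice of empirical cores, and the values of the initial learning rate and the iteration count are all assumptions made for illustration; the sketch only shows one way to run the gradient-type iteration with a linearly decaying learning rate on the Fisher scalar.

```python
import math

def scatter_matrices(K0, labels):
    """Between-class (B0) and within-class (W0) scatter matrices of kernel matrix K0."""
    n = len(K0)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    B0 = [[0.0] * n for _ in range(n)]
    W0 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same = labels[i] == labels[j]
            block = K0[i][j] / counts[labels[i]] if same else 0.0
            B0[i][j] = block - K0[i][j] / n
            W0[i][j] = (K0[i][j] if i == j else 0.0) - block
    return B0, W0

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fisher_scalar(q, B0, W0):
    """J = q^T B0 q / q^T W0 q; with q = 1_n this is the form in Theorem 1."""
    return dot(q, matvec(B0, q)) / dot(q, matvec(W0, q))

def optimize_dk(K_e, B0, W0, eta0=0.05, n_iter=100):
    """Iterate b <- b + eta * (M0 / J2 - (J / J2) * N0) b with eta = eta0 * (1 - t / T)."""
    m = len(K_e[0])
    b = [1.0] + [0.0] * (m - 1)          # start from q(x) = 1 for every sample
    for t in range(n_iter):
        q = matvec(K_e, b)
        Bq, Wq = matvec(B0, q), matvec(W0, q)
        J1, J2 = dot(q, Bq), dot(q, Wq)
        J = J1 / J2
        # M0 b = K_e^T B0 q and N0 b = K_e^T W0 q, so no explicit M0 or N0 is needed.
        resid = [(bq - J * wq) / J2 for bq, wq in zip(Bq, Wq)]
        step = [dot([row[k] for row in K_e], resid) for k in range(m)]
        eta = eta0 * (1.0 - t / n_iter)
        b = [bk + eta * sk for bk, sk in zip(b, step)]
    return b

def gauss(x, y, sigma=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

xs, labels = [0.0, 1.0, 4.0, 5.0], [0, 0, 1, 1]
K0 = [[gauss(a, c) for c in xs] for a in xs]              # base kernel matrix
cores = [xs[0], xs[2]]                                    # one empirical core per class
K_e = [[1.0] + [gauss(x, a) for a in cores] for x in xs]  # rows (1, k_e(x_i, a_t))
B0, W0 = scatter_matrices(K0, labels)
b_opt = optimize_dk(K_e, B0, W0)
J_start = fisher_scalar(matvec(K_e, [1.0, 0.0, 0.0]), B0, W0)
J_end = fisher_scalar(matvec(K_e, b_opt), B0, W0)
```

The update direction is computed from the vectors $B_0 q$ and $W_0 q$ rather than from explicit $M_0$ and $N_0$, which avoids forming the two matrix products at every iteration; this is a design choice of the sketch, not something prescribed by the text.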

3.3. Complete MDKFDA Algorithm

In summary of the discussion so far, the steps of complete MDKFDA algorithm are described as follows.

Step 1 (construct the MDK). Gaussian kernel is adopted to construct the data-dependent kernel (DK), while the linear kernel, Gaussian kernel, and polynomial kernel are employed as the base kernels of multiple kernel function.

Step 2 (optimize the weights). The maximum margin criterion (MMC) is employed as the objective function to optimize weights for multiple kernels, and the optimized coefficients for DK are achieved by virtue of the Fisher scalar.

Step 3 (transform the data). The MDK is used to transform the input space $\mathbb{R}^d$ into the feature space $\mathbb{R}^m$, by which the original input data $x$ is converted into the feature data $y$. The transformation is $y = P^T \phi(x)$.

Step 4 (extract the Fisher discriminant vectors). $S_b$ and $S_w$ in $\mathbb{R}^m$ are calculated to get the Fisher criterion $J$. By maximizing $J$, the Fisher optimal discriminant vectors are achieved, and the Fisher discriminant transformation is $z = G^T y$.

Step 5 (obtain the MDKFDA feature vector). Based on the first four steps, the expression of the MDKFDA feature vector is obtained as $z = G^T P^T \phi(x)$.

4. Experimental Results and Discussions

In this section, we conduct several experiments on three face databases to evaluate the performance of the proposed MDKFDA algorithm by comparing it with several widespread algorithms in face recognition, including PCA, LPP, FDA, KPCA, KFDA, and DKFDA. The ORL face database [30], YALE face database [31], and PIE face database [32] are adopted in the experiments, and partial sample images of one person from the different databases are shown in Figures 2, 3, and 4. In the following experiments, we randomly select 5 images per individual as the training set and use the remaining 5 for testing. To make the results more reliable, we repeated the trials 10 times and report the average performance.
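The evaluation protocol described above (a random per-individual split, repeated and averaged) can be sketched as follows. The function names, the fixed seed, and the `evaluate` callback, which stands for any routine that trains on the training indices and returns the recognition rate on the test indices, are hypothetical and only serve the illustration.

```python
import random

def split_per_individual(labels, n_train, rng):
    """Randomly pick n_train image indices per individual for training; the rest is for testing."""
    by_person = {}
    for idx, person in enumerate(labels):
        by_person.setdefault(person, []).append(idx)
    train, test = [], []
    for person in sorted(by_person):
        idxs = by_person[person][:]
        rng.shuffle(idxs)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return train, test

def average_accuracy(labels, evaluate, n_train=5, n_trials=10, seed=0):
    """Average the score returned by evaluate(train, test) over repeated random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        train, test = split_per_individual(labels, n_train, rng)
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)

def dummy_evaluate(train, test):
    """Placeholder for a real train-and-test routine; returns the test fraction."""
    return len(test) / (len(train) + len(test))

# Layout matching 40 individuals with 10 images each.
labels = [person for person in range(40) for _ in range(10)]
mean_score = average_accuracy(labels, dummy_evaluate)
```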

4.1. Face Recognition Using ORL Database

The ORL face database contains 400 face images of 40 individuals, and the variations in these 400 face images include angle, lighting, expression, and facial details. As shown in Figure 2, all the original images were resized to a common size in pixels, and the primary part of each original image was retained. Three kernels are employed as the base kernels of the multiple kernel function in the MDK: the linear kernel, the Gaussian kernel, and the polynomial kernel. Moreover, the Gaussian kernel is adopted to construct the DK. Table 1 shows the comparison of the maximal average recognition ratio between several nonkernel algorithms and the proposed MDKFDA algorithm. Table 2 reports the comparison of the maximal average recognition ratio between several single kernel-based algorithms and the proposed MDKFDA. According to the experimental results, the MDKFDA algorithm outperforms the other algorithms, which implies that MDKFDA can integrate multiple base kernels with the data-dependent kernel (DK) effectively to improve the recognition ratio. Besides, it can be seen that all the single kernel-based feature extraction algorithms outperform their corresponding linear versions, which indicates that kernel-based algorithms are advantageous for face recognition.

4.2. Face Recognition Using YALE Database

The YALE database contains 165 front view face images from 15 individuals. Each individual has eleven images that vary with expression and configurations. Three kernels are employed as the base kernels of multiple kernel function in MDK, including linear kernel, Gaussian kernel, and polynomial kernel. Moreover, Gaussian kernel is adopted to construct DK. Table 3 displays the average recognition rates of PCA, LDA, LPP, and MDKFDA. Table 4 reports the performance comparison of different kernel-based algorithms, including KPCA, KFDA, DKFDA, and MDKFDA. The results indicate that the MDKFDA algorithm outperforms the other algorithms and all the single kernel-based feature extraction algorithms outperform their corresponding linear versions.

4.3. Face Recognition Using PIE Database

The PIE face database contains over 40,000 face images from 68 people, and the images are captured under 13 different poses, 43 different illumination conditions, and 4 different expressions. In this test, we select 150 face images from 15 individuals. The selection of the base kernels for the multiple kernel function and the DK is the same as that in the former experiments. From Tables 5 and 6, it can be seen that the MDKFDA algorithm outperforms the other algorithms.

4.4. Discussions

Experiments based on the three face databases have been systematically implemented, and the results reveal some interesting findings, which are summarized as follows.

(1) The single kernel-based nonlinear feature extraction algorithms such as KPCA, KFDA, and DKFDA perform better than their corresponding linear versions such as PCA and LDA. The main reason is that, compared to the linear techniques, the features extracted by kernel-based algorithms can better describe the complex and nonlinear variations of face images, that is, illumination, pose, and facial expression. Hence, a better recognition rate can be achieved.

(2) The average recognition rates of PCA, LDA, and LPP on the PIE database are significantly lower than those on the other two databases. Since the images from the PIE database are captured under more complicated conditions, they have more complicated nonlinear characteristics than those from the other databases, which makes them difficult to handle by linear algorithms.

(3) The DKFDA algorithm outperforms the other single kernel-based algorithms, that is, KPCA, KLPP, and KFDA. As shown in Table 6, when KPCA with the polynomial kernel is adopted, the recognition ratio of face images is not increased. This is because the characteristics of the kernel are ill-suited to some databases. However, the DK can solve this problem, because the DKFDA algorithm is adaptable to different databases. The structure of the DK can be changed by adjusting the kernel parameters using an iterative method, so various input data can be better expressed.

(4) The proposed MDKFDA algorithm consistently performs better than KPCA, KLPP, KFDA, and DKFDA as well as their corresponding linear versions, which indicates that, compared to the single kernel-based algorithms, the MDKFDA algorithm can effectively integrate the multiple base kernels with the data-dependent kernel (DK) and achieve good performance in face recognition.

5. Conclusions

On the assumption that multiple kernel-based recognition algorithms can depict complex and heterogeneous face image datasets by the utilization of multiple descriptors, a novel kernel-based approach for face recognition, called multiple data-dependent kernel Fisher discriminant analysis (MDKFDA), is proposed in this paper. Focusing on the construction of the MDK, two main issues have been considered. The first issue concerns optimizing the weights of the multiple base kernels. For this purpose, by maximizing the maximum margin criterion (MMC), an iterative method based on Lagrange multipliers is adopted to yield the optimized weights. The second issue concerns optimizing the coefficients of the data-dependent kernel. To this end, by solving the optimization equation based on the Fisher scalar, a gradient-based learning algorithm is employed to yield the optimized coefficients. Finally, the resulting multiple kernel function and the data-dependent kernel are integrated into a new kernel, which is incorporated into KFDA to construct MDKFDA. Experiments on three face databases demonstrate the effectiveness of MDKFDA, and the algorithm can be applied to other classification applications in the future.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (no. 51374099), Heilongjiang Province Natural Science Foundation for the Youth (no. QC2012C070), Heilongjiang Province Natural Science Foundation (F201345), and the Fundamental Research Funds for the Central Universities of China (no. HEUCF140807).