Research Article | Open Access
Multiple Data-Dependent Kernel Fisher Discriminant Analysis for Face Recognition
The kernel Fisher discriminant analysis (KFDA) method has demonstrated its success in extracting facial features for face recognition. Compared to linear techniques, it can better describe the complex and nonlinear variations of face images. However, a single kernel is not always suitable for face recognition applications that contain data from multiple, heterogeneous sources, such as face images under large variations of pose, illumination, and facial expression. To improve the performance of KFDA in face recognition, a novel algorithm named multiple data-dependent kernel Fisher discriminant analysis (MDKFDA) is proposed in this paper. The constructed multiple data-dependent kernel (MDK) is a combination of several base kernels with a data-dependent kernel constraint on their weights. The coefficients of the data-dependent kernel are optimized by solving an optimization equation based on the Fisher criterion, and the weights of the multiple base kernels are optimized by maximizing the maximum margin criterion. Experimental results on three face databases validate the effectiveness of the proposed algorithm.
1. Introduction
Face recognition has received extensive attention in many image processing applications. In these applications, the original face images commonly lie in a high-dimensional space, resulting in low recognition accuracy and high computational cost. Existing image feature extraction algorithms roughly fall into two categories: feature extraction based on signal processing and learning-based feature extraction [1–4]. With a learning-based approach, the original images can be mapped into a lower-dimensional feature space in which the essential structure of the original space becomes clear. Fisher discriminant analysis (FDA) [5], principal component analysis (PCA), and locality preserving projection (LPP) [6] are typical learning-based feature extraction techniques. Moreover, one of the most famous algorithms applied in face recognition is Fisherface, which is based on a two-phase framework: PCA plus LDA [3, 4]. It maximizes the between-class scatter and minimizes the within-class scatter to separate one class from the others. However, all of the above-mentioned algorithms are in essence linear subspace analysis methods, so they are inadequate to depict the complexity of face images.
To overcome this limitation, many nonlinear algorithms, such as kernel-based PCA (KPCA) [7] and kernel-based FDA (KFDA) [8], have been devised and have attained good performance in face recognition. It has been demonstrated that KFDA is a feasible nonlinear feature extraction algorithm for face recognition. However, the performance of KFDA is sensitive to the choice of kernel function and its parameters. Moreover, a single kernel has limited ability to depict the geometrical structure of every aspect of the input data. When the face images are captured under large variations of pose, illumination, facial expression, and so forth, single kernel-based FDA may not be suitable for face recognition. In summary, kernel functions play an important part in face recognition applications [9, 10].
As a consequence, various approaches have been developed to handle the above issues, and they fall into two main categories.
(1) Devise multiple kernels by a convex combination of multiple basic kernels. Different data descriptors can then depict the geometrical structures of the original data from multiple views, and these views complement one another to improve recognition performance [11–14].
(2) Develop a data-dependent kernel (DK) by a conformal transformation of the basic kernel. The designed kernel then adapts to the input data, leading to a substantial improvement in the performance of the KFDA algorithm [15–17].
In this paper, to improve the performance of KFDA, we propose a novel feature extraction algorithm for face recognition, called multiple data-dependent kernel Fisher discriminant analysis (MDKFDA), based on multiple kernel learning (MKL). The main contributions of this paper are as follows.
(1) By introducing the MDK into KFDA, maximum discrimination performance can be achieved in the feature space.
(2) Multiple image features extracted by different descriptors are fully utilized in the MDKFDA algorithm.
(3) Nonlinear discriminant features are produced by the adoption of MKL.
The rest of this paper is organized as follows. Section 2 shows a brief overview of MKL and KFDA. In Section 3, we illustrate the proposed MDKFDA algorithm and introduce the parameter optimization scheme for data-dependent kernel and multiple base kernels. Extensive experimental results on face recognition are reported in Section 4. Finally, Section 5 concludes this paper.
2. Related Work
In this section, we will briefly introduce some previous works related to this paper, including KFDA and MKL.
2.1. Kernel Fisher Discriminant Analysis
KFDA is a nonlinear feature extraction algorithm which combines the nonlinear kernel trick with FDA. Because of its ability to extract discriminatory nonlinear features, KFDA and its variations are frequently used for face recognition. In this paper, the two-phase KFDA framework proposed by Yang et al. [18] is adopted to construct the MDKFDA. The framework contains two parts: KPCA is applied to reduce the dimension of the input space, and then LDA is used to further extract the features in the KPCA-transformed space.
Given the input training sample set including $N$ samples,
$$X = \{(x_i, \ell_i)\}_{i=1}^{N}, \tag{1}$$
where $x_i \in \mathbb{R}^{d}$ is the training sample of dimension $d$ and $\ell_i$ represents the class label of $x_i$. Given a sample $x$, its nonlinear mapped image can be denoted as $\phi(x)$, and the discriminant feature vector $z$ can be obtained as follows:
$$z = W^{T} P^{T} \phi(x). \tag{2}$$
Equation (2) contains two transformations, $P$ and $W$. The transformation $P$ represents KPCA, which transforms the input space $\mathbb{R}^{d}$ into the feature space $\mathbb{R}^{m}$, while the transformation $W$ is the Fisher discriminant transformation in the KPCA-transformed space $\mathbb{R}^{m}$.
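First, the KPCA step is described as follows.
For a given nonlinear mapping $\phi$, the input space $\mathbb{R}^{d}$ can be projected into the feature space $\mathcal{H}$, which is considered as a Hilbert space. The covariance operator on the feature space can be represented as
$$S_t = \frac{1}{N}\sum_{j=1}^{N}\bigl(\phi(x_j)-m_0\bigr)\bigl(\phi(x_j)-m_0\bigr)^{T}, \tag{3}$$
where $m_0 = \frac{1}{N}\sum_{j=1}^{N}\phi(x_j)$. The nonzero eigenpairs of $S_t$ are found as follows. To simplify the derivation, the covariance operator is first reformulated as
$$S_t = \frac{1}{N}\sum_{j=1}^{N}\phi(x_j)\phi(x_j)^{T} - m_0 m_0^{T}. \tag{4}$$
Let us denote $Q = [\phi(x_1),\ldots,\phi(x_N)]$ and construct a Gram matrix $\tilde{R} = Q^{T}Q$, whose elements can be calculated through the kernel trick:
$$\tilde{R}_{ij} = \phi(x_i)^{T}\phi(x_j) = k(x_i, x_j). \tag{5}$$
Centralize $\tilde{R}$ by $R = \tilde{R} - E_N\tilde{R} - \tilde{R}E_N + E_N\tilde{R}E_N$, where $E_N$ is the $N\times N$ matrix whose entries all equal $1/N$.
We adopt the $m$ largest positive eigenvalues $\lambda_1 \ge \cdots \ge \lambda_m$ of $R$ and their corresponding orthonormal eigenvectors $\gamma_1,\ldots,\gamma_m$ to calculate the eigenvectors $v_1,\ldots,v_m$ of $S_t$ as follows:
$$v_j = \frac{1}{\sqrt{\lambda_j}}\,Q\gamma_j, \quad j = 1,\ldots,m. \tag{6}$$
Hence, we can get the KPCA-transformed feature vector $y = (y_1,\ldots,y_m)^{T} = P^{T}\phi(x)$, and the $j$th KPCA feature is obtained by
$$y_j = v_j^{T}\phi(x) = \frac{1}{\sqrt{\lambda_j}}\,\gamma_j^{T}\bigl[k(x_1,x),\ldots,k(x_N,x)\bigr]^{T}. \tag{7}$$
In summary, $P$ can be described as follows:
$$P = (v_1,\ldots,v_m) = Q\left(\frac{\gamma_1}{\sqrt{\lambda_1}},\ldots,\frac{\gamma_m}{\sqrt{\lambda_m}}\right). \tag{8}$$
Second, the Fisher discriminant transformation is described as follows.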
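FDA is used for further feature extraction in the KPCA-transformed space $\mathbb{R}^{m}$. In order to maximize the Fisher criterion, we first define the between-class scatter operator $S_b$ and the within-class scatter operator $S_w$ in the feature space. Consider
$$S_b = \frac{1}{N}\sum_{i=1}^{c} N_i\,(m_i - m_0)(m_i - m_0)^{T}, \tag{9}$$
$$S_w = \frac{1}{N}\sum_{i=1}^{c}\sum_{j=1}^{N_i}\bigl(\phi(x_{ij}) - m_i\bigr)\bigl(\phi(x_{ij}) - m_i\bigr)^{T}, \tag{10}$$
where $c$ is the number of classes, $N_i$ is the number of training samples in class $i$, $x_{ij}$ represents the $j$th sample in class $i$, $m_i$ is the mean of the mapped samples in class $i$, and $m_0$ is the mean across all mapped samples. Thus, we can obtain the Fisher criterion by
$$J(w) = \frac{w^{T}S_b\,w}{w^{T}S_w\,w}, \tag{11}$$
where $w$ is the discriminant vector. According to Mercer kernel function theory, each $w$ can be described by the elements of the feature space, and there always exist coefficients $a_j$, $j = 1,\ldots,N$, such that
$$w = \sum_{j=1}^{N} a_j\,\phi(x_j) = Qa, \tag{12}$$
where $Q = [\phi(x_1),\ldots,\phi(x_N)]$ and $a = (a_1,\ldots,a_N)^{T}$. Hence, the Fisher optimal discriminant vectors are the stationary points $w_1,\ldots,w_t$ of $J(w)$, and, correspondingly, the transformation $W$ in (2) can be denoted by
$$W = (w_1,\ldots,w_t). \tag{13}$$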
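The two phases described above (KPCA on the centralized Gram matrix, then Fisher discriminant analysis on the KPCA features) can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the Gaussian kernel width `sigma`, and the use of a pseudo-inverse for the within-class scatter are assumptions made here. Unlike the uncentred projection of a single sample, the sketch also centres new samples against the training set, which is the usual practice.

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    d2 = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def kpca_fit(K, m):
    """Top-m eigenvalues and eigenvectors of the centralized Gram matrix."""
    N = K.shape[0]
    E = np.full((N, N), 1.0 / N)
    R = K - E @ K - K @ E + E @ K @ E          # centralization of the Gram matrix
    lam, gam = np.linalg.eigh(R)               # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:m]
    return lam[order], gam[:, order]

def kpca_transform(K_train, K_new, lam, gam):
    """KPCA features of new samples; K_new[i, j] = k(x_new_i, x_train_j)."""
    N = K_train.shape[0]
    E = np.full((N, N), 1.0 / N)
    E_new = np.full((K_new.shape[0], N), 1.0 / N)
    K_c = K_new - E_new @ K_train - K_new @ E + E_new @ K_train @ E
    return K_c @ gam / np.sqrt(lam)

def fisher_directions(Y, labels, t):
    """Top-t Fisher discriminant directions in the KPCA-transformed space."""
    m0 = Y.mean(axis=0)
    dim = Y.shape[1]
    Sb, Sw = np.zeros((dim, dim)), np.zeros((dim, dim))
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        Sb += len(Yc) * np.outer(mc - m0, mc - m0)
        Sw += (Yc - mc).T @ (Yc - mc)
    # Pseudo-inverse is an assumption made here to tolerate a singular S_w.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1][:t]
    return evecs[:, order].real
```

With a training Gram matrix `K`, the discriminant features of the training set are then `kpca_transform(K, K, lam, gam) @ W`, where `W` is the output of `fisher_directions`.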
2.2. Multiple Kernel Learning
In general, MKL refers to the process of learning a kernel machine which is a combination of multiple base kernel functions or matrices. Recent research efforts have shown that MKL is able not only to find optimal combination weights for the base kernels but also to improve the performance of the resulting classifiers.
As mentioned above, $\{x_i\}_{i=1}^{N}$ is the $d$-dimensional training sample set. For a given nonlinear mapping $\phi$, the original data is projected into the empirical feature space $\mathcal{F}$:
$$\phi: \mathbb{R}^{d} \to \mathcal{F}, \quad x \mapsto \phi(x). \tag{14}$$
Using Mercer's theorem [19], the inner product of two transformed vectors in the nonlinear space can be expressed as
$$k(x, y) = \langle \phi(x), \phi(y) \rangle, \tag{15}$$
where the operator $\langle\cdot,\cdot\rangle$ means inner product. Such a kernel function is usually called a Mercer kernel, and some commonly used Mercer kernels are shown as follows [20]:
$$\text{linear: } k(x,y) = x^{T}y; \quad \text{polynomial: } k(x,y) = (x^{T}y + 1)^{p}; \quad \text{Gaussian: } k(x,y) = \exp\!\left(-\frac{\|x-y\|^{2}}{2\sigma^{2}}\right). \tag{16}$$
Among them, the Gaussian kernel is one of the most widespread kernels. However, the Gaussian kernel can only reflect the local nonlinear features of the data, while the linear kernel and polynomial kernel are global kernel functions. It has been shown that kernel-based feature extraction is appropriate for solving the nonlinear problems in face recognition. Nevertheless, single kernel-based algorithms lack a general representation capability for multidimensional and multiclass data. Recent applications have indicated that MKL provides a more flexible framework to fuse information from different data sources and enhance the performance of classifiers [21–26].
In the MKL framework, given $M$ basic kernel functions $k_1,\ldots,k_M$, the multiple kernel function can be generally represented as [27]
$$k(x, y) = \sum_{i=1}^{M}\beta_i\,k_i(x, y), \tag{17}$$
where the weighting coefficient $\beta_i$ is commonly obtained by solving the objective function of the kernel subspace learning algorithm. Optimizing the coefficients $\beta_i$ is a critical problem for improving the performance of MKL.
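As a small illustration of the weighted combination above, the following NumPy sketch builds Gram matrices for the three kernel families named in the text and combines them with fixed weights. The kernel parameters (`degree`, `coef0`, `sigma`), the function names, and the nonnegativity check are assumptions for illustration; in the paper the weights are learned rather than fixed by hand.

```python
import numpy as np

def linear_kernel(X, Y):
    """Linear kernel matrix between the rows of X and Y."""
    return X @ Y.T

def polynomial_kernel(X, Y, degree=2, coef0=1.0):
    """Polynomial kernel matrix (x^T y + coef0)^degree."""
    return (X @ Y.T + coef0) ** degree

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def combine_grams(grams, beta):
    """Weighted sum of base Gram matrices; nonnegative weights keep it positive semidefinite."""
    beta = np.asarray(beta, dtype=float)
    if beta.shape[0] != len(grams):
        raise ValueError("one weight per base kernel is required")
    if np.any(beta < 0):
        raise ValueError("weights must be nonnegative")
    return sum(b * K for b, K in zip(beta, grams))
```

For example, `combine_grams([linear_kernel(X, X), polynomial_kernel(X, X), gaussian_kernel(X, X)], [0.2, 0.3, 0.5])` returns a symmetric positive semidefinite matrix that can be used wherever a single-kernel Gram matrix is expected.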
3. The Proposed Multiple Data-Dependent Kernel (MDK)
As mentioned above, given training data $\{x_i\}_{i=1}^{N}$, the elements of the MDK can be formulated as
$$k(x, y) = q(x)\,q(y)\sum_{i=1}^{M}\beta_i\,k_i(x, y), \tag{18}$$
where $x, y \in \mathbb{R}^{d}$, $k_i$ is the $i$th basic kernel chosen from the commonly used ones such as the Gaussian kernel or polynomial kernel, $M$ is the number of candidate basic kernels, $\beta_i$ is the weight for the $i$th basic kernel, and $q(\cdot)$ is the factor function, called the data-dependent kernel (DK), which takes the form of
$$q(x) = \alpha_0 + \sum_{j=1}^{n}\alpha_j\,g(x, e_j), \tag{19}$$
where $g(x, e_j) = \exp(-\gamma\|x - e_j\|^{2})$ and $\alpha_j$ is the combination coefficient. The set $\{e_j\}_{j=1}^{n}$, called the "empirical cores," is chosen from the training data. It is notable that the MDK also satisfies the Mercer condition: a linear combination of Mercer kernels with nonnegative weights is a Mercer kernel, and multiplying it by $q(x)q(y)$ preserves positive semidefiniteness.
As mentioned above, the main problems in the MDK are to choose the optimal weights $\beta_i$ for the basic kernels and the coefficients $\alpha_j$ of the data-dependent kernel. In this paper, we adopt an iterative method based on the maximum margin criterion (MMC) and the Fisher scalar to optimize the weights $\beta_i$ and the coefficients $\alpha_j$, respectively. The schematic of the MDK is shown in Figure 1.
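The construction of the MDK Gram matrix can be sketched as follows: the factor function is evaluated at every sample from a set of empirical cores, and the combined base-kernel matrix is rescaled on both sides. The sketch assumes a Gaussian form with width parameter `gamma` for the factor function and hypothetical function names; the coefficient vector `alpha` holds the constant term followed by one coefficient per core.

```python
import numpy as np

def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between the rows of X and Y."""
    d2 = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.maximum(d2, 0.0)

def dk_factor(X, cores, alpha, gamma=1.0):
    """Factor q(x) = alpha[0] + sum_j alpha[j] * exp(-gamma * ||x - e_j||^2) per row of X."""
    G = np.hstack([np.ones((X.shape[0], 1)), np.exp(-gamma * sq_dists(X, cores))])
    return G @ np.asarray(alpha, dtype=float)

def mdk_gram(base_grams, beta, q_rows, q_cols):
    """MDK matrix: entry (i, j) is q_rows[i] * q_cols[j] * sum_m beta[m] * K_m[i, j]."""
    K0 = sum(b * K for b, K in zip(beta, base_grams))
    return q_rows[:, None] * K0 * q_cols[None, :]
```

With `alpha = [1, 0, ..., 0]` the factor is identically one and the MDK reduces to the plain multiple kernel, which gives a convenient starting point for any iterative optimization of the coefficients.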
3.1. Weight Optimization for Multiple Kernels
To gain good performance of MDKFDA for face recognition, learning proper weights of candidate base kernels is illustrated in this section. In KFDA, we measure the class separability in kernel feature space and the kernel Fisher criterion can be expressed as In this section, the diagonalization strategy  is adopted to find the optimal , based on which, the maximum margin criterion (MMC)  is employed as the objective function to optimize weight . Consider To maximize , we introduce a Lagrangian to solve the optimization problem as follows: A series of partial derivatives can be achieved through differentiating with respect to and . By setting these derivatives to zero, we can use Newton’s iteration method to solve these equations, and the optimized weights for multiple kernels are achieved as follows:
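To make the criterion concrete, the sketch below evaluates a margin-type objective, the trace of the between-class scatter minus the trace of the within-class scatter, directly from a Gram matrix and class labels. It assumes the scatter definitions normalized by the number of samples and uses hypothetical function names. It does not reproduce the Lagrangian or the Newton iteration; it only shows the quantity that a weight search would evaluate.

```python
import numpy as np

def scatter_traces(K, labels):
    """Return (tr S_b, tr S_w) in the feature space induced by the Gram matrix K."""
    labels = np.asarray(labels)
    N = K.shape[0]
    class_term = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Sum of the class block divided by the class size equals N_c * ||m_c||^2.
        class_term += K[np.ix_(idx, idx)].sum() / idx.size
    tr_sb = (class_term - K.sum() / N) / N
    tr_sw = (np.trace(K) - class_term) / N
    return tr_sb, tr_sw

def mmc(K, labels):
    """Margin-type criterion tr(S_b) - tr(S_w) computed from a Gram matrix."""
    tr_sb, tr_sw = scatter_traces(K, labels)
    return tr_sb - tr_sw
```

Because both traces are linear in the Gram matrix, this criterion evaluated on a weighted sum of base Gram matrices equals the same weighted sum of the per-kernel criteria, so some constraint on the weights is needed for the maximization to be well posed.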
3.2. Coefficients Optimization for Data-Dependent Kernel
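Since the optimized weights for the multiple kernels have been obtained, the optimization of the DK coefficients is described in this section. In the empirical feature space, let $J = \mathrm{tr}(S_b)/\mathrm{tr}(S_w)$ denote the Fisher scalar, where $S_b$ and $S_w$ have been defined in (9) and (10), respectively. Given the training dataset $\{x_i\}_{i=1}^{N}$ ordered by class, $K$ is the kernel matrix for all samples, whose elements can be described as $k_{ij} = k(x_i, x_j)$, and $K_{pq}$, $p, q = 1,\ldots,c$, are the submatrices of $K$ formed by the samples of classes $p$ and $q$. Hence, $K$ can be written as
$$K = \begin{pmatrix} K_{11} & \cdots & K_{1c} \\ \vdots & \ddots & \vdots \\ K_{c1} & \cdots & K_{cc} \end{pmatrix}. \tag{24}$$
Consequently, the between-class scatter matrix $B$ and the within-class scatter matrix $W$ can be expressed as follows:
$$B = \mathrm{diag}\!\left(\frac{K_{11}}{N_1},\ldots,\frac{K_{cc}}{N_c}\right) - \frac{1}{N}K, \qquad W = \mathrm{diag}(k_{11},\ldots,k_{NN}) - \mathrm{diag}\!\left(\frac{K_{11}}{N_1},\ldots,\frac{K_{cc}}{N_c}\right), \tag{25}$$
where $N_i$ is the number of training samples in class $i$. For the multiple kernel $k_0(x,y) = \sum_{i=1}^{M}\beta_i k_i(x,y)$, $B$ and $W$ are replaced by $B_0$ and $W_0$, respectively, and the corresponding kernel matrix is $K_0$. The kernel matrix of the MDK is then $K = QK_0Q$ with $Q = \mathrm{diag}(q(x_1),\ldots,q(x_N))$, in which $q(\cdot)$ has been defined in (19), and then we have
$$B = QB_0Q, \qquad W = QW_0Q. \tag{26}$$

Theorem 1. Let $1_N$ be the $N$-dimensional vector with unity elements; the Fisher scalar is equivalent to
$$J = \frac{1_N^{T}B\,1_N}{1_N^{T}W\,1_N}. \tag{27}$$

Proof. As shown in Section 2.1, the dimension of the empirical feature space is set as $m$. Then $Y$ and $Y_i$, $i = 1,\ldots,c$, are, respectively, defined as the $N\times m$ and $N_i\times m$ matrices whose rows are the vectors $y_j^{T}$, $j = 1,\ldots,N$, and $y_{ij}^{T}$, $j = 1,\ldots,N_i$. The means $m_0$ and $m_i$ in Section 2.1 can be expressed as follows:
$$m_0 = \frac{1}{N}Y^{T}1_N, \qquad m_i = \frac{1}{N_i}Y_i^{T}1_{N_i}. \tag{28}$$
Moreover, since the empirical feature space maintains the dot product, (24) is equivalent to
$$K = YY^{T}, \tag{29}$$
where $K_{pq} = Y_pY_q^{T}$. As such, $\mathrm{tr}(S_b)$ and $\mathrm{tr}(S_w)$ can be described as
$$\mathrm{tr}(S_b) = \frac{1}{N}\left(\sum_{i=1}^{c}\frac{1}{N_i}1_{N_i}^{T}K_{ii}1_{N_i} - \frac{1}{N}1_N^{T}K\,1_N\right), \qquad \mathrm{tr}(S_w) = \frac{1}{N}\left(\sum_{j=1}^{N}k_{jj} - \sum_{i=1}^{c}\frac{1}{N_i}1_{N_i}^{T}K_{ii}1_{N_i}\right). \tag{30}$$
From the above description, $\mathrm{tr}(S_b) = \frac{1}{N}1_N^{T}B\,1_N$ and $\mathrm{tr}(S_w) = \frac{1}{N}1_N^{T}W\,1_N$, where $B$ and $W$ are given in (25). Finally, the relationship is proved:
$$J = \frac{\mathrm{tr}(S_b)}{\mathrm{tr}(S_w)} = \frac{1_N^{T}B\,1_N}{1_N^{T}W\,1_N}. \tag{31}$$

In order to obtain the optimal coefficients, $J$ should be maximized. Since $Q1_N = q = G\alpha$, where $q = (q(x_1),\ldots,q(x_N))^{T}$, $\alpha = (\alpha_0,\alpha_1,\ldots,\alpha_n)^{T}$, and $G$ is the $N\times(n+1)$ matrix whose $i$th row is $(1, g(x_i,e_1),\ldots,g(x_i,e_n))$ built from the Gaussian function $g$ and the empirical cores $e_j$ of the DK, $J$ can be reformulated as
$$J(\alpha) = \frac{q^{T}B_0\,q}{q^{T}W_0\,q} = \frac{\alpha^{T}M_0\,\alpha}{\alpha^{T}N_0\,\alpha}, \tag{32}$$
where $M_0 = G^{T}B_0G$ and $N_0 = G^{T}W_0G$. To maximize $J(\alpha)$, the standard gradient approach is adopted. Given that
$$J_1(\alpha) = \alpha^{T}M_0\,\alpha, \qquad J_2(\alpha) = \alpha^{T}N_0\,\alpha, \tag{33}$$
respectively, the partial derivatives of $J_1$ and $J_2$ with respect to $\alpha$ are
$$\frac{\partial J_1}{\partial\alpha} = 2M_0\alpha, \qquad \frac{\partial J_2}{\partial\alpha} = 2N_0\alpha, \tag{34}$$
and the partial derivative of $J$ is
$$\frac{\partial J}{\partial\alpha} = \frac{2}{J_2^{2}}\bigl(J_2M_0 - J_1N_0\bigr)\alpha. \tag{35}$$
Let $\partial J/\partial\alpha = 0$; it is obtained that
$$J_1N_0\,\alpha = J_2M_0\,\alpha. \tag{36}$$
Hence, when $N_0$ is invertible, $J$ is equal to the largest eigenvalue of the matrix $N_0^{-1}M_0$, and the optimal $\alpha$ is the eigenvector corresponding to the largest eigenvalue. Since $N_0$ may be singular, an iterative algorithm is employed to calculate the optimal $\alpha$:
$$\alpha^{(n+1)} = \alpha^{(n)} + \eta\left(\frac{1}{J_2}M_0 - \frac{J_1}{J_2^{2}}N_0\right)\alpha^{(n)}, \tag{37}$$
where $\eta$ is the learning rate, and it is given by
$$\eta(n) = \eta_0\left(1 - \frac{n}{N_{\mathrm{iter}}}\right), \tag{38}$$
where $\eta_0$ is the initial learning rate, and $n$ and $N_{\mathrm{iter}}$ denote the current iteration number and the prespecified iteration number, respectively. In summary, the optimal coefficients can be obtained by choosing $\eta_0$ and $N_{\mathrm{iter}}$ properly.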
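The iterative update with a linearly decaying learning rate can be sketched in NumPy as follows. The sketch assumes the between-class and within-class matrices built from the combined base-kernel Gram matrix `K0` and a design matrix `G` whose first column is all ones and whose remaining columns hold the factor-function values at the empirical cores. The names, the default `eta0` and `n_iter`, the renormalization of the coefficient vector, and the choice to return the best iterate seen are additions made here for numerical robustness, not steps stated in the text.

```python
import numpy as np

def fisher_matrices(K0, labels):
    """Between-class (B0) and within-class (W0) matrices built from a Gram matrix."""
    labels = np.asarray(labels)
    N = K0.shape[0]
    same = labels[:, None] == labels[None, :]
    class_size = same.sum(axis=1)                       # class size for each sample
    D = np.where(same, K0 / class_size[:, None], 0.0)   # class blocks K_ii / N_i
    B0 = D - K0 / N
    W0 = np.diag(np.diag(K0)) - D
    return B0, W0

def fisher_scalar(q, B0, W0):
    """Fisher scalar for factor values q: (q^T B0 q) / (q^T W0 q)."""
    return (q @ B0 @ q) / (q @ W0 @ q)

def optimise_alpha(K0, G, labels, eta0=0.05, n_iter=100):
    """Gradient-type ascent on the Fisher scalar with a linearly decaying step."""
    B0, W0 = fisher_matrices(K0, labels)
    M0, N0 = G.T @ B0 @ G, G.T @ W0 @ G
    alpha = np.zeros(G.shape[1])
    alpha[0] = 1.0                                      # start from q(x) = 1
    best_alpha = alpha.copy()
    best_J = (alpha @ M0 @ alpha) / (alpha @ N0 @ alpha)
    for n in range(n_iter):
        J1, J2 = alpha @ M0 @ alpha, alpha @ N0 @ alpha
        step = (M0 @ alpha) / J2 - (J1 / J2 ** 2) * (N0 @ alpha)
        alpha = alpha + eta0 * (1.0 - n / n_iter) * step
        alpha = alpha / np.linalg.norm(alpha)           # the scalar is scale invariant
        J = (alpha @ M0 @ alpha) / (alpha @ N0 @ alpha)
        if J > best_J:                                  # keep the best iterate seen
            best_alpha, best_J = alpha.copy(), J
    return best_alpha, best_J
```

Because the Fisher scalar does not change when the coefficient vector is rescaled, normalizing after each step only keeps the iterate bounded; it does not alter the objective being maximized.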
3.3. Complete MDKFDA Algorithm
In summary of the discussion so far, the steps of complete MDKFDA algorithm are described as follows.
Step 1 (construct the MDK). Gaussian kernel is adopted to construct the data-dependent kernel (DK), while the linear kernel, Gaussian kernel, and polynomial kernel are employed as the base kernels of multiple kernel function.
Step 2 (optimize the weights). The maximum margin criterion (MMC) is employed as the objective function to optimize weights for multiple kernels, and the optimized coefficients for DK are achieved by virtue of the Fisher scalar.
Step 3 (transform the data). The MDK is used to transform the input space $\mathbb{R}^{d}$ into the feature space $\mathbb{R}^{m}$, by which the original input data $x$ is converted into feature data $y$. The transformation is $y = P^{T}\phi(x)$.
Step 4 (extract the Fisher discriminant vectors). $S_b$ and $S_w$ in $\mathbb{R}^{m}$ are calculated to get the Fisher criterion $J(w)$. By maximizing $J(w)$, the Fisher optimal discriminant vectors are obtained, and the Fisher discriminant transformation is $z = W^{T}y$.
Step 5 (obtain the MDKFDA feature vector). Based on the first four steps, the expression of the MDKFDA feature vector, $z = W^{T}P^{T}\phi(x)$, is obtained.
4. Experimental Results and Discussions
In this section, we conduct several experiments on three face databases to evaluate the performance of the proposed MDKFDA algorithm by comparing it with several widespread algorithms in face recognition, including PCA, LPP, FDA, KPCA, KFDA, and DKFDA. The ORL face database [30], YALE face database [31], and PIE face database [32] are adopted in the experiments, and partial sample images of one person from the different databases are shown in Figures 2, 3, and 4. In the following experiments, we randomly select 5 images per individual as the training set and use the remaining 5 for testing. To make the results more reliable, we repeated the trials 10 times and report the average performance.
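The evaluation protocol above (a random per-individual split, repeated and averaged) can be sketched as follows. The nearest-neighbour classifier is an assumption made for illustration, since the classifier is not specified here, and the function names are hypothetical. For brevity the sketch takes precomputed feature vectors; in a full experiment the feature extractor would be fitted on each training split only.

```python
import numpy as np

def split_per_subject(labels, n_train, rng):
    """Randomly pick n_train samples per individual for training, the rest for testing."""
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)

def nn_accuracy(F_train, y_train, F_test, y_test):
    """Accuracy of a 1-nearest-neighbour classifier with Euclidean distance."""
    d2 = ((F_test[:, None, :] - F_train[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[d2.argmin(axis=1)]
    return float(np.mean(pred == y_test))

def average_accuracy(features, labels, n_train=5, n_trials=10, seed=0):
    """Mean accuracy over repeated random per-individual splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_trials):
        tr, te = split_per_subject(labels, n_train, rng)
        scores.append(nn_accuracy(features[tr], labels[tr], features[te], labels[te]))
    return float(np.mean(scores))
```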
4.1. Face Recognition Using ORL Database
The ORL face database contains 400 face images of 40 individuals, and variations in these 400 face images include angle, lighting, expression, and face details. As shown in Figure 2, the size of all the original images were shaped into pixels, and the primary part of the original image was reserved. Three kernels are employed as the base kernels of multiple kernel function in MDK, including linear kernel, Gaussian kernel, and polynomial kernel. Moreover, Gaussian kernel is adopted to construct DK. Table 1 shows the comparison of maximal average recognition ratio between several nonkernel algorithms and the proposed MDKFDA algorithm. Table 2 reports the comparison of maximal average recognition ratio between several single kernel-based algorithms and the proposed MDKFDA. According to the experiment results, the MDKFDA algorithm outperforms the other algorithms, which implies that the MDKFDA can integrate multiple base kernels with data-dependent kernel (DK) effectively to improve the recognition ratio. Besides, it can be seen that all the single kernel-based feature extraction algorithms outperform their corresponding linear versions, which indicates that the kernel-based algorithm is advantageous for face recognition.
4.2. Face Recognition Using YALE Database
The YALE database contains 165 front view face images from 15 individuals. Each individual has eleven images that vary with expression and configurations. Three kernels are employed as the base kernels of multiple kernel function in MDK, including linear kernel, Gaussian kernel, and polynomial kernel. Moreover, Gaussian kernel is adopted to construct DK. Table 3 displays the average recognition rates of PCA, LDA, LPP, and MDKFDA. Table 4 reports the performance comparison of different kernel-based algorithms, including KPCA, KFDA, DKFDA, and MDKFDA. The results indicate that the MDKFDA algorithm outperforms the other algorithms and all the single kernel-based feature extraction algorithms outperform their corresponding linear versions.
4.3. Face Recognition Using PIE Database
The PIE face database contains over 40,000 face images from 68 people, and the images are captured under 13 different poses, 43 different illumination conditions, and 4 different expressions. In this test, we select 150 face images from 15 individuals. The selection of base kernels for the multiple kernel function and the DK is the same as in the former experiments. From Tables 5 and 6, it can be seen that the MDKFDA algorithm outperforms the other algorithms.
Experiments on the three face databases have been systematically carried out, and the results reveal some interesting findings, which are summarized as follows.
(1) The single kernel-based nonlinear feature extraction algorithms such as KPCA, KFDA, and DKFDA perform better than their corresponding linear versions such as PCA and LDA. The main reason is that, compared to the linear techniques, the features extracted by kernel-based algorithms can better describe the complex and nonlinear variations of face images, that is, illumination, pose, and facial expression. Hence, a better recognition rate can be achieved.
(2) The average recognition rates of PCA, LDA, and LPP on the PIE database are significantly lower than those on the other two databases. Since the images from the PIE database are captured under more complicated conditions, they have more complicated nonlinear characteristics than those from the other databases, which makes them difficult to handle with linear algorithms.
(3) The DKFDA algorithm outperforms the other single kernel-based algorithms, that is, KPCA, KLPP, and KFDA. As shown in Table 6, when KPCA with a polynomial kernel is adopted, the recognition ratio of face images is not increased. This is because the characteristics of the kernel are ill-suited for some databases. The DK can solve this problem because the DKFDA algorithm adapts to different databases: the structure of the DK can be changed by adjusting the kernel parameters with an iterative method, so various input data can be better expressed.
(4) The proposed MDKFDA algorithm consistently performs better than KPCA, KLPP, KFDA, and DKFDA as well as their corresponding linear versions, which indicates that, compared to the single kernel-based algorithms, the MDKFDA algorithm can effectively integrate the multiple base kernels with the data-dependent kernel (DK) and gain a good performance on face recognition.
5. Conclusions
On the assumption that multiple kernel-based recognition algorithms can depict complex and heterogeneous face image datasets through multiple descriptors, this paper proposes a novel kernel-based approach for face recognition, called multiple data-dependent kernel Fisher discriminant analysis (MDKFDA). Focusing on the construction of the MDK, two main issues have been considered. The first issue concerns optimizing the weights of the multiple base kernels. For this purpose, the maximum margin criterion (MMC) is maximized, and an iterative method based on Lagrange multipliers is adopted to yield the optimized weights. The second issue concerns optimizing the coefficients of the data-dependent kernel. To this end, by solving the optimization equation based on the Fisher scalar, a gradient-based learning algorithm is employed to yield the optimized coefficients. Finally, the resulting multiple kernel function and data-dependent kernel are integrated into a new kernel, which is incorporated into KFDA to construct the MDKFDA. Experiments on three face databases demonstrate the effectiveness of the MDKFDA, and the algorithm can be applied to other classification problems in future work.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (no. 51374099), Heilongjiang Province Natural Science Foundation for the Youth (no. QC2012C070), Heilongjiang Province Natural Science Foundation (no. F201345), and the Fundamental Research Funds for the Central Universities of China (no. HEUCF140807).
References
[1] J. Yang, G. Cheng, and M. Li, “Extraction of affine invariant features using fractal,” Advances in Mathematical Physics, vol. 2013, Article ID 950289, 8 pages, 2013.
[2] Z. Li, J. Yang, M. Li, and R. Lan, “Estimation of large scalings in images based on multilayer pseudopolar fractional Fourier transform,” Mathematical Problems in Engineering, vol. 2013, Article ID 179489, 9 pages, 2013.
[3] D. L. Swets, “Using discriminant eigenfeatures for image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831–836, 1996.
[4] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. fisherfaces: recognition using class specific linear projection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2000.
[6] X. He and P. Niyogi, “Locality preserving projections,” Neural Information Processing Systems, vol. 16, pp. 435–445, 2003.
[7] B. Schölkopf, A. Smola, and K.-R. Müller, “Kernel principal component analysis,” in Advances in Kernel Methods, Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds., MIT Press, Cambridge, Mass, USA, 1999.
[8] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, “Fisher discriminant analysis with kernels,” in Proceedings of the IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing, pp. 41–48, August 1999.
[9] S. Zafeiriou, G. Tzimiropoulos, M. Petrou, and T. Stathaki, “Regularized kernel discriminant analysis with a robust kernel for face recognition and verification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 3, pp. 526–534, 2012.
[10] Z. Wang and X. Sun, “Manifold adaptive kernel local fisher discriminant analysis for face recognition,” Journal of Multimedia, vol. 7, no. 6, pp. 387–393, 2012.
[11] A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet, “Simple MKL,” Journal of Machine Learning Research, vol. 9, pp. 2491–2521, 2008.
[12] D. P. Lewis, T. Jebara, and W. S. Noble, “Nonstationary kernel combination,” in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 553–560, Pittsburgh, Pa, USA, June 2006.
[13] M. Gönen and E. Alpaydin, “Localized multiple kernel learning,” in Proceedings of the 25th International Conference on Machine Learning, pp. 352–359, ACM, Helsinki, Finland, July 2008.
[14] S. C. H. Hoi, R. Jin, P. Zhao, and T. Yang, “Online multiple kernel classification,” Machine Learning, vol. 90, no. 2, pp. 289–316, 2013.
[15] S. Amari and S. Wu, “Improving support vector machine classifiers by modifying kernel functions,” Neural Networks, vol. 12, no. 6, pp. 783–789, 1999.
[16] H. Xiong, M. N. S. Swamy, and M. O. Ahmad, “Optimizing the kernel in the empirical feature space,” IEEE Transactions on Neural Networks, vol. 16, no. 2, pp. 460–474, 2005.
[17] I.-L. Chen, C.-H. Li, B.-C. Kuo, and H.-Y. Huang, “Applying optimal algorithm to data-dependent kernel for hyperspectral image classification,” in Proceedings of the 30th IEEE International Geoscience and Remote Sensing Symposium (IGARSS '10), pp. 2808–2811, Taichung, Taiwan, July 2010.
[18] J. Yang, A. F. Frangi, D. Zhang, and Z. Jin, “KPCA plus LDA: a complete kernel fisher discriminant framework for feature extraction and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, pp. 230–244, 2005.
[19] J. Mercer, “Functions of positive and negative type and their connection with the theory of integral equations,” Philosophical Transactions of the Royal Society A, vol. 209, no. 441–458, pp. 415–446, 1909.
[20] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, Mass, USA, 2002.
[21] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan, “Multiple kernel learning, conic duality, and the SMO algorithm,” in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 41–48, July 2004.
[22] T. Sun, L. Jiao, F. Liu, S. Wang, and J. Feng, “Selective multiple kernel learning for classification with ensemble strategy,” Pattern Recognition, vol. 46, no. 11, pp. 3081–3090, 2013.
[23] C. Yeh, C. Huang, and S. Lee, “Multi-kernel support vector clustering for multi-class classification,” International Journal of Innovative Computing, Information and Control, vol. 6, no. 5, pp. 2245–2262, 2010.
[24] X. Liu, L. Wang, J. Yin, E. Zhu, and J. Zhang, “An efficient approach to integrating radius information into multiple kernel learning,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 43, no. 2, pp. 557–569, 2013.
[25] H. Huang, Y. Chuang, and C. Chen, “Multiple kernel fuzzy clustering,” IEEE Transactions on Fuzzy Systems, vol. 20, no. 1, pp. 120–134, 2012.
[26] Y. Gu, C. Wang, D. You, Y. Zhang, S. Wang, and Y. Zhang, “Representative multiple kernel learning for classification in hyperspectral imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 7, pp. 2852–2865, 2012.
[27] H. Q. Wang, F. C. Sun, Y. N. Cai, N. Chen, and L. G. Ding, “On multiple kernel learning methods,” Acta Automatica Sinica, vol. 36, no. 8, pp. 1037–1050, 2010.
[28] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “Face recognition using kernel direct discriminant analysis algorithms,” IEEE Transactions on Neural Networks, vol. 14, no. 1, pp. 117–126, 2003.
[29] H. Li, T. Jiang, and K. Zhang, “Efficient and robust feature extraction by maximum margin criterion,” in Advances in Neural Information Processing Systems 16, S. Thrun, L. Saul, and B. Schölkopf, Eds., pp. 157–165, MIT Press, Cambridge, Mass, USA, 2004.
[30] ORL Face Database AT&T Laboratories, Cambridge, UK, http://www.cl.cam.ac.uk/research/dtg/attarchive/facesataglance.html.
[31] Yale Face Database: UCSD Computer Vision, America, USA, http://vision.ucsd.edu/content/yale-face-database.
[32] T. Sim, S. Baker, and M. Bsat, “The CMU pose, illumination, and expression database,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1615–1618, 2003.
Copyright © 2014 Yue Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.