Mathematical Problems in Engineering

Volume 2015, Article ID 641510, 13 pages

http://dx.doi.org/10.1155/2015/641510

## A Fuzzy Kernel Maximum Margin Criterion for Image Feature Extraction

^{1}College of Mathematics and Computer Science, Guangxi University for Nationalities, Nanning 530006, China

^{2}Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis, Nanning 530006, China

^{3}The China-ASEAN Study Center of Guangxi University for Nationalities, Nanning 530006, China

Received 12 November 2014; Revised 24 March 2015; Accepted 24 March 2015

Academic Editor: Hari M. Srivastava

Copyright © 2015 Shibin Xuan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Based on kernel principal component analysis, fuzzy set theory, and the maximum margin criterion, a novel image feature extraction and recognition method, called the fuzzy kernel maximum margin criterion (FKMMC), is proposed. In the proposed method, two fuzzy scatter matrices are redefined. The new fuzzy scatter matrices fully reflect the relation between the fuzzy membership degree and the offset of a training sample from its subclass center. In addition, a concise and reliable method for computing the fuzzy between-class scatter matrix is provided. Experimental results on four face databases (AR, extended Yale B, GTFD, and FERET) demonstrate that the proposed method outperforms the compared methods.

#### 1. Introduction

Dimensionality reduction has been an important research topic in computer vision and pattern recognition for many years [1–3]. It is well known that many methods become inefficient or limited in the high-dimensional case. Data transformation is an essential approach to dimensionality reduction: it maps high-dimensional data to a relatively low-dimensional space according to a certain criterion, so that the problem can be solved by an existing method in the low-dimensional space. To achieve this goal, a variety of approaches have been proposed. The most famous are probably Principal Component Analysis (PCA) [1] and Linear Discriminant Analysis (LDA) [2], and a number of improved algorithms build on them.

PCA is an unsupervised learning algorithm that reflects the overall variability of the data. Each axis contributes differently to this variability. The axes corresponding to larger eigenvalues contribute more, while the axes corresponding to smaller eigenvalues often reflect noise or detail. Therefore, the axes corresponding to the larger eigenvalues are chosen as the transformational operator, which not only retains the most useful information of the original image but also has a smoothing and denoising effect. Because PCA is a linear method based on the Gaussian distribution, it is not suitable for the non-Gaussian case. For this reason, kernel principal component analysis (KPCA) [3] was proposed, which is nonlinearly related to the input space. For the purposes of dimensionality reduction and data interpretation, a number of principal axis selection and sparse methods have been proposed, such as kernel entropy component analysis (KECA) [4], which chooses the principal axes that maximize a Renyi entropy estimate of the sample density.

LDA is a traditional supervised learning method for dimensionality reduction. It seeks a transformation operator that maximizes the between-class distance while minimizing the within-class distance. Because the algorithm must compute an inverse matrix in its solution process, it fails when the within-class scatter matrix is singular, which is often the case in the small sample size (SSS) situation. To solve the singularity problem, a large number of improved algorithms have been proposed.

LDA+PCA [5] is a well-known null subspace method. When the within-class scatter matrix is of full rank, it calculates only the leading eigenvectors to form the transformation matrix; otherwise, it first runs PCA and then performs LDA. Regularized discriminant analysis (RDA) [6] tries to obtain more reliable estimates of the eigenvalues by correcting the eigenvalue distortion with ridge-type regularization. Penalty discriminant analysis (PDA) [7] aims not only to overcome the small sample size problem but also to smooth the coefficients of the discriminant vectors for better interpretation. Inverse Fisher discriminant analysis (IFDA) [8] modifies the procedure of PCA and derives regular and irregular information from the within-class scatter matrix by the inverse Fisher discrimination criterion. Locality Preserving Projections (LPP) [9] is a linear subspace learning method derived from Laplacian Eigenmaps; its significant advantage is that it generates an explicit map while minimizing the local scatter of the projected data. Tensor subspace analysis (TSA) [10], which is based on the local geometrical structure, captures an optimal linear approximation to the face manifold in the sense of local isometry. The maximum margin criterion (MMC) [11] uses the difference between the between-class scatter and the within-class scatter as its discrimination criterion. Linear Laplacian discrimination (LLD) [12] formulates the within-class scatter matrix and the between-class scatter matrix by means of similarity-weighted criteria. The similarities are computed from an exponential function of pairwise distances in the original sample space, which frees the method from any particular form of metric, so LLD is applicable to any linear space for classification. Kernel linear discriminant analysis (KLDA) [13] is equivalent to kernel principal component analysis (KPCA) plus Fisher linear discriminant analysis (LDA). The optimal solution of KLDA is obtained by solving a generalized eigenvalue problem, but the within-class scatter matrix is often singular.

Fuzzy inverse Fisher discriminant analysis (FIFDA) [14] is built on the inverse Fisher discrimination criterion and the fuzzy membership degree. In this method, a membership degree matrix is calculated using FKNN, and the membership degree is then incorporated into the definitions of the between-class and within-class scatter matrices to obtain the fuzzy between-class scatter matrix and the fuzzy within-class scatter matrix. Two-dimensional linear discriminant analysis (2DLDA) [15] is based on 2D image matrices; that is, the image matrix does not need to be transformed into a vector. Instead, the image between-class scatter matrix and within-class scatter matrix are constructed directly from the image matrices, and their eigenvectors are computed for image feature extraction. The Laplacian bidirectional maximum margin criterion (LBMMC) [16] formulates the image total Laplacian matrix, the image within-class Laplacian matrix, and the image between-class Laplacian matrix using the sample similarity weights that are widely used in machine learning. Two-dimensional MMC (2DMMC) [17] aims to find two orthogonal projection matrices that project the original image matrices to a low-dimensional matrix subspace in which a sample is close to those in the same class but far from those in different classes. The blockwise two-dimensional maximum margin criterion (B2D-MMC) [18] introduces a blockwise model for face recognition, performing a one-sided subspace projection inside each block manifold, in which a block is close to those belonging to the same class but far from those belonging to different classes. The unilateral projection and the blockwise learning avoid the iterations and alternations required by current two-dimensional feature extraction approaches based on bilateral projection and have advantages in complexity and locality.

In recent years, representation-based face recognition methods [19, 20] have attracted wide attention in pattern recognition, but they focus only on classification techniques. In this paper, we concentrate on feature extraction rather than on classification techniques.

In particular, the more recent methods embed weights into the scatter matrices to improve performance. We consider this idea highly significant, because the class attribution of a training sample is inherently ambiguous: training samples are not completely separable among the subclasses and often partially overlap. Kernel and fuzzy approaches are well-suited mathematical tools for such problems.

In existing methods, kernel and fuzzy techniques are not well combined, and it is unclear how to select the bandwidth of the kernel. In the improved LDA algorithms, the weighted scatter matrix does not sufficiently reflect the interrelation between a training sample and its subclass prototype. Moreover, the eigendecomposition of the scatter matrix is affected by how the matrix is computed, since the summation operator introduces computational error.

In this paper, we propose a fuzzy kernel maximum margin criterion (FKMMC) for feature extraction and recognition. The method is a two-stage procedure. First, the data are transformed into the kernel subspace by kernel principal component analysis (KPCA), retaining the principal axes that account for 98% of the eigenvalue sum. Second, to simplify calculation, we construct the fuzzy between-class scatter matrix and the fuzzy within-class scatter matrix on the kernel subspace using a basic fuzzy membership based on Euclidean distance. The algorithm then maximizes the difference between the fuzzy between-class scatter matrix and the fuzzy within-class scatter matrix to obtain the transformational operator. In this way it efficiently integrates kernel feature analysis and classification information to achieve dimensionality reduction.

Our main contributions can be summarized as follows. The proposed algorithm replaces the local Laplacian factor in LBMMC with the fuzzy membership degree of each training sample to each subclass. It fully embeds the fuzzy membership degree into both the between-class scatter matrix and the within-class scatter matrix by transforming the between-class scatter matrix, which differs from the fuzzy embedding used in FIFDA. The variance of the training samples is used as the bandwidth of the Gaussian kernel, which avoids uncertainty in parameter setting and is consistent with the properties of the Gaussian distribution. To remove noise from the training samples while retaining the information of the initial data in the kernel subspace, we discard only the eigenvectors corresponding to the smallest eigenvalues, which together account for 2% of the sum of all eigenvalues. In the kernel subspace generated by KPCA, the algorithm performs MMC with the embedded fuzzy factor, which yields a succinct kernel transformational operator. Unlike some other eigendecomposition-based algorithms, the proposed algorithm needs neither an iterative procedure nor a matrix inverse. Because no inverse matrix is computed, the small sample size (SSS) problem that affects traditional LDA and its variants is avoided.

The remainder of this paper is organized as follows. In Section 2, KPCA, MMC, LLD, and FIFDA are briefly reviewed. In Section 3, a new method of embedding the fuzzy factor into the scatter matrices is presented, and new succinct formulas for computing the two scatter matrices are described in detail. In Section 4, the proposed algorithm and its computational complexity (including training time and testing time) are discussed. In Section 5, experiments demonstrate the effectiveness of the proposed algorithm. Conclusions are drawn in Section 6.

#### 2. Related Works

For convenience, this section briefly introduces several algorithms related to our research.

##### 2.1. Kernel Principal Component Analysis (KPCA)

For data base , where , , KPCA is a nonlinearly mapping projecting the input space into the kernel feature space, and it can be given by , satisfying . Let . A positive semidefinite kernel function, or Mercer kernel, , computes an inner product in Hilbert space:We may define the kernel matrix . The element of is to be . Therefore, is an inner product matrix in . The kernel matrix can be eigen-decomposed as , where is a diagonal matrix storing the eigenvalues , and is a matrix with the corresponding eigenvectors as columns. Then a feature space principal axes , requiring , can be denoted by [3]. Hence,where denotes the element of . In order to remove noise, after discarding partial principal axes corresponding with the least eigenvalues according to some percentage (commonly 2–5%), the remaining axes constitute projective operator, denoted by ; that is, , , according to the descent order of corresponding eigenvalues. Let , and . Let , ; then the map of on kernel space is .
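The KPCA step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the kernel is taken to be the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)), the kernel matrix is not centered (the text does not mention centering), and the function names `gaussian_kernel`, `kpca_fit`, and `kpca_transform` are hypothetical. Samples are stored as rows.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel values between every row of A and every row of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def kpca_fit(X, sigma, keep=0.98):
    """Return coefficients that map kernel vectors onto the retained principal axes."""
    K = gaussian_kernel(X, X, sigma)
    lam, E = np.linalg.eigh(K)                      # eigh returns ascending eigenvalues
    lam, E = np.clip(lam[::-1], 0.0, None), E[:, ::-1]
    ratio = np.cumsum(lam) / lam.sum()
    # Smallest number of leading axes whose eigenvalues reach the `keep` ratio.
    m = min(int(np.searchsorted(ratio, keep)) + 1, len(lam))
    return E[:, :m] / np.sqrt(lam[:m])              # column i is e_i / sqrt(lambda_i)


def kpca_transform(X_train, X_new, coeffs, sigma):
    """Project samples: kernel vector against the training set, then the retained axes."""
    return gaussian_kernel(X_new, X_train, sigma) @ coeffs
```

Dividing each eigenvector by the square root of its eigenvalue gives unit-norm axes in the feature space, so a new sample is projected using only kernel evaluations against the training set.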

On this basis, Jenssen [4] proposed choosing the kernel feature axes according to the Renyi entropy. In this method, is a sequence in descending order, and some axes can be selected as the projective operators corresponding to the top eigenvalues. However, we find that, for subsequent classification, the projective operators chosen according to the Renyi entropy are no better than the eigenvectors that correspond directly to the largest eigenvalues. Therefore, in this paper, we do not use kernel entropy methods.

##### 2.2. Maximum Margin Criterion

Suppose that there are known pattern classes in training data set , and is the number of training samples in class. The between-class scatter matrix and within-class scatter matrix can be written as (3) and (4), respectively,where is the total number of training samples, , denotes training sample in the class, is the mean of training samples in class, and refers to the mean of all training samples.

In classical Fisher discriminant analysis, the discrimination criterion is maximizing the ratio of the between-class scatter to the within-class scatter. MMC defined the difference of between-class scatter matrix and within-class scatter matrix as discriminant rule and obtained a transformational matrix , , and . The concerned problem can come down to the following constrained optimization:Solving this optimization problem is really the feature decomposition of . The generated eigenvectors are sorted in descending order according to the corresponding eigenvalue. consists of first eigenvectors. Comparing with LDA, the main merit of MMC is to avoid calculating inverse within-class scatter matrix. However, the within-class scatter matrix is often the singular matrix.
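As an illustration of the criterion described above, the following NumPy sketch builds the two scatter matrices and eigendecomposes their difference. It assumes the commonly used normalization in which each class is weighted by its prior N_i / N; the function name `mmc_fit` and the row-wise sample layout are hypothetical choices made for this example.

```python
import numpy as np


def mmc_fit(X, y, d):
    """Return the d leading eigenvectors of S_b - S_w and their eigenvalues.

    X: (N, p) array with one sample per row; y: (N,) class labels.
    """
    y = np.asarray(y)
    N, p = X.shape
    m = X.mean(axis=0)
    Sb = np.zeros((p, p))
    Sw = np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - m)[:, None]
        Sb += (len(Xc) / N) * (diff @ diff.T)   # ASSUMPTION: class prior N_i / N as weight
        Zc = Xc - mc
        Sw += (Zc.T @ Zc) / N
    M = Sb - Sw
    M = (M + M.T) / 2.0                         # guard against rounding asymmetry
    vals, vecs = np.linalg.eigh(M)              # no matrix inverse is needed
    order = np.argsort(vals)[::-1][:d]
    return vecs[:, order], vals[order]
```

Because only a symmetric eigendecomposition is performed, the sketch runs even when the within-class scatter matrix is singular.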

##### 2.3. Linear Laplacian Discrimination (LLD) [12]

Inspired by the application of Laplacian Eigenmaps in manifold learning and its linearization LPP in clustering and recognition, Zhao et al. [12] proposed Linear Laplacian Discrimination. Its basic theory can be described as follows. Supposing that is a -dimension sample space, is the Euclidean norm in the original sample space. Weight is defined bySign , and denotes an all-one column vector of length . Let is a 0-1 indicator matrix of class and satisfies , where is a training sample set of class. Let Then within-class scatter matrix can be calculated byFor between-class scatter matrix, weight is defined as follows:Letand then between-class scatter matrix can be defined asFinally, transformational operator will be solved satisfying

As its authors pointed out, LLD, like LDA, runs into computational trouble when the within-class scatter matrix is singular. Although the authors proposed some methods to address this issue, the problem was not resolved in essence. Moreover, how to assign the parameter in the expression for the weights remains an open problem, and the weights used in the within-class scatter matrix and in the between-class scatter matrix are inconsistent with each other.

##### 2.4. Fuzzy Inverse Fisher Discriminant Analysis (FIFDA)

In FIFDA, fuzzy membership degree and each class center are obtained through FKNN algorithm. The fuzzy membership degree of training sample can be computed as follows:where denotes neighbor size and is a the number of the neighbors of the sample in the class. satisfies two obvious properties:The mean vector of each class isThe corresponding fuzzy within-class scatter matrix and the fuzzy between-class scatter matrix can be defined as (17) and (18), respectively, where is a constant which controls the influence of fuzzy membership degree.

Finally the fuzzy inverse Fisher criterion function can be defined as follows:

In this method, the fuzzy between-class scatter matrix and the fuzzy within-class scatter matrix are redefined according to FKNN. The method reduces sensitivity to the substantial variations between face images caused by varying illumination, viewing conditions, and facial expression. However, the way the fuzzy membership degree is embedded in the definition of the fuzzy between-class scatter matrix is not very appropriate. The main reason is that the fuzzy membership degree reflects the relation between a training sample and a class center, whereas expresses the difference between the fuzzy mean of a class and the mean of all training samples. It is therefore improper to take as the weight of . In addition, the parameter in KNN also affects the performance of FIFDA.
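The FKNN membership assignment used by FIFDA can be sketched as below. The sketch assumes the widely used fuzzy k-nearest-neighbor initialization, in which a sample receives 0.51 + 0.49 n/k for its own class and 0.49 n/k for every other class, where n is the number of its k nearest neighbors in that class; whether FIFDA uses exactly these constants is an assumption. The function name `fknn_membership` is hypothetical.

```python
import numpy as np


def fknn_membership(X, y, k):
    """Return U of shape (c, N); U[i, j] is the membership of sample j in class i."""
    y = np.asarray(y)
    classes = np.unique(y)
    N = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq, np.inf)                 # a sample is not its own neighbor
    nn = np.argsort(sq, axis=1)[:, :k]           # indices of the k nearest neighbors
    U = np.zeros((len(classes), N))
    for i, c in enumerate(classes):
        n_ic = (y[nn] == c).sum(axis=1)          # neighbors of each sample lying in class c
        U[i] = 0.49 * n_ic / k                   # ASSUMPTION: standard FKNN constants
        U[i, y == c] += 0.51
    return U
```

With these constants the memberships of each sample sum to one over the classes, and the membership in the labeled class is never below 0.51, which matches the two properties mentioned in the text.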

#### 3. Fuzzy Maximum Margin Criterion (FMMC)

Based on the review in Section 2, several problems merit attention. In the LLD method, the weight is a Gaussian function, whose important property is smoothing and denoising. However, it is unclear how to assign the parameter . In particular, the weight cannot provide further classification information when the training samples overlap, a problem that fuzzy theory handles better. The FIFDA method defines the fuzzy membership degree from the neighborhood relations between training samples, but this definition involves some artificial factors. Moreover, the fuzzy between-class scatter matrix in FIFDA does not tightly integrate the fuzzy membership degree with the samples from which the membership degree is calculated. In this paper, to avoid the uncertainty of the fuzzy membership in FIFDA, we employ the traditional fuzzy membership based on Euclidean distance and redefine the fuzzy between-class and within-class scatter matrices. Finally, we give a succinct way of computing the new fuzzy between-class scatter matrix.

Suppose there are known pattern classes in training data set , is the number of training samples in class. is the number of total training samples. , denotes training sample in the class, is the mean of training samples in class, and refers to total mean of all training samples. is the membership degree of training sample to class . Consider Corresponding fuzzy within-class scatter matrix can be defined as Let , and indicates an all-one row vector of length . satisfies , where is a matrix with the samples in class as columns.

Set Then because Therefore, the fuzzy between-class scatter matrix can be defined as follows: In order to reduce computational complexity, the above fuzzy between-class scatter matrix can be further be simplified as follows: Let , let . Considerand then we haveTherefore, the fuzzy maximum margin problem can be translated into an objective optimization problem as We can obtain transformational operators from eigendecomposition of . That is, firstly, we eigendecompose and then sort the acquired eigenvalues in descending order, which are denoted by , and corresponding eigenvectors are . Finally we obtain transformational operator which consisted of the first eigenvectors according to dimensional reduction requirement.

In the above derivation, the standard between-class scatter matrix is expanded into an expression in terms of the difference between training sample and subclass mean , which makes it convenient to embed the fuzzy membership degree into the between-class scatter matrix. Since the fuzzy membership degree reflects the affiliation of sample to subclass , the most effective approach is to let the fuzzy membership degree act directly on , so that it constrains the sample deviation. This gives the definition an advantage over other fuzzy scatter matrix definitions, and the later experiments illustrate its flexibility.

In calculating the fuzzy between-class scatter matrix, the triple summation is transformed into a succinct matrix operation, which makes effective use of the matrix computation strengths of MATLAB. In particular, (28) shows that the fuzzy between-class scatter matrix can be computed as the product of a matrix and its transpose. This effectively prevents from being an inexact real symmetric matrix because of machine precision and computational errors, whereas a scatter matrix obtained by summation is usually an inexact real symmetric matrix. This phenomenon emerges more easily when the number of training samples is very large. Real symmetry, however, is very important for ensuring that the eigenvalues obtained in the subsequent eigendecomposition are real. Indeed, in experiments on the AR face database, we found that some eigenvalues were complex numbers when we calculated the fuzzy between-class scatter matrix directly according to (25). Such a situation should not occur in the feature extraction process. Therefore, our approach offers other researchers a concise and reliable way to compute a scatter matrix when embedding a fuzzy factor (or other weights) into it.
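The numerical point made here, that a scatter matrix assembled as a single matrix product keeps the eigendecomposition real, can be illustrated with a short NumPy sketch. The factor `B` is a hypothetical stand-in for the weighted deviation matrix in the paper's formula, which is not reproduced here; the explicit symmetrization and the use of a symmetric eigensolver are additional safeguards assumed for this example rather than steps taken from the text.

```python
import numpy as np


def scatter_from_factor(B):
    """Scatter matrix as one product B B^T, symmetrized to remove rounding asymmetry."""
    S = B @ B.T
    return (S + S.T) / 2.0          # exactly symmetric in floating point


def sorted_eigh(S):
    """Eigenpairs of a symmetric matrix in descending eigenvalue order (always real)."""
    vals, vecs = np.linalg.eigh(S)  # symmetric solver, never returns complex values
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]
```

A general-purpose solver such as `np.linalg.eig` applied to a slightly asymmetric matrix may return complex eigenvalues, which is the failure mode reported for the summation-based computation.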

#### 4. Fuzzy Kernel Maximum Margin Criterion Based Algorithm for Feature Extraction

In this section, we present a novel fuzzy kernel maximum margin criterion (FKMMC) for feature extraction, which combines FMMC and KPCA. In the KPCA step, we obtain the transformational operator from the leading eigenvectors whose eigenvalues account for 98% of the eigenvalue sum. The other eigenvectors are discarded for the purpose of denoising. The algorithm is described as follows.

*Algorithm 1 (fuzzy kernel maximum margin criterion (FKMMC)).*

*Step 1.* Compute the standard variance of training samples and refer to it as .

*Step 2.* Compute kernel matrix of training samples by Gaussian kernel function with bandwidth .

*Step 3.* is eigendecomposed as , , , . We choose first values with 98%, and let and let . Then kernel projective operator is as follows: .

*Step 4.* Projecting original samples into kernel subspace, we have , for convenience sake, yet let .

*Step 5.* First compute total samples mean and subclass samples mean and then compute the fuzzy membership degree of each sample to subclass mean according to (20).

*Step 6.* Compute fuzzy within-class scatter matrix and fuzzy between-class scatter matrix according to (23) and (28).

*Step 7.* Eigendecompose and then rank eigenvector according to eigenvalues descending order, denoted by . We choose first eigenvectors to reconstruct transformational operator according to the desire of dimensional reduction.

*Step 8.* Formulate ultimate FKMMC transformational operator: .

*Step 9.* Output .

*Algorithm 2 (FKMMC based recognition algorithm).*

*Step 1.* Input recognized sample .

*Step 2.* Compute .

*Step 3.* Output .
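The two algorithms can be sketched end to end in NumPy as follows. This is one plausible reading under explicit assumptions, not the author's implementation, because the formulas referenced as (20), (23), and (28) are not reproduced in this text. The assumptions are: the bandwidth is a single standard deviation over all entries of the training data; the membership is the normalized inverse squared Euclidean distance to each class mean; the fuzzy within-class scatter weights each own-class deviation by its membership; and the fuzzy between-class scatter uses the membership-weighted sum of deviations from each class mean. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel values between every row of A and every row of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def fkmmc_fit(X, y, d, keep=0.98):
    """Return (sigma, T); T maps a kernel vector against X to d features."""
    y = np.asarray(y)
    N = len(y)
    # Steps 1-2: bandwidth from the training samples, then the kernel matrix.
    sigma = float(X.std())                        # ASSUMPTION: one scalar over all entries
    K = gaussian_kernel(X, X, sigma)
    # Step 3: keep the leading axes covering `keep` of the eigenvalue sum.
    lam, E = np.linalg.eigh(K)
    lam, E = np.clip(lam[::-1], 0.0, None), E[:, ::-1]
    m = min(int(np.searchsorted(np.cumsum(lam) / lam.sum(), keep)) + 1, N)
    C = E[:, :m] / np.sqrt(lam[:m])
    # Step 4: training samples in the kernel subspace, one row per sample.
    Y = K @ C
    # Step 5: class means and ASSUMED inverse-squared-distance memberships U (c x N).
    classes = np.unique(y)
    means = np.stack([Y[y == c].mean(axis=0) for c in classes])
    d2 = ((Y[None, :, :] - means[:, None, :]) ** 2).sum(axis=-1) + 1e-12
    U = (1.0 / d2) / (1.0 / d2).sum(axis=0, keepdims=True)
    # Step 6: ASSUMED fuzzy scatter matrices, each built as a factor times its transpose.
    Zw, Zb = [], []
    for i, c in enumerate(classes):
        own = y == c
        Zw.append((Y[own] - means[i]) * np.sqrt(U[i, own] / N)[:, None])
        v = (U[i][:, None] * (means[i] - Y)).sum(axis=0) / N
        Zb.append(np.sqrt(own.sum() / N) * v)
    Zw, Zb = np.vstack(Zw), np.vstack(Zb)
    M = Zb.T @ Zb - Zw.T @ Zw                     # fuzzy between-class minus within-class
    M = (M + M.T) / 2.0
    # Step 7: eigenvectors for the d largest eigenvalues.
    vals, vecs = np.linalg.eigh(M)
    W = vecs[:, np.argsort(vals)[::-1][:d]]
    # Step 8: combined operator acting on kernel vectors.
    return sigma, C @ W


def fkmmc_transform(X_train, X_new, sigma, T):
    """Recognition-time projection: kernel vector of each new sample, then T."""
    return gaussian_kernel(X_new, X_train, sigma) @ T
```

The training routine uses only matrix products and symmetric eigendecompositions, so no matrix inverse is computed at any step, in line with the discussion that follows.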

From the above algorithm we can see that it involves only matrix products, matrix transposition, diagonalization, and eigendecomposition. The matrices to be eigendecomposed are all real symmetric, and by matrix theory a real symmetric matrix has real eigenvalues. Thus, all of these matrices can be eigendecomposed, and the algorithm does not need to calculate an inverse matrix. The whole computational process is therefore feasible. In Algorithm 1, is a matrix and is a matrix, so that the output of Algorithm 1 is a matrix. In Algorithm 2, is a vector, and its output is a vector. In the image recognition, commonly satisfy . Therefore, the proposed algorithm can efficiently reduce the data dimension. Since the kernel map makes the data easier to separate in the kernel space, FMMC can provide more classification information. The proposed algorithm thus retains more classification information while reducing the data dimension. On the other hand, the computational complexity of the sample standard variance is , and the kernel matrix ’s is . The computational complexity of eigendecomposition kernel matrix is . The computational complexity of both and is all . Considering , so the computational complexity of the proposed algorithm is , that is, the computational complexity of computing the kernel matrix.

#### 5. Experimental Results

In our experiments, four face image databases, namely, the AR database, the Yale B database, the FERET database, and the Georgia Tech face database, are used to compare the performance of the proposed fuzzy kernel maximum margin criterion (FKMMC) approach with that of several other algorithms: KPCA [3], KLDA [13], LLD [12], LPP [9], 2DLDA [15], FIFDA [14], TSA [10], LBMMC [16], 2DMMC [17], and B2D-MMC [18]. The experiments are run on a Dell computer with an Intel(R) Core(TM) 2 Duo CPU T7500 @ 2.20 GHz and 1 GB of RAM, and the programming environment is MATLAB (version 2006a).

##### 5.1. Experiments on Georgia Tech Face Database

The face image database used in this experiment is the Georgia Tech Face Database (GTFD) [21, 22], which consists of 50 subjects with 15 face images available for each subject. These face images vary in size, facial expression, illumination, and rotation both within the image plane and perpendicular to it. In our experiments, all images in the database were manually cropped and resized to 60 × 40. Cropping excluded most of the complex background and partially eliminated in-plane rotation, but out-of-plane rotation was left untouched. The images were further converted to gray-level images for both training and testing.

In our first experiment, we choose the first samples per individual for training and the remaining samples of each individual for testing, and let , respectively. For each , KPCA, KLDA, LLD, 2DLDA, LPP, FIFDA, TSA, LBMMC, 2DMMC, B2D-MMC, and the proposed FKMMC are used for feature extraction. In the PCA stage of LLD and FIFDA, the eigenvectors that keep nearly 99% of the image energy are selected as the transformational operator. In the FIFDA, let , and the FKNN parameter is set as . In the LBMMC algorithm, the parameter of the similarity is set as . In the LPP and TSA algorithms, is set as default. In the 2DLDA, LPP, TSA, and LBMMC algorithms, the selected eigenvectors (projection vectors) are full rank. In KPCA, KLDA, LLD, and the proposed FKMMC, the number of selected eigenvectors (projection vectors) is 20% of the total number of training samples. In the TSA algorithm, the number of iterations is 10. In the B2D-MMC algorithm, the number of layers is set to 6. Finally, a nearest neighbor classifier with Euclidean distance is employed. The final results are given in Figure 1, which shows that the proposed method achieves the best recognition rate. Although TSA and 2DLDA come close to the result of the proposed algorithm, TSA needs 20 eigendecompositions for 10 iterations, and 2DLDA needs to calculate the Moore-Penrose pseudoinverse of a matrix. The proposed algorithm needs neither a matrix inverse nor iteration, and its stability and recognition rate are also higher than those of 2DLDA and TSA.
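The evaluation protocol described above (the first few samples of each subject for training, the rest for testing, followed by a nearest neighbor classifier with Euclidean distance) can be sketched as follows. The function names are hypothetical, and the feature vectors are assumed to have been produced by any of the compared extraction methods, one row per image.

```python
import numpy as np


def split_first_l(labels, l):
    """Boolean masks selecting the first l samples of each subject for training."""
    labels = np.asarray(labels)
    train = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # samples of subject c, in stored order
        train[idx[:l]] = True
    return train, ~train


def nn_classify(train_feats, train_labels, test_feats):
    """Label each test row with the label of its nearest training row (Euclidean)."""
    sq = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=-1)
    return np.asarray(train_labels)[np.argmin(sq, axis=1)]


def recognition_rate(pred, truth):
    """Fraction of test samples whose predicted label matches the true label."""
    return float(np.mean(np.asarray(pred) == np.asarray(truth)))
```

Squared distances are sufficient here, because taking the square root does not change which training sample is nearest.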