Abstract

In many image classification applications, it is common to extract multiple visual features from different views to describe an image. Although different visual features have their own statistical properties and discriminative powers for image classification, the conventional solution for multiple view data is to concatenate these feature vectors into a new feature vector. However, this simple concatenation strategy not only ignores the complementary nature of different views, but also suffers from the “curse of dimensionality.” To address this problem, we propose a novel multiview subspace learning algorithm, named multiview discriminative geometry preserving projection (MDGPP), for feature extraction and classification. MDGPP not only preserves the intraclass geometry and interclass discrimination information under a single view, but also explores the complementary property of different views to obtain a low-dimensional optimal consensus embedding by using an alternating-optimization-based iterative algorithm. Experimental results on face recognition and facial expression recognition demonstrate the effectiveness of the proposed algorithm.

1. Introduction

Many computer vision and pattern recognition applications involve processing data in a high-dimensional space. Directly operating on such high-dimensional data is difficult due to the so-called “curse of dimensionality.” For computational time, storage, and classification performance considerations, dimensionality reduction (DR) techniques provide a means to solve this problem by generating a succinct and representative low-dimensional subspace of the original high-dimensional data space. Over the past two decades, many dimensionality reduction algorithms have been proposed and successfully applied to face recognition [1]. The most representative ones are principal component analysis (PCA) and linear discriminant analysis (LDA) [2].

PCA is an unsupervised dimensionality reduction method, which aims to project the high-dimensional data into a low-dimensional subspace spanned by the leading eigenvectors of a covariance matrix. LDA is supervised and its goal is to pursue a low-dimensional subspace by maximizing the ratio of between-class variance to within-class variance. Due to the utilization of label information, LDA usually outperforms PCA for classification tasks when sufficient labeled training data are available. While these two algorithms have attained reasonably good performance in pattern classification, they may fail to discover a highly nonlinear submanifold embedded in the high-dimensional ambient space, as they seek only a compact Euclidean subspace for data representation and classification [3].

Recently, there has been considerable interest in manifold learning algorithms for dimensionality reduction and feature extraction. The basic consideration of these algorithms is that the high-dimensional data may lie on an intrinsic nonlinear low-dimensional manifold. In order to detect the underlying manifold structure, nonlinear dimensionality reduction algorithms such as ISOMAP [4], locally linear embedding (LLE) [5], and Laplacian eigenmaps (LE) [6] have been proposed. All of these algorithms are defined only on the training data, and the issue of how to map new testing data remains difficult. Therefore, they cannot be applied to classification problems directly. To overcome this so-called out-of-sample problem, He and Niyogi [7] developed the locality preserving projection (LPP), in which a linear projection function is adopted for mapping new data samples. As LPP is originally unsupervised, some recent attempts have exploited the discriminant information and derived many discriminant manifold learning algorithms to enhance the classification performance. The representative algorithms include local discriminant embedding (LDE) [8], locality sensitive discriminant analysis (LSDA) [9], marginal Fisher analysis (MFA) [10], local Fisher discriminant analysis (LFDA) [11], and discriminative geometry preserving projection (DGPP) [12]. Despite having different assumptions, all these algorithms can be unified into a general graph embedding framework (GEF) [10] with different constraints. While these algorithms have utilized both local geometry and the discriminative information for dimensionality reduction and achieved reasonably good performance in different pattern classification tasks, they assume that the data are represented in a single vector. They can be regarded as single-view-based methods and thus cannot handle data described by multiview features directly.
In many practical pattern classification applications, different views (visual features) have their own specific statistical properties, and each view represents the data only partially. The traditional solution for such multiple view data is to simply concatenate the vectors of different views into a new long vector and then apply dimensionality reduction algorithms directly on the concatenated vector. However, this concatenation ignores the diversity of multiple views and thus cannot explore the complementary nature and specific statistical properties of different views. Recent studies have provided convincing evidence of this fact [13–15]. Hence, it is more reasonable to assign different weights to different views (features) for feature extraction and classification. In computer vision and machine learning research, many works have shown that leveraging the complementary nature of the multiple views can better represent the data for feature extraction and classification [13–15]. Therefore, an efficient manifold learning algorithm that can cope with multiview data and place proper weights on different views is of great interest and significance.

Motivated by the above observations and reasons, we propose unifying different views under a discriminant manifold learning framework called multiview discriminative geometry preserving projection (MDGPP). Under each view, we can implement the discrimination and local geometry preservations as those used in discriminative geometry preserving projection (DGPP) [12]. Unifying different views in such a multiview discriminant manifold learning framework is meaningful, since data with different features can be appropriately integrated to further improve the classification performance. Specifically, we first implement the discrimination preservation by maximizing the average weighted pairwise distance between samples in different classes and simultaneously minimizing the average weighted pairwise distance between samples in the same class. Meanwhile, the local geometry preservation is implemented by minimizing the reconstruction error of samples in the same class. Then, we learn a low-dimensional feature subspace by utilizing both intraclass geometry and interclass discrimination information, such that the complementary nature of different views (features) can be fully exploited when classification is performed in the derived feature subspace. Experimental results on face recognition and facial expression recognition are presented to demonstrate the effectiveness of the proposed algorithm.

The remainder of the paper is organized as follows. Section 2 reviews the related works. Section 3 presents the details of the proposed MDGPP algorithm. Experimental results on face recognition and facial expression recognition are presented in Section 4, and the concluding remarks are provided in Section 5.

2. Related Works

Multiview learning is an important topic in the machine learning and pattern recognition communities. In such a setting, view weight information is introduced to measure the importance of different features in characterizing data, and different weights reflect different contributions to the learning process. The aim of multiview learning is to exploit the complementary information of different views, rather than only a single view, to further improve the learning performance. The traditional solution for multiview data is to concatenate all features into one vector and then perform learning in the resulting feature space. However, this solution is not optimal, as these features usually have different physical properties. Simply concatenating them ignores the complementary nature and specific statistical properties of different views and thus degrades performance. In addition, this simple concatenation leads to the curse of dimensionality problem for the subsequent learning task.

In order to perform multiview learning, much effort has been focused on multiview metric learning [14], multiview classification and retrieval [16], multiview clustering [15], and multiview semisupervised learning [17]. All these approaches demonstrated that the learning performance can be significantly enhanced if the complementary nature of different views is exploited and all views are appropriately integrated. It is therefore natural to consider the multiview learning idea in dimensionality reduction as well. However, most of the existing dimensionality reduction algorithms are designed only for single view data and cannot cope with multiview data directly. To address this problem, Long et al. [18] first proposed the multiple view spectral embedding (MVSE) method. MVSE performs a dimensionality reduction process on each view independently, and then, based on the obtained low-dimensional representations, it constructs a common low-dimensional embedding that is as close as possible to each representation. Although MVSE allows selecting different dimensionality reduction algorithms for each view, the original multiview data are invisible to the final learning process. Thus, MVSE cannot fully explore the complementary information of different views. Xia et al. [19] proposed the multiview spectral embedding (MSE) method to find a low-dimensional and sufficiently smooth embedding based on the patch alignment framework [20]. However, MSE ignores the flexibility of allowing shared information between subsets of different views owing to the global coordinate alignment process. To unify different views for dimensionality reduction under a probabilistic framework, Xie et al. [21] extended the stochastic neighbor embedding (SNE) to its multiview version and proposed multiview stochastic neighbor embedding (MSNE).
Although MSNE operates on a probabilistic framework, it is an unsupervised method and its classification abilities may be limited since the class label information is not used in the learning process. More recently, inspired by the recent advances of sparse coding technique, Han et al. [22] proposed spectral sparse multiview embedding (SSMVE) method to deal with dimensionality reduction for multiview data. Although SSMVE can impose sparsity constraint on the loading matrix of multiview dimensionality reduction, it is unsupervised and does not explicitly consider the manifold structure on which the high dimensional data possibly reside. In the next section, focusing on the manifold learning and pattern classification, we propose a novel multiview discriminative geometry preserving projection (MDGPP) for multiview dimensionality reduction, which explicitly considers the local manifold structure and discriminative information as well as the complementary characteristics of different views in high-dimensional data.

3. Multiview Discriminative Geometry Preserving Projection (MDGPP)

In this section, we propose a new manifold learning algorithm called multiview discriminative geometry preserving projection (MDGPP), which aims to find a unified low-dimensional and sufficiently smooth embedding over all views simultaneously. To better explain the algorithm details of the proposed MDGPP, we introduce some important notations used in the remainder of this paper. Capital letters such as $X$ denote data matrices, and $X_{ij}$ represents the $(i,j)$ entry of $X$. Lower case letters such as $x$ denote data vectors, and $x_i$ represents the $i$th data element of $x$. A superscript $(k)$, as in $X^{(k)}$ and $x^{(k)}$, indicates data from the $k$th view. Based on these notations, MDGPP can be described as follows according to the DGPP framework [12].

Given a multiview data set with $n$ data samples and each with $m$ feature representations, that is, $X = \{X^{(k)} \in \mathbb{R}^{p_k \times n}\}_{k=1}^{m}$, wherein $X^{(k)}$ represents the feature matrix for the $k$th view, the aim of MDGPP is to find a projective matrix $U$ to map $X$ into a low-dimensional representation $Y$ through $Y = U^{T}X$, where $d$ denotes the dimension of the low-dimensional feature representation and satisfies $d < p_k$. The workflow of MDGPP can be simply described as follows. First, MDGPP builds a part optimization for a sample on a single view by preserving both the intraclass geometry and interclass discrimination. Afterward, all parts of optimization from different views are unified as a whole via view weight coefficients. Then an alternating-optimization-based iterative algorithm is derived to obtain the optimal low-dimensional embedding from multiple views.

Given the th view , MDGPP first makes an attempt to preserve discriminative information in the reduced low-dimensional space by maximizing the average weighted pairwise distance between samples in different classes and simultaneously minimizing the average weighted pairwise distance between samples in the same class on the th view. Thus, we have where denotes the trace operation of matrix, is the graph Laplacian on the th view, is a diagonal matrix with its element on the th view, and is the weighting matrix which encodes both the distance weighting information and the class label information on the th view where in is the class label of sample , is the number of samples belonging to the th class, and is set as according to LPP [7] for locality preservation.
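The trace form mentioned above rests on a standard identity: for a symmetric weighting matrix with graph Laplacian equal to the diagonal row-sum matrix minus the weighting matrix, the weighted sum of pairwise squared distances in the projected space equals twice a trace expression. The following NumPy sketch checks that identity numerically. The class-dependent weighting used here (a heat-kernel weight scaled by 1/n − 1/n_l within a class and a constant 1/n across classes) is an assumption made for illustration, and all function names are hypothetical.

```python
import numpy as np

def heat_kernel(X, t=1.0):
    """Heat-kernel weights exp(-||x_i - x_j||^2 / t); X holds one sample per row."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / t)

def discriminative_weights(X, labels, t=1.0):
    """Symmetric weighting matrix H (ASSUMED form, for illustration only).

    Same-class pairs get a negative weight c_ij * (1/n - 1/n_l), so that
    maximizing the weighted distance pulls them together; different-class
    pairs get the positive weight 1/n.
    """
    labels = np.asarray(labels)
    n = len(labels)
    C = heat_kernel(X, t)
    same = labels[:, None] == labels[None, :]
    class_size = np.array([np.sum(labels == l) for l in labels])
    H = np.where(same, C * (1.0 / n - 1.0 / class_size[:, None]), 1.0 / n)
    np.fill_diagonal(H, 0.0)
    return H

def graph_laplacian(H):
    """Graph Laplacian L = D - H, where D is the diagonal matrix of row sums."""
    return np.diag(H.sum(axis=1)) - H

def weighted_pairwise_distance(Y, H):
    """Direct evaluation of sum_ij h_ij * ||y_i - y_j||^2 (rows of Y are samples)."""
    diff = Y[:, None, :] - Y[None, :, :]
    return float(np.sum(H * np.sum(diff ** 2, axis=2)))
```

With samples stored as rows, the projected data are `Y = X @ U`, and the identity reads `weighted_pairwise_distance(Y, H) == 2 * trace(Y.T @ L @ Y)` up to rounding.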

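Second, we try to implement the local geometry preservation by assuming that each sample $x_i^{(k)}$ can be linearly reconstructed by the samples which share the same class label with $x_i^{(k)}$ on the $k$th view. Thus, we can obtain the reconstruction coefficient $w_{ij}^{(k)}$ by minimizing the reconstruction error on the $k$th view; that is,

$$\min_{w_{ij}^{(k)}} \sum_{i} \Big\| x_i^{(k)} - \sum_{j:\, l_j = l_i} w_{ij}^{(k)} x_j^{(k)} \Big\|^2 \quad (3)$$

under the constraint

$$\sum_{j} w_{ij}^{(k)} = 1, \qquad w_{ij}^{(k)} = 0 \ \text{for} \ l_j \neq l_i. \quad (4)$$

Then, by solving (3) and (4), we have

$$w_{ij}^{(k)} = \frac{\sum_{p} \big(C^{(k)}\big)^{-1}_{jp}}{\sum_{p,q} \big(C^{(k)}\big)^{-1}_{pq}}, \quad (5)$$

where $C^{(k)}_{jp} = \big(x_i^{(k)} - x_j^{(k)}\big)^{T}\big(x_i^{(k)} - x_p^{(k)}\big)$ denotes the local Gram matrix and $l_j = l_p = l_i$.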
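The sum-to-one constrained least-squares problem described here has the same closed form as the weight step in locally linear embedding: solve a linear system with the local Gram matrix and normalize. A minimal NumPy sketch follows, assuming the same-class samples of one view are passed in as rows. The function name and the small ridge term (added because the local Gram matrix is often singular) are assumptions, not part of the original formulation.

```python
import numpy as np

def reconstruction_weights(x, same_class_samples, reg=1e-3):
    """Weights that reconstruct x from same-class samples and sum to one.

    x:                  array of shape (p,)
    same_class_samples: array of shape (k, p), excluding x itself
    reg:                relative ridge strength (ASSUMPTION, for stability only)
    """
    diffs = x[None, :] - same_class_samples      # row j is x - x_j
    C = diffs @ diffs.T                          # local Gram matrix, shape (k, k)
    scale = np.trace(C)
    if scale <= 0.0:
        scale = 1.0
    C = C + reg * scale * np.eye(len(same_class_samples))
    w = np.linalg.solve(C, np.ones(len(same_class_samples)))
    return w / w.sum()                           # enforce the sum-to-one constraint
```

For a sample lying exactly halfway between two same-class samples, the sketch returns equal weights, and the weighted combination recovers the sample.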

Once obtaining the reconstruction coefficient on the th view, then MDGPP aims to reconstruct from with in the projected low-dimensional space; thus we have where is an identity matrix defined on the th view, and is the reconstruction coefficient matrix on the th view.

As a result, by combining (1) and (6) together, the part optimization for is where , and is a tradeoff coefficient which is empirically set as 1 in this experiment.

Based on the local manifold information encoded in and , (7) aims at finding a sufficiently smooth low-dimensional embedding by preserving the interclass discrimination and intraclass geometry on the th view.

Because multiviews could provide complementary information in characterizing data from different viewpoints, different views certainly have different contributions to the low-dimensional feature subspace. In order to well discover the complementary information of data from different views, a nonnegative weighted set is imposed on each view independently. Generally speaking, the larger is, the more the contribution of the view is made to obtain the low-dimensional feature subspace. Hence, by summing over all parts of optimization defined in (7), we can formulate MDGPP as the following optimization problem: subject to

The solution to in (8) subject to (9) is corresponding to the maximum over different views, and otherwise, which means that only the best view is finally selected by this method. Consequently, this solution cannot meet the demand for exploring the complementary characteristics of different views to get a better low-dimensional embedding than that based on a single view. In order to avoid this problem, we set with by following the trick utilized in [16–19]. In this condition, achieves its maximum when according to and . Similarly for different views can be obtained by setting ; thus each view makes a specific contribution to obtaining the final low-dimensional embedding. Consequently, the new objective function of MDGPP can be defined as follows: subject to

The above optimization problem is a nonlinearly constrained nonconvex optimization problem, so there is no direct approach to find its global optimal solution. In this paper, we derive an alternating-optimization-based iterative algorithm to find a local optimal solution. The alternating optimization iteratively updates the projection matrix and weight vector .

First, we update by fixing . The optimal problem (10) subject to (11) becomes subject to

Following the standard Lagrange multiplier, we construct the following Lagrangian function by incorporating the constraint (13) into (12): where the Lagrange multiplier satisfies .

Taking the partial derivation of the Lagrangian function with respect to and and setting them to zeros, we have

Hence, according to (13) and (14), the weight coefficient can be calculated as

Then, we can make the following observations according to (15): If ; then the values of different will be close to each other. If , then only corresponding to the maximum over different views, and otherwise. Thus, the choice of should respect to the complementary property of different views. The effect of the parameter will be discussed in the later experiments.
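The two limiting behaviours described above can be illustrated numerically. The sketch below assumes that the weight of each view is proportional to a positive per-view score raised to the power 1/(γ − 1), where γ is the tuning parameter and the score stands in for whatever per-view quantity enters (17). Both the functional form and the name `view_weights` are assumptions chosen to reproduce the stated behaviour (near-uniform weights for large γ, a single dominant view as γ approaches 1); they are not taken from the original equations.

```python
import numpy as np

def view_weights(scores, gamma):
    """Normalized view weights proportional to scores ** (1 / (gamma - 1)).

    scores: positive per-view scores (hypothetical stand-in for the
            per-view quantity in the weight update)
    gamma:  tuning parameter, must be greater than 1
    """
    scores = np.asarray(scores, dtype=float)
    if gamma <= 1.0:
        raise ValueError("gamma must be greater than 1")
    if np.any(scores <= 0.0):
        raise ValueError("scores must be positive")
    # Work in the log domain so that exponents near gamma = 1 do not overflow.
    log_w = np.log(scores) / (gamma - 1.0)
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum()
```

With scores `[1.0, 2.0, 4.0]`, a very large γ gives weights close to one third each, while γ = 1.01 puts almost all of the weight on the third view.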

Second, we update by fixing . The optimal problem (10) subject to (11) can be equivalently transformed into the following form: subject to where . Since defined in (7) is a symmetric matrix, is also a symmetric matrix.

Obviously, the solution of (18) subject to (19) can be obtained by solving the following standard eigendecomposition problem

Let the eigenvectors be solutions of (20) ordered according to eigenvalues . Then, the optimal projection matrix is given by . Now, we discuss how to determine the reduced feature dimension by using the Ky Fan theorem [23].

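Ky Fan Theorem. Let $M \in \mathbb{R}^{p \times p}$ be a symmetric matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$ and the corresponding eigenvectors $u_1, u_2, \ldots, u_p$. Then $$\lambda_1 + \lambda_2 + \cdots + \lambda_d = \max_{U^{T}U = I_d} \operatorname{tr}\big(U^{T} M U\big).$$ Moreover, the optimal $U^{*}$ is given by $U^{*} = [u_1, u_2, \ldots, u_d]\,R$, where $R$ is an arbitrary orthogonal matrix.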

From the above Ky Fan theorem, we can make the following observations. The optimal solution to (18) subject to (19) is composed of the largest eigenvectors of the matrix , and the optimal value of objective function (18) equals the sum of the largest eigenvalues of the matrix . Therefore, the optimal reduced feature dimension is equivalent to the number of positive eigenvalues of the matrix .
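The observations drawn from the Ky Fan theorem can be checked with a short NumPy sketch: the leading eigenvectors of a symmetric matrix attain a trace equal to the sum of the leading eigenvalues, any other orthonormal basis does no better, and the number of positive eigenvalues can be counted directly. The function names and the tolerance are hypothetical.

```python
import numpy as np

def top_eigenvectors(M, d):
    """Return the d largest eigenvalues of symmetric M and their eigenvectors."""
    M = 0.5 * (M + M.T)                  # symmetrize against rounding error
    vals, vecs = np.linalg.eigh(M)       # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:d]
    return vals[order], vecs[:, order]

def count_positive_eigenvalues(M, tol=1e-10):
    """Number of eigenvalues of symmetric M that exceed tol."""
    vals = np.linalg.eigvalsh(0.5 * (M + M.T))
    return int(np.sum(vals > tol))
```

Adding further eigenvectors increases the trace objective only while the corresponding eigenvalues are positive, which is why the count of positive eigenvalues is used as the reduced dimension.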

Alternately updating and by solving (17) and (20) until convergence, we can obtain the final optimal projection matrix for multiple views. A simple initialization for could be . According to the aforementioned statement, the proposed MDGPP algorithm is summarized as follows.

Algorithm 1 (MDGPP algorithm).
Input. A multiview data set , the dimension of the reduced low-dimensional subspace , tuning parameter , iteration number , and convergence error .
Output. Projection matrix .
Algorithm.

Step 1. Simultaneously consider both intraclass geometry and interclass discrimination information to calculate for each view according to (7).

Step 2 (initialization).  (1)Set ; (2) obtain by solving the eigendecomposition problem (20).

Step 3 (local optimization). For  (1)calculate as shown in (17); (2)solve the eigenvalue equation in (20); (3)sort their eigenvectors according to their corresponding eigenvalues: , and obtain ; (4)if and , then go to Step 4.

Step 4 (output projection matrix). Output the final optimal projection matrix .
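
The alternating scheme of Algorithm 1 can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' implementation: each view is assumed to contribute a symmetric positive definite matrix of the same size (the hypothetical `view_mats`), the weights start uniform, the weight update uses the assumed power form (per-view trace raised to 1/(γ − 1), then normalized), and convergence is measured by the change in the projector onto the learned subspace.

```python
import numpy as np

def top_eigvecs(M, d):
    """Eigenvectors of symmetric M for its d largest eigenvalues."""
    vals, vecs = np.linalg.eigh(0.5 * (M + M.T))
    return vecs[:, np.argsort(vals)[::-1][:d]]

def update_weights(scores, gamma):
    """ASSUMED weight update: weights proportional to scores ** (1 / (gamma - 1))."""
    log_w = np.log(scores) / (gamma - 1.0)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def combine(view_mats, alpha, gamma):
    """Weighted sum of the per-view matrices with weights alpha ** gamma."""
    return sum((a ** gamma) * A for a, A in zip(alpha, view_mats))

def alternating_fit(view_mats, d, gamma=5.0, max_iter=5, tol=1e-6):
    """Alternate between the projection matrix U and the view weights alpha.

    view_mats: list of symmetric positive definite (p, p) arrays, one per view
    d:         dimension of the reduced subspace
    """
    m = len(view_mats)
    alpha = np.full(m, 1.0 / m)                       # uniform initialization (assumption)
    U = top_eigvecs(combine(view_mats, alpha, gamma), d)
    for _ in range(max_iter):
        # Fix U, update the view weights from the per-view trace values.
        scores = np.array([np.trace(U.T @ A @ U) for A in view_mats])
        alpha = update_weights(scores, gamma)
        # Fix alpha, update U by eigendecomposition of the combined matrix.
        U_new = top_eigvecs(combine(view_mats, alpha, gamma), d)
        change = np.linalg.norm(U_new @ U_new.T - U @ U.T)
        U = U_new
        if change < tol:
            break
    return U, alpha
```

Comparing the projectors `U @ U.T` rather than `U` itself makes the stopping test insensitive to sign flips and rotations of the eigenvectors.
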
We now briefly analyze the computational complexity of the MDGPP algorithm, which is dominated by three parts. One is for constructing the matrix for different views. As shown in (7), the computational complexity of this part is . In addition, each iteration involves computing view weight and solving a standard eigendecomposition problem; the computational complexity of running two parts in each iteration is and , respectively. Therefore, the total computational complexity of MDGPP is , where denotes the iteration number and is always set to less than five in all experiments.

4. Experimental Results

In this section, we evaluate the effectiveness of our proposed MDGPP algorithm for two image classification tasks: face recognition and facial expression recognition. Two widely used face databases, AR [24] and CMU PIE [25], are employed for face recognition evaluation, and the well-known Japanese female facial expression (JAFFE) [26] database is used for facial expression recognition evaluation. We also compare the proposed MDGPP algorithm with some traditional single-view-based dimensionality reduction algorithms, such as PCA [2], LDA [2], LPP [3], MFA [10], and DGPP [12], and with the four latest multiview dimensionality reduction algorithms, namely MVSE [18], MSNE [21], MSE [19], and SSMVE [22]. The nearest neighbor classifier with the Euclidean distance was adopted for classification. For a fair comparison, all the results reported here are based on the best tuned parameters of all the compared algorithms.

4.1. Data Sets and Experimental Settings

We conducted face recognition experiments on the widely used AR and CMU PIE face databases and facial expression recognition experiments on the well-known Japanese female facial expression (JAFFE) database.

The AR database [24] contains over 4,000 color images corresponding to 126 people (70 men and 56 women), which include frontal view faces with different facial expressions, illumination conditions, and occlusions (sun glasses and scarf). Each person has 26 different images taken in two sessions (separated by two weeks). In our experiments, we used a subset of 800 face images from 100 persons (50 men and 50 women) with eight face images of different expressions and lighting conditions per person. Figure 1 shows eight sample images of one individual from the subset of the AR database.

The CMU PIE database [25] comprises more than 40,000 facial images of 68 people with different poses, illumination conditions, and facial expressions. In this experiment, we selected a subset of the CMU PIE database which consists of 3060 frontal face images with varying expression and illumination from 68 persons with 45 images from each person. Figure 2 shows some sample images of one individual from the subset of the CMU PIE database.

The Japanese female facial expression (JAFFE) database [26] contains 213 facial images of ten Japanese women. Each facial image shows one of seven expressions: neutral, happiness, sadness, surprise, anger, disgust, or fear. Figure 3 shows some facial images from the JAFFE database. In this experiment, following the general setting scheme of facial expression recognition, we discard all the neutral facial images and utilize only the remaining 183 facial images, which cover the six basic facial expressions.

For all the face images in the above three face databases, the facial part of each image was manually aligned, cropped, and resized to a fixed size according to the eye positions. For each facial image, we extract four commonly used kinds of low-level visual features to represent four different views: color histogram (CH) [27], scale-invariant feature transform (SIFT) [28], Gabor [29], and local binary pattern (LBP) [30]. For the CH feature extraction, we used 64 bins to encode a histogram feature for each facial image according to [27]. For the SIFT feature extraction, we densely sampled and calculated the SIFT descriptors of patches over a grid with spacing of 8 pixels according to [28]. For the Gabor feature extraction, following [29], we adopted 40 Gabor kernel functions from five scales and eight orientations. For the LBP feature extraction, we followed the parameter settings in [30] and utilized 256 bins to encode a histogram feature for each facial image. For more details on these four feature descriptors, please refer to [27–30]. Because these four features are complementary to each other in representing facial images, we empirically set the tuning parameter in MDGPP to be five.

In this experiment, each facial image set was partitioned into the nonoverlap training and testing sets. For each database, we randomly selected 50% data as the training set and use the remaining 50% data as the testing set. To reduce statistical variation for each random partition, we repeated these trials independently ten times and reported the average recognition results.
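The evaluation protocol described above (random 50/50 split, nearest neighbor classification with the Euclidean distance, accuracy averaged over ten trials) can be sketched as follows. This is a minimal illustration, assuming features have already been projected into the learned subspace and that the split is drawn over all samples without per-class stratification; the text does not state whether the split was stratified, and the function names are hypothetical.

```python
import numpy as np

def nearest_neighbor_predict(train_X, train_y, test_X):
    """Label each test sample with the label of its Euclidean nearest training sample."""
    d2 = np.sum((test_X[:, None, :] - train_X[None, :, :]) ** 2, axis=2)
    return train_y[np.argmin(d2, axis=1)]

def repeated_holdout_accuracy(X, y, n_trials=10, train_frac=0.5, seed=0):
    """Mean test accuracy over independent random train/test partitions.

    X: (n, d) array of (already reduced) features, y: (n,) array of labels.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    accuracies = []
    for _ in range(n_trials):
        perm = rng.permutation(n)                 # nonoverlapping split
        train_idx, test_idx = perm[:n_train], perm[n_train:]
        pred = nearest_neighbor_predict(X[train_idx], y[train_idx], X[test_idx])
        accuracies.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accuracies))
```

In a full pipeline, the projection matrix would be learned on the training half of each trial only and then applied to both halves before calling the classifier.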

4.2. Compared Algorithms

We compared our proposed MDGPP algorithm with the following dimensionality reduction algorithms.
(1) PCA [2]: PCA is an unsupervised dimensionality reduction algorithm.
(2) LDA [2]: LDA is a supervised dimensionality reduction algorithm. We adopted a Tikhonov regularization term rather than PCA preprocessing to avoid the well-known small sample size (singularity) problem in LDA.
(3) LPP [3]: LPP is an unsupervised manifold learning algorithm. There is a nearest neighbor number to be tuned in LPP and it was empirically set to be five in our experiments. In addition, the Tikhonov regularization was also adopted to avoid the small sample size (singularity) problem in LPP.
(4) MFA [10]: MFA is a supervised manifold learning algorithm. There are two parameters (i.e., the intraclass and interclass nearest neighbor numbers) to be tuned in MFA. Both were set empirically in our experiments. Meanwhile, the Tikhonov regularization was also adopted to avoid the small sample size (singularity) problem in MFA.
(5) DGPP [12]: DGPP is a supervised manifold learning algorithm. There is a tradeoff parameter to be tuned in DGPP and it was empirically set to be one in our experiments.
(6) MVSE [18]: MVSE is an early multiview algorithm for dimensionality reduction.
(7) MSNE [21]: MSNE is a probability-based unsupervised multiview algorithm. We followed the parameter setting in [21] and set the tradeoff coefficient to be five in our experiments.
(8) MSE [19]: MSE is a supervised multiview algorithm. There are two parameters (i.e., the nearest neighbor number and the tuning coefficient) to be tuned in MSE. Both were set empirically in our experiments.
(9) SSMVE [22]: SSMVE is a sparse unsupervised multiview algorithm. We followed the parameter setting method in [22] and set the regularization parameter to be one in our experiments.

It is worth noting that since PCA, LDA, LPP, MFA, and DGPP are all single-view-based algorithms, these five algorithms adopt the conventional feature concatenation-based strategy to cope with the multiview data.

4.3. Experimental Results

For each face image database, the recognition performance of different algorithms was evaluated on the testing data separately. The conventional nearest neighbor classifier with the Euclidean distance was applied to perform recognition in the subspace derived from different dimensionality reduction algorithms. Tables 1, 2, and 3 report the recognition accuracies and the corresponding optimal dimensions obtained on the AR, CMU PIE, and JAFFE databases, respectively. Figures 4, 5, and 6 illustrate the recognition accuracies versus the variation of reduced dimensions on the AR, CMU PIE, and JAFFE databases, respectively. According to the above experimental results, we can make the following observations.

(1) As can be seen from Tables 1, 2, and 3 and Figures 4, 5, and 6, our proposed MDGPP algorithm consistently outperforms the conventional single-view-based algorithms (i.e., PCA, LDA, LPP, MFA, and DGPP) and the latest multiview algorithms (i.e., MVSE, MSNE, MSE, and SSMVE) in all the experiments, which implies that extracting a discriminative feature subspace by using both intraclass geometry and interclass discrimination and explicitly considering the complementary information of different facial features can achieve the best recognition performance.
(2) The multiview learning algorithms (i.e., MVSE, MSNE, MSE, SSMVE, and MDGPP) perform much better than single-view-based algorithms (i.e., PCA, LDA, LPP, MFA, and DGPP), which demonstrates that the simple concatenation strategy cannot properly combine features from multiple views, and the recognition performance can be successfully improved by exploring the complementary characteristics of different views.
(3) For the single-view-based algorithms, the manifold learning algorithms (i.e., LPP, MFA, and DGPP) perform much better than the conventional dimensionality reduction algorithms (i.e., PCA and LDA). This observation confirms that the local manifold structure information is crucial for image classification. Moreover, the supervised manifold learning algorithms (i.e., MFA and DGPP) perform much better than the unsupervised manifold learning algorithm LPP, which demonstrates that the utilization of discriminant information is useful to improve the image classification performance.
(4) For the multiview learning algorithms, the supervised multiview algorithms (i.e., MSE and MDGPP) outperform the unsupervised multiview algorithms (i.e., MVSE, MSNE, and SSMVE) due to the utilization of the labeled facial images.
(5) Although MVSE, MSNE, and SSMVE are all unsupervised multiview learning algorithms, SSMVE performs much better than MVSE and MSNE. The possible explanation is that the SSMVE algorithm adopts the sparse coding technique, which is naturally discriminative in determining the appropriate combination of different views.
(6) Among the compared multiview learning algorithms, MVSE performs the worst. The reason is that MVSE performs a dimensionality reduction process on each view independently. Hence it cannot fully integrate the complementary information of different views to produce a good low-dimensional embedding.
(7) MDGPP can improve the recognition performance of DGPP. The reason is that MDGPP can make use of multiple facial feature representations in a common learned subspace such that some complementary information can be explored for the recognition task.

4.4. Convergence Analysis

Since our proposed MDGPP is an iterative algorithm, we also evaluate its recognition performance with different numbers of iterations. Figures 7, 8, and 9 show the recognition accuracy of MDGPP versus the number of iterations on the AR, CMU PIE, and JAFFE databases, respectively. As can be seen from these figures, our proposed MDGPP algorithm converges to a local optimum in fewer than five iterations.

4.5. Parameter Analysis

We investigate the parameter effects of our proposed MDGPP algorithm: tradeoff coefficient and tuning parameter . Since each parameter can affect the recognition performance, we fix one parameter as used in the previous experiments and test the effect of the remaining one. Figures 10, 11, and 12 show the influence of the parameter in the MDGPP algorithm on the AR, CMU PIE, and JAFFE databases, respectively. Figures 13, 14, and 15 show the influence of the parameter in the MDGPP algorithm on the AR, CMU PIE, and JAFFE databases, respectively. From Figure 10 to Figure 15, we can observe that MDGPP demonstrates a stable recognition performance over a large range of both and . Therefore, we can conclude that the performance of MDGPP is not sensitive to the parameters and .

5. Conclusion

In this paper, we have proposed a new multiview learning algorithm, called multiview discriminative geometry preserving projection (MDGPP) for feature extraction and classification by exploring the complementary property of different views. MDGPP can encode different features from different views in a physically meaningful subspace and learn a low-dimensional and sufficiently smooth embedding over all views simultaneously with an alternating-optimization-based iterative algorithm. Experimental results on three face image databases show that the proposed MDGPP algorithm outperforms other multiview and single view learning algorithms.

Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants no. 70701013 and 61174056, the Natural Science Foundation of Henan Province under Grant no. 102300410020, the National Science Foundation for Postdoctoral Scientists of China under Grant no. 2011M500035, and the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant no. 20110023110002.