Mathematical Problems in Engineering

Volume 2015, Article ID 706180, 10 pages

http://dx.doi.org/10.1155/2015/706180

## Semisupervised Tangent Space Discriminant Analysis

Yang Zhou and Shiliang Sun

Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China

Received 8 July 2014; Revised 5 November 2014; Accepted 14 November 2014

Academic Editor: Xin Xu

Copyright © 2015 Yang Zhou and Shiliang Sun. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A novel semisupervised dimensionality reduction method named Semisupervised Tangent Space Discriminant Analysis (STSD) is presented, where we assume that data can be well characterized by a linear function on the underlying manifold. For this purpose, a new regularizer using tangent spaces is developed, which can not only capture the local manifold structure from both labeled and unlabeled data, but is also complementary to the Laplacian regularizer. Furthermore, STSD has an analytic form of the globally optimal solution, which can be computed by solving a generalized eigenvalue problem. To perform nonlinear dimensionality reduction and process structured data, a kernel extension of our method is also presented. Experimental results on multiple real-world data sets demonstrate the effectiveness of the proposed method.

#### 1. Introduction

Dimensionality reduction aims to find a low-dimensional representation of high-dimensional data, while preserving data information as much as possible. Processing data in the low-dimensional space can reduce computational cost and suppress noise. Provided that dimensionality reduction is performed appropriately, the discovered low-dimensional representation of data will benefit subsequent tasks, for example, classification, clustering, and data visualization. Classical dimensionality reduction methods include supervised approaches like linear discriminant analysis (LDA) [1] and unsupervised ones such as principal component analysis (PCA) [2].

LDA is a supervised dimensionality reduction method. It finds a subspace in which the data points from different classes are projected far away from each other, while the data points belonging to the same class are projected as close as possible. One merit of LDA is that it can extract the discriminative information of data, which is crucial for classification. Due to its effectiveness, LDA is widely used in many applications, for example, bankruptcy prediction, face recognition, and data mining. However, LDA may give undesirable results when the labeled examples used for learning are insufficient, because the between-class scatter and the within-class scatter of data could be estimated inaccurately.

PCA is a representative of unsupervised dimensionality reduction methods. It seeks a set of orthogonal projection directions along which the sum of the variances of data is maximized. PCA is a common data preprocessing technique to find a low-dimensional representation of high-dimensional data. In order to meet the requirements of different applications, many unsupervised dimensionality reduction methods have been proposed, such as Laplacian Eigenmaps [3], Hessian Eigenmaps [4], Locally Linear Embedding [5], Locality Preserving Projections [6], and Local Tangent Space Alignment [7]. Although unsupervised approaches have been shown to work well in many applications, they may not be the best choice for some learning scenarios because they can fail to capture the discriminative structure of data.

In many real-world applications, only limited labeled data can be accessed while a large amount of unlabeled data is available. In this case, it is reasonable to perform semisupervised learning, which can utilize both labeled and unlabeled data. Recently, several semisupervised dimensionality reduction methods have been proposed, for example, Semisupervised Discriminant Analysis (SDA) [8], Semisupervised Discriminant Analysis (SSDA) with path-based similarity [9], and Semisupervised Local Fisher Discriminant Analysis (SELF) [10]. SDA aims to find a transformation matrix following the criterion of LDA while imposing a smoothness penalty on a graph which is built to exploit the local geometry of the underlying manifold. Similarly, SSDA also builds a graph for semisupervised learning. However, the graph is constructed using a path-based similarity measure to capture the global structure of data. SELF combines the ideas of local LDA [11] and PCA so that it can integrate the information brought by both labeled and unlabeled data.

Although all of these methods have their own advantages in semisupervised learning, the essential strategy many of them use to exploit unlabeled data relies on Laplacian regularization. In this paper, we present a novel method named Semisupervised Tangent Space Discriminant Analysis (STSD) for semisupervised dimensionality reduction, which can reflect the discriminant information and a specific manifold structure from both labeled and unlabeled data. Instead of adopting a Laplacian-based regularizer, we develop a new regularization term which can discover the linearity of the local manifold structure of data. Specifically, by introducing tangent spaces we represent the local geometry at each data point as a linear function and make the change of such functions as smooth as possible. This means that STSD favors a linear function on the manifold. In addition, the objective function of STSD can be optimized analytically through solving a generalized eigenvalue problem.

#### 2. Preliminaries

Consider a data set consisting of $n$ examples and labels, $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^{D}$ denotes a $D$-dimensional example, $y_i \in \{1, \dots, C\}$ denotes the class label corresponding to $\mathbf{x}_i$, and $C$ is the total number of classes. LDA seeks a transformation $\mathbf{v} \in \mathbb{R}^{D}$ such that the between-class scatter is maximized and the within-class scatter is minimized [1]. The objective function of LDA can be written as

$$\max_{\mathbf{v}} \frac{\mathbf{v}^{\top} S_b \mathbf{v}}{\mathbf{v}^{\top} S_w \mathbf{v}}, \quad (1)$$

where $\top$ denotes the transpose of a matrix or a vector, $S_b$ is the between-class scatter matrix, and $S_w$ is the within-class scatter matrix. The definitions of $S_b$ and $S_w$ are

$$S_b = \sum_{c=1}^{C} n_c \left(\boldsymbol{\mu}_c - \boldsymbol{\mu}\right)\left(\boldsymbol{\mu}_c - \boldsymbol{\mu}\right)^{\top}, \quad (2)$$

$$S_w = \sum_{c=1}^{C} \sum_{i:\, y_i = c} \left(\mathbf{x}_i - \boldsymbol{\mu}_c\right)\left(\mathbf{x}_i - \boldsymbol{\mu}_c\right)^{\top}, \quad (3)$$

where $n_c$ is the number of examples from the $c$th class, $\boldsymbol{\mu}$ is the mean of all the examples, and $\boldsymbol{\mu}_c$ is the mean of the examples from class $c$.

Define the total scatter matrix as

$$S_t = \sum_{i=1}^{n} \left(\mathbf{x}_i - \boldsymbol{\mu}\right)\left(\mathbf{x}_i - \boldsymbol{\mu}\right)^{\top}. \quad (4)$$

It is well known that $S_t = S_b + S_w$ [1], and (1) is equivalent to

$$\max_{\mathbf{v}} \frac{\mathbf{v}^{\top} S_b \mathbf{v}}{\mathbf{v}^{\top} S_t \mathbf{v}}. \quad (5)$$

The solution of (5) can be readily obtained by solving a generalized eigenvalue problem: $S_b \mathbf{v} = \lambda S_t \mathbf{v}$. It should be noted that the rank of the between-class scatter matrix $S_b$ is at most $C - 1$, and thus we can obtain at most $C - 1$ meaningful eigenvectors with respect to nonzero eigenvalues. This implies that LDA can project data into a space whose dimensionality is at most $C - 1$.
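As a quick numerical illustration (using hypothetical toy data, not data from the paper), the scatter definitions, the identity $S_t = S_b + S_w$, and the rank argument can be checked directly:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
# Toy data: C = 3 classes, 10 examples each, in D = 4 dimensions
X = np.vstack([rng.normal(loc=c, size=(10, 4)) for c in range(3)])
y = np.repeat([0, 1, 2], 10)

mu = X.mean(axis=0)                      # overall mean
S_b = np.zeros((4, 4))                   # between-class scatter
S_w = np.zeros((4, 4))                   # within-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mu_c = Xc.mean(axis=0)
    d = (mu_c - mu)[:, None]
    S_b += len(Xc) * (d @ d.T)
    S_w += (Xc - mu_c).T @ (Xc - mu_c)

S_t = (X - mu).T @ (X - mu)              # total scatter
assert np.allclose(S_t, S_b + S_w)       # S_t = S_b + S_w

# LDA directions come from the generalized eigenproblem S_b v = lambda S_t v;
# rank(S_b) <= C - 1 = 2, so at most 2 nonzero generalized eigenvalues
evals, evecs = eigh(S_b, S_t)
print(int(np.sum(evals > 1e-8)))
```

Here `eigh` solves the symmetric generalized eigenproblem; the count of nonzero eigenvalues matches the rank bound discussed above.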

In practice, we usually impose a regularizer on (5) to obtain a more stable solution. Then the optimization problem becomes

$$\max_{\mathbf{v}} \frac{\mathbf{v}^{\top} S_b \mathbf{v}}{\mathbf{v}^{\top} S_t \mathbf{v} + \gamma R(\mathbf{v})}, \quad (6)$$

where $R(\mathbf{v})$ denotes the imposed regularizer and $\gamma$ is a trade-off parameter. When we use the Tikhonov regularizer, that is, $R(\mathbf{v}) = \mathbf{v}^{\top}\mathbf{v}$, the optimization problem is usually referred to as Regularized Discriminant Analysis (RDA) [12].

#### 3. Semisupervised Tangent Space Discriminant Analysis

As a supervised method, LDA has no ability to extract information from unlabeled data. Motivated by Tangent Space Intrinsic Manifold Regularization (TSIMR) [13], we develop a novel regularizer to capture the manifold structure of both labeled and unlabeled data. Utilizing this regularizer, the LDA model can be extended to a semisupervised one following the regularization framework. In the following, we first derive our novel regularizer for semisupervised learning and then present our Semisupervised Tangent Space Discriminant Analysis (STSD) algorithm as well as its kernel extension.

##### 3.1. The Regularizer for Semisupervised Dimensionality Reduction

TSIMR [13] is a regularization method for unsupervised dimensionality reduction, which is intrinsic to the data manifold and favors a linear function on the manifold. Inspired by TSIMR, we employ tangent spaces to represent the local geometry of data. Suppose that the data are sampled from an $m$-dimensional smooth manifold $\mathcal{M}$ in a $D$-dimensional space. Let $T_{\mathbf{z}}\mathcal{M}$ denote the tangent space attached to $\mathbf{z}$, where $\mathbf{z}$ is a fixed data point on the manifold $\mathcal{M}$. Using the first-order Taylor expansion at $\mathbf{z}$, any function $f$ defined on the manifold $\mathcal{M}$ can be expressed as

$$f(\mathbf{x}) = f(\mathbf{z}) + \mathbf{w}_{\mathbf{z}}^{\top} T_{\mathbf{z}}^{\top} \left(\mathbf{x} - \mathbf{z}\right) + O\!\left(\|\mathbf{x} - \mathbf{z}\|^{2}\right), \quad (7)$$

where $\mathbf{x}$ is a $D$-dimensional data point and $T_{\mathbf{z}}^{\top}(\mathbf{x} - \mathbf{z})$ is an $m$-dimensional tangent vector which gives the $m$-dimensional representation of $\mathbf{x}$ in $T_{\mathbf{z}}\mathcal{M}$. $T_{\mathbf{z}}$ is a $D \times m$ matrix formed by the orthonormal bases of $T_{\mathbf{z}}\mathcal{M}$, which can be estimated through local PCA, that is, performing standard PCA on the neighborhood of $\mathbf{z}$. $\mathbf{w}_{\mathbf{z}}$ is an $m$-dimensional vector representing the directional derivative of $f$ at $\mathbf{z}$ with respect to $T_{\mathbf{z}}$ on the manifold $\mathcal{M}$.
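The local PCA estimate of the tangent-space basis can be sketched as follows; the function name, toy data, and parameter choices are ours and serve only as an illustration:

```python
import numpy as np

def tangent_basis(X, i, k, m):
    """Estimate an orthonormal basis (D x m) of the tangent space at X[i]
    by performing PCA on its k nearest neighbors (local PCA)."""
    d = np.linalg.norm(X - X[i], axis=1)
    nbr = np.argsort(d)[1:k + 1]          # k nearest neighbors (exclude self)
    Z = X[nbr] - X[nbr].mean(axis=0)      # center the neighborhood
    # the top-m right singular vectors span the local principal subspace
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:m].T                       # D x m, orthonormal columns

# Points near a 1-dimensional manifold (a line) embedded in 3-D space
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0, 1, 50))
X = np.outer(t, [1.0, 2.0, -1.0]) + 1e-3 * rng.normal(size=(50, 3))

T = tangent_basis(X, 25, k=8, m=1)
assert np.allclose(T.T @ T, np.eye(1))    # orthonormality: T^T T = I
u = np.array([1.0, 2.0, -1.0]); u /= np.linalg.norm(u)
print(abs(float(T[:, 0] @ u)))            # alignment with the true direction
```

For data lying close to a line, the estimated basis direction aligns (up to sign) with the line's direction, as expected from the local linearity assumption.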

Consider a transformation $\mathbf{v} \in \mathbb{R}^{D}$ which can map the $D$-dimensional data to a one-dimensional embedding. Then the embedding of $\mathbf{x}$ can be expressed as $f(\mathbf{x}) = \mathbf{v}^{\top}\mathbf{x}$. If there are two data points $\mathbf{x}_i$ and $\mathbf{x}_j$ that have a small Euclidean distance, by using the first-order Taylor expansion at $\mathbf{x}_i$ and $\mathbf{x}_j$, the embeddings $f(\mathbf{x}_i)$ and $f(\mathbf{x}_j)$ can be represented as

$$f(\mathbf{x}_j) = f(\mathbf{x}_i) + \mathbf{w}_i^{\top} T_i^{\top} \left(\mathbf{x}_j - \mathbf{x}_i\right) + O\!\left(\|\mathbf{x}_j - \mathbf{x}_i\|^{2}\right), \quad (8)$$

$$f(\mathbf{x}_i) = f(\mathbf{x}_j) + \mathbf{w}_j^{\top} T_j^{\top} \left(\mathbf{x}_i - \mathbf{x}_j\right) + O\!\left(\|\mathbf{x}_i - \mathbf{x}_j\|^{2}\right), \quad (9)$$

where $T_i$ and $\mathbf{w}_i$ are shorthand for $T_{\mathbf{x}_i}$ and $\mathbf{w}_{\mathbf{x}_i}$. Suppose that the data can be well characterized by a linear function on the underlying manifold $\mathcal{M}$. Then the remainders in (8) and (9) can be omitted.

Substituting $f(\mathbf{x}) = \mathbf{v}^{\top}\mathbf{x}$ into (8), we have

$$\mathbf{v}^{\top}\mathbf{x}_j = \mathbf{v}^{\top}\mathbf{x}_i + \mathbf{w}_i^{\top} T_i^{\top} \left(\mathbf{x}_j - \mathbf{x}_i\right). \quad (10)$$

Furthermore, by substituting (9) into (8), we obtain

$$\left(T_i \mathbf{w}_i - T_j \mathbf{w}_j\right)^{\top} \left(\mathbf{x}_j - \mathbf{x}_i\right) = 0, \quad (11)$$

which, since it should hold for neighbors lying along arbitrary directions, naturally leads to

$$T_i \mathbf{w}_i = T_j \mathbf{w}_j. \quad (12)$$

Since $T_i$ is formed by the orthonormal bases of $T_{\mathbf{x}_i}\mathcal{M}$, it satisfies $T_i^{\top} T_i = I$ for all $i$, where $I$ is an $m \times m$ identity matrix. We can multiply both sides of (12) with $T_i^{\top}$; then (12) becomes

$$\mathbf{w}_i = T_i^{\top} T_j \mathbf{w}_j. \quad (13)$$

Armed with the above results, we can formulate our regularizer for semisupervised dimensionality reduction. Consider $n$ data points $\mathbf{x}_1, \dots, \mathbf{x}_n$ sampled from a function $f$ along the manifold $\mathcal{M}$. Since every example and its neighbors should satisfy (10) and (13), it is reasonable to formulate a regularizer as follows:

$$R(\mathbf{v}, \mathbf{w}) = \sum_{i=1}^{n} \sum_{j \in \mathcal{N}(i)} \left[ \left(\mathbf{v}^{\top}\mathbf{x}_j - \mathbf{v}^{\top}\mathbf{x}_i - \mathbf{w}_i^{\top} T_i^{\top} \left(\mathbf{x}_j - \mathbf{x}_i\right)\right)^{2} + \lambda_1 \left\|\mathbf{w}_j - T_j^{\top} T_i \mathbf{w}_i\right\|^{2} \right], \quad (14)$$

where $\mathbf{w} = [\mathbf{w}_1^{\top}, \dots, \mathbf{w}_n^{\top}]^{\top}$, $\mathcal{N}(i)$ denotes the set of nearest neighbors of $\mathbf{x}_i$, and $\lambda_1$ is a trade-off parameter to control the influences of (10) and (13).

Relating data with a discrete weighted graph is a popular choice, and there is indeed a large family of graph-based statistical and machine learning methods. It also makes sense for us to generalize the regularizer in (14) using a symmetric weight matrix $W$ constructed from the data $\mathbf{x}_1, \dots, \mathbf{x}_n$. There are several ways to construct $W$. One typical way is to build an adjacency graph by connecting each data point to its $k$ nearest neighbors with an edge and then weight every edge of the graph by a certain measure. Generally, if two data points $\mathbf{x}_i$ and $\mathbf{x}_j$ are "close," the corresponding weight $W_{ij}$ is large, whereas if they are "far away," then $W_{ij}$ is small. For example, the heat kernel function is widely used to construct a weight matrix. The weight is computed by

$$W_{ij} = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^{2}}{2\sigma^{2}}\right) \quad (15)$$

if there is an edge connecting $\mathbf{x}_i$ with $\mathbf{x}_j$, and $W_{ij} = 0$ otherwise, where $\sigma$ is a bandwidth parameter.
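The graph construction described above can be sketched in a few lines; the function name and parameter choices are ours, intended only as an illustration of the $k$-nearest-neighbor graph with heat kernel weighting:

```python
import numpy as np

def heat_kernel_weights(X, k, sigma):
    """Symmetric kNN adjacency graph weighted by the heat kernel
    W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    n = len(X)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbr = np.argsort(D2[i])[1:k + 1]                 # k nearest neighbors
        W[i, nbr] = np.exp(-D2[i, nbr] / (2 * sigma ** 2))
    return np.maximum(W, W.T)   # symmetrize: keep an edge if either side has it

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
W = heat_kernel_weights(X, k=4, sigma=1.0)
assert np.allclose(W, W.T)            # symmetric weight matrix
assert np.all(np.diag(W) == 0)        # no self-loops
assert np.all((W >= 0) & (W <= 1))    # heat kernel weights lie in [0, 1]
```

Symmetrizing with the elementwise maximum is one common convention; averaging the two directed weights is an equally valid alternative.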

Therefore, the generalization of the proposed regularizer turns out to be

$$R(\mathbf{v}, \mathbf{w}) = \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \left[ \left(\mathbf{v}^{\top}\mathbf{x}_j - \mathbf{v}^{\top}\mathbf{x}_i - \mathbf{w}_i^{\top} T_i^{\top} \left(\mathbf{x}_j - \mathbf{x}_i\right)\right)^{2} + \lambda_1 \left\|\mathbf{w}_j - T_j^{\top} T_i \mathbf{w}_i\right\|^{2} \right], \quad (16)$$

where $W$ is an $n \times n$ symmetric weight matrix reflecting the similarity of the data points. It is clear that when the variation of the first-order Taylor expansion at every data point is smooth, the value of $R(\mathbf{v}, \mathbf{w})$, which measures the linearity of the function $f$ along the manifold $\mathcal{M}$, will be small.

The regularizer (16) can be reformulated as a canonical matrix quadratic form as follows:

$$R(\mathbf{v}, \mathbf{w}) = \begin{bmatrix} \mathbf{v} \\ \mathbf{w} \end{bmatrix}^{\top} S \begin{bmatrix} \mathbf{v} \\ \mathbf{w} \end{bmatrix}, \quad (17)$$

where $X = [\mathbf{x}_1, \dots, \mathbf{x}_n]$ is the data matrix and $S$ is a positive semidefinite matrix constructed by four blocks, that is, $S_{11}$, $S_{12}$, $S_{21}$, and $S_{22}$. This formulation will be very useful in developing our algorithm. Recall that the dimensionality of the directional derivative $\mathbf{w}_i$ ($i = 1, \dots, n$) is $m$. Thereby the size of $S$ is $(D + mn) \times (D + mn)$. For simplicity, we omit the detailed derivation of $S$.
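Since the detailed derivation is omitted, the sketch below assembles such a matrix $S$ edge by edge, assuming the regularizer takes the two-term form of (16): each first term is a squared linear form $(\mathbf{a}_{ij}^{\top}\mathbf{z})^2$ and each second term is $\|B_{ij}\mathbf{z}\|^2$ in the stacked variable $\mathbf{z} = [\mathbf{v}; \mathbf{w}_1; \dots; \mathbf{w}_n]$, so $S = \sum_{ij} W_{ij}(\mathbf{a}_{ij}\mathbf{a}_{ij}^{\top} + \lambda_1 B_{ij}^{\top}B_{ij})$. All data here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
n, D, m, lam1 = 6, 4, 2, 0.5
X = rng.normal(size=(n, D))                  # rows are the data points x_i
W = rng.uniform(size=(n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
# random orthonormal tangent bases T_i (D x m) via QR
T = [np.linalg.qr(rng.normal(size=(D, m)))[0] for _ in range(n)]

dim = D + n * m                              # z = [v; w_1; ...; w_n]
def wslice(i):                               # index range of w_i inside z
    return slice(D + i * m, D + (i + 1) * m)

S = np.zeros((dim, dim))
for i in range(n):
    for j in range(n):
        if W[i, j] == 0:
            continue
        dx = X[j] - X[i]
        a = np.zeros(dim)                    # first term of (16) as a^T z
        a[:D] = dx
        a[wslice(i)] = -T[i].T @ dx
        B = np.zeros((m, dim))               # second term of (16) as ||B z||^2
        B[:, wslice(j)] = np.eye(m)
        B[:, wslice(i)] = -T[j].T @ T[i]
        S += W[i, j] * (np.outer(a, a) + lam1 * B.T @ B)

# check: z^T S z equals the regularizer computed term by term
z = rng.normal(size=dim)
v, ws = z[:D], [z[wslice(i)] for i in range(n)]
R = sum(W[i, j] * ((v @ (X[j] - X[i]) - ws[i] @ T[i].T @ (X[j] - X[i])) ** 2
                   + lam1 * np.sum((ws[j] - T[j].T @ T[i] @ ws[i]) ** 2))
        for i in range(n) for j in range(n))
assert np.allclose(z @ S @ z, R)
assert np.all(np.linalg.eigvalsh(S) > -1e-8)   # S is positive semidefinite
```

The final assertions verify the quadratic-form identity and the positive semidefiniteness claimed for $S$.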

It should be noted that, besides the rationale inherited from TSIMR, the regularizer (16) can be explained from another perspective. Recently, Lin et al. [14] proposed a regularization method called Parallel Field Regularization (PFR) for semisupervised regression. In spite of the different learning scenarios, PFR shares the same spirit as TSIMR in essence. Moreover, when the bases of the tangent space at each data point are orthonormal, PFR can be converted to TSIMR. This provides a more theoretical, albeit more complex, explanation of our regularizer from the vector field perspective.

##### 3.2. An Algorithm

With the regularizer developed in Section 3.1, we can present our STSD algorithm. Suppose the training data include $l$ labeled examples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$ belonging to $C$ classes and $u$ unlabeled examples $\{\mathbf{x}_i\}_{i=l+1}^{l+u}$, where $\mathbf{x}_i \in \mathbb{R}^{D}$ is a $D$-dimensional example and $y_i \in \{1, \dots, C\}$ is the class label associated with the example $\mathbf{x}_i$. Define $n = l + u$, and let

$$\tilde{S}_b = \begin{bmatrix} S_b & 0 \\ 0 & 0 \end{bmatrix}, \qquad \tilde{S}_t = \begin{bmatrix} S_t & 0 \\ 0 & 0 \end{bmatrix}$$

be two $(D + mn) \times (D + mn)$ augmented matrices extended from the between-class scatter matrix $S_b$ and the total scatter matrix $S_t$. Note that in the semisupervised learning scenario discussed in this section, the mean $\boldsymbol{\mu}$ of all the samples in (2) and (4) should be the center of both the labeled and unlabeled examples; that is, $\boldsymbol{\mu} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i$. The objective function of STSD can be written as follows:

$$\max_{\mathbf{v}, \mathbf{w}} \frac{\begin{bmatrix} \mathbf{v} \\ \mathbf{w} \end{bmatrix}^{\top} \tilde{S}_b \begin{bmatrix} \mathbf{v} \\ \mathbf{w} \end{bmatrix}}{\begin{bmatrix} \mathbf{v} \\ \mathbf{w} \end{bmatrix}^{\top} \left(\tilde{S}_t + \gamma S\right) \begin{bmatrix} \mathbf{v} \\ \mathbf{w} \end{bmatrix}}, \quad (18)$$

where $\gamma$ is a trade-off parameter. It is clear that $[\mathbf{v}^{\top}\ \mathbf{w}^{\top}]\, \tilde{S}_b\, [\mathbf{v}^{\top}\ \mathbf{w}^{\top}]^{\top} = \mathbf{v}^{\top} S_b \mathbf{v}$ and $[\mathbf{v}^{\top}\ \mathbf{w}^{\top}]\, \tilde{S}_t\, [\mathbf{v}^{\top}\ \mathbf{w}^{\top}]^{\top} = \mathbf{v}^{\top} S_t \mathbf{v}$. Therefore, STSD seeks an optimal $\mathbf{v}$ such that the between-class scatter is maximized, and the total scatter as well as the regularizer defined in (17) is minimized at the same time.
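The zero-padding trick behind the augmented matrices is easy to verify numerically: only the $\mathbf{v}$-part of the stacked variable enters the scatter terms. The helper name and the random stand-in scatter matrix below are ours, for illustration only.

```python
import numpy as np

def augment(S_small, D, n, m):
    """Pad a D x D scatter matrix with zeros to size (D + n*m) so that
    [v; w]^T S_aug [v; w] = v^T S_small v for any stacked vector [v; w]."""
    dim = D + n * m
    S_aug = np.zeros((dim, dim))
    S_aug[:D, :D] = S_small
    return S_aug

D, n, m = 3, 5, 2
rng = np.random.default_rng(4)
A = rng.normal(size=(D, D))
S_b = A @ A.T                                     # stand-in scatter matrix
Sb_aug = augment(S_b, D, n, m)

z = rng.normal(size=D + n * m)                    # z = [v; w]
v = z[:D]
assert np.isclose(z @ Sb_aug @ z, v @ S_b @ v)    # only v enters the form
assert Sb_aug.shape == (D + n * m, D + n * m)
```

This is why maximizing the augmented quadratic form in (18) still maximizes the between-class scatter of the projection $\mathbf{v}$ alone.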

The optimization of the objective function (18) can be achieved by solving a generalized eigenvalue problem:

$$\tilde{S}_b \boldsymbol{\alpha} = \lambda \left(\tilde{S}_t + \gamma S\right) \boldsymbol{\alpha}, \quad (19)$$

where $\boldsymbol{\alpha} = [\mathbf{v}^{\top}\ \mathbf{w}^{\top}]^{\top}$, whose solution can be easily given by the eigenvector with respect to the maximal eigenvalue. Note that since the mean $\boldsymbol{\mu}$ is the center of both labeled and unlabeled examples, the rank of $\tilde{S}_b$ is $C$. It implies that there are at most $C$ eigenvectors with respect to the nonzero eigenvalues. Therefore, given the optimal eigenvectors $\boldsymbol{\alpha}_1, \dots, \boldsymbol{\alpha}_d$ ($d \leq C$), we can form a transformation matrix $V = [\mathbf{v}_1, \dots, \mathbf{v}_d]$ sized $D \times d$, where $\mathbf{v}_k$ consists of the first $D$ elements of $\boldsymbol{\alpha}_k$, and then the $d$-dimensional embedding of an example $\mathbf{x}$ can be computed through $V^{\top}\mathbf{x}$.

In many applications, especially when the dimensionality of data is high while the data size is small, the matrix $\tilde{S}_t + \gamma S$ in (19) may be singular. This singularity problem may lead to an unstable solution and deteriorate the performance of STSD. Fortunately, there are many approaches to deal with the singularity problem. In this paper, we use Tikhonov regularization because of its simplicity and wide applicability. Finally, the generalized eigenvalue problem (19) turns out to be

$$\tilde{S}_b \boldsymbol{\alpha} = \lambda \left(\tilde{S}_t + \gamma S + \delta I\right) \boldsymbol{\alpha}, \quad (20)$$

where $I$ is the identity matrix and $\delta > 0$ is a regularization parameter. Algorithm 1 gives the pseudocode for STSD.
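The Tikhonov-regularized generalized eigenvalue problem can be sketched as follows. All matrices here are random stand-ins with the right shapes and symmetry, not the actual STSD quantities; the point is the solution procedure: adding $\delta I$ makes the right-hand-side matrix positive definite, and the transformation matrix is read off the top eigenvectors.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
D, n, m, C = 4, 8, 2, 3
dim = D + n * m                                   # size of the stacked variable

A = rng.normal(size=(D, C))
Sb = np.zeros((dim, dim)); Sb[:D, :D] = A @ A.T   # stand-in, rank C
St = np.zeros((dim, dim)); St[:D, :D] = np.eye(D) # stand-in total scatter
M = rng.normal(size=(dim, dim)); S = M @ M.T      # stand-in PSD regularizer

gamma, delta = 0.1, 1e-3
rhs = St + gamma * S + delta * np.eye(dim)        # Tikhonov term makes this PD
evals, evecs = eigh(Sb, rhs)                      # generalized eigenproblem
order = np.argsort(evals)[::-1]                   # largest eigenvalues first

d = C                                             # at most C useful directions
V = evecs[:, order[:d]][:D, :]                    # keep only the v-part (D x d)
Z = rng.normal(size=(10, D)) @ V                  # d-dimensional embeddings
print(Z.shape)
```

Note that only the first `D` rows of each eigenvector (the $\mathbf{v}$-part) are kept for the transformation matrix, and that the count of nonzero generalized eigenvalues equals the rank of the left-hand-side matrix.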