Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 917259, 13 pages

http://dx.doi.org/10.1155/2015/917259

## Local and Global Geometric Structure Preserving and Application to Hyperspectral Image Classification

^{1}Department of Computer and Information Science, University of Macau, Avenida Padre Tomas Pereira, Taipa 1356, Macau

^{2}Department of Mathematics and Computer Science, Guangxi Normal University of Nationalities, Chongzuo 532200, China

Received 15 December 2014; Revised 16 March 2015; Accepted 16 March 2015

Academic Editor: Hakim Naceur

Copyright © 2015 Huiwu Luo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Locality Preserving Projection (LPP) has shown great efficiency in feature extraction. LPP captures
the locality through *K*-nearest neighborhoods. However, recent progress has demonstrated the importance
of global geometric structure in discriminant analysis. Thus, both the local and the global geometric
structure are critical for dimension reduction. In this paper, a novel linear supervised dimensionality
reduction algorithm, called *Locality and Global Geometric Structure Preserving* (LGGSP)
projection, is proposed. LGGSP encodes not only the local structure information
into the objective functions, but also the global structure information. Specifically,
two adjacency matrices, that is, a similarity matrix and a variance matrix, are constructed to detect the local
intrinsic structure. In addition, a margin matrix is defined to capture the global structure of different
classes. Finally, the three matrices are integrated into the graph embedding framework to obtain the optimal
solution. The proposed scheme is illustrated using both simulated data points and the well-known
Indian Pines hyperspectral data set, and the experimental results are promising.

#### 1. Introduction

Hyperspectral image (HSI) processing, as a typical application of high dimensional data analysis, has attracted great interest among researchers worldwide [1]. The acquisition of hyperspectral images is usually concerned with the analysis, measurement, understanding, and interpretation of a given scene observed at different distances by a satellite [2]. Different HSI data sets pose different levels of challenge to the task of data analysis. However, a common issue of HSI data is the high dimensional feature space combined with a relatively small sample size [3], which is also known as the “Hughes phenomenon.” To increase efficiency, the dimensionality of HSI data must be reduced before further processing. Dimension reduction therefore plays a significant role in the HSI community [4].

Popular dimension reduction methods can be roughly categorized into two groups. The first group consists of nonlinear approaches, for example, Isomap embedding (Isomap) [5], local tangent space alignment (LTSA) [6], Laplacian eigenmaps (LE) [7], local linear embedding (LLE) [8], and so forth. The second group consists of linear approaches, for example, principal component analysis (PCA), linear discriminant analysis (LDA), random projection (RP) [9], Locality Preserving Projection (LPP) [10], and so forth. Furthermore, some of these methods are supervised, and others are unsupervised.

Recently, some articles [11] pointed out that high dimensional data may lie on a submanifold that reflects the inherent geodesic structure. Under this circumstance, both PCA and LDA may fail to find the hidden manifold, whereas nonlinear methods, such as LLE and Isomap, have been proposed and developed to tackle this difficulty. However, the mapping of LLE and Isomap is implicit, and there is no explicit computational expression for new data points. That is, the projected data points of LLE and Isomap are defined only on the training data points, and neither method can directly embed a new data point in the projected space. Moreover, these methods are computationally expensive. These drawbacks make the algorithms hard to develop further and limit their wide application in various areas, especially in the hyperspectral image analysis community.

To address this issue, He and Niyogi [10] proposed the Locality Preserving Projection (LPP), which is a linear approximation of the intrinsic manifold, to reduce high dimensional facial feature vectors to a low dimensional subspace. The neighborhood relationship in LPP is preserved in the projected submanifold. However, LPP is an unsupervised algorithm, so the discriminant information is ignored. Wong and Zhao [11] proposed a supervised version of LPP, where discriminant information of different classes is adopted to improve the classification performance. Vasuhi and Vaidehi [12] found that the basis of LPP in the projected space is not orthogonal. They applied an orthogonal basis to facial classification and found that its classification accuracy was better than that of conventional LPP. A common theme of many discriminant analysis based methods is this: by minimizing the neighborhood distance within the same class, the locality based approaches utilize discriminant information to maximize the distance among data points from different classes while simultaneously preserving the intraclass compactness. The distance between adjacent data points represents the local geometrical structure of the same class, whereas the distance between data points from different classes indicates the global geometrical structure of different classes. By doing so, the structure of data points in the projected space is expected to be similar to that in the original space.

Despite this, some articles, such as *Local and Global Structures Preserving Projection* (LGSPP) [13] and *Joint Global and Local Structure Discriminant Analysis* (JGLDA) [14], reported that, besides locality, the global structure is also important. The locality can generally be captured by a Laplacian matrix that comes from the neighborhood relationship, that is, the adjacency graph. Moreover, the global geometric structure can also be captured by a relationship matrix, for example, a penalty matrix [15], a $k$-farthest neighborhood adjacency matrix [16], or a $k$-nearest neighborhood adjacency matrix [17]. However, these methods only capture the similarity structure of data points to learn the intrinsic geometric structure (local structure). They ignore the distribution of data points, and the structure of data points in the embedded space is destroyed. Consequently, this leads to an incorrect description of the data structure. In most instances, locality alone is insufficient for describing the intrinsic geometry of data points. Thus, the description will be more discriminative if both local and global statistical properties are integrated to describe the geometry of data points.

Motivated by these factors, in this paper, we propose a novel approach, that is, *Locality and Global Geometric Structure Preserving* (LGGSP) projection, that makes use of not only the local structure, but also the global structure of data points, to reduce the dimensionality of feature vectors. Specifically, we focus on the global distribution of data points, where the local structure is characterized by the similarity and the diversity of samples from the same class, respectively. In addition, the global structure is characterized by the margin between samples of different classes. To achieve the goal of discovering both local and global structures hidden in data points, we first define three optimization functions. Then we solve them in the framework of graph embedding to make the LGGSP algorithm supervised. Finally, a linear transformation is found by utilizing the principle of discriminant analysis.

The rest of this paper is organized as follows. Section 2 provides a brief analysis of basic discriminant techniques. The proposed LGGSP is presented in Section 3. Results on synthetic data sets and real hyperspectral image data are presented in Section 4. Finally, concluding remarks and discussion are given in Section 5.

#### 2. Related Works

Before further discussion, some of the notations that will be used throughout this paper are listed in Notations section.

A brief review of discriminant analysis techniques, for example, Locality Preserving Projection (LPP) [10] and discriminant analysis [13, 14], is provided in this section. To facilitate the following discussion, we start with a supervised learning problem. Suppose that the $m$-dimensional data set $X = [x_1, x_2, \ldots, x_n]$, $x_i \in \mathbb{R}^m$, is distributed on a $d$-dimensional submanifold ($d \ll m$). This data set belongs to $c$ classes with class labels $\{1, 2, \ldots, c\}$, respectively. Let $n_k$ be the number of samples of class $k$; then $\sum_{k=1}^{c} n_k = n$. We are expected to find a transformation $W \in \mathbb{R}^{m \times d}$ that projects the $m$-dimensional data points to $d$-dimensional data points, $y_i = W^T x_i$, with the goal of preserving the data structure without losing any information needed. The notation $(\cdot)^T$ represents the transpose of a matrix or a vector. Thereby, the problem at hand is how to evaluate the data model and formulate the objective transformation $W$.

##### 2.1. Locality Preserving Projection

LDA aims to learn a global structure that separates samples efficiently. Nevertheless, for most real world applications, the local structure of a neighborhood is also important. Locality Preserving Projection (LPP) is a graph based subspace learning algorithm, in which the neighborhood structure is preserved in the projected space. To achieve this goal, a weighted graph $G = \{\mathcal{V}, \mathcal{E}, S\}$ is constructed, where $\mathcal{V}$ represents the vertex set, $\mathcal{E}$ denotes the edges of connected data points, and $S$ is the similarity weight that characterizes the likelihood of pairwise data points.

For a data point $x_i$, LPP defines a linear transformation $w$ into the mapping space; that is, $y_i = w^T x_i$. Then the criterion function of LPP becomes

$$\min_{w} \sum_{i,j} (y_i - y_j)^2 S_{ij}, \tag{1}$$

where $S_{ij}$ is the similarity between the two data points $x_i$ and $x_j$. If two neighboring data points $x_i$ and $x_j$ are mapped far apart, then $S_{ij}$ incurs a heavy penalty. This property ensures that adjacent data points stay as close as possible in the embedded space.

By simple algebra, it can be deduced from (1) that

$$\frac{1}{2} \sum_{i,j} (y_i - y_j)^2 S_{ij} = w^T X (D - S) X^T w = w^T X L X^T w, \tag{2}$$

where $D$ is a diagonal matrix whose entries are the column (or row) sums of $S$; that is, $D_{ii} = \sum_j S_{ij}$. $L = D - S$ represents the Laplacian matrix, which is the discrete approximation of the Laplace-Beltrami operator on a compact Riemannian manifold [11]. Naturally, the matrix $D$ provides a measure on the data points. The importance of $y_i$ is related to the value of $D_{ii}$. To obtain a uniform measurement and remove the arbitrary scaling factor in the embedding, LPP imposes an additional constraint:

$$w^T X D X^T w = 1. \tag{3}$$

This constraint is joined to the objective function. Finally the minimization problem is reduced to

$$\min_{w^T X D X^T w = 1} w^T X L X^T w. \tag{4}$$

The solution can be obtained by solving a generalized eigenvalue problem:

$$X L X^T w = \lambda X D X^T w. \tag{5}$$

Let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_d$ be the $d$ smallest eigenvalues of (5) in ascending order, and $w_1, w_2, \ldots, w_d$ the corresponding eigenvectors. Then the solution of (4) is given by

$$W = [w_1, w_2, \ldots, w_d]. \tag{6}$$

For a new test instance $x$, the corresponding point in the embedding space is given by

$$y = W^T x. \tag{7}$$
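The LPP procedure reviewed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the experiments: the function name `lpp`, the symmetric $k$-nearest-neighbor graph, the heat-kernel weights with parameter `t`, and the small ridge term `reg` are all assumptions made for the example. The generalized eigenvalue problem is reduced to an ordinary symmetric one by Cholesky whitening.

```python
import numpy as np

def lpp(X, n_neighbors=5, t=1.0, n_components=2, reg=1e-9):
    """Minimal LPP sketch. X has shape (m, n): one column per sample."""
    m, n = X.shape
    # Pairwise squared Euclidean distances between the columns of X.
    sq = np.sum(X**2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)      # a point is not its own neighbor
    # Symmetric k-nearest-neighbor graph with heat-kernel weights (assumed).
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(masked[i])[:n_neighbors]
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / t)
    S = np.maximum(S, S.T)
    D = np.diag(S.sum(axis=1))            # degree matrix
    L = D - S                             # graph Laplacian
    A = X @ L @ X.T
    B = X @ D @ X.T + reg * np.eye(m)     # small ridge keeps B positive definite
    # Reduce A w = lambda B w to a symmetric problem: B = C C^T, u = C^T w.
    C = np.linalg.cholesky(B)
    Cinv = np.linalg.inv(C)
    evals, U = np.linalg.eigh(Cinv @ A @ Cinv.T)   # eigenvalues in ascending order
    W = Cinv.T @ U[:, :n_components]      # eigenvectors of the smallest eigenvalues
    return W, evals[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 40))              # 40 synthetic samples in 5 dimensions
W, evals = lpp(X, n_neighbors=4, n_components=2)
Y = W.T @ X                               # embedded points, one column per sample
```

Because the eigenvectors are mapped back through the Cholesky factor, the returned directions satisfy the normalization constraint up to the ridge term.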

LPP can find a projection that preserves the local data structure. However, due to its unsupervised nature, data points that are close to class boundaries may be placed close together in the projected space even though they belong to different classes. Besides, LPP only makes use of nearest neighborhoods, and the global geometric structure is fully ignored in the calculation procedure. This drawback makes the algorithm apt to overfit the training samples. From the above analysis, we can see that LPP is sensitive to noisy or defective samples. For this reason, LPP has inherent deficiencies in learning ability and robustness.

##### 2.2. Laplacian Discriminant Analysis

As an extension of discriminant analysis, the efficiency of Laplacian linear discriminant analysis (LapLDA) has been demonstrated by many studies [18]. The common behavior of these approaches is that an adjacency graph is employed to model the geometrical structure of the intrinsic manifold [19]. There are two popular approaches to construct the adjacency matrix: the first adopts the $k$-nearest neighborhood (the $k$NN approach), and the other places an edge between two data points within a controllable Euclidean distance $\epsilon$ (the $\epsilon$-neighborhood approach). LapLDA depicts the locality by the following quadratic function:

$$\min \sum_{i,j} \| y_i - y_j \|^2 A_{ij}, \tag{8}$$

where $A_{ij}$ denotes the “weights” of connected points with the following definition:

$$A_{ij} = \begin{cases} \exp\left(-\| x_i - x_j \|^2 / t\right), & x_i \in N(x_j) \text{ or } x_j \in N(x_i), \\ 0, & \text{otherwise}. \end{cases} \tag{9}$$

The notation $N(x_i)$ in (9) denotes the neighbors of $x_i$, and $t$ is a kernel width parameter. By this definition, the smaller the distance between two connected neighbors, the bigger the “weight” they receive, and the closer they should be kept in the mapped space. Nevertheless, (8) also forces connected data points that are far apart to be closer in the low dimensional subspace, which may distort the structure between connected data pairs.
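The two ways of building the adjacency matrix can be compared on a toy example. The following is a minimal sketch under stated assumptions: the helper names `knn_adjacency`, `eps_adjacency`, and `heat_kernel_weights` are hypothetical, samples are stored as columns, and the heat-kernel form of the weights is assumed for illustration.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between the columns of X (m features x n samples)."""
    sq = np.sum(X**2, axis=0)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)

def knn_adjacency(X, k):
    """Edge between i and j if either is among the other's k nearest neighbors."""
    d2 = pairwise_sq_dists(X)
    np.fill_diagonal(d2, np.inf)          # exclude self-neighbors
    n = d2.shape[0]
    A = np.zeros((n, n), dtype=bool)
    idx = np.argsort(d2, axis=1)[:, :k]
    A[np.repeat(np.arange(n), k), idx.ravel()] = True
    return A | A.T

def eps_adjacency(X, eps):
    """Edge between i and j if their Euclidean distance is at most eps."""
    A = pairwise_sq_dists(X) <= eps**2
    np.fill_diagonal(A, False)
    return A

def heat_kernel_weights(X, A, t=1.0):
    """Heat-kernel weights on the edges of A, zero elsewhere (assumed form)."""
    return np.where(A, np.exp(-pairwise_sq_dists(X) / t), 0.0)

X = np.array([[0.0, 1.0, 3.0, 10.0]])     # four 1-D samples stored as columns
A_knn = knn_adjacency(X, k=1)
A_eps = eps_adjacency(X, eps=2.5)
Wt = heat_kernel_weights(X, A_knn, t=1.0)
```

In this toy case the distant sample at 10 is still linked to its nearest neighbor in the $k$NN graph, whereas it is isolated in the $\epsilon$-neighborhood graph, and closer pairs receive larger weights.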

To cope with this issue, some researchers proposed an approach that integrates both global and local structure into the objective function [14]. In order to construct a reasonable adjacency matrix, the global structure of the data points can be represented by a penalty matrix (10), which is defined over the $k$-farthest neighborhood of each point $x_i$ and the squared Euclidean distance between two points $x_i$ and $x_j$.

#### 3. Proposed Methodology

The structure of HSI data is very complex; hence it is insufficient to represent HSI data using only global or only local properties. To model the complex HSI data, a novel approach, which preserves both the local and global geometric structure of data samples, is proposed in this section. The new approach is called *Locality and Global Geometric Structure Preserving* (LGGSP) projection. Detailed motivation and formulation are given below.

##### 3.1. Capturing the Local Structure of Intraclass Samples

Inspired by [11, 14], the local structure in LGGSP is described by two adjacency matrices, that is, the similarity matrix and the diversity matrix. To model the local structure, two adjacency graphs, that is, $G_v = \{X, V\}$ and $G_s = \{X, S\}$, are adopted to model the diversity and similarity over the training data samples from the same class, where $X$ is the whole set of training samples, $V$ is the diversity matrix, and $S$ is the similarity matrix, respectively. $V$ reflects the variance of nearby data points, and $S$ characterizes the similarity among nearby data points.

To make samples more separable, we define the similarity matrix $S$ in (11) in terms of the class prior probability of each class, a slack parameter, and the Frobenius norm $\| \cdot \|$.

Statistically, if two samples $x_i$ and $x_j$ are very close, that is, $\| x_i - x_j \|$ is small, then the similarity $S_{ij}$ should be large in the embedding space. In contrast, if $\| x_i - x_j \|$ is large, which implies that the two samples are dissimilar, the corresponding similarity will be small. Note that, in (11), the class prior is imposed to ensure that the samples have the same class prior probability in the embedded space.

On the other hand, to measure the distribution of nearby data points, diversity is introduced. Different from the similarity matrix, the diversity between two connected samples with a large distance will be large, whereas the diversity of two connected samples with a small distance should be small. This property reflects the small diversity of two adjacent points from the same class. Thus, the diversity matrix $V$ is defined in (12), which involves two free tuning parameters.

Now consider the problem of mapping the original HSI data to a line so that the connected points from the same class can be preserved. Let $y_i$ be the mapped point of $x_i$. A reasonable criterion for selecting a “good” mapping is to optimize the following two objective functions:

$$\min \sum_{i,j} (y_i - y_j)^2 S_{ij}, \tag{13}$$

$$\max \sum_{i,j} (y_i - y_j)^2 V_{ij}, \tag{14}$$

where $S_{ij}$ and $V_{ij}$ are the entries of the similarity matrix and the diversity matrix, respectively. Note that (13) incurs a heavy penalty on the within-class graph if two adjacent points $x_i$ and $x_j$, which are close to each other and belong to the same class, are mapped far apart. Similarly, (14) incurs a heavy penalty on the within-class graph if two neighboring points $x_i$ and $x_j$ that share the same label are mapped so close that they nearly collapse to a single point. Hence, minimizing (13) ensures that neighboring points which have the same label are also close in the embedding space. Simultaneously, maximizing (14) helps prevent overfitting, and the variation is also preserved in the projected space. The limitation of (13) is that it may force connected points with a large distance to be very close to each other in the reduced space, which violates the preservation of topological structure. The constraint of (14) may alleviate this situation.

By integrating (13) and (14), the structural topology can be approximately preserved in the embedding space. That is, connected data points that are far apart tend to remain far apart, while samples with a small distance are kept close in the embedded space. Thus, the local structure can be preserved under the objective functions (13) and (14).

##### 3.2. Capturing the Global Structure of Interclass Samples

To capture the global structure from different classes, an adjacency graph $G_m = \{X, M\}$ is constructed over the whole set of training samples. The notation $M$ denotes the weight matrix of graph $G_m$, and it describes the variation (i.e., margin or distribution) of the connected samples from different classes on the entire training data set. Similar to the goal of local Fisher discriminant analysis [20], we do not “weight” the value of different samples from different classes. The reason is that, since we want to separate the samples from different classes as much as possible, the affinity in the original feature space is ignored in the embedded subspace. To encode the discriminant information into the variation matrix $M$, its elements are defined in (15).

Now consider the problem of mapping HSI data to a line so that connected data points in the adjacency matrix $M$ stay as far apart as possible. In order to encode the discriminant information, a reasonable mapping can be found by optimizing the following function:

$$\max \sum_{i,j} (y_i - y_j)^2 M_{ij}. \tag{16}$$

Note that the objective function (16) on the between-class graph incurs a heavy penalty when two neighboring points $x_i$ and $x_j$ are mapped close to each other despite the fact that their labels are actually different. In this case, maximizing (16) forces the corresponding mapped points $y_i$ and $y_j$ to stay far apart. Thus, the global geometric structure of interclass samples can be well detected by (16).
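The effect of a between-class objective of this kind can be illustrated with a small example. This is a sketch under a loudly stated assumption: the margin matrix is taken to be a plain unweighted indicator of "different labels", which is only one reading of the statement that between-class pairs are not weighted, and the helper names are hypothetical.

```python
import numpy as np

def between_class_graph(labels):
    """Unweighted adjacency: 1 where two samples carry different labels (assumed form)."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)

def pairwise_objective(y, M):
    """Sum over i, j of (y_i - y_j)^2 * M_ij for 1-D embedded points y."""
    diff = y[:, None] - y[None, :]
    return float(np.sum(diff**2 * M))

labels = [0, 0, 1, 1]
M = between_class_graph(labels)

# Two candidate 1-D embeddings of the same four samples.
y_separated = np.array([0.0, 0.1, 5.0, 5.1])   # classes far apart
y_mixed = np.array([0.0, 5.0, 0.1, 5.1])       # classes interleaved

J_sep = pairwise_objective(y_separated, M)
J_mix = pairwise_objective(y_mixed, M)
```

The embedding that keeps the two classes apart yields the larger objective value, which is the behavior that maximizing the between-class criterion rewards.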

##### 3.3. Optimal Solution

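Let $x_i$ and $x_j$ be connected points in the original space, $w$ the projection direction, and $y_i$ and $y_j$ the embedded points; that is, $y_i = w^T x_i$ and $y_j = w^T x_j$, respectively. To solve the objective functions (13), (14), and (16) in the Laplacian graph embedding framework, we substitute $y_i = w^T x_i$ into the three functions. With $S$, $V$, and $M$ denoting the similarity, diversity, and margin weight matrices, simple algebra shows that the three objective functions can be written as

$$\frac{1}{2} \sum_{i,j} (y_i - y_j)^2 S_{ij} = w^T X (D_s - S) X^T w = w^T X L_s X^T w. \tag{17}$$

Likewise,

$$\frac{1}{2} \sum_{i,j} (y_i - y_j)^2 V_{ij} = w^T X L_v X^T w, \qquad \frac{1}{2} \sum_{i,j} (y_i - y_j)^2 M_{ij} = w^T X L_m X^T w, \tag{18}$$

where

$$L_s = D_s - S, \qquad L_v = D_v - V, \qquad L_m = D_m - M. \tag{19}$$

The notations $D_s$, $D_v$, and $D_m$ represent the $n$-dimensional diagonal matrices whose $i$th diagonal elements are

$$(D_s)_{ii} = \sum_j S_{ij}, \qquad (D_v)_{ii} = \sum_j V_{ij}, \qquad (D_m)_{ii} = \sum_j M_{ij}, \tag{20}$$

respectively. The matrices $L_s$, $L_v$, and $L_m$ are the Laplacian matrices in graph embedding [15].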
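The step from a pairwise sum to a quadratic form in the Laplacian holds for any symmetric weight matrix, so it can be checked numerically. The sketch below uses random data and a random symmetric weight matrix `A` as a stand-in for any of the three graphs; all names and sizes are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 12
X = rng.normal(size=(m, n))          # columns are samples
w = rng.normal(size=m)               # an arbitrary projection direction

A = rng.random((n, n))
A = (A + A.T) / 2.0                  # any symmetric nonnegative weight matrix
np.fill_diagonal(A, 0.0)

y = w @ X                            # y_i = w^T x_i
lhs = 0.5 * np.sum((y[:, None] - y[None, :])**2 * A)

D = np.diag(A.sum(axis=1))           # diagonal matrix of row sums
L = D - A                            # graph Laplacian
rhs = w @ X @ L @ X.T @ w            # quadratic form w^T X L X^T w
```

Here `lhs` and `rhs` agree up to floating-point rounding, which is exactly the identity used to rewrite each objective function in matrix form.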

Now let us join the three objective functions of (17) and (18) into a single objective function (21), whose terms are defined in (22). The three nonnegative constants in (22) balance the “importance” of each criterion, where , , and . In all experiments, we take the value of , , and . Moreover, the notations $S_b$ and $S_w$ represent the between-class scatter matrix and the within-class scatter matrix, respectively.

Note that optimizing (21) leads to a generalized eigenvalue decomposition problem:

$$S_b w = \lambda S_w w. \tag{23}$$

Let the column vectors $w_1, w_2, \ldots, w_d$ be the solutions of (23), ordered according to their eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$. Then the optimal projection of LGGSP is given by

$$y = W^T x, \qquad W = [w_1, w_2, \ldots, w_d], \tag{24}$$

where $y$ is a $d$-dimensional vector, $W$ collects the projection directions, and $x$ is the original high dimensional point.
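Once two scatter matrices are available, the final step is a standard generalized symmetric eigenvalue problem. The following is a minimal sketch, assuming the scatter matrices are given as symmetric inputs with the within-class one positive definite after a small ridge `reg`; the function name and the random test matrices are hypothetical, and the way the scatter matrices are assembled from the three Laplacians is deliberately left out.

```python
import numpy as np

def top_generalized_eigvecs(S_b, S_w, d, reg=1e-8):
    """Return the d eigenvectors of S_b w = lambda S_w w with the largest eigenvalues."""
    m = S_w.shape[0]
    C = np.linalg.cholesky(S_w + reg * np.eye(m))   # S_w + reg*I = C C^T
    Cinv = np.linalg.inv(C)
    evals, U = np.linalg.eigh(Cinv @ S_b @ Cinv.T)  # symmetric standard problem
    order = np.argsort(evals)[::-1][:d]             # indices of the d largest eigenvalues
    return Cinv.T @ U[:, order], evals[order]

# Hypothetical scatter matrices, used only to exercise the solver.
rng = np.random.default_rng(1)
m = 6
G = rng.normal(size=(m, 20))
H = rng.normal(size=(m, 20))
S_b = G @ G.T
S_w = H @ H.T + np.eye(m)

W, evals = top_generalized_eigvecs(S_b, S_w, d=2)
x = rng.normal(size=m)               # a new high dimensional point
y = W.T @ x                          # its d-dimensional embedding
```

A library routine for generalized symmetric eigenproblems could replace the explicit Cholesky whitening; the manual reduction is used here only to keep the example dependent on NumPy alone.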

#### 4. Experiments

##### 4.1. Experiment on Synthetic Data Sets

To illustrate the effectiveness of the proposed LGGSP algorithm, five synthetic data sets were investigated, that is, a toy example, the “tulip” data set, the “ripley” data set, a generated multimodal example, and a “two-moon” data set. Seven methods, that is, LDA [21], PCA [22], LPP [10], MFA [23], LGSPP [13], JGLDA [14], and the proposed LGGSP algorithm, were compared. There are 100 test samples for the toy example, 100 test samples for “tulip,” 1000 test samples for “ripley,” 200 test samples for the multimodal example, and 100 test samples for the “two-moon” data set. All algorithms were implemented in MATLAB, and all computations were carried out on an Acer Aspire-5750G laptop with an i7-2670QM processor (2.2 GHz) and the Ubuntu 12.04.1 LTS (-bit version) operating system.

Figure 1 shows the results of a simple case, that is, two classes, for the first 3 test data sets. Several conclusions can be drawn from these examples. First of all, the LDA, MFA, JGLDA, and proposed LGGSP algorithms work quite well on the simple linearly separable toy example. All algorithms produce comparable results on the “tulip” data set. For the “ripley” data set, only LPP, LDA, JGLDA, and the proposed LGGSP find the optimal direction. The three examples indicate the robustness of the proposed LGGSP algorithm.