Mathematical Problems in Engineering

Volume 2015, Article ID 241436, 18 pages

http://dx.doi.org/10.1155/2015/241436

## Enhancing Both Efficiency and Representational Capability of Isomap by Extensive Landmark Selection

School of Mathematics and Statistics and Institute for Information and System Science, Xi’an Jiaotong University, Xi’an 710049, China

Received 24 November 2014; Accepted 20 February 2015

Academic Editor: Wanquan Liu

Copyright © 2015 Dong Liang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Improving computational efficiency and extending representational capability are two of the most active topics in global manifold learning. In this paper, a new method called extensive landmark Isomap (EL-Isomap) is presented that addresses both topics simultaneously. On one hand, EL-Isomap is derived from landmark Isomap (L-Isomap), which is known for its high computational efficiency, and it retains this efficiency by using a small set of landmarks to embed all data points. On the other hand, EL-Isomap significantly extends the representational capability of L-Isomap and other global manifold learning approaches by using only an available subset of the landmark set, rather than the whole set, to embed each point. In particular, compared with other manifold learning approaches, the new method more successfully unwraps data manifolds with intrinsic low-dimensional concave topologies and essential loops, as shown by simulation results on a series of synthetic and real-world data sets. Moreover, the accuracy, robustness, and computational complexity of EL-Isomap are analyzed, and the relation between EL-Isomap and L-Isomap is discussed theoretically.

#### 1. Introduction

Nonlinear dimensionality reduction (NLDR) is an attractive topic in many scientific fields [1–4]. The task of NLDR is to recover the latent low-dimensional structures hidden in high-dimensional data [5–7]. In many areas of artificial intelligence and data mining, the high-dimensional data encountered are intrinsically distributed on a smooth, low-dimensional manifold. The NLDR problem on such data is specifically called the “manifold learning” problem [8, 9]. In recent years, many manifold learning approaches have emerged [10–13] and have been applied to real-world problems (e.g., hyperspectral imaging classification [14] and object tracking [15]), with the aim of discovering the intrinsic geometric representations of nonlinear data manifolds. Based on their underlying construction principles, these approaches can be divided into two categories: global and local. Global approaches, such as Isomap [1] and CDA [10], attempt to preserve geometry at both local and global scales, essentially constructing an isometric correspondence between all data pairs in the original and latent spaces. Local approaches, such as LLE [2] and Laplacian eigenmaps [11], attempt to preserve the local geometry of the data, essentially keeping all local areas invariant between the original and latent spaces.

Compared with the local approaches, the global approaches give more faithful global geometric representations, and their metric-preserving construction principles are easier to understand. Yet they fall short on two points [9]: (1) computational efficiency: the algorithm of a global approach may be too expensive when the data set is large, whereas a local approach involves only sparse matrix computations, yielding an acceptable polynomial speedup; (2) representational capacity: a global approach is not consistently effective unless the input data are uniformly distributed on a manifold whose intrinsic topology is a convex region in the latent space, whereas a local approach can handle a wider range of manifolds.

Corresponding to these two points, two topics have attracted increasing attention: improving the computational speed of the global approaches and extending the range of manifolds on which they are effective, so that their performance in both respects is comparable to or better than that of the local approaches. So far, the two topics have been developed largely independently. The most typical work on the first topic is landmark Isomap (L-Isomap [9, 16]), which approximates the global computation on the whole data set by calculations on a much smaller subset (consisting of landmark points). The most prominent property of L-Isomap is that it significantly decreases the computational complexity of Isomap while still preserving global geometric structures well. Several extensions have also been proposed for the second topic. Conformal Isomap extends Isomap to certain curved and offset data manifolds [9]; local MDS extends CDA to data sets lying on a sphere manifold by trading off the trustworthiness of the visualization against the continuity of the mapping so as to split the sphere into two adjacent discs [17]. By first building a neighborhood graph of the data to represent the underlying manifold and then finding the maximum subgraph to tear or cut the manifold, the method presented in [18] extends geodesic-distance-based approaches (including Isomap and CDA) to some data manifolds with loops and holes. By virtue of graph-theoretic techniques, several methods have also been proposed recently to make global approaches effective on multicluster manifolds [19–21].

The main purpose of this paper is to present a new method that addresses both topics simultaneously. Like L-Isomap, the new method uses a specific landmark set to embed the input data; it can therefore be seen as an extension of L-Isomap and is called extensive L-Isomap, or EL-Isomap. Because it uses a landmark subset as the reference for embedding the whole data set, EL-Isomap can be expected to achieve efficiency similar to that of L-Isomap. However, the differences between L-Isomap and EL-Isomap in motivation, algorithm, and theoretical foundation lead to significantly different performance in applications. The simulation results show that EL-Isomap considerably extends the range of manifolds on which the original global approaches (including L-Isomap) are effective. Two typical examples are data manifolds with loops and those whose intrinsic topology is a concave region in the low-dimensional space. The simultaneous improvement on both topics distinguishes EL-Isomap from other global approaches, as verified by simulations on a series of synthetic and real-world data sets.

In summary, the proposed method makes three main contributions to manifold learning. First, it extends the applicable range of current manifold learning techniques and can be used effectively on data lying on loopy manifolds, concave structured manifolds, and other manifolds with complex configurations. The new method thus possesses an advantage shared by many local manifold learning approaches. Second, by calculating and using geodesic distances across the entire manifold, supported by mathematical derivation, the new method is capable of preserving the global low-dimensional structure of the whole manifold. It thus inherits the advantage of global manifold learning approaches, especially the geodesic-distance-based ones. Third, the proposed method guarantees a low computational complexity in implementation, comparable to that of the most efficient current manifold learning techniques. All these contributions are evaluated theoretically or substantiated empirically through experiments.

This paper is organized as follows: Section 2 presents the new global approach by comparing it with L-Isomap from different viewpoints; Section 3 introduces specific landmark selection strategies for EL-Isomap; Section 4 demonstrates the simulation results on synthetic and real-world data sets; discussion and conclusions are given at the end.

#### 2. From L-Isomap to EL-Isomap

Since it was presented in [9], the L-Isomap method has attracted much attention because of its high efficiency in applications. This prominent property is attributed to the use of the landmark subset, a step shared by the algorithms of L-Isomap and EL-Isomap. In essence, however, the two methods differ significantly in algorithm, theory, and application. In this section, EL-Isomap is presented through comparisons with L-Isomap in motivation, algorithm, rationale, and computational complexity. The relation between the two methods is also analyzed.

##### 2.1. Motivations of L-Isomap and EL-Isomap

As mentioned in Section 1, the initial motivation of L-Isomap is the first topic, that is, improving the computational efficiency of the global approaches. By approximating a large global computation through calculations on a much smaller set, L-Isomap reduces the computational complexity of Isomap so that it grows almost linearly with the number of input data points, which makes L-Isomap comparable to the local approaches in this respect.
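To make the source of this saving concrete, the following is a minimal sketch, not the authors' implementation, of estimating geodesic distances from the landmarks only. It assumes the usual Isomap-style pipeline of a k-nearest-neighbor graph followed by shortest-path search; the function names `knn_graph`, `dijkstra`, and `landmark_geodesics` and the parameter `k` are illustrative and do not come from the paper. The brute-force neighbor search is kept only for brevity.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Brute-force neighbor search, used here only to keep the sketch short.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrize so the graph is undirected
    return graph


def dijkstra(graph, source):
    """Shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def landmark_geodesics(points, landmarks, k):
    """Return one row per landmark holding its estimated geodesic distance
    to every point; unreachable points are marked with infinity."""
    graph = knn_graph(points, k)
    table = []
    for lm in landmarks:
        dist = dijkstra(graph, lm)
        table.append([dist.get(j, math.inf) for j in range(len(points))])
    return table
```

Only one shortest-path search is run per landmark, instead of one per data point as in the all-pairs computation of Isomap. For points sampled densely along a unit semicircle, for instance, the estimated distance between the two end points should come out close to the arc length π rather than the chord length 2.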

The main motivation of EL-Isomap is instead the second topic, that is, enlarging the range of data manifolds on which the global approaches can perform effective manifold learning. In addition, the algorithm inherits the high efficiency of L-Isomap.

In particular, the construction of EL-Isomap is motivated heuristically by the following facts. Isomap uses the estimated geodesic distances between all data pairs, whereas L-Isomap uses only the estimated geodesic distances between all data points and the landmarks, a small subset of the original data set. Besides reducing computational complexity, this brings L-Isomap a second advantage: even when some geodesic distances between nonlandmarks are impossible, difficult, or unreliable to estimate (such as the geodesic distance between A and B in Figure 1), L-Isomap can still be effective. So far, the first advantage has been confirmed and emphasized in applications of L-Isomap, yet the second has been exploited only to a limited extent and is almost ignored by users of the method [9, 16]. The reason is that the extension of the effective range brought by L-Isomap is still not conspicuous. By developing EL-Isomap, which further reduces the information used to the estimated geodesic distances between each input datum and only part of the landmarks, we aim to highlight the second advantage so that it also becomes a focus of attention.
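The idea of giving each point its own landmark subset can be illustrated with a small sketch. The selection rule used here (keep the `max_count` landmarks with the smallest finite estimated geodesic distance) is purely an assumption for illustration; the strategies actually used by EL-Isomap are those introduced in Section 3. The function name `usable_landmarks` is likewise hypothetical.

```python
import math


def usable_landmarks(dist_to_landmarks, max_count):
    """Return indices of up to `max_count` landmarks, ordered from nearest
    to farthest, skipping landmarks whose distance could not be estimated.

    Assumed rule for illustration only, not the paper's selection strategy.
    """
    finite = [(d, i) for i, d in enumerate(dist_to_landmarks) if math.isfinite(d)]
    finite.sort()
    return [i for _, i in finite[:max_count]]


# One point's estimated geodesic distances to five landmarks;
# infinity marks a landmark that is unreachable in the neighborhood graph.
subset = usable_landmarks([0.4, math.inf, 2.5, 1.1, 7.9], max_count=3)
```

Under this assumed rule, L-Isomap would place the point using all five distances, whereas the per-point subset discards the unreachable landmark and the most distant one, whose estimates are the most likely to be unfaithful.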