Mathematical Problems in Engineering

Volume 2016, Article ID 5815429, 9 pages

http://dx.doi.org/10.1155/2016/5815429

## Parallelization of Eigenvalue-Based Dimensional Reductions via Homotopy Continuation

Institute of Electronics, Chinese Academy of Sciences, North Fourth Ring Road West 98, Beijing 100190, China

Received 12 December 2015; Accepted 22 February 2016

Academic Editor: Jinyun Yuan

Copyright © 2016 Size Bi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper investigates a homotopy-based method for embedding hundreds of thousands of data items that yields a parallel algorithm suitable for running on a distributed system. Current eigenvalue-based embedding algorithms attempt to use a sparsification of the distance matrix to approximate a low-dimensional representation when handling large-scale data sets. The main reason for this approximation is that the embedding process is still hindered by the eigendecomposition bottleneck for high-dimensional matrices. In this study, a homotopy continuation algorithm is applied to improve this embedding model by parallelizing the corresponding eigendecomposition. The eigenvalue solution is converted into the solution of ordinary differential equations with initial values, and all isolated positive eigenvalues and corresponding eigenvectors can be obtained in parallel by tracing the predicted eigenpaths. Experiments on real data sets show that the homotopy-based approach has the potential to scale to millions of data items.

#### 1. Introduction

Dimensional reduction has become an important technology for data visualization and analysis in the age of big data. One family of methods is based on eigendecomposition and is widely used for observing intrinsic associations between data items in 2D graphics. In this context, the solution of eigenvalues is an essential step, regarded as a multidimensional scaling (MDS) problem, for establishing low-dimensional embedding spaces from a distance matrix. The capacity of the implemented eigendecomposition determines what scale of data can be dealt with; there is, however, a bottleneck in eigendecomposing large-scale data sets directly [1–3]. Eigenvalue-based approaches have developed into two categories: direct solutions acquire the eigenpairs (eigenvalues and associated eigenvectors) immediately from the distance matrix [4], while indirect solutions select a small set of landmark points to calculate the eigenvalues and then use these estimated eigenpairs to approximate the entire embedding [1]. Both operate in a serial computational mode [5], which leads to several drawbacks in processing massive data sets: firstly, the indirect methods must select a landmark set, and the error generated by the approximate calculation distorts the embedded space [1, 3]; secondly, both have low efficiency and require a trade-off between representational precision and computational cost [4, 5]; thirdly, both are prone to omitting some positive eigenvalues, which reduces the maximum number of dimensions available for dimensional reduction [2].

This work investigates the parallelization of eigenvalue-based dimensional reduction in the context of large-scale data sets, aiming to provide a parallelized solution for data reduction in practice. The general frameworks are often divided into two categories [4]: distance scaling and classical scaling. Methods based on the spring model of distance scaling [6] are prone to local minima and can only be used to embed data in two or three dimensions [7], while classical scaling, the eigenvalue-based approach [8], has been considered a useful replacement for large graphs. To overcome the eigenvalue problem in big data processing, several algebraic techniques have been proposed. One of the most useful is matrix approximation [1, 2, 9]: a small matrix is sampled from the distance matrix, its eigenvalues are solved, and the objective embedding is then approximated. The segments of this workflow are tightly coupled; moreover, the studies on landmark-selecting strategies [3, 10–12] make the approach even harder to parallelize in order to cut the time consumed. In today's eigencalculation technology, it is well established that the eigendecomposition of large, sparse matrices can be resolved by novel numerical solutions such as Block Lanczos [13], PCG Lanczos [14], and homotopy methods [15, 16]. Resorting to these high-performance approaches may considerably expand the applications of dimensional reduction.

The homotopy continuation method has been suggested as an alternative for finding all isolated eigenvalues of a large matrix. It has been demonstrated that there are distinct smooth curves connecting trivial solutions to the desired eigenvalues [17], and a proof of the existence of homotopy curves for eigenvalue problems is presented in [18]. Reference [19] reports numerical experience with the homotopy method for computing eigenvalues of real symmetric tridiagonal matrices; it enables the eigenvalues needed for embedding to be calculated in parallel. Among recent studies, [16] combines the method with a divide-and-conquer approach to find all the eigenvalues and eigenvectors of a diagonal-plus-semiseparable matrix, and [20] establishes bounds that justify the Newton method in constructing the homotopy curves.

In this paper, a parallelization of eigenvalue-based dimensional reduction via the homotopy continuation method, HC-MDS (Homotopy Continuation Multidimensional Scaling), is proposed. Unlike the approximation-based approaches, HC-MDS can directly solve most positive eigenvalues in parallel from large distance matrices (transformed from more than 60,000 data items). Following this theory, we construct a homotopy continuation function to link a trivial set with the target eigenpairs. The eigenvalue problem is thereby converted into the solution of ordinary differential equations (ODEs). Each eigenpair is obtained by tracing the ordinary differential function from corresponding initial values, which means the solutions for the eigenvalues can be implemented in independent computational threads. This model allows computational resources to be scheduled for each eigenpair calculation, so that it can be accelerated on parallel or distributed systems.

The remainder of this report is organized as follows. Section 2 reviews eigenvalue-based MDS. Section 3 outlines the methodologies used in this paper. The experiments are presented in Section 4. Finally, we draw conclusions in Section 5.

#### 2. Background

Most eigenvalue-based dimensional reductions involve a step of spectral decomposition for solving eigenvalues, such as LLE [21], Laplacian Eigenmaps [22], Isomap [23], and L-Isomap [1]. A more fundamental MDS using the power method for eigenpairs is illustrated in [4]. The aim of the eigenvalue calculation in embedding is to obtain the eigenpairs of the inner product matrix, which is converted from the distance matrix using "double-centering." A low-dimensional space is then built from the positive eigenvalues and their corresponding eigenvectors. Without loss of generality, the general eigenvalue-based dimensional reduction is summarized as follows.

*Procedure*. Eigenvalue-based embedding proceeds as follows.

*Step 1.* Let $D$ denote the $n \times n$ distance matrix, which is symmetric and real valued. An inner product matrix $B$ can be derived from $D$ as $B = -\frac{1}{2} J D^{(2)} J$, where $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{T}$ is the centering matrix and $D^{(2)}$ holds the squared distances.

*Step 2.* Implement one of the eigendecomposition methods on $B$, as $B = Q \Lambda Q^{T}$; $\Lambda$ is a diagonal matrix of eigenvalues and $Q$ is a matrix whose columns are the corresponding orthonormal eigenvectors.

*Step 3.* The embedded space $X$ is generated as $X = Q_{+} \Lambda_{+}^{1/2}$, where $(\Lambda_{+}, Q_{+})$ is the combination of the positive eigenpairs among $(\Lambda, Q)$. Negative and zero eigenpairs must be discarded because of the arithmetical rule for extracting square roots.
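Steps 1–3 can be exercised with a minimal sketch. The following is our own illustration in Python (assuming NumPy is available), not the paper's implementation; the function name is invented for this example.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS (Steps 1-3): double-centering, eigendecomposition,
    and projection onto the positive eigenpairs."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Step 1: inner product matrix
    evals, evecs = np.linalg.eigh(B)         # Step 2: eigendecomposition
    order = np.argsort(evals)[::-1]          # sort eigenvalues descending
    evals, evecs = evals[order], evecs[:, order]
    keep = evals[:dim] > 0                   # Step 3: keep only positive eigenpairs
    return evecs[:, :dim][:, keep] * np.sqrt(evals[:dim][keep])

# Distances among the corners of a unit square are recovered exactly.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
X = classical_mds(D, dim=2)
D_rec = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
assert np.allclose(D, D_rec)                 # pairwise distances preserved
```

The embedding is unique only up to rotation and reflection, which is why the check compares distances rather than coordinates.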

If the approximate method is applied for approximating the embedding [2], the following steps proceed.

*Step 4.* Assume that the inner product matrix $B_{a}$ is a partition of the original inner product matrix $B$. The remaining embedded space $X_{b}$ is approximated from the eigenpairs $(\Lambda_{a}, Q_{a})$ of $B_{a}$ in the Nyström manner, $X_{b} = B_{ba} Q_{a} \Lambda_{a}^{-1/2}$, together with an error estimation.

*Step 5.* Let $X_{b}$ be incorporated with the landmark embedding $X_{a}$; we finally obtain the entire embedded space as $X = \begin{pmatrix} X_{a} \\ X_{b} \end{pmatrix}$.

For embedding with eigenpairs, most investigations do not discuss how the eigenpairs are generated but commonly conclude that eigendecomposing large symmetric, dense matrices is an expensive calculation. Currently, the embedding technologies applied to large-scale data sets are mainly used for data visualization. Since this kind of application only needs two or three eigenvalues, the largest one and the second largest one are solved by existing eigenvalue methods, for example, the power iteration method [4]. However, this eigenvalue approach operates serially and its computation is considerably inefficient [4]. To capture more positive eigenvalues for embedding, QR has been validated on small to medium matrices [24], but it solves all eigenvalues, including the negative and zero ones that are useless for embedding and consume CPU time. Lanczos, a family of eigenpair solutions based on the Lanczos process, has been demonstrated to handle millions of data items in actual engineering applications [14], but these methods may fail to complete the computing procedure once the related errors accumulate beyond a certain range [24]. Therefore, we investigate an alternative, the homotopy continuation method, for the parallelization of embedding. Compared to the presented works, we employ a Runge-Kutta (R-K) method with step estimation to control the tracking of the homotopy curves, which relaxes the constraints on eigenpair tracking [20].
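For reference, the serial power iteration mentioned in [4] for the largest eigenpair can be sketched as below (a minimal NumPy illustration with a made-up test matrix). Extracting further eigenpairs requires deflation, which is what keeps this procedure inherently sequential.

```python
import numpy as np

def power_iteration(B, iters=500):
    """Largest-magnitude eigenpair of a symmetric matrix by power iteration."""
    rng = np.random.default_rng(1)
    v = rng.standard_normal(B.shape[0])
    for _ in range(iters):
        w = B @ v
        v = w / np.linalg.norm(w)           # renormalize to avoid overflow
    lam = v @ B @ v                         # Rayleigh quotient estimate
    return lam, v

B = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = power_iteration(B)
assert np.allclose(B @ v, lam * v, atol=1e-6)   # (lam, v) is an eigenpair of B
```

Convergence speed depends on the gap between the largest and second-largest eigenvalue magnitudes, which is another reason the method is inefficient on the near-degenerate spectra of large inner product matrices.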

#### 3. Methodology

The homotopy continuation method consists of two parts: converting the eigendecomposition step of embedding into the solution of ODEs with initial values (Section 3.1) and solving the ODE for each eigenpair using numerical methods (Section 3.2).

##### 3.1. Homotopy for Eigenpairs

Before constructing the homotopy function, the symmetric square matrix $B$ should be transformed into a symmetric tridiagonal matrix $T$ by the Householder transformation [20]. The eigenvalues of $T$ are equal to the eigenvalues of $B$. We obtain the eigenvalues of $B$ through the eigendecomposition of $T$, and the corresponding eigenvectors of $B$ can then be recovered from those of $T$. For eigendecomposing the transformed matrix $T$, we consider the eigenvalue problem as
$$Tx = \lambda x, \tag{1}$$
where $x$ is the eigenvector corresponding to the eigenvalue $\lambda$ of $T$. The orthogonality of eigenvectors is expressed by $x^{T}x = 1$.

Associating (1) with the orthogonality of eigenvectors, we convert the eigenvalue problem into finding the zeros of the objective function
$$f(x, \lambda) = \begin{pmatrix} Tx - \lambda x \\ \frac{1}{2}\left(x^{T}x - 1\right) \end{pmatrix} = \mathbf{0}, \tag{2}$$
where $\mathbf{0}$ is the corresponding zero vector.

To find the solution set of such an objective function, the homotopy method has been recommended. We thus construct a homotopy function to link the objective function with an auxiliary function by the Newton method [20]:
$$H(x, \lambda, t) = \begin{pmatrix} A(t)x - \lambda x \\ \frac{1}{2}\left(x^{T}x - 1\right) \end{pmatrix} = \mathbf{0}, \tag{3}$$
where $A(t) = (1 - t)E + tT$, $t \in [0, 1]$, $x \in \mathbb{R}^{n}$, and $\lambda \in \mathbb{R}$. The constant matrix $E$ needs to be user-defined such that its eigenpairs are readily available; for example, if $T$ has diagonal elements $a_{1}, a_{2}, \ldots, a_{n}$, we often set $E = \operatorname{diag}(a_{1}, a_{2}, \ldots, a_{n})$. Obviously, the diagonal elements $a_{1}, a_{2}, \ldots, a_{n}$ are exactly the eigenvalues of $E$, and the unit vectors $e_{1}, e_{2}, \ldots, e_{n}$ are the corresponding eigenvectors, respectively.

Since there are distinct smooth curves, called eigenpaths, connecting the trivial solutions of $E$ to the desired eigenpairs of $T$ [17], each eigenpath is expressed by an ODE with a pair of initial values as
$$\begin{pmatrix} A(t) - \lambda I & -x \\ x^{T} & 0 \end{pmatrix} \begin{pmatrix} dx/dt \\ d\lambda/dt \end{pmatrix} = -\begin{pmatrix} (T - E)x \\ 0 \end{pmatrix}, \qquad \left(x(0), \lambda(0)\right) = \left(e_{i}, a_{i}\right), \tag{4}$$
where the interval of $t$ is $[0, 1]$ and the initial values are given by one eigenpair $(e_{i}, a_{i})$ of $E$. Solving (4) with all the initial values given by $E$, the eigendecomposition of $T$ can be implemented in parallel. We show the derivation of (4) from (3) as follows.

We take the derivative of (3) with respect to $t$:
$$\frac{\partial H}{\partial x}\frac{dx}{dt} + \frac{\partial H}{\partial \lambda}\frac{d\lambda}{dt} + \frac{\partial H}{\partial t} = \mathbf{0}, \tag{5}$$
where $\partial H/\partial x = \begin{pmatrix} A(t) - \lambda I \\ x^{T} \end{pmatrix}$, $\partial H/\partial \lambda = \begin{pmatrix} -x \\ 0 \end{pmatrix}$, and $\partial H/\partial t = \begin{pmatrix} (T - E)x \\ 0 \end{pmatrix}$.

Adding the initial values, we obtain the ODE of the homotopy curve as (4). The solving procedure of (4) is the solution of an ODE over $t \in [0, 1]$. When $t$ increases to 1, each eigenpair of $T$ is obtained as $(x(1), \lambda(1))$.
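The derivation above can be exercised numerically. The NumPy sketch below traces each eigenpath of a small tridiagonal matrix by solving the bordered linear system of (4) at every stage of a fixed-step classical Runge-Kutta integrator; the paper uses an R-K method with step estimation, so the fixed step count here is a simplification on our part, as are the helper names.

```python
import numpy as np

def eigenpath_rhs(t, y, T, E):
    """Right-hand side (dx/dt, dlam/dt) obtained by differentiating H(x, lam, t) = 0."""
    n = T.shape[0]
    x, lam = y[:n], y[n]
    A_t = (1 - t) * E + t * T
    jac = np.zeros((n + 1, n + 1))           # bordered Jacobian [[A(t)-lam I, -x], [x^T, 0]]
    jac[:n, :n] = A_t - lam * np.eye(n)
    jac[:n, n] = -x
    jac[n, :n] = x
    rhs = np.zeros(n + 1)
    rhs[:n] = -(T - E) @ x                   # -dH/dt
    return np.linalg.solve(jac, rhs)

def trace_eigenpath(T, i, steps=200):
    """Trace the i-th eigenpath from (e_i, a_i) at t=0 to an eigenpair of T at t=1."""
    n = T.shape[0]
    E = np.diag(np.diag(T))                  # auxiliary matrix with trivial eigenpairs
    y = np.zeros(n + 1)
    y[i] = 1.0                               # x(0) = e_i
    y[n] = T[i, i]                           # lambda(0) = a_i
    h = 1.0 / steps
    for k in range(steps):                   # classical fourth-order Runge-Kutta
        t = k * h
        k1 = eigenpath_rhs(t, y, T, E)
        k2 = eigenpath_rhs(t + h / 2, y + h / 2 * k1, T, E)
        k3 = eigenpath_rhs(t + h / 2, y + h / 2 * k2, T, E)
        k4 = eigenpath_rhs(t + h, y + h * k3, T, E)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y[n], y[:n]                       # (lambda(1), x(1))

T = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lams = sorted(trace_eigenpath(T, i)[0] for i in range(3))
assert np.allclose(lams, np.linalg.eigvalsh(T), atol=1e-5)
```

Each call to `trace_eigenpath` is independent of the others, which is exactly the property that lets the eigenpairs be computed in separate threads or processes.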

We need to transform the calculated eigenvectors because an eigenvector obtained from the homotopy method cannot be applied to build the embedded space directly. With $B = QTQ^{T}$ from the Householder transformation, the transforming formula is described as
$$v = Qx, \tag{6}$$
where $v$ is the eigenvector of $B$ corresponding to $\lambda$.
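Combining the Householder reduction with this back-transformation gives the following self-contained NumPy sketch (our own helper, not the paper's code): it reduces a symmetric matrix to tridiagonal form, takes eigenpairs of the tridiagonal matrix, and maps the eigenvectors back with $v = Qx$.

```python
import numpy as np

def householder_tridiag(B):
    """Reduce symmetric B to tridiagonal T with accumulated transform Q: B = Q T Q^T."""
    n = B.shape[0]
    T = B.astype(float).copy()
    Q = np.eye(n)
    for k in range(n - 2):
        x = T[k + 1:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                          # column already reduced
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])     # reflector zeroing x below its first entry
        v /= np.linalg.norm(v)
        P = np.eye(n)
        P[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)
        T = P @ T @ P                         # similarity transform (P symmetric, P^2 = I)
        Q = Q @ P
    return T, Q

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
B = A + A.T                                   # a symmetric "inner product" matrix

T, Q = householder_tridiag(B)
lam, U = np.linalg.eigh(T)                    # eigenpairs of the tridiagonal matrix
V = Q @ U                                     # eigenvectors of B recovered as v = Q x
assert np.allclose(B @ V, V * lam)            # (lam_j, V[:, j]) are eigenpairs of B
```

In practice the reflectors are applied implicitly rather than as full matrices, but the explicit form keeps the correspondence with the text visible.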

Since it has been proved that the eigenpaths of the homotopy function are continuous and monotonic [17, 18], this brings an advantage for forecasting the trend of eigenpaths: negative eigenvalues and undersized positive eigenvalues can be detected early while tracking the homotopy curves, as in Figure 1. Specifically, each trace of the eigenvalue evolution is implemented in an independent computational thread. When the trend of an ODE is predicted to be negative (the red tracks in Figure 1), the associated computation is regarded as not effective and is shut down. This feature enables dynamic allocation of computing resources.
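The scheduling idea can be sketched with Python's standard library. The paths below are toy linear stand-ins (the values and the check-point heuristic are made up for illustration); in HC-MDS each step would instead be one R-K integration step of the homotopy ODE.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy eigenpaths: lambda_i(t) moves linearly from its initial value to a target.
paths = {0: (2.0, 4.5), 1: (1.0, 0.8), 2: (0.5, -1.5), 3: (3.0, 3.2)}

def trace(i, steps=100, check_at=10):
    """Trace one eigenpath; abandon early if its trend predicts a negative eigenvalue."""
    start, end = paths[i]
    lam = start
    for k in range(1, steps + 1):
        prev = lam
        lam = start + (end - start) * k / steps            # stubbed integration step
        if k == check_at:
            projected = lam + (lam - prev) * (steps - k)   # linear extrapolation to t=1
            if projected < 0:
                return i, None       # predicted negative eigenvalue: shut this path down
    return i, lam

# Each eigenpath runs in its own thread; abandoned paths free their resources early.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(trace, paths))

positive = {i: lam for i, lam in results.items() if lam is not None}
```

Here path 2 trends negative and is dropped at the check point, while the other three run to completion; the same pattern extends to processes on a distributed system.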