Computational Intelligence and Neuroscience

Volume 2017, Article ID 2658707, 14 pages

https://doi.org/10.1155/2017/2658707

## Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection

^{1}Guangxi Key Laboratory of Cryptography and Information Security, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China

^{2}State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

^{3}State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China

^{4}National University of Defense Technology, Nanjing 210012, China

Correspondence should be addressed to Mao Ye; yemaoxxgc@163.com

Received 7 March 2017; Accepted 1 August 2017; Published 25 September 2017

Academic Editor: Diego Andina

Copyright © 2017 Wenfen Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Constrained spectral clustering (CSC) can greatly improve clustering accuracy by incorporating constraint information into spectral clustering and has therefore attracted wide academic attention. In this paper, we propose a fast CSC algorithm that encodes landmark-based graph construction into a new CSC model and applies random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm yields asymptotically similar results as its model size increases; compared with the most efficient CSC algorithm known, the new algorithm runs faster and suits a wider range of data sets. We also propose a scalable semisupervised cluster ensemble algorithm that combines our fast CSC algorithm with dimensionality reduction by random projection in the process of spectral ensemble clustering. Theoretical analysis and empirical results demonstrate that the new cluster ensemble algorithm has advantages in both efficiency and effectiveness. Furthermore, the approximate preservation of clustering accuracy under random projection, proved for the consensus clustering stage, also holds for weighted $k$-means clustering and thus gives a theoretical guarantee for this special kind of $k$-means clustering, in which each point has its own weight.

#### 1. Introduction

With the arrival of the big data era, data has become an important asset, and analysing large scale data efficiently is becoming a major challenge [1, 2]. As a fundamental method for data analysis, clustering partitions a data set into several subsets according to the similarities of points [3], and it has become a basic tool for image analysis [4, 5], community detection [6, 7], disease diagnosis [8], and so on. Therefore, more and more attention has been paid to the design of efficient and effective clustering algorithms.

Constrained clustering can improve the accuracy of the clustering result by encoding constraint information into unsupervised clustering. It is an important area of clustering, and many constrained clustering algorithms [9–17] have been proposed. Since spectral clustering often has high clustering accuracy and suits a wide range of geometries [18, 19], constrained spectral clustering (CSC) [11–17] usually performs better than other constrained clustering algorithms. However, the $O(n^2)$ space complexity and $O(n^3)$ time complexity of many CSC algorithms [11–15] restrict their application to large scale data sets, where $n$ is the number of data points. The most efficient CSC algorithm known is the SCACS algorithm [16], which reduces the space and time complexities to be linear in $n$ by incorporating landmark-based graph construction [20, 21] into the constrained normalized cuts problem [15]. Notably, the constrained normalized cuts problem [15] makes the SCACS algorithm solve the generalized eigenvector problem twice. In 2016, Cucuringu et al. [17] proposed a new CSC algorithm that empirically achieves better accuracy and shorter running time than the constrained normalized cuts approach. Thanks to a new technique for encoding constraint information, the new CSC model needs to compute eigenvectors only once.

By integrating many basic partitions into a unified partition, ensemble clustering has many excellent properties, such as improved clustering quality, robust and stable clustering results, the handling of noise, the reuse of knowledge [3], and suitability for multisource and heterogeneous data [22]. Researchers have proposed many ensemble clustering algorithms [22–29]. Since notation differs across the literature, in the following we refer to the integration of basic partitions as ensemble clustering or consensus clustering, and to the union of the basic clustering and ensemble clustering stages as cluster ensemble. Among ensemble clustering methods, the method based on the coassociation matrix has become a landmark [22]. Specifically, the coassociation matrix is constructed from the basic partitions to represent the similarities of pairs of points, and the final partition is computed by applying a graph partition method to this matrix. Thus, this kind of method suffers from high space and time complexity. Recently, Liu et al. [22] equivalently transformed spectral clustering on the coassociation matrix into weighted $k$-means clustering over a specific binary matrix, which vastly decreased the space and time complexities. However, when the number of basic partitions or clusters is large, the corresponding binary matrix will be high dimensional.

In their seminal work, Johnson and Lindenstrauss [30] pointed out that the random projection produced by a random orthogonal matrix can approximately preserve the pairwise distances of data sets with reduced dimensions. Subsequently, many studies constructed more matrices with this property: random Gaussian matrix [31], random sign matrix [32], random matrix based on randomized Hadamard transform [33], random matrix based on block random hashing [34], and so on. In addition, dimensionality reduction with random projection has also been widely applied to data mining methods such as classification [35], clustering [36–38], and anomaly detection [39]. In terms of the objective function, several works [36–38] prove that random projection can approximately maintain the accuracy of $k$-means clustering. Since the objective function of weighted $k$-means clustering differs from that of $k$-means clustering, theoretical analysis of the influence of random projection on weighted $k$-means clustering is still scarce.

*Our Contribution*. In this paper, our contributions can be divided into three parts: the first is a fast CSC algorithm that suits a wide range of data sets; the second is an analysis of the effect of random projection on spectral ensemble clustering; the third is a scalable semisupervised cluster ensemble algorithm. More specifically, the contributions are as follows:

(i) We propose a fast CSC algorithm whose space and time complexities are linear in the size of the data set: we compress the original model proposed by Cucuringu et al. [17] by encoding landmark-based graph construction into it and further improve efficiency via random sampling in the process of $k$-means clustering. Besides, we prove that the new CSC algorithm asymptotically gives a clustering result comparable to that of the original model. Experimental results show that the new algorithm not only utilizes the constraint information effectively but also costs less running time and fits a wider range of data sets than the state-of-the-art SCACS method.

(ii) With respect to the difference in objective function caused by random projection, we give a detailed proof that random projection keeps the clustering quality of spectral ensemble clustering within a small factor. Based on this theoretical analysis, we design a spectral ensemble clustering algorithm whose dimensions are reduced by sparse random projection. Experiments over different data sets also verify the correctness of our theoretical results. Moreover, since the theoretical analysis also applies to ordinary weighted $k$-means clustering, the influence of random projection on weighted $k$-means clustering is obtained as well.

(iii) We propose a scalable semisupervised cluster ensemble algorithm by combining the fast CSC algorithm with the spectral ensemble clustering algorithm with random projection. The efficiency and effectiveness of the new cluster ensemble algorithm are also demonstrated theoretically and empirically.

The remainder of our paper is organized as follows. In Section 2, we introduce the CSC model of Cucuringu et al. [17], landmark-based graph construction, and two related components in our cluster ensemble algorithm: spectral ensemble clustering and random projection. In Section 3, we present our fast CSC algorithm and give its asymptotic property. Then, the algorithm formulation and theoretical analysis of spectral ensemble clustering with random projection are displayed in Section 4. In Section 5, we show the experiment results of our algorithms. Finally, we draw the conclusions of the article and put forward the future directions in Section 6.

#### 2. Preliminaries

In this section, we present the CSC algorithm proposed by Cucuringu et al. [17] and introduce landmark-based graph construction [20, 21] which will be applied to our fast CSC algorithm. In addition, we also introduce spectral ensemble clustering algorithm [22] and sparse random projection [34] which can be used to speed up the spectral ensemble clustering.

##### 2.1. Constrained Spectral Clustering

Here, we first introduce the notion of undirected graph which is very important in constrained spectral clustering and then show the CSC model proposed by Cucuringu et al. [17].

Let $G = (V, E, W)$ be an undirected graph, where $V$ is the vertex set, $E$ is the edge set, and $W$ is the weight set with respect to the edges. Specifically, $w_{ij}$ is the nonnegative weight of the edge between the vertices $v_i$ and $v_j$, indicating the level of “affinity” between $v_i$ and $v_j$. If $w_{ij} = 0$, there is no edge between the vertices $v_i$ and $v_j$. We denote $L = D - W$ as the Laplacian matrix of $G$, where the $i$th diagonal entry of the diagonal matrix $D$ is $d_{ii} = \sum_{j} w_{ij}$ and $W$ is the adjacency matrix with entries $w_{ij}$.
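The Laplacian defined above can be illustrated with a minimal sketch. The function names and the toy weight matrix are assumptions made for illustration only; the quadratic-form identity $x^T L x = \sum_{i<j} w_{ij}(x_i - x_j)^2$ used in the check is a standard property of graph Laplacians, which is why indicator vectors turn cut weights into quadratic forms.

```python
def graph_laplacian(W):
    """Return L = D - W for a symmetric, nonnegative weight matrix W (list of lists)."""
    n = len(W)
    degrees = [sum(row) for row in W]            # d_ii = sum_j w_ij
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def quadratic_form(L, x):
    """Return x^T L x, which equals the sum over edges of w_ij * (x_i - x_j)^2."""
    n = len(L)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


# Toy graph (hypothetical): vertex 0 is linked to 1 (weight 1.0) and to 2 (weight 0.5).
W = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]
L = graph_laplacian(W)

# Indicator vector of the cluster {0, 1}: only the edge (0, 2) is cut.
cut = quadratic_form(L, [1, 1, 0])
```

Here `cut` is the total weight of the edges leaving the cluster {0, 1}, the quantity that cut-based objectives are built from.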

The constrained spectral clustering has three undirected graphs: one data graph $G_D$ and two knowledge graphs $G_{ML}$ and $G_{CL}$. In the data graph $G_D$, each weight indicates the similarity level of the vertices of the corresponding edge. The “must-link” (ML) graph $G_{ML}$ gives the “must-link” information of vertices: each edge in $G_{ML}$ indicates that the corresponding vertices should be in the same group, and the level of “must-link” belief is described by the weight. The “cannot-link” (CL) graph $G_{CL}$ has analogous components to $G_{ML}$. The weights in the two knowledge graphs are nonnegative and are set according to constraint information such as prior knowledge. For example, assuming that weights range from 0 to 1, if we know that two points are in the same group, their ML weight is 1. If we are only partially confident that the two points are in the same group, the weight takes an intermediate value reflecting that confidence, and if we have no constraint information about the two points, the weight is 0.

Viewing pairwise similarities of vertices as an implicit declaration of ML constraints, Cucuringu et al. [17] defined a generalized ML graph that combines the data graph with the ML graph through a parameter expressing the level of trust in the ML constraints. Let $k$ be the number of clusters and $x_i$ be the indicator vector of cluster $C_i$, such that $x_i(j) = 1$ if the $j$th data point belongs to cluster $C_i$ and $x_i(j) = 0$ otherwise. In order to violate as few ML constraints as possible and meet as many CL constraints as possible, the constrained $k$-way cuts problem [17] is formulated as problem (1).

To solve the problem in (1) approximately, Cucuringu et al. [17] relaxed the binary condition on the indicator vectors, allowing them to be real vectors. Thus, the solution vectors of the relaxed problem are the first $k$ nontrivial generalized eigenvectors of the generalized eigenvalue problem (2). After getting the generalized eigenvectors, an additional embedding phase maps the row vectors of the eigenvector matrix onto the $k$-dimensional sphere and gives theoretical guarantees for the clustering results. The detailed embedding procedure can be found in [17]. However, the construction cost and storage cost of data graphs for large scale data sets are both huge ($O(n^2)$). What is more, if the $k$-means clustering on the embedded eigenvector matrix needs many iterations, the process will also be time-consuming over large scale data sets.
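As a rough sketch of the shape of problems (1) and (2), the following uses assumed notation: $L_G$ for the Laplacian of the generalized ML graph and $L_H$ for the Laplacian of the CL graph. These symbols are placeholders and may differ from those in [17]. The objective asks every cluster to cut little ML weight relative to the CL weight it cuts, and relaxing the binary indicator vectors leads to a generalized eigenvalue problem.

```latex
% Sketch in assumed notation; x_1, ..., x_k are indicator vectors of disjoint clusters.
\min_{x_1, \dots, x_k \in \{0,1\}^n} \; \max_{1 \le i \le k}
  \frac{x_i^{T} L_G \, x_i}{x_i^{T} L_H \, x_i}
\qquad \text{(combinatorial problem)}

% After relaxing x_i to real vectors:
L_G \, x = \lambda \, L_H \, x
\qquad \text{(generalized eigenvalue problem)}
```

The numerator $x_i^T L_G x_i$ is the ML weight cut by cluster $C_i$ and the denominator $x_i^T L_H x_i$ is the CL weight it cuts, so a small ratio means few violated ML constraints and many satisfied CL constraints.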

##### 2.2. Landmark-Based Graph Construction

Based on sparse coding theory [40], the landmark-based graph construction [20, 21] scales linearly with the number of data points and can suit large scale data sets very well.

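Let the data set be $X \in \mathbb{R}^{n \times d}$, whose row vectors $x_1, \ldots, x_n$ are the data points. The sparse coding problem is defined as follows:

$$\min_{U, Z} \left\| X^{T} - UZ \right\|_F^2, \tag{3}$$

where each column vector of $U \in \mathbb{R}^{d \times p}$ is a basis vector, the column vectors of $Z \in \mathbb{R}^{p \times n}$ are the representations of the data points over $U$, and $p$ is the number of basis vectors. To avoid the high time complexity of solving the sparse coding problem, landmark-based graph construction simply samples $p$ points randomly from the input data as basis vectors. In the process of computing $Z$, if $u_j$ is among the $r$ nearest basis vectors of data point $x_i$, $z_{ji}$ can be computed as

$$z_{ji} = \frac{K_h(x_i, u_j)}{\sum_{j' \in U_{\langle i \rangle}} K_h(x_i, u_{j'})}, \tag{4}$$

where $U_{\langle i \rangle}$ is the index set of the $r$ nearest basis vectors of $x_i$ and $K_h(\cdot, \cdot)$ is the Gaussian kernel function with bandwidth $h$; otherwise $z_{ji} = 0$.

After obtaining the sparse representation $Z$, the graph affinity matrix $W$ is constructed as follows:

$$W = \hat{Z}^{T} \hat{Z}, \tag{5}$$

where $\hat{Z} = D^{-1/2} Z$ and $D$ is a diagonal matrix with diagonal entries $d_{jj} = \sum_{i} z_{ji}$. Since Chen and Cai [20, 21] have pointed out that $W$ is automatically normalized, the normalized graph Laplacian matrix for $W$ is $L = I - W$. Considering $p \ll n$, landmark-based graph construction takes much less time than nearest neighbors graph construction.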
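The landmark-based construction can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel is taken as $\exp(-\|x - u\|^2 / (2h^2))$, `Z` is stored as a dense $p \times n$ list of lists, and the data, function names, and parameter values are hypothetical. The dense $n \times n$ matrix `W` is materialized only to check the "automatically normalized" property on a toy input; at scale one would keep only the sparse factor.

```python
import math
import random


def landmark_representation(X, landmarks, r, h):
    """Return Z (p x n): column i holds kernel weights of x_i over its r nearest landmarks."""
    p, n = len(landmarks), len(X)
    Z = [[0.0] * n for _ in range(p)]
    for i, x in enumerate(X):
        sq = [sum((a - b) ** 2 for a, b in zip(x, u)) for u in landmarks]
        nearest = sorted(range(p), key=sq.__getitem__)[:r]
        kernel = {j: math.exp(-sq[j] / (2.0 * h * h)) for j in nearest}
        total = sum(kernel.values())
        for j, value in kernel.items():
            Z[j][i] = value / total              # each column of Z sums to 1
    return Z


def landmark_affinity(Z):
    """Return W = Z_hat^T Z_hat with Z_hat = D^{-1/2} Z and D = diag(row sums of Z)."""
    p, n = len(Z), len(Z[0])
    d = [sum(row) for row in Z]
    Z_hat = [[Z[j][i] / math.sqrt(d[j]) for i in range(n)] for j in range(p)]
    return [[sum(Z_hat[j][a] * Z_hat[j][b] for j in range(p)) for b in range(n)]
            for a in range(n)]


rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]   # toy data, n = 30
landmarks = rng.sample(X, 5)                                  # p = 5 landmarks drawn from the data
Z = landmark_representation(X, landmarks, r=3, h=1.0)
W = landmark_affinity(Z)
```

Every row of `W` sums to 1 because $W\mathbf{1} = Z^T D^{-1} Z \mathbf{1} = Z^T \mathbf{1} = \mathbf{1}$, which is why the degree matrix of $W$ is the identity and no further normalization is needed.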

##### 2.3. Spectral Ensemble Clustering

To gain unified results from different basic partitions, spectral ensemble clustering applies spectral clustering to the coassociation matrix [24] derived from the basic partitions. In 2015, Liu et al. [22] transformed spectral ensemble clustering into weighted $k$-means clustering over a specific binary matrix. This transformation decreases the time and space complexities effectively, and our new ensemble clustering method is based on it.

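Given $m$ basic clustering results $\Pi = \{\pi_1, \ldots, \pi_m\}$ of data set $X = \{x_1, \ldots, x_n\}$, the coassociation matrix $S \in \mathbb{R}^{n \times n}$ is constructed in the following way:

$$S_{ij} = \sum_{t=1}^{m} \delta\left(\pi_t(x_i), \pi_t(x_j)\right), \tag{6}$$

where $\pi_t(x_i)$ is the label of $x_i$ in the $t$th clustering result $\pi_t$, and

$$\delta(a, b) = \begin{cases} 1, & a = b, \\ 0, & a \neq b. \end{cases} \tag{7}$$

Viewing this coassociation matrix as an adjacency matrix, spectral ensemble clustering uses spectral clustering to get the final clustering result. In the process of the transformation from spectral clustering to weighted $k$-means clustering, the binary matrix $B$ [22] is built as follows:

$$B = \{b(x_i)\}_{i=1}^{n}, \quad b(x_i) = \langle b(x_i)_1, \ldots, b(x_i)_m \rangle, \quad b(x_i)_t = \langle b(x_i)_{t1}, \ldots, b(x_i)_{t k_t} \rangle, \tag{8}$$

where $k_t$ is the number of clusters in $\pi_t$, $b(x_i)_{tj} = 1$ if $\pi_t(x_i) = j$, and $b(x_i)_{tj} = 0$ otherwise; “$\langle \cdot \rangle$” indicates a row vector. The following lemma [22] presents the connection between spectral ensemble clustering and weighted $k$-means clustering.

Lemma 1 (see [22]). *Given a basic partitions set $\Pi$, let the corresponding coassociation matrix be $S$, the diagonal matrix whose diagonal elements are the sums of the rows of $S$ be $D$, and the diagonal element set of $D$ be $\{d_1, \ldots, d_n\}$. Then normalized cuts spectral clustering on the coassociation matrix $S$ has an objective function equivalent to that of weighted $k$-means clustering on the data set $\{b(x_i)/d_i\}_{i=1}^{n}$ with weight set $\{d_1, \ldots, d_n\}$.*

Through Lemma 1, the space and time complexities of spectral ensemble clustering can be decreased dramatically. However, when the number of basic partitions and the number of clusters are large, the binary matrix will be a high dimensional data set, resulting in long running time for weighted $k$-means clustering.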
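The relation between the coassociation matrix and the binary matrix can be checked on a toy example. The sketch below is illustrative only: the three label lists and all function names are hypothetical. It builds the coassociation matrix by counting agreements, builds the binary matrix by concatenating one-hot encodings, and confirms that the coassociation matrix equals the Gram matrix of the binary rows, which is what allows the $n \times n$ matrix to be replaced by a much narrower one.

```python
def coassociation(partitions):
    """S[i][j] = number of basic partitions that put points i and j in the same cluster."""
    n = len(partitions[0])
    return [[sum(1 for labels in partitions if labels[i] == labels[j])
             for j in range(n)] for i in range(n)]


def binary_matrix(partitions):
    """Row i concatenates the one-hot encodings of point i's label in every partition."""
    n = len(partitions[0])
    rows = [[] for _ in range(n)]
    for labels in partitions:
        clusters = sorted(set(labels))
        for i in range(n):
            rows[i].extend(1 if labels[i] == c else 0 for c in clusters)
    return rows


# Three hypothetical basic partitions of five points.
partitions = [[0, 0, 1, 1, 2],
              [0, 1, 1, 1, 0],
              [1, 1, 0, 0, 0]]
S = coassociation(partitions)
B = binary_matrix(partitions)

# Gram matrix of the binary rows; it should reproduce S exactly.
gram = [[sum(a * b for a, b in zip(bi, bj)) for bj in B] for bi in B]

# Row sums of S: the point weights used by the weighted k-means formulation.
weights = [sum(row) for row in S]
```

The width of `B` is the total number of clusters over all basic partitions (here 3 + 2 + 2 = 7), which is the dimension that random projection is later used to reduce.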

##### 2.4. Random Projection

Recently, random projection has become a common technique of dimensionality reduction [36–39, 41]. Random projection often has low computing complexity and can preserve the structure of original data approximately. In this paper, we use the sparse random projection proposed by Kane and Nelson [34]. When most of the elements of data are zero, the sparse random projection can utilize the sparsity of data effectively and speed up the process of dimensionality reduction.

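Lemma 2 (see [34]). *For any $0 < \varepsilon, \delta < 1/2$, there exists a $d \times t$ sparse random matrix $R$ with $s$ nonzero elements per row, where $t = O(\varepsilon^{-2} \log(1/\delta))$ and $s = O(\varepsilon^{-1} \log(1/\delta))$, such that for any fixed row vector $x \in \mathbb{R}^{d}$*

$$\Pr\left[ (1 - \varepsilon) \|x\|_2^2 \le \|xR\|_2^2 \le (1 + \varepsilon) \|x\|_2^2 \right] \ge 1 - \delta. \tag{9}$$

*And the random matrix $R$ can be constructed as follows:*

$$R = \frac{1}{\sqrt{s}} \left[ D_1 P_1, D_2 P_2, \ldots, D_s P_s \right], \tag{10}$$

*where matrix $P_q$ ($q = 1, \ldots, s$) is a $d \times (t/s)$ sparse matrix with nonzero elements $(P_q)_{j, h_q(j)} = 1$, $h_q$ is a random hashing such that $\Pr[h_q(j) = l] = s/t$ for $l = 1, \ldots, t/s$, and matrix $D_q$ is a $d \times d$ diagonal matrix with $\Pr[(D_q)_{jj} = 1] = \Pr[(D_q)_{jj} = -1] = 1/2$.*

The number of nonzero (nnz) elements of the sparse random matrix $R$ is $ds$, where $d$ is the original dimension and $s$ is the number of nonzero elements per input coordinate, and the time complexity of computing the projection $XR$ is $O(\mathrm{nnz}(X) \cdot s)$. Lemma 2 implies that the sparse random projection approximately preserves the length of data points. Thus, for $n$ data points, since there are $\binom{n}{2}$ pairwise distances, a union bound shows that all squared pairwise distances are preserved within a factor of $1 \pm \varepsilon$ with probability at least $1 - \binom{n}{2} \delta$, where $\varepsilon$ and $\delta$ are the accuracy and failure probability parameters of Lemma 2.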
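A block-hashing sparse projection of this kind can be sketched as follows. This is an assumption-laden illustration rather than the exact construction of [34]: the output coordinates are split into `s` equal blocks (so `s` is assumed to divide `t`), each input coordinate is hashed to one position per block with a fully random hash and a random sign, and the names `sparse_projection_entries` and `project` are hypothetical.

```python
import random


def sparse_projection_entries(d, t, s, rng):
    """For each input coordinate, pick one (output index, sign) pair in each of s blocks."""
    block = t // s                               # assumes s divides t
    return [[(q * block + rng.randrange(block), rng.choice((-1.0, 1.0)))
             for q in range(s)] for _ in range(d)]


def project(x, entries, t, s):
    """Apply the sparse map to a vector x, touching only its nonzero coordinates."""
    y = [0.0] * t
    scale = s ** -0.5                            # each nonzero entry is +-1/sqrt(s)
    for j, value in enumerate(x):
        if value != 0:
            for index, sign in entries[j]:
                y[index] += scale * sign * value
    return y


rng = random.Random(1)
d, t, s = 1000, 200, 8                           # hypothetical sizes
entries = sparse_projection_entries(d, t, s, rng)

x = [0.0] * d
for j in rng.sample(range(d), 20):               # a sparse input with 20 nonzeros
    x[j] = rng.gauss(0, 1)
y = project(x, entries, t, s)

# Ratio of squared lengths after and before projection; it should be close to 1.
ratio = sum(v * v for v in y) / sum(v * v for v in x)
```

The cost of `project` is proportional to the number of nonzeros of `x` times `s`, which mirrors the $O(\mathrm{nnz}(X) \cdot s)$ bound and explains why this projection suits sparse binary matrices.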

#### 3. Fast Constrained Spectral Clustering Framework

In this section, we introduce our fast CSC framework for large scale data sets. Inspired by [20, 21], we also compute the sparse representation $\hat{Z}$ and obtain the approximate adjacency matrix $W = \hat{Z}^{T} \hat{Z}$, where $\hat{Z} \in \mathbb{R}^{p \times n}$ and $p \ll n$. Our fast framework then decreases the size of the graph Laplacian through this approximate graph reconstruction. Finally, we analyse the asymptotic property of our new CSC algorithm.

##### 3.1. Framework Formulation

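To get the generalized eigenvectors approximately, we can let $x = \hat{Z}^{T} y$, where $\hat{Z} \in \mathbb{R}^{p \times n}$ is the sparse representation in (5) and $y \in \mathbb{R}^{p}$. Thus, substituting this $x$ back into (1) significantly decreases the size of the problem if $p \ll n$.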

Specifically, we use to denote constraint matrix, where if edge , if edge , and otherwise. Let adjacency matrix be computed approximately by . Next, bring into (1) and relax their solution over real vectors. Thus, we reformulate the original problem as the following problem.

*Problem 3. *One hasTo obtain shorthand notations, we denote by and denote by . Thus, the first nontrivial generalized eigenvectors of the problem are the solution vectors of (11).

In order to speed up the $k$-means clustering on the embedded eigenvector matrix, we randomly sample a subset of the row vectors of the eigenvector matrix and get $k$ centers through $k$-means clustering over the selected row vectors. According to the distances between the centers and the row vectors, we then partition all the row vectors into different clusters. Cucuringu et al. [17] have pointed out that the specific embedding process applied after getting the generalized eigenvectors concentrates the row vectors of the eigenvector matrix onto the $k$-dimensional sphere, so that a simple partition algorithm such as $k$-means clustering can be applied to get the final clustering result. Since random sampling is a popular scalability method for $k$-means clustering [42], we adopt it to improve the efficiency of clustering the row vectors of the eigenvector matrix. The experimental results in Section 5 also show that random sampling has little influence on the clustering results and makes the algorithm more efficient than the original one.
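The sample-then-assign step can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: it uses plain Lloyd iterations with random initial centers, and the two well-separated Gaussian blobs merely stand in for the embedded rows of the eigenvector matrix. The names `kmeans`, `sampled_kmeans_labels`, and `sample_size` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd iterations started from k random points; returns the k centers."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for point in points:
            groups[nearest(point, centers)].append(point)
        # Keep the old center when a group is empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[c]
                   for c, g in enumerate(groups)]
    return centers


def sampled_kmeans_labels(rows, k, sample_size, rng):
    """Fit centers on a random sample of rows, then label every row by its nearest center."""
    sample = rng.sample(rows, sample_size)
    centers = kmeans(sample, k, rng)
    return [nearest(row, centers) for row in rows]


rng = random.Random(0)
rows = ([[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(200)] +
        [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(200)])
labels = sampled_kmeans_labels(rows, k=2, sample_size=40, rng=rng)
```

Only the `sample_size` sampled rows take part in the iterative phase; the remaining rows need a single nearest-center pass, so the iteration cost no longer grows with the full number of rows.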

Our fast CSC framework is shown in Algorithm 1. In our new algorithm, parameter (in of Step ()) stands for the trust level on constraint information. Since the of the original problem (see (2)) has been taken to a constant in the previous work [17], we also set as a constant.