Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection

Liu, Wenfen; Ye, Mao; Wei, Jianghong; Hu, Xuexian

doi:https://doi.org/10.1155/2017/2658707

Computational Intelligence and Neuroscience

On this page

Abstract Introduction Preliminaries Conclusion Conflicts of Interest Acknowledgments References Copyright Related Articles

Research Article | Open Access

Volume 2017 | Article ID 2658707 | https://doi.org/10.1155/2017/2658707

Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection

Wenfen Liu,^1,2,3Mao Ye ,⁴Jianghong Wei,³and Xuexian Hu³

Academic Editor: Diego Andina

Received07 Mar 2017

Accepted01 Aug 2017

Published25 Sept 2017

Abstract

Constrained spectral clustering (CSC) method can greatly improve the clustering accuracy with the incorporation of constraint information into spectral clustering and thus has been paid academic attention widely. In this paper, we propose a fast CSC algorithm via encoding landmark-based graph construction into a new CSC model and applying random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm has the similar results with the increase of its model size asymptotically; compared with the most efficient CSC algorithm known, the new algorithm runs faster and has a wider range of suitable data sets. Meanwhile, a scalable semisupervised cluster ensemble algorithm is also proposed via the combination of our fast CSC algorithm and dimensionality reduction with random projection in the process of spectral ensemble clustering. We demonstrate by presenting theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of random projection in clustering accuracy proved in the stage of consensus clustering is also suitable for the weighted -means clustering and thus gives the theoretical guarantee to this special kind of -means clustering where each point has its corresponding weight.

1. Introduction

With the arrival of the big data era, data has become an important asset. How to analyse the large scale data efficiently is becoming a big challenge [1, 2]. As an underlying method for data analysis, clustering can partition a data set into several subsets according to the similarities of points [3], and it has become a basic tool for image analysis [4, 5], community detection [6, 7], disease diagnosis [8], and so on. Therefore, more and more attention has been paid to the design of efficient and effective clustering algorithms.

Constrained clustering can improve the accuracy of the clustering result via encoding constraint information into unsupervised clustering. As an important area of clustering, many constrained clustering algorithms [9–17] have been proposed. Since spectral clustering often has high clustering accuracy and the suitability for a wide range of geometries [18, 19], constrained spectral clustering (CSC) [11–17] can usually have better performance than other constrained clustering algorithms. However, the space complexity and time complexity of many CSC algorithms [11–15] restrict their applications over large scale data sets, where is the number of data points. The most efficient CSC algorithm known is SCACS algorithm [16], which reduces the space and time complexities to be linear with through incorporating the landmark-based graph construction [20, 21] with the constrained normalized cuts problem [15]. What is needed to be noticed is that the constrained normalized cuts problem [15] makes SCACS algorithm solve the generalized eigenvector problem twice. In 2016, Cucuringu et al. [17] proposed a new CSC algorithm with better accuracy and shorter running time empirically than constrained normalized cuts problem. Taking a new encoding technique of constraint information, the new CSC model just needs the computation of eigenvectors once.

By means of integrating many basic partitions into a unified partition, ensemble clustering has many excellent properties such as the improvement of clustering quality, the robustness and stability of clustering results, the handling of noise, the reuse of knowledge [3], and the suitability to multisource and heterogeneous data [22]. Researchers have proposed many ensemble clustering algorithms [22–29]. Since there are different notations in different literatures, we call the integration of basic partitions as ensemble clustering or consensus clustering and call the union of the stages of basic clustering and ensemble clustering as cluster ensemble in the following. Among different ensemble clustering methods, the method based on coassociation matrix has become a landmark [22]. Specifically, the coassociation matrix is constructed to represent the similarities of pairs of points from the basic partitions and the final partition result is computed via the graph partition method on the matrix. Thus, this kind of method suffers from the high space and time complexity. Recently, Liu et al. [22] transformed spectral clustering on coassociation matrix to weighted -means clustering over specific binary matrix equivalently, which decreased the space and time complexities vastly. However, when the number of basic partitions or clusters is large, the corresponding binary matrix will be high dimensional.

As the seminal work, Johnson and Lindenstrauss [30] pointed out that the random projection produced by random orthogonal matrix could preserve the pairwise distances of data sets approximately with reduced dimensions. Subsequently, a lot of researches constructed more matrices with the above properties: random Gaussian matrix [31], random sign matrix [32], random matrix based on randomized Hadamard transform [33], random matrix based on block random hashing [34], and so on. In addition, dimensionality reduction with random projection has also been widely applied to data mining methods such as classification [35], clustering [36–38], and anomaly detection [39]. In terms of object function, there are several works [36–38] to prove that random projection can maintain the accuracy of -means clustering approximately. Since its objective function is different from that of -means clustering, the theoretical analysis of the influence of random projection on weighted -means clustering is still scarce.

Our Contribution. In this paper, our contributions can be divided into three parts: the first part is the proposition of a fast CSC algorithm which is suitable for a wide range of data sets; the second part is the analysis of the effect of random projection on the spectral ensemble clustering; the third part is the proposition of a scalable semisupervised cluster ensemble algorithm. More specifically, the contributions are as follows:(i)We propose a fast CSC algorithm whose space and time complexities are linear with the size of a data set: we compress the size of the original model proposed by Cucuringu et al. [17] by the encoding of landmark-based graph construction and improve the efficiency further via random sampling in the process of -means clustering. Besides, we prove that the new CSC algorithm will have the comparable clustering result of the original model asymptotically. Experimental results show that the new algorithm not only can utilize the constraint information effectively, but also costs less running time and fits a wider range of data sets compared to the state of the art SCACS method.(ii)With respect to the difference of objective function caused by random projection, we give a detailed proof that random projection can keep the clustering quality of spectral ensemble clustering within a small factor. Based on this theoretical analysis, we design a spectral ensemble clustering algorithm with reduced dimensions caused by sparse random projection. Experiments over different data sets also verify the correctness of our theoretical results. Moreover, since the theoretical analysis is also suitable for the ordinary weighted -means clustering, the influence of random projection on weighted -means clustering is also obtained.(iii)We propose a scalable semisupervised cluster ensemble algorithm through the combination of the fast CSC algorithm and spectral ensemble clustering algorithm with random projection. The efficiency and effectiveness of the new cluster ensemble algorithm are also demonstrated theoretically and empirically.

The remainder of our paper is organized as follows. In Section 2, we introduce the CSC model of Cucuringu et al. [17], landmark-based graph construction, and two related components in our cluster ensemble algorithm: spectral ensemble clustering and random projection. In Section 3, we present our fast CSC algorithm and give its asymptotic property. Then, the algorithm formulation and theoretical analysis of spectral ensemble clustering with random projection are displayed in Section 4. In Section 5, we show the experiment results of our algorithms. Finally, we draw the conclusions of the article and put forward the future directions in Section 6.

2. Preliminaries

In this section, we present the CSC algorithm proposed by Cucuringu et al. [17] and introduce landmark-based graph construction [20, 21] which will be applied to our fast CSC algorithm. In addition, we also introduce spectral ensemble clustering algorithm [22] and sparse random projection [34] which can be used to speed up the spectral ensemble clustering.

2.1. Constrained Spectral Clustering

Here, we first introduce the notion of undirected graph which is very important in constrained spectral clustering and then show the CSC model proposed by Cucuringu et al. [17].

Let be an undirected graph, where is the vertex set, is the edge set, and is the weight set with respect to the edges. is specially the nonnegative weight of the edge between the vertices and , indicating the level of “affinity” between and . If , there is no edge between the vertices and . We denote as the Laplacian matrix of , where the diagonal entry of diagonal matrix is ; is an adjacency matrix with .

The constrained spectral clustering has three undirected graphs: one data graph and two knowledge graphs and . In data graph , each weight indicates the similarity level of vertices in the corresponding edge. The “must link” (ML) graph gives the “must link” information of vertices: each edge in indicates that the corresponding vertices should be in the same group and the level of “must link” belief is described by the weight. The “cannot-link” (CL) graph has analogous components to . The values of weights in the two knowledge graphs are both nonnegative and set according to the constraint information such as prior knowledge. For example, assuming that the range of value of weight is set from 0 to 1, if we have known that points , are in the same group, their corresponding weight . If we only have confidence in the constraint information that the two points are in the same group, the weight , and if we have no constraint information about these two points, .

Viewing pairwise similarities of vertices as the implicit ML constraints declaration, Cucuringu et al. [17] defined a generalized ML graph where is the level of trust for ML constrains. Let be the number of clusters and be the indicator vector of cluster such that if the th data point belongs to cluster and otherwise. In order to violate as few ML constraints as possible and meet as many CL constraints as possible, the constrained way cuts problem [17] can be described as

To solve the problem in (1) approximately, Cucuringu et al. [17] relaxed the condition “” to be the real vectors. Thus, the solution vectors of the relaxed problem are the first nontrivial generalized eigenvectors of the problem After getting the generalized eigenvectors, an additional embedding phase embeds the row vectors of eigenvectors matrix onto the -dimensional sphere and gives the theoretical guarantees of clustering results. The detailed embedding procedures can be accessed in [17]. However, the construction cost and storage cost of data graphs for large scale data sets are both huge (). What is more, if the number of iterations in the process of -means clustering on the embedded eigenvectors matrix is great, the process will also be time-consuming over large scale data sets.

2.2. Landmark-Based Graph Construction

Based on sparse coding theory [40], the landmark-based graph construction [20, 21] scales linearly with the number of data points and can suit large scale data sets very well.

Let data set be and the row vector of be data points; sparse coding problem is defined as follows: where each column vector of is the basis vector, column vectors of are the representations of data points over and is the number of basis vectors. To avoid the high time complexity of solving sparse coding problem, landmark-based graph construction just samples points randomly from input data as basis vectors. In the process of computing , if is among the nearest basis vectors of data points , can be computed as where is the indices set of the nearest basis vectors of and is Gaussian kernel function with bandwidth ; otherwise .

After obtaining the sparse representation , graph affinity matrix is constructed as follows: where and is a diagonal matrix with diagonal entry . Since Chen and Cai [20, 21] have pointed out that was automatically normalized, the normalized graph Laplacian matrix for is . Considering , the time of computing is much less than the time of the nearest neighbors graph construction.

2.3. Spectral Ensemble Clustering

To gain the unified results from different basic partitions, spectral ensemble clustering applies spectral clustering to the coassociation matrix [24] derived from basic partitions. In 2015, Liu et al. [22] transformed spectral ensemble clustering into weighted means clustering over specific binary matrix. This transformation decreased the time and space complexities effectively and our new ensemble clustering method is based on this nice transformation.

Given basic clustering results of data set ; the coassociation matrix is constructed in the following way: where is the label of in the th clustering result , and

Viewing this coassociation matrix as adjacency matrix, spectral ensemble clustering uses spectral clustering to get final clustering result. In the process of the transformation from spectral clustering to weighted -means clustering, binary matrix [22] is built as follows: where , if , and otherwise; “” indicates a row vector. The following lemma [22] presents the connection between spectral ensemble clustering and weighted -means clustering.

Lemma 1 (see [22]). Given a basic partitions set , let the corresponding coassociation matrix be , the diagonal matrix whose diagonal elements are sums of rows of be , and the diagonal element set of be . Then normalized cuts spectral clustering on coassociation matrix has equivalent objective function to weighted -means clustering on data sets with weight set .
Through Lemma 1, the space and time complexities of spectral ensemble clustering can be decreased dramatically. However, when the number of basic partitions and cluster number are large, the binary matrix will be a high dimensional data set, resulting in long running time for weighted -means clustering.

2.4. Random Projection

Recently, random projection has become a common technique of dimensionality reduction [36–39, 41]. Random projection often has low computing complexity and can preserve the structure of original data approximately. In this paper, we use the sparse random projection proposed by Kane and Nelson [34]. When most of the elements of data are zero, the sparse random projection can utilize the sparsity of data effectively and speed up the process of dimensionality reduction.

Lemma 2 (see [34]). For any , , there exists an sparse random matrix , where and , such that for any fixed And the random matrix can be constructed as follows: where matrix () is a sparse matrix with nonzero elements , is a random hashing such that for , and matrix is a diagonal matrix with .

The number of nonzero (nnz) elements of sparse random matrix is , and the time complexity of is nnz. Lemma 2 implies that the sparse random projection can preserve the length of data points approximately. Thus, for data points, since there are pairwise distances, we can conclude that the pairwise distances squares can be preserved within a factor of with .

3. Fast Constrained Spectral Clustering Framework

In this section, we introduce our fast CSC framework for large scale data sets. Inspired by [20, 21], we also try to compute the sparse representation and obtain the approximate adjacency matrix , where , and . Then, our fast framework decreases the size of graph Laplacian through the above approximate graph reconstruction. At last, we analyse the asymptotic property of our new CSC algorithm.

3.1. Framework Formulation

To get the generalized eigenvector approximately, we can let , where is the sparse representation in (5) and . Thus, bringing the back to (1) can decrease the size of problem apparently if .

Specifically, we use to denote constraint matrix, where if edge , if edge , and otherwise. Let adjacency matrix be computed approximately by . Next, bring into (1) and relax their solution over real vectors. Thus, we reformulate the original problem as the following problem.

Problem 3. One hasTo obtain shorthand notations, we denote by and denote by . Thus, the first nontrivial generalized eigenvectors of the problem are the solution vectors of (11).

In order to speed up the -means clustering on the embedded eigenvector matrix, we sample row vectors of eigenvectors matrix randomly and get centers through -means clustering over the selected row vectors. According to the distances between centers and row vectors, we can partition all the row vectors into different clusters. Cucuringu et al. [17] have pointed out that the specific embedding process after getting the generalized eigenvectors can concentrate the row vectors of eigenvector matrix onto the -dimensional sphere and a simple partition algorithm such as -means clustering can be applied to get the final clustering result. Since random sampling is a popular scalability method for -means clustering [42], we will take it to improve the efficiency of the clustering on the row vectors of eigenvector matrix. The experimental results in Section 5 also show that random sampling has little influence on the clustering results and makes the algorithm more efficient than the original one.

Our fast CSC framework is shown in Algorithm 1. In our new algorithm, parameter (in of Step ()) stands for the trust level on constraint information. Since the of the original problem (see (2)) has been taken to a constant in the previous work [17], we also set as a constant.

Input: data set , the number of landmark points , constraint matrix , cluster number ,
confidence parameter , sample rate ;
Output: the grouping result.
() Compute the sparse representation in Equation (5);
() Compute Laplacian and , where
is the Laplacian matrix of , is the Laplacian matrix of ;
() Solve the first non-trivial generalized eigenvectors of Equation (12);
() Compute ;
() Embed into a -dimensional sphere using the embedding process in [17];
() Sample row vectors of randomly and run -means clustering on the sampled row vectors;
() Get the clustering result utilizing distances between centers of -means clustering and row vectors of .

The complexity analysis of Algorithm 1 is presented as follows. The time of computing is . In Step (), the is computed as follows:Let the number of data points with constraint information be ; then the time cost for computing is . Hence, the time cost of Steps () and () is . Besides, the time complexity of Step () is , that of Step () is , and that of Step () is . Thus, the time cost of the first 5 steps is considering and . Assuming the iteration numbers of -means clustering are , the time cost of Steps () and () is , which is much less than the time cost of -means clustering on with . Hence, the time complexity of our algorithm is Since three matrices , , and are stored, the memory complexity is

3.2. Asymptotic Property of the Framework

In this subsection, we show that the partition result of our fast CSC algorithm could be comparable to that of the original model [17] as converges to .

Theorem 4. Assuming the adjacency matrix in the original model is full rank, the result of Step () in Algorithm 1 will converge to the generalized eigenvectors of (2) as converges to .

Proof. From the construction of sparse representation , we can get that where is the normalized adjacency matrix. Equation (12) can be rewritten as Equally, we have that Since the rank of will be equal to , can be removed. Thus the equation will be This equation shows that and in Step () of Algorithm 1 are indeed the eigenvector and eigenvalue of (2), respectively. Moreover, the number of eigenvectors of (19) will converge to as converges . Hence Algorithm 2 could also get all the eigenvectors of (2) asymptotically.

Input: binary matrix , weights set , cluster number .
Output: the final partition result.
() Generate a sparse random matrix meeting the requirements of Lemma 2,
where , , , ;
() Compute , where is a diagonal matrix with diagonal entries ;
() Compute ;
() Run weighted -means clustering on with weight set to obtain the final clustering result.

Since the eigenvectors of our framework will converge to that of original CSC model [17] and the random sampling has little influence on the clustering result of embedded eigenvectors matrix, our new CSC algorithm will generate the partition result which is comparable to that of original framework. In addition, the reason why we give the assumption of Theorem 4 is that each row vector of adjacency matrix is the similarity representation of certain point over the whole data set, and those representations are often linearly independent. In the experiments, we have demonstrated this theory empirically on the 30 nearest neighbors adjacency matrices of three data sets.

4. Spectral Ensemble Clustering with Random Projection

In this section, we propose an improved spectral ensemble clustering algorithm with random projection. The new ensemble clustering not only improves the efficiency of spectral ensemble clustering algorithm designed by Liu et al. [22], but also can theoretically preserve the approximate clustering result.

4.1. Algorithm Formulation

In this subsection, we give the detailed procedure of our new spectral ensemble clustering algorithm. We denote the original spectral ensemble clustering [22] by SEC and our improved spectral ensemble clustering with random projection by SECRP.

From the description of Section 2.3, we can know that the SEC algorithm transforms the spectral clustering on the coassociation matrix into weighted -means clustering on the specific binary matrix . The dimension of binary matrix is , where is the cluster number of basic partition . When the number of clusters and/or basic partitions is big, is probably a high dimensional matrix on which the weighted -means clustering runs slowly.

To avoid the high dimensions of , we design an improved SEC algorithm with random projection for dimensionality reduction. The new algorithm SECRP is showed in Algorithm 2.

The complexity analysis of the new algorithm is as follows. Obviously, the running time of Steps () and () is very short (compared with that of Step ()). The time of Step () is , where is the number of basic partitions; denotes the number of nonzero entries. Another common method of dimensionality reduction is singular value decomposition (SVD). The time of running SVD on binary matrix is , and that of the product between eigenvectors and is . Since , random projection with sparse random matrix is a cost-effective method of dimensionality reduction. With respect to the weighted -means clustering, dimensionality reduction of random projection can decrease the running time of each iteration from to .

As a basic module, Algorithm 2 can be combined with different basic partition methods to produce different cluster ensemble algorithms. Thus, taking Algorithm 1 as the basic partition algorithm for Algorithm 2 could generate an efficient constrained cluster ensemble method with high accuracy (both basic partitions and final clustering are spectral clustering). Moreover, the last two steps of Algorithm 2 are just weighted -means clustering with sparse random projection, which is also suitable for any other applications of weighted -means clustering.

4.2. Theoretical Analysis of New Ensemble Algorithm

In this subsection, we demonstrate that our new algorithm SECRP can maintain the clustering result of SEC approximately.

For the theoretical analysis, we give the formal definition of weighted -means clustering problem with matrix notation:

Definition 5 (weighted -means clustering problem). Given an points set (each row is a data point), diagonal matrix whose diagonal entries set is weights set and clusters number find an indicator matrix such that where denotes the square of Frobenius norm; is selected from the set of all indicator matrices. An indicator matrix has one nonzero element on each row. Specifically, if the th point belongs to the th cluster, , where denotes the sum of weights points in cluster .
Since computing is an NP-hard problem, we focus on the approximate algorithm for weighted -means clustering. The corresponding definition is as follows.

Definition 6 (weighted -means approximation algorithm). An algorithm is called the “-approximation” for weighted -means clustering problem, if the algorithm takes , , and as input and outputs an indicator matrix such that where is the approximation factor and is the failure probability of the “-approximation” weighted -means clustering algorithm.
Though there is the -approximation -means clustering algorithm such as [43], it is unclear whether the -approximation weighted -means clustering algorithm exists or not. To facilitate the proof of our theory, we assume that the approximation algorithm exists and utilize the definition of approximation algorithm in the process of proof. And we will take the weighted version of the classical -means clustering algorithm [44] as the weighted -means clustering to verify our theoretical results in the following experiments.

Theorem 7. Let matrix , weight set , and cluster number be the inputs of Algorithm 2. Let . Assuming that a -approximation weighted -means clustering algorithm exists, then the output of Algorithm 2 satisfies with probability of at least :In the above, is the computing result of Step () in Algorithm 2; is the optimal solution of weighted -means clustering on .

This theorem reveals that random projection not only can be used to improve the efficiency of spectral ensemble clustering with lower dimensions, but also maintains its final result approximately.

In the following, we present a useful lemma which is needed in the proof of Theorem 7. The results of the lemma are based on the results of [36] and Lemma 2.

Lemma 8. Let , , , , and be the same as those in Theorem 7; denote by , the product of top singular vectors (left and right) and singular values of by .
(1) (Lemma of [36]) Let the SVD of be , where and are the left and right singular vector matrices; is a diagonal matrix whose diagonal elements are the singular values. With probability of at least 0.97,where is the pseudoinverse of matrix; is an matrix with .
(2) (Lemma of [36]) For any matrix , with probability of at least 0.99,(3) (Combination of Lemmas and of [36]) With probability of at least 0.99,

These conclusions are all about the influences of random matrix on the norms of different matrices, which are useful for bounding the norms of the matrices in Theorem 7. In the following proof of Theorem 7, we start by decomposing the term in (22). Then, based on the influences of random matrix in Lemma 8, we manipulate the norms of the different terms in the decomposition result.

Proof. Using the notation of Lemma 8, (22) can be decomposed into where . The last equation is based on the orthogonality of and .
We first give the bound of the second term of (26). According to our definition of indicator matrix, . Thus, is a projector matrix; namely, its norm is 1. As a result, we get where the second inequality is caused by the fact that and the optimality of SVD.
We next bound the first term of (26). From the first statement of Lemma 8, we get From Definition 6 and the meaning of of Theorem 7, we get Using the statement 2 of Lemma 8, (29) can be transformed to Combining the statement 3 of Lemma 8 and (30), we get From (28) and (31), and rescaling , we can get Finally, combining (27) and (32) concludes the proof.

It is easy to check that the above theoretical analysis can be also applied to ordinary weighted -means clustering, indicating that the method of dimensionality reduction with random projection can preserve the clustering quality of weighted means clustering approximately. Furthermore, the integration of Theorems 4 and 7 means that the new semisupervised cluster ensemble method (combination of Algorithms 1 and 2) can have an encouraging clustering result.

5. Experiments

In this section, we present the experimental results of our new algorithms in Sections 3 and 4. We implemented all the related algorithms in Matlab and conducted our experiments on a Windows machine with the Intel Core 3.6 GHz processor and 16 GB of RAM.

5.1. Data Sets and Experimental Settings

In order to facilitate the comparison, we performed experiments on three data sets which can be achieved from public web sites (http://archive.ics.uci.edu/ml/), (http://www.cad.zju.edu.cn/home/dengcai/). Table 1 summarizes their basic information.

The constraint information is generated from the real labels of data sets. In our experiments, we sample the labeled points randomly from data sets. The constraint matrix is constructed as

The validation measures of the partition result used in our experiments are cluster accuracy (CA) [45] and normalized mutual information (NMI) [25]. The CA is computed as where is the cluster number of clustering result, is the number of data points, is the maximum number of points with the same true label in the th cluster. For computing the NMI, we construct two random variables and from the clustering result and true label, respectively. The probability distributions of random variables are the proportions of different clusters (or classes) over the whole data set. The NMI is computed as follows: where denotes the mutual information of random variables and , denotes the entropy of a random variable, is the number of data points, is the number of points in both cluster and class , is the points number of cluster , and is the points number of class . The values of CA and NMI both vary from 0 and 1, and the higher value means better clustering solution.

5.2. Comparisons of Different Constrained Spectral Clustering

In this subsection, we compare our fast CSC (constrained spectral clustering) algorithm with other spectral clustering algorithms. Following is the list of information of different algorithms in comparison:(i)LSC- [20, 21]: the unsupervised spectral clustering baseline with landmark-based graph construction.(ii)SCACS [16]: the most efficient CSC algorithm known and be set as the CSC baseline over MNIST and CoverType data sets.(iii)CCS [17]: the original CSC model proposed in [17], set as the CSC baseline over LetterRec data set. (Since the constructions of the nearest neighbors graphs are both time-consuming on MNIST and CoverType data sets, we do not run CCS algorithm on these two data sets.)(iv)CCS-L: our improved CCS algorithm with landmark-based graph construction.(v)CCS-LS: our improved CCS algorithm with landmark-based graph construction and random sampling.

In the process of the landmark-based graph construction, we fix the number of landmark points and the number of nearest neighbors . The parameters in SCACS algorithm that we used are , which is the same as those in [16]. Since in the original model CCS [17] it has been pointed out that could be a constant number and was set to 5 in their implementation code, we also set in CCS, CCS-L, and CCS-LS.

First, we investigate the influence of the number of labeled points on the performance of algorithms. We vary the value of from 100 to 1000 with step size 100. For each value of , we select the labeled points randomly to produce constraint information and repeat 20 trials with different labeled points sets. The corresponding experimental results are presented in Figure 1. Figures 1(a), 1(b), and 1(c) are related to CA of clustering results, Figures 1(d), 1(e), and 1(f) are related to NMI, and Figures 1(g), 1(h), and 1(i) are related to running time. We can see that our algorithm CCS-LS outperforms LSC-R on all data sets and the values of CA and NMI increase with the growth of constraint information. Those indicate that our algorithm can employ the constraint information appropriately. Compared with SCACS, our algorithm has the similar performances on LetterRec and MNIST data sets and superior performances on CoverType data set, indicating that our algorithm adapts a wider range of geometries. Over the three data sets, the performances of CCS-LS are all close to CCS-L. What is more, our algorithm runs fastest among these algorithms.

(a) LetterRec

(b) MNIST

(c) CoverType

(d) LetterRec

(e) MNIST

(f) CoverType

(g) LetterRec

(h) MNIST

(i) CoverType

Next, we study the influence of random sampling (Step () of Algorithm 1) which can be seen in Figure 2. In the experiments, we fix and change the sample rate from 0.1 to 1 by a step size 0.1. We still run 20 independent trials considering the randomness and compute the means of validity measures. We can see that the values of CA and NMI vary slightly along with the growth of sample rate, verifying the feasibility of random sampling.

5.3. Performance of the Spectral Ensemble Clustering with Random Projection

Since cluster ensemble consists of two parts: basic partition clustering and ensemble clustering, we below combine different basic partition clustering algorithms and different ensemble clustering algorithms to get different cluster ensemble algorithms. Thus, the performance of new ensemble clustering algorithm (Algorithm 2) and new cluster ensemble algorithm (combination of Algorithms 1 and 2) can both be manifested. Following is the list of information of different cluster ensemble algorithms in comparison:(i)CK-: the basic partition clustering algorithm “CK” is the constrained -means clustering algorithm [9], and the ensemble clustering algorithm “SE” is the spectral ensemble clustering (SEC) algorithm [22].(ii)SCACS-: the basic partition clustering algorithm is SCACS [16] in Section 5.2, and the ensemble clustering algorithm is also SE [22].(iii)CCSS-SE: the basic partition clustering algorithm “CCSS” is our fast CSC algorithm (Algorithm 1), and the ensemble clustering algorithm is also SE [22].(iv)CCSS-SER: the basic partition clustering algorithm is CCSS, and the ensemble clustering algorithm “SER” is our spectral ensemble clustering with random projection (Algorithm 2).

In the phase of basic partition clustering, we fix the number of basic partitions as 50 and the parameters of basic clustering algorithms are the same as those in the last subsection. In addition, similar to the operation of SE [22], the basic partitions are obtained by varying the cluster number from to . We repeat each cluster ensemble algorithm 10 times and present the average values of results.

First, we show the comparison of different cluster ensemble algorithms in terms of different constraint information in Figure 3. Here the dimensionality of CCSS-SER reduced by random projection is 40 and we change the number of labeled points from 100 to 1000 with step size 100. In the figure, the validity measures of Figures 3(a)–3(c) and Figures 3(d)–3(f) are related to CA and NMI, respectively. Just like the results of last subsection, CCSS-SE has similar performance to that of SCACS-SE on LetterRec and MNIST data sets and has much better performance on CoverType data set. From the comparison between Figure 1 and 3, we can see that the two validity measures are both higher than those of the basic partition dramatically, verifying ensemble clustering’s improvement in clustering quality. Compared with CK-SE, CCSS-SE and CCSS-SER both have better performance significantly, which indicates that the basic partitions have an obvious impact on the final result and also verify the high accuracy of our new constrained spectral cluster ensemble method. In addition, the little difference of performance between CCSS-SE and CCSS-SER implies that the random projection can preserve the results of spectral ensemble clustering approximately on different constraint information.

(a) LetterRec

(b) MNIST

(c) CoverType

(d) LetterRec

(e) MNIST

(f) CoverType

Second, we inspect the influence of dimensions of random projection on the performance of our algorithm in Figure 4 and Table 2. In Figure 4, the “SEC-SVD” denotes the SEC algorithm with dimensionality reduction of SVD. When is above certain bound, the validity measures of “SECRP” (denote our algorithm SECRP) are almost stable and similar to those of SEC over all three data sets. This indicates that the accuracy of clustering algorithm can be kept when the dimensions surpass a certain bound, which verifies Theorem 7. The small bound of dimensions () also reveals the effectiveness of dimensionality reduction of random projection. With respect to SEC-SVD, although it can also preserve the accuracy of clustering algorithm, its running time is not encouraging. Even letting , the running time comparisons of original algorithm and SVD method over three data sets are 3.47 s/10.85 s, 4.91 s/14.54 s, and 22.06 s/326.61 s. These phenomena may be caused by the tardiness of SVD on large matrix and the breaking of sparseness of binary matrix . In Table 2, the decrease of running time verifies the efficiency of our new spectral ensemble clustering. Combining this and subfigures (g,h,i) in Figure 1, the efficiency of new constrained cluster ensemble method is also verified. In addition, we can see the decrease of running time caused by random projection is declining with the growth of dimensions, indicating the relative small dimensionality with random projection is preferable.

(a) LetterRec

(b) MNIST

(c) CoverType

(d) LetterRec

(e) MNIST

(f) CoverType

6. Conclusion

To handle large scale data sets, we propose a fast CSC algorithm. The new algorithm can decrease the space and time complexity of a recently introduced CSC model through landmark-based graph construction and improve its efficiency further by random sampling. The new algorithm not only has the similar property of original model asymptotically, but also is the most efficient and suitable to a wide range of data sets empirically. Taking the new CSC algorithm as basic partition algorithm, we design an efficient semisupervised cluster ensemble algorithm. In the stage of consensus clustering, we reduce the dimensionality of input of spectral ensemble clustering by sparse random projection and prove that the sparse random projection can keep the clustering quality approximately. The experimental results over several data sets also verify the efficiency and effectiveness of new cluster ensemble algorithm. Moreover, in the process of spectral ensemble clustering, the influence analysis of dimensionality reduction with random projection can also give the theoretical guarantee for the weighted -means clustering with random projection. In the future, we will use techniques such as applying several different basic partition methods, selecting the results of basic partitions, and giving different weights for basic partitions to improve the performance of our cluster ensemble algorithm further.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Nature Science Foundation of China under Grants 61502527 and 61379150 and in part by the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) (no. SKLNST-2013-1-06).

References

J. Shen, D. Liu, J. Shen, Q. Liu, and X. Sun, “A secure cloud-assisted urban data sharing framework for ubiquitous-cities,” Pervasive and Mobile Computing, 2017.
View at: Publisher Site | Google Scholar
Q. Liu, W. Cai, J. Shen, Z. Fu, X. Liu, and N. Linge, “A speculative approach to spatial‐temporal efficiency with multi‐objective optimization in a heterogeneous cloud environment,” Security and Communication Networks, vol. 9, no. 17, pp. 4002–4012, 2016.
View at: Publisher Site | Google Scholar
C. C. Aggarwal and C. K. Reddy, Eds., Data Classification: Algorithms and Applications, CRC Press, New York, NY, USA, 2014.
View at: MathSciNet
Y. Zheng, B. Jeon, D. Xu, Q. M. J. Wu, and H. Zhang, “Image segmentation by generalized hierarchical fuzzy C-means algorithm,” Journal of Intelligent and Fuzzy Systems, vol. 28, no. 2, pp. 961–973, 2015.
View at: Publisher Site | Google Scholar
Z. Zhou, Q. J. Wu, F. Huang, and X. Sun, “Fast and accurate near-duplicate image elimination for visual sensor networks,” International Journal of Distributed Sensor Networks, vol. 13, no. 2, 2017.
View at: Publisher Site | Google Scholar
H. Rong, T. Ma, M. Tang, and J. Cao, “A novel subgraph K+-isomorphism method in social network based on graph similarity detection,” Soft Computing, pp. 1–19, 2017.
View at: Publisher Site | Google Scholar
T. Ma, Y. Wang, M. Tang et al., “LED: a fast overlapping communities detection algorithm based on structural clustering,” Neurocomputing, vol. 207, pp. 488–500, 2016.
View at: Publisher Site | Google Scholar
Z. Yu, H. Chen, J. You et al., “Adaptive fuzzy consensus clustering framework for clustering analysis of cancer data,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 12, no. 4, pp. 887–901, 2015.
View at: Publisher Site | Google Scholar
K. Wagstaff, C. Cardie, S. Rogers, and S. Schrödl, “Constrained k-means clustering with background knowledge,” in Proceedings of the 18th International Conference on Machine Learning (ICML '01), pp. 577–584, Williams College, Williamstown, Mass, USA, June 2001.
View at: Google Scholar
E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell, “Distance metric learning with application to clustering with side-information,” in Proceedings of the 15th International Conference on Neural Information Processing Systems (NIPS '02), pp. 505–512, Vancouver, Canada, December 2002.
View at: Google Scholar
S. D. Kamvar, D. Klein, and C. D. Manning, “Spectral learning,” in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI '03), pp. 561–566, Acapulco, Mexico, August 2003.
View at: Google Scholar
Z. Lu and M. Á. Carreira-Perpiñán, “Constrained spectral clustering through affinity propagation,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), Anchorage, Ala, USA, June 2008.
View at: Publisher Site | Google Scholar
Z. Li, J. Liu, and X. Tang, “Constrained clustering via spectral regularization,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 421–428, Miami, Fla, USA, June 2009.
View at: Publisher Site | Google Scholar
Z. Lu and H. H. Ip, “Constrained spectral clustering via exhaustive and efficient constraint propagation,” in Proceedings of the 11th European Conference on Computer Vision on Computer Vision (ECCV '10), vol. 6316, pp. 1–14, Crete, Greece, September 2010.
View at: Google Scholar
X. Wang, B. Qian, and I. Davidson, “On constrained spectral clustering and its applications,” Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 1–30, 2014.
View at: Publisher Site | Google Scholar | MathSciNet
J. Li, Y. Xia, Z. Shan, and Y. Liu, “Scalable constrained spectral clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 2, pp. 589–593, 2015.
View at: Publisher Site | Google Scholar
M. Cucuringu, I. Koutis, S. Chawla, G. L. Miller, and R. Peng, “Simple and scalable constrained clustering: a generalized spectral method,” in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS '16), pp. 445–454, Cadiz, Spain, May 2016.
View at: Google Scholar
A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: analysis and an algorithm,” in Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS '01), pp. 849–856, Vancouver, Canada, December 2001.
View at: Google Scholar
J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
View at: Publisher Site | Google Scholar
X. Chen and D. Cai, “Large scale spectral clustering with landmark-based representation,” in Proceedings of the 25th AAAI Conference on Artificial Intelligence and the 23rd Innovative Applications of Artificial Intelligence Conference, pp. 313–318, August 2011.
View at: Google Scholar
D. Cai and X. Chen, “Large scale spectral clustering via landmark-based sparse representation,” IEEE Transactions on Cybernetics, vol. 45, no. 8, pp. 1669–1680, 2015.
View at: Publisher Site | Google Scholar
H. Liu, T. Liu, J. Wu, D. Tao, and Y. Fu, “Spectral ensemble clustering,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15), pp. 715–724, Sydney, Australia, August 2015.
View at: Publisher Site | Google Scholar
X. Z. Fern and C. E. Brodley, “Random projection for high dimensional data clustering: a cluster ensemble approach,” in Proceedings of the 20th International Conference on Machine Learning (ICML '03), vol. 3, pp. 186–193, August 2003.
View at: Google Scholar
A. L. N. Fred and A. K. Jain, “Combining multiple clusterings using evidence accumulation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 835–850, 2005.
View at: Publisher Site | Google Scholar
A. Strehl and J. Ghosh, “Cluster ensembles—a knowledge reuse framework for combining multiple partitions,” Journal of Machine Learning Research, vol. 3, no. 3, pp. 583–617, 2003.
View at: Publisher Site | Google Scholar | MathSciNet
J. Wu, H. Liu, H. Xiong, and J. Cao, “A theoretic framework of K-means-based Consensus Clustering,” in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), pp. 1799–1805, Beijing, China, August 2013.
View at: Google Scholar
Z. Yu, L. Li, J. Liu, J. Zhang, and G. Han, “Adaptive noise immune cluster ensemble using affinity propagation,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 12, pp. 3176–3189, 2015.
View at: Publisher Site | Google Scholar
Z. Yu, P. Luo, J. You et al., “Incremental semi-supervised clustering ensemble for high dimensional data clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 701–714, 2016.
View at: Publisher Site | Google Scholar
M. Ye, W. Liu, J. Wei, and X. Hu, “Fuzzy c-means and cluster ensemble with random projection for big data clustering,” Mathematical Problems in Engineering, vol. 2016, Article ID 6529794, 13 pages, 2016.
View at: Publisher Site | Google Scholar | MathSciNet
W. B. Johnson and J. Lindenstrauss, “Extensions of lipschitz mappings into a hilbert space,” Contemporary Mathematics, vol. 26, pp. 189–206, 1984.
View at: Google Scholar
P. Indyk and R. Motwani, “Approximate nearest neighbors: towards removing the curse of dimensionality,” in Proceedings of the 13th Annual ACM Symposium on Theory of Computing, pp. 604–613, ACM, 1998.
View at: Google Scholar | MathSciNet
D. Achlioptas, “Database-friendly random projections: Johnson-Lindenstrauss with binary coins,” Journal of Computer and System Sciences, vol. 66, no. 4, pp. 671–687, 2003.
View at: Publisher Site | Google Scholar | MathSciNet
J. A. Tropp, “Improved analysis of the subsampled randomized Hadamard transform,” Advances in Adaptive Data Analysis. Theory and Applications, vol. 3, no. 1-2, pp. 115–126, 2011.
View at: Publisher Site | Google Scholar | MathSciNet
D. M. Kane and J. Nelson, “Sparser Johnson-Lindenstrauss transforms,” Journal of the ACM, vol. 61, no. 1, article 4, 2014.
View at: Publisher Site | Google Scholar | MathSciNet
S. Paul, C. Boutsidis, M. Magdon-Ismail, and P. Drineas, “Random projections for linear support vector machines,” ACM Transactions on Knowledge Discovery from Data, vol. 8, no. 4, article 22, 2014.
View at: Publisher Site | Google Scholar
C. Boutsidis, A. Zouzias, and P. Drineas, “Random projections for κ-means clustering,” in Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS '10), pp. 298–306, December 2010.
View at: Google Scholar
C. Boutsidis, A. Zouzias, M. W. Mahoney, and P. Drineas, “Randomized dimensionality reduction for c-means clustering,” IEEE Transactions on Information Theory, vol. 61, no. 2, pp. 1045–1062, 2015.
View at: Publisher Site | Google Scholar | MathSciNet
M. B. Cohen, S. Elder, C. Musco, C. Musco, and M. Persu, “Dimensionality reduction for k-means clustering and low rank approximation,” in Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC '15), pp. 163–172, June 2015.
View at: Publisher Site | Google Scholar
Q. Ding and E. D. Kolaczyk, “A compressed PCA subspace method for anomaly detection in high-dimensional data,” IEEE Transactions on Information Theory, vol. 59, no. 11, pp. 7419–7433, 2013.
View at: Publisher Site | Google Scholar
H. Lee, A. Battle, R. Raina, and A. Y. Ng, “Efficient sparse coding algorithms,” in Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS'06), pp. 801–808, Vancouver, Canada, 2006.
View at: Google Scholar
M. Popescu, J. Keller, J. Bezdek, and A. Zare, “Random projections fuzzy c-means (RPFCM) for big data clustering,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE '15), pp. 1–6, Istanbul, Turkey, August 2015.
View at: Publisher Site | Google Scholar
J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques: Concepts and Techniques, Elsevier, 2011.
A. Kumar, Y. Sabharwal, and S. Sen, “A simple linear time (1+)-approximation algorithm for k-means clustering in any dimensions,” in Proceedings of the 45th Symposium on Foundations of Computer Science (FOCS '04), pp. 454–462, Rome, Italy, October 2004.
View at: Google Scholar
D. Arthur and S. Vassilvitskii, “k-means++: the advantages of careful seeding,” in Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, 2007.
View at: Google Scholar | MathSciNet
A. Fahad, N. Alshatri, Z. Tari et al., “A survey of clustering algorithms for big data: taxonomy and empirical analysis,” IEEE Transactions on Emerging Topics in Computing, vol. 2, no. 3, pp. 267–279, 2014.
View at: Publisher Site | Google Scholar

Copyright

Copyright © 2017 Wenfen Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Views

1709

Downloads

1274

Citations