Abstract

Many studies have shown that multi-view clustering can achieve better performance on complete multi-view data. However, real-world data often have samples missing from some views and contain only a small number of labeled samples. Moreover, almost all existing multi-view clustering models cannot handle incomplete multi-view data well and fail to fully exploit the labeled samples to reduce computational complexity, which limits their practical application. To address these problems, this paper proposes a novel framework called Semi-supervised Multi-View Clustering with Weighted Anchor Graph Embedding (SMVC_WAGE), which is conceptually simple and efficiently generates high-quality clustering results in practice. Specifically, we introduce a simple and effective anchor strategy. Based on the selected anchor points, we exploit intrinsic and extrinsic view information to bridge all samples and capture more reliable nonlinear relations, which greatly enhances efficiency and improves stability. Meanwhile, we construct a global fused graph compatibly across multiple views via a nearly parameter-free graph fusion mechanism that directly coalesces the view-wise graphs. As a result, the proposed method not only handles complete multi-view clustering well but can also be easily extended to incomplete multi-view cases. Experimental results show that our algorithm outperforms several state-of-the-art competitors in clustering ability and time cost.

1. Introduction

In many practical applications, a growing amount of real-world data naturally appears in multiple views, known as multi-view data, in which the data are characterized by different attributes or collected from diverse sources. For example, an image can be described by different features, such as SIFT (Scale-Invariant Feature Transform), HOG (Histogram of Oriented Gradients), and LBP (Local Binary Patterns) [1]; a specific news story can be reported by multiple news organizations [2]; and a web page can be represented by its links, texts, and images [3]. In other words, each of these objects is described by different characteristics, and each characteristic is referred to as one view of the object. Although an individual view generally contains rich information for machine learning tasks, relying on a single view ignores the consistent and complementary information across multiple views [4]. Properly exploiting such information can improve the performance of various machine learning tasks. Therefore, it is critical to consider how to leverage such information effectively.

Multi-view clustering, which adaptively partitions data into groups by utilizing the consistency or complementarity principle among multiple views, is a very popular research direction. From the perspective of the techniques involved, most existing studies can be roughly classified into three types: matrix factorization-based, graph-based, and subspace-based approaches. As Kang et al. [5] pointed out, matrix factorization-based approaches seek a common matrix among different views, graph-based approaches explore a common affinity graph, and subspace-based approaches learn a consensus low-dimensional subspace. Therefore, the key to achieving high performance in multi-view clustering is to generate an optimal consistent representation. To this end, many multi-view clustering models have been presented [6–17] and widely used in various real-world scenarios, for instance, object recognition [18], feature selection [19], and information retrieval [20].

The aforementioned multi-view clustering approaches share a basic assumption that all views are complete. However, in real-world applications, it is very common for samples to be missing from some views for many reasons, such as human error or temporary sensor failure. Consequently, previous complete multi-view methods cannot work well in this scenario, since the pairwise information of samples missing some views cannot be used directly. To apply conventional multi-view clustering algorithms to an incomplete dataset, we can either remove the incomplete samples or fill in the missing information during pre-processing. Nevertheless, these pre-processing methods cause the original data to lose information or introduce noise, which inevitably degrades conventional multi-view clustering methods or even makes them fail. Therefore, incomplete multi-view clustering has drawn increasing interest recently, and many attempts have been made to tackle this problem [2, 21–26].

Moreover, in some practical applications, real-world data usually contain a small number of labeled samples. The aforementioned methods are unsupervised and cannot leverage such prior information to improve performance, which limits their application. In practice, labeled samples are often available, and exploiting them efficiently can significantly improve clustering performance and reduce clustering time. Motivated by this, several advanced semi-supervised multi-view clustering frameworks have recently been developed to perform various clustering tasks [27–33]. However, most of these methods learn the optimal common indicator matrix from multiple views via alternating optimization, which leads to high computational complexity and hinders their wide use.

In view of the above issues, we present a new framework called Semi-supervised Multi-View Clustering with Weighted Anchor Graph Embedding (SMVC_WAGE), which is conceptually simple and efficiently generates high-quality clustering results in practice. SMVC_WAGE employs intrinsic consistency and extrinsic complementary information to seek the optimal fused graph that spans multiple views compatibly in structure. Specifically, we apply anchor graph learning to bridge all intra-view samples, which greatly enhances efficiency and improves stability. Moreover, this also resolves the dilemma that samples sharing no common views cannot be used directly to compute cross-view similarities. Besides, instead of regularizing or weighting the loss of each view in the conventional way, the proposed method directly combines the graphs of different views to construct the global optimal fused graph, where the weights are learned in a nearly parameter-free manner. Therefore, by exploring an anchor selection strategy based on labeled samples and designing a weighted fusion mechanism for multiple views simultaneously, the proposed method not only handles complete multi-view clustering well but can also be easily extended to incomplete multi-view cases. The main contributions of this paper are summarized as follows:
(1) We provide a simple and effective anchor strategy. Based on these anchor points, the proposed method can exploit the intrinsic and extrinsic view information to bridge all samples and capture more reliable nonlinear relations, which greatly enhances efficiency and improves stability when partitioning multi-view data into different clusters.
(2) We propose a novel graph fusion mechanism that constructs the global fused graph by directly coalescing the view-wise graphs, and the procedure is nearly parameter-free.
(3) We present a more general semi-supervised clustering framework that handles complete multi-view clustering well and can be easily extended to incomplete multi-view cases.
(4) Experimental results on six widely used multi-view datasets show that our algorithm outperforms several state-of-the-art competitors in clustering ability and time cost.

The rest of the paper is organized as follows: Section 2 briefly reviews related works. Section 3 describes the proposed algorithm in detail. Section 4 presents the experimental results and discussion. Finally, Section 5 concludes the paper.

2. Related Works

In this section, we first introduce recent progress in complete and incomplete multi-view clustering. Then, we briefly describe related work on semi-supervised multi-view clustering.

2.1. Complete Multi-View Clustering

Multi-view clustering exploits the consistent and complementary information in multi-view data to improve clustering performance and stability, and it has attracted extensive attention recently. Numerous multi-view clustering models have been built. These approaches usually assume that every sample has complete information in each view; such data are called complete multi-view data. Roughly speaking, in terms of the techniques involved, they can be divided into two main categories: graph-based and subspace-based methods.

Graph-based methods aim to construct an optimal fused graph, on which graph-cut or other techniques are performed to obtain the final result. Li et al. [6] developed a novel approach, named Multi-view Spectral Clustering (MVSC), which selects several uniform salient points to construct a bipartite graph representing the manifold structures of multi-view data. Nie et al. [7] offered a new approach called Self-weighted Multi-view Clustering (SwMC), which is completely self-weighted and directly assigns cluster labels to the data points without any post-processing. Wang et al. [8] proposed a general Graph-based Multi-view Clustering (GMC) method, which jointly learns the graph of each view and the unified graph in a mutually reinforcing manner and directly generates the final clustering result. Tang et al. [9] presented a robust model for multi-view subspace clustering, which designs a diversity regularization term to enhance diversity and reduce redundancy among different feature views. In addition, graph-based methods usually need predefined graphs, and the quality of these graphs largely determines the final clustering performance. The work in [10] introduced a novel model named Multi-view Clustering with Graph Learning (MVGL), which learns one global graph from the graphs constructed for all views to improve the quality of the final fused graph. The work in [11] presented a novel method named Multi-view Consensus Graph Clustering (MCGC), which minimizes the disagreement among all views and imposes a low-rank constraint on the Laplacian matrix to obtain a unified graph. The study in [12] proposed a novel model called Graph Structure Fusion (GSF), which designs an objective function to adaptively tune the structure of the global graph. The work in [13] proposed a novel multi-view clustering method that learns a unified graph via cross-view graph diffusion (CGD), initialized with the predefined view-wise graph matrices. To further learn a compact feature representation, the study in [14] proposed to capture both the shared information and the distinguishing knowledge across views by projecting each view into a common label space, and to preserve the local structure of samples by using matrix-induced regularization.

Subspace-based methods are widely studied; they use various techniques to obtain a low-dimensional embedding. In general, they can efficiently reduce the dimensionality of the raw data and are easy to interpret. Exploiting this property, the study in [15] proposed to model different views as different relations in a knowledge graph, learning a unified embedding and several view-specific embeddings from similarity triplets to perform multi-view clustering. The work in [16] proposed a novel model called Latent Multi-view Subspace Clustering (LMSC), which encodes the complementary information between different views to automatically learn a latent consistent representation. To decrease the computational complexity and memory requirements, the work in [17] introduced a novel framework entitled Binary Multi-View Clustering (BMVC), which jointly learns collaborative binary codes and binary cluster structures to perform large-scale multi-view clustering.

2.2. Incomplete Multi-View Clustering

In practical applications, we are more likely to encounter incomplete multi-view data. However, conventional multi-view clustering approaches inevitably degrade or even fail when dealing with such data. Recently, many works have attempted to solve this issue; in terms of the techniques involved, they can generally be classified into matrix factorization-based and graph-based methods.

Matrix factorization-based methods directly learn a low-dimensional latent consistent representation from all views by using matrix factorization techniques. Li et al. [21] developed a pioneering approach called Partial multi-View Clustering (PVC), which learns a latent consistent subspace for complete samples and a private latent representation for incomplete samples by exploiting nonnegative matrix factorization (NMF) and sparsity-norm regularization. Zhao et al. [22] presented a model that learns a compact global structure over all samples across all views by integrating partial multi-view clustering with a graph Laplacian term. Shao et al. [23] presented the framework named Multi-Incomplete-view Clustering (MIC), which exploits weighted NMF and $\ell_{2,1}$-norm regularization to learn the latent consistent feature matrix. Hu and Chen [24] proposed the approach called Doubly Aligned Incomplete Multi-view Clustering (DAIMC), which can handle negative entries by integrating weighted semi-NMF and $\ell_{2,1}$-norm regularized regression. Although the above approaches can deal with incomplete multi-view data, their comparatively large storage and computational costs limit their real-world applications. Liu et al. [25] proposed a novel framework called Late Fusion Incomplete Multi-view Clustering (LF-IMVC), which simultaneously imputes each incomplete sample and learns a consistent indicator matrix.

Graph-based methods focus on learning a low-dimensional representation from the graph constructed for each view and uncovering the relationships among all samples. Wen et al. [26] introduced a general framework that learns low-dimensional representations from all views by exploiting a spectral constraint and a co-regularization term. Guo and Ye [2] proposed a new algorithm named Anchor-based Partial Multi-view Clustering (APMC), which integrates intrinsic and extrinsic view information into fused similarities via anchors; the unified clustering result is then obtained by performing spectral clustering on the fused similarities.

2.3. Semi-Supervised Multi-View Clustering

Semi-supervised multi-view clustering, which uses a small proportion of labeled samples together with a large number of unlabeled samples to perform clustering, is one of the most active research directions in machine learning. As the most popular technique in this area, graph-based methods construct a graph whose vertices are labeled and unlabeled samples and whose edges, reflecting the similarity between vertices, propagate information from labeled to unlabeled vertices. Treating each kind of feature as a modality, Cai et al. [27] proposed an algorithm named Adaptive Multi-Modal Semi-Supervised classification (AMMSS), which jointly learns the modality weights and the commonly shared class indicator matrix. Karasuyama and Mamitsuka [28] proposed a new method called Sparse Multiple Graph Integration (SMGI), which linearly combines multiple graph Laplacian matrices with sparse weights for label propagation. Nie et al. [29] presented a new framework called Auto-weighted Multiple Graph Learning (AMGL), which automatically learns a set of optimal weights without any parameters. Nie et al. [30] presented a novel model named Multi-view Learning with Adaptive Neighbors (MLAN), which directly partitions the final optimal graph into the corresponding groups and involves only one parameter, for robustness. To take advantage of the information in multi-view data, Nie et al. [31] proposed a new model called Adaptive MUlti-view SEmi-supervised (AMUSE), which obtains a more suitable unified graph for semi-supervised learning by imposing a structural regularization constraint. Targeting the incomplete multi-view setting, Yang et al. [32] proposed a novel framework called Semi-supervised Learning with Incomplete Modalities (SLIM). It employs the intrinsic modal consistency to learn discriminative modal predictors and performs clustering via the extrinsic complementary information of unlabeled data. However, graph-based approaches cannot always guarantee that the final representation is consistent with the labels of the raw data. Cai et al. [33] introduced a semi-supervised Multi-View Clustering method based on Constrained Nonnegative Matrix Factorization (MVCNMF), which propagates label information to a consistent representation by exploiting matrix factorization techniques.

3. Proposed Method

In this section, we elaborate on our simple yet effective approach, Semi-supervised Multi-View Clustering with Weighted Anchor Graph Embedding (SMVC_WAGE), which provides a general framework for semi-supervised multi-view clustering. Specifically, SMVC_WAGE first provides a simple and effective anchor strategy that exploits intrinsic and extrinsic view information to bridge all samples and capture more reliable nonlinear relations. Then, the proposed method learns the weight of each view by using seed-based semi-supervised $k$-Means together with the mathematical techniques designed below, in order to seek the optimal fused graph that spans multiple views compatibly in structure. Finally, spectral clustering is conducted on the global fused graph to obtain a unified clustering result. Accordingly, in the following, we first describe the notation and problem definition, then introduce seed-based semi-supervised $k$-Means for single-view clustering, and finally propose SMVC_WAGE for both complete and incomplete multi-view clustering.

3.1. Notation and Problem Definition

(1) Notations. Except where otherwise specified, italic, non-bold letters represent scalars. Bold uppercase letters denote matrices, and bold lowercase letters denote vectors. $\mathbf{I}$ is an identity matrix of appropriate size, and $\mathbf{1}$ is an all-one vector of compatible length.
(2) Definition. In multi-view data, each sample is characterized by multiple views and has one unified label. Assume that we are given a dataset $\mathcal{X}=\{\mathbf{X}^{(1)},\mathbf{X}^{(2)},\dots,\mathbf{X}^{(V)}\}$ composed of $n$ samples from $V$ views in $K$ clusters, in which $\mathbf{X}^{(v)}=[\mathbf{x}_1^{(v)},\mathbf{x}_2^{(v)},\dots,\mathbf{x}_n^{(v)}]^\top\in\mathbb{R}^{n\times d_v}$ is the data matrix of the $v$-th view. Denote by $\mathbf{x}_i^{(v)}\in\mathbb{R}^{d_v}$ the $i$-th sample in the $v$-th view, where $d_v$ is the dimensionality of the data features in the $v$-th view.

Multi-view clustering aims to partition all samples into $K$ clusters by utilizing the consistent and complementary information in multi-view data, where $K$ is assumed to be predefined by the user.

3.2. Semi-Supervised $k$-Means Based on Seed

The proposed method performs spectral clustering on the global fused graph to obtain a unified clustering result, and $k$-Means clustering is an important component of spectral clustering. In addition, seed-based semi-supervised $k$-Means is the key step by which our method learns the weights of multiple views. Therefore, it is necessary to review seed-based semi-supervised $k$-Means.

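Without loss of generality, we assume a single-view data matrix $\mathbf{X}=[\mathbf{x}_1,\mathbf{x}_2,\dots,\mathbf{x}_n]^\top\in\mathbb{R}^{n\times d}$, where $\mathbf{X}$ can be any view $\mathbf{X}^{(v)}$ of the above-mentioned multi-view data. Suppose that $\mathbf{X}$ is to be partitioned into $K$ clusters $\{\mathcal{X}_k\}_{k=1}^{K}$. In a semi-supervised single-view clustering framework, we customarily collect a small amount of labeled data $\mathcal{S}\subseteq\mathbf{X}$, termed the seed set, through prior knowledge, and we suppose that for each cluster $\mathcal{X}_k$ there is typically at least one seed point $\mathbf{x}_i\in\mathcal{S}$. Note that we take a disjoint $K$-partitioning $\{\mathcal{S}_k\}_{k=1}^{K}$ of the seed set $\mathcal{S}$, so that every $\mathbf{x}_i\in\mathcal{S}_k$ belongs to $\mathcal{X}_k$. In semi-supervised $k$-Means, the seed set is used to initialize the $k$-Means algorithm: the centroid of the $k$-th cluster is initialized with the mean of the $k$-th partition $\mathcal{S}_k$. The semi-supervised $k$-Means objective function can then be written as

$$J=\sum_{i=1}^{n}\sum_{k=1}^{K}\delta(y_i,k)\,\|\mathbf{x}_i-\boldsymbol{\mu}_k\|_2^2, \qquad (1)$$

where $\mathbf{x}_i$ is the $i$-th sample of the single-view data matrix $\mathbf{X}$, $y_i$ is its cluster label, $\boldsymbol{\mu}_k$ is the mean of the $k$-th partition $\mathcal{S}_k$ (the initial centroid of cluster $k$), and $\delta(\cdot,\cdot)$ is the (Kronecker) delta function. Furthermore, $\boldsymbol{\mu}_k$ and $\delta(\cdot,\cdot)$ are defined, respectively, as

$$\boldsymbol{\mu}_k=\frac{1}{|\mathcal{S}_k|}\sum_{\mathbf{x}\in\mathcal{S}_k}\mathbf{x}, \qquad (2)$$

$$\delta(y_i,k)=\begin{cases}1, & y_i=k,\\ 0, & \text{otherwise},\end{cases} \qquad (3)$$

where $|\mathcal{S}_k|$ is the number of samples in $\mathcal{S}_k$.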

Finding the global minimum of the $k$-Means objective function (1) is an NP-hard problem [34]. However, efficient iterative relocation algorithms [35] quickly decrease the objective and converge to a local optimum.
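
As an illustration, the seeding procedure and the iterative relocation that locally minimizes (1) can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation: it assumes that samples are stored as rows, that every cluster has at least one seed, and that labels are the integers 0, ..., K-1; the function name `seeded_kmeans` is hypothetical.

```python
import numpy as np


def seeded_kmeans(X, seed_idx, seed_labels, K, n_iter=100):
    """Seed-based semi-supervised k-Means (illustrative sketch).

    X           : (n, d) array, one sample per row.
    seed_idx    : indices of the labeled (seed) samples in X.
    seed_labels : labels in {0, ..., K-1} of the seed samples.
    Returns the cluster assignment of every sample and the final centroids.
    """
    X = np.asarray(X, dtype=float)
    seeds = X[np.asarray(seed_idx)]
    seed_labels = np.asarray(seed_labels)
    # Initialize centroid k with the mean of seed partition S_k, as in Eq. (2).
    centroids = np.stack([seeds[seed_labels == k].mean(axis=0) for k in range(K)])
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid in squared Euclidean distance, Eq. (1).
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments no longer change: a local optimum is reached
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids


X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0],
              [5.2, 4.9], [0.2, 0.1], [4.8, 5.1]])
labels, C = seeded_kmeans(X, seed_idx=[0, 2], seed_labels=[0, 1], K=2)
print(labels)  # [0 0 1 1 0 1]
```

Because the centroids start from the seed means, cluster index k usually stays tied to seed class k, which is convenient when the clustering of the seed set is later compared with its ground-truth labels.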

3.3. The Proposed Method for Complete Multi-View Data
3.3.1. Anchor-Based Global Fused Similarity Matrix Construction in Multi-View Data

In recent years, some studies [2, 36, 37] have applied an anchor-based scheme to form the similarity matrix $\mathbf{S}$. Generally, the anchor-based scheme consists of two steps. The first step searches for $m$ anchor points $\{\mathbf{a}_j\}_{j=1}^{m}$ in the raw data, where $m\ll n$. The second step designs a matrix $\mathbf{Z}\in\mathbb{R}^{n\times m}$ that measures the similarity between the anchor points and the data points.

There are two common methods for anchor point generation: random selection and the $k$-Means method. Random selection extracts a portion of the original data as anchor points by random sampling. Although the random selection strategy saves time, it cannot ensure that the selected anchor points are always good, which makes the results neither ideal nor stable. The $k$-Means approach uses the clustering centroids as anchor points, which makes the chosen anchors more representative than random selection. Nevertheless, an inevitable problem is that $k$-Means is sensitive to its initial centroids. To alleviate this problem, the $k$-Means method requires numerous independent, repeated runs. For this reason, using $k$-Means as a pre-processing or post-processing step is also unpredictable and computationally expensive. In practice, some real samples may be labeled, and samples belonging to the same cluster have similar statistical characteristics, while samples belonging to different clusters differ more in their statistical characteristics. Based on this observation, we obtain the seed set $\mathcal{S}=\{\mathcal{S}^{(1)},\mathcal{S}^{(2)},\dots,\mathcal{S}^{(V)}\}$ from the $l$ labeled samples in $V$ views, where $\mathcal{S}^{(v)}$ denotes the seed set in the $v$-th view with $K$ clusters, $\mathbf{y}\in\mathbb{R}^{l}$ denotes the corresponding label vector, and $l$ denotes the number of labeled samples. Then, the mean of each partition of the seed set is chosen as an anchor point.

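Specifically, the generated anchor point set in the $v$-th view can be represented as $\mathbf{A}^{(v)}=\{\mathbf{a}_1^{(v)},\mathbf{a}_2^{(v)},\dots,\mathbf{a}_K^{(v)}\}$, where $\mathbf{a}_k^{(v)}$ is obtained according to (2). Then, the similarity between data point $\mathbf{x}_i^{(v)}$ and anchor point $\mathbf{a}_j^{(v)}$ is defined as

$$z_{ij}^{(v)}=\frac{\kappa_\sigma\big(\mathbf{x}_i^{(v)},\mathbf{a}_j^{(v)}\big)}{\sum_{j'=1}^{K}\kappa_\sigma\big(\mathbf{x}_i^{(v)},\mathbf{a}_{j'}^{(v)}\big)},\qquad \kappa_\sigma(\mathbf{x},\mathbf{a})=\exp\Big(-\frac{D^2(\mathbf{x},\mathbf{a})}{2\sigma^2}\Big), \qquad (4)$$

where $D(\cdot,\cdot)$ is a distance function, such as the $\ell_2$ distance. In this way, the truncated similarity matrix $\mathbf{Z}^{(v)}=[z_{ij}^{(v)}]\in\mathbb{R}^{n\times K}$ of the $v$-th view is defined based on a kernel function $\kappa_\sigma(\cdot,\cdot)$, for which the Gaussian kernel is usually adopted. The width parameter $\sigma$ can be set to 1 without loss of generality.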
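
The anchor construction from seed means, per (2), and the kernel-based truncated similarity matrix, per (4), can be sketched as follows. This is a minimal sketch under stated assumptions: rows are samples, D is the Euclidean distance squared inside a Gaussian kernel, and each row of Z^(v) is normalized to sum to one; the function names are hypothetical.

```python
import numpy as np


def seed_anchors(X, seed_idx, seed_labels, K):
    """Anchor a_k = mean of the seed samples labeled k (Eq. (2)); X is (n, d_v)."""
    seeds = np.asarray(X, dtype=float)[np.asarray(seed_idx)]
    seed_labels = np.asarray(seed_labels)
    return np.stack([seeds[seed_labels == k].mean(axis=0) for k in range(K)])


def truncated_similarity(X, anchors, sigma=1.0):
    """Row-normalized Gaussian similarities between samples and anchors, shape (n, K)."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Subtracting each row's minimum avoids underflow; it cancels after normalization.
    Z = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)


X = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 0.0], [3.2, 0.0]])
A = seed_anchors(X, seed_idx=[0, 1, 2, 3], seed_labels=[0, 0, 1, 1], K=2)
Z = truncated_similarity(X, A)
print(A)            # anchors at (0.1, 0) and (3.1, 0)
print(Z.argmax(1))  # [0 0 1 1]
```

Only K anchors per view are needed, so Z^(v) has K columns regardless of n, which is what keeps the later steps cheap.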

For multi-view clustering, a common assumption is that clustering performance and stability can be improved by appropriately exploiting the consistent and complementary information between different views. Under this assumption, how to seamlessly combine multiple views is crucial to the final clustering result. Considering that views differ in clustering quality, we first calculate the clustering accuracy of each view using the prior information and then derive the view weights, so that a view with higher clustering accuracy receives a larger weight during information fusion and a view with lower clustering accuracy receives a smaller weight. More specifically, we use semi-supervised $k$-Means to obtain the clustering result $\hat{\mathbf{y}}^{(v)}$ on the $v$-th seed set $\mathcal{S}^{(v)}$, where the anchor point set $\mathbf{A}^{(v)}$ of the $v$-th view is used to initialize semi-supervised $k$-Means. Note that $\hat{\mathbf{y}}^{(v)}$ and $\mathbf{y}$ are the cluster labels and the ground-truth labels of the seed set $\mathcal{S}^{(v)}$, respectively; we then calculate the clustering accuracy of each view on $\mathcal{S}^{(v)}$ by (17). Furthermore, to ensure that a view with higher clustering accuracy has a larger weight, we apply the softmax function to obtain the weights of the different views:

$$w_v=\frac{\exp(\lambda\,\mathrm{acc}_v)}{\sum_{u=1}^{V}\exp(\lambda\,\mathrm{acc}_u)}, \qquad (5)$$

where $w_v$ is the non-negative normalized weight of the $v$-th view, the elements of $\mathbf{w}=[w_1,w_2,\dots,w_V]^\top$ sum to 1, $\mathrm{acc}_v$ is the clustering accuracy of the $v$-th view on the seed set $\mathcal{S}^{(v)}$, and $\lambda$ is a scalar that controls the distribution of weights among the views.
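
The weighting step can be sketched as follows. Equation (17) is not reproduced in this section, so plain accuracy on the seed set is used as a stand-in, assuming that cluster k was initialized from seed class k so that no label matching is needed. The softmax with a scalar `lam` follows the form of (5) described above; the function names are hypothetical.

```python
import numpy as np


def seed_accuracy(pred, y):
    """Fraction of seed samples whose cluster index equals their label.

    Assumes cluster k was initialized from seed class k, so no Hungarian
    label matching is applied (a simplification of Eq. (17)).
    """
    return float(np.mean(np.asarray(pred) == np.asarray(y)))


def view_weights(acc, lam=1.0):
    """Softmax of the per-view accuracies scaled by lam; weights sum to one."""
    s = lam * np.asarray(acc, dtype=float)
    e = np.exp(s - s.max())  # shifting by the max avoids overflow; result unchanged
    return e / e.sum()


acc = [seed_accuracy([0, 0, 1, 1], [0, 0, 1, 1]),   # view 1: 1.0
       seed_accuracy([0, 1, 1, 1], [0, 0, 1, 1])]   # view 2: 0.75
w = view_weights(acc, lam=5.0)
print(w)  # the more accurate view 1 receives the larger weight
```

A larger `lam` concentrates the weight on the most accurate view, while `lam = 0` yields uniform weights, so this single scalar governs how sharply the views are ranked.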

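After the truncated similarity matrices $\mathbf{Z}^{(v)}$ are obtained by (4), they are integrated into a global truncated similarity matrix $\mathbf{Z}\in\mathbb{R}^{n\times VK}$ between all samples and anchors:

$$\mathbf{Z}=\big[w_1\mathbf{Z}^{(1)},\,w_2\mathbf{Z}^{(2)},\,\dots,\,w_V\mathbf{Z}^{(V)}\big]. \qquad (6)$$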

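Once we obtain the matrix $\mathbf{Z}$, the global fused similarity matrix $\mathbf{S}\in\mathbb{R}^{n\times n}$ between all samples can be approximated by an anchor graph [36]:

$$\mathbf{S}=\mathbf{Z}\boldsymbol{\Lambda}^{-1}\mathbf{Z}^\top, \qquad (7)$$

where $\boldsymbol{\Lambda}=\mathrm{diag}(\mathbf{Z}^\top\mathbf{1})\in\mathbb{R}^{VK\times VK}$ is a diagonal matrix.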
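
The fusion in (6) and (7) can be checked numerically with the sketch below. It assumes each view-wise matrix has rows summing to one and the view weights sum to one; under these assumptions the resulting S is symmetric with unit row and column sums. The function name `fused_similarity` and the random toy data are hypothetical.

```python
import numpy as np


def fused_similarity(Z_views, weights):
    """Z = [w_1 Z^(1), ..., w_V Z^(V)] (Eq. (6)) and S = Z Lambda^{-1} Z^T (Eq. (7))."""
    Z = np.hstack([w * Zv for w, Zv in zip(weights, Z_views)])
    lam = Z.sum(axis=0)   # diagonal of Lambda = column sums of Z
    S = (Z / lam) @ Z.T   # Z @ diag(1 / lam) @ Z.T without forming the diagonal matrix
    return Z, S


rng = np.random.default_rng(0)
Z_views = []
for _ in range(3):                 # three views, K = 4 anchors each, n = 10 samples
    Zv = rng.random((10, 4))
    Z_views.append(Zv / Zv.sum(axis=1, keepdims=True))   # rows sum to one
Z, S = fused_similarity(Z_views, weights=[0.5, 0.3, 0.2])
print(S.sum(axis=1))  # every row sums to 1
```

Dividing by the column sums broadcasts over columns, so the n x n matrix S is only formed here for the check; the spectral step can avoid it entirely.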

3.3.2. Spectral Analysis on Global Fused Similarity Matrix

To further simplify the clustering process, spectral clustering can be performed on the global fused similarity matrix $\mathbf{S}$. Specifically, the objective function of spectral clustering is

$$\min_{\mathbf{F}^\top\mathbf{F}=\mathbf{I}}\ \mathrm{tr}\big(\mathbf{F}^\top\mathbf{L}\mathbf{F}\big), \qquad (8)$$

where $\mathrm{tr}(\cdot)$ is the matrix trace operator, $\mathbf{F}\in\mathbb{R}^{n\times K}$ is the indicator matrix, and $K$ is the number of clusters. In graph theory, the Laplacian matrix is defined as $\mathbf{L}=\mathbf{D}-\mathbf{S}$, where the degree matrix $\mathbf{D}$ is a diagonal matrix with $d_{ii}=\sum_{j=1}^{n}s_{ij}$. The optimal $\mathbf{F}$ consists of the eigenvectors of $\mathbf{L}$ corresponding to its $K$ smallest eigenvalues, which can be obtained by eigen decomposition. However, the eigen decomposition of an $n\times n$ matrix costs $O(n^3)$, which makes this approach unsuitable for large-scale data.

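Fortunately, according to [2, 37], $\mathbf{S}$ is a doubly stochastic matrix: it is symmetric, and since each row of $\mathbf{Z}$ sums to $\sum_{v}w_v=1$, we have $\mathbf{S}\mathbf{1}=\mathbf{Z}\boldsymbol{\Lambda}^{-1}\mathbf{Z}^\top\mathbf{1}=\mathbf{Z}\mathbf{1}=\mathbf{1}$. Thus, the degree matrix is the identity matrix, $\mathbf{D}=\mathbf{I}$, and the Laplacian matrix can be written as $\mathbf{L}=\mathbf{I}-\mathbf{S}$. Because $\mathrm{tr}(\mathbf{F}^\top\mathbf{F})=K$ is constant under the constraint, (8) is equivalent to

$$\max_{\mathbf{F}^\top\mathbf{F}=\mathbf{I}}\ \mathrm{tr}\big(\mathbf{F}^\top\mathbf{S}\mathbf{F}\big). \qquad (9)$$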

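Note that $\mathbf{S}$ can be written as $\mathbf{S}=\hat{\mathbf{Z}}\hat{\mathbf{Z}}^\top$, where $\hat{\mathbf{Z}}=\mathbf{Z}\boldsymbol{\Lambda}^{-1/2}$ and $\hat{\mathbf{Z}}\in\mathbb{R}^{n\times VK}$. The singular value decomposition (SVD) of $\hat{\mathbf{Z}}$ can be formulated as

$$\hat{\mathbf{Z}}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top, \qquad (10)$$

where $\mathbf{U}$, $\boldsymbol{\Sigma}$, and $\mathbf{V}$ are the left singular vector matrix, the singular value matrix, and the right singular vector matrix, respectively. Furthermore, $\mathbf{U}$ and $\mathbf{V}$ satisfy $\mathbf{U}^\top\mathbf{U}=\mathbf{I}$ and $\mathbf{V}^\top\mathbf{V}=\mathbf{I}$. Thus, $\mathbf{S}$ can be derived from (10) as

$$\mathbf{S}=\hat{\mathbf{Z}}\hat{\mathbf{Z}}^\top=\mathbf{U}\boldsymbol{\Sigma}^2\mathbf{U}^\top. \qquad (11)$$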

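Clearly, the column vectors of $\mathbf{U}$ are the eigenvectors of the similarity matrix $\mathbf{S}$. To reduce the computational complexity, we conduct SVD on $\hat{\mathbf{Z}}$ to obtain the desired $\mathbf{U}$ rather than directly performing eigen decomposition on $\mathbf{S}$. Multiplying both sides of (10) on the right by $\mathbf{V}\boldsymbol{\Sigma}^{-1}$, we can rewrite it as

$$\mathbf{U}=\hat{\mathbf{Z}}\mathbf{V}\boldsymbol{\Sigma}^{-1}. \qquad (12)$$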

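Since $\boldsymbol{\Sigma}$ and $\mathbf{V}$ are the singular value matrix and the right singular vector matrix of $\hat{\mathbf{Z}}$, respectively, we can perform eigen decomposition on the small matrix $\hat{\mathbf{Z}}^\top\hat{\mathbf{Z}}\in\mathbb{R}^{VK\times VK}$, obtaining eigenvector-eigenvalue pairs $\{(\mathbf{v}_i,\theta_i)\}_{i=1}^{VK}$, where $\theta_1\ge\theta_2\ge\dots\ge\theta_{VK}\ge 0$. We denote by $\mathbf{V}=[\mathbf{v}_1,\dots,\mathbf{v}_K]$ the column-orthonormal matrix containing the eigenvectors of the $K$ largest eigenvalues and by $\boldsymbol{\Theta}=\mathrm{diag}(\theta_1,\dots,\theta_K)$ the diagonal matrix storing these eigenvalues on its main diagonal. It follows that

$$\hat{\mathbf{Z}}^\top\hat{\mathbf{Z}}=\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^\top\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top=\mathbf{V}\boldsymbol{\Sigma}^2\mathbf{V}^\top,\quad\text{so}\quad\boldsymbol{\Sigma}^2=\boldsymbol{\Theta}. \qquad (13)$$

Thus, the singular value matrix can be derived as $\boldsymbol{\Sigma}=\boldsymbol{\Theta}^{1/2}$. Then, the final solution can be simplified as

$$\mathbf{F}=\mathbf{U}=\hat{\mathbf{Z}}\mathbf{V}\boldsymbol{\Theta}^{-1/2}, \qquad (14)$$

where $\mathbf{F}\in\mathbb{R}^{n\times K}$ is the indicator matrix. After that, we perform semi-supervised $k$-Means on $\mathbf{F}$ to obtain the final results. The whole procedure of SMVC_WAGE for complete multi-view data is summarized in Algorithm 1.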
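
To make (12)–(14) concrete, the sketch below computes the indicator matrix from the small VK x VK matrix only and then verifies, on random toy data, that its columns are orthonormal eigenvectors of S with the same leading eigenvalues as a direct eigen decomposition. It is an illustrative sketch assuming Z has positive column sums and the K largest eigenvalues are nonzero; the function name is hypothetical.

```python
import numpy as np


def spectral_embedding_from_anchors(Z, K):
    """F = Z_hat V Theta^{-1/2} (Eqs. (12)-(14)) using only Z_hat^T Z_hat."""
    Z_hat = Z / np.sqrt(Z.sum(axis=0))            # Z Lambda^{-1/2}
    theta, V = np.linalg.eigh(Z_hat.T @ Z_hat)    # ascending eigenvalues
    top = np.argsort(theta)[::-1][:K]             # indices of the K largest
    theta_k = theta[top]
    F = Z_hat @ V[:, top] / np.sqrt(theta_k)      # scale column j by theta_j^{-1/2}
    return F, theta_k


rng = np.random.default_rng(1)
Z = rng.random((50, 12))
Z /= Z.sum(axis=1, keepdims=True)                 # rows sum to one
F, theta_k = spectral_embedding_from_anchors(Z, K=3)

# Direct route for comparison: build the 50 x 50 matrix S and decompose it.
S = (Z / Z.sum(axis=0)) @ Z.T
direct = np.sort(np.linalg.eigvalsh(S))[::-1][:3]
print(theta_k, direct)  # the same three leading eigenvalues; the first is 1
```

The expensive part is the 12 x 12 eigen decomposition plus a matrix product linear in n, instead of an eigen decomposition of the n x n matrix S.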

Input:
(1) Given the complete multi-view data $\mathcal{X}=\{\mathbf{X}^{(1)},\dots,\mathbf{X}^{(V)}\}$, where $\mathbf{X}^{(v)}\in\mathbb{R}^{n\times d_v}$ is the data matrix of the $v$-th view.
(2) Given $l$ labeled samples covering $K$ clusters; the corresponding label vector $\mathbf{y}\in\mathbb{R}^{l}$.
(3) The number of clusters $K$; the trade-off parameter $\lambda$.
Output:
(1) Cluster label of each sample.
Procedure
(1) Initialize the trade-off parameter $\lambda$ and the width parameter $\sigma$ of the Gaussian kernel function.
(2) Generate the seed set $\mathcal{S}=\{\mathcal{S}^{(1)},\dots,\mathcal{S}^{(V)}\}$ containing $l$ labeled samples in $V$ views, where $\mathcal{S}^{(v)}$ denotes the seed set in the $v$-th view with $K$ clusters.
(3) Generate the anchor point set $\mathbf{A}^{(v)}$ for each view by (2).
(4) Construct the truncated similarity matrix $\mathbf{Z}^{(v)}$ for each view by (4).
(5) Calculate the clustering accuracy $\mathrm{acc}_v$ of each view by (1) and (17) on the seed set $\mathcal{S}^{(v)}$.
(6) Obtain the weights $\mathbf{w}$ of the different views by (5).
(7) Obtain the global truncated similarity matrix $\mathbf{Z}$ by (6).
(8) Calculate the global fused similarity matrix $\mathbf{S}$ by (7).
(9) Derive the indicator matrix $\mathbf{F}$ by (9)–(14).
(10) Perform semi-supervised $k$-Means on the indicator matrix $\mathbf{F}$ to obtain the final results.
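
For orientation, the steps of Algorithm 1 can be strung together in one compact, self-contained sketch. It is illustrative only and rests on explicitly assumed choices: rows are samples, the Gaussian kernel uses squared Euclidean distances with row normalization, view weights are a softmax of plain seed accuracy (a stand-in for (17)) scaled by `lam`, and S is never formed because steps 8 and 9 are carried out through the small matrix of (13). The names `seeded_kmeans` and `smvc_wage_sketch` and the toy data are hypothetical.

```python
import numpy as np


def seeded_kmeans(X, init, n_iter=100):
    """k-Means started from the given centroids (rows of init); returns labels."""
    C = np.array(init, dtype=float)
    labels = None
    for _ in range(n_iter):
        new = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        if labels is not None and np.array_equal(new, labels):
            break
        labels = new
        for k in range(len(C)):
            if np.any(labels == k):
                C[k] = X[labels == k].mean(axis=0)
    return labels


def smvc_wage_sketch(views, seed_idx, y, K, lam=1.0, sigma=1.0):
    """views: list of (n, d_v) arrays; seed_idx, y: labeled indices and labels 0..K-1."""
    seed_idx, y = np.asarray(seed_idx), np.asarray(y)
    Zs, accs = [], []
    for X in views:
        X = np.asarray(X, dtype=float)
        seeds = X[seed_idx]                                          # step 2: seed set
        A = np.stack([seeds[y == k].mean(axis=0) for k in range(K)]) # step 3: anchors
        d2 = ((X[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
        Zv = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * sigma ** 2))
        Zs.append(Zv / Zv.sum(axis=1, keepdims=True))               # step 4: Z^(v)
        accs.append(np.mean(seeded_kmeans(seeds, A) == y))          # step 5: accuracy
    s = lam * np.array(accs)
    w = np.exp(s - s.max())
    w /= w.sum()                                                    # step 6: weights
    Z = np.hstack([wv * Zv for wv, Zv in zip(w, Zs)])               # step 7: global Z
    Z_hat = Z / np.sqrt(Z.sum(axis=0))                              # steps 8-9
    theta, V = np.linalg.eigh(Z_hat.T @ Z_hat)
    top = np.argsort(theta)[::-1][:K]
    F = Z_hat @ V[:, top] / np.sqrt(theta[top])                     # Eq. (14)
    init = np.stack([F[seed_idx][y == k].mean(axis=0) for k in range(K)])
    return seeded_kmeans(F, init), w                                # step 10


rng = np.random.default_rng(0)
y_true = np.repeat([0, 1], 10)
view1 = rng.normal(0.0, 0.3, (20, 2)) + 3.0 * y_true[:, None]
view2 = rng.normal(0.0, 0.3, (20, 3)) + 2.0 * y_true[:, None]
labels, w = smvc_wage_sketch([view1, view2], seed_idx=[0, 1, 10, 11], y=[0, 0, 1, 1], K=2)
```

On these two well-separated toy views both seed accuracies equal 1, so the weights come out equal, and the recovered labels match the generating groups.
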
3.4. The Proposed Method for Incomplete Multi-View Data

Our proposed method (SMVC_WAGE) not only handles complete multi-view clustering well but can also be easily extended to incomplete multi-view clustering. To simplify the discussion of the incomplete case, we take three views as an example to show that SMVC_WAGE can be straightforwardly extended to incomplete multi-view data.

Similar to the problem definition in Section 3.1, we assume that the incomplete three-view data $\mathcal{X}=\{\mathbf{X}^{(1)},\mathbf{X}^{(2)},\mathbf{X}^{(3)}\}$ consist of $n$ samples. To simplify the discussion without losing generality, we follow [2] and rearrange the original dataset as $\{\mathbf{X}_c,\mathbf{X}_{12},\mathbf{X}_{13},\mathbf{X}_{23},\mathbf{X}_1,\mathbf{X}_2,\mathbf{X}_3\}$, where these groups denote the samples present in all three views, in both view-1 and view-2, in both view-1 and view-3, in both view-2 and view-3, only in view-1, only in view-2, and only in view-3, respectively. Accordingly, $n_c$ is the number of samples described by all three views, $n_{12}$ denotes the number of samples shared only by view-1 and view-2, and $n_{13}$ and $n_{23}$ have analogous meanings. $n_v$ stands for the number of samples existing only in the $v$-th view. The total number of samples is $n=n_c+n_{12}+n_{13}+n_{23}+n_1+n_2+n_3$.
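
The rearrangement into seven groups can be sketched from a sample-by-view availability mask, as below. The mask representation and the function name are hypothetical conveniences, and the sketch assumes every sample is observed in at least one view (a sample missing from all views would fall into no group).

```python
import numpy as np


def group_by_view_pattern(mask):
    """mask: (n, 3) boolean array; mask[i, v] is True if sample i is observed in view v+1.

    Returns a dict mapping each of the seven groups to the indices of its samples.
    """
    mask = np.asarray(mask, dtype=bool)
    patterns = {
        "c":  (True, True, True),
        "12": (True, True, False),
        "13": (True, False, True),
        "23": (False, True, True),
        "1":  (True, False, False),
        "2":  (False, True, False),
        "3":  (False, False, True),
    }
    return {name: np.flatnonzero((mask == np.array(p)).all(axis=1))
            for name, p in patterns.items()}


mask = np.array([[1, 1, 1], [1, 1, 0], [0, 1, 1],
                 [1, 0, 0], [1, 1, 1], [0, 0, 1]], dtype=bool)
groups = group_by_view_pattern(mask)
counts = {name: len(idx) for name, idx in groups.items()}
order = np.concatenate(list(groups.values()))   # rearranged sample order
print(counts)  # n_c = 2, n_12 = 1, n_23 = 1, n_1 = 1, n_3 = 1, others 0
```

The counts sum to n, and, for example, view-1 is observed by the n_c + n_12 + n_13 + n_1 samples in groups c, 12, 13, and 1.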

As stated in Section 3.3, the proposed method (SMVC_WAGE) for incomplete multi-view data mainly consists of two steps, i.e., construction of anchor-based global fused similarity matrix and spectral analysis of a global fused similarity matrix. Figure 1 shows the whole construction process of the global fused similarity matrix, and all possible cases are considered in incomplete three-view data, i.e., missing two views, missing one view, and missing no view.

Generating anchor points directly from arbitrary labeled samples is very challenging for incomplete multi-view data, because some labeled samples miss one or two views, and thus their pairwise information may be unavailable. Fortunately, the common samples appearing in all views can be used to generate anchor points and resolve this dilemma. Based on the above analysis, we assume that all labeled samples, which cover every cluster, are included in the common samples; then, we obtain the seed set $\mathcal{S}=\{\mathcal{S}^{(1)},\mathcal{S}^{(2)},\mathcal{S}^{(3)}\}$ from the labeled common samples, where $\mathcal{S}$ denotes the seed set present in all views, $\mathbf{y}\in\mathbb{R}^{l}$ denotes the corresponding label vector, and $l$ represents the number of samples in the seed set. Then, as stated in Section 3.3.1, the generated anchor point set in the $v$-th view can be represented as $\mathbf{A}^{(v)}=\{\mathbf{a}_1^{(v)},\dots,\mathbf{a}_K^{(v)}\}$, where $\mathbf{a}_k^{(v)}$ is obtained according to (2).

As illustrated in the second column of Figure 1, we partition this incomplete three-view case into three scenarios. Specifically, we rearrange the samples according to their view availability so that the anchor-based truncated similarity matrix construction described in Section 3.3.1 can be performed directly on each scenario. Each scenario consists of one view and that view's anchor points, with the missing samples removed. Taking the first scenario as an example, $n_c+n_{12}+n_{13}+n_1$ samples appear in view-1, and $K$ anchor points are generated from the seed set $\mathcal{S}^{(1)}$. Then, we construct the anchor-based truncated similarity matrix $\mathbf{Z}^{(1)}\in\mathbb{R}^{(n_c+n_{12}+n_{13}+n_1)\times K}$ by (4). The other scenarios are handled similarly.

To fuse the truncated similarity matrices of the three scenarios appropriately, we reorder them into aligned matrices $\bar{\mathbf{Z}}^{(v)}$ ($v=1,2,3$) whose rows follow the order of the original samples. To fully exploit the consistent and complementary information among different views, we use the prior knowledge in the multi-view data to give higher-quality views larger weights in the common representation. More specifically, we first obtain the clustering accuracy of each view on the seed set $\mathcal{S}^{(v)}$ and apply the softmax function to obtain the weights of the different views, as described in Section 3.3.1. Then, we obtain the global truncated similarity matrix $\mathbf{Z}$ with the weighted combination scheme in (6). Finally, we obtain the global fused similarity matrix $\mathbf{S}$ by (7), as shown in Figure 1.
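
A possible way to realize the alignment and fusion is sketched below. The text does not specify how rows of samples missing from a view are filled, so this sketch makes two loudly hypothetical choices: such rows stay zero, and each fused row is renormalized to sum to one so that S remains doubly stochastic. The function name and toy matrices are likewise illustrative.

```python
import numpy as np


def align_and_fuse(Z_obs, observed_idx, n, weights):
    """Scatter each view's (n_v, K) matrix into an aligned (n, K) matrix, then fuse.

    HYPOTHETICAL choices not specified in the text: rows of samples missing from
    a view stay zero, and every fused row is renormalized to sum to one (this
    assumes each sample is observed in at least one view).
    """
    aligned = []
    for Zv, idx in zip(Z_obs, observed_idx):
        Zbar = np.zeros((n, Zv.shape[1]))
        Zbar[np.asarray(idx)] = Zv          # rows follow the original sample order
        aligned.append(Zbar)
    Z = np.hstack([w * Zb for w, Zb in zip(weights, aligned)])
    Z = Z / Z.sum(axis=1, keepdims=True)
    S = (Z / Z.sum(axis=0)) @ Z.T           # S = Z Lambda^{-1} Z^T
    return aligned, Z, S


Z1 = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])   # view-1 observes samples 0, 1, 2
Z2 = np.array([[0.7, 0.3], [0.2, 0.8], [0.3, 0.7]])   # view-2 observes samples 0, 2, 3
Z3 = np.array([[0.6, 0.4], [0.4, 0.6]])               # view-3 observes samples 1, 3
aligned, Z, S = align_and_fuse([Z1, Z2, Z3], [[0, 1, 2], [0, 2, 3], [1, 3]],
                               n=4, weights=[0.4, 0.35, 0.25])
```

With the renormalization, the doubly stochastic property used in Section 3.3.2 still holds, so the same low-cost spectral step applies to the incomplete case.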

According to Section 3.3.2, as a final step, spectral clustering is performed on the global fused similarity matrix to acquire a unified clustering result. The whole procedure of SMVC_WAGE for incomplete three-view data is summarized in Algorithm 2.

Input:
(1) Given the incomplete three-view data , where is the data matrix of the -th view.
(2) Given labeled samples that appear in all views and cover clusters, and the corresponding label vector .
(3) The number of clusters ; the trade-off parameter .
Output:
(1) The cluster label of each sample.
Procedure:
(1) Initialize the trade-off parameter and the width parameter of the Gaussian kernel function.
(2) Adjust the original data to .
(3) Generate the seed set , which contains the labeled samples appearing in all three views, where denotes the seed set in the -th view with K clusters.
(4) Generate the anchor point set for each view by (2).
(5) Remove the missing samples of each view, as illustrated in the second column of Figure 1.
(6) Construct the truncated similarity matrix for each view by (4), as illustrated in Figure 1.
(7) Reorder into an aligned matrix whose rows and columns follow the order of the original samples, where , as illustrated in the fourth column of Figure 1.
(8) Calculate the clustering accuracy of each view on the seed set by (1) and (17).
(9) Acquire the weights of the different views by (5).
(10) Obtain the global truncated similarity matrix by (6).
(11) Calculate the global fused similarity matrix by (7).
(12) Derive the indicator matrix by (9)–(14).
(13) Perform semi-supervised K-Means on the indicator matrix to acquire the final results.
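Step (13) can be illustrated with one common variant of semi-supervised K-Means, namely seeded K-Means. In this variant, the centroids are initialized with the class means of the labeled rows and standard Lloyd iterations are then run on all rows. This section does not state which variant [38] uses, so treat this as an assumed stand-in. The default `max_iter=1000` mirrors the iteration cap used in the experimental settings.

```python
import numpy as np

def seeded_kmeans(F, seed_idx, seed_labels, n_clusters, max_iter=1000, tol=1e-8):
    """Seeded K-Means on the rows of F (e.g. the indicator matrix).

    Centroid k starts at the mean of the seeds labeled k, so cluster k keeps
    the meaning of class k; Lloyd iterations then refine all centroids.
    """
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    C = np.stack([F[seed_idx[seed_labels == k]].mean(axis=0) for k in range(n_clusters)])
    for _ in range(max_iter):
        d = ((F[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # squared distances (n, K)
        assign = d.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        C_new = np.stack([F[assign == k].mean(axis=0) if np.any(assign == k) else C[k]
                          for k in range(n_clusters)])
        shift = np.abs(C_new - C).max()
        C = C_new
        if shift < tol:
            break
    return assign
```

Because the seeds fix which centroid represents which class, the output labels agree with the ground-truth label semantics without a separate relabeling step.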
3.5. Theoretical Analysis of the Proposed Algorithm

In this section, we provide a brief theoretical analysis of the proposed algorithm, covering its computational complexity and convergence.

3.5.1. Computational Complexity Analysis

The computational complexity of the proposed algorithm mainly consists of five parts, i.e., calculating , , , , and the final clustering results. In Algorithm 1, the corresponding computations are in steps 3, 4, 6, 9, and 10, where the number of anchor points equals the number of clusters K for each view. Specifically, the computational complexity of these steps is summarized as follows:
(1) Obtaining the anchor point set containing K anchor points requires
(2) Obtaining the truncated similarity matrix requires according to (4)
(3) Obtaining the weight requires according to (5) and (17), where and are the number of iterations and the number of labeled samples, respectively, when performing semi-supervised K-Means on each view
(4) Obtaining the indicator matrix requires by performing eigendecomposition on according to (13)
(5) Obtaining the final clustering results requires by performing semi-supervised K-Means, where t is the number of iterations

Therefore, the total main computational complexity of Algorithm 1 is

Note that the dataset’s number of views , the number of clusters or anchor points and , and the number of labeled samples depends on the number of samples and the percentage of labeled data . Since we use semi-supervised K-Means to obtain the clustering result, the iteration numbers and t are usually small [38].

Compared with Algorithm 1, the main difference of Algorithm 2 is that it handles incomplete multi-view data. Therefore, similarly to Algorithm 1, the total main computational complexity of Algorithm 2 is , where denotes the number of non-missing samples in the -th view.

Based on the above analysis, and to simplify the representation further, the overall computational complexity of SMVC_WAGE is , where . In addition, the running-time experiments in Section 4.5 confirm the computational advantages of SMVC_WAGE.

3.5.2. Convergence Analysis

Firstly, the only iterative component of SMVC_WAGE is semi-supervised K-Means, which computes the clustering results and whose convergence has been proven in [38, 39]. Secondly, the indicator matrix is obtained by eigendecomposition according to (13), which yields the global optimal solution [9]. Thirdly, the empirical convergence study in Section 4.7 also demonstrates the convergence of SMVC_WAGE. In summary, the proposed method has a good convergence property.

4. Experiments

In this section, extensive experiments are performed to evaluate the performance of our method (SMVC_WAGE). First, we describe the six multi-view datasets used in the experiments. Second, we introduce the comparison methods and evaluation metrics. Finally, we present comparison results that demonstrate the effectiveness and efficiency of the proposed method.

4.1. Datasets Description

Six real-world multi-view datasets are adopted to validate our method. The first two are text datasets and the other four are image datasets; all are widely used benchmarks. They are described below, and their key statistics are presented in Table 1.
(1) Cornell (http://lig-membres.imag.fr/grimal/data.html): this text dataset is one of the popular WebKB datasets [3, 26]. It includes 195 documents with 5 labels (student, project, course, staff, and faculty). Each document is characterized by two views, the citation view and the content view, i.e., 195 citation features and 1703 content features.
(2) 3Sources (http://erdos.ucd.ie/datasets/3sources.html): this text dataset is naturally an incomplete multi-view dataset [2] collected from three well-known online news sources: BBC, Reuters, and The Guardian. In total, it contains 948 news articles covering 416 distinct news stories, which are categorized into six topical labels: business, entertainment, health, politics, sport, and technology. Among these distinct stories, 53 appear in a single news source, 194 appear in two sources, and 169 are reported in all three sources.
(3) UCI Handwritten Digit (http://archive.ics.uci.edu/ml/datasets/Multiple+Features): this image dataset consists of 2000 samples of handwritten numerals (0–9) extracted from Dutch utility maps, with 200 samples per class. Six types of features are available for multi-view learning: 76 Fourier coefficients of the character shapes, 216 profile correlations, 64 Karhunen–Loève coefficients, 240 pixel averages in 2 × 3 windows, 47 Zernike moments, and 6 morphological features [31].
(4) ORL (http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html): this image dataset contains 400 images of 40 distinct individuals, 10 images per individual, taken at different times with varying lighting, facial expressions, and facial details [40]. Following the experiments in [41], we use three feature sets: a 4096-dimensional intensity feature, a 3304-dimensional LBP feature, and a 6750-dimensional Gabor feature.
(5) NUS-WIDE-OBJECT (NUS) (https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html): this is a real-world web image dataset with 31 object categories and 30000 images in total [42]. In our experiments, 7 categories of the animal concept are selected: bear, cow, elk, fox, horses, tiger, and zebra. Each image is represented by five publicly available low-level features provided on the dataset homepage: 64-dimensional color histogram (CH), 225-dimensional color moments (CM), 144-dimensional color correlation (CORR), 73-dimensional edge distribution (ED), and 128-dimensional wavelet texture (WT).
(6) MSRC-v1 (https://www.microsoft.com/en-us/research/project/image-understanding/): this image dataset contains 240 images in eight categories, with 30 images per category [43]. Following the experiments in [29], we select seven categories: tree, building, airplane, cow, face, car, and bicycle. Each image is represented by five features: 24-dimensional Color Moment (CM), 576-dimensional Histogram of Oriented Gradient (HOG), 512-dimensional GIST, 256-dimensional Local Binary Pattern (LBP), and 254-dimensional Centrist (CENTRIST) features.

4.2. Compared Methods and Experimental Settings

Our proposed method addresses both complete and incomplete multi-view clustering. Thus, to evaluate the efficiency and effectiveness of this framework, we compare it with Spectral Clustering [44] and three multi-view methods for complete multi-view clustering: MVSC [6], AMGL [29], and MLAN [45]. Similarly, for incomplete multi-view clustering, we compare it with Spectral Clustering [44], PVC [21], IMG [22], DAIMC [24], IMSC_AGL [26], and APMC [2]. We denote the proposed method as SMVC_WAGE. The compared methods are described as follows:
(1) SC: we perform Spectral Clustering (SC) [44] on each view independently as the baseline.
(2) SC (concat): we first concatenate all views into one long feature vector and then run Spectral Clustering [44] to obtain the result.
(3) MVSC: Multi-View Spectral Clustering (MVSC) [6] constructs a bipartite graph and then uses local manifold fusion to integrate the graph of each view into a fused graph. Finally, Spectral Clustering is performed on the fused graph to obtain the result.
(4) AMGL: Auto-weighted Multiple Graph Learning (AMGL) [29] is a Spectral Clustering-based method that is easily extended to semi-supervised multi-view clustering. It automatically learns a set of optimal view weights without any parameters.
(5) MLAN: Multi-view Learning with Adaptive Neighbors (MLAN) [45] is a graph-based multi-view learning model that calculates the ideal weights automatically after a finite number of iterations. It can perform local manifold structure learning and semi-supervised clustering simultaneously.
(6) PVC: Partial multi-View Clustering (PVC) [21] uses non-negative matrix factorization to acquire a consistent representation. Lastly, K-Means is performed on the consistent representation to obtain the result.
(7) IMG: Incomplete Multi-modal Grouping (IMG) [22] uses matrix factorization techniques to obtain a consistent representation. It learns the compact global structure from the latent consistent representation, and lastly K-Means is performed on the consistent representation to obtain the result.
(8) DAIMC: the Doubly Aligned Incomplete Multi-view Clustering (DAIMC) algorithm [24] learns a latent consistent representation from all views by integrating weighted semi-NMF and -norm regularized regression. Lastly, K-Means is performed on the latent consistent representation to obtain the result.
(9) IMSC_AGL: Incomplete Multi-view Spectral Clustering with Adaptive Graph Learning (IMSC_AGL) [26] integrates spectral clustering and adaptive graph learning to obtain the latent consistent representation from all views. Lastly, it partitions the samples into their respective groups via K-Means clustering.
(10) APMC: Anchor-based Partial Multi-view Clustering (APMC) [2] uses anchors to integrate intra- and inter-view similarity matrices, and then Spectral Clustering is performed on the fused similarity matrix to obtain the unified result.

The source codes of the comparison methods are obtained from the authors’ websites. Since the 3Sources dataset is naturally incomplete, we use it for incomplete multi-view clustering and conduct complete multi-view clustering on the other datasets. We select the best two views of 3Sources as the input of PVC and IMG, because these methods cannot handle more than two views. Since SC cannot directly deal with incomplete multi-view data, we first fill in the missing entries with the mean of the feature values in the corresponding view. Empirically, the number of nearest neighbors is set to 10% of the dataset size. Since all the comparison methods conduct K-Means clustering on the latent consistent representation, we set the maximum number of K-Means iterations to 1000. Considering the limitations of the comparison methods, we first learn a latent consistent representation of the raw data and then use the labeled data to generate seed clusters, which initialize the cluster centroids of semi-supervised K-Means. Furthermore, for a fair and conclusive comparison, the parameters of each method are initialized as reported in its original paper, and we report the final results of SMVC_WAGE with the trade-off parameter and the width parameter of the Gaussian kernel function. For semi-supervised clustering on all datasets, we randomly choose a small proportion of each category as labeled data, where the proportion is denoted by . To account for randomness, we run each method 20 times with different random initializations and report the mean performance and standard deviation in all experiments. Due to different parameter ranges and preprocessing, some results may differ from those reported in the original papers.
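The per-category selection of labeled data described above can be sketched as a stratified random draw. This is a minimal illustration, not the authors' script. The rounding rule and the guarantee of at least one labeled sample per category are our assumptions; we keep that guarantee so that every cluster can be seeded. `sample_labeled` is a hypothetical name.

```python
import random
from collections import defaultdict

def sample_labeled(labels, proportion, seed=None):
    """Randomly pick about `proportion` of each category's samples as labeled data.

    At least one sample per category is kept so that every cluster is seeded.
    Returns the sorted indices of the chosen samples.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    chosen = []
    for _, idx in sorted(by_class.items()):
        m = max(1, round(proportion * len(idx)))
        chosen.extend(rng.sample(idx, m))
    return sorted(chosen)
```

Repeating the experiment 20 times would then amount to calling this with 20 different `seed` values.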

4.3. Evaluation Metrics

Many evaluation metrics exist for assessing clustering performance [46]. In our experiments, we choose three of them, namely, Clustering Accuracy (ACC), Normalized Mutual Information (NMI), and Purity, to conduct a comprehensive evaluation. Each metric is computed by comparing the clustering result with the ground-truth labels of the dataset.

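The first evaluation metric is ACC, usually defined as follows:

$$\mathrm{ACC}=\frac{1}{n}\sum_{i=1}^{n}\delta\big(y_i,\mathrm{map}(c_i)\big),$$

where $n$ is the number of samples, $y_i$ is the ground-truth label of the $i$-th sample, $c_i$ is the corresponding computed cluster label, and $\delta(x,y)$ is the delta function

$$\delta(x,y)=\begin{cases}1, & x=y,\\ 0, & \text{otherwise},\end{cases}$$

and $\mathrm{map}(\cdot)$ is the optimal mapping function that permutes the cluster labels to match the ground-truth labels via the Kuhn–Munkres algorithm [47].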
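The ACC definition can be checked with a small sketch. For illustration only, the optimal mapping is found here by exhaustive search over label permutations. This reaches the same optimum as the Kuhn–Munkres algorithm but is only practical for a handful of clusters. For real datasets, `scipy.optimize.linear_sum_assignment` on the cluster-class contingency matrix is the usual choice.

```python
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """ACC = (1/n) * sum_i delta(y_i, map(c_i)) under the best one-to-one mapping.

    Exhaustive search over mappings (same optimum as Kuhn-Munkres, small K only).
    """
    true_labels = sorted(set(y_true))
    pred_labels = sorted(set(y_pred))
    # Pad targets so every predicted label can be mapped; None never matches a class.
    targets = true_labels + [None] * max(0, len(pred_labels) - len(true_labels))
    best = 0
    for perm in permutations(targets, len(pred_labels)):
        mapping = dict(zip(pred_labels, perm))
        hits = sum(1 for t, p in zip(y_true, y_pred) if mapping[p] == t)
        best = max(best, hits)
    return best / len(y_true)
```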

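The second evaluation metric is NMI, which combines mutual information and entropy. NMI is formulated as follows:

$$\mathrm{NMI}(X,Y)=\frac{I(X,Y)}{\sqrt{H(X)\,H(Y)}},$$

where $I(X,Y)$ denotes the mutual information between $X$ and $Y$, and $H(\cdot)$ returns the entropy.

Let $n_i$ be the number of samples in cluster $C_i$ obtained by the clustering method, and $\hat{n}_j$ be the number of samples belonging to class $L_j$ according to the ground-truth labels. Then, NMI can be rewritten as

$$\mathrm{NMI}=\frac{\sum_{i}\sum_{j} n_{i,j}\log\frac{n\,n_{i,j}}{n_i\,\hat{n}_j}}{\sqrt{\left(\sum_{i} n_i\log\frac{n_i}{n}\right)\left(\sum_{j}\hat{n}_j\log\frac{\hat{n}_j}{n}\right)}},$$

where $n_{i,j}$ is the number of samples in the intersection of $C_i$ and $L_j$.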
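A count-based NMI sketch follows. It assumes the geometric-mean normalization $\sqrt{H(X)H(Y)}$, a common convention; other papers normalize by the arithmetic mean or the maximum entropy, which gives different values. The factor $n$ in the numerator and in both entropy sums cancels, so raw counts can be used directly. The return value for degenerate single-cluster labelings is our own convention.

```python
import math
from collections import Counter

def nmi(y_true, y_pred):
    """NMI with geometric-mean normalization, computed from contingency counts."""
    n = len(y_true)
    n_i = Counter(y_pred)               # cluster sizes n_i
    n_j = Counter(y_true)               # class sizes n_hat_j
    n_ij = Counter(zip(y_pred, y_true)) # intersection sizes n_ij
    mi = sum(c * math.log(n * c / (n_i[i] * n_j[j])) for (i, j), c in n_ij.items())
    h_pred = -sum(c * math.log(c / n) for c in n_i.values())
    h_true = -sum(c * math.log(c / n) for c in n_j.values())
    if h_pred == 0 or h_true == 0:      # a labeling with a single group has no entropy
        return 1.0 if h_pred == h_true else 0.0
    return mi / math.sqrt(h_pred * h_true)
```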

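The third evaluation metric is Purity, which measures the effectiveness of clustering as the percentage of correctly assigned labels when each cluster is assigned its most frequent ground-truth class. Purity is defined by

$$\mathrm{Purity}=\frac{1}{n}\sum_{i}\max_{j}\, n_{i,j},$$

where $n_{i,j}$ is the number of samples that lie in the $i$-th cluster and belong to the $j$-th ground-truth class.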
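Purity is straightforward to compute: group the ground-truth labels by predicted cluster, count the majority class in each group, and divide the total by the number of samples. A minimal sketch:

```python
from collections import Counter, defaultdict

def purity(y_true, y_pred):
    """Purity = (1/n) * sum over clusters of the size of the cluster's majority class."""
    members = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        members[p].append(t)
    return sum(Counter(ts).most_common(1)[0][1] for ts in members.values()) / len(y_true)
```

Unlike ACC, Purity lets several clusters map to the same class, so it can only stay equal or grow as the number of clusters increases.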

For all three metrics, a higher value indicates better performance. Readers can refer to [48] for more details about their definitions.

4.4. Experimental Results and Analysis
4.4.1. Complete Multi-View Clustering Results

To explore the effectiveness of our method, we run the complete multi-view methods on the five complete multi-view datasets with different percentages of labeled data. The results, in terms of ACC, NMI, and Purity, are listed in Tables 2–6. From these tables, we make the following observations:
(1) From Tables 2–6, we can see that single-view clustering performance differs considerably across views for every multi-view dataset. This is mainly because the views differ in feature scales and distributions. These results also suggest that it is necessary to study how to combine multiple views appropriately to enhance clustering performance.
(2) From Tables 2–6 and Figures 2(a)–2(e), we find that the proposed SMVC_WAGE obtains much better results than both the best single view and concat in all scenarios. Meanwhile, concat performs the worst in most instances, mainly because directly concatenating views into one long view may introduce redundant information, resulting in poor clustering results. Thus, these results demonstrate that clustering performance can be effectively improved by properly exploiting the consistent and complementary information to learn a common representation.
(3) From Tables 2–6, we see that the proposed SMVC_WAGE outperforms all competitors, such as MVSC, AMGL, and MLAN, in most of the complete multi-view clustering tasks. This is mainly because SMVC_WAGE not only fully exploits the intrinsic consistency and extrinsic complementary information across different views but also assigns larger weights to high-quality views in the common representation by utilizing the prior information in the multi-view data. These results indicate that our method is effective for complete multi-view clustering.
(4) From Tables 2–6, we observe that the performance of the above methods first rises to a high value and then varies only slightly as the amount of labeled data increases. With 30% or 40% labeled data, SMVC_WAGE always obtains the best results, whereas with 10% or 20% labeled data its results are slightly worse. The main reason is that our method depends heavily on how well the graph is constructed from the prior information. With less labeled data, the graph structure cannot be constructed optimally, which leads to slightly worse results.

4.4.2. Incomplete Multi-View Clustering Results

To explore the effectiveness of SMVC_WAGE on incomplete multi-view data, we conduct experiments on the naturally incomplete 3Sources dataset, in which the missing rates of the three views are 16%, 28%, and 30%, respectively. The results are reported in Table 7 and Figure 2(f). As in the complete case, the comparison shows that SMVC_WAGE is significantly superior to all the compared methods on 3Sources for every percentage of labeled data. Thus, our method handles incomplete multi-view clustering well.

The above experimental results on Cornell, UCI Handwritten Digit, ORL, NUS-WIDE-OBJECT, MSRC-v1, and 3Sources demonstrate that SMVC_WAGE outperforms most of the compared algorithms in clustering performance. The main reason is that SMVC_WAGE introduces a simple and effective anchor strategy that bridges all samples and captures more reliable nonlinear relations, which allows it to deal with both complete and incomplete multi-view data. In addition, it exploits the intrinsic consistency and extrinsic complementary information to learn a structured optimal fused graph through semi-supervised view weighting, which greatly enhances efficiency and improves stability.

4.5. Running Time

We record the running time of all methods on all datasets to compare their computational cost. Table 8 shows that SMVC_WAGE has the shortest running time on all datasets except MSRC-v1. Moreover, Figure 3, which plots the data from Table 8, shows that SMVC_WAGE runs many times faster than the above multi-view clustering algorithms on all datasets, especially UCI Handwritten Digit, ORL, NUS-WIDE-OBJECT, and 3Sources. This is mainly because these datasets have relatively large numbers of views and samples, and the data quality of each view varies greatly. In summary, the experimental results confirm the computational advantages of SMVC_WAGE.

4.6. Parameter Sensitivity Analysis

Our proposed SMVC_WAGE has only one hyperparameter , which trades off the weights of the views. Below, we perform parameter experiments on each dataset to reveal the effect of this parameter. We first set the percentage of labeled data from 10% to 40% as before; then, we measure the ACC of SMVC_WAGE while varying within and record the average performance. As shown in Figures 4(a) and 4(d), as increases from 0.01 to 100, the mean ACC with fixed first increases to a high value and then decreases. For the remaining panels in Figures 4(b)–4(f), the result of SMVC_WAGE similarly increases first and then varies only slightly. Therefore, SMVC_WAGE achieves stable, strong performance across a wide range of . In particular, the performance remains optimal within . These experiments demonstrate that the final results of SMVC_WAGE are not very sensitive to variations of the hyperparameter.

4.7. Convergence Study

To investigate convergence empirically, we record the ACC of SMVC_WAGE at every iteration on each dataset, where we set the percentage of labeled data and the hyperparameter , respectively. In a full run, SMVC_WAGE first computes the clustering accuracy of every view by performing semi-supervised K-Means, which is needed to obtain the global fused similarity matrix. No final ACC is computed during this stage, but the stage still takes time. Without loss of generality, we count each run of semi-supervised K-Means as one iteration and record the final ACC after each. We plot ACC in Figure 5. In each subfigure, the ACC is zero during the first several iterations. These iterations are spent exploiting the prior information of the data (the per-view accuracies on the seed set) and do not yet produce a final clustering. After a finite number of iterations, the ACC begins to increase and gradually stabilizes. Moreover, SMVC_WAGE usually converges within 50 iterations on all datasets, which empirically supports the high efficiency of our algorithm.

5. Conclusion

In this paper, a new semi-supervised multi-view clustering framework is developed, which is conceptually simple and efficiently generates high-quality clustering results in practice. Specifically, our method introduces a simple and effective anchor strategy that exploits intrinsic and extrinsic view information to bridge all samples and capture more reliable nonlinear relations, which greatly enhances efficiency and improves stability. This strategy also resolves the dilemma that samples sharing no common view cannot be used directly to compute cross-view similarities. Meanwhile, instead of regularizing or weighting the loss of each view in the conventional way, the proposed method constructs a global fused graph that is structurally compatible across multiple views via a parameter-free graph fusion mechanism that directly coalesces the view-wise graphs. As a result, the proposed method not only handles complete multi-view clustering well but can also be easily extended to incomplete multi-view data. Experimental results on six widely used real-world datasets show that the proposed algorithm is superior to several state-of-the-art competitors in clustering ability and time cost.

When handling incomplete multi-view data, the main limitation of our approach is that anchor points can only be generated from common samples that appear in all views; relaxing this requirement remains to be studied further.

Data Availability

Six publicly available benchmark multi-view datasets are used: Cornell, 3Sources, UCI Handwritten Digit, ORL, NUS-WIDE-OBJECT, and MSRC-v1. The homepage of each dataset is listed in Section 4.1.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Joint Fund of the National Natural Science Foundation of China and Guangdong Province under grant no. U1701266, in part by the Natural Science Foundation of Guangdong Province under grant no. 2018A030313751, in part by the Guangdong Provincial Key Laboratory of Intellectual Property and Big Data under grant no. 2018B030322016, in part by the Special Projects for Key Fields in Higher Education of Guangdong Province under grant no. 2020ZDZX3077, in part by the Science and Technology Project of Guangzhou City under grant no. 202002020035, and in part by the National Natural Science Foundation of China under grant no. 61906219.