Abstract

Multimodality data generally contain richer latent information than single-source data and can therefore support stronger analytical results, so how to combine the views plays a crucial role in multimodality data analysis and is worth investigating. Multimodality clustering, which seeks a partition of the data across multiple views, has attracted considerable attention; for example, robust multiview spectral clustering (RMSC) explicitly handles the possible noise in the transition probability matrices associated with different views. Spectral clustering embeds the input data into a low-dimensional representation by dividing the clustering problem into subproblems, and the corresponding eigenvalue reflects the loss of each subproblem. The eigenvalues of the Laplacian matrix should therefore be treated differently, yet RMSC regularizes each singular value equally when recovering the low-rank matrix. In this paper, we propose a multimodality clustering algorithm that recovers the low-rank matrix by weighted nuclear norm minimization, together with a method to evaluate the weight vector by learning a shared low-rank matrix. Experiments on several real-world datasets show that the proposed method outperforms the baselines.

1. Introduction

Clustering, the task of partitioning data points into multiple clusters, is a fundamental research problem in data mining and machine intelligence, and a series of algorithms have been proposed over the past decades [1-7]. One of the representative methods is spectral clustering, which has many applications [8-11]. With the development of information and communication technologies, which led to data production in most areas, it is relatively easy to capture multiple feature sets from a given subject. It is therefore necessary to design new pattern recognition methods that deal with several views of the same subjects. For example, in multilingual information retrieval, the same document can be represented in different languages, and each language can be regarded as a view. These individual views provide complementary information to each other, which can lead to improved performance on the learning task. In this context, multimodality clustering seeks a better clustering result by leveraging the information from multiple views.

Many multimodality clustering methods have been proposed in recent years. In general, there are three steps when clustering multiview data [12]:
(1) Obtain a similarity matrix from each view.
(2) Compute a projection of each similarity matrix into a space suitable for clustering.
(3) Produce a clustering assignment (e.g., by K-means).

The main difference between multimodality clustering methods lies in the step where the information is collapsed into a single new representation. The first category (information merged in Step 1) merges the views into a new similarity matrix: the method presented in [13] is a Markov chain method for the generalized normalized cut on multimodality data; the method described in [14] uses the philosophy of co-regularization to make the clusterings in different views agree with each other; and RMSC, described in [15], is a Markov-chain-based multimodality spectral clustering method via low-rank and sparse decomposition. The second category (information merged in Step 2) merges the information to generate a compatible projection for all views; in [16], the authors used canonical correlation analysis to maximize the correlation of subjects across the projected views. In the third step, spectral clustering produces the assignment by K-means; since this assignment is not stable due to the randomness of K-means, the third category learns a stable assignment. For example, ensemble clustering methods [17, 18] are designed to find a stable assignment.

The standard nuclear norm minimization regularizes each singular value equally in order to keep the loss function convex, although the singular values carry different meanings and should be treated differently. Gu et al. [19] proposed a weighted nuclear norm method and applied it to image denoising; their weight vector is estimated from the singular values of image patches and the noise variance, which is not applicable to multimodality clustering.

In [15], we presented a Markov-chain-based multimodality spectral clustering method via low-rank and sparse decomposition. In this paper, as shown in Figure 1, we extend our previous study by applying the weighted nuclear norm to multimodality clustering and propose a method to evaluate the weight vector. The difference between the two methods is that RMSC recovers the low-rank matrix by solving a nuclear norm minimization (NNM) problem, while the proposed method recovers it by solving a weighted nuclear norm minimization (WNNM) problem. In the experiments, we test our method on several real-world datasets, and the results show that the proposed method performs better than the baselines.

This paper is organized as follows. Section 2 briefly describes the related work on which our method is based. Section 3 explains the motivation for using WNNM in multimodality clustering, defines our algorithm, and presents the optimization procedure. Section 4 describes the experimental setup, and Section 5 presents the results of our method and other multimodality clustering methods. Section 6 outlines the main contributions of the work presented in this paper.

To make this paper clear, Table 1 summarizes the symbols used in this paper.

2. Related Work

2.1. Spectral Clustering

Finding good clusters has been a focus of considerable research in pattern recognition. Spectral clustering applies the spectral graph theory [20] which gives the conditions where a graph can be divided into several nonconnected subgraphs. The method embeds the input data into a low-dimensional representation and then applies K-means.

Here we give the framework of the spectral clustering algorithm [8, 21, 22] (Algorithm 1).

Input: a set of data points x_1, ..., x_n and the number of clusters k.
(1) Construct the similarity matrix W by a Gaussian kernel, where W_ij represents the similarity of the i-th sample and the j-th sample.
(2) Compute the normalized symmetric Laplacian L = D^{-1/2} W D^{-1/2}, where D is a diagonal matrix with D_ii = Σ_j W_ij.
(3) Let U be the matrix whose columns are the top k eigenvectors of L.
(4) Normalize each row of U to unit length.
(5) Run the k-means algorithm on the rows of U.
Output: the result of k-means.
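The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the exact implementation used in the paper: the Gaussian bandwidth `sigma` and the simple farthest-point k-means initialization are assumptions made to keep the example self-contained and deterministic.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=100):
    # Step 1: Gaussian-kernel similarity matrix W.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))
    # Step 2: normalized symmetric Laplacian L = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Step 3: top-k eigenvectors (eigh returns eigenvalues in ascending order).
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -k:]
    # Step 4: normalize each row of U to unit length.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Step 5: k-means on the rows of U (farthest-point init for determinism).
    centers = U[[0]]
    for _ in range(1, k):
        d2 = ((U[:, None] - centers[None]) ** 2).sum(axis=2).min(axis=1)
        centers = np.vstack([centers, U[np.argmax(d2)]])
    for _ in range(n_iter):
        labels = ((U[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels
```

For well-separated data, the rows of U for points in the same cluster collapse to nearly identical unit vectors, so the final k-means step becomes easy.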
2.2. Robust Multimodality Spectral Clustering via Low-Rank and Sparse Decomposition (RMSC)

Consider a set of multimodality data {X^(i) ∈ R^{d_i × n}, i = 1, ..., m}, where m is the number of views, n is the number of data points, d_i represents the feature dimension of the i-th view, and the j-th column x_j^(i) of X^(i) represents the features of the j-th data point in the i-th view. The first step of RMSC is to use a Gaussian kernel to define the similarity matrix of each view, i.e.,

W_jk^(i) = exp(−||x_j^(i) − x_k^(i)||₂² / (2σ²)),

where ||·||₂ denotes the ℓ₂ norm and σ denotes the standard deviation (e.g., one can set σ to be the average Euclidean distance over all pairs of data points). The second step is to construct the transition matrix of each view by

P^(i) = (D^(i))^{-1} W^(i),

where D^(i) is a diagonal matrix with D_jj^(i) = Σ_k W_jk^(i). Under the low-rank and sparse assumptions, they formulate the transition matrix construction problem as

min_{P̂, E^(1), ..., E^(m)}  rank(P̂) + λ Σ_i ||E^(i)||₀,
s.t.  P^(i) = P̂ + E^(i), i = 1, ..., m;  P̂ ≥ 0;  P̂1 = 1,

where the ℓ₀ norm ||E^(i)||₀ is the number of nonzero elements in E^(i), rank(P̂) represents the rank of P̂, 1 is a vector with all ones, and λ is a trade-off parameter. Note that the constraints enforce P̂ to be a transition probability matrix, i.e., each of its rows is a probability distribution.
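The first two RMSC steps (Gaussian-kernel similarity followed by row normalization) can be sketched as follows for a single view. Setting σ to the average pairwise Euclidean distance follows the suggestion in the text; everything else is a minimal illustrative choice.

```python
import numpy as np

def transition_matrix(X):
    # X: one view, with one data point per row.
    # Pairwise Euclidean distances between data points.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Set sigma to the average distance over all distinct pairs.
    n = len(X)
    sigma = dist[np.triu_indices(n, k=1)].mean()
    # Gaussian-kernel similarity matrix W.
    W = np.exp(-dist ** 2 / (2 * sigma ** 2))
    # Row-normalize: P = D^{-1} W, so every row is a probability distribution.
    P = W / W.sum(axis=1, keepdims=True)
    return P
```

The returned matrix satisfies the transition-probability constraints: it is nonnegative and each row sums to one.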

As the problem above is nonconvex, they replace rank(P̂) with the trace (nuclear) norm ||P̂||_* and ||E^(i)||₀ with the ℓ₁ norm ||E^(i)||₁, resulting in the following convex optimization problem:

min_{P̂, E^(1), ..., E^(m)}  ||P̂||_* + λ Σ_i ||E^(i)||₁,
s.t.  P^(i) = P̂ + E^(i), i = 1, ..., m;  P̂ ≥ 0;  P̂1 = 1.

The ℓ₁ norm is well known to be a convex surrogate of the ℓ₀ norm. They then propose an optimization procedure that solves this problem via the augmented Lagrangian multiplier (ALM) scheme, which has shown a good balance between efficiency and accuracy in many matrix learning problems.

Let σ_i(P̂) (σ_1 ≥ σ_2 ≥ ..., in nonascending order) represent the i-th singular value of P̂. When updating P̂, the subproblem is

min_{P̂}  ||P̂||_* + (μ/2) ||P̂ − A||_F²,

where A collects the remaining terms of the ALM iteration. Let A = UΣVᵀ be the SVD of A; the solution is

P̂ = U S_{1/μ}(Σ) Vᵀ,

where S_τ(Σ)_ii = max(Σ_ii − τ, 0) is the shrinkage (soft-thresholding) operator.

In this optimization procedure, every singular value is increased or decreased by the same amount. So RMSC treats each singular value equally, which may degrade the clustering performance.
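The uniform shrinkage can be seen directly in a small sketch of the singular value thresholding step; the helper name `svt` and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def svt(A, tau):
    # Singular value thresholding: the proximal operator of tau * ||.||_*.
    # Every singular value is shrunk by the same amount tau, which is
    # exactly the uniform treatment discussed in the text.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt
```

Because U and V are orthonormal, the singular values of the result are exactly max(σ_i − τ, 0): small singular values are zeroed out while large ones all lose the same constant τ.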

2.3. Weighted Nuclear Norm

Gu et al. [19] studied the weighted nuclear norm minimization (WNNM) problem, in which the singular values are assigned different weights. The weighted nuclear norm of a matrix X with weight vector w = [w_1, ..., w_n] is defined as

||X||_{w,*} = Σ_i w_i σ_i(X),  w_i ≥ 0.

They analyzed the solutions of the WNNM problem under different weight conditions and, when applying WNNM to image denoising, proposed a method to evaluate the weight vector from many image patches. The differences between WNNM and our method are as follows: (1) we extend the weighted nuclear norm to multimodality clustering; (2) the weight vectors are evaluated differently: the former evaluates the weight vector from image patches, while our method evaluates it by matrix decomposition.
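For weights in non-descending order (larger singular values shrunk less, which is the case relevant here), the WNNM proximal problem still admits a closed-form shrinkage solution [19]. A minimal sketch, with the helper name `weighted_svt` assumed for illustration:

```python
import numpy as np

def weighted_svt(A, w):
    # Weighted singular value thresholding: each singular value sigma_i is
    # shrunk by its own weight w_i. For w_1 <= w_2 <= ... (non-descending),
    # this is the global optimum of the WNNM proximal problem (Gu et al.).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - np.asarray(w), 0.0)
    return U @ np.diag(s_shrunk) @ Vt
```

Compared with the uniform shrinkage of standard SVT, the leading (most informative) singular values are preserved more faithfully while the trailing ones are suppressed more aggressively.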

3. The Proposed Method: Weighted RMSC

In this section, we present how to apply the weighted nuclear norm to multimodality clustering.

As described in Section 2.2, RMSC treats each singular value equally when updating P̂, while for spectral clustering, different eigenvalues of the Laplacian have different meanings.

According to [23], the relaxed RatioCut objective function can be written as

Tr(Hᵀ L H) = Σ_{i=1}^{k} h_iᵀ L h_i,

where Tr(·) denotes the trace of a matrix and h_i represents the i-th column of H.

So, the RatioCut optimization problem is defined as

min_{H}  Σ_{i=1}^{k} h_iᵀ L h_i,  s.t.  Hᵀ H = I.  (6)

According to the Rayleigh-Ritz theorem [24], this problem has a fixed solution: H is constructed from the k eigenvectors of L associated with its smallest eigenvalues. From equation (6), we can see that spectral clustering divides the problem into k subproblems. Each subproblem partitions the points into 2 clusters, and h_i is the solution of the i-th subproblem (h_1 is proportional to the all-one vector, which corresponds to assigning all the points to the same cluster; this partition is useless).

As L h_i = λ_i h_i when h_i is the eigenvector with eigenvalue λ_i, the RatioCut objective can be rewritten as

Tr(Hᵀ L H) = Σ_{i=1}^{k} λ_i.

So, the loss of each subproblem is exactly the corresponding eigenvalue: a small eigenvalue reflects a small loss of its subproblem, and the smaller the eigenvalue is, the larger the weight that should be assigned to the corresponding eigenvector. It is also known that the transition matrix satisfies

P = D^{-1} W = I − D^{-1} L,

so each eigenvalue of P equals one minus the corresponding eigenvalue of the (random-walk) normalized Laplacian.

So, the larger eigenvalues (equivalently, singular values σ_i) of P̂ correspond to the smaller Laplacian eigenvalues and are more important than the smaller ones when updating P̂ in RMSC; the larger a singular value is, the less it should be shrunk. Therefore, the weight w_i assigned to σ_i should be inversely proportional to σ_i. We let

w_i = C / (σ_i + ε),  (10)

where C is a constant and ε is a small positive number to avoid dividing by zero.
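The weight rule above can be sketched directly; the default values of `C` and `eps` below are illustrative assumptions, not the values used in the experiments.

```python
import numpy as np

def singular_value_weights(s, C=1.0, eps=1e-8):
    # s: singular values of the shared low-rank matrix (nonascending order).
    # Weight w_i = C / (sigma_i + eps): larger singular values get smaller
    # weights and are therefore shrunk less during the update of the
    # low-rank matrix.
    return C / (np.asarray(s, dtype=float) + eps)
```

Note that a nonascending sequence of singular values yields a non-descending weight vector, which is exactly the weight ordering for which the weighted shrinkage step has a closed-form solution.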

For multimodality clustering, we can construct m Laplacian matrices, leading to m groups of singular values, which are not equal. Accurately evaluating the singular values is therefore challenging, which makes it difficult to determine the weight of each singular value. However, the output of RMSC is a shared low-rank matrix, and all the views share its singular values. So one way to evaluate the singular values is to use another multimodality clustering algorithm, such as RMSC, to obtain the shared low-rank matrix and then evaluate the final weights from the singular values of that shared matrix.

Following RMSC, under the low-rank and sparse assumptions, we formulate the transition matrix construction problem as

min_{P̂, E^(1), ..., E^(m)}  ||P̂||_{w,*} + λ Σ_i ||E^(i)||₁,
s.t.  P^(i) = P̂ + E^(i), i = 1, ..., m;  P̂ ≥ 0;  P̂1 = 1.  (11)

The optimization problem (11) is still challenging because the matrix P̂ is subject to two sets of constraints. We therefore introduce an auxiliary variable Q and rewrite problem (11) as

min_{P̂, Q, E^(1), ..., E^(m)}  ||P̂||_{w,*} + λ Σ_i ||E^(i)||₁,
s.t.  P^(i) = P̂ + E^(i), i = 1, ..., m;  P̂ = Q;  Q ≥ 0;  Q1 = 1.  (12)

The corresponding augmented Lagrangian function of (12) is

L(P̂, Q, {E^(i)}, {Y^(i)}, Z) = ||P̂||_{w,*} + λ Σ_i ||E^(i)||₁ + Σ_i ⟨Y^(i), P^(i) − P̂ − E^(i)⟩ + ⟨Z, P̂ − Q⟩ + (μ/2) (Σ_i ||P^(i) − P̂ − E^(i)||_F² + ||P̂ − Q||_F²),  (13)

where Y^(i) and Z represent the Lagrange multipliers, ⟨·,·⟩ denotes the inner product of matrices (i.e., for two matrices A and B, ⟨A, B⟩ = Tr(AᵀB)), and μ is an adaptive penalty parameter.

The sketch of the proposed algorithm is shown in Algorithm 2. Next, we present the update rules for P̂, Q, and E^(i).

Input: transition matrices P^(1), ..., P^(m); trade-off parameter λ.
Initialize: P̂ = Q = 0, E^(i) = 0, Y^(i) = 0, Z = 0, μ > 0, ρ > 1.
Evaluate W by running RMSC.
Repeat
(1)Let A = (1/(m+1)) (Σ_i (P^(i) − E^(i) + Y^(i)/μ) + Q − Z/μ).
(2)For j = 1, 2, ..., n
 Update the j-th row of Q by projection onto the probability simplex.
(3)For i = 1, 2, ..., m
 Update E^(i) via equation (16).
(4)Update P̂ via equation (15).
(5)Set Z = Z + μ(P̂ − Q).
(6)For i = 1, 2, ..., m
 Set Y^(i) = Y^(i) + μ(P^(i) − P̂ − E^(i)).
(7)Set μ = ρμ.
Until convergence.
Output: P̂, {E^(i)}.

When the other variables are fixed, the subproblem with respect to P̂ is

min_{P̂}  ||P̂||_{w,*} + ((m+1)μ/2) ||P̂ − A||_F²,  (14)

where A = (1/(m+1)) (Σ_i (P^(i) − E^(i) + Y^(i)/μ) + Q − Z/μ).

More specifically, let A = UΣVᵀ be the SVD of A. We use RMSC to evaluate the final singular values and use them to evaluate w via equation (10). According to [19], the solution to (14) is

P̂ = U S_{w/((m+1)μ)}(Σ) Vᵀ,  (15)

where S is the shrinkage operator applied with the per-singular-value thresholds w_i/((m+1)μ).

The subproblem with respect to E^(i) can be simplified as

min_{E^(i)}  λ ||E^(i)||₁ + (μ/2) ||E^(i) − (P^(i) − P̂ + Y^(i)/μ)||_F²,  (16)

which has the closed-form solution E^(i) = S_{λ/μ}(P^(i) − P̂ + Y^(i)/μ), where S_{λ/μ}(·) applies elementwise soft thresholding.

With the other variables fixed, we update Q by solving

min_{Q}  (μ/2) ||Q − (P̂ + Z/μ)||_F²,  s.t.  Q ≥ 0, Q1 = 1.

As in RMSC, this problem can be decomposed into n independent subproblems, one per row of Q. Each subproblem is a proximal operator problem with a probabilistic simplex constraint, which can be efficiently solved by a projection algorithm.
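A standard sorting-based routine for Euclidean projection onto the probability simplex can be sketched as follows; this is one common algorithm for the projection step, not necessarily the exact routine used in the paper.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of the vector v onto the probability simplex
    # {x : x >= 0, sum(x) = 1}, via the classical sort-and-threshold scheme:
    # find the largest rho such that u_rho - theta > 0, then clip.
    u = np.sort(v)[::-1]                       # sort in descending order
    css = np.cumsum(u)                         # running sums of sorted entries
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)       # the shift that makes the sum 1
    return np.maximum(v - theta, 0.0)
```

Applying this routine to each row of P̂ + Z/μ yields the update of Q, since the Frobenius-norm objective separates over rows.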

4. Experimental Setup

The proposed method was tested on several real-world datasets; the details are shown in Table 2.

In all the experiments, we use six metrics to measure the clustering performance: F-score, precision, recall, normalized mutual information (NMI) [25], entropy, and adjusted Rand index (Adj-RI) [26]. Note that higher values indicate better performance for all metrics except entropy.

When evaluating the weight vector, there is a constant parameter C; we use the same value of C in all the experiments. Similarity matrices are constructed by Gaussian kernels, with σ set to the median of the Euclidean distances between every pair of data points for all of the datasets except BBCSports. The trade-off parameter λ is set to 0.005.

5. Experimental Results

We chose the following six multimodality clustering algorithms as baselines:
(1) Single view: performing spectral clustering on a single view.
(2) Feature concatenation: concatenating the features of all views and then performing spectral clustering on the new representation.
(3) Kernel addition: constructing the similarity matrices from each view and then averaging them to obtain a new similarity matrix.
(4) Mixture of Markov chains (MMC): a mixture of Markov chains defined on each view [13].
(5) Co-regularized spectral clustering (Co-Reg): making use of the philosophy of co-regularization to make the clusterings in different views agree with each other [14].
(6) Robust multiview spectral clustering via low-rank and sparse decomposition (RMSC): a Markov-chain-based multimodality spectral clustering method via low-rank and sparse decomposition [15].

Following the settings in [14], we use the Gaussian kernel to construct the similarity matrix for each view when needed in all algorithms.

Table 3 shows the results of the proposed method and the baselines on BBCSports. As can be seen, the proposed method outperforms the baselines with respect to all six metrics, achieving relative improvements in F-score, precision, NMI, and Adj-RI over the corresponding second-best baseline.

Table 4 shows the results of the proposed method and the baselines on UCI. As can be seen, the proposed method outperforms the baselines with respect to all six metrics, achieving relative improvements in F-score, precision, NMI, and Adj-RI over the corresponding second-best baseline.

Table 5 shows the results of the proposed method and the baselines on WebKb. As can be seen, the proposed method outperforms the baselines with respect to most of the six metrics, achieving relative improvements in F-score, NMI, and Adj-RI over the corresponding second-best baseline. Although the precision of the proposed method is lower than that of kernel addition, the difference is small.

Table 6 shows the results of the proposed method and the baselines on Reuters. As can be seen, the proposed method outperforms the baselines with respect to most of the six metrics, achieving relative improvements in F-score, precision, NMI, and Adj-RI over the corresponding second-best baseline. Although the recall of the proposed method is lower than those of kernel addition and feature concatenation, the differences are small.

6. Conclusion

With the development of information and communication technologies, it is necessary to design new pattern recognition methods that deal with multiple views of the same subjects, and handling multimodality problems remains a challenging task. Inspired by previous work, we proposed a method that applies the weighted nuclear norm to RMSC and gave a method to evaluate the weight vector, which distinguishes different singular values. To solve the optimization problem, we designed a procedure based on ALM. We evaluated the proposed method on four real-world datasets, and the experimental results show that it performs better than the baselines. In the future, we will continue our studies in multimodality clustering, including evaluating the weight vector more accurately and clustering large-scale datasets.

Data Availability

The BBCSports dataset used to support the findings of this study has been deposited in the UCD repository (http://mlg.ucd.ie/datasets/bbc.html) and the other datasets have been deposited in the UCI repository (http://archive.ics.uci.edu/ml/index.php).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Cultivating Science Foundation of Taizhou University (2019PY014), the Agricultural Science and Technology Project of Taizhou (20ny13), the Zhejiang Provincial Natural Science Foundation of China (LQ21F020001 and LQ21A010001), and Taizhou Science and Technology Project (1901gy20).