Abstract

Many real-world datasets are described by multiple views, which can provide complementary information. Synthesizing multiview features for data representation can lead to a more comprehensive data description for the clustering task. However, it is often difficult to preserve the locally real structure in each view and to reconcile the noises and outliers among views. In this paper, instead of seeking a common representation among views, a novel robust neighboring constraint nonnegative matrix factorization (rNNMF) is proposed to learn the neighbor structure representation in each view, and an L2,1-norm-based loss function is designed to improve its robustness against noises and outliers. A final comprehensive representation of the data is then integrated from the representations of the individual views. Finally, a neighboring similarity graph is learned from this representation, and the graph cut method is used to partition the data into its underlying clusters. Experimental results on several real-world datasets show that our model achieves more accurate performance in multiview clustering than existing state-of-the-art methods.

1. Introduction

Clustering is a fundamental topic in machine learning and data mining. Real-world datasets are often composed of different views, and these views often provide compatible and complementary information. Multiview clustering (MVC) therefore aims to integrate the different views and uncover the consistent latent information to improve clustering performance [1]. Over the past decades, it has attracted great attention [2, 3] and has been widely used in various real applications [4].

Essentially, given the multiview inputs, the critical task in MVC is to fuse the information of different views and learn a common agreement for clustering. To integrate views efficiently, many subspace clustering-based methods [5, 6] and nonnegative matrix factorization- (NMF-) based methods [7, 8] have been developed. In particular, NMF is shown to be equivalent to relaxed k-means, and symmetric NMF is closely related to spectral clustering [9]. However, NMF cannot preserve the geometrical structure of the data space, which is essential for finding the true cluster structures. Many manifold learning methods, motivated by the so-called local invariance idea that nearby points are likely to have similar embeddings, have been proposed, such as locally linear embedding (LLE) [10] and locality preserving projection (LPP) [11]. In particular, Cai et al. [12] proposed graph-regularized nonnegative matrix factorization (GNMF) to find a compact representation that uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. It is well accepted that clustering performance can be significantly enhanced when local invariance is considered. However, these are all single-view clustering methods.

On the other hand, many NMF-based MVC methods [13] have attracted attention, in which various constraints are applied to the coefficient matrix to cluster the data points. Multi-NMF [14] formulated a joint multiview NMF learning process with a constraint that encourages the representation of each view toward a common consensus. Many extensions of multi-NMF were proposed for image clustering and other tasks [15]. In [16], two weight matrices were introduced to alleviate the issue of dataset imbalance in real applications. Ou et al. [17] explored the local geometric structure of each view under the patch alignment framework and adopted the correntropy-induced metric to measure the reconstruction error of each view to improve robustness. A deep matrix factorization model [18] sought a common representation by introducing graph regularization to guide shared representation learning in the final layer of each view. However, these existing approaches all exploit the common information shared by multiple views but neglect the diversity among views, that is, the distinct information that each view contains and the other views do not.

In this paper, we propose a novel MVC method, called robust neighboring constraint NMF (rNNMF), which uses the locally neighboring structure of each view to capture its diversity features. In rNNMF, a neighboring graph is constructed and updated for each view during the factorization process to obtain the underlying diversity features. These diversity features are then combined into an integrated representation of the data, a global graph is generated from this integrated representation, and Ncut is used to partition the data into its underlying groups.

In summary, the novelty and contributions of our research are as follows:
(1) A neighboring constraint NMF method is proposed to learn the diversity representation of the data in each view. The proposed model keeps only the relationship between a point and its nearest neighbors to maintain the geometrical structure during feature learning in each view.
(2) An L2,1-norm loss function is used in rNNMF to improve the robustness of the features in each view and reduce the effect of noisy features.

The rest of this paper is organized as follows. Section 2 introduces related work on NMF-based MVC algorithms. Our proposed robust NMF-based MVC model is presented in Section 3, experimental results are reported in Section 4, and conclusions and discussions are given in Section 5.

2. Related Work

Both subspace clustering-based and NMF-based methods are important in MVC. For example, in [5], a robust graph is learned with a correlation consensus agreement to improve clustering performance. A multigraph regularized low-rank representation- (LRR-) based method was proposed to achieve data correlation consensus among all views [6]. A structured LRR was proposed by factorizing into latent low-dimensional data-cluster representations, which characterize the data clustering structure of each view [1]. Meanwhile, NMF-based methods [19], which enforce the constraint that the elements of the factor matrices must be nonnegative, were also proved useful. When the Frobenius norm is used as the divergence, NMF is equivalent to a relaxed form of the k-means clustering method. However, NMF fails to discover the intrinsic geometry of the data, which is essential in real applications. To preserve the locally geometrical structure of the data space, Cai et al. imposed graph regularization on NMF (GNMF) [12]. In [20], Shang et al. proposed graph dual regularization NMF (DNMF), which simultaneously considers the geometric structures of the data manifold and the feature manifold. Two subspace clustering algorithms were proposed in [21], which establish connections with spectral normalized cut [22] and ratio cut clustering; they also extend nonlinear orthogonal NMF and introduce a graph regularization to obtain a factorization that respects the local geometric structure after the nonlinear mapping.

In MVC, NMF-based methods have also received increasing attention. Let the input $X^{(v)} \in \mathbb{R}^{d_v \times n}$ be the $v$-th view; it is a $d_v \times n$ matrix, where $d_v$ denotes the feature dimensionality (rows) and $n$ is the number of data points (columns). $H^{(v)} \in \mathbb{R}^{p \times n}$ is the representation of the $v$-th view, a $p \times n$ matrix, where $p$ denotes the feature dimensionality of the new space, and $W^{(v)} \in \mathbb{R}^{d_v \times p}$ is the corresponding basis matrix. For NMF-based methods, the overall framework is as follows:

$$\min_{\{W^{(v)}, H^{(v)}\}} \sum_{v=1}^{V} \big\| X^{(v)} - W^{(v)} H^{(v)} \big\|_F^2 + \lambda\, \mathcal{R}\big(H^{(1)}, \ldots, H^{(V)}\big), \quad \text{s.t. } W^{(v)} \ge 0,\ H^{(v)} \ge 0, \tag{1}$$

where $\|\cdot\|_F$ is the Frobenius norm and both $W^{(v)}$ and $H^{(v)}$ should be nonnegative. By default, the input data of each view should also be nonnegative for NMF-based methods, i.e., $X^{(v)} \ge 0$. $\mathcal{R}(\cdot)$ is the regularization term that learns the agreement among different views. For example, MulNMF designed a constraint that encourages the representation of each view toward a common consensus. DiNMF [23] introduced a constraint term to guarantee the diversity among points in different views. In order to deal with mixed-sign data, a deep semi-NMF method, based on the semi-NMF model, couples the output representations in the final layer of the factorization and enforces that the views share the same representation after layer-by-layer factorization.
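For reference, the per-view factorization that these methods build on can be sketched in a few lines of NumPy. This is a minimal illustration of plain single-view NMF with the standard Lee-Seung multiplicative updates for the Frobenius loss; the consensus regularizer $\mathcal{R}$ is omitted, and the names (`nmf_view`, `n_iter`) are ours, not from the cited papers.

```python
import numpy as np

def nmf_view(X, p, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF for one nonnegative view X (d x n): X ~= W @ H."""
    d, n = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((d, p)) + eps
    H = rng.random((p, n)) + eps
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the Frobenius loss
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```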

Although those methods can achieve good performance by finding a common agreement among views, the consensus information cannot always be explored effectively, and they do not make full use of the information in multiple views [18]. To make full use of the diversity information of views, combining the views' representations is a natural approach, and our work focuses on this kind of combination. However, directly combining the original information contained in the views can lead to poor clustering performance. Therefore, it is necessary to design a new method that not only maximally preserves the diversity features of each view but also obtains an aggregated representation with good clustering performance.

3. Structural Constraint Semi-NMF

3.1. Robust Neighboring Constraint Regularization

Given $H^{(v)} \in \mathbb{R}^{p \times n}$, the low-dimensional representation of $X^{(v)}$, we introduce a special matrix $S^{(v)} \in \{0, 1\}^{n \times n}$ to indicate the neighboring constraint of the $v$-th view; it is defined as

$$S^{(v)}_{ij} = \begin{cases} 1, & x_i^{(v)} \in \mathcal{N}_k\big(x_j^{(v)}\big), \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$

where $\mathcal{N}_k(x_j^{(v)})$ is the set of the $k$ nearest neighbors of point $x_j^{(v)}$ in the $v$-th view. That is, if point $x_i^{(v)}$ is one of the $k$ nearest neighbors of $x_j^{(v)}$, $S^{(v)}_{ij}$ will be set to 1. We hope the difference between the representation of point $x_j^{(v)}$ and those of its neighbors is as small as possible, which can be represented by $\|h_j^{(v)} - (H^{(v)}S^{(v)})_j\|_2$, where $h_j^{(v)}$ indicates the $j$-th column of $H^{(v)}$. So, we introduce the constraint $H^{(v)} - H^{(v)}S^{(v)}$ to describe this diversity of the points with their neighbors: the smaller its column norms, the more similar they are. We introduce the L2,1 norm to penalize it for seeking a representation in each view:

$$\mathcal{R}\big(H^{(v)}\big) = \big\| H^{(v)} - H^{(v)} S^{(v)} \big\|_{2,1}. \tag{3}$$
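As a concrete illustration, the indicator matrix $S^{(v)}$ of equation (2) can be built with a brute-force k-NN search. This is a minimal sketch assuming plain Euclidean distance with self-neighbors excluded; the function name and interface are ours.

```python
import numpy as np

def neighbor_indicator(X, k):
    """S[i, j] = 1 iff column x_i is one of the k nearest neighbors of x_j.
    X is d x n (columns are data points)."""
    n = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    # dist[a, b] = ||x_a - x_b||^2 for all pairs of columns
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    nn = np.argsort(dist, axis=0)[:k, :]      # k nearest neighbors per column
    S = np.zeros((n, n))
    cols = np.tile(np.arange(n), (k, 1))
    S[nn, cols] = 1.0
    return S
```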

3.2. Objective Function and Optimization Algorithm

In MVC, to learn the neighbor information in each view, based on ORNMF [19], which is a robust representation approach, the proposed rNNMF can be expressed as

$$\min_{W^{(v)} \ge 0,\ H^{(v)} \ge 0} \sum_{v=1}^{V} \Big( \big\| X^{(v)} - W^{(v)} H^{(v)} \big\|_{2,1} + \lambda \big\| H^{(v)} - H^{(v)} S^{(v)} \big\|_{2,1} \Big), \tag{4}$$

where the L2,1-norm is applied to the loss function and defined as

$$\|A\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i} A_{ij}^2} = \sum_{j=1}^{n} \|a_j\|_2. \tag{5}$$

With the error of each point not being squared, the impact of large errors is reduced significantly. The first term in equation (4) is the data fidelity term of the $v$-th view, and the second term is the nearest neighboring constraint term. $H^{(v)}$ is the representation of the $v$-th view, and $\lambda$ is the positive parameter that specifies the relative importance of the factorization term and the regularization term in the model.
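For concreteness, the L2,1 norm of (5) and the objective of (4) can be evaluated as below. This sketch assumes the reconstructed form of (4) given above, with `lam` standing for the trade-off parameter $\lambda$.

```python
import numpy as np

def l21_norm(A):
    """L2,1 norm: the sum of the Euclidean norms of the columns of A."""
    return np.linalg.norm(A, axis=0).sum()

def rnnmf_objective(Xs, Ws, Hs, Ss, lam):
    """Objective (4): robust fit plus neighboring constraint, summed over views."""
    total = 0.0
    for X, W, H, S in zip(Xs, Ws, Hs, Ss):
        total += l21_norm(X - W @ H)          # robust data fidelity term
        total += lam * l21_norm(H - H @ S)    # neighboring constraint term
    return total
```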

Like most NMF-based methods, the objective function in (4) is not convex, so we present an iterative algorithm that reaches a local minimum of (4).

Computing $H^{(v)}$: to update $H^{(v)}$ with $W^{(v)}$ fixed, we need to solve the following objective function:

$$\min_{H^{(v)} \ge 0} \ \big\| X^{(v)} - W^{(v)} H^{(v)} \big\|_{2,1} + \lambda \big\| H^{(v)} - H^{(v)} S^{(v)} \big\|_{2,1}. \tag{6}$$

As optimizing (6) under the nonnegativity constraints directly is hard, a Lagrange multiplier matrix $\Psi$ is introduced. Then, the Lagrangian function is

$$\mathcal{L}\big(H^{(v)}\big) = \operatorname{Tr}\!\Big( \big(X^{(v)} - W^{(v)}H^{(v)}\big) D_1^{(v)} \big(X^{(v)} - W^{(v)}H^{(v)}\big)^{T} \Big) + \lambda \operatorname{Tr}\!\Big( H^{(v)}\big(I - S^{(v)}\big) D_2^{(v)} \big(I - S^{(v)}\big)^{T} H^{(v)T} \Big) + \operatorname{Tr}\!\big( \Psi H^{(v)T} \big), \tag{7}$$

where $D_1^{(v)}$ and $D_2^{(v)}$ are diagonal matrices, which are $n \times n$ matrices, with elements defined as

$$\big(D_1^{(v)}\big)_{jj} = \frac{1}{2\big\| x_j^{(v)} - W^{(v)} h_j^{(v)} \big\|_2}, \qquad \big(D_2^{(v)}\big)_{jj} = \frac{1}{2\big\| h_j^{(v)} - \big(H^{(v)} S^{(v)}\big)_j \big\|_2}. \tag{8}$$
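The diagonal weights of (8) admit a direct implementation. The following sketch computes both matrices, with a small `eps` added for numerical safety (an implementation detail not in the derivation).

```python
import numpy as np

def reweight_diagonals(X, W, H, S, eps=1e-10):
    """IRLS-style diagonal weights D1, D2 for the two L2,1 terms of (6)."""
    d1 = 1.0 / (2.0 * np.linalg.norm(X - W @ H, axis=0) + eps)
    d2 = 1.0 / (2.0 * np.linalg.norm(H - H @ S, axis=0) + eps)
    return np.diag(d1), np.diag(d2)
```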

The partial derivative of the Lagrangian function with respect to $H^{(v)}$ is computed as follows:

$$\frac{\partial \mathcal{L}}{\partial H^{(v)}} = -2 W^{(v)T}\big(X^{(v)} - W^{(v)} H^{(v)}\big) D_1^{(v)} + 2\lambda H^{(v)}\big(I - S^{(v)}\big) D_2^{(v)} \big(I - S^{(v)}\big)^{T} + \Psi. \tag{9}$$

Because $X^{(v)}$ is mixed-sign data, we decompose it into two nonnegative parts $X^{(v)+}$ and $X^{(v)-}$, representing the positive part and the negative part, respectively:

$$X^{(v)} = X^{(v)+} - X^{(v)-}, \qquad X^{(v)\pm}_{ij} = \frac{\big|X^{(v)}_{ij}\big| \pm X^{(v)}_{ij}}{2}.$$

Let $A^{(v)} = W^{(v)T} X^{(v)+}$, $B^{(v)} = W^{(v)T} X^{(v)-}$, $C^{(v)} = W^{(v)T} W^{(v)}$, and $L^{(v)} = (I - S^{(v)}) D_2^{(v)} (I - S^{(v)})^{T}$, decomposed as $L^{(v)} = L^{(v)+} - L^{(v)-}$. The updating rule of $H^{(v)}$ is formulated as follows:

$$H^{(v)}_{ij} \leftarrow H^{(v)}_{ij} \sqrt{\frac{\big( A^{(v)} D_1^{(v)} + \lambda H^{(v)} L^{(v)-} \big)_{ij}}{\big( B^{(v)} D_1^{(v)} + C^{(v)} H^{(v)} D_1^{(v)} + \lambda H^{(v)} L^{(v)+} \big)_{ij}}}. \tag{10}$$
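A minimal NumPy sketch of one pass of this step follows. It implements the reconstructed rule (10) above, so the exact grouping of terms reflects our reconstruction rather than the authors' verbatim formula.

```python
import numpy as np

def update_H(X, W, H, S, D1, D2, lam, eps=1e-10):
    """One multiplicative H step under the reconstructed rule (10)."""
    Xp = (np.abs(X) + X) / 2.0              # positive part of X
    Xn = (np.abs(X) - X) / 2.0              # negative part of X
    A, B, C = W.T @ Xp, W.T @ Xn, W.T @ W
    I = np.eye(S.shape[0])
    L = (I - S) @ D2 @ (I - S).T            # neighboring-constraint matrix
    Lp = (np.abs(L) + L) / 2.0
    Ln = (np.abs(L) - L) / 2.0
    num = A @ D1 + lam * H @ Ln
    den = B @ D1 + C @ H @ D1 + lam * H @ Lp + eps
    return H * np.sqrt(num / den)
```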

Computing $W^{(v)}$: to update $W^{(v)}$ with $H^{(v)}$ fixed, the following objective function should be solved:

$$\min_{W^{(v)} \ge 0} \ \big\| X^{(v)} - W^{(v)} H^{(v)} \big\|_{2,1}. \tag{11}$$

This is similar to the problems in [19, 24]. So we have the updating rule

$$W^{(v)} \leftarrow W^{(v)} \odot \frac{X^{(v)+} D_1^{(v)} H^{(v)T}}{X^{(v)-} D_1^{(v)} H^{(v)T} + W^{(v)} H^{(v)} D_1^{(v)} H^{(v)T}}, \tag{12}$$

where $\odot$ indicates the Hadamard product and the division is elementwise. For each view, the updating rules of $W^{(v)}$ and $H^{(v)}$ satisfy the theorem in [24], which guarantees their correctness. The correctness analysis and convergence proof of (10) and (12), based on the method in [19], are given in the following section.
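A matching sketch of the W step under the reconstructed rule (12); the elementwise multiplication and division mirror the Hadamard form, and again the precise grouping is our reconstruction.

```python
import numpy as np

def update_W(X, W, H, D1, eps=1e-10):
    """One multiplicative W step under the reconstructed rule (12)."""
    Xp = (np.abs(X) + X) / 2.0
    Xn = (np.abs(X) - X) / 2.0
    num = Xp @ D1 @ H.T
    den = Xn @ D1 @ H.T + W @ (H @ D1 @ H.T) + eps
    return W * (num / den)                  # Hadamard product and division
```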

3.3. Correctness and Convergence

Theorem 1. If the updating rule (10) of $H^{(v)}$ converges, then the final solution satisfies the KKT optimality condition.

Proof. At convergence, $H^{(v)(t+1)} = H^{(v)(t)} = H^{(v)}$, where $t$ denotes the $t$-th iteration, and the following formula holds:

$$\Big( \big( B^{(v)} D_1^{(v)} + C^{(v)} H^{(v)} D_1^{(v)} + \lambda H^{(v)} L^{(v)+} \big)_{ij} - \big( A^{(v)} D_1^{(v)} + \lambda H^{(v)} L^{(v)-} \big)_{ij} \Big) \big( H^{(v)}_{ij} \big)^2 = 0, \tag{13}$$

which is equivalent to the KKT complementary slackness condition $\big[ \partial \mathcal{L} / \partial H^{(v)} \big]_{ij} H^{(v)}_{ij} = 0$.

We now prove the convergence of the updating rule (10) using the auxiliary function approach in [19]. The definition of the auxiliary function is as follows.

Definition 1. $G(H, H')$ is an auxiliary function for $F(H)$ if $G(H, H') \ge F(H)$ and $G(H, H) = F(H)$ hold for any $H$ and a constant matrix $H'$.
The auxiliary function is useful because of the following Lemma 1.

Lemma 1. $F$ is nonincreasing under the updating rule $H^{(t+1)} = \arg\min_{H} G\big(H, H^{(t)}\big)$ if $G$ is an auxiliary function of $F$.

Proof. Following the definition of $G$, we have

$$F\big(H^{(t+1)}\big) \le G\big(H^{(t+1)}, H^{(t)}\big) \le G\big(H^{(t)}, H^{(t)}\big) = F\big(H^{(t)}\big). \tag{14}$$

The key point is to find an appropriate auxiliary function for (6). Because the learning process is independent in each view, the view index can be dropped in general. Let $H$ represent $H^{(v)}$ to indicate each view's process, and let $A$, $B$, $C$, $D_1$, and $L$ be the corresponding matrices defined above. We rewrite (6) as follows:

$$F(H) = \operatorname{Tr}\big( (X - WH) D_1 (X - WH)^{T} \big) + \lambda \operatorname{Tr}\big( H L H^{T} \big). \tag{15}$$

Since the update rules are elementwise, it suffices to prove that each $F(H_{ij})$ is nonincreasing under the update (10), by defining an auxiliary function with respect to $H_{ij}$ as follows.

Lemma 2. The function

$$\begin{aligned} G(H, H') = {} & \operatorname{Tr}\big( X D_1 X^{T} \big) - 2 \sum_{ij} (A D_1)_{ij} H'_{ij} \Big( 1 + \log \frac{H_{ij}}{H'_{ij}} \Big) + \sum_{ij} (B D_1)_{ij} \frac{H_{ij}^2 + H_{ij}'^2}{H'_{ij}} \\ & + \sum_{ij} \frac{\big( C H' D_1 \big)_{ij} H_{ij}^2}{H'_{ij}} + \lambda \sum_{ij} \frac{\big( H' L^{+} \big)_{ij} H_{ij}^2}{H'_{ij}} - \lambda \sum_{ijk} L^{-}_{jk} H'_{ij} H'_{ik} \Big( 1 + \log \frac{H_{ij} H_{ik}}{H'_{ij} H'_{ik}} \Big) \end{aligned} \tag{16}$$

is an auxiliary function of $F(H)$ in problem (6).

Proof. Since $G(H, H) = F(H)$ is obvious, we need only prove $G(H, H') \ge F(H)$. To this end, we compare (16) with (15) term by term. From the inequality $z \ge 1 + \log z$, which holds for any $z > 0$, we have the following inequalities:

$$H_{ij} \ge H'_{ij} \Big( 1 + \log \frac{H_{ij}}{H'_{ij}} \Big), \qquad H_{ij} H_{ik} \ge H'_{ij} H'_{ik} \Big( 1 + \log \frac{H_{ij} H_{ik}}{H'_{ij} H'_{ik}} \Big),$$

so the second and the last terms of (16) bound the corresponding negative terms of (15) from above.

With the lemma and the proposition in [19], we have the following inequalities:

$$\operatorname{Tr}\big( C H D_1 H^{T} \big) \le \sum_{ij} \frac{\big( C H' D_1 \big)_{ij} H_{ij}^2}{H'_{ij}}, \qquad \operatorname{Tr}\big( H L^{+} H^{T} \big) \le \sum_{ij} \frac{\big( H' L^{+} \big)_{ij} H_{ij}^2}{H'_{ij}}, \qquad 2 H_{ij} \le \frac{H_{ij}^2}{H'_{ij}} + H'_{ij}.$$

Collecting all the bounds, $G(H, H') \ge F(H)$ holds, and Lemma 2 is proven.

Theorem 2. The objective function of problem (6) is nonincreasing under the iterative updating rule (10).

Proof. $G(H, H')$ is a convex function of $H$. To find its minimum, following the KKT condition, we let

$$\frac{\partial G(H, H')}{\partial H_{ij}} = -2 (A D_1)_{ij} \frac{H'_{ij}}{H_{ij}} + 2 \big( B D_1 + C H' D_1 + \lambda H' L^{+} \big)_{ij} \frac{H_{ij}}{H'_{ij}} - 2 \lambda \big( H' L^{-} \big)_{ij} \frac{H'_{ij}}{H_{ij}} = 0.$$

Setting $H' = H^{(t)}$ and solving this equation yields the updating rule (10) for the objective function in (6). The updating rule (12) of $W^{(v)}$ can also be derived by this method.

3.4. MVC with rNNMF
3.4.1. Representation Combining

After the data are decomposed by the matrix factorization model, the final representation can be obtained by combining the representations of the different views. The final output $H^{*}$, which is a $pV \times n$ matrix, is obtained from the $H^{(v)}$ of the multiple views by vertical concatenation:

$$H^{*} = \big[ H^{(1)}; H^{(2)}; \ldots; H^{(V)} \big]. \tag{17}$$
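In code, this combining step is a single vertical concatenation; a trivial sketch:

```python
import numpy as np

def combine_views(Hs):
    """Stack the per-view representations H^(v) (each p x n) into H* (pV x n)."""
    return np.vstack(Hs)
```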

3.4.2. Clustering with Similarity Graph

Because the neighboring structure is kept during the factorization, a graph-based clustering method is chosen to cluster the data in our study. The similarity graph is built from the final representation $H^{*}$ by the k-NN algorithm. Then, normalized cut (Ncut) [22] is used to obtain the final clustering results, which can achieve better performance by considering the graph structure of the data. Details of our method are described in Algorithm 1.

Input: multiview data $\{X^{(v)}\}_{v=1}^{V}$, parameter $\lambda$, parameter $k$
Initialize:
for each view $v$ do
    initialize $W^{(v)}$ and $H^{(v)}$ (e.g., by NNDSVD)
end
construct $S^{(v)}$ via (2) with $X^{(v)}$
while not converged do
    for all views $v$ do
        update $W^{(v)}$ via (12)
        update $H^{(v)}$ via (10)
        update $S^{(v)}$ via (2) with $H^{(v)}$
    end
end
combining: $H^{*} = [H^{(1)}; H^{(2)}; \ldots; H^{(V)}]$
similarity graph: $G$ is built with $H^{*}$ and $k$
clustering: Ncut($G$)
Output: clustering results
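To illustrate the last steps of Algorithm 1, the following sketch builds a k-NN similarity graph on the combined representation and partitions it with scikit-learn's normalized spectral clustering, which serves here as a stand-in for Ncut [22]; the symmetrization step and all names are our choices, not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

def cluster_final(H_star, k, n_clusters, seed=0):
    """Build a k-NN graph on H* (pV x n) and partition it spectrally."""
    A = kneighbors_graph(H_star.T, n_neighbors=k, mode='connectivity')
    A = 0.5 * (A + A.T)                      # symmetrize the k-NN graph
    sc = SpectralClustering(n_clusters=n_clusters, affinity='precomputed',
                            random_state=seed)
    return sc.fit_predict(A.toarray())
```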
3.5. Complexity Analysis

Suppose that the dimensions of all the input views are the same, denoted by $d$, the dimension of $H^{(v)}$ is denoted by $p$, and $V$ is the total number of views. The overall cost of constructing $S^{(v)}$ in each view is $O(n^2 p)$. Updating $W^{(v)}$ costs $O(ndp)$, and the cost of updating $H^{(v)}$ is $O(ndp + n^2 p)$. Furthermore, computing each of $D_1^{(v)}$, $D_2^{(v)}$, $X^{(v)\pm}$, and $L^{(v)}$ costs at most $O(ndp + n^2 p)$. Note that $t$ is the number of iterations, so the overall complexity is $O\big(tV(ndp + n^2 p)\big)$. Nevertheless, $S^{(v)}$ (containing $kn$ non-zero elements) is a sparse matrix, so the quadratic-in-$n$ terms are reduced considerably in practice.

4. Experiment

4.1. Experimental Setting
4.1.1. Datasets

Four datasets are used in the experiment.

UCI Digit (https://archive.ics.uci.edu/ml/datasets/Multiple+Features) is a dataset of handwritten digits 0 to 9 from the UCI machine learning repository. It consists of 2000 points. Following the work in [21], we use the 76 Fourier coefficients and the 216 profile correlations as two views.

3Sources (http://mlg.ucd.ie/datasets/3sources.html) is collected from three well-known online news sources and each is treated as one view. We select the 169 stories which are reported in all three sources.

ORL (http://cs.tju.edu.cn/faculty/zhangchangqing/code/ORL_mtv.rar) contains 400 images of 40 subjects with three views: intensity, LBP, and Gabor. All images are resized to 48 × 48. LBP is a 59-dimension histogram over 9 × 10 pixel patches generated from the cropped images. The scale parameter λ in the Gabor wavelets is fixed as 4, at four orientations θ = {0°, 45°, 90°, 135°}, with a cropped image of size 25 × 30 pixels.

Washington (http://www.cs.umd.edu/projects/linqs/projects/lbc/) belongs to WebKB, which collects webpages from four universities. The webpages are distributed over five classes: student, project, course, staff, and faculty, and they are described by two views: the content view and the citation view. Each webpage is described by 1703 words in the content view and by the citation links between it and other pages in the citation view. We summarize the datasets in Table 1.

The algorithms that we compare against are as follows: (1) BestSV, which reports the best performance of each single view [25]; NMF-based methods: (2) MulNMF and (3) D-SNMF; and subspace clustering-based methods: (4) c-LRSSC [26], (5) p-LRSSC [26], (6) RMSC [27], (7) ECMSC [28], and (8) MVGL [29].

The codes of all the baseline methods are provided by their authors. We tune the parameters of all comparison methods according to the corresponding literature to obtain their best performance. For RMSC, its parameter is searched from 0.005 to 100, as the authors suggest. For all the NMF-based methods, we set the dimensionality of the new space to be the same as or larger than the number of clusters, and the initialization follows the authors' suggestions. K-means is applied to the new representation for clustering. This process is repeated 10 times, and the average clustering performance is recorded as the final result.

For rNNMF, the dimensionality of $H^{(v)}$ is 60 for UCI Digit and ORL, and 20 for 3Sources and Washington. It is initialized by NNDSVD, and the experiment is repeated five times to obtain average results. The heat kernel, in which the parameter is set to 2, is used to weight the distances between points when selecting the neighboring points, the same as in [30] (https://github.com/louloupiano/PCPSNMF).
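For clarity, the heat-kernel weight referred to above can be computed as follows; the exact placement of the parameter in the exponent is our assumption, following the common $\exp(-\|x_i - x_j\|^2 / t)$ form with $t = 2$.

```python
import numpy as np

def heat_kernel_weight(xi, xj, t=2.0):
    """Heat-kernel similarity between two points (t is the kernel parameter)."""
    return np.exp(-np.sum((xi - xj) ** 2) / t)
```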

For evaluation, we use three metrics: accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (AR) [31]. For all metrics, a higher value denotes better performance.
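For reproducibility, ACC is commonly computed with the Hungarian algorithm for the best one-to-one label matching, while NMI and AR are available directly in scikit-learn; a small sketch (assuming integer labels starting from 0):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one cluster-to-label matching (Hungarian algorithm)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    m = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((m, m), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[t, p] += 1
    rows, cols = linear_sum_assignment(-count)   # maximize matched pairs
    return count[rows, cols].sum() / y_true.size

# NMI and AR come directly from scikit-learn:
# normalized_mutual_info_score(y_true, y_pred)
# adjusted_rand_score(y_true, y_pred)
```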

4.2. Clustering Performance

Table 2 summarizes the clustering performance, with the best values in bold. As the table shows, rNNMF achieves the highest performance overall. On UCI Digit, with suitable settings of $\lambda$ and $k$, it outperforms the second best method by roughly 6.78%, 9.07%, and 12.49% on the three metrics. On 3Sources, it also obtains the best results. On ORL, it outperforms the other methods in ACC and AR, while being slightly worse in NMI than ECMSC. On Washington, it outperforms the second best method by roughly 2.43%, 0.77%, and 12.87%. In all, rNNMF clearly achieves more accurate performance than the other methods.

Figure 1 shows more details of the clustering results via the similarity matrices yielded by two high-performance MVC methods on UCI Digit and ORL. For UCI Digit, the diagonal blocks of rNNMF are whiter than those of MulNMF, and the surrounding off-diagonal blocks are blacker. For ORL, a similar conclusion holds, and it is even clearer in the similarity matrix of rNNMF.

Hence, with the representation combining process, rNNMF can fuse multiple views efficiently, and the neighboring constraint plays an important role in discovering the underlying structure of the points. This also indicates that a comprehensive graph structure is important for discovering the cluster structure.

4.3. Influence of the Parameters

The parameter $\lambda$ plays an important role in the representation learning process of rNNMF. We test $\lambda$ over the grid {0, 0.0001, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1} for all datasets. Among these values, $\lambda = 0$ means there is no neighboring constraint in our model. The ACC and NMI results are shown in Figure 2.

The figure shows that the ACC and NMI performance improves with increasing $\lambda$ on UCI Digit, reaching its best at an intermediate value of $\lambda$; then, the performance drops obviously. This tendency can also be observed on 3Sources; the only difference is the value of $\lambda$ at which the best ACC and NMI are obtained. On ORL, performance improves slightly with increasing $\lambda$ and drops obviously after reaching the best value, while ACC and NMI on Washington drop as $\lambda$ increases. This shows that a too large $\lambda$ can destroy the similarity structure in each view, which leads to worse performance in the final clustering process, whereas a suitable $\lambda$ can strengthen the cluster structure during learning. Furthermore, even when $\lambda = 0$, the ACC and NMI of our model are better than those of some methods, such as D-SNMF and RMSC, demonstrating that combining multiple views and building a graph is useful in MVC.

The parameter $k$ is important for creating the final similarity graph. Figure 3 shows the ACC and NMI results with different values of $k$. As we can see, the final results are sensitive to $k$ in the integrated representation. On UCI Digit, 3Sources, and ORL, the best performance is obtained at a small value of $k$, and then the performance tends to drop as $k$ increases. On Washington, ACC and NMI reach their best at a different value of $k$. This shows that the influence of $k$ mainly concerns the similarity graph built from the integrated representation, and building a good graph from the representation is important.

4.4. Convergence Analysis

Figure 4 shows the convergence behavior of rNNMF by plotting the objective error at each iteration. It is clear that the objective value decreases steadily on all datasets, and the NMI stabilizes around the convergence point. Therefore, the maximum number of iterations is set to 100 for all experiments.

5. Conclusion

In this paper, we proposed a novel NMF-based MVC model named rNNMF. In this model, the neighbor structure representation is learned in each view, and an L2,1-norm-based loss function is designed to improve robustness against noises and outliers. A final representation of the data is then integrated from the representations of all views, and a graph is learned from this representation. Finally, the graph cut method is used to partition the data into its underlying clusters. Unlike existing methods, rNNMF can well encode the local structure of each view's feature space and achieve structure agreement via combining fusion. Experiments show that the rNNMF-based model yields higher performance. An important direction for future work is to find a better graph structure to obtain a clearer representation. In addition, weighting each view to deal with varying levels of quality is also worth studying, as is extending the rNNMF model and its optimization strategy to handle dynamic data and achieve online multiview clustering.

Data Availability

All the data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Key Research Development Program of China under Grant no. 2017YFB0802800 and in part by the National Natural Science Foundation of China under Grant no. 61473149.