Abstract
In the era of Industry 4.0, single-view clustering algorithms struggle with complex data, i.e., multiview data. In recent years, multiview clustering, an extension of traditional single-view clustering, has become increasingly popular. Although multiview clustering algorithms are more effective than single-view clustering algorithms, almost all current multiview clustering algorithms have two weaknesses: (1) the multiview collaborative clustering strategy lacks theoretical support, and (2) every view is given the same weight. To solve these problems, we use the Havrda-Charvat entropy and a fuzzy index to construct a new collaborative multiview fuzzy c-means clustering algorithm with fuzzy weighting, called CoMVFCM. The experimental results show that CoMVFCM has the best clustering performance among all the compared clustering algorithms.
1. Introduction
In the era of Industry 4.0, as data collection methods become more diverse, the complexity of data is also increasing. For example, a driverless car collects environmental data through a variety of sensors and analyzes and processes it from multiple views while driving. In unsupervised learning, clustering is commonly used to analyze complex data. However, traditional clustering methods, such as K-means [1, 2], fuzzy C-means (FCM) [3, 4], maximum entropy clustering (MEC) [5, 6], and possibilistic C-means (PCM) [7, 8], are all designed for single-view data analysis. When single-view algorithms [9–11] encounter a multiview clustering task, the common practice is to treat each view as an independent clustering task. After the clustering task of each view is finished, an integrated (ensemble) learning mechanism [12] is used to select an appropriate strategy for combining the multiple clustering results into the final clustering result. However, when the clustering result of a certain view deviates markedly, or the clustering results of different views differ greatly, this strategy of artificially separating the views for independent analysis may lead to an inaccurate global partition or to unstable algorithm performance.
In many real applications, multiview representation of data is becoming more and more popular [13], especially in the field of medicine [14, 15]. For example, people’s living standards and economic situation have improved since entering the twentieth century, but the incidence rate of cancer has increased by nearly 50% compared with that of the 1880s due to environmental pollution, food safety, and working pressure. Meanwhile, with the continuous improvement of medical care, various detection methods have been proposed and applied, such as laboratory examination (routine examination, serological examination, gene or gene product examination, etc.), imaging and endoscopy (X-ray examination, B-ultrasound examination, CT examination, radionuclide imaging, etc.), and cytopathological examination (puncture biopsy, forceps biopsy, section analysis, etc.). These methods can be used to analyze suspected patients from different views, which is a typical multiview data representation problem. This example shows that developing multiview clustering algorithms is necessary to better observe and mine the essence of data from its diverse descriptions and thereby obtain a clustering result that simultaneously satisfies every representation (view). Introducing multiview technology into traditional clustering analysis, so that the views learn collaboratively during clustering, is considered an effective solution. In recent years, some effective multiview clustering methods have been proposed using this strategy. Yamanishi et al. [16] proposed the collaborative clustering algorithm CoEM, which is based on the EM algorithm, addresses multiview problems from the perspective of probability, and was tested on some text-like samples.
Inspired by the FCM algorithm, Pedrycz [17] controlled the fuzzy partitions of the various views by constructing a partition-level collaborative control function and thereby obtained the CoFC algorithm, which has shown certain advantages on various datasets. More early studies on multiview clustering can be found in [18, 19].
As mentioned above, much research has begun to focus on the construction of multiview clustering algorithms. Summarizing the current research on multiview clustering, we find that it mainly has the following characteristics: (1) early multiview clustering algorithms usually preprocess the data itself, and the most direct method is to merge multiview data into single-view data through feature fusion and then cluster the fused data; (2) most of the multiview clustering algorithms proposed in recent years use a collaborative learning strategy, which can enhance the performance of each view during clustering; (3) most multiview clustering algorithms with collaborative learning ability give every view the same weight. In particular, we take the classic CoFKM algorithm [20] as an example, which is one of the representative multiview clustering algorithms of recent years. The algorithm designs a very effective multiview collaborative learning strategy, in which the data of different views use their memberships to learn collaboratively during clustering. However, the algorithm also has a serious disadvantage: it treats all views equally and does not assign different weights to different views. In addition, the collaborative learning strategy proposed by the algorithm lacks the necessary physical meaning, so it cannot explain why the strategy contributes to the final clustering performance.
In response to the abovementioned challenges, in this study, a new view space division criterion is first constructed based on the Havrda-Charvat entropy. It is used to control the space division results across different views so that they tend to be as consistent as possible, which yields a more stable and more comprehensive global space division result. Furthermore, we introduce a fuzzy index and fuzzy weights to adaptively weight each view and adjust the weights during the clustering process so that the view with the clearest space division receives a larger weight. Finally, a new collaborative multiview fuzzy c-means clustering algorithm using fuzzy weighting, called CoMVFCM, is proposed by combining the Havrda-Charvat entropy and the fuzzy index. The contributions of this study are summarized as follows:
(1) We construct a new view space division criterion using the Havrda-Charvat entropy. The criterion can be used to control the space division results across different views.
(2) We construct a view weighting mechanism using the fuzzy index. The new mechanism can be used to recognize the importance degree of each view.
Overall, our proposed CoMVFCM algorithm not only has good space division ability but also has the ability to adaptively recognize the best view.
2. Related Work
To address multiview clustering tasks, Cleuziou et al. [20] proposed the CoFKM method based on classical FCM. In CoFKM, multiview clustering is achieved through a constraint on the fuzzy membership degrees that aims to keep the partition results of the views as consistent as possible. The CoFKM objective function is defined by Equations (1) and (2), together with its constraints.
Substituting Equation (2) into Equation (1) simplifies Equation (1) to Equation (3), in which a collaboration parameter controls the contribution of the term defined in Equation (2) and an averaged fuzzy membership degree appears.
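For orientation, the Co-FKM objective of [20] is often written in the following form. The symbols are an assumed notation chosen for this sketch and may differ from the notation of the equations referred to above: $\mu_{ij,k}$ is the membership of sample $j$ in cluster $i$ under view $k$, $d_{ij,k}$ is the corresponding distance to the cluster center, $m$ is the fuzzifier, and $\eta$ is the collaboration coefficient.

```latex
\begin{align*}
J &= \sum_{k=1}^{K}\sum_{i=1}^{C}\sum_{j=1}^{N} \mu_{ij,k}^{m}\, d_{ij,k}^{2} + \eta\,\Delta, \\
\Delta &= \frac{1}{K-1}\sum_{k=1}^{K}\sum_{\substack{k'=1\\ k'\neq k}}^{K}\sum_{i=1}^{C}\sum_{j=1}^{N}
          \left(\mu_{ij,k'}^{m} - \mu_{ij,k}^{m}\right) d_{ij,k}^{2}, \\
\text{s.t.}\quad & \mu_{ij,k}\in[0,1], \qquad \sum_{i=1}^{C}\mu_{ij,k}=1 \quad \forall j,k. \\[4pt]
\intertext{Collecting the terms that multiply each $d_{ij,k}^{2}$ gives the simplified form}
J &= \sum_{k=1}^{K}\sum_{i=1}^{C}\sum_{j=1}^{N} \tilde{\mu}_{ij,k}\, d_{ij,k}^{2}, \qquad
\tilde{\mu}_{ij,k} = (1-\eta)\,\mu_{ij,k}^{m} + \frac{\eta}{K-1}\sum_{\substack{k'=1\\ k'\neq k}}^{K}\mu_{ij,k'}^{m}.
\end{align*}
```

In this form, $\eta = 0$ recovers $K$ independent FCM problems, while larger $\eta$ pulls the partition of each view toward the average partition of the other views.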
The objective function of CoFKM can be optimized by introducing Lagrange multipliers, which yields the update rules for the fuzzy membership degrees and the cluster centers given in Equation (4).
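A minimal sketch of one such alternating update is shown below. It assumes the Co-FKM formulation commonly attributed to [20], with squared Euclidean distances; the function name `cofkm_step`, the array layout, and the default values of `m` and `eta` are illustrative assumptions rather than settings taken from this paper.

```python
import numpy as np

def cofkm_step(views, U, m=2.0, eta=0.3, eps=1e-12):
    """One alternating Co-FKM-style update (illustrative sketch).

    views : list of K >= 2 arrays; view k has shape (N, d_k)
    U     : memberships, shape (K, C, N); for each view and sample the
            C memberships sum to 1
    Returns (centers, U_new), where centers[k] has shape (C, d_k).
    """
    K = len(views)
    Um = U ** m
    # Blend each view's memberships with the mean of the other views.
    others = (Um.sum(axis=0, keepdims=True) - Um) / (K - 1)
    U_tilde = (1.0 - eta) * Um + eta * others            # (K, C, N)

    centers = []
    D2 = np.empty_like(U)
    for k, X in enumerate(views):
        W = U_tilde[k]                                     # (C, N)
        V = (W @ X) / (W.sum(axis=1, keepdims=True) + eps)  # weighted means
        centers.append(V)
        # Squared Euclidean distance of every sample to every center.
        D2[k] = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)

    # Coefficient that multiplies mu_{ij,k}^m in the objective.
    other_d2 = (D2.sum(axis=0, keepdims=True) - D2) / (K - 1)
    A = (1.0 - eta) * D2 + eta * other_d2 + eps
    inv = A ** (-1.0 / (m - 1.0))
    U_new = inv / inv.sum(axis=1, keepdims=True)           # normalise over clusters
    return centers, U_new
```

Calling `cofkm_step` repeatedly until the memberships stop changing gives the usual alternating optimization loop. In this formulation `eta` should stay at or below (K-1)/K so that the blended memberships remain nonnegative.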
To obtain a fuzzy partition that takes all views into account, the fuzzy memberships of the individual views are combined by the geometric mean method [20], as given in Equation (5).
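The geometric-mean combination can be sketched as follows. The function name `geometric_mean_consensus` and the array layout are hypothetical, and the final renormalization is an added convenience that the method in [20] may not include.

```python
import numpy as np

def geometric_mean_consensus(U, eps=1e-12):
    """Combine per-view memberships U of shape (K, C, N) into one partition.

    Returns the consensus memberships (C, N) and hard labels (N,).
    """
    # Geometric mean over views, computed in log space for stability.
    G = np.exp(np.log(U + eps).mean(axis=0))
    # Renormalise so that each sample's memberships sum to 1 (assumption).
    G /= G.sum(axis=0, keepdims=True)
    return G, G.argmax(axis=0)
```

Because the geometric mean is small whenever any single view assigns a small membership, a cluster must be supported by every view to receive a high consensus membership.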
From the CoFKM algorithm, we can draw a general framework to represent multiview clustering, which is illustrated in Figure 1. The CoFKM algorithm incorporates the spatial division and approximation criteria across different views in the clustering and realizes collaborative learning across different views; it has more effective multiview clustering performance compared with traditional singleview integrated clustering technology. However, as we stated before, it still has challenges to be further solved.
3. Collaborative Multiview Fuzzy Clustering (CoMVFCM) Using Entropy Technology
In view of the two shortcomings of current multiview clustering methods, the following two new techniques based on entropy theory are introduced:
(1) We use the Havrda-Charvat entropy to construct a new approximation criterion for the space divisions of the views and to find the components that the views have most in common. This improves clustering performance and also gives the approximation criterion a new physical meaning from the perspective of entropy.
(2) We propose an entropy-weighted multiview clustering technique. By weighting each view and effectively controlling the weights, we can find the best view during the iterative optimization and obtain the best fuzzy partition at the same time.
Figure 2 illustrates the new framework of multiview clustering.
3.1. Approximation Criterion of Space Division from Different Views Based on the Havrda-Charvat Entropy
In this study, the Havrda-Charvat entropy of the fuzzy membership degrees of a view is defined by Equation (6).
It is obvious that if the fuzzy membership degree matrix is regarded as a probability matrix, this entropy is equal to 0 when the corresponding constraint holds. Intuitively, the uncertainty about which partition each sample of this view belongs to then reaches its minimum value. That is to say, when the objective function reaches its minimum value, the Havrda-Charvat entropy of the membership degrees also reaches its minimum value.
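The zero-entropy behaviour described above can be checked numerically. The sketch below uses the standard Havrda-Charvat (structural alpha) entropy with its original normalization; the order `alpha`, the normalization, and the function names are assumptions, and the definition used in Equation (6) may differ in these details.

```python
import numpy as np

def havrda_charvat_entropy(p, alpha=2.0):
    """Havrda-Charvat entropy of order alpha (alpha > 0, alpha != 1).

    H_alpha(p) = (sum_i p_i**alpha - 1) / (2**(1 - alpha) - 1)
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    p = np.asarray(p, dtype=float)
    return (np.sum(p ** alpha) - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)

def membership_entropy(U, alpha=2.0):
    """Sum of per-sample entropies for a membership matrix U of shape (C, N)."""
    U = np.asarray(U, dtype=float)
    per_sample = (np.sum(U ** alpha, axis=0) - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)
    return per_sample.sum()
```

A crisp membership column such as `[1, 0, 0]` gives an entropy of 0, whereas a uniform column gives the largest value, so minimizing this quantity pushes the partition of a view toward low uncertainty.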
Although Equation (6) ensures that the uncertainty of the partition is minimized, it is limited to a single view. To extend it to multiple views, in this study, we expand Equation (6) into Equation (7) by following the strategy used in [20], from which Equation (8) is obtained.
We observe from Equations (7) and (8) that a single parameter effectively regulates the relative weight of the membership partition of the current view and those of the other views. We can therefore obtain a weighted average of the membership degrees and make the membership partitions of the views as consistent as possible, so as to obtain a space division result that reflects all views.
3.2. Multiview Adaptive Weighting Based on Fuzzy Index
In this study, we develop an automatic view weighting strategy that uses a fuzzy index to recognize the best view. Suppose each view is assigned a weight under constraints that allow the weights to be regarded as a probability distribution; the weighting term is then defined by Equation (9).
The fuzzy index is introduced in this way so that the objective function approaches the optimal entropy as closely as possible, which is also the principle of classical fuzzy c-means clustering [4].
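To illustrate how a fuzzy index can produce adaptive view weights, the sketch below assumes a weighted objective of the form sum_k w_k**r * J_k with the weights summing to 1 and r > 1, where J_k is the clustering cost of view k. This form, the function name `fuzzy_view_weights`, and the symbol `r` are assumptions made for illustration and are not taken from Equation (9). Under this assumption, the Lagrange multiplier method gives weights proportional to J_k**(-1/(r-1)), in the same way that FCM memberships are derived.

```python
import numpy as np

def fuzzy_view_weights(view_costs, r=2.0, eps=1e-12):
    """Closed-form minimiser of sum_k w_k**r * J_k subject to sum_k w_k = 1.

    view_costs : per-view clustering costs J_k (nonnegative)
    r          : fuzzy index for the view weights, r > 1
    """
    if r <= 1:
        raise ValueError("r must be greater than 1")
    J = np.asarray(view_costs, dtype=float) + eps   # guard against zero cost
    inv = J ** (-1.0 / (r - 1.0))
    return inv / inv.sum()
```

Views with a lower cost, that is, a clearer space division, receive larger weights, and a larger `r` makes the weights more uniform.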
3.3. CoMVFCM
According to the above definitions, we propose our new multiview clustering method. The objective function of CoMVFCM, together with its constraints, is given by Equation (10).
The objective function contains two main parts. The first part is derived from the Havrda-Charvat entropy and is used for collaborative clustering. Its essence is to find as many similar components among the different views as possible and thereby make the space division results of the different views tend to be the same. The second part is derived from the fuzzy index. It is used to adaptively calculate the weight of each view, so that when the algorithm reaches its optimum, the best view partition can be identified from the weight matrix of the views. One parameter of the objective function is set to a fixed value, and the other can be determined by grid optimization [13, 21].
To obtain a final space division with global characteristics, the global integration strategy of [20] is not used in this study. Instead, we define a new integration strategy, given by Equation (11), to obtain the final space division.
3.3.1. Optimization
The proposed multiview objective function can be optimized by introducing Lagrange multipliers. In this section, we give three theorems that provide the updating rules for the cluster centers, the fuzzy membership degrees, and the view weights.
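Theorem 1. When the fuzzy membership degree matrix and the view weight matrix are fixed, the cluster centers are given by Equation (12).
Proof. Setting the partial derivative of the objective function with respect to the cluster center to 0 yields the update rule directly. Therefore, Theorem 1 holds.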
Theorem 2. When the cluster centers and the view weight matrix are fixed, the fuzzy membership degree matrix is given by Equation (13).
Proof. By introducing Lagrange multipliers and considering the constraint on the fuzzy membership degrees, we obtain the Lagrangian function in Equation (14). Setting the partial derivative of this function with respect to the fuzzy membership degree to 0 gives Equation (15). Similarly, setting its partial derivative with respect to the Lagrange multiplier to 0 gives Equation (16). Combining Equations (15) and (16) to eliminate the Lagrange multiplier gives Equation (17). Therefore, Theorem 2 holds.
Theorem 3. When the center matrix and the fuzzy membership degree matrix are fixed, the view weight matrix is given by Equation (18).
Proof. As in the proof of Theorem 2, introducing Lagrange multipliers and considering the constraint on the view weights yields Theorem 3.

Remark 4. In this section, a novel multiview clustering method called CoMVFCM is proposed. The proposed method can adaptively find the most important view, and it can also obtain the best space division by using Equation (11). However, CoMVFCM has three predefined parameters, which must be tuned by grid optimization, a time-consuming procedure. In the near future, we will consider how to reduce the number of these predefined parameters.
4. Experimental Studies
4.1. Settings
In this study, we use several UCI datasets to evaluate the proposed multiview clustering method. For a fair comparison, CoFKM [20], LSSMTC [22], CombKM [22], and Coclustering [23] are used as benchmark methods.
We use two common criteria, NMI and RI, to evaluate all clustering methods:
(1) Normalized Mutual Information (NMI) [24, 25], which is computed from the number of samples in each cluster, the matching degree between each pair of clusters, and the size of the dataset.
(2) Rand Index (RI) [24, 25], which is computed from the number of pairs of points that have the same class label and belong to the same cluster and the number of pairs of points that have different class labels and belong to different clusters.
Both indexes take values in [0, 1]; the closer the value is to 1, the better the clustering performance. Experimental environment: the hardware platform was an Intel Core i7 CPU with 16 GB of memory, and the programming environment was MATLAB 2010.
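The two criteria can be computed from a contingency table of class labels against cluster labels, as in the sketch below. The NMI variant shown normalizes the mutual information by the geometric mean of the two entropies, which is one common convention and is assumed here; the function names are illustrative.

```python
import numpy as np

def contingency(labels_true, labels_pred):
    """Count matrix M[i, j] = number of samples in class i and cluster j."""
    _, ci = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, ki = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    M = np.zeros((ci.max() + 1, ki.max() + 1), dtype=np.int64)
    np.add.at(M, (ci, ki), 1)
    return M

def nmi(labels_true, labels_pred):
    """Normalized mutual information with sqrt(H_class * H_cluster) normalization."""
    M = contingency(labels_true, labels_pred).astype(float)
    N = M.sum()
    a, b = M.sum(axis=1), M.sum(axis=0)          # class sizes, cluster sizes
    nz = M > 0
    mi = np.sum(M[nz] / N * np.log(N * M[nz] / np.outer(a, b)[nz]))
    h_a = -np.sum(a / N * np.log(a / N))
    h_b = -np.sum(b / N * np.log(b / N))
    denom = np.sqrt(h_a * h_b)
    if denom == 0:                               # at least one trivial partition
        return 1.0 if h_a == h_b else 0.0
    return mi / denom

def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two partitions agree."""
    M = contingency(labels_true, labels_pred)
    N = M.sum()
    pairs = lambda x: (x * (x - 1) // 2).sum()   # number of pairs within groups
    total = N * (N - 1) // 2
    f11 = pairs(M)                               # same class, same cluster
    f00 = total - pairs(M.sum(axis=1)) - pairs(M.sum(axis=0)) + f11
    return (f11 + f00) / total
```

Both functions are invariant to a relabeling of the clusters, so a clustering that matches the classes up to a permutation of the labels scores 1 on both criteria.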
4.2. Experimental Results
In this section, some real-world datasets from the well-known UCI database are used to test our algorithm: (1) the Iris dataset, (2) the Multiple Features (MF) dataset, (3) the Image Segmentation (IS) dataset, and (4) the Water Treatment Plant (WTP) dataset. These datasets are used to verify and analyze the performance of the proposed CoMVFCM algorithm on real multiview clustering tasks. To give a more intuitive impression of the views contained in the four datasets, their composition is presented in Table 1. The experimental results of the algorithm comparison on these four real datasets are shown in Table 2.
For the Iris dataset, the proposed CoMVFCM has the best clustering performance among all the compared algorithms, which shows that the two proposed multiview collaborative clustering strategies have significant advantages in multiview clustering tasks. For the other three datasets, since the LSSMTC algorithm requires the dimensions of all clustering tasks to be identical, it cannot be applied to datasets whose views have different dimensions, such as MF, IS, and WTP. On the MF dataset, the results of the remaining algorithms show that the multiview-based CoFKM and the proposed algorithm have a clear clustering advantage. However, no view of the MF data is clearly more separable than the others, so the importance of the views is roughly balanced; as a result, the mean NMI and RI of the proposed algorithm are similar to those of CoFKM. The variance analysis nevertheless shows that the proposed method is more stable, so it still has a certain advantage. For the IS dataset, the benefit of the proposed method is more evident: its clustering indexes are significantly better than those of the other algorithms, which further confirms the effectiveness of CoMVFCM. Finally, the experimental results on the WTP dataset lead to the same conclusion as the preceding two datasets. In conclusion, the experiments on real multiview datasets show that, for clustering tasks with multiple views, multiview clustering algorithms are generally superior to single-view clustering algorithms, and that CoMVFCM, which can select among views, performs much better than the earlier multiview clustering algorithms.
5. Conclusion
Building on multiview clustering technology and the classical FCM algorithm, this paper uses the Havrda-Charvat entropy to construct an approximation criterion for the space divisions of different views. The proposed CoMVFCM method not only better finds the similarities among the views but also, from the viewpoint of entropy, gives the approximation of the space divisions of different views a reasonable physical interpretation, and thus provides more meaningful guidance for the global space partition. Another contribution of this paper is obtaining the importance degree of each view. Drawing on fuzzy theory, we propose an adaptive view weighting strategy based on the fuzzy index and incorporate it into the multiview fuzzy clustering objective function, for which the optimal solution is then derived. The importance of each view can be evaluated from the relationship among the view weights, and these importance degrees provide a new, weighted way of integrating the view space divisions into a global one. Experimental results on four real UCI datasets show that CoMVFCM has better adaptability to different samples and superior performance compared with previous and related algorithms. However, since the algorithm is based on the framework of the classical fuzzy c-means (FCM) algorithm, its effectiveness may be challenged to some extent on higher-dimensional data, which points out the direction of our future research on multiview clustering for high-dimensional data.
Data Availability
The dataset analyzed for this study can be found in this link [http://archive.ics.uci.edu/ml/index.php].
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 61772241 and in part by the 2018 Six Talent Peaks Project in Jiangsu Province under Grant XYDXX127.