A Belief Two-Level Weighted Clustering Method for Incomplete Pattern Based on Multiview Fusion
Incomplete pattern clustering is a challenging task because the unknown attributes of missing data introduce uncertain information that affects the accuracy of the results. In addition, clustering methods based on a single view ignore the complementary information available from multiple views. Therefore, a new belief two-level weighted clustering method based on multiview fusion (BTC-MV) is proposed to deal with incomplete patterns. First, the BTC-MV method estimates the missing data with an attribute-level weighted imputation method that uses a k-nearest neighbor (KNN) strategy based on multiple views; the unknown attributes are replaced by the average of the KNN. Then, a multiview clustering method is applied to the completed data set, in which the view weights represent the reliability of the evidence from different source spaces. The membership values from multiple views, which indicate the probability of the pattern belonging to different categories, reduce the risk of misclustering. Finally, a view-level weighted fusion strategy based on belief function theory is proposed to integrate the membership values from different source spaces, which improves the accuracy of the clustering task. To validate the performance of the BTC-MV method, extensive experiments are conducted to compare it with classical methods, such as MI-KM, MI-KMVC, KNNI-FCM, and KNNI-MFCM. Results on six UCI data sets show that the error rate of the BTC-MV method is lower than that of the other methods. Therefore, it can be concluded that the BTC-MV method has superior performance in dealing with incomplete patterns.
In the information era, data have great research value, but collecting complete data is often difficult. Data can go missing for many subjective and objective reasons, such as equipment malfunction, operator error, false memory, and partial refusal by respondents. Missing data, which produce incomplete patterns, are a common phenomenon in practical applications. A survey shows that 45% of the data sets in the UCI machine learning repository, which covers many fields, are incomplete. Deletion and imputation methods are commonly used to deal with missing data. Deleting incomplete patterns is simple and is acceptable when incomplete patterns account for less than 5% of the whole data set. The imputation method, which replaces missing values with estimations, is a popular way of handling incomplete patterns. For instance, the KNN technique and its derivatives have been used in many application fields because they are easy to implement [5–7].
A number of imputation methods based on the KNN technique have been proposed [8–10]. In early methods, the missing value was estimated by the average of the k nearest neighbors of the incomplete pattern. In addition, some researchers have proposed imputation methods that integrate KNN with other techniques [11–13]. For example, an adaptive imputation method for missing values that combines KNN and a self-organizing map (SOM) within belief function theory has been proposed; this method represents the uncertainty caused by the missing data. A linear local approximation method has also been presented, which uses KNN with optimal weights obtained by local linear reconstruction to estimate the missing values. The estimates obtained by the traditional KNN based on a single view are globally optimal but may not be locally optimal. Therefore, imputation based on a single view may decrease the accuracy of clustering methods.
Clustering is an important task in pattern recognition and machine learning that divides objects into clusters based on the similarity between patterns. Hard and fuzzy clustering methods have been used in many fields because of their generality [15–17], but clustering methods based on a single view ignore the information available from multiple views. Compared with single-view clustering, multiview clustering exploits the complementary information of each view, which can improve the accuracy of the clustering result [19, 20]. Recently, multiview clustering has become a popular research topic [21, 22]. A collaborative multiview clustering method has been proposed to overcome disagreement between views as well as differences in their properties and scales. Another method introduces weights that represent the importance of views and features, together with an objective function designed to express both the heterogeneity of different views and the consistency across views during iterations. Jiang et al. proposed a multiview FCM clustering algorithm with view and feature weights based on collaborative learning; this method can exclude irrelevant components from the clustering procedure, which increases the precision of the clustering results. In addition, multiview spectral clustering methods have been studied recently. Spectral clustering algorithms consist of two steps: learning similarity graphs from the instances and obtaining the clustering result by spectral clustering on those graphs. Tang et al. proposed a unified one-step multiview spectral clustering method (UOMvSC), which combines the multiview embedding matrices and graphs into a unified graph to obtain the clustering results. A joint affinity graph method for multiview clustering has also been proposed, in which a diversity regularization term is designed to learn different weights for diverse views. Zheng et al. proposed a novel multiview clustering method that integrates within-view partial graph learning, cross-view partial graph fusion, and cluster structure recovery. However, most clustering methods for incomplete patterns are based on a single view, and their results are not accurate enough. In addition, to our knowledge, there is little research on multiview imputation, although researchers have proposed numerous methods to improve the accuracy of estimation.
In this paper, we develop a belief two-level weighted clustering method for incomplete patterns based on multiview fusion (BTC-MV). The main contributions of this work are summarized as follows:
(1) Attribute-level weighted imputation strategy for incomplete patterns: the variance of each attribute, called the attribute weight, is used to reflect the attribute's importance, and the weighted attributes are used to search for the KNN of an incomplete pattern based on multiple views.
(2) View-level weighted fusion strategy based on belief function theory: the view-level weights are obtained by optimizing a new objective function for multiview clustering. They serve as the discounting factors in belief fusion and represent the importance of the evidence from different view spaces.
(3) To the best of our knowledge, this is the first belief two-level weighted clustering method for incomplete patterns based on multiview fusion. Compared with other state-of-the-art methods, BTC-MV performs better in multiview clustering of incomplete patterns.
The rest of this paper is organized as follows: In Section 2, we introduce related work on missing data classification methods and the basics of belief function theory. The details of the belief clustering method for incomplete patterns based on multiview fusion are shown in Section 3. In Section 4, we compare the BTC-MV method with other state-of-the-art methods on six UCI data sets. Finally, the conclusion is drawn in Section 5.
2. Related Work
2.1. Classification of the Missing Data
According to the missing mechanism, incomplete patterns can be divided into missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Wang et al. proposed a query algorithm based on the Spark framework to handle queries over incomplete data sets. Clustering incomplete patterns involves two steps: imputing the missing data and clustering the completed data set.
In addition to the abovementioned KNN-based imputation methods, the mean imputation (MI) and fuzzy c-means imputation (FCMI) methods have made significant research progress [31–33]. In MI, the missing data are estimated by the mean value or mode of the corresponding attribute; it is suitable for data sets in which each category has a similar attribute distribution. However, every incomplete pattern receives the same estimate for a given attribute. In FCMI, the estimations are calculated from the cluster centers and the distances between the centers and the patterns. However, the performance of this imputation strategy depends on the initial conditions.
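The limitation of MI noted above (every incomplete pattern gets the same estimate for a given attribute) is easy to see in a minimal NumPy sketch of mean imputation for numeric attributes. `mean_impute` is a hypothetical helper, and mode imputation for categorical attributes is not shown.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column (attribute)."""
    X = np.array(X, dtype=float)           # float copy, so the caller's data is untouched
    col_means = np.nanmean(X, axis=0)      # per-attribute mean, ignoring NaNs
    rows, cols = np.nonzero(np.isnan(X))   # positions of the missing entries
    X[rows, cols] = col_means[cols]        # same estimate for every pattern in a column
    return X

X = [[1.0, np.nan],
     [3.0, 4.0],
     [np.nan, 8.0],
     [5.0, np.nan]]
print(mean_impute(X))
```

Both patterns missing the second attribute receive 6.0, the mean of its observed values 4 and 8, regardless of their first attribute. A column with no observed values would yield NaN (with a NumPy warning).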
A clustering algorithm partitions a data set into several clusters and has been widely used in various fields. In a cluster-based information retrieval approach, the k-means clustering method and frequent closed itemset mining were combined to extract clusters of documents and find frequent terms. Clustering and pattern mining algorithms have also been integrated to search for the most relevant object in a clustered set of objects. Space-time series clustering methods, such as hierarchical, partitioning-based, and overlapping clustering, have been applied to big urban traffic data sets. Beyond the single-view clustering methods described above, many researchers have extended single-view methods to multiview clustering. A multiview FCM clustering method based on collaborative learning was proposed; it includes a single-view partition process and a collaborative step that shares information between views. Wang and Chen proposed a multiview fuzzy clustering method with minmax optimization. Such multiview clustering methods can integrate the information from different views.
In recent years, with the development of neural networks, many deep learning models have been built to classify incomplete data sets [40–42]. For example, a multivariate time series generative adversarial network has been proposed for multivariate time series imputation, which improves imputation performance. However, the performance of deep learning classification models depends on large data sets; when the data set is small, the model may be unstable.
2.2. Basics of the Belief Function Theory
Belief function theory, also called evidence theory or Dempster–Shafer theory (DST), is a classic theoretical framework for reasoning under uncertainty [43, 44]. It generates belief masses by fusing useful evidence from independent sources and is used in many fields [45, 46]. In this theory, the frame of discernment consists of a finite set of mutually exclusive and exhaustive elements of the problem under study, and it is represented by $\Theta = \{\omega_1, \omega_2, \ldots, \omega_C\}$. The power set of the frame of discernment, denoted $2^\Theta$, expresses the uncertainty. A basic belief assignment (BBA) is a function $m$ from $2^\Theta$ to $[0, 1]$ that satisfies the following conditions:

$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1, \tag{1}$$

where $m(A)$ expresses the degree to which the evidence supports proposition $A$ without supporting any proper subset of $A$. All elements $A$ that satisfy $A \subseteq \Theta$ and $m(A) > 0$ are called focal elements of $m$.
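To make the two BBA conditions and the notion of focal elements concrete, the following Python sketch represents a BBA as a dict from frozenset propositions to masses. The helper names and the three-element frame are illustrative only.

```python
from itertools import combinations

def power_set(frame):
    """All subsets of the frame of discernment (2**len(frame) of them)."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_valid_bba(m, frame, tol=1e-9):
    """Check the BBA conditions: m(empty set) = 0 and the masses sum to 1."""
    frame = frozenset(frame)
    if any(not a <= frame for a in m):     # every proposition must be a subset of the frame
        return False
    if m.get(frozenset(), 0.0) != 0.0:     # no mass on the empty set
        return False
    return all(v >= 0.0 for v in m.values()) and abs(sum(m.values()) - 1.0) <= tol

def focal_elements(m):
    """Propositions that carry strictly positive mass."""
    return [a for a, v in m.items() if v > 0.0]

frame = {"w1", "w2", "w3"}
m = {frozenset({"w1"}): 0.5, frozenset({"w1", "w2"}): 0.3, frozenset(frame): 0.2}
print(len(power_set(frame)), is_valid_bba(m, frame), focal_elements(m))
```

The frame has 2^3 = 8 subsets; the example BBA is valid, and all three propositions with positive mass are focal elements. Mass on the whole frame expresses ignorance rather than support for any particular class.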
The outputs of classifiers or fuzzy clustering methods indicate the extent to which the corresponding evidence supports different classes. The DS fusion rule [47, 48] is used in many fields because, being commutative and associative, it can integrate evidence from many independent sources. For two BBAs $m_1$ and $m_2$ from independent sources defined on the same frame of discernment $\Theta$, the fused BBA is

$$m(A) = \frac{1}{1-K}\sum_{B \cap C = A} m_1(B)\, m_2(C), \quad A \neq \emptyset, \qquad m(\emptyset) = 0, \qquad K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C), \tag{2}$$

where $K$ indicates the conflicting belief mass between the evidence from the two sources. However, the result of the DS fusion rule is unreasonable when the conflict between the sources is very high. Therefore, a series of alternatives have been proposed to solve this problem, such as the fusion rule proposed by Dubois and Prade and the PCR6 rule based on proportional conflict redistribution.
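As an illustration of Dempster's rule in equation (2), here is a small sketch, assuming BBAs are stored as dicts mapping frozenset propositions to masses (a representation chosen for this example, not taken from the paper). It returns the normalized fused BBA together with the conflict mass.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two BBAs ({frozenset: mass}) with Dempster's rule.

    Returns (fused BBA, conflict mass). Raises if the sources are in total conflict."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c                              # intersection of the two propositions
        if a:
            combined[a] = combined.get(a, 0.0) + mb * mc
        else:
            conflict += mb * mc                # product mass that falls on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}, conflict

# Zadeh's classic example of highly conflicting evidence.
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
fused, conflict = dempster_combine({A: 0.99, B: 0.01}, {C: 0.99, B: 0.01})
print(conflict, fused)
```

In this example both sources give B only 0.01, yet the conflict is about 0.9999 and the fused BBA puts all of its mass on B. This counterintuitive behavior under high conflict is what motivates alternatives such as the Dubois–Prade rule and PCR6.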
3. Clustering Method for Incomplete Pattern
We propose the BTC-MV method to decrease the error rate of clustering incomplete patterns, in which data are randomly missing or unobserved. The flowchart of the BTC-MV method is presented in Figure 1. First, an attribute-level weighted imputation strategy is proposed to estimate the missing or unobserved values in the data set $X$. In this step, the variance of each attribute is calculated and used as the weight of that attribute in the KNN distance, and the missing values are estimated from the KNN. Second, a fuzzy C-means clustering method based on multiple views is proposed to cluster the completed data set, and the resulting membership values and view weights are passed to a view-level weighted fusion strategy. Third, the BTC-MV method uses the view weights as discounting factors, and a belief fusion strategy fuses the membership values of each pattern from different views. Finally, the clustering results are obtained. The details of the BTC-MV method are as follows.
3.1. Attribute-Level Weighted Imputation Strategy
Here, all attributes of the data set $X = \{x_1, \ldots, x_N\}$ are divided into $V$ views, and $X^v$ denotes the feature matrix of the patterns in view space $v$ ($v = 1, \ldots, V$), whose $i$th row $x_i^v$ is pattern $x_i$ restricted to that view. Because clustering incomplete patterns is our research topic, we assume that some attributes of a pattern $x_i$ are unobserved.
In the BTC-MV method, an attribute-level weighted imputation strategy based on KNN is proposed to estimate the missing values. First, we calculate the variance of each attribute in the data set $X$, as shown in equation (3), which expresses the importance of the different attributes in $X$. A larger variance indicates a larger difference between the instances in that attribute space, so the estimate calculated from the k nearest instances is more accurate. Then, a weighted KNN method is used to search for the top-K nearest neighbors of $x_i$ in view space $v$; the distance between a complete pattern and the incomplete pattern is given in equation (4), in which the normalized variance of each attribute serves as its weight. Based on this weighted distance, we obtain the K neighbors closest to the incomplete pattern. Finally, the missing value is estimated by equation (5). The imputation strategy, which uses the variances of the attributes in $X$ to improve the precision of the estimation, is summarized in Algorithm 1.

$$\sigma_j^v = \frac{1}{N}\sum_{i=1}^{N}\left(x_{ij}^v - \bar{x}_j^v\right)^2, \qquad \omega_j^v = \frac{\sigma_j^v}{\sum_{l=1}^{D_v}\sigma_l^v}, \tag{3}$$

$$d\left(x_i^v, y_l^v\right) = \sqrt{\sum_{j \in O_i} \omega_j^v \left(x_{ij}^v - y_{lj}^v\right)^2}, \tag{4}$$

$$\hat{x}_{ij}^v = \frac{1}{K}\sum_{y_l \in \mathcal{N}_K(x_i)} y_{lj}^v, \tag{5}$$

where $x_{ij}^v$ is the $j$th attribute of pattern $x_i$ in view space $v$ and $\bar{x}_j^v$ is the mean of that attribute, $N$ is the number of patterns in view space $v$, $D_v$ is the number of attributes in view space $v$, and $\sigma_j^v$ is the variance of attribute $j$, which is normalized as the weight $\omega_j^v$ of attribute $j$ in view space $v$. $x_i$ is an incomplete pattern with unobserved attributes, and $O_i$ is the set of its observed attributes. $y_l^v$ is a complete sample in view space $v$, $d(x_i^v, y_l^v)$ denotes the weighted distance between $x_i^v$ and $y_l^v$, $\hat{x}_{ij}^v$ is the estimated value of the incomplete pattern in attribute space $j$, and $\mathcal{N}_K(x_i)$ is the set of the top-K nearest neighbors of $x_i$.
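The attribute-level weighted imputation of equations (3)–(5) can be sketched for a single view as follows. This is a minimal NumPy sketch under assumptions the text does not pin down: variances are population variances over the observed values, the weighted distance is a Euclidean distance restricted to the attributes observed in the incomplete pattern, and only fully observed patterns serve as neighbors. The function names are hypothetical.

```python
import numpy as np

def variance_weights(X):
    """Per-attribute variances over the observed values, normalized to sum to 1."""
    var = np.nanvar(X, axis=0)          # population variance (ddof=0), NaNs ignored
    return var / var.sum()              # assumes at least one attribute varies

def weighted_knn_impute(X, k=3):
    """Fill NaNs in one view with the mean of the k nearest complete patterns.

    Distances use only the attributes observed in the incomplete pattern,
    each weighted by its normalized variance."""
    X = np.array(X, dtype=float)
    w = variance_weights(X)
    missing = np.isnan(X)
    complete = X[~missing.any(axis=1)]                  # candidate neighbors
    out = X.copy()
    for i in np.flatnonzero(missing.any(axis=1)):
        obs = ~missing[i]                               # observed attributes of pattern i
        diff = complete[:, obs] - X[i, obs]
        dist = np.sqrt((w[obs] * diff ** 2).sum(axis=1))
        nearest = complete[np.argsort(dist, kind="stable")[:k]]
        out[i, missing[i]] = nearest[:, missing[i]].mean(axis=0)   # average of the KNN
    return out
```

For example, `weighted_knn_impute(X_v, k=5)` would complete the feature matrix of one view; how neighbors found in different views are combined is not specified here and is left out of the sketch.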
3.2. Clustering Method Based on Multiple Views
After the attribute-level weighted imputation based on multiple views, the data set $X$ is regarded as a complete data set with estimations. The fuzzy C-means clustering method based on multiple views, summarized in Algorithm 2, is then conducted on this completed data set; in each view, we calculate the cluster centers, the membership values, and the view weights. The objective function of the multiview clustering method is shown in equation (6), subject to the constraints in equation (7), where $m$ is the weight exponent that determines the fuzziness of the clustering result, $u_{ik}$ is the membership value of the $i$th pattern to the $k$th cluster center $c_k^v$, $w_v$ is the weight of the $v$th view, and $\lVert x_i^v - c_k^v \rVert$ is the Euclidean distance between $x_i^v$ and $c_k^v$ in view space $v$.
The optimal parameters of the multiview clustering method are obtained by minimizing the objective function through iterative optimization, in which the partial derivatives of the objective function are set to zero. Following the Lagrange multiplier method, the Lagrangian function is formed by adding the constraints in equation (7), each multiplied by its own Lagrange multiplier, to the objective function.
The update rules for the cluster centers $c_k^v$, the view weights $w_v$, and the membership values $u_{ik}$ that minimize the objective function are obtained by setting the partial derivatives of the Lagrangian function with respect to each of them to zero.
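As an illustration only, the sketch below runs alternating optimization on one common view-weighted FCM objective, J = Σ_v w_v^α Σ_i Σ_k u_ik^m ‖x_i^v − c_k^v‖², subject to Σ_k u_ik = 1 and Σ_v w_v = 1. The view-weight exponent α > 1 and the memberships shared across views are assumptions (with a linear weight, all weight would collapse onto a single view), so this is not necessarily the objective used by BTC-MV. Shared memberships keep cluster indices consistent across views, and per-view memberships are derived from the final centers as the evidence for the fusion step.

```python
import numpy as np

def _centers(views, Um):
    # Membership-weighted mean of each cluster in every view: list of (C, D_v) arrays.
    return [(Um.T @ X) / Um.sum(axis=0)[:, None] for X in views]

def _sq_dists(views, centers):
    # Squared Euclidean distances to the centers, stacked as (V, N, C);
    # a tiny constant avoids division by zero when a pattern sits on a center.
    return np.stack([((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
                     for X, c in zip(views, centers)]) + 1e-12

def multiview_fcm(views, n_clusters, m=2.0, alpha=2.0, n_iter=300, tol=1e-6, seed=0):
    """Alternating optimization of an ASSUMED view-weighted FCM objective.

    views: list of arrays of shape (N, D_v) describing the same N patterns.
    Returns shared memberships U (N, C), per-view centers, view weights w (V,),
    and per-view memberships (V, N, C)."""
    views = [np.asarray(X, dtype=float) for X in views]
    rng = np.random.default_rng(seed)
    U = rng.random((views[0].shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)             # memberships of each pattern sum to 1
    for _ in range(n_iter):
        Um = U ** m
        d2 = _sq_dists(views, _centers(views, Um))
        J = (Um[None] * d2).sum(axis=(1, 2))       # cost J_v of each view
        w = J ** (-1.0 / (alpha - 1.0))            # stationary point: w_v proportional to J_v^(-1/(alpha-1))
        w /= w.sum()
        D = np.tensordot(w ** alpha, d2, axes=1)   # view-weighted distances, shape (N, C)
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    # Per-view memberships from the final centers: the evidence fused in Section 3.3.
    centers = _centers(views, U ** m)
    inv = _sq_dists(views, centers) ** (-1.0 / (m - 1.0))
    per_view_U = inv / inv.sum(axis=2, keepdims=True)
    return U, centers, w, per_view_U
```

Calling `multiview_fcm([X1, X2], n_clusters=C)` on the completed view matrices returns view weights that are larger for views with a smaller within-cluster cost, which is the sense in which they express reliability.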
3.3. View-Level Weighted Fusion Strategy Based on the Belief Function Theory
In the multiview clustering process, different views receive different weights, which indicates that the evidence from the various sources has different reliability. Therefore, the membership values of a pattern in different views should not be weighted equally. We use discounting and DS fusion to integrate the membership values of the pattern from different views, and we call this the view-level weighted fusion strategy based on belief function theory. The classic discounting rule proposed by Shafer is applied here; the membership values from multiple views are regarded as evidence that the pattern belongs to each possible class in the frame of discernment. First, the membership values are multiplied by the view weights, which represent reliability. Then, the discounted membership values from different views are fused using belief function theory. Finally, the clustering result is obtained from the fused masses. Specifically, the membership values are treated as mass values, the view weights are regarded as discounting factors, and the discounted masses are obtained by

$$m_i^v(\{\omega_k\}) = w_v\, u_{ik}^v, \quad k = 1, \ldots, C, \qquad m_i^v(\Theta) = 1 - w_v, \tag{13}$$

where $u_{ik}^v$ is the membership value of pattern $x_i$ to the $k$th cluster in view $v$. The discounted masses express the probability that the pattern belongs to different categories in each view, and $m_i^v(\Theta)$ represents the imprecision of the clustering caused by incomplete patterns. In the BTC-MV method, the discounted masses from multiple views are fused by the DS rule in equation (2), and the clustering result of each pattern is the class with the maximum fused belief mass.
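The discounting and fusion steps can be sketched for one pattern as follows, assuming (as the description of equation (13) suggests) that view v assigns mass w_v × u_ik to each singleton cluster and the remaining 1 − w_v to the whole frame Θ. Because every BBA then has only singletons and Θ as focal elements, Dempster's rule reduces to the closed form in `combine`. The function names are hypothetical.

```python
import numpy as np

def discount(u, w):
    """Shafer discounting of one view's memberships u (summing to 1) by reliability w.

    Returns the singleton masses w * u and the mass 1 - w left on the whole frame."""
    return w * np.asarray(u, dtype=float), 1.0 - w

def combine(s1, t1, s2, t2):
    """Dempster's rule for BBAs whose focal elements are singletons plus the frame.

    s1, s2: arrays of singleton masses; t1, t2: masses on the whole frame."""
    s = s1 * s2 + s1 * t2 + t1 * s2          # {k}&{k}, {k}&frame, frame&{k}
    t = t1 * t2                              # frame & frame = frame
    conflict = s1.sum() * s2.sum() - (s1 * s2).sum()   # pairs {k}, {l} with k != l
    return s / (1.0 - conflict), t / (1.0 - conflict)

def fuse_views(memberships, weights):
    """Discount each view's memberships by its view weight, then fuse the views pairwise."""
    s, t = discount(memberships[0], weights[0])
    for u, w in zip(memberships[1:], weights[1:]):
        s, t = combine(s, t, *discount(u, w))
    return s, t

# One pattern, three clusters, two views with weights 0.6 and 0.4.
s, t = fuse_views([[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]], [0.6, 0.4])
print(s.round(4), round(t, 4), int(np.argmax(s)))
```

Here the second view slightly favors the second cluster, but the first view is more reliable, so the fused singleton masses (about 0.449, 0.206, and 0.064, with about 0.281 left on Θ) select the first cluster.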
4. Experiment Application
In this section, to test the performance of the BTC-MV method, we conduct extensive experiments on six data sets of different dimensions from the UCI repository. We divided the attributes of each data set into different groups to simulate multiple views, and some attribute values were removed at random to obtain incomplete data sets. The main characteristics of these well-known data sets, including the numbers of attributes (Na), classes (Nc), instances (Ni), and views, are shown in Table 1. The attributes of these six data sets range from 4 to 16, the views from 2 to 4, the classes from 2 to 7, and the instances from 150 to 13611, so the data sets are diverse enough for the results to be representative.
To evaluate the performance of the BTC-MV method, classic imputation and clustering methods are combined and compared with it. The typical methods for estimating missing data include MI and k-nearest neighbor (KNN) imputation. The classic clustering methods used in the comparison experiments are K-means and FCM, each of which can be applied in a single-view or a multiview setting. Therefore, there are four comparison methods: MI with single-view K-means (MI-KM), MI with multiview K-means clustering (MI-KMVC), KNN imputation with single-view FCM (KNNI-FCM), and KNN imputation with multiview FCM (KNNI-MFCM).
The error rate, denoted $R_e$, is used to evaluate the performance of the BTC-MV method. It is calculated as $R_e = N_e / N$, where $N_e$ is the number of misclustered patterns and $N$ is the total number of patterns used in the experiments. The experiments are conducted with MATLAB software.
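The text does not say how clusters are matched to classes before misclustered patterns are counted. A common choice, assumed here, is the one-to-one mapping that maximizes the number of correctly grouped patterns; the hypothetical helper below finds it by exhaustive search, which is cheap for the 2 to 7 classes of the data sets used here.

```python
from collections import Counter
from itertools import permutations

def clustering_error_rate(true_labels, cluster_labels):
    """Misclustered patterns divided by total patterns, after mapping each cluster
    to a distinct class so that the number of correctly grouped patterns is maximal."""
    pairs = Counter(zip(cluster_labels, true_labels))   # contingency counts
    clusters = sorted(set(cluster_labels))
    classes = sorted(set(true_labels))
    # Pad with None so that surplus clusters may stay unmatched.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    correct = max(sum(pairs[(c, t)] for c, t in zip(clusters, perm))
                  for perm in permutations(targets, len(clusters)))
    return 1.0 - correct / len(true_labels)

print(clustering_error_rate([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0]))
```

In the example, the best mapping sends cluster 1 to class 0, cluster 0 to class 1, and cluster 2 to class 2, leaving one misclustered pattern out of six. For many clusters, the Hungarian algorithm would replace the exhaustive search.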
4.1. Experiment 1
In the KNN-based methods (KNNI-FCM, KNNI-MFCM, and BTC-MV), the parameter K is the number of neighboring patterns used to estimate the missing data, and it is one of the main parameters of BTC-MV. In the BTC-MV method, the K patterns closest to the incomplete pattern are searched for in multiple views using the known attributes. The parameter K influences both the precision of the estimations and the performance of the clustering methods. To verify its influence, experiments were carried out with different values of K, and the comparison results are shown in Figure 2. Although the error rate of the BTC-MV method varies with K, it fluctuates only within an acceptable range when K takes values from 3 to 20. This result indicates that the BTC-MV method is robust to the choice of K, which is an advantage in practical applications.
4.2. Experiment 2
In this experiment, each data set is set to contain 10%, 30%, and 50% incomplete patterns, respectively, and 50% of the attributes of each incomplete pattern are unknown. The performance of BTC-MV and the other clustering methods on the six incomplete data sets is compared in Tables 2–4. The error rate of the BTC-MV method on the different data sets is lower than that of the other methods. This may be because the attribute-level weighted imputation strategy in BTC-MV estimates the missing values more accurately, since it brings patterns with high attribute correlation closer to the incomplete pattern. As a result, the completed data sets contain more precise estimates, which reduces the error rate of clustering. Notably, the error rates of all methods increase as the amount of missing data increases, which indicates that missing data make the information ambiguous and degrade the performance of the clustering methods.
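The incomplete data sets used in Experiments 2 and 3 can be generated along the following lines. This sketch assumes that both the incomplete patterns and their unknown attributes are chosen uniformly at random (MCAR) and that every incomplete pattern keeps at least one observed attribute; `make_incomplete` is a hypothetical helper.

```python
import numpy as np

def make_incomplete(X, pattern_ratio, attr_ratio, seed=0):
    """Hide attr_ratio of the attributes in pattern_ratio of the patterns (MCAR)."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)                   # work on a float copy
    n, d = X.shape
    n_inc = int(round(pattern_ratio * n))          # number of incomplete patterns
    # Unknown attributes per pattern; Python's round() sends halves to the even integer.
    # Keep at least one observed attribute so that a KNN distance can still be computed.
    n_miss = min(int(round(attr_ratio * d)), d - 1)
    for i in rng.choice(n, size=n_inc, replace=False):
        cols = rng.choice(d, size=n_miss, replace=False)
        X[i, cols] = np.nan
    return X
```

Experiment 2 corresponds to `pattern_ratio` of 0.1, 0.3, and 0.5 with `attr_ratio=0.5`, and Experiment 3 to `pattern_ratio=0.3` with `attr_ratio` of 0.3, 0.5, and 0.7.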
4.3. Experiment 3
In this section, we test the influence of the number of unknown attributes in the incomplete patterns. Each data set is set to contain 30% incomplete patterns, and each incomplete pattern has 30%, 50%, and 70% unknown attributes, respectively. The performance of BTC-MV and the other clustering methods on the six incomplete data sets is compared in Tables 5–7. The results indicate that increasing the number of unknown attributes generally degrades clustering performance, as missing data introduce uncertain information. However, the BTC-MV method still outperforms the other methods. This experiment further validates the effectiveness and robustness of the BTC-MV method.
5. Conclusions
In this paper, a new BTC-MV method is proposed to meet the challenges of clustering incomplete data. The BTC-MV method estimates the unknown attributes with a weighted KNN strategy based on multiple views, where the weights are the variances of the attributes, which reflect their importance. This attribute-level weighted imputation strategy improves the precision of the estimations. Then, a multiview clustering method is used in BTC-MV, in which the view weights express the reliability of the evidence from different spaces; consequently, the membership values of a pattern in different views are not weighted equally. Finally, a view-level weighted fusion strategy based on belief function theory integrates the evidence from different source spaces. Experiments on six UCI data sets comparing BTC-MV with other state-of-the-art methods demonstrate the effectiveness of the BTC-MV method in clustering incomplete patterns.
In the BTC-MV method, the attribute-level weighted imputation strategy contributes substantially to the accuracy of clustering incomplete patterns. However, it is computationally expensive, because the KNN strategy requires computing distances between each incomplete pattern and the complete patterns. In future work, we will consider ways to reduce this computational complexity, and we will also study other methods of optimizing the data set to obtain better clustering performance.
Data Availability
The data sets used in this study are from the University of California Irvine (UCI) machine learning repository.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Acknowledgments
This work was supported by the Key Research and Development Project of Shaanxi Province (2020GY-186).
References
A. Frank and A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, 2010.
E. Acuna and C. Rodriguez, “The Treatment of Missing Values and its Effect on Classifier Accuracy,” Classification Clustering & Data Mining Applications, pp. 639–647, 2004.
L. Huang, H. Y. Chao, and C. D. Wang, “Multi-view intact space clustering,” in Proceedings of the 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), pp. 500–505, IEEE, Nanjing, China, November 2018.
S. Bettoumi, C. Jlassi, and N. Arous, “Collaborative multi-view K-means clustering,” Soft Computing, vol. 23, no. 3, pp. 937–945, 2019.
H. S. Al-Ash, D. Sarwinda, and T. Siswantining, “A novel centroid initialization in missing value imputation towards mixed datasets,” Communications in Mathematical Biology and Neuroscience, 2021.
D. Li, J. Deogun, and W. Spaulding, “Towards Missing Data Imputation: A Study of Fuzzy K-Means Clustering Method,” in Rough Sets and Current Trends in Computing: Proceedings of the 4th International Conference, IEEE, Uppsala, Sweden, June 2004.
D. J. Mundfrom and A. Whitcomb, “Imputing missing values: the effect on the accuracy of classification,” MLRV, vol. 25, no. 1, pp. 13–19, 1998.
A. Belhadi, Y. Djenouri, and K. Norvag, “Space–time series clustering: Algorithms, taxonomy, and case study on urban smart cities,” Engineering Applications of Artificial Intelligence, vol. 95, Article ID 103857, 2020.
R. Lall and T. Robinson, “The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning,” Political Analysis, vol. 30, no. 2, 2022.
T. Denoeux, “Theory of belief functions for data analysis and machine learning applications: review and prospects,” International Conference on Knowledge Science, Engineering and Management, vol. 6291, 2010.
F. Smarandache and J. Dezert, “On the consistency of PCR6 with the averaging rule and its application to probability estimation,” in Proceedings of the 16th International Conference on Information Fusion, pp. 1119–1126, IEEE, Istanbul, Turkey, July 2013.