
Fuzzy Covering-Based Three-Way Clustering

Dandan Yang

Research Article | Open Access
Mathematical Problems in Engineering, vol. 2020, Article ID 2901210, 10 pages, 2020. https://doi.org/10.1155/2020/2901210
Special Issue: Data-Driven Fuzzy Multiple Criteria Decision Making and Its Potential Applications
Guest Editor: Yi Su
Received: 17 Apr 2020; Revised: 28 Jun 2020; Accepted: 11 Jul 2020; Published: 31 Jul 2020

Abstract

This paper investigates the three-way clustering involving fuzzy covering, thresholds acquisition, and boundary region processing. First of all, a valid fuzzy covering of the universe is constructed on the basis of an appropriate fuzzy similarity relation, which helps capture the structural information and the internal connections of the dataset from the global perspective. Due to the advantages of valid fuzzy covering, we explore the valid fuzzy covering instead of the raw dataset for RFCM algorithm-based three-way clustering. Subsequently, from the perspective of semantic interpretation of balancing the uncertainty changes in fuzzy sets, a method of partition thresholds acquisition combining linear and nonlinear fuzzy entropy theory is proposed. Furthermore, boundary regions in three-way clustering correspond to the abstaining decisions and generate uncertain rules. In order to improve the classification accuracy, the k-nearest neighbor (kNN) algorithm is utilized to reduce the objects in the boundary regions. The experimental results show that the performance of the proposed three-way clustering based on fuzzy covering and kNN-FRFCM algorithm is better than the compared algorithms in most cases.

1. Introduction

Three-way decision (3WD), proposed by Yao [1, 2], has been a hot topic in various fields in recent years. Since it was put forward, the idea of tripartition has attracted many scholars. Especially recently, great progress has been made in the theoretical research and model building of three-way decisions based on rough sets. For example, Liang, Liu, et al. [3–6] proposed fuzzy three-way decision models and stochastic three-way decision models to deal with real-valued or linguistic-valued decision-making problems. Qian et al. [7] established a multigranulation decision-theoretic rough set model based on granular computing theory. Hu [8, 9] introduced the concept of three-way decision space and established a three-way decision model based on partially ordered sets. Qi et al. [10] investigated the 3WD model in the framework of lattice theory. Li et al. [11] constructed a cost-sensitive sequential three-way decision model to simulate the decision-making process from coarse granularity (high cost) to fine granularity (low cost); see [12–14] for further generalizations and applications of this model. Yao et al. [15] constructed an optimization-based framework for three-way approximations of fuzzy sets. In the meantime, for dynamic objects and attributes, some algorithms and incremental 3WD models have been designed for the classification of dynamic data [16, 17]. From the viewpoint of application, three-way decisions have been widely used in research fields such as pattern recognition [18, 19], artificial intelligence [20–22], engineering, management [23], and social communities [24].

Based on the above background and work on three-way decisions, a novel method for three-way clustering based on fuzzy covering is discussed. First, the fuzzy covering of the dataset is constructed according to a reasonable fuzzy similarity relation. The fuzzy covering of the universe requires that the more similar the objects in the universe are, the more similar the corresponding fuzzy classes are. A fuzzy covering established in this way can better reflect the intrinsic relationships between objects in the universe; therefore, the clustering results obtained with a valid fuzzy covering are more accurate. One of the inevitable problems of clustering is threshold calculation. As is well known, for most of the three-way decision models mentioned above, we first need to obtain the pair of partition thresholds α and β. Different thresholds lead to different decision results: appropriate partition thresholds make the decision more accurate, whereas inappropriate thresholds distort it. Traditionally, the partition thresholds are selected in advance according to experts' experience [25–27]. Based on the loss function, Yao [1] proposed a method to determine the thresholds by Bayesian risk decision theory. By using Shannon entropy as a measure of uncertainty, Deng et al. [28] presented an information-theoretic approach to explain and calculate the thresholds. Zhou et al. [29] explored the shadowed set to automatically obtain the partition thresholds of the three-way decisions, but this approach cannot theoretically give a reasonable semantic explanation. To address this issue, inspired by the idea of balancing the uncertainty change of fuzzy sets, a threshold calculation method combining linear fuzzy entropy with nonlinear fuzzy entropy is proposed, which provides a new scientific explanation for the generation of the thresholds. Then, the boundary regions of three-way clustering are processed by the kNN algorithm to reduce uncertainty and improve decision accuracy.

The structure of the rest of this paper is as follows: Section 2 briefly introduces the necessary notions of three-way decisions. Section 3 focuses on constructing the fuzzy covering of the raw dataset according to the fuzzy similarity relation and some necessary conditions and discusses its related properties. In Section 4, a novel rough fuzzy C-means (FRFCM) algorithm based on valid fuzzy covering is established. Then, we investigate the partition thresholds by combining the linear and nonlinear fuzzy entropy. Furthermore, the framework for processing the boundary region of three-way clustering using the kNN algorithm is introduced. In Section 5, the validity and practicability of the algorithm are evaluated by experiment. Concluding remarks are given in Section 6.

2. Preliminaries

The basic concepts on three-way decisions are briefly reviewed in this section.

An information system is defined as a 4-tuple (U, C ∪ D, V, f), where U denotes a finite nonempty universe, C is a nonempty finite set of condition attributes, D is a nonempty finite set of decision attributes, and V is the union of the domains V_a of the attributes a ∈ C ∪ D; f: U × (C ∪ D) → V is an information function such that f(x, a) ∈ V_a for every x ∈ U and every attribute a. If the attribute values are membership degrees, then the value of object x under attribute a can be expressed as a membership function value μ_a(x) ∈ [0, 1].

The trisecting-and-acting framework of three-way decisions is an extension of binary decision in order to overcome some shortcomings of binary decision. The traditional binary decision model only has acceptance and rejection options, which can easily lead to errors when the information available is insufficient to make an accurate judgment. Sometimes, the cost of wrong decisions is very high. Therefore, deferment decision is necessary, which allows decision makers to collect more information and make more accurate judgment. This is a strategy that people often adopt in the decision-making process, and deferment decision is consistent with human cognition. A three-way decision model based on the evaluation function and a pair of thresholds is shown as follows.

Definition 1 (see [30]). Let U be a finite nonempty universe, v: U → [0, 1] be an evaluation function, and (α, β) a pair of thresholds with 0 ≤ β < α ≤ 1. Then, the positive, negative, and boundary regions of any subset A ⊆ U are defined as follows:

POS_(α,β)(A) = {x ∈ U | v(x) ≥ α},
NEG_(α,β)(A) = {x ∈ U | v(x) ≤ β},
BND_(α,β)(A) = {x ∈ U | β < v(x) < α}.

The evaluation function is the key of the decision: different evaluation functions lead to different decision results, and various evaluation functions can be adopted. If a fuzzy membership function μ_A is used as the evaluation function, then the induced three regions are defined by the following equations [31]:

POS_(α,β)(A) = {x ∈ U | μ_A(x) ≥ α},
NEG_(α,β)(A) = {x ∈ U | μ_A(x) ≤ β},
BND_(α,β)(A) = {x ∈ U | β < μ_A(x) < α}.

The three-valued approximation of a fuzzy set A is described by Zadeh [32] as follows: (1) x belongs to A, if μ_A(x) ≥ α; (2) x does not belong to A, if μ_A(x) ≤ β; (3) x has an indeterminate status relative to A, if β < μ_A(x) < α. These three cases correspond to the three-way decisions of the above fuzzy set. When α = 1 and β = 0, we obtain the qualitative three-way decisions of a fuzzy set. However, the qualitative decision model of a fuzzy set is very restrictive, and we generally do not select these two thresholds.
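The following minimal sketch (not from the paper, purely illustrative) shows the trisecting rule of Definition 1 for a membership-valued evaluation function; the object names and threshold values are assumptions chosen for the example.

```python
# Minimal sketch of the trisecting rule in Definition 1 (illustrative only).
# Assumptions: the evaluation function is given as a dict of membership values,
# and the thresholds satisfy 0 <= beta < alpha <= 1.

def three_way_regions(membership, alpha, beta):
    """Split objects into positive, boundary, and negative regions."""
    pos = {x for x, mu in membership.items() if mu >= alpha}
    neg = {x for x, mu in membership.items() if mu <= beta}
    bnd = {x for x, mu in membership.items() if beta < mu < alpha}
    return pos, bnd, neg

# Example: five objects evaluated by a fuzzy membership function.
mu_A = {"x1": 0.95, "x2": 0.70, "x3": 0.50, "x4": 0.20, "x5": 0.05}
pos, bnd, neg = three_way_regions(mu_A, alpha=0.8, beta=0.3)
print(pos, bnd, neg)   # {'x1'}  {'x2', 'x3'}  {'x4', 'x5'}
```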

3. Fuzzy Covering and Its Validity

This section focuses on the method of constructing a valid fuzzy covering of the raw data and discusses the properties of this fuzzy covering. Let us first recall some concepts that help us better understand fuzzy covering.

Definition 2 (see [33, 34]). Let U be a finite universe and F(U) be the fuzzy power set of U. For each β ∈ (0, 1], we call Ĉ = {C_1, C_2, ..., C_m}, with C_i ∈ F(U) for i = 1, 2, ..., m, a fuzzy β-covering of U if (C_1 ∪ C_2 ∪ ... ∪ C_m)(x) ≥ β for each x ∈ U. (U, Ĉ) is called a fuzzy β-covering approximation space. If (C_1 ∪ C_2 ∪ ... ∪ C_m)(x) = 1 for each x ∈ U, then Ĉ is called a fuzzy covering of U and (U, Ĉ) is called a fuzzy covering approximation space. If, moreover, the memberships of each x in the C_i sum to 1, then Ĉ is called a fuzzy partition of U, and we call (U, Ĉ) a fuzzy partition approximation space.

Definition 3 (see [35]). Let S be a mapping S: F(U) × F(U) → [0, 1]. S(A, B) is called the degree of similarity between the fuzzy sets A and B if S satisfies the following properties: (1) 0 ≤ S(A, B) ≤ 1 and S(A, A) = 1; (2) S(A, B) = S(B, A); (3) if A ⊆ B ⊆ C, then S(A, C) ≤ min{S(A, B), S(B, C)}. A number of similarity measures satisfying these properties are available in the literature [35].

The fuzzy sets in this paper are constructed from a fuzzy similarity relation R that satisfies the following properties. For any x, y ∈ U, (1) R(x, x) = 1 (reflexivity); (2) R(x, y) = R(y, x) (symmetry). For a fuzzy similarity relation R on U and x_i, x_j ∈ U, the membership of x_j belonging to the fuzzy set R_xi is denoted as R_xi(x_j) = R(x_i, x_j). Obviously, if R_xi(x_j) = 1, it means that x_j certainly belongs to R_xi. Conversely, if R_xi(x_j) = 0, it indicates that x_j certainly does not belong to R_xi. R_xi is also called a fuzzy similarity class associated with x_i on R. Therefore, the set of fuzzy similarity classes {R_xi | x_i ∈ U} constructed by the relation R is a fuzzy covering of the universe U.
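To illustrate how a fuzzy covering can be induced from a fuzzy similarity relation, the following sketch builds a reflexive and symmetric similarity matrix from pairwise distances and treats each row as one fuzzy similarity class. The particular similarity measure, one minus the normalized Euclidean distance, is an assumption made for illustration rather than the measure prescribed by the paper.

```python
import numpy as np

def fuzzy_similarity_classes(X):
    """Build a fuzzy covering {R_x1, ..., R_xn} of the dataset X.

    Row i of the returned matrix is the fuzzy similarity class R_xi,
    i.e. R_xi(xj) = R(xi, xj). The choice R(xi, xj) = 1 - d(xi, xj)/d_max
    is only one possibility; it is reflexive (R(x, x) = 1) and symmetric.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    d_max = dist.max() if dist.max() > 0 else 1.0
    return 1.0 - dist / d_max                     # n x n matrix of similarity classes

# Tiny example: four 2-D points; each row of C is one fuzzy similarity class.
C = fuzzy_similarity_classes([[0, 0], [0, 1], [5, 5], [5, 6]])
print(np.round(C, 2))
```

Since R(x, x) = 1 for every object, the family of rows attains the value 1 at each point and is therefore a fuzzy covering in the sense of Definition 2.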
In the following, we investigate the validity and related properties of the fuzzy covering of the raw dataset.

Definition 4. Let U be a universe, let R be the fuzzy similarity relation on U, and let S be the similarity measure on fuzzy sets. Let Ĉ be the fuzzy covering of U constructed by the fuzzy similarity relation R. For any x_i ∈ U, consider the set of objects that are similar to x_i at a given level δ. Ĉ is defined as a valid fuzzy covering of U with respect to δ if, for such similar objects, the similarity between the corresponding fuzzy similarity classes also reaches the level δ; the proportion of objects satisfying this condition is called the validity index of Ĉ.
It is easy to see that the value of the validity index depends on R, S, and the choice of δ, and δ is generally assigned a value no less than 0.8. The closer the validity index is to 1, the better the relation R expresses the structure of the sample space. If the validity index is less than 0.5, the fuzzy covering of the universe is regarded as invalid. A valid fuzzy covering satisfies the requirement that similar objects in U have correspondingly similar fuzzy classes, so the fuzzy covering more fully reflects the original distribution of objects in U.

Proposition 1. Let , then .

Proof. It can be easily verified by the definition.

Remark 1. Let two valid fuzzy coverings of U be constructed with respect to the same δ. We choose the fuzzy covering with the larger validity index as the research data.

4. Three-Way Clustering

4.1. Rough Fuzzy C-Means Algorithm Based on Fuzzy Covering

In this section, we discuss the rough fuzzy C-means algorithm with fuzzy covering. The reason for clustering with the fuzzy covering is that each fuzzy similarity class reflects the relationship of an object with the whole dataset, which avoids the excessive loss of clustering information that may occur when clustering the raw data directly.

The combination of fuzzy sets and rough sets provides an important direction for uncertain reasoning. Lingras [36] developed rough C-means (RCM) by combining the C-means clustering algorithm with rough set theory. The new cluster center is related only to the positive region and the boundary region, unlike fuzzy C-means (FCM) [37], in which it is related to all objects. Since no membership degrees are involved, RCM cannot effectively deal with the uncertainty caused by overlapping boundaries. In such circumstances, Mitra et al. [25] proposed a rough fuzzy C-means (RFCM) algorithm, which combines the advantages of both fuzzy sets and rough sets within the framework of the C-means clustering algorithm. When dividing objects into approximation regions, the innovation of rough fuzzy C-means is to replace the absolute distance with a fuzzy membership. This adjustment enhances the robustness of the clustering in overlapping situations. Maji et al. [26] modified the calculation of the new cluster center in the RFCM model by assuming that the objects in the lower approximation have definite weights and the objects in the boundary have fuzzy weights. In what follows, we discuss the rough fuzzy C-means of fuzzy covering (FRFCM) algorithm, which is an RFCM algorithm based on the fuzzy covering of the universe.

Suppose Ĉ = {C_1, C_2, ..., C_n} is a valid fuzzy covering of U. The cluster centers are denoted as v_1, v_2, ..., v_c. In the FRFCM algorithm, Ĉ is divided into c clusters. The membership of C_k to the i-th cluster is

u_ik = 1 / Σ_{j=1}^{c} (d_ik / d_jk)^(2/(m − 1)),   (6)

where d_ik is the distance between C_k and the cluster center v_i, 1 ≤ i ≤ c, and 1 ≤ k ≤ n. The parameter m is the fuzzifier, which is greater than 1.

A two-category dataset is taken to explain the influence of different parameters on the classification. The membership degree of each object belonging to each cluster can be considered as a function of the relative distance and the fuzzifier parameter. For two clusters, formula (6) translates into the following form:

u = 1 / (1 + r^(2/(m − 1))),   (7)

where r denotes the relative distance of an object with respect to one of the clusters, i.e., the ratio of its distances to the two cluster centers.
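The influence of the fuzzifier discussed in connection with Figure 1 can be reproduced numerically. The following sketch assumes the FCM-style two-cluster membership of formula (7); the chosen values of r and m are illustrative only.

```python
# Two-cluster membership as a function of the relative distance r = d1/d2
# and the fuzzifier m, following the form of formula (7).

def membership_two_clusters(r, m):
    """Membership in cluster 1 when r is the ratio d1/d2 of the distances."""
    return 1.0 / (1.0 + r ** (2.0 / (m - 1.0)))

for m in (1.1, 2.0, 5.0):
    # r < 1 means the object is closer to cluster 1.
    values = [round(membership_two_clusters(r, m), 3) for r in (0.25, 0.5, 1.0, 2.0)]
    print(f"m = {m}: u1 at r = 0.25, 0.5, 1.0, 2.0 -> {values}")

# As m -> 1 the memberships become nearly crisp (close to 0 or 1);
# as m grows they all drift toward 0.5, enlarging the boundary region.
```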

The uncertainty caused by different values of the fuzzifier parameter m is illustrated in Figure 1.

It is easy to see that when the value of m tends to 1, the memberships are most crisp and the uncertainty of the system is reduced, which is suitable for three-way clustering. In this circumstance, only objects that are approximately the same distance from each cluster center are divided into the boundary regions. In addition, the parameter m cannot be assigned a very large value, because as the value increases, the memberships of all objects, including those around the cluster centers, approach the same value 1/c, so most objects are divided into the boundary region, which increases the uncertainty of the system and the error rate of decision-making. Furthermore, the positive region of a cluster may become empty.

The center vectors are updated as follows:

v_i = w_low · A_i + w_bnd · B_i,   (8)

where A_i and B_i can be considered as the contributions to the center by the fuzzy lower region and the fuzzy boundary region, respectively. BND(i) = upper(i) − lower(i) denotes the boundary region of cluster i, where lower(i) and upper(i) are the lower and upper approximations of cluster i with respect to the relation R, respectively. The weight values w_low and w_bnd usually satisfy w_low + w_bnd = 1 and w_low > w_bnd. In this paper, fixed values satisfying these conditions are used.
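A hedged sketch of one center update in the spirit of formula (8) is shown below. Because the exact weighting of lower-approximation and boundary objects is not spelled out here, the sketch assumes membership-weighted means for both contributions and example weights w_low = 0.95 and w_bnd = 0.05 satisfying the stated constraints.

```python
import numpy as np

def update_center(X, u_i, lower_idx, boundary_idx, m=2.0, w_low=0.95, w_bnd=0.05):
    """One cluster-center update in the spirit of formula (8).

    X            : (n, d) data matrix (here, fuzzy similarity classes as vectors)
    u_i          : (n,) memberships of all objects in cluster i
    lower_idx    : indices of objects in the lower approximation of cluster i
    boundary_idx : indices of objects in the boundary region of cluster i
    The weighting scheme and the values w_low, w_bnd are assumptions.
    """
    def weighted_mean(idx):
        w = u_i[idx] ** m
        return (w[:, None] * X[idx]).sum(axis=0) / w.sum()

    if len(lower_idx) and len(boundary_idx):
        return w_low * weighted_mean(lower_idx) + w_bnd * weighted_mean(boundary_idx)
    if len(lower_idx):                  # boundary region empty
        return weighted_mean(lower_idx)
    return weighted_mean(boundary_idx)  # lower approximation empty

# Example with four objects: two in the lower approximation, two in the boundary.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.2, 0.9]])
u_i = np.array([0.9, 0.8, 0.5, 0.4])
print(update_center(X, u_i, lower_idx=[0, 1], boundary_idx=[2, 3]))
```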

The approximation regions are determined by the FRFCM algorithm with the following principle. For an object C_k, let u_ik and u_jk be its highest and second highest memberships. If u_ik − u_jk is less than a given partition threshold, then C_k belongs to the upper approximations of both cluster i and cluster j, which also means that C_k belongs to their boundary regions; in this case, C_k cannot be divided into the positive region of any cluster. Otherwise, C_k belongs to the lower approximation (positive region) of cluster i. Due to the particular structure of the fuzzy covering of U, the results of fuzzy covering clustering can well reflect the clustering results of the raw dataset through the above FRFCM algorithm.
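The assignment principle can be sketched as follows. The top-two-membership comparison is the usual RCM/RFCM convention assumed here, and the comparison threshold plays the role of the partition threshold that is set to 0.001 in the experiments of Section 5.

```python
import numpy as np

def assign_regions(U, threshold=0.001):
    """Assign each object to lower approximations / boundary regions.

    U is a (c, n) membership matrix (c clusters, n objects).
    Returns two lists of sets: lower[i] and boundary[i] for each cluster i.
    """
    c, n = U.shape
    lower = [set() for _ in range(c)]
    boundary = [set() for _ in range(c)]
    for k in range(n):
        order = np.argsort(U[:, k])[::-1]          # clusters by decreasing membership
        i, j = order[0], order[1]
        if U[i, k] - U[j, k] < threshold:
            boundary[i].add(k)                     # ambiguous: goes to both boundaries
            boundary[j].add(k)                     # and to no positive region
        else:
            lower[i].add(k)                        # clear winner: positive region of i
    return lower, boundary

U = np.array([[0.503, 0.90, 0.4995],
              [0.497, 0.10, 0.5005]])
low, bnd = assign_regions(U, threshold=0.01)
print(low, bnd)   # object 1 -> lower of cluster 0; objects 0 and 2 -> both boundaries
```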

4.2. Acquisition of Thresholds for Three-Way Clustering

In this section, we firstly review the shadowed set model for computing thresholds. Then, a novel method of calculating thresholds is proposed by combining the linear and nonlinear fuzzy entropy.

The FRFCM algorithm is an important tool to deal with imprecise, incomplete, and inconsistent data. The thresholds in FRFCM, which determine the formation of the approximation regions, should be carefully selected. Unreasonable thresholds may cause the partition of the approximation regions to be distorted, and the cluster centers may deviate from the expected locations. Therefore, we should compute the partition thresholds scientifically according to some principles.

There are many methods to obtain the thresholds, and the most popular one is the shadowed set [38]. In fact, the shadowed set adopts the method of elevating and reducing membership degrees, which divides the domain of a fuzzy set into three regions. The corresponding membership function is as follows:

S_A(x) = 1, if μ_A(x) ≥ α,
S_A(x) = 0, if μ_A(x) ≤ β,
S_A(x) = [0, 1] (the shadow), if β < μ_A(x) < α,

where μ_A(x) is the membership function of the fuzzy set A.

In the following study, only discrete fuzzy systems are considered, and similar models and conclusions can be obtained for continuous fuzzy systems. According to shadowed set theory, the following quantity is proposed, whose minimum value yields the optimal thresholds α and β:

V(α, β) = | Σ_{μ_A(x_i) ≤ β} μ_A(x_i) + Σ_{μ_A(x_i) ≥ α} (1 − μ_A(x_i)) − card({x_i | β < μ_A(x_i) < α}) |.
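For reference, the discrete balance criterion above can be evaluated by a simple grid search. The sketch below is a simplified illustration of the shadowed-set idea, not the paper's implementation; the candidate grid and the membership values are assumptions.

```python
import numpy as np

def shadowed_set_thresholds(mu, grid=None):
    """Grid search for (alpha, beta) minimizing the shadowed-set balance.

    mu is a 1-D array of membership degrees. The objective balances the total
    membership reduced to 0, the total membership elevated to 1, and the size
    of the shadow region (a simplified form of the criterion discussed above).
    """
    mu = np.asarray(mu, dtype=float)
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    best, best_pair = np.inf, (None, None)
    for beta in grid:
        for alpha in grid:
            if beta >= alpha:
                continue
            reduced = mu[mu <= beta].sum()             # fuzziness removed by reduction
            elevated = (1.0 - mu[mu >= alpha]).sum()   # fuzziness removed by elevation
            shadow = np.count_nonzero((mu > beta) & (mu < alpha))
            v = abs(reduced + elevated - shadow)
            if v < best:
                best, best_pair = v, (alpha, beta)
    return best_pair

mu = np.array([0.05, 0.1, 0.3, 0.45, 0.55, 0.7, 0.9, 0.95])
print(shadowed_set_thresholds(mu))
```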

However, the semantic interpretation of obtaining threshold pairs by the above method is not very clear. Because the shadowed set model cannot reasonably explain the relationship between the obtained shadowed set and the fuzziness of the raw fuzzy set, further research is needed. Various methods for measuring uncertainty are described in the literature [39]. Fuzzy entropy is an important tool to measure the uncertainty of a fuzzy set and meets the following requirements.

Definition 5 (see [40]). Let A be a fuzzy set on the universe of discourse U. The fuzzy entropy of the fuzzy set A is a mapping E: F(U) → [0, 1] which satisfies the following four conditions: (1) E(A) = 0 if A is a crisp set; (2) E(A) = 1 if μ_A(x) = 0.5 for all x ∈ U; (3) for any x ∈ U, if μ_A(x) ≤ μ_B(x) ≤ 0.5 or μ_A(x) ≥ μ_B(x) ≥ 0.5, then E(A) ≤ E(B); (4) E(A) = E(A^c), where A^c is the complement of A.

It is easy to verify that if, for any x ∈ U, μ_A(x) = 0 or μ_A(x) = 1, the value of the corresponding pointwise entropy function is 0, so the fuzzy entropy of the fuzzy set equals 0; i.e., the uncertainty of the fuzzy set is the minimum. When μ_A(x) = 0.5 holds for any x ∈ U, the value of the corresponding pointwise entropy function is 1, and the fuzzy set has maximum uncertainty. Commonly used linear and nonlinear fuzzy entropy functions can be found in [41–43]. With such a pointwise entropy function e(·), the corresponding fuzzy entropy of a fuzzy set A on U = {x_1, ..., x_n} can be obtained as

E(A) = (1/n) Σ_{i=1}^{n} e(μ_A(x_i)).

The basic idea of calculating the thresholds by fuzzy entropy is to reduce to 0 the uncertainty of the memberships of the objects that undergo the elevating or reducing operation of the shadowed set, while the memberships of the objects corresponding to the middle part of the shadowed set are adjusted to maximal uncertainty; i.e., their fuzzy degree increases to 1. In what follows, we propose a flexible fuzzy entropy method, which combines a linear fuzzy entropy function E_L and a nonlinear fuzzy entropy function E_N to obtain the clustering thresholds. The calculation model is based on the combined entropy

E_λ(A) = (1 − λ) E_L(A) + λ E_N(A),   (13)

where λ is a parameter adjusting the impacts of the linear entropy and the nonlinear entropy; the thresholds are chosen so that the decrease in fuzziness of the elevated and reduced parts is balanced against the increase in fuzziness of the middle part, both measured by E_λ.
In equation (13), when λ = 0, only the linear fuzzy entropy function is used to calculate the thresholds. If λ = 1, only the nonlinear fuzzy entropy function is used. The smaller the value of λ, the greater the influence of the linear fuzzy entropy, and vice versa. In the subsequent experiments of this study, λ is assigned a fixed value.
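A minimal sketch of the flexible-entropy threshold selection is given below. The specific pointwise entropy functions, 1 − |2u − 1| (linear) and sin(πu) (nonlinear), are assumptions chosen only because they satisfy Definition 5; the threshold pair is then selected so that the fuzziness removed from the elevated and reduced parts balances the fuzziness added to the middle part, both measured with the λ-combined entropy of formula (13).

```python
import numpy as np

def linear_entropy(u):
    return 1.0 - np.abs(2.0 * u - 1.0)        # 0 at u in {0, 1}, 1 at u = 0.5

def nonlinear_entropy(u):
    return np.sin(np.pi * u)                  # also 0 at {0, 1} and 1 at 0.5

def flexible_entropy(u, lam=0.5):
    """Pointwise flexible entropy E_lambda = (1 - lambda) E_L + lambda E_N."""
    return (1.0 - lam) * linear_entropy(u) + lam * nonlinear_entropy(u)

def entropy_thresholds(mu, lam=0.5, grid=None):
    """Pick (alpha, beta) balancing the entropy lost outside the middle part
    against the entropy gained by treating the middle part as fully fuzzy."""
    mu = np.asarray(mu, dtype=float)
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    best, best_pair = np.inf, (None, None)
    for beta in grid:
        for alpha in grid:
            if beta >= alpha:
                continue
            outside = (mu <= beta) | (mu >= alpha)
            lost = flexible_entropy(mu[outside], lam).sum()        # entropy driven to 0
            middle = mu[(mu > beta) & (mu < alpha)]
            gained = (1.0 - flexible_entropy(middle, lam)).sum()   # entropy raised to 1
            v = abs(lost - gained)
            if v < best:
                best, best_pair = v, (alpha, beta)
    return best_pair

mu = np.array([0.05, 0.15, 0.4, 0.5, 0.6, 0.8, 0.92, 0.97])
print(entropy_thresholds(mu, lam=0.5))
```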
Figure 2 illustrates the increase and decrease in fuzzy degree under different fuzzy entropy functions, taking a linear fuzzy entropy function, a nonlinear fuzzy entropy function, and the flexible fuzzy entropy function obtained by combining the two with equal weight as examples.
It can be seen from Figure 2 that the curve of the flexible fuzzy entropy function lies between the curves of the linear and nonlinear entropy functions. Using the flexible fuzzy entropy to obtain the thresholds prevents the uncertainty of the fuzzy set, as measured by a purely linear or purely nonlinear fuzzy entropy, from being too small or too large, which would make the partition thresholds unreasonable.
The thresholds used in RFCM and its related algorithms are usually user-defined. In contrast, the thresholds calculated by the above model can not only be interpreted in terms of the change in the fuzzy degree of the fuzzy set but can also be adjusted and optimized automatically.
According to α_i and β_i, the positive, boundary, and negative regions of each cluster can be expressed as

POS(i) = {x | u_i(x) ≥ α_i},  BND(i) = {x | β_i < u_i(x) < α_i},  NEG(i) = {x | u_i(x) ≤ β_i},   (14)

where u_i(x) is the membership degree of the object x belonging to the i-th cluster.

4.3. Boundary Region Processing of Three-Way Clustering Based on kNN Algorithm

Following the above discussion on automatically selecting the optimal partition thresholds based on fuzzy entropy theory, this section presents the processing of the objects in the boundary regions of three-way clustering.

In three-way clustering, the objects in the boundary regions are rarely processed further. The k-nearest neighbor (kNN) algorithm [44] is a well-known nonparametric classifier, which is considered one of the simplest methods in data mining and pattern recognition. The principle of the kNN algorithm is to find the k nearest neighbors of a query object in the dataset and then assign the query to the majority class among these k neighbors. In this paper, the kNN algorithm is utilized to process the objects in the boundary regions. If an object cannot be assigned to a positive region, it remains in the boundary region. The uncertainty of the boundary region therefore decreases as the number of objects in it decreases, and reclassifying the objects in the boundary region can improve the accuracy of the three-way clustering.

The details of updating the boundary region with the kNN algorithm are as follows.

Because the kNN algorithm relies mainly on a limited number of adjacent objects for classification, it is more suitable than other methods when class domains overlap or when the objects to be classified lie in the boundary region. Therefore, Algorithm 1 can handle the uncertainty arising from the boundary region. Of course, dealing with the boundary region with the k-nearest neighbor algorithm adds extra computational burden and may also introduce a risk of misclassifying objects.

Input: a set of boundary objects, the cluster centers, the positive regions, the boundary region, and the optimal value of k.
Output: the updated positive regions and boundary region.
Step 1: for a boundary object x, calculate the distance between x and the other objects;
Step 2: find the regions in which the k objects with the smallest distances are located;
Step 3: count, for each cluster, how many of the k objects lie in its positive region, and count how many of the k objects lie in the boundary region. If there is exactly one cluster whose positive region contains the largest number among these counts, then remove x from the boundary region and add it to the positive region of that cluster; otherwise, x remains in the boundary region;
Step 4: repeat Steps 1–3 until all boundary objects have been processed.

Algorithm 1: Boundary region processing with the kNN algorithm.
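A compact sketch of Algorithm 1 is given below; it assumes Euclidean distances, a simple dominance rule over the k neighbors, and hypothetical region containers, so it is an illustration of the procedure rather than the paper's implementation.

```python
import numpy as np

def knn_reprocess_boundary(X, positive, boundary, k=7):
    """Move boundary objects into a positive region when their k nearest
    neighbors clearly favor exactly one cluster (Steps 1-4 of Algorithm 1).

    X        : (n, d) data matrix
    positive : list of sets of indices, one set per cluster
    boundary : set of indices currently in the boundary region
    """
    labeled = {idx: c for c, region in enumerate(positive) for idx in region}
    for q in sorted(boundary):
        others = np.array([i for i in range(len(X)) if i != q])
        dists = np.linalg.norm(X[others] - X[q], axis=1)       # Step 1: distances
        neighbors = others[np.argsort(dists)[:k]]              # Step 2: k nearest
        counts = np.zeros(len(positive), dtype=int)            # Step 3: count regions
        n_bnd = 0
        for nb in neighbors:
            if nb in labeled:
                counts[labeled[nb]] += 1
            else:
                n_bnd += 1
        winners = np.flatnonzero(counts == counts.max())
        if counts.max() > n_bnd and len(winners) == 1:         # exactly one dominant cluster
            positive[winners[0]].add(q)
            boundary.discard(q)
            labeled[q] = winners[0]
        # otherwise the object stays in the boundary region (Step 4: next object)
    return positive, boundary
```

In the experiments of Section 5, the value of k is set to 7.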

In what follows, based on the valid fuzzy covering and the FRFCM and kNN algorithms, we propose a three-way clustering algorithm, called the kNN-FRFCM algorithm, as shown in Algorithm 2.

Input: the valid fuzzy covering of the universe U, the cluster centers, and the initial fuzzy membership degrees;
Output: the positive, boundary, and negative regions of each cluster.
Step 1: compute the optimal partition thresholds α_i and β_i for each cluster using formula (13);
Step 2: according to formula (14), determine the positive region, the boundary region, and the negative region of each cluster from α_i, β_i, and the fuzzy partition matrix;
Step 3: update each clustering region by Algorithm 1;
Step 4: update the membership partition matrix by formula (6);
Step 5: update the cluster centers with formula (8);
Step 6: repeat Steps 1 to 5 until convergence is reached;
Step 7: replace the results of fuzzy covering clustering by the corresponding objects in the universe.

Algorithm 2: The kNN-FRFCM three-way clustering algorithm.

Thus, according to Algorithm 2, we obtain three-way clustering results of the original dataset by using the valid fuzzy covering.

5. Experiment Analysis

The three-way clustering method based on fuzzy covering proposed in this paper is suitable for datasets with a small number of objects and dimensions, or for data in which the number of objects and the number of dimensions are comparable. Otherwise, constructing the fuzzy covering from data with a large number of objects and few dimensions will cause the curse of dimensionality. In this paper, six datasets, namely, Iris, Breast Cancer Wisconsin (Original) (BCWO) with the missing data eliminated, New thyroid, Seeds, Forest-type mapping (FTM), and CT, from the UCI Machine Learning Repository [45] are used for the empirical study. On these datasets and their corresponding fuzzy coverings, the results of clustering methods including FCM, RCM, RFCM, kNN-RCM, and kNN-RFCM are compared. In order to distinguish the results on the raw dataset from those on the fuzzy covering with the same algorithm, the clustering algorithms applied to the fuzzy covering are denoted as FFCM, FRCM, FRFCM, kNN-FRCM, and kNN-FRFCM, respectively. Details of the six datasets are described in Table 1.


Table 1: Details of the six datasets.

No.   Dataset        Objects   Attributes   Classes
1     Iris           150       4            3
2     BCWO           683       10           2
3     New thyroid    215       5            3
4     Seeds          210       7            3
5     FTM            326       27           4
6     CT             221       36           2

The partition threshold related to RCM and its related algorithms is set as 0.001. The two parameters involved in constructing the fuzzy covering are set as 0.8 and 0.9, respectively. The value of k in the kNN algorithm is set to 7, and the evaluation indexes normalized mutual information (NMI) [47], accuracy (ACC) [48], and Rand index (RI) [49] are utilized to evaluate the validity of the algorithms. Furthermore, the reasonable values of the fuzzifier involved in all comparison algorithms are greater than 1. Two fuzzifier values are selected, and the experimental comparison results are listed in Tables 2–7.


Table 2: Comparison results on the Iris and Seeds datasets (first fuzzifier value).

Algorithm      Iris                          Seeds
               NMI      ACC      RI          NMI      ACC      RI
FCM            0.7419   0.8867   0.8737      0.6949   0.8952   0.8744
RCM            0.7328   0.8400   0.8891      0.6670   0.8857   0.8666
RFCM           0.7419   0.8867   0.8737      0.6670   0.8857   0.8666
kNN-RCM        0.7777   0.9000   0.8859      0.6743   0.8905   0.8693
kNN-RFCM       0.7419   0.8867   0.8737      0.6743   0.8905   0.8693
FFCM           0.8226   0.9333   0.9195      0.6748   0.8952   0.8713
FRCM           0.7767   0.9133   0.9124      0.6748   0.8952   0.8713
FRFCM          0.8112   0.9267   0.9160      0.6777   0.8952   0.8742
kNN-FRCM       0.7919   0.9267   0.9124      0.6748   0.8952   0.8713
kNN-FRFCM      0.8226   0.9333   0.9195      0.6852   0.9000   0.8770


Table 3: Comparison results on the BCWO and New thyroid datasets (first fuzzifier value).

Algorithm      BCWO                          New thyroid
               NMI      ACC      RI          NMI      ACC      RI
FCM            0.7478   0.9605   0.9240      0.4945   0.8605   0.7908
RCM            0.7368   0.9502   0.9277      0.5585   0.8744   0.8203
RFCM           0.7585   0.9605   0.9312      0.5966   0.8884   0.8180
kNN-RCM        0.7368   0.9590   0.9277      0.5563   0.9023   0.7913
kNN-RFCM       0.7546   0.9619   0.9267      0.5966   0.8884   0.8180
FFCM           0.7759   0.9649   0.9321      0.6245   0.8977   0.8329
FRCM           0.7759   0.9649   0.9321      0.6501   0.9070   0.8531
FRFCM          0.7759   0.9649   0.9321      0.6448   0.9023   0.8523
kNN-FRCM       0.7759   0.9649   0.9321      0.6583   0.9116   0.8540
kNN-FRFCM      0.7759   0.9649   0.9321      0.6583   0.9116   0.8540


Table 4: Comparison results on the FTM and CT datasets (first fuzzifier value).

Algorithm      FTM                           CT
               NMI      ACC      RI          NMI      ACC      RI
FCM            0.7271   0.8939   0.9031      0.3118   0.8145   0.6964
RCM            0.7475   0.8990   0.9039      0.3296   0.8235   0.7080
RFCM           0.7411   0.8990   0.9018      0.3309   0.8190   0.7133
kNN-RCM        0.7475   0.8990   0.9039      0.3296   0.8235   0.7080
kNN-RFCM       0.7411   0.8990   0.9018      0.3550   0.8326   0.7234
FFCM           0.7823   0.9091   0.9153      0.4327   0.8371   0.7260
FRCM           0.7823   0.9091   0.9153      0.4327   0.8371   0.7260
FRFCM          0.7677   0.8990   0.9128      0.4267   0.8281   0.7234
kNN-FRCM       0.7823   0.9091   0.9153      0.4327   0.8371   0.7260
kNN-FRFCM      0.7906   0.9141   0.9200      0.4244   0.8326   0.7200


Table 5: Comparison results on the Iris and Seeds datasets (second fuzzifier value).

Algorithm      Iris                          Seeds
               NMI      ACC      RI          NMI      ACC      RI
FCM            0.7582   0.8933   0.8797      0.6949   0.8952   0.8744
RCM            0.7328   0.8400   0.8891      0.6670   0.8857   0.8666
RFCM           0.7360   0.8733   0.8714      0.6769   0.8857   0.8746
kNN-RCM        0.7777   0.9000   0.8859      0.6743   0.8905   0.8693
kNN-RFCM       0.7582   0.8933   0.8797      0.6728   0.8905   0.8740
FFCM           0.8024   0.9267   0.9124      0.6629   0.8905   0.8694
FRCM           0.7767   0.9133   0.9124      0.6748   0.8952   0.8713
FRFCM          0.7991   0.9200   0.9197      0.6345   0.8762   0.8643
kNN-FRCM       0.7919   0.9267   0.9124      0.6748   0.8952   0.8713
kNN-FRFCM      0.8136   0.9333   0.9197      0.6480   0.8857   0.8622


Table 6: Comparison results on the BCWO and New thyroid datasets (second fuzzifier value).

Algorithm      BCWO                          New thyroid
               NMI      ACC      RI          NMI      ACC      RI
FCM            0.7478   0.9605   0.9240      0.4945   0.8605   0.7908
RCM            0.7368   0.9502   0.9277      0.5585   0.8744   0.8203
RFCM           0.7391   0.9517   0.9268      0.6058   0.8884   0.8250
kNN-RCM        0.7368   0.9590   0.9277      0.5563   0.9023   0.7913
kNN-RFCM       0.7347   0.9575   0.9186      0.6058   0.8884   0.8250
FFCM           0.7759   0.9649   0.9321      0.6245   0.8977   0.8329
FRCM           0.7759   0.9649   0.9321      0.6501   0.9070   0.8531
FRFCM          0.7778   0.9649   0.9340      0.6448   0.9023   0.8523
kNN-FRCM       0.7759   0.9649   0.9321      0.6583   0.9116   0.8540
kNN-FRFCM      0.7889   0.9678   0.9376      0.6583   0.9116   0.8540


Table 7: Comparison results on the FTM and CT datasets (second fuzzifier value).

Algorithm      FTM                           CT
               NMI      ACC      RI          NMI      ACC      RI
FCM            0.7271   0.8939   0.9031      0.3118   0.8145   0.6964
RCM            0.7475   0.8990   0.9039      0.3296   0.8235   0.7080
RFCM           0.7411   0.8990   0.9018      0.3023   0.7919   0.6954
kNN-RCM        0.7475   0.8990   0.9039      0.3296   0.8235   0.7080
kNN-RFCM       0.7411   0.8990   0.9018      0.3274   0.8190   0.7055
FFCM           0.7746   0.9040   0.9107      0.4327   0.8371   0.7260
FRCM           0.7823   0.9091   0.9153      0.4327   0.8371   0.7260
FRFCM          0.7632   0.8838   0.9110      0.4338   0.8100   0.7374
kNN-FRCM       0.7823   0.9091   0.9153      0.4327   0.8371   0.7260
kNN-FRFCM      0.8074   0.9192   0.9246      0.4725   0.8416   0.7350

From Tables 2–7, it can easily be concluded that the selected fuzzifier parameters have a significant impact on the performance of all compared algorithms on the same dataset. Since the boundary region is the main source of system uncertainty, overly large boundary regions are undesirable for three-way clustering, and attention must be paid to the uncertainty caused by the fuzzifier when implementing the algorithms. Moreover, the clustering results show that the kNN-FRFCM algorithm performs better than the other algorithms in most cases, mainly because reprocessing the objects in the boundary regions reduces the uncertainty of the system. The results also show that clustering based on the fuzzy covering is mostly better than clustering with the raw data. Therefore, the valid fuzzy covering can replace the raw dataset for clustering, and the clustering results are better than those obtained from the raw dataset. The premise for replacing the raw dataset with a fuzzy covering is to select an appropriate fuzzy similarity relation [46].

6. Conclusions

In this paper, a valid fuzzy covering of the raw dataset is constructed according to certain principles. Because the similarity between the fuzzy similarity classes in the valid fuzzy covering can be used to measure the similarity between objects in the raw dataset, and each fuzzy similarity class reflects its connection with the whole dataset, clustering the valid fuzzy covering instead of the raw data can improve the precision of clustering. From the perspective of the semantic explanation of uncertainty change in fuzzy sets, we investigate the method of combining linear fuzzy entropy with nonlinear fuzzy entropy to obtain the decision threshold pairs. The advantage of the threshold calculation method in this paper is that it not only obtains the classification thresholds objectively from the objects' intrinsic relations but also uses a formula that is simple and easy to understand, and it avoids inappropriate subjective assignment. Additionally, the objects in the boundary regions obtained by the FRFCM algorithm are reprocessed by the kNN algorithm to reduce the uncertainty of the system.

In future work, we will continue to investigate methods of threshold acquisition and boundary region processing for three-way clustering following the ideas of this paper. Three-way clustering in incremental information systems is another future research direction.

Data Availability

The experimental data supporting the findings of this study are available on the website provided in this article.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This work was supported by the Science Research Project of Inner Mongolia University for Nationalities with the title “Research on three-way clustering methods of preference linguistic data” (no. NMDYB18030) and Natural Science Foundation of Inner Mongolia Autonomous Region (nos. 2018MS01008 and 2020MS07008).

References

1. Y. Yao, "Three-way decisions with probabilistic rough sets," Information Sciences, vol. 180, no. 3, pp. 341–353, 2010.
2. Y. Y. Yao, "An outline of a theory of three-way decisions," in RSCTC 2012, LNCS (LNAI), Springer, Berlin, Germany, 2012.
3. D. Liang, D. Liu, W. Pedrycz, and P. Hu, "Triangular fuzzy decision-theoretic rough sets," International Journal of Approximate Reasoning, vol. 54, no. 8, pp. 1087–1106, 2013.
4. D. Liang and D. Liu, "Deriving three-way decisions from intuitionistic fuzzy decision-theoretic rough sets," Information Sciences, vol. 300, pp. 28–48, 2015.
5. D. Liang, W. Pedrycz, D. Liu, and P. Hu, "Three-way decisions based on decision-theoretic rough sets under linguistic assessment with the aid of group decision making," Applied Soft Computing, vol. 29, pp. 256–269, 2015.
6. D. Liu, T. R. Li, and D. C. Liang, "Three-way decisions in stochastic decision-theoretic rough sets," in Transactions on Rough Sets XVIII, Springer, Berlin, Germany, 2014.
7. Y. Qian, H. Zhang, Y. Sang, and J. Liang, "Multigranulation decision-theoretic rough sets," International Journal of Approximate Reasoning, vol. 55, no. 1, pp. 225–237, 2014.
8. B. Q. Hu, "Three-way decisions space and three-way decisions," Information Sciences, vol. 281, pp. 21–52, 2014.
9. B. Q. Hu, "Three-way decision spaces based on partially ordered sets and three-way decisions based on hesitant fuzzy sets," Knowledge-Based Systems, vol. 91, pp. 16–31, 2016.
10. J. Qi, T. Qian, and L. Wei, "The connections between three-way and classical concept lattices," Knowledge-Based Systems, vol. 91, pp. 143–151, 2016.
11. H. X. Li, X. Z. Zhou, B. Huang, and D. Liu, "Cost-sensitive three-way decision: a sequential strategy," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, Halifax, NS, Canada, October 2013.
12. J. Yang, G. Wang, Q. Zhang, Y. Chen, and T. Xu, "Optimal granularity selection based on cost-sensitive sequential three-way decisions with rough fuzzy sets," Knowledge-Based Systems, vol. 163, pp. 131–144, 2019.
13. H. Ju, W. Pedrycz, H. Li, W. Ding, X. Yang, and X. Zhou, "Sequential three-way classifier with justifiable granularity," Knowledge-Based Systems, vol. 163, pp. 103–119, 2019.
14. Y. Fang, C. Gao, and Y. Yao, "Granularity-driven sequential three-way decisions: a cost-sensitive approach to classification," Information Sciences, vol. 507, pp. 644–664, 2020.
15. Y. Yao, S. Wang, and X. Deng, "Constructing shadowed sets and three-way approximations of fuzzy sets," Information Sciences, vol. 412-413, pp. 132–153, 2017.
16. X. Yang, T. Li, D. Liu, H. Chen, and C. Luo, "A unified framework of dynamic three-way probabilistic rough sets," Information Sciences, vol. 420, pp. 126–147, 2017.
17. Q. Zhang, G. Lv, Y. Chen, and G. Wang, "A dynamic three-way decision model based on the updating of attribute values," Knowledge-Based Systems, vol. 142, pp. 71–84, 2018.
18. H. Li, L. Zhang, B. Huang, and X. Zhou, "Sequential three-way decision and granulation for cost-sensitive face recognition," Knowledge-Based Systems, vol. 91, pp. 241–251, 2016.
19. A. V. Savchenko, "Sequential three-way decisions in multi-category image recognition with deep features based on distance factor," Information Sciences, vol. 489, pp. 18–36, 2019.
20. Y. Zhang, Z. Zhang, D. Miao, and J. Wang, "Three-way enhanced convolutional neural networks for sentence-level sentiment classification," Information Sciences, vol. 477, pp. 55–64, 2019.
21. A. Campagner and D. Ciucci, "Three-way and semisupervised decision tree learning based on orthopartitions," in Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Cádiz, Spain, June 2018.
22. H.-R. Zhang and F. Min, "Three-way recommender systems based on random forests," Knowledge-Based Systems, vol. 91, pp. 275–286, 2016.
23. D. Liu, T. Li, and D. Liang, "Three-way government decision analysis with decision-theoretic rough sets," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 20, no. 1, pp. 119–132, 2012.
24. Y. L. Liu, L. Pan, X. Y. Jia, C. J. Wang, and J. Y. Xie, "Three-way decision based overlapping community detection," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, Halifax, NS, Canada, October 2013.
25. S. Mitra, H. Banka, and W. Pedrycz, "Rough-fuzzy collaborative clustering," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 36, no. 4, pp. 795–805, 2006.
26. P. Maji and S. Pal, "RFCM: a hybrid clustering algorithm using rough and fuzzy sets," Fundamenta Informaticae, vol. 80, no. 4, pp. 475–496, 2007.
27. P. Maji and S. K. Pal, "Rough set based generalized fuzzy C-means algorithm and quantitative indices," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 37, no. 6, pp. 1529–1540, 2007.
28. X. Deng and Y. Yao, "An information-theoretic interpretation of thresholds in probabilistic rough sets," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, Chengdu, China, August 2012.
29. J. Zhou, W. Pedrycz, and D. Miao, "Shadowed sets in the characterization of rough-fuzzy clustering," Pattern Recognition, vol. 44, no. 8, pp. 1738–1749, 2011.
30. Y. Yao, "Three-way decisions and cognitive computing," Cognitive Computation, vol. 8, no. 4, pp. 543–554, 2016.
31. X. Deng and Y. Yao, "Decision-theoretic three-way approximations of fuzzy sets," Information Sciences, vol. 279, pp. 702–715, 2014.
32. L. A. Zadeh, "Fuzzy sets," Information and Control, vol. 8, no. 3, pp. 338–353, 1965.
33. L. Ma, "Two fuzzy covering rough set models and their generalizations over fuzzy lattices," Fuzzy Sets and Systems, vol. 294, pp. 1–17, 2016.
34. C. Wang, D. Chen, and Q. Hu, "Fuzzy information systems and their homomorphisms," Fuzzy Sets and Systems, vol. 249, pp. 128–138, 2014.
35. G. Deng, Y. Jiang, and J. Fu, "Monotonic similarity measures between fuzzy sets and their relationship with entropy and inclusion measure," Fuzzy Sets and Systems, vol. 287, pp. 97–118, 2016.
36. P. Lingras and C. West, "Interval set clustering of web users with rough k-means," Journal of Intelligent Information Systems, vol. 23, no. 1, pp. 5–16, 2004.
37. J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, NY, USA, 1981.
38. W. Pedrycz, "Interpretation of clusters in the framework of shadowed sets," Pattern Recognition Letters, vol. 26, no. 15, pp. 2439–2449, 2005.
39. G. J. Klir and T. A. Folger, Fuzzy Sets, Uncertainty, and Information, Prentice-Hall, Englewood Cliffs, NJ, USA, 1988.
40. X. C. Liu, "Entropy, distance measure and similarity measure of fuzzy sets and their relations," Fuzzy Sets and Systems, vol. 52, no. 3, pp. 305–318, 1992.
41. N. R. Pal and J. C. Bezdek, "Measuring fuzzy uncertainty," IEEE Transactions on Fuzzy Systems, vol. 2, no. 2, pp. 107–118, 1994.
42. G. J. Klir, U. H. St. Clair, and B. Yuan, Fuzzy Set Theory: Foundations and Applications, Prentice-Hall, Englewood Cliffs, NJ, USA, 1997.
43. K. Yao, "Sine entropy of uncertain set and its applications," Applied Soft Computing, vol. 22, pp. 432–442, 2014.
44. A. Mucherino, P. J. Papajorgji, and P. M. Pardalos, "K-nearest neighbor classification," in Data Mining in Agriculture, Springer, Berlin, Germany, 2009.
45. D. Dua and C. Graff, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2020, http://archive.ics.uci.edu/ml.
46. Q. Q. Gu, C. Ding, and J. W. Han, "On trivial solution and scale transfer problems in graph regularized NMF," in Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, pp. 1288–1293, Barcelona, Spain, July 2011.
47. Y. Lei, J. C. Bezdek, J. Chan, N. X. Vinh, S. Romano, and J. Bailey, "Extending information-theoretic validity indices for fuzzy clustering," IEEE Transactions on Fuzzy Systems, vol. 25, no. 4, pp. 1013–1018, 2017.
48. W. Xu, X. Liu, and Y. Gong, "Document clustering based on non-negative matrix factorization," in Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 267–273, Toronto, Canada, July 2003.
49. R. J. G. B. Campello, "A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment," Pattern Recognition Letters, vol. 28, no. 7, pp. 833–841, 2007.

Copyright © 2020 Dandan Yang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

