Uncertain Distribution-Based Similarity Measure of Concepts
Measuring the similarity of concepts is a basic task in artificial intelligence, with applications such as image retrieval, collaborative filtering, and public opinion guidance. As a powerful tool for expressing uncertain concepts, the similarity measure based on cloud model (SMCM) is often used to measure the similarity between two concepts. However, current studies on SMCM have two main limitations: (1) the similarity measures based on conceptual intension lack an interpretable way of merging the numerical characteristics and cannot discriminate between some different concepts; (2) the similarity measures based on conceptual extension are often unstable and inefficient. To address these problems, an uncertain distribution-based similarity measure of cloud model (UDCM) is proposed in this paper. By analyzing the definition of the CM, we propose a new notion of complete uncertainty, comprising first-order and second-order uncertainty, to calculate the uncertainty more accurately. Then, based on the difference between the complete uncertainty of two concepts, the computing process of UDCM and some of its properties are introduced. Finally, we show its advantages by comparison with other methods and verify its validity by experiments.
The similarity of concepts is a basic sense in human cognition and also a fundamental task in artificial intelligence. It plays a crucial role in semantic information retrieval systems [1–3], sense disambiguation [1, 4], and information extraction [3, 5]. Because concepts are uncertain, different people perceive the same concept differently [6–10]. To describe various forms of uncertainty, researchers have proposed many models: probability models for randomness, fuzzy set models for vagueness, and rough set models for inconsistency and incompleteness. The cloud model (CM) proposed by Li et al. describes the uncertain concepts in human language and realizes bidirectional cognition between their extension and intension. Owing to its strong ability to express uncertain concepts, the similarity measure based on cloud model (SMCM) has been applied in many scenarios, such as image segmentation [15–18], collaborative filtering [19–21], and synthetic evaluation [22–26]. Although SMCM has been successful in many scenarios, two urgent problems remain to be addressed.
From the perspective of conceptual intension, the similarity of the numerical characteristics can depict the similarity of concepts. To express complex forms of uncertainty, different numerical characteristics carry different meanings for an uncertain concept. However, a reasonable method of merging the numerical characteristics to measure similarity is lacking.
Example 1. Figure 1 shows the shooting results of three shooters, denoted here as A, B, and C. We are asked to evaluate the similarity between the performances of A, B, and C, which is an intractable problem. As shown in Figure 1, the variance of B's shooting results is close to that of A, which suggests that they share the same psychological state, but the mean value of B's performance is far from that of A, which indicates a significant difference between their shooting levels. Although C's shooting results are scattered, the mean value of C's performance is close to that of A. In terms of psychological state, B is closer to A than C is, but in terms of shooting level, C is closer to A than B is. That is, considering the similarity from different angles yields different results. Hence, we need a more comprehensive method to measure the similarity.
From the perspective of conceptual extension, by contrast, the similarity measure is computed by random realization, which directly reflects the similarity from abundant samples of the concepts. The more samples are generated, the higher the accuracy of SMCM is. Therefore, acquiring an accurate SMCM requires excessive computing time. Besides, because of random realization, the results of SMCM differ from run to run, as illustrated in the following example:
Example 2. Supposing two cloud models and , their similarity is computed by fuzzy distance-based similarity (FDCM) , wherein the estimated overall score is with . Table 1 shows results of their 10 similarity measures.
From the above analyses, it can be seen that SMCM still needs further study. To address the above problems, in this paper, we propose a new notion called complete uncertainty to depict the whole uncertainty in the process from numerical characteristics to the conceptual extension. Then, a new SMCM is presented based on complete uncertainty, which reflects the similarity of the uncertain distributions of two concepts. Compared with the SMCM based on extension, the new SMCM produces a result that does not vary between runs. Besides, it merges the numerical characteristics in a more reasonable way than the SMCM based on intension. Moreover, because the new SMCM reflects the complete uncertainty of the CM, it can acquire more accurate similarity results between two concepts.
The remainder of this paper is organized as follows. The related definitions of the CM and current SMCM are introduced in Section 2. We propose uncertain distribution-based SMCM and elaborate the calculation in detail in Section 3. Section 4 provides four experiments to show the effectiveness of the proposed method. Final conclusions are presented in Section 5.
In this section, we review related concepts of the CM and current methods for measuring the similarity of cloud models.
2.1. Cloud Model
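Definition 1 (see ). Let $U$ be a nonempty infinite set and $C(Ex, En, He)$ be a concept on $U$. If there is a number $x \in U$, which is a random realization of the concept $C$ and satisfies $x \sim N(Ex, En'^2)$, where $En' \sim N(En, He^2)$ and the certainty degree of $x$ on $C$ is $\mu(x) = \exp\left(-\frac{(x - Ex)^2}{2 En'^2}\right)$, then the distribution of $x$ on $U$ is a Gaussian cloud or normal cloud, and each $x$ is defined as a Gaussian cloud drop.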
As a crucial model of CM, the Gaussian cloud model is widely applied because of the universality of the Gaussian distribution, and we only discuss the Gaussian cloud model in this paper. The Gaussian cloud model introduces three numerical characteristics, $Ex$, $En$, and $He$, which denote the mathematical expectation, entropy, and hyperentropy, respectively. It accords with human thought [14, 28–32] and provides a unified framework of randomness and vagueness in the human cognitive process. Herein, the expectation represents the basic determinate domain of the qualitative concept, the entropy represents the uncertainty of the qualitative concept, and the hyperentropy represents the uncertainty of the entropy. Figure 2 shows the shape and the characteristic curves of a CM $C(Ex, En, He)$, i.e., the curve $y = \exp\left(-\frac{(x - Ex)^2}{2 (En - 3He)^2}\right)$, which is called the inner envelope, and the curve $y = \exp\left(-\frac{(x - Ex)^2}{2 (En + 3He)^2}\right)$, which is called the outer envelope. We also call the curve $y = \exp\left(-\frac{(x - Ex)^2}{2 En^2}\right)$ the expectation curve.
In cognitive computing, cloud drops are called the conceptual extension, i.e., samples of a concept. The numerical characteristics, expectation $Ex$, entropy $En$, and hyperentropy $He$, are called the conceptual intension and represent the essence of a concept. There are many algorithms that implement the bidirectional transformation between the extension and intension of the CM; they are called forward cloud transformation (FCT) algorithms and backward cloud transformation (BCT) algorithms. Owing to space limitations, we do not elaborate on these algorithms in this paper; related methods can be found in [14, 32, 33].
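The two transformations can be sketched as follows. This is a minimal illustration, assuming the usual Gaussian cloud sampling scheme (draw a second-order entropy value, then a drop, then its certainty degree) for FCT and a simple moment-based estimator for BCT. The function names, the parameter values, and the `abs` guard in the hyperentropy estimate are assumptions made for this example, not the algorithms of [14, 32, 33] verbatim.

```python
import math
import random


def forward_cloud(ex, en, he, n, seed=None):
    """Draw n cloud drops (x, mu) from the Gaussian cloud C(ex, en, he)."""
    rng = random.Random(seed)
    drops = []
    while len(drops) < n:
        en_prime = rng.gauss(en, he)      # second-order draw: En' ~ N(En, He^2)
        if en_prime == 0.0:               # resample to avoid dividing by zero
            continue
        x = rng.gauss(ex, abs(en_prime))  # first-order draw: x ~ N(Ex, En'^2)
        mu = math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
        drops.append((x, mu))
    return drops


def backward_cloud(xs):
    """Estimate (Ex, En, He) from drop positions only (moment-based sketch)."""
    n = len(xs)
    ex = sum(xs) / n
    en = math.sqrt(math.pi / 2.0) * sum(abs(x - ex) for x in xs) / n
    var = sum((x - ex) ** 2 for x in xs) / (n - 1)
    he = math.sqrt(abs(var - en ** 2))    # abs() guards against sampling noise
    return ex, en, he


# Hypothetical concept, used only to exercise the two functions.
drops = forward_cloud(ex=25.0, en=3.0, he=0.3, n=20000, seed=1)
ex_hat, en_hat, he_hat = backward_cloud([x for x, _ in drops])
```

The hyperentropy estimate is the difference of two noisy quantities, so it is typically much less stable than the estimates of the expectation and entropy.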
2.2. Similarity Measure of Cloud Model
To describe the similarity between two cloud concepts, many SMCM have been proposed. Generally speaking, a suitable similarity measure should ensure correct conclusions in specific situations and should offer discriminability, efficiency, stability, and interpretability. Zhang et al. used the average distance of cloud drops generated by FCT to measure the distance between two cloud models. This method is called the concept extension-based similarity measure (CS). It is understandable and accords with human cognition. However, the calculation of the average distance is highly complex and unstable. Other researchers have studied SMCM based on concept intension. The likeness comparing method based on cloud model (LICM) employs the cosine of the included angle between the vectors composed of the numerical characteristics to measure similarity. It is highly efficient but ignores the relationships among the numerical characteristics, which sometimes leads to unreasonable results. Li et al. proposed expectation based on cloud model (ECM) and max boundary based on cloud model (MCM), which define the similarity measure by the overlapping area of characteristic curves. These methods employ characteristic curves to denote the certainty degrees of cloud drops and then use the similarity of uncertainty to measure the similarity of concepts. Other SMCM based on characteristic curves can be found in [16, 27, 36, 37]. Table 2 compares the similarity measures mentioned above in terms of discriminability, efficiency, stability, and interpretability. In practice, owing to their high efficiency, stability, and interpretability, MCM and ECM are widely applied.
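The LICM idea described above can be illustrated with a short sketch that treats each cloud model as the vector (Ex, En, He) and takes the cosine of the included angle. The function name and the three example concepts are hypothetical and chosen only to show why proportional numerical characteristics cannot be told apart.

```python
import math


def licm_similarity(c1, c2):
    """Cosine of the angle between two (Ex, En, He) vectors."""
    dot = sum(a * b for a, b in zip(c1, c2))
    norm1 = math.sqrt(sum(a * a for a in c1))
    norm2 = math.sqrt(sum(b * b for b in c2))
    return dot / (norm1 * norm2)


# Hypothetical concepts: c_b is c_a scaled by 2, c_c differs only in He.
c_a = (1.0, 0.5, 0.05)
c_b = (2.0, 1.0, 0.10)
c_c = (1.0, 0.5, 0.20)

sim_ab = licm_similarity(c_a, c_b)  # proportional vectors: cosine is 1 up to rounding
sim_ac = licm_similarity(c_a, c_c)  # different hyperentropy: slightly below 1
```

Because the cosine depends only on direction, any two concepts with proportional characteristics score as identical even when their expectations differ.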
As Table 2 shows, the currently popular SMCM, except CS, are ineffective at distinguishing two different concepts because they use only part of the uncertain relationship to measure the similarity between concepts. For example, if two cloud models have the same expectation and entropy, their similarity calculated by ECM equals 1 even though their hyperentropies differ. In other words, ECM only uses the uncertain relationship between expectation and entropy, which is called first-order uncertainty. Ignoring hyperentropy causes incorrect results. In the next section, we define a new notion for the CM called complete (first-order and second-order) uncertainty and propose a new SMCM based on complete uncertainty, which focuses on the difference between the distributions of two concepts.
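The point about ECM ignoring hyperentropy can be checked numerically. The sketch below assumes the commonly used forms of the expectation curve, exp(-(x - Ex)^2 / (2 En^2)), and of the outer envelope, exp(-(x - Ex)^2 / (2 (En + 3He)^2)). The function names and the two example concepts are hypothetical.

```python
import math


def expectation_curve(x, ex, en):
    """Certainty degree on the expectation curve (uses Ex and En only)."""
    return math.exp(-((x - ex) ** 2) / (2.0 * en ** 2))


def outer_envelope(x, ex, en, he):
    """Certainty degree on the outer envelope (assumed form with En + 3He)."""
    return math.exp(-((x - ex) ** 2) / (2.0 * (en + 3.0 * he) ** 2))


# Two hypothetical concepts with equal Ex and En but different He.
c1 = (0.0, 1.0, 0.1)
c2 = (0.0, 1.0, 0.3)
xs = [i / 10.0 for i in range(-30, 31)]

same_expectation = all(
    expectation_curve(x, c1[0], c1[1]) == expectation_curve(x, c2[0], c2[1])
    for x in xs
)
max_outer_gap = max(abs(outer_envelope(x, *c1) - outer_envelope(x, *c2)) for x in xs)
```

Any measure built only on the expectation curve sees these two concepts as identical, while the outer envelopes differ visibly.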
3. Uncertain Distribution-Based SMCM
3.1. Complete Uncertainty of the Cloud Model
From the discussion in Section 2, computing the whole uncertainty of the CM is a critical step for SMCM based on conceptual intension. Due to the $3En$ principle, we only consider cloud drops of the CM in the interval $[Ex - 3En, Ex + 3En]$. Let $\mu(x)$ be the certainty degree of cloud drop $x$; the uncertainty of the CM is calculated by integrating $\mu(x)$ over this interval. This integral is very complex because $\mu(x)$ also varies depending on the random variable $En'$. So, there are two forms of uncertainty in Definition 1. First-order uncertainty is the certainty degree of the drops, and second-order uncertainty is the uncertainty of the certainty degree. The complete uncertainty should include both and reflect the relationships among the three numerical characteristics. Figure 3 shows the two forms of uncertainty and their relationships. Hence, for each cloud drop, its uncertainty is a fuzzy set , where is the membership function depending on . Then, the complete uncertainty of CM is denoted by
3.2. Uncertain Distribution-Based SMCM
Based on formula (1), the uncertain distribution-based similarity measure of cloud model (UDCM) can be defined as follows:
Definition 2. Let and be two cloud models; uncertain distribution-based similarity measure of cloud model between and is defined as the following:where is the length of , and and are uncertainties of x for and , respectively.
In Definition 2, is a similarity measure of the fuzzy set. Therefore, UDCM is a framework of SMCM. There are various UDCM depending on different definitions of . In this paper, we define as the following:where and are two fuzzy sets on universal set and is the cardinal number of the fuzzy set.
Figure 4 shows the illustration of UDCM. Two horizontal scatter diagrams represent the relationship between cloud drops and certain degree, i.e., first-order uncertainty. Two vertical curves represent the uncertainty of certain degree, where , i.e., second-order uncertainty. Owing to formula (3), is the ratio of the blue area to the green area. From Figure 4, we know that , so .
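The ratio of areas described above can be sketched on a discretized grid. This is a minimal illustration, assuming that the intersection and union of two fuzzy sets are taken as the pointwise minimum and maximum of their membership values and that the cardinal number is approximated by a sum over grid points. The function name and sample values are hypothetical.

```python
def fuzzy_similarity(mu_a, mu_b):
    """Ratio of the cardinality of the intersection to that of the union."""
    if len(mu_a) != len(mu_b):
        raise ValueError("both membership vectors must share the same grid")
    inter = sum(min(a, b) for a, b in zip(mu_a, mu_b))
    union = sum(max(a, b) for a, b in zip(mu_a, mu_b))
    # Convention for this sketch: two empty fuzzy sets count as identical.
    return 1.0 if union == 0.0 else inter / union


# Hypothetical membership values sampled on a shared grid.
sim_same = fuzzy_similarity([0.2, 0.8, 0.5], [0.2, 0.8, 0.5])
sim_diff = fuzzy_similarity([0.2, 0.8], [0.4, 0.4])
```

With this definition the value always lies in [0, 1] and equals 1 when the two membership vectors coincide.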
Next, we try to calculate for each . Its uncertainty is a fuzzy set:Firstly, we should find the relationship between membership functions of two fuzzy sets when their elements are equal. For each , let and be variables for membership functions of and , respectively. Sincewe haveThen,So, (2) can be written asIt is obvious that membership functions of both fuzzy sets and are exponential parts of the Gaussian distribution for each , respectively. Hence, we only calculate the intersection and the union of two exponential parts respective with and . We do not introduce the process of calculation in this paper; more details can be found in .
The remainder is to calculate the integral in formula (8). Unfortunately, its integrand is not an elementary function, and the result of the integral is not an analytic expression. We have to calculate it by the numerical method. In definition of integration, it satisfies the following equationFunction obtains the maximal integer less than . For convenience, we denote function as . There are two cases of integral interval . If two intervals and are intersecting, let and , and can be calculated byIf two intervals and are separated, let and , and can be calculated byIn order to explain calculation of Definition 2 clearly, we compute UDCM of and . Since and , we have .
We divide into 9 intervals, while and . For each point , is calculated which is given in Table 3.
Then, we calculate . Although this result is imprecise while , we can calculate a more precise value with increasing, and the result converges to a stable value ultimately. Figure 5 shows that converges to 0.0724 with increasing, which illustrates that we can obtain a stable result by Definition 2.
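The numerical step, averaging a pointwise similarity over the interval and refining the partition until the value stabilizes, can be sketched as below. This only illustrates the integration scheme: the midpoint rule is a choice made here, and `placeholder_sim` is a hypothetical stand-in built from first-order certainty degrees, not the second-order similarity of Definition 2. The example concepts are also hypothetical.

```python
import math


def average_over_interval(pointwise_sim, lo, hi, n):
    """Midpoint-rule estimate of (1/|U|) * integral of pointwise_sim over [lo, hi]."""
    step = (hi - lo) / n
    total = sum(pointwise_sim(lo + (i + 0.5) * step) for i in range(n))
    return total / n


def placeholder_sim(c1, c2):
    """Stand-in pointwise similarity: min/max of expectation-curve degrees."""
    def sim(x):
        m1 = math.exp(-((x - c1[0]) ** 2) / (2.0 * c1[1] ** 2))
        m2 = math.exp(-((x - c2[0]) ** 2) / (2.0 * c2[1] ** 2))
        top = max(m1, m2)
        return 1.0 if top == 0.0 else min(m1, m2) / top
    return sim


# Hypothetical concepts (Ex, En, He); the interval covers both 3En ranges.
c1, c2 = (0.0, 1.0, 0.1), (1.0, 1.5, 0.2)
lo = min(c1[0] - 3 * c1[1], c2[0] - 3 * c2[1])
hi = max(c1[0] + 3 * c1[1], c2[0] + 3 * c2[1])
estimates = {
    n: average_over_interval(placeholder_sim(c1, c2), lo, hi, n)
    for n in (9, 90, 900, 9000)
}
```

Comparing successive entries of `estimates` gives a simple stopping rule: refine the partition until the change falls below a chosen tolerance.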
A special case of Definition 2 is as follows.
Theorem 1. Let and be two cloud models; if they satisfy and , the UDCM of them is
Proof. Since and , according to Definition 2, , and , and . For each , the value of is invariable. Without loss of generality, we assume . Due to , there areHence, the similarity of and isThe UDCM of and isIn order to illustrate UDCM has a high ability to distinguish two different concepts, we have the following theorem.
Lemma 1. Let be an integrable function and be an interval. satisfies , where is the length of interval if and only if holds on , almost everywhere.
Proof. To prove this lemma, we must employ measure theory. Sufficiency: suppose that holds on , almost everywhere. Due to countable additivity of integration,where means the set of satisfying .
Necessary: suppose is a natural number.where is the Lebesgue measure. Due to , for each , there is . The set can be denoted by . Hence, .
Theorem 2. Let and be two cloud models. if and only if , , and .
Proof. Obviously, sufficiency can be proved by Theorem 1. Necessary: in Definition 2, . Because of Lemma 1, we know that holds on , almost everywhere. It induces that, for any and , the equationis satisfied. As we know, function is determined uniquely by parameters and . Hence, , , and hold.
4. Experimental Analysis
4.1. Comparison with Other SMCM
To demonstrate the high discriminability of UDCM claimed by Theorem 2, we calculate the similarity among four cloud models, , , , and , using different similarity measures. It is obvious that , , , and are four different CMs. Table 4 shows the similarity values given by the different SMCM. For LICM, because the numerical characteristics of the two cloud models are proportional, the cosine of the angle between the two vectors composed of the numerical characteristics of and is equal to 1. LICM only distinguishes differences in the shape of the CM, which is unrelated to the expectation. and have the same expectation and entropy, so their difference cannot be found using ECM. Analogously, the difference between and cannot be found using MCM because they have the same outer envelope. UDCM can distinguish any two different CMs from the perspective of uncertain distribution. It offers high discriminability, which is important in artificial intelligence applications. We analyze its merits and demonstrate its performance in the following experiments.
4.2. Classification of Time Series
An appropriate SMCM is important for time series classification. BCT preserves uncertain features while reducing dimensionality, and CM has been applied in many related domains of time series. Besides, the similarity measure is the other critical factor for classification after dimension reduction. To test the performance, we conduct an experiment comparing these measures on a standard dataset. We download the synthetic control chart dataset from the machine learning data repository of the University of California at Irvine. There are 600 time series and 6 classes in this dataset. We randomly divide the dataset into six equal portions and successively adopt cross-validation based on these portions. For each split, the training set contains 500 series, and the testing set contains 100 series. We employ the misclassification rate to evaluate the performances of LICM, ECM, MCM, and UDCM as follows:
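The evaluation protocol can be sketched as a six-fold split with a per-fold misclassification rate, taken here as the fraction of held-out series assigned the wrong class. The classifier is passed in as a hypothetical `classify(train, item)` callable because the text does not specify the decision rule. The toy items and labels merely stand in for the 600 series.

```python
import random


def cross_validated_error(items, labels, classify, k=6, seed=0):
    """Return the misclassification rate of each of the k folds."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k nearly equal portions
    rates = []
    for fold in folds:
        held_out = set(fold)
        train = [(items[i], labels[i]) for i in order if i not in held_out]
        wrong = sum(1 for i in fold if classify(train, items[i]) != labels[i])
        rates.append(wrong / len(fold))
    return rates


# Toy stand-in: 600 items in 6 classes and a classifier that always predicts 0.
items = list(range(600))
labels = [i % 6 for i in items]
rates = cross_validated_error(items, labels, lambda train, item: 0)
```

With 600 items and six folds, every split trains on 500 items and tests on 100, matching the sizes stated in the text.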
As shown in Figure 6, the performance of UDCM surpasses other methods. LICM only captures tendency information, and the loss of distribution information results in high misclassification rate. Compared with ECM and MCM, UDCM utilizes more accurate uncertainty to measure similarity and can capture the second-order uncertain relationship to describe similarity more appropriately.
4.3. Shooting Experiment
To verify that the similarity results accord with the uncertain distribution of concepts, we measure the similarity of four shooters' performances. We suppose the cloud models of the four shooters' performances are , , , and . Cloud drops represent the deviation of a hit from the bull's eye. Shooting results are scored on a 10-point system (off-target shots score 0 points). Table 5 lists the score statistics of 100 shots simulated by FCT for each shooter. The shooting results and histograms of the score statistics are shown in Figure 7. Similarities of the score distributions are calculated by the Jensen–Shannon divergence as where . For , similarities are defined as
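The Jensen–Shannon divergence between two score histograms can be computed as in the sketch below, which uses the standard definition with base-2 logarithms so that the value lies in [0, 1]. The function name and the two 11-bin histograms (scores 0 to 10, 100 shots each) are hypothetical and are not the data of Table 5. How the divergence is converted into a similarity is not reproduced here.

```python
import math


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2) between two count histograms."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    p = [c / p_total for c in p_counts]
    q = [c / q_total for c in q_counts]
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing; v_i > 0 whenever u_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0.0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical score histograms for scores 0..10 over 100 shots each.
shooter_1 = [0, 0, 0, 1, 2, 5, 10, 20, 30, 22, 10]
shooter_2 = [5, 3, 4, 6, 8, 12, 15, 17, 14, 10, 6]
divergence = js_divergence(shooter_1, shooter_2)
```

A divergence of 0 means the two score distributions coincide, and 1 means they share no score at all.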
Table 6 shows the similarity between and the other shooters under the different similarity measures. After 100 shots, the ranking of the uncertain distribution similarities is . For LICM, the ranking is . For ECM, the ranking is . Compared with the other SMCM, the similarity given by UDCM accords with the similarity of the shooting result distributions. It can be seen that UDCM captures a more comprehensive notion of similarity than the other conceptual intension-based SMCM.
4.4. Application in Multicriteria Group Decision-Making
In multicriteria group decision-making (MCDM), the linguistic variable is a good choice for expressing personal judgment. Various linguistic variables are widely applied in different fields, e.g., the 2-tuple linguistic model [41, 42], the probabilistic uncertain linguistic model [43, 44], the intuitionistic fuzzy linguistic model, and the hesitant fuzzy linguistic model [46, 47]. All of the aforementioned linguistic models can describe only fuzziness, not randomness. Wang et al. proposed a conversion between the linguistic variable and the cloud model, which can describe both fuzziness and randomness in human cognition. Based on their work, other researchers have used the cloud model in group decision-making [27, 48, 49]. As shown in Examples 1 and 2, more running time is needed to acquire correct results. In this section, we demonstrate the merit of UDCM in MCDM.
Example 3. (continued from Example 2). As shown in Figure 8, the similarity between and is calculated by FDCM with different numbers of Monte Carlo simulations. The results oscillate around 1 as the number of simulations increases and sometimes exceed 1. Therefore, FDCM is not a true similarity measure of cloud models. Figure 9 shows the convergence of FDCM and UDCM. UDCM is tending towards stability with increasing, while it holds on for FDCM.
To acquire a more accurate similarity, Wang et al. claimed that FDCM has to execute no fewer than 50,000 Monte Carlo simulations to obtain a stable estimated overall score . In the remainder, we calculate the stable FDCM with . To verify the validity of our method in MCDM, we make a decision for the application example in  by UDCM. The experiment is run on a personal computer with Windows 10, an Intel(R) Core(TM) i7-7700 CPU at 3.6 GHz, and 16 GB of DDR3 memory, using Matlab R2015b. Let the consensus threshold ; we make the decision using FDCM and UDCM, respectively, in MCDM. Their final group decision matrices, ranking results, and running times are listed in Table 7, where the parameter is the number of Monte Carlo simulations executed to obtain the overall score of the cloud model. Although the decision results are consistent, UDCM takes only one-sixth of the time required by FDCM. Hence, UDCM is an efficient alternative to FDCM.
The similarity of concepts is a fundamental topic in uncertain artificial intelligence. By means of FCT and BCT, the CM realizes bidirectional cognitive transformation between the intension and extension of a concept. Furthermore, the CM reflects the uncertainty of the qualitative concept itself and, meanwhile, reveals the objective relationship between probability and fuzziness in the uncertain concept. Distributions are an important means of describing uncertain phenomena. On this basis, we propose a new similarity measure, UDCM, and introduce its calculation in detail. Because it employs complete uncertainty, it yields similarity results that accord with the uncertain distribution and can thus provide valuable guidance in synthetic evaluation. Besides, UDCM has the merits of discriminability and stability and is an effective tool for cognitive computing. Finally, UDCM is also a framework of SMCM: employing different forms of second-order uncertainty leads to different results. In the future, the selection of uncertainty forms for different situations also deserves study.
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This work was supported by Innovation and Exploration Project of Guizhou Province (QKHPTRC 5727–06), PhD Initiation Fund of Zunyi Normal University (ZSBS04), and PhD Training Program of Chongqing University of Posts and Telecommunication (no. BYJS201902).
H. A. Nguyen and H. Al-Mubaid, “New ontology-based semantic similarity measure for the biomedical domain,” in Proceedings of the 2006 IEEE International Conference on Granular Computing, pp. 623–628, Atlanta, USA, 2006.
K. M. Sim and P. T. Wong, “Web-based information retrieval using agent and ontology,” in Proceedings of the Asia-Pacific Conference on Web Intelligence, pp. 384–388, Maebashi, Japan, 2001.
V. Rus, N. Niraula, and R. Banjade, “Similarity measures based on latent dirichlet allocation,” in Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics, pp. 459–470, Karlovasi, Samos, Greece, 2013.
R. Laza, R. Pavón, M. Reboiro-Jato, and F. Fdez-Riverola, “Assessing the suitability of mesh ontology for classifying medline documents,” in Proceedings of the 5th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2011), pp. 337–344, Salamanca, Spain, 2011.
R. R. Yager, “Modeling uncertainty using partial information,” Information Sciences, vol. 121, no. 3-4, pp. 271–294, 1999.
D. Y. Li, C. Y. Liu, Y. Du, and X. Han, “Artificial intelligence with uncertainty,” Journal of Software, vol. 15, no. 11, pp. 1583–1594, 2004.
D. Y. Li and Y. Du, Artificial Intelligence with Uncertainty, IEEE, Piscataway, NJ, USA, 2017.
B. Vahid, A. Jahangiri, and S. G. Machiani, “Multi-class us traffic signs 3d recognition and localization via image-based point cloud model using color candidate extraction and texture-based recognition,” Advanced Engineering Informatics, vol. 32, pp. 263–274, 2017.
G. W. Zhang, D. Y. Li, and P. Li, “A collaborative filtering recommendation algorithm based on cloud model,” Journal of Software, vol. 18, no. 10, pp. 2403–2411, 2007.
Y. P. Xiao, H. C. Sun, and T. J. Dai, “A rating prediction method based on cloud model in social recommendation system,” ACTA Electronica Sinica, vol. 46, no. 7, pp. 1762–1776, 2018.
J. G. Sun and L. R. Ai, “Collaborative filtering recommendation algorithm based on item attribute and cloud model filling,” Journal of Computer Applications, vol. 32, no. 3, pp. 658–660, 2012.
C. Liu and D. Y. Li, “Study on the universality of the normal cloud model,” Engineering Science, vol. 6, no. 8, pp. 28–34, 2004.
D. Y. Li, H. J. Meng, and X. M. Shi, “Membership clouds and membership cloud generators,” Journal of Computer Research and Development, vol. 6, pp. 15–20, 1995.
J.-Q. Wang, P. Wang, J. Wang, H.-Y. Zhang, and X.-H. Chen, “Atanassov’s interval-valued intuitionistic linguistic multicriteria group decision-making method based on the trapezium cloud model,” IEEE Transactions on Fuzzy Systems, vol. 23, no. 3, pp. 542–554, 2015.
Y. Zhang, D. N. Zhao, and D. Y. Li, “The similar cloud and measurement method,” Information and Control, vol. 33, no. 2, pp. 129–132, 2004.
H. Li, C. H. Guo, and W. R. Qiu, “Similarity measurement between normal cloud models,” ACTA Electronica Sinica, vol. 39, no. 11, pp. 2561–2567, 2011.
C. B. Cai, W. Fang, and J. Zhao, “Research of interval-based cloud similarity comparison algorithm,” Journal of Chinese Computer System, vol. 32, no. 12, pp. 2456–2460, 2011.
J. Yang, G. Y. Wang, Q. H. Zhang, and L. Feng, “Similarity measure of multi-granularity cloud model,” Pattern Recognition and Artificial Intelligence, vol. 31, no. 8, pp. 677–692, 2018.
X. Zha and S. H. Ni, “Indirect computation approach of cloud model similarity based on conception skipping,” System Engineering and Electronics, vol. 37, no. 7, pp. 1676–1682, 2015.
UCI Machine Learning Repository, 2020, https://archive.ics.uci.edu/ml/datasets.
T. He, G. Wei, C. Wei, and J. Wang, Codas Method for Pythagorean 2-tuple Linguistic Multiple Attribute Group Decision Making, IEEE Access, Piscataway, NJ, USA, 2019.
H. C. Liu, L. E. Wang, and Z. W. Li, “Improving risk evaluation in fmea with cloud model and hierarchical topsis method,” IEEE Transactions on Fuzzy Systems, vol. 27, no. 1, pp. 84–95, 2018.