Computational Intelligence and Neuroscience

Volume 2018, Article ID 1838639, 9 pages

https://doi.org/10.1155/2018/1838639

## A New Knowledge Characteristics Weighting Method Based on Rough Set and Knowledge Granulation

^{1}Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
^{2}Nantong University, Nantong, Jiangsu 226017, China

Correspondence should be addressed to Shiping Chen; 56254268@qq.com

Received 12 November 2017; Revised 14 April 2018; Accepted 26 April 2018; Published 31 May 2018

Academic Editor: Paolo Gastaldo

Copyright © 2018 Zhenquan Shi and Shiping Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Knowledge characteristics weighting plays an extremely important role in classifying knowledge effectively and accurately. Most existing characteristics weighting methods rely heavily on experts’ a priori knowledge, whereas rough set weighting methods do not and can therefore meet the need for objectivity. However, current rough set weighting methods cannot obtain a balanced redundant characteristic set: too much redundancy may cause inaccuracy, and too little may cause ineffectiveness. In this paper, a new method based on rough set and knowledge granulation theories is proposed to ascertain the characteristics weights. Experimental results on several UCI data sets demonstrate that the weighting method can effectively avoid subjective arbitrariness and avoid taking nonredundant characteristics as redundant ones.

#### 1. Introduction

In data mining, classifying knowledge effectively requires a proper assessment of the knowledge characteristics sets, so computing the weights of characteristics sets is very important. Weights reflect the role of characteristics in the classification process and directly affect the validity and accuracy of the classifier. Common weighting methods include the expert scoring method, the fuzzy statistics method [1–3], the Analytic Hierarchy Process (AHP) method [4–6], and the Principal Component Analysis (PCA) method [7, 8]. All of these methods must use a priori knowledge.

Rough set theory was first proposed by Pawlak in 1982 [9]. It has become an extremely useful tool for handling imprecise and uncertain knowledge [9, 10]. Rough set theory can analyze and process fuzzy or uncertain data without a priori knowledge [11–17]. It has now been widely used in pattern recognition [18–20], data mining [21–23], machine learning [24–29], and other fields [30–36].

In recent years, rough set methods have been studied for calculating characteristics weights. For instance, based on the concept of characteristics importance, Wang et al. proposed a method to determine characteristics weights; however, this method did not consider the influence of decision characteristics on condition characteristics [37]. Cao and Liang combined the characteristics importance of the rough set with experts’ a priori knowledge to determine the characteristics weights [38]. This method unified subjective a priori knowledge with the objective situation, but it ignored the internal differences within the equivalent partitions; therefore, some nonredundant characteristics would be treated as redundant ones. Bao et al. proposed a method for ascertaining characteristics weights based on rough set and conditional information entropy. It avoids treating some nonredundant characteristics as redundant, but the characteristics importance obtained from redundant characteristics could be higher than that obtained from nonredundant characteristics [39]. Zhu and Chen constructed a priority queue of characteristics importance to improve Bao’s research and presented a weighting method based on conditional information entropy and rough set, but that method also involved additional costs [40].

In this paper, a new knowledge characteristics weighting method based on rough set and knowledge granulation theory is proposed. The accuracy of equivalent partitions of knowledge characteristics is studied, and the differences among equivalence classes are analyzed. Experimental results on several UCI data sets confirm our theoretical results. By comparing the numerical results with those of the AHP method, the PCA method, and two rough-set-based methods, we conclude that the new method can effectively avoid taking nonredundant characteristics as redundant ones and can improve classification accuracy.

The rest of the paper is structured as follows. Some basic concepts about rough set are briefly introduced in Section 2. In Section 3, a new knowledge characteristics weighting method is proposed and studied. Some experimental results are given in Section 4 to show the effectiveness of the proposed weighting method. Finally, we end this paper with some conclusions in Section 5.

#### 2. Basic Concepts

##### 2.1. Rough Set

Rough set theory takes knowledge as a partition of the object domain. The equivalence relations, and the equivalence classes they produce, constitute valid information or knowledge about the domain. Let $U$ denote the universe of objects, a nonempty set, and let $R$ be an equivalence relation on $U$, called knowledge on the universe $U$. The equivalence relation $R$ divides $U$ into disjoint subsets, denoted $U/R = \{X_1, X_2, \ldots, X_m\}$, representing all the equivalence classes. For each object $x$ of the universe $U$, there is the equivalence class $[x]_R$. In general, there are two approximation sets: the lower approximation (set) $\underline{R}(X) = \{x \in U : [x]_R \subseteq X\}$ and the upper approximation (set) $\overline{R}(X) = \{x \in U : [x]_R \cap X \neq \emptyset\}$. The lower approximation (set) of the set $X$ is also called the positive region $POS_R(X)$. The set $BN_R(X) = \overline{R}(X) - \underline{R}(X)$ will be referred to as the $R$-boundary region of $X$. Obviously, when the boundary region is larger, the set $X$ divided by $R$ is rougher. Therefore, the roughness of the rough set $X$ about the equivalence relation $R$ can be achieved; it is denoted by

$$\rho_R(X) = 1 - \frac{|\underline{R}(X)|}{|\overline{R}(X)|}. \tag{1}$$

The accuracy of the rough set $X$ about the equivalence relation $R$ is defined as

$$\alpha_R(X) = \frac{|\underline{R}(X)|}{|\overline{R}(X)|}, \tag{2}$$

where $|\cdot|$ represents the number of the elements in the collection, $0 \le \alpha_R(X) \le 1$. When $\alpha_R(X) = 1$, $X$ is defined as an exact set about the equivalence relation $R$. When $\alpha_R(X) < 1$, $X$ is defined as a rough set about the equivalence relation $R$.
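These approximation operators can be sketched in Python (a minimal illustration on hypothetical data, not part of the original experiments; the parity key standing in for an equivalence relation is an assumption for the example):

```python
def partition(universe, key):
    """Group objects into equivalence classes by a key function
    (two objects are equivalent iff their keys agree)."""
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)
    return list(classes.values())

def lower_approx(classes, X):
    """Lower approximation: union of equivalence classes fully contained in X."""
    return {x for c in classes if c <= X for x in c}

def upper_approx(classes, X):
    """Upper approximation: union of equivalence classes that intersect X."""
    return {x for c in classes if c & X for x in c}

def accuracy(classes, X):
    """alpha_R(X) = |lower| / |upper|; X is exact iff alpha = 1."""
    up = upper_approx(classes, X)
    return len(lower_approx(classes, X)) / len(up) if up else 1.0

# Hypothetical toy universe, partitioned by parity.
U = {1, 2, 3, 4, 5, 6}
classes = partition(U, key=lambda x: x % 2)   # {1,3,5} and {2,4,6}
X = {1, 3, 5, 6}
print(lower_approx(classes, X))   # {1, 3, 5}
print(upper_approx(classes, X))   # all of U: both classes meet X
print(accuracy(classes, X))       # 3/6 = 0.5, so X is rough here
```

Here the boundary region is the upper minus the lower approximation, so a larger boundary directly yields a smaller accuracy.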

Suppose $P$ and $Q$ are two equivalence relations on the universe $U$. If $P \subseteq Q$, then for every $x \in U$, there is $[x]_P \subseteq [x]_Q$. Thus, the equivalence classes $U/P$ can be considered finer than the equivalence classes $U/Q$, and the knowledge $P$ is more accurate than the knowledge $Q$; see [37–40] for details.
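The refinement relation between two partitions can be checked directly (a small sketch; the example partitions are hypothetical):

```python
def refines(classes_p, classes_q):
    """True if partition P refines partition Q: every P-class
    lies inside some Q-class, i.e. the P-knowledge is finer."""
    return all(any(p <= q for q in classes_q) for p in classes_p)

P = [{1}, {2}, {3, 4}]
Q = [{1, 2}, {3, 4}]
print(refines(P, Q))  # True: P is finer, hence more accurate knowledge
print(refines(Q, P))  # False
```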

##### 2.2. Knowledge Granularity

From rough set theory, we learn that knowledge is related to the equivalence classes, which shows that knowledge is granular. That is why some scholars also identify the structure of knowledge granularity through the equivalence classes and calculate the size of the knowledge granules [39].

Suppose that $K = (U, R)$ is a knowledge base, and $R \subseteq U \times U$ is an equivalence relation, also known as knowledge. Knowledge granularity is defined as

$$GK(R) = \frac{|R|}{|U|^2}. \tag{3}$$

If $R$ reaches its minimum, the identity relation, then the granularity $GK(R) = 1/|U|$. If $R$ reaches the universal relation $U \times U$, i.e., the granularity reaches its maximum, then $GK(R) = 1$. If $(x_i, x_j) \in R$, it indicates that the objects $x_i$ and $x_j$ belong to the same equivalence class with the equivalence relation $R$; they are indiscernible. Obviously, the smaller $GK(R)$ is, the stronger the discernibility of $R$ becomes.

Assume that $R$ is an equivalence relation, $K = (U, R)$ is a knowledge base, and $U/R = \{X_1, X_2, \ldots, X_m\}$ is the set of equivalence classes. According to (3), the knowledge granularity can be expressed as

$$GK(R) = \frac{\sum_{i=1}^{m} |X_i|^2}{|U|^2}, \tag{4}$$

and the discernibility of $R$ is defined as

$$Dis(R) = 1 - GK(R). \tag{5}$$

According to (4), there is $1/|U| \le GK(R) \le 1$. Therefore, we have $0 \le Dis(R) \le 1 - 1/|U|$.
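Granularity and discernibility, and their extreme values at the finest and coarsest partitions, can be sketched as follows (illustrative code with a hypothetical universe):

```python
def granularity(classes, n):
    """GK(R) = sum of |X_i|^2 over |U|^2, computed from the partition."""
    return sum(len(c) ** 2 for c in classes) / n ** 2

def discernibility(classes, n):
    """Dis(R) = 1 - GK(R)."""
    return 1 - granularity(classes, n)

n = 6
finest = [{i} for i in range(n)]   # identity relation: GK = 1/|U|
coarsest = [set(range(n))]         # universal relation: GK = 1
print(granularity(finest, n))      # 1/6, so Dis = 1 - 1/6
print(granularity(coarsest, n))    # 1.0, so Dis = 0
```

The printed values realize both ends of the bound $0 \le Dis(R) \le 1 - 1/|U|$.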

#### 3. Knowledge Characteristics Weighting Based on Rough Set and Knowledge Granulation

Cao and Liang calculated the characteristics weights as the cardinality of the positive region set over the cardinality of the universe, but the results may be inaccurate [38]. For example, consider a universe $U$ with decision characteristic $D$, and let $R_1$ and $R_2$ be two equivalence relations on $U$ defined by two condition characteristics whose positive regions about $D$ coincide: $POS_{R_1}(D) = POS_{R_2}(D)$. The weight of the knowledge characteristic $R_i$ is $w(R_i) = |POS_{R_i}(D)|/|U|$, in which $|\cdot|$ represents the number of the elements in the collection. Thus $w(R_1) = w(R_2)$. It is obvious that the characteristics weights are the same, but the equivalence classes of these two characteristics are different.
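This failure mode can be made concrete in a few lines (the universe, condition partitions, and decision partition below are hypothetical, chosen only to illustrate the issue, and are not the paper's own example):

```python
def positive_region(cond_classes, dec_classes):
    """POS_C(D): union of condition classes contained in some decision class."""
    return {x for c in cond_classes
              if any(c <= d for d in dec_classes)
              for x in c}

def discernibility(classes, n):
    """Dis = 1 - sum |X_i|^2 / n^2, from the granularity definition."""
    return 1 - sum(len(c) ** 2 for c in classes) / n ** 2

U = {1, 2, 3, 4, 5, 6}
D = [{1, 2, 3}, {4, 5, 6}]     # decision partition (hypothetical)
R1 = [{1, 2, 3}, {4, 5, 6}]    # coarse condition partition
R2 = [{x} for x in U]          # finest condition partition

# Positive-region weights coincide ...
print(len(positive_region(R1, D)) / len(U))   # 1.0
print(len(positive_region(R2, D)) / len(U))   # 1.0
# ... but granularity-based discernibility tells the partitions apart.
print(discernibility(R1, 6), discernibility(R2, 6))   # 0.5 vs 5/6
```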

In order to solve the problems above, we use the knowledge granularity to study the relationships among the subsets in the complex sets of equivalence classes and propose a method based on the knowledge granularity to compute the discernibility of knowledge characteristics. The knowledge characteristics weights are then determined according to the relationship between the discernibility and the weights of knowledge characteristics.

##### 3.1. The Discernibility of Knowledge Characteristics

We first give a definition about the discernibility of the knowledge characteristics.

*Definition 1. *Suppose that $K = (U, R)$ is a knowledge base, $R$ is the equivalence relation, and $r$ is a characteristic. Let $U/R = \{X_1, X_2, \ldots, X_m\}$ and $U/(R \cup \{r\}) = \{Y_1, Y_2, \ldots, Y_n\}$. Then, the discernibility of $R \cup \{r\}$ is denoted by

$$Dis(R \cup \{r\}) = 1 - \frac{\sum_{j=1}^{n} |Y_j|^2}{|U|^2}. \tag{6}$$

By Definition 1, we know that the larger $Dis(R \cup \{r\})$ is, the stronger the discernibility of $R \cup \{r\}$ becomes. When we select two objects randomly from $U$, there are $|U|^2$ ways to do so. After adding the characteristic $r$ into $R$, the discernibility increases from $Dis(R)$ to $Dis(R \cup \{r\})$, because the number of equivalence classes becomes greater than or equal to that of the original partition. Thus, the ability to discern objects is improved, and the discernibility increases.

Theorem 2. *Let $K = (U, R)$ be a knowledge base, $r$ a characteristic, $U/(R \cup \{r\}) = \{Y_1, \ldots, Y_n\}$, and denote $Dis(R \cup \{r\})$ as the discernibility of $R \cup \{r\}$; then there is $Dis(R) \le Dis(R \cup \{r\}) \le 1 - 1/|U|$.*

*Proof. *From (4) and (5), we have $Dis(R) = 1 - \sum_{i=1}^{m} |X_i|^2 / |U|^2$ and $Dis(R \cup \{r\}) = 1 - \sum_{j=1}^{n} |Y_j|^2 / |U|^2$. After adding the characteristic $r$ into $R$, the number of equivalence classes increases: each class $X_i$ is split into one or more classes of the finer partition. Thus, for each $X_i$ there exists an index set $J_i$ such that $X_i = \bigcup_{j \in J_i} Y_j$, and hence $\sum_{j \in J_i} |Y_j|^2 \le |X_i|^2$. Summing over $i$, we have $\sum_{j=1}^{n} |Y_j|^2 \le \sum_{i=1}^{m} |X_i|^2$, which shows $Dis(R \cup \{r\}) \ge Dis(R)$.

When the granularity of $R \cup \{r\}$ attains its minimum, there is only one element in each equivalence class $Y_j$, so $Dis(R \cup \{r\}) = 1 - 1/|U|$. When $R \cup \{r\}$ reaches the universal relation on $U$, the granularity reaches its maximum and $Dis(R \cup \{r\}) = 0$. Then we obtain

$$Dis(R) \le Dis(R \cup \{r\}) \le 1 - \frac{1}{|U|}.$$

Thus, Theorem 2 is proved.
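Theorem 2 can be checked numerically in a short sketch (the universe and the parity characteristic below are hypothetical, used only to exercise the inequality):

```python
def dis(classes, n):
    """Dis(R) = 1 - sum |X_i|^2 / |U|^2."""
    return 1 - sum(len(c) ** 2 for c in classes) / n ** 2

def refine(classes, key):
    """Split each class by an extra characteristic's key function:
    the partition of R u {r} refines the partition of R."""
    out = []
    for c in classes:
        sub = {}
        for x in c:
            sub.setdefault(key(x), set()).add(x)
        out.extend(sub.values())
    return out

n = 8
base = [set(range(n))]                      # no characteristic yet: Dis = 0
finer = refine(base, key=lambda x: x % 2)   # add r: split by parity
print(dis(base, n), dis(finer, n))          # 0.0 then 0.5: Dis did not decrease
print(dis(finer, n) <= 1 - 1 / n)           # True: upper bound of Theorem 2
```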

##### 3.2. Method to Determine Characteristics Weight

To propose our new characteristics weight method, we further give two definitions.

*Definition 3. *Suppose that $K = (U, C \cup D)$ is a knowledge base, where $C$ denotes the condition characteristics and $D$ denotes the decision characteristics. $U/D = \{D_1, D_2, \ldots, D_k\}$ identifies the equivalence classes of the universe $U$ partitioned by the decision characteristics $D$. $Dis(C)$ is the discernibility of $C$ on the universe $U$. The discernibility of the knowledge characteristics $C$ on $U/D$ is defined as

$$Dis_{U/D}(C) = \frac{|POS_C(D)|}{|U|} \, Dis(C). \tag{7}$$

According to (2) and (5), we have the following formulation of $Dis_{U/D}(C)$:

$$Dis_{U/D}(C) = \frac{|POS_C(D)|}{|U|} \left( 1 - \frac{\sum_{i=1}^{m} |X_i|^2}{|U|^2} \right), \tag{8}$$

where $U/C = \{X_1, X_2, \ldots, X_m\}$ and $POS_C(D)$ is the positive region of $D$ with respect to $C$.

*Definition 4. *Suppose that $K = (U, C \cup D)$ is a knowledge base, where $C$ is the condition characteristics and $D$ is the decision characteristics. $U/D$ identifies the equivalence classes of the universe $U$ partitioned by the decision characteristics $D$. For a condition characteristic $c_i \in C$, the discernibility of $C$ is $Dis_{U/D}(C)$ and the discernibility of $C - \{c_i\}$ is $Dis_{U/D}(C - \{c_i\})$. Then the discernibility of the characteristic $c_i$ is defined as

$$Dis_{U/D}(c_i) = Dis_{U/D}(C) - Dis_{U/D}(C - \{c_i\}). \tag{9}$$

According to Definitions 3 and 4, we present a new formula to compute the weight of a characteristic in the following definition. The detailed computation process is shown in Algorithm 1.
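As a rough sketch of how discernibility measures can be turned into characteristic weights, the following illustrates a difference-based significance scheme in the spirit of Definition 4: the gain in discernibility contributed by each characteristic is normalized so the weights sum to 1. The aggregation below is an illustrative assumption (it omits the decision-class factor and does not reproduce the paper's Algorithm 1 verbatim), and the data rows are hypothetical:

```python
def dis(classes, n):
    """Dis = 1 - sum |X_i|^2 / n^2."""
    return 1 - sum(len(c) ** 2 for c in classes) / n ** 2

def partition_by(rows, idxs):
    """Partition data rows by the values in the given condition-column indices."""
    out = {}
    for r in rows:
        out.setdefault(tuple(r[i] for i in idxs), []).append(r)
    return list(out.values())

def weights(rows, cond_idxs):
    """Weight of column i proportional to Dis(C) - Dis(C - {i}),
    normalized to sum to 1 (assumed scheme, for illustration only)."""
    n = len(rows)
    full = dis(partition_by(rows, cond_idxs), n)
    sig = {i: full - dis(partition_by(rows, [j for j in cond_idxs if j != i]), n)
           for i in cond_idxs}
    total = sum(sig.values())
    return {i: (s / total if total else 1 / len(cond_idxs))
            for i, s in sig.items()}

# Hypothetical rows: columns 0 and 1 are condition characteristics.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(weights(rows, [0, 1]))   # {0: 0.5, 1: 0.5}: symmetric data, equal weights
```

Because each weight is built from a discernibility difference rather than from the positive region alone, characteristics whose partitions differ receive different weights even when their positive regions coincide.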