Abstract
This paper focuses on constructing uncertainty measures by the pure rough set approach in ordered information systems. Four types of definitions of lower and upper approximations are investigated, together with the corresponding uncertainty measures: accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree. Theoretical analysis indicates that all four types can be used to evaluate uncertainty in ordered information systems; in particular, the first type and the third type turn out to be essentially the same. To illustrate the approach, experiments on real-life data sets have been conducted to test the four types of uncertainty measures. The results show that these measures can effectively quantify the uncertainty in ordered information systems.
1. Introduction
Rough set theory, originated by Pawlak in the early 1980s [1, 2], is an extension of classical set theory and can be regarded as a soft computing tool for handling imprecision, vagueness, and uncertainty in data analysis. The theory has found successful applications in pattern recognition [3], medical diagnosis [4], data mining [5, 6], conflict analysis [7], algebra [8, 9], and other fields [10–12]. Recently, it has attracted growing interest among researchers.
Until now, several extensions of the rough set model have been proposed to meet various requirements. For example, by exploring the relationship between rough sets and modal logics, Yao proposed and examined a number of extended rough set models; with respect to graded and probabilistic modal logics, graded and probabilistic rough set models are also discussed in [13]. Yao also summarized various formulations of the standard rough set theory and demonstrated how those formulations can be adopted to develop different generalized rough set theories; the relationships between rough set theory and other theories are discussed in [14]. In [15], Wu presented a general framework for the study of the mathematical structure of rough sets in infinite universes of discourse: lower and upper approximations of a crisp set with respect to an infinite approximation space are first defined, and the connections between rough sets and the Dempster-Shafer theory of evidence are then explored. Other extensions have also been introduced, such as the variable precision rough set (VPRS) model [16], the rough set model based on tolerance relations [17, 18], the Bayesian rough set model [19], the fuzzy rough set model, and the rough fuzzy set model [20, 21]. Many further achievements have been made in rough set theory. For example, Grzymala-Busse [22] developed LERS, a system for rule induction that can handle inconsistencies and induce both certain and possible rules. Polkowski [23] worked on using granular rough mereological structures in the classification of data. Skowron et al. [24] worked on the relation between, and the combination of, rough set theory and granular computing [25]. Lin proposed a granular computing model based on binary relations [26]. Yao studied three-way decisions in the probabilistic rough set model [27, 28]. The equivalence relation is a basic notion in Pawlak's rough set model.
However, the original rough set approaches do not consider attributes with preference-ordered domains, that is, criteria. In many real situations, we are faced with problems in which the ordering of attribute values plays a crucial role. One such problem is the ordering of objects. For this reason, Yao considered the problem of mining ordering rules as finding associations between orderings of attribute values and the overall ordering of objects [29]. For mining ordering rules, the notion of information tables is generalized to ordered information tables by adding order relations on attribute values. Iwiński also addressed the ranking of objects in information systems [30, 31]. Moreover, Greco et al. proposed an extension of rough set theory, called the dominance-based rough set approach (DRSA), which takes the ordering properties of criteria into account [32–34]. This innovation is mainly based on substituting a dominance relation for the indiscernibility relation. Greco et al. also characterized the DRSA and the decision rules induced from its rough approximations, and presented its usefulness and its advantages over the classical rough set approach (CRSA) [32–34]. In DRSA, condition attributes are criteria and classes are preference ordered. Several studies have addressed the properties and algorithmic implementations of DRSA [35–37].
Uncertainty measurement is an important issue in rough set theory. The pure rough set approach and the information theory approach are the two methodologies for dealing with uncertainty measurement in rough set theory. In the pure rough set approach, the accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree measures are important numerical characterizations that quantify the imprecision of a rough set caused by its boundary region. Recently, Yao [38] studied two definitions of approximations and associated measures based on equivalence relations. In the information theory approach, entropy and its variants have been introduced into rough set theory [39–42].
The classical rough set model is based on an equivalence relation, or equivalently on a partition. Thus, the corresponding uncertainty measures are not suitable for ordered information systems. Several authors have defined uncertainty measures in ordered information systems by the information theory approach. Xu et al. introduced the concepts of rough entropy and knowledge granulation in ordered information systems [43]. Xu et al. also defined the knowledge granulation, knowledge entropy, and knowledge uncertainty measure in ordered information systems and gave some of their properties [44]. However, there are few studies on uncertainty measurement based on the pure rough set approach in ordered information systems. In this paper, we mainly focus on extending Pawlak's pure rough set uncertainty measures to ordered information systems.
The remainder of this paper is organized as follows. In Section 2, some basic concepts of classical rough set theory and ordered information systems are reviewed. In Section 3, four types of lower and upper approximations and their corresponding uncertainty measures are investigated, and some important properties are studied; we also find that the first type and the third type are essentially the same. In Section 4, the four types of uncertainty measures are tested on some real-life data. In Section 5, we conclude the paper with a summary and an outlook for further research.
2. Preliminaries
In this section, we review some basic notions of classical rough set theory and of rough sets in ordered information systems.
Throughout this paper, we assume that the universe $U$ is a nonempty finite set; the class of all subsets of $U$ is denoted by $\mathcal{P}(U)$, and the complement of $X$ in $U$ is denoted by $\sim X$.
2.1. Rough Set Approximations in Classical Information System
A classical information system is an ordered triple $I = (U, AT, F)$, where $U$ is a nonempty finite set of objects, $AT$ is a nonempty finite set of condition attributes, and, for any $a \in AT$, $f_a : U \to V_a$ is a map, where $V_a$ is the domain of the attribute $a$. In particular, a classical target information system is given by $I = (U, AT \cup D, F)$, where $D$ is a nonempty finite set of decision attributes, and, for any $d \in D$, $f_d : U \to V_d$ is a map, where $V_d$ is the domain of the attribute $d$.
Suppose that $I = (U, AT, F)$ is a classical information system and $A \subseteq AT$; let $U/R_A$ be the partition of $U$ induced by the attribute subset $A$. For any $x \in U$, $[x]_A = \{y \in U \mid f_a(x) = f_a(y), \forall a \in A\}$; more information can be found in [45–47].
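Let $X$ be a subset of $U$, and let $[x]_A$ denote the equivalence class of $x$ induced by the attribute subset $A$; the lower and upper approximations of $X$ are defined, respectively, as follows:
$$\underline{R_A}(X) = \{x \in U \mid [x]_A \subseteq X\} = \bigcup \{[x]_A \mid [x]_A \subseteq X\},$$
$$\overline{R_A}(X) = \{x \in U \mid [x]_A \cap X \neq \emptyset\} = \bigcup \{[x]_A \mid [x]_A \cap X \neq \emptyset\}.$$
From the definition, we can see that two different approaches are employed in constructing the lower and upper approximations: the first is the element-based approach, and the second is the class-based approach. The lower approximation of a set $X$ with respect to $A$ is the set of all objects that certainly belong to $X$ with respect to $A$. The upper approximation of $X$ with respect to $A$ is the set of all objects that possibly belong to $X$ with respect to $A$.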
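As a minimal sketch of these two constructions, the following Python fragment computes the lower and upper approximations of a set in both the element-based and the class-based way and checks that they coincide when the classes come from an equivalence relation. The toy table, the attribute names `a` and `b`, and all function names are illustrative assumptions, not part of the original text.

```python
from collections import defaultdict

def equivalence_classes(table, attrs):
    """Map each object to its equivalence class under the attributes in attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return {obj: blocks[tuple(row[a] for a in attrs)] for obj, row in table.items()}

def element_based(table, attrs, target):
    """Collect the objects whose class is inside (lower) or meets (upper) target."""
    cls = equivalence_classes(table, attrs)
    lower = {x for x in table if cls[x] <= target}
    upper = {x for x in table if cls[x] & target}
    return lower, upper

def class_based(table, attrs, target):
    """Take the union of the classes that are inside (lower) or meet (upper) target."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attrs).values():
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

# Hypothetical toy data: four objects described by two condition attributes.
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 0},
    "x3": {"a": 2, "b": 1},
    "x4": {"a": 2, "b": 0},
}
X = {"x1", "x3"}

elem_lower, elem_upper = element_based(table, ["a", "b"], X)
cls_lower, cls_upper = class_based(table, ["a", "b"], X)
print(sorted(elem_lower), sorted(elem_upper))
```

Here `x1` and `x2` are indiscernible, so `x1` cannot be placed in the lower approximation of `X`, while `x2` enters the upper approximation.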
Let be a classical target information system and let be the set of decision classes of the information system .
2.2. Uncertainty Measures in Rough Set Theory
Rough sets can also be characterized numerically by the accuracy measure, the roughness measure, and the approximation quality, which can be used to evaluate the uncertainty of a set. The approximation accuracy can be used to evaluate the uncertainty of a rough classification [2]. In addition, the dependency degree and the importance degree can be employed to evaluate a condition attribute subset with respect to the decision attributes [1]. The definitions of these uncertainty measures are as follows.
Definition 1 (see [2]). Let $I = (U, AT, F)$ be a classical information system, $A \subseteq AT$, and $X \subseteq U$. The accuracy of the set $X$ according to $A$ is
$$\alpha_A(X) = \frac{|\underline{R_A}(X)|}{|\overline{R_A}(X)|}.$$
The roughness of the set $X$ with respect to $A$ is
$$\rho_A(X) = 1 - \alpha_A(X) = 1 - \frac{|\underline{R_A}(X)|}{|\overline{R_A}(X)|}.$$
And the approximation quality of the set $X$ with respect to $A$ is
In fact, the roughness measure is the well-known Marczewski-Steinhaus distance between the lower and upper approximations according to Yao [48].
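The relation between roughness and the Marczewski-Steinhaus distance can be checked on a small example. The sketch below assumes Pawlak's accuracy, the ratio of the sizes of the lower and upper approximations, and roughness as one minus accuracy; the two approximation sets are hypothetical, the function names are illustrative, and the convention that accuracy equals 1 for an empty upper approximation is also an assumption.

```python
from fractions import Fraction

def accuracy(lower, upper):
    """Pawlak accuracy |lower| / |upper|; taken to be 1 when upper is empty (assumed convention)."""
    return Fraction(len(lower), len(upper)) if upper else Fraction(1)

def roughness(lower, upper):
    """Roughness as the complement of accuracy."""
    return 1 - accuracy(lower, upper)

def marczewski_steinhaus(s, t):
    """Distance |s symmetric-difference t| / |s union t| between two finite sets."""
    union = s | t
    return Fraction(len(s ^ t), len(union)) if union else Fraction(0)

# Hypothetical approximations of some set X, with lower contained in upper.
lower = {"x3"}
upper = {"x1", "x2", "x3"}

acc = accuracy(lower, upper)
rough = roughness(lower, upper)
dist = marczewski_steinhaus(lower, upper)
print(acc, rough, dist)
```

Because the lower approximation is contained in the upper one, the symmetric difference is exactly the boundary region, which is why the two quantities agree.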
Definition 2 (see [2]). Let be a classical decision information system, be the classification of the universe , and be the attribute subset that . The approximation accuracy of according to is The dependency degree and importance degree of with respect to are defined as [1]
According to the definitions of these measures, the accuracy measure expresses the degree of completeness of the knowledge about the given object set $X$, and the approximation quality likewise evaluates the completeness degree of the set $X$, while the roughness measure represents the incompleteness of that knowledge. Meanwhile, the approximation accuracy gives the percentage of possible correct decisions when classifying objects by employing the attribute set $A$. The dependency degree and the importance degree measure the degree of dependency and the importance of $A$ with respect to $D$.
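To make the classification-level measures concrete, here is a hedged sketch. It assumes the standard Pawlak forms: approximation accuracy as the total size of the lower approximations of the decision classes divided by the total size of their upper approximations, dependency degree as the total size of the lower approximations divided by the size of the universe, and the importance degree of an attribute subset as the drop in dependency degree when that subset is removed from the full attribute set. The table, the attribute names, and the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def partition(table, attrs):
    """Blocks of objects that share the same values on attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def approximations(table, attrs, target):
    """Class-based Pawlak lower and upper approximations of target."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

def approximation_accuracy(table, attrs, decision):
    pairs = [approximations(table, attrs, d) for d in partition(table, decision)]
    return Fraction(sum(len(lo) for lo, _ in pairs), sum(len(up) for _, up in pairs))

def dependency_degree(table, attrs, decision):
    positive = sum(len(approximations(table, attrs, d)[0]) for d in partition(table, decision))
    return Fraction(positive, len(table))

def importance_degree(table, all_attrs, attrs, decision):
    """Assumed form: dependency on all attributes minus dependency without attrs."""
    rest = [a for a in all_attrs if a not in attrs]
    return dependency_degree(table, all_attrs, decision) - dependency_degree(table, rest, decision)

# Hypothetical decision table: condition attributes a, b and decision attribute d.
table = {
    "x1": {"a": 1, "b": 0, "d": 0},
    "x2": {"a": 1, "b": 0, "d": 1},
    "x3": {"a": 2, "b": 1, "d": 1},
    "x4": {"a": 2, "b": 0, "d": 0},
}
acc = approximation_accuracy(table, ["a", "b"], ["d"])
dep = dependency_degree(table, ["a", "b"], ["d"])
sig_a = importance_degree(table, ["a", "b"], ["a"], ["d"])
sig_b = importance_degree(table, ["a", "b"], ["b"], ["d"])
print(acc, dep, sig_a, sig_b)
```

In this toy table, `x1` and `x2` agree on both condition attributes but differ on the decision, so only half of the objects can be classified with certainty.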
Moreover, to investigate the uncertainty measures, a partial order on partitions is used. Let $U/R_A = \{A_1, A_2, \dots, A_m\}$ and $U/R_B = \{B_1, B_2, \dots, B_n\}$ be the partitions induced by the attribute subsets $A$ and $B$, respectively. One defines $U/R_A \preceq U/R_B$ if and only if, for each $A_i \in U/R_A$, there exists $B_j \in U/R_B$ such that $A_i \subseteq B_j$; then $U/R_B$ is said to be coarser than $U/R_A$ (or $U/R_A$ is finer than $U/R_B$). If $U/R_A \preceq U/R_B$ and $U/R_A \neq U/R_B$, then $U/R_B$ is said to be strictly coarser than $U/R_A$ (or $U/R_A$ is strictly finer than $U/R_B$), which is denoted by $U/R_A \prec U/R_B$.
Many uncertainty measures are available, and not all of them are necessarily reasonable. For the accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree measures to be reasonable, they should have the following properties.
Accuracy. Let be a classical information system and . If , then .
Roughness. Let be a classical information system and . If , then .
Approximation Quality. Let be a classical information system and . If , then .
Approximation Accuracy. Let be a classical decision information system and . If , then .
Dependence Degree. Let be a classical decision information system and . If , then .
Importance Degree. Let be a classical decision information system and . If , then .
Obviously, these measures are reasonable to be used as uncertainty measures in classical rough set theory.
2.3. Ordered Information Systems and Dominance Relation
An ordered information system is an ordered triple , where is a nonempty finite set of objects, is a nonempty finite set of condition attributes, and, for any , is a map, where is the domain of the attribute . In particular, an ordered decision information system is given by , where is a nonempty finite set of decision attributes, and, for any , is a map, where is the domain of the attribute .
Definition 3 (see [34]). Let be an ordered information system, for ; then is called the dominance relation with respect to : And the dominance class of an object with respect to an attribute subset is
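The following sketch illustrates dominance classes on a toy table. It assumes one common convention, namely that larger attribute values are preferred and that the dominance class of x collects every object whose value is at least that of x on each attribute of the chosen subset; the opposite orientation works symmetrically. The table, the attribute names, and the function name `dominance_class` are hypothetical.

```python
def dominance_class(table, attrs, x):
    """Objects that are at least as good as x on every attribute in attrs (assumed orientation)."""
    return {y for y, row in table.items()
            if all(row[a] >= table[x][a] for a in attrs)}

# Hypothetical ordered table: larger values are assumed to be better.
table = {
    "x1": {"a": 1, "b": 2},
    "x2": {"a": 2, "b": 1},
    "x3": {"a": 2, "b": 2},
    "x4": {"a": 3, "b": 3},
}

classes_ab = {x: dominance_class(table, ["a", "b"], x) for x in table}
classes_a = {x: dominance_class(table, ["a"], x) for x in table}

for x in sorted(table):
    print(x, sorted(classes_ab[x]), sorted(classes_a[x]))
```

Unlike equivalence classes, these classes overlap (`x4` belongs to all of them), so they form a covering of the universe rather than a partition. Adding attributes can only shrink a dominance class, which is the monotonicity that the coarser/finer comparison relies on.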
In an ordered information system, just as in a classical information system, is said to be coarser than (or is finer than ), denoted by , if, for any , . If and , then is said to be strictly coarser than (or is strictly finer than ) and it can be denoted by .
Note that if , then .
Definition 4. Let , be two ordered information systems; they have the same object set, attribute set, but they may have different attribute values on some objects. If, for any , , either or if , we can get , and then we say is coarser than (or is finer than ), which is denoted by .
Note that if , then there exist and , such that .
Theorem 5. Let , be two ordered information systems and . If , then, for any , .
Proof. (1) If, for any , , , then, for any , we have .
(2) If there exists , , such that . So if , then . Hence, .
This completes the proof.
3. Approximations and Uncertainty Measures in Ordered Information System
In this section, we investigate four types of definitions of lower and upper approximations in ordered information systems. We focus on whether these definitions are appropriate for the uncertainty measures (accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree). The family of dominance classes induced by the dominance relation discussed in the last section forms a covering of the universe. Thus, one can obtain four types of definitions of lower and upper approximations based on coverings. The first, the third, and the fourth types of definitions were studied by Yao in [49]. Yao defined these three types of approximation operators based on an arbitrary binary relation, while in this paper the relation is confined to the dominance relation defined in the last section, which is only one special type of binary relation. Most importantly, the granule in an ordered information system is in fact a successor neighborhood as used in [49]. These three types are natural, direct extensions of the Pawlak rough set model obtained by replacing the equivalence relation with the dominance relation, while the second definition replaces the element-based approach with the class-based approach and can be viewed as an indirect extension of the Pawlak rough set model.
3.1. The First Type of Approximations and Corresponding Measures
In this subsection, we consider the first type of lower and upper approximations, which is of the element-based type. It can be defined as follows.
Definition 6. Let be an ordered information system, , and . The first type of lower approximation and upper approximation of according to are defined as follows:
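A minimal sketch of the element-based construction, assuming that the first type collects the objects whose dominance class is contained in the target set (lower) or intersects it (upper). The orientation of the dominance class (values at least as large on every attribute), the toy table, and the function names are assumptions made only for illustration; accuracy and roughness are taken here as the same ratios used in the classical case.

```python
from fractions import Fraction

def dominance_class(table, attrs, x):
    """Objects at least as good as x on every attribute in attrs (assumed orientation)."""
    return {y for y, row in table.items()
            if all(row[a] >= table[x][a] for a in attrs)}

def first_type(table, attrs, target):
    """Element-based approximations built from dominance classes."""
    lower = {x for x in table if dominance_class(table, attrs, x) <= target}
    upper = {x for x in table if dominance_class(table, attrs, x) & target}
    return lower, upper

# Hypothetical ordered table: larger values are assumed to be better.
table = {
    "x1": {"a": 1, "b": 2},
    "x2": {"a": 2, "b": 1},
    "x3": {"a": 2, "b": 2},
    "x4": {"a": 3, "b": 3},
}
X = {"x3", "x4"}

lower, upper = first_type(table, ["a", "b"], X)
accuracy = Fraction(len(lower), len(upper))
roughness = 1 - accuracy
print(sorted(lower), sorted(upper), accuracy, roughness)
```

Since the dominance relation is reflexive, every object belongs to its own class, so the lower approximation is contained in the target set and the target set is contained in the upper approximation.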
Based on the above definition of lower and upper approximations, one can define the accuracy, roughness, and approximation quality based on the first type as
For an ordered decision information system and , let be the set of equivalence decision classes of the ordered decision information system; then the approximation accuracy of according to can be defined as
The dependency degree and importance degree of with respect to can also be defined as
We now investigate some properties that are important in deciding whether the accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree are appropriate as uncertainty measures.
Theorem 7. Let be an ordered information system and ; for any , one has(1),(2),(3),(4).
Proof. (1) Suppose ; then ; according to the definition of , we have, for any , . For any , ; then , so . Thus, .
(2) Suppose ; according to the definition of , there exist , , such that . Then and . For any , ; then , so . Thus, .
(3) Suppose ; then and according to the definition of , we have for any , . For any , ; then , so . Thus, .
(4) Suppose ; according to the definition of , there exist , , such that . Then and . For any , ; then , so . Thus, .
Thus, the theorem is proved.
From the theorem above, one can get the following theorem easily.
Theorem 8. Let be an ordered decision information system and ; for any , the following properties hold:(1),(2),(3),(4),(5),(6),(7) ,(8),(9),(10),(11),(12).
Proof. (1) Suppose ; then . From (1) and (3) in Theorem 7, we have
(2) Suppose ; from (2) and (4) in Theorem 7, we have
(3) It is straightforward by (1).
(4) It is straightforward by (2).
(5) Suppose ; then . From (1) and (3) in Theorem 7, we have
(6) Suppose ; from (2) and (4) in Theorem 7, we have
(7) Suppose ; then . From (1) and (3) in Theorem 7, we have
Then,
So,
(8) It can be proved similar to (7) by (2) and (4) in Theorem 7.
(9) Suppose ; then . From (1) and (3) in Theorem 7, we have
(10) It can be proved similar to (9) by (2) and (4) in Theorem 7.
(11) From (9), we have
Then,
So,
(12) It can be proved similar to (11) by (2) and (4) in Theorem 7.
The theorem above shows that the accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree measures of Definition 6 are reasonable. Therefore, , , , , , and can be used as the uncertainty measures.
3.2. The Second Type of Approximations and Corresponding Measures
In this subsection, we consider the second type of lower and upper approximations, which is of the class-based type. It can be defined as follows.
Definition 9. Let be an ordered information system, , and . The second type of lower approximation and upper approximation of according to are defined as follows:
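A minimal sketch of the class-based construction, assuming that the second type takes the union of all dominance classes contained in the target set (lower) and the union of all dominance classes that intersect it (upper). The orientation of the dominance class, the toy table, and the function names are illustrative assumptions. The element-based upper approximation is computed alongside only for contrast.

```python
def dominance_class(table, attrs, x):
    """Objects at least as good as x on every attribute in attrs (assumed orientation)."""
    return {y for y, row in table.items()
            if all(row[a] >= table[x][a] for a in attrs)}

def second_type(table, attrs, target):
    """Class-based approximations: unions of whole dominance classes."""
    lower, upper = set(), set()
    for x in table:
        block = dominance_class(table, attrs, x)
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

# Hypothetical ordered table: larger values are assumed to be better.
table = {
    "x1": {"a": 1, "b": 2},
    "x2": {"a": 2, "b": 1},
    "x3": {"a": 2, "b": 2},
    "x4": {"a": 3, "b": 3},
}
X = {"x2", "x3"}

lower, upper = second_type(table, ["a", "b"], X)
element_upper = {x for x in table if dominance_class(table, ["a", "b"], x) & X}
print(sorted(lower), sorted(upper), sorted(element_upper))
```

On this toy data the class-based upper approximation also picks up `x4`, which the element-based one leaves out, because `x4` lies inside classes that meet `X` although its own class does not. With a covering instead of a partition, the two constructions can therefore differ.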
Based on the above definition of lower and upper approximations, one can define the accuracy, roughness, and approximation quality based on the second type as
For an ordered decision information system and , let be the set of equivalence decision classes of the ordered decision information system; then the approximation accuracy of according to can be defined as
The dependency degree and importance degree of with respect to can also be defined as
Similarly, we investigate some properties that are important in deciding whether the accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree are appropriate as uncertainty measures.
Theorem 10. Let be an ordered information system and ; for any , one has(1),(2),(3),(4).
Proof. (1) Suppose ; then and according to the definition of , we have, for any , .
For any , we have and . It is clear that and . Then we only need to prove and . For any , there exist such that . Then, for any , . While , for any , and . So, for any , ; hence . Thus and .
For any , there exist such that . Then and , so . Thus, .
(2) Suppose ; according to the definition of , there exist , , such that . Then and .
For any , we have and . It is clear that and . Then we only need to prove and . For any , there exist such that . Then for any , . While , then, for any , . So for any , ; hence . Thus and .
For any , there exist such that . Then and , so . Thus, .
(3) Suppose ; then . According to the definition of , we have, for any , . For any , ; then , so . Thus, .
(4) Suppose , according to the definition of , there exist , , such that . Then and . For any , ; then , so . Thus, .
Thus, the theorem is proved.
From the theorem above, one can get the following theorem easily.
Theorem 11. Let be an ordered decision information system and ; for any , the following properties hold:(1),(2),(3),(4),(5),(6),(7) ,(8), (9),(10),(11),(12).
Proof. The proof is similar to that of Theorem 8.
The theorem above shows that the accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree measures of Definition 9 are reasonable. Therefore, , , , , , and can also be used as the uncertainty measures.
3.3. The Third Type of Approximations and Corresponding Measures
In this subsection, we consider the third type of lower and upper approximations, in which the lower approximation is the class-based lower approximation and the upper approximation is defined by duality. They can be defined as follows.
Definition 12. Let be an ordered information system, , and . The third type of lower approximation and upper approximation of according to are defined as follows:
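The sketch below assumes that the third type uses the class-based lower approximation (the union of the dominance classes contained in the target set) and obtains the upper approximation by duality, as the complement of the lower approximation of the complement. It then compares the result with the element-based first type on every subset of a toy universe. The orientation of the dominance class, the table, and the function names are hypothetical.

```python
from itertools import combinations

def dominance_classes(table, attrs):
    """Map each object to the set of objects at least as good on attrs (assumed orientation)."""
    return {x: {y for y, row in table.items()
                if all(row[a] >= table[x][a] for a in attrs)}
            for x in table}

def first_type(classes, universe, target):
    """Element-based lower and upper approximations."""
    lower = {x for x in universe if classes[x] <= target}
    upper = {x for x in universe if classes[x] & target}
    return lower, upper

def third_type(classes, universe, target):
    """Class-based lower approximation; upper approximation obtained by duality."""
    def lower(s):
        out = set()
        for block in classes.values():
            if block <= s:
                out |= block
        return out
    return lower(target), universe - lower(universe - target)

# Hypothetical ordered table: larger values are assumed to be better.
table = {
    "x1": {"a": 1, "b": 2},
    "x2": {"a": 2, "b": 1},
    "x3": {"a": 2, "b": 2},
    "x4": {"a": 3, "b": 3},
}
universe = set(table)
classes = dominance_classes(table, ["a", "b"])

subsets = [set(c) for r in range(len(universe) + 1)
           for c in combinations(sorted(universe), r)]
agree = all(first_type(classes, universe, s) == third_type(classes, universe, s)
            for s in subsets)
print(agree)
```

The agreement on all sixteen subsets is consistent with the observation that the first and third types are essentially the same: the dominance relation is reflexive and transitive, so any object lying in a class contained in the target set has its own class contained in the target set as well.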
Based on the above definition of lower and upper approximations, one can define the accuracy, roughness, and approximation quality based on the third type as
For an ordered decision information system and , let be the set of equivalence decision classes of the ordered decision information system; then the approximation accuracy of according to can be defined as
The dependency degree and importance degree of with respect to can also be defined as
Similarly, we investigate some properties that are important in deciding whether the accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree are appropriate as uncertainty measures.
Theorem 13. Let be an ordered information system and ; for any , one has(1),(2),(3),(4).
Proof. (1) The proof is the same as that of (1) in Theorem 10.
(2) The proof is the same as that of (2) in Theorem 10.
(3) From (1) and Definition 12, we have .
(4) From (2) and Definition 12, we have .
Thus, the theorem is proved.
From the theorem above, one can get the following theorem easily.
Theorem 14. Let be an ordered decision information system and ; for any , the following properties hold:(1),(2),(3),(4),(5),(6),(7) ,(8)…,(9),(10),(11),(12).
Proof. The proof is similar to that of Theorem 8.
The theorem above shows that the accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree measures of Definition 12 are reasonable. Therefore, , , , , , and can also be used as the uncertainty measures.
3.4. The Fourth Type of Approximations and Corresponding Measures
In this subsection, we consider the fourth type of lower and upper approximations, in which the upper approximation is the class-based upper approximation and the lower approximation is defined by duality. They can be defined as follows.
Definition 15. Let be an ordered information system, , and . The fourth type of lower approximation and upper approximation of according to are defined as follows:
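A minimal sketch of this construction, assuming that the fourth type uses the class-based upper approximation (the union of the dominance classes that intersect the target set) and obtains the lower approximation by duality, as the complement of the upper approximation of the complement. The orientation of the dominance class, the toy table, and the function names are illustrative assumptions.

```python
def dominance_classes(table, attrs):
    """Map each object to the set of objects at least as good on attrs (assumed orientation)."""
    return {x: {y for y, row in table.items()
                if all(row[a] >= table[x][a] for a in attrs)}
            for x in table}

def fourth_type(classes, universe, target):
    """Class-based upper approximation; lower approximation obtained by duality."""
    def upper(s):
        out = set()
        for block in classes.values():
            if block & s:
                out |= block
        return out
    return universe - upper(universe - target), upper(target)

# Hypothetical ordered table: larger values are assumed to be better.
table = {
    "x1": {"a": 1, "b": 2},
    "x2": {"a": 2, "b": 1},
    "x3": {"a": 2, "b": 2},
    "x4": {"a": 3, "b": 3},
}
universe = set(table)
classes = dominance_classes(table, ["a", "b"])
X = {"x1", "x3", "x4"}

lower, upper = fourth_type(classes, universe, X)
element_lower = {x for x in universe if classes[x] <= X}
print(sorted(lower), sorted(upper), sorted(element_lower))
```

On this toy data the dual lower approximation keeps only `x1`, while the element-based lower approximation of the same set contains `x1`, `x3`, and `x4`. The fourth type is thus the most cautious of the constructions sketched here, at least on this example.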
Based on the above definition of lower and upper approximations, one can define the accuracy, roughness, and approximation quality based on the fourth type as
For an ordered decision information system and , let be the set of equivalence decision classes of the ordered decision information system; then the approximation accuracy of according to can be defined as
The dependency degree and importance degree of with respect to can also be defined as
Similarly, we investigate some properties that are important in deciding whether the accuracy, roughness, approximation quality, approximation accuracy, dependency degree, and importance degree are appropriate as uncertainty measures.
Theorem 16. Let be an ordered information system and ; for any , one has(1),(2),(3),(4).
Proof. (3) Suppose ; then . According to the definition of , we have, for any , . For any , ; then , so . Thus, .
(1) According to (3), we have ; therefore, .
(4) It can be proved similar to (3).
(2) It can be proved similar to (1).
Thus, the theorem is proved.
From the theorem above, one can get the following theorem easily.
Theorem 17. Let be an ordered decision information system and ; for any , the following properties hold:(1),(2),(3),(4),(5)