Theory and Application on Rough Set, Fuzzy Logic, and Granular Computing
Research Article | Open Access
On Distribution Reduction and Algorithm Implementation in Inconsistent Ordered Information Systems
As part of our work on ordered information systems, this paper studies distribution reduction in inconsistent ordered information systems (IOISs). Some important properties of distribution reduction are studied and discussed. The dominance matrix is restated for reduction acquisition in information systems based on dominance relations. A matrix algorithm for distribution reduction acquisition is given step by step, and a program is implemented from the algorithm. The approach provides an effective tool for theoretical research on ordered information systems and for their practical applications. Cases are employed to explain and verify the algorithm and the program, which shows the effectiveness of the algorithm in complicated information systems.
1. Introduction
In Pawlak’s original rough set theory [1], partition or equivalence (indiscernibility) is an important and primitive concept. However, the partition or equivalence relation, used as the indiscernibility relation in Pawlak’s original rough set theory, is still restrictive for many applications. To address this issue, several interesting and meaningful extensions of the equivalence relation have been proposed, such as neighborhood operators [2], tolerance relations [3], and others [4–10]. Moreover, the original rough set theory does not consider attributes with preference-ordered domains, that is, criteria. In many real-life applications, the ordering of the values of the considered attributes plays a crucial role. One such type of problem is the ordering of objects. For this reason, Greco et al. proposed an extension of rough set theory, called the dominance-based rough set approach (DRSA), to take into account the ordering properties of criteria [11–16]. This innovation is mainly based on substituting a dominance relation for the indiscernibility relation. Moreover, Greco et al. characterized the DRSA and the decision rules induced from rough approximations, and presented the usefulness of the DRSA and its advantages over the classical rough set approach (CRSA) [11–16]. In DRSA, condition attributes are criteria and classes are preference ordered. Several studies have addressed the properties and algorithmic implementations of DRSA [10, 17–19].
Nevertheless, only a limited number of methods using DRSA to acquire knowledge in inconsistent ordered information systems have been proposed and studied. Pioneering work on inconsistent ordered information systems with the DRSA was done by Greco et al. [11–16], but they did not clearly point out the semantic explanation of unknown values. Shao and Zhang [20] further proposed an extension of the dominance relation in incomplete ordered information systems. Their work was based on the assumption that all unknown values are lost. They described an approach to attribute reduction in consistent ordered information systems, but they did not address attribute reduction in inconsistent ordered decision systems. Therefore, the purpose of this paper is to develop approaches to attribute reduction in inconsistent ordered information systems (IOISs). In this paper, theories and approaches of distribution reduction are investigated in inconsistent ordered information systems. Furthermore, a matrix computation algorithm for distribution reduction is introduced, which provides a new approach to attribute reduction in inconsistent ordered information systems.
The rest of this paper is organized as follows. Some preliminary concepts are briefly recalled in Section 2. In Section 3, theories and approaches of distribution reduction are investigated in IOISs. In Section 4, we restate the definition of the dominance matrix in ordered information systems and give the matrix algorithm for distribution reduction acquisition step by step. The preparations needed by the algorithm are made and the program is designed. The algorithm and the corresponding program provide a tool for theoretical research on criterion-based information systems and for their applications. Cases are employed to illustrate the algorithm and the program in Section 5, where it is shown that the algorithm and program are effective in complicated information systems. Finally, conclusions are drawn to summarize the paper.
2. Ordered Information Systems
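An information system with decisions is an ordered quadruple $\mathcal{I} = (U, A \cup D, F, G)$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a nonempty finite set of objects; $A \cup D$ is a nonempty finite set of attributes; $A = \{a_1, a_2, \ldots, a_p\}$ denotes the set of condition attributes; $D = \{d_1, d_2, \ldots, d_q\}$ denotes the set of decision attributes, with $A \cap D = \emptyset$; $F = \{f_k \mid U \to V_k,\ k \le p\}$, where $f_k(x)$ is the value of $a_k$ on $x \in U$ and $V_k$ is the domain of $a_k \in A$; and $G = \{g_{k'} \mid U \to V_{k'},\ k' \le q\}$, where $g_{k'}(x)$ is the value of $d_{k'}$ on $x \in U$ and $V_{k'}$ is the domain of $d_{k'} \in D$. In an information system, if the domain of an attribute is ordered according to a decreasing or increasing preference, then the attribute is a criterion. An information system is called an ordered information system (OIS) if all condition attributes are criteria.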
Assume that the domain of a criterion $a \in A$ is completely preordered by an outranking relation $\succeq_a$; then $x \succeq_a y$ means that $x$ is at least as good as $y$ with respect to criterion $a$, and we say that $x$ dominates $y$. In the following, without any loss of generality, we consider condition and decision criteria having a numerical domain; that is, $V_l \subseteq \mathbb{R}$ ($\mathbb{R}$ denotes the set of real numbers).
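We define $x \succeq_{a_l} y$ by $f_l(x) \ge f_l(y)$ according to increasing preference, where $a_l \in A$ and $x, y \in U$. For a subset of attributes $B \subseteq A$, $x \succeq_B y$ means that $x \succeq_{a} y$ for any $a \in B$. That is to say, $x$ dominates $y$ with respect to all attributes in $B$. Furthermore, we denote $x \succeq_B y$ by $x R_B^{\succeq} y$. In general, we indicate an ordered information system with decisions by $\mathcal{I}^{\succeq} = (U, A \cup D, F, G)$. Thus the following definition can be obtained.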
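Let $\mathcal{I}^{\succeq} = (U, A \cup D, F, G)$ be an ordered information system with decisions. For $B \subseteq A$, denote $R_B^{\succeq} = \{(x_i, x_j) \in U \times U \mid f_l(x_i) \ge f_l(x_j),\ \forall a_l \in B\}$ and $R_D^{\succeq} = \{(x_i, x_j) \in U \times U \mid g_m(x_i) \ge g_m(x_j),\ \forall d_m \in D\}$. $R_B^{\succeq}$ and $R_D^{\succeq}$ are called dominance relations of the information system $\mathcal{I}^{\succeq}$.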
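If we denote $[x_i]_B^{\succeq} = \{x_j \in U \mid (x_j, x_i) \in R_B^{\succeq}\} = \{x_j \in U \mid f_l(x_j) \ge f_l(x_i),\ \forall a_l \in B\}$ and $[x_i]_D^{\succeq} = \{x_j \in U \mid (x_j, x_i) \in R_D^{\succeq}\} = \{x_j \in U \mid g_m(x_j) \ge g_m(x_i),\ \forall d_m \in D\}$, then the following properties of a dominance relation are trivial.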
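Let $R_A^{\succeq}$ be a dominance relation. The following properties hold. (1) $R_A^{\succeq}$ is reflexive and transitive, but not symmetric, so it is not an equivalence relation. (2) If $B \subseteq A$, then $R_A^{\succeq} \subseteq R_B^{\succeq}$. (3) If $B \subseteq A$, then $[x_i]_A^{\succeq} \subseteq [x_i]_B^{\succeq}$. (4) If $x_j \in [x_i]_A^{\succeq}$, then $[x_j]_A^{\succeq} \subseteq [x_i]_A^{\succeq}$ and $[x_i]_A^{\succeq} = \bigcup \{[x_j]_A^{\succeq} \mid x_j \in [x_i]_A^{\succeq}\}$. (5) $[x_i]_A^{\succeq} = [x_j]_A^{\succeq}$ if and only if $f_l(x_i) = f_l(x_j)$ for all $a_l \in A$. (6) The dominance classes $\{[x_i]_A^{\succeq} \mid x_i \in U\}$ constitute a covering of $U$.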
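For any subset $X$ of $U$ and attribute set $A$ of $\mathcal{I}^{\succeq}$, define $\underline{R_A^{\succeq}}(X) = \{x \in U \mid [x]_A^{\succeq} \subseteq X\}$ and $\overline{R_A^{\succeq}}(X) = \{x \in U \mid [x]_A^{\succeq} \cap X \ne \emptyset\}$. $\underline{R_A^{\succeq}}(X)$ and $\overline{R_A^{\succeq}}(X)$ are said to be the lower and upper approximations of $X$ with respect to the dominance relation $R_A^{\succeq}$. These approximations also have properties similar to those of Pawlak approximation spaces.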
For an ordered information system with decisions $\mathcal{I}^{\succeq} = (U, A \cup D, F, G)$, if $R_A^{\succeq} \subseteq R_D^{\succeq}$, then this information system is consistent; otherwise, it is inconsistent (an IOIS).
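The definitions above translate directly into code. The following Python sketch is an illustration only: the six-object table is hypothetical toy data (the entries of Table 1 are not reproduced here), the function names are chosen for this example, and every criterion is assumed to be numerical with increasing preference. It builds the dominance class of each object and tests consistency object by object, checking that each condition dominance class is contained in the corresponding decision dominance class, which is equivalent to the condition dominance relation being contained in the decision dominance relation.

```python
# Hypothetical toy data: each row is an object; columns 0-2 are condition
# criteria and column 3 is the decision criterion (all increasing preference).
TABLE = [
    (1, 2, 1, 3),
    (3, 2, 2, 2),
    (1, 1, 2, 1),
    (2, 1, 3, 2),
    (3, 3, 2, 3),
    (3, 2, 3, 1),
]
COND = (0, 1, 2)
DEC = (3,)


def dominates(x, y, attrs):
    """True if object x is at least as good as y on every attribute in attrs."""
    return all(x[a] >= y[a] for a in attrs)


def dominance_class(table, i, attrs):
    """Indices of the objects that dominate object i with respect to attrs."""
    return {j for j, row in enumerate(table) if dominates(row, table[i], attrs)}


def is_consistent(table, cond, dec):
    """Consistent iff every condition class lies inside the decision class."""
    return all(
        dominance_class(table, i, cond) <= dominance_class(table, i, dec)
        for i in range(len(table))
    )


print(dominance_class(TABLE, 0, COND))  # {0, 1, 4, 5}
print(is_consistent(TABLE, COND, DEC))  # False: the toy table is inconsistent
```

In the toy table, object 1 dominates object 0 on all condition criteria but has a lower decision value, which is exactly the kind of conflict that makes an ordered information system inconsistent.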
Example 1. An ordered information system is given in Table 1.
From the table, we have
Obviously, by the above, we have $R_A^{\succeq} \not\subseteq R_D^{\succeq}$, so the system in Table 1 is inconsistent.
For simplicity, every information system with decisions considered in the following is based on dominance relations; that is, it is an ordered information system.
3. Theories of Distribution Reduction in Inconsistent Ordered Information Systems
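Let $\mathcal{I}^{\succeq} = (U, A \cup D, F, G)$ be an information system with decisions, and let $R_A^{\succeq}$ and $R_D^{\succeq}$ be the dominance relations derived from the condition attribute set $A$ and the decision attribute set $D$, respectively. For $B \subseteq A$, denote $U/R_B^{\succeq} = \{[x_i]_B^{\succeq} \mid x_i \in U\}$, $U/R_D^{\succeq} = \{D_1, D_2, \ldots, D_r\}$, $\mu_B(x) = \left(\frac{|D_1 \cap [x]_B^{\succeq}|}{|U|}, \frac{|D_2 \cap [x]_B^{\succeq}|}{|U|}, \ldots, \frac{|D_r \cap [x]_B^{\succeq}|}{|U|}\right)$, and $\gamma_B(x) = \max\left\{\frac{|D_1 \cap [x]_B^{\succeq}|}{|U|}, \frac{|D_2 \cap [x]_B^{\succeq}|}{|U|}, \ldots, \frac{|D_r \cap [x]_B^{\succeq}|}{|U|}\right\}$, where $[x]_B^{\succeq} = \{y \in U \mid (y, x) \in R_B^{\succeq}\}$. Furthermore, we call $\mu_B(x)$ the distribution function with respect to the attribute set $B$ and $\gamma_B(x)$ the maximum distribution function with respect to the attribute set $B$.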
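As a minimal sketch of how these two functions can be evaluated, the Python code below computes them for one object. It rests on stated assumptions rather than on the paper's program: each component of the distribution function is taken to be $|D_j \cap [x]_B^{\succeq}| / |U|$, the decision classes $D_1, \ldots, D_r$ are taken to be the distinct decision dominance classes listed in order of first appearance, and the table is hypothetical toy data. The function names are illustrative.

```python
from fractions import Fraction

# Hypothetical toy data: columns 0-2 are condition criteria, column 3 the decision.
TABLE = [
    (1, 2, 1, 3),
    (3, 2, 2, 2),
    (1, 1, 2, 1),
    (2, 1, 3, 2),
    (3, 3, 2, 3),
    (3, 2, 3, 1),
]


def dominance_class(table, i, attrs):
    """Objects that are at least as good as object i on every attribute in attrs."""
    return frozenset(
        j for j, row in enumerate(table)
        if all(row[a] >= table[i][a] for a in attrs)
    )


def decision_classes(table, dec):
    """Distinct decision dominance classes, in order of first appearance."""
    classes = []
    for i in range(len(table)):
        cls = dominance_class(table, i, dec)
        if cls not in classes:
            classes.append(cls)
    return classes


def mu(table, i, attrs, dec):
    """Distribution vector of object i (assumed normalisation by |U|)."""
    own = dominance_class(table, i, attrs)
    return tuple(
        Fraction(len(d & own), len(table)) for d in decision_classes(table, dec)
    )


def gamma(table, i, attrs, dec):
    """Maximum distribution value of object i."""
    return max(mu(table, i, attrs, dec))


print(mu(TABLE, 0, (0, 1, 2), (3,)))     # components 1/3, 1/2, 2/3
print(gamma(TABLE, 0, (0, 1, 2), (3,)))  # 2/3
```

Exact fractions are used so that two distribution vectors can be compared for equality without floating-point error.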
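Definition 2. Let $\alpha = (a_1, a_2, \ldots, a_n)$ and $\beta = (b_1, b_2, \ldots, b_n)$ be two vectors with $n$ dimensions. If $a_i = b_i$ for all $i = 1, 2, \ldots, n$, we say that $\alpha$ is equal to $\beta$, denoted by $\alpha = \beta$. If $a_i \le b_i$ for all $i = 1, 2, \ldots, n$, we say that $\alpha$ is less than $\beta$, denoted by $\alpha \le \beta$. Otherwise, if there exists $i_0$ such that $a_{i_0} > b_{i_0}$, we say that $\alpha$ is not less than $\beta$, denoted by $\alpha \not\le \beta$. For instance, $(1, 2, 3) \le (1, 2, 4)$ and $(1, 2, 3) \not\le (1, 1, 4)$.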
From the above, we can have the following propositions immediately.
Proposition 3. Let be an inconsistent information system.(1)If , then , .(2)If , then , .(3)If , then , .(4)If , then , .
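Definition 4. Let $\mathcal{I}^{\succeq} = (U, A \cup D, F, G)$ be an inconsistent information system and $B \subseteq A$. If $\mu_B(x) = \mu_A(x)$ for all $x \in U$, we say that $B$ is a distribution consistent set of $\mathcal{I}^{\succeq}$. If $B$ is a distribution consistent set, and no proper subset of $B$ is a distribution consistent set, then $B$ is called a distribution consistent reduction of $\mathcal{I}^{\succeq}$.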
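Definition 5. Let $\mathcal{I}^{\succeq} = (U, A \cup D, F, G)$ be an inconsistent information system and $B \subseteq A$. If $\gamma_B(x) = \gamma_A(x)$ for all $x \in U$, we say that $B$ is a maximum distribution consistent set of $\mathcal{I}^{\succeq}$. If $B$ is a maximum distribution consistent set, and no proper subset of $B$ is a maximum distribution consistent set, then $B$ is called a maximum distribution consistent reduction of $\mathcal{I}^{\succeq}$.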
Example 6. For the system in Table 1, if we denote
then we can have
When , it can be easily checked that , for all , so that and are true and is a distribution consistent set of . Furthermore, we can examine that and are not consistent sets of . That is to say is a distribution reduction and is a maximum distribution reduction of .
Moreover, it can easily be calculated that and are not distribution consistent sets of . Thus there exist only one distribution reduction and maximum distribution reduction of in the system of Table 1, which are .
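The search for distribution reductions described by the definitions and the example can be sketched by brute force. The Python code below is not the paper's algorithm; it is a plain enumeration written for illustration on hypothetical toy data, with invented function names. Since every component of a distribution vector is assumed to share the same denominator $|U|$, the sketch compares the raw counts $|D_k \cap [x]_B^{\succeq}|$ instead of fractions.

```python
from itertools import combinations

# Hypothetical toy data: columns 0-2 are condition criteria, column 3 the decision.
TABLE = [
    (1, 2, 1, 3),
    (3, 2, 2, 2),
    (1, 1, 2, 1),
    (2, 1, 3, 2),
    (3, 3, 2, 3),
    (3, 2, 3, 1),
]
COND = (0, 1, 2)
DEC = (3,)


def dominance_class(table, i, attrs):
    """Objects that are at least as good as object i on every attribute in attrs."""
    return frozenset(
        j for j, row in enumerate(table)
        if all(row[a] >= table[i][a] for a in attrs)
    )


def distribution(table, i, attrs, dec):
    """Counts |D_k ∩ [x_i]_B| over the distinct decision dominance classes D_k."""
    classes = []
    for k in range(len(table)):
        cls = dominance_class(table, k, dec)
        if cls not in classes:
            classes.append(cls)
    own = dominance_class(table, i, attrs)
    return tuple(len(d & own) for d in classes)


def is_distribution_consistent(table, subset, cond, dec):
    """True if the subset yields the same distribution as cond for every object."""
    return all(
        distribution(table, i, subset, dec) == distribution(table, i, cond, dec)
        for i in range(len(table))
    )


def distribution_reductions(table, cond, dec):
    """Minimal distribution consistent subsets, found smallest first."""
    found = []
    for size in range(1, len(cond) + 1):
        for subset in combinations(cond, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # a smaller consistent set is already contained in it
            if is_distribution_consistent(table, subset, cond, dec):
                found.append(subset)
    return found


print(distribution_reductions(TABLE, COND, DEC))  # [(1, 2)]
```

Because subsets are visited in order of increasing size, any consistent subset that contains no previously found reduction is itself minimal, so no separate minimality check is needed.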
The distribution consistent set and the maximum distribution consistent set are related in the following theorem.
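Theorem 7. Let $\mathcal{I}^{\succeq} = (U, A \cup D, F, G)$ be an ordered information system and $B \subseteq A$. Then $B$ is a distribution consistent set of $\mathcal{I}^{\succeq}$ if and only if $B$ is a maximum distribution consistent set of $\mathcal{I}^{\succeq}$.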
Proof. It can be proved immediately from corresponding definitions and properties. From the definitions of distribution and maximum distribution consistent set, the key results of the implication is that always holds for any while is a distribution consistent set or maximum distribution consistent set. Thus, the theorem can be acquired immediately.
Theorem 8. Let be an ordered information system.: is a distribution consistent set of . : While , holds for any .Then we have .
Proof. We will prove . Assume that when , does not hold and that implies . So we can obtain by Proposition 3(3). On the other hand, since is a distribution consistent set of , we have and . Hence we can get , which is a contradiction. The theorem is proved.
The distribution consistent set requires that the classification ability of the consistent remains the same with the original data table. That is, , which is a distribution consistent set of , must satisfy the fact that holds for any . This is very strict and other reductions studied in  may not reach this special condition.
4. Matrix Algorithm for Distribution Reduction Acquisition in Inconsistent Ordered Information Systems
In this section, the dominance matrix is restated, and matrices are employed to carry out the calculation of distribution reductions.
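Definition 9. Let $\mathcal{I}^{\succeq} = (U, A \cup D, F, G)$ be an ordered information system, and $B \subseteq A$. Denote $M_B = (m_{ij})_{n \times n}$, where $m_{ij} = 1$ if $x_j \in [x_i]_B^{\succeq}$ and $m_{ij} = 0$ otherwise. The matrix $M_B$ is called the dominance matrix of the attribute set $B$. If $|B| = l$, we say that the order of $M_B$ is $l$.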
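Definition 10. Let $\mathcal{I}^{\succeq} = (U, A \cup D, F, G)$ be an ordered information system, and let $M_B = (m_{ij})_{n \times n}$ and $M_C = (m'_{ij})_{n \times n}$ be the dominance matrices of attribute sets $B, C \subseteq A$. The intersection of $M_B$ and $M_C$ is defined by $M_B \cap M_C = (m_{ij} \times m'_{ij})_{n \times n} = (\min\{m_{ij}, m'_{ij}\})_{n \times n}$.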
The intersection defined above can be implemented by the operator ".*" on the Matlab platform, that is, the product of the elements in corresponding positions of the two matrices. Then the following properties are obvious.
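A dominance matrix and the element-wise intersection can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the paper's Matlab program: the table is hypothetical toy data, the function names are invented, and entry $(i, j)$ is assumed to be 1 exactly when object $j$ dominates object $i$ on every attribute of the set.

```python
# Hypothetical toy data: columns 0-2 are condition criteria, column 3 the decision.
TABLE = [
    (1, 2, 1, 3),
    (3, 2, 2, 2),
    (1, 1, 2, 1),
    (2, 1, 3, 2),
    (3, 3, 2, 3),
    (3, 2, 3, 1),
]


def dominance_matrix(table, attrs):
    """m[i][j] = 1 if object j dominates object i on every attribute in attrs."""
    n = len(table)
    return [
        [int(all(table[j][a] >= table[i][a] for a in attrs)) for j in range(n)]
        for i in range(n)
    ]


def intersect(m1, m2):
    """Element-wise product of two 0/1 matrices (the role of '.*' in Matlab)."""
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


m_a2 = dominance_matrix(TABLE, (1,))     # single criterion in column 1
m_a3 = dominance_matrix(TABLE, (2,))     # single criterion in column 2
m_a23 = dominance_matrix(TABLE, (1, 2))  # both criteria together

# An object dominates on both criteria exactly when it dominates on each,
# so intersecting the two first-order matrices gives the second-order matrix.
print(intersect(m_a2, m_a3) == m_a23)  # True
```

This is why higher-order dominance matrices never need to be rebuilt from the data table: they can be obtained by intersecting the first-order matrices of the individual criteria.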
Proposition 11. Let , be dominance matrices of attributes sets ; the following results always hold.(1).(2).
From the above, we can see that the dominance relations on the objects are in one-to-one correspondence with dominance matrices. Dominance relations can be combined by means of the corresponding matrices, and they can be compared by means of the corresponding matrices using the following definition.
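Definition 12. Let $M = (\alpha_1, \alpha_2, \ldots, \alpha_n)^{T}$ and $N = (\beta_1, \beta_2, \ldots, \beta_n)^{T}$ be matrices with dimensions $n \times n$, where $\alpha_i$ and $\beta_i$ are row vectors. If $\alpha_i \le \beta_i$ holds for any $i \le n$, we say that $M$ is less than $N$, denoted by $M \le N$.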
From the definitions, the following property of dominance matrices follows directly.
Proposition 13. Let $\mathcal{I}^{\succeq} = (U, A \cup D, F, G)$ be an ordered information system and $B \subseteq A$. The dominance matrices with respect to $A$ and $B$ are, respectively, $M_A$ and $M_B$. Then $M_A \le M_B$.
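The row-wise comparison of matrices, and the fact that removing criteria can only enlarge dominance classes, can be checked numerically. The sketch below uses hypothetical toy data and invented function names, and again assumes that entry $(i, j)$ of a dominance matrix is 1 when object $j$ dominates object $i$.

```python
from itertools import combinations

# Hypothetical toy data: columns 0-2 are condition criteria, column 3 the decision.
TABLE = [
    (1, 2, 1, 3),
    (3, 2, 2, 2),
    (1, 1, 2, 1),
    (2, 1, 3, 2),
    (3, 3, 2, 3),
    (3, 2, 3, 1),
]
COND = (0, 1, 2)


def dominance_matrix(table, attrs):
    """m[i][j] = 1 if object j dominates object i on every attribute in attrs."""
    n = len(table)
    return [
        [int(all(table[j][a] >= table[i][a] for a in attrs)) for j in range(n)]
        for i in range(n)
    ]


def matrix_leq(m1, m2):
    """True if every entry of m1 is at most the matching entry of m2."""
    return all(a <= b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


m_full = dominance_matrix(TABLE, COND)

# The matrix of the full criterion set is below the matrix of every subset.
for size in range(1, len(COND) + 1):
    for subset in combinations(COND, size):
        assert matrix_leq(m_full, dominance_matrix(TABLE, subset))

# The converse fails in general: a single criterion dominates more pairs.
print(matrix_leq(dominance_matrix(TABLE, (0,)), m_full))  # False
```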
In the following, we give the preparation of matrix computation for distribution reductions in ordered information systems.
Proposition 14. Let be an ordered information system and and . Then and any vector represents the dominance class of object by the values 0 and 1, where 0 means the object not included in the class and 1 means the object included in the class.
Theorem 15. Let be an ordered information system and . is a consistent set if and only if .
Proof. As is known, holds since .
() For is a distribution consistent set, one can have. Then, for any and , we have . Since , it is obvious that . That is, the row vectors in and are correspondingly the same. Then .
() Since , we can easily obtain that holds for any and . Then holds for any and . We can obtain that holds for any . That is, is a distribution consistent set.
To acquire reductions in an inconsistent ordered information system, matrices can be the only form of storage used in the computation. The process of calculating the reductions is shown in Algorithm 1.
The algorithm allows us to calculate, in a compact way, reductions that keep the classification ability the same as that of the original system, without having to compute every approximation of the decision classes. This shortens the computing time and provides an effective tool for knowledge acquisition in criterion-based rough set theory. The flow chart of Algorithm 1 is given in Figure 1.
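To give a feel for a matrix-only computation, the following Python sketch stores nothing but 0/1 dominance matrices and searches for distribution reductions. It is a simplified illustration and should not be read as a transcription of Algorithm 1: the table is hypothetical toy data, the function names are invented, and the test used here (equality of the intersection counts $|D_k \cap [x_i]_B^{\succeq}|$ with those of the full attribute set) is the definition of a distribution consistent set written in matrix form.

```python
from itertools import combinations

# Hypothetical toy data: columns 0-2 are condition criteria, column 3 the decision.
TABLE = [
    (1, 2, 1, 3),
    (3, 2, 2, 2),
    (1, 1, 2, 1),
    (2, 1, 3, 2),
    (3, 3, 2, 3),
    (3, 2, 3, 1),
]
COND = (0, 1, 2)
DEC = (3,)


def dominance_matrix(table, attrs):
    """m[i][j] = 1 if object j dominates object i on every attribute in attrs."""
    n = len(table)
    return [
        [int(all(table[j][a] >= table[i][a] for a in attrs)) for j in range(n)]
        for i in range(n)
    ]


def distribution_counts(m_cond, m_dec):
    """Row i holds |D_k ∩ [x_i]_B| for each distinct row D_k of m_dec."""
    classes = []
    for row in m_dec:
        if row not in classes:
            classes.append(row)  # merge identical decision classes
    return [
        [sum(a * b for a, b in zip(row, d)) for d in classes]
        for row in m_cond
    ]


def matrix_reductions(table, cond, dec):
    """Minimal subsets whose count matrix equals that of the full set."""
    m_dec = dominance_matrix(table, dec)
    target = distribution_counts(dominance_matrix(table, cond), m_dec)
    found = []
    for size in range(1, len(cond) + 1):
        for subset in combinations(cond, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # supersets of a reduction are not minimal
            counts = distribution_counts(dominance_matrix(table, subset), m_dec)
            if counts == target:
                found.append(subset)
    return found


print(matrix_reductions(TABLE, COND, DEC))  # [(1, 2)]
```

Each row of a dominance matrix is the indicator vector of one dominance class, so the inner sum of products is simply the size of the intersection of two classes; no approximation sets are ever formed.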
Analysis to Time Complexity of Algorithm 1. Let be an ordered information system. is the simplified universe. The number of objects in original information system not being simplified is denoted by . There are condition attributes in ; that is, . The number of compressed decision classes is . We take a variable to stand for the time complexity in an implementation. In the next, we can analyze the time complexity of Algorithm 1 step by step.
The time complexity to simplify the original information system is for any two objects being compared and is denoted by . Since , , and , the time complexities to be classified by condition attributes and decision are, respectively, and . For decision classes being merged by comparing classes of any two objects, the time complexity is . Now the consistency of the information system needs to be checked by comparing the condition class and decision class of any object. If the information system is consistent, the time complexity to check consistency is . If the information system is inconsistent, the time complexity to check consistency is less than . Thus, the time complexity to check consistency is no more than ; that is, it is presented as . Then, the possible and compatible distribution functions can be calculated and the time complexity is . The time complexity to calculate each of these two functions is and is denoted by . The analysis to Step 1 is finished.
For Step 2, the time complexity to calculate possible and compatible distribution decision matrices, respectively, is denoted by . Thus, the time complexity to calculate distribution decision matrices is . The time complexity of Step 2 is completed.
The first two steps are preparations to calculate reductions. The next Step 3 to Step 5 are the steps which run the operations. There are subsets and the dominance matrices are with dimensions . In addition, the representation is the combinatorial number which means the number of selections to chose elements from ones. We consider that the judgement of a vector if it is zero runs one operation and the comparison of two vectors runs according to the dimension of the vectors. Therefore, the time complexities to compare and with , respectively, are . And the time complexity to compare every line vector of with zero is . The possible and compatible distribution matrices are obtained by reassignment values times. And the time complexities to process possible and compatible distribution matrices, respectively, are both . Then, we have that the total time complexity of Step 3 is . The judgement in Step 4 just need to run according to the number of and the time complexity is .
Since we just need to compute the intersection of nonzero 1st order possible (or compatible) distribution matrices, the maximum time complexities can be analyzed in the next steps but not the true ones in computing. Therefore, the maximum time complexity relies on the number of attribute subsets . The worst case is that no minimum reduction exists in the information system and all subsets are calculated in the algorithm. Thus, the maximum time complexity of Step 5 is .
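As a small worked count for the worst case described above, write $p$ for the number of condition attributes and $\binom{p}{k}$ for the number of ways to choose $k$ attributes from $p$ (both symbols are assumed here for illustration). The number of nonempty attribute subsets that may have to be examined is

$$\sum_{k=1}^{p} \binom{p}{k} = 2^{p} - 1 .$$

For example, $p = 3$ gives $3 + 3 + 1 = 7$ subsets, so the number of candidate subsets grows exponentially with the number of condition attributes.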
From the above analysis, we can know that the maximum time complexity of the main part in the algorithm (Step 3 to Step 5) is .
Hence, the maximum time complexity of the main algorithm is approximately .
5. Experimental Computing and Case Study
We design programs and employ two cases to demonstrate the effectiveness of the method in the last section. The experimental program runs on a personal computer with the hardware and software configuration listed in Table 2. The configuration of the computer is rather modest, but the program runs well and fast, which also shows the advantage of Algorithm 1 and the corresponding program.
An inconsistent ordered information system on animals sleep is presented in Table 3.
The information system is denoted by , where is the condition attribute set and is the single dominance decision. There are 42 objects which represent the species of animals and 10 attributes with numerical values in the ordered information system. The animals’ names are showed in Table 3 and the interpretations of the attributes will be listed as follows. The interpretations and the units of attributes are represented as shown in Table 4.
By the experimental computing program, the distribution reductions of the system can be calculated and they are represented in the following. The operating time to compute this case is 0.158581 seconds.
The distribution reductions are
And it can be verified by taking the computer as an assistant that the above sets are reductions of the data table. Detailed progress of the verifying are not arranged here. From the results, we can easily see that the reductions studied in this paper are different from the ones approached in since these reductions are , , , and . They are different kinds of reductions in ordered information systems and can adapt to different needs in practices. From the definition of different reductions, we can also easily obtain that possible and compatible reductions are usually subsets of distribution reduction. This is not strict and should be studied and verified separately and theatrically. And the work may be taken into account as one part of the future studies in our work.
Finally, we compute the distribution reductions of several other inconsistent ordered information systems. The data tables are described in Table 5.
From the results in Table 5, we can see that the algorithm and the program studied in this paper are effective and useful for acquiring distribution reductions in practice. The computing time increases with the numbers of objects and attributes, but storing the data as matrices helps to reduce both memory use and computing time. The approach is therefore helpful for theoretical research and is also applicable in practice.
6. Conclusions
As is known, many information systems are data tables that take criteria for various factors into account in practice. Therefore, it is meaningful to study attribute reduction in inconsistent information systems on the basis of dominance relations. In this paper, distribution reduction is restated in inconsistent ordered information systems. Some properties and theorems are studied and discussed. It is shown that the distribution reduction is equivalent to the maximum distribution reduction in ordered information systems. Theorems on distribution reduction are used to prepare for reduction acquisition, and the dominance matrix is also restated to acquire distribution reductions in criterion-based information systems. The matrix algorithm for distribution reduction acquisition is given step by step and programmed. The algorithm provides an approach, and the program an effective tool, for theoretical research on knowledge reduction in criterion-based inconsistent information systems. Dominance matrices are the only quantities that need to be considered; other constructions such as approximations and subinformation systems are not required. Furthermore, cases are employed to illustrate the validity of the matrix method and the program, which show the effectiveness of the algorithm in complicated information systems.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
This work is supported by the National Natural Science Foundation of China (no. 61100116), the Natural Science Foundation Project of Jiangsu Province (no. BK2011492), and the Youth Foundation of Xuzhou Institute of Technology (no. xky2011201).
- Z. Pawlak, “Rough sets,” International Journal of Computer and Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
- Y. Y. Yao, “Relational interpretations of neighborhood operators and rough set approximation operators,” Information Sciences, vol. 111, no. 1–4, pp. 239–259, 1998.
- A. Skowron and J. Stepaniuk, “Tolerance approximation spaces,” Fundamenta Informaticae, vol. 27, no. 2-3, pp. 245–253, 1996.
- Y. Leung, W. Wu, and W. Zhang, “Knowledge acquisition in incomplete information systems: a rough set approach,” European Journal of Operational Research, vol. 168, no. 1, pp. 164–180, 2005.
- S. Im, Z. Raś, and H. Wasyluk, “Action rule discovery from incomplete data,” Knowledge and Information Systems, vol. 25, no. 1, pp. 21–33, 2010.
- K. Thangavel and A. Pethalakshmi, “Dimensionality reduction based on rough set theory: a review,” Applied Soft Computing Journal, vol. 9, no. 1, pp. 1–12, 2009.
- H. Wang and S. Wang, “Mining incomplete survey data through classification,” Knowledge and Information Systems, vol. 24, no. 2, pp. 221–233, 2010.
- W. Z. Wu, Y. Leung, and W. X. Zhang, “Connections between rough set theory and Dempster-Shafer theory of evidence,” International Journal of General Systems, vol. 31, no. 4, pp. 405–430, 2002.
- W. Wu, M. Zhang, H. Li, and J. Mi, “Knowledge reduction in random information systems via Dempster-Shafer theory of evidence,” Information Sciences, vol. 174, no. 3-4, pp. 143–164, 2005.
- W. Xu, Y. Li, and X. Liao, “Approaches to attribute reductions based on rough set and matrix computation in inconsistent ordered information systems,” Knowledge-Based Systems, vol. 27, pp. 78–91, 2012.
- S. Greco, B. Matarazzo, and R. Slowinski, “Rough approximation of a preference relation by dominance relations,” European Journal of Operational Research, vol. 117, no. 1, pp. 63–83, 1999.
- S. Greco, B. Matarazzo, and R. Slowinski, “A new rough set approach to multicriteria and multiattribute classification,” in Rough Sets and Current Trends in Computing (RSCTC '98), L. Polkowski and A. Skowron, Eds., vol. 1424 of Lecture Notes in Artificial Intelligence, pp. 60–67, Springer, Berlin, Germany, 1998.
- S. Greco, B. Matarazzo, and R. Slowinski, “A new rough sets approach to evaluation of bankruptcy risk,” in Operational Tools in the Management of Financial Risks, C. Zopounidis, Ed., pp. 121–136, Kluwer, Dordrecht, The Netherlands, 1999.
- S. Greco, B. Matarazzo, and R. Slowinski, “Rough sets theory for multicriteria decision analysis,” European Journal of Operational Research, vol. 129, no. 1, pp. 1–47, 2001.
- S. Greco, B. Matarazzo, and R. Slowinski, “Rough sets methodology for sorting problems in presence of multiple attributes and criteria,” European Journal of Operational Research, vol. 138, no. 2, pp. 247–259, 2002.
- S. Greco, B. Matarazzo, and R. Slowinski, “Dominance-based rough set approach as a proper way of handling graduality in rough set theory,” in Transactions on Rough Sets VII, vol. 4400 of Lecture Notes in Computer Science, pp. 36–52, 2007.
- K. Dembczyński, R. Pindur, and R. Susmaga, “Generation of exhaustive set of rules within dominance-based rough set approach,” Electronic Notes in Theoretical Computer Science, vol. 82, no. 4, pp. 99–110, 2003.
- K. Dembczyński, R. Pindur, and R. Susmaga, “Dominance-based rough set classifier without induction of decision rules,” Electronic Notes in Theoretical Computer Science, vol. 82, no. 4, pp. 84–95, 2003.
- R. Susmaga, R. Słowiński, S. Greco, and B. Matarazzo, “Generation of reducts and rules in multi-attribute and multi-criteria classification,” Control and Cybernetics, vol. 29, no. 4, pp. 968–988, 2000.
- M. W. Shao and W. X. Zhang, “Dominance relation and rules in an incomplete ordered information system,” International Journal of Intelligent Systems, vol. 20, pp. 13–27, 2005.
- D. Yu, Q. Hu, and C. Wu, “Uncertainty measures for fuzzy relations and their applications,” Applied Soft Computing Journal, vol. 7, no. 3, pp. 1135–1143, 2007.
Copyright © 2014 Yanqin Zhang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.